Configuration¶
Various BNDL components can be configured; many of these settings can also be passed
programmatically as parameters. At runtime, BNDL configuration data is kept in a
bndl.util.conf.Config instance. bndl.compute and bndl.execute each hold such an instance in
their ComputeContext / ExecuteContext.
Configuration data¶
Configuration can be supplied:
- directly in Python (bndl.util.conf.Config supports the __getitem__ / __setitem__ protocol)
- through a configuration file
- through the BNDL_CONF environment variable
- through command line options
Configuration object¶
bndl.util.conf.Config is a dict-like object (it supports the __getitem__ / __setitem__ protocol).
The instance held by the Compute/ExecuteContext is available as its conf attribute.
For example:
>>> from bndl.compute.run import ctx
>>> ctx.conf['bndl.compute.worker_count']
2
>>> ctx.conf['foo'] = 'bar'
>>> ctx.conf
<Conf {'bndl.compute.worker_count': '2', 'foo': 'bar', 'bndl.net.listen_addresses': 'localhost:1234'}>
Config file¶
Configuration data is read from bndl.ini or .bndl.ini in the home directory (whatever ~
expands to) and in the current working directory (see Precedence), through
configparser.ConfigParser. The ini section names and keys are simply joined with a "." to form
the configuration key.
For example:
$ cat bndl.ini
[bndl]
compute.worker_count = 2
[bndl.net]
listen_addresses = localhost:1234
$ bndl-compute-shell
...
In [1]: ctx.conf
Out[1]: <Conf {'bndl.compute.worker_count': '2', 'bndl.net.listen_addresses': 'localhost:1234'}>
In [2]: ctx.worker_count
Out[2]: 2
In [3]: ctx.node.addresses
Out[3]: ['localhost:1234']
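The way section names and keys are joined can be sketched with the standard library alone. This is a minimal illustration of the naming rule described above, not BNDL's actual loader; the helper name flatten_ini is hypothetical.

```python
from configparser import ConfigParser

INI_TEXT = """
[bndl]
compute.worker_count = 2
[bndl.net]
listen_addresses = localhost:1234
"""


def flatten_ini(text):
    """Join each section name and key with a '.' (hypothetical helper, not BNDL API)."""
    parser = ConfigParser()
    parser.read_string(text)
    flat = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            # e.g. section 'bndl.net' + key 'listen_addresses'
            flat[section + '.' + key] = value
    return flat


settings = flatten_ini(INI_TEXT)
print(settings)
```

Note that configparser returns every value as a string, which matches the quoted '2' in the Conf representation shown above.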
Environment variable¶
Configuration data is read from the BNDL_CONF environment variable. It can be supplied as
key=value other=value foo=bar. The value is split into key=value pairs with shlex.split, so
shell-style quoting can be used for values that contain spaces. For example:
$ BNDL_CONF='bndl.compute.worker_count=3 foo=bar' python
>>> from bndl.compute.run import ctx
>>> ctx.await_workers()
3
>>> ctx.conf['foo']
'bar'
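The parsing of such a string can be sketched as follows. Only the use of shlex.split comes from the text above; splitting each token on its first "=" and the helper name parse_conf_string are assumptions made for illustration.

```python
import shlex


def parse_conf_string(value):
    """Turn 'key=value other=value' into a dict (hypothetical helper, not BNDL API)."""
    settings = {}
    for token in shlex.split(value):
        # Assumption: everything after the first '=' belongs to the value.
        key, _, val = token.partition('=')
        settings[key] = val
    return settings


parsed = parse_conf_string("bndl.compute.worker_count=3 foo=bar note='hello world'")
print(parsed)
```

Because shlex.split honours quotes, the quoted 'hello world' ends up as a single value rather than two tokens.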
Command line options¶
bndl-compute-workers and bndl-compute-shell set bndl.net.listen_addresses, bndl.net.seeds and
bndl.compute.worker_count through the --listen-addresses, --seeds and --worker-count flags.
See also Getting started.
Precedence¶
Configuration data is read in the following order:
- Default values set as global data
- Config files, in this order:
  - ~/bndl.ini
  - ~/.bndl.ini
  - ./bndl.ini
  - ./.bndl.ini
- BNDL_CONF environment variable
- Configuration object __init__
- Values set on the configuration object after it is created
Because each source updates the values read before it, a later source overrides an earlier one. The sources can therefore be seen as layers: each layer provides defaults for the layers read after it.
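The layering can be sketched with plain dictionaries. The values below are purely illustrative (they are not BNDL's real defaults), and the variable names are hypothetical; the point is only that applying the sources in the listed order lets later ones win.

```python
# Illustrative values only; these are not BNDL's actual defaults.
defaults = {'bndl.compute.worker_count': '1', 'bndl.execute.attempts': '1'}
from_files = {'bndl.compute.worker_count': '2'}
from_env = {'bndl.compute.worker_count': '3', 'foo': 'bar'}
from_init = {'foo': 'baz'}

effective = {}
# Apply the layers in precedence order: each update overrides what came before.
for layer in (defaults, from_files, from_env, from_init):
    effective.update(layer)

# A value set after creation overrides everything read earlier.
effective['bndl.execute.attempts'] = '2'

print(effective)
```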
Configuration options¶
The following keys are used throughout BNDL. As this list is manually curated, it may become stale (PRs with improvements are very welcome!).
Networking¶
- bndl.net.listen_addresses = <bndl.util.conf.CSV object>
  The addresses for the local BNDL node to listen on. Defaults to ['tcp://localhost.localdomain:5000'].
- bndl.net.seeds = <bndl.util.conf.CSV object>
  The seed addresses for BNDL nodes to form a cluster through gossip.
Execute¶
BNDL executes tasks on workers (to compute a DAG of datasets and their partitions); if a task
fails attempts times, the job fails.
- bndl.execute.attempts = <bndl.util.conf.Int object>
  The number of times a task is attempted before the job is cancelled. Defaults to 1.
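The meaning of attempts can be illustrated with a small retry loop. This is a sketch of the semantics described above, not BNDL's scheduler; run_with_attempts, make_flaky and JobFailed are hypothetical names.

```python
class JobFailed(Exception):
    """Raised when a task has used up all of its attempts (hypothetical name)."""


def run_with_attempts(task, attempts=1):
    """Call task up to `attempts` times; fail the job once all attempts failed."""
    last_error = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as exc:
            last_error = exc
    raise JobFailed('task failed %d time(s)' % attempts) from last_error


def make_flaky(failures):
    """Build a task that raises `failures` times and then succeeds."""
    state = {'left': failures}

    def task():
        if state['left'] > 0:
            state['left'] -= 1
            raise ValueError('transient failure')
        return 'done'

    return task


print(run_with_attempts(make_flaky(2), attempts=3))
```

With the default of 1, the first failure of a task already fails the job.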
Workers execute concurrency tasks simultaneously for each job started.
- bndl.execute.concurrency = <bndl.util.conf.Int object>
  The number of tasks which can be scheduled at a worker process at the same time. Defaults to 1.
Warning
Currently worker-task assignment is orchestrated on a per-job basis. So when multiple jobs are
executed, workers will run tasks from each job concurrently, regardless of the concurrency
settings.
Shuffle¶
Shuffles are executed in memory for as long as a worker consumes less than
max_mem_pct / os.cpu_count() percent of memory (on the assumption that one worker per core is
used). Over this limit, shuffle data is spilled to disk. Note that shuffle data is also spilled
when less than 10% or 1 GB of system-wide memory is available. Shuffle data is spilled in blocks
(approximately) no larger than block_size_mb.
- bndl.compute.shuffle.max_mem_pct = <bndl.util.conf.Float object>
  Percentage (1-100) indicating the amount of memory to be used for shuffling. Defaults to 50.
- bndl.compute.shuffle.block_size_mb = <bndl.util.conf.Float object>
  Target (maximum) size (in megabytes) of blocks created by spilling / serializing elements to disk. Defaults to 4.
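The spill rule above can be sketched as follows. This is a simplified reading of the description rather than BNDL's implementation: the function names are hypothetical, 1 GB is taken as 1024 ** 3 bytes, and the memory figures are passed in as arguments instead of being measured.

```python
import os

GIB = 1024 ** 3  # assumption: "1 GB" is interpreted as 2**30 bytes


def per_worker_limit_pct(max_mem_pct=50.0, cpu_count=None):
    """Share of memory (in percent) one worker may use before spilling."""
    cores = cpu_count or os.cpu_count() or 1
    return max_mem_pct / cores


def should_spill(worker_mem_pct, avail_pct, avail_bytes,
                 max_mem_pct=50.0, cpu_count=None):
    """True if the worker is at its limit or system-wide memory is scarce."""
    limit = per_worker_limit_pct(max_mem_pct, cpu_count)
    return worker_mem_pct >= limit or avail_pct < 10.0 or avail_bytes < GIB


# With the default of 50% on a 4-core machine, each worker gets 12.5%.
print(per_worker_limit_pct(50.0, cpu_count=4))
print(should_spill(20.0, avail_pct=50.0, avail_bytes=8 * GIB, cpu_count=4))
```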
Broadcast¶
Broadcast variables are exchanged in blocks with a size somewhere between:
- bndl.compute.broadcast.min_block_size = <bndl.util.conf.Float object>
  The minimum size of a block in megabytes. Defaults to 4.
- bndl.compute.broadcast.max_block_size = <bndl.util.conf.Float object>
  The maximum size of a block in megabytes. Defaults to 16.
When min_block_size < max_block_size, the number of blocks is ctx.worker_count, unless the
blocks would then be too small or too large.
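One way to read this rule is: aim for one block per worker, then clamp the block size to the configured bounds. The sketch below follows that reading; it is an assumption about the behaviour, not BNDL's code, and plan_blocks is a hypothetical helper that treats a megabyte as 1024 * 1024 bytes.

```python
import math

MB = 1024 * 1024  # assumption: megabytes are counted in binary units


def plan_blocks(total_bytes, worker_count, min_block_size=4, max_block_size=16):
    """Return (block_count, block_size_in_bytes) for a broadcast payload."""
    min_bytes = int(min_block_size * MB)
    max_bytes = int(max_block_size * MB)
    # Start from one block per worker ...
    block_size = math.ceil(total_bytes / max(worker_count, 1))
    # ... and clamp so blocks are neither too small nor too large.
    block_size = min(max(block_size, min_bytes), max_bytes)
    block_count = max(1, math.ceil(total_bytes / block_size))
    return block_count, block_size


print(plan_blocks(64 * MB, worker_count=8))    # block size within bounds
print(plan_blocks(8 * MB, worker_count=8))     # clamped up to the minimum
print(plan_blocks(1024 * MB, worker_count=8))  # clamped down to the maximum
```

In the first call the payload splits into one 8 MB block per worker; in the other two the clamping changes the block count away from the worker count.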