The C++ API for partitioning a model over distributed and local hardware is described here.
Load balancing generates a domain_decomposition given a recipe and a description of the hardware on which the model will run. Currently Arbor provides one load balancer, partition_load_balance(), and more will be added over time.
If the model is distributed with MPI, the partitioning algorithm for cells is distributed with MPI communication. The returned domain_decomposition describes the cell groups on the local MPI rank.
The domain_decomposition type is independent of any load balancing algorithm, so users can define a domain decomposition directly, instead of generating it with a load balancer. This is useful when the provided load balancers are inadequate, or when the user has specific insight into running their model on the target hardware.
When users supply their own domain_decomposition and the model contains gap junction connections, they must take care to place all cells that are connected via gap junctions in the same group. For example, given the connections A -gj- B -gj- C and D -gj- E, cells A, B and C must be in a single group, and cells D and E must be in a single group. They may all be placed in one group, but that is not required. Be mindful that smaller cell groups perform better on multi-core systems, so avoid overcrowding cell groups unnecessarily. Load balancers provided by Arbor, such as partition_load_balance(), guarantee that this rule is obeyed.
partition_load_balance(const recipe &rec, const arb::context &ctx)
The algorithm counts the number of each cell type in the global model, then partitions the cells of each type equally over the available nodes. If a GPU is available, and if the cell type can be run on the GPU, the cells on each node are put into one large group to maximise the amount of fine-grained parallelism in the cell group. Otherwise, cells are grouped into small groups that fit in cache, and can be distributed over the available cores.
The partitioning assumes that all cells of the same kind have equal computational cost, hence it may not produce a balanced partition for models with cells that have a large variance in computational costs.
Documentation for the data structures used to describe domain decompositions.
Used to indicate which hardware backend to use for running a cell_group.
Use the multicore backend.
Use the GPU backend.
Setting the GPU backend is only meaningful if the cell_group type supports the GPU backend.
Describes a domain decomposition: the distribution of cells across cell groups and domains. It holds the cell group descriptions (groups) for cells assigned to the local domain, and a helper function (gid_domain) used to look up the domain to which a cell has been assigned. The domain_decomposition object also holds meta-data about the number of cells in the global model and the number of domains over which the model is distributed.
The domain decomposition represents a division of all of the cells in the model into non-overlapping sets, with one set of cells assigned to each domain. A domain decomposition is generated either by a load balancer or directly by a user, and it is their responsibility to ensure the decomposition is correct, that is, that each cell in the model appears in exactly one cell group on exactly one domain.
A function for querying the domain id that a cell is assigned to (using its global identifier gid). It must be a pure function, that is, it has no side effects, and is hence thread safe.
Number of domains that the model is distributed over.
The index of the local domain. Always 0 for non-distributed models, and corresponds to the MPI rank for distributed runs.
Total number of cells in the global model (sum of num_local_cells over all domains).
The indexes of a set of cells of the same kind that are grouped together in a cell group in a domain_decomposition.