Tutorial

From Biocellion
Revision as of 18:08, 23 March 2017 by Simon Kahan (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

( This tutorial predates Biocellion 1.2; for 1.2 and later releases, see the Tutorial Series )

Introduction

Biocellion is an agent-PDE (Partial Differential Equation) based simulation framework, designed for solving general models of multicellular biological systems. In the following, we will assume you have some very basic knowledge of Biocellion, in that you may have read a recent article or that you have access to the manual. I suggest reading the introduction of the manual and example files before reading this tutorial. We will also assume that you are a model developer who wishes to implement specific biological models. Therefore, the following is to facilitate the initial development of model specific code.

To start, we will stress that Biocellion was not created to generally ease the task of designing and implementing specific model code. It was instead designed to deal with general computational problems, like solving PDEs and of course much more, in a way that is computationally efficient and scalable (here we mean that the code can be scaled to multiple processors or even multiple nodes with little effort by the modeler). This can be of great benefit as models increase in complexity and size, and much effort has been taken to achieve speeds and scales not currently available by other modeling frameworks. However, implementation of specific model logic (i. e. code) is the modelers responsibility.

Biocellion is a large, complex framework, but like all software programs it consists of various functions (or methods) that are called in a specific fashion to execute a specific model. The specific code underlying a large number of these functions will be created by you, the modeler. This is where and how a specific model is defined. Unlike some tools there is currently no GUI or simple XML file to define models, you will have to implement model logic in c++, but don’t fear, only a modest understanding of sequential c++ programing is needed. While this may seem like a disadvantage, it allows the framework to be very general and flexible which opens it up to almost any model imaginable. However, we do hope that as the Biocellion community grows, specific code and strategies for common situations will be developed and shared by users, which will make model development an ever improving process.

The model specific functions that you will be editing are discussed in the manual, and I suggest glancing over them before you start developing. The functions are organized into several files including a model header and several model routines. The important thing here is that these functions define the three limitations you must work within, that is: what information can be changed (output), what information is available (input) and when it can be changed (a graphic workflow on when functions are called is available in the manual). Within the functions you can include any legitimate c++ code you would like, although specific strategies will lend themselves to common situations, as we shall eventually see.

The general concept of Biocellion revolves around two specific elements, the agents and the PDEs. These elements interact on a regular cubic grid, the PDEs are solved over the grid and the agents live on the grid. The elements have a number of properties that are updated by both the main framework as well as your model specific code. Getting down to specifics, various aspects of the model are defined using multiple Biocellion data types, e.g. there is an agent data type called SpAgent. These data types are described in the manual and, again, I suggest glancing over them before you start developing. Learning the data types can give you a real feel for the organization and type of information in a Biocellion model. These data types describe specific elements of a Biocellion model and all have various properties, e.g. instances of SpAgent will hold SpAgentState which in turn houses many properties defining the state of a specific agent. In general there will be a time and a place within the model specific code that these properties can be accessed or changed, i.e. specific methods. This harps back to the three limitations mentioned above. Biocellion attempts to provide all necessary information within a function to properly define the output; however, when and what can be changed or made visible must be balanced with computational efficiency and scalability, so occasionally you may run into problems and will need to be creative. Incidentally, you should report these limitations on the Biocellion website, as the developers are constantly interested in improving the Biocellion framework.

One Biocellion concept worth mentioning upfront is the use of model specific variables. While the Biocellion framework will assume some common properties shared between all models, a main goal was to allow modeling flexibility. To this end, you will find that some data types have both required properties as well as model specific properties. These variables are currently available in two types, reals and ints, and are housed within an instance of a data type in an effective array, the size of which is defined by the modeler. For example, SpAgentState holds an effective real and integer array for which functions, like setModelReal, are available to call and modify specific values based on a passed in index. Later we will see a basic strategy/example for defining, accessing and modifying these variables.


Cell Sorting, a simple agent based example

The best thing you can do to learn how to develop models is to go through some working examples. Here, we will start with a simple example of cell sorting. The goal here will be to go through the example code and show how it can be run. Since this is the first example, I will try to be as detailed as possible, especially when describing the code. This example will be limited to describing different agents and some basic mechanical interactions between these agents. More complex features, such as modeling PDEs will be introduced in later examples. Furthermore, we will not discuss the relative merits of various rules as related to the biological function of interest, and instead we will focus on how the rules are implemented in a Biocellion model. To follow along for this example you will need the cell sorting model code: [*** insert link to code].

The cell sorting example simulates the ability of cells to spontaneously form patterns based on differential adhesion. Specifically, we will consider two types of cells, and the basic rule that cells of the same type are attracted to each other, i.e. there is some underlying mechanism for cell-cell adhesion. Each cell will be modeled by a distinct spherical agent with a radius of 4 um. Cells that overlap will push each other to remove this overlap (cell-cell shoving). Cells that are within 10 um (distance from one center to the next) will have the potential to pull towards one another (cell-cell adhesion), in this case only if the cells are of the same type. The cells will be placed randomly in a large cubic box, and the simulation will be run to allow the cells to spontaneously form a pattern based on the differences in the adhesion force.

Model Definitions To begin with lets define some common model properties. This is done in a single header file called ‘model_define.h’. This is the place where you should define all model constants, including the different agent types and, as we will see in a later example, the different types of model specific variables.

If you open ‘model_define.h’ you should see:

<source lang="cpp">

  1. ifndef __MY_DEFINE_H__
  2. define __MY_DEFINE_H__
  1. include "biocellion.h"

/* define constants to be used inside model functions */

/* MODEL START */

const REAL MY_PI = 3.14159265358979323846;

typedef enum _cell_type_e { CELL_TYPE_A, CELL_TYPE_B, NUM_CELL_TYPES } cell_type_e;

typedef enum _model_rng_type_e { MODEL_RNG_UNIFORM, NUM_MODEL_RNGS } model_rng_type_e;

const REAL IF_GRID_SPACING = 10.0;

const REAL A_CELL_RADIUS[NUM_CELL_TYPES] = { 4.0, 4.0 }; const REAL A_CELL_DENSITY_PER_UB[NUM_CELL_TYPES] = { 0.8, 0.8 }; const REAL A_CELL_D_MAX[NUM_CELL_TYPES] = { 10.0, 10.0 };

const REAL ADHESION_S = 0.5;

/* MODEL END */

  1. endif/* #ifndef __MY_DEFINE_H__ */

</source>

In nearly all cases you will want to edit code only between the comments /* MODEL START */ and /* MODEL END */. Here we can see the declaration of enumerations as well as scalar and array constants, that will be used latter in the model specific code. One thing worth stressing is the use of enumerations. Although, Biocellion does not require it, you should always define enumerations (using informative names) for any indexing scheme that will be used regularly throughout your code. Here we can see that the enumeration cell_type_e is used to define two cell types CELL_TYPE_A and CELL_TYPE_B, as well as the number of cell types, NUM_CELL_TYPES. We also note that it is standard c practice to use all capital letter when defining constants. In order, these constants will be set to 0, 1 and 2. Although you could use the integers directly in the code, this allows us to instead use a descriptive name, and it also allows us to add additional cell types latter, in any order, without having to go back and change existing code.

You can also see the enumeration model_rng_type_e, which is used to define the types of random number generators used in the code. Latter this will be used to define a uniformly distributed random number generator, which we can access using the Biocellion support function getModelRand. This brings us to an important point, the standard c++ random number generators are not ‘thread safe’ (for a general discription of thread safe see the Wikipedia page [***insert link here]), so you should use the built in Biocellion support function. Throughout your code, you should always use ‘thread safe’ functions to allow for proper parallelization of your code at run time.

In ‘model_define.h’ we also see the definition of scalar and array constants. One thing to note is that Biocellion has redefined many of the basic data types, e.g. the use of type REAL. One advantage here is that REAL can consistently be double or single precision based on more advanced Biocellion options. For more on basic data types see section 2.1 of the manual, advanced users can see these definitions in the Biocellion package under ‘include/base_type.h’.

Moving along, we can see a definition for the constant pi, an adhesion constant and the interface grid spacing, IF_GRID_SPACING. Latter we will use IF_GRID_SPACING to define the size of the regular cubic grid that the agents live on. This size represents the geometric resolution as well as the maximum interaction distance for agents, for more information see section 1.2 of the manual. Although the spacing is actually set elsewhere, we define the value here to keep all of our constant organized in a single file.

Finally, we address the constant arrays. Here, arrays are used to define constants which are defined for both cell types, but who’s specific values may differ between the types. Indexing these arrays using the proper cell_type_e will give us a very natural way to access these properties in the code. The down side of this strategy is that we need to always be sure this array properly corresponds to the definitions in cell_type_e. Using the constant NUM_CELL_TYPES for sizing the array prevents us from adding more cell types without values, but if we change the order of the cell types, it is up to us to remember to change the corresponding values in the array.

Model Configuration There are a number of properties that need to be configured for all Biocellion models. This is done using the functions in the source file ‘model_routine_config.cpp’. Here we will go through the functions one by one, but we will skip empty function which do not pertain to this model. The functions are described in more technical detail in section 4.2 of the manual, so here we will try to describe how they pertain to this specific model and add tips where appropriate.

updateIfGridSpacing is used to set the size of the interface grid spacing as described above. Recall that we defined the constant, IF_GRID_SPACING, here so that the actual value could be set in our common model_define header.

updateOptModelRoutineCallInfo is related to running code that is specific to the grid. Since this model does not use the grid specific functions, we will set the values to zero (which will prevent these functions from being called and save us some cpu time).

updateDomainBdryType sets the type of boundary domain. Predefined enumerations (section 2.3 of manual) are used to set this value and the two options currently available are hard wall and periodic. Here we will be simulating cells trapped in a large cube so, we set the boundaries to hard wall, although in this example either would be acceptable.

There another thing of note in this function, we see our first example of the use of builtin macros: CHECK( DIMENSION == 3 );. This will execute during run time, if CHECK is enabled, and if the inside statement is false, it will abort the run with a message as to its failure. This is a great way to force some safety into our code; although, in this example, DIMENSION is set by the framework to be 3 and so we should never fail this check. To enable CHECK set CHECK_FLAG = -DENABLE_CHECK=1 in the file ‘Makefile.model’ in the Biocellion package, note that disabling CHECK will improve performance.

We also see our first example of a loop which uses the Biocellion base type for signed integers S32 (32 bit signed integer). This is my default choice for integer types.

updatePDEBufferBdryType sets the boundaries between PDE buffer and interface grids. Currently the only option is for a hard wall.

updateTimeStepInfo sets the baseline time step and the state-and-grid time step. Briefly, Biocellion runs three different time scales, the coarsest of which is the agent time scale which is defined here as durationBaselineTimeStep and the highest resolution is the PDE time scale defined elsewhere. The mid time scale is used is called stateAndGridTimeStep and is to sync the agent state update with the changes in the environment (also grid updates) and is set using numStateAndGridTimeStepsPerBaseline. We see here that numStateAndGridTimeStepsPerBaseline is an integer and so the actual time step would be durationBaselineTimeStep/numStateAndGridTimeStepsPerBaseline (more information is available in section 1.3 of the manual). Here, we do not have grid specific code so there is no reason to have a more refined state-and-grid step. It may have been better practice to define these values using constants in ‘model_define.h’, similar to what was done for the grid spacing above.

updateSyncMethod sets how additional pairwise adjusted properties are synced up, again we use predefined enumerations for this (see section 2.3 of the manual). Pairwise adjustments are, or can be, done in parallel and after this is completed we need a consistent way to merge or sync the results. A builtin example is the force property we will see later. In brief, a force property is defined for each agent. Through pairwise mechanical interactions, the force property is incremented such that the final value is a sum of all pairwise forces acting on that agent. The syncing for the force property is done automatically for all Biocellion models. In general, Biocellion allows an arbitrary number of user defined pairwise adjusted properties for both agents and grids; however, in this model we do not make use of this feature, but something needs to be set in this function.

updateSpAgentInfo initializes some of the basic information regarding agents. This is how we would size the arrays for model specific agent variables; although, in this model we have no additional variables. Most importantly, we note the property info.dMax, which sets the maximum interaction distance for this cell type. Again, this value must be equal to or less than the interface grid spacing set above. Here we can see how the constant array, A_CELL_D_MAX, provided a natural way to access the correct constant value for each agent type.

Finally we have the function updateRNGInfo, which is used to setup the different random number generators used. As defined in ‘model_define.h’ we only have one random number generator for uniformly distributed number 0 to 1. I suggest simply coping this code for new models; although, more advanced users may wish to define other choices.

Agents Agents represent the most important aspect of Biocellion. The file ‘model_routine_agent.cpp’ is the source code for many of the functions that control agent behavior and set agent properties. In this simple example only two functions are used, the first for setting up the initial conditions for all agents and the second for controlling the spatial displacement of agents. In future examples, we will introduce the use of other functions which control cell division and death as well as cell grid coupling. The functions are described in more technical detail in section 4.3 of the manual. Here we will focus on describing what was done for this specific example.

Initial conditions for agents can be set by a specifically formatted input file or directly in the code. In this model, we set the conditions using the code, specifically, the function addSpAgents. Incidentally, this function can be used to add agents at any point in the simulation, not just initialization. The input init is used to inform the modeler when the function is called in the initialization of the simulation, opposed to calls made at each baseline step.

First, lets tackle the concept of ‘regions’. For computing reasons, groups or regions of adjoining unit boxes are defined in Biocellion, here we use the term unit box (UB) to indicate a cube defined by the grid (see section 1.2 of the manual) . The function addSpAgents is called for each of these regions, and so we must take this into account in the model code. To deal with this, several inputs are provided to define the region down to the level of specific UBs. The input startVIdx gives us the indexing of the first UB in the region and regionSize gives us the region size in number of UBs, so that startVIdx + regionSize gives us the indexing for the last UB in the region. We see in line 20 that the region size in all dimensions is used to get the volume of the region in terms of UBs, which will be used in conjunction with the agent density defined in ‘model_define.h’, A_CELL_DENSITY_PER_UB, to determine the number of agents of each type to place in this region.

After we have determined the number of agents to place in a region, we need to initialize the state and all the required properties to the correct output vector. In this example setting the agent state is obvious (radius and type, which are the minimum requirements for a state). We also need to set the offset. Biocellion tracks agent locations by two sets of numbers, an integer set (vIdx) to index the UB the agent is in and a real set (vOffset) to identify the position in that UB. Defining locations in this manner has a number of advantages, including the ability to have an arbitrarily large simulation domain without issues of numerical precision. While vIdx is obvious, vOffset needs an explanation. Given a UB with grid spacing IF, the vOffset = {0,0,0} identifies the center of the UB, the upper extreme is an offset of 0.5IF and the lower extreme is an offset of -0.5IF. Here the offset, as well as the UB, is determined randomly, which is a good example of how to call a random number generator: Util::getModelRand( MODEL_RNG_UNIFORM ). Finally all the proerties are added to the correct vectors using the vector function push_back.

The spatial displacement of an agent is calculated every baseline time step (near the end of the step, see section 4.1 of the manual). This is done by calling the function adjustSpAgent. Additionally this same function can be used to change any state properties based on a number of inputs, but here we only consider movement. The spatial displacement is typically calculated using the pairwise adjusted agent property for force, mechIntrctData.force. This, as we will see, is set by mechanical interactions to be the sum of forces acting on this agent. In this simple model the displacement is just set equal to the force. Note that the term force is used for conceptual reasons, it is up to the user to define the quantitative meaning via the logic used in the model code. Later, we will see that force, in this model, is defined in such a way that disp=force will remove all overlap between agents in a single baseline step. We also see that some random movement is added, as a function of cell size, and that a check is put in place to prevent excessive movement. Note, if agents move more then one interface grid space an error will occur.

Mechanical interactions For the most part mechanical interactions represent functions used to update pairwise adjusted agent properties. This is all done in the source file ‘model_routine_mech_intrct.cpp’. Specifically the methods allow you to update junction ends (see section 1.1 and 2.4.15 in the manual), model specific extra mechanical variables (see section 2.4.19 in the manual) and force. Force is actually a property of an instance of AgentMechIntrctData (see section 2.4.20 in the manual), and an instance of this data type exists for each agent in the simulation. The functions are described in more technical detail in section 4.4 of the manual.

Force is updated in a pairwise fashion using the function computeForceSpAgent. This function is called once for every pair of agents within the user defined maximum interaction distance. Information about both agents in the pair is provided along with pre-calculated center-to-center distances and a direction vector, unit vector pointing from agent 1 to agent 0. Force is calculated to be equal yet opposite for the two agents in the pair, with a positive or negative magnitude resulting in a repulsive or attractive force, respectively.

Looking at the example, we can see that shoving is proportional to the amount of cell-cell overlap, incidentally this would be the force for a spring potential. We can see here how the cell radius is, a property of agent state, is provided in the input instance of SpAgent, which was setup by the properties we provided in our initialization step in the file ‘model_routine_agent.cpp’. In a similar manner, we could access any model specific variable for the agent and we could use them to produce more complex logic for the calculation of force. We also used a common formulation for adhesion force when particles are the same type. It may not be obvious, but as we mentioned the adhesion is limited to 10 um in this simulation. Although the 10 um cutoff is not explicit in this function, it is being used as the maximum interaction distance set in the ‘model_routine_config.cpp’. Remember that agent pairs more separated by more then the maximum interaction distance will not be considered for mechanical interactions.

One thing worth mentioning is that the mechanical interaction properties are probably the most affected by agent size and shape. Although Biocellion was written with spherical shapes in mind, this is certainly not required. More complex code can be used to represent agents of any shape and additional model specific variable can be defined and set elsewhere to store information on additional critical lengths as well as physical properties, like torque.

Other functions The functions described above, pretty much complete the model for this simple example. However, I should mention the last two model routine source files. The file ‘model_routine_grid.cpp’ contains the source code for functions related to the grid (see section 4.5 in the manual). The majority of the grid functions describe how the PDEs are setup and run; however, there are functions to include whatever you would like in the form of model specific grid variables. We will see this in future examples. The last file four the model specific source code is ‘model_routine_output.cpp’. Currently the output is only definable for agents (although there is some default output for PDE species, if any are setup). The location of the agent center and the radius are provided in the output by default. The variable color, is provided in the agent output by default as well, but the specific value is set by the user. Typically, we use the agent type to set this variable (in a latter tutorial we will go over visualization, which will make use of this output). For now we have no other output so v_extra is left empty.

Running the model

I am not going to go over instillation here, and I will simply refer the user to section 6.4 of the manual for a description on how to setup the Biocellion framework on a standard unix box. I will present the model configuration file for our example, and the typical strategy I use to execute models.

First, the model configuration file is an xml file with run time configuration options. In this example it is called ‘sorting.xml’. This file hold the information on how to run the code and I typically think of it in four part, domain scale, output properties, parallelization properties and PDE properties. For domain scale, we have length of simulation <time_step num_baseline="1000"/> and <domain x="32" y="32" z="32"/> which sets the spatial domain to a cube 32 * 10 um in all 3 dimensions, you are welcome to chose anything you want for these parameter. You may also want or need to change the output properties, which specify the folder the output should go to and the interval it should be recorded. Finally, you should think about the number of threads you would like this is found as the last property in the system properties, num_threads. Here we are only considering parallelization on a shared memory machine, like a standard desktop with a multicore processor. You can set this property to the number of processors you would like to devote to the simulation. And YES it is that easy it is to take full advantage of the parallel computing power of Biocellion. If you would like to use multiple nodes (non shared memory) or more complex computing setups, its not that much more difficult, but we will save that for another time.

To run a model you can move the model code to the ‘libmodel/model/’ folder in the Biocellion package. However, I find moving code back and forth annoying so I use a shell script that makes a soft link to the model code I am interested in:

BCPATH='/local/biocellion' MYPATH='/users/rtasseff/biocellion' MODEL=$1 rm $BCPATH/libmodel/model ln -s $MYPATH/$MODEL/model $BCPATH/libmodel/model

make clean -C $BCPATH/libmodel/ make -C $BCPATH/libmodel/

The variable passed to this script is the name of the folder where the model code actually resides. You can see that this script also compiles the model code. Once the code is associated to the model folder and it is compiled you are ready to run. Remember, at this point you can only change the properties in the xml file, if you change any model code you obviously need to recompile. I also use a shell script to run:

BCPATH='/local/biocellion' MYPATH='/users/rtasseff/biocellion' MODEL=$1 BEXT='DP.SPAGENT.OPT'

rm $MYPATH/$MODEL/output/*

rm $BCPATH/libmodel/model ln -s $MYPATH/$MODEL/model $BCPATH/libmodel/model

cd $BCPATH/framework/main/ ./biocellion.$BEXT $MYPATH/$MODEL/model/run_param.xml

You might notice that I have a folder called output in the same location as the model code folder. I delete the output from previous runs before new ones. This is because in the best situation it will get written over and in the worst situation I will have changed some run time properties and I might end up with some combination of outputs from multiple runs.

Conclusion

Well, that is how you build and run a simple agent based Biocellion model. The next example will show you how to visualize simple and more complex results, run PDEs, and use more advanced functions. Hopefully it will not be long until you are developing your own high performance, biological models. And, please, don’t forget to cite Biocellion where appropriate!