Genetic Simulation Library

Reference Manual

John Conery
University of Oregon

Copyright © 1997, University of Oregon


Contents

Generation Classes

Genome Classes

Individual Classes

Population Classes

Random Number Generator Classes

Statistics Classes


Generation Classes

A generation is a collection of individuals. In the current version, operations on generations are simply wrappers for calls to the same operations as implemented in the ISet class.
#include "generation.h"
The header file contains the definitions of the Generation and GenerationParamBlock classes.

Generation

class Generation {
public:
  Generation();
The constructor creates a new generation containing no individuals.

  ~Generation();
Make sure the destructor is called to deallocate variables used to maintain the internal set management data structures.

  void insert(Individual *x);
Add individual x to this generation.

  Individual *remove(int i);
Remove individual number i from this generation (see ISet::remove() for the definition of a set index).

  Individual *select();
Remove a random individual from the generation.

  Individual *operator [](int i);
Return a reference to individual i in this generation.

  int size();
Return the number of individuals in the generation.

  static void reset_class();
Reinitialize the class. Currently does nothing.

  static void set_parameters(GenerationParamBlock *pb);
Currently does nothing.

private:
  ...
}

GenerationParamBlock

Since there are no operating parameters for the Generation class, the current version of the parameter block is empty.
class GenerationParamBlock {
};


Genome Classes

A genome object holds all the information related to the genetic fitness of an individual. The class named Genome is an abstract class that defines the basic operations that must be implemented for any genome object. An actual genotype object will be an instance of one of the three derived classes. See the library documentation for descriptions of the three derived classes.

#include "genome.h"
This header file contains the definition of the base class, the inline fitness functions, and the overloaded << operator.

#include "infgenome.h"
#include "sparsegenome.h"
#include "virtualgenome.h"
Use one of these three header files in your application, depending on which representation(s) you will use. The files automatically include the base class header file.

#include "strand.h"
Definition of the Strand class, which is a building block for the Sparse Genome representation. You will not need it unless you are defining your own genome representation using strands.

Genome

The string "=0" at the end of each member function defined in this class signify that the class is an abstract class. In other words, there will never be an object of type Genome. Any objects that represent genes will be defined by a class that is derived from Genome, and every such derived class must implement all of these member functions. The steps carried out by the derived class functions will all do the same thing. The only difference is in how they are implemented: each derived class uses different data structures to represent a genotype, and the functions defined for each class must update the corresponding type of data structure.

Since all derived classes implement the same member functions, the operations of the member functions are described here.

class Genome {
protected:
   static Genome *exemplar;
   virtual Genome *new_genome() = 0;
User code cannot access these private functions, but they are shown here to explain how the "virtual constructors" in the derived classes work -- see the description of make_genome and set_parameters below.
public:
   friend Genome *make_genome();
The "virtual constructor" function. Returns an object of type Genome* by calling exemplar->new_genome(). The object pointed to by exemplar is an object of the derived type last used in a call to set_parameters.

   virtual ~Genome() {};
Each derived class must implement a destuctor that will clean up all the data structures associated with a Genome object.

   virtual fitness_t combine(Genome *p1, Genome *p2) =0;
Set the genes in this object to a combination of the genes in p1 and p2. Note that combine() does not construct a new genome object; the genotype must exist before combine() can be called to fill it with a combination of genes from p1 and p2. This function calculates and returns the fitness of the resulting genotype.

   virtual fitness_t set_locus(int i, mutation_t s[]) =0;
The second argument of this procedure is a vector of two mutation values. The procedure sets the mutation values at locus i of the genotype to these two mutation values. The return value is the new fitness of the genotype.

   virtual fitness_t add_mutations(int n) =0;
Add n new mutations to random loci in this genotype, and return the new relative fitness after the mutations have been added. The location of the mutations within the set of genes and the effect of the mutations are defined by each implementation of the genome class.

   virtual fitness_t fitness() =0;
Returns the fitness of this genotype as last caculated in a call to combine() or add__mutations(). Note that the fitness of a genotype is initialized to 1.0, and this value will be returned if combine() or add_mutations() has not been called yet.

   virtual void print(ostream &s) =0;
Print a string with a visible representation of this genome object on the stream s.

   virtual Genome* operator =(Genome *x) =0;
A "deep copy" operator that makes this genotype a copy of g. Note that since Genome is an abstract class, every variable used to access objects of this class must be a pointer to a Genome and not a Genome itself. If g0 and g1 are both defined to be Genome*, the assignment g0 = g1 simply copies the pointer, not the Genome object. To make g0 a copy of g1, invoke the copy operator by the assignment g0 = *g1.

NOTE: the deep copy has not been implemented yet for all derived types.

};
inline ostream &operator <<(ostream &s, Genome *g);
The base class defines this overloading of the << operator, so it is possible to print a textual representation of any Genome object on an output stream. The operator calls the print member function defined for the derived class.

GenomeParamBlock

The parameter block defined in the base class has fields that are common to all genome classes. The derived classes may also define additional parameters by defining their own parameter blocks that are derived from a GenomeParamBlock.
class GenomeParamBlock {
public:
  mutation_t s;		/* s = mean mutation effect */
  double u;		/* mu = gametic mutation rate */
};

InfiniteGenome

An "infinite genome" allows genes to grow arbitrarily long. When a locus no longer differentiates any individual, it can be reused, i.e. it can be reset to 0.0 in all individuals and used as a location for a new mutation. An internal garbage collection routine, invisible to user code, performs the garbage collection periodically.

Refer to the documentation of the base class for a description of each of member function.

class InfiniteGenome : public Genome {
public:
   InfiniteGenome();
   ~InfiniteGenome();
   fitness_t combine(Genome *p1, Genome *p2);
   fitness_t set_locus(int i, mutation_t s[]);
   fitness_t add_mutations(int n);
   fitness_t fitness();
   void print(ostream &s);
   Genome* operator =(Genome *x);
   static void reset_class();
   static void set_parameters(InfGenomeParamBlock *p);
   static void init_mutations(ISet &s);
   static void class_status(ostream &s);
protected:
  ...
};

InfGenomeParamBlock

class InfGenomeParamBlock : public GenomeParamBlock {
public:
  double d;		/* dominance factor */
};

SparseGenome

A "sparse genome" is basically a sparse vector. The genome length is speficied in advance, and cannot change during the simulation. The genome length can be quite large, but the representation is efficient because only loci that contain mutations are actually stored in memory. Thus the amount of space occupied by a genotype is proportional to the number of mutations. The data structure used to hold the sparse vector is known as a
strand. A genotype will consist of one or more strands, and the combine function will combine the strands independently, implementing a chromosome structure for the genes.
class SparseGenome : public Genome {
public:
   SparseGenome();
   ~SparseGenome();
   fitness_t combine(Genome *p1, Genome *p2);
   fitness_t set_locus(int i, mutation_t s[]);
   fitness_t add_mutations(int n);
   fitness_t fitness();
   void print(ostream &s);
   Genome* operator =(Genome *x);
   static void reset_class();
   static void set_parameters(SparseGenomeParamBlock *p);
   static void class_status(ostream &s);
   static void init_mutations(ISet &is);
protected:
  ...
};

SparseGenomeParamBlock

The additional parameters needed for a sparse genome define the variation in mutation effects (the mean mutation effect is specified in the base class parameter s) and parameters that define the overall genome length, the number of chromosomes, and the probability of a cross-over during recombination.
class SparseGenomeParamBlock : public GenomeParamBlock {
public:
  mutation_t sds;	/* std deviation of s (s = mutation effect) */
  int gl;		/* G = number of loci in genome */
  int nchromosomes;	/* N = number of haploid chromosomes */
  double maplength;	/* M = genetic map length (unit = Morgans) */
};

Strand

A strand is used to hold the mutations in a single chromosome. Conceptually a strand is simply a linked list that chains together loci. The current implementation uses segmented vectors to provide better locality of reference and faster access.

The building block for a strand is defined by a local class named Locus. A locus object will have two mutation values.

class Strand {
friend class StrandIter;
public:
   Strand();
Allocates a new, empty, strand.

   ~Strand();
Deallocate the strand.

   static void set_chromosome_length(int n);
Set the number of loci that can be held in this strand (does not have to be the same for all strands in a genotype).

   fitness_t set_gene(int x, int y, mutation_t s);
Insert a mutation with value s at locus number x, strand y (y is either 0 or 1).

   mutation_t get_gene(int x, int y);
Return the value of the mutation at locus x, strand y.

   fitness_t get_fitness(int x);
Return the relative fitness of locus x. The relative fitness is determined by the two mutation values, and is computed using the fitness function defined in the Genome class.

   fitness_t strand_fitness();
Return the relative fitness of the entire strand, defined to be the product of the locus fitness of each locus in the strand.

   Strand* unravel(int n, int y);
Create a new strand by selecting random genes from this object. Start at a random strand (0 or 1) and copy until a cross-over point or the end of the strand. n is the number of cross-overs to perform; the cross-overs will occur at uniformly random locations distributed throughout the strand. y is the "target" strand (see the description of merge(), below).

   fitness_t merge(Strand *s1, Strand *s2);
Combine two input strands and form a new double strand. The two input strands should be created by calls to unravel() performed on two parent genotypes, one with a "target" of 1 and the other with a "target" of 0.

   void print(ostream &sout, int n);
Print a representation of the strand on the output stream sout. Stop printing after n loci.
private:
  ...
};

StrandIter

A StrandIter object is an "iterator" object for strands. Use it to traverse a strand in order, stepping from the first non-zero locus to the last. The iterator knows how the strand is put together, so it can move efficiently from one locus to the next and skip over non-zero loci.

Here is an example of a piece of code that uses a strand iterator to compute the product of the fitness values of all mutated loci in a strand sp:

   for (StrandIter si(sp); si < max; ++si)
     w *= sp.get_fitness(si);
The name of the iterator is si. Note that the argument of the iterator constructor is a reference to the strand object that the iterator will traverse. When the iterator is created, its value is the index of the first non-zero locus in the strand. The operation ++si will set si to the index of the next non-zero locus in the strand. max is the length of the strand, i.e. a value higher than any locus index.
class StrandIter {
public:
   StrandIter(Strand &sp);
Create a new iterator to traverse the strand object sp.

   int operator++();
Advance the iterator to the next non-zero locus.

   operator const int() const;
Return the index of the current locus.

private:
  ...
};

VirtualGenome

A "virtual" genome is simply a single fitness value that represents the total overall fitness of the genotype. Individual loci are not represented explicitly.

The "combine" operation, which in the usual genome classes creates a child genotype by combining genes from two parents, in this case simply draws a random fitness. The distribution of fitness values is specified by a random number generator passed as a parameter to the set_parameters() function.

class VirtualGenome : public Genome {
public:
   VirtualGenome();
   ~VirtualGenome();
   fitness_t combine(Genome *p1, Genome *p2);
   fitness_t set_locus(int i, mutation_t s[]);
   fitness_t add_mutations(int n);
   fitness_t fitness();
   void print(ostream &s);
   Genome* operator =(Genome *x);
   static void reset_class();
   static void set_parameters(VirtualGenomeParamBlock *p);
   static void init_mutations(ISet &s);
   static void class_status(ostream &s);
protected:
  ...
};

VirtualGenomeParamBlock

The only operating parameter for the virtual genome class is the random number generator object to use for drawing the fitness of new genotypes.
class VirtualGenomeParamBlock : public GenomeParamBlock {
public:
  RNG *R;
};

Fitness Functions

inline fitness_t hs(mutation_t s);
This function computes the reduction in fitness caused by a heterozygous mutation with effect s.

inline fitness_t locus_fitness(mutation_t s0, mutation_t s1);
This function computes the relative fitness of a locus that has mutations with effects s0 and s1.


Individual Classes

An individual consists of a genotype, a sex, and an ID. Although this simple representation is unlikely to suffice for all but the simplest simulations, this class can be used as a base class for deriving more complex individuals.
#include "individual.h"
The include file defines the Individual, IndividualParamBlock, and ISet classes, an overloaded stream operator for printing a representation of the individual, and an enumerated type for sexes.

enum sex_t {NONE, FEMALE, MALE};
The enumerated type. Use NONE to indicate uninitialized values.

inline ostream &operator <<(ostream &sout, Individual &x);
The stream output operator.

Individual

class Individual {
public:
   Individual();
The constructor initializes the genes by calling make_genome(), sets the sex to NONE, and assigns a unique ID. The ID is simply determined by incrementing a hidden class variable used as a counter.

   ~Individual();
Destructor.

   void print(ostream &s);
Print a representation of the individual on stream s; invoked by the << operator.

   void set_sex(sex_t sex);
   sex_t get_sex();
Member functions to assign the sex of the individual and return the current sex.

   int get_id();
Return the individual's ID.

   Genome *const genes;
A pointer to the genes (see the library documentation of Individual for a discussion of why this member is public and what it means for it to be defined as a const).

   void copy_genes(GBase *gb, int n);
Initialize the genes of this individual by making a copy of genotype n in the genotype database gb.

   static void reset_class();
   static void set_parameters(IndividualParamBlock *pb);
Since the class has no operating parameters these two procedures have no effect.
protected:
  ...
};

IndividualParamBlock

Since there are no operating parameters for the Individual class, the current version of the parameter block is empty.
class IndividualParamBlock {
};

ISet

An ISet is a container object for the Set class. It implements a simple ordered set, allowing set elements to be referenced by index. If the set currently holds n items, the items are indexed from 0 to n-1.
class ISet {
public:
   ISet();
Create a new, empty, set of individuals.

   ~ISet();
Deallocate the set. IMPORTANT NOTE: deallocating a set does NOT deallocate the individuals in the set.

   void insert(Individual *x);
Add individual x to the set. Individuals are stored in the order in which they are inserted.

   Individual *remove(int i);
Remove individual number i from the set. Note that this operation will change the index of all remaining individuals.

   Individual *select();
Remove a random individual from the set.

   Individual *operator [](int i);
The array index operator can be used to refer to any individual according to its current position in the set.

   int size();
Return the number of individuals currently in the set.

private:
  ...
};


Population Classes

Populations are collections of generations. The simple population defined in the library has just two generations -- a "current" generation of reproducing adults and a "new" generation of offspring produced by the current generation. As is the case with Individual and Generation, the Population class is too simple to be used in any but the simplest simulations, but it can be used as a base class for more complicated populations.
#include "population.h"
The header file contains the definitions of the Population and PopulationParamBlock classes. It also defines a class named ResultBlock, which is a simple record structure used to hold the results of a simulation.

Population

class Population {
public:
   Population(PopulationParamBlock *pb = NULL);
Create a new Population object, initializing it with values from the parameter block. Note that the use of a parameter block is slightly different in this class: it is used to create a single population object, whereas the parameter blocks for the other classes are used to initialize the class, not any particular object.

   ~Population();
Destructor; make sure it is called to deallocate the generations and any other structures used in a simulation.

   ResultBlock *run(int n);
Run the simulation for up to n generations. Returns either when the population is extinct or after the nth generation has been created. Results of the run are returned in a ResultBlock structure.

   int step();
Do one step of the simulation, typically one generation cycle. The return value is the number of survivors in the new generation.

   friend ostream &operator <<(ostream &s, Population &p);
Print the current status of the population on the output stream s.

protected:
  ...
};
The run() and step() procedures can be used interchangeably. After they return, the population is left in a state where the simulation can be resumed by a subsequent call. For example, an application might define a derived type that has member functions for visualizing the state of the population. A program might call step() two times so the user can view two generations, then call run(100) to advance the simulation to year 100; in this case a call to run(100) simulates 98 more years, and is equivalent to calling step() 98 more times, assuming the population is not extinct before year 100.

The application should not call run() or step() after the population is extinct, since the behavior of these functions is undefined in these situations. Extinction is signified by a population size of zero, either the return value from a call to step() or the population size indicated in the ResultBlock returned by run().

ResultBlock

class ResultBlock {
public:
  int ngen;		/* number of generations simulated */
  int nsur;		/* current size of the population */
};

PopulationParamBlock

class PopulationParamBlock {
public:
  double kmax;		/* K = mean carrying capacity */
  double sdk;		/*     std deviation of kmax */
  double rmax;		/* R = mean reproductive rate */
  double sdr;		/*     std deviation of rmax */
  double u;		/* mu = gametic mutation rate */
};


Random Number Generator Classes

The GSL library includes several random number generators. There are two ways to draw a sample from a random distribution: call a stand-alone function for that distribution, or create a random number generator object and use an operator to advance the object to the next number in its sequence.
#include "rng.h"
The header file contains the definition of a base class, RNG, that describes the basic behavior of random number generator objects; seven derived classes, each defining a different random distribution; seven stand-alone random number functions; and a class for defining seeds for random number generators.

RNGSeed

The random number generators are built on top of the rand48 collection of random number functions that are part of most Unix system libraries. Seeds for these functions are 48 bits long. The RNGSeed class gives users a way to create a seed from an integer and to examine the state of a seed.

The last parameter to each of the stand-alone functions and each of the random number generator constructors is an optional pointer to a seed. If the parameter is missing, or if it is NULL, a global seed, automatically initialized from the system clock, is used. The same seed is shared by all stand-alone functions and all random number generator objects. Users can create their own seeds and pass pointers to them if they want to use their seeds instead of the predefined seed.

class RNGSeed {
public:
   RNGSeed();
Create a new seed, and initialize it from the system clock.

   RNGSeed(unsigned int x0);
Create a new seed, using the 32 bits from x0 and zeros for the remaining 16 bits.

   ~RNGSeed();
Deallocate the seed.

   unsigned short *x;
Return a pointer to the seed. Note: this function will be replaced in future versions, so it should not be called from user code.
};

RNG Base Class

The operations defined for the base class are implemented in every random number generator object created by a derived class.
class RNG {
public:
   virtual double operator ++() =0;
Return the next random deviate in this distribution.

   operator const double();
Return the most recently generated random deviate from this distribution.

   RNGSeed *get_seed();
Return a pointer to the seed object currently being used by this distribution.

   RNGSeed *save_seed();
Return a copy of the seed used by this object (i.e. future calls to the RNG object will not affect the copy).

   void set_seed(RNGSeed *px);
Replace current seed with the seed pointed to by px. Note: the current seed might be shared by several objects, so it is not deleted by this procedure; users must know when the old seed is not used any more and deallocate it when it is not used by any object.

   void restore_seed(RNGSeed *ps);
Copy the seed pointed to be ps to the seed used by this object (i.e. the opposite of save_seed).

   void inc_seed(int n);
Add n to the seed.

};

Binomial Distribution

A value from a binomial distribution will be an integer valued floating point number between 0 and n corresponding to the number of successes in n trials, where each trial has probability p.
double rbinomial(double p, int n, RNGSeed *s = NULL);

class BinomialRNG : public RNG {
public:
  BinomialRNG(double p, int n, RNGSeed *s = NULL);
  double operator ++();
private:
  ...
};

Distributions Based on Cumulative Density Functions

A value from a CDF distribution will have a probability that is defined by a probability density function supplied by the user. The PDF is contained in a file that is read by the class constructor and used to create an internal cumulative density function, which is in turn used whenever the user requests a new value. See "
Arbitrary Distributions" in the GSL User Manual for information on the file format and the types of distributions supported.
typedef enum {CDF_OK, CDF_OPEN_ERR, CDF_FORMAT_ERR} CDFStatus;

class CDFRNG : public RNG {
public:
   CDFRNG(char *CDFFileName)
The argument to the constructor is the name of the data file that contains the input probability density function.

   ~CDFRNG()
Be sure to call the destructor so it deallocates the internal CDF and other state variables.

   int bad()
Returns true if there were any errors encountered in constructing the CDF, e.g. if the input file could not be found or if there was a format error.

   double min()
Returns the smallest X value, i.e. the smallest deviate that will be generated.

   double max()
Returns the largest X value, the largest deviate that will be generated.

   double operator ++();
Return a new random deviate from the distribution.

private:
  ...
};

Exponential Distribution

A value from an exponential distribution will be a positive real value with exponentially decreasing probability of higher values.
double rexponential(RNGSeed *s = NULL);

class ExponentialRNG : public RNG {
public:
  ExponentialRNG(RNGSeed *s = NULL);
  double operator ++();
};

Gamma Distribution

Implementation of a Gamma distribution with mean a and standard deviation b. Note: when a == b the distribution is an exponential distribution; when a < b the distribution is more L-shaped, and when a > b the distribution is similar to a log-normal distribution.
double rgamma(double a, double b, RNGSeed *s = NULL);

class GammaRNG : public RNG {
public:
  GammaRNG(double a, double b, RNGSeed *s = NULL);
  double operator ++();
private:
  ...
};

LogNormal Distribution

There are two ways to specify a log-normal distribution. The first ("lognormal") is used when the mean and standard deviation are specified in the normal scale, and the second ("lognormallog") is used when the mean and standard deviation are specified on the log scale.
double rlognormal(double mean, double stddev, RNGSeed *s = NULL);
double rlognormallog(double mean, double stddev, RNGSeed *s = NULL);

class LogNormalRNG : public RNG {
public:
  LogNormalRNG(double mean, double stddev, RNGSeed *s = NULL);
  ~LogNormalRNG();
  double operator ++();
private:
  ...
};

class LogNormalLogRNG : public RNG {
public:
  LogNormalLogRNG(double mean, double stddev, RNGSeed *s = NULL);
  ~LogNormalLogRNG();
  double operator ++();
private:
  ...
};

Normal Distribution

A value from this distribution will be normally distributed with mean mean and standard deviation stddev.
double rnormal(double mean, double stddev, RNGSeed *s = NULL);

class NormalRNG : public RNG {
public:
  NormalRNG(double mean, double stddev, RNGSeed *s = NULL);
  double operator ++();
private:
  ...
};

Poisson Distribution

A value from a Possion distribution is an integer-valued floating point number with expected value lambda and exponentially decreasing probability of higher values.
double rpoisson(double lambda, RNGSeed *s = NULL);

class PoissonRNG : public RNG {
public:
  PoissonRNG(double lambda, RNGSeed *s = NULL);
  double operator ++();
private:
  ...
};

Uniform Distribution

This distribution consists of real numbers evenly distributed between lower and upper.
double runiform(double lower, double upper, RNGSeed *s = NULL);

class UniformRNG : public RNG {
public:
  UniformRNG(double lower, double upper, RNGSeed *s = NULL);
  double operator ++();
private:
  ...
};

Random Words

unsigned int rword(RNGSeed *s = NULL);
Return a random integer between 0 and 232-1.

unsigned int rword(int n, RNGSeed *s = NULL);
Return a random integer between 0 and n-1.


Statistics Classes

The Statistics class is used to compute descriptive statistics for a set of data points.
#include "statistics.h"
The header file contains the definition of a class that computes statistics and a result block used to return the values of the statistics to the user.

Statistics

A Statistics object works like the stat functions in a hand-held calculator. To compute statistics for a set of data points, create a Statistics object, and then record the values one at a time in the object. After the last value has been recorded, call get_results(); the mean, standard deviation, etc. will be returned in a StatsBlock object.
class Statistics {
public:
   Statistics();
Create a new, initially empty, Statistics object.

   ~Statistics();
Destructor.

   void reset();
Clear the set of values. The constructor automatically initializes the set, so it is not necessary to reset a new object.

   void save(double x);
Add the value x to the set.

   StatsBlock *get_results();
Compute the summary statistics on the values currently in the set and return them in a StatsBlock object.

protected:
  ...
};

StatsBlock

A StatsBlock is simply a record structure with fields for each statistic computed by a Statistics object.
class StatsBlock {
public:
  int n;		/* number of observations */
  double mean;		/* mean */
  double sd;		/* standard deviation */
  double cv;		/* coefficient of variation */
  int min;		/* minimum value */
  int max;		/* maximum value */
};


Copyright © 1997 by the University of Oregon.
ALL RIGHTS RESERVED.

Permission to use, copy, and distribute this software in its entirety for non-commercial purposes and without fee, is hereby granted, provided that the above copyright notice and this permission notice appear in all copies and their documentation.

Software developers, consultants, or anyone else who wishes to use all or part of the software or its documentation for commercial purposes should contact the Technology Transfer Office at the University of Oregon to arrange a commercial license agreement.

This software is provided "as is" without expressed or implied warranty of any kind.


Last update: 20 Nov 97 13:32:39