In this practical, we are going to perform inference for the Lotka-Volterra model of population dynamics. The model describes the dynamics of the population sizes of prey ($u(t)$) and predators ($v(t)$), where $t$ denotes time in years. These equations are given by:
\[ \begin{aligned} \frac{d u}{dt} &= \alpha u - \beta u v\\ \frac{d v}{dt} &= -\gamma v + \delta u v \end{aligned} \]
where at time $t=0$, $u(0) = u_0$ and $v(0)=v_0$. Here, $\alpha\geq 0$ is the prey population growth rate in the absence of predators; $\beta \geq 0$ is the rate of decline in the prey as a result of predation; $\gamma \geq 0$ is the rate of decline of predators in the absence of prey; and $\delta > 0$ is the rate of predator population growth due to predation.
Have you got PINTS installed on your machine? If not, clone the PINTS repository from https://github.com/pints-team/pints and install it by executing:
pip install -e .[dev,docs]
in a terminal in the PINTS main directory. (If you want to avoid clashes with any locally installed packages, you may want to use a virtual environment.)
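To check that the installation worked, try importing PINTS; if the following runs without error, you are ready to go:
python -c "import pints"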
Write a function which solves these two equations for any choice of $(\alpha, \beta, \gamma, \delta, u_0, v_0)$. Plot the solutions for $\alpha=0.55, \beta=0.028, \gamma=0.84, \delta=0.026, u_0=33, v_0=6$.
Code up the right-hand side of the equations and use a numerical solver.
import numpy as np
def lotka_volterra_rhs(y, t, alpha, beta, gamma, delta):
    u, v = y
    dydt = [alpha * u - beta * u * v, -gamma * v + delta * u * v]
    return dydt
from scipy.integrate import odeint
from plotnine import *
import pandas as pd
def lotka_volterra_solve(times, alpha, beta, gamma, delta, u_0, v_0):
    sol = odeint(lotka_volterra_rhs, [u_0, v_0], times, args=(alpha, beta, gamma, delta))
    df = pd.DataFrame({'t': times, 'u': sol[:, 0], 'v': sol[:, 1]})
    return df
times = np.linspace(0, 25, 1001)
df = lotka_volterra_solve(times, 0.55, 0.028, 0.84, 0.026, 33, 6)
df_long = pd.melt(df, id_vars='t')
(ggplot(df_long, aes(x='t', y='value', colour='variable')) +
geom_line())
An alternative way to view this is as orbits.
(ggplot(df, aes(x='u', y='v')) +
geom_path())
Run the model with $\alpha=0.79, \beta=0.04, \gamma=1.3, \delta=0.04, u_0=33, v_0=6$. How do the dynamics compare with the previous simulation?
df1 = lotka_volterra_solve(times, 0.79, 0.04, 1.3, 0.04, 33, 6)
df1['simulation'] = '2'
df['simulation'] = '1'
df_both = pd.concat([df, df1])
(ggplot(df_both, aes(x='u', y='v', colour='simulation')) +
geom_path())
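It can also be helpful to compare the two simulations as time series rather than orbits. A minimal sketch, reusing the df_both dataframe constructed above (the columns t, u, v and simulation come from the code above):
df_both_long = pd.melt(df_both, id_vars=['t', 'simulation'], value_vars=['u', 'v'])
(ggplot(df_both_long, aes(x='t', y='value', colour='simulation')) +
    geom_line() +
    facet_wrap('variable'))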
Suppose that the observations for each population are governed by the following:
\[ \begin{aligned} \tilde u &= u \exp(\epsilon_u)\\ \tilde v &= v \exp(\epsilon_v) \end{aligned} \]
where $\epsilon_u\sim\mathcal{N}(0, \sigma_u)$ and $\epsilon_v\sim\mathcal{N}(0, \sigma_v)$. Show that these are equivalent to:
\[ \begin{aligned} \tilde u &\sim \text{log-normal}(\log u, \sigma_u)\\ \tilde v &\sim \text{log-normal}(\log v, \sigma_v)\\ \end{aligned} \]
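To see the equivalence, take logs of the observation model: $\log \tilde u = \log u + \epsilon_u$ with $\epsilon_u\sim\mathcal{N}(0, \sigma_u)$, so $\log \tilde u \sim \mathcal{N}(\log u, \sigma_u)$. A variable whose logarithm is normally distributed with mean $\log u$ and standard deviation $\sigma_u$ is, by definition, log-normally distributed with those parameters; the same argument applies to $\tilde v$.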
Using these observation models, generate annual data for 25 years assuming $\sigma_u=\sigma_v=0.25$ and using the parameter sets from parts 1 and 2. Graph these observations.
Note, this may be a helpful resource: https://ben18785.shinyapps.io/distribution-zoo/
import scipy.stats

def observations(df, sigma_u, sigma_v):
    u_tilde = np.zeros(len(df))
    v_tilde = np.zeros(len(df))
    for i in range(len(u_tilde)):
        # scipy's lognorm parameterisation: s is sigma and scale is exp(mu), so scale=u gives mu=log(u)
        u_tilde[i] = scipy.stats.lognorm.rvs(scale=df['u'][i], s=sigma_u)
        v_tilde[i] = scipy.stats.lognorm.rvs(scale=df['v'][i], s=sigma_v)
    df['u_tilde'] = u_tilde
    df['v_tilde'] = v_tilde
    return df
times = np.arange(0, 25, 1)
df = lotka_volterra_solve(times, 0.55, 0.028, 0.84, 0.026, 33, 6)
df = observations(df, 0.25, 0.25)
df['simulation'] = '1'
df1 = lotka_volterra_solve(times, 0.79, 0.04, 1.3, 0.04, 33, 6)
df1 = observations(df1, 0.25, 0.25)
df1['simulation'] = '2'
df_both = pd.concat([df, df1])
(ggplot(df_both, aes(x='u_tilde', y='v_tilde', colour='simulation')) +
geom_path() +
geom_point())
The likelihood for the prey compartment is given by:
\[ L_u = \prod_{t=1}^{T} \text{log-normal}(\tilde u(t)| \log u(t|\alpha, \beta, \gamma, \delta, u_0, v_0), \sigma_u) \]
with a similar expression for the predator compartment. Here $u(t|\alpha, \beta, \gamma, \delta, u_0, v_0)$ is the solution of the Lotka-Volterra equations. Write a function which calculates the log-likelihood of a set of observations for given values of $(\alpha, \beta, \gamma, \delta, u_0, v_0, \sigma_u, \sigma_v)$.
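Since the prey and predator observations are conditionally independent given the ODE solution, the overall log-likelihood used below is simply the sum of the two compartments' contributions:
\[ \log L = \sum_{t=1}^{T} \log \text{log-normal}(\tilde u(t)| \log u(t|\cdot), \sigma_u) + \sum_{t=1}^{T} \log \text{log-normal}(\tilde v(t)| \log v(t|\cdot), \sigma_v) \]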
def lotka_volterra_u_loglikelihood(df_obs, alpha, beta, gamma, delta, u_0, v_0, sigma_u, sigma_v):
    # solve the ODEs at the observation times, then sum the log-normal
    # log-densities for both the prey and predator observations
    df = lotka_volterra_solve(df_obs['t'], alpha, beta, gamma, delta, u_0, v_0)
    log_p = 0.0
    for i in range(len(df_obs['t'])):
        log_p += scipy.stats.lognorm.logpdf(df_obs['u_tilde'][i], scale=df['u'][i], s=sigma_u)
        log_p += scipy.stats.lognorm.logpdf(df_obs['v_tilde'][i], scale=df['v'][i], s=sigma_v)
    return log_p
Using the first of the series generated in question 4, plot a 1D slice through the log-likelihood surface for $\alpha$ whilst holding the other parameters $\beta=0.028, \gamma=0.84, \delta=0.026, u_0=33, v_0=6, \sigma_u=0.25, \sigma_v=0.25$. In this slice, how close is $\alpha$ to its true value?
import matplotlib.pyplot as plt
alpha = np.linspace(0.2, 1, 100)
log_like = np.zeros(len(alpha))
for i in range(len(alpha)):
    log_like[i] = lotka_volterra_u_loglikelihood(df, alpha[i], 0.028, 0.84, 0.026, 33, 6, 0.25, 0.25)
plt.plot(alpha, log_like)
plt.show()
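To judge how close the peak of this slice is to the truth, it can help to redraw it with the value of $\alpha$ used to generate the data (0.55) marked; a minimal sketch:
plt.plot(alpha, log_like)
plt.axvline(0.55, color='grey', linestyle='--')  # alpha used to simulate the data
plt.xlabel('alpha')
plt.ylabel('log-likelihood')
plt.show()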
PINTS is a Python package designed in the Department of Computer Science that provides access to all sorts of inference routines and is especially well suited to ODE models. Following the approach given here: https://github.com/pints-team/pints/blob/master/examples/stats/custom-model.ipynb wrap a pints.LogPDF class around the log-likelihood function we just created, to allow us to access these methods. We're going to hold a number of parameters constant to make inference a little more manageable for this practical, so set up your class with $(u_0=33, v_0=6, \sigma_u=0.25, \sigma_v=0.25)$.
import pints
class LotkaVolterraLogPDF(pints.LogPDF):
    def __init__(self, df_obs, u0=33, v0=6, sigma_u=0.25, sigma_v=0.25):
        self._df_obs = df_obs
        self._u0 = u0
        self._v0 = v0
        self._sigma_u = sigma_u
        self._sigma_v = sigma_v

    def __call__(self, x):
        # x = (alpha, beta, gamma, delta); u0, v0, sigma_u and sigma_v are held fixed
        return lotka_volterra_u_loglikelihood(
            self._df_obs, x[0], x[1], x[2], x[3],
            self._u0, self._v0, self._sigma_u, self._sigma_v)

    def n_parameters(self):
        return 4
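As a quick sanity check (not required), you can instantiate the class with the observation dataframe from question 4 and evaluate it at the true parameters; the exact value does not matter, only that it returns a finite number:
model = LotkaVolterraLogPDF(df)
print(model([0.55, 0.028, 0.84, 0.026]))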
We are now going to use an inbuilt MCMC method in PINTS called HaarioBardenetACMC to generate posterior samples, using 2000 iterations per chain (with 200 initial phase iterations) across 4 chains. To initialise each of the chains, assume $(\alpha=0.6, \beta=0.02, \gamma=1.0, \delta=0.03)$, which can be done using:
nchains = 4
xs = [[0.6, 0.02, 1.0, 0.03]] * nchains
Note that initialising all the chains at a single point is not good practice, but it allows us to get up and running faster.
To run the MCMC, we first instantiate the pints.LogPDF class you created in the previous question, using the observational data generated in question 4. Then we instantiate an MCMC controller object using:
mcmc = pints.MCMCController(model, nchains, xs, method=pints.HaarioBardenetACMC)
where model is an instantiated version of your model. Then we set the total number of iterations per chain and the number of initial phase iterations using:
mcmc.set_max_iterations(2000)
mcmc.set_initial_phase_iterations(200)
Finally, we execute:
chains = mcmc.run()
to run the MCMC.
Note that, at the moment, since we have not specified priors, PINTS implicitly assumes that the priors are uniform (and, in this case, improper).
model = LotkaVolterraLogPDF(df)
nchains = 4
xs = [[0.6, 0.02, 1.0, 0.03]] * nchains
mcmc = pints.MCMCController(model, nchains, xs, method=pints.HaarioBardenetACMC)
# Add stopping criterion
mcmc.set_max_iterations(2000)
# Start adapting after 200 iterations
mcmc.set_initial_phase_iterations(200)
chains = mcmc.run()
PINTS has an in-built MCMC summary object that can be called and printed using:
results = pints.MCMCSummary(chains=chains, time=mcmc.time())
print(results)
Have your MCMC chains converged yet? If Rhat > 1.1 (or closer to 1.01 for publication-quality results) for any of your parameters, in practice you would rerun for more iterations. Since we only have a little time today, we won't rerun things.
results = pints.MCMCSummary(chains=chains, time=mcmc.time())
print(results)
Using PINTS' plotting tools (see: https://github.com/pints-team/pints/blob/master/examples/sampling/adaptive-covariance-haario-bardenet.ipynb), plot pairwise samples of the parameters. Before we do that, let's discard the first half of each chain, as these samples were likely generated before convergence occurred.
import pints.plot
# Look at distribution across all chains
pints.plot.pairwise(np.vstack(chains[:, 1000:]), kde=False)
# Show graphs
plt.show()
Why is there positive correlation between the estimates of $\alpha$ and $\beta$?
Examining the equation:
$\frac{d u}{dt} = \alpha u - \beta u v$,
we see that if $\alpha\uparrow$ then $\frac{d u}{dt}\uparrow$. So, if $\beta\uparrow$ too, then $\frac{d u}{dt}$ can remain approximately constant, and the dynamics of the system would remain unchanged. In other words, the same observed data sample can be obtained in more than one way.
Using a randomly drawn subset of 100 posterior draws, generate posterior predictive draws of the mean predator and prey numbers over time. Then plot these, overlaying the data. Why do the posterior predictive simulations not encompass all the variation seen in the data?
chains_long = np.vstack(chains[:, 1000:])
nsamples = 100
idx = np.random.randint(len(chains_long), size=nsamples)
chains_long = chains_long[idx, :]
times = np.linspace(0, 25, 1001)
dfs = []
for i in range(nsamples):
    temp = lotka_volterra_solve(times,
                                chains_long[i, 0], chains_long[i, 1],
                                chains_long[i, 2], chains_long[i, 3],
                                33, 6)
    temp['replicate'] = str(i)
    dfs.append(temp)
# DataFrame.append is deprecated, so collect the replicates in a list and concatenate once
big_df = pd.concat(dfs)
Combine datasets.
df['type'] = 'observations'
big_df['type'] = 'simulations'
big_df1 = pd.concat([df, big_df])
(ggplot(big_df1.query('type=="simulations"'), aes(x='t', y='u')) +
geom_line(aes(group='replicate'), alpha=0.2) +
geom_point(big_df1.query('type=="observations"'),
aes(y='u_tilde'), colour='orange'))
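The same can be done for the predator population, mirroring the prey plot above but swapping u for v:
(ggplot(big_df1.query('type=="simulations"'), aes(x='t', y='v')) +
    geom_line(aes(group='replicate'), alpha=0.2) +
    geom_point(big_df1.query('type=="observations"'),
               aes(y='v_tilde'), colour='orange'))
As for why the simulations do not encompass all the variation in the data: these posterior predictive draws are of the mean only (the ODE solution for each sampled parameter set), so they do not include the log-normal observation noise present in the data.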
We are now going to initialise the chains randomly, which is better practice since it is less likely that the chains will be biased towards a particular area of parameter space. To do this, we are going to create a UniformLogPrior object in PINTS, then use its sample() method to generate independent draws for each parameter across each chain, using the following code:
log_prior = pints.UniformLogPrior([0, 0, 0, 0], [2, 0.1, 2.0, 0.2])
Using this log_prior object, sample initial positions for the chains and re-run the MCMC sampling as above. Afterwards, plot the path of the chains over time and compute convergence measures. Do the chains look as close to convergence as before?
log_prior = pints.UniformLogPrior([0, 0, 0, 0],
[2, 0.1, 2.0, 0.2])
xs = log_prior.sample(nchains)
mcmc = pints.MCMCController(model, nchains, xs, method=pints.HaarioBardenetACMC)
# Add stopping criterion
mcmc.set_max_iterations(2000)
# Start adapting after 200 iterations
mcmc.set_initial_phase_iterations(200)
chains = mcmc.run()
results = pints.MCMCSummary(chains=chains, time=mcmc.time())
print(results)
import pints.plot
pints.plot.trace(chains)
plt.show()