In the process of trying out some models with different tools I found an extremely intuitive and fast version of JAGS. It’s called PyMC.

Here is an example that should convince you about how easy would it be to switch from one to the other.

Let’s have a look at the R code and Jags for a fairly simple model.

library(rjags)
x = rnorm(15,25,2)
data = list(x=x, n=length(x))
hyper = list (a=3, b=8, cc=10, d=1/100)
init = list(mu=0, tau=1)
mymodel = "model {
for (i in 1:n) {
x[i]~dnorm(mu,tau)
}
mu~dnorm(cc,d)
tau~dgamma(a,b)
}"
model=jags.model(textConnection(mymodel), data=append(data,hyper), inits=init)
update(model,n.iter=10000)
output = coda.samples(model=model, variable.names=c("mu", "tau"), n.iter=10000, thin=1)
print(summary(output))
plot(output)

And now the best: Python.

import pymc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pymc.graph
# we should get mu=25 and tau=2
# HUGE discrepancy wrt rjags on same dataset (cannot estimate std deviation)
x = np.random.normal(25,2,15)
n = len(x)
mu = pymc.Normal("mu", 10, .001)
tau = pymc.Gamma("tau", 3, 8)
xdata = pymc.Normal("xdata", mu, tau, value=x, observed=True)
# finally, "model" our observed y values as above
#y = pymc.Normal("y", y_pred, err, value=depvar, observed=True)
testmodel = pymc.Model([mu, xdata, tau])
mcmc = pymc.MCMC(testmodel)
mcmc.sample(10000, 1000)
pred_mu = mcmc.trace("mu")[:]
pred_tau = mcmc.trace("tau")[:]
pymc.Matplot.plot(mcmc)
plt.show()

This code will generate the plot below

And here is the estimation of JAGS.

Plots are very similar, of course. I found PyMC definitely more readable.

What do you think?

### Like this:

Like Loading...

*Related*

## About Piggy

I am Piggy and I spend my life reading about math and of course eating. I love science and I support my flatmate who provides me problems to solve and, well, food.

Hi, do you happen to know which is faster: RJAGS or PyMC?

I would love to see some benchmarks!

You could try this out https://gist.github.com/fonnesbeck/018afc7d71d6a7ef1cd4

The issue however is not only speed (I never ran benchmarks to compare but I assume that a numpy-based library is indeed faster). The benefits come with a more homogeneous environment due to the fact that in PyMC you write your model in Python as well. You also have the intrinsic limitations of R in terms of memory management. It’s really really bad, compared to the Python virtual machine. This leads to fundamental problems whenever dealing with very large datasets. If you find a better benchmark, feel free to post and update this comments. Thanks in advance and good luck!

Thanks Piggy!

Are you using PyMC3 rather than PyMC2?

Also, do you by any chance have an example of how to implement a mixed model using PyMC?

PyMC3 is under development. A pity because it uses Theano and Hamiltonian Montecarlo (way faster than MCMC). So far I am using PyMC2 which is good enough too. There is plenty of documentation to learn PyMC. When I have some time I will update my github with some code (I am still organizing my library and moving it from Bitbucket to GitHub). Sorry for the delay ;)

No worries, thanks for your help and best of luck!