Switch from JAGS to PyMC. Now.

In the process of trying out some models with different tools I found an extremely intuitive and fast version of JAGS. It’s called PyMC.

Here is an example that should convince you about how easy would it be to switch from one to the other.
Let’s have a look at the R code and Jags for a fairly simple model.


x = rnorm(15,25,2)
data = list(x=x, n=length(x))
hyper = list (a=3, b=8, cc=10, d=1/100)
init = list(mu=0, tau=1)

mymodel = "model {
  for (i in 1:n) {

model=jags.model(textConnection(mymodel), data=append(data,hyper), inits=init)

output = coda.samples(model=model, variable.names=c("mu", "tau"), n.iter=10000, thin=1)

And now the best: Python.

import pymc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pymc.graph 

# we should get mu=25 and tau=2
# HUGE discrepancy wrt rjags on same dataset (cannot estimate std deviation)

x = np.random.normal(25,2,15)
n = len(x)

mu = pymc.Normal("mu", 10, .001)
tau = pymc.Gamma("tau", 3, 8)
xdata = pymc.Normal("xdata", mu, tau, value=x, observed=True)

# finally, "model" our observed y values as above
#y = pymc.Normal("y", y_pred, err, value=depvar, observed=True)

testmodel = pymc.Model([mu, xdata, tau])
mcmc = pymc.MCMC(testmodel)
mcmc.sample(10000, 1000)

pred_mu  = mcmc.trace("mu")[:]
pred_tau = mcmc.trace("tau")[:]


This code will generate the plot below
pymc_mu pymc_tau


And here is the estimation of JAGS.



Plots are very similar, of course. I found PyMC definitely more readable.

What do you think?

About Piggy

I am Piggy and I spend my life reading about math and of course eating. I love science and I support my flatmate who provides me problems to solve and, well, food.
This entry was posted in General, Statistics and R and tagged , , , , , , . Bookmark the permalink.

5 Responses to Switch from JAGS to PyMC. Now.

  1. pooks says:

    Hi, do you happen to know which is faster: RJAGS or PyMC?
    I would love to see some benchmarks!

    • Piggy says:

      You could try this out https://gist.github.com/fonnesbeck/018afc7d71d6a7ef1cd4
      The issue however is not only speed (I never ran benchmarks to compare but I assume that a numpy-based library is indeed faster). The benefits come with a more homogeneous environment due to the fact that in PyMC you write your model in Python as well. You also have the intrinsic limitations of R in terms of memory management. It’s really really bad, compared to the Python virtual machine. This leads to fundamental problems whenever dealing with very large datasets. If you find a better benchmark, feel free to post and update this comments. Thanks in advance and good luck!

  2. pooks says:

    Thanks Piggy!

    Are you using PyMC3 rather than PyMC2?
    Also, do you by any chance have an example of how to implement a mixed model using PyMC?

    • Piggy says:

      PyMC3 is under development. A pity because it uses Theano and Hamiltonian Montecarlo (way faster than MCMC). So far I am using PyMC2 which is good enough too. There is plenty of documentation to learn PyMC. When I have some time I will update my github with some code (I am still organizing my library and moving it from Bitbucket to GitHub). Sorry for the delay ;)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s