causalpy.pymc_models

Defines generic PyMC ModelBuilder class and subclasses for

  • WeightedSumFitter model for Synthetic Control experiments

  • LinearRegression model

Models are intended to be used from inside an experiment class (see PyMC experiments). This is why the examples require some extra manipulation of the input data, often to ensure that y has the correct shape.

class causalpy.pymc_models.InstrumentalVariableRegression[source]

Custom PyMC model for instrumental variable linear regression

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import InstrumentalVariableRegression
>>> N = 10
>>> e1 = np.random.normal(0, 3, N)
>>> e2 = np.random.normal(0, 1, N)
>>> Z = np.random.uniform(0, 1, N)
>>> ## Ensure the endogeneity of the treatment variable
>>> X = -1 + 4 * Z + e2 + 2 * e1
>>> y = 2 + 3 * X + 3 * e1
>>> t = X.reshape(N, 1)
>>> y = y.reshape(N, 1)
>>> Z = np.asarray([[1, Z[i]] for i in range(N)])
>>> X = np.asarray([[1, X[i]] for i in range(N)])
>>> COORDS = {'instruments': ['Intercept', 'Z'], 'covariates': ['Intercept', 'X']}
>>> sample_kwargs = {
...     "tune": 5,
...     "draws": 10,
...     "chains": 2,
...     "cores": 2,
...     "target_accept": 0.95,
...     "progressbar": False,
... }
>>> iv_reg = InstrumentalVariableRegression(sample_kwargs=sample_kwargs)
>>> iv_reg.fit(X, Z, y, t, COORDS, {
...                  "mus": [[-2,4], [0.5, 3]],
...                  "sigmas": [1, 1],
...                  "eta": 2,
...                  "lkj_sd": 1,
...              }, None)
Inference data...
build_model(X, Z, y, t, coords, priors)[source]

Specify the model, with data and priors for both the treatment regression and the focal regression

Parameters:
  • X – A pandas dataframe used to predict our outcome y

  • Z – A pandas dataframe used to predict our treatment variable t

  • y – An array of values representing our focal outcome y

  • t – An array of values representing the treatment t of which we’re interested in estimating the causal impact

  • coords – A dictionary with the coordinate names for our instruments and covariates

  • priors – An optional dictionary of priors for the mus and sigmas of both regressions, e.g. priors = {"mus": [0, 0], "sigmas": [1, 1], "eta": 2, "lkj_sd": 2}

fit(X, Z, y, t, coords, priors, ppc_sampler=None)[source]

Draw samples from the posterior distribution and, optionally, from the prior and posterior predictive distributions. The fit call accepts a ppc_sampler argument with values 'jax', 'pymc', or None. The default is None, so the user can decide whether to spend time sampling the posterior predictive distribution separately.
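
For example, reusing the objects from the doctest above, posterior predictive sampling can be requested at fit time by passing ppc_sampler explicitly (a sketch; the priors dictionary is the one defined earlier):

>>> iv_reg2 = InstrumentalVariableRegression(sample_kwargs=sample_kwargs)
>>> iv_reg2.fit(X, Z, y, t, COORDS, {
...                  "mus": [[-2, 4], [0.5, 3]],
...                  "sigmas": [1, 1],
...                  "eta": 2,
...                  "lkj_sd": 1,
...              }, ppc_sampler="pymc")
Inference data...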

sample_predictive_distribution(ppc_sampler='jax')[source]

Function to sample the multivariate normal posterior predictive likelihood term in the IV class. This can be slow without the JAX sampler's compilation method. With the JAX sampler it will sample only the posterior predictive distribution; with the PyMC sampler it will sample both the prior and posterior predictive distributions.
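
For example, after a fit call with ppc_sampler=None, the predictive draws can be generated later (a sketch reusing iv_reg from the example above):

>>> iv_reg.sample_predictive_distribution(ppc_sampler="pymc")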

class causalpy.pymc_models.LinearRegression[source]

Custom PyMC model for linear regression

Defines the PyMC model

\[
\begin{aligned}
\beta &\sim \mathrm{Normal}(0, 50)\\
\sigma &\sim \mathrm{HalfNormal}(1)\\
\mu &= X \beta\\
y &\sim \mathrm{Normal}(\mu, \sigma)
\end{aligned}
\]

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import LinearRegression
>>> rd = cp.load_data("rd")
>>> X = rd[["x", "treated"]]
>>> y = np.asarray(rd["y"]).reshape((rd["y"].shape[0],1))
>>> lr = LinearRegression(sample_kwargs={"progressbar": False})
>>> lr.fit(X, y, coords={
...                 'coeffs': ['x', 'treated'],
...                 'obs_indx': np.arange(rd.shape[0])
...                },
... )
Inference data...
build_model(X, y, coords)[source]

Defines the PyMC model
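
Conceptually, the model above corresponds to something like this PyMC specification, using the data from the example (a sketch, not the library's exact internals; variable names are illustrative):

>>> import pymc as pm
>>> coords = {"coeffs": ["x", "treated"], "obs_indx": np.arange(rd.shape[0])}
>>> with pm.Model(coords=coords) as sketch:
...     X_ = pm.Data("X", X)
...     beta = pm.Normal("beta", mu=0, sigma=50, dims="coeffs")
...     sigma = pm.HalfNormal("sigma", sigma=1)
...     mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...     y_hat = pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y.flatten())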

class causalpy.pymc_models.ModelBuilder[source]

This is a wrapper around pm.Model that provides a scikit-learn-like API.

Public Methods

  • build_model: must be implemented by subclasses

  • fit: populates idata attribute

  • predict: returns predictions on new data

  • score: returns Bayesian \(R^2\)

Example

>>> import causalpy as cp
>>> import numpy as np
>>> import pymc as pm
>>> from causalpy.pymc_models import ModelBuilder
>>> class MyToyModel(ModelBuilder):
...     def build_model(self, X, y, coords):
...         with self:
...             X_ = pm.Data(name="X", value=X)
...             y_ = pm.Data(name="y", value=y)
...             beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
...             sigma = pm.HalfNormal("sigma", sigma=1)
...             mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...             pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
>>> rng = np.random.default_rng(seed=42)
>>> X = rng.normal(loc=0, scale=1, size=(20, 2))
>>> y = rng.normal(loc=0, scale=1, size=(20,))
>>> model = MyToyModel(
...             sample_kwargs={
...                 "chains": 2,
...                 "draws": 2000,
...                 "progressbar": False,
...                 "random_seed": rng,
...             }
... )
>>> model.fit(X, y)
Inference data...
>>> X_new = rng.normal(loc=0, scale=1, size=(20,2))
>>> model.predict(X_new)
Inference data...
>>> model.score(X, y)
r2        0.390344
r2_std    0.081135
dtype: float64
__init__(sample_kwargs=None)[source]
Parameters:

sample_kwargs (Optional[Dict[str, Any]]) – A dictionary of kwargs that get unpacked and passed to the pymc.sample() function. Defaults to an empty dictionary.

build_model(X, y, coords)[source]

Build the model, must be implemented by subclass.

Return type:

None

fit(X, y, coords=None)[source]

Draw samples from posterior, prior predictive, and posterior predictive distributions, placing them in the model’s idata attribute.

Return type:

None

Parameters:

coords (Dict[str, Any] | None)

predict(X)[source]

Predict the outcome given new input data X

Caution

Results in KeyError if model hasn’t been fit.

score(X, y)[source]

Score the Bayesian \(R^2\) given inputs X and outputs y.

Caution

The Bayesian \(R^2\) is not the same as the traditional coefficient of determination (see https://en.wikipedia.org/wiki/Coefficient_of_determination).

Return type:

Series
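
A Bayesian \(R^2\) of this kind can be computed from posterior draws of mu with ArviZ (a sketch, not necessarily the exact implementation, reusing the fitted MyToyModel instance and y from the example above):

>>> import arviz as az
>>> mu_draws = model.idata.posterior["mu"].stack(sample=("chain", "draw"))
>>> r2 = az.r2_score(y, mu_draws.values.T)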

class causalpy.pymc_models.PropensityScore[source]

Custom PyMC model for estimating propensity scores, as used in inverse propensity score weighting

Defines the PyMC model

\[
\begin{aligned}
\beta &\sim \mathrm{Normal}(0, 1)\\
\mu &= X \beta\\
p &= \mathrm{logit}^{-1}(\mu)\\
t &\sim \mathrm{Bernoulli}(p)
\end{aligned}
\]

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import PropensityScore
>>> df = cp.load_data('nhefs')
>>> X = df[["age", "race"]]
>>> t = np.asarray(df["trt"])
>>> ps = PropensityScore(sample_kwargs={"progressbar": False})
>>> ps.fit(X, t, coords={
...                 'coeffs': ['age', 'race'],
...                 'obs_indx': np.arange(df.shape[0])
...                },
... )
Inference...
build_model(X, t, coords)[source]

Defines the PyMC propensity model
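
A minimal PyMC sketch of the propensity model above, using the data from the example (illustrative, not the library's exact internals):

>>> import pymc as pm
>>> coords = {"coeffs": ["age", "race"], "obs_indx": np.arange(df.shape[0])}
>>> with pm.Model(coords=coords) as sketch:
...     X_ = pm.Data("X", X)
...     beta = pm.Normal("beta", mu=0, sigma=1, dims="coeffs")
...     p = pm.Deterministic("p", pm.math.invlogit(pm.math.dot(X_, beta)))
...     t_obs = pm.Bernoulli("t_obs", p=p, observed=t)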

fit(X, t, coords)[source]

Draw samples from posterior, prior predictive, and posterior predictive distributions. We override the base method because it assumes an outcome variable y, whereas here we use t to denote the treatment variable.

class causalpy.pymc_models.WeightedSumFitter[source]

Used for synthetic control experiments

Note

Generally, the .fit() method should be used rather than calling .build_model() directly.

Defines the PyMC model:

\[
\begin{aligned}
\sigma &\sim \mathrm{HalfNormal}(1)\\
\beta &\sim \mathrm{Dirichlet}(1, \dots, 1)\\
\mu &= X \beta\\
y &\sim \mathrm{Normal}(\mu, \sigma)
\end{aligned}
\]

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import WeightedSumFitter
>>> sc = cp.load_data("sc")
>>> X = sc[['a', 'b', 'c', 'd', 'e', 'f', 'g']]
>>> y = np.asarray(sc['actual']).reshape((sc.shape[0], 1))
>>> wsf = WeightedSumFitter(sample_kwargs={"progressbar": False})
>>> wsf.fit(X, y)
Inference data...
build_model(X, y, coords)[source]

Defines the PyMC model
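
A minimal PyMC sketch of the weighted-sum model above, using the data from the example (illustrative, not the library's exact internals):

>>> import pymc as pm
>>> coords = {"coeffs": list(X.columns), "obs_indx": np.arange(sc.shape[0])}
>>> with pm.Model(coords=coords) as sketch:
...     X_ = pm.Data("X", X)
...     beta = pm.Dirichlet("beta", a=np.ones(X.shape[1]), dims="coeffs")
...     sigma = pm.HalfNormal("sigma", sigma=1)
...     mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...     y_hat = pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y.flatten())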