A few months ago, Colin Carroll posted *A Tour of Probabilistic Programming Language APIs*, where he compared the APIs of a variety of probabilistic programming languages (PPLs) using this model:

\[ \begin{aligned} \mathbf { w } & \sim \mathcal { N } \left( \mathbf { 0 } , I _ { 5 } \right) \\ \mathbf { y } \mid X , \mathbf { w } & \sim \mathcal { N } \left( X \mathbf { w } , 0.1 I _ { 100 } \right) \end{aligned} \]

Soss.jl has really been picking up steam lately with a big boost from Taine Zhao's generalized generated functions for staged compilation. And thanks to Colin's work, we have a handy Rosetta Stone to other PPLs. So let's implement this and explore a bit.

# Defining the Model

Here we go!

```julia
cc = @model X begin
    w ~ Normal(0,1) |> iid(5)
    Xw = X * w
    y ~ For(1:100) do j
        Normal(Xw[j], 0.1)
    end
end;
```

If you've done some probabilistic programming, this may look a little strange. In most PPLs, you'd see the observed data passed in as an argument. But in Soss, **the model has no reference to the data!**

There are a few problems with the more common approach:

- It conflates data with hyperparameters,
- It makes dependencies much less clear, and
- It forces an early commitment to which variables are observed, greatly reducing flexibility.

In Soss, a model is written more like a standard function. From the model definition above, replacing each `v ~ dist` with `v = rand(dist)` already gets us very close to standard Julia code that can be executed.
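As a rough sketch, here's what that hand-translated version might look like in plain Julia. To keep it dependency-free, I'm using Base's `randn` instead of Distributions.jl, so a `Normal(μ, σ)` draw is written as `μ + σ*randn()`; the name `cc_forward` is just for illustration:

```julia
# Hand-written analogue of the cc model: each `v ~ dist` becomes a draw.
function cc_forward(X)
    w  = randn(5)                        # w ~ Normal(0,1) |> iid(5)
    Xw = X * w                           # Xw = X * w
    y  = Xw .+ 0.1 .* randn(length(Xw))  # y[j] ~ Normal(Xw[j], 0.1)
    (w = w, Xw = Xw, y = y)
end
```

Like the model, it returns a named tuple with a slot for each variable.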

Before we get too far ahead of ourselves, let's step through the `cc` model in some detail.

First, just like a function, we have any arguments, in this case the design matrix `X`. We could have chosen to put this in the body as well, and not require any arguments.

Next we have the body of the model, which is made up of a set of `Statement`s. The `cc` model has three statements.

- For `w`, the "`~`" indicates it's a random variable, with its distribution following. `iid` is a statistical term meaning *independent and identically distributed*, so `w` will be a `Vector`, with each component drawn from a `Normal(0,1)` distribution.
- Next, `Xw` is declared in an `Assign` statement. There's nothing too unusual here; it's just like an `=` in most languages.
- Finally, we have another `Sample` statement for `y`, in the form of a `For` block. This is similar to `iid`, but allows the distribution to vary as a function of the index.

We could optionally have a `return` statement. Because we don't, the model will return a named tuple, with a slot for each `Sample` or `Assign` statement.
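To make the distinction between `iid` and `For` concrete, here's a plain-Julia analogue of the two kinds of sampling statements (these helpers are illustrative, not the Soss API):

```julia
# `iid`-style: the same sampler is used at every index.
sample_iid(sampler, n) = [sampler() for _ in 1:n]

# `For`-style: the sampler can depend on the index.
sample_for(sampler, idx) = [sampler(j) for j in idx]

# e.g., with Xw already computed:
#   w = sample_iid(() -> randn(), 5)                 # like Normal(0,1) |> iid(5)
#   y = sample_for(j -> Xw[j] + 0.1randn(), 1:100)   # like the For block
```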

# Applying the Model

Above I said models are function-like. So what do we get when we fill in the arguments?

```julia
julia> X0 = randn(100,5);

julia> cc(X=X0)
Joint Distribution
Bound arguments: [X]
Variables: [w, Xw, y]

@model X begin
    w ~ Normal(0, 1) |> iid(5)
    Xw = X * w
    y ~ For(1:100) do j
        Normal(Xw[j], 0.1)
    end
end
```

"Distribution"? Yep. `cc(X=X0)` is a distribution just like `Normal(0,1)` is. And just as we would expect with a distribution, we can sample from it:

```julia
julia> obs = rand(cc(X=X0))
(X = [-0.8272484544396099 -0.03646389219747102 … 0.9375627860961913 -0.804611104845261; -0.6202176685874693 0.7053793384222113 … 0.014361484078182381 -0.3737011074670588; … ; -2.8061148816772663 0.9815626536629464 … -0.8081310261204893 -0.5775062378122593; -0.006294543353902432 -0.26016341761264344 … -2.132236712504989 0.0128466239738442], Xw = [-2.221565537313942, -2.1520058746385304, -2.1293094342311103, 2.941188972024378, 0.9375034196035377, 4.714386942469174, 6.287025102610045, 1.7702590639999833, 7.6596956428829674, 0.35806372731922104 … 1.5771365203011163, 1.2979668403687703, 2.19230613396396, -1.1871741199543733, 4.175965213223754, 4.014878023176152, -0.5222101908555612, 5.00462751766363, -2.4025174019374136, 5.6428099448628455], w = [0.7298700914411173, -2.6773826574609143, -0.03040032089651461, -2.324302227817693, -0.6091581898700535], y = [-2.077911721314273, -2.161650284487506, -2.0906144035641847, 2.9433241681835, 0.952057361034113, 4.694415215318885, 6.392498728047622, 1.6900865159917597, 7.625515407370478, 0.2417023002430837 … 1.6578734475509114, 1.2888603774507226, 2.212452426932563, -1.156288447328179, 4.108958690520445, 4.014541785531511, -0.5235177431156618, 4.9852474365678425, -2.4759738971855128, 5.70482453218249])
```

# Conditioning on Observed Data

Now that we have our fake data, we need to sample from the posterior distribution. For continuous parameters, the general best practice is to start with *Hamiltonian Monte Carlo*. In Julia, there are two main implementations of this: Tamas Papp's DynamicHMC.jl, and AdvancedHMC.jl from the Turing.jl team (notably Kai Xu and Hong Ge). There are some trade-offs between these, but I've done more testing with the former, so let's go with that for now.

For HMC, we'll need to specify a joint distribution (remember, that's just the model applied to some arguments) and data to condition on. It's convenient to pipe the result into the `particles` helper function:

```julia
julia> post = dynamicHMC(cc(X=X0), (y=obs.y,)) |> particles;

julia> post.w
5-element Array{Particles{Float64,1000},1}:
  0.721 ± 0.011
 -2.68 ± 0.011
 -0.0383 ± 0.0098
 -2.32 ± 0.0087
 -0.611 ± 0.011
```

`Particles` are from Fredrik Bagge Carlson's MonteCarloMeasurements.jl library, and make the result much more concise. Conveniently, there's no data loss here; the summarization happens only in how the values are displayed. `Particles` support componentwise operations as if each were just a floating-point value, which makes it easy to work with the results of an analysis.
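Here's a rough sketch of the idea, not the real MonteCarloMeasurements.jl implementation: a `Particles`-style value is just a cloud of samples, and arithmetic maps componentwise over the samples, so nothing is thrown away:

```julia
# Toy stand-in for a Particles-like type (illustrative only).
struct P
    samples::Vector{Float64}
end

# Componentwise arithmetic: operate sample-by-sample.
Base.:+(a::P, b::P) = P(a.samples .+ b.samples)
Base.:*(a::P, k::Real) = P(a.samples .* k)

# Only the *display* summarizes; the samples themselves are all still there.
mean_of(p::P) = sum(p.samples) / length(p.samples)
```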

# The Posterior Predictive Distribution

Earlier I said that the body of a model is a *set* of statements. No, not a sequence, it's a set. Well, a *partially ordered set*.

Soss understands variable dependencies. In fact, the order of statements chosen by the user is immediately discarded. All computations, even "displaying" the model to the user, are done in terms of a *topological ordering* of the variables, relative to their dependencies.
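As a toy illustration of the idea (this is not Soss's internal representation), imagine the statements keyed by the variable they define, along with their dependencies; any traversal then works from a topological sort, so the order the user wrote them in never matters:

```julia
# Hypothetical dependency map for the cc model: each variable lists the
# variables it depends on.
deps = Dict(:w => Symbol[], :Xw => [:w], :y => [:Xw])

# Depth-first topological sort: visit a variable's dependencies before
# emitting the variable itself.
function topo_order(deps)
    out = Symbol[]
    visit(v) = v in out || (foreach(visit, deps[v]); push!(out, v))
    foreach(visit, keys(deps))
    out
end
```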

This representation is very simple from a programming languages perspective, and we may extend it in the future. But even as it is, the representation is convenient for model transformations. Since we now have samples from the posterior over `w`, we can build a new model conditioning on it:

```julia
julia> pred = predictive(cc, :w)
@model (X, w) begin
    Xw = X * w
    y ~ For(1:100) do j
        Normal(Xw[j], 0.1)
    end
end
```

[Alternatively, we could just use `dynamicHMC` again. But "forward" computation is much cheaper, and we'll have better effective sample size.]

We're finally ready to set up our posterior predictive check. This takes a bit of maneuvering, so it's one thing we'll try to smooth out in future releases. Here's the code:

```julia
ppc = mapslices(Matrix(post.w), dims=2) do w
    rand(pred(X=X0, w=w)).y
end |> Particles
```

`Matrix(post.w)` is a 1000×5 matrix. We feed each row into the predictive model, generate a random sample, and retrieve the `y` component, before finally aggregating the whole thing back into `Particles`.

If the model fits well, `obs.y` should be uniform along the quantiles of the posterior predictive distribution. We can check this by looking at

`q = mean.(obs.y .> ppc)`
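Written out with plain arrays, the check looks something like this (the `ppc_draws` layout here is hypothetical, with one row per observation and one column per predictive draw):

```julia
# q[i] is the fraction of predictive draws that observation i exceeds,
# i.e. the observation's quantile in its predictive distribution.
# If the model fits, the q[i] should look roughly Uniform(0,1).
function ppc_quantiles(y, ppc_draws)
    [count(y[i] .> ppc_draws[i, :]) / size(ppc_draws, 2) for i in eachindex(y)]
end
```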

# Epilogue

Things are moving really fast, and there are lots of other things I haven't mentioned yet, like:

- First-class models
- Lifting distributions to models
- Markov chain combinators
- Multiple imputation
- Normalizing Flows with Bijectors.jl
- Deep learning with Flux
- Integration with Turing and Gen