Soss.jl: Design Plans for Spring 2019

If you've followed my work recently, you've probably heard of my probabilistic programming system Soss.jl. I recently had the pleasure of presenting these ideas at PyData Miami:

[Embedded slides from my PyData Miami talk.]

In April I'll begin another "passion quarter" (essentially a sabbatical) and hope to really push this work forward. I count myself lucky to have an employer so supportive of independent R&D. Thank you, Metis!

It's a bit unusual to document an approach before it's fully implemented. There are a few reasons for doing this:

  • Making the ideas explicit is helpful for forcing myself to think through them
  • It could be useful for anyone following this to see where things are going
  • In case anyone is interested in collaborating (pull requests welcome 🙂)
  • In case there's an aspect of this I'm overlooking

The Initial Design

We'll use a simple model m as a running example. In Soss, it looks like this:

m = @model x begin
    μ ~ Normal(0, 5)
    σ ~ HalfCauchy(3)
    x ~ Normal(μ, σ) |> iid
end

@model is a macro. In any language, evaluating code means transforming it through a sequence of intermediate representations. The first step is nearly always to parse the raw source text, resulting in an abstract syntax tree ("AST").

A macro is an "AST transformer". It takes the AST resulting from parsing, and modifies it (or possibly replaces it with something completely different) before passing it down the line.
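
As a toy example (not Soss code, and the name @zeros_to_ones is made up for illustration), here's a macro that rewrites the AST it receives before the code ever runs:

# A toy AST transformer: replace every literal 0 with 1 before evaluation.
macro zeros_to_ones(ex)
    rewrite(x) = x === 0 ? 1 : x
    rewrite(e::Expr) = Expr(e.head, map(rewrite, e.args)...)
    return esc(rewrite(ex))
end

@zeros_to_ones 0 + 2   # the macro sees :(0 + 2) and returns :(1 + 2), so this evaluates to 3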

In the initial implementation, the @model macro does very little. Simplifying a bit, a Model is represented as a Julia struct:

struct Model
    args :: Vector{Symbol}
    body :: Expr
end

To build this, the @model macro turns x into a Vector of Symbols:

julia> m.args == [:x]
true

and essentially leaves the body expression alone:

julia> m.body
quote
    μ ~ Normal(0, 5)
    σ ~ HalfCauchy(3)
    x ~ Normal(μ, σ) |> iid
end

Note, however, that this is not just text, but has already been parsed. For example, we can show the Lisp-like s-expression for the AST:

julia> m.body |> Meta.show_sexpr
(:block,
  (:call, :~, :μ, (:call, :Normal, 0, 5)),
  (:call, :~, :σ, (:call, :HalfCauchy, 3)),
  (:call, :~, :x, (:call, :|>, (:call, :Normal, :μ, :σ), :iid))
)

Where's the Problem?

The initial design gives the ultimate in flexibility, but at a high cost. The bulk of a model is just an Expr, with no real structure. This makes it awkward to reason about, and leads to a lot of code redundancy. For example, if you look in the code, you'll see a lot of things like this:

# postwalk and @capture come from MacroTools.jl
newbody = postwalk(m.body) do x
    if @capture(x, v_ ~ dist_)
        ...   # transform the v ~ dist statement
    else
        x     # leave other sub-expressions unchanged
    end
    ...
end

This traverses the body and pattern-matches sub-expressions that look like distributional statements. A lot of this is imperative, mutating some other structure as we walk the AST. This kind of code is (at least for me) easy to get wrong in subtle ways.

Tidying Up

This implementation no longer sparks joy. What else can we do?

Trying to pin this down was the goal of what turned into an extended discussion on Julia's Discourse message board. After lots of back and forth and some time rethinking the approach, we have a path forward.

Set Up the Types

Julia uses multiple dispatch, but we're not yet making much use of it. By setting up some types in a sensible way, we can make the code cleaner and more extensible.

For example, often the body of a Model consists of a sequence of v ~ dist statements. Let's call each of these a ModelTilde.

But can we be sure we'll never need more than ModelTildes? In some cases assignment in the usual sense can be useful, and the two should certainly be treated differently. So maybe we also have a ModelAssignment to represent things like y = f(x).

To keep things consistent, we can create a new abstract type, say AbstractModelBlock, and declare ModelTilde and ModelAssignment to be subtypes of it. This gives us a nice clean representation (the body of a Model is an array of AbstractModelBlock values) without sacrificing extensibility (a new subtype of AbstractModelBlock just requires defining some operations on the new type).
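
As a rough sketch (the field names here are my own guesses, not a settled design), this might look something like:

# Sketch only: structured blocks instead of raw expressions.
abstract type AbstractModelBlock end

# v ~ dist
struct ModelTilde <: AbstractModelBlock
    lhs :: Symbol   # the variable on the left of ~
    rhs :: Expr     # the distribution-like expression on the right
end

# y = f(x)
struct ModelAssignment <: AbstractModelBlock
    lhs :: Symbol
    rhs :: Expr
end

The body of a Model then becomes a Vector{AbstractModelBlock} rather than a single Expr.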

Is this enough? Probably not. The most common use of v ~ dist is where dist is a Distribution, but it can certainly be more general than that. Maybe a value is generated according to a simulation process. Or maybe a value has been generated earlier, and we'd like a flexible way of assigning it a weight. We may even have a Model object, or we might like to embed a model from Turing.jl and treat it as a distribution-like value.

All of this points to the need for an AbstractTildeRHS ("right-hand side") abstract type. Possible subtypes include Distribution, Model, TuringModel, Simulation, and Weight.
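
Here's a rough sketch of how a couple of these could look; the wrapper names and fields below are placeholders for illustration, not settled API:

# Sketch only: possible right-hand sides of a ~ statement.
abstract type AbstractTildeRHS end

# A plain distribution expression, e.g. Normal(μ, σ)
struct DistRHS <: AbstractTildeRHS
    dist :: Expr
end

# Another Soss model, treated as a distribution-like value
struct ModelRHS <: AbstractTildeRHS
    model :: Model
end

# A weight assigned to a value generated elsewhere
struct WeightRHS <: AbstractTildeRHS
    logweight :: Expr
end

With this in place, the rhs field of a ModelTilde could hold an AbstractTildeRHS rather than a bare Expr.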

Focus on Declarative Modeling

Declarative systems are significantly easier to reason about, so we'll focus on this approach. That's not to say imperative components will be disallowed. Rather, we'll expect effects to be contained (for example in a Simulation or TuringModel).

Have @model do more

Suppose our model has a declaration like x ~ Normal(μ, σ). There are quite a few useful things we can precompute (a sketch follows the list):

  • LHS variable :x
  • RHS variables [:μ, :σ], taking care to extract only symbols that are either arguments or the LHS of some ~ or = statement
  • A compiled function (μ, σ, shape) -> rand(Normal(μ, σ), shape)
  • A compiled function (μ, σ, x) -> logpdf(Normal(μ, σ), x)
  • An algebraic expression for the log pdf in terms of [:μ, :σ, :x]
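
To make that concrete, here's roughly what those pieces might look like for this one statement. None of this is Soss's actual internal representation, and the algebraic form is just the usual Normal log-density written out by hand:

using Distributions

# For the statement x ~ Normal(μ, σ):
lhs      = :x                 # LHS variable
rhs_vars = [:μ, :σ]           # RHS variables

# "Compiled" sampling and log-density functions -- real callables, not ASTs
rand_x   = (μ, σ, shape) -> rand(Normal(μ, σ), shape)
logpdf_x = (μ, σ, x)     -> logpdf(Normal(μ, σ), x)

# An algebraic expression for the log-density in terms of μ, σ, and x
logpdf_expr = :(-log(σ) - log(2π)/2 - (x - μ)^2 / (2σ^2))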

Note that "compiled function" is not quite accurate, given Julia's generic functions and JIT compilation. The point is that this is an actual function, as opposed to an AST that evaluates to a function.

Having "compiled" functions available gives a way of interpreting a model without the need to call eval. It's possible this may be enough, but I suspect we'll still need eval to be sure we can "run" the model without unnecessary interpretive overhead.

Leverage Existing Julia Packages

For such a young language, Julia has an impressive collection of packages. We're already making extensive use of a few of these, for example Distributions.jl for the distributions on the right of ~ and MacroTools.jl for the postwalk and @capture machinery shown above.

There are certainly plenty more opportunities:

  • Reduce.jl connects with the classic REDUCE computer algebra system. I expect this will be a good fit for manipulating the algebraic form of log-densities
  • MLStyle.jl seems to offer a fast and elegant approach for pattern matching in ASTs
  • ResumableFunctions.jl offers a simple way to express efficient iterators, at a similar syntactic cost to yield generators in Python
  • Zygote.jl gives efficient source-to-source automatic differentiation
  • Flux.jl gives a way to express deep neural networks at a similar level to Keras, but is implemented entirely in Julia. This avoids the conceptual and implementation overhead of using a library implemented in another language.

There are also a number of other probabilistic programming implementations in Julia:

  • ForneyLab.jl generates message passing algorithms over factor graphs
  • Turing.jl takes a very general "universal probabilistic programming" approach. Turing's success with the imperative approach was one motivation for focusing on a different part of the design space.

There are at least two possible relationships between Soss and other probabilistic programming languages (PPLs) like ForneyLab and Turing:

  • The other PPL can serve as a backend
  • A model in the other PPL can serve as an AbstractTildeRHS in Soss

Of course, there are plenty of probabilistic programming systems outside of Julia. Because Julia connects so easily with C and Python, there's at least potential to explore similar connections with Stan, PyMC3, and Pyro.

Final Thoughts

The "road map" is still pretty high-level, but I feel like it's now converging much more quickly. I'm excited to be getting things lined up to push them forward more quickly in the spring. If you have questions, thoughts, or suggestions, I hope you'll contact me on Julia Discourse or otherwise.

I'd like to thank Chris Foster and Tamas Papp for their time and patience in a Julia Discourse discussion, as well as the Turing team for being supportive and open to collaboration. Most notably, Mohamed Tarek and Martin Trapp have been especially helpful.

julia  bayes  Soss