If you've followed my work recently, you've probably heard of my probabilistic programming system Soss.jl. I recently had the pleasure of presenting these ideas at PyData Miami:
[N.B. Above is supposed to be an embedded copy of my slides from PyData Miami. I can see it from Chrome, but not Firefox. Very weird. ]
In April I'll begin another "passion quarter" (essentially a sabbatical) and hope to really push this work forward. I count myself lucky to have an employer so supportive of independent R&D. Thank you, Metis!
It's a bit unusual to document an approach before it's fully implemented. There are a few reasons for doing this:
- Making the ideas explicit is helpful for forcing myself to think through them
- It could be useful for anyone following this to see where things are going
- In case anyone is interesting in collaboration (pull requests welcome 🙂)
- In case there's an aspect of this I'm overlooking
The Initial Design
Using a simple model m
as a running example, a Soss model looks like this:
m = @model x begin
μ ~ Normal(0, 5)
σ ~ HalfCauchy(3)
x ~ Normal(μ, σ) |> iid
end
@model
is a macro. Evaluating code in any language leads to it being transformed through a sequence of intermediate representations. The first step in this process is nearly always to parse the raw text code, resulting in an abstract syntax tree ("AST").
A macro is an "AST transformer". It takes the AST resulting from parsing, and modifies it (or possibly replaces it with something completely different) before passing it down the line.
In the initial implementation, the @model
macro does very little. Simplifying a bit, a Model
is represented as a Julia struct
:
struct Model
args :: Vector{Symbol}
body :: Expr
end
To build this, the @model
macro turns x
into a Vector
of Symbols
:
julia> m.args == [:x]
true
and essentially leaves the body
expression alone:
julia> m.body
quote
μ ~ Normal(0, 5)
σ ~ HalfCauchy(3)
x ~ Normal(μ, σ) |> iid
end
Note, however, that this is not just text, but has already been parsed. For example, we can show the Lisp-like s-expression for the AST:
julia> m.body |> Meta.show_sexpr
(:block,
(:call, :~, :μ, (:call, :Normal, 0, 5)),
(:call, :~, :σ, (:call, :HalfCauchy, 3)),
(:call, :~, :x, (:call, :|>, (:call, :Normal, :μ, :σ), :iid))
)
Where's the Problem?
The initial design gives the ultimate in flexibility, but at a high cost. The bulk of a model is just an Expr
ession, with no real structure. This makes it awkward to reason about, and leads to a lot of code redundancy. For example, if you look in the code, you'll see a lot of things like this:
newbody = postwalk(m.body) do x
if @capture(x, v_ ~ dist_)
...
else x
end
...
end
This traverses the body and pattern-matches sub-expressions that look like distributional statements. A lot of this is imperative, mutating some other structure as we walk the AST. This kind of code is (at least for me) easy to get wrong in subtle ways.
Tidying Up
This implementation no longer sparks joy. What else can we do?
Trying to pin this down was the goal of what turned into an extended discussion on Julia's Discourse message board. After lots of back and forth and some time rethinking the approach, we have a path forward.
Set Up the Types
Julia uses multiple dispatch, but we're not yet making much use of it. By setting up some types in a sensible way, we can make the code cleaner and more extensible.
For example, often the body of a Model
consists of a sequence of v ~ dist
statements. Let's call each of these a ModelTilde
.
But can we be sure we'll never need more than ModelTilde
s? In some cases assignment in the usual sense can be useful, and the two should certainly be treated differently. So maybe we also have a ModelAssignment
to represent things like y = f(x)
.
To keep things consistent, we can create a new abstract type, say AbstractModelBlock
, and declare ModelTilde
and ModelAssignment
to be subtypes of it. This gives us a nice clean representation (the body of a Model
is an array of AbstractModelBlock
values) without sacrificing extensibility (a new subtype of AbstractModelBlock
just requires defining some operations on the new type).
Is this enough? Probably not. The most common use of v ~ dist
is where dist
is a Distribution
. But it can certainly be more general than that. a value is generated according to a simulation process. Or maybe a value has been generated earlier, and we'd like a flexible way of assigning a weight. We may even have a Model
object, or we'd like to embed a model from Turing.jl and treat it as a distribution-like value.
All of this points to the need for a AbstractTildeRHS
("right hand side") abstract type. Possible subtypes include Distribution
, Model
, TuringModel
, Simulation
, and Weight
.
Focus on Declarative Modeling
Declarative systems are significantly easier to reason about, so we'll focus on this approach. That's not to say imperative components will be disallowed. Rather, we'll expect effects to be contained (for example in a Simulation
or TuringModel
).
Have @model
do more
Suppose our model has a declaration like x ~ Normal(μ, σ)
. There are quite a few useful things we can precompute:
- LHS variable
:x
- RHS variables
[:μ,:σ]
, making sure only to extract symbols that are either arguments or LHS of some~
or=
statement - A compiled function
(μ, σ, shape) -> rand(Normal(μ, σ), shape)
- A compiled function
(μ, σ, x) -> logpdf(Normal(μ, σ), x)
- An algebraic expression for the log pdf in terms of
[:μ, :σ, :x]
Note that "compiled function" is not quite accurate, given Julia's generic functions and JIT compilation. The point is that this is an actual function, as opposed to an AST that evaluates to a function.
Having "compiled" functions available gives a way of interpreting a model without the need to call eval
. It's possible this may be enough, but I suspect we'll still need eval
to be sure we can "run" the model without unnecessary interpretive overhead.
Leverage Existing Julia Packages
For such a young language, Julia has an impressive collection of packages. We're already making extensive use of a few of these:
- Distributions.jl for sampling and logpdf evaluation for a wide variety of distributions
- MacroTools.jl for AST manipulation
- LogDensityProblems.jl and DynamicHMC.jl for inference via Hamiltonian Monte Carlo
There are certainly plenty more opportunities:
- Reduce.jl connects with the classic REDUCE computer algebra system. I expect this will be a good fit for manipulating the algebraic form of log-densities
- MLStyle.jl seems to offer a fast and elegant approach for pattern matching in ASTs
- ResumableFunctions.jl offers a simple way to express efficient iterators, at a similar syntactic cost to
yield
generators in Python - Zygote.jl gives efficient source-to-source automatic differentiation
- Flux.jl gives a way to express deep neural networks at a similar level to Keras, but is implemented entirely in Julia. This avoids the conceptual and implementation overhead of using a library implemented in another language.
There are also a number of other probabilistic programming implementation in Julia:
- ForneyLab.jl generates message passing algorithms over factor graphs
- Turing.jl takes a very general "universal probabilistic programming" approach. Turing's success with the imperative approach was one motivation for focusing on a different part of the design space.
There are at least two possible relationships between Soss and other probabilistic programming languages (PPLs) like ForneyLab and Turing:
- The other PPL can serve as a backend
- A model in the other PPL can serve as a
TildeRHS
in Soss
Of course, there are plenty of probabilistic programming systems outside of Julia. Because Julia connects so easily with C and Python, there's at least potential to explore similar connections with Stan, PyMC3, and Pyro.
Final Thoughts
The "road map" is still pretty high-level, but I feel like it's now converging much more quickly. I'm excited to be getting things lined up to push them forward more quickly in the spring. If you have questions, thoughts, or suggestions, I hope you'll contact me on Julia Discourse or otherwise.
I'd like to thank Chris Foster and Tamas Papp for their time and patience in a Julia Discourse discussion, as well as the Turing team for being supportive and open to collaboration. Most notably, Mohamed Tarek and Martin Trapp have been especially helpful.