## Fast and Flexible Probabilistic Programming with Soss.jl

A few months ago, Colin Carroll posted "A Tour of Probabilistic Programming Language APIs", where he compared the APIs of a variety of probabilistic programming languages (PPLs) using this model: \[ \begin{aligned} p(\mathbf{w}) & \sim \mathcal{N}\left(\mathbf{0}, I_5\right) \\ p(\mathbf{y} | X, \mathbf{w}) & \sim \mathcal{N}\left(X\mathbf{w}, 0. [Read More]

## Variational Importance Sampling

Lots of distributions have a density that is easy to evaluate, but are hard to sample from. So when we need to sample from such a distribution, we need to use some tricks. We'll see connections between two of these, importance sampling and variational inference, and see a way to use them together for fast inference.

### Importance sampling

Importance sampling aims to make it easy to compute expected values. Say we have a distribution $$p$$, and we'd like to compute the average of some function $$f$$ of the distribution (or equivalently, the expected value of a "push-forward along $$f$$"). [Read More]
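As a rough illustration of the idea, here is a minimal sketch of self-normalized importance sampling in plain Python. Everything specific in it is an assumption made for the example and is not taken from the post: the helper names `normal_pdf` and `importance_estimate`, the target $$p = \mathcal{N}(1, 1)$$, the wider proposal $$q = \mathcal{N}(0, 2)$$, and the choice $$f(x) = x^2$$. We draw from $$q$$, weight each draw by $$p(x)/q(x)$$, and take the weighted average of $$f$$.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_estimate(f, p_pdf, q_pdf, q_sample, n, rng):
    """Self-normalized importance sampling estimate of E_p[f(X)].

    Draws come from the proposal q; each draw is weighted by p(x) / q(x).
    """
    total_w = 0.0
    total_wf = 0.0
    for _ in range(n):
        x = q_sample(rng)
        w = p_pdf(x) / q_pdf(x)
        total_w += w
        total_wf += w * f(x)
    return total_wf / total_w


rng = random.Random(0)

# Hypothetical target p = N(1, 1) and wider proposal q = N(0, 2).
estimate = importance_estimate(
    f=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 1.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sample=lambda r: r.gauss(0.0, 2.0),
    n=50_000,
    rng=rng,
)
print(estimate)  # should land near E_p[X^2] = 1^2 + 1 = 2
```

The proposal is deliberately wider than the target so the weights stay bounded; a proposal with lighter tails than $$p$$ can make the estimate very noisy.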

## Confusion Confusion

### Harder Than it Needs to Be

Say you've just fit a (two-class) machine learning classifier, and you'd like to judge how it's doing. This starts out simple: Reality is yes or no, and you predict yes or no. Your model will make some mistakes, which you'd like to characterize. So you go to Wikipedia, and see this:

There's a lot of "divide this sum by that sum", without much connection to why we're doing that, or how to interpret the result. [Read More]
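To make the "divide this sum by that sum" concrete, here is a small sketch that tallies the four cells of a two-class confusion matrix and computes a few of the standard ratios. The function names (`confusion_counts`, `summarize`) and the labels are hypothetical, invented for this example.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and p:
            fp += 1
        elif a and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(tp, fp, fn, tn):
    """A few common summaries of a two-class confusion matrix."""
    return {
        "precision": ratio(tp, tp + fp),    # of predicted yes, how many were yes
        "recall": ratio(tp, tp + fn),       # of actual yes, how many we caught
        "specificity": ratio(tn, tn + fp),  # of actual no, how many we cleared
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical labels: True means "yes", False means "no".
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

tp, fp, fn, tn = confusion_counts(actual, predicted)
metrics = summarize(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(metrics)
```

Each ratio fixes one row or column of the table as the denominator, which is why the same four counts produce so many differently named quantities.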

## Soss.jl: Design Plans for Spring 2019

If you've followed my work recently, you've probably heard of my probabilistic programming system Soss.jl. I recently had the pleasure of presenting these ideas at PyData Miami: [N.B. Above is supposed to be an embedded copy of my slides from PyData Miami. I can see it from Chrome, but not Firefox. Very weird.] In April I'll begin another "passion quarter" (essentially a sabbatical) and hope to really push this work forward. [Read More]

## Julia for Probabilistic Metaprogramming

Since around 2010, I've been involved with using and developing probabilistic programming languages. So when I learn about a new language, one of my first questions is whether it's a good fit for this kind of development. In this post, I'll talk a bit about working in this area with Julia, to motivate my Soss project.

### Domain-Specific Languages

At a high level, a probabilistic programming language is a kind of domain-specific language, or DSL. [Read More]

## A Prelude to Pyro

Lately I've been exploring Pyro, a recent development in probabilistic programming from Uber AI Labs. It's an exciting development with huge potential for large-scale applications. In any technical writing, it's common (at least for me) to realize I need to add some introductory material before moving on. In writing about Pyro, this happened quite a bit, to the point that it warranted this post as a kind of warm-up. [Read More]

## Bayesian Optimal Pricing, Part 2

This is Part 2 in a series on Bayesian optimal pricing. Part 1 is here.

### Introduction

In Part 1 we used PyMC3 to build a Bayesian model for sales. By the end we had this result:

A common advantage of Bayesian analysis is the understanding it gives us of the distribution of a given result. For example, we can very easily analyze a sample from the posterior distribution of profit for a given price. [Read More]

## Bayesian Optimal Pricing, Part 1

Pricing is a common problem faced by businesses, and one that can be addressed effectively by Bayesian statistical methods. We'll step through a simple example and build the background necessary to get involved with this approach. Let's start with some hypothetical data. A small company has tried a few different price points (say, one week each) and recorded the demand at each price. We'll abstract away some economic issues in order to focus on the statistical approach. [Read More]

## The Bias-Variance Decomposition

Say there's some experiment that generates noisy data. You and I each go through the process independently, and model the results. Would the resulting models be exactly the same? Well no, of course not. That's the whole problem with noise. Instead, we'll usually end up with something like this (for a quadratic fit): The idea is that we'd like to find an approximation to $$f(x)$$, but we can never observe this function directly. [Read More]
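For reference, the decomposition named in the title is usually written as below. This is the standard textbook form under notation assumed here, not taken from the post: observations are $$y = f(x) + \varepsilon$$ with $$\mathbb{E}[\varepsilon] = 0$$ and $$\operatorname{Var}(\varepsilon) = \sigma^2$$, $$\hat{f}$$ is the model fit to a random training set, and the expectation is over training sets and over the noise at a new point $$x$$ (taken to be independent of the training data).

\[
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
= \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
\]

The first term measures how far the average fit is from $$f(x)$$, the second how much individual fits (yours and mine, in the story above) scatter around that average, and the third is the noise no model can remove.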

## Bayesian Changepoint Detection with PyMC3

A client comes to you with this problem:

> The coal company I work for is trying to make mining safer. We made some change around 1900 that seemed to improve things, but the records are all archived. Tracking down such old records can be expensive, and it would help a lot if we could narrow the search. Can you tell us what year we should focus on? Also, it would really help to know this is a real effect, and not just due to random variability - we don't want to waste resources digging up the records if there's not really anything there.

[Read More]
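The post itself uses PyMC3. As a dependency-free warm-up, here is a minimal sketch of the same kind of question on synthetic data, swapping in an exact grid computation for MCMC. The assumptions are all ours: yearly counts are Poisson with one rate before the change and another after, each rate gets a Gamma(shape, rate) prior that is integrated out analytically, the changepoint is uniform over years, and the data (1880 to 1919, rates 4.0 and 0.5, change at 1900) are simulated rather than real records. The names `poisson_draw` and `changepoint_posterior` are hypothetical.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def changepoint_posterior(counts, shape=1.0, rate=1.0):
    """Posterior over the changepoint index tau in a two-rate Poisson model.

    counts[:tau] share one rate and counts[tau:] share another. Both rates
    have Gamma(shape, rate) priors, marginalized in closed form, and tau is
    uniform on 1..n-1. Entry i of the result is P(tau = i + 1 | counts).
    """
    n = len(counts)
    total = sum(counts)
    log_post = []
    before = 0
    for tau in range(1, n):
        before += counts[tau - 1]
        after = total - before
        log_post.append(
            math.lgamma(shape + before)
            - (shape + before) * math.log(rate + tau)
            + math.lgamma(shape + after)
            - (shape + after) * math.log(rate + n - tau)
        )
    peak = max(log_post)  # subtract the max before exponentiating
    weights = [math.exp(v - peak) for v in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


# Synthetic (not real) yearly counts with a drop in rate at 1900.
rng = random.Random(1)
years = list(range(1880, 1920))
true_change = 1900
counts = [poisson_draw(4.0 if y < true_change else 0.5, rng) for y in years]

posterior = changepoint_posterior(counts)
best = max(range(len(posterior)), key=posterior.__getitem__)
map_year = years[best + 1]  # first year governed by the new rate
print(map_year, round(posterior[best], 3))
```

The full posterior over years, not just its peak, is what answers the client's second question: a posterior concentrated on a few years suggests a real shift, while a flat one suggests there may be nothing to find.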