Linear regression

Import the necessary packages:

using Distributions
using MLJBase
using Soss
using SossMLJ
using Statistics

In this example, we fit a Bayesian linear regression model with the canonical link function.

Suppose that we are given a matrix of features X and a column vector of labels y. X has n rows and p columns. y has n elements. We assume that our observation vector y is a realization of a random variable Y. We define μ (mu) as the expected value of Y, i.e. μ := E[Y]. Our model comprises three components:

  1. The probability distribution of Y: for linear regression, we assume that each Yᵢ follows a normal distribution with mean μᵢ and variance σ².
  2. The systematic component, which consists of the linear predictor η (eta), defined as η := Xβ, where β is the column vector of p coefficients.
  3. The link function g, which provides the following relationship: g(E[Y]) = g(μ) = η = Xβ. It follows that μ = g⁻¹(η), where g⁻¹ denotes the inverse of g. For linear regression, the canonical link function is the identity function. Therefore, when using the canonical link function, μ = g⁻¹(η) = η.

In this model, the parameters that we want to estimate are β and σ. We need to select prior distributions for these parameters. For each βᵢ, we choose a normal distribution with zero mean and variance s². Here, βᵢ denotes the ith component of β. For σ, we will choose a half-normal distribution with variance t². s and t are hyperparameters that we will need to choose.
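
Before writing this in Soss, here is an illustrative plain-Julia sketch of the generative process; the values of n, p, s, and t below are placeholders for this aside, not the ones used later in the example:

using Distributions

n, p = 100, 4
s, t = 0.1, 0.1
X = randn(n, p)                       # feature matrix
β = rand(Normal(0, s), p)             # prior draw; Normal(0, s) has variance s²
σ = abs(rand(Normal(0, t)))           # a half-normal draw, via |Normal(0, t)|
η = X * β                             # linear predictor
μ = η                                 # identity link: μ = g⁻¹(η) = η
y = [rand(Normal(μᵢ, σ)) for μᵢ in μ] # Yᵢ ~ Normal(mean = μᵢ, variance = σ²)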

We define this model using the Soss probabilistic programming library:

m = @model X, s, t begin
    p = size(X, 2) # number of features
    β ~ Normal(0, s) |> iid(p) # coefficients
    σ ~ HalfNormal(t) # dispersion
    η = X * β # linear predictor
    μ = η # `μ = g⁻¹(η) = η`
    y ~ For(eachindex(μ)) do j
        Normal(μ[j], σ) # `Yᵢ ~ Normal(mean=μᵢ, variance=σ²)`
    end
end;

Generate some synthetic features: two continuous features and two binary categorical features.

num_rows = 1_000
x1 = randn(num_rows)
x2 = randn(num_rows)
x3 = Int.(rand(num_rows) .> 0.5)
x4 = Int.(rand(num_rows) .> 0.5)
X = (x1 = x1, x2 = x2, x3 = x3, x4 = x4)
(x1 = [-1.6700856897114855, -1.6942240673112658, -0.32029893745609683, 1.159970452395149, 0.49163425514204323, 0.16147929350720341, 1.0062660210329246, 0.333273046858179, 0.3382568505571667, -0.869097290055658  …  -0.10778819068004106, 0.29318866951907074, -1.2512105492262828, -0.03041851654220319, 1.7483919066714215, -0.2780831616504583, 0.0873182389633058, 0.03351974677905975, -1.157180378472239, 0.5120458721664282],
 x2 = [0.5062808706662215, -0.4249878849724696, 0.318092015051467, -0.20298686435691451, -0.4569289744234028, 0.8552890564978077, -0.931751377855438, -0.9938603640929531, 1.2779327765860795, 1.6012053741953365  …  -0.14608824648686372, -0.9810971148000734, -0.9093841118968836, 1.2731142582876696, -1.6775754980618205, 1.8382871020723717, 1.9942584339316822, -0.10356437312761582, -1.7285653848156333, 0.21970860457692595],
 x3 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1  …  0, 1, 1, 1, 0, 1, 1, 0, 0, 0],
 x4 = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1  …  0, 0, 1, 1, 1, 0, 0, 0, 1, 0],)
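
The model itself works with a design matrix. As an aside, this column table conceptually corresponds to the matrix obtained by concatenating the columns horizontally (model.transform, used below, performs a conversion of this kind):

X_matrix = hcat(x1, x2, x3, x4) # 1000×4 Matrix{Float64}; the Int columns are promoted
size(X_matrix)                  # (1000, 4)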

Define the hyperparameters of our prior distributions:

hyperparams = (s=0.1, t=0.1)
(s = 0.1,
 t = 0.1,)

Convert the Soss model into a SossMLJModel. The keyword arguments specify the underlying Soss model, the prior hyperparameters, the inference algorithm, and the name of the response variable:

model = SossMLJModel(;
    model       = m,
    hyperparams = hyperparams,
    infer       = dynamicHMC,
    response    = :y,
);

Generate some synthetic labels by sampling once from the model (model.transform converts the column table X into the design matrix that the model expects):

args = merge(model.transform(X), hyperparams)
truth = rand(m(args))
(p = 4,
 η = [-0.20275568525146412, -0.35523942755617555, 0.08861522711190198, 0.2886250233522517, 0.08342322639962983, 0.17608869103234287, 0.2574794885136685, 0.1392873784558527, 0.209495428379339, -0.056152772550266766  …  -0.01972091480964892, -0.010503439026173345, -0.1376417861273326, 0.08878143782593509, 0.4392179767736961, -0.09414415146369073, -0.029261499294802464, 0.005267108668979438, -0.06980161285471137, 0.09092827504471078],
 μ = [-0.20275568525146412, -0.35523942755617555, 0.08861522711190198, 0.2886250233522517, 0.08342322639962983, 0.17608869103234287, 0.2574794885136685, 0.1392873784558527, 0.209495428379339, -0.056152772550266766  …  -0.01972091480964892, -0.010503439026173345, -0.1376417861273326, 0.08878143782593509, 0.4392179767736961, -0.09414415146369073, -0.029261499294802464, 0.005267108668979438, -0.06980161285471137, 0.09092827504471078],
 σ = 0.0017122031201463523,
 β = [0.1750854179049791, 0.005810011548661296, -0.056136314186577954, 0.14284678214993748],
 y = [-0.2017024948271163, -0.35702186577313677, 0.08762483034906982, 0.28743372782242244, 0.08258122038769387, 0.17384055189326758, 0.25910820364294695, 0.13665809619618874, 0.2088466329088354, -0.056420622674392934  …  -0.0192259461580595, -0.011580755358182529, -0.13777460995678234, 0.0875479345664117, 0.4405267697599636, -0.095371751409188, -0.02822488249640713, 0.0027424882475212565, -0.06989280119012435, 0.09311368058461622],)

Create an MLJ machine for fitting our model:

mach = MLJBase.machine(model, X, truth.y)
Machine{SossMLJModel{,…}} @383 trained 0 times.
  args: 
    1:	Source @692 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2:	Source @213 ⏎ `AbstractVector{ScientificTypes.Continuous}`

Fit the machine. This may take several minutes.

fit!(mach)
Machine{SossMLJModel{,…}} @383 trained 1 time.
  args: 
    1:	Source @692 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2:	Source @213 ⏎ `AbstractVector{ScientificTypes.Continuous}`

Construct the posterior distribution and the joint posterior predictive distribution:

predictor_joint = predict_joint(mach, X)
typeof(predictor_joint)
SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
    σ ~ HalfNormal(t)
    p = size(X, 2)
    β ~ Normal(0, s) |> iid(p)
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}

Draw a single sample from the joint posterior predictive distribution:

single_sample = rand(predictor_joint; response = :y)
1000-element Vector{Float64}:
 -0.2058099806640922
 -0.357264596055084
  0.09028597558099684
  0.2885899334316771
  0.08108720168829273
  0.17558552016483095
  0.2583286269280545
  0.1384550056644606
  0.20483706795878984
 -0.05535619223340847
  ⋮
 -0.008861568836340237
 -0.13547858765101395
  0.09236390319495555
  0.4418599667204133
 -0.09529181850164065
 -0.027695379188191854
  0.005834352962548248
 -0.06808782477098402
  0.09148433580196842
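
As an aside, averaging many such draws approximates the predictive mean of each observation:

# Average 100 draws from the joint posterior predictive, elementwise.
samples = [rand(predictor_joint; response = :y) for _ in 1:100]
predictive_mean = mean(samples) # 1000-element vector of elementwise means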

Evaluate the logpdf of the joint posterior predictive distribution at this sample (a positive log-density is expected here: σ is very small, so the density near the mean far exceeds one):

logpdf(predictor_joint, single_sample)
4980.358137350192

True β:

truth.β
4-element Vector{Float64}:
  0.1750854179049791
  0.005810011548661296
 -0.056136314186577954
  0.14284678214993748

Posterior distribution of β:

predict_particles(mach, X; response = :β)
4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
  0.175 ± 5.4e-5
  0.00582 ± 5.1e-5
 -0.0561 ± 8.6e-5
  0.143 ± 8.3e-5
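
Particles from MonteCarloMeasurements.jl support the usual summary statistics, so, as an aside (assuming the MonteCarloMeasurements API for Particles), point estimates and quantiles can be extracted elementwise:

post_β = predict_particles(mach, X; response = :β)
mean.(post_β)            # posterior mean of each coefficient
quantile.(post_β, 0.975) # elementwise upper 95% quantile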

Difference between the posterior distribution of β and the true values:

truth.β - predict_particles(mach, X; response = :β)
4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
 -8.5e-5 ± 5.4e-5
 -1.01e-5 ± 5.1e-5
 -2.54e-5 ± 8.6e-5
 -4.39e-5 ± 8.3e-5

Compare the joint posterior predictive distribution of μ to the true values:

truth.μ - predict_particles(mach, X; response = :μ)
1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
  6.76e-5 ± 0.00013
  0.000123 ± 0.00012
 -1.99e-5 ± 8.7e-5
 -0.000166 ± 0.00011
 -3.72e-5 ± 3.7e-5
 -6.63e-5 ± 9.3e-5
 -0.000145 ± 0.00012
 -8.76e-5 ± 0.0001
 -8.56e-5 ± 0.0001
 -1.15e-5 ± 0.00013
  ⋮
 -4.04e-5 ± 0.0001
  4.63e-5 ± 0.00011
 -7.95e-5 ± 0.00011
 -0.000176 ± 0.00016
 -2.03e-5 ± 0.00013
 -5.29e-5 ± 0.00013
 -1.81e-6 ± 5.8e-6
  7.19e-5 ± 0.00013
 -4.57e-5 ± 2.9e-5

Compare the joint posterior predictive distribution of y to the true values. The differences are larger than for μ because y additionally includes the observation noise σ:

truth.y - predict_particles(mach, X; response = :y)
1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
  0.00112 ± 0.0017
 -0.00166 ± 0.0017
 -0.00101 ± 0.0017
 -0.00136 ± 0.0017
 -0.00088 ± 0.0017
 -0.00231 ± 0.0017
  0.00148 ± 0.0017
 -0.00272 ± 0.0017
 -0.000735 ± 0.0017
 -0.000278 ± 0.0017
  ⋮
 -0.00112 ± 0.0017
 -8.67e-5 ± 0.0017
 -0.00131 ± 0.0017
  0.00113 ± 0.0017
 -0.00125 ± 0.0017
  0.000985 ± 0.0017
 -0.00253 ± 0.0017
 -1.84e-5 ± 0.0017
  0.00214 ± 0.0017

Construct each of the marginal posterior predictive distributions:

predictor_marginal = MLJBase.predict(mach, X)
typeof(predictor_marginal)
Vector{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
    σ ~ HalfNormal(t)
    p = size(X, 2)
    β ~ Normal(0, s) |> iid(p)
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}} (alias for Array{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
    σ ~ HalfNormal(t)
    p = size(X, 2)
    β ~ Normal(0, s) |> iid(p)
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Array{NamedTuple{(:σ, :β), Tuple{Float64, Array{Float64, 1}}}, 1}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Array{Float64, 2}, Float64, Float64}}}, 1})

predictor_marginal has one element for each row in X:

size(predictor_marginal)
(1000,)

Draw a single sample from each of the marginal posterior predictive distributions (each rand call returns a one-element vector, which only unwraps):

only.(rand.(predictor_marginal))
1000-element Vector{Float64}:
 -0.2029081681417486
 -0.3576940499452062
  0.08648248590455086
  0.2910674787640341
  0.07939612960203418
  0.17392501442911953
  0.2561922904988066
  0.13815388355650055
  0.205974839276187
 -0.05591489250599671
  ⋮
 -0.011053156647649717
 -0.14095152150154167
  0.08848082228383287
  0.43815443326164366
 -0.09618528626052235
 -0.028975341747410545
  0.006729761408040472
 -0.07113148258663066
  0.09414399184775661

Use cross-validation to evaluate the model with respect to the expected value of the root mean square error (RMSE):

evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_expected, operation=predict_particles)
┌──────────────────┬───────────────┬────────────────────────────────────────────
│ _.measure        │ _.measurement │ _.per_fold                                ⋯
├──────────────────┼───────────────┼────────────────────────────────────────────
│ RMSExpected @353 │ 0.00241       │ [0.00242, 0.0024, 0.00233, 0.00246, 0.002 ⋯
└──────────────────┴───────────────┴────────────────────────────────────────────
                                                                1 column omitted
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
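
For intuition, here is a rough sketch of what such a measure might compute. This is an assumption about the semantics, not SossMLJ's actual implementation: collapse each posterior predictive distribution to a point estimate (the mean for rms_expected; rms_median would use the median), then apply the ordinary RMSE formula.

using Statistics
using MonteCarloMeasurements: Particles

# Hypothetical helper illustrating an "expected RMSE": reduce each
# predictive distribution to its mean, then compute the usual RMSE.
function rms_expected_sketch(ŷ::AbstractVector{<:Particles}, y::AbstractVector{<:Real})
    return sqrt(mean((mean.(ŷ) .- y) .^ 2))
end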

Use cross-validation to evaluate the model with respect to the median of the root mean square error (RMSE):

evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_median, operation=predict_particles)
┌────────────────┬───────────────┬──────────────────────────────────────────────
│ _.measure      │ _.measurement │ _.per_fold                                  ⋯
├────────────────┼───────────────┼──────────────────────────────────────────────
│ RMSMedian @917 │ 0.00241       │ [0.00248, 0.00239, 0.00244, 0.00237, 0.0023 ⋯
└────────────────┴───────────────┴──────────────────────────────────────────────
                                                                1 column omitted
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]

This page was generated using Literate.jl.