Linear regression

Import the necessary packages:

using Distributions
using MLJBase
using Soss
using SossMLJ
using Statistics

In this example, we fit a Bayesian linear regression model with the canonical link function.

Suppose that we are given a matrix of features X and a column vector of labels y. X has n rows and p columns. y has n elements. We assume that our observation vector y is a realization of a random variable Y. We define μ (mu) as the expected value of Y, i.e. μ := E[Y]. Our model comprises three components:

  1. The probability distribution of Y: for linear regression, we assume that each Yᵢ follows a normal distribution with mean μᵢ and variance σ².
  2. The systematic component: the linear predictor η (eta), which we define as η := Xβ, where β is the column vector of p coefficients.
  3. The link function g, which provides the following relationship: g(E[Y]) = g(μ) = η = Xβ. It follows that μ = g⁻¹(η), where g⁻¹ denotes the inverse of g. For linear regression, the canonical link function is the identity function; therefore, when using the canonical link function, μ = g⁻¹(η) = η (see the short sketch after this list).
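
To make the notation concrete, here is a small standalone sketch of these components, using hypothetical values X₀ and β₀ (the names are illustrative only and are not part of the example that follows):

X₀ = [1.0  2.0;
      3.0  4.0;
      5.0  6.0]     # hypothetical design matrix with n = 3 rows and p = 2 columns
β₀ = [0.5, -0.25]   # hypothetical coefficient vector with p = 2 elements
η₀ = X₀ * β₀        # linear predictor: η := Xβ
μ₀ = η₀             # identity (canonical) link: μ = g⁻¹(η) = η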

In this model, the parameters that we want to estimate are β and σ, so we need to select prior distributions for them. For each βᵢ we choose a normal distribution with zero mean and variance s²; here, βᵢ denotes the ith component of β. For σ, we choose a half-normal distribution with scale parameter t. s and t are hyperparameters that we will need to choose.
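
As a quick sanity check of these parametrizations (a sketch only, using the values s = t = 0.1 that we choose below): Distributions.jl parametrizes Normal(μ, σ) by its standard deviation σ, so Normal(0, s) has variance s², and HalfNormal(t) is the same distribution as a Normal(0, t) truncated at zero.

using Distributions

s, t = 0.1, 0.1
var(Normal(0, s))                          # variance of each βᵢ prior: s² = 0.01
σ_prior = truncated(Normal(0, t), 0, Inf)  # same distribution as HalfNormal(t)
mean(σ_prior)                              # t * √(2/π) ≈ 0.0798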

We define this model using the Soss probabilistic programming library:

m = @model X, s, t begin
    p = size(X, 2) # number of features
    β ~ Normal(0, s) |> iid(p) # coefficients
    σ ~ HalfNormal(t) # dispersion
    η = X * β # linear predictor
    μ = η # `μ = g⁻¹(η) = η`
    y ~ For(eachindex(μ)) do j
        Normal(μ[j], σ) # `Yᵢ ~ Normal(mean=μᵢ, variance=σ²)`
    end
end;
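
Two Soss constructs deserve a brief note: Normal(0, s) |> iid(p) lifts the scalar prior to p independent, identically distributed coefficients, and For(eachindex(μ)) builds a product distribution with one independent Normal(μ[j], σ) component per observation.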

Generate some synthetic features: two continuous features and two binary categorical features.

num_rows = 1_000
x1 = randn(num_rows)
x2 = randn(num_rows)
x3 = Int.(rand(num_rows) .> 0.5)
x4 = Int.(rand(num_rows) .> 0.5)
X = (x1 = x1, x2 = x2, x3 = x3, x4 = x4)
(x1 = [2.197270666426264, 0.4138312695688653, -1.1541158196208916, 0.25993995089503064, 0.023035746206824557, 0.2545133890020534, 0.7388187983089918, -0.1766994816071766, 0.5517505743453249, -1.5914451912287733  …  -1.179179384709137, 0.1133234538607634, -1.6788062214861181, -0.13165206373061364, -0.09758635217400706, 1.537857647343547, 1.4446217614211656, -0.6871662188147665, -0.5264055613343627, -0.48705200332165277],
 x2 = [0.3701541233677246, 0.07827622305042344, -0.5886498879333973, 0.37406754105414847, -0.6261590779051691, -0.3245275877931951, -0.4148506357463219, -0.3359469033551532, 0.5675768585856337, -1.5536598616527424  …  0.6673668337994053, 0.630034052238109, -1.256094909781851, -0.05153270178008876, -0.7029831038354106, -0.42621555652724913, -0.1455920964369708, -0.18305641741415643, -0.042934636491973106, 1.6418874742818406],
 x3 = [0, 0, 1, 0, 1, 0, 0, 0, 1, 1  …  1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
 x4 = [1, 1, 0, 0, 1, 0, 0, 1, 1, 1  …  1, 1, 1, 0, 1, 0, 1, 0, 0, 1],)

Define the hyperparameters of our prior distributions:

hyperparams = (s=0.1, t=0.1)
(s = 0.1,
 t = 0.1,)

Convert the Soss model into a SossMLJModel:

model = SossMLJModel(;
    model       = m,
    hyperparams = hyperparams,
    infer       = dynamicHMC,
    response    = :y,
);
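
Here model is the Soss model to wrap, hyperparams supplies the fixed values of s and t, infer selects the inference routine (dynamicHMC samples the posterior with Hamiltonian Monte Carlo), and response names the response variable inside the model.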

Generate some synthetic labels by drawing one sample from the model's joint distribution:

args = merge(model.transform(X), hyperparams) # design matrix combined with the hyperparameters
truth = rand(m(args)) # one draw of every model quantity
(p = 4,
 η = [-0.11150566369623519, -0.047564538230596594, 0.11283955141257997, -0.018730990627310956, 0.04474126970354958, 0.0012702577751921833, -0.011279190425107315, -0.01737695967771414, -0.005646255253769855, 0.12145399365241459  …  0.04553745378815674, 0.006262547223902043, 0.11573288854442745, 0.06568402971580438, -0.00942629679293882, -0.03589190783993686, -0.07337685172665509, 0.02664082871704342, 0.07775889071471921, -0.003726492193868655],
 μ = [-0.11150566369623519, -0.047564538230596594, 0.11283955141257997, -0.018730990627310956, 0.04474126970354958, 0.0012702577751921833, -0.011279190425107315, -0.01737695967771414, -0.005646255253769855, 0.12145399365241459  …  0.04553745378815674, 0.006262547223902043, 0.11573288854442745, 0.06568402971580438, -0.00942629679293882, -0.03589190783993686, -0.07337685172665509, 0.02664082871704342, 0.07775889071471921, -0.003726492193868655],
 σ = 0.07225427941267197,
 β = [-0.031206677653237227, -0.02838826469841777, 0.06011268222182188, -0.032428113058783285],
 y = [-0.15999920688273342, -0.010576887605155579, 0.1432688145784165, -0.057297240307195345, 0.06339359413249293, -0.07515270149588585, -0.018520881842564133, 0.14963780640250482, 0.045739247131950475, 0.08771443934606689  …  0.048845353990389265, -0.09412325449873193, 0.1386056359097484, 0.07706541700139062, 0.07320607880590566, 0.012210103620919892, -0.06080456747473415, -0.05775996424380779, 0.10358038240423717, 0.028436889180012845],)
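
The named tuple truth holds one draw of every quantity in the model: the coefficients β, the dispersion σ, the linear predictor η (equal to μ under the identity link), and the labels y, which we treat as ground truth below.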

Create an MLJ machine for fitting our model:

mach = MLJBase.machine(model, X, truth.y)
Machine{SossMLJModel{,…}} @819 trained 0 times.
  args: 
    1:	Source @863 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2:	Source @499 ⏎ `AbstractVector{ScientificTypes.Continuous}`

Fit the machine. This may take several minutes.

fit!(mach)
Machine{SossMLJModel{,…}} @819 trained 1 time.
  args: 
    1:	Source @863 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2:	Source @499 ⏎ `AbstractVector{ScientificTypes.Continuous}`

Construct the posterior distribution and the joint posterior predictive distribution:

predictor_joint = predict_joint(mach, X)
typeof(predictor_joint)
SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
    σ ~ HalfNormal(t)
    p = size(X, 2)
    β ~ Normal(0, s) |> iid(p)
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}

Draw a single sample from the joint posterior predictive distribution:

single_sample = rand(predictor_joint; response = :y)
1000-element Vector{Float64}:
 -0.11545644783819184
 -0.012361037251430701
  0.1691934149423639
 -0.022025969957086118
  0.1428815030365233
 -0.022052418983513303
 -0.0456874986136417
  0.005967250908713476
  0.03205562846231033
  0.06893048772110236
  ⋮
  0.050358644828681895
  0.1598466387548342
  0.05264635486065514
  0.08321108987507014
  0.020701160690197104
 -0.03961829175521227
  0.09207813020380065
  0.17082374720576177
 -0.005827642594057962

Evaluate the logpdf of the joint posterior predictive distribution at this sample:

logpdf(predictor_joint, single_sample)
1156.5604431835402
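
Note that the logpdf is positive. This is not an error: with σ on the order of 0.07, the per-observation normal densities typically exceed 1, so the sum of their logs over 1000 observations is a large positive number.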

True β:

truth.β
4-element Vector{Float64}:
 -0.031206677653237227
 -0.02838826469841777
  0.06011268222182188
 -0.032428113058783285

Posterior distribution of β:

predict_particles(mach, X; response = :β)
4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
 -0.0331 ± 0.0025
 -0.0288 ± 0.0022
  0.0503 ± 0.0038
 -0.0314 ± 0.0039

Difference between the posterior distribution of β and the true values:

truth.β - predict_particles(mach, X; response = :β)
4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
  0.00185 ± 0.0025
  0.000376 ± 0.0022
  0.00979 ± 0.0038
 -0.00102 ± 0.0039

Compare the joint posterior predictive distribution of μ to the true values:

truth.μ - predict_particles(mach, X; response = :μ)
1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
  0.00319 ± 0.0066
 -0.000228 ± 0.004
  0.00743 ± 0.0048
  0.000622 ± 0.001
  0.00857 ± 0.0038
  0.00035 ± 0.00097
  0.00121 ± 0.0021
 -0.00148 ± 0.004
  0.01 ± 0.0043
  0.00523 ± 0.0061
  ⋮
  0.00921 ± 0.0041
  0.00518 ± 0.006
  0.00953 ± 0.0038
 -0.00147 ± 0.0042
  0.00269 ± 0.004
  0.0016 ± 0.0052
 -0.00134 ± 0.0017
  0.0088 ± 0.004
  0.00848 ± 0.0057

Compare the joint posterior predictive distribution of y to the true values:

truth.y - predict_particles(mach, X; response = :y)
1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
 -0.0454 ± 0.073
  0.0368 ± 0.072
  0.0379 ± 0.072
 -0.0379 ± 0.072
  0.0272 ± 0.073
 -0.076 ± 0.072
 -0.00605 ± 0.072
  0.166 ± 0.072
  0.0614 ± 0.073
 -0.0286 ± 0.073
  ⋮
 -0.0911 ± 0.072
  0.028 ± 0.072
  0.0209 ± 0.072
  0.0811 ± 0.073
  0.0508 ± 0.072
  0.0141 ± 0.072
 -0.0856 ± 0.072
  0.0347 ± 0.072
  0.0406 ± 0.073

Construct each of the marginal posterior predictive distributions:

predictor_marginal = MLJBase.predict(mach, X)
typeof(predictor_marginal)
Vector{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
    σ ~ HalfNormal(t)
    p = size(X, 2)
    β ~ Normal(0, s) |> iid(p)
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}} (alias for Array{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
    σ ~ HalfNormal(t)
    p = size(X, 2)
    β ~ Normal(0, s) |> iid(p)
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Array{NamedTuple{(:σ, :β), Tuple{Float64, Array{Float64, 1}}}, 1}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
    η = X * β
    μ = η
    y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Array{Float64, 2}, Float64, Float64}}}, 1})

predictor_marginal has one element for each row in X; each element is the marginal posterior predictive distribution for the corresponding observation:

size(predictor_marginal)
(1000,)

Draw a single sample from each of the marginal posterior predictive distributions:

only.(rand.(predictor_marginal))
1000-element Vector{Float64}:
 -0.1230746184619088
 -0.057404377317192834
  0.21208130547482443
 -0.14165790337285958
  0.0990016609144794
  0.09415726293437164
 -0.041550372506801145
 -0.084082193263741
  0.106627129916683
  0.16259805982752218
  ⋮
  0.009533288413319924
  0.18378567233452134
  0.06429967921011412
 -0.0012213393664031276
  0.0009755055107716626
 -0.16096731031085287
  0.10866295491970496
  0.13569664192385855
 -0.025411104329098315

Use cross-validation to evaluate the model with respect to the expected value of the root mean square error (RMSE):

evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_expected, operation=predict_particles)
┌──────────────────┬───────────────┬────────────────────────────────────────────
│ _.measure        │ _.measurement │ _.per_fold                                ⋯
├──────────────────┼───────────────┼────────────────────────────────────────────
│ RMSExpected @474 │ 0.102         │ [0.101, 0.102, 0.105, 0.102, 0.104, 0.102 ⋯
└──────────────────┴───────────────┴────────────────────────────────────────────
                                                                1 column omitted
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
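
A measurement of roughly 0.102 ≈ √2 × σ is about what we should expect here: the held-out labels and the posterior predictive draws each contain independent noise of standard deviation σ ≈ 0.072, so the expected squared difference between them is about 2σ².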

Use cross-validation to evaluate the model with respect to the median of the root mean square error (RMSE):

evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_median, operation=predict_particles)
┌────────────────┬───────────────┬──────────────────────────────────────────────
│ _.measure      │ _.measurement │ _.per_fold                                  ⋯
├────────────────┼───────────────┼──────────────────────────────────────────────
│ RMSMedian @549 │ 0.102         │ [0.101, 0.0977, 0.104, 0.103, 0.102, 0.106] ⋯
└────────────────┴───────────────┴──────────────────────────────────────────────
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]

This page was generated using Literate.jl.