Linear regression
Import the necessary packages:
using Distributions
using MLJBase
using Soss
using SossMLJ
using Statistics
In this example, we fit a Bayesian linear regression model with the canonical link function.
Suppose that we are given a matrix of features X
and a column vector of labels y
. X
has n
rows and p
columns. y
has n
elements. We assume that our observation vector y
is a realization of a random variable Y
. We define μ
(mu) as the expected value of Y
, i.e. μ := E[Y]
. Our model comprises three components:
- The probability distribution of
Y
: for linear regression, we assume that eachYᵢ
follows a normal distribution with meanμᵢ
and varianceσ²
. - The systematic component, which consists of linear predictor
η
(eta), which we define asη := Xβ
, whereβ
is the column vector ofp
coefficients. - The link function
g
, which provides the following relationship:g(E[Y]) = g(μ) = η = Xβ
. It follows thatμ = g⁻¹(η)
, whereg⁻¹
denotes the inverse ofg
. For linear regression, the canonical link function is the identity function. Therefore, when using the canonical link function,μ = g⁻¹(η) = η
.
In this model, the parameters that we want to estimate are β
and σ
. We need to select prior distributions for these parameters. For each βᵢ
we choose a normal distribution with zero mean and variance s²
. Here, βᵢ
denotes the i
th component of β
. For σ
, we will choose a half-normal distribution with variance t²
. s
and t
are hyperparameters that we will need to choose.
We define this model using the Soss probabilistic programming library:
m = @model X, s, t begin
p = size(X, 2) # number of features
β ~ Normal(0, s) |> iid(p) # coefficients
σ ~ HalfNormal(t) # dispersion
η = X * β # linear predictor
μ = η # `μ = g⁻¹(η) = η`
y ~ For(eachindex(μ)) do j
Normal(μ[j], σ) # `Yᵢ ~ Normal(mean=μᵢ, variance=σ²)`
end
end;
Generate some synthetic features. Let us generate two continuous features and two binary categorical features.
num_rows = 1_000
x1 = randn(num_rows)
x2 = randn(num_rows)
x3 = Int.(rand(num_rows) .> 0.5)
x4 = Int.(rand(num_rows) .> 0.5)
X = (x1 = x1, x2 = x2, x3 = x3, x4 = x4)
(x1 = [-1.6700856897114855, -1.6942240673112658, -0.32029893745609683, 1.159970452395149, 0.49163425514204323, 0.16147929350720341, 1.0062660210329246, 0.333273046858179, 0.3382568505571667, -0.869097290055658 … -0.10778819068004106, 0.29318866951907074, -1.2512105492262828, -0.03041851654220319, 1.7483919066714215, -0.2780831616504583, 0.0873182389633058, 0.03351974677905975, -1.157180378472239, 0.5120458721664282], x2 = [0.5062808706662215, -0.4249878849724696, 0.318092015051467, -0.20298686435691451, -0.4569289744234028, 0.8552890564978077, -0.931751377855438, -0.9938603640929531, 1.2779327765860795, 1.6012053741953365 … -0.14608824648686372, -0.9810971148000734, -0.9093841118968836, 1.2731142582876696, -1.6775754980618205, 1.8382871020723717, 1.9942584339316822, -0.10356437312761582, -1.7285653848156333, 0.21970860457692595], x3 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1 … 0, 1, 1, 1, 0, 1, 1, 0, 0, 0], x4 = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1 … 0, 0, 1, 1, 1, 0, 0, 0, 1, 0],)
Define the hyperparameters of our prior distributions:
hyperparams = (s=0.1, t=0.1)
(s = 0.1, t = 0.1,)
Convert the Soss model into a SossMLJModel
:
model = SossMLJModel(;
model = m,
hyperparams = hyperparams,
infer = dynamicHMC,
response = :y,
);
Generate some synthetic labels:
args = merge(model.transform(X), hyperparams)
truth = rand(m(args))
(p = 4, η = [-0.20275568525146412, -0.35523942755617555, 0.08861522711190198, 0.2886250233522517, 0.08342322639962983, 0.17608869103234287, 0.2574794885136685, 0.1392873784558527, 0.209495428379339, -0.056152772550266766 … -0.01972091480964892, -0.010503439026173345, -0.1376417861273326, 0.08878143782593509, 0.4392179767736961, -0.09414415146369073, -0.029261499294802464, 0.005267108668979438, -0.06980161285471137, 0.09092827504471078], μ = [-0.20275568525146412, -0.35523942755617555, 0.08861522711190198, 0.2886250233522517, 0.08342322639962983, 0.17608869103234287, 0.2574794885136685, 0.1392873784558527, 0.209495428379339, -0.056152772550266766 … -0.01972091480964892, -0.010503439026173345, -0.1376417861273326, 0.08878143782593509, 0.4392179767736961, -0.09414415146369073, -0.029261499294802464, 0.005267108668979438, -0.06980161285471137, 0.09092827504471078], σ = 0.0017122031201463523, β = [0.1750854179049791, 0.005810011548661296, -0.056136314186577954, 0.14284678214993748], y = [-0.2017024948271163, -0.35702186577313677, 0.08762483034906982, 0.28743372782242244, 0.08258122038769387, 0.17384055189326758, 0.25910820364294695, 0.13665809619618874, 0.2088466329088354, -0.056420622674392934 … -0.0192259461580595, -0.011580755358182529, -0.13777460995678234, 0.0875479345664117, 0.4405267697599636, -0.095371751409188, -0.02822488249640713, 0.0027424882475212565, -0.06989280119012435, 0.09311368058461622],)
Create an MLJ machine for fitting our model:
mach = MLJBase.machine(model, X, truth.y)
[34mMachine{SossMLJModel{,…}} @383[39m trained 0 times. args: 1: [34mSource @692[39m ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}` 2: [34mSource @213[39m ⏎ `AbstractVector{ScientificTypes.Continuous}`
Fit the machine. This may take several minutes.
fit!(mach)
[34mMachine{SossMLJModel{,…}} @383[39m trained 1 time. args: 1: [34mSource @692[39m ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}` 2: [34mSource @213[39m ⏎ `AbstractVector{ScientificTypes.Continuous}`
Construct the posterior distribution and the joint posterior predictive distribution:
predictor_joint = predict_joint(mach, X)
typeof(predictor_joint)
SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin σ ~ HalfNormal(t) p = size(X, 2) β ~ Normal(0, s) |> iid(p) η = X * β μ = η y ~ For(eachindex(μ)) do j Normal(μ[j], σ) end end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin η = X * β μ = η y ~ For(eachindex(μ)) do j Normal(μ[j], σ) end end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}
Draw a single sample from the joint posterior predictive distribution:
single_sample = rand(predictor_joint; response = :y)
1000-element Vector{Float64}: -0.2058099806640922 -0.357264596055084 0.09028597558099684 0.2885899334316771 0.08108720168829273 0.17558552016483095 0.2583286269280545 0.1384550056644606 0.20483706795878984 -0.05535619223340847 ⋮ -0.008861568836340237 -0.13547858765101395 0.09236390319495555 0.4418599667204133 -0.09529181850164065 -0.027695379188191854 0.005834352962548248 -0.06808782477098402 0.09148433580196842
Evaluate the logpdf of the joint posterior predictive distribution at this sample:
logpdf(predictor_joint, single_sample)
4980.358137350192
True β
:
truth.β
4-element Vector{Float64}: 0.1750854179049791 0.005810011548661296 -0.056136314186577954 0.14284678214993748
Posterior distribution of β
predict_particles(mach, X; response = :β)
4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}: 0.175 ± 5.4e-5 0.00582 ± 5.1e-5 -0.0561 ± 8.6e-5 0.143 ± 8.3e-5
Difference between the posterior distribution of β
to the true values:
truth.β - predict_particles(mach, X; response = :β)
4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}: -8.5e-5 ± 5.4e-5 -1.01e-5 ± 5.1e-5 -2.54e-5 ± 8.6e-5 -4.39e-5 ± 8.3e-5
Compare the joint posterior predictive distribution of μ
to the true values:
truth.μ - predict_particles(mach, X; response = :μ)
1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}: 6.76e-5 ± 0.00013 0.000123 ± 0.00012 -1.99e-5 ± 8.7e-5 -0.000166 ± 0.00011 -3.72e-5 ± 3.7e-5 -6.63e-5 ± 9.3e-5 -0.000145 ± 0.00012 -8.76e-5 ± 0.0001 -8.56e-5 ± 0.0001 -1.15e-5 ± 0.00013 ⋮ -4.04e-5 ± 0.0001 4.63e-5 ± 0.00011 -7.95e-5 ± 0.00011 -0.000176 ± 0.00016 -2.03e-5 ± 0.00013 -5.29e-5 ± 0.00013 -1.81e-6 ± 5.8e-6 7.19e-5 ± 0.00013 -4.57e-5 ± 2.9e-5
Compare the joint posterior predictive distribution of y
to the true values:
truth.y - predict_particles(mach, X; response = :y)
1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}: 0.00112 ± 0.0017 -0.00166 ± 0.0017 -0.00101 ± 0.0017 -0.00136 ± 0.0017 -0.00088 ± 0.0017 -0.00231 ± 0.0017 0.00148 ± 0.0017 -0.00272 ± 0.0017 -0.000735 ± 0.0017 -0.000278 ± 0.0017 ⋮ -0.00112 ± 0.0017 -8.67e-5 ± 0.0017 -0.00131 ± 0.0017 0.00113 ± 0.0017 -0.00125 ± 0.0017 0.000985 ± 0.0017 -0.00253 ± 0.0017 -1.84e-5 ± 0.0017 0.00214 ± 0.0017
Construct each of the marginal posterior predictive distributions:
predictor_marginal = MLJBase.predict(mach, X)
typeof(predictor_marginal)
Vector{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin σ ~ HalfNormal(t) p = size(X, 2) β ~ Normal(0, s) |> iid(p) η = X * β μ = η y ~ For(eachindex(μ)) do j Normal(μ[j], σ) end end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin η = X * β μ = η y ~ For(eachindex(μ)) do j Normal(μ[j], σ) end end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}} (alias for Array{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin σ ~ HalfNormal(t) p = size(X, 2) β ~ Normal(0, s) |> iid(p) η = X * β μ = η y ~ For(eachindex(μ)) do j Normal(μ[j], σ) end end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Array{NamedTuple{(:σ, :β), Tuple{Float64, Array{Float64, 1}}}, 1}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin η = X * β μ = η y ~ For(eachindex(μ)) do j Normal(μ[j], σ) end end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Array{Float64, 2}, Float64, Float64}}}, 1})
predictor_marginal
has one element for each row in X
size(predictor_marginal)
(1000,)
Draw a single sample from each of the marginal posterior predictive distributions:
only.(rand.(predictor_marginal))
1000-element Vector{Float64}: -0.2029081681417486 -0.3576940499452062 0.08648248590455086 0.2910674787640341 0.07939612960203418 0.17392501442911953 0.2561922904988066 0.13815388355650055 0.205974839276187 -0.05591489250599671 ⋮ -0.011053156647649717 -0.14095152150154167 0.08848082228383287 0.43815443326164366 -0.09618528626052235 -0.028975341747410545 0.006729761408040472 -0.07113148258663066 0.09414399184775661
Use cross-validation to evaluate the model with respect to the expected value of the root mean square error (RMSE)
evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_expected, operation=predict_particles)
┌──────────────────┬───────────────┬──────────────────────────────────────────── │ _.measure │ _.measurement │ _.per_fold ⋯ ├──────────────────┼───────────────┼──────────────────────────────────────────── │ RMSExpected @353 │ 0.00241 │ [0.00242, 0.0024, 0.00233, 0.00246, 0.002 ⋯ └──────────────────┴───────────────┴──────────────────────────────────────────── 1 column omitted _.per_observation = [missing] _.fitted_params_per_fold = [ … ] _.report_per_fold = [ … ]
Use cross-validation to evaluate the model with respect to the median of the root mean square error (RMSE)
evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_median, operation=predict_particles)
┌────────────────┬───────────────┬────────────────────────────────────────────── │ _.measure │ _.measurement │ _.per_fold ⋯ ├────────────────┼───────────────┼────────────────────────────────────────────── │ RMSMedian @917 │ 0.00241 │ [0.00248, 0.00239, 0.00244, 0.00237, 0.0023 ⋯ └────────────────┴───────────────┴────────────────────────────────────────────── 1 column omitted _.per_observation = [missing] _.fitted_params_per_fold = [ … ] _.report_per_fold = [ … ]
This page was generated using Literate.jl.