Linear regression
Import the necessary packages:
using Distributions
using MLJBase
using Soss
using SossMLJ
using Statistics

In this example, we fit a Bayesian linear regression model with the canonical link function.
Suppose that we are given a matrix of features X and a column vector of labels y. X has n rows and p columns. y has n elements. We assume that our observation vector y is a realization of a random variable Y. We define μ (mu) as the expected value of Y, i.e. μ := E[Y]. Our model comprises three components:
- The probability distribution of Y: for linear regression, we assume that each Yᵢ follows a normal distribution with mean μᵢ and variance σ².
- The systematic component, which consists of the linear predictor η (eta), which we define as η := Xβ, where β is the column vector of p coefficients.
- The link function g, which provides the following relationship: g(E[Y]) = g(μ) = η = Xβ. It follows that μ = g⁻¹(η), where g⁻¹ denotes the inverse of g. For linear regression, the canonical link function is the identity function. Therefore, when using the canonical link function, μ = g⁻¹(η) = η.
In this model, the parameters that we want to estimate are β and σ. We need to select prior distributions for these parameters. For each βᵢ, where βᵢ denotes the ith component of β, we choose a normal distribution with zero mean and variance s². For σ, we choose a half-normal distribution with scale parameter t. s and t are hyperparameters that we will need to choose.
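Putting these components together, the full generative model can be written in standard notation as:

$$
\begin{aligned}
\beta_i &\sim \mathcal{N}(0,\, s^2), \quad i = 1, \dots, p, \\
\sigma &\sim \mathrm{HalfNormal}(t), \\
\eta &= X\beta, \qquad \mu = g^{-1}(\eta) = \eta, \\
Y_i &\sim \mathcal{N}(\mu_i,\, \sigma^2), \quad i = 1, \dots, n.
\end{aligned}
$$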
We define this model using the Soss probabilistic programming library:
m = @model X, s, t begin
    p = size(X, 2)             # number of features
    β ~ Normal(0, s) |> iid(p) # coefficients
    σ ~ HalfNormal(t)          # dispersion
    η = X * β                  # linear predictor
    μ = η                      # `μ = g⁻¹(η) = η`
    y ~ For(eachindex(μ)) do j
        Normal(μ[j], σ)        # `Yᵢ ~ Normal(mean=μᵢ, variance=σ²)`
    end
end;

Generate some synthetic features. Let us generate two continuous features and two binary categorical features.
num_rows = 1_000
x1 = randn(num_rows)
x2 = randn(num_rows)
x3 = Int.(rand(num_rows) .> 0.5)
x4 = Int.(rand(num_rows) .> 0.5)
X = (x1 = x1, x2 = x2, x3 = x3, x4 = x4)
Define the hyperparameters of our prior distributions:
hyperparams = (s=0.1, t=0.1)

(s = 0.1, t = 0.1,)
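As a quick sanity check, we can draw one forward sample from the prior of the Soss model directly, mirroring the rand(m(args)) call used later. This is an illustrative sketch, not part of the original workflow; X_demo and demo are hypothetical names introduced here.

X_demo = randn(5, 2)  # illustrative design matrix: 5 rows, 2 features
demo = rand(m((X = X_demo, s = hyperparams.s, t = hyperparams.t)))  # one draw of β, σ, and y from the prior
demo.y  # the 5 simulated responses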
Convert the Soss model into a SossMLJModel:
model = SossMLJModel(;
    model = m,
    hyperparams = hyperparams,
    infer = dynamicHMC,
    response = :y,
);

Generate some synthetic labels:
args = merge(model.transform(X), hyperparams)
truth = rand(m(args))

The result truth is a NamedTuple with fields p, η, μ, σ, β, and y (in this run, σ ≈ 0.0017); we compare against these simulated ground-truth values below.
Create an MLJ machine for fitting our model:
mach = MLJBase.machine(model, X, truth.y)

Machine{SossMLJModel{,…}} @383 trained 0 times.
  args:
    1: Source @692 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2: Source @213 ⏎ `AbstractVector{ScientificTypes.Continuous}`
Fit the machine. This may take several minutes.
fit!(mach)

Machine{SossMLJModel{,…}} @383 trained 1 time.
  args:
    1: Source @692 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2: Source @213 ⏎ `AbstractVector{ScientificTypes.Continuous}`
Construct the posterior distribution and the joint posterior predictive distribution:
predictor_joint = predict_joint(mach, X)
typeof(predictor_joint)

SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
        σ ~ HalfNormal(t)
        p = size(X, 2)
        β ~ Normal(0, s) |> iid(p)
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}

Draw a single sample from the joint posterior predictive distribution:
single_sample = rand(predictor_joint; response = :y)

1000-element Vector{Float64}:
-0.2058099806640922
-0.357264596055084
0.09028597558099684
0.2885899334316771
0.08108720168829273
0.17558552016483095
0.2583286269280545
0.1384550056644606
0.20483706795878984
-0.05535619223340847
⋮
-0.008861568836340237
-0.13547858765101395
0.09236390319495555
0.4418599667204133
-0.09529181850164065
-0.027695379188191854
0.005834352962548248
-0.06808782477098402
 0.09148433580196842

Evaluate the logpdf of the joint posterior predictive distribution at this sample:
logpdf(predictor_joint, single_sample)

4980.358137350192
True β:
truth.β

4-element Vector{Float64}:
0.1750854179049791
0.005810011548661296
-0.056136314186577954
 0.14284678214993748

Posterior distribution of β:
predict_particles(mach, X; response = :β)

4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
0.175 ± 5.4e-5
0.00582 ± 5.1e-5
-0.0561 ± 8.6e-5
 0.143 ± 8.3e-5
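Each element of the vector above is a MonteCarloMeasurements.Particles object, so posterior summaries can be extracted with the usual Statistics functions. This is a sketch: β_posterior is a name introduced here for illustration, and it assumes the mean and std methods that MonteCarloMeasurements provides for Particles.

β_posterior = predict_particles(mach, X; response = :β)
mean.(β_posterior)  # posterior mean of each coefficient
std.(β_posterior)   # posterior standard deviation of each coefficient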
Difference between the posterior distribution of β and the true values:

truth.β - predict_particles(mach, X; response = :β)

4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
-8.5e-5 ± 5.4e-5
-1.01e-5 ± 5.1e-5
-2.54e-5 ± 8.6e-5
 -4.39e-5 ± 8.3e-5

Compare the joint posterior predictive distribution of μ to the true values:
truth.μ - predict_particles(mach, X; response = :μ)

1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
6.76e-5 ± 0.00013
0.000123 ± 0.00012
-1.99e-5 ± 8.7e-5
-0.000166 ± 0.00011
-3.72e-5 ± 3.7e-5
-6.63e-5 ± 9.3e-5
-0.000145 ± 0.00012
-8.76e-5 ± 0.0001
-8.56e-5 ± 0.0001
-1.15e-5 ± 0.00013
⋮
-4.04e-5 ± 0.0001
4.63e-5 ± 0.00011
-7.95e-5 ± 0.00011
-0.000176 ± 0.00016
-2.03e-5 ± 0.00013
-5.29e-5 ± 0.00013
-1.81e-6 ± 5.8e-6
7.19e-5 ± 0.00013
 -4.57e-5 ± 2.9e-5

Compare the joint posterior predictive distribution of y to the true values:
truth.y - predict_particles(mach, X; response = :y)

1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
0.00112 ± 0.0017
-0.00166 ± 0.0017
-0.00101 ± 0.0017
-0.00136 ± 0.0017
-0.00088 ± 0.0017
-0.00231 ± 0.0017
0.00148 ± 0.0017
-0.00272 ± 0.0017
-0.000735 ± 0.0017
-0.000278 ± 0.0017
⋮
-0.00112 ± 0.0017
-8.67e-5 ± 0.0017
-0.00131 ± 0.0017
0.00113 ± 0.0017
-0.00125 ± 0.0017
0.000985 ± 0.0017
-0.00253 ± 0.0017
-1.84e-5 ± 0.0017
 0.00214 ± 0.0017

Construct each of the marginal posterior predictive distributions:
predictor_marginal = MLJBase.predict(mach, X)
typeof(predictor_marginal)

Vector{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
        σ ~ HalfNormal(t)
        p = size(X, 2)
        β ~ Normal(0, s) |> iid(p)
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}} (alias for Array{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
        σ ~ HalfNormal(t)
        p = size(X, 2)
        β ~ Normal(0, s) |> iid(p)
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Array{NamedTuple{(:σ, :β), Tuple{Float64, Array{Float64, 1}}}, 1}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Array{Float64, 2}, Float64, Float64}}}, 1})

predictor_marginal has one element for each row in X:
size(predictor_marginal)

(1000,)
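For example, a single draw for one row can be obtained element-wise (a sketch mirroring the broadcasted call shown next):

only(rand(predictor_marginal[1]))  # one draw from the posterior predictive for row 1 of X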
Draw a single sample from each of the marginal posterior predictive distributions:
only.(rand.(predictor_marginal))

1000-element Vector{Float64}:
-0.2029081681417486
-0.3576940499452062
0.08648248590455086
0.2910674787640341
0.07939612960203418
0.17392501442911953
0.2561922904988066
0.13815388355650055
0.205974839276187
-0.05591489250599671
⋮
-0.011053156647649717
-0.14095152150154167
0.08848082228383287
0.43815443326164366
-0.09618528626052235
-0.028975341747410545
0.006729761408040472
-0.07113148258663066
 0.09414399184775661

Use cross-validation to evaluate the model with respect to the expected value of the root mean square error (RMSE):
evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_expected, operation=predict_particles)

┌──────────────────┬───────────────┬────────────────────────────────────────────
│ _.measure │ _.measurement │ _.per_fold ⋯
├──────────────────┼───────────────┼────────────────────────────────────────────
│ RMSExpected @353 │ 0.00241 │ [0.00242, 0.0024, 0.00233, 0.00246, 0.002 ⋯
└──────────────────┴───────────────┴────────────────────────────────────────────
1 column omitted
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
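For reference, the quantity being summarized is the standard root mean square error between the observations y and point predictions ŷ (the textbook definition; rms_expected and rms_median report, respectively, the expected value and the median of this quantity under the posterior predictive distribution):

$$
\mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
$$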
Use cross-validation to evaluate the model with respect to the median of the root mean square error (RMSE):
evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_median, operation=predict_particles)

┌────────────────┬───────────────┬──────────────────────────────────────────────
│ _.measure │ _.measurement │ _.per_fold ⋯
├────────────────┼───────────────┼──────────────────────────────────────────────
│ RMSMedian @917 │ 0.00241 │ [0.00248, 0.00239, 0.00244, 0.00237, 0.0023 ⋯
└────────────────┴───────────────┴──────────────────────────────────────────────
1 column omitted
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
This page was generated using Literate.jl.