Linear regression
Import the necessary packages:
using Distributions
using MLJBase
using Soss
using SossMLJ
using Statistics

In this example, we fit a Bayesian linear regression model with the canonical link function.
Suppose that we are given a matrix of features X and a column vector of labels y. X has n rows and p columns. y has n elements. We assume that our observation vector y is a realization of a random variable Y. We define μ (mu) as the expected value of Y, i.e. μ := E[Y]. Our model comprises three components:
- The probability distribution of Y: for linear regression, we assume that each Yᵢ follows a normal distribution with mean μᵢ and variance σ².
- The systematic component, which consists of the linear predictor η (eta), defined as η := Xβ, where β is the column vector of p coefficients.
- The link function g, which provides the relationship g(E[Y]) = g(μ) = η = Xβ. It follows that μ = g⁻¹(η), where g⁻¹ denotes the inverse of g. For linear regression, the canonical link function is the identity function; therefore, when using the canonical link function, μ = g⁻¹(η) = η.
In this model, the parameters that we want to estimate are β and σ. We need to select prior distributions for these parameters. For each βᵢ we choose a normal distribution with zero mean and variance s². Here, βᵢ denotes the ith component of β. For σ, we will choose a half-normal distribution with variance t². s and t are hyperparameters that we will need to choose.
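In summary, the full generative model can be written in display form as follows (a restatement of the assumptions above, where 𝒩(m, v) denotes a normal distribution with mean m and variance v, and HalfNormal(t) is the half-normal prior on σ just described):

\begin{aligned}
\beta_i &\sim \mathcal{N}(0,\, s^2), \qquad i = 1, \dots, p \\
\sigma &\sim \mathrm{HalfNormal}(t) \\
\mu &= g^{-1}(X\beta) = X\beta \\
Y_i &\sim \mathcal{N}(\mu_i,\, \sigma^2), \qquad i = 1, \dots, n
\end{aligned}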
We define this model using the Soss probabilistic programming library:
m = @model X, s, t begin
    p = size(X, 2)              # number of features
    β ~ Normal(0, s) |> iid(p)  # coefficients
    σ ~ HalfNormal(t)           # dispersion
    η = X * β                   # linear predictor
    μ = η                       # `μ = g⁻¹(η) = η`
    y ~ For(eachindex(μ)) do j
        Normal(μ[j], σ)         # `Yᵢ ~ Normal(mean=μᵢ, variance=σ²)`
    end
end;

Generate some synthetic features. Let us generate two continuous features and two binary categorical features.
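As an aside, the Int.(rand(num_rows) .> 0.5) thresholding idiom used below for the binary features is equivalent to drawing Bernoulli(0.5) indicators. With Distributions already loaded, a self-contained alternative sketch (x3_alt is our own name, not part of the example):

using Distributions
num_rows = 1_000
x3_alt = Int.(rand(Bernoulli(0.5), num_rows))  # rand(Bernoulli(0.5), n) returns a Vector{Bool}; Int.(…) gives 0/1 codes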
num_rows = 1_000
x1 = randn(num_rows)
x2 = randn(num_rows)
x3 = Int.(rand(num_rows) .> 0.5)
x4 = Int.(rand(num_rows) .> 0.5)
X = (x1 = x1, x2 = x2, x3 = x3, x4 = x4)

(x1 = [2.197270666426264, 0.4138312695688653, -1.1541158196208916, 0.25993995089503064, 0.023035746206824557, 0.2545133890020534, 0.7388187983089918, -0.1766994816071766, 0.5517505743453249, -1.5914451912287733 … -1.179179384709137, 0.1133234538607634, -1.6788062214861181, -0.13165206373061364, -0.09758635217400706, 1.537857647343547, 1.4446217614211656, -0.6871662188147665, -0.5264055613343627, -0.48705200332165277],
 x2 = [0.3701541233677246, 0.07827622305042344, -0.5886498879333973, 0.37406754105414847, -0.6261590779051691, -0.3245275877931951, -0.4148506357463219, -0.3359469033551532, 0.5675768585856337, -1.5536598616527424 … 0.6673668337994053, 0.630034052238109, -1.256094909781851, -0.05153270178008876, -0.7029831038354106, -0.42621555652724913, -0.1455920964369708, -0.18305641741415643, -0.042934636491973106, 1.6418874742818406],
 x3 = [0, 0, 1, 0, 1, 0, 0, 0, 1, 1 … 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
 x4 = [1, 1, 0, 0, 1, 0, 0, 1, 1, 1 … 1, 1, 1, 0, 1, 0, 1, 0, 0, 1],)
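Note that X is a column table (a NamedTuple of vectors), the form that MLJ expects, while the Soss model itself operates on an n × p matrix; the model's transform (used below via model.transform) handles the conversion. Conceptually, the conversion amounts to the following sketch (our illustration, not necessarily SossMLJ's exact implementation):

X_matrix = hcat(x1, x2, x3, x4)  # 1000×4 Matrix{Float64}; the Int columns are promoted to Float64
size(X_matrix)                   # (1000, 4): n rows, p columns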
Define the hyperparameters of our prior distributions:
hyperparams = (s=0.1, t=0.1)

(s = 0.1, t = 0.1,)
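With s = t = 0.1, these priors are fairly tight around zero. As a quick sanity check (our own illustration, using quantile from Distributions), the central 95% prior interval for each coefficient βᵢ is:

quantile.(Normal(0, 0.1), (0.025, 0.975))  # ≈ (-0.196, 0.196)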
Convert the Soss model into a SossMLJModel:
model = SossMLJModel(;
    model = m,
    hyperparams = hyperparams,
    infer = dynamicHMC,
    response = :y,
);

Generate some synthetic labels:
args = merge(model.transform(X), hyperparams)
truth = rand(m(args))

(p = 4,
 η = [-0.11150566369623519, -0.047564538230596594, 0.11283955141257997, -0.018730990627310956, 0.04474126970354958, 0.0012702577751921833, -0.011279190425107315, -0.01737695967771414, -0.005646255253769855, 0.12145399365241459 … 0.04553745378815674, 0.006262547223902043, 0.11573288854442745, 0.06568402971580438, -0.00942629679293882, -0.03589190783993686, -0.07337685172665509, 0.02664082871704342, 0.07775889071471921, -0.003726492193868655],
 μ = [-0.11150566369623519, -0.047564538230596594, 0.11283955141257997, -0.018730990627310956, 0.04474126970354958, 0.0012702577751921833, -0.011279190425107315, -0.01737695967771414, -0.005646255253769855, 0.12145399365241459 … 0.04553745378815674, 0.006262547223902043, 0.11573288854442745, 0.06568402971580438, -0.00942629679293882, -0.03589190783993686, -0.07337685172665509, 0.02664082871704342, 0.07775889071471921, -0.003726492193868655],
 σ = 0.07225427941267197,
 β = [-0.031206677653237227, -0.02838826469841777, 0.06011268222182188, -0.032428113058783285],
 y = [-0.15999920688273342, -0.010576887605155579, 0.1432688145784165, -0.057297240307195345, 0.06339359413249293, -0.07515270149588585, -0.018520881842564133, 0.14963780640250482, 0.045739247131950475, 0.08771443934606689 … 0.048845353990389265, -0.09412325449873193, 0.1386056359097484, 0.07706541700139062, 0.07320607880590566, 0.012210103620919892, -0.06080456747473415, -0.05775996424380779, 0.10358038240423717, 0.028436889180012845],)
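Here rand(m(args)) draws one joint sample of every unobserved quantity in the model and returns it as a NamedTuple, so the individual components can be accessed by name:

truth.σ          # the sampled dispersion, 0.07225427941267197 above
length(truth.y)  # 1000, one label per row of X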
Create an MLJ machine for fitting our model:
mach = MLJBase.machine(model, X, truth.y)

Machine{SossMLJModel{,…}} @819 trained 0 times.
  args:
    1: Source @863 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2: Source @499 ⏎ `AbstractVector{ScientificTypes.Continuous}`
Fit the machine. This runs the inference routine specified above (dynamicHMC) and may take several minutes.
fit!(mach)

Machine{SossMLJModel{,…}} @819 trained 1 time.
  args:
    1: Source @863 ⏎ `ScientificTypes.Table{Union{AbstractVector{ScientificTypes.Continuous}, AbstractVector{ScientificTypes.Count}}}`
    2: Source @499 ⏎ `AbstractVector{ScientificTypes.Continuous}`
Construct the posterior distribution and the joint posterior predictive distribution:
predictor_joint = predict_joint(mach, X)
typeof(predictor_joint)

SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
        σ ~ HalfNormal(t)
        p = size(X, 2)
        β ~ Normal(0, s) |> iid(p)
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}

Draw a single sample from the joint posterior predictive distribution:
single_sample = rand(predictor_joint; response = :y)

1000-element Vector{Float64}:
-0.11545644783819184
-0.012361037251430701
0.1691934149423639
-0.022025969957086118
0.1428815030365233
-0.022052418983513303
-0.0456874986136417
0.005967250908713476
0.03205562846231033
0.06893048772110236
⋮
0.050358644828681895
0.1598466387548342
0.05264635486065514
0.08321108987507014
0.020701160690197104
-0.03961829175521227
0.09207813020380065
0.17082374720576177
-0.005827642594057962

Evaluate the logpdf of the joint posterior predictive distribution at this sample:
logpdf(predictor_joint, single_sample)

1156.5604431835402

(Since this is a log probability density rather than a log probability, positive values are possible.)
True β:
truth.β

4-element Vector{Float64}:
-0.031206677653237227
-0.02838826469841777
0.06011268222182188
-0.032428113058783285

Posterior distribution of β:
predict_particles(mach, X; response = :β)

4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
-0.0331 ± 0.0025
-0.0288 ± 0.0022
0.0503 ± 0.0038
-0.0314 ± 0.0039

Difference between the posterior distribution of β and the true values:
truth.β - predict_particles(mach, X; response = :β)

4-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
0.00185 ± 0.0025
0.000376 ± 0.0022
0.00979 ± 0.0038
-0.00102 ± 0.0039

Compare the joint posterior predictive distribution of μ to the true values:
truth.μ - predict_particles(mach, X; response = :μ)

1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
0.00319 ± 0.0066
-0.000228 ± 0.004
0.00743 ± 0.0048
0.000622 ± 0.001
0.00857 ± 0.0038
0.00035 ± 0.00097
0.00121 ± 0.0021
-0.00148 ± 0.004
0.01 ± 0.0043
0.00523 ± 0.0061
⋮
0.00921 ± 0.0041
0.00518 ± 0.006
0.00953 ± 0.0038
-0.00147 ± 0.0042
0.00269 ± 0.004
0.0016 ± 0.0052
-0.00134 ± 0.0017
0.0088 ± 0.004
0.00848 ± 0.0057

Compare the joint posterior predictive distribution of y to the true values:
truth.y - predict_particles(mach, X; response = :y)

1000-element Vector{MonteCarloMeasurements.Particles{Float64, 1000}}:
-0.0454 ± 0.073
0.0368 ± 0.072
0.0379 ± 0.072
-0.0379 ± 0.072
0.0272 ± 0.073
-0.076 ± 0.072
-0.00605 ± 0.072
0.166 ± 0.072
0.0614 ± 0.073
-0.0286 ± 0.073
⋮
-0.0911 ± 0.072
0.028 ± 0.072
0.0209 ± 0.072
0.0811 ± 0.073
0.0508 ± 0.072
0.0141 ± 0.072
-0.0856 ± 0.072
0.0347 ± 0.072
0.0406 ± 0.073

Construct each of the marginal posterior predictive distributions:
predictor_marginal = MLJBase.predict(mach, X)
typeof(predictor_marginal)

Vector{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
        σ ~ HalfNormal(t)
        p = size(X, 2)
        β ~ Normal(0, s) |> iid(p)
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Vector{NamedTuple{(:σ, :β), Tuple{Float64, Vector{Float64}}}}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Matrix{Float64}, Float64, Float64}}}} (alias for Array{SossMLJ.SossMLJPredictor{SossMLJModel{SossMLJ.SossMLJPredictor, Soss.Model{NamedTuple{(:X, :s, :t), T} where T<:Tuple, TypeEncoding(begin
        σ ~ HalfNormal(t)
        p = size(X, 2)
        β ~ Normal(0, s) |> iid(p)
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:s, :t), Tuple{Float64, Float64}}, typeof(Soss.dynamicHMC), Symbol, typeof(SossMLJ.default_transform)}, Array{NamedTuple{(:σ, :β), Tuple{Float64, Array{Float64, 1}}}, 1}, Soss.Model{NamedTuple{(:X, :σ, :β), T} where T<:Tuple, TypeEncoding(begin
        η = X * β
        μ = η
        y ~ For(eachindex(μ)) do j
            Normal(μ[j], σ)
        end
    end), TypeEncoding(Main.ex-example-linear-regression)}, NamedTuple{(:X, :s, :t), Tuple{Array{Float64, 2}, Float64, Float64}}}, 1})

predictor_marginal has one element for each row in X:
size(predictor_marginal)

(1000,)
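Each element of predictor_marginal is the posterior predictive distribution for one row of X, so it can be sampled individually; note that rand on a single element returns a length-1 collection, hence the only below. For example:

only(rand(predictor_marginal[1]))  # one draw from the marginal predictive distribution for row 1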
Draw a single sample from each of the marginal posterior predictive distributions:
only.(rand.(predictor_marginal))

1000-element Vector{Float64}:
-0.1230746184619088
-0.057404377317192834
0.21208130547482443
-0.14165790337285958
0.0990016609144794
0.09415726293437164
-0.041550372506801145
-0.084082193263741
0.106627129916683
0.16259805982752218
⋮
0.009533288413319924
0.18378567233452134
0.06429967921011412
-0.0012213393664031276
0.0009755055107716626
-0.16096731031085287
0.10866295491970496
0.13569664192385855
-0.025411104329098315

Use cross-validation to evaluate the model with respect to the expected value of the root mean square error (RMSE):
evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_expected, operation=predict_particles)

┌──────────────────┬───────────────┬────────────────────────────────────────────
│ _.measure        │ _.measurement │ _.per_fold                                ⋯
├──────────────────┼───────────────┼────────────────────────────────────────────
│ RMSExpected @474 │ 0.102         │ [0.101, 0.102, 0.105, 0.102, 0.104, 0.102 ⋯
└──────────────────┴───────────────┴────────────────────────────────────────────
                                                                 1 column omitted

_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
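For reference, the root mean square error of point predictions ŷ against labels y is sqrt(mean((yᵢ - ŷᵢ)²)); rms_expected and rms_median summarize this quantity over the posterior predictive distribution (its expected value and median, respectively). A generic sketch of the underlying measure (our own helper, independent of MLJ's implementation):

rmse(y, ŷ) = sqrt(mean(abs2, y .- ŷ))  # uses Statistics.mean, imported at the top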
Use cross-validation to evaluate the model with respect to the median of the root mean square error (RMSE):

evaluate!(mach, resampling=CV(; nfolds = 6, shuffle = true), measure=rms_median, operation=predict_particles)

┌────────────────┬───────────────┬──────────────────────────────────────────────
│ _.measure      │ _.measurement │ _.per_fold                                  ⋯
├────────────────┼───────────────┼──────────────────────────────────────────────
│ RMSMedian @549 │ 0.102         │ [0.101, 0.0977, 0.104, 0.103, 0.102, 0.106] ⋯
└────────────────┴───────────────┴──────────────────────────────────────────────

_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
This page was generated using Literate.jl.