
Book Note: Introduction to Hierarchical Bayesian Modeling for Ecological Data, Chapter 1

Introduction

So I’ve learned a little bit about Bayesian hierarchical modeling at the FDA and decided to put down my thoughts and write about it to reinforce what I’ve learned. I also want to try out some new JavaScript data visualization libraries.

A great book I’ve found is “Introduction to Hierarchical Bayesian Modeling for Ecological Data” by Parent and Rivot [1].

While at the FDA I coded my own model in R without using any MCMC framework, and it was very slow. I realized I need an MCMC framework in my toolbelt. After some research I decided on Stan, via the rstan R package.

The Graph (not a DAG; not a Bayesian network)

The graph represents the salmon migration and birth cycle. Each edge represents the passing of a year. The nodes are squares because they are given: the information comes from prior knowledge.

| Variable | Definition |
| --- | --- |
| W_t | Salmon eggs |
| 0+ | Young-of-the-year (hatched) |
| PSm | Pre-smolts |
| Sm1 | Smolts after 1 year |
| Sp1 | Spawners that return one year earlier than Sp2 |
| Parr1 | Smaller juveniles left behind by Sm1 |
| Sm2 | Smolts after 2 years |
| Sp2 | Spawners that return one year after Sp1 |

The Models - Introducing Probability into the Graph

We’re going to take the graph that represents the salmon’s migration cycle and introduce uncertainty into it (model it via probability). By doing this we create a new graph that is completely different from the salmon’s migration cycle graph: it is a graph based on a probabilistic view.

We go through the square nodes one by one and apply a model, a probability distribution, to each.

The model is time based; t represents a particular year.

  • Sp_t = Sp1_t + Sp2_t = number of spawners in year t
  • W_t = number of eggs spawned by the adults returning in year t
  • 0+_t = young-of-the-year in year t
  • PSm_t = pre-smolts in year t
  • Sm1_t = 1+ smolts (1 year to smolt) in year t
  • P1_t = Parr1_t = smaller juveniles left behind by Sm1 in year t
  • Sm2_t = 2+ smolts (2 years to smolt) in year t

Spawners -> Eggs

W_t = Sp_t ⋅ P_f ⋅ fec

  • W_t = number of eggs spawned by the adults returning in year t
  • Sp_t = number of spawners = Sp1_t + Sp2_t
  • P_f = proportion of females
  • fec = mean fecundity
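As a minimal sketch of this step (the numbers below are made up for illustration, since the chapter gives no data), the egg production is just a product:

```python
# Deterministic spawners -> eggs step: W_t = Sp_t * P_f * fec
# All values below are hypothetical placeholders, not data from the book.
sp_t = 1_200    # Sp_t: total spawners returning in year t (Sp1_t + Sp2_t)
p_f = 0.5       # P_f: proportion of females
fec = 4_500.0   # fec: mean fecundity (eggs per female)

w_t = sp_t * p_f * fec  # W_t: eggs spawned in year t
print(f"W_t = {w_t:,.0f} eggs")
```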

Eggs -> 0+ juveniles

This is the Ricker curve model with parameters (α, β), a classic discrete-time population model.

0+_{t+1} = α ⋅ W_t ⋅ e^(−β ⋅ W_t) ⋅ e^(ε_t), where ε_t ~iid N(0, σ²)

  • 0+_{t+1} = freshwater production of juveniles resulting from the reproduction of the spawners returning in year t
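Here is a minimal simulation sketch of that step, with lognormal process noise. The parameter values are hypothetical placeholders, not estimates from the book:

```python
import numpy as np

rng = np.random.default_rng(42)

# Ricker curve with lognormal noise: 0+_{t+1} = alpha * W_t * exp(-beta * W_t) * exp(eps_t)
alpha = 0.02   # density-independent productivity (hypothetical)
beta = 2e-7    # density-dependence strength (hypothetical)
sigma = 0.3    # standard deviation of the log-scale noise eps_t (hypothetical)

def ricker_step(w_t: float) -> float:
    """Simulate next year's 0+ juveniles from W_t eggs."""
    eps_t = rng.normal(0.0, sigma)
    return alpha * w_t * np.exp(-beta * w_t) * np.exp(eps_t)

print(ricker_step(w_t=2_700_000))  # e.g. the eggs computed in the previous step
```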

0+ juveniles -> Smolts

PSm_{t+2} ~ Binomial(0+_{t+1}, γ_{0+}) = number of 0+_{t+1} juveniles that survive and migrate to become pre-smolts PSm_{t+2}

Sm1_{t+2} ~ Binomial(PSm_{t+2}, θ_{Sm1}) = number of PSm_{t+2} that survive and migrate as 1+ smolts (Sm1)

Sm2_{t+3} ~ Binomial(Parr1_{t+2}, γ_{Parr1}) = number of Parr1_{t+2} that survive and migrate as 2+ smolts (Sm2)

  • PSm_{t+2} = young-of-the-year 0+_{t+1} that survive to the next spring, year t+2
  • γ_{0+} = survival rate of the 0+ juveniles
  • θ_{Sm1} = proportion of pre-smolts that migrate as 1+ smolts (survival rate)
  • γ_{Parr1} = survival rate of Parr1

Sp1_{t+3} ~ Binomial(Sm1_{t+2}, γ_{Sm})

Sp2_{t+4} ~ Binomial(Sm2_{t+3}, γ_{Sm})

  • γ_{Sm} = survival rate of smolts until they return as spawners (shared by both smolt cohorts)
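A minimal sketch of these binomial survival/migration transitions for a single cohort. The survival rates are made-up placeholders, and Parr1 is taken as the pre-smolts left behind by the 1+ smolts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survival / migration rates (illustration only, not from the book)
gamma_0plus = 0.10   # gamma_0+: 0+ juvenile survival to pre-smolt
theta_sm1 = 0.60     # theta_Sm1: proportion of pre-smolts migrating as 1+ smolts
gamma_parr1 = 0.40   # gamma_Parr1: Parr1 survival to 2+ smolt
gamma_sm = 0.15      # gamma_Sm: smolt survival to returning spawner

n_0plus = 30_000                              # 0+_{t+1}, e.g. from the Ricker step
psm = rng.binomial(n_0plus, gamma_0plus)      # PSm_{t+2}
sm1 = rng.binomial(psm, theta_sm1)            # Sm1_{t+2}: 1+ smolts
parr1 = psm - sm1                             # Parr1_{t+2}: juveniles left behind by Sm1
sm2 = rng.binomial(parr1, gamma_parr1)        # Sm2_{t+3}: 2+ smolts
sp1 = rng.binomial(sm1, gamma_sm)             # Sp1_{t+3}: early-returning spawners
sp2 = rng.binomial(sm2, gamma_sm)             # Sp2_{t+4}: late-returning spawners
print(psm, sm1, parr1, sm2, sp1, sp2)
```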

Learning from observations

These two quantities are known, one observed and one given:

  • C_{Sm1,t} = observations = number of 1+ smolts caught in the downstream trap

  • π_{Sm} = trap efficiency

Using these data points we can figure out the unknowns.

Our unknowns, the parameters, are: α, β, σ, γ_{0+}, θ_{Sm1}, γ_{Parr1}, γ_{Sm}

The number of smolts caught in the downstream trap can be modeled with a binomial distribution: each smolt is either caught or not.

C_{Sm1,t} ~ Binomial(Sm1_t, π_{Sm})

*Note (advanced): the observations are assumed to satisfy the Bayesian property of exchangeability.
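A minimal sketch of this observation step, treating the trap as a binomial thinning of the true smolt run. Both the trap efficiency and the latent smolt count below are hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(1)

pi_sm = 0.25    # pi_Sm: trap efficiency (treated as given; this value is made up)
sm1_t = 1_800   # Sm1_t: true (latent) number of 1+ smolts in year t (made up)

# Each smolt is caught independently with probability pi_sm
c_sm1_t = rng.binomial(sm1_t, pi_sm)
print(f"C_Sm1,t = {c_sm1_t} smolts caught at the downstream trap")
```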

The Models - Creating a probability graphical model

0+_{t+1} = α ⋅ W_t ⋅ e^(−β ⋅ W_t) ⋅ e^(ε_t), where ε_t ~iid N(0, σ²)

  • 0+_{t+1} = freshwater production of juveniles resulting from the reproduction of the spawners returning in year t

PSm_{t+2} ~ Binomial(0+_{t+1}, γ_{0+}) = number of 0+_{t+1} juveniles that survive and migrate to become pre-smolts PSm_{t+2}

  • γ_{0+} = survival rate of the 0+ juveniles

Sm1_{t+2} ~ Binomial(PSm_{t+2}, θ_{Sm1}) = number of PSm_{t+2} that survive and migrate as 1+ smolts (Sm1)

  • PSm_{t+2} = young-of-the-year 0+_{t+1} that survive to the next spring, year t+2
  • θ_{Sm1} = proportion of pre-smolts that migrate as 1+ smolts (survival rate)

Sm2_{t+3} ~ Binomial(Parr1_{t+2}, γ_{Parr1}) = number of Parr1_{t+2} that survive and migrate as 2+ smolts (Sm2)

Sp1_{t+3} ~ Binomial(Sm1_{t+2}, γ_{Sm})

Sp2_{t+4} ~ Binomial(Sm2_{t+3}, γ_{Sm})

Now we put all the parts together into one graph.

Okay, now that we have the probabilistic graphical model down, we can write out the joint probability distribution.

P(J_t) = ?

Step 1. Looking at the graph, we’re going to start with all nodes with no parent: α, β, σ, W_t, γ_{0+}, θ_{Sm1}, γ_{Parr1}, and γ_{Sm}.

P(J_t) = P[α] ⋅ P[β] ⋅ P[σ] ⋅ P[W_t] ⋅ P[γ_{0+}] ⋅ P[θ_{Sm1}] ⋅ P[γ_{Parr1}] ⋅ P[γ_{Sm}] …

Step 2. Now we’re going to look at the nodes with parents.

  • 0+_{t+1} is P[0+_{t+1} | W_t, α, β, σ]
  • PSm_{t+2} is P[PSm_{t+2} | 0+_{t+1}, γ_{0+}]
  • Sm1_{t+2} & Parr1_{t+2} is P[Sm1_{t+2}, Parr1_{t+2} | PSm_{t+2}, θ_{Sm1}]. Notice how this one is more complex: Sm1 and Parr1 share the same parent PSm_{t+2} and the same parameter θ_{Sm1}, so they appear together in one joint term.
  • P[Sp1_{t+3} | Sm1_{t+2}, γ_{Sm}]
  • P[Sm2_{t+3} | Parr1_{t+2}, γ_{Parr1}]
  • P[Sp2_{t+4} | Sm2_{t+3}, γ_{Sm}]

P(J_t) = P[α] ⋅ P[β] ⋅ P[σ] ⋅ P[W_t] ⋅ P[γ_{0+}] ⋅ P[θ_{Sm1}] ⋅ P[γ_{Parr1}] ⋅ P[γ_{Sm}]
                 ⋅ P[0+_{t+1} | W_t, α, β, σ] ⋅ P[PSm_{t+2} | 0+_{t+1}, γ_{0+}] ⋅ P[Sm1_{t+2}, Parr1_{t+2} | PSm_{t+2}, θ_{Sm1}]
                 ⋅ P[Sp1_{t+3} | Sm1_{t+2}, γ_{Sm}] ⋅ P[Sm2_{t+3} | Parr1_{t+2}, γ_{Parr1}] ⋅ P[Sp2_{t+4} | Sm2_{t+3}, γ_{Sm}]

Okay so what? Where’s the Bayesian network?

Not yet. The book needs to introduce the concept of a simple model vs a hierarchical model and some terminology.

So far we haven’t introduced any observation variables (random variables) at all.

  • θ represents parameters
  • Z represents latent variables
  • Y represents the observed random variable (lowercase y represents the realization/sample of the random variable Y)

The left graph is a simple model; the right graph is a hierarchical model.

Z represents latent variables (nuisance variables): variables we don’t really care about in themselves, and which are hidden, we don’t observe them directly the way we observe Y. Y represents the observations. Observations are random, so Y is capitalized, and lowercase y is the realization of Y, a sample of Y. The Y node is pink because it is observable.

P[θ, Z, Y] = P[θ] ⋅ P[Z | θ] ⋅ P[Y | θ, Z]

Well first what’s the experiment?

For this model it’s salmon captures, and they’re modeled with a binomial distribution: either you catch the fish or you don’t.

Notice the C stands for catches.

  • C_{0+,t+1} ~ Binomial(0+_{t+1}, π_{0+})
  • C_{Sm1,t+2} ~ Binomial(Sm1_{t+2}, π_{Sm})
  • C_{Sm2,t+3} ~ Binomial(Sm2_{t+3}, π_{Sm})
  • C_{Sp1,t+3} ~ Binomial(Sp1_{t+3}, π_{Sp})
  • C_{Sp2,t+4} ~ Binomial(Sp2_{t+4}, π_{Sp})

Once again the squares represent known/given values (the π’s are given). The pink circles represent observed values. Pink in general means the value is known, either because it is given or because it is observed. The purple boxes group the nodes into their respective groups.
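To tie this back to the simple vs. hierarchical terminology, here is my reading of how the salmon model maps onto the θ / Z / Y layers (this grouping is my own summary, not notation taken verbatim from the book):

```latex
\begin{align*}
\theta &= \{\alpha,\ \beta,\ \sigma,\ \gamma_{0+},\ \theta_{Sm1},\ \gamma_{Parr1},\ \gamma_{Sm}\} && \text{(parameters)}\\
Z &= \{0^{+}_{t+1},\ PSm_{t+2},\ Sm1_{t+2},\ Parr1_{t+2},\ Sm2_{t+3},\ Sp1_{t+3},\ Sp2_{t+4}\} && \text{(latent population states)}\\
Y &= \{C_{0+,t+1},\ C_{Sm1,t+2},\ C_{Sm2,t+3},\ C_{Sp1,t+3},\ C_{Sp2,t+4}\} && \text{(observed catches)}
\end{align*}
```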

Ok. Finally, we got a Bayesian network. Really, what now?

How does y (the sample or realization of Y) fit into this fancy graph?

What happens when the observations are available?

Before that, notice how we built the model and the direction we followed. The direction is downward, from the salmon cycle toward the latent variables and then toward the observations.

Why did the book bring this up? Because when you train the model using the data/observations that are available, you go in the opposite direction.

You start at Y (the observation layer, where Y is a random variable) and set Y = y, since lowercase y is the realization of the random variable Y: y is a sample of Y, the data (actual values, not just placeholder variables). Then you go up to the latent layer and on to the parameter layer.

Let’s see it mathematically:

Here’s the joint probability:

P[Y, θ, Z]

Now here’s the distribution conditioned on Y = y, which is what we use when we have data to train the model and find the parameters.

P[θ, Z | Y = y]

Given Y = y, the information propagates upward from the observation layer to the latent layer to the parameter layer.

This is how you train the model after you are done creating the model.

You can see the Bayes’ Rule connection too, right? We’re always dealing with joint probabilities and conditional probabilities.

The Bayesian graphical structure makes the nodes conditionally independent given their parents; this is one of the properties that makes the factorization above work.

This is now a posterior distribution. Posterior being after the data. Prior distribution is before the data.

P[θ, Z | Y = y] = posterior distribution

I’m going to repeat it again.

Posterior is after the data have been put in.

Prior is before the data. It is your prior belief.

In Bayesian statistics you need to supply a belief in the form of a prior distribution. It feels weird, but don’t worry: if you don’t know anything, you can use a noninformative prior distribution.

The prior is also a way to encode expert belief.

So given what we have now, we just have to apply Bayes’ Rule to the conditional probability and we get the posterior for our parameters.

Bayes’ Rule

P[θ, Z | Y = y] = P[θ, Z, Y = y] / P[Y = y]

Since P[Y = y] is just a normalizing constant that does not depend on θ or Z, we can drop it and write:

P[θ, Z | Y = y] ∝ P[θ] ⋅ P[Z | θ] ⋅ P[Y = y | θ, Z]
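As a toy illustration of “posterior ∝ prior × likelihood” (deliberately not the book’s full salmon model: just a single binomial survival rate with a known number of trials, evaluated on a grid):

```python
import numpy as np
from scipy.stats import beta, binom

# Toy posterior-on-a-grid example of: posterior ∝ prior × likelihood.
# Suppose n smolts were released and y returned as spawners, and we want
# the posterior of a single survival rate gamma. (Numbers are made up.)
n, y = 500, 60

grid = np.linspace(0.001, 0.999, 999)        # candidate values of gamma
prior = beta.pdf(grid, 1, 1)                 # flat (noninformative) prior
likelihood = binom.pmf(y, n, grid)           # P[Y = y | gamma]
unnorm_post = prior * likelihood             # posterior up to a constant
posterior = unnorm_post / np.trapz(unnorm_post, grid)  # normalize numerically

print("posterior mean of gamma:", np.trapz(grid * posterior, grid))
```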

Conclusion

I highly recommend this book. Andrew Gelman’s BDA (Bayesian Data Analysis) is more at the PhD level, and his approach is not graphical like this but more math-heavy. Being visual, this book helps a lot in tying things together.

There were no observations/data and no code for this chapter. Ah, dang it. Well, until next time, stay tuned for the next episode of Bayesian man.

[1] Buy the book if you like what you see in this post. This is basically my notes on Chapter 1 of the book. It’s an amazing book and I highly recommend it.

What did I learn about myself

I’m glad I’m reviewing this chapter of the book again. I have a confession to make: if I want to understand a subject, I need to read the material three times, do projects on it, and review what I’ve learned. I need tons of practice. I guess this is one of the reasons why I started this blog.

This chapter ties together DAGs, Bayes’ Rule, and conditional probabilities. It was a good refresher and cleared up things I was wrong about. Especially the salmon breeding cycle: I hadn’t thought about the fact that it isn’t a DAG, and that from that life-cycle model we create a Bayesian graphical model (a DAG).

I think I’ll go through each chapter of this book as a refresher while playing with JavaScript graphing libraries and hopefully learning Stan. I need to make sure I didn’t miss anything from the first reading.

What Did I Get to Practice? (for me)

  1. Bayesian hierarchical modeling using rstan.
  2. Tried out a JavaScript data visualization library, cytoscape.js, for drawing graphs.
  3. Got to refresh Bayesian graphical models (Bayesian networks).

Rough Roadmap for Bayesian HM

  1. Finish off this book. Introduction to Hierarchical Bayesian Modeling for Ecological Data (Chapman & Hall/CRC Applied Environmental Statistics)
  2. Read this for Hamiltonian Monte Carlo: Gill, Jeff. Bayesian Methods: A Social and Behavioral Sciences Approach. CRC Press (2014), Statistics in the Social and Behavioral Sciences series.
  3. Read https://arxiv.org/abs/1111.4246, an implementation of HMC (the No-U-Turn Sampler).
  4. Measure theory videos
  5. Reread BDA3, this time learning about the Dirichlet process.

Etc..

  1. Introduction to Hierarchical Bayesian Modeling for Ecological Data (Chapman & Hall/CRC Applied Environmental Statistics). The book link is an Amazon affiliate link. If you get it from CRC’s site you can get it about 20 bucks cheaper with a discount code; it just takes longer to ship. Also note that I would recommend reading “Doing Bayesian Data Analysis” first before even trying to get into hierarchical modeling.
  2. I would like to thank this website for all the HTML mathematical notation.
  3. The salmon sushi picture was taken from Pixabay under a Creative Commons license.