Introduction
So I’ve learned a little bit about Bayesian Hierarchical Modeling at the FDA and decided to put down my thoughts and write about it to reinforce what I’ve learned. I also want to try out some new JavaScript data visualization libraries.
A great book I’ve found is “Introduction to Hierarchical Bayesian Modeling for Ecological Data” by Parent and Rivot [1].
While at the FDA I coded my own model without using any MCMC framework, and it was very slow in R. I realized I needed an MCMC framework in my toolbelt. After some research I decided on Stan, via the rstan R package.
The Graph (not a DAG; not a Bayesian network)
The graph represents the salmon migration and birth cycle. Each edge represents the passing of a year. The nodes are squares because they are given: the information comes from previous knowledge.
| Variable | Definition |
|---|---|
| Wt | Salmon eggs |
| 0+ | Young-of-the-year (hatched juveniles) |
| PSm | Pre-smolts |
| Sm1 | Smolts after 1 year |
| Sp1 | Spawners returning one year earlier than Sp2 |
| Parr1 | Smaller juveniles left behind by Sm1 |
| Sm2 | Smolts after 2 years |
| Sp2 | Spawners returning one year after Sp1 |
The Models - Introducing Probability into the Graph
We’re going to take the graph that represents the salmon’s migration cycle and introduce uncertainty into it (model it via probability). By doing this we create a new graph that is completely different from the salmon’s migration cycle graph: it is a graph based on a probabilistic view.
We go through the square nodes one by one and attach a model and a probability to each.
The model is time-based; $t$ represents a particular year.
- $Sp_t = Sp1_t + Sp2_t$ = number of spawners in year $t$
- $W_t$ = number of eggs spawned by the adults returning in year $t$
- $0+_t$ = young-of-the-year in year $t$
- $PSm_t$ = pre-smolts in year $t$
- $Sm1_t$ = 1+ smolts (1 year to smolt) in year $t$
- $Parr1_t$ (also written $P1_t$) = smaller juveniles left behind by $Sm1$ in year $t$
- $Sm2_t$ = 2+ smolts (2 years to smolt) in year $t$
Spawners -> Eggs
$W_t = Sp_t \cdot P_f \cdot fec$
- $W_t$ = number of eggs spawned by the adults returning in year $t$
- $Sp_t$ = number of spawners = $Sp1_t + Sp2_t$
- $P_f$ = proportion of females
- $fec$ = mean fecundity (eggs per female)
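To make this concrete, here’s a tiny worked example in R. All of the numbers are hypothetical placeholders I made up, not values from the book:

```r
# Egg-deposition step, with hypothetical placeholder numbers.
Sp_t <- 1000    # spawners returning in year t (Sp1 + Sp2)
P_f  <- 0.5     # proportion of females (placeholder)
fec  <- 4500    # mean fecundity, eggs per female (placeholder)

W_t <- Sp_t * P_f * fec   # 2,250,000 eggs spawned in year t
```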
Eggs -> 0+ juveniles
This is the Ricker curve model with parameters $(\alpha, \beta)$, a classic discrete population model.
$0+_{t+1} = \alpha \cdot W_t \cdot e^{-\beta W_t} \cdot e^{\varepsilon_t}$ where $\varepsilon_t \overset{iid}{\sim} N(0, \sigma^2)$
- $0+_{t+1}$ = freshwater production of juveniles resulting from the reproduction of the spawners returning in year $t$
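Here’s a minimal R sketch of this step, continuing the running example. The values of $\alpha$, $\beta$, and $\sigma$ are made up for illustration, not estimates:

```r
# Ricker recruitment step with lognormal process noise (made-up parameters).
alpha <- 0.05     # productivity parameter (placeholder)
beta  <- 1e-6     # density-dependence parameter (placeholder)
sigma <- 0.3      # sd of the process noise
W_t   <- 2250000  # eggs from the previous step

eps      <- rnorm(1, mean = 0, sd = sigma)
zeroplus <- alpha * W_t * exp(-beta * W_t) * exp(eps)  # 0+ juveniles, year t+1
zeroplus                                               # roughly 12,000 on average
```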
0+ juveniles -> Smolts
$PSm_{t+2} \sim \text{Binomial}(0+_{t+1}, \gamma_{0+})$ = number of the $0+_{t+1}$ juveniles that survive and migrate to become $PSm_{t+2}$
$Sm1_{t+2} \sim \text{Binomial}(PSm_{t+2}, \theta_{Sm1})$ = number of the $PSm_{t+2}$ pre-smolts that survive and migrate as 1+ smolts ($Sm1$)
$Sm2_{t+3} \sim \text{Binomial}(Parr1_{t+2}, \gamma_{Parr1})$ = number of the $Parr1_{t+2}$ parr that survive and migrate as 2+ smolts ($Sm2$)
- $PSm_{t+2}$ = young-of-the-year $0+_{t+1}$ that survive to the next spring, year $t+2$
- $\gamma_{0+}$ = survival rate of 0+
- $\theta_{Sm1}$ = proportion of pre-smolts that migrate as 1+ smolts (a survival rate)
- $\gamma_{Parr1}$ = survival rate of Parr1
$Sp1_{t+3} \sim \text{Binomial}(Sm1_{t+2}, \gamma_{Sm})$
$Sp2_{t+4} \sim \text{Binomial}(Sm2_{t+3}, \gamma_{Sm})$
- $\gamma_{Sm}$ = survival rate of smolts (shared by both smolt classes)
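Putting the whole chain of binomial transitions together, here’s a hedged forward simulation of one cohort in R. Every rate here is a placeholder I made up for illustration:

```r
# Forward-simulate one cohort through the life cycle (all rates are placeholders).
set.seed(42)
gamma_0  <- 0.10   # survival rate of 0+ to pre-smolt
theta_sm <- 0.60   # proportion of pre-smolts migrating as 1+ smolts
gamma_p1 <- 0.30   # survival rate of parr1
gamma_sm <- 0.15   # survival rate of smolts

zeroplus <- 12000                                        # 0+ juveniles, year t+1
PSm   <- rbinom(1, size = zeroplus, prob = gamma_0)      # pre-smolts, year t+2
Sm1   <- rbinom(1, size = PSm,      prob = theta_sm)     # 1+ smolts, year t+2
Parr1 <- PSm - Sm1                                       # parr left behind
Sm2   <- rbinom(1, size = Parr1,    prob = gamma_p1)     # 2+ smolts, year t+3
Sp1   <- rbinom(1, size = Sm1,      prob = gamma_sm)     # spawners, year t+3
Sp2   <- rbinom(1, size = Sm2,      prob = gamma_sm)     # spawners, year t+4
```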
Learning from observations
These two are observed and given:
- $C_{Sm1,t}$ = observations = number of smolts caught in the downstream trap
- $\pi_{Sm}$ = trap efficiency
Using these data points we can figure out the unknowns.
Our unknowns, the parameters, are: $\alpha$, $\beta$, $\sigma$, $\gamma_{0+}$, $\theta_{Sm1}$, $\gamma_{Parr1}$, $\gamma_{Sm}$.
The number of smolts caught in the downstream trap can be modeled with a binomial distribution: each smolt is either caught or not.
$C_{Sm1,t} \sim \text{Binomial}(Sm1_t, \pi_{Sm})$
*Note (advanced): the observations are assumed to satisfy the Bayesian property of exchangeability.
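In R, evaluating this observation likelihood is a one-liner. The counts below are hypothetical, carried over from the simulation sketch above:

```r
# Binomial observation likelihood for the smolt trap (hypothetical counts).
Sm1   <- 720    # latent number of 1+ smolts in year t
pi_sm <- 0.25   # trap efficiency, assumed given
C_sm1 <- 180    # observed number of smolts caught in the trap

dbinom(C_sm1, size = Sm1, prob = pi_sm)              # likelihood P[C | Sm1, pi]
dbinom(C_sm1, size = Sm1, prob = pi_sm, log = TRUE)  # log-likelihood
```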
The Models - Creating a probability graphical model
$0+_{t+1} = \alpha \cdot W_t \cdot e^{-\beta W_t} \cdot e^{\varepsilon_t}$ where $\varepsilon_t \overset{iid}{\sim} N(0, \sigma^2)$
- $0+_{t+1}$ = freshwater production of juveniles resulting from the reproduction of the spawners returning in year $t$
$PSm_{t+2} \sim \text{Binomial}(0+_{t+1}, \gamma_{0+})$ = number of the $0+_{t+1}$ juveniles that survive and migrate to become $PSm_{t+2}$
- $\gamma_{0+}$ = survival rate of 0+
$Sm1_{t+2} \sim \text{Binomial}(PSm_{t+2}, \theta_{Sm1})$ = number of the $PSm_{t+2}$ pre-smolts that survive and migrate as 1+ smolts ($Sm1$)
- $PSm_{t+2}$ = young-of-the-year $0+_{t+1}$ that survive to the next spring, year $t+2$
- $\theta_{Sm1}$ = proportion of pre-smolts that migrate as 1+ smolts (a survival rate)
$Sm2_{t+3} \sim \text{Binomial}(Parr1_{t+2}, \gamma_{Parr1})$ = number of the $Parr1_{t+2}$ parr that survive and migrate as 2+ smolts ($Sm2$)
$Sp1_{t+3} \sim \text{Binomial}(Sm1_{t+2}, \gamma_{Sm})$
$Sp2_{t+4} \sim \text{Binomial}(Sm2_{t+3}, \gamma_{Sm})$
Now we put all the parts together into a graph.
Okay, now that we have the probabilistic graphical model down, we can figure out the joint probability distribution.
$P(J_t) = ?$
Step 1. Looking at the graph, we start with all the nodes that have no parent: $\alpha$, $\beta$, $\sigma$, $W_t$, $\gamma_{0+}$, $\theta_{Sm1}$, $\gamma_{Parr1}$, and $\gamma_{Sm}$.
$P(J_t) = P[\alpha] \cdot P[\beta] \cdot P[\sigma] \cdot P[W_t] \cdot P[\gamma_{0+}] \cdot P[\theta_{Sm1}] \cdot P[\gamma_{Parr1}] \cdot P[\gamma_{Sm}] \cdots$
Step 2. Now we’re going to look at the nodes with parents.
- $0+_{t+1}$ gives $P[0+_{t+1} \mid W_t, \alpha, \beta, \sigma]$
- $PSm_{t+2}$ gives $P[PSm_{t+2} \mid 0+_{t+1}, \gamma_{0+}]$
- $Sm1_{t+2}$ and $Parr1_{t+2}$ give the joint term $P[Sm1_{t+2}, Parr1_{t+2} \mid PSm_{t+2}, \theta_{Sm1}]$. Notice this one is more complex: $Sm1$ and $Parr1$ share the same parent and parameter, since the parr are simply the pre-smolts left behind ($Parr1_{t+2} = PSm_{t+2} - Sm1_{t+2}$).
- $P[Sp1_{t+3} \mid Sm1_{t+2}, \gamma_{Sm}]$
- $P[Sm2_{t+3} \mid Parr1_{t+2}, \gamma_{Parr1}]$
- $P[Sp2_{t+4} \mid Sm2_{t+3}, \gamma_{Sm}]$
$P(J_t) = P[\alpha] \cdot P[\beta] \cdot P[\sigma] \cdot P[W_t] \cdot P[\gamma_{0+}] \cdot P[\theta_{Sm1}] \cdot P[\gamma_{Parr1}] \cdot P[\gamma_{Sm}]$
$\cdot P[0+_{t+1} \mid W_t, \alpha, \beta, \sigma] \cdot P[PSm_{t+2} \mid 0+_{t+1}, \gamma_{0+}]$
$\cdot P[Sm1_{t+2}, Parr1_{t+2} \mid PSm_{t+2}, \theta_{Sm1}]$
$\cdot P[Sp1_{t+3} \mid Sm1_{t+2}, \gamma_{Sm}] \cdot P[Sm2_{t+3} \mid Parr1_{t+2}, \gamma_{Parr1}] \cdot P[Sp2_{t+4} \mid Sm2_{t+3}, \gamma_{Sm}]$
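To see the factorization as computation, here’s a hedged R sketch of the log joint density for fixed values, with the prior terms treated as flat and omitted. The Ricker step means $\log 0+_{t+1}$ is normal, i.e. lognormal on the natural scale; all numbers come from the running made-up example:

```r
# Log of the joint density P(J_t), dropping flat prior terms (a sketch).
log_joint <- function(states, params, W_t) {
  with(c(states, params), {
    # Ricker step: log(0+) ~ N(log(alpha * W) - beta * W, sigma^2)
    lp <- dlnorm(zeroplus, meanlog = log(alpha * W_t) - beta * W_t,
                 sdlog = sigma, log = TRUE)
    # Binomial survival/migration steps
    lp <- lp + dbinom(PSm, size = round(zeroplus), prob = gamma_0,  log = TRUE)
    # Joint term for (Sm1, Parr1): Parr1 is determined as PSm - Sm1,
    # so one binomial density covers both.
    lp <- lp + dbinom(Sm1, size = PSm,   prob = theta_sm, log = TRUE)
    lp <- lp + dbinom(Sm2, size = Parr1, prob = gamma_p1, log = TRUE)
    lp <- lp + dbinom(Sp1, size = Sm1,   prob = gamma_sm, log = TRUE)
    lp <- lp + dbinom(Sp2, size = Sm2,   prob = gamma_sm, log = TRUE)
    lp
  })
}

states <- list(zeroplus = 12000, PSm = 1200, Sm1 = 720, Parr1 = 480,
               Sm2 = 144, Sp1 = 108, Sp2 = 22)
params <- list(alpha = 0.05, beta = 1e-6, sigma = 0.3, gamma_0 = 0.10,
               theta_sm = 0.60, gamma_p1 = 0.30, gamma_sm = 0.15)
log_joint(states, params, W_t = 2250000)
```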
Okay so what? Where’s the Bayesian network?
Not yet. First the book needs to introduce the concept of a simple model vs. a hierarchical model, plus some terminology.
So far we haven’t introduced any observation variables (random variables) at all.
- $\theta$ represents the parameters
- $Z$ represents the latent variables
- $Y$ represents the observed random variable (lowercase $y$ represents a realization/sample of $Y$)
The left graph is a simple model; the right graph is a hierarchical model.
$Z$ represents latent variables (nuisance variables): variables we don’t really care about directly, and which are hidden, so unlike $Y$ we never observe them directly. $Y$ represents the observations. Observations are random, so $Y$ is capitalized, and lowercase $y$ is a realization (sample) of $Y$. The $Y$ node is pink because it is observable.
$P[\theta, Z, Y] = P[\theta] \cdot P[Z \mid \theta] \cdot P[Y \mid \theta, Z]$
In our salmon model, $\theta$ is the parameter vector $(\alpha, \beta, \sigma, \gamma_{0+}, \theta_{Sm1}, \gamma_{Parr1}, \gamma_{Sm})$, $Z$ is the latent population stages, and $Y$ is the catches.
Well, first, what’s the experiment?
For this it’s salmon captures, modeled via binomial distributions: each fish is either caught or not.
Notice that $C$ stands for catches.
- $C_{0+,t+1} \sim \text{Binomial}(0+_{t+1}, \pi_{0+})$
- $C_{Sm1,t+2} \sim \text{Binomial}(Sm1_{t+2}, \pi_{Sm})$
- $C_{Sm2,t+3} \sim \text{Binomial}(Sm2_{t+3}, \pi_{Sm})$
- $C_{Sp1,t+3} \sim \text{Binomial}(Sp1_{t+3}, \pi_{Sp})$
- $C_{Sp2,t+4} \sim \text{Binomial}(Sp2_{t+4}, \pi_{Sp})$
Once again the squares represent known/given values (the $\pi$’s are given). The pink circles are observed values; pink in general means known, either given or observed. The purple boxes group the nodes into their respective layers.
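Continuing the earlier forward simulation, here’s a hedged sketch of generating this observation layer; the $\pi$ values are placeholders I made up:

```r
# Simulate the observation layer given the latent states (placeholder efficiencies).
zeroplus <- 12000; Sm1 <- 720; Sm2 <- 144; Sp1 <- 108; Sp2 <- 22  # from the sketch above
pi_0  <- 0.05   # 0+ sampling efficiency (placeholder)
pi_sm <- 0.25   # smolt trap efficiency (placeholder)
pi_sp <- 0.60   # spawner counting efficiency (placeholder)

C_0p  <- rbinom(1, size = zeroplus, prob = pi_0)   # 0+ caught, year t+1
C_sm1 <- rbinom(1, size = Sm1, prob = pi_sm)       # 1+ smolts caught, year t+2
C_sm2 <- rbinom(1, size = Sm2, prob = pi_sm)       # 2+ smolts caught, year t+3
C_sp1 <- rbinom(1, size = Sp1, prob = pi_sp)       # spawners counted, year t+3
C_sp2 <- rbinom(1, size = Sp2, prob = pi_sp)       # spawners counted, year t+4
```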
OK. Finally, we’ve got a Bayesian network. Really, what now?
How does $y$ (the sample, or realization, of $Y$) fit into this fancy graph?
What happens when observations become available?
Before that, notice how we built the model and its direction: downward, from the salmon cycle toward the latent variables and then toward the observations.
Why did the book bring this up? Because when you train the model using the available data/observations, you go in the opposite direction.
You start at $Y$ (the observation layer, where $Y$ is a random variable) and set $Y = y$, since lowercase $y$ is the realization of the random variable $Y$: the actual data values, not just a placeholder. Then you move up to the latent layer and finally to the parameter layer.
Let’s see it mathematically:
Here’s the joint probability:
$P[Y, \theta, Z]$
Now here’s the distribution conditioned on $Y = y$, which is what we work with when we have data to train the model and find the parameters:
$P[\theta, Z \mid Y = y]$
Given $Y = y$, the information propagates upward from the observation layer to the latent layer to the parameter layer.
This is how you train the model after you are done creating the model.
You can see the Bayes’ Rule connection too, right? We’re always dealing with joint probabilities and conditional probabilities.
The Bayesian network structure makes the nodes conditionally independent given their parents; that conditional independence is what lets the joint distribution factor the way it did above.
This is now a posterior distribution: posterior meaning after the data, while the prior distribution is before the data.
$P[\theta, Z \mid Y = y]$ = the posterior distribution
I’m going to repeat it again.
Posterior is after the data have been put in.
Prior is before the data. It is your prior belief.
In Bayesian statistics you need to supply a belief in the form of a prior distribution. It feels weird, but don’t worry: if you don’t know anything, you can use a noninformative prior distribution.
The prior is also a way to encode expert belief.
So given what we have now, we just have to apply Bayes’ Rule to the conditional probability to get the posterior over the parameters.
Bayes’ Rule
$P[\theta, Z \mid Y = y] = \dfrac{P[\theta, Z, Y = y]}{P[Y = y]}$
The denominator $P[Y = y]$ is a normalizing constant that does not depend on $\theta$ or $Z$, so we can drop it:
$P[\theta, Z \mid Y = y] \propto P[\theta] \cdot P[Z \mid \theta] \cdot P[Y = y \mid \theta, Z]$
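To see posterior ∝ prior × likelihood in action, here’s a toy R example, assuming (unrealistically) that we want just one survival rate and the latent counts are known. This conjugate Beta-Binomial case has a closed-form posterior, so no MCMC is needed; the counts are hypothetical:

```r
# Toy posterior for a single survival rate: Beta prior x Binomial likelihood.
n        <- 1200   # hypothetical number of pre-smolts at risk
migrated <- 720    # hypothetical number observed to migrate as 1+ smolts

a_prior <- 1; b_prior <- 1          # noninformative Beta(1, 1) prior
a_post  <- a_prior + migrated       # conjugacy gives a Beta posterior directly
b_post  <- b_prior + (n - migrated)

a_post / (a_post + b_post)              # posterior mean for theta_Sm1
qbeta(c(0.025, 0.975), a_post, b_post)  # 95% credible interval
```

The full salmon model has no closed form like this, which is exactly why an MCMC framework such as Stan comes in later.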
Conclusion
I highly recommend this book. Andrew Gelman’s BDA book is more PhD level, and his approach is not graphical like this but more mathy. Being visual, this book helps a lot in tying things together.
This chapter of the book had no observations/data and no code. Ah dangit. Well, until next time: stay tuned for the next episode of Bayesian man.
[1] Buy the book if you like what you see in this post. This post is basically my notes on chapter 1 of the book. It’s an amazing book and I highly recommend it.
What did I learn about myself
I’m glad I’m reviewing this chapter of the book again. I have a confession to make: if I want to understand a subject, I need to read it three times, do projects on it, and review what I’ve learned. I need tons of practice. I guess this is one of the reasons why I started this blog.
This chapter ties together DAGs, Bayes’ Rule, and conditional probabilities. It was a good refresher and cleared up things I was wrong about, especially the salmon breeding cycle: I hadn’t thought about the fact that it isn’t a DAG, and that from that cycle we create a Bayesian graphical model (a DAG).
I think I’ll go through each chapter of this book as a refresher while playing with JavaScript graphical libraries and hopefully learning Stan. I need to make sure I didn’t miss anything from the first reading.
What Did I Get to Practice? (for me)
- Bayesian Hierarchical Modeling using rstan.
- Tried out a JavaScript data visualization library, cytoscape.js, for modeling graphs.
- Got to refresh Bayesian graphical models (Bayesian networks).
Rough Roadmap for Bayesian HM
- Finish off this book: Introduction to Hierarchical Bayesian Modeling for Ecological Data (Chapman & Hall/CRC Applied Environmental Statistics).
- For Hamiltonian Monte Carlo, read Gill, Jeff, Bayesian Methods: A Social and Behavioral Sciences Approach (Statistics in the Social and Behavioral Sciences Series), CRC Press (2014).
- Read https://arxiv.org/abs/1111.4246 (the No-U-Turn Sampler paper), an implementation of HMC.
- Measure theory videos
- Reread BDA3, learning the Dirichlet process.
Etc.
- Introduction to Hierarchical Bayesian Modeling for Ecological Data (Chapman & Hall/CRC Applied Environmental Statistics): the book link is Amazon-affiliated. If you get it from CRC Press you can get it 20 bucks cheaper with a discount code; it just takes longer to ship. Also note I would recommend reading “Doing Bayesian Data Analysis” before even trying to get into hierarchical modeling.
- I would like to thank this website for all the HTML mathematical notations.
- The salmon sushi picture was taken from Pixabay under a Creative Commons license.