where E is the set of equivalence classes, t is the set of transcripts compatible with the equivalence class e, and C are the counts, which are the sufficient statistics; l is the effective length, and \(\alpha\) is the parameter of interest (the abundances).
Pimentel's talk on RNA-seq workflow considerations - https://www.youtube.com/watch?v=96yBPM8lEt8
Example review of genetic basis of gene expression and complex traits:
http://www.nature.com/nrg/journal/v16/n4/pdf/nrg3891.pdf

Calculating strength of genetic association - Bayes Factors

Many current genetic association studies, both GWAS and eQTL studies, simply report p-values of association under a simplified model relating expression to the effect size of a particular variant (with an effect size of 0 as the null hypothesis), and then apply a multiple testing correction, such as Bonferroni correction or FDR control using Benjamini-Hochberg or q-values.
However, the following papers, along with many others, recommend a Bayesian approach:
http://www.nature.com/nrg/journal/v10/n10/pdf/nrg2615.pdf (Stephens and Balding 2009)
https://link.springer.com/article/10.3758/s13423-016-1221-4 (Kruschke and Liddell 2017)
The main line of reasoning can be summarized as follows:
- P-values are specific to the study that produced them and do not generalize across studies: they depend on the study design, stopping criteria, number of samples, and, in the case of genetic association, even factors like minor allele frequency (MAF). This is because a p-value only describes how unlikely the data are under the null model, and carries no information about how likely the data are under the alternative model. In that sense, frequentist null-hypothesis testing technically asks the wrong question.
- In contrast, the Bayes factor is the ratio of the likelihood of the data under the alternative model to its likelihood under the null model. This has several nice consequences: Bayes factors can be compared across different numbers of samples, different MAFs, or even different study designs, and Bayesian tests don't need to worry about a separate multiple testing burden, since the posterior probability of the alternative model is provided directly (after factoring in the prior).
- In addition, it would be very useful to estimate a full posterior probability distribution for the parameter (or a joint distribution, if we are trying to infer multiple parameters) instead of a point estimate, since it provides more information about how confident we can be about the strength of association. It also answers, in full detail, the right question: given the data and the model, what is our estimate of the parameters of interest?
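To make the Bayes factor argument concrete, here is a minimal Python sketch for a single variant. It is not taken from the papers above and rests on assumptions chosen for illustration only: the estimated effect size `beta_hat` is treated as approximately normal around the true effect with known standard error `se`, and under the alternative the true effect gets a normal prior with standard deviation `prior_sd` (this is the same form as Wakefield's approximate Bayes factor). The function names and the example numbers are hypothetical.

```python
import math


def log_normal_pdf(x, var):
    """Log density of a zero-mean normal with variance `var`, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def log_bayes_factor(beta_hat, se, prior_sd):
    """Log Bayes factor of the alternative versus the null.

    Assumed model (illustrative, not from the source):
      null:        true effect = 0,               so beta_hat ~ N(0, se^2)
      alternative: true effect ~ N(0, prior_sd^2), so beta_hat ~ N(0, se^2 + prior_sd^2)
    """
    v = se ** 2        # sampling variance of the estimate
    w = prior_sd ** 2  # prior variance of the true effect under the alternative
    return log_normal_pdf(beta_hat, v + w) - log_normal_pdf(beta_hat, v)


def posterior_prob_alternative(log_bf, prior_prob):
    """Posterior probability of the alternative: posterior odds = BF * prior odds."""
    log_post_odds = log_bf + math.log(prior_prob / (1.0 - prior_prob))
    # Convert log odds to a probability without overflowing exp().
    if log_post_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_post_odds))
    odds = math.exp(log_post_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical variant: effect estimate 0.5 with standard error 0.1.
    log_bf = log_bayes_factor(beta_hat=0.5, se=0.1, prior_sd=0.2)
    # Hypothetical prior: 1 in 10,000 variants is truly associated.
    print(log_bf, posterior_prob_alternative(log_bf, prior_prob=1e-4))
```

Note that the prior probability of association enters the calculation explicitly, so the same Bayes factor gives a different posterior probability when the prior changes; this is where the number of variants tested is accounted for, rather than in a separate correction step.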
Once we are convinced that we want to perform genetic association studies in this fashion, we need the following ingredients:
- A choice of prior (can be uniform, or a function of MAF, proximity to regulatory regions, etc.)
- Calculation of posterior odds of alternative: