## Why Be Bayesian? Let Me Count the Ways

In answer to an old friend's question.

1. Bayesians have more fun.
    1. Our conferences are in better places too.
2. It's the model, not the estimator.
3. Life's too short to be a frequentist: in an infinite number of replications ...
4. Software works better.
    1. Rather surprisingly, Bayesian software is a lot more general than frequentist software.
5. Small-sample inference comes standard with most Bayesian model fitting these days.
    1. But if you like your inference asymptotic, that's available, just not high on anyone's priority list.
    2. We can handle the no-data problem, all the way up to very large problems.
    3. Don't need a sample large enough to allow for a bootstrap.
6. Hierarchical random-effects models are better fit with Bayesian models and software.
    1. If a variance component is small, the natural Bayes model doesn't allow zero as an estimate, while the natural maximum likelihood algorithms do allow zero. If you get a zero estimate, then you're going to get poor estimates of the standard errors of the fixed effects. [More discussion omitted.]
    2. Can handle problems where there are more parameters than data points.
7. Logistic regression models fit better with Bayes.
    1. If there's perfect separation on a particular variable, the maximum likelihood estimate of the coefficient is plus or minus infinity, which isn't a good estimate.
    2. Bayesian modeling offers the opportunity to do the estimation correctly (it doesn't guarantee it; there's no insurance against stupidity).
    3. Same thing if you're trying to estimate a very tiny (or very large) probability. Suppose you observe 20 out of 20 successes on something that you know doesn't have 100% successes: the maximum likelihood estimate is 20/20 = 100%, which you already know is wrong.
    4. To rephrase a bit: in small samples or with rare events, Bayesian estimates shrink toward sensible values (if your prior is sensible), thus avoiding the large variance of the raw point estimates.
8. Frequentists keep reinventing Bayesian methods.
    1. Shrinkage estimates
    2. Empirical Bayes
    3. Lasso
    4. Penalized likelihood
    5. Ridge regression
    6. James-Stein estimators
    7. Regularization
    8. Pitman estimation
    9. Integrated likelihood
9. In other words, it's just not possible to analyze complex data structures without Bayesian ideas.
    1. Admissibility means never having to say you're sorry.
    2. Alternatively, admissibility means that someone else can't prove that they can do a better job than you.
    3. Meanwhile, frequentists are clogging our journals with proofs that the latest idiocy is or isn't admissible.
    4. Unless they are clogging them with yet more ways to estimate the smoothing parameter for a nonparametric estimator.
10. Bayesian models are generalizations of classical models. That's what the prior buys you: more models.
11. Can handle discrete, categorical, ordered categorical, trees, densities, matrices, missing data and other odd parameter types.
12. Data and parameters are treated on a level playing field.
13. I would argue that cross-validation works because it approximates Bayesian model selection tools.
14. Bayesian hypothesis testing
    1. Treats the null and alternative hypotheses on equal terms.
    2. Can handle two or more than two hypotheses.
    3. Can handle hypotheses that are
        1. Disjoint
        2. Nested
        3. Overlapping but neither disjoint nor nested
    4. Gives you the probability that the alternative hypothesis is true.
    5. Classical inference can only handle the nested null hypothesis problem.
    6. We're all probably misusing p-values anyway.
15. Provides a language for talking about modeling and uncertainty that is missing in classical statistics.
    1. And thus provides a language for developing new models for new data sets or scientific problems.
    2. Provides a language for thinking about shrinkage estimators, why we want to use them, and how to specify the shrinkage.
    3. Bayesian statistics permits discussion of the sampling density of the data given the unknown parameters.
    4. Unfortunately, this is all that frequentist statistics allows you to talk about.
    5. Additionally, Bayesians can discuss the distribution of the data unconditional on the parameters.
    6. Bayesian statistics also allows you to discuss the distribution of the parameters.
    7. You may discuss the distribution of the parameters given the data. This is called the posterior, and it is the conclusion of a Bayesian analysis.
    8. You can talk about problems that classical statistics can't handle, for example, the probability of nuclear war.
16. Novel computing tools, but you can often use your old tools as well.
17. Bayesian methods allow pooling of information from diverse data sources.
    1. Data can come from books, journal articles, older lab data, previous studies, people, experts, the horse's mouth, a rat's a**, or it may have been collected in the traditional way.
    2. It isn't automatic, but there is a language for thinking about how to do this pooling.
18. Less work.
    1. Bayesian inference is via the laws of probability, not by some ad hoc procedure that you need to invent for every problem or validate every time you use it.
    2. Don't need to figure out an estimator.
    3. Once you have a model and data set, the conclusion is a computing problem, not a research problem.
    4. Don't need to prove a theorem to show that your posterior is sensible. It is sensible if your assumptions are sensible.
    5. Don't need to publish a bunch of papers to figure out sensible answers to a novel problem.
    6. For example, estimating a series of means $\mu_1, \mu_2, \ldots$ that you know are ordered, $\mu_j \le \mu_{j+1}$, is a computing problem in Bayesian inference, but it was the source of numerous papers in the frequentist literature. Finding a (good) frequentist estimator, along with its standard errors and confidence intervals, took lots of papers to figure out.
19. Yes, you can still use SAS.
    1. Or R or Stata.
20. Can incorporate a utility function, if you have one.
21. Odd bits of other information can be incorporated into the analysis, for example:
    1. That a particular parameter, usually allowed to be positive or negative, must be positive.
    2. That a particular parameter is probably positive, but not guaranteed to be positive.
    3. That a given regression coefficient should be close to zero.
    4. That group one's mean is larger than group two's mean.
    5. That the data comes from a distribution that is not Poisson, binomial, exponential or normal. For example, the data may be better modeled by a t or a gamma distribution.
    6. That a collection of parameters comes from a distribution that is skewed or has long tails.
    7. Bayesian nonparametrics can allow you to model an unknown density as a nonparametric mixture of normals (or other densities). The uncertainty in estimating this distribution is incorporated in making inferences about group means and regression coefficients.
22. Bayesian modeling is about the science.
    1. You can calculate the probability that your hypothesis is true.
    2. Bayesian modeling asks whether this model describes the data, mother nature, the data-generating process, correctly or sufficiently correctly.
    3. Classical inference is all about the statistician and the algorithm, not the science.
    4. In repeated samples, how often (or how accurately) does this algorithm/method/model/inference scheme give the right answer?
    5. Classical inference is more about the robustness (in repeated sampling) of the procedure. In that way, it provides robustness results for Bayesian methods.
23. Bayesian methods have had notable successes, to wit:
    1. Covariate selection in regression problems
    2. Model selection
    3. Model mixing
    4. And mixture models
    5. Missing data
    6. Multi-level and hierarchical models
    7. Phylogeny
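
The two claims in the logistic-regression item can be checked numerically. What follows is a minimal sketch under assumptions that do not come from the post: a uniform Beta(1, 1) prior for the 20-out-of-20 probability, a hypothetical six-point data set that is perfectly separated at x = 0, a no-intercept logistic model, and an illustrative Normal(0, 2.5²) prior on the slope. With 20 successes in 20 trials the maximum likelihood estimate is 20/20 = 1, while the Beta(1, 1) posterior mean is (20 + 1)/(20 + 2) = 21/22 ≈ 0.955. On the separated data the log-likelihood keeps rising toward 0 as the slope grows, so it has no finite maximum. Adding the log-prior gives a strictly concave log-posterior with a finite mode, which the code finds by bisection on its derivative.

```python
import math

# Hypothetical, perfectly separated toy data: every x < 0 has y = 0, every x > 0 has y = 1.
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
YS = [0, 0, 0, 1, 1, 1]


def beta_binomial_posterior_mean(successes, trials, a=1.0, b=1.0):
    """Posterior mean of a binomial probability under a Beta(a, b) prior."""
    return (successes + a) / (trials + a + b)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood of a no-intercept logistic model, P(y = 1) = sigmoid(beta * x)."""
    # log sigmoid(eta) = -softplus(-eta) and log(1 - sigmoid(eta)) = -softplus(eta).
    return sum(-softplus(-beta * x) if y == 1 else -softplus(beta * x)
               for x, y in zip(xs, ys))


def log_posterior(beta, xs, ys, prior_sd=2.5):
    """Log-posterior up to an additive constant, with a Normal(0, prior_sd^2) prior on beta."""
    return log_likelihood(beta, xs, ys) - 0.5 * (beta / prior_sd) ** 2


def score(beta, xs, ys, prior_sd=2.5):
    """Derivative of log_posterior with respect to beta."""
    return sum(x * (y - sigmoid(beta * x)) for x, y in zip(xs, ys)) - beta / prior_sd ** 2


def map_slope(xs, ys, prior_sd=2.5, tol=1e-10):
    """Posterior mode of beta, found by bisection on the score.

    The log-posterior is strictly concave, so the score is strictly decreasing with a
    single root; the prior term guarantees that the bracket search below terminates.
    """
    lo, hi = -1.0, 1.0
    while score(lo, xs, ys, prior_sd) < 0:
        lo *= 2.0
    while score(hi, xs, ys, prior_sd) > 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs, ys, prior_sd) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("MLE for 20/20:", 20 / 20)
    print("Beta(1, 1) posterior mean for 20/20:", round(beta_binomial_posterior_mean(20, 20), 4))
    for beta in (1.0, 10.0, 100.0):
        # Under separation the log-likelihood keeps creeping up toward 0 as beta grows.
        print(f"log-likelihood at beta={beta:g}: {log_likelihood(beta, XS, YS):.3g}")
    print("MAP slope:", round(map_slope(XS, YS), 3))
```

The finite mode does not depend on this particular prior. A Bernoulli log-likelihood is bounded above by 0, so any proper prior whose log-density goes to minus infinity as the slope grows in magnitude keeps the mode finite.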
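
The ordered-means example under "Less work" can be made concrete too. This is a minimal sketch that assumes normally distributed group means with known standard errors and a flat prior restricted to the region $\mu_1 \le \mu_2 \le \cdots \le \mu_k$. These modeling choices are made here for illustration and are not stated in the post. Under those assumptions the posterior is the product of independent $N(\bar y_j, se_j^2)$ densities truncated to the ordered region. Drawing from the unconstrained normals and keeping only the ordered draws (rejection sampling) therefore samples that posterior exactly. The function names and data values are hypothetical.

```python
import random


def sample_ordered_means(ybar, se, n_draws=4000, seed=1, max_tries=10_000_000):
    """Rejection sampler for mu_1 <= ... <= mu_k.

    Assumes a flat prior restricted to the ordered region and known standard
    errors, so the unconstrained posterior of mu_j is N(ybar[j], se[j]**2).
    """
    rng = random.Random(seed)
    draws, tries = [], 0
    while len(draws) < n_draws:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low for plain rejection sampling")
        mu = [rng.gauss(m, s) for m, s in zip(ybar, se)]
        # Keep the draw only if it respects the known ordering.
        if all(a <= b for a, b in zip(mu, mu[1:])):
            draws.append(mu)
    return draws


def summarize(draws, level=0.95):
    """Posterior mean and equal-tailed interval for each mu_j from the kept draws."""
    n = len(draws)
    tail = (1.0 - level) / 2.0
    summary = []
    for j in range(len(draws[0])):
        col = sorted(d[j] for d in draws)
        lo = col[int(tail * (n - 1))]
        hi = col[int((1.0 - tail) * (n - 1))]
        summary.append((sum(col) / n, lo, hi))
    return summary


if __name__ == "__main__":
    # Hypothetical sample means (the first two are out of order) and known standard errors.
    ybar = [1.0, 0.8, 1.5]
    se = [0.3, 0.3, 0.3]
    draws = sample_ordered_means(ybar, se)
    for j, (mean, lo, hi) in enumerate(summarize(draws), start=1):
        print(f"mu_{j}: posterior mean {mean:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

The estimates and intervals come straight from the kept draws, with no special theory needed. Rejection sampling gets slow when the observed means are far out of order or there are many groups. The usual next step is a Gibbs sampler that updates each $\mu_j$ from a normal truncated to $[\mu_{j-1}, \mu_{j+1}]$.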

The bottom line: More tools. Faster progress.

## Mathematics Departments and the Talented Mr. Teacher

Today we have a guest post from a colleague writing under the pseudonym Mathprof. The pseudonym is perhaps needed, as Mathprof's colleagues might not be pleased to read all of Mathprof's comments. I did some very minor editing, but otherwise the content is Mathprof's.

Mathprof's response to me:

Nationwide, mathematics departments, in higher education or otherwise, mirror his [Alexander Coward's] interpretation and description of events. He perfectly described my department. Just today, some of my students alerted me that my peers have been complaining that I try too hard and that my approach is hurting students. In their opinion, it is not proper for the teacher to try to teach; it is the student’s responsibility to try to learn. Supposedly, even as I raise my students' knowledge base through sound, research-based pedagogical practices, I am doing them a disservice by letting them become accustomed to a style of learning they will likely never experience again.

Most mathematicians at large universities are grounded in pedagogical, epistemological, ontological, and methodological paradigms that uphold and maintain the current mathematics education paradigm. There are deep-seated beliefs about what education is, who can and should access it, how it should look, and to what end. Although socialization into this manner of thinking occurs mainly in college, it is often introduced in early education, because mathematics undergraduates go on to fill most of the K-12 mathematics teaching positions, and they often bring this paradigm with them. This is how the system perpetuates itself and how it constructs continuity between K-12 and higher education.

What I have learned in my few years of reading and writing extensively on this topic is that the form and function of education are completely at odds with one another. This is particularly true in mathematics. If you ask math teachers what they hope to achieve and then observe their approach, it quickly becomes evident that for the majority of teachers, form rules over function. Strategies whose function is to increase learning are aggressively de-emphasized and delegitimized simply because they clash with the dominant paradigm of education. This occurs both in practice and in research. Recreating their own learning experience is not a means to an end, but an end unto itself. The demonstration of superior capacity and knowledge in mathematics is the form that must rule over function. For these teachers, learning is a mere externality. Those who are meant to learn math do; everyone else learns their place. There is no maximization of exposure, no customization of approach, and no intellectualization of the process. It is complete madness.

I could go on forever, but I will stop. I feel bad for the guy [Alexander Coward]. It sounds like he is doing it the right way. That said, welcome to education. Watch out for gravity: you never know which way it will pull tomorrow.

-- Mathprof

## How to Prepare for a PhD in Biostatistics

How would you advise an undergraduate interested in a PhD to prepare for studying biostats?

More math. You can't be too rich or know too much math. In terms of courses, take more mathematics; take as much as you can. When you get into our biostatistics graduate program, we will teach you statistics, so taking more statistics now won't help you in the long run, and we may have to un-teach something if you learned statistics badly.

In terms of which math courses to take, try to take real analysis. Advanced linear algebra is very helpful. Every part of math is useful somewhere in statistics, though the connections may be obscure or, more likely, just not part of some current fad. Numerical analysis and combinatorics are also helpful, and everything else is helpful too. First, though, come real analysis and advanced linear algebra.

What might I read outside of my courses?

Start picking up books and articles that relate to statistics, math, science, and public health, and read them. Lately there have been a number of excellent popular science books about science, statistics and statistical thinking. I highly recommend anything and everything by Malcolm Gladwell. Other books that come to mind include Stephen Jay Gould's essays; The Signal and the Noise by Nate Silver; The Theory That Would Not Die by Sharon Bertsch McGrayne; any of the books by Jared Diamond; The Black Swan by Nassim Taleb; and How to Lie With Statistics by Darrell Huff. A very important book for almost anyone is Thinking, Fast and Slow by Daniel Kahneman.

For someone not yet expert in statistics, books on statistical graphics are directly statistical and much more accessible than technical books. Books on statistical graphics will make you a better statistician, now. They teach you both to look at data and how to look at data. There's a set of four books by Edward Tufte; see http://www.edwardtufte.com/tufte/. Get all four. I recommend hardcover over paperback, and I definitely wouldn't get the e-books. Read all four! These are not always practical books, but they inspire us to do our best and to be creative in our statistical and graphical analyses.

Read Visualizing Data by William Cleveland (an A+, wonderful book). Additional graphics books include Graphical Methods for Data Analysis by Chambers, Cleveland, Kleiner and Tukey, if you can find a copy. A more statistical book that will help instill the proper attitude about data is Exploratory Data Analysis by Tukey.

Read anything else you can find to read. Read widely and diversely. As you get stronger in math and statistics, change the level of the books. Start exploring the literature. Dive into one area and read as much as you can. Then find another area and check it out.

Can I depend on the department to teach me everything I need to be a good statistician?

Of course not. Active learning is paramount. No graduate department will teach you everything. All departments teach a core set of material, and it is up to students to supplement that core with additional material. How you supplement that core determines what kind of statistician you can be and how far you can go. Some people might supplement their core material with an in-depth study of nonparametrics; others with Bayesian methods, statistical computing, spatial data analysis or clinical trials. I supplemented my graduate education with statistical graphics, Bayesian methods, statistical computing, regression methods, hierarchical models, semi-parametric modeling, foundations and longitudinal data analysis. The semi-parametric modeling, graphics and computing mostly came from books. The longitudinal data analysis came from a mix of books and journal articles. Bayesian methods and hierarchical models I learned mostly from journal articles. Foundations came from talking to people and listening to seminars, as well as from journal articles and books. I also tried to learn additional mathematical statistics using various texts, but wasn't very successful; similarly with optimal design.

How you supplement your education depends on your interests and may help you refine your interests. I found I wasn't that interested in time series, survey sampling, stochastic processes or optimal design. If you're interested in working with a particular professor, you're going to need to supplement with books in her/his area, and you're going to need to read that professor's research papers to see what you're going to be getting into.

What programming language(s) should I learn?

R is growing fast and may take over, sort of like kudzu, so it is well worth your time to become expert in it. Definitely learn and use RStudio. Some folks make a living just off their R expertise. A lot more make a living off their SAS expertise. But I bet the R people are having a lot more fun. The rest of this is what I glean from others, not from direct knowledge. If you want less of a statistics-specialty language and want to be closer to the computer end of things, C++ and Java are extremely popular (you should figure out why). Python seems to be coming on very strong. So maybe R and Python? It depends on what you like. Learn something about algorithms and something about modern computer programming interfaces. And a little HTML.

Go learn LaTeX now. Become at least a partial expert. Knowing LaTeX before you start grad school is very helpful.