# Statistics notation

### Introduction

Probability and statistics is replete with all sorts of strange notation. In this module, we try to clarify some notation that we use in other modules. In doing so, we provide a very brief outline of the foundations of probability and statistics.

### The experimental setup

Every statistics problem begins with an experiment denoted $\mathcal{E}$. It can be someone flipping a coin, determining the time it takes for a cell to divide, or determining whether a certain drug is effective – it doesn’t matter.

Of course, every experiment $\mathcal{E}$ has an outcome. For example, when flipping a coin, there are two possible outcomes, heads $H$ and tails $T$. The collection of all possible outcomes of an experiment we denote $\mathcal{S}$ and call the sample space. Mathematically, $\mathcal{S}$ is a set. For example, in the case of flipping a coin, $\mathcal{S} = \{H, T\}$.

### The definition of probability

Subsets of the sample space, i.e. collections of outcomes of the experiment $\mathcal{E}$, are called events. In most cases, it is not enough to simply assign a probability to each individual element $s$ of the sample space. Instead, we usually assign a probability to every event.

At this point a little set theory helps and sets the stage for all of probability theory. In this article we just give the basic idea; for a more advanced exposition, look for books on measure-theoretic probability such as Resnick's *A Probability Path* or Billingsley's *Probability and Measure*. Both are advanced texts that assume an undergraduate background in mathematics. The Wikipedia probability outline is also a handy resource.

Onward! For any set $\mathcal{A}$, the power set of $\mathcal{A}$ is the set of all subsets of $\mathcal{A}$; it’s denoted $\mathcal{P}(\mathcal{A})$. For example, the subsets of the coin-flipping sample space $\mathcal{S} = \{H, T\}$ are $\{H, T\}$ itself, $\{H\}$, $\{T\}$, and $\emptyset = \{\}$, the so-called empty set, which is by definition a subset of any set (as is the set itself). So the power set is $\mathcal{P}(\mathcal{S}) = \big\{\{H,T\}, \{H\}, \{T\}, \{\}\big\}$. In general, if a set has $n$ elements, then its power set has $2^n$ elements. In the coin-flipping case, $\mathcal{S}$ has 2 elements, so the power set has $2^2 = 4$ elements.
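
A power set this small can be enumerated directly. The short Python sketch below (the function name `power_set` is our own, not part of any module in this series) lists all subsets and confirms the $2^n$ count:

```python
from itertools import combinations

def power_set(s):
    """Return all subsets of the set s, as a list of frozensets."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

S = {"H", "T"}                       # sample space for one coin flip
subsets = power_set(S)
print(subsets)                       # the 4 subsets (order may vary)
print(len(subsets) == 2 ** len(S))   # n elements -> 2^n subsets
```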

We are now in a position to define a probability. A probability is a function, usually denoted $P$, which assigns a number to every element of the power set of the sample space. Of course, not just any function will do. The function $P$ must satisfy the following three properties to be a probability:

1. The probability of the sample space is 1: $P(\mathcal{S}) = 1$.
2. Probabilities can’t be negative: for any event $\mathcal{A} \in \mathcal{P}(\mathcal{S})$, $P(\mathcal{A}) \geq 0$.
3. If $\mathcal{A}$ and $\mathcal{B}$ are disjoint events (they don’t contain any of the same elements), then $P(\mathcal{A} \cup \mathcal{B}) = P(\mathcal{A}) + P(\mathcal{B})$.
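
For the coin flip, these three axioms are easy to check by brute force. The sketch below assumes a fair coin (outcome probabilities of 1/2 each, an assumption of ours) and verifies all three properties over the entire power set:

```python
from itertools import combinations

# Probability of an event = sum of the probabilities of its outcomes.
outcome_prob = {"H": 0.5, "T": 0.5}   # assumed fair coin

def P(event):
    return sum(outcome_prob[o] for o in event)

S = frozenset(outcome_prob)
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(S, r)]

assert P(S) == 1.0                         # axiom 1: P(S) = 1
assert all(P(A) >= 0 for A in events)      # axiom 2: nonnegativity
for A in events:                           # axiom 3: additivity for
    for B in events:                       #          disjoint events
        if A.isdisjoint(B):
            assert P(A | B) == P(A) + P(B)
print("All three axioms hold for the fair-coin P.")
```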


# Random triangles

### The Basics

At a basic level, a random triangle is simply a triangle whose corners are three random points on a piece of paper.

Mathematically speaking, a few decisions have to be made to characterize exactly how the random point selection works. Think of it this way: should every place on the piece of paper be equally likely, or should the middle of the page be more likely to be selected than spots near the borders?

In this module, we assume that the points are coming from a bivariate normal distribution with unit variances and correlation $\rho$.

### Play with random triangles!

The following module generates bunches of random triangles using the bivariate normal distribution with correlation coefficient $\rho$. The red triangles are obtuse, and the green triangles are acute (the likelihood of seeing a right triangle is 0, so it doesn’t get a color). You can change $\rho$ with the slider under the module. What happens as $\rho$ approaches $-1$ or $1$?

In Professor Strang’s lecture he discusses what the triangles look like in “triangle space”. The basic idea is that every triangle has three angles which sum to $180^{\circ}$, call them $\alpha$, $\beta$, and $\gamma$.
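
The interactive module itself is not reproduced here, but the sampling scheme it describes can be sketched in a few lines of NumPy. The function name and the Monte Carlo sample size below are our own choices; the angles come from the law of cosines:

```python
import numpy as np

def random_triangle_angles(rho, rng):
    """Sample 3 vertices from a bivariate normal (unit variances,
    correlation rho) and return the triangle's angles in degrees."""
    cov = [[1.0, rho], [rho, 1.0]]
    pts = rng.multivariate_normal([0.0, 0.0], cov, size=3)
    # Side lengths opposite each vertex.
    a = np.linalg.norm(pts[1] - pts[2])
    b = np.linalg.norm(pts[0] - pts[2])
    c = np.linalg.norm(pts[0] - pts[1])
    # Law of cosines gives two angles; the third makes the sum 180.
    alpha = np.degrees(np.arccos((b**2 + c**2 - a**2) / (2 * b * c)))
    beta = np.degrees(np.arccos((a**2 + c**2 - b**2) / (2 * a * c)))
    gamma = 180.0 - alpha - beta
    return alpha, beta, gamma

rng = np.random.default_rng(0)
angles = [random_triangle_angles(0.0, rng) for _ in range(10_000)]
obtuse = sum(max(t) > 90 for t in angles)
print(f"fraction obtuse: {obtuse / len(angles):.3f}")
```

Try varying `rho` toward $-1$ or $1$: the point cloud flattens toward a line, so the triangles become increasingly degenerate and obtuse.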

# Bayesian estimation of a population proportion

### The Basics

In statistics, a binomial proportion confidence interval is a confidence interval for a proportion in a statistical population. It uses the proportion estimated in a statistical sample and allows for sampling error. There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (labeled arbitrarily success and failure), the probability of success is the same for each trial, and the trials are statistically independent.

A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a (not necessarily fair) coin is flipped ten times. The observed binomial proportion is the fraction of the flips which turn out to be heads. Given this observed proportion, the confidence interval for the true proportion innate in that coin is a range of possible proportions which may contain the true proportion. A 95% confidence interval for the proportion, for instance, will contain the true proportion 95% of the times that the procedure for constructing the confidence interval is employed.

There are several ways to compute a confidence interval for a binomial proportion. The normal approximation interval is the simplest formula, and the one introduced in most basic Statistics classes and textbooks. This formula, however, is based on an approximation that does not always work well. Several competing formulas are available that perform better, especially for situations with a small sample size and a proportion very close to zero or one. The choice of interval will depend on how important it is to use a simple and easy-to-explain interval versus the desire for better accuracy.
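
As a concrete instance, the normal approximation interval mentioned above is $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$. A minimal sketch follows; the function name is illustrative, and the example counts are made up:

```python
from math import sqrt

def normal_approx_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a
    binomial proportion; z = 1.96 gives roughly 95% coverage."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# 7 heads in 10 flips: the interval is wide because n is small.
lo, hi = normal_approx_interval(7, 10)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

With a small $n$ or a proportion near 0 or 1, this formula's coverage degrades, which is exactly why the better-performing alternatives mentioned above exist.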

# Univariate probability distributions

### The Basics

In probability and statistics, a probability distribution assigns a probability to each of the possible outcomes of a random experiment. Examples are found in experiments whose sample space is non-numerical, where the distribution would be a categorical distribution; experiments whose sample space is encoded by discrete random variables, where the distribution is a probability mass function; and experiments with sample spaces encoded by continuous random variables, where the distribution is a probability density function. More complex experiments, such as those involving stochastic processes defined in continuous-time, may demand the use of more general probability measures.

In applied probability, a probability distribution can be specified in a number of different ways, often chosen for mathematical convenience:

1. by supplying a valid probability mass function or probability density function
2. by supplying a valid cumulative distribution function or survival function
3. by supplying a valid hazard function
4. by supplying a valid characteristic function
5. by supplying a rule for constructing a new random variable from other random variables whose joint probability distribution is known.

Important and commonly encountered probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution.
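
As an illustration of the first two options above, the binomial distribution can be specified by writing out its probability mass function and cumulative distribution function directly (a small sketch; the helper names are our own):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), the cumulative distribution function."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

# The pmf over all outcomes sums to 1, and the cdf at n equals 1.
n, p = 10, 0.5
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
print(binom_pmf(5, 10, 0.5))   # 0.24609375, the most likely head count
```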