KB-Statistics

Sample Problems

1.1 - Sampling

You draw a sample from a population to infer the abundance of some trait among the pop.

Random sampling is best:

simple random sampling: a subset of $n$ members chosen at random from a population.

-each member of the pop has equal probability of being chosen to be in the subset.
-each subset of $n$ members is equally likely to be chosen to be the sample.

Example: Lottery, random # generator
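A minimal sketch of the random # generator approach, assuming a made-up population of item IDs 1..1000 (the population, the sample size, and the seed are illustrative, not from the notes). `random.sample` draws without replacement, so every subset of size $n$ is equally likely:

```python
import random

# Hypothetical population: item IDs 1..1000 (made up for illustration).
population = list(range(1, 1001))
n = 10

rng = random.Random(42)  # fixed seed so the draw is repeatable

# random.sample picks n distinct members without replacement;
# every subset of size n is equally likely to be the sample.
sample = rng.sample(population, n)
print(sample)
```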

Sample of convenience: Removing heavy blocks from the top of a pile for testing, rather than throughout the pile.

sample variation:

-different samples vary
-a sample doesn't perfectly reflect the population.

independent: items in a sample are independent if knowing the values of some does not help predict the values of the others.

When sampling without replacement, items can be treated as independent if the sample is <5% of the total pop.

1.2 - Summary Statistics

Sample mean

$\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$

$X_1, X_2, ..., X_n$ is a sample.

Sample Variance

$s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 = \frac{1}{n-1}(\sum_{i=1}^nX_i^2-n\bar{X}^2)$

Sample standard deviation: $s=\sqrt{s^2}$
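A short check of the formulas above on made-up data (the values in `xs` are illustrative only). It computes $s^2$ both from the definition and from the shortcut form, and the two should agree:

```python
import math
import statistics

# Made-up sample values, for illustration only.
xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(xs)

mean = sum(xs) / n

# Definition form: squared deviations about the sample mean, divided by n - 1.
var_def = sum((x - mean) ** 2 for x in xs) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1).
var_short = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

std = math.sqrt(var_def)

print(mean, var_def, var_short, std)
```

The standard library's `statistics.variance` and `statistics.stdev` also use the $n-1$ denominator, so they can serve as a cross-check.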

*Purpose is to estimate spread of population about pop mean $\mu$.

-so ideally compute deviations about $\mu$; $\mu$ is generally unknown, so $\bar{X}$ is used in its place.

-deviations abt $\bar{X}$ tend to be smaller than deviations abt $\mu$; dividing by $n-1$ instead of $n$ provides the appropriate correction.

For $Y_i=aX_i+b$,

$\bar{Y}=a\bar{X}+b$

$s_Y^2=a^2s_X^2$, $s_Y=|a|s_X$
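A quick numerical check of the linear-transformation rules, using made-up data and arbitrary constants `a` and `b` (both chosen only for illustration; `a` is negative to show why $|a|$ is needed):

```python
import math
import statistics

xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up data
a, b = -3.0, 10.0                     # arbitrary constants
ys = [a * x + b for x in xs]

mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
var_x, var_y = statistics.variance(xs), statistics.variance(ys)
std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)

# Mean shifts and scales; variance only scales (by a^2); std scales by |a|.
print(mean_y, a * mean_x + b)
print(var_y, a ** 2 * var_x)
print(std_y, abs(a) * std_x)
```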

Note on Variance

Let's say you have an entire population. You can calculate its variance using $N$ (the pop size) in the denominator.

Let's say you have a sample and only want the variance of those $n$ sample values themselves. Use $n$ in the denominator.

Both of these are true variances about a true mean, of the whole pop or of just the sample values.

If you have just a sample but would really like to know what the pop mean and variance are, then you'll need to estimate these. Pop variance is best estimated from the sample using $n-1$ in the denominator instead of $n$.

Essentially, it comes to "sample variance" uses $n-1$ and that's just what is done. You don't use just $n$, no matter how much sense it might make, unless you definitely have the entire pop.
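A small simulation sketch of why $n-1$ is used, assuming a made-up population (the integers 1..100) and repeated samples of size 5 drawn with replacement. The population, sample size, trial count, and seed are all illustrative choices. Averaged over many samples, dividing by $n$ tends to underestimate the pop variance, while dividing by $n-1$ lands close to it:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: the integers 1..100.
population = list(range(1, 101))
pop_var = statistics.pvariance(population)  # divides by N: true pop variance

n = 5           # a small sample size makes the bias easy to see
trials = 20000

sum_div_n = 0.0
sum_div_n1 = 0.0
for _ in range(trials):
    # Draw with replacement so the sampled items are independent.
    sample = rng.choices(population, k=n)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations about xbar
    sum_div_n += ss / n
    sum_div_n1 += ss / (n - 1)

avg_div_n = sum_div_n / trials
avg_div_n1 = sum_div_n1 / trials

print(pop_var, avg_div_n, avg_div_n1)
```

In the standard library, `statistics.pvariance` divides by the number of values and `statistics.variance` divides by $n-1$, matching the two cases in the note.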

-median is often used in presence of outliers.

SE mean: Standard error of the mean: $\frac{s}{\sqrt{n}}$, $n=$ sample size.
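As a sketch, the standard error of the mean for a made-up sample (values are illustrative only):

```python
import math
import statistics

xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up sample
n = len(xs)

s = statistics.stdev(xs)   # sample standard deviation (n - 1 denominator)
se_mean = s / math.sqrt(n)  # standard error of the mean

print(se_mean)
```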

frequency: # of times category type appears in sample.

sample proportion (relative freq): $\frac{\text{freq}}{n}$
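A minimal sketch of frequency and sample proportion for categorical data, using a made-up list of inspection results (the category names are hypothetical):

```python
from collections import Counter

# Made-up categorical sample: outcome of inspecting 8 parts.
data = ["pass", "fail", "pass", "pass", "rework", "fail", "pass", "pass"]
n = len(data)

freq = Counter(data)                                # times each category appears
rel_freq = {cat: count / n for cat, count in freq.items()}  # sample proportions

print(freq)
print(rel_freq)
```

The sample proportions always sum to 1 across categories.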

Summaries like all of the above are called statistics when computed on a sample, and population parameters when computed on an entire pop.

Histograms

subranges containing data are called class intervals

rel frequency: $\frac{\text{freq}}{n}$

density = $\frac{\text{rel freq}}{\text{class width}}$

for unequal width, the y-axis must be density.
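A sketch of the density calculation with unequal class widths, using made-up data and made-up class edges. Each class interval here is taken as $[lo, hi)$, which is an assumption of this example. With density on the y-axis, the bar areas (density times width) add up to 1:

```python
# Made-up data and class edges, for illustration only.
data = [0.2, 0.5, 0.9, 1.1, 1.7, 2.5, 3.0, 3.9, 5.0, 7.5]
edges = [0, 1, 2, 4, 8]  # class intervals [0,1), [1,2), [2,4), [4,8)
n = len(data)

freqs, rel_freqs, densities = [], [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    freq = sum(1 for x in data if lo <= x < hi)  # count in this class interval
    rel = freq / n
    freqs.append(freq)
    rel_freqs.append(rel)
    densities.append(rel / (hi - lo))            # rel freq / class width

# Total bar area: sum of density * width over all class intervals.
total_area = sum(d * (hi - lo)
                 for d, lo, hi in zip(densities, edges[:-1], edges[1:]))

print(freqs, rel_freqs, densities, total_area)
```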

Skew:

mean~center of mass of the histogram

Probability

experiment: process whose outcome cannot be predicted with certainty.

example: coin toss, die roll, measure bolt diameter, weigh cereal box contents.

sample space: set of all possible outcomes of experiment.

Example:

coin toss: {heads, tails}
die roll: {1,2,3,4,5,6}
hole punch in metal: the punch is 10 mm in diameter, but variations in metal and angle cause the hole diameter to vary between 10.0 and 10.2 mm: $\{x|10.0<x<10.2\}$

!! Choice of sample space:

Example:

item length should be 5 but varies between 4 and 6. Sample space 1) $\{x|4<x<6\}$
if you only want to know whether it is good or not, consider sample space 2) {too long, too short, just fine}

event: subset of a sample space.

Example:

die roll has sample space {1,2,3,4,5,6}; the event "roll is even" is the subset {2,4,6}.
hole punch (above): the event "hole diameter <10.1" is the subset $\{x|10.0<x<10.1\}$

An event has happened if the experiment outcome is within the event's subset.

The empty set $\emptyset$ and the entire sample space are events for every sample space.

Combine simple events to create more complex events:

$A\cup B$ is the set of outcomes which belong to $A$, to $B$, or to both. Is read as union. Can be thought of as "or". Example: The event $A\cup B$ has happened if $A$ or $B$ has happened, or both.
$A\cap B$ is the set of outcomes which belong to both $A$ and $B$. Is read as intersection. Can be thought of as "and". Example: The event $A\cap B$ has happened if both $A$ and $B$ have happened.
$A^c$, the complement of $A$, is the set of outcomes which are not within $A$. So it is everything that isn't $A$.
Mutually exclusive events are events which cannot happen together.

Example: coin toss coming up heads or tails.

In general, events $A_1,...,A_n$ are mutually exclusive if no two of them have any outcomes in common (no overlap).
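Events map directly onto Python sets, so the operations above can be sketched with the die-roll sample space. The event `B` ("roll is greater than 3") and the helper `happened` are hypothetical, added only for illustration:

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for a die roll
A = {2, 4, 6}            # event: roll is even
B = {4, 5, 6}            # hypothetical event: roll is greater than 3

union = A | B            # A or B (or both)
intersection = A & B     # both A and B
A_c = S - A              # complement of A within S

def happened(event, outcome):
    """An event has happened if the outcome is within the event's subset."""
    return outcome in event

print(union, intersection, A_c)
print(happened(A, 4), happened(A, 5))

# Mutually exclusive events share no outcomes.
print(A.isdisjoint(A_c), A.isdisjoint(B))
```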

Probability

measures how likely an event is to occur.
proportion of times an event would occur over many runs of the experiment.
These are denoted, for some event $A$, as $P(A)$.
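The "proportion over many runs" idea can be sketched by simulation. This assumes a fair six-sided die and uses an arbitrary seed and trial count; the observed proportion for the event "roll is even" should settle near 0.5:

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable

event_even = {2, 4, 6}   # event A: roll is even
trials = 100_000

# Run the experiment many times and count how often the event happens.
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event_even)
proportion = hits / trials  # long-run estimate of P(A)

print(proportion)
```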

Axioms of Probability

- Let $S$ be a sample space. Then $P(S)=1$.
- For any event $A$, $0\leq P(A)\leq 1$.
- If $A$ and $B$ are mutually exclusive, then $P(A\cup B) = P(A) + P(B)$.
- If $A_1,...,A_n$ are mutually exclusive, then $P(A_1 \cup \cdots \cup A_n) = P(A_1) + \cdots + P(A_n)$.
Example: $P(\text{too short}) + P(\text{too long}) = P(\text{too short or too long})$.

If they weren't mutually exclusive, adding the probabilities would double count the Venn diagram overlap.
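A sketch of the additivity axiom and the double-counting issue, assuming a fair die so that every outcome is equally likely (that assumption, the helper `prob`, and the specific events are illustrative). Exact fractions avoid rounding noise:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # die-roll sample space

def prob(event):
    """P(event), assuming all outcomes in S are equally likely (fair die)."""
    return Fraction(len(event & S), len(S))

# Mutually exclusive events: probabilities simply add.
A, B = {1, 2}, {5, 6}
print(prob(A | B), prob(A) + prob(B))

# Overlapping events: the plain sum counts the overlap {4, 6} twice,
# so the overlap has to be subtracted once.
C, D = {2, 4, 6}, {4, 5, 6}
print(prob(C | D), prob(C) + prob(D), prob(C) + prob(D) - prob(C & D))
```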