KB-Statistics

Statistics Knowledge Bank


1.1 - Sampling

You draw a sample from a population to infer the abundance of some trait among the population.

Random sampling is best:

simple random sample: a subset of $n$ members drawn from a population such that

  1. each of the $n$ members is chosen from the population at random

    - each member of the population has an equal probability of being chosen for the subset.

  2. each subset of $n$ members is equally likely to be chosen as the sample

Example: Lottery, random # generator
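As a minimal sketch of the random number generator approach, Python's standard `random.sample` draws $n$ distinct members so that every subset of size $n$ is equally likely. The population of 100 numbered items, the sample size, and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical population: items numbered 1..100 (illustrative only).
population = list(range(1, 101))

rng = random.Random(0)  # fixed seed so the draw is repeatable
n = 10                  # assumed sample size

# Draw n distinct members without replacement;
# every subset of size n is equally likely to be chosen.
sample = rng.sample(population, n)
print(sample)
```

Drawing the first 10 items of the list instead would be a sample of convenience, not a random sample.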

Sample of convenience: Removing heavy blocks from the top of a pile for testing, rather than throughout the pile.

sample variation:

  • different samples vary
  • a sample doesn't perfectly reflect the population.

independent: items in a sample are independent if knowing the values of some of them does not help predict the values of the others.

  • when sampling without replacement, items can be treated as independent if the sample is less than 5% of the total population.


1.2 - Summary Statistics

Sample mean

$\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$

$X_1, X_2, ..., X_n$ is a sample.

Sample Variance

$s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 = \frac{1}{n-1}(\sum_{i=1}^nX_i^2-n\bar{X}^2)$

Sample standard deviation: $s=\sqrt{s^2}$
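The formulas above can be checked numerically. The sketch below uses made-up data (an assumption, not from these notes) and computes the sample variance both from the definition and from the shortcut form, then compares with the standard library's `statistics.variance`, which also divides by $n-1$.

```python
import math
import statistics

# Hypothetical sample values (illustrative only).
x = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(x)

# Sample mean.
x_bar = sum(x) / n

# Sample variance from the definition: squared deviations about x_bar, divided by n - 1.
s2_def = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Sample variance from the shortcut form: (sum of squares - n * x_bar^2) / (n - 1).
s2_short = (sum(xi ** 2 for xi in x) - n * x_bar ** 2) / (n - 1)

# Sample standard deviation.
s = math.sqrt(s2_def)

print(x_bar, s2_def, s2_short, s)
print(statistics.variance(x))  # library value, also uses n - 1
```

The two forms agree up to floating-point rounding; the shortcut form can lose precision when the mean is large relative to the spread.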

*Purpose is to estimate the spread of the population about the population mean $\mu$.

-so ideally we would compute the deviations of all population members about $\mu$.

-deviations about $\bar{X}$ tend to be smaller than deviations about $\mu$; dividing by $n-1$ instead of $n$ provides the appropriate correction.

For $Y_i=aX_i+b$,

$\bar{Y}=a\bar{X}+b$

$s_Y^2=a^2s_X^2$, $s_Y=|a|s_X$
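A quick numerical check of these identities, as a sketch: the data and the coefficients $a=-2$, $b=5$ are assumptions chosen for illustration (a negative $a$ shows why the absolute value is needed for $s_Y$).

```python
import math
import statistics

x = [1.0, 3.0, 4.0, 8.0]   # hypothetical sample
a, b = -2.0, 5.0           # hypothetical coefficients

# Transform every observation: Y_i = a * X_i + b.
y = [a * xi + b for xi in x]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
var_x, var_y = statistics.variance(x), statistics.variance(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

print(mean_y, a * mean_x + b)   # the mean shifts and scales
print(var_y, a ** 2 * var_x)    # the variance scales by a^2; b has no effect
print(sd_y, abs(a) * sd_x)      # the standard deviation scales by |a|
```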

Note on Variance

If you have an entire population of size $N$, you can calculate its variance using $N$ in the denominator.

If you have a sample and only want the variance of those $n$ values themselves, use $n$ in the denominator.

Both of these are true variances about a true mean, whether of the whole population or of just the sample.

If you have just a sample but would really like to know what the population mean and variance are, then you'll need to estimate them. The population variance is best estimated from the sample using $n-1$ in the denominator instead of $n$.

Essentially, "sample variance" uses $n-1$ by convention. You don't use $n$, no matter how much sense it might seem to make, unless you definitely have the entire population.
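The effect of the denominator can be seen in a small simulation. This is only a sketch under stated assumptions: the population (10,000 normal values with mean 50 and standard deviation 10), the sample size of 5, the number of trials, and the seed are all invented for illustration.

```python
import random

rng = random.Random(42)  # fixed seed for repeatability

# Hypothetical population (illustrative only).
population = [rng.gauss(50, 10) for _ in range(10_000)]
N = len(population)
mu = sum(population) / N

# True population variance: the whole population is known, so divide by N.
pop_var = sum((v - mu) ** 2 for v in population) / N

n, trials = 5, 20_000
sum_div_n = 0.0
sum_div_n1 = 0.0
for _ in range(trials):
    sample = rng.sample(population, n)
    x_bar = sum(sample) / n
    ss = sum((v - x_bar) ** 2 for v in sample)  # squared deviations about x_bar
    sum_div_n += ss / n
    sum_div_n1 += ss / (n - 1)

# Average each estimator over many samples.
avg_div_n = sum_div_n / trials
avg_div_n1 = sum_div_n1 / trials

print("population variance:", pop_var)
print("average with n     :", avg_div_n)
print("average with n - 1 :", avg_div_n1)
```

Averaged over many samples, the $n$ version comes out noticeably below the population variance, while the $n-1$ version lands close to it.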


-median is often used in presence of outliers.
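A small sketch of why: with made-up data (an assumption for illustration), a single outlier moves the mean a long way but leaves the median unchanged.

```python
import statistics

# Hypothetical data (illustrative only).
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 100]  # last value replaced by an outlier

mean_clean = statistics.mean(clean)
median_clean = statistics.median(clean)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_clean, median_clean)      # both equal 12
print(mean_outlier, median_outlier)  # the mean jumps to 29.2, the median stays at 12
```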

SE mean: Standard error of the mean: $\frac{s}{\sqrt{n}}$, $n=$ sample size.
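As a minimal sketch with invented measurements (an assumption for illustration), the standard error of the mean follows directly from the sample standard deviation:

```python
import math
import statistics

# Hypothetical measurements (illustrative only).
x = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2]
n = len(x)

s = statistics.stdev(x)     # sample standard deviation (n - 1 denominator)
se_mean = s / math.sqrt(n)  # standard error of the mean

print(s, se_mean)
```

Because of the $\sqrt{n}$ in the denominator, quadrupling the sample size roughly halves the standard error, all else being equal.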

frequency: # of times category type appears in sample.

sample proportion (relative freq): $\frac{\text{freq}}{n}$

Summaries like all of the above are called statistics when computed on a sample, and population parameters when computed on an entire population.

Histograms

subranges containing data are called class intervals

rel frequency: $\frac{\text{freq}}{n}$

density = $\frac{\text{rel freq}}{\text{class width}}$

for unequal width, the y-axis must be density.
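The sketch below works through frequency, relative frequency, and density for class intervals of unequal width. The data and the interval edges are assumptions invented for illustration; intervals are taken as half-open, $[\text{lo}, \text{hi})$.

```python
# Hypothetical data and unequal class-interval edges (illustrative only).
data = [1, 2, 2, 3, 4, 6, 7, 9, 12, 18]
edges = [0, 2, 5, 10, 20]

n = len(data)
rows = []
for lo, hi in zip(edges[:-1], edges[1:]):
    freq = sum(lo <= v < hi for v in data)  # count of values in [lo, hi)
    rel_freq = freq / n
    density = rel_freq / (hi - lo)          # rel freq divided by class width
    rows.append((lo, hi, freq, rel_freq, density))

for lo, hi, freq, rel_freq, density in rows:
    print(f"[{lo}, {hi}): freq={freq}, rel freq={rel_freq:.2f}, density={density:.3f}")

# Each bar's area is density * width = rel freq, so the areas sum to 1.
total_area = sum(density * (hi - lo) for lo, hi, _, _, density in rows)
print(total_area)
```

Plotting density makes bar area, rather than bar height, represent the share of the data, so wide classes are not visually exaggerated.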

Skew:

Symmetric
Left Skew
Right Skew

mean $\approx$ center of mass of the histogram


Probability

experiment: a process whose outcome cannot be predicted with certainty.

example: coin toss, die roll, measuring a bolt diameter, weighing the contents of a cereal box.

sample space: set of all possible outcomes of experiment.

Example:

  • coin toss: {heads, tails}
  • die roll: {1,2,3,4,5,6}
  • hole punch in metal: the punch is 10 mm in diameter, but variations in the metal and the punch angle cause the hole diameter to vary between 10.0 and 10.2 mm: $\{x|10.0<x<10.2\}$

!! Choice of sample space:

Example:

  • item length should be 5 but varies between 4 and 6. Sample space 1) $\{x|4<x<6\}$
  • if you only want to know whether the item is acceptable or not, consider sample space 2) {too long, too short, just fine}

event: subset of a sample space.

Example:

  • die roll has sample space {1,2,3,4,5,6}; the event "roll is even" is the subset {2,4,6}.
  • hole punch (above): the event "hole diameter is less than 10.1" is the subset $\{x|10.0<x<10.1\}$

An event has happened if the experiment outcome is within the event's subset.

The empty set $\emptyset$ and the entire sample space are events for every sample space.


Combine simple events to create more complex events:

  • $A\cup B$ is the set of outcomes that belong to $A$, to $B$, or to both. It is read as "union" and can be thought of as "or". Example: the event $A\cup B$ has happened if $A$ or $B$ has happened, or both.

  • $A\cap B$ is the set of outcomes that belong to both $A$ and $B$. It is read as "intersection" and can be thought of as "and". Example: the event $A\cap B$ has happened if both $A$ and $B$ have happened.

  • $A^c$, the complement of $A$, is the set of outcomes that are not in $A$. So it is everything that isn't $A$.

  • Mutually exclusive events are events that cannot happen together.

    Example: a coin toss coming up heads or tails.

    In general, events $A_1,...,A_n$ are mutually exclusive if no two of them have any outcomes in common (no overlap).
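Events and their combinations map directly onto Python's built-in `set` type. The sketch below uses the die-roll sample space from earlier; the event `B` ("roll is at most 3") and the helper `occurred` are hypothetical, added only for illustration.

```python
# Sample space for one roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}  # event: roll is even
B = {1, 2, 3}  # hypothetical event: roll is at most 3

union = A | B          # A or B (or both): {1, 2, 3, 4, 6}
intersection = A & B   # both A and B: {2}
complement_A = S - A   # everything in S that is not in A: {1, 3, 5}

# Two events are mutually exclusive when they share no outcomes.
a_and_b_exclusive = A.isdisjoint(B)               # False, they share the outcome 2
a_and_ac_exclusive = A.isdisjoint(complement_A)   # True


def occurred(event, outcome):
    """An event has happened if the outcome lies in the event's subset."""
    return outcome in event


print(union, intersection, complement_A)
print(a_and_b_exclusive, a_and_ac_exclusive)
print(occurred(A, 4), occurred(A, 5))
```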


Probability

  • measures how likely an event is to occur.

  • the proportion of times an event would occur over many runs of the experiment.

  • for some event $A$, it is denoted $P(A)$.

Axioms of Probability

  1. Let $S$ be a sample space; $P(S)=1$

  2. For any event $A$, $0\leq P(A)\leq 1$

  3. If $A$ and $B$ are mutually exclusive, then $P(A\cup B) = P(A) + P(B)$

    • More generally, if $A_1,...,A_n$ are mutually exclusive, then $P(A_1 \cup \cdots \cup A_n) = P(A_1) + \cdots + P(A_n)$.

    Example: $P(\text{too short}) + P(\text{too long}) = P(\text{too short or too long})$.

    If the events weren't mutually exclusive, simply adding the probabilities would double count the overlap in the Venn diagram.
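The additivity rule and the double-counting problem can be illustrated with a fair die, as a sketch. Equally likely outcomes are an assumption here, and the events and the helper `prob` are hypothetical. The correction by subtracting $P(A\cap C)$ is the standard general addition rule, which these notes do not state explicitly.

```python
from fractions import Fraction

# Sample space for a fair six-sided die (equally likely outcomes assumed).
S = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


A = {2, 4, 6}  # roll is even
B = {1, 3}     # hypothetical event with no outcomes in common with A
C = {1, 2, 3}  # hypothetical event that overlaps A at the outcome 2

# Mutually exclusive: the probabilities simply add.
exclusive_sum = prob(A) + prob(B)
exclusive_union = prob(A | B)

# Overlapping: adding counts the shared outcome twice.
naive_sum = prob(A) + prob(C)                 # 1, which is too large
actual = prob(A | C)                          # 5/6
corrected = prob(A) + prob(C) - prob(A & C)   # subtract the overlap once

print(exclusive_sum, exclusive_union)
print(naive_sum, actual, corrected)
```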