1.1 - Sampling
You draw a sample from a population to infer the abundance of some trait among the pop.
Random sampling is best:
simple random sampling: a subset of $n$ members from a population.
- each of the $n$ members is chosen from the pop randomly
- each of the $n$ members has equal probability of being chosen to be in the subset
- each subset of $n$ members is equally likely to be chosen to be the sample
Example: Lottery, random # generator
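A minimal sketch of simple random sampling in Python, assuming a made-up population of 100 labeled members; `random.sample` draws without replacement, so every subset of size $n$ is equally likely:

```python
import random

population = list(range(1, 101))   # hypothetical population of 100 labeled members
n = 10                             # sample size

# draw n distinct members; each size-n subset is equally likely to be the sample
sample = random.sample(population, n)
print(sample)
```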
Sample of convenience: Removing heavy blocks from the top of a pile for testing, rather than throughout the pile.
sample variation:
- different samples vary
- a sample doesn't perfectly reflect the population.
independent: items in a sample are independent if knowing their values does not help predict those of the others.
- when sampling without replacement, items can be treated as approximately independent if the sample is < 5% of the total pop.
1.2 - Summary Statistics
$X_1, X_2, ..., X_n$ is a sample.
Sample mean: $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$
Sample variance: $s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 = \frac{1}{n-1}\left(\sum_{i=1}^nX_i^2-n\bar{X}^2\right)$
Sample standard deviation: $s=\sqrt{s^2}$
*Purpose is to estimate the spread of the population about the pop mean $\mu$.
- so ideally you'd compute deviations of all pop members about $\mu$
- deviations about $\bar{X}$ tend to be smaller than deviations about $\mu$; dividing by $n-1$ instead of $n$ provides the appropriate correction.
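A quick sketch of these formulas in plain Python (the data values are invented for illustration):

```python
import math

X = [4.2, 5.1, 4.8, 5.0, 4.6]    # hypothetical sample
n = len(X)

xbar = sum(X) / n                                                  # sample mean
s2 = sum((x - xbar) ** 2 for x in X) / (n - 1)                     # sample variance
s2_shortcut = (sum(x ** 2 for x in X) - n * xbar ** 2) / (n - 1)   # shortcut form
s = math.sqrt(s2)                                                  # sample std dev

print(xbar, s2, s2_shortcut, s)
```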
For $Y_i=aX_i+b$,
$\bar{Y}=a\bar{X}+b$
$s_Y^2=a^2s_X^2$, $s_Y=|a|s_X$
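A small numerical check of these transformation rules, reusing the hypothetical sample above with arbitrarily chosen $a$ and $b$:

```python
import statistics

X = [4.2, 5.1, 4.8, 5.0, 4.6]    # same hypothetical sample
a, b = 2.5, -1.0                  # arbitrary constants
Y = [a * x + b for x in X]

# Y-bar = a*X-bar + b   and   s_Y^2 = a^2 * s_X^2
print(statistics.mean(Y), a * statistics.mean(X) + b)
print(statistics.variance(Y), a ** 2 * statistics.variance(X))
```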
Note on Variance
Let's say you have an entire population; you can calculate its variance using $N$ (the population size) in the denominator.
Let's say you have a sample and want the variance of that sample itself, treated as its own complete set. Use $n$ in the denominator.
Both of these are true variances about a true mean, whether of the whole pop or just the sample.
If you have just a sample but really want to know what the pop mean and variance are, then you'll need to estimate them. The pop variance is best estimated from deviations about the sample mean, using $n-1$ in the denominator instead of $n$.
Essentially it comes down to this: "sample variance" uses $n-1$, and that's just what is done. You don't use $n$, no matter how much sense it might seem to make, unless you definitely have the entire population.
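The two denominators side by side in numpy (data invented; `ddof=0` divides by $n$, `ddof=1` by $n-1$):

```python
import numpy as np

x = np.array([4.2, 5.1, 4.8, 5.0, 4.6])   # hypothetical data

var_of_these_values = x.var(ddof=0)   # divide by n: variance of exactly this set
var_estimate_of_pop = x.var(ddof=1)   # divide by n-1: estimate of the pop variance

print(var_of_these_values, var_estimate_of_pop)
```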
- median is often used in the presence of outliers.
SE mean: Standard error of the mean: $\frac{s}{\sqrt{n}}$, $n=$ sample size.
frequency: # of times category type appears in sample.
sample proportion (relative freq): $\frac{\text{freq}}{n}$
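A small sketch of these summaries (the category data and measurements are made up):

```python
from collections import Counter
import math
import statistics

categories = ["pass", "pass", "fail", "pass", "fail", "pass"]   # hypothetical sample
n = len(categories)

freq = Counter(categories)                          # frequency of each category
proportions = {k: v / n for k, v in freq.items()}   # sample proportions (rel freq)

measurements = [4.2, 5.1, 4.8, 5.0, 4.6]
se_mean = statistics.stdev(measurements) / math.sqrt(len(measurements))   # s / sqrt(n)

print(freq, proportions, se_mean)
```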
Summaries like all of the above are called statistics when computed on a sample, and population parameters when computed on an entire pop.
Histograms
subranges containing data are called class intervals
rel frequency: $\frac{\text{freq}}{n}$
density = $\frac{\text{rel freq}}{\text{class width}}$
for unequal class widths, the y-axis must be density.
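A sketch of the density calculation for unequal class intervals (data and class boundaries invented); `numpy.histogram` with `density=True` gives the same values:

```python
import numpy as np

data = np.array([1.2, 1.9, 2.3, 2.8, 3.1, 3.4, 4.7, 6.5, 8.0, 9.1])   # hypothetical
edges = np.array([1.0, 2.0, 3.0, 5.0, 10.0])    # unequal class intervals

freq, _ = np.histogram(data, bins=edges)
rel_freq = freq / data.size
density = rel_freq / np.diff(edges)             # density = rel freq / class width

print(density)
print(np.histogram(data, bins=edges, density=True)[0])   # matches
```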
Skew:
- right (positive) skew: long tail to the right; the mean is pulled above the median.
- left (negative) skew: long tail to the left; the mean is pulled below the median.
- symmetric: mean ≈ median.
mean ≈ center of mass of the histogram
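A tiny illustration of a right skew pulling the mean toward the tail (the data are invented, with one large value on the right):

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]    # long tail to the right
print(statistics.mean(right_skewed))         # 4.75, pulled toward the tail
print(statistics.median(right_skewed))       # 3.0, resistant to the outlier
```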
Probability
experiment: a process whose outcome cannot be predicted with certainty.
example: coin toss, die roll, measure a bolt diameter, weigh cereal box contents.
sample space: set of all possible outcomes of experiment.
Example:
- coin toss: {heads, tails}
- die roll: {1,2,3,4,5,6}
- hole punch in metal: hole punch is 10 mm diameter but variations in metal and angle cause the hole diameter to vary between (10, 10.2). $\{x|10.0<x<10.2\}$
!! Choice of sample space:
Example:
- item length should be 5 but varies between 4 and 6. Sample space option 1: $\{x|4<x<6\}$
- option 2: if you only want to know whether the item is good or not, consider {too long, too short, just fine}
event: subset of a sample space.
Example:
- die roll has sample space {1,2,3,4,5,6}; evens has subset {2,4,6}.
- hole punch (above) with holes <10.1 has subset $\{x|10.0<x<10.1\}$
An event has happened if the experiment outcome is within the event's subset.
The empty set $\emptyset$ and the entire sample space are events for all sample spaces.
Combine simple events to create more complex events:
- $A\cup B$ is the set of outcomes for which $A$ or $B$, or both $A$ and $B$ together, have occurred. Read as "union". Can be thought of as "or". Example: the event $A\cup B$ has happened if $A$ or $B$ has happened, or both.
- $A\cap B$ is the set of outcomes which belong to both $A$ and $B$. Read as "intersection". Can be thought of as "and". Example: the event $A\cap B$ has happened if both $A$ and $B$ have happened.
- $A^c$, the complement of $A$, is the set of outcomes which are not within $A$. So it is everything that isn't $A$.
- Mutually exclusive events are events which cannot happen together.
Example: a coin toss coming up heads or tails.
In general, events $A_1,...,A_n$ are mutually exclusive if no two have any outcomes in common (no overlap).
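These set operations for the die-roll sample space, sketched with Python sets (the event choices are just for illustration):

```python
S = {1, 2, 3, 4, 5, 6}    # sample space for a die roll
A = {2, 4, 6}             # event: roll is even
B = {4, 5, 6}             # event: roll is greater than 3

print(A | B)              # union A ∪ B: {2, 4, 5, 6}
print(A & B)              # intersection A ∩ B: {4, 6}
print(S - A)              # complement A^c: {1, 3, 5}
print(A & {1, 3, 5})      # evens and odds are mutually exclusive: set()
```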
Probability
- a measure of how likely an event is to occur.
- the proportion of times an event would occur over many runs of the experiment.
- denoted, for some event $A$, as $P(A)$.
Axioms of Probability
- Let $S$ be a sample space; $P(S)=1$
- For any event $A$, $0\leq P(A)\leq 1$
- If $A$ and $B$ are mutually exclusive, then $P(A\cup B) = P(A) + P(B)$. More generally, if $A_1,...,A_n$ are mutually exclusive, then $P(A_1 \cup \cdots \cup A_n) = P(A_1) + \cdots + P(A_n)$.
Example: $P(\text{too short}) + P(\text{too long}) = P(\text{either is true})$.
If they weren't mutually exclusive, there would be double counting of the Venn diagram overlap.
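A quick numerical check of the addition rule for the die roll, assuming equally likely outcomes so that $P(E)=|E|/|S|$:

```python
S = {1, 2, 3, 4, 5, 6}

def P(E):
    return len(E) / len(S)    # equally likely outcomes

A = {1, 2}     # roll is 1 or 2
B = {6}        # roll is 6; mutually exclusive with A

print(P(A | B), P(A) + P(B))   # both 0.5: P(A ∪ B) = P(A) + P(B)

C = {2, 4, 6}                  # shares outcome 2 with A, so not mutually exclusive
print(P(A | C), P(A) + P(C))   # 0.667 vs 0.833: the overlap gets double counted
```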