# Intro to Statistical Inference

1. The difference between populations and samples
1. Statistics and parameters
2. Using probability sampling to infer population parameters
2. How to sample
1. Simple random sample
2. Stratified samples
3. Systematic samples
3. Population, sample and sampling distributions
1. Sampling distribution defined – the distribution of a statistic (such as the mean) across all possible samples of size n from a population
2. Population, sample and sampling distribution means
1. The standard error is the standard deviation of the sampling distribution
3. Population, sample and sampling distribution standard deviations
4. The central limit theorem – with a sufficiently large sample size (50 plus) the sampling distribution of the mean approaches a normal distribution
5. Using sampling distributions
1. The confidence interval
1. Defined – estimating a population mean or proportion from the sample estimate
2. The notion of a confidence level – how sure you are that the interval contains the population value; its complement is the risk of being wrong
3. How sample size affects the width of the confidence interval
2. Calculations
1. Select the confidence level – usually 95 or 99
2. Look up the z score corresponding to your chosen confidence level
3. Calculate the standard error
4. Multiply the standard error by the z score corresponding to your chosen confidence level
5. Add the result from step 4 to, and subtract it from, your sample mean to get the confidence interval
6. Using SPSS to construct confidence intervals
7. ArcGIS
1. The elements of a map
1. GEODATABASE – a folder with the extension .gdb that stores the related elements needed to make maps
2. It stores the geographic references and the attribute data needed to create a map
3. FEATURE CLASS – basically a layer. It includes attribute data and the locational aspects necessary to map it (e.g. longitude and latitude).
4. BASEMAP – a collection of background geographic information and related features (e.g. streets) that serves as the background to the map.
5. Types of GIS MAPS
1. Vector – consists of points, lines and polygons
2. Raster – consists of cells (think of pixels) used to show, e.g., topography
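
The confidence interval calculation steps listed above (choose a confidence level, look up the z score, calculate the standard error, multiply, then add and subtract) can be sketched in Python. This is only an illustration and not the SPSS procedure: the function name `confidence_interval` and the simulated sample are made up for the example, and the standard error is assumed to be the sample standard deviation divided by the square root of n.

```python
import math
import random
import statistics
from statistics import NormalDist


def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds for the population mean using a z score."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)

    # Step 2: z score that leaves (1 - confidence) / 2 in each tail,
    # i.e. what you would look up in a z table.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Step 3: standard error (assumed here: sample std. dev. / sqrt(n)).
    standard_error = statistics.stdev(sample) / math.sqrt(n)

    # Step 4: multiply the standard error by the z score.
    margin = z * standard_error

    # Step 5: add to and subtract from the sample mean.
    return sample_mean - margin, sample_mean + margin


# Simulated sample of 200 cases (hypothetical data, for illustration only).
random.seed(0)
sample = [random.gauss(100, 15) for _ in range(200)]

low95, high95 = confidence_interval(sample, 0.95)
low99, high99 = confidence_interval(sample, 0.99)

print(f"95% interval: {low95:.2f} to {high95:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")
```

For the same sample, the 99 percent interval comes out wider than the 95 percent interval: lowering the risk of being wrong costs precision.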

## Homework

Chapter 8 spss problem 1, chapter exercise problem 10

## Lecture

1. Measures of Variability
1. Index of Qualitative Variability – used for nominal data
2. The range – used for interval data – simply the largest value minus the lowest value
1. The interquartile range – the difference between the cutpoint above which 25 percent of the cases fall and the cutpoint below which 25 percent of the cases fall
2. The boxplot shows the range, the interquartile range and the median
3. The variance and standard deviation – measures of how closely scores cluster around the mean or spread out from it. Used for interval data. The variance is the average squared deviation of scores from the mean
1. The standard deviation is calculated from the variance. You simply take the square root of the variance.
2. The normal curve
1. Symmetric around the middle
2. Mean, median and mode are the same
3. The concept of Z scores
1. Used to find area under the normal curve
2. Equals the number of standard deviations any score is from the mean
1. Calculations – subtract the mean from the score and divide by the standard deviation
2. Finding the percent of cases – the same as the percent area – between any point and the mean
3. Using the z table
4. Going from percents to z scores
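
The z score calculations above can be sketched with Python's standard library, where `NormalDist` stands in for the printed z table. The function names and the example score, mean and standard deviation are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_score(score, mean, sd):
    """Number of standard deviations a score lies from the mean."""
    return (score - mean) / sd


def percent_between_mean_and(z):
    """Percent of cases (area under the curve) between the mean and z."""
    return abs(STANDARD_NORMAL.cdf(z) - 0.5) * 100


def z_for_percent_below(percent):
    """Go from a percent of cases below a point back to its z score."""
    return STANDARD_NORMAL.inv_cdf(percent / 100)


# Hypothetical example: a score of 130 where the mean is 100 and the sd is 15.
z = z_score(130, 100, 15)  # (130 - 100) / 15 = 2.0
print(f"z = {z}")
print(f"percent between mean and z: {percent_between_mean_and(z):.2f}")
print(f"z with 97.5 percent of cases below it: {z_for_percent_below(97.5):.2f}")
```

Because the curve is symmetric, `percent_between_mean_and` gives the same answer for a z score and its negative, which is how a one-sided z table is used.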

## Homework

Assignment, Chapter 4 Exercise 2, Chapter 5 Exercises 2,4

## USP493 Session 3 Notes

1. Graphic presentation of data
1. For nominal and ordinal data you can do a pie chart or a bar chart
2. For grouped interval data you can construct a histogram
3. For grouped or ungrouped interval data you can construct line graphs
2. Measures of Central Tendency
1. Mode – can be used for nominal, ordinal and interval data. The most frequently occurring value (note that the mode is the value of the variable, not its frequency). A distribution with two modes is bimodal
2. Median – can be used for ordinal and interval data. The “middle” value: half the sample is higher and half lower
1. line up all the cases from the smallest to the largest
2. find the middle position (N+1)/2
1. if odd number of cases, it will be the value of that case
2. if even number of cases, it will be the average of the values of the cases on either side of it. For ordinal data, simply state it is between the two values.
3. Locating a median if you have a cumulative frequency distribution
3. The mean – can be used for interval data
1. calculating a mean from a frequency distribution
1. multiply each value by the number of cases that take on that value, add the products together, and divide by the number of cases as usual
4. Relationship between mode, median and mean
5. Which one should you use?
6. The shape of a distribution
1. Symmetrical – mean, median and mode are the same
2. Positive skew – there are some extreme high values
3. Negative skew – there are some extreme low values
3. Measures of Variability
1. Index of Qualitative Variability – used for nominal data
2. The range – used for interval data – simply the largest value minus the lowest value. The interquartile range is the difference between the cutpoint above which 25 percent of the cases fall and the cutpoint below which 25 percent of the cases fall. The boxplot shows the range, the interquartile range and the median.
3. The variance and standard deviation – measures of how closely scores cluster around the mean or spread out from it. Used for interval data. The variance is the average squared deviation of scores from the mean. The standard deviation is calculated from the variance: you simply take the square root of the variance.
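
The measures of central tendency and variability above can be sketched in Python as follows. The scores are hypothetical, and the helper names `median_by_position` and `mean_from_frequencies` are made up for this example. The variance here divides by N, matching the "average squared deviation" wording in these notes; many textbooks and packages divide by N − 1 for sample data, so hand results may differ slightly from software output.

```python
import statistics

# Hypothetical interval-level scores, used only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def median_by_position(values):
    """Median via the (N + 1) / 2 middle-position rule."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2       # 1-based middle position
    if position.is_integer():               # odd N: the value of that case
        return ordered[int(position) - 1]
    below = ordered[int(position) - 1]      # even N: average the two cases
    above = ordered[int(position)]          # on either side of the position
    return (below + above) / 2


def mean_from_frequencies(freq):
    """Mean from a {value: number of cases} frequency distribution."""
    total_cases = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / total_cases


# The same data as `scores`, written as a frequency distribution.
frequencies = {2: 1, 4: 3, 5: 2, 7: 1, 9: 1}

modes = statistics.multimode(scores)        # a list, so bimodal data shows both modes
median = median_by_position(scores)
mean = mean_from_frequencies(frequencies)

value_range = max(scores) - min(scores)     # largest value minus lowest value
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cutpoints
iqr = q3 - q1                               # interquartile range
variance = statistics.pvariance(scores)     # average squared deviation (divide by N)
std_dev = variance ** 0.5                   # square root of the variance

print(modes, median, mean)
print(value_range, iqr, variance, std_dev)
```

Quartile cutpoints can be computed by several conventions, so the interquartile range from `statistics.quantiles` may not match a hand calculation or SPSS exactly.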

## Introduction to Simple Univariate Descriptive Statistics

1. Frequencies and Percentage Distributions
1. Can be used for nominal and ordinal variables; for interval variables, usually after the data are collapsed into groups.
2. Difference between proportions (f/N) and percentages (f/N) × 100
3. How to set up
1. Put the values of the variable along the side, including totals, list frequencies or percentages in a second column
2. But what if you have interval data?
1. One way to group the data
1. Subtract the lowest value from the highest value
2. Divide the difference by 10; this gives you your approximate class interval (i.e. how large each group is)
3. Round what you found in step 2 so it is either an even number or a multiple of 5.
4. Establish the smallest interval so it starts with a multiple of the number you calculated in step 3
5. Group the data into these ranges
3. And what do you do with missing values?
1. If important, percentage them
2. Otherwise list in the bottom of the table
1. It will depend on your question. Often you compare several percentage distributions to each other.
5. Rates = the number of cases with the selected value divided by the population size
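
The frequency and percentage distribution setup and the grouping steps above can be sketched in Python. The data and the function names `percentage_distribution` and `group_into_intervals` are hypothetical. The rounding in step 3 is a judgment call, so the class width is passed in by hand rather than computed automatically, and the interval labels assume whole-number values.

```python
from collections import Counter


def percentage_distribution(values):
    """Map each value to (frequency, proportion, percentage)."""
    counts = Counter(values)
    n = len(values)
    return {
        value: (f, f / n, f / n * 100)      # f, f/N, (f/N) * 100
        for value, f in sorted(counts.items())
    }


def group_into_intervals(values, width):
    """Group whole-number interval data into classes of the given width.

    The smallest interval starts at a multiple of the width (step 4).
    Empty intervals are omitted from the result.
    """
    start = (min(values) // width) * width
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}


# Hypothetical nominal variable.
answers = ["yes", "no", "yes", "yes"]
for value, (f, proportion, percent) in percentage_distribution(answers).items():
    print(f"{value:<5}{f:>3}{proportion:>7.2f}{percent:>7.1f}")

# Hypothetical interval variable (ages in years).
ages = [18, 21, 22, 25, 31, 34, 35, 40, 47, 52, 58, 63, 67]
approximate_width = (max(ages) - min(ages)) / 10   # steps 1 and 2: 49 / 10 = 4.9
width = 5                                          # step 3: rounded by hand
for (low, high), f in group_into_intervals(ages, width).items():
    print(f"{low}-{high}: {f}")
```

Passing the width in explicitly keeps the rounding decision visible, which matters because different widths can change how the grouped distribution looks.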

SPSS: Introduction and constructing frequency distributions

## Assignment

(All in Social Statistics for a Diverse Society)
Chapter 2 SPSS problem 2, and chapter exercises problem 4
Notice that each chapter has hand calculation exercises and SPSS exercises

## Review

• Modeling – turning the research question into researchable form
• Selecting the attributes of interest
• Determining the causal paths
• Dependent vs independent variables
• Time order
• Logic/theory
• Moving towards a researchable hypothesis
• Specifying the unit of analysis
• Specifying the concepts
• More and less abstract concepts
• Conceptual and operational definitions
• Multi-dimensional concepts
• Permitting the ability to show association
• If there is a causal relationship then people who differ on the independent variable will differ, on the average, on the dependent variable
• If a person’s score on the independent variable changes, then their score on the dependent variable will change
• Over time – panel studies
• Across cases – cross-sectional studies
• Why correlation does not equal causation
• Spurious relationships (Sometimes called confounds)
• Variables and Values
• Level of measurement
• Nominal – just categories with no order to them
• Ordinal – the values have order but simply rank from low to high; they are not counting something
• Interval/ ratio – the values are counting real things
• Dichotomous – the variable only has two values
• The purpose of statistics
• Describing how cases vary on a single variable (descriptive statistics)
• Describing the nature of association/correlation/the relationship between two or more variables (descriptive statistics)
• Generalizing from a sample to a population (inferential statistics)

## Vocab

• Concept – a word that represents the similarities in otherwise diverse phenomena: classification by definition
• Variable – a measurable concept that takes on two or more values that are mutually exclusive and exhaustive. By mutually exclusive, I mean that each case (i.e. entity that is being measured) can be assigned to only one of the available values. By exhaustive, I mean that every case can be assigned to a value.
• Independent variable – the variable or variables that effect (that is, “cause” or explain) changes in the dependent variable. The x’s.
• Dependent variable – the variable that is affected by (is explained or “caused” by) the independent variables. The y’s.
• Hypothesis — a testable statement predicting a relationship between two or more variables
• Case – one unit as defined by the unit of analysis
• Unit of analysis — the entity, generally described, about which data are gathered

## Homework

For each of the following examples, indicate whether it involves the use of descriptive or inferential statistics. Justify your answer.

1. The number of unemployed people in the United States
• Inferential statistics: these numbers are estimates based on data about changes in initial unemployment claims and other data.
2. Determining students’ opinion about the quality of cafeteria food based on a sample of 100 students
• Sampling part of a population to make conclusions about the larger population is an inferential method.
3. The national incidence of breast cancer among Asian women
• This is another inferential statistic based on data about a sample of the population.
4. Conducting a study to determine the rating of the quality of a new smartphone gathered from 1000 new buyers.
• This is another inferential statistic based on data about a sample of the population.
5. The average GPA of various majors (e.g. sociology, psychology, English) at your university
• This is a descriptive statistic since all the data is available for analysis.
6. The change in the number of immigrants coming to the United States from Southeast Asian countries between 2010 and 2015
• This is a descriptive statistic since all the data is available for analysis.

Construct measures of political participation at the nominal, ordinal and interval/ratio levels

• Nominal: Party affiliation
• Ordinal: Political right/left spectrum alignment
• Interval/Ratio: Voter turnout