PSY317L & PSY120R Textbook
Chapter 15
Analyzing Categorical Data
This chapter explores how to analyze categorical data using chi-squared tests and Fisher's exact test.
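As a rough preview of what these two tests look like in R, here is a minimal sketch using the base R functions `chisq.test()` and `fisher.test()`. The table of counts, the object name `counts`, and the group and outcome labels are all made up for illustration; they are assumptions and do not come from any dataset in this book.

```r
# Hypothetical 2x2 table of counts (made-up numbers, for illustration only)
counts <- matrix(c(12,  8,
                    5, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group   = c("A", "B"),
                                 outcome = c("yes", "no")))

# Chi-squared test of independence.
# For 2x2 tables, chisq.test() applies Yates' continuity correction by default.
chisq_result <- chisq.test(counts)

# Fisher's exact test on the same table.
fisher_result <- fisher.test(counts)

# Each result is a list-like object; the p-value can be pulled out directly.
chisq_result$p.value
fisher_result$p.value
```

Both functions accept a matrix or table of counts, so the same object can be passed to each. Fisher's exact test is usually preferred when some expected cell counts are small.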