Statistics Revisited

This was not my first statistics course: several years ago, I studied many of the concepts outlined in our directed studies course, Quantitative Methods of Ed Tech (EDCI 690). However, statistics is not something I use on a daily basis, and the concepts included in this course are not basic. So, with that in mind, I spent much of my August re-reading notes and studying three statistics textbooks to re-learn what I had forgotten.

Before I highlight some of the key terms and concepts that I studied in preparation for Quantitative Methods of Ed Tech, I should begin by listing the texts I read to prepare for this course.

The first text I revisited was Statistics for Business & Economics (Revised, 13th Edition). This text provided an excellent refresher on basic to intermediate statistics, as well as many of the manual calculations necessary to understand how values were arrived at (e.g., sum of squares). The text also provided data files that can be opened and analyzed using the add-on Analysis ToolPak in Excel.

The second text I turned to is one I purchased for ED-D 560, a beginner statistics course, before the class was cancelled in January 2021: Fundamental Statistics for the Behavioral Sciences (9th Edition) by David C. Howell. Howell’s text was invaluable, and I ended up reading it alongside the required course text. Howell has a way of explaining statistical concepts that makes them not only accessible, but also easier to remember. Further, he includes both SPSS and R examples. If I were recommending a statistics textbook, I wouldn’t hesitate to recommend Howell as an author to start with.

Finally, I turned to the required course text: Andy Field’s Discovering Statistics Using IBM SPSS Statistics (5th Edition). Unfortunately, I found it difficult to use on its own. Field’s way of weaving story into the concept descriptions was distracting and less than helpful to me. As such, I scanned the beginning chapters of Discovering Statistics while focusing on Howell for my initial review of statistics. However, despite finding the text difficult, I will say that Field has compiled a very thorough companion website. For example, the chapter-by-chapter resources for Field’s text were very useful: https://edge.sagepub.com/field5e2/chapter-specific-resources/1.

Now that I have outlined my resources, I will highlight some of the key terms and concepts that are necessary to understand before starting Quantitative Methods of Ed Tech based on Field’s text (chapters 8-15).

Key Terms

  • Statistics: a set of procedures and rules, and the outcome of applying those procedures and rules to sample data
  • Descriptive Statistics: simply describe the set of data at hand (i.e., what the data say about a phenomenon)
  • Inferential Statistics: use statistics, which are measures on a sample, to infer the values of parameters, which are measures on a population. In other words, we draw a sample of observations from a population and use that sample to infer something about the characteristics of the population
  • Variability: the degree to which scores on the thing we want to measure differ from one another; assessing this variability is critical to statistics
  • Statistics vs. Parameters: basically, the former summarize data in a sample, while the latter are characteristics of a population
  • Scales of Measurement: nominal (names/labels things), ordinal (orders/ranks things), interval (equal intervals represent equal distances), and ratio (has a true zero point and allows us to use phrases such as “1/2 as much”)
  • Variables: properties of objects or events that can take on different values (discrete and continuous)
    • Independent Variable: both quantitative and qualitative, those that are manipulated by the researcher
    • Dependent Variable: usually quantitative, those that are not under the researcher’s control (i.e., the data)
  • Mean (x-bar or µ): a more stable or less variable estimate of the central tendency of a population. However, it is influenced by extreme scores
  • Standard Error: refers to how variable the mean would be over repeated samples
  • Variance (s² or σ²): essentially the average of the squared distances between each observation and the sample mean (squaring gets rid of the signs). The sample variance gives an unbiased estimate of the population variance
  • Standard Deviation (s or σ): the positive square root of the variance. Basically, a measure of the average deviation of each score from the mean
  • Bias: a biased sample statistic is one whose long-range average is not equal to the population parameter it is supposed to estimate
  • df (N-1): is referred to as the degrees of freedom. It represents an adjustment to the sample size to account for the fact that we are working with sample values
  • Sampling Distributions: the distribution of sample statistics that we would see if we calculated some statistic from multiple samples from some population
  • Sampling Error: the variability of a statistic from one sample to another. This refers to the fact that the value of a sample statistic will probably deviate from the parameter it is estimating as a result of which observations happen to fall in the sample
  • Error: means random variability
  • Correlation: when we are dealing with the relationship between two variables
  • Correlation Coefficient: simply a number between −1.0 and +1.0 reflecting the degree to which two variables are related. The most common correlation coefficient is Pearson’s r
  • Test Statistics: statistics associated with specific statistical procedures, each with its own sampling distribution (e.g., t, F)
  • p-value: the probability, assuming H0 is true, of obtaining a result at least as extreme as the one observed. H0 is rejected when the p-value falls below a chosen cutoff, often 0.05, 0.01, or 0.001, referred to as the rejection level or significance level
  • Critical Value: those values of the x variable, or test statistic, that describe the boundary or boundaries of the rejection region(s)
  • One-tailed Test (directional): the situation in which we reject H0 for only the lowest/highest observations and the rejection region is only located in one tail of the distribution (usually 5%)
  • Two-tailed Test (non-directional): when we reject extremes in both tails (usually 2.5% from each end)
  • Regression: explains how differences in one variable relate to differences in another and that allows us to predict a score on one variable from knowledge of an individual’s score on another
  • Regression Line: should be interpreted as the line that gives the best predictions of y for a given value of x
  • Squared Correlation Coefficient (r²): expresses the strength of the relationship between two variables as the proportion of variation explained (i.e., an idea of how important the predictor is)
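Several of the definitions above (mean, variance, standard deviation, standard error, Pearson’s r, and r²) can be sketched in a few lines of Python using only the standard-library statistics module. The numbers here are made-up illustrative data, not anything from the texts:

```python
import math
import statistics

# Hypothetical sample of eight scores (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = statistics.mean(scores)          # x-bar: sum of scores divided by n
variance = statistics.variance(scores)  # s²: divides by df = n - 1, not n
sd = statistics.stdev(scores)           # s: positive square root of s²
se = sd / math.sqrt(n)                  # standard error of the mean

# Pearson's r from its definitional formula:
# r = sum of cross-products of deviations / ((n - 1) * s_x * s_y)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
mx, my = statistics.mean(x), statistics.mean(y)
cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cross / ((len(x) - 1) * statistics.stdev(x) * statistics.stdev(y))
r_squared = r ** 2  # proportion of variation in y explained by x

print(mean, variance, sd, round(se, 3), r, r_squared)
```

Note that statistics.variance and statistics.stdev already use the N − 1 degrees-of-freedom adjustment from the list above; statistics.pvariance and statistics.pstdev are the population (divide-by-N) versions.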

Concepts

  • The field of statistics has moved on and is asking different questions. The questions used to be “is this difference reliable?” and “is this difference meaningful?”. Although these questions are still asked, researchers are increasingly also asking “is this what other researchers are finding as well?”
  • Statistical questions often are based on two overlapping categories: differences and relationships.
  • To make sense of graphs:
    1. Identify what is plotted on each axis (vertical axis/y-axis/ordinate and horizontal axis/x-axis/abscissa)
    2. Identify the dependent and independent variables (e.g., the x-axis usually shows the dependent variable in histograms, but often the independent variable in line or bar graphs)
    3. Look for patterns and outliers in the data (e.g., histograms – look for the shape of the distribution; bar and line graphs – look for differences between groups and trends in the data)
  • Describing distributions:
    • Modality – the number of meaningful peaks in a distribution (i.e., unimodal, bimodal, multimodal)
    • Symmetry – symmetric if it has the same shape on both sides of the center
    • Skewness – looking at how the data tail off (i.e., positively (skewed to the right) or negatively (skewed to the left))
  • Measures of central tendency – measures that relate to the center of a distribution of scores (i.e., mean, median (middle score), and mode (most common score))
    • When the distribution is nearly symmetric and unimodal, the mean, median and mode will be in general agreement
    • When the distribution is asymmetric, the mean, median and mode can all be quite different
    • Mean is a more stable or less variable estimate of the central tendency of a population. However, it is influenced by extreme scores
    • Trimmed means: discard equal number of scores at each end of the distribution and take the mean of what remains. The use of trimmed means is most common in treating partially skewed data
  • Measures of variability:
    • Range – a measure of the distance between the lowest and highest scores. Range is greatly affected by outliers
    • Interquartile range – obtained by discarding the upper and lower 25% of the distribution and taking the range of what remains. Basically, it is the range of the middle 50% of the observations OR the difference between the 75th percentile and the 25th percentile. Overall, it may yield a good estimate of variability, but it may discard too much of the data
  • Normal distributions:
    • If we can assume that a variable is at least approximately normally distributed, inferences about the value of that variable can be made (either exact or approximate)
    • Written N(µ, σ); the standard normal distribution has µ = 0 and σ = 1
  • Standardized scores:
    • Z-scores
    • A linear transformation of the numerical values; the relative standing of the data has not been changed in any way
  • Sampling distributions:
    • Tell us what values we might or might not expect to obtain from a particular statistic under a set of predefined conditions
    • Show the probability of obtaining a given statistic if some condition (e.g., H0) is true
    • The size of the sample plays an important role (e.g., means based on fewer scores are less consistent)
  • Hypothesis Testing:
    1. Specify a research hypothesis (H1)
    2. Set up the null hypothesis (H0)
    3. Collect some data
    4. Construct the sampling distribution of the particular statistic on the assumption that H0 is true
    5. Compare the sample statistic to that distribution and find the probability of exceeding the observed statistics value
    6. Reject or retain H0 depending on the probability (under H0) of a sample statistic as extreme as the one obtained
      • Note: we can never prove something to be true, but we can prove something to be false
  • Decision-making Process
    • Type I Error (α) – rejecting H0 when in fact it is true (the size of the rejection region)
    • Type II Error (β) – failing to reject H0 when it is actually false
  • Correlation Coefficients:
    • Range between −1 and +1. The sign denotes the direction of the relationship, and a value of 0 indicates no relationship.
    • r is not to be interpreted as a percentage relationship (e.g., r = .36 is not 36% of a relationship). Instead, it is simply a point on the scale between −1.00 and +1.00, and the closer it is to either of those limits, the stronger the relationship between the two variables
    • Remember correlation does not equal causation
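The six hypothesis-testing steps above can be sketched as a small simulation. The population values (µ = 100, σ = 15) and the observed sample mean are hypothetical numbers chosen for illustration, and the sampling distribution in step 4 is built by brute-force resampling rather than by formula:

```python
import random
import statistics

random.seed(1)  # reproducible simulation

# Steps 1-3: H1 says our group's mean differs from the population mean;
# H0 says it does not. Hypothetical numbers: population mu = 100,
# sigma = 15, and an observed sample of n = 25 with a mean of 110.
mu0, sigma, n = 100, 15, 25
observed_mean = 110

# Step 4: construct the sampling distribution of the mean on the
# assumption that H0 is true, by drawing many samples of size n.
sim_means = [
    statistics.mean(random.gauss(mu0, sigma) for _ in range(n))
    for _ in range(10_000)
]

# Step 5: find the probability of a sample mean at least as extreme
# as the one observed (two-tailed: extremes in both directions count).
extreme = sum(abs(m - mu0) >= abs(observed_mean - mu0) for m in sim_means)
p_value = extreme / len(sim_means)

# Step 6: reject or retain H0 at the .05 significance level.
decision = "reject H0" if p_value < 0.05 else "retain H0"
print(p_value, decision)
```

This also makes the role of sample size concrete: shrinking n widens the simulated sampling distribution, so the same observed mean becomes less extreme.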

Final Thoughts

After selecting these key terms and concepts from my written notes, I realized there is one more idea I have not covered that I remember as being important in my previous statistics courses: recognizing the variables in a study and interpreting its questions. Basically, more than being able to do the calculations, the most important skill is being able to understand and recognize the variables and the design of the study. If you do not understand how to write and interpret a hypothesis, what assumptions underlie a research study’s questions, or even why a question was asked a particular way, then statistics will remain out of your grasp. My goal for Quantitative Methods of Ed Tech, then, is to develop my skills at interpreting and understanding quantitative research questions and the designs selected for them.

References

Anderson, D. R., Sweeney, D. J., Williams, T. A., Camm, J. D., & Cochran, J. J. (2018). Statistics for business & economics (Rev. 13th ed.). Cengage Learning.

Field, A. (2017). Discovering statistics using IBM SPSS statistics (5th ed.). SAGE Publications.

Howell, D. C. (2017). Fundamental statistics for the behavioral sciences (9th ed.). Cengage Learning.
