
Making Data Useful: Navigating the Complexities of Sampling


Chapter 1: Understanding Simple Random Sampling

When it comes to data collection, many people instinctively default to simple random sampling (SRS), the method they learned in their introductory statistics courses. This is understandable, as SRS is a reliable approach when the circumstances allow for it.

However, whenever I discuss data collection methods with a new group of students, I hear the word “just” in their responses. For instance, they might say, “Just select them completely at random.”

Let’s examine this notion in a real-world context.

A Data Scientist's Initial Challenge

Picture yourself as a data scientist tasked with estimating the average height of pine trees in a forest.


With an abundance of information available online regarding tree heights, it’s evident that you are not the first person to take on such a task. Many have successfully measured tall trees. So, how complicated could this really be?

(Note: For clarification on any technical terms mentioned, please refer to the linked explanations.)

Facts vs. Statistics

If you were able to measure every tree with complete accuracy, you wouldn’t need statistics; you would simply have the facts. But is having all the facts necessary? Or can you work with statistics instead?

Statistics enables you to draw conclusions even when you don’t have access to all the desired data. By measuring a select few trees (a sample) rather than the entire forest (the population), you can obtain a cost-effective yet informative perspective. This approach is particularly useful when the total number of trees in the forest is unknown.
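To make the sample-versus-population idea concrete, here is a minimal sketch in Python. The forest is simulated: the 5,000 trees, the 25 m average, and the 4 m spread are all made-up assumptions for illustration, not real measurements. The point is only that the mean of 20 sampled trees can stand in for the mean of the whole forest, with some uncertainty attached.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in metres for 5,000 simulated trees.
# These values are invented for illustration, not real measurements.
population = [random.gauss(25.0, 4.0) for _ in range(5000)]

# The "facts": only available if every single tree is measured.
population_mean = statistics.mean(population)

# The "statistics": measure a sample of 20 trees instead.
sample = random.sample(population, 20)
sample_mean = statistics.mean(sample)

# Rough measure of how far the sample mean may sit from the truth.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f} m")
print(f"sample mean:     {sample_mean:.2f} m")
print(f"standard error:  {standard_error:.2f} m")
```

In a real forest you would never see the population mean, which is exactly why the way the 20 trees are chosen matters so much.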

Let’s focus on collecting an adequate sample of trees to avoid the impractical task of measuring every single one!

Your supervisor instructs you to take precise measurements of 20 trees chosen at random. Following advice from previous discussions, you confirm that this plan is sound for your project. The groundwork is laid!

What does your STAT101 course suggest you do next?


In my experience teaching over 100 classes, whenever I ask students how to select the trees, I consistently hear similar responses:

“Just select them [entirely/completely] at random.”

AND/OR

“Just take a simple random sample.”

Embracing Simple Random Sampling

I understand why SRS comes to mind as the go-to answer; it’s an excellent choice when applicable. Yet, what I find amusing is the frequency with which the word “just” appears in their answers.

Just… no.

Anyone who suggests that you can “just select them entirely at random” fails to grasp the complexities involved in proper sampling methods. This task is far from straightforward, especially when real-world factors come into play.


Imagine that you dislike the outdoors so much that you hire someone else to measure the trees for you. You might engage a hiker without any technical expertise, instructing them to “just” select 20 trees at random.

If I were that hiker, I might just choose the first 20 trees I see, teaching you a lesson about the importance of precise instructions.

Sampling Procedures

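Terms like simple random sampling, SRS, and entirely at random all describe a sampling method where each tree has an equal chance of being selected. Strictly speaking, every possible group of 20 trees must also be equally likely to end up as your sample.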

It’s only considered a true simple random sample if this equal selection probability is maintained; otherwise, the results may be misleading.
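As a rough sketch of what that requirement looks like in practice, the Python snippet below draws a simple random sample from a complete list of trees. The list itself is the big assumption here: the tree IDs, the count of 1,200, and the helper name simple_random_sample are all hypothetical, and the code only works if every tree in the forest has already been identified and numbered.

```python
import random


def simple_random_sample(tree_ids, n, seed=None):
    """Return n distinct tree IDs; every possible set of n is equally likely."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement from the full list.
    return rng.sample(tree_ids, n)


# Hypothetical sampling frame: a complete list of tagged trees.
# Building this list for a real forest is the hard part.
tree_ids = [f"tree-{i:04d}" for i in range(1, 1201)]

chosen = simple_random_sample(tree_ids, 20, seed=42)
print(chosen)
```

Notice that the random draw is one line, while the list it draws from hides nearly all of the real work. That is where the word “just” starts to fall apart.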

There’s a reason why SRS is often the first sampling method taught in statistics classes: it’s straightforward in terms of calculations. While other sampling methods exist, they often involve more complex calculations that are typically beyond the scope of introductory courses.

If you analyze your data according to the guidelines of STAT101 without implementing a true simple random sampling method, your results may be fundamentally flawed.

Always strive for clarity in your instructions, as you never know when misunderstandings might arise.

If your hired hiker chooses trees that are conveniently located near the forest’s edge, that’s not a simple random sample; it’s known as a convenience sample—an approach you should steer clear of. Analyzing such data using SRS methods is statistically inappropriate because those trees may not accurately represent the entire forest, leading to incorrect conclusions.
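A small simulation can illustrate how this goes wrong. The sketch below rests on an assumption invented purely for the example: trees deeper in the forest tend to be taller than trees near the edge. Real forests may behave differently, and all the numbers are made up. Under that assumption, the hiker's 20 nearest-the-edge trees are compared with a simple random sample of 20.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical forest of 2,000 trees. Each tree is a pair:
# (distance from the forest edge in metres, height in metres).
# ASSUMPTION for illustration only: height grows with distance from the edge.
forest = []
for _ in range(2000):
    distance = random.uniform(0, 500)
    height = 15.0 + 0.02 * distance + random.gauss(0, 2.0)
    forest.append((distance, height))

true_mean = statistics.mean([h for _, h in forest])

# Convenience sample: the 20 trees closest to the edge.
convenience = sorted(forest)[:20]
convenience_mean = statistics.mean([h for _, h in convenience])

# Simple random sample: 20 trees drawn from the whole forest.
srs = random.sample(forest, 20)
srs_mean = statistics.mean([h for _, h in srs])

print(f"true mean:        {true_mean:.2f} m")
print(f"convenience mean: {convenience_mean:.2f} m")
print(f"SRS mean:         {srs_mean:.2f} m")
```

In this simulated forest the convenience sample lands well below the true average, because it only ever sees edge trees. Measuring more edge trees would not fix that; the problem is which trees are eligible, not how many are measured.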

To discover what a professional statistician would advise, continue to Part 2!

How to Create an Effective Sampling Plan for Your Data Project

When simple random sampling isn't as simple as it seems...

