Figuring Out (Determining) Sample Size for Survey Research

Sawtooth Software

Last updated: 26 Mar 2024

Table of Contents

Figure Out Sample Size (Sample Size Determination)
Sample Size Definition
Factors Influencing Sample Size Determination
Sample Size Formulas
Figuring Out Sample Size: The Process
Using Sample Size Calculators
Troubleshooting Sample Size Issues
Real-Life Sample Size Applications
Determining Sample Sizes for Different Research Methods
FAQ: Frequently Asked Questions About Sample Size Determination

Figuring Out Sample Size (Sample Size Determination)

Folks wanting to learn how to determine the right sample size for their research studies are badly underserved: nearly every article you can find on the internet tells, at best, just half the story. An inadequate sample size could lead to results that are far from the truth, costing your company millions in misguided investments.

The most common advice you’ll find on the internet often leads straight to those inadequate sample sizes. There are different sample size calculations for different purposes: for means (single or multiple, independent or dependent), for proportions (single, paired, independent), for multivariate statistics (factor analysis, regression, logit, etc.) and for experiments (e.g., conjoint, MaxDiff). For brevity’s sake we’ll focus on figuring out sample size for single proportions, leaving the reader to generalize to the cases of two proportions and of single, paired and independent means.

We’ll cover some rules of thumb about multivariate statistics and experiments. We’ll also differentiate between sample size for confidence intervals (the topic of almost every other article about sample size that you’ll find) and sample size for statistical testing (a topic that is almost uniformly neglected).

In this comprehensive guide, we'll dive deep into:

The definition of sample size and its significance in research
Factors influencing the determination of sample size
Step-by-step calculation methods for both kinds of sample size needs: confidence intervals and hypothesis testing
Sample size advice for studies with complex analyses

Sample Size Definition

When we talk about sample size we just mean the number of respondents (people) that you include in your study. This number depends on whether you want to ensure that the results will (a) reflect the overall population's characteristics or (b) support managerially valuable hypothesis tests, or both.

Significance of Sample Size in Market Research

Sample size is the currency with which you buy accuracy in survey research, both by generating quantifiable margins of error around any statistics we generate and by delivering credible hypothesis testing results.

Figuring out a properly defined sample size balances cost-efficiency with statistical rigor. It gives your study credibility and it offers a clearer lens through which you can understand your research findings.

To Summarize:

Sample Size Definition: The number of observations or respondents in a study.
Significance of Sample Size in Market Research: It directly impacts the credibility and value of the research.

Factors Influencing Sample Size Determination

How to find the appropriate sample size depends on a few factors. Each requires careful consideration. Let's delve into these key factors.

Confidence versus power

This factor depends on whether you want your sample size scaled for precision (your margin of error or your confidence interval) or for power (i.e., for supporting hypothesis testing). Just for purposes of a sneak preview, the two formulas are slightly different (the formula for statistical power of a hypothesis test has one extra variable in it).

Population Size

Population size only matters in the rare case when your sample will exceed 5% of the total population. This happens so infrequently that we simply refer anyone interested to search for “finite population correction factor,” an adjustment you can then apply straightforwardly to your sample size formula.
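For readers who do hit the 5% case, here is a minimal sketch of the adjustment. It assumes the common textbook form of the finite population correction, n_adj = n / (1 + (n - 1) / N); the function name `apply_fpc` is ours, purely for illustration.

```python
import math

def apply_fpc(n_infinite: float, population_size: int) -> int:
    """Shrink a sample size computed for a very large population
    so that it fits a finite population of the given size."""
    adjusted = n_infinite / (1 + (n_infinite - 1) / population_size)
    return math.ceil(adjusted)  # round up: we cannot interview a fraction of a person

# A sample of 385 is about 19% of a population of 2,000, so the correction matters.
print(apply_fpc(385, 2_000))      # 323
# For a population of one million, the correction changes nothing.
print(apply_fpc(385, 1_000_000))  # 385
```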

Margin of Error (Confidence Interval)

The margin of error is the range within which the population parameter is expected to fall. Smaller margins require larger sample sizes. Simply put, the more precise you want to be, the larger your sample size needs to be.

Confidence Level

Confidence level refers to the probability that the sample results will represent the population within the margin of error. Common levels are 90%, 95%, and 99%. Higher confidence levels require larger sample sizes.

Standard Deviation

Standard deviation measures how spread out the values in your data set are. When you expect a high variation, you'll need a larger sample size to capture it accurately.

Quick Reference Table:

Factor	Description	Impact on Sample Size
Margin of Error	Range within which the true population parameter is expected to fall	Inverse
Confidence Level	Probability that the sample results will represent the population parameter within the margin of error	Direct
Standard Deviation	Measure of the data set's dispersion	Direct
Power	How likely you are to find a significant difference if in fact one exists	Direct

Sample Size Formulas

Sample size formula for margin of error (confidence interval, precision)

You may recall from your statistics class that your professor showed a formula for a confidence interval, then did some algebra to solve it for sample size (n). That’s where this formula comes from: the confidence interval around a single proportion.

$$ n = \frac{Z_{a/2}^2 \, p(1-p)}{d^2} $$

Where:

n = Sample Size
Z_a/2 = Z-value that corresponds to desired confidence level (1.96 corresponds with the typical 95% confidence level)
p = Proportion of the population (since this is often not known, we usually use a worst case estimate of 0.5)
d = Margin of error (the radius of the confidence interval, or the precision)
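As a rough illustration, the margin-of-error formula can be sketched in a few lines of Python. The function name `sample_size_for_margin` and the small rounding guard against floating-point noise are our own choices, not part of the formula itself.

```python
import math

def sample_size_for_margin(z: float, p: float, d: float) -> int:
    """Sample size for a single proportion, scaled for precision.

    z: Z-value for the desired confidence level (e.g. 1.96 for 95%)
    p: expected proportion (0.5 is the worst case)
    d: margin of error, as a proportion (e.g. 0.05)
    """
    n = (z ** 2) * p * (1 - p) / (d ** 2)
    # Round away floating-point noise, then round up to a whole respondent.
    return math.ceil(round(n, 6))

print(sample_size_for_margin(z=1.96, p=0.5, d=0.05))  # 385
```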

Sample size formula for hypothesis testing

What your professor didn’t show you is that there’s a different formula when you want your sample size to support statistical testing. It adds one term, the Z-value for power, to the numerator:

$$ n = \frac{(Z_{a/2} + Z_b)^2 \, p(1-p)}{d^2} $$

Where:

n, Z_a/2, p and d are as above and
Z_b = the Z-value that corresponds to the desired level of statistical power (0.84 corresponds to the commonly used 80% power)
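A minimal sketch of the hypothesis-testing version follows. It differs from the precision formula only by the extra Z-value for power; the function name `sample_size_for_power` is an illustrative assumption.

```python
import math

def sample_size_for_power(z_alpha: float, z_beta: float, p: float, d: float) -> int:
    """Sample size for a single proportion, scaled for a hypothesis test.

    z_alpha: Z-value for the confidence level (1.96 for 95%)
    z_beta:  Z-value for the desired power (0.84 for 80%)
    p:       expected proportion
    d:       size of the difference we want to be able to detect
    """
    n = ((z_alpha + z_beta) ** 2) * p * (1 - p) / (d ** 2)
    # Round away floating-point noise, then round up to a whole respondent.
    return math.ceil(round(n, 6))

print(sample_size_for_power(z_alpha=1.96, z_beta=0.84, p=0.5, d=0.05))  # 784
```

Setting `z_beta` to 0 (that is, 50% power) gives back the margin-of-error sample size, which is one way to see what the precision-only formula quietly assumes.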

Figuring Out Sample Size: The Process

The sample size calculation process looks harder than it is. Just break it down into systematic steps. Here's how you can approach it, complete with real-world examples.

Step 1: Determine Confidence Level—Choose Wisely

The confidence level you select specifies how confident you can be that your sample results will reflect the true population parameter (a de facto standard is to shoot for 95% confidence). A higher confidence level, such as 99%, will provide greater assurance but will demand a larger sample size. A level like 99% might be appropriate for projects that carry high stakes, such as healthcare studies or regulatory compliance assessments.

On the flip side, a lower confidence level, like 90%, may suffice for quick market assessments or pilot studies. While it reduces the sample size needed, it does come at the cost of confidence in your findings. Here you accept a slightly higher risk that your sample results may not perfectly represent the broader population.

Rule of Thumb: For most business or academic research, a confidence level of 95% is considered a good starting point. For high-stakes, mission-critical projects, aim for 99%. For more exploratory or pilot projects where you can tolerate a bit more risk, 90% might be acceptable.

Z_a/2: The Z-Score for Confidence Level

This Z-score corresponds to the confidence we want to have that the population value (mean, proportion, whatever you’re measuring) lies within the margin of error, that is, inside the confidence interval.

To calculate the Z-score, you can look it up in the standard normal distribution table, or use statistical software. The Z-score table below shows the Z-scores for the most commonly used confidence levels in market research (90%, 95%, and 99%).

Z-score Table for Common Confidence Levels

Confidence Level	Z_a/2
90%	1.645
95%	1.96
99%	2.576
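If you would rather compute these Z-scores than look them up, Python's standard library can do it. This is a small sketch using `statistics.NormalDist`; the helper name `z_for_confidence` is ours.

```python
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided Z-value: leaves (1 - confidence) / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_for_confidence(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```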

Remember, the choice of confidence level dictates how much risk you're willing to accept, and in turn, influences the sample size and potentially, the viability of your project.

Example: Let's say you're researching consumer preferences for a new type of organic snack bar. You decide to go with a 95% confidence level, that is, a 95% chance that your confidence interval will include the population’s preference for the new snack bar. This equates to a Z-score of 1.96.

Step 2: Choose the Margin of Error/Precision

The margin of error measures the precision of your survey results. Simply put, a smaller margin of error (e.g., 2%) provides more accurate insights but requires a larger sample size. This can be particularly valuable when you're working on high-stakes projects or research where even minor errors could have significant business or policy implications.

Conversely, a larger margin of error (e.g., 5% or 10%) may suffice for exploratory studies or when resource constraints are a significant concern. In these cases, the benefit of a larger sample size may not outweigh the additional time and costs involved.

Rule of Thumb: Always weigh the trade-off between precision and resources to arrive at an optimal margin of error for your study. Larger samples give you more precision but they also cost more. Your margin of error directly influences both the quality and feasibility of your market research. This selection is not merely a statistical decision; it’s a strategic one that can have a meaningful impact on your project's success.

Example: Continuing with the organic snack bar study, you decide a 5% (0.05) margin of error is acceptable: you want your estimate to be accurate to within +/- 5 percentage points of the population percentage.
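To get a feel for the trade-off, the sketch below tabulates the required sample size for a few margins of error at 95% confidence and p = 0.5, using the margin-of-error formula. The helper name `n_for_margin` is an assumption for illustration.

```python
import math

def n_for_margin(d: float, z: float = 1.96, p: float = 0.5) -> int:
    """Required sample size for margin of error d (single proportion)."""
    return math.ceil(round(z ** 2 * p * (1 - p) / d ** 2, 6))

for d in (0.02, 0.03, 0.05, 0.10):
    print(f"+/- {d:.0%}: n = {n_for_margin(d)}")
# +/- 2%: n = 2401
# +/- 3%: n = 1068
# +/- 5%: n = 385
# +/- 10%: n = 97
```

Because the margin of error is squared in the denominator, halving it roughly quadruples the sample you need.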

Step 3: Estimate Standard Deviation

The standard deviation is a measure of the dispersion or spread of your data points around their average value. A high standard deviation implies more variability, whereas a low standard deviation indicates that the values are more bunched around the mean.

Why Standard Deviation Matters: A high standard deviation means that there's a larger spread in the opinions, attitudes, or behaviors of your target population. This level of variability could require a larger sample size to capture the differences adequately. In contrast, a low standard deviation simplifies things; the closer your data points are to the mean, the less sample you may need for precise results.

Rule of Thumb: If you don't have prior data to calculate the actual standard deviation, a typical approach for proportions is to assume a 50:50 split, that is, a proportion (p) of 0.5. This conservative estimate maximizes your sample size and thereby reduces the chance of underestimating it. However, if you have historical data or pilot studies to draw from, use the observed standard deviation, as it will provide a more accurate sample size tailored to your research.

Example: Given the lack of preliminary data on consumer preferences for organic snack bars, you choose p = 0.5 to maximize your sample size.
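Why is p = 0.5 the worst case? For a proportion, the variance term in the formula is p(1 - p), which peaks at 0.5. A quick sketch, with the illustrative helper `n_for_proportion`, assuming 95% confidence and a 5% margin of error:

```python
import math

def n_for_proportion(p: float, z: float = 1.96, d: float = 0.05) -> int:
    """Required sample size when the expected proportion is p."""
    return math.ceil(round(z ** 2 * p * (1 - p) / d ** 2, 6))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p}: p(1-p) = {p * (1 - p):.2f}, n = {n_for_proportion(p)}")
```

The required n is largest at p = 0.5 and falls off symmetrically on either side, so assuming 0.5 can only overstate, never understate, the sample you need.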

Step 4: Determine Your Level of Power (for Hypothesis Testing Only)

Power is your ability to identify a difference of a particular size in hypothesis testing. If being able to detect a difference of 5% is really important to you, then you want to have a lot of power to detect that size of difference.

Why Power Matters: In a statistical test we have to worry about both confidence and power, because we seek to avoid both false positives (through the confidence level) and false negatives (via the power level). If you calculate sample size and ignore power, your sample will be too small to detect the things that matter to you and you increase your risk of experiencing a false negative. False negatives can be very costly in practice. Let’s say a new ad campaign will be so successful that it will increase sales by 10%. If your product has $500 million in sales, that 10% increase is $50 million. If you cut costs on sample size and get a false negative result, however, you could conclude that the new ad isn’t a success, and cost your company $50 million in lost sales.

Rule of Thumb: We usually want at least 70% or 80% power to detect differences when they are real. In truth, however, when setting both the confidence level and power, we should consider how costly are false negatives (concluding the advertising doesn’t work when in fact it does) and false positives (concluding a new ad is successful when it is not) and then tailor our confidence and power to reflect those costs.
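The Z-value for power is one-sided, so it comes straight from the inverse normal distribution at the chosen power level. A small sketch using Python's standard library (the helper name `z_for_power` is ours):

```python
from statistics import NormalDist

def z_for_power(power: float) -> float:
    """One-sided Z-value corresponding to the desired power."""
    return NormalDist().inv_cdf(power)

for power in (0.70, 0.80, 0.90):
    print(f"{power:.0%} power: Z_b = {z_for_power(power):.2f}")
# 70% power: Z_b = 0.52
# 80% power: Z_b = 0.84
# 90% power: Z_b = 1.28
```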

Step 5: Apply the Appropriate Sample Size Formula

This is where determining the correct sample size formula comes into play. Let’s say we want to make sure our study can identify the percentage of respondents who want our new product. We want 95% confidence that the proportion we measure will be within 5 percentage points of the population proportion, but we don’t really have a clue what that proportion might be.

Example: Plug the Z-score (1.96), estimated proportion (0.5), and margin of error (0.05) into the sample size formula for margin of error:

$$ n = \frac{(1.96)^2 (0.5)(1 - 0.5)}{(0.05)^2} = \frac{0.9604}{0.0025} = 384.16 $$

Note that we rounded our answer up to 385 because we can’t interview 0.16 of a respondent.

Actually, it turns out management wants to know the results of a statistical test. The current advertising scored 50% while it was in the testing phase, so we want to know if our new ad can beat the old one by 5 percentage points. Moreover, because we stand to lose sales if we get a false negative here, we want to have 80% power to detect a significant difference. Now we use the sample size formula for power:

$$ n = \frac{(1.96 + 0.84)^2 (0.5)(1 - 0.5)}{(0.05)^2} = \frac{1.96}{0.0025} = 784 $$

Note that when we took power into account (because we wanted to avoid a false negative), our sample size requirement more than doubled, from 385 to 784. Had the company gone out with a sample of 385, it would have had only a 50% chance of identifying a successful ad campaign! That’s research money very poorly spent, but it’s exactly what happens if you don’t take power into account.
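The 50% figure can be checked by solving the power formula for Z_b instead of n, then converting Z_b back into a probability. The sketch below does that with Python's standard library; `achieved_power` is an illustrative name, and the approach assumes the same single-proportion formula used above.

```python
from math import sqrt
from statistics import NormalDist

def achieved_power(n: int, p: float, d: float, z_alpha: float = 1.96) -> float:
    """Power to detect a difference d with n respondents.

    Rearranges n = (z_alpha + z_beta)^2 * p(1-p) / d^2 to solve for z_beta.
    """
    z_beta = d * sqrt(n) / sqrt(p * (1 - p)) - z_alpha
    return NormalDist().cdf(z_beta)

print(f"n = 385: power = {achieved_power(385, 0.5, 0.05):.2f}")  # 0.50
print(f"n = 784: power = {achieved_power(784, 0.5, 0.05):.2f}")  # 0.80
```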

Summary Checklist: Sample Size Determination Steps

Determine Confidence Level: Usually 95%, but sometimes 90% or 99%.
Choose Margin of Error: A small percentage (2-5%) is common.
Estimate Proportion of Population: Often 0.5 to maximize sample size.
Choose a Level of Power (hypothesis testing only): 80% is common; 70% is usually a minimum recommendation.
Apply the Appropriate Sample Size Formula: Use the formula to find the ideal sample size.

By following these steps, you're well on your way to figuring out sample size correctly for your study. This is a cornerstone of robust and credible market research, one that balances the risks of false positives and false negatives so as to maximize the value of your findings.

Using Sample Size Calculators

Though the sample size formula is a reliable tool for manual calculations, let's face it: math can be tedious. Sample size calculators can offer a more convenient route, often giving you the same level of accuracy with just a few clicks. However, most online sample size calculators use only the precision formula and thus do not take power into account. To remedy this, you may simply want to double the sample size from an online calculator, because when we chose 80% power in the example above, the sample size (784) was about double the one that came from considering only the confidence interval (385).
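The doubling shortcut holds for 95% confidence and 80% power specifically. As a hedged sketch, the multiplier for other power levels is the ratio of the two formulas, (Z_a/2 + Z_b)^2 / Z_a/2^2, since p and d cancel out. The function name `power_inflation` is ours.

```python
from statistics import NormalDist

def power_inflation(confidence: float, power: float) -> float:
    """Factor by which a precision-only sample size must grow
    to deliver the requested power at the same d and p."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / z_alpha ** 2

for power in (0.70, 0.80, 0.90):
    factor = power_inflation(0.95, power)
    print(f"{power:.0%} power: multiply the calculator's answer by about {factor:.1f}")
```

At 80% power the factor comes out at roughly 2, matching the shortcut; at 90% power it is noticeably larger, so doubling would leave you short.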

Key Takeaway: Sample size calculators are your go-to tools for quick, accurate, and convenient calculations. Most sample size calculators neglect statistical power, however, so use them with caution.

Troubleshooting Sample Size Issues

Sometimes your calculated sample size may be impractical (unaffordable). However, there are some strategies you can employ to come up with a more affordable sample size (hopefully without compromising your research too much).

Lowering the Confidence Level

If your sample size is turning out too large for your resources, one option is to lower the confidence level. A move from a 99% to a 95% confidence level can noticeably reduce the needed sample size. Remember though, this makes your results less robust.

Lowering the Power

While this comes with risks, lowering your power to 70% from 80%, say, can reduce your sample size.

Increasing the Margin of Error

Similarly, widening the margin of error will also decrease your required sample size. While this increases the range within which your population parameter is expected to fall, it's a trade-off that can sometimes make the research process more feasible.

Key Takeaway: Tweaking your confidence level, power or margin of error can reduce sample size needs, but always weigh the pros and cons.

Troubleshooting Options

Strategy	Effect on Sample Size	Potential Downsides
Lower Confidence Level	Reduces	Greater chance of a false positive
Lower Power	Reduces	Greater chance of a false negative
Increase Margin of Error	Reduces	Less precision
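To see how much each lever actually buys you, the sketch below recomputes the hypothesis-testing sample size under each adjustment. It uses exact Z-values from Python's standard library rather than the rounded table values, so results can differ by a respondent or so from hand calculations. The function name and the specific alternative settings (90% confidence, 70% power, 7% margin) are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_for_test(confidence: float = 0.95, power: float = 0.80,
               d: float = 0.05, p: float = 0.5) -> int:
    """Sample size for a single-proportion test at the given settings."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * p * (1 - p) / d ** 2)

scenarios = {
    "baseline (95% confidence, 80% power, 5% margin)": n_for_test(),
    "lower confidence to 90%": n_for_test(confidence=0.90),
    "lower power to 70%": n_for_test(power=0.70),
    "increase margin of error to 7%": n_for_test(d=0.07),
}
for label, n in scenarios.items():
    print(f"{label}: n = {n}")
```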

Remember, these are options to help make your study feasible, but they do come with trade-offs. Always consider the impact of these adjustments on the reliability and credibility of your findings.

Real-Life Sample Size Applications

Understanding the mechanics of how to figure out sample size is great, but what does this mean in real-world settings? How has accurate sample size determination influenced the outcomes of actual market research projects?

Success Story

Let's consider a tech company that recently launched a new feature and wanted to gauge user satisfaction. By carefully calculating a sample size that took into account a 95% confidence level and a 4% margin of error, the company was able to reliably conclude that the feature was well-received, leading to its continued investment and improvement.

Consequences of Poor Sample Size

On the flip side, another business failed to adequately figure out sample size for a similar user-satisfaction survey. They concluded there was no change in user satisfaction when in fact there was one. They missed it, and that false negative led to misguided business decisions.

Key Takeaway: Accurate sample size determination isn't just academic; it has tangible implications for your business decisions and overall strategy.

Real-Life Implications

Success Scenarios: Precise sample size -> Reliable data -> Informed Decisions
Failure Scenarios: Inaccurate sample size -> Unreliable Data -> Misguided Decisions

Figuring out sample size is more than a statistical necessity; it's a vital business tool that can guide a company toward success or contribute to its failure.

Sample Sizes for Different Research Methods

The calculations above work for a single proportion. Similar equations exist for confidence intervals and statistical tests involving differences in proportions and differences in means. Complex statistical models have their own sample size requirements.

Regression analysis/driver analysis

The old rule of thumb of 10 observations per variable in the model is useful and works for data of average condition. When using particularly clean data we may get by with as few as 5 observations per variable. More common will be data with higher than average levels of multicollinearity and this will require larger sample sizes. So if our regression model has 12 variables, the basic recommendation would be n = 10k = 10(12) = 120.

Logit

Because it estimates the shape of an S-curve rather than a straight line, logit is more sample size intensive than regression. The rule of thumb is 10 times the number of variables in the model divided by the smaller of the two proportions of the binary response: n = 10k/p. So if our model has 12 predictors and we expect the response will be split about 60/40, we’d go with n = 10(12)/(0.40) = 300.

Segmentation

Previous advice was a bit all over the board, but the most recent paper on the topic suggested a sample size of 100 for every basis variable included in the segmentation analysis. So if we have 20 basis variables, that suggests n=2,000.
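The three rules of thumb above (regression, logit, segmentation) are simple enough to keep in a small helper. This is only a sketch of the arithmetic as stated in the text; the function names are ours, and the rules themselves remain rough guides.

```python
import math

def regression_n(k: int, per_variable: int = 10) -> int:
    """Regression: about 10 observations per variable (5 for very clean data)."""
    return k * per_variable

def logit_n(k: int, minority_share: float) -> int:
    """Logit: n = 10k / p, where p is the smaller share of the binary response."""
    return math.ceil(round(10 * k / minority_share, 6))

def segmentation_n(basis_variables: int, per_variable: int = 100) -> int:
    """Segmentation: about 100 respondents per basis variable."""
    return basis_variables * per_variable

print(regression_n(12))      # 120
print(logit_n(12, 0.40))     # 300
print(segmentation_n(20))    # 2000
```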

Factor analysis

One source suggests that samples of less than a hundred are held to be “poor,” 200 to be “fair” and 300 “good.” Others suggest that when the number of factors is small and correlations are large and reliable, samples of as few as 50 may be workable. Given the messiness of most survey research data, erring on the side of larger sample size seems prudent.

Tree-Based Segmentation

In classification or regression trees, sample is split and then split again, repeatedly. After three levels of pairwise splits, a tree model could have eight groups. For this reason, we usually recommend having at least 1000 respondents.
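A quick way to see why trees are sample hungry: each level of pairwise splits doubles the number of groups. The sketch below assumes, purely for illustration, that every split is perfectly even; real trees split unevenly, so some groups will be much smaller than this average.

```python
def leaf_count(levels: int) -> int:
    """Number of groups after the given number of levels of pairwise splits."""
    return 2 ** levels

def average_leaf_size(n: int, levels: int) -> float:
    """Average respondents per group, assuming perfectly even splits."""
    return n / leaf_count(levels)

for n in (300, 1000):
    print(f"n = {n}: {leaf_count(3)} groups of about {average_leaf_size(n, 3):.0f}")
# n = 300: 8 groups of about 38
# n = 1000: 8 groups of about 125
```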

Conjoint Analysis/MaxDiff

Our usual recommendation about multivariate statistics (like conjoint analysis and MaxDiff analysis) is to have at least 300 respondents, or at least 200 per separately reportable subgroup. Another way to think about conjoint analysis is to work backward from the simulator: what size differences in shares would be worth capturing, and what size of sample do you need to capture them (using a sample size formula for the difference in two proportions).
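The text above does not give the two-proportion formula, so the sketch below assumes one common textbook version (unpooled variances, equal group sizes): n per group = (Z_a/2 + Z_b)^2 [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2. The function name and the example shares are hypothetical.

```python
import math

def n_per_group_two_proportions(p1: float, p2: float,
                                z_alpha: float = 1.96,
                                z_beta: float = 0.84) -> int:
    """Respondents needed in EACH group to detect p1 vs p2
    (assumed formula: unpooled variances, equal group sizes)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(round(n, 6))

# Hypothetical simulator result: is a 25% share really higher than a 20% share?
print(n_per_group_two_proportions(0.20, 0.25))  # 1090
```

Small share differences are expensive to confirm: detecting a 10-point gap takes far fewer respondents per group than detecting a 5-point gap.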

Key Takeaway: The methodology you choose can significantly impact your sample size needs, so choose wisely and calculate accordingly. Tailoring your sample size to the specific demands of your chosen methodology isn't just best practice; it's crucial for obtaining valid, actionable insights.

FAQ: Frequently Asked Questions about Figuring Out Sample Size

You've journeyed through the intricate maze of sample size determination, but you may still have lingering questions. Let's tackle some of those.

How do you define sample size?

Sample size refers to the number of individual data points or subjects that are included in a study. It's a crucial aspect of market research that impacts the reliability and credibility of your findings.

What is a good sample size?

A "good" sample size is one that allows for a high confidence level and a low margin of error (and for statistical testing, a high level of power), all while remaining manageable and cost-effective. Figuring out the ideal sample size can vary based on the research methodology.

How do I calculate sample size?

To calculate the ideal sample size, you typically use a sample size formula that takes into account the statistic you want to study, your desired levels of confidence (and power), and the acceptable margin of error. Some online calculators can also do this for you.

And there you have it—a detailed guide on Understanding and Figuring Out Sample Size for Surveys.