Mr. Smiles, on 29 November 2011 - 08:26 PM, said:
Statistics is a bit wonky and hard to understand sometimes. I'll try to explain it as best I understand it--someone with actual professional statistics knowledge, please correct me here.
Take a population of 28,000 people (the MechWarrior Online registered user count). You have a true-false question you want to ask them, but have no idea just how skewed the answers will be toward true or false. So you assume the worst-case scenario: a 50% "response distribution," as it's called.
Now, there is a "true" answer out there. Suppose that if you asked all 28,000 people your yes:no question, you'd get a ratio of, let's say, 60:40. You have to set a "margin of error", which is how far from the "true" answer you're willing to land: a typical margin of error is 5%. A margin of error of 5% in this example would mean that the ratio you measure would be anywhere from 65:35 to 55:45.
Obviously, the more skewed the true answer is, the larger a margin of error you can live with. That is, if the true answer were 90:10, a margin of error of even 25% would still only make the answers range from 100:0 to 65:35. Still a massive majority, and if majority's all you care about, who cares if it's wrong in the specific numbers?
Then, you also have to set a "confidence level". Confidence level is how often you expect your answer to land within your margin of error; whatever is left over is how often you're willing to let it fall outside. Suppose we had 20 questions, true answers are all 60:40, and we set the confidence level at 95%. That means that for about 19 of our questions, on average, we would get an answer between 65:35 and 55:45... and for about one of those questions, we would get something even farther off, like 70:30 or 40:60.
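To make the "19 out of 20" idea concrete, here is a rough simulation sketch. It fakes a population of 28,000 with a true 60:40 split, polls it over and over, and counts how often the measured yes-share lands within 5 points of the truth. The sample size of 379 is my own assumption (roughly what a 95% / 5% setup calls for at this population size), and the function name and seed are made up for illustration.

```python
import random

POPULATION = 28_000      # registered users, from the post
TRUE_YES_SHARE = 0.60    # the post's hypothetical 60:40 split
MARGIN = 0.05            # 5-point margin of error
SAMPLE_SIZE = 379        # assumption: roughly what 95% / 5% needs for 28,000 people


def coverage(trials, seed=0):
    """Fraction of simulated polls whose yes-share lands within MARGIN of the truth."""
    rng = random.Random(seed)
    # Pretend people numbered 0..yes_count-1 would answer "yes".
    yes_count = round(POPULATION * TRUE_YES_SHARE)
    hits = 0
    for _ in range(trials):
        # Poll SAMPLE_SIZE distinct people (sampling without replacement).
        polled = rng.sample(range(POPULATION), SAMPLE_SIZE)
        yes_share = sum(person < yes_count for person in polled) / SAMPLE_SIZE
        if abs(yes_share - TRUE_YES_SHARE) <= MARGIN:
            hits += 1
    return hits / trials


print(f"{coverage(2000):.1%} of simulated polls landed within the margin")
```

The printed share should hover around 95%, give or take simulation noise; the remaining handful of polls are the "one question in twenty" that ends up farther off.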
* * * *
Now, statistics thankfully takes all of these values, and has a way to plug them into a formula to churn out a single number: how many people you have to question to get the values you input.
Of course, you're absolutely right that getting thousands of responses would be the absolute best thing. If I wanted to have only 1 in 100 questions be off my margin of error of 5%, I would only need a sample of 649 people. If I had 1,000, well, you can't get much better than 99% confidence level, so 1,000 people would get me a margin of error of 4%. Meaning, if I had a yes:no ratio of 55:45 on a question, I would be 99% sure that the yeses would still have the majority, since I'm 99% sure that the worst case scenario is the "true" answer is a 51:49 ratio.
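For anyone who wants to check the sample-size figure, here is a minimal sketch. The post doesn't say which formula the calculator used, so this assumes the usual textbook one: the normal-approximation sample size for a proportion, shrunk by a finite population correction. The function and parameter names are mine.

```python
import math
from statistics import NormalDist


def sample_size(population, margin, confidence, p=0.5):
    """Respondents needed for a given margin of error and confidence level.

    Assumes the normal approximation plus a finite population correction;
    p is the "response distribution" (0.5 is the worst case).
    """
    # Two-sided z-score, e.g. about 1.96 for 95% and about 2.58 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z**2 * p * (1 - p) / margin**2
    # Shrink it because the real population is finite.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(28_000, 0.05, 0.99))
```

With 28,000 people, a 5% margin and 99% confidence, this comes out to the same 649 quoted above, which suggests the calculator was doing something along these lines.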
Fortunately for me, I still have some pretty good statistics with only 300 people. That gets me a confidence level of 95%, and a margin of error of 5.63%. That's something I can live with...
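Going the other direction (you already know how many people answered and want the margin of error) can be sketched the same way. Again, this assumes the standard normal-approximation formula with a finite population correction, not necessarily whatever the online calculator does internally; the names are illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(population, sample, confidence, p=0.5):
    """Margin of error for a proportion, as a fraction (0.05 means 5 points)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a proportion from a simple random sample.
    standard_error = math.sqrt(p * (1 - p) / sample)
    # Finite population correction: polling a real chunk of the population helps a bit.
    correction = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * correction


print(f"{margin_of_error(28_000, 300, 0.95):.2%}")    # 300 respondents at 95%
print(f"{margin_of_error(28_000, 1_000, 0.99):.2%}")  # 1,000 respondents at 99%
```

Under these assumptions the two calls land on roughly the 5.63% and 4% figures from the post.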
...except with the really, really narrow questions. Any question where the leading answer sits within 5.63 points of 50% (that is, a yes:no gap of less than about 11 points) I'm not sure about, and I'm only 95% confident about the rest of them.
...and that assumes that there's absolutely no skew with my answers. But look at them. Most of them are in the 60s, some are at 99.9%. Therefore, for most of my questions, I could've gotten away with fewer respondents (far fewer for the most lopsided ones) when writing about my answers. But, I don't know enough about statistics to change my "response distribution", so I'll leave it at the worst-case scenario, 50%.
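Here is a rough sketch of what changing the "response distribution" does to the required sample, using the same assumed formula (normal approximation with a finite population correction) at 95% confidence and a 5% margin. The helper name and the list of splits are just for illustration, and the approximation gets shaky for extremely lopsided splits like 99.9%, so those are left out.

```python
import math
from statistics import NormalDist


def sample_size(population, margin, confidence, p):
    """Respondents needed when the expected yes-share is p (assumed formula)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z**2 * p * (1 - p) / margin**2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# Required respondents for a few assumed response distributions.
sizes = {p: sample_size(28_000, 0.05, 0.95, p) for p in (0.5, 0.6, 0.75, 0.9)}
for p, n in sizes.items():
    print(f"expected split {p:.0%}: about {n} respondents")
```

Under these assumptions a 60:40 split trims the requirement only slightly compared with 50:50, while a 90:10 split cuts it to well under half, so the 50% worst case costs very little for the questions in the 60s.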
NOTE: Everything I just wrote is based on my best knowledge based on Google and Wikipedia and what makes sense to me. I'm an English person first, a Calculus person second, and a Statistics person like... 74th.