You can run into a lot of problems with this approach. First and foremost, you get inaccurate forecasting from Elo scores. At its heart, an Elo score is a precise way of saying how likely one person is to beat another, i.e. a person with 200 more Elo points than his opponent should win roughly 75% of the time. When you allow teams to be composed of people with widely disparate scores, you end up with rapid swings in Elo scores, because players are easily over- or under-ranked and then have their scores over- or under-corrected.
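For reference, here is the commonly used logistic form of the Elo expectation (400-point scale, as in chess ratings) as a short Python sketch. The function name `expected_score` is my own, and a given game may use a different scale. It shows that a 200-point gap works out to about a 76% expected win rate, which is where the usual "75%" figure comes from.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the logistic Elo model
    (400-point scale): a 400-point gap means 10:1 odds."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(f"{expected_score(1700, 1500):.3f}")  # 0.760
print(f"{expected_score(1500, 1500):.3f}")  # 0.500
```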
This basically destroys the predictive accuracy of Elo, because a 200-point difference in scores no longer reliably predicts that the higher-ranked team will win roughly 75% of the time. Not to mention that an individual's Elo score is adjusted based on his team's average Elo versus the opponents' average Elo. So your score is inherently inaccurate, as the adjustments to your score are somewhat arbitrary: you may be either over- or under-rewarded for your performance in any given match, depending on your team's average Elo score.
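To make the team-average problem concrete, here is a minimal sketch of the scheme described above, assuming the standard Elo update with a hypothetical K-factor of 32 and a rule that every teammate receives the same change computed from the two team averages. The names `team_average_update` and `K` are illustrative and not taken from any particular game.

```python
from statistics import mean

K = 32  # hypothetical K-factor; real systems pick their own

def expected_score(rating_a, rating_b):
    """Logistic Elo expectation on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def team_average_update(team, opponents, won, k=K):
    """Give every member of `team` the same rating change, computed from the
    two teams' average ratings rather than from each player's own rating."""
    expected = expected_score(mean(team), mean(opponents))
    delta = k * ((1.0 if won else 0.0) - expected)
    return [rating + delta for rating in team]

# A 2000-rated player paired with an 1100-rated teammate (average 1550)
# against two 1550-rated opponents (average 1550):
print(team_average_update([2000, 1100], [1550, 1550], won=True))  # [2016.0, 1116.0]
```

Both players gain 16 points, exactly as if the match had been a coin flip. Rated individually against the opposing average (a hypothetical comparison, not something this scheme does), the 2000-rated player would have been expected to win about 93% of the time and would gain only about 2 points, while the 1100-rated player would gain about 30.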
This can lead developers to make false assessments about the performance of the matchmaking system, depending on which metrics they use to judge success. Looking at something like the distribution of Elo scores might give you a false sense that the matchmaker is functioning well. If you have a normal distribution of Elo scores, you might assume all is well because the scores match the typical distribution of skill. What could really be happening is that the inherent inaccuracy of the system creates a feedback loop that keeps the majority of players within 1.5 standard deviations of the mode score. So you end up with bad matchmaking that looks good on paper.
This may also lead a developer to believe that he needs a larger tolerance in the matchmaking spread, which compounds the issue. An inaccurate Elo system produces a greater range of scores because players' scores keep fluctuating, which means the standard deviation is much larger. Thus any matchmaking criteria tied to that spread end up much wider than they need to be.
An accurate Elo system will tend to concentrate scores around the mode score, which gives a smaller standard deviation and therefore inherently tighter matchmaking. That tighter matchmaking works just as fast as matchmaking over a large standard deviation, because more players are concentrated around the mode.
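The "just as fast" point is easier to see with a little arithmetic. Assuming, purely for illustration, that ratings are normally distributed, the share of players within ±w of the mean (which equals the mode for a normal distribution) depends only on w/σ. So halving the standard deviation lets you halve the matchmaking window and still draw from the same fraction of the population. The helper name `fraction_within` is hypothetical.

```python
import math

def fraction_within(window: float, sigma: float) -> float:
    """Fraction of a Normal(mu, sigma) population lying within +/- window of mu."""
    return math.erf(window / (sigma * math.sqrt(2)))

wide = fraction_within(window=100, sigma=200)   # loose ratings, wide window
tight = fraction_within(window=50, sigma=100)   # concentrated ratings, narrow window
print(round(wide, 3), round(tight, 3))  # 0.383 0.383
```

Both windows capture about 38% of players, so the narrower window does not shrink the pool the matchmaker draws from.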
From the published distributions of Elo scores, you can see that the range of scores and the standard deviation open way up after 50 matches. That's mostly a function of the inaccuracy of the matchmaking system. So you can see the feedback loop develop: it tells us we need wide matchmaking spreads, which makes it self-perpetuating.

What you really should keep your eye on is the frequency and variance of score changes (more is bad), and how accurately Elo is predicting outcomes. If both are off and you're committed to Elo, you tune your system by lowering the maximum number of points won in a match and tightening the match- and team-building criteria.
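One way to track those two signals is sketched below, assuming you log each player's per-match rating change and each match's pre-game predicted win probability. The function names are hypothetical; the Brier score and a binned calibration table are standard forecast-accuracy measures swapped in here for "how accurately Elo is predicting outcomes", not something prescribed above. In the standard Elo update, the "maximum amount of points won in a match" corresponds to the K-factor, since no single change can exceed K.

```python
from statistics import mean, pvariance

def rating_change_stats(deltas):
    """Return (mean absolute change, variance) of per-match rating changes.
    Larger values mean ratings are swinging more from match to match."""
    return mean(abs(d) for d in deltas), pvariance(deltas)

def brier_score(predictions, outcomes):
    """Mean squared error between predicted win probability and the result
    (1 = win, 0 = loss). Lower is better; always predicting 0.5 scores 0.25."""
    return mean((p - (1.0 if won else 0.0)) ** 2 for p, won in zip(predictions, outcomes))

def calibration_table(predictions, outcomes, bins=5):
    """Bucket predictions into equal-width probability bins and return
    (average prediction, observed win rate, match count) for each non-empty bin."""
    buckets = [[] for _ in range(bins)]
    for p, won in zip(predictions, outcomes):
        idx = min(int(p * bins), bins - 1)  # p == 1.0 goes in the top bin
        buckets[idx].append((p, 1.0 if won else 0.0))
    return [
        (mean(p for p, _ in b), mean(w for _, w in b), len(b))
        for b in buckets
        if b
    ]

# Toy inputs for illustration only, not real match data.
deltas = [14, -15, 16, -13, 15]
preds = [0.76, 0.70, 0.55, 0.30, 0.80]
results = [True, False, True, False, True]
print(rating_change_stats(deltas))
print(brier_score(preds, results))
print(calibration_table(preds, results))
```

In a well-calibrated system, the observed win rate in each bin should sit close to the average prediction for that bin.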
Edited by Grits N Gravy, 11 November 2013 - 08:21 AM.