Elddric, on 13 December 2012 - 08:04 PM, said:
Actually..... I work for a company that counters your statement .... we have logs from over 5000 terminals processing hundreds of transactions per second each that go back years. We run reports that span years to gather sales data. DB logs are better than you think.
5000 * 100 * 60 * 60 * 24 = 43,200,000,000 transactions per day.
Now that's only 100 per second, rather than hundreds, and if those were, say, just 50 bytes each (not much info there), you'd be talking 2,160,000,000,000 bytes, or a little over two terabytes of data a day. Even in the best case with the best compression algo (which isn't going to happen), say 16:1, you're still in the 135-gigabyte-a-day range, which means in a year you're at roughly 49 terabytes.
Realistically, let's say you're getting 8:1 compression rather than 16:1 (far more plausible), but we'll still stay with the tiny 50 bytes per transaction at 100 per second. That puts you at roughly 270 gigabytes a day, or close to 100 terabytes a year.
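Just to make the arithmetic explicit, here's a quick Python sketch of the estimate above. The terminal count, rate, record size, and compression ratios are the assumptions stated in this post, not measured numbers:

```python
# Back-of-envelope sketch of the numbers above (assumed inputs, not real workload data)
TERMINALS = 5000          # terminals
TX_PER_SEC = 100          # transactions per second per terminal (low end of "hundreds")
BYTES_PER_TX = 50         # bytes logged per transaction (deliberately tiny)
SECONDS_PER_DAY = 60 * 60 * 24

tx_per_day = TERMINALS * TX_PER_SEC * SECONDS_PER_DAY
raw_bytes_per_day = tx_per_day * BYTES_PER_TX

for ratio in (16, 8):     # optimistic vs. more plausible compression
    per_day = raw_bytes_per_day / ratio
    per_year = per_day * 365
    print(f"{ratio}:1 -> {per_day / 1e9:.0f} GB/day, {per_year / 1e12:.0f} TB/year")

# tx_per_day        = 43,200,000,000
# raw_bytes_per_day = 2,160,000,000,000  (~2.16 TB/day uncompressed)
# 16:1 -> 135 GB/day, ~49 TB/year
# 8:1  -> 270 GB/day, ~99 TB/year
```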
Where ya storing years of that where you can do analysis?
Just curious, because when you look at big data systems that can do that, like Teradata, Hadoop, Greenplum, etc., you also get replication (to provide fault tolerance), so you're in the half-a-petabyte range at least, which is a lot of nodes and is up there with some of the largest installs I'm aware of.
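And a rough sketch of how you end up at that figure once you keep a few years of it with replication. The 3x replication factor and two-year retention here are my assumptions, just to show the order of magnitude, not anything from your setup:

```python
# Rough cluster footprint sketch: a couple of years of logs plus replication
# (replication factor 3 is an assumption, typical default for HDFS-style systems)
YEARLY_TB = 99            # from the 8:1 figure above
YEARS_RETAINED = 2        # "logs that go back years"
REPLICATION = 3

footprint_tb = YEARLY_TB * YEARS_RETAINED * REPLICATION
print(f"~{footprint_tb} TB raw cluster capacity")   # ~594 TB, i.e. over half a petabyte
```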
Where do you work? Sounds cool.