

Best way to manage LARGE MySQL databases




Posted by jdk, 05-22-2011, 03:47 PM
I am about to put a new site online that will contain over 9 billion MySQL records; however, I do not know the best way to manage a database of this size. As an example, let's use phone numbers. When someone searches for a phone number, the site will search the database and pull the information out of it. If it is not there, the user has the option of entering info for that phone number, and it will be saved. The site will use PHP (possibly Ajax/jQuery) to pull the data.

Some ideas I have had to handle the database:

1. Create a separate table for each starting letter, number, or character (A-Z, 0-9, ~!@#$...) and store each record in the table matching its first character.
2. One giant table.
3. Compress the individual tables somehow and retrieve the data from those tables.

Does anyone have experience with managing databases of this size, and what is the best option? Should I just use a straight MySQL database? I have no problem paying for a consult from an experienced programmer/database manager.

Posted by luki, 05-22-2011, 04:17 PM
One thing you don't mention is the size of the database (data + index). With 9 billion rows, the index alone will be large (50+ GB, at least). I can't comment on 9 billion rows, but we have one database with 2 TB of data in a single table with ~500 million rows. We used table partitioning (see the manual) into 128 partitions by hash (1 partition = 1 data file + 1 index file). That way you end up working with ~16 GB files rather than one huge file. The setup has been pretty stable for several years.
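For readers wanting a concrete starting point, here is a minimal sketch of the hash-partitioning approach described above. The table and column names (phone_records, phone, details) are hypothetical, not from the poster's actual schema:

    -- Hypothetical phone-number table split into 128 hash partitions,
    -- so each partition holds roughly 1/128 of the rows and index.
    CREATE TABLE phone_records (
        phone   BIGINT UNSIGNED NOT NULL,
        details TEXT,
        PRIMARY KEY (phone)
    ) ENGINE=InnoDB
    PARTITION BY HASH (phone)
    PARTITIONS 128;

Note that MySQL requires the partitioning column to appear in every unique key of the table, which the primary key on phone satisfies here.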

Posted by Krishopper, 05-22-2011, 04:50 PM
For something that big, you might also want to look at a more scalable database solution like Hypertable or HBase (part of the Hadoop ecosystem). It sounds like your project is custom written, so you should be able to adapt to those newer technologies fairly easily. Keep future growth in mind, too. That route would allow you to scale to billions and trillions of rows easily, quickly, and horizontally.

Posted by jdk, 05-22-2011, 05:11 PM
It sounds to me like I am going to have to research this a little further and perhaps hire someone for a consultation to assist in getting things set up. I was planning on running this off of a cloud or VPS, but I am not sure if that is going to be good enough. Right now I have 1.1 million entries in my database on the live site and it is taking up 100 MB of space. So 1 billion entries will give me roughly 1 GB database size, and I am estimating that for 5 billion entries the space will be 10-20 GB. Could it be possible to run this off compressed flat files instead of a database, or is a MySQL database the better way to go? Last edited by jdk; 05-22-2011 at 05:15 PM.

Posted by Panopta, 05-22-2011, 05:41 PM
I think your math is off a bit - if roughly 1 million entries takes up 100 MB of space, 1 billion entries will take up 100 GB of space, and 5 billion will take up 500 GB or half a TB. As for flat files vs. MySQL, it really depends on what you'll be doing with the data. If you're going to be making lots of random queries into the dataset and need quick turnaround time, something like MySQL with indexes on key fields will be much faster. If you're doing more batch operations where you need to scan through the full dataset to perform computations, flat files could be faster, although something like Hadoop and HDFS will be much faster still.
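To illustrate the "indexes on key fields" point, a random lookup against the hypothetical phone_records table sketched earlier is an index seek rather than a full scan; EXPLAIN (or EXPLAIN PARTITIONS on older MySQL versions) can confirm how little is actually read:

    -- With a PRIMARY KEY on phone, this point query reads a handful
    -- of index pages instead of scanning billions of rows.
    EXPLAIN SELECT details
    FROM phone_records
    WHERE phone = 8005551234;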

Posted by jdk, 05-22-2011, 05:47 PM
Thanks, my math was a bit off there. I may have to trim down my initial upgrade to the site, as I want to keep things under 50 GB initially. There are no computations. The site simply queries the database for a unique key and, if found, displays the record. If not, it allows the user to add details for the searched string into the database. So it sounds like MySQL is the way to go. Would having different tables slow the searching down any, or would that be a good idea vs. one large table with the unique keys?
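The lookup-or-add flow described here maps to two statements. This sketch again assumes the hypothetical phone_records table, with the unique key doing double duty as the fast lookup path and as a guard against duplicate entries:

    -- 1) Look up the searched number; display the row if found.
    SELECT details FROM phone_records WHERE phone = 8005551234;

    -- 2) If nothing came back, save the user-supplied details.
    --    ON DUPLICATE KEY UPDATE handles the race where two users
    --    submit the same number at the same time.
    INSERT INTO phone_records (phone, details)
    VALUES (8005551234, 'user-submitted info')
    ON DUPLICATE KEY UPDATE details = VALUES(details);

As for the different-tables question, the hash partitioning discussed earlier in the thread achieves a similar per-key split transparently, without the application having to pick a table.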

Posted by nafirici, 05-24-2011, 08:29 PM
If the database is really phone numbers, you could partition by area code or something like that. Another option would be to use MySQL Cluster for all the writes and replicate the cluster out to multiple slaves, load balancing the reads across them. This would give you very fast, scalable writes plus the benefit of fast MyISAM reads with the load spread out. Cluster mostly holds its data in memory, so you would need to figure out the size of your data and indexes and provision as many data nodes as you need to hold the data and grow. When you need more space, you add more data nodes. The value of NoOfReplicas is also something to keep in mind when doing this calculation.
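Partitioning by area code could look like the following sketch, assuming 10-digit numbers stored as integers, where integer division by 10,000,000 extracts the 3-digit area code (table and partition names are again illustrative):

    -- RANGE partitions keyed on the area-code portion of the number.
    CREATE TABLE phone_records_by_area (
        phone   BIGINT UNSIGNED NOT NULL,
        details TEXT,
        PRIMARY KEY (phone)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (phone DIV 10000000) (
        PARTITION p2xx VALUES LESS THAN (300),
        PARTITION p3xx VALUES LESS THAN (400),
        PARTITION p4xx VALUES LESS THAN (500),
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );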

Posted by Steve_Arm, 05-24-2011, 11:36 PM
I can see only 2 options with this:

1. If there will be no fancy operations on the data and no further feature upgrades to the application, I would build a custom system.
2. Use table partitioning with MySQL plus custom caching and "data search" techniques.


