sync_binlog on ext3

I’ve mentioned before about how the sync_binlog setting in MySQL can be especially slow on ext3. Of course, I wasn’t the first to discover this; the MySQL Performance Blog mentioned it months ago.

I was reading through some of the slides I mentioned in my last post, and remembered that I’d left sync_binlog off on an in-house replicated slave. You’re able to set it on the fly, so a quick set global sync_binlog=1 was all it took to ensure we flushed everything to disk.

A while later I blogged about dstat and thought to run it on the in-house MySQL slave. I was confused to notice that the CPU was about 50% idle, 50% “wai” (I/O wait). For a box that’s just replaying INSERTs from production, that’s really bad. Below is a graph of load on the system. Care to guess when I enabled sync_binlog?

Performace with and without sync_binlog enabled

Performace with and without sync_binlog enabled

Disk I/O does roughly the same thing, but it’s less pronounced, “only” doubling in volume. But the difference is still pretty insane!

Cool MySQL Projects

I think part of the thing I love about MySQL is the same thing I love about Firefox and Thunderbird. Even though I sometimes wonder if there are databases that would work better than MySQL or have fewer bugs, MySQL has an incredibly rich community that’s home to myriad fascinating projects.

I’ve looked into MySQL Proxy at work a bit, and found it pretty interesting, but it’s still considered alpha-level code, and we don’t really need a proxy in front of our database right now.

I’m not even going to pretend that the links that follow are all things I’ve just found on my own. They’re inspired from this Xapbr post, with Xapbr being a blog run by one of the top guys at Percona, and Percona being a well-known MySQL consulting firm, perhaps best known for the O’Reilly High Performance MySQL book. (A must-read even for people who aren’t full-time DBAs!)

It turns out that MySQL Proxy has a competitor, Dormando’s Proxy for MySQL, better known as dpm. It’s a BSD-licensed proxy created by SixApart’s DBA. (SixApart used to own LiveJournal, which is well-known for releasing memcached to the community.) Given that I’ve worked with neither dpm nor MySQL Proxy, I’m not able to comment on the merits of either, but it’s a handy link to have.

I’m also interested in the Tungsten Replicator, an Open Source project providing (emphasis mine) “database-neutral master/slave replication,” which was interesting enough to keep me reading past the words “JDK” and “Ant,” which are technical terms meaning “bloat” and “memory leaks.” (I kid… Sort of.) It looks like the project is mostly MySQL-centric right now, but capable of replicating MySQL databases into Oracle. (But apparently not the other way, probably since Oracle binlogs aren’t as publicly-documented as MySQL’s.)

And then there’s WaffleGrid, a project using memcache as an L2 cache. That is, MySQL will first look in its own caches, but in the event of a cache miss, will check memcache before turning to disk. It looks like it gives stellar results, even though it’s one of those things that seems kind of wrong. (You’d think that, if you’re adding memory to better cache your database, you’d want to put it in the MySQL machine, and if you needed to scale to a cluster of memcache boxes, you might be better off spreading the load around several MySQL boxes… But at the same time, clustering isn’t something MySQL excels at right now.)

Drizzle is “a lightweight SQL database for [the] cloud and web,” which is forked from MySQL 6. (And for those of you scratching your heads: yes, I said version 6. Yes, 5.1 is the latest release branch.) There are some big names working on Drizzle, and besides “the cloud,” they talk a lot about “massive concurrency,” which is something that should be of interest even to people running more traditional databases.

And, of course, there’s Percona’s XtraDB engine for MySQL, a significant overhaul to InnoDB. It seems a bit of a young technology for production environments, and yet it seems like it’s production environments where it would have the best impact.

This is only tangentially related to MySQL, but I’ve seen Sphinx mentioned in at least three distinct places today, a SQL-based fulltext search index. I’d be curious in seeing how it compares with Lucene.

Delete Old Files in Linux

Here’s a good one to have in your bag of tricks: all the time I wind up with a directory where I just want to delete anything older than a certain number of days. Here’s a pretty simple one:


for i in `find $DELETE_PATH -ctime +$DAYS_TO_KEEP`; do
ls -lh "$i"
#rm -vf "$i"

The delete is commented out, so it will just list the files. Uncomment when you’re convinced it does what you want. Obviously, change the top two lines to suit your needs, and play around with find (for example, there’s an atime instead of my ctime). But it’s a handy little thing to have around and just adapt as needed.

bash: Looping over a range

This falls into the category of things that are simple yet I’m always trying to remember. Sometimes in bash I find myself wanting a for loop, not of the usual foreach type (for i in `ls /etc/`; do ...), but a more classical one (a la C-style for var=0; var<100; var++ ... syntax). Here’s the bash way:

for i in {0..99}; do
echo $i

The output is a list of numbers from 0 to 99. For a fun example of when you might need this, consider the case of noticing that Postfix isn’t running on a machine and starting it. Your mail client promptly crashes, and the flurry of output in the log indicates that it’s delivering mail as fast as it can, but complaining that some mail has been in the queue for 180 days. “Holy smokes,” you exclaim, stopping Postfix. “Just how much mail is there?” You’re not entirely sure because you aborted your ls after it sat for a good minute with no output. That’s never good. du -sh shows about a gig and a half.

“I’ve got this under control,” you think. rm -f /var/spool/postfix/*. Ta-da! Wait, what’s this “/bin/rm: Argument list too long” business? That error can’t possibly be good. rm very rarely complains. So we tried a smaller delete size, thinking we could do postfix/1*, postfix/2*, etc. Just step through it. That, too, was too much for rm.

So it ended up taking an absurd for i in {0..99}; do rm -f /var/spool/postfix/$i; done to purge the mail. (And that didn’t catch all of it, either; I’m not sure off-hand how to do this in hex.) Each iteration deleted somewhere around 2,000 mail messages, making me think there were something like a half-million messages. (00-FF, times a little under 2,000.)

Removing Duplicates from MySQL

I Googled that a while ago. How do you remove duplicate rows from a SQL database? (MySQL specifically.)

If you have a ‘simple’ duplicate setup, it’s pretty easy, and lots of other sites go into great detail. (And in that case, you might look into how to require that fields be unique so you don’t have that problem anymore…)

In our case, we ran into an obscure problem where we expected the combination of two fields to be unique. Without publicizing anything secret, we had a summary table of user_id and month. (Not that it’s actually anything secret, but I refer to the table I did this on as “tablename,” just since blogging about our production database schemas seems like a bad idea…) A brief bug introduced some duplicates. My task? Remove the duplicate rows.

So I went about it somewhat backwards, and started wondering how do a SELECT DISTINCT across two columns. You can’t, really. Someone recommended something clever: CONCAT() across the two field names, and a DISTINCT on that: SELECT DISTINCT(CONCAT(user_id, month))… After toying with that a bit, I was able to create a temporary table housing a de-duped version of our data:

create temporary table nodupes
SELECT DISTINCT(CONCAT(user_id, month)) asdf, tablename.*
FROM tablename GROUP BY asdf;

But how does that help? The problem with temporary tables is that they’re, uhh, temporary. If you lived on the edge, you could probably “DROP TABLE original_table; CREATE TABLE original_table (SELECT * FROM temp_table)”, but this has a few horrible problems. One is that you’re throwing away all your disk-based data so you can populate it with a version that exists only in RAM. If anything went wrong, you’d lose the data. Furthermore, in our case, this is a live, busy table. Dropping it in production is awful, and it would take many minutes to repopulate it since it’s a huge table.

So then it was clear what I wanted to do: something like DELETE FROM original_table WHERE id NOT IN(SELECT * FROM temp_table_with_no_dupes), except not using an IN() on a table with hundreds of thousands of rows. And this is where it got tricky. The key is to use that temporary table to derive a new temporary table, housing only the duplicates. (Actually, speaking of “the key,” create one in your temporary table: CREATE INDEX gofast ON nodupes(id); — you’ll need this soon, as it’s the difference between the next query taking 10 seconds and me aborting it after 30 minutes.)

The temporary table of duplicate rows is basically derived by joining your ‘real’ table with the list of de-duped tables, with a rather obscure trick: joining where id=NULL. So here’s what I did:

(SELECT tablename.*
FROM tablename
LEFT OUTER JOIN nodupes nd

For bonus points, we’re selecting from the real table, so the silly CONCAT() field doesn’t come through into this temporary table.

So once again, I momentarily flashed a sly grin. This is going to be easy! DELETE FROM tablename WHERE id IN(SELECT id FROM dupes);. But then my sly grin faded. It turns out that there are two problems here. The first is that there are still a lot of rows. In my case, I was doing an IN() with about 6,000 rows. MySQL doesn’t like that. But there’s a bigger problem. (Thanks to Andrew for confirming that this is a real bug: #9090.) MySQL doesn’t use indexes when you use a subquery like that. It seems the bug is pretty narrow in scope, but this is precisely it. So that query is awful, basically doing 6,000 full table scans. And on a really big table.

So the solution comes from something I’ve never even thought about doing before. MySQL calls it the “multi-delete.” I call it the, “Holy smokes, this could go really, really wrong if I’m not careful” delete.

-- Don't run this yet
FROM tablename AS bb
RIGHT JOIN dupes d
ON (;

That’s the command you want, but a few caveats. First, change the first line to “SELECT *” and run it to see what happens. Make sure it’s only grabbing the duplicate rows. You’re not able to use a LIMIT in a multidelete query, so you have to be extremely careful here. Second, we tested this multiple times on various test databases before doing it against our master database, and I was still uneasy when it came time to hit enter. Keep backups before you do this. And finally, in many cases, MySQL will expect an explicit “USE database_name” before the multi-delete, where database_name is the database housing the table you’re deleting from. Keep that in mind if you get an error that it can’t find the table, even if you explicitly reference it (via database_name.table_name).

Viewing all cron jobs

Periodically I run into the situation where I’m trying to find a cron job on a particular machine, but I can’t remember which user owns it. At least on CentOS, it’s easy:

cat /var/spool/cron/* will show all crons. The crontab command doesn’t seem to support doing this. The downside is that that command just mashes them all into one list, which is only useful if you don’t care who the job runs as. Usually I do. Here’s a simple little script to format the output a little bit:

for i in `ls /var/spool/cron/`; do
        echo "Viewing crons for $i"
        echo "--------------------------------------"
        cat /var/spool/cron/$i


Silly MySQL trick of the day. This is probably no stranger to those work with MySQL all the time, but it’s a nice trick to have up your sleeve.

For reasons I still can’t wrap my head around, one of the MySQL tables we need for Cacti just up and vanished. It’s not really used for anything essential, so it’s not a great loss, just really annoying. So I need to recreate it. Fortunately, I have MySQL running on another machine.

mysql> desc poller_output;
| Field         | Type                  | Null | Key | Default             | Extra |
| local_data_id | mediumint(8) unsigned | NO   | PRI | 0                   |       |
| rrd_name      | varchar(19)           | NO   | PRI | NULL                |       |
| time          | datetime              | NO   | PRI | 0000-00-00 00:00:00 |       |
| output        | text                  | NO   |     | NULL                |       |
4 rows in set (0.00 sec)

It’s not that complicated. I could take a second and write out the CREATE TABLE statement I’d need to run on the machine missing the table. But here’s a neat trick (run on the system with the table, obviously):

mysql> show create table poller_outputG
*************************** 1. row ***************************
       Table: poller_output
Create Table: CREATE TABLE `poller_output` (
  `local_data_id` mediumint(8) unsigned NOT NULL default '0',
  `rrd_name` varchar(19) NOT NULL default '',
  `time` datetime NOT NULL default '0000-00-00 00:00:00',
  `output` text NOT NULL,
  PRIMARY KEY  (`local_data_id`,`rrd_name`,`time`)
1 row in set (0.00 sec)

Why write out the CREATE TABLE statement when we can just let MySQL tell us exactly how to re-create a table? I initially discovered this with the considerably more complex issue of views (SHOW CREATE VIEW), but it extends to tables, too. So now we can just copy-and-paste. Viola!

Well not viola for me, but viola for anyone whose filesystem hasn’t gone corrupt.

Building an Improvised CDN

From my “Random ideas I wish I had the resources to try out…” file…

The way the “pretty big” sites work is that they have a cluster of servers… A few are database servers, many are webservers, and a few are front-end caches. The theory is that the webservers do the ‘heavy lifting’ to generate a page… But many pages, such as the main page of the news, Wikipedia, or even these blogs, don’t need to be generated every time. The main page only updates every now and then. So you have a caching server, which basically handles all of the connections. If the page is in cache (and still valid), it’s served right then and there. If the page isn’t in cache, it will get the page from the backend servers and serve it up, and then add it to the cache.

The way the “really big” sites work is that they have many data centers across the country and your browser hits the closest one. This enhances load times and adds in redundancy (data centers do periodically go offline: The Planet did it just last week when a transformer inside blew up and the fire marshalls made them shut down all the generators.). Depending on whether they’re filthy rich or not, they’ll either use GeoIP-based DNS, or have elaborate routing going on. Many companies offer these services, by the way. It’s called CDN, or a Contribution Distribution Network. Akamai is the most obvious one, though you’ve probably used LimeLight before, too, along with some other less-prominent ones.

I’ve been toying with SilverStripe a bit, which is very spiffy, but it has one fatal flaw in my mind: its out-of-box performance is atrocious. I was testing it in a VPS I haven’t used before, so I don’t have a good frame of reference, but I got between 4 and 6 pages/second under benchmarking. That was after I turned on MySQL query caching and installed APC. Of course, I was using SilverStripe to build pages that would probably stay unchanged for weeks at a time. The 4-6 pages/second is similar to how WordPress behaved before I worked on optimizing it. For what it’s worth, static content (that is, stuff that doesn’t require talking to databases and running code) can handle 300-1000 pages/second on my server as some benchmarks I did demonstrated.

There were two main ways to enhance SilverStripe’s performance that I thought of. (Well, a third option, too: realize that no one will visit my SilverStripe site and leave it as-is. But that’s no fun.) The first is to ‘fix’ Silverstripe itself. With WordPress, I tweaked MySQL and set up APC (which gave a bigger boost than with SilverStripe, but still not a huge gain). But then I ended up coding the main page from scratch, and it uses memcache to store the generated page in RAM for a period of time. Instantly, benchmarking showed that I could handle hundreds of pages a second on the meager hardware I’m hosted on. (Soon to change…)

The other option, and one that may actually be preferable, is to just run the software normally, but stick it behind a cache. This might not be an instant fix, as I’m guessing the generated pages are tagged to not allow caching, but that can be fixed. (Aside: people seem to love setting huge expiry times for cached data, like having it cached for an hour. The main page here caches data for 30 seconds, which means that, worst case, the backend would be handling two pages a minute. Although if there were a network involved, I might bump it up or add a way to selectively purge pages from the cache.) squid is the most commonly-used one, but I’ve also heard interesting things about varnish, which was tailor-made for this purpose and is supposed to be a lot more efficient. There’s also pound, which seems interesting, but doesn’t cache on its own. varnish doesn’t yet support gzip compression of pages, which I think would be a major boost in throughput. (Although at the cost of server resources, of course… Unless you could get it working with a hardware gzip card!)

But then I started thinking… That caching frontend doesn’t have to be local! Pick up a machine in another data center as a ‘reverse proxy’ for your site. Viewers hit that, and it will keep an updated page in its cache. Pick a server up when someone’s having a sale and set it up.

But then, you can take it one step further, and pick up boxes to act as your caches in multiple data centers. One on the East Coast, one in the South, one on the West Coast, and one in Europe. (Or whatever your needs call for.) Use PowerDNS with GeoIP to direct viewers to the closest cache. (Indeed, this is what Wikipedia does: they have servers in Florida, the Netherlands, and Korea… DNS hands out the closest server based on where your IP is registered.) You can also keep DNS records with a fairly short TTL, so if one of the cache servers goes offline, you can just pull it from the pool and it’ll stop receiving traffic. You can also use the cache nodes themselves as DNS servers, to help make sure DNS is highly redundant.

It seems to me that it’d be a fairly promising idea, although I think there are some potential kinks you’d have to work out. (Given that you’ll probably have 20-100ms latency in retreiving cache misses, do you set a longer cache duration? But then, do you have to wait an hour for your urgent change to get pushed out? Can you flush only one item from the cache? What about uncacheable content, such as when users have to log in? How do you monitor many nodes to make sure they’re serving the right data? Will ISPs obey your DNS’s TTL records? Most of these things have obvious solutions, really, but the point is that it’s not an off-the-shelf solution, but something you’d have to mold to fit your exact setup.)

Aside: I’d like to put nginx, lighttpd, and Apache in a face-off. I’m reading good things about nginx.

MySQL on CentOS

In case anyone else is struggling like I am… After you install MySQL on CentOS, it doesn’t seem to want to start… You run mysqld and it barfs up an error.

It turns out that it’s because you need to run service mysqld start instead.

(And if you’re looking for how to install GD2 for PHP, and frustrated because a search of “gd2” in yum turns up nothing… That’s because it’s called php-gd, not php-gd2.)

Installation, Lazy

I’m going to use the default Apache and PHP that come from the CentOS repositories (until I get irritated enough with how old they are…).

So I just did an install of php-pecl-memcache and let yum fetch the “dependencies,” which, in this case, included Apache and PHP. O:-)

In other news, I’m slightly confused about why memcache isn’t a package I can install, but the libraries for it are?