Archive for the ‘Programming’ Category
This post christens my newest category, Thinking Aloud. It’s meant to house random thoughts that pop into my head, versus fully fleshed-out ideas. Thus it’s meant more as an invitation for comments than something factual or informative, and is likely full of errors…
Aside from “time geeks,” those who deal with it professionally, and those intricately familiar with the technical details, most people probably are unaware that each of the GPS satellites carries an atomic clock on board. This is necessary because the system works, in a nutshell, by triangulating your position from various satellites, where an integral detail is knowing precisely where each satellite is at a given time. More precise time means a more precise location, and there’s not much margin of error here. The GPS satellites are also synchronized daily to the “main” atomic clock (actually a bunch of atomic clocks based on a few different standards), so the net result is that the time from a GPS satellite is accurate down to the nanosecond level: they’re within a few billionths of a second of the true time. Of course, GPS units, since they don’t cost millions of dollars, rarely output time this accurately, so even the best units seem to have “only” microsecond accuracy, or time down to a millionth of a second. Still, that’s pretty darn precise.
Thus many–in fact, most–of the stratum 1 NTP servers in the world derive their time from GPS, since it’s now pretty affordable and incredibly accurate.
The problem is that GPS isn’t perfect. Anyone with a GPS probably knows this. It’s liable to be anywhere from a foot off to something like a hundred feet off. This server (I feel bad linking, having just seen what colocation prices out there are like) keeps a scatter plot of its coordinates as reported by GPS. This basically shows the random noise (some would call it jitter) of the signal: the small inaccuracies in GPS are what result in the fixed server seemingly moving around.
We know that an error in location will also cause (or, really, is caused by) an error in time, even if it’s minuscule.
So here’s the wondering aloud part: we know that the server is not moving. (Or at least, we can reasonably assume it’s not.) So suppose we define one position as “right,” and any deviation from that as inaccurate. We could do what they did with Differential GPS and “precision-survey” the location, which would be very expensive. But we could also go for the cheap way, and just take an average. It looks like the center of that scatter graph is around -26.01255, 28.11445. (Unless I’m being dense, that graph seems ‘sideways’ from how we typically view a map, but I digress. The latitude was also stripped of its sign, which put it in Egypt… But again, I digress.)
So suppose we just defined that as the “correct” location, as it’s a good median value. Could we not write code to take the difference in reported location and translate it into a shift in time? Say that six meters East is the same as running 2 microseconds fast? (Totally arbitrary example.) I think the complicating factor wouldn’t be whether it was possible, but knowing what to use as ‘true time,’ since if you picked an inaccurate assumed-accurate location, you’d essentially be introducing error, albeit a constant one. The big question, though, is whether it’s worth it: GPS is quite accurate as it is. I’m a perfectionist, so there’s no such thing as “good enough” time, but I have to wonder whether the benefit would even show up.
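To put rough numbers on the idea, here is a minimal sketch that assumes the simplest possible model: a position error is treated as pure light-travel time. The function names and sample values are made up for illustration, and a real receiver’s clock error depends on satellite geometry, so this only gives an order of magnitude.

```python
import math
from statistics import mean

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
EARTH_RADIUS_M = 6_371_000            # mean radius; fine for metre-scale offsets


def average_fix(fixes):
    """Return the mean (lat, lon) of a list of reported fixes, in degrees."""
    return mean(lat for lat, _ in fixes), mean(lon for _, lon in fixes)


def offset_metres(reference, fix):
    """Approximate ground distance between two nearby points (equirectangular)."""
    lat0, lon0 = map(math.radians, reference)
    lat1, lon1 = map(math.radians, fix)
    north = (lat1 - lat0) * EARTH_RADIUS_M
    east = (lon1 - lon0) * math.cos(lat0) * EARTH_RADIUS_M
    return math.hypot(north, east)


def light_time_ns(distance_m):
    """Time light needs to cover distance_m, in nanoseconds."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S * 1e9
```

Under this crude model, six meters works out to roughly 20 nanoseconds rather than microseconds, which hints at how small any correction would be.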
From my “Random ideas I wish I had the resources to try out…” file…
The way the “pretty big” sites work is that they have a cluster of servers… A few are database servers, many are webservers, and a few are front-end caches. The theory is that the webservers do the ‘heavy lifting’ to generate a page… But many pages, such as the main page of the news, Wikipedia, or even these blogs, don’t need to be generated every time. The main page only updates every now and then. So you have a caching server, which basically handles all of the connections. If the page is in cache (and still valid), it’s served right then and there. If the page isn’t in cache, it will get the page from the backend servers and serve it up, and then add it to the cache.
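The hit-or-miss logic of that caching server can be sketched in a few lines. This is a toy in-memory version, not how squid or any real cache is implemented; the class name, the `fetch_from_backend` callable, and the 30-second default are all assumptions for illustration.

```python
import time


class PageCache:
    """Tiny in-memory front-end cache with a fixed time-to-live."""

    def __init__(self, fetch_from_backend, ttl_seconds=30, clock=time.monotonic):
        self._fetch = fetch_from_backend   # callable: path -> page body
        self._ttl = ttl_seconds
        self._clock = clock                # injectable so expiry can be tested
        self._store = {}                   # path -> (expires_at, body)

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]                # cache hit, still valid
        body = self._fetch(path)           # miss or expired: ask the backend
        self._store[path] = (now + self._ttl, body)
        return body
```

However many requests arrive, the backend only sees one request per page per TTL window.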
The way the “really big” sites work is that they have many data centers across the country and your browser hits the closest one. This improves load times and adds redundancy (data centers do periodically go offline: The Planet did it just last week when a transformer inside blew up and the fire marshals made them shut down all the generators). Depending on whether they’re filthy rich or not, they’ll either use GeoIP-based DNS, or have elaborate routing going on. Many companies offer these services, by the way. It’s called a CDN, or Content Delivery Network. Akamai is the most obvious one, though you’ve probably used LimeLight before, too, along with some other less-prominent ones.
I’ve been toying with SilverStripe a bit, which is very spiffy, but it has one fatal flaw in my mind: its out-of-box performance is atrocious. I was testing it in a VPS I haven’t used before, so I don’t have a good frame of reference, but I got between 4 and 6 pages/second under benchmarking. That was after I turned on MySQL query caching and installed APC. Of course, I was using SilverStripe to build pages that would probably stay unchanged for weeks at a time. The 4-6 pages/second is similar to how WordPress behaved before I worked on optimizing it. For what it’s worth, static content (that is, stuff that doesn’t require talking to databases and running code) can handle 300-1000 pages/second on my server as some benchmarks I did demonstrated.
There were two main ways to enhance SilverStripe’s performance that I thought of. (Well, a third option, too: realize that no one will visit my SilverStripe site and leave it as-is. But that’s no fun.) The first is to ‘fix’ SilverStripe itself. With WordPress, I tweaked MySQL and set up APC (which gave a bigger boost than with SilverStripe, but still not a huge gain). But then I ended up coding the main page from scratch, and it uses memcache to store the generated page in RAM for a period of time. Instantly, benchmarking showed that I could handle hundreds of pages a second on the meager hardware I’m hosted on. (Soon to change…)
The other option, and one that may actually be preferable, is to just run the software normally, but stick it behind a cache. This might not be an instant fix, as I’m guessing the generated pages are tagged to not allow caching, but that can be fixed. (Aside: people seem to love setting huge expiry times for cached data, like having it cached for an hour. The main page here caches data for 30 seconds, which means that, worst case, the backend would be handling two pages a minute. Although if there were a network involved, I might bump it up or add a way to selectively purge pages from the cache.) squid is the most commonly-used one, but I’ve also heard interesting things about varnish, which was tailor-made for this purpose and is supposed to be a lot more efficient. There’s also pound, which seems interesting, but doesn’t cache on its own. varnish doesn’t yet support gzip compression of pages, which I think would be a major boost in throughput. (Although at the cost of server resources, of course… Unless you could get it working with a hardware gzip card!)
But then I started thinking… That caching frontend doesn’t have to be local! Pick up a machine in another data center as a ‘reverse proxy’ for your site. Viewers hit that, and it will keep an updated page in its cache. Pick a server up when someone’s having a sale and set it up.
But then, you can take it one step further, and pick up boxes to act as your caches in multiple data centers. One on the East Coast, one in the South, one on the West Coast, and one in Europe. (Or whatever your needs call for.) Use PowerDNS with GeoIP to direct viewers to the closest cache. (Indeed, this is what Wikipedia does: they have servers in Florida, the Netherlands, and Korea… DNS hands out the closest server based on where your IP is registered.) You can also keep DNS records with a fairly short TTL, so if one of the cache servers goes offline, you can just pull it from the pool and it’ll stop receiving traffic. You can also use the cache nodes themselves as DNS servers, to help make sure DNS is highly redundant.
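The “hand out the closest cache” step might look something like the sketch below. It is not how PowerDNS’s GeoIP support works internally. It only illustrates the selection, assuming you already have a rough latitude/longitude for the client. The node names and coordinates are hypothetical.

```python
import math

# Hypothetical cache nodes: name -> (latitude, longitude) in degrees.
CACHE_NODES = {
    "east-coast": (40.7, -74.0),
    "south": (32.8, -96.8),
    "west-coast": (37.8, -122.4),
    "europe": (52.4, 4.9),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def closest_node(client_location, healthy_nodes=CACHE_NODES):
    """Pick the healthy node nearest to the client's GeoIP-derived location."""
    return min(
        healthy_nodes,
        key=lambda name: great_circle_km(client_location, healthy_nodes[name]),
    )
```

Pulling a dead node from the pool is then just a matter of passing in a dictionary without it, and the next-closest node takes over once the DNS TTL expires.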
It seems to me that it’d be a fairly promising idea, although I think there are some potential kinks you’d have to work out. (Given that you’ll probably have 20-100ms latency in retrieving cache misses, do you set a longer cache duration? But then, do you have to wait an hour for your urgent change to get pushed out? Can you flush only one item from the cache? What about uncacheable content, such as when users have to log in? How do you monitor many nodes to make sure they’re serving the right data? Will ISPs obey your DNS’s TTL records? Most of these things have obvious solutions, really, but the point is that it’s not an off-the-shelf solution, but something you’d have to mold to fit your exact setup.)
Aside: I’d like to put nginx, lighttpd, and Apache in a face-off. I’m reading good things about nginx.
I don’t really remember precisely when I started using Linux, but I do distinctly remember December 31, 1999, around 11:55pm, sitting in front of my computer and seeing what would happen. (Absolutely nothing out of the ordinary.) I was in KDE at the time, back when they had a HUGE digital clock that looked like crap even then.
I remember when USB thumb drives came into vogue, and I tried using mine in Linux. They worked! I just had to pull up a shell window, su to root, mkdir /mnt/usb, and then mount it there. And one day I forgot to umount before unplugging it, causing a kernel panic. Windows, meanwhile, let you plug the thumb drive in and seamlessly mapped it to a new drive. When you pulled it out, it unmounted the drive for you. (Although it still occasionally gripes at me with “Delayed Write Failed” even after I’ve closed everything using it and let it sit for quite some time. But I digress.)
Today, without thinking, I decided to plug my Logitech G15 into my Linux machine, running Ubuntu’s Hardy Heron release. It worked, but that’s not saying much: any old OS can see a USB keyboard. But what took me by surprise was what happened next. Without thinking, I used the volume wheel on it to turn down my music. It worked! On a whim, I hit the “Previous Track” button, and Rhythmbox started playing the previous song. I had to install drivers for this in Windows, but not in Linux. How’s that for a role reversal?
Of course, this isn’t a “Linux is superior” post. There are still some flaws on my system that drive me crazy (why do my graphics drivers keep suspend/hibernate from working?!), but I can say that about Windows too. The point is that Linux used to be laughably far behind Windows in terms of things “just working.” And now I occasionally find myself wishing Windows were as easy to use as Linux in some regards. This is impressive progress!
I frankly don’t use AIM that much these days, but will often sign on and think, “Wow, lots of people are on tonight!” or, “Wow, almost no one is on tonight!” So I just wanted to list my thought process after noticing this:
- I’d be interested in seeing a graph of my “buddies” online over time.
- It wouldn’t be too hard to write a little script to sit on AIM 24/7 and watch this.
- If I was doing that, I might as well log each time someone signed on and off, which would let me answer those, “I wonder if x has been online at all lately?” questions.
- As long as I have a stalker bot going, it’d be even more interesting to grab their away message text and buddy profile.
- And as long as I’m doing that, I might as well add support for using diff to show changes in the above between any two points in time.
Is there anything that can’t be graphed? Or made into a shell script?
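The logging-and-diffing half of that list is easy to sketch, leaving the actual AIM connection aside (nothing here talks to AIM). The `BuddyLog` class and its methods are hypothetical names. The diffing itself uses Python’s standard `difflib` rather than the command-line diff.

```python
import difflib
from datetime import datetime, timezone


class BuddyLog:
    """Keeps timestamped snapshots of each buddy's away message."""

    def __init__(self):
        self._history = {}  # screen name -> list of (timestamp, text)

    def record(self, screen_name, text, when=None):
        when = when or datetime.now(timezone.utc)
        self._history.setdefault(screen_name, []).append((when, text))

    def diff(self, screen_name, older=0, newer=-1):
        """Unified diff between two recorded snapshots (by index)."""
        snapshots = self._history[screen_name]
        (t_old, old), (t_new, new) = snapshots[older], snapshots[newer]
        return list(difflib.unified_diff(
            old.splitlines(), new.splitlines(),
            fromfile=t_old.isoformat(), tofile=t_new.isoformat(), lineterm="",
        ))
```

Counting how many screen names have a recent “signed on” record at each point in time would give the graph from the first bullet.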
I’ve always been a little creeped out by some of the stuff on Craigslist. There’s pretty obvious prostitution and drugs going on, in addition to people seeking affairs. And if you read through the “personals” section (which is pretty entertaining), watch out for ones with pictures… Sometimes they’re, uhh, graphic.
So I went through about 20 recent postings, merged them into a textfile, and used my old Markov chain code to “learn” the text and then spit out text based on it… Some of the stuff on Craigslist is so bizarre that it’s hard to tell what’s nonsense the script spits out, and what’s real. (I’ve omitted anything wildly obscene.)
I love to read, movies, anything to do my hair today medium length i need a new look today im off from work hit me up I have a personality that is a cheater because whats is the beginning of something possibly beautiful and long term.
I love to have a great day! A little about me…I am very mature. I am a very comfortable passenger seat. I may or may not have a degree
Good stimulating companionship and conversation is the point of being with someone if your going to cheat on them. I’m new to this online service and hoping to make new friends.Hope it works…
I love to read, movies, anything to do my hair today medium length i need a new look today im off from work hit me up
I have a personality that is a cheater because whats is the beginning of something possibly beautiful and long term.
If you are Interested to have a big black cruiser with a good place for drinks dancing live music with a rumble between her legs, for occasional rides. Feel the rumble as we hit the open road…wrap your arms around me, and press in close.
meet you, after some phone conversations, in public places only unless of course it is business related or anohter type of function, in which case I would meet you, after some phone conversations, in public places only unless of course it is business related or anohter type of function, in which case I would meet you, after some phone conversations, in public places only unless of course it is business related or anohter type of
normal and fun people, between 30-40 years of age, who are looking to meet some new people to hang out today and maybee 420 a bit. I love being out in the rewards you crave. Where you do not. Once a week, I will visit you. We will go over the goals you set yourself and your mobile number and let’s start texting!
I’m looking for someone to be a hypocrite.
Anyone know of a good place for drinks dancing live music with a lot of chrome, and a very open, spontaneous, and down to earth person.
I’m not looking for a coffee and a very comfortable passenger seat. I may or may not have a great companion. My friends think Im mischievious and I hate writing, so that’s it for me.
I have all camping gears. I am burned out of shape so don’t be shy, just be sensual.
Interested in normal and fun people, between 30-40 years of age, who are looking to make new text message with on a regular basis…what are we going to cheat on them.
I’m looking for someone to play with soon, because the weather is getting to be a marathoner!!!
I’ve heard about Cuddle Parties on the back of a good bar to watch the Celtics where they actually put the sound on the radio and internet but there are none
I’m certainly not someone who puts a twinkle in my stomache
hey im a leo male looking for possible another mom who is self motivated
I am open to a totally awesome 2-year-old boy. My problem is that I am looking to spend some time with an older 30-50.
so please be mature and not very interested in going out at night.
I consider diversity to be really nice !
If you’re not interested in talking through emails because, honestly, I can respond to anyone with a picutre
Good stimulating companionship and conversation is the point of being a hard worker. It’s simply that you have one? Reply now for flirty fun on the TV.
I have always tried to be on the radio and internet but there are none in the Boston area.
I am thankful for every day that I feel like I have no friends!!
this poing in my eyes and butterflies in my eyes and butterflies in my stomache
Please be a real person, please be open-minded and if you would like to get to know me, please just hit a back button, don’t reply.
Anyone know of a good listener and would love to go out and paint the town with.
I look forward to talking to a loser
I work 2 jobs and do not allow myself to be really nice !
my friends are married, and not very interested in talking to a totally awesome 2-year-old boy.
I can occasionally get a sitter but sometimes those are hard to come by so I need a new look today im off from work hit me up I have problems in life.
I am open to speaking to people of all races.
Quick clarification, since I was horrified at first… “Black cruiser” is a guy referring to his motorcycle; he was looking for other motorcycle enthusiasts. As was the “rumble” bit. It just happens to come up in the most inappropriate places. Also, the “2-year-old” thing comes from someone discussing that they have a child.
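For anyone curious how the text above gets generated, here is a minimal word-level Markov chain sketch. It is not my original code, just the general technique. The two-word state size and the function names are assumptions.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen following it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=30, rng=random):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:          # dead end: this state has no recorded follower
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)
```

Because each word is chosen only from words that really followed the previous two somewhere in the source, the output reads as locally plausible and globally absurd, and phrases shared between postings are where one ad gets spliced onto another.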
It’s no secret that gzip is handy on UNIX systems for compressing files. But what I hadn’t really considered before is that you don’t have to create a huge file and then gzip it. You can simply pipe output through it and have it compressed on the fly.
For example:
[root@oxygen]# mysqldump --all-databases -p | gzip > 2008May10-alldbs.sql.gz
That backed up all the databases on this machine and compressed them. (It’s a 31MB file, but that’s nothing when you realize that one of my databases is about 90MB in size, and I have plenty of others at 10-20MB each.)
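The same compress-as-you-go pattern works from a script. This is a minimal sketch using Python’s standard `gzip` and `subprocess` modules. The function names are made up, and the command you pass in (and how it gets its password) is up to you.

```python
import gzip
import shutil
import subprocess


def compress_stream(source, dest_path, chunk_size=64 * 1024):
    """Copy a binary file-like object into a gzip file, one chunk at a time."""
    with gzip.open(dest_path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)


def dump_and_compress(command, dest_path):
    """Run a command and gzip its stdout on the fly (the `cmd | gzip > file` pattern)."""
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        compress_stream(proc.stdout, dest_path)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

Something like `dump_and_compress(["mysqldump", "--all-databases"], "alldbs.sql.gz")` would mirror the shell one-liner, and at no point does the uncompressed dump sit on disk.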
The Web Developer toolbar, which is (1) the #1 hit on Google for “Web Developer,” and (2) now compatible with Firefox 3 beta, is totally awesome. You may recall that, in the past, if you had text after a bulleted list or similar on this page, the text would suddenly be mashed together. I never took the time to fully look into it, but it always irked me.
A quick “Outline… Outline Block Level Elements” drew colored boxes around each element of the page, which was exceptionally helpful. This shows the problem: posts start off inside a <p> tag, and adding a list or similar closes the <p> tag. This would have been an easy catch, except that the list looked fine. Upon a closer review, it’s because the lists specified the same line-spacing, thus looking right. While I most likely could have solved this by staring at the code for a long time, Web Developer made it much easier to spot: the first text is inside one box, followed by the list, but the other text is floating outside, leading to a quick, “Oh, I should look at how the <div> is set up” thought, which ended up being exactly the problem. (There’s a bit of excessive space now, but that’s caused by me using PHP to inject linebreaks.)
Web Developer also includes a lot of other useful tools, including the ability to edit the HTML of the page you’re viewing, view server headers, resize non-resizable frames, show page comments, change GETs to POSTs and vice-versa, and much more. Whether you do design full-time, or if you just occasionally fix things, it’s worth having. And you can’t beat the fact that it’s free.
I’ve alluded before to using gzip compression on webservers. HTML is very compressible, so servers moving tremendous amounts of text/HTML would see a major reduction in bandwidth. (Images and such would not see much of a benefit, as they’re already compressed.)
As an example, I downloaded the main page of Wikipedia, retrieving only the HTML and none of the supporting elements (graphics, stylesheets, external JavaScript). It’s 53,190 bytes. (This, frankly, isn’t a lot.) After running it through “gzip -9” (strongest compression), it’s 13,512 bytes, just shy of a 75% reduction in size.
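The arithmetic is easy to reproduce for any page. Here is a small sketch using Python’s `gzip` module at level 9 (the equivalent of `gzip -9`). The helper name is made up.

```python
import gzip


def gzip_savings(data, level=9):
    """Return (original_size, compressed_size, fraction_saved) for a bytes object."""
    compressed = gzip.compress(data, compresslevel=level)
    saved = 1 - len(compressed) / len(data)
    return len(data), len(compressed), saved


# The figures quoted above: 53,190 bytes down to 13,512 bytes.
quoted_saving = 1 - 13_512 / 53_190   # about 0.746, i.e. just under 75%
```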
There are a few problems with gzip, though:
- Not all clients support it. Although frankly, I think most do. This isn’t a huge deal, though, as the client and server “negotiate” the content encoding, so it’ll only be used if it’s supported.
- Not all servers support it. I don’t believe IIS supports it at all, although I could be wrong. Apache/PHP will merrily do it, but it has to be enabled, which means that lazy server admins won’t turn it on.
- Although it really shouldn’t work that way, it looks to me as if it will ‘buffer’ the whole page then compress it, then send it. (gzip does support ‘streaming’ compression, just working in blocks.) Thus if you have a page that’s slow to load (e.g., it runs complex database queries that can’t be cached), it will appear even worse: users will get a blank page and then it will suddenly appear in front of them.
- There’s overhead involved, so it looks like some admins keep it off due to server load. (Aside: it looks like Wikipedia compresses everything, even dynamically-generated content.)
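Two of the points above, negotiation and streaming, can be sketched briefly. This is not how Apache or PHP implement either one. It is a simplified illustration using Python’s `zlib`, with made-up function names, and the header parsing ignores wildcards and other corner cases.

```python
import zlib


def client_accepts_gzip(accept_encoding):
    """True if the Accept-Encoding header value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() == "gzip":
            q = params.strip().lower()
            if q.startswith("q="):
                try:
                    return float(q[2:]) > 0
                except ValueError:
                    return False
            return True
    return False


def gzip_chunks(chunks, level=6):
    """Compress an iterable of byte chunks as one gzip stream, flushing after each chunk."""
    # Adding 16 to the window size tells zlib to emit gzip framing.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = compressor.compress(chunk) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield compressor.flush()   # final block plus the gzip trailer
```

Flushing after every chunk costs a little compression, but it means the first part of a slow page can reach the browser before the rest has been generated.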
But I’ve come across something interesting… A Hardware gzip Compression Card, apparently capable of handling 3 Gbits/second. I can’t find it for sale anywhere, nor a price mentioned, but I think it would be interesting to set up a sort of squid proxy that would sit between clients and the back-end servers, seamlessly compressing outgoing content to save bandwidth.
A song by Bush came on iTunes. Curious exactly what they were mumbling, I went to look up the lyrics.
I Googled “Everything Xen lyrics,” before realizing that I’d confused the hypervisor with the state of tranquility.
I need to get out more.
Have you guys seen Google Charts? It’s a quirky little API I didn’t know existed until I saw a passing allusion to it. Essentially, you pass it a specially-crafted URL (essentially the definition of an API) and it will generate a PNG image.
Here’s a fairly random line graph. My labeling of the axes makes no sense, but it was nonsensical data anyway.
One of the cooler things they support is a “map” type of chart, like this US map. The URL is a bit tricky, though this one of South America is easier to understand: chco (presumably “CHart COlor”) sets the colors, with the first being the ‘default’ color. chld lists the countries, as they should map up to the colors: UYVECO is “Uruguay,” “Venezuela,” and “Colombia.”
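Building such a URL from code is mostly string assembly. The sketch below covers only the two parameters described above, `chco` and `chld`. The endpoint constant is an assumption on my part and should be treated as a placeholder, and anything else the chart needs (type, size, data) has to be supplied through `extra_params`.

```python
from urllib.parse import urlencode

# Assumed endpoint of the image-chart API; treat it as a placeholder.
CHART_BASE_URL = "http://chart.apis.google.com/chart"


def map_chart_url(countries, colors, extra_params=None):
    """Build a map-chart URL from two-letter country codes and hex colors.

    `chld` is the country codes run together (e.g. "UYVECO"); `chco` is a
    comma-separated color list whose first entry is the default color.
    """
    params = {"chco": ",".join(colors), "chld": "".join(countries)}
    params.update(extra_params or {})   # chart type, size, data: not covered here
    return CHART_BASE_URL + "?" + urlencode(params, safe=",")
```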
What has me particularly interested is that I’ve recently installed code to watch connections to my NTP servers. Here’s my Texas box, a stratum 2 server in the NTP pool (pool.ntp.org). I bumped it up to list a 10 Mbps connection speed to signal that I could handle a lot more queries than the average, although it’s still nowhere near its limit. In addition to the stats you see there, it keeps a “dump file” of every single connection. (As an aside, this strikes me as inefficient and I want to write an SQL interface to keep aggregate stats… But that’s very low-priority right now.)
Further, I have some IPGeo code I played with. More than meets the eye, actually: a separate database can give a city/state in addition to the country. (It comes from the free MaxMind GeoLite City database.) Thus I could, in theory, parse the log file created, match each IP to a state, and plot that on a US map.
This reminds me that I never posted… I set up NTP on the second server Andrew and I got, where we’re intending to consolidate everything, but haven’t had time yet. It sat unused for a while, keeping exceptionally good time. So, with Andrew’s approval, I submitted it to the NTP pool. I set the bandwidth to 3 Mbps, lower than the 10 Mbps my Texas box is at.
I was somewhat surprised to see it handling significantly more NTP queries. (They’re not online, since the box isn’t running a webserver, but for those in-the-know, ~/ntp/ntp_clients_stats | less produces the same type of output seen here.) It turns out that a flaw in the IPGeo code assigning the server to the right ‘zones’ for some reason thought our server was in Brazil. Strangely, while the United States has 550 (at last count) servers in the pool, South America has 16. Thus I got a much greater share of the traffic. It’s still low: at its peak it looks like we might use 2GB of bandwidth.
So there are a few graphs I think would be interesting:
- A line graph of the number of clients served over time. Using Google Charts would save me from having to deal with RRDTool / MRTG.
- A map of South American countries, colored to show which of the countries are querying the server most frequently. (The same could be done for my US server, on a state-by-state basis.)
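The aggregation behind that map could be sketched as follows. I haven’t settled on a log format, so this assumes the client IPs have already been pulled out of the dump file, and `lookup_country` is a hypothetical stand-in for whatever GeoIP lookup is actually used.

```python
from collections import Counter


def clients_per_country(ip_addresses, lookup_country):
    """Tally client IPs by country code.

    `lookup_country` is a hypothetical callable (IP string -> two-letter code
    or None) standing in for the real GeoIP database.
    """
    counts = Counter()
    for ip in ip_addresses:
        counts[lookup_country(ip) or "??"] += 1
    return counts


def scale_for_chart(counts, top=100):
    """Scale counts to 0..top so the busiest country gets the strongest color."""
    busiest = max(counts.values(), default=0)
    if busiest == 0:
        return {}
    return {code: round(n * top / busiest) for code, n in counts.items()}
```

The same two functions would work state-by-state for the US server by swapping in a lookup that returns a state code instead.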