Thursday, July 23, 2009

Bug of the day

Todays entry is about a fun little bug that a customer reported the other day. The cause is very straight forward, if you know a little trivia about Windows.

Without going into too much detail, the product I work is a type of document management system. You add files to the document management system and can do various things to them. For various reasons the document management system sometimes creates icon files for different document types and stores these icon files in the local file system. For example if you added a file called 'abc.mp3' to the system then an icon might get created on the local file system called 'mp3.ico'. Seems reasonable right?

However a customer reported a bug that had our support team stumped. They were getting an error when they tired to add documents that had the file extension 'con'. If you know a bit about windows you have probably guessed the problem already. Since the very begining of time MS-DOS, and later Windows, have had a list of several file names that are reserved in all folders. From here:

Do not use the following reserved device names for the name of a file:
CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9

Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended.


Of course once you know this the cause of the bug is obvious. The user adds a file named 'abc.con' to the system, which then creates a file named 'con.ico', and then Windows gets upset. The reserved words are of course names of devices, they exist so you can use devices like files à la UNIX. The strange thing is they are reserved in all directories.

Anyway the lessons are:

1. The more general lesson: As the files names were essentially determined by user input they should have been sanitized at some point.

2. The more specific lesson: Be aware that there are some less obvious rules about valid file names in Windows.  

Saturday, April 25, 2009

Playing with the bifurcation diagram of the logistic map



After reading Chaos by James Gleick the other day I was inspired to write a program to draw the bifucation diagram of the logistic map.

The logistic map is an iterative function which can be used to model population growth. It looks like this:


The Wikipedia article gives a good explanation on how the logistic map works. You start by picking a value of x between 0 and 1 and plugging into the equation. This gives you a result that you plug back in as you new value of x. You repeat the process several times and see what happens.

The equation is not particularly complex, so you might not expect anything to exciting to happen. Indeed for values of r less then about 3.45 the value of x will bounce around a few times and quickly converge on a value. However if we keep increasing the value of r things become more interesting. As r approaches approximately 3.45 x starts to bounce around a lot more before settling on a value. As r continues to increase x will no longer settle to a single value, but instead oscillate between two values. If we keep increasing the value of r we will find that x now starts to oscillate between 4 values, and then 8 then 16 and so on. At about r = 3.57 the values of x become chaotic, never settling in to a predictable oscillation. However if we keep increasing r we will occasionally see that some values of r again seem to show more ordered behaviour.

If we draw the long term values of x we end up with a dramatic bifurcation diagram that illustrates the complex behaviour contained in the seemingly simple equation.

So anyway I decided to try writing my own program to draw the bifurcation diagram, which turned out to be a relatively simple task. Here is my first result:


Not bad, but there are a few problems. Firstly the diagram indicates that x settles on several values before the first bifurcation. This is incorrect, this is actually an artefact caused by not iterating enough times before drawing the value(s) of x. Secondly, this diagram doesn't capture a lot of the intricate details we would like to see.

Zooming in and increasing the number of dots draw helps to bring out more detail:


Lets try increasing the number of iterations performed before we draw a dot, and also drawing a lot more dots:


Oh dear. While we have gotten rid of the artefacts around the bifurcations mentioned before, we have also draw so many dots that we have wiped out most of the detail. I looked on the web for a way to increase the detail. One solution others had tried was to count how many times a dot is drawn in a particular point and to draw points that were drawn many times as brighter then those drawn only a few times. This seems like a good solution, but I was feeling lazy so I decided to try a dumber approach first. I simply made points that are drawn in later iterations brighter. My theory was that points that are drawn a lot are more likely to be drawn in a later iteration. I wasn't really expecting this approach to work very well, but to my surprise the results were pretty good:


Above you can see that we are getting nice clean bifurcations, and we can see some nice details in the chaotic parts of the diagram.

I wondered what would happen if we drew all iterations, instead of only drawing dots after a certain number of iterations. This idea was inspired in part by the way the Mandelbrot set is often rendered. A rendering of the Mandelbrot set is also created using an iterative function. Most rendering of the Mandelbrot set draw points that are outside the actual Mandelbrot set. These points are draw in different colours, depending on how many times the function is iterated before it is determined that the point is not actually part of the Mandelbrot set. When I tried this with the logistic map I got the following result:

As you can see some new lines appear. For the non chaotic parts of the diagram these lines trace the values that x visits before settling into an oscillation.

Zooming out we see the logistic map seems to exhibit interesting behaviour for values of r between approximately -2 and 4:
Lets add some colour to make things look nicer, and zoom in on some interesting areas:

The initial value of x used does not affect the bifurcation diagram itself. However it does effect the values that x visits before reaching a stable oscillation (for those values of r where is does reach a stable oscillation). Here is an example of what happens when we vary the initial value of x (zoomed in on the first bifurcation again):
Finally if we zoom way out we see that the logistic map becomes fairly boring for values of r outside of approximately -2 to 4:

The straight line is just the initial value of x chosen.

Monday, September 15, 2008

InstallScript

If you were going to create a programming language for use in installers you would probably make sure it had lots of string manipulation stuff right? Like you would definitely put in regular expressions wouldn't you?

If you answered "yes" then you are not the designer of InstallShield.

Tuesday, April 15, 2008

Google App Engine Beta First Impressions

For a day job I work as a Smalltalk developer on a win32 system. However for a long time I have been meaning to get into web development. I tried to get started with Ruby on Rails, but spending an evening just setting up the database and the web server was no fun. And that was just to play around with it on my own machine, if I actually wanted to share my creation with the world I would be in for a whole new set of pains.

This, claims Google, is where Google App Engine comes in. According to the this spiffy Google video getting my Python powered web site up and running will be a breeze. However what really excites me is that App Engine also gives you access to Google's super-cool distributed datastore, basically you get a bit of storage space and CPU-time on the Google super-computer. Sounds good to me.

At the moment there are only 10000 beta spots available, and it seems I missed out, so I'm in the queue. Hopefully Google will be kicking inactive accounts and giving spots to people like me who are waiting. Not to worry though, you can still get started with the SDK. I happen to be in front of a Windows XP box at the moment so I'm trying out the Windows version.

The first step is to get Python 2.5 from here. The Windows version comes with a spiffy MSI installer, so you can't go wrong.

Next up we need the SDK itself. Once again the install is quick and easy.

next it's time for Google's getting started guide.

The guide is quite nice. Note that the SDK doesn't come with a fancy IDE or anything like that. You will be playing around with the command line and your favorite text editor, but it's all fairly simple. The SDK does however included a web server to test your code on your development machine. Running it is as easy as a single command. Hooray! no more screwing around with DBs and complex web servers on developer machines. The tutorial will also give you an introduction to some of the cool stuff that comes with App Engine: integration with Google user accounts, using the 'datastore' (not 'database'), using django templates etc. One thing you will notice is a lot of this stuff is very specific to the App Engine platform. This is very much a double edged sword. On one hand tight integration makes traditionally hard stuff like account management almost trivial. On the other hand it means you will have a hard time moving to another platform if you ever decide you don't like Google's hosting.

On another note I did come across a bug in the tutorial see:
http://groups.google.nl/group/hosted-settingup/msg/378021de6ae22895

Before you get too excited another thing to look out for is that the App Engine environment is fairly well locked down. You can't do anything to close too the OS, like writing to files. Also if you want to import a Python module if needs to be pure Python. That means, for example, the Python Imaging Library is out, which probably kills off a lot of cool ideas, though Google may decide to support this library in the future. Also at the moment SSL is not supported, though I'm sure it will be ready when the beta finishes.

All in all I was very impressed with the beta. Google have managed to make getting started dead simple. But time will tell if the platform will be flexible enough for "real" websites, or just toys.

Sunday, October 7, 2007

Why they should teach functional languages in Computer Science

This is a Google paper on using a map/reduce strategy to do computations on masses of commodity hardware in parallel. It's very cool, and it helps Google do what it does. If you were taught Lisp or Haskell at university you will probably find most of the paper quite straight-froward.

Word has it that UNSW doesn't teach Haskell in Comp1A anymore. Apparently someone decided that students should now spend their first semester in computer science learning about printf instead of recursion.