BVLog Bryan Voss’ mental synchronization point


One miiiiillion files

A sysadmin rule of thumb I had occasionally heard in the past was to limit the number of files in a directory to around 2000. Larger numbers of files were historically difficult to deal with on old systems with limited RAM and processing capability. And of course old filesystems had limitations on the number of files per directory as well.

We are migrating data from our old document imaging system to our new one. As part of the process, we shipped off a copy of all the optical platters from the old system to an agency that supposedly specializes in the migration process. They pulled the image data from the platters and dumped it all to USB drives along with scripts that we run through the new system to import all the images with the correct index information. I just received two of the three USB drives, so I hooked one up to a virtual machine to see how things look.

Two directories on the root of the drive, one for images and one for scripts. Took a look at the images directory. Lots of subdirectories with lots more subdirectories under each one. Looks ok.

Changed to the scripts directory. Since I'm browsing in Windows Explorer, I get the animated folder and flashlight indicating that it's reading the directory info, please wait. Ok, no problem. I go check my email. Come back to the window a couple of minutes later and it's still reading. What's going on here? Since I'm running it on a VM, I think maybe that's slowing things down.

I close the window, disconnect the drive, and connect it to my laptop. Open a DOS prompt. cd to the directory and type dir. Text starts scrolling by. Looks like all the scripts are in one directory with no subdirectories. Ok, that's to be expected since the test batches the agency sent us previously were laid out the same way. No problem, how many files could there be in that directory? (Text is still scrolling in the window.) I minimize the window in the hopes that it will speed up the process if it's not having to update the display.

I go do something else for a while. Come back about 5 minutes later and bring the DOS window back up. Still scrolling. WHAT?!? This is crazy. I don't remember offhand whether the dir command under DOS even shows a file count at the end (it does, as it turns out), so I ctrl-C the process and close the window. Gotta break out the real command-line utils. I launch a cygwin bash window and cd to the directory in question. Type "ls -1 | wc -l". For the uninitiated, this generates a single-column listing of the files in the directory and pipes it to the word count util, which counts the number of lines returned. I leave this running and go down the hall to the datacenter to start a Red Hat install on a new box.
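For anyone who wants to try the same trick, here's a small sketch of the counting approach. The scratch directory and file names are made up for illustration; note that ls sorts its output by default, which is part of why it crawls on huge directories, so a find-based count is often faster:

```shell
# Hypothetical scratch directory standing in for the real scripts dir.
mkdir -p /tmp/count_demo
for i in 1 2 3 4 5; do touch "/tmp/count_demo/file$i"; done

# The approach from the post: one name per line, piped to wc.
ls -1 /tmp/count_demo | wc -l

# Alternative: find doesn't sort, which can matter with a million entries.
find /tmp/count_demo -maxdepth 1 -type f | wc -l
```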

I come back to my desk about 15 minutes later and the process is still running. This is crazy! I trust bash more than DOS, though, so I let it run. Finally after about 30 minutes, the prompt suddenly appears again. I look at the output of the wc command. Count digits. Double check. Yes, my first glance was correct. One million, fourteen thousand, seven hundred eighty-seven (1,014,787) files in a single directory. I'm actually mildly surprised the FAT32 filesystem can handle that many files in a single directory.

My next challenge is copying all those files to a directory on the server that needs to process them. Is the DOS copy command up to the task? We shall see...
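If the DOS copy command chokes, one fallback from the cygwin side is to feed file names to cp in batches, since a single wildcard expansion over a million names can blow past command-line limits. This is a minimal sketch with hypothetical paths, not the actual migration layout (on the Windows side, robocopy would be the usual heavy-duty tool):

```shell
# Hypothetical source/destination directories standing in for the real ones.
mkdir -p /tmp/scripts_src /tmp/scripts_dst
for i in 1 2 3; do echo "data$i" > "/tmp/scripts_src/script$i.txt"; done

# Stream names NUL-delimited from find and copy in batches of 1000,
# so no single cp invocation gets an enormous argument list.
find /tmp/scripts_src -maxdepth 1 -type f -print0 \
  | xargs -0 -n 1000 cp -t /tmp/scripts_dst
```

The -t flag (target directory first) is a GNU cp extension; it lets xargs append the batch of source files at the end of the command line.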
