Linux sysadmin tip #1: LVM
There's no longer any excuse for not using logical volumes when setting up a Linux system. All the major distros support LVM. All the major rescue disks support it. The benefits of LVM are too great to ignore. You set up a volume group containing all your disks, then create logical volumes on top of that. The logical volumes can be expanded when necessary until the volume group is full. When that happens, just add another disk and expand the volume group across it. You can then continue expanding the logical volumes as needed. Logical volumes work on top of RAID arrays and SAN LUNs as well.
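For anyone who hasn't done it, the workflow above boils down to a handful of LVM commands. Here's a rough dry-run sketch in Python that only builds and prints them; it doesn't touch any disks. The names vg0, data, /dev/sdb and /dev/sdc and the sizes are made up for illustration, so substitute your own and check the man pages before running anything for real.

```python
import shlex


def lvm_plan(vg, lv, first_disk, size, new_disk, grow_by):
    """Return the LVM commands for the workflow as argument lists.

    Nothing is executed; this is a dry run for illustration only.
    """
    return [
        ["pvcreate", first_disk],                # mark the disk as an LVM physical volume
        ["vgcreate", vg, first_disk],            # create the volume group on that disk
        ["lvcreate", "-L", size, "-n", lv, vg],  # carve a logical volume out of the group
        ["pvcreate", new_disk],                  # later, when the group is full: prepare a new disk
        ["vgextend", vg, new_disk],              # expand the volume group across it
        # Grow the logical volume; -r also resizes the filesystem,
        # assuming one has already been created on the volume.
        ["lvextend", "-r", "-L", f"+{grow_by}", f"/dev/{vg}/{lv}"],
    ]


if __name__ == "__main__":
    # Hypothetical names and sizes, purely for illustration.
    for cmd in lvm_plan("vg0", "data", "/dev/sdb", "20G", "/dev/sdc", "10G"):
        print(shlex.join(cmd))
```

On a RAID array or SAN LUN the device names would differ, but the sequence should look about the same.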
About the only thing better is Solaris' ZFS, which I've only recently begun experimenting with. But that's a topic for another post...
One miiiiillion files
A sysadmin rule of thumb I've occasionally heard is to limit the number of files in a directory to around 2000. Larger directories were historically difficult to deal with on old systems with limited RAM and processing capability. And of course old filesystems had limits on the number of files per directory as well.
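One common way to stay under a per-directory limit like that is to fan files out into subdirectories keyed on a hash of the file name. This is a minimal sketch of the idea, not anything the systems in this story actually did; the function name sharded_path and its parameters are my own invention.

```python
import hashlib
from pathlib import Path


def sharded_path(root, filename, levels=2, width=2):
    """Map a file name to root/ab/cd/filename using hex digits of its hash.

    With the defaults there are 256 * 256 = 65,536 leaf directories,
    so a million files average out to roughly 15 per directory.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Take successive fixed-width slices of the digest as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)
```

The mapping is deterministic, so the same file name always lands in the same subdirectory and you can find it again without an index.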
We are migrating data from our old document imaging system to our new one. As part of the process, we shipped off a copy of all the optical platters from the old system to an agency that supposedly specializes in the migration process. They pulled the image data from the platters and dumped it all to USB drives along with scripts that we run through the new system to import all the images with the correct index information. I just received two of the three USB drives, so I hooked one up to a virtual machine to see how things look.
Two directories on the root of the drive, one for images and one for scripts. Took a look at the images directory. Lots of subdirectories with lots more subdirectories under each one. Looks ok.
Changed to the scripts directory. Since I'm browsing in Windows Explorer, I get the animated folder and flashlight indicating that it's reading the directory info, please wait. Ok, no problem. I go check my email. Come back to the window a couple of minutes later and it's still reading. What's going on here? Since I'm running it on a VM, I think maybe that's slowing things down.
I close the window, disconnect the drive, and connect it to my laptop. Open a DOS prompt. cd to the directory and type dir. Text starts scrolling by. Looks like all the scripts are in one directory with no subdirectories. Ok, that's to be expected since the test batches the agency sent us previously were laid out the same way. No problem, how many files could there be in that directory? (Text is still scrolling in the window.) I minimize the window in the hopes that it will speed up the process if it's not having to update the display.
I go do something else for a while. Come back about 5 minutes later and bring the DOS window back up. Still scrolling. WHAT?!? This is crazy. I don't remember offhand whether the dir command under DOS even shows the file count (it does, as it turns out), so I ctrl-C the process and close the window. Gotta break out the real command line utils. I launch a cygwin bash window and cd to the directory in question. Type "ls -1 | wc -l". For the uninitiated, this generates a single-column listing of the files in the directory and pipes it to the word count util, which counts the number of lines returned. I leave this running and go down the hall to the datacenter to start a Red Hat install on a new box.
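Part of the wait with ls may be that it reads and sorts every name before printing anything (GNU ls has -f to turn the sorting off). A rough Python equivalent of the count, assuming Python is available, walks the directory one entry at a time and never builds a list. The function name count_entries is just for illustration.

```python
import os


def count_entries(directory):
    """Count directory entries one at a time, without building or sorting a list."""
    count = 0
    with os.scandir(directory) as entries:
        for _ in entries:
            count += 1
    return count


if __name__ == "__main__":
    print(count_entries("."))
```

Note that this counts hidden files and subdirectories too, so the number can differ slightly from what "ls -1 | wc -l" reports.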
I come back to my desk about 15 minutes later and the process is still running. This is crazy! I trust bash more than DOS, though, so I let it run. Finally after about 30 minutes, the prompt suddenly appears again. I look at the output of the wc command. Count digits. Double check. Yes, my first glance was correct. One million, fourteen thousand, seven hundred eighty-seven files in a single directory. I'm actually mildly surprised the filesystem can handle that many files in a single directory. (FAT32 caps a directory at 65,536 entries, so the drive is presumably formatted as NTFS or something similar.)
My next challenge is copying all those files to a directory on the server that needs to process them. Is the DOS copy command up to the task? We shall see...
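If the copy command chokes, one fallback is a small script that streams through the directory and can be restarted after a failure. This is only a sketch under my own assumptions: copy_flat_directory is a made-up name, and it treats any file already present at the destination as done, so a file left half-written by an interrupted run would need to be deleted by hand before restarting.

```python
import os
import shutil


def copy_flat_directory(src, dst):
    """Copy regular files from src to dst, skipping any already present in dst.

    Returns the number of files copied during this run.
    """
    os.makedirs(dst, exist_ok=True)
    copied = 0
    with os.scandir(src) as entries:
        for entry in entries:
            target = os.path.join(dst, entry.name)
            # Skip subdirectories and anything a previous run already copied.
            if entry.is_file() and not os.path.exists(target):
                shutil.copy2(entry.path, target)
                copied += 1
    return copied
```

Running it a second time over the same pair of directories copies only what is missing, which matters when a single pass takes hours.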
I’m gonna need you to go ahead and reboot the mainframe, mmmmkay?
I just saw a trouble ticket come in from the Help Desk: "User reports error 12 in application. This indicates that the mainframe needs to be rebooted."
Ummm. Thanks. I'll get right on that.
- We don't have a mainframe.
- If we did, we probably wouldn't reboot it because one user is getting an error in an application.
Do not be afraid
I am occasionally asked how I learned so much about [insert technical subject here]. My answer is always: "I just started playing with it until I figured it out." The point I try to get across is that I learned it by doing it.
A coworker asked me to sit down with him and teach him some things about Linux. We can spend all day talking, but the only way to figure out the ideas and concepts behind the Linux shell is to live in it for a while. I told him that around 10 years ago, I determined that I needed to learn Linux since it looked like it was going to be an important platform and doggone it, I didn't have the money to keep buying commercial software. I threw out Windows entirely and used Linux for everything at home. I started a sysadmin job at a small manufacturing company and proceeded to migrate everything I could to Linux. I was probably overzealous in my attempts, but I learned a huge amount as part of the process. I spent countless hours reading man pages and howtos. I signed up on the local Linux users group email list and asked questions. Eventually, I got good enough that I didn't have to post many questions anymore. Then I got good enough that I was able to post responses and help other people with their Linux problems.
I never would have gotten where I am today without banging my head against seemingly insurmountable problems, breaking all kinds of systems and rebuilding them, doggedly sticking with Linux even though it would have been trivial to solve a problem with Windows. I have since learned to back down occasionally and recognize the particular situations where Linux is the right choice and to accept the situations where it's not the right choice. But I wouldn't trade my history for anything. It's what got me to where I am today.
Here's the point in all this: do not be afraid to play with a system. Build a test box if the system is mission critical. Just the experience of building the test system is useful. Break it and rebuild it.
Learn the history of whatever project or system you're experimenting with. Feel the mindset and methods of the developers. There's probably a reason they did it that way.
Once the system is in production, it will eventually break in some unexpected and odd way. Don't be afraid to open the hood and fix it; just document what you did (an internal blog makes a great worklog).
Welcome uncertainty; it's a learning opportunity in disguise. The only way to gain certainty is to act decisively. No route is perfect. Pick what looks like the best one and run with it. If it turns out to be the wrong route, at least you learned something along the way.
But above all, do not be afraid.