Talkback: Discuss this article with peers
Just a few minutes before sitting down to write this article, I managed to fix a problem that has been the bane of my existence for the last two weeks. Since it is a problem that I have often seen mentioned in the Linux Gazette, usually phrased in a manner that shows the writer to be standing on a chair with a noose around his neck and typing with his toes, I've decided to share it with other readers, hopefully saving them wear and tear on good rope. This may also serve as a good guide to troubleshooting software problems in general. Be aware, though, that a login problem could involve _any_ of the areas described - what fixed my particular machine may not be the solution for yours.
A couple of weeks ago, I decided to install an MUA (Mail User Agent) on my machine. A strange thing to do, considering that I live on a sailboat anchored well away from phone lines or electricity - but I had my reasons. I'd done this on land-based systems before; there was just a bit of experimentation that I wanted to do. Wel
l, as a pride of lemmings goeth before a fall off a cliff, so does an MTA (Mail Transfer Agent) go before an MUA - you need something that will deliver the mail, otherwise there's not much point in writing it! So, an MTA/MUA installation. No problem - I keep the entire Debian distribution on the Linux partition of my hard drive; this speeds up installations as well as making package searches a trivial task.
If truth be known, I don't like 'su', at least not for major tasks: the fact that it keeps the original user's environment variables, rather than assuming those of the account being "su"'d to, has caused me a few "interesting moments". Yeah, a quick permissions change or an /etc file modification - all right, - but for serious work, like installing and uninstalling several major packages (I wasn't sure which MTA I wanted yet), I log in as `root'.
On to the task. Midnight Commander makes it the work of a few keystrokes to dive into and explore a directory tree, as well as letting you look inside - and install - any Debian or RedHat package. Let's see... `sendmail'? (Read the `man' page inside the package, look at the docs, install...) Nope, too big and complex. I need something a bit simpler. (Uninstall.) `exim'?... `exmh'?... `mh'?... `nmh'? All got the same "install/uninstall" treatment, with the exception of required libraries: whenever I install a library, it stays installed. After a bit of doing this on a new system, I don't get any complaints about `Required libraries missing' - if it wasn't for the fact that a number of libs in any given distribution are `either/or' choices (they'd conflict with each other), I'd install the entire "libs" directory and never worry about it again!
However, I still had an MTA to choose. Ah, `smail'! Easy to install, painless to configure - done. Easy choice for an MUA - I really like the configurability of `mutt' - and I'm finished! (Prophetic words...)
EXCEPT. Now, I found that I could not log in as a non-root user anymore. The message I got was:
Cannot execute /bin/bash: Permission denied
What in the heck was this?
`Was this some occult illusion?
Some maniacal intrusion?
These were choices Solomon
Himself had never faced before...'
I knew that I hadn't done anything in /etc/password - for that matter, anything in /etc - but I wasn't 100% sure of what those packages, safe as they're supposed to be, were doing under my auspices as `root'. So, I quickly did some double-checks - yes, user `ben' still existed in /etc/password; ditto for group `ben' in /etc/group; entering the wrong string as a password provoked the usual `Login incorrect' message instead of the `Cannot execute'. Hmm.
Another double-check: I created a new user ("joe"), new password and all ("joe"), and tried to log in as that user. No go, same error. Something in the login sequence had died, for reasons unknown. (Goodbye, "joe"...)
At this point, I let out a quiet "eep!" of minor panic, very quickly switched to another VT, and tried to log in as `root'. WHEW; no problems there. At least I would still have access to the machine when I next brought it up... I'd have hated to do an immediate `live' backup and reinstallation!
Open up /bin. What do the file permissions look like? Uh-huh... everything is set to 755 (-rwxr-xr-x); in addition, `login', `mount', `umount', `ping' and `su' are all SETUID (-rwsr-xr-x). So far, so good; how about /etc permissions? They all look OK too - mostly 644 (-rw-r--r--), with an occasional 600 (-rw-------) here and there, for files denied to everyone but `root'. All right, let's try something silly; I overwrote `login' and `bash' with fresh copies, straight out of their original packages, to make sure that they weren't corrupted. Nope; still no luck.
Wait, how about /home? If the permissions on that got mis-set and the user couldn't get in... Rats, it was fine too - 6775 (drwxrwsr-s). Checking the .bashrc and .bash_profile showed nothing unusual - and their perms were OK. Just for kicks, I checked all the other subdirectories in '/'; all except /root were world-readable, which was fine.
There are a couple of files in /var that keep track of who's logged in, when they logged out, and so on; if these guys get corrupted, *all* sorts of strange unpredictable stuff happens. So - emergency measure time! - I typed
cat >/var/log/wtmp cat >var/run/utmpwhich blew their contents away and left them as zero-length files. [He actually typed this without the "cat", but I put the "cat" in to make it clear that the ">" was part of the command line and not the shell prompt. -Ed.] I logged out on all VTs (just so `utmp' and `wtmp' would get some data), and... the usual result.
Permissions on /dev/ttyX and /dev/vcsX (terminals and virtual consoles)? They all looked OK too; I was starting to lose hope.
Wait; what about a systematic approach? Let's get an idea of exactly what's happening before running in every direction. A quick look at the System Administrator's Guide (SAG) to refresh my memory - ah, there's the login process:
From the "System Administrator's Guide", by Lars Wirzenius
First, init makes sure there is a getty program for the terminal connection (or console). getty listens at the terminal and waits for the user to notify that he is ready to login in (this usually means that the user must type something). When it notices a user, getty outputs a welcome message (stored in /etc/issue), and prompts for the username, and finally runs the login program. login gets the username as a parameter, and prompts the user for the password. If these match, login starts the shell configured for the user; else it just exits and terminates the process (perhaps after giving the user another chance at entering the username and password). init notices that the process terminated, and starts a new getty for the terminal.
' ' ' ' ' ' ' ' ------------ ' GIF2ASCII ' | Start | ' conversion by ' ------------ ' "fastfingers" ' V ' program ' ------------------- ' Copyleft 2000 ' ___________| init: fork + exec |_______ ' ' ' ' ' ' ' ' | | "/sbin/getty" | | | ------------------- | ^ V ^ | ---------------------- | | | getty: wait for user | | | ---------------------- | ^ V ^ | ---------------------- | | | getty: read username,| | | | exec "/bin/login" | | | ---------------------- | ^ V ^ | ---------------------- | | | login: read password | | | ---------------------- | ^ V ^ | / \ | | / \ | ------------- / Do \ | | Login: exit |---<-No- / they \ | ------------- \ match?/ ^ \ / | \ / | \ / | | Yes ^ V | ------------------------ | | login: exec("/bin/sh") | | ------------------------ ^ V | ---------------------- | | sh: read and execute | | | commands | ^ ---------------------- | V | ---------- | | sh: exit |----------- ----------Figure 8.1: Logins via terminals: the interaction of init, getty, login, and the shell.Note that the only new process is the one created by init (using the fork system call); getty and login only replace the program running in the process (using the exec system call).
Following the process, we can see that everything up until the last part - the 'exec("/bin/sh")', that is - seems OK. It's during or after that hand-off that things go wild. The problem was now down to system calls, something I wasn't quite sure how to approach... and yet that piece of information contained everything I needed to know; I just didn't know how to apply it. Later on, it would become self-evident.
Over the next ten days or so, every time I logged in I would try something new; some things totally outlandish and unlikely to work; some, bright ideas that produced great disappointment when the Evil Message once again showed its head. Nothing worked. I replaced `getty'; tried a couple of shells other than /bin/bash; tried "su"ing to `ben'; checked the logs (they showed `ben' as having successfully logged in (!), which told me that `login' was fine; the failure occurred when it handed the process off to `bash' - I knew that!)...
After finding only a few references to this on the Net - mostly in Japanese, Swedish, and German (I managed to puzzle out the last two - one of them suggested checking perms on '/' ! Excellent idea... which didn't pan out in my case), I shot off a panicked resume of the problem to the The Answer Guy - Hi! <grin> Unfortunately, he must have been swamped by all those Windows2000 questions that he just loves to answer... anyway, I was cast on my own resources.
Ah - `strace'! Remember `strace'; `strace' is your friend... A really fantastic piece of software that traces the execution of a program and reports it, step by step. Let's go!
Since you have to be logged in to run a program, I ran
strace -s 10000 -vfo login.ben login ben
from my current VT; this meant "Run strace on `login ben'; print all lines up to 10000 characters long (I didn't want to miss any messages, no matter how long they were); make the output verbose; trace any forked processes; output the result to a file called `login.ben'". Then, as a baseline, I ran
strace -s 10000 -vfo login.root login root- and now, I had two files to compare. The `root' one was about twice as long as `ben' - that made sense, since a successful login goes on to execute all the stuff in the "~/.bash*" files.
`strace login' makes for very informative reading. If I hadn't already read the System Administrator's Guide, this would have given me the exact information - in far more detail. It shows all the libraries that are read, every file examined by `login', the comparison procedure for `group' and `password'... the only thing it did NOT show was the reason for the failure; just the fact itself, at exactly the point in the procedure where I expected it to be:
(300+ lines elided) execve("/bin/bash", ["-bash"], ["TERM=linux", "HZ=100", "HOME=/home/ben", "SHELL=/bin/bash", "PATH=/bin:/usr/bin", "USER=ben", "LOGNAME=ben", "MAIL=/var/spool/mail/ben", "LANG=C", "HUSHLOGIN=FALSE"]) = -1 EACCES (Permission denied) write(2, "Cannot execute /bin/bash: Permission denied\n", 44) = 44
Just great. The last thing poor `login' tried to do, before falling over on its back with its legs twitching in the air, was to `execve' bash with the defined variables collected from /etc/password, /etc/login.defs, and so on - all of those looked OK - and write those 44 hateful characters to "stderr" (output descriptor 2). Basically, the stuff I'd already figured out.
I did notice, however, that `login' was opening a number of libraries in /lib that were needed by the Name Service Switch configuration file (/etc/nsswitch.conf). What if one of the mentioned libraries was corrupted? That would be right in line with the `system calls' theory - since libraries are where the system calls come from! Let's check the lib that handles local logins for NSS (see `man nsswitch'):
dpkg -S libnss_compat-2.0.7.so("Tell me, O Mighty Debian Package Manager, whence cometh said program?"), and the Debian Oracle, in his wisdom, replied -
libc6: /lib/libnss_compat-2.0.7.so
Humm. The very core of the Linux libs. Well... a quick replacement of all the /lib/libnss* ... and no change. Next idea.
This procedure got me thinking, though. Something was indeed "rotten in the state of Denmark" - perhaps I needed to check perms on the files in /libs?
The only problem was, I didn't know what they were supposed to be. You see, most of the libs are set to "root.root 644" - owner root, group root, user - read/write, group - read-only, others - read-only. There are a few, though, that should be set "root.root 755" - as above, but with "execute" permissions for everyone added... and without looking at a fresh Linux installation, I had no idea of what was right.
WAIT a minute! As I'd mentioned in a 2-cent tip that I'd sent in to LG, I like to keep a copy of a Debian "base installation" file set (7 files, about 15MB) on my DOS partition as a 'rescue' utility - it should have everything I need!
Yes, I did check the perms on all the other libraries; `ld-2.0.7.so' was the only one that was affected. The only remaining `unknown' was how the perms changed in the first place... but I suspect that question will never be answered.
As usual, the lessons that Linux teaches are hard - but fair. There's *always* a way to solve a problem; admittedly, often the easiest way is to reinstall the system, but this does not teach you the "innards" of an OS the way tracking down a problem will. In my case, reinstallation would have been relatively easy: I have a couple of spare drives, easily big enough to hold my "up to the minute" data so that I don't even need to touch my backups, and a basic Debian install takes me less than 10 minutes. I wasn't interested in that. The thought uppermost in my mind was: "What would happen if this occurred at a customer's site?" I needed to know what the right solution was... and through persistence - no, sheer bloody-mindedness - I succeeded.
I don't suggest that every one of you beat his brains out against some difficult problem once a week just to "keep in practice" - but I do suggest that you use a methodical approach, based on knowledge gained from reading the appropriate HOWTOs and other documentation available before grabbing that installation CD yet another time. There will be times when you'd like nothing better than to laugh maniacally as you watch your system shrink to a pinpoint, dropping away from your lofty perch on the Empire State Building... and there will be other times when the satisfaction of having solved a knotty problem of this sort makes you pound your chest and do Tarzan imitations.
Now, if you all will excuse me, I've got a chimpanzee and an elephant I'm supposed to meet...
Happy Linuxing to all,
Ben Okopnik