"Linux Gazette...making Linux just a little more fun!"


The Answer Guy


By James T. Dennis,
Starshine Technical Services, http://www.starshine.org/


(?)UID/GID Synchronization and Management

From Gordon Haverland on 16 Jul 1998 in the comp.unix.questions newsgroup

Hi:

I inherited sys admin stuff as part of a job. At first, this wasn't a problem: GIS work on a single Linux machine. I did development and analysis, others did just analysis. Soon we got another Linux machine, so development moved there. To share printing, Ethernet was installed along with LPRng. Then a Solaris 2.5.1 machine was added. So, the two Linux machines have a handful of users, the Sun has those plus a few other groups of users, and I plan to add a Beowulf cluster "real soon now". Is there any rationale out there for assigning UIDs and GIDs in a heterogeneous cluster/network like this? It sure looks like users common among machines have to have the same UIDs and GIDs. The Solaris has NIS on it, so I guess whatever I do should get administered from there. Thanks for any light you might shed on this.

Gordon Haverland

(!)I'm not sure what you mean by "rationale" in this context.
Do you mean:
"Why should I co-ordinate and synchronize the account management on the systems throughout my network?"
... or do you mean:
"How should I ....."
... or do you mean something else entirely?
I'll answer the first two questions (probably in far more detail than you wanted):
There are two principal reasons why you want to co-ordinate the user/UID and group/GID management across your network. The first is relatively obvious --- it has to do with user and administrative convenience.
If each of your users is expected to have relatively uniform access to the systems throughout the network, then they'll expect the same username and password to work on each system that they are supposed to use. If they change their password they will expect that change to be global.
When you --- as the admin --- add, remove, disable, or change an account, you want to do it once, in one place. You don't want to have to manually copy those changes to every system.
Of course these reasons don't require that the UID/GID's match. As you probably know, user names and group names in Unix and Linux are mapped into numeric forms (UID's and GID's respectively). All file ownership (inodes) and processes use these numerics for all access and identity determination throughout the kernel and drivers. These numeric values are reverse mapped back to their corresponding symbolic representations (the names) by the utilities that display or process that information. Thus the 'ls -l' command is doing a lookup on each directory entry to find the names that correspond to the owner and group ID's.
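(You can watch this distinction in action with the '-n' switch to 'ls', which skips the name lookup and prints the raw numbers stored in the inode --- the file, names, and ID values below are hypothetical:

    $ ls -l report.txt
    -rw-r--r--   1 gordon   gis         1024 Jul 16 12:00 report.txt
    $ ls -ln report.txt
    -rw-r--r--   1 501      100         1024 Jul 16 12:00 report.txt

'ls -ln' is also a handy way to spot unmapped or mismatched IDs on an NFS mounted directory.)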
Most of the commands you use actually do this through library calls. Indeed most of these commands are "dynamically linked" (use shared libraries) and perform those lookups through calls into a common external file (the libc). As we'll see this is very important as we look at the implications of consolidating the account mapping information into a networked model (such as NIS).
As I said, you could maintain a network of systems which co-ordinated username/password data and group membership lists without synchronizing the UID's and GID's across the systems. Most network protocols and utilities (the r* gang: rsh, rlogin, rcp, and things like telnet, ftp, etc) exchange this data in "text" (symbolic) form.
However, we then come to NFS!
The NFS protocols use numeric forms to represent ownership. Therefore an NFS server provides access based on an implicit trust that the NFS client is providing a compatible and legitimate mapping of the client's UID/GID to the server's.
It is possible in Linux' NFS implementation to run a ugidd (a UID/GID mapping daemon). Thus you could create maps for every NFS server to map each client's UID's to the server's UID's, etc. Yes, that idea is as ugly as it sounds!
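(With the user-space Linux NFS server this is enabled per export; a sketch of an /etc/exports line --- the hostname is made up, and you should check your own exports(5) man page since the option names have varied between nfsd versions:

    /home    wks1.mydomain.org(rw,map_daemon)

... with 'ugidd' running on the client so the server can query it for the UID/GID translations.)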
I won't go into the security implications of NFS' mechanism here. I'll just point out that my pet expansion of NFS is "no flippin' security." I'm told that it is possible to enable a "secure RPC" portmapper which implements host-to-host authentication. I'd like to know more about that.
However, it is still the case that any user who can get root access to any trusted NFS client can impersonate any non-root user so far as the NFS servers in that domain are concerned. Since "sufficient" physical access virtually guarantees that workstation users can get root access (possibly by resorting to a screwdriver and CMOS battery jumper) I come to the conclusion that NFS is hopelessly insecure in today's common network configurations (with workstations and PC's at everyone's desks).
(In defense of NFS I should point out that its security model, and the ones we see in the r* gang, were not unreasonable when most Unix installations had a small cluster of multi-user systems locked in a server room --- and all user access was via terminals and X-terminals. This suggests that there are some situations where they are still justified).
Despite these limitations and implications, NFS is the most commonly deployed networked filesystem among Unix and Linux systems. I have high hopes for CODA, but even the most optimistic estimates suggest that it will take a long time to be widely adopted.
So, it is in your best interests to synchronize your UID/GID to user/group name mappings throughout your enterprise. It is also recommended that you adopt a policy that UID's are not re-used. When a user leaves your organization you "retire" their UID (disabling their access by *'ing out their passwd, removing them from the groups maps, setting their "shell" to some /bin/denied binary and their "home" directory to a secured "graveyard" --- I use /home/.graveyard on my systems). The reason for this may not be obvious. However, if you are maintaining archival backups for several years (or indefinitely) you'll want to avoid any ambiguities and confusion that might result from restoring one (long gone) user's files and finding them owned by one of your new users.
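(A minimal sketch of such a retirement using the stock shadow suite tools --- 'jdoe', the paths, and /bin/denied are examples, not a standard:

    passwd -l jdoe                          # lock (star out) the password
    usermod -s /bin/denied jdoe             # refuse interactive logins
    usermod -d /home/.graveyard/jdoe jdoe   # point "home" at the graveyard
    mv /home/jdoe /home/.graveyard/jdoe
    chmod 700 /home/.graveyard

... followed by editing /etc/group to drop jdoe from any supplemental groups. The key point is that the jdoe:UID line stays in the passwd map forever.)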
(This "UID retirement" policy is obviously not feasible for larger ISP's and usually difficult for Universities and other high turnover environments. You can still make it a policy to cycle all the way around the UID/GID space before re-use).
That should answer the questions about "why" we want to co-ordinate account information (user/password, and group/membership data) and why many (most) of us want to synchronize the UID's and GID's that the accounts map to.
Now, we think about "how" to do so.
One common method is to use 'rdist' to distribute a set of files (usually /etc/passwd, /etc/group, and /etc/hosts) to every machine in a "domain" (this being the "administrative" sense of the term, which might or might not match a DNS domain or subdomain). For this to work we have to declare one system to be the "master" and we have to ensure that all account changes occur on that system.
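(A minimal 'rdist' Distfile for this might look like the following sketch --- the hostnames are hypothetical:

    HOSTS = ( wks1 wks2 sol1 )
    FILES = ( /etc/passwd /etc/group /etc/hosts )

    ${FILES} -> ${HOSTS}
        install ;
        notify root ;

Running 'rdist' on the master then pushes any changed files out to every listed host; see rdist(1) for the transport options your version supports.)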
This can be done by training everyone to always issue their 'passwd' 'chfn' 'chsh' and similar commands from a shell on that system, or you can create wrappers for each of the affected commands (replacing the client copies of these commands with a script that does something like: 'ssh $master "$0"' for example).
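(Such a wrapper can be very short; here's a sketch, where 'adminhost' is a placeholder for your master's name:

    #!/bin/sh
    # passwd/chfn/chsh wrapper: run the real command on the master.
    # -t forces a tty allocation so passwd can prompt interactively.
    exec ssh -t adminhost "`basename $0`" "$@"

Install it as /usr/local/bin/passwd, chfn, and chsh, ahead of the real binaries in everyone's PATH.)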
The nice things about this approach are:
It works for just about any Unix and any Linux (regardless of the libraries and programs running on the client).
The new risks and protocols are explicitly put in place by the sysadmin --- we don't introduce new protocols that might affect our security.
There is no additional network latency and overhead for most programs running most of the time. You are never waiting for 'ls' to resolve user and group names over the network!
The concerns about this method are:
You have to ensure the integrity and security of the master --- I'd suggest requiring 'ssh' access to it and using PAM and possibly a chroot jail to limit the access of most users to just the appropriate commands.
All clients must "trust" the master --- they must allow that system to "push" new root owned system configuration files to them. I'd use 'rdist' or 'rsync' over 'ssh' for this as well (see the sketch after this list).
You may have unacceptable propagation delays (a user's new password may take hours to get propagated to all systems).
It doesn't "scale" well and it doesn't conform to any standards. You (as the sysadmin) will have to do your own scripting to deploy it. Any bugs in your scripts are quite likely to take down the entire administrative domain.
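(The 'rsync' over 'ssh' push mentioned above is a one-liner per host --- 'wks1' is again a made-up name:

    rsync -e ssh /etc/passwd /etc/group /etc/hosts root@wks1:/etc/

... typically wrapped in a 'for' loop over your host list and run from cron on the master.)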
Then there's NIS.
NIS is a protocol and a set of utilities and libraries which basically implement exactly the features we've just described. I've deliberately used several NIS terms in my preceding discussion.
NIS distributes various sorts of "maps" (different "maps" for passwords, groups, hosts, etc). The primary NIS server for a domain is called the "master" --- and secondary servers are called "slaves." Nodes (hosts, workstations, etc) that request data from these "maps" are called "clients."
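(On the master the maps are built from the ordinary /etc files with a make, so after any account change you run something like:

    cd /var/yp && make

... which rebuilds the DBM maps and pushes them out to the slaves via 'yppush'. 'ypinit -m' sets up a new master, and 'ypinit -s mastername' a slave; see ypinit(8).)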
One of the big features of glibc (the GNU libc version 2.x which is being integrated into Linux distributions as libc.6.x) is support for NIS. It used to be the case that supporting NIS on a Linux client required a special version of the shared libraries (a variant compilation of libc.5).
In Red Hat 5.x and Debian 2.x this will not be necessary. We expect that most other Linux distributions will follow suit in their next major releases. (This transition is similar to the a.out to ELF transition we faced a couple of years ago, and much less of a hassle than the infamous "procps" fiasco that we went through between the 1.x and 2.x kernels. Notably it is possible to have libc.5 and glibc concurrently installed on a system --- the major issue is which way your base system binaries and utilities are linked).
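(Under glibc the lookup order is controlled by /etc/nsswitch.conf; a minimal sketch for a NIS client might read:

    passwd:   files nis
    group:    files nis
    hosts:    files nis dns

... meaning "check the local /etc files first, then ask NIS" --- and, for hosts, fall back to DNS. Traditional libc.5 setups marked the NIS merge point with '+' entries appended to /etc/passwd and /etc/group instead.)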
The advantages of NIS:
It's a standard. Most modern forms of Unix support it.
It's scalable and robust. It automatically deals with capacity and availability issues by having two tiers of servers (master and slave).
It's already been written. You won't be re-inventing this wheel. (At the same time it is more generalized --- so this wheel may have more spokes, lug nuts, and axle trimmings than you needed or wanted).
The disadvantages of NIS:
NIS is designed to do more than you might want. It will default to providing host mapping services, which might conflict with your DNS scheme (under glibc the 'hosts' line in nsswitch.conf, shown above, is where you arbitrate that) and might give you a bit of extra grief while configuring 'sendmail' --- at least the Solaris default version of 'sendmail'. These are relatively easy issues to resolve --- once you understand the underlying model. However they are cause for sysadmin confusion and frustration in the early stages.
It's not terribly secure. There is a NIS+ which uses cryptographic means to tighten up some of that. However, NIS+ doesn't seem to be available for Linux yet. That is probably largely the result of the U.S. federal government's unpopular and idiotic attitudes towards cryptography --- which has a generally chilling effect on the development and deployment of robust security. The fact that U.S. policy also recognizes patents on software and algorithms (particularly the very broad RSA held patents on public key cryptography) also severely constrains our programmers (they are liable if they re-invent any protected algorithm --- no matter how "obvious" it seemed to them nor how "independent" their derivation). Regardless of these political issues, I still have technical concerns about NIS security.
Hybrid:
You can use NIS within your domain, and you can distribute your NIS maps out to systems that are on the periphery (for example out to your web servers and bastion/proxy systems out on the "firewall" or "perimeter network segment"). This can be combined with some custom filtering (disabling shell access by most users to these machines, for example --- helping to ensure that the UID/GID mappings are used solely for marking file ownership).
NIS maps are in the same format as the files to which they correspond. Thus the NIS passwd map is a regular looking passwd file, and the NIS group map is in the conventional format you'd expect in your /etc/group file.
You might have to fuss with these files a bit to "shadow" them (or "star out" the passwords on accounts that shouldn't be given remote access to a given host).
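(A quick way to generate such a file is to dump the map and star out the password field as it goes by --- a sketch:

    ypcat passwd | awk -F: 'BEGIN{OFS=":"} {$2="*"; print}' > passwd.nis

... then merge passwd.nis into the peripheral host's own /etc/passwd, leaving out anyone who shouldn't appear there at all.)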
Ideally I'd like to see a hybrid of NIS and Kerberos. We'd see NIS used to provide the names/UID's --- and Kerberos used for the authentication. However, I haven't yet heard of any movement to do this. I have heard rumblings of LDAP used in a way that might overlap with NIS quite a bit (and I'd hope that there'd be an LDAP to NIS gateway so we wouldn't have to transition all those libraries again).
Back to your case.
NIS sounds like a natural choice. However, you don't have to pick the Solaris system for the administration. You can use any of the Linux systems or any Solaris system (among others) as the NIS master. Since your Solaris system is probably installed on more expensive SPARC hardware, and it probably was purchased to run some services or applications that aren't readily available on your Linux systems --- it would probably be wiser to put up an extra Linux box as a dedicated NIS master and administrative console.
It doesn't sound like internal security is even on your roadmap. That's fine and fairly common. All the members of your team probably have sufficient physical access to all of the systems in your group that significant efforts at intranet (internal) security in software would probably be pointless.
I'd still recommend that you use "private net" addressing (RFC1918 --- 10.*.*.*, 192.168.*.* and the range of class B's from 172.16.*.* through 172.31.*.*) --- and make your systems go through a masquerading router (Linux or any of several others) or a set of proxies or some combination of these.
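(Under a 2.0.x kernel the masquerading part is a couple of 'ipfwadm' rules --- the 192.168.1.0 network here is just an example:

    ipfwadm -F -p deny
    # ... masquerade only the traffic from our private LAN:
    ipfwadm -F -a m -S 192.168.1.0/24 -D 0.0.0.0/0

The 2.1.x development kernels are moving to 'ipchains', but the idea is the same.)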
In fact I highly recommend that you fire up a DNS caching server on at least one system --- and point all of your clients at that, and that you install a caching web proxy (Apache can be configured for this, or you can use Squid --- which is my personal favorite). These caches can save a significant amount of bandwidth for even a small workgroup and they only cost a little bit of installation and configuration time and a bit of disk space and memory.
(The default Red Hat configuration for their 'named' rc file is to just run in caching mode. So that's truly a no brainer --- just distribute a new resolv.conf file to all the clients so that it refers *first* to the host that runs the cache. My squid configuration on a S.u.S.E. machine has run, unmodified, for months. I vaguely remember having to edit a configuration file. It must not have been too bad. Naturally you have to get users to point their web browsers at the proxy --- that might be a hassle. With 'lynx' I just edit the global lynx.cfg file and send it to each host. Similar features are available in Netscape Navigator --- but you have to touch everyone's configuration at least once).
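(The resolv.conf you'd push around is only a few lines --- the domain and addresses below are placeholders:

    search mydomain.org
    nameserver 192.168.1.1
    nameserver 192.168.1.2

List the host running the caching 'named' first; the resolver tries the nameserver entries in order.)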
Once you have your workgroup/LAN isolated on its own group of addresses and working through proxies --- it is relatively easy to configure your router to filter most sorts of traffic that should not be trusted across domains and, especially, to prevent "address spoofing" (incoming packets that claim to be from some point inside of your domain).
You can certainly spend all of your time learning about and implementing security. However, the cost of that effort may exceed your management's valuation of the resources that are accessible on your LAN. Obviously they'll have to do their own risk and cost/benefit analyses on those issues.
I pay an undue amount of attention to systems security because it is my hobby. As a consultant it turns out to be useful since I can explain these concerns and concepts to my customers, and refer them to specialists when they want "real" security.
To learn more details about how to set up and use NIS under Linux read "The Linux NIS(YP)/NYS/NIS+ HOWTO" at: (http://www.ssc.com/linux/LDP/HOWTO/NIS-HOWTO.html). This was just updated a couple of weeks ago.
I guess there is support for NIS+ clients in glibc --- so that's new to me. I've copied Thorsten Kukuk (the author of this HOWTO) so he can correct any errors I've made or otherwise comment.
By the way: What is GIS? I've heard references to it --- and I gather that it has to do with geography and information systems. Would you consider writing an overview of how Linux is being used in GIS related work for LJ or LG?


Copyright © 1998, James T. Dennis
Published in Linux Gazette Issue 31 August 1998



