This is the Linux kernel capabilities FAQ

Its history, to the extent that I am able to reconstruct it, is that
v0.2 was posted to the Linux kernel list on 1999/04/02 by Boris
Tobotras. Thanks to Denis Ducamp for forwarding me a copy.

Cheers

Andrew

Linux Capabilities FAQ 0.2
==========================

1) What is a capability?

The name "capabilities" as used in the Linux kernel can be confusing.
First there are capabilities as defined in computer science.  A
capability is a token used by a process to prove that it is allowed to
do an operation on an object.  The capability identifies the object
and the operations allowed on that object.  A file descriptor is a
capability.  You create the file descriptor with the "open" call and
request read or write permissions.  Later, when doing a read or write
operation, the kernel uses the file descriptor as an index into a
data structure that indicates what operations are allowed.  This is an
efficient way to check permissions.  The necessary data structures are
created once during the "open" call.  Later read and write calls only
have to do a table lookup.  Operations on capabilities include copying
capabilities, transferring capabilities between processes, modifying a
capability, and revoking a capability.  Modifying a capability can be
something like taking a read-write file descriptor and making it
read-only.  A capability often has a notion of an "owner" which is
able to invalidate all copies and derived versions of a capability.
Entire OSes are based on this "capability" model, with varying degrees
of purity.  There are other ways of implementing capabilities than the
file descriptor model - traditionally special hardware has been used,
but modern systems also use the memory management unit of the CPU.
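
The file descriptor case can be tried out directly.  What follows is a
minimal sketch in C, assuming a POSIX system; the function name
write_through_readonly_fd is made up for this illustration.  It
requests only read permission at "open" time and then attempts a write
through the same descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Open `path` read-only, then attempt a write through the descriptor.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the file could not be opened at all.
 * (Illustrative helper, not part of any library.)
 */
int write_through_readonly_fd(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);   /* the rights are fixed here */
    if (fd < 0)
        return -1;

    int err = 0;
    if (write(fd, "x", 1) < 0)       /* checked against the descriptor */
        err = errno;

    close(fd);
    return err;
}
```

On Linux, calling this on a file you could otherwise write to (for
example /dev/null) should return EBADF: the write is refused because
of the rights recorded in the descriptor, not because of the file's
permission bits.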

Then there is something quite different called "POSIX capabilities"
which is what Linux uses.  These capabilities are a partitioning of
the all-powerful root privilege into a set of distinct privileges (but
look at securelevel emulation to find out that this isn't necessarily
the whole truth).  Users familiar with VMS or "Trusted" versions of
other UNIX variants will know this under the name "privileges".  The
name "capabilities" comes from the now-defunct POSIX draft 1003.1e
which used this name.

2) So what is a "POSIX capability"?

A process has three bitmaps called the inheritable (I), permitted (P),
and effective (E) capability sets.  Each capability is implemented as
a bit in each of these bitmaps which is either set or unset.  When a
process tries to do a privileged operation, the operating system will
check the appropriate bit in the effective set of the process (instead
of checking whether the effective uid of the process is 0 as is
normally done).  For example, when a process tries to set the clock,
the Linux kernel will check that the process has the CAP_SYS_TIME bit
(which is currently bit 25) set in its effective set.

The permitted set of the process indicates the capabilities the
process can use.  The process can have capabilities set in the
permitted set that are not in the effective set.  This indicates that
the process has temporarily disabled this capability.  A process is
allowed to set a bit in its effective set only if it is available in
the permitted set.  The distinction between effective and permitted
exists so that processes can "bracket" operations that need privilege.
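
The check and the "bracketing" described above can be sketched in C as
follows.  This is only a model of the rules, not kernel code: the
struct proc_caps type and the caps_* helpers are hypothetical names,
and each set is assumed to fit in one 32-bit word.  Only the bit
number 25 for CAP_SYS_TIME is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_SYS_TIME 25   /* bit number quoted in the text above */

/* Hypothetical stand-in for the per-process capability state. */
struct proc_caps {
    uint32_t inheritable;
    uint32_t permitted;
    uint32_t effective;
};

static uint32_t caps_mask(int cap)
{
    assert(cap >= 0 && cap < 32);
    return (uint32_t)1 << cap;
}

/* Privilege check: only the effective set is consulted. */
int caps_check(const struct proc_caps *p, int cap)
{
    return (p->effective & caps_mask(cap)) != 0;
}

/* Raise a bit in the effective set; refused unless it is permitted.
 * Returns 0 on success, -1 if refused. */
int caps_raise(struct proc_caps *p, int cap)
{
    if (!(p->permitted & caps_mask(cap)))
        return -1;
    p->effective |= caps_mask(cap);
    return 0;
}

/* Lower a bit in the effective set; this is always allowed. */
void caps_lower(struct proc_caps *p, int cap)
{
    p->effective &= ~caps_mask(cap);
}
```

A process that wants to bracket a clock change would call
caps_raise(&p, CAP_SYS_TIME), do the privileged operation, and then
call caps_lower(&p, CAP_SYS_TIME), so that the bit is effective only
for as long as it is needed.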

The inheritable capabilities are the capabilities of the current
process that should be inherited by a program executed by the current
process.  The permitted set of a process is masked against the
inheritable set during exec().  Nothing special happens during fork()
or clone().  Child processes and threads are given an exact copy of
the capabilities of the parent process.

3) What about other entities in the system? Users, Groups, Files?

Files have capabilities.  Conceptually they have the same three
bitmaps that processes have, but to avoid confusion we call them by
other names.  Only executable files have capabilities; libraries don't
have capabilities (yet).  The three sets are called the allowed set,
the forced set, and the effective set.

The allowed set indicates what capabilities the executable is allowed
to receive from an execing process.  This means that during exec(),
the capabilities of the old process are first masked against a set
which indicates what the process gives away (the inheritable set of
the process), and then they are masked against a set which indicates
what capabilities the new process image is allowed to receive (the
allowed set of the executable).

The forced set is a set of capabilities created out of thin air and
given to the process after execing the executable.  The forced set is
similar in nature to the setuid feature.  In fact, the setuid bit from
the filesystem is "read" as a full forced set by the kernel.

The effective set indicates which bits in the permitted set of the new
process should be transferred to the effective set of the new process.
The effective set is best thought of as a "capability aware" set.  It
should consist of only 1s if the executable is capability-dumb, or
only 0s if the executable is capability-smart.  Since the effective
set consists of only 0s or only 1s, the filesystem can implement this
set using a single bit.

NOTE: Filesystem support for capabilities is not part of Linux 2.2.

Users and Groups don't have associated capabilities from the kernel's
point of view, but it is entirely reasonable to associate users or
groups with capabilities.  By letting the "login" program set some
capabilities it is possible to make role users such as a backup user
that will have the CAP_DAC_READ_SEARCH capability and be able to do
backups.  This could also be implemented as a PAM module, but nobody
has implemented one yet.

4) What capabilities exist?

The capabilities available in Linux are listed and documented in the
file /usr/src/linux/include/linux/capability.h.

5) Are Linux capabilities hierarchical?

No, you cannot make a "subcapability" out of a Linux capability as in
capability-based OSes.

6) How can I use capabilities to make sure Mr. Evil Luser (eluser)
can't exploit my "suid" programs?

This is the general outline of how this works, assuming that
filesystem capability support exists.  First, you have a PAM module
that sets the inheritable capabilities of the login shell of eluser.
Then for all "suid" programs on the system, you decide what
capabilities they need and set the _allowed_ set of the executable to
that set of capabilities.  The capability rule

   new permitted = forced | (allowed & inheritable)

means that you should be careful about setting forced capabilities on
executables.  In a few cases, this can be useful though.  For example
the login program needs to set the inheritable set of the new user and
therefore needs an almost full permitted set.  So if you want eluser
to be able to run login and log in as a different user, you will have
to set some forced bits on that executable.

7) What about passing capabilities between processes?

Currently this is done by the system call "capset" which can set the
capabilities of another process.  This requires the CAP_SETPCAP
capability which you really only want to grant a _few_ processes.
CAP_SETPCAP was originally intended as a workaround to be able to
implement filesystem support for capabilities using a daemon outside
the kernel.

There have been discussions about implementing socket-level capability
passing.  This means that you can pass a capability over a socket.  No
support for this exists in the official kernel yet.

8) I see securelevel has been removed from 2.2 and is superseded by
capabilities.  How do I emulate securelevel using capabilities?

The capset system call can remove a capability from _all_ processes on
the system in one atomic operation.  The setcap utility from the
libcap distribution will do this for you.  The utility requires the
CAP_SETPCAP privilege to do this.  The CAP_SETPCAP capability is not
enabled by default.

libcap is available from
ftp://ftp.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.2/

9) I noticed that the capability.h file lacks some capabilities that
are needed to fully emulate 2.0 securelevel.  Is there a patch for
this?

Actually yes - funny you should ask :-).  The problem with 2.0
securelevel is that it, for example, stops root from accessing block
devices.  At the same time it restricts the use of iopl.  These two
changes are fundamentally different.  Blocking access to block devices
means restricting something that usually isn't restricted.
Restricting the use of iopl, on the other hand, means restricting
(blocking) access to something that is already blocked.  Emulating the
parts of 2.0 securelevel that restrict things that are normally not
restricted means that the kernel has to have a set of capabilities
that are usually _on_ for a normal process (note that this breaks the
explanation that capabilities are a partitioning of the root
privileges).  There is an experimental patch at

ftp://ftp.guardian.no/pub/free/linux/capabilities/patch-cap-exp-1

which implements a set of capabilities with the "CAP_USER" prefix:

cap_user_sock  - allowed to use socket()
cap_user_dev   - allowed to open char/block devices
cap_user_fifo  - allowed to use pipes

These should be enough to emulate 2.0 securelevel (tell me if we need
something more).

10) It seems that, to make use of capabilities, I need a CAP_SETPCAP
capability that I don't have.  How do I enable this capability?

Change the definition of CAP_INIT_EFF_SET and CAP_INIT_INH_SET to the
following in include/linux/capability.h:

#define CAP_INIT_EFF_SET    { ~0 }
#define CAP_INIT_INH_SET    { ~0 }

This will start init with a full capability set and not with
CAP_SETPCAP removed.
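
To see what "with CAP_SETPCAP removed" means at the bit level, here is
a small sketch of the arithmetic.  It assumes a 32-bit capability set
and that CAP_SETPCAP is bit 8; check the number against your own
capability.h.  The helper full_set_without is a made-up name for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_SETPCAP 8   /* assumed bit number; verify in capability.h */

/* A full 32-bit capability set with one capability bit cleared. */
uint32_t full_set_without(int cap)
{
    assert(cap >= 0 && cap < 32);
    return ~(uint32_t)0 & ~((uint32_t)1 << cap);
}
```

Under those assumptions, full_set_without(CAP_SETPCAP) is 0xFFFFFEFF,
whereas the { ~0 } definitions above give init the full 0xFFFFFFFF.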

11) How do I start a process with a limited set of capabilities?

Get the libcap library and use the execcap utility.  The following
example starts the update daemon with only the CAP_SYS_ADMIN
capability.

execcap 'cap_sys_admin=eip' update

12) How do I start a process with a limited set of capabilities under
another uid?

Use the sucap utility which changes uid from root without losing any
capabilities.  Normally all capabilities are cleared when changing uid
from root.  The sucap utility requires the CAP_SETPCAP capability.
The following example starts update under uid updated and gid updated
with CAP_SYS_ADMIN raised in the Effective set.

sucap updated updated execcap 'cap_sys_admin=eip' update

[ Sucap is currently available from
ftp://ftp.guardian.no/pub/free/linux/capabilities/sucap.c. Put it in
the progs directory of libcap to compile.]

13) What are the "capability rules"?

The capability rules are the rules used to set the capabilities of the
new process image after an exec.  They work like this:

        pI' = pI
  (***) pP' = fP | (fI & pI)
        pE' = pP' & fE          [NB. fE is 0 or ~0]

  I=Inheritable, P=Permitted, E=Effective // p=process, f=file
  ' indicates post-exec().

Now to make sense of the equations think of fP as the Forced set of
the executable, and fI as the Allowed set of the executable.  Notice
how the Inheritable set isn't touched at all during exec().
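
The three equations translate almost literally into C.  The sketch
below assumes each set fits in a 32-bit word; struct cap_sets and
caps_after_exec are illustrative names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

struct cap_sets {
    uint32_t inheritable;  /* I (for a file: the Allowed set, fI) */
    uint32_t permitted;    /* P (for a file: the Forced set, fP)  */
    uint32_t effective;    /* E (for a file: 0 or ~0, fE)         */
};

/* Compute the post-exec() process sets from the process sets `p`
 * and the sets `f` attached to the executable file. */
struct cap_sets caps_after_exec(struct cap_sets p, struct cap_sets f)
{
    struct cap_sets n;

    /* fE must be all zeros or all ones. */
    assert(f.effective == 0 || f.effective == ~(uint32_t)0);

    n.inheritable = p.inheritable;                          /* pI' = pI */
    n.permitted   = f.permitted
                  | (f.inheritable & p.inheritable);        /* (***)    */
    n.effective   = n.permitted & f.effective;              /* pE'      */
    return n;
}
```

For example, with pI = 0x6, fI = 0x3, fP = 0x8 and fE = ~0, the new
permitted set is 0x8 | (0x3 & 0x6) = 0xA, and the new effective set is
also 0xA.  With fE = 0 the permitted set is the same but the effective
set starts out empty.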

14) What are the laws for setting capability bits in the Inheritable,
Permitted, and Effective sets?

Bits can be transferred from Permitted to either the Effective or the
Inheritable set.

Bits can be removed from all sets.
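
One way to read these two laws is as a check on a requested change
from an old triple of sets to a new one.  The sketch below encodes
that reading; it is an interpretation of the text above, not the
kernel's actual check, which may enforce further constraints.  The
names struct cap_state and caps_change_allowed are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cap_state {
    uint32_t inheritable;
    uint32_t permitted;
    uint32_t effective;
};

/* Returns 1 if going from `old` to `new_` only removes bits or moves
 * bits out of the old Permitted set, and 0 otherwise. */
int caps_change_allowed(const struct cap_state *old,
                        const struct cap_state *new_)
{
    assert(old != NULL && new_ != NULL);

    /* Permitted can only lose bits. */
    if (new_->permitted & ~old->permitted)
        return 0;

    /* New Effective bits must come from the old Permitted set. */
    if (new_->effective & ~(old->effective | old->permitted))
        return 0;

    /* New Inheritable bits must come from the old Permitted set. */
    if (new_->inheritable & ~(old->inheritable | old->permitted))
        return 0;

    return 1;
}
```

Under this reading, a process can never gain a bit that is absent
from all three of its current sets, and clearing everything is always
allowed.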

15) Where is the standard on which the Linux capabilities are based?

There used to be a POSIX draft called POSIX.6 and later POSIX 1003.1e.
However, after the committee had spent over 10 years on it, POSIX
decided that enough is enough and dropped the draft.  There will
therefore not be a POSIX standard covering security anytime soon.
This may, however, lead to the POSIX draft becoming available for
free.

--
        Best regards, -- Boris.