The Shepherd 1.0.0 released!
Finally, twenty-one years after its inception (twenty-one!), the Shepherd leaves ZeroVer territory to enter a glorious 1.0 era. This 1.0.0 release is published today because we think the Shepherd has become a solid tool, meeting the user experience standards one has come to expect since systemd changed the game of free init systems and service managers alike. It’s also a major milestone for Guix, which has been relying on the Shepherd from a time when doing so counted as dogfooding.
To celebrate this release, the amazing Luis Felipe López Acevedo designed a new logo, available under CC-BY-SA, and the project got a proper web site!
Let’s first look at what the Shepherd actually is and what it can do for you.
At a glance
The Shepherd is a minimalist but featureful service manager and as such, it herds services: it keeps track of services, their state and their dependencies, and it can start, stop, and restart them when needed. It’s a simple job; doing it right and providing users with insight and control over services is a different story.
The Shepherd consists of two commands: shepherd is the daemon that manages services, and herd is the command that lets you interact with it to inspect and control the status of services. The shepherd command can run as the first process (PID 1) and serve as the “init system”, as is the case on Guix System; or it can manage services for unprivileged users, as is the case with Guix Home.
For example, running herd status ntpd as root allows me to know what the Network Time Protocol (NTP) daemon is up to:
$ sudo herd status ntpd
● Status of ntpd:
It is running since Fri 06 Dec 2024 02:08:08 PM CET (2 days ago).
Main PID: 11359
Command: /gnu/store/s4ra0g0ym1q1wh5jrqs60092x1nrb8h9-ntp-4.2.8p18/bin/ntpd -n -c /gnu/store/7ac2i2c6dp2f9006llg3m5vkrna7pjbf-ntpd.conf -u ntpd -g
It is enabled.
Provides: ntpd
Requires: user-processes networking
Custom action: configuration
Will be respawned.
Log file: /var/log/ntpd.log
Recent messages (use '-n' to view more or less):
2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 25 tun0 128.93.179.24:123
2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: Listen normally on 26 tun0 [fe80::e6b7:4575:77ef:eaf4%12]:123
2024-12-08 18:35:54 8 Dec 18:35:54 ntpd[11359]: new interface(s) found: waking up resolver
2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 25 tun0, [128.93.179.24]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
2024-12-08 18:46:38 8 Dec 18:46:38 ntpd[11359]: Deleting 26 tun0, [fe80::e6b7:4575:77ef:eaf4%12]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
It’s running, and it’s logging messages: the latest ones are shown here and I can open /var/log/ntpd.log to view more. Running herd stop ntpd would terminate the ntpd process, and there’s also a start and a restart action.
Services can also have custom actions; in the example above, we see there’s a configuration action. As it turns out, that action is a handy way to get the file name of the ntpd configuration file:
$ head -2 $(sudo herd configuration ntpd)
driftfile /var/run/ntpd/ntp.drift
pool 2.guix.pool.ntp.org iburst
Of course a typical system runs quite a few services, many of which depend on one another. The herd graph command returns a representation of that service dependency graph that can be piped to dot or xdot to visualize it; here’s what I get on my laptop:
It’s quite a big graph (you can zoom in for details!) but we can learn a few things from it. Each node in the graph is a service; rectangles are for “regular” services (typically daemons like ntpd), round nodes correspond to one-shot services (services that perform one action and immediately stop), and diamonds are for timed services (services that execute code periodically).
Blurring the user/developer line
A unique feature of the Shepherd is that you configure and extend it in its own implementation language: in Guile Scheme. That does not mean you need to be an expert in that programming language to get started. Instead, we try to make sure anyone can start simple for their configuration file and gradually get to learn more if and when they feel the need for it. With this approach, we keep the user in the loop, as Andy Wingo put it.
A Shepherd configuration file is a Scheme snippet that goes like this:
(register-services
  (list (service '(ntpd) …)
        …))

(start-in-the-background '(ntpd …))
Here we define ntpd and get it started as soon as shepherd has read the configuration file. The ellipses can be filled in with more services.
As an example, our ntpd service is defined like this:
(service
  '(ntpd)
  #:documentation "Run the Network Time Protocol (NTP) daemon."
  #:requirement '(user-processes networking)
  #:start (make-forkexec-constructor
           (list "…/bin/ntpd"
                 "-n" "-c" "/…/…-ntpd.conf" "-u" "ntpd" "-g")
           #:log-file "/var/log/ntpd.log")
  #:stop (make-kill-destructor)
  #:respawn? #t)
The important parts here are the #:start bit, which says how to start the service, and #:stop, which says how to stop it. In this case we’re just spawning the ntpd program but other startup mechanisms are supported by default: inetd, socket activation à la systemd, and timers. Check out the manual for examples and a reference.
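To give an idea of what one of those other startup mechanisms looks like, here is a minimal sketch of an inetd-style service, where shepherd itself listens on a socket and spawns the command for each incoming connection. The service name, the program path, and the port are hypothetical and only serve as an illustration; the manual remains the reference for the exact options.

```scheme
;; Sketch of an inetd-style service.
;; ASSUMPTIONS: the 'echo-server' name, the program path, and port 8080
;; are made up for this example.
(define echo-server
  (service
    '(echo-server)
    #:documentation "Serve connections on TCP port 8080, inetd-style."
    #:start (make-inetd-constructor
             '("/usr/local/bin/echo-server")   ;hypothetical program
             ;; Listen on the loopback interface only.
             (list (endpoint
                    (make-socket-address AF_INET INADDR_LOOPBACK 8080)))
             #:max-connections 10)
    #:stop (make-inetd-destructor)))

(register-services (list echo-server))
```

Socket activation à la systemd follows the same shape, with make-systemd-constructor and make-systemd-destructor in place of the inetd procedures.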
There’s no limit to what #:start and #:stop can do. In Guix System you’ll find services that run daemons in containers, that mount/unmount file systems (as can be guessed from the graph above), that set up/tear down a static networking configuration, and a variety of other things. The Swineherd project goes as far as extending the Shepherd to turn it into a tool to manage system containers, similar to what the Docker daemon does.
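As a small illustration of that flexibility, the sketch below defines a service whose #:start and #:stop are plain Scheme procedures rather than process spawners, plus a custom action. The service name, the file name, and the action name are all hypothetical; this is not taken from Guix or from the Shepherd’s own service collection.

```scheme
;; ASSUMPTION: this file name is made up for the example.
(define marker-file "/tmp/shepherd-demo.marker")

(define marker
  (service
    '(marker)
    #:documentation "Keep a marker file around while the service runs."
    #:start (lambda ()
              (call-with-output-file marker-file
                (lambda (port)
                  (display "started\n" port)))
              ;; A true return value becomes the service's running value.
              marker-file)
    #:stop (lambda (file)
             (when (file-exists? file)
               (delete-file file))
             ;; Returning #f means the service is now stopped.
             #f)
    ;; Custom action, invoked as 'herd path marker'.
    #:actions (actions
               (path
                "Display the name of the marker file."
                (lambda (running)
                  (format #t "~a~%" running))))))

(register-services (list marker))
```

With this in the configuration file, herd start marker would create the file, herd path marker would print its name, and herd stop marker would delete it.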
Note that when writing service definitions for Guix System and Guix Home, you’re targeting a thin layer above the Shepherd programming interface. As is customary in Guix, this is multi-stage programming: G-expressions specified in the start and stop fields are staged and make it into the resulting Shepherd configuration file.
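For readers who have not seen that layer, here is a simplified sketch of what such a definition can look like on the Guix side. It is not the actual definition used by Guix for ntpd: the variable name is made up, and the command line is reduced to the bare minimum.

```scheme
(use-modules (gnu services shepherd)   ;the 'shepherd-service' record
             (gnu packages ntp)        ;the 'ntp' package
             (guix gexp))              ;G-expressions: #~ and #$

;; ASSUMPTION: 'ntpd-shepherd-service' is a name chosen for this example.
(define ntpd-shepherd-service
  (shepherd-service
    (provision '(ntpd))
    (documentation "Run the Network Time Protocol (NTP) daemon.")
    (requirement '(user-processes networking))
    ;; The 'start' and 'stop' G-expressions are staged: they end up as
    ;; plain Scheme code in the generated Shepherd configuration file,
    ;; with #$(file-append ...) replaced by a store file name.
    (start #~(make-forkexec-constructor
              (list #$(file-append ntp "/bin/ntpd") "-n")
              #:log-file "/var/log/ntpd.log"))
    (stop #~(make-kill-destructor))
    (respawn? #t)))
```

The fields mirror the keyword arguments of the Shepherd’s service procedure, which is why the layer can be described as thin.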
New since 0.10.x
For those of you who were already using the Shepherd, here are the highlights compared to the 0.10.x series:
- Support for timed services has been added: these services spawn a command or run Scheme code periodically according to a predefined calendar.
- herd status SERVICE now shows high-level information about services (main PID, command, addresses it is listening to, etc.) instead of its mere “running value”. It also shows recently-logged messages.
- To make it easier to discover functionality, that command also displays custom actions applicable to the service, if any. It also lets you know if a replacement is pending, in which case you can restart the service to upgrade it.
- herd status root is no longer synonymous with herd status; instead it shows information about the shepherd process itself.
- On Linux, reboot --kexec lets you reboot straight into a new Linux kernel previously loaded with kexec --load.
- The service collection has grown:
  - The new log rotation service is responsible for periodically rotating log files, compressing them, and eventually deleting them. It’s very much like similar log rotation tools from the ’80s, since shepherd logs to plain text files like in the good ol’ days. A couple of benefits come from its integration into the Shepherd. First, it already knows all the files that services log to, so no additional configuration is needed to teach it about these files. Second, log rotation is race-free: no single line of log can be lost in the process.
  - The new system log service does what’s traditionally devoted to a separate syslogd program. The advantage of having it in shepherd is that it can start logging earlier and integrates nicely with the rest of the system.
  - The timer service provides functionality similar to the venerable at command, allowing you to run a command at a particular time:
      herd schedule timer at 07:00 -- mpg123 alarm.mp3
  - The transient service maker lets you run a command in the background as a transient service (it is similar in spirit to the systemd-run command):
      herd spawn transient -d $PWD -- make -j4
- The GOOPS interface that was deprecated in 0.10.x is now gone.
As always, the NEWS file has additional details.
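Outside of Guix, services from the collection have to be registered explicitly in the configuration file before commands such as herd schedule timer or herd spawn transient can work. The fragment below is a minimal sketch of how that might look, using the default settings of each service; check the manual for the available options.

```scheme
(use-modules (shepherd service log-rotation)
             (shepherd service timer)
             (shepherd service transient))

(register-services
  (list (log-rotation-service)   ;periodically rotates service log files
        (timer-service)          ;backs 'herd schedule timer at ...'
        (transient-service)))    ;backs 'herd spawn transient ...'

;; Start them without blocking the rest of the configuration file.
(start-in-the-background '(log-rotation timer transient))
```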
In the coming weeks, we will most likely gradually move service definitions in Guix from mcron to timed services and similarly replace Rottlog and syslogd. This should be an improvement for Guix users and system administrators!
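To give a feel for what a job moved from mcron to a timed service might look like, here is a sketch modeled on the kind of example found in the manual. The program path is an assumption and would differ from one system to the next.

```scheme
(use-modules (shepherd service timer))

(define updatedb
  (service
    '(updatedb)
    #:documentation "Refresh the file name database twice a day."
    #:start (make-timer-constructor
             ;; Fire at midnight and at noon every day.
             (calendar-event #:hours '(0 12) #:minutes '(0))
             (command '("/usr/bin/updatedb")))   ;hypothetical path
    #:stop (make-timer-destructor)
    ;; Lets 'herd trigger updatedb' run the job right away.
    #:actions (list timer-trigger-action)))

(register-services (list updatedb))
(start-in-the-background '(updatedb))
```

Since this is a regular service, herd status updatedb applies to it just like it does to a daemon such as ntpd.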
Cute code
I did mention that the Shepherd is minimalist, and it really is: 7.4K lines of Scheme, excluding tests, according to SLOCCount. This is in large part thanks to the use of a high-level memory-safe language and due to the fact that it’s extensible—peripheral features can live outside the Shepherd.
Significant benefits also come from the concurrency framework: the communicating sequential processes (CSP) model and Fibers. Internally, the state of each service is encapsulated in a fiber. Accessing a service’s state amounts to sending a message to its fiber. This way of structuring code is itself very much inspired by the actor model. It results in simpler code (no dreaded event loop, no callback hell) and better separation of concerns.
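The following toy example illustrates the pattern, assuming Guile Fibers is installed. It is not the Shepherd’s actual code: the counter, its message names, and the spawn-counter procedure are invented for this sketch. A fiber owns a piece of state, and the only way to read or change that state is to send a message over its channel.

```scheme
(use-modules (fibers)
             (fibers channels)
             (ice-9 match))

(define (spawn-counter)
  "Spawn a fiber that owns a counter; return the channel used to talk to it."
  (let ((channel (make-channel)))
    (spawn-fiber
     (lambda ()
       ;; COUNT lives only in this loop: no other fiber can touch it.
       (let loop ((count 0))
         (match (get-message channel)
           ('increment
            (loop (+ count 1)))
           (('get reply)
            ;; Answer on the channel provided by the requester.
            (put-message reply count)
            (loop count))))))
    channel))

(run-fibers
 (lambda ()
   (let ((counter (spawn-counter))
         (reply (make-channel)))
     (put-message counter 'increment)
     (put-message counter 'increment)
     (put-message counter `(get ,reply))
     ;; The counter has processed two increments at this point.
     (format #t "count: ~a~%" (get-message reply)))))
```

Because channel operations are synchronous rendezvous, the requests are handled one at a time in the order they were sent, with no lock in sight.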
Using a high-level framework like Fibers does come with its challenges. For example, we had the case of a memory leak in Fibers under certain conditions, and we certainly don’t want that in PID 1. But the challenge really lies in squashing those low-level bugs so that the foundation is solid. The Shepherd itself is free from such low-level issues, and its logic is easy to reason about. That alone is immensely helpful: it allows us to extend the code without fear, and it avoids the concurrency bugs that plague programs written in the more common event-loop-with-callbacks style.
In fact, thanks to all this, the Shepherd is probably the coolest init system to hack on. It even comes with a REPL for live hacking!
What’s next
There’s a number of down-to-earth improvements that can be made in the Shepherd, such as adding support for dynamically-reconfigurable services (being able to restart a service but with different options), integration with control groups (“cgroups”) on Linux, proper integration for software suspend, etc.
In the longer run, we envision an exciting journey towards a distributed and capability-style Shepherd. Spritely Goblins provides the foundation for this; using it looks like a natural continuation of the design work of the Shepherd: Goblins is an actor model framework! Juliana Sims has been working on adapting the Shepherd to Goblins and we’re eager to see what comes out of it in the coming year. Stay tuned!
Enjoy!
In the meantime, we hope you enjoy the Shepherd 1.0 as much as we enjoyed making it. Four people contributed code that led to this release, but there are other ways to help: through graphics and web design, translation, documentation, and more. Join us!