The Shepherd 1.0.0 released!

Ludovic Courtès — December 9, 2024

Finally, twenty-one years after its inception (twenty-one!), the Shepherd leaves ZeroVer territory to enter a glorious 1.0 era. This 1.0.0 release is published today because we think Shepherd has become a solid tool, meeting user experience standards one has come to expect since systemd changed the game of free init systems and service managers alike. It’s also a major milestone for Guix, which has been relying on the Shepherd from a time when doing so counted as dogfooding.

To celebrate this release, the amazing Luis Felipe López Acevedo designed a new logo, available under CC-BY-SA, and the project got a proper web site!

Logo of the Shepherd.

Let’s first look at what the Shepherd actually is and what it can do for you.

At a glance

The Shepherd is a minimalist but featureful service manager and as such, it herds services: it keeps track of services, their state and their dependencies, and it can start, stop, and restart them when needed. It’s a simple job; doing it right and providing users with insight and control over services is a different story.

The Shepherd consists of two commands: shepherd is the daemon that manages services, and herd is the command that lets you interact with it to inspect and control the status of services. The shepherd command can run as the first process (PID 1) and serve as the “init system”, as is the case on Guix System; or it can manage services for unprivileged users, as is the case with Guix Home. For example, running herd status ntpd as root allows me to know what the Network Time Protocol (NTP) daemon is up to:

$ sudo herd status ntpd
● Status of ntpd:
  It is running since Fri 06 Dec 2024 02:08:08 PM CET (2 days ago).
  Main PID: 11359
  Command: /gnu/store/s4ra0g0ym1q1wh5jrqs60092x1nrb8h9-ntp-4.2.8p18/bin/ntpd -n -c /gnu/store/7ac2i2c6dp2f9006llg3m5vkrna7pjbf-ntpd.conf -u ntpd -g
  It is enabled.
  Provides: ntpd
  Requires: user-processes networking
  Custom action: configuration
  Will be respawned.
  Log file: /var/log/ntpd.log

Recent messages (use '-n' to view more or less):
  2024-12-08 18:35:54  8 Dec 18:35:54 ntpd[11359]: Listen normally on 25 tun0 128.93.179.24:123
  2024-12-08 18:35:54  8 Dec 18:35:54 ntpd[11359]: Listen normally on 26 tun0 [fe80::e6b7:4575:77ef:eaf4%12]:123
  2024-12-08 18:35:54  8 Dec 18:35:54 ntpd[11359]: new interface(s) found: waking up resolver
  2024-12-08 18:46:38  8 Dec 18:46:38 ntpd[11359]: Deleting 25 tun0, [128.93.179.24]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs
  2024-12-08 18:46:38  8 Dec 18:46:38 ntpd[11359]: Deleting 26 tun0, [fe80::e6b7:4575:77ef:eaf4%12]:123, stats: received=0, sent=0, dropped=0, active_time=644 secs

It’s running, and it’s logging messages: the latest ones are shown here and I can open /var/log/ntpd.log to view more. Running herd stop ntpd would terminate the ntpd process, and there’s also a start and a restart action.

Services can also have custom actions; in the example above, we see there’s a configuration action. As it turns out, that action is a handy way to get the file name of the ntpd configuration file:

$ head -2 $(sudo herd configuration ntpd)
driftfile /var/run/ntpd/ntp.drift
pool 2.guix.pool.ntp.org iburst

Of course a typical system runs quite a few services, many of which depend on one another. The herd graph command returns a representation of that service dependency graph that can be piped to dot or xdot to visualize it; here’s what I get on my laptop:

Example of a service dependency graph.

It’s quite a big graph (you can zoom in for details!) but we can learn a few things from it. Each node in the graph is a service; rectangles are for “regular” services (typically daemons like ntpd), round nodes correspond to one-shot services (services that perform one action and immediately stop), and diamonds are for timed services (services that execute code periodically).

Blurring the user/developer line

A unique feature of the Shepherd is that you configure and extend it in its own implementation language: in Guile Scheme. That does not mean you need to be an expert in that programming language to get started. Instead, we try to make sure anyone can start simple for their configuration file and gradually get to learn more if and when they feel the need for it. With this approach, we keep the user in the loop, as Andy Wingo put it.

A Shepherd configuration file is a Scheme snippet that goes like this:

(register-services
  (list (service '(ntpd) )
        ))

(start-in-the-background '(ntpd ))

Here we define ntpd and get it started as soon as shepherd has read the configuration file. The ellipses can be filled in with more services.

As an example, our ntpd service is defined like this:

(service
  '(ntpd)
  #:documentation "Run the Network Time Protocol (NTP) daemon."
  #:requirement '(user-processes networking)
  #:start (make-forkexec-constructor
           (list "…/bin/ntpd"
                 "-n" "-c" "/…/…-ntpd.conf" "-u" "ntpd" "-g")
           #:log-file "/var/log/ntpd.log")
  #:stop (make-kill-destructor)
  #:respawn? #t)

The important parts here are #:start bit, which says how to start the service, and #:stop, which says how to stop it. In this case we’re just spawning the ntpd program but other startup mechanisms are supported by default: inetd, socket activation à la systemd, and timers. Check out the manual for examples and a reference.

There’s no limit to what #:start and #:stop can do. In Guix System you’ll find services that run daemons in containers, that mount/unmount file systems (as can be guessed from the graph above), that set up/tear down a static networking configuration, and a variety of other things. The Swineherd project goes as far as extending the Shepherd to turn it into a tool to manage system containers—similar to what the Docker daemon does.

Note that when writing service definitions for Guix System and Guix Home, you’re targeting a thin layer above the Shepherd programming interface. As is customary in Guix, this is multi-stage programming: G-expressions specified in the start and stop fields are staged and make it into the resulting Shepherd configuration file.

New since 0.10.x

For those of you who were already using the Shepherd, here are the highlights compared to the 0.10.x series:

  • Support for timed services has been added: these services spawn a command or run Scheme code periodically according to a predefined calendar.
  • herd status SERVICE now shows high-level information about services (main PID, command, addresses it is listening to, etc.) instead of its mere “running value”. It also shows recently-logged messages.
  • To make it easier to discover functionality, that command also displays custom actions applicable to the service, if any. It also lets you know if a replacement is pending, in which case you can restart the service to upgrade it.
  • herd status root is no longer synonymous with herd status; instead it shows information about the shepherd process itself.
  • On Linux, reboot --kexec lets you reboot straight into a new Linux kernel previously loaded with kexec --load.

The service collection has grown:

  • The new log rotation service is responsible for periodically rotating log files, compressing them, and eventually deleting them. It’s very much like similar log rotation tools from the 80’s since shepherd logs to plain text files like in the good ol’ days.

    There’s a couple of be benefits that come from its integration into the Shepherd. First, it already knows all the files that services log to, so no additional configuration is needed to teach it about these files. Second, log rotation is race free: no single line of log can be lost in the process.

  • The new system log service what’s traditionally devoted to a separate syslogd program. The advantage of having it in shepherd is that it can start logging earlier and integrates nicely with the rest of the system.

  • The timer service provides functionality similar to the venerable at command, allowing you to run a command at a particular time:

herd schedule timer at 07:00 -- mpg123 alarm.mp3
  • The transient service maker lets you run a command in the background as a transient service (it is similar in spirit to the systemd-run command):
herd spawn transient -d $PWD -- make -j4
  • The GOOPS interface that was deprecated in 0.10.x is now gone.

As always, the NEWS file has additional details.

In the coming weeks, we will most likely gradually move service definitions in Guix from mcron to timed services and similarly replace Rottlog and syslogd. This should be an improvement for Guix users and system administrators!

Cute code

I did mention that the Shepherd is minimalist, and it really is: 7.4K lines of Scheme, excluding tests, according to SLOCCount. This is in large part thanks to the use of a high-level memory-safe language and due to the fact that it’s extensible—peripheral features can live outside the Shepherd.

Significant benefits also come from the concurrency framework: the concurrent sequential processes (CSP) model and Fibers. Internally, the state of each service is encapsulated in a fiber. Accessing a service’s state amounts to sending a message to its fiber. This way to structure code is itself very much inspired by the actor model. This results in simpler code (no dreaded event loop, no callback hell) and better separation of concern.

Using a high-level framework like Fibers does come with its challenges. For example, we had the case of a memory leak in Fibers under certain conditions, and we certainly don’t want that in PID 1. But the challenge really lies in squashing those low-level bugs so that the foundation is solid. The Shepherd itself is free from such low-level issues; its logic is easy to reason about and that alone is immensely helpful, it allows us to extend the code without fear, and it avoids concurrency bugs that plague programs written in the more common event-loop-with-callbacks style.

In fact, thanks to all this, the Shepherd is probably the coolest init system to hack on. It even comes with a REPL for live hacking!

What’s next

There’s a number of down-to-earth improvements that can be made in the Shepherd, such as adding support for dynamically-reconfigurable services (being able to restart a service but with different options), integration with control groups (“cgroups”) on Linux, proper integration for software suspend, etc.

In the longer run, we envision an exciting journey towards a distributed and capability-style Shepherd. Spritely Goblins provides the foundation for this; using it looks like a natural continuation of the design work of the Shepherd: Goblins is an actor model framework! Juliana Sims has been working on adapting the Shepherd to Goblins and we’re eager to see what comes out of it in the coming year. Stay tuned!

Enjoy!

In the meantime, we hope you enjoy the Shepherd 1.0 as much as we enjoyed making it. Four people contributed code that led to this release, but there are other ways to help: through graphics and web design, translation, documentation, and more. Join us!