Dependency Tracking Evolution (automake)

28.2 Dependency Tracking in Automake

Over the years Automake has deployed three different dependency tracking methods. Each method, including the current one, has had flaws of various sorts. Here we lay out the different dependency tracking methods, their flaws, and their fixes. We conclude with recommendations for tool writers, and by indicating future directions for dependency tracking work in Automake.

First Take
Dependencies As Side Effects
Dependencies for the User
Techniques for Computing Dependencies
Recommendations for Tool Writers
Future Directions for Automake’s Dependency Tracking

28.2.1 First Take

Description

Our first attempt at automatic dependency tracking was based on the method recommended by GNU make (see Generating Prerequisites Automatically in The GNU make Manual).

This version worked by precomputing dependencies ahead of time. For each source file, it had a special .P file that held the dependencies. There was a rule to generate a .P file by invoking the compiler appropriately. All such .P files were included by the Makefile, thus implicitly becoming dependencies of Makefile.
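The mechanics of this first scheme can be sketched in a few lines of Python. This is a minimal illustration, not Automake’s actual code. It assumes single-line rules like those `gcc -M` prints for a short prerequisite list, and the helper names are hypothetical. The first helper shows why the scheme was eager: GNU make tries to remake every included makefile before doing anything else. The second mirrors the rewrite that the GNU make manual’s recipe performs with sed, so that a .P file is regenerated whenever the object’s prerequisites change.

```python
import os


def include_lines(sources: list[str]) -> list[str]:
    """One include per source file. GNU make tries to remake every included
    makefile before building any goal, so every .P file is refreshed first."""
    return [f"include {os.path.splitext(src)[0]}.P" for src in sources]


def make_self_updating(rule: str, dep_file: str) -> str:
    """Turn 'maude.o: maude.c something.h' into
    'maude.o maude.P: maude.c something.h', so the dependency file is itself
    out of date whenever one of the object's prerequisites changes."""
    target, sep, prereqs = rule.partition(":")
    if not sep:
        raise ValueError(f"not a make rule: {rule!r}")
    return f"{target.strip()} {dep_file}:{prereqs}"
```

For example, `make_self_updating("maude.o: maude.c something.h", "maude.P")` yields `maude.o maude.P: maude.c something.h`.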

Bugs

This approach had several critical bugs.

The code to generate the .P file relied on gcc. (A limitation, not technically a bug.)
The dependency tracking mechanism itself relied on GNU make. (A limitation, not technically a bug.)
Because each .P file was a dependency of Makefile, make performed dependency tracking eagerly. For instance, ‘make clean’ would cause all the dependency files to be updated, and then immediately removed. This eagerness also caused problems with some configurations: if a certain source file could not be compiled on a given architecture for some reason, dependency tracking would fail, aborting the entire build.
As dependency tracking was done as a pre-pass, compile times were doubled: the compiler had to be run twice per source file.
‘make dist’ re-ran automake to generate a Makefile that did not have automatic dependency tracking (and that was thus portable to any version of make). To do this portably, Automake had to scan the dependency files and remove any reference to a source file not in the distribution. This process was error-prone. Also, if ‘make dist’ was run in an environment where some object file depended on a source file that was only conditionally created, Automake would generate a Makefile that referred to a file that might not appear in the end user’s build. A special, hacky mechanism was required to work around this.

Historical Note

The code generated by Automake is often inspired by the Makefile style of a particular author. In the case of the first implementation of dependency tracking, I believe the impetus and inspiration was Jim Meyering. (I could be mistaken. If you know otherwise, feel free to correct me.)

28.2.2 Dependencies As Side Effects

Description

The next refinement of Automake’s automatic dependency tracking scheme was to implement dependencies as side effects of the compilation. This was aimed at solving the most commonly reported problems with the first approach. In particular, we were most concerned with eliminating the weird rebuilding effect associated with ‘make clean’.

In this approach, the .P files were included using the -include directive, which silently ignores included files that do not exist and so let us create these files lazily. This avoided the ‘make clean’ problem.

We only computed dependencies when a file was actually compiled. This avoided the performance penalty associated with scanning each file twice. It also let us avoid the other problems associated with the first, eager, implementation. For instance, dependencies would never be generated for a source file that was not compilable on a given architecture (because it in fact would never be compiled).
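The reason lazy creation is safe can be shown with a toy Python model. It is a sketch under stated assumptions: an in-memory table of modification times stands in for the file system, nothing here runs make or a compiler, and the names `Tree`, `compile_object`, and `needs_rebuild` are hypothetical. The key observation is that an object with no dependency file has never been compiled, so it will be built anyway and the missing .P file loses no information.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Toy model of a build tree: no real files or make involved."""
    mtimes: dict[str, int]                                         # file -> modification time
    dep_files: dict[str, list[str]] = field(default_factory=dict)  # object -> recorded prerequisites


def compile_object(tree: Tree, obj: str, source: str, headers_read: list[str], now: int) -> None:
    """Pretend to compile; the dependency list is recorded as a side effect."""
    tree.mtimes[obj] = now
    tree.dep_files[obj] = [source, *headers_read]


def needs_rebuild(tree: Tree, obj: str, source: str) -> bool:
    if obj not in tree.mtimes:
        # Never compiled: it will be built anyway, so the missing dependency
        # file (silently skipped by -include) costs nothing.
        return True
    for prereq in tree.dep_files.get(obj, [source]):
        if prereq not in tree.mtimes:
            # Stands in for make's "No rule to make target" error.
            raise FileNotFoundError(prereq)
        if tree.mtimes[prereq] > tree.mtimes[obj]:
            return True
    return False
```

The `FileNotFoundError` branch anticipates the deleted header file problem: a recorded prerequisite that no longer exists stops the build.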

Bugs

This approach also relied on the existence of gcc and GNU make. (A limitation, not technically a bug.)
Dependency tracking was still done on the developer’s machine, so the problems from the first implementation relating to the massaging of dependencies by ‘make dist’ remained.
This implementation suffered from the “deleted header file” problem. Suppose a lazily-created .P file includes a dependency on a given header file, like this:
```
maude.o: maude.c something.h
```
Now suppose that the developer removes something.h and updates maude.c so that this include is no longer needed. If the developer then runs make, the build fails: the stale .P file still lists something.h as a prerequisite of maude.o, and there is no rule to create it.

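We fixed this problem in a later release by further massaging the output of gcc to include a dummy dependency (an empty rule such as ‘something.h:’) for each header file. If the header has been deleted, make then considers it to have been updated, so it recompiles maude.o and regenerates the .P file instead of failing.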
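The dummy-rule fix can be sketched as a small post-processing step in Python. This is an illustration, not the massaging code Automake actually shipped; the function name is hypothetical. It emits an empty rule for every prerequisite after the first (the source file itself), which is also what gcc’s later -MP option does.

```python
def add_dummy_header_rules(target: str, prereqs: list[str]) -> str:
    """Emit the dependency rule plus an empty rule for each prerequisite
    after the first (the source file itself).

    An empty rule like 'something.h:' lets make cope with a deleted header:
    a nonexistent target with no prerequisites and no recipe is treated as
    just updated, so the object is rebuilt instead of the build stopping.
    """
    lines = [f"{target}: {' '.join(prereqs)}"]
    lines += [f"{header}:" for header in prereqs[1:]]
    return "\n".join(lines) + "\n"
```

Applied to the example above, `add_dummy_header_rules("maude.o", ["maude.c", "something.h"])` returns the original rule followed by the line `something.h:`.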

28.2.3 Dependencies for the User

Description

Over time, the bugs associated with ‘make dist’ became a real problem. Packages using Automake were being built on a large number of platforms and were becoming increasingly complex. Broken dependencies were distributed in “portable” Makefile.ins, leading to user complaints. Also, the requirement for gcc and GNU make was a constant source of bug reports. The next implementation of dependency tracking aimed to remove these problems.

We realized that the only truly reliable way to automatically track dependencies was to do it when the package itself was built. This meant discovering a method portable to any version of make and any compiler. Also, we wanted to preserve what we saw as the best point of the second implementation: dependency computation as a side effect of compilation.

In the end we found that most modern make implementations support some form of include directive. We also wrote a wrapper script that lets us abstract away the differences between compilers’ dependency tracking methods. For instance, some compilers cannot generate dependencies as a side effect of compilation; for those, the script simply runs the compiler twice. Currently our wrapper script (depcomp) knows about twelve different compilers (including a “compiler” that simply invokes makedepend and then the real compiler, which is assumed to be a standard Unix-like C compiler with no way to do dependency tracking).
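The role of such a wrapper can be illustrated with a Python sketch that only plans the commands to run; it does not execute anything. This is not depcomp, which is a shell script with its own mode names and per-compiler post-processing. The mode names `side-effect`, `two-pass`, and `none` are hypothetical. The `side-effect` branch uses gcc’s real -MD and -MF options; the `two-pass` branch assumes a compiler that prints dependencies on standard output when given -M.

```python
from typing import Optional

# (argv, file that receives the command's standard output, or None)
Command = tuple[list[str], Optional[str]]


def plan_commands(cc: str, cflags: list[str], source: str, obj: str,
                  dep_file: str, depmode: str) -> list[Command]:
    compile_argv = [cc, *cflags, "-c", source, "-o", obj]
    if depmode == "side-effect":
        # gcc-like compiler: -MD emits dependencies while compiling,
        # and -MF names the file they are written to.
        return [([cc, *cflags, "-MD", "-MF", dep_file, "-c", source, "-o", obj], None)]
    if depmode == "two-pass":
        # Assumed: the compiler prints dependencies on stdout with -M.
        # One run computes dependencies, a second run compiles.
        return [([cc, *cflags, "-M", source], dep_file), (compile_argv, None)]
    if depmode == "none":
        # No dependency support at all: just compile.
        return [(compile_argv, None)]
    raise ValueError(f"unknown depmode: {depmode!r}")
```

Keeping the per-compiler differences behind one entry point is what lets the generated Makefile use a single compilation rule for every compiler.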

Bugs

Running a wrapper script for each compilation slows down the build.
Many users don’t really care about precise dependencies.
This implementation, like every other automatic dependency tracking scheme in common use today (indeed, every one we’ve ever heard of), suffers from the “duplicated new header” bug.
This bug occurs because dependency tracking tools, such as the compiler, only generate dependencies on the successful opening of a file, and not on every probe.

Suppose, for instance, that the compiler searches three directories for a given header, and that the header is found in the third directory. If the programmer erroneously adds a header file with the same name to the first directory, then a clean rebuild from scratch could fail (suppose the new header file is buggy), whereas an incremental rebuild will succeed: the recorded dependencies mention only the header in the third directory, so make sees nothing out of date.

What has happened here is that people misunderstand what a dependency is. Tool writers think a dependency encodes information about which files were read by the compiler. However, a dependency must actually encode information about what the compiler tried to do.

This problem is not serious in practice. Programmers typically do not use the same name for a header file twice in a given project. (At least, not in C or C++. This problem may be more troublesome in Java.) It is easy to fix by modifying dependency generators to record every probe instead of every successful open.
Since Automake generates dependencies as a side effect of compilation, there is a bootstrapping problem when header files are generated by running a program. The problem is that the first time the build is done, there is no way by default to know that the headers are required, so make might try to run a compilation for which the headers have not yet been built.
This was also a problem in the previous dependency tracking implementation.

The current fix is to use BUILT_SOURCES to list built headers (see Built sources). This causes them to be built before any other build rules are run. This is unsatisfactory as a general solution; however, in practice it seems sufficient for most actual programs.
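The duplicated new header bug, and the probe-recording fix, can be modeled in Python. This is a toy sketch: an in-memory set of paths stands in for the file system, the function names are hypothetical, and it does not address how make would consume probe-based dependencies. It compares what gets recorded when only the successful open counts with what gets recorded when every probe counts, then checks whether adding a header to the first search directory is noticed.

```python
import posixpath


def find_header(name: str, search_path: list[str], existing: set[str]) -> tuple[str, list[str]]:
    """Look for `name` in each directory in order, as a compiler does for
    #include; return the file found and every path probed on the way."""
    probes = []
    for directory in search_path:
        candidate = posixpath.join(directory, name)
        probes.append(candidate)
        if candidate in existing:
            return candidate, probes
    raise FileNotFoundError(name)


def deps_on_open(name: str, search_path: list[str], existing: set[str]) -> set[str]:
    """What current tools record: only the file that was successfully opened."""
    found, _ = find_header(name, search_path, existing)
    return {found}


def deps_on_probe(name: str, search_path: list[str], existing: set[str]) -> set[str]:
    """The suggested alternative: every path the compiler tried."""
    _, probes = find_header(name, search_path, existing)
    return set(probes)


def existence_changed(recorded: set[str], before: set[str], after: set[str]) -> bool:
    """True if any recorded path appeared or disappeared between two builds."""
    return any((path in before) != (path in after) for path in recorded)
```

With search path `a`, `b`, `c` and the header only in `c`, adding `a/foo.h` goes unnoticed when only the successful open was recorded, but is detected when every probe was recorded.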

This code has been used since Automake 1.5.

In GCC 3.0, we managed to convince the maintainers to add special command-line options to help Automake do its job more efficiently. We hoped this would let us avoid the use of a wrapper script when Automake’s automatic dependency tracking was used with gcc.

Unfortunately, this code doesn’t quite do what we want. In particular, it removes the dependency file if the compilation fails; we would prefer that it touch the file only if the compilation succeeds.

Nevertheless, since Automake 1.7, when a recent gcc is detected at configure time, we inline the dependency-generation code and do not use the depcomp wrapper script. This makes compilations faster for those using this compiler (probably our primary user base). The downside is that, because Makefile must encode two compilation rules (with and without depcomp), the generated Makefiles are larger.
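The behavior we would prefer from gcc can be approximated from outside the compiler. The trick is to have the compiler write its dependency output to a temporary file (gcc’s -MF option names that file) and to move it over the real dependency file only after a successful compile. Modern Automake rules follow a similar write-then-rename pattern with .Tpo and .Po files. The Python sketch below models only that policy. It assumes a caller-supplied compile function, and the name `compile_keeping_deps` and the `.tmp` suffix are hypothetical.

```python
import os
from typing import Callable


def compile_keeping_deps(run_compiler: Callable[[str], bool], dep_file: str) -> bool:
    """Compile, replacing dep_file only if compilation succeeds.

    run_compiler receives a temporary path to write dependencies into and
    returns True on success. On failure the previous dep_file is left intact.
    """
    tmp = dep_file + ".tmp"  # hypothetical naming for the temporary file
    ok = False
    try:
        ok = run_compiler(tmp)
        if ok:
            os.replace(tmp, dep_file)  # rename over the old dependency file
        return ok
    finally:
        # Discard partial output after a failure or an exception.
        if not ok and os.path.exists(tmp):
            os.remove(tmp)
```

A failed compilation therefore leaves the last good dependency information in place, so the next run of make still knows which headers the object depends on.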

28.2.4 Techniques for Computing Dependencies

There are actually several ways for a build tool like Automake to cause tools to generate dependencies.

makedepend: This was a commonly used method in the past. The idea is to run a special program over the source and have it generate dependency information. Traditional implementations of makedepend are not completely precise; ordinarily they are conservative and discover too many dependencies.
The tool: An obvious way to generate dependencies is to simply write the tool so that it can generate the information needed by the build tool. This is also the most portable method. Many compilers have an option to generate dependencies. Unfortunately, not all tools provide such an option.
The file system: It is possible to write a special file system that tracks opens, reads, writes, etc., and then feeds this information back to the build tool. clearmake does this. This is a very powerful technique, as it doesn’t require cooperation from the tool. Unfortunately, it is also very difficult to implement and not practical in the general case.
LD_PRELOAD: Rather than use the file system, one could write a special library to intercept open and other syscalls. This technique is also quite powerful, but unfortunately it is not portable enough for use in automake.

28.2.5 Recommendations for Tool Writers

We think that every compilation tool ought to be able to generate dependencies as a side effect of compilation. Furthermore, as long as make-based tools remain nearly universal (at least in the free software community), the tool itself should generate dummy dependencies for header files, to avoid the deleted header file bug. Finally, the tool should generate a dependency for each probe, instead of each successful file open, in order to avoid the duplicated new header bug.

28.2.6 Future Directions for Automake’s Dependency Tracking

Currently, only languages and compilers understood by Automake can have dependency tracking enabled. We would like to see if it is practical (and worthwhile) to let this support be extended by the user to languages unknown to Automake.