LateralT

How to use Make effectively for complex projects

26 Nov 2016 - 9:58
TL;DR: don't use "make -C dir" unless you're really sure there are no dependencies across directories.

You may have used Make or one of its derivatives for years, and you may have worked in large codebases shared by tens or hundreds of people. But I'm sorry to say that there is a very high probability that you've been using it wrong.

I know because I've seen it everywhere. Yes, in large projects too. In large projects that earned the company millions of dollars, with hundreds of programmers contributing to them.

So why am I right? Am I better than them? Is that what I'm saying? Well, it's not me who's saying they're wrong. It's been a well-known problem for decades, and Peter Miller documented it first, in his paper "Recursive Make Considered Harmful".

The problem starts when your project gets big. Then your code gets big. Then you have to split it into modules and the file organization of your codebase starts looking like a very convoluted tree. How do you manage the build process? The traditional convention says that "you use a Makefile for each module and a top-level Makefile runs make for every module recursively".

The problem with this, as Miller points out, is that by splitting the build into multiple make processes, you're no longer giving make all the information about the files and their dependencies. The information is spread across a number of Makefiles and none of them knows the complete dependency DAG. This is OK as long as there aren't any real dependencies between the components of different directories. But if there are any, then in most cases your project is only guaranteed to build correctly if you build it from scratch. That's why you've probably heard this exchange at some point during your life as a programmer:

- The build has failed.
- Did you change anything?
- Yes, this file. But it's like the change is not seen in this other compilation.
- Did you do a make clean before that?
- No.
- Do it.
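
For reference, the recursive convention described above usually boils down to something like the following sketch. It is not taken from any particular project; the SUBDIRS list and the hand-written ordering line are assumptions made for illustration.

```makefile
# Hypothetical recursive layout: one sub-make per module directory.
SUBDIRS := lib drivers main

.PHONY: all $(SUBDIRS)
all: $(SUBDIRS)

# Each sub-make only sees the rules written in its own directory.
$(SUBDIRS):
	$(MAKE) -C $@

# Cross-directory ordering has to be maintained by hand. The sub-make
# in main has no rule for anything under lib/, so it cannot tell
# whether lib's outputs are up to date; it just trusts this line.
main: lib drivers
```

The top level only knows "run lib and drivers before main". Which file in main depends on which file in lib is not recorded anywhere that a single make process can see.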

You see, Make is a good tool. I mean, it's a pain to use and almost impossible to debug, it's arcane and functionally very limited. BUT, in pure Unix fashion, it does one thing and it does it fast. The thing it does fast is scanning files, checking their stats and building a dependency graph. If you want it to do that properly, you have to feed it all the information about file dependencies, and it has to run the whole build in a single instance.

Here is an example of it for GNU Make. I learnt the trick from Emile van Bergen, so all the credit goes to him. I simply adapted it to my needs.

File organization

Let's assume a codebase organized like this:

.
|-- Makefile
|-- drivers
|   |-- drv1
|   |   |-- drv1.c
|   |   |-- drv1_1
|   |   |   |-- drv1_1.c
|   |   |   `-- module.mk
|   |   |-- include
|   |   |   `-- drv1.h
|   |   `-- module.mk
|   |-- drv2
|   |   |-- drv2.c
|   |   `-- module.mk
|   `-- module.mk
|-- include
|-- lib
|   |-- include
|   |   `-- lib.h
|   |-- module.mk
|   `-- std.c
|-- main
|   |-- main.c
|   |-- main.d
|   `-- module.mk
|-- platform
|   `-- osx
|       `-- make.conf
`-- xtra_module
    |-- module.mk
    |-- xtra.c
    |-- xtra1
    |   |-- include
    |   |   `-- xtra1.h
    |   |-- module.mk
    |   `-- xtra1.c
    `-- xtra2
        |-- include
        |   `-- xtra2.h
        |-- module.mk
        `-- xtra2.c

This covers a wide range of dependency types and module relations.

Some of these may not make sense in a real project, but this is meant as an example only. Note that each module can have its own include directory, so the API of every module is easy to locate. There's also a common include dir at the top. The platform directory contains platform-specific definitions so that multi-platform builds can be easily parameterized.

How to do it

Top level Makefile

In the top-level Makefile we have this:


#### Top-level Makefile

########## Base definitions ##########

#### Include platform-specific tool definitions
include platform/osx/make.conf

#### Common build recipes
COMPILE    = $(CC) $(CFLAGS) $(CFLAGS_TARGET) -o $@ -c $<
ASSEMBLE   = $(CC) $(ASFLAGS) $(CFLAGS_TARGET) -o $@ -c $<
LINK       = $(LD) $(LDFLAGS) $(LF_TARGET) -o $@ $^ $(LL_TARGET) $(LL_FLAGS)
LINK_SHLIB = $(CC) -shared -o $@ $^
ARCHIVE    = $(AR) cr $@ $^

#### Pattern-rule definition
%.o: %.c
	$(COMPILE)

%.o: %.S
	$(ASSEMBLE)

%: %.o
	$(LINK)

#### Additional definitions
COMMON_INCLUDES := include .
BASEDIR := $(shell pwd)


########## Targets and rules ##########

#### Main target
all: program

#### Buildable subdirectories in no particular order
dir := lib
include $(dir)/module.mk
SUB_LIBS    := $(SUB_LIBS) $(TARGET_LIB_lib)
SUB_OBJS    := $(SUB_OBJS) $(TARGET_OBJ_lib)
SUB_SHLIBS  := $(SUB_SHLIBS) $(TARGET_SHLIB_lib)

dir := drivers
include $(dir)/module.mk
SUB_LIBS    := $(SUB_LIBS) $(TARGET_LIB_drivers)
SUB_OBJS    := $(SUB_OBJS) $(TARGET_OBJ_drivers)
SUB_SHLIBS  := $(SUB_SHLIBS) $(TARGET_SHLIB_drivers)

dir := main
include $(dir)/module.mk
SUB_LIBS    := $(SUB_LIBS) $(TARGET_LIB_main)
SUB_OBJS    := $(SUB_OBJS) $(TARGET_OBJ_main)
SUB_SHLIBS  := $(SUB_SHLIBS) $(TARGET_SHLIB_main)

dir := xtra_module
include $(dir)/module.mk
SUB_LIBS    := $(SUB_LIBS) $(TARGET_LIB_xtra_module)
SUB_OBJS    := $(SUB_OBJS) $(TARGET_OBJ_xtra_module)
SUB_SHLIBS  := $(SUB_SHLIBS) $(TARGET_SHLIB_xtra_module)


#### Top-level rules: main target and clean target

program:: $(SUB_SHLIBS)

program:: $(SUB_OBJS) $(SUB_LIBS)
	$(LINK)

.PHONY: all clean
clean:
	rm -f $(CLEAN) program

# Avoid the deletion of object files
.SECONDARY: $(CLEAN)

This creates a set of common recipes and pattern-based rules to take care of all the compilations. After that, it loads all the submodule build files (more on this later) and defines the top-level rules and targets. The main target is program, which depends on a number of object files and libraries and is generated by linking them. Note that the rules for this target are split in two using the :: operator. One rule makes the target depend on a list of object files and a list of libs, and contains the recipe that produces the target. The other has no recipe; it's only there to pull a list of components (in this case, shared libraries) into the build, so that they get built without a change in them triggering a rebuild of the main target.
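
GNU Make also has a built-in way to say "build these first, but don't relink when they change": order-only prerequisites, listed after a | in the rule. As a sketch of an alternative to the two :: rules (this is not what the Makefile above uses), it could look like this:

```makefile
# Alternative sketch, not the author's rule: the shared libraries are
# order-only prerequisites. They get built if they are missing, but a
# newer timestamp on them does not make "program" out of date.
program: $(SUB_OBJS) $(SUB_LIBS) | $(SUB_SHLIBS)
	$(LINK)
```

Since $^ doesn't include order-only prerequisites, the LINK recipe still only receives the object files and static libraries.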

The compiler, assembler and linker to use, as well as their flags, are defined in a platform-specific file (platform/osx/make.conf in this case, for OS X).


#### Platform-specific definitions for OS X

PLATFORM = osx
ARCH     = x86_64

CC       = gcc
CPP      = cpp
LD       = ld
AS       = as
AR       = ar
OBJCOPY  = objcopy

#### Platform-specific include paths. By default, only the common include dir
INCLUDE = $(addprefix -I, $(COMMON_INCLUDES))

#### C compilation flags
OPT_FLAGS = -Os
#DBG_FLAGS = -g
CFLAGS = $(OPT_FLAGS) $(DBG_FLAGS) -MMD $(INCLUDE)

#### Assembler flags
ASFLAGS = $(DBG_FLAGS) -march=$(ARCH) -$(ARCH) -MMD

#### Linker flags
LDFLAGS = -demangle -dynamic -arch $(ARCH) -macosx_version_min 10.11.0 -lSystem\
/Library/Developer/CommandLineTools/usr/bin/../lib/clang/7.3.0/lib/darwin/libclang_rt.osx.a
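
The top-level Makefile shown earlier hard-codes the path platform/osx/make.conf. One possible way to parameterize it, sketched here under the assumption that every platform directory ships its own make.conf (only osx exists in the example tree, so "linux" below is hypothetical):

```makefile
# Hypothetical: pick the platform on the command line,
# e.g. "make PLATFORM=linux". Defaults to osx, the only
# platform directory present in the example tree.
PLATFORM ?= osx
include platform/$(PLATFORM)/make.conf
```

A variable given on the command line takes precedence over the PLATFORM assignment inside make.conf, so the two don't fight each other.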

Traversing modules

In order to include the build and dependency information from every module, we need some way to visit each one of them, process their module-level Makefiles and gather the appropriate info back in the top level. And, ideally, we'd like to be able to do this recursively while keeping the Makefiles as generic and location-agnostic as possible.

See how the submodule Makefiles (aptly named module.mk files) are visited from the top-level Makefile. For every module that dangles from the top directory, we set the module directory name in the dir variable, then include its module.mk file and, finally, we save the module results into the top-level variables SUB_LIBS, SUB_OBJS and SUB_SHLIBS, which are later used to define the dependencies of the main target.
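
The four near-identical blocks in the top-level Makefile could also be generated. The following is only a sketch using GNU Make's $(eval) and $(call); the names include_module and MODULES are made up here and are not part of the original build.

```makefile
# Hypothetical helper: visit one module and collect its results.
# $(1) is expanded by $(call); the $$ references are left for $(eval),
# so they are evaluated only after the module.mk has been read.
define include_module
dir := $(1)
include $(1)/module.mk
SUB_LIBS   := $$(SUB_LIBS) $$(TARGET_LIB_$(1))
SUB_OBJS   := $$(SUB_OBJS) $$(TARGET_OBJ_$(1))
SUB_SHLIBS := $$(SUB_SHLIBS) $$(TARGET_SHLIB_$(1))
endef

# Buildable subdirectories in no particular order
MODULES := lib drivers main xtra_module
$(foreach m,$(MODULES),$(eval $(call include_module,$(m))))
```

Whether this is clearer than the explicit version is a matter of taste; the explicit blocks are easier to read and to debug, while the helper makes adding a module a one-word change.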

Here's what the module.mk files do. Let's start with the one in the "lib" directory. This module contains a source file and will build a shared library with it.


# lib module Makefile

# Save the directory stack and set the current directory in the "d" variable
sp 		:= $(sp).x
dirstack_$(sp)	:= $(d)
d		:= $(dir)

########################################################################

#### Local variables

# Local target is a shared library
TARGET_SHLIB_$(d) := $(d)/lib.so

# Object files
OBJS_$(d) := $(d)/std.o

# Dep files
DEPS_$(d) := $(OBJS_$(d):%.o=%.d)

# Add the files to be removed with "make clean" to the global "CLEAN" variable
CLEAN := $(CLEAN) $(OBJS_$(d)) $(DEPS_$(d)) $(TARGET_SHLIB_$(d))


#### Local rules and targets

# Include the local headers directory into the local compilation
$(OBJS_$(d)): CFLAGS_TARGET := -fPIC -I$(d)/include

# Rule to build the local target (shared library built from object files)
$(TARGET_SHLIB_$(d)): $(OBJS_$(d))
	$(LINK_SHLIB)

# Include deps
-include $(DEPS_$(d))

########################################################################

# Restore the directory stack
d := $(dirstack_$(sp))
sp := $(basename $(sp))


Here is where van Bergen's trick comes into play. In each module we need to save some things into local variables, and we may need to do the same thing recursively over deeper submodules, so how do we keep track of the directory we came from? By using a stack. The stack is implemented as a string, where the push operation consists of appending ".x" to it, and popping is done by peeling the last ".x" off it, which can be done with the basename function.
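
To see the push and pop in isolation, here is a small standalone file that can be run with "make -f stack-demo.mk". The file name and the "top" and "child" directory names are placeholders, not part of the example project.

```makefile
# stack-demo.mk: standalone demo of the string-based stack.
d := top

# Push: grow the stack pointer and save the current directory under it
sp             := $(sp).x
dirstack_$(sp) := $(d)
d              := child
$(info pushed: sp=$(sp) d=$(d))
# prints "pushed: sp=.x d=child"

# Pop: restore the saved directory and shrink the stack pointer
d  := $(dirstack_$(sp))
sp := $(basename $(sp))
$(info popped: sp=$(sp) d=$(d))
# prints "popped: sp= d=top"

.PHONY: all
all: ;
```

Each nesting level gets its own dirstack_ variable (dirstack_.x, dirstack_.x.x and so on), which is why a module can include its children without losing track of where it was.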

The current directory name is always kept in the d variable, which is set at the beginning to the value of dir (set by the parent Makefile before loading each module.mk) and restored at the end. The local variables are named by using the module directory as a suffix so they don't clash with variables in other modules.

The most important local variables are the ones that start with TARGET_. I use three of them: TARGET_OBJ, TARGET_LIB and TARGET_SHLIB, to store object files, static libraries and shared libraries respectively. These should contain the result of building the module and will be collected by the parent Makefile to be used as part of a bigger target. In this case, the product is a shared library which is stored in the local TARGET_SHLIB variable. In the top-level Makefile all the module-local TARGET_SHLIB variables are collected into SUB_SHLIBS and used to define the main target dependencies.

We can create and use any other local variables we may need, but those won't be used outside the module. Note also that the module.mk adds components to the global CLEAN variable.

Every module.mk file defines how to build its target and intermediate components. In this example, the shared library is linked from an object file. The object file is built according to the pattern-based rule defined in the top-level Makefile, but an additional definition is used for this compilation only: the CFLAGS_TARGET variable is set to include file-specific flags, in this case a local include directory and the -fPIC flag to generate position-independent code.

It's worth mentioning that the stack mechanism may not really be necessary for all modules. In fact, it shouldn't be needed for this one, only for those that also have to traverse submodules recursively. But by wrapping every module.mk with the stack management code we make sure this module is location-independent and that it can be expanded in the future to contain submodules. The "API" we use to announce what the module target is and how the parent module can retrieve it also makes the build system loosely coupled across modules. Only the parent module needs to know about its child modules, but a child module doesn't need to know anything about its parent or siblings.
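
As an illustration of the recursive case, a parent module such as drivers could visit its children along these lines. This is a hypothetical sketch, not the author's drivers/module.mk; it assumes that drv1 and drv2 each export their results in a TARGET_OBJ variable.

```makefile
# drivers/module.mk (hypothetical sketch)

# Save the directory stack and set the current directory in "d"
sp             := $(sp).x
dirstack_$(sp) := $(d)
d              := $(dir)

# Visit each child. "dir" is what the child's module.mk reads on entry;
# the child restores "d" to this module's directory before returning.
dir := $(d)/drv1
include $(dir)/module.mk
TARGET_OBJ_$(d) := $(TARGET_OBJ_$(d)) $(TARGET_OBJ_$(d)/drv1)

dir := $(d)/drv2
include $(dir)/module.mk
TARGET_OBJ_$(d) := $(TARGET_OBJ_$(d)) $(TARGET_OBJ_$(d)/drv2)

# Restore the directory stack
d  := $(dirstack_$(sp))
sp := $(basename $(sp))
```

The parent re-exports what its children built under its own TARGET_OBJ variable, so the top-level Makefile only has to ask for TARGET_OBJ_drivers and never needs to know that drv1 or drv2 exist.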

More examples

Find the rest of the module examples here.
