Simple GCC projects - GNU Project

This page lists projects which are feasible for people who aren't intimately familiar with GCC's internals. Many of them are things which would be extremely helpful if they got done, but the core team never seems to get around to them. They're all busy wrestling with the problems that do require deep familiarity with the internals. We hope this will make it easier for more people to assist the GCC project, by giving new developers places to jump in.

Most of these projects require a reasonable amount of experience with C and the Unix programming environment. Do not despair if any individual task seems daunting; there's probably an easier one. If you have no programming skills, we can still use your help with documentation or our bug tracker.

We assume that you already know how to get the latest sources, configure and build the compiler, and run the test suite. You should also familiarize yourself with the requirements for contributions to GCC.

Many of these projects will require at least a reading knowledge of GCC's intermediate language, RTL. It may help to understand the higher-level tree structure as well. Unfortunately, for this we only have an incomplete, C/C++ specific manual.

Bug patrol

These projects all have to do with bugs in the compiler and with our testsuite, which is supposed to make sure no bugs come back.

General code cleanliness

These are projects which will generally make it easier to work with the source tree.

Port cleanliness

This involves mostly bringing back ends up to date with the current state of the art in the machine-independent code. Many ports date back to the 1980s and have not been actively maintained since then. There is also work to be done in cleaning up the places where the MI code uses machine-specific macros.

Configuration and Makefiles

This largely consists of the same sort of thing as the above, but for per-host configuration instead of per-target. You will need to understand autoconf, or Make, to do these projects.

Library infrastructure

These tasks are about improving the utility routine library used by GCC. If you like data structures, these may be for you.

User interface

Optimizer improvements

These require some knowledge of compiler internals and substantial programming skills, but not detailed knowledge of GCC internals. I think.

Make insn-recog.c use a byte-coded DFA.

Richard Henderson and I started this back in 1999 but never finished. I may still be able to find the code. It produces an order of magnitude size reduction in insn-recog.o, which is huge (432KB on i386).
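To make the idea concrete, here is a minimal sketch of what a byte-coded recognizer could look like: the decision logic lives in a compact byte array and a small loop interprets it, instead of being expanded into generated C code. Everything in it (the opcodes, the one-letter "RTL codes", the table layout, and the function name) is invented for illustration; it is not the unfinished 1999 code, nor what genrecog emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy byte-coded recognizer.  */
enum { OP_TEST, OP_ACCEPT, OP_FAIL };

/* OP_TEST is three bytes: opcode, expected code, jump target.  If the
   current input code matches, consume it and jump; otherwise fall
   through to the next instruction.  OP_ACCEPT is two bytes: opcode and
   the insn code to return.  The letters are made-up stand-ins for RTL
   codes: S = set, R = reg, P = plus, C = const, J = jump.  */
static const unsigned char toy_program[] = {
  /*  0 */ OP_TEST, 'S', 7,
  /*  3 */ OP_TEST, 'J', 22,
  /*  6 */ OP_FAIL,
  /*  7 */ OP_TEST, 'R', 11,
  /* 10 */ OP_FAIL,
  /* 11 */ OP_TEST, 'P', 18,
  /* 14 */ OP_TEST, 'C', 20,
  /* 17 */ OP_FAIL,
  /* 18 */ OP_ACCEPT, 1,        /* "SRP": toy add pattern */
  /* 20 */ OP_ACCEPT, 2,        /* "SRC": toy load-constant pattern */
  /* 22 */ OP_ACCEPT, 3         /* "J":   toy jump pattern */
};

/* Interpret PROG against the NUL-terminated string CODES.  Return the
   accepted insn code, or -1 if nothing matches.  */
int
toy_recog (const unsigned char *prog, size_t prog_len, const char *codes)
{
  size_t pc = 0;

  while (pc < prog_len)
    switch (prog[pc])
      {
      case OP_TEST:
        assert (pc + 2 < prog_len);
        if (*codes != '\0' && (unsigned char) *codes == prog[pc + 1])
          {
            codes++;
            pc = prog[pc + 2];
          }
        else
          pc += 3;
        break;

      case OP_ACCEPT:
        assert (pc + 1 < prog_len);
        /* Accept only if the whole input was consumed.  */
        return *codes == '\0' ? prog[pc + 1] : -1;

      default:                  /* OP_FAIL or a corrupt table */
        return -1;
      }
  return -1;
}
```

The size win in such a scheme would come from each decision costing a few bytes of table rather than a compiled compare and branch; whether that matches the reduction reported above depends on details this toy does not model.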

Make GCSE (and CSE?) capable of digging inside PARALLELs.

This is needed for GCSE to do any good at all on i386.

Here's some dialogue on the subject, which unfortunately may only confuse you.

Michael Meissner:

Actually I would imagine gcse handles clobbers [inside parallels] just fine and dandy, since it uses single_set which strips off the clobbers/uses if there is only one set. What it doesn't handle is a parallel that has two sets, which on the x86 is for setting the condition code register. This probably applies to more phases than just gcse (look for single_set). Another place a parallel with 2 sets is used is for machines that do both the divide and modulus in one step.

Richard Henderson:

Those don't get created until combine.
No, the real problem is that gcse doesn't handle hard registers, so the clobber of hard register 17 (flags) squelches everything.

Daniel Berlin:

The comment above hash_scan_insn claims it doesn't handle clobbers in parallels, yet the code appears to.
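For readers unfamiliar with single_set, the behavior Michael Meissner describes can be modeled roughly as follows. This is a toy with hypothetical types and names, not GCC's implementation (the real function handles more cases), and it does not address the hard register issue Richard Henderson raises.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes; not GCC's real definitions.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_USE };

struct toy_elt
{
  enum toy_code code;
  int regno;                    /* register set, clobbered, or used */
};

/* Return the only SET among the N elements of a toy PARALLEL,
   ignoring CLOBBERs and USEs.  Return NULL if there is no SET or
   more than one.  */
const struct toy_elt *
toy_single_set (const struct toy_elt *elts, size_t n)
{
  const struct toy_elt *found = NULL;
  size_t i;

  assert (elts != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (elts[i].code == TOY_SET)
      {
        if (found != NULL)
          return NULL;          /* second SET: give up */
        found = &elts[i];
      }
  return found;
}
```

In this model, an add that also clobbers the flags register yields its one SET, while a combined divide and modulus with two SETs yields NULL, so a pass that relies on this helper alone would skip it.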

Find all the places that simplify RTL and make them use simplify-rtx.c.

Here is some commentary from there:

Right now GCC has three (yes, three) major bodies of RTL simplification code that need to be unified.

1. fold_rtx in cse.c. This code uses various CSE specific information to aid in RTL simplification.

2. combine_simplify_rtx in combine.c. Similar to fold_rtx, except that it uses combine specific information to aid in RTL simplification.

3. The routines in this file.

Long term we want to only have one body of simplification code; to get to that state I recommend the following steps:

1. Pore over fold_rtx and simplify_rtx and move any simplifications which are not pass dependent state into these routines.

2. As code is moved by #1, change fold_rtx and simplify_rtx to use this routine whenever possible.

3. Allow for pass dependent state to be provided to these routines and add simplifications based on the pass dependent state. Remove code from cse.c and combine.c that becomes redundant/dead.

It will take time, but ultimately the compiler will be easier to maintain and improve. It's totally silly that when we add a simplification that it needs to be added to four places (three for RTL simplification and one for tree simplification).
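One way to picture the step that lets passes supply their own state is a shared simplifier that takes an optional callback. The sketch below is a toy under that assumption: the types, the hook signature, and every name in it are hypothetical and do not correspond to the interface of simplify-rtx.c.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_CONST, TOY_REG };

struct toy_rtx
{
  enum toy_kind kind;
  long value;                   /* constant value, or register number */
};

/* Hypothetical pass-dependent hook: return nonzero and store the
   constant in *OUT if the calling pass knows what REGNO holds.  */
typedef int (*toy_known_fn) (int regno, long *out, void *pass_state);

/* Example state for a CSE-like pass that knows one register's value.  */
struct toy_cse_state { int regno; long value; };

int
toy_cse_known (int regno, long *out, void *pass_state)
{
  const struct toy_cse_state *st = pass_state;

  if (st != NULL && st->regno == regno)
    {
      *out = st->value;
      return 1;
    }
  return 0;
}

/* Replace X by a constant if the pass-dependent hook knows its value.  */
static struct toy_rtx
toy_subst (struct toy_rtx x, toy_known_fn known, void *pass_state)
{
  long v;

  if (known != NULL && x.kind == TOY_REG
      && known ((int) x.value, &v, pass_state))
    {
      x.kind = TOY_CONST;
      x.value = v;
    }
  return x;
}

/* Try to simplify A + B.  Return 1 and store the result in *RES on
   success, 0 if nothing applies.  Overflow is ignored in this toy.  */
int
toy_simplify_plus (struct toy_rtx a, struct toy_rtx b,
                   toy_known_fn known, void *pass_state,
                   struct toy_rtx *res)
{
  /* Pass-dependent part: use whatever the caller knows.  */
  a = toy_subst (a, known, pass_state);
  b = toy_subst (b, known, pass_state);

  /* Pass-independent part, written once and shared by every caller.  */
  if (a.kind == TOY_CONST && b.kind == TOY_CONST)
    {
      res->kind = TOY_CONST;
      res->value = a.value + b.value;
      return 1;
    }
  if (b.kind == TOY_CONST && b.value == 0)
    {
      *res = a;
      return 1;
    }
  if (a.kind == TOY_CONST && a.value == 0)
    {
      *res = b;
      return 1;
    }
  return 0;
}
```

A caller with no extra knowledge passes NULL for the hook and gets only the pass-independent rules; a CSE-like caller passes its own hook and state and gets constant folding through registers it has tracked, without duplicating the folding rules.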

Convert reorg.c to use the flow graph.

Then we can throw away resource.c. Long term we want reorg folded into the scheduler, but that's much harder.

Improve dwarf2out.c.

DWARF2 can handle all kinds of heavy optimizations that we'd like to do, but our generator doesn't know how just yet. At the very least it'd be nice if -gdwarf-2 -fomit-frame-pointer could give you a clean backtrace on all targets where DWARF works. (This is definitely possible.)

You need to coordinate with the gdb team. It does no good for gcc to generate fancy debug info if the debugger doesn't understand it.

Implement clusters of branch tables as a method of handling case statements.

Currently gcc has three different methods for handling case/switch statements. If the labels form a dense cluster, a branch table is used. Otherwise, if it seems sensible, a set of bit test and branch instructions is used. Failing that, a set of compare and branch instructions is generated.

A useful optimization would be to detect the situation where there is more than one cluster of labels, then use compare and branch instructions to choose the correct cluster and a branch table to select the correct label within it.
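Written out by hand in C, the proposed code shape might look like the sketch below: a range check picks the cluster, then an indexed table lookup picks the target. The case values, the handler numbers, and all names are made up for illustration, and the function returns a handler number where compiled code would jump to a label.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DEFAULT 0           /* handler number for "default:" */

/* One dense cluster of case labels: values LOW .. LOW + LEN - 1 map
   through TABLE to a handler number.  */
struct toy_cluster
{
  long low;
  size_t len;
  const int *table;
};

/* Hypothetical switch with labels 1..4 and 1000..1003, except that
   1002 has no label and so goes to the default handler.  */
static const int toy_table_a[] = { 10, 11, 12, 13 };
static const int toy_table_b[] = { 20, 21, TOY_DEFAULT, 23 };

static const struct toy_cluster toy_clusters[] = {
  { 1, 4, toy_table_a },
  { 1000, 4, toy_table_b }
};

int
toy_dispatch (long x)
{
  size_t i;

  for (i = 0; i < sizeof toy_clusters / sizeof toy_clusters[0]; i++)
    {
      const struct toy_cluster *c = &toy_clusters[i];

      /* Compare and branch: is X inside this cluster's range?  */
      if (x >= c->low && (unsigned long) (x - c->low) < c->len)
        /* Branch table: one indexed load selects the target.  */
        return c->table[x - c->low];
    }
  return TOY_DEFAULT;
}
```

A single branch table covering 1 through 1003 would be almost entirely default entries, while two four-entry tables plus two range checks cover the same labels. With many clusters, the linear scan here could be replaced by a binary search over the cluster ranges.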

This optimization has been tried before, as may be seen by this email thread.

C/C++ front end

Copyright (C) Free Software Foundation, Inc. Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.

These pages are maintained by the GCC team. Last modified 2024-08-18.