A good start is to read the GNU coding standards and the Information for maintainers of GNU software.
GNU grep's mailing lists are hosted on lists.gnu.org.
To report bugs, suggest features, ask questions, or help in the development of GNU grep, please send email to the bug-grep mailing list. You can attach bug fixes and patches to your email. To save time, you may want to first look at GNU grep's bug report log to see whether the bug has already been reported. If you see, for example, that Bug#16292 is similar to the symptoms you observe, you can follow up to that bug report by sending email to <16292@debbugs.gnu.org>.
Before you contribute significant changes to GNU grep, the Free Software Foundation (FSF) requires that you sign copyright assignment papers. If you have not already done so and are not willing or able to, it may be better to just describe bugs or proposed features rather than post actual code (or documentation), since such contributions would have to be rewritten anyway.
The grep-commit read-only mailing list tracks all changes made to GNU grep.
Older GNU grep releases directed users to the bug-gnu-utils mailing list. As a consequence, some still post their bug reports and questions there. For this reason, it is a good idea for GNU grep developers to monitor this mailing list and follow up on related threads started there by redirecting them to the bug-grep mailing list. New threads about GNU grep should not be intentionally started there.
The Savannah project page for GNU grep features development-related tools.
See the Savannah web page about the Git repository for GNU grep's source code.
See the Savannah web page about the CVS repository for GNU grep's web pages.
Developers with write access to the repositories will need to create an account on Savannah and upload their SSH public identity information there.
A number of tasks must be performed before every release. See README-release.
Drop dfa.[ch] into a copy of gawk and run “make check”. This step will soon be obsolete: we're syncing the two dfa.c files.
See this list of grep implementations.
Take a look at these and consider opportunities for merging or cloning:
In general, interesting things to check in POSIX/OpenGroup include:
For this issue, interesting things to check in POSIX include:
In particular, consider the following with POSIX's approach to case folding in mind. Assume a non-Turkic locale with a character repertoire reduced to the following forms of “LATIN LETTER I”:
    0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
    0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
    0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
    0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
First note the differing UTF-8 octet lengths of U+0049 (0x49) and U+0069 (0x69) versus U+0130 (0xC4 0xB0) and U+0131 (0xC4 0xB1). This implies that whole UTF-8 strings cannot be case-converted in place, using the same memory buffer, and that the needed octet-size of the new buffer cannot merely be guessed.
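The size argument can be checked with a small C sketch. It is only an illustration: the helper names utf8_len() and utf8_size() are hypothetical, and lc() is hard-coded for the four-character repertoire above rather than taken from any locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of octets needed to encode code point cp in UTF-8,
   or 0 if cp is not a Unicode scalar value. */
size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogates */
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Assumed lower-case mapping, limited to the four-character repertoire. */
uint32_t lc(uint32_t cp)
{
    return (cp == 0x0049 || cp == 0x0130) ? 0x0069 : cp;
}

/* Octets needed for the n code points in s after applying conv
   (pass NULL for conv to measure the unconverted string). */
size_t utf8_size(const uint32_t *s, size_t n, uint32_t (*conv)(uint32_t))
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = utf8_len(conv ? conv(s[i]) : s[i]);
        assert(len != 0);   /* the sketch expects valid scalar values only */
        total += len;
    }
    return total;
}
```

For the two-character string U+0130 U+0130, utf8_size() reports 4 octets before conversion and 2 octets after lc(), so the output size has to be computed per character rather than assumed equal to the input size.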
We have
    lc(I) = i,  uc(I) = I
    lc(i) = i,  uc(i) = I
    lc(İ) = i,  uc(İ) = İ
    lc(ı) = ı,  uc(ı) = I
where lc() and uc() denote lower-case and upper-case conversions.
There are several candidate --ignore-case logics (including the one mandated by POSIX):
Using the
if (lc(input_wchar) == lc(pattern_wchar))
logic leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y Y n
    "i" |   Y Y Y n
    "İ" |   Y Y Y n
    "ı" |   n n n Y
There is a lack of symmetry between CAPITAL and SMALL LETTERs with this.
Using the
if (uc(input_wchar) == uc(pattern_wchar))
logic leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y n Y
    "i" |   Y Y n Y
    "İ" |   n n Y n
    "ı" |   Y Y n Y
There is a lack of symmetry between CAPITAL and SMALL LETTERs with this.
Using the
    if (   lc(input_wchar) == lc(pattern_wchar)
        || uc(input_wchar) == uc(pattern_wchar))
logic leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y Y Y
    "i" |   Y Y Y Y
    "İ" |   Y Y Y n
    "ı" |   Y Y n Y
There is some elegance and symmetry with this. But there are potentially two conversions to be made per input character. If the pattern is pre-converted, two copies of it need to be kept and used in a mutually coherent fashion.
Using the
    if (   input_wchar == pattern_wchar
        || lc(input_wchar) == pattern_wchar
        || uc(input_wchar) == pattern_wchar)
logic (as mandated by POSIX) leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y n Y
    "i" |   Y Y Y n
    "İ" |   n n Y n
    "ı" |   n n n Y
There is a different CAPITAL/SMALL symmetry with this. But there's also a loss of pattern/input symmetry that's unique to it. Also there are potentially two conversions to be made per input character.
Using the
if (lc(uc(input_wchar)) == lc(uc(pattern_wchar)))
logic leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y Y Y
    "i" |   Y Y Y Y
    "İ" |   Y Y Y Y
    "ı" |   Y Y Y Y
This shows total symmetry and transitivity (at least in this example analysis). There are two conversions to be made per input character, but support could be added for having a single straight mapping performing a composition of the two conversions.
Any optimization in the implementation of each logic must not change its basic semantics.
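The five candidate logics can be compared mechanically. The following C sketch is only an illustration under stated assumptions: lc() and uc() are hard-coded for the four-character repertoire above, and the names match_lc(), match_uc(), match_lc_or_uc(), match_posix(), match_lc_of_uc() and match_row() are hypothetical, not part of GNU grep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* The four-character repertoire used in the text. */
enum {
    CAP_I = 0x0049, SMALL_I = 0x0069,
    CAP_I_DOT = 0x0130, SMALL_DOTLESS_I = 0x0131
};

/* Assumed non-Turkic conversions, valid for this repertoire only. */
wchar_t lc(wchar_t c)
{
    return (c == CAP_I || c == CAP_I_DOT) ? SMALL_I : c;
}

wchar_t uc(wchar_t c)
{
    return (c == SMALL_I || c == SMALL_DOTLESS_I) ? CAP_I : c;
}

/* One predicate per candidate --ignore-case logic. */
bool match_lc(wchar_t in, wchar_t pat) { return lc(in) == lc(pat); }
bool match_uc(wchar_t in, wchar_t pat) { return uc(in) == uc(pat); }

bool match_lc_or_uc(wchar_t in, wchar_t pat)
{
    return lc(in) == lc(pat) || uc(in) == uc(pat);
}

/* The logic described in the text as mandated by POSIX. */
bool match_posix(wchar_t in, wchar_t pat)
{
    return in == pat || lc(in) == pat || uc(in) == pat;
}

bool match_lc_of_uc(wchar_t in, wchar_t pat)
{
    return lc(uc(in)) == lc(uc(pat));
}

typedef bool (*match_fn)(wchar_t in, wchar_t pat);

/* Fill row with 'Y' or 'n' for the inputs I, i, İ, ı (in that order)
   tested against pattern character pat. */
void match_row(match_fn logic, wchar_t pat, char row[5])
{
    static const wchar_t repertoire[4] = {
        CAP_I, SMALL_I, CAP_I_DOT, SMALL_DOTLESS_I
    };
    assert(logic != NULL);
    for (size_t i = 0; i < 4; i++)
        row[i] = logic(repertoire[i], pat) ? 'Y' : 'n';
    row[4] = '\0';
}
```

Calling match_row() once per pattern character regenerates one row of a table above. Swapping the two arguments of match_posix() for İ and i gives different answers, which is the loss of pattern/input symmetry noted for that logic.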
In general, interesting things to check in Unicode include:
For this issue, interesting things to check in Unicode include:
Unicode uses the
if (toCasefold(input_wchar_string) == toCasefold(pattern_wchar_string))
logic for caseless matching. Let's consider the “LATIN LETTER I” example mentioned above. In a non-Turkic locale, simple case folding yields
    toCasefold_simple(U+0049) = U+0069
    toCasefold_simple(U+0069) = U+0069
    toCasefold_simple(U+0130) = U+0130
    toCasefold_simple(U+0131) = U+0131
which leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y n n
    "i" |   Y Y n n
    "İ" |   n n Y n
    "ı" |   n n n Y
This is different from anything so far!
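A minimal C sketch of this string-level comparison, assuming the simple folding listed above for the four-character repertoire only; fold_simple() and caseless_equal() are hypothetical names, not Unicode or GNU grep APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Assumed simple case folding for the four-character repertoire:
   only U+0049 changes; U+0069, U+0130 and U+0131 fold to themselves. */
wchar_t fold_simple(wchar_t c)
{
    return c == 0x0049 ? 0x0069 : c;
}

/* toCasefold(a) == toCasefold(b), compared character by character
   on NUL-terminated wide strings. */
bool caseless_equal(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != L'\0' && *b != L'\0') {
        if (fold_simple(*a) != fold_simple(*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* both strings must end at the same position */
}
```

With this folding, I and i compare equal to each other, while İ and ı each compare equal only to themselves, as in the table above.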
In a non-Turkic locale, full case folding yields
    toCasefold_full(U+0049) = U+0069
    toCasefold_full(U+0069) = U+0069
    toCasefold_full(U+0130) = <U+0069, U+0307>
    toCasefold_full(U+0131) = U+0131
with
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
which leads to the following matches:
       \in  I i İ ı
    pat\   ----------
    "I" |   Y Y * n
    "i" |   Y Y * n
    "İ" |   n n Y n
    "ı" |   n n n Y
This is just sad!
Note that having toCasefold(U+0131), simple or full, map to itself instead of U+0069 is in contradiction with the rules of Section 5.18 of the Unicode Standard since toUpperCase(U+0131) is U+0049. Same thing for toCasefold_simple(U+0130) since toLowerCase(U+0130) is U+0069. The justification for the weird toCasefold_full(U+0130) mapping is unknown; it doesn't even make sense to add a dot (U+0307) to a letter that already has one (U+0069). It would have been so simple to put them all in the same equivalence class!
Also consider the following problem with Unicode's approach to case folding in mind. Assume that we want to perform
    echo 'AßBC' | grep -i 'Sb'
which corresponds to
    input:   U+0041 U+00DF U+0042 U+0043 U+000A
    pattern: U+0053 U+0062
Following “CaseFolding-4.1.0.txt”, applying the toCasefold() transformation to these yields
    input:   U+0061 U+0073 U+0073 U+0062 U+0063 U+000A
    pattern: U+0073 U+0062
so, according to this approach, the input should match the pattern. As long as the original input line is to be reported to the user as a whole, there is no problem (from the user's point-of-view; implementation is complicated by this).
However, consider both these GNU extensions:
    echo 'AßBC' | grep -i --only-matching 'Sb'
    echo 'AßBC' | grep -i --color=always 'Sb'
What is to be reported in these cases, since the match begins in the middle of the original input character 'ß'?
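The difficulty can be made concrete by folding the input while remembering where each folded code point came from. The C sketch below is an illustration only: fold_full() covers just ASCII capital letters and U+00DF, and struct folded, fold_string() and find_folded() are hypothetical names, not GNU grep internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_FOLDED = 64 };

struct folded {
    uint32_t cp[MAX_FOLDED];     /* folded code points */
    size_t   src[MAX_FOLDED];    /* index of the originating input character */
    bool     first[MAX_FOLDED];  /* true for the first code point of a folding */
    size_t   len;
};

/* Assumed full case folding, limited to ASCII capitals and U+00DF.
   Writes 1 or 2 code points to out and returns how many. */
size_t fold_full(uint32_t c, uint32_t out[2])
{
    if (c == 0x00DF) {           /* sharp s folds to "ss" */
        out[0] = 0x0073;
        out[1] = 0x0073;
        return 2;
    }
    out[0] = (c >= 0x41 && c <= 0x5A) ? c + 0x20 : c;
    return 1;
}

/* Fold the n code points in s into f, recording their origins. */
void fold_string(const uint32_t *s, size_t n, struct folded *f)
{
    f->len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t tmp[2];
        size_t k = fold_full(s[i], tmp);
        assert(f->len + k <= MAX_FOLDED);
        for (size_t j = 0; j < k; j++) {
            f->cp[f->len] = tmp[j];
            f->src[f->len] = i;
            f->first[f->len] = (j == 0);
            f->len++;
        }
    }
}

/* Offset of the first occurrence of pat in in (both folded),
   or (size_t)-1 if there is none. */
size_t find_folded(const struct folded *in, const struct folded *pat)
{
    for (size_t i = 0; i + pat->len <= in->len; i++) {
        size_t j = 0;
        while (j < pat->len && in->cp[i + j] == pat->cp[j])
            j++;
        if (j == pat->len)
            return i;
    }
    return (size_t)-1;
}
```

For the input and pattern above, the match is found at folded offset 2. That offset originates from input character 1 (the 'ß') and is not the first code point of its folding, so no boundary in the original line corresponds to the start of the match.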
Note that Unicode's toCasefold() cannot be implemented in terms of POSIX' towctrans() since that can only return a single wint_t value per input wint_t value.
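The limitation is visible in the function's type. The C sketch below uses the standard wctrans() and towctrans() calls; lower_one() and fold_full_sketch() are hypothetical wrappers, and using "tolower" as a stand-in for folding every character other than U+00DF is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* One wint_t in, one wint_t out: all that towctrans() can express. */
wint_t lower_one(wint_t wc)
{
    wctrans_t to_lower = wctrans("tolower");
    assert(to_lower != (wctrans_t)0);   /* "tolower" is a standard mapping name */
    return towctrans(wc, to_lower);
}

/* Hypothetical interface shape needed for full case folding:
   the caller supplies room for more than one result. */
size_t fold_full_sketch(wint_t wc, wint_t out[2])
{
    if (wc == 0x00DF) {                 /* sharp s folds to "ss" */
        out[0] = 0x0073;
        out[1] = 0x0073;
        return 2;
    }
    out[0] = lower_one(wc);             /* assumption: lower-casing approximates folding */
    return 1;
}
```

A mapping from U+00DF to two code points needs an interface like the second function, since the first can only return a single value.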
The purpose of this listing is to help GNU grep maintainers track down bug fixes and improvements made by distributors so they can be integrated back into the upstream releases from GNU, if appropriate.
Users should not use this listing to find a substitute address for their bug reports. These are still best sent upstream, to the GNU grep team, through the bug-grep@gnu.org mailing list or the GNU grep project page on Savannah.
This listing is not exhaustive; priority is given to listing distributors who actually maintain patches to the upstream package from GNU.
Please keep this listing sorted by entry. Each field type may appear more than once if appropriate, the field order being significant.
Web site | http://www.debian.org/ |
Package database entry | Old stable http://packages.debian.org/oldstable/base/grep |
Maintainer | Robert van der Meulen <rvdm at debian.org> |
Package database entry | Stable http://packages.debian.org/stable/base/grep |
Maintainer | Ryan M. Golbeck <rmgolbeck at debian.org> |
Maintainer | Jeff Bailey <jbailey at nisa.net> |
Package database entry | Testing http://packages.debian.org/testing/base/grep |
Package database entry | Unstable http://packages.debian.org/unstable/base/grep |
Maintainer | Anibal Monsalve Salazar <anibal at debian.org> |
Maintainer | Santiago Ruano Rincon <santiago at unicauca.edu.co> |
Bug tracking | http://bugs.debian.org/grep |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-11-08 |
Web site | http://fedora.redhat.com/ |
Web site | http://www.redhat.com/ |
Maintainer | Tim Waugh <twaugh at redhat.com> |
Bug tracking | Red Hat Bugzilla http://bugzilla.redhat.com/ |
Managed repository | cvs -d:pserver:anonymous@cvs.fedora.redhat.com:/cvs/dist co devel/grep |
Managed repository | http://cvs.fedora.redhat.com/viewcvs/devel/grep/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.freebsd.org/ |
Bug tracking | http://www.freebsd.org/cgi/query-pr-summary.cgi?query |
Managed repository | CVS_RSH=ssh cvs -d:ext:freebsdanoncvs@anoncvs.FreeBSD.org:/home/ncvs co src/gnu/usr.bin/grep |
Managed repository | http://www.freebsd.org/cgi/cvsweb.cgi/src/gnu/usr.bin/grep/ |
Entry updated | 2005-05-05 |
Web site | http://www.gentoo.org/ |
Package database entry | http://packages.gentoo.org/packages/?category=sys-apps;name=grep |
Bug tracking | Gentoo Bugzilla http://bugs.gentoo.org/ |
Managed repository | http://www.gentoo.org/cgi-bin/viewcvs.cgi/sys-apps/grep/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.mandrivalinux.com/ |
Bug tracking | Mandriva Bugzilla http://qa.mandriva.com/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.netbsd.org/ |
Package database entry | ftp://ftp.netbsd.org/pub/NetBSD/packages/pkgsrc/textproc/grep/README.html |
Bug tracking | http://www.netbsd.org/Misc/query-pr.html |
Managed repository | cvs -d:pserver:anoncvs@anoncvs.NetBSD.org:/cvsroot co pkgsrc/textproc/grep |
Managed repository | http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/textproc/grep/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.openbsd.org/ |
Package database entry | http://www.openbsd.org/3.8_packages/i386/ggrep-2.5.1p1.tgz-long.html |
Maintainer | Christian Weisgerber <naddy at openbsd.org> |
Bug tracking | http://www.openbsd.org/query-pr.html |
Managed repository | cvs -d:pserver:anoncvs@anoncvs1.ca.openbsd.org:/cvs co ports/sysutils/ggrep |
Managed repository | http://www.openbsd.org/cgi-bin/cvsweb/ports/sysutils/ggrep/ |
Source package name | ggrep |
Binary package name | ggrep |
Entry updated | 2005-11-08 |
Web site | http://www.openpkg.org/ |
Maintainer | Ralf S. Engelschall <rse at openpkg.org> |
Managed repository | cvs -d :pserver:anonymous@cvs.openpkg.org:/v/openpkg/cvs co openpkg-src/grep |
Managed repository | rsync -av rsync://rsync.openpkg.org/openpkg-cvs/openpkg-src/grep/ . |
Managed repository | http://cvs.openpkg.org/dir?d=openpkg-src/grep |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-06-19 |
Web site | http://www.novell.com/linux/suse/ |
Maintainer | Andreas Schwab <schwab at suse.de> |
Package database entry | Professional http://www.novell.com/products/linuxpackages/professional/grep.html |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-06-19 |
Copyright © 2005, 2015 Free Software Foundation, Inc.,
51 Franklin Street, Suite 330, Boston, MA 02110-1301, USA
Verbatim copying and distribution of this entire article
are permitted worldwide, without royalty, in any medium,
provided this notice and the copyright notice are preserved.
Updated: $Date: 2015/03/07 00:30:38 $ (UTC) by $Author: eggert $ (at savannah.gnu.org)