From 9762f1d6dee7da503d62cea41bbd49f1412280e7 Mon Sep 17 00:00:00 2001
From: Shulhan <ms@kilabit.info>
Date: Thu, 15 Jan 2026 21:57:43 +0700
Subject: journal/2026: new journal "Gitignore package for Go"

This journal explains how to implement gitignore patterns in Go.
---
 _content/journal/2026/go_gitignore/index.adoc | 240 ++++++++++++++++++++++++++
 _content/journal/2026/index.adoc              |   8 +
 _content/journal/index.adoc                   |   2 +
 3 files changed, 250 insertions(+)
 create mode 100644 _content/journal/2026/go_gitignore/index.adoc
 create mode 100644 _content/journal/2026/index.adoc
diff --git a/_content/journal/2026/go_gitignore/index.adoc b/_content/journal/2026/go_gitignore/index.adoc
new file mode 100644
index 0000000..950abe4
--- /dev/null
+++ b/_content/journal/2026/go_gitignore/index.adoc
@@ -0,0 +1,240 @@
+// SPDX-License-Identifier: CC-BY-SA-4.0
+// SPDX-FileCopyrightText: 2026 M. Shulhan <ms@kilabit.info>
+// REUSE-IgnoreStart
+= Gitignore package for Go
+:toc:
+:sectanchors:
+:lib_git_ref: https://git.sr.ht/~shulhan/pakakeh.go/tree/806359d5462fa8effde5b130da2071ed43d0da56/item/lib/git
+
+== Background
+
+I have several projects that do not use the SPDX license identifiers yet,
+and I want to add them, and if possible convert existing copyright and
+license headers.
+My initial thought was: "is there a tool to help convert the
+license to comply with SPDX?"
+
+I looked it up.
+
+There is one tool that closely matches my requirements,
+https://reuse.software/[reuse],
+which has options to set copyright, license identifier, and year.
+
+----
+$ reuse annotate \
+	--recursive \
+	--copyright "author <email>" \
+	--year $YEAR \
+	--license "license-name" \
+	.
+----
+
+An example of the result is shown below,
+
+----
+@@ -1,7 +1,8 @@
++// SPDX-FileCopyrightText: 2018 M. Shulhan <ms@kilabit.info>
+ // Copyright 2018 Shulhan <ms@kilabit.info>. All rights reserved.
+-// Use of this source code is governed by a BSD-style
+-// license that can be found in the LICENSE file.
++//
++// SPDX-License-Identifier: BSD-3-Clause
+----
+
+It does not remove the old copyright header; I think this is by design.
+I can run `sed` on those files to remove the lines that start with
+"// Copyright".
+However, there are other problems.
+If the files have different copyright years, I need to run another `sed`
+command to correct the years.
+And what if the file does not have a copyright year?
+We need to figure out, from the git history, the year it was created,
+
+----
+$ git log --follow --format=%ad --date=format:%Y $FILE | tail -1
+----
+
+Some big projects, like the
+https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/[Linux kernel],
+only set the SPDX license identifier without changing the copyright.
+We can do that.
+Or, we can write another tool that helps convert the license headers.
+
+Since this is a long holiday, let's take the hard way: writing a tool to
+convert the old headers to SPDX format.
+This should be simple, right?
+
+For each file in the directory:
+
+(1) If there is a line prefixed with "// SPDX", skip the file and continue
+    to the next one. +
+(2) If there is a line prefixed with "// Copyright", capture the year,
+    author, and email using regex, and replace it with
+    "// SPDX-FileCopyrightText: ..." +
+(3) If there is a line that matches "^//.*BSD-style", replace it with
+    "// SPDX-License-Identifier: BSD-3-Clause" and remove the line
+    that starts with "// license ..." +
+(4) If there is no "// Copyright", get the year using the above "git log"
+    command and insert a new "// SPDX-FileCopyrightText: ..." using predefined values.
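+
+The four steps above can be sketched in Go roughly as follows.
+This is only an illustration, not the actual tool: the names `convertHeader`
+and `reCopyright` are made up for this sketch, and the year lookup from
+"git log" in step (4) is replaced by a `fallback` parameter.
+
+----
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strings"
+)
+
+// reCopyright captures the year, author, and email from a line such as
+// "// Copyright 2018 Shulhan <ms@kilabit.info>. All rights reserved.".
+var reCopyright = regexp.MustCompile(`^// Copyright (\d{4}),? ([^<]+) <([^>]+)>`)
+
+// convertHeader rewrites old header lines into SPDX form.
+// The fallback is the copyright text used when no "// Copyright" line
+// exists; a real tool would derive its year from the git history.
+func convertHeader(lines []string, fallback string) []string {
+	// Step 1: a file that already has an SPDX line is left untouched.
+	for _, line := range lines {
+		if strings.HasPrefix(line, "// SPDX") {
+			return lines
+		}
+	}
+	var out []string
+	hasCopyright := false
+	for _, line := range lines {
+		// Step 2: convert the copyright line.
+		if m := reCopyright.FindStringSubmatch(line); m != nil {
+			out = append(out, fmt.Sprintf(
+				"// SPDX-FileCopyrightText: %s %s <%s>", m[1], m[2], m[3]))
+			hasCopyright = true
+			continue
+		}
+		// Step 3: convert the BSD-style line and drop its continuation.
+		if strings.HasPrefix(line, "//") && strings.Contains(line, "BSD-style") {
+			out = append(out, "// SPDX-License-Identifier: BSD-3-Clause")
+			continue
+		}
+		if strings.HasPrefix(line, "// license ") {
+			continue
+		}
+		out = append(out, line)
+	}
+	// Step 4: no copyright line found, insert one from the fallback.
+	if !hasCopyright {
+		out = append([]string{"// SPDX-FileCopyrightText: " + fallback}, out...)
+	}
+	return out
+}
+
+func main() {
+	in := []string{
+		"// Copyright 2018 Shulhan <ms@kilabit.info>. All rights reserved.",
+		"// Use of this source code is governed by a BSD-style",
+		"// license that can be found in the LICENSE file.",
+		"",
+		"package main",
+	}
+	for _, line := range convertHeader(in, "2026 Author <author@example.com>") {
+		fmt.Println(line)
+	}
+}
+----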
+
+It turns out there is another problem.
+
+A file can be
+https://reuse.software/faq/#exclude-file[excluded from REUSE compliance]
+if it is ignored by git, using the ".gitignore" file.
+And that is why we wrote a parser and checker for gitignore in Go.
+
+
+== Specification
+
+We use the
+https://git-scm.com/docs/gitignore[gitignore(5)^]
+manual as specification for the implementation.
+
+In short, the rules are as follows:
+
+* Each line is a pattern that will be matched against a file name or path.
+* An empty line is ignored.
+* A line that starts with '#' is a comment, unless it is escaped with backslash
+  '\\'.
+* Spaces before and after the line are ignored, unless escaped with backslash
+  '\\'.
+* Character '/' is the directory separator.
+* Special character '?' in the pattern means match one character except '/'.
+* Special character '*' in the pattern means match zero or more characters
+  except '/'.
+* A pattern that ends with '/' only matches a directory with the same
+  name.
+
+When reading the above rules, my first thought was that this is similar to
+https://pkg.go.dev/path/filepath#Match[filepath.Match^].
+
+I was wrong.
+
+According to the example given in the manual, a pattern "foo/" matches
+"foo" or "a/foo"; but the result of `filepath.Match` is different,
+----
+fmt.Println(filepath.Match("foo/", "foo"))
+fmt.Println(filepath.Match("foo/", "a/foo"))
+// Output:
+// false <nil>
+// false <nil>
+----
+
+Even if we remove the trailing slash, leaving the pattern "foo", the output
+is still not as expected,
+
+----
+fmt.Println(filepath.Match("foo", "foo"))
+fmt.Println(filepath.Match("foo", "a/foo"))
+// Output:
+// true <nil>
+// false <nil>
+----
+
+Continuing with the rules, there are other special characters that are not
+in line with `filepath.Match`.
+
+* Special character '!' at the beginning of a pattern means negation.
+  A file or directory that is excluded by a previous pattern is included
+  again if it matches this pattern.
+
+* A pattern "\*\*/foo" means match any file or directory named "foo" with
+  zero or more directories before it.
+* A pattern "foo/\*\*" means match any file or directory inside directory
+  "foo" but not the directory named "foo" itself.
+* A pattern "foo/\*\*/bar" means match a file or directory named "bar" inside
+  directory "foo", with zero or more directories in between.
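+
+To illustrate the negation rule, here is a minimal sketch that assumes each
+pattern has already been turned into a regex.
+The `rule` type and the `isIgnored` function are hypothetical names used only
+for this illustration, they are not the API of the package.
+The rules are evaluated in order and the last one that matches decides the
+result.
+
+----
+package main
+
+import (
+	"fmt"
+	"regexp"
+)
+
+// rule is one compiled pattern; negate is true if the pattern began
+// with '!'.
+type rule struct {
+	re     *regexp.Regexp
+	negate bool
+}
+
+// isIgnored checks path against all rules in order.
+// A later matching rule overrides an earlier one.
+func isIgnored(rules []rule, path string) bool {
+	ignored := false
+	for _, r := range rules {
+		if r.re.MatchString(path) {
+			ignored = !r.negate
+		}
+	}
+	return ignored
+}
+
+func main() {
+	// Stands in for a .gitignore with two lines: "*.log" and "!keep.log".
+	rules := []rule{
+		{re: regexp.MustCompile(`^(.*/|/)?[^/]*\.log/?$`)},
+		{re: regexp.MustCompile(`^(.*/|/)?keep\.log/?$`), negate: true},
+	}
+	fmt.Println(isIgnored(rules, "debug.log"))  // true
+	fmt.Println(isIgnored(rules, "a/keep.log")) // false
+	fmt.Println(isIgnored(rules, "main.go"))    // false
+}
+----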
+
+
+== Implementation
+
+Based on the above specification, it seems that a simple `filepath.Match` or
+`path.Match` is not sufficient to handle the patterns.
+
+We need to convert those patterns into a regex that complies with the above
+rules:
+
+* If the pattern ends with '/', mark it as a directory, and remove the
+  trailing '/'.
+
+* Trim the "\*\*/" at the beginning of the pattern since it means anything
+  before.
+  Pattern "\*\*/foo" or "\*\*/\*\*/foo" is equal to "foo".
+
+* Ignore the pattern if what remains is an empty string or only '\*'.
+
+* Now, we need to detect if the pattern contains the directory separator '/'.
+  Let's find the index and store it as `$SEP_IDX` for later.
+
+* Escape regex meta-characters '.', '+', '|', '(', and ')' with
+  backslash '\\'.
+
+* Replace single character '\*' with regex "[^/]\*" (accept zero or more
+  characters except "/").
+
+* Replace single character '?' with regex "[^/]" (accept one character
+  except "/").
+
+* Replace string "/\*\*/" with regex "(/.\*)?/" (accept zero or more
+  directories in between).
+
+* Replace string "/\*\*" with regex "/(.\*)" (accept everything inside a
+  directory).
+
+* Replace string "\*\*" with regex "[^/]\*" (second pass for '\*').
+
+* Back to `$SEP_IDX`,
+** If no directory separator is found, prepend the pattern with
+   regex "\^(.\*/|/)?" (accept zero or more directories before).
+** If the directory separator is at the beginning or middle of the pattern,
+   prepend the pattern with regex "^/?" (do not accept any directory before).
+
+* If the pattern is a directory (ends with '/'), as we marked before, append
+  back the '/' followed by '$'; otherwise append regex "/?$" (accept file or
+  directory).
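+
+The steps above can be sketched as a single pass over the pattern.
+This is a simplified illustration and not the code in the package: the
+function name `toRegex` is made up, the prefix for a pattern without a
+directory separator is taken from the example list that follows, and the
+sketch assumes that a pattern which began with a double-asterisk directory
+stays unanchored.
+Treat its output as illustrative rather than as what the package produces.
+
+----
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strings"
+)
+
+// toRegex converts one gitignore pattern into a regex string.
+// It returns ok=false if the pattern should be ignored.
+func toRegex(pattern string) (re string, ok bool) {
+	isDir := strings.HasSuffix(pattern, "/")
+	pattern = strings.TrimSuffix(pattern, "/")
+
+	// Assumption of this sketch: a pattern that began with "**/"
+	// is never anchored to the gitignore directory.
+	leadingGlob := strings.HasPrefix(pattern, "**/")
+	for strings.HasPrefix(pattern, "**/") {
+		pattern = pattern[3:]
+	}
+	if pattern == "" || pattern == "*" {
+		return "", false
+	}
+	anchored := !leadingGlob && strings.Contains(pattern, "/")
+
+	var sb strings.Builder
+	for i := 0; i < len(pattern); i++ {
+		rest := pattern[i:]
+		switch {
+		case strings.HasPrefix(rest, "/**/"):
+			sb.WriteString("(/.*)?/") // zero or more directories in between
+			i += 3
+		case strings.HasPrefix(rest, "/**"):
+			sb.WriteString("/(.*)") // everything inside a directory
+			i += 2
+		case strings.HasPrefix(rest, "**"):
+			sb.WriteString("[^/]*")
+			i++
+		case pattern[i] == '*':
+			sb.WriteString("[^/]*")
+		case pattern[i] == '?':
+			sb.WriteString("[^/]")
+		case strings.ContainsRune(".+|()", rune(pattern[i])):
+			sb.WriteByte('\\') // escape regex meta-character
+			sb.WriteByte(pattern[i])
+		default:
+			sb.WriteByte(pattern[i])
+		}
+	}
+
+	prefix, suffix := "^(.*/|/)?", "/?$"
+	if anchored {
+		prefix = "^/?"
+	}
+	if isDir {
+		suffix = "/$"
+	}
+	// The leading '/' of the pattern is already covered by the prefix.
+	return prefix + strings.TrimPrefix(sb.String(), "/") + suffix, true
+}
+
+func main() {
+	for _, p := range []string{"foo", "/foo/", "foo/**/bar", "*.log"} {
+		expr, ok := toRegex(p)
+		if !ok {
+			continue
+		}
+		regexp.MustCompile(expr) // panics if the result is not a valid regex
+		fmt.Printf("%-12s => %s\n", p, expr)
+	}
+}
+----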
+
+For example, here is a list of patterns and their conversions to regex,
+
+* foo or \*\*/foo   => \^(.\*/|/)?foo/?$
+* foo\*             => \^(.\*/|/)?foo[\^/]\*/?$
+* foo?              => \^(.\*/|/)?foo[\^/]/?$
+* foo/ or \*\*/foo/ => \^(.\*/|/)?foo/$
+* foo/\*\*          => \^(.\*/|/)?foo/(.*)/?$
+* /foo              => \^/?foo/?$
+* /foo/             => \^/?foo/$
+* foo/bar           => \^(.\*/|/)?foo/bar/?$
+* foo/bar/          => \^(.\*/|/)?foo/bar/$
+* /foo/bar          => \^/?foo/bar/?$
+* foo/\*\*/bar      => \^/?foo(/.*)?/bar/?$
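+
+A few of the regexes above can be checked directly with the standard `regexp`
+package.
+The helper `match` is a hypothetical name for this sketch, and it assumes
+that a directory path is passed with a trailing '/', which may differ from
+how the package tells files and directories apart.
+
+----
+package main
+
+import (
+	"fmt"
+	"regexp"
+)
+
+// match reports whether path matches the regex expr.
+func match(expr, path string) bool {
+	return regexp.MustCompile(expr).MatchString(path)
+}
+
+func main() {
+	const (
+		reAny  = `^(.*/|/)?foo/?$` // from pattern "foo"
+		reDir  = `^(.*/|/)?foo/$`  // from pattern "foo/"
+		reRoot = `^/?foo/?$`       // from pattern "/foo"
+	)
+	for _, path := range []string{"foo", "foo/", "a/foo", "a/foobar"} {
+		fmt.Printf("%q: foo=%t foo/=%t /foo=%t\n", path,
+			match(reAny, path), match(reDir, path), match(reRoot, path))
+	}
+}
+----
+
+Only "foo/" matches all three; "a/foo" matches the first regex only, and
+"a/foobar" matches none of them.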
+
+The result of the implementation can be viewed here:
+https://git.sr.ht/~shulhan/pakakeh.go/tree/main/item/lib/git/[lib/git^].
+
+
+The APIs are quite simple.
+First, load the ".gitignore" from a directory using
+{lib_git_ref}/gitignore.go#L37[`LoadGitignore()`^],
+and then check if a path is excluded using
+{lib_git_ref}/git.go#L246[`IsIgnored()`^].
+
+----
+func LoadGitignore(dir string) (ign *Gitignore)
+----
+
+LoadGitignore loads the gitignore file inside directory `dir`. Any invalid
+pattern will be ignored.
+
+----
+func (ign *Gitignore) IsIgnored(path string) bool
+----
+
+IsIgnored returns true if the `path` is ignored by this Gitignore content.
+The `path` is relative to the Gitignore directory.
+
+There is also a type
+{lib_git_ref}/ignore_pattern.go[`IgnorePattern`^]
+that one can import and use for other implementations, for example handling
+the `path` value in the REUSE.toml annotations table.
+
+// REUSE-IgnoreEnd
diff --git a/_content/journal/2026/index.adoc b/_content/journal/2026/index.adoc
new file mode 100644
index 0000000..3f0c5cd
--- /dev/null
+++ b/_content/journal/2026/index.adoc
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: CC-BY-SA-4.0
+// SPDX-FileCopyrightText: 2026 M. Shulhan <ms@kilabit.info>
+
+=== 2026
+
+link:/journal/2026/go_gitignore/[Gitignore package for Go].
+My thoughts when implementing a gitignore parser and checker for the Go
+programming language.
diff --git a/_content/journal/index.adoc b/_content/journal/index.adoc
index ec6df32..18c361d 100644
--- a/_content/journal/index.adoc
+++ b/_content/journal/index.adoc
@@ -3,6 +3,8 @@
 
 :toc:
 
+include::./2026/index.adoc[]
+
 include::./2025/index.adoc[]
 
 include::./2024/index.adoc[]
-- 
cgit v1.3