author Shulhan <ms@kilabit.info> 2026-01-15 23:21:53 +0700
committer Shulhan <ms@kilabit.info> 2026-01-15 23:21:53 +0700
commit 65045c80372bd2781d8e950f7e64951c662a6f39
tree 6476082031f1496073ab3ca6f191aec1eb5f4c69
parent 095cee86d92b2e69419caaa5d83e7cbf92e38a83
README: update the wording and grammars
-rw-r--r-- README.md 309
1 files changed, 167 insertions, 142 deletions
diff --git a/README.md b/README.md
index f1f15e9..d2a2ba2 100644
--- a/README.md
+++ b/README.md
@@ -5,65 +5,87 @@ SPDX-FileCopyrightText: 2026 M. Shulhan <ms@kilabit.info>
# spdxconv
-spdxconv is a tool to convert existing license and copyright into
+spdxconv is a program to convert existing licenses and copyrights into
[SPDX](https://spdx.dev/)
-or insert the new identifiers.
+identifiers or insert new ones.
-This tool works in tandem with [REUSE software](https://reuse.software).
+This program works in tandem with [REUSE software](https://reuse.software).
-Features,
+Features:
-- Detect annotations from REUSE configuration (REUSE.toml)
-- Customizable values for default license identifier and copyright
-- Customizable pattern for setting comment syntax based on file name
-- Customizable pattern for searching and capturing existing license through
- regex
-- Customizable pattern for searching and capturing existing copyright year,
- author, and contact through regex
-- Derive the copyright year from the first commit in git history.
+- **REUSE Integration:** Detects annotations from `REUSE.toml`.
+- **Customizable Defaults:** Set default license identifiers and copyright
+ holders.
+- **Smart Comments:** Customizable patterns to set comment syntax based on
+ file names.
+- **Regex Extraction:** Capture existing licenses, years, authors, and
+ contact info using regex.
+- **Git Integration:** Automatically derives the copyright year from the
+ first commit in git history.
## Background
-Converting the license and copyright in the project to become compliant
-with the SPDX headers is very tedious works, especially if we have so many
-files with different year, copyright, and licenses.
+Converting the license and copyright in a project to become compliant with SPDX
+headers is very tedious work, especially if you have many files with
+different years, copyrights, and licenses.
-This program help to do that by using pattern matching, search, replace, and
-deletion.
+This program helps to do that by using pattern matching, search, replace,
+and deletion.
+
+## Prerequisites
+
+The following program is needed to build and install the tool:
+
+- [Go tools](https://go.dev/dl/) (latest version recommended)
+
+## Installation
+
+The following command will build and install the program into your `$GOBIN`
+directory:
+
+```bash
+$ go install git.sr.ht/~shulhan/spdxconv/cmd/spdxconv@latest
+```
+
+To check the value of `$GOBIN`, run:
+
+```
+$ go env GOBIN
+```
## Usage
-Converting to SPDX is trial-and-error tasks.
-This tool does not guarantee that the conversion will success in one cycle.
-So, to help with it, we provides three commands: `init`, `scan`, and
-`apply`.
+Converting to SPDX is a trial-and-error task.
+This program does not guarantee that the conversion will succeed in one
+cycle.
+To help with this, we provide three commands: `init`, `scan`, and `apply`.
-The init command create the "spdxconv.cfg" configuration in the current
+The `init` command creates the `spdxconv.cfg` configuration in the current
directory.
-The configuration file teach the tool how to scan and apply the license and
-copyright.
+This configuration file teaches the program how to scan and apply the
+license and copyright.
-The scan command list the files that need to be converted or inserted with
-SPDX identifiers into a file named "spdxconv.report".
-User then can inspect and modify the report to see and edit which files
-needs to proceed or not.
+The `scan` command lists the files that need to be converted or inserted
+with SPDX identifiers into a file named `spdxconv.report`.
+Users can then inspect and modify the report to see which files need to
+proceed.
-The apply command read the `spdxconv.report` and apply the license and
+The `apply` command reads `spdxconv.report` and applies the license and
copyright as stated.
-User then can repeat edit "spdxconv.cfg", "scan" and "apply" command
-multiple times, until they satisfied with the result.
+Users can repeat the cycle of editing `spdxconv.cfg` and running `scan` and
+`apply` multiple times until they are satisfied with the result.
-### init command
+### The `init` command
-The first thing to do is to generate the configuration file using
+The first thing to do is to generate the configuration file using:
```
$ spdxconv init
```
-This will create the `spdxconv.cfg` file in the current directory with the
-following content,
+This creates the `spdxconv.cfg` file in the current directory with the
+following content (subject to change in the future):
```
[default]
@@ -132,101 +154,100 @@ delete_line_after = "^(//+|#+|\\*+/|--+>|--+)$"
```
The configuration use the `ini` file format.
-You need to modify it by filling the "default" section before running the
-`scan` or `apply` command.
-You can add match-file-comment, match-license and match-copyright
-section as required, or modify the existing one to match with your use case.
+You must fill in the `[default]` section before running other commands.
+
+You can add `match-file-comment`, `match-license`, and `match-copyright`
+sections as required, or modify the existing ones to match your use case.
-For quick references here are several rules that you need to be aware of,
+For quick reference, here are several rules that you need to be aware of:
-- The regex value must be enclosed in double quote
-- Backslash '\\' character must be escaped. For example, regex for space
- "\\s" must be written as "\\\\s".
+- The regex value must be enclosed in double quotes.
+- The backslash '\\' character must be escaped. For example, a regex for
+ space "\\s" must be written as "\\\\s".
-The next subsection explain the content of configuration file and how it
-affect the program during scan and apply.
+The next subsection explains the content of the configuration file and how it
+affects the program during `scan` and `apply`.
-#### default section
+#### The `default` section
-This section define the default license identifier, year, and copyright text
-to be inserted into file if no `match-license` or `match-copyright` found in
-the file.
+This section defines the default license identifier, year, and copyright
+text to be inserted into a file if no `match-license` or `match-copyright`
+is found.
-The `license_identifier` set the default license using one of SPDX license
-identifier from [https://spdx.org/licenses/]() .
-For example, `GPL-3.0-only` for GNU General Public License v3.0 only.
+The `license_identifier` sets the default license using one of the SPDX license
+identifiers from [https://spdx.org/licenses/](https://spdx.org/licenses/).
+For example, `GPL-3.0-only`.
-The `copyright_year` set the default year to be used in
+The `copyright_year` sets the default year to be used in
`SPDX-FileCopyrightText`.
-The year can be a single year (for example "2026"), range of year (for
-example, "2000-2026"), or list of year with comma separated (for example,
-"2000,2001,2026"); as long as there is no space in between.
+The year can be a single year (for example "2026"), a range of years (for
+example, "2000-2026"), or a list of years separated by commas (for example,
+"2000,2001,2026"); as long as there are no spaces in between.
-The `file_copyright_text` set the default author and contact in
+The `file_copyright_text` sets the default author and contact in
`SPDX-FileCopyrightText`.
For example, "John Doe \<john.doe@example\>".
You should fill the `license_identifier`, `copyright_year`, and
`file_copyright_text` before continue running the program.
-The `max_line_match` define the number of lines to be searched at the
-top and bottom of file for `SPDX-*` identifiers, `match-license`, and
-`match-copyright` before the program insert the default values.
-The default values is 10.
+The `max_line_match` defines the number of lines to be searched at the
+top and bottom of the file for `SPDX-*` identifiers, the `match-license`
+pattern, and the `match-copyright` pattern before the program inserts the
+default values.
+The default value is 10.
-### match-file-comment section
+### The `match-file-comment` section
-The first thing that the program do is to detect which comment prefix and
-suffix to be used when inserting SPDX identifiers in the file.
+The first thing that the program does is detect which comment prefix and
+suffix to be used when inserting SPDX identifiers.
For each pattern in the "match-file-comment" section, the program will match
-it with file name to get the comment `prefix` and `suffix`.
+it against the file name to get the comment `prefix` and `suffix`.
-User can add their own "match-file-comment" section as they like or modify
-the existing one.
+Users can add their own "match-file-comment" sections as they like or modify
+the existing ones.
-The "match-file-comment" can have empty prefix and suffix.
-That means, if the file name match, it will create new file with ".license"
-suffix that contains SPDX identifiers only, instead of inserting to the file
-directly.
+The "match-file-comment" can have an empty prefix and suffix.
+That means if the file name matches, it will create a new file with a
+".license" suffix containing the SPDX identifiers, instead of inserting
+them into the file directly.
-If the file name does not match with one of the "match-file-pattern" then
+If the file name does not match one of the "match-file-comment" entries,
the file will be flagged as "unknown".
-#### match-license section
+#### The `match-license` section
<!-- REUSE-IgnoreStart -->
-After program detect the file comment syntax to use, then it will search for
-line that match with "SPDX-License-Identifier:".
+After the program detects the file comment syntax to use, it searches for a
+line that matches "SPDX-License-Identifier:".
<!-- REUSE-IgnoreEnd -->
-If there is a match, at the top or bottom, the scan will stop and continue
-for processing copyright.
+If there is a match at the top or bottom, the license search stops and the
+program continues with processing the copyright.
-If no match it will search for a line that match with "pattern"
+If there is no match, it will search for a line that matches the "pattern"
regular expression.
-If there is a line that match with it, the value in
+If a line matches, the value in
"match-license::license_identifier" will replace the
"default::license_identifier" value.
-If there is "delete_line_before" or "delete_line_after" defined, it will
+If "delete_line_before" or "delete_line_after" is defined, it will
search for the pattern before and after the matched line and delete it.
-The "delete_line_before" and "delete_line_after" can be defined zero or
-multiple times.
+These can be defined zero or multiple times.
-#### match-copyright section
+#### The `match-copyright` section
-The match-copyright section define the pattern to match with old copyright
-text.
-The regex must contains named group to capture copyright year, author, and
+The match-copyright section defines the pattern to match old copyright text.
+The regex must contain named groups to capture copyright year, author, and
contact.
-If no copyright year found on the file, program will derive the year from
-the date of the first commit in history of the file using the Source Code
-Management (SCM).
+If no copyright year is found in the file, the program will derive the year
+from the date of the first commit in the history of the file using the
+Source Code Management (SCM).
In git SCM, it will run "git log --follow file".
For example, given the following old copyright text,
@@ -241,34 +262,31 @@ we can capture the year, author, and contact using the following regex,
^//+\\s*Copyright\\s+(?<year>\\d{4}),?\\s+(?<author>.*)\\s+<(?<contact>.*)>.*$"
```
-The `match-copyright` section can also contains zero or more
+The `match-copyright` section can also contain zero or more
`delete_line_before` and `delete_line_after` patterns.
-The `delete_line_before` delete lines before matched line pattern, and
-`delete_line_after` contains regex to delete lines after matched line
-pattern.
-### scan command
+### The `scan` command
-The scan command scan the files that need to be converted or inserted with
-SPDX identifiers in the current directory.
-The result of scan is stored inside a report file named "spdxconv.report".
-There are no other files modified during and after scan completed.
+The scan command scans the files that need to be converted or inserted with
+SPDX identifiers in the current directory, recursively.
+The result is stored inside a report file named "spdxconv.report".
+No other files are modified during or after the scan.
-User then can inspect and modify the report to exclude certain files or
-changes the behaviour of apply command.
-Deleting a line in the report means excluding the file from being processed
-by "apply" command.
+Users can inspect and modify the report to exclude certain files or to
+change the behaviour of the `apply` command.
+Deleting a line in the report means excluding the file from being processed.
-The scan command work in the following way,
+The scan command works in the following way,
-(0) Skip the file if its ignored by git or already annotated in REUSE.toml
-configuration.
+(0) **Skip** the file if it is ignored by git or already annotated in the
+`REUSE.toml` configuration.
-(1) Check the file for SPDX-License-Identifier and SPDX-FileCopyrightText.
+(1) Check the file for `SPDX-License-Identifier` and
+`SPDX-FileCopyrightText`.
If both exist, skip the file.
-(2) If SPDX-License-Identifier line not exist, find the old license using
-the match-license sections.
+(2) If the `SPDX-License-Identifier` line does not exist, find the old license
+using the `match-license` sections.
For each match-license in the configuration,
@@ -278,8 +296,8 @@ into the report.
(2.2) If no match, use the default license from configuration, record it as
"default" with "0" as line number in the report.
-(3) If SPDX-FileCopyrightText line not exist, find the old copyright text
-using the match-copyright sections.
+(3) If the `SPDX-FileCopyrightText` line does not exist, find the old copyright
+text using the `match-copyright` sections.
For each match-copyright in the configuration,
@@ -294,7 +312,7 @@ configuration.
(3.2) If there is no match, use default copyright year and text from
configuration, and record it as "default" in the report.
-#### spdxconv.report file format
+#### The `spdxconv.report` file format
Each line in the report file is formatted using CSV and has several columns
separated by comma,
@@ -321,10 +339,9 @@ copyright_id = "default" | "exist" | "match"
idx_copyright_id = 1 * decimal_digit
```
-The `path` column define the path to the file that will be processed by
-`apply` command.
+The `path` column defines the path to the file.
-The `license_id` column define the license identifier to be used.
+The `license_id` column defines the license identifier to be used.
The value is either,
- default - insert new identifier and using the default license_identifier
@@ -334,7 +351,7 @@ The value is either,
- match - one of the pattern in match-license found in file at line number
set in `idx_license_id`.
-The `idx_license_id` define the line number in file where license_id is
+The `idx_license_id` defines the line number in file where license_id is
"exist" or "match".
Positive value means match found at the top, and negative value means match
found at the bottom.
@@ -364,12 +381,11 @@ found at the bottom.
The `comment_prefix` and `comment_suffix` contains the prefix and suffix
used as comment in the file.
-#### spdxconv.report file groups
+#### The `spdxconv.report` file groups
-Each file in the report file is collected into three groups: regular,
-binary, unknown, done files.
-Each group is separated by line prefixed with "//spdxconv:" and its
-identifier,
+Files are collected into four groups: **regular**, **binary**, **unknown**,
+and **done**.
+Each group is separated by a line prefixed with "//spdxconv:" in the report:
```
//spdxconv:regular
@@ -382,42 +398,51 @@ identifier,
...
```
-Regular group are list of file where program can detect its file comment to
-be used.
+**Regular group**: Files where the program can detect the comment syntax.
Program will insert the new SPDX identifiers into the file using the
comment syntax.
-Binary group are list of non-text file, for example images (like jpg, png)
+**Binary group**: Non-text files, for example images (like jpg, png)
or executable files.
-For binary file, program will create new file with the same name plus
-additional suffix ".license".
+The program will create a separate `.license` file.
Inside those "$name.license" file, the new SPDX identifiers will be inserted
as defined in the report.
-Unknown group are list of file where program cannot detect the file comment
-to be used.
-This files will not be processed, it is listed here so user can inspect,
-modify the configuration, and rerun the scan command for the next cycle.
+**Unknown group**: Files where the program cannot detect the comment syntax.
+These files will not be processed; they are listed so that users can inspect
+them, modify the configuration, and rerun the `scan` command in the next
+cycle.
-Done group are list of file that already has SPDX identifiers.
+**Done group**: Files that already have SPDX identifiers.
File in regular and binary group that has been applied will be moved here.
-### apply command
+### The `apply` command
+
+The `apply` command reads the `spdxconv.report` and applies the license and
+copyright to the files as stated.
-The apply command read the "spdxconv.report" and apply the license and
-copyright in the file as stated on each line in the report.
+Any failed operations will be logged to `stdout`.
-Any failed operation on file will be logged to stdout.
+Once a file from the regular or binary group is successfully processed, it will
+be moved to the **done** group.
-Once completed, it will write back the report file.
+## License
+
+This software is licensed under `GPL-3.0-only`.
+See the file `LICENSE` for full text.
## References
-[SPDX License List](https://spdx.org/licenses/).
-The SPDX License List includes a standardized short identifier, the full
-name, the license text, and a canonical permanent URL for each license and
-exception.
+- [SPDX License List](https://spdx.org/licenses/): Standardized short
+ identifiers for licenses and exceptions.
+
+- [REUSE FAQ](https://reuse.software/faq/): Common questions on licensing best
+ practices.
+
+## Links
+
+- [Project website](https://kilabit.info/project/spdxconv/).
+
+- [Changelog](https://kilabit.info/project/spdxconv/CHANGELOG.html).
-[REUSE FAQ](https://reuse.software/faq/).
-This page lists common questions and their answers when dealing with
-licensing and copyright, and with the adoption of REUSE specifically.
+- [Git repository](https://git.sr.ht/~shulhan/spdxconv/)
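
Note on the `match-copyright` example: the regex shown in the README hunk above can be tried out with a few lines of Go. This is a minimal sketch, not code from spdxconv. It assumes the pattern is evaluated with Go's `regexp` package; the helper `parseCopyright` and the sample line are hypothetical. The README writes the groups as `(?<year>...)` with doubled backslashes because of the configuration escaping rules; the sketch uses a raw string with single backslashes and the `(?P<name>...)` spelling, which every Go release accepts.

```go
package main

import (
	"fmt"
	"regexp"
)

// copyrightRE mirrors the README's match-copyright example, adapted for a
// Go raw string: single backslashes and (?P<name>...) named groups.
var copyrightRE = regexp.MustCompile(
	`^//+\s*Copyright\s+(?P<year>\d{4}),?\s+(?P<author>.*)\s+<(?P<contact>.*)>.*$`)

// parseCopyright is a hypothetical helper (not part of spdxconv) that
// returns the named captures from one line of an old copyright header.
func parseCopyright(line string) (year, author, contact string, ok bool) {
	m := copyrightRE.FindStringSubmatch(line)
	if m == nil {
		return "", "", "", false
	}
	return m[copyrightRE.SubexpIndex("year")],
		m[copyrightRE.SubexpIndex("author")],
		m[copyrightRE.SubexpIndex("contact")],
		true
}

func main() {
	// Sample line invented for this sketch.
	line := "// Copyright 2022, John Doe <john.doe@example.com>. All rights reserved."
	year, author, contact, ok := parseCopyright(line)
	fmt.Println(year, author, contact, ok)
}
```

Looking up groups by name with `SubexpIndex` keeps the helper independent of group order, so the pattern can be rearranged without touching the extraction code.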