| author | Shulhan <ms@kilabit.info> | 2026-01-15 23:21:53 +0700 |
|---|---|---|
| committer | Shulhan <ms@kilabit.info> | 2026-01-15 23:21:53 +0700 |
| commit | 65045c80372bd2781d8e950f7e64951c662a6f39 | |
| tree | 6476082031f1496073ab3ca6f191aec1eb5f4c69 | |
| parent | 095cee86d92b2e69419caaa5d83e7cbf92e38a83 | |
| download | spdxconv-65045c80372bd2781d8e950f7e64951c662a6f39.tar.xz | |
README: update the wording and grammars
| -rw-r--r-- | README.md | 309 |
1 files changed, 167 insertions, 142 deletions
@@ -5,65 +5,87 @@ SPDX-FileCopyrightText: 2026 M. Shulhan <ms@kilabit.info>

 # spdxconv

-spdxconv is a tool to convert existing license and copyright into
+spdxconv is a program to convert existing licenses and copyrights into
 [SPDX](https://spdx.dev/)
-or insert the new identifiers.
+identifiers or insert new ones.

-This tool works in tandem with [REUSE software](https://reuse.software).
+This program works in tandem with [REUSE software](https://reuse.software).

-Features,
+Features:

-- Detect annotations from REUSE configuration (REUSE.toml)
-- Customizable values for default license identifier and copyright
-- Customizable pattern for setting comment syntax based on file name
-- Customizable pattern for searching and capturing existing license through
-  regex
-- Customizable pattern for searching and capturing existing copyright year,
-  author, and contact through regex
-- Derive the copyright year from the first commit in git history.
+- **REUSE Integration:** Detects annotations from `REUSE.toml`.
+- **Customizable Defaults:** Set default license identifiers and copyright
+  holders.
+- **Smart Comments:** Customizable patterns to set comment syntax based on
+  file names.
+- **Regex Extraction:** Capture existing licenses, years, authors, and
+  contact info using regex.
+- **Git Integration:** Automatically derives the copyright year from the
+  first commit in git history.

 ## Background

-Converting the license and copyright in the project to become compliant
-with the SPDX headers is very tedious works, especially if we have so many
-files with different year, copyright, and licenses.
+Converting the license and copyright in a project to become compliant with SPDX
+headers is very tedious work, especially if you have many files with
+different years, copyrights, and licenses.

-This program help to do that by using pattern matching, search, replace, and
-deletion.
+This program helps to do that by using pattern matching, search, replace,
+and deletion.
+
+## Prerequisites
+
+The following program is needed to build and install the tool:
+
+- [Go tools](https://go.dev/dl/) (latest version recommended)
+
+## Installation
+
+The following command will build and install the program into your `$GOBIN`
+directory:
+
+```bash
+$ go install git.sr.ht/~shulhan/spdxconv/cmd/spdxconv@latest
+```
+
+To check the value of `$GOBIN`, run:
+
+```
+$ go env GOBIN
+```

 ## Usage

-Converting to SPDX is trial-and-error tasks.
-This tool does not guarantee that the conversion will success in one cycle.
-So, to help with it, we provides three commands: `init`, `scan`, and
-`apply`.
+Converting to SPDX is a trial-and-error task.
+This program does not guarantee that the conversion will succeed in one
+cycle.
+To help with this, we provide three commands: `init`, `scan`, and `apply`.

-The init command create the "spdxconv.cfg" configuration in the current
+The `init` command creates the `spdxconv.cfg` configuration in the current
 directory.
-The configuration file teach the tool how to scan and apply the license and
-copyright.
+This configuration file teaches the program how to scan and apply the
+license and copyright.

-The scan command list the files that need to be converted or inserted with
-SPDX identifiers into a file named "spdxconv.report".
-User then can inspect and modify the report to see and edit which files
-needs to proceed or not.
+The `scan` command lists the files that need to be converted or inserted
+with SPDX identifiers into a file named `spdxconv.report`.
+Users can then inspect and modify the report to see which files need to
+proceed.

-The apply command read the `spdxconv.report` and apply the license and
+The `apply` command reads `spdxconv.report` and applies the license and
 copyright as stated.

-User then can repeat edit "spdxconv.cfg", "scan" and "apply" command
-multiple times, until they satisfied with the result.
+Users can repeat the edit "spdxconv.cfg", `scan`, and `apply` commands
+multiple times until they are satisfied with the result.

-### init command
+### The `init` command

-The first thing to do is to generate the configuration file using
+The first thing to do is to generate the configuration file using:

 ```
 $ spdxconv init
 ```

-This will create the `spdxconv.cfg` file in the current directory with the
-following content,
+This create the `spdxconv.cfg` file in the current directory with the
+following content (subject to changes in the future),

 ```
 [default]
@@ -132,101 +154,100 @@ delete_line_after = "^(//+|#+|\\*+/|--+>|--+)$"
 ```

 The configuration use the `ini` file format.
-You need to modify it by filling the "default" section before running the
-`scan` or `apply` command.
-You can add match-file-comment, match-license and match-copyright
-section as required, or modify the existing one to match with your use case.
+You must fill in the `[default]` section before running other commands.
+
+You can add `match-file-comment`, `match-license` and `match-copyright`
+section as required, or modify the existing one to match your use case.

-For quick references here are several rules that you need to be aware of,
+For quick reference, here are several rules that you need to be aware of:

-- The regex value must be enclosed in double quote
-- Backslash '\\' character must be escaped. For example, regex for space
-  "\\s" must be written as "\\\\s".
+- The regex value must be enclosed in double quotes.
+- The backslash '\\' character must be escaped. For example, a regex for
+  space "\\s" must be written as "\\\\s".

-The next subsection explain the content of configuration file and how it
-affect the program during scan and apply.
+The next subsection explains the content of configuration file and how it
+affects the program during `scan` and `apply`.
-#### default section
+#### The `default` section

-This section define the default license identifier, year, and copyright text
-to be inserted into file if no `match-license` or `match-copyright` found in
-the file.
+This section defines the default license identifier, year, and copyright
+text to be inserted into a file if no `match-license` or `match-copyright`
+found.

-The `license_identifier` set the default license using one of SPDX license
-identifier from [https://spdx.org/licenses/]() .
-For example, `GPL-3.0-only` for GNU General Public License v3.0 only.
+The `license_identifier` sets the default license using one of SPDX license
+identifiers from [https://spdx.org/licenses/]() .
+For example, `GPL-3.0-only`.

-The `copyright_year` set the default year to be used in
+The `copyright_year` sets the default year to be used in
 `SPDX-FileCopyrightText`.
-The year can be a single year (for example "2026"), range of year (for
-example, "2000-2026"), or list of year with comma separated (for example,
-"2000,2001,2026"); as long as there is no space in between.
+The year can be a single year (for example "2026"), a range of years (for
+example, "2000-2026"), or list of years separated by comma (for example,
+"2000,2001,2026"); as long as there are no spaces in between.

-The `file_copyright_text` set the default author and contact in
+The `file_copyright_text` sets the default author and contact in
 `SPDX-FileCopyrightText`.
 For example, "John Doe \<john.doe@example\>".

 You should fill the `license_identifier`, `copyright_year`, and
 `file_copyright_text` before continue running the program.

-The `max_line_match` define the number of lines to be searched at the
-top and bottom of file for `SPDX-*` identifiers, `match-license`, and
-`match-copyright` before the program insert the default values.
-The default values is 10.
+The `max_line_match` defines the number of lines to be searched at the
+top and bottom of the file for `SPDX-*` identifiers, and `match-license`
+pattern, and `match-copyright` pattern; before the program insert the
+default values.
+The default value is 10.

-### match-file-comment section
+### The `match-file-comment` section

-The first thing that the program do is to detect which comment prefix and
-suffix to be used when inserting SPDX identifiers in the file.
+The first thing that the program does is detect which comment prefix and
+suffix to be used when inserting SPDX identifiers.
 For each pattern in the "match-file-comment" section, the program will match
-it with file name to get the comment `prefix` and `suffix`.
+it against the file name to get the comment `prefix` and `suffix`.

-User can add their own "match-file-comment" section as they like or modify
-the existing one.
+User can add their own "match-file-comment" sections as they like or modify
+the existing ones.

-The "match-file-comment" can have empty prefix and suffix.
-That means, if the file name match, it will create new file with ".license"
-suffix that contains SPDX identifiers only, instead of inserting to the file
-directly.
+The "match-file-comment" can have an empty prefix and suffix.
+That means if the file name matches, it will create new file with a
+".license" suffix containing the SPDX identifiers, instead of inserting
+them into the file directly.

-If the file name does not match with one of the "match-file-pattern" then
+If the file name does not match one of the "match-file-pattern" entries,
 the file will be flagged as "unknown".

-#### match-license section
+#### The `match-license` section

 <!-- REUSE-IgnoreStart -->
-After program detect the file comment syntax to use, then it will search for
-line that match with "SPDX-License-Identifier:".
<!-- REUSE-IgnoreEnd --> -If there is a match, at the top or bottom, the scan will stop and continue -for processing copyright. +If there is a match at the top or bottom, the scan will stop and continue +to processing copyright. -If no match it will search for a line that match with "pattern" +If there is no match, it will search for a line that match with "pattern" regular expression. -If there is a line that match with it, the value in +If a line matches, the value in "match-license::license_identifier" will replace the "default::license_identifier" value. -If there is "delete_line_before" or "delete_line_after" defined, it will +If "delete_line_before" or "delete_line_after" is defined, it will search for the pattern before and after the matched line and delete it. -The "delete_line_before" and "delete_line_after" can be defined zero or -multiple times. +These can be defined zero or multiple times. -#### match-copyright section +#### The `match-copyright` section -The match-copyright section define the pattern to match with old copyright -text. -The regex must contains named group to capture copyright year, author, and +The match-copyright section defines the pattern to match old copyright text. +The regex must contain named group to capture copyright year, author, and contact. -If no copyright year found on the file, program will derive the year from -the date of the first commit in history of the file using the Source Code -Management (SCM). +If no copyright year is found in the file, the program will derive the year +from the date of the first commit in the history of the file using the +Source Code Management (SCM). In git SCM, it will run "git log --follow file". 
 For example, given the following old copyright text,
@@ -241,34 +262,31 @@ we can capture the year, author, and contact using the following regex,
 ^//+\\s*Copyright\\s+(?<year>\\d{4}),?\\s+(?<author>.*)\\s+<(?<contact>.*)>.*$"
 ```

-The `match-copyright` section can also contains zero or more
+The `match-copyright` section can also contain zero or more
 `delete_line_before` and `delete_line_after` patterns.
-The `delete_line_before` delete lines before matched line pattern, and
-`delete_line_after` contains regex to delete lines after matched line
-pattern.

-### scan command
+### The `scan` command

-The scan command scan the files that need to be converted or inserted with
-SPDX identifiers in the current directory.
-The result of scan is stored inside a report file named "spdxconv.report".
-There are no other files modified during and after scan completed.
+The scan command scans the files that need to be converted or inserted with
+SPDX identifiers in the current directory, recursively.
+The result is stored inside a report file named "spdxconv.report".
+No other files are modified during and after the scan completed.

-User then can inspect and modify the report to exclude certain files or
-changes the behaviour of apply command.
-Deleting a line in the report means excluding the file from being processed
-by "apply" command.
+Users can inspect and modify the report to exclude certain files to
+changes the behaviour of `apply` command.
+Deleting a line in the report means excluding the file from being processed.

-The scan command work in the following way,
+The scan command works in the following way,

-(0) Skip the file if its ignored by git or already annotated in REUSE.toml
-configuration.
+(0) **Skip** the file if it is ignored by git or already annotated in the
+`REUSE.toml` configuration.

-(1) Check the file for SPDX-License-Identifier and SPDX-FileCopyrightText.
+(1) Check the file for `SPDX-License-Identifier` and
+`SPDX-FileCopyrightText`.
 If both exist, skip the file.

-(2) If SPDX-License-Identifier line not exist, find the old license using
-the match-license sections.
+(2) If SPDX-License-Identifier line does not exist, find the old license
+using the `match-license` sections.

 For each match-license in the configuration,

@@ -278,8 +296,8 @@ into the report.
 (2.2) If no match, use the default license from configuration, record it as
 "default" with "0" as line number in the report.

-(3) If SPDX-FileCopyrightText line not exist, find the old copyright text
-using the match-copyright sections.
+(3) If `SPDX-FileCopyrightText` line does not exist, find the old copyright
+text using the match-copyright sections.

 For each match-copyright in the configuration,

@@ -294,7 +312,7 @@ configuration.
 (3.2) If there is no match, use default copyright year and text from
 configuration, and record it as "default" in the report.

-#### spdxconv.report file format
+#### The `spdxconv.report` file format

 Each line in the report file is formatted using CSV and has several columns
 separated by comma,
@@ -321,10 +339,9 @@ copyright_id = "default" | "exist" | "match"
 idx_copyright_id = 1 * decimal_digit
 ```

-The `path` column define the path to the file that will be processed by
-`apply` command.
+The `path` column defines the path to the file.

-The `license_id` column define the license identifier to be used.
+The `license_id` column defines the license identifier to be used.
 The value is either,

 - default - insert new identifier and using the default license_identifier
@@ -334,7 +351,7 @@ The value is either,
 - match - one of the pattern in match-license found in file at line number
   set in `idx_license_id`.

-The `idx_license_id` define the line number in file where license_id is
+The `idx_license_id` defines the line number in file where license_id is
 "exist" or "match".
 Positive value means match found at the top, and negative value means match
 found at the bottom.
@@ -364,12 +381,11 @@ found at the bottom.
 The `comment_prefix` and `comment_suffix` contains the prefix and suffix used
 as comment in the file.

-#### spdxconv.report file groups
+#### The `spdxconv.report` file groups

-Each file in the report file is collected into three groups: regular,
-binary, unknown, done files.
-Each group is separated by line prefixed with "//spdxconv:" and its
-identifier,
+Files are collected into four groups: **regular**, **binary**, **unknown**,
+and **done**.
+Each group is separated by line prefixed with "//spdxconv:" in the report:

 ```
 //spdxconv:regular
@@ -382,42 +398,51 @@ identifier,
 ...
 ```

-Regular group are list of file where program can detect its file comment to
-be used.
+**Regular group**: Files where the program can detect the comment syntax.
 Program will insert the new SPDX identifiers into the file using the comment
 syntax.

-Binary group are list of non-text file, for example images (like jpg, png)
+**Binary group**: Non-text file, for example images (like jpg, png)
 or executable files.
-For binary file, program will create new file with the same name plus
-additional suffix ".license".
+The program will create a separate `.license` file.
 Inside those "$name.license" file, the new SPDX identifiers will be inserted
 as defined in the report.

-Unknown group are list of file where program cannot detect the file comment
-to be used.
-This files will not be processed, it is listed here so user can inspect,
-modify the configuration, and rerun the scan command for the next cycle.
+**Unknown group**: Files where the program cannot detect the comment syntax.
+These files will not be processed; they are listed so user can inspect,
+modify the configuration, and rerun the `scan` command again in the next
+cycle.

-Done group are list of file that already has SPDX identifiers.
+**Done group**: Files that already have SPDX identifiers.
 File in regular and binary group that has been applied will be moved here.
-### apply command
+### The `apply` command
+
+The `apply` command reads the `spdxconv.report` and applies the license and
+copyright to the files as stated.

-The apply command read the "spdxconv.report" and apply the license and
-copyright in the file as stated on each line in the report.
+Any failed operations will be logged to `stdout`.

-Any failed operation on file will be logged to stdout.
+Once a file from regular or binary group is successfully processed, it will
+be moved to the **done** group.

-Once completed, it will write back the report file.
+## License
+
+This software is licensed under `GPL-3.0-only`.
+See the file `LICENSE` for full text.

 ## References

-[SPDX License List](https://spdx.org/licenses/).
-The SPDX License List includes a standardized short identifier, the full
-name, the license text, and a canonical permanent URL for each license and
-exception.
+- [SPDX License List](https://spdx.org/licenses/): Standardized short
+  identifier.
+
+- [REUSE FAQ](https://reuse.software/faq/): Common questions on licensing best
+  practices.
+
+## Links
+
+- [Project website](https://kilabit.info/project/spdxconv/).
+
+- [Changelog](https://kilabit.info/project/spdxconv/CHANGELOG.html).

-[REUSE FAQ](https://reuse.software/faq/).
-This page lists common questions and their answers when dealing with
-licensing and copyright, and with the adoption of REUSE specifically.
+- [Git repository](https://git.sr.ht/~shulhan/spdxconv/)
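
The `match-copyright` text in this diff says the regex must use named groups to capture the year, author, and contact. The Go sketch below illustrates that idea only; it is not spdxconv's implementation. The helper name `convertCopyright` and the sample input line are hypothetical (the README's own sample text falls outside the hunks shown here), and the output layout is an assumption based on the `SPDX-FileCopyrightText` tag the README mentions. The pattern uses single backslashes because it is a Go raw string rather than an `spdxconv.cfg` value, and it uses the `(?P<name>...)` spelling, which Go's `regexp` package has long accepted.

```go
package main

import (
	"fmt"
	"regexp"
)

// reOldCopyright mirrors the README's example pattern. Backslashes are
// single here because this is a Go raw string, not an spdxconv.cfg value.
var reOldCopyright = regexp.MustCompile(
	`^//+\s*Copyright\s+(?P<year>\d{4}),?\s+(?P<author>.*)\s+<(?P<contact>.*)>.*$`)

// convertCopyright is a hypothetical helper, not part of spdxconv.
// It builds an SPDX copyright tag from an old-style copyright comment and
// reports false when the line does not match the pattern.
func convertCopyright(line string) (string, bool) {
	m := reOldCopyright.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	// Look the groups up by name so the code does not depend on their order.
	year := m[reOldCopyright.SubexpIndex("year")]
	author := m[reOldCopyright.SubexpIndex("author")]
	contact := m[reOldCopyright.SubexpIndex("contact")]
	return fmt.Sprintf("SPDX-FileCopyrightText: %s %s <%s>", year, author, contact), true
}

func main() {
	// Hypothetical input line, chosen only to exercise the pattern.
	old := "// Copyright 2022, John Doe <john.doe@example.com>. All rights reserved."
	if spdx, ok := convertCopyright(old); ok {
		fmt.Println("// " + spdx)
	}
}
```

Looking groups up by name keeps the sketch valid if the order of year, author, and contact in the pattern changes, which seems to be the reason the README asks for named groups.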
