spdxconv - Program to convert existing licenses and copyrights into SPDX format.

Age	Commit message	Author
2026-01-15	all: allow multiple patterns in match-file-comment	Shulhan
	This makes the configuration more concise, since the patterns can be split across multiple lines. While at it, add more patterns to match-file-comment.
2026-01-15	all: add test for Scan on new file without git history	Shulhan
	A new file without git history should use the year from the configuration.
2026-01-15	spdxconv: set the SCM only during scan command, not on New	Shulhan
	The Apply command does not require SCM, so no need to initialize it in New.
2026-01-15	report: do not return an error if report file does not exist	Shulhan
	When the scan runs, it tries to load the previous report file to minimize re-scanning of files that have already been applied or detected (regular or binary). If the report file does not exist, do not return an error; keep going.
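A minimal sketch of the "missing report is not an error" behaviour described above. The helper name loadPreviousReport and its return shape are assumptions for illustration, not the program's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// loadPreviousReport is a hypothetical helper. It returns (nil, nil)
// when the report file does not exist, so a first scan can continue,
// and returns any other read error to the caller.
func loadPreviousReport(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		if errors.Is(err, fs.ErrNotExist) {
			// No previous report: nothing to reuse, keep going.
			return nil, nil
		}
		return nil, fmt.Errorf("load report %q: %w", path, err)
	}
	return data, nil
}

func main() {
	data, err := loadPreviousReport("spdxconv-missing-dir/spdxconv.report")
	fmt.Println(len(data), err)
}
```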
2026-01-15	all: move checking REUSE annotation after all files listed	Shulhan
	Use case: on the first scan, a file ends up in the unknown group. The user then modifies spdxconv.cfg to add or update the match-file-pattern. The next scan should check the files in the unknown group again, in case they match the updated configuration.
2026-01-15	all: add match-file-pattern for common files	Shulhan
	The Makefile, go.mod, and go.sum are known file names.
2026-01-15	all: add group "done" in the report	Shulhan
	The done group contains the list of files that have been processed. Files from the regular and binary groups that have been modified, or have had SPDX information added, are moved here.
2026-01-15	file: change the flag for grouping to use int instead of bool	Shulhan
	Previously, we used two boolean fields to flag a file as binary and unknown. To simplify this in the future, change the flag to an int.
2026-01-14	all: handle binary file in post-scan	Shulhan
	A file that matches a match-file-pattern but has no prefix and suffix is marked as binary.
2026-01-14	match_file_comment: do not add space on prefix and suffix	Shulhan
	The space should be added only when generating the SPDX lines. In spdxconv.report, the prefix and suffix should be printed as they appear in match-file-comment.
2026-01-14	all: detect annotation from REUSE configuration	Shulhan
	During the scan, the program reads the REUSE.toml configuration. Files that are already annotated in REUSE.toml are ignored during the scan.
2026-01-14	all: fix error when scanning and applying an empty file	Shulhan

2026-01-14	report: group the file with missing copyright year as unknown	Shulhan

2026-01-14	all: get the copyright year from git history	Shulhan
	If the line that matches the pattern in match-copyright does not contain a year, or there is no match, try to get the year from the first commit of the file using the "git log --follow ..." command. If there is no commit history, or the directory is not using git, use the default copyright year from the configuration.
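The fallback order described above can be sketched as follows. This is an illustration only: pickYear and the year regular expression are hypothetical, and the year from git is passed in as a plain string rather than obtained by running git.

```go
package main

import (
	"fmt"
	"regexp"
)

// reYear is an assumed pattern for a four-digit year; the program's
// own match-copyright pattern may differ.
var reYear = regexp.MustCompile(`\b(19|20)\d{2}\b`)

// pickYear is a hypothetical helper that applies the fallback order:
// the year in the matched copyright line, then the year of the first
// commit (empty when there is no history), then the configured default.
func pickYear(matchedLine, gitYear, defaultYear string) string {
	if y := reYear.FindString(matchedLine); y != "" {
		return y
	}
	if gitYear != "" {
		return gitYear
	}
	return defaultYear
}

func main() {
	fmt.Println(pickYear("// Copyright 2019, Foo", "2021", "2026"))
	fmt.Println(pickYear("// Copyright Foo", "2021", "2026"))
	fmt.Println(pickYear("", "", "2026"))
}
```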
2026-01-13	all: split the delete_line_pattern into before and after	Shulhan
	While at it, also add configuration for delete line before and after for match-copyright section.
2026-01-12	all: fix default regex match license and copyright to ignore comment	Shulhan
	Instead of assuming that the comment prefix and space always exist, "^(//)\s+...", make them optional so the pattern also works on multi-line comments. For example, in comments and old headers in HTML, <!-- Copyright ... -->, the line may have no comment prefix and space.
2026-01-12	all: remove prefix config from struct matchCopyright and matchLicense	Shulhan

2026-01-12	all: implement match-file-comment for setting comment based on file name	Shulhan
	The first thing that the program does is detect which comment string to use when inserting SPDX identifiers into the file. For each pattern in the "match-file-comment" section, the program matches it against the file name to get the comment prefix and suffix to be used later. Users can add their own "match-file-comment" sections or modify the existing ones. A "match-file-comment" can have an empty prefix and suffix. In that case, if the file name matches, the program creates a new file with the ".license" suffix that contains only the SPDX identifiers, instead of inserting them into the file.
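The lookup described above might look roughly like this. The type, field, and function names are hypothetical, the rules are made up, and the sketch assumes the patterns are regular expressions matched against the file name.

```go
package main

import (
	"fmt"
	"regexp"
)

// fileComment is a hypothetical stand-in for one "match-file-comment"
// section: a file name pattern plus the comment prefix and suffix.
type fileComment struct {
	pattern *regexp.Regexp
	prefix  string
	suffix  string
}

// exampleRules returns made-up rules for this sketch only.
func exampleRules() []fileComment {
	return []fileComment{
		{regexp.MustCompile(`\.go$`), "//", ""},
		{regexp.MustCompile(`\.html$`), "<!--", "-->"},
		{regexp.MustCompile(`\.png$`), "", ""}, // no comment syntax
	}
}

// commentFor returns the first rule whose pattern matches the file name.
func commentFor(rules []fileComment, name string) (fileComment, bool) {
	for _, r := range rules {
		if r.pattern.MatchString(name) {
			return r, true
		}
	}
	return fileComment{}, false
}

// targetPath returns where the SPDX lines would be written: the file
// itself, or a sibling ".license" file when prefix and suffix are empty.
func targetPath(fc fileComment, name string) string {
	if fc.prefix == "" && fc.suffix == "" {
		return name + ".license"
	}
	return name
}

func main() {
	for _, name := range []string{"main.go", "index.html", "logo.png"} {
		fc, ok := commentFor(exampleRules(), name)
		fmt.Println(name, ok, targetPath(fc, name))
	}
}
```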
2026-01-12	config_match_license: remove unused field DeleteMatch	Shulhan
	The line that matches the pattern will be replaced with the new SPDX license identifier, so there is no need to guard it with this flag.
2026-01-11	report: add SPDX identifiers to generated report	Shulhan
	This makes all files generated by this program SPDX compliant.
2026-01-11	all: exclude file that contains both SPDX license and copyright text	Shulhan
	If the file already contains SPDX-License-Identifier and SPDX-FileCopyrightText, in any order, exclude it from the scan.
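A minimal sketch of the check described above, assuming a plain substring test is enough. The function name hasBothSPDXTags is hypothetical, and the real program may inspect the file differently (for example, line by line).

```go
package main

import (
	"fmt"
	"strings"
)

// hasBothSPDXTags reports whether the content carries both SPDX tags,
// in any order. Hypothetical helper for illustration.
func hasBothSPDXTags(content string) bool {
	return strings.Contains(content, "SPDX-License-Identifier:") &&
		strings.Contains(content, "SPDX-FileCopyrightText:")
}

func main() {
	done := "// SPDX-FileCopyrightText: 2026 Jane Doe\n" +
		"// SPDX-License-Identifier: BSD-3-Clause\n"
	partial := "// SPDX-License-Identifier: BSD-3-Clause\n"
	fmt.Println(hasBothSPDXTags(done), hasBothSPDXTags(partial))
}
```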
2026-01-11	all: ignore symlink, COPYING, LICENSE, and LICENSES	Shulhan
	Symlinks are ignored for now. COPYING, LICENSE, and LICENSES are common files that are part of the SPDX/REUSE specifications. While at it, fix the check for whether a path is ignored by git by passing the relative path instead of the base name.
2026-01-11	all: implement apply command	Shulhan
	The apply command reads "spdxconv.report" and applies the license and copyright as stated for each file in the report. A file that has been successfully processed is removed from the report.
2026-01-10	report: store the index line number and comments in report	Shulhan
	For the index line number, instead of tying it to the license_id and copyright_id values (separated by ":"), store it in separate columns, idx_license_id and idx_copyright_id. For comments, store the prefix and suffix in columns 6 and 7 of the CSV line.
2026-01-09	report: change the output to use CSV format	Shulhan
	A space-separated format with double quotes on some fields is not easy to parse. Using CSV allows us to use the [encoding/csv] package from the standard library.
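As a rough illustration of why CSV is easier to handle, the sketch below parses one line with encoding/csv. The three columns shown are invented for the example and are not the actual layout of spdxconv.report.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseReportLine splits one CSV line into its fields. The quoting of
// fields that contain commas is handled by encoding/csv.
func parseReportLine(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	return r.Read()
}

func main() {
	// Hypothetical row: path, license, copyright text.
	fields, err := parseReportLine(`cmd/main.go,BSD-3-Clause,"2024 Jane Doe, Inc."`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, f := range fields {
		fmt.Printf("%d: %s\n", i, f)
	}
}
```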
2026-01-09	all: implement the scan command	Shulhan
	The scan command scans the current directory for files that need to be converted or have SPDX identifiers inserted. The scan result is stored in a report file named "spdxconv.report". No other files are modified once the scan completes. The user can then inspect and modify the report to exclude certain files or change the behaviour of the apply command. Deleting a line in the report excludes that file from being processed by the "apply" command.
2026-01-08	cmd/spdxconv: implement "init" command	Shulhan
	The init command creates the spdxconv configuration file in the current directory.
2026-01-08	all: implement conversion for SPDX-License-Identifier	Shulhan
	If the file already contains "SPDX-License-Identifier", the program does not change the identifier; it moves it to the top of the file, after the shebang. If spdxconv.cfg contains match-license and the pattern matches one of the lines in the file, it uses the license_identifier instead of the default one and inserts it at the top, after the shebang. If the file does not contain the identifier, it inserts a new one based on the default value in spdxconv.cfg.
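Placing a line "at the top, after the shebang" can be sketched like this. insertAfterShebang is a hypothetical helper that works on an in-memory slice of lines; it is not taken from the program.

```go
package main

import (
	"fmt"
	"strings"
)

// insertAfterShebang returns a copy of lines with spdx inserted as the
// first line, or as the second line when the file starts with "#!".
func insertAfterShebang(lines []string, spdx string) []string {
	at := 0
	if len(lines) > 0 && strings.HasPrefix(lines[0], "#!") {
		at = 1
	}
	out := make([]string, 0, len(lines)+1)
	out = append(out, lines[:at]...)
	out = append(out, spdx)
	out = append(out, lines[at:]...)
	return out
}

func main() {
	script := []string{"#!/bin/sh", "echo hello"}
	for _, l := range insertAfterShebang(script, "# SPDX-License-Identifier: BSD-3-Clause") {
		fmt.Println(l)
	}
}
```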
2026-01-06	all: refactoring loadConfig and scanForSCM	Shulhan
	Previously, given the command "$ spdxconv $path", loadConfig loaded the configuration from the $path directory. This change makes it load the configuration from the current working directory where the tool runs, not from the $path directory. As for scanForSCM, it previously detected the SCM from $path up to "/"; now it scans from $path up to the current working directory only. While at it, rename the dummySCM type to noSCM.
2026-01-06	spdxconv: tool to convert license and copyright to SPDX format	Shulhan
	This is the initial implementation, a work in progress, with the following functions: loading the spdxconv.cfg file; scanning the list of files to be converted; detecting the .git repository and excluding files ignored by .gitignore. No conversion logic is implemented yet.