| Age | Commit message (Collapse) | Author |
|
**🌱 brokenlinks: add option to ignore listed HTTP status codes**.
When a link is known to have issues, one can ignore its status code
while scanning for broken links using the "-ignore-status" option.
**🌱 brokenlinks: add option "insecure"**.
The "-insecure" option suppresses errors for servers with invalid
certificates.
**🌱 brokenlinks: implement caching for external URLs**.
Any successful fetch of an external URL will be recorded in the jarink
cache file, located in the user's cache directory.
For example, on Linux it would be `$HOME/.cache/jarink/cache.json`.
This helps speed up future rescans of the same or a different target
URL by minimizing network requests.
**🌼 brokenlinks: reduce the number of goroutines per scan**.
Previously, each scan ran on one goroutine and the result was
pushed using pushResult in another goroutine.
This made scanning one link consume two goroutines.
This change makes the scan function return the result and push it
in the same goroutine.
|
|
The version command prints the version of the program.
|
|
|
|
|
|
|
|
The test runs a server that serves six pages, each with a different
[time.Sleep] duration before returning the response.
This allows us to see how the main scan loop works, waiting
for resultq and listWaitStatus.
|
|
There are two test cases: one for an invalid status code like "abc",
and one for an unknown status code like "50".
|
|
|
|
|
|
Before the Options are passed to a worker, they should be validated,
including the URL to be scanned.
|
|
This allows git.sr.ht to render the README.
|
|
The insecure option allows connections to servers with invalid
certificates without reporting them as errors.
|
|
When a link is known to have issues, one can ignore its status
code while scanning for broken links using the "-ignore-status" option.
|
|
The first release of jarink provides the command "brokenlinks",
which scans for broken links.
The output of this command is a list of pages with their broken links
in JSON format.
This command accepts the following options:
`-verbose`::
Print the page being scanned to standard error.
`-past-result=<path to JSON file>`::
Scan only the pages reported by the result of a past scan, based
on the content of the JSON file.
This minimizes the time to re-scan the pages once we have fixed the URLs.
|
|
|
|
|
|
When two or more structs share the same prefix, it is time to
group them together.
Also, we will group each command into its own package in the future.
|
|
Naming it page_links does not make sense when the result comes from
the brokenlinks command.
|
|
Using HTTP HEAD on certain pages may return:
* 404, not found, for example on
https://support.google.com/accounts/answer/1066447
* 405, method not allowed, for example on
https://aur.archlinux.org/packages/rescached-git
For a 405 response code we can check and retry with GET, but for 404 it
is impossible to check whether the URL really exists, since 404 means
the page was not found.
|
|
When a call to HTTP HEAD or GET returns an error and the error is a
*net.DNSError with Timeout set, retry the call up to 5 times, until it
succeeds or keeps timing out.
|
|
Previously, we only encoded BrokenlinksResult.PageLinks.
The struct may change in the future, so it is better to encode the
whole struct now rather than changing the output later.
|
|
The brokenlinks command now has the option "-past-result", which
accepts a path to a JSON file containing a past result.
If it is set, the program will only scan the pages with broken links
inside that report.
|
|
|
|
|
|
Previously, if we passed a URL with a path to brokenlinks, for example
"web.tld/path", it would scan all of the pages on the website "web.tld".
Now it only scans "/path" and its sub-paths.
|
|
|
|
The worker logs with date and time, while the main program does not.
|
|
The README contains the content of the usage function in
"cmd/jarink".
|
|
Jarink is a program that helps web administrators maintain their
websites.
Currently it provides a command to scan for broken links.
|
|
|
|
The error messages can help users debug problems with links.
|
|
Using JSON as the output format allows the results to be parsed by
other tools.
|
|
After all of the results from scan have been checked against the seen
list, check for links that are waiting for a status in the second loop.
|
|
For links that are not from the domain being scanned, use the HTTP
HEAD method to minimize the resources being transferred.
|
|
|
|
When using a goroutine to process a link, the result is then passed to
the main goroutine through a channel.
The main goroutine then processes the results one by one, checking
whether each has been seen, is an error, or needs to be scanned.
That way, we do not need a mutex to guard whether a link has been seen.
|
|
Using HEAD does not return the content of the image, which consumes
fewer resources on both ends.
|
|
Printing date and time during testing makes the log lines too long.
|
|
The CLI contains one command: scan.
It accepts a single argument, a URL to be scanned,
and one option, "-verbose".
|
|
The fragment part of a URL, for example "/page#fragment", should be
removed, otherwise it will be indexed as a different URL.
|
|
Using a struct allows extending the parameters later without changing
the function signature.
|
|
After checking the code and tests for [html.Parse], there are no
actual cases where HTML content will cause it to return an error.
The only possible error is a failure reading from the body (io.Reader),
and that is also almost impossible.
[html.Parse]: https://go.googlesource.com/net/+/refs/tags/v0.40.0/html/parse.go#2347
|
|
|
|
Any HTML link from a domain other than the scanned domain should
not get parsed.
We only check whether the link is valid or not.
|
|
For links to images, we can skip parsing the content.
|
|
The tests should not require an internet connection to pass.
|
|
It turns out that broken HTML still gets parsed by the "net/html"
package.
|
|
Scanning an invalid URL like "127.0.0.1:14594" (without an HTTP
scheme) or "http://127.0.0.1:14594" (server not available) should
return an error.
Scanning a subpage like "http://127.0.0.1:11836/page2" should return
the same result as scanning from the base URL
"http://127.0.0.1:11836".
|
|
The current implementation covers at least 84% of the cases.
Todo:
* CLI for scan
* add more test cases for 100% coverage, including scanning an
invalid base URL, an invalid HTML page, and an invalid href or
image src
|
|
|