**🌼 brokenlinks: refactoring the logic, simplify the code**
Previously, the scan logic ran in multiple goroutines, with one channel
to push and consume results and another channel to push and pop links
to be processed.
That logic was complicated, making it hard to read and debug.
These changes refactor it to use a single goroutine that pushes and
pops links from a slice used as a queue.
Another refactoring is in where we store link metadata.
Previously, we had [jarink.Link], [brokenlinks.Broken], and
[brokenlinks.linkQueue] to store the metadata for a link.
This release unifies them into the single struct [jarink.Link].
**🌱 brokenlinks: print the progress to stderr**
Each time a scan starts, a new link is queued, or a fetch starts, print
a message to stderr.
This removes the verbose option for a better user experience.
**🌼 brokenlinks: improve fetch logging and decrease timeout to 10s**
When fetching, print a log line after the fetch completes.
On success, print the URL along with the HTTP status code.
On failure, print the URL along with the error.
The timeout is now reduced to 10 seconds to prevent long delays when
working with a broken website.
**🌼 brokenlinks: mark the link in queue as seen with status code 0**
This fixes a duplicate URL being pushed to the queue.
Given the following queue and its parent,
----
/page2.html => /index.html
/brokenPage => /index.html
/brokenPage => /page2.html
----
Before scanning the second "/brokenPage" on parent page "/page2.html",
check whether it has been seen to get its status code before running
the scan.
This allows jarink to report "/brokenPage" as a broken link on both
pages, not just on "/index.html".
**🌼 brokenlinks: skip parsing non-HTML pages**
If the response Content-Type is anything other than "text/html", skip
parsing the content and return immediately.
We also skip processing "mailto\:" URLs.
**🌼 brokenlinks: make links that return HTML always end with a slash**
If a parent URL like "/page" returns an HTML body, the URL should end
with a slash so that the relative links inside it work when joined
with the parent URL.
**🌱 brokenlinks: store the anchor or image source in link**
In the struct `Link`, we add a field `Value` that stores the `href`
from an A element or the `src` from an IMG element.
This allows us to debug any error during a scan, especially when
joining paths and links.
**🌼 brokenlinks: fix possible panic in markAsBroken**
If the Link does not have a `parentUrl`, set the parent URL to the
link URL itself.
This only happens if the target URL that we are about to scan returns
an error.
|
|
Reword some paragraphs, format the code, and add an INSTALL section.
|
|
The build task sets the Version information based on the latest tag
and the number of commits.
|
|
If the Link does not have parentUrl, set the parent URL to the link
URL itself.
This only happens if the target URL that we are about to scan returns
an error.
|
|
In the struct Link, we add a field Value that stores the href from an
A element or the src from an IMG element.
This allows us to debug any error during a scan, especially when
joining paths and links.
|
|
If a parent URL like "/page" returns an HTML body, the URL should end
with a slash so that the relative links inside it work when joined
with the parent URL.
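A sketch of why the trailing slash matters: Go's net/url resolves a
relative link against the base URL's directory, so "/page" and "/page/"
give different results. The helper names below are illustrative only,
not jarink's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// ensureDirURL appends a trailing slash so that relative links
// resolve against the page itself, not its parent directory.
func ensureDirURL(raw string) string {
	if len(raw) == 0 || raw[len(raw)-1] == '/' {
		return raw
	}
	return raw + "/"
}

// resolve joins a relative reference with a parent URL using the
// standard RFC 3986 resolution from net/url.
func resolve(parent, ref string) string {
	base, err := url.Parse(parent)
	if err != nil {
		return ""
	}
	rel, err := url.Parse(ref)
	if err != nil {
		return ""
	}
	return base.ResolveReference(rel).String()
}

func main() {
	// Without the slash, "style.css" resolves to /style.css.
	fmt.Println(resolve("https://web.tld/page", "style.css"))
	// With the slash, it resolves to /page/style.css.
	fmt.Println(resolve(ensureDirURL("https://web.tld/page"), "style.css"))
}
```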
|
|
This is to see the behaviour of [Node.Descendants] when traversing
elements recursively.
|
|
Given the following queue and its parent,
/page2.html => /index.html
/brokenPage => /index.html
/brokenPage => /page2.html
Before scanning the second "/brokenPage" on parent page "/page2.html",
check whether it has been seen to get its status code before running
the scan.
This allows jarink to report "/brokenPage" as a broken link on both
pages, not just on "/index.html".
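A minimal sketch of the seen-check, assuming a map from URL to status
code in which 0 marks a queued-but-unfetched link; the function and map
names are made up for illustration:

```go
package main

import "fmt"

// decide reports what to do with a link based on the seen map:
// not present = push to queue; 0 = already queued, wait for its
// fetch; >= 400 = known broken, report for this parent as well.
func decide(seen map[string]int, link string) string {
	code, ok := seen[link]
	switch {
	case !ok:
		return "queue"
	case code == 0:
		return "wait"
	case code >= 400:
		return "report-broken"
	default:
		return "ok"
	}
}

func main() {
	seen := map[string]int{
		"/page2.html": 200,
		"/brokenPage": 404, // fetched earlier via /index.html
	}
	// The second occurrence from /page2.html is reported too,
	// instead of being silently skipped as a duplicate.
	fmt.Println(decide(seen, "/brokenPage")) // report-broken
	fmt.Println(decide(seen, "/new.html"))   // queue
}
```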
|
|
If the request is redirected, use the "Location" value from the
response header as the parent URL instead of the original link in the
queue.
|
|
If the parent URL ends with .html or .htm, join the relative path with
the parent's directory instead of the current path.
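The rule above can be sketched with the standard path package; the
helper name is made up, and the real jarink code may join paths
differently:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// joinPath joins rel onto parentPath. If the parent ends with
// .html or .htm, the relative path is joined with the parent's
// directory instead of the path itself.
func joinPath(parentPath, rel string) string {
	base := parentPath
	if strings.HasSuffix(base, ".html") || strings.HasSuffix(base, ".htm") {
		base = path.Dir(base)
	}
	return path.Join(base, rel)
}

func main() {
	fmt.Println(joinPath("/docs/page.html", "img/a.png")) // /docs/img/a.png
	fmt.Println(joinPath("/docs/", "img/a.png"))          // /docs/img/a.png
}
```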
|
|
If the response Content-Type is anything other than "text/html", skip
parsing the content and return immediately.
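A sketch of the Content-Type check, using mime.ParseMediaType so that
parameters such as "; charset=utf-8" do not defeat the comparison; the
helper names are illustrative:

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isHTML reports whether a Content-Type header value denotes an
// HTML page. ParseMediaType strips parameters like "; charset=utf-8".
func isHTML(contentType string) bool {
	mt, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mt == "text/html"
}

// skipLink reports whether a raw link should not be processed at
// all, e.g. "mailto:" URLs.
func skipLink(raw string) bool {
	return strings.HasPrefix(raw, "mailto:")
}

func main() {
	fmt.Println(isHTML("text/html; charset=utf-8")) // true
	fmt.Println(isHTML("application/pdf"))          // false
	fmt.Println(skipLink("mailto:me@web.tld"))      // true
}
```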
|
|
This fixes a duplicate URL being pushed to the queue.
|
|
TestScan_slow takes around 11 seconds because the test includes
[time.Sleep].
|
|
When fetching, print a log line after the fetch completes.
On success, print the URL along with the HTTP status code.
On failure, print the URL along with the error.
The timeout is now reduced to 10 seconds to prevent long delays when
working with a broken website.
|
|
Each time a scan starts, a new link is queued, or a fetch starts, print
a message to stderr.
This removes the verbose option for a better user experience.
|
|
Previously, we had [jarink.Link], [brokenlinks.Broken], and
[brokenlinks.linkQueue] to store the metadata for a link.
These changes unify them into the single struct [jarink.Link].
|
|
Previously, the scan logic ran in multiple goroutines, with one channel
to push and consume results and another channel to push and pop links
to be processed.
That logic was complicated, making it hard to read and debug.
These changes refactor it to use a single goroutine that pushes and
pops links from a slice used as a queue.
|
|
This is so the README can be rendered on pkg.go.dev and on git.sr.ht.
While at it, group the documentation files under the _doc/ directory.
|
|
**🌼 brokenlinks: fix infinite loop on unknown host**
On a link with an invalid domain, it should stop and return the error
immediately.
|
|
On a link with an invalid domain, it should stop and return the error
immediately.
|
|
**🌱 brokenlinks: add option to ignore a list of HTTP status codes**.
When a link is known to have issues, one can ignore its status code
while scanning for broken links using the "-ignore-status" option.
**🌱 brokenlinks: add option "insecure"**.
With the "-insecure" option, servers with invalid certificates are not
reported as errors.
**🌱 brokenlinks: implement caching for external URLs**.
Any successful fetch of an external URL will be recorded in the jarink
cache file, located in the user's cache directory.
For example, on Linux it would be `$HOME/.cache/jarink/cache.json`.
This helps speed up future rescans of the same or a different target
URL, minimizing network requests.
**🌼 brokenlinks: reduce the number of goroutines on scan**.
Previously, each scan ran in one goroutine and the result was pushed
from another goroutine.
This made one link scan consume two goroutines.
This changes the scan function to return the result and push it in the
same goroutine.
|
|
The version command prints the version of the program.
|
|
Previously, each scan ran in one goroutine and the result was pushed
by pushResult, also in another goroutine.
This made one link consume two goroutines.
This changes the scan function to return the result and push it in the
same goroutine.
|
|
Any successful fetch of an external URL will be recorded in the jarink
cache file, located in the user's cache directory.
For example, on Linux it would be `$HOME/.cache/jarink/cache.json`.
This helps speed up future rescans of the same or a different target
URL, minimizing network requests.
|
|
The test runs a server that contains several pages with various
[time.Sleep] durations before returning the response.
This allows us to see how the main scan loop works, waiting on
resultq and listWaitStatus.
|
|
There are two test cases: one for an invalid status code like "abc",
and one for an unknown status code like "50".
|
|
Before the Options are passed to the worker, they should be validated,
including the URL to be scanned.
|
|
This is for git.sr.ht to be able to render the README.
|
|
The insecure option allows servers with invalid certificates and does
not report them as errors.
|
|
When a link is known to have issues, one can ignore its status code
while scanning for broken links using the "-ignore-status" option.
|
|
The first release of jarink provides the command "brokenlinks" to scan
for broken links.
The output of this command is a list of pages with their broken links
in JSON format.
This command accepts the following options:
`-verbose`::
Print the page that is being scanned to standard error.
`-past-result=<path to JSON file>`::
Scan only the pages reported by a past scan, based on the content of
the JSON file.
This minimizes the time to re-scan the pages once we have fixed the
URLs.
|
|
When two or more structs share the same prefix, it is time to move
them into a group.
Also, we will group one command per package in the future.
|
|
Naming it page_links does not make sense when the result comes from
the brokenlinks command.
|
|
Using HTTP HEAD on certain pages may return
* 404, not found, for example on
https://support.google.com/accounts/answer/1066447
* 405, method not allowed, for example on
https://aur.archlinux.org/packages/rescached-git
For a 405 response we can retry with GET, but for a 404 it is
impossible to check whether the URL really exists, since 404 means the
page was not found.
|
|
When the call to HTTP HEAD or GET returns an error and the error is a
*net.DNSError with Timeout set, retry the call up to 5 times, until it
succeeds or times out again.
|
|
Previously, we only encoded BrokenlinksResult.PageLinks.
The struct may change in the future, so it is better to encode the
whole struct now rather than changing the output later.
|
|
The brokenlinks command now has an option "-past-result" that accepts
a path to the JSON file from a past result.
If it is set, the program will only scan the pages with broken links
in that report.
|
|
Previously, if we passed a URL with a path to brokenlinks, for example
"web.tld/path", it would scan all of the pages on the website
"web.tld".
Now, it only scans "/path" and its sub-paths.
|
|
The worker logs with date and time, while the main program does not.
|
|
The README contains the content from the usage function in
"cmd/jarink".
|