Age  Commit message  Author
2026-02-12  Release jarink 0.3.0 (2026-02-12) (HEAD, v0.3.0, main, dev)  Shulhan
**🌼 brokenlinks: refactoring the logic, simplify the code**
Previously, the scan logic ran in multiple goroutines, with one channel to push and consume the results and another channel to push and pop the links to be processed. That logic was very complicated, making it hard to read and debug. This change refactors it to use a single goroutine that pushes and pops links to and from a slice used as a queue.
Another refactoring is in where we store the link. Previously, we had [jarink.Link], [brokenlinks.Broken], and [brokenlinks.linkQueue] to store the metadata for a link. This release unifies them into the struct [jarink.Link].
**🌱 brokenlinks: print the progress to stderr**
Each time a scan starts, a new link is added to the queue, or a fetch starts, print a message to stderr. This removes the verbose option for a better user experience.
**🌼 brokenlinks: improve fetch logging and decrease timeout to 10s**
When fetching, print the log after the fetch has completed. On success, print the URL along with the HTTP status code. On failure, print the URL along with the error. The timeout is now reduced to 10 seconds to prevent long delays when working with a broken website.
**🌼 brokenlinks: mark the link in queue as seen with status code 0**
This fixes the same URL being pushed to the queue twice. Given the following queue and its parents,
----
/page2.html => /index.html
/brokenPage => /index.html
/brokenPage => /page2.html
----
Before scanning the second "/brokenPage" on parent page "/page2.html", check whether it has been seen first, to get the status code before we run the scan. This allows jarink to report "/brokenPage" as a broken link for both pages, not just for "/index.html".
**🌼 brokenlinks: skip parsing non-HTML page**
If the response Content-Type is anything other than "text/html", skip parsing the content and return immediately. We also skip processing "mailto:" URLs.
**🌼 brokenlinks: make link that return HTML always end with slash**
If a parent URL like "/page" returns an HTML page as its body, the URL should end with a slash so that the relative links inside it work when joined with the parent URL.
**🌱 brokenlinks: store the anchor or image source in link**
In the struct `Link`, we add the field `Value`, which stores the `href` from an A element or the `src` from an IMG element. This allows us to debug any error during a scan, especially when joining path and link.
**🌼 brokenlinks: fix possible panic in markAsBroken**
If the Link does not have a `parentUrl`, set the parent URL to the link URL itself. This only happens if the target URL that we are about to scan returns an error.
2026-02-12  all: update the README  Shulhan
Reword some paragraphs, format the code, and add an INSTALL section.
2026-02-12  make: add `build` task  Shulhan
The build task sets the Version information based on the latest tag and the number of commits.
2026-02-12  brokenlinks: fix possible panic in markAsBroken  Shulhan
If the Link does not have a parentUrl, set the parent URL to the link URL itself. This only happens if the target URL that we are about to scan returns an error.
2026-02-12  brokenlinks: store the anchor or image source in link  Shulhan
In the struct Link, we add the field Value, which stores the href from an A element or the src from an IMG element. This allows us to debug any error during a scan, especially when joining path and link.
2026-02-11  brokenlinks: make link that return HTML always end with slash  Shulhan
If a parent URL like "/page" returns an HTML page as its body, the URL should end with a slash so that the relative links inside it work when joined with the parent URL.
2026-02-11  go.mod: update all dependencies  Shulhan
2026-02-11  brokenlinks: skip processing "mailto:" URL  Shulhan
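A check like this can be done by parsing the link and looking at its scheme. This is a sketch only; `isMailto` is a hypothetical name and the actual jarink code may detect the scheme differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// isMailto reports whether href is a "mailto:" URL, which cannot be fetched
// over HTTP and is therefore skipped. Hypothetical helper.
func isMailto(href string) bool {
	u, err := url.Parse(href)
	if err != nil {
		return false
	}
	// url.Parse lowercases the scheme, so "MAILTO:" matches as well.
	return u.Scheme == "mailto"
}

func main() {
	for _, href := range []string{"mailto:user@web.tld", "/page2.html"} {
		fmt.Println(href, isMailto(href))
	}
}
```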
2026-02-11  brokenlinks: test links that wrapped by other elements  Shulhan
This is to see the behaviour of [Node.Descendants] when traversing the elements recursively.
2026-02-11  brokenlinks: check if link has been seen before scan  Shulhan
Given the following queue and its parents,
    /page2.html => /index.html
    /brokenPage => /index.html
    /brokenPage => /page2.html
Before scanning the second "/brokenPage" on parent page "/page2.html", check whether it has been seen first, to get the status code before we run the scan. This allows jarink to report "/brokenPage" as a broken link for both pages, not just for "/index.html".
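The idea can be sketched with a map from URL to its last known status code, where 0 means "queued but not fetched yet". The types and names below (`queued`, `report`, the `status` callback) are assumptions made for illustration; they are not the jarink API.

```go
package main

import "fmt"

// queued is one entry in the queue: a URL and the page it was found on.
type queued struct{ url, parent string }

// report processes the queue in order. seen maps a URL to its last known
// status code; 0 means the URL has not been fetched yet. status is a
// hypothetical stand-in for the HTTP fetch.
func report(queue []queued, status func(url string) int) map[string][]string {
	seen := map[string]int{}
	broken := map[string][]string{} // parent page => broken links
	for _, q := range queue {
		code := seen[q.url]
		if code == 0 {
			// Not fetched before: fetch once and remember the result.
			code = status(q.url)
			seen[q.url] = code
		}
		if code >= 400 {
			broken[q.parent] = append(broken[q.parent], q.url)
		}
	}
	return broken
}

func main() {
	queue := []queued{
		{"/page2.html", "/index.html"},
		{"/brokenPage", "/index.html"},
		{"/brokenPage", "/page2.html"},
	}
	status := func(url string) int {
		if url == "/brokenPage" {
			return 404
		}
		return 200
	}
	fmt.Println(report(queue, status))
}
```

In this sketch "/brokenPage" is fetched only once, yet it is recorded under both "/index.html" and "/page2.html".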
2026-02-05  brokenlinks: check for redirect during scan  Shulhan
If the request is redirected, use the "Location" value from the response header as the parent URL instead of the original link in the queue.
2026-02-04  brokenlinks: fix generating relative URL  Shulhan
If the parent URL ends with .html or .htm, join the relative path with the parent's directory instead of with its full path.
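A minimal sketch of that rule, using only the standard `path` package. `joinRelative` is a hypothetical name, and real code would also need to handle query strings and absolute links.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// joinRelative joins a relative link with the path of the page it was found
// on. When the parent is a file such as "/docs/index.html", the link is
// joined with the parent's directory instead. Hypothetical helper.
func joinRelative(parentPath, rel string) string {
	base := parentPath
	if strings.HasSuffix(base, ".html") || strings.HasSuffix(base, ".htm") {
		base = path.Dir(base)
	}
	return path.Join(base, rel)
}

func main() {
	fmt.Println(joinRelative("/docs/index.html", "img/a.png"))
	fmt.Println(joinRelative("/docs", "img/a.png"))
}
```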
2026-02-04  brokenlinks: skip parsing non-HTML page  Shulhan
If the response Content-Type is anything other than "text/html", skip parsing the content and return immediately.
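One way to make this check robust against parameters such as "charset" is `mime.ParseMediaType`. This is a sketch under that assumption; `isHTML` is a hypothetical name and jarink may compare the header differently.

```go
package main

import (
	"fmt"
	"mime"
)

// isHTML reports whether a Content-Type header value denotes "text/html",
// ignoring parameters such as "; charset=utf-8". Hypothetical helper.
func isHTML(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Missing or malformed header: treat as non-HTML and skip parsing.
		return false
	}
	return mediaType == "text/html"
}

func main() {
	for _, ct := range []string{"text/html; charset=utf-8", "image/png", ""} {
		fmt.Printf("%q => %v\n", ct, isHTML(ct))
	}
}
```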
2026-02-04  brokenlinks: mark the link in queue as seen with status code 0  Shulhan
This fixes the same URL being pushed to the queue twice.
2026-01-22  all: mark and skip the slow test  Shulhan
TestScan_slow takes around 11 seconds because the test includes [time.Sleep].
2026-01-22  brokenlinks: improve fetch logging and decrease timeout to 10 seconds  Shulhan
When fetching, print the log after the fetch has completed. On success, print the URL along with the HTTP status code. On failure, print the URL along with the error. The timeout is now reduced to 10 seconds to prevent long delays when working with a broken website.
2026-01-22  brokenlinks: print the progress to stderr  Shulhan
Each time a scan starts, a new link is added to the queue, or a fetch starts, print a message to stderr. This removes the verbose option for a better user experience.
2026-01-22  all: refactoring, use single struct to represent Link  Shulhan
Previously, we had [jarink.Link], [brokenlinks.Broken], and [brokenlinks.linkQueue] to store the metadata for a link. This change unifies them into the struct [jarink.Link].
2026-01-22  brokenlinks: refactoring the logic, simplify the code  Shulhan
Previously, the scan logic ran in multiple goroutines, with one channel to push and consume the results and another channel to push and pop the links to be processed. That logic was very complicated, making it hard to read and debug. This change refactors it to use a single goroutine that pushes and pops links to and from a slice used as a queue.
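The single-goroutine design can be sketched as a plain loop over a slice. The `Link` type here is a simplified, hypothetical stand-in for [jarink.Link], and the `fetch` callback is an assumption that replaces the real HTTP request and HTML parsing.

```go
package main

import "fmt"

// Link is a simplified, hypothetical stand-in for jarink.Link.
type Link struct {
	URL    string
	Parent string
}

// scanAll visits every link reachable from start using one goroutine and a
// slice as a FIFO queue. fetch returns the links found on a page.
func scanAll(start string, fetch func(url string) []string) []Link {
	queue := []Link{{URL: start}}
	seen := map[string]bool{start: true}
	var visited []Link

	for len(queue) > 0 {
		// Pop from the front of the queue.
		link := queue[0]
		queue = queue[1:]
		visited = append(visited, link)

		// Push every not-yet-seen child to the back.
		for _, child := range fetch(link.URL) {
			if seen[child] {
				continue
			}
			seen[child] = true
			queue = append(queue, Link{URL: child, Parent: link.URL})
		}
	}
	return visited
}

func main() {
	site := map[string][]string{
		"/index.html": {"/page2.html", "/brokenPage"},
		"/page2.html": {"/brokenPage"},
	}
	fetch := func(url string) []string { return site[url] }
	for _, link := range scanAll("/index.html", fetch) {
		fmt.Printf("%s (parent %q)\n", link.URL, link.Parent)
	}
}
```

Because everything runs in one loop, no channels or locks are needed, at the cost of fetching pages one at a time.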
2026-01-21  all: use markdown for formatting README  Shulhan
This is so the README can be rendered on pkg.go.dev and on git.sr.ht. While at it, group the documentation files under the _doc/ directory.
2025-12-27  Release jarink 0.2.1 (2025-12-27) (v0.2.1)  Shulhan
**🌼 brokenlinks: fix infinite loop on unknown host**
On a link with an invalid domain, the scan should break and return the error immediately.
2025-11-20  brokenlinks: fix infinite loop on unknown host  Shulhan
On a link with an invalid domain, the scan should break and return the error immediately.
2025-06-27  Release jarink version 0.2.0 (2025-05-27) (v0.2.0)  Shulhan
**🌱 brokenlinks: add option to ignore list HTTP status code**
When a link is known to have issues, one can ignore its status code while scanning for broken links using the "-ignore-status" option.
**🌱 brokenlinks: add option "insecure"**
The "-insecure" option does not report an error on a server with an invalid certificate.
**🌱 brokenlinks: implement caching for external URLs**
Any successful fetch of an external URL is recorded in the jarink cache file, located in the user's cache directory. For example, on Linux it would be `$HOME/.cache/jarink/cache.json`. This helps speed up future rescans of the same or a different target URL by minimizing network requests.
**🌼 brokenlinks: reduce the number of goroutine on scan**
Previously, each scan ran in one goroutine and its result was pushed using another goroutine, so scanning one link consumed two goroutines. This change makes the scan function return the result and push it in the same goroutine.
2025-06-27  cmd/jarink: add "version" command  Shulhan
The version command prints the version of the program.
2025-06-27  all: add SPDX copyright and licenses to README  Shulhan
2025-06-27  brokenlinks: reduce the number of goroutines on scan  Shulhan
Previously, each scan ran in one goroutine and its result was pushed using pushResult, also in its own goroutine, so one link consumed two goroutines. This change makes the scan function return the result and push it in the same goroutine.
2025-06-27  brokenlinks: implement caching for external URLs  Shulhan
Any successful fetch of an external URL is recorded in the jarink cache file, located in the user's home cache directory. For example, on Linux it would be `$HOME/.cache/jarink/cache.json`. This helps speed up future rescans of the same or a different target URL by minimizing network requests.
2025-06-19  all: add test cases for simulating slow server  Shulhan
The test runs a server that contains three six pages with various [time.Sleep] durations before returning the response. This allows us to see how the main scan loop works while waiting for resultq and listWaitStatus.
2025-06-17  brokenlinks: add test cases for IgnoreStatus options  Shulhan
There are two test cases: one for an invalid status code like "abc", and one for an unknown status code like "50".
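The two failure modes can be sketched as follows. This example assumes the option value is a comma-separated list and that "unknown" means a code with no entry in `net/http`'s status table; both are assumptions, and `parseIgnoreStatus` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// parseIgnoreStatus parses a comma-separated list of HTTP status codes.
// It rejects values that are not numbers ("abc") and numbers that are not
// known status codes ("50"). Hypothetical helper.
func parseIgnoreStatus(opt string) ([]int, error) {
	var codes []int
	for _, field := range strings.Split(opt, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		code, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid status code %q", field)
		}
		if http.StatusText(code) == "" {
			return nil, fmt.Errorf("unknown status code %q", field)
		}
		codes = append(codes, code)
	}
	return codes, nil
}

func main() {
	for _, opt := range []string{"403,404", "abc", "50"} {
		codes, err := parseIgnoreStatus(opt)
		fmt.Println(opt, codes, err)
	}
}
```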
2025-06-16  brokenlinks: update comment on test case with path  Shulhan
2025-06-16  all: add comment on GoEmbedReadme variable  Shulhan
2025-06-16  brokenlinks: move parsing scanned Url from worker to Options  Shulhan
Before the Options are passed to the worker, they should be valid, including the URL to be scanned.
2025-06-16  all: rename README.adoc back to README  Shulhan
This is for git.sr.ht to be able to render the README.
2025-06-16  brokenlinks: add option "insecure"  Shulhan
The insecure option allows connecting to a server with an invalid certificate and does not report it as an error.
2025-06-13  brokenlinks: add option to ignore list HTTP status code  Shulhan
When a link is known to have issues, one can ignore its status code while scanning for broken links using the "-ignore-status" option.
2025-06-12  Release jarink version 0.1.0 (v0.1.0)  Shulhan
The first release of jarink provides the command "brokenlinks", to scan for broken links. The output of this command is a list of pages with their broken links, in JSON format. This command accepts the following options,
`-verbose`:: Print the page being scanned to standard error.
`-past-result=<path to JSON file>`:: Scan only the pages reported by a past scan, based on the content of the JSON file. This minimizes the time to re-scan the pages once we have fixed the URLs.
2025-06-12  all: add SPDX license to testdata files  Shulhan
2025-06-12  all: rename README to README.adoc  Shulhan
2025-06-12  all: refactoring, move brokenlinks code to its own package  Shulhan
When two or more structs share the same prefix, it is time to move them into their own group. Also, we will group one command into one package in the future.
2025-06-12  all: rename the json field page_links to broken_links  Shulhan
Naming it page_links does not make sense when the result comes from the brokenlinks command.
2025-06-11  all: revert to use HTTP GET on external, non-image URL  Shulhan
Using HTTP HEAD on certain pages may return
* 404, not found, for example on https://support.google.com/accounts/answer/1066447
* 405, method not allowed, for example on https://aur.archlinux.org/packages/rescached-git
For a 405 response code we can check and retry with GET, but for a 404 it is impossible to tell whether the URL really exists, since 404 means page not found.
2025-06-11  all: check for DNS timeout and retry 5 times  Shulhan
When the call to HTTP HEAD or GET returns an error and the error is a *net.DNSError with Timeout, retry the call until it succeeds or has timed out 5 times.
2025-06-05  all: encode the whole BrokenlinksResult struct to JSON  Shulhan
Previously, we only encoded BrokenlinksResult.PageLinks. The struct may change in the future, so it is better to encode the whole struct now rather than change the output later.
2025-06-05  all: add option to scan pass result  Shulhan
The brokenlinks command now has the option "-past-result", which accepts the path to a JSON file from a past result. If it is set, the program will only scan the pages with broken links listed in that report.
2025-06-05  all: use snake case for JSON fields in Broken result  Shulhan
2025-06-05  all: move TestMain to jarink_test.go file  Shulhan
2025-06-01  all: brokenlinks should scan only URL on given path  Shulhan
Previously, if we passed a URL with a path to brokenlinks, for example "web.tld/path", it would scan all of the pages on the website "web.tld". Now, it only scans "/path" and its sub-paths.
2025-06-01  all: go embed the README and use it on the CLI for help command  Shulhan
2025-06-01  all: use separate logs for worker and main program  Shulhan
The worker uses a log with date and time, while the main program does not.
2025-06-01  all: add a simple README  Shulhan
The README contains the content from the usage function in "cmd/jarink".