path: root/brokenlinks
Age  Commit message  Author
2026-02-12  brokenlinks: fix possible panic in markAsBroken  Shulhan
If the Link does not have a parentUrl, set the parent URL to the link URL itself. This only happens if the target URL that we scan returns an error.
2026-02-12  brokenlinks: store the anchor or image source in link  Shulhan
In the struct Link, we add the field Value, which stores the href from an A element or the src from an IMG element. This allows us to debug any error during the scan, especially when joining path and link.
2026-02-11  brokenlinks: make link that return HTML always end with slash  Shulhan
If a parent URL like "/page" returns an HTML page as its body, the URL should end with a slash so that the relative links inside it work when joined with the parent URL.
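
Why the trailing slash matters can be seen with the standard `net/url` resolution rules. The following is only a sketch: `resolveFrom` and its `addSlash` parameter are hypothetical names, not part of jarink, and the caller is assumed to have already decided that the parent is an extension-less path that returned HTML.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveFrom joins ref with parent using RFC 3986 reference resolution.
// When addSlash is true, a "/" is appended to the parent path first if it is
// missing. Hypothetical helper for illustration only.
func resolveFrom(parent, ref string, addSlash bool) (string, error) {
	base, err := url.Parse(parent)
	if err != nil {
		return "", err
	}
	if addSlash && !strings.HasSuffix(base.Path, "/") {
		base.Path += "/"
	}
	rel, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(rel).String(), nil
}

func main() {
	without, _ := resolveFrom("https://example.com/page", "img/logo.png", false)
	with, _ := resolveFrom("https://example.com/page", "img/logo.png", true)
	fmt.Println(without) // https://example.com/img/logo.png
	fmt.Println(with)    // https://example.com/page/img/logo.png
}
```

Without the slash, the last path segment ("page") is dropped during resolution, so the relative link lands one level too high.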
2026-02-11  brokenlinks: skip processing "mailto:" URL  Shulhan
2026-02-11  brokenlinks: test links that wrapped by other elements  Shulhan
This is to see the behaviour of [Node.Descendants] when traversing the element recursively.
2026-02-11  brokenlinks: check if link has been seen before scan  Shulhan
Given the following queue and its parent,

  /page2.html => /index.html
  /brokenPage => /index.html
  /brokenPage => /page2.html

before scanning the second "/brokenPage" on parent page "/page2.html", check whether it has been seen first to get the status code before we run the scan. This allows jarink to report "/brokenPage" as a broken link for both pages, not just for "/index.html".
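
The check can be sketched as follows, using the queue from the message above. The type `queued`, the function `report`, and the rule that a status of 400 or above counts as broken are assumptions made for this example; `fetch` stands in for the real HTTP request.

```go
package main

import "fmt"

// queued pairs a link with the page it was found on (hypothetical type).
type queued struct {
	url    string
	parent string
}

// report walks the queue in order. A link that was already fetched reuses the
// stored status code instead of being fetched again, so it can still be
// reported for every parent page that references it.
func report(queue []queued, fetch func(url string) int) map[string][]string {
	seen := map[string]int{} // URL -> status code from the first fetch
	broken := map[string][]string{}
	for _, q := range queue {
		status, ok := seen[q.url]
		if !ok {
			status = fetch(q.url)
			seen[q.url] = status
		}
		if status >= 400 { // assumption: 4xx and 5xx count as broken
			broken[q.parent] = append(broken[q.parent], q.url)
		}
	}
	return broken
}

// demo runs the queue from the commit message and also counts the fetches.
func demo() (map[string][]string, int) {
	fetches := 0
	fetch := func(u string) int {
		fetches++
		if u == "/brokenPage" {
			return 404
		}
		return 200
	}
	queue := []queued{
		{url: "/page2.html", parent: "/index.html"},
		{url: "/brokenPage", parent: "/index.html"},
		{url: "/brokenPage", parent: "/page2.html"},
	}
	return report(queue, fetch), fetches
}

func main() {
	broken, fetches := demo()
	fmt.Println(broken, fetches)
}
```

In this sketch "/brokenPage" is fetched once but listed under both "/index.html" and "/page2.html".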
2026-02-05  brokenlinks: check for redirect during scan  Shulhan
If the request is redirected, use the "Location" value in the response header as the parent URL instead of the original link in the queue.
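
One way to read the "Location" value with the standard library is to stop the client at the first response. This is a sketch under that assumption, not necessarily how jarink does it; `parentAfterRedirect` and `newDemoServer` are hypothetical names, and the demo server exists only to make the example runnable.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// parentAfterRedirect fetches link without following redirects. For a
// redirect response it returns the resolved "Location" value; otherwise it
// returns link unchanged.
func parentAfterRedirect(link string) (string, error) {
	client := &http.Client{
		// Stop at the first response so its Location header can be inspected.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(link)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusMovedPermanently, http.StatusFound, http.StatusSeeOther,
		http.StatusTemporaryRedirect, http.StatusPermanentRedirect:
		// Location resolves a relative header value against the request URL.
		loc, err := resp.Location()
		if err != nil {
			return "", err
		}
		return loc.String(), nil
	}
	return link, nil
}

// newDemoServer returns a local test server where /old redirects to /new/.
func newDemoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/old", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/new/", http.StatusMovedPermanently)
	})
	mux.HandleFunc("/new/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html></html>")
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newDemoServer()
	defer srv.Close()
	parent, err := parentAfterRedirect(srv.URL + "/old")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parent) // the server URL followed by /new/
}
```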
2026-02-04  brokenlinks: fix generating relative URL  Shulhan
If the parent URL ends with .html or .htm, join the relative path with the directory of the parent instead of with the current path.
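
A minimal sketch of that rule, assuming the relative link is a plain path with no scheme, query, or fragment. The function name `joinRelative` is hypothetical and does not come from the jarink source.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// joinRelative joins rel with the parent URL. If the parent path ends with
// .html or .htm, the file name is dropped first so that rel is joined with
// the parent's directory.
func joinRelative(parent, rel string) (string, error) {
	base, err := url.Parse(parent)
	if err != nil {
		return "", err
	}
	dir := base.Path
	switch strings.ToLower(path.Ext(base.Path)) {
	case ".html", ".htm":
		dir = path.Dir(base.Path)
	}
	base.Path = path.Join("/", dir, rel)
	base.RawQuery = "" // the parent's query and fragment do not apply to rel
	base.Fragment = ""
	return base.String(), nil
}

func main() {
	fromFile, _ := joinRelative("https://example.com/docs/index.html", "img/a.png")
	fromDir, _ := joinRelative("https://example.com/docs", "img/a.png")
	fmt.Println(fromFile) // https://example.com/docs/img/a.png
	fmt.Println(fromDir)  // https://example.com/docs/img/a.png
}
```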
2026-02-04  brokenlinks: skip parsing non-HTML page  Shulhan
If the response Content-Type is anything other than "text/html", skip parsing the content and return immediately.
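
The Content-Type header often carries parameters such as a charset, so a plain string comparison is fragile. A small sketch using `mime.ParseMediaType` from the standard library; the helper name `isHTML` is an assumption for this example.

```go
package main

import (
	"fmt"
	"mime"
)

// isHTML reports whether a Content-Type header value denotes "text/html",
// ignoring parameters such as charset. A missing or malformed value is
// treated as non-HTML.
func isHTML(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/html"
}

func main() {
	fmt.Println(isHTML("text/html; charset=utf-8")) // true
	fmt.Println(isHTML("image/png"))                // false
}
```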
2026-02-04  brokenlinks: mark the link in queue as seen with status code 0  Shulhan
This fixes the same URL being pushed to the queue twice.
2026-01-22  all: mark and skip the slow test  Shulhan
TestScan_slow takes around 11 seconds because the test includes [time.Sleep].
2026-01-22  brokenlinks: improve fetch logging and decrease timeout to 10 seconds  Shulhan
When fetching, print the log after the fetch completes. On success, print the URL along with the HTTP status code. On failure, print the URL along with the error. The timeout is now reduced to 10 seconds to prevent long delays when working with a broken website.
2026-01-22  brokenlinks: print the progress to stderr  Shulhan
Each time the scan starts, a new link is added to the queue, or fetching starts, print a message to stderr. This removes the verbose option for a better user experience.
2026-01-22  all: refactoring, use single struct to represent Link  Shulhan
Previously, we had [jarink.Link], [brokenlinks.Broken], and [brokenlinks.linkQueue] to store the metadata for a link. These changes unify them into the struct [jarink.Link].
2026-01-22  brokenlinks: refactoring the logic, simplify the code  Shulhan
Previously, we made the scan logic run in multiple goroutines, with one channel to push and consume the result and another channel to push and pop links to be processed. That logic was very complicated, making it hard to read and debug. These changes refactor it to use a single goroutine that pushes and pops links to and from a slice used as a queue.
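
The single-goroutine design can be sketched with a slice as the queue and a map that marks a link as seen, with status code 0, at the moment it is pushed. Everything here is illustrative: `page`, `crawl`, and `demoSite` are hypothetical, and the in-memory map stands in for fetching and parsing real pages.

```go
package main

import "fmt"

// page is an in-memory stand-in for a fetched page (hypothetical type).
type page struct {
	status int
	links  []string
}

// crawl visits every link reachable from start in a single goroutine.
// It returns the visit order and the status code recorded for each URL.
func crawl(start string, site map[string]page) ([]string, map[string]int) {
	seen := map[string]int{start: 0} // 0 means queued but not yet fetched
	queue := []string{start}
	var order []string
	for len(queue) > 0 {
		current := queue[0] // pop from the front of the slice
		queue = queue[1:]

		p, ok := site[current]
		if !ok {
			p.status = 404 // unknown pages are treated as not found
		}
		seen[current] = p.status
		order = append(order, current)

		for _, next := range p.links {
			if _, queued := seen[next]; queued {
				continue // already queued or fetched, do not push twice
			}
			seen[next] = 0
			queue = append(queue, next) // push to the back of the slice
		}
	}
	return order, seen
}

// demoSite returns a small site with a cycle and one missing page.
func demoSite() map[string]page {
	return map[string]page{
		"/":  {status: 200, links: []string{"/a", "/b"}},
		"/a": {status: 200, links: []string{"/b", "/missing"}},
		"/b": {status: 200, links: []string{"/", "/a"}},
	}
}

func main() {
	order, seen := crawl("/", demoSite())
	fmt.Println(order) // [/ /a /b /missing]
	fmt.Println(seen["/missing"])
}
```

Because the map is updated at push time, "/b" is queued only once even though both "/" and "/a" link to it.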
2025-11-20  brokenlinks: fix infinite loop on unknown host  Shulhan
On a link with an invalid domain, the scan should break and return the error immediately.
2025-06-27  brokenlinks: reduce the number of goroutines on scan  Shulhan
Previously, each scan ran on one goroutine and the result was pushed using pushResult, also in its own goroutine. This made one link consume two goroutines. This changes the scan function to return the result and push it in the same goroutine.
2025-06-27  brokenlinks: implement caching for external URLs  Shulhan
Any successful fetch of an external URL is recorded in the jarink cache file, located in the user's home cache directory. For example, on Linux it would be `$HOME/.cache/jarink/cache.json`. This helps speed up future rescans of the same or a different target URL by minimizing network requests.
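
A rough sketch of such a cache file using `os.UserCacheDir`, which returns `$HOME/.cache` on Linux unless `XDG_CACHE_HOME` is set. The JSON layout (a map from URL to status code) and the helper names are assumptions for this example; the real cache format is not described in this log.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// cacheFile returns the cache path below the given base cache directory.
func cacheFile(baseDir string) string {
	return filepath.Join(baseDir, "jarink", "cache.json")
}

// save writes the cache as JSON, creating the directory if needed.
// The URL -> status code layout is hypothetical.
func save(file string, cache map[string]int) error {
	if err := os.MkdirAll(filepath.Dir(file), 0o700); err != nil {
		return err
	}
	data, err := json.MarshalIndent(cache, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(file, data, 0o600)
}

// load reads the cache. A missing file yields an empty cache, not an error.
func load(file string) (map[string]int, error) {
	cache := map[string]int{}
	data, err := os.ReadFile(file)
	if errors.Is(err, os.ErrNotExist) {
		return cache, nil
	}
	if err != nil {
		return nil, err
	}
	if err := json.Unmarshal(data, &cache); err != nil {
		return nil, err
	}
	return cache, nil
}

// roundTrip saves and reloads one entry inside a temporary directory.
func roundTrip() (map[string]int, error) {
	dir, err := os.MkdirTemp("", "jarink-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	file := cacheFile(dir)
	if err := save(file, map[string]int{"https://example.org/": 200}); err != nil {
		return nil, err
	}
	return load(file)
}

func main() {
	if base, err := os.UserCacheDir(); err == nil {
		fmt.Println(cacheFile(base))
	}
	cache, err := roundTrip()
	fmt.Println(cache, err)
}
```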
2025-06-19  all: add test cases for simulating slow server  Shulhan
The test runs a server whose pages call [time.Sleep] with various durations before returning the response. This allows us to see how the main scan loop works while waiting for resultq and listWaitStatus.
2025-06-17  brokenlinks: add test cases for IgnoreStatus options  Shulhan
There are two test cases: one for an invalid status code like "abc", and one for an unknown status code like "50".
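
The difference between the two cases can be sketched as follows. This assumes the option value is a comma-separated list, which this log does not state, and the function name and error messages are hypothetical. `http.StatusText` returns an empty string for codes it does not know, which gives a simple test for "unknown".

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// parseIgnoreStatus parses a comma-separated list of HTTP status codes.
// A non-numeric value is invalid; a number without a registered status
// text is unknown.
func parseIgnoreStatus(value string) ([]int, error) {
	var codes []int
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		code, err := strconv.Atoi(field)
		if err != nil {
			return nil, fmt.Errorf("invalid status code %q", field)
		}
		if http.StatusText(code) == "" {
			return nil, fmt.Errorf("unknown status code %q", field)
		}
		codes = append(codes, code)
	}
	return codes, nil
}

func main() {
	fmt.Println(parseIgnoreStatus("403, 404"))
	fmt.Println(parseIgnoreStatus("abc"))
	fmt.Println(parseIgnoreStatus("50"))
}
```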
2025-06-16  brokenlinks: update comment on test case with path  Shulhan
2025-06-16  brokenlinks: move parsing scanned Url from worker to Options  Shulhan
Before the Options are passed to the worker, they should be valid, including the URL to be scanned.
2025-06-16  brokenlinks: add option "insecure"  Shulhan
The insecure option allows servers with invalid certificates and does not report them as errors.
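
With the standard library, such an option usually maps to `InsecureSkipVerify` in the TLS configuration of the HTTP client. A sketch under that assumption; `newClient` and `skipsVerify` are hypothetical helpers, and the 10 second timeout follows the value mentioned elsewhere in this log.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newClient returns an HTTP client with a 10 second timeout. When insecure
// is true, TLS certificate verification is skipped, so servers with invalid
// certificates are not reported as errors.
func newClient(insecure bool) *http.Client {
	transport := &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		TLSClientConfig: &tls.Config{
			// Disables certificate checks; use only when explicitly requested.
			InsecureSkipVerify: insecure,
		},
	}
	return &http.Client{
		Transport: transport,
		Timeout:   10 * time.Second,
	}
}

// skipsVerify reports whether the client was built with verification off.
func skipsVerify(c *http.Client) bool {
	t, ok := c.Transport.(*http.Transport)
	return ok && t.TLSClientConfig != nil && t.TLSClientConfig.InsecureSkipVerify
}

func main() {
	fmt.Println(skipsVerify(newClient(true)))  // true
	fmt.Println(skipsVerify(newClient(false))) // false
}
```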
2025-06-13  brokenlinks: add option to ignore list HTTP status code  Shulhan
When a link is known to have an issue, one can ignore its status code while scanning for broken links using the "-ignore-status" option.
2025-06-12  all: add SPDX license to testdata files  Shulhan
2025-06-12  all: refactoring, move brokenlinks code to its own package  Shulhan
When two or more structs have the same prefix, it is time to move them into a group. Also, we will group one command into one package in the future.