Age  Commit message  Author
2025-06-01  all: rename the program and repository into jarink  Shulhan
Jarink is a program that helps web administrators maintain their websites. Currently it provides a command to scan for broken links.
2025-05-31  all: record an error due to broken link in HTML anchor or image  Shulhan
2025-05-31  all: record the error when checking the links  Shulhan
The error message can help the user debug problems with links.
2025-05-31  cmd/deadlinks: print the result in JSON  Shulhan
Using JSON as the output format lets other tools parse the result.
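A minimal sketch of what JSON output could look like. The log does not show the actual schema, so the `Broken` type, its field names, and the page-to-links map are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Broken is a hypothetical record for one broken link. The field names
// are assumptions, not the program's actual output schema.
type Broken struct {
	Link  string `json:"link"`
	Code  int    `json:"code"`
	Error string `json:"error,omitempty"`
}

// encodeResult renders a scan result, keyed by page URL, as indented JSON
// so that other tools can parse it.
func encodeResult(result map[string][]Broken) ([]byte, error) {
	return json.MarshalIndent(result, "", "  ")
}

func main() {
	result := map[string][]Broken{
		"http://127.0.0.1:11836/page2": {
			{Link: "http://127.0.0.1:11836/missing.png", Code: 404},
		},
	}
	out, err := encodeResult(result)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```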
2025-05-31  all: check list of waiting status after processing result  Shulhan
After every result from the scan has been checked against the seen list, check the links that are still waiting for a status in a second loop.
2025-05-31  all: use HTTP method HEAD to check external domains  Shulhan
For a link that is not on the domain being scanned, use the HTTP method HEAD to minimize the data transferred.
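The choice of method can be sketched as a host comparison. The function name `methodFor` and the decision to compare `url.URL.Host` are assumptions; the log does not show how the program decides that a link is external.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// methodFor returns GET for a link on the scanned host, whose body has to
// be read to discover further links, and HEAD for any other host, where
// only the status matters. (Hypothetical helper, not the program's API.)
func methodFor(baseURL, link string) (string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	target, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	if target.Host == base.Host {
		return http.MethodGet, nil
	}
	return http.MethodHead, nil
}

func main() {
	base := "http://127.0.0.1:11836"
	links := []string{
		"http://127.0.0.1:11836/page2",
		"https://example.com/about",
	}
	for _, link := range links {
		method, err := methodFor(base, link)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(method, link)
	}
}
```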
2025-05-31  all: use case with ticker instead of default case in run  Shulhan
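The run loop itself is not shown in this log, so the following is only an assumed shape of the pattern: a `select` that blocks on a ticker case rather than falling through a `default:` branch, which would make the loop spin without sleeping. The names `drain` and `pollInterval` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// pollInterval is an arbitrary value chosen for this sketch.
const pollInterval = 10 * time.Millisecond

// drain collects values from results until the channel is closed or has
// been idle for one full tick. The select blocks until either case is
// ready, so the goroutine sleeps between events instead of busy-looping.
func drain(results <-chan string, interval time.Duration) []string {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var got []string
	idle := false
	for {
		select {
		case r, ok := <-results:
			if !ok {
				return got
			}
			got = append(got, r)
			idle = false
		case <-ticker.C:
			if idle {
				// Nothing arrived since the previous tick.
				return got
			}
			idle = true
		}
	}
}

func main() {
	results := make(chan string, 3)
	for _, u := range []string{"/", "/page2", "/missing"} {
		results <- u
	}
	fmt.Println(drain(results, pollInterval))
}
```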
2025-05-31  all: refactoring the scan work without sync.Mutex  Shulhan
When a goroutine processes a link, the result is passed to the main goroutine through a channel. The main goroutine then processes the results one by one, checking whether each link has been seen, has an error, or needs to be scanned. That way, we don't need a mutex to guard whether a link has been seen.
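A minimal sketch of this design, assuming a hypothetical `fetch` callback that stands in for the HTTP request and HTML parsing. Workers only send on the channel; the `seen` map is read and written by the main goroutine alone, which is why no mutex is needed. The names `crawl` and `result` are illustrative, not taken from the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// result is what a worker goroutine reports back for one page.
type result struct {
	url   string
	links []string
}

// crawl visits every page reachable from start and returns the sorted
// list of URLs it saw.
func crawl(start string, fetch func(string) []string) []string {
	results := make(chan result)
	seen := map[string]bool{start: true} // touched only by this goroutine
	pending := 1                         // workers that have not reported yet

	go func() { results <- result{url: start, links: fetch(start)} }()

	for pending > 0 {
		res := <-results
		pending--
		for _, link := range res.links {
			if seen[link] {
				continue
			}
			seen[link] = true
			pending++
			go func(u string) {
				results <- result{url: u, links: fetch(u)}
			}(link)
		}
	}

	out := make([]string, 0, len(seen))
	for u := range seen {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}

func main() {
	// A tiny in-memory site used in place of real HTTP requests.
	site := map[string][]string{
		"/":      {"/page2", "/about"},
		"/page2": {"/", "/about"},
	}
	fmt.Println(crawl("/", func(u string) []string { return site[u] }))
}
```

Each worker sends exactly one result, and `pending` counts the workers still outstanding, so the loop ends once every discovered link has reported back.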
2025-05-31  all: use HTTP method HEAD to check for image link  Shulhan
A HEAD response does not include the content of the image, which consumes fewer resources on both ends.
2025-05-30  all: turn off log timestamp during testing  Shulhan
Printing the date and time during testing makes the log lines too long.
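With the standard `log` package, the timestamp prefix is controlled by `log.SetFlags`. The sketch below assumes the standard logger is in use; `quietLog` and `logLine` are hypothetical helpers, and where the repository actually makes this call (for example in `TestMain`) is not shown in the log.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// quietLog removes the date and time prefix from the standard logger and
// returns a function that restores the previous flags.
func quietLog() (restore func()) {
	old := log.Flags()
	log.SetFlags(0)
	return func() { log.SetFlags(old) }
}

// logLine writes msg through the standard logger and returns exactly what
// was emitted, so the effect of the flags can be inspected.
func logLine(msg string) string {
	var buf bytes.Buffer
	old := log.Writer()
	log.SetOutput(&buf)
	defer log.SetOutput(old)
	log.Print(msg)
	return buf.String()
}

func main() {
	restore := quietLog()
	defer restore()
	fmt.Print(logLine("scanning /page2"))
}
```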
2025-05-30  cmd/deadlinks: implement the CLI for deadlinks  Shulhan
The CLI contains one command: scan. It accepts a single argument, the URL to be scanned, and one option, "-verbose".
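A sketch of parsing such a command line with the standard `flag` package. The placement of "-verbose" before the command, the usage string, and the `parseArgs` helper are assumptions; the log only states that the command, the argument, and the option exist.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseArgs parses "[-verbose] scan <URL>" and returns the URL and the
// verbose setting. (Hypothetical helper, not the repository's code.)
func parseArgs(args []string) (target string, verbose bool, err error) {
	fs := flag.NewFlagSet("deadlinks", flag.ContinueOnError)
	fs.BoolVar(&verbose, "verbose", false, "print progress while scanning")
	if err = fs.Parse(args); err != nil {
		return "", false, err
	}
	rest := fs.Args()
	if len(rest) != 2 || rest[0] != "scan" {
		return "", false, errors.New("usage: deadlinks [-verbose] scan <URL>")
	}
	return rest[1], verbose, nil
}

func main() {
	// A real program would pass os.Args[1:]; a fixed list keeps the
	// sketch runnable on its own.
	args := []string{"-verbose", "scan", "http://127.0.0.1:11836"}
	target, verbose, err := parseArgs(args)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("scanning %s (verbose=%t)\n", target, verbose)
}
```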
2025-05-30  all: cleaning up fragment on links  Shulhan
The fragment part of a URL, for example "/page#fragment", should be removed; otherwise it will be indexed as a different URL.
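One way to do this with `net/url` is to clear the fragment fields after parsing. This is a sketch of the idea, not necessarily how the repository implements it; `stripFragment` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
)

// stripFragment returns raw without its "#fragment" part, so that
// "/page" and "/page#fragment" are indexed as the same URL.
func stripFragment(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

func main() {
	for _, raw := range []string{"/page#fragment", "http://127.0.0.1:11836/page2#top"} {
		clean, err := stripFragment(raw)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(raw, "->", clean)
	}
}
```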
2025-05-29  all: change the Scan function parameter to struct ScanOptions  Shulhan
Using a struct allows the parameters to be extended later without changing the signature.
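The pattern can be sketched as follows. Only the names `Scan` and `ScanOptions` come from the commit; the fields, the return type, and the stub body are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ScanOptions groups the parameters for Scan. The fields shown here are
// hypothetical; new ones can be added later without touching callers.
type ScanOptions struct {
	URL       string
	IsVerbose bool
}

// Scan is a stub that only validates its options. Its signature stays the
// same no matter how many fields ScanOptions gains.
func Scan(opts ScanOptions) error {
	if opts.URL == "" {
		return errors.New("scan: empty URL")
	}
	if opts.IsVerbose {
		fmt.Println("scanning", opts.URL)
	}
	return nil
}

func main() {
	err := Scan(ScanOptions{URL: "http://127.0.0.1:11836", IsVerbose: true})
	if err != nil {
		fmt.Println(err)
	}
}
```

Callers that use keyed fields, as in `main` above, keep compiling when a field is added, because omitted fields take their zero value.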
2025-05-29  all: remove returned error from parsing HTML  Shulhan
After checking the code and tests for [html.Parse], there are no actual cases where HTML content will return an error. The only possible error is when reading from the body (io.Reader), and that is also almost impossible.
[html.Parse]: https://go.googlesource.com/net/+/refs/tags/v0.40.0/html/parse.go#2347
2025-05-29  all: add test cases for broken link and invalid URL  Shulhan
2025-05-29  all: ignore HTML page from external domain  Shulhan
Any HTML link to a domain other than the scanned domain should not get parsed. Only check whether the link is valid.
2025-05-29  all: parse only link to HTML page  Shulhan
For a link to an image, we can skip parsing it.
2025-05-29  testdata/web: remove anchor to external website  Shulhan
The test should not require an internet connection to pass.
2025-05-29  all: add case for broken HTML  Shulhan
It turns out that broken HTML still gets parsed by the "net/html" package.
2025-05-29  all: handle case for invalid URL, dead server, and on subpage  Shulhan
Scanning an invalid URL like "127.0.0.1:14594", which has no HTTP scheme, or "http://127.0.0.1:14594" (server not available) should return an error. Scanning a subpage like "http://127.0.0.1:11836/page2" should return the same result as scanning from the base URL "http://127.0.0.1:11836".
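The URL checks can be sketched with `net/url`. Reducing a subpage to its scheme and host is an assumption about how the same result could be obtained; the log does not say how the program does it, and `baseURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL validates raw and returns the scheme and host to scan from.
// A URL without an HTTP scheme, such as "127.0.0.1:14594", is rejected.
func baseURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("invalid URL %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("invalid URL %q: missing http or https scheme", raw)
	}
	if u.Host == "" {
		return "", fmt.Errorf("invalid URL %q: missing host", raw)
	}
	// Drop the path so a subpage yields the same starting point as the root.
	return u.Scheme + "://" + u.Host, nil
}

func main() {
	for _, raw := range []string{"127.0.0.1:14594", "http://127.0.0.1:11836/page2"} {
		base, err := baseURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(raw, "->", base)
	}
}
```

A dead server such as "http://127.0.0.1:14594" passes this check; that failure only surfaces when the first request is made.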
2025-05-27  all: complete the first minimum working implementation  Shulhan
The current implementation covers at least 84% of the cases.
Todo,
* CLI for scan
* add more test cases for 100% coverage, including scan on invalid base URL, scan on invalid HTML page, and scan on invalid href or src image
2025-05-22  deadlinks: a program to scan for dead links on website  Shulhan