author Shulhan <ms@kilabit.info> 2026-02-12 09:04:54 +0700
committer Shulhan <ms@kilabit.info> 2026-02-12 09:05:42 +0700
commit 23bbad5b40b5940ca3dcb7839d09c0462cf6c45d
tree 9c9623ba274ec5e17c4c9f62b93cb12397418e00
parent 3aaf1dd2a070f04c81c58681b32186abb3a956b9
Release jarink 0.3.0 (2026-02-12) [HEAD, v0.3.0, main, dev]
**🌼 brokenlinks: refactoring the logic, simplify the code**

Previously, the scan logic ran in multiple goroutines, with one channel to push and consume the results and another channel to push and pop the links to be processed. The logic was very complicated, making it hard to read and debug.

These changes refactor it to use a single goroutine that pushes and pops links to and from a slice used as a queue.

Another refactoring is in where we store the link. Previously, we had [jarink.Link], [brokenlinks.Broken], and [brokenlinks.linkQueue] to store the metadata for a link. This release unifies them into the struct [jarink.Link].

**🌱 brokenlinks: print the progress to stderr**

Each time a scan starts, a new link is added to the queue, or a fetch starts, print a message to stderr. This removes the verbose option for a better user experience.

**🌼 brokenlinks: improve fetch logging and decrease timeout to 10s**

When fetching, print a log after the fetch completes. On success, print the URL along with the HTTP status code. On failure, print the URL along with the error.

The timeout is now reduced to 10 seconds to prevent long delays when working with a broken website.

**🌼 brokenlinks: mark the link in queue as seen with status code 0**

This fixes the same URL being pushed to the queue twice.

Given the following queue and its parents,

----
/page2.html => /index.html
/brokenPage => /index.html
/brokenPage => /page2.html
----

Before scanning the second "/brokenPage" on parent page "/page2.html", check whether it has been seen, to get its status code before the scan. This allows jarink to report "/brokenPage" as a broken link for both pages, not just for "/index.html".

**🌼 brokenlinks: skip parsing non-HTML page**

If the response Content-Type is anything other than "text/html", skip parsing the content and return immediately.

We also skip processing "mailto\:" URLs.
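The single-goroutine queue and the "seen with status code 0" entry above can be illustrated with a minimal sketch. It is not the code from this commit: the `Link` struct here is a simplified stand-in for `jarink.Link`, and `scanner`, `push`, `run`, and the `fetch` callback are hypothetical names chosen for the example. It shows one plausible reading of the behaviour, in which every queued URL is recorded with status 0 and a URL whose status is already known is not fetched again.

```go
package main

import "fmt"

// Link is a simplified, hypothetical stand-in for jarink.Link.
type Link struct {
	URL       string
	ParentURL string
	Status    int
}

// scanner holds the queue and the status of every URL seen so far.
// A status of 0 means "queued, not fetched yet".
type scanner struct {
	queue []Link
	seen  map[string]int
}

func newScanner() *scanner {
	return &scanner{seen: make(map[string]int)}
}

// push appends the link to the queue and, if the URL is new, marks it
// as seen with status 0.
func (s *scanner) push(l Link) {
	if _, ok := s.seen[l.URL]; !ok {
		s.seen[l.URL] = 0
	}
	s.queue = append(s.queue, l)
}

// run pops links one by one in a single goroutine.
// The fetch callback is an assumption; it returns the HTTP status of a URL.
func (s *scanner) run(fetch func(url string) int) (broken []Link) {
	for len(s.queue) > 0 {
		link := s.queue[0]
		s.queue = s.queue[1:]

		status := s.seen[link.URL]
		if status == 0 {
			// Not fetched yet: fetch once and remember the result.
			status = fetch(link.URL)
			s.seen[link.URL] = status
		}
		link.Status = status
		if status >= 400 {
			broken = append(broken, link)
		}
	}
	return broken
}

func main() {
	fetched := 0
	fetch := func(url string) int {
		fetched++
		if url == "/brokenPage" {
			return 404
		}
		return 200
	}

	s := newScanner()
	s.push(Link{URL: "/page2.html", ParentURL: "/index.html"})
	s.push(Link{URL: "/brokenPage", ParentURL: "/index.html"})
	s.push(Link{URL: "/brokenPage", ParentURL: "/page2.html"})

	for _, b := range s.run(fetch) {
		fmt.Printf("%s is broken on %s (status %d)\n", b.URL, b.ParentURL, b.Status)
	}
	fmt.Println("fetches:", fetched)
}
```

With the three queue entries from the example, the sketch reports "/brokenPage" as broken for both "/index.html" and "/page2.html" while calling `fetch` only twice, because the second "/brokenPage" entry reuses the stored status.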
**🌼 brokenlinks: make link that return HTML always end with slash**

If a parent URL such as "/page" returns an HTML page as its body, the URL should end with a slash so that the relative links inside it resolve correctly when joined with the parent URL.

**🌱 brokenlinks: store the anchor or image source in link**

In the struct `Link`, we add the field `Value`, which stores the `href` from an A element or the `src` from an IMG element. This allows us to debug any error during the scan, especially when joining path and link.

**🌼 brokenlinks: fix possible panic in markAsBroken**

If the Link does not have a `parentUrl`, set the parent URL to the link URL itself. This only happens if the target URL to be scanned returns an error.
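The trailing-slash entry above is easier to see with a small example of how a relative reference resolves against a parent URL. The sketch below uses the standard `net/url` package; whether jarink joins links this way is an assumption, and the `resolve` helper and the `example.org` URLs are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve joins a relative reference found in a page with the URL of
// that page, following the standard URL reference resolution rules.
// It is a hypothetical helper, not part of jarink.
func resolve(parent, ref string) (string, error) {
	base, err := url.Parse(parent)
	if err != nil {
		return "", err
	}
	joined, err := base.Parse(ref)
	if err != nil {
		return "", err
	}
	return joined.String(), nil
}

func main() {
	parents := []string{
		"https://example.org/page",  // no trailing slash
		"https://example.org/page/", // with trailing slash
	}
	for _, parent := range parents {
		got, err := resolve(parent, "image.png")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s + image.png => %s\n", parent, got)
	}
}
```

Without the trailing slash, the last path segment "page" is replaced and the result is `https://example.org/image.png`; with the slash, the result is `https://example.org/page/image.png`, which is where a page served as a directory index would expect its relative links to point.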
-rw-r--r-- CHANGELOG.adoc 73
-rw-r--r-- jarink.go 2
2 files changed, 71 insertions, 4 deletions
diff --git a/CHANGELOG.adoc b/CHANGELOG.adoc
index 9d08249..499206d 100644
--- a/CHANGELOG.adoc
+++ b/CHANGELOG.adoc
@@ -1,5 +1,5 @@
-// SPDX-FileCopyrightText: 2025 M. Shulhan <ms@kilabit.info>
// SPDX-License-Identifier: GPL-3.0-only
+// SPDX-FileCopyrightText: 2025 M. Shulhan <ms@kilabit.info>
= jarink releases changelog
:sectanchors:
@@ -10,12 +10,14 @@ The latest release log is put on the top.
Legend,
+* 🪵: Breaking changes
* 🌱: New feature
* 🌼: Enhancement
* 💧: Chores
-[#jarink_v0_2_2]
-== jarink 0.2.2 (2026-xx-xx)
+
+[#jarink_v0_3_0]
+== jarink 0.3.0 (2026-02-12)
**🌼 brokenlinks: refactoring the logic, simplify the code**
@@ -27,6 +29,69 @@ The logic is a very complicated code, making it hard to read and debug.
These changes refactor it to use a single goroutine that pushes and pops
links to and from a slice used as a queue.
+Another refactoring is in where we store the link.
+Previously, we had [jarink.Link], [brokenlinks.Broken], and
+[brokenlinks.linkQueue] to store the metadata for a link.
+This release unifies them into the struct [jarink.Link].
+
+**🌱 brokenlinks: print the progress to stderr**
+
+Each time a scan starts, a new link is added to the queue, or a fetch
+starts, print a message to stderr.
+This removes the verbose option for a better user experience.
+
+**🌼 brokenlinks: improve fetch logging and decrease timeout to 10s**
+
+When fetching, print a log after the fetch completes.
+On success, print the URL along with the HTTP status code.
+On failure, print the URL along with the error.
+
+The timeout is now reduced to 10 seconds to prevent long delays when
+working with a broken website.
+
+**🌼 brokenlinks: mark the link in queue as seen with status code 0**
+
+This fixes the same URL being pushed to the queue twice.
+
+Given the following queue and its parents,
+
+----
+/page2.html => /index.html
+/brokenPage => /index.html
+/brokenPage => /page2.html
+----
+
+Before scanning the second "/brokenPage" on parent page "/page2.html",
+check whether it has been seen, to get its status code before the scan.
+This allows jarink to report "/brokenPage" as a broken link for both
+pages, not just for "/index.html".
+
+**🌼 brokenlinks: skip parsing non-HTML page**
+
+If the response Content-Type is anything other than "text/html", skip
+parsing the content and return immediately.
+
+We also skip processing "mailto\:" URLs.
+
+**🌼 brokenlinks: make link that return HTML always end with slash**
+
+If a parent URL such as "/page" returns an HTML page as its body, the
+URL should end with a slash so that the relative links inside it resolve
+correctly when joined with the parent URL.
+
+**🌱 brokenlinks: store the anchor or image source in link**
+
+In the struct `Link`, we add the field `Value`, which stores the `href`
+from an A element or the `src` from an IMG element.
+This allows us to debug any error during the scan, especially when
+joining path and link.
+
+**🌼 brokenlinks: fix possible panic in markAsBroken**
+
+If the Link does not have a `parentUrl`, set the parent URL to the link
+URL itself.
+This only happens if the target URL to be scanned returns an error.
+
[#jarink_v0_2_1]
== jarink 0.2.1 (2025-12-27)
@@ -85,3 +150,5 @@ Print the page that being scanned to standard error.
Scan only the pages reported by the result of a past scan, based
on the content of the JSON file.
This minimizes the time to re-scan the pages once we have fixed the URLs.
+
+// vim: textwidth=72:
diff --git a/jarink.go b/jarink.go
index 5c46e4e..a4c4189 100644
--- a/jarink.go
+++ b/jarink.go
@@ -8,7 +8,7 @@ import (
)
// Version of jarink program and module.
-var Version = `0.2.2`
+var Version = `0.3.0`
// GoEmbedReadme embed the README for showing the usage of program.
//