<feed xmlns='http://www.w3.org/2005/Atom'>
<title>jarink/brokenlinks/worker.go, branch main</title>
<subtitle>Program to inspect and maintain web sites.</subtitle>
<id>http://git.kilabit.info/jarink/atom?h=main</id>
<link rel='self' href='http://git.kilabit.info/jarink/atom?h=main'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/'/>
<updated>2026-02-11T18:07:04Z</updated>
<entry>
<title>brokenlinks: fix possible panic in markAsBroken</title>
<updated>2026-02-11T18:07:04Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-11T18:05:52Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=e4b3383d84cd181f4a9fec1097ec61e583d6f827'/>
<id>urn:sha1:e4b3383d84cd181f4a9fec1097ec61e583d6f827</id>
<content type='text'>
If the Link does not have a parentUrl, set the parent URL to the link's
own URL.

This only happens when the target URL that we are about to scan returns
an error.
</content>
</entry>
<entry>
<title>brokenlinks: store the anchor or image source in link</title>
<updated>2026-02-11T18:04:40Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-11T18:04:40Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=9c7ee77376294e9abd70ca356e26d0ab16ad7466'/>
<id>urn:sha1:9c7ee77376294e9abd70ca356e26d0ab16ad7466</id>
<content type='text'>
In the struct Link, we add the field Value, which stores the href from
an A element or the src from an IMG element.
This allows us to debug any error during the scan, especially when
joining the path and the link.
</content>
</entry>
<entry>
<title>brokenlinks: make link that return HTML always end with slash</title>
<updated>2026-02-11T14:45:06Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-11T03:47:42Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=8100b3be0730173a77f1a64f9ac6bc8862a159ac'/>
<id>urn:sha1:8100b3be0730173a77f1a64f9ac6bc8862a159ac</id>
<content type='text'>
If a parent URL like "/page" returns an HTML page as its body, the URL
should end with a slash so that the relative links inside it work when
joined with the parent URL.
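
A minimal sketch of why the trailing slash matters, using the standard
net/url package.
The helper name resolve and the example.com URLs are assumptions for
illustration only; they are not taken from jarink.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve is a hypothetical helper that joins a relative link with its
// parent URL following the RFC 3986 reference resolution rules.
func resolve(parent, rel string) (string, error) {
	base, err := url.Parse(parent)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(rel)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	// Without the trailing slash, the last path segment "page" is
	// replaced by the relative link.
	noSlash, _ := resolve("http://example.com/page", "img.png")
	fmt.Println(noSlash)

	// With the trailing slash, the relative link is placed under "page/".
	withSlash, _ := resolve("http://example.com/page/", "img.png")
	fmt.Println(withSlash)
}
```

The first call resolves to "http://example.com/img.png" and the second
to "http://example.com/page/img.png".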
</content>
</entry>
<entry>
<title>brokenlinks: skip processing "mailto:" URL</title>
<updated>2026-02-11T14:45:06Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-10T21:45:04Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=c92d49c41e27d50a36fd6eba0824331606627132'/>
<id>urn:sha1:c92d49c41e27d50a36fd6eba0824331606627132</id>
<content type='text'>
</content>
</entry>
<entry>
<title>brokenlinks: check if link has been seen before scan</title>
<updated>2026-02-10T21:38:20Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-10T21:38:20Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=61eb5351087be894a4bfeb71c99346b8065bb7f1'/>
<id>urn:sha1:61eb5351087be894a4bfeb71c99346b8065bb7f1</id>
<content type='text'>
Given the following queue and its parent,

  /page2.html =&gt; /index.html
  /brokenPage =&gt; /index.html
  /brokenPage =&gt; /page2.html

Before scanning the second "/brokenPage" on parent page "/page2.html",
check whether it has been seen first, to get its status code before we
run the scan.

This allows jarink to report "/brokenPage" as a broken link for both
pages, not just for "/index.html".
</content>
</entry>
<entry>
<title>brokenlinks: check for redirect during scan</title>
<updated>2026-02-04T20:29:49Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-04T20:29:49Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=d8a892eb2f28b3ef4c2625c682d255f4f616cae2'/>
<id>urn:sha1:d8a892eb2f28b3ef4c2625c682d255f4f616cae2</id>
<content type='text'>
If the request is redirected, use the "Location" value in the response
header as the parent URL instead of the original link in the queue.
</content>
</entry>
<entry>
<title>brokenlinks: fix generating relative URL</title>
<updated>2026-02-04T15:58:10Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-04T15:10:42Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=fa31e0a656d03fe3744c70a1171e3831647923c9'/>
<id>urn:sha1:fa31e0a656d03fe3744c70a1171e3831647923c9</id>
<content type='text'>
If the parent URL ends with .html or .htm, join the directory of the
parent, instead of the current path, with the relative path.
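
The rule above can be sketched as follows.
The function name joinRelative and the sample paths are assumptions for
illustration; the actual code in worker.go may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// joinRelative is a hypothetical helper.
// If the parent path points to an HTML file, the relative path is joined
// with the directory of the parent instead of the file path itself.
func joinRelative(parentPath, rel string) string {
	base := parentPath
	if strings.HasSuffix(base, ".html") || strings.HasSuffix(base, ".htm") {
		base = path.Dir(base)
	}
	return path.Join(base, rel)
}

func main() {
	fmt.Println(joinRelative("/docs/index.html", "img/logo.png"))
	fmt.Println(joinRelative("/docs/", "img/logo.png"))
}
```

Both calls return "/docs/img/logo.png"; without the suffix check the
first one would return "/docs/index.html/img/logo.png".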
</content>
</entry>
<entry>
<title>brokenlinks: skip parsing non-HTML page</title>
<updated>2026-02-04T14:53:00Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-04T14:53:00Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=b36d6e1f423bc405895d1b72e9a5915c4aa74ecc'/>
<id>urn:sha1:b36d6e1f423bc405895d1b72e9a5915c4aa74ecc</id>
<content type='text'>
If the response Content-Type is anything other than "text/html", skip
parsing the content and return immediately.
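
A minimal sketch of such a check, using mime.ParseMediaType from the
standard library so that parameters like "charset" are ignored.
The function name isHTML is an assumption for illustration; the commit
itself may compare the header differently.

```go
package main

import (
	"fmt"
	"mime"
)

// isHTML is a hypothetical helper that reports whether the value of a
// Content-Type header describes an HTML document.
func isHTML(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// A missing or malformed header is treated as non-HTML.
		return false
	}
	return mediaType == "text/html"
}

func main() {
	fmt.Println(isHTML("text/html; charset=utf-8"))
	fmt.Println(isHTML("image/png"))
}
```

A caller would skip the HTML parser and return early whenever isHTML
reports false.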
</content>
</entry>
<entry>
<title>brokenlinks: mark the link in queue as seen with status code 0</title>
<updated>2026-02-04T10:03:47Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-02-04T10:03:47Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=6b5ed409a5f11ed437586c8b046bcfc43749361d'/>
<id>urn:sha1:6b5ed409a5f11ed437586c8b046bcfc43749361d</id>
<content type='text'>
This fixes the same URL being pushed to the queue twice.
</content>
</entry>
<entry>
<title>brokenlinks: improve fetch logging and decrease timeout to 10 seconds</title>
<updated>2026-01-21T21:14:25Z</updated>
<author>
<name>Shulhan</name>
<email>ms@kilabit.info</email>
</author>
<published>2026-01-21T21:14:25Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/jarink/commit/?id=a559b47dc217793ca7f80121b3ea86f03e47afd3'/>
<id>urn:sha1:a559b47dc217793ca7f80121b3ea86f03e47afd3</id>
<content type='text'>
When fetching, print the log after the fetch has completed.
On success, print the URL along with the HTTP status code.
On failure, print the URL along with the error.

The timeout is now reduced to 10 seconds to prevent long delays when
working with a broken website.
</content>
</entry>
</feed>
