|
If the Link does not have a parentUrl, set the parent URL to the link
URL itself.
This only happens when the target URL being scanned returns an error.
|
|
In the struct Link, we add the field Value that stores the href of an A
element or the src of an IMG element.
This allows us to debug any error during the scan, especially when
joining the path and the link.
|
|
If a parent URL like "/page" returns an HTML page as its body, the URL
should end with a slash so that the relative links inside it resolve
correctly when joined with the parent URL.
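The effect of the trailing slash can be seen with standard RFC 3986 reference resolution as implemented by net/url; this is a sketch for illustration, and resolve is a hypothetical helper name. Without the slash, the last path segment is treated as a file and dropped.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve joins a relative link with its parent URL using RFC 3986
// reference resolution (url.URL.ResolveReference).
func resolve(parent, rel string) string {
	base, err := url.Parse(parent)
	if err != nil {
		return ""
	}
	ref, err := url.Parse(rel)
	if err != nil {
		return ""
	}
	return base.ResolveReference(ref).String()
}

func main() {
	fmt.Println(resolve("http://example.com/page", "sub.html"))  // http://example.com/sub.html
	fmt.Println(resolve("http://example.com/page/", "sub.html")) // http://example.com/page/sub.html
}
```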
|
|
|
|
Given the following queue entries and their parent pages,
/page2.html => /index.html
/brokenPage => /index.html
/brokenPage => /page2.html
Before scanning the second "/brokenPage" on the parent page "/page2.html",
check whether it has already been seen, to get its status code before we
run the scan.
This allows jarink to report "/brokenPage" as a broken link on both pages,
not just on "/index.html".
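A sketch of the idea, assuming a hypothetical queueItem type and a fetch callback standing in for the real HTTP request. A seen map keeps the status of every fetched URL, so the second "/brokenPage" entry is reported for "/page2.html" without fetching it again.

```go
package main

import "fmt"

// queueItem pairs a link with the page it was found on (hypothetical).
type queueItem struct {
	url, parent string
}

// scanQueue walks the queue once. seen caches the status code of every
// URL already fetched, so a URL found on a second parent is still
// reported as broken for that parent without a second request.
func scanQueue(queue []queueItem, fetch func(string) int) map[string][]string {
	seen := map[string]int{}
	broken := map[string][]string{} // parent page -> broken links on it
	for _, it := range queue {
		status, ok := seen[it.url]
		if !ok {
			status = fetch(it.url)
			seen[it.url] = status
		}
		if status >= 400 {
			broken[it.parent] = append(broken[it.parent], it.url)
		}
	}
	return broken
}

func main() {
	queue := []queueItem{
		{"/page2.html", "/index.html"},
		{"/brokenPage", "/index.html"},
		{"/brokenPage", "/page2.html"},
	}
	fetch := func(u string) int {
		if u == "/brokenPage" {
			return 404
		}
		return 200
	}
	fmt.Println(scanQueue(queue, fetch))
}
```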
|
|
If the request is redirected, use the "Location" value in the response
header as the parent URL instead of the original link in the queue.
|
|
If the parent URL ends with .html or .htm, join the relative path with
the directory of the parent instead of with the current path.
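A sketch of this rule on URL paths only, using the standard path package; joinRelative is a hypothetical helper and the real code may operate on full URLs instead.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// joinRelative joins a relative link with the path of its parent page.
// A parent ending in .html or .htm is a file, so the link is resolved
// against its directory; any other parent is treated as a directory.
func joinRelative(parent, rel string) string {
	if strings.HasSuffix(parent, ".html") || strings.HasSuffix(parent, ".htm") {
		parent = path.Dir(parent)
	}
	return path.Join(parent, rel)
}

func main() {
	fmt.Println(joinRelative("/docs/index.html", "page2.html")) // /docs/page2.html
	fmt.Println(joinRelative("/docs/guide", "page2.html"))      // /docs/guide/page2.html
}
```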
|
|
If the response Content-Type is anything other than "text/html", skip
parsing the content and return immediately.
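A sketch of the check, assuming the header value is parsed with mime.ParseMediaType so that parameters such as charset, and differences in letter case, do not affect the result; isHTML is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
)

// isHTML reports whether a Content-Type header value denotes an HTML
// document. Parameters such as "charset" are ignored, and the media type
// is compared in lower case.
func isHTML(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	return err == nil && mediaType == "text/html"
}

func main() {
	for _, ct := range []string{"text/html; charset=utf-8", "image/png", ""} {
		fmt.Printf("%q -> parse body: %v\n", ct, isHTML(ct))
	}
}
```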
|
|
This fixes the same URL being pushed to the queue twice.
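A minimal sketch of a queue that refuses duplicates, assuming URLs are compared as plain strings; the queue type and push method are hypothetical, not jarink's actual code.

```go
package main

import "fmt"

// queue is a hypothetical FIFO of URLs that remembers every URL ever
// pushed, so the same URL is never queued twice.
type queue struct {
	items  []string
	queued map[string]bool
}

// push appends u unless it was pushed before, and reports whether it was added.
func (q *queue) push(u string) bool {
	if q.queued == nil {
		q.queued = make(map[string]bool)
	}
	if q.queued[u] {
		return false
	}
	q.queued[u] = true
	q.items = append(q.items, u)
	return true
}

func main() {
	var q queue
	fmt.Println(q.push("/a"), q.push("/b"), q.push("/a"), q.items)
}
```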
|
|
When fetching, print the log after the fetch has completed.
On success, print the URL along with the HTTP status code.
On failure, print the URL along with the error.
The timeout is now reduced to 10 seconds to prevent long delays when
working with a broken website.
|
|
Each time a scan starts, a new link is added to the queue, or a fetch
starts, print a message to stderr.
This removes the verbose option for a better user experience.
|
|
Previously, we had [jarink.Link], [brokenlinks.Broken], and
[brokenlinks.linkQueue] to store the metadata for a link.
These changes unify them into a single struct, [jarink.Link].
|
|
Previously, the scan logic ran in multiple goroutines, with one channel
to push and consume the results and another channel to push and pop the
links to be processed.
The logic was very complicated, which made it hard to read and debug.
These changes refactor it to use a single goroutine that pushes and pops
links to and from a slice used as a queue.
|
|
If a link has an invalid domain, the scan should stop and return the
error immediately.
|
|
Previously, each scan ran in its own goroutine, and its result was
pushed by pushResult in another goroutine.
This made each link consume two goroutines.
These changes make the scan function return its result and push it from
the same goroutine.
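A sketch of the one-goroutine-per-link shape, assuming a hypothetical result type and a scan stub; the channel is named resultq only for illustration. Each goroutine calls scan and sends the returned value itself, so no second goroutine is needed to push it.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical scan result for one link.
type result struct {
	url    string
	status int
}

// scan stands in for fetching one link; it returns its result instead of
// handing it to a separate goroutine.
func scan(u string) result {
	return result{url: u, status: 200}
}

// scanAll starts one goroutine per link; each one scans and then sends its
// own result, so a link costs one goroutine instead of two.
func scanAll(urls []string) []result {
	resultq := make(chan result, len(urls))
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resultq <- scan(u) // scan and push in the same goroutine
		}(u)
	}
	wg.Wait()
	close(resultq)
	var out []result
	for r := range resultq {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(len(scanAll([]string{"/a", "/b", "/c"})))
}
```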
|
|
Any successful fetch of an external URL is recorded in the jarink cache
file, located in the user's cache directory.
For example, on Linux it would be `$HOME/.cache/jarink/cache.json`.
This speeds up future rescans of the same or a different target URL by
minimizing network requests.
|
|
The test runs a server with several pages, each waiting for a different
[time.Sleep] duration before returning the response.
This allows us to see how the main scan loop works while waiting for
resultq and listWaitStatus.
|
|
Before the Options are passed to the worker, they should be validated,
including the URL to be scanned.
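A sketch of validating options up front, assuming a hypothetical Options type with only a Url field and the rule that the URL must be absolute with an http or https scheme.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Options is a hypothetical subset of the scan options.
type Options struct {
	Url string
}

// validate checks the options before they reach the worker; the URL must
// parse, be absolute, and use http or https.
func (opts *Options) validate() error {
	u, err := url.Parse(opts.Url)
	if err != nil {
		return fmt.Errorf("invalid URL %q: %w", opts.Url, err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return errors.New("URL must be absolute, with an http or https scheme")
	}
	return nil
}

func main() {
	fmt.Println((&Options{Url: "https://example.com/"}).validate())
	fmt.Println((&Options{Url: "example.com/page"}).validate())
}
```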
|
|
The insecure option allows scanning servers with invalid certificates
without reporting them as errors.
|
|
When a link is known to have issues, its status code can be ignored
while scanning for broken links by using the "-ignore-status" option.
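A sketch of how an ignore list could be applied, assuming the option value is a comma-separated list of status codes such as "403,429"; that format and the helper names are assumptions, not the documented syntax of "-ignore-status".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIgnoreStatus turns a value such as "403,429" into a set of status
// codes (assumed comma-separated format).
func parseIgnoreStatus(v string) (map[int]bool, error) {
	set := map[int]bool{}
	for _, f := range strings.Split(v, ",") {
		f = strings.TrimSpace(f)
		if f == "" {
			continue
		}
		code, err := strconv.Atoi(f)
		if err != nil || code < 100 || code > 599 {
			return nil, fmt.Errorf("invalid status code %q", f)
		}
		set[code] = true
	}
	return set, nil
}

// isBroken reports a link as broken unless its status code is ignored.
func isBroken(status int, ignored map[int]bool) bool {
	return status >= 400 && !ignored[status]
}

func main() {
	ignored, err := parseIgnoreStatus("403,429")
	fmt.Println(err, isBroken(403, ignored), isBroken(404, ignored))
}
```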
|
|
When two or more structs share the same prefix, it is time to move them
into their own package.
Also, in the future we will group each command into its own package.
|