Hanzoweb API Documentation
Kick off a Gather
Example Request
$ curl -d "\
> username=petef&\
> key=_your_hanzo_developer_key_&\
> url=http://hobix.com/textile/&\
> tags=textile%20syntax%20cheatsheet&\
> description=Why's%20guide%20to%20Textile&\
> name=Textile%20Reference&\
> scope=context&\
> visibility=public\
> " \
> http://hanzoweb.com/xml/gather/
Example Response
<?xml version="1.0" encoding="UTF-8"?>
<response status="ok">
<!--
The item id can be used to list any other instances already archived.
-->
<item id="">
<!--
The instance id can be used to check the progress of the gather
and to retrieve the new instance of the archive item, once it has
been collected.
-->
<instance id="2678">
<!--
Status is one of:
NEW: queued,
GATHERING: gather in progress,
TRANSFERRING: transferring to archive,
COMPLETED: available for download,
ERROR: error - please wait for this to be retried,
-->
<status>NEW</status>
</instance>
</item>
</response>
Details
Parameter names in bold are required. Parameter values in bold are defaults; values in italics are an informal description of the allowed type of value; values in plain roman text are literal.
| URI: | http://hanzoweb.com/xml/gather/ | |
|---|---|---|
| method: | post | |
| parameters: | | |
| username | = | your hanzoweb username |
| key | = | 40 hex digit developer key |
| url | = | the url you wish to gather |
| tags | = | a space separated list of alphanumeric tags |
| name | = | the title of the site |
| description | = | a short plain text description, comments or notes |
| scope | = | bookmark \| page \| context \| website |
| visibility | = | private \| public |
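The parameter table above can be turned into a small request builder. The following is a minimal Python sketch, not an official client: the function name `build_gather_request` is hypothetical, and treating `username`, `key` and `url` as the required parameters is an assumption of this sketch. It only builds the request; nothing is sent.

```python
import re
from urllib.parse import quote, urlencode
from urllib.request import Request

GATHER_URL = "http://hanzoweb.com/xml/gather/"  # URI from the table above
SCOPES = ("bookmark", "page", "context", "website")
VISIBILITIES = ("private", "public")


def build_gather_request(username, key, url, tags=(), name=None,
                         description=None, scope=None, visibility=None):
    """Build (but do not send) a POST request for the gather endpoint.

    Hypothetical helper: which parameters are mandatory is assumed here.
    """
    # The table describes the key as a 40 hex digit developer key.
    if not re.fullmatch(r"[0-9a-fA-F]{40}", key):
        raise ValueError("key must be 40 hex digits")
    if scope is not None and scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if visibility is not None and visibility not in VISIBILITIES:
        raise ValueError(f"unknown visibility: {visibility!r}")

    fields = {
        "username": username,
        "key": key,
        "url": url,
        "tags": " ".join(tags) or None,  # space separated list of tags
        "name": name,
        "description": description,
        "scope": scope,
        "visibility": visibility,
    }
    # Leave out unset optional fields so the server can apply its defaults.
    present = {k: v for k, v in fields.items() if v is not None}
    # quote_via=quote encodes spaces as %20, as in the curl example.
    body = urlencode(present, quote_via=quote)
    return Request(GATHER_URL, data=body.encode("ascii"), method="POST")
```

Passing the returned object to `urllib.request.urlopen` would submit the gather; the response body has the XML shape shown in the example response.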
Check Status
Example Request
$ curl http://hanzoweb.com/xml/status/2678/
Example Response
<?xml version="1.0" encoding="UTF-8"?>
<response status="ok">
<!--
The item id can be used to list any other instances already archived.
-->
<item id="2683">
<!--
The instance id can be used to check the progress of the gather
and to retrieve the new instance of the archive item, once it has
been collected.
-->
<instance id="2678">
<!--
Status is one of:
NEW: queued,
GATHERING: gather in progress,
TRANSFERRING: transferring to archive,
COMPLETED: available for download,
ERROR: error - please wait for this to be retried,
-->
<status>COMPLETED</status>
</instance>
</item>
</response>
Details
...
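The gather and status responses share the same XML shape, so one parser can serve both. The sketch below uses Python's standard `xml.etree.ElementTree`; the names `GatherState`, `status_url`, `parse_response` and `is_ready` are hypothetical, and rejecting any response whose `status` attribute is not `ok` is an assumption, since the format of error responses is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Status values listed in the example response comments.
STATUSES = ("NEW", "GATHERING", "TRANSFERRING", "COMPLETED", "ERROR")


class GatherState(NamedTuple):
    item_id: str
    instance_id: str
    status: str


def status_url(instance_id):
    """URL to poll for a given instance id, following the curl example."""
    return f"http://hanzoweb.com/xml/status/{instance_id}/"


def parse_response(xml_bytes):
    """Extract item id, instance id and status from a response body."""
    root = ET.fromstring(xml_bytes)  # XML comments are dropped by the parser
    if root.get("status") != "ok":
        # Assumption: anything other than "ok" signals a failed call.
        raise ValueError(f"unexpected response status: {root.get('status')!r}")
    item = root.find("item")
    instance = item.find("instance") if item is not None else None
    if instance is None:
        raise ValueError("response has no <item>/<instance> element")
    status = (instance.findtext("status") or "").strip()
    if status not in STATUSES:
        raise ValueError(f"unknown instance status: {status!r}")
    return GatherState(item.get("id", ""), instance.get("id", ""), status)


def is_ready(state):
    """Only COMPLETED means the page can be fetched; ERROR is retried."""
    return state.status == "COMPLETED"
```

A caller would typically request `status_url(instance_id)` at intervals and stop once `is_ready` returns true. Because an `ERROR` instance is retried by the service, this sketch treats it as "keep waiting" rather than as a final state.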
Fetch Gathered Page (Cooked)
Example Request
$ curl -L http://hanzoweb.com/fetch/2678/
Example Response
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en"
xml:lang="en">
...
<SCRIPT language="Javascript">
<!--
// FILE ARCHIVED ON 20060309000233 AND RETRIEVED FROM
// HANZO:WEB ON 2006-03-09 00:26:16.178343.
// JAVASCRIPT APPENDED BY HANZO:WEB.
...
//-->
</SCRIPT>
</html>
Details
This returns the page as archived, with one possible exception, described below. Bear in mind that there are no guarantees about the well-formedness or validity of the returned content.
If the gathered page is (X)HTML, a chunk of JavaScript is added after the closing body tag and before the closing html tag. When the page is viewed in a browser, this script dynamically rewrites the links within the page to point to the archived versions of the pages stored on www.hanzoweb.com. The page is otherwise unaltered and still contains the original links.
To retrieve a gathered page without this extra chunk, use /fetch/raw/ (see below).
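A client that stores fetched pages may want to know whether a given copy carries the appended script. The Python sketch below checks for the comment text that appears in the example response above; the helper name `has_hanzo_script` is hypothetical, and it is an assumption that every cooked page carries exactly this marker.

```python
# Comment text taken from the example response; assumed to be stable.
HANZO_MARKER = b"JAVASCRIPT APPENDED BY HANZO:WEB"


def has_hanzo_script(page: bytes) -> bool:
    """Return True if the page appears to contain the appended script."""
    # Work on raw bytes: the content is not guaranteed to be well-formed,
    # so no attempt is made to parse it as HTML or XML.
    return HANZO_MARKER in page
```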
Fetch Gathered Page (Raw)
Example Request
$ curl -L http://hanzoweb.com/fetch/raw/2678/
Example Response
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en"
xml:lang="en">
<head>
<BASE HREF="http://hobix.com/textile/">
<meta http-equiv="Content-Type" content="text/html;
charset=utf-8" />
<title>Textile Reference</title>
...
</html>
Details
This will return the page exactly as archived. Bear in mind that there are no guarantees about the well-formedness or validity of the returned content.
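The cooked and raw endpoints differ only in their path, so a single helper can cover both. This is a minimal Python sketch with hypothetical names (`fetch_url`, `fetch_page`); treating the instance id as an integer is an assumption based on the examples.

```python
from urllib.request import urlopen

BASE_URL = "http://hanzoweb.com"


def fetch_url(instance_id, raw=False):
    """Build the fetch URL for an instance: /fetch/<id>/ or /fetch/raw/<id>/."""
    prefix = "/fetch/raw/" if raw else "/fetch/"
    # int() rejects anything that is not a plain number (assumed id format).
    return f"{BASE_URL}{prefix}{int(instance_id)}/"


def fetch_page(instance_id, raw=False, timeout=30):
    """Download the archived page as bytes.

    urlopen follows redirects by default, like `curl -L`. The body is
    returned undecoded because its validity is not guaranteed.
    """
    with urlopen(fetch_url(instance_id, raw), timeout=timeout) as resp:
        return resp.read()
```

For example, `fetch_url(2678)` gives the cooked URL used in the first fetch example and `fetch_url(2678, raw=True)` gives the raw one.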