Hanzoweb API Documentation

  1. Kick off a Gather
  2. Check Status
  3. Fetch Gathered Page (Cooked)
  4. Fetch Gathered Page (Raw)

Kick off a Gather

Example Request

$ curl -d "\
> username=petef&\
> key=_your_hanzo_developer_key_&\
> url=http://hobix.com/textile/&\
> tags=textile%20syntax%20cheatsheet&\
> description=Why's%20guide%20to%20Textile&\
> name=Textile%20Reference&\
> scope=context&\
> visibility=public\
> " \
> http://hanzoweb.com/xml/gather/

Example Response

<?xml version="1.0" encoding="UTF-8"?>
<response status="ok">
  <!-- 
    The item id can be used to list any other instances already archived.
  -->
  <item id="">
    <!--
      The instance id can be used to check the progress of the gather
      and to retrieve the new instance of the archive item, once it has 
      been collected.
    -->

    <instance id="2678">
      <!-- 
        Status is one of:
        NEW: queued,
        GATHERING: gather in progress,   
        TRANSFERRING: transferring to archive,
        COMPLETED: available for download,
        ERROR: error - please wait for this to be retried,
      -->
      <status>NEW</status>
    </instance>
  </item>

</response>
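A client will usually want the instance id and its status out of this response. The Python sketch below uses only the standard library. It assumes the `<response>`/`<item>`/`<instance>`/`<status>` layout shown above, that `status="ok"` is the success value, and that an empty item id just means none was reported. `parse_gather_response` is a hypothetical helper, not part of the API. Because the Check Status response uses the same layout, the same function should apply there too.

```python
import xml.etree.ElementTree as ET

# The instance statuses listed in the response comments.
STATUSES = {"NEW", "GATHERING", "TRANSFERRING", "COMPLETED", "ERROR"}


def parse_gather_response(xml_text):
    """Return the item id, instance id and status from a response document.

    Accepts str or bytes. An empty item id (as in the gather example)
    is returned as None.
    """
    root = ET.fromstring(xml_text)  # XML comments are discarded by default
    if root.tag != "response" or root.get("status") != "ok":
        raise ValueError("unexpected response status: %r" % root.get("status"))
    item = root.find("item")
    instance = item.find("instance") if item is not None else None
    if instance is None:
        raise ValueError("response has no item/instance element")
    status = instance.findtext("status", default="").strip()
    if status not in STATUSES:
        raise ValueError("unknown instance status: %r" % status)
    return {
        "item_id": item.get("id") or None,
        "instance_id": instance.get("id"),
        "status": status,
    }
```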

Details

Parameter names in bold are required. Parameter values in bold are defaults; values in italics informally describe the allowed type of value; values in plain roman text are literals.

URI: http://hanzoweb.com/xml/gather/
method: POST
parameters:
username = your hanzoweb username
key = 40-hex-digit developer key
url = the URL you wish to gather
tags = a space-separated list of alphanumeric tags
name = the title of the site
description = a short plain-text description, comments or notes
scope = bookmark | page | context | website
visibility = private | public
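For illustration, here is a Python sketch that builds the same POST request as the curl example, checking values against the formats listed above. `build_gather_request` is a hypothetical helper. It assumes that username, key and url are the required parameters, since the bold marking is not recoverable here. Optional parameters left as None are omitted so that the server's defaults apply. The sketch builds the request without sending it.

```python
import re
import urllib.parse
import urllib.request

GATHER_URI = "http://hanzoweb.com/xml/gather/"
SCOPES = ("bookmark", "page", "context", "website")
VISIBILITIES = ("private", "public")


def build_gather_request(username, key, url, tags=(), name=None,
                         description=None, scope=None, visibility=None):
    """Build, but do not send, a form-encoded POST to the gather URI."""
    # Validate against the documented value formats before encoding.
    if not re.fullmatch(r"[0-9A-Fa-f]{40}", key):
        raise ValueError("key must be 40 hex digits")
    for tag in tags:
        if not re.fullmatch(r"[A-Za-z0-9]+", tag):
            raise ValueError("tags must be alphanumeric: %r" % tag)
    if scope is not None and scope not in SCOPES:
        raise ValueError("scope must be one of: " + " | ".join(SCOPES))
    if visibility is not None and visibility not in VISIBILITIES:
        raise ValueError("visibility must be private or public")

    # Assumed-required parameters first, then whichever optional ones were given.
    params = {"username": username, "key": key, "url": url}
    optional = {
        "tags": " ".join(tags) or None,  # space-separated list, as documented
        "name": name,
        "description": description,
        "scope": scope,
        "visibility": visibility,
    }
    params.update((k, v) for k, v in optional.items() if v is not None)

    # quote_via=quote encodes spaces as %20, as in the curl example.
    body = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return urllib.request.Request(
        GATHER_URI,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. The reply body is the XML response described in this section.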

Check Status

Example Request

$ curl http://hanzoweb.com/xml/status/2678/

Example Response

<?xml version="1.0" encoding="UTF-8"?>
<response status="ok">
    <!-- 
      The item id can be used to list any other instances already archived.
    -->
    <item id="2683">

        <!--
          The instance id can be used to check the progress of the gather
          and to retrieve the new instance of the archive item, once it has 
          been collected.
        -->
        <instance id="2678">
          <!-- 
            Status is one of:
            NEW: queued,
            GATHERING: gather in progress,   
            TRANSFERRING: transferring to archive,
            COMPLETED: available for download,
            ERROR: error - please wait for this to be retried,
          -->
          <status>COMPLETED</status>
        </instance>

    </item>
</response>

Details

...
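Because a gather is queued and processed asynchronously, a client will typically poll this URI until the instance reaches COMPLETED. The Python sketch below shows one way to do that. Several details are assumptions rather than documented rules: the 30-second interval, the poll limit, the helper names, and treating ERROR as transient (based on the "please wait for this to be retried" comment). The fetch and sleep functions are injectable so that the loop can be exercised offline.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

STATUS_URI = "http://hanzoweb.com/xml/status/%s/"


def http_get(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def wait_for_instance(instance_id, fetch=http_get, interval=30.0,
                      max_polls=120, sleep=time.sleep):
    """Poll the status URI until the instance is COMPLETED; return the item id.

    NEW, GATHERING, TRANSFERRING and ERROR all lead to another poll.
    ERROR is treated as transient because the response comments say it
    will be retried. interval and max_polls are arbitrary choices.
    """
    url = STATUS_URI % instance_id
    for attempt in range(max_polls):
        root = ET.fromstring(fetch(url))
        if root.get("status") != "ok":
            raise RuntimeError("status request failed: %r" % root.get("status"))
        status = root.findtext("item/instance/status", default="").strip()
        if status == "COMPLETED":
            return root.find("item").get("id")
        if attempt + 1 < max_polls:
            sleep(interval)
    raise TimeoutError("instance %s not COMPLETED after %d polls"
                       % (instance_id, max_polls))
```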

Fetch Gathered Page (Cooked)

Example Request

$ curl -L http://hanzoweb.com/fetch/2678/

Example Response

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" 
  xml:lang="en">
...
<SCRIPT language="Javascript">
<!--
  // FILE ARCHIVED ON 20060309000233 AND RETRIEVED FROM 
  // HANZO:WEB ON 2006-03-09 00:26:16.178343. 
  // JAVASCRIPT APPENDED BY HANZO:WEB. 
  ...
  //-->

</SCRIPT>
</html>

Details

This will return the page as archived, with one possible exception, described below. Bear in mind that there are no guarantees about the well-formedness or validity of the returned content.

If the gathered page is (X)HTML, a chunk of JavaScript is added after the closing body tag and before the closing html tag. When the page is viewed in a browser, this script dynamically rewrites the links within the page so that they point to the archived versions of the pages stored on www.hanzoweb.com. The page is otherwise unaltered, and its source still contains the original links.

To retrieve a gathered page without this extra chunk, use /fetch/raw/, described below.
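The two fetch URIs differ only in the /raw/ segment. The Python sketch below builds both and, as an illustration, reads the archive timestamp out of the appended script. Relying on the "FILE ARCHIVED ON" comment is an assumption based solely on the example response above, because its format is not documented. `fetch_uri` and `archived_on` are hypothetical helpers, and `archived_on` returns None when the marker is absent.

```python
import re
from datetime import datetime

FETCH_URI = "http://hanzoweb.com/fetch/%s/"          # cooked: script appended
RAW_FETCH_URI = "http://hanzoweb.com/fetch/raw/%s/"  # raw: exactly as archived

# Marker copied from the example cooked response; its format is an assumption.
_ARCHIVED_ON = re.compile(r"FILE ARCHIVED ON (\d{14})")


def fetch_uri(instance_id, raw=False):
    """Return the cooked (default) or raw fetch URI for an instance id."""
    return (RAW_FETCH_URI if raw else FETCH_URI) % instance_id


def archived_on(cooked_html):
    """Return the archive time from the appended script, or None if absent."""
    match = _ARCHIVED_ON.search(cooked_html)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
```

Like `curl -L` in the examples, a client should follow redirects when fetching these URIs. `urllib.request.urlopen` does this by default.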

Fetch Gathered Page (Raw)

Example Request

$ curl -L http://hanzoweb.com/fetch/raw/2678/

Example Response

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" 
      xml:lang="en">
<head>
<BASE HREF="http://hobix.com/textile/">
<meta http-equiv="Content-Type" content="text/html; 
      charset=utf-8" />
<title>Textile Reference</title>

  ...
</html>

Details

This will return the page exactly as archived. Bear in mind that there are no guarantees about the well-formedness or validity of the returned content.
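The raw page keeps its original links, so a client that wants to follow them has to resolve relative URLs itself. The Python sketch below does this with the standard-library HTML parser. It resolves each `<a href>` against the page's `<BASE HREF>` when one is present (as in the example above), and otherwise against the URL that was originally gathered. The helper names are hypothetical. For simplicity, a `<base>` element applies only to links that follow it, which is enough when it sits in `<head>`. Because the returned content is not guaranteed to be well-formed, treat the results as best-effort.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> targets, resolved against <base href> if present."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url  # fallback: the URL that was gathered
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF> matches.
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            self.base = attrs["href"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base, attrs["href"]))


def original_links(raw_html, gathered_url):
    """Return the absolute original-site URLs linked from a raw page."""
    parser = LinkCollector(gathered_url)
    parser.feed(raw_html)
    parser.close()
    return parser.links
```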