Frequently Asked Questions

This FAQ answers most questions you may have about Hanzo and hanzo:web. If you just need minimal documentation to get started, try the Quick Start Guide.

  1. About Hanzo
  2. Archiving
  3. Accessing And Using The Archive
  4. Account Management

1. About Hanzo

Why did you create hanzo:web?

We have watched important and beautiful websites emerge and disappear from the web every day. Whether content is lost through simple editing, neglect, accident, censorship or oppression, this is a profound loss. We believe that archiving the content of all sites is a social and cultural necessity and needs to take place now! To this end we intend to archive all sites, pages and links submitted by hanzo:web members and allow free access to this collection for everyone forever.

Our goal is to ensure the continuity and preservation of our diverse digital culture for the long term.

Who are the creators?

Hanzo was founded in 2005 by a team of software entrepreneurs and archivists from globally renowned memory institutions. Together we recognised the declining relevance, and yet undeserved influence, of mass media and institutions on the web. At the same time we saw the increasing participation of individuals and self-organising groups of people building a vibrant, creative and dynamic new web. Yet the mass media and institutions are the ones who decide which bits of the net are preserved for the future.

This isn't right.

We founded Hanzo to provide the means to organise and preserve our digital culture ourselves: archiving for the true participants. Our first product, hanzo:web, lets everyone choose what’s worth preserving. Whenever members bookmark anything they can automatically archive it too, along with what they decide is an appropriate depth of context. Importantly, the archived pages and links still work and of course members can tag their pages and share them with everyone else.

What is your product roadmap?

In early 2006 Hanzo launched the first product at hanzo:web. We’re launching our public API at ETech 2006. The API will enable members to websource our archiving services - enabling them to build archiving capabilities into their own web apps.

A new version of hanzo:web will be available following ETech, which will provide a number of unique features such as feed archiving, scheduled and repeated archiving, enhanced archive scope, enhanced archive navigation and our open API.

We're in discussions with developers who intend to use the API to provide the means to archive content as it's produced. We believe this will be particularly beneficial to Web2.0 products and to the blogosphere, to avoid a new strain of blog-rot: commentary without the original.

In mid 2006 we will extend our API-driven services further, to enable continuous archiving of dynamic Web2.0 sites and services. Also mid-year, our preservation partnership services will be established, through which we can guarantee the long-term preservation of, and access to, archived content, independent of our own existence. Finally, towards the end of 2006, we will launch a new product, hanzo:net, which will take everything we do to a new level, with increased personalisation, flexible scope, and massively increased quotas.

Where are you based?

Hanzo is incorporated in England, with Directors in various locations in the UK and in Paris, France. Hanzo:web itself is hosted out of Paris. Product development is in France and Britain, and soon to be extended to other countries - we're truly an international organisation. We aim to build membership globally by working with our members to promote the services in their local net space; we will begin this process when we emerge from Beta in Spring 2006. Interestingly, only a small percentage of our users are European: we have many in the US and Japan, but a lot more actively archiving sites in Russia and China.

What is your Privacy Policy?

We at Hanzo respect the privacy of you, the users of our Site and Services, and are committed to protecting it; our Privacy Policy ensures we do this in an open and fair way.

2. Archiving

Why would I archive web content?

Statistical evidence suggests that web pages have an average half-life of around 2 years. This implies that roughly 30% of favoured bookmarks and saved links will fail each year. This takes place despite 90% of users actively trying to keep these pages safe. Important websites are removed, changed or renamed, leading to the disappearance of a vast number of pages and sites. Archiving content you care about is the only way to make sure you can access it over time, as well as share it with others.
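
As a rough check on the figures above, the yearly failure rate implied by a half-life can be worked out directly. This is only a sketch: it assumes simple exponential decay, which is our assumption and not something the statistics themselves state, and the function name is made up for illustration.

```python
def yearly_failure_rate(half_life_years: float) -> float:
    """Fraction of links expected to fail within one year, assuming exponential decay."""
    # After t years the surviving fraction is 0.5 ** (t / half_life_years).
    surviving_after_one_year = 0.5 ** (1.0 / half_life_years)
    return 1.0 - surviving_after_one_year


if __name__ == "__main__":
    # A 2-year half-life gives a yearly failure rate of a little under 30%.
    print(f"{yearly_failure_rate(2.0):.1%}")
```

Under this assumption a 2-year half-life works out to just under 30% of links failing per year, which is in line with the figure quoted above.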

How do I start archiving?

The simplest route: register for a free account, confirm by email, and then archive using the collect form.

You can continue to work like this, or you can set yourself up for ongoing archiving using our Collect This bookmarklet and collect bar. The bookmarklet is a link you add to your browser's Bookmarks Toolbar.

Drag this bookmarklet to your bookmarks bar to make page collection easy: Collect this

Now, as you browse, simply click on the Collect this bookmarklet on any page you want to archive, and the Collect Bar appears at the top of the page. In the Collect Bar you enter information about the page, such as tags and comments, select the scope of capture and the public/private status, and click on the Collect button. It's much easier to do it than describe it!

Clicking Collect starts a process called gathering. This is where our archival crawlers retrieve the page(s) requested. After a little while, depending on the scope and size of the page or site, these are added to the archive and you will see the item listed on the my archive page.

How do you capture content?

We use archival web crawlers: software that browses the Web, extracting links and fetching and archiving pages, images and other embedded content, a process also known as gathering. The crawlers continue gathering until the full scope of the crawl is completed, which means your archive will contain the original page plus all its links and embedded content and, depending on the scope, all the pages the original page linked to, a.k.a. the context. Pro and Ace account holders can also archive entire websites.

What are the different scopes of capture?

Hanzo allows you to archive content with varying scopes. In the collect bar or form, there is a drop-down list which allows you to choose between capturing only the link (a bookmark), the page (and all its embedded images and so on), context (the page and everything else it links to - this is the default scope) and website (to collect the entire site - available only for paid subscriptions). Note that images and other objects that are integral to a page will always be gathered along with the page itself.

The default scope is called ‘context’ - this means that pages or documents linked from the entry point page will also be archived. At the moment the crawler will only collect the context which is linked directly, i.e. within a single hop. Context is often very useful as a page frequently comments upon or refers to others. With the context scope you can be sure not to lose a page's context over time.

If you are interested in collecting a single page only and consider that the external context is not important, then you can archive with the ‘page only’ scope, which will not capture linked documents. This has the advantage of saving your archive quota, but with the risk of missing something. It's important to think about how useful the context will be in the future before deciding to reduce the scope of a gather.
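
The difference between the page and context scopes can be illustrated with a small sketch. This is not our crawler code: the toy web in PAGES, the URLs and the function names are all made up for illustration, and only the two scopes described so far are modelled.

```python
# Toy model of a tiny web: each page lists its embedded objects and outgoing links.
# All URLs and the data layout are hypothetical.
PAGES = {
    "http://example.com/a": {
        "embedded": ["http://example.com/a.png"],
        "links": ["http://example.com/b", "http://example.org/c"],
    },
    "http://example.com/b": {
        "embedded": ["http://example.com/b.css"],
        "links": ["http://example.com/d"],
    },
    "http://example.org/c": {"embedded": [], "links": []},
    "http://example.com/d": {"embedded": [], "links": []},
}

EMPTY_PAGE = {"embedded": [], "links": []}


def page_with_embeds(url):
    """A page is always gathered together with its embedded objects."""
    page = PAGES.get(url, EMPTY_PAGE)
    return {url, *page["embedded"]}


def gather(entry_url, scope):
    """Return the set of URLs a gather would fetch for the given scope."""
    gathered = page_with_embeds(entry_url)
    if scope == "page":
        return gathered
    if scope == "context":
        # Follow each link from the entry page exactly one hop, and no further.
        for linked in PAGES.get(entry_url, EMPTY_PAGE)["links"]:
            gathered |= page_with_embeds(linked)
        return gathered
    raise ValueError(f"scope not modelled in this sketch: {scope}")
```

In this toy web, a context gather of page a picks up pages b and c with their embedded objects, but not page d, which is two hops away.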

If you want to archive a complete site, Hanzo allows you to do so, very easily. Website scope means that all pages our crawlers can find within the website will be captured. This is very useful to keep versions of sites you care about.

Hanzo can accommodate varying types of site structure, adapting the crawl automatically to the structure. For example, for websites within a specific domain name, say example.com, Hanzo will gather all the pages in this domain, including the subdomains and all the directories within them.

For websites located on a subdomain, for example site.example.com, our crawlers will capture content on this subdomain but not on other subdomains (like other.example.com).

For websites located in a directory, for example www.example.com/site/, our crawlers will restrict the gather to the directory and content reached from there, but not above that directory.
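
The three website-scope rules above can be sketched as a single check. This is an illustration under stated assumptions, not our crawler's actual logic: the function name is hypothetical, URLs are assumed to carry a scheme such as http://, a host with exactly two labels (like example.com) is naively treated as a bare domain, and the directory rule is modelled as a plain path-prefix test.

```python
from urllib.parse import urlsplit


def in_website_scope(entry_url: str, candidate_url: str) -> bool:
    """Decide whether candidate_url falls inside a website-scope gather of entry_url."""
    entry = urlsplit(entry_url)
    cand = urlsplit(candidate_url)
    entry_host = (entry.hostname or "").lower()
    cand_host = (cand.hostname or "").lower()

    if entry_host.count(".") == 1:
        # Bare domain (naive two-label assumption): include all of its subdomains.
        host_ok = cand_host == entry_host or cand_host.endswith("." + entry_host)
    else:
        # Entry point on a subdomain: stay on that subdomain only.
        host_ok = cand_host == entry_host

    # Directory rule: stay at or below the directory of the entry point.
    entry_dir = entry.path[: entry.path.rfind("/") + 1] or "/"
    cand_path = cand.path or "/"
    return host_ok and cand_path.startswith(entry_dir)
```

A production check would need a proper list of registrable domains to tell a bare domain from a subdomain (co.uk addresses, for instance, break the two-label shortcut used here).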

It is often good practice to archive using a combination of scopes, such as website to capture the site as a whole, and context crawls on the home page and partner pages, news pages, etc., to capture the context around the website.

What type of files do you capture?

We capture all types of files.

Can I archive password protected sites?

We do not currently provide this feature.

3. Accessing And Using The Archive

How can I find my archive?

When you are logged in, there is a blue box at the top right of the page with a link to my archive - this is the default location for all your archived items. You can access archived items by clicking on the instance date or by clicking on the title. This will launch the archive banner and load the archived page. You can browse around the page as though it were live on the web today.

With a large collection of archive items it may be necessary to search or browse tags in order to create a smaller list of items to work with. You can do this from the 'my archive' page by clicking on a tag, which limits the contents of the list to items with this tag, or by searching, where the results are those items with the search term in the tag, title or description.

Full text search is coming very soon.
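
The tag filter and search described above behave roughly like the following sketch. The ArchiveItem structure and function names are hypothetical, and case-insensitive substring matching is an assumption made for the example, not a description of how the site matches terms.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveItem:
    title: str
    description: str = ""
    tags: list = field(default_factory=list)


def filter_by_tag(items, tag):
    """Clicking a tag: keep only the items carrying exactly that tag."""
    return [item for item in items if tag in item.tags]


def search(items, term):
    """Searching: keep items with the term in a tag, the title or the description."""
    term = term.lower()
    return [
        item
        for item in items
        if term in item.title.lower()
        or term in item.description.lower()
        or any(term in tag.lower() for tag in item.tags)
    ]
```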

What's happening when I get the Not In Archive page?

There are several reasons an archived item may not be in the archive:

What is the difference between the private and public archives?

The default status of an archived item is 'public', which means that other Hanzo users will have access to the archived item as well as to the tags and comments for this item. This is to ensure more of the web is archived and preserved for more people. For some items, you may prefer to prevent others from accessing them and the tags and comments you have added. These items are 'private'. You can choose the private status when collecting and also edit it afterwards by pressing the edit button at the left of the item in the list view. You can only change the status for content you have archived yourself.

4. Account Management

How much is a subscription to Hanzo?

The subscription to Hanzo is free. This gives you a monthly archiving quota of 100MB of new content. Every month this quota is reset, so you get another 100MB. The content you archive will be preserved whatever its size, forever.

You can expand your archiving quota to 1GB with a Pro account and 10GB with an Ace account. We also offer custom quotas. Email us with your requirements for more information.

Additional details are on your account page.
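
The monthly quota described above can be pictured with a small sketch. It is illustrative only: the class and method names are hypothetical, 1GB is taken as 1024MB, and refusing a gather that would exceed the quota is an assumption made for the example.

```python
MB = 1024 * 1024

# Monthly quotas of new content per account type; 1GB = 1024MB is an assumption.
MONTHLY_QUOTA = {"free": 100 * MB, "pro": 1024 * MB, "ace": 10 * 1024 * MB}


class QuotaTracker:
    def __init__(self, plan="free"):
        self.limit = MONTHLY_QUOTA[plan]
        self.month = None
        self.used = 0

    def try_archive(self, size_bytes, month):
        """Record a gather of size_bytes in `month` (e.g. "2006-03"); True if it fits."""
        if month != self.month:
            # A new month resets the allowance; content already archived is untouched.
            self.month = month
            self.used = 0
        if self.used + size_bytes > self.limit:
            return False
        self.used += size_bytes
        return True
```

For example, on a free account two 60MB gathers in the same month do not both fit, but the second one fits once the next month begins.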

What are the differences between the types of subscription accounts?

The main difference between subscription accounts is the amount of material you can gather each month (your quota). Each subscription account has a different quota:

Additional details are on your account page.

How do I recover my password?

If you have lost your password, you can reset it here: http://www.hanzoweb.com/lost/