
= Problems in Web Preservation =

The Internet is an important part of contemporary culture. Though exact numbers are hard to pin down, Mashable suggests that approximately 150,000 new web sites appear each day. This influx of new data is difficult to archive and difficult to preserve. Preservation efforts are hampered by people who do not understand the importance of retaining online information even after a website has closed, by technological limitations, and by the rapid growth of websites and online information. The Internet Archive and its mirror at Bibliotheca Alexandrina work to snapshot and archive as much of the web as possible, while other archives participate in Archive-It to build more focused online collections around smaller sections of the web. Despite these efforts, the massive amount of data created daily will require ongoing preservation work and additional resources to preserve properly.

== International Internet Preservation Consortium ==
As with web archiving, the primary organizations involved in Internet preservation are archives and other information centers. The International Internet Preservation Consortium (IIPC) was created so that archives interested in preserving the web could discuss preservation problems and solutions. The IIPC develops policies, procedures, and best practices for web preservation. Organizations using tools such as web crawlers to archive the web can consult its reports and documents for guidance on how best to use those tools. One recent IIPC report covers organizations harvesting national domains and describes what should be harvested within the scope of such a program; in this case, resources harvested during a crawl include HTML pages, videos, and similar items, while certain MIME types fall outside the crawl's scope. The IIPC also maintains a list of software for preserving the web, including web crawlers and web curation tools that make the archived information available online; this encourages organizations to make the information accessible in forms other than a bare URL.
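To make the idea of crawl scope concrete, here is a minimal sketch, in Python, of filtering candidate URLs by MIME type before harvesting. The allow list and helper name are illustrative assumptions, not rules taken from the IIPC report or from any IIPC tool.

[[code format="python"]]
import urllib.request

# Hypothetical crawl-scope rule: MIME types this crawl may harvest.
# (Illustrative allow list, not an IIPC recommendation.)
ALLOWED_MIME_TYPES = {"text/html", "video/mp4", "video/webm"}

def in_scope(url):
    """Issue a HEAD request and report whether the resource's
    Content-Type falls within the crawl's MIME-type scope."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # get_content_type() strips parameters such as "; charset=utf-8".
        return response.headers.get_content_type() in ALLOWED_MIME_TYPES

if __name__ == "__main__":
    seed = "https://example.com/"
    verdict = "harvest" if in_scope(seed) else "skip"
    print(f"{seed}: {verdict}")
[[code]]

Checking only the headers before downloading the body keeps the scope decision cheap; a production crawler such as Heritrix applies far richer scope rules, but the principle is the same.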

== WebCite Consortium ==
[[image:preservation.png width="480" height="213" align="right" caption="Screen Shot of Curated Collections at the Internet Archive" link="@http://www.archive.org/web/web.php"]]
The WebCite Consortium used to be part of the IIPC, but it now stands on its own, working to preserve web sites and pages that have been cited in scholarly journals. WebCite is narrowly focused on collecting web pages cited in scholarly publications, only one aspect of the web preservation process, but it is an example of the methods by which the web may be preserved. The sheer number of web pages needing preservation overwhelms organizations such as the Internet Archive; archives, libraries, and information centers can assist by targeting specific types of websites. The IIPC and the Internet Archive both provide software and instruction to help organizations begin the web preservation and curation process. Additionally, organizations such as WebCite are able to use software to curate their collections, something that the Wayback Machine at the Internet Archive does not do.
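The Internet Archive does expose its snapshots programmatically: the Wayback Machine's public availability API at https://archive.org/wayback/available returns the closest archived capture of a given URL as JSON. The sketch below queries it; the field names match the API's documented response, though error handling is omitted for brevity.

[[code format="python"]]
import json
import urllib.request
from urllib.parse import urlencode

def closest_snapshot(url):
    """Query the Wayback Machine availability API for the closest
    archived capture of `url`; returns None if nothing is archived."""
    endpoint = "https://archive.org/wayback/available?" + urlencode({"url": url})
    with urllib.request.urlopen(endpoint, timeout=10) as response:
        data = json.load(response)
    # The API nests its answer under archived_snapshots -> closest.
    return data.get("archived_snapshots", {}).get("closest")

if __name__ == "__main__":
    snapshot = closest_snapshot("example.com")
    if snapshot:
        print("Archived copy:", snapshot["url"], "from", snapshot["timestamp"])
    else:
        print("No archived copy found.")
[[code]]

An API like this supports retrieval of individual snapshots but offers no way to organize them into themed collections, which is exactly the curation gap that services such as WebCite and Archive-It aim to fill.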