Web Archives Collecting Policy

The UCSF Archives and Special Collections derives its collecting mandate from UCOP policies BFB-RMP-1 and BFB-RMP-2 and serves as the repository for archival records, including websites, generated by or about UCSF, encompassing the Schools of Medicine, Nursing, Dentistry, and Pharmacy, the Graduate Division, and the UCSF Medical Center.

It is the responsibility of the Archives to identify, collect, arrange, describe, make available, and preserve records of permanent administrative, legal, fiscal, and historical value. These records are preserved as an asset for the UCSF community and other researchers. Because most University business is now conducted digitally, many of these records exist as online resources.

The web archiving activities of the Archives document the following areas:

  • Primary functions of teaching and research
  • Development of health care education and health sciences research
  • Leadership in the community at large
  • Activities of the student body and alumni
  • The development of the physical plant and grounds

To fulfill these goals, the Archives collects websites of:

  • Administrative offices
  • Academic departments
  • Faculty, administrative, and student committees
  • Faculty and student clubs
  • University and student publications
  • Laboratories and research facilities

In addition to UCSF websites, the Archives collects more broadly in the areas of:

  • AIDS History
  • Anesthesiology
  • Biotechnology and biomedical research
  • Tobacco control and regulation (maintained in collaboration with the Industry Documents Library)
  • Global Health Sciences
  • Neuroscience
  • Computational Medicine

The Archives may make exceptions to the above criteria on a case-by-case basis.

Web Archives program guidelines and responsibilities

The Archives uses the Internet Archive’s Archive-It service to capture websites. No active participation is required from UCSF content owners and creators, but several steps may be taken to help ensure that websites are preserved as completely as possible.

Responsibilities and best practices for web archiving at UCSF are outlined below:

The UCSF Archives and Special Collections will:

  • Identify, appraise, and select websites that reflect the mission and collecting interests of the Archives and Special Collections as outlined in the Collections Policy
  • Organize and manage archived websites to complement current holdings in the UCSF Library
  • Provide descriptions and contextual information for materials
  • Mediate access (via metadata, catalog records, and an access interface) to facilitate the search and retrieval of content
  • Respect the intellectual property rights of owners and ensure compliance with all applicable laws and policies:
    • Distinguish ‘archived’ sites from ‘live’ content with a prominent banner and statement at the top of each preserved web page
    • Suppress content from public view or refrain from website preservation at the request of content owners
    • Not capture any content which requires a password to access or which may contain protected health information or other restricted data
  • Reach out to webmasters when website design or configurations pose issues for the accurate capture of content

The Internet Archive’s Archive-It service and team will:

  • Maintain the web crawler, a computer program (or robot) that browses websites and saves a copy of all the content and hypertext links it encounters. By default, Archive-It’s crawler will not degrade website performance

  • Store archived content in a digital preservation repository at one of the Internet Archive’s facilities

Content creators and owners will be able to:

  • Rely upon the UCSF Archives and Special Collections to identify, preserve, and provide access to multiple versions of select websites over time
  • Allow the Archive-It web crawler to preserve websites by including the following exception in the site’s robots.txt file (a fuller example follows this list):
      User-Agent: archive.org_bot
      Disallow:
  • Inform the Archives if a website is scheduled to go online, be decommissioned, or undergo significant changes
  • Request capture of a website and specify the frequency with which captures should occur (one time only, weekly, monthly, quarterly, or annually); the default frequency for UCSF content is quarterly
  • View captured pages in our Archive-It collection
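
As a point of reference, a site’s robots.txt might combine the exception above with rules for other crawlers. The sketch below is illustrative only; the /intranet/ path is a hypothetical example, and site owners should adapt the rules to their own configuration.

    # Allow the Internet Archive's Archive-It crawler to capture the entire site
    User-Agent: archive.org_bot
    Disallow:

    # Hypothetical rule keeping all other crawlers out of an /intranet/ section
    User-Agent: *
    Disallow: /intranet/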

Please note: The Archives may not be able to preserve the exact form, functionality, and content of sites as they appear on the live web. The following types of content present significant issues for capture and/or display:

  • Dynamic scripts or applications such as JavaScript or Adobe Flash
  • Streaming media players with video or audio content
  • Password-protected material (we do not collect any web content that requires a password to access)
  • Forms or database-driven content that requires interaction with the site
  • Exclusions specified in robots.txt files
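
On the last point, a robots.txt exclusion that applies to the Archive-It crawler keeps the affected pages out of the capture. For example, a rule like the following (the /private-reports/ path is purely hypothetical) would prevent that section of a site from being archived unless the rule is adjusted:

    # Hypothetical exclusion that blocks the Archive-It crawler from this path
    User-Agent: archive.org_bot
    Disallow: /private-reports/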

Request capture of a UCSF website

UCSF affiliates are encouraged to request capture of the websites they maintain using the UCSF Website Capture Request Form.

Questions or comments?

Contact the UCSF Digital Archivist at digitalarchives@ucsf.edu.