This post was co-authored by Emma James, Kate Tasker, Eileen Chen, and Lisa Nguyen.
Have you ever searched for an important document on your computer, only to find that it won’t open? Or downloaded a CSV dataset as an Excel file, only to find encoding issues? Or worse, your laptop crashed, and you lost six months of work? If you’ve experienced anything like this, you’ll recognize how vital digital preservation is in preventing data disasters.
This November 7, the UCSF Library joins hundreds of libraries and archival institutions in celebrating World Digital Preservation Day and safeguarding our community’s digital content. Read on to learn about the risks facing your digital files and get tips from our digital archivists and data experts.
Technological obsolescence
Your thesis in Microsoft Word, research datasets created with special software, or graduation photos are all part of your digital archive. Unlike paper-based materials, these digital objects rely on technology to be accessed and understood.
Digital objects, despite their variety, are all just binary code: sequenced strings of 1s and 0s read by machines and interpreted by software. Opening a digital object requires the file, the hardware, and the software to all work together. That’s why a document created using 1990s WordPerfect software may not open on today’s computers. As technology evolves, older digital assets become vulnerable because tech companies stop supporting older versions of software. This threat, called technological obsolescence, has endangered digital data since the days of floppy disks.
What can I do about technological obsolescence?
- Move files off old media: For example, transfer data from outdated media like VHS tapes, DVDs, or decade-old hard drives to newer devices. You may need a vendor to digitize and transfer the materials, or you can gradually replace old hard drives with new ones.
- Regular backups: Develop a habit of backing up your files regularly. Use multiple storage solutions like external hard drives and cloud storage. This ensures that if one storage solution becomes obsolete, your files are still available on another.
- Widely used file formats: Save important digital research, documents, and memories in widely supported, non-proprietary file formats. This minimizes compatibility issues and ensures long-term accessibility. Here are some recommended file formats for your personal archives:
  - PDF – documents, textual works, email, presentations, scanned documents
  - TIFF – images, graphics, vector graphics, scanned analog images
  - WAV or MPEG-4 – audio files
  - MOV, ProRes 422 HQ, or ProRes 4444 – video files
  - CSV – datasets
  - EML or MBOX – email
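If you are comfortable running a small script, one way to act on the format list above is to scan a folder and list the files that are not already in a recommended format, so you can decide which ones to convert. The following is a minimal Python sketch, not an official UCSF tool. The function name `find_files_to_review` is hypothetical, and the mapping from format names to file extensions (for example, `.mp4` and `.m4a` for MPEG-4) is an assumption made for illustration.

```python
from pathlib import Path

# Assumption: these extensions stand in for the recommended formats listed
# above. Adjust the set to match the formats you actually rely on.
RECOMMENDED_EXTENSIONS = {
    ".pdf", ".tif", ".tiff", ".wav", ".mp4", ".m4a",
    ".mov", ".csv", ".eml", ".mbox",
}


def find_files_to_review(folder, recommended=RECOMMENDED_EXTENSIONS):
    """Return files under `folder` whose extension is not in `recommended`."""
    to_review = []
    for path in sorted(Path(folder).rglob("*")):
        # Compare extensions case-insensitively so ".CSV" counts as ".csv".
        if path.is_file() and path.suffix.lower() not in recommended:
            to_review.append(path)
    return to_review
```

Calling `find_files_to_review("my_archive")` on a folder of your choice returns a list of paths to look at. A file appearing in the list does not mean it is at risk, only that it is worth checking whether a more widely supported copy should be saved alongside it.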
Bit rot
Damage to, or loss of, a file’s binary code can make the file inaccessible. This is an inherent weakness of a ‘binary’ object. Digital files may seem endlessly reproducible while intact, but any change to their bits, such as a 0 flipping to a 1, can corrupt the data instantly. This can happen due to encoding errors (when moving from one system to another) or physical degradation of the digital media (hard drives, flash drives, memory cards) that store your data. The gradual corruption of digital data from aging storage devices is called bit rot.
What can I do about bit rot?
- Regular backups: Backups are your first line of defense. Store your backups on multiple types of media (e.g., external hard drives, cloud storage) to prevent all copies from degrading at once. For valuable data, keep a set of hard drives in different geographic locations (one set at work, one set at home), to protect against accidental loss or damage.
- Storage conditions: Physical degradation of tape media, optical discs (DVDs, CDs), and hard drives often causes bit rot. Store your hard drives or digital storage devices in cool, dry places, away from sunlight.
- Media refreshing: Just as you start a car occasionally to keep it in working order, do the same for digital storage devices. This is called media refreshing. Once or twice a year, power on your hard drives and connect them to your computer to ensure they’re functioning properly. It also helps to keep a log of your media refresh dates.
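Because a single flipped bit can corrupt a file, it helps to have a way of noticing when a stored copy has changed. A common technique, often called a fixity check, is to record a checksum for each file and compare it again later, for example during a media refresh. Checksums are not part of the tips above, so treat the following Python sketch as one possible approach rather than a required step. The function names are hypothetical, and it uses only the standard `hashlib` and `pathlib` modules.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to `folder`) to its checksum."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old, new):
    """Return (changed, missing) file names, comparing `new` against `old`."""
    changed = sorted(n for n in old if n in new and old[n] != new[n])
    missing = sorted(n for n in old if n not in new)
    return changed, missing
```

A possible routine: build a manifest when you first store the files and save it (for example, as JSON) next to your refresh log. At each media refresh, build a new manifest and pass both to `compare_manifests`. Any file reported as changed that you did not edit yourself is a candidate for restoring from another backup.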
Lack of control
The last threat is human: losing intellectual control of your files. Intellectual control, which means knowing that a file exists, where it’s stored, and who has access to it, is a key archival concept for both digital and physical records.
Digital files lack the physical presence that reminds us of their existence, making it even more important to organize and catalog data so it doesn’t get lost in the digital ether. File names also improve discoverability, access, and control. While the ease of sharing digital files is a key benefit, it also introduces risks from hackers, malicious users, or simple human error. A digital file can be just as easily deleted and lost as it was created.
If you are still skeptical, consider how a copy on a Pixar Animation staff member’s home computer saved Toy Story 2 after its files were deleted during production.
What can I do to better control my digital content?
- Human readable file naming: Use clear, descriptive file names that include details like dates, project names, or version numbers. Avoid generic names like “Document1.docx” or “IMG1234.jpg.” Instead, opt for something like “Thesis_Final_2023.docx” or “Graduation_Photo_Family_2023.jpg.” Use underscores instead of spaces in your file name and avoid ‘special characters’ like colons or ampersands.
- Logical folder structure: Create folders and subfolders that suit your needs and stick to them. Organize by date, project, or file type to make files easier to locate.
- Metadata: Use metadata such as author names, creation dates, keywords, or descriptions to improve file discoverability and management. You can do this for most files by adding information to file properties.
- Access: For confidential or sensitive data, limit access to those files. You can use password protection or encrypted hard drives that require a specific key.
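The file naming advice above (underscores instead of spaces, no special characters) can also be applied automatically when you have many files to tidy up. The following Python sketch shows one way to clean a single file name. The function name `clean_file_name` is hypothetical, and keeping only unaccented letters, digits, underscores, and hyphens is an assumption for this example, so names in other scripts would need a different rule.

```python
import os
import re


def clean_file_name(name):
    """Replace spaces with underscores and drop special characters."""
    stem, extension = os.path.splitext(name)
    stem = stem.strip().replace(" ", "_")
    # Assumption: keep only ASCII letters, digits, underscores, and hyphens.
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    # Collapse runs of underscores left behind by removed characters.
    stem = re.sub(r"_+", "_", stem).strip("_")
    return stem + extension


print(clean_file_name("Graduation Photo: Family & Friends 2023.jpg"))
# prints "Graduation_Photo_Family_Friends_2023.jpg"
```

The function only returns the suggested name and does not rename anything on disk. Reviewing the suggestions before renaming keeps you in control and avoids two different files ending up with the same cleaned name.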
Bring digital preservation to your everyday work
Although archivists and librarians focus on addressing digital fragility, anyone can practice digital preservation. Though often seen as an end-of-road activity, archiving begins with you. Digital materials have a shorter lifespan than we realize if not managed regularly or promptly. You play a critical role in ensuring long-term digital preservation of your materials.
A digital preservation mindset ensures your data remains accessible over time, especially when transferring it to libraries and archives. Creating a data management plan at the start of your project or research can streamline workflow and avoid last-minute panic. In fact, as of 2023, the NIH requires all grant seekers to submit a data management and sharing plan. Beyond regular backups and good file naming conventions, depositing your data to a digital repository is a growing practice that improves data accessibility and transparency. Several general and subject-specific repositories are available to UCSF researchers.
For assistance with digital preservation, please reach out to the Archives & Special Collections team, or contact our data management experts for help with data management plans. Check out the data management plan guide.
Whether you’re managing databases, writing papers, or taking photo documentation, digital preservation starts with you!
Image courtesy of the Digital Preservation Coalition