The UCSF Industry Documents Library (IDL) is seeking applicants for an eight-week remote paid fellowship to support computational research using the Juul Labs Collection. The fellow will take a leading role in a collaborative data project jointly mentored by the IDL and the UCSF Library Data Science and Open Scholarship (DSOS) team. There will also be opportunities to develop computational analysis skills for digital health humanities research, attend data science courses, and participate in staff meetings.
About the Industry Documents Library
The Industry Documents Library is a digital archive of documents created by industries which influence public health, hosted by the University of California, San Francisco Library. Originally established in 2002 to house the millions of documents publicly disclosed in litigation against the tobacco industry in the 1990s, the Library has expanded to include documents from the drug, chemical, food, and fossil fuel industries to preserve open access to this information and to support research on the commercial determinants of public health.
In 2024, the IDL began a collaboration with the University of North Carolina Chapel Hill’s Libraries to expand the Juul Labs Collection, preserving millions of documents disclosed in litigation against the e-cigarette manufacturer and making them accessible to the public.
About the Data Science and Open Scholarship team
The UCSF Library Data Science and Open Scholarship (DSOS) team builds computational and data skills in the UCSF community by providing education and resources to trainees, faculty, and staff.
Fellowship structure and support
The fellow will design a summer project using data from the Juul Labs Collection in consultation with the fellowship supervisor. Project examples include video transcription and annotation, or using UCSF Versa to extract summaries and topics. The fellow will be supported in identifying their project goal, defining an eight-week project scope, and developing tasks and timelines that enable them to learn and apply data science and digital health humanities tools and methodologies.
At the end of the project, the fellow will deliver a 15-minute live presentation via Zoom to UCSF Library staff and share their findings in a written report that will be published on the IDL blog.
Learning opportunities
- Build natural language processing (NLP) and machine learning skills
- Design and execute a data project and present findings
- Explore digital archival methods and practices
- Participate in weekly one-hour staff meetings
- Attend data science and digital health humanities workshops and classes (in-person and virtual options available)
- Receive mentorship and training from data scientists, programmers, and librarians
Necessary skills
Applicants must be enrolled in a degree/license program at a two- or four-year institution (undergraduate only).
- Interest in digital curation and collection building for libraries and archives
- Excellent analytical and writing skills
- High level of accuracy and attention to detail
- Ability to work independently
- Proficiency in one of the following programming languages preferred: Python, R, Java
- Familiarity with natural language processing (NLP) tools preferred
- Two years or more of programming knowledge/experience preferred
Compensation and work environment
This eight-week fellowship is fully remote with 20-35 hours of work per week. Work hours are flexible and can be arranged to suit student schedules and course requirements (if applicable). The ideal start date is June 10, 2024. The wage for this fellowship is $23/hour.
How to apply
Please email a cover letter, contact information (name, email, and phone number) for two references, and your resume to Industry Documents Library Technical Lead, Rebecca Tang, at rebecca.tang@ucsf.edu. The initial review date for this position is March 29, 2024. Applications will continue to be accepted, but those received after the review date will only be considered if the position has not yet been filled.
Feature image by Susan Merrell, 2019.