No More Silence: Opening the Data of the HIV/AIDS Epidemic Using Natural Language Processing Techniques
No More Silence: Opening the Data of the HIV/AIDS Epidemic is a UCSF Archives project to prepare digitized archival materials on HIV/AIDS as textual data for computational research in the health sciences and humanities. The project aims to support new research and discovery that bridges gaps between patient care, the lived experiences of people with AIDS, and the historical and cultural dimensions of the epidemic, by making large volumes of text from this historical moment available for analysis as a single dataset. What can we discover when we can ask questions of a large corpus of records at once, tracing, for example, the adoption of diverse terminology around gender across time and institutions, or the sentiments surrounding different treatments and therapies as they first became possible? This dataset allows researchers to begin asking such questions across multiple archival collections held at different institutions, spanning hundreds of thousands of pages of documents.
What to expect
This half-day workshop will take participants on a brief introductory tour of the data from the No More Silence project and will demonstrate Natural Language Processing (NLP) techniques for analyzing the text computationally with the Python programming language in the Jupyter Notebook interactive coding environment. Participants will then break into groups to collaborate on exploring research questions of their own.
Some knowledge of basic programming techniques, practices, and software is helpful but not required. The workshop will walk through each step in enough detail for people with no experience to follow along. To complete the workshop exercises on their own, participants will need a laptop with administrator privileges (the ability to install and run software), and must install the required software and test that it works before the day of the workshop.
Required software for participants
- Unix shell (built into macOS; Git Bash is suggested on Windows)
- Anaconda (a software distribution; specifically, we will use the Python 3 and Jupyter components it includes)
- Other Python packages or modules not included in Anaconda may be required; workshop leaders will provide participants with instructions for downloading and installing them before the day of the workshop.
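One way to test that the Anaconda install is working before the workshop is a short script that confirms Python 3 is running and that the Jupyter modules can be found. This is a minimal sketch, not an official workshop script: it assumes that a default Anaconda install provides Jupyter through the `jupyter_core` and `notebook` modules, and the function name `check_environment` and file name `check_env.py` are hypothetical. Save it as `check_env.py` and run it from the shell with the Python that ships with Anaconda (for example, `python check_env.py`).

```python
import importlib.util
import sys

# Hypothetical helper: the module names checked by default are an assumption
# about what a standard Anaconda install provides for Jupyter.
def check_environment(modules=("jupyter_core", "notebook")):
    """Return a dict mapping each check name to True (found) or False (missing)."""
    results = {"python3": sys.version_info.major == 3}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located,
        # without actually importing it.
        results[name] = importlib.util.find_spec(name) is not None
    return results

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If any line reports MISSING, reinstall or update Anaconda before the workshop. The `modules` parameter also accepts additional module names, so the same script can check any extra packages named in the workshop leaders' installation instructions.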
Presenters
Clair Kronk, University of Cincinnati College of Medicine, Cincinnati, USA (she, her, hers)
Clair Kronk is a Ph.D. student in Biomedical Informatics at the University of Cincinnati College of Medicine. Her research interests lie in the linguistic aspects of LGBTQIA+ data collection and their utility in biomedical research and patient-centered care. She obtained her undergraduate degree in Bioinformatics from the University of Pittsburgh in 2017 with minors in chemistry, theatre arts, neuroscience, and German language studies.
Charles Macquarie, UCSF, San Francisco, USA (he, him, his)
Charles Macquarie is the Digital Archivist at the UCSF Library, Archives & Special Collections and the project manager for the No More Silence project, which provides computational access to the data from digitized AIDS History Project collections in the UCSF Archives. In addition, Charlie manages the larger digital archives program at UCSF, where he coordinates the processing and preservation of digital media in the Archives, supports and carries out digital humanities projects using Archives materials, and investigates experimental methods of using Archives collections in computational research. Outside UCSF, Charlie works to expand the concept of the library as a creative platform by co-presenting PLACE TALKS, a visual lecture series on location based at San Francisco’s Prelinger Library, and through the Library of Approximate Location, an ongoing artist project that engages land and resource use in the Western United States by installing site-specific libraries at remote locations throughout the West.
Rebecca Tang, UCSF, San Francisco, USA (she, her, hers)
Rebecca Tang is a programmer at the UCSF Library, Center for Knowledge Management. She has been with the Industry Documents Library (IDL) for the past six years. During this time, the project grew from two separate archives, the Legacy Tobacco Documents Library and the Drug Industry Document Archive, into a single combined archive that adds industries such as chemical, food, and fossil fuel and supports cross-industry queries. Before joining UCSF and the Industry Documents Library, Rebecca worked at Stanford University building a knowledge base of pharmacogenomics information, using Oracle Database, the Java Spring Framework, Lucene, and Nutch.
Joanna Kang, UCSF, San Francisco, USA (she, her, hers)
Joanna Kang is a program and marketing coordinator at the UCSF Library. She has worked in libraries for seven years and has produced dozens of workshops and events, including programming boot camps for scientists and a storytelling hour for the UCSF community to share their lived experiences. Joanna supports the Library’s diversity and inclusion mission as chair of the DI work group. Her favorite author is David Sedaris, and she loves public transit.
Location
UCSF Mission Bay Campus
Mission Hall, Room MH-2100
550 16th Street,
San Francisco, CA 94143
The No More Silence project is supported in part by the U.S. Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, administered in California by the State Librarian.