By Kate Barron, Data Services Librarian, Dr. Martin Luther King, Jr. Library, San José State University
The 2018 Ithaka S+R US Faculty Survey found that faculty increasingly prefer to manage and preserve their data using cloud-based storage services like Google Drive and Dropbox. Cloud computing is a model for accessing a pool of computing resources (in this case, file storage) on demand. Because pooled computing resources are dynamically assigned based on demand, users can’t always pinpoint the physical location of the resource (hence, the resources are referred to abstractly as “the cloud”). This configuration means that cloud-based storage services afford instant file storage and access from a variety of devices, with little to no service provider interaction. Understandably, these qualities make cloud-based storage services an attractive option for busy faculty. In contrast, using traditional information technology storage services can be time-consuming and costly. At San José State University, campus information technology can house faculty- or department-provided servers in secure data centers, or host virtual servers for faculty use. In either case, service requests can take anywhere from 5 to 10 days to process and may be cost-prohibitive because users provide or pay for their own equipment.
Luckily, many universities have caught on to the cloud storage trend. San José State University subscribes to G Suite, which includes, among other applications, the file-hosting service Google Drive. What many SJSU faculty may not realize is that, because SJSU is a member of the California State University (CSU) system, our campus’s service level agreement with Google is governed by CSU information security policy. The policy describes several threats that cloud computing poses to information security, including (but not limited to): the university’s loss of complete control of (and thus, the ability to protect) its data assets; the potential loss of data privacy (such as when cloud providers aggregate consumer data); and the university’s dependence on third-party infrastructure, which may have security or technological defects. To reduce these threats, the CSU has specific requirements for acquiring and using cloud solutions. For example, cloud storage vendors must provide a security plan to campuses, and campuses must employ processes to consistently review contracts and incorporate necessary updates. Once a cloud storage solution is acquired, campuses may not use it for personally identifiable data, and should use a central authentication method to limit access to qualified personnel. As librarians and information professionals, it’s important for us to advise faculty against using cloud storage not managed by campus information technology. This can be rather troubling news for faculty at San José State University, since we contract with only one vendor, Google. However, other institutions may contract with multiple providers (Box, Dropbox, OneDrive, etc.), and so the advice may be less daunting.
Once faculty have ensured that their cloud storage satisfies the requisite information security standards, they must also carefully consider how well cloud storage facilitates long-term data preservation (as opposed to working storage). Data preservation involves a variety of activities, of which providing storage space is just one component. To understand the breadth of these activities, it can be helpful to consult the FAIR Principles. Preserved data should be made Findable through rich metadata, unique identifiers, and indexing in databases or search engines. Preserved data should also be Accessible through a standardized communications protocol that allows for authentication and authorization when necessary. Finally, preserved data should be Interoperable (compatible with a variety of applications and workflows) and ultimately Reusable. Depending on one’s needs, it seems possible to uphold the FAIR Principles using popular cloud storage solutions. For example, if I were to preserve my research data in Google Drive, I could express the metadata in XML (eXtensible Markup Language) and upload it to the same Drive alongside the data. However, Google Drive files may not be discoverable through Google Search; I would need to register the metadata in other discovery systems. Once the data is discovered, it remains to be seen how users could actually access it. Would I need to add them as specific contributors to my Drive? Could I make the Drive folder accessible to anyone with a link? And for how long could I rely on Google Drive to store my preserved data? Companies and their products change, after all. I map out this hypothetical data curation process not to dissuade librarians or faculty from using popular cloud storage for data preservation, but rather to highlight all of the accompanying considerations. By keeping abreast of developments in information technology and data preservation, librarians can play an important role in facilitating efficient and secure faculty use of cloud storage.
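For readers curious about what the XML metadata step in the hypothetical above might look like in practice, here is a minimal sketch in Python. It assumes a handful of Dublin Core elements as the metadata vocabulary (my choice for illustration; the schema that suits a given dataset may differ), and the function name, field values, and identifier are all hypothetical placeholders. The resulting text could be saved as a file and uploaded next to the data.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (an assumed vocabulary for this sketch).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_metadata_xml(fields):
    """Return an XML string with one Dublin Core element per field.

    `fields` maps Dublin Core element names (e.g. "title") to text values.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        # Namespaced tag, serialized as <dc:title>, <dc:creator>, etc.
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record: every value here is a placeholder, not real data.
record = {
    "title": "Example Survey Dataset",
    "creator": "A. Researcher",
    "date": "2019",
    "identifier": "placeholder-identifier",
    "description": "Placeholder description of the dataset.",
}

print(build_metadata_xml(record))
```

A file like this only addresses part of the Findable principle; registering the record in a discovery system and deciding how others gain access remain separate steps.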