Open Access to Data and Information during Public Health Emergencies

Tyng-Ruey Chuang
Institute of Information Science, Academia Sinica, Taiwan

[Note: This is a pre-recorded video for a presentation at IASC 2021 Knowledge Commons Virtual Conference.]

Collection: https://m.odw.tw/u/trc/collection/access2data-iasc2021kc

~~~~~ script ~~~~~

--- 0 ---

Hi, I am Tyng-Ruey Chuang from the Institute of Information Science, Academia Sinica, Taiwan.

While preparing this talk, I realized that I could only focus on data and information, rather than knowledge in general, so I needed to change the title.

I am sorry about this.

It is now titled "open access to data and information during public health emergencies".

--- 1 ---

Let's begin by looking into three areas of focus.

They are data, information, and population.

In the current COVID-19 pandemic, we have seen collaborative efforts in producing public health data and information resources.

For data, we will look into data integration.

That is, we will look at examples where data from different sources are aggregated and processed to an extent so that the end results are readily useful.

For information, we will look into information dissemination.

That is, we will look at examples where useful information is distributed to the public, and how people collaborate on that.

Individuals in a population, for sure, are the basic units producing data and consuming information.

We will identify several actors, especially non-profits and communities, that are instrumental in data integration and information dissemination.

The three areas are linked in a circular way.

Of course, there are other areas linked to and from these three.

--- 2 ---

We now look at two examples in data integration: "Our World in Data", and GISAID.

GISAID began as the Global Initiative on Sharing Avian Influenza Data; the name now stands for the Global Initiative on Sharing All Influenza Data.

"Our World in Data" provides timely worldwide COVID-19 datasets that can be broken down by countries.

It is supported by grants, and has a long list of sponsors, including donations from more than 4,000 individuals.

Our World in Data is a public good.

It provides more than raw datasets; it is actually a data service.

The GISAID website accepts influenza virus sequences from around the world and it aggregates them into datasets for scientific research.

It has received 1.8 million SARS-CoV-2 genomes since early January 2020.

It was created as an alternative to the public domain model of sharing virus sequences.

There is a "Database Access Agreement" that users must first agree to before they can start submitting virus sequences to, and using sequences from, the database.

GISAID is more like a club good.

Members do not need to pay but are bound by the agreement they sign up to.

Our World in Data and GISAID, in my view, are not typical "commons", as there are no clearly defined boundaries for either the resources or the communities around them.

Free-riding cannot be avoided and it is not actually discouraged.

Incentives, however, are an issue.

These resources are financially sustained by actors who are external to the communities around the data.

--- 3 ---

We now look into the WHO and Wikimedia cooperation in disseminating COVID-19 educational materials.

We also take note of the circulation of disease misinformation and intelligence on social media.

The WHO-Wikimedia collaboration makes it easier to include WHO public health media files in Wikimedia Commons, which is a digital library operated by the Wikimedia Foundation.

Wikipedia's COVID-19 coverage, however, is still produced by volunteer editors.

But they can now build on the materials uploaded to Wikimedia Commons by WHO.

The collaboration between an intergovernmental organization and a non-profit with fluid membership is interesting in several ways.

This cooperation should also be viewed in a context where Wikipedia is not accessible in China and several other countries.

Taiwan is also excluded from participating in the WHO.

Taiwanese editors, however, are major contributors to the Chinese language Wikipedia.

This highlights the current exclusion and fragmentation in the global dissemination of information.

We also need to remember that information from the authorities is not necessarily useful.

But gossip on social media can be useful intelligence about disease outbreak.

For example, the WHO on January 14, 2020, stated on Twitter that there was no clear evidence of human-to-human transmission of the coronavirus.

Two weeks before that, however, someone on a popular BBS in Taiwan had warned about a suspected cluster of coronavirus infections in Wuhan.

The authorities, the disseminators, and the individuals are embedded in one another's information enterprises.

It is the interaction among them that shapes the global reach of useful information.

--- 4 ---

During the COVID-19 pandemic, the global population is the carrier of public health data and information.

Each person is a reporting unit for disease data points.

Without the viruses extracted from individuals, there would be no GISAID database to study the disease.

The COVID-19 datasets from "Our World in Data" all come from the affected populations.

If public health data and information are viewed as a global commons, we see that there are multiple pools of resources and there are many actors.

The boundaries between the pools can be fuzzy as data can be repurposed and information can be enriched.

Instead of trying to draw those boundaries, we look at the stewardship of data, tools, and projects.

Many of the stewards are not profit-seeking entities.

For data stewardship, we have examples from GISAID, Our World in Data, and Wikimedia.

There are also project stewards that support tool development, such as the Debian Project, Linux Foundation, and Software Freedom Conservancy.

In addition, the stewardship of public licenses by Creative Commons and others makes it easier for people to share data, code, and content.

As non-profits, the issues they face are mainly about governance and sustainability.

--- 5 ---

We now revisit Ostrom's design principles for common-pool resources, or CPRs.

The nature of data and information is different from that of natural resources.

Data and information have no natural boundaries, and their marginal cost of distribution is close to zero.

Ostrom's design principle no. 1 refers to "clearly defined boundaries".

That is, "individuals or households who have rights to withdraw resource units from the CPR must be clearly defined, as must the boundaries of the CPR itself".

In providing open access to public health data and information, we see that there are mutually dependent actors.

Several actors collaborate to maintain a pool of resources, and an actor may work on several pools.

They are recursively related, and so are the boundaries between them.

Ostrom's design principle no. 8 refers to "nested enterprises".

That is, for CPRs that are parts of larger systems, "Appropriation, provision, monitoring, enforcement, conflict resolution, and governance activities are organized in multiple layers of nested enterprises".

For public health purposes, we can see that the populations themselves produce the data, and useful information is derived from that data.

Based on the derived information, the populations take further actions.

Under Ostrom's principles, it is natural to envision one CPR nested in another, as they are homogeneous in nature.

In public health emergencies, by contrast, the three areas of data, information, and population are circularly connected to one another.

Together they form circular enterprises.

This concludes my presentation.

Thank you!


