September 27, 2023

Navigating Issues to Ensure Effective Utilization of Open Data

Data is often described as the “fuel” of the 21st century, and organizations that deploy effective data utilization strategies are well positioned to reap rewards in our increasingly connected and technologically advanced society. Organizations increasingly seek to enrich their proprietary data sources and improve their operations and product/service offerings by using open data, that is, datasets which have been made publicly available. For businesses where the sanctity and effective utilization of data are paramount, care is crucial when integrating open data into operations and product/service offerings. In the context of technology transactions, vendors should anticipate buyer diligence into how open data is used within a product/service, given the potential pitfalls associated with open data, and buyers should be aware of those pitfalls and consider them in their diligence activities.

What is open data?

The Open Knowledge Foundation—a global non-profit information sharing network—provides the following definition of open data:

“Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share-alike.”

Open data is typically made available through open data licenses, such as the Creative Commons licenses and the Open Data Commons Licenses. Certain organizations may also put bespoke licenses in place for their data - such as The World Bank’s “Summary Terms of Use”, or the UK government’s “UK Open Government Licence”.

Why does this matter now?

Open data is not a new concept—for instance, in the United Kingdom, open data has been commonly used in the public sector and industries such as health, transport and energy. However, as data-reliant technologies continue to be developed and integrated into everyday life at rapid speed—currently seen through developing technologies such as artificial intelligence (AI)—the demand for data will likely result in further demand for open data.

Similarly, the output from these technologies may generate new, innovative datasets which themselves may be made publicly available and expand the pool of available open datasets:

  • global operators of private hire vehicle companies are already making available anonymized, aggregated data from billions of trips taking place globally. The data is divided by city and includes speed data on vehicles travelling along certain city streets at particular times of the day, the average zone-to-zone travel time across different parts of a city, the average travel time from one part of a city to another depending on when in the day the journey takes place and the volume of travel activity across micromobility devices (such as scooters or bicycles) in certain cities. These datasets may be transformative for town planners, organizations seeking to open premises in new cities and organizations operating in residential and commercial property markets, among others. The open data may also be enriched with other open data sources, for instance, open data on motor vehicle collisions in New York City as maintained by NYC Open Data at the NYC Office of Technology and Innovation;[1]
  • organizations working on autonomous vehicles are increasingly releasing datasets from their development activities to the public. For instance, Audi has released its Autonomous Driving Dataset (A2D2), which contains an open dataset of more than 40,000 frames with semantic segmentation across 38 categories, which will assist organizations in training models for self-driving vehicles. Audi describes the initiative as removing the high entry barrier which comes with “equipping a vehicle with a multimodal sensor suite, recording a large dataset, and labelling it;” and
  • in the European Union, the new Data Act, which will open up access to information generated by internet-of-things (“IoT”) devices, is another indication of the pressure to make data available for broader use.

Key Considerations in Using Open Data

Navigating the risks

In-house counsel teams for organizations seeking to benefit from open data, whether by integrating open data into their operations or by making their own proprietary data publicly available as open data, should be aware of the following key legal risks associated with using open data:

  • Open data license terms: The terms of the open data license may be onerous or incompatible with the organization’s goals. Common restrictions include:
    • Use restrictions: The open data license may be subject to geographic or time limits, or may not permit commercial exploitation. For instance, Yelp maintains an open dataset of millions of business reviews,[2] but Yelp’s terms of use prevent commercial exploitation of the underlying data.
    • Share-alike terms: Using open datasets governed by licenses which include share-alike obligations may present a risk for organizations and individuals seeking to commercialize their proprietary datasets. This risk arises from share-alike terms obliging users to share publicly any adaptations, alterations or other modifications of the underlying open dataset on the same permissive terms as the share-alike license. In doing so, the user potentially dilutes the commercialization opportunities in their proprietary dataset; for instance, any licensing opportunities arising from any intellectual property or associated right subsisting in the user’s proprietary dataset.
    • Data “as is” and liability exclusions: Typically, open data licenses do not include contractual protections such as warranties with respect to data accuracy, and they do include broad liability exclusion clauses. The user is therefore left entirely at risk as to the quality, accuracy, relevance, bias and other aspects of the data.
  • Privacy Risks: Organizations must comply with applicable data protection laws even for open data. Organizations should not assume that the entity that made the data publicly available has taken sufficient steps to ensure the open data has been collected in accordance with data protection laws. For instance, if the organization utilizing the open data is subject to the European Union’s General Data Protection Regulation (“GDPR”), the organization is likely to be deemed a data controller for any personal data contained in the open dataset and would therefore need to comply with the obligations which come with that responsibility. Such obligations include ensuring appropriate organizational and technical measures are in place to maintain the security of the data, maintaining records of processing activities and ensuring that appropriate privacy policies are implemented to inform data subjects how their personal data is being processed and of their rights under the GDPR.
  • Data Supply: The value of the data to a user may depend on the providing organization continuing to collect and supply the open datasets, which creates a supply chain-type risk. For instance, in stretched economic times, public sector bodies may decide to “turn off the tap” on the collection and supply of some open data sources, or may move to different methods of collection and/or supply which render the underlying dataset less valuable to the user.
  • Misuse of Confidential Information: The fact that data has been made available on an open data basis does not remove the possibility that the data was collected in breach of a confidentiality obligation and may therefore be restricted from further redistribution.

Minimizing risks from open data utilization through diligence and operational processes

  • In-house counsel should consider developing and maintaining appropriate due diligence processes surrounding use of open data which meet the needs of the organization. This is important from a legal perspective, given open data providers typically disclaim all liability with respect to the data, and also from an operational perspective, to ensure open data is only onboarded from credible sources. Failure to do so could contaminate the onboarding organization’s proprietary dataset and/or lead to false or otherwise poor output, depending on the purposes for which the open data is utilized. Operationally, this could be achieved by appointing relevant individuals to oversee open data usage within the organization; those individuals may hold different expertise depending on the nature of the dataset, such as privacy professionals, IT professionals and data science/ethics-type roles.
  • In-house counsel should also ensure that the licenses governing the open data usage are reviewed to ensure any usage restrictions are not inconsistent with the plans of the onboarding organization – for instance, the appropriateness of utilizing open data which is subject to a share-alike license. To facilitate this process, in-house counsel teams could implement an internal open data use policy that explains which types of open data license are acceptable for use in the organization and where escalation is required prior to use of open data subject to certain other licenses. They should also monitor any industry or sector licenses which emerge in Europe for IoT data as a result of the EU Data Act, which may set the tone for broader open data licensing in particular sectors.
  • When transacting with open data and open data-rich products or services, in-house counsel for organizations supplying those products and services should avoid providing contractual assurances to customers or users on items such as the accuracy, sufficiency, completeness or relevance of the open data or associated products/services. Providing such assurances exposes the organization to a risk which is not shared with the data providers, given the “as is” basis on which the data was supplied. In addition, the open data is not proprietary to the organization and may, in any event, be subject to ongoing updates or modifications from contributors in the open data community which impact the sufficiency of the open data provided in the supplied goods or services.
  • Data becomes historic quickly. Therefore, in-house counsel for organizations should regularly monitor their use of existing open datasets to ensure that the data matches the goals of the organization, and that the most up-to-date data is being utilized where the product/service offering is reliant on the most current datasets. This regular monitoring complements the diligence processes relating to the source of the dataset. For instance, if a credible open dataset provider is committed to providing regular updates to a particular dataset, the dataset will likely be more valuable to organizations utilizing the data to build technology modules, given the ability to continually update the underlying datasets. By contrast, a product which places material reliance on an open dataset that is not being regularly updated by its creators/contributors may lose value over time unless the owning organization undertakes further engineering to find replacement datasets.
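
For organizations that want to support an internal open data use policy with tooling, the license review and freshness monitoring described above could be sketched roughly as follows. This is an illustration only, under stated assumptions: the license lists, the one-year staleness threshold and the names review_license and is_stale are hypothetical and are not recommendations. Which licenses are acceptable, and when escalation is required, is a decision for each organization's counsel.

```python
from datetime import date, timedelta
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    ESCALATE = "escalate"


# Hypothetical policy lists using SPDX-style license identifiers.
# These groupings are illustrative assumptions, not legal advice.
PRE_APPROVED = {"CC0-1.0", "CC-BY-4.0", "OGL-UK-3.0"}
SHARE_ALIKE = {"CC-BY-SA-4.0", "ODbL-1.0"}


def review_license(license_id: str) -> tuple[Decision, str]:
    """Return a policy decision and a short reason for a dataset license."""
    if license_id in PRE_APPROVED:
        return Decision.APPROVED, "pre-approved license"
    if license_id in SHARE_ALIKE:
        return Decision.ESCALATE, "share-alike terms need legal review"
    # Anything the policy does not recognize is escalated, never auto-approved.
    return Decision.ESCALATE, "license not covered by policy"


def is_stale(last_updated: date, today: date, max_age_days: int = 365) -> bool:
    """Flag a dataset whose last update is older than the assumed threshold."""
    return today - last_updated > timedelta(days=max_age_days)
```

A dataset register could call review_license when a new source is proposed and run is_stale periodically against each provider's published update date. Escalating by default reflects the point that unfamiliar or bespoke licenses should be reviewed before use.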

[1] https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95
[2] https://www.yelp.com/dataset
