November 06, 2024

Mitigating Risks of Monetizing Open Data


Increasingly, companies are using publicly available (often free) “open” data in myriad ways to improve and expand their offerings and strengthen their competitiveness. For instance, Yelp enhances its restaurant reviews with data from municipal health inspection reports,1 and data from property sales or new business licenses2 can help companies identify and qualify new customers. The growing power of artificial intelligence tools further increases the potential value of data available under “open data” licenses.

Using open data involves risks, including problems with the data’s accuracy, validity and continued availability; the burden of license compliance; potential liability for infringement of intellectual property (“IP”); and potential loss of the commercial value of proprietary data. By adopting best practices around open data, companies can mitigate or avoid these risks and the financial loss and business disruption that may result from them.

What Is Open Data?

There is no single definition of “open data.” The Open Knowledge Foundation—a global non-profit information-sharing network—defines open data as:

“… data that can be freely used, re-used and redistributed by anyone -- subject only, at most, to the requirement to attribute and sharealike.”3,4

Open data is usually made available under “open data” licenses, the terms of which vary widely.5 Two popular examples are the Creative Commons (“CC”)6 and Open Data Commons7 families of licenses. Open data may also be distributed under bespoke licenses with their own requirements, such as The World Bank’s “Summary Terms of Use”8 and the “UK Open Government Licence.”9 Some providers make their datasets available under an informal license or without any license at all.

Navigating the Risks

The risks of using open data arise from the licensing models under which it is offered and from the way it is collected and provided. They include the following:

  • Varying Open Data License Terms. There are significant differences among open data licenses. For instance, in the CC license family, the CC0 license10 has few restrictions but the CC-BY license11 requires users to provide notice of the license terms and licensor attribution. Depending on how the data is used, the license may conflict with a company’s business objectives.
  • Commercial Use Restrictions. The license may prohibit commercial exploitation of the open data, thus precluding a company from using the open data in its products. For example, Yelp’s terms of use prohibit commercial use of its open dataset of millions of business reviews.12
  • Share-Alike Requirements. Some open data licenses have share-alike requirements that obligate the licensee to make freely available to the public any data combined with or included in the open data, sometimes subject to limited exclusions. For example, the Open Database License13 generally requires such sharing, but it also includes exclusions that exempt some commercially valuable uses from the sharing requirements.
  • “As-Is”, Without Customary Licensee Protections. Open data licenses typically omit licensee protections often found in commercial license agreements. There may therefore be little assurance that the data will conform to any specifications or be accurate, up to date, non-infringing or suitable for the intended uses, and companies bear the entire risk of using such data.
  • No Assurance of Continued Availability or Reliability. Open data licensors are generally not committed to continue to provide the open data. Even if the open data remain available, there is rarely any commitment by the providers to (1) continue using equally reliable methods and sources of collection, or (2) retain the license terms governing such open data. Mitigating this risk requires companies to regularly monitor open data sources for any advance indication of such changes.
  • Open Data Licenses May Impact Product Terms and Conditions. Companies using open data in a product should confirm that the product’s terms of use are consistent with the applicable open data license terms. Notably, some open data licenses impose restrictions that must be flowed down to the product’s terms of use. Moreover, companies may need to exclude the open data from the assurances they give customers regarding the accuracy and availability of their products.
  • Open Data May Be Collected Without Observance of Confidentiality or Other Legal Requirements. Open datasets may include data that have not been collected in a legally adequate manner. The open data provider may not have obtained the required consents from all data contributors or taken adequate care to avoid unlawful data bias. Similarly, the provider may not own sufficient IP rights in data or may have violated applicable confidentiality or other restrictions in collecting and sharing the data. Further, if the provider permits contributors to retain ownership of their data (in contrast to requiring them to assign their IP rights to the organization), user companies may be subject to claims by individual data contributors for breach of the open data license.
  • Privacy Risks. Even “anonymized” open data may still be treated as personal data subject to privacy laws, given how difficult it is to meet the stringent de-identification requirements of those laws. Further, if a company subject to the EU’s General Data Protection Regulation uses open data that incorporate personal data for its own purposes, that company is likely to be deemed the “data controller” of such personal data and, therefore, required to comply with substantial legal burdens.
  • Open Data Licenses May Conflict. When open data from multiple sources are used together, the open data licenses that govern them may be legally incompatible. If this occurs, it may not be possible for the company to comply with all of them at once.

Mitigating the Risks

Developing and adhering to clearly defined processes and policies regarding open data can help companies realize the benefits, and mitigate the risks, of open data. The particulars of these processes and policies depend on the company’s use of open data. For example, a company that distributes products based on open data likely needs a more detailed open data policy than one that only uses open data for internal purposes. While there is no “one size fits all” approach, we generally recommend the following to companies using open data:  

  • Identify Key Objectives and Stakeholders. As a first step, identify key legal objectives (such as complying with open data licenses and avoiding conflicting license terms) and key business objectives (such as being able to charge fees for the right to use the company’s proprietary data and avoiding any requirement to share that data with, or license it to, others). Involving a range of stakeholders in this process can help make the resulting policy practical and more likely to be followed. For instance, seek guidance from each stakeholder in their area of expertise (e.g., privacy professionals, IP lawyers, data scientists and ethics professionals, IT professionals, or product business “owners”).
  • Conduct Due Diligence on the Open Data Before Using It. When open data are to be used in a company product or are used in products acquired from third parties, evaluate that open data to understand its context, such as who contributed to its creation, the legal basis upon which contributions were made (e.g., by assignment of IP rights or a license granted to the organization compiling the open data), the manner of collection, the continued availability of the underlying data, and the steps taken to ensure the accuracy and credibility of the data. Use these details to evaluate whether the proposed use of an open dataset fits within the company’s risk tolerance.
  • Modify Customer Contracts to Promote License Compliance and Have Appropriate Risk Allocations for Open Data. Ensure the company’s customer contracts for products (i) contain terms that are consistent with those governing the open data used in such products, and (ii) appropriately allocate the risks of inaccuracies, infringement and other problems arising from such open data.
  • Modify Supplier Contracts to Address Open Data Risk. Modify vendor, consulting, and development agreements to require suppliers to report to the company all open data proposed to be used by the supplier for the company and to obtain the company’s approval of the type and nature of such desired use. Use these reports to avoid unwittingly becoming subject to share-alike and other open data license requirements that could be inconsistent with the company’s business objectives.
  • Implement Formal Open Data Policies.
    • These policies may be similar to controls already used to prevent the use of “viral” (or “copyleft”) open source software (which may require the final product created using such software to be made freely available to the public).14 Similarly, in the open data context, companies often strive to avoid using open data governed by share-alike licenses.
    • Deploy open data policies for employees to follow when interacting with open datasets, including:
      • Open data-focused checklists of considerations to assess the company’s potential use of new tools and systems;
      • A streamlined open data license approval process that designates open data licenses as “GREEN” (acceptable for use without further approval), “YELLOW” (may be acceptable depending on the proposed use) or “RED” (strictly prohibited without special approval from legal or management);15 and
      • Policies for regular monitoring or scanning of (i) code and databases to identify non-approved open data use, and (ii) open data sources to detect any changes to the applicable open data license terms or to the characteristics of the open data.
    • Design the open data policy for ease of use and, when finalized, communicate it to all relevant personnel. Just as companies conduct regular training on cybersecurity and workplace harassment, open data training for those involved in product development and procurement is critical.
    • Update and repeat this training periodically to address new developments related to open data (e.g., the release of a new version of a popular open data license, a judicial decision interpreting the terms of an open data license).
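For companies that keep a machine-readable inventory of the open datasets they use, the “GREEN”/“YELLOW”/“RED” classification and the monitoring of license terms can be partly automated. The Python sketch below is illustrative only and rests on stated assumptions: licenses are identified by SPDX-style short names, the dataset names are hypothetical, and the tier assigned to each license is a hypothetical example policy rather than a recommendation. Actual classifications should be set with legal counsel based on the company’s intended uses. Unlisted licenses default to “RED” so they are routed for review, and a hash of each approved license text is compared with the currently published text to flag changes in terms.

```python
import hashlib
from enum import Enum


class Tier(Enum):
    """Approval tiers mirroring a GREEN/YELLOW/RED open data policy."""
    GREEN = "acceptable for use without further approval"
    YELLOW = "may be acceptable depending on the proposed use"
    RED = "prohibited without special approval from legal or management"


# Hypothetical example policy only; real tier assignments must come from counsel.
LICENSE_TIERS = {
    "CC0-1.0": Tier.GREEN,
    "CC-BY-4.0": Tier.YELLOW,     # attribution and notice obligations
    "ODbL-1.0": Tier.RED,         # share-alike obligations
    "CC-BY-NC-4.0": Tier.RED,     # non-commercial restriction
}


def classify(license_id: str) -> Tier:
    """Return the tier for a license; unknown licenses default to RED for review."""
    return LICENSE_TIERS.get(license_id, Tier.RED)


def needs_review(inventory: dict[str, str]) -> list[str]:
    """Given {dataset name: license id}, list (sorted) the datasets that are not GREEN."""
    return sorted(
        name for name, lic in inventory.items() if classify(lic) is not Tier.GREEN
    )


def fingerprint(license_text: str) -> str:
    """SHA-256 hash of a license text, recorded when a dataset is approved."""
    return hashlib.sha256(license_text.encode("utf-8")).hexdigest()


def license_changed(recorded_hash: str, current_text: str) -> bool:
    """True if the currently published license text differs from the approved version."""
    return fingerprint(current_text) != recorded_hash


if __name__ == "__main__":
    # Hypothetical inventory of datasets and the licenses they are offered under.
    inventory = {
        "health_inspections": "CC0-1.0",
        "business_reviews": "CC-BY-NC-4.0",
        "property_sales": "CC-BY-4.0",
    }
    print(needs_review(inventory))  # ['business_reviews', 'property_sales']
```

Hashing only shows that a license text has changed, not whether the change matters; any flagged change would still need review by the appropriate stakeholders.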

Conclusion

Open data offers value, often at little or no up-front cost. However, open data licenses typically provide few protections beyond protection against claims by the licensor for unlicensed use. We recommend that companies using open data adhere to rigorous due diligence processes, proactively monitor and enforce open data license and use restrictions, and adopt “open data” policies to manage the accuracy, availability, infringement, bias and other risks associated with open data. By doing so, companies can capitalize on the benefits of open data while helping to mitigate the associated risks and maintain the trust and confidence of their customers and stakeholders.


1 “Where Does Yelp Get Health Score Information?” Yelp Business Help Center, accessed August 7, 2024.

2 “What Is the Verified License Badge?” Yelp Business Help Center, accessed August 7, 2024.

3 Laura James, “Defining Open Data,” Open Knowledge Foundation, accessed August 7, 2024.

4 Generally speaking, in open data parlance, “attribution” refers to giving the licensor credit for providing the open data, and “sharealike” means that any data that is derived from, or combined with, the open data must be made available under the same or compatible terms as the original open data. Sharealike requirements often raise major concerns for companies that want to preserve the value of proprietary data to be mixed with open data.

5 Alex Ball, “How to License Research Data,” Digital Curation Centre, accessed August 7, 2024.

6 “About CC Licenses,” Creative Commons, accessed August 7, 2024.

7 “Licenses,” Open Data Commons, accessed August 7, 2024.

8 “Terms of Use for World Bank Group Textual Records,” World Bank Group, accessed August 7, 2024.

9 “Open Government Licence for Public Sector Information,” The National Archives, accessed August 7, 2024.

10 “CC0 1.0 Universal Legal Code,” Creative Commons, accessed August 7, 2024.

11 “Attribution 2.0 Generic Legal Code,” Creative Commons, accessed August 7, 2024.

12 “Yelp Open Dataset,” Yelp, accessed August 7, 2024.

13 “Open Data Commons Open Database License,” Open Knowledge Foundation, accessed August 7, 2024.

14 Paul Chandler and Emily Nash, “The Importance of Tracking and Managing the Use of Open Source Software,” Mayer Brown LLP, October 6, 2022.

15 Id.
