AI Tools: ChatGPT, GitHub Copilot – The Intellectual Property and Data Privacy Concerns You Should Assess Before Commercial Use

Macher Tecnologia

2 years ago

AI tools can be both a risk and an opportunity for your organization. In this article, learn how to better assess the available solutions, balancing innovation needs and risk management.

Today we will cover a topic that has been trending for several weeks: ChatGPT and the Data Privacy concerns around it. We will also open a conversation about other AI tools, such as GitHub Copilot, and the related Data Privacy and Intellectual Property concerns.

This article is relevant for Software Engineers, Developers, Agilists, Project Managers, Security and IT managers.

Hello readers!

Welcome to Macher Tecnologia’s blog.

The power and reach of AI tools in our current digital world are unquestionable. Companies are expected to transform their business with the help of any viable technology at hand. AI tools such as ChatGPT and GitHub Copilot are available at our fingertips, and we are expected, at the very least, to take a look and try them out.

It is equally unquestionable how tempting it is to pilot a project or start using them, given their simple approach, how effective they seem to be and how much time they can save our teams.

So what could be the concerns over their use?

Data Privacy: Unfortunately, not everything is as easy as it seems to be…

As seen in the past few days, ChatGPT has been temporarily “banned” in Italy due to privacy concerns. Regulators are concerned that personal data might have been used to train the model and populate data repositories without data subjects being properly informed and without the required level of transparency.

OpenAI has said that it is committed to GDPR and other privacy laws (we presume the Brazilian LGPD too). However, because several data sources (books, articles, websites, blogs and posts) are scraped and used to produce these amazing, state-of-the-art responses, the Italian concerns may well have a reasonable basis.

A recent Ars Technica article argues that the privacy concerns lie, for example, in:

- A change in the conditions (legal basis, terms, etc.) under which the source data was originally published, disclosed or made available on the internet;
- The apparent lack of disclosed procedures at OpenAI for data subjects to verify whether the company stores data about them and which data, or to exercise their rights.

But the concerns are not limited to Data Privacy. There are Intellectual Property concerns as well.

If you plan to incorporate such AI into one of your products, or into a large-scale process or distribution, it is important to involve your legal and privacy teams.

A few months ago, Forbes published an article by Joe McKendrick in which he shares several thoughts and interpretations on who owns the intellectual property generated by AI, or used as the source for the AI’s end result.

As the article shows, there is no single or easy answer.

On the other hand, the Ars Technica article points out that AI tools may not be properly compensating the original data owners for the use of their intellectual property, and that they may even bypass news article restrictions (paid subscriptions) to access content and render results. This issue can also have ramifications for what you are planning to do with AI. Corroborating the article, Bard, Google’s AI, was caught in a plagiarism-like situation where it took content from the web and presented it to a user as if it were its own. Later, Bard admitted that what it did could potentially be considered plagiarism.

How do Data Privacy and Intellectual Property matters affect developers and IT teams?

Let’s take a look at something “closer” to Dev teams: a bit older, but still relevant to this analysis. About two years ago, GitHub launched Copilot, a powerful tool that uses machine learning to suggest and auto-complete code for developers. However, its use raises important intellectual property alerts that teams must be aware of.

Firstly, it is important to understand that GitHub Copilot generates results from a large variety of code and data. This includes open-source code, proprietary code, and other forms of intellectual property. Implementing code generated by GitHub Copilot could therefore infringe intellectual property laws: its suggestions may unintentionally replicate or adapt patented code, exposing companies to legal action for patent infringement. It is worth noting, though, that GitHub has stated that it has implemented measures to minimize the use of proprietary code.

In any case, open source is not necessarily “free” in every sense. Developers and companies are obligated to follow the licensing terms of the code and libraries they use. Failing to do so can lead to financial and reputational damage. It can also lead to a loss of intellectual property rights: where a license requires modifications or derivative works to be released under the same terms, the company may be prevented from selling or licensing its software to others.

Open-source usage: A side conversation…

A recent study from Synopsys shows that:

- 54% of codebases contained licensing conflicts;
- 91% of codebases contained code that had not been updated or maintained in the past 2 years;
- 11% of Java codebases contained a vulnerable version of Log4J;
- 87% had some sort of vulnerability.

Synopsys evaluated over 1700 commercial codebases from 17 industries.
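
To make the licensing point concrete, here is a minimal Python sketch of how a team might flag dependencies whose declared license deserves a closer look. It only reads the metadata that installed packages declare about themselves, using the standard-library `importlib.metadata` module. The keyword list in `REVIEW_KEYWORDS` and the function names are illustrative assumptions, not a legal policy, and declared metadata can be missing or wrong, so treat the output as a starting point for a conversation with your Legal team rather than a verdict.

```python
from importlib import metadata

# Hypothetical policy (assumption): license keywords that trigger a review.
# "GPL" also matches LGPL and AGPL because it is a substring of both.
REVIEW_KEYWORDS = ("GPL", "SSPL")


def needs_review(license_text):
    """Return True when a license string is missing or matches the policy."""
    if not license_text or license_text.strip().upper() in {"", "UNKNOWN"}:
        return True  # an undeclared license is itself a risk
    upper = license_text.upper()
    return any(keyword in upper for keyword in REVIEW_KEYWORDS)


def declared_license(dist):
    """Best-effort lookup of the license a package declares in its metadata."""
    meta = dist.metadata
    for field in ("License-Expression", "License"):
        value = meta.get(field)
        if value and value.strip().upper() != "UNKNOWN":
            return value.strip()
    # Fall back to trove classifiers such as
    # "License :: OSI Approved :: MIT License".
    classifiers = meta.get_all("Classifier") or []
    licenses = [c.split("::")[-1].strip()
                for c in classifiers if c.startswith("License ::")]
    return "; ".join(licenses)


def audit_installed_packages():
    """Return sorted (name, license) pairs that the policy sends to review."""
    flagged = []
    for dist in metadata.distributions():
        license_text = declared_license(dist)
        if needs_review(license_text):
            name = dist.metadata.get("Name", "unknown")
            flagged.append((name, license_text or "undeclared"))
    return sorted(flagged)
```

Calling `audit_installed_packages()` inside a project’s virtual environment returns the packages worth raising with your Legal team. A dedicated software composition analysis tool will go much further than this sketch.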

Why mix ChatGPT, GitHub Copilot and Open Source together?

Well, it is just a small-sized illustration of our current AI business scenario!

If GitHub Copilot draws on open-source code that contains vulnerabilities, the code it outputs may contain the same issues. If GitHub Copilot and ChatGPT access and use copyrighted materials, the outcome could expose users and organizations to some level of risk. If an AI tool’s source data contains personal data and you end up using it in a non-anonymized form, your company could be violating data privacy laws and regulations. If that source data contains illegally acquired personal data, all subsequent use is equally illegal.

But at the end of the day, it all comes down to making informed decisions while accepting some risk, and seizing some opportunities, to stay relevant in your market.

It is equally important to note that we are not discouraging the use of open-source tools or AI. Their use simply has to be treated like that of any other tool and assessed from a strategic point of view.

As a team member working for an organization of any size, your decision on whether or not to use an AI tool, open-source code or any given solution must be guided by some principles:

- Review the terms of use and privacy policies thoroughly;
- Follow internal IT processes and procedures (e.g., Secure Software Development);
- Discuss with peers and managers;
- Map risks and mitigation approaches;
- Discuss with IT, Privacy and Legal teams;
- Even when following all internal policies, be careful about what you share. Recently, Samsung employees unintentionally leaked confidential intellectual property into ChatGPT. I am sure you don’t want to be in this situation!

This way, at the end of the day, the risk (and/or opportunity) is accepted at the company level and not at an individual level. Following this approach, you not only add value to your technical advice by protecting the company’s business and reputation, but also protect yourself as an individual and a professional.

As someone very wise once said, “With great power comes great responsibility”. AI is a great power… and the responsibility for its ethical use rests with you.
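
The last principle in the list, being careful about what you share, can be partly supported by tooling. Below is a minimal Python sketch that masks a few obviously sensitive patterns before a piece of text is pasted into an external AI tool. The patterns in `REDACTION_PATTERNS` (e-mail addresses, `key=value` style secrets and the Brazilian CPF number format) and the `redact` function are illustrative assumptions. A regular expression cannot recognize confidential source code or business context, so this complements human judgment and internal policy rather than replacing them.

```python
import re

# Illustrative patterns only (assumptions): a real policy needs patterns that
# match the organization's own identifiers, secrets and data classes.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SECRET": re.compile(
        r"\b(?:api[_-]?key|token|password)\b\s*[:=]\s*\S+", re.IGNORECASE
    ),
    "CPF": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
}


def redact(text):
    """Replace each match with a placeholder and report which labels fired."""
    found = []
    for label, pattern in REDACTION_PATTERNS.items():
        text, count = pattern.subn(f"[{label} REDACTED]", text)
        if count:
            found.append(label)
    return text, found


prompt = "Contact ana@example.com, api_key=abc123 please"
clean_prompt, labels = redact(prompt)
print(clean_prompt)  # Contact [EMAIL REDACTED], [SECRET REDACTED] please
print(labels)        # ['EMAIL', 'SECRET']
```

Returning the list of labels alongside the cleaned text lets a team log what was about to be shared, which can feed the risk mapping discussed above.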

Hope you enjoyed this article and that it provides good insights to you and your business.

If you need specialized support on Data Privacy and Web Development teams, reach out to us! An agent will reply shortly!
