Language Models Like ChatGPT Could Be Plagiarising in More Ways Than Just ‘Copy-Paste’, Say Researchers

A Penn State-led research team tested OpenAI's GPT-2 for plagiarism.

By ANI | Updated: 20 February 2023 18:12 IST

ChatGPT has already been banned in several schools in the US

Highlights
  • Study can help AI researchers build more robust language models in future
  • The results of the study only apply to GPT-2
  • Researchers will present their findings at the 2023 ACM Web Conference

Language models, presumably including ChatGPT, raise plagiarism concerns when they paraphrase and reuse ideas from their training data without citing the original source.

Students might want to think twice before finishing their next assignment with a chatbot. According to a Penn State-led research team that conducted the first study to look specifically at the topic, language models that generate text in response to user prompts plagiarise content in more ways than one.

"Plagiarism comes in different flavours," said Dongwon Lee, professor of information sciences and technology at Penn State. "We wanted to see if language models not only copy and paste but resort to more sophisticated forms of plagiarism without realizing it."

The researchers focused on identifying three forms of plagiarism: verbatim, or directly copying and pasting content; paraphrasing, or rewording and restructuring content without citing the original source; and idea, or using the main idea from a text without proper attribution. They constructed a pipeline for automated plagiarism detection and tested it against OpenAI's GPT-2 because the language model's training data is available online, allowing the researchers to compare generated texts to the 8 million documents used to pre-train GPT-2.
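To make the simplest of the three categories concrete, here is a minimal Python sketch of a verbatim check. It is not the researchers' pipeline, which the article does not describe in detail. It only assumes that verbatim reuse shows up as a long run of identical consecutive words shared by a generated text and a training document. The function names and the `min_words` threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def longest_shared_run(generated: str, source: str) -> List[str]:
    """Return the longest run of consecutive words found in both texts."""
    gen_tokens = generated.lower().split()
    src_tokens = source.lower().split()
    # autojunk=False keeps frequent words (e.g. "the") in the comparison.
    matcher = SequenceMatcher(None, gen_tokens, src_tokens, autojunk=False)
    match = matcher.find_longest_match(0, len(gen_tokens), 0, len(src_tokens))
    return gen_tokens[match.a:match.a + match.size]


def looks_verbatim(generated: str, source: str, min_words: int = 8) -> bool:
    """Flag a pair when the shared run reaches min_words (an assumed threshold)."""
    return len(longest_shared_run(generated, source)) >= min_words


if __name__ == "__main__":
    generated = ("the model said that the quick brown fox jumps over "
                 "the lazy dog near the river today")
    source = ("a classic sentence is the quick brown fox jumps over "
              "the lazy dog near the river bank")
    print(" ".join(longest_shared_run(generated, source)))
    print(looks_verbatim(generated, source))
```

A check like this catches only copy-and-paste reuse. Paraphrase and idea plagiarism change the wording, so they would need a comparison based on meaning rather than exact word matches.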

The scientists used 210,000 generated texts to test for plagiarism in pre-trained language models and fine-tuned language models, or models trained further to focus on specific topic areas. In this case, the team fine-tuned three language models to focus on scientific documents, scholarly articles related to COVID-19, and patent claims. They used an open-source search engine to retrieve the top 10 training documents most similar to each generated text and modified an existing text alignment algorithm to better detect instances of verbatim, paraphrase and idea plagiarism.
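The retrieval step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the search engine the team used: it swaps in a plain bag-of-words cosine similarity, and the names `vectorize`, `cosine` and `top_k_similar` are hypothetical. Only the idea of ranking training documents by similarity to a generated text, and the default of 10 results, come from the article.

```python
import math
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Turn a text into word counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(generated: str, corpus: List[str], k: int = 10) -> List[int]:
    """Return indices of up to k corpus documents most similar to the text."""
    query = vectorize(generated)
    scored = [(cosine(query, vectorize(doc)), i) for i, doc in enumerate(corpus)]
    # Highest score first; ties are broken by the lower document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:k] if score > 0]


if __name__ == "__main__":
    corpus = [
        "patent claim for a battery device",
        "covid-19 vaccine trial results",
        "the battery device claim describes a cell",
    ]
    print(top_k_similar("a battery device claim", corpus))
```

Each retrieved document would then be passed to a finer-grained comparison, which is the role the article assigns to the modified text alignment algorithm. Scanning every document this way would be far too slow for 8 million texts, which is presumably why an indexed search engine was used instead.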

The team found that the language models committed all three types of plagiarism, and that plagiarism occurred more often as the training dataset and the number of model parameters grew. They also noted that fine-tuning reduced verbatim plagiarism but increased instances of paraphrase and idea plagiarism. In addition, they identified instances of the language model exposing individuals' private information through all three forms of plagiarism. The researchers will present their findings at the 2023 ACM Web Conference, which takes place from April 30 to May 4 in Austin, Texas.

"People pursue large language models because the larger the model gets, generation abilities increase," said lead author Jooyoung Lee, a doctoral student in the College of Information Sciences and Technology at Penn State. "At the same time, they are jeopardizing the originality and creativity of the content within the training corpus. This is an important finding."

The study highlights the need for more research into text generators and the ethical and philosophical questions that they pose, according to the researchers.

"Even though the output may be appealing, and language models may be fun to use and seem productive for certain tasks, it doesn't mean they are practical," said Thai Le, assistant professor of computer and information science at the University of Mississippi who began working on the project as a doctoral candidate at Penn State. "In practice, we need to take care of the ethical and copyright issues that text generators pose."

Though the results of the study only apply to GPT-2, the automatic plagiarism detection process that the researchers established can be applied to newer language models like ChatGPT to determine if and how often these models plagiarize training content. Testing for plagiarism, however, depends on the developers making the training data publicly accessible, said the researchers.

The current study can help AI researchers build more robust, reliable and responsible language models in future, according to the scientists. For now, they urge individuals to exercise caution when using text generators.

"AI researchers and scientists are studying how to make language models better and more robust, meanwhile, many individuals are using language models in their daily lives for various productivity tasks," said Jinghui Chen, assistant professor of information sciences and technology at Penn State. "While leveraging language models as a search engine or a stack overflow to debug code is probably fine, for other purposes, since the language model may produce plagiarized content, it may result in negative consequences for the user."

The plagiarism findings are not unexpected, added Dongwon Lee.

"As a stochastic parrot, we taught language models to mimic human writings without teaching them how not to plagiarize properly," he said. "Now, it's time to teach them to write more properly, and we have a long way to go."

© Copyright Red Pixels Ventures Limited 2023. All rights reserved.