Researchers claimed that 65 percent of the 50 leading AI companies have leaked verified secrets on GitHub.
The researchers highlighted that AI companies should invest in secret scanning to protect their assets
Perplexity, Anthropic, and other leading artificial intelligence (AI) companies might have exposed sensitive data on GitHub, a cloud security firm claims. As per the firm's report, at least 65 percent of the leading AI companies face exposure risk around their proprietary AI models, datasets, and training processes. The exposed data reportedly includes application programming interface (API) keys, tokens, and other sensitive credentials. The researchers also highlighted the need for AI companies to adopt more advanced secret scanners that can alert them to such exposure.
According to the cloud security platform Wiz, 65 percent of the AI companies on Forbes' AI 50 list have secrets exposed on GitHub. The list includes companies such as Anthropic, Mistral, Cohere, Midjourney, Perplexity, Suno, and World Labs; however, the researchers did not name which particular companies had leaked data.
The sensitive data leaks onto GitHub as these companies' developers use the platform to write code and create repositories. These repositories can inadvertently contain API keys, dataset details, and other material that can reveal critical information about proprietary AI models. The risk increases with a larger GitHub footprint, although the researchers found one instance where data was leaked even without any public repositories.
To test whether these AI companies have any exposure risk, Wiz's team first identified each company's employees by scanning the organisation's followers on LinkedIn, GitHub accounts referencing the organisation's name in their metadata, and code contributors, and by correlating this information across Hugging Face and other platforms.
After identifying the accounts, the researchers performed an extensive scan across three parameters: depth, coverage, and perimeter. The depth search, which looks at additional sources, let the researchers scan each account's full commit history, commit history on forks, deleted forks, workflow logs, and gists. The researchers also found that employees sometimes add this sensitive data to their own public repositories and gists.
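At its simplest, the kind of depth scanning described above amounts to running pattern matchers over every reachable piece of text, including old commits and gists. Below is a minimal illustrative sketch in Python; the regexes cover two widely documented credential formats (Google API keys and Hugging Face tokens), and the function and pattern names are assumptions for illustration, not Wiz's actual tooling or rule set:

```python
import re

# Illustrative patterns for well-known credential formats (assumed, not
# Wiz's rules). Production scanners use far larger rule sets plus
# entropy checks to cut down on false positives.
SECRET_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "huggingface_token": re.compile(r"hf_[A-Za-z0-9]{30,}"),
    "generic_bearer": re.compile(r"(?i)bearer\s+[A-Za-z0-9_\-.=]{20,}"),
}


def scan_text(text):
    """Return (pattern_name, matched_string) pairs found in the text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

In a real pipeline, this sort of matcher would be fed every blob in the commit history (for example, the output of walking a repository with `git rev-list` and `git cat-file`), including deleted forks and gists, which is why secrets removed from the latest version of a file can still be exposed.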
Some of the leaked data surfaced by the team includes model weights and biases, Google API keys, Hugging Face and ElevenLabs credentials, and more.