A cybersecurity research firm has raised concerns about a potential data breach involving DeepSeek’s dataset. A report indicates that a ClickHouse database belonging to DeepSeek was found accessible to the public, allowing full control over its database functions. The breach reportedly exposed a significant amount of sensitive data, including chat histories, secret keys, log timestamps, and operational details. It remains uncertain whether DeepSeek has reported the incident to the relevant authorities or if the exposed database has been removed from public access.
Potential Breach of DeepSeek’s Dataset
In a blog post, cybersecurity company Wiz Research disclosed the discovery of an entirely open, unauthenticated database containing critical information about the DeepSeek platform. The exposure of this sensitive data is believed to pose risks not only to the AI company but also to its end users.
The firm aimed to evaluate DeepSeek’s external security to identify potential vulnerabilities in light of the platform’s growing use. Initial investigations involved mapping Internet-facing subdomains, but did not reveal significant exposure risks.
After applying more advanced detection techniques, however, researchers identified two open ports (8123 and 9000) on several publicly reachable hosts. Wiz Research said these ports led them to a ClickHouse database that was accessible without any authentication.
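For administrators who want to check their own hosts for this kind of exposure, the following is a minimal sketch and not the method Wiz Research used. It relies on ClickHouse's default ports (8123 for the HTTP interface, 9000 for the native protocol) and on the HTTP interface accepting SQL through a `query` URL parameter. The function names and the example hostname are hypothetical, and the check should only be run against systems you are authorized to test.

```python
import http.client
import socket
import urllib.parse
import urllib.request

# ClickHouse defaults: 8123 is the HTTP interface, 9000 the native TCP protocol.
CLICKHOUSE_PORTS = (8123, 9000)


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_query_url(host: str, query: str, port: int = 8123) -> str:
    """Build an HTTP-interface URL that passes the SQL in the `query` parameter."""
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return f"http://{host}:{port}/?{params}"


def answers_without_credentials(host: str, port: int = 8123, timeout: float = 2.0) -> bool:
    """Return True if a trivial query succeeds with no credentials supplied."""
    url = build_query_url(host, "SELECT 1", port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"1"
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error such as an auth failure.
        return False


if __name__ == "__main__":
    host = "db.example.internal"  # hypothetical host; replace with one you own
    for port in CLICKHOUSE_PORTS:
        print(port, "open" if is_port_open(host, port) else "closed")
    print("unauthenticated HTTP queries:", answers_without_credentials(host))
```

A host that reports an open port and also answers the unauthenticated query would warrant immediate attention, for example by restricting network access and requiring credentials.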
ClickHouse, an open-source columnar database management system originally developed at Yandex, is typically used for fast analytical queries over large datasets, such as logs.
According to the findings, the database includes a log stream table with over one million log entries, with timestamps dating from January 6. The logs also reference several internal DeepSeek application programming interface (API) endpoints and contain chat histories, API keys, backend information, and operational metadata stored in plain text.
Researchers pointed out that, given the level of control the open database allowed, malicious actors could potentially extract passwords, local files, and proprietary data directly from the server. At the time of this report, no information was available regarding efforts to mitigate the exposure or whether the database had been taken offline.