Exposing Hidden Dangers: The Essential Guide to Secret Scanning in Package Repositories

Published in InfoSec Write-ups · 12 min read · Jun 5, 2024

In the ever-shifting realm of cybersecurity, staying one step ahead of potential threats is a non-negotiable mission. Package repositories like PyPI, npm, NuGet, and RubyGems are goldmines of software packages, cherished by developers worldwide. While these packages are indispensable for crafting powerful applications, they may also harbor concealed secrets, making developers and organizations susceptible to data breaches and malicious exploits. In this blog post, we examine why secret scanning of the latest packages from these repositories matters and share some startling findings.

The Crucial Role of Package Repository Secret Scanning

Package repositories stand as the go-to source for software enthusiasts, housing a myriad of open-source libraries. They are often the starting point for developers when questing for packages to infuse into their projects. However, these packages are not immune to vulnerabilities, and secret leaks represent a perilous abyss.

Demystifying Secrets

Secrets come in various guises, encompassing API keys, authentication tokens, passwords, and encryption keys. These are classified as sensitive nuggets of information that should never see the light of day, for their compromise can herald catastrophic consequences.

The Far-reaching Ramifications of Secret Leaks

Inadvertent inclusion of secrets in packages exposes a chink in the armor, inviting nefarious actors to wreak havoc. For instance, a mishandled AWS (Amazon Web Services) access key might pave the way for unauthorized entry, unleashing a torrent of data breaches, financial setbacks, and operational chaos.

Unveiling Secrets: A Three-Pronged Approach

To underscore the gravity of secret scanning, we’ve spread the four major package managers (PyPI, npm, RubyGems, and NuGet) across three dedicated EC2 machines: one for PyPI, one for npm, and a third shared by RubyGems and NuGet. Let’s delve into the specifics of our approach for each:

PyPI: Python Package Index

Our PyPI-specific EC2 machine tirelessly parses the latest PyPI package downloads, extracts their contents, and performs a thorough GitLeaks scan to identify any secrets hidden within Python packages. PyPI packages are a cornerstone of the Python ecosystem, and securing them is paramount.

npm: Node Package Manager

Dedicated to the Node.js ecosystem, our npm EC2 machine is on a mission to parse the latest npm package downloads, extract them, and run GitLeaks scans to uncover any concealed secrets within Node.js packages. npm is the backbone of JavaScript development, and safeguarding it is essential.

RubyGems and NuGet: Multitasking Marvel

Our third EC2 machine is a multitasker, handling both RubyGems and NuGet repositories. It extracts the latest RubyGems and NuGet packages, meticulously scans them using GitLeaks, and reports any secrets that may compromise the security of Ruby and .NET applications. RubyGems and NuGet are pillars of their respective ecosystems, and their security is non-negotiable.
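A minimal sketch of the extraction step, assuming the archives have already been downloaded into one directory. It relies on the fact that npm tarballs and Python sdists are gzipped tar files, wheels and .nupkg files are zip archives, and a .gem is a plain tar file that wraps a data.tar.gz. The helper name extract_package and the directory layout are illustrative assumptions, not the exact code running on these machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack one downloaded package archive into its own
# folder under a destination directory so a scanner can read the files.
extract_package() {
  local archive="$1" dest_root="$2"
  local name target
  name="$(basename "$archive")"
  target="$dest_root/$name.d"
  mkdir -p "$target" || return 1

  case "$archive" in
    *.tar.gz|*.tgz)            # npm tarballs and Python sdists
      tar -xzf "$archive" -C "$target" ;;
    *.whl|*.zip|*.nupkg)       # wheels and NuGet packages are zip archives
      unzip -q -o "$archive" -d "$target" ;;
    *.gem)                     # a gem is a tar file wrapping data.tar.gz
      tar -xf "$archive" -C "$target" &&
        mkdir -p "$target/data" &&
        tar -xzf "$target/data.tar.gz" -C "$target/data" ;;
    *)
      echo "unsupported archive: $archive" >&2
      return 1 ;;
  esac
}
```

It could be called in a loop such as: for f in downloads/*; do extract_package "$f" downloaded_packages; done. Since the archives are untrusted, it is safer to run this on a disposable machine, which is one reason dedicated instances suit this job.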

Automating the Full Process

When running tasks that involve downloading and analyzing large amounts of data, it’s crucial to monitor and manage disk space. Without proper disk space management, the system can run out of space, causing disruptions and potentially failing the task. To address this issue, we’ve created a script that automates both the secret scan and disk space management.

Script Overview

Let’s break down the free.sh script step by step to understand its functionality:

#!/bin/bash
# Define the threshold for available disk space in GB
threshold=2

The script begins by defining a threshold variable, threshold, which represents the minimum amount of available disk space (in gigabytes) required for the script to proceed without cleaning up.

# Check available disk space in GB
available_space=$(df -BG / | awk 'NR==2 { print $4 }' | sed 's/G//')

Next, it checks the current available disk space on the root filesystem (i.e., /). It uses the df command to retrieve disk space information, and then, with the help of awk and sed, extracts the available space in gigabytes.
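Human-readable df output (the -h flag) can report the Avail column in M, G, or T and with a decimal point, which breaks an integer comparison. A portable alternative, sketched here under the assumption of a POSIX df, asks for 1024-byte blocks with df -Pk and converts to whole gigabytes itself. The function names kb_to_gb and available_gb are hypothetical.

```shell
#!/usr/bin/env bash
# Convert a count of 1024-byte blocks to whole gigabytes (rounded down).
kb_to_gb() {
  echo $(( $1 / 1024 / 1024 ))
}

# Print the free space of the filesystem holding the given path, in whole GB.
# df -P forces the portable one-line-per-filesystem layout; -k reports
# 1024-byte blocks, so column 4 is always a plain integer.
available_gb() {
  local path="${1:-/}"
  local avail_kb
  avail_kb="$(df -Pk "$path" | awk 'NR==2 { print $4 }')"
  kb_to_gb "$avail_kb"
}
```

With this in place, the threshold check could read: if [ "$(available_gb /)" -lt "$threshold" ]; then ... fi.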

# Convert the available space to a numeric value
available_space_numeric=$(echo "$available_space" | sed 's/,//')

The script then strips any comma from the value so that it can be compared with the defined threshold as a plain number.

# Compare available space with the threshold
if [ "$available_space_numeric" -lt "$threshold" ]; then

Here, it compares the available disk space with the threshold. If the available space falls below the specified threshold, the script proceeds to perform cleanup and initiate the secret scan. Otherwise, it simply reports that the available disk space is sufficient.

# Run gitleaks and write the output to a temporary file
tmp_file=$(mktemp)
echo "$tmp_file" | notify
gitleaks detect --no-git -v --source downloaded_packages/ --config ~/config.toml -r "$tmp_file"

Within this conditional block, the script runs a secret scan using the gitleaks tool. It generates a temporary file to capture the scan results and uses the notify command to send a notification (you may need to customize this part depending on your notification system).
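One detail the script leaves implicit is the exit status of the scan. By default gitleaks exits with code 1 when it finds leaks and 0 when it finds none, so a wrapper can branch on that. The sketch below assumes the gitleaks v8 flags --no-git, --source, --config, and --report-path; run_scan and the GITLEAKS_BIN override are hypothetical names added here so that the scanner can be swapped out.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the scan step. GITLEAKS_BIN is an assumed
# override so a different binary (or a stub) can stand in for gitleaks.
run_scan() {
  local source_dir="$1" config="$2" report="$3"
  local status=0
  "${GITLEAKS_BIN:-gitleaks}" detect --no-git \
    --source "$source_dir" \
    --config "$config" \
    --report-path "$report" || status=$?

  # Status 1 normally means leaks were found, but some failures may also
  # surface as 1, so the report file is worth checking as well.
  case "$status" in
    0) echo "clean" ;;
    1) echo "leaks found, report: $report" ;;
    *) echo "scan failed with status $status" >&2; return "$status" ;;
  esac
}
```

A call such as run_scan downloaded_packages ~/config.toml "$tmp_file" | notify would then send a one-line summary instead of only the report path.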

# Check if the downloaded_packages directory exists
if [ -d "downloaded_packages" ]; then
# Delete the downloaded_packages directory
rm -rf downloaded_packages
rm -rf .npm
fi

In this part of the script, it checks whether a directory named downloaded_packages exists. If it does, the script deletes this directory along with the .npm directory, which npm uses as its local cache. This cleanup frees disk space by removing downloaded files that have already been scanned.
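Because rm -rf on a relative path depends on the directory the script happens to run from, a slightly more defensive cleanup can be sketched as follows. The helper name safe_remove and its checks are assumptions added for illustration; they are not part of free.sh.

```shell
#!/usr/bin/env bash
# Hypothetical guard around rm -rf: refuse empty or root paths, and
# quietly skip paths that do not exist.
safe_remove() {
  local target="$1"
  if [ -z "$target" ] || [ "$target" = "/" ]; then
    echo "refusing to remove '$target'" >&2
    return 1
  fi
  if [ ! -d "$target" ]; then
    return 0                     # nothing to clean up
  fi
  # -- stops option parsing in case the name starts with a dash
  rm -rf -- "$target"
}
```

Used with absolute paths, for example safe_remove "$HOME/downloaded_packages" followed by safe_remove "$HOME/.npm", the cleanup behaves the same no matter where the script is started.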

else
echo "Available disk space is greater than or equal to 2GB."
fi

Finally, if the available disk space is equal to or greater than 2GB (as specified by the threshold), the script reports that there’s no need for cleanup, ensuring that the secret scan project can continue without interruption.

PackageSpy

PackageSpy is an innovative, open-source tool designed to scan package managers for secrets, user-defined keywords, and patterns. It helps developers safeguard their projects and ensure that sensitive information remains hidden from prying eyes. Here’s how PackageSpy works:

Support for Multiple Package Managers: PackageSpy supports popular package managers like npm, PyPI, RubyGems, and more, making it versatile and adaptable to different development environments.

Customizable Scanning Rules: Developers can define their own scanning rules, keywords, and patterns to identify secrets specific to their projects. This flexibility ensures that PackageSpy can cater to diverse security requirements.

Command-Line Interface (CLI): PackageSpy’s user-friendly CLI allows developers to initiate scans easily and integrate it into their development workflows.

Interactive Reports: After scanning, PackageSpy generates detailed reports highlighting any secrets or keywords found, their locations, and suggested actions for mitigation.

Continuous Integration (CI) Integration: PackageSpy seamlessly integrates with CI/CD pipelines, allowing developers to automate scans during the development process, preventing secrets from being committed to repositories.
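PackageSpy’s own rule syntax is not shown in this post. The general idea of user-defined patterns can still be sketched with plain grep: the hypothetical scan_patterns function below searches an extracted package tree for each extended regular expression listed in a rules file. The function name and file names are assumptions, and this is a stand-in for illustration rather than PackageSpy’s implementation.

```shell
#!/usr/bin/env bash
# Hypothetical pattern scan: print file:line:match for every hit of any
# extended regex listed (one per line) in a rules file.
scan_patterns() {
  local rules_file="$1" target_dir="$2"
  # -r recurse, -E extended regex, -n line numbers, -o only the match,
  # -I skip binary files, -f read patterns from a file
  grep -rEnoI -f "$rules_file" "$target_dir"
}
```

A rules file might contain one line such as AKIA[0-9A-Z]{16} for AWS access key IDs plus any project-specific keywords. Blank lines should be avoided in the rules file, because an empty pattern matches every line.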

Usage and Benefits

PackageSpy is simple to use, yet it provides a robust security layer for package manager repositories. Here’s how developers can benefit from this tool:

Enhanced Security: By proactively scanning for secrets and keywords, PackageSpy helps developers identify vulnerabilities and maintain the confidentiality of sensitive information.

Time and Cost Savings: Detecting secrets early in the development process saves time and resources compared to dealing with potential breaches and their aftermath.

Compliance and Peace of Mind: PackageSpy aids in compliance with security best practices and industry standards, offering peace of mind to developers and stakeholders alike.

Open-Source Community: PackageSpy is open-source, encouraging collaboration and contribution from the development community to improve its capabilities and security.

https://github.com/aydinnyunus/PackageSpy

Analyzing Secret Scan Output

Understanding the Risks of Exposed Secrets in NPM Packages: A Breakdown

If you’re a developer using Node.js, you’re likely familiar with the Node Package Manager (NPM), a lifeline for importing countless libraries and tools to streamline your development process. However, there’s a hidden risk lurking in the shadows: the inadvertent exposure of secrets. In this post, we dive into recent scan results revealing the types of secrets most commonly found in NPM packages and discuss the potential risks associated with such exposures.

The Pervasiveness of AWS Access Tokens

The scan results are alarming: a whopping 34.3% of secrets found in NPM packages were AWS access tokens. These tokens are like digital keys to the kingdom of Amazon Web Services, allowing access to a vast array of resources. If these tokens fall into the wrong hands, it could lead to unauthorized access and control over cloud resources, leading to data breaches or costly usage charges.

HashiCorp Terraform Passwords — A Close Second

Following closely are HashiCorp Terraform passwords, constituting 20.6% of the findings. Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Passwords for Terraform can provide access to modify infrastructure, potentially allowing attackers to disrupt operations or create malicious environments.

The Danger of Exposed Private Keys

With 12.2% of the secrets being private keys, this issue presents a severe security threat. Private keys are used in cryptographic protocols to ensure secure communications. Exposure of these keys can lead to interception of sensitive data, impersonation, and a host of other security nightmares.

Financial Risks with Stripe Access Tokens

Stripe access tokens, at 7.2%, represent a direct financial risk. These tokens allow applications to charge credit cards and manage payments. Unauthorized access to these tokens could lead to fraudulent transactions and financial losses.

Social Media API Secrets

LinkedIn, Twitter, and Facebook secrets combined make up a small but significant percentage. These are used to interact with social media platforms’ APIs and could lead to unauthorized posting, access to sensitive profile information, and data harvesting.

The Slack and Telegram Tokens

Slack webhook URLs (25.7%) and Telegram Bot API tokens (14.3%) are particularly concerning as they allow for the sending of messages within these platforms. Unauthorized access here could lead to phishing attacks, spreading of malware, or leaking of confidential communication.

The Lesser-Known Culprits

GitHub app tokens, Google Cloud Platform API keys, and Microsoft Teams webhooks, while less prevalent, still pose significant risks. They provide authenticated access to source code, cloud resources, and team communications, respectively.

The Silent Alarm: Exposed Secrets in PyPI Packages

Python developers, take heed. The Python Package Index (PyPI) is an indispensable resource, but recent findings show that it’s also a minefield of security risks due to exposed secrets. In this post, we’ll dissect the nature of these secrets and the potential hazards they pose.

AWS Access Tokens: A Dominant Risk

An astounding 54.6% of the secrets found in PyPI packages were AWS access tokens. These tokens serve as a passport to Amazon Web Services, granting various levels of access to a plethora of services. The exposure of these tokens is akin to leaving the key to your house under the doormat, inviting a host of security issues ranging from data breaches to unauthorized operations that could rack up substantial costs.

HashiCorp Terraform Passwords: The Runner-Up

Making up 20.8% of the secrets, HashiCorp Terraform passwords are the second most common leak. Terraform automates the deployment of infrastructure, and these passwords could allow attackers to alter cloud environments, potentially leading to service disruptions or even data destruction.

Private Keys: The Hidden Dangers

Private keys, which represent 10.4% of the findings, are vital for secure communications in various protocols. If compromised, the consequences can be dire, including data leaks and man-in-the-middle attacks.

JWTs: Small Percentage, Big Problems

JWTs, or JSON Web Tokens, constitute 9.6% of the exposed secrets. These tokens are widely used for authentication and information exchange. Their exposure could lead to unauthorized access and manipulation of user sessions.

Smaller Shares with Significant Impacts

Other exposed secrets include Etsy access tokens, Slack webhook URLs, and Telegram Bot API tokens, each accounting for less than 2% of the findings. Despite their smaller shares, they hold the potential for significant damage. Etsy tokens could allow unauthorized transactions, Slack URLs could enable spreading misinformation or phishing within organizations, and Telegram tokens could compromise bot interactions.

The Less Than 1% Club: Varied and Volatile

A diverse array of secrets fall into this category, including API keys for Google Cloud Platform, GitHub personal access tokens, and Stripe access tokens. Each of these can open the door to their respective services, allowing for unauthorized actions that could range from code theft to financial fraud.

The Red Flags in Ruby: Secrets Exposure in RubyGems

Ruby developers, it’s time for a security check-up. RubyGems, the package manager that serves as a hub for distributing Ruby programs and libraries, has become a hotbed for exposed secrets. Let’s dissect the recent findings from a security scan and discuss the implications for the Ruby community.

AWS Access Tokens Take the Lion’s Share

The scan results are striking. An overwhelming 66.5% of the secrets found in RubyGems packages were AWS access tokens. This is not just a majority; it’s a dominance that should raise eyebrows. AWS tokens are the master keys to cloud services that can control virtually every aspect of AWS. From spinning up servers to accessing databases, the potential for misuse here is vast and the ramifications, from data leakage to service interruption, are serious.

HashiCorp Terraform Passwords — A Distinct Concern

HashiCorp Terraform passwords account for 15.8% of the secrets exposed. As a tool that manages infrastructure as code, Terraform has the power to create and destroy environments. Exposure of these passwords can lead to unauthorized changes to infrastructure, making it a significant point of vulnerability.

Private Keys and JWTs: Small Pieces, Big Puzzle

Though they represent smaller portions — 9.3% for private keys and 6.8% for JWTs — their impact is disproportionately large. Private keys are crucial for the security of communications in various encryption protocols, and JWTs are heavily used for authentication processes. Leaks in these areas can lead to a range of security issues, including unauthorized access and eavesdropping.

Stripe Access Tokens: Financial Implications

Stripe access tokens, which stand at 1.7%, may seem minor but hold the keys to financial transactions. Misuse of these tokens can lead to financial fraud and loss, highlighting the need for stringent protection measures.

The Under 1%: Slack Webhooks and OpenAI API Keys

Even less prevalent but noteworthy are Slack webhook URLs and OpenAI API keys. Slack webhooks can be used to send messages to teams, potentially spreading misinformation or phishing attacks. OpenAI API keys give access to powerful AI tools, which, if misused, could lead to unethical generation of content or exploitation of AI resources.
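Percentage breakdowns like the ones above can be produced by tallying findings per rule. As a rough sketch, assuming the scan wrote a gitleaks JSON report in which every finding carries a "RuleID" field, the hypothetical rule_shares function below counts each rule and prints its share of the total. It uses grep and awk rather than a JSON parser, so it is only suitable for quick summaries.

```shell
#!/usr/bin/env bash
# Tally findings per rule in a gitleaks JSON report and print each rule's
# share of all findings, most frequent first.
# Assumes each finding object contains a "RuleID" string field.
rule_shares() {
  local report="$1"
  grep -o '"RuleID": *"[^"]*"' "$report" \
    | sed 's/.*: *"\(.*\)"/\1/' \
    | sort | uniq -c | sort -rn \
    | awk '{ count[NR] = $1; rule[NR] = $2; total += $1 }
           END { for (i = 1; i <= NR; i++)
                   printf "%-30s %5.1f%%\n", rule[i], 100 * count[i] / total }'
}
```

Running rule_shares on the report for each ecosystem gives one table per repository, which makes comparisons such as npm versus PyPI straightforward.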

Reporting Findings

Our secret scanning efforts have uncovered critical vulnerabilities within packages hosted on popular repositories, exposing sensitive information that could lead to severe security breaches. We take responsibility for reporting these findings to the companies and organizations that own or manage the affected services. Below is a summary of our reporting process:

Finding Contacts

To identify and contact the owners or maintainers of the affected projects, we use information available through package managers such as npm, PyPI, RubyGems, and NuGet. The process involves:

Package Manager Investigation:

For npm packages: We inspect the package.json file for contact information, including the “maintainers” field.
For PyPI packages: We check the METADATA or pyproject.toml files for maintainers’ details.
For RubyGems: We explore the gemspec file for owner/maintainer information.
For NuGet packages: We review the nuspec file for contact details.

Project Documentation and Repository:

We explore the official documentation and repository of the project, searching for maintainers’ or owners’ contact information.

Publicly Available Communication Channels:

We look for mailing lists, forums, or community channels associated with the project where maintainers can be reached.

Package Manager Maintainer Lookup:

We use the lookup features that package managers provide, such as the npm owner ls command, which lists a package’s maintainers, or the maintainer list shown on a package’s PyPI project page.
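Much of this lookup can be scripted against the registries’ public JSON endpoints. The sketch below builds the metadata URL for npm, PyPI, and RubyGems; the function name metadata_url is hypothetical, and NuGet is left out to keep the example short. Fetching the URL and reading fields such as maintainers on npm or author_email on PyPI is left as a follow-up step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the public metadata URL for a package.
# Scoped npm names (@scope/name) would need URL-encoding first.
metadata_url() {
  local ecosystem="$1" package="$2"
  case "$ecosystem" in
    npm)      echo "https://registry.npmjs.org/$package" ;;
    pypi)     echo "https://pypi.org/pypi/$package/json" ;;
    rubygems) echo "https://rubygems.org/api/v1/gems/$package.json" ;;
    *)
      echo "unknown ecosystem: $ecosystem" >&2
      return 1 ;;
  esac
}
```

For example, curl -s "$(metadata_url pypi requests)" returns the project’s JSON metadata, which can then be searched for contact fields.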

Reporting Method

Once the contact information is obtained, we initiate the reporting process to the respective companies:

Microsoft
Automattic
Mapbox
Keeper Security
Pulumi
Weblate
Palo Alto Networks
Telefonica Global
Private (+7.5M Downloads)

Reporting Channels

For each company, we use a combination of the following reporting channels:

Email Communication:

We send detailed emails to the identified contacts within the companies, providing an overview of the discovered vulnerabilities, their potential impact, and recommended actions for mitigation.

HackerOne/Bugcrowd Platforms:

If a company participates in a bug bounty program, we submit the vulnerabilities through platforms like HackerOne or Bugcrowd, following their respective guidelines.

Security Disclosure Policy:

We adhere to each company’s security disclosure policy where one is available, ensuring that the vulnerabilities are reported responsibly and in compliance with its guidelines.

Follow-Up and Collaboration:

We maintain open lines of communication with the companies’ security teams, responding promptly to any inquiries and collaborating on the development of patches or mitigations.

By following these steps, we aim to responsibly disclose critical vulnerabilities associated with the mentioned companies and contribute to the overall security of the software ecosystem.

Conclusion

Secret scanning within the latest packages from various repositories is an indispensable practice for upholding the security of software applications. Our three-pronged approach with dedicated EC2 machines, along with the introduction of the user-centric scanning tool, highlights our commitment to thorough security. By proactively identifying and mitigating secrets, developers can significantly diminish the odds of security breaches, safeguarding their organizations and users from the perils that lurk in the shadows.

Always remember, the potency of open source blossoms through collaboration and responsible coding practices. Let us join hands in fortifying the software ecosystem, rendering it a safer haven for all.

For more insights into the realm of cybersecurity, consider subscribing to our newsletter, where we unravel the latest threats and best practices.

Contact

LinkedIn: https://linkedin.com/in/aydinnyunus

Twitter: https://twitter.com/aydinnyunuss

GitHub: https://github.com/aydinnyunus