Deleting credentials and other sensitive data from Git
If you accidentally commit and push a credential or any other piece of sensitive data to a Git repo (private or otherwise), consider the credential compromised. Before deleting the commit containing the sensitive data, immediately rotate the credential out of use.
As per the GitHub documentation:
Once the credential has been rotated out of use, follow these steps to cleanse the credential from your Git repository.
Step 1: Choose your tool
First, decide whether to use git filter-repo or BFG Repo-Cleaner. Both tools rewrite your repository's history, which changes the SHA hashes for altered commits and any dependent commits. This could affect open pull requests, so it's wise to merge or close these before proceeding.
BFG Repo-Cleaner is an open-source, community-maintained tool written in Java. It offers a simpler, more user-friendly way to rewrite your repository's history and clean out credentials or other sensitive data that may have been committed.
The git filter-repo command offers more flexibility and finer-grained control. Use it if you are a more advanced user and need a more precise technique.
Step 2: Removing the sensitive data
Using BFG Repo-Cleaner
Download and install BFG Repo-Cleaner: follow the instructions on the official BFG Repo-Cleaner website to download and install the tool. Note that this tool requires Java to be installed on your machine.
To remove a specific file containing sensitive data without affecting your latest commit, execute:
bfg --delete-files YOUR-FILE-WITH-SENSITIVE-DATA
Alternatively, to replace sensitive text across your repository's history, use:
bfg --replace-text passwords.txt
This command replaces every occurrence of each string listed in the specified file (one entry per line) across your entire repository's history with ***REMOVED***.
After removal, force-push your changes to GitHub with:
git push --force
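Before force-pushing, it can be worth confirming that the string no longer appears anywhere in history. The following is a minimal sketch using Git's pickaxe search (git log -S); commits_containing is a hypothetical helper name, not part of BFG or Git.

```shell
# Hypothetical helper: list commits, on any ref, whose changes add or remove
# occurrences of the given literal string. An empty result suggests the string
# is no longer present in reachable history.
commits_containing() {
  local needle="$1"
  git log --all --oneline -S"$needle"
}

# Example (run inside the repository):
#   commits_containing 'my-leaked-password'
```

If the function still prints commits after the rewrite, the replacement did not cover every occurrence and the push should wait.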
Using git filter-repo
Install the git filter-repo tool: it is not included in Git by default and must be installed before use.
If using Homebrew, the command is:
brew install git-filter-repo
You can also install the command manually from the official git filter-repo repository.
Navigate to your repository directory:
cd YOUR-REPOSITORY
Execute the following command, replacing PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA with the entire path to the file that you want to delete:
git filter-repo --invert-paths --path PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA
Add the file to .gitignore to prevent future commits:
echo "YOUR-FILE-WITH-SENSITIVE-DATA" >> .gitignore
git add .gitignore
git commit -m "Add YOUR-FILE-WITH-SENSITIVE-DATA to .gitignore"
This will configure Git to automatically ignore this file in any future commits.
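Force-push the changes to GitHub to overwrite the history. Because git filter-repo removes the origin remote as part of the rewrite, you may first need to re-add it with git remote add origin followed by your repository URL: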
git push origin --force --all
To remove the sensitive file from all your tagged releases you also need to run:
git push origin --force --tags
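Before force-pushing, you may want to confirm that the file is gone from every branch and tag. A minimal sketch, assuming a hypothetical helper named path_in_history; it relies only on standard git log options.

```shell
# Hypothetical helper: exit 0 if any commit reachable from any ref touched the
# given path, exit 1 otherwise.
path_in_history() {
  local path="$1"
  [ -n "$(git log --all --oneline -- "$path")" ]
}

# Example usage after running git filter-repo:
#   if path_in_history PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA; then
#     echo "File is still present in history" >&2
#   fi
```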
Step 3: Fully removing the data from GitHub
Even after these steps, some data might remain cached or referenced in pull requests. If the leaked data was not a credential that could be rotated, such as personal user data, you may need to take additional steps to ensure the data has been removed properly:
Contact GitHub support to request the removal of cached views and references to the sensitive data in pull requests.
Inform all collaborators of the repository to rebase their branches instead of merging to avoid reintroducing the removed data.
Step 4: Clean up and prevent future leaks
After ensuring that the sensitive data is completely removed, clean your local repository and take preventive measures:
After removing sensitive data, or after any other substantial rewrite of the repository's history, you can force all old objects in your local Git repository to be dereferenced and garbage collected, which also minimizes the size of the repository. Follow these steps:
Dereference original references: First, remove any backup references to the original branches and tags that the rewriting tool left behind. When present, these references are stored in refs/original/ (git filter-branch creates them; git filter-repo normally does not, in which case the command below simply does nothing). This step ensures that the rewritten history is the only one recognized by Git, which lets garbage collection remove the old objects. Execute the following command:
git for-each-ref --format="delete %(refname)" refs/original/ | git update-ref --stdin
This command lists all references under refs/original/ and deletes them by feeding the list to git update-ref --stdin, which processes these deletion commands from standard input.
Expire reflog entries: Next, expire all entries in the reflog. The reflog records the history of the tips of branches and other references within the local repository, and expiring these entries helps in removing any pointers to the old (now unwanted) objects. Run:
git reflog expire --expire=now --all
This command tells Git to immediately expire all reflog entries, effectively removing any references to objects that are no longer in the current history.
Garbage collect: Finally, perform a manual garbage collection to clean up and optimize the repository. This step removes objects that are no longer reachable from any references, compacts the repository, and optimizes its performance. Use the following command:
git gc --prune=now
Here, --prune=now forces Git to immediately prune (delete) objects that are no longer needed, instead of waiting for the default period (typically two weeks).
These steps clean up your repository by dereferencing the original objects of the rewritten history and performing a thorough garbage collection. This is a crucial step after using tools like git filter-repo or BFG Repo-Cleaner, because it ensures your repository does not retain unnecessary objects from the old history, potentially including the sensitive data you sought to remove. The cleanup also reduces the repository's size and improves its performance.
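The three cleanup commands can be collected into a single function. This is a sketch that assumes it runs inside the rewritten local repository; purge_unreachable is a hypothetical name chosen for illustration.

```shell
# Hypothetical helper combining the three cleanup steps described above.
purge_unreachable() {
  # 1. Delete backup refs under refs/original/, if any exist.
  git for-each-ref --format="delete %(refname)" refs/original/ |
    git update-ref --stdin
  # 2. Expire every reflog entry so old commits lose their last references.
  git reflog expire --expire=now --all
  # 3. Remove unreachable objects immediately instead of after the grace period.
  git gc --prune=now
}

# Example: check that a known old object ID (OLD-OBJECT-ID is a placeholder)
# is gone afterwards:
#   purge_unreachable
#   git cat-file -e OLD-OBJECT-ID || echo "old object has been pruned"
```

Note that this only affects the local clone; other clones and the hosting service keep their own copies of the old objects until they are cleaned separately.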
In the future it’s important to stop these leaks from happening in the first place.
Employ best practices to avoid accidental commits of sensitive data, such as using visual tools for staging changes, avoiding catch-all git add commands (for example, git add .), and enabling push protection in your repository settings.
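One lightweight habit that complements these practices is reviewing what is staged before committing. The sketch below is an illustration rather than a recommended tool: staged_sensitive_files is a hypothetical helper, and the file-name patterns (.env, *.pem, *.key) are assumptions you would adapt to your own project.

```shell
# Hypothetical helper: print staged paths whose names look sensitive.
# The pattern list is an illustrative assumption, not an exhaustive rule set.
staged_sensitive_files() {
  git diff --cached --name-only |
    grep -E '(^|/)\.env$|\.pem$|\.key$' || true
}

# Example: refuse to continue if anything suspicious is staged.
#   if [ -n "$(staged_sensitive_files)" ]; then
#     echo "Review these staged files before committing:" >&2
#     staged_sensitive_files >&2
#   fi
```

A check like this only looks at file names, so it does not replace push protection or a dedicated secret scanner.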
By carefully following these steps, you can effectively remove sensitive data from your Git repository and take measures to prevent similar incidents in the future.
For more information on removing sensitive data from your Git repository, see the official GitHub documentation.