Sometimes a large file needs to be tracked in a git repo. On hosts such as GitHub this can be a problem, because GitHub limits file sizes in repositories to 100 MB. In this post, I’ll walk you through the steps I took to remove a large file from a git repo’s entire commit history and then move that file to Git LFS.
First, let’s start off with some requirements:
- Git (I’ll assume you already have this installed if you’re already working with a git repo on your PC)
- Java Runtime Environment (JRE) version 8 or higher (download from here)
- BFG Repo-Cleaner (download from here)
- Git Large File Storage (LFS) (download from here)
Now that you’ve got the necessary software on your machine, let’s get to it!
WARNING: This process will rewrite your repository’s commit history, changing all commit hashes downstream of the first affected commit! IT IS HIGHLY RECOMMENDED TO FIRST CREATE A BACKUP COPY OF YOUR ORIGINAL REPO IN YOUR REMOTE!!!
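One possible way to keep such a backup, in addition to any copy on the remote, is a local mirror clone made before anything is rewritten. The sketch below is only an illustration: backup_repo is a hypothetical helper name, and the destination directory is whatever you choose.

```shell
# Sketch: keep an untouched mirror of the repository before rewriting history.
# backup_repo is a hypothetical helper name, not part of git.
backup_repo() {
  src="$1"    # URL or local path of the repository to back up
  dest="$2"   # directory that will hold the bare mirror (your choice of name)
  # --mirror copies every ref (branches, tags, notes) into a bare repository
  git clone --mirror "$src" "$dest"
}
```

For example, backup_repo https://github.com/dankoman30/bominator.git bominator-backup.git would leave a bare copy in bominator-backup.git (an example name) that the later steps never touch.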
First, let’s create a fresh clone of our repo that has the large file in its commit history, and then navigate to the cloned repo’s root directory by using cd to enter it.
git clone https://github.com/dankoman30/bominator.git
cd bominator
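If you are not sure which files in the history are over the limit, a rough check from inside the fresh clone might look like the sketch below. list_large_blobs is a hypothetical helper, and the default threshold assumes the 100 MB limit is counted as 100 × 1024 × 1024 bytes.

```shell
# list_large_blobs is a hypothetical helper; run it from inside the clone.
# Prints "<size-in-bytes> <path>" for every blob in history larger than the
# given number of bytes (default assumes 100 MB = 104857600 bytes), largest first.
list_large_blobs() {
  limit="${1:-104857600}"
  # rev-list emits "<hash> <path>"; cat-file turns that into "<type> <size> <path>"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$limit" '$1 == "blob" && $2 + 0 > limit + 0 { sub(/^blob /, ""); print }' |
    sort -rn
}
```

Calling list_large_blobs with no argument uses the default threshold; passing a smaller number of bytes (for example list_large_blobs 50000000) also shows files that are approaching the limit.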
Due to the way BFG Repo-Cleaner is designed, the repo’s HEAD (the newest commit in history) is considered a protected commit. As such, since the large file still exists as of the newest commit, we need to create a new commit that deletes this large file entirely. Before you do that though, make a backup copy of the large file because we’ll need it later when we go to add it to LFS.
Let’s delete that large file now, and then commit the change:
git rm released_dicts/released_bounding_boxes.json
git commit
If the Vim text editor comes up after committing, you’ll need to type in a commit message. Press i on your keyboard to enter insert mode, type your message, then press ESC to leave insert mode. Finally, type :wq and press ENTER to save the message and exit the editor.
Now that the large file has been removed from HEAD, we can continue cleaning this file from the repo’s entire commit history. This is where BFG Repo-Cleaner comes in. It’s a .jar file that needs to be executed using the Java Runtime you installed earlier.
From the same directory, let’s use BFG Repo-Cleaner to perform this task. The following command assumes that the directory where you installed Java is in your system’s PATH environment variable, and that you moved the downloaded bfg.jar into C:\bfg\bfg.jar:
java -jar C:\bfg\bfg.jar --delete-files released_bounding_boxes.json
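Once BFG finishes, one way to double-check the result locally is to ask git whether any commit still touches the file’s path. This is a minimal sketch; path_in_history is a hypothetical helper, not a git or BFG command.

```shell
# path_in_history is a hypothetical helper; run it from inside the clone.
# Succeeds (exit status 0) if any commit reachable from any ref touches the path.
path_in_history() {
  # git log prints one hash per matching commit; empty output means no match
  [ -n "$(git log --all --format=%H -- "$1")" ]
}
```

For example, path_in_history released_dicts/released_bounding_boxes.json && echo "still in history" || echo "gone" should print "gone" after a successful clean-up.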
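At this point, the large file has been removed from every commit in the local clone’s history. The old objects can still linger in the local .git directory until they are garbage-collected; BFG’s own output suggests running git reflog expire --expire=now --all && git gc --prune=now --aggressive to purge them. To make our remote match our local, we must force-push the changes: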
git push --force
Now, our remote has the large file removed completely! You can verify this by browsing your repo on GitHub at any point in history, and you’ll see that the large file no longer exists. We can continue on to using LFS to manage this large file:
git lfs track "released_dicts/released_bounding_boxes.json"
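This command modifies your repo’s .gitattributes file (creating it if it doesn’t exist). If your repo does not currently track .gitattributes (for instance, if you’ve included it in your .gitignore file), it’s highly recommended to begin tracking it at this point: remove it from .gitignore if necessary and stage it with git add .gitattributes, so that it goes into the next commit. This makes collaboration easier, because other users of your repository get the same LFS configuration.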
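The line that git lfs track writes to .gitattributes typically looks like released_dicts/released_bounding_boxes.json filter=lfs diff=lfs merge=lfs -text. To confirm that git actually picks it up for a given path, something like the following sketch could be used; is_lfs_tracked is a hypothetical helper built on git check-attr.

```shell
# is_lfs_tracked is a hypothetical helper; run it from inside the repo.
# Succeeds if .gitattributes assigns the "lfs" filter to the given path.
is_lfs_tracked() {
  # git check-attr prints "<path>: filter: <value>" for the requested attribute
  [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}
```

For example, is_lfs_tracked released_dicts/released_bounding_boxes.json && echo "handled by LFS" gives a quick confirmation before the file is added back.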
Now, let’s move the backup copy of the large file back to its original location within our repository, and then create a new commit that adds it back (the -m flag here is a shortcut for quickly defining the commit message without opening the VIM editor). Finally, let’s push that commit to our remote:
git add released_dicts/released_bounding_boxes.json
git commit -m "re-add released_bounding_boxes.json after LFS configuration"
git push
During the push, you will see a message about your LFS objects being uploaded. This is expected, and indicates successful configuration of LFS for that file.
And that’s it! If you browse your repository on GitHub, you should see your large file in the repo again, just like before. The difference is that the file’s actual storage location has changed on GitHub’s side, as it’s now being managed by LFS.