Alex Selimov

Separate files from git repo into a submodule

Published: Feb 22, 2023

I recently had a situation where a library I was working on, originally as part of one project, was going to be needed for another project. The ideal way to handle this situation is to have the library files in their own git repo, which is then added to each project as a submodule. This way, any changes to the library required by either project can be shared easily. It took me much longer than I would’ve liked, but I finally found a solution and wanted to share it with anyone else who might need it.

Assume I have a git repo as below:

├── main.cpp
├── test.cpp
└── test.h

To pull out test.cpp and test.h on their own, keeping only the commits that touch them, the command is:

$ git filter-branch --force --prune-empty --index-filter \
  'git rm --cached --ignore-unmatch $(git ls-files | grep -v "test.h\|test.cpp")'
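To try the command safely before running it on a real project, here is a minimal sketch on a throwaway repository. The commit messages, file contents, placeholder identity, and the `demo` variable are all made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the deprecation notice.

```shell
#!/bin/sh
# Throwaway demo: everything happens inside a fresh temporary directory.
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Skip the filter-branch deprecation notice and its startup delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .

# Commit 1 adds all three files, commit 2 touches only main.cpp,
# commit 3 touches only the library.
echo 'int main() {}' > main.cpp
echo '// library' > test.cpp
echo '// header' > test.h
git add main.cpp test.cpp test.h
git commit -q -m "init test files"
echo '// more main' >> main.cpp
git commit -q -am "Add main function"
echo '// update' >> test.cpp
git commit -q -am "Update test library"

# Every commit here still contains main.cpp, so the list handed to git rm
# is never empty (git rm exits with an error when it is given no paths).
git filter-branch --force --prune-empty --index-filter \
  'git rm --cached --ignore-unmatch $(git ls-files | grep -v "test.h\|test.cpp")'

git log --oneline   # the main.cpp-only commit has been pruned
git ls-files        # only the library files are still tracked
```

In this sketch the "Add main function" commit disappears because it becomes empty once main.cpp is removed, which is the job of --prune-empty.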

A couple of notes are worth mentioning. First, when you run git filter-branch, it prints a warning saying that you shouldn’t use filter-branch and should use filter-repo instead. I don’t know how to do this with filter-repo and didn’t have the time to figure it out. Second, the --prune-empty flag drops the commits that become empty once the other files are removed, so you don’t have to clean them up with a rebase. Finally, to specify the files you want to keep, pass them to the grep command as:

grep -v "file1\|file2\|file3"

The -v flag inverts the match, so grep hands git rm every file that doesn’t match the names you specify. The file names must be separated with \| so that any one of them matches. Keep in mind that grep treats each name as an unanchored regular expression, so "test.h" would also keep a file such as test.hpp. Once this command completes, you should be left with just the files of interest and their associated history. All that’s left is to set a new remote URL and push, i.e.

$ git remote set-url origin submodule.git.url
$ git push
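The full round trip, from pushing the library to adding it to a second project as a submodule, can be sketched with local stand-ins for the remotes. Everything named here (the `work` directory, testlib.git, other-project, the libs/testlib path) is hypothetical, and the protocol.file.allow override is only there because the demo clones from a local path rather than a real URL.

```shell
#!/bin/sh
# Throwaway demo: local bare repositories stand in for real remotes.
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the new, empty repository that will host the library.
git init -q --bare "$work/testlib.git"

# Stand-in for the filtered repository that now holds only the library files.
git init -q "$work/lib"
cd "$work/lib"
echo '// library' > test.cpp
echo '// header' > test.h
git add test.cpp test.h
git commit -q -m "init test files"

# A fresh demo repo has no origin yet, so add one here; in an existing
# clone you would use `git remote set-url origin <url>` instead.
git remote add origin "$work/testlib.git"
git push -q -u origin HEAD

# A second project pulls the library in as a submodule.
git init -q "$work/other-project"
cd "$work/other-project"
git -c protocol.file.allow=always submodule --quiet add "$work/testlib.git" libs/testlib
git commit -q -m "Add test library as a submodule"

git submodule status   # lists libs/testlib with the commit it is pinned to
```

The original project can add the same submodule in the same way, so both projects track a single copy of the library's history.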

Final Important Note: If this goes wrong, it can look as though all of your git history has been wiped out. You can recover it with git reflog. If you run git reflog after a botched git filter-branch, you should see something like this (dummy commits from my fake repo):

$ git reflog
790c883 (HEAD -> master) HEAD@{0}: filter-branch: rewrite
3b3c8b8 HEAD@{1}: commit: Update test library
93cfbd4 HEAD@{2}: commit: Add main function
18259bc HEAD@{3}: commit: README update
aba8323 HEAD@{4}: commit: init test files

You can then reset your repo to its state before the filter-branch command by running:

$ git reset --hard HEAD@{1}
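The recovery can be rehearsed on a throwaway repository as well. This is a minimal sketch with made-up commits; the `demo` and `before` variables are hypothetical helpers used only to show that the reset lands back on the original commit.

```shell
#!/bin/sh
# Throwaway demo: rewrite history, then undo the rewrite via the reflog.
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
# Skip the filter-branch deprecation notice and its startup delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q .

echo 'int main() {}' > main.cpp
echo '// library' > test.cpp
git add main.cpp test.cpp
git commit -q -m "init"
echo '// more main' >> main.cpp
git commit -q -am "Add main function"

# Remember where HEAD was before the rewrite.
before=$(git rev-parse HEAD)

# A rewrite we then decide was a mistake: drop main.cpp from every commit.
git filter-branch --force --prune-empty --index-filter \
  'git rm --cached --ignore-unmatch main.cpp'

git reflog   # HEAD@{0} is the rewrite, HEAD@{1} the last original commit

# Step back to the entry just before the rewrite.
git reset -q --hard 'HEAD@{1}'

[ "$(git rev-parse HEAD)" = "$before" ] && echo "history restored"
```

If you have run other commands since the rewrite, the entry you want may no longer be HEAD@{1}, so check the git reflog output for the last line before "filter-branch: rewrite".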

Hopefully this helps someone out!