Experimental art asset collaboration workflow with git-annex

During the development of the upcoming Open Arena reboot we discussed the possibility of moving to a more open version control system than the current one. The main problem seems to be handling the large number of binary art assets. In this blog post I explain a possible workflow using git-annex.

Goals

The ideal solution has to be:

  • Cost efficient or DIY: Open Arena still doesn’t accept donations, but contributors often have servers and system administration skills
  • Bandwidth efficient: a checkout/pull shouldn’t download the whole history
  • Open to contributors: contributors should be easy to add
  • Reasonably controlled: changes should be revertible and permissions should be customizable at least on a group level
  • Easy to use: we can expect some IT knowledge from open-source contributors, but it shouldn’t be hard for artists

TL;DR

The experimental repository can be accessed at q3textures.udionline.hu, but it is not yet worth registering or downloading content.

  • ✓ Cost efficient or DIY: a cheap VPS with free software can be turned into a central repository
  • ✓ Bandwidth efficient: a pull only downloads metadata, real content has to be explicitly downloaded
  • ✓ Open to contributors: contributors, repositories, branches can be easily added
  • ✓ Reasonably controlled: changes can be reverted, flexible permission settings
  • ✕ Easy to use: currently requires a POSIX system and command line skills

Infrastructure

The current experimental system is built with these open-source components:

  • Ubuntu 10.04 Server
  • Nginx with PHP and MySQL
  • Indefero project management software
  • Git, gitolite, git-annex

This setup currently uses 480 MB of memory, so it should be able to run on a 512 MB VPS, which costs around 5-10 USD/month. (The current experimental system is a 1024 MB VPS.)

Indefero was chosen because it’s PHP based and that know-how is easy to come by if something has to be modified.

Server setup

The installation is pretty straightforward. There are countless tutorials on the net about installing git and gitolite, and Indefero also has nice documentation. If your distribution is too old (e.g. Ubuntu 10.04), you need to compile git-annex and set up gitolite by hand. Setup is easier if you first install git, gitolite and git-annex, test them, and only then put Indefero on top of it all.

Note that the tutorial about gitolite + git-annex on the git-annex site covers gitolite v2. If you want to use gitolite v3 with git-annex, you should install the git-annex branch of gitolite; the instructions are at the bottom of the gitolite/src/commands/git-annex-shell file. (It’s supposed to be a test version, but it works fine.)

Client setup

You need git and git-annex installed on the client side too. Git-annex requires a POSIX system; Windows support is still under development.

Be aware that git-annex relies on symlinks, so make sure to put your local repository on a filesystem that supports them. (I wasted 6 hours figuring out why it didn’t work on an ntfs-3g partition.)
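
A quick way to rule out the filesystem problem before cloning is to try creating a symlink in the target directory. The following is a minimal sketch; the function name supports_symlinks and the probe file name are made up for this example and are not part of git-annex.

```shell
#!/bin/sh
# supports_symlinks DIR: succeed if a symbolic link can be created in DIR.
# (Hypothetical helper for illustration only.)
supports_symlinks() {
    dir=$1
    probe="$dir/.symlink-probe.$$"
    # Create a link and confirm the filesystem really stored it as a symlink.
    if ln -s probe-target "$probe" 2>/dev/null && [ -L "$probe" ]; then
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"
    return 1
}

# Check the current directory and report the result.
if supports_symlinks .; then
    echo "symlinks supported"
else
    echo "symlinks NOT supported"
fi
```

Run it from the directory where you plan to clone the repository; if it reports that symlinks are not supported, pick another location.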

Workflow

Adding content

Adding content requires write permissions to the repository. After setup you should add at least one user with write permissions, so you can use that for uploading initial data. We assume that the repository is already cloned to the local machine.

Put the files you want to add into the repository folder (newfile.jpg in this example):

[Image: regular file]

Git will recognize that this is a new file:

udi@udi-noti:~/vpslux/q3textures$ git status
# On branch master
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#    newfile.jpg
nothing added to commit but untracked files present (use "git add" to track)

But instead of adding this as regular file, we add it as a git-annex file:

udi@udi-noti:~/vpslux/q3textures$ git annex add newfile.jpg
add newfile.jpg (checksum...) ok
(Recording state in git...)

Git-annex will create a symlink for this file, and will only store the symlink in git:

[Image: symlink]

Therefore git will be aware of the addition, but will only push around a small file (4 KB instead of 48 KB in this case):

udi@udi-noti:~/vpslux/q3textures$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD ..." to unstage)
#
#	new file:   newfile.jpg
#

Commit, push git changes to origin and then copy the real file contents too, so others can retrieve the content from the central repo.

Commit:

udi@udi-noti:~/vpslux/q3textures$ git commit -a -m "Add new file"
[master 9518b92] Add new file
 1 file changed, 1 insertion(+)
 create mode 120000 newfile.jpg

Push (notice that git only uploads 418 bytes, which is the symlink):

udi@udi-noti:~/vpslux/q3textures$ git push origin master
Counting objects: 4, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 418 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To git@q3textures.udionline.hu:q3textures
   8bd4634..9518b92  master -> master

Git-annex copy (notice that git-annex uploads 48188 bytes, which is the real file):

udi@udi-noti:~/vpslux/q3textures$ git annex copy . --to origin
copy colua0.jpg (checking origin...) ok
copy newfile.jpg (checking origin...) (to origin...) 
SHA256-s48188--341f2d101dfcf6af5bc8e9ce444fbbfbde0c9c1893a3a8d33fef452bad730edf
       48188 100%   14.71MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 48335 bytes  received 31 bytes  32244.00 bytes/sec
total size is 48188  speedup is 1.00
ok
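
The key printed in the transfer output above (SHA256-s48188--341f…) consists of a backend name, the file size in bytes and the SHA-256 checksum of the content. As a rough sketch, the same kind of name can be built with standard tools. The function name annex_style_key is made up for this example, it assumes GNU sha256sum is available, and it only mimics the plain SHA256 backend seen in this post.

```shell
#!/bin/sh
# annex_style_key FILE: print a key of the form SHA256-s<size>--<sha256>.
# (Hypothetical helper; it imitates the key format visible in the transcript.)
annex_style_key() {
    file=$1
    # Size in bytes; tr strips padding that some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Hex digest is the first field of sha256sum's output.
    hash=$(sha256sum "$file" | cut -d ' ' -f 1)
    printf 'SHA256-s%s--%s\n' "$size" "$hash"
}
```

Calling annex_style_key newfile.jpg before the file is added should produce a name with the same structure as the one git-annex reports.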

Editing files

During collaboration files often get improved and overwritten, and the workflow allows this. Git-annex is cautious by default and protects annexed content from accidental modification, so you have to unlock the files you want to edit:

udi@udi-noti:~/vpslux/q3textures$ git annex unlock newfile.jpg
unlock newfile.jpg (copying...) ok

During unlocking git-annex replaces the symlink with the actual file, so now you can edit a regular file:

[Image: regular file]

Git will notice the unlocking, because the filetype changes:

udi@udi-noti:~/vpslux/q3textures$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#	typechange: newfile.jpg
#
no changes added to commit (use "git add" and/or "git commit -a")
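
Since an unlocked file shows up as a type change, plain git can list everything that is currently unlocked. This is a small sketch for the git-annex behaviour shown in this post; the function name list_unlocked is made up for the example.

```shell
#!/bin/sh
# list_unlocked: print tracked paths whose type changed in the working tree,
# e.g. a symlink replaced by a regular file after `git annex unlock`.
# (Hypothetical helper; it only wraps a standard git diff filter.)
list_unlocked() {
    git diff --name-only --diff-filter=T
}
```

Running list_unlocked inside the repository before a commit gives a quick overview of the files that are about to be re-added.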

If you want to cancel the changes to the file, just relock it:

udi@udi-noti:~/vpslux/q3textures$ git annex lock newfile.jpg
lock newfile.jpg ok
(Recording state in git...)

If you want to commit the changes made to the file, you can simply commit and the pre-commit check of git-annex will take care of the changes. (But if you have a lot of files, it’s supposed to be faster to add the changed files by hand before committing.)

udi@udi-noti:~/vpslux/q3textures$ git commit -a -m "Change newfile.jpg"
add newfile.jpg (checksum...) ok
ok
(Recording state in git...)
[master d340c61] Change newfile.jpg
 1 file changed, 1 insertion(+), 1 deletion(-)

After commit you can see that the changed file becomes a symlink too:

[Image: edited file]

After commit, don’t forget to push and copy the real file with git-annex, so the content will be available from the central repository too.
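
Because the push and the content copy always belong together, they can be wrapped in one helper. This is only a sketch: the names run and publish and the DRY_RUN variable are invented for this example, while the two commands are the ones used earlier in this post.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN is set.
# (Hypothetical helper for illustration.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# publish [REMOTE] [BRANCH]: push the git history, then upload annexed content.
# Defaults match the remote and branch used in this post.
publish() {
    remote=${1:-origin}
    branch=${2:-master}
    run git push "$remote" "$branch" &&
    run git annex copy . --to "$remote"
}
```

With DRY_RUN=1 set, publish only prints the two commands, which is a safe way to check what would be executed.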

Reverting files

Because git-annex stores every file under a name derived from its own hash, editing a file results in a new file. Since the symlinks are stored in git, we can use git’s revert mechanism to point the symlink back to the older file, and thus revert the change.

The edited file:

[Image: edited file]

Reverting one commit:

udi@udi-noti:~/vpslux/q3textures$ git revert HEAD

And we can immediately see that the symlink is now pointing to the old file:

[Image: original file]
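
The same effect can be reproduced without git-annex, which may help to see that the revert is pure git. The sketch below uses a throwaway repository in which plain symlinks stand in for annexed files; the file name texture.jpg and the link targets are made up.

```shell
#!/bin/sh
# Stand-alone demo: plain symlinks stand in for annexed files.
# All names below are invented for illustration.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# First version: the symlink points at the "old" content.
ln -s objects/v1-old texture.jpg
git add texture.jpg
git commit -q -m "Add texture"

# Second version: the symlink now points at the "new" content.
rm texture.jpg
ln -s objects/v2-newer texture.jpg
git commit -q -a -m "Change texture"

# Revert the last commit and show where the symlink points now.
git revert --no-edit HEAD >/dev/null
readlink texture.jpg
```

The final readlink should print the first target again, because git restored the earlier symlink.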

Getting files

As a comparison here’s the size of a heavily edited working directory:

udi@udi-noti:~/vpslux/q3textures$ find -maxdepth 1 -exec du -sh {} \;
1,8M	.
4,0K	./colua0.jpg
0	./README.txt
1,7M	./.git
4,0K	./newfile.jpg

We clone the test repository:

udi@udi-noti:~/testrepo$ git clone git@q3textures.udionline.hu:q3textures
Cloning into 'q3textures'...
remote: Counting objects: 61, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 61 (delta 14), reused 0 (delta 0)
Receiving objects: 100% (61/61), 5.61 KiB, done.
Resolving deltas: 100% (14/14), done.

We can see that only 5.61 KiB was transferred. That’s because only the symlinks were transferred, not the content:

[Image: cloned repository]
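
In a fresh clone the annexed files are symlinks whose targets do not exist yet, so a small loop can list what is still missing. This is a sketch with a made-up function name (list_missing_content) that only looks at the top level of a directory.

```shell
#!/bin/sh
# list_missing_content [DIR]: print symlinks directly under DIR whose target
# does not exist, i.e. files whose content has not been fetched yet.
# (Hypothetical helper for illustration.)
list_missing_content() {
    dir=${1:-.}
    for path in "$dir"/*; do
        # -L is true for any symlink; -e follows it and fails if it dangles.
        if [ -L "$path" ] && [ ! -e "$path" ]; then
            printf '%s\n' "${path##*/}"
        fi
    done
}
```

Run right after cloning, it should list every annexed file in the directory; once the content has been fetched, it should print nothing.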

We need to tell git-annex that we also want the contents of the files. If we only need some specific files, we can be explicit, but now we get all the content:

udi@udi-noti:~/testrepo/q3textures$ git annex get .
(merging origin/git-annex into git-annex...)
get colua0.jpg (from origin...) 
SHA256-s108114--19ba0417d016dc9a8d027ec66468da522d9f4ee93b1a8238b9c9bc7faac89cb1
      108114 100%  996.04kB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 108275 bytes  43322.00 bytes/sec
total size is 108114  speedup is 1.00
ok
get newfile.jpg (from origin...) 
SHA256-s59873--1359e788294233a6a18b5ed9b5152b843a0b821c5e983c5e2b7e7f660604948a
       59873 100%   57.10MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 60025 bytes  40036.67 bytes/sec
total size is 59873  speedup is 1.00
ok

We can already see from the total sizes that we only downloaded the most recent content, but here are the sizes for comparison:

udi@udi-noti:~/testrepo/q3textures$ find -maxdepth 1 -exec du -sh {} \;
476K	.
4,0K	./colua0.jpg
0	./README.txt
464K	./.git
4,0K	./newfile.jpg

And all our broken symlinks got their real content:

[Image: symlinks cloned]

The best part is that we still have the advantages of a DVCS: we got all the metadata history, so we can, for example, revert changes:

udi@udi-noti:~/testrepo/q3textures$ git revert HEAD --no-commit --no-edit

Our symlink breaks, because we only got the most recent content:

[Image: symlink broken]

But the central repository has all the content, so we can get the old file:

udi@udi-noti:~/testrepo/q3textures$ git annex get .
get newfile.jpg (from origin...) 
SHA256-s48188--341f2d101dfcf6af5bc8e9ce444fbbfbde0c9c1893a3a8d33fef452bad730edf
       48188 100%   45.96MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 48340 bytes  5690.59 bytes/sec
total size is 48188  speedup is 1.00
ok

And the symlink is working again, now showing the old file:

[Image: old files]

Our working directory also gains some weight, but it’s still smaller, because not all of the content history is there:

udi@udi-noti:~/testrepo/q3textures$ find -maxdepth 1 -exec du -sh {} \;
584K	.
4,0K	./colua0.jpg
0	./README.txt
572K	./.git
4,0K	./newfile.jpg

Permissions

Gitolite is used because of the rich permission settings it allows. You can specify permissions per group, per repository or even per branch. But for that you have to go through gitolite, and that means uploading an SSH key. You can still use git-daemon to clone a repository without a key, but such a clone won’t be set up with the appropriate git-annex remote.

So git-daemon cloning should be avoided:

Files tracked by git will be downloaded just fine:

teszt@udi-noti:~/q3textures$ git clone git://q3textures.udionline.hu/q3textures
Cloning into 'q3textures'...
remote: Counting objects: 61, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 61 (delta 14), reused 0 (delta 0)
Receiving objects: 100% (61/61), 5.61 KiB, done.
Resolving deltas: 100% (14/14), done.

But the git-annex repository won’t be set up properly:

teszt@udi-noti:~/q3textures$ git annex get .
(merging origin/git-annex into git-annex...)
get colua0.jpg (not available)
Try making some of these repositories available:
a340f294-9690-4d44-8e2b-0beeeefc029f -- origin
failed
get newfile.jpg (not available)
Try making some of these repositories available:
a340f294-9690-4d44-8e2b-0beeeefc029f -- origin
failed
git-annex: get: 2 failed
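
To catch this situation early, a clone can check which protocol its origin uses. The sketch below is an illustration only; the function name is_anonymous_git_url is made up, and it simply tests for the git:// prefix seen in the clone command above.

```shell
#!/bin/sh
# is_anonymous_git_url URL: succeed if URL uses the git:// (git-daemon) protocol.
# (Hypothetical helper for illustration.)
is_anonymous_git_url() {
    case $1 in
        git://*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Inside a clone, warn when origin points at git-daemon.
if url=$(git config --get remote.origin.url 2>/dev/null) &&
   is_anonymous_git_url "$url"; then
    echo "origin is $url: annexed content will not be available."
    echo "Clone over SSH instead."
fi
```

Outside a repository, or with an SSH origin, the script prints nothing.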

For proper permissions and functioning, always use the gitolite way:

Cloning succeeds because everyone has read permissions:

teszt@udi-noti:~$ git clone git@q3textures.udionline.hu:q3textures
Cloning into 'q3textures'...
remote: Counting objects: 61, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 61 (delta 14), reused 0 (delta 0)
Receiving objects: 100% (61/61), 5.61 KiB, done.
Resolving deltas: 100% (14/14), done.

Getting file content succeeds because everyone has read permissions:

teszt@udi-noti:~/q3textures$ git annex get .
(merging origin/git-annex into git-annex...)
get colua0.jpg (from origin...) 
SHA256-s108114--19ba0417d016dc9a8d027ec66468da522d9f4ee93b1a8238b9c9bc7faac89cb1
      108114 100%    1.34MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 108275 bytes  24067.78 bytes/sec
total size is 108114  speedup is 1.00
ok
get newfile.jpg (from origin...) 
SHA256-s59873--1359e788294233a6a18b5ed9b5152b843a0b821c5e983c5e2b7e7f660604948a
       59873 100%    8.16MB/s    0:00:00 (xfer#1, to-check=0/1)

sent 30 bytes  received 60025 bytes  40036.67 bytes/sec
total size is 59873  speedup is 1.00
ok

Pushing and writing file content will fail (even if it says ok) because not everyone has write permissions:

teszt@udi-noti:~/q3textures$ git push origin master
FATAL: W any q3textures teszt DENIED by fallthru
(or you mis-spelled the reponame)
fatal: The remote end hung up unexpectedly
teszt@udi-noti:~/q3textures$ git annex copy newfile-teszt2.jpg --to origin
copy newfile-teszt2.jpg (checking origin...) ok

Ease of use

On the server side gitolite can be hooked together with Indefero, so that users can upload SSH keys by themselves and groups can be managed on the web page. The test environment still lacks this.

On the client side git-annex is not user friendly… yet. After the wildly successful Kickstarter campaign, Joey Hess is working hard on git-annex assistant. Once it’s released we should check the client side again, but for now git-annex requires POSIX systems and some command line skills which contributing artists may or may not have.

Conclusion

This experimental system fulfills 4 out of the 5 goals, and the fifth goal is under way. Even now this approach has some advantages over the current version control system we have for Open Arena. I think we should use this or a similar setup for OA3, but even if we don’t, I think I will use it for handling the missing texture project.
