I don’t have any issue with the disk space wasted by so many copies, but one thing that does bother me is having to re-download everything every time; network speeds are not as fast as I’d like. There are several ways to make this less of a hassle. One is to install a full-blown caching HTTP proxy server and point the gem command at it.
Another is to create your own small cache server just for RubyGems, which is what Nando Vieira did with his Rubygems Proxy. I tweaked it and made a fork for myself with an added trick.
The original idea is: you change your .gemrc and your projects’ Gemfile to point the source at this new proxy application. Then, whenever you run gem install or bundle, the requests go through this new source. The proxy application is a very simple Rack app that checks whether the file exists locally; otherwise it goes to rubygems.org, downloads and caches the file, so subsequent requests just serve it from disk.
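For reference, here is a minimal sketch of what such a caching Rack proxy could look like. This is not the actual Rubygems Proxy code; the cache directory, the upstream URL and the content type handling are my own assumptions:

    # config.ru -- minimal sketch of a caching gem proxy.
    # Not the actual Rubygems Proxy code; paths and upstream URL are assumptions.
    require "open-uri"
    require "fileutils"

    class GemProxy
      CACHE_DIR = File.expand_path("cache", __dir__)
      UPSTREAM  = "https://rubygems.org"

      def call(env)
        path  = env["PATH_INFO"]                  # e.g. "/gems/rack-1.4.1.gem"
        local = File.join(CACHE_DIR, path)

        unless File.exist?(local)
          # Cache miss: fetch the file from the upstream source and store it on disk
          FileUtils.mkdir_p(File.dirname(local))
          File.binwrite(local, URI.open("#{UPSTREAM}#{path}").read)
        end

        [200, { "Content-Type" => "application/octet-stream" }, [File.binread(local)]]
      end
    end

    run GemProxy.new

You would run it with rackup and point your gem sources at it, something like gem sources --add http://localhost:9292 (host and port being wherever you actually run it).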
The Hack
I changed this proxy app to request the file directly from production.cf.rubygems.org; it seems that “rubygems.org” merely redirects all gem requests to it. The reason for this hack is that now I only need to add 127.0.0.1 rubygems.org to my /etc/hosts file, and I don’t need to change the .gemrc or the Gemfile. Whenever something tries to install gems, it hits my local proxy, which caches the files as I wanted. The problem is that you lose access to the original website, but I personally don’t use it very much. Another hassle is that if you publish gems, the gem push command will probably fail and yet another hack is probably required. I don’t publish gems very often, so it’s not a problem for me. One can make a script to turn the /etc/hosts line on or off (a rough sketch follows below) or just manually edit it whenever you need to publish a new version of your gem.
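Here is a rough sketch of what such a toggle script could look like. The hosts entry is the one mentioned above; the file name and messages are illustrative, and it needs to run as root:

    #!/usr/bin/env ruby
    # toggle_gem_proxy.rb -- rough sketch: flips the /etc/hosts override on and off.
    # Run with sudo; script name and output messages are illustrative only.
    HOSTS = "/etc/hosts"
    ENTRY = "127.0.0.1 rubygems.org"

    contents = File.read(HOSTS)
    if contents.include?(ENTRY)
      # Remove the override so gem push can reach the real rubygems.org again
      File.write(HOSTS, contents.sub(/^#{Regexp.escape(ENTRY)}[ \t]*\n?/, ""))
      puts "Override removed: rubygems.org resolves normally again."
    else
      # Append the override so gem requests hit the local proxy
      File.write(HOSTS, "#{ENTRY}\n", mode: "a")
      puts "Override added: rubygems.org now points to 127.0.0.1."
    end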
This should speed things up, and I can see it being used in continuous integration servers, where false-positive alerts are sometimes triggered because the bundle command wasn’t able to reach the rubygems.org server due to internal network instabilities, for example. But even for just my development environment, this should make things go smoother, as the network is hit only when the requested file is not available in the local cache.
The Fix?
Now, this is a big hack, I know that much. The correct thing to do would be to change the RubyGems project itself so it caches things properly. Ideally, the gem command would fetch the file once and save it in a global repository, for example /var/local/rubygems; then, on Linux or Mac, the command could just symlink it. That way, even if you have multiple rubies, with multiple gemsets and multiple projects using Bundler, they would all point to the same global directory. This would save not only network hits but also disk space, making RubyGems much more useful.
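Just to make the idea concrete, here is a purely illustrative sketch; /var/local/rubygems, the download URL and the install layout are all assumptions on my part, not how RubyGems actually behaves:

    require "fileutils"
    require "open-uri"

    # Illustrative only: one global gem cache that every ruby/gemset symlinks into.
    GLOBAL_CACHE = "/var/local/rubygems"

    def fetch_gem(name, version, install_cache_dir)
      filename = "#{name}-#{version}.gem"
      global   = File.join(GLOBAL_CACHE, filename)

      unless File.exist?(global)
        # Only hit the network when the gem isn't in the global cache yet
        FileUtils.mkdir_p(GLOBAL_CACHE)
        File.binwrite(global, URI.open("https://rubygems.org/gems/#{filename}").read)
      end

      # Each ruby/gemset/bundler cache just symlinks to the single global copy,
      # saving both the repeated download and the duplicated disk space
      FileUtils.mkdir_p(install_cache_dir)
      FileUtils.ln_s(global, File.join(install_cache_dir, filename), force: true)
    end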
I am not familiar with the history of the RubyGems project, so I don’t know whether this has been attempted before, or even whether it is a good idea. If you’re more familiar with the project, let me know. Feedback on whether this hack is useful is also welcome.