Last night I saw a tweet, retweeted by David Glick, about a post from Russ Cox, one of the original developers of Google Code Search. He has released some code for doing code search based on the ideas used by Google Code Search.
Google shut down Google Code Search recently, and his code lets you do similar hybrid index/regex searching on local files.
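To give a feel for what 'hybrid index/regex' means, here is a rough Python sketch of the idea as I understand it from Russ' post: an index of trigrams (three-character substrings) narrows the search down to candidate files, and the regular expression is then run only over those candidates. This is not his implementation. The function names and the in-memory FILES dictionary are made up for illustration, and the literal that every match must contain is passed in by hand, whereas the real tool works that out from the regex itself.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names whose content contains it."""
    index = defaultdict(set)
    for name, content in files.items():
        for gram in trigrams(content):
            index[gram].add(name)
    return index


def search(index, files, literal, pattern):
    """Find lines matching pattern, scanning only candidate files.

    literal is a string that every match of pattern must contain; deriving
    it from the regex automatically is the part this sketch leaves out.
    """
    candidates = set(files)
    for gram in trigrams(literal):
        # Keep only the files that contain every trigram of the literal.
        candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    hits = []
    for name in sorted(candidates):
        for line in files[name].splitlines():
            if regex.search(line):
                hits.append((name, line.strip()))
    return hits


# Hypothetical in-memory "files" standing in for a source tree.
FILES = {
    "users.py": "class User:\n    def authenticate(self, password, request):\n        pass\n",
    "basic.py": "def authenticate(self, environ):\n    pass\n",
    "readme.txt": "Nothing to see here.\n",
}

INDEX = build_index(FILES)
RESULTS = search(INDEX, FILES, "def authenticate", r"def authenticate\(.*password")
```

In this toy example readme.txt is never opened during the search, because it shares none of the trigrams of "def authenticate". Skipping most files in this way is where the speed-up over a plain scan comes from.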
I have to confess, I never really used Google Code Search, probably because until I read the post above I didn't realise that it could do regular expression searching. That is a 'big thing' when searching code. I really don't know how I missed that fact.
I am a bit of a (lapsed) search and information retrieval geek myself. My final year project at university was a full text mailing list indexer/search system (similar to gmane). It was based heavily on the seminal work mentioned in Russ' post, Managing Gigabytes by Witten, Moffat, and Bell, and also on Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto, and on the paper that influenced much of the internet as we know it now, Brin and Page's The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Russ Cox has released the code to his tool at http://code.google.com/p/codesearch/ and you can download binaries for OSX, FreeBSD, Linux, and Windows.
Here is a quick rundown on using it to index and then search my local development directory on my MacBook. This allows me to quickly search through not only all the code in the projects I'm working on, but also my egg cache, which contains all the code for Plone itself. Great for when you see a particular bit of markup or an error message and need to work out where it is coming from in the code.
Firstly, download the binaries from the link above, or compile them yourself. Then you need to get it to index your directories. In my case everything I do is under /Development:
dhcp10:~ matth$ cindex /Development
2012/01/23 14:47:26 index /Development
2012/01/23 14:49:08 flush index
2012/01/23 14:49:08 merge 7 files + mem
2012/01/23 14:49:23 520694590 data bytes, 82743702 index bytes
2012/01/23 14:49:23 merge /Users/matth/.csearchindex /Users/matth/.csearchindex~
2012/01/23 14:49:39 done
And then you can use it to search through the code:
dhcp10:~ matth$ csearch "def authenticate"
/Development/buildout-cache/eggs/AccessControl-2.13.4-py2.6-macosx-10.7-x86_64.egg/AccessControl/userfolder.py: def authenticate(self, name, password, request):
/Development/buildout-cache/eggs/AccessControl-2.13.4-py2.6-macosx-10.7-x86_64.egg/AccessControl/users.py: def authenticate(self, password, request):
/Development/buildout-cache/eggs/AccessControl-2.13.4-py2.6-macosx-10.7-x86_64.egg/AccessControl/users.py: def authenticate(self, password, request):
/Development/buildout-cache/eggs/Paste-1.7.5.1-py2.6.egg/paste/auth/basic.py: def authenticate(self, environ):
/Development/buildout-cache/eggs/Paste-1.7.5.1-py2.6.egg/paste/auth/digest.py: def authenticate(self, environ):
/Development/buildout-cache/eggs/Paste-1.7.5.1-py2.7.egg/paste/auth/basic.py: def authenticate(self, environ):
/Development/buildout-cache/eggs/Paste-1.7.5.1-py2.7.egg/paste/auth/digest.py: def authenticate(self, environ):
...
One current issue is that there is no way to exclude directories from the index, so you get .svn directories in the results:
dhcp10:~ matth$ csearch "sspi.ServerAuth"
/Development/netsight.windowsauthplugin/netsight/windowsauthplugin/windowsauthplugin/.svn/text-base/krbtest.py.svn-base: sa = sspi.ServerAuth('Negotiate')
/Development/netsight.windowsauthplugin/netsight/windowsauthplugin/windowsauthplugin/.svn/text-base/plugin.py.svn-base: sa = sspi.ServerAuth('Negotiate')
/Development/netsight.windowsauthplugin/netsight/windowsauthplugin/windowsauthplugin/krbtest.py: sa = sspi.ServerAuth('Negotiate')
/Development/netsight.windowsauthplugin/netsight/windowsauthplugin/windowsauthplugin/plugin.py: sa = sspi.ServerAuth('Negotiate')
/Development/py24nsp/sanofi/src/netsight.windowsauthplugin/netsight/windowsauthplugin/.svn/text-base/krbtest.py.svn-base: sa = sspi.ServerAuth('Negotiate')
/Development/py24nsp/sanofi/src/netsight.windowsauthplugin/netsight/windowsauthplugin/.svn/text-base/plugin.py.svn-base: sa = sspi.ServerAuth('Negotiate')
/Development/py24nsp/sanofi/src/netsight.windowsauthplugin/netsight/windowsauthplugin/krbtest.py: sa = sspi.ServerAuth('Negotiate')
/Development/py24nsp/sanofi/src/netsight.windowsauthplugin/netsight/windowsauthplugin/plugin.py: sa = sspi.ServerAuth('Negotiate')
/Development/py26nsp2/plone41demo/src/netsight.windowsauthplugin/netsight/windowsauthplugin/.svn/text-base/krbtest.py.svn-base: sa = sspi.ServerAuth('Negotiate')
/Development/py26nsp2/plone41demo/src/netsight.windowsauthplugin/netsight/windowsauthplugin/.svn/text-base/plugin.py.svn-base: sa = sspi.ServerAuth('Negotiate')
/Development/py26nsp2/plone41demo/src/netsight.windowsauthplugin/netsight/windowsauthplugin/krbtest.py: sa = sspi.ServerAuth('Negotiate')
/Development/py26nsp2/plone41demo/src/netsight.windowsauthplugin/netsight/windowsauthplugin/plugin.py: sa = sspi.ServerAuth('Negotiate')
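Until directories can be excluded at index time, one workaround is to filter the results after the fact. The following is a minimal Python sketch of such a filter. The names VCS_DIRS, is_vcs_path and drop_vcs_hits are my own, and it assumes that each result line has the form 'path:matched text' and that the path itself contains no colon.

```python
# Directory names to hide; this list is an assumption, extend it as needed.
VCS_DIRS = {".svn", ".git", ".hg"}


def is_vcs_path(path):
    """True if any component of path is a version-control metadata directory."""
    return any(part in VCS_DIRS for part in path.split("/"))


def drop_vcs_hits(lines):
    """Yield only the result lines whose file path is outside VCS_DIRS.

    Assumes each line looks like 'path:matched text' and that the path
    itself contains no colon.
    """
    for line in lines:
        path = line.split(":", 1)[0]
        if not is_vcs_path(path):
            yield line


# Two result lines taken from the output above.
SAMPLE = [
    "/Development/netsight.windowsauthplugin/netsight/windowsauthplugin/windowsauthplugin/.svn/text-base/plugin.py.svn-base: sa = sspi.ServerAuth('Negotiate')\n",
    "/Development/netsight.windowsauthplugin/netsight/windowsauthplugin/windowsauthplugin/plugin.py: sa = sspi.ServerAuth('Negotiate')\n",
]

KEPT = list(drop_vcs_hits(SAMPLE))
```

To use it as a filter on real output, you could pipe the csearch results into a small script that calls sys.stdout.writelines(drop_vcs_hits(sys.stdin)).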
You can, however, restrict the search to files whose path matches a regex:
dhcp10:~ matth$ csearch -f /Development/py26nsp/cplonline/src/netsight.cpl "def auth"
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/.svn/text-base/models.py.svn-base: def authenticate(self, password):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/models.py: def authenticate(self, password):
As it can take regular expressions, you can do more advanced searches, e.g. 'all methods that take an argument called password':
dhcp10:~ matth$ csearch -f /Development/py26nsp/cplonline/src/netsight.cpl "def.*\(.*password"
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/__init__.py:def fetchandload(ftpcmd, loadcmd, ftpcache, host, user, password):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/evidence.py: def __init__(self, uri, user=None, password=None):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/models.py: def __init__(self, name, login, password, company_id, role=None):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/models.py: def _hash_password(self, password, salt):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/models.py: def set_password(self, password):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/models.py: def authenticate(self, password):
/Development/py26nsp/cplonline/src/netsight.cpl/netsightcpl/tests/base.py: def login(self, login='admin', password='admin'):
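If you want to see exactly what a pattern like this will and won't pick up, it can help to try it on a few lines first. Here is a small sketch using Python's re module, which is not the regex engine csearch uses, so treat it only as an approximation. The STRICT pattern and the last two sample lines are my own additions for illustration.

```python
import re

# The pattern from the csearch example above, compiled with Python's re module.
LOOSE = re.compile(r"def.*\(.*password")
# A stricter variant (an assumption, not from the search above):
# 'password' must appear as a whole word inside the argument list.
STRICT = re.compile(r"def\s+\w+\(.*\bpassword\b")

LINES = [
    "def set_password(self, password):",
    "def authenticate(self, password):",
    "def login(self, login='admin', password='admin'):",
    "def change(self, old_password):",  # hypothetical line
    "def set_password(self, value):",   # hypothetical line
]

LOOSE_HITS = [line for line in LINES if LOOSE.search(line)]
STRICT_HITS = [line for line in LINES if STRICT.search(line)]
```

With these samples the loose pattern also matches the old_password line, since it only requires the text 'password' somewhere after an opening parenthesis, while the stricter variant keeps just the first three lines. Neither matches the last line, where 'password' appears only in the method name.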
So thanks to Russ for releasing this. It certainly is much quicker for searching through large directories of code than using grep or ack.