Galago is a toolkit for experimenting with text search. It is based on small, pluggable components that are easy to replace and change, both during indexing and during retrieval.

It includes TupleFlow, a distributed computation framework similar to MapReduce or Dryad. TupleFlow handles the difficult parts of processing text at scale: serializing data, sorting it, and distributing the work. The IndexReader and IndexWriter classes handle storage of key/value pairs such as inverted lists, so you can build your own kinds of index structures without starting from scratch.
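To make the idea of an inverted list stored as a key/value pair concrete, here is a minimal in-memory sketch. It is not Galago's IndexReader or IndexWriter API: the class InvertedListSketch and its methods addDocument and lookup are hypothetical names chosen for illustration, and the code assumes Java 8 or later and that documents are added in increasing id order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; this is NOT the Galago IndexWriter/IndexReader API.
public class InvertedListSketch {
    // Key: a term. Value: the inverted list of ids of documents containing that term.
    // A TreeMap keeps keys sorted, much as an on-disk index would.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Tokenize the text on non-word characters and record docId under each term.
    // Assumes documents are added in increasing docId order.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Skip the id if this document was already recorded for the term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    // Return the inverted list for a term, or an empty list if the term is unknown.
    public List<Integer> lookup(String term) {
        List<Integer> list = postings.get(term.toLowerCase());
        return list == null ? Collections.<Integer>emptyList() : list;
    }

    public static void main(String[] args) {
        InvertedListSketch index = new InvertedListSketch();
        index.addDocument(1, "text search toolkit");
        index.addDocument(2, "search search engine");
        System.out.println(index.lookup("search"));
    }
}
```

A real index would also store positions or counts in each list and write the pairs to disk in key order; the sketch keeps only document ids to show the key/value shape.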

The retrieval system supports a variant of the Indri query language that has been redesigned to be more flexible. You can add your own query operators without recompiling the core libraries: put the new operator on the classpath and reference it in a query.
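The general Java mechanism that makes this kind of plug-in possible is loading a class by name at run time. The sketch below shows that mechanism only; it does not reproduce how Galago resolves operators. The interface ScoreOperator, the class SumOperator, and the method load are hypothetical names, and the code assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of classpath-based plug-ins; NOT Galago's operator API.
public class OperatorLoaderSketch {
    // Assumed contract for an operator: combine the scores of its child nodes.
    public interface ScoreOperator {
        double combine(List<Double> childScores);
    }

    // An example operator that could live in a separate jar on the classpath.
    public static class SumOperator implements ScoreOperator {
        @Override
        public double combine(List<Double> childScores) {
            double total = 0.0;
            for (double score : childScores) {
                total += score;
            }
            return total;
        }
    }

    // Look the class up by name and instantiate it through its no-argument
    // constructor, so the caller needs no compile-time reference to the operator.
    public static ScoreOperator load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return cls.asSubclass(ScoreOperator.class).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real system the class name would come from the query or a configuration file.
        ScoreOperator op = load(SumOperator.class.getName());
        System.out.println(op.combine(Arrays.asList(1.0, 2.5)));
    }
}
```

Because the loader only needs a class name, a new operator can be compiled separately and dropped onto the classpath without rebuilding the code that calls load.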

Galago is written in Java and runs on any system with JDK 1.5 or later. It has been tested on Windows, Mac OS X, and Linux.

Getting Started

Start with the Quick Start tutorial to get used to the tools. It takes about 10 minutes to build an index and run some queries if you already have Java installed.

Getting Help

The documentation is updated regularly. For help, look in the Documentation section or the JavaDocs, or ask on the mailing list.