Xapian text indexer
Xapian-based TextIndexer
Status
Author
Tarek Ziadé (tziade@nuxeo.com)
Problem/Proposal
Implement a text indexer based on Xapian (http://www.xapian.org), for the same reasons Stephan Richter gave:
Xapian is written in C++ and is very fast. It's instantly usable from Python (via SWIG bindings), and it is easier to build and use than PyLucene.
FYI, here is a very interesting comparison of indexers:
http://blogs.nuxeo.com/sections/blogs/florent_guillaume/2005_06_30_comparison_indexers
Goals
- Write a Zope TextIndexer using Xapian
- See whether a base external indexer can be designed to provide a common interface for all external indexers for Zope
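The "common interface" goal could start as small as this minimal Python sketch, under stated assumptions: the class names are hypothetical, the method names `index_doc`, `unindex_doc` and `apply` are borrowed from the Zope 3 text index convention, and the dict-based backend only stands in for a real engine such as Xapian so the interface can be exercised without installing anything.

```python
from abc import ABC, abstractmethod
import re


class ExternalTextIndex(ABC):
    """Hypothetical common base for text indexes backed by an external engine."""

    @abstractmethod
    def index_doc(self, docid, text):
        """Send (or re-send) the text for docid to the engine."""

    @abstractmethod
    def unindex_doc(self, docid):
        """Remove docid from the engine; unknown ids are ignored."""

    @abstractmethod
    def apply(self, query):
        """Return the set of docids matching a query string."""


class InMemoryTextIndex(ExternalTextIndex):
    """Toy backend standing in for Xapian: a dict-based inverted index."""

    _word = re.compile(r"\w+")

    def __init__(self):
        self._postings = {}  # word -> set of docids containing it
        self._docs = {}      # docid -> set of words (needed to unindex)

    def _words(self, text):
        # Naive tokenizer: lowercase alphanumeric runs, no stemming.
        return {w.lower() for w in self._word.findall(text)}

    def index_doc(self, docid, text):
        self.unindex_doc(docid)  # re-indexing replaces the old content
        words = self._words(text)
        self._docs[docid] = words
        for w in words:
            self._postings.setdefault(w, set()).add(docid)

    def unindex_doc(self, docid):
        for w in self._docs.pop(docid, ()):
            ids = self._postings[w]
            ids.discard(docid)
            if not ids:
                del self._postings[w]

    def apply(self, query):
        # AND semantics: every query word must appear in the document.
        words = self._words(query)
        if not words:
            return set()
        result = None
        for w in words:
            ids = self._postings.get(w, set())
            result = ids if result is None else result & ids
        return set(result)
```

A Xapian-backed class would subclass `ExternalTextIndex` the same way, so Zope code written against the base class would not need to know which engine sits behind it.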
Proposed Solution
- Formalize how indexes are stored on the Xapian server
- Reuse Stephan's recipe for building a Zope 3 index and apply it here
- Write the indexer
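One way to start formalizing how index data is stored is a fixed naming scheme. This sketch assumes one Xapian database per Zope index under a common root directory, and ties each Xapian document to its Zope docid through a unique boolean term with the "Q" prefix that Xapian's Omega tools use for unique IDs. The directory layout and the helper names are hypothetical.

```python
import os

# Hypothetical convention: Xapian's Omega tools use "Q" for unique-ID terms;
# we assume the same prefix to tie each Xapian document to a Zope docid.
UNIQUE_PREFIX = "Q"


def database_path(root, catalog_name, index_name):
    """Directory holding the Xapian database for one Zope index (assumed layout)."""
    for part in (catalog_name, index_name):
        # Reject empty names and anything that could escape the root directory.
        if not part or os.sep in part or part in (".", ".."):
            raise ValueError("invalid path component: %r" % (part,))
    return os.path.join(root, catalog_name, index_name)


def unique_term(docid):
    """Boolean term identifying the Xapian document for an integer Zope docid."""
    return "%s%d" % (UNIQUE_PREFIX, docid)


def docid_from_term(term):
    """Inverse of unique_term()."""
    if not term.startswith(UNIQUE_PREFIX):
        raise ValueError("not a unique-ID term: %r" % (term,))
    return int(term[len(UNIQUE_PREFIX):])
```

With the Xapian Python bindings, such a term can be passed to `WritableDatabase.replace_document()` or `delete_document()`, which accept a unique term in place of a Xapian document id, provided the term was also added to each document with `add_boolean_term()`. Re-indexing a Zope object then updates its Xapian document instead of creating a duplicate.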
Risks
- The only risk I can see is that this kind of indexer needs another piece of software to be installed on the system.
Why the Zope 3 core? --philikon, 2005/09/27 09:13 EST
I like looking out at other software projects. Reusing a fast indexer might have some real benefits (though it might not in terms of flexibility: Python code, especially code written for Zope 3, is usually easier to customize). Either way, I think it's great that you are looking into this, but I don't think it has much of a place in the Zope 3 core. As Florent puts it in the blog entry you cite above, it's about "something nice for the Z3 ECM." So we should probably have this discussion there.
Note that this is a different situation from the one we have with lxml. lxml depends on libxml/libxslt. Those libraries are available on many systems by default or can be installed easily. This is not the case with Xapian (DarwinPorts, for example, doesn't even have it as a package).
Fortuitously... --tseaver, 2005/09/28 07:57 EST
For a client of ours, Chris McDonough has figured out how to get Xapian to build on Mac OS X. I don't know whether he has sent a patch to the DarwinPorts guys (or even if he is using DarwinPorts at all).
While I agree that a Xapian-based index might not be suitable for inclusion in Zope3 due to dependencies, I think we might need to rip out a bunch of other stuff already there. Here are some criteria for removing stuff which appeal to me:
- Unsuitable dependencies
- Incomplete or "sample" implementations (the message board app, for instance).
- Lack of use in "real" applications (the Z3 MySQL DA completely fell over when I tried to use it under load, for instance).
- Lack of maintenance. If it hasn't been modified since before the Zope 3.1 beta release, for instance, it is probably bitrotted.
