lxml dependency
Status
Author
Julien Anguenot
Problem/Proposal
Zope3 should require lxml as a mandatory dependency. More info about lxml can be found at http://codespeak.net/lxml/
Goals
Having XML-related Python code within Zope3 that will
- be more "Pythonic" than SAX based API,
- be faster than the DOM implementation from the standard library,
- benefit from lxml features such as XPath, Relax NG, XML Schema, etc., and
- be more readable and thus easier to maintain.
Having lxml as a dependency would also open up many new possibilities for the Zope3 core, given the range of XML technologies it makes available.
In particular, being able to use XPath to test result pages instead of output comparison with simple ellipsis handling would be a real improvement for many tests.
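As a rough illustration of the testing idea, here is a minimal sketch of a path-based assertion on a rendered page. The page markup and the `cell_texts` helper are hypothetical. The sketch sticks to the small path subset that `lxml.etree` shares with the standard library's `xml.etree.ElementTree`, so it falls back to the latter when lxml is not installed; with lxml, the `xpath()` method on elements would accept full XPath 1.0 expressions instead.

```python
try:
    from lxml import etree  # preferred when available
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same API subset

# Hypothetical output of a browser view (must be well-formed XML).
PAGE = """\
<html>
  <body>
    <h1>Folder contents</h1>
    <table id="contents">
      <tr><td class="name">index.html</td><td class="size">1 KB</td></tr>
      <tr><td class="name">logo.png</td><td class="size">4 KB</td></tr>
    </table>
  </body>
</html>
"""


def cell_texts(page, css_class):
    """Return the text of every <td> whose class attribute equals css_class."""
    root = etree.fromstring(page)
    return [td.text for td in root.findall(".//td[@class='%s']" % css_class)]


# The test states what it expects structurally, independent of whitespace
# or unrelated markup elsewhere in the page.
names = cell_texts(PAGE, "name")
assert names == ["index.html", "logo.png"]
```

Compared with matching the whole output against an expected string with ellipses, an assertion like this only breaks when the part of the page it names actually changes.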
Proposed Solution
Requiring lxml as a mandatory dependency for Zope3
Risks / issues
- the lxml dependency implies libxml2, libxslt and Pyrex dependencies (we could get rid of the Pyrex dependency at the lxml layer if we want to)
- coping with the evolution of lxml in its own repository (codespeak.net)
- there is no official Windows release of lxml right now
What need does this address? --fdrake, 2005/08/04 12:36 EST
At this point, I'm not at all convinced that this is needed by Zope 3. The reasons for including this as stated by this proposal revolve entirely around two things: making a better XML library available to the application programmer, and making one new feature (albeit a valuable one) available to the test author. The former is not necessary; applications that need XML processing and choose lxml are free to do so now, and applications which don't need XML processing simply don't need it.
The desire to add XPath support to the testing framework is more interesting. XPath is useful in testing even if the application is not itself processing XML, since functional tests of browser views can benefit (often substantially) from a cleaner expression of what's expected. So XPath itself provides a desirable capability for the test author. Whether that's sufficient to justify requiring libxml2, libxslt, Pyrex, and lxml is a difficult question.
The desire to use XPath in tests is the only need for new XML technologies in Zope, at least as expressed in this proposal.
Jim and I have discussed this a few times, and haven't exactly agreed on the answer.
lxml can also serve framework authors --faassen, 2005/08/04 13:18 EST
Another motivation, briefly mentioned by Julien, is wanting to use lxml facilities
in other frameworks which are to be part of Zope 3.
Concrete examples:
- XPath expressions in tests
- reading a workflow definition language (Julien is already using lxml there, as far as I understand)
Possible examples:
- XSLT templates (in addition to page templates)
- XSLT pipelines for postprocessing Zope output. (this has been experimented with in Zope 2 and Zope 3 contexts)
- Experimental work done on Ajax involved lxml last I heard.
- REST support. (XML generation and transformation; schemas can help verify that output is correct; XPath can be used to process the results of a REST-style request made somewhere)
Whenever a piece of framework has XML processing to do, lxml may come in handy, and it's better for the understandability and consistency of the code to use one powerful XML processing library than a diversity of them.
Of course one could question whether any of these proposed features should be part of the Zope 3 core. Opening up these options sounds nice though...
Another, more marketing-related, argument could be that in Zope 2, XML support has generally not been very good or commonly used. Including an XML library with the core of Zope 3 could help convince people that Zope 3 is going to be quite different. Since the buzzword rating of various XML technologies is rather high, I think such a marketing effect should not be underestimated.
So perhaps this is more about opening up potential than addressing specific needs.
lxml can also serve framework authors --fdrake, 2005/08/04 13:29 EST
Indeed, frameworks will often want to process XML. And lxml is a good choice for that. However, that's not a technical reason to include or require lxml directly in Zope 3.
The buzzword value (and hoped-for marketing consequences) seems to be the only aspect of the convenience appeal that makes any sense. I don't generally care about those, and generally consider such issues a distraction. The community will just have to decide how much of that we need to include in Zope itself, but from my perspective, requiring something that we don't use is silly. The C libraries lxml exposes are large and take quite some time to build compared to the C code in Zope 3 itself, which doesn't help.
Concrete example --philikon, 2005/08/04 13:48 EST
In general, I agree with most of what has been said in previous comments by Martijn and Fred. In the end, I actually value including lxml in the Zope 3 core higher than leaving it out, because I do see some potential for using it in the core (aside from the huge potential for framework authors).
One concrete example I have is the DAV code. It heavily depends on parsing and outputting XML from and to the client. We're currently using minidom there and it's a pain in the neck, especially because DAV XML can be complicated sometimes (e.g. when it comes to LOCKing). It could just be coincidence, but I think the fact that we don't have a proper XML tool at hand and the fact that DAV is implemented so sloppily go together...
Other than that, I think XPath expressions in tests is the most important new feature that I value as being useful.
Overall, a +0.5 from me.
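To make the DAV point a little more concrete, here is a small sketch of reading a PROPFIND request body with an ElementTree-style API. It is not taken from zope.app.dav; the `requested_properties` helper and the sample body are assumptions for illustration. The code uses only calls that `lxml.etree` and the standard library's `xml.etree.ElementTree` have in common, and falls back to the latter if lxml is not installed.

```python
try:
    from lxml import etree  # preferred when available
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same API subset

DAV_NS = "DAV:"

# Illustrative PROPFIND request body asking for three named properties.
REQUEST_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop>
    <D:displayname/>
    <D:getcontentlength/>
    <D:resourcetype/>
  </D:prop>
</D:propfind>
"""


def requested_properties(body):
    """Return the local names of the properties listed in <D:prop>."""
    root = etree.fromstring(body)
    prop = root.find("{%s}prop" % DAV_NS)
    if prop is None:
        # For example an <allprop/> request: no explicit property list.
        return []
    # Tags look like "{DAV:}displayname"; keep the part after "}".
    # The isinstance check skips comments, whose tag is not a string in lxml.
    return [
        child.tag.rsplit("}", 1)[-1]
        for child in prop
        if isinstance(child.tag, str)
    ]


assert requested_properties(REQUEST_BODY) == [
    "displayname", "getcontentlength", "resourcetype"]
```

The namespace-qualified lookup is a single `find()` call here; the equivalent with minidom needs `getElementsByTagNameNS` plus manual filtering of text and whitespace nodes.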
Do not forget zope.configuration --srichter, 2005/08/04 13:54 EST
Another package that will greatly benefit from the lxml inclusion is zope.configuration. Once we have XPath we can implement selective disabling and overriding of directives, something that is really needed in the future. Currently we have no way of disabling directives!
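What selective disabling might look like is sketched below. This is purely hypothetical: zope.configuration offers no such feature today, and the `disable_directives` helper, the sample ZCML, and the idea of pruning the parsed tree before the directives are executed are all assumptions. The path syntax is limited to the subset shared by `lxml.etree` and the standard library's `xml.etree.ElementTree` (used as a fallback); real XPath via lxml would allow richer selections.

```python
try:
    from lxml import etree  # preferred when available
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same API subset

ZCML_NS = "http://namespaces.zope.org/zope"

# Hypothetical configuration file content.
ZCML = """\
<configure xmlns="http://namespaces.zope.org/zope">
  <utility name="mailer" component=".mail.smtp" />
  <utility name="cache" component=".cache.ram" />
  <adapter factory=".adapters.Sized" />
</configure>
"""


def disable_directives(root, child_path):
    """Remove every element matching child_path relative to its parent.

    Returns the number of directives removed.
    """
    removed = 0
    # Snapshot the elements first so the tree is not mutated mid-iteration.
    for parent in list(root.iter()):
        for child in parent.findall(child_path):
            parent.remove(child)
            removed += 1
    return removed


root = etree.fromstring(ZCML)
removed = disable_directives(root, "{%s}utility[@name='cache']" % ZCML_NS)
remaining = [el.get("name") for el in root.findall("{%s}utility" % ZCML_NS)]
assert removed == 1
assert remaining == ["mailer"]
```

Overriding could follow the same pattern, replacing the matched element instead of removing it.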
Concrete example (bis) --anguenot, 2005/08/04 14:01 EST
See the xpdlcore inclusion proposal.
How can you use something if you don't require it? --faassen, 2005/08/04 14:07 EST
Fred, you seem to want to see actual uses of lxml in Zope 3 code before we require it. This means that lxml is completely off the board, and so is any other library that doesn't happen to be in Zope 3 already (or that is written by the Zope 3 developers themselves).
There have been a number of suggestions for using lxml in various parts of the Zope 3 code base. Packages in Zope 3.1b1 (not svn) which concretely use XML now are:
- zope.app.dublincore
- zope.tal
- ZConfig
- docutils
- zope.app.rotterdam.xmlobject
- zope.app.apidoc.codemodule
- zope.app.dav
- zope.configuration
- zope.i18n.locales
Finally, marketing the Zope 3 platform is, in my opinion, emphatically not a distraction.
How can you use something if you don't require it? --fdrake, 2005/08/04 14:32 EST
What I don't want is to require something that isn't used, you're right. I do understand that there are checkin-ordering dependencies; I'm not concerned about the fact that lxml would need to be required before something using it is added.
I found your list of current uses of XML interesting. docutils does provide a way to access a DOM, but does no parsing of XML that I can see. It is also third-party code, so it makes little sense to modify it in any way.
ZConfig has a constraint that it only depend on the standard library; that's intentional. I suspect non-Zope users of ZConfig would be surprised to find lxml a dependency.
The XML support in zope.app.dublincore probably isn't used; I started something there thinking that there had to be a better way to deal with metadata for zsync, but I didn't finish because I haven't figured out how to reconcile the content-object-vs.-annotation dichotomy with the let's-make-filesystem-representations-sensible world view.
The zope.tal case is especially sensitive, since it relies on details of getting information from the parser that extends well beyond the XML infoset into the layer of concrete markup.
The others I'd have to look at.
What I think you're proposing is that 1) lxml be required, and 2) something be added that uses it. If we're adding something (a new feature) that's generally useful to a substantial portion of the Zope user base, then considering lxml makes a lot of sense. Re-writing existing features to use lxml doesn't make sense until the former has been done, and only if it'll make the implementation for that feature cleaner (and no, I'm not naturally anti-SAX).
I want XPATH and XSLT available in Zope 3 "out of the box" --jim, 2005/08/14 10:11 EST
I want Zope 3 developers to be able to count on XSLT and XPATH, if only for
writing cleaner tests, although I agree with Martijn that XSLT is likely
to become core (in some way) for creating pages. (I mainly see it as a
potential better alternative to METAL for managing site look and feel.)
lxml looks like the easiest, cleanest way to get ubiquitous XPATH and XSLT into Zope. I would be inclined to include lxml in Zope, if practical, and have libxml2 and libxslt be dependencies. I hope that requiring lxml or libx... doesn't cause too much pain.
