Better XML support for Page Templates
Status
Problem
A number of issues regarding the handling of filesystem-based Page Templates have turned up over the past couple of years. In particular:
- Specifying XML-mode parsing by requiring an XML declaration at
the top of the file creates problems:
  - Templates that are used to produce fragments of XHTML are required to include the XML declaration, and will generate the declaration on output, but the declaration cannot be included in the middle of an assembled document. (That violates the XML specification.)
  - Output consumed by Microsoft Internet Explorer is affected: the XML declaration causes MSIE to enter quirks mode incorrectly, regardless of the DOCTYPE declaration or the well-formedness of the document. This in turn triggers bugs in the handling of the box model by stylesheets.
- The tight binding of the input mode and the output mode makes it impossible to re-use even simple macros in both XHTML and HTML.
- It should be possible to specify the output mode and content type independently of the input mode.
This proposal addresses the first issue, and questions the approach taken regarding the other two issues.
Enabling XML-mode parsing
Currently, XML-mode parsing is enabled on filesystem-based templates by requiring that the document start with an XML declaration. XML itself does not require the declaration, and MSIE exhibits the quirks-mode issue described above when it is present. We need another way to enable XML-mode parsing.
In addition to the current approach of sniffing for an XML declaration, we propose to add new ways to identify a template as an XML document:
- Add another "sniff" test based on XML namespaces. If the
document uses namespaces at all (any xmlns or xmlns:foo
attributes) and parses as XML, XML mode will be used. This check
is much less efficient than a check for the XML declaration.
- Using a filename extension of .xpt should cause Page Templates
to be processed in XML mode. This has the advantage that the
identification of the file as XML is performed outside the content
and can be checked very quickly.
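The detection order described above could be sketched roughly as follows. This is an illustration only, not the actual Page Templates code: the function names detect_input_mode and is_well_formed, the regular expression, and the "xml"/"html" return values are all assumptions made for the example. The cheap checks (extension, then XML declaration) run first, and the expensive namespace sniff with a full parse runs last.

```python
import os
import re
from xml.parsers import expat

# Hypothetical pattern: any xmlns="..." or xmlns:foo="..." attribute.
XMLNS_ATTR = re.compile(r'\sxmlns(?::[A-Za-z_][\w.\-]*)?\s*=')


def is_well_formed(text):
    """Return True if text parses as XML (the costly part of the sniff)."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except expat.ExpatError:
        return False
    return True


def detect_input_mode(filename, text):
    """Return "xml" or "html" for a filesystem-based template (sketch)."""
    # 1. Extension check: performed outside the content, very cheap.
    if os.path.splitext(filename)[1].lower() == ".xpt":
        return "xml"
    # 2. Existing behaviour: sniff for an XML declaration at the start.
    if text.startswith("<?xml"):
        return "xml"
    # 3. Namespace sniff: must use namespaces *and* parse as XML.
    if XMLNS_ATTR.search(text) and is_well_formed(text):
        return "xml"
    return "html"
```

With this ordering, a .xpt file is never parsed just to determine its mode, while a .pt file that declares a namespace pays for one extra parse.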
Once we have ways to explicitly indicate XML mode for a template, it becomes possible to avoid the fragment and MSIE problems and still use well-formed XHTML.
Alternatives
Philipp von Weitershausen suggested, during an IRC discussion,
using an explicit attribute directive (tal:mode in particular)
with an argument to identify the input mode. The idea is
appealing and ensures that the input mode is expressed explicitly.
However, making the template proper XML (XHTML or other) requires
that the namespace prefix be declared anyway, so simply detecting
the presence of a namespace declaration seems a cleaner approach,
and avoids having another burnt offering in template files.
Removing HTML output mode
At this point, there does not appear to be any further requirement for HTML-mode output. HTML input still needs to be supported due to backward compatibility. HTML input can be used to generate XHTML output (with code changes).
By generating XHTML for HTML input, all templates can be made to work together without having the HTML/XML mode incompatibility that exists today. This adds a great deal of flexibility.
We propose removing HTML output mode in favor of generating XHTML from HTML input. This simplifies the implementation of both the "bytecode" generator and the interpreter, and is expected to produce a small speedup as a result.
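To give a feel for what "XHTML output from HTML input" involves, here is a minimal sketch built on the Python standard library's html.parser. It is not how the Page Templates "bytecode" generator or interpreter works. The class XHTMLWriter, the function html_to_xhtml, and the VOID set are hypothetical names for this example. It only covers empty elements, minimized attributes, and escaping. It does not close omitted end tags (such as an unclosed p or li), and it ignores comments, DOCTYPEs, and script/style content.

```python
from html import escape
from html.parser import HTMLParser

# Elements that HTML allows without an end tag (assumed subset).
VOID = frozenset(["area", "base", "br", "col", "hr", "img",
                  "input", "link", "meta", "param"])


class XHTMLWriter(HTMLParser):
    """Re-serialize HTML start tags, end tags and text as XHTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Minimized attributes (value None) become name="name".
        rendered = "".join(
            ' %s="%s"' % (name, escape(name if value is None else value))
            for name, value in attrs)
        closer = " />" if tag in VOID else ">"
        self.out.append("<%s%s%s" % (tag, rendered, closer))

    def handle_endtag(self, tag):
        # Void elements were already closed in handle_starttag.
        if tag not in VOID:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def html_to_xhtml(text):
    writer = XHTMLWriter()
    writer.feed(text)
    writer.close()
    return "".join(writer.out)
```

For example, html_to_xhtml('<p>Hi<br><img src=a.png></p>') produces a fragment in which br and img are self-closed and the src value is quoted.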
Note that avoiding HTML mode does not avoid the problem described
in issue 3 above. While most modern browsers are happy enough to
see XHTML properly labelled with the application/xhtml+xml
content type, there are still some which require using text/html
to get HTML of any flavor.
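One way an application might pick the content type independently of the template's input mode is to inspect the request's Accept header. The sketch below is an assumption about how that could look, not part of this proposal: choose_content_type is a hypothetical helper, and it only checks whether application/xhtml+xml is listed with a non-zero q value, without implementing full HTTP content negotiation.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def choose_content_type(accept_header):
    """Return XHTML if the client explicitly accepts it, else HTML."""
    for item in (accept_header or "").split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] != XHTML:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        if q > 0:
            return XHTML
    # Fall back for clients that require text/html.
    return HTML
```

The same XHTML body is served in both cases; only the label changes.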
Deprecating HTML input mode
Many templates have already been written in HTML, and these need to continue to be supported for some time. HTML input, however, should eventually be removed. To this end, we propose deprecating HTML input with removal scheduled for Zope 3.4. Using the HTML-mode Page Template parser will cause a deprecation warning to be generated; these should be caught while testing applications with the new versions of Zope to avoid bloating logs in production environments.
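The deprecation behaviour could be sketched with the standard warnings module as shown below. The function parse_html_template is a placeholder standing in for the HTML-mode parser entry point, and collect_deprecations is a hypothetical test helper. Neither name comes from the Zope code base.

```python
import warnings


def parse_html_template(text):
    """Placeholder for the HTML-mode parser; only the warning matters here."""
    warnings.warn(
        "HTML-mode Page Template input is deprecated and scheduled "
        "for removal in Zope 3.4; use XML-mode input instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return text  # a real parser would return compiled template data


def collect_deprecations(func, *args):
    """Run func and return the DeprecationWarnings it raised (test helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args)
    return [w for w in caught if issubclass(w.category, DeprecationWarning)]
```

A test suite can call collect_deprecations around template loading and fail if the returned list is non-empty, which surfaces HTML-mode templates before they reach production logs.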
My two cents --philikon, 2005/10/15 01:00 EST
I've been thinking about this problem for a while and am glad to see most of the suggestions I've made in the past (see http://mail.zope.org/pipermail/zope3-dev/2005-August/015488.html for example) being incorporated into this proposal. After our IRC discussion and a good night's sleep, I'm a bit less irritated by the suggestion of using the .xpt file extension to indicate XML input mode than I was before. I would still prefer something else, but alas I can't think of anything better. One question I have: when we say that HTML input mode will be deprecated starting with Zope 3.2, does that mean the use of all other file extensions, including .pt, will generate deprecation warnings?
My two cents --fdrake, 2005/10/15 22:25 EST
There is no plan to deprecate any XML-mode input, regardless of file
extension. The .pt extension does not currently mandate HTML-mode input
and will not have that semantic added. Existing XML-mode input files will
not need to be renamed. The suggested .xpt extension will simply cause
the template to be parsed in XML mode without the need for an XML declaration
at the beginning of the file.
Given the additional possibility of detecting a namespace declaration in the
document (anywhere), it may be reasonable to even skip the use of the .xpt
extension, but using the extension means that the input mode can be specified
without any need to use a sniff-test on the document content. This would
always be more efficient than sniffing the content.
