Locale-specific Text Collation

Status: IsImplementedProposal

Author

JimFulton and Jacob Holm

Problem

When presenting users with ordered text (e.g. sorted lists of options), simply sorting Unicode strings doesn't provide an ordering that users in a given locale will find useful. Various languages have text sorting conventions that don't agree with the ordering of Unicode code points. (This is even true for English. Generally, users prefer to see text sorted without regard to case.)

Proposal

I propose to provide an API for locale-specific ordering of Unicode strings for presentation to users. I propose that there be an adapter from locale to ICollator:

    class ICollator(Interface):
        """Provide support for collating text strings

        This interface will typically be provided by adapting a locale.
        """

        def key(text):
            """Return a collation key for the given text.
            """

        def cmp(text1, text2):
            """Compare two text strings.

            The return value is negative if text1 < text2, 0 if they are
            equal, and positive if text1 > text2.
            """

Here is some example code that illustrates how the API would be used:

    ordered_options = sorted(unordered_options,
                             key=ICollator(request.locale).key)

Note that it is almost always more efficient to pass the key method to sorting functions, rather than the cmp method. (The cmp method can be more efficient when there are only a few long strings and they tend to differ near their beginnings. This is because computing an entire key can be much more expensive than a comparison whose outcome can be determined from a small portion of the original strings.)
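To illustrate the trade-off described above, here is a minimal sketch that counts how often each method is called during a sort. CountingCollator is a hypothetical stand-in that only lower-cases its input; it is not the proposed adapter, and the call counters exist purely for the demonstration. The sketch assumes a Python version that provides functools.cmp_to_key for sorting with a comparison function.

```python
from functools import cmp_to_key


class CountingCollator:
    """Hypothetical stand-in collator that ignores case and counts calls."""

    def __init__(self):
        self.key_calls = 0
        self.cmp_calls = 0

    def key(self, text):
        # Called once per item when passed as sorted(..., key=...).
        self.key_calls += 1
        return text.lower()

    def cmp(self, text1, text2):
        # Called once per pairwise comparison made by the sort.
        self.cmp_calls += 1
        k1, k2 = text1.lower(), text2.lower()
        return (k1 > k2) - (k1 < k2)


words = ["pear", "Apple", "fig", "Banana", "cherry", "Date", "elderberry", "Grape"]

counting = CountingCollator()
by_key = sorted(words, key=counting.key)
by_cmp = sorted(words, key=cmp_to_key(counting.cmp))

print(by_key)
print("key calls:", counting.key_calls, "cmp calls:", counting.cmp_calls)
```

Both calls produce the same ordering. The key method runs exactly once per string, while the number of cmp calls grows with the number of comparisons the sort has to make, which is why key is usually the better choice for long lists.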

We will provide an optional implementation of this adapter that uses the International Components for Unicode (ICU) library.

We will also provide a fall-back adapter that simply normalizes for case (in addition to basic Unicode normalization). This will be included in the zope.i18n package.
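As a rough sketch of what such a fall-back might look like, the class below normalizes text to NFC and then folds case. The name FallbackCollator, the choice of NFC, and the use of str.casefold are assumptions made for illustration; the actual zope.i18n implementation may differ.

```python
import unicodedata


class FallbackCollator:
    """Hypothetical fall-back collator: Unicode normalization plus case folding."""

    def key(self, text):
        # Normalize first so canonically equivalent strings get the same key,
        # then fold case so "apple" and "Apple" sort together.
        return unicodedata.normalize("NFC", text).casefold()

    def cmp(self, text1, text2):
        k1, k2 = self.key(text1), self.key(text2)
        # Negative, zero, or positive, as the interface describes.
        return (k1 > k2) - (k1 < k2)


collator = FallbackCollator()
options = ["banana", "Cherry", "apple", "Banana"]

plain = sorted(options)                      # code point order
ordered = sorted(options, key=collator.key)  # case-insensitive order

print(plain)    # ['Banana', 'Cherry', 'apple', 'banana']
print(ordered)  # ['apple', 'banana', 'Banana', 'Cherry']
```

Plain code point ordering puts all upper-case words first, whereas the fall-back key interleaves them. Words whose keys are equal keep their original relative order because Python's sort is stable.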

Leveraging the API

Various components, such as list widgets that display text and sorted text columns in tabular displays, will need to be updated to take advantage of this API. Note that this will be beneficial for all domains, and will benefit some domains even when the ICU-based adapter isn't used.


