Last edited on Jan 20, 2001 11:30 pm


Proposal Status

This proposal has been subsumed by Python20Migration.


This proposal addresses two different but related problems.

  1. Python 2.0 introduces a new built-in unicode string type. It is desirable for Zope applications to use this type. The following problems need to be resolved:

    Unicode strings should be available on Zope property sheets.

  2. DTML should allow unicode strings to be used as easily as plain strings today when creating unicode-aware pages.

    Also, existing DTML pages that are not unicode-aware should degrade safely when they encounter unicode content. This is necessary to allow unicode values for standard Zope properties, such as title, without breaking the many existing pages that use a title.

Proposed Solutions

A prototype for this solution has been in development since December 1999. Several snapshots are available at . The currently favored implementation strategy is summarised here.

  1. Several new tags have been added to ZPublisher for marshalling unicode data. They fall into two categories.

    First, there are four new type converter tags: ustring, utokens, utext, and ulines. These are the unicode equivalents of string, tokens, text, and lines.

    Second, there is a new type of tag that specifies the character encoding in which form data is being submitted. For example, a field might now be named "description:utf8:ustring".

  2. DTML's render_blocks function has been changed to return a unicode string if any of its constituents are unicode. Non-unicode content is promoted to unicode on the assumption that it contains latin-1 character data.

  3. ZPublisher has been changed to handle a unicode response. If the response is unicode, it is encoded using the character encoding specified by the charset property in the Content-Type header. (This applies to all text/* content-types.) An error is raised if the response contains a unicode character that cannot be represented in the specified encoding.

    If the response does not contain a Content-Type header, or if the header does not have a charset property, the content is encoded back down to latin-1. In this case any characters outside the latin-1 range are replaced with a ?. (It is this rule that allows old management pages to work with objects that have a unicode title.)
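The three steps above can be illustrated with a rough sketch. This is not the prototype's code: it is written for modern Python, where str plays the role of the unicode type and bytes the role of a plain string, and the helper names unmarshal_field, join_blocks and encode_response are hypothetical. It also assumes that the u* converters behave like decoded versions of string, tokens, text and lines, that form data without an encoding tag is latin-1, and it does not model the text/* restriction.

```python
from email.message import Message

# Hypothetical converter table. The tag names come from the proposal; the
# behaviour is assumed to mirror string, tokens, text and lines.
CONVERTERS = {
    "ustring": lambda s: s,
    "utokens": lambda s: s.split(),
    "utext": lambda s: s.replace("\r\n", "\n").replace("\r", "\n"),
    "ulines": lambda s: s.splitlines(),
}

# Only "utf8" is named in the proposal; the mapping to a codec is assumed.
ENCODING_TAGS = {"utf8": "utf-8"}


def unmarshal_field(field_name, raw):
    """Step 1: split 'name:encoding:converter' and decode submitted bytes."""
    name, *tags = field_name.split(":")
    encoding = "latin-1"            # assumed default without an encoding tag
    convert = CONVERTERS["ustring"]
    for tag in tags:
        if tag in ENCODING_TAGS:
            encoding = ENCODING_TAGS[tag]
        elif tag in CONVERTERS:
            convert = CONVERTERS[tag]
        else:
            raise ValueError("unknown tag: %r" % tag)
    return name, convert(raw.decode(encoding))


def join_blocks(blocks):
    """Step 2: return unicode if any block is unicode, else plain bytes."""
    if any(isinstance(b, str) for b in blocks):
        # Promote plain strings, assuming they hold latin-1 data.
        return "".join(
            b if isinstance(b, str) else b.decode("latin-1") for b in blocks
        )
    return b"".join(blocks)


def encode_response(body, content_type=None):
    """Step 3: encode a unicode response body for the wire."""
    charset = None
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # None if no charset property
    if charset:
        # Strict encoding: raises UnicodeEncodeError for unrepresentable
        # characters, matching the "error is raised" rule.
        return body.encode(charset)
    # No charset: fall back to latin-1 and replace anything else with "?".
    return body.encode("latin-1", "replace")
```

For example, unmarshal_field("description:utf8:ustring", data) decodes the submitted bytes as UTF-8, and encode_response(page) with no Content-Type falls back to latin-1 with ? substitution, which is the behaviour that lets pages that are not unicode-aware keep working.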

Risk Factors


This proposal does not address the related issues of:

