Z Object Publishing Environment


Last edited on Jan 20, 2001 11:30 pm



Proposal Status

This proposal has been subsumed by Python20Migration


This proposal addresses two different but related problems.

  1. Python 2.0 introduces a new built-in unicode string type. It is desirable for Zope applications to be able to use this type. The following problems need to be resolved:

    • A means for ZPublisher to marshal unicode string parameters.

    • Much code designed for plain strings can also be used for unicode strings. Several places in the Zope source contain gratuitous s=str(s). These need to be either removed, or changed to s=ustr(s). (ustr is like str, except its return value may be either a plain or unicode string).

    • unicode objects are one of the few built-in objects for which str(object) often raises an exception. Various debugging aids need to be changed to use repr() rather than str().

    • Unicode strings should be available on Zope property sheets.

  2. DTML should allow unicode strings to be used as easily as plain strings are today when creating unicode-aware pages.

    Also, existing DTML pages that are not unicode-aware should degrade safely when they encounter unicode content. This is necessary to allow unicode values for standard Zope properties, such as title, without breaking the many existing pages that use a title.
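As a rough illustration of the ustr idea and of the debugging problem described above, here is a minimal sketch. It is written in present-day Python 3, where bytes stands in for the Python 2.0 plain string and str stands in for the unicode type. That mapping, the function body, and the helper name debug_label are assumptions made for illustration, not the code from the prototype.

```python
def ustr(value):
    """Like str(), but never forces text into one string type.

    Assumption: bytes models the Python 2.0 plain string and str models
    the unicode string, so both pass through untouched.
    """
    if isinstance(value, (bytes, str)):
        return value
    return str(value)


def debug_label(value):
    """Hypothetical debugging aid that uses an ASCII-safe repr.

    ascii() plays the role of Python 2's repr(): it escapes non-ASCII
    characters instead of trying to encode them, so it cannot fail on
    unicode content.
    """
    return ascii(value)


def force_plain(text):
    """Models the old s=str(s) habit: squeeze unicode into ASCII."""
    return text.encode("ascii")
```

With these definitions, ustr leaves both kinds of string alone and only converts other objects, while force_plain("\u20ac") raises UnicodeEncodeError in the same way the proposal says str(object) often does for unicode objects.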

Proposed Solutions

A prototype for this solution has been in development since December 1999. Several snapshots are available at http://www.zope.org/Members/htrd/wstring. The currently favored implementation strategy is summarised here.

  1. Several new tags have been added to ZPublisher for marshalling unicode data. They fall into two categories.

    Firstly, there are four new type converter tags, ustring, utokens, utext, and ulines. These are unicode equivalents of string, tokens, text, and lines.

    Secondly, there is a new type of tag which specifies the character encoding in which the form data is being submitted. For example, a field might now be named "description:utf8:ustring".

  2. DTML's render_blocks function has been changed to return a unicode string if any of its constituents are unicode. Non-unicode content is promoted to unicode on the assumption that it contains latin-1 character data.

  3. ZPublisher has been changed to handle a unicode response. If the response is unicode, it applies the character encoding specified by the charset property in the Content-Type header. (This applies to all text/* content-types.) An error is raised if the response contains a unicode character that cannot be represented in the specified encoding.

    If the response does not contain a Content-Type header, or if the header does not have a charset property, the content is encoded back down to latin-1. In this case any characters not in the latin-1 range are replaced with a ?. (It is this rule that allows old management pages to work with objects that have a unicode title.)
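The three steps above can be sketched end to end. This is only an illustration in present-day Python 3, where bytes models the plain string and str models the unicode string. The function names marshal_field, join_blocks and encode_response, the behaviour chosen for each converter, and the default encoding are all assumptions; the sketch handles only the unicode tags, ignores the text/* restriction, and does not reproduce the prototype's code.

```python
import codecs
import re

# Assumed behaviour for the four unicode converters, modelled on the
# names of their plain counterparts (string, tokens, text, lines).
CONVERTERS = {
    "ustring": lambda text: text,
    "utokens": lambda text: text.split(),
    "utext": lambda text: text.replace("\r\n", "\n"),
    "ulines": lambda text: text.splitlines(),
}


def marshal_field(field_name, raw_bytes, default_encoding="latin-1"):
    """Decode one form field named like 'description:utf8:ustring'."""
    name, *tags = field_name.split(":")
    encoding, convert = default_encoding, None
    for tag in tags:
        if tag in CONVERTERS:
            convert = CONVERTERS[tag]
        else:
            codecs.lookup(tag)  # raises LookupError for an unknown encoding
            encoding = tag
    if convert is None:
        return name, raw_bytes  # no unicode converter: value stays plain
    return name, convert(raw_bytes.decode(encoding))


def join_blocks(blocks):
    """Join rendered blocks, promoting to unicode if any block is unicode."""
    if any(isinstance(block, str) for block in blocks):
        # Plain blocks are assumed to hold latin-1 character data.
        return "".join(
            block if isinstance(block, str) else block.decode("latin-1")
            for block in blocks
        )
    return b"".join(blocks)


def encode_response(body, content_type=None):
    """Encode a unicode response body for the wire."""
    if isinstance(body, bytes):
        return body  # plain responses are sent unchanged
    match = re.search(r"charset=([\w-]+)", content_type or "", re.IGNORECASE)
    if match:
        # Strict encoding: raises UnicodeEncodeError for characters the
        # declared charset cannot represent.
        return body.encode(match.group(1))
    # No charset declared: fall back to latin-1, replacing the rest with ?
    return body.encode("latin-1", "replace")
```

For example, encode_response("title \u20ac") yields b"title ?", which corresponds to the degradation rule for old pages, while the same body with "text/html; charset=utf-8" is encoded without loss.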

Risk Factors

  • This may break code that expects DTML to always return a plain string.

  • The migration path is harder for users who currently have plain strings containing data in a character encoding other than latin-1 (or a subset of latin-1 such as ASCII).


This proposal does not address the related issues of:

  • ZCatalogging unicode data

  • Localization

  • Better unicode support for XMLDocument and related products.

