Contact
http://www.zope.org/Members/htrd
Proposal Status
This proposal has been subsumed by Python20Migration
Problem
This proposal addresses two different but related problems.
Python 2.0 introduces a new built-in unicode string type. It is desirable for
Zope applications to use this type. The following problems need to be resolved:
A means for ZPublisher to marshal unicode string parameters.
Much code designed for plain strings can also be used for unicode strings.
Several places in the Zope source contain gratuitous s=str(s) calls. These need to be
either removed, or changed to s=ustr(s). (ustr is like str, except that its return
value may be either a plain or a unicode string.)
unicode objects are one of the few built-in objects for which str(object) often
raises an exception. Various debugging aids need to be changed to use repr() rather
than str().
Unicode strings should be available on Zope property sheets.
When creating unicode-aware pages, DTML should allow unicode strings to be used as
easily as plain strings are used today.
Also, existing DTML pages that are not unicode-aware should safely degrade
when encountering unicode content. This is necessary to allow unicode values for
standard Zope properties, such as title, without breaking the many existing pages
that use a title.
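The ustr behaviour described above can be sketched as follows. This is only an illustration, not the prototype's code: it is written for a modern Python 3 interpreter, so bytes stands in for the plain string type and str for the unicode type, and the body of ustr is an assumption based solely on the one-line description given in this proposal.

```python
def ustr(value):
    """Like str(), but the result may be either kind of string.

    Assumption: in this Python 3 sketch, bytes plays the role of the
    Python 2 plain string and str plays the role of the unicode string.
    """
    if isinstance(value, (bytes, str)):
        # Already a string of one kind or the other: return it unchanged
        # instead of forcing a conversion that might fail or lose data.
        return value
    # Anything else is converted to text in the usual way.
    return str(value)


plain = ustr(b"title")      # a plain string is passed through untouched
wide = ustr("caf\xe9")      # so is a unicode string
number = ustr(42)           # other objects are converted as str() would
```

Code that previously did s=str(s) only to guard against non-string arguments keeps working with this kind of helper, while unicode values are no longer forced through a conversion that can raise.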
Proposed Solutions
A prototype for this solution has been in development since December 1999. Several snapshots are
available at http://www.zope.org/Members/htrd/wstring . The currently favored
implementation strategy is summarised here.
Several new tags have been added to ZPublisher for marshalling unicode data. They
fall into two categories.
Firstly, there are four new type converter tags, ustring , utokens , utext ,
and ulines . These are unicode equivalents of string , tokens , text , and lines .
Secondly, there is a new type of tag which specifies the character encoding in which
form data is being submitted. For example, a field might now be named
"description:utf8:ustring".
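As a rough illustration of how such a field name could be interpreted, the sketch below splits the name on colons, picks out the encoding tag and the converter tag, and decodes the submitted bytes before converting them. Only the tag names utf8, ustring, utokens, utext and ulines come from this proposal. The function marshal_field, the two lookup tables, the latin-1 default and the converter bodies are hypothetical stand-ins, written for Python 3 where bytes models the raw form data.

```python
# Hypothetical table of encoding tags; only "utf8" appears in the proposal.
ENCODING_TAGS = {"utf8": "utf-8"}

# Simplified stand-ins for the four unicode converters named above.
CONVERTERS = {
    "ustring": lambda text: text,
    "utokens": lambda text: text.split(),
    "utext": lambda text: text.replace("\r\n", "\n"),
    "ulines": lambda text: text.splitlines(),
}


def marshal_field(field_name, raw_value):
    """Return (name, value) for one submitted form field.

    raw_value is the undecoded bytes of the field as sent by the browser.
    """
    parts = field_name.split(":")
    name, tags = parts[0], parts[1:]
    encoding = "latin-1"   # assumed default when no encoding tag is given
    converter = None
    for tag in tags:
        if tag in ENCODING_TAGS:
            encoding = ENCODING_TAGS[tag]
        elif tag in CONVERTERS:
            converter = CONVERTERS[tag]
        else:
            raise ValueError("unknown tag: %r" % tag)
    if converter is None:
        # No unicode converter requested: leave the raw bytes alone.
        return name, raw_value
    return name, converter(raw_value.decode(encoding))


name, value = marshal_field("description:utf8:ustring", "caf\xe9".encode("utf-8"))
```

Keeping the encoding tag separate from the converter tag means any of the four converters can be combined with any supported encoding without defining a new tag for each pair.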
DTML's render_blocks function has been changed to return a unicode string if any of
its constituents are unicode. Non-unicode content is promoted to unicode on the assumption that
it contains latin-1 character data.
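The promotion rule can be sketched in a few lines. This is not the actual render_blocks code: join_blocks is a hypothetical helper covering only the final joining step, and it is written for Python 3, with bytes standing in for plain strings and str for unicode strings.

```python
def join_blocks(blocks):
    """Join rendered blocks, promoting to unicode only when needed."""
    if any(isinstance(block, str) for block in blocks):
        # At least one unicode constituent: promote every plain block,
        # assuming it holds latin-1 data, and return a unicode result.
        return "".join(
            block if isinstance(block, str) else block.decode("latin-1")
            for block in blocks
        )
    # Nothing is unicode, so the result stays a plain string as before.
    return b"".join(blocks)


old_style = join_blocks([b"<h1>", b"Title", b"</h1>"])
mixed = join_blocks([b"<h1>", "\u20ac 10", b"</h1>"])
```

Because every byte value is a valid latin-1 character, the promotion step itself never fails. The cost is that plain strings holding data in some other encoding are silently misread.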
ZPublisher has been changed to handle a unicode response. If the response is unicode
then it applies the character encoding specified by the charset property in the
Content-Type header. (This applies to all text/* content-types.) An error is raised
if the response contains a unicode character that cannot be represented in the
specified encoding.
If the response does not contain a Content-Type header, or if the header does not have a charset
property, the content is encoded back down to latin-1. In this case any characters not
in the latin-1 range are replaced with a ? . (It is this rule that allows old management
pages to work with objects that have a unicode title.)
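The two response rules might look roughly like this in code. The sketch is hedged in several ways: encode_body and the regular expression are hypothetical, the code targets Python 3 (str for unicode, bytes for the encoded output), and treating a non-text content type the same as a missing charset is an assumption, since the proposal does not say what happens in that case.

```python
import re

# Hypothetical pattern for pulling the charset parameter out of a header.
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def encode_body(body, content_type=None):
    """Encode a unicode response body for sending on the wire."""
    if isinstance(body, bytes):
        return body  # plain responses are passed through unchanged
    if content_type and content_type.lower().startswith("text/"):
        match = _CHARSET_RE.search(content_type)
        if match:
            # Strict encoding: raises UnicodeEncodeError if a character
            # cannot be represented in the declared charset.
            return body.encode(match.group(1))
    # No Content-Type, or no charset: fall back to latin-1 and replace
    # anything outside that range with "?".
    return body.encode("latin-1", "replace")


legacy = encode_body("caf\xe9 \u20ac")                       # no header at all
modern = encode_body("\u20ac", "text/html; charset=utf-8")   # declared charset
```

The strict path reports data loss to pages that declared a charset and can therefore be fixed, while the lenient fallback keeps pages that never declared one rendering, at the price of ? substitutions.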
Risk Factors
This may break code that expects DTML will always return a plain string.
The migration path is harder for users who currently have plain strings containing
data using a character encoding other than latin-1 (or a subset of latin-1 such as ASCII).
Scope
This proposal does not address the related issues of:
Deliverables