- by Lalo (2000/09/07)
Sorry to add to the top of the page; I know that's
bad wiki etiquette, but I feel people are seeing two different projects
here, so let's dispel some misunderstandings to try to reduce noise:
What Chris proposes is not a new, XML-based serialization/deserialization
mechanism to replace pickle; that is already being worked on somewhere,
I believe (look at Python's XML-SIG).
XMLToObjects is about taking XML in a possibly non-Zope-related schema
and DTD (for example, RDF/RSS or NewsML) and constructing a family of Zope
(or Python) objects from the data represented in it.
That said, proceed to the discussion...
- ChrisM (8/31/2000)
Please add your comments here.
- ShaneH (9/1/2000)
I really don't deserve the credit (blame? ;-) ) for this. It's a pretty straightforward idea. Most data can be represented as a DOM, and XSLT or Python can transform most data representable as a DOM into another format. PPML is a pretty simple format to write.
There's just one issue: what are some specific applications where this would be useful? One application is where you have data larger than what can fit in memory and you need to stream the conversion.
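For the streaming case, a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse` might look like this. It assumes the records are direct children of the root element; each one is turned into a plain dict (a stand-in for whatever objects the real conversion would build) and then pruned from the tree, so the full tree is never built in memory. The function name `stream_records` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def stream_records(source, tag):
    """Yield one dict per <tag> element without building the whole tree.

    Assumes <tag> elements are direct children of the document root.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Copy out the child text before the element is discarded.
            yield {child.tag: (child.text or "") for child in elem}
            # Drop processed records from the root so memory stays bounded.
            root.clear()

sample = io.BytesIO(
    b"<items>"
    b"<item><name>a</name></item>"
    b"<item><name>b</name></item>"
    b"</items>"
)
for record in stream_records(sample, "item"):
    print(record)
```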
- Paul (2000/09/02)
Careful, as Shane knows, I've got a lot to ask for on this! It's Christmas! :^)
First, if the goal is pickling, then we should talk about pickling.
Also, if the goal is Python and not just Zope, pickling is appropriate.
However, I'm curious about what problem this is trying to solve.
Is the goal safe data interchange between two Zopes? Is
the problem interfacing Zope content with non-Zope systems? Is the
goal to allow authoring complete, structured Zope content in XML?
For instance, I've wanted import/export over the web, without
requiring a file to be placed in the Zope directory. However, we've
been told that unpickling can be malicious. Could an XML data format
that didn't involve direct unpickling (meaning it contained only approved
data) be better?
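One way to read this suggestion: a loader that only ever builds values from an approved list of types, so the input cannot name arbitrary classes or callables the way a pickle can. The element and attribute names (`property`, `name`, `type`), the whitelist, and `load_properties` are all hypothetical. This only addresses what gets constructed; the Python documentation also warns that the standard XML parsers are not secure against maliciously constructed data, so hostile input would need a hardened parser as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the only value types the loader will construct.
# Nothing in the input can select any other class or callable.
ALLOWED_TYPES = {"str": str, "int": int, "float": float}

def load_properties(xml_text):
    """Turn <property name="..." type="...">value</property> elements into a dict."""
    root = ET.fromstring(xml_text)
    result = {}
    for prop in root.iter("property"):
        type_name = prop.get("type", "str")  # untyped values default to strings
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"type {type_name!r} is not approved")
        result[prop.get("name")] = ALLOWED_TYPES[type_name](prop.text or "")
    return result

print(load_properties(
    '<object><property name="title">Hello</property>'
    '<property name="count" type="int">3</property></object>'
))
```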
Perhaps the goal is to let people browse and edit Zope object data with
XML (e.g. FTP in with Emacs, browse the XML, and do search and replace.)
Having everything look like ... might not be so useful. Instead,
having a well-formed schema and XML data like ... might be more useful.
- ChrisM (9/2/2000)
It might be better to think of this as "XML import", where the XML to be imported is of an arbitrary nature. The goal is to allow site managers to easily import a well-understood XML file from disk and turn it into Zope objects without needing to write their own "loader" routines. During the import, the intermediate format will be ppml, but they should never need (or want) to edit the generated ppml; it just happens to be a convenient existing facility for representing Zope objects. ppml is not really all that important here: we could use a different intermediate format (but it's unclear why we'd want to do the work to replace it). Additionally, round-tripping is not a goal. After the XML becomes objects in Zope, there's no immediate need to turn the objects back into their original XML or anything close to it.
An example: I have an XML file that has the following schema:
<title>System of a Down</title>
<artist>System of a Down</artist>
<title>Wisconsin Death Trip</title>
I'd like to be able to signify in a transform that each <album> in this file should represent an instance of an existing Zope class "Album", inside which the "title", "artist", and "year" should be attributes of specific types (strings, in this case). The product should allow me to specify this transform easily (otherwise we're just pushing the complexity of writing an event-driven XML loader into writing the transform, and gaining nothing).
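A minimal sketch of what such an easily specified transform could look like, written directly in Python against `xml.etree.ElementTree` rather than generated as a stylesheet. The `Album` class here is a plain-Python stand-in for the existing Zope class, and the `TRANSFORM` table format and `load_objects` are a hypothetical design, not part of any existing product; a real Zope import would also store the objects in a folder rather than returning a list.

```python
import xml.etree.ElementTree as ET

class Album:
    """Plain-Python stand-in for the existing Zope class "Album"."""
    def __init__(self):
        self.title = None
        self.artist = None
        self.year = None

# The "transform": which element becomes which class, and which child
# elements become attributes of which type. Only this table changes per format.
TRANSFORM = {
    "album": (Album, {"title": str, "artist": str, "year": str}),
}

def load_objects(xml_text, transform):
    """Build one object per mapped element; unmapped elements are ignored."""
    objects = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag not in transform:
            continue
        cls, fields = transform[elem.tag]
        obj = cls()
        for field, convert in fields.items():
            child = elem.find(field)
            if child is not None:  # missing fields keep their default
                setattr(obj, field, convert(child.text or ""))
        objects.append(obj)
    return objects

albums = load_objects(
    "<albums><album><title>System of a Down</title>"
    "<artist>System of a Down</artist></album></albums>",
    TRANSFORM,
)
print(albums[0].title, albums[0].year)  # year is None: the sample has no <year>
```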
A key thing to understand is that we will, by definition, have no control over the import format. The transform generator will need to be pretty smart, and it's probably the real heart of this project. I envision some sort of wizard through which a transform method is constructed.
Export to non-ppml XML should be the domain of a separate project.
- ChrisM (9/3/2000)
I've been thinking about this a little more. I may have placed too much emphasis on the transform to ppml in the proposal. There actually isn't any real need to transform to ppml (or any other intermediate XML format) at all. The transform is capable of taking the XML directly to objects. So the real problem defined in this proposal is that "it's hard to write a loader routine to turn arbitrary XML into arbitrary objects". I'd like this process to be simpler. ppml may play a role, or it may not.
- gwillis (2000/09/04)
Those of you who follow the eGroup know that I have been involved with and interested in XML serialization/deserialization of Python/Zope objects. About three weeks ago I developed a "spike" test (see the XP methodology) to gather information and build a proof of concept using XSLT to transform XML into Python code that declares classes. (Sorry, NDA.) I would like to offer the following:
I respectfully submit that the module should be renamed from XMLToObjects to XMLObjects, and that the goal should be both the serialization and deserialization of objects using XML. A serialization/deserialization mechanism has a high degree of reuse in systems such as on-the-fly protocols, distributed objects, RPC, and object persistence. One of the nice features of Java is that this mechanism is built into the language with a nice interface. I worry that implementing only half of the required interface initially will lead to an overall poor interface. As I will state later, I do not see two mechanisms as required for this job, so I see no advantage to half an implementation. In summary, I see no overall goal, strategy, or tactic that benefits from producing half of the capability required for a complete mechanism. Also, it may be desirable to support the Java interface for serialization, since this leads to greater "glueing" capabilities. I see a lot of Java XML servers out there. Being able to plug and play the major processing mechanisms may make customers consider Zope more closely.
I respectfully submit that XSLT is a desirable technology for implementing this mechanism. Since it is a platform-independent technology, the correct architecture can be applied to other platforms, and indeed other languages, simply by varying stylesheets (more on this next). As well, as other XSLT engines are refined, they can be "plugged" into the architecture just like XML parsers, which allows "best of breed" deployment. It again might sway a customer towards the platform, since a large number of the stylesheets written will be portable (if done properly). I also feel it is of great marketing importance to leverage XSLT in every aspect of Zope, including as a DTML alternative. Again, system architects will see the value in standard, plug-and-play mechanisms to deliver modular, platform-independent, best-of-breed architectures.
Don't make the "Cocoon" architectural mistake. For those familiar with it, Cocoon (XML-enabled Apache) was first architected with a single-transform approach, and the flaws in this approach quickly became apparent. Based on this, and on my experience in systems architecture, I recommend that a stylesheet library be developed, with one stylesheet per data format. Each stylesheet will be able to read and write a "neutral" format. Therefore, if I want to go from XML to an XML pickle, I include the two necessary stylesheets in a third "main routine" stylesheet. The main routine stylesheet serves to enforce a standard interface for this mechanism, much like a Java interface. The implementation that transforms into the neutral form and then into the final form is contained in the other stylesheets. This can be accomplished by using template matching and named-template calls. This allows formats to be fanned in and out of the mechanism in the most abstract way.
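The stylesheet-library idea can be sketched with Python standing in for XSLT (the proposal itself uses stylesheets; this is only an illustration of the structure). Each hypothetical format registers one reader into a neutral form and one writer out of it, and a single `convert` "main routine" always goes through the neutral form. The neutral form here, a list of (name, value) pairs, and the two sample formats are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Each format contributes a reader (format -> neutral) and a writer
# (neutral -> format), playing the role of one per-format stylesheet.
READERS = {}
WRITERS = {}

def register(name, reader, writer):
    READERS[name] = reader
    WRITERS[name] = writer

def convert(data, source, target):
    """The "main routine": always source -> neutral -> target."""
    return WRITERS[target](READERS[source](data))

# Hypothetical format 1: a flat XML record such as <record><a>1</a></record>.
def xml_read(text):
    return [(child.tag, child.text or "") for child in ET.fromstring(text)]

def xml_write(pairs):
    root = ET.Element("record")
    for name, value in pairs:
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical format 2: "name=value" lines.
def kv_read(text):
    return [tuple(line.split("=", 1)) for line in text.splitlines() if line]

def kv_write(pairs):
    return "\n".join(f"{name}={value}" for name, value in pairs)

register("xml", xml_read, xml_write)
register("kv", kv_read, kv_write)

print(convert("<record><a>1</a><b>2</b></record>", "xml", "kv"))
```

Adding a third format means writing one reader and one writer, after which it converts to and from every registered format: 2N functions for N formats, rather than a direct converter for every pair of formats.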
This architecture yields an open mechanism. Imagine being able to go from XML, streams, HTTP, EDI, or pickle into objects, pickle, or EDI, or vice versa. Remember that XSLT will accept input from any process that emulates the appropriate XML parser API. It is possible to write a SAX interface for a generic parser and be able to glean input from anywhere! One service, many solutions. For those who have the book "XSLT Programmer's Reference", see pages 39 and 40.
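To illustrate "any process that emulates the appropriate XML parser API", here is a hypothetical CSV reader posing as a SAX parser. Python's standard library has no XSLT processor, so this shows only the event-source half: the events go to a standard SAX handler, here `xml.sax.saxutils.XMLGenerator`, which simply serializes them. An XSLT processor or an object-building handler that consumes SAX events could sit in the same place. The `csv_to_sax` name, the `rows`/`row` element names, and the header-row convention are assumptions.

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

def csv_to_sax(csv_text, handler, row_tag="row"):
    """Emit SAX events for CSV data, as though an XML parser had read it.

    Assumes the first CSV line holds column names that are valid XML names.
    """
    no_attrs = AttributesImpl({})
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    handler.startDocument()
    handler.startElement("rows", no_attrs)
    for values in reader:
        handler.startElement(row_tag, no_attrs)
        for name, value in zip(header, values):
            handler.startElement(name, no_attrs)
            handler.characters(value)
            handler.endElement(name)
        handler.endElement(row_tag)
    handler.endElement("rows")
    handler.endDocument()

out = io.StringIO()
csv_to_sax("a,b\n1,2\n", XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```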
By using two stylesheets for each transform and supporting a neutral form, you can leverage existing stylesheets in the library to transform from an existing supported protocol to a new one for which you write a stylesheet. This is exactly the "Cocoon" XML approach, and it is the design pattern I have used on several projects with great success.
One might see that with this architecture, there is no gain in developing half a solution. This does not mean that ChrisM did not have a good "development pattern", just that in my opinion, this new approach does not benefit from its application.
I appreciate the opportunity given to me to share these thoughts, and would like to thank ChrisM and the rest for their guidance, which has led us to this point.
- Amos (9/7)
I'd like to see some specific use cases before we get any farther. Figuring out what problem this is supposed to solve is really important.
I question whether there is a more generically useful object format for arbitrary XML than DOM. Isn't that kind of what you're asking for?
I think the main value of XML is communication. On this view, being able to round trip XML -> Object -> XML is quite important.
I don't think using XSLT to transform arbitrary XML -> PPML would be very easy. I think the XSLT to make this work would be harder to write than some Python that walked a DOM and built objects that way. Also, PPML is rather low-level and unsafe. In general, I think you want object editing and initializing methods that take XML/DOM arguments, rather than just building an object from scratch using PPML; that is too tricky and fragile.
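Object initializing and editing methods that take DOM arguments, plus a way back out for the XML -> Object -> XML round trip, might be sketched like this with `xml.dom.minidom`. The `Article` class, its fields, and the method names are hypothetical; the point is that the class itself decides how a DOM element maps onto its state, instead of being assembled from a low-level PPML description.

```python
from xml.dom import minidom

class Article:
    """Hypothetical content class whose own methods read and write XML."""
    fields = ("title", "body")

    def __init__(self, title="", body=""):
        self.title = title
        self.body = body

    def update_from_dom(self, element):
        """Edit this object in place from a DOM element; unknown children are ignored."""
        for node in element.childNodes:
            if node.nodeType == node.ELEMENT_NODE and node.tagName in self.fields:
                text = "".join(c.data for c in node.childNodes
                               if c.nodeType == c.TEXT_NODE)
                setattr(self, node.tagName, text)
        return self

    def to_xml(self):
        """Serialize back to XML so that XML -> object -> XML round-trips."""
        doc = minidom.Document()
        root = doc.createElement("article")
        for name in self.fields:
            child = doc.createElement(name)
            child.appendChild(doc.createTextNode(getattr(self, name)))
            root.appendChild(child)
        return root.toxml()

source = "<article><title>Hi</title><body>Text</body></article>"
article = Article().update_from_dom(minidom.parseString(source).documentElement)
print(article.to_xml() == source)  # True
```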
- ChrisM (9/7/2000)
I've discovered that this project's goals are too close to those of the XMLObject project (http://www.zope.org/Wikis/zope-xml/XMLObject) for it to stand alone. I'm going to try to use what has come out of Digital Creations' relationship with FourThought for a customer project: I'm using a pre-release version of something those guys came up with to convert XML to objects, and we'll figure out where to go from there.