Copyright O'Reilly, 2000. All rights reserved.

This is an early draft chapter from a forthcoming book on Zope, to be published by O'Reilly & Associates. The material has not been through O'Reilly's editorial process, nor has it been reviewed for technical accuracy. O'Reilly & Associates disclaims responsibility for any errors in this draft and advises readers to use the information contained herein with caution.

O'Reilly & Associates grants readers the right to read this material and to print copies or make electronic copies for their own use. O'Reilly & Associates does not grant anyone the right to use this material as part of a commercial product or to modify and distribute it. When O'Reilly & Associates publishes the final draft of this book in print form, the content will be made available under an open content license, but this chapter is not open content.

If you have any comments on the material in this chapter, you should send them to the authors, Michel Pelletier and Amos Latteier, at docs@digicool.com.


Application Design and Scalability

Introduction

Scaling is an important part of site development. If you are designing a website whose usage and user base will grow, thinking ahead about scalability could save a lot of trouble in the long run.

What is scale? Scale is how a system responds to growing demands on its (and your) resources. As more resources are consumed, usually by more users, the system's ability to scale up to that larger number of users becomes critical.

When evaluating whether a system can scale to your needs, first consider the risks that scaling presents. Once you understand the risks, you can apply various solutions to mitigate them. This chapter discusses those risks and solutions.

Risks

Introduction

Evaluating a system for scalability involves understanding a number of risks. When your site is busy, how long will requests take? How many requests can you handle before it all breaks down? Can you optimize your site to remove unnecessarily expensive operations?

Here are some of the more common risks associated with scaling large sites.

Volume

Volume is the number of hits your website can service in a given period of time without overloading. Volume is often traded off against latency: the more hits your site handles, the longer latency typically becomes.

Latency

Latency is the average time it takes to complete one request. Zope is a dynamic system, and many circumstances can cause Zope to execute a lot of instructions in the course of a request. These areas are ripe for optimization.

Database "Hot Spots"

If a lot of requests in Zope try to modify the same object at the same time, conflicts will occur. A conflict causes one of the conflicting requests to retry, which can cause unusual latency. In the worst case, if a request retries three times and still fails, the user gets a ConflictError message, which can be confusing and may appear only intermittently.

An object that is getting written to by lots of requests is called a hot spot and can aggravate latency.
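
The retry behavior described above can be modeled in a few lines of Python. This is a toy sketch, not Zope's publisher code: the ConflictError class here is a stand-in, and run_with_retries, MAX_RETRIES, and make_flaky_request are hypothetical names. Only the limit of three retries comes from the text.

```python
class ConflictError(Exception):
    """Stand-in for the error raised when two writes collide (toy model)."""


MAX_RETRIES = 3  # the text says a request is retried three times


def run_with_retries(request, max_retries=MAX_RETRIES):
    """Call request(); on a conflict, retry up to max_retries times.

    Returns (result, attempts). Re-raises ConflictError once the
    retries are used up, which is the error the user would see.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return request(), attempts
        except ConflictError:
            if attempts > max_retries:
                raise


def make_flaky_request(conflicts):
    """Build a request that conflicts a fixed number of times, then succeeds."""
    remaining = [conflicts]

    def request():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConflictError("object changed by another request")
        return "ok"

    return request


result, attempts = run_with_retries(make_flaky_request(2))
print(result, attempts)
```

Every retry repeats the whole request, so a hot spot multiplies the work done per successful write. That is why it shows up as latency.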

Reliability

As your site grows, your need to maintain a 24/7 presence grows too. Scalable solutions are also robust: when one piece fails, some other component should take its place.

Space

When websites get more traffic, they may end up creating more content. As this database of content grows, you will need to contend with the issues of managing such a huge mass of information in scalable ways.

Management

When the amount of content grows, the work of managing that content grows with it. A scalable content management system lets a growing number of content managers work with a growing amount of content.

Solutions

Introduction

The following methods can be used to mitigate scalability risks.

Cataloging

Cataloging allows you to build quickly searchable indexes of thousands of your objects. Because these indexes scale very well, a Catalog is a great way to keep track of and search over thousands of objects.
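
To see why an index makes searching cheap, consider a toy inverted index. This is not the Catalog's actual interface: ToyCatalog and its methods are hypothetical names, used only to show that a search consults a prebuilt mapping instead of visiting every object.

```python
from collections import defaultdict


class ToyCatalog:
    """Toy inverted index: maps each word to the ids of objects containing it."""

    def __init__(self):
        self._index = defaultdict(set)

    def catalog_object(self, object_id, text):
        """Index the words of text under object_id."""
        for word in text.lower().split():
            self._index[word].add(object_id)

    def search(self, query):
        """Return the sorted ids of objects that contain every word in query."""
        words = query.lower().split()
        if not words:
            return []
        matches = set(self._index.get(words[0], set()))
        for word in words[1:]:
            matches &= self._index.get(word, set())
        return sorted(matches)


catalog = ToyCatalog()
catalog.catalog_object("doc1", "Zope scales with ZEO")
catalog.catalog_object("doc2", "Caching reduces latency")
catalog.catalog_object("doc3", "ZEO clients keep a cache")
print(catalog.search("zeo"))
```

The cost of a search depends on the number of matching entries, not on the total number of objects, which is the property that lets an index keep up as content grows.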

Delegation

Delegating management rights to subordinates is a great way to scale content management. As your pool of content managers grows, you can assign those users to a growing collection of content. By designing your delegation scheme well, you can scale content management very efficiently.

Factoring

Factoring content and behavior into reusable components is a great tool for scaling common editing tasks. Imagine having to change a hundred thousand document headers to change the name of your company; it would be much more efficient for all the documents to share one header that you can change once to affect all the documents.
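
The shared header idea can be sketched in plain Python. The Header and Document classes below are hypothetical illustrations, not Zope objects; the point is only that each document refers to one shared component rather than carrying its own copy.

```python
class Header:
    """Shared header component (hypothetical), edited in one place."""

    def __init__(self, company):
        self.company = company

    def render(self):
        return "== %s ==" % self.company


class Document:
    """A document that refers to the shared header instead of copying it."""

    def __init__(self, header, body):
        self.header = header
        self.body = body

    def render(self):
        return self.header.render() + "\n" + self.body


shared_header = Header("Old Name, Inc.")
documents = [Document(shared_header, "Page %d" % n) for n in range(3)]

# One edit to the shared component changes every document that uses it.
shared_header.company = "New Name, Inc."
print(documents[0].render())
```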

Versions

XXX - how versions allow scaling of management tasks

Work Privately

Versions allow you to work privately on objects in Zope. This means that you can change objects in Zope without other users noticing your changes.

Entering/Leaving a Version

To work on a version, you must first enter it so that you can see the private changes made in that version.

Example

Create a version

Enter it

Make changes

Leave it, view site. No change.

Enter it, Save it.

View site, content changed.

Undo it?

When you are done working with a Version, you must leave it in order to see the public representation of objects instead of the private one.

Save all at once

When you are done making all of your changes in a version, you can save those changes to make them public.

When saving a version, you can also type in a message that gets logged with the version. This message, called a log message, is useful for keeping notes on the changes you made in the version:

          <screenshot>

Long Running Transaction

XXX Should this analogy be used? XXX

Versions Undoable

Saved versions can be undone like other transactions. When a Version is undone, all of the changes that were made in the version and saved will be reverted.

Locked Objects

When an object is changed in a version, it becomes locked and cannot be changed by anyone else:

          <screenshot>

Here, the little red lock means that index_html is being modified in another version and is now locked. If you try to change a locked object, Zope will raise a LockedInVersion error:

          <screenshot?>

Versions Management

There is a special area of the control panel for managing all of the versions that have made changes to the object database:

          <screenshot>

From this screen, you can manage these Versions:

          <screenshot>

          PS, if there are no versions with changes, then this says "There
          are no non-empty versions" which is a borderline double negative,
          it at least goes against the spirit of double negativity. XXX

Here you can click on the names of the current versions to go directly to them. You can also check any of them and commit them with Save or discard them with Discard.

Caching

Caching is by far the number one solution to almost all problems of latency. Zope uses caching at many levels. For optimal performance, Zope caches:

Database Caching

Objects in memory

Zope objects that come from ZODB are cached in memory until they are unused for a period of time.
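
The idea of keeping objects in memory until they go unused for a while can be sketched as follows. This is a toy model, not ZODB's cache implementation: ToyObjectCache, its sweep method, and the 60 second idle limit are all assumptions made for illustration.

```python
import time


class ToyObjectCache:
    """Toy cache that drops entries not used within max_idle seconds."""

    def __init__(self, load, max_idle, clock=time.monotonic):
        self._load = load          # function that fetches an object by id
        self._max_idle = max_idle
        self._clock = clock
        self._entries = {}         # object id -> (object, time of last use)

    def get(self, object_id):
        """Return the object, loading it only when it is not cached."""
        now = self._clock()
        if object_id in self._entries:
            obj, _ = self._entries[object_id]
        else:
            obj = self._load(object_id)
        self._entries[object_id] = (obj, now)
        return obj

    def sweep(self):
        """Evict every entry that has been idle longer than max_idle."""
        now = self._clock()
        stale = [oid for oid, (_, used) in self._entries.items()
                 if now - used > self._max_idle]
        for oid in stale:
            del self._entries[oid]
        return stale


loads = []
fake_now = [0.0]  # a fake clock keeps the example deterministic


def load_from_database(object_id):
    loads.append(object_id)
    return {"id": object_id}


cache = ToyObjectCache(load_from_database, max_idle=60, clock=lambda: fake_now[0])
cache.get("index_html")
cache.get("index_html")      # served from memory, no second load
fake_now[0] = 120.0          # two minutes pass with no use
evicted = cache.sweep()
print(loads, evicted)
```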

ZEO Client cache

ZEO ClientStorages cache objects in a local disk as well as in memory.

DTML Caching

Name Caching

Using a method by name caches the value of the result.

Web Caching?

Simple HTTP Caching

By using an HTTP accelerator such as Squid, Zope output can be highly optimized for special cases and for fairly static or seldom-changing data.
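
An HTTP cache can only hold on to a page if the response says how long the page stays fresh. The sketch below builds the two standard headers for that purpose. The cache_headers function is a hypothetical helper; in Zope such headers would typically be set on the response object, for example through RESPONSE.setHeader, and that wiring is left out so the example runs on its own.

```python
from email.utils import formatdate


def cache_headers(max_age, now):
    """Build response headers that let a cache keep a page for max_age seconds.

    now is the current time in seconds since the epoch.
    """
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "Expires": formatdate(now + max_age, usegmt=True),
    }


# A fixed "now" of 0 keeps the printed dates reproducible.
headers = cache_headers(max_age=3600, now=0)
for name, value in sorted(headers.items()):
    print("%s: %s" % (name, value))
```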

Conflict Resolution

Objects that are written to heavily by lots of requests can try to resolve conflicting writes themselves. ZODB defines a conflict resolution protocol that allows an object to try to merge two different writes.

Conflict resolution is complex and is not discussed in detail in this book, but it is mentioned here as a method of scaling lots of writes.
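
As a small taste of the idea, here is a sketch of a counter that merges concurrent increments. ZODB's conflict resolution hook is a method named _p_resolveConflict that receives three state dictionaries. HitCounter itself is a hypothetical class, written as a plain Python class so that the example runs without ZODB; in a real application it would be a persistent object.

```python
class HitCounter:
    """Counter whose concurrent increments can be merged.

    In a real application this class would subclass Persistent; it is
    a plain class here so the sketch runs without ZODB installed.
    """

    def __init__(self):
        self.count = 0

    def hit(self, amount=1):
        self.count += amount

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        """Merge two conflicting writes.

        old_state:   state both writers started from
        saved_state: state already committed by the other writer
        new_state:   state this writer is trying to commit
        All three are state dictionaries, not objects.
        """
        resolved = dict(new_state)
        # Apply both writers' increments on top of the common starting point.
        resolved["count"] = (saved_state["count"] + new_state["count"]
                             - old_state["count"])
        return resolved


counter = HitCounter()
merged = counter._p_resolveConflict({"count": 10}, {"count": 12}, {"count": 11})
print(merged)
```

Starting from 10, one writer added 2 and the other added 1, so the merged state counts all three hits and neither request has to retry.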

Relational Databases

Relational databases can scale very well for certain models of data. You may find that your information works much better in a relational database than as persistent objects. If your data is very homogeneous, you may get scalability benefits from a big commercial RDBMS.

Profiling and Optimizing

A profile is a statistical report on how much time a program spends in particular routines. Zope has a built-in web interface to the standard Python profiler:

        <screenshot of profiler>

The profiler is useful for debugging expensive or long latency operations. The report is ordered from most expensive to least expensive operation.

The profiler slows Zope down a lot, because profiling requires considerable overhead and bookkeeping.
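
The same kind of report can be produced outside Zope with the profiler in the Python standard library. The sketch below uses the cProfile and pstats modules that ship with current Python releases; the three functions being profiled are made up for the example and merely stand in for the work done during a request.

```python
import cProfile
import io
import pstats


def expensive():
    return sum(i * i for i in range(50000))


def cheap():
    return 1 + 1


def handle_request():
    cheap()
    return expensive()


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Reading the report from the top down points you at the routines worth optimizing first.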

ZEO

ZEO is the client/server storage component of ZODB. Zope uses a client/server storage architecture. In the regular distribution of Zope, ZODB manages Storages. Storages plug into ZODB via a simple but undocumented interface. The default Storage that comes with Zope is FileStorage, which stores information on the filesystem. There are other storages, such as DemoStorage, ReadOnlyStorage, and BerkeleyStorage (written by Ty Sarna).

ZEO is a Storage that plugs right into this, except that instead of storing the information in, say, a file, it talks via TCP/IP to a remote component that takes care of the actual storage. Zope itself has no idea that this is going on; it all happens below the Zope application level. The Storage component that plugs into Zope is called the ClientStorage. The server component is the Storage Server.

Both the ClientStorage and the Storage Server are written using Sam Rushing's asyncore, the technology behind ZServer (the same select()-based design used by servers like Zeus and Squid).

Multiple ClientStorages can connect to one Storage Server. ClientStorages maintain a local disk and memory cache of objects, so an object is normally fetched only once for as long as it does not change. If a ClientStorage writes an object, a cache invalidation protocol makes sure that all clients are up to date.

As an added bonus, the Storage Server itself has a Storage backend. By default, the Storage Server uses FileStorage, but nothing says that the Storage Server could not use a ClientStorage to connect to another Storage Server. This allows you to distribute your entire object database over an n-deep hierarchy of machines like this:
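
The interplay between client caches and invalidation can be modeled with two small classes. This is a toy sketch, not the ZEO protocol: ToyStorageServer and ToyClientStorage are hypothetical names, everything runs in one process, and the network is replaced by direct method calls.

```python
class ToyStorageServer:
    """Toy stand-in for a Storage Server holding the authoritative data."""

    def __init__(self):
        self._data = {}
        self._clients = []
        self.loads = 0  # number of fetches served to clients

    def register(self, client):
        self._clients.append(client)

    def load(self, oid):
        self.loads += 1
        return self._data[oid]

    def store(self, oid, value, writer):
        """Save a value and tell every other client to drop its cached copy."""
        self._data[oid] = value
        for client in self._clients:
            if client is not writer:
                client.invalidate(oid)


class ToyClientStorage:
    """Toy stand-in for a ClientStorage with a local cache."""

    def __init__(self, server):
        self._server = server
        self._cache = {}
        server.register(self)

    def load(self, oid):
        if oid not in self._cache:
            self._cache[oid] = self._server.load(oid)
        return self._cache[oid]

    def store(self, oid, value):
        self._cache[oid] = value
        self._server.store(oid, value, writer=self)

    def invalidate(self, oid):
        self._cache.pop(oid, None)


server = ToyStorageServer()
client_a = ToyClientStorage(server)
client_b = ToyClientStorage(server)

client_a.store("front_page", "version 1")
client_b.load("front_page")
client_b.load("front_page")          # second read comes from client_b's cache
client_a.store("front_page", "version 2")
print(client_b.load("front_page"), server.loads)
```

The server is contacted once per object per change rather than once per read, which is what lets many clients share one Storage Server.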

        Zope1->SClient1
              \
               SServerA->(SClientA)
                /             \
        Zope2->SClient2         \
                                  \
                         MasterSServer->(FileStorage)
                               /           \
        Zope3->SClient3      /          (SClientFoo)<-ZopeFoo
                \          /
               SServerB->(SClientB)
                /
        Zope4->SClient4

Note that every step along the way maintains its own object cache. Of course, the MasterSServer is still a single point of failure, but if you have enough dough to throw at eight machines and ZEO, you have enough dough to throw at a good hardware RAID or a nice distributed or journaling filesystem. Further, there is nothing in the ZEO model that prevents us from introducing failover logic and replication of writes to more than one server.

ZEO lets you debug one client (a client in this context being a Zope instance) while others are still answering requests. With stock Zope, you cannot do that.

ZEO really lets you break Zope up from a multi-threaded model to a multi-process model, so the Python interpreter lock is no longer a scalability issue. This is good if you have an N-processor machine: you can run N Zope ZEO clients and not have lock contention (and, of course, each client is still multi-threaded, so you have a tiered thread/process/machine model).

Storages

Storages have many different scalability properties. Often Storages are based on a particular backend storage mechanism.

FileStorage

The storage mechanism of a FileStorage is a file on the filesystem. FileStorages are robust and very useful, but not as scalable as other storages. FileStorages are write sensitive, and high-write conditions can cause a lot of conflicts. Also, FileStorages work by always appending new objects at the end of a file without removing the old state of the object. While this gives FileStorages the benefit of Undo, it means that in high-write situations the file grows very quickly and must be packed often.
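
The tradeoff between undo and file growth follows directly from the append-only design, as a toy model shows. ToyAppendOnlyStorage is a hypothetical class that keeps its records in a list; it does not reflect FileStorage's real file format, and its undo is a deliberate simplification.

```python
class ToyAppendOnlyStorage:
    """Toy model of an append-only store; not FileStorage's real file format."""

    def __init__(self):
        self._log = []  # list of (oid, state) records, oldest first

    def store(self, oid, state):
        """Append a new record; earlier records for oid are left in place."""
        self._log.append((oid, state))

    def load(self, oid):
        """Return the most recent state recorded for oid."""
        for record_oid, state in reversed(self._log):
            if record_oid == oid:
                return state
        raise KeyError(oid)

    def undo(self, oid):
        """Undo the latest change to oid by appending its previous state."""
        states = [state for record_oid, state in self._log if record_oid == oid]
        if len(states) < 2:
            raise ValueError("no earlier state to return to")
        self.store(oid, states[-2])

    def pack(self):
        """Keep only the newest record per object, discarding history."""
        newest = {}
        for oid, state in self._log:
            newest[oid] = state
        self._log = list(newest.items())

    def size(self):
        return len(self._log)


storage = ToyAppendOnlyStorage()
for n in range(1, 6):
    storage.store("counter", n)      # five writes, five records
storage.undo("counter")              # appends the earlier state, 4
print(storage.load("counter"), storage.size())
storage.pack()
print(storage.load("counter"), storage.size())
```

Six records shrink to one after packing, and the history that made undo possible is gone with them.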

ReadOnlyStorage

ReadOnlyStorage is, as its name implies, read-only. This is good for storages that are distributed on read-only media, like CD-ROMs.

DemoStorage

The DemoStorage reads from a read-only storage. Any changes made to the storage are just kept in memory and never saved to the underlying storage. This is good for demo ware that you want to be operational but not savable. When a Zope that uses a DemoStorage is restarted, the state of all the objects will be reverted.
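
The behavior can be pictured as an in-memory overlay on top of a base that is never written to. ToyDemoStorage below is a hypothetical class built on plain dictionaries, a sketch of the idea rather than the DemoStorage implementation.

```python
class ToyDemoStorage:
    """Toy overlay: reads fall through to a base mapping, writes stay in memory."""

    def __init__(self, base):
        self._base = base      # treated as read-only; never modified here
        self._changes = {}     # lost when the process exits

    def load(self, oid):
        if oid in self._changes:
            return self._changes[oid]
        return self._base[oid]

    def store(self, oid, state):
        self._changes[oid] = state


base = {"index_html": "original page"}
demo = ToyDemoStorage(base)
demo.store("index_html", "edited page")
print(demo.load("index_html"))

# A "restart" builds a fresh overlay on the same base, so edits are gone.
demo = ToyDemoStorage(base)
print(demo.load("index_html"))
```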

BerkeleyStorage

BerkeleyStorage is a storage based on Sleepycat Software's Berkeley database system. This storage does not support undo, but it is more scalable and less sensitive to writes than FileStorage.

ZEO ClientStorage

The ClientStorage is the front end component to the ZEO system. It ties a remote storage to a local Zope instance. ClientStorage scalability depends on the speed and robustness of the network between the ClientStorage and the Storage Server.

???

