Copyright O'Reilly, 2000. All rights reserved.
This is an early draft chapter from a forthcoming book on Zope, to be
published by O'Reilly & Associates. The material has not been through
O'Reilly's editorial process, nor has it been reviewed for technical
accuracy. O'Reilly & Associates disclaims responsibility for any
errors in this draft and advises readers to use the information
contained herein with caution.
O'Reilly & Associates grants readers the right to read this material
and to print copies or make electronic copies for their own
use. O'Reilly & Associates does not grant anyone the right to use this
material as part of a commercial product or to modify and distribute
it. When O'Reilly & Associates publishes the final draft of this book
in print form, the content will be made available under an open
content license, but this chapter is not open content.
If you have any comments on the material in this chapter, you should
send them to the authors, Michel Pelletier and Amos Latteier, at
docs@digicool.com.
Application Design and Scalability
Introduction
Scaling is an important part of site development. If you are designing
a website that will grow in its usage and users, then thinking ahead
about scalability issues could save a lot of trouble in the long run.
What is scale? Scale is how a system responds to growing demands
on the system's (and your) resources. As more resources are
consumed (usually by more users), the system's ability to scale up
to the larger number of users becomes critical.
When evaluating whether a system can scale to your needs, first
consider the risks that scaling presents to your system. Once you
understand the risks, you can choose among various solutions to
mitigate them. This chapter discusses those risks and solutions.
Risks
Introduction
Considering a system for scalability involves understanding a number
of risks. When your site is busy, how long will requests take? How
many requests can you handle before it all breaks down? Can you
optimize your site to remove unnecessarily expensive operations?
Here are some of the more common risks associated with scaling large
sites.
Volume
Volume is the number of hits your website can service in a certain
period of time without overloading. Volume often trades off against
latency: the more hits your site handles, the longer latency
typically becomes.
Latency
Latency is the average time it takes to complete one request.
Zope is a dynamic system, and lots of circumstances can cause
Zope to execute a lot of instructions in the course of a
request. These areas are ripe for optimization.
Database "Hot Spots"
If a lot of requests in Zope are trying to modify the same object at
the same time, then conflicts will occur. A conflict causes one
of the conflicting requests to retry itself, which can cause unusual
latency. In the worst case, if a request retries three times and
fails, the user gets a ConflictError message, which can be confusing
and possibly intermittent.
An object that is written to by lots of requests is called
a hot spot and can aggravate latency.
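The retry behavior described above can be sketched in plain Python.
This is only an illustration, not Zope's own request handling code:
the ConflictError class here is a local stand-in, and
run_with_retries, max_retries and make_flaky_write are hypothetical
names invented for this example.

```python
class ConflictError(Exception):
    """Local stand-in for the error raised when two writes collide."""


def run_with_retries(request, max_retries=3):
    """Call request(); on a conflict, retry up to max_retries times.

    Returns (result, attempts). Re-raises ConflictError if the first
    attempt and every retry all conflict.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return request(), attempts
        except ConflictError:
            if attempts > max_retries:
                raise


def make_flaky_write(conflicts_before_success):
    """Build a fake request that conflicts a fixed number of times."""
    state = {"remaining": conflicts_before_success}

    def flaky_write():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConflictError("another request changed the object first")
        return "saved"

    return flaky_write
```

Every retry repeats the whole request, so a request that conflicts
twice costs roughly three times the work of one that does not. That
repeated work is the extra latency a hot spot causes.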
Reliability
As your site grows, your need to maintain a 24/7 presence grows too.
Scalable solutions are also robust: when one piece fails, some other
component should replace it.
Space
When websites get more traffic, they may end up creating more
content. As this database of content grows, you will need to contend
with the issues of managing such a huge mass of information in
scalable ways.
Management
When the amount of content grows, the management of content grows with
it. A scalable content management system lets a growing number of
content managers work with a growing amount of content.
Solutions
Introduction
The following methods can be used to mitigate scalability risks.
Cataloging
Cataloging allows you to build quickly searchable indexes of
thousands of your objects. Because these indexes scale very well, a
Catalog is a great way to keep track of and search over thousands of
objects.
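To see why an index scales better than looking at every object, here
is a minimal sketch of the idea in plain Python. It is not the Zope
Catalog API. TinyCatalog, catalog_object and search are hypothetical
names for this example, and objects are assumed to be dictionaries
with hashable values.

```python
from collections import defaultdict


class TinyCatalog:
    """Toy field index: maps (field, value) to the ids of matching objects."""

    def __init__(self):
        self._index = defaultdict(set)
        self._objects = {}

    def catalog_object(self, obj_id, obj):
        """Store the object and index each of its fields."""
        self._objects[obj_id] = obj
        for field, value in obj.items():
            self._index[(field, value)].add(obj_id)

    def search(self, **query):
        """Return sorted ids of objects matching every field=value pair."""
        id_sets = [self._index.get(pair, set()) for pair in query.items()]
        if not id_sets:
            return sorted(self._objects)
        return sorted(set.intersection(*id_sets))
```

A search only touches the index entries named in the query, so its
cost depends on the number of matches rather than on the total number
of cataloged objects.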
Delegation
Delegating management rights to subordinates is a great way to handle
scaling content management. As your pool of content managing users
grows, you can assign those users to manage a growing collection
of content. By designing your delegation scheme well, you can scale
managing content very efficiently.
Factoring
Factoring content and behavior into reusable components is a great
tool for scaling common editing tasks. Imagine having to change a
hundred thousand document headers to change the name of your company;
it would be much more efficient for all the documents to share one
header that you can change once and affect all the documents.
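The shared header idea can be sketched in a few lines of Python. This
is an illustration only, not how Zope stores or renders documents;
the Header and Document classes are hypothetical.

```python
class Header:
    """One header object shared by many documents."""

    def __init__(self, company):
        self.company = company

    def render(self):
        return "== " + self.company + " =="


class Document:
    """A document that refers to a shared header instead of copying it."""

    def __init__(self, header, body):
        self.header = header  # shared reference, not a copy
        self.body = body

    def render(self):
        return self.header.render() + "\n" + self.body
```

Changing header.company once changes the output of every Document
that refers to that header, no matter how many documents there are.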
Versions
XXX - how versions allow scaling of management tasks
Work Privately
Versions allow you to work privately on objects in Zope. This means
that you can change objects in Zope without other users noticing your
changes.
Entering/Leaving a Version
To work on a version, you must first enter it so that you can see
the private changes made in that version.
Example
Create a version
Enter it
Make changes
Leave it, view site. No change.
Enter it, Save it.
View site, content changed.
Undo it?
When done working with a Version, you must leave it in order to see
the public representation of objects instead of the private.
Save all at once
When you are done making all of your changes in a version, you can
save those changes to make them public.
When saving a version, you can also type in a message that gets
logged with the version. This is useful for keeping notes on the
changes you made in the version. This note is called a log
message:
<screenshot>
Long Running Transaction
XXX Should this analogy be used? XXX
Versions Undoable
Saved versions can be undone like other transactions. When a
Version is undone, all of the changes made in the version and
saved will be reverted.
Locked Objects
When an object is changed in a version, it becomes locked and
cannot be changed by anyone else:
<screenshot>
Here, the little red lock means that index_html is being modified
in another version and is now locked. If you try to change a locked
object, Zope will raise a LockedInVersion error:
<screenshot?>
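The private changes, save, and locking behavior described in this
section can be modeled with a small Python sketch. This is a toy
model under stated assumptions, not Zope's implementation: Site,
Version and LockedInVersionError are hypothetical classes, and the
exception is only a stand-in for the error Zope raises.

```python
class LockedInVersionError(Exception):
    """Stand-in for the error raised when an object is locked elsewhere."""


class Site:
    """Holds the public objects and records which version locked what."""

    def __init__(self):
        self.public = {}
        self.locks = {}  # object name -> name of the version holding the lock


class Version:
    """Private overlay of changes on top of a site's public objects."""

    def __init__(self, site, name):
        self.site = site
        self.name = name
        self.changes = {}

    def set(self, key, value):
        """Change an object privately and lock it against other versions."""
        owner = self.site.locks.get(key)
        if owner is not None and owner != self.name:
            raise LockedInVersionError(key + " is locked by " + owner)
        self.site.locks[key] = self.name
        self.changes[key] = value

    def get(self, key):
        """See private changes first, then fall back to the public object."""
        if key in self.changes:
            return self.changes[key]
        return self.site.public[key]

    def save(self):
        """Make all private changes public at once, then release locks."""
        self.site.public.update(self.changes)
        self._release()

    def discard(self):
        """Throw the private changes away and release locks."""
        self._release()

    def _release(self):
        for key in self.changes:
            del self.site.locks[key]
        self.changes = {}
```

Until save is called, site.public is untouched, which is why other
users do not notice your changes.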
Versions Management
There is a special area of the control panel for managing all of the
versions that have made changes to the object database:
<screenshot>
From this screen, you can manage these Versions:
<screenshot>
PS, if there are no versions with changes, then this says "There
are no non-empty versions" which is a borderline double negative,
it at least goes against the spirit of double negativity. XXX
Here you can click on the names of the current versions and go
directly to them. You can also check any of them and commit them
with Save or discard them with Discard.
Caching
Caching is by far the number one solution to almost all problems
of latency. Zope uses caching at many levels. For optimal
performance, Zope caches:
Database Caching
- Objects in memory
Zope objects that come from ZODB are
cached in memory until they are unused for a period of time.
- ZEO Client cache
ZEO ClientStorages cache objects in a
local disk as well as in memory.
DTML Caching
- Name Caching
Using a method by name caches the value
of the result.
Web Caching?
Simple HTTP Caching
By using an HTTP accelerator, such as Squid, Zope
output can be highly optimized for special cases and for
fairly static or seldom changing data.
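To make the idea of keeping objects in memory "until they are unused
for a period of time" concrete, here is a minimal sketch of an idle
timeout cache. It is not ZODB's cache. IdleCache, loader, max_idle
and sweep are hypothetical names, and the clock is injectable only so
the example can be tested without waiting.

```python
import time


class IdleCache:
    """Cache that forgets entries not used within max_idle seconds."""

    def __init__(self, loader, max_idle, clock=time.monotonic):
        self._loader = loader      # called to fetch a value on a cache miss
        self._max_idle = max_idle
        self._clock = clock
        self._entries = {}         # key -> (value, time of last use)
        self.loads = 0             # how many times the loader was called

    def get(self, key):
        """Return the cached value, loading it first if necessary."""
        entry = self._entries.get(key)
        if entry is None:
            value = self._loader(key)
            self.loads += 1
        else:
            value = entry[0]
        self._entries[key] = (value, self._clock())
        return value

    def sweep(self):
        """Drop entries idle for longer than max_idle; return how many."""
        now = self._clock()
        stale = [key for key, (_, used) in self._entries.items()
                 if now - used > self._max_idle]
        for key in stale:
            del self._entries[key]
        return len(stale)
```

Frequently used objects stay in memory and cost nothing to fetch
again, while rarely used ones are eventually dropped to bound memory
use.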
Conflict Resolution
Objects that are written to heavily by lots of requests can try to
resolve conflicting writes themselves. ZODB defines a conflict
resolution protocol that allows an object to try and merge two
different writes together.
Conflict resolution is complex and is not discussed in detail in this
book, but it is mentioned here as a method of scaling lots of writes.
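For a taste of what this looks like, here is a minimal sketch of a
counter that merges concurrent increments. ZODB's protocol calls a
method named _p_resolveConflict with three states: the state both
writers started from, the state already saved, and the state this
writer wants to save. The HitCounter class is hypothetical, and it is
written as a plain Python class so the example runs without ZODB; in
a real application it would be a persistent object, and the states
are assumed here to be attribute dictionaries.

```python
class HitCounter:
    """Counter whose conflicting increments can be merged."""

    def __init__(self):
        self.count = 0

    def increment(self, n=1):
        self.count += n

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        """Merge two writes that both started from old_state.

        Both writers only added to the count, so the merged count is
        the saved count plus whatever this writer added.
        """
        resolved = dict(new_state)
        added_here = new_state["count"] - old_state["count"]
        resolved["count"] = saved_state["count"] + added_here
        return resolved
```

If both writers started from a count of 10, one saved 12 and the
other wants to save 11, the merged state has a count of 13 and
neither request needs to be retried.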
Relational Databases
Relational databases can scale very well for certain models of data.
You may find that your information works much better in a relational
database than as a persistent object. If your data is very
homogeneous then you may get scalability benefits from big commercial
RDBMSs.
Profiling and Optimizing
A profile is a statistical report on how much time a program is
spending in certain routines in the code. Zope has a built in
web interface to the standard Python profiler:
<screenshot of profiler>
The profiler is useful for debugging expensive or long latency
operations. The report is ordered from most expensive to least
expensive operation.
The profiler slows Zope down a lot because profiling requires a
lot of overhead and bookkeeping.
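Outside of Zope's web interface, you can get the same kind of report
by using the standard Python profiler directly. The sketch below uses
the cProfile and pstats modules from the standard library;
slow_render and profile_call are hypothetical names, and slow_render
is just a stand-in for an expensive operation.

```python
import cProfile
import io
import pstats


def slow_render(n):
    """Deliberately wasteful stand-in for an expensive page render."""
    return sum(len(str(i)) for i in range(n))


def profile_call(func, *args):
    """Run func under the profiler; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Most expensive entries first, limited to the top ten.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```

Calling profile_call(slow_render, 100000) returns the function's
result along with a report whose first rows are the routines that
took the most cumulative time.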
ZEO
ZEO is the client/server storage component of ZODB. Zope uses a
client/server storage architecture. In the regular distribution
of Zope, ZODB manages Storages. Storages plug into ZODB via a
simple but undocumented interface. The default Storage that
comes with Zope is FileStorage, which stores information on the
filesystem. There are other storages such as DemoStorage,
ReadOnlyStorage, and BerkeleyStorage (written by Ty Sarna).
ZEO is a Storage that plugs right into this, except that instead
of storing the information in, say, a file, it talks via TCP/IP
to a remote component that takes care of the actual storage.
Zope itself does not have any idea that this is going on; it's
all below the Zope application level. The Storage component
that plugs into Zope is called the ClientStorage. The server
component is the Storage Server.
Both the ClientStorage and the Storage Server are written using
Sam Rushing's asyncore, the technology behind ZServer (and other
select() based servers like Zeus and Squid).
Multiple ClientStorages can connect to one Storage Server.
ClientStorages maintain a local disk and memory cache of
objects, so an object is really only ever fetched once. If a
ClientStorage writes an object, a cache invalidation protocol
makes sure that all clients are up to date.
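The caching and invalidation behavior can be modeled with a toy
in-process sketch. This is not ZEO's API or wire protocol:
ToyStorageServer and ToyClientStorage are hypothetical classes that
only mirror the roles described above, with method calls standing in
for network messages.

```python
class ToyStorageServer:
    """Holds the real data and tells clients when their copies go stale."""

    def __init__(self):
        self._data = {}
        self._clients = []
        self.fetches = 0  # how many loads actually reached the server

    def register(self, client):
        self._clients.append(client)

    def load(self, oid):
        self.fetches += 1
        return self._data[oid]

    def store(self, oid, value, writer):
        self._data[oid] = value
        # Invalidate the object in every other client's cache.
        for client in self._clients:
            if client is not writer:
                client.invalidate(oid)


class ToyClientStorage:
    """Caches objects locally and only asks the server on a cache miss."""

    def __init__(self, server):
        self._server = server
        self._cache = {}
        server.register(self)

    def load(self, oid):
        if oid not in self._cache:
            self._cache[oid] = self._server.load(oid)
        return self._cache[oid]

    def store(self, oid, value):
        self._cache[oid] = value
        self._server.store(oid, value, writer=self)

    def invalidate(self, oid):
        self._cache.pop(oid, None)
```

A client pays for a fetch only the first time it reads an object and
again after some other client has written it. Reads of unchanged
objects are served entirely from the local cache.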
As an added bonus, the Storage Server itself has a Storage
backend. By default, the Storage Server uses FileStorage, but
nothing prevents the Storage Server from using a
ClientStorage to connect to another Storage Server. This allows
you to pretty much distribute your entire object database over an
n-deep hierarchy of machines like this:
    Zope1->SClient1
                   \
                    SServerA->(SClientA)
                   /                    \
    Zope2->SClient2                      \
                                          \
                                           MasterSServer->(FileStorage)
                                          /   \
    Zope3->SClient3                      /     (SClientFoo)<-ZopeFoo
                   \                    /
                    SServerB->(SClientB)
                   /
    Zope4->SClient4
Note that every step along the way maintains its own object
cache. Of course, there is still a sort of single point of
failure, the MasterSServer, but if you have enough dough
to throw at eight machines and ZEO, you have enough dough to
throw at a good hardware RAID or a nice distributed or
journaling filesystem. Further, there is nothing in the ZEO
model that prevents us from introducing failover logic and
replication of writes to more than one server.
ZEO lets you debug one client (a client in this context being
a Zope instance) while others are still answering requests.
With stock Zope, you cannot do that.
ZEO really lets you break Zope up from a multi-threaded model
to a multi-process model, so the Python interpreter lock is no
longer a scalability issue. This is good if you have an
N-processor machine: you can run N Zope ZEO clients and not
have lock contention (and, of course, each client is still
multi-threaded, so you have a tiered thread/process/machine
model).
Storages
Storages have many different scalability properties. Often
Storages are based on a particular backend storage mechanism.
- FileStorage
The storage mechanism of a FileStorage is a
file on the filesystem. FileStorages are robust and very
useful, but not as scalable as other storages. FileStorages
are write sensitive, and high-write conditions can cause a lot
of conflicts. Also, FileStorages work by always appending new
objects at the end of a file without removing the old state of
the object. While this gives FileStorages the benefit of
Undo, it means that for high-write situations the file grows
very quickly and must be packed often.
- ReadOnlyStorage
ReadOnlyStorage is, as its name implies,
read-only. This is good for storages that are distributed on
read-only media, like CD-ROMs.
- DemoStorage
The DemoStorage reads from a read-only storage. Any
changes made to the storage are actually just kept in memory and
never saved to the storage. This is good for demo ware that you want
to be operational but not savable. When a Zope that uses a
DemoStorage is restarted, the state of all the objects will be
reverted.
- BerkeleyStorage
BerkeleyStorage is a storage based on Sleepycat
Software's Berkeley database system. This storage does not support
undo but is more scalable and less sensitive to writes than
FileStorage.
- ZEO ClientStorage
The ClientStorage is the front end component
to the ZEO system. It ties a remote storage to a local Zope
instance. ClientStorage scalability depends on the speed and
robustness of the network between the ClientStorage and the Storage
Server.
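The trade-off described for FileStorage, where appending gives you
Undo but makes the file grow until it is packed, can be shown with a
small sketch. This is a toy model, not FileStorage's file format or
its real undo mechanism: AppendOnlyStore and its methods are
hypothetical, and undo is simplified to dropping the newest record.

```python
class AppendOnlyStore:
    """Toy store that appends a new record for every write."""

    def __init__(self):
        self._log = []  # (oid, state) records, oldest first

    def store(self, oid, state):
        """Append the new state; older states stay in the log."""
        self._log.append((oid, state))

    def load(self, oid):
        """Return the most recently appended state for oid."""
        for rec_oid, state in reversed(self._log):
            if rec_oid == oid:
                return state
        raise KeyError(oid)

    def undo_last(self):
        """Drop the newest record, exposing the previous state again."""
        return self._log.pop()

    def pack(self):
        """Keep only the newest record per object; return records removed."""
        latest = {}
        for index, (oid, _) in enumerate(self._log):
            latest[oid] = index
        keep = sorted(latest.values())
        removed = len(self._log) - len(keep)
        self._log = [self._log[i] for i in keep]
        return removed

    def size(self):
        return len(self._log)
```

Every write makes the log longer even when it replaces an existing
object, and packing reclaims that space at the cost of the history
that made undo possible.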
???