
Copyright O'Reilly, 2000. All rights reserved.

This is an early draft chapter from a forthcoming book on Zope, to be published by O'Reilly & Associates. The material has not been through O'Reilly's editorial process, nor has it been reviewed for technical accuracy. O'Reilly & Associates disclaims responsibility for any errors in this draft and advises readers to use the information contained herein with caution.

O'Reilly & Associates grants readers the right to read this material and to print copies or make electronic copies for their own use. O'Reilly & Associates does not grant anyone the right to use this material as part of a commercial product or to modify and distribute it. When O'Reilly & Associates publishes the final draft of this book in print form, the content will be made available under an open content license, but this chapter is not open content.

If you have any comments on the material in this chapter, you should send them to the authors, Michel Pelletier and Amos Latteier, at docs@digicool.com.


Searching and Categorizing Content

Introduction

A ZCatalog is a catalog of objects for Zope. A Catalog is a collection of indexes that store references to other Zope objects.

ZCatalog is a powerful tool, providing a number of compelling features:

  • Searches are fast. The data structures used by the indexes provide quick searches.

  • Searches are robust. ZCatalog supports boolean search terms, relevance ranking, synonyms and stopwords.

  • Indexing is flexible. A ZCatalog can catalog custom properties and track unique values. Since ZCatalog catalogs objects instead of file handles, you can index any content that can have a Python object wrapped around it. This also lets objects participate in how they are cataloged, e.g. de-HTML-ifying contents or extracting PDF properties.

  • Transactional. An indexing operation is part of a Zope transaction. If something goes wrong after content is indexed, the index is restored to its previous condition. This also means that Undo will restore an index to its previous condition. A ZCatalog can be altered privately in a Version, meaning no one else can see the changes to the index.

  • Cache-friendly. The index is internally broken into different "buckets", with each bucket being a separate Zope database object. Only the part of the index that is needed is loaded into memory. Conversely, parts of the index that are no longer needed can be removed from memory.

  • Results are lazy. A search that matches a tremendous number of objects does not build a large result set in memory. Only the part of the results that is actually requested, such as the second batch of twenty, is returned.
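The idea of lazy results can be illustrated with a small Python sketch. This is not how ZCatalog implements laziness; the names search and batch are hypothetical, and a generator simply stands in for a result set that is only computed as far as it is read.

```python
from itertools import islice

def search(paths, word):
    """Yield matching paths one at a time instead of building a full list."""
    for path in paths:
        if word in path:
            yield path

def batch(results, start, size):
    """Return only the requested slice of a lazy result stream."""
    return list(islice(results, start, start + size))

# A million candidate paths, none of which is created until it is needed.
paths = ("/doc%d" % i for i in range(1000000))

# Ask for the second batch of twenty; only the first forty matches are computed.
second_batch = batch(search(paths, "doc"), 20, 20)
print(second_batch[0], second_batch[-1])
```

Reading the second batch touches only the first forty matches; the remaining candidates are never examined.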

Indexing Concepts

Text Index

A text index is like an index in the back of a book:

        ZCatalog: 59, 22, 15, 67, 88

This index shows the term ZCatalog occurring on five pages in a book. Text indexes are useful because they let you look up specific words or terms in a document. This is how almost all search systems work: by mapping words to the documents or pages on which they occur.

ZCatalog text indexes map each word to a sequence of paths to the objects in which the word occurs:

        { 'bob'   : ['/Document1', '/Document2'],
          'uncle' : ['/Document2', '/Document3'],
          'bobo'  : ['/Document1', '/Document3'],
        }
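A mapping like this could be built along the following lines. This is a minimal sketch, assuming documents are plain strings split on whitespace; build_text_index and the sample documents are hypothetical and are not part of ZCatalog.

```python
from collections import defaultdict

def build_text_index(documents):
    """Map each lowercased word to the sorted list of paths containing it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in text.lower().split():
            index[word].add(path)
    return {word: sorted(paths) for word, paths in index.items()}

# Hypothetical content chosen to reproduce the mapping shown above.
documents = {
    "/Document1": "Bob Bobo",
    "/Document2": "Bob uncle",
    "/Document3": "uncle bobo",
}
text_index = build_text_index(documents)
print(text_index["bob"])
```

Looking up a word is then a single dictionary access, which is why text searches stay fast as the number of documents grows.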

Vocabularies

Vocabularies are used by text indexes. A vocabulary is basically a language abstraction. For ZCatalog to work with any language, it must understand certain behaviors of that language. For example, different languages:

  • have different concepts of words. In English and many other languages, words are defined by whitespace boundaries, but in other languages, like Chinese and Japanese, words are defined by their contextual usage.

  • have different concepts of stopwords. The French word nous (we) would be extremely common in French text and should probably be removed as a stopword, but in English text it might make perfect sense to catalog this word because it is very infrequent.

  • have different concepts of synonyms. The synonym pair automobile -> car would not make sense in any language but English.

  • have different concepts of stemming. In English, it is common for text indexers to strip suffixes like ing from words, so that bake and baking match the same word. These suffix strippings only make sense for English; other languages would want to provide their own stemming (or none at all).
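The four behaviors above can be gathered into one object, as in the following sketch. SimpleVocabulary is a hypothetical class written for illustration and is not ZCatalog's vocabulary interface; its suffix stripping is deliberately crude and only shows where language-specific rules would plug in.

```python
class SimpleVocabulary:
    """Hypothetical vocabulary: splitting, stopwords, synonyms and stemming."""

    def __init__(self, stopwords=(), synonyms=None, suffixes=()):
        self.stopwords = set(stopwords)
        self.synonyms = dict(synonyms or {})
        self.suffixes = tuple(suffixes)

    def split(self, text):
        # Whitespace splitting suits English; other languages need another splitter.
        return text.lower().split()

    def stem(self, word):
        # Strip the first matching suffix, keeping at least three characters.
        for suffix in self.suffixes:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def words(self, text):
        """Return the terms that would be indexed for a piece of text."""
        result = []
        for word in self.split(text):
            if word in self.stopwords:
                continue
            word = self.synonyms.get(word, word)
            result.append(self.stem(word))
        return result

# An assumed English configuration; the word lists are illustrative only.
english = SimpleVocabulary(
    stopwords={"the", "a", "we"},
    synonyms={"automobile": "car"},
    suffixes=("ing", "e"),
)
print(english.words("We bake the automobile"))
```

A vocabulary for another language would supply its own splitter, stopword list, synonyms and stemming rules, or leave some of them empty.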

Current Vocabularies

Plain Vocabularies

Plain vocabularies are very simple and do minimal English-specific processing.

Globbing Vocabularies

Globbing vocabularies are more complex vocabularies that allow wildcard searches to be performed on English text.
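A wildcard search can be thought of as matching a pattern against every word in the vocabulary and merging the results. The sketch below uses Python's fnmatch module for the matching; the * and ? wildcards are an assumption made for illustration, since this chapter does not state which wildcard syntax a Globbing vocabulary accepts, and glob_lookup is a hypothetical helper.

```python
from fnmatch import fnmatchcase

def glob_lookup(text_index, pattern):
    """Return the paths of every indexed word that matches a wildcard pattern."""
    paths = set()
    for word, word_paths in text_index.items():
        if fnmatchcase(word, pattern):
            paths.update(word_paths)
    return sorted(paths)

# The same toy text index used earlier in this chapter.
text_index = {
    "bob": ["/Document1", "/Document2"],
    "uncle": ["/Document2", "/Document3"],
    "bobo": ["/Document1", "/Document3"],
}
print(glob_lookup(text_index, "bob*"))
```

Scanning every word is the simplest approach but becomes slow on a large vocabulary, so a real implementation would need extra data structures to find candidate words quickly.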

JVocabulary

JVocabulary is a ZCatalog vocabulary that supports splitting and indexing Japanese text.

Field Index

A field index is an index that maps atomic values to sequences of paths to the objects that have that value. An example would be an index that keeps track of when objects were last modified.

uniqueValues

Field indexes have a uniqueValues() method that returns a list of all unique values in the index mapping.
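The following toy class sketches the idea of a field index and its unique values. It is an illustration under simplifying assumptions (values are hashable and sortable, paths are strings) and not ZCatalog's implementation; only the method name uniqueValues is taken from the text above.

```python
from collections import defaultdict

class FieldIndex:
    """Toy field index: one atomic value per object, mapped back to paths."""

    def __init__(self):
        self._paths_by_value = defaultdict(set)

    def index_object(self, path, value):
        self._paths_by_value[value].add(path)

    def search(self, value):
        """Return the paths of all objects that have exactly this value."""
        return sorted(self._paths_by_value.get(value, ()))

    def uniqueValues(self):
        """Return every distinct value that has been indexed."""
        return sorted(self._paths_by_value)

meta_type = FieldIndex()
meta_type.index_object("/Document1", "DTML Document")
meta_type.index_object("/Document2", "DTML Document")
meta_type.index_object("/logo", "Image")
print(meta_type.uniqueValues())
```

A list of unique values is handy for building search forms, for example to fill a selection list with every type of object found in the catalog.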

Keyword Indexes

A keyword index indexes a sequence of keywords for objects and can be queried for any objects that have one or more of those keywords.
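The difference from a field index is that each object contributes several values. A minimal sketch, with a hypothetical KeywordIndex class and made-up keywords, might look like this:

```python
from collections import defaultdict

class KeywordIndex:
    """Toy keyword index: each object is filed under every one of its keywords."""

    def __init__(self):
        self._paths_by_keyword = defaultdict(set)

    def index_object(self, path, keywords):
        for keyword in keywords:
            self._paths_by_keyword[keyword].add(path)

    def search(self, keywords):
        """Return paths that carry at least one of the given keywords."""
        found = set()
        for keyword in keywords:
            found |= self._paths_by_keyword.get(keyword, set())
        return sorted(found)

subjects = KeywordIndex()
subjects.index_object("/Document1", ["zope", "python"])
subjects.index_object("/Document2", ["python"])
subjects.index_object("/Document3", ["dtml"])
print(subjects.search(["zope", "dtml"]))
```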

Indexing Patterns

Mass Indexing

Mass indexing is simple but has severe drawbacks. The total amount of content you can index in one transaction is equivalent to the amount of free virtual memory available to the Zope process, plus the amount of temporary storage the system has. If you have one gigabyte of virtual memory and 10 gigabytes of temporary storage, then you could theoretically index 11 gigabytes of content.

But just indexing that much content would take a long, long time, and as soon as virtual memory ran out, Zope would start doing a lot of hard disk activity out to temp storage.

So mass indexing is cool if you want to index up to a few thousand objects, but beyond that, you want to use incremental indexing, which is much more efficient.

Mass Indexing - Example

Index lots of default content.

Incremental Indexing

Incremental indexing is when a stream of content is indexed over time. This technique is more complex than mass indexing, but it scales much better and is more efficient:

efficient

When new content is added, old content does not have to be re-indexed in a new sweep.

smaller footprint

Because less information is indexed per transaction, the memory requirements of the Catalog are reduced.

no hot spot

Catalogs can become notorious hot spots in a database, possibly causing lots of conflicts. Spreading out the database writes produces fewer hot spots.
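The pattern of indexing a stream in small transactions can be sketched as follows. The functions batches and index_incrementally are hypothetical, and the commit callback merely stands in for committing a Zope transaction; no real Zope API is used here.

```python
def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def index_incrementally(items, index_one, commit, size=100):
    """Index a stream in small batches and return the number of commits."""
    count = 0
    for batch in batches(items, size):
        for item in batch:
            index_one(item)
        commit()  # hypothetical stand-in for committing one transaction
        count += 1
    return count

# Record what was indexed and how much had been indexed at each commit.
indexed, commits = [], []
n = index_incrementally(range(250), indexed.append,
                        lambda: commits.append(len(indexed)))
print(n, commits)
```

Because each batch is committed separately, memory use is bounded by the batch size rather than by the total amount of content.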

Incremental Indexing - Example

Catalog an RSS stream into Document objects?

Automatic Indexing

Automatic indexing is, as its name implies, the easiest of all. Automatic indexing is a lot like incremental indexing because a stream of content is indexed as it is created. However, automatically indexed content can also be re-indexed when it changes and removed from the indexes when it is destroyed. This is the most efficient usage of Zope, but it requires your objects to know special things about cataloging themselves, so basic Zope objects like DTML Documents and DTML Methods do not yet support automatic indexing. This is an advanced technique and will not be discussed until Chapter X.

XXX So I guess we need an event model now No example cuzza ZClasses XXX

Using ZCatalog

Querying

Once you have some content in a catalog you can query the Catalog for objects that match certain criteria.

Search for object by Type - Example

XXX Use Form Action and uniqueValuesFor

Text Search for a Word in Certain Types - Example

XXX Use Form/Action and uniqueValuesFor

Explicit Queries

A query object should be in the form of a Python mapping:

          <dtml-in "Catalog({'index1' : term1, 'index2' : term2,
                             'text_indexN' : 'some words to look for'})">

            ...

          </dtml-in>

The key of each mapping item should be the name of an index. The value should be the term you want to query that index for.
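One way to picture how such a mapping is evaluated is the sketch below. It assumes that an object must match every index named in the query, so the per-index results are intersected; that rule, the query_catalog function and the toy indexes are assumptions made for illustration.

```python
def query_catalog(indexes, query):
    """Intersect the paths each named index returns for its query term."""
    result = None
    for index_name, term in query.items():
        paths = set(indexes[index_name].get(term, ()))
        result = paths if result is None else result & paths
    return sorted(result or ())

# Toy indexes: each maps a term to the paths of the matching objects.
indexes = {
    "meta_type": {
        "DTML Document": ["/Document1", "/Document2"],
        "Image": ["/logo"],
    },
    "text_content": {
        "bob": ["/Document1", "/Document2"],
        "uncle": ["/Document2", "/Document3"],
    },
}
matches = query_catalog(indexes, {"meta_type": "DTML Document",
                                  "text_content": "uncle"})
print(matches)
```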

Searching for a Certain Date - Example

Range Searching

You may want to search for a whole range of information, like all the objects created after a certain date.
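A range search over a sorted index can be sketched with Python's bisect module. The range_search function and the sample dates are hypothetical; the sketch only shows the idea of an optional minimum and maximum bound, not how ZCatalog stores its field indexes.

```python
from bisect import bisect_left, bisect_right
from datetime import date

def range_search(entries, minimum=None, maximum=None):
    """Return paths whose value lies within the inclusive bounds.

    entries is a list of (value, path) pairs sorted by value.
    """
    values = [value for value, _ in entries]
    lo = 0 if minimum is None else bisect_left(values, minimum)
    hi = len(values) if maximum is None else bisect_right(values, maximum)
    return [path for _, path in entries[lo:hi]]

# Hypothetical modification dates for three documents.
entries = sorted([
    (date(2000, 1, 15), "/Document1"),
    (date(2000, 3, 1), "/Document2"),
    (date(2000, 6, 30), "/Document3"),
])
print(range_search(entries, minimum=date(2000, 3, 1)))
```

Supplying only a minimum corresponds to a "modified since" search; supplying both bounds selects a closed interval.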

Date Range Search - Example

Range searches can be done easily with date fields:

             <dtml-var standard_html_header>

             <form action="search" method="get">
             <TABLE>
               <TR VALIGN="TOP">
                 <TD><p>containing the text:</p></TD>
                 <TD><input name="text_content" value=""></TD>
               </TR>
               <TR VALIGN="TOP">
                 <TD><p>with the type of:</p></TD>

                 <TD>
                   <select name="meta_type:list" size=6 MULTIPLE>
                   <dtml-in expr="uniqueValuesFor('meta_type')">
                     <option value="<dtml-var sequence-item>"><dtml-var sequence-item></option>
                   </dtml-in>
                   </select>
                 </TD>
               </TR>
               <TR>
                 <TD><p>modified since:</p></TD>
                 <TD>
                   <input type="hidden" name="date_usage" value="range:min">
                   <select name="date:date">
                     <option value="<dtml-var "ZopeTime(0)" >">Ever</option> 
                     <option value="<dtml-var "ZopeTime() - 1" >">Yesterday</option>
                     <option value="<dtml-var "ZopeTime() - 7" >">Last Week</option>
                     <option value="<dtml-var "ZopeTime() - 30" >">Last Month</option>
                     <option value="<dtml-var "ZopeTime() - 365" >">Last Year</option>
                     <dtml-if "_.hasattr(AUTHENTICATED_USER,'prev_visit')">
                     <option value="<dtml-var "AUTHENTICATED_USER.prev_visit">">
                       Last Visit (<dtml-var "AUTHENTICATED_USER.prev_visit" fmt=Date>)
                     </option>
                     </dtml-if>
                   </select>
                 </TD>
               </TR>

             <tr><td></td>
             <td><input type="submit" value=" Search "><input type="reset" value=" Clear ">
             </td>
             </tr>
             </TABLE>
             </form>

             <dtml-var standard_html_footer>  

Defining Record Objects with Meta-Data

Record objects work just like Brains from ZSQLMethods with the exception of data_record_id_.

The schema and values of a record object come from the Meta Data table. This is useful when you want to present information on a report page.

You should create only the minimum amount of meta data you need for your report, because large amounts of meta data can consume excessive resources.
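The relationship between the Meta Data table and record objects can be pictured with the sketch below. MetadataTable, its methods and the column names are hypothetical; the point is only that the values named in the schema are copied when an object is cataloged, so a report can be drawn from the records without loading the original objects.

```python
from collections import namedtuple
from types import SimpleNamespace

class MetadataTable:
    """Toy meta data table: stores one small record per cataloged object."""

    def __init__(self, schema):
        self.Record = namedtuple("Record", schema)
        self._rows = {}

    def catalog_object(self, path, obj):
        # Copy only the schema columns; missing attributes become None.
        values = (getattr(obj, name, None) for name in self.Record._fields)
        self._rows[path] = self.Record(*values)

    def record_for(self, path):
        return self._rows[path]

table = MetadataTable(["title", "meta_type"])
document = SimpleNamespace(title="Uncle Bob", meta_type="DTML Document",
                           body="a long text that is not copied")
table.catalog_object("/Document1", document)
record = table.record_for("/Document1")
print(record.title, record.meta_type)
```

Every extra column is stored once per cataloged object, which is why the schema should stay as small as the report allows.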

Fancy Report Form - Example

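The report page shows the number of items found and, when REQUEST['text_content'] is set, the text they matched. Results are listed in a table with the columns Type, Title, Last modified and Author. If the search finds nothing, the page displays "There were no results."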

Truncated summary synopsis of content - Example

XXX

