Index creation and updating are implemented in both the ZSearch module and Java Lucene. You can use either of them.
The PHP code listing below provides an example of how to index a file using the ZSearch indexing API:
    $index = new ZSearch('/data/my-index', true /* true to create new index */);

    $doc = new Zend_Search_Lucene_Document();

    // Store document URL to identify it in search result.
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $docUrl));

    // Index document content
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docContent));

    // Add document to the index.
    $index->addDocument($doc);
The ZSearchAnalyzer class is used by the indexer to tokenize document text fields. The ZSearchAnalyzer::getDefault() and ZSearchAnalyzer::setDefault() methods are used to get and set the default analyzer.
Thus you can assign your own text analyzer or choose one from the set of predefined analyzers: ZSearchTextAnalyzer and ZSearchTextCIAnalyzer (default). Both of them interpret a token as a sequence of letters. ZSearchTextCIAnalyzer also converts tokens to lower case.
To switch between analyzers, set the default analyzer before adding documents:
    ZSearchAnalyzer::setDefault(new ZSearchTextAnalyzer());
    ...
    $index->addDocument($doc);
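The tokenization rule described above (a token is a sequence of letters, optionally converted to lower case) can be sketched in a few lines of plain Java. This is only an illustration of the rule and not the ZSearch or Lucene implementation; the class name LetterTokenizerSketch and the caseInsensitive flag are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not ZSearch or Lucene source code.
class LetterTokenizerSketch {

    // Splits text into maximal runs of letters. When caseInsensitive is true,
    // every token is lowercased (the behaviour described for the CI analyzer).
    static List<String> tokenize(String text, boolean caseInsensitive) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(caseInsensitive ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // Any non-letter (digit, space, punctuation) ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Zend Framework 2006: Search-Engine";
        System.out.println(tokenize(text, false)); // [Zend, Framework, Search, Engine]
        System.out.println(tokenize(text, true));  // [zend, framework, search, engine]
    }
}
```

Note that under this rule digits are not part of any token, so "2006" is dropped entirely. The same analyzer should be in effect when indexing and when searching, otherwise query terms may not match the indexed tokens.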
Newly added documents can be retrieved from the index only after a commit operation. ZSearch::commit() is called automatically at the end of script execution and before any search request.
Each commit() call generates a new index segment, so it should be called as rarely as possible. On the other hand, committing a large number of documents in one step requires more memory.
Automatic segment management optimization is a subject of future ZSearch enhancements.
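The trade-off between the number of segments and memory use can be made concrete with a small model. The Java sketch below is not ZSearch code: CommitBatchingSketch, batchSize and indexAll are hypothetical names, documents are plain strings, and the model assumes that a commit with nothing pending creates no segment. It only counts how many segments result from a given batch size and how many uncommitted documents are held in memory at the peak.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the commit trade-off; not part of ZSearch or Lucene.
class CommitBatchingSketch {
    private final List<String> pending = new ArrayList<>();        // uncommitted documents held in memory
    private final List<List<String>> segments = new ArrayList<>(); // one entry per non-empty commit
    private final int batchSize;
    private int peakPending = 0;

    CommitBatchingSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    void addDocument(String doc) {
        pending.add(doc);
        peakPending = Math.max(peakPending, pending.size());
        if (pending.size() >= batchSize) {
            commit();
        }
    }

    // Writes all pending documents as one new segment (assumption: empty commits are skipped).
    void commit() {
        if (pending.isEmpty()) {
            return;
        }
        segments.add(new ArrayList<>(pending));
        pending.clear();
    }

    int segmentCount() { return segments.size(); }

    int peakPending() { return peakPending; }

    // Indexes docCount dummy documents, then commits whatever is left over.
    static CommitBatchingSketch indexAll(int docCount, int batchSize) {
        CommitBatchingSketch index = new CommitBatchingSketch(batchSize);
        for (int i = 0; i < docCount; i++) {
            index.addDocument("doc-" + i);
        }
        index.commit();
        return index;
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 10, 100}) {
            CommitBatchingSketch index = indexAll(100, batchSize);
            System.out.println("batch=" + batchSize
                    + " segments=" + index.segmentCount()
                    + " peakPending=" + index.peakPending());
        }
    }
}
```

For 100 documents, a batch size of 1 yields 100 segments with at most one document pending, while a batch size of 100 yields a single segment with all 100 documents held in memory before the commit.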
The Java program listing below provides an example of how to index a file using Java Lucene:
    /**
     * Index creation:
     */
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.document.*;
    import java.io.*;
    ...
    // Create a new index; the third argument (true) overwrites any existing one.
    IndexWriter indexWriter = new IndexWriter("/data/my_index", new SimpleAnalyzer(), true);
    ...
    String filename = "/path/to/file-to-index.txt";
    File f = new File(filename);
    Document doc = new Document();
    doc.add(Field.Text("path", filename));
    doc.add(Field.Keyword("modified", DateField.timeToString(f.lastModified())));
    doc.add(Field.Text("author", "unknown"));
    // Tokenize and index the file contents from a Reader (not stored in the index).
    FileInputStream is = new FileInputStream(f);
    Reader reader = new BufferedReader(new InputStreamReader(is));
    doc.add(Field.Text("contents", reader));
    indexWriter.addDocument(doc);
It is important to note that any kind of information can be added to the index. Application-specific information or metadata can be stored in the document fields, and later retrieved with the document during search.
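The difference between fields that are stored with the document and fields that are only indexed can be illustrated with a toy model. The Java sketch below does not use the Lucene or ZSearch API: StoredFieldSketch and its methods are hypothetical, and matching is a simple case-insensitive substring test rather than a real term index. It only shows that metadata added as a stored field comes back with a matching document, while unstored content is searchable but not returned.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy document model for illustration; not the Lucene or ZSearch API.
class StoredFieldSketch {
    private final Map<String, String> storedFields = new LinkedHashMap<>();
    private final StringBuilder searchableText = new StringBuilder();

    // Searchable and stored, in the spirit of the Text fields in the listings above.
    StoredFieldSketch text(String name, String value) {
        storedFields.put(name, value);
        searchableText.append(' ').append(value.toLowerCase());
        return this;
    }

    // Searchable but not stored, in the spirit of the UnStored field.
    StoredFieldSketch unStored(String name, String value) {
        searchableText.append(' ').append(value.toLowerCase());
        return this;
    }

    // Simplification: substring match instead of a term lookup.
    boolean matches(String term) {
        return searchableText.indexOf(term.toLowerCase()) >= 0;
    }

    // What a search hit would be able to return to the application.
    Map<String, String> storedFields() {
        return storedFields;
    }

    public static void main(String[] args) {
        StoredFieldSketch doc = new StoredFieldSketch()
                .text("url", "/docs/intro.html")
                .text("author", "unknown")
                .unStored("contents", "Indexing files with Lucene");
        if (doc.matches("lucene")) {
            System.out.println(doc.storedFields()); // {url=/docs/intro.html, author=unknown}
        }
    }
}
```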
It is the responsibility of your application to control the Lucene indexer and ZSearch. This means that data can be indexed from any source that is accessible by your application. For example, this could be the filesystem, a database, an HTML form, etc.