Everybody but your boss understands that a relational database isn't "searchable" in the usual sense - you have to explicitly identify keywords, maintain search tables, etc. Fortunately it's easy to do incremental updates of Lucene (http://jakarta.apache.org/lucene/docs/index.html) indexes via the Hibernate Lifecycle interface.
For each class we wish to make searchable, we start by providing a method that creates a Lucene Document describing the instance:

public class Example implements Lifecycle {

    /** hibernate id */
    private Long id;

    /** various fields we want to search */
    private String name;
    private String department;
    private String skills;

    ...

    /**
     * Return a Lucene Document that provides the searchable elements
     * of the object.
     */
    Document getDocument() {
        Document d = new Document();
        d.add(Field.Keyword("id", id.toString()));
        d.add(Field.Keyword("classname", this.getClass().getName()));
        d.add(Field.Keyword("name", name));
        if (department != null) {
            d.add(Field.Keyword("department", department));
        }
        if (skills != null) {
            d.add(Field.Unstored("skills", skills));
        }
        return d;
    }
The four standard Lucene field types are:

Keyword - Indexed and stored in the index verbatim. Suitable for URLs, dates, personal names, telephone numbers, etc. For this technique to work we must store the Hibernate ID as a keyword.
Text - Tokenized, indexed and stored in the index. This field can be searched, but since the original value is also stored you don't want to use it for large fields.
Unstored - Tokenized and indexed, but not stored in the index. Ideal for indexing large amounts of text that does not need to be retrieved in its original form, e.g. the bodies of web pages or PDF documents.
Unindexed - Stored in the index verbatim, but not searchable. These values are normally used to provide displayable text for search results.
Since we store the Hibernate ID as a keyword, we can immediately retrieve the full object from the database and don't need unindexed or text fields, although they may be useful for non-hibernated tools.
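To see why storing the id keyword is enough, here is a minimal search-and-load sketch. It assumes the Lucene 1.x search API and an open Hibernate Session; the class name, index path handling and the "skills" query field are illustrative, not part of the original article.

import java.io.File;
import net.sf.hibernate.Session;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ExampleSearch {

    /** Find objects whose "skills" field matches the query and load them via Hibernate. */
    public static void search(Session session, File idx, String queryString)
            throws Exception {
        IndexSearcher searcher = new IndexSearcher(idx.getPath());
        try {
            Query query = QueryParser.parse(queryString, "skills", new StopAnalyzer());
            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                Document d = hits.doc(i);
                // The stored keywords identify the row and the class to load it as.
                Long id = Long.valueOf(d.get("id"));
                Class clazz = Class.forName(d.get("classname"));
                Object result = session.load(clazz, id);
                // ... display or collect the result ...
            }
        } finally {
            searcher.close();
        }
    }
}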
We now need to provide Lifecycle methods that incrementally update the Lucene index.

public class Example implements Lifecycle {

    /** Directory containing the Lucene index */
    File idx;

    ...

    /**
     * Open a Lucene IndexWriter on the existing index.
     */
    protected IndexWriter getIndexWriter() throws IOException {
        // false: append to the existing index rather than creating a new one
        return new IndexWriter(idx, new StopAnalyzer(), false);
    }

    /**
     * Open a Lucene IndexReader.
     */
    protected IndexReader getIndexReader() throws IOException {
        return IndexReader.open(idx);
    }

    /**
     * Saving an object for the first time - add it to the Lucene
     * index.
     */
    public boolean onSave(Session s) throws CallbackException {
        try {
            IndexWriter writer = getIndexWriter();
            writer.addDocument(getDocument());
            writer.close();
        } catch (IOException e) {
            throw new CallbackException(e.getMessage());
        }
        return false;
    }

    /**
     * Updating an object - delete the old Document and reinsert it.
     */
    public boolean onUpdate(Session s) throws CallbackException {
        try {
            IndexReader reader = getIndexReader();
            reader.delete(new Term("id", id.toString()));
            reader.close();
            IndexWriter writer = getIndexWriter();
            writer.addDocument(getDocument());
            writer.close();
        } catch (IOException e) {
            throw new CallbackException(e.getMessage());
        }
        return false;
    }

    /**
     * Deleting an object - remove its Document from the index.
     */
    public boolean onDelete(Session s) throws CallbackException {
        try {
            IndexReader reader = getIndexReader();
            reader.delete(new Term("id", id.toString()));
            reader.close();
        } catch (IOException e) {
            throw new CallbackException(e.getMessage());
        }
        return false;
    }

    /**
     * Loading an object - we don't have to do anything here.
     */
    public void onLoad(Session s, Serializable id) {
    }
}
In practice these methods would often be put into a base class for all persistent objects, with the getIndexReader and getIndexWriter methods overridden if desired to provide disjoint indexes.
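A minimal sketch of such a base class, assuming a single shared index directory (the class name and index location are illustrative; subclasses would override getIndexDirectory() or the reader/writer methods for disjoint indexes):

import java.io.File;
import java.io.IOException;
import net.sf.hibernate.Lifecycle;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public abstract class SearchablePersistent implements Lifecycle {

    /** Default index directory shared by subclasses; override for a disjoint index. */
    protected File getIndexDirectory() {
        return new File("/var/lucene/index");   // hypothetical location
    }

    protected IndexWriter getIndexWriter() throws IOException {
        return new IndexWriter(getIndexDirectory(), new StopAnalyzer(), false);
    }

    protected IndexReader getIndexReader() throws IOException {
        return IndexReader.open(getIndexDirectory());
    }

    /** Each subclass describes its own searchable fields. */
    protected abstract Document getDocument();

    /** Each subclass exposes its Hibernate id for the "id" keyword. */
    protected abstract Long getId();

    // onSave(), onUpdate(), onDelete() and onLoad() from the Example class
    // above move here unchanged, written in terms of getDocument() and getId(),
    // so every persistent subclass gets indexing for free.
}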
Implementation of the user interface and search functionality is left as an exercise for the reader.
Gavin points out that Lifecycle.onUpdate() is only called on explicit calls to update(), not when Hibernate's dirty checking flushes a modified object, so with this approach you must remember to call update() everywhere an indexed object changes. That isn't hard to code, but it is murder to maintain: all it takes is one oversight and your index is out of sync with your database. A second solution, using an Interceptor, is discussed below.
Another approach is to update the Lucene index incrementally from an Interceptor. We begin by defining a new interface.

public interface Searchable {

    /**
     * Get a Lucene IndexWriter - can be different for each class,
     * allowing multiple indexes.
     */
    public IndexWriter getIndexWriter() throws IOException;

    /**
     * Get a Lucene IndexReader - must refer to the same directory as
     * getIndexWriter().
     */
    public IndexReader getIndexReader() throws IOException;

    /**
     * Get a Lucene Document describing our searchable content. The
     * keywords "id" and "classname" are reserved by our interceptor.
     */
    public Document getDocument();
}
Any persistent class that we want to make searchable simply implements these three methods.
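For instance, a persistent class could implement it like this (a sketch; the Employee class, its fields and the shared index location are illustrative, not part of the original article):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class Employee implements Searchable {

    private Long id;
    private String name;
    private String skills;

    /** Single application-wide index directory - an assumption of this sketch. */
    private static final File IDX = new File("/var/lucene/index");

    public IndexWriter getIndexWriter() throws IOException {
        return new IndexWriter(IDX, new StopAnalyzer(), false);
    }

    public IndexReader getIndexReader() throws IOException {
        return IndexReader.open(IDX);
    }

    public Document getDocument() {
        Document d = new Document();
        // "id" and "classname" are added by the interceptor, not here.
        d.add(Field.Keyword("name", name));
        if (skills != null) {
            d.add(Field.Unstored("skills", skills));
        }
        return d;
    }

    // Hibernate getters and setters elided
}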
We now define the interceptor. Unlike the Lifecycle version, the interceptor itself adds the Hibernate id and classname keywords rather than leaving that to the target class.

public class LuceneInterceptor implements Interceptor, Serializable {
    /**
     * Drop an object from the Lucene index.
     */
    public void drop(Searchable entity, Long id) throws IOException {
        IndexReader reader = entity.getIndexReader();
        reader.delete(new Term("id", id.toString()));
        reader.close();
    }

    /**
     * Add an object to the Lucene index.
     */
    public void add(Searchable entity, Long id) throws IOException {
        Document doc = entity.getDocument();
        doc.add(Field.Keyword("id", id.toString()));
        doc.add(Field.Keyword("classname", entity.getClass().getName()));
        IndexWriter writer = entity.getIndexWriter();
        writer.addDocument(doc);
        writer.close();
    }
    /**
     * Method called when an existing record is updated.
     */
    public boolean onFlushDirty(
            Object entity,
            Serializable id,
            Object[] currentState,
            Object[] previousState,
            String[] propertyNames,
            Type[] types)
            throws CallbackException {
        if (entity instanceof Searchable) {
            if (id instanceof Long) {
                try {
                    drop((Searchable) entity, (Long) id);
                    add((Searchable) entity, (Long) id);
                } catch (IOException e) {
                    throw new CallbackException(e.getMessage());
                }
            } else {
                // unsupported id type - not indexed
            }
        }
        return false;
    }
    /**
     * Method called when a new record is saved.
     */
    public boolean onSave(
            Object entity,
            Serializable id,
            Object[] state,
            String[] propertyNames,
            Type[] types)
            throws CallbackException {
        if (entity instanceof Searchable) {
            if (id instanceof Long) {
                try {
                    add((Searchable) entity, (Long) id);
                } catch (IOException e) {
                    throw new CallbackException(e.getMessage());
                }
            } else {
                // unsupported id type - not indexed
            }
        }
        return false;
    }
    /**
     * Method called when an existing record is deleted. Note that
     * Interceptor.onDelete() returns void rather than boolean.
     */
    public void onDelete(
            Object entity,
            Serializable id,
            Object[] state,
            String[] propertyNames,
            Type[] types)
            throws CallbackException {
        if (entity instanceof Searchable) {
            if (id instanceof Long) {
                try {
                    drop((Searchable) entity, (Long) id);
                } catch (IOException e) {
                    throw new CallbackException(e.getMessage());
                }
            } else {
                // unsupported id type - not indexed
            }
        }
    }

    // rest of the Interceptor methods elided as they follow default behavior
}
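Finally, the interceptor has to be registered with Hibernate. A minimal sketch, assuming the Hibernate 2 style Configuration and SessionFactory APIs:

import net.sf.hibernate.Session;
import net.sf.hibernate.SessionFactory;
import net.sf.hibernate.cfg.Configuration;

public class LuceneInterceptorSetup {

    /** Register the interceptor globally when building the SessionFactory. */
    public static SessionFactory buildFactory() throws Exception {
        Configuration cfg = new Configuration().configure();
        cfg.setInterceptor(new LuceneInterceptor());
        return cfg.buildSessionFactory();
    }

    /** Alternatively, supply it per session if only some sessions should index. */
    public static Session openIndexedSession(SessionFactory factory) {
        return factory.openSession(new LuceneInterceptor());
    }
}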