[Hibernate]Using Lifecycles and Interceptors to update Lucene searches
Software Technology

Posted by lhwork on 2007/1/22 14:52:33

Everybody but your boss understands that a relational database isn't "searchable" in the usual sense: you have to explicitly identify keywords, maintain search tables, etc. Fortunately it's easy to do incremental updates of Lucene (http://jakarta.apache.org/lucene/docs/index.html) indexes via the Hibernate Lifecycle interface.

For each class we wish to make searchable, we start by providing a method that creates a Lucene Document to describe the instance:

    // Hibernate 2.x and Lucene 1.x package names
    import net.sf.hibernate.Lifecycle;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class Example implements Lifecycle {
        /** hibernate id */
        private Long id;

        /** various fields we want to search */
        private String name;
        private String department;
        private String skills;
        ...

        /**
         * Return a Lucene Document that provides the searchable elements
         * of the object.
         */
        Document getDocument() {
            Document d = new Document();
            d.add(Field.Keyword("id", id.toString()));
            d.add(Field.Keyword("classname", this.getClass().getName()));
            d.add(Field.Keyword("name", name));
            // optional fields are only indexed when present
            if (department != null) {
                d.add(Field.Keyword("department", department));
            }
            if (skills != null) {
                d.add(Field.Unstored("skills", skills));
            }
            return d;
        }
        ...
    }

Where the four standard types of fields are:

Keyword
    Indexed and stored in the index verbatim. This field is suitable for URLs, dates, personal names, telephone numbers, etc. For this technique to work we must store the Hibernate ID as a keyword.

Text
    Tokenized, indexed and stored in the index. This field can be searched, but you don't want to use it for large fields.

Unstored
    Tokenized and indexed, but not stored in the index. This field is ideal when indexing large amounts of text that do not need to be retrieved in their original form, e.g., bodies of web pages or PDF documents.

Unindexed
    Stored in the index verbatim, but unsearchable. These values are normally used to provide displayable text for search results.
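To make the differences between the four field types concrete, here is a small sketch in plain Java that models each type by three flags (stored, indexed, tokenized). The FieldTypesSketch class, the Kind enum and the whitespace tokenizer are assumptions made purely for illustration. They are not part of the Lucene API, and a real Lucene analyzer does considerably more than split on whitespace.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the four field types; hypothetical, not the Lucene API. */
final class FieldTypesSketch {

    enum Kind {
        KEYWORD(true, true, false),    // stored, indexed verbatim
        TEXT(true, true, true),        // stored, indexed, tokenized
        UNSTORED(false, true, true),   // indexed and tokenized only
        UNINDEXED(true, false, false); // stored only, not searchable

        final boolean stored;
        final boolean indexed;
        final boolean tokenized;

        Kind(boolean stored, boolean indexed, boolean tokenized) {
            this.stored = stored;
            this.indexed = indexed;
            this.tokenized = tokenized;
        }
    }

    /** Terms that a value contributes to the searchable index. */
    static List<String> indexedTerms(Kind kind, String value) {
        if (!kind.indexed) {
            return Collections.emptyList();
        }
        if (!kind.tokenized) {
            // verbatim: the whole value is a single term
            return Collections.singletonList(value);
        }
        // assumption: a naive lower-casing whitespace tokenizer
        return Arrays.asList(value.trim().toLowerCase().split("\\s+"));
    }

    /** Value that can be read back from a search hit, or null if not stored. */
    static String storedValue(Kind kind, String value) {
        return kind.stored ? value : null;
    }
}
```

In this model a Keyword value such as a Hibernate ID is matched only as a whole, while an Unstored value can be searched word by word but cannot be read back from a hit.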
Since we store the Hibernate ID as a keyword, we can immediately retrieve the full object from the database and don't need unindexed or text fields, although they may be useful for tools that do not go through Hibernate.

We now need to provide Lifecycle methods that incrementally update a Lucene index.

    // Hibernate 2.x and Lucene 1.x package names
    import java.io.File;
    import java.io.IOException;
    import java.io.Serializable;
    import net.sf.hibernate.CallbackException;
    import net.sf.hibernate.Lifecycle;
    import net.sf.hibernate.Session;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class Example implements Lifecycle {
        /** Directory containing Lucene index */
        File idx;
        ...

        /**
         * Open up a Lucene IndexWriter on the existing index.
         */
        protected IndexWriter getIndexWriter() throws IOException {
            return new IndexWriter(idx, new StopAnalyzer(), false);
        }

        /**
         * Open a Lucene IndexReader.
         */
        protected IndexReader getIndexReader() throws IOException {
            return IndexReader.open(idx);
        }

        /**
         * Saving an object for the first time - add it to the Lucene
         * index...
         */
        public boolean onSave(Session s) throws CallbackException {
            try {
                IndexWriter writer = getIndexWriter();
                writer.addDocument(getDocument());
                writer.close();
            } catch (IOException e) {
                throw new CallbackException(e.getMessage());
            }
            return false; // do not veto the save
        }

        /**
         * Updating an object - must delete old object and reinsert it.
         */
        public boolean onUpdate(Session s) throws CallbackException {
            try {
                IndexReader reader = getIndexReader();
                reader.delete(new Term("id", id.toString()));
                reader.close(); // release the index before opening a writer

                IndexWriter writer = getIndexWriter();
                writer.addDocument(getDocument());
                writer.close();
            } catch (IOException e) {
                throw new CallbackException(e.getMessage());
            }
            return false;
        }

        /**
         * Deleting an object.
         */
        public boolean onDelete(Session s) throws CallbackException {
            try {
                IndexReader reader = getIndexReader();
                reader.delete(new Term("id", id.toString()));
                reader.close(); // commit the deletion
            } catch (IOException e) {
                throw new CallbackException(e.getMessage());
            }
            return false;
        }

        /**
         * Loading an object - we don't have to do anything here.
         */
        public void onLoad(Session s, Serializable id) {
        }
    }

In practice these methods would often be put into a base class for all persistent objects, with the getIndexReader and getIndexWriter methods overridden if desired to provide disjoint indexes.
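The reason an update has to delete the old entry before adding the new one can be shown with a toy in-memory inverted index. This is only a sketch in plain Java: ToyIndex and its methods are hypothetical names invented for this example, and the code models the idea of "delete by id, then re-add" rather than how Lucene stores documents internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index keyed by entity id; hypothetical, not the Lucene API. */
final class ToyIndex {
    /** term -> ids of the entries containing that term */
    private final Map<String, Set<Long>> postings = new HashMap<>();
    /** id -> terms of that entry, so an entry can be removed by id */
    private final Map<Long, Set<String>> docs = new HashMap<>();

    /** Index the text under the given id (naive whitespace tokenizer). */
    void add(long id, String text) {
        Set<String> terms =
            new HashSet<>(Arrays.asList(text.trim().toLowerCase().split("\\s+")));
        docs.put(id, terms);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Remove every posting that belongs to the given id. */
    void delete(long id) {
        Set<String> terms = docs.remove(id);
        if (terms == null) {
            return; // nothing indexed under this id
        }
        for (String term : terms) {
            Set<Long> ids = postings.get(term);
            ids.remove(id);
            if (ids.isEmpty()) {
                postings.remove(term);
            }
        }
    }

    /** Update = delete the stale entry, then add the fresh one. */
    void update(long id, String text) {
        delete(id);
        add(id, text);
    }

    /** Ids of all entries containing the term. */
    Set<Long> search(String term) {
        Set<Long> ids = postings.get(term.toLowerCase());
        return ids == null
            ? Collections.<Long>emptySet()
            : Collections.unmodifiableSet(ids);
    }
}
```

If add were called a second time for the same id without the preceding delete, the postings for the old text would remain and searches for outdated terms would still return that id. That stale state is what the delete-then-add sequence in onUpdate avoids.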
Implementation of the user interface and search functionality is left as an exercise for the reader.

Gavin points out that Lifecycle.onUpdate() is only called on explicit calls to update(). Making those calls everywhere isn't hard to code... but murder to maintain. All it takes is one oversight and your index will get out of sync with your database. A second solution, using Interceptors, is discussed below.

Another approach is to incrementally update the Lucene index in an Interceptor. We begin by defining a new interface.

    public interface Searchable {
        /**
         * Get Lucene IndexWriter - can be different for each class allowing
         * multiple indexes
         */
        public IndexWriter getIndexWriter() throws IOException;

        /**
         * Get Lucene IndexReader - must refer to same directory as
         * getIndexWriter.
         */
        public IndexReader getIndexReader() throws IOException;

        /**
         * Get Lucene Document describing our searchable content. The
         * term keywords "id" and "classname" are reserved by our interceptor.
         */
        public Document getDocument();
    }

Any persistent class that we want to make searchable simply implements these three methods.

We now define an interceptor. To mix things up a bit the interceptor handles the Hibernate ID and classname, not the target class.

    public class LuceneInterceptor implements Interceptor, Serializable {

        /**
         * Drop object from Lucene index
         */
        public void drop(Searchable entity, Long id) throws IOException {
            IndexReader reader = entity.getIndexReader();
            reader.delete(new Term("id", id.toString()));
            reader.close(); // commit the deletion and release the index
        }

        /**
         * Add object to Lucene index
         */
        public void add(Searchable entity, Long id) throws IOException {
            Document doc = entity.getDocument();
            doc.add(Field.Keyword("id", id.toString()));
            doc.add(Field.Keyword("classname", entity.getClass().getName()));
            IndexWriter writer = entity.getIndexWriter();
            writer.addDocument(doc);
            writer.close();
        }

        /**
         * Method called when an existing record is updated.
         */
        public boolean onFlushDirty(
                Object entity,
                Serializable id,
                Object[] currentState,
                Object[] previousState,
                String[] propertyNames,
                Type[] types) throws CallbackException {
            if (entity instanceof Searchable) {
                if (id instanceof Long) {
                    try {
                        drop((Searchable) entity, (Long) id);
                        add((Searchable) entity, (Long) id);
                    } catch (IOException e) {
                        throw new CallbackException(e.getMessage());
                    }
                } else {
                    // unsupported ID
                }
            }
            return false; // entity state was not modified
        }

        /**
         * Method called when a new record is saved.
         */
        public boolean onSave(
                Object entity,
                Serializable id,
                Object[] state,
                String[] propertyNames,
                Type[] types) throws CallbackException {
            if (entity instanceof Searchable) {
                if (id instanceof Long) {
                    try {
                        add((Searchable) entity, (Long) id);
                    } catch (IOException e) {
                        throw new CallbackException(e.getMessage());
                    }
                } else {
                    // unsupported ID
                }
            }
            return false;
        }

        /**
         * Method called when an existing record is deleted.
         */
        public void onDelete(
                Object entity,
                Serializable id,
                Object[] state,
                String[] propertyNames,
                Type[] types) throws CallbackException {
            if (entity instanceof Searchable) {
                if (id instanceof Long) {
                    try {
                        drop((Searchable) entity, (Long) id);
                    } catch (IOException e) {
                        throw new CallbackException(e.getMessage());
                    }
                } else {
                    // unsupported ID
                }
            }
        }

        // rest of methods elided as they follow default behavior
    }
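Because every indexed document carries the "id" and "classname" keywords, a search hit contains enough information to reload the matching entity. The following is a rough sketch of that lookup step in plain Java. HitResolver is a hypothetical helper, the hit is modelled as a simple map of stored fields, and the loader function is an assumed stand-in for whatever call loads an entity by class and ID in your persistence layer; none of these names come from Hibernate or Lucene.

```java
import java.util.Map;
import java.util.function.BiFunction;

/** Turns the stored fields of a search hit back into an entity (hypothetical helper). */
final class HitResolver {
    /** Assumed stand-in for "load an entity of this class with this id". */
    private final BiFunction<Class<?>, Long, Object> loader;

    HitResolver(BiFunction<Class<?>, Long, Object> loader) {
        this.loader = loader;
    }

    /**
     * Resolve one hit. The map is expected to contain the reserved
     * "classname" and "id" keywords written at indexing time.
     */
    Object resolve(Map<String, String> hit) {
        String className = hit.get("classname");
        String rawId = hit.get("id");
        if (className == null || rawId == null) {
            throw new IllegalArgumentException("hit lacks classname or id");
        }
        try {
            Class<?> type = Class.forName(className);
            return loader.apply(type, Long.valueOf(rawId));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("unknown class: " + className, e);
        }
    }
}
```

Keeping the loader as a function parameter means the resolver can be exercised without a database, and the same code works for every Searchable class since the class name travels with the hit.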

