
The Neurotic Fishbowl

[/*Java*/][Repost] Using Lucene with OJB
Posted by nybon on 2004/12/24 23:03:38

Search is important! All too often, "search" means a SQL clause like where thing like '%that%'. Users know Google, and quite a few even know its query language by now. Beyond our own wish to offer more search functionality, users expect it. Google seems simple, doesn't it? Enter Lucene. I'll presume you've at least heard of it, if not used it. Lucene does full-text indexing, and that is it. It does this really well. The beauty (well, one of them) is that you can index anything. In this case, I'll index an object persisted by OJB. The key is to embed the information required to retrieve the document being indexed.

Take a gander at a fairly simple Student class (this is from an app I am writing for my little brother, who is a professor (of such terrible subjects as rock climbing and white-water kayaking, don't get me started)). The primary use case for this application is a student co-op employee finding a student in the system, then finding gear and checking it out for that student. Finding the student is key, and that is best served by... searching! So we have a database record for each student and want a convenient search facility that can search by name, student id (idNumber), phone number, even address. Lucene makes this a snap.
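The Student class itself is not reproduced in this repost. As a rough stand-in, here is a minimal bean sketch that assumes only the properties StudentIndexer reads (id, idNumber, name, address and phone); the field types, the constructors and the setters are assumptions, not the author's code.

```java
/**
 * Hypothetical Student bean: the original listing is not included in this post.
 * Property names are inferred from the getters that StudentIndexer calls.
 */
class Student {
    private Integer id;       // internal primary key, assigned by the persistence layer
    private String idNumber;  // the student id number that users search for
    private String name;
    private String address;
    private String phone;

    /** No-arg constructor, which bean-style persistence tools generally expect. */
    public Student() {
    }

    public Student(Integer id, String idNumber, String name, String address, String phone) {
        this.id = id;
        this.idNumber = idNumber;
        this.name = name;
        this.address = address;
        this.phone = phone;
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getIdNumber() { return idNumber; }
    public void setIdNumber(String idNumber) { this.idNumber = idNumber; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }

    public String getPhone() { return phone; }
    public void setPhone(String phone) { this.phone = phone; }
}
```

Keeping id as an Integer matches the new Integer(...) conversion that findStudents applies to the primary key stored in the index.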
To do it, we just store the id (the internal primary key) in an unindexed field when we add a student in the StudentIndexer:

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public void add(final Student student) throws ServiceException {
    final Document doc = new Document();
    // Tokenized, indexed and stored: these are the searchable fields
    doc.add(Field.Text(NAME, student.getName()));
    doc.add(Field.Text(ID_NUMBER, student.getIdNumber()));
    doc.add(Field.Text(ADDRESS, student.getAddress()));
    doc.add(Field.Text(PHONE, student.getPhone()));
    // Stored but not indexed: the primary key comes back with each hit
    doc.add(Field.UnIndexed(IDENTITY, student.getId().toString()));
    try {
        synchronized (mutex) {
            // create == false: open the existing index instead of overwriting it
            final IndexWriter writer = new IndexWriter(index, analyzer, false);
            try {
                writer.addDocument(doc);
                writer.optimize(); // merges segments; expensive if done on every add
            } finally {
                writer.close(); // always release the index write lock
            }
        }
    } catch (IOException e) {
        throw new ServiceException("Unable to index student", e);
    }
}
```

Notice the UnIndexed field on the Document? This tells Lucene to store the field with the record but not to index it or search on it. When you retrieve the document you still get the field back, which makes it the perfect place to stash the primary key. When we look up students, though, we don't want Lucene Document instances back; we want the nice domain-model Student instances.
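To see why the split between searchable fields and a stored-only field is useful, here is a toy, Lucene-free model of it in plain Java. It is a deliberately simplified sketch (a naive lower-casing tokenizer, no scoring, no persistence), and the ToyIndex class and its methods are hypothetical names; it only shows that text fields go into an inverted index while the identity value is merely carried along with each document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy index: NOT Lucene, only the idea behind Field.Text versus Field.UnIndexed. */
class ToyIndex {
    // "field:term" -> numbers of the documents containing that term (the inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document number -> stored-only identity (the primary key); never tokenized or searched
    private final List<String> storedIdentity = new ArrayList<>();

    /** Indexes the text fields, stores the identity, and returns the new document number. */
    int add(Map<String, String> textFields, String identity) {
        final int docNum = storedIdentity.size();
        storedIdentity.add(identity);
        for (Map.Entry<String, String> field : textFields.entrySet()) {
            if (field.getValue() == null) {
                continue;
            }
            // naive tokenizer: lower-case, then split on runs of non-letter, non-digit characters
            for (String token : field.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + token, k -> new TreeSet<>()).add(docNum);
                }
            }
        }
        return docNum;
    }

    /** Returns the stored identities of documents whose field contains the single term. */
    List<String> search(String field, String term) {
        final Set<Integer> docs = postings.getOrDefault(
                field + ":" + term.toLowerCase(Locale.ROOT), Collections.emptySet());
        final List<String> ids = new ArrayList<>();
        for (int doc : docs) {
            ids.add(storedIdentity.get(doc)); // hand back the stashed key, not the matched text
        }
        return ids;
    }
}
```

Searching the name field for Aching returns the stored key, which is the value findStudents later turns back into a Student, while searching an identity field for that key finds nothing because the key was only stored, never indexed.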
What we'll do is query the index, pull the primary keys out of the hits, and then select the domain objects using those keys (from the StudentIndex):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public List findStudents(final String search) throws ServiceException {
    return this.findStudents(search, Integer.MAX_VALUE);
}

public List findStudents(final String search, final int numberOfResults) throws ServiceException {
    final Query query;
    try {
        // Terms without an explicit field prefix are searched against NAME
        query = QueryParser.parse(search, StudentIndexer.NAME, analyzer);
    } catch (ParseException e) {
        throw new ServiceException("Unable to make any sense of the query", e);
    }

    // Primary keys of the matching students, in hit (relevance) order
    final ArrayList ids = new ArrayList();
    try {
        final IndexReader reader = IndexReader.open(index);
        final IndexSearcher searcher = new IndexSearcher(reader);
        try {
            final Hits hits = searcher.search(query);
            for (int i = 0; i != hits.length() && i != numberOfResults; ++i) {
                final Document doc = hits.doc(i);
                ids.add(new Integer(doc.getField(StudentIndexer.IDENTITY).stringValue()));
            }
        } finally {
            searcher.close();
            reader.close();
        }
    } catch (IOException e) {
        throw new ServiceException("Error while reading student data from index", e);
    }

    // The database returns rows in its own order, so re-sort them into hit order
    final List students = dao.findStudentsWithIdsIn(ids);
    Collections.sort(students, new Comparator() {
        public int compare(final Object o1, final Object o2) {
            final Integer id_1 = ((Student) o1).getId();
            final Integer id_2 = ((Student) o2).getId();
            if (id_1.equals(id_2)) {
                return 0;
            }
            // whichever id appears first in the hit list sorts first
            for (int i = 0; i != ids.size(); i++) {
                final Integer integer = (Integer) ids.get(i);
                if (integer.equals(id_1)) {
                    return -1;
                }
                if (integer.equals(id_2)) {
                    return 1;
                }
            }
            return 0;
        }
    });
    return students;
}
```

The findStudents(String, int) method is a bit more complex than I like, as it does several things: it queries the Lucene index, extracts the primary keys from the hits, queries for the students matching those keys (via the StudentDAO), and finally sorts the results. The sort is needed because there is no way to specify the order in the database query; the right order is the order of the hits from the Lucene query.
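The anonymous Comparator in findStudents walks the id list on every comparison, so each comparison costs time proportional to the number of hits. A common alternative is to compute each id's position in the hit list once and sort by that position. The sketch below uses modern Java (generics, lambdas) rather than the post's Java 1.4 style; HitOrder and sortByHitOrder are hypothetical names, not the author's code.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: re-orders entities loaded by primary key into search-hit order. */
final class HitOrder {
    private HitOrder() {
    }

    static <T> void sortByHitOrder(List<T> entities, List<Integer> hitIds, Function<T, Integer> idOf) {
        // id -> position in the hit list, computed once instead of per comparison
        final Map<Integer, Integer> rank = new HashMap<>();
        for (int i = 0; i < hitIds.size(); i++) {
            rank.putIfAbsent(hitIds.get(i), i); // keep the first (best) position if an id repeats
        }
        // entities whose id is not among the hits sort to the end; List.sort is stable
        entities.sort(Comparator.comparingInt(
                (T e) -> rank.getOrDefault(idOf.apply(e), Integer.MAX_VALUE)));
    }
}
```

In findStudents this would amount to something like HitOrder.sortByHitOrder(students, ids, Student::getId), assuming the raw List and ArrayList there are given generic types first.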
With that, we support queries as simple as Tiffany, or more fun ones such as name:Aching phone:???-1234. Go look at the Lucene query parser syntax. One caveat from it: the parser does not accept * or ? as the first character of a term, so a leading-wildcard pattern like ???-1234 will be rejected, while a pattern that starts with literal characters will parse. It is worth noting that findStudents defaults to searching the name field when no field is specified. This seems to make sense to me =)

If you look at the StudentIndex and StudentIndexer you will see there are also facilities for adding and removing documents from the Lucene index. These come into play on every insert, update, and delete. Updates are the important ones to catch, as you need to remove the old entry from the index and insert a new one. In my opinion this is best done via an aspect that picks out these operations. That is outside the scope of this article, though ;-)

For a larger application with more things being indexed (this one has just two searchable domain types) I might generalize the search capability via a DocumentFactory such as:

```java
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// DocumentFactory is an application interface (not a Lucene class) declaring Document build(Object entity)
public class BeanDocumentFactory implements DocumentFactory {
    public Document build(Object entity) {
        final Document document = new Document();
        try {
            // Stop at Object so the getClass() "class" property is not indexed
            final BeanInfo info = Introspector.getBeanInfo(entity.getClass(), Object.class);
            final PropertyDescriptor[] props = info.getPropertyDescriptors();
            for (int i = 0; i != props.length; ++i) {
                final PropertyDescriptor prop = props[i];
                final Method reader = prop.getReadMethod();
                if (reader == null) {
                    continue; // write-only property: nothing to index
                }
                final Object value = reader.invoke(entity, new Object[]{});
                if (value == null) {
                    continue; // avoid indexing the literal string "null"
                }
                document.add(Field.Text(prop.getName(), String.valueOf(value)));
            }
        } catch (Exception e) {
            throw new RuntimeException("Handle these in real application", e);
        }
        return document;
    }
}
```

But I have not needed to generalize it for a real project yet =)

Speaking of Lucene (which rocks), I am eagerly anticipating Erik Hatcher's new book, Lucene in Action. If it is anything like Erik and Steve Loughran's Java Development with Ant, Lucene will be lucky to have it in circulation.
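The reflection step of BeanDocumentFactory can be tried on its own, without Lucene on the classpath. The sketch below (BeanProperties is a hypothetical name, part of neither the original nor Lucene) collects each readable, non-null bean property as a name/string pair; a real factory would wrap each pair in a Lucene field. Passing Object.class as the stop class to Introspector.getBeanInfo keeps the getClass() "class" property out of the results.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical Lucene-free sketch of the property-reading half of BeanDocumentFactory. */
final class BeanProperties {
    private BeanProperties() {
    }

    static Map<String, String> textProperties(Object bean) {
        final Map<String, String> result = new LinkedHashMap<>();
        try {
            // stop class Object.class: only properties declared below Object are reported
            final BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
            for (PropertyDescriptor prop : info.getPropertyDescriptors()) {
                final Method reader = prop.getReadMethod();
                if (reader == null) {
                    continue; // write-only property
                }
                final Object value = reader.invoke(bean);
                if (value != null) {
                    result.put(prop.getName(), String.valueOf(value));
                }
            }
        } catch (IntrospectionException | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("Cannot read properties of " + bean.getClass().getName(), e);
        }
        return result;
    }
}
```

Applied to a Student bean this would yield one entry per non-null getter, which is also why an identity field still has to be added separately if hits are to be mapped back to domain objects.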
About the author: Brian McCallister. Blog: http://kasparov.skife.org/blog/

