中文XML论坛 (Chinese XML Forum) - a discussion board for XML technology

-  中文XML论坛 (Chinese XML Forum)  (http://bbs.xml.org.cn/index.asp)
--  Semantic Web / Description Logic / Ontology  (http://bbs.xml.org.cn/list.asp?boardid=2)
----  trip report for SIGMOD 2009  (http://bbs.xml.org.cn/dispbbs.asp?boardid=2&rootid=&id=76371)

--  Author: whfcarter
--  Posted: 8/14/2009 12:48:00 PM

--  trip report for SIGMOD 2009
I finally have some time to write my trip report for SIGMOD 2009 and would like to share some of my experiences with you. This year's SIGMOD/PODS was held in Providence, Rhode Island, USA, from June 29 to July 2. Note that PODS is always held together with SIGMOD; it focuses mainly on theoretical database research.

This was my first time attending SIGMOD. Although the program schedule was a bit tight, I heard a lot of interesting talks and saw some nice presentations and great demos during the main conference.

Here are some valuable findings and new trends worth tracking:
1) new hardware for DB?
See the invited talk "Storage Class Memory: Technology, Systems and Applications" by Richard F. Freitas (IBM). There was also a separate session named "Databases on Modern Hardware". I am sure this topic will remain popular, since the characteristics of new hardware (especially SCM) break several old constraints and assumptions of traditional data management while bringing new challenges.

2) cloud computing
In my opinion, cloud computing is the business realization of parallel computing, grid computing and distributed computing, and it maximizes the business value of these technologies. For the DB community, a new cross-field research topic has emerged: data management in the cloud. It felt as if almost everyone was talking about the state of the art and the future of cloud computing. Newcomers also showed interest and raised interesting questions with the experts during coffee breaks. If you are interested, see "Distributed Data-Parallel Computing Using a High-Level Programming Language" by MSR. You can also look at "A Comparison of Approaches to Large-Scale Data Analysis" from the "Large-Scale Data Analysis" session. It proposes a benchmark (with several tasks drawn from both parallel DBMS and map-reduce workloads) to compare the map-reduce framework with parallel DBMSs (both row-oriented and column-oriented). It tries to tell us in which cases map-reduce is the right choice and in which cases a parallel DBMS is better. Although it is a bit biased and only shows preliminary results on up to 100 nodes, it is worth a closer look to understand the differences between the two choices.

3) keyword search
Although it has been studied for a long time, it is still a hot topic for DB. If you want to learn the basics of keyword search over structured data (DB, XML and graph data) and get a clear picture of the existing work, I recommend the half-day tutorial "Keyword Search on Structured and Semi-Structured Data". You can find the slides on Wei Wang's homepage. There was also a separate research session on keyword search, and one paper, "Combining Keyword Search and Forms for Ad Hoc Querying of Databases", in the Data on the Web session.

4) Data Fusion
As data keeps growing, this is still an open problem. This year there were two sessions related to data fusion: Data Integration and Entity Resolution. The former is interested in schema-level mapping, while the latter focuses on the data level. In fact, the problem is not limited to data fusion; you can also consider service composition or lightweight approaches (e.g. mashups). The work also emphasizes the ability to handle data change and inconsistency at large scale.

5) Semantic Web related
It is still not mainstream. However, since I am from the SW community, I paid close attention to the related work presented at SIGMOD. There were three kinds of work: ontology matching ("A Gauss Function based Approach for Unbalanced Ontology Matching" by Tsinghua University and IBM China Research Lab), RDF triple stores ("Scalable Join Processing on Very Large RDF Graphs" by MPI, Germany), and semantic search ("Hermes: A Travel through Semantics on the Data Web" by us, Shanghai Jiao Tong University). If you are interested, have a look.

I do not plan to describe each interesting topic in detail. You can also track the development of column stores. This year, cross-field research topics were recognized as a trend for DB: Computer Human Interaction (CHI) and Information Visualization (IV) for DB (see the invited talk "Transforming Data Access Through Public Visualization"), data management in online games (see the tutorial "Database Research in Computer Games"), and new hardware for DB (see the tutorial "FPGA (Field Programmable Gate Array): What's in it for a Database?").

I also liked the social events at this year's SIGMOD. I attended the "new researcher symposium" and learned how to plan a research career. I also attended the "Relational Data model 40 years celebration" and heard interesting stories about Edgar F. Codd. The business meeting and closing ceremony were also worth attending. Daniel Abadi won this year's Jim Gray Dissertation Award for his excellent work on C-store (a kind of column-oriented store). The SIGMOD Edgar F. Codd Innovations Award went to Masaru Kitsuregawa for his contributions to hash join and parallel computing.

--  Author: whfcarter
--  Posted: 8/14/2009 12:53:00 PM

--
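One more note: at SIGMOD I saw Microsoft heavily promoting Bing, which surprised me a little. Bing is, after all, Microsoft's new search engine, and such a scene would be no surprise at SIGIR. On reflection, though, it makes sense: many of its new features and search improvements (especially local search and vertical search) rely heavily on structured data, which ties it more closely to databases. I think this is good news for those of us working on semantic search. With Yahoo SearchMonkey, Google Rich Snippets and Microsoft's Bing all using such data and techniques, the future of DB or SW+IR looks fairly bright.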

--  Author: Humphrey
--  Posted: 8/14/2009 3:39:00 PM

--
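According to a report on Xinwen Lianbo (新闻联播), Microsoft and Yahoo have reached a 10-year agreement: Microsoft's Bing search engine will adopt Yahoo's technology, while Yahoo will handle the two companies' joint advertising business.
I wonder what will eventually become of these two search engines. Will only one remain, or will both coexist?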

--  Author: viaphone
--  Posted: 8/19/2009 10:37:00 AM

--
I gained a lot from your report, thanks.