View topic as text - Chinese XML Forum - a professional XML technology discussion board (http://bbs.xml.org.cn/index.asp) -- 『 Semantic Web / Description Logic / Ontology 』 (http://bbs.xml.org.cn/list.asp?boardid=2) ---- Representing Time in RDF (1) (http://bbs.xml.org.cn/dispbbs.asp?boardid=2&rootid=&id=76326)
-- Author: admin -- Posted: 8/11/2009 9:34:00 AM -- Representing Time in RDF (1)

http://iandavis.com/blog/2009/08/time-in-rdf-1

Note: the author of this article, Ian Davis, is the CTO of Talis, a well-known Semantic Web company.

Representing Time in RDF Part 1

Way back in 2006 I wrote a blog post concerning the modelling of time in RDF (see [URL=http://iandavis.com/blog/2006/03/refactoring-bio-with-einstein-part-3-temporal-invariants]Refactoring Bio With Einstein Part 3: Temporal Invariants[/URL]). That post also provoked [URL=http://dig.csail.mit.edu/breadcrumbs/node/101]some discussion in the blogosphere[/URL]. Although I haven’t written anything on the subject for the past three years, I haven’t stopped thinking about it. In fact I’ve been working quite hard on the problem, mainly by modelling real data, especially geographical information. This is the first of a series of blog posts describing my experiments.

I’d like to thank [URL=http://www.ldodds.com/blog/]Leigh Dodds[/URL] and [URL=http://www.jenitennison.com/blog/]Jeni Tennison[/URL] who gave me valuable feedback on an earlier version of this write-up.

In a comment to my blog post [URL=http://www.fruitfly.org/%7Ecjm/]Chris Mungall[/URL] made an excellent point about the importance of solving the time problem:

However, it’s also seems clear to me that this is a recipe for trouble for the semantic web. Surely all real-world data that concerns non-trivial applications such as science and electronic health records, or any kind of human activity _must_ take time into account? Which ever hack you make to account for time, it has to propagate through all your ontologies. An ontology that treats the world as time-slices can’t interoperate with one that has a standard view of objects and processes. It may be just about workable, but I can’t see it being anything other than tremendously complicated. We’ll essentially end up with layering 3-place relations on top of RDF in an extremely inelegant way.
This is not made clear when people are lured into the semantic web with examples of toy ontologies about pizzas that live floating in some mathematical space untroubled by time. Unless more is done to address these issues (and I commend this article for tackling this) the semantic web will face a huge backlash when people start realising they have to warp their ontologies and refactor their instance data to deal with time in order to represent real entities. Why is there no best practices document on representing instances that vary in time (that is, all real-world instances)? I do find it curious that more people aren’t making noise about this problem – I can only conclude that there’s a dearth of serious applications using RDF or OWL for instance data.

Those comments are still true today and in fact they are being accentuated by the wide availability of data brought about by the successful [URL=http://linkeddata.org/]Linked Data[/URL] project. For example dbpedia, freebase and geonames all have descriptions of London (in England) and their URIs are all declared to be owl:sameAs one another:

* http://dbpedia.org/resource/London

As another example of the kind of data that I should be able to model in RDF, consider the city of Istanbul. During its long and varied history it has been named Byzantium, New Rome, Constantinople and Stamboul (see [URL=http://en.wikipedia.org/wiki/Names_of_Istanbul]Wikipedia’s page on the names of Istanbul[/URL]). At various times it has been the capital city of the Roman Empire, the Byzantine/Eastern Roman Empire (twice), the Latin Empire, the Ottoman Empire and modern Turkey, and of course its extent has varied considerably over that period of time too. No existing geographic ontology can model that variation in properties accurately enough for me to write a query to return the name of that city during the sixth crusade.
My main requirements for modelling time are:

* to be able to query the properties of and relations between entities at any point in time
* to be able to sequence data in relative terms such as before, after and during

In this post I’m going to explore some of the possible solutions to this modelling problem. I take four main approaches:

1. [URL=http://iandavis.com/blog/2005/10/refactoring-bio-with-einstein-part-2-conditions]Conditions[/URL] were my invention to model the state of being of an individual at a point in time (basically time slices like [URL=http://www.cyc.com/cycdoc/vocab/time-vocab.html#subAbstractions]CYC sub abstractions[/URL]).

I’m going to illustrate the various approaches using three scenarios, all drawn from problems in the genealogy field, which happens to be both an enduring interest of mine and a minefield for time-insensitive applications:

1. In the first a woman is born as “Maria Smith” in 1867 and marries “Richard Johnson” in 1888. I want to write a SPARQL query that gives me her name so I can find her in the 1891 census.

For completeness, there were approaches that I didn’t consider in detail:

* [URL=http://www.dcc.uchile.cl/%7Ecgutierr/papers/temporalRDF.pdf]Temporal RDF[/URL] introduces a fourth time component to the triple. I chose not to cover this approach in a lot of detail because it extends the RDF model in a way that no current triple store implements and it requires a numeric time to be associated with each triple, preventing relative times from being expressed.

It is worth noting that the scenarios analysed in these posts are very specialist. Most data modelling is only concerned with “The Now”. The data and corresponding queries I show in the following posts are quite convoluted and don’t reflect usual usage of RDF. This would likely be true of any data representation format that attempted to model time-varying properties of arbitrary things.
I have split this write-up into six parts, of which this post is the first: * Part 1: Introduction
-- Author: admin -- Posted: 8/11/2009 9:36:00 AM -- Representing Time in RDF (2)

Representing Time in RDF Part 2

Approach 1: Conditions and Time Slices

[URL=http://iandavis.com/blog/2005/10/refactoring-bio-with-einstein-part-2]Conditions[/URL] were my invention to model the state of being of an individual at a point in time. They are basically [URL=http://sw.opencyc.org/concept/Mx4rvWn4OZwpEbGdrcN5Y29ycA]time slices[/URL] as used in OpenCyc (the original CYC notion of [URL=http://www.cyc.com/cycdoc/vocab/time-vocab.html#subAbstractions]subAbstractions[/URL] seems to be missing from OpenCyc).

For this scenario I’m going to create a description of Maria which refers to two conditions to represent her before and after her marriage:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:maria
thing:mariaUnmarried
thing:mariaMarried
Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s1.ttl]a1s1.ttl[/URL]

The URI thing:maria represents Maria the person. thing:mariaUnmarried and thing:mariaMarried are representations of her state of being at various times. Because they are the subjects of foaf:name properties they must be foaf:Agents. They are also the subjects of OWL-Time properties, which implies that they are time:Intervals too. This may be a hint that this approach is flawed: can any foaf:Agent also be an interval of time? They can certainly exist for an interval of time, but is it appropriate to use the OWL-Time properties in this way?

Note that I have simplified this by putting an end year on her married name. A twist here is that even if Maria Johnson died in 1920 she is still known by that name, so realistically that name should hold for an open-ended time interval. However, if I left the end year off then I would not be able to distinguish between her name remaining the same from a point in time onwards and the situation where it is not known when she next changed her name.
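The condition model and the open-ended interval problem can be sketched outside RDF. The following Python is only an illustration, not the Turtle from a1s1.ttl: the `Condition` class and the `names_in_year` helper are hypothetical names, the 1920 end year is the simplification mentioned above, and an end of `None` is an assumed way of marking an interval whose end is open or unknown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Condition:
    """A time slice of an individual: a name that holds over an interval of years."""
    name: str
    start: int
    end: Optional[int]  # None: the end of the interval is open or unknown (assumption)


def holds_in(cond: Condition, year: int) -> bool:
    """True if the condition's interval contains the given year."""
    if year < cond.start:
        return False
    return cond.end is None or year <= cond.end


def names_in_year(conditions: List[Condition], year: int) -> List[str]:
    """Names of every condition that holds in the given year."""
    return [c.name for c in conditions if holds_in(c, year)]


# Maria's two conditions, with the simplified (closed) married interval.
maria = [
    Condition("Maria Smith", 1867, 1888),
    Condition("Maria Johnson", 1888, 1920),
]

print(names_in_year(maria, 1891))
```

Note that `None` cannot by itself tell "the name never changed again" apart from "the next change is not known", which is exactly the ambiguity described above.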
I can now write a SPARQL query to find Maria’s name in 1891:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s1.sq]a1s1.sq[/URL]

This works well for my simplified situation. If I had an open-ended time interval for her name after her marriage then I would need to add a union clause to this query to match that case. To cover intervals with a known end but unknown start would require still another union block.

Scenario two is structured in a fairly similar way. I describe two resources as being counties and one as being a parish. Just like the description of Maria, the description of Widford refers to two conditions that represent the parish’s association with each county.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:oxfordshire
thing:gloucestershire
thing:widford
thing:widfordInGloucestershire
thing:widfordInOxfordshire
Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s2.ttl]a1s2.ttl[/URL]

My SPARQL query is quite simple. I just need to find the name of the condition that starts in or before 1841 and ends in or after that year.

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s2.sq]a1s2.sq[/URL]

I describe resources representing the three places: thing:lymeregis, thing:charmouth and thing:hastings. Then I describe thing:anon which is a person with a condition for each location they have lived in. These conditions are described with an ex:residence property that links the condition to the place and some time properties to describe when the condition applied.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:lymeRegis
thing:charmouth
thing:hastings
thing:anon a foaf:Person ;
thing:anonInLymeRegis
thing:anonInCharmouth
thing:anonInHastings
Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s3.ttl]a1s3.ttl[/URL]

My SPARQL query needs to find the latest address that contains a date before 1874 and also the address following that one.

prefix bio: <http://purl.org/vocab/bio/0.1/>
# Find the latest address that contains a date before 1874
thing:anon bio:condition ?condAfter .
?condBefore time:intervalBefore ?condAfter .
Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s3.sq]a1s3.sq[/URL]

So this seems to do the job. However I am omitting a major piece of detail: the places will also have conditions similar to those in Scenario 2. That is going to add a huge amount of complexity to the SPARQL query.

This approach seems very workable and there is [URL=http://plato.stanford.edu/entries/temporal-parts/]a large body of philosophical work[/URL] pertaining to this view of the world. In RDF terms though, there seems to be a problem with the domains of properties. Using the properties as I have done here implies that the “conditions” are both intervals of time and real-world things which seems awkward to me. I could separate the time intervals into new resources linked to the conditions with some kind of “existed during” predicate. This would complicate things a little, but perhaps not too greatly.
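The relative sequencing used in the third scenario can be sketched in a few lines of Python. This is a minimal illustration rather than a translation of a1s3.sq: the years attached to each condition are invented placeholders, the dictionary keys `contains` and `before` are hypothetical stand-ins for time:intervalContains and time:intervalBefore, and `residences_around` is an assumed helper name.

```python
# Each condition records a place, one year known to fall inside it, and the
# condition that directly follows it. All years are invented for illustration.
conditions = {
    "anonInLymeRegis": {"place": "Lyme Regis", "contains": 1861, "before": "anonInCharmouth"},
    "anonInCharmouth": {"place": "Charmouth", "contains": 1871, "before": "anonInHastings"},
    "anonInHastings": {"place": "Hastings", "contains": 1881, "before": None},
}


def residences_around(conds, year):
    """Return (place at the latest condition dated on or before year, next place)."""
    earlier = [(c["contains"], key) for key, c in conds.items() if c["contains"] <= year]
    if not earlier:
        return None  # nothing is known on or before the requested year
    _, key_before = max(earlier)  # latest dated condition
    key_after = conds[key_before]["before"]  # follow the relative ordering
    place_after = conds[key_after]["place"] if key_after else None
    return conds[key_before]["place"], place_after


print(residences_around(conditions, 1874))
```

Because the second address is found by following the ordering link rather than by comparing dates, the sketch returns only the immediately following residence, not every later one.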
-- Author: admin -- Posted: 8/11/2009 9:38:00 AM -- Representing Time in RDF (3)

Representing Time in RDF Part 3

Approach 2: Named Graphs

In this approach I physically divide my data up into separate graphs. Each graph contains triples that hold true for a specified time interval. One graph is designated as holding the time interval information for all the other graphs.

The first graph a2s1g1.ttl contains information about the other graphs. Specifically it describes the time periods for which their triples hold true.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:maria a foaf:Person .
<a2s1g2.ttl>
<a2s1g3.ttl>
Original file: a2s1g1.ttl

The triples say that the graph a2s1g2.ttl has a start time of 1867 and an end of 1888 and that a2s1g3.ttl starts in 1888 and ends at an arbitrary maximum date. This should be interpreted to mean that the triples contained in those graphs hold only between those dates. This means that I am saying a graph is also a time interval, which might be a problem. That could be fixed by introducing a new predicate with the meaning “holds during” that can relate a graph to a time interval and moving the time predicates from the graphs to new interval resources. That would make the following somewhat more complicated.

Note that because I’m using named graphs I won’t be able to use blank nodes to refer to things across the graphs.

The triples that hold true for Maria before she is married are held in a2s1g2.ttl:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
thing:maria foaf:name "Maria Smith" .
Original file: a2s1g2.ttl

The triples that hold true after her marriage are in a2s1g3.ttl:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
thing:maria foaf:name "Maria Johnson" .
Original file: a2s1g3.ttl

I only have one triple in each of these graphs but of course I could assert lots more facts that were true in the specific time periods for each graph. Overall this approach uses 8 triples compared to 10 used by Approach 1.
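The partitioning just described can be sketched with plain Python data, under some loud assumptions: triples are modelled as tuples rather than parsed from the .ttl files, `MAX_YEAR` is a placeholder for the "arbitrary maximum date", and `objects_at` is a hypothetical helper. The graph names, years and names of Maria are the ones given above.

```python
MAX_YEAR = 9999  # placeholder for the arbitrary maximum date (assumption)

# The "metadata" graph: the interval during which each named graph holds.
graph_intervals = {
    "a2s1g2.ttl": (1867, 1888),
    "a2s1g3.ttl": (1888, MAX_YEAR),
}

# The named graphs themselves, each a list of (subject, predicate, object).
graphs = {
    "a2s1g2.ttl": [("thing:maria", "foaf:name", "Maria Smith")],
    "a2s1g3.ttl": [("thing:maria", "foaf:name", "Maria Johnson")],
}


def objects_at(subject, predicate, year):
    """Objects of matching triples, taken only from graphs that hold in `year`."""
    results = []
    for graph_name, (start, end) in graph_intervals.items():
        if start <= year <= end:  # the time filter applies to the graph, not the triple
            for s, p, o in graphs[graph_name]:
                if s == subject and p == predicate:
                    results.append(o)
    return results


print(objects_at("thing:maria", "foaf:name", 1891))
```

The point of the sketch is that the time test is applied once per graph and every triple inside a selected graph is taken to hold for the whole interval.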
The SPARQL query has many similarities with Approach 1. The major difference is that the time period filters are applied to the named graphs, not the conditions. Note that in this query I assume a2s1g1.ttl is the default graph.

prefix foaf: <http://xmlns.com/foaf/0.1/>
select ?name where {
graph ?g {
Original file: a2s1.sq

In this scenario the data is split into four graphs. a2s2ag1.ttl holds the time interval descriptions for the other graphs.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
<a2s2ag2.ttl>
<a2s2ag3.ttl>
<a2s2ag4.ttl>
Original file: a2s2ag1.ttl

a2s2ag2.ttl holds information about Widford being in Gloucestershire:

@prefix ex: <http://example.org/ex#> .
thing:widford ex:partOf thing:gloucestershire .
Original file: a2s2ag2.ttl

a2s2ag3.ttl holds information about Widford being in Oxfordshire:

@prefix ex: <http://example.org/ex#> .
thing:widford ex:partOf thing:oxfordshire .
Original file: a2s2ag3.ttl

And finally, a2s2ag4.ttl holds information about the names of the places:

@prefix ex: <http://example.org/ex#> .
thing:oxfordshire
thing:gloucestershire
thing:widford
Original file: a2s2ag4.ttl

As can be seen from these examples, the data quickly becomes fragmented across many graphs. The granularity of this fragmentation depends on how frequently information about things changes over time. In the limit you might have one graph per year, per day or even per minute. However in practice you are likely to have graphs with long overlapping time intervals.

Naively you might expect the following query to work, following the pattern set by Scenario 1:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
graph ?g {

However that only works when both the ex:partOf and the foaf:name triples are in the same named graph. I didn’t see this in Scenario 1 because I was only interested in a single triple. Here I am trying to discover a relationship and a name, both of which could change at different times and so may be in different graphs.
So the query is a little complex:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
graph ?g {
?g2 time:start ?start2 .
graph ?g2 {
}
Original file: a2s2a.sq

This shows a major disadvantage of the named graph approach because I would have to repeat the time filtering of graphs for each triple I wanted to find!

Once again this graph (a2s3g1.ttl) contains time information about the other graphs. For this example I have chosen to include some facts about the places here too. By doing this I am basically saying that they are timeless facts, which helps simplify this approach.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:lymeRegis
thing:charmouth
thing:hastings
<a2s3g2.ttl>
<a2s3g3.ttl>
<a2s3g4.ttl>
Original file: a2s3g1.ttl

In this case I am using relative times for the graphs, saying that the triples in a2s3g2.ttl hold before the triples in a2s3g3.ttl which in turn hold true before the triples in a2s3g4.ttl.

The triples that assert my person lived in Lyme Regis are held in a2s3g2.ttl:

@prefix ex: <http://example.org/ex#> .
thing:anon ex:residence thing:lymeRegis .
Original file: a2s3g2.ttl

The triples that assert my person lived in Charmouth are held in a2s3g3.ttl:

@prefix ex: <http://example.org/ex#> .
thing:anon ex:residence thing:charmouth .
Original file: a2s3g3.ttl

And the triples for Hastings are in a2s3g4.ttl:

@prefix ex: <http://example.org/ex#> .
thing:anon ex:residence thing:hastings .
Original file: a2s3g4.ttl

Now the query looks like this:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?nameBefore ?nameAfter where {
?gAfter time:intervalContains ?dateAfter .
?gBefore time:intervalBefore ?gAfter .
graph ?gBefore {
graph ?gAfter {
?placeBefore foaf:name ?nameBefore .
}
Original file: a2s3.sq

What the first two clauses do is find graphs that might contain relevant triples. The first looks for a graph that has triples with a time:intervalContains predicate whose value is less than or equal to 1874.
The second repeats it, looking for graphs containing triples after that date. It then uses those two graphs to look up residence information for the person at those times. I have a suspicion that this query will bring back more results than are necessary if I had more data. I may have several addresses after 1874 and this query will bring them all back, not just the first.

It’s worth noting that this query would be even more complex if I had not chosen to treat the names of the places as timeless data. If they were in separate graphs then I would have to add additional clauses to find graphs that hold at the specified time, just like I did in Scenario 2.

Named graphs appear to partition the data very nicely. However it seems that they don’t make the querying any simpler. If it were possible to define a merge of all possible graphs that cover the time interval of interest and query that directly then it could be possible to write very natural queries and completely ignore the time component. This could be possible with a two-phase approach to running the queries or perhaps SPARQL sub-queries might help.

The main problem with named graphs is that they lie outside of the standard RDF model. In fact they are only really formalised by the SPARQL specification. There are no standardised serialisations for named graph data so it is not generally possible to query a SPARQL service and retrieve the named graph information. The TriG and TriX serialisations do support named graphs but they are not widely implemented in comparison to RDF/XML, Turtle or N-Triples, none of which support anything beyond the standard RDF model. I don’t know of any reasoners that can work across multiple named graphs like this either.
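The two-phase idea, merging every graph that covers the time of interest and then querying the merge with no time clauses, can be sketched as follows. This is a hedged Python illustration, not something the posts implement: the interval years are invented placeholders, the foaf:name literals stand in for the contents of a2s2ag4.ttl, and `merged_graph_at` and `county_names` are hypothetical helper names.

```python
# Phase-one input: the interval during which each named graph holds.
# All years here are placeholders chosen only to make the example run.
graph_intervals = {
    "a2s2ag2.ttl": (1000, 1844),
    "a2s2ag3.ttl": (1844, 9999),
    "a2s2ag4.ttl": (1000, 9999),  # place names, treated as always holding
}

graphs = {
    "a2s2ag2.ttl": {("thing:widford", "ex:partOf", "thing:gloucestershire")},
    "a2s2ag3.ttl": {("thing:widford", "ex:partOf", "thing:oxfordshire")},
    "a2s2ag4.ttl": {
        ("thing:oxfordshire", "foaf:name", "Oxfordshire"),
        ("thing:gloucestershire", "foaf:name", "Gloucestershire"),
        ("thing:widford", "foaf:name", "Widford"),
    },
}


def merged_graph_at(year):
    """Phase 1: union of the triples of every graph whose interval contains year."""
    merged = set()
    for graph_name, (start, end) in graph_intervals.items():
        if start <= year <= end:
            merged |= graphs[graph_name]
    return merged


def county_names(triples, parish):
    """Phase 2: an ordinary two-pattern query that never mentions time."""
    wholes = {o for s, p, o in triples if s == parish and p == "ex:partOf"}
    return sorted(o for s, p, o in triples if s in wholes and p == "foaf:name")


print(county_names(merged_graph_at(1841), "thing:widford"))
```

The relationship and the name come from different graphs, yet the second phase joins them without repeating the time filter for each triple pattern.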
-- Author: admin -- Posted: 8/11/2009 9:40:00 AM -- Representing Time in RDF (4)

Representing Time in RDF Part 4

Approach 3: Reified Relations

In this approach I reify every triple that is time-dependent. That means every triple is split out into its constituent parts which will obviously lead to an explosion of triples.

In this scenario I use thing:maria to refer to Maria, and I state the fact of her being a foaf:Person as a piece of timeless information. Facts about her name are reified and time interval information is attached. The URIs thing:mariaUnmarried and thing:mariaMarried represent the reified triples holding her name before and after marriage:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:maria a foaf:Person .
thing:mariaUnmarried
thing:mariaMarried
Original file: a3s1.ttl

Facts about her name are reified using custom reification predicates, not the standard RDF ones. I do this to allow more flexibility in the use of these predicates without requiring that their subject is always an RDF statement. Potentially I could have multiple occurrences of ex:subject, ex:property or ex:value which would allow more compact descriptions to be created. For example I could have a description where multiple objects are included for the same predicate and subject.

The query in this case is straightforward and quite similar to the query for Approach 1.

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
Original file: a3s1.sq

I simply search for resources that represent reified triples that hold in 1891. Then I select the ex:value of the resulting resource.

For simplicity I start off by treating the place name information as timeless, only reifying the information about relationships between the places:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:oxfordshire
thing:gloucestershire
thing:widford
thing:widfordInGloucestershire
thing:widfordInOxfordshire
Original file: a3s2.ttl

The query is again very simple and reminiscent of Approach 1:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
Original file: a3s2.sq

However, a more realistic approach would be to model the names of the places as time-dependent triples which means reifying them. I introduce some new resources thing:widfordName, thing:oxfordshireName and thing:gloucestershireName to represent the reified forms of those triples. Now the data is much more verbose:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:oxfordshire a ex:County .
thing:gloucestershire a ex:County .
thing:widford a ex:Parish .
thing:widfordInGloucestershire
thing:widfordInOxfordshire
thing:widfordName
thing:oxfordshireName
thing:gloucestershireName
Original file: a3s2a.ttl

Also the query becomes more complex:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
?p2 ex:subject ?x .
Original file: a3s2a.sq

Each triple being selected requires its time interval information to be specified separately, which is reminiscent of Approach 2.

Once again for simplicity I keep the place name information timeless. I introduce new resources for the reified statements about the residence of the person over time:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:lymeRegis
thing:charmouth
thing:hastings
thing:anon a foaf:Person .
thing:anonInLymeRegis
thing:anonInCharmouth
thing:anonInHastings
Original file: a3s3.ttl

The query is straightforward although quite verbose. If the place name triples were also reified then this query would suddenly become much more complex:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?nameBefore ?nameAfter where {
?pBefore time:intervalContains ?dateBefore .
?pAfter ex:subject thing:anon .
?pAfter time:intervalContains ?dateAfter .
?pBefore time:intervalBefore ?pAfter .
Original file: a3s3.sq

Approach 3 avoids the domain and range problem experienced by Approach 1 where the conditions were being used with properties whose domains were foaf:Agents. In Approach 3, properties are used with the appropriate resource types when they are timeless and never actually asserted when they are time-dependent.

However, this approach is tedious for large quantities of data. The semantics are all locked away behind the reified triples. Once again I don’t know of any reasoners that could work with this kind of data.
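The shape of the reified data, and the fact that time-dependent triples are never asserted directly, can be sketched in Python. This is an illustration under assumptions rather than the content of a3s1.ttl: the dictionary keys `start` and `end` stand in for whatever time properties the original file uses, the 1920 end year is the simplification from Part 2, and `values_at` is a hypothetical helper.

```python
# Timeless facts are asserted as ordinary triples.
timeless = {("thing:maria", "rdf:type", "foaf:Person")}

# Time-dependent facts exist only as reified statements with an interval.
reified = {
    "thing:mariaUnmarried": {
        "ex:subject": "thing:maria", "ex:property": "foaf:name",
        "ex:value": "Maria Smith", "start": 1867, "end": 1888,
    },
    "thing:mariaMarried": {
        "ex:subject": "thing:maria", "ex:property": "foaf:name",
        "ex:value": "Maria Johnson", "start": 1888, "end": 1920,
    },
}


def values_at(subject, prop, year):
    """ex:value of every reified statement about (subject, prop) that holds in year."""
    return [
        st["ex:value"]
        for st in reified.values()
        if st["ex:subject"] == subject
        and st["ex:property"] == prop
        and st["start"] <= year <= st["end"]
    ]


print(values_at("thing:maria", "foaf:name", 1891))
```

A tool that only looks at `timeless` sees nothing about Maria's name, which is one way to picture the semantics being locked away behind the reified triples.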
-- Author: admin -- Posted: 8/11/2009 9:40:00 AM -- Representing Time in RDF (5)

Representing Time in RDF Part 5

Approach 4: N-ary Relations

In this approach a new class is created for each time-dependent predicate. This new class represents the context of the property and allows more specific predicates to be used that provide extra meaning.

In the first scenario I use a new ex:NameInContext class. This provides two predicates, ex:individual and ex:name, to link an individual to a name in a particular context.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:maria a foaf:Person .
thing:mariaUnmarried
thing:mariaMarried
Original file: a4s1.ttl

The query is very similar to that in Approach 3:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
Original file: a4s1.sq

For this scenario I use a class to represent the part-of relationship with two new predicates: ex:part and ex:whole. Once again, for simplicity I assume the place name information is timeless.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:oxfordshire
thing:gloucestershire
thing:widford
thing:widfordInGloucestershire
thing:widfordInOxfordshire
Original file: a4s2.ttl

The query here looks like:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?name where {
Original file: a4s2.sq

For the final scenario I use ex:ResidenceContext to represent the context of someone being resident somewhere. The person and the place are referred to using new predicates ex:individual and ex:place:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
thing:lymeRegis
thing:charmouth
thing:hastings
thing:anon a foaf:Person .
thing:anonInLymeRegis
thing:anonInCharmouth
thing:anonInHastings
Original file: a4s3.ttl

Once again the query is very similar to that in Approach 3:

prefix bio: <http://purl.org/vocab/bio/0.1/>
select ?nameBefore ?nameAfter where {
?pBefore time:intervalContains ?dateBefore .
?pAfter a ex:ResidenceContext .
?pAfter time:intervalContains ?dateAfter .
?pBefore time:intervalBefore ?pAfter .
Original file: a4s3.sq

In the examples shown here Approach 4 is identical to Approach 3 in complexity. In fact the key difference is the use of rdf:type rather than ex:property to distinguish the different types of relationships. In this respect it seems to offer no advantage over Approach 3 and adds the complexity of specific property names for each context relationship.

However, it does potentially offer a wider use beyond simply recording time-varying properties. A context could include other factors such as provenance or location. Also it could be easier to model multi-agent contexts such as marriages, with predicates to represent the bride and groom separately. For example:

thing:marriage
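The difference from reification, one class per kind of relationship with its own role names, can be sketched with Python dataclasses. Everything beyond ex:NameInContext, ex:individual and ex:name is an assumption made for illustration: the `MarriageContext` class, its `bride` and `groom` fields, the `thing:richard` identifier and the 1920 end year are hypothetical and are not taken from the truncated example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameInContext:
    """Stands in for ex:NameInContext with ex:individual and ex:name."""
    individual: str
    name: str
    start: int
    end: int


@dataclass(frozen=True)
class MarriageContext:
    """Hypothetical multi-agent context with separate bride and groom roles."""
    bride: str
    groom: str
    year: int


contexts = [
    NameInContext("thing:maria", "Maria Smith", 1867, 1888),
    NameInContext("thing:maria", "Maria Johnson", 1888, 1920),
    MarriageContext("thing:maria", "thing:richard", 1888),
]


def names_at(ctxs, individual, year):
    """Names held by the individual in the given year; the class plays the role of rdf:type."""
    return [
        c.name for c in ctxs
        if isinstance(c, NameInContext)
        and c.individual == individual
        and c.start <= year <= c.end
    ]


def grooms_of(ctxs, bride):
    """Grooms from every marriage context in which the individual is the bride."""
    return [c.groom for c in ctxs if isinstance(c, MarriageContext) and c.bride == bride]


print(names_at(contexts, "thing:maria", 1891), grooms_of(contexts, "thing:maria"))
```

Selecting by class here corresponds to matching on rdf:type, where the reified form would match on ex:property instead.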
-- Author: admin -- Posted: 8/11/2009 9:41:00 AM -- Representing Time in RDF (6)

Representing Time in RDF Part 6

I found these documents useful while researching this topic. I include them here because they could make a useful list of background reading for modelling time with RDF.

* Refactoring BIO with Einstein Part 1: First Steps. My first post that touches on modelling of time in genealogy. At this point I was attempting to model it simply using an event model, i.e. a sequence of things that happen to people and places.
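-- Author: Humphrey -- Posted: 8/12/2009 11:20:00 AM --

Wow, that is long. I once looked roughly at how several programming languages handle various kinds of objects, and time was always an important part. Some languages even devote a whole section to how time is represented. Perhaps because I have never run into a time-sensitive problem, I am not sure what all this elaborate processing of time is actually for.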