--  Author: admin
--  Posted: 8/11/2009 9:34:00 AM

--  Representing Time in RDF (1)
http://iandavis.com/blog/2009/08/time-in-rdf-1

Note: The author of this article, Ian Davis, is the CTO of Talis, a well-known Semantic Web company.

Representing Time in RDF Part 1

Way back in 2006 I wrote a blog post concerning the modelling of time in RDF (see [URL=http://iandavis.com/blog/2006/03/refactoring-bio-with-einstein-part-3-temporal-invariants]Refactoring Bio With Einstein Part 3: Temporal Invariants[/URL]). That post also provoked [URL=http://dig.csail.mit.edu/breadcrumbs/node/101]some discussion in the blogosphere[/URL]. Although I haven’t written anything on the subject for the past three years, I haven’t stopped thinking about it. In fact I’ve been working quite hard on the problem, mainly by modelling real data, especially geographical information. This is the first of a series of blog posts describing my experiments. I’d like to thank [URL=http://www.ldodds.com/blog/]Leigh Dodds[/URL] and [URL=http://www.jenitennison.com/blog/]Jeni Tennison[/URL], who gave me valuable feedback on an earlier version of this write-up.

In a comment to my blog post [URL=http://www.fruitfly.org/%7Ecjm/]Chris Mungall[/URL] made an excellent point about the importance of solving the time problem:

However, it’s also seems clear to me that this is a recipe for trouble for the semantic web. Surely all real-world data that concerns non-trivial applications such as science and electronic health records, or any kind of human activity _must_ take time into account? Which ever hack you make to account for time, it has to propagate through all your ontologies. An ontology that treats the world as time-slices can’t interoperate with one that has a standard view of objects and processes. It may be just about workable, but I can’t see it being anything other than tremendously complicated. We’ll essentially end up with layering 3-place relations on top of RDF in an extremely inelegant way.

This is not made clear when people are lured into the semantic web with examples of toy ontologies about pizzas that live floating in some mathematical space untroubled by time. Unless more is done to address these issues (and I commend this article for tackling this) the semantic web will face a huge backlash when people start realising they have to warp their ontologies and refactor their instance data to deal with time in order to represent real entities. Why is there no best practices document on representing instances that vary in time (that is, all real-world instances)? I do find it curious that more people aren’t making noise about this problem – I can only conclude that there’s a dearth of serious applications using RDF or OWL for instance data.

Those comments are still true today and in fact they are being accentuated by the wide availability of data brought about by the successful [URL=http://linkeddata.org/]Linked Data[/URL] project. For example, DBpedia, Freebase and GeoNames all have descriptions of London (in England) and their URIs are all declared to be owl:sameAs one another:

    *  http://dbpedia.org/resource/London
    *  http://sws.geonames.org/2643743/
    *  http://rdf.freebase.com/ns/guid.9202a8c04000641f80000000000242b2

These descriptions assert population figures for London of 7,355,400, 7,421,209 and 7,512,400 respectively. Since all these resources are owl:sameAs one another, I have three different populations for exactly the same thing with no temporal context (to be fair, both Freebase and DBpedia do attempt to assign a date, but they both say it’s for the year 2006). Perhaps they are all correct but were taken at different times, or perhaps they actually refer to slightly different definitions of “London”. Whatever the cause, the effect is that the data is not particularly useful. It would be helpful for them to indicate when the measurement of population took place. This is not intended as a criticism of the LOD project but to demonstrate that simplistic modelling of data that ignores time can quickly produce unhelpful results.

As another example of the kind of data that I should be able to model in RDF, consider the city of Istanbul. During its long and varied history it has been named Byzantium, New Rome, Constantinople and Stamboul (see [URL=http://en.wikipedia.org/wiki/Names_of_Istanbul]Wikipedia’s page on the names of Istanbul[/URL]). At various times it has been the capital city of the Roman Empire, the Byzantine/Eastern Roman Empire (twice), the Latin Empire, the Ottoman Empire and modern Turkey, and of course its extent has varied considerably over that period of time too.

No existing geographic ontology can model that variation in properties accurately enough for me to write a query to return the name of that city during the sixth crusade.

My main requirements for modelling time are:

    * to be able to query the properties of and relations between entities at any point in time
    * to be able to sequence data in relative terms such as before, after and during
    * not to extend the RDF triple model beyond possibly allowing named graphs
    * not to require changes to existing RDF schemas
    * avoid duplication of data

In this post I’m going to explore some of the possible solutions to this modelling problem. I take four main approaches:

   1. [URL=http://iandavis.com/blog/2005/10/refactoring-bio-with-einstein-part-2-conditions]Conditions[/URL] were my invention to model the state of being of an individual at a point in time (basically time slices like [URL=http://www.cyc.com/cycdoc/vocab/time-vocab.html#subAbstractions]CYC sub abstractions[/URL]).
   2. Named graphs, with one graph containing time interval information about the other graphs.
   3. Reification of all triples and attaching time interval information to the reified statements.
   4. N-ary relations representing contexts.

I’m going to illustrate the various approaches using three scenarios all drawn from problems in the genealogy field which happens to be both an enduring interest of mine and a minefield for time-insensitive applications:

   1. In the first a woman is born as “Maria Smith” in 1867 and marries “Richard Johnson” in 1888. I want to write a SPARQL query that gives me her name so I can find her in the 1891 census.
   2. For scenario 2 imagine that I discover an ancestor in the 1861 census who claims to have been born in Widford, Gloucestershire. However when I check I find three Widfords: one in Essex, one in Hertfordshire and one in Oxfordshire. Has there been an error in the census transcription? The explanation is that prior to 1844 the Oxfordshire Widford was actually in Gloucestershire. I want to write a SPARQL query that finds out which county the parish was in when the 1841 census was being taken.
   3. The final scenario is where I have records of the addresses that a person has lived at. I don’t have precise dates for the moves between them because the information has been derived from locating that person in public records. I know, for instance, that in 1870 this person lived in Lyme Regis, Dorset; in 1871 they were in Charmouth, Dorset and in 1881 they were in Hastings, Sussex. Given that information, where is the most likely place to look for them in 1874? Obviously in the absence of any other information, I would start looking in Charmouth and if that proved fruitless, I would move on to Hastings. Can I write a SPARQL query to give me that ordering of possibilities?

For completeness, there were approaches that I didn’t consider in detail:

* [URL=http://www.dcc.uchile.cl/%7Ecgutierr/papers/temporalRDF.pdf]Temporal RDF[/URL] introduces a fourth time component to the triple. I chose not to cover this approach in a lot of detail because it extends the RDF model in a way that no current triple store implements and it requires a numeric time to be associated with each triple, preventing relative times from being expressed.

It is worth noting that the scenarios analysed in these posts are very specialist. Most data modelling is only concerned with “The Now”. The data and corresponding queries I show in the following posts are quite convoluted and don’t reflect usual usage of RDF. This would likely be true of any data representation format that attempted to model time-varying properties of arbitrary things.

I have split this write-up into six parts, of which this post is the first:

    * Part 1: Introduction
    * Part 2: Approach 1
    * Part 3: Approach 2
    * Part 4: Approach 3
    * Part 5: Approach 4
    * Part 6: References

--  Author: admin
--  Posted: 8/11/2009 9:36:00 AM

--  Representing Time in RDF (2)
Representing Time in RDF Part 2

Approach 1: Conditions and Time Slices

[URL=http://iandavis.com/blog/2005/10/refactoring-bio-with-einstein-part-2]Conditions[/URL] were my invention to model the state of being of an individual at a point in time. They are basically [URL=http://sw.opencyc.org/concept/Mx4rvWn4OZwpEbGdrcN5Y29ycA]time slices[/URL] as used in OpenCyc (the original CYC notion of [URL=http://www.cyc.com/cycdoc/vocab/time-vocab.html#subAbstractions]subAbstractions[/URL] seems to be missing from OpenCyc).

Scenario 1

For this scenario I’m going to create a description of Maria which refers to two conditions to represent her before and after her marriage:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:maria
  a foaf:Person ;
  bio:condition thing:mariaUnmarried ;
  bio:condition thing:mariaMarried .

thing:mariaUnmarried
  foaf:name "Maria Smith" ;
  time:start "1867" ;
  time:end "1888" .

thing:mariaMarried
  foaf:name "Maria Johnson" ;
  time:start "1888" ;
  time:end "9999" .

Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s1.ttl]a1s1.ttl[/URL]

The URI thing:maria represents Maria the person. thing:mariaUnmarried and thing:mariaMarried are representations of her state of being at various times. Because they are the subjects of foaf:name properties they must be foaf:Agents. They are also the subjects of OWL-Time properties, which implies that they are time:Intervals too. This is a hint that the approach may be flawed: can any foaf:Agent also be an interval of time? They can certainly exist for an interval of time, but is it appropriate to use the OWL-Time properties in this way?

Note that I have simplified this by putting an end year on her married name. A twist here is that even if Maria Johnson died in 1920 she is still known by that name, so realistically that name should hold for an open ended time interval. However, if I left it off then I would not be able to distinguish between her name remaining the same from a point in time onwards and the situation where it is not known when she next changed her name.

I can now write a SPARQL query to find Maria’s name in 1891:

prefix bio: <http://purl.org/vocab/bio/0.1/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#>

select ?name where {
  thing:maria bio:condition ?x .
  ?x foaf:name ?name .
  ?x time:start ?start .
  ?x time:end ?end .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .
}

Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s1.sq]a1s1.sq[/URL]

This works well for my simplified situation. If I had an open-ended time interval for her name after her marriage then I would need to add a union clause to this query to match that case. To cover intervals with a known end but unknown start would require still another union block.
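
As a rough sketch of that open-ended case, the query below assumes that a condition with no time:end triple is still in force. It uses an optional block with a bound() test rather than the union described above, which is a swapped-in technique that should give the same effect here. It has not been run against the original a1s1.ttl file.

```sparql
prefix bio: <http://purl.org/vocab/bio/0.1/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix thing: <http://example.org/thing#>

select ?name where {
  thing:maria bio:condition ?x .
  ?x foaf:name ?name .
  ?x time:start ?start .

  # Assumption: a missing time:end means the condition still holds
  optional { ?x time:end ?end . }

  # Keep conditions that started by 1891 and either have no end
  # or end in or after 1891
  filter (xsd:integer(?start) <= 1891 &&
          (!bound(?end) || xsd:integer(?end) >= 1891)) .
}
```

The same pattern applied to time:start would cover intervals with a known end but unknown start. It does not resolve the ambiguity between a name that still holds and one whose end date is simply unknown.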

Scenario 2

Scenario two is structured in a fairly similar way. I describe two resources as being counties and one as being a parish. Just like the description of Maria the description of Widford refers to two conditions that represent the parish’s association with each county.

thing:oxfordshire
  a ex:County ;
  foaf:name "Oxfordshire" .

thing:gloucestershire
  a ex:County ;
  foaf:name "Gloucestershire" .

thing:widford
  a ex:Parish ;
  foaf:name "Widford" ;
  bio:condition thing:widfordInGloucestershire ;
  bio:condition thing:widfordInOxfordshire .

thing:widfordInGloucestershire
  ex:partOf thing:gloucestershire ;
  time:start "1837" ;
  time:end "1844" .

thing:widfordInOxfordshire
  ex:partOf thing:oxfordshire ;
  time:start "1844" ;
  time:end "2009" .

Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s2.ttl]a1s2.ttl[/URL]

My SPARQL query is quite simple. I just need to find the condition that starts in or before 1841 and ends in or after that year, and then return the name of the county it is part of.

select ?name where {
  thing:widford bio:condition ?cond .
  ?cond time:start ?start .
  ?cond time:end ?end .
  ?cond ex:partOf ?x .
  ?x foaf:name ?name .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .
}

Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s2.sq]a1s2.sq[/URL]

Scenario 3

I describe resources representing the three places: thing:lymeRegis, thing:charmouth and thing:hastings. Then I describe thing:anon, which is a person with a condition for each location they have lived in. These conditions are described with an ex:residence property that links the condition to the place and some time properties to describe when the condition applied.

thing:lymeRegis
  a ex:Town ;
  foaf:name "Lyme Regis" .

thing:charmouth
  a ex:Town ;
  foaf:name "Charmouth" .

thing:hastings
  a ex:Town ;
  foaf:name "Hastings" .

thing:anon a foaf:Person ;
  bio:condition thing:anonInLymeRegis ;
  bio:condition thing:anonInCharmouth ;
  bio:condition thing:anonInHastings .

thing:anonInLymeRegis
  ex:residence thing:lymeRegis ;
  time:intervalBefore thing:anonInCharmouth ;
  time:intervalContains "1870" .

thing:anonInCharmouth
  ex:residence thing:charmouth ;
  time:intervalAfter thing:anonInLymeRegis ;
  time:intervalBefore thing:anonInHastings ;
  time:intervalContains "1871" .

thing:anonInHastings
  ex:residence thing:hastings ;
  time:intervalAfter thing:anonInCharmouth ;
  time:intervalContains "1881" .

Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s3.ttl]a1s3.ttl[/URL]

My SPARQL query needs to find the latest address that contains a date before 1874 and also the address following that one.

# Find the latest address that contains a date before 1874
# Find the next address
select ?nameBefore ?nameAfter where {
  thing:anon bio:condition ?condBefore .
  ?condBefore ex:residence ?resBefore .
  ?resBefore foaf:name ?nameBefore .
  ?condBefore time:intervalContains ?dateBefore .
  filter (xsd:integer(?dateBefore) <= 1874) .

  thing:anon bio:condition ?condAfter .
  ?condAfter ex:residence ?resAfter .
  ?resAfter foaf:name ?nameAfter .
  ?condAfter time:intervalContains ?dateAfter .
  filter (xsd:integer(?dateAfter) > 1874) .

  ?condBefore time:intervalBefore ?condAfter .
}

Original file: [URL=http://iandavis.com/2009/time-in-rdf/a1s3.sq]a1s3.sq[/URL]

So this seems to do the job. However I am omitting a major piece of detail: the places will also have conditions similar to those in Scenario 2. That is going to add a huge amount of complexity to the SPARQL query.

Approach 1 Conclusions

This approach seems very workable and there is [URL=http://plato.stanford.edu/entries/temporal-parts/]a large body of philosophical work[/URL] pertaining to this view of the world. In RDF terms though, there seems to be a problem with the domains of properties. Using the properties as I have done here implies that the “conditions” are both intervals of time and real-world things which seems awkward to me. I could separate the time intervals into new resources linked to the conditions with some kind of “existed during” predicate. This would complicate things a little, but perhaps not too greatly.
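
A minimal sketch of that separation for Scenario 1, assuming a hypothetical ex:existedDuring predicate and hypothetical interval resources such as thing:mariaMarriedInterval (neither appears in the original files). The conditions keep their foaf:name, and only the interval resources carry the time properties:

```sparql
prefix bio: <http://purl.org/vocab/bio/0.1/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#>

# Assumed data shape (hypothetical, not taken from a1s1.ttl):
#   thing:mariaMarried
#     foaf:name "Maria Johnson" ;
#     ex:existedDuring thing:mariaMarriedInterval .
#   thing:mariaMarriedInterval
#     time:start "1888" ;
#     time:end "9999" .

select ?name where {
  thing:maria bio:condition ?cond .
  ?cond foaf:name ?name .

  # One extra hop from the condition to its interval
  ?cond ex:existedDuring ?interval .
  ?interval time:start ?start .
  ?interval time:end ?end .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .
}
```

Under these assumptions the query gains one triple pattern per condition, and the condition resources no longer need to be treated as time intervals.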

--  Author: admin
--  Posted: 8/11/2009 9:38:00 AM

--  Representing Time in RDF (3)
Representing Time in RDF Part 3

Approach 2: Named Graphs

In this approach I physically divide my data up into separate graphs. Each graph contains triples that hold true for a specified time interval. One graph is designated as holding the time interval information for all the other graphs.

Scenario 1

The first graph a2s1g1.ttl contains information about the other graphs. Specifically it describes the time periods for which their triples hold true.

thing:maria a foaf:Person .

<a2s1g2.ttl>
  time:start "1867" ;
  time:end "1888" .

<a2s1g3.ttl>
  time:start "1888" ;
  time:end "9999" .

Original file: a2s1g1.ttl

The triples say that the graph a2s1g2.ttl has a start time of 1867 and an end of 1888 and that a2s1g3.ttl starts in 1888 and ends at an arbitrary maximum date. This should be interpreted to mean that the triples contained in those graphs hold only between those dates. This means that I am saying a graph is also a time interval which might be a problem. That could be fixed by introducing a new predicate with the meaning “holds during” that can relate a graph to a time interval and moving the time predicates from the graphs to new interval resources. That would make the following somewhat more complicated.
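
To give an idea of that extra complication, here is a sketch of how a query for Maria’s name in 1891 might look under such a layout. The ex:holdsDuring predicate and the interval resources are assumptions made for illustration, and the graph holding the interval descriptions is assumed to be the default graph.

```sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#>

# Assumed data in the default graph (hypothetical):
#   <a2s1g2.ttl> ex:holdsDuring thing:interval1 .
#   thing:interval1
#     time:start "1867" ;
#     time:end "1888" .

select ?name where {
  # The graph points at an interval instead of being one itself
  ?g ex:holdsDuring ?interval .
  ?interval time:start ?start .
  ?interval time:end ?end .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .

  graph ?g {
    thing:maria foaf:name ?name .
  }
}
```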

Note that because I’m using named graphs I won’t be able to use blank nodes to refer to things across the graphs.

The triples that hold true for Maria before she is married are held in a2s1g2.ttl:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix thing: <http://example.org/thing#> .

thing:maria foaf:name "Maria Smith" .

Original file: a2s1g2.ttl

The triples that hold true after her marriage are in a2s1g3.ttl:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix thing: <http://example.org/thing#> .

thing:maria foaf:name "Maria Johnson" .

Original file: a2s1g3.ttl

I only have one triple in each of these graphs but of course I could assert lots more facts that were true in the specific time periods for each graph.

Overall this approach uses 8 triples compared to 10 used by Approach 1. The SPARQL query has many similarities with Approach 1. The major difference is that the time period filters are applied to the named graphs, not the conditions. Note that in this query I assume a2s1g1.ttl is the default graph.

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#>

select ?name where {
  ?g time:start ?start .
  ?g time:end ?end .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .

  graph ?g {
    thing:maria foaf:name ?name .
  }
}

Original file: a2s1.sq

Scenario 2

In this scenario the data is split into four graphs. a2s2ag1.ttl holds the time interval descriptions for the other graphs.

<a2s2ag2.ttl>
  time:start "1837" ;
  time:end "1844" .

<a2s2ag3.ttl>
  time:start "1844" ;
  time:end "9999" .

<a2s2ag4.ttl>
  time:start "1837" ;
  time:end "9999" .

Original file: a2s2ag1.ttl

a2s2ag2.ttl holds information about Widford being in Gloucestershire:

@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:widford ex:partOf thing:gloucestershire .

Original file: a2s2ag2.ttl

a2s2ag3.ttl holds information about Widford being in Oxfordshire:

@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:widford ex:partOf thing:oxfordshire .

Original file: a2s2ag3.ttl

And finally, a2s2ag4.ttl holds information about the names of the places:

@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

thing:oxfordshire
  a ex:County ;
  foaf:name "Oxfordshire" .

thing:gloucestershire
  a ex:County ;
  foaf:name "Gloucestershire" .

thing:widford
  a ex:Parish ;
  foaf:name "Widford" .

Original file: a2s2ag4.ttl

As can be seen from these examples, the data quickly becomes fragmented across many graphs. The granularity of this fragmentation depends on how frequently information about things changes over time. In the limit you might have one graph per year, per day or even per minute. However in practice you are likely to have graphs with long overlapping time intervals.

Naively you might expect the following query to work for us, following the pattern set by Scenario 1:

select ?name where {
  ?g time:start ?start .
  ?g time:end ?end .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .

  graph ?g {
    thing:widford ex:partOf ?x .
    ?x foaf:name ?name .
  }
}

However that only works when both the ex:partOf and the foaf:name triples are in the same named graph. I didn’t see this in Scenario 1 because I was only interested in a single triple. Here I am trying to discover a relationship and a name, both of which could change at different times and so may be in different graphs.

So the query is a little more complex:

select ?name where {
  ?g time:start ?start .
  ?g time:end ?end .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .

  graph ?g {
    thing:widford ex:partOf ?x .
  }

  ?g2 time:start ?start2 .
  ?g2 time:end ?end2 .
  filter (xsd:integer(?start2) <= 1841 && xsd:integer(?end2) >= 1841) .

  graph ?g2 {
    ?x foaf:name ?name .
  }

}

Original file: a2s2a.sq

This shows a major disadvantage of the named graph approach because I would have to repeat the time filtering of graphs for each triple I wanted to find!

Scenario 3

Once again this graph (a2s3g1.ttl) contains time information about the other graphs. For this example I have chosen to include some facts about the places here too. By doing this I am basically saying that they are timeless facts which helps simplify this approach.

thing:lymeRegis
  a ex:Town ;
  foaf:name "Lyme Regis" .

thing:charmouth
  a ex:Town ;
  foaf:name "Charmouth" .

thing:hastings
  a ex:Town ;
  foaf:name "Hastings" .

<a2s3g2.ttl>
  time:intervalBefore <a2s3g3.ttl> ;
  time:intervalContains "1870" .

<a2s3g3.ttl>
  time:intervalAfter <a2s3g2.ttl> ;
  time:intervalBefore <a2s3g4.ttl> ;
  time:intervalContains "1871" .

<a2s3g4.ttl>
  time:intervalAfter <a2s3g3.ttl> ;
  time:intervalContains "1881" .

Original file: a2s3g1.ttl

In this case I am using relative times for the graphs, saying that the triples in a2s3g2.ttl hold before the triples in a2s3g3.ttl which in turn hold true before the triples in a2s3g4.ttl.

The triples that assert my person lived in Lyme Regis are held in a2s3g2.ttl:

@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:anon ex:residence thing:lymeRegis .

Original file: a2s3g2.ttl

The triples that assert my person lived in Charmouth are held in a2s3g3.ttl:

@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:anon ex:residence thing:charmouth .

Original file: a2s3g3.ttl

And the triples for Hastings are in a2s3g4.ttl:

@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:anon ex:residence thing:hastings .

Original file: a2s3g4.ttl

Now the query looks like this:

select ?nameBefore ?nameAfter where {
  ?gBefore time:intervalContains ?dateBefore .
  filter (xsd:integer(?dateBefore) <= 1874) .

  ?gAfter time:intervalContains ?dateAfter .
  filter (xsd:integer(?dateAfter) > 1874) .

  ?gBefore time:intervalBefore ?gAfter .

  graph ?gBefore {
    thing:anon ex:residence ?placeBefore .
  }

  graph ?gAfter {
    thing:anon ex:residence ?placeAfter .
  }

  ?placeBefore foaf:name ?nameBefore .
  ?placeAfter foaf:name ?nameAfter .

}

Original file: a2s3.sq

What the first two clauses do is find graphs that might contain relevant triples. The first looks for a graph that has a time:intervalContains triple whose value is less than or equal to 1874. The second repeats this, looking for graphs whose value is after that date. The query then uses those two graphs to look up residence information for the person at those times.

I have a suspicion that this query will bring back more results than are necessary if I had more data. I may have several addresses after 1874 and this query will bring them all back, not just the first.
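
One possible way to narrow the results is sketched below. It relies on ORDER BY and LIMIT, both part of SPARQL 1.0, to sort the candidate pairs so that the latest earlier address and the earliest later address come first, and then keeps only that row. This is an untested sketch rather than one of the original query files, and it again assumes a2s3g1.ttl is the default graph.

```sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#>

select ?nameBefore ?nameAfter where {
  ?gBefore time:intervalContains ?dateBefore .
  filter (xsd:integer(?dateBefore) <= 1874) .

  ?gAfter time:intervalContains ?dateAfter .
  filter (xsd:integer(?dateAfter) > 1874) .

  ?gBefore time:intervalBefore ?gAfter .

  graph ?gBefore {
    thing:anon ex:residence ?placeBefore .
  }

  graph ?gAfter {
    thing:anon ex:residence ?placeAfter .
  }

  ?placeBefore foaf:name ?nameBefore .
  ?placeAfter foaf:name ?nameAfter .
}
# Latest known date up to 1874 first, then the earliest date after it
order by desc(xsd:integer(?dateBefore)) asc(xsd:integer(?dateAfter))
limit 1
```

Dropping the limit would return every candidate pair in that order instead of only the closest one.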

It’s worth noting that this query would be even more complex if I had not chosen to treat the names of the places as timeless data. If they were in separate graphs then I would have to add additional clauses to find graphs that hold at the specified time, just like I did in Scenario 2.

Approach 2 Conclusion

Named graphs appear to partition the data very nicely. However it seems that they don’t make the querying any simpler. If it were possible to define a merge of all possible graphs that cover the time interval of interest and query that directly then it could be possible to write very natural queries and completely ignore the time component. This could be possible with a two-phase approach to running the queries or perhaps SPARQL sub-queries might help.

The main problem with named graphs is that they lie outside of the standard RDF model. In fact they are only really formalised by the SPARQL specification. There are no standardised serialisations for named graph data so it is not generally possible to query a SPARQL service and retrieve the named graph information. The TriG and TriX serialisations do support named graphs but they are not widely implemented in comparison to RDF/XML, Turtle or N-Triples, none of which support anything beyond the standard RDF model. I don’t know of any reasoners that can work across multiple named graphs like this either.

--  Author: admin
--  Posted: 8/11/2009 9:40:00 AM

--  Representing Time in RDF (4)
Representing Time in RDF Part 4

Approach 3: Reified Relations

In this approach I reify every triple that is time-dependent. That means every triple is split out into its constituent parts which will obviously lead to an explosion of triples.

Scenario 1

In this scenario I use thing:maria to refer to Maria, and I state the fact of her being a foaf:Person as a piece of timeless information. Facts about her name are reified and time interval information is attached. The URIs thing:mariaUnmarried and thing:mariaMarried represent the reified triples holding her name before and after marriage:

thing:maria a foaf:Person .

thing:mariaUnmarried
  ex:subject thing:maria ;
  ex:property foaf:name ;
  ex:value "Maria Smith" ;
  time:start "1867" ;
  time:end "1888" .

thing:mariaMarried
  ex:subject thing:maria ;
  ex:property foaf:name ;
  ex:value "Maria Johnson" ;
  time:start "1888" ;
  time:end "9999" .

Original file: a3s1.ttl

Facts about her name are reified using custom reification predicates, not the standard RDF ones. I do this to allow more flexibility in the use of these predicates without requiring that their subject is always an RDF statement. Potentially I could have multiple occurrences of ex:subject, ex:property or ex:value which would allow more compact descriptions to be created. For example I could have a description where multiple objects are included for the same predicate and subject.

The query in this case is straightforward and quite similar to the query for Approach 1.

select ?name where {
  ?p ex:subject thing:maria .
  ?p ex:property foaf:name .
  ?p time:start ?start .
  ?p time:end ?end .
  ?p ex:value ?name .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .
}

Original file: a3s1.sq

I simply search for resources that represent reified triples that hold in 1891. Then I select the ex:value of the resulting resource.

Scenario 2

For simplicity I start off by treating the place name information as timeless, only reifying the information about relationships between the places:

thing:oxfordshire
  a ex:County ;
  foaf:name "Oxfordshire" .

thing:gloucestershire
  a ex:County ;
  foaf:name "Gloucestershire" .

thing:widford
  a ex:Parish ;
  foaf:name "Widford" .

thing:widfordInGloucestershire
  ex:subject thing:widford ;
  ex:property ex:partOf ;
  ex:value thing:gloucestershire ;
  time:start "1837" ;
  time:end "1844" .

thing:widfordInOxfordshire
  ex:subject thing:widford ;
  ex:property ex:partOf ;
  ex:value thing:oxfordshire ;
  time:start "1844" ;
  time:end "9999" .

Original file: a3s2.ttl

The query is again very simple and reminiscent of Approach 1:

select ?name where {
  ?p ex:subject thing:widford .
  ?p ex:property ex:partOf .
  ?p time:start ?start .
  ?p time:end ?end .
  ?p ex:value ?x .
  ?x foaf:name ?name .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .
}

Original file: a3s2.sq

However, a more realistic approach would be to model the names of the places as time-dependent triples which means reifying them. I introduce some new resources thing:widfordName, thing:oxfordshireName and thing:gloucestershireName to represent the reified forms of those triples. Now the data is much more verbose:

thing:oxfordshire a ex:County .

thing:gloucestershire a ex:County .

thing:widford a ex:Parish .

thing:widfordInGloucestershire
  ex:subject thing:widford ;
  ex:property ex:partOf ;
  ex:value thing:gloucestershire ;
  time:start "1837" ;
  time:end "1844" .

thing:widfordInOxfordshire
  ex:subject thing:widford ;
  ex:property ex:partOf ;
  ex:value thing:oxfordshire ;
  time:start "1844" ;
  time:end "9999" .

thing:widfordName
  ex:subject thing:widford ;
  ex:property foaf:name ;
  ex:value "Widford" ;
  time:start "1837" ;
  time:end "9999" .

thing:oxfordshireName
  ex:subject thing:oxfordshire ;
  ex:property foaf:name ;
  ex:value "Oxfordshire" ;
  time:start "1837" ;
  time:end "9999" .

thing:gloucestershireName
  ex:subject thing:gloucestershire ;
  ex:property foaf:name ;
  ex:value "Gloucestershire" ;
  time:start "1837" ;
  time:end "9999" .

Original file: a3s2a.ttl

Also the query becomes more complex:

select ?name where {
  ?p ex:subject thing:widford .
  ?p ex:property ex:partOf .
  ?p time:start ?start .
  ?p time:end ?end .
  ?p ex:value ?x .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .

  ?p2 ex:subject ?x .
  ?p2 ex:property foaf:name .
  ?p2 time:start ?start2 .
  ?p2 time:end ?end2 .
  ?p2 ex:value ?name .
  filter (xsd:integer(?start2) <= 1841 && xsd:integer(?end2) >= 1841) .
}

Original file: a3s2a.sq

Each triple being selected requires its time interval information to be specified separately, which is reminiscent of Approach 2.

Scenario 3

Once again for simplicity I keep the place name information timeless. I introduce new resources for the reified statements about the residence of the person over time:

thing:lymeRegis
  a ex:Town ;
  foaf:name "Lyme Regis" .

thing:charmouth
  a ex:Town ;
  foaf:name "Charmouth" .

thing:hastings
  a ex:Town ;
  foaf:name "Hastings" .

thing:anon a foaf:Person .

thing:anonInLymeRegis
  ex:subject thing:anon ;
  ex:property ex:residence ;
  ex:value thing:lymeRegis ;
  time:intervalBefore thing:anonInCharmouth ;
  time:intervalContains "1870" .

thing:anonInCharmouth
  ex:subject thing:anon ;
  ex:property ex:residence ;
  ex:value thing:charmouth ;
  time:intervalAfter thing:anonInLymeRegis ;
  time:intervalBefore thing:anonInHastings ;
  time:intervalContains "1871" .

thing:anonInHastings
  ex:subject thing:anon ;
  ex:property ex:residence ;
  ex:value thing:hastings ;
  time:intervalAfter thing:anonInCharmouth ;
  time:intervalContains "1881" .

Original file: a3s3.ttl

The query is straightforward although quite verbose. If the place name triples were also reified then this query would suddenly become much more complex:

select ?nameBefore ?nameAfter where {
  ?pBefore ex:subject thing:anon .
  ?pBefore ex:property ex:residence .
  ?pBefore ex:value ?placeBefore .
  ?placeBefore foaf:name ?nameBefore .

?pBefore time:intervalContains ?dateBefore .
filter (xsd:integer(?dateBefore) <= 1874) .

  ?pAfter ex:subject thing:anon .
  ?pAfter ex:property ex:residence .
  ?pAfter ex:value ?placeAfter .
  ?placeAfter foaf:name ?nameAfter .

  ?pAfter time:intervalContains ?dateAfter .
  filter (xsd:integer(?dateAfter) > 1874) .

  ?pBefore time:intervalBefore ?pAfter .
}

Original file: a3s3.sq
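
The join logic of this query can be sketched in plain Python. This is a minimal illustration, not part of the original files: the dictionary layout and the function name `moves_around` are assumptions, and only the data values come from a3s3.ttl. All three statements share the subject thing:anon and the property ex:residence, so those two fields are left out here.

```python
# Reified residence statements from a3s3.ttl, as plain dictionaries.
# The layout (keys "value", "before", "contains") is an assumption of this sketch.
STATEMENTS = {
    "anonInLymeRegis": {"value": "Lyme Regis", "before": ["anonInCharmouth"], "contains": "1844"},
    "anonInCharmouth": {"value": "Charmouth", "before": ["anonInHastings"], "contains": "1871"},
    "anonInHastings": {"value": "Hastings", "before": [], "contains": "1881"},
}


def moves_around(year, statements=STATEMENTS):
    """Return (name_before, name_after) pairs that straddle `year`.

    Mirrors the SPARQL pattern: ?pBefore contains a date <= year,
    ?pAfter contains a date > year, and ?pBefore time:intervalBefore ?pAfter.
    """
    pairs = []
    for p_before in statements.values():
        if int(p_before["contains"]) > year:
            continue
        for after_id in p_before["before"]:
            p_after = statements[after_id]
            if int(p_after["contains"]) > year:
                pairs.append((p_before["value"], p_after["value"]))
    return pairs


print(moves_around(1874))  # [('Charmouth', 'Hastings')]
```

The explicit time:intervalBefore link is what excludes the Lyme Regis / Hastings pairing: both date filters would accept it, but the two intervals are not adjacent in the data.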
Approach 3 Conclusions

Approach 3 avoids the domain and range problem experienced by Approach 1, where the conditions were used with properties whose domain is foaf:Agent. In Approach 3, properties are used with the appropriate resource types when they are timeless and are never actually asserted when they are time-dependent. However, this approach is tedious for large quantities of data, and the semantics are all locked away behind the reified triples. Once again, I don’t know of any reasoners that could work with this kind of data.
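
One way to see what "locked away behind the reified triples" means in practice: a consumer that wants ordinary triples has to rebuild them for a chosen point in time. The following Python sketch shows that step under stated assumptions. The tuple layout and the `snapshot` function are hypothetical, and the data is the Widford part-of example from a4s2.ttl recast in the ex:subject / ex:property / ex:value shape.

```python
# Reified statements: (subject, property, value, start_year, end_year).
# Values come from the Widford example; the tuple layout is an assumption.
REIFIED = [
    ("thing:widford", "ex:partOf", "thing:gloucestershire", "1837", "1844"),
    ("thing:widford", "ex:partOf", "thing:oxfordshire", "1844", "9999"),
]


def snapshot(reified, year):
    """Rebuild the plain triples that hold in `year` (both bounds inclusive)."""
    return [
        (subject, prop, value)
        for subject, prop, value, start, end in reified
        if int(start) <= year <= int(end)
    ]


print(snapshot(REIFIED, 1841))
# [('thing:widford', 'ex:partOf', 'thing:gloucestershire')]
```

Because both bounds are inclusive, as they are in the filters used by the queries, `snapshot(REIFIED, 1844)` returns both counties. A consumer has to decide how to treat the shared boundary year.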

--  Author: admin
--  Posted: 8/11/2009 9:40:00 AM

--  Representing Time in RDF (5)
Representing Time in RDF Part 5

Approach 4: N-ary Relations

In this approach a new class is created for each time-dependent predicate. This new class represents the context of the property and allows more specific predicates to be used that provide extra meaning.
Scenario 1

In the first scenario we use a new ex:NameInContext class. This provides two predicates, ex:individual and ex:name, which link an individual to a name in a particular context.

thing:maria a foaf:Person .

thing:mariaUnmarried
  a ex:NameInContext ;
  ex:individual thing:maria ;
  ex:name "Maria Smith" ;
  time:start "1867" ;
  time:end "1888" .

thing:mariaMarried
  a ex:NameInContext ;
  ex:individual thing:maria ;
  ex:name "Maria Johnson" ;
  time:start "1888" ;
  time:end "9999" .

Original file: a4s1.ttl

The query is very similar to that in Approach 3:

select ?name where {
  ?p a ex:NameInContext .
  ?p ex:individual thing:maria .
  ?p time:start ?start .
  ?p time:end ?end .
  ?p ex:name ?name .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .
}

Original file: a4s1.sq
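
For readers who think in records rather than triples, the ex:NameInContext resources behave like small typed records. Here is a hedged Python sketch of the same lookup. The dataclass and the `name_in` helper are hypothetical names, and only the field values come from a4s1.ttl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameInContext:
    """Python stand-in for an ex:NameInContext resource (illustrative only)."""
    individual: str
    name: str
    start: int  # time:start, parsed as a year
    end: int    # time:end; the example data uses 9999, apparently as an open end


CONTEXTS = [
    NameInContext("thing:maria", "Maria Smith", 1867, 1888),
    NameInContext("thing:maria", "Maria Johnson", 1888, 9999),
]


def name_in(individual, year, contexts=CONTEXTS):
    """Names of `individual` whose context interval includes `year`."""
    return [
        c.name
        for c in contexts
        if c.individual == individual and c.start <= year <= c.end
    ]


print(name_in("thing:maria", 1891))  # ['Maria Johnson']
```

The role-specific fields (`individual`, `name`) are what distinguish this shape from the generic subject / property / value triple of Approach 3.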
Scenario 2

For this scenario I use a class to represent the part-of relationship with two new predicates: ex:part and ex:whole. Once again, for simplicity I assume the place name information is timeless.

thing:oxfordshire
  a ex:County ;
  foaf:name "Oxfordshire" .

thing:gloucestershire
  a ex:County ;
  foaf:name "Gloucestershire" .

thing:widford
  a ex:Parish ;
  foaf:name "Widford" .

thing:widfordInGloucestershire
  a ex:PartOfContext ;
  ex:part thing:widford ;
  ex:whole thing:gloucestershire ;
  time:start "1837" ;
  time:end "1844" .

thing:widfordInOxfordshire
  a ex:PartOfContext ;
  ex:part thing:widford ;
  ex:whole thing:oxfordshire ;
  time:start "1844" ;
  time:end "9999" .

Original file: a4s2.ttl

The query here looks like:

select ?name where {
  ?p a ex:PartOfContext .
  ?p ex:part thing:widford .
  ?p ex:whole ?x .
  ?p time:start ?start .
  ?p time:end ?end .
  ?x foaf:name ?name .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .
}

Original file: a4s2.sq
Scenario 3

For the final scenario I use ex:ResidenceContext to represent the context of someone being resident somewhere. The person and the place are referred to using new predicates ex:individual and ex:place:

thing:lymeRegis
  a ex:Town ;
  foaf:name "Lyme Regis" .

thing:charmouth
  a ex:Town ;
  foaf:name "Charmouth" .

thing:hastings
  a ex:Town ;
  foaf:name "Hastings" .

thing:anon a foaf:Person .

thing:anonInLymeRegis
  a ex:ResidenceContext ;
  ex:individual thing:anon ;
  ex:place thing:lymeRegis ;
  time:intervalBefore thing:anonInCharmouth ;
  time:intervalContains "1844" .

thing:anonInCharmouth
  a ex:ResidenceContext ;
  ex:individual thing:anon ;
  ex:place thing:charmouth ;
  time:intervalAfter thing:anonInLymeRegis ;
  time:intervalBefore thing:anonInHastings ;
  time:intervalContains "1871" .

thing:anonInHastings
  a ex:ResidenceContext ;
  ex:individual thing:anon ;
  ex:place thing:hastings ;
  time:intervalAfter thing:anonInCharmouth ;
  time:intervalContains "1881" .

Original file: a4s3.ttl

Once again the query is very similar to that in Approach 3:

select ?nameBefore ?nameAfter where {
  ?pBefore a ex:ResidenceContext .
  ?pBefore ex:individual thing:anon .
  ?pBefore ex:place ?placeBefore .
  ?placeBefore foaf:name ?nameBefore .

  ?pBefore time:intervalContains ?dateBefore .
  filter (xsd:integer(?dateBefore) <= 1874) .

  ?pAfter a ex:ResidenceContext .
  ?pAfter ex:individual thing:anon .
  ?pAfter ex:place ?placeAfter .
  ?placeAfter foaf:name ?nameAfter .

  ?pAfter time:intervalContains ?dateAfter .
  filter (xsd:integer(?dateAfter) > 1874) .

  ?pBefore time:intervalBefore ?pAfter .
}

Original file: a4s3.sq
Approach 4 Conclusions

In the examples shown here Approach 4 is identical to Approach 3 in complexity. In fact the key difference is the use of rdf:type rather than ex:property to distinguish the different types of relationships. In this respect it seems to offer no advantage over Approach 3 and adds the complexity of specific property names for each context relationship.
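
The correspondence between the two approaches is mechanical enough to write down. The Python sketch below turns an Approach 3 statement into an Approach 4 context using a lookup table keyed on the property. The class and predicate names come from the examples in this series; the table, the dictionary layout and the `to_context` function are assumptions made for illustration.

```python
# Maps an Approach 3 property to the Approach 4 class and its two role
# predicates. The names are taken from the examples; the table is hypothetical.
CONTEXT_FOR = {
    "foaf:name": ("ex:NameInContext", "ex:individual", "ex:name"),
    "ex:partOf": ("ex:PartOfContext", "ex:part", "ex:whole"),
    "ex:residence": ("ex:ResidenceContext", "ex:individual", "ex:place"),
}

RESERVED = {"ex:subject", "ex:property", "ex:value"}


def to_context(reified):
    """Convert one reified statement (a dict) into an n-ary context (a dict)."""
    cls, subject_role, value_role = CONTEXT_FOR[reified["ex:property"]]
    context = {
        "rdf:type": cls,
        subject_role: reified["ex:subject"],
        value_role: reified["ex:value"],
    }
    # Temporal predicates (time:start, time:intervalBefore, ...) carry over unchanged.
    context.update({k: v for k, v in reified.items() if k not in RESERVED})
    return context


statement = {
    "ex:subject": "thing:anon",
    "ex:property": "ex:residence",
    "ex:value": "thing:hastings",
    "time:intervalAfter": "thing:anonInCharmouth",
    "time:intervalContains": "1881",
}
print(to_context(statement))
```

The table is where the extra cost of Approach 4 shows up: every time-dependent property needs its own class and its own pair of role predicates.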

However, it does potentially offer a wider use beyond simply recording time-varying properties. A context could include other factors such as provenance or location. It could also make multi-agent contexts easier to model, such as marriages, with separate predicates to represent the bride and the groom. For example:

thing:marriage
  a ex:MarriedContext ;
  ex:husband thing:person1 ;
  ex:wife thing:person2 ;
  time:start "1820" ;
  time:end "1862" .
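
A multi-agent context like this can be queried from either participant's side. A minimal Python sketch, assuming the marriage above is held as a dictionary (the layout and the `spouse_in` helper are hypothetical):

```python
# The marriage context from the example, keyed by its predicates.
MARRIAGES = [
    {
        "ex:husband": "thing:person1",
        "ex:wife": "thing:person2",
        "time:start": "1820",
        "time:end": "1862",
    },
]


def spouse_in(person, year, marriages=MARRIAGES):
    """Return the spouses of `person` in `year`, whichever role they hold."""
    spouses = []
    for m in marriages:
        # Skip contexts whose interval does not include the year (bounds inclusive).
        if not int(m["time:start"]) <= year <= int(m["time:end"]):
            continue
        if m["ex:husband"] == person:
            spouses.append(m["ex:wife"])
        elif m["ex:wife"] == person:
            spouses.append(m["ex:husband"])
    return spouses


print(spouse_in("thing:person2", 1850))  # ['thing:person1']
```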

--  Author: admin
--  Posted: 8/11/2009 9:41:00 AM

--  Representing Time in RDF (6)
Representing Time in RDF Part 6

I found these documents useful while researching this topic. I include them here because they could make a useful list of background reading for modelling time with RDF.

    * Refactoring BIO with Einstein Part 1: First Steps - my first post that touches on modelling of time in genealogy. At this point I was attempting to model it simply using an event model, i.e. a sequence of things that happen to people and places.
    * Refactoring BIO with Einstein Part 2: Conditions - this is the post in which I first introduced Conditions (i.e. Approach 1). The post uses almost exactly the same example as Scenario 1.
    * Refactoring Bio With Einstein Part 3: Temporal Invariants - in the third part of the series I explored those properties of a person that are timeless, or “temporally invariant”.
    * Refactoring BIO with Einstein Part 4: Employment and Families - in this post I continue the theme of part 2 and demonstrate how employment periods could be described using conditions and events that mark transitions between conditions.
    * Temporal Scope for RDF Triples - in this blog post Jeni Tennison describes her attempts to model time for the London Gazette data she is working with. She describes the reified approach (Approach 3) as unacceptable because most triple stores do not deal with reified data (i.e. don’t allow you to query it in its un-reified form). She describes two acceptable approaches, N-ary relationships (Approach 4) and named graphs (Approach 2), with a preference for the latter. Some useful comments point to other examples of these approaches.
    * RDF and the Time Dimension Part 1 - in this post the author explains succinctly where the problem lies, although the example used is flawed because it contains hidden context (i.e. “August is a summer month…” is not true in general and needs the context “…for those in the Northern Hemisphere”, which can be modelled in RDF). The post also settles on named graphs as a solution but claims they cannot be used for continuous dimensions such as time (missing the solution of using something like OWL-Time to represent intervals and relative timings).
    * RDF and the Time Dimension Part 2 - in this follow-up post the author proposes reifying statements and adding a new predicate to relate the reified statement to the context (a hybrid of Approach 3 and Approach 4?). A second and preferred solution using named graphs is also presented. The author also demonstrates, under both approaches, how to obtain a “snapshot” of all triples that held true at a specific time.
    * OWL Time - an ontology of time concepts
    * Property Reification Vocabulary - a proposal for mapping between reified properties and classes that represent the reifications (i.e. between Approach 3 and Approach 4)
    * Temporal RDF
    * Music Ontology Events

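--  Author: Humphrey
--  Posted: 8/12/2009 11:20:00 AM

--
Wow, that is long. I once looked briefly at how several programming languages handle various kinds of objects, and time was always an important part. Some languages even devote a whole section to explaining how time is represented. Perhaps because I have never run into a time-sensitive problem myself, I am not sure what all this elaborate processing of time is really for.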
