See 3.3.5 Emulating container types: http://docs.python.org/ref/sequence-types.html#l2h-232
>>> from rdflib.graph import Graph
>>> listName = BNode()
>>> g = Graph('IOMemory')
>>> listItem1 = BNode()
>>> listItem2 = BNode()
>>> g.add((listName,RDF.first,Literal(1)))
>>> g.add((listName,RDF.rest,listItem1))
>>> g.add((listItem1,RDF.first,Literal(2)))
>>> g.add((listItem1,RDF.rest,listItem2))
>>> g.add((listItem2,RDF.rest,RDF.nil))
>>> g.add((listItem2,RDF.first,Literal(3)))
>>> c=Collection(g,listName)
>>> print list(c)
[rdflib.term.Literal(u'1', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')), rdflib.term.Literal(u'2', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')), rdflib.term.Literal(u'3', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer'))]
>>> 1 in c
True
>>> len(c)
3
>>> c._get_container(1) == listItem1
True
>>> c.index(Literal(2)) == 1
True
>>> from rdflib.graph import Graph
>>> listName = BNode()
>>> g = Graph()
>>> c=Collection(g,listName,[Literal(1),Literal(2)])
>>> links = [list(g.subjects(object=i,predicate=RDF.first))[0] for i in c]
>>> len([i for i in links if (i,RDF.rest,RDF.nil) in g])
1
>>> from rdflib.graph import Graph
>>> listName = BNode()
>>> g = Graph('IOMemory')
>>> listItem1 = BNode()
>>> listItem2 = BNode()
>>> g.add((listName,RDF.first,Literal(1)))
>>> g.add((listName,RDF.rest,listItem1))
>>> g.add((listItem1,RDF.first,Literal(2)))
>>> g.add((listItem1,RDF.rest,listItem2))
>>> g.add((listItem2,RDF.rest,RDF.nil))
>>> g.add((listItem2,RDF.first,Literal(3)))
>>> c=Collection(g,listName)
>>> print c.n3()
( "1"^^<http://www.w3.org/2001/XMLSchema#integer> "2"^^<http://www.w3.org/2001/XMLSchema#integer> "3"^^<http://www.w3.org/2001/XMLSchema#integer> )
A common class for representing query results in a variety of formats, namely:
- xml: as an XML string using the XML result format of the query language
- python: as Python objects
- json: as JSON
This module defines the different types of terms...
This module defines the parser plugin interface and contains other related parser support code.
The module is mainly useful for those who want to write a parser that can plug in to rdflib. If you want to invoke a parser, you likely want to do so through the Graph class parse method.
An rdflib graph event handler that indexes text literals that are added to another graph.
This class lets you ‘search’ the text literals in an RDF graph. Typically, to search for a substring in an RDF graph you would have to ‘brute force’ search every literal string looking for your substring.
Instead, this index stores the words in literals into another graph whose structure makes searching for terms much less expensive. It does this by chopping up the literals into words, removing very common words (currently only in English) and then adding each of those words into an RDF graph that describes the statements in the original graph that the word came from.
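The indexing idea described above can be sketched in plain Python, without rdflib. This is only an illustration under stated assumptions, not the TextIndex implementation: the names `STOP_WORDS`, `tokenize` and `build_index` are hypothetical, the stop-word list is a tiny sample, lowercasing is assumed, and a dict stands in for the index graph.

```python
# Hypothetical sketch of a word index; not rdflib's TextIndex code.
STOP_WORDS = {"the", "a", "an", "and", "of"}  # tiny illustrative sample


def tokenize(text):
    """Split a literal into lowercase words, dropping very common words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(triples):
    """Map each word to the set of (subject, predicate) pairs it came from."""
    index = {}
    for subject, predicate, literal in triples:
        for word in tokenize(literal):
            index.setdefault(word, set()).add((subject, predicate))
    return index


triples = [
    ("a", "title", "one two three"),
    ("b", "title", "two three four"),
    ("c", "title", "three four five"),
]
index = build_index(triples)
```

Looking up a word is then a single dict access (for example `index["three"]`) instead of a scan over every literal. As in the description above, the sketch records where each word came from but not the literal itself.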
First, let’s create a graph that will transmit events and a text index that will receive those events, and then subscribe the text index to the event graph:
>>> e = ConjunctiveGraph()
>>> t = TextIndex()
>>> t.subscribe_to(e)
When triples are added to the event graph (e) events will be fired that trigger event handlers in subscribers. In this case our only subscriber is a text index and its action is to index triples that contain literal RDF objects. Here are 3 such triples:
>>> e.add((URIRef('a'), URIRef('title'), Literal('one two three')))
>>> e.add((URIRef('b'), URIRef('title'), Literal('two three four')))
>>> e.add((URIRef('c'), URIRef('title'), Literal('three four five')))
Together, the three literal objects that were added contain five unique terms. These terms can be queried directly from the text index:
>>> t.term_strings() == set(['four', 'five', 'three', 'two', 'one'])
True
Now we can search for statements that contain certain terms. Let’s search for ‘one’, which occurs in only one of the literals provided, the title of ‘a’:
>>> t.search('one')==set([(URIRef('a'), URIRef('title'), None)])
True
‘one’ and ‘five’ only occur in one statement each, ‘two’ and ‘four’ occur in two, and ‘three’ occurs in three statements:
>>> len(list(t.search('one')))
1
>>> len(list(t.search('two')))
2
>>> len(list(t.search('three')))
3
>>> len(list(t.search('four')))
2
>>> len(list(t.search('five')))
1
Let’s add some more statements with different predicates.
>>> e.add((URIRef('a'), URIRef('creator'), Literal('michel')))
>>> e.add((URIRef('b'), URIRef('creator'), Literal('Atilla the one Hun')))
>>> e.add((URIRef('c'), URIRef('creator'), Literal('michel')))
>>> e.add((URIRef('d'), URIRef('creator'), Literal('Hun Mung two')))
Now ‘one’ occurs in two statements:
>>> assert len(list(t.search('one'))) == 2
And ‘two’ occurs in three statements. Here they are:
>>> t.search('two')==set([(URIRef('d'), URIRef('creator'), None), (URIRef('a'), URIRef('title'), None), (URIRef('b'), URIRef('title'), None)])
True
The predicates that are searched can be restricted by providing an argument to ‘search()’:
>>> t.search('two', URIRef('creator'))==set([(URIRef('d'), URIRef('creator'), None)])
True
>>> t.search('two', URIRef(u'title'))==set([(URIRef('a'), URIRef('title'), None), (URIRef('b'), URIRef('title'), None)])
True
You can search for more than one term by simply including it in the query:
>>> t.search('two three', URIRef(u'title'))==set([(URIRef('c'), URIRef('title'), None), (URIRef('a'), URIRef('title'), None), (URIRef('b'), URIRef('title'), None)])
True
The above query returns all the statements that contain ‘two’ OR ‘three’. For the statements that contain ‘two’ AND ‘three’, take the intersection of two queries:
>>> t.search('two', URIRef(u'title')).intersection(t.search(u'three', URIRef(u'title')))==set([(URIRef('a'), URIRef('title'), None), (URIRef('b'), URIRef('title'), None)])
True
Intersecting two queries like this is probably not the most efficient approach, but for reasonably sized data sets this isn’t a problem. For larger data sets, query the graph with SPARQL or something else more efficient.
In all the above queries, the object of each statement was always ‘None’. This is because the index graph does not store the object data: doing so would make it very large, and the data is already available in the original data graph. For convenience, a method is provided to ‘link’ an index graph to a data graph, which allows the index to also provide object data in query results.
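The OR and AND behaviour is ordinary set algebra over per-term results. A minimal sketch with plain Python sets follows. The `hits` dict and the `search_any` and `search_all` helpers are hypothetical stand-ins for illustration, not part of TextIndex.

```python
# Hypothetical per-term results, standing in for what a search on each
# single term would return; nothing here is produced by rdflib.
hits = {
    "two": {("a", "title"), ("b", "title")},
    "three": {("a", "title"), ("b", "title"), ("c", "title")},
}


def search_any(terms):
    """Statements containing at least one term (OR): union of result sets."""
    result = set()
    for term in terms:
        result |= hits.get(term, set())
    return result


def search_all(terms):
    """Statements containing every term (AND): intersection of result sets."""
    sets = [hits.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


either = search_any(["two", "three"])  # statements a, b and c
both = search_all(["two", "three"])    # statements a and b only
```

The intersection cannot be larger than the smallest per-term result, but every per-term result is still computed first, which is why this approach is best suited to modest data sets.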
>>> t.link_to(e)
>>> set([str(i[2]) for i in t.search('two', URIRef(u'title')).intersection(t.search(u'three', URIRef(u'title')))]) == set(['two three four', 'one two three'])
True
You can remove the link by assigning None:
>>> t.link_to(None)
Unindexing means removing statements from the index graph that correspond to a statement in the data graph. Note that while it is possible to remove the index information about the occurrences of terms in statements, it is not possible to remove the terms themselves: terms are ‘absolute’ and are never removed from the index graph. This is not a problem since languages have finite terms:
>>> e.remove((URIRef('a'), URIRef('creator'), Literal('michel')))
>>> e.remove((URIRef('b'), URIRef('creator'), Literal('Atilla the one Hun')))
>>> e.remove((URIRef('c'), URIRef('creator'), Literal('michel')))
>>> e.remove((URIRef('d'), URIRef('creator'), Literal('Hun Mung two')))
Now ‘one’ only occurs in one statement:
>>> assert len(list(t.search('one'))) == 1
And ‘two’ only occurs in two statements. Here they are:
>>> t.search('two')==set([(URIRef('a'), URIRef('title'), None), (URIRef('b'), URIRef('title'), None)])
True
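The unindexing behaviour, where occurrences are removed but terms stay, can be pictured with a plain dict. This is a sketch under assumptions, not the TextIndex implementation: the dict layout and the `unindex` helper are hypothetical.

```python
# Hypothetical dict-based index: term -> set of (subject, predicate) pairs.
index = {
    "one": {("a", "title"), ("b", "creator")},
    "two": {("a", "title"), ("b", "title"), ("d", "creator")},
}


def unindex(index, subject, predicate, words):
    """Remove a statement from each word's entry but keep the word itself."""
    for word in words:
        # Words missing from the index are skipped without error.
        index.get(word, set()).discard((subject, predicate))


unindex(index, "b", "creator", ["atilla", "the", "one", "hun"])
unindex(index, "d", "creator", ["hun", "mung", "two"])
```

After these calls ‘one’ and ‘two’ still exist as keys, and only the occurrences tied to the removed statements are gone. A key whose set becomes empty remains in the dict, which mirrors the statement that terms are never removed.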
The predicates that are searched can be restricted by providing an argument to ‘search()’:
>>> t.search('two', URIRef(u'creator'))
set([])
>>> t.search('two', URIRef(u'title'))==set([(URIRef('a'), URIRef('title'), None), (URIRef('b'), URIRef('title'), None)])
True
TODO: merge this first bit from sparql.sparql.py into rest of doc... updating all along the way.
SPARQL implementation on top of RDFLib
Implementation of the W3C SPARQL language (version April 2005). The basic class here is supposed to be a superclass of rdflib.sparql.sparqlGraph; it has been separated only for better maintainability.
There is a separate description for the functionalities.
For a general description of the SPARQL API, see the separate, more complete description.
The top level (__init__.py) module of the Package imports the important classes. In other words, the user may choose to use the following imports only:
from rdflibUtils import myTripleStore
from rdflibUtils import retrieveRDFFiles
from rdflibUtils import SPARQLError
from rdflibUtils import GraphPattern
The module imports and/or creates some frequently used Namespaces, and these can then be imported by the user like:
from rdflibUtils import ns_rdf
Finally, the package also has a set of convenience string constants for XML Schema datatypes (i.e., the URIs of the datatypes); for example, one can use:
from rdflibUtils import type_string
from rdflibUtils import type_integer
from rdflibUtils import type_long
from rdflibUtils import type_double
from rdflibUtils import type_float
from rdflibUtils import type_decimal
from rdflibUtils import type_dateTime
from rdflibUtils import type_date
from rdflibUtils import type_time
from rdflibUtils import type_duration
These are used, for example, in the sparql-p implementation.
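To illustrate what such convenience strings might look like, here is a sketch that assumes each constant is simply the datatype URI as a plain string. The constant names come from the import list above. The `XSD_NS` helper and the way the values are built are assumptions, not taken from the package source.

```python
# Assumption: each constant is the XML Schema datatype URI as a plain string.
XSD_NS = "http://www.w3.org/2001/XMLSchema#"  # hypothetical helper name

type_string = XSD_NS + "string"
type_integer = XSD_NS + "integer"
type_long = XSD_NS + "long"
type_double = XSD_NS + "double"
type_float = XSD_NS + "float"
type_decimal = XSD_NS + "decimal"
type_dateTime = XSD_NS + "dateTime"
type_date = XSD_NS + "date"
type_time = XSD_NS + "time"
type_duration = XSD_NS + "duration"
```

Under this assumption, `type_integer` is the same URI that appears as the datatype of integer literals such as `"1"^^<http://www.w3.org/2001/XMLSchema#integer>`.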
The three most important classes in RDFLib for the average user are Namespace, URIRef and Literal; these are also imported, so the user can also use, e.g.:
from rdflibUtils import Namespace, URIRef, Literal
- Version 1.0: based on an earlier version of SPARQL; first released implementation
- Version 2.0: version based on the March 2005 SPARQL document, also a major change of the core code (introduction of the separate GraphPattern class, rdflibUtils.graphPattern.GraphPattern, etc).
- Version 2.01: minor changes only:
  - switch to epydoc as a documentation tool; it gives a much better overview of the classes
  - addition of the SELECT * feature to sparql-p
- Version 2.02:
  - added some methods to myTripleStore (rdflibUtils.myTripleStore.myTripleStore) to handle Alt and Bag the same way as Seq
  - also added methods to add() collections and containers to the triple store, not only retrieve them
- Version 2.1: adapted to the inclusion of the code into rdflib, thanks to Michel Pelletier
- Version 2.2: added the sorting possibilities; introduced the Unbound class and have a better interface to patterns using this (in the BasicGraphPattern class)
@author: Ivan Herman
@license: This software is available for use under the W3C Software License
@contact: Ivan Herman, ivan@ivan-herman.net
@version: 2.2