Getting started with rdflib

Introduction to parsing RDF into rdflib graphs

Reading an NT file

RDF data has various syntaxes (xml, n3, ntriples, trix, etc) that you might want to read. The simplest format is ntriples. Create the file demo.nt in the current directory with these two lines:

<http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .

In an interactive python interpreter, try this:

>>> from rdflib.graph import Graph
>>> g = Graph()
>>> g.parse("demo.nt", format="nt")
<Graph identifier=HCbubHJy0 (<class 'rdflib.graph.Graph'>)>
>>> len(g)
2
>>> import pprint
>>> for stmt in g:
...     pprint.pprint(stmt)
...
(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://example.com/says'),
 rdflib.term.Literal(u'Hello world'))
(rdflib.term.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.term.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
 rdflib.term.URIRef('http://xmlns.com/foaf/0.1/Person'))

The final lines show how rdflib represents the two statements in the file. The statements themselves are just length-3 tuples, and the subjects, predicates, and objects are all rdflib types.
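To make the "length-3 tuple" shape concrete, here is a minimal sketch that splits the two lines of demo.nt into (subject, predicate, object) tuples with a regular expression. It is an illustration only, not rdflib's parser: the names TRIPLE_RE, parse_simple_ntriples and DEMO_NT are hypothetical, it handles only the IRI and plain-literal forms used in demo.nt, and it returns plain strings where rdflib would return URIRef and Literal objects.

```python
import re

# Matches only the simple forms used in demo.nt:
#   <iri> <iri> <iri> .    or    <iri> <iri> "plain literal" .
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)

def parse_simple_ntriples(text):
    """Return a list of (subject, predicate, object) string tuples."""
    triples = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = TRIPLE_RE.match(line)
        if m is None:
            raise ValueError("unsupported N-Triples line: %r" % line)
        s, p, o_iri, o_literal = m.groups()
        obj = o_iri if o_iri is not None else o_literal
        triples.append((s, p, obj))
    return triples

DEMO_NT = (
    '<http://bigasterisk.com/foaf.rdf#drewp> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://xmlns.com/foaf/0.1/Person> .\n'
    '<http://bigasterisk.com/foaf.rdf#drewp> '
    '<http://example.com/says> "Hello world" .\n'
)

demo_triples = parse_simple_ntriples(DEMO_NT)
for subject, predicate, obj in demo_triples:
    print("%s | %s | %s" % (subject, predicate, obj))
```

Each statement must sit on a single line for this to work, which is also how the N-Triples format itself is defined.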

Reading remote graphs

Reading graphs from the net is just as easy:

>>> g.parse("http://bigasterisk.com/foaf.rdf")
>>> len(g)
42

The format defaults to xml, which is the common format for .rdf files you’ll find on the net.

See also

Graph.parse(source=None, publicID=None, format=None, location=None, file=None, data=None, **args)

Parse source adding the resulting triples to the Graph.

The source is specified using one of source, location, file or data.

Parameters:
  • source: An InputSource, file-like object, or string. In the case of a string the string is the location of the source.
  • location: A string indicating the relative or absolute URL of the source. Graph’s absolutize method is used if a relative location is specified.
  • file: A file-like object.
  • data: A string containing the data to be parsed.
  • format: Used if the format cannot be determined from source. Defaults to rdf/xml.
  • publicID: The logical URI to use as the document base. If None is specified, the document location is used (at least in the case where there is a document location).
Returns:

self, the graph instance.

Examples:

>>> my_data = '''
... <rdf:RDF
...   xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
...   xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
... >
...   <rdf:Description>
...     <rdfs:label>Example</rdfs:label>
...     <rdfs:comment>This is really just an example.</rdfs:comment>
...   </rdf:Description>
... </rdf:RDF>
... '''
>>> import tempfile
>>> file_name = tempfile.mktemp()
>>> f = open(file_name, "w")
>>> f.write(my_data)
>>> f.close()
>>> g = Graph()
>>> result = g.parse(data=my_data, format="application/rdf+xml")
>>> len(g)
2
>>> g = Graph()
>>> result = g.parse(location=file_name, format="application/rdf+xml")
>>> len(g)
2
>>> g = Graph()
>>> result = g.parse(file=open(file_name, "r"), format="application/rdf+xml")
>>> len(g)
2

Other parsers supported by rdflib

class rdflib.syntax.parsers.N3Parser.N3Parser
class rdflib.syntax.parsers.NTParser.NTParser

N-Triples Parser
License: GPL 2, W3C, BSD, or MIT
Author: Sean B. Palmer, inamidst.com
Documentation: http://inamidst.com/proj/rdf/ntriples-doc

Command line usage:

./ntriples.py <URI>    - parses URI as N-Triples
./ntriples.py --help   - prints out this help message

class rdflib.syntax.parsers.ntriples.NTriplesParser(sink=None)

An N-Triples Parser.

Usage:

p = NTriplesParser(sink=MySink())
sink = p.parse(f) # file; use parsestring for a string

parse(f)

Parse f as an N-Triples file.

parsestring(s)

Parse s as an N-Triples string.

readline()

Read an N-Triples line from buffered input.
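The usage snippet for NTriplesParser passes a MySink instance without defining it. Below is a minimal sketch of what such a sink might look like, assuming the parser hands each parsed statement to a triple(s, p, o) method on the sink; that method name is an assumption here, not something the text above states. ListSink and feed are hypothetical names, and feed merely stands in for the parser so that the example runs without rdflib installed.

```python
class ListSink(object):
    """Hypothetical sink that collects every triple it is handed."""

    def __init__(self):
        self.triples = []

    def triple(self, s, p, o):
        # Assumed callback: invoked once per parsed statement.
        self.triples.append((s, p, o))


def feed(sink, statements):
    """Stand-in for a parser: push already-split statements into sink."""
    for s, p, o in statements:
        sink.triple(s, p, o)
    return sink


collected = feed(
    ListSink(),
    [
        ("http://bigasterisk.com/foaf.rdf#drewp",
         "http://example.com/says",
         "Hello world"),
    ],
)
print(len(collected.triples))
```

Keeping the sink separate from the parser means the same parsing code could count triples, print them, or load them into a graph, depending on which sink is supplied.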

class rdflib.syntax.parsers.RDFXMLParser.RDFXMLParser

class rdflib.syntax.parsers.TriXParser.TriXParser

A parser for TriX. See http://swdev.nokia.com/trix/TriX.html

Introduction to using SPARQL to query an rdflib graph

Create an Rdflib Graph

You might parse some files into a new graph (see Introduction to parsing RDF into rdflib graphs) or open an on-disk rdflib store.

from rdflib.graph import Graph
g = Graph()
g.parse("http://bigasterisk.com/foaf.rdf")
g.parse("http://www.w3.org/People/Berners-Lee/card.rdf")

LiveJournal produces FOAF data for their users, but they seem to use foaf:member_name for a person’s full name. For this demo, I made foaf:name act as a synonym for foaf:member_name (a poor man’s one-way owl:equivalentProperty):

from rdflib.namespace import Namespace
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
g.parse("http://danbri.livejournal.com/data/foaf")
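# Copy each foaf:member_name value to foaf:name. list() snapshots the
# matches so the graph is not modified while it is being iterated.
for s, _, n in list(g.triples((None, FOAF['member_name'], None))):
    g.add((s, FOAF['name'], n))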
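The synonym trick relies on a triple pattern in which None acts as a wildcard. The following sketch reproduces that idea on a plain Python set of tuples so the mechanics are visible without any network access. It is not rdflib's implementation: match, FOAF_NS, toy_graph and the example.org subject are hypothetical names and data chosen for illustration.

```python
def match(triples, pattern):
    """Yield every triple that fits pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            yield triple


FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Toy stand-in for a graph: a set of (subject, predicate, object) tuples.
toy_graph = {
    ("http://example.org/alice", FOAF_NS + "member_name", "Alice Example"),
    ("http://example.org/alice", FOAF_NS + "nick", "al"),
}

# Snapshot the matches first, then add the foaf:name copies, so the set
# is never modified while it is being iterated.
for s, _, n in list(match(toy_graph, (None, FOAF_NS + "member_name", None))):
    toy_graph.add((s, FOAF_NS + "name", n))

print(len(toy_graph))
```

The copy is one-way: later changes to foaf:member_name values are not reflected in foaf:name unless the loop is run again.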

Run a Query

for row in g.query(
        """SELECT ?aname ?bname
           WHERE {
              ?a foaf:knows ?b .
              ?a foaf:name ?aname .
              ?b foaf:name ?bname .
           }""",
        initNs=dict(foaf=Namespace("http://xmlns.com/foaf/0.1/"))):
    print("%s knows %s" % row)

The results are tuples of values in the same order as your SELECT arguments.

Timothy Berners-Lee knows Edd Dumbill
Timothy Berners-Lee knows Jennifer Golbeck
Timothy Berners-Lee knows Nicholas Gibbins
Timothy Berners-Lee knows Nigel Shadbolt
Dan Brickley knows binzac
Timothy Berners-Lee knows Eric Miller
Drew Perttula knows David McClosky
Timothy Berners-Lee knows Dan Connolly
...

Namespaces

The Graph.query() initNs argument is a dictionary of namespaces to be expanded in the query string. In a large program, it’s common to use the same dict for every single query. You might even hack your graph instance so that the initNs arg is already filled in.

If someone knows how to use the empty prefix (e.g. “?a :knows ?b”), please write about it here and in the Graph.query() docs.

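Ewan Klein provides an answer: use BASE at the top of the query to set a default namespace. Relative IRIs written in angle brackets (e.g. <knows>) are then resolved against it: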

BASE <http://xmlns.com/foaf/0.1/>
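Standard SPARQL can also carry namespace declarations inside the query text itself, as PREFIX lines, and the empty prefix is declared there as "PREFIX : <...>". The sketch below builds such a header from the same kind of dictionary that is passed as initNs. The helper name with_prefixes is hypothetical, this is not how rdflib implements initNs, and whether a particular rdflib version accepts the resulting query string has not been checked here.

```python
def with_prefixes(query, namespaces):
    """Return query with one PREFIX line per namespaces entry prepended."""
    header = "".join(
        "PREFIX %s: <%s>\n" % (prefix, uri)
        for prefix, uri in sorted(namespaces.items())
    )
    return header + query


NS = {"foaf": "http://xmlns.com/foaf/0.1/"}
QUERY = with_prefixes("SELECT ?name WHERE { ?p foaf:name ?name }", NS)

# An empty-string key yields "PREFIX : <...>", the standard SPARQL
# declaration for the empty prefix used in "?a :knows ?b".
EMPTY_PREFIX_QUERY = with_prefixes(
    "SELECT ?a ?b WHERE { ?a :knows ?b }",
    {"": "http://xmlns.com/foaf/0.1/"},
)

print(QUERY)
print(EMPTY_PREFIX_QUERY)
```

Keeping one shared dictionary and one helper like this is a lighter alternative to patching the graph instance when every query in a program needs the same prefixes.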

Bindings

Just like with SQL queries, it’s common to run the same query many times with only a few terms changing. rdflib calls this initBindings:

from rdflib.namespace import Namespace
from rdflib.term import URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
ns = dict(foaf=FOAF)
drew = URIRef('http://bigasterisk.com/foaf.rdf#drewp')
for row in g.query("""SELECT ?name
                      WHERE { ?p foaf:name ?name }""",
                   initNs=ns, initBindings={'?p' : drew}):
    print(row)

Output:

(rdflib.Literal('Drew Perttula', language=None, datatype=None),)
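To show what an initial binding does, here is a minimal sketch that evaluates a single triple pattern against a list of plain tuples, with some variables fixed in advance. It is a toy, not rdflib's query engine: solve, PEOPLE, the constants, and the example.org entry are hypothetical, and terms are plain strings where a leading "?" marks a variable.

```python
def solve(triples, pattern, init_bindings=None):
    """Yield one bindings dict per triple that fits pattern.

    Pattern terms starting with '?' are variables; any variable already
    present in init_bindings must match the value found in the triple.
    """
    for triple in triples:
        bindings = dict(init_bindings or {})
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject on a conflicting binding.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


NAME = "http://xmlns.com/foaf/0.1/name"
DREW = "http://bigasterisk.com/foaf.rdf#drewp"
OTHER = "http://example.org/people#other"  # hypothetical second person

PEOPLE = [
    (DREW, NAME, "Drew Perttula"),
    (OTHER, NAME, "Some Other Person"),
]

# Without initial bindings both people match; with ?p fixed, only one does.
everyone = list(solve(PEOPLE, ("?p", NAME, "?name")))
only_drew = list(solve(PEOPLE, ("?p", NAME, "?name"), {"?p": DREW}))
print(only_drew[0]["?name"])
```

The query text stays constant while only the bindings dictionary changes between runs, which is the same separation that prepared statements give SQL.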

Graph.query(strOrQuery, initBindings={}, initNs={}, DEBUG=False, PARSE_DEBUG=False, dataSetBase=None, processor='sparql', extensionFunctions={rdflib.term.URIRef('http://www.w3.org/TR/rdf-sparql-query/#describe'): <function describe at 0x15ab130>})

Executes a SPARQL query (eventually will support Versa queries with same method) against this Graph.

  • strOrQuery: Either a string consisting of the SPARQL query or an instance of rdflib.sparql.bison.Query.Query.
  • initBindings: A mapping from a Variable to an RDFLib term (used as initial bindings for the SPARQL query).
  • initNs: A mapping from a namespace prefix to an instance of rdflib.Namespace (used for the SPARQL query).
  • DEBUG: A boolean flag passed on to the SPARQL parser and evaluation engine.
  • processor: The kind of RDF query (must be ‘sparql’ until Versa is ported).
  • USE_PYPARSING: A flag indicating whether to use the experimental pyparsing parser for SPARQL.

Store operations

Example code to create a MySQL triple store, add some triples, and serialize the resulting graph.

import rdflib
from rdflib.graph import ConjunctiveGraph as Graph
from rdflib import plugin
from rdflib.store import Store, NO_STORE, VALID_STORE
from rdflib.namespace import Namespace
from rdflib.term import Literal
from rdflib.term import URIRef

default_graph_uri = "http://rdflib.net/rdfstore"
configString = "host=localhost,user=username,password=password,db=rdfstore"

# Get the mysql plugin. You may have to install the python mysql libraries
store = plugin.get('MySQL', Store)('rdfstore')

# Open previously created store, or create it if it doesn't exist yet
rt = store.open(configString,create=False)
if rt == NO_STORE:
    # There is no underlying MySQL infrastructure, create it
    store.open(configString,create=True)
else:
    assert rt == VALID_STORE, "The underlying store is corrupted"

# There is a store, use it
graph = Graph(store, identifier = URIRef(default_graph_uri))

print("Triples in graph before add: %d" % len(graph))

# Now we'll add some triples to the graph & commit the changes
# Named rdflib_ns so that it does not shadow the imported rdflib module
rdflib_ns = Namespace('http://rdflib.net/test/')
graph.add((rdflib_ns['pic:1'], rdflib_ns['name'], Literal('Jane & Bob')))
graph.add((rdflib_ns['pic:2'], rdflib_ns['name'], Literal('Squirrel in Tree')))
graph.commit()

print("Triples in graph after add: %d" % len(graph))

# display the graph in RDF/XML
print(graph.serialize())