Drupal and the Semantic Web

RDFa and vocabularies

Murray Woodman

cruncht.com

A bit about me

Coming to Drupal

I moved to Drupal because of its purported sem web capabilities. Thanks Alexandre Passant!

Hi Alexandre, I've just been converted to WordPress... however, I have run into the limitations ... with its ability to support sem web. I now see that there appears to be a lot of sem web momentum in the Drupal world. This post of yours has given me some inspiration to take a closer look at Drupal. RDF CCK, in particular, looks to be the kind of thing I was thinking about.

cheers Murray

Presentation overview

  1. Vision, hype and reality (15 min)
  2. Semantic Web overview (40 min + break)
  3. Structure in Drupal (20 min)
  4. Mapping custom fields (10 min)
  5. The future (10 min)

1. Vision, hype and reality

For the last 10-15 years or so the semantic web has mainly been an enclave for academics and enthusiasts.

Lots of good work has been done in that time (data model, serializations, triple stores, identity, linked data).

Big players, Drupal included, are starting to get on board with structured data.

The vision

TBL: 1994

To a computer, then, the web is a flat, boring world devoid of meaning. This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them. Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values. Only when we have this extra level of semantics will we be able to use computer power to help us exploit the information to a greater extent than our own reading.

Plenary at WWW Geneva 94: The Need for Semantics in the Web

Documents, objects and concepts

Plenary at WWW Geneva 94: The Need for Semantics in the Web

TBL: 1999

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A 'Semantic Web', which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The 'intelligent agents' people have touted for ages will finally materialize.

Mainstream pickup

Yahoo: May 15, 2008

Site owners share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction.

The Monkey is Out and the Challenge is On

Drupal: Oct 15, 2008

It is waiting to happen; we just have to connect the dots. That is, we have to make Drupal emit structured information.

Drupal, the semantic web and search

Google: May 12, 2009

Rich Snippets give users convenient summary information about their search results at a glance. We are currently supporting data about reviews and people... It's a simple change to the display of search results, yet our experiments have shown that users find the new data valuable - if they see useful and relevant information from the page, they are more likely to click through... As a webmaster, you can help by annotating your pages with structured data in a standard format.

Google Rich Snippets

Drupal: May 14, 2009

Two days ago, Google announced "Rich Snippets", a move that is sure to shake up the SEO industry, and cause hundreds of thousands of people to reconsider their skepticism of the semantic web. Yes, that probably includes many of you.

Structured data is the new search engine optimization

Facebook: Apr 21, 2010

The Open Graph protocol enables you to integrate your Web pages into the social graph. It is currently designed for Web pages representing profiles of real-world things — things like movies, sports teams, celebrities, and restaurants. Once your pages become objects in the graph, users can establish connections to your pages as they do with Facebook Pages. Based on the structured data you provide via the Open Graph protocol, your pages show up richly across Facebook: in user profiles, within search results and in News Feed.

Facebook Open Graph Protocol

Good News

Drupal 7 has RDFa built in!

Ordinary users don't have to sweat the details.

Immediate benefits for all

Goodies for advanced users

Drupal 7 has a lot of promise.

2. Semantic Web Overview

Current Web vs Sem Web

Current Web     Sem Web
URLs            URLs
Humans          Machines
Documents       Things
Text            Data
Presentation    Semantics
Prose           Properties
Links           Relationships

Problems with current web

...which can be solved by sem web...

Humans are smart agents

Humans easily parse and understand content in ways that computers cannot.

Let me out!

Structure exists in CMSs but trapped inside

Hundreds of thousands of Drupal sites contain vast amounts of structured data, covering an enormous range of topics, including product information. Unfortunately, that structure is hidden deep in Drupal's database and doesn't surface to the HTML code generated by Drupal.

Source: Dries, http://buytaert.net/drupal-the-semantic-web-and-search

Identity

Strings aren't unique, URIs are

Example: "set" (maths, art, sport, science)

Extracting meaning

Semantic web stack

Resource Description Framework

Data model, not a syntax

The triple is at the heart of RDF: Subject -> Predicate -> Object.

Subjects and Predicates are URIs. Objects are URIs or literals.

Very simple!
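
To make the triple concrete, here is a minimal sketch in Python (not Drupal code). The `URI` and `Triple` types, the `objects()` helper and the example.org addresses are all assumptions made up for illustration; only the subject/predicate/object shape comes from the slide.

```python
from typing import NamedTuple, Union


class URI(str):
    """Marks a string as a URI so it can be told apart from a literal."""


class Triple(NamedTuple):
    subject: URI               # always a URI
    predicate: URI             # always a URI
    obj: Union[URI, str]       # a URI or a plain literal


# Hypothetical resources, for illustration only.
DOC = URI("http://example.org/node/1")
AUTHOR = URI("http://example.org/user/1")
DC_TITLE = URI("http://purl.org/dc/terms/title")
SIOC_HAS_CREATOR = URI("http://rdfs.org/sioc/ns#has_creator")

graph = [
    Triple(DOC, DC_TITLE, "Drupal and the Semantic Web"),  # literal object
    Triple(DOC, SIOC_HAS_CREATOR, AUTHOR),                 # URI object
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


title = objects(graph, DOC, DC_TITLE)[0]
creator = objects(graph, DOC, SIOC_HAS_CREATOR)[0]
```

A graph is just a collection of such statements; the only distinction the data model needs is whether an object is another resource (a URI) or a literal value.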

RDF serializations

URIs and URLs

Resources and URLs

What's the big deal?

Q: Who's doing it now?

Credit: Richard Cyganiak and Anja Jentzsch, About the Linking Open Data dataset cloud

Q: Who will be doing it?

A: Drupal 7 sites

Your site becomes part of the Giant Global Graph.

Ontology

RDF lets you make statements about things, but how do we know what they mean?

An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.

en.wikipedia.org/wiki/Ontology_(information_science)

Ontology is a foundation

Ontology performs a number of important functions.

RDF Schema

[RDF Schema] provides the facilities needed to describe such classes and properties, and to indicate which classes and properties are expected to be used together. In other words, RDF Schema provides a type system for RDF.

Source: RDF Primer

RDFS: Classes

ex:MotorVehicle rdf:type rdfs:Class .
exthings:companyCar rdf:type ex:MotorVehicle .

ex:Van rdf:type rdfs:Class .
ex:Van rdfs:subClassOf ex:MotorVehicle .

Source: RDF Primer
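
What rdfs:subClassOf buys you can be sketched in a few lines of Python. This is a toy reasoner under stated assumptions, not a real RDFS engine: triples are plain tuples of prefixed names, and `exthings:deliveryVan` is a hypothetical instance added for the example.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:MotorVehicle", RDF_TYPE, "rdfs:Class"),
    ("ex:Van", RDF_TYPE, "rdfs:Class"),
    ("ex:Van", RDFS_SUBCLASS_OF, "ex:MotorVehicle"),
    ("exthings:deliveryVan", RDF_TYPE, "ex:Van"),  # hypothetical instance
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(resource, triples):
    """Stated rdf:type values plus the ones implied by rdfs:subClassOf."""
    stated = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set(stated)
    for cls in stated:
        inferred |= superclasses(cls, triples)
    return inferred


van_types = types_of("exthings:deliveryVan", triples)
```

Nobody stated that the delivery van is a motor vehicle; the schema lets a machine work that out.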

RDFS: Properties

ex:Person rdf:type rdfs:Class .
ex:Book rdf:type rdfs:Class .

ex:author rdf:type rdf:Property .
ex:author rdfs:range ex:Person .
ex:author rdfs:domain ex:Book .

Source: RDF Primer
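
Domain and range work the same way: they let a consumer infer types from the use of a property. A rough Python sketch, assuming tuple triples, a single domain and range per property, and two hypothetical resources (`exthings:book1`, `exthings:alice`):

```python
triples = [
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
    ("exthings:book1", "ex:author", "exthings:alice"),  # hypothetical data
]


def infer_types(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range."""
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            # Whatever uses the property as subject belongs to its domain.
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:
            # Whatever appears as the object belongs to its range.
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred


inferred = infer_types(triples)
```

Note that domain and range are not validation rules; they add knowledge rather than reject data.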

RDFa

RDF in attributes

RDF can be embedded in XHTML and still validate against the XHTML+RDFa DTD.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

Doesn't break rendering in HTML 4 documents.

Attributes used by RDFa

  • about and src: URI specifying the subject resource
  • rel and rev: specifying a (reverse) relationship
  • href and resource: specifying the object resource
  • property: specifying a property for the content of an element
  • content: optional, overrides the content of the element
  • datatype: optional, specifies the datatype of text
  • typeof: optional, specifies the RDF type
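
To show how these attributes turn markup into triples, here is a deliberately tiny extractor built on Python's standard `html.parser`. It is a sketch, not a conforming RDFa processor: it ignores `src`, `rev` and `datatype`, treats each attribute as holding a single value, does not create blank nodes, and leaves CURIEs unexpanded. The class name `TinyRDFa`, the sample markup and the example.org URIs are assumptions made up for the example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}

SAMPLE = """
<div about="http://example.org/node/1" typeof="sioc:Item">
  <h2 property="dc:title">Hello RDFa</h2>
  <span property="dc:created" content="2010-06-01T10:00:00+10:00">1 June 2010</span>
  <a rel="sioc:has_creator" href="http://example.org/user/1">admin</a>
</div>
"""


class TinyRDFa(HTMLParser):
    def __init__(self, base_uri):
        super().__init__()
        self.subjects = [base_uri]  # current subject for each open element
        self.pending = None         # (subject, property, depth) awaiting text
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # about sets a new subject; otherwise inherit the parent's subject.
        subject = a.get("about") or self.subjects[-1]
        if a.get("typeof"):
            self.triples.append((subject, "rdf:type", a["typeof"]))
        # rel plus resource/href states a relationship to another resource.
        target = a.get("resource") or a.get("href")
        if a.get("rel") and target:
            self.triples.append((subject, a["rel"], target))
        if tag not in VOID_TAGS:
            self.subjects.append(subject)
        if a.get("property"):
            if a.get("content") is not None:
                # content overrides the element's visible text.
                self.triples.append((subject, a["property"], a["content"]))
            elif tag not in VOID_TAGS:
                self.pending = (subject, a["property"], len(self.subjects))
                self.text = []

    def handle_data(self, data):
        if self.pending is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.pending is not None and self.pending[2] == len(self.subjects):
            subject, prop, _ = self.pending
            self.triples.append((subject, prop, "".join(self.text).strip()))
            self.pending = None
        if len(self.subjects) > 1:
            self.subjects.pop()


parser = TinyRDFa("http://example.org/")
parser.feed(SAMPLE)
parser.close()
```

After parsing, `parser.triples` holds one statement per annotated element, all about the node named in `about`. The human-readable date stays in the page while the machine-readable value travels in `content`.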

Datatypes

Specified as the datatype attribute in RDFa.

XML Schema namespace included with page.
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"

XML Schema datatypes match up well with Drupal field types.

Selected datatypes in XML Schema

  • string
  • boolean, decimal, float, double
  • duration
  • dateTime, time, date
  • anyURI
  • QName
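
As a rough illustration of how native values line up with these datatypes, the Python sketch below picks a lexical form and an `xsd:` datatype for a few value types. The function name `to_typed_literal` and the choice of supported types are assumptions for the example, not part of Drupal or of any RDF library.

```python
from datetime import date, datetime, timezone
from decimal import Decimal


def to_typed_literal(value):
    """Return (lexical form, xsd datatype) for a supported Python value."""
    if isinstance(value, bool):
        return ("true" if value else "false", "xsd:boolean")
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return (value.isoformat(), "xsd:dateTime")
    if isinstance(value, date):
        return (value.isoformat(), "xsd:date")
    if isinstance(value, Decimal):
        # Fixed-point formatting avoids exponent notation such as 1E+2.
        return (format(value, "f"), "xsd:decimal")
    if isinstance(value, str):
        return (value, "xsd:string")
    raise TypeError(f"no datatype chosen for {type(value).__name__}")


created = to_typed_literal(datetime(2010, 6, 1, 10, 0, 0, tzinfo=timezone.utc))
price = to_typed_literal(Decimal("19.95"))
```

The lexical form is what would go into the `content` attribute, and the datatype into the `datatype` attribute.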

Questions and Discussion

Review of sem web material.

3. Drupal 7 structure

  • Node metadata
  • Node content types
  • Users
  • Vocabularies
  • Terms
  • Comments

Vocabularies used in Drupal 7

  • rdf: RDF
  • dc: Dublin Core
  • foaf: Friend Of A Friend
  • sioc: Semantically Interlinked Online Communities
  • skos: Simple Knowledge Organisation System
  • rdfs: RDF Schema
  • content: Content

Dublin Core

Basic set of metadata elements which describe most resources.

  • dc:title
  • dc:created
  • dc:date
  • dc:modified

Friend Of A Friend

Describes people, their activities and relationships.

  • foaf:Document
  • foaf:page
  • foaf:name

Semantically Interlinked Online Communities

Interconnects discussion methods such as blogs, forums and mailing lists to each other.

  • sioc:Item
  • sioc:Post
  • sioc:Comment
  • sioc:reply_of
  • sioc:has_creator
  • sioc:num_replies
  • sioc:topic

Simple Knowledge Organisation System

Representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.

  • skos:Concept
  • skos:Collection
  • skos:member
  • skos:prefLabel
  • skos:definition
  • skos:broader

RDF Schema

Provides basic elements for the description of ontologies.

  • rdfs:Class
  • rdf:Property
  • rdfs:label
  • rdfs:comment
  • rdf:type

Content

A module for the actual content of websites, in multiple formats.

  • content:encoded

Namespaces imported into HTML element

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="XHTML+RDFa 1.0" dir="ltr"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:rss="http://purl.org/rss/1.0/"
  xmlns:sioc="http://rdfs.org/sioc/ns#"
  xmlns:sioct="http://rdfs.org/sioc/types#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
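
These prefixes are what let short names like dc:title stand in for full URIs. A small Python sketch of the expansion a consumer performs, using a subset of the namespaces declared above; the function name `expand_curie` is an assumption for the example.

```python
# A subset of the prefixes declared on the html element above.
NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, namespaces=NAMESPACES):
    """Expand a prefix:local name into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown prefix in {curie!r}")
    return namespaces[prefix] + local


title_uri = expand_curie("dc:title")
item_uri = expand_curie("sioc:Item")
```

A prefix that has not been declared cannot be expanded, which is why a custom vocabulary has to register its namespace before its terms are used in a mapping.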

Default semantics review

Credit: Stéphane Corlosquet (scor), RDFa in Drupal 7: last call for feedback before alpha release

Default mappings

The RDF mappings defined for node.

array(
  'type' => 'node',
  'bundle' => RDF_DEFAULT_BUNDLE,
  'mapping' => array(
    'rdftype' => array('sioc:Item', 'foaf:Document'),
    'title' => array('predicates' => array('dc:title'),),
    'created' => array('predicates' => array('dc:date', 'dc:created'), 'datatype' => 'xsd:dateTime', 'callback' => 'date_iso8601',),
    'body' => array('predicates' => array('content:encoded'),),
    'uid' => array('predicates' => array('sioc:has_creator'),),
    'name' => array('predicates' => array('foaf:name'),),
  ),
);
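
To see what a mapping like this does, here is a Python imitation of the idea (not Drupal's implementation). It mirrors part of the array above as a dict, runs the callback on the raw value, and emits one statement per predicate. The names `node_to_triples` and `NODE_MAPPING`, the sample node and the example.org URI are assumptions; the Python `date_iso8601` stand-in formats in UTC, which may differ from the timezone Drupal would use.

```python
from datetime import datetime, timezone


def date_iso8601(timestamp):
    """Stand-in for the callback: Unix timestamp to ISO 8601, in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


# Mirrors part of the node mapping; uid and name are left out for brevity.
NODE_MAPPING = {
    "rdftype": ["sioc:Item", "foaf:Document"],
    "title": {"predicates": ["dc:title"]},
    "created": {
        "predicates": ["dc:date", "dc:created"],
        "datatype": "xsd:dateTime",
        "callback": date_iso8601,
    },
    "body": {"predicates": ["content:encoded"]},
}


def node_to_triples(uri, node, mapping):
    """Return (subject, predicate, object, datatype) tuples for a node."""
    out = [(uri, "rdf:type", t, None) for t in mapping.get("rdftype", [])]
    for field, spec in mapping.items():
        if field == "rdftype" or field not in node:
            continue
        value = node[field]
        if "callback" in spec:
            value = spec["callback"](value)  # e.g. timestamp to ISO 8601
        for predicate in spec["predicates"]:
            out.append((uri, predicate, value, spec.get("datatype")))
    return out


node = {"title": "Hello RDFa", "created": 0}  # hypothetical node
statements = node_to_triples("http://example.org/node/1", node, NODE_MAPPING)
```

One field can feed several predicates: the created timestamp is published as both dc:date and dc:created, each typed as xsd:dateTime.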

Questions and Discussion

4. User defined structure

Fields (formerly CCK) allow admins to define structure.

Node fields to RDF

  • node -> RDF resource
  • content type -> RDF class
  • field -> RDF property
  • Custom vocab namespace provided in mapping.

Field Mapping: Add namespace

    hook_rdf_namespaces() adds new namespaces to the page, if required.

    function demo_rdf_namespaces() {
      return array(
        'demo'    => 'http://demo.example.com/schema#',
      );
    }
    

    Field Mapping: Add namespace result

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="XHTML+RDFa 1.0" dir="ltr"
      xmlns:demo="http://demo.example.com/schema#"
      xmlns:content="http://purl.org/rss/1.0/modules/content/"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
      xmlns:rss="http://purl.org/rss/1.0/"
      xmlns:sioc="http://rdfs.org/sioc/ns#"
      xmlns:sioct="http://rdfs.org/sioc/types#"
      xmlns:skos="http://www.w3.org/2004/02/skos/core#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
    

    Vocabulary reuse

    Don't reinvent the wheel! Look around for pre-existing ontologies.

    Geo, calendars, contacts, bio, creative commons, music, provenance, business, trust, programmes, reviews, resumes, recipes, projects, etc, etc, etc.

    Reuse leads to interoperability!

    Vocab candidates

    Recommend looking to:

    • Yahoo SearchMonkey
    • Google Rich Snippets
    • Linked data: DBpedia, Freebase, IMDB

    Yahoo SearchMonkey

    Recommends the following vocabularies:

    • Dublin Core
    • FOAF
    • vCard, vCalendar
    • Good Relations
    • hReview
    • SIOC
    • DBpedia, Freebase

    Source: http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html

    Google Rich Snippets

    Supports RDFa for:

    • Reviews
    • People
    • Businesses
    • Events
    • Recipes

    http://rdf.data-vocabulary.org/#

    Field Mapping: hook_rdf_mapping()

    hook_rdf_mapping() allows mapping:

    • rdf type
    • properties
      • datatype
      • property type
      • content via callback

    RDF Mapping example

    function demo_rdf_mapping() {
      return array(
        array(
          'type' => 'node',
          'bundle' => 'person',
          'mapping' => array(
            'rdftype' => array('demo:Person'),
          ),
        ),
      );
    }
    

    RDF Mapping UI

    A contributed module will allow these mappings to be made in the UI as well as in module code.

    View source!

    Let's review the output from Drupal 7 for a demo page about John Doe with one comment.

    Questions and Discussion

    Review of mapping fields.

    5. The Future

    More RDF produced

    • 300K sites on Drupal 6
    • Lots of sites publishing RDF
    • Drupal will be a key part of the semantic web

    Not just stable linked data resources, but also dynamic resources that discuss and converse about those resources.

    Chicken and Egg

    Consumption needs Production needs Consumption needs...

    RDF is being produced now. More apps will start consuming.

    Drupal 7 is the start of something big.

    More RDF consumed

    • HTML pages can be parsed and queried
    • SPARQL endpoints queried

    SPARQL

    • Like SQL for the semantic web
    • Queries run against an endpoint (URL)
    • Allows more efficient gathering of data
    • SPARQL module for Drupal

    SPARQL example

    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?book ?who
    WHERE { ?book dc:creator ?who }
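
What the query above asks for can be imitated in a few lines of Python: match one triple pattern against a set of triples and collect the variable bindings. This is a sketch of the idea only, with no SPARQL parsing and no endpoint; the `match` helper and the sample book data are assumptions made up for the example.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical data, for illustration only.
triples = [
    ("http://example.org/book/1", DC + "creator", "Alice"),
    ("http://example.org/book/2", DC + "creator", "Bob"),
    ("http://example.org/book/1", DC + "title", "RDF Basics"),
]


def match(pattern, triples):
    """Return one binding dict per triple that matches the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Equivalent of: SELECT ?book ?who WHERE { ?book dc:creator ?who }
rows = match(("?book", DC + "creator", "?who"), triples)
```

Each row binds ?book and ?who for one matching statement; the title triple is skipped because its predicate does not match.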

    That's it!

    Drupal 7, the Semantic Web and RDFa

    Murray Woodman

    cruncht.com

    Slideshow (will be) available at:

    http://cruncht.com/slides/drupal-semantic-web.

    Further Reading