<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The Uriverse Experiment is Over</title>
	<atom:link href="http://cruncht.com/618/uriverse-experiment-over/feed/" rel="self" type="application/rss+xml" />
	<link>http://cruncht.com/618/uriverse-experiment-over/</link>
	<description>Semantic web development and publishing</description>
	<lastBuildDate>Mon, 07 May 2012 10:57:13 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
	<item>
		<title>By: Murray Woodman</title>
		<link>http://cruncht.com/618/uriverse-experiment-over/#comment-19984</link>
		<dc:creator>Murray Woodman</dc:creator>
		<pubDate>Wed, 15 Feb 2012 13:38:15 +0000</pubDate>
		<guid isPermaLink="false">http://cruncht.com/?p=618#comment-19984</guid>
		<description>Hi Alex. 

I wrote a bit about how I imported it over this way: http://cruncht.com/361/uriverse-dbpedia-drupal-case-study/

It was a labour of love which took a lot of SQL hacking to get to work. I could not afford to do node_loads and node_saves for 13M nodes so I opted to do the DB inserts directly. That way I could get 1000s of inserts a second rather than 10. I learnt a lot about MySQL, indexes, efficient queries and how to import stuff as quickly as possible. Also, how to recover from where you left off when a long running process died. All up it took a couple of months of tooling around with code and SQL. Once all of it was in it took another couple of months for Solr to index it all :)
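
A minimal sketch of the resume-where-you-left-off idea, not the actual Uriverse import code. It assumes Python with the standard sqlite3 module standing in for MySQL; the node and import_state tables and the import_in_batches helper are hypothetical names made up for illustration. Each batch of rows is committed in the same transaction as its checkpoint, so a run that died can simply be started again and will skip what is already in.

```python
import sqlite3


def _flush(conn, batch):
    # One transaction per batch: the rows and the checkpoint commit together,
    # or neither does if the process dies part-way through.
    with conn:
        conn.executemany("INSERT INTO node (id, title) VALUES (?, ?)", batch)
        conn.execute(
            "INSERT OR REPLACE INTO import_state (k, last_id) VALUES ('node', ?)",
            (batch[-1][0],),
        )
    return len(batch)


def import_in_batches(conn, rows, batch_size=1000):
    """Bulk-insert (id, title) rows sorted by ascending id, resuming after
    the last id recorded by a previous run. Returns the number inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node (id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_state (k TEXT PRIMARY KEY, last_id INTEGER)"
    )
    row = conn.execute(
        "SELECT last_id FROM import_state WHERE k = 'node'"
    ).fetchone()
    last_id = row[0] if row else 0

    batch = []
    inserted = 0
    for rec in rows:
        if last_id >= rec[0]:
            continue  # already imported by an earlier run
        batch.append(rec)
        if len(batch) >= batch_size:
            inserted += _flush(conn, batch)
            batch = []
    if batch:
        inserted += _flush(conn, batch)
    return inserted
```

Calling import_in_batches a second time with the same sorted rows inserts only the ones after the stored checkpoint. A real Drupal import would write to several tables per node, so the checkpoint would need to cover all of them.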

I wouldn&#039;t recommend this approach as it is a one hit import which takes a lot of effort.

If you have a smaller dataset then you have a few options. 

Firstly, the migrate module could be a good way to go if you have complex relationships and IDs to maintain. I really want to get into this big time one of these days. Imports will still be slow but you have the most flexibility. 

Secondly, you have the feeds module. Good if you have a flat data structure and one to one mappings. You might be able to get away with crafting up a CSV and importing that way. I believe that Lin Clark has some good screencasts with Feeds and SPARQL (IIRC). There&#039;s a lot of flexibility in Feeds too so this might be an easier option for you if you like wiring up config instead of writing code.

Thirdly, you can go old school and just write your own PHP script and do it with node_saves. Fire it up with &quot;drush scr&quot;. I like this approach because it feels natural to me. However, Migrate gives you a lot of nice goodies such as rollback, tracking IDs and making stub objects which will save you from pulling your hair out.

Finally, you have SQL inserts if you are handling a very big dataset. This will be more tedious now because of the way fields are handled, i.e. a lot more inserts over more tables. You&#039;ll get to know Drupal&#039;s schema well though.

Oh yeah - another option is to just keep Dbpedia in a triple store and then provide a view onto that. That way you have more chance of keeping up with updates from Dbpedia. It isn&#039;t updated that often though.

You may also want to take a look at Freebase. They have a nice API, up to date data and links to Dbpedia as well. You might consider importing from there using the search, image, MQL etc. APIs. I&#039;ve been doing a bit of that lately and it is quite pleasant.

All the best with it.</description>
		<content:encoded><![CDATA[<p>Hi Alex. </p>
<p>I wrote a bit about how I imported it over this way: <a href="http://cruncht.com/361/uriverse-dbpedia-drupal-case-study/" rel="nofollow">http://cruncht.com/361/uriverse-dbpedia-drupal-case-study/</a></p>
<p>It was a labour of love which took a lot of SQL hacking to get to work. I could not afford to do node_loads and node_saves for 13M nodes so I opted to do the DB inserts directly. That way I could get 1000s of inserts a second rather than 10. I learnt a lot about MySQL, indexes, efficient queries and how to import stuff as quickly as possible. Also, how to recover from where you left off when a long running process died. All up it took a couple of months of tooling around with code and SQL. Once all of it was in it took another couple of months for Solr to index it all <img src='http://cdn-small.cruncht.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I wouldn&#8217;t recommend this approach as it is a one hit import which takes a lot of effort.</p>
<p>If you have a smaller dataset then you have a few options. </p>
<p>Firstly, the migrate module could be a good way to go if you have complex relationships and IDs to maintain. I really want to get into this big time one of these days. Imports will still be slow but you have the most flexibility. </p>
<p>Secondly, you have the feeds module. Good if you have a flat data structure and one to one mappings. You might be able to get away with crafting up a CSV and importing that way. I believe that Lin Clark has some good screencasts with Feeds and SPARQL (IIRC). There&#8217;s a lot of flexibility in Feeds too so this might be an easier option for you if you like wiring up config instead of writing code.</p>
<p>Thirdly, you can go old school and just write your own PHP script and do it with node_saves. Fire it up with &#8220;drush scr&#8221;. I like this approach because it feels natural to me. However, Migrate gives you a lot of nice goodies such as rollback, tracking IDs and making stub objects which will save you from pulling your hair out.</p>
<p>Finally, you have SQL inserts if you are handling a very big dataset. This will be more tedious now because of the way fields are handled, i.e. a lot more inserts over more tables. You&#8217;ll get to know Drupal&#8217;s schema well though.</p>
<p>Oh yeah &#8211; another option is to just keep Dbpedia in a triple store and then provide a view onto that. That way you have more chance of keeping up with updates from Dbpedia. It isn&#8217;t updated that often though.</p>
<p>You may also want to take a look at Freebase. They have a nice API, up to date data and links to Dbpedia as well. You might consider importing from there using the search, image, MQL etc. APIs. I&#8217;ve been doing a bit of that lately and it is quite pleasant.</p>
<p>All the best with it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex McLintock</title>
		<link>http://cruncht.com/618/uriverse-experiment-over/#comment-19980</link>
		<dc:creator>Alex McLintock</dc:creator>
		<pubDate>Wed, 15 Feb 2012 12:40:07 +0000</pubDate>
		<guid isPermaLink="false">http://cruncht.com/?p=618#comment-19980</guid>
		<description>Hi, 

I am interested in learning more about how you imported DBPedia into Drupal. I am attempting something similar for a small subset of Wikipedia - but using Drupal 7.</description>
		<content:encoded><![CDATA[<p>Hi, </p>
<p>I am interested in learning more about how you imported DBPedia into Drupal. I am attempting something similar for a small subset of Wikipedia &#8211; but using Drupal 7.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Goonbroab</title>
		<link>http://cruncht.com/618/uriverse-experiment-over/#comment-11769</link>
		<dc:creator>Goonbroab</dc:creator>
		<pubDate>Wed, 09 Nov 2011 14:56:02 +0000</pubDate>
		<guid isPermaLink="false">http://cruncht.com/?p=618#comment-11769</guid>
		<description>What is this</description>
		<content:encoded><![CDATA[<p>What is this</p>
]]></content:encoded>
	</item>
</channel>
</rss>

