It’s just data

RDFa in HTML5

Manu Sporny: Another Q/A page has been started concerning questions surrounding RDFa, including common red-herrings and repeated discussions (with answers) that can be avoided when discussing what RDFa can and cannot do.


Ian is wrong. Absolutely, completely, and dead wrong.

The whole concept of reusability is based on the belief, and practice, that there are common sets of functionality that can be implemented in a library and the library reused, rather than re-created with each use.

Ian is saying that Dojo is wrong, jQuery is wrong, REST is wrong, Rails is wrong--every instance of a reusable common format and functionality is wrong. HTML is wrong, XML is wrong, it’s all wrong, wrong, wrong. Why? Because my god, each seeks to efficiently solve a type of problem, rather than a specific instance of a problem.

We’re given a capability to allow us to solve five problems with one solution, and rather than Ian shouting out “Hurrah!”, he says we must have five different solutions to the five problems, because to do otherwise is to...what? Give up control? Fail to meet the Guinness Book of World Records for largest, most pedantic specification ever derived by man?

Posted by Shelley at

We Hear you Sister

Shelley Powers on Sam Ruby’s blog Ian is wrong. Absolutely, completely, and dead wrong. ... rather than Ian shouting out “Hurrah!”, he says we must have five different solutions to the five problems, because to do otherwise is to...what? Give up...

Excerpt from Last Week in HTML5 at

Ian is not saying he is against reusable solutions, but that he is not convinced RDF is a viable solution for any of these 5 problems, let alone other ones. By that, he really means he is convinced RDF is not a solution, of course, to either these 5 problems or any others for that matter.

As for the size of the specification, I fail to see how omitting stuff like RDFa from it will make it larger. It seems to me like this is mostly a battle over what level of the namespace RDF will fit in, or if it belongs at all. As such, it is aesthetic or political and the technical discussions are purely rhetorical, not substantive.

Poor Sam...

Posted by Fazal Majid at

“Ian is not saying he is against reusable solutions, but that he is not convinced RDF is a viable solution for any of these 5 problems, let alone other ones. By that, he really means he is convinced RDF is not a solution, of course, to either these 5 problems or any others for that matter.

As for the size of the specification, I fail to see how omitting stuff like RDFa from it will make it larger. It seems to me like this is mostly a battle over what level of the namespace RDF will fit in, or if it belongs at all. As such, it is aesthetic or political and the technical discussions are purely rhetorical, not substantive.”

I agree with you, Fazal. But we keep going through these intellectual exercises that are the equivalent of hamsters on a wheel. I also appreciate that Sam is trying to bring together the disconnects in these discussions. But some disconnects exist on different planes of existence, and there is no connectivity.

I wrote something a while back that said, I don’t have to justify wanting RDFa, because RDF is, to me, a goal, not a solution. Or at least not a solution that can be baked down into crunchy bite-sized use care morsels. Now, we could look at alternative serializations, but the one we have for XHTML, RDFa, would fit elegantly into the HTML5 effort. More importantly, the tools we use now with RDFa would most likely fit too, without any alteration.

I, a person who has worked with data, professionally, for a quarter of a century, cannot understand, can NOT understand, the rejection of something with such an elegant fit. It is not for technical reasons, as you say. The rejection is politically motivated, and I have no patience for it.

Posted by Shelley at

Sorry, use case morsels.

Posted by shelleyp at

Bravo Shelley!  As more and more people keep discovering, HTML5 appears to only be the vision of one mind-set, and external voices are humored but generally ignored.  It truly is a political beast that we battle.

Posted by John Foliot at

The whole concept of reusability is based on the belief, and practice, that there are common sets of functionality that can be implemented in a library and the library reused, rather than re-created with each use.

Ian seems to prefer high quality natural language processing.

It is a different perspective, captured exquisitely by Mark Pilgrim over six years ago in his Million dollar markup post.  Be sure to (re-)read that post to the end, as it hits the nail on the head.

it is aesthetic or political

I actually don’t believe that it is political.

I have no patience for it.

Patience is something I happen to have in abundance.

Posted by Sam Ruby at

On abstract solutions and reuse, I think there is another important aspect that has not yet been mentioned. That is that solutions such as RDF, even if they are a little sub-par in solving 5 different problems, end up providing solutions to n different problems faced by authors (where n is a very large number). If 5 of the use cases RDF solves are worthy of more tailored solutions, then add RDF and also add 5 other solutions to those specific use cases. Include ‘href’ for hyperlinks. Include ‘cite’ for references. Include ‘rel’ and ‘rev’ and so on. Those are all more specific solutions to the same metadata problems solved by RDF and there’s no reason we don’t have room for all of them in HTML.

Take a look at Unicode as an example. Would a new independent character set, encoding, text processing algorithms, and font/text imaging solution be better if each of the thousands of written languages had their own? Certainly! Unicode is a bit sub par for each language when compared to a custom solution for each language (if only that it requires something like Buginese to use a 16-bit or 24-bit encoding when 8-bits would have worked for its own encoding). However, Unicode makes implementation of those thousands of writing systems possible whereas a custom solution for each would mean that most writing systems would never get a solution (or never get a solution implementation anyway).
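The size trade-off mentioned for Buginese can be checked directly. The following is a minimal sketch, assuming the first letter of the Unicode Buginese block (U+1A00) as a representative character; the variable names are illustrative only.

```python
# U+1A00 is the first code point of the Unicode Buginese block (U+1A00-U+1A1F).
letter = "\u1a00"

# Number of bytes needed to store this single letter in two common encodings.
utf8_len = len(letter.encode("utf-8"))       # UTF-8 uses 3 bytes for U+0800..U+FFFF
utf16_len = len(letter.encode("utf-16-le"))  # UTF-16 without a byte-order mark

print(utf8_len, utf16_len)
```

A dedicated single-byte encoding would need one byte per letter, which is the per-language overhead the comment refers to; the shared encoding costs more bytes but needs only one implementation.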

Similarly with RDF. There may be better solutions to the hundreds or thousands of problems RDF may address. However, RDF provides authors with a powerful solution to address those use cases and many others not yet conceived. In solving existing problems RDF has proven it has those capabilities and that those capabilities are wrapped up in a fairly simple and easy to understand framework. As far as I can tell these ridiculous faux claims to be unconvinced are simply more of the same standards process disruption we see so much of these days.

Posted by Rob Burns at

I remember Mark’s old post Sam. However, I’m not sure there is a connection between what he wrote and this issue.

You hit it directly and succinctly on the head, and are repeating what I said in the WhatWG user group list: Ian prefers natural language processing. His mindset is such that no matter the number of use cases thrown at him, no matter the arguments, he will not see any other approach. At least that’s what I see from these never ending discussions.

What’s sad is there is nothing about RDF which precludes natural language processing. In fact, the two are complementary. But, if there is no way to insert data structured in such a way that it can be extracted as RDF into HTML5, then the natural language processing approach does preclude support for RDF. And before someone comes along and says, “What about microformats and XSLT, and conversion to RDF”, these approaches do not work. The use of XSLT is, and will remain, outside the reach of even the average tech, and microformats are both closed and extremely limited.

If it were a case of two people disagreeing in a list, great, difference is good, hurray for different. But one of the people in this argument is the HTML5 gatekeeper. That changes everything.

We have two issues with support for RDFa. The first is whether it can physically be added to HTML5 without breaking the DOM. A group went off to work on the issue. I have no idea what they came up with, but I have a feeling that yes, it is doable. Unless someone comes in and says, unequivocally it is impossible, and with proof of same, then we’ll assume it is possible.

The second issue? Again, you hit it on the head — Ian sees natural language processing, with a few scattered simple microformats as the way of the future. Without wanting to pick on his employer, he really is well matched to the company he works for.

Google, as a corporate entity, has always disdained the concept of structured data, and relied heavily on algorithmic approaches, such as those involved in natural language processing. We are told that mere mortals will never be willing or able to incorporate something such as RDF because it is beyond most of our skills.

Yet Google’s first reaction, when hit with pagerank manipulation and fraud, was to have webloggers incorporate rel="nofollow" as a way of classifying links. So much for not needing structured data. So much for assuming that the application is beyond the ken of mortal people.
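To make the point concrete, here is a minimal sketch of how a consumer might read rel="nofollow" as structured data instead of inferring intent from the surrounding prose. The sample markup, the URLs, and the `RelCollector` class are hypothetical; only Python’s standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect (href, rel tokens) for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        if attr.get("href") is None:
            return
        # rel is a space-separated list of tokens; a missing rel yields [].
        tokens = (attr.get("rel") or "").lower().split()
        self.links.append((attr["href"], tokens))


# Hypothetical comment markup, as a weblog tool might generate it.
SAMPLE = """
<p>See <a href="http://example.com/spam" rel="nofollow">this site</a>
and <a href="http://example.net/">that one</a>.</p>
"""

collector = RelCollector()
collector.feed(SAMPLE)

# Links the author (or the publishing tool) explicitly declined to endorse.
nofollow = [href for href, rel in collector.links if "nofollow" in rel]
print(nofollow)
```

The text of the paragraph is still available to any language-processing step; the attribute simply adds a signal that the prose alone does not carry.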

That “too hard for the wee folk” argument is both old and obsolete. The Google mindset is still based in a time when pages were handcrafted, using a variety of not particularly effective tools. Before most of the web was based on generated content. Content generated by tools, where one could flick a couple of buttons or add a few classes to some elements in a template and, voilà, you have structured data. Oh, the natural language processing material is still there, but now we have more. It’s not a zero-sum game: if you use structured data, you don’t lose natural language processing potential.

Why am I saying this, though? It does not matter how many people grudgingly or willingly agree that maybe, just maybe, the addition of RDFa would not be a bad thing. Ian is gatekeeper. End of story.

Posted by Shelley at

Speaking of political, do you know what happens in the legislation process in Congress? First a bill is proposed and then sent to the appropriate committee for review and refinement. That committee has the power to kill the bill before it even has a chance to get to the floor for a vote. Worse, the committee can stall the bill, indefinitely. A lot of bills never make it out of committee.

It doesn’t matter what oversight exists in Congress, or how many people are interested in the bill inside of Congress, or outside of Congress, or in the Executive branch — if it doesn’t get past the committee, it is effectively stalled.

Just a little social studies talk, to pass the time.

Posted by Shelley at

I understand Ian’s desire to make sure the things that go into HTML5 actually pull their weight. But the syntactic overhead of RDFa is so low compared to the complexity of integrating SVG or MathML that it seems petulant to me to insist on treating it as if it were of comparable cost to the other vocabularies.

However, [a sub-par unified solution] makes implementation of those thousands of [problems] possible whereas a custom solution for each would mean that most [problems] would never get a solution (or never get a solution implementation anyway).

Exactly. RDF is all about creating marginal spaces, and explicitly not about solving each problem in its own best possible way. Ian’s dogged insistence on the latter, while serviceable when it comes to central spaces, misses the entire point of creating marginal ones.

Posted by Aristotle Pagaltzis at

Yet Google’s first reaction, when hit with pagerank manipulation and fraud, was to have webloggers incorporate rel="nofollow" as a way of classifying links.

I don’t know about first. But they acquiesced eventually. They tried to wing duplicate detection fully algorithmically too; now it falls to rel="canonical" to resolve undecidable ambiguities.

Were I in a sardonic mood, I might close this comment like this: NLP NLP, lama sabachthani?

Posted by Aristotle Pagaltzis at

I don’t know about first. But they acquiesced eventually. They tried to wing duplicate detection fully algorithmically too; now it falls to rel="canonical" to resolve undecidable ambiguities.

Speaking of which:

1) “canonical” seems to replicate HTTP’s Content-Location header.

2) It does not appear in [link].

Posted by Julian Reschke at

Ian is gatekeeper. End of story.

... there are at least three groups of people who have the authority to override Ian.

It doesn’t matter what oversight exists in Congress, or how many people are interested in the bill inside of Congress, or outside of Congress, or in the Executive branch — if it doesn’t get past the committee, it is effectively stalled.

Thankfully, this isn’t the US Congress.  The W3C has requirements for advancement, including the requirement that all issues raised about the document be formally addressed (#6).  Ian believes that this spec is on track for Last Call in October of this year.  Note: these concerns aren’t binary and gating.  Nothing in HTML5 is perfect.  Acknowledging the concerns and showing how they can be mitigated is generally the best strategy in matters such as these.

What would it take for inclusion of the RDFa attributes in HTML 5 to be tracked in the W3C HTML Working Group issues list?  Given the links I provided at the top of this post, I’d say that pretty much all of the pieces are in place except for a discussion on the public-html mailing list.

What work would be helpful in getting this to be resolved successfully?  Fleshing out the use cases addressing as much of these concerns as are relevant.

How can you help?  Join the WG and/or contribute to the wiki.

Just so that it is clear, as we move towards summer I plan to become ruthless in clearing out issues which have been raised but don’t appear to have any substantive proposals or support.  There is much good work in HTML5 and it would be positively criminal for it not to advance due to procedural maneuverings.  I don’t intend to let that happen either.

Posted by Sam Ruby at

Go for it Sam. I’m sure you’ll please some of the people, some of the time.

Posted by Shelley at

Joining the WG may work for some. The wiki demonstrates a disconnect between those who are proposing RDFa and, well, Ian.

Kjetil Kjernsmo had some very astute observations in the thread you pointed out. In particular, the following:

“I can’t speak for the RDFa community, but the reason you can’t see a lot
of problem descriptions separate from technical solution is probably
that the community feels that RDF is a well established technology, and
so the focus is on showing how it is used rather than abstract
speculation on how it could be used. There’s a lot of that too, in
research projects. What you’re asking us to do amounts to describe a
global hypertext system without mentioning HTML. That may have been an
interesting exercise in 1992, but it seems like a waste of time today.”

In other words, Ian is asking the RDFa folks to not just provide use cases, but to revalidate the RDF model with each and every case. It is equivalent to asking the people who put forth interest in local client storage, to revalidate SQL, or even the concept of a database, with their requests. At a certain point, the request becomes more a point of obfuscation than general interest.

Personally I think the client-side database is a dumb as bricks idea. However, I can accept that if HTML5 does incorporate a client-side database, it would support pre-existing and mature technologies in order to implement the storage. In other words, a relational database and SQL.  However, I doubt that there could ever be enough use cases to demonstrate that the client-side database can’t be “gamed”, or to provide an answer about companies wanting to charge for the use of the client-side storage, à la the justification questions put forth by Ian:

“Do we have reason to believe that it is more likely that we will get
authors to widely and reliably include such relations than it is that we
will get high quality natural language processing? Why?

How would an RDF/RDFa system deal with people gaming the system?

How would an RDF/RDFa system deal with the problem of the questions
being unstructured natural language?

How would an RDF/RDFa system deal with data provided by companies that
have no interest in providing the data in RDF or RDFa? (e.g. companies
providing data dumps in XML or JSON.)

How would an RDF/RDFa system deal with companies that do not want to
provide the data free of charge?

How would an RDF/RDFa system deal with companies that want to track
per-developer usage of their data?”

First of all, the reference to the “RDF/RDFa system” kind of cracked me up...

I would imagine that few aspects of HTML5, especially some of the more esoteric elements, have been asked questions such as these: how would one prevent “gaming”? We can’t prevent gaming with a hypertext link. Does this mean we should bag it, and go home?

What do we do if people don’t want to provide RDFa? Well, tell me: what do we do about people not wanting to provide client-side storage? Or people who don’t use the canvas element — do we slap their hands, take their validator away from them?

Look at these questions asked, Sam. Now tell me that these are legitimate questions?

So all we need to do is answer these questions in the Wiki or in the WhatWG group, and the doors of HTML heaven will open and we’ll all be equal in the light. I’m not going to speak for anyone else, certainly not Aristotle, and Rob, and Kingsley, and Ben, and so many others who are busting their butts trying to make this all work. But for me, when a legitimate question is asked, I’ll do my best to answer. Until then?

...

Posted by Shelley at

The exchange you highlighted does show ample evidence of arrogance, entitlement, and frustration by all involved.

I also tend to agree with Kjetil that “problem descriptions separate from technical solution” is not always the most appropriate way forward. 

when a legitimate question is asked, I’ll do my best to answer.

Great!

I do see both evidence of progress and legitimate questions here, with the proviso that the parenthetical that Ian closes with does appear to veer off and may very well end up in the weeds at the end.  Perhaps monetization is a non-goal of RDFa; I, for one, would be OK with that.

But back to the legitimate questions.

Sample markup, code snippets and an exploration of the likely problems all would be helpful.  Such things need not appear directly on public-html or even on the RDFa wiki; as long as they appear on the web, they can be cited.

Posted by Sam Ruby at

A battle of Beliefs: RDF, Natural Language Processing, and the future of the web

Last Week in HTML has been practicing its wicked ways, and pulled a quote from a comment I made to a post at Sam Ruby’s Ian is wrong. Absolutely, completely, and dead wrong. ... rather than Ian shouting out “Hurrah!”, he says we must have five...

Excerpt from RealTech at

As a model, where can one find due diligence documents for elements and functionality included in HTML5? I know that Mark is documenting some of this, but he’s only started this documentation recently. It helps to look at the due diligence efforts put forth for other elements, when trying to implement the same for new features. I tried to search for this type of documentation for several features, but couldn’t find anything related to the discussions about including the elements and/or functionality, mostly chatter after the fact.

Posted by Shelley at

This looks like an argument about which new technologies should be knighted by HTML. The HTML spec should probably try to get out of that business as much as possible.

Posted by Rob Sayre at

The HTML spec should probably try to get out of that business as much as possible.

As near as I can tell, the approach that the current (a.k.a kitchen sink) spec takes is intentional:

We request that people not invent new elements and attributes to add to HTML without first contacting the working group and getting a proposal discussed with interested parties.

Posted by Sam Ruby at

This looks like an argument about which new technologies should be knighted by HTML. The HTML spec should probably try to get out of that business as much as possible.

I wonder what made you conclude that? Is it because of the poorly conceived elements that HTML5 has already proposed adding to the HTML vocabulary? When one looks at ill-conceived elements such as ‘datagrid’, ‘details’, ‘header’, ‘footer’, etc., I can understand why you might think that. But just because those poor examples are there, it doesn’t stop many of us from wanting to see HTML5 add something both new and worthwhile to the HTML vocabulary. I would say things like ‘figure’ might be welcome additions to the vocabulary, so we simply need to build on strengths like that.

Posted by Rob Burns at

Sam: I agree that it is Ian’s intent.

Rob Burns: I do not consider myself able to prejudge additions to HTML.

Posted by Rob Sayre at

Rob Sayre, I’m afraid I don’t understand your original comment then. If HTML5 should not knight new technologies such as RDFa by including them within HTML, then why should it knight other technologies such as client-side SQL or the ‘video’ element and associated DOM interfaces? Isn’t anything the HTML WG says about document conformance simply knighting one technology or another (or conversely condemning the technology if it decides it’s not worthy)? Also, if you feel you are unable to prejudge additions to HTML, then shouldn’t you also refrain from deciding that certain things like RDFa should or should not be included within HTML5? After all, you’re saying that one specific thing, RDFa, should not be knighted by HTML5, while allowing all sorts of other things in HTML5 to receive that honor.

I’m just trying to make sense of those two comments.

Posted by Rob Burns at

If HTML5 should not knight new technologies such as RDFa by including them within HTML, then why should it knight other technologies such as client-side SQL or the ‘video’ element and associated DOM interfaces?

It shouldn’t.

Isn’t anything the HTML WG says about document conformance simply knighting one technology or another (or conversely condemning the technology if it decides it’s not worthy)?

Not in my opinion.

After all, you’re saying that one specific thing, RDFa, should not be knighted by HTML5, while allowing all sorts of other things in HTML5 to receive that honor.

I don’t think that is what I am saying.

Posted by Rob Sayre at

The fact is, RDF and RDFa have a small following. Devoted, to be sure, but if you use the lazy journalist/homework technique of counting Google hits for “HTML5 RDFa” vs. say, “HTML5 SVG”, “HTML5 SQLite” or “HTML5 Canvas”, there seems to be an order of magnitude more interest in non-proprietary vector graphics than in semantic web technologies. I don’t think ad hominem attacks against Ian, Google or anyone else help build credibility or support for inclusion of RDFa in the standard. It’s a good thing we have people like Sam around to keep things civil.

Perhaps the RDF community should focus on HTML5a - a spec that essentially says “take HTML5 and add all this extra RDFa stuff”. It’s not as if the HTML5 police will force browsers to deliberately fail to render markup that has the extra attributes. All the advantages of a well-defined spec, none of the political wrangling, and no need to incur the burden of a full fork of the spec. Maybe assign a not-so-deprecated HTML5a doctype in the process...

Posted by Fazal Majid at

It took about two years to get MathML into the HTML5 Spec. SVG is taking even longer.

I don’t think the size of the following is the relevant factor.

Posted by Jacques Distler at

Fazal, as the RDFa email group is demonstrating, we don’t have to fork HTML5, we can use HTML5 and just add the attributes to the documents, and say the heck with validation. Or start making XHTML a whole lot friendlier.

You’re doing a search on HTML5 and RDFa...why on earth would you see that as some form of metric? Search on that term and you either get me, or Sam. But you’re not going to be able to quantify interest.

I know that RDFa will be incorporated into Drupal 7. Of course, you can see Drupal as only having a small following, too...

And the “attacks” were not ad hominem, but they were critical. There is a difference. Point of fact, about the only time any movement is made to actually incorporate RDFa into HTML5 is when plain, frequently strong, talk is exercised, all the hearts and flowers crap aside.

Posted by Shelley at

“In other words, Ian is asking the RDFa folks to not just provide use cases, but to revalidate the RDF model with each and every case. It is equivalent to asking the people who put forth interest in local client storage, to revalidate SQL, or even the concept of a database, with their requests.”

We actually did do that.

“I doubt that there could ever be enough use cases to demonstrate that the client-side database can’t be “gamed””

We did that too (the database is accessible only by the site that set it, not any other site — this actually was a bug in the original design of the local storage feature which had to be fixed, despite Mozilla already having an implementation, which caused some pain for a number of sites that were already using the feature).

“or to provide an answer about companies wanting to charge for the use of the client-side storage”

I’m not aware of any site that wants to do this, so this is a non-issue (though in practice, nothing would stop a site from doing this with the way we designed it).

Sorry, RDFa is just going through the same process every other feature proposal went through.

Posted by Ian Hickson at

“Sorry, RDFa is just going through the same process every other feature proposal went through.”

Then can you point out to me where is the debate on adding client side storage? Footer? The canvas element? If we can see the use cases that led to success with these items, we may be better able to extrapolate successful use cases for RDFa.

Posted by Shelley at

Whilst Julian is, I guess, technically correct that the implementation costs of these attributes are zero — at least as long as there are no conformance criteria on applications for supporting them beyond that of any other unrecognized attribute (which, itself seems like a bad situation; surely users would have some expectation of a DOM API beyond simple getAttribute/setAttribute for getting at whatever metadata they or others are providing) — it seems to me to be only part of the cost of specifying them. The burden on tutorial writers has already been debated; however what I have not seen mentioned so much is the barrier to future innovation caused by specifying poor solutions to problems.
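As a rough illustration of what “simple getAttribute” access amounts to, the sketch below pulls RDFa-style attributes out of a fragment using only generic DOM calls from Python’s `xml.dom.minidom`. The markup, the example.org URL, and the literal values are made up for illustration, and the fragment is assumed to be well-formed XML.

```python
from xml.dom import minidom

# Hypothetical, well-formed fragment using RDFa-style attributes.
SAMPLE = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'about="http://example.org/post/1">'
    '<span property="dc:title">An example title</span>'
    '<span property="dc:creator">A. Author</span>'
    '</div>'
)

doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# The subject comes from the enclosing element's "about" attribute.
subject = root.getAttribute("about")

# Each span with a "property" attribute contributes one statement.
pairs = []
for span in root.getElementsByTagName("span"):
    if span.hasAttribute("property"):
        pairs.append((subject, span.getAttribute("property"), span.firstChild.data))

for statement in pairs:
    print(statement)
```

Note what the generic calls leave undone: the `dc:` prefix is returned as an opaque string, so resolving it against the `xmlns:dc` declaration, and handling subjects inherited through nesting, would fall to the caller. That gap is roughly the kind of expectation a dedicated API would have to meet.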

If we standardise some technology Y that addresses use case X, that dramatically increases the barrier to coming up with a superior solution for X in the future, even if the initial solution is basically a failure. It becomes easy to tell people asking “how do I do X?” to use Y then never know that they went away, tried, failed, and blamed themselves. Or they do it wrong and never realise that they have failed to solve the original problem. People start building careers around forcing Y to actually solve X and give enormous pushback when anyone suggests that maybe Y isn’t a great solution for X. Trying to get anything to replace Y often ends up requiring some external disruptive innovation (I guess an example of such a disruptive innovation, albeit not in a case where the original technology failed, would be JSON; for some types of data it is much easier than XML, but it’s hard imagining the W3C could have come up with JSON because it is so attached to the XML stack. The case where it is one aspect of a larger, successful, technology that has failed is harder to fix). Finally, even if a new solution is introduced, we end up with fragmentation, with some people doing things one way and some people doing things another and the resulting mess to deal with.

So, if we introduce something and it fails — or is not the best solution — that has a real cost attached to it. The estimated total cost of introducing RDFa now is at least the estimated probability of it failing times the cost of that failure. That is clearly not zero. One can, of course, also associate a cost with failing to act; that is the total cost borne by people for whom Y could have been a good solution but now have to use some inferior solution (or who are simply prevented from doing whatever they set out to do). The remaining problem is to actually carry out these calculations and work out what is the best course of action. I leave that as an exercise to the reader ;) Seriously though it is clear that calculating such costs is a rather theoretical business since the data required to be accurate is rather hard, or impossible, to obtain. However we can make some guesses. I will make some below. Others will disagree with them.
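The comparison sketched in the paragraph above can be written compactly. The symbols are labels introduced here for illustration, not taken from the comment: $p_{\text{fail}}$ is the estimated probability that Y fails, $C_{\text{fail}}$ the cost of that failure, and $C_{\text{wait}}$ the cost borne by those who would have been well served by Y if it is left out.

```latex
\[
  \mathbb{E}[\text{cost of adding } Y] \;\ge\; p_{\text{fail}} \cdot C_{\text{fail}}
\]
\[
  p_{\text{fail}} \cdot C_{\text{fail}} \;>\; C_{\text{wait}}
  \quad\Longrightarrow\quad
  \text{leaving } Y \text{ out has the lower estimated cost}
\]
```

Because the first line is only a lower bound, the second line is a sufficient condition for leaving Y out, not a necessary one; when the inequality goes the other way, the bound alone does not settle the question.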

I would guess that the probability of RDFa failing is rather high. That is because it is a dramatic increase in complexity compared to most successful web technologies. Most existing web technologies use trees to provide the underlying structure and attach metadata through key-value pairs. By turning everything into a graph, RDF (and hence RDFa) represents a step change in the cognitive load associated with understanding the platform. The use of URIs as, er, identifiers, is also rather unusual on the web. For almost all current purposes it is possible to think URI=address of webpage. Using URIs in other ways not only has the much-discussed syntactic issues but requires shifting to a much more complex mental model of URIs. My personal experience is that whenever I have had a problem that required use of RDF to be part of the solution, that problem seemed unnecessarily hard. To be fair, that hasn’t been a situation I have been in all that often, and it would presumably get easier with practice. However if my experience is representative then I would imagine that many people would never make it over the initial learning curve.
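The difference between the two models can be shown in a few lines. This is a minimal sketch with made-up data: the example.org URIs, the node contents, and the helper name `incoming` are all hypothetical, and the prefixed property names are left unexpanded for brevity.

```python
# Tree with key-value metadata: every node has exactly one parent,
# and attributes hang off individual nodes.
tree = {
    "tag": "article",
    "attrs": {"lang": "en"},
    "children": [
        {"tag": "h1", "attrs": {"class": "title"}, "children": []},
    ],
}

# Graph of (subject, predicate, object) triples: there is no root,
# and any node may be pointed at from any number of other nodes.
triples = {
    ("http://example.org/post/1", "dc:creator", "http://example.org/people/a"),
    ("http://example.org/post/2", "dc:creator", "http://example.org/people/a"),
    ("http://example.org/people/a", "foaf:knows", "http://example.org/people/b"),
    ("http://example.org/people/b", "foaf:made", "http://example.org/post/1"),
}


def incoming(node):
    """Return every (subject, predicate) pair that points at node."""
    return {(s, p) for s, p, o in triples if o == node}


# In the tree, "h1" is reachable only through its single parent.
# In the graph, the same person is reachable from two different posts.
print(sorted(incoming("http://example.org/people/a")))
```

The graph also contains a cycle (post 1, person a, person b, back to post 1), which has no counterpart in a tree; that is one plausible reading of the extra cognitive load described above.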

My general feeling is that this problem stems from the fact that RDF has been largely designed by people with (the capacity for) higher degrees in subjects like maths, computer science, physics, philosophy and other abstract, logical, disciplines. At the web scale, not everyone has the ability to deal with the same level of abstract reasoning and technologies that require very strong skills in this area are unlikely to be widely successful.

I also guess that the cost of (partial) failure of RDFa is rather high. Because it claims to do so much, any use case for which it turns out to be suboptimal needs to have a new solution developed, resulting in all the costs documented above. Whilst this may not be unusually high on a per-case basis, the sum has the potential to be very great.

Whilst I understand that not everyone will agree with my guesses about the cost and chance of failure of RDFa (and I am certainly happy to hear reasoned arguments about why I am wrong), I think it would be nice to stop suggesting that introducing RDFa is somehow free simply because it is “just attributes”.

Posted by jgraham at

what I have not seen mentioned so much is the barrier to future innovation caused by specifying poor solutions to problems.

Clay Shirky covered this over a decade ago.  Voltaire did too a few centuries before that.

I’m concerned that a high barrier to entry creates opportunity costs.  I think that’s what Aristotle was referring to above.

Posted by Sam Ruby at

It seems I have managed to convey quite the opposite impression to the one I intended.

My concern about introducing RDF is that it seems to me to be an example of the kind of “perfect” solution that is the enemy of the good[1]. It has a great deal of academic thought behind it and purports to be a solution for a large area of problem space. However, I am unconvinced that it has the usability required to succeed in the large. For this reason I think it does not have the characteristics that Clay Shirky argued for. Despite this it may act as a block to more effective solutions with a higher chance of success.

[1] Or to use somewhat different terms, I see it as closer to “MIT/Stanford” style design rather than “worse is better”.

Posted by jgraham at

Several birds with one stone?

Via intertwingly, Sam Ruby’s blog, I came across a quote taken from one of the W3C lists, which I translate as is: A very common mistake that software engineers make when designing an architecture is to look at five problems, see the...

Excerpt from Yet Another Programming Weblog at

jgraham: hmmm.

I’d characterize Ian’s position thus: the fact that RDFa may “solve” five problems poorly doesn’t impress him.

I’d now characterize your position thus: RDFa is a “perfect” solution that may inhibit development of solutions that aren’t quite such a good fit.

I must say, while I understand and respectfully disagree with Ian’s position; I can’t quite say the same about yours.  RDF is far from perfect, and for that I’ll cite another Shirky gem, one that many in the RDF community despise.

From my perspective, RDF is far from perfect, but not for the reasons that Shirky states.  What made it clear to me is when I once heard a talk from Jo Walsh who was trying to make an open map of England, something we in the US take for granted.  She was using RDF not because the data was all consistent and pretty, but precisely because it was messy, incomplete, and self-contradictory.  It was only by smashing the data into tiny bits and then collating the remains that she could see patterns that emerged.

RDF encourages people to reuse existing vocabularies, and failing that make up their own that can be later recombined using OWL or other techniques.  Both approaches are imperfect.  Existing vocabularies rarely are a perfect fit.  Defining your own vocabulary merely delays the problem.  As an example, I can imagine that one of the first things people will do with RDF when faced with Atom is try to equate the updated element with something from Dublin Core despite the fact that we explicitly explored and rejected that.  But the fact is that Atom’s updated element isn’t always used as spec’ed, and even if it were, not everybody uses Atom so that person may be forced to deal with the imperfect data that they got.

Clearly Jo impressed me.  She impressed me in ways that Steve Loughran does, and in ways that I aspire to.  I gleefully link to well thought out critiques of technologies I very much believe in.

It is a shame that Jo doesn’t blog more.  Meanwhile, my automatic referer chasing logic drew my attention to this classic [via Miguel]

Posted by Sam Ruby at

TWAT

Googles Ian ‘hixie’ hickson on Sam Rubys blog “Sorry, RDFa is just going through the same process every other feature proposal went through.” Mrs last week sez: ‘is logic is infaddigable, like me aunt fred’s varicose veins, theres no escapin’ it....

Excerpt from Last Week in HTML5 at

Fazal, as the RDFa email group is demonstrating, we don’t have to fork HTML5, we can use HTML5 and just add the attributes to the documents, and say the heck with validation.

Yes! Do eeeeet!

Posted by Rob Sayre at

Yes! Do eeeeet!

And if enough people do it, the cowpaths will be paved, and the validators will [eventually] fall in line.

Posted by Sam Ruby at

Indeed.

For now, RDFa in HTML 4 is MUCH more important than RDFa in HTML 5.

Posted by Julian Reschke at

“Then can you point out to me where is the debate on adding client side storage? Footer? The canvas element?”

Most of the discussion for all this is spread across the WHATWG list over the past five years. For canvas the use case discussion uncovered at least one fundamental problem which required Apple, who originally came up with this element, to have to change their initial implementation after their initial deployment, at some cost. Client side storage similarly had changes (see my last comment on this blog post). Footer didn’t have many changes, data collected to see whether it was worth keeping it actually ended up supporting it more than expected ("footer" is the most common class name according to several studies of Web content).

“If we can see the use cases that led to success with these items, we may be better able to extrapolate successful use cases for RDFa.”

Manu is actually doing a great job of doing just that. See his e-mails on public-rdfa if you want to take part.

Regarding RDFa and HTML4 — if people widely use a feature, then that would definitely be a good sign that we should add the feature to HTML5 (or 6, or whatever).

Posted by Ian Hickson at

I’d now characterize your position thus: RDFa is a “perfect” solution that may inhibit development of solutions that aren’t quite such a good fit.

I think that characterization of my position is somewhat misleading. The solutions that “aren’t quite such a good fit” might actually be a better fit for the web than RDF. That is why I prefer to think of it as “better is better” vs “worse is better”, or if you prefer, MIT/Stanford design vs New Jersey design, rather than to use words like “perfect” in a way that means “free from internal flaws according to its own principles but with significant flaws when viewed in a wider context”.

I don’t doubt that RDF can work “in the small” on suitable projects with people who understand the technology and who have the ability to transform non-RDF data into an RDF-like form. I have significant doubts that it can work in the large with a comparable number of producers and consumers as HTML has today.

Posted by jgraham at

I don’t doubt that RDF can work “in the small” on suitable projects with people who understand the technology and who have the ability to transform non-RDF data into an RDF-like form. I have significant doubts that it can work in the large with a comparable number of producers and consumers as HTML has today.

It is worth pointing out that blogging systems have pushed a variety of schemes for embedding a motley assortment of metadata (including, in the case of trackback-autodiscovery, embedding RDF-in-HTML-comments).

The great unwashed masses seem to have no trouble using such tools.

And I think it’s arguable that standardizing on something (perhaps RDFa-in-HTML5) would bring some order to that chaos, and ultimately make things easier for the great unwashed.

Microformats (for instance) are, individually, much simpler ... till your blog implements a dozen different ones. At which point, the cognitive load is probably no lighter than RDFa.

Posted by Jacques Distler at

the cognitive load is probably no lighter than RDFa.

Probably no lighter.

Posted by Aristotle Pagaltzis at

Sam Ruby: RDFa in HTML5

I wonder why they are trying to integrate them...

Excerpt from vantguarde / RDFa (49) at

Shirky: The Semantic Web, Syllogism, and Worldview

Shirky: The Semantic Web, Syllogism, and Worldview via Sam Ruby : "There is a list of technologies that are actually political philosophy masquerading as code, a list that includes Xanadu, Freenet, and now the Semantic Web. The Semantic Web’s...

Excerpt from Coding In Paradise at

Oh noes! Shirky’s strawman is back!

See also: [link]

/me awaits metacrap reprise (should be fun in the age of del.icio.us)...

Posted by Danny at

metacrap

Posted by Sam Ruby at

Episode 3

Host Updates Mikeal released PushMarks [link] https://addons.mozilla.org/en-US/firefox/addon/10806/ Michael has epic battle with QVC [link]...

Excerpt from WebDevGeekly at

It amazes me that Ian (and various others) seem to have every interest in making HTML5 NOT extensible in a sane way. Better bake in SVG and MathML and claim that everything else is not HTML’s problem, that metadata will always be more corrupt than useful and that explicit reference to defined vocabulary is confusing overhead. Sometimes he sounds as if Joe User has to understand the source code of a web page, then again he brings up convoluted class-hacks to “demonstrate” that structured markup does already work in HTML.

Having some reservation against over-engineering is fine, especially in a space like this. But seriously, what gives? - these guys add audio and video tags and whatnot, even though there is an object tag and they cannot agree to adding some handful of attributes to provide a more generic extension mechanism because it adds too much clutter!

I don’t care that much about RDFa and I agree that XML name spaces have lots of pitfalls, but using spans with class attributes? Having an “extension mechanism” that is not able to clearly separate identifier names from data from viewable content from random formatting hooks, not even speaking about identifying the vocabulary that is used? That sounds at least like ten times worse than embedding additional XML vocabularies in XHTML.

Well, maybe in a few years, we will have a Wikipedia entry for “class-attribute hell” and then there will maybe be a revival of a new name space mechanism under some shiny new name as the latest and greatest trend.

Posted by Christian Steinert at

Shirky on the Semantic Web

I came across it while reading Sam Ruby’s thoughts on HTML5 and RDFa.  I like RDFa because it allows for using one presentation format while hiding another within it.  I’m sure those that like steganography also like this approach....

Excerpt from Noah Campbell at

Sam Ruby: RDFa in HTML5

[link]...

Excerpt from Delicious/parees/html5 at
