It’s just data

RSS schema

Jorgen Thelin: Adding in the regexp suggested by Sam for RFC-822 format dates to the RSS 2.0 schema, I have come to the conclusion that I must be missing the point somewhere... although I get the syntactic validation of the data, I have lost the semantic meaning of the schema type model.  Am I missing something obvious here?

IMHO, a schema only captures higher-level syntax.  Title and Description are both strings... what semantics can one infer from that?  Meanwhile, you did define a simpleType that can be profitably reused.
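To make the syntax-versus-semantics point concrete, here is a minimal Python sketch of a regex check for RFC-822 style dates. It is not the regex from Jorgen's schema (which is not reproduced here); the pattern, the name RFC822_DATE, and the helper looks_like_rfc822_date are assumptions made up for illustration, written in Python's re syntax rather than XML Schema pattern syntax.

```python
import re

# Simplified, illustrative pattern for RFC 822 style dates as seen in RSS
# feeds. It accepts two- or four-digit years and is NOT the regex used in
# Jorgen's schema.
RFC822_DATE = re.compile(
    r"(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"               # optional day of week
    r"\d{1,2} "                                             # day of month
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "  # month name
    r"\d{2}(?:\d{2})? "                                     # year
    r"\d{2}:\d{2}(?::\d{2})? "                              # hh:mm with optional :ss
    r"(?:UT|GMT|[ECMP][SD]T|[A-IK-Z]|[+-]\d{4})"            # zone name or offset
)


def looks_like_rfc822_date(text: str) -> bool:
    """Return True if the whole string has the shape of an RFC 822 date."""
    return RFC822_DATE.fullmatch(text) is not None


print(looks_like_rfc822_date("Sat, 19 Apr 2003 21:15:33 GMT"))  # well-formed
print(looks_like_rfc822_date("2003-04-19T21:15:33Z"))           # W3C datetime, rejected
print(looks_like_rfc822_date("Mon, 31 Feb 2003 25:61:00 GMT"))  # right shape, impossible date
```

The third call is the interesting one: the string has the right shape, so the pattern accepts it even though no such date exists. A regex of this kind validates the lexical form only; the value is still just a string.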

On a related note: Mark Nottingham is looking for a new element that has the same semantics but is a proper W3C datetime.  This can be found in the Dublin Core module.

Are there any other types for which a regex is desired?


I agree with you that there isn't much "semantic meaning" in a schema in the first place (which is kinda the point of my rant at http://www.kuro5hin.org/story/2003/4/19/211533/168 ) but this doesn't change the fact that something is lost by having to use an xs:string instead of an xs:dateTime.

Specifically, technologies that use the PSVI to provide typed access to validated XML, such as XQuery and object<->XML data binding, would not treat the value as a date. They would treat it as a string, which makes processing such dates a lot more tedious.

Posted by Dare Obasanjo at
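The extra work Dare describes can be sketched in Python. This is only an illustration under stated assumptions: both sample values are made up, and the standard library stands in for whatever a schema-aware tool would do with a typed value. The RFC 822 string needs a dedicated parser before it behaves like a date, while the W3C/ISO 8601 form maps directly onto a datetime type.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Hypothetical sample values, one in each format.
pub_date = "Sat, 19 Apr 2003 21:15:33 GMT"   # RFC 822 style, as in pubDate
dc_date = "2003-04-19T21:15:33+00:00"        # W3C datetime style, as in dc:date

# As plain strings the two values are unrelated text.
print(pub_date == dc_date)

# Once parsed, both denote the same instant and support date arithmetic.
as_rfc822 = parsedate_to_datetime(pub_date)
as_w3c = datetime.fromisoformat(dc_date)
print(as_rfc822 == as_w3c)
print(as_rfc822.isoformat())
```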

Dare, would it be fair to say that you would prefer something like dcterms:issued over pubDate and dcterms:lastmodified over lastBuildDate?

What I would most like to see happen here is that a bunch of people who produce or consume RSS feeds express their preferences on subjects like this, and that this be collected into a set of best practices, and that the validator be updated to provide guidance in areas such as these.

Do you have any other preferences?

Posted by Sam Ruby at

Sam,
Not really. I'm not currently using a schema to convert RSS feeds into objects or to get typed access to specific values, so it doesn't matter that some date format is RFC 822 instead of ISO 8601; the code I use to process them is indifferent to such issues.

I personally prefer pubDate because that's what my code currently supports and that's what's in most of the feeds that I've seen. Thus less work for me. ;)

PS: Exactly what am I supposed to do with the lastBuildDate info?

Posted by Dare Obasanjo at

I'm unclear as to what the benefit of this proposal is.  pubDate gives aggregators the information required, and it's commonly used.  Why introduce a new element that shows the same information and, moreover, hides it in a namespace?  I don't get it...

Posted by James Robertson at

In your feed, you have the 'link' element as '/blog'.  Based only on the information available in your RSS feed, how would I deduce the actual url for that?

Posted by James Robertson at

James: I see we're having the same problems. I had chalked it up to the server problems his hoster is having. Perhaps I was mistaken.

Posted by Timothy Appnel at

James: you never heard of relative urls?

Now that the DNS changes have had a chance to catch back up, I've gone back to full urls.

Posted by Sam Ruby at

Of course I've heard of relative urls.  However, if your link (feed level) is set to

/blog

and items to things like

/blog/1368.html

and the feed url is

http://www.intertwingly.net/blog/index.rss

then by eyeballing it, the relative url is obvious.  In code however, it could easily be guessed as either:

http://www.intertwingly.net/blog/

or http://www.intertwingly.net/

as the base url.  See what I mean about guessing?

Posted by James Robertson at

James: when relative URLs start with a slash, they need to be interpreted as replacing everything after the hostname.

Posted by Sam Ruby at

Dare: care to take a position you will stick with?  ;-)

Posted by Sam Ruby at

Ok, I guess my question is:  Is there a spec somewhere on how one should interpret relative urls?

Posted by James Robertson at

James, see the documentation for URIs and XML:



Posted by Sam Ruby at

RSS Schema and dates

Sam mentions dc:date; that's what I was thinking, except that 'date' on its own is pretty useless. As Bill points......

Excerpt from mnot's weblog at

RE: RSS schema

Sam,
My position is that I would prefer to keep pubDate because it means I don't have to change my code. :)

Just because I can see some edge cases where RFC 822 dates would take a relatively trivial amount of extra work to process compared with ISO 8601 doesn't mean I think they should be replaced.

Message from Dare Obasanjo at

Dare, so it probably is a good thing that Jorgen's schema defines a simpleType with a regex, wouldn't you think?

P.S.  My rss2 feed uses dc:date.

Posted by Sam Ruby at

Sam,
I wouldn't go as far as calling it a good thing, but "good enough" probably describes how I feel about it.

PS: RSS Bandit supports both pubDate and dc:date so I already had you covered.

Posted by Dare Obasanjo at

Sam,
By my reading of those docs, the way your urls were set up was incorrect; i.e., the only possible base url was the feed link, which ended in /blog/blah.html

Now the relative url looked like

/blog/blah2.html

which, as I read the docs, would result in

/blog/blog/blah2.html

which isn't correct.  Am I reading them wrong?  I don't think so...

Posted by James Robertson at

James, The RSS feed itself is at http://www.intertwingly.net/blog/index.rss.  Evaluating /blog/1368.html relative to that URL would result in http://www.intertwingly.net/blog/1368.html.

I've made the relative URL cited above a hypertext link.  Try clicking on it to see where your browser takes you.  View source if you'd like to verify.

Posted by Sam Ruby at

http://intertwingly.net/blog/ + /test.html = http://intertwingly.net/test.html

If I define this in HTML:

[BASE HREF="http://intertwingly.net/blog/"]

then I do this:

[A HREF="test.html"]

that points to http://intertwingly.net/blog/test.html

However, if I do this on the same page:

[A HREF="/test.html"]

that points to http://intertwingly.net/test.html

Try it yourself.

Posted by Mark at
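The resolution rule that Sam and Mark describe can also be checked in code. A minimal sketch using Python's urllib.parse.urljoin, assuming the feed's own URL serves as the base (no xml:base or other override), with the URLs taken from the comments above:

```python
from urllib.parse import urljoin

# The feed URL acts as the base for links found inside the feed.
feed_url = "http://www.intertwingly.net/blog/index.rss"

# A reference starting with "/" replaces the whole path of the base.
channel_link = urljoin(feed_url, "/blog")
item_link = urljoin(feed_url, "/blog/1368.html")
print(channel_link)   # http://www.intertwingly.net/blog
print(item_link)      # http://www.intertwingly.net/blog/1368.html

# Mark's BASE HREF example: with and without the leading slash.
base = "http://intertwingly.net/blog/"
print(urljoin(base, "test.html"))    # http://intertwingly.net/blog/test.html
print(urljoin(base, "/test.html"))   # http://intertwingly.net/test.html
```

No guessing between candidate base urls is needed: a leading slash discards the base path entirely, so /blog/1368.html never turns into /blog/blog/1368.html.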

RSS Schema

(SOURCE-"sam ruby") - This is a LiveTopics 1.1.3 beta Test Post. <quote> Jorgen Thelin: Adding in the regexp suggested by Sam for RFC-822 format dates to the RSS 2.0 schema, I have come to the conclusion that I must be missing the point...

Excerpt from Roland Tanglao: XML at

RSS Schema

(SOURCE-"sam ruby") - This is a LiveTopics 1.1.3 beta Test Post ....

Trackback from Roland Tanglao's Weblog at
