EchoFeedWithAuthorRefs

See also: SyntaxExtensionMechanism

[SimonWillison] This example is posted to show an alternative way of dealing with entry authors and contributors: their personal details are provided once in a contributors block at the start of the feed, and <author ref="bob" /> style elements then indicate authorship of entries by a pre-designated author. This eliminates duplicated/redundant data in the entries, reducing the size of the feed and the number of elements that an application must parse in order to understand the feed.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://example.com/newformat#" 
      xmlns:ent="http://www.purl.org/NET/ENT/1.0/" 
      version="1.0" >
  <title>My First Weblog</title>
  <link>http://bob.blog/</link>
  <modified>2003-02-05T12:29:29Z</modified>

  <contributors>
    <person id="bob">
      <name>Bob B. Bobbington</name>
      <homepage>http://bob.name/</homepage>
      <weblog>http://bob.blog/</weblog>
      <email>bob@bobbington.org</email>
    </person>
    <person id="yoyo">
      <name>Yo-Yo Dyne</name>
      <homepage>http://yoyo.dyne.name/</homepage>
      <weblog>http://blog.yoyo.dyne.name/</weblog>
      <email>yoyo@bobbington.org</email>
    </person>
  </contributors>

  <entry>
    <title>My First Entry</title>
    <summary>A very boring entry; just learning how to blog here...</summary>

    <author ref="bob" />
    <contributor ref="yoyo" />
    <!-- and another couple contributors could go here -->

    <link>http://bob.blog/28</link>
    <id>http://bob.blog/28</id>

    <created>2003-02-05T12:29:29Z</created>
    <issued>2003-02-05T08:29:29-04:00</issued>
    <modified>2003-02-05T12:29:29Z</modified>

    <content type="application/xhtml+xml" xml:lang="en-us">
      <p xmlns="...">Hello, <em>weblog</em> world! 2 &lt; 4!</p>
    </content>
  </entry>
</feed>
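
A minimal sketch of how a consuming application might resolve the ref attributes in the example above, assuming Python's standard xml.etree.ElementTree. The feed is a trimmed copy of the example; the function name resolve_people and the tuple layout it returns are illustrative assumptions, not part of any proposal on this page.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example feed. The namespace URI is the placeholder
# used in the example above, not a real Echo/Atom namespace.
FEED = """<feed xmlns="http://example.com/newformat#" version="1.0">
  <contributors>
    <person id="bob">
      <name>Bob B. Bobbington</name>
      <homepage>http://bob.name/</homepage>
    </person>
    <person id="yoyo">
      <name>Yo-Yo Dyne</name>
      <homepage>http://yoyo.dyne.name/</homepage>
    </person>
  </contributors>
  <entry>
    <title>My First Entry</title>
    <author ref="bob" />
    <contributor ref="yoyo" />
  </entry>
</feed>"""

NS = {"f": "http://example.com/newformat#"}


def resolve_people(feed_xml):
    """Return (entry title, role, ref, person details) for every reference."""
    root = ET.fromstring(feed_xml)

    # Index the feed-level <person> elements by their local id.
    people = {}
    for person in root.findall("f:contributors/f:person", NS):
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        people[person.get("id")] = {
            child.tag.split("}", 1)[1]: child.text for child in person
        }

    resolved = []
    for entry in root.findall("f:entry", NS):
        title = entry.findtext("f:title", namespaces=NS)
        for role in ("author", "contributor"):
            for node in entry.findall(f"f:{role}", NS):
                ref = node.get("ref")
                # A dangling ref yields None here instead of raising.
                resolved.append((title, role, ref, people.get(ref)))
    return resolved


for row in resolve_people(FEED):
    print(row)
```

The lookup table is built once per feed, so each entry costs one dictionary lookup per reference rather than a re-parse of the person details.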

[GeorgBauer] +1, I like this approach, as it reduces volume. Maybe two forms of reference, one local and one external to FOAF representations?

[SimonWillison] An external reference to a FOAF representation is an interesting idea, but I'm worried that it would raise the barrier to entry for Atom applications - now instead of just understanding Atom they would have to be able to retrieve other files over HTTP and understand FOAF as well. A FOAF link in the person section would be a good idea though, maybe even as a namespaced extension:

...
  <contributors>
    <person id="bob">
      <name>Bob B. Bobbington</name>
      <homepage>http://bob.name/</homepage>
      <weblog>http://bob.blog/</weblog>
      <foaf:url>http://bob.name/foaf</foaf:url>
    </person>
  </contributors>
...

[DannyAyers] This looks ok, but what does it mean? Ok, so presumably here you're saying that the foaf:url applies to the person, but I don't think the interpretation has been formalised anywhere. What if you wanted to include the whole FOAF profile? I think this is one special case of which there are likely to be loads (e.g. topics, threads) - so I reckon what's needed is a rock-solid, well-defined general purpose extension mechanism.

[PeteProdoehl] +1, I too think this is a good approach. Are there any negatives to doing it this way rather than specifying author/contributor on a per item basis?

[MartinAtkins] Can we maybe give the people globally-unique IDs too, so that we can see if the John Smith in one document is the same John Smith mentioned in another? Some kind of URI seems to be the standard way of doing GUIDs elsewhere, but making sure the same person is always referenced by the same URI would be tricky. Any other ideas? (note that I'm not proposing replacing the 'id' attribute of the person element, which is local to a given feed and thus can be short, which is desirable.)

[GeorgBauer] Actually a URL with mailto: and the email address should make a useful GUID for persons. Or an http: URL to a contact form. So maybe some way of contacting the person directly, without looking through the homepage. On the other hand, the homepage itself is usable as a GUID, too. Two persons are unlikely to have the same name and homepage.

[AsbjornUlsberg, RefactorOk] I still dislike the <homepage> and <weblog> items, and in my example (that has been erased) these were replaced with <link>. <link rel="foaf" href="http://example.com/bob" /> could very well work as a reference to Bob's FOAF instance. Globally unique person or author IDs are typically email addresses. I can't see why an email address can't be required...

[GaryF, RefactorOK] Agreed. These element names seem very restrictive and make assumptions about what Echo will be used for (i.e. blogs only). I suggest we allow an arbitrary number of <url> tags, where the first tag is the most important (perhaps a blog for an individual, or a homepage for a corporation).

[DamianCugley] +1 on using <link rel= href= /> everywhere. With a short list of blogging-oriented values for the rel attribute.

[SimonWillison] I think email addresses in feeds are best avoided as they would be a godsend to spam harvesters. Were they to be used as the basis of a unique ID, I think the best approach would be to use a cryptographic hash of the address, like the mbox_sha1sum property in FOAF.

[AdamRice] +1 for Simon's proposal in general. -1 on e-mail addresses as author IDs (or any kind of required element) because of spam. +1 on use of the Link tag: Keeps the number of unique tags to a minimum.

[MortenFrederiksen] An element with the same content as the mbox_sha1sum of FOAF would be great for interop - especially for trust networks. It keeps spammers out, while still establishing identity.

[JimDabell] +1 on Simon's idea, I don't see any downside. I would also like some way of resolving an author to a document describing that person in detail (FOAF-like). Perhaps make the id a URI? Inventing a person: URN would be possible for non-URL ids, I can't think of a reason why we couldn't have id="person:hash-of-email".
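
A minimal sketch of the hashed-identifier idea discussed above, assuming Python's standard hashlib. FOAF's mbox_sha1sum is the SHA-1 digest of the mailbox URI, i.e. the address with its mailto: prefix. The person: prefix is the hypothetical scheme floated in this thread, not a registered URN, and the function names are illustrative only. No normalisation (case folding, whitespace trimming) is applied here; whether any should be is left open.

```python
import hashlib


def mbox_sha1sum(email):
    """Hex SHA-1 digest of the mailto: URI for an email address."""
    uri = "mailto:" + email
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def person_id(email):
    """Hypothetical 'person:' identifier built from the hashed address."""
    return "person:" + mbox_sha1sum(email)


# Address taken from the example feed on this page.
print(mbox_sha1sum("bob@bobbington.org"))
print(person_id("bob@bobbington.org"))
```

Two feeds that hash the same address get the same identifier without either of them publishing the address itself. The generator is a couple of lines, though it is still an extra step compared with typing an address.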

[PeteProdoehl] +1 on the email address idea, though the 'hash-of-email' thingy might be a little too advanced for some users, since you need some sort of generator to create it, rather than just typing in an email address. I know, avoid spam, etc... but there's a good chance an email address is going to appear elsewhere in the file anyway, right? I'd vote for plain email address over obscured and hashed. Besides, it's not like hashing the email address is going to make the spam problem go away.

[PatrickLioi] +1 for me too. Showing the same person's <homepage> etc on every single <entry> is just begging for a fix like this.

[GregReinacker] I think email should be optional rather than required, for two reasons. First, the obvious spam problem. Second, not all feeds are from a person, and thus may not easily be able to support an email address - they would have to synthesize one, which isn't necessarily a good idea.

[GaryF] +1 on moving author details to feed-level like this. -1 for using email as a GUID. I think a GUID for users is a great idea, but this is not the way to do it. Hash of email would be fine. What is the most widespread hashing algorithm (or similar)? I would go for md5, since I've seen implementations in almost every language.

[RogerBenningfield] A reluctant -1. I sincerely like the idea, and it seems to be quite efficient. Unfortunately, it would probably require some changes to the way many tools generate feeds. For example, you can't just loop over the content twice, once for contributors and once for entries... if you do, you'll just end up with duplicated data in a new location. Tools would have to be revised to specifically allow a list of *unique* authors to be dumped independently of the entries themselves. I'd be willing to dedicate 30 minutes to changing my own code, but I'm hesitant to demand that other people do the same.

[JimDabell] Roger: A few possibilities to allay your concerns:

1. Output buffering seems to be the easiest solution to this trouble - why can't people just build up the <contributors> and <entry> elements before outputting them both?
2. Multiple <contributors> elements. In the rare cases where you cannot loop twice or buffer output, stick a <contributors> element before each <entry> element that has a previously undefined author. This will increase reading complexity a little.
3. Put the <contributors> element after all <entry> elements. May be more annoying to read.
4. Allow feeds to insert <person> elements directly within <author> elements, and leave ref="" as an optimisation.

I don't particularly like 2-4, primarily because they make things more difficult, but I really don't see a problem with 1.
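
A minimal sketch of the output-buffering option (the first possibility above), assuming Python's standard xml.etree.ElementTree. The whole feed is built in memory in a single pass over the entries, and nothing is written until the end. The post dictionary keys (title, author_id, author_name) and the function name build_feed are assumptions made for illustration; namespaces and the other entry elements are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_feed(posts):
    """Build a feed with a deduplicated <contributors> block in one pass."""
    feed = ET.Element("feed")
    # Created first, so it serialises ahead of every <entry>.
    contributors = ET.SubElement(feed, "contributors")
    seen = set()

    for post in posts:
        author_id = post["author_id"]
        if author_id not in seen:
            # First time this author appears: define the person once.
            seen.add(author_id)
            person = ET.SubElement(contributors, "person", id=author_id)
            ET.SubElement(person, "name").text = post["author_name"]

        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "title").text = post["title"]
        ET.SubElement(entry, "author", ref=author_id)

    # Output happens only here, after both blocks are complete.
    return ET.tostring(feed, encoding="unicode")


posts = [
    {"title": "One", "author_id": "bob", "author_name": "Bob B. Bobbington"},
    {"title": "Two", "author_id": "yoyo", "author_name": "Yo-Yo Dyne"},
    {"title": "Three", "author_id": "bob", "author_name": "Bob B. Bobbington"},
]
print(build_feed(posts))
```

The set of already-seen ids is what keeps a single loop from producing duplicated person data in a new location.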

[MichelValdrighi] Here's another way, that requires some logical changes in requirements of <author> elements. This assumes that an author's details are the same throughout the feed. One would only have to put the whole collection of <author> tags once, and then following entries by the same author would only need to use <author><name>John Smith</name></author>. Only problem with this approach is when two contributors bear the same name, but then again this is something that should be handled by the content producer already.

[AdamRice] Roger--You're right. But if we call this a desirable optimization for one's feed, not a requirement, that should work. Suggest harmonizing the two points of view thus: If a feed contains <author ref="Alice" /> with no further qualifications, there must be a "contributors" section. Otherwise, it is valid to have contributor-info inline and have an empty "contributors" section (but keep it there as a flag--"I haven't forgotten about this, I'm choosing to leave it empty").

[DamianCugley] How about allowing a choice between two elements, <authorRef href="#bob"/> or <author>... complete entry included inline ... </author>. Whichever the feed generating software finds most convenient to issue will depend on whether the feed is single-author or mixed. So long as the 'person' definitions precede their use in references, consumers of feeds should not have too much trouble with either variation.
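
A minimal sketch of a consumer that accepts both variations, assuming Python's standard xml.etree.ElementTree and an un-namespaced feed for brevity. The function names and the sample feed are illustrative assumptions. The feed is walked in document order, so a reference only sees person definitions that precede it.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed mixing the two forms: one reference, one inline author.
FEED = """<feed>
  <contributors>
    <person id="bob"><name>Bob B. Bobbington</name></person>
  </contributors>
  <entry>
    <title>By reference</title>
    <authorRef href="#bob"/>
  </entry>
  <entry>
    <title>Inline</title>
    <author><name>Yo-Yo Dyne</name></author>
  </entry>
</feed>"""


def details(node):
    """Collect an element's child tags and text into a dict."""
    return {child.tag: child.text for child in node}


def read_authors(feed_xml):
    """Map each entry title to its list of author detail dicts."""
    people = {}
    authors = {}
    # Iterating the root visits its children in document order.
    for node in ET.fromstring(feed_xml):
        if node.tag == "contributors":
            for person in node.findall("person"):
                people[person.get("id")] = details(person)
        elif node.tag == "entry":
            found = []
            for child in node:
                if child.tag == "authorRef":
                    # "#bob" -> "bob"; raises KeyError if not yet defined.
                    found.append(people[child.get("href").lstrip("#")])
                elif child.tag == "author":
                    found.append(details(child))
            authors[node.findtext("title")] = found
    return authors


print(read_authors(FEED))
```

Raising on an undefined reference is one possible policy; a more lenient consumer could defer resolution until the whole feed has been read.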

[Arien, RefactorOk] I propose having author info completely external to the feed: every author should have a URI. The analogous case of a URI for every entry is discussed in RestEchoApiOneUriForEachEntry. (Copied from EchoExample)

[EricScheid] +1. We could also use the same method to pre-define other bulky defaults like <copyright>, pulling them into individual entries as required. Especially useful when there is a mix of copyrights in the feed.

[JeremyGray] +1 to the general concept, with the following notes (in descending order of importance to me):

I would prefer to see authors and contributors always in the linked form, not in linked or nested form. Using a single form will keep things simple and consistent.
With various kinds of referencing starting to appear in the Wiki, on this page for author and contributor references and elsewhere regarding relation of trackbacks and comments to other entries (as well as other discussions surrounding 'threading'-related issues), might it be worth selecting a single reference mechanism for use across the spec, whether it be an 'HTML' style or 'XLink' style (as coined by another contributor elsewhere in the Wiki)? Would a ReferenceMechanismDiscussion page do the trick?

FOAF should be addressed separately, as an extension.
For those suggesting unique identifiers of some kind for authors, I agree they would prove quite useful. Direct exposure of email addresses does raise some concerns for some people, though with the power available in current anti-spam technology I hardly worry about it any more, personally. That said, identifiers could easily be generated using a hash, e.g. as just recently mentioned (although in a FOAF-specific context, but still relevant) in Persona Hash Key?
Given <contributors>, should <person> not be <contributor> for the sake of consistency? Does anyone see a present or reasonable future need for non-person contributors, and if so, how should that distinction be made (different elements vs. a single contributor element with sub-typing within)?

CategoryModel, CategorySyntax