MultipleContentDiscussion

Multiple Content Elements

What do multiple <content> elements mean? Do all of them together form the content, or are they alternative representations, much as in <img src="..." alt="..." />?

[MarkHershberger] In MimeContent, the case is made for a single <content> element and using multipart/alternative or multipart/related to include different sections. I like the idea, since having multiple <content> elements is confusing, but then feed readers have to know our funky XML translation of MIME. Bottom line: multiple <content> elements are ambiguous.

From EchoExample and content, the comments below refer to this example:

  <content type="multipart/alternative"> 
    <content type="image/jpeg" encoding="base64"> 
      xo+Hello0AFWeblogh5FWorldh1mImagedsTbrVbF3 
    </content> 
    <content type="text/html" xml:lang="en-us" mode="escaped" rel="fragment"> 
      <![CDATA[<p>Hello, <em>weblog</em> world! 2 &lt; 4!</p>]]> 
    </content> 
    <content type="application/xhtml+xml" xml:lang="en-us" rel="fragment"> 
      <p xmlns="http://www.w3.org/1999/xhtml"> 
        Hello, <em>weblog</em> world! 2 &lt; 4! 
      </p> 
    </content> 
    <content type="application/pdf" src="http://example.org/blog/hello.pdf" /> 
  </content>

[JamesAylett RefactorOk] Having a single containing <content> element and then using the methodology of multipart/* is fine, but please, please, we should not be saying that the <content> element contains content "of MIME type multipart/alternative" (or multipart/anything) unless it is precisely in that MIME format.

[KenMacLeod] Very good point. It's clear that the intent is to use the methodology, and you are correct that those MIME types (multipart/*) already have a serialization.
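To make the distinction concrete, here is a minimal sketch of what content "precisely in that MIME format" looks like, using Python's standard email package. The helper name build_alternative and the sample strings are assumptions made for illustration; nothing here is part of any Atom or Echo draft.

```python
from email.message import EmailMessage


def build_alternative(plain: str, html: str) -> EmailMessage:
    """Build a genuine RFC 2046 multipart/alternative entity (hypothetical helper)."""
    msg = EmailMessage()
    # The first part is the lowest-fidelity representation.
    msg.set_content(plain)
    # Adding an alternative converts the message to multipart/alternative
    # and appends the HTML part after the plain-text part.
    msg.add_alternative(html, subtype="html")
    return msg


msg = build_alternative(
    "Hello, weblog world! 2 < 4!",
    "<p>Hello, <em>weblog</em> world! 2 &lt; 4!</p>",
)
# The text serialization carries MIME headers and a boundary parameter,
# which the nested-XML <content> example does not.
serialized = msg.as_string()
part_types = [part.get_content_type() for part in msg.iter_parts()]
```

The serialized form is plain text with Content-Type headers and boundary delimiters, which is why labelling nested XML children as multipart/alternative is misleading.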

Possible solutions:

Special-case the multipart/* MIME type in Atom resource entities, and look for XML child elements to indicate whether this is Atom XML multipart/* or RFC2045/RFC2046 text multipart/* (which would have no XML child elements and would carry MIME text headers).
Introduce a new media type '[x.]multipart+xml/*'.
Use only RFC2045/RFC2046 text multipart/* within <content> (dropping the idea of multipart <content> children).
Turn the resource entity transferred in feeds, archives, and the API inside-out and allow a MIME multipart/related to be the resource entity transferred, with an Atom <feed> or <entry> XML instance as the "start" or root object of the multipart. This is the approach used in SOAP Messages with Attachments.
Use content by reference for multiple content (src=).
Allow multiple <content> elements for a "first level" support of multiple content items, and define their processing as the same as 'multipart/alternative' (increasing order of fidelity, last is the publisher's best). Deeper or other multipart/* types would use RFC2045/RFC2046 text or be content by reference.
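The last option above (sibling <content> elements processed like multipart/alternative) can be sketched as follows. This is only an illustration: the un-namespaced <entry> wrapper, the sample markup, and the function name pick_best are assumptions, not anything defined by the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: representations listed in increasing order of fidelity.
ENTRY_XML = """<entry>
  <content type="text/plain">Hello, weblog world! 2 &lt; 4!</content>
  <content type="text/html" mode="escaped">&lt;p&gt;Hello, &lt;em&gt;weblog&lt;/em&gt; world!&lt;/p&gt;</content>
  <content type="application/pdf" src="http://example.org/blog/hello.pdf" />
</entry>"""


def pick_best(entry, supported_types):
    """Return the last <content> child whose type the reader supports, or None."""
    best = None
    for content in entry.findall("content"):
        if content.get("type") in supported_types:
            # Later siblings are assumed to be higher fidelity, so keep overwriting.
            best = content
    return best


entry = ET.fromstring(ENTRY_XML)
chosen = pick_best(entry, {"text/plain", "text/html"})
unsupported = pick_best(entry, {"image/png"})
```

A reader that understands only plain text and HTML ends up with the HTML representation; a reader that supports none of the listed types gets nothing and has to fall back on other entry metadata.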

[JamesAylett RefactorOk] I would argue that by far the best way of doing this is to not have any direct support in Atom, but simply to note that "an XML serialisation of the concepts of MIME multipart would be a convenient mechanism for multiple content entities or representations". The best way of doing this would be to introduce a new media type, which I'd argue should probably be application/x.multipart+xml rather than multipart+xml/* (because a whole new major type will take ages to standardise). However this is really beyond what Atom should be defining, particularly because if it's done right, it's applicable far beyond this field. So:

Atom has at most one <content> element (exactly one if the poll goes that way)
multipart/* can be used with appropriate XML entity references
src='...' can specify a URL which returns an entity of type multipart/*, giving the same facility somewhat more readably (and allowing for ContentNegotiation, which would avoid having to do multipart/alternative)
if anyone gets an XML serialisation of the multipart concepts, that can also be used; trying to do that ourselves will just slow things down. If it's needed, someone will pick up the ball. (And it probably won't be in this field.)
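As a rough illustration of the src= plus ContentNegotiation idea, a reader could ask the server for its preferred representation through an HTTP Accept header rather than unpacking a multipart/alternative. The sketch below only builds the request and never sends it; the URL, the q-value scheme, and the function name negotiated_request are assumptions.

```python
from urllib.request import Request


def negotiated_request(src, preferred_types):
    """Build a GET request whose Accept header ranks media types by preference."""
    ranked = []
    for index, media_type in enumerate(preferred_types):
        # First choice gets the implicit q=1; later choices step down to q=0.1.
        weight = max(10 - index, 1)
        if weight == 10:
            ranked.append(media_type)
        else:
            ranked.append(f"{media_type};q=0.{weight}")
    return Request(src, headers={"Accept": ", ".join(ranked)})


# Hypothetical src value taken from a <content src="..."/> element.
req = negotiated_request(
    "http://example.org/blog/hello",
    ["application/xhtml+xml", "text/html", "text/plain"],
)
accept_header = req.get_header("Accept")
```

The server then decides which single representation to return, so the feed itself never needs to enumerate the alternatives.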

[MarkNottingham] The current model allows for 1+ (possibly 0+) content modules. What does it mean to have multiple content modules in the same entry? Are they expected to be semantically equivalent (e.g., HTML vs. plain text of the same content)? If they're different, what does that mean? Each one has a media type; are content modules required to have distinguished types in that domain? Hmm. They have identifiers, and media types... seems to me you might call them, oh, I don't know... maybe Representations?

[JamesSnell] Yes, the requirement should be that all ContentModules in a given WellFormedEntry MUST be semantically equivalent.

[MarkNottingham] This gets especially interesting when you mix in the discussion re: language tags. It seems like this might be moving into the realm of a portable Web representation format, which has already been hinted at in PASWA (not that I'm *very* happy with that part of PASWA) as well as in Graham Klyne's XMLization of MIME messages. (See also the related MIME discussion.)

If this is accepted, it means that we should be considering this stuff with a RESTifarian hat on (I know some will object, but hey - it works). If that's the case, we should probably allow/force each content module to specify not only a media type, but also a (base) language, optional encoding (e.g., "This content is base64-encoded"), and so forth.

Taking this to its (possibly) logical conclusion, that would leave us with a model like:

Entry (Resource)
  Entry Metadata (Resource Metadata)
    author
      name
    permalink (URI)
    creation date
    etc...
  zero or more instances of content (Variant Representations)
    data (Entity-Bodies)
    Variant Representation Metadata
      media type
      language
      title
      etc...

The cool part is where you can substitute a URI for the entity-body (just like in PASWA) and fetch it remotely, instead of shipping it around with the representation. That way, the metadata and content uses of the model are united.
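One way to picture this model is as a pair of small data types. The sketch below is an assumption-laden illustration: the class and field names are invented here, and the rule that a representation carries exactly one of an inline entity-body or a URI is one reading of "substitute a URI for the entity-body", not something the discussion has settled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Representation:
    """A variant representation: metadata plus an entity-body or a URI to it."""
    media_type: str
    language: Optional[str] = None
    title: Optional[str] = None
    data: Optional[bytes] = None  # entity-body carried by value
    uri: Optional[str] = None     # entity-body fetched by reference

    def __post_init__(self):
        # Assumption: exactly one of data/uri must be present.
        if (self.data is None) == (self.uri is None):
            raise ValueError("provide exactly one of data or uri")

    @property
    def by_reference(self) -> bool:
        return self.uri is not None


@dataclass
class Entry:
    """The resource: entry metadata plus zero or more representations."""
    permalink: str
    author_name: str
    creation_date: str
    content: List[Representation] = field(default_factory=list)


entry = Entry(
    permalink="http://example.com/items/54",
    author_name="Example Author",
    creation_date="2003-06-12",
    content=[
        Representation("text/plain", language="en", data=b"This is the content"),
        Representation("text/html", language="en", title="Stuff",
                       uri="http://example.com/items/54.html"),
    ],
)
modes = [rep.by_reference for rep in entry.content]
```

An entry with an empty content list is a pure metadata entry, while one whose representations all carry data is a pure content entry, so both uses fit the same shape.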

[JamesSnell] Ok, it's 11:30 at night and I'm just not sure I'm grokking this, but, if it mean what me think it mean then me think it good but me not sure if it really mean what me think it mean. In any case, a single WellFormedEntry should be capable of containing or referencing multiple semantically equivalent representations of its content. Each ContentModule is a unique entity with its own UniqueIdentity.

[MarkNottingham] I don't think that's the point. What I'm leaning towards is something where an entry looks like (if you'll excuse the crassness of a serialization, this is just thinking out loud):

<item uri="http://example.com/items/54">
  <creationDate>2003-06-12</creationDate>
  <content type="text/html" xml:lang="en" title="Stuff"> <h:p xmlns:h="..">This is the content</h:p> </content>
  <content type="text/plain"> .. </content>
</item>

OR it could look like

<item uri="http://example.com/items/54">
  <creationDate>2003-06-12</creationDate>
  <content type="text/html" xml:lang="en" title="Stuff"> <include href="http://example.com/items/54.html" /> </content>
  <content type="text/plain"> .. </content>
</item>

(There are, of course, lots of ways to serialize this, so don't get hung up on that now.)
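For what it's worth, a consumer of either serialization above could tell by-value content from by-reference content with a check like the one below. It is a sketch only: the <item>, <content>, and <include> names follow the thinking-out-loud examples, the sample document is a trimmed variant of them, and the function name describe is an assumption.

```python
import xml.etree.ElementTree as ET

# Sample adapted from the examples above, with the elided plain text filled in
# by a placeholder string purely for illustration.
ITEM_XML = """<item uri="http://example.com/items/54">
  <creationDate>2003-06-12</creationDate>
  <content type="text/html" xml:lang="en" title="Stuff"><include href="http://example.com/items/54.html" /></content>
  <content type="text/plain">This is the content</content>
</item>"""


def describe(item):
    """List (media type, 'reference' or 'value', payload) for each <content>."""
    described = []
    for content in item.findall("content"):
        include = content.find("include")
        if include is not None:
            # By reference: the payload is the URI to fetch later.
            described.append((content.get("type"), "reference", include.get("href")))
        else:
            # By value: the payload is the inline text.
            described.append((content.get("type"), "value", (content.text or "").strip()))
    return described


summary = describe(ET.fromstring(ITEM_XML))
```

Either way the caller sees the same metadata (type, language, title); only the payload differs between a URI and inline data.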

So, we could require at least one "content module" (not really sure about that name), but the actual content might be there by reference, not by value. That way, you can have metadata-based entries (e.g., news RSS), or you can have content-based entries (e.g., weblog RSS).

[TimothyAppnel] How is the actual content as a reference different from a permalink, assuming you have one ContentModule? This is an area where turning a conceptual data model into practice seems rather fuzzy to me.

[MarkNottingham] That's a good point. Content modules contain syndicated content, and they usually don't have the context of the original (e.g., ads, Web site navigation, style, etc.); the permalink is to the original (the source of the syndication) that does contain this stuff. I think that's an important distinction to make, and preserve in the model.

[PhilWolff] If I understand you correctly, you are describing alternate representations, not really multiple content. The syndication equivalent of the img alt tag. Is that what was intended? Assistance with transcoding and translation?

[AsbjornUlsberg, RefactorOk] Whoa, I think you are all mixing things together here. First, there is a big difference between <content> and <feed>. A "permalink", as I understand it, goes to a <feed>, and I can't see that we have discussed, let alone reached any consensus on, whether <content> should have permalinks or not, or what a permalink in that context is.

Each feed should be retrievable through some kind of URI. Let's call this the permalink. This URI should point to a resource returning the exact same data as you are looking at, e.g. the Echo feed. Other representations of the feed (HTML etc.) should have alternative <link> elements pointing to it, or some other kind of external referencing method.

Each <content> can be retrievable via a URI, but doesn't have to be. The reason why <content> should be retrievable over web protocols like HTTP is that <content> can be a big CD image, ZIP archive, a downloadable database, etc. Such content is just stupid to have inline in the Echo feed, unless you have to because of firewall problems and such.

So, <content> should in most cases contain the content inline, but in many cases it's much better to have it by reference. We need support for both methods, and each method serves different needs. They neither exclude nor replace one another; every type of content can be inline or externally referenced. I hope I'm not alone in this view.

CategoryMetadata, CategoryModel