Abstract
Given a separate technique for creating and managing arbitrary content types (PaceSimpleResourcePosting, PaceNonEntryResources), the opportunity arises to vastly simplify the remaining inline content cases to only XML text (markup and/or characters) and escaped HTML markup, and content-by-reference for specifying full body content using an arbitrary Internet media type. This proposal removes the @type attribute of "Content Constructs" and removes the 'base64' mode (replaced by content-by-reference). The 'escaped' mode is replaced by 'escaped-html', as that is the only backwards compatible use for escaped markup.
PaceContentSrc is an alternative that provides a "src" attribute for content-by-reference for all Content constructs, whereas this Pace only allows a "src" attribute for full body content and alternates.
Status
Open.
Author: KenMacLeod
Revised: 29-Jul-2004, 15-Jul-2004, 8-Jun-2004
Related:
-
PaceSimpleResourcePosting -- how to create related or arbitrary, non-entry resources
-
PaceNonEntryResources -- how to create and manage related or arbitrary, non-entry resources
Rationale
The "unbounded openness" of allowing arbitrary MIME content types in Content Constructs has always been strongly debated. One of the principal use cases for this unbounded openness was the ability to create and manage "complex" and multipart entries. Recently, several proposals have come forward (PaceSimpleResourcePosting, PaceNonEntryResources) to handle complex, multipart entries in a more robust and direct manner. Therefore, the content models of the <name>, <title>, <tagline>, <copyright>, <info>, <summary>, and <content> elements can be reduced to just the common cases of escaped HTML or XML text (markup and/or characters).
Further, the Content construct of the format has never been thoroughly specified. It does not specify what the interpretation, or profile, of an arbitrary media type resource should be, particularly the definition of a "resource" or "payload", whether the mode attribute is base64 or not. The Content construct, and the atom:content element in particular, never specified the informative elements originally found in content (internal wiki link currently broken).
This proposal allows content-by-reference only for the atom:content element ("src" and "type"), and alternatives for that content using a new atom:content-alternate element.
One other use case for multipart/alternative content or allowing multiple <content> elements is multiple language content. multipart/alternative was only ever specced for <content> and not other fields, which tends to make this case not applicable. This proposal relies on the user creating unique entries and using xml:lang for multiple languages.
Why is <content-alternate> required when alternate textual content could be inline in the <content> element?
-
<content-alternate> is required to support multiple alternate textual or non-textual alternatives. Since <content-alternate> is required for that purpose, it seems simpler to exclude the combined use than to describe the simultaneous use of <content> with @src and with element content and still with <content-alternate>s.
Proposal
(1) Replace section 3.1 Content Constructs with the following:
-
A Content construct is an element with arbitrary textual child content. A Content construct MAY have a "mode" attribute that indicates whether the content is XML text (the default; markup and/or characters) or escaped HTML as used for backwards compatibility with existing publishing systems. When present, this attribute's value MUST be one of those listed below. If not present, its value MUST be considered to be "xml". Text and markup from HTML, XHTML, and other XML namespaces may be used within Atom according to section XX, Content Profiles. Some Atom elements, such as atom:title and atom:summary, may further restrict the content to what may appear in "one line" or "one paragraph".
- xml
-
A mode attribute with the value "xml" indicates that the element's content is inline XML text (for example, a series of XML characters without any non-character markup, namespace-qualified XHTML, or XML text from other XML namespaces).
Examples:
<title>Ben &amp; Jerry's, yumm! :-&gt;</title>
<content mode="xml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    Here's some <em>important</em> mathematics:
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow> <mi>4</mi> <mo>&gt;</mo> <mi>3</mi> </mrow>
    </math>
  </div>
</content>
- escaped-html
-
A mode attribute with the value "escaped-html" indicates that the element's content is an escaped string of HTML markup; the version of HTML is undefined. The string is passed to an HTML processor to be rendered.
Examples:
<title mode="escaped-html">Ben &amp; Jerry&apos;s, yumm! :-&gt;</title>
<content mode="escaped-html"> Here's some &lt;em&gt;important&lt;/em&gt; mathematics: 4 &gt; 3. </content>
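The two modes can be illustrated with a minimal Python sketch. It assumes un-namespaced elements and the standard xml.etree.ElementTree parser; the helper names content_mode and html_payload are hypothetical and not part of this proposal.

```python
import xml.etree.ElementTree as ET

# The only two values this proposal defines for the "mode" attribute.
KNOWN_MODES = ("xml", "escaped-html")


def content_mode(element):
    """Return the effective mode; a missing attribute is treated as "xml"."""
    mode = element.get("mode", "xml")
    if mode not in KNOWN_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return mode


def html_payload(element):
    """Return the string that would be handed to an HTML processor.

    After XML parsing, the character data of an escaped-html construct
    is the (unescaped) HTML markup itself.
    """
    if content_mode(element) != "escaped-html":
        raise ValueError("not an escaped-html construct")
    return element.text or ""


plain = ET.fromstring("<title>Ben &amp; Jerry's</title>")
escaped = ET.fromstring(
    '<content mode="escaped-html">Some &lt;em&gt;important&lt;/em&gt; text</content>'
)

print(content_mode(plain))
print(html_payload(escaped))
```

In "xml" mode a consumer would instead walk the element's children directly, since the markup is already part of the parsed tree.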
(2) Replace section 4.13.10 "atom:content" Element with the following:
-
The "atom:content" element is a Content construct that conveys the full body content of the entry. atom:entry elements MAY contain one but MUST NOT contain more than one atom:content element. An atom:content element MAY contain a "src" attribute, whose value is a URI. When "src" is present, it is a URI location of a resource that is displayed as the full body content. If a "src" attribute is used, the atom:content element MUST NOT contain element content. If a "src" attribute is used, the atom:content element MAY have a "type" attribute that indicates an advisory media type, as used in Link constructs.
Example:
<content src="kitty.jpg" type="image/jpeg" />
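The constraints on atom:content can be checked mechanically. The following Python sketch is one possible reading, assuming un-namespaced elements and interpreting "MUST NOT contain element content" as "the element must be empty"; check_content and check_entry are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def check_content(content):
    """Return a list of problems for one parsed <content> element."""
    problems = []
    has_src = content.get("src") is not None
    # Assumption: any child element or non-whitespace text counts as content.
    has_body = len(content) > 0 or bool((content.text or "").strip())
    if has_src and has_body:
        problems.append("src present but element is not empty")
    return problems


def check_entry(entry):
    """Check the 'at most one atom:content' rule, then each content element."""
    problems = []
    contents = entry.findall("content")
    if len(contents) > 1:
        problems.append("more than one content element")
    for content in contents:
        problems.extend(check_content(content))
    return problems


good = ET.fromstring('<entry><content src="kitty.jpg" type="image/jpeg" /></entry>')
bad = ET.fromstring(
    '<entry><content src="kitty.jpg">text</content><content>more</content></entry>'
)

print(check_entry(good))
print(check_entry(bad))
```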
(3) Add a new section 4.13.XX "atom:content-alternate" Element with the following:
-
The "atom:content-alternate" element is a Content construct that conveys an alternate representation of the full body content of the entry. atom:content-alternate elements may use the "src" and "type" attributes as defined for atom:content elements. atom:entry elements MAY contain one or more atom:content-alternate elements. When atom:content-alternate elements are used, an atom:content element, if present, should be considered the most faithful to the original content; otherwise, systems should choose the "best" type based on the local environment and preferences, in some cases even through user interaction. Ordering of atom:content-alternate elements MUST NOT be considered significant.
Example:
<content src="kitty.tiff" type="image/tiff"/>
<content-alternate src="kitty.jpg" type="image/jpeg"/>
<content-alternate>White calico with light and dark tans resting on a couch pillow.</content-alternate>
<content-alternate><surface-texture xmlns="http://example.com/surface/ns#">...</surface-texture></content-alternate>
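How a consumer chooses among representations is left to the local environment. The Python sketch below shows one possible policy, not a required algorithm: it assumes un-namespaced elements, a caller-supplied preference list, and a hypothetical "inline" label for representations carried as element content.

```python
import xml.etree.ElementTree as ET


def kind(element):
    """Classify a representation: advisory media type if by reference, else "inline"."""
    if element.get("src") is not None:
        return element.get("type", "unknown")
    return "inline"


def pick_representation(entry, renderable):
    """Pick a representation given an ordered list of kinds the consumer can show.

    atom:content is taken when it is renderable, since it is the most
    faithful; otherwise the best-ranked renderable alternate is used.
    """
    content = entry.find("content")
    if content is not None and kind(content) in renderable:
        return content
    candidates = [
        alt for alt in entry.findall("content-alternate") if kind(alt) in renderable
    ]
    if not candidates:
        return None
    # Rank by local preference only, never by document position alone.
    return min(candidates, key=lambda alt: renderable.index(kind(alt)))


entry = ET.fromstring(
    "<entry>"
    '<content src="kitty.tiff" type="image/tiff"/>'
    '<content-alternate src="kitty.jpg" type="image/jpeg"/>'
    "<content-alternate>White calico resting on a couch pillow.</content-alternate>"
    "</entry>"
)

chosen = pick_representation(entry, ["image/jpeg", "inline"])
print(chosen.get("src"))
```

Here a consumer that cannot display TIFF images falls back to the JPEG alternate; a text-only consumer passing ["inline"] would receive the textual description instead.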
(4) In each Content construct Atom element, specify whether it uses "inline", "paragraph", or "block" content.
-
atom:title -- inline
-
atom:tagline -- inline
-
atom:copyright -- inline
-
atom:info -- inline or paragraph (specify)
-
atom:summary -- paragraph or block (specify)
-
atom:content -- block
(5) Add new section, Content Profiles, and subsections as below:
-
XML Characters (plain text)
-
When element content of a Content construct contains only character markup (XML characters, character entity references, CDATA sections; but no start tags, end tags, or empty-element tags), it is treated as a whitespace-normalized string, as described in section 3.3.3 of the XML specification. Characters of the string MUST be rendered literally and MUST NOT be further interpreted as markup.
-
To use XHTML 1.0 content, the Content construct MUST contain only one child element and that element MUST be in the XHTML namespace 'http://www.w3.org/1999/xhtml'. Child elements of the XHTML element MAY be in other namespaces. XHTML content MUST consist of only block-level and inline elements that can appear in the "body" XHTML element; some Atom elements may further restrict content to inline or a single block element, such as a paragraph. As a good practice, XHTML content SHOULD be a "div" or "span" element that declares a default namespace. Consumers MAY strip or filter elements and attributes of XHTML content that are unsafe in their environments [guidance?]. Also: rendering issues, DOCTYPE, quirks, css, charsets, must ignore.
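The structural part of the XHTML rule (exactly one child element, in the XHTML namespace) can be tested as in the following Python sketch. The function name is_xhtml_content is hypothetical, and the check deliberately ignores the block-level and inline restrictions, which would need a fuller profile.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_xhtml_content(construct):
    """True if the construct has exactly one child element and it is XHTML.

    ElementTree reports qualified names as "{namespace}local", so the
    namespace test is a prefix check on the child's tag.
    """
    children = list(construct)
    return len(children) == 1 and children[0].tag.startswith("{" + XHTML_NS + "}")


ok = ET.fromstring(
    '<content><div xmlns="http://www.w3.org/1999/xhtml">Hi <em>there</em></div></content>'
)
two_roots = ET.fromstring(
    '<content xmlns:h="http://www.w3.org/1999/xhtml"><h:p>a</h:p><h:p>b</h:p></content>'
)

print(is_xhtml_content(ok))
print(is_xhtml_content(two_roots))
```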
-
Future specifications or general practice are expected to profile usage of XML qualified in other namespaces. As a good practice, XML content SHOULD consist of one namespace-qualified element. Consumers MUST be able to accept mixed-content, including content where XML characters precede any start-tags or empty-element tags or follow any empty-element tags or end-tags. Processing of character content that is outside of any namespaced element is undefined. User agents that encounter XML namespaces that are not renderable must display the document in such a way that it is obvious to the user that normal rendering has not taken place.
-
For backwards compatibility, Atom allows producers to pass HTML content that is potentially not well-formed or valid (according to XML or HTML recommendations). The Content construct mode attribute value "escaped-html" is used to indicate escaped HTML content. In escaped HTML content, the XML character content of the Content construct element is a string that contains characters that represent markup that should be passed to and interpreted by an HTML processor. Escaped HTML content, in practice, generally consists of characters, block-level, and inline elements that appear within a "body" HTML element; some Atom elements may further restrict content to inline or a single block element, such as a paragraph. Consumers MAY strip or filter elements and attributes of HTML content that are unsafe in their environments [guidance?].
Impacts
This proposal deprecates the @type attribute (it can be ignored by processors).
The @mode value of "base64" is dropped. The extent of the usage of this mode is unknown. The TypePad Atom implementation uses base64 entries for uploading photos, which this Pace presumes will be superseded by PaceSimpleResourcePosting / PaceNonEntryResources.
The @mode value of "escaped" is changed to "escaped-html" as a cue to users. During transition, consumers receiving "escaped" should treat it as "escaped-html".
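A consumer could apply this transition rule before any other processing. The sketch below is illustrative only; normalize_mode is a hypothetical helper, and how to handle unrecognized values such as "base64" is left to the caller.

```python
def normalize_mode(raw_mode):
    """Map a received @mode value onto the values defined by this proposal."""
    if raw_mode is None:
        return "xml"           # absent attribute defaults to "xml"
    if raw_mode == "escaped":
        return "escaped-html"  # legacy value accepted during transition
    return raw_mode            # "xml", "escaped-html", or an unknown value


print(normalize_mode("escaped"))
```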
Non-text content that previously may have been found in <content> (extent unknown) is moved to a <link> construct.
Extensibility
The current specification has several dimensions of extensibility (content type, mode of encoding, partial content, XML namespaces, multipart content) that contribute to its complexity.
This proposal reduces the extensibility to two areas:
-
The XML namespace of the atom:content element content. The Atom specification will provide a profile of XHTML that Atom implementations must support, while leaving open the ability to support other XML namespaces with additional profiles. Because XML namespaces do not require a central registrar, there is no need to register either the namespaces or the profiles. On the other hand, developers will benefit from having some technique (e.g. RDDL at the namespace URL) or central location from which to find profiles and processing guidelines for use of other XML namespace qualified content within Atom.
-
Alternate content uses the <link> element, Internet Media Types, and the transfer modes of the relevant href scheme (MIME or HTTP).
Notes
Changes from 15-Jul-2004:
-
Noted that some Atom elements, like atom:title or atom:summary, may restrict content to inline or a single block, like a paragraph.
Changes from 8-Jun-2004 (3-Jul-2004) (diff):
-
Reference updated Paces for non-entry resource uploading.
-
Replace link/@rel="content" with content/@src, which seems to be the favored solution for content-by-reference.
-
Separate <content> and <content-alternate> so that the most faithful representation can be indicated.
Changes from original version (diff):
-
added use-cases to Rationale
-
replaced fixed media types with a fixed selector of XML text or escaped HTML markup.
-
added specification on how to access alternate non-XML text content via <link>
-
added Extensibility section