Abstract
Make it clear that ids are mandatory, immutable URIs that are compared character by character, based primarily on the language defined in Namespaces in XML 1.1, with additional guidance inspired by RFC 2396bis.
Status
Open
Related paces
Rationale
Promote interoperability by making it possible to reliably identify and correlate whole feeds and individual entries, independent of when and where they are accessed.
Proposal
Replace sections 4.2.6 and 5.5, append a section to "3. Common Atom Constructs", and modify section 5.12:
3.5 Identification Constructs
An Identification construct is an element whose content conveys a permanent, universally unique identifier for the parent element of the construct. Its content MUST be a URI, subject to the following rules:
- It MUST be universally unique, which means that it must be unique both at the time of creation and in the future.
- It MUST NOT change over time, even if the parent feed or entry element is relocated, migrated, syndicated, republished, exported or imported.
- It MUST NOT be a relative URI (see "absoluteURI" in RFC 2396, section 3).
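Of these rules, only the "absolute URI" one can be checked from a single document; uniqueness and permanence depend on the publisher. The Python sketch below tests whether a candidate id starts with a scheme as defined in RFC 2396, section 3 (`scheme = alpha *( alpha | digit | "+" | "-" | "." )`) followed by ":". It is a minimal check that does not validate the rest of the URI, and `is_absolute_uri` is a hypothetical name.

```python
import re

# RFC 2396, section 3: scheme = alpha *( alpha | digit | "+" | "-" | "." )
# An absolute URI begins with a scheme followed by ":"; a relative reference does not.
_SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")


def is_absolute_uri(candidate: str) -> bool:
    """Return True if `candidate` begins with a syntactically valid scheme.

    Minimal check only: the remainder of the URI is not validated.
    """
    return _SCHEME_PREFIX.match(candidate) is not None


for value in ["http://example.org/2005/entry/1",
              "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
              "/2005/entry/1",
              "entry/1"]:
    print(value, is_absolute_uri(value))
```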
The URI scheme of an Identification construct MAY be a dereferenceable URL scheme (such as HTTP), but MUST NOT be expected to be one. That means that atom:id MUST NOT be expected to be dereferenceable; it is just an identifier.
If the identified resource is served dynamically, the content of its Identification construct MUST be created only once and then stored along with the resource. The content of an Identification construct MUST NOT be regenerated each time the resource is served.
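One way to meet this rule is to mint the identifier when the entry is first stored and to read it back on every later request instead of recomputing it. In the Python sketch below, the `EntryStore` class and its in-memory dictionary are hypothetical stand-ins for a real database, and `urn:uuid:` identifiers from `uuid.uuid4()` are just one possible choice of scheme.

```python
import uuid


class EntryStore:
    """Hypothetical in-memory store that mints each entry's id exactly once."""

    def __init__(self):
        self._entries = {}  # internal key -> {"id": ..., "title": ...}

    def create(self, key: str, title: str) -> str:
        if key in self._entries:
            raise KeyError(f"entry {key!r} already exists")
        # Mint the id once, at creation time, and persist it with the entry.
        entry_id = uuid.uuid4().urn  # "urn:uuid:" followed by a random UUID
        self._entries[key] = {"id": entry_id, "title": title}
        return entry_id

    def update_title(self, key: str, title: str) -> None:
        # Editing the entry never touches the stored id.
        self._entries[key]["title"] = title

    def render_id(self, key: str) -> str:
        # Serving the entry reads the stored id; it is never regenerated.
        return self._entries[key]["id"]
```

Editing an entry and serving it again yields the same id, because `render_id` only reads what `create` stored.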
URI references identifying entries and feeds are compared when determining whether an entry or feed is the same as one seen before. [Definition: The two URIs are treated as strings, and they are identical if and only if the strings are identical, that is, if they are the same sequence of characters. ] The comparison is case-sensitive, and no %-escaping is done or undone.
A consequence of this is that URI references which are not identical in this sense may resolve to the same resource. Examples include URI references which differ only in case or %-escaping. Note that relative URIs are not allowed as ids. Replacement of XML character and entity references must be done before any comparison.
Examples:
The URI references below are all different for the purposes of identifying entries, since they differ in case:
The URI references below are also all different for the purposes of identifying entries:
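The comparison rule reduces to plain string equality applied after XML parsing, since the parser is what replaces character and entity references. The Python sketch below illustrates this with hypothetical URIs of its own; `id_text` and `ids_equal` are names invented for the sketch.

```python
import xml.etree.ElementTree as ET


def id_text(xml_fragment: str) -> str:
    """Return the content of an id element.

    Parsing replaces XML character and entity references (such as &amp; or &#x7E;),
    so the comparison below always operates on the replaced text.
    """
    return ET.fromstring(xml_fragment).text or ""


def ids_equal(a: str, b: str) -> bool:
    # Character by character and case-sensitive; no %-escaping is done or undone.
    return a == b


# Same characters once references are replaced: identical ids.
print(ids_equal(id_text("<id>http://example.org/?a=1&amp;b=2</id>"),
                id_text("<id>http://example.org/?a=1&#38;b=2</id>")))

# Differ only in case or %-escaping: different ids, even though they
# may resolve to the same resource.
print(ids_equal("http://example.org/entry", "http://Example.org/entry"))
print(ids_equal("http://example.org/~joe", "http://example.org/%7Ejoe"))
```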
Because of the risk of confusion between URIs that would be equivalent if dereferenced, the following normalization rules are strongly encouraged when generating new ids:
- Always provide the scheme in lowercase characters.
- Always provide the host, if any, in lowercase characters.
- Only perform percent-encoding where it is essential.
- Always use uppercase A-through-F characters when percent-encoding.
- Prevent dot-segments from appearing in paths.
- For schemes that define a default authority, use an empty authority if the default is desired.
- For schemes that define an empty path to be equivalent to a path of "/", use "/".
- For schemes that define a port, use an empty port if the default is desired.
- Empty fragment identifiers and queries must be preserved.
- All portions of the URI must be UTF-8 encoded from Unicode strings in Normalization Form C (NFC).
4.2.6 "atom:id" Element
"atom:id" is an Identification construct that conveys a permanent, universally unique identifier for a feed. atom:head elements MUST contain an atom:id element, but MUST NOT contain more than one.
5.5 "atom:id" Element
"atom:id" is an Identification construct that conveys a permanent, universally unique identifier for an entry. atom:entry MUST contain exactly one atom:id element.
As defined in "3.5 Identification Constructs", atom:id MUST NOT change over time, even if other representations of the entry (such as a web representation pointed to by the entry's atom:link element) are relocated. For a given entry, the atom:id element's content MUST be stable across all Atom documents published by the same entity. This means that if the entry is represented in many different feeds simultaneously, the atom:id of these entries MUST be the same. Also, if the entry is relocated, migrated, syndicated, republished, exported or imported, the atom:id MUST NOT change.
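Because an entry carries the same atom:id in every feed that contains it, a consumer can correlate copies of the entry across feeds by keying on that id alone, without comparing titles, links, or content. The Python sketch below does this for several feed documents and also enforces the exactly-one-atom:id rule for entries. `ATOM_NS` is a placeholder namespace (substitute the draft's own), and `merge_entries` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # placeholder namespace for this sketch
NS = {"atom": ATOM_NS}


def merge_entries(*feed_docs: str) -> dict:
    """Map each entry's atom:id to the ids of the feeds that contain that entry."""
    seen = {}
    for doc in feed_docs:
        root = ET.fromstring(doc)  # the root element is atom:feed
        source = root.findtext("atom:head/atom:id", default="", namespaces=NS)
        for entry in root.findall("atom:entry", NS):
            ids = entry.findall("atom:id", NS)
            if len(ids) != 1:
                raise ValueError("atom:entry must contain exactly one atom:id")
            # Plain string key: ids are compared character by character.
            seen.setdefault(ids[0].text or "", []).append(source)
    return seen
```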
5.12 "atom:origin" Element
The "atom:origin" element's content conveys the original source of the entry; e.g., the feed where the entry was first published.
If the source is an Atom Feed Document, then the content of atom:origin MUST be the same, character-for-character, as that of the atom:id element in that document's atom:head section (i.e., the XPath expression "/atom:feed/atom:head/atom:id").
The content of this element MUST be a URI. atom:entry elements MAY contain an atom:origin element, but MUST NOT contain more than one.
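The character-for-character requirement can be checked by reading the source feed's head id and comparing it to atom:origin as a plain string. In the Python sketch below, `ATOM_NS` is a placeholder namespace and `origin_matches` is a hypothetical helper; because the parsed root is atom:feed, the relative path `atom:head/atom:id` selects the same element as the XPath expression "/atom:feed/atom:head/atom:id".

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # placeholder namespace for this sketch
NS = {"atom": ATOM_NS}


def origin_matches(entry_xml: str, source_feed_xml: str) -> bool:
    """Return True if the entry's atom:origin equals the source feed's head id.

    Returns False if the entry has no atom:origin; raises ValueError if it has
    more than one.
    """
    entry = ET.fromstring(entry_xml)
    origins = entry.findall("atom:origin", NS)
    if len(origins) > 1:
        raise ValueError("atom:entry must not contain more than one atom:origin")
    if not origins:
        return False
    feed = ET.fromstring(source_feed_xml)
    head_id = feed.findtext("atom:head/atom:id", namespaces=NS)
    # Exact string comparison: no case folding and no %-escape handling.
    return head_id is not None and origins[0].text == head_id
```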
Impacts
The biggest change is to make it clear that atom:id elements are to remain unchanged even when relocated, migrated, syndicated, republished, exported or imported.
Additionally, atom:ids at the feed level are mandatory, and canonicalization is recommended but not required.