Source Citation Feed Validator
1.
Introduction

This profile is a set of recommendations for how to create RSS documents that work best in the wide and diverse audience of client software that supports the format. The definitions of the RSS elements in this profile are provided for convenience and must not be treated as definitive. Refer to the specification for authoritative guidance on the format.

Boilerplate; ignore.
1.
Introduction

An RSS document, also called a feed, must conform to the XML 1.0 specification and may contain elements and attributes defined in a namespace according to the Namespaces in XML specification. RSS elements do not belong to a namespace. All elements in an RSS feed that are not defined in a namespace must be described in the specification. None of the restrictions described in the specification apply to elements or attributes defined in a namespace.

foo is in an invalid namespace

XML Parsing error: syntax error

2.
Conventions

In this document, the key words may, must, must not, optional, recommended, required, shall, shall not, should and should not are to be interpreted as described in RFC 2119.

Boilerplate; ignore.
3.1
Character Data

For all elements defined in the RSS specification that enclose character data, the text should be interpreted as plain text with the exception of an item's description element, which must be suitable for presentation as HTML. All of these elements must not contain child elements.

Undefined foo element: bar
3.1
Character Data

A publisher should encode "&" and "<" in plain text using hexadecimal character references. When encoding the ">" character, a publisher should use the hexadecimal reference &#x3E;.

Encode "&" and "<" in plain text using hexadecimal character references.

Test cases.
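
The encoding recommendation in section 3.1 can be illustrated with a minimal Python sketch. The function name `encode_plain_text` is hypothetical and is not part of the profile or the validator; the sketch assumes the input is raw, not yet encoded, plain text.

```python
def encode_plain_text(text: str) -> str:
    """Replace "&", "<" and ">" with hexadecimal character references."""
    # "&" is replaced first so that the references produced for "<" and ">"
    # are not themselves re-encoded.
    return (
        text.replace("&", "&#x26;")
            .replace("<", "&#x3C;")
            .replace(">", "&#x3E;")
    )


print(encode_plain_text("AT&T <rocks>"))  # AT&#x26;T &#x3C;rocks&#x3E;
```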

3.2
Dates and Times

All date-time values must conform to the RFC 822 Date and Time Specification with the exception that a four-digit year is permitted in addition to a two-digit year.

foo must be an RFC-822 date-time
3.2
Dates and Times

All date-time values should use a four-digit year.

Problematical RFC 822 date-time value
3.2
Dates and Times

Although RFC 822 permits multiple spaces and comments between each component in date-time values, most aggregators fail to interpret them correctly. Publishers should not include comments or more than one space between components.

Problematical RFC 822 date-time value
3.2
Dates and Times

With the exception of "Z", the military time zones in RFC 822 are specified incorrectly and should not be used.

Problematical RFC 822 date-time value
3.2
Dates and Times

Each of these values employs Universal Time. The weekday, month and timezone should be capitalized as shown and the leading zero in the day of the month may be omitted.

Problematical RFC 822 date-time value
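
Taken together, the section 3.2 recommendations (four-digit year, single spaces, no comments, no military zones other than "Z", capitalization as shown) can be approximated with one pattern. The following Python sketch is an assumption about how such a check might look, not the validator's actual implementation; it tests syntax only and does not verify that the day, hour or minute values are in range.

```python
import re

# Pattern for the recommended subset of RFC 822 date-time syntax.
RECOMMENDED_DATE = re.compile(
    r"(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"      # optional weekday
    r"\d{1,2} "                                    # day, leading zero optional
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"\d{4} "                                      # four-digit year only
    r"\d{2}:\d{2}(?::\d{2})? "                     # hh:mm or hh:mm:ss
    r"(?:UT|GMT|Z|[ECMP][SD]T|[+-]\d{4})"          # "Z" is the only military zone
)


def follows_date_recommendations(value: str) -> bool:
    """Return True if value matches the recommended date-time form."""
    # The match is case-sensitive and allows exactly one space between parts.
    return RECOMMENDED_DATE.fullmatch(value) is not None
```

Because the pattern is anchored with `fullmatch`, a comment or a doubled space anywhere in the value causes the check to fail.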
3.3
E-mail Addresses

Several elements must contain an e-mail address, but there's no requirement to follow a specific format for such addresses. Publishers could format addresses according to the RFC 2822 Address Specification, the RFC 2368 guidelines for mailto links, or some other scheme.

foo must be an email address

Need to add support for RFC 2368.

Test cases.

3.3
E-mail Addresses

The recommended format for e-mail addresses in RSS elements is username@hostname.tld (Real Name), as in the following example:

New requirement.

Test cases.
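
A check for the recommended address format might look like the Python sketch below. The pattern is deliberately loose and is an assumption, not a full RFC 2822 parser; the address and name used in the example are hypothetical.

```python
import re

# username@hostname.tld, one space, then the real name in parentheses.
RECOMMENDED_ADDRESS = re.compile(
    r"(?P<address>[^\s@()]+@[^\s@()]+\.[^\s@()]+) \((?P<name>[^()]+)\)"
)


def split_recommended_address(value: str):
    """Return (address, real name), or None if the format is not followed."""
    match = RECOMMENDED_ADDRESS.fullmatch(value.strip())
    return (match["address"], match["name"]) if match else None


print(split_recommended_address("jdoe@example.com (Jane Doe)"))
```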

3.4
URLs

In all link and url elements, the first non-whitespace characters in a URL must begin with a scheme defined by the IANA Registry of URI Schemes such as "ftp://", "http://", "https://", "mailto:" or "news://". These elements must not contain relative URLs.

foo must be a full URI
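
As a rough illustration of the scheme requirement, the Python sketch below accepts a value only if it starts with one of the five schemes named in the profile text. The allowlist is an assumption made to keep the example short; the IANA registry contains many more schemes, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Only the schemes quoted in the profile; the IANA registry is much larger.
EXAMPLE_SCHEMES = {"ftp", "http", "https", "mailto", "news"}


def has_known_scheme(value: str) -> bool:
    """Return True if the first non-whitespace characters name a listed scheme."""
    # urlsplit lower-cases the scheme; relative URLs yield an empty scheme.
    return urlsplit(value.strip()).scheme in EXAMPLE_SCHEMES
```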
3.4
URLs

Because an aggregator may choose which URI schemes to support, publishers of RSS documents must not assume that all schemes are available.

Untestable.
3.4
URLs

An Internationalized Resource Identifier (IRI) provides a means to identify Internet resources using non-ASCII characters that can't be present in URLs. All link and url elements must be valid URLs, so an IRI that contains non-ASCII characters must be converted to a URL using the procedure described in RFC 3987.

IRI found where URL expected
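
The conversion can be sketched in Python as follows. This is a simplified approximation of the RFC 3987 mapping, not a complete implementation: it assumes Python's built-in `idna` codec is acceptable for the host name, percent-encodes the UTF-8 bytes of non-ASCII characters elsewhere, and drops any user information in the authority. The function name is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URI reserved characters plus "%", so escapes
# already present in the input are not encoded a second time.
SAFE = "!#$%&'()*+,/:;=?@[]~"


def iri_to_url(iri: str) -> str:
    """Convert an IRI to an ASCII-only URL (simplified sketch)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    if host:
        # Non-ASCII host labels become their ASCII-compatible form.
        host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))


print(iri_to_url("http://example.com/caf\u00e9"))  # http://example.com/caf%C3%A9
```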
4.1
rss

The rss element is the top-level element of an RSS feed. A feed that conforms to the RSS specification must contain a version attribute with the value "2.0".

Should warnings be issued when version="0.91" or version="0.92" is encountered?
4.1
rss

This element is required and must contain a channel element. The rss element must not contain more than one channel.

Missing foo element: bar
4.1.1
channel

This element is required and must contain three child elements: description, link and title.

Missing foo element: bar
4.1.1
channel

The channel may contain each of the following optional elements: category, cloud, copyright, docs, generator, image, language, lastBuildDate, managingEditor, pubDate, rating, skipDays, skipHours, textInput, ttl and webMaster.

Supported.
4.1.1
channel

The preceding elements must not be present more than once in a channel, with the exception of category.

foo contains more than one bar
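
The two channel rules above (three required children, no repeats except category) lend themselves to a short check. The Python sketch below uses the standard library's ElementTree; the function name and message wording are assumptions modelled on the validator messages quoted in this table, not the validator's own code. Elements in a namespace are skipped because the restrictions do not apply to them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REQUIRED = ("description", "link", "title")
REPEATABLE = {"category", "item"}


def check_channel(feed_xml: str) -> list[str]:
    """Return a list of problems found in the feed's channel element."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return ["Missing rss element: channel"]
    counts = Counter(child.tag for child in channel)
    problems = [
        f"Missing channel element: {name}"
        for name in REQUIRED if counts[name] == 0
    ]
    for tag, count in counts.items():
        # ElementTree writes namespaced tags as "{uri}name"; skip those.
        if count > 1 and tag not in REPEATABLE and not tag.startswith("{"):
            problems.append(f"channel contains more than one {tag}")
    return problems
```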
4.1.1
channel

The channel also may contain zero or more item elements. The order of elements within the channel must not be treated as significant.

Supported.
4.1.1
channel

All item elements should appear after all of the other elements in a channel.

New requirement.

Test cases.

4.1.1.1
description

The description element holds character data that provides a human-readable characterization or summary of the feed (required).

Missing foo element: bar
4.1.1.2
link

The link element identifies the URL of the web site associated with the feed (required).

Missing foo element: bar
4.1.1.3
title

The title element holds character data that provides the name of the feed (required).

Missing foo element: bar
4.1.1.3
title

If the feed corresponds directly to a web site, the name should match the name of the site.

Untestable.
4.1.1.4
category

The category element identifies a category or tag to which the feed belongs (optional).

Supported.
4.1.1.4
category

This element may include a domain attribute that identifies the taxonomy in which the category is placed.

Supported.
4.1.1.4
category

A channel may contain more than one category element.

Supported.
4.1.1.4
category

The category's value should be a slash-delimited string that identifies a hierarchical position in the taxonomy.

If read one way, this statement implies that all category values should have a slash in them. If read any other way, this statement is meaningless.
4.1.1.5
cloud

The cloud element indicates that updates to the feed can be monitored using a web service that implements the RssCloud application programming interface (optional).

Supported.
4.1.1.5
cloud

The element must have five attributes that describe the service:

Supported.
4.1.1.5
cloud

The protocol attribute must contain the value "xml-rpc" if the service employs XML-RPC or "soap" if it employs SOAP.

Supported.
    4.1.1.6
    copyright

    The copyright element declares the human-readable copyright statement that applies to the feed (optional).

    Supported.
    4.1.1.6
    copyright

    When a feed lacks a copyright element, aggregators should not assume that it is in the public domain and can be republished and redistributed without restriction. Under the Berne Convention adopted by the United States and more than 150 other countries, a work does not require a copyright statement to be protected by copyright.

    N/A
    4.1.1.7
    docs

    The docs element identifies the URL of the RSS specification implemented by the software that created the feed (optional).

    Supported.
    4.1.1.7
    docs

    If you are relying on the specification and profile published by the RSS Advisory Board, the value of this element should be "http://www.rssboard.org/rss-specification".

    Should the feed validator recommend this?
    4.1.1.8
    generator

    The generator element credits the software that created the feed (optional).

    Supported.
    4.1.1.9
    image

    The image element supplies a graphical logo for the feed (optional).

    Supported.
    4.1.1.9
    image

    The image must contain three child elements: link, title and url. It also may contain three optional elements: description, height and width.

    Missing foo element: bar
    4.1.1.9.1
    link

    The image's link element identifies the URL of the web site represented by the image (required).

    Missing foo element: bar
    4.1.1.9.1
    link

    This should be the same URL as the channel's link element.

    Image link doesn't match channel link
    4.1.1.9.2
    title

    The image's title element holds character data that provides a human-readable description of the image (required).

    Missing foo element: bar
    4.1.1.9.2
    title

    This element should have the same text as the channel's title element and be suitable for use as the alt attribute of the img tag in an HTML rendering.

    New requirement.

    Test cases.

    4.1.1.9.3
    url

    The image's url element identifies the URL of the image, which must be in the GIF, JPEG or PNG formats (required).

    Untestable.
    4.1.1.9.4
    description

    The image's description element holds character data that provides a human-readable characterization of the site linked to the image (optional).

    Supported.
    4.1.1.9.4
    description

    The description should be suitable for use as the title attribute of the a tag in an HTML rendering.

    Untestable.
    4.1.1.9.5
    height

    The image's height element contains the height, in pixels, of the image (optional). The image must be no taller than 400 pixels. If this element is omitted, the image is assumed to be 31 pixels tall.

    foo must be between 1 and 400
    4.1.1.9.6
    width

    The image's width element contains the width, in pixels, of the image (optional). The image must be no wider than 144 pixels. If this element is omitted, the image is assumed to be 88 pixels wide.

    foo must be between 1 and 144
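
The height and width rules share the same shape: an optional value, an assumed default and an upper bound. A minimal Python sketch, with a hypothetical helper name, could read:

```python
DEFAULT_HEIGHT, MAX_HEIGHT = 31, 400
DEFAULT_WIDTH, MAX_WIDTH = 88, 144


def resolve_dimension(text, default: int, maximum: int) -> int:
    """Return the image dimension in pixels, applying the default if omitted."""
    if text is None:
        return default  # element omitted: the profile's assumed size applies
    value = int(text.strip())
    if not 1 <= value <= maximum:
        raise ValueError(f"must be between 1 and {maximum}")
    return value


print(resolve_dimension(None, DEFAULT_HEIGHT, MAX_HEIGHT))  # 31
```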
    4.1.1.10
    language

    The channel's language element identifies the natural language employed in the feed (optional).

    Supported.
    4.1.1.10
    language

    The language must be identified using one of the RSS language codes or a language code permitted by the World Wide Web Consortium for use in HTML. The U.S. Library of Congress publishes the current list of ISO 639 language codes adopted by HTML.

    foo must be an ISO-639 language code
    4.1.1.11
    lastBuildDate

    The channel's lastBuildDate element indicates the last date and time the content of the feed was updated (optional).

    Supported.
    4.1.1.12
    managingEditor

    The channel's managingEditor element provides the e-mail address of the person to contact regarding the editorial content of the feed (optional).

    Supported.
    4.1.1.13
    pubDate

    The channel's pubDate element indicates the publication date and time of the feed's content (optional). Publishers of daily, weekly or monthly periodicals could use this element to associate feed items with the date they most recently went to press.

    Supported.
    4.1.1.14
    rating

    The channel's rating element supplies an advisory label for the content in a feed, formatted according to the specification for the Platform for Internet Content Selection (PICS) (optional).

    Supported.
    4.1.1.15
    skipDays

    The channel's skipDays element identifies days of the week during which the feed is not updated (optional). This element contains up to seven day elements identifying the days to skip.

    Supported.
    4.1.1.15
    skipDays

    Although an aggregator should not request the feed on the days identified by this element, the point is largely moot because of how infrequently it is used by publishers. Fewer than one percent of surveyed feeds included a skipDays element.

    N/A
    4.1.1.15.1
    day

    The day element identifies a weekday in Greenwich Mean Time (GMT) (required). Seven values are permitted -- "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" or "Sunday" -- and must not be duplicated.

    Incorrect day of week: foo

    foo values must not be duplicated within a feed

    4.1.1.16
    skipHours

    The channel's skipHours element identifies the hours of the day during which the feed is not updated (optional). This element contains individual hour elements identifying the hours to skip.

    Supported.
    4.1.1.16
    skipHours

    An aggregator should not request the feed on the hours identified by this element.

    N/A
    4.1.1.16.1
    hour

    The hour element identifies an hour of the day in Greenwich Mean Time (GMT) (required). The hour must be expressed as an integer representing the number of hours since 00:00:00 GMT. Values from 0 to 23 are permitted, with 0 representing midnight. An hour must not be duplicated.

    foo must be an integer between 0 and 23

    foo values must not be duplicated within a feed

    New: Use zero for midnight

    4.1.1.16.1
    hour

    RSS specifications differ in the number assigned to midnight, which is 0 in the current RSS specification and 24 in RSS 0.91. For this reason, aggregators should accept both 0 and 24 to represent midnight.

    Issue a warning or an error on 24?
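
On the aggregator side, accepting both 0 and 24 for midnight could be handled by normalizing the values before use. The Python sketch below is one possible approach under that assumption; the function name is hypothetical, and treating 0 and 24 in the same feed as a duplicate is a design choice of this sketch, not something the profile states.

```python
def normalize_skip_hours(values) -> list[int]:
    """Convert hour element texts to integers 0-23, mapping 24 to 0."""
    hours = []
    for text in values:
        hour = int(text)
        if not 0 <= hour <= 24:
            raise ValueError(f"hour out of range: {text}")
        hour %= 24  # 24 (midnight in RSS 0.91) becomes 0
        if hour in hours:
            raise ValueError(f"hour duplicated: {text}")
        hours.append(hour)
    return hours


print(normalize_skip_hours(["24", "1", "23"]))  # [0, 1, 23]
```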
    4.1.1.17
    textInput

    The textInput element defines a form to submit a text query to the feed's publisher over the Common Gateway Interface (CGI) (optional).

    Supported.
    4.1.1.17
    textInput

    The element must contain a description, link, name and title child element.

    Missing foo element: bar
    4.1.1.17
    textInput

    For this reason, publishers should not expect it to be supported in most aggregators.

    New requirement.

    Test cases.

    4.1.1.17.1
    description

    The input form's description element holds character data that provides a human-readable label explaining the form's purpose (required).

    Missing foo element: bar
    4.1.1.17.2
    link

    The input form's link element identifies the URL of the CGI script that handles the query (required).

    Missing foo element: bar
    4.1.1.17.3
    name

    The input form's name element provides the name of the form component that contains the query (required). The name must begin with a letter and contain only these characters: the letters A to Z in either case, numeric digits, colons (":"), hyphens ("-"), periods (".") and underscores ("_").

    Missing foo element: bar

    Invalid form component name
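
The naming rule for the form component translates directly into a regular expression. A minimal Python sketch, with a hypothetical function name:

```python
import re

# A letter first, then letters, digits, colons, periods, underscores, hyphens.
FORM_NAME = re.compile(r"[A-Za-z][A-Za-z0-9:._-]*")


def is_valid_form_name(name: str) -> bool:
    """Return True if name satisfies the textInput name rule."""
    return FORM_NAME.fullmatch(name) is not None
```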

    4.1.1.17.4
    title

    The input form's title element labels the button used to submit the query (required).

    Missing foo element: bar
    4.1.1.18
    ttl

    The channel's ttl element represents the feed's time to live (TTL): the maximum number of minutes to cache the data before an aggregator requests it again (optional).

    Supported.
    4.1.1.18
    ttl

    Because of these differences, aggregators that support this element should treat it as a publisher's suggestion of a feed's update frequency, not a hard rule. For instance, an aggregator that gives users the ability to choose how often to check a feed could use its TTL as the default value.

    N/A.
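
One way an aggregator might treat the TTL as a suggestion rather than a rule is sketched below in Python. The function name, the precedence given to the user's choice and the 60-minute fallback are all assumptions made for illustration.

```python
def refresh_interval(ttl_text, user_choice=None, fallback=60) -> int:
    """Return the number of minutes to wait before requesting the feed again."""
    if user_choice is not None:
        return user_choice  # an explicit user setting overrides the feed
    try:
        ttl = int(ttl_text)
    except (TypeError, ValueError):
        return fallback  # ttl missing or not an integer
    # A positive TTL only supplies the default value.
    return ttl if ttl > 0 else fallback
```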
    4.1.1.19
    webMaster

    The channel's webMaster element provides the e-mail address of the person to contact about technical issues regarding the feed (optional).

    Supported.
    4.1.1.20
    item

    An item element represents distinct content published in the feed such as a news article, weblog entry or some other form of discrete update. A channel may contain any number of items (or no items at all).

    Supported.
    4.1.1.20
    item

    An item may contain the following child elements: author, category, comments, description, enclosure, guid, link, pubDate, source and title. All of these elements are optional but an item must contain either a title or description.

    item must contain either title or description
    4.1.1.20
    item

    The preceding elements must not be present more than once in an item, with the exception of category.

    foo contains more than one bar
    4.1.1.20.1
    author

    An item's author element provides the e-mail address of the person who wrote the item (optional).

    Supported.
    4.1.1.20.1
    author

    A feed published by an individual should omit this element and use the managingEditor or webMaster channel elements to provide contact information.

    Untestable.
    4.1.1.20.2
    category

    An item's category element identifies a category or tag to which the item belongs (optional).

    Supported.
    4.1.1.20.2
    category

    This element may include a domain attribute that identifies the category's taxonomy.

    Supported.
    4.1.1.20.2
    category

    An item may contain more than one category element.

    Supported.
    4.1.1.20.2
    category

    The category's value should be a slash-delimited string that identifies a hierarchical position in the taxonomy.

    Same issue as in section 4.1.1.4
    4.1.1.20.3
    comments

    An item's comments element identifies the URL of a web page that contains comments received in response to the item (optional).

    Supported.
    4.1.1.20.4
    description

    An item's description element holds character data that contains the item's full content or a summary of its contents, a decision entirely at the discretion of the publisher. This element is optional if the item contains a title element.

    item must contain either title or description
    4.1.1.20.4
    description

    The description must be suitable for presentation as HTML. HTML markup must be encoded as character data either by employing the HTML entities &lt; ("<") and &gt; (">") or a CDATA section.

    Invalid HTML: explanation
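
Both encoding options for HTML markup can be sketched in Python with the standard library. The function names are hypothetical; the CDATA variant assumes that any "]]>" sequence in the markup has to be split across two sections so that it does not end the section early.

```python
from html import escape


def encode_with_entities(markup: str) -> str:
    """Encode markup as character data using &lt;, &gt; and &amp;."""
    return escape(markup, quote=False)


def encode_with_cdata(markup: str) -> str:
    """Wrap markup in a CDATA section, splitting any embedded "]]>"."""
    return "<![CDATA[" + markup.replace("]]>", "]]]]><![CDATA[>") + "]]>"


print(encode_with_entities("<p>Tom & Jerry</p>"))
# &lt;p&gt;Tom &amp; Jerry&lt;/p&gt;
```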
    4.1.1.20.4
    description

    The description should not contain relative URLs, because the RSS format does not provide a means to identify the base URL of a document. When a relative URL is present, an aggregator may attempt to resolve it to a full URL using the channel's link as the base.

    foo should not contain relative URL references
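
An aggregator that chooses to resolve relative URLs against the channel's link could do so with `urllib.parse.urljoin`. The Python sketch below uses a hypothetical function name and example URLs.

```python
from urllib.parse import urljoin


def resolve_reference(channel_link: str, reference: str) -> str:
    """Resolve a URL found in a description against the channel's link."""
    # Full URLs pass through unchanged; relative ones are joined to the base.
    return urljoin(channel_link, reference)


print(resolve_reference("http://example.com/news/", "images/logo.png"))
# http://example.com/news/images/logo.png
```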
    4.1.1.20.5
    enclosure

    An item's enclosure element associates a media object such as an audio or video file with the item (optional). The element must have three attributes:

    Supported.
    4.1.1.20.5
    enclosure

    For best support in the widest number of aggregators, an item should not contain more than one enclosure.

    item contains more than one enclosure
    4.1.1.20.5
    enclosure

    Though an enclosure must specify its size with the length attribute, the size of some media objects cannot be determined by an RSS publisher. Examples include the streaming media formats RealAudio and Apple QuickTime.

    Missing foo attribute: bar
    4.1.1.20.5
    enclosure

    When an enclosure's size cannot be determined, a publisher should use a length of 0.

    foo must be a non-negative integer
    4.1.1.20.5
    enclosure

    When an enclosure is delivered in a multi-step process like the one used by BitTorrent, the length should be the size of the first file that must be downloaded to begin the process.

    Untestable.
    4.1.1.20.6
    guid

    An item's guid element provides a string that uniquely identifies the item (optional). The guid may include an isPermaLink attribute.

    Supported.
    4.1.1.20.6
    guid

    The guid enables an aggregator to detect when an item has been received previously and does not need to be presented to a user again. If the guid's isPermaLink attribute is omitted or has the value "true", the guid must be the permanent URL of the web page associated with the item.

    guid must be a full URL, unless isPermaLink attribute is false
    4.1.1.20.6
    guid

    If the guid's isPermaLink attribute has the value "false", the guid may employ any syntax the feed's publisher has devised for ensuring the uniqueness of the string, such as the Tag URI scheme described in RFC 4151.

    Supported.
    4.1.1.20.6
    guid

    A publisher should provide a guid with each item.

    item should contain a guid element
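
The guid rules in this section can be combined into one check, sketched here in Python. The function name is hypothetical, the messages are borrowed from the validator messages quoted above, and "full URL" is approximated as "has both a scheme and a host", which is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def check_guid(item_xml: str) -> list[str]:
    """Return the guid-related problems found in a single item element."""
    guid = ET.fromstring(item_xml).find("guid")
    if guid is None:
        return ["item should contain a guid element"]
    # An omitted isPermaLink attribute is treated the same as "true".
    if guid.get("isPermaLink", "true") == "true":
        parts = urlsplit((guid.text or "").strip())
        if not (parts.scheme and parts.netloc):
            return ["guid must be a full URL, unless isPermaLink attribute is false"]
    return []
```

With this check, a Tag URI such as `tag:example.com,2006:1` is accepted only when isPermaLink is "false".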
    4.1.1.20.7
    link

    An item's link element identifies the URL of a web page associated with the item (optional).

    Supported.
    4.1.1.20.8
    pubDate

    An item's pubDate element indicates the publication date and time of the item (optional).

    Supported.
    4.1.1.20.8
    pubDate

    The specification recommends that aggregators should ignore items with a publication date that occurs in the future, providing a means for publishers to embargo an item until that date.

    N/A
    4.1.1.20.8
    pubDate

    None of the tested aggregators withheld an item with a future publication date from readers. For this reason, publishers should not include items in a feed until they are ready for publication.

    Implausible date: foo
    4.1.1.20.9
    source

    An item's source element indicates the fact that the item has been republished from another RSS feed (optional). The element must have a url attribute that identifies the URL of the source feed.

    Missing foo attribute: bar
    4.1.1.20.10
    title

    An item's title element holds character data that provides the item's headline. This element is optional if the item contains a description element.

    item must contain either title or description
    5.
    Namespace Elements

    This section of the profile contains recommendations for how to handle these situations. This must not be considered definitive in regard to namespace elements, which are defined by their authors.

    Boilerplate; ignore.
    5.1.1
    atom:link

    The atom:link element defines a relationship between a web resource (such as a page) and an RSS channel or item (optional). The most common use is to identify an HTML representation of an entry in an RSS or Atom feed.

    Supported.
    5.1.1
    atom:link

    The element must have an href attribute that contains the URL of the related resource and may contain the following attributes:

    Missing foo attribute: bar
    5.1.1
    atom:link

    The element also may contain a rel attribute, which contains a keyword that identifies the nature of the relationship between the linked resource and the element. Five relationships are possible:

    Supported.
    5.1.1
    atom:link

    An RSS feed can identify its own URL using the atom:link element within a channel. The link must have the rel attribute "self", an href attribute containing the feed's URL and may have a type attribute of "application/rss+xml":

    Supported.
    5.1.1
    atom:link

    There's no means to do this with RSS elements defined in the specification. Identifying a feed's URL within the feed makes it more portable, self-contained, and easier to cache. For these reasons, a feed should contain an atom:link used for this purpose.

    New requirement.  May raise a few eyebrows.

    Test cases.
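
A publisher generating feeds programmatically might add the self-identifying atom:link as in the Python sketch below. The helper name and the feed URL are hypothetical; the namespace URI is the one defined for Atom.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Ask ElementTree to serialize the namespace with the conventional prefix.
ET.register_namespace("atom", ATOM_NS)


def add_self_link(channel: ET.Element, feed_url: str) -> ET.Element:
    """Append an atom:link that identifies the feed's own URL."""
    return ET.SubElement(
        channel,
        f"{{{ATOM_NS}}}link",
        {"href": feed_url, "rel": "self", "type": "application/rss+xml"},
    )


rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
add_self_link(channel, "http://example.com/feed.xml")
print(ET.tostring(rss, encoding="unicode"))
```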

    5.1.1
    atom:link

    When a namespace element duplicates the functionality of an element defined in RSS, the core element should be used.

    This requires an enumeration. I've seen too many elements that were initially decried loudly as funky that later proved to be very useful. atom:link, dc:creator, and content:encoded are all examples. To help seed this discussion, is it the intent of this requirement to make the following suggested replacements?

    • admin:generatorAgent => generator
    • dc:date => pubDate
    • dc:language => language
    • dc:publisher => webMaster
    • dc:rights => copyright
    • dc:source => source
    • dc:subject => category
    • dcterms:modified => lastBuildDate

    Also, why is this requirement in the atom:link section?

    5.2.1
    content:encoded

    The content:encoded element defines the full content of an item (optional). This element has a more precise purpose than the description element, which can be the full content, a summary or some other form of excerpt at the publisher's discretion.

    Supported.
    5.2.1
    content:encoded

    The content must be suitable for presentation as HTML and be encoded as character data in the same manner as the description element.

    Invalid HTML: explanation
    5.2.1
    content:encoded

    Publishers who don't want to employ item summaries in their feeds should use the description element for an item's full content rather than content:encoded because it has the widest support.

    New requirement.

    Test cases.

    5.2.1
    content:encoded

    Publishers who employ summaries should store the summary in description and the full content in content:encoded, ordering description first within the item. On items with no summary, the full content should be stored in description.

    New requirement.

    Test cases.

    5.3.1
    dc:creator

    The dc:creator element identifies the person or entity who wrote an item (optional). An item may contain more than one dc:creator element to credit multiple authors.

    Supported.
    5.3.1
    dc:creator

    Publishers should use author when they want to reveal an author's e-mail address and dc:creator when they don't. The same item should not include both elements.

    An item should not include both foo and bar
    5.3.1
    dc:creator

    This same recommendation should be followed for the use of dc:creator with the channel elements managingEditor and webMaster.

    A channel should not include both foo and bar
    5.4.1
    slash:comments

    The slash:comments element contains a non-negative integer that counts the number of comments that an item has received (optional).

    Supported.
    5.4.1
    slash:comments

    On an active web site, comment counts change frequently as new comments are published, so by necessity this element contains a snapshot of the totals at a particular moment in time. Because the Slash namespace lacks an element to indicate when the comment counts were compiled, publishers who use this element also should include a lastBuildDate element.

    New requirement.

    Test cases.