
PaceRecommendPlainTextContent


Abstract

Text containers, and content blocks in particular, may contain rich text, which more basic aggregators must reduce to plain text. However, simply removing the tags from an X/HTML stream can easily strip away meaning as well.

Therefore the atom-format spec should add a clear recommendation, with an explanation, to prefer plain text in all text containers.

See also: PaceSimpleContentType (which, however, differs in purpose)

Status

New

Rationale

The text container scheme allows up to five different data models, several of which carry something other than plain text. Automated conversion from HTML to text is often too simplistic and does not necessarily preserve meaning.

While HTML is still mostly used for stylistic rather than semantically significant purposes, it can always carry information that simple transformations (e.g. tag stripping) would remove.

html tags denoting nothing

In this example the HTML tags have little or no meaning:

  <content type="html"><![CDATA[
    makes <em>no real</em> difference
  ]]></content>

dangerous tags

Whereas entries like this one lose considerable substance if an overly simple conversion mechanism touches them:

  <title type="xhtml">
     <div xmlns="http://www.w3.org/1999/xhtml">

        <acronym title="My Super New XML Language">MSNXL</acronym>

           <!-- or for example: -->

        event <ins>now</ins> <s>delayed</s>

     </div>
  </title>

The examples are, of course, contrived. Most blogging software doesn't emit such exotic HTML features.
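To make the risk concrete, here is a minimal Python sketch (standard library only) contrasting naive tag stripping with a slightly more careful conversion of the XHTML title above. The names NaiveStripper, MeaningAwareStripper and to_text, and the rules the second class applies (expand acronym/abbr titles in parentheses, drop struck-out text), are illustrative assumptions, not behavior any spec or aggregator prescribes.

```python
from html.parser import HTMLParser


class NaiveStripper(HTMLParser):
    """Drops every tag and keeps only the character data."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


class MeaningAwareStripper(NaiveStripper):
    """Hypothetical refinement: keeps acronym/abbr expansions, skips struck-out text."""

    STRUCK = {"s", "strike", "del"}
    ABBREV = {"acronym", "abbr"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside <s>, <strike> or <del>
        self.pending_titles = []   # title attributes of open acronym/abbr tags

    def handle_starttag(self, tag, attrs):
        if tag in self.STRUCK:
            self.skip_depth += 1
        elif tag in self.ABBREV:
            self.pending_titles.append(dict(attrs).get("title"))

    def handle_endtag(self, tag):
        if tag in self.STRUCK and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.ABBREV and self.pending_titles:
            title = self.pending_titles.pop()
            if title and not self.skip_depth:
                self.parts.append(f" ({title})")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def to_text(markup, parser_cls):
    """Run one of the strippers and collapse runs of whitespace."""
    parser = parser_cls()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


SAMPLE = ('<acronym title="My Super New XML Language">MSNXL</acronym> '
          'event <ins>now</ins> <s>delayed</s>')
print(to_text(SAMPLE, NaiveStripper))
print(to_text(SAMPLE, MeaningAwareStripper))
```

Naive stripping turns the title into "MSNXL event now delayed", which loses the expansion and even contradicts itself. The second variant yields "MSNXL (My Super New XML Language) event now". Every such rule is still a guess about the author's intent, which is exactly the guess consuming software should not be forced to make.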

recommend plain text

All aggregators happily accept plain text (AggregatorTypeStatistics), but some can use nothing else and therefore have to down-convert any X/HTML to it. As explained above, this can sometimes go completely wrong, so some implementations may shy away from such conversion altogether and instead (even worse) ignore or drop some feed entries.

The obvious conclusion, therefore, is to prefer type="text" unless X/HTML provides a significant benefit in a given feed entry. Where content does have to be mangled, that dubious honor is better left to feed generators, which (unlike consuming software) are in a superior position to decide whether HTML makes sense or not. (Hint: for <title>s it mostly doesn't.)
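A generator-side version of that decision might look like the following minimal Python sketch. It assumes a hypothetical allow-list of purely cosmetic tags (COSMETIC_TAGS) and a hypothetical helper text_construct(); markup that uses only those tags is flattened to type="text", and anything else is kept as escaped type="html". Where exactly that line is drawn is a judgment call for each generator.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumption: inline tags treated as purely cosmetic. Block tags such as <p> or
# <br> are deliberately left out, since dropping them would glue words together.
COSMETIC_TAGS = {"em", "strong", "b", "i", "u", "span"}


class TagSurvey(HTMLParser):
    """Records which tags occur and collects the character data."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

    def handle_data(self, data):
        self.text.append(data)


def text_construct(element, markup):
    """Hypothetical helper: emit an Atom text container, preferring type="text"."""
    survey = TagSurvey()
    survey.feed(markup)
    survey.close()
    if survey.tags <= COSMETIC_TAGS:
        # Only cosmetic markup: stripping it loses nothing essential.
        plain = " ".join("".join(survey.text).split())
        return f'<{element} type="text">{escape(plain)}</{element}>'
    # Significant markup present: keep it, escaped as type="html" requires.
    return f'<{element} type="html">{escape(markup)}</{element}>'
```

For example, text_construct("content", "makes <em>no real</em> difference") yields <content type="text">makes no real difference</content>, while the acronym title from the earlier example would be kept as type="html".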

Proposal

Recommend the use of plain text in preference to X/HTML.

...

Impacts

Complicates feed generators, since they would need logic to decide when to strip HTML.

The deliberate absence of a split into <content> and something like a <rich-content> container was meant to provoke, or force, wider support for HTML content in all aggregators. Such a recommendation would, of course, work against that.

Notes

Along the same lines, SuperAggregators should never convert between text container data models. Many Web-based aggregators have full HTML support and can present feeds more correctly than desktop aggregators, so there is no need for such proxies to map everything down to the lowest common denominator.


CategoryProposals