It’s just data

Rails and Snowmen

People have started to notice that Rails is adding a snowman to their URLs.  There is now even a website devoted to this.

These types of social implications of technical decisions fascinate me.  Here’s some further background that I have pieced together.  I may have some details wrong; corrections are welcome.

For starters, Rails by default standardizes on utf-8 for web pages.  As with pretty much everything in Rails, you can change the default, but virtually nobody does.  Utf-8 is a good choice here, and certainly is better than iso-8859-1 or win-1252.

Rails provides the encoding information in the Content-Type header, and in the accept-charset attribute of its forms.  Under normal circumstances, this will cause all form submissions coming back from the browser to be encoded as utf-8, across all commonly used browsers.  Yes, including IE.

Most pages in Rails are produced using templates, and generally these templates are not the problem.  Data in those templates typically comes from databases, and sometimes data that isn’t 100% pure and clean gets into those databases.  In particular, sometimes this data may have encoding errors.  Such errors can easily become visible when that data is displayed in a form.

Browser recovery strategies vary on encoding errors, but often involve displaying a diamond with a question mark in it.
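As a rough illustration of where that diamond comes from (a sketch in Python rather than anything Rails or a browser actually runs, and the word "café" is just an assumed sample value): bytes written as iso-8859-1 are not valid utf-8, and a lenient decoder substitutes U+FFFD, the replacement character, for the part it cannot read.

```python
# Sample value (an assumption for illustration): "café" stored as iso-8859-1.
# In that encoding, é is the single byte 0xE9.
latin1_bytes = "café".encode("latin-1")

# A strict utf-8 decoder rejects those bytes outright...
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while a lenient one substitutes U+FFFD, which browsers
# typically render as a diamond with a question mark in it.
shown = latin1_bytes.decode("utf-8", errors="replace")

print(strict_ok)    # False
print(repr(shown))  # 'caf\ufffd'
```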

User behavior varies in the presence of such errors, but a common reaction is to switch the encoding.

The trouble starts when the user then proceeds to submit the form.  The net result, with some browsers, is that the data is sent respecting the user’s choice.  In other cases, browsers send the data using the application’s choice.

How Rails will react to encoding other than utf-8 being used depends on the version of Rails, the version of Ruby and a number of other factors.  In some cases, the result is an HTTP 400 response code (Bad Request).  In others, a 500 (Server Error).  In others, a 404 (Not found).  In others, even more misencoded data will make it all the way to the database.
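The mismatch can be sketched as follows.  This is Python standing in for both the browser and the server, not Rails code; the helper name `decodes_as_utf8` and the sample value are made up for the example.  The same form value produces different bytes on the wire depending on which encoding the browser uses, and only one of them survives being read as utf-8.

```python
from urllib.parse import quote, unquote_to_bytes

value = "café"  # sample form value, assumed for illustration

# What goes over the wire if the browser uses the application's choice...
sent_as_utf8 = quote(value, encoding="utf-8")
# ...versus the user's override to iso-8859-1.
sent_as_latin1 = quote(value, encoding="latin-1")


def decodes_as_utf8(percent_encoded: str) -> bool:
    """Hypothetical server-side check: are the submitted bytes valid utf-8?"""
    try:
        unquote_to_bytes(percent_encoded).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(sent_as_utf8, decodes_as_utf8(sent_as_utf8))      # caf%C3%A9 True
print(sent_as_latin1, decodes_as_utf8(sent_as_latin1))  # caf%E9 False
```

What a framework does when that check fails (reject the request, raise an error, or store the bytes anyway) is exactly the part that varies.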

As I said, sometimes the browser will choose to respect the user’s choice.  This is generally only done if it is possible to do so.  As not every character can be encoded using Western ISO Latin1, including such a character in a hidden field has been found to be an effective strategy for forcing the browser’s hand.

Enter the snowman.

In most cases, this is simply invisible metadata that solves a real problem that is otherwise hard to describe and debug.
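Why the snowman works can be checked directly.  The following is a small Python sketch (the helper `encodable` is hypothetical, written only for this example): the snowman, U+2603, has no representation in iso-8859-1 or win-1252, so a form containing it cannot be submitted in either of those encodings without loss.

```python
SNOWMAN = "\u2603"  # the snowman character


def encodable(text: str, encoding: str) -> bool:
    """Hypothetical helper: can `text` be represented in `encoding`?"""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Ordinary Western text fits in Latin-1, so a browser can honor the override.
print(encodable("café", "latin-1"))   # True

# The snowman does not fit in either legacy encoding, only in utf-8.
print(encodable(SNOWMAN, "latin-1"))  # False
print(encodable(SNOWMAN, "cp1252"))   # False
print(encodable(SNOWMAN, "utf-8"))    # True
```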

Unfortunately, it isn’t always so invisible.  Try a query on this page and observe the resulting URI.  This page opted to use HTTP GET in order to make the URI meaningful.  Unfortunately the URIs with the latest version of Rails now have a bit of exposed cruft.
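For a sense of what the cruft looks like, here is a sketch in Python of how such a query string comes out once the hidden field is percent-encoded.  The `_snowman` field name is the one mentioned in the commit discussion below; the `q` parameter and its value are assumptions made up for the example.

```python
from urllib.parse import urlencode

# Hidden field first, then the (assumed) search parameter.
query = urlencode({"_snowman": "\u2603", "q": "rails"})

print(query)  # _snowman=%E2%98%83&q=rails
```

Browsers may display the character itself rather than the percent-escapes in the address bar, which is how the snowman becomes visible to users.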

The fact that people care enough about such things to complain indicates that socialization of the concept that URIs are to be meaningful is working.  The unfair perception that this is (yet another) workaround for IE has also entered into the debate.

This is a very real problem.  One without clean and comprehensive solutions.  The Rails team is aware of the _charset_ hidden value, but that opens up a different set of problems.

Solutions being discussed to date include renaming the form field, choosing a different character, moving the field to the end of the query, and providing a mechanism to opt out.


Utf-8 is a good choice here, and certainly is better than iso-8859-1 or win-1252.

Especially since ISO 8859-1 has long ago been superseded by ISO 8859-15.

Posted by anonymous at

wycats: rename _snowman to _e

This has led to some pushback.  I’ve (half-seriously) suggested _8-ɟʇn=ƃuıpoɔuǝ.

Posted by Sam Ruby at

Heh.  My app submits utf8=✓.  I almost wish I’d chosen something more fun now.

Posted by Randall at

If the goal is to keep the DB clean, why not include the snowman only in POST operations?
It should be fairly simple to make it work that way in Rails or in any other framework/helper.
If the problem is also with GET requests, as I suppose it is (site search, for example), I would probably include the “snowman” only for IE browsers, put it at the end of the query and call it _ie-encoding=8-ɟʇn.

I am of the opinion that different browsers deserve different experiences, and that you don’t dumb things down for everybody because of one browser.

I also think that bugs like these are big enough to demand patches from the browser vendors, and to make a big stink about them until they are fixed.

Posted by Diego Scataglini at

The week in links (08/02)

Reconnoiter: A Whirlwind Tour (Theo Schlossnagle, OmniTI) twitter.com user_streams The Webkit Team added WOFF support for @font-face how-fast-can-a-cloud-run/?ref=technology Rails-and-Snowmen TextMate Troubleshooting/SnowLeopard...

Excerpt from turnings :: daniel berlinger at

Commit war!

wycats: Replace snowman with utf8=✓

jeremy: It’s snowing!

Posted by Sam Ruby at

August 16, 2010: I Still Like Boring Software Development

Book Status Beta 6 is out and available for sale here. The major addition is the new Shoulda chapter. It’s also available from Amazon. Note that the ship date for the print book seems to have moved to November. Next up is the RSpec chapter,...

Excerpt from Rails Test Prescriptions Blog at

reverted

Posted by Sam Ruby at

According to railssnowman, this bug appears when the user overrides the UTF-8 encoding. Hence, a solution to the problem could be to prevent users from changing the encoding. As a matter of fact, when a UTF-8 encoded page contains the byte order mark (BOM), neither Internet Explorer nor WebKit browsers permit the user to override the encoding.

See bug 12950 against HTML5.

And also bug 12897 against HTML5/XHTML5.

Posted by Leif Halvard Silli at

Answer by Simone Carletti for Rails 3 UTF-8 query string showing up in URL?

The utf8 parameter (formerly known as snowman) is a Rails 3 workaround for an Internet Explorer bug. The short answer is that Internet Explorer ignores POST data UTF8 encoding unless at least one UTF8 char is included in the POST data. For this...

Excerpt from Rails 3 UTF-8 query string showing up in URL? - Stack Overflow at
