It’s just data


I’ve been looking into differences between the WHATWG URL Living Standard and the combination of RFC 3986 and RFC 3987.  I’ve come up with an indirect but effective way to identify the differences.  To start with, I downloaded urltestdata.txt and urltestparser.  I then wrote a small script to convert the test data into JSON.

I then wrote another script to take this data and pass it through what is advertised as a closely conforming implementation of the relevant RFCs.

Looking at the results, the first set of issues related to the stripping of leading and trailing whitespace, so I updated the script to strip that whitespace itself, which let me focus on the remaining differences.  Similarly, the URL parsing definition includes the leading ? and # in the query and fragment values respectively, so I eliminated those differences in the cases where the values were non-empty.
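The two adjustments described above can be sketched roughly as follows.  This is a minimal Python illustration rather than the script used for this post: the function names and the dictionary layout of a parsed record are assumptions, and stripping is limited to ASCII whitespace.

```python
# Hypothetical helpers; the names and the record layout are assumptions
# made for illustration, not taken from the original script.

ASCII_WHITESPACE = " \t\n\r\f"


def clean_input(raw: str) -> str:
    """Strip leading and trailing whitespace before handing the URL to a parser."""
    return raw.strip(ASCII_WHITESPACE)


def comparable(parsed: dict) -> dict:
    """Return a copy of a parsed record with the leading ? and # removed.

    Empty or missing values are left as empty strings, so only non-empty
    query and fragment values are changed.
    """
    result = dict(parsed)
    for key, prefix in (("query", "?"), ("fragment", "#")):
        value = result.get(key) or ""
        if value.startswith(prefix):
            value = value[1:]
        result[key] = value
    return result


if __name__ == "__main__":
    print(repr(clean_input("  http://example.com/\n")))
    print(comparable({"query": "?a=b", "fragment": "#top"}))
```

With both sides passed through helpers like these, any remaining mismatch reflects a difference in parsing rather than in presentation.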

The resulting script produces this output.

The next set of differences concerns canonicalization, so I ran tests using Addressable’s normalize method.  Note that this method is non-standard.  The updated output includes normalization.
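To give a sense of what canonicalization involves, here is a rough Python sketch of the syntax-based normalization steps from RFC 3986 section 6.2.2: case normalization, percent-encoding normalization, and dot-segment removal.  It is not Addressable’s implementation and will not match its output in every case; the function names are assumptions, and the dot-segment helper assumes an absolute path.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Characters RFC 3986 calls "unreserved"; escapes of these can be decoded.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _fix_escape(match):
    """Decode an escape of an unreserved character, else uppercase its hex digits."""
    ch = chr(int(match.group(1), 16))
    return ch if ch in UNRESERVED else "%" + match.group(1).upper()


def normalize_escapes(text: str) -> str:
    return re.sub(r"%([0-9A-Fa-f]{2})", _fix_escape, text)


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments.  Assumes the path is empty or starts with "/"."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the empty segment before the first "/"
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # a trailing dot segment leaves a trailing slash
    return "/".join(out)


def normalize(url: str) -> str:
    parts = urlsplit(url)
    # Lowercase only the host and port, not any userinfo before the "@".
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    path = remove_dot_segments(normalize_escapes(parts.path))
    return urlunsplit((
        parts.scheme.lower(),
        netloc,
        path,
        normalize_escapes(parts.query),
        normalize_escapes(parts.fragment),
    ))


if __name__ == "__main__":
    print(normalize("HTTP://Example.COM/a/./b/../c?x=%7efoo#%3a"))
```

Two URLs that differ only in case, escaping, or dot segments come out identical after a pass like this, which is what makes the remaining differences easier to see.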

Based on what you have as output for e.g. “http://0Xc0.0250.01” it seems these tests might not actually match the specification in all cases. (Though mostly it looks familiar and correct.)

Posted by Anne van Kesteren at

url test results by browser.

Updates to the test data should be sent as pull requests to w3c/web-platform-tests.

See a user agent that should be included in the results?  Visit urltest and leave a comment with the user agent and hex code that the web page reports.

Posted by Sam Ruby at

urltest is JS only. Does it make sense to test things like httpie, curl, modules and libraries from ruby, python, php and so on?

Posted by karl at

Opera/9.80 (Macintosh; Intel Mac OS X 10.9.5) Presto/2.12.388 Version/12.16

Posted by zcorpan at

urltest is JS only. Does it make sense to test things like httpie, curl, modules and libraries from ruby, python, php and so on?

Sure!  I’ll note that the ‘IETF’ rows actually represent data captured by a Ruby library.  My personal preference is to focus on modern, actively maintained or spec compliant applications.  A counter-example would be Java.

Opera/9.80 (Macintosh; Intel Mac OS X 10.9.5) Presto/2.12.388 Version/12.16

Added.  Thanks!

Posted by Sam Ruby at

test case review

Posted by Sam Ruby at

To address a problem Anne found, I updated urltesttojson.js, and then updated the urltestdata.json, captured new results for each browser (thanks, Simon!), and produced new output.

Colors on the initial page triage results:

Clicking through to an individual result, lack of convergence is represented by an entire column in gold.  Exceptions thrown are shown in pale violet red (#D87093).

Posted by Sam Ruby at

I’ve updated the colors to split out no convergence (Pale Red) from convergence doesn’t match WHATWG (Hot Pink - #FF69B4).

Posted by Sam Ruby at

PLH ran these tests using the following user agent:

Mozilla/5.0 (iPad; CPU OS 8_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12A405 Safari/600.1.4

I compared these results to those obtained from Version/7.1 Safari/537.85.10 on Intel Mac OS X 10_9_5.

Not a single result changed.

Posted by Sam Ruby at

Usually I prefer IETF URI over WHATWG URL

Posted by Kevin E. Dudley at
