First, I believe it is important that error recovery interrupt the flow of the meaningful text as little as possible. The original version might as well have used exceptions.
Second, and in the spirit of duck typing, consider checking for a required behavior or characteristic instead of checking the class of an object. The above code is superior to the original in that it can handle the possibility of non-string messages.
It is more idiomatic to use #respond_to?
uri = AtomURI.check(alleged_uri)
return report_error(uri) unless uri.respond_to?(:path)
request = Net::HTTP::Get.new(uri.path)
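To make the respond_to? version concrete, here is a small self-contained sketch. AtomURI.check and report_error are hypothetical stand-ins (the real APE versions aren't shown here); check returns either a URI or an error String, and the caller only asks whether the result can answer #path.

```ruby
require 'uri'
require 'net/http'

# Hypothetical stand-in for AtomURI.check: returns a URI on success and an
# error-message String on failure (the overloaded return value being discussed).
module AtomURI
  def self.check(alleged_uri)
    uri = URI.parse(alleged_uri)
    return "not an http(s) URI: #{alleged_uri}" unless %w[http https].include?(uri.scheme)
    uri
  rescue URI::InvalidURIError => e
    "unparseable URI: #{e.message}"
  end
end

# Hypothetical error reporter: logs to stderr and returns nil.
def report_error(message)
  warn "error: #{message}"
  nil
end

# Duck-typed caller: anything that answers #path is treated as a usable URI;
# anything else (here, the error String) is handed to report_error.
def build_request(alleged_uri)
  uri = AtomURI.check(alleged_uri)
  return report_error(uri) unless uri.respond_to?(:path)
  Net::HTTP::Get.new(uri.path)
end
```

Note that Net::HTTP::Get.new refuses an empty path, so a bare "http://example.org" would still need a "/" supplied somewhere.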
I understand the desire to test capability (methods.include? / respond_to?) rather than use an ‘instanceof’ test, but in the end they aren’t really all that different. It doesn’t feel completely right to have to sniff objects in any way before doing something with them.
If it were me, I’d move the check that it’s not a ‘valid’ URI into a method separate from the one that converts the string into a URI object. It’s far easier to understand.
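A rough sketch of that split, assuming "valid" simply means "parses and uses http or https"; the UriHelpers module and both method names are hypothetical.

```ruby
require 'uri'

# Hypothetical helpers: one method only answers "is this acceptable?",
# the other only converts a String into a URI object.
module UriHelpers
  # True when the string parses and its scheme is http or https.
  def self.valid_http_uri?(string)
    %w[http https].include?(URI.parse(string).scheme)
  rescue URI::InvalidURIError
    false
  end

  # Conversion only; assumes the caller has already validated the input.
  def self.to_uri(string)
    URI.parse(string)
  end
end

# Usage:
#   return report_error(alleged_uri) unless UriHelpers.valid_http_uri?(alleged_uri)
#   uri = UriHelpers.to_uri(alleged_uri)
```

The cost is parsing twice; the benefit is that neither method has to return two different kinds of thing.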
The point of duck typing is to be able to use the same code path for different classes, as long as each class implements the methods that code path requires. IMHO checking the class and checking for required behavior are both just dirty hacks.
And is it just me, or are utility classes (classes with just class methods) not the way to do OO programming?
A missing class here could be an HTTPResource. You would get an exception if you tried to create an HTTPResource with, e.g., a file: URI.
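A minimal sketch of what such a class might look like. HTTPResource is hypothetical, and ArgumentError is just one reasonable choice of exception.

```ruby
require 'uri'

# Hypothetical HTTPResource: the constructor refuses anything that is not an
# http/https URI, so code holding an HTTPResource never needs to re-check it.
class HTTPResource
  attr_reader :uri

  def initialize(uri_string)
    uri = URI.parse(uri_string) # may raise URI::InvalidURIError
    unless %w[http https].include?(uri.scheme)
      raise ArgumentError, "not an HTTP URI: #{uri_string}"
    end
    @uri = uri
  end
end
```

The check moves into object construction, so the calling code never needs respond_to? or instanceof tests.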
Bob, I don’t think that’s the fault of the Ruby URI class. The Java URI class has the same behavior. The regular expression given in RFC 3986 is intentionally broad, since it must handle any URI scheme, including non-HTTP ones. HTTP URLs include a DNS name, which, as that RFC mentions, is covered in section 3.5 of RFC 1034 and section 2.1 of RFC 1123. From those, the legal characters are A-Z, a-z, 0-9 and the hyphen ("-").
Unfortunately, since Blogspot (and perhaps other services) allowed the registration of hostnames with underscores in them, this is a moot point: users don’t care what the RFC says when they can’t use a blog URL in <insert app name here> that they can view perfectly well in their browser.
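To show the tension in code, here is a sketch of a strict host-name check using letters, digits and hyphens per label (no leading or trailing hyphen, at most 63 characters, a leading digit allowed as RFC 1123 permits), next to a hypothetical lenient variant that also accepts underscores. For simplicity it rejects a trailing root dot.

```ruby
# Strict label rule: letters, digits, hyphen; must start and end with a
# letter or digit; 1 to 63 characters.
STRICT_LABEL = /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/

# Hypothetical lenient rule that also tolerates underscores, matching what
# browsers will happily resolve.
LENIENT_LABEL = /\A[A-Za-z0-9_](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9_])?\z/

def valid_hostname?(host, label_pattern = STRICT_LABEL)
  return false if host.empty? || host.length > 253
  # split with -1 keeps empty labels, so "a..b" and "a." are rejected
  host.split(".", -1).all? { |label| label.match?(label_pattern) }
end
```

With the strict rule "my_blog.blogspot.com" is rejected; with LENIENT_LABEL it passes. Which one an APP client should use is exactly the judgment call being argued here.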
With due respect, both approaches are ugly. Tim’s violates Replace Conditional With Polymorphism by checking instanceof; yours leaks complexity into the calling code.
The mistake is Tim’s design decision to overload the return value. You don’t have to, so don’t.
uri = AtomURI.check(alleged_uri) { |e| report_error(e) }
return unless uri
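For this calling style to work, check has to yield its error to the block and return nil. The following is one hypothetical way to write it (the real APE code may differ):

```ruby
require 'uri'

# Hypothetical AtomURI.check that does not overload its return value: it
# returns a URI on success; on failure it yields the error message to the
# caller's block (if any) and returns nil.
module AtomURI
  def self.check(alleged_uri)
    begin
      uri = URI.parse(alleged_uri)
    rescue URI::InvalidURIError => e
      yield "unparseable URI: #{e.message}" if block_given?
      return nil
    end
    return uri if %w[http https].include?(uri.scheme)
    yield "not an http(s) URI: #{alleged_uri}" if block_given?
    nil
  end
end

# Caller in the style above; errors are collected instead of printed.
def fetch_path(alleged_uri, errors)
  uri = AtomURI.check(alleged_uri) { |e| errors << e }
  return unless uri
  uri.path
end
```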
Ah. Anonymous subs in Perl don’t work quite that way (the return would just exit the block, not the scope surrounding the block), so I didn’t think to propose that.
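In Ruby, by contrast, return inside an ordinary block returns from the method that created the block. That is what makes a one-liner like { |e| return report_error(e) } possible. A lambda behaves more like the Perl anonymous sub. The method names below are made up purely for illustration.

```ruby
# A method that calls its block twice and then finishes normally.
def each_twice
  yield 1
  yield 2
  :finished_normally
end

# `return` in a block exits first_value itself, so :never_reached is skipped.
def first_value
  each_twice { |n| return n }
  :never_reached
end

# `next` only leaves the block, so each_twice runs to completion.
def with_next
  each_twice { |n| next n }
end

# `return` in a lambda only exits the lambda, much like a Perl anonymous sub.
def with_lambda
  l = ->(n) { return n * 10 }
  l.call(1)
  :after_lambda
end
```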
As for the placement of the check method, that depends: is alleged_uri an instance of an existing class, like String or URI, which would make check an injected method? If so, the choice of method name seems imprudently short. I think I’d be most likely to recast it as a complex constructor on URI, assuming alleged_uri is a String, yielding something like this:
uri = URI.new_with_app_check(alleged_uri) { |e| return report_error(e) }
This is purposefully a tad wordier to reduce the likelihood of future toe-stepping.
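One hypothetical way to define such a constructor is to reopen the standard URI module, which is the kind of toe-stepping the long name is meant to guard against. The error messages and the http/https rule are assumptions.

```ruby
require 'uri'

module URI
  # Hypothetical "complex constructor": returns a URI, or yields an error
  # message to the block (returning nil if no block was given).
  def self.new_with_app_check(string)
    begin
      uri = URI.parse(string)
    rescue URI::InvalidURIError => e
      return block_given? ? yield("unparseable URI: #{e.message}") : nil
    end
    return uri if %w[http https].include?(uri.scheme)
    block_given? ? yield("APP endpoint must be http or https: #{string}") : nil
  end
end

# Hypothetical reporter that just builds a message.
def report_error(message)
  "error: #{message}"
end

# The block's `return` exits endpoint_path with report_error's value.
def endpoint_path(alleged_uri)
  uri = URI.new_with_app_check(alleged_uri) { |e| return report_error(e) }
  uri.path
end
```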
Mihai, you’re quite right, but what’s the point of pretending that [link] doesn’t exist? Why should a hypothetical APP endpoint below such a subdomain be inaccessible when all the browsers allow it? I’d be tempted to say that the RFCs should be updated, because it’s not exactly possible to put the cat back in the bag on this one.
[RAD stands for Ruby Ape Diaries, of which this is part IV.] That glob of letters in the title stands for “There’s More Than One Way To Do It”, and it comes from Perl culture. It is a distinguishing feature of Perl that if there’s something you...
Tim Bray is running a series of posts on an Atom Publishing Protocol client that tests out an APP implementation called APE, or Atom Protocol Exerciser. In one post in the series, Tim describes how he uses Duck Typing to handle checking a URI for...
This is a somewhat peripheral issue, but if you just use uri.path you’ll lose the query string: “syndicate.cgi?format=Atom” will be turned into “syndicate.cgi”. There’s a URI#path_query method that includes the query string, but it’s private. AFAICT the best strategy is something like:
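As an illustration (an assumption on my part, not the commenter's own snippet), one approach is to rebuild the request target from the public #path and #query readers. Current Ruby versions also provide URI::HTTP#request_uri, which returns the same path-plus-query string. The request_target name is hypothetical.

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: path plus "?query" when a query is present. An empty
# path becomes "/", since Net::HTTP rejects an empty request path.
def request_target(uri)
  path = uri.path.empty? ? "/" : uri.path
  uri.query ? "#{path}?#{uri.query}" : path
end

# Usage:
#   uri = URI.parse("http://example.org/syndicate.cgi?format=Atom")
#   request = Net::HTTP::Get.new(request_target(uri))
```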