I second with Bill’s first two items, but I have to add Python’s module new to the list. Compare Python vs Ruby.
To be fair, this is very much related to Bill’s fifth point.
Update: The Python code in question has been dramatically improved based on suggestions by Bjørn Stabell.
Update: The updated Python code has a very subtle bug, one that could very easily qualify as a pain point. Details.
The unittest module IMHO is quite lame. Which isn’t our fault really, because we just believed other people who said XUnit was a good idea. And XUnit isn’t a good idea, it’s an idea that makes sense in class-obsessed languages like Smalltalk or Java, and doesn’t really make sense elsewhere.
In a system like py.test or nosetest, you can pretty much use your test_encoding() function as-is, yielding test functions and parameters.
I personally use the new module about once a year, and lambda a bit more often buch not much. And I’m not really shy around metaprogramming constructs. I suppose it’s a different style of programming. Or in this case, I would have just instantiated the test case multiple times, as in this example: [link]
Other places where new might come in handy, I’m more likely to use descriptors [link]
In a system like py.test or nosetest, you can pretty much use your test_encoding() function as-is, yielding test functions and parameters.
Indeed that function was originally written to be used with nosetest. It was converted to use unittest to reduce external dependencies. The experience has taught me that I really don’t like unittest; something like nose or py.test would imho be much more worthy of a place in the Python standard library.
We’ve built a full server in Python, and our main pain point has been Bill’s number 5 - lack of real parallel execution when using threads. For web apps, mod_python and wsgi/mod_wsgi provide a solution, but true parallel execution would be nice.
Other, less painful issues are:
- The need to update imports when changing code from a module to a package.
- Vendors shipping ‘old’ python versions (e.g RHEL4 with Python 2.3), where the dependency matrix makes it virtually impossible to upgrade the default/system Python. This means you have to install another Python version, which is not always easy for non-administrative users.
I looked at your “Compare Python vs Ruby” links and I have to gripe that they both stink. Why are there no comments? I was able to decipher the intention, but non-straightforward code like that deserves an explanation. That’s my pain point in any language.
The CPAN community is pushing its members to do a better job on code documentation by offering "Kwalitee" points for publishing modules where all of the public methods/functions are documented. There’s even a tongue-in-cheek high score list.
Rather than building instance methods at runtime, it might have been easier to build whole TestCase classes:
# This is not a unittest.TestCase subclass so that it doesn't
# get found by e.g. unittest.findTestCases()
class EncodingTestCaseMixin:
def testEncoding(self):
# Test code using self.encoding and self.input.
def buildTestSuite():
suite = unittest.TestSuite():
for input, encoding in test_encoding():
test_class = type('EncodingTestCase(%r, %r)' % (input, encoding),
(unittest.TestCase, EncodingTestCaseMixin),
dict(input=input, encoding=encoding))
suite.addTests(unittest.makeSuite(test_class))
If you wanted to perform multiple tests on each (input, encoding) pair, it would be a simple matter of adding more methods to EncodingTestCaseMixin.
Whereas I agree with some of the complaints about Python, the Python and Ruby examples you provide are not at all comparable: the test cases aren’t anything close to the same (try looking at the code). Whoever did the Python case didn’t try very hard to make it clean and elegant; it would be easy to half the size of the code and increase readability 10 times. There’s also an ugly hack at the top to work around some import issues, something that could easily have been pushed to a config file instead.
Python has an excellent way to make readable tests. Long before anyone uttered the words Behavior Driven Development using DSLs in Ruby, Python has had Doctests, which in my 10 years of experience is the clearest way to show how an API or object model is supposed to work, while at the same time being executable as tests. In fact, you can just copy & paste the output of your interaction with the interpreter to create doctests. I believe a good goal with tests is to approach Donald Knuth’s literate programming ideal.
Something that does approximately what the Ruby test cases does (obviously needs some polish):
import re, glob, unittest, inputstream
def Html5EncodingTestCase(unittest.TestCase):
def __init__(self):
# set up encoding test cases using dynamically generated methods
for filename in glob.glob("encoding/*.dat"):
for idx,data,encoding in parseTestCases(open(filename).read()):
def encodingTest():
assert encoding == stream(data).charEncoding.lower()
setattr(self, 'test_%s_%d' % (test_name, idx+1), encodingTest)
def test_chardet(self):
encoding = stream(open("encoding/chardet/test_big5.txt").read()).charEncoding
assert encoding.lower() == "big5"
def stream(*args, **kw):
"""Helper, turn off chardet"""
kw['chardet'] = False
return inputstream.HTMLInputStream(*args, **kw)
def parseTestCases(testcases):
for (idx, testcase) in enumerate(
re.findall(
"^#data\s*\n(.*?)\n#encoding\s*\n(.*?)\n",
testcases, re.DOTALL|re.MULTILINE)):
(data, encoding) = testcase
yield idx, data, encoding
And a much more natural way to do it using doctests:
__tests___ = """Test encoding and character set detection
Let's first set up a helper method:
>>> def encoding(string):
... # guess encoding of string
... # TODO: implementation
This is just a control test; for the other tests to work,
this should pass - you may have to set your defaults appropriately:
>>> encoding('<!DOCTYPE HTML>\n<!-- control test -->')
Windows-1252
Now let's start on the serious tests.
>>> encoding('<!DOCTYPE HTML>\n<meta charset="ISO-8859-1">')
Windows-1252
>>> encoding('<!DOCTYPE HTML>\n<meta charset="ISO-8859-9">')
ISO-8859-9
"""
Both Ruby and Python are very high-level languages so with the right abstractions code can read very well.
Bjørn, I had to make a number of changes to this, primarily due to the fact that unittest requires that the methods be set on the class and not just the instance, and that at least one method be present before any instance is created, but after I was done, I agree that the end result is much cleaner. Take a look. Thanks!
It turns out that in Python, a for-loop does not introduce a new scope. What this means it that in the above code, every test_* function binds to the samedata and encoding variables, which by the time these functions are executed have the value of the last iteration of the loop.
A workaround to this is to capture the values in the function declaration itself:
Hi Sam, yeah I noticed that too. I believe you’ll soon be able to do this using a `nonlocal` keyword as well.
I also started thinking why we’re not using Python’s _new_ to automatically add new methods to classes. It would make the Python code even more similar to the Ruby one. It turns out unittest.TestCase isn’t a new-style class yet. Anyways, I refactored the code to make this change easy to do in the future (make the metaprogramming stuff more clear), and to move the potentially unnecessary stuff to the end of the file:
import sys, os, glob, re, unittest
from inputstream import HTMLInputStream as stream
class Html5EncodingTestCase(unittest.TestCase):
@classmethod
def addTests(cls):
# all tests are dynamically added here
for name,func in cls.getTests():
setattr(cls, name, func)
@staticmethod
def getTests():
# test 1: character detection test (optional)
try:
import chardet
def test_chardet(self):
data = open("encoding/chardet/test_big5.txt").read()
assert stream(data).charEncoding.lower() == "big5"
yield 'test_chardet', test_chardet
except ImportError:
print "chardet not found, skipping chardet tests"
# tests 2...: encoding tests, one for each case in each test file
test_parser = re.compile("^#data\s*\n(.*?)\n#encoding\s*\n(.*?)\n", re.S|re.M)
for filename in glob.glob("encoding/*.dat"):
test_name = os.path.basename(filename).replace('-','')[:-4]
test_data = open(filename).read()
for idx,(data,encoding) in enumerate(test_parser.findall(test_data)):
def test_encoding(self, d=data, e=encoding):
self.assertEquals(e.lower(), stream(d, chardet=False).charEncoding)
yield 'test_%s_%d' % (test_name, idx+1), test_encoding
# TODO:
# * if unittest.TestCase was a new-style class (inheriting from object), we could
# replace addTests with __new__, which would be automatically called on class
# definition time, rendering the below statement unnecessary:
Html5EncodingTestCase.addTests()
# TODO:
# * the below code could probably be completely removed; the test runner
# should be able to just import the module and automatically discover all of
# the tests within.
def buildTestSuite():
return unittest.defaultTestLoader.loadTestsFromName(__name__)
def main():
buildTestSuite()
unittest.main()
#RELEASE remove
if __name__ == '__main__':
# XXX Allow us to import the sibling module
os.chdir(os.path.split(os.path.abspath(__file__))[0])
sys.path.insert(0, os.path.abspath(os.path.join(os.pardir, "src")))
#END RELEASE
#RELEASE add
#import html5lib
#from html5lib import inputstream
#END RELEASE
if __name__ == "__main__":
main()
I was quite disappointed Python apparently made it so hard to do closures, so I did some more research.
I found some Lua discussions that put me on the right track, and it turns out your last pain point isn’t so: in Python 2.2 and onwards, this works fine, the encoding tests still pass:
for idx,(data,encoding) in enumerate(re.compile(
"^#data\s*\n(.*?)\n#encoding\s*\n(.*?)\n",
re.DOTALL|re.MULTILINE).findall(open(filename).read())):
def encodingTest(self):
stream = inputstream.HTMLInputStream(data,chardet=False)
self.assertEquals(encoding.lower(), stream.charEncoding)
setattr(Html5EncodingTestCase, 'test_%s_%d' % (test_name, idx+1), encodingTest)
... or if you use my last example:
for idx,(data,encoding) in enumerate(test_parser.findall(test_data)):
def test_encoding(self):
self.assertEquals(encoding.lower(), stream(data, chardet=False).charEncoding)
yield 'test_%s_%d' % (test_name, idx+1), test_encoding
I was also wrong on one point: The `nonlocal` keyword is, if I understand it right, only necessary if you want to force Python to treat variables you assign to in the inner function to rebind to variables in a outer scope instead of becoming declarations for new local variables. There’s even a comparison with Ruby that makes it more clear.
Sam, you’re absolutely right! I should’ve checked once more.
Here’s a simple examples demonstrating the problem:
>>> def funcs(x):
... for i in range(5):
... def g(): return x + i
... yield g
>>> [ g() for g in list(funcs(1)) ]
[5, 5, 5, 5, 5]
>>> [ g() for g in funcs(1) ]
[1, 2, 3, 4, 5]
So, an equivalent of the new Python test code using classes would be (elliding the initial imports):
# The following is not subclassed from TestCase so it doesn't get
# picked up by loadTestsFromName()
class EncodingTestCaseMixin:
data = None
encoding = None
def setUp(self):
self.stream = inputstream.HTMLInputStream(self.data, chardet=False)
def testStreamEncoding(self):
self.assertEqual(self.encoding.lower(), self.stream.charEncoding)
class ChardetTestCase(unittest.TestCase):
...
# Only process the chardet tests if the chardet module is available ...
try:
import chardet
except ImportError:
del ChardetTestCase
def buildTestSuite():
suite = unittest.defaultTestLoader.loadTestsFromName(__name__)
# Now add the encoding tests ...
for filename in html5lib_test_files("encoding"):
test_name = os.path.basename(filename).replace('.dat',''). \
replace('-','')
for idx, (data, encoding) in enumerate(re.compile(
"^#data\s*\n(.*?)\n#encoding\s*\n(.*?)\n",
re.DOTALL|re.MULTILINE).findall(open(filename).read())):
class_name = 'Encoding_%s_%d_TestCase' % (test_name, idx+1)
suite.addTests(unittest.makeSuite(type(
class_name, (unittest.TestCase, EncodingTestCaseMixin),
dict(data=data, encoding=encoding)))
return suite
if __name__ == '__main__':
unittest.main(defaultTest='buildTestSuite')
This has the benefit of separating out the actual test code from the test suite construction, which should make both sections of the code more maintainable. It should be just as easy to identify failing tests, since both the class and method name are used to construct the test name.
After hearing about it on an ITC podcast (also mentioned in previous post) I ordered the “RESTful Web Services” book from Amazon (and got into an argument with Sam Ruby regarding Python pain points; he seems to be a Ruby-convert these...
After hearing about it on an ITC podcast (also mentioned in previous post ) I ordered the “RESTful Web Services” book from Amazon (and got into an argument with Sam Ruby regarding Python pain points ; he seems to be a Ruby-convert these days). It...