It’s just data

Python 3.0a1

Guido van Rossum: The first Python 3000 release is out — Python 3.0a1. Be the first one on your block to download it!

$ python3.0
Python 3.0a1 (py3k, Aug 31 2007, 21:24:31) 
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print(len('Iñtërnâtiônàlizætiøn'))
20
>>>

:-)


How does it fare with characters outside the Basic Multilingual Plane? From memory, Python 2.x gives different answers depending on whether it was compiled in UCS2 or UCS4 mode.

[I guess I’ll find out for myself once I compile it ...]

Posted by James Henstridge at

Let’s try it out, based on this:

rubys@rubypad:~$ python3.0
Python 3.0a1 (py3k, Aug 31 2007, 21:24:31) 
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '𐌰𐍄𐍄𐌰 𐌿𐌽𐍃𐌰𐍂'
'\U00010330\U00010344\U00010344\U00010330 \U0001033f\U0001033d\U00010343\U00010330\U00010342'
>>> len('𐌰𐍄𐍄𐌰 𐌿𐌽𐍃𐌰𐍂')
19
>>>

Looks like I’ve compiled targeting UCS2.
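The 19 is consistent with counting UTF-16 code units rather than characters: nine Gothic letters outside the BMP at two units each, plus one space. Here is a minimal sketch of that arithmetic, assuming a modern interpreter (Python 3.3 or later, where `len` counts code points regardless of how Python was built). The helper name `utf16_units` is made up for illustration and does not come from the session above.

```python
import sys

# The same string as in the session, written with escapes so the source stays ASCII.
text = ("\U00010330\U00010344\U00010344\U00010330 "
        "\U0001033f\U0001033d\U00010343\U00010330\U00010342")

def utf16_units(s):
    """Count UTF-16 code units, which is what len() reported on a UCS2 build."""
    # Each UTF-16 code unit is 2 bytes; the "-le" variant adds no byte order mark.
    return len(s.encode("utf-16-le")) // 2

# 0xFFFF indicates a narrow (UCS2) build; 0x10FFFF a wide build or Python 3.3+.
print("sys.maxunicode:", hex(sys.maxunicode))

code_points = len(text)            # 10 on Python 3.3+ (9 letters + 1 space)
code_units = utf16_units(text)     # 19 (9 * 2 + 1)
print(code_points, code_units)
```

On the UCS2 build shown above, `len(text)` itself would have returned the second number.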

Posted by Sam Ruby at

Okay, so it is the same situation as for Python 2.x.  Things get really confusing when you index into a string and get back half a character ...
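To make the "half a character" point concrete, here is a rough sketch that splits one of the Gothic letters into its two UTF-16 code units and puts them back together. It assumes Python 3.3 or later, where indexing a `str` no longer exposes surrogates, so the narrow-build behaviour is emulated by encoding to UTF-16. The helper name `utf16_code_units` is hypothetical.

```python
import struct

def utf16_code_units(s):
    """Return the UTF-16 code units of s as a tuple of ints."""
    data = s.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(data) // 2), data)

ch = "\U00010330"  # first letter of the Gothic example string
units = utf16_code_units(ch)
high, low = units
print([hex(u) for u in units])  # ['0xd800', '0xdf30']

# On a UCS2 build, ch[0] and ch[1] would be these two halves as separate items.
is_high_surrogate = 0xD800 <= high <= 0xDBFF
is_low_surrogate = 0xDC00 <= low <= 0xDFFF

# Recombining the pair recovers the original code point.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(code_point))  # 0x10330
```

Neither half is a valid character on its own, which is why slicing through the middle of a pair on a narrow build yields confusing results.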

Posted by James Henstridge at

Sam Ruby: Python 3.0a1

“>>> print(len('Iñtërnâtiônàlizætiøn')) 20”...

Excerpt from del.icio.us/edcrypt at

5 Apr 2008

Py3k I18n Improving on Sam Ruby’s example, to show that, in Python 3.0, code (names) can use Unicode characters (also, the default encoding of the interpreter is now UTF-8): $ python3.0 Python 3.0a3+ (py3k:61959, Mar 26 2008, 21:02:26) [GCC 4.2.3...

Excerpt from Advogato blog for eopadoan at
