Web Framework (97)

37 Name: #!/usr/bin/anonymous : 2008-02-05 19:25 ID:i+ITJfDJ

>>35 Þ sorts differently in different languages.
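To make this concrete, here is a rough Python 3 sketch. The sessions in this post are Python 2.5, so this one is not from them. The two sort keys are hypothetical, deliberately oversimplified rules, not real locale collation; an actual application would use something like `locale.strxfrm` or an ICU collator. The word list is the one from test.txt.

```python
# Two *hypothetical*, heavily simplified collation rules. Real collation
# (locale.strxfrm, ICU/CLDR tables) handles far more than this.
WORDS = ["PAGAN", "PORN", "PRALINE", "TANK", "THEME",
         "TITMOUSE", "YARN", "YOUHOU", "ZEBRA", "ÞORN"]

# In UTF-8, Þ (U+00DE) is the two bytes C3 9E. Both are above every ASCII
# letter, which is why a plain byte-wise sort puts ÞORN last.
assert "Þ".encode("utf-8") == b"\xc3\x9e"


def codepoint_key(word: str) -> str:
    # Rule A (assumed): plain code-point order, i.e. Þ is a letter after Z.
    return word


def english_th_key(word: str) -> str:
    # Rule B (assumed): English-style, file Þ/þ under the digraph "TH".
    return word.replace("Þ", "TH").replace("þ", "th")


by_codepoint = sorted(WORDS, key=codepoint_key)
by_english_th = sorted(WORDS, key=english_th_key)

# Position of the same word under each rule.
print(by_codepoint.index("ÞORN"), by_english_th.index("ÞORN"))  # 9 5
```

Rule A leaves ÞORN after ZEBRA, which matches a byte-wise sort of the UTF-8 file. Rule B files it between THEME and TITMOUSE. Neither is "the" right answer unless you know which language the reader expects.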

> Now if you were just blindly manipulating strings, as you suggest, you would have a problem,

Read my post again. I didn't say anything about blindly doing anything: I actually said the exact opposite.

>>> print sorted(file("test.txt","r").readlines())
['PAGAN\n', 'PORN\n', 'PRALINE\n', 'TANK\n', 'THEME\n', 'TITMOUSE\n', 'YARN\n', 'YOUHOU\n', 'ZEBRA\n', '\xc3\x9eORN\n']

What's the encoding of this file again? I know, let's assume utf-8!

>>> for i in file("test.txt","r").readlines(): print i.decode('utf-8')

Well that seems to work. Let's just hope users never actually control the contents of test.txt:

>>> for i in file("test.txt","r").readlines(): print i.decode('utf-8')
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid data
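The traceback doesn't say what bytes were in the file. Here is a Python 3 sketch that reproduces the same kind of failure under one assumption: the user saved the file as Latin-1, where Þ is the single byte 0xDE. The helper `first_bad_byte` is hypothetical, written just for this illustration.

```python
# Assumption: the "bad" upload was Latin-1, where Þ is the single byte 0xDE.
utf8_line = "ÞORN\n".encode("utf-8")      # b'\xc3\x9eORN\n'
latin1_line = "ÞORN\n".encode("latin-1")  # b'\xdeORN\n'


def first_bad_byte(raw: bytes):
    """Return the offset where strict UTF-8 decoding fails, or None."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # 0xDE starts a two-byte UTF-8 sequence, but the next byte, 'O'
        # (0x4F), is not a continuation byte, so decoding fails at offset 0.
        return err.start


print(first_bad_byte(utf8_line))    # None
print(first_bad_byte(latin1_line))  # 0
```

Same word, same letters, two encodings. Only one of them survives `decode('utf-8')`.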

It happens all the time. People do a lot of work, then some user posts non-ASCII text in an encoding nobody expected, and through the magic of transcoding the entire database is hosed.
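One way to keep that from reaching the database: decode exactly once, at the point where bytes come in, and store only text after that. This is a minimal Python 3 sketch under that assumption. `decode_user_input` and its `fallback` parameter are hypothetical, not from any framework.

```python
from typing import Optional


def decode_user_input(raw: bytes, fallback: Optional[str] = None) -> str:
    """Hypothetical boundary helper: turn untrusted bytes into str once.

    Strict UTF-8 is tried first. If that fails, either reject the input
    (fallback=None) or decode with an explicitly chosen fallback codec.
    """
    try:
        return raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        if fallback is None:
            # Refuse to guess: a rejected post beats a corrupted row.
            raise ValueError("input is not valid UTF-8") from None
        # A fallback such as "latin-1" maps every byte to some character,
        # so it never fails. It is a guess, not a validation step.
        return raw.decode(fallback)


# Usage with hypothetical inputs:
try:
    decode_user_input(b"\xdeORN")
except ValueError as exc:
    print("rejected:", exc)  # rejected: input is not valid UTF-8
```

Rejecting is the conservative default here. A Latin-1 fallback accepts anything, which is exactly why it can silently turn garbage into plausible-looking text.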
