We recently deployed a web-based system to an old Red Hat installation that still used Latin-1 as the primary encoding for its filesystem. This caused some head-scratching, because we had developed on Ubuntu, which uses UTF-8 for filenames. The files were copied to the production server using rsync. When we ran ls in a terminal everything seemed to be fine, because the terminal itself was configured for UTF-8 and therefore decoded the filenames correctly when printing them. However, the web server was unable to serve images whose filenames contained the Finnish characters ä and ö.
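To make the mismatch concrete, here is a small Python 3 sketch of how the same name ends up as different bytes on disk depending on the encoding. The file name is a made-up example, and the last step only illustrates what a program assuming Latin-1 would make of UTF-8 bytes; it is not a claim about what the web server did internally.

```python
# "päivä.jpg", a hypothetical file name used only for illustration
name = "p\u00e4iv\u00e4.jpg"

# The same characters become different byte sequences on disk
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

print(utf8_bytes)    # b'p\xc3\xa4iv\xc3\xa4.jpg'
print(latin1_bytes)  # b'p\xe4iv\xe4.jpg'

# A program that assumes Latin-1 reads the UTF-8 bytes as two
# characters per "ä", so the name no longer matches what was requested
mojibake = utf8_bytes.decode("latin-1")
# mojibake == "pÃ¤ivÃ¤.jpg"
```

The two byte strings are simply different file names as far as the filesystem is concerned, which is why a lookup using one encoding misses a file stored under the other.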
Once the head-scratching was over, we had figured out that renaming the files so that their names were encoded in Latin-1 made the web server serve them fine. If it were a fresh start I’d rather configure the server itself to use the now prevalent UTF-8, but because the infrastructure was shared with other, older projects, this was out of the question.
Thus, the script below was created to address the filename encoding problem (syntax highlighting is available in the original publication):
#!/usr/bin/python
"""

    Recursively fix filename encoding problems

    http://www.opensourcehacker.com, MIT licensed

"""

import os

# Source filename encoding
FROM="utf-8"

# Target filename encoding
TO="latin-1"

# Current working directory
PATH=os.getcwd()

for root, dirs, files in os.walk(PATH):

    # Assume files are 8-bit strings in the native encoding
    # of the system which we cannot know
    for f in files:

        try:
            decoded = f.decode(FROM)
        except UnicodeDecodeError:
            print "Cannot decode:" + f
            continue

        fullpath = os.path.join(root, f)
        newpath = os.path.join(root, decoded.encode(TO))
        if newpath != fullpath:
            print "Renaming:" + fullpath + " to:" + newpath
            os.rename(fullpath, newpath)
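The script above is written for Python 2 (print statement, byte-string filenames). For anyone on Python 3, the same idea could be sketched roughly as follows. This is not the script we ran in production: the function names `plan_renames` and `apply_renames` are made up for this example, and splitting the work into a planning step and an applying step is an addition, so that the list of renames can be reviewed before anything is touched.

```python
import os


def plan_renames(top, source="utf-8", target="latin-1"):
    """Return (old_path, new_path) byte-path pairs for files under `top`
    whose names change when re-encoded from `source` to `target`."""
    plan = []
    # Walking a bytes path makes os.walk return raw bytes names,
    # so no implicit decoding by Python gets in the way
    for root, dirs, files in os.walk(os.fsencode(top)):
        for name in files:
            try:
                new_name = name.decode(source).encode(target)
            except UnicodeError:
                # Not valid in the source encoding, or not
                # representable in the target encoding: skip it
                continue
            if new_name != name:
                plan.append((os.path.join(root, name),
                             os.path.join(root, new_name)))
    return plan


def apply_renames(plan):
    """Perform the planned renames, skipping any that would overwrite
    an existing file. Returns the pairs that were actually renamed."""
    done = []
    for old, new in plan:
        if os.path.exists(new):
            continue
        os.rename(old, new)
        done.append((old, new))
    return done
```

A typical run would be `plan = plan_renames(".")`, a look at the pairs, and then `apply_renames(plan)`. Like the original script, this sketch renames files only and leaves directory names alone.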
I still don’t know how to verify, check or guess which encoding a Linux filesystem is using (if not UTF-8), so someone please enlighten me. In this case we found out by a smart guess and testing: we created two files, one with its name encoded in UTF-8 and one in Latin-1, and checked which of them the web server would serve. I am also unsure whether the fact that the files were originally kept in a Subversion repository and committed there from Windows had anything to do with the problem.
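The guess-and-test approach could be scripted along these lines in Python 3. This is only a heuristic sketch and all function names are invented for the example. It relies on the fact that common Linux filesystems store filenames as plain bytes, so the best one can do is look at the bytes: names that are not valid UTF-8 were written in some other encoding (possibly Latin-1), although a Latin-1 name can occasionally happen to be valid UTF-8 as well.

```python
import os
from collections import Counter


def classify_name(raw):
    """Classify a bytes file name as 'ascii', 'utf-8' or 'other'."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: some 8-bit encoding such as Latin-1
        return "other"


def survey(top):
    """Count how many file and directory names under `top` fall
    into each class."""
    counts = Counter()
    for root, dirs, files in os.walk(os.fsencode(top)):
        for name in dirs + files:
            counts[classify_name(name)] += 1
    return counts


def create_probe_files(directory, stem="\u00e4\u00f6"):
    """Create the same visible name ("äö.txt" by default) twice, once
    per encoding. Each file's content says which encoding its name uses."""
    created = []
    for encoding in ("utf-8", "latin-1"):
        name = stem.encode(encoding) + b".txt"
        path = os.path.join(os.fsencode(directory), name)
        with open(path, "wb") as handle:
            handle.write(encoding.encode("ascii") + b"\n")
        created.append(path)
    return created
```

After `create_probe_files()` has been run in a directory the web server exposes, requesting äö.txt through the server and reading the response body shows which of the two names it resolved. Running `survey()` over existing data gives a rough idea of what encoding other files on the machine were written with.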