UnicodeEncodeError at /upload

Via email, here’s an issue we came across whilst trying to publish a course:

UnicodeEncodeError at /upload

'ascii' codec can't encode character '\u2019' in position 88: ordinal not in range(128)

Request Method: POST
Request URL: - OppiaMobile
Django Version: 2.2.24
Exception Type: UnicodeEncodeError
Exception Value: 'ascii' codec can't encode character '\u2019' in position 88: ordinal not in range(128)
Exception Location: /usr/lib/python3.8/zipfile.py in _extract_member, line 1701
Python Executable: /usr/bin/python3
Python Version: 3.8.10
Python Path: ['/home/staging/django-oppia', '/home/staging/env/lib/python3.8/site-packages', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages']
Server time: Thu, 9 Sep 2021 16:47:08 +0000

Unicode error hint

The string that could not be encoded/decoded was: Almaz’s cas

@jjoseba Just to let you know what I found and tried here…

On my local Moodle and Oppia server, I don’t get this error. However, I’m using MySQL 5.7 locally. If I publish to our staging server (which runs MySQL 8.x), I get the same error. I noticed that the default collation differs between the two versions: utf8mb4_general_ci (5.7) vs utf8mb4_0900_ai_ci (8.x).

I tried exporting the database, changing the collations in the script, then reloading the database (so all was specified as utf8mb4_general_ci), but the problem remains.

Once I was able to replicate this, it turned out the problem was not caused by the different database collations. The error message pointing at the unzipping step was not misleading: after debugging the upload process, I found it fails right at the zip extraction. I searched the course XML for the string containing the character Django is complaining about (“Almaz’s cas”) and… it does not appear! It is part of the filename of one of the images referenced in the course.
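A quick way to see why a name like this can trip the extraction step: a Python process whose filesystem encoding is ASCII cannot encode U+2019 at all. Whether that is really how the staging server is configured is an assumption here, not something confirmed in this thread; the helper name `encodable_on_filesystem` is made up for illustration.

```python
import sys


def encodable_on_filesystem(name, encoding=None):
    """Return True if `name` can be encoded with `encoding`.

    When no encoding is given, the current process's filesystem
    encoding is used (hypothetical helper, for illustration only).
    """
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# One of the image names from this course, shortened.
name = "Illustration for Almaz\u2019s case study.jpg"

# Shows what the current process would use when creating files.
print("filesystem encoding:", sys.getfilesystemencoding())
print("ascii ok:", encodable_on_filesystem(name, "ascii"))
print("utf-8 ok:", encodable_on_filesystem(name, "utf-8"))
```

Running the same check inside the Django process on each server would show whether the two environments differ in this respect.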

So the issue is not in writing that special character into the database, but in handling a zip file that contains non-ASCII characters in some of its paths. There are two possible solutions for this:

  • Instead of relying on the extractall() method of Python’s zipfile library, manage the unzipping of the course package manually and encode each path explicitly to avoid encoding mismatches.
  • Avoid using special characters in filenames in the Moodle export block.
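The first option could look roughly like the sketch below. It is not the actual Oppia code: the function name `extract_utf8` is hypothetical, and it assumes the server's filesystem accepts UTF-8 byte names. Each member is written to a bytes path encoded as UTF-8, so Python never has to encode the name with the process's default (possibly ASCII) encoding.

```python
import os
import shutil
import zipfile


def extract_utf8(zip_path, dest_dir):
    """Extract every member, writing names to disk as UTF-8 bytes paths.

    Hypothetical helper: returns the list of extracted file paths (bytes).
    """
    dest_bytes = os.path.abspath(dest_dir).encode("utf-8")
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Drop empty, "." and ".." components so a member cannot
            # escape the destination directory.
            parts = [p for p in info.filename.replace("\\", "/").split("/")
                     if p not in ("", ".", "..")]
            if not parts:
                continue
            target = os.path.join(dest_bytes,
                                  *(p.encode("utf-8") for p in parts))
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Stream the member's content to the bytes path.
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return extracted
```

Any later code that opens these files would also need to use the same byte paths (or run with a UTF-8 filesystem encoding), so this only helps if the rest of the upload process is adapted accordingly.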

Meanwhile, as a temporary fix for this course (or others with similar issues), the files can be renamed manually to avoid the special characters. In this specific course, these are the files with a non-ASCII character (a right single quotation mark, U+2019):

ILLU_2.2.2.A.  Illustration for Almaz’s case study.jpg
ILLU_2.2.2.B.  Illustration for Fatuma’s case study.jpg
ILLU_3.1.2.A. Illustration for Lemlem’s case study.jpg
ILLU_3.2.1.A. Illustration for Aynalem’s case study.jpg
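To find such files before uploading, the course package can be scanned for member names with non-ASCII characters. This is only a sketch of a manual pre-upload check, not something that exists in Oppia; the function name `non_ascii_members` is hypothetical, and `str.isascii()` needs Python 3.7 or later.

```python
import zipfile


def non_ascii_members(zip_path):
    """Return the archive member names containing any non-ASCII character."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if not name.isascii()]
```

Calling `non_ascii_members("course.zip")` (with the path to the exported package) returns an empty list when every filename is plain ASCII.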

@jjoseba, you got this. And this one’s on me - I missed checking the file name prior to upload - sincere apologies.

The course published successfully after the filename was corrected.

Thank you for all your amazing support on this one.

Isaac

Glad to help!
Anyway, I think this should be detected and handled by the system, so it’s something to add to the development roadmap. Next time there will be no need for those extra checks :)

That sounds amazing!