0003451: Python doesn't recognize the string returned by the anime request as unicode - AniDB Bug Tracker

ID	Project	Category	View Status	Date Submitted	Last Update

0003451	AniDB HTTP API	Bug Report - Interface	public	2020-06-22 09:31	2020-06-23 08:14

Reporter	isrealityreallyreal	Assigned To	DerIdiot
Priority	normal	Severity	minor	Reproducibility	have not tried
Status	closed	Resolution	no change required

Summary	0003451: Python doesn't recognize the string returned by the anime request as unicode
Description	Let me preface this bug report with the stipulation that it is very possible that either:
1. The error is caused by something in my own code
2. I made an incorrect assumption that the response was supposed to be UTF-8 encoded

In case #1, I would kindly request that whoever responds to this ticket help me diagnose the problem (I minimized my code to only the relevant parts to help with this), but I can understand if you don't have time since this isn't your code. In case #2, all I would like to know is how the string is encoded so that I can rewrite my script to handle that.

For the third possibility (i.e. that this is a valid bug in the API), my description is as follows: I'm in the early stages of writing an AniDB HTTP API client and right now I'm just trying to print the XML data for the "anime" request. When I try to decode the bytes returned from the HTTP request, I'm getting `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte`. This would imply that the returned data is either not in fact UTF-8, or has some byte near the beginning that Python can't decode. The error happens on the line `print(data.decode(encoding="utf-8"))`.

I have provided a minimized version of the Python script I am using at the moment under "Steps to Reproduce" below. Your help is much appreciated.
Steps To Reproduce	Execute the following Python script (fingers crossed that github-style markdown works here):
```python
#!/usr/bin/python3
import time
import urllib.request as request

anidb_url = "http://api.anidb.net:9001/httpapi?client=listmaker&clientver=1&protover=1"

def get_info(aid):
    """Get the raw XML for the given aid from anidb."""
    time.sleep(3)  # make absolutely sure we don't request too often
    response = request.urlopen("{}&request=anime&aid={}".format(anidb_url, aid))
    data = response.read()
    print(data.decode(encoding="utf-8"))

get_info(7729)  # should print info for Steins;Gate, but instead raises UnicodeDecodeError
```
Tags	anime, Unicode, UTF-8, XML

isrealityreallyreal 2020-06-22 09:35 reporter ~0004432	Darn, looks like the markdown didn't work. Here's some syntax highlighting for my script if that helps. image.png (40,317 bytes)

DerIdiot 2020-06-23 08:14 administrator ~0004433	"All content is UTF8 encoded and gzip compressed (you may have to handle the decompressing yourself, if your HTTP library doesn't support compression of HTTP data). Transfer is in chunks, which are part of HTTP 1.1. This is the case even when your client requested uncompressed HTTP 1.0."
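
The byte in the reported error fits the note above: a gzip stream begins with the magic bytes 0x1f 0x8b, so position 0 decodes as an ASCII control character and 0x8b at position 1 is rejected as an invalid UTF-8 start byte. Below is a minimal sketch of decompressing before decoding, assuming the body is either gzip-compressed or plain UTF-8. `decode_body` is a hypothetical helper name, not part of the reporter's script or the API, and the sample XML is a made-up stand-in rather than real API output.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw: bytes) -> str:
    """Return the response body as text, gunzipping it first if needed.

    Hypothetical helper: assumes the payload is UTF-8, optionally wrapped
    in gzip, as the note above describes.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Offline round-trip with a made-up stand-in for the XML payload.
sample = "<title>Steins;Gate</title>"
compressed = gzip.compress(sample.encode("utf-8"))
print(decode_body(compressed))
```

In the script under "Steps To Reproduce", this would presumably mean replacing `print(data.decode(encoding="utf-8"))` with `print(decode_body(data))`. Checking the magic bytes rather than decompressing unconditionally keeps the helper usable if a response ever arrives uncompressed.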

Date Modified	Username	Field	Change
2020-06-22 09:31	isrealityreallyreal	New Issue
2020-06-22 09:31	isrealityreallyreal	Status	new => assigned
2020-06-22 09:31	isrealityreallyreal	Assigned To	=> Ommina
2020-06-22 09:31	isrealityreallyreal	Tag Attached: anime
2020-06-22 09:31	isrealityreallyreal	Tag Attached: Unicode
2020-06-22 09:31	isrealityreallyreal	Tag Attached: UTF-8
2020-06-22 09:31	isrealityreallyreal	Tag Attached: XML
2020-06-22 09:35	isrealityreallyreal	File Added: image.png
2020-06-22 09:35	isrealityreallyreal	Note Added: 0004432
2020-06-23 08:09	DerIdiot	Assigned To	Ommina =>
2020-06-23 08:09	DerIdiot	Status	assigned => new
2020-06-23 08:14	DerIdiot	Assigned To	=> DerIdiot
2020-06-23 08:14	DerIdiot	Status	new => closed
2020-06-23 08:14	DerIdiot	Resolution	open => no change required
2020-06-23 08:14	DerIdiot	Note Added: 0004433