View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003451 | AniDB HTTP API | Bug Report - Interface | public | 2020-06-22 09:31 | 2020-06-23 08:14 |
Reporter | isrealityreallyreal | Assigned To | DerIdiot | ||
Priority | normal | Severity | minor | Reproducibility | have not tried |
Status | closed | Resolution | no change required | ||
Summary | 0003451: Python doesn't recognize the string returned by the anime request as unicode | ||||
Description | Let me preface this bug report with the stipulation that it is very possible that either: 1. The error is caused by something in my own code, or 2. I made an incorrect assumption that the response was supposed to be UTF-8 encoded. In case #1, I would kindly request that whoever responds to this ticket help me diagnose the problem (I minimized my code to only the relevant parts to help with this), but I can understand if you don't have time since this isn't your code. In case #2, all I would like to know is how the string is encoded so that I can rewrite my script to handle that. For the third possibility (i.e. that this is a valid bug in the API), my description is as follows: I'm in the early stages of writing an AniDB HTTP API client and right now I'm just trying to print the XML data for the "anime" request. When I try to decode the string returned from the HTTP request, I'm getting `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte`. This would imply that the returned string is either not in fact Unicode, or has some character at the beginning that Python doesn't recognize. The error happens on the line `print(data.decode(encoding="utf-8"))`. I have provided a minimized version of the Python script I am using at the moment under "Steps to Reproduce" below. Your help is much appreciated. | ||||
Steps To Reproduce | Execute the following Python script (fingers crossed that GitHub-style markdown works here): | ||||

```python
#!/usr/bin/python3
import time  # needed for time.sleep() below; missing from the minimized script
import urllib.request as request

anidb_url = "http://api.anidb.net:9001/httpapi?client=listmaker&clientver=1&protover=1"


def get_info(aid):
    """Get the raw XML for the given aid from anidb."""
    time.sleep(3)  # make absolutely sure we don't request too often
    response = request.urlopen("{}&request=anime&aid={}".format(anidb_url, aid))
    data = response.read()
    print(data.decode(encoding="utf-8"))


get_info(7729)  # should print info for Steins;Gate, but instead raises UnicodeDecodeError
```
Tags | anime, Unicode, UTF-8, XML | ||||
|
Darn, looks like the markdown didn't work. Here's some syntax highlighting for my script if that helps. |
|
"All content is UTF8 encoded and gzip compressed (you may have to handle the decompressing yourself, if your HTTP library doesn't support compression of HTTP data). Transfer is in chunks, which are part of HTTP 1.1. This is the case even when your client requested uncompressed HTTP 1.0." |
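
The quoted documentation accounts for the traceback in the description: `0x8b` at position 1 is the second byte of the gzip magic number (`0x1f 0x8b`), so the body is compressed rather than mis-encoded. Below is a minimal sketch of handling that in Python, assuming the body reaches the script still gzip compressed (`urllib.request` does not decompress it for you). The helper name `decode_body` and the sample XML are made up for illustration and are not part of the AniDB API; the demonstration compresses a local string instead of sending a request.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw: bytes) -> str:
    """Return the text of a response body that may be gzip compressed.

    Hypothetical helper: decompresses only when the gzip magic number is
    present, then decodes the result as UTF-8.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Offline demonstration with a locally compressed payload (no request is sent).
sample = '<anime id="7729"><title>Steins;Gate</title></anime>'  # made-up XML
compressed = gzip.compress(sample.encode("utf-8"))

print(hex(compressed[1]))       # the byte the original traceback complained about
print(decode_body(compressed))  # the XML text, decompressed and decoded
```

In the script from "Steps To Reproduce", the failing line would then become something like `print(decode_body(response.read()))`. Checking the magic bytes rather than decompressing unconditionally is a design choice that keeps the helper usable if a body ever arrives uncompressed.
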
Date Modified | Username | Field | Change |
---|---|---|---|
2020-06-22 09:31 | isrealityreallyreal | New Issue | |
2020-06-22 09:31 | isrealityreallyreal | Status | new => assigned |
2020-06-22 09:31 | isrealityreallyreal | Assigned To | => Ommina |
2020-06-22 09:31 | isrealityreallyreal | Tag Attached: anime | |
2020-06-22 09:31 | isrealityreallyreal | Tag Attached: Unicode | |
2020-06-22 09:31 | isrealityreallyreal | Tag Attached: UTF-8 | |
2020-06-22 09:31 | isrealityreallyreal | Tag Attached: XML | |
2020-06-22 09:35 | isrealityreallyreal | File Added: image.png | |
2020-06-22 09:35 | isrealityreallyreal | Note Added: 0004432 | |
2020-06-23 08:09 | DerIdiot | Assigned To | Ommina => |
2020-06-23 08:09 | DerIdiot | Status | assigned => new |
2020-06-23 08:14 | DerIdiot | Assigned To | => DerIdiot |
2020-06-23 08:14 | DerIdiot | Status | new => closed |
2020-06-23 08:14 | DerIdiot | Resolution | open => no change required |
2020-06-23 08:14 | DerIdiot | Note Added: 0004433 |