View Issue Details

ID: 0003451
Project: AniDB HTTP API
Category: Bug Report - Interface
View Status: public
Last Update: 2020-06-23 08:14
Reporter: isrealityreallyreal
Assigned To: DerIdiot
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: no change required
Summary: 0003451: Python doesn't recognize the string returned by the anime request as unicode
Description: Let me preface this bug report with the stipulation that it is very possible that either:

1. The error is caused by something in my own code
2. I made an incorrect assumption that the response was supposed to be UTF-8 encoded

In case #1, I would kindly request that whoever responds to this ticket help me diagnose the problem (I minimized my code to only the relevant parts to help with this), but I can understand if you don't have time since this isn't your code. In case #2, all I would like to know is how the string is encoded so that I can rewrite my script to handle that. For the third possibility (i.e. that this is a valid bug in the API), my description is as follows:

I'm in the early stages of writing an AniDB HTTP API client, and right now I'm just trying to print the XML data for the "anime" request. When I try to decode the bytes returned from the HTTP request, I get `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte`. This would imply that the returned data is either not in fact UTF-8, or has some byte near the beginning that Python doesn't recognize. The error happens on the line `print(data.decode(encoding="utf-8"))`.

I have provided a minimized version of the Python script I am using at the moment under "Steps to Reproduce" below. Your help is much appreciated.
Steps To Reproduce: Execute the following Python script (fingers crossed that GitHub-style markdown works here):

```python
#!/usr/bin/python3

import time  # needed for the time.sleep() call in get_info
import urllib.request as request

anidb_url = "http://api.anidb.net:9001/httpapi?client=listmaker&clientver=1&protover=1"

def get_info(aid):
    """Get the raw XML for the given aid from anidb."""
    time.sleep(3) # make absolutely sure we don't request too often
    response = request.urlopen("{}&request=anime&aid={}".format(anidb_url, aid))
    data = response.read()
    print(data.decode(encoding="utf-8"))

get_info(7729) # should print info for Steins;Gate, but instead raises UnicodeDecodeError
```
Tags: anime, Unicode, UTF-8, XML

Activities

isrealityreallyreal

2020-06-22 09:35

reporter   ~0004432

Darn, looks like the markdown didn't work. Here's some syntax highlighting for my script if that helps.
image.png (40,317 bytes)   

DerIdiot

2020-06-23 08:14

administrator   ~0004433

"All content is UTF8 encoded and gzip compressed (you may have to handle the decompressing yourself, if your HTTP library doesn't support compression of HTTP data). Transfer is in chunks, which are part of HTTP 1.1. This is the case even when your client requested uncompressed HTTP 1.0. "
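For readers hitting the same error: the byte 0x8b at position 1 is consistent with the two-byte gzip magic number (0x1f 0x8b), which fits the note above that responses are gzip compressed. The following is a minimal sketch, not an official client snippet, of decompressing before decoding. The helper name `decode_body` and the sample XML are made up for illustration, and the demonstration uses a locally compressed stand-in payload rather than a real API response.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw: bytes) -> str:
    """Return the body as text, gunzipping first if it looks gzip compressed.

    Hypothetical helper: assumes the payload is UTF-8 once decompressed,
    as the quoted note states.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Offline demonstration with a stand-in payload (not real AniDB output).
sample = gzip.compress('<anime id="7729">Steins;Gate</anime>'.encode("utf-8"))
print(decode_body(sample))
```

In the script from the report, this would presumably mean replacing `print(data.decode(encoding="utf-8"))` with `print(decode_body(data))`. Checking the magic bytes rather than decompressing unconditionally keeps the helper usable if a response ever arrives uncompressed.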

Issue History

Date Modified Username Field Change
2020-06-22 09:31 isrealityreallyreal New Issue
2020-06-22 09:31 isrealityreallyreal Status new => assigned
2020-06-22 09:31 isrealityreallyreal Assigned To => Ommina
2020-06-22 09:31 isrealityreallyreal Tag Attached: anime
2020-06-22 09:31 isrealityreallyreal Tag Attached: Unicode
2020-06-22 09:31 isrealityreallyreal Tag Attached: UTF-8
2020-06-22 09:31 isrealityreallyreal Tag Attached: XML
2020-06-22 09:35 isrealityreallyreal File Added: image.png
2020-06-22 09:35 isrealityreallyreal Note Added: 0004432
2020-06-23 08:09 DerIdiot Assigned To Ommina =>
2020-06-23 08:09 DerIdiot Status assigned => new
2020-06-23 08:14 DerIdiot Assigned To => DerIdiot
2020-06-23 08:14 DerIdiot Status new => closed
2020-06-23 08:14 DerIdiot Resolution open => no change required
2020-06-23 08:14 DerIdiot Note Added: 0004433