PPTX files have their content type replaced with application/zip

vebjorn

20 Feb, 2012 01:37 PM via web

When I upload a PPTX file (from newer versions of Microsoft PowerPoint), the browser sends the correct content type:

------WebKitFormBoundarytbhfmfnqRIO83xgf
Content-Disposition: form-data; name="file"; filename="test.pptx"
Content-Type: application/vnd.openxmlformats-officedocument.presentationml.presentation

However, when I view the assembly (e.g., https://transloadit.com/assemblies/view/3dacaaf9531e1d15679407947d4...) on your web site, it lists "application/zip" as the content type. The incorrect "application/zip" is also what is being stored as the file's content type in S3 when the S3 robot saves the file.

As you can imagine, this becomes a problem when a user tries to download the file again from S3. How can I get around the problem?

  1. Support Staff 2 Posted by tim on 20 Feb, 2012 02:18 PM

    Hey there,

    I pushed a fix for this. Once our test suite has finished running, I will trigger a deployment.

    The fix should be live for you within the next 15min.

    Kind regards
    Tim

  2. Support Staff 3 Posted by tim on 20 Feb, 2012 02:40 PM

    The fix has been live for a couple minutes.

  3. 4 Posted by vebjorn on 21 Feb, 2012 10:27 AM

    Thanks for the fix, but the same thing is still happening:
    https://transloadit.com/assemblies/view/6f4cf8bdd4c64ec06424a246cd4...

  4. Support Staff 5 Posted by tim on 21 Feb, 2012 10:34 AM

    Hey vebjorn,

    It seems our version of exiftool identifies the file as a zip; that is the problem.

    root@dev2:~# exiftool b48bb5eaf9b1daf320a0386d78038a22
    ExifTool Version Number : 8.56
    File Name : b48bb5eaf9b1daf320a0386d78038a22
    Directory : .
    File Size : 34 kB
    File Modification Date/Time : 2012:02:21 10:26:44+00:00
    File Permissions : rw-r--r--
    File Type : ZIP
    MIME Type : application/zip
    Zip Required Version : 20
    Zip Bit Flag : 0x0006
    Zip Compression : Deflated
    Zip Modify Date : 1980:01:01 00:00:00
    Zip CRC : 0xb0bbdac2
    Zip Compressed Size : 497
    Zip Uncompressed Size : 3259
    Zip File Name : [Content_Types].xml
    root@dev2:~#

    I'll make sure we upgrade exiftool soon. I have already tested a newer version, and there the MIME type is detected correctly.

    Kind regards,
    Tim

  5. 6 Posted by vebjorn on 21 Feb, 2012 10:52 AM

    Tim, a PPTX file is actually a zip file, as far as the magic numbers are concerned. That is part of the problem here.

    Perhaps the storage robots should trust and use the Content-Type given by the uploading browser, even though the image processing robots use the magic numbers to see if the file is applicable to them. Would that work?
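To illustrate why the magic numbers alone cannot separate a PPTX from a plain zip, here is a minimal Python sketch. It is not how exiftool or Transloadit detect types; the function name `sniff_mime` and the lookup table are hypothetical. It assumes that an Office Open XML package lists its main part's content type in `[Content_Types].xml`, and that this manifest is UTF-8 encoded.

```python
import io
import zipfile

# Main-part content types found in [Content_Types].xml, mapped to the
# MIME type of the whole document. Illustrative table, not exhaustive.
OOXML_MAIN_TYPES = {
    "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def sniff_mime(data: bytes) -> str:
    """Guess a MIME type from file bytes (hypothetical helper)."""
    # Every zip-based format starts with the same local file header signature.
    if not data.startswith(b"PK\x03\x04"):
        return "application/octet-stream"
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            if "[Content_Types].xml" not in zf.namelist():
                return "application/zip"
            manifest = zf.read("[Content_Types].xml").decode("utf-8", "replace")
    except zipfile.BadZipFile:
        return "application/zip"
    # Only the manifest inside the archive tells the formats apart.
    for part_type, mime in OOXML_MAIN_TYPES.items():
        if part_type in manifest:
            return mime
    return "application/zip"
```

A detector that stops at the first four bytes can only ever answer "application/zip" for docx, xlsx, and pptx files alike.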

  6. Support Staff 7 Posted by tim on 21 Feb, 2012 12:42 PM

    Well, we'd rather use file inspection tools to find out the real MIME type based on the magic numbers and then decide what to do with the file.

    Storage robots accept all files anyway, but they send the MIME type found in the file.

    We have now upgraded exiftool, which should fix the problem. The upgrade will become available throughout the network over the next few hours.

    By the way, the s3 robot allows you to set custom headers with the "headers" parameter - did you see that already?

    Kind regards
    Tim

  7. 8 Posted by vebjorn on 21 Feb, 2012 08:16 PM

    If the fix has already propagated, it does not work (see https://transloadit.com/assemblies/view/dc8cf311cb7f5217ae29a44adae...). I have attached the relevant file in case that helps you.

    The phrases "the real mime type based on the magic numbers" and "the mime found in the file" are misnomers because the client uploading the file in general has more information available than just the magic numbers when deciding which content type to send. In particular, any Mac or Windows machine with Microsoft Office installed will be able to assign the correct content types to docx, xlsx, and pptx documents, even though all of these formats are zip files.

    I think the correct thing to do is to use exiftool and magic numbers when deciding whether a particular image processing robot can handle a file, but to pass along the original content type when sending the original file to S3.

    Setting the content type with the "headers" parameter is not an appealing solution because the user could upload files of any type (they are attachments to messages) and the file being uploaded does not go through my server. (After all, avoiding having to process files on my server is the main reason I use transloadit.)
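The split proposed above can be sketched in a few lines of Python. This is only an illustration of the policy, not Transloadit's implementation; the function names `can_process_as_image` and `storage_content_type` and the set of generic types are assumptions made for the example.

```python
from typing import Optional

# Types that carry no useful information about the real format (assumed list).
GENERIC_TYPES = {"application/zip", "application/octet-stream"}


def can_process_as_image(sniffed: str) -> bool:
    """Gate processing robots on the sniffed type only (hypothetical rule)."""
    return sniffed.startswith("image/")


def storage_content_type(declared: Optional[str], sniffed: str) -> str:
    """Pick the Content-Type to store alongside the original file.

    Prefer the type declared by the uploading client; fall back to the
    sniffed type when the client sent nothing or only a generic type.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    cleaned = (declared or "").split(";")[0].strip().lower()
    if cleaned and cleaned not in GENERIC_TYPES:
        return cleaned
    return sniffed
```

With this rule, a PPTX upload keeps the presentation content type sent by the browser, even if inspection of the bytes only yields "application/zip".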

  8. Support Staff 9 Posted by tim on 22 Feb, 2012 06:29 AM

    Hey vebjorn,

    I realized that the version of exiftool I tested this against was 8.76. That version could extract "application/vnd.openxmlformats-officedocument.presentationml.presentation". We have now upgraded to 8.79, which extracts "application/zip" again, so there is some confusion in exiftool as well.

    I will discuss with the team whether it's a good idea to pass on the original content type instead of relying completely on magic numbers and exiftool, and will then get back to you.

    Sorry for the inconvenience.

  9. Support Staff 10 Posted by tim on 26 Mar, 2012 07:57 AM

    Hey vebjorn,

    did you see my reply above?

  10. 11 Posted by vebjorn on 26 Mar, 2012 01:17 PM

    Where you said you will get back to me? Yes, I saw that reply.

  11. Support Staff 12 Posted by tim on 31 Mar, 2012 08:36 AM

    Hey vebjorn,

    Sorry for the delay. We have upgraded our metadata parsing tools, and the MIME type should now be extracted correctly. Please confirm this when you get a chance.

    Kind regards,
    Tim

  12. tim closed this discussion on 31 Mar, 2012 08:36 AM.
