PPTX files have their content type replaced with application/zip
When I upload a PPTX file (from newer versions of Microsoft PowerPoint), the browser sends the correct content type:
------WebKitFormBoundarytbhfmfnqRIO83xgf
Content-Disposition: form-data; name="file"; filename="test.pptx"
Content-Type: application/vnd.openxmlformats-officedocument.presentationml.presentation
However, when I view the assembly (e.g., https://transloadit.com/assemblies/view/3dacaaf9531e1d15679407947d4...) on your web site, it lists "application/zip" as the content type. The incorrect "application/zip" is also what is being stored as the file's content type in S3 when the S3 robot saves the file.
As you can imagine, this becomes a problem when a user tries to download the file again from S3. How can I get around the problem?
Support Staff 2 Posted by tim on 20 Feb, 2012 02:18 PM
Hey there,
I pushed a fix for this. Once our test suite has finished running, I will trigger a deployment.
The fix should be live for you within the next 15min.
Kind regards
Tim
Support Staff 3 Posted by tim on 20 Feb, 2012 02:40 PM
The fix has been live for a couple minutes.
4 Posted by vebjorn on 21 Feb, 2012 10:27 AM
Thanks for the fix, but still the same thing is happening:
https://transloadit.com/assemblies/view/6f4cf8bdd4c64ec06424a246cd4...
Support Staff 5 Posted by tim on 21 Feb, 2012 10:34 AM
Hey vebjorn,
it seems our version of exiftool identifies the file as a zip; that is the problem.
root@dev2:~# exiftool b48bb5eaf9b1daf320a0386d78038a22
ExifTool Version Number : 8.56
File Name : b48bb5eaf9b1daf320a0386d78038a22
Directory : .
File Size : 34 kB
File Modification Date/Time : 2012:02:21 10:26:44+00:00
File Permissions : rw-r--r--
File Type : ZIP
MIME Type : application/zip
Zip Required Version : 20
Zip Bit Flag : 0x0006
Zip Compression : Deflated
Zip Modify Date : 1980:01:01 00:00:00
Zip CRC : 0xb0bbdac2
Zip Compressed Size : 497
Zip Uncompressed Size : 3259
Zip File Name : [Content_Types].xml
root@dev2:~#
I'll make sure we upgrade exiftool soonish. I have already tested it on a newer version and there the mime type is detected fine.
Kind regards,
Tim
6 Posted by vebjorn on 21 Feb, 2012 10:52 AM
Tim, a PPTX file is actually a zip file, as far as the magic numbers are concerned. That is part of the problem here.
Perhaps the storage robots should trust and use the Content-Type given by the uploading browser, even though the image processing robots use the magic numbers to see if the file is applicable to them. Would that work?
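To illustrate why magic numbers alone are not enough here, a minimal Python sketch follows. It is not how exiftool or Transloadit detect types; the function names sniff_mime and make_fake_pptx are made up for this example. It assumes the usual Open XML packaging, where the archive contains a [Content_Types].xml entry (the same entry visible in the exiftool output above) whose Override elements name the main document part. A sniffer that stops at the first four bytes sees only a zip; one that opens the archive and reads that entry can tell a pptx from a plain zip.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature shared by every non-empty zip
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Content type of the main part inside the package -> MIME type of the document.
MAIN_PART_TO_MIME = {
    "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml":
        "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml":
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml":
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def sniff_mime(data: bytes) -> str:
    """Hypothetical sniffer: magic number first, then a look inside the zip."""
    if not data.startswith(ZIP_MAGIC):
        return "application/octet-stream"
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            root = ET.fromstring(zf.read("[Content_Types].xml"))
    except (KeyError, zipfile.BadZipFile, ET.ParseError):
        # No readable [Content_Types].xml: treat it as an ordinary zip.
        return "application/zip"
    for override in root.iter(CT_NS + "Override"):
        mime = MAIN_PART_TO_MIME.get(override.get("ContentType", ""))
        if mime:
            return mime
    return "application/zip"


def make_fake_pptx() -> bytes:
    """Build a tiny stand-in archive with only the entry the sniffer reads.

    This is NOT a valid presentation; it exists purely to exercise sniff_mime.
    """
    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Override PartName="/ppt/presentation.xml" ContentType="'
        'application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml"/>'
        '</Types>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
    return buf.getvalue()


if __name__ == "__main__":
    data = make_fake_pptx()
    print(data[:4] == ZIP_MAGIC)  # the first four bytes are the zip signature
    print(sniff_mime(data))
```

The first four bytes of the stand-in file are identical to those of any other zip archive, which is why a magic-number check on its own reports application/zip.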
Support Staff 7 Posted by tim on 21 Feb, 2012 12:42 PM
Well we'd rather use some file inspection tools to find out the real mime type based on the magic numbers and then decide what to do with it.
Storage robots accept all files anyway - but they send the mime found in the file.
We have now upgraded exiftool, which should fix the problem. The upgrade will become available throughout the network over the next hours.
By the way, the s3 robot allows you to set custom headers with the "headers" parameter - did you see that already?
Kind regards
Tim
8 Posted by vebjorn on 21 Feb, 2012 08:16 PM
If the fix has propagated yet, it does not work (see https://transloadit.com/assemblies/view/dc8cf311cb7f5217ae29a44adae...). I have attached the relevant file in case that helps you.
The phrases "the real mime type based on the magic numbers" and "the mime found in the file" are misnomers because the client uploading the file in general has more information available than just the magic numbers when deciding which content type to send. In particular, any Mac or Windows machine with Microsoft Office installed will be able to assign the correct content types to docx, xlsx, and pptx documents, even though all of these formats are zip files.
I think the correct thing to do is to use exiftool and magic numbers when deciding whether a particular image processing robot can handle a file, but to pass along the original content type when sending the original file to S3.
Setting the content type with the "headers" parameter is not an appealing solution because the user could upload files of any type (they are attachments to messages) and the file being uploaded does not go through my server. (After all, avoiding having to process files on my server is the main reason I use transloadit.)
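A rough Python sketch of the split I am suggesting is below. It says nothing about how the robots are actually implemented; UploadedFile, routing_type, and storage_type are hypothetical names used only for illustration. The idea is that the sniffed type decides which robots may process a file, while the type declared by the uploading client is what gets stored, with the sniffed type as a fallback when the client sent none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadedFile:
    """Hypothetical record of what is known about one upload."""
    name: str
    declared_type: Optional[str]  # Content-Type sent by the browser, if any
    sniffed_type: str             # type found by inspecting the file's bytes


def routing_type(f: UploadedFile) -> str:
    """Type used to decide whether a processing robot can handle the file.

    Always the sniffed type, so a mislabelled upload cannot reach a robot
    that would choke on it.
    """
    return f.sniffed_type


def storage_type(f: UploadedFile) -> str:
    """Type stored alongside the original file (e.g. as Content-Type in S3).

    Prefers the client's declaration and falls back to the sniffed type
    when the declaration is missing or blank.
    """
    declared = (f.declared_type or "").strip()
    return declared if declared else f.sniffed_type


if __name__ == "__main__":
    pptx = UploadedFile(
        name="test.pptx",
        declared_type="application/vnd.openxmlformats-officedocument.presentationml.presentation",
        sniffed_type="application/zip",
    )
    print(routing_type(pptx))
    print(storage_type(pptx))
```

With this split, the file from my first message would still be treated as a zip by the processing robots but would be stored with the presentation content type the browser sent.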
Support Staff 9 Posted by tim on 22 Feb, 2012 06:29 AM
Hey vebjorn,
I realized that the version of exiftool I tested this against was 8.76. That version extracted "application/vnd.openxmlformats-officedocument.presentationml.presentation". We have now upgraded to 8.79, which extracts "application/zip" again, so there is some confusion in exiftool as well.
I will discuss within the team whether it's a good idea to pass on the original content-type instead of relying completely on magic numbers/exiftool and will then get back to you.
Sorry for the inconvenience.
Support Staff 10 Posted by tim on 26 Mar, 2012 07:57 AM
Hey vebjorn,
did you see my reply above?
11 Posted by vebjorn on 26 Mar, 2012 01:17 PM
Where you said you will get back to me? Yes, I saw that reply.
Support Staff 12 Posted by tim on 31 Mar, 2012 08:36 AM
Hey vebjorn,
sorry for the delay. We have upgraded our meta parsing tools and the mime type should be extracted fine now. Please confirm this when you get a chance.
Kind regards,
Tim
tim closed this discussion on 31 Mar, 2012 08:36 AM.