Firefox - gedit is the best!

Ian Malone ibmalone at gmail.com
Tue Oct 29 10:36:41 UTC 2013


On 29 October 2013 09:07, Mateusz Marzantowicz
<mmarzantowicz at osdf.com.pl> wrote:
> On 29.10.2013 09:17, Ian Malone wrote:
>> On 29 October 2013 04:47, Tim <ignored_mailbox at yahoo.com.au> wrote:

>>> There are any number of different types of files
>>> (function-wise) that are the same file-type (construction-wise), so they
>>> need to be correctly identified by whatever is sending them, as that is
>>> the only thing that would correctly know what they are.
>>
>> This, and the general problem of correctly identifying every data
>> type and version under the sun, is the reason not to try to sniff
>> the data type.
>>
>
> OK, I know all that argumentation about security, but as you've mentioned,
> HTTP headers can easily be manipulated. Content recognition must be
> done somewhere, in this case on the web server, in order to set the headers
> correctly. There will always be a need for content inspection. So which is
> better: checking content on the server side or the client side? From the
> client's perspective the latter is safer, because it doesn't have to trust
> some remote entity. My sample URL showed that even GitHub isn't perfect and
> sets improper headers for some files (or does so by choice). Finally,
> client software and web browsers should not be fragile in the face of
> miscellaneous and manipulated content - they should just recognize it as such.
>

This is irrelevant; they are two different things: security and the
intended interpretation of the data. Security in this context comes
down to being suspicious that what you get may not be what it claims
to be. The client does not (and should not) *trust* the content type.
But the application meant to handle a particular content type is best
placed to decide whether it's genuine, not your web browser. Trying to
read and detect an array of types would open the door to more
vulnerabilities. Scanning for viruses or known attacks is not content
detection in this sense.
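To make that concrete, here's a toy sketch (the handler names are
invented, nothing Firefox-specific): the declared type is only used to
pick a handler, and that handler is what does the strict parsing and
rejects anything that isn't genuinely what it claims to be.

import urllib.request

def render_html(data):
    print("hand %d bytes to the HTML parser" % len(data))

def render_svg(data):
    print("hand %d bytes to the SVG renderer" % len(data))

def show_plain_text(data):
    print(data.decode("utf-8", errors="replace"))

def refuse_to_render(data):
    # unknown/untrusted type: offer to save it, don't guess what it is
    print("unknown type: offering download instead of rendering")

HANDLERS = {
    "text/html": render_html,
    "image/svg+xml": render_svg,
    "text/plain": show_plain_text,
}

def fetch_and_display(url):
    with urllib.request.urlopen(url) as resp:
        # dispatch purely on the declared Content-Type; no byte-sniffing
        declared = resp.headers.get_content_type()
        data = resp.read()
    # the type-specific handler decides whether the content is genuine;
    # if it isn't valid for that type it gets rejected there, rather
    # than being guessed into something else
    HANDLERS.get(declared, refuse_to_render)(data)
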
As Tim pointed out, even just for text you can't trivially tell whether
it should be interpreted as plain text, HTML, SVG, C etc. without
attempting complex parsing. There is *not* a need for content detection
if the server is working correctly; it should know from context what
it's serving. You've found a bug with GitHub; that's their issue to
fix, not every web browser's to bodge around.
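
And on the server side, something as simple as this is enough to set
the header from context rather than from the bytes (just an
illustration of "the server already knows what it's serving", not a
claim about how GitHub or anyone else actually does it):

import mimetypes

def content_type_for(filename):
    # the server maps a name it already knows to a MIME type; nothing
    # here looks at the file's contents
    ctype, _encoding = mimetypes.guess_type(filename)
    # if there's no mapping, say so honestly instead of guessing
    return ctype or "application/octet-stream"

print(content_type_for("notes.txt"))   # text/plain
print(content_type_for("logo.svg"))    # image/svg+xml
print(content_type_for("README"))      # application/octet-stream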

-- 
imalone
http://ibmalone.blogspot.co.uk

