On 05/23/2007 04:16 PM, Michael Higgins wrote:
>>> -----Original Message-----
>>> From: Michael Higgins [mailto:mhiggins@[EMAIL PROTECTED]
>>>
>>> Hello, List-ers --
>>>
>>> I've come across a problem, unsure where to ask, so
>>> subscribed here. I upload a file through a browser. It's a
>>> '.txt' file and it comes as text/html.
>>>
>>> However, I've found some hyphen and single-quote like
>>> characters that are in this text file are from a higher
>>> codepoint... or something. What _seems_ to happen is the
>>> browser is stripping them and my script isn't getting all the
>>> info to dump into my database.
>
> [8<]
>
>> -----Original Message-----
>> From: Scott Statland [mailto:statland@[EMAIL PROTECTED]
>>
>> The characters that you are describing, may need to be
>> escaped or have their codes entered.
>> It sounds like that they may have special meanings in either
>> the scripting language or in the html output.
>
> Hmm.
>
> I guess my question wasn't clear. The issue is a file upload that is tagged
> as text/html but has wide characters in it. The file doesn't make it out of
> the browser right AFAICT. (If this is obviously incorrect, please post the
> correction!)
>
> A little more pain and research led me to find this:
>
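> open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
> while (<$fh>) {
>     # Replace each non-ASCII character with a numeric HTML entity.
>     s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
>     print;
> }
> close $fh;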
>
> ... helpful code snippet, which applied to my files before they are uploaded
> gives me a new text file with lines like: "Regarding the box – the
> driver wouldn’t".
>
> The cool part is that it is uploaded fully and when viewed in a browser the
> characters are displayed correctly. Duh.
>
> Now, if I could only get the browser to fix it up like this when sending...
> rather than what it was doing. Since it's going to a *nix box, I don't care
> about the text/binary thing, right? I guess I could test from a 'nix Firefox
> and see if the behaviour is different.
>
> Anyone have a thought on what is happening that the browser upload fails to
> accommodate text with wide chars? I don't know how it determines ... maybe
> if the first char was wide, it'd go up as a different mimetype?
>
> Cheers,
>
>
> Michael Higgins
>
>
>
What browser is creating the problem? What O/S is that browser running on?
MSIE reportedly performs some file type heuristics, so I suspect the
browser is MSIE.
Evidently the .txt file looks like an HTML file. If it truly is an HTML
file, you might be able to fix the problem by specifying the character
set in a META tag.
Or you could compress the file with gzip or zip to convince MSIE to
leave the file alone when uploading it.
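
If you want to try the compression route, something like the following
Perl sketch could be run on the files before they are uploaded. It is
only a sketch under a few assumptions: IO::Compress::Gzip is installed
(it is available from CPAN), and the sub name gzip_for_upload and the
".gz" naming are my own illustrative choices, not anything your setup
requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Compress $in to "$in.gz" and return the new file name.
# The sub name and the ".gz" suffix are illustrative choices.
sub gzip_for_upload {
    my ($in) = @_;
    my $out = "$in.gz";
    # gzip() reads the named input file and writes a compressed copy.
    gzip($in => $out)
        or die "gzip failed for $in: $GzipError\n";
    return $out;
}

# Compress every file named on the command line and print the new names.
print gzip_for_upload($_), "\n" for @ARGV;
```

The receiving script would then have to decompress the upload (for
example with IO::Uncompress::Gunzip) before putting the text into the
database.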

