>> -----Original Message-----
>> From: Michael Higgins [mailto:mhiggins@[EMAIL PROTECTED]
>>
>> Hello, List-ers --
>>
>> I've come across a problem, unsure where to ask, so
>> subscribed here. I upload a file through a browser. It's a
>> '.txt' file and it comes as text/html.
>>
>> However, I've found some hyphen and single-quote like
>> characters that are in this text file are from a higher
>> codepoint... or something. What _seems_ to happen is the
>> browser is stripping them and my script isn't getting all the
>> info to dump into my database.
[8<]
>>
> -----Original Message-----
> From: Scott Statland [mailto:statland@[EMAIL PROTECTED]
>
> The characters that you are describing may need to be
> escaped or have their codes entered.
> It sounds like they may have special meanings in either
> the scripting language or the HTML output.
Hmm.
I guess my question wasn't clear. The issue is a file upload that is
tagged as text/html but has wide characters in it. The file doesn't make
it out of the browser intact, AFAICT. (If this is obviously incorrect,
please post the correction!)
A little more pain and research led me to find this:
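# Replace each non-ASCII byte with an HTML numeric character reference.
open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
while (my $line = <$fh>) {
    $line =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    print $line;
}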
...a helpful code snippet which, applied to my files before they are
uploaded, gives me a new text file with lines like: "Regarding the box –
the driver wouldn’t".
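Note that the snippet escapes raw bytes, not characters, so the number in each reference only means something if you know the file's encoding. If the files were saved as Windows-1252 (an assumption; the thread doesn't say), an en dash is the single byte 0x96 and comes out as &#150;, which browsers turn back into an en dash only through a legacy compatibility rule for references in the 128 to 159 range. Decoding first yields the real Unicode code points (&#8211; for the en dash, &#8217; for the right single quote). A minimal sketch assuming cp1252 input, using Perl's core Encode module; the subroutine name cp1252_to_entities is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# HYPOTHETICAL helper. ASSUMPTION: the input bytes are Windows-1252.
# Decode bytes to Unicode characters, then escape each non-ASCII
# character as a decimal HTML numeric character reference.
sub cp1252_to_entities {
    my ($bytes) = @_;
    my $text = decode('cp1252', $bytes);    # e.g. byte 0x96 -> U+2013
    $text =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# When given a file name, convert the whole file and print the result.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> } // '';    # slurp the file as bytes
    close $fh;
    print cp1252_to_entities($bytes);
}
```

If the files turn out to be UTF-8 instead, swapping 'cp1252' for 'UTF-8' in the decode call should be the only change needed.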
The cool part is that it is uploaded fully, and when viewed in a browser
the characters are displayed correctly. Duh.
Now, if I could only get the browser to fix it up like this when
sending... rather than what it was doing. Since it's going to a *nix box,
I don't care about the text/binary thing, right? I guess I could test from
a 'nix Firefox and see if the behaviour is different.
Does anyone have a thought on why the browser upload fails to
accommodate text with wide chars? I don't know how it determines...
maybe if the first char were wide, it'd go up as a different mimetype?
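Instead of asking the browser to fix the text, the same conversion could be done on the server once the upload arrives. Browsers normally send a file input's contents as the bytes on disk, without re-encoding them, so the server is left to guess the encoding. The sketch below is a heuristic, not anything the browser reports: the helper names decode_upload and to_entities are hypothetical, and the assumption that uploads are either UTF-8 or Windows-1252 (tried in that order) is mine. Strict UTF-8 goes first because cp1252 text containing non-ASCII bytes is rarely valid UTF-8, whereas cp1252 decoding accepts almost any byte sequence.

```perl
use strict;
use warnings;
use Encode qw(decode);

# HYPOTHETICAL helper: guess the encoding of an uploaded byte string.
# ASSUMPTION: uploads are either UTF-8 or Windows-1252 (cp1252).
sub decode_upload {
    my ($bytes) = @_;
    # Strict UTF-8: croak on malformed input, leave $bytes untouched.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return ($text, 'UTF-8') if defined $text;
    # Not valid UTF-8, so fall back to cp1252.
    return (decode('cp1252', $bytes), 'cp1252');
}

# HYPOTHETICAL helper: escape every non-ASCII character as &#NNNN;.
sub to_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7f])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# Example: treat a file named on the command line as an upload.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my $bytes = do { local $/; <$fh> } // '';
    close $fh;
    my ($text, $encoding) = decode_upload($bytes);
    warn "guessed encoding: $encoding\n";
    print to_entities($text);
}
```

Encode's own fallback mode, encode('ascii', $text, Encode::FB_HTMLCREF), should produce the same decimal references without the hand-written regex.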
Cheers,
Michael Higgins