In <iL6dnQMCbfnGfsvfRVn-pg@[EMAIL PROTECTED]>,
Scott Moore <scott.moore6@[EMAIL PROTECTED]> wrote:
> The usual mode in original Pascal for string is to use "right padded"
> strings, which works for the majority of purposes.
I disagree. I often build up lines from several strings, either by
explicit concatenation or as a sequence of `WriteLn' arguments.
Trimming trailing spaces in substrings would falsify the result. If
some otherwise unused character (e.g. NUL) were used for padding, it
might be barely acceptable IMHO, but with spaces it's not an option
worth considering for most of my purposes.
> However, it is also
> possible to use lengthed strings. A lot of this is covered in the
> ANSI-ISO Pascal FAQ listed in my sig below.
You mean building a record containing the length and the characters?
How is this different from both the UCSD/BP and the EP strings that
you denounced as "Basic" strings?
> Virtually all Pascal compilers have a way to work with dynamic arrays,
> although it was (unfortunately, in my opinion) common to restrict them
> to strings only. The extended Pascal standard has full dynamic arrays
> as well (but oddly, felt the need to also include the redundant string
> capability).
Actually EP string types are just a special (predefined) case of
schema types. They have some special rules (mostly syntactic), but
their implementation is basically the same.
It's not redundant either, since this way operations on constant
strings can be done at compile time (and according to EP they must
be, i.e. such results are allowed as values of constants). Any
user-defined string handling would not allow that.
> I mention this all to make a point. String processing would be one area
> where it would be wise to choose an extension to the original Pascal.
> I wish they could all be compatible, but the reality is that they are
> not. If you are only interested in strings, and not dynamic arrays
> in general, then the UCSD implementation probably qualifies as the
> most widely implemented such extension. It appears nowadays in both the
> Borland series, and both GPC and FPC, differing in only minor details.
Actually GPC doesn't have them yet (but plans to). But from a user's
point of view they don't look much different from EP strings, except
that UCSD strings allow access to the length as "character number
0", and their length is limited to 255 chars.
In <slrnd5f70p.2va4.marcov@[EMAIL PROTECTED]>,
Marco van de Voort <marcov@[EMAIL PROTECTED]> wrote:
> On 2005-04-08, CBFalconer <cbfalconer@[EMAIL PROTECTED]> wrote:
> > Marco van de Voort wrote:
> >> On 2005-04-08, Scott Moore <scott.moore6@[EMAIL PROTECTED]> wrote:
> >> than either in padded or zero terminated form?
> >
> > They are more expensive in memory allocation. Many moons ago I was
> > using one of the Turbos and I had a recursive routine that included
> > some string manipulation. It didn't even declare any string
> > variables. It proceeded to crash the system with a stack overflow.
>
> How? If I declare a string everywhere where I would normally do an array
> of char, and pass by ref, how could this happen?
Indeed. What takes up most memory is the character array, which is
the same in all cases: with fixed-strings, UCSD strings and EP
strings alike, you must allocate a certain length initially. All of
them can pass parameters with the actual length (fixed-strings as
conformant arrays, UCSD strings at least in BP as "open string"
parameters AFAIR, EP strings as schema parameters). And of course,
all of them can be passed by reference where suitable.
The difference WRT storage is the additional length information (1
byte for UCSD, one integer in EP). But that's a fixed size (and
rather small), and it's not responsible for the effects Chuck
mentioned.
> > I disagree that length is needed in most string operations.
> > Compares, writes, and most copies can just start at the beginning
> > and go on until the end. Nul terminated strings do very well for
> > this sort of operation. Concatenation is an exception.
>
> E.g. on x86, compare is typically done by a rep cmpsb, which needs the length,
> and the same goes for copy. C char *'s are slow because, in a comparison, they have to
> check each character once for the comparison and once _each_ for zero.
> Of course the compiler builders went through a lot of trouble to try to
> speed this up, but that is brute force, not elegance.
I also agree. As Chuck said, concatenation is an exception, but not
the only one. And since such "exceptions" add an O(n) factor, they
can be quite costly. And as Marco said, even where it's possible to
avoid knowing the length in advance, doing so may complicate the
algorithm, and that applies not only internally in the compiler,
but also to user algorithms. Using a high-level language, I want to
be able to process strings from both sides when I want to.
> p.s. I miss ansi-iso strings in the equation. How expensive is a length() on
> a 7185 "string"?
Depends on what you mean -- and that's where the trouble starts. If
you want the full length, i.e. the array size, it's O(1), but the
result is independent of the contents. If you mean the length
excluding trailing spaces, it becomes O(n) -- similar to C strings,
though counting from the other side. For C strings you have to
count the characters of the string to find the NUL; for Pascal
fixed-strings you have to count the trailing spaces (which may
actually be worse -- if you usually allocate enough length for a
supposed worst case, I suppose on average less than half of it is
actually used, so you spend more time counting padding spaces than C
spends counting characters).
(For EP strings (schemata), Length is an O(1) operation, of course.)
For all these reasons and because storing the length is a fixed
overhead, this is actually a no-brainer to me. I wonder why some of
you are still arguing for string types without stored length in
general. It may be suitable for some very special purposes, but
usually I value programming comfort higher, that's why I prefer to
program in Pascal in the first place ...
Frank
--
Frank Heckenbach, frank@[EMAIL PROTECTED]
http://fjf.gnu.de/
GnuPG and PGP keys: http://fjf.gnu.de/plan (7977168E)
Pascal code, BP CRT bugfix: http://fjf.gnu.de/programs.html
Free GNU Pascal Compiler: http://www.gnu-pascal.de/