lundslaktare@[EMAIL PROTECTED] said:
<snip>
> The thing is, that I don't know enough of C (or any other language for
> that matter) that I can write such a program myself.
Well, perhaps you can learn.
> I asked my father, and he suggested that I should use Microsoft Word
> to make the whole text one column, and then insert that column in
> Excel and use the sorting function.
Blech! :-)
> But that won't work for long text (more than 65536 words).
And it's so inelegant, too.
> I think it's just natural to count
> mor-
> ning
>
> as morning and -
I think it's more natural to think of it as one word that has been split
across two lines, and the hyphen is furniture that can be discarded.
> At least you don't want to miss a word just because it's written on
> two lines.
>
> Do you know a simple program that doesn't use megs of data?
The amount of data is up to you, since that's decided at runtime.
The best book on C programming is "The C Programming Language", 2nd
edition, by Kernighan and Ritchie. Exercise 6-4 of that book is: "Write a
program that prints the distinct words in its input sorted into decreasing
order of frequency of occurrence. Precede each word by the count."
Bryan Williams has written a solution to this exercise, available here:
http://clc-wiki.net/wiki/K%26R2_solutions%3AChapter_6%3AExercise_4
We can hack this to produce a list sorted by word rather than by count, by
writing this function:
int CompareWords(const void *vWord1,
                 const void *vWord2)
{
  WORD *const *Word1 = vWord1;
  WORD *const *Word2 = vWord2;

  assert(NULL != vWord1);
  assert(NULL != vWord2);

  return strcmp((*Word1)->Word, (*Word2)->Word);
}
...and by adding a prototype for it:
int CompareWords(const void *vWord1,
                 const void *vWord2);
just under the similar prototype for CompareCounts, and by replacing the
qsort call with this line:
  qsort(WordArray, Treecount, sizeof *WordArray, CompareWords);
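The double indirection in CompareWords is easy to trip over, so here is a cut-down, self-contained sketch of the same idea. The WORD structure below is only a stand-in with two members (the one in the linked solution has more), and SortByWord is a name invented for this illustration. The point it shows is that qsort hands the comparison function pointers to the array elements, and since each element is itself a WORD *, each argument is really a WORD *const *.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the WORD structure in the linked solution; only the two
 * members used here are shown (an assumption for illustration). */
typedef struct WORD
{
  const char *Word;
  size_t Count;
} WORD;

/* qsort passes pointers to array elements. The elements are WORD *,
 * so each argument points at a WORD * and must be dereferenced twice. */
int CompareWords(const void *vWord1, const void *vWord2)
{
  WORD *const *Word1 = vWord1;
  WORD *const *Word2 = vWord2;

  return strcmp((*Word1)->Word, (*Word2)->Word);
}

/* Hypothetical wrapper: sort Count pointers into strcmp order by word. */
void SortByWord(WORD **WordArray, size_t Count)
{
  qsort(WordArray, Count, sizeof *WordArray, CompareWords);
}
```

Because strcmp compares character codes, on an ASCII system every word that starts with a capital letter sorts ahead of every word that starts with a lower-case one.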
We can fix the output to the way you want it by changing:
  fprintf(Dest, "%10lu %s\n", (unsigned long)WordArray[Pos]->Count,
          WordArray[Pos]->Word);
to this:
  fprintf(Dest,
          "%-30s\t%10lu\n",
          WordArray[Pos]->Word,
          (unsigned long)WordArray[Pos]->Count);
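In case the conversion specification looks cryptic: "%-30s" left-justifies the word in a field at least 30 characters wide, "\t" emits a tab, and "%10lu" right-justifies the count in a field at least 10 wide. The little helper below is purely illustrative (FormatLine is an invented name, and it assumes a C99 library for snprintf); it formats one line into a buffer so that you can inspect the result.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the linked solution: format one line
 * into Buf using the same conversion specification as the modified
 * fprintf call. Returns what snprintf returns, i.e. the number of
 * characters the full line needs, not counting the terminating null.
 * Assumes a C99 library, since snprintf is not in C90. */
int FormatLine(char *Buf, size_t Size, const char *Word,
               unsigned long Count)
{
  return snprintf(Buf, Size, "%-30s\t%10lu\n", Word, Count);
}
```

For a short word and a small count, the line is always 42 characters long: 30 for the padded word, one tab, 10 for the padded count, and the newline. A word longer than 30 characters is not truncated; it simply pushes the rest of the line to the right.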
After making these changes and re-compiling (see
http://www.cpax.org.uk/prg/****table/c/resources.php for a list of free C
compilers), running the program (and using its own source code as input)
results in output that starts like this:
AddToTree                                5
Assumptions                              1
Author                                   1
Bryan                                    1
Buf                                      4
CANNOT_MALLOC_WORDARRAY                  2
Chapter                                  1
CompResult                               4
CompareCounts                            3
(and goes on for another 219 lines).
This is very nearly what you want, but not quite, because it doesn't handle
your "join halfwords that are split by a hyphen" requirement. If you want
that, you'll either have to find some other kind soul who has more spare
time than I do, or learn enough about C to make that change yourself.
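If you do take the second route, one possible starting point (a sketch only, not a finished solution) is to strip every hyphen that sits at the very end of a line, together with the line break, before the words are counted. The function name JoinSplitWords is made up for this example and is not part of the linked solution. It assumes the text is already held in memory as a string, that lines end with a plain '\n', and that nothing follows the hyphen on its line. It also cannot tell a split word from a genuinely hyphenated one, so "well-" followed by "known" would come out as "wellknown".

```c
#include <stddef.h>

/* Hypothetical helper: remove every hyphen that is immediately followed
 * by a newline, together with that newline, so that "mor-\nning" becomes
 * "morning". Hyphens elsewhere in the text are left alone.
 * Works in place and returns the new length of the string. */
size_t JoinSplitWords(char *Text)
{
  size_t Src = 0;
  size_t Dst = 0;

  while (Text[Src] != '\0')
  {
    /* Text[Src + 1] is safe to read: Text[Src] is not the terminator,
     * so the next element is at worst the terminator itself. */
    if (Text[Src] == '-' && Text[Src + 1] == '\n')
    {
      Src += 2; /* skip the hyphen and the line break */
    }
    else
    {
      Text[Dst++] = Text[Src++];
    }
  }

  Text[Dst] = '\0';

  return Dst;
}
```

Working in place keeps the memory use down to the one buffer you read the text into. Doing the same job a character at a time on a stream would avoid even that, at the cost of a little more bookkeeping.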
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@[EMAIL PROTECTED]
users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

