In article <MPG.2293b77914e2118e989704@[EMAIL PROTECTED]>, mrc2323@[EMAIL PROTECTED] says...
[ ... ]
> Okay, here's what I am trying to do:
> 1. I have a large file with names (in the form "last, first") and
> gender codes ('M'/'F') from which I want to parse the "first" name
> string and build a std::list (or some other container type) of all
> unique (first) names and genders.
> 2. I intend to use this information in a data entry application that
> checks the validity of an inputted first_name against this data
> collection. If there's a conflict (e.g. "SUE",'M', or "MARVIN",'F'), I
> want the application to pause and let the user decide if that's correct.
> 3. There are cases ("PAT", "CHRIS", etc.) where the name is valid,
> regardless of gender. Therefore, the std::list should contain multiple
> objects that have the same "search key" value (e.g. "CHRIS") but have
> different gender codes - 2 different objects with identical "find"
> values. I am struggling with (1) building the std::list and (2)
> searching it for all possible variations of name & gender.
> BTW, I appreciate the helpful response, Jerry...
First of all, I would _not_ use a linked list. Second, I'd use a single
entry for each first name, storing the number of males and number of
females with that first name. I believe that should simplify your code
quite a bit. Personally, I'd write the code something like this:
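// Warning: all code in the post is UNTESTED!
#include <fstream>
#include <istream>
#include <set>
#include <string>
// data as it's read from the file:
struct person {
    std::string fname;
    std::string lname;
    char gender;
};
// read the data from the file:
std::istream &operator>>(std::istream &is, person &p) {
    // assumes file is of form: last_name ',' first_name ',' gender '\n'
    // (std::getline, not is.getline, is the version that reads into a std::string)
    std::getline(is, p.lname, ',');
    std::getline(is, p.fname, ',');
    is >> p.gender;
    // the gender code must end the line; get() also consumes the newline
    // so it doesn't end up at the front of the next last name
    if (is.get() != '\n')
        is.setstate(std::ios::failbit);
    return is;
}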
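enum gender { MALE, FEMALE };
// holds the data we care about:
struct name_use {
    std::string first_name;
    // mutable: std::set elements are const, but the counts play no part
    // in the ordering, so updating them in place is safe
    mutable long f_use;
    mutable long m_use;
    name_use(std::string const &name, gender g) :
        first_name(name), f_use(0), m_use(0)
    {
        if (g==MALE)
            ++m_use;
        else
            ++f_use;
    }
    // ordering is by first name only
    bool operator<(name_use const &other) const {
        return first_name < other.first_name;
    }
};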
std::set<name_use> names;
std::ifstream input("myfile.hst");
person temp;
while (input>>temp) {
    gender g = temp.gender == 'M' ? MALE : FEMALE;
    name_use nu(temp.fname, g);
    std::set<name_use>::iterator it = names.find(nu);
    if (it != names.end()) {
        // name found -- increment appropriate count
        // (changing a set element only compiles if the counts are mutable)
        if (g == MALE)
            ++it->m_use;
        else
            ++it->f_use;
    }
    else { // name not present yet
        names.insert(nu);
    }
}
To put this to use, you'd set a threshold, and check whether the
fraction of people with that first name and the entered gender falls
below that threshold:
const double threshold = 0.05;
person p;
get_data(input, p);
// the gender passed here is a dummy; only the name takes part in the lookup
std::set<name_use>::iterator n = names.find(name_use(p.fname, MALE));
if (n == names.end())
    // maybe a typo?
    warn("Please verify name");
else {
    // an entry always has at least one use, so the divisor is never zero
    double percent_male = double(n->m_use)/(n->f_use+n->m_use);
    double percent_female = 1.0 - percent_male;
    if (p.gender == 'M' && (percent_male < threshold))
        warn("Please verify gender");
    else if (p.gender == 'F' && (percent_female < threshold))
        warn("Please verify gender");
}
There are, of course, a number of alternatives, such as using an
std::map, with the first name as the key and the usages as the
associated data. This might be a tad cleaner in places, but I doubt the
difference would be particularly major.
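For what it's worth, here is a rough sketch of how the std::map variant might look. Every name in it (gender_counts, name_table, record, check, verdict) is made up for illustration and appears nowhere above, and, like the rest of the code in this post, it should be treated as a starting point rather than a finished implementation:

```cpp
#include <map>
#include <string>

// Hypothetical names throughout; this only illustrates the std::map idea.
struct gender_counts {
    long m_use;
    long f_use;
    gender_counts() : m_use(0), f_use(0) {}
};

typedef std::map<std::string, gender_counts> name_table;

// Record one (first name, gender code) pair read from the file.
void record(name_table &names, std::string const &first_name, char gender) {
    // operator[] creates a zeroed entry the first time a name is seen
    gender_counts &c = names[first_name];
    if (gender == 'M')
        ++c.m_use;
    else
        ++c.f_use;
}

enum verdict { NAME_OK, VERIFY_NAME, VERIFY_GENDER };

// Check an entered name/gender pair against the collected counts.
verdict check(name_table const &names, std::string const &first_name,
              char gender, double threshold) {
    name_table::const_iterator it = names.find(first_name);
    if (it == names.end())
        return VERIFY_NAME;      // never seen: maybe a typo?
    gender_counts const &c = it->second;
    // entries are only created by record(), so the total is at least 1
    double total = double(c.m_use + c.f_use);
    double share = (gender == 'M' ? c.m_use : c.f_use) / total;
    return share < threshold ? VERIFY_GENDER : NAME_OK;
}
```

The mapped value isn't const the way a set element is, so the counts can be bumped directly, and the lookup takes the first name itself rather than a dummy object built around it. With a threshold of 0.05, a name recorded 98 times as 'F' and twice as 'M' would come back as VERIFY_GENDER for 'M' (2/100 = 0.02), while a name split evenly between the two passes for either gender.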
--
Later,
Jerry.
The universe is a figment of its own imagination.

