Over the last few weeks I have been asked several times to modify some components I’m working on so that they can split a full name into its components (first name, family name, etc.).
Most people expect this to work correctly and get annoyed when it fails, and you can be sure it will fail. It will fail because it’s impossible to parse a name correctly in the general case. For instance:
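| Full name | First name | Middle name | Family name |
|---|---|---|---|
| Barack Hussein Obama | Barack | Hussein | Obama |
| Pier Silvio Berlusconi | Pier Silvio | (none) | Berlusconi |
| José Rodríguez Zapatero | José | (none) | Rodríguez Zapatero |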
How can you do this automatically?
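To make the problem concrete, here is a minimal Python sketch of the kind of heuristic people usually have in mind: first token is the first name, last token is the family name, anything in between is a middle name. The names `naive_split`, `EXPECTED` and `failures` are made up for this post, and this is not how libebook or any other library does it. It is only an illustration of why splitting on whitespace cannot work.

```python
def naive_split(full_name):
    """Hypothetical heuristic: first token / middle tokens / last token."""
    parts = full_name.split()
    if not parts:
        return {"first": "", "middle": "", "family": ""}
    if len(parts) == 1:
        return {"first": parts[0], "middle": "", "family": ""}
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),
        "family": parts[-1],
    }


# The correct splits for the three names above, written by hand.
EXPECTED = {
    "Barack Hussein Obama": {
        "first": "Barack", "middle": "Hussein", "family": "Obama",
    },
    "Pier Silvio Berlusconi": {
        "first": "Pier Silvio", "middle": "", "family": "Berlusconi",
    },
    "José Rodríguez Zapatero": {
        "first": "José", "middle": "", "family": "Rodríguez Zapatero",
    },
}


def failures():
    """Return the names for which the heuristic disagrees with EXPECTED."""
    return [name for name, want in EXPECTED.items()
            if naive_split(name) != want]


if __name__ == "__main__":
    for name in failures():
        print("wrong split for:", name, "->", naive_split(name))
```

All three names have exactly three tokens, so nothing in the string itself tells you which of the three splits is the right one. Any rule that gets one of them right gets at least one of the others wrong.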
This becomes particularly silly if you cannot be sure that the string you are going to parse is actually a full name. For instance, don’t try to parse a chat nickname. It’s true that Gmail/GTalk uses your full name as the nickname by default, but this is only a default, and it’s true only for Gmail.
To cut a long story short: please, please, please don’t try to parse names. You can see for yourself how hard it is, even though I’m only considering Western-style names.
If you still don’t trust me, here’s a quote from e-name-western.c, i.e. the file that does name parsing in libebook:
* <Nat> Jamie, do you know anything about name parsing?
* <jwz> Are you going down that rat hole? Bring a flashlight.
On a side note, when you are trying to understand why some code is broken, you can come across some funny commits, like the great EDS purge.
Update: I found this “serious” bug in