<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Parsing names</title>
	<atom:link href="http://blog.barisione.org/2009-06/parsing-names/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.barisione.org/2009-06/parsing-names/</link>
	<description></description>
	<pubDate>Wed, 10 Mar 2010 18:41:51 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Nacho de los Ríos</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1898</link>
		<dc:creator>Nacho de los Ríos</dc:creator>
		<pubDate>Mon, 22 Jun 2009 10:00:50 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1898</guid>
		<description>And there are further complications in Spanish! As you seem to find some fun in this matter, let me elaborate:

Just as, I imagine, in Italian, many last names and names have articles and prepositions in them (de, los, del, de los, de la, de las, la, etcetera); these should NEVER be capitalized, and must be ignored for sorting, both of which are invariably done wrong.

And there are some names that can also be, less frequently, surnames (Santiago, Esteban, Miguel, for example).

And we do legally have two last names, one from the father, one from the mother, but we often only quote the first one, for brevity. 

For traditional reasons, in the vast majority of cases, you get the first last name from your father, but nowadays parents have that choice too.

I would not be surprised if you could pass on your SECOND last name, instead of the first one, offering a full four possibilities for naming your progeny, although I know of no such cases. I do *believe* that all siblings have to have the same set, in the same order, but Spanish law is so permissive with that sort of dumb liberty that I could easily be wrong.

So there is no safe way to split "José Miguel de la Rosa" without more information. Most likely the given name would be "José Miguel" with only one last name quoted, but it could also be that the given name is "José" and the last names are "Miguel" and "de la Rosa". So for Spanish name splitting, some complicated heuristics are necessary to improve certainty, and they will still fail sometimes.</description>
		<content:encoded><![CDATA[<p>And there are further complications in Spanish! As you seem to find some fun in this matter, let me elaborate:</p>
<p>Just as, I imagine, in Italian, many last names and names have articles and prepositions in them (de, los, del, de los, de la, de las, la, etcetera); these should NEVER be capitalized, and must be ignored for sorting, both of which are invariably done wrong.</p>
<p>And there are some names that can also be, less frequently, surnames (Santiago, Esteban, Miguel, for example).</p>
<p>And we do legally have two last names, one from the father, one from the mother, but we often only quote the first one, for brevity. </p>
<p>For traditional reasons, in the vast majority of cases, you get the first last name from your father, but nowadays parents have that choice too.</p>
<p>I would not be surprised if you could pass on your SECOND last name, instead of the first one, offering a full four possibilities for naming your progeny, although I know of no such cases. I do *believe* that all siblings have to have the same set, in the same order, but Spanish law is so permissive with that sort of dumb liberty that I could easily be wrong.</p>
<p>So there is no safe way to split &#8220;José Miguel de la Rosa&#8221; without more information. Most likely the given name would be &#8220;José Miguel&#8221; with only one last name quoted, but it could also be that the given name is &#8220;José&#8221; and the last names are &#8220;Miguel&#8221; and &#8220;de la Rosa&#8221;. So for Spanish name splitting, some complicated heuristics are necessary to improve certainty, and they will still fail sometimes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: daybreaker</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1897</link>
		<dc:creator>daybreaker</dc:creator>
		<pubDate>Sat, 20 Jun 2009 13:21:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1897</guid>
		<description>If you consider parsing names in East Asian languages, things get much worse. In Korean, for example, the family name and the given name are not separated by whitespace. (Their order is reversed--the family name comes first.) Most names are composed of 3 hangul syllables, but some names are 2 syllables, and some even have family names with 2 syllables and given names with 1 or 2 syllables. In running text, we have postpositions that indicate the role of a word in a sentence, so one or two more syllables may follow the name without whitespace.</description>
		<content:encoded><![CDATA[<p>If you consider parsing names in East Asian languages, things get much worse. In Korean, for example, the family name and the given name are not separated by whitespace. (Their order is reversed&#8211;the family name comes first.) Most names are composed of 3 hangul syllables, but some names are 2 syllables, and some even have family names with 2 syllables and given names with 1 or 2 syllables. In running text, we have postpositions that indicate the role of a word in a sentence, so one or two more syllables may follow the name without whitespace.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Moore</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1896</link>
		<dc:creator>Michael Moore</dc:creator>
		<pubDate>Fri, 19 Jun 2009 11:20:11 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1896</guid>
		<description>@Anonymous: did you take a look at the software I mentioned?  It does what Marco is looking for, identifying the culture of a name and parsing it according to that culture's rules.  This is a knowledge-based solution to a hard problem.  For all the reasons that Marco and other commenters have identified, an algorithmic approach will not work.</description>
		<content:encoded><![CDATA[<p>@Anonymous: did you take a look at the software I mentioned?  It does what Marco is looking for, identifying the culture of a name and parsing it according to that culture&#8217;s rules.  This is a knowledge-based solution to a hard problem.  For all the reasons that Marco and other commenters have identified, an algorithmic approach will not work.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeroen Ruigrok van der Werven</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1895</link>
		<dc:creator>Jeroen Ruigrok van der Werven</dc:creator>
		<pubDate>Fri, 19 Jun 2009 08:52:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1895</guid>
		<description>I think you will find that it is nearly impossible to invent a system that can correctly parse names into pieces.

My own name, for example, has Ruigrok van der Werven as its last name, but if you applied the general Dutch rule that last names have a 'van ' part, you might think van der Werven is my last name and Ruigrok is part of my first or middle name. Moreover, 'van' is not considered part of the last name; it is registered as a prefix, and this system does not provide for that specific part.</description>
		<content:encoded><![CDATA[<p>I think you will find that it is nearly impossible to invent a system that can correctly parse names into pieces.</p>
<p>My own name, for example, has Ruigrok van der Werven as its last name, but if you applied the general Dutch rule that last names have a &#8216;van &#8217; part, you might think van der Werven is my last name and Ruigrok is part of my first or middle name. Moreover, &#8216;van&#8217; is not considered part of the last name; it is registered as a prefix, and this system does not provide for that specific part.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1894</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Fri, 19 Jun 2009 08:19:54 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1894</guid>
		<description>I don't understand why any software would even *consider* trying to parse a name.  Just treat the full name as an indivisible string and, if absolutely necessary for some reason, add a second field for "nickname" or similar.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t understand why any software would even *consider* trying to parse a name.  Just treat the full name as an indivisible string and, if absolutely necessary for some reason, add a second field for &#8220;nickname&#8221; or similar.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1893</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Fri, 19 Jun 2009 07:32:52 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1893</guid>
		<description>@Michael Moore: are you the fat liberal film-maker, or merely another spammer of the same name?</description>
		<content:encoded><![CDATA[<p>@Michael Moore: are you the fat liberal film-maker, or merely another spammer of the same name?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Moore</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1891</link>
		<dc:creator>Michael Moore</dc:creator>
		<pubDate>Thu, 18 Jun 2009 22:50:37 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1891</guid>
		<description>IBM InfoSphere Global Name Recognition provides multi-cultural name information, analytics and name matching through a series of flexible, easy-to-integrate, SOA enabled interfaces, enabling you to unlock and unleash the wealth of information in a name.</description>
		<content:encoded><![CDATA[<p>IBM InfoSphere Global Name Recognition provides multi-cultural name information, analytics and name matching through a series of flexible, easy-to-integrate, SOA enabled interfaces, enabling you to unlock and unleash the wealth of information in a name.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: barisione</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1890</link>
		<dc:creator>barisione</dc:creator>
		<pubDate>Thu, 18 Jun 2009 19:52:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1890</guid>
		<description>@Michael Wales:
That works in libebook and is parsed correctly, but the code uses a hardcoded list of prefixes/suffixes.
There are also other possible problems from this. Is liv a first name or does it mean 54th? :)</description>
		<content:encoded><![CDATA[<p>@Michael Wales:<br />
That works in libebook and is parsed correctly, but the code uses a hardcoded list of prefixes/suffixes.<br />
There are also other possible problems from this. Is liv a first name or does it mean 54th? :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tassos Bassoukos</title>
		<link>http://blog.barisione.org/2009-06/parsing-names/comment-page-1/#comment-1889</link>
		<dc:creator>Tassos Bassoukos</dc:creator>
		<pubDate>Thu, 18 Jun 2009 19:46:59 +0000</pubDate>
		<guid isPermaLink="false">http://blog.barisione.org/?p=149#comment-1889</guid>
		<description>That's not all. Greek names traditionally put the family (last) name first, then the given name.</description>
		<content:encoded><![CDATA[<p>That&#8217;s not all. Greek names traditionally put the family (last) name first, then the given name.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

