Handling phone numbers

I’m often asked questions about the handling and parsing of phone numbers, so I’m going to explain how we do it on Maemo 5. I hope this is also useful to developers of other applications.

There is no unique standardised way to write phone numbers; in the UK the phone number of the Buckingham Palace Visitor Office can be written as 02073212233, +44 (0)20 7321 2233, 0044 207 321 2233, etc. If you omit the international prefix +44, the number 02073212233 could belong to somebody else in another country; to me, for instance, it looks like the phone number of somebody living in Milan.

When storing a phone number you should keep it as you got it, including spaces, parentheses, etc.
When you want to use the number you should drop all the useless characters, but keep the extension numbers. For instance 44-555-P1 would become 44555P1, which means: call the Vodafone UK balance information number 44555, pause for a few seconds waiting for the recorded voice to start speaking, and send a 1 (i.e. ask for a text message with the remaining minutes for this month).
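
To make the 44-555-P1 example concrete, here is a minimal sketch of how a cleaned-up number could be split into the part to dial and the part to send afterwards. The function name split_dial_string is made up for this illustration and is not a Maemo API; the extension characters P, W and X are the ones used in the code further down.

```python
# Hypothetical helper, not part of Maemo: split a cleaned-up number into
# the part that is dialled and the part that is sent after the call starts.
EXTENSION_CHARS = "PWX"


def split_dial_string(normalized):
    """Return (number to dial, post-dial string) for a normalized number."""
    for index, char in enumerate(normalized):
        if char.upper() in EXTENSION_CHARS:
            # Everything from the first extension character onwards is
            # sent only after the call has been established.
            return normalized[:index], normalized[index:].upper()
    return normalized, ""


stored = "44-555-P1"                  # kept exactly as the user typed it
normalized = stored.replace("-", "")  # only "-" needs dropping in this example
number, post_dial = split_dial_string(normalized)
print(number, post_dial)
```

The stored string is never modified; the dialable form is derived from it only when the number is actually used.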

When comparing phone numbers to see if they belong to the same contact you also want to strip all the extra digits sent after a pause as those are not really part of the phone number. At this point you still have to somehow handle the craziness of international and local prefixes, for instance all these numbers could be a valid way to call the same person in San Marino: 0549 123 456, +378 0549 123 456, +39 0549 123 456, 0039 0549 123 456, 011 39 0549 123 456.
How do phones handle this? Just by comparing the last 7 digits of the phone number, 7 being the shortest length that phone numbers have somewhere in the world.
This of course leaves a chance of false matches, but as you can see there is no real generic solution for this.

Here’s some code to show how to handle phone numbers. I used Python as a sort of pseudo-language, so I preferred readability for non-Python developers over good pythonic code.

extension_chars = ('p', 'P', 'w', 'W', 'x', 'X')


def normalize_phone_number(number):
    common_delimiters = (',', '.', '(', ')', '-', ' ',
                         '\t', '/')
    valid_digits = ('#', '*', '0', '1', '2', '3', '4',
                    '5', '6', '7', '8', '9')

    normalized = ''

    for digit in number:
        if digit in extension_chars:
            # Keep the extension characters P, W and X,
            # but be sure they are upper case.
            digit = digit.upper()
        elif digit == '+':
            # "+" is valid only at the beginning of phone
            # numbers or after the number suppression
            # prefix. (No idea why we support only this
            # GSM code, but not the VSC ones.)
            if normalized not in ('', '*31#', '#31#'):
                print('Wrong "+" in "%s"' % number)
                # Skip this "+".
                continue
        elif digit in common_delimiters:
            # Skip this delimiter.
            continue
        elif digit in valid_digits:
            # Ok, let's keep it.
            pass
        else:
            # What is this? It doesn't seem valid but we
            # just keep it
            print('Unknown character "%s" in "%s"' %
                  (digit, number))

        normalized += digit

    return normalized


def remove_extension_chars(number):
    clean = ''

    for digit in number:
        if digit in extension_chars:
            # Extension character, drop this character and
            # the rest of the string.
            break

        clean += digit

    return clean


def phone_numbers_equal(number1, number2):
    number1 = normalize_phone_number(number1)
    number1 = remove_extension_chars(number1)

    number2 = normalize_phone_number(number2)
    number2 = remove_extension_chars(number2)

    # Compare only the last 7 digits.
    # If one of the numbers is shorter than 7 digits it's
    # important that the comparison is done on the full
    # length of the numbers and not only on the last tiny
    # bits of the 2 numbers.
    return number1[-7:] == number2[-7:]

Python code for handling phone numbers
(Download the full code with tests)
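
As a quick illustration of what the last-7-digits comparison does in practice, here is a condensed sketch. It is a regex-based stand-in written for this example, not the code above: it skips the "+" validation and the warnings, and the names comparable_digits and last_seven_equal are made up. The Milan-style number is invented to show a false match.

```python
import re


def comparable_digits(number):
    """Condensed stand-in for normalizing and dropping the extension."""
    # Drop everything from the first extension character (P, W, X) onwards.
    number = re.split(r"[pPwWxX]", number, maxsplit=1)[0]
    # Keep only the characters that can be dialled.
    return re.sub(r"[^0-9#*+]", "", number)


def last_seven_equal(number1, number2):
    """Compare only the last 7 characters of the cleaned-up numbers."""
    return comparable_digits(number1)[-7:] == comparable_digits(number2)[-7:]


san_marino = ["0549 123 456", "+378 0549 123 456", "+39 0549 123 456",
              "0039 0549 123 456", "011 39 0549 123 456"]

# All the spellings of the San Marino number compare equal.
print(all(last_seven_equal(san_marino[0], other) for other in san_marino))

# The price to pay: a London number and a made-up Italian one also match.
print(last_seven_equal("020 7321 2233", "+39 02 7321 2233"))

# Numbers shorter than 7 digits are compared in full, so this is False.
print(last_seven_equal("44555", "+44 20 7344555"))
```

The three calls print True, True and False respectively: the second one is exactly the kind of false match mentioned earlier.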

If you are handling phone numbers on Maemo 5, there are already some useful functions to use: e_normalize_phone_number, osso_abook_phone_numbers_equal, osso_abook_contact_matches_phone_number and osso_abook_query_phone_number.

39 thoughts on “Handling phone numbers”

  1. Frankly, the way phones handle this blows massively.
    My last Nokia phone showed a friend calling (w/ name) and I answered. When I tried to call back, it failed, because I had the wrong prefix and my phone still identified the caller. How was I supposed to know that I actually had the wrong number?

    I would really appreciate a better way than that.


  2. @maweki:
    That’s the operator’s fault. They can put whatever they want in the caller ID, including the number without the prefix even if it’s needed. A phone cannot do anything to work around it.


  3. @barisione:
    In the caller’s details it showed the correct differing number (w/ the right prefix). As it would in the above code.

    Sure, if no prefix is available for one number, this would be the way to go.

    But if both numbers do have an area-code-like number of digits, you could surely compare more than 7 digits, when they are available.


  4. Sorry for the double post, but to clarify:

    Number 1 is 0172 1234567
    Number 2 is 0049171 1234567
    Common German cell prefixes. When both numbers are available that way (w/ prefix) it should not produce a match. On my former Nokia, it did and showed a name even if the numbers did not match.


  5. damn, I should really keep my head together.
    The problem is, that you produce a false match if you write down the wrong prefix and the other person calls you. You see the name and guess that you wrote the number down correctly.

    I think if both numbers are the same length, you could check the whole numbers for a match and if they differ in length you could check for the shorter length (maybe minus 2) but no shorter than 7.

    You see, I think that there is an off chance, that two different people you know have the same number with different prefixes but the room for human error when entering numbers by hand (as it is common) is pretty big, afaik.
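
The rule proposed in this comment can be sketched as follows. This is only an illustration of the idea as described here, not what any phone does; the function names and the default values for minimum and slack are assumptions taken from the wording "maybe minus 2" and "no shorter than 7".

```python
def digits_only(number):
    """Keep only the ASCII digits of a phone number."""
    return "".join(c for c in number if c in "0123456789")


def tail_length(digits1, digits2, minimum=7, slack=2):
    """How many trailing digits to compare under the proposed rule."""
    if len(digits1) == len(digits2):
        # Same length: compare the whole numbers.
        return len(digits1)
    # Different lengths: shorter length minus some slack, never below 7.
    return max(min(len(digits1), len(digits2)) - slack, minimum)


def numbers_match(number1, number2):
    digits1, digits2 = digits_only(number1), digits_only(number2)
    if not digits1 or not digits2:
        return False
    count = tail_length(digits1, digits2)
    return digits1[-count:] == digits2[-count:]


# The two German numbers from the previous comment no longer match ...
print(numbers_match("0172 1234567", "0049171 1234567"))
# ... while the same number written with the country code still does.
print(numbers_match("0172 1234567", "0049172 1234567"))
```

With these numbers 9 trailing digits are compared, so the differing operator prefix (172 against 171) is enough to reject the first pair.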


  6. I had to handle phone numbers a while back for an ERP system with TAPI interface.
    To make searching fast I added an extra field to the DB for the normalised numbers.

    To normalise the number I do:
    1. If the first character is a ‘+’, replace it with ‘00’
    2. Remove all non-numeric characters
    3. If the first numeric character is a ‘0’ and the second is not, replace the ‘0’ with the international prefix of the user.

    With this simple normalising I can handle:
    +49 30 12345-67 DIN 5008
    +49 30 1234567 E.123
    +49 (30) 1234567 Microsoft[1]
    +49-30-1234567 Uniform Resource Identifier RFC 3966; and E.123

    But not +49 (0)30 12345-67. But that isn’t a problem because I can argue, that I support all national and international standards of writing phone numbers.

    [1] http://msdn.microsoft.com/de-de/library/cc728034.aspx
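
The three steps listed in this comment can be sketched like this. It is a reading of the description, not the commenter's actual code; the function name and the default user_prefix of "0049" (a user in Germany) are assumptions made for the example.

```python
def normalise_for_search(number, user_prefix="0049"):
    """Apply the three normalisation steps described in the comment."""
    number = number.strip()
    # 1. A leading "+" becomes "00".
    if number.startswith("+"):
        number = "00" + number[1:]
    # 2. Remove all non-numeric characters.
    digits = "".join(c for c in number if c in "0123456789")
    # 3. A single leading "0" is replaced with the user's own
    #    international prefix (assumed to be 0049 here).
    if len(digits) >= 2 and digits[0] == "0" and digits[1] != "0":
        digits = user_prefix + digits[1:]
    return digits


for written in ("+49 30 12345-67", "+49 30 1234567", "+49 (30) 1234567",
                "+49-30-1234567", "030 1234567", "+49 (0)30 12345-67"):
    print(written, "->", normalise_for_search(written))
```

Every spelling except the last one ends up as 0049301234567; the "+49 (0)30" form keeps its extra 0, which is the limitation mentioned in the comment.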


  7. > To normalise the number I do:
    > 1. If the first character is an ‘+’ replace it with ‘00’

    This breaks if the international prefix is not 00 (like in the US).

    > 2. remove all not numeric characters

    If you have 123p4 it would become 1234, i.e. a different phone number.

    > 3. If the first numeric character is an ‘0’ and the second not replace ‘0’ with the international prefix of the user.

    This doesn’t work everywhere: in Italy the initial 0 has not been stripped for some years now (it used to be).
    Also, you don’t always know the international prefix.

    > But not +49 (0)30 12345-67. But that isn’t a problem because I can argue, that I support all national and international standards of writing phone numbers.

    The +XX (0) YYY format is just stupid and I never saw anybody outside Britain using it.


  8. Is there any way to change the number of digits that are used to compare phone numbers? (on the N900)

    My country only has 6-digit numbers.


  9. @Ragnar:
    Really? Which country?
    When you receive a call in your country how does the number appear? With the international prefix?

    There is no way to change it on the N900 and changing it is a problem for various reasons, but I can warn the people working on Harmattan about this problem.


  10. Country: Faroe Islands
    Int Prefix: +298
    Number: xx xx xx

    Usually the number shows with the prefix, but there are times when it doesn’t. So the contact name is not always shown.

    All of the numbers in my contacts have the prefix, so when I dial/sms someone by typing in the number without the prefix the contact name is not shown either.

    My current fix is to add a number without prefix to some of my contacts.

    On my SE w350i contact names are shown even if the number is with a prefix or without, so it should be working.


  11. Interesting, I thought all the small countries had their phone systems somehow linked to the one of another bigger country (like San Marino and the Vatican City with Italy).

    Sadly there is no way to solve this on the N900 :-/


  12. > The +XX (0) YYY format is just stupid
    > and I never saw anybody outside
    > Britain using it.

    I have seen people in France using that format too.

    There is also this format used by the website http://www.louvre.fr
    (33) 01 40 20 50 50
    and it should be understood as +33140205050 or 0140205050


  13. > The +XX (0) YYY format is just stupid
    > and I never saw anybody outside
    > Britain using it.

    You are right, that format is really stupid, but I have seen people in Germany using that format too.


  14. I’m sure Nokia phones check only the last 6 digits when displaying the caller: I have 2 similar numbers distinguished only by operator, e.g. 031123456 and 041123456 (031 and 041 for the operator), and it will display only 1 person despite different persons being stored in the address book. That was with the rusty 6310i and the N900 works the same way … Unfortunately 😦


  15. If two numbers share the same 7 digits with different prefix, N900 in PR1.2 shows the number instead of the name. In older release it used to show the name. What has changed?

    Why not search for a larger number of digits (ex. 9) if the match is more than 1 for 7 digit numbers?


  16. Not quite sure why +XX (0) YYYYZZZZZZ is a “stupid” format. Surely it emphasises the rules for dialling UK numbers (drop the leading 0) which aren’t consistent across all countries (as noted above). However, local users who may not be used to see +44YYYYZZZZZZ won’t get confused, as they’ll recognise 0YYYY ZZZZZZ as a phone number.

    This is a timely post as when Hermes starts pulling in phone numbers from LinkedIn/Plaxo, recognising if a contact already has that number is going to be key 🙂


  17. @sponka:
    I’m pretty sure it’s 7.

    @Mahmoud:
    It’s because you could have the same number for multiple people working for the same company and showing a random name is misleading.
    You cannot use more digits as more digits are just not reliable.

    @Andrew:
    Because it’s the only commonly used format that doesn’t work if typed on a phone. You need somebody to explain to you what the (0) means to understand it; it probably looks fine to you because, being British, you have always seen it 😉


  18. @barisione, the only format which can’t be typed on a _mobile_ phone. Nothing starting ‘+’ will work when dialled on a landline.

    Brackets meaning optional/region specific seems reasonably sensible. Far *far* better than the other UK format which is still seen all over the place: (0YYYY) ZZZZZZ.

    Anyway, osso_abook_contact_matches_phone_number() seems exactly like the method I want to be using in Hermes 🙂


  19. @Germán Póo-Caamaño:
    You mean full number *including* the local prefix? Googling a bit it looks like it’s 6 if you exclude the local prefix and I would assume that the local prefix is needed when you call from a mobile phone. Or am I wrong?


  20. @barisione:
    You are right, it’s 7 digits. And my N900 shows the name, not the number.

    Mobile numbers in my country have 3 digits for the mobile operator followed by a 6-digit number (e.g. 041 123456); landlines have a 2-digit area code and 7 digits for the number (e.g. 01 1234567).

    The phone checks 7 digits, so if I make a mistake in the operator code (031, 041, 050, 040, 070) it sees only 2 operators instead of 5, as the phone distinguishes only the final 1 and 0.

    I have 2 persons with the same mobile number but a different operator in my phonebook. It doesn’t matter who calls, the phone displays the same name. Also, I often type numbers directly, not from the phonebook. If I make a mistake in the operator number I won’t know until the wrong person answers. Still, the display will show the picture and name of the person I wanted to call 🙂

    I have a few years old SE 710i and that phone doesn’t have such a problem.


  21. @sponka:
    As I said there is no way to properly and universally compare 2 different phone numbers; you have to choose between showing the contact even if the prefix is wrong or not being able to match the contact at all in a lot of countries…


  22. So, N900 (Maemo) is the way it is, but in Harmattan they could at least use the prefix returned from the GSM network cell, and if the incoming caller ID starts with that prefix – make sure to have any contact with a full match (all digits) be returned before checking only ID without prefix (if the prefix match). Last resort matching 6 digits.
    a) any contact with full number==caller ID
    b) caller ID minus GSM cell prefix, if caller ID starts with the prefix
    c) matching last 6 digits only


  23. @Fredrik Wendt:
    How would that help? You would still need to do all the matching in any case, even if you find a contact matching an early step of matching.


  24. Quick note: the format +XX (0) YYY… is quite commonly seen in Sweden too – you dial the zero if you’re on a landline, or whenever you don’t dial the country code.

    My suggestion could help the case where the owner of the phone has entered the wrong area code for a phone number, as described by Sponka in comment #15.

    Another example: My son has a cell phone with the number +46XXX121492 and the landline number to our home is +46YY121492 (YY is (0)31).

    My suggestion would match correctly when my son calls me from his cell, but if he calls from home it’s still arbitrary (the caller ID would be 031121492, which doesn’t match the GSM prefix my network cell reports (+46) so 6-digit matching would apply).

    I could write you a unit test case in Python, Java or JavaScript if you think that’d be useful. 🙂

    Are lookups this expensive that an if statement (and some substring/concatenation), potentially resulting in three different lookups on the address/phone book is worth avoiding?


  25. No, they are not too expensive, but I don’t see how your algorithm can improve the quality of matches. You would always do the 6/7 digits matching for any phone number.


  26. Sorry, would you mind explaining that more specifically? I’m not sure I follow: are you suggesting that my suggestion would break anything, or change behaviour for the worse? My suggestion doesn’t solve everything, it just improves a couple of cases.


  27. An always fixed 6-7 digits match is not good enough. As an example Egypt is moving to 8 digits after the area / operator code, ex. 023 12345678. The current search algorithm in PR1.2 will not work.

    Why not before searching with only 7 digits, search with the full caller number, if it matches one contact then show it. Else reduce the digits and look for a 7 digits match.

    Use more than one way to try to find only 1 match.

    If I do not have contacts with the same full number, I would expect the caller name to appear even if some contacts share the last 6 or 7 digits as long as the carrier is sending a longer number of the caller.


  28. @Fredrik Wendt, Mahmoud:
    Multiple contacts can have the same phone number. It’s needed to match *all* of them, not just one.
    For instance, if you have your mother saved with phone number +991234567 and your dad with 01234567, you must match both of them when you look for +991234567, 01234567 or 00991234567.


  29. @barisione:
    If the caller number has a country code prefix + or 00 at the beginning you can exclude the country code before the search so +99 123456789 will become 123456789 and search first with the full local number 123456789 before searching for 3456789. (country codes can be used from the ISO)

    For your example, I agree there will not be a unique match: after excluding the +99 there are two contacts with the same number, so it will show the number only (duplication). In this case the user is insisting on saving two contacts with the same number.

    but if:
    Contact A saved number: +99 123456789
    Contact B saved number: 3456789
    Caller: +88 123456789

    The call might better be identified as coming from Contact B NOT Contact A (and there is no duplication) because the country code mismatch with Contact A. After finding a match, if both the caller and contact have country codes they must match, otherwise look for another match.

    and if (no country code)
    Caller: 01212345678
    Contact A: 01212345678
    Contact B: 01012345678

    A match for “12345678” will find two but a match for “212345678” will find only one.

    If this is expensive or complex, maybe making the number of digits to search part of the regional settings or configurable is better.


  30. @Mahmoud:
    That doesn’t even work with the examples I gave in the blog post.
    You cannot really rely on that:
    – A phone number could be callable with multiple different codes;
    – The international call prefix is different in different countries (+, 00, 011);
    – The same country could have different call prefixes for different purposes (like selecting a different operator);
    – New countries or countries in war often use phone numbers for another country and then slowly move to a new scheme.


  31. @barisione: Interesting. And thanks for sharing. On one level, I’m truly sorry that you have to take countries at war into account.
    At another level, I’m sorry that the code being discussed here isn’t free, so that I could change it to work the way I want it to. Obviously, I’m prepared to move the code further than you are (in order to make it work better for me in my situation): you want to address the world’s requirements, I’m interested in mine.

    That OpenMoko on my desk smiles at me. The N900 does too, but gives me a different feeling. 😉


  32. I am working on validating the UK mobile phone format. Users do not want to strictly follow any rules. I came up with the algorithm below to hopefully cover the different cases of input. Please note that mobile numbers in the UK have the format 07XXXXXXXXX, but it gets confusing when allowing users to enter the country code.

    1. The system first strips out any non-numeric text:
    Examples:
    Case 1: +44 (0)7966 335 263 becomes 4407966335263
    Case 2: 0044 – 7966 335 263 becomes 00447966335263
    Case 3: 07966335263 no change

    2. If first 2 digits are “00”, strip them.
    4407966335263 no change
    00447966335263 becomes 447966335263
    07966335263 no change

    3. Then, if the first 2 digits are “44”, strip them.
    4407966335263 becomes 07966335263
    447966335263 becomes 7966335263
    07966335263 no change

    4. Then, first 2 digits must be “07”; or first digit must be “7”. The rest of the number must have 9 digits. Otherwise, wrong format.

    Could you please check whether this works?
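
The four steps of this comment translate fairly directly into code. The sketch below follows the description literally; the function name is made up, and the rules (07 or 7 followed by 9 digits) are the commenter's, not an official specification of UK mobile numbers.

```python
def looks_like_uk_mobile(raw):
    """Apply the four validation steps described in the comment."""
    # 1. Strip out anything that is not a digit.
    digits = "".join(c for c in raw if c in "0123456789")
    # 2. If the first 2 digits are "00", strip them.
    if digits.startswith("00"):
        digits = digits[2:]
    # 3. Then, if the first 2 digits are "44", strip them.
    if digits.startswith("44"):
        digits = digits[2:]
    # 4. The number must now start with "07" or "7" ...
    if digits.startswith("07"):
        rest = digits[2:]
    elif digits.startswith("7"):
        rest = digits[1:]
    else:
        return False
    # ... and the rest of the number must have exactly 9 digits.
    return len(rest) == 9


for case in ("+44 (0)7966 335 263", "0044 – 7966 335 263", "07966335263",
             "020 7321 2233"):
    print(case, "->", looks_like_uk_mobile(case))
```

The three cases from the comment are accepted, while the landline number used as an example in the post is rejected.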


  33. @Hugh Nguyen:
    Are you sure you want to change what the user inserts? The format could change (and it happened in the past) while your application could not be updated.
    What if the user wants to insert a non-UK phone number? When I moved to the UK I needed some weeks before getting a UK phone number. Then why just mobile phones?
    Throwing away something in parentheses is dangerous as somebody could put the whole prefix in parentheses.
    If you really want just UK mobile phone numbers your algorithm could work.


  34. @barisione, thanks for the reply.
    The users are in the UK only. For people who come to the UK, the forms often have more contact phone number fields so they can enter a landline number, which is not validated.

    I’m also not convinced about the need to validate mobile phone numbers. But sometimes it’s just because the stakeholders want it. Sometimes we don’t challenge calls like this because that costs more trouble than just doing it. I will present this algorithm along with some situations (like the one you mentioned about people using non-UK mobile numbers) and see if they still want to do it.

