Regex for UK Telephone Numbers

BeebMaster
Posts: 3520
Joined: Sun Aug 02, 2009 5:59 pm
Location: Lost in the BeebVault!

Regex for UK Telephone Numbers

Post by BeebMaster » Fri Feb 14, 2020 10:25 am

Can anyone suggest a regular expression to match UK telephone numbers? I have 600-odd plain text files to process and I need to match 'phone numbers in the text. I've tried several expressions from websites, and none has found a single number.

It's made me a bit suspicious that there may be hidden characters in the files, so I think I will have to check some of the files with a hex editor to make sure the 'phone numbers are only ASCII numeric characters plus brackets and space. Oh and plus.
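
A quicker check than opening each file in a hex editor is to scan the raw bytes for anything outside plain printable ASCII. The following is only a sketch in Python (not something the thread specifies); the function names and the `*.txt` folder layout are assumptions made for illustration.

```python
from pathlib import Path

# Bytes we expect in a plain ASCII text file: printable characters plus
# tab, line feed and carriage return.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def suspicious_bytes(data: bytes):
    """Return (offset, value) for every byte outside plain printable ASCII."""
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E or value in ALLOWED_CONTROL)
    ]


def scan_folder(folder: str, pattern: str = "*.txt") -> None:
    """Print the first few suspicious bytes of each matching file (hypothetical layout)."""
    for path in sorted(Path(folder).glob(pattern)):
        hits = suspicious_bytes(path.read_bytes())
        if hits:
            print(path, [(offset, hex(value)) for offset, value in hits[:10]])
```

A non-breaking space saved as UTF-8 between two digit groups, for example, would be reported as the byte pair 0xC2 0xA0.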

BeebMaster
Posts: 3520
Joined: Sun Aug 02, 2009 5:59 pm
Location: Lost in the BeebVault!

Re: Regex for UK Telephone Numbers

Post by BeebMaster » Fri Feb 14, 2020 12:15 pm

Well well, what a difference a day makes! This works:

Code: Select all

((\+44\s?\(0\)\s?\d{2,4})|(\+44\s?(01|02|03|07|08)\d{2,3})|(\+44\s?(1|2|3|7|8)\d{2,3})|(\(\+44\)\s?\d{3,4})|(\(\d{5}\))|((01|02|03|07|08)\d{2,3})|(\d{5}))(\s|-|.)(((\d{3,4})(\s|-)(\d{3,4}))|((\d{6,7})))
Although I did notice it pulls in some strings of 10/11 numbers which aren't telephone numbers (eg. "0123456789.pdf" etc) but I think I can live with that.
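
For anyone who wants to try the expression outside a text editor, here is a minimal Python sketch. The pattern is copied unchanged from the post; the names `UK_PHONE`, `STRICT` and `find_numbers` are made up for the example, and Python itself is an assumption, since the post does not say which tool was used.

```python
import re

# The expression from the post, unchanged, split across lines for readability.
UK_PHONE = re.compile(
    r"((\+44\s?\(0\)\s?\d{2,4})|(\+44\s?(01|02|03|07|08)\d{2,3})"
    r"|(\+44\s?(1|2|3|7|8)\d{2,3})|(\(\+44\)\s?\d{3,4})|(\(\d{5}\))"
    r"|((01|02|03|07|08)\d{2,3})|(\d{5}))"
    r"(\s|-|.)"
    r"(((\d{3,4})(\s|-)(\d{3,4}))|((\d{6,7})))"
)

# Variant with the dot in the separator group escaped, for comparison only.
STRICT = re.compile(UK_PHONE.pattern.replace(r"(\s|-|.)", r"(\s|-|\.)"))


def find_numbers(text, pattern=UK_PHONE):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]
```

Note that the third option in the separator group, `.`, is unescaped and so matches any character, a digit included. That appears to be what lets unspaced numbers such as 07772345678 match, and also why digit runs in filenames get through; with the dot escaped (`STRICT` above) the unspaced number is no longer found.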

RobC
Posts: 2963
Joined: Sat Sep 01, 2007 10:41 pm

Re: Regex for UK Telephone Numbers

Post by RobC » Fri Feb 14, 2020 12:50 pm

BeebMaster wrote:
Fri Feb 14, 2020 12:15 pm
I did notice it pulls in some strings of 10/11 numbers which aren't telephone numbers (eg. "0123456789.pdf" etc)
I worked on a similar problem once and used probabilities derived from order statistics to flag up cases like that.

1024MAK
Posts: 10229
Joined: Mon Apr 18, 2011 5:46 pm
Location: Looking forward to summer in Somerset, UK...

Re: Regex for UK Telephone Numbers

Post by 1024MAK » Fri Feb 14, 2020 12:51 pm

Hi Ian

Given the number ( :P ) of times there has been a change to the format and length of mainland U.K. telephone numbers, good luck with that!

Mark

BeebMaster
Posts: 3520
Joined: Sun Aug 02, 2009 5:59 pm
Location: Lost in the BeebVault!

Re: Regex for UK Telephone Numbers

Post by BeebMaster » Fri Feb 14, 2020 12:59 pm

Indeed. I remember the Current Bun or somesuch showing all the number changes since the original "Whitehall 1212" a few years back!

I'm pretty sure I'm not dealing with any old area codes before the extra 1 was introduced though.

Despite websites with sample expressions claiming infallibility, it probably isn't possible to catch everything. That string won't capture local numbers only ("222 3333") for example.

However it looks like if I add \s to the end of that string it omits "string-of-numbers.filetype" whilst still matching "call me on 0123456789 or 07772345678".

But now it doesn't match "my number is 01234567890." because of the full stop!

I think I will process the files using the string ending in \s and then manually look at matches with the original string.
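
An alternative to a trailing \s, if the regex engine supports lookahead, is to say what must not follow the number instead of what must. The sketch below uses Python as an assumption; `TAIL_GUARD`, `GUARDED` and `find_numbers` are illustrative names, and the base expression is the one posted earlier, unchanged.

```python
import re

# The original expression from earlier in the thread, unchanged.
BASE = (
    r"((\+44\s?\(0\)\s?\d{2,4})|(\+44\s?(01|02|03|07|08)\d{2,3})"
    r"|(\+44\s?(1|2|3|7|8)\d{2,3})|(\(\+44\)\s?\d{3,4})|(\(\d{5}\))"
    r"|((01|02|03|07|08)\d{2,3})|(\d{5}))"
    r"(\s|-|.)"
    r"(((\d{3,4})(\s|-)(\d{3,4}))|((\d{6,7})))"
)

# Negative lookahead: the match must not be followed by another digit,
# nor by a dot plus a word character (as in ".pdf"). A bare full stop at
# the end of a sentence is still allowed.
TAIL_GUARD = r"(?!\d|\.\w)"

GUARDED = re.compile(BASE + TAIL_GUARD)


def find_numbers(text):
    """Return every substring of text matched by the guarded pattern."""
    return [m.group(0) for m in GUARDED.finditer(text)]
```

Because the lookahead consumes nothing, the full stop or space after the number is not part of the match, and a number at the very end of a file is still found.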

jgharston
Posts: 4081
Joined: Thu Sep 24, 2009 12:22 pm
Location: Whitby/Sheffield

Re: Regex for UK Telephone Numbers

Post by jgharston » Fri Feb 14, 2020 3:58 pm

Not a regular expression, but pass it into and then out of the Phone library functions.

num$=FNPhone_ToStrF(FNPhone_FromStr(numb$))

will take a mangled number and return it as a properly-formatted number.

Code: Select all

$ bbcbasic
PDP11 BBC BASIC IV Version 0.32
(C) Copyright J.G.Harston 1989,2005-2020
>_

scruss
Posts: 269
Joined: Sun Jul 01, 2018 4:12 pm
Location: Toronto

Re: Regex for UK Telephone Numbers

Post by scruss » Sat Feb 15, 2020 4:52 am

BeebMaster wrote:
Fri Feb 14, 2020 12:59 pm
I'm pretty sure I'm not dealing with any old area codes before the extra 1 was introduced though.
That would likely be impossible to regex. Some cities had special short codes that allowed nearby but not quite local towns to be dialled at local rates. From Glasgow, "32" would get you East Kilbride and "36" would get you Killearn/Balfron.

Coeus
Posts: 1759
Joined: Mon Jul 25, 2016 12:05 pm

Re: Regex for UK Telephone Numbers

Post by Coeus » Wed Mar 18, 2020 1:48 pm

scruss wrote:
Sat Feb 15, 2020 4:52 am
BeebMaster wrote:
Fri Feb 14, 2020 12:59 pm
I'm pretty sure I'm not dealing with any old area codes before the extra 1 was introduced though.
That would likely be impossible to regex. Some cities had special short codes that allowed nearby but not quite local towns to be dialled at local rates. From Glasgow, "32" would get you East Kilbride and "36" would get you Killearn/Balfron.
I rather suspect there isn't even a limit on the total number of digits, as I believe it was possible to route calls directly through the network this way: if you knew the local code from Woodbridge to Ipswich (probably 9, which usually means "up a level", i.e. nearer to the core), then the code from Ipswich to Glasgow, then the 32 from Glasgow to East Kilbride, you could dial the whole call without STD.

IIRC this was eventually stopped because people were getting national calls at local rate, but then someone discovered how to use a penny whistle with the holes at the right frequencies to do in-band signalling directly, rather than relying on dialled digits being passed on.

Anyway, I have only just seen this thread or I would have replied earlier. Unlike BeebMaster's RE earlier in the thread, this Perl function doesn't look like it recognises numbers with the UK international prefix +44, but it is something we use to format telephone numbers for display. It removes any spaces the person entering the number has put in and instead puts them in the standard places, distinguishing between numbers normally written in two groups and those written in three.

Code: Select all

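# Format a UK telephone number for display. Spaces are stripped, then
# re-inserted in the standard places; unrecognised input is returned as-is.
sub FormatTelno {
    my $new = $_ = $_[0];
    tr/ //d;                                # remove spaces from $_
    # Numbers normally written as three groups:
    if (m/^(02\d)(\d{4})(\d{4,})$/   ||     # 02x xxxx xxxx
        m/^(011\d+)(\d{3})(\d{4,})$/ ||     # 011x xxx xxxx
        m/^(01\d1)(\d{3})(\d{4,})$/  ||     # 01x1 xxx xxxx
        m/^(070)(\d{4})(\d{4,})$/    ||     # 070 xxxx xxxx
        m/^(0[389]\d{2})(\d{3})(\d+)$/) {   # 03xx, 08xx and 09xx
        $new = "$1 $2 $3";
    # Everything else: five-digit code, then the local number.
    } elsif (m/^(0\d{4})(\d{6,})$/) {
        $new = "$1 $2";
    }
    return $new;
}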
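
Since the function above does not handle the +44 prefix, here is a rough Python sketch of the same idea with a normalisation step added in front. The grouping rules are transcribed from the Perl in the same order; `normalise_uk`, `format_telno` and the `INTL_PREFIX` pattern are illustrative assumptions, not part of the original function.

```python
import re

# Grouping rules transcribed from the Perl FormatTelno, in the same order.
THREE_GROUP_RULES = [
    re.compile(r"^(02\d)(\d{4})(\d{4,})$"),
    re.compile(r"^(011\d+)(\d{3})(\d{4,})$"),
    re.compile(r"^(01\d1)(\d{3})(\d{4,})$"),
    re.compile(r"^(070)(\d{4})(\d{4,})$"),
    re.compile(r"^(0[389]\d{2})(\d{3})(\d+)$"),
]
TWO_GROUP_RULE = re.compile(r"^(0\d{4})(\d{6,})$")

# Assumption: "+44", "(+44)" and "+44(0)" should all become a leading 0.
INTL_PREFIX = re.compile(r"^\(?\+44\)?(\(0\))?0?")


def normalise_uk(number: str) -> str:
    """Remove whitespace and rewrite a +44 prefix as the national leading 0."""
    compact = re.sub(r"\s+", "", number)
    return INTL_PREFIX.sub("0", compact, count=1)


def format_telno(number: str) -> str:
    """Format for display; return the input unchanged if no rule matches."""
    compact = normalise_uk(number)
    for rule in THREE_GROUP_RULES + [TWO_GROUP_RULE]:
        m = rule.match(compact)
        if m:
            return " ".join(m.groups())
    return number
```

As with the Perl, brackets and hyphens in the input are not stripped, so a number written as "(01234) 567890" would come back unchanged.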

julie_m
Posts: 227
Joined: Wed Jul 24, 2019 9:53 pm
Location: Derby, UK

Re: Regex for UK Telephone Numbers

Post by julie_m » Wed Mar 18, 2020 7:04 pm

UK payphones worked fundamentally differently than US ones. In the USA, the payphone sent tones up the line to the exchange to indicate coin insertions; in the UK, the exchange sent pulses down the line to indicate funds exhaustion.

Chaining together local codes was sort of possible but not always. In later years the short codes were crafted so as to thwart this, making use of the property that digits arriving up inbound trunks can be sent to a different hunt group than digits dialled on local phones. For example dialling from Burton-upon-Trent to Derby, the code was 93. To call from Derby to Ashbourne, the code was 91. Burton could call Ashbourne with the short code 939 (not 9391). It was thus impossible to chain Derby codes beginning with 9 from Burton (or its satellite exchanges Etwall, Barton-under-Needwood, Hoar Cross and Sudbury).

Etwall (028373) on the outskirts of Derby to Cannock (05435) in Staffordshire, a distance of over 50 km by road, was charged at local rate, since Etwall was a satellite of Burton-upon-Trent and Cannock was a satellite of Lichfield.

SimonSideburns
Posts: 553
Joined: Mon Aug 26, 2013 9:09 pm
Location: Purbrook, Hampshire

Re: Regex for UK Telephone Numbers

Post by SimonSideburns » Fri Mar 20, 2020 2:08 pm

What is interesting is that Portsmouth and Southampton share the area code 023, originally with 92 representing Portsmouth and 80 representing Southampton. This was at the time of the big switch over (or whatever it was called) back in 2000.

Calling my house from within the 023 area I simply need to use the 92nn nnnn number. This replaced the original 0705 nnnnnn code, which later went on to be 01705. Southampton used to be 01703 followed by a 6 digit local code.

I know that they've run out of 92 numbers and have started using 023 93nn nnnn. It confuses the hell out of some people (certainly some older folk) who think you've made a mistake and change it to 92, but that of course dials someone else entirely.

There does seem to be a lot of inconsistency with regard to how to format the new number. I've seen a multitude of ways, including (02392) 123456, 023 92 123456, 02392 123 456, and 023 9212 3456 (which is the preferred way I am led to believe). I'm not sure if it's usual these days to leave out the brackets surrounding the area code.
Just remember kids, Beeb spelled backwards is Beeb!

1024MAK
Posts: 10229
Joined: Mon Apr 18, 2011 5:46 pm
Location: Looking forward to summer in Somerset, UK...

Re: Regex for UK Telephone Numbers

Post by 1024MAK » Fri Mar 20, 2020 3:59 pm

To be fair, because of the bits and pieces way that they have gone about it since the big switch over, it’s a bit of a mess :(
I don’t blame the telecommunications companies, but rather the so called regulator Ofcom. I think a bunch of five year olds would make a better job than this lot. And that applies to the regulation and licensing of the radio bands as well.

Current network plan

Mark

jgharston
Posts: 4081
Joined: Thu Sep 24, 2009 12:22 pm
Location: Whitby/Sheffield

Re: Regex for UK Telephone Numbers

Post by jgharston » Fri Mar 20, 2020 8:41 pm

SimonSideburns wrote:
Fri Mar 20, 2020 2:08 pm
There does seem to be a lot of inconsistency with regard to how to format the new number. I've seen a multitude of ways, including (02392) 123456, 023 92 123456, 02392 123 456, and 023 9212 3456 (which is the preferred way I am led to believe). I'm not sure if it's usual these days to leave out the brackets surrounding the area code.
The correct way is 023 xxxx xxxx because, as with all 02x numbers, the local number is an 8-digit number. By writing it as 023xx or similar, the writer is lying (yes, explicitly and baldly lying) by stating that their communicant can use a 7-digit or 6-digit number to contact the destination.
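
That rule is simple enough to check mechanically. The following Python sketch flags a written number whose first group is not the real area code. It assumes only the code patterns mentioned in this thread (02x, 011x, 01x1, and five digits otherwise); the full numbering plan has other code lengths, and both function names are hypothetical.

```python
import re


def area_code_length(digits: str) -> int:
    """Expected area-code length, leading 0 included, for an 01/02 number.

    Assumption: only 02x, 011x and 01x1 are special-cased; every other
    number is treated as having a five-digit code.
    """
    if re.match(r"02\d", digits):
        return 3
    if re.match(r"011\d|01\d1", digits):
        return 4
    return 5


def grouping_is_misleading(written: str) -> bool:
    """True if the first written digit group is not the real area code."""
    groups = re.findall(r"\d+", written)
    if not groups:
        return False
    return len(groups[0]) != area_code_length("".join(groups))
```

This only looks at the first group, so "023 92 123456" passes even though the local number is split oddly.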

I mean, how confusing is the simple statement "02x areas have 8-digit numbers"?

And people can't even be arsed to check by simply picking up their telephone and attempting to dial the 6-digit or 7-digit number they are claiming they use. Or even just looking out of the window at other numbers in public view.

Well, telling their potential customers "F**** OFF we don't want to hear from you!!!!" will get them exactly what they are stating they want.

Kazzie
Posts: 1793
Joined: Sun Oct 15, 2017 8:10 pm
Location: North Wales

Re: Regex for UK Telephone Numbers

Post by Kazzie » Sat Mar 21, 2020 7:22 am

Cardiff had a similar situation, going from (01222) xxx xxx to (029) 20 xxx xxx. Many people assumed the 20 was part of the area code, then several years later 21 xxx xxx numbers started being issued.
BBC Model B 32K issue 7, Sidewise ROM board with 16K RAM
Archimedes 420/1 upgraded to 4MB RAM, ZIDEFS with 512MB CF card
RiscPC 600 under repair
Acorn System 1 home-made replica

1024MAK
Posts: 10229
Joined: Mon Apr 18, 2011 5:46 pm
Location: Looking forward to summer in Somerset, UK...

Re: Regex for UK Telephone Numbers

Post by 1024MAK » Sat Mar 21, 2020 8:09 am

Kazzie wrote:
Sat Mar 21, 2020 7:22 am
Cardiff had a similar situation, going from (01222) xxx xxx to (029) 20 xxx xxx. Many people assumed the 20 was part of the area code, then several years later 21 xxx xxx numbers started being issued.
Same in Bristol and surrounding area, same in a number of other places. Even technical people are confused. Soon after the change in the Bristol area I pointed out that the alterations to the signs showing the emergency contact numbers were incorrect. They had previously been 0272 xxx xxx and changed to 0117 9xxx xxx but the signs were altered to read 01179 xxx xxx. The signs should have said 0117 9xxx xxx.

When I raised this issue with the local manager, he said, it’s okay, if they dial the full number they will get through. But you don’t really want any confusion with an emergency telephone number.

Obviously since then more numbers have been added so the first digit of the local number may no longer start with 9.

The Bristol details are on this site.

One big disadvantage with this change was that exchange numbers in the area surrounding Bristol could no longer use the two digit short code (which started with a 9) to dial a Bristol telephone number. That was annoying :twisted: Lots of businesses and other organisations had to change all their publicity, signs, everything to remove the short code and show the new area code and the new local number.

That included private companies that had their own telephone systems (like my employer) as all the private exchange ‘phones had to have their labels showing the external direct dial number changed as well.

Now if it had been me doing the change, I would have added two digits to the front/start/MSD of the Bristol local numbers, and the first digit would not have been 0, 1 or 9. If you are going to cause widespread change, at least make it future proof for a long time period.

Mark

BigEd
Posts: 3340
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Re: Regex for UK Telephone Numbers

Post by BigEd » Sat Mar 21, 2020 8:18 am

(Edit: oops, pipped in the post.) I remember when Bristol changed - perhaps from 0272 with six digits to 0117 with seven digits, everyone got a 9 on the front of their number, and yet quite a few public signs and printed materials mistakenly wrote 01179 as the prefix.

I also remember Southampton was 0703. Those were the days. (I worked at Philips, their number was 0703 702 701.) Actually, it turns out I didn't remember that - I was sure it was 0203!

I see there's a whole page:
UK telephone code misconceptions

BigEd
Posts: 3340
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Re: Regex for UK Telephone Numbers

Post by BigEd » Sat Mar 21, 2020 8:22 am

And we shouldn't forget "That's 01 if you're outside London"

jgharston
Posts: 4081
Joined: Thu Sep 24, 2009 12:22 pm
Location: Whitby/Sheffield

Re: Regex for UK Telephone Numbers

Post by jgharston » Sat Mar 21, 2020 9:45 am

BigEd wrote:
Sat Mar 21, 2020 8:18 am
(Edit: oops, pipped in the post.) I remember when Bristol changed - perhaps from 0272 with six digits to 0117 with seven digits, everyone got a 9 on the front of their number, and yet quite a few public signs and printed materials mistakenly wrote 01179 as the prefix.
Similarly in Sheffield, though the worst I've seen is this!

They've managed to grasp the concept that they have a 7-digit number, but they *also* used a broken 6-digit area code, giving an 11-digit number (excluding the 0).

BeebMaster
Posts: 3520
Joined: Sun Aug 02, 2009 5:59 pm
Location: Lost in the BeebVault!

Re: Regex for UK Telephone Numbers

Post by BeebMaster » Sat Mar 21, 2020 11:41 am

We get that here too; most Oldham numbers are 0161 6xx xxxx, but lots of businesses use 01616 xxx xxx or 01616 xxxxxx for their number. I wouldn't like to say that this started as a result of the "shame factor" following the Oldham riots a few years back, but I don't remember seeing it beforehand, and lots of people living in outlying parts of Oldham have chosen to omit "Oldham" from their postal address from this time (eg. 23 Acacia Avenue, My Village, OLx xxx) and never restored it since.

SimonSideburns
Posts: 553
Joined: Mon Aug 26, 2013 9:09 pm
Location: Purbrook, Hampshire

Re: Regex for UK Telephone Numbers

Post by SimonSideburns » Sat Mar 21, 2020 7:12 pm

Like we're all saying, it's a total mish-mash: a conflicting, confusing and at times downright random mess.

I do wish the whole thing was changed, for the better, with complete updating to a more modern scheme, with future proofing built in, and without trying to preserve any old fashioned notion of backward compatibility.

But, I doubt that will happen, and they'll keep randomly making up new bodges to shoehorn extra capacity into a scheme that wasn't designed for it.

BigEd
Posts: 3340
Joined: Sun Jan 24, 2010 10:24 am
Location: West

Re: Regex for UK Telephone Numbers

Post by BigEd » Sat Mar 21, 2020 7:31 pm

There were some clever steps in the renumbering to prepare the ground. I'm not sure if there were mis-steps. For example, it looks like London numbers changed in 1990, then 1995, and again in 2000: "As a result of this history, there is now a widespread misconception that 0207 and 0208 are the dialling codes for parts of London."
