The task is to validate if the text consists of numbers delimited by double colons '::'. There may be no number between double colons. Single colons are not alowed.
Valid samples:
'', '::', '123::', '::123', '123', '::::', '::1234::5678901234567890::::1234567890::888'
Invalid samples:
':', '123:', ':::', ':::123'
A programmer used regexp /^(\d+|::)*$/ to solve the task. But it takes too long or doesn’t work on the text below:
var str = '::1234::5678901234567890::::1234567890::888:' alert( str.match(/^(\d+|::)*$/) ) // should return null
Explain, why? Give a pattern which works.
P.S. The example is from an article in Perl Journal by Jeffrey Friedl. Nice reading btw.
First solve the task in the case when there are no tags ending with <br/>. Use lookahead.
The first half is «Why it happens».
The reason is, of course, backtracking. If the string is valid, no backtracking occurs:
var str = '::1234::5678901234567890::::1234567890::888' alert( str.match(/^(\d+|::)*$/) ) // should return null
The backtracking happens if there’s no match at the last char, like below:
var str = '::1234::5678901234567890::::1234567890::888:' alert( str.match(/^(\d+|::)*$/) )
The regexp engine backtracks and tries every possible combination of multiple \d+ in numbers composing the string.
In short, there’s no difference between:
var str = '::1234::5678901234567890::::1234567890::888:' str.match(/^(\d+|::)*$/)…And the following simpler example:
var str = '123456789012345678901234567890888' str.match(/^(\d+)*$/) )
In both cases, the infinite backtracking occurs because of (\d+)*.
Let’s try to shorten the string:
var str = '::1234567890::888:' alert( str.match(/^(\d+|::)*$/) ) // now works
The shorter string works, because there are much less combinations in it.
What’s the correct variant?
The pattern below has another strategy of matching.
var str = '::1234::5678901234567890::::1234567890::888:' alert( str.match(/^(\d*::)*\d*$/) )
Here we match as many numbers \d* followed by '::' as possible. The numbers are optional, that’s why *.
We have to add an optional number \d at the end, because otherwise, for /^(\d::)*/, the text would have to end with ::.
This pattern also backtracks, of course, because there is a greedy quantifier inside. But the backtracking is much simpler and faster now. Think how it goes in your free time.