Мастер-классы по Javascript Екатеринбург Ростов-на-Дону Москва Узнать больше...
Содержание (скрыть) Содержание (показать)

Fix slow regexp

The task is to validate if the text consists of numbers delimited by double colons '::'. There may be no number between double colons. Single colons are not alowed.

Valid samples:

'', '::', '123::', '::123', '123', '::::',
'::1234::5678901234567890::::1234567890::888'

Invalid samples:

':', '123:', ':::', ':::123'

A programmer used regexp /^(\d+|::)*$/ to solve the task. But it takes too long or doesn’t work on the text below:

var str = '::1234::5678901234567890::::1234567890::888:'

alert( str.match(/^(\d+|::)*$/) ) // should return null

Explain, why? Give a pattern which works.

P.S. The example is from an article in Perl Journal by Jeffrey Friedl. Nice reading btw.

Решение, шаг 1
Решение
Решение, шаг 1

First solve the task in the case when there are no tags ending with <br/>. Use lookahead.

Решение, шаг 2
Решение, шаг 2

The first half is «Why it happens».

The reason is, of course, backtracking. If the string is valid, no backtracking occurs:

var str = '::1234::5678901234567890::::1234567890::888'

alert( str.match(/^(\d+|::)*$/) ) // should return null

The backtracking happens if there’s no match at the last char, like below:

var str = '::1234::5678901234567890::::1234567890::888:'

alert( str.match(/^(\d+|::)*$/) )

The regexp engine backtracks and tries every possible combination of multiple \d+ in numbers composing the string.

In short, there’s no difference between:

var str = '::1234::5678901234567890::::1234567890::888:'
str.match(/^(\d+|::)*$/)
…And the following simpler example:
var str = '123456789012345678901234567890888'
str.match(/^(\d+)*$/) )

In both cases, the infinite backtracking occurs because of (\d+)*.

Let’s try to shorten the string:

var str = '::1234567890::888:'

alert( str.match(/^(\d+|::)*$/) )  // now works

The shorter string works, because there are much less combinations in it.

What’s the correct variant?

The pattern below has another strategy of matching.

var str = '::1234::5678901234567890::::1234567890::888:'

alert( str.match(/^(\d*::)*\d*$/) )

Here we match as many numbers \d* followed by '::' as possible. The numbers are optional, that’s why *.

We have to add an optional number \d at the end, because otherwise, for /^(\d::)*/, the text would have to end with ::.

This pattern also backtracks, of course, because there is a greedy quantifier inside. But the backtracking is much simpler and faster now. Think how it goes in your free time.

#286
Наверх

Реклама

Нашли опечатку?

Нашли опечатку на сайте? Что-то кажется странным?
Выделите соответствующий текст и нажмите Ctrl+Enter!

Последние Комментарии

Помоги другим!

Помоги другим узнать о хорошей статье!