Regexes: The Bad, the Better, and the Best

Not all regular expression engines work in the way being described here. The step-through in this post demonstrates the (unoptimized) algorithm that Java, Ruby, Perl, Python, and PHP use, which is the recursive backtracking algorithm.

The author should emphasise that point more. The whole point the article makes is moot if regex is compiled to a DFA before use. A DFA will match a character in strictly O(1), so no matter how complex the regex is, strings will always be matched in linear time. Even more, a more “general” regex will typically have fewer states in the DFA than the specific one, so it may be better to use it instead.

All the regexes she used in the article are in fact “simple enough” that the Thompson algorithm and subset construction work, so maybe an example that actually uses the fancy features would be more illustrative.

Anúncios

Deixe um comentário

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s