regex - Automatically find a short regexp to match a set of words?


I'm not looking for a specific regular expression, but for software that finds them.

Let's say I have a file A and a file B: how do I find a regexp that matches all the words of A, but none of the words in B?

If A contains "truit fruit" and B contains "ridiculous", the software should return ".ru.", whereas '.r.' would be invalid.

This is the "practical" side of question [1]; what interests me here is finding actual software that solves it in practice.

Thanks for your help,

nathann

[1] https://cstheory.stackexchange.com/questions/1854/is-finding-the-minimum-regular-expression-an-np-complete-problem

There is no algorithm that can somehow "cleverly derive" a regular expression from examples. You can implement a brute-force attempt that iterates through permutations of the substrings common to the words in A and tests each candidate against B until it finds a solution. It is not guaranteed to find a solution, though.
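A minimal sketch of that brute-force idea, assuming "matches a word" means `re.search` finds the pattern somewhere in it. The helper names `substrings` and `brute_force_regex` are hypothetical, and for simplicity it only tries single literal substrings (shortest first) rather than all permutations:

```python
import re
from typing import Optional


def substrings(word: str) -> set[str]:
    """All non-empty contiguous substrings of a word."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}


def brute_force_regex(a: list[str], b: list[str]) -> Optional[str]:
    """Try substrings shared by every word of A, shortest first, and return the
    first one (as an escaped regex) that matches no word of B."""
    if not a:
        return None
    common = set.intersection(*(substrings(w) for w in a))
    for candidate in sorted(common, key=lambda s: (len(s), s)):
        pattern = re.compile(re.escape(candidate))
        if not any(pattern.search(w) for w in b):
            return pattern.pattern
    return None  # no single common substring separates A from B
```

For A = ["truit", "fruit"] and B = ["ridiculous"] this returns "t", since "i" and "r" also occur in "ridiculous". As the answer says, nothing guarantees a result: if every shared substring also appears in B, it returns None.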

For the case where the words in A share no common substring, you could extend the approach to introduce the "or" operator into the regular expressions. That gets ugly and slow.
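One way to bring in "or" is a greedy cover: for each word of A not yet matched, pick its shortest substring that occurs in no word of B, then join the picks with `|`. This is a sketch under the same `re.search` assumption; `alternation_regex` is a hypothetical name, and greedy selection does not guarantee the shortest possible expression:

```python
import re
from typing import Optional


def sorted_substrings(word: str) -> list[str]:
    """Non-empty substrings of a word, shortest first (ties broken alphabetically)."""
    subs = {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}
    return sorted(subs, key=lambda s: (len(s), s))


def alternation_regex(a: list[str], b: list[str]) -> Optional[str]:
    """Greedily build alt1|alt2|... covering every word of A and no word of B."""
    if not a:
        return None
    chosen: list[str] = []
    for word in a:
        if any(alt in word for alt in chosen):
            continue  # already covered by an earlier alternative
        for cand in sorted_substrings(word):
            if not any(cand in other for other in b):
                chosen.append(cand)
                break
        else:
            return None  # every substring of this word also occurs somewhere in B
    return "|".join(re.escape(alt) for alt in chosen)
```

For A = ["cat", "dog"] and B = ["cow"] there is no common substring, and this yields "a|d".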

If that does not lead to a solution, you'd have to keep extending the attempts so that exclusion rules are added to the expression, iterating through the words in B and creating anti-patterns from them. A horrible attempt.
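A rough sketch of such an anti-pattern, assuming Python's `re` (which supports negative lookahead `(?!...)`) and single-line words. The hypothetical `with_exclusions` wraps a too-general pattern so that the words of B it wrongly matches are rejected by name, which shows why the result is "horrible": it simply memorizes B.

```python
import re


def with_exclusions(base: str, b: list[str]) -> str:
    """Wrap a too-general pattern so that the words of B it matches are excluded."""
    compiled = re.compile(base)
    false_positives = [w for w in b if compiled.search(w)]
    if not false_positives:
        return base  # nothing to exclude
    anti = "|".join(re.escape(w) for w in false_positives)
    # ^ anchors at the start of the string; the negative lookahead rejects a word
    # that is exactly one of the false positives; .*? then searches for the base pattern.
    return f"^(?!(?:{anti})$).*?(?:{base})"
```

For example, the base pattern "i" matches "ridiculous", so `with_exclusions("i", ["ridiculous"])` produces `^(?!(?:ridiculous)$).*?(?:i)`, which still matches "truit" and "fruit" but not "ridiculous".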

And as I said: it is never guaranteed to find a solution.


There is one thing, though:

If you are not interested in how the final regular expression looks, you can do this: create a regex by combining the words of a "whitespace padded version of A" with the "or" operation (so \struit\s|\sfruit\s in the example). This approach creates huge expressions. You also have to take care to exclude exact substrings that might occur in B again, which may lead to even longer expressions.
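A small sketch of the padded alternation, assuming Python's `re` and that the text being tested is padded with one space on each side so `\s` can also match at the very start and end. Both function names are hypothetical:

```python
import re


def padded_alternation(a: list[str]) -> str:
    """Build \\sword1\\s|\\sword2\\s|... from the words of A (each escaped)."""
    return "|".join(rf"\s{re.escape(w)}\s" for w in a)


def matches_padded(pattern: str, text: str) -> bool:
    """Pad the text with spaces so that \\s also matches at its start and end."""
    return re.search(pattern, f" {text} ") is not None
```

`padded_alternation(["truit", "fruit"])` gives `\struit\s|\sfruit\s`, which rejects "ridiculous". It would, however, accept an entry of B such as "fruit salad", because " fruit " occurs inside it. That is the substring problem mentioned above.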


Bottom line: there is no elegant solution to this, because the problem does not allow for one. The real question is: why do you need a regular expression at all? Why can't you use string comparisons? They are not more expensive anyway in such a vaguely defined scenario...
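For comparison, a minimal sketch of the string-comparison route, assuming whole-word matching is what's wanted. `make_matcher` is a hypothetical helper that just puts A into a set:

```python
from typing import Callable, Iterable


def make_matcher(a: Iterable[str], b: Iterable[str]) -> Callable[[str], bool]:
    """Return a predicate that accepts exactly the words of A, via set lookup."""
    a_set, b_set = set(a), set(b)
    overlap = a_set & b_set
    if overlap:
        # No matcher (regex or otherwise) can both accept and reject the same word.
        raise ValueError(f"words present in both A and B: {sorted(overlap)}")

    def accepts(word: str) -> bool:
        return word in a_set  # average O(1) hash lookup

    return accepts
```

This accepts every word of A and rejects every word of B by construction, with no search over candidate expressions.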

