regex - Automatically find a short regexp to match a set of words?
I am not looking for a specific regular expression, but for software that can find them.
Let's say I have a file A and a file B: how can I find a regexp that matches all the words of A and does not match any of the words in B?
If A contains "truit fruit" and B contains "ridiculous", the software could return ".ru.", whereas '.r.' would be invalid.
This is the "practical" aspect of this question [1], though what interests me is finding actual software that solves it in practice.
Thanks for your help,
Nathann
There is no algorithm that somehow "cleverly derives" a regular expression from examples. You can implement a brute-force attempt: some kind of iteration through all permutations of the common substrings of the words in A, testing B against each candidate until you find a solution. That is not guaranteed to find a solution, though.
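To make the brute-force idea a bit more concrete, here is a minimal sketch in Python. It is deliberately simplified: it only tries single literal substrings shared by all words of A (no permutations or wildcards), and the function names `common_substrings` and `find_separating_substring` are made up for illustration.

```python
import re

def common_substrings(words):
    """Substrings that occur in every word, shortest first."""
    base = min(words, key=len)  # a common substring must fit in the shortest word
    subs = {base[i:j] for i in range(len(base)) for j in range(i + 1, len(base) + 1)}
    shared = (s for s in subs if all(s in w for w in words))
    return sorted(shared, key=lambda s: (len(s), s))

def find_separating_substring(words_a, words_b):
    """Return a pattern found in every word of A and in no word of B, or None."""
    for sub in common_substrings(words_a):
        pattern = re.escape(sub)
        rx = re.compile(pattern)
        if not any(rx.search(w) for w in words_b):
            return pattern
    return None  # no single common substring separates A from B
```

If the function returns None you are back at the problem described above: this simple search space just does not contain a solution.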
For the case that there are no common substrings shared by all words in A, you can extend the approach and introduce the "or" operator in the regular expressions. That gets ugly and slow.
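One possible reading of that "or" extension is a greedy cover, sketched below. It keeps picking literal substrings that occur in no word of B until every word of A is covered, then joins them with "|". The names `substrings` and `find_alternation` are hypothetical, and the result is in no way guaranteed to be the shortest expression.

```python
import re

def substrings(word):
    """All non-empty substrings of a word."""
    return {word[i:j] for i in range(len(word)) for j in range(i + 1, len(word) + 1)}

def find_alternation(words_a, words_b):
    """Greedily build 'x|y|...' matching every word of A and no word of B, or None."""
    # Only substrings that never occur in B are safe to use.
    candidates = {s for w in words_a for s in substrings(w)
                  if not any(s in b for b in words_b)}
    uncovered = set(words_a)
    chosen = []
    while uncovered:
        # Prefer the substring covering the most uncovered words, then the shortest.
        best = max(candidates,
                   key=lambda s: (sum(s in w for w in uncovered), -len(s), s),
                   default=None)
        if best is None or not any(best in w for w in uncovered):
            return None  # some word of A cannot be told apart from B this way
        chosen.append(best)
        uncovered = {w for w in uncovered if best not in w}
    return "|".join(re.escape(s) for s in chosen)
```

Even on small inputs this enumerates every substring of every word in A, which is exactly where the "ugly and slow" part comes from.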
If that does not lead to a solution, you'd have to go on extending your attempts so that exclusion rules are added to the expression, by iterating through all the words in B and creating anti-patterns from them. A horrible attempt.
And as said: it is never guaranteed to find a solution.
There is one thing though:
If you are not interested in how the final regular expression looks, you can do this: create a regex by combining all words in a "whitespace padded version of A" with an "or" operation (so \struit\s|\sfruit\s in your example). This attempt creates huge expressions. You have to take care to exclude exact substrings that might occur in B again. That may lead to even longer expressions.
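As a rough sketch of that padded alternation in Python: the helper names `padded_alternation` and `matches_word` are invented for this example, and padding the tested word with spaces is my assumption about how the "whitespace padded version" would be checked.

```python
import re

def padded_alternation(words_a):
    """Join all words of A into one alternation, each wrapped in \\s on both sides."""
    return "|".join(r"\s" + re.escape(w) + r"\s" for w in words_a)

def matches_word(pattern, word):
    """Test a single word; it is padded with spaces so \\s can match around it."""
    return re.search(pattern, f" {word} ") is not None

pattern = padded_alternation(["truit", "fruit"])
print(pattern)
print(matches_word(pattern, "fruit"), matches_word(pattern, "ridiculous"))
```

The expression grows linearly with the number of words in A, which is why it gets huge so quickly.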
Bottom line: there is no elegant solution to this, because the question does not allow one. The real question is: why do you have to use a regular expression? Why can't you use string comparisons? They are not more expensive anyway in such a vaguely defined scenario...
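For comparison, plain string comparison could be as small as the following sketch. `make_matcher` is a made-up name, and it assumes the words of A fit in memory as a set.

```python
def make_matcher(words_a):
    """Return a function that accepts exactly the words of A, using set lookup."""
    accepted = frozenset(words_a)

    def matches(word):
        # Exact string comparison: no regex involved.
        return word in accepted

    return matches

is_in_a = make_matcher(["truit", "fruit"])
print(is_in_a("fruit"), is_in_a("ridiculous"))
```

As long as A and B share no word, this accepts everything in A and rejects everything in B by construction.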