Python - removing Wikipedia synonym brackets
I want to strip the brackets from Wikipedia synonym links.
Here is the easy case:
He is [[Korean]]. I can remove these brackets.
Here is the difficult one:
He lives in [[Gimhae City|Gimhae]]. The first part (Gimhae City) is the Wikipedia document title,
so I want to keep only the second part inside the brackets.
Any suggestions are welcome.
You can use the following regex:
\[{2}(?:[^|\]]*\|)?([^]]*)]{2}
and replace with \1.
Here is what the regex matches:
- \[{2} - 2 opening square brackets
- (?:[^|\]]*\|)? - 0 or 1 sequence of 0 or more characters other than | and ] (with [^|\]]*), followed by a literal | (\|, note that it must be escaped outside of a character class)
- ([^]]*) - matches and captures into group 1 (which we'll reference later with \1) 0 or more characters other than a closing square bracket
- ]{2} - 2 closing square brackets (note that we do not have to escape them here since the first [ is escaped).
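The breakdown above can also be written directly into the pattern with Python's re.VERBOSE flag, which ignores whitespace outside character classes and allows # comments. This is only a sketch of the same regex; the name WIKI_LINK is an assumption made for illustration.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
# WIKI_LINK is a hypothetical name used only for this sketch.
WIKI_LINK = re.compile(r"""
    \[{2}              # two opening square brackets
    (?:[^|\]]*\|)?     # optional title part: chars other than | and ], then a literal |
    ([^]]*)            # group 1: the text to keep, up to the first closing bracket
    ]{2}               # two closing square brackets
""", re.VERBOSE)

print(WIKI_LINK.sub(r"\1", "he lives in [[gimhae city|gimhae]]."))
```

Keeping the comments next to each piece makes it easier to see that the title part is matched but never captured, so \1 always refers to the text that should remain.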
The Python snippet:

import re

# The optional "title|" prefix is matched but not captured; group 1 is the text to keep.
p = re.compile(r'\[{2}(?:[^|\]]*\|)?([^]]*)]{2}')
test_str = "he lives in [[gimhae city|gimhae]]. lives in [[gimhae]]. "
result = re.sub(p, r"\1", test_str)
print(result)
# => he lives in gimhae. lives in gimhae.
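For repeated use, the substitution can be wrapped in a small helper and checked against both cases from the question. This is a minimal sketch; the function name strip_wiki_links and the module-level name _LINK are assumptions, not part of the original answer.

```python
import re

# Pattern taken from the answer; _LINK and strip_wiki_links are hypothetical names.
_LINK = re.compile(r'\[{2}(?:[^|\]]*\|)?([^]]*)]{2}')

def strip_wiki_links(text):
    """Replace [[title|label]] with label and [[title]] with title."""
    return _LINK.sub(r'\1', text)

# The easy case, the difficult case, and text without any links.
print(strip_wiki_links("he [[korean]]."))                       # he korean.
print(strip_wiki_links("he lives in [[gimhae city|gimhae]]."))  # he lives in gimhae.
print(strip_wiki_links("no links here"))                        # no links here
```

Compiling the pattern once at module level avoids rebuilding it on every call, and text that contains no double brackets passes through unchanged.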