python - Unstructuring Wikipedia synonym brackets -


I want to strip Wikipedia's synonym (link) brackets from text.

Here's the easy case:

he [[korean]].

I can remove the brackets here.

Here's the difficult one:

he lives in [[gimhae city|gimhae]].

The first part (gimhae city) is the Wikipedia document title, so I want to keep only the second part inside the brackets.

Any suggestions are welcome.

You can use the following regex:

\[{2}(?:[^|\]]*\|)?([^]]*)]{2}

and replace with \1.


Here is what the regex matches:

  • \[{2} - two opening square brackets
  • (?:[^|\]]*\|)? - zero or one sequence of characters other than | and ] (the [^|\]]* part) followed by a literal | (written \|; note that it must be escaped outside of a character class)
  • ([^]]*) - matches and captures into group 1, which we reference later as \1, zero or more characters other than a closing square bracket
  • ]{2} - two closing square brackets (note that we do not have to escape them here: the opening [ characters are escaped, so no character class is open at this point).
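
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag; the names WIKI_LINK and strip_links are made up for this illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE so that
# each part carries its own comment. WIKI_LINK is a hypothetical name.
WIKI_LINK = re.compile(r"""
    \[{2}              # two opening square brackets
    (?:[^|\]]*\|)?     # optional title followed by a pipe (not captured)
    ([^]]*)            # group 1: the text to keep
    ]{2}               # two closing square brackets
""", re.VERBOSE)


def strip_links(text):
    """Replace every [[title|label]] or [[label]] with just the label."""
    return WIKI_LINK.sub(r"\1", text)


print(strip_links("he [[korean]]."))                       # easy case
print(strip_links("he lives in [[gimhae city|gimhae]]."))  # piped case
```

Both cases from the question go through the same substitution, because the title part is optional.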

The Python snippet:

    import re

    # Keep only the displayed text of each [[title|text]] or [[text]] link
    p = re.compile(r'\[{2}(?:[^|\]]*\|)?([^]]*)]{2}')
    test_str = "he lives in [[gimhae city|gimhae]]. lives in [[gimhae]]. "
    result = re.sub(p, r"\1", test_str)
    print(result)
    # => he lives in gimhae. lives in gimhae.
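
If the document title is also of interest (for example, to record which title each displayed word stands for), a variant of the same pattern can capture both parts. This is only a sketch that goes slightly beyond the question: the named groups title and label and the helper link_pairs are assumptions introduced here, not part of the original answer.

```python
import re

# Variant of the answer's pattern that captures the title as well.
# The group names "title" and "label" are hypothetical.
LINK = re.compile(r"\[\[(?:(?P<title>[^|\]]*)\|)?(?P<label>[^\]]*)\]\]")


def link_pairs(text):
    """Return (title, label) pairs; an unpiped link uses its label as title."""
    pairs = []
    for m in LINK.finditer(text):
        label = m.group("label")
        title = m.group("title") or label  # no pipe: title equals label
        pairs.append((title, label))
    return pairs


sample = "he lives in [[gimhae city|gimhae]]. he [[korean]]."
print(link_pairs(sample))
# Plain text with only the labels kept, as in the answer above
print(LINK.sub(lambda m: m.group("label"), sample))
```

The substitution at the end gives the same result as replacing with \1 in the original pattern; the pairs are an optional extra.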
