How to clean up this Python output


I am trying to use the Python module textract to extract text from images. Since the images contain noise, the output I am getting contains noise in addition to the actual text I am interested in. Can anyone suggest code or the best way to clean up the output?

Here is my code:

>>> import glob
>>> import textract
>>> for i in glob.glob("*.jpg"):
...     print(textract.process(i))

Here is the output:

...       -s.  4‘-0-.r-v .-  5,14,45_18685-m  c.  .4         "v-0-an .-  5,14,44_17793-m   5,13,66  17951-n   5,13,65_17959-n 

Basically, I want the lines that start with the number "5" and nothing else. I added a check to the code above, but it still didn't work the way I expected.

Here is the revised code:

>>> for i in glob.glob("*.jpg"):
...     text = textract.process(i)
...     if text.startswith('5'):
...         print text

And here is the output of the revised code:

5,13,66  17951-n   5,13,65_17959-n 

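Maybe you should try splitting the extracted text into lines first. `text.startswith('5')` tests the whole text extracted from an image, so it only matches when the very first character is "5", and then it prints everything from that image: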

>>> for i in glob.glob("*.jpg"):
...     text = textract.process(i)
...     # split the text into individual lines
...     for line in text.split('\n'):
...         if line.startswith('5'):
...             print line
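The snippet above uses Python 2 print statements. As a rough sketch of the same idea for Python 3, the filtering can be pulled out into a small helper. The function name `lines_starting_with`, its parameters, and the sample data are all made up for illustration, and the sketch assumes the extracted text may arrive as bytes (which I believe is what `textract.process` returns, but check this for your version). It also strips each line, on the assumption that OCR output can carry stray leading whitespace.

```python
def lines_starting_with(text, prefix="5", encoding="utf-8"):
    """Return the lines of `text` that begin with `prefix`.

    `text` may be bytes or str. Bytes are decoded first, and any
    undecodable bytes are replaced rather than raising an error.
    """
    if isinstance(text, bytes):
        text = text.decode(encoding, errors="replace")
    kept = []
    for line in text.splitlines():
        line = line.strip()  # assumption: leading/trailing whitespace is noise
        if line.startswith(prefix):
            kept.append(line)
    return kept


# Hypothetical sample resembling the OCR output in the question.
sample = b'... -s.\n5,14,45_18685-m\nc.\n  5,13,66  17951-n\n"v-0-an .-\n'
for line in lines_starting_with(sample):
    print(line)
```

In the loop from the question you would pass the result of `textract.process(i)` in place of `sample`. If lines that merely start with "5" still let noise through, a stricter test (for example a regular expression matching the `5,NN,NN` pattern) might be worth trying.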
