How to clean up this Python output
I am trying to use the Python module textract to extract text from images. Since the images contain noise, the output I am getting contains noise in addition to the actual text I am interested in. Can anyone suggest code or the best way to clean up the output?
Here is my code:
>>> import glob
>>> import textract
>>> for i in glob.glob("*.jpg"):
...     print(textract.process(i))
Here is the output:
... -s. 4‘-0-.r-v .- 5,14,45_18685-m c. .4 "v-0-an .- 5,14,44_17793-m 5,13,66 17951-n 5,13,65_17959-n
Basically, I want only the lines that start with the number "5" and nothing else. I added a line to the code above, but it still didn't work the way I expected.
Here is the revised code:
>>> for i in glob.glob("*.jpg"):
...     text = textract.process(i)
...     if text.startswith('5'):
...         print(text)
And here is the output of the revised code:
5,13,66 17951-n 5,13,65_17959-n
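`text.startswith('5')` tests the whole extracted string at once, so it only matches when an image's entire output begins with "5". Maybe you should try splitting the extracted text into lines first: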
>>> for i in glob.glob("*.jpg"):
...     text = textract.process(i)
...     # split the text into separate lines
...     for line in text.split('\n'):
...         if line.startswith('5'):
...             print(line)
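The same idea can be wrapped in a small helper that is easier to test without any images. This is only a sketch: the function name `lines_starting_with` and the sample bytes are made up for illustration, and it assumes that `textract.process()` hands back bytes on Python 3, which is why the text is decoded before the prefix check. It also strips each line, on the assumption that OCR output may carry stray leading spaces.

```python
def lines_starting_with(raw, prefix="5", encoding="utf-8"):
    """Return the stripped lines of `raw` that begin with `prefix`."""
    # Assumption: the OCR result may arrive as bytes, so decode it
    # before comparing against a str prefix.
    if isinstance(raw, bytes):
        raw = raw.decode(encoding, errors="replace")

    kept = []
    for line in raw.splitlines():
        line = line.strip()  # drop stray whitespace around the line
        if line.startswith(prefix):
            kept.append(line)
    return kept


# Hypothetical sample standing in for the OCR output of one image.
sample = b"... -s.\n.- 5,14,45_18685-m\n5,13,66 17951-n\n  5,13,65_17959-n\n"
print(lines_starting_with(sample))
```

Inside the loop above it would be called as `lines_starting_with(textract.process(i))`. Note that a line such as `.- 5,14,45_18685-m`, where noise comes before the "5", is dropped by a plain prefix test.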