C++ - utf8::next() of utfcpp - tries to iterate past the end of the string
I'm using utfcpp to work with UTF-8 encoded strings stored in std::string objects.
I want to iterate over the code points. utf8::next()
uint32_t next(octet_iterator& it, octet_iterator end);
seems to be the way to do this. Here's a test program to illustrate my use:
    std::string u8("Hello UTF-8 \u2610\u2193\u2190\u0394 World!\n");
    std::cout << u8 << std::endl;
    uint32_t cp = 0;
    std::string::iterator b = u8.begin();
    std::string::iterator e = u8.end();
    while (cp = utf8::next(b,e))
        printf("%d, ", cp);
This extracts the characters fine. However, the program then throws a not_enough_room exception, which indicates that "it gets equal to end during the extraction of a code point", right after printing 10, the ASCII newline control character:
    Hello UTF-8 ☐↓←Δ World!
    72, 101, 108, 108, 111, 32, 85, 84, 70, 45, 56, 32, 9744, 8595, 8592, 916, 32, 87, 111, 114, 108, 100, 33, 10,
    terminate called after throwing an instance of 'utf8::not_enough_room'
      what(): Not enough space
Apparently, providing the end iterator is not sufficient to keep utf8::next from trying to read past the end of the string.
I'm also confused by the utf8::unchecked::next() function, which does not take an end iterator. How does it know when to stop? Is catching an exception the normal control flow for detecting the end of the string? I must be missing something.
I think you are responsible for checking whether the iterator is equal to end() before calling next().
This should work without an exception being thrown:
    [...]
    uint32_t cp = 0;
    std::string::iterator b = u8.begin();
    std::string::iterator e = u8.end();
    while ( b != e ) {
        cp = utf8::next(b,e);
        printf("%d, ", cp);
    }
Generally, the use of exceptions for control flow is considered an anti-pattern.
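To see why the original loop fails: its condition tests the returned code point, so it only stops on a code point of 0, which never occurs in that string. After the newline has been decoded, the iterator equals the end and the next call has nothing left to read. Below is a minimal sketch that runs without utfcpp. `decode_next` and `code_points` are hypothetical names, not part of the library; `decode_next` is a simplified stand-in that only mirrors the contract described above (it throws when it runs out of input) and does far less validation than the real utf8::next().

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for utf8::next(): decodes one code point and advances `it`.
// It throws std::out_of_range when the input runs out, including when it == end
// on entry. Validation is deliberately minimal (no overlong or surrogate checks).
std::uint32_t decode_next(std::string::const_iterator& it,
                          std::string::const_iterator end)
{
    if (it == end)
        throw std::out_of_range("not enough room");

    const unsigned char lead = static_cast<unsigned char>(*it++);
    std::uint32_t cp = 0;
    int trailing = 0;
    if (lead < 0x80)                { cp = lead;        trailing = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }
    else throw std::invalid_argument("invalid lead byte");

    for (; trailing > 0; --trailing) {
        if (it == end)
            throw std::out_of_range("not enough room");
        const unsigned char c = static_cast<unsigned char>(*it++);
        if ((c & 0xC0) != 0x80)
            throw std::invalid_argument("invalid continuation byte");
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}

// The caller, not the decoder, decides when to stop: compare the iterator
// with the end before every call.
std::vector<std::uint32_t> code_points(const std::string& s)
{
    std::vector<std::uint32_t> out;
    std::string::const_iterator it = s.begin();
    const std::string::const_iterator end = s.end();
    while (it != end)
        out.push_back(decode_next(it, end));
    return out;
}
```

With this loop, `code_points("A\xCE\x94\n")` yields 65, 916, 10 and returns normally. An exception is then left for genuinely truncated or malformed input rather than for the ordinary end of the string.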