C++ - utf8::next() of utfcpp tries to iterate past the end of the string

- July 15, 2010

I'm using utfcpp to work with UTF-8 encoded strings stored in std::string objects.

I want to iterate over the code points. utf8::next()

uint32_t next(octet_iterator& it, octet_iterator end);

seems to be the way to do this. Here's a test program to illustrate my use:

std::string u8("Hello UTF-8 \u2610\u2193\u2190\u0394 World!\n");
std::cout << u8 << std::endl;

uint32_t cp = 0;
std::string::iterator b = u8.begin();
std::string::iterator e = u8.end();
while (cp = utf8::next(b,e))
    printf("%d, ", cp);

This extracts the characters fine. However, the program then throws a not_enough_room exception, which indicates that "it gets equal to end during the extraction of a code point", right after printing 10, the ASCII newline control character:

Hello UTF-8 ☐↓←Δ World!
72, 101, 108, 108, 111, 32, 85, 84, 70, 45, 56, 32, 9744, 8595, 8592, 916, 32, 87, 111, 114, 108, 100, 33, 10, terminate called after throwing an instance of 'utf8::not_enough_room'
  what():  Not enough space

Obviously, providing the end iterator is not sufficient to keep utf8::next from trying to read past the end of the string.

I'm also confused by the utf8::unchecked::next() function, which does not take an end iterator. How does it know when to stop? Is catching an exception the normal control flow for detecting the end of the string? I must be missing something.

I think you are responsible for checking whether the iterator is equal to end() before calling next().
This should work without an exception being thrown:

[...]
uint32_t cp = 0;
std::string::iterator b = u8.begin();
std::string::iterator e = u8.end();
while ( b != e ) {
    cp = utf8::next(b,e);
    printf("%d, ", cp);
}

Generally, the use of exceptions for control flow is considered an anti-pattern.
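To make the point reproducible without the library, here is a minimal sketch of the same loop shape. The decoder next_code_point() is a hypothetical stand-in with the same (it, end) interface as utf8::next(); it is not utfcpp's implementation, it assumes well-formed UTF-8, and it throws std::out_of_range rather than utfcpp's exception types. The helper code_points() is likewise made up for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a checked decoder shaped like utf8::next(it, end).
// Assumption: the input is well-formed UTF-8; only running out of octets is
// reported. This is NOT utfcpp's code.
template <typename OctetIterator>
std::uint32_t next_code_point(OctetIterator& it, OctetIterator end)
{
    // Called with nothing left to read: the situation the question runs into.
    if (it == end)
        throw std::out_of_range("no octets left to decode");

    const unsigned char lead = static_cast<unsigned char>(*it++);
    std::uint32_t cp = lead;   // single-octet (ASCII) case
    int trailing = 0;          // number of continuation octets expected
    if (lead >= 0xF0)      { cp = lead & 0x07; trailing = 3; }
    else if (lead >= 0xE0) { cp = lead & 0x0F; trailing = 2; }
    else if (lead >= 0xC0) { cp = lead & 0x1F; trailing = 1; }

    for (; trailing > 0; --trailing) {
        if (it == end)
            throw std::out_of_range("sequence truncated");
        cp = (cp << 6) | (static_cast<unsigned char>(*it++) & 0x3F);
    }
    return cp;
}

// Collects every code point of s. The loop ends because it reaches end,
// not because a zero code point is returned or an exception is thrown.
std::vector<std::uint32_t> code_points(const std::string& s)
{
    std::vector<std::uint32_t> out;
    std::string::const_iterator it = s.begin();
    const std::string::const_iterator end = s.end();
    while (it != end)
        out.push_back(next_code_point(it, end));
    return out;
}
```

The loop in the question only stops when a code point of 0 comes back, but the range [begin(), end()) of a std::string does not include a terminating zero, so after the newline the next call starts with the iterator already equal to end. The same comparison against end is what the caller has to do when using a decoder that takes no end iterator at all, since such a function has no way of knowing where the data stops.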
