java - Tokenize Timestamp with JFlex and handle ISO format
I am trying to solve a date tokenization issue with various date formats, and one of them may be in ISO 8601 format, using 'T' as a delimiter. I want to be able to treat the character 'T' as a timestamp delimiter only when it has a digit preceding and following it.
For instance, if I have an array

    String[] timestamp = {"time: 12/45/60", "2015-07-13T05:30:59"};

I want a split result of

    (time) (:) (12) (/) (45) (/) (60) (2015) (-) (07) (-) (13) (T) (05) (:) (30) (:) (59)
I am using JFlex to make the tokenizer, and I wrote the .flex file as such:

    %%
    %class Lexer

    specialt  = (\dT\d)
    parameter = [:jletterdigit:]+
    delimiter = [^a-zA-Z0-9]|{specialt}

    %%
    [:digit:]+   { return new DateToken(yytext(), "int"); }
    {delimiter}  { return new DateToken(yytext(), "delimiter"); }
    {parameter}  { return new DateToken(yytext(), "text"); }
However, the tokenizer parses out the symbols, but not the 'T'. Does anyone have suggestions? Thank you very much.
It works for me, or rather it works the way the grammar describes things.
<!-- language: lang-none -->

    7-29T08:42
    [int] 7
    [delimiter] -
    [text] 29T08
    [delimiter] :
    [int] 42
    [delimiter]
Indeed, after the scanner matches `-` as a {delimiter}:

- it gets `2`, which matches [:digit:]+ and {parameter}
- it gets `9`, which matches [:digit:]+ and {parameter}
- it gets `T`, which doesn't match [:digit:]+ but still matches {parameter}
- then `0` and `8` keep matching {parameter}
- `:` doesn't match {parameter}, and since the scanner always prefers the longest match, the {parameter} token `29T08` is returned.
Note that {specialt} is recognized if you enter just that:
<!-- language: lang-none -->

    5T6
    [delimiter] 5T6
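The behaviour described here can be reproduced outside JFlex with a small Java sketch of the longest-match rule. This is only an approximation under stated assumptions: `LongestMatchDemo` and `tokenize` are hypothetical names, the three rules are rewritten as `java.util.regex` patterns, and [:jletterdigit:] is narrowed to ASCII letters and digits. It is not how JFlex is implemented internally; it only applies the same two tie-breaking rules (longest match first, then the earlier rule).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchDemo {
    // Hypothetical stand-ins for the three rules, in the order they appear in the spec.
    private static final String[] NAMES = {"int", "delimiter", "text"};
    private static final Pattern[] RULES = {
        Pattern.compile("[0-9]+"),                    // [:digit:]+
        Pattern.compile("[^a-zA-Z0-9]|[0-9]T[0-9]"),  // {delimiter}, including {specialt}
        Pattern.compile("[a-zA-Z0-9]+")               // {parameter}, ASCII-only assumption
    };

    // Splits the input by repeatedly taking the longest match at the current position.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            String bestName = null;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = NAMES[i];
                }
            }
            tokens.add("[" + bestName + "] " + input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("7-29T08:42").forEach(System.out::println);
        tokenize("5T6").forEach(System.out::println);
    }
}
```

At `29T08` the digit rule can only offer two characters while the text rule offers five, so the text rule wins and the `T` never reaches the delimiter rule. For `5T6` the delimiter and text rules tie at three characters, and the delimiter rule wins because it is listed first.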
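Your first problem is that specialt captures too much: it consumes the digits on both sides of the T, not just the T itself.
Your second problem is that {parameter} matches virtually everything, so it swallows any run of digits and letters before {specialt} gets a chance.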
I suggest you define the ISO date more accurately:
    // hh:mm or hh:mm:ss
    isotime = {dig2} {delimiter} {dig2} ({delimiter} {dig2})?
    // yyyy-mm-dd or yyyy-mm-ddT<isotime>
    isodate = {dig4} ({delimiter} {dig2}){2} (T {isotime})?
This will create a nice token for a full 2015-07-29T16:42.
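As a rough illustration of the same idea in plain Java, the sketch below builds the ISO pattern from smaller pieces and then splits a recognized timestamp into the fields and delimiters the question asks for. Everything here is an assumption made for the example: the class `IsoDateSketch` and its methods are hypothetical, `DIG2` and `DIG4` stand for exactly two and four digits, and the separators are fixed to `-` and `:` instead of the more general {delimiter}. It checks only the shape of the timestamp, not whether the values are a valid date.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IsoDateSketch {
    // Assumed helper definitions: exactly two and exactly four digits.
    private static final String DIG2 = "[0-9]{2}";
    private static final String DIG4 = "[0-9]{4}";
    // hh:mm or hh:mm:ss
    private static final String ISO_TIME = DIG2 + ":" + DIG2 + "(?::" + DIG2 + ")?";
    // yyyy-mm-dd or yyyy-mm-ddT<isotime>
    private static final Pattern ISO_DATE =
        Pattern.compile(DIG4 + "(?:-" + DIG2 + "){2}(?:T" + ISO_TIME + ")?");
    // A piece is either a run of digits or a single non-digit character.
    private static final Pattern PIECE = Pattern.compile("[0-9]+|[^0-9]");

    // True only if the whole string has the ISO date shape.
    static boolean isIsoDate(String s) {
        return ISO_DATE.matcher(s).matches();
    }

    // Splits a recognized ISO timestamp into its numeric fields and delimiters.
    static List<String> split(String isoDate) {
        if (!isIsoDate(isoDate)) {
            throw new IllegalArgumentException("not an ISO date: " + isoDate);
        }
        List<String> parts = new ArrayList<>();
        Matcher m = PIECE.matcher(isoDate);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate("2015-07-29T16:42"));
        System.out.println(split("2015-07-13T05:30:59"));
    }
}
```

Because the T is only accepted inside a complete timestamp, a fragment such as `29T08` is rejected, and the T comes out as its own piece once the whole token has been recognized.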