Difference between Task ID vs Message ID vs Stream ID in Storm? And how are "anchoring tuples" and "acker tasks" related in this context?
I want to know the difference among them, how they relate to each other, and the roles of "anchoring tuples" and "acker tasks". If possible, please give a detailed explanation with examples. I have read the official documentation and related articles, but I still have an unclear understanding of the topic.
StreamId: By default, there is a single (logical) stream called default. In some use cases it is necessary to have not a single (logical) stream but multiple streams (with different data in each stream). You can declare additional streams and assign an ID (i.e., a name) to each of them to distinguish them (this is done in the declareOutputFields(...) method). When "plugging" the topology together, a consumer subscribes to the default stream (as its input stream) by default, but you can explicitly specify the name of the stream you want to receive.
TaskId: Each spout/bolt has an assigned parallelism (i.e., degree of parallelism, dop). Thus, each spout/bolt is executed in multiple tasks, and each task is assigned an ID so that the tasks can be distinguished.
MessageId: If you want to use the fault-tolerance mechanism, you need to assign a unique ID to each tuple emitted by a spout.
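To make the role of the message ID concrete, here is a minimal plain-Java sketch of the bookkeeping a reliable spout typically does. It does not use the Storm API; the class ReplayingSourceSketch and all of its methods are hypothetical names chosen for illustration. The idea it models is that the spout remembers each emitted payload under its message ID, forgets it when that ID is acked, and queues it again when that ID is failed.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of a reliable source; NOT a Storm class, only the message-ID bookkeeping. */
class ReplayingSourceSketch {
    private final Queue<String> toEmit = new ArrayDeque<>();    // payloads waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>();  // message ID -> payload in flight
    private long nextId = 0;

    void offer(String payload) {
        toEmit.add(payload);
    }

    /** Emits the next payload and returns its message ID, or -1 if nothing is queued. */
    long emitNext() {
        String payload = toEmit.poll();
        if (payload == null) {
            return -1;
        }
        long id = nextId++;          // assumption: this sketch uses a fresh ID for every emit
        pending.put(id, payload);
        return id;
    }

    /** The tuple with this message ID was fully processed: forget it. */
    void ack(long id) {
        pending.remove(id);
    }

    /** The tuple with this message ID failed: queue its payload for replay. */
    void fail(long id) {
        String payload = pending.remove(id);
        if (payload != null) {
            toEmit.add(payload);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Without a message ID there would be no key under which to look up the payload again, which is why tuples emitted without one cannot be replayed.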
Acker tasks: These are used to process the ack messages from the bolts (i.e., the messages sent to the system when you call collector.ack(...) or collector.fail(...)) and to track whether tuples were processed successfully or failed. You usually don't need to care about them.
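As a rough illustration of what an acker task keeps track of, the following plain-Java sketch models the XOR bookkeeping described in Storm's documentation on guaranteed message processing: for every spout tuple the acker holds one 64-bit value, XORs in the ID of each tuple when it is created and again when it is acked, and treats a value of zero as "tuple tree fully processed". The class AckerSketch and its methods are hypothetical and heavily simplified; this is not Storm's actual acker code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an acker task; NOT Storm's real class, only the XOR idea. */
class AckerSketch {
    // spout tuple (root) ID -> XOR of all tuple IDs created or acked in its tree so far
    private final Map<Long, Long> ackVals = new HashMap<>();

    /** A spout emitted a new root tuple whose first tuple ID is tupleId. */
    void init(long rootId, long tupleId) {
        ackVals.put(rootId, tupleId);
    }

    /**
     * A bolt acked ackedId and, while processing it, emitted newChildIds anchored to it.
     * Returns true once the whole tree for rootId has been processed.
     */
    boolean ack(long rootId, long ackedId, long... newChildIds) {
        Long current = ackVals.get(rootId);
        if (current == null) {
            return false;                 // unknown or already completed root
        }
        long val = current ^ ackedId;     // the acked tuple cancels its own creation
        for (long child : newChildIds) {
            val ^= child;                 // each new child stays open until it is acked
        }
        if (val == 0L) {
            ackVals.remove(rootId);       // every created tuple was also acked
            return true;
        }
        ackVals.put(rootId, val);
        return false;
    }

    boolean isPending(long rootId) {
        return ackVals.containsKey(rootId);
    }
}
```

The appeal of this scheme is that the state per spout tuple has a fixed size no matter how large the tuple tree grows. The sketch assumes the caller supplies tuple IDs that do not accidentally cancel each other.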
Anchoring: Anchoring is the mechanism that tells Storm which input tuples were used to produce which output tuples. If, for example, you have a bolt that splits a sentence into words and emits one tuple per word, you anchor all words of the same sentence to that sentence tuple. (If one of the word tuples fails, Storm knows that it needs to replay the sentence tuple so that the lost tuple can be recovered.) On the other hand, if you aggregate, let's say, the last 5 input tuples to compute an average value, you buffer those 5 input tuples until you emit the average tuple, and you use all 5 input tuples as anchors for the single avg output tuple. (Again, if the output tuple is lost, Storm knows that it needs to replay the 5 input tuples so that the lost average tuple can be recomputed.) Be aware that you cannot use tuples as anchors once they have been acked. Thus, you need to delay acking an input tuple until it is no longer needed as an anchor.
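The two anchoring cases above (one sentence producing many words, and five inputs producing one average) can be modeled with a small plain-Java sketch. It does not use the Storm API; AnchorSketch, its methods, and the tuple names are hypothetical. It only records which spout tuples each emitted tuple descends from, so that a failure can be traced back to the tuples that would need to be replayed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of anchoring; NOT a Storm class, only the lineage bookkeeping. */
class AnchorSketch {
    // tuple name -> names of the spout tuples it was (transitively) derived from
    private final Map<String, Set<String>> roots = new HashMap<>();

    /** A spout tuple is its own root. */
    void emitFromSpout(String tuple) {
        Set<String> self = new TreeSet<>();
        self.add(tuple);
        roots.put(tuple, self);
    }

    /** A bolt emits a tuple anchored to zero or more input tuples. */
    void emitAnchored(String tuple, String... anchors) {
        Set<String> derivedFrom = new TreeSet<>();
        for (String anchor : anchors) {
            derivedFrom.addAll(roots.getOrDefault(anchor, Collections.<String>emptySet()));
        }
        roots.put(tuple, derivedFrom);   // no anchors means an empty set
    }

    /** Returns the spout tuples that must be replayed if the given tuple fails. */
    Set<String> fail(String tuple) {
        return roots.getOrDefault(tuple, Collections.<String>emptySet());
    }
}
```

In this model, a word anchored to "sentence-1" maps back to that one sentence, an average anchored to five inputs maps back to all five, and a tuple emitted with no anchors maps back to nothing, so its loss would go unnoticed.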