Twitter Heron: Stream Processing at Scale

Datetime:2016-08-23 02:37:03          Topic: Heron           Share

Twitter Heron: Stream Processing at Scale on Sigmod 2015.

Heron is the newly announced Twitter Stream Processing Engine. It has already used in production for about half a year. One year ago Twitter Released their Stream Processing Engine Storm@twitter .

This is a huge paper, which covers many aspects. I tried, but still cannot understand the whole paper in every detail. I may also miss understand some points in this review.

Scheduling in Storm

Storm has a very complexity multiple levels of scheduling.

Several instances of worker processes are scheduled by the operating system in a host. Inside the JVM process, each executor is mapped to two threads. In turn, these threads are scheduled using a preemptive and priority-based scheduling algorithm by the JVM. Since each thread has to runseveral tasks, the executor implements another scheduling algorithm to invoke the appropriate task, based on the incoming data. Such multiple levels of scheduling and their complex interaction often leads to uncertainty about when the tasks are being scheduled.

Recovery in Storm

In Storm, each worker can run disparate task. It is not possible to isolate the resource usage of tasks. Since logs from multiple tasks are written into a single file, it is hard to identify any errors or exceptions that are associated with a particular task.

Wast of Resource

Storm assumes that every worker is homogeneous. This results in inefficient utilization of allocated resources, and often results in over-provisioning. This problem gets worse with increasing number of diverse components being packed into a worker.

Missing heartbeats

Because of the over-provisioning, when a worker is executing a heap dump and long garbage collection it misses sending heartbeats signals.

One review in Chinese: http://www.longda.us/?p=529And the author’s argue: https://gist.github.com/maosongfu/c3aeb1bb5eb7b39fcdc5





About List