Twitter Heron: Stream Processing at Scale, SIGMOD 2015.
This is a dense paper that covers many aspects. I tried, but still cannot understand the whole paper in every detail, and I may also misunderstand some points in this review.
Scheduling in Storm
Storm has a very complex, multi-level scheduling model.
Several worker processes are scheduled by the operating system on a host. Inside each JVM worker process, every executor is mapped to two threads. These threads, in turn, are scheduled by the JVM using a preemptive, priority-based scheduling algorithm. Since each thread has to run several tasks, the executor implements yet another scheduling algorithm to invoke the appropriate task based on the incoming data. These multiple levels of scheduling and their complex interaction often lead to uncertainty about when tasks are actually scheduled.
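The executor-level dispatch described above can be sketched as follows. This is a minimal illustration, not Storm's actual API: the names (`Task`, `Executor`, `Tuple`) and the routing-by-task-id scheme are assumptions made for the example. The point is that one executor thread multiplexes several tasks, picking which task to invoke based on the incoming tuple.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ExecutorSketch {
    // Hypothetical task interface: a task just processes an incoming tuple.
    interface Task { void execute(String value); }

    // A tuple addressed to a specific task inside the executor.
    static class Tuple {
        final int taskId; final String value;
        Tuple(int taskId, String value) { this.taskId = taskId; this.value = value; }
    }

    // One executor thread hosting several tasks: it pulls tuples off its
    // queue and dispatches each to the task it is addressed to.
    static class Executor implements Runnable {
        private final Map<Integer, Task> tasks = new HashMap<>();
        private final BlockingQueue<Tuple> inbox = new LinkedBlockingQueue<>();

        void addTask(int id, Task t) { tasks.put(id, t); }
        void submit(Tuple t) { inbox.add(t); }

        @Override public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Tuple t = inbox.take();          // wait for incoming data
                    Task task = tasks.get(t.taskId); // executor-level "scheduling"
                    if (task != null) task.execute(t.value);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // shut down the loop
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Executor exec = new Executor();
        StringBuilder log = new StringBuilder();
        exec.addTask(1, v -> log.append("task1:").append(v).append(";"));
        exec.addTask(2, v -> log.append("task2:").append(v).append(";"));

        // The executor runs on a JVM thread, which the JVM itself schedules
        // preemptively; the dispatch above is a second scheduling layer.
        Thread worker = new Thread(exec);
        worker.start();
        exec.submit(new Tuple(1, "a"));
        exec.submit(new Tuple(2, "b"));
        Thread.sleep(200);
        worker.interrupt();
        worker.join();
        System.out.println(log);   // task1:a;task2:b;
    }
}
```

Even in this toy version, when `task1` actually runs depends on the OS scheduling the process, the JVM scheduling the worker thread, and the executor loop reaching its tuple, which is exactly the layered uncertainty the paper criticizes.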
Recovery in Storm
In Storm, each worker can run disparate tasks, and it is not possible to isolate the resource usage of individual tasks. Since logs from multiple tasks are written into a single file, it is also hard to identify which task a given error or exception belongs to.
Waste of Resources
Storm assumes that every worker is homogeneous, so each worker is allocated the same resources; if one component needs far more memory than the others, every worker must be provisioned for that worst case. This results in inefficient utilization of allocated resources and often in over-provisioning, and the problem gets worse as more diverse components are packed into a single worker.
Because of this over-provisioning, workers end up with large heaps; when a worker performs a heap dump or suffers a long garbage-collection pause, it misses sending heartbeat signals and is treated as dead, even though it is still making progress.
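The heartbeat failure mode can be sketched with a simple liveness check. This is a hypothetical supervisor-style check, not Storm's actual implementation, and the 3-second timeout is an illustrative value, not Storm's default: any pause longer than the timeout (a full GC, a heap dump) is indistinguishable from a crash.

```java
public class HeartbeatSketch {
    // Illustrative heartbeat timeout; not Storm's actual default.
    static final long TIMEOUT_MS = 3_000;

    // A worker is considered alive only if its last heartbeat is recent.
    static boolean isAlive(long lastHeartbeatMs, long nowMs) {
        return nowMs - lastHeartbeatMs <= TIMEOUT_MS;
    }

    public static void main(String[] args) {
        long last = 10_000;
        // Heartbeat 2s ago: within the timeout, worker looks healthy.
        System.out.println(isAlive(last, 12_000));
        // An 8s GC pause or heap dump: no heartbeat, worker looks dead.
        System.out.println(isAlive(last, 18_000));
    }
}
```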
One review in Chinese: http://www.longda.us/?p=529, and the author's response: https://gist.github.com/maosongfu/c3aeb1bb5eb7b39fcdc5