Brazilian web giant Globo.com has completed a revamp of its live streaming architecture, built on DataStax's distribution of the Cassandra distributed database, to cope with increased demand during the Olympic Games.
Globo.com is the internet arm of Grupo Globo, one of the 5 largest media conglomerates in the world, producer and exporter of content such as series, soaps and shows worldwide. During the Games, the company will be broadcasting 100 percent of the competitions online.
Cassandra addressed the firm's concerns about scaling the storage behind its live streaming: the platform will be managing some 324 new streams covering the various Olympic events, more than three times the normal workload.
"Going back to any given point in a program is a common functionality in cable TV, but we needed to cater for the live demand and consequently, look at ways to improve the storage of the live streaming," Globo.com product manager for the online video platform, Igor Macaubas, tells ZDNet.
Ramping up storage
Having a platform to ingest and stream content live is crucial for Globo. This is currently done using the HTTP Live Streaming (HLS) protocol, which enables content broadcasting to web, mobile and smart TV users.
The need for streaming came up in 2014, just before the World Cup. Globo.com then introduced a Digital Video Recording (DVR) feature underpinned by the in-memory data store Redis, which allowed users to pause and rewind videos on a cache-like setup.
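To illustrate what a cache-like DVR setup implies, here is a minimal sketch in Python. It is not Globo.com's implementation and does not use Redis: the class name, the window size and the segment numbering are all hypothetical. It only shows the general idea of keeping the most recent segments of a live stream in a bounded buffer so a viewer can rewind from the live edge.

```python
from collections import deque
from typing import List


class DvrWindow:
    """Bounded in-memory buffer holding the newest segments of one live stream.

    Hypothetical illustration only; names and sizes are assumptions.
    """

    def __init__(self, max_segments: int) -> None:
        # A deque with maxlen drops the oldest entry once it is full,
        # which gives the cache-like behaviour of a rolling DVR window.
        self._segments = deque(maxlen=max_segments)

    def append(self, sequence: int, payload: bytes) -> None:
        """Record a newly ingested segment at the live edge."""
        self._segments.append((sequence, payload))

    def rewind(self, segments_back: int) -> List[int]:
        """Return sequence numbers from `segments_back` before the live edge up to it."""
        if segments_back < 0:
            raise ValueError("segments_back must be non-negative")
        recent = list(self._segments)
        # Clamp to the oldest segment still held in the window.
        start = max(0, len(recent) - 1 - segments_back)
        return [sequence for sequence, _ in recent[start:]]


window = DvrWindow(max_segments=5)
for sequence in range(8):
    window.append(sequence, b"")

print(window.rewind(2))
print(window.rewind(10))
```

A window like this is cheap for one or two streams, but its memory footprint grows with every stream and rendition held at once.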
The feature worked well for the football competitions since the firm was only broadcasting a maximum of two matches at a time. The requirement to scale then became apparent during the General Elections that year, when the Internet group needed to broadcast 27 candidate debates simultaneously.
High availability and scalability were key concerns in terms of storage to support the live streaming operation, which Globo.com sought to address by trying out the open source version of Cassandra.
"Shortly after the World Cup we started to use DVR more broadly but it wouldn't scale at all in the way we needed: streaming one live channel was already a challenge, let alone more than 100 channels. That's what prompted our decision to try Cassandra," says Macaubas.
"We have a strong open source culture at Globo and we could hear that Netflix was one of the companies using [Cassandra] and recommending it strongly so we went for it and allocated 4 super servers to record all of our streaming," he adds.
According to Macaubas, the team needed to produce six different versions of the streaming to adapt to the user's broadband quality, meaning it actually had about 600 streaming channels being broadcast within the initial 100 channels.
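The arithmetic behind that figure, and the way HLS exposes several renditions of one channel, can be sketched as follows. The six bandwidth values and the URI layout are assumptions made for illustration; the article does not give Globo.com's actual bitrate ladder or paths. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags are standard in an HLS master playlist.

```python
from typing import List

# Hypothetical bitrate ladder in bits per second: six renditions per channel.
# These values are illustrative and are not Globo.com's real settings.
RENDITIONS_BPS = [264_000, 464_000, 864_000, 1_400_000, 2_500_000, 4_500_000]


def total_streams(channels: int, renditions_per_channel: int) -> int:
    """Each channel is encoded once per rendition, so the counts multiply."""
    return channels * renditions_per_channel


def master_playlist(channel: str, bandwidths: List[int]) -> str:
    """Build an HLS master playlist listing one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for bps in bandwidths:
        # BANDWIDTH lets the player pick a variant that fits the connection.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bps}")
        # Hypothetical URI layout for the rendition's media playlist.
        lines.append(f"{channel}/{bps}/index.m3u8")
    return "\n".join(lines) + "\n"


print(total_streams(100, len(RENDITIONS_BPS)))
print(master_playlist("channel-001", RENDITIONS_BPS))
```

With 100 channels and six renditions each, `total_streams` returns 600, which matches the figure Macaubas cites.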
"That means we have a huge demand in terms of real-time recording and also that if we introduce latency, this will generate delays so we have to be fast - and that's where Cassandra shines," the manager says.
After the positive experience from the Brazilian elections, Globo's online team was confident enough to test the new architecture on its two main products, Globosat Play and Globo Play. This resulted in 168 new streams, worth 88GB of disk space, being ingested and streamed on the platform.
The media firm then decided to move on to DataStax's enterprise version of Cassandra earlier this year, following a realization that getting access to round the clock support was crucial to prepare for the Olympic Games.
"We had some serious issues upgrading from one version to the next of the open source version of Cassandra and realized that we had much more complex needs than what our in-house skills would be able to address, so we decided to use the enterprise version in order to get the support we needed," says Macaubas.
Globo has its own infrastructure for streaming and does not use content delivery networks (CDNs). The company also has access to an Internet Exchange Point (IXP) provided by the Brazilian Network Information Center for traffic offloads, and has distribution agreements with most local Internet Service Providers.
The media company currently uses two DataStax Enterprise 4.8 datacenters with four nodes each, one based in Rio de Janeiro and the other in São Paulo. According to Macaubas, the company already employed 60 servers for live broadcasting and that number was increased to about 100 to cover the demand during the Games.
The replication factor is two and query consistency levels are set to one. Because the dataset is never updated, there is no need to check consistency on other nodes, which improves response time. Each node runs RHEL 6.5 and has 24 cores, 64GB of RAM and a 1TB SSD.
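A short sketch makes the trade-off concrete. It is plain Python arithmetic, not driver code, and it assumes the replication factor of two applies as a single value; the article does not say how replicas are split between the two datacenters. The quorum formula (half the replication factor, rounded down, plus one) is Cassandra's standard definition.

```python
def quorum(replication_factor: int) -> int:
    """Cassandra's quorum: floor(RF / 2) + 1 replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer before a request succeeds."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def tolerated_failures(level: str, replication_factor: int) -> int:
    """How many replicas can be down while requests at this level still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


RF = 2  # replication factor reported in the article
for level in ("ONE", "QUORUM", "ALL"):
    print(level, replicas_required(level, RF), tolerated_failures(level, RF))
```

With a replication factor of two, a quorum needs both replicas, so consistency level one is the only setting that still succeeds when one replica is down. Since segments are written once and never updated, a single replica that holds the data cannot return a stale version of it.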
Several load tests were also carried out over a two- to three-week period ahead of the Games, simulating the equivalent of double the number of signals and the demand expected for the period, and putting the distribution infrastructure under load as well.
According to Macaubas, one of the main technical lessons learned during the progressive adoption of Cassandra was the requirement to use solid state drives (SSDs). That is because they handle writes better than traditional spinning disks. "You absolutely cannot do Cassandra clusters without SSD," he says.
However, the biggest lesson was not technology-related, but rather a human one. "We are very happy with our current Cassandra set up and we have a very good level of support," Macaubas says. "But this journey has also shown that you also have to be humble enough to ask for help when you need it."