10M events per second into HDFS, sub-second queries over 20GB of HDFS data… All of this and more is demonstrated live in this presentation, which explores real-time data ingest into Hadoop.
It presents the architectural trade-offs and demonstrates alternative implementations that strike the appropriate balance across the following common challenges:
* Decentralized writes (multiple data centers and collectors)
* Continuous availability and high reliability
* No loss of data
* Elasticity when introducing more writers
* Bursts in speed per syslog emitter
* Continuous, real-time collection
* Flexible write targets (local FS, HDFS, etc.), as sketched below
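As a minimal illustration of the last point, Hadoop's `FileSystem` API lets the same writer code target either the local file system or HDFS purely through configuration. The sketch below is an assumption-laden example (the `EventSink` class and the URIs are illustrative, not taken from the presentation):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative sketch of a flexible write target: the same writer can
 * append event batches to the local file system or to HDFS, depending
 * only on the URI it is configured with, e.g. "file:///tmp/events"
 * versus "hdfs://namenode:8020/events".
 */
public class EventSink {

    private final FileSystem fs;
    private final Path targetDir;

    public EventSink(String targetUri) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem.get resolves to LocalFileSystem or DistributedFileSystem
        // based on the URI scheme, so the writer code stays identical.
        this.fs = FileSystem.get(URI.create(targetUri), conf);
        this.targetDir = new Path(targetUri);
    }

    /** Writes one batch of events as a single file under the target directory. */
    public void writeBatch(String fileName, Iterable<String> events) throws Exception {
        try (FSDataOutputStream out = fs.create(new Path(targetDir, fileName))) {
            for (String event : events) {
                out.writeBytes(event);
                out.writeBytes("\n");
            }
        }
    }
}
```

With this kind of abstraction, switching a collector from local spooling to direct HDFS writes becomes a configuration change rather than a code change.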
[vimeo 74713138 480 270]
Video producer: http://jz13.java.no/