Wednesday, November 27, 2013

Redis for Data Analysis

Attractor does a lot of realtime calculations. In this post I would like to share my insights about working with streaming data using Redis. Realtime data analysis raises many challenging issues, among them the following:
  • How to store data?
  • Which data structures should be used?
  • How to keep data persistent?
  • How to make queries fast? 
The list is not exhaustive. However, it shows what you should keep in mind while building your own data warehouse with realtime data processing.

You should definitely look through the following presentation:

  • "High-Volume Data Collection and Real Time Analytics Using Redis" by cacois


My personal insights:

You should not use Redis as a replacement for HDFS or other "big data" storage. Redis keeps the whole dataset in memory, so raw data can be stored there, but only for a short time. Transfer raw data to another storage solution regularly.
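
A minimal sketch of this retention pattern in Python, assuming a redis-py style client `r` (any object with `rpush`, `expire`, `lrange` and `delete`). Raw events go into hourly Redis lists that carry a TTL as a safety net, and a periodic job copies each finished hour to a file and then deletes it from Redis. The helpers `raw_key`, `record_raw` and `export_bucket`, the key scheme, the two-day TTL and the file sink are all hypothetical choices for illustration, not how Attractor does it.

```python
import json
from datetime import datetime, timezone

# Assumption: raw events live in Redis for at most two days.
RAW_TTL_SECONDS = 2 * 24 * 3600


def raw_key(ts: float) -> str:
    """Hourly bucket for raw events, e.g. 'raw:2013112714' (hypothetical naming scheme)."""
    return "raw:" + datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d%H")


def record_raw(r, event: dict, ts: float) -> None:
    """Append one raw event to its hourly Redis list."""
    key = raw_key(ts)
    r.rpush(key, json.dumps(event))
    # Safety net: (re)set the TTL on every write so the bucket expires
    # even if the export job never runs.
    r.expire(key, RAW_TTL_SECONDS)


def export_bucket(r, ts: float, out) -> int:
    """Copy a closed hourly bucket to `out` (a text file-like object), then delete it.

    Only call this for hours that are over: events pushed between LRANGE
    and DEL would be lost.
    """
    key = raw_key(ts)
    events = r.lrange(key, 0, -1)
    for raw in events:
        # redis-py returns bytes unless the client uses decode_responses=True.
        out.write((raw.decode() if isinstance(raw, bytes) else raw) + "\n")
    out.flush()
    r.delete(key)  # reached only if every write above succeeded
    return len(events)
```

The file stands in for the long-term store; in practice the sink would be HDFS, a database or object storage, and the job would run from a scheduler once per hour.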

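Use each data structure where it fits. You should clearly understand when it is appropriate to use sorted sets, hashes or bitmaps. If you choose them well, Redis can handle billions of records with millisecond response times, because common updates on these structures (HINCRBY, SETBIT, ZADD) run in constant or logarithmic time.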
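
A minimal sketch of that split for page-view analytics, in Python. It only builds the Redis commands as tuples, so it runs without a server. The function names, key names, the dense integer `user_id` and the question each structure answers are assumptions for illustration: the hash counts views per hour, the bitmap records which users were seen that day, and the sorted set ranks pages.

```python
from typing import List, Tuple

# A Redis command as (name, *arguments), ready for client.execute_command(*cmd).
Command = Tuple[object, ...]


def page_view_commands(day: str, hour: int, user_id: int, page: str) -> List[Command]:
    """Commands that update the aggregates for one page view (hypothetical key names).

    `user_id` is assumed to be a dense, non-negative integer so it can be
    used directly as a bitmap offset.
    """
    return [
        # Hash: one field per hour; HINCRBY is O(1). Answers "views per hour".
        ("HINCRBY", f"views:{day}", hour, 1),
        # Bitmap: one bit per user; SETBIT is O(1). BITCOUNT gives unique visitors.
        ("SETBIT", f"visitors:{day}", user_id, 1),
        # Sorted set: score = views of the page; ZINCRBY is O(log N). Gives a ranking.
        ("ZINCRBY", f"pages:{day}", 1, page),
    ]


def daily_report_commands(day: str, top_n: int = 10) -> List[Command]:
    """Read-side queries over the structures written above."""
    return [
        ("HGETALL", f"views:{day}"),
        ("BITCOUNT", f"visitors:{day}"),
        ("ZREVRANGE", f"pages:{day}", 0, top_n - 1, "WITHSCORES"),
    ]
```

With redis-py, each tuple can be sent as r.execute_command(*cmd), or batched into one round trip by calling execute_command on a pipeline from r.pipeline() and then pipe.execute(). The bitmap is compact only when ids are dense: ids below 1,000,000 need about 125 KB per day, one bit each.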

Don't forget about data persistence. Enable AOF with an fsync every second (appendonly yes and appendfsync everysec in redis.conf), and keep RDB dumps in a safe place.
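
One way to keep an eye on this policy is a periodic check of the server's own persistence status. The sketch below is a pure Python function over fields from INFO persistence (aof_enabled, aof_last_write_status, rdb_last_bgsave_status, rdb_last_save_time) and the appendfsync setting. The function name `persistence_problems`, the one-day dump age limit and the messages are assumptions for illustration, not recommendations from the post.

```python
import time
from typing import Dict, List, Optional


def persistence_problems(info: Dict, appendfsync: str,
                         max_dump_age: float = 24 * 3600,
                         now: Optional[float] = None) -> List[str]:
    """Check persistence status against "AOF every second + regular dumps".

    `info` is the dict from INFO persistence (redis-py: r.info("persistence")),
    `appendfsync` the value from CONFIG GET appendfsync.
    `max_dump_age` (one day) is an assumed threshold.
    """
    now = time.time() if now is None else now
    problems = []
    if info.get("aof_enabled") != 1:
        problems.append("AOF is disabled (appendonly no)")
    if appendfsync != "everysec":
        problems.append(f"appendfsync is {appendfsync!r}, expected 'everysec'")
    # Older servers may not report this field; treat a missing value as healthy.
    if info.get("aof_last_write_status", "ok") != "ok":
        problems.append("last AOF write failed")
    if info.get("rdb_last_bgsave_status") != "ok":
        problems.append("last RDB dump failed")
    if now - info.get("rdb_last_save_time", 0) > max_dump_age:
        problems.append("newest RDB dump is too old")
    return problems
```

Against a live server with redis-py this might be called as persistence_problems(r.info("persistence"), r.config_get("appendfsync")["appendfsync"]) from a scheduled job that alerts when the list is not empty. For the "safe place" part, copying the RDB file to another machine while Redis is running is safe, because Redis writes each new snapshot to a temporary file and renames it into place only when it is complete.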
