Thursday, October 1, 2020

Data streaming 101: AWS Kinesis, Redshift, MySQL, Athena

These are my notes from a recent discovery task:
  • EC2 can write to a Kinesis data stream; a Firehose delivery stream can then transform the data with Lambda and store the result in S3, where EMR can pick it up for further processing.
  • EMR is a good choice for running Spark applications (Python, Java, Scala) and can use S3 as destination storage.
  • Athena performs 30% better on S3 data stored as Parquet than on the same data stored as JSON.
  • Redshift is read-optimized (columnar storage, built for analytical queries), while MySQL is write-optimized (row storage, built for transactional workloads).
  • With MySQL, application performance benefits from loading small volumes of data more frequently.
  • Redshift is more efficient at loading large volumes of data less frequently.
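
To make the Firehose + Lambda step a bit more concrete, here is a minimal sketch of a transformation handler in Python. It follows the record contract Firehose uses for transformation Lambdas (base64-encoded `data` in; `recordId`, `result` and `data` out). The `processed` flag and the assumption that every event is a JSON object are mine, purely for illustration, and not something from the task itself.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation handler: decode, modify and re-encode each record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Unparseable input: return the record untouched and mark it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Hypothetical transformation: tag the event as processed.
        payload["processed"] = True

        # Newline-delimited JSON keeps one event per line in the delivered S3 objects.
        data = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(data).decode("utf-8"),
        })
    return {"records": output}
```

Every incoming record has to come back with the same `recordId`. Marking bad input as `ProcessingFailed` instead of raising lets the rest of the batch through.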
And a great video with Vladimir and Benjamin discussing the evolution of the architecture:
