Wednesday, September 23, 2020

Data ingestion with AWS Lake Formation

On my current project we handle about 2 billion data points per day. These metrics are ingested with data pipelines into GCP storage. The POC task I worked on was the question: can we optimize the current architecture? The target platform for the new architecture is Amazon Web Services. A draft architecture with AWS Lake Formation looks pretty good at a high level:
It supports different types of data sources out of the box, and it looks like additional connectors can be developed and attached. Lake Formation provides secure access to data through AWS Identity and Access Management (IAM) policies. It operates in terms of "blueprints", "workflows" and the "Data Catalog". A blueprint is a data management template for ingesting data into a data lake. A workflow is a container for AWS Glue jobs. The Data Catalog is a persistent metadata store: it is used to store, annotate, and share metadata in the AWS Cloud in the same way as an Apache Hive metastore.
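To make these moving parts concrete, here is a minimal boto3 sketch of the two operations described above: granting a principal access to a Data Catalog table through Lake Formation, and starting the Glue workflow that a blueprint generates. All the names (the role ARN, "metrics_db", "datapoints", the workflow name) are hypothetical placeholders, not the actual project setup.

import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")
glue = boto3.client("glue", region_name="us-east-1")

# Lake Formation grants sit on top of the usual IAM policies:
# give an IAM role SELECT on a Data Catalog table.
lakeformation.grant_permissions(
    Principal={
        # Placeholder principal; any IAM user/role ARN works here.
        "DataLakePrincipalArn": "arn:aws:iam::123456789012:role/analyst"
    },
    Resource={
        "Table": {
            "DatabaseName": "metrics_db",  # hypothetical catalog database
            "Name": "datapoints",          # hypothetical catalog table
        }
    },
    Permissions=["SELECT"],
)

# A blueprint materializes as a Glue workflow (a container for Glue jobs);
# trigger one ingestion run of it.
run = glue.start_workflow_run(Name="ingest-datapoints-workflow")  # placeholder name
print("Started workflow run:", run["RunId"])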

More about AWS Lake Formation: https://docs.aws.amazon.com/lake-formation/latest/dg/how-it-works.html
