Thursday, October 15, 2020

[Links for myself] Clojure book, Machine Learning, preschool, free GPUs

Ah, the good old times - if you are as old as me, you should be familiar with lists of favorite links.

(Btw, the picture shows a pub in my town, beer & rock'n'roll included.)

Clojure web development book (HTTP, Routing, Middleware): https://grishaev.me/clj-book-web-1/

Writing Clojure web applications with RING (library, not a framework): https://www.baeldung.com/clojure-ring

Hitchhiker's guide to Machine Learning (pros and cons of algorithms, explained as if for school kids): https://tproger.ru/translations/hitchhikers-guide-to-ml/

Google's Colab (free GPU, notebook style): http://colab.research.google.com/

Kaggle (a lot of datasets for hardware, good to play with monitoring data, free GPU hours): https://www.kaggle.com

UML diagrams I used for AWS architecture trainings and classes: https://www.lucidchart.com

Fun explanation about quantum effects (and how/why quantum computing works) https://www.youtube.com/watch?v=g_IaVepNDT4

IBM Quantum Experience (and lego bricks to play with): https://quantum-computing.ibm.com

Quantum Development Kit by M$ (emulator included - I wonder how they emulate frozen qubits on my hot laptop): https://www.microsoft.com/en-us/quantum/development-kit and the simulator itself: https://github.com/StationQ/Liquid

And Qiskit: https://qiskit.org/

Oh Lord, please give me the power to read all of it, and the calmness and eggs of steel to understand 10% of it.

Still alive? Wiki: https://en.wikipedia.org/wiki/Cloud-based_quantum_computing


Thursday, October 1, 2020

Data streaming 101: AWS Kinesis, Redshift, MySQL, Athena

These are my notes from a discovery task I had:
  • EC2 can write to a Kinesis data stream; a Firehose delivery stream can then transform the data with Lambda and store the result in S3 or EMR.
  • EMR is a good choice for running Spark applications (Python, Java, Scala) and can use S3 as destination storage.
  • Athena performs ~30% better when S3 data is formatted as Parquet rather than JSON.
  • Redshift is read-optimized, while MySQL is write-optimized.
  • An application backed by MySQL benefits from loading small volumes of data more frequently.
  • Redshift is more efficient at loading large volumes of data less frequently.
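The first bullet - an EC2 application writing into a Kinesis data stream - can be sketched with boto3. A minimal sketch, assuming a stream named "metrics-stream" in us-east-1 (both are made-up values for illustration):

```python
import json

def build_record(stream: str, metric: dict) -> dict:
    """Build the arguments for kinesis.put_record.

    The partition key decides which shard a record lands on;
    using the host name keeps one host's records in order.
    """
    return {
        "StreamName": stream,                 # assumed stream name
        "Data": json.dumps(metric).encode(),  # Kinesis expects bytes
        "PartitionKey": str(metric["host"]),
    }

def send(metric: dict) -> None:
    """Ship one metric to Kinesis (requires AWS credentials)."""
    import boto3  # deferred import so the pure helper above works offline
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    kinesis.put_record(**build_record("metrics-stream", metric))
```

From here a Firehose delivery stream attached to the same data stream can do the Lambda transform and S3 delivery without any extra producer code.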
And a great video with Vladimir and Benjamin discussing the evolution of the architecture:

Monday, September 28, 2020

[Cheatsheet] Patterns in Cloud Computing

Patterns are defined as known ways to do-things-as-the-good-guys-do. It is a widely used concept in software architecture. I personally know guys who are ready to kill for patterns. But please, don't forget one proven thing - today's patterns may turn into anti-patterns next year (or tomorrow!).

So here is the cheatsheet; the original page is "Cloud Computing Patterns".



Friday, September 25, 2020

Architecture cheatsheet: SOA vs Microservices

Martin Fowler gives a good definition of the Microservices architecture term. In short, this architectural style concentrates on building a suite of small services. Each of these small guys runs in its own space and communicates in a lightweight way - say, HTTP/REST.


Compared to classic SOA architecture, the Microservices approach concentrates on really "small" parts with two different responsibilities:
  • Infrastructure services
  • Functional services
Functional components within a microservices architecture in fact have a single purpose and do one thing at a time. The big win is that they do it really, really well.
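The "one small service, one purpose, plain HTTP" idea can be sketched with nothing but the Python standard library. Everything here - the service name and the /health-style payload - is illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def health_payload() -> dict:
    """The single thing this tiny service does: report its own status."""
    return {"service": "greeting", "status": "ok"}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every request gets the same small JSON answer - one job, done well.
        body = json.dumps(health_payload()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8080) -> None:
    """Run the service; each microservice would get its own port/process."""
    HTTPServer(("", port), Handler).serve_forever()
```

In a real suite each such service would live in its own deployable unit, and callers would only ever see the HTTP contract, never the internals.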

Wednesday, September 23, 2020

Data ingestion with AWS Lake formation

On the current project we have 2 billion data points per day. These metrics are ingested with data pipelines into GCP storage. The POC task I worked on was the question "can we optimize the current architecture?" The target platform for the new architecture is Amazon Web Services. A draft architecture with AWS Lake Formation looks pretty good at a high level:
It utilizes different types of data sources out of the box, and it looks like additional connectors can be developed and attached. Lake Formation provides secure access to data through AWS Identity and Access Management (IAM) policies. It operates in terms of "blueprints", "workflows" and the "data catalog". A blueprint is a data management template for ingesting data into a data lake. A workflow is a container for AWS Glue jobs. The Data Catalog is a persistent metadata store - it is used to store, annotate, and share metadata in the AWS Cloud in the same way as an Apache Hive metastore.
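The IAM-backed access part can be sketched with boto3's `lakeformation` client: granting a principal read access to one Data Catalog table. The ARN, database and table names below are made-up placeholders:

```python
def build_grant(principal_arn: str, database: str, table: str) -> dict:
    """Arguments for lakeformation.grant_permissions:
    give one principal SELECT on one catalog table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }

def grant(principal_arn: str, database: str, table: str) -> None:
    """Apply the grant (requires AWS credentials and Lake Formation admin rights)."""
    import boto3  # deferred import so the pure helper above works offline
    boto3.client("lakeformation").grant_permissions(
        **build_grant(principal_arn, database, table)
    )
```

Blueprints and workflows are then layered on top: the workflow's Glue jobs populate the catalog tables that these grants protect.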

More about AWS Lake formation: https://docs.aws.amazon.com/lake-formation/latest/dg/how-it-works.html

Monday, September 21, 2020

Amazon Kinesis - use big data streaming to build a real-time app

A good practical example of the architecture: ads data is ingested in real time, then processed in a custom application and delivered to the user. Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams.
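The "processed in a custom application" half can be sketched as a boto3 consumer that reads one shard of a stream. A minimal sketch (it reads a single shard from LATEST; a production app would track all shards and checkpoint, e.g. via the KCL):

```python
def parse_records(records: list) -> list:
    """Decode raw Kinesis records (each carries a bytes 'Data' field) into dicts."""
    import json
    return [json.loads(r["Data"]) for r in records]

def consume_once(stream: str) -> list:
    """Fetch one batch of records from the first shard (requires AWS credentials)."""
    import boto3  # deferred import so parse_records stays testable offline
    kinesis = boto3.client("kinesis")
    shard = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream,
        ShardId=shard["ShardId"],
        ShardIteratorType="LATEST",  # only records arriving from now on
    )["ShardIterator"]
    return parse_records(kinesis.get_records(ShardIterator=iterator)["Records"])
```

The decoded dicts are where the custom logic - ad scoring, filtering, delivery - would plug in.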

Wednesday, June 24, 2020

Grafana 7: ML/AI, creating a model for metrics from an InfluxDB database

This video tutorial shows the process of creating a machine learning model for memory metrics in InfluxDB. Grafana is used as the target platform to visualize it. The data source files are available at https://github.com/vsergeyev/loudml-grafana-app (MIT license)

Wednesday, March 18, 2020

Machine Learning with Loud ML and Grafana

Once upon a time I discovered the Loud ML AI solution on the web. It's written in Python and uses Keras as a backend. Loud ML uses Donut, an unsupervised-learning-based model type. It is great at producing ML results in near real time.

One may create a model simply by POSTing to http://loudml_server:8077/models:

{
    "bucket_interval": "5m",
    "default_bucket": "telegraf_autogen_cpu",
    "features": [
        {
            "name": "mean_usage_user",
            "measurement": "cpu",
            "field": "usage_user",
            "metric": "mean",
            "io": "io",
            "default": null,
            "match_all": [
                {"tag": "cpu", "value": "cpu-total"},
                {"tag": "host", "value": "macbook4823"}
            ]
        }
    ],
    "interval": "60s",
    "max_evals": 10,
    "name": "telegraf_cpu_mean_usage_user_cpu_cpu_total_host_macbook4823_5m",
    "offset": "10s",
    "span": 100,
    "type": "donut"
}
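Sending a document like this can be sketched with the standard library alone. The payload builder below is a trimmed version of the model above (the server host is an assumption, and only a subset of fields is kept for brevity):

```python
import json
from urllib import request

def model_payload(name: str, measurement: str, field: str) -> dict:
    """A trimmed donut-model document, mirroring the fields shown above."""
    return {
        "name": name,
        "type": "donut",
        "bucket_interval": "5m",
        "interval": "60s",
        "offset": "10s",
        "span": 100,
        "max_evals": 10,
        "features": [{
            "name": f"mean_{field}",
            "measurement": measurement,
            "field": field,
            "metric": "mean",
            "io": "io",
        }],
    }

def create_model(server: str, payload: dict) -> None:
    """POST the model document to a Loud ML server (assumed reachable)."""
    req = request.Request(
        f"http://{server}:8077/models",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    request.urlopen(req)
```

After creation the model still needs training on historical data before it can predict; see the Loud ML docs for the training endpoint.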

So I spent a bit of free time playing with these ML models.

Grafana has an easy-to-start example for the graph panel. I added a "Create Baseline" button to take the current data selection and POST it to the Loud ML server.

These are screenshots of the setup.

On the top is my Loud ML panel with 3 series - the 1st one is the original data (input). The 2nd and 3rd series are prediction results from the Loud ML model. The bottom panel is Grafana's built-in graph, and it is able to show annotations - anomalies detected by the Loud ML model.

This is a screenshot of the Loud ML-specific options in the panel setup. The next one is the query settings - Query A is the original data from InfluxDB; Query B is the predictions stored in the output database.


And for reference, here are the details of the created model. The model has options for GroupBy, Tags and Fill values.

Source code for Grafana panel is on GitHub: https://github.com/vsergeyev/loudml-grafana-app

It may be installed as a regular plugin; please see the Grafana documentation.

Friday, January 17, 2020

Our brains are game changers

I spent yesterday evening with my daughter, who was helping me fix an old PC. The issue was in the AC power supply (the box was too old, from ±15 years ago), so the PC could not handle the power demand of its components any more. We fixed it by pulling 2 of the 4 RAM modules.

Then I finally booted it. My daughter was very happy because her pet game project called "My Pocket Mouse" was saved (yep, I had copied its files to my new MacBook; I need to put it into a git repo somehow).

And I found some files I had worked on 15-20 years ago - Delphi projects, my first PHP web sites and some old OS/2 stuff.

And so one story from 2006 finally came to my mind:

It started one day, by chance. A friend of mine came by and told me about a new cool framework for the Python programming language. It looked so cool to my brain after all that accounting software and the dev team manager position in a small IT company.

What exactly happens when you have a good job, stable income and established relationships at your current workplace? Your brain doesn't see many changes and starts to think: "Hey man, something is missing. I need a change, adrenaline, dopamine and some stress. So I will work it out and will feel happy after solving some problems." This is the natural way our brains work - they need new information. Remember how it was when you got a new place to live: lots of tasks to do (positive emotions, the feeling of a goal to complete), a lot of furniture to buy, expenses (stress), choosing a paint color (even more stress), stupid painters (negative emotions), then positive emotions again - you got something done. This positive/negative curve is a great deal for the brain. The guy sees he is alive, hormones get produced, and we get emotional feedback.

So how is this connected with a willingness to change jobs? My brain said to me: "Man, c'mon, this is a really cool way to switch your career." At that time I had no debts, loans or family, so why not. Another cool part of the deal was that this could be done part time, as a contractor. It was about 15 years ago, and software outsourcing in Ukraine had just appeared. One was able to put a profile online saying "I'm a damn cool software contractor" - no matter how skilled you were in reality. Just type into a form: "Software development skills - Advanced", "HTML skills - Advanced", "JavaScript programming language - Advanced". Oh, and the title: "Senior Full Stack Software Expert with a Focus on Client's Success".

You're done.