Migrate 10T data from On Premise to RDS without interruption We have had a mission-critical MySQL database hosted on premise for many years; as the data has grown, it has reached more than...
Flink WindowFunctions MapReduce is an old technology nowadays, but its concepts are still valid in the data processing world. In a previous article...
Time and Window in Flink Stream Application Tyler Akidau published "The world beyond batch: Streaming 101" on O'Reilly in 2015; this is the fundamental theory of Go...
Flink Stream Branches In a stream application, it's very common to split the stream into multiple streams and apply different logic to each. In Flink ...
Beam and Flink Functions Apache Beam can be used as an API layer on top of Apache Flink; the function is the fundamental operation in these 2 frame...
Why I made the Flinker project and how to bootstrap a Flink project I've been using Apache Beam for many years to process big data. Apache Beam supports many runtimes under the hood, e.g. ...
Use AWS Aurora Serverless to handle huge queries Recently I've developed some statistics logic to query a MySQL database and collect the results; every single query is op...
Deploy PySpark jobs into Kubernetes with Python dependencies Recently I spent some time on PySpark to see if it could help our team resolve some of the problems we are facing on ...
Index is Key to Mongo Performance I have a data flow that inserts about 100M records into MongoDB. I'm using Beam running on a Flink cluster to deal with the th...