Data

We have had a mission-critical MySQL database hosted on premises for many years; as the data has grown, it has reached more than...

MapReduce is an old technology nowadays, but its concepts are still valid in the data processing world. In previous article...

Tyler Akidau published "Streaming 101: The world beyond batch" on O'Reilly in 2015; it is the fundamental theory of Go...

In a streaming application, it's very common to split the stream into multiple streams and apply different logic to each. In Flink ...

Apache Beam can be used as an API layer on top of Apache Flink. The function is the fundamental operation in these two frame...

I've been using Apache Beam for many years to process big data. Apache Beam supports many runtimes under the hood, e.g. ...

Recently I developed some statistics logic that queries a MySQL database and collects the results. Every single query is op...

Recently I spent some time on PySpark to see if it could help our team resolve some of the problems we are facing on ...

I have a data flow that inserts about 100M records into MongoDB. I'm using Beam running on a Flink cluster to deal with the th...