Scheduling tasks on Google Cloud Platform
At travel audience we use Google Cloud Platform in our data engineering stack. Every day, we process terabytes of data using Apache Beam streaming workflows on Cloud Dataflow. Our analytics workload is powered by batch jobs using Beam/Dataflow or Spark/Dataproc. All this works well, but some workloads are not continuous streaming applications and instead need to be executed on a regular basis. This demands reliable task scheduling. At the moment, however, there is no task scheduler service on Google Cloud Platform that fits our workloads. In this article, our Data Engineer Ziyad Muhammed Mohiyudheen describes a use case that demands scheduling, some of the options we considered, and how (and, more importantly, why) we decided on one of them.