Show HN: Open-source template for end-to-end streaming analytics https://ift.tt/x5kVnYL

To help my future self, I decided to build a repository from which I can quickly deploy an end-to-end modern analytics pipeline, from ingestion to fast analytics and business dashboards, including data exploration, time-series forecasting, and monitoring of the stack. Of course, all the components are open source, and you can use this template as a stepping stone for your own near-real-time streaming analytics.

What's the inspiration? I’ve been working with streaming analytics for a long time. I’ve done not-too-stale analytics with an RDBMS incremental query and a spreadsheet, gone through the micro-batch, looks-almost-like-real-time lambda analytics, and worked with near-real-time analytics since kappa and afterwards. The range and features of today's tools are way better than what we had 15 years ago. What remains constant is the requirement for fresh data and for more advanced analytics. This means that you cannot really build a reliable data pipeline for near-real-time analytics at scale using a single component, and every time you start a new project you waste a lot of time just integrating the different moving parts.

When the stack starts, the pipeline collects public events from the GitHub API, sends them to a message broker (Apache Kafka), persists them into a fast time-series database (QuestDB), and visualizes them on a dashboard (Grafana). It also provides a web-based development environment (Jupyter Notebook) for data science and machine learning. Monitoring metrics are captured by a server agent (Telegraf) and stored back into the time-series database (QuestDB).

Hopefully others in the community find this useful!

https://ift.tt/58GafNZ

February 9, 2024 at 01:22AM
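To give a rough idea of the ingestion path described in the post, here is a minimal Python sketch of one step: turning a single GitHub public event into a row in InfluxDB line protocol, a text format that QuestDB can ingest. This is not the repository's actual code. The table name `github_events`, the choice of tags and fields, and the function names are all assumptions made for illustration, and the Kafka hop is left out to keep the example self-contained.

```python
from datetime import datetime, timezone


def escape_tag(value: str) -> str:
    """Escape characters that are special inside a line-protocol tag value."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def escape_string_field(value: str) -> str:
    """Escape backslashes and double quotes inside a line-protocol string field."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def event_to_ilp(event: dict, table: str = "github_events") -> str:
    """Format one GitHub event as a line-protocol row.

    The table name and the tag/field layout are hypothetical, not taken
    from the template repository.
    """
    # GitHub reports created_at as an ISO 8601 UTC timestamp, e.g. 2024-02-09T01:22:00Z.
    created = datetime.strptime(event["created_at"], "%Y-%m-%dT%H:%M:%SZ")
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    ts_ns = int(created.replace(tzinfo=timezone.utc).timestamp()) * 1_000_000_000

    tags = f"type={escape_tag(event['type'])}"
    fields = ",".join([
        f'repo="{escape_string_field(event["repo"]["name"])}"',
        f'actor="{escape_string_field(event["actor"]["login"])}"',
    ])
    return f"{table},{tags} {fields} {ts_ns}"


# A hand-written sample with the same shape as an entry from the public events feed.
sample_event = {
    "type": "PushEvent",
    "actor": {"login": "octocat"},
    "repo": {"name": "octocat/Hello-World"},
    "created_at": "2024-02-09T01:22:00Z",
}

if __name__ == "__main__":
    print(event_to_ilp(sample_event))
```

In a full pipeline, a producer would publish events like `sample_event` to a Kafka topic and a consumer or connector would write the resulting rows into QuestDB. Here the event type is a tag because it has few distinct values, while the repository and actor names are string fields.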
