A few weeks ago we hosted the 8th (and 3rd virtual) Smartly.io DevTalks. This time we dove into the world of Kubernetes with three seasoned experts: Barak Schoster, Gibbs Cullen, and Shreyas Srivatsan. We were particularly thrilled about the night’s theme, since we use Kubernetes quite a bit at Smartly.io and all three talks tackled issues that we have faced in scaling our own product development operations.
Barak Schoster: Developer Productivity and How It Relates to Static Analysis of Infrastructure as Code Manifests
Barak Schoster is CTO and co-founder at Bridgecrew, a company that helps teams secure their cloud infrastructure. When he isn’t at the beach with his kids, Barak often contributes to open-source projects like Checkov and Prowler. He has previously worked for RSA, Fortscale, and an IDF tech unit, and focuses on cybersecurity, machine learning, and big data architecture.
In his talk, Barak covered best practices for writing, testing, and maintaining infrastructure at scale using policy-as-code in CI/CD and at runtime. Barak demonstrated how to embed security into Kubernetes, Terraform, CloudFormation, and Serverless manifests and covered the current state of open-source repositories and Kubernetes manifests found in the wild. You can find the source code of the presented tool on GitHub.
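To give a feel for what policy-as-code means in practice, here is a minimal sketch of a static check over a Kubernetes Pod manifest. It is not how Checkov or the tool from the talk works internally; the function name `find_violations`, the two rules, and the example manifest are all assumptions made up for illustration.

```python
from typing import Any, Dict, List


def find_violations(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable policy violations for a Pod-like manifest.

    Hypothetical helper for illustration only; the two rules below are
    examples, not an official or complete policy set.
    """
    violations: List[str] = []
    pod_spec = manifest.get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Rule 1: containers should not run in privileged mode.
        security = container.get("securityContext", {})
        if security.get("privileged", False):
            violations.append(f"{name}: container runs in privileged mode")

        # Rule 2: every container should declare resource limits.
        if "limits" not in container.get("resources", {}):
            violations.append(f"{name}: no resource limits set")
    return violations


# Made-up manifest, already parsed into a dict (for example from YAML).
EXAMPLE_POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx:1.21",
                "securityContext": {"privileged": True},
            },
            {
                "name": "sidecar",
                "image": "busybox:1.34",
                "resources": {"limits": {"memory": "64Mi"}},
            },
        ]
    },
}

if __name__ == "__main__":
    for violation in find_violations(EXAMPLE_POD):
        print(violation)
```

Run in CI, a check like this could fail the build whenever the returned list is non-empty, which is one way to get the tight feedback loop Barak talked about.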
Barak’s talk served as a good reminder that “security is not just Jira tickets” but also changes that we learn from — which makes a tight feedback loop important. At Smartly.io, we have a lot of microservices and developer teams are doing their own DevOps work as far as it is feasible. Having more people manage our infrastructure allows for flexibility, but it also leaves room for errors. Our Developer Tooling team is currently exploring ways to make developers more confident and knowledgeable in dealing with the infrastructure. Barak definitely gave them food for thought! It is very likely that we will implement these learnings in our GitHub Actions CI in the near future.
Gibbs Cullen: Intro to M3 and Prometheus: Monitoring at Global Scale
Gibbs Cullen is a Developer Advocate at Chronosphere committed to helping the community understand the concepts behind Prometheus and using M3 as long-term storage. In addition, Gibbs helps the community with best practices on alerting, monitoring, and configuring deployments of Prometheus and M3 in Kubernetes. Before joining Chronosphere, she was a Product Manager on the AWS Data Lab team.
Prometheus on its own isn’t ideal when you need to store more metrics, retain them for longer, and establish a single pane of glass for monitoring across regions. Thankfully, Gibbs introduces us to M3, an open-source metrics platform. In her talk, she walks us through how to deploy M3Coordinator and M3DB using the M3 Kubernetes operator and how to combine your Prometheus instances into a single global monitoring system, using a real-world example.
At Smartly.io, we use Prometheus as the primary monitoring tool for our Kubernetes clusters. We’ve scaled some of the clusters up to a point where a simple monitoring solution is no longer able to handle the load. Furthermore, our development environment is experiencing some storage issues. Gibbs gave us invaluable insights on how to start experimenting with M3. M3 seems like a very promising answer to the reliability, scalability, and efficiency pain points Prometheus has.
Shreyas Srivatsan: Comprehensive Observability of Your Microservices Using Deep Linked Metrics and Traces
Shreyas Srivatsan is a Technical Lead at Chronosphere working on all things monitoring. He has previously worked as a Technical Lead at Uber on the observability alerting infrastructure team. Shreyas is greatly interested in monitoring of all kinds and has contributed to Prometheus, upstreaming exemplar support for the OpenMetrics parser. Prior to Uber, Shreyas was a Senior Software Engineer on the Hyper-V Hypervisor team at Microsoft.
In his talk, Shreyas demonstrates how to enable jumping straight from an alert notification to displaying a problematic trace along with a comparison to a non-problematic trace. This is accomplished with a combination of open-source tools like OpenTelemetry, OpenMetrics, Prometheus, Jaeger, Grafana, and M3. The audience also learns how recent advances in the community can enable them to reduce their time-to-mitigation by providing the relevant context of a bad request vs. a good request directly from a graph.
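The link between a metric and a trace relies on exemplars: in the OpenMetrics text format, a sample can carry a trailing `# {labels} value` section, typically holding a trace ID. The snippet below is a rough sketch of that idea and not code from the talk. The helper names, the metric name, and the trace ID are assumptions for illustration, and label escaping is deliberately left out.

```python
import re
from typing import Dict, Optional


def format_labels(labels: Dict[str, str]) -> str:
    """Render labels as {key="value",...}. Escaping is omitted for brevity."""
    return "{" + ",".join(f'{key}="{value}"' for key, value in labels.items()) + "}"


def render_sample(
    name: str,
    labels: Dict[str, str],
    value: float,
    exemplar_labels: Optional[Dict[str, str]] = None,
    exemplar_value: Optional[float] = None,
) -> str:
    """Render one sample line, appending an exemplar when one is given."""
    line = f"{name}{format_labels(labels)} {value}"
    if exemplar_labels is not None and exemplar_value is not None:
        # The exemplar follows the sample value after " # ".
        line += f" # {format_labels(exemplar_labels)} {exemplar_value}"
    return line


# Matches the trace_id label inside the exemplar part of a sample line.
EXEMPLAR_TRACE_ID = re.compile(r'#\s*\{[^}]*trace_id="([^"]+)"')


def extract_trace_id(line: str) -> Optional[str]:
    """Return the exemplar's trace ID, or None if the line has no exemplar."""
    match = EXEMPLAR_TRACE_ID.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = render_sample(
        "http_request_duration_seconds_bucket",  # hypothetical metric name
        {"le": "0.5"},
        42,
        exemplar_labels={"trace_id": "abc123"},  # hypothetical trace ID
        exemplar_value=0.31,
    )
    print(sample)
    print(extract_trace_id(sample))
```

With the trace ID attached to the sample, a dashboard can offer a direct jump from a point on a graph to the matching trace in a tracing backend.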
The Smartly.io Product Development team is currently working on implementing tracing and linking it to our logging and metrics in a simple way, which makes Shreyas’ talk highly relevant for us. Following his exemplars of good requests would most likely prove useful for our team, too. After seeing his talk, we are encouraged to take the time to set up proper tracing to solve issues sooner rather than later.
If you want to learn about how we use Kubernetes at Smartly.io, we have a few more resources for you:
Kubernetes the Hard Way by Martti
How (and Why) We Run Our Development Environment in Kubernetes by Mark
Stay tuned for more
DevTalks will be back with more intriguing talks to boost your learning. Join the meetup group at https://www.meetup.com/Smartly-io-DevTalks/ to stay up to date about upcoming events.