Having been around for almost as long as Smartly.io, the DevOps team focuses on enabling a secure and scalable platform where our service runs smoothly every day of the year. What is more, they take care of platform security and make sure we comply with both internal and external guidelines, such as the ISO 27001 certification.
Handling scale and complexity
The DevOps team runs a system that is among the biggest and most complex in Finland. The infrastructure allows more than one million unique ad creatives to be rendered per minute and processes petabytes of data every month through hundreds of servers. "Most of our services are running on Kubernetes and we take advantage of both cloud and bare metal", says Martti, the team lead for DevOps. "Essentially, it's a hybrid solution, where we use AWS, Google Cloud Platform, and Heroku, combined with modern Linux-based dedicated servers hosted in data centers operated by Hetzner."
DevOps works closely with all development teams at Smartly.io to ensure smooth operations and the ability to ship new features at a fast pace. They are involved in all services across our product organization, one of which is our on-demand image processing setup. Getting the images rendered in our Image Templates requires a sophisticated combination of Varnish, HAProxy, and Apache Traffic Server, which together move a large volume of data. This network and caching platform alone enables more than three petabytes of outgoing image traffic per month.
Constantly developing ways of working
One of the biggest challenges the team faces in their day-to-day work is balancing the effort between all the different projects. The team sits down every two weeks to go through the backlog and decide on the most important things to work on. They strive to have two people working together on each of the more complicated tasks to avoid silos of knowledge. “We have created our own agile ways of working that don’t strictly follow any framework”, Martti says. “By picking and choosing the best practices from different frameworks and constantly reviewing what works and what doesn’t, we support workflows well suited for our team.”
Another way to avoid making mistakes and ensure replicability is adopting the principle of configuration as code. “We manage configuration resources in the source repository the same way we do code and treat them as versioned artifacts,” Martti explains. “Automating work this way saves us a significant amount of time.” Configuration as code, combined with anticipating the future and doing the right things at the right time, enables the team to operate like a well-oiled machine. “This means that we’re looking to use cloud technologies when it’s the best option and choose something else if it’s not feasible given the circumstances,” Martti describes.
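To make the idea of configuration as a versioned artifact a little more concrete, here is a minimal Python sketch. It is not the team's actual tooling: the service name, the settings, and the helper functions `fingerprint` and `changed_keys` are all hypothetical. It only illustrates that once configuration lives in the repository as data, it can be fingerprinted and compared like any other artifact.

```python
import hashlib
import json

_MISSING = object()  # sentinel so a missing key differs from an explicit None


def fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hash of a configuration.

    Keys are sorted before hashing, so two files that differ only in
    key order produce the same fingerprint.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(old: dict, new: dict) -> list:
    """List the top-level keys whose values differ between two versions."""
    keys = set(old) | set(new)
    return sorted(
        k for k in keys if old.get(k, _MISSING) != new.get(k, _MISSING)
    )


# Hypothetical example: the committed version and a proposed change.
committed = {"service": "image-cache", "replicas": 3, "cache_ttl_seconds": 300}
proposed = {"cache_ttl_seconds": 600, "replicas": 3, "service": "image-cache"}

if fingerprint(committed) != fingerprint(proposed):
    print("Config changed:", changed_keys(committed, proposed))
```

Because the comparison works on the content rather than the file layout, a reviewer sees only the setting that actually changed, which is the kind of replicability the principle is meant to provide.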
Automation is a crucial factor in daily work, and the team is on a quest to find the optimal level of automation to keep the platform running. Right now, many tasks that require manual work could be automated, but automation is labor-intensive and risky. Martti describes the work as building a robot for watering plants: “It’s easy to teach the robot to pour a fixed amount of water on the plant. However, it gets a lot more complicated when you consider all the ways the process can go wrong. What if the robot needs to adjust the amount of water based on how dry or wet the soil is? What if the plant has been moved or it has died? It’s almost impossible to take all the different variables into account when automating DevOps work. Getting it right, especially with our large databases, requires rigidly defined areas of doubt and uncertainty.”
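Martti's watering robot can be sketched in a few lines of Python. This is purely an illustration of the analogy, not anything the team runs: the `PlantReading` fields, the moisture scale, and the millilitre amounts are all made-up assumptions. The point is how quickly the "easy" version grows once the failure cases are handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    WATER = "water"
    SKIP = "skip"
    ALERT = "alert"  # hand the problem to a human


@dataclass
class PlantReading:
    found: bool                      # is the plant where we expect it?
    alive: bool                      # does it still look alive?
    soil_moisture: Optional[float]   # 0.0 dry .. 1.0 soaked, None if sensor failed


def naive_plan(reading: PlantReading) -> Tuple[Action, int]:
    """The easy version: always pour a fixed amount, whatever the state."""
    return (Action.WATER, 200)


def careful_plan(
    reading: PlantReading,
    target_moisture: float = 0.6,   # assumed target, hypothetical
    ml_per_unit: int = 500,         # assumed ml needed per unit of deficit
    max_ml: int = 300,              # hard cap so a bad reading cannot flood
) -> Tuple[Action, int]:
    """Decide what to do, refusing to act when the situation is unclear."""
    if not reading.found or not reading.alive:
        return (Action.ALERT, 0)
    if reading.soil_moisture is None:
        return (Action.ALERT, 0)
    deficit = target_moisture - reading.soil_moisture
    if deficit <= 0:
        return (Action.SKIP, 0)
    return (Action.WATER, min(max_ml, round(deficit * ml_per_unit)))


print(careful_plan(PlantReading(found=True, alive=True, soil_moisture=0.4)))
print(careful_plan(PlantReading(found=False, alive=True, soil_moisture=0.4)))
```

Even this version ignores plenty of variables, such as a sensor that reports plausible but wrong values, which is why the team weighs the cost of each automation against the manual work it replaces.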
Looking for a new teammate
Along with being one of the oldest engineering teams in the company, the team is also full of expertise. Engineers Renne, Juuso, Martti, Pontus, Tuomas, Juha and Lassi have decades of combined Smartly.io experience topped with even more industry knowledge. For example, they do Kubernetes the hard way (running on bare metal) and their database knowledge is extensive. At the moment, the team is looking for a Site Reliability Engineer who isn’t afraid of tackling large-scale problems. Engineers in the DevOps team get to work on a broad tech stack: the data stores and messaging systems alone include Cassandra, RabbitMQ, PostgreSQL and MongoDB as well as Elasticsearch and Redis. All team members operate on the full spectrum using tools such as Ansible, Docker, Logstash, Jenkins, Grafana, and Calico together with WireGuard, which means that the new team member should be willing to learn new things across a broad range of technologies.
Read more about projects the DevOps team has contributed to: Sharding a PostgreSQL Database with Citus.