DevOps & IaC

Automated IT Infrastructure Recovery with Terraform

About this Project

Project Overview

This project was the subject of a bachelor thesis that evaluated and implemented an automated disaster recovery solution for IT infrastructure using Infrastructure as Code (IaC). The primary objective was to design and build a prototype that could quickly and reliably restore a foundational cloud environment after a total system failure. The solution uses Terraform to automate the entire provisioning and configuration process, which keeps recovered environments consistent and reduces manual intervention during a recovery event.

The Challenge

The project addressed a "chicken-and-egg" problem rooted in the high security standards of a customer project that followed a Zero-Trust security model. The existing CI/CD runners were not permitted to provision the recovery infrastructure from outside the environment, as this would expose confidential data and endpoints. This created a deadlock: building the secure internal infrastructure required an internal GitLab Runner, but creating that runner required the infrastructure to exist first. The challenge was to develop a fully automated bootstrapping process that could create this initial secure foundation, including the GitLab Runner itself, from a local machine. This resolves the deadlock and allows the rest of the infrastructure to be built out securely from within.
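The deadlock can be illustrated with a small dependency graph. The sketch below is only a model of the ordering problem, not code from the thesis; the node names are hypothetical labels. It uses Python's standard `graphlib` module to show that the original dependencies form a cycle, while adding a local machine as the starting point yields a valid build order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical labels. Each key maps to the set of things it depends on.
deadlocked = {
    "internal_infrastructure": {"internal_gitlab_runner"},
    "internal_gitlab_runner": {"internal_infrastructure"},
}

# Bootstrapping from a local machine gives the graph a starting point.
bootstrapped = {
    "foundation": {"local_machine"},
    "internal_gitlab_runner": {"foundation"},
    "remaining_infrastructure": {"internal_gitlab_runner"},
}


def build_order(dependencies):
    """Return a valid build order, or None if the dependencies form a cycle."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


print(build_order(deadlocked))
print(build_order(bootstrapped))
```

The first call returns `None` because neither component can be built first. The second returns the chain from `local_machine` through `foundation` and `internal_gitlab_runner` to `remaining_infrastructure`.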

Key Features

  • Automated Infrastructure Provisioning: Uses Terraform to declaratively define and create all necessary Azure resources, including a virtual network (VNet) in a hub-and-spoke topology, subnets, a virtual machine, and an Azure Bastion host.
  • Dynamic GitLab Runner Configuration: Uses cloud-init to automatically configure the provisioned VM on its first boot. This includes installing Docker and the GitLab Runner, registering the new runner with the specified GitLab group using dynamic credentials, and setting up systemd services for automated maintenance and cleanup.
  • GitLab Integration: The Terraform configuration automatically creates a new GitLab group and populates it with projects by pushing local repositories. This makes the infrastructure and application code immediately available for the newly created runner to execute its pipelines.
  • Modular and Reusable Architecture: The Terraform code is structured into three distinct layers: Core (basic resource definitions), Standard Building Blocks (SBBs, which combine core modules to meet compliance requirements), and Exposed Modules (the final deployable solution). This pattern promotes reusability, maintainability, and adherence to organizational standards.
  • Dual Deployment Modes: The system is designed for flexibility. It supports local execution via simple shell scripts (apply.sh, destroy.sh) for the initial disaster recovery scenario. Additionally, it includes a fully configured GitLab CI/CD pipeline for remote execution, enabling validation, testing, and ongoing management of the infrastructure.
  • Secure by Design: Implements secure access to the runner VM via Azure Bastion, eliminating the need for public IP addresses and exposed SSH ports. Network Security Groups (NSGs) are used to enforce strict firewall rules for both inbound and outbound traffic, ensuring a hardened environment from the moment of creation.
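As an illustration of how NSG rules produce a default-deny posture, the sketch below models Azure's priority-ordered evaluation, where the rule with the lowest priority number that matches decides the outcome. This is a simplified model under stated assumptions: the rule names, ports, and priorities are hypothetical and do not reflect the rule set used in the thesis, and real NSG rules also match on source, destination, and protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int         # lower number is evaluated first
    direction: str        # "Inbound" or "Outbound"
    port: Optional[int]   # None matches any port
    access: str           # "Allow" or "Deny"


# Hypothetical rules for illustration only.
RULES = [
    Rule("allow-ssh", 100, "Inbound", 22, "Allow"),
    Rule("deny-all-inbound", 4096, "Inbound", None, "Deny"),
    Rule("allow-https-out", 100, "Outbound", 443, "Allow"),
    Rule("deny-all-outbound", 4096, "Outbound", None, "Deny"),
]


def evaluate(rules, direction, port):
    """Return the access decision of the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction == direction and rule.port in (None, port):
            return rule.access
    # Simplification: a real NSG would fall through to Azure's default rules.
    return "Deny"


print(evaluate(RULES, "Inbound", 22))
print(evaluate(RULES, "Outbound", 80))
```

In this model, only explicitly allowed ports pass; everything else in either direction hits the catch-all deny rule.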

Technical Deep Dive

  • Infrastructure as Code (IaC): The entire cloud environment is defined using Terraform, with Terragrunt acting as a thin wrapper to keep the configuration DRY (Don't Repeat Yourself) and manage remote state. This declarative approach ensures that the infrastructure is version-controlled in Git, repeatable, and easily auditable. The modular design allows developers to easily compose complex infrastructure from pre-approved, compliant building blocks.

  • Automation & Deployment: For the primary disaster recovery use case, simple shell scripts provide a one-command interface to provision the entire foundation. For continuous integration and validation, a .gitlab-ci.yml file orchestrates the process remotely. This pipeline includes stages for linting, validation, planning, and applying changes, with manual approvals for destructive actions to prevent accidental infrastructure removal.

  • Configuration Management: The virtual machine for the GitLab Runner is bootstrapped using a cloud-init configuration file. Terraform dynamically populates this file with variables (like GitLab registration tokens and runner tags) before passing it to the Azure API. As a result, the VM is fully configured and operational on its first boot, without any manual SSH access or configuration.
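The cloud-init templating step can be sketched as follows. In Terraform, this kind of substitution is typically done with the built-in `templatefile` function; the Python below only mimics that step to show the idea. The template contents, placeholder names, and values are assumptions for illustration, and the Docker and GitLab Runner installation steps are omitted. The result is base64-encoded because Azure expects VM custom data in that form.

```python
import base64
from string import Template

# Hypothetical cloud-init template; placeholder names are assumptions.
CLOUD_INIT_TEMPLATE = Template("""\
#cloud-config
runcmd:
  - >-
    gitlab-runner register --non-interactive
    --url "${gitlab_url}"
    --registration-token "${registration_token}"
    --executor docker
    --docker-image "${default_image}"
    --tag-list "${runner_tags}"
""")


def render_user_data(variables):
    """Fill the template and base64-encode it for use as Azure custom data."""
    # substitute() raises KeyError if any placeholder has no value.
    rendered = CLOUD_INIT_TEMPLATE.substitute(variables)
    return base64.b64encode(rendered.encode("utf-8")).decode("ascii")


user_data = render_user_data({
    "gitlab_url": "https://gitlab.example.com",
    "registration_token": "REDACTED",
    "default_image": "alpine:latest",
    "runner_tags": "recovery,docker",
})
print(base64.b64decode(user_data).decode("utf-8"))
```

Using `substitute` rather than `safe_substitute` means a missing variable fails at render time instead of producing a VM that boots with an unresolved placeholder.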

Personal Learnings

This thesis project gave me hands-on experience in solving a real-world DevOps challenge within a high-security corporate context. I moved beyond basic IaC principles to design a multi-layered Terraform architecture that emphasizes reusability and compliance. The core challenge taught me how to architect solutions that overcome procedural deadlocks, turning a manual, error-prone process into a fully automated one. I also deepened my expertise in CI/CD pipeline construction, cloud-init, and shell scripting, and delivered a solution that can serve as a blueprint for disaster recovery automation.