Steven's Knowledge

Best Practices

Production-ready Ansible - project layout, CI/CD, testing with Molecule, security, and Terraform integration

These habits separate a small site.yml from a maintainable, multi-team automation repo.

Project Layout

A layout that scales from one playbook to many:

ansible/
├── ansible.cfg
├── requirements.yml              # Galaxy roles & collections
├── inventory/
│   ├── production.yml            # static or plugin
│   ├── staging.yml
│   ├── group_vars/
│   │   ├── all.yml
│   │   ├── all/secrets.yml       # vault-encrypted
│   │   ├── webservers.yml
│   │   └── databases.yml
│   └── host_vars/
├── playbooks/
│   ├── site.yml                  # full configuration
│   ├── setup.yml                 # base server setup
│   ├── deploy.yml                # application deploy
│   └── rotate-secrets.yml        # one-off maintenance
└── roles/
    ├── common/
    ├── docker/
    ├── nginx/
    │   └── molecule/             # role tests live inside each role
    │       └── default/
    └── app/

Key properties:

  • One inventory per environment, never mixed.
  • Secrets in group_vars/.../secrets.yml, vault-encrypted.
  • Playbooks are thin — they apply roles. Roles hold the logic.
  • A requirements.yml makes external roles reproducible.
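
As a rough illustration of the "thin playbook" property, a site.yml for this layout might contain little more than group-to-role mappings. The group names come from the group_vars files in the tree above; which roles go to which group is an assumption made for the example.

```yaml
# playbooks/site.yml (sketch; the role-to-group mapping is assumed)
---
- name: Baseline for every host
  hosts: all
  become: true
  roles:
    - common

- name: Web tier
  hosts: webservers
  become: true
  roles:
    - docker
    - nginx
    - app
```

Anything more involved than a role list (conditionals, loops, templates) is usually a sign the logic belongs in a role.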

ansible.cfg

A pragmatic default config:

# Keep comments on their own lines; ansible.cfg treats an inline "#" as part of the value.
[defaults]
inventory             = inventory/production.yml
roles_path            = roles:vendor_roles
collections_path      = collections
# Disabled for CI / ephemeral hosts only; leave enabled for long-lived hosts.
host_key_checking     = false
# Parallelism
forks                 = 25
# Readable output
stdout_callback       = yaml
gathering             = smart
fact_caching          = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout  = 86400
retry_files_enabled   = false

[ssh_connection]
# Fewer SSH round trips per task; requires "requiretty" to be off in sudoers.
pipelining            = true
control_path          = /tmp/ansible-ssh-%%h-%%p-%%r
ssh_args              = -o ControlMaster=auto -o ControlPersist=60s

CI/CD: Lint on PR, Apply on Merge

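# .github/workflows/ansible.yml
name: Ansible
on:
  pull_request:
    paths: ['ansible/**']
  push:
    branches: [main]
    paths: ['ansible/**']

# Commands run from the repo root, so point Ansible at the project config
# explicitly; roles_path and inventory resolve relative to that file.
env:
  ANSIBLE_CONFIG: ansible/ansible.cfg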

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible ansible-lint yamllint
      - run: yamllint ansible/
      - run: ansible-lint ansible/

  syntax:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible
      - run: ansible-galaxy install -r ansible/requirements.yml
      - run: ansible-playbook ansible/playbooks/site.yml --syntax-check

  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    needs: [lint, syntax]
    runs-on: ubuntu-latest
    environment: production                    # requires reviewer approval
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible
      - run: ansible-galaxy install -r ansible/requirements.yml

      - name: Configure SSH
        run: |
          mkdir -p ~/.ssh
          chmod 700 ~/.ssh
          echo "${{ secrets.DEPLOY_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key

      - name: Write vault password
        run: |
          umask 077                            # file is created owner-only
          echo "${{ secrets.VAULT_PASSWORD }}" > ~/.vault_pass

      - name: Apply playbook
        run: |
          ansible-playbook ansible/playbooks/site.yml \
            -i ansible/inventory/production.yml \
            --private-key ~/.ssh/deploy_key \
            --vault-password-file ~/.vault_pass

Testing with Molecule

Molecule spins up ephemeral containers, applies your role, and asserts on the result. Reserve it for shared roles.

pip install molecule "molecule-plugins[docker]"
cd roles/nginx
molecule init scenario default

# roles/nginx/molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: geerlingguy/docker-ubuntu2204-ansible
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

# roles/nginx/molecule/default/verify.yml
---
- name: Verify
  hosts: all
  tasks:
    # In check mode the module reports "changed" if nginx is NOT already running.
    - name: nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
      check_mode: true
      register: status
      failed_when: status.changed

    - name: nginx returns 200 on /
      ansible.builtin.uri:
        url: http://localhost/
        status_code: 200

molecule test          # full cycle: create → converge → idempotence → verify → destroy
molecule converge      # just apply the role; keep container running
molecule login         # exec into the container for debugging

Security

| Principle | What it looks like |
| --- | --- |
| No plaintext secrets | Vault-encrypt every secret file; pre-commit hook to refuse commits of unencrypted secrets |
| Pin everything | Ansible version (in requirements.txt or container), role versions, collection versions |
| Least-privilege SSH | Dedicated deploy user; SSH keys per environment; sudoers limited to required commands |
| Audit playbook runs | Centralize logs (log_path in ansible.cfg); ship to your SIEM |
| Check mode in CI | --check --diff runs as part of PR review |
| Lint and scan | ansible-lint, yamllint, plus kics / trivy config scanners |
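
To make "pin everything" concrete, here is a minimal sketch of a requirements.yml with explicit versions. The collection names are real, but the version numbers are illustrative pins rather than recommendations, and the git-hosted role (name, URL, tag) is entirely hypothetical.

```yaml
# ansible/requirements.yml (sketch; versions are example pins, test your own)
---
collections:
  - name: community.general
    version: "8.5.0"
  - name: amazon.aws
    version: "7.4.0"

roles:
  # Hypothetical internal role pulled from git and pinned to a tag
  - name: hardening
    src: https://git.example.com/infra/ansible-role-hardening.git
    scm: git
    version: v1.4.2
```

Installing with `ansible-galaxy install -r requirements.yml` then yields the same role and collection versions on every workstation and CI runner.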

A pre-commit hook to block unencrypted vault files:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/ansible/ansible-lint
    rev: v24.2.0
    hooks:
      - id: ansible-lint
  - repo: local
    hooks:
      - id: vault-encrypted
        name: Vault files must be encrypted
        entry: bash -c '! grep -L "^\$ANSIBLE_VAULT" "$@" | grep .' --
        files: 'secrets\.ya?ml$'
        language: system

Terraform + Ansible Integration

Two tools, one pipeline:

| Phase | Tool | Action |
| --- | --- | --- |
| Provisioning | Terraform | Create VMs, networks, load balancers, DNS |
| Configuration | Ansible | Install software, configure services, deploy apps |
| Updates | Ansible | Rolling deploys, config changes |
| Scaling | Terraform | Add/remove instances, update infrastructure |

The clean way to hand off: have Terraform write Ansible inventory.

# Terraform: tag instances and emit an inventory file
resource "aws_instance" "web" {
  count = var.instance_count
  # ...

  tags = {
    Name        = "web-${count.index}"
    Role        = "webserver"
    Environment = var.environment
  }
}

resource "local_file" "ansible_inventory" {
  filename = "${path.module}/../ansible/inventory/${var.environment}.yml"
  content  = templatefile("${path.module}/inventory.tpl", {
    web_hosts = aws_instance.web[*]
  })
}
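
The inventory.tpl template itself is not shown here. As a sketch of what it could render for two instances, assuming the template emits a YAML inventory with a webservers group and uses each instance's private IP, the output might look like this (the addresses are placeholders):

```yaml
# ansible/inventory/production.yml as the template might render it
# (hypothetical output; host addresses are placeholders)
---
all:
  children:
    webservers:
      hosts:
        web-0:
          ansible_host: 10.0.1.10
        web-1:
          ansible_host: 10.0.1.11
```

Because the group is named webservers, the existing group_vars/webservers.yml applies without further wiring.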

Or skip the file and have Ansible query AWS directly with the EC2 plugin (see Advanced Patterns — Dynamic Inventory).
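
A minimal sketch of that plugin-based alternative, assuming the amazon.aws collection and boto3 are installed. The file name, region, and group prefix are assumptions; the Role and Environment tags match the Terraform resource above.

```yaml
# inventory/production.aws_ec2.yml
# (the plugin only loads files whose names end in aws_ec2.yml or aws_ec2.yaml)
---
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1                     # placeholder region
filters:
  tag:Environment: production     # only this environment's instances
  instance-state-name: running
keyed_groups:
  # Builds groups such as "role_webserver" from the Role tag
  - key: tags.Role
    prefix: role
hostnames:
  - private-ip-address
```

Running `ansible-inventory -i inventory/production.aws_ec2.yml --graph` shows the resulting groups before any playbook touches them.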

Operating Tips

A handful of habits worth adopting:

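  1. Always --check --diff first. A dry run is cheap and catches most "wait, I didn't expect that" moments before they land on prod. It is not a complete preview: command: and shell: tasks are skipped in check mode by default.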
  2. --limit is your friend. Practice on one host before the fleet.
  3. Tag everything. --tags deploy for fast iteration; --skip-tags slow-setup for re-runs.
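  4. Don't reach for command: or shell: without thinking. Look for a purpose-built module first; modules are generally idempotent and report changed accurately. If you must shell out, add creates:, removes:, or changed_when: so re-runs stay quiet.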
  5. Roll, don't restart all at once. serial: plus a health check plus LB de/re-register = zero-downtime deploys.
  6. One playbook per concern. Setup, deploy, secret rotation: keep them in separate playbooks. Easier to reason about.
  7. Re-run the same playbook regularly. If apply produces unexpected changed counts, something drifted — investigate.
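
Tips 3 and 5 can be sketched together as a rolling deploy play. This is an outline under assumptions, not a drop-in playbook: the /healthz path is hypothetical, and the load balancer steps are debug placeholders because the right module depends on your LB.

```yaml
# playbooks/deploy.yml (sketch; /healthz and the LB steps are assumptions)
---
- name: Rolling deploy of the web tier
  hosts: webservers
  become: true
  serial: 1                 # one host at a time; raise once the play is trusted

  pre_tasks:
    - name: Deregister host from the load balancer (placeholder)
      ansible.builtin.debug:
        msg: "deregister {{ inventory_hostname }} here"

  roles:
    - role: app
      tags: [deploy]

  post_tasks:
    - name: Wait until the app answers its health check
      ansible.builtin.uri:
        url: http://localhost/healthz
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 3

    - name: Re-register host with the load balancer (placeholder)
      ansible.builtin.debug:
        msg: "re-register {{ inventory_hostname }} here"
```

If the health check never passes, the play fails on that host and the remaining hosts are left untouched, which is the point of rolling one batch at a time.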

Checklist

Pre-production Ansible checklist

  • Per-environment inventory (no cross-env mixing)
  • All secrets encrypted with Ansible Vault
  • Vault password stored in a secret manager (not in the repo)
  • Roles versioned via requirements.yml
  • CI runs yamllint + ansible-lint + --syntax-check on every PR
  • Production apply gated behind code review + manual approval
  • Shared roles have Molecule tests
  • Rolling deploys use serial: and health checks
  • Handlers used for service restarts (not state: restarted tasks scattered through roles)
  • Pipelining and SSH ControlMaster enabled for speed
  • Fact caching enabled
  • Periodic drift run scheduled (ansible-playbook site.yml --check --diff in CI)
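
For the handlers item, a minimal sketch of the pattern inside the nginx role. The template name and destination path are assumptions; the two lists would live in separate files, as the comments indicate.

```yaml
# roles/nginx/tasks/main.yml
- name: Deploy nginx config
  ansible.builtin.template:
    src: nginx.conf.j2            # assumed template name
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: "0644"
  notify: Restart nginx           # fires only when the file actually changes

# roles/nginx/handlers/main.yml
- name: Restart nginx
  ansible.builtin.service:
    name: nginx
    state: restarted
```

The handler runs once at the end of the play no matter how many tasks notify it, so an unchanged config means no restart at all.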
