Best Practices
Production-ready Ansible - project layout, CI/CD, testing with Molecule, security, and Terraform integration
These habits separate a small site.yml from a maintainable, multi-team automation repo.
Project Layout
A layout that scales from one playbook to many:
ansible/
├── ansible.cfg
├── requirements.yml            # Galaxy roles & collections
├── inventory/
│   ├── production.yml          # static or plugin
│   ├── staging.yml
│   ├── group_vars/
│   │   ├── all.yml
│   │   ├── all/secrets.yml     # vault-encrypted
│   │   ├── webservers.yml
│   │   └── databases.yml
│   └── host_vars/
├── playbooks/
│   ├── site.yml                # full configuration
│   ├── setup.yml               # base server setup
│   ├── deploy.yml              # application deploy
│   └── rotate-secrets.yml      # one-off maintenance
├── roles/
│   ├── common/
│   ├── docker/
│   ├── nginx/
│   └── app/
└── molecule/                   # role tests (per-role subdirs)
    └── nginx/
Key properties:
- One inventory per environment, never mixed.
- Secrets in group_vars/.../secrets.yml, vault-encrypted.
- Playbooks are thin: they apply roles. Roles hold the logic.
- A requirements.yml makes external roles reproducible.
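As a sketch of what "reproducible" can mean here, a requirements.yml might pin every external role and collection to an exact version. The role and collection names below exist on Galaxy, but the version numbers are placeholders chosen for illustration, not recommendations.

```yaml
# requirements.yml (illustrative; every version below is an assumed placeholder)
---
roles:
  - name: geerlingguy.docker
    version: "7.1.0"      # pin to the release you actually tested

collections:
  - name: community.general
    version: "8.5.0"
  - name: ansible.posix
    version: "1.5.4"
```

Running ansible-galaxy install -r requirements.yml then fetches the same versions on every workstation and CI runner.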
ansible.cfg
A pragmatic default config:
[defaults]
inventory = inventory/production.yml
roles_path = roles:vendor_roles
collections_path = collections
# CI / ephemeral hosts
host_key_checking = false
# parallelism
forks = 25
# readable output
stdout_callback = yaml
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
retry_files_enabled = false
[ssh_connection]
# ~2x faster
pipelining = true
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
Comments sit on their own lines because ansible.cfg treats a # that follows a value as part of the value; only ; can start an inline comment.
CI/CD: Lint on PR, Apply on Merge
# .github/workflows/ansible.yml
name: Ansible
on:
  pull_request:
    paths: ['ansible/**']
  push:
    branches: [main]
    paths: ['ansible/**']
env:
  # ansible.cfg lives in ansible/, not the repo root where steps run, so point Ansible at it
  ANSIBLE_CONFIG: ansible/ansible.cfg
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible ansible-lint yamllint
      - run: yamllint ansible/
      - run: ansible-lint ansible/
  syntax:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible
      - run: ansible-galaxy install -r ansible/requirements.yml
      - run: ansible-playbook ansible/playbooks/site.yml --syntax-check
  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    needs: [lint, syntax]
    runs-on: ubuntu-latest
    environment: production # requires reviewer approval
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible
      - run: ansible-galaxy install -r ansible/requirements.yml
      - name: Configure SSH
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.DEPLOY_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
      - name: Write vault password
        run: echo "${{ secrets.VAULT_PASSWORD }}" > ~/.vault_pass
      - name: Apply playbook
        # --private-key is needed because the key is not at a default SSH path
        run: |
          ansible-playbook ansible/playbooks/site.yml \
            -i ansible/inventory/production.yml \
            --private-key ~/.ssh/deploy_key \
            --vault-password-file ~/.vault_pass
Testing with Molecule
Molecule spins up ephemeral containers, applies your role, and asserts on the result. Reserve it for shared roles.
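For orientation, the "applies your role" step is an ordinary playbook, conventionally named converge.yml, inside the scenario directory. A minimal sketch, assuming the role directory is called nginx and that Molecule can resolve the role by that directory name:

```yaml
# roles/nginx/molecule/default/converge.yml (sketch)
---
- name: Converge
  hosts: all
  roles:
    - role: nginx   # assumption: the role resolves by its directory name
```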
pip install molecule 'molecule-plugins[docker]'
cd roles/nginx
molecule init scenario default
# roles/nginx/molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: geerlingguy/docker-ubuntu2204-ansible
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible
# roles/nginx/molecule/default/verify.yml
---
- name: Verify
  hosts: all
  tasks:
    # In check mode, "changed" means the service would have to be started, i.e. it is not running
    - name: nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
      check_mode: true
      register: status
      failed_when: status.changed
    - name: nginx returns 200 on /
      ansible.builtin.uri:
        url: http://localhost/
        status_code: 200
molecule test # full cycle: create → converge → idempotence → verify → destroy
molecule converge # just apply the role; keep container running
molecule login # exec into the container for debugging
Security
| Principle | What it looks like |
|---|---|
| No plaintext secrets | Vault-encrypt every secret file; pre-commit hook to refuse commits of unencrypted secrets |
| Pin everything | Ansible version (in requirements.txt or container), role versions, collection versions |
| Least-privilege SSH | Dedicated deploy user; SSH keys per environment; sudoers limited to required commands |
| Audit playbook runs | Centralize logs (log_path in ansible.cfg); ship to your SIEM |
| Check mode in CI | --check --diff runs as part of PR review |
| Lint and scan | ansible-lint, yamllint, plus kics / trivy config scanners |
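The "Check mode in CI" row could be wired into the GitHub Actions workflow shown earlier as one extra job. This is only a sketch: it assumes the dry run targets the staging inventory, that the same DEPLOY_KEY and VAULT_PASSWORD secrets are usable on pull requests (they are not exposed to PRs from forks), and that ansible.cfg lives in ansible/ as in the layout above.

```yaml
# Hypothetical extra job; merge it under the existing jobs: key of .github/workflows/ansible.yml
jobs:
  check:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    env:
      ANSIBLE_CONFIG: ansible/ansible.cfg   # assumed location, relative to the repo root
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.12' }
      - run: pip install ansible
      - run: ansible-galaxy install -r ansible/requirements.yml
      - name: Write credentials
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.DEPLOY_KEY }}" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          echo "${{ secrets.VAULT_PASSWORD }}" > ~/.vault_pass
      - name: Dry run against staging
        # --check makes no changes; --diff prints what would change for the reviewer
        run: |
          ansible-playbook ansible/playbooks/site.yml \
            -i ansible/inventory/staging.yml \
            --private-key ~/.ssh/deploy_key \
            --vault-password-file ~/.vault_pass \
            --check --diff
```

Tasks built on command: or shell: are skipped in check mode, so a clean dry run narrows the risk rather than eliminating it.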
A pre-commit hook to block unencrypted vault files:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/ansible-community/ansible-lint
    rev: v24.2.0
    hooks:
      - id: ansible-lint
  - repo: local
    hooks:
      - id: vault-encrypted
        name: Vault files must be encrypted
        # grep -L lists files lacking the $ANSIBLE_VAULT header; any listed file fails the hook
        entry: bash -c '! grep -L "^\\\$ANSIBLE_VAULT" "$@" | grep .' --
        files: 'secrets\.ya?ml$'
        language: system
Terraform + Ansible Integration
Two tools, one pipeline:
| Phase | Tool | Action |
|---|---|---|
| Provisioning | Terraform | Create VMs, networks, load balancers, DNS |
| Configuration | Ansible | Install software, configure services, deploy apps |
| Updates | Ansible | Rolling deploys, config changes |
| Scaling | Terraform | Add/remove instances, update infrastructure |
The clean way to hand off: have Terraform write Ansible inventory.
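The target of that hand-off is an ordinary YAML inventory. As a rough picture of what a rendered file could look like for two web instances (the addresses and the SSH user are invented for illustration; the group name is chosen to line up with group_vars/webservers.yml):

```yaml
# inventory/production.yml as a template might render it (all values are made up)
---
all:
  children:
    webservers:
      hosts:
        web-0:
          ansible_host: 203.0.113.10   # documentation-range address standing in for the instance IP
        web-1:
          ansible_host: 203.0.113.11
      vars:
        ansible_user: deploy           # hypothetical SSH user
```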
# Terraform: tag instances and emit an inventory file
resource "aws_instance" "web" {
  count = var.instance_count
  # ...
  tags = {
    Name        = "web-${count.index}"
    Role        = "webserver"
    Environment = var.environment
  }
}
resource "local_file" "ansible_inventory" {
  filename = "${path.module}/../ansible/inventory/${var.environment}.yml"
  content = templatefile("${path.module}/inventory.tpl", {
    web_hosts = aws_instance.web[*]
  })
}
Or skip the file and have Ansible query AWS directly with the EC2 plugin (see Dynamic Inventory in Advanced Patterns).
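For the plugin route, a dynamic inventory file might look roughly like the sketch below. It assumes the amazon.aws collection and boto3 are installed, that the instances carry the Role and Environment tags shown above, and that hosts are reachable on their public IPs; the region is a placeholder.

```yaml
# inventory/production.aws_ec2.yml (sketch; the file name must end in aws_ec2.yml or aws_ec2.yaml)
---
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1                        # assumed region
filters:
  tag:Environment: production        # only instances tagged for this environment
  instance-state-name: running
hostnames:
  - tag:Name                         # use the Name tag (web-0, web-1, ...) as the inventory hostname
groups:
  # put instances tagged Role=webserver into the existing "webservers" group
  webservers: "tags.get('Role') == 'webserver'"
compose:
  ansible_host: public_ip_address    # assumption; use private_ip_address inside a VPC
```

ansible-inventory -i inventory/production.aws_ec2.yml --graph shows the resulting groups without running a play.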
Operating Tips
A handful of habits worth adopting:
- Always --check --diff first. Two extra seconds; catches most "wait, I didn't expect that" moments before they land on prod.
- --limit is your friend. Practice on one host before the fleet.
- Tag everything. --tags deploy for fast iteration; --skip-tags slow-setup for re-runs.
- Don't reach for command: or shell: without thinking. Look for a real module first; modules are usually idempotent, while raw commands are not unless you add creates: or changed_when:.
- Roll, don't restart all at once. serial: plus a health check plus LB de/re-register = zero-downtime deploys.
- One playbook per concern. Setup, deploy, secret rotation: separate playbooks. Easier to reason about.
- Re-run the same playbook regularly. If apply produces unexpected changed counts, something drifted; investigate.
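The rolling-deploy and tagging tips can be combined in one play. The following is a sketch under stated assumptions: the app role and the webservers group come from the layout above, while the /healthz endpoint and the load balancer steps are hypothetical placeholders to be swapped for your own LB module.

```yaml
# playbooks/deploy.yml (sketch; health endpoint and LB steps are placeholders)
---
- name: Rolling application deploy
  hosts: webservers
  become: true
  serial: 1                 # one host per batch; a percentage such as "25%" also works
  max_fail_percentage: 0    # abort the rollout as soon as a host in a batch fails
  pre_tasks:
    - name: Take host out of the load balancer
      ansible.builtin.debug:
        msg: "placeholder: deregister {{ inventory_hostname }} from the LB here"
      tags: [deploy]
  roles:
    - role: app
      tags: [deploy]
  post_tasks:
    - name: Wait until the app answers its health check
      ansible.builtin.uri:
        url: "http://{{ ansible_host | default(inventory_hostname) }}/healthz"   # hypothetical endpoint
        status_code: 200
      register: health
      until: health is succeeded
      retries: 10
      delay: 3
      delegate_to: localhost   # probe from the control node, as a client would
      become: false
      tags: [deploy]
    - name: Put host back into the load balancer
      ansible.builtin.debug:
        msg: "placeholder: re-register {{ inventory_hostname }} with the LB here"
      tags: [deploy]
```

A first run might be ansible-playbook playbooks/deploy.yml --limit web-0 --tags deploy, then the same command without --limit once that host looks healthy.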
Checklist
Pre-production Ansible checklist
- Per-environment inventory (no cross-env mixing)
- All secrets encrypted with Ansible Vault
- Vault password stored in a secret manager (not in the repo)
- Roles versioned via requirements.yml
- CI runs yamllint + ansible-lint + --syntax-check on every PR
- Production apply gated behind code review + manual approval
- Shared roles have Molecule tests
- Rolling deploys use serial: and health checks
- Handlers used for service restarts (not service: restarted everywhere)
- Pipelining and SSH ControlMaster enabled for speed
- Fact caching enabled
- Periodic drift run scheduled (ansible-playbook site.yml --check --diff in CI)
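To illustrate the handlers item, here is a minimal sketch in which nginx restarts only when its configuration actually changes. Tasks and handlers are shown in a single play for brevity; in a role they would live in tasks/main.yml and handlers/main.yml. The template name and destination path are hypothetical.

```yaml
# Sketch: restart via a handler instead of an unconditional restart task
---
- name: Configure nginx
  hosts: webservers
  become: true
  tasks:
    - name: Install site configuration
      ansible.builtin.template:
        src: site.conf.j2                  # hypothetical template
        dest: /etc/nginx/conf.d/site.conf  # hypothetical path
        mode: "0644"
      notify: Restart nginx                # fires only if this task reports "changed"
  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

Handlers run once at the end of the play, however many tasks notified them, which keeps re-runs of an unchanged playbook free of restarts.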