# CentOS DNS authoritative and resolvers setup

## Public DNS setup

### Bind authoritative servers

We use [Bind](https://www.isc.org/bind/) as our main authoritative DNS solution.
For the `public` zones, we simply use the traditional primary/secondary setup: the primary zone is updated, then the secondary servers are notified and issue an IXFR/AXFR transfer to get the latest zone content (and so serve the same SOA).
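
To check that NOTIFY and the following IXFR/AXFR have actually propagated, one can compare the SOA serial returned by each authoritative server. Below is a minimal sketch that assumes `dig` (bind-utils) is installed. The function names are illustrative and are not part of our tooling.

```bash
#!/usr/bin/env bash
# Sketch: verify that the primary and all secondaries serve the same SOA serial.
# Function names are illustrative only; requires dig (bind-utils).

# Print the serial (3rd field) of "dig +short SOA" output, which looks like:
#   <mname> <rname> <serial> <refresh> <retry> <expire> <minimum>
soa_serial() {
  awk 'NF >= 7 { print $3; exit }' <<< "$1"
}

# Succeed only if every argument is identical to the first one.
serials_match() {
  local first="$1" s
  for s in "$@"; do
    [[ "$s" == "$first" ]] || return 1
  done
}

# usage: check_soa_sync <zone> <nameserver>...
# Queries each nameserver directly (no recursion) and compares the serials.
check_soa_sync() {
  local zone="$1" ns serial
  local -a serials=()
  shift
  for ns in "$@"; do
    serial="$(soa_serial "$(dig @"$ns" "$zone" SOA +short +norecurse)")"
    printf '%-30s %s\n' "$ns" "${serial:-<no answer>}"
    serials+=("${serial:-none}")
  done
  serials_match "${serials[@]}"
}
```

For example, `check_soa_sync centos.org $(dig +short NS centos.org)` prints one serial per server. A non-zero exit status means at least one server is lagging behind or did not answer.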

The way we configure DNS zones is simple:

 * update the static zone (if needed, see below) in the `filestore/bind` directory (managed as a git repository, depending on the environment)
 * play the [bind](https://github.com/centos/ansible-role-bind) ansible role, either on trigger or by just waiting for the role to be applied automatically (see the Ansible section for the setup)

We also have some delegated zones that are still served either by Bind or by PowerDNS (see below).

#### Static zones

As described above, to add/delete/modify a DNS record in a static zone, one just has to:

  * update the SOA (bump its serial) in the `zone` file
  * update/add/delete the record
  * commit/push to git (in the `filestore` git repository, depending on the inventory)
  * trigger ansible
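
Forgetting the SOA bump is the classic mistake, because secondaries only transfer the zone when the serial increases. Below is a minimal helper. It assumes the common `YYYYMMDDnn` serial convention, which is an assumption on our part: our zone files may use another scheme.

```bash
#!/usr/bin/env bash
# Sketch: compute the next SOA serial, ASSUMING the YYYYMMDDnn convention.
# usage: next_serial <current_serial> [<today as YYYYMMDD>]
next_serial() {
  local current="$1" today="${2:-$(date +%Y%m%d)}"
  local base="${today}00"
  if (( current >= base )); then
    # Already bumped today (or serial is ahead of the date): just increment.
    echo $(( current + 1 ))
  else
    # First change of the day: restart the counter at nn=00.
    echo "$base"
  fi
}
```

Before committing, run `named-checkzone centos.org <path to the zone file>` (shipped with Bind). It catches syntax errors and reports the serial it loaded.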

List of zones served through static files (public zones):

 * centos.org
   * ocp.centos.org
   * ocp.ci.centos.org
   * ocp.stg.ci.centos.org
 * centosproject.org

##### Specific records

`CAA` records: used to publicly announce which CAs are allowed to issue certificates for our zones:

```bash
dig @ns1.centos.org -t CAA centos.org +short
0 issue "amazon.com"
0 issuewild "letsencrypt.org"
0 issue "letsencrypt.org"
0 issuewild "digicert.com"
0 issue "digicert.com"
```
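
Here is how to read the output above. Per RFC 8659, `issue` lists the CAs allowed to issue regular certificates. When `issuewild` is present, it takes precedence for wildcard certificates. So `amazon.com` may issue for `centos.org` but not for `*.centos.org`. The small sketch below encodes that rule. It is simplified: values with `;` parameters, and domains without any CAA record, are not handled.

```bash
#!/usr/bin/env bash
# Sketch: decide from CAA records (dig +short format, read on stdin) whether
# <ca> may issue a <single|wildcard> certificate, following RFC 8659:
# issuewild, when present, overrides issue for wildcard names.
# Simplified: ignores flags, property parameters after ';' and the
# "no CAA record at all" case (where any CA may issue).
# usage: caa_allows <ca-domain> <single|wildcard>
caa_allows() {
  local ca="$1" kind="$2"
  awk -v ca="\"$ca\"" -v kind="$kind" '
    { tag[NR] = tolower($2); val[NR] = $3; if (tag[NR] == "issuewild") has_wild = 1 }
    END {
      want = (kind == "wildcard" && has_wild) ? "issuewild" : "issue"
      for (i = 1; i <= NR; i++)
        if (tag[i] == want && val[i] == ca) exit 0
      exit 1
    }'
}
```

For example: `dig @ns1.centos.org -t CAA centos.org +short | caa_allows amazon.com wildcard || echo "not allowed"`.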

`TXT` / SPF records: used for [Sender Policy Framework](https://en.wikipedia.org/wiki/Sender_Policy_Framework), to restrict which IP blocks/hosts are allowed to send mail from the @centos.org domain.
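
Our actual SPF record is not reproduced here; it can be fetched with `dig -t TXT centos.org +short`. The sketch below only illustrates how the `ip4:`/`ip6:` mechanisms of a `v=spf1` record list the allowed sending networks. The sample record in any test is hypothetical, and `include:` and other mechanisms are ignored.

```bash
#!/usr/bin/env bash
# Sketch: list the ip4:/ip6: networks of the SPF record found in
# "dig -t TXT <domain> +short" output on stdin. include:, a, mx, etc. are ignored.
spf_networks() {
  # dig prints TXT strings quoted (long records may be split into several
  # strings; a split inside a single token is not handled here).
  tr -d '"' | awk '$1 == "v=spf1" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^[+]?ip[46]:/) { sub(/^[+]?ip[46]:/, "", $i); print $i }
  }'
}
```

Usage: `dig -t TXT centos.org +short | spf_networks`.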

`TXT` for Kerberos: we publish a pointer announcing that the FEDORAPROJECT.ORG realm can be used for Kerberos tickets:

```
dig @ns1.centos.org -t TXT _kerberos.centos.org +short
"FEDORAPROJECT.ORG"
```

`CNAME`: simple aliases for other A/AAAA records.

`CNAME` for the TLS/ACME DNS challenge: we use some `static` CNAMEs pointing to the equivalent records in the `dynamic` zone (see below).
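
The exact naming of those CNAME targets is not described here, so the sketch below only checks the invariant that matters in acme.sh DNS alias mode. The `_acme-challenge` name of a host must be a CNAME whose target lives inside `acme.centos.org`, the dynamic zone that acme.sh updates. Function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: check that _acme-challenge.<fqdn> is a CNAME into the dynamic ACME zone.
# Function names are illustrative; check_acme_alias requires dig.

# Succeed if DNS name <name> is <zone> itself or a name below it (case-insensitive).
name_in_zone() {
  local name="${1%.}" zone="${2%.}"
  name="${name,,}"
  zone="${zone,,}"
  [[ "$name" == "$zone" || "$name" == *".$zone" ]]
}

# usage: check_acme_alias <fqdn> [<acme zone>]
check_acme_alias() {
  local fqdn="${1%.}" acme_zone="${2:-acme.centos.org}" target
  target="$(dig +short -t CNAME "_acme-challenge.${fqdn}")"
  if [[ -z "$target" ]]; then
    echo "no CNAME for _acme-challenge.${fqdn}" >&2
    return 1
  fi
  name_in_zone "$target" "$acme_zone"
}
```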

`NS` records: for the zones that we delegate to other authoritative servers, for example (but not limited to) `mirror.centos.org`, served by PowerDNS/GeoIP (see below).

#### Dynamic zones

We also have a specific `acme.centos.org` zone that is used for one purpose only: creating, on the fly, the TXT records used by Let's Encrypt/ACME for the DNS challenge.
For this we use the [acme.sh](https://github.com/Neilpang/acme.sh) tool, which does that automatically for us. It uses nsupdate, with a specific allowed key, to dynamically create the record that the ACME server verifies before signing the CSR.
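
Under the hood, this boils down to an RFC 2136 dynamic update signed with the allowed key (typically a TSIG key passed to `nsupdate -k`). The sketch below builds the nsupdate command script for one TXT record. The server, record name, token, key path and TTL are placeholders, not our real values.

```bash
#!/usr/bin/env bash
# Sketch: print an nsupdate script that adds (or deletes) one ACME TXT record.
# usage: acme_txt_update <add|delete> <server> <zone> <record> <token> [<ttl>]
acme_txt_update() {
  local action="$1" server="$2" zone="$3" record="${4%.}" token="$5" ttl="${6:-60}"
  local update
  case "$action" in
    add)    update="update add ${record}. ${ttl} IN TXT \"${token}\"" ;;
    delete) update="update delete ${record}. IN TXT \"${token}\"" ;;
    *)      echo "unknown action: ${action}" >&2; return 1 ;;
  esac
  # server / zone / update / send is the basic nsupdate command sequence.
  printf 'server %s\nzone %s\n%s\nsend\n' "$server" "$zone" "$update"
}
```

The output would be piped into `nsupdate -k <key file>`. Per the acme.sh dnsapi wiki, its `dns_nsupdate` hook does the equivalent itself, using the `NSUPDATE_SERVER` and `NSUPDATE_KEY` environment variables.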

See the [TLS section](../security/tls.md#how-to-obtain-new-cert-dns-challenge-is-the-preferred-way) on how to use it.

Some pointers:

  * [https://github.com/acmesh-official/acme.sh/wiki/dnsapi#7-use-nsupdate-to-automatically-issue-cert](https://github.com/acmesh-official/acme.sh/wiki/dnsapi#7-use-nsupdate-to-automatically-issue-cert)
  * [https://github.com/Neilpang/acme.sh/wiki/DNS-alias-mode](https://github.com/Neilpang/acme.sh/wiki/DNS-alias-mode)

### PowerDNS servers (GeoIP)

For some specific records, like `mirror.centos.org` or `vault.centos.org` (and others), we wanted something other than simple round-robin in a Bind zone file.
The idea was to pick where to redirect clients based on GeoIP/country information, and so send them to the nearest server for that role.
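
Our real selection logic lives in the pdns-custom-geoip-backend repository. As an illustration only, here is one possible fallback order: same country, then same continent, then any node. This order is an assumption and not necessarily what the backend does. The node rows are hypothetical.

```bash
#!/usr/bin/env bash
# Illustration only (ASSUMED logic, not our backend's): pick candidate nodes for a
# client, preferring the client's country, then its continent, then all nodes.
# stdin: one "fqdn country continent" row per active node.
# usage: pick_nodes <client country> <client continent>
pick_nodes() {
  awk -v cc="$1" -v ct="$2" '
    { all = all " " $1
      if ($2 == cc) same_cc = same_cc " " $1
      if ($3 == ct) same_ct = same_ct " " $1 }
    END {
      out = (same_cc != "") ? same_cc : ((same_ct != "") ? same_ct : all)
      print substr(out, 2)
    }'
}
```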

[PowerDNS](https://www.powerdns.com) is a really good authoritative solution that also lets you plug in your own [Pipe backend](https://doc.powerdns.com/md/authoritative/backend-pipe/), meaning that we were able to implement our own logic based on our requirements.

See our [pdns-custom-geoip-backend](https://github.com/CentOS/pdns-custom-geoip-backend) git repository, which contains the simple code used for that, and the corresponding [pdns-pipe](https://github.com/CentOS/ansible-role-pdns-pipe) ansible role used to deploy it automatically.
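
To give an idea of what the Pipe backend protocol looks like, here is a minimal sketch of an ABI version 1 backend (`pipe-abi-version=1`). PowerDNS sends a `HELO` handshake, then one tab-separated `Q` line per question, and expects `DATA` lines terminated by `END`. The sketch answers a single hypothetical name with a fixed documentation address and ignores the client IP, so it is not our GeoIP backend. A usable backend must also answer the `SOA` queries for the zones it serves.

```bash
#!/usr/bin/env bash
# Minimal PowerDNS pipe backend sketch (pipe-abi-version=1), NOT our geoip backend.
# Hypothetical data: answers A queries for one name with one fixed address.
ANSWER_NAME="mirror.example.org"   # hypothetical
ANSWER_IP="192.0.2.10"             # documentation address
ANSWER_TTL=60

pipe_backend() {
  local line kind qname qclass qtype id remote
  # Handshake: PowerDNS starts with "HELO<TAB>1" for ABI version 1.
  IFS= read -r line || return 1
  if [[ "$line" != $'HELO\t1' ]]; then
    printf 'FAIL\n'
    return 1
  fi
  printf 'OK\tsketch pipe backend\n'

  # Each question: Q<TAB>qname<TAB>qclass<TAB>qtype<TAB>id<TAB>remote-ip
  # ($remote is where a GeoIP backend would look; unused in this sketch.)
  while IFS=$'\t' read -r kind qname qclass qtype id remote; do
    if [[ "$kind" == "Q" && "${qname,,}" == "$ANSWER_NAME" \
          && ( "$qtype" == "A" || "$qtype" == "ANY" ) ]]; then
      # DATA<TAB>qname<TAB>qclass<TAB>qtype<TAB>ttl<TAB>id<TAB>content
      printf 'DATA\t%s\t%s\tA\t%s\t%s\t%s\n' \
        "$qname" "$qclass" "$ANSWER_TTL" "$id" "$ANSWER_IP"
    fi
    # Every request (including AXFR/PING, unsupported here) is closed with END.
    printf 'END\n'
  done
}
```

A real script would end by calling `pipe_backend`. It would be referenced from `pdns.conf` through `launch=pipe` and `pipe-command=<path to the script>`.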

Workflow for the `dynamic` backend:

 * we have a central `nodes.db` sqlite3 DB with a specific schema, in which we enter the fqdn, ipv4/ipv6 address[es], region, continent and country, and whether the node is active or not (see `/var/lib/centos-infra/nodes.db`)
 * if we have to add/modify/remove a node, we just do it in that sqlite DB
 * we then regenerate a .json file by calling the `/var/lib/centos-infra/gen_backend` script, which parses the DB, sorts the result into the format consumed by the PowerDNS pipe backend, and also encrypts the .json file with gpg
 * the delegated PowerDNS nodes detect the changes, decrypt the file and reload it in memory automatically

Worth knowing: for the existing setup, when we want to take a machine out of the pool or add it back, we have a simple ansible ad-hoc playbook (using prompt vars) for this, `adhoc-node-pdns-modify.yml`:

```
ansible-playbook-prod playbooks/adhoc-node-pdns-modify.yml
Host to modify in PowerDNS ? => : centosq7.centos.org
Action (enable|disable) ? => : enable

PLAY [centosq7.centos.org] ********************************************************************************************

TASK [Enable/Disable msync node in PowerDNS geoip backend] ************************************************************
Friday 25 June 2021  14:46:44 +0200 (0:00:07.333)       0:00:07.333 ***********
changed: [centosq7.centos.org]

PLAY RECAP ************************************************************************************************************
centosq7.centos.org        : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Friday 25 June 2021  14:46:51 +0200 (0:00:06.806)       0:00:14.140 ***********
===============================================================================
Enable/Disable msync node in PowerDNS geoip backend ------------------------------------------------------------ 6.81s
Playbook run took 0 days, 0 hours, 0 minutes, 14 seconds
```

## Internal DNS setup

### Bind authoritative and resolvers

For the DCs that we control (the Red Hat ones), where we have internal zones/subnets and so a different set of `internal` IPs, we also use Bind. There it runs with additional features enabled through our Ansible role, such as allowing recursion: a specific ACL lets the internal subnets use Bind both as authoritative server *and* resolver.
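
A quick way to check that this ACL behaves as intended is to query the resolver for an external name from two places. From an internal host, expect `NOERROR` with the `ra` flag. From outside, expect `REFUSED`. The helpers below parse full `dig` output (not `+short`, which drops the header). The function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: helpers to inspect full dig output (header lines are needed, so no +short).

# Print the response status (NOERROR, NXDOMAIN, REFUSED, ...) from dig output on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Succeed if the "ra" (recursion available) flag is set in dig output on stdin.
dig_recursion_available() {
  grep -m 1 '^;; flags:' | grep -Eq ' ra[ ;]'
}
```

Usage: `dig @<internal resolver> www.example.org A | dig_status`.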

The procedure to update a zone is identical to the one described for the public zones. The only difference is that it comes from a different ansible inventory, and so from a different `filestore` git repo (tied to that inventory/env).

Worth knowing that we (ab)use some specific features like Response Policy Zones ([RPZ](https://www.isc.org/rpz/)). They let us answer internally and redirect known `external` records (like mirrorlist.centos.org) to an internal IP, so that the `public` authoritative servers are not queried.
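
In an RPZ policy zone, the rewrite for a given query name is written as a record whose owner is that name relative to the policy zone origin, e.g. `mirrorlist.centos.org` without a trailing dot. The policy zone itself must also be listed in the `response-policy` statement of `named.conf`. The sketch below prints such lines. Any addresses used with it in the test are documentation addresses, not our internal IPs.

```bash
#!/usr/bin/env bash
# Sketch: print an RPZ "local data" override line for <qname> -> <ip>.
# The owner name is written WITHOUT a trailing dot, i.e. relative to the
# policy zone's $ORIGIN, which is how RPZ encodes a QNAME trigger.
rpz_override() {
  local qname="${1%.}" ip="$2" type="A"
  if [[ "$ip" == *:* ]]; then
    type="AAAA"
  fi
  printf '%s\tIN\t%s\t%s\n' "$qname" "$type" "$ip"
}
```

To confirm the override works, compare `dig +short mirrorlist.centos.org` from an internal host with the answer from the public authoritative servers.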

All of that is supported by our [bind](https://github.com/centos/ansible-role-bind) Ansible role. Consider reading its `defaults/main.yml` to see how it works, or look at the ansible inventory/filestore (if you have access) for real examples.

### Unbound resolvers

In some specific subnets/environments we also use [Unbound](https://www.nlnetlabs.nl/projects/unbound/about/), which is also a really lightweight/fast resolver that supports plenty of features.

While not technically called RPZ, Unbound lets you define some local records and forward other queries to other resolvers (forwarders).
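
For reference, forwarding in Unbound is expressed with a `forward-zone:` clause: a `name:` plus one or more `forward-addr:` entries. The small sketch below prints such a clause. The forwarder addresses are placeholders, and our role generates its own configuration.

```bash
#!/usr/bin/env bash
# Sketch: print an unbound.conf forward-zone clause.
# usage: unbound_forward_zone <zone ("." for everything)> <forwarder ip>...
unbound_forward_zone() {
  local zone="$1" addr
  shift
  if (( $# == 0 )); then
    echo "at least one forwarder is required" >&2
    return 1
  fi
  printf 'forward-zone:\n    name: "%s"\n' "$zone"
  for addr in "$@"; do
    printf '    forward-addr: %s\n' "$addr"
  done
}
```

The result can be included from `unbound.conf` and validated with `unbound-checkconf` before reloading.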

Our [unbound](https://github.com/CentOS/ansible-role-unbound) ansible role supports such features, like:

 * control ACL (for recursion)
 * override some specific records through an ansible list
 * also *automatically* parse an ansible group itself to dynamically generate a kind of internal zone.

That last feature is the one we use (through `unbound_local_groups`) for the `internal` rdu2.centos.org zone. When we add a new machine and define the IP for that host/VM, unbound automatically adds it to the computed file it uses.
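
Conceptually, such a computed file is a list of `local-data` (and `local-data-ptr`) entries built from the group's hosts. The exact output of our role is not shown here. This sketch produces equivalent entries from "fqdn ip" pairs, and any hostnames or addresses fed to it here are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (not our role's template): emit unbound local-data / local-data-ptr
# lines for "fqdn ip" pairs read on stdin; blank and '#' lines are skipped.
unbound_local_data() {
  local fqdn ip type
  while read -r fqdn ip; do
    [[ -z "$fqdn" || "$fqdn" == \#* ]] && continue
    type="A"
    if [[ "$ip" == *:* ]]; then
      type="AAAA"
    fi
    printf 'local-data: "%s. IN %s %s"\n' "${fqdn%.}" "$type" "$ip"
    printf 'local-data-ptr: "%s %s"\n' "$ip" "${fqdn%.}"
  done
}
```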