The whole mirror network is a mix of machines sponsored by or donated to the CentOS Project (machines we install, control, and monitor in the centos.org namespace) and external mirrors. Depending on the CentOS variant (Linux vs. Stream) or the artifact type (regular packages, ISO images, cloud images, etc.), content can land on several different mirror networks.

## CentOS Linux 7 / Stream 8
### Overview
This is the largest and oldest mirror network we have, and it has been serving the CentOS community for many years.
The workflow is as follows:

 * CentOS release and update packages are pushed to the main mirrors (mirror-ref)
 * other CentOS owned/controlled machines (the machines in the mirror.centos.org pool) pull content from there
 * external third-party mirrors, once declared in the mirrors database and with their IP added to the ACL, are authorized to get content through rsync
 * our "legacy" mirror crawler validates these mirrors in a loop and produces up-to-date mirrorlists that are pushed to the mirrorlists.centos.org node (see the link further down in this doc for how the mirrorlists part works)
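
As a rough illustration of the last step, the sketch below shows how a crawler could turn validation results into a mirrorlist. It is not the code from the mirrorlists-code repository: the `Mirror` fields, the `build_mirrorlist` name, the six-hour threshold, and the example URLs are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    url: str        # base URL of the mirror (hypothetical example values below)
    country: str    # two-letter country code
    last_sync: int  # Unix timestamp the crawler read from the mirror


def build_mirrorlist(mirrors, reference_sync, max_lag_seconds=6 * 3600):
    """Return the URLs of mirrors that are close enough to the reference."""
    fresh = [m for m in mirrors if reference_sync - m.last_sync <= max_lag_seconds]
    # Most recently synced first; the URL is a stable tie-breaker.
    fresh.sort(key=lambda m: (-m.last_sync, m.url))
    return [m.url for m in fresh]


example = [
    Mirror("http://a.example/centos/", "US", 1000),
    Mirror("http://b.example/centos/", "DE", 100),   # too far behind
    Mirror("http://c.example/centos/", "FR", 1000),
]
print(build_mirrorlist(example, reference_sync=1000, max_lag_seconds=500))
```

A real crawler would also have to fetch the timestamp from each mirror and group results by country or continent; both are left out here to keep the filtering logic visible.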

### Operations
Third-party mirror operators are now expected to open a ticket on the centos-infra tracker to ask for a new mirror to be registered or an existing one to be modified.
The current status can be seen on [https://mirror-status.centos.org](https://mirror-status.centos.org).
It is worth reading the [mirrorlist doc](https://github.com/centos/mirrorlists-code), which covers the backend scripts used to validate mirrors and how to modify existing entries in the DB.

To add a new mirror, the `/var/lib/centos-mirrors/mirror-geo-check.py` helper script can be used to:

 * automatically detect the continent/country (and state, if in the US) and verify restricted access
 * validate that the mirror is (currently) reachable and check its content
 * prepare the SQL statement to run on MySQL

Note: all modifications to existing mirrors currently have to be done by hand, as there is no helper script for that yet.
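
To make the last helper step more concrete, here is a minimal sketch of preparing a parameterized statement for a new mirror after a restricted-country check. It does not reproduce `mirror-geo-check.py`: the `prepare_insert` function, the `mirrors` table and its columns, and the `RESTRICTED_COUNTRIES` placeholder are hypothetical, since neither the real schema nor the real country list appears in this document.

```python
# Hypothetical placeholder: the real list of restricted countries is not given here.
RESTRICTED_COUNTRIES = {"XX"}


def prepare_insert(name, base_url, continent, country, state=None):
    """Return (sql, params) for a new mirror; raise if the country is restricted."""
    country = country.upper()
    if country in RESTRICTED_COUNTRIES:
        raise ValueError(f"mirror {name!r} is in a restricted country: {country}")
    if country != "US":
        state = None  # the state is only recorded for US mirrors
    # Hypothetical table and column names. The %s placeholders let the MySQL
    # driver quote the values instead of building the SQL text by hand.
    sql = (
        "INSERT INTO mirrors (name, base_url, continent, country, state) "
        "VALUES (%s, %s, %s, %s, %s)"
    )
    return sql, (name, base_url, continent, country, state)


sql, params = prepare_insert("example-mirror", "http://m.example/centos/", "EU", "de")
print(sql)
print(params)
```

Returning the statement and its parameters separately, rather than a fully formatted string, keeps the values out of the SQL text until the database driver executes it.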

## CentOS Stream 9 and above
### Overview
The process is more or less the same as for previous releases, except that the mirror pool is `mirror.stream.centos.org` and has fewer machines because of disk space constraints. Starting with Stream 9, all artifacts, including debuginfo and source packages, are pushed together to the same mirror network (so not to `vault.centos.org` or `debuginfo.centos.org`; see below).

### Operations
As we don't use our previous mirror crawler for this network, very little is needed on our side: third-party mirror operators register their mirror in Fedora MirrorManager, and someone from the infra team adds it to the CentOS category after checking that the mirror isn't located in a restricted/embargoed country.

## Cloud images
### Overview
The CentOS Project builds some artifacts that can be consumed directly, such as cloud images in various formats: .raw, .qcow2, and Vagrant boxes.
Due to the initial size of these images, it was decided to create a small dedicated CDN, `cloud.centos.org` (still based on donated/sponsored machines).
The release process is the same as above: artifacts are pushed to a specific mirror reference, and machines in the cloud.centos.org pool pull the content and expose it to public consumers.

It is worth knowing that some cloud images are also directly available on cloud infrastructure in parallel: for example, our AMI images are automatically available on AWS, so they don't need to be pulled from cloud.centos.org and re-imported into AWS.

### Operations
Nothing much needs to be done, except adding or removing nodes behind cloud.centos.org in our PowerDNS setup in case of migration or maintenance.

## Vault network
### Overview
For some time, because of disk space usage on donated machines, *src.rpm* packages were pushed to a different network called `vault.centos.org`, so as not to overload `mirror.centos.org` (nor the third-party mirror network).
As mentioned in the CentOS Stream 9 section, this was only the case up to Stream 8: starting with Stream 9, mirror.stream.centos.org (and the mirrors getting content from it) contains *all* packages, including src.rpm and debuginfo packages.
`vault.centos.org` also still hosts archived and EOL'ed CentOS versions.

It's now a dedicated CloudFront distribution on AWS, meaning that everybody gets content directly from AWS, that is, from caching servers in the AWS infrastructure (edge locations).
That CloudFront setup is configured to use some specific HTTP servers as Origin (in a failover group).
The release process is the same as above: artifacts are pushed to a specific mirror reference, and machines in the Vault Origin pool pull the content and expose it to CloudFront. Although these servers are reachable, they refuse to serve content directly: only CloudFront is authorized (see the `mirror-vault` role for how this is done).
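
The failover group mentioned above can be pictured with a small model: a request goes to the first origin, and the next one is tried only when the connection fails or a failover status code comes back. This is a simplified sketch of the idea, not the distribution's actual configuration; `fetch_with_failover`, the injected `fetch` callable, and the set of status codes are assumptions chosen for the example.

```python
def fetch_with_failover(origins, fetch,
                        failover_statuses=frozenset({500, 502, 503, 504})):
    """Try each origin in order; return (origin, status, body) from the first usable one.

    `fetch` is any callable taking an origin and returning (status, body).
    """
    last_error = None
    for origin in origins:
        try:
            status, body = fetch(origin)
        except OSError as exc:  # connection refused, timeout, DNS failure, ...
            last_error = exc
            continue
        if status in failover_statuses:
            last_error = RuntimeError(f"{origin} answered {status}")
            continue
        return origin, status, body
    raise RuntimeError("all origins failed") from last_error


def fake_fetch(origin):
    """Stand-in for a real HTTP request, using made-up origin names."""
    if origin == "origin-a":
        raise OSError("connection refused")
    if origin == "origin-b":
        return 503, b""
    return 200, b"ok"


print(fetch_with_failover(["origin-a", "origin-b", "origin-c"], fake_fetch))
```

Passing `fetch` in as a parameter keeps the selection logic testable without any network access.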

### Operations
 * tuning WAF rules on AWS CloudFront to reject DDoS attacks and abusers when needed, and to limit traffic
 * adding/removing nodes used as Origin (all already monitored)

## Debuginfo network
### Overview
For some time, because of disk space usage on donated machines, *debuginfo* packages were pushed to a different network called `debuginfo.centos.org`, so as not to overload `mirror.centos.org` (nor the third-party mirror network).
The release process is the same as above: artifacts are pushed to a specific mirror reference, and machines in the debuginfo.centos.org pool pull the content and expose it to public consumers.
As mentioned in the CentOS Stream 9 section, this was only the case up to Stream 8: starting with Stream 9, mirror.stream.centos.org (and the mirrors getting content from it) contains *all* packages, including src.rpm and debuginfo packages.

### Operations
Nothing much needs to be done, except adding or removing nodes behind debuginfo.centos.org in our PowerDNS setup in case of migration or maintenance.

## Buildlogs network
### Overview
For the release of CentOS 7, it was decided to publicly expose the build status/progress and also to push out all built rpm packages directly, including the specific mock config file that was used and the build log files (hence the name `buildlogs`), so that people could see what was already built and test it before a final CentOS 7 tree saw the light. As the network was there, it was decided to keep pushing all unsigned packages to it, and also to use it for SIG content, which is pushed there tagged as `testing` so that the community can test it.

It's worth knowing that the `mirror-buildlogs` role contains some httpd RewriteRules that redirect to a specific CDN: CDN77 was interested in sponsoring the CentOS Project back then, so all package downloads are meant to be redirected to them, and they use (like CloudFront on AWS for vault.centos.org) a dedicated "origin" node to pull content into their caching network.

The release process is the usual one: artifacts are pushed to a specific mirror reference, and machines in the buildlogs.centos.org pool pull the content and expose it to public consumers and CDN77.

### Operations
Apart from monitoring the infra, the only things to do are adding/removing records in our PowerDNS setup (GeoIP) *and* verifying that the origin node used by CDN77 is still reachable and working. If it is not, just update the public DNS CNAME for the `<fqdn>` declared as origin (see the ansible inventory and role for which record is used).
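
The reachability check on the origin node could be scripted along these lines. This is only a sketch using the Python standard library: the `origin_is_healthy` name, the choice of a HEAD request, the ten-second timeout, and the example URL are assumptions, and the real origin hostname has to be taken from the ansible inventory.

```python
import urllib.request


def origin_is_healthy(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if a HEAD request to the origin answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this also covers
        # DNS failures, refused connections, timeouts, and 4xx/5xx answers.
        return False


if __name__ == "__main__":
    # Hypothetical URL; replace it with the origin record from the inventory.
    print(origin_is_healthy("http://origin.example.invalid/", timeout=3))
```

The `opener` parameter exists so the function can be exercised with a stub instead of a live HTTP request.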