
The whole mirror network is a mix of machines sponsored/donated to the CentOS Project (so machines we install/control/monitor in the centos.org namespace) and external mirrors. Depending on the CentOS variant (Linux vs Stream) or the artifact type (normal packages, ISO images, cloud images, etc.), content can land on several different mirror networks.

CentOS Linux 7 / Stream 8

Overview

This is the largest and oldest mirror network we have, and it has been serving the CentOS community for many years. The workflow goes like this:

  • CentOS release/update packages are pushed to the main mirrors (mirror-ref)
  • Some other CentOS-owned/controlled machines then get the content (machines in the mirror.centos.org pool)
  • External third-party mirrors, once declared in the mirrors database and with their IP added to the ACL, are authorized to get content through rsync
  • Our "legacy" mirror crawler validates these mirrors in a loop and produces up-to-date mirrorlists that are pushed to the mirrorlists.centos.org node (see the mirrorlist doc for how that part works)
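
The last step of this workflow can be sketched as follows. This is only an illustration, not the actual crawler: the Mirror record, the build_mirrorlists function, the timestamp comparison and the max_lag_seconds threshold are all assumptions made for the example (the mirrorlist doc describes the real backend scripts).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    url: str        # base URL of the mirror (example values only)
    country: str    # country code recorded for the mirror
    timestamp: int  # content timestamp the crawler read from the mirror


def build_mirrorlists(mirrors, reference_timestamp, max_lag_seconds=6 * 3600):
    """Group sufficiently up-to-date mirrors by country.

    A mirror is kept when its content is at most `max_lag_seconds` behind
    the reference mirror. The threshold is an assumption for this sketch.
    """
    lists = {}
    for mirror in mirrors:
        lag = reference_timestamp - mirror.timestamp
        if lag <= max_lag_seconds:
            lists.setdefault(mirror.country, []).append(mirror.url)
    # Sort the URLs so that generated lists are stable between runs.
    return {country: sorted(urls) for country, urls in lists.items()}
```

Each per-country list would then be written out and pushed to the node serving mirrorlists; stale mirrors simply drop out until the next loop finds them in sync again.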

Operations

Third-party mirrors are supposed (from now on) to create a ticket on the centos-infra tracker to ask for a new mirror to be registered, or for an existing one to be modified. The current status can be seen on https://mirror-status.centos.org. It's worth reading the mirrorlist doc about the backend scripts used to validate mirrors, which also explains how to modify existing entries in the DB.

To add a new mirror, the /var/lib/centos-mirrors/mirror-geo-check.py helper script can be used to:

  • automatically detect continent/country (and state if in US) and verify restricted access
  • validate mirror is (currently) reachable and check content
  • prepare the SQL statement that needs to be run on MySQL
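
As a rough illustration of that last step, the sketch below builds a parameterized INSERT statement from values that were already detected. It is not the code of mirror-geo-check.py: the mirrors table, its column names and the prepare_insert function are hypothetical, and %s is the placeholder style used by the common Python MySQL drivers.

```python
def prepare_insert(name, url, continent, country, state=None):
    """Return (sql, params) for registering a new mirror.

    The table and column names are hypothetical, not the real schema.
    """
    country = country.upper()
    # The state is only recorded for mirrors located in the US.
    state = state.upper() if (state and country == "US") else None
    sql = (
        "INSERT INTO mirrors (name, url, continent, country, state) "
        "VALUES (%s, %s, %s, %s, %s)"
    )
    return sql, (name, url, continent, country, state)
```

The returned pair would be passed to a DB-API cursor as cursor.execute(sql, params), which keeps the values out of the SQL text itself.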

PS: all modifications to existing mirrors need to be done by hand (for now, there is no helper script for that).

CentOS Stream 9 and above

Overview

More or less the same process as for previous releases, except that the mirror pool is mirror.stream.centos.org and has fewer machines, due to disk space constraints. Indeed, starting from Stream 9, all artifacts, including debuginfo and source packages, are pushed out together to the same mirror network (so not to vault.centos.org nor debuginfo.centos.org, see below).

Operations

As we don't use our previous mirror crawler for this, nothing is really needed, except that third-party mirrors register themselves on Fedora MirrorManager, and someone from the infra team then adds the mirror to the CentOS category, after having checked that it isn't located in a restricted/embargoed country.

Cloud images

Overview

The CentOS Project builds some artifacts that can be directly consumed, like cloud images, in various formats: .raw, .qcow2 and Vagrant boxes. Due to the initial size of these images, it was decided to create a specific, small cloud.centos.org CDN (still based on donated/sponsored machines). The release process is the same as above though: artifacts are pushed to a specific mirror reference, and machines in the cloud.centos.org pool pull the content and expose it to public consumers.

It's worth knowing though that, in parallel, some cloud images are also directly available on cloud infrastructure: for example, our AMI images are automatically available on AWS, so they don't need to be pulled from cloud.centos.org and reimported into AWS.

Operations

Nothing really to be done, except adding/removing nodes behind cloud.centos.org in our PowerDNS setup in case of migration or maintenance.

Vault network

Overview

For some time, and due to disk space usage on donated machines, it was decided to push src.rpm packages to a different network, called vault.centos.org, so as not to overload mirror.centos.org (nor the third-party mirror network). As mentioned in the CentOS Stream 9 section, this was only the case up to Stream 8: starting from Stream 9, mirror.stream.centos.org (and mirrors getting content from it) contains all packages, including src.rpm and debuginfo packages. vault.centos.org also still contains archived and EOL'ed CentOS versions.

It's now a dedicated CloudFront distribution on AWS, meaning that everybody should get content directly from AWS, and so from caching servers in AWS infra (edge locations). That CloudFront setup is configured to use some specific HTTP servers as Origin (in a failover group). The release process is the same as above though: artifacts are pushed to a specific mirror reference, and machines in the Vault Origin pool pull the content and expose it to CloudFront. While these servers are reachable, they refuse to serve content directly, and only CloudFront is authorized (see the mirror-vault role to see how).
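
The failover group idea can be sketched as below. This only illustrates the behaviour (try the first origin, fall back to the next one on a connection failure or a server error). It is not how CloudFront is implemented or how our distribution is configured: the fetch callable, the origin names and the list of status codes are assumptions made for the example.

```python
FAILOVER_STATUS_CODES = frozenset({500, 502, 503, 504})  # assumed for the example


def fetch_with_failover(path, origins, fetch):
    """Return (origin, status, body) from the first origin that answers properly.

    `fetch(origin, path)` is a caller-supplied function that returns
    (status, body) and raises OSError when the origin cannot be reached.
    """
    for origin in origins:
        try:
            status, body = fetch(origin, path)
        except OSError:
            continue  # origin unreachable: try the next one in the group
        if status in FAILOVER_STATUS_CODES:
            continue  # origin is failing: try the next one in the group
        return origin, status, body
    raise RuntimeError("no origin in the failover group could serve " + path)
```

As long as one origin in the group is healthy, consumers keep getting content through the CDN, which is why adding/removing Origin nodes can be done without public impact.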

Operations

  • tuning WAF rules on AWS CloudFront to reject DDoS traffic and abusers when needed, and to limit traffic
  • adding/removing nodes used as Origin (all monitored already)

Debuginfo network

Overview

For some time, and due to disk space usage on donated machines, it was decided to push debuginfo packages to a different network, called debuginfo.centos.org, so as not to overload mirror.centos.org (nor the third-party mirror network). The release process is the same as above though: artifacts are pushed to a specific mirror reference, and machines in the debuginfo.centos.org pool pull the content and expose it to public consumers. As mentioned in the CentOS Stream 9 section, this was only the case up to Stream 8: starting from Stream 9, mirror.stream.centos.org (and mirrors getting content from it) contains all packages, including src.rpm and debuginfo packages.

Operations

Nothing really to be done, except adding/removing nodes behind debuginfo.centos.org in our PowerDNS setup in case of migration or maintenance.

Buildlogs network

Overview

For the release of CentOS 7, it was decided to publicly expose the build status/progress, and also to push out all built rpm packages directly, including the specific mock config file that was used and the build log files (the reason why it's named buildlogs), so that people could see what was already built, and test it, before a final CentOS 7 tree could see the light. As it was there, it was decided to keep pushing all unsigned packages to that network, and also to use it for SIG content that is pushed there to be tested by the community, and so tagged as testing.

It's worth knowing that in the mirror-buildlogs role, you'll see some RewriteRules in httpd that redirect to a specific CDN: CDN77 was interested in sponsoring the CentOS Project back then, so all package downloads should be redirected to them, and they use (like CloudFront on AWS for vault.centos.org) a dedicated "origin" node to pull content into their own caching network.

The release process is the usual one: artifacts are pushed to a specific mirror reference, and machines in the buildlogs.centos.org pool pull the content and expose it to public consumers and CDN77.

Operations

Apart from monitoring the infra, the only thing to do is adding/removing records in our PowerDNS setup (GeoIP) and verifying that the origin node used by CDN77 is still reachable and working. If not, just update in public DNS the CNAME for the <fqdn> declared as origin (see the ansible inventory and role for which record is used for that).
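
A manual check of the origin node could look like the sketch below. It is only an example, assuming that a plain HTTP HEAD request answered with a 2xx/3xx status is a good enough signal; the origin_is_healthy function and its opener parameter (there to make the check testable without network access) are not part of our tooling.

```python
import urllib.error
import urllib.request


def origin_is_healthy(url, timeout=10, opener=urllib.request.urlopen):
    """Return True when a HEAD request to `url` gets a 2xx/3xx answer."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False
```

If the check returns False for the node declared as origin, that is the point where the CNAME would be switched to another node.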