From 388fbed56ed63be36ec13ed540b6077db7834628 Mon Sep 17 00:00:00 2001
From: Fabian Arrotin
Date: Mar 11 2022 08:28:24 +0000
Subject: Some notes for mirror network

Signed-off-by: Fabian Arrotin
---

diff --git a/docs/buildsys/mirror-network.md b/docs/buildsys/mirror-network.md
index e69de29..0c62615 100644
--- a/docs/buildsys/mirror-network.md
+++ b/docs/buildsys/mirror-network.md
@@ -0,0 +1,78 @@
+The whole mirror network is a mix of machines sponsored/donated to the CentOS Project (so machines we install/control/monitor in the centos.org namespace) and external mirrors. Depending on the CentOS distribution (Linux vs Stream) and the artifact type (normal packages, ISO images, cloud images, etc.), content can land on different mirror networks.
+
+## CentOS Linux 7 / Stream 8
+### Overview
+This is the largest and oldest mirror network we have, and it has been serving the CentOS community for many years now.
+The workflow goes like this:
+
+ * CentOS release/update packages are pushed to the main mirrors (mirror-ref)
+ * Some other CentOS-owned/controlled machines pull that content (machines in the mirror.centos.org pool)
+ * External third-party mirrors, once declared in the mirrors database and with their IP added to the ACL, are authorized to get content through rsync
+ * Our "legacy" mirror crawler validates these mirrors in a loop and produces up-to-date mirrorlists that are pushed to the mirrorlists.centos.org node (see the other link in this doc for how the mirrorlists part works)
+
+### Operations
+Third-party mirrors are expected (from now on) to create a ticket on the centos-infra tracker to ask for a new mirror to be registered, or for an existing one to be modified.
+The current status can be seen on [https://mirror-status.centos.org](https://mirror-status.centos.org).
+It's worth reading the [mirrorlist doc](https://github.com/centos/mirrorlists-code) about the backend scripts used to validate mirrors, and so about how to modify existing entries in the DB.
+
+To add a new mirror, the `/var/lib/centos-mirrors/mirror-geo-check.py` helper script can be used to:
+
+ * automatically detect the continent/country (and state if in the US) and verify restricted access
+ * validate that the mirror is (currently) reachable and check its content
+ * prepare the needed SQL query/statement to run on MySQL
+
+PS: all modifications to existing mirrors need to be done "by hand" (for now, there is no helper script)
+
+## CentOS Stream 9 and above
+### Overview
+More or less the same process as for previous releases, except that the mirror pool is `mirror.stream.centos.org` and has fewer machines, due to disk space constraints. Indeed, starting from Stream 9, all artifacts, including debuginfo and source packages, are pushed out together to the same mirror network (so not to `vault.centos.org` nor `debuginfo.centos.org` - see below)
+
+### Operations
+As we don't use our previous mirror crawler for this, nothing is really needed, except for third-party mirror admins registering their mirror in Fedora MirrorManager, and someone from the infra team adding it to the CentOS category after having checked that the mirror isn't located in a restricted/embargoed country
+
+## Cloud images
+### Overview
+The CentOS Project builds some artifacts that can be directly consumed, like cloud images, in various formats: .raw, .qcow2 and Vagrant boxes.
+Due to the initial size of these images, it was decided to create a specific small `cloud.centos.org` CDN (still based on donated/sponsored machines)
+The release process is the same as above though: artifacts are pushed to a specific mirror reference, and machines in the cloud.centos.org pool pull content and expose it to public consumers.
+
+It's worth knowing though that, in parallel, some cloud images are also directly available on cloud infra: for example, our AMI images are automatically available on AWS and so don't need to be pulled from cloud.centos.org to be reimported into AWS.
+
+### Operations
+Nothing really to be done, except adding/removing nodes behind cloud.centos.org in our PowerDNS setup in case of migration or maintenance
+
+## Vault network
+### Overview
+For some time, and due to disk space usage on donated machines, it was decided to push *src.rpm* packages to a different network, called `vault.centos.org`, so as not to overload `mirror.centos.org` (nor the third-party mirror network)
+As mentioned in the CentOS Stream 9 section, this was only the case up to Stream 8: starting from Stream 9, mirror.stream.centos.org (and mirrors getting content from it) contains *all* packages, including src.rpm and debuginfo packages
+`vault.centos.org` also still contains archived and EOL'ed CentOS versions
+
+It's now a dedicated CloudFront distribution on AWS, meaning that everybody should be getting content directly from AWS, and so from caching servers in the AWS infra (edge locations).
+That CloudFront setup is configured to use some specific http servers as Origin (in a failover group).
+The release process is the same as above though: artifacts are pushed to a specific mirror reference, and machines in the Vault Origin pool pull content and expose it to CloudFront (while these servers are reachable, they refuse to serve content directly: only CloudFront is authorized, see the `mirror-vault` role to see how)
+
+### Operations
+ * tuning WAF rules on AWS CloudFront to eventually reject DDoS and abusers and limit traffic
+ * adding/removing nodes used as Origin (all monitored already)
+
+
+## Debuginfo network
+### Overview
+For some time, and due to disk space usage on donated machines, it was decided to push *debuginfo* packages to a different network, called `debuginfo.centos.org`, so as not to overload `mirror.centos.org` (nor the third-party mirror network)
+The release process is the same as above though: artifacts are pushed to a specific mirror reference, and machines in the debuginfo.centos.org pool pull content and expose it to public consumers.
+As mentioned in the CentOS Stream 9 section, this was only the case up to Stream 8: starting from Stream 9, mirror.stream.centos.org (and mirrors getting content from it) contains *all* packages, including src.rpm and debuginfo packages
+
+### Operations
+Nothing really to be done, except adding/removing nodes behind debuginfo.centos.org in our PowerDNS setup in case of migration or maintenance
+
+## Buildlogs network
+### Overview
+For the release of CentOS 7, it was decided to publicly expose the status/progress, but also to push out all built rpm packages directly, including the specific mock config file that was used and the build log files (the reason why it's named `buildlogs`), so that people could see, and so test, what was already built before a final CentOS 7 tree would be able to see the light.
+As it was there, it was decided to keep pushing all unsigned pkgs to that network, but also to use it for SIGs content, which is pushed there to be tested by the community, and so tagged as `testing`.
+
+It's worth knowing that in the `mirror-buildlogs` role, you'll see some RewriteRules in httpd that redirect to a specific CDN: CDN77 was interested in sponsoring the CentOS Project back then, so all pkgs downloads should be redirected to them, and they use (like CloudFront for AWS and vault.centos.org) a dedicated "origin" node to pull content into their own caching network
+
+The release process is the usual one: artifacts are pushed to a specific mirror reference, and machines in the buildlogs.centos.org pool pull content and expose it to public consumers and CDN77.
+
+### Operations
+Apart from monitoring the infra, the only thing to do is adding/removing records in our PowerDNS setup (GeoIP) *and* verifying that the origin node used by CDN77 is still reachable and working (if not, just update in public DNS the CNAME for the node declared as origin; see the ansible inventory and role for which record is used for that)
+
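
The Buildlogs operations note above asks to verify that the origin node used by CDN77 is still reachable before touching the CNAME. A minimal sketch of such a check could look like the following. It is only an illustration, not part of the patch or of the `mirror-buildlogs` role: the function names `origin_is_reachable` and `pick_origin` are hypothetical, and so is any URL passed to them (the real origin record lives in the ansible inventory).

```python
import urllib.error
import urllib.request


def origin_is_reachable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` answers with a 2xx/3xx status.

    `opener` is injectable so the check can be exercised without network
    access; by default it is the standard library's urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP
        # error statuses (urlopen raises HTTPError for 4xx/5xx).
        return False


def pick_origin(candidate_urls, check=origin_is_reachable):
    """Return the first candidate that passes `check`, or None.

    Hypothetical helper: the result is the node the origin CNAME could
    point at; updating DNS itself is left to the operator.
    """
    for url in candidate_urls:
        if check(url):
            return url
    return None
```

Calling `pick_origin` with the list of candidate origin URLs (taken from the inventory, not hard-coded here) returns the first node that answers, or `None` if none does. The DNS update remains a manual step.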