
CentOS mirrorlist service

Note

mirrorlist.centos.org is crucial for all deployed CentOS instances around the world: each instance queries the mirrorlist web service to get a list of validated, up-to-date mirrors from which to retrieve content. The service uses GeoIP and also checks whether the request comes from a cloud provider (such as EC2), so that it can point the client to the nearest mirror (GeoIP) or to an internal mirror (CloudFront setup for AWS/EC2).

Overview

mirrorlists schema

It contains the following kinds of scripts:

Backend (crawler)

There are two Perl scripts for checking mirrors:

  • makemirrorlists-combined.pl for creating files for mirrorlist.centos.org
  • makeisolists-combined.pl for creating files for isoredirect.centos.org.

Both scripts can create lists for all supported CentOS releases, including SIG and AltArch content. makemirrorlists-combined.pl tests each mirror separately over IPv4 and IPv6.

mirrorlist.centos.org can then present only IPv6-capable mirrors to clients that reach it over IPv6. More details about the internals of these scripts can be found in backend/mirrorlist_crawler_deployment_notes.txt
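To illustrate the idea of testing a mirror separately per address family, here is a minimal Python sketch. It is not the logic of the Perl crawler (see the deployment notes for that); it only checks TCP reachability, and the function names, the `resolver`/`probe` parameters, and the default port are assumptions made for this example.

```python
import socket


def addresses_by_family(host, port=80, resolver=socket.getaddrinfo):
    """Resolve host and group its addresses into IPv4 and IPv6 lists."""
    result = {"ipv4": [], "ipv6": []}
    for family, label in ((socket.AF_INET, "ipv4"), (socket.AF_INET6, "ipv6")):
        try:
            infos = resolver(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            # No record for this family: leave the list empty.
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the textual address.
        result[label] = sorted({info[4][0] for info in infos})
    return result


def reachable(address, port=80, timeout=5.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_mirror(host, port=80, resolver=socket.getaddrinfo, probe=reachable):
    """Report, per address family, whether any address of the mirror answers."""
    families = addresses_by_family(host, port, resolver)
    return {
        label: any(probe(addr, port) for addr in addrs)
        for label, addrs in families.items()
    }
```

A mirror for which `check_mirror` reports `"ipv6": False` would be left out of the IPv6 list while still appearing in the IPv4 one.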

Frontend

All scripts are located in the frontend folder. The following items are needed for the mirrorlist/isoredirect service:

  • An HTTP server (Apache) using mod_proxy_balancer (see the example vhost in frontend/httpd/mirrorlist.conf)
  • python-bottle to run the {ml,isoredirect}.py code for the various instances
  • MaxMind GeoLite2 database (City version)
  • python-geoip2 package (to read the GeoLite2 database)
  • python-memcached (to cache GeoIP/cloud provider results)
  • For each worker, a dedicated instance/port that can be initialized and added to the Apache proxy-balancer configuration (see frontend/systemd/centos-ml-worker@.service)

These services (mirrorlist/isoredirect) only consume the mirrorlist files, which are pushed to those nodes and updated in a loop by the crawler process (see the Backend section above).

When a request is made to the service, the Python script:

  • checks whether the request arrived over IPv4 or IPv6
  • checks whether the IP is already in memcached (for country/cloud provider)
  • checks whether the IP belongs to a cloud provider
  • computes the geolocation from the origin IP
  • searches for validated mirrors in the same country/state for the requested arch/repo/release
  • returns that list
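The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in ml.py: a dict stands in for memcached, small hard-coded tables stand in for the cloud provider ranges, the GeoLite2 City database and the crawler output, and the "fallback" list is an assumption added so that the example always has something to return. All names, networks and URLs are hypothetical.

```python
import ipaddress

# Hypothetical stand-ins for memcached, cloud provider ranges,
# the GeoLite2 City database, and the files produced by the crawler.
CACHE = {}
CLOUD_NETWORKS = {"ec2": [ipaddress.ip_network("198.51.100.0/24")]}
GEO_TABLE = {ipaddress.ip_network("203.0.113.0/24"): "fr"}
MIRRORLISTS = {
    ("ipv4", "9", "baseos", "x86_64", "fr"): ["http://mirror.example.fr/centos/"],
    ("ipv4", "9", "baseos", "x86_64", "ec2"): ["http://internal.example.com/centos/"],
    ("ipv4", "9", "baseos", "x86_64", "fallback"): ["http://mirror.example.org/centos/"],
}


def locate(ip_text):
    """Return a location key (cloud provider or country), using the cache first."""
    if ip_text in CACHE:
        return CACHE[ip_text]
    ip = ipaddress.ip_address(ip_text)
    location = None
    # Cloud provider lookup comes before geolocation.
    for provider, networks in CLOUD_NETWORKS.items():
        if any(ip in net for net in networks):
            location = provider
            break
    if location is None:
        for net, country in GEO_TABLE.items():
            if ip in net:
                location = country
                break
    if location is None:
        location = "fallback"  # assumption: unknown origins get a default list
    CACHE[ip_text] = location
    return location


def mirrors_for(ip_text, release, repo, arch):
    """Return the mirror list matching the client's address family and location."""
    family = "ipv6" if ipaddress.ip_address(ip_text).version == 6 else "ipv4"
    location = locate(ip_text)
    key = (family, release, repo, arch, location)
    fallback = (family, release, repo, arch, "fallback")
    return MIRRORLISTS.get(key) or MIRRORLISTS.get(fallback, [])
```

For example, `mirrors_for("203.0.113.7", "9", "baseos", "x86_64")` returns the hypothetical French list, and a second request from the same address is answered from `CACHE` without repeating the lookups.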