| # CentOS mirrorlist service |
| |
| !!! note |
| mirrorlist.centos.org is *crucial* for all deployed CentOS instances around the world: each deployed instance queries the mirrorlist web service to get a list of validated, up-to-date mirrors to retrieve content from. The service uses GeoIP *or* checks whether the request comes from a cloud provider (such as EC2), and then points the client to the nearest mirror (GeoIP) or to an internal one (CloudFront setup for AWS/EC2). |
| |
| |
| ## Overview |
|  |
| |
| The mirrorlist code contains the following kinds of scripts: |
| |
| * backend: scripts used by our "crawler" node, which validates all external mirrors in a loop over both IPv4 and IPv6 and produces the mirrorlists, one per repo/arch/country |
| * frontend: Python scripts used for: |
|     * http://mirrorlist.centos.org |
|     * http://isoredirect.centos.org |
| |
| ## Backend (crawler) |
| There are two Perl scripts for checking mirrors: |
| |
| * makemirrorlists-combined.pl for creating files for mirrorlist.centos.org |
| * makeisolists-combined.pl for creating files for isoredirect.centos.org |
| |
| Both scripts can create lists for all supported CentOS releases, including SIG and AltArch content. makemirrorlists-combined.pl tests each mirror separately over IPv4 and IPv6. |
| |
| mirrorlist.centos.org can then present only IPv6-capable mirrors to clients that reach it over IPv6. |
| More details about the internals of these scripts can be found in backend/mirrorlist_crawler_deployment_notes.txt. |
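
The per-family check described above can be illustrated with a minimal Python sketch. This is not the Perl crawler's code: the function names `reachable` and `build_lists`, the port, and the timeout are all hypothetical. It also only tests TCP reachability, whereas the real crawler validates mirror content as well.

```python
import socket


def reachable(host, family, port=80, timeout=5.0):
    """Return True if host accepts a TCP connection over the given address family."""
    try:
        # Restricting getaddrinfo to one family yields only A (IPv4)
        # or only AAAA (IPv6) results.
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no address of this family for the host
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next resolved address, if any
    return False


def build_lists(hosts, probe=reachable):
    """Test every host once per address family and keep separate lists."""
    lists = {4: [], 6: []}
    for host in hosts:
        if probe(host, socket.AF_INET):
            lists[4].append(host)
        if probe(host, socket.AF_INET6):
            lists[6].append(host)
    return lists
```

Keeping one list per address family is what lets a frontend hand IPv6 clients only mirrors that passed the IPv6 check. The `probe` parameter exists only so the sketch can be exercised without network access.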
| |
| ## Frontend |
| All scripts are located in the frontend folder. |
| The following items are needed for the mirrorlist/isoredirect service: |
| |
| * An HTTP server (Apache) using mod_proxy_balancer (see the frontend/httpd/mirrorlist.conf vhost example) |
| * python-bottle to run the {ml,isoredirect}.py code for the various instances |
| * MaxMind GeoLite2 database: [City version](https://dev.maxmind.com/geoip/geoip2/geolite2/) |
| * python-geoip2 package (to read the GeoLite2 database) |
| * python-memcached (to cache GeoIP/cloud provider results) |
| * For each worker, a specific instance/port can be initialized and added to the Apache proxy-balancer config (see frontend/systemd/centos-ml-worker@.service) |
| |
| These services (mirrorlist/isoredirect) only consume mirrorlist files, which are pushed to the frontend nodes and refreshed in a loop by the crawler process (see the Backend section above). |
| |
| When a request is made to the service, the Python script: |
| |
| * checks whether the request arrived over IPv4 or IPv6 |
| * checks whether the IP is already in memcached (for country/cloud provider) |
| * checks whether the IP belongs to a cloud provider |
| * computes the geolocation from the origin IP |
| * searches for validated mirrors in the same country/state for the requested arch/repo/release |
| * returns that list |
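
The request flow above can be sketched as follows. This is a simplified illustration, not the actual ml.py code. The in-memory dictionaries stand in for memcached, the GeoLite2 database and the crawler-generated mirrorlist files. Every name, network range and URL here is a hypothetical placeholder.

```python
import ipaddress

# Hypothetical stand-ins for memcached, cloud provider ranges,
# the GeoLite2 database and the mirrorlist files.
CACHE = {}
CLOUD_NETWORKS = {"ec2": [ipaddress.ip_network("203.0.113.0/24")]}
GEO_DB = {
    ipaddress.ip_network("198.51.100.0/24"): "fr",
    ipaddress.ip_network("2001:db8::/32"): "de",
}
# Key: (release, repo, arch, location, IP version)
MIRRORLISTS = {
    ("7", "os", "x86_64", "fr", 4): ["http://mirror.example.fr/centos/7/os/x86_64/"],
    ("7", "os", "x86_64", "de", 6): ["http://mirror.example.de/centos/7/os/x86_64/"],
    ("7", "os", "x86_64", "ec2", 4): ["http://internal.example.cloud/centos/7/os/x86_64/"],
}


def locate(ip):
    """Return a cloud provider name or country code for ip, caching the result."""
    key = str(ip)
    if key in CACHE:  # cache lookup comes first
        return CACHE[key]
    location = "unknown"
    # Cloud provider ranges take precedence over plain geolocation.
    for provider, networks in CLOUD_NETWORKS.items():
        if any(ip in net for net in networks):
            location = provider
            break
    else:
        for net, country in GEO_DB.items():
            if ip in net:
                location = country
                break
    CACHE[key] = location
    return location


def mirrorlist(client_ip, release, repo, arch):
    """Return the validated mirrors matching the client's location and IP version."""
    ip = ipaddress.ip_address(client_ip)  # ip.version is 4 or 6
    location = locate(ip)
    return MIRRORLISTS.get((release, repo, arch, location, ip.version), [])
```

For example, `mirrorlist("198.51.100.7", "7", "os", "x86_64")` returns the placeholder French mirror, and a second call from the same address is answered from `CACHE` without repeating the lookups. What the real service returns when no mirror matches is not covered here. The sketch simply returns an empty list.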