# CentOS mirrorlist service

!!! note
    mirrorlist.centos.org is *crucial* for every deployed CentOS instance around the world: each instance queries the mirrorlist web service to get a list of validated, up-to-date mirrors to retrieve its content from. The service uses GeoIP *or* checks whether the request comes from a cloud provider (like EC2), and then redirects the client to the nearest mirror (GeoIP) or to an internal one (CloudFront setup for AWS/EC2).

## Overview
![mirrorlists schema](../img/mirrorlists.png)

The service relies on the following kinds of scripts:

 * backend: scripts used by our "crawler" node, which validates all the external mirrors in a loop over IPv4 and IPv6 and produces the 'mirrorlists', one per repo/arch/country
 * frontend: Python scripts used for:
    * http://mirrorlist.centos.org
    * http://isoredirect.centos.org

## Backend (crawler)
There are two Perl scripts for checking mirrors:

 * makemirrorlists-combined.pl, which creates the files for mirrorlist.centos.org
 * makeisolists-combined.pl, which creates the files for isoredirect.centos.org

Both scripts can create lists for all supported CentOS releases, including SIG and AltArch content. makemirrorlists-combined.pl tests each mirror separately over IPv4 and IPv6.

mirrorlist.centos.org is then able to present only IPv6-capable mirrors to clients that access it over IPv6.
More details about the internals of these scripts can be found in backend/mirrorlist_crawler_deployment_notes.txt
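
As an illustration of the per-family lists described above, here is a minimal Python sketch (the real crawler is written in Perl, and this is not its code). It assumes each mirror has already been validated and only shows how results could be grouped per repo/arch/country and IP family. The `MirrorCheck` record, the `build_mirrorlists` function, the key layout and the example URLs are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class MirrorCheck(NamedTuple):
    """Outcome of validating one mirror (hypothetical record layout)."""
    url: str
    country: str
    ipv4_ok: bool
    ipv6_ok: bool


def build_mirrorlists(repo, arch, checks):
    """Group validated mirrors by (repo, arch, country, IP family)."""
    lists = defaultdict(list)
    for check in checks:
        # A mirror is listed once per family it passed validation on.
        if check.ipv4_ok:
            lists[(repo, arch, check.country, "ipv4")].append(check.url)
        if check.ipv6_ok:
            lists[(repo, arch, check.country, "ipv6")].append(check.url)
    return dict(lists)


# Made-up validation results for three mirrors.
checks = [
    MirrorCheck("http://mirror-a.example/centos/", "be", True, True),
    MirrorCheck("http://mirror-b.example/centos/", "be", True, False),
    MirrorCheck("http://mirror-c.example/centos/", "fr", False, False),
]
mirrorlists = build_mirrorlists("os", "x86_64", checks)
```

With this layout, a mirror that fails on both families (like `mirror-c` here) simply never appears in any list, and the IPv6 list for a country only contains mirrors that answered over IPv6.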

## Frontend
All scripts are located in the frontend folder.
The following items are needed for the mirrorlist/isoredirect service:

 * An HTTP server (Apache) using mod_proxy_balancer (see the frontend/httpd/mirrorlist.conf vhost example)
 * python-bottle to run the {ml,isoredirect}.py code for the various instances
 * MaxMind GeoLite2 database: [City version](https://dev.maxmind.com/geoip/geoip2/geolite2/)
 * python-geoip2 package (to read the GeoLite2 database)
 * python-memcached (to cache results for GeoIP/cloud providers)
 * For each worker, a specific instance/port can be initialized and added to the Apache config for the proxy balancer (see frontend/systemd/centos-ml-worker@.service)

These services (mirrorlist/isoredirect) only consume the mirrorlist files, which are pushed to those nodes and updated in a loop by the crawler process (see the Backend section above).

When a request is made to the service, the Python script:

 * checks whether the request arrived over IPv4 or IPv6
 * checks whether the IP is already in memcached (for country/cloud provider)
 * checks whether the IP belongs to a cloud provider
 * computes the geolocation from the origin IP
 * searches for validated mirrors in the same country/state for the requested arch/repo/release
 * returns that list
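
The request flow above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual `ml.py` code: a plain dict stands in for memcached, `lookup_country` is a placeholder for the GeoLite2 query, and the cloud range, the mirrorlist keys and the URLs are made-up examples. The release is left out to keep the sketch short.

```python
import ipaddress

# Hypothetical stand-ins for memcached, the cloud provider ranges
# and the mirrorlist files pushed by the crawler.
CACHE = {}
CLOUD_RANGES = {"ec2": [ipaddress.ip_network("203.0.113.0/24")]}
MIRRORLISTS = {
    ("os", "x86_64", "be", "ipv4"): ["http://mirror-a.example/centos/"],
    ("os", "x86_64", "be", "ipv6"): ["http://mirror-v6.example/centos/"],
    ("os", "x86_64", "ec2", "ipv4"): ["http://internal-mirror.example/centos/"],
}


def lookup_country(ip):
    """Placeholder for the GeoLite2 lookup; always answers 'be' here."""
    return "be"


def find_cloud(ip):
    """Return the provider name if ip falls in a known range, else None."""
    for provider, networks in CLOUD_RANGES.items():
        if any(ip in net for net in networks):
            return provider
    return None


def handle_request(client_ip, repo, arch):
    ip = ipaddress.ip_address(client_ip)
    # Which IP family did the client use?
    family = "ipv6" if ip.version == 6 else "ipv4"
    # Reuse a cached country/cloud provider answer when there is one.
    location = CACHE.get(client_ip)
    if location is None:
        # Cloud provider match first, geolocation otherwise.
        location = find_cloud(ip) or lookup_country(ip)
        CACHE[client_ip] = location
    # Return the validated mirrors for that location, or an empty list.
    return MIRRORLISTS.get((repo, arch, location, family), [])
```

For example, `handle_request("203.0.113.9", "os", "x86_64")` matches the made-up EC2 range and returns the internal mirror, while any other IPv4 address falls through to the geolocation placeholder and gets the `be` list.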