# CentOS mirrorlist service

!!! note
    The mirrorlist.centos.org service is *crucial* for all deployed CentOS instances around the world, as each deployed CentOS instance will query the mirrorlist webservice to get a list of validated and up-to-date mirrors to retrieve its content from. It uses GeoIP *or* checks whether the request comes from a cloud provider (like EC2), in which case it redirects to the nearest (GeoIP) or internal (CloudFront setup for AWS/EC2) mirror.


## Overview 
![mirrorlists schema](../img/mirrorlists.png)

It contains the following kinds of scripts:

 * backend : scripts used by our "crawler" node, which validates all the external mirrors in a loop over IPv4 and IPv6 and produces the 'mirrorlists', one per repo/arch/country
 * frontend : python scripts used for :
    * http://mirrorlist.centos.org
    * http://isoredirect.centos.org

## Backend (crawler)
There are two Perl scripts for checking mirrors: 

 * makemirrorlists-combined.pl for creating files for mirrorlist.centos.org
 * makeisolists-combined.pl for creating files for isoredirect.centos.org. 

Both scripts can create lists for all supported CentOS releases, including SIG and AltArch content. makemirrorlists-combined.pl will test each mirror separately over IPv4 and IPv6.

mirrorlist.centos.org will then be able to present only IPv6-capable mirrors to clients when the service is accessed over IPv6.
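
The real checks are implemented in the Perl scripts above, but the idea can be illustrated with a small, self-contained Python sketch. Everything in it (the mirror URL, the file being probed, the use of an HTTP HEAD request, the timeout) is a hypothetical example and not taken from makemirrorlists-combined.pl; it only shows how a single mirror could be validated separately over IPv4 and IPv6 before being written into a mirrorlist.

```python
#!/usr/bin/env python3
# Hypothetical sketch only: the production crawler is Perl, and the file it
# actually probes on each mirror may differ from the one used here.
import http.client
import socket
from urllib.parse import urlparse


def check_mirror(url, family, timeout=10):
    """Return True if `url` answers an HTTP HEAD over the given address family."""
    parts = urlparse(url)
    host, port = parts.hostname, parts.port or 80
    try:
        # Resolve only addresses of the requested family (AF_INET or AF_INET6)
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no A / AAAA record for this family
    for *_, sockaddr in infos:
        conn = http.client.HTTPConnection(sockaddr[0], port, timeout=timeout)
        try:
            # Connect to the literal address but keep the Host: header so
            # name-based vhosts on the mirror still answer correctly
            conn.request("HEAD", parts.path or "/", headers={"Host": host})
            if conn.getresponse().status < 400:
                return True
        except OSError:
            continue
        finally:
            conn.close()
    return False


if __name__ == "__main__":
    # Example path layout (release/repo/arch) as served by typical mirrors
    url = "http://mirror.example.org/centos/7/os/x86_64/repodata/repomd.xml"
    print("IPv4:", check_mirror(url, socket.AF_INET))
    print("IPv6:", check_mirror(url, socket.AF_INET6))
```

A mirror that passes only the IPv4 probe would then only appear in the IPv4 mirrorlists, which is what lets the frontend serve IPv6-only lists to IPv6 clients.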
More details about the internals of these scripts can be found in backend/mirrorlist_crawler_deployment_notes.txt

## Frontend 
All scripts are located in the frontend folder.
The following items are needed for the mirrorlist/isoredirect service:

 * An http server (apache) using mod_proxy_balancer (see frontend/httpd/mirrorlist.conf vhost example)
 * python-bottle to run the {ml,isoredirect}.py code for various instances
 * MaxMind GeoLite2 database : [City version](https://dev.maxmind.com/geoip/geoip2/geolite2/)
 * python-geoip2 pkg (to consume those GeoLite2 DBs)
 * python-memcached (to cache results for GeoIP/Cloud providers)
 * For each worker, a specific instance/port can be initialized and added to the Apache config for the proxy-balancer (see frontend/systemd/centos-ml-worker@.service); a minimal worker sketch follows this list
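
The sketch below shows, under stated assumptions, what such a worker instance could look like: a bottle application bound to the port of its systemd template instance, so that Apache's mod_proxy_balancer can spread requests over several local workers. The argv convention, default port and route are all made up; the real code is in {ml,isoredirect}.py.

```python
#!/usr/bin/env python3
# Hypothetical worker skeleton; the real logic lives in ml.py / isoredirect.py.
import sys

from bottle import Bottle

app = Bottle()


@app.route('/')
def placeholder():
    # Real request handling (GeoIP, memcached, mirror selection) is sketched
    # further below in the request-flow example.
    return 'worker alive\n'


if __name__ == '__main__':
    # Assumed convention: the systemd template passes the port as the
    # instance name, e.g. ExecStart=... worker.py %i
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8001
    app.run(host='127.0.0.1', port=port)
```

Each worker port would then typically be listed as a BalancerMember in the mod_proxy_balancer configuration.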

Those services (mirrorlist/isoredirect) just consume the mirrorlist files that are pushed to those nodes and updated in a loop by the crawler process (see the Backend section above).

When a request is made to the service, the python script (a simplified sketch follows this list):
 
 * checks for IPv4 or IPv6 connectivity
 * checks if the IP is already in memcached (for country/cloud provider)
 * checks whether the IP belongs to a cloud provider
 * computes the geolocation based on the origin IP
 * searches for validated mirrors in the same country/state for the requested arch/repo/release
 * returns that list
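
A heavily simplified Python/bottle sketch of that flow is shown below. The query parameters (release, repo, arch) match what clients normally send to mirrorlist.centos.org, but the GeoLite2 path, cache keys, the MIRROR_LISTS stand-in and the omitted cloud-provider lookup are all illustrative assumptions, not the real ml.py code.

```python
# Hypothetical sketch: names, paths and data structures are illustrative only.
import ipaddress

import geoip2.database
import geoip2.errors
import memcache
from bottle import abort, request, route, run

GEOIP_DB = geoip2.database.Reader('/usr/share/GeoIP/GeoLite2-City.mmdb')  # assumed path
CACHE = memcache.Client(['127.0.0.1:11211'])

# In the real service the crawler pushes one mirrorlist file per
# repo/arch/country (and address family); a dict stands in for them here.
MIRROR_LISTS = {
    # ('7', 'os', 'x86_64', 'ipv4', 'US'): ['http://mirror.example.org/centos/7/os/x86_64/'],
}


@route('/')
def mirrorlist():
    release = request.query.get('release')
    repo = request.query.get('repo')
    arch = request.query.get('arch')
    if not (release and repo and arch):
        abort(400, 'release, repo and arch are required')

    client_ip = request.remote_addr  # bottle already honours X-Forwarded-For

    # 1. IPv4 vs IPv6: clients coming in over IPv6 must only be offered
    #    mirrors that the crawler validated over IPv6
    family = 'ipv6' if ipaddress.ip_address(client_ip).version == 6 else 'ipv4'

    # 2. memcached: reuse a previous country/cloud-provider decision for this IP
    country = CACHE.get('country:%s' % client_ip)
    if country is None:
        # 3. a real worker would first check whether the IP belongs to a known
        #    cloud provider (e.g. EC2 ranges) and point it at an internal mirror;
        # 4. otherwise it falls back to GeoIP on the origin IP
        try:
            country = GEOIP_DB.city(client_ip).country.iso_code or 'default'
        except geoip2.errors.AddressNotFoundError:
            country = 'default'
        CACHE.set('country:%s' % client_ip, country, time=3600)

    # 5. pick the pre-validated list for this release/repo/arch and country,
    #    falling back to a default list when no local mirrors exist
    mirrors = (MIRROR_LISTS.get((release, repo, arch, family, country))
               or MIRROR_LISTS.get((release, repo, arch, family, 'default'), []))

    # 6. return the list, one base URL per line, as yum/dnf expect
    return '\n'.join(mirrors) + '\n'


if __name__ == '__main__':
    run(host='127.0.0.1', port=8001)  # one such worker per balancer member
```

A typical client query then looks like `http://mirrorlist.centos.org/?release=7&repo=os&arch=x86_64`, and the answer is a plain list of mirror base URLs, one per line.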