Blue Dynamics: Setting up a PyPI mirror (with z3c.pypimirror)

PyPI, the official Python Package Index, is sometimes at its limits and times out. This can happen, and I'm sure people do their best to keep it up and running. But from a company perspective it's good to always have the files from PyPI available.

So we need to mirror, and here we need a full mirror: one that also includes packages not hosted on PyPI itself but on third-party servers. In the past those links to externally hosted packages caused major problems. And yes, Murphy's law: the foreign server is down exactly when it's urgently needed.

Existing Software

So what do we have?

pep381client (see PEP 381 'Mirroring infrastructure for PyPI'),
z3c.pypimirror (see also its project page),
collective.eggproxy, a caching proxy for eggs from egg servers,
yopypi, a self-balancing instance that redirects your PyPI requests to a default (or predefined) PyPI mirror when PyPI is down.

Maybe there's more, but these are my most important findings.

pep381client sounds good, sounds official. But it really creates a more or less 1:1 mirror of PyPI. Good? Yes, that is what I expect from a mirror. But not if you want externally hosted packages mirrored as well, and that's exactly what we need for our use case.

z3c.pypimirror mirrors PyPI plus externally hosted packages and also follows externally hosted index pages. It supports incremental updates. Good? It's more than a mirror, because it aggregates packages, and yes, that's exactly our use case.

collective.eggproxy is a caching proxy, so it caches only requested eggs. It's nice for speeding up local development, but not sufficient for production servers.

yopypi is a nice helper if you usually want to query the official PyPI but fall back to a mirror when PyPI has problems.

Setting up z3c.pypimirror

So we decided to set up z3c.pypimirror. After hitting some problems with externally hosted packages I contacted the authors and got write access on the Launchpad project to fix these problems. I released version 1.0.16 and everything described below works with this version.

First: I'm addicted to buildout, so here is the buildout that sets up my mirror. The buildout.cfg:

[buildout]
parts = mirror mirror-cfg

[mirror]
recipe = zc.recipe.egg:scripts
eggs = z3c.pypimirror

[dirs]
recipe = z3c.recipe.mkdir
mirror-base = PATH/TO/mirror
mirror-files = ${:mirror-base}/files
paths = 
    ${:mirror-files}

[mirror-cfg]
recipe = collective.recipe.template
input = ${buildout:directory}/pypimirror.cfg.in
output = ${buildout:directory}/pypimirror.cfg
url = http://pypi.MYDOMAIN.TLD
mirror-path = ${dirs:mirror-files}
lockfile = ${buildout:directory}/mirror.lock
logfile = ${dirs:mirror-base}/mirror.log

And the configuration template pypimirror.cfg.in:

[DEFAULT]
# the root folder of all mirrored packages.
# if necessary it will be created for you
mirror_file_path = ${:mirror-path}

# where's your mirror on the net?
base_url = ${:url}

# lock file to avoid duplicate runs of the mirror script
lock_file_name = ${:lockfile}

# days to fetch in past on update
fetch_since_days = 1

# Pattern for package files, only those matching will be mirrored
filename_matches =
    *.zip
    *.tgz
    *.egg
    *.tar.gz
    *.tar.bz2

# Pattern for package names; only packages having matching names will
# be mirrored
package_matches = 
    *

# remove packages not on pypi (or externals) anymore
cleanup = True

# create index.html files
create_indexes = True

# be more verbose
verbose = True

# resolve download_url links on pypi which point to files and download
# the files from there (if they match filename_matches).
# The filename and filesize (from the download header) are used
# to find out if the file is already on the mirror. Not all servers
# support the content-length header, so be prepared to download
# a lot of data on each mirror update.
# This is highly experimental and shouldn't be used right now.
# 
# NOTE: This option should only be set to True if package_matches is not 
# set to '*' - otherwise you will mirror a huge amount of data. BE CAREFUL
# using this option!!!
external_links = True

# similar to 'external_links' but also follows an index page if no
# download links are available on the referenced download_url page
# of a given package.
#
# NOTE: This option should only be set to True if package_matches is not 
# set to '*' - otherwise you will mirror a huge amount of data. BE CAREFUL
# using this option!!!
follow_external_index_pages = False

# logfile 
log_filename = ${:logfile}
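To get a feeling for which files the filename_matches patterns let through, here is a minimal sketch in Python 3 using the standard fnmatch module. That z3c.pypimirror matches filenames with exactly these shell-style glob semantics is my assumption, and the example filenames (apart from the z3c.pypimirror release itself) are made up for illustration.

```python
from fnmatch import fnmatch

# Patterns copied from filename_matches in pypimirror.cfg.in above.
FILENAME_MATCHES = ["*.zip", "*.tgz", "*.egg", "*.tar.gz", "*.tar.bz2"]


def would_mirror(filename, patterns=FILENAME_MATCHES):
    """Return True if the filename matches at least one glob pattern.

    Assumption: the mirror script uses shell-style globbing like fnmatch.
    """
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Hypothetical filenames, only to show the effect of the patterns.
for name in [
    "z3c.pypimirror-1.0.16.tar.gz",
    "example-1.0-py2.6.egg",
    "example-1.0.win32.exe",
]:
    print(name, would_mirror(name))
```

If you need other file types on your mirror, add a pattern for them to filename_matches before the initial run.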

Add a bootstrap.py to the directory and run

python2.6 bootstrap.py
./bin/buildout

Now take some time, bandwidth and ~16GB of hard disk space and run the initial mirror. If you're on a remote server over ssh, run it in the background or, like I do, in a screen session. If the process stops for some reason, just re-run it; it won't download packages twice.

./bin/pypimirror -I -c -v pypimirror.cfg

Finally, add a cron job to fetch the updates, e.g. every two hours, like so:

0 0-23/2 * * * /PATH/TO/pypimirror/bin/pypimirror -U /PATH/TO/pypimirror/pypimirror.cfg
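If you are unsure which hours a cron field like 0-23/2 selects, the following Python 3 sketch expands it. The helper expand_hours is hypothetical and only understands a single value, a range, or *, each with an optional /step; it is not a full crontab parser.

```python
def expand_hours(field):
    """Expand a cron hour field such as '0-23/2' into the matching hours.

    Simplified: handles 'N', 'A-B' and '*', optionally followed by '/STEP'.
    Comma-separated lists are not supported.
    """
    spec, _, step = field.partition("/")
    step = int(step) if step else 1
    if spec == "*":
        first, last = 0, 23
    elif "-" in spec:
        first, last = (int(part) for part in spec.split("-"))
    else:
        first = last = int(spec)
    return list(range(first, last + 1, step))


print(expand_hours("0-23/2"))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
```

So the update runs in every even hour, twelve times a day.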

Now a webserver is needed. I took nginx and added a site to /etc/nginx/sites-enabled:

server {
        listen IPADDRESS;            
        server_name pypi.MYDOMAIN.TLD;
        location / {
            root /PATH/TO/mirror/files;
        }
}

Reload nginx and done! The mirror is ready.
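A quick smoke test can confirm that nginx really serves the mirror. The sketch below (Python 3, standard library only) asks for a package page the same way setuptools and buildout do, by appending the package name and a slash to the index URL. The helper names package_url and mirror_serves are made up for this example, and it assumes the package directory on the mirror answers with HTTP 200.

```python
import urllib.request


def package_url(index_url, package):
    """Build the URL an installer requests for a package on an index."""
    return index_url.rstrip("/") + "/" + package + "/"


def mirror_serves(index_url, package, timeout=10):
    """Return True if the mirror answers with HTTP 200 for the package page.

    Network errors, timeouts and HTTP error codes all count as "not served".
    """
    url = package_url(index_url, package)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


# Usage, with the placeholder domain replaced by your own:
# print(mirror_serves("http://pypi.MYDOMAIN.TLD", "z3c.pypimirror"))
```

If this returns False right after the initial run, check the root path in the nginx site and whether create_indexes was enabled.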

Using the mirror

I use buildout almost everywhere. To use the mirror, simply add one line to your buildout's main section:

[buildout]
....
index = http://pypi.MYDOMAIN.TLD
...

For more information consult the official docs.

Image Green Tree Python by nasmac (Ian C) under a CC-License edited by Jens Klein
