Channel: Planet Plone - Where Developers And Integrators Write

Blue Dynamics: Setting up a PyPI mirror (with z3c.pypimirror)


PyPI, the official Python Package Index, sometimes reaches its limits and times out. This can happen, and I'm sure people do their best to keep it up and running. But from a company perspective it's good to always have the files from PyPI available.

So we need a mirror, and a full one: it has to include packages that are not hosted on PyPI itself but on third-party servers. In the past, those links to externally hosted packages caused major problems. And yes, Murphy's law: the foreign server is down exactly when you urgently need it.

Existing Software

So what do we have?

  1. pep381client (see PEP 381 'Mirroring infrastructure for PyPI'),
  2. z3c.pypimirror (see also its project page),
  3. collective.eggproxy, a caching proxy for eggs from egg servers,
  4. yopypi, a self-balancing instance that redirects your PyPI requests to a default (or predefined) PyPI mirror when PyPI is down.

There may be more, but those are my most important findings.

pep381client sounds good; it even sounds official. It creates a more or less 1:1 mirror of PyPI. Good? Yes, that is what I expect from a mirror. But it is not enough if you want externally hosted packages mirrored as well, and that's exactly what our use case needs.

z3c.pypimirror mirrors PyPI plus externally hosted packages, and it can also follow externally hosted index pages. It supports incremental updates. Good? It's more than a mirror, because it aggregates packages, and yes, that's exactly our use case.

collective.eggproxy is a caching proxy, so it caches only the eggs that have been requested. It's nice for speeding up local development, but not sufficient for production servers.
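The difference between a caching proxy and a full mirror fits in a few lines. The following is a hypothetical sketch of cache-on-request behaviour, not collective.eggproxy's code. `fetch_cached` and the `download` callable are illustrative names. A file reaches the cache only after someone has asked for it, so anything never requested is still missing when the upstream server is down.

```python
import os
from typing import Callable


def fetch_cached(cache_dir: str, filename: str, download: Callable[[str], bytes]) -> bytes:
    """Return a package file, downloading it only on the first request.

    Hypothetical illustration of a caching proxy: `download` stands in for
    an HTTP request to the upstream egg server.
    """
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        data = download(filename)  # only requested files ever reach the cache
        with open(path, "wb") as fh:
            fh.write(data)
    with open(path, "rb") as fh:
        return fh.read()
```

A full mirror turns this around. It downloads everything up front, so a request never depends on the upstream server being reachable at that moment.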

yopypi is a nice helper if you usually want to query the official PyPI but fall back to a mirror when PyPI has problems.
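The fallback idea can be sketched with the standard library. This is not yopypi's code. `fetch_with_fallback` and `http_fetch` are hypothetical helpers, and the URLs you pass in are your own.

```python
import urllib.request
from typing import Callable, Sequence, Tuple


def http_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL; urllib raises URLError (a subclass of OSError) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    urls: Sequence[str], fetch: Callable[[str], bytes] = http_fetch
) -> Tuple[str, bytes]:
    """Try each index URL in order and return (url, body) for the first that answers."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # URLError/HTTPError, timeouts, refused connections
            errors.append(f"{url}: {exc}")
    raise RuntimeError("no index reachable: " + "; ".join(errors))
```

Call it as `fetch_with_fallback([official_url, mirror_url])`. The mirror's answer is used only when the official index fails.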

Setting up z3c.pypimirror

So we decided to set up z3c.pypimirror. After hitting some problems with externally hosted packages, I contacted the authors and got write access to the Launchpad project to fix them. I released version 1.0.16, and everything described below works with this version.

First: I'm addicted to buildout, so here it is, the buildout.cfg I use to set up my mirror:

[buildout]
parts = mirror mirror-cfg

[mirror]
recipe = zc.recipe.egg:scripts
eggs = z3c.pypimirror

[dirs]
recipe = z3c.recipe.mkdir
mirror-base = PATH/TO/mirror
mirror-files = ${:mirror-base}/files
paths = 
    ${:mirror-files}

[mirror-cfg]
recipe = collective.recipe.template
input = ${buildout:directory}/pypimirror.cfg.in
output = ${buildout:directory}/pypimirror.cfg
url = http://pypi.MYDOMAIN.TLD
mirror-path = ${dirs:mirror-files}
lockfile = ${buildout:directory}/mirror.lock
logfile = ${dirs:mirror-base}/mirror.log
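In the [mirror-cfg] part, collective.recipe.template fills placeholders such as ${:url} in pypimirror.cfg.in with that part's own options. A reference like ${dirs:mirror-files} pulls a value in from another section. The following imitates that lookup with a regular expression so you can predict what ends up in pypimirror.cfg. It is a simplified, hypothetical stand-in (no escaping, no cycle detection), not buildout's implementation.

```python
import re
from typing import Dict

# ${section:option}; an empty section name (${:option}) means "the current part".
REFERENCE = re.compile(r"\$\{([^:}]*):([^}]+)\}")


def render(text: str, sections: Dict[str, Dict[str, str]], current: str) -> str:
    """Replace ${section:option} references, resolving nested ones recursively.

    Simplified, hypothetical stand-in for buildout substitution: no $$ escaping,
    no cycle detection, unknown names raise KeyError.
    """
    def lookup(match: "re.Match[str]") -> str:
        section = match.group(1) or current
        value = sections[section][match.group(2)]
        # A referenced value may itself contain ${:...}, relative to its own section.
        return render(value, sections, section)

    return REFERENCE.sub(lookup, text)
```

With the buildout.cfg above, ${:mirror-path} resolves through ${dirs:mirror-files} to PATH/TO/mirror/files. That is the directory nginx later serves.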

And the configuration template pypimirror.cfg.in:

[DEFAULT]
# the root folder of all mirrored packages.
# if necessary it will be created for you
mirror_file_path = ${:mirror-path}

# where's your mirror on the net?
base_url = ${:url}

# lock file to avoid duplicate runs of the mirror script
lock_file_name = ${:lockfile}

# days to fetch in past on update
fetch_since_days = 1

# Pattern for package files, only those matching will be mirrored
filename_matches =
    *.zip
    *.tgz
    *.egg
    *.tar.gz
    *.tar.bz2

# Pattern for package names; only packages having matching names will
# be mirrored
package_matches = 
    *

# remove packages not on pypi (or externals) anymore
cleanup = True

# create index.html files
create_indexes = True

# be more verbose
verbose = True

# resolve download_url links on pypi which point to files and download
# the files from there (if they match filename_matches).
# The filename and filesize (from the download header) are used
# to find out if the file is already on the mirror. Not all servers
# support the content-length header, so be prepared to download
# a lot of data on each mirror update.
# This is highly experimental and shouldn't be used right now.
# 
# NOTE: This option should only be set to True if package_matches is not 
# set to '*' - otherwise you will mirror a huge amount of data. BE CAREFUL
# using this option!!!
external_links = True

# similar to 'external_links' but also follows an index page if no
# download links are available on the referenced download_url page
# of a given package.
#
# NOTE: This option should only be set to True if package_matches is not 
# set to '*' - otherwise you will mirror a huge amount of data. BE CAREFUL
# using this option!!!
follow_external_index_pages = False

# logfile 
log_filename = ${:logfile}
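filename_matches and package_matches are lists of wildcard patterns, one per line. The following sketch assumes they behave like shell-style globs (Python's fnmatch). The sample values suggest this, but it has not been verified against z3c.pypimirror's source. `load_patterns` and `should_mirror` are hypothetical helpers.

```python
import configparser
from fnmatch import fnmatchcase
from typing import List


def load_patterns(config_text: str, option: str = "filename_matches") -> List[str]:
    """Read a multi-line option from the [DEFAULT] section of a rendered pypimirror.cfg."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser["DEFAULT"][option]
    return [line.strip() for line in raw.splitlines() if line.strip()]


def should_mirror(filename: str, patterns: List[str]) -> bool:
    """True if filename matches any shell-style wildcard pattern (assumed semantics)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)
```

Under this assumption, a file such as foo-1.0.win32.exe is skipped, because no pattern in the list matches it.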

Add a bootstrap.py to the directory and run

python2.6 bootstrap.py
./bin/buildout

Now take some time, bandwidth and ~16 GB of hard disk space and run the initial mirror. If you're on a remote server over ssh, run it in the background or, like I do, in a screen session. If the process stops for some reason, just re-run it; it won't download packages twice.

./bin/pypimirror -I -c -v pypimirror.cfg
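Re-running is cheap because files already on disk are skipped. According to the external_links comment in the config, the filename and the size from the download header decide whether a file is already on the mirror. The following expresses that rule. `already_mirrored` is a hypothetical name. Treating a missing Content-Length as "download again" follows the comment's warning, not the package's actual code.

```python
import os
from typing import Optional


def already_mirrored(path: str, content_length: Optional[int]) -> bool:
    """Hypothetical skip check: keep a local file whose size matches the header.

    Without a Content-Length header the sizes cannot be compared, so the
    file is treated as missing and fetched again.
    """
    if not os.path.isfile(path):
        return False
    if content_length is None:
        return False
    return os.path.getsize(path) == content_length
```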

Finally, add a cron job to fetch the updates, e.g. every two hours:

0 0-23/2 * * * /PATH/TO/pypimirror/bin/pypimirror -U /PATH/TO/pypimirror/pypimirror.cfg
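If an update takes longer than two hours, cron starts the next run while the previous one is still busy. The lock_file_name option exists to prevent exactly that. The following shows the usual "create the lock file exclusively" pattern. It is an illustration, not necessarily how z3c.pypimirror implements its lock, and `exclusive_run` is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def exclusive_run(lock_path: str):
    """Hypothetical lock: refuse to start while the lock file exists."""
    try:
        # O_EXCL makes creation fail if another run already created the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another run holds {lock_path}") from None
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record who holds the lock
        yield
    finally:
        os.remove(lock_path)  # released on normal exit and on exceptions
```

With this sketch, a run killed hard (e.g. kill -9) leaves the lock file behind. It then has to be removed by hand before the next run can start.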

Now a webserver is needed. I took nginx and added a site to /etc/nginx/sites-enabled:

server {
        listen IPADDRESS;            
        server_name pypi.MYDOMAIN.TLD;
        location / {
            root /PATH/TO/mirror/files;
        }
}

Reload nginx and you're done! The mirror is ready.
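A plain static location is enough because create_indexes = True writes index.html files. A request for http://pypi.MYDOMAIN.TLD/<package>/ is then answered with that directory's index.html, which is nginx's default index file. The following illustrates the idea. The one-directory-per-package layout and the HTML shape are assumptions, not z3c.pypimirror's exact output, and `write_package_index` is a hypothetical helper.

```python
import html
import os
from urllib.parse import quote


def write_package_index(package_dir: str) -> str:
    """Hypothetical: write index.html linking every file in package_dir."""
    names = sorted(
        name for name in os.listdir(package_dir)
        if name != "index.html" and os.path.isfile(os.path.join(package_dir, name))
    )
    # Relative links, so the page works under whatever base URL nginx serves.
    links = "\n".join(
        f'<a href="{quote(name)}">{html.escape(name)}</a><br/>' for name in names
    )
    page = f"<html><body>\n{links}\n</body></html>\n"
    path = os.path.join(package_dir, "index.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(page)
    return path
```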

Using the mirror

I use buildout almost everywhere. To use the mirror, simply add one line to the main section of your buildout:

[buildout]
....
index = http://pypi.MYDOMAIN.TLD
...
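With index set, buildout (through setuptools) requests a per-project page, roughly http://pypi.MYDOMAIN.TLD/<project>/, and scans its links for distribution files. The following imitates that lookup with the standard library to show what the mirror's index pages need to contain. `project_page_url` and `package_links` are hypothetical helpers. Real setuptools also normalizes project names and handles more kinds of links.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


def project_page_url(index_url: str, project: str) -> str:
    """Build the per-project page URL below an index (no name normalization)."""
    return index_url.rstrip("/") + "/" + project + "/"


class LinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links on an index page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without href, e.g. <a name="top">
                self.links.append(urljoin(self.base_url, href))


def package_links(page_url: str, page: str) -> List[str]:
    parser = LinkCollector(page_url)
    parser.feed(page)
    parser.close()
    return parser.links
```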

For more information, consult the official docs.

Image "Green Tree Python" by nasmac (Ian C), under a CC license, edited by Jens Klein.

