Setting up a mirror of the Rufus.W3.Org RPM database

This page explains how to set up a Web database for RPM packages similar to the one running on rpmfind.net . You should first get acquainted with the mirroring principle described briefly in the mirroring proposal. However, the setup should be fairly simple:

Prerequisites

  1. You must of course have a Web server running. I suggest Apache, the obvious choice for a Linux machine; it's probably installed by default anyway.
  2. You should run a mirror of the RDF database available at ftp://rpmfind.net/linux/RDF . To help bootstrap the mirroring process, it may prove more efficient to first fetch a compressed archive of the whole RDF tree and expand it. Note that you don't need to mirror the full tree: you can choose to prune some of the subtrees (but do not break the overall structure !). I suggest using rsync to do the mirroring. An alternative is the mirror-2.8 perl script, but it's somewhat more difficult to set up.
  3. You should get a recent copy of rpm2html and install it; you can grab an rpm, for example :-) (the version must be >= 0.90, and it's generally a good idea to follow the releases closely).
  4. Of course, you need disk space: currently the RDF tree requires 1.3 GBytes, while the full HTML tree built from it consumes nearly 4 GBytes.
  5. Subscribe to the rpm2html mailing-list: send a mail to majordomo@rpmfind.net with the line
    subscribe rpm2html
    in the body of the message. The list archives are on-line.
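
The disk space prerequisite can be checked before starting. The following is only a sketch: the function name has_free_gb is made up for illustration, and the directory and the 6 GBytes threshold in the usage comment are assumptions derived from the figures above (1.3 GBytes of RDF plus nearly 4 GBytes of HTML).

```shell
#!/bin/sh
# Sketch: check that a filesystem has at least a given number of GBytes free.
# has_free_gb is a hypothetical helper name, not part of rpm2html.

# has_free_gb DIR MIN_GB
# Succeeds if the filesystem holding DIR has at least MIN_GB GBytes available.
has_free_gb() {
    dir=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Usage (hypothetical location and threshold):
#   has_free_gb /home/httpd/html 6 || echo "not enough space for the RDF and HTML trees"
```

If the RDF mirror and the HTML tree live on different filesystems, each one would need its own check.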

Setting up the mirror

You need to replicate the RDF database available on ftp://rpmfind.net/linux/RDF .

The simplest way is to use rsync; the command is simply

rsync -az --delete rpmfind.net::RDF /linux/RDF
if you want to keep the metadata mirror under /linux/RDF. Note also that I am interested in people providing HTTP access to the metadata, so on a standard Linux setup /home/httpd/html/linux/RDF would be even better !
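
If you only want part of the tree, rsync's --exclude option can prune whole subtrees. The sketch below builds the command line and prints it instead of running it, so that it can be reviewed first. The function name rdf_rsync_cmd and the subtree redhat/4.2 are hypothetical, chosen only for illustration; check the actual layout of the RDF tree before excluding anything.

```shell
#!/bin/sh
# Sketch: build the rsync command for the RDF mirror, optionally pruning subtrees.
# rdf_rsync_cmd is a hypothetical helper; subtree names are assumptions.

# rdf_rsync_cmd DEST [EXCLUDED_SUBTREE...]
# Prints the command line rather than executing it.
rdf_rsync_cmd() {
    dest=$1
    shift
    cmd="rsync -az --delete"
    for sub in "$@"; do
        # A leading / anchors the pattern at the top of the transferred tree,
        # and the trailing / restricts it to directories.
        cmd="$cmd --exclude=/$sub/"
    done
    printf '%s rpmfind.net::RDF %s\n' "$cmd" "$dest"
}

# Usage:
#   rdf_rsync_cmd /linux/RDF redhat/4.2
#   prints: rsync -az --delete --exclude=/redhat/4.2/ rpmfind.net::RDF /linux/RDF
```

Adding rsync's -n (--dry-run) option to the printed command shows what would be transferred or deleted without touching the local tree.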

Alternatively, if you want to use mirror (a set of perl scripts dedicated to the job of mirroring FTP sites), install it and add an entry for the RDF repository to the default configuration (usually named mirror.defaults). Just add the following lines at the end of your mirror.defaults:

package=rdf
        site=rpmfind.net
        remote_dir=/linux/RDF
        local_dir=/home/httpd/html/linux/RDF
        remote_user=anonymous
        remote_password=me@machine RDF mirroring
Try it by launching "mirror -d -p rdf" and check for possible problems.

Setting up the rpm2html config file

I suggest grabbing my existing config file and modifying it. This is a bit painful, but hopefully has to be done only once:

Modify the Global section

  1. Change the maint and mail values to reflect your name and preferred E-mail address for feedback.
  2. Change the dir path to the actual directory where the HTML files have to be produced (something like /home/httpd/html/RPM if you use the standard apache setup). This has to be in your server's exported space, and the tree may grow to several GBytes (nearly 4 GBytes for the full archive, as noted in the prerequisites), so check first that you have enough space !
  3. Change url to the prefix used to access the pages on your HTTP server. For example, if you are serving them from /home/httpd/html/RPM, the full URL to access them is http://my.server.org/RPM and the correct value would be : url=/RPM .
  4. Remove any rdf=true or rdf_dir=/linux/RDF if present, those are used on rufus to create the .rdf files from the .rpm ones. You don't need them on a mirror.
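
A quick way to review the result of these four steps is to list the settings involved and to flag any leftover rdf lines. This is a sketch only: check_global is a hypothetical helper, and it assumes that each setting sits on its own line in key=value form, as in the examples above.

```shell
#!/bin/sh
# Sketch: review the settings touched in the Global section of the config file.
# check_global is a hypothetical helper name; it assumes key=value lines.

# check_global CONFIG
# Prints every maint=, mail=, dir= and url= line found in CONFIG and
# returns non-zero if an rdf= or rdf_dir= line is still present.
check_global() {
    conf=$1
    grep -E '^[[:space:]]*(maint|mail|dir|url)=' "$conf"
    if grep -qE '^[[:space:]]*(rdf|rdf_dir)=' "$conf"; then
        echo "warning: rdf/rdf_dir settings still present in $conf" >&2
        return 1
    fi
    return 0
}

# Usage:
#   check_global config.rpm2html.mirrors
```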

Modify each Directory section

After the global section, the config file is a list of directory-specific entries, each usually related to one specific distribution. The goal here is to adapt them to your local filesystem and point to local FTP mirrors (for example, you wouldn't point directly to the RedHat site but to one of the mirrors in your area). You may drop some of the directories if you are too tight on space or if there is no nearby mirror for a specific distribution. Let's examine one entry:
  1. [/linux/RDF/redhat/5.0/i386] : change /linux to the actual location on your disk for the mirror, e.g.:
    [/home/ftp/pub/mirror/redhat/5.0/i386]
  2. name=RedHat-5.0 for i386 : You probably don't have to change the name of the distribution, unless you want to translate it.
  3. subdir=redhat/5.0/i386 : local path, don't change it !
  4. ftp=ftp://ftp.redhat.com/pub/redhat/redhat-5.0/alpha/RedHat/RPMS : The origin server for the packages, don't change it !
  5. ftpsrc=ftp://ftp.redhat.com/pub/redhat/redhat-5.0/SRPMS : The origin server for the sources, you may want to point to a nearby server providing the source RPMs.
  6. color=#ffe0ff : Color code for this distribution, you can change it, but avoid giving nearly the same color to two different distributions.
  7. mirror=ftp://rpmfind.net/linux/redhat/redhat-5.0/alpha/RedHat/RPMS : The nearest mirror, customize it to reduce bandwidth usage (don't reference the rufus server if you are located in Australia !).
  8. mirror=ftp://ftp.redhat.com/pub/redhat/redhat-5.0/alpha/RedHat/RPMS : additional mirrors may be added, rpm2html currently doesn't use this feature, but will in the near future ...
Note that if you changed the configuration file for an existing setup, you need to pass the -force option to rpm2html to ensure that all the pages are updated.

Run rpm2html

Try it:

rpm2html config.rpm2html.mirrors

Check for error messages indicating path or directory permission problems, then point your favorite browser to the Web pages and ensure that the links generated internally are correct, as well as the outside links to the actual RPM mirrors.
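
Path problems can also be caught before running rpm2html by checking that every directory named in a section header of the config file exists and is readable. The sketch below assumes that directory sections are written as [/absolute/path] at the start of a line, as in the entry examined above; list_config_dirs and check_config_dirs are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: verify the directories named in "[/path]" section headers.
# Both function names are hypothetical helpers, not part of rpm2html.

# list_config_dirs CONFIG
# Prints the path of every "[/some/path]" section header in CONFIG.
list_config_dirs() {
    sed -n 's|^\[\(/[^]]*\)].*$|\1|p' "$1"
}

# check_config_dirs CONFIG
# Reports section directories that are missing or unreadable and
# returns non-zero if any were found.
check_config_dirs() {
    bad=$(list_config_dirs "$1" | while IFS= read -r d; do
        if [ ! -d "$d" ] || [ ! -r "$d" ]; then
            printf '%s\n' "$d"
        fi
    done)
    if [ -n "$bad" ]; then
        printf 'missing or unreadable directories:\n%s\n' "$bad" >&2
        return 1
    fi
}

# Usage:
#   check_config_dirs config.rpm2html.mirrors && rpm2html config.rpm2html.mirrors
```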
 

Automate the process

Add the mirror command that updates the RDF directory and the call to rpm2html to your crontab. Note that rpm2html never cleans up pages it generated earlier that are no longer accurate, so your cron job needs a step that removes such pages before it runs rpm2html.
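
One possible shape for such a cleanup step is sketched here. It is an assumption, not the command used on rufus: it deletes *.html files that have not been rewritten for a given number of days, and it pairs the cleanup with the -force option mentioned earlier so that every page still in use gets a fresh timestamp on each run. The function names, the 7-day threshold and the paths are all hypothetical.

```shell
#!/bin/sh
# Sketch: age-based cleanup of generated pages, followed by a full rebuild.
# Not the original cleanup command; names, paths and threshold are assumptions.

# remove_stale_pages HTML_DIR DAYS
# Deletes *.html files under HTML_DIR last modified more than DAYS days ago.
remove_stale_pages() {
    find "$1" -type f -name '*.html' -mtime +"$2" -exec rm -f {} +
}

# update_mirror
# Refreshes the RDF tree, drops stale pages, then regenerates all pages.
update_mirror() {
    rsync -az --delete rpmfind.net::RDF /linux/RDF &&
    remove_stale_pages /home/httpd/html/RPM 7 &&
    rpm2html -force config.rpm2html.mirrors
}

# Example crontab line, assuming this file is installed as the
# hypothetical /usr/local/bin/update-rpm-mirror and ends with a call
# to update_mirror:
#   30 3 * * * /usr/local/bin/update-rpm-mirror
```

An age-based rule cannot tell an outdated page from one that is merely unchanged, which is why the sketch regenerates everything with -force; without that option, a different way of detecting stale pages would be needed.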

Announce it and register

Once you have a working setup, it would be cool to announce it to the rpm2html mailing-list and to your local Linux users group. Don't forget to give location (country, state) information, as well as the dataset indexed if you don't run the full archive. This has to be shared ! Contact me if you want to localize the output of rpm2html, it's not that hard !
 
Daniel Veillard

$Id: mirror.html,v 1.10 2001/07/17 22:50:09 veillard Exp $