Discussion:
createrepo: external locking?
Oliver Hookins
2010-08-25 12:56:20 UTC
Hi all,

I've hit an interesting problem with some of the repositories we populate
with our own applications. We have build servers generating RPMs
automatically (sometimes on every SCM commit); these are then dumped into the
yum repository directories and createrepo is run on the directory to generate
the metadata.

At the moment, this means that every job in the build system that generates RPMs
must have its own directory (and now there are quite a few), as multiple
createrepo instances running against the same directory fail. We can't just run
it from cron periodically as there are some triggered jobs which pull down the
most recent metadata immediately afterwards to do some testing of the new
packages.

So, is there some way to get createrepo to lock a particular directory so that
multiple processes can update the metadata at least sequentially? Or is this
kind of thing best done with a wrapper around createrepo (or not at all)?

Best Regards,
Oliver
seth vidal
2010-08-25 13:26:20 UTC
I don't see how createrepo locking the dir would help unless all other
apps honored the locks quasi-sanely.

Is there a reason you cannot just make the program(s) which need to
access the metadata run in sequence and not in parallel?

-sv
Oliver Hookins
2010-08-25 13:48:31 UTC
Post by seth vidal
I don't see how createrepo locking the dir would help unless all other
apps honored the locks quasi-sanely.
Is there a reason you cannot just make the program(s) which need to
access the metadata run in sequence and not in parallel?
Not easily, and it's not desirable. The build process is often very long but the
RPM generation and subsequent metadata generation is relatively short. I don't
think we can put locks in our build system for just this part of the build,
sadly.

It seems like we might have to just keep maintaining one directory per build
process...
Robert Xu
2010-08-25 14:07:32 UTC
Post by Oliver Hookins
Not easily, and it's not desirable. The build process is often very long but the
RPM generation and subsequent metadata generation is relatively short. I don't
think we can put locks in our build system for just this part of the build,
sadly.
It seems like we might have to just keep maintaining one directory per build
process...
You can use this, like I do when I run cron:
http://code.google.com/p/withlock/

later, Robert Xu
seth vidal
2010-08-25 14:46:46 UTC
Post by Oliver Hookins
Post by seth vidal
I don't see how createrepo locking the dir would help unless all other
apps honored the locks quasi-sanely.
Is there a reason you cannot just make the program(s) which need to
access the metadata run in sequence and not in parallel?
First:
Could you subscribe to the list from the address you're posting from? I
don't enjoy approving emails for delivery.
Post by Oliver Hookins
Not easily, and it's not desirable. The build process is often very long but the
RPM generation and subsequent metadata generation is relatively short. I don't
think we can put locks in our build system for just this part of the build,
sadly.
So then I'm confused again.

you have N build processes - at the end of the process they spit out an
rpm or a set of rpms into your repo dir - and they run createrepo - but
they could be finishing at roughly the same time and therefore will
tread on one another?

Do they all need to update the metadata immediately following their
builds?

Can you have them notify the createrepo runner that their builds are
completed?

-sv
Oliver Hookins
2010-08-25 14:54:50 UTC
Post by seth vidal
Post by Oliver Hookins
Post by seth vidal
I don't see how createrepo locking the dir would help unless all other
apps honored the locks quasi-sanely.
Is there a reason you cannot just make the program(s) which need to
access the metadata run in sequence and not in parallel?
Could you subscribe to the list from the address you're posting from? I
don't enjoy approving emails for delivery.
I was pretty sure I did subscribe already, before I posted initially. I just
tried to subscribe again and it sent me the standard privacy alert...
Post by seth vidal
Post by Oliver Hookins
Not easily, and it's not desirable. The build process is often very long but the
RPM generation and subsequent metadata generation is relatively short. I don't
think we can put locks in our build system for just this part of the build,
sadly.
So then I'm confused again
you have N build processes - at the end of the process they spit out an
rpm or a set of rpms into your repo dir - and they run createrepo - but
they could be finishing at roughly the same time and therefore will
tread on one another?
That's pretty much it. Initially we had all of the builds dropping their RPMs
into a single repository and had frequent collisions between createrepo runs.
Post by seth vidal
Do they all need to update the metadata immediately following their
builds?
Yes, as we have some continuous integration jobs, triggered when the build jobs
finish, which install the latest RPM from the repositories on other machines.
Post by seth vidal
Can you have them notify the createrepo runner that their builds are
completed?
Currently, we just run createrepo at the end of the build job, but as I
mentioned, running a single instance from cron isn't possible since we need the
metadata to be generated synchronously. If there is even a possibility that we
have to wait 59 seconds for the metadata to be next created, the triggered
integration job won't be able to see the new RPM in the repository when it runs.
seth vidal
2010-08-25 15:06:49 UTC
Post by Oliver Hookins
Post by seth vidal
Could you subscribe to the list from the address you're posting from? I
don't enjoy approving emails for delivery.
I was pretty sure I did subscribe already, before I posted initially. I just
tried to subscribe again and it sent me the standard privacy alert...
You're subscribed now.
Post by Oliver Hookins
Post by seth vidal
you have N build processes - at the end of the process they spit out an
rpm or a set of rpms into your repo dir - and they run createrepo - but
they could be finishing at roughly the same time and therefore will
tread on one another?
That's pretty much it. Initially we had all of the builds dropping their RPMs
into a single repository and had frequent collisions between createrepo runs.
Then it sounds to me like you need a proper build system with dependent
build relationships. Something like koji.

A few other options:

1. have each process generate their repodata in another outputdir and
then copy the repodata over top of the path you want atomically.

you still run the risk of there being races there, however.

2. have a repo per-builder and use yum to merge the repos at
install/build time. Then each builder has their own repo they are
controlling uniquely.

Then when the builds are finished one process can merge all of those.

3. put a wrapper with a simple lockfile around createrepo so the other
processes wait until it is complete before they start.

I don't see putting this code into createrepo - if only b/c it is a bit
out of scope for createrepo, I think.
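
For illustration, option 3 could look roughly like the sketch below. It is not
part of createrepo; run_locked and the lock file path are hypothetical names,
and it assumes every job that touches the repository goes through the same
wrapper, since the lock is advisory and processes that ignore it are not held
back.

```python
import fcntl
import subprocess


def run_locked(lock_path, command):
    """Run command while holding an exclusive advisory lock on lock_path.

    Returns the exit status of the command.
    """
    # Open (and create if needed) the lock file; its contents are never used.
    with open(lock_path, "w") as lock_file:
        # Blocks until no other cooperating process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(command)
        finally:
            # Release explicitly; closing the file would also drop the lock.
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A build job would then call something like
run_locked("/var/lock/myrepo.lock", ["createrepo", "/srv/repo"]) instead of
invoking createrepo directly (both paths are made up for the example), so
concurrent jobs queue up behind each other rather than colliding.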

-sv
Oliver Hookins
2010-08-26 09:50:49 UTC
Post by seth vidal
Post by Oliver Hookins
That's pretty much it. Initially we had all of the builds dropping their RPMs
into a single repository and had frequent collisions between createrepo runs.
Then it sounds to me like you need a proper build system with dependent
build relationships. Something like koji.
Thanks for the suggestion, I wasn't aware of koji so I'll take a look at it.
Post by seth vidal
1. have each process generate their repodata in another outputdir and
then copy the repodata over top of the path you want atomically.
you still run the risk of there being races there, however.
2. have a repo per-builder and use yum to merge the repos at
install/build time. Then each builder has their own repo they are
controlling uniquely.
Then when the builds are finished one process can merge all of those.
3. put a wrapper with a simple lockfile around createrepo so the other
processes wait until it is complete before they start.
I don't see putting this code into createrepo - if only b/c it is a bit
out of scope for createrepo, I think.
It may be a problem not worth solving, or maybe even a non-problem. Thank you
very much for your suggestions.
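
For reference, the "generate elsewhere, then move into place" idea from option 1
quoted above might be sketched as follows. publish_repodata is a hypothetical
helper, and the sketch assumes the new metadata has already been generated into
a staging directory on the same filesystem as the repository (for example with
createrepo's output directory option). Swapping a directory takes two renames,
so there is still a brief window without a repodata directory, which matches
the caveat about races.

```python
import os
import shutil


def publish_repodata(new_repodata, repo_dir):
    """Move a freshly generated repodata directory into repo_dir."""
    live = os.path.join(repo_dir, "repodata")
    # Per-process name so concurrent publishers do not reuse the same path.
    old = live + ".old." + str(os.getpid())
    if os.path.isdir(live):
        # Step 1: move the current metadata aside.
        os.rename(live, old)
    # Step 2: move the new metadata into place.
    os.rename(new_repodata, live)
    # Clean up the previous metadata, if there was any.
    shutil.rmtree(old, ignore_errors=True)
```

Clients that read the metadata between the two renames will not find it, so
this narrows the collision window but does not replace a lock.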
