Hedayat Vatankhah
2014-07-29 20:45:15 UTC
Dear all,
I have been thinking about a completely new repository format, but I
found that it would need a lot of time from my side and probably lots
of discussion. However, I think that implementing a kind of delta
metadata support for the current format is much easier, and could
considerably improve updates compared to the current situation, where
completely new metadata is generated for every update. I think I can
contribute code too.
Anyway, I'd like to propose my idea and ask for feedback. My
suggestion is currently limited to the XML MD format only, which is in
itself more space efficient than the SQLite DBs, and which is the
format DNF is using.
The idea is very simple: when createrepo updates an existing repodata,
it doesn't replace the old primary/filelists/etc. data. Instead, it
creates new XML files containing data about only the
added/modified/deleted packages. A deleted package can be marked by an
entry that carries no data except its name and EVR (a special
attribute might be added if needed). repomd.xml then contains
references to these new files only.
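
To make the idea a bit more concrete, here is a rough Python sketch of
how such a delta could be computed. Everything in it is an assumption
for illustration only: the function name make_delta, the in-memory
dicts keyed by (name, EVR), and the "deleted" flag are hypothetical and
are not part of createrepo or of the current metadata format.

```python
def make_delta(old, new):
    """Return delta entries that turn the `old` package set into `new`.

    Both arguments map (name, evr) -> package data dict (hypothetical model).
    """
    delta = []
    for key, data in new.items():
        if old.get(key) != data:
            # Added or modified package: carry the full data.
            delta.append({"name": key[0], "evr": key[1], "data": data})
    for key in sorted(old.keys() - new.keys()):
        # Deleted package: only name and EVR, plus a marker attribute.
        delta.append({"name": key[0], "evr": key[1], "deleted": True})
    return delta
```

Unchanged packages produce no entry at all, which is where the saving
over regenerating the full metadata would come from.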
Now, how are the older metadata files referenced? They are referenced
from the new MD files, just like repomd.xml refers to the latest
versions. For example, the new primary.xml file, which contains data
about the latest added/modified/deleted packages, references the
previous primary.xml file (which in turn might reference an even older
primary.xml file). Therefore, we have a linked list of primary.xml
files, where the head is the latest primary delta and the tail is the
main primary.xml file created when the repo was created.
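
A client could walk such a linked list roughly as in the sketch below.
This is only an illustration under assumptions: each metadata file is
modelled as a dict with a hypothetical "prev" field (None for the base
file) and a list of entries, and fetch stands in for whatever download
and parse step a real client would use.

```python
def resolve_chain(head_name, fetch):
    """Follow `prev` links from the newest delta back to the base file."""
    chain = []
    name = head_name
    while name is not None:
        doc = fetch(name)  # one round trip per link in the list
        chain.append(doc)
        name = doc["prev"]
    return chain  # newest first, base file last


def apply_chain(chain):
    """Replay the chain oldest-first to rebuild the current package set."""
    packages = {}
    for doc in reversed(chain):
        for entry in doc["entries"]:
            key = (entry["name"], entry["evr"])
            if entry.get("deleted"):
                packages.pop(key, None)
            else:
                packages[key] = entry["data"]
    return packages
```

A client that already has older files cached could presumably stop
walking as soon as it reaches a file it knows, and apply only the newer
deltas on top of its cached state.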
I like the idea of linked lists here, because it is scalable! But it
has a drawback: it does not allow pipelining the download of several
delta XML files, because you have to download and open each XML file
to learn the name of the previous one. If that seems like overkill, we
might come up with some other way of storing the list, e.g. the list
might be stored in repomd.xml itself, or in a separate file for
interested clients.
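
With the whole list available up front, a client could work out which
files it is missing and fetch them concurrently. The sketch below
assumes a hypothetical list of file names ordered newest first, and
assumes that a client holding one file also holds every older one; the
names missing_files and fetch_all are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def missing_files(listed, cached):
    """Names from the list (newest first) that the client does not have."""
    missing = []
    for name in listed:
        if name in cached:
            break  # assumption: everything older is already local
        missing.append(name)
    return missing


def fetch_all(names, fetch, workers=4):
    """Fetch the given files concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, names))
```

Compared to the linked list, this needs no round trip per file just to
discover the next name, at the cost of a list that grows with every
update.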
IMHO, this can be easily implemented in createrepo, and should be fairly
straightforward for clients to use. What do you think?
Regards,
Hedayat