Discussion:
adding different compression types to createrepo
Matthew Dawkins
2010-08-02 23:39:45 UTC
Hello, I'm new to the list, but not new to using createrepo and yum for repo
creation.

I'm trying to find out whether there are any patches out there that add an
option for lzma or xz compression of the metadata files instead of gzip. I have
seen the patch openSUSE uses to compress primary.xml with lzma in addition to
the gzip compression, and I have extended it to compress the other and
filelists files too, but neither adds the functionality cleanly.

Regards,
Matthew Dawkins
Anders F Björklund
2010-08-03 08:46:07 UTC
Post by Matthew Dawkins
I'm trying to find out whether there are any patches out there that add
an option for lzma or xz compression of the metadata files instead
of gzip. I have seen the patch openSUSE uses to compress primary.xml
with lzma in addition to the gzip compression, and I have extended it
to compress the other and filelists files too, but neither adds the
functionality cleanly.
That would be https://features.opensuse.org/309167 I think,
but see also http://duncan.mac-vicar.com/blog/archives/537
(and http://lists.baseurl.org/pipermail/yum/2009-April/022560.html)
and some other discussion about using YAML and/or LZMA for repodata.

I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
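
A minimal sketch of how a client might choose among these keys, assuming a repomd.xml that lists both "primary" and "primary_lzma". The function name pick_primary and the preference lists are hypothetical, the XML namespace is left out for brevity, and the parsing uses Python's standard xml.etree rather than yum's own code:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down repomd.xml (namespace omitted for brevity).
REPOMD = """\
<repomd>
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="primary_lzma">
    <location href="repodata/primary.xml.lzma"/>
  </data>
</repomd>
"""

def pick_primary(repomd_text, known_types):
    """Return the href of the first data type this client understands.

    known_types is ordered by preference. Types the client has never
    heard of are never requested, so adding them cannot break it.
    """
    root = ET.fromstring(repomd_text)
    by_type = {d.get("type"): d.find("location").get("href")
               for d in root.findall("data")}
    for wanted in known_types:
        if wanted in by_type:
            return by_type[wanted]
    return None

old_client = pick_primary(REPOMD, ["primary"])
new_client = pick_primary(REPOMD, ["primary_lzma", "primary"])
print(old_client)
print(new_client)
```

An older client that only asks for "primary" keeps getting the gzip file, which is the "less breakage" argument for a separate key.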

--anders
Anders F Björklund
2010-08-03 14:55:55 UTC
Post by Anders F Björklund
I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
/usr/bin/repo2solv.sh which is the tool that parses the repos in
openSUSE, already supports lzma (look at the source), we just don't
have any tool generating lzma.
And Smart also supports lzma/xz, since it just uses the file suffix.
I was under the impression that you had patched createrepo for lzma ?
And it should be simple to support both of .lzma and .xz compressions
by using PylibLZMA in yum, or liblzma directly for yum-metadata-parser.
I disagree the type should be changed. It would add more confusing
undocumented stuff to the format.
The concern was that if the type wasn't changed, then older versions
of yum would try to parse the lzma/xz as uncompressed and just get
checksum errors. So it was more for "compatibility" than anything ?
But the "primary_lzma" wasn't my idea, since it came from openSUSE.

--anders
Duncan Mac-Vicar P.
2010-08-03 14:45:31 UTC
Post by Anders F Björklund
I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
--anders
/usr/bin/repo2solv.sh which is the tool that parses the repos in
openSUSE, already supports lzma (look at the source), we just don't have
any tool generating lzma.

I disagree the type should be changed. It would add more confusing
undocumented stuff to the format.

Duncan
seth vidal
2010-08-03 15:14:28 UTC
Post by Anders F Björklund
Post by Matthew Dawkins
I'm trying to find out whether there are any patches out there that add
an option for lzma or xz compression of the metadata files instead
of gzip. I have seen the patch openSUSE uses to compress primary.xml
with lzma in addition to the gzip compression, and I have extended it
to compress the other and filelists files too, but neither adds the
functionality cleanly.
That would be https://features.opensuse.org/309167 I think,
but see also http://duncan.mac-vicar.com/blog/archives/537
(and http://lists.baseurl.org/pipermail/yum/2009-April/022560.html)
and some other discussion about using YAML and/or LZMA for repodata.
YAML? I don't think anyone has discussed YAML and been taken seriously.

More likely than not most of the metadata will be transitioned to
sqlite-db ONLY and no xml at all. Yum already supports repos with only
sqlite dbs and no xml (other than repomd.xml).
Post by Anders F Björklund
I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
I suspect it would make sense to provide:

[primary|filelists|other][_db]_xz in the repomd.xml, in addition to the
gzip/bzip2 compressed alternatives and then gradually phase out the
older ones.


My biggest reticence about adding xz support now is that if we change
the repodata format to the layout that's been discussed on here and on
the yum-devel list, we'll be able to move to xz support (and some
future-proofing for compression formats) w/o having to kludge a bunch
of xz compression crap into createrepo.

What do y'all think?

-sv
James Antill
2010-08-03 16:02:45 UTC
Post by seth vidal
Post by Anders F Björklund
I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
[primary|filelists|other][_db]_xz in the repomd.xml, in addition to the
gzip/bzip2 compressed alternatives and then gradually phase out the
older ones.
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Also calling them "primary_db.xz" would fit the convention with
"groups.gz".
Post by seth vidal
My biggest reticence about adding xz support now is that if we change
the repodata format to the layout that's been discussed on here and on
the yum-devel list, we'll be able to move to xz support (and some
future-proofing for compression formats) w/o having to kludge a bunch
of xz compression crap into createrepo.
Well it's going to be much easier to just change the compression, but
it's not the end of the world if you want to put it off a year or so
(IMO).
seth vidal
2010-08-03 16:11:36 UTC
Post by James Antill
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Also calling them "primary_db.xz" would fit the convention with
"groups.gz".
nod. s/_xz/.xz/ yes.
Post by James Antill
Post by seth vidal
My biggest reticence about adding xz support now is that if we change
the repodata format to the layout that's been discussed on here and on
the yum-devel list, we'll be able to move to xz support (and some
future-proofing for compression formats) w/o having to kludge a bunch
of xz compression crap into createrepo.
Well it's going to be much easier to just change the compression, but
it's not the end of the world if you want to put it off a year or so
(IMO).
like I said - reticence. Not decided at all. :-/

-sv
Anders F Björklund
2010-08-03 16:12:31 UTC
Post by James Antill
Post by seth vidal
Post by Anders F Björklund
I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
[primary|filelists|other][_db]_xz in the repomd.xml, in addition to the
gzip/bzip2 compressed alternatives and then gradually phase out the
older ones.
openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.

Using liblzma would handle both, but feeding xz to lzma doesn't work.
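
A small sketch of that difference, using the lzma module from the Python 3 standard library (a liblzma binding, not the pyliblzma package available at the time of this thread). The payload is made up for illustration:

```python
import lzma

payload = b"<metadata packages='0'/>" * 50

# Same data in the two containers: .xz and the legacy .lzma ("alone").
as_xz = lzma.compress(payload, format=lzma.FORMAT_XZ)
as_lzma = lzma.compress(payload, format=lzma.FORMAT_ALONE)

# FORMAT_AUTO detects the container, so one code path reads both.
assert lzma.decompress(as_xz, format=lzma.FORMAT_AUTO) == payload
assert lzma.decompress(as_lzma, format=lzma.FORMAT_AUTO) == payload

# A decoder restricted to the legacy .lzma container rejects .xz input.
try:
    lzma.decompress(as_xz, format=lzma.FORMAT_ALONE)
    xz_read_as_lzma = True
except lzma.LZMAError:
    xz_read_as_lzma = False

print(xz_read_as_lzma)
```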
Post by James Antill
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Smart supports .sqlite, but it is slower to parse and not extensible?
Post by James Antill
Also calling them "primary_db.xz" would fit the convention with
"groups.gz".
Except that it is "group_gz"... (the type of the comps.xml.gz file)

--anders
James Antill
2010-08-03 16:53:59 UTC
Post by Anders F Björklund
Post by James Antill
Post by seth vidal
Post by Anders F Björklund
I think the extended opensuse patch should be acceptable, where it
uses "primary" for .xml or .xml.gz and the new "primary_lzma" key
for .xml.lzma or .xml.xz - plus the same added for "filelists_lzma"
and "other_lzma", it causes less breakage than changing "primary"...
[primary|filelists|other][_db]_xz in the repomd.xml, in addition to the
gzip/bzip2 compressed alternatives and then gradually phase out the
older ones.
openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
Using liblzma would handle both, but feeding xz to lzma doesn't work.
Does that matter? The rpm payload is inside rpm, so matters less, and
can't be changed now anyway. But for "normal" files the convention is to
use .xz now ... no?
Post by Anders F Björklund
Post by James Antill
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Smart supports .sqlite, but it is slower to parse and not extensible?
I find it _really_ hard to believe that XML parsing is faster than
reading data from .sqlite.
.sqlite is about as extensible as XML, and primary/etc. have never
changed (and the coming changes are just as likely to be done by adding
new files).

However there is a cost to having 4-8 different versions of
primary/filelists/etc. ... both in createrepo time, in hosting disk
space, in maintenance of all the weird code paths and in repomd.xml size
(most repos. aren't using metalink, so repomd.xml is downloaded a lot).
Post by Anders F Björklund
Post by James Antill
Also calling them "primary_db.xz" would fit the convention with
"groups.gz".
Except that it is "group_gz"... (the type of the comps.xml.gz file)
You're right, for some reason I thought it was "group.gz" ... my bad,
ignore that comment.
Anders F Björklund
2010-08-03 20:09:49 UTC
Post by James Antill
Post by Anders F Björklund
openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
Using liblzma would handle both, but feeding xz to lzma doesn't work.
Does that matter? The rpm payload is inside rpm, so matters less, and
can't be changed now anyway. But for "normal" files the convention is to
use .xz now ... no?
I thought the "convention" was .gz, since that's the only choice given.

The name suffix isn't that important, whether "_xz" or "_lzma" or "2"
Post by James Antill
Post by Anders F Björklund
Post by James Antill
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Smart supports .sqlite, but it is slower to parse and not extensible?
I find it _really_ hard to believe that XML parsing is faster than
reading data from .sqlite.
Sloppy coding, most likely. The files *are* somewhat bigger, though...

Either way it's not as much of a gain as for yum which can use the
.sqlite files directly. Both the .xml and .sqlite need converting,
into the internal format. And so far changing hasn't been worth it.
BTW; The code is at https://code.launchpad.net/~afb/smart/sqlite

The main advantage is the random access, where even indexed XML sucks.
Post by James Antill
.sqlite is about as extensible as XML, and primary/etc. have never
changed (and the coming changes are just as likely to be done by adding
new files).
It's easier to add new attributes and tags, than new columns and tables.
Post by James Antill
However there is a cost to having 4-8 different versions of
primary/filelists/etc. ... both in createrepo time, in hosting disk
space, in maintenance of all the weird code paths and in repomd.xml size
(most repos. aren't using metalink, so repomd.xml is downloaded a lot).
I'm not sure where this 4-8 number came from. We were talking about 2,
or 3 if you want to include the .sqlite files too (which are different).

repomd.xml
primary.xml.gz (or primary.xml)
primary.xml.xz (or primary.xml.lzma)
primary.sqlite.bz2

When primary.xml.gz is removed, that's only one or two formats left.
Depending on whether you are using the XML markup or SQL database ?

Just thought it was a simple solution to the repodata compatibility.
And something that you would have upstream rather than downstream...

--anders
Matthew Dawkins
2010-08-03 23:03:44 UTC
So, if I understand this right, the push is to drop xml entirely and push
sqlitedb, which is double in size and not necessarily faster?

Is anyone else interested in helping produce a patch that gives users the
option to compress the xml files with xz or lzma, without piggybacking
on the gz files?

Sorry about the earlier full digest posts. Gmail does a great job of hiding
the old text.
Anders F Björklund
2010-08-04 07:59:55 UTC
Post by Matthew Dawkins
Is anyone else interested in helping produce a patch that gives
users the option to compress the xml files with xz or lzma, without
piggybacking on the gz files?
That should be trivial.

Bonus points for using "import lzma" instead of "| lzma",
like utils.LzmaFile did*... That is, to use lzma.LZMAFile
Might want to fix the typos/thinkos too, while at it. :-)
(the lzma and gz variables in HybridFile have been swapped)

* https://build.opensuse.org/package/view_file?file=createrepo-0.9.8-add-lzma-option-to-generate-primary.xml.lzma.patch&package=createrepo&project=system%3Apackagemanager

--anders
Anders F Björklund
2010-08-06 10:25:27 UTC
Post by Anders F Björklund
Bonus points for using "import lzma" instead of "| lzma",
like utils.LzmaFile did*... That is, to use lzma.LZMAFile
In fact, that is probably a requirement... (lzma.LZMAFile)

There is nothing in the current patch that actually makes
sure that the "lzma" (or "xz") command has finished first,
before it looks at the resulting repodata temp output file.
This can result in the following repomd (note the file size):

<data type="primary">
  <checksum type="sha256">e8960f94ce801d8ba6c514aaf07c9f64db0ca76461a56252fe94fc906893377b</checksum>
  <timestamp>1281089005</timestamp>
  <size>841</size>
  <open-size>1657</open-size>
  <open-checksum type="sha256">55ca9c23daaa3ea30040cf5d98ee5fb16064d49aad2168ae01d82cbe9f83c390</open-checksum>
  <location href="repodata/primary.xml.gz"/>
</data>
<data type="primary_lzma">
  <checksum type="sha256">e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</checksum>
  <timestamp>1281089005</timestamp>
  <size>0</size>
  <open-size>1657</open-size>
  <open-checksum type="sha256">55ca9c23daaa3ea30040cf5d98ee5fb16064d49aad2168ae01d82cbe9f83c390</open-checksum>
  <location href="repodata/primary.xml.lzma"/>
</data>

It was supposed to be 821 bytes, but lzma hadn't finished...
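
A minimal sketch of the in-process alternative, assuming the Python 3 standard library lzma module rather than pyliblzma. The helper name write_lzma and the sample data are hypothetical. The point is only that the file is closed before its size and checksum are taken:

```python
import hashlib
import lzma
import os
import tempfile

def write_lzma(xml_bytes, path):
    """Compress xml_bytes to path in-process and return (size, sha256)."""
    # The with-block closes the file, so all data is flushed before we
    # measure it; there is no child process to race against.
    with lzma.open(path, "wb", format=lzma.FORMAT_ALONE) as out:
        out.write(xml_bytes)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return os.path.getsize(path), digest

# sha256 of zero bytes: the same value as the primary_lzma checksum above,
# which confirms that the checksum was taken over an empty file.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "primary.xml.lzma")
    size, digest = write_lzma(b"<metadata/>" * 100, target)

print(size > 0, digest != EMPTY_SHA256)
```

If an external lzma or xz command is used instead, the equivalent fix is to wait for the child process to exit (for example with subprocess.run or Popen.wait) before reading the output file.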

--anders
Anders F Björklund
2010-08-25 10:56:26 UTC
Post by Anders F Björklund
Post by Anders F Björklund
Bonus points for using "import lzma" instead of "| lzma",
like utils.LzmaFile did*... That is, to use lzma.LZMAFile
In fact, that is probably a requirement... (lzma.LZMAFile)
Fixed patch was rejected by this mailing list, but is in Unity.
It adds --lzma and --xz switches to createrepo, with pyliblzma.

Since it is using gzip by default, there should be no breakage...

Support for primary.xml.lzma and Packages.lzma is now in Smart,
along with the reading of the Mandriva media_info/info.xml.lzma.

As stable distros are still using bzip2, it'll take longer for xz.

--anders

seth vidal
2010-08-04 14:26:17 UTC
Post by Matthew Dawkins
So, if I understand this right, the push is to drop xml entirely and
push sqlitedb, which is double in size and not necessarily faster?
Double in size? Not sure where you get double in size from.

for primary it seems to be about 30% larger - which is one of the things
we want to address by changing the format of all of it.

for filelists the sqlite is smaller by almost half.
for other/changelog the sqlite is 12% larger.
Post by Matthew Dawkins
Is anyone else interested in helping produce a patch that gives users
the option to compress the xml files with xz or lzma, without
piggybacking on the gz files?
I'm not terribly interested in accepting that patch upstream. It'll just
be breaking compat for lots and lots of tools and breaking compat w/o
making other beneficial changes to the format of things is unwise.

It means we'll have repos masquerading as compatible which are not.

-sv
Matthew Dawkins
2010-08-04 15:01:29 UTC
Post by seth vidal
Double in size? Not sure where you get double in size from.
for primary it seems to be about 30% larger - which is one of the things
we want to address by changing the format of all of it.
This is just one of my results:
668 ./filelists.sqlite.bz2
536 ./filelists.xml.gz
128 ./other.sqlite.bz2
92 ./other.xml.gz
472 ./primary.sqlite.bz2
212 ./primary.xml.gz
4 ./repomd.xml

Post by seth vidal
for filelists the sqlite is smaller by almost half.
for other/changelog the sqlite is 12% larger.
You can see primary is NOT 30% larger but more like 120% larger, although
filelists is only around 25% larger. So...?
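
As a quick check of those percentages, here is the arithmetic on the compressed sizes listed above. The unit is assumed to be KiB, as du would print it, and the helper name percent_larger is made up for this sketch:

```python
# Compressed sizes from the listing above (unit assumed to be KiB).
sizes = {
    "primary": (212, 472),     # (xml.gz, sqlite.bz2)
    "filelists": (536, 668),
    "other": (92, 128),
}

def percent_larger(xml_gz, sqlite_bz2):
    """How much larger the compressed sqlite file is, in percent."""
    return (sqlite_bz2 - xml_gz) / xml_gz * 100

overhead = {name: round(percent_larger(*pair)) for name, pair in sizes.items()}
print(overhead)
```

These figures compare the compressed downloads only and say nothing about the uncompressed files.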
Post by seth vidal
I'm not terribly interested in accepting that patch upstream. It'll just
be breaking compat for lots and lots of tools and breaking compat w/o
making other beneficial changes to the format of things is unwise.
It means we'll have repos masquerading as compatible which are not.
If the patch is done right it can give each individual user their desired
results. It would be easy to add a switch to the command like the opensuse
patch does. ie

createrepo --lzma\xz\bz2\gz

and leave the default as gz for xml files and bz2 for sqlite files. That way
nothing changes for the normal default usage, but if one would like to
enhance their compression of the xml or sqlite files, that would be their
choice to add the extra switch. I don't understand why that gets confused
with trying to masquerade and being compatible?
Robert Xu
2010-08-04 15:13:38 UTC
I can't resist these talks anymore. Time for me to give my... 2 cents.
Post by seth vidal
Double in size? Not sure where you get double in size from.
for primary it seems to be about 30% larger - which is one of the things
we want to address by changing the format of all of it.
I still don't like the fact that we can't just have simple xml files.
xml isn't obsolete.
668     ./filelists.sqlite.bz2
536     ./filelists.xml.gz
128     ./other.sqlite.bz2
92      ./other.xml.gz
472     ./primary.sqlite.bz2
212     ./primary.xml.gz
4       ./repomd.xml
Post by seth vidal
for filelists the sqlite is smaller by almost half.
for other/changelog the sqlite is 12% larger.
You can see primary is NOT 30% larger but more like 120% larger, although
filelists is only around 25% larger. So...?
It doesn't help for people with slow internet connections; downloading
such large files could take an eternity.
Post by seth vidal
I'm not terribly interested in accepting that patch upstream. It'll just
be breaking compat for lots and lots of tools and breaking compat w/o
making other beneficial changes to the format of things is unwise.
It means we'll have repos masquerading as compatible which are not.
If the patch is done right it can give each individual user their desired
results. It would be easy to add a switch to the command like the opensuse
patch does. ie
createrepo --lzma\xz\bz2\gz
and leave the default as gz for xml files and bz2 for sqlite files. That way
nothing changes for the normal default usage, but if one would like to
enhance their compression of the xml or sqlite files, that would be their
choice to add the extra switch. I don't understand why that gets confused
with trying to masquerade and being compatible?
I like this. Let createrepo get its "option" switch and let the
distributions figure out themselves.
Don't change this just because one distribution wants to.

Just my few cents.
--
later, Robert Xu
seth vidal
2010-08-04 16:02:26 UTC
Post by Matthew Dawkins
You can see primary is NOT 30% larger but more like 120% larger.
Decompress them.
Post by Matthew Dawkins
If the patch is done right it can give each individual user their
desired results. It would be easy to add a switch to the command like
the opensuse patch does. ie
createrepo --lzma\xz\bz2\gz
Each user of createrepo - not each user of the repo. And since the users
of createrepo don't REALLY matter it seems odd to be targeting them as
a 'user'.

the trick is making repos which will work with old and new versions of
various pkg mgmt tools.
Post by Matthew Dawkins
and leave the default as gz for xml files and bz2 for sqlite files.
That way nothing changes for the normal default usage, but if one
would like to enhance their compression of the xml or sqlite files,
that would be their choice to add the extra switch. I don't understand
why that gets confused with trying to masquerade and being compatible?
But it means when I look at the repomd.xml
I see the 'primary' datatype and the file is in a format I CANNOT read
on an older ver of yum, apt or smart. And if there are no other options
available for my pkgmgmt tool then we have a repo which cannot be
read at all.

So then we get a user who complains that a repo is advertising itself as
a particular type of repo but it has unreadable metadata files.

that's the merit in having different names for the datatypes, OR
as I've suggested elsewhere we have a way of specifying the compression
format in the datatype entry as an attribute so we can know if we're
going to be compat with a repo just by looking at the repomd.xml and not
wasting time/bandwidth downloading the files.


that's what all this is about.

-sv
Matthew Dawkins
2010-08-04 17:30:59 UTC
Post by seth vidal
Decompress them.
I'm not worried about the decompressed file. Otherwise I wouldn't have
brought up the subject of better_compression. It's not a decompressed file
that I want to serve and have downloaded.

Post by seth vidal
Each user of createrepo - not each user of the repo. And since the users
of createrepo don't REALLY matter it seems odd to be targeting them as
a 'user'.
Well, whatever you want to call them/me/us. As a content provider and a user
of that same content, I am VERY_AWARE of how to improve it. Each distro
using createrepo to create a repo for their users is a user of createrepo.
Post by seth vidal
the trick is making repos which will work with old and new versions of
various pkg mgmt tools.
Like Robert Xu said. Let the distros hash out what format and compression
they want to provide in their own repos and not let one person or distro
make that decision for us. If in time your sqlite format is AWESOME, I'm
sure we will switch, but until that time we will continue to use the xml
files.
Post by seth vidal
But it means when I look at the repomd.xml
I see the 'primary' datatype and the file is in a format I CANNOT read
on an older ver of yum, apt or smart. And if there are no other options
available for my pkgmgmt tool then we have a repo which cannot be
read at all.
There would be a simple fix for you. Don't use the xz compression in your
repo and don't try to add a distro's repo that prolly won't work for you
anyways. Use the defaults.
Post by seth vidal
So then we get a user who complains that a repo is advertising itself as
a particular type of repo but it has unreadable metadata files.
Remember the default would still be gz for xml and bz2 for sqlite.

Also, you seem to forget that a distro's repos are very tailored to the setup
already used by that distro. I'm not trying to offer content to Fedora or
Opensuse or Xdistro. My pkgs may very well break another distro's install.
And that prolly goes for any other distros as well. The last thing I'm gonna
do is go add a foreign repo.

Ultimately, I care about my repos. I care about making the download smaller
for the end user. Xz compression of whatever file format you provide is
ultimately smaller than gz and/or bz2. I'm not trying to provide backwards
compatibility for older smart versions (the pkg mgr we use) and I'm not
proposing that someone inadvertently break theirs. It's about a value added
option, that obviously some repo providers using createrepo would like to
see.
Post by seth vidal
that's the merit in having different names for the datatypes, OR
as I've suggested elsewhere we have a way of specifying the compression
format in the datatype entry as an attribute so we can know if we're
going to be compat with a repo just by looking at the repomd.xml and not
wasting time/bandwidth downloading the files.
I can understand this, but isn't that what the magic info in a file is
for??? Nonetheless, using what the opensuse patch did seems reasonable, no?
primary_xz
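
For reference, a sketch of what checking the "magic info" looks like in practice, using only the Python 3 standard library. The function name sniff_compression is hypothetical. Note that a client needs at least the first bytes of the file before it can do this, and that the legacy .lzma container has no magic number to look for:

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each container format.
MAGIC = [
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
]

def sniff_compression(head):
    """Guess the compression format from the first bytes of a file."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    # The legacy .lzma container has no magic number, so it cannot be
    # identified reliably this way.
    return None

data = b"<metadata/>"
results = {
    "gz": sniff_compression(gzip.compress(data)),
    "bz2": sniff_compression(bz2.compress(data)),
    "xz": sniff_compression(lzma.compress(data, format=lzma.FORMAT_XZ)),
    "lzma": sniff_compression(lzma.compress(data, format=lzma.FORMAT_ALONE)),
}
print(results)
```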
Robert Xu
2010-08-04 17:43:21 UTC
Post by Matthew Dawkins
Post by seth vidal
Decompress them.
I'm not worried about the decompressed file. Otherwise I wouldn't have
brought up the subject of better_compression. It's not a decompressed file
that I want to serve and have downloaded.
Post by seth vidal
Each user of createrepo - not each user of the repo. And since the users
of createrepo don't REALLY matter it seems odd to be targeting them as
a 'user'.
We use createrepo, yes, but it's the users of our repos that are
benefited the most.
We have to tailor to their needs.
Post by Matthew Dawkins
Well, whatever you want to call them/me/us. As a content provider and a user
of that same content, I am VERY_AWARE of how to improve it. Each distro
using createrepo to create a repo for their users is a user of createrepo.
Post by seth vidal
the trick is making repos which will work with old and new versions of
various pkg mgmt tools.
Like Robert Xu said. Let the distros hash out what format and compression
they want to provide in their own repos and not let one person or distro
make that decision for us. If in time your sqlite format is AWESOME, I'm
sure we will switch, but until that time we will continue to use the xml
files.
Post by seth vidal
But it means when I look at the repomd.xml
I see the 'primary' datatype and the file is in a format I CANNOT read
on an older ver of yum, apt or smart. And if there are no other options
available for my pkgmgmt tool then we have a repo which cannot be
read at all.
There would be a simple fix for you. Don't use the xz compression in your
repo and don't try to add a distro's repo that prolly won't work for you
anyways. Use the defaults.
Isn't that what a repomd.xml file is for?
You can clearly see something is wrong if you download Fedora's apt
package and try to use it.

It'd be nice to have repomd have "backwards compat" in the file.
But then again, why would you use another distro's files?
Post by Matthew Dawkins
Post by seth vidal
So then we get a user who complains that a repo is advertising itself as
a particular type of repo but it has unreadable metadata files.
Remember the default would still be gz for xml and bz2 for sqlite.
Also, you seem to forget that a distro's repos are very tailored to the setup
already used by that distro. I'm not trying to offer content to Fedora or
Opensuse or Xdistro. My pkgs may very well break another distro's install.
And that prolly goes for any other distros as well. The last thing I'm gonna
do is go add a foreign repo.
Ultimately, I care about my repos. I care about making the download smaller
for the end user. Xz compression of whatever file format you provide is
ultimately smaller than gz and/or bz2. I'm not trying to provide backwards
compatibility for older smart versions (the pkg mgr we use) and I'm not
proposing that someone inadvertently break theirs. It's about a value added
option, that obviously some repo providers using createrepo would like to
see.
Post by seth vidal
that's the merit in having different names for the datatypes, OR
as I've suggested elsewhere we have a way of specifying the compression
format in the datatype entry as an attribute so we can know if we're
going to be compat with a repo just by looking at the repomd.xml and not
wasting time/bandwidth downloading the files.
I can understand this, but isn't that what the magic info in a file is
for??? Nonetheless, using what the opensuse patch did seems reasonable, no?
primary_xz
Truthfully, I would rather xz the xml files than xz the sqlite files;
The sqlite files would still end up bigger.
--
later, Robert Xu
seth vidal
2010-08-04 18:05:33 UTC
Post by Robert Xu
We use createrepo, yes, but it's the users of our repos that are
benefited the most.
We have to tailor to their needs.
Sort of right. There are multiple goals to createrepo.

First - it was to be a tool to generate rpm-xml-metadata. Originally, it
was written as a reference implementation.

The format of the xml files is, ostensibly, the format. However, the
entire repodata structure, which is inclusive of compression formats,
has become a standard.

ie: xml files are gzip compressed, excluding repomd.xml which is not
compressed at all.

That means if we make it so that SOME repos are not compressed using the
same format we've broken the standard.

That is, to me, a problem and not one to be undertaken lightly.
Post by Robert Xu
Isn't that what a repomd.xml file is for?
You can clearly see something is wrong if you download Fedora's apt
package and try to use it.
But you can't clearly see that something is wrong unless you are
willing to trust the filenames.
Post by Robert Xu
It'd be nice to have repomd have "backwards compat" in the file.
But then again, why would you use another distro's files?
There are lots of repos made which are for ALL distros - for example the
adobe-flash repos which are just 'yum repos' and are not distro
specific.
Post by Robert Xu
Truthfully, I would rather xz the xml files than xz the sqlite files;
The sqlite files would still end up bigger.
the sqlite is not really involved, at all, at this point. We can safely
ignore it.

-sv
seth vidal
2010-08-04 17:47:42 UTC
Post by Matthew Dawkins
Like Robert Xu said. Let the distros hash out what format and
compression they want to provide in their own repos and not let one
person or distro make that decision for us. If in time your sqlite
format is AWESOME, I'm sure we will switch, but until that time we
will continue to use the xml files.
That's the problem. RPM just had xz/lzma support added for compressing
payloads. That broke..... all older versions of rpm being able to even
read/handle packages made by the newer versions.

I was hoping to keep from doing that to all of our users AGAIN.

I would LOVE to be able to wish-away all backward compat issues, but
they exist and they ARE REAL.

So, if we're going to add code that breaks compat with older clients,
I'd rather:

- make BIG changes that benefit us a lot more than just changing the
compression format
- work on a new standard.

-sv
Matthew Dawkins
2010-08-04 18:01:45 UTC
Post by seth vidal
Post by Matthew Dawkins
Like Robert Xu said. Let the distros hash out what format and
compression they want to provide in their own repos and not let one
person or distro make that decision for us. If in time your sqlite
format is AWESOME, I'm sure we will switch, but until that time we
will continue to use the xml files.
That's the problem. RPM just had xz/lzma support added for compressing
payloads. That broke..... all older versions of rpm being able to even
read/handle packages made by the newer versions.
I was hoping to keep from doing that to all of our users AGAIN.
I would LOVE to be able to wish-away all backward compat issues, but
they exist and they ARE REAL.
So, if we're going to add code that breaks compat with older clients,
- make BIG changes that benefit us a lot more than just changing the
compression format
- work on a new standard.
-sv
So you are saying you would purposely using the xz switch?

In my case I have no problems using it. Smart supports the decompression of
xz'd xml files. It's not even the case of me breaking backwards-compat repo.

Adding an option and using it are two separate issues.
seth vidal
2010-08-04 18:06:28 UTC
Post by Matthew Dawkins
So you are saying you would purposely using the xz switch?
I don't understand this sentence.
Post by Matthew Dawkins
In my case I have no problems using it. Smart supports the
decompression of xz'd xml files. It's not even the case of me breaking
backwards-compat repo.
Adding an option and using it are two separate issues.
Read my other posts and I think it'll make more sense.


-sv
Matthew Dawkins
2010-08-04 18:42:20 UTC
Post by seth vidal
Post by Matthew Dawkins
So you are saying you would purposely using the xz switch?
I don't understand this sentence.
Post by Matthew Dawkins
In my case I have no problems using it. Smart supports the
decompression of xz'd xml files. It's not even the case of me breaking
backwards-compat repo.
Adding an option and using it are two separate issues.
Read my other posts and I think it'll make more sense.
No one is saying, "Change the default behavior."

No, sorry your logic doesn't make sense. I've read over and over what you
have said and all I see are sentences avoiding the real subject of adding an
additional compression option.

I think it would be better to agree on how to add such functionality (which
seems to be a desired feature) instead of continually saying that you wouldn't
add it, or giving reasons not to. It does no one any good to fork or hold rogue patches
for a tool that many consider to be a standard, but that's what's going to
happen if all the users of createrepo hear is no.
seth vidal
2010-08-04 18:44:45 UTC
Post by Matthew Dawkins
So you are saying you would purposely using the xz switch?
I don't understand this sentence.
Post by Matthew Dawkins
In my case I have no problems using it. Smart supports the
decompression of xz'd xml files. It's not even the case of me breaking
backwards-compat repo.
Adding an option and using it are two separate issues.
Read my other posts and I think it'll make more sense.
No one is saying, "Change the default behavior."
No, sorry your logic doesn't make sense. I've read over and over what
you have said and all I see are sentences avoiding the real subject of
adding an additional compression option.
I think it would be better to agree on how to add such functionality
(which seems to be a desired feature) instead of continually saying that you
wouldn't add it, or giving reasons not to. It does no one any good to fork or
hold rogue patches for a tool that many consider to be a standard,
but that's what's going to happen if all the users of createrepo hear
is no.
'all they hear is no'? Seriously, you've asked for one thing and you've
been given a fairly detailed and nuanced answer and you boil it down to
'no'?

-sv
Matthew Dawkins
2010-08-04 19:04:27 UTC
Post by seth vidal
'all they hear is no'? Seriously, you've asked for one thing and you've
been given a fairly detailed and nuanced answer and you boil it down to
'no'?
Honestly, maybe that's the problem. Instead of addressing the question head
on, the responses have been more or less negative. I haven't seen one
positive response of "well, maybe this could work for everyone" without
breaking compatibility.


Why don't you ask? How many people would like to see alternative options to
compress the xml or sqlite files? Minus repomd.xml. Apparently Duncan and
opensuse have already tried to slightly address the issue?


Look, I'm not trying to be confrontational, but at least please try to work with
the idea.

Matt
seth vidal
2010-08-04 19:08:57 UTC
Post by seth vidal
'all they hear is no'? Seriously, you've asked for one thing and you've
been given a fairly detailed and nuanced answer and you boil it down to
'no'?
Honestly, maybe that's the problem. Instead of addressing the question
head on, the responses have been more or less negative. I haven't seen
one positive response of "well, maybe this could work for everyone"
without breaking compatibility.
Why don't you ask? How many people would like to see alternative
options to compress the xml or sqlite files? Minus repomd.xml.
Apparently Duncan and opensuse have already tried to slightly address
the issue?
Look, I'm not trying to be confrontational, but at least please try to work
with the idea.
I thought we did. By using a different datatype this is doable in a way
that won't break older clients. It WILL require some modification to
existing clients to get them to handle the xz/lzma type.

I brought up the subject of discussing big changes to the
format/repodata in another thread which, I'll note, I've received
nothing but echoing silence on.


-sv
Matthew Dawkins
2010-08-04 19:30:49 UTC
Post by seth vidal
I thought we did. By using a different datatype this is doable in a way
that won't break older clients. It WILL require some modification to
existing clients to get them to handle the xz/lzma type.
<data type="other_xz">
<data type="filelists_xz">
<data type="primary_xz">
seems to be acceptable???

I have talked to Anders on what it would take for smart to handle them, and
it seems like an easy enough change.

So next would be providing an acceptable patch that would be accepted
upstream?

Post by seth vidal
I brought up the subject of discussing big changes to the
format/repodata in another thread which, I'll note, I've received
nothing but echoing silence on.
First things first.
Anders F Björklund
2010-08-05 09:25:45 UTC
Post by Matthew Dawkins
Post by seth vidal
I thought we did. By using a different datatype this is doable in a way
that won't break older clients. It WILL require some modification to
existing clients to get them to handle the xz/lzma type.
<data type="other_xz">
<data type="filelists_xz">
<data type="primary_xz">
seems to be acceptable???
The suggestion was to use "_lzma" so that it would work with either
(of .lzma and .xz), but whatever. It's only a name, easy to change.
Post by Matthew Dawkins
I have talked to Anders on what it would take for smart to handle them, and
it seems like an easy enough change.
= "primary_lzma" in info and info["primary_lzma"] or info["primary"]
Yeah, that seems easy enough to me. Lather, rinse, repeat for others.
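
For anyone following along, here is a rough sketch of that fallback on the client side. It is not Smart's actual code: the sample repomd.xml, the read_locations() and pick() helpers and the file names are all made up for illustration, and only the "primary_lzma" key and the repomd namespace come from the existing format and this thread.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"

# Hypothetical repomd.xml carrying both the compatible and the new entry.
SAMPLE_REPOMD = """<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="primary_lzma">
    <location href="repodata/primary.xml.lzma"/>
  </data>
  <data type="filelists">
    <location href="repodata/filelists.xml.gz"/>
  </data>
</repomd>
"""


def read_locations(repomd_text):
    """Map each <data type="..."> to the href of its <location>."""
    root = ET.fromstring(repomd_text.encode("utf-8"))
    info = {}
    for data in root.findall(REPO_NS + "data"):
        location = data.find(REPO_NS + "location")
        if location is not None:
            info[data.get("type")] = location.get("href")
    return info


def pick(info, base, suffix="_lzma"):
    """Prefer the new type if the repo offers it, else use the old one."""
    return info.get(base + suffix) or info[base]


info = read_locations(SAMPLE_REPOMD)
print(pick(info, "primary"))    # the primary_lzma entry is present
print(pick(info, "filelists"))  # no filelists_lzma here, so the .gz entry
```

An older client never looks up the "_lzma" keys at all, so it keeps using "primary" as before.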

--anders
James Antill
2010-08-04 18:40:52 UTC
Post by seth vidal
the trick is making repos which will work with old and new versions of
various pkg mgmt tools.
Like Robert Xu said. Let the distros hash out what format and
compression they want to provide in their own repos and not let one
person or distro make that decision for us.
If everyone wants to patch createrepo to hell and back, and have 666
differently compatible versions of repomd ... it's OSS, you can do that.
Go have fun. But, to be frank, don't call it repomd and unsubscribe from
this mailing list because you are just doing your own thing.

The much better option is to have a group of upstream maintainers that
will produce the best possible migration path from where we are now, to
where we want to be. And, IMNSHO, that does _not_ include everything
supporting 666 different things, or having repodata with 666 different
formats in it.
It so happens that createrepo is the most active (newest features,
etc.) program used to generate repomd data and that:

git log --since=2004-01-01 | egrep '^Author'

...contains just Fedora/RHEL people. That isn't our fault and we try to
take input from everyone (but we aren't going to make everyone's life
harder, just to make yours easier).
Saying "one distro. making decisions for us" is, at best, misleading.
Post by seth vidal
If in time your sqlite format is AWESOME, I'm sure we will switch,
but until that time we will continue to use the xml files.
sqlite has been around in createrepo since Feb. 2007, database-only
work has been ongoing since at least Mar. 2008. Generating sqlite by
default has been there since Apr. 2009.
But, hey, feel free to do nothing for another few years and then
complain.
Matthew Dawkins
2010-08-04 18:56:13 UTC
Post by James Antill
sqlite has been around in createrepo since Feb. 2007, database-only
work has been ongoing since at least Mar. 2008. Generating sqlite by
default has been there since Apr. 2009.
But, hey, feel free to do nothing for another few years and then
complain.
Not true. I have the last version 0.9.8 released 28-Aug-2009 and to invoke
the sqlite files I have to use the switch "-d", but honestly that's not what
this discussion was ever about.

I'm not complaining. I'm trying to be proactive in a tool that we use at the
base of our distro and get a feature included. A feature that I know others
would re-use as well.
Duncan Mac-Vicar P.
2010-08-16 14:54:01 UTC
Post by James Antill
It so happens that createrepo is the most active (newest features,
etc.) program used to generate repomd data and that:
git log --since=2004-01-01 | egrep '^Author'
...contains just Fedora/RHEL people. That isn't our fault and we try to
take input from everyone (but we aren't going to make everyone's life
harder, just to make yours easier).
I guess you are using the Author tag incorrectly then.

http://lists.baseurl.org/pipermail/rpm-metadata/2007-May/000764.html
shows some patches contributed by Christoph Thiel that were committed.

Also, there were other patches described in
http://lists.baseurl.org/pipermail/rpm-metadata/2007-May/000770.html
that seem to have been ignored.

I submitted the same patches later:
http://lists.baseurl.org/pipermail/rpm-metadata/2008-September/000980.html
and they were ignored.

In the end, our createrepo package version lagged behind for a long time
(0.4.x when 0.9.x was upstream) because the old one worked in production
(build service was using it) and nobody wanted to deal with those
patches anymore.
Post by James Antill
Saying "one distro. making decisions for us" is, at best, misleading.
I agree, misleading. But as you can see from above, the interest in
other distributions has not been IMHO "optimal" either.
--
Duncan Mac-Vicar P. - Novell® Making IT Work As One™
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
James Antill
2010-08-04 14:42:44 UTC
Post by Anders F Björklund
Post by James Antill
Post by Anders F Björklund
openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
Using liblzma would handle both, but feeding xz to lzma doesn't work.
Does that matter? The rpm payload is inside rpm, so matters less, and
can't be changed now anyway. But for "normal" files the convention
is to use .xz now ... no?
I thought the "convention" was .gz, since that's the only choice given.
For repomd, yes .gz/.bz2 is the only std. target atm. I meant as a
general answer it's common to use .xz but not .lzma, Eg.
ftp://ftp.gnu.org/gnu/coreutils/ now has .gz and .xz for the latest
releases.
Post by Anders F Björklund
Either way it's not as much of a gain as for yum which can use the
.sqlite files directly. Both the .xml and .sqlite need converting,
into the internal format. And so far changing hasn't been worth it.
Yes, we understand yum gets a bigger speedup than zypper will because
we use the .sqlite directly. I'm even willing to concede that a custom
DB can be faster than sqlite (although I doubt it is _significant_), for
the data it is designed for.
However it seems like a very worthwhile goal, to me, to only have one
set of MD. And for that set of MD to not require worthless conversions
on each client. There is currently only primary_db in upstream
createrepo, which meets those needs.

So, while we aren't going to remove "primary" generation support
tomorrow, it is very much a second class type already IMO.
Post by Anders F Björklund
Post by James Antill
.sqlite is about as extensible as XML, and primary/etc. have never
changed (and the coming changes are just as likely to be done by adding
new files).
It's easier to add new attributes and tags, than new columns and tables.
I would disagree, you can do a single call in sqlite to see if a table
or column exists and if so it will be there for every value ... XML is
much less conforming. At worst I'd say it's the same.
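
A minimal sketch of that "single call", assuming Python's sqlite3 module. The has_column() helper and the cut-down "requires" table are made up for the example and are not the real primary_db schema.

```python
import sqlite3


def has_column(conn, table, column):
    """Return True if `table` exists and has a column named `column`."""
    # PRAGMA table_info yields one row per column (index 1 is the name)
    # and no rows at all when the table does not exist.
    # The table name is interpolated, so it must come from trusted code.
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down table for illustration only.
conn.execute("CREATE TABLE requires (name TEXT, pkgKey INTEGER, pre BOOLEAN)")

print(has_column(conn, "requires", "pre"))
print(has_column(conn, "requires", "hint"))
print(has_column(conn, "no_such_table", "pre"))
```

A client can run a check like this once per database and then rely on the column being there for every row.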
Post by Anders F Björklund
Post by James Antill
However there is a cost to having 4-8 different versions of
primary/filelists/etc. ... both in createrepo time, in hosting disk
space, in maintenance of all the weird code paths and in repomd.xml size
(most repos. aren't using metalink, so repomd.xml is downloaded a lot).
I'm not sure where this 4-8 number came from. We were talking about 2,
or 3 if you want to include the .sqlite files too (which are different).
repomd.xml
primary.xml.gz (or primary.xml)
primary.xml.xz (or primary.xml.lzma)
primary.sqlite.bz2
Supporting everything, we'd have:

primary.gz
primary.lzma
primary.xz
primary_db.bz2
primary_db.xz
primary_solv.xz
[...]

...my goal (and Seth's, I think) is to have something like:

mini_primary.xz
[...]

...but, obviously that is "in the future", as no code has been written
for any of the proposed repodata formats.
--
James Antill - ***@fedoraproject.org
http://yum.baseurl.org/wiki/whatsnew/3.2.28
http://yum.baseurl.org/wiki/YumBenchmarks
http://yum.baseurl.org/wiki/YumHistory
Anders F Björklund
2010-08-05 09:15:59 UTC
Post by James Antill
Post by Anders F Björklund
Post by James Antill
Post by Anders F Björklund
openSUSE actually uses LZMA, not XZ. Both for RPMS and for
repodata.
Using liblzma would handle both, but feeding xz to lzma doesn't work.
Does that matter? The rpm payload is inside rpm, so matters
less, and
can't be changed now anyway. But for "normal" files the convention
is to use .xz now ... no?
I thought the "convention" was .gz, since that's the only choice given.
For repomd, yes .gz/.bz2 is the only std. target atm. I meant as a
general answer it's common to use .xz but not .lzma, Eg.
ftp://ftp.gnu.org/gnu/coreutils/ now has .gz and .xz for the latest
releases.
Sure, but it said "in openSUSE" above, and not "in general". :-)

It doesn't matter, as when I said "LZMA" I meant either lzma/xz,
as in either of the (legacy) LZMA_Alone or the XZ file formats.

unxz and pyliblzma handles both (xzdec only handled .xz - my bad)
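
To illustrate the difference, here is a small sketch using the lzma module from the Python standard library (an assumption on my part, it is not the pyliblzma API mentioned above). An auto-detecting decoder accepts both containers, while an .xz-only decoder rejects the legacy .lzma one.

```python
import lzma

# Stand-in for an uncompressed metadata file.
payload = b'<metadata packages="0"/>\n'

# The same payload in both container formats.
xz_blob = lzma.compress(payload, format=lzma.FORMAT_XZ)       # .xz
lzma_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # legacy .lzma


def decompress_any(blob):
    """Auto-detect the container, so .xz and .lzma both work."""
    return lzma.decompress(blob, format=lzma.FORMAT_AUTO)


def decompress_xz_only(blob):
    """Strict decoder that only accepts the .xz container."""
    return lzma.decompress(blob, format=lzma.FORMAT_XZ)


print(decompress_any(xz_blob) == payload)
print(decompress_any(lzma_blob) == payload)

try:
    decompress_xz_only(lzma_blob)
    strict_accepts_lzma = True
except lzma.LZMAError:
    strict_accepts_lzma = False
print(strict_accepts_lzma)
```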
Post by James Antill
Post by Anders F Björklund
Either way it's not as much of a gain as for yum which can use the
.sqlite files directly. Both the .xml and .sqlite need converting,
into the internal format. And so far changing hasn't been worth it.
Yes, we understand yum gets a bigger speedup than zypper will because
we use the .sqlite directly. I'm even willing to concede that a custom
DB can be faster than sqlite (although I doubt it is
_significant_), for
the data it is designed for.
I don't know much about zypper, so will let duncanmv answer that.

Was talking about Smart, which reads everything into the "cache".
Post by James Antill
However it seems like a very worthwhile goal, to me, to only have one
set of MD. And for that set of MD to not require worthless conversions
on each client. There is currently only primary_db in upstream
createrepo, which meets those needs.
I thought that the .xml was transparently converted to .sqlite
on the client with the use of the "yum-metadata-parser" module ?

Having the .sqlite in the repodata is more like a pre-compute,
especially if you are not going to use the database afterwards.
Post by James Antill
So, while we aren't going to remove "primary" generation support
tomorrow, it is very much a second class type already IMO.
Post by Anders F Björklund
Post by James Antill
.sqlite is about as extensible as XML, and primary/etc. have never
changed (and the coming changes are just as likely to be done by adding
new files).
It's easier to add new attributes and tags, than new columns and tables.
I would disagree, you can do a single call in sqlite to see if a table
or column exists and if so it will be there for every value ... XML is
much less conforming. At worst I'd say it's the same.
OK. I'll try to add the "Requires(hint):" to the sqlite as well.

It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
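
Roughly like this, as a sketch only. The table below is a made-up, cut-down stand-in for the real schema, and the default is spelled 0 rather than FALSE because SQLite stores booleans as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the existing table, with the "pre" flag.
conn.execute(
    "CREATE TABLE requires (name TEXT, pkgKey INTEGER, pre BOOLEAN DEFAULT 0)"
)
conn.execute("INSERT INTO requires (name, pkgKey) VALUES ('libfoo', 1)")

# Adding the column leaves existing rows intact; they read back the default.
conn.execute("ALTER TABLE requires ADD COLUMN hint BOOLEAN DEFAULT 0")
conn.execute("INSERT INTO requires (name, pkgKey, hint) VALUES ('libbar', 1, 1)")

rows = conn.execute(
    "SELECT name, pre, hint FROM requires ORDER BY name"
).fetchall()
print(rows)
```

Readers that never select the new column are not affected by it.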
Post by James Antill
Post by Anders F Björklund
Post by James Antill
However there is a cost to having 4-8 different versions of
primary/filelists/etc. ... both in createrepo time, in hosting disk
space, in maintenance of all the weird code paths and in repomd.xml size
(most repos. aren't using metalink, so repomd.xml is downloaded a lot).
I'm not sure where this 4-8 number came from. We were talking
about 2,
or 3 if you want to include the .sqlite files too (which are
different).
repomd.xml
primary.xml.gz (or primary.xml)
primary.xml.xz (or primary.xml.lzma)
primary.sqlite.bz2
Supporting everything, we'd have:
primary.gz
primary.lzma
primary.xz
primary_db.bz2
primary_db.xz
primary_solv.xz
[...]
Everything ? No, the patch was to add just "primary_lzma".
(with the addition of "filelists_lzma" and "other_lzma")
There is no need to have both of .lzma and .xz, and the
.sqlite and .solv are mostly useful for yum and zypper...

The only other addition I made was to add an ".index"
file, so that one could seek a specific pkgid quickly.
(it's just a text file with "$pkgid\t\$offset\n" lines,
and to the uncompressed stream so only one index needed)

But that index file is also easy to compute afterwards.
Sample program was like 50 lines of python or something.
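
This is not that sample program, just a sketch of the idea. It assumes each package element carries a checksum marked pkgid="YES", and the sample XML, build_index() and lookup() are made up for illustration.

```python
import re

# From the start of a <package> element up to its pkgid checksum.
PACKAGE_RE = re.compile(
    rb'<package\b.*?<checksum[^>]*\bpkgid="YES"[^>]*>([^<]+)</checksum>',
    re.DOTALL,
)

# Hypothetical, heavily trimmed uncompressed primary.xml.
SAMPLE = (
    b'<metadata packages="2">\n'
    b'<package type="rpm"><name>foo</name>'
    b'<checksum type="sha" pkgid="YES">aaaa1111</checksum></package>\n'
    b'<package type="rpm"><name>bar</name>'
    b'<checksum type="sha" pkgid="YES">bbbb2222</checksum></package>\n'
    b'</metadata>\n'
)


def build_index(xml_bytes):
    """Return index text: one 'pkgid<TAB>offset' line per package."""
    lines = []
    for match in PACKAGE_RE.finditer(xml_bytes):
        pkgid = match.group(1).decode("ascii")
        # Offset of "<package" in the uncompressed stream.
        lines.append("%s\t%d\n" % (pkgid, match.start()))
    return "".join(lines)


def lookup(index_text, pkgid):
    """Return the byte offset recorded for pkgid, or None if absent."""
    for line in index_text.splitlines():
        key, offset = line.split("\t")
        if key == pkgid:
            return int(offset)
    return None


index_text = build_index(SAMPLE)
offset = lookup(index_text, "bbbb2222")
print(SAMPLE[offset:offset + 36])
```

Since the offsets refer to the uncompressed stream, the same index serves the .gz and the .xz copy.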
So for a generic repo there would only be *two* files,
the "compat" .xml.gz and either of .xml.xz / .sqlite.bz2

primary.xml.gz
primary.xml.xz
Post by James Antill
...my goal (and Seth's, I think) is to have something like:
mini_primary.xz
[...]
...but, obviously that is "in the future", as no code has been written
for any of the proposed repodata formats.
Sounds like a totally different discussion, as per thread ?

The "primary_lzma" type addition was definitely here-and-now.

--anders
seth vidal
2010-08-05 12:44:29 UTC
Post by Anders F Björklund
Post by James Antill
I would disagree, you can do a single call in sqlite to see if a table
or column exists and if so it will be there for every value ... XML is
much less conforming. At worst I'd say it's the same.
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.

-sv
Anders F Björklund
2010-08-05 12:57:17 UTC
Post by seth vidal
Post by Anders F Björklund
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.

And yes, it is stored similarly to RPMSENSE_PREREQ

--anders
seth vidal
2010-08-05 13:08:35 UTC
Post by Anders F Björklund
Post by seth vidal
Post by Anders F Björklund
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.
And yes, it is stored similarly to RPMSENSE_PREREQ
Ah - missingok, that's special to only specific rpm versions.

-sv
Matthew Dawkins
2010-08-05 15:05:58 UTC
Post by seth vidal
Post by Anders F Björklund
Post by seth vidal
Post by Anders F Björklund
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.
And yes, it is stored similarly to RPMSENSE_PREREQ
Ah - missingok, that's special to only specific rpm versions.
To rpm5??? which is another progressive project that we support and would
like to see better supported with createrepo.
seth vidal
2010-08-05 15:12:52 UTC
Post by Matthew Dawkins
Post by Anders F Björklund
Post by seth vidal
Post by Anders F Björklund
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.
And yes, it is stored similarly to RPMSENSE_PREREQ
Ah - missingok, that's special to only specific rpm versions.
To rpm5??? which is another progressive project that we support and
would like to see better supported with createrepo.
rpm5 is a fork and one I do not and will not be supporting anytime soon.

I want nothing to do with it and that's really the end of the
discussion, imo.

-sv
Robert Xu
2010-08-05 15:26:18 UTC
Post by seth vidal
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.
And yes, it is stored similarly to RPMSENSE_PREREQ
Ah - missingok, that's special to only specific rpm versions.
To rpm5??? which is another progressive project that we support and
would like to see better supported with createrepo.
rpm5 is a fork and one I do not and will not be supporting anytime soon.
I want nothing to do with it and that's really the end of the
discussion, imo.
the tone doesn't really help the situation here.
--
later, Robert Xu
seth vidal
2010-08-05 15:31:45 UTC
Post by Robert Xu
Post by seth vidal
Post by Matthew Dawkins
Post by Anders F Björklund
Post by seth vidal
Post by Anders F Björklund
OK. I'll try to add the "Requires(hint):" to the sqlite as well.
It should only be an extra column of "hint BOOLEAN DEFAULT FALSE"
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.
And yes, it is stored similarly to RPMSENSE_PREREQ
Ah - missingok, that's special to only specific rpm versions.
To rpm5??? which is another progressive project that we support and
would like to see better supported with createrepo.
rpm5 is a fork and one I do not and will not be supporting anytime soon.
I want nothing to do with it and that's really the end of the
discussion, imo.
the tone doesn't really help the situation here.
The tone reflects reality and a far deeper history than I want to go
into here.

But more to the point - since the python bindings between rpm and rpm5
have diverged compatibility is no longer guaranteed. Yum and createrepo
will be following rpm.


-sv
Matthew Dawkins
2010-08-05 15:32:01 UTC
Post by seth vidal
rpm5 is a fork and one I do not and will not be supporting anytime soon.
I want nothing to do with it and that's really the end of the
discussion, imo.
It's really sad to see such a negative attitude. :-(

I would think a FOSS project is about collaboration and openness to ideas...
I hope for our sake and others that I know that use the rpm5/smart/rpm-md
combo are not going to be affected by such a lop-sided tone.
seth vidal
2010-08-05 15:37:06 UTC
Post by seth vidal
rpm5 is a fork and one I do not and will not be supporting anytime soon.
I want nothing to do with it and that's really the end of the
discussion, imo.
It's really sad to see such a negative attitude. :-(
I would think a FOSS project is about collaboration and openness to
ideas... I hope for our sake and others that I know that use the
rpm5/smart/rpm-md combo are not going to be affected by such a
lop-sided tone.
I have far more experience with the maintainers of the rpm5 fork than
you do and all I can say is: Good luck with your project.

-sv
Matthew Dawkins
2010-08-05 16:50:09 UTC
Post by seth vidal
I have far more experience with the maintainers of the rpm5 fork than
you do and all I can say is: Good luck with your project.
I know about the history, but it sounds like you have a personal vendetta,
and IMHO, life is too short to hold any of those.

Please for this project's sake turn it into a project of community
collaboration and not the project of "no". From what I can see you might as
well close the project to any outside influence and bundle
rpm.org/yum/createrepo into a happy little Fedora/RH project.

I like this project and the fact it's not just used by one distro, but I
hope that is not the agenda here?!?
seth vidal
2010-08-05 18:47:35 UTC
Post by Matthew Dawkins
Please for this project's sake turn it into a project of community
collaboration and not the project of "no". From what I can see you
might as well close the project to any outside influence and bundle
rpm.org/yum/createrepo into a happy little Fedora/RH project.
I like this project and the fact it's not just used by one distro, but
I hope that is not the agenda here?!?
The only agenda here is to keep the featureset and codebase sane.

All of our code is GPL'd or LGPL'd so you can, of course, take and use
whatever you'd like, in compliance with the license and normal copyright
law.

I will ask that you not duplicate the names of these programs as that
would lead to unnecessary user confusion.

If you have patches you would like to contribute you may, of course,
post them to this list and they may be included or not. Just like any
other project.

Good luck with your projects.
-sv
Matthew Dawkins
2010-08-05 19:19:59 UTC
Post by seth vidal
The only agenda here is to keep the featureset and codebase sane.
All of our code is GPL'd or LGPL'd so you can, of course, take and use
whatever you'd like, in compliance with the license and normal copyright
law.
I will ask that you not duplicate the names of these programs as that
would lead to unnecessary user confusion.
If you have patches you would like to contribute you may, of course,
post them to this list and they may be included or not. Just like any
other project.
Well I don't think I would ever fork, but I do have patches which seem to be
not wanted....so thank you for the offer of being able to send you patches,
but knowing beforehand they won't ever be considered..... o.O???

Thank you for your time.
James Antill
2010-08-06 14:33:54 UTC
Post by Matthew Dawkins
Please for this project's sake turn it into a project of community
collaboration and not the project of "no". From what I can see you
might as well close the project to any outside influence and bundle
rpm.org/yum/createrepo into a happy little Fedora/RH project.
I like this project and the fact it's not just used by one distro, but
I hope that is not the agenda here?!?
Again with the "one distro." rhetoric? Let me rephrase what Seth said:

. All of the contributors to createrepo are using rpm and not rpm5.

. Most of them (at least) want nothing to do with rpm5, and will never
use it for any reason.

. rpm and rpm5 have _already_ diverged enough that supporting both would
need constant attention.

...if you've made a choice to use rpm5 then you have to live with all
the problems that brings, or reverse your mistake.
It's not our job to do lots of extra work to fix your mistakes ... that
is not, and has never been, the definition of community.
Daniel Veillard
2010-08-06 16:00:37 UTC
Post by seth vidal
Post by Matthew Dawkins
To rpm5??? which is another progressive project that we support and
would like to see better supported with createrepo.
rpm5 is a fork and one I do not and will not be supporting anytime soon.
From an historical perspective and with my 15+ years of free software/
open source background, Seth, I do think rpm5 is the original trunk and
rpm the fork. The ownership of the brand goes, at least morally, to the
person who does the work and not to the entity who paid for it in my
opinion (but being a developer I'm certainly biased). This bites many
many people, unfortunately.
Post by seth vidal
I want nothing to do with it and that's really the end of the
discussion, imo.
Hum, I'm probably one of the people with the most long term interest
in rpm tools and metadata, after all rpmfind[.net] is 13 years old now,
but the tone and some of the further comments in this thread indicate
it's time for me to unsubscribe. Tools are good now, and yum is a large
part of it, but the current situation on cross-distro work is far from
ideal and I don't see that improving at this point.

Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
***@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/
seth vidal
2010-08-06 16:53:10 UTC
Post by Daniel Veillard
From an historical perspective and with my 15+ years of free software/
open source background, Seth, I do think rpm5 is the original trunk and
rpm the fork. The ownership of the brand goes, at least morally, to the
person who does the work and not to the entity who paid for it in my
opinion (but being a developer I'm certainly biased). This bites many
many people, unfortunately.
You should probably talk to your manager about your copyright assignment
and ownership of the code you write while working for red hat. I think
it'll become clear in no time at all.
Post by Daniel Veillard
Hum, I'm probably one of the people with the most long term interest
in rpm tools and metadata, after all rpmfind[.net] is 13 years old now,
but the tone and some of the further comments in this thread indicate
it's time for me to unsubscribe.
okay.
Post by Daniel Veillard
Tools are good now, and yum is a large
part of it, but the current situation on cross-distro work is far from
ideal and I don't see that improving at this point.
The myriad of forks of rpm don't help that matter at all.

If you want to bring up the discussion of rpm vs rpm5 in fedora and/or
rhel, you're welcome to - but you should know you'll be in for a hell of
a fight.

-sv
Anders F Björklund
2010-08-05 15:33:44 UTC
Post by seth vidal
Post by Anders F Björklund
Post by seth vidal
The requires(hint) you're talking about - do you mean the requires(pre)
information? If so - that's stored in the sqlite db.
It's for RPMSENSE_MISSINGOK, suggests/enhances hint.
And yes, it is stored similarly to RPMSENSE_PREREQ
Ah - missingok, that's special to only specific rpm versions.
RPMSENSE_STRONG wasn't needed, since it only applies to
<rpm:suggests> and <recommends> and not to <rpm:requires>

And those would go in separate tables in the SQL anyway,
but the "strong" boolean would be similar to the "pre"...

--anders
Duncan Mac-Vicar P.
2010-08-04 10:29:43 UTC
Post by Anders F Björklund
openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
No. We use lzma for rpm. Repositories use gz.
--
Duncan Mac-Vicar P. - Novell® Making IT Work As One™
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
Anders F Björklund
2010-08-04 10:37:58 UTC
Permalink
Post by Duncan Mac-Vicar P.
Post by Anders F Björklund
openSUSE actually uses LZMA, not XZ. Both for RPMS and for repodata.
No. We use lzma for rpm. Repositories use gz.
The createrepo patch I saw (optionally) used .lzma rather than .xz.
Either LZMA format would leave the .gz too, for compatibility reasons.

But if the repodata uses xz, and you pipe it off to lzmadec there's
a problem. Using xzdec (or unxz) should uncompress both automatically.

--anders
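
The .xz versus .lzma distinction above can be illustrated with a short sketch. It uses the `lzma` module from the modern Python standard library as a stand-in (an assumption: that module postdates this thread and is not the tooling discussed here). Its automatic format detection accepts both containers, much like xzdec or unxz.

```python
import lzma


def decompress_repodata(blob: bytes) -> bytes:
    """Decompress either an .xz stream or a legacy .lzma stream."""
    # FORMAT_AUTO accepts both the .xz container and the legacy .lzma one.
    return lzma.decompress(blob, format=lzma.FORMAT_AUTO)


# Placeholder payload standing in for a metadata file.
payload = b"<metadata packages='0'/>"
as_xz = lzma.compress(payload, format=lzma.FORMAT_XZ)
as_lzma = lzma.compress(payload, format=lzma.FORMAT_ALONE)

# .xz streams begin with a fixed six-byte magic; legacy .lzma has none,
# which is why a decoder that only knows .lzma cannot read .xz input.
print(as_xz[:6] == b"\xfd7zXZ\x00")
print(decompress_repodata(as_xz) == payload)
print(decompress_repodata(as_lzma) == payload)
```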
Duncan Mac-Vicar P.
2010-08-04 10:28:18 UTC
Permalink
Post by James Antill
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Also calling them "primary_db.xz" would fit the convention with
"groups.gz".
obsolete xml files?

This comes as a surprise for me. We don't support sqlite files because
we don't need to. solv files are faster than any sqlite database.

If rpmmd is moving to sqlite only I think it would be a good reason for
us to move away from rpmmd.

I find it really ugly that sqlite is used as a standard because of a
performance problem with xml specific to yum and fedora.
--
Duncan Mac-Vicar P. - Novell® Making IT Work As One™
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
seth vidal
2010-08-04 14:28:17 UTC
Permalink
Post by Duncan Mac-Vicar P.
obsolete xml files?
That was the idea.
Post by Duncan Mac-Vicar P.
This comes as a surprise for me. We don't support sqlite files because
we don't need to. solv files are faster than any sqlite database.
Then what does it matter to you if there are xml files or not; if you're
just converting whatever data you have into .solv files?
Post by Duncan Mac-Vicar P.
If rpmmd is moving to sqlite only I think it would be a good reason for
us to move away from rpmmd.
Wow, glad you're not all ultimatum-y.
Post by Duncan Mac-Vicar P.
I find it really ugly that sqlite is used as a standard because of a
performance problem with xml specific to yum and fedora.
Do you realize how combative your tone is? Is that on purpose?

-sv
Duncan Mac-Vicar P.
2010-08-06 10:02:32 UTC
Permalink
Post by seth vidal
Then what does it matter to you if there are xml files or not; if you're
just converting whatever data you have into .solv files?
Because that will force everyone else to reimplement their tools using a
design that is based on an implementation and not the other way around.
Post by seth vidal
Do you realize how combative your tone is? Is that on purpose?
I am not combative, really. I actually feel sad.

We have invested so much time in following what you guys do in every
possible aspect, the metadata, we moved our updates to updateinfo.xml,
changed the whole wording of channels/sources to repositories.

We took all SUSE weird changes to rpmmd and put them as extensions. We
started to deprecate the old susetags format for rpmmd repositories.

All this just to make the rpm world better and interoperable. The
openSUSE build service can build for SUSE/Fedora/Mdv distros, and of
course the generated repositories are rpmmd-based.

If yum will manage the repodata format tied to Fedora's specific
implementation, that means that _IMO_ rpmmd is just not ready to be the
rpm metadata format.

I am not saying that you should not do it. I am just saying that maybe
it was wrong for us to think of rpmmd as a "format" or possible
"standard". We also have investments to protect here, and we need to
define or have an idea what will be the metadata for our repositories in
the long term (because it is not something you want to change every week).

Duncan
James Antill
2010-08-06 16:54:33 UTC
Permalink
Post by Duncan Mac-Vicar P.
Post by seth vidal
Then what does it matter to you if there are xml files or not; if you're
just converting whatever data you have into .solv files?
Because that will force everyone else to reimplement their tools using a
design that is based on an implementation and not the other way around.
Post by seth vidal
Do you realize how combative your tone is? Is that on purpose?
I am not combative, really. I actually feel sad.
We have invested so much time in following what you guys do in every
possible aspect, the metadata, we moved our updates to updateinfo.xml,
changed the whole wording of channels/sources to repositories.
We took all SUSE weird changes to rpmmd and put them as extensions. We
started to deprecate the old susetags format for rpmmd repositories.
All this just to make the rpm world better and interoperable. The
openSUSE build service can build for SUSE/Fedora/Mdv distros, and of
course the generated repositories are rpmmd-based.
It's nice that you've done work internally to be interoperable, we've
done the same thing with yum. It'd be much nicer if you shared more of
the upstream burden :).
Post by Duncan Mac-Vicar P.
If yum will manage the repodata format tied to Fedora's specific
implementation, that means that _IMO_ rpmmd is just not ready to be the
rpm metadata format.
How is it yum/Fedora specific? Zypper is now the only major program
which doesn't understand sqlite for primary/filelists/other. And, as
I've said explicitly before, XML support isn't going to be removed
anytime soon (although I wouldn't be shocked if --database-only became
the _default_ within the next 12 months).

And, yes, if you assume that you are going to take a significant
conversion hit every time the repodata changes ... then it probably
doesn't matter much if that conversion is from XML or sqlite.
But, with my yum hat on, I don't make that assumption and I see no good
reason to do so. It often requires more work, not having a perfect
custom local repo. format, but that's life. And from the point of view
of using what the repodata provides, XML is much slower.
Robert Xu
2010-08-06 17:35:45 UTC
Permalink
It's nice that you've done work internally to be interoperable, we've
done the same thing with yum. It'd be much nicer if you shared more of
the upstream burden :).
OK, if more distributions shared the upstream burden, it'd be nice.
But I'm still seeing repomd geared toward yum and fedora. You're not
taking everyone's needs into consideration as well! You call repomd and
rpm an open-source project where everyone can help, I see a Red Hat
project with a dictatorship. Right now, rpm5 sounds much more open
source.
How is it yum/Fedora specific? Zypper is now the only major program
which doesn't understand sqlite for primary/filelists/other. And, as
I've said explicitly before, XML support isn't going to be removed
anytime soon (although I wouldn't be shocked if --database-only became
the _default_ within the next 12 months).
Answer that yourself. Think about it. We'll give you some time.
And, yes, if you assume that you are going to take a significant
conversion hit every time the repodata changes ... then it probably
doesn't matter much if that conversion is from XML or sqlite.
But, with my yum hat on, I don't make that assumption and I see no good
reason to do so. It often requires more work, not having a perfect
custom local repo. format, but that's life. And from the point of view
of using what the repodata provides, XML is much slower.
For yum. Again, we're taking this from a "yum is the standard and so is
red hat's way" viewpoint. XML is slower for yum, not for some other
tools like zypper.

Think about this: What if every distribution switched to rpm5? Not
likely, but what if they did?
I've seen rpm5 packages popping up everywhere (surprising number, and
it's growing).
Many distributions are already moving over to rpm5. Be wise about decisions.
--
later, Robert Xu
seth vidal
2010-08-06 18:05:55 UTC
Permalink
Post by Robert Xu
where everyone can help, I see a Red Hat project with a dictatorship. Right now,
rpm5 sounds much more open source.
Sigh - dictatorship... really? That's your word choice?

You don't think that's a bit dramatic?

As far as rpm being open source - rpm.org is right there - you're
welcome to talk to the maintainers. Not all of them work for red hat,
last time I checked. Keep that in mind.
Post by Robert Xu
Post by James Antill
How is it yum/Fedora specific? Zypper is now the only major program
which doesn't understand sqlite for primary/filelists/other. And, as
I've said explicitly before, XML support isn't going to be removed
anytime soon (although I wouldn't be shocked if --database-only became
the _default_ within the next 12 months).
Answer that yourself. Think about it. We'll give you some time.
Wow, that remark is rude and unnecessary. Please be civil. (You too,
James)
Post by Robert Xu
For yum. Again, we're taking this from a "yum is the standard and so
is red hat's way"
XML is slower to YUM. Not some other tools like zypper.
Great. and?

This list has been overwhelmingly silent for a while. I used to post
more here but when it became obvious that no one was listening I
stopped.
Post by Robert Xu
Think about this: What if every distribution switched to rpm5? Not
likely, but what if they did?
If rhel ever switched to using rpm5 let's just say I'll be very
surprised. It's not outside the realm of possibility but I personally
think it is unlikely.
Post by Robert Xu
I've seen rpm5 packages popping up everywhere (surprising number, and
it's growing).
Many distributions are already moving over to rpm5. Be wise about decisions.
I know you don't believe me, but I am being wise about decisions.

I do not think rpm5 is a wise decision.

As I said before - I'm not making any changes that explicitly rule it
out - but I'm also not caring one way or another about compatibility
with that fork.

Remember - the metadata, which this list is devoted to, is a format
standard. Createrepo is a reference implementation that A LOT of people
happen to use. You're welcome to write another implementation that
creates the same format.

I know there is at least one branch of createrepo out there that handles
the softdeps included in pkgs.


-sv
James Antill
2010-08-04 15:00:25 UTC
Permalink
Post by Duncan Mac-Vicar P.
Post by James Antill
*shrug*, I'm not sure why we'd want to add support for lzma on the
obsolete .xml files. Smart and apt both support .sqlite now, and if
zypper doesn't they can always use modifyrepo to add them for SuSE.
Also calling them "primary_db.xz" would fit the convention with
"groups.gz".
obsolete xml files?
This comes as a surprise for me. We don't support sqlite files because
we don't need to. solv files are faster than any sqlite database.
I'm sure solv files _could_ take 0.00001 instead of .sqlite's 0.01 to
do operations, I don't find that compelling though.
Post by Duncan Mac-Vicar P.
I find it really ugly that sqlite is used as a standard because of a
performance problem with xml specific to yum and fedora.
If you want to say "yum sucks, zypper rules" feel free ... on your
blog.

However, I do recommend you refrain from arguing that zypper converting
XML to solv files is faster than yum not doing any conversion ... it
seems less "I do bad benchmarks" and more "I'm insane", and even people
not familiar with yum can notice it.
Really, you are disappointing us, we are used to much better
misinformation campaigns.
Duncan Mac-Vicar P.
2010-08-06 10:28:51 UTC
Permalink
Post by James Antill
If you want to say "yum sucks, zypper rules" feel free ... on your
blog.
No, I don't want to say that. And I have never done it. I am sorry if my
blog ever offended you. IIRC I never posted any opinion there. Just numbers.
Post by James Antill
However, I do recommend you refrain from arguing that zypper converting
XML to solv files is faster than yum not doing any conversion ... it
seems less "I do bad benchmarks" and more "I'm insane", and even people
not familiar with yum can notice it.
Really, you are disappointing us, we are used to much better
misinformation campaigns.
I don't understand what you are trying to say there and about the
benchmarks.

Duncan
Duncan Mac-Vicar P.
2010-08-04 10:25:00 UTC
Permalink
Post by seth vidal
More likely than not most of the metadata will be transitioned to
sqlite-db ONLY and no xml at all. Yum already supports repos with only
sqlite dbs and no xml (other than repomd.xml).
openSUSE does not use sqlite databases as we convert the xml into solv
files on the client. So I hope xml files don't disappear from what we
considered (rpm-md) to be a potential standard in the rpm world.

If we ever do something on the server side, it would be solv files also.
But that would be ZYpp specific and probably implemented as extensions.
Post by seth vidal
My biggest reticence in adding the xz support now is that if we move to
changing the repodata format to the layout that's been discussed on here
and on yum-devel list that we'll be able to move to xz support (and some
future proofing for compression formatting) w/o having to kludge a bunch
of xz compression crap into createrepo.
Can you point us to that layout description?
--
Duncan Mac-Vicar P. - Novell® Making IT Work As One™
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nuernberg)
seth vidal
2010-08-04 14:29:15 UTC
Permalink
Post by Duncan Mac-Vicar P.
Post by seth vidal
My biggest reticence in adding the xz support now is that if we move to
changing the repodata format to the layout that's been discussed on here
and on yum-devel list that we'll be able to move to xz support (and some
future proofing for compression formatting) w/o having to kludge a bunch
of xz compression crap into createrepo.
Can you point us to that layout description?
sure.
Discussion starts here:
http://lists.baseurl.org/pipermail/yum-devel/2010-June/007122.html

and goes on for a ways.

-sv
seth vidal
2010-08-03 15:07:05 UTC
Permalink
Post by Matthew Dawkins
Hello, I'm new to the list, but not new to using createrepo and yum
for repo creation.
I'm looking to find if there are not patches out there to add an
option to calling for lzma or xz compression of the metafiles instead
of gzip. I have seen what opensuse uses to compress the primary.xml
in conjunction with the gzip compression and I have extended on it to
compress the other and filelists files too, but they don't really add
clean functionality.
there aren't any upstream, yet. I just got a python liblzma pkg into
fedora to support that compression natively in python. So that's half of
it. The biggest trick is having to name the datatype in repomd.xml to
reflect the compression format since older yums (and apts and smarts,
etc) won't necessarily be able to handle lzma compressed files.

-sv
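
The naming concern above (older clients must not pick up a file they cannot decompress) can be sketched as follows. This is a rough illustration, not createrepo or yum code: the embedded repomd.xml is a stripped-down sample, the "primary_lzma" type is the key proposed earlier in this thread rather than an established one, and `pick_location` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace used by repomd.xml elements.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"

# Minimal sample: checksums, timestamps and other entries are omitted.
REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="primary_lzma">
    <location href="repodata/primary.xml.lzma"/>
  </data>
</repomd>
"""


def pick_location(repomd_xml, preferred_types):
    """Return the href of the first data type this client understands."""
    root = ET.fromstring(repomd_xml)
    by_type = {}
    for data in root.findall(REPO_NS + "data"):
        location = data.find(REPO_NS + "location")
        if location is not None:
            by_type[data.get("type")] = location.get("href")
    for data_type in preferred_types:
        if data_type in by_type:
            return by_type[data_type]
    return None


# An older client only asks for "primary" and never sees the lzma entry.
old_client = pick_location(REPOMD, ["primary"])
# A newer client prefers the lzma variant and falls back to gzip.
new_client = pick_location(REPOMD, ["primary_lzma", "primary"])
print(old_client, new_client)
```

Because the compressed variant lives under its own type name, a client that looks up only "primary" keeps working unchanged, which is the compatibility property the naming is meant to provide.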