Session Start (FreeNode:#nda): Sun Dec 25 21:39:44 2005
*** #nda: sebastian-mares idimkovic @JohnV 
*** #nda was created on Sun Dec 25 21:26:58 2005.
sebastian-mares: Testing, testing...
idimkovic: :)
sebastian-mares: Now I have to get Francis in here. :-P
idimkovic: keeping fingers crossed
idimkovic: :)
sebastian-mares: Roberto and Darryl wanted to join and give some advice, but they are both offline.
sebastian-mares: Anyways...
sebastian-mares: Well, e-mail sent, now I hope he finds his way here.
idimkovic: let's hope so
idimkovic: if he needs some help
idimkovic: he can always send email to me or Juha
sebastian-mares: Sure.
idimkovic: we do online e-mail chatting sometimes, too ;-)
JohnV: well.. I'll see it like this: there was a bug in ABR bit reservoir allocation. There is a slight quality drop on hard-to-encode material with the version used in the test. Shouldn't be a big problem when this is fixed.
sebastian-mares: Yeah, but the bug can lead to problems in the current test.
idimkovic: I seriously doubt so
sebastian-mares: When people listen to the first part of the song, which has a higher bitrate and quality than the rest, Nero gets a higher ranking than it would get in the real world.
idimkovic: the bit rate was not that much higher
idimkovic: and the quality impact was minimal
idimkovic: besides
idimkovic: it could be claimed that people listening to the second part found it worse
idimkovic: but in fact difference was rather minimal
idimkovic: which was the reason
idimkovic: why it wasn't spotted at all :(
idimkovic: before the test
sebastian-mares: The people who listened to the second part rated it the way Nero deserves, since that is the quality that is used for the remaining track.
JohnV: It's a very small quality impact for anything except high-attack sections etc., where it can be audible for more people.
JohnV: It's not how the ABR will work either, it was bugged. 
JohnV: The true quality will be high with no quality drop with high attack sections
JohnV: after the bug fix
sebastian-mares: Well, after the bug fix.
idimkovic: well
idimkovic: besides
idimkovic: I worked on this this afternoon as well as on the new IS
sebastian-mares: But so far, the current results will have no relevance to real-world usage.
JohnV: cause the reservoir won't be starved
idimkovic: and the test version was in fact sent to Juha for test
idimkovic: I would gladly send it to you and Guru as soon as Juha OKs it
idimkovic: so we can be sure ABR works well for the HE-AAC test
sebastian-mares: So, as Darryl, Francis and Roberto suggested, the best thing would be performing the Tukey's HSD analysis with Nero's results included, but taking it out from the plot then.
sebastian-mares: Or instead of taking it out totally, separating it from the other contenders.
idimkovic: well it is up to you Sebastian, I don't really think there was significant enough impact for such a harsh measure
sebastian-mares: Well, the same can be said about the WMA standard problem.
JohnV: me neither, the bitrate drop doesn't lower quality that much. Even Guru didn't notice it at first, only in the attack sample.
idimkovic: in fact I don't really think it can be compared to WMA standard
idimkovic: the difference is >much< smaller
sebastian-mares: I had to replace it since testing 2-pass on small samples would have resulted in different results than on day-to-day usage.
sebastian-mares: Yes, but then noticed it in other samples, too.
idimkovic: yes, but WMA would generate huge differences very easy to spot
sebastian-mares: Well, a difference of 25 kbps is pretty high.
sebastian-mares: And as I said, it happens with ALL samples.
idimkovic: well I would more be concerned on the quality impact
JohnV: imo, the listeners SHOULD listen to the whole sample and make the judgement based on that. That is the idea anyway.
sebastian-mares: Whether it's the run-in time or the bit reservoir, that doesn't matter.
sebastian-mares: Yes, but that cannot be controlled.
JohnV: If there would be a significant quality drop, they would hear this as artifacting and judge accordingly
sebastian-mares: Imagine the following...
sebastian-mares: Francis and others suggested using smaller samples for listening tests.
sebastian-mares: 5 seconds and that kind of material.
JohnV: but smaller samples are not used
sebastian-mares: In such case, Nero would actually produce a very good quality because of its run-in time.
sebastian-mares: Second...
JohnV: but it's the same if there was artifacting because of bad psychoac or something after 15 sec. The listener should hear it if it's significant.
JohnV: As far as I see it, the whole track is there to be judged, if there is a quality drop, there is. If not for most, they don't notice it.
idimkovic: and in fact, nobody else except Francis really noticed - we did as many tests as possible in the preparation time
idimkovic: and nobody really found that out
JohnV: And after the fix, it shouldn't be noticeable in attack samples either.
sebastian-mares: Francis also tested something else...
sebastian-mares: He encoded LesJoursHeureux once as sample (30 seconds clip) and once as full file and then extracted the 30 seconds from the full track encoding.
sebastian-mares: iTunes, AoTuV and LAME produced a very similar, almost identical bitrate distribution, while Nero produced something totally different.
idimkovic: well I already said - due to this bug, quality-wise only hardest samples were affected
JohnV: it is also quite questionable if Nero is separated just because something should sound good and something maybe worse, but you think people wouldn't hear the worst sounding section. I mean.. the whole track is there.
JohnV: Imo let the people judge. Anyway this bug will be fixed.
idimkovic: (if it is not already)
idimkovic: how I see things here
idimkovic: - there was a bug in the bit-rate manager, where bit reservoir was drawn out
sebastian-mares: http://maresweb.homeip.net/listening-test/guru.eml
sebastian-mares: Save to disk and open with Outlook, Thunderbird, whatever.
idimkovic: - this bug was not found out in the pre-test 
idimkovic: - nobody in the internal pre-test complained about the quality issues
idimkovic: and, so far, Francis is the only one complaining - and honestly I don't think many others would notice any degradation, except if fed something pathological
idimkovic: we will of course fix this
idimkovic: but I somehow find a measure of excluding it from the test a bit too harsh since the quality impact is IMHO negligible
sebastian-mares: Others didn't complain because maybe they didn't listen to the whole track.
idimkovic: come on Sebastian
idimkovic: if the quality difference is there
sebastian-mares: And it is.
idimkovic: there is not a single chance that people won't notice it
sebastian-mares: As ABX logs show.
sebastian-mares: I have them.
sebastian-mares: Guru posted additional ones via e-mail.
idimkovic: Guru did
idimkovic: but I find it very questionable it is of big importance for majority 
idimkovic: nowhere near WMA bitrate deviation
JohnV: it has to be assumed that people listen to the whole track. People can't know beforehand where artifacting appears.
idimkovic: => CONCLUSION: Nero ABR allocates more bits when the
idimkovic: encoder is starting, less after a few seconds. It's sometimes easy
idimkovic: to ABX (sometimes it's not ABXable, at least for me).
idimkovic: that is Guru's comment
idimkovic: if it is "sometimes not ABXable" I can be pretty sure that for huge majority it is far from ABXable ;)
idimkovic: where is he by the way :)
sebastian-mares: But the main problem is not really whether people listen to the whole track or not. The problem is that Nero shows a different behavior with samples than it does with full tracks.
idimkovic: ok this is true but
idimkovic: how big is this deviation
sebastian-mares: And we removed WMA Std. because we wanted results that are meaningful.
idimkovic: and is it so relevant that it could change the result significantly
idimkovic: I don't think so really
sebastian-mares: So, problem now is... Why did I remove WMA Std. when the Nero results aren't meaningful either?
idimkovic: if I did I would be first to ask for the recall honestly
idimkovic: but - "meaningful" is questionable
JohnV: the difference is very small, and it will be fixed. And this problem also appears in the test samples.
sebastian-mares: Well, I believe that you are honest, but the rest sees you as Nero developer.
sebastian-mares: And as a person who tries to make Nero appear in best light.
idimkovic: well I am practical
JohnV: why aren't Nero's results meaningful? The samples are long enough for this bug to take effect.
sebastian-mares: So, because of this "bonus" Nero has at the beginning of the track, it can be over-rated.
idimkovic: I do believe these deviations are not significant for the huge majority
JohnV: or underrated
idimkovic: it can be under-rated
idimkovic: because of the second part
idimkovic: (and I don't think so)
sebastian-mares: No, it is not under-rated.
JohnV: because after the fix it's better
idimkovic: it would not help me or Nero either ;)
sebastian-mares: The second part is not worse.
sebastian-mares: It's how the encoder performs.
idimkovic: well if the second part is NOT worse
sebastian-mares: Only the first 20 seconds have a higher bitrate.
sebastian-mares: Ivan, I sent you bitrate distribution graphs from Guru.
JohnV: it can be underrated in a sense that if it's bad, people will notice it and give bad score, but after the fix it will be better
sebastian-mares: They show how the first 10 to 20 seconds have bitrates of 155 kbps and then they drop to 130 kbps and remain steady at that bitrate.
JohnV: have you checked the Vorbis bitrate distribution? Isn't it giving high bitrate all the time almost?
sebastian-mares: So it's not like nero encodes first part with 192 kbps and the second one with 64 kbps making an avg. of 128.
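The figures quoted here (roughly 155 kbps for the first 10 to 20 seconds, then a steady 130 kbps) can be turned into a rough average for a short sample versus a full track. The sketch below is only illustrative arithmetic: the function name, the 20-second boost length, and the 30-second and 240-second durations are assumptions chosen for the example, not measurements from the test.

```python
def average_kbps(total_s, boost_s=20.0, boost_kbps=155.0, steady_kbps=130.0):
    """Mean bitrate of a clip whose first `boost_s` seconds run at `boost_kbps`.

    All defaults are assumptions taken loosely from the figures in the chat.
    """
    boosted = min(total_s, boost_s)   # seconds spent at the higher bitrate
    steady = total_s - boosted        # remaining seconds at the steady bitrate
    return (boosted * boost_kbps + steady * steady_kbps) / total_s


clip = average_kbps(30.0)     # hypothetical 30-second test sample
track = average_kbps(240.0)   # hypothetical 4-minute full track
print(f"30 s sample: {clip:.1f} kbps")
print(f"240 s track: {track:.1f} kbps")
```

Under these assumptions the short sample averages noticeably higher than the full track, which is the sample-versus-full-track gap being argued about; whether that gap is audible is a separate question.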
JohnV: sebastian-mares: not true
JohnV: it's not like that at all
JohnV: that's a huge exaggeration
sebastian-mares: What is a huge exaggeration?
sebastian-mares: Could you kick Ivan?
JohnV: the quality drop is _minimal_ for most people and it's maybe noticeable in hard to encode samples at the moment for some.
JohnV: it's not at all any 192 vs 64
sebastian-mares: I know.
sebastian-mares: That's what I said.
JohnV: the difference is minimal
sebastian-mares: But you said it can be under-rated, which is not true.
sebastian-mares: Since after the 155 kbps part, Nero encodes at 130 kbps which is what it continues to encode to over the whole file.
JohnV: well in a way it can, especially after this is fixed. :)
*** idimkovic2 has joined #nda.
idimkovic2: huh
idimkovic2: now we have two idimkovic nicknames 
idimkovic2: could someone paste me please what has been discussed?
sebastian-mares: So you can't say "Oh, it is under-rated because people rated the 64 kbps part".
sebastian-mares: sebastian-mares: Only the first 20 seconds have a higher bitrate.
 sebastian-mares: Ivan, I sent you bitrate distribution graphs from Guru.
 JohnV: it can be underrated in a sense that if it's bad, people will notice it and give bad score, but after the fix it will be better
 sebastian-mares: They show how the first 10 to 20 seconds have bitrates of 155 kbps and then they drop to 130 kbps and remain steady at that bitrate.
 JohnV: have you checked the Vorbis bitrate distribution? Isn't it giving high bitrate all the time almost?
 sebastian-mares: So it's not like nero encodes first part with 192 kbps and the second one with 64 kbps making an avg. of 128.
 JohnV: sebastian-mares: not true
 JohnV: it's not like that at all
 JohnV: that's a huge exaggeration
 sebastian-mares: What is a huge exaggeration?
 sebastian-mares: Could you kick Ivan?
 JohnV: the quality drop is _minimal_ for most people and it's maybe noticeable in hard to encode samples at the moment for some.
 JohnV: it's not at all any 192 vs 64
 sebastian-mares: I know.
 sebastian-mares: That's what I said.
 JohnV: the difference is minimal
 sebastian-mares: But you said it can be under-rated, which is not true.
 sebastian-mares: Since after the 155 kbps part, Nero encodes at 130 kbps which is what it continues to encode to over the whole file.
 JohnV: well in a way it can, especially after this is fixed. :)
idimkovic2: honestly
idimkovic2: the problem is
idimkovic2: what we are talking here is bitrate deviation
idimkovic2: but it would be more important to check QUALITY deviation
idimkovic2: because
idimkovic2: people are not rating bitrate plots
idimkovic2: they are rating quality
JohnV: right
idimkovic2: bitrate deviation had a bug
sebastian-mares: Yes, but in this case, there is a sudden drop in quality, too.
idimkovic2: this is what IS questionable
sebastian-mares: Otherwise, Guru wouldn't have noticed it.
idimkovic2: not really
idimkovic2: Guru is an extremely well-trained listener
JohnV: the tracks have certain quality, and people are gonna judge it. That is the point.
sebastian-mares: Yes.
idimkovic2: I am quite sure he would pick up 1 or 2 kbps difference
idimkovic2: but this is not the point
idimkovic2: the point is
idimkovic2: would THIS change the results with statistical significance
idimkovic2: I honestly don't think so
idimkovic2: there is absolutely no way
idimkovic2: because - if that was the case, this would have been spotted a long time ago
JohnV: this is the way the track sounds with this bug. After the fix the reservoir won't starve and it will also give hard-to-encode parts more bits
idimkovic2: so, at the end of the day
idimkovic2: it will just sound better (I hope ;)
idimkovic2: allocating more where it SHOULD
idimkovic2: @Sebastian - did you get in touch with Francis
sebastian-mares: Yes, I am wondering why he's not appearing.
sebastian-mares: Anyways, what I wanted to say is that it's natural that the quality is going to drop.
sebastian-mares: Since Nero is using the reservoir for a reason.
sebastian-mares: If it wasn't drained, it would've used it for the rest, too.
JohnV: With this bug the bitrate stays around 130kbps which gives very good quality on most material, for very hard to encode attacks etc. of course higher bitrate helps, and it will be like that.
idimkovic2: not really
idimkovic2: it would drop
idimkovic2: if and only if
idimkovic2: the bitres goes to zero
idimkovic2: and it is not the case
idimkovic2: from my work today - it was a bug where it went from 500K to ~10
idimkovic2: and then up and down
idimkovic2: which is enough for most of the material
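As a back-of-the-envelope check, the reservoir figures can be related to the bitrate figures mentioned earlier in the discussion. This is a toy model, not a description of Nero's actual rate control: it assumes "500K" means 500 kbit, the floor is 10 kbit, the encoder spends 155 kbps against a 130 kbps steady rate, and the overspend is constant.

```python
def seconds_until_drained(start_kbit, floor_kbit, spend_kbps, target_kbps):
    """Time for a reservoir to fall from start to floor at a constant overspend."""
    overspend = spend_kbps - target_kbps  # kbit drawn from the reservoir per second
    if overspend <= 0:
        return float("inf")               # spending at or below target never drains it
    return (start_kbit - floor_kbit) / overspend


# Assumed values: 500 kbit start, 10 kbit floor, 155 kbps spent vs 130 kbps target.
t = seconds_until_drained(500.0, 10.0, 155.0, 130.0)
print(f"reservoir reaches its floor after about {t:.1f} s")
```

With these assumed numbers the drain time comes out just under 20 seconds, which is at least consistent with the "first 10 to 20 seconds" of elevated bitrate described in the chat.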
sebastian-mares: So you claim that the bitrate drops, but the quality stays the same?
idimkovic2: no
sebastian-mares: How should I understand it then?
idimkovic2: bitrate does drop, but still having buffering capabilities for minimal impact on quality
idimkovic2: AAC starts to suck if the bit reservoir is drained
idimkovic2: so to put it this way - at the beginning where there is a bug
sebastian-mares: Second...
idimkovic2: if the frame is overcoded - that won't be noticed as improvement by many
idimkovic2: only maybe by Guru and a few other people - because it was >above the JND<
idimkovic2: so if you overcode above the JND - that is more or less useless work :(  and this is a pity
sebastian-mares: http://foobar2000.net/divers/tests/2005.12/AAC_Nero_Digital.png
idimkovic2: point is
sebastian-mares: After the bitrate drop, there isn't much variation with bitrate.
idimkovic2: quality does not scale with the bit rate linearly
idimkovic2: this is a very wrong idea
idimkovic2: you cannot look at per-second plots
idimkovic2: you must look at individual frame plots
sebastian-mares: And a bitrate drop of 25 kbps must have an impact on quality.
idimkovic2: which is not so significant
JohnV: sebastian-mares: not necessarily
idimkovic2: because average bitrate requirement is around 128-135 kbps
idimkovic2: so when you overcode - impact is much less
idimkovic2: than undercoding it
idimkovic2: less than 128
idimkovic2: that's where noise ABOVE threshold kicks in
idimkovic2: threshold = psychoacoustic threshold
idimkovic2: suppose that you have a tone
idimkovic2: and a masker
idimkovic2: and masker is 10 dB below the tone -> that is JND
idimkovic2: and it requires, say, 10 kbits
idimkovic2: if you overcode - it is still below average human JND
idimkovic2: impact = small
idimkovic2: if you UNDERcode - it gets above JND, impact = big
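The asymmetry described here can be sketched with a deliberately simple model in which only noise above the masking threshold counts as audible. The function name and the noise levels of -16 dB and -4 dB are hypothetical values picked for illustration; real psychoacoustic models are far more involved.

```python
def audible_excess_db(noise_db, threshold_db):
    """Noise above the masking threshold, in dB; zero when the noise is masked."""
    return max(0.0, noise_db - threshold_db)


threshold = -10.0  # threshold 10 dB below the tone, as in the example above
cases = [("overcoded", -16.0), ("at JND", -10.0), ("undercoded", -4.0)]  # hypothetical levels
results = {label: audible_excess_db(noise, threshold) for label, noise in cases}
for label, excess in results.items():
    print(f"{label:>10}: {excess:.1f} dB above threshold")
```

In this toy model, pushing the noise further below the threshold changes nothing audible, while the same step in the other direction produces an audible excess.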
idimkovic2: now, average music material bitrate requirement for AAC is well around 128 kbps
JohnV: <sebastian-mares> And a bitrate drop of 25 kbps must have an impact on quality. <- if there is, then we should see it in the results. And it's our bad.
idimkovic2: so overcoding at the beginning means something maybe only to very golden ears
idimkovic2: for the rest of the world.... I seriously doubt that they would notice
idimkovic2: and if you want proof about variations
idimkovic2: ask Guru to use some frame plotting tool
idimkovic2: to see >frame< bit distribution
idimkovic2: if the frame bit distribution was fixed to 3000 bits/frame (128 kbps) -> we would be screwed badly
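The "3000 bits/frame" figure is a rounded value. A quick calculation, assuming AAC-LC frames of 1024 samples and 44.1 kHz source material (the sample rate is an assumption, since the chat does not state it), gives the per-frame budget at a fixed bitrate:

```python
SAMPLES_PER_FRAME = 1024  # AAC-LC frame length in samples per channel


def bits_per_frame(bitrate_bps, sample_rate_hz=44100):
    """Average bit budget per frame at a constant bitrate (44.1 kHz assumed)."""
    return bitrate_bps * SAMPLES_PER_FRAME / sample_rate_hz


cbr = bits_per_frame(128_000)
print(f"128 kbps -> {cbr:.0f} bits per frame")
```

A fixed budget of this size for every frame is the case described above as harmful; the bit reservoir exists so that individual frames can deviate from it.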
idimkovic2: this whole argument
idimkovic2: would make much more sense at 96 kbps
idimkovic2: or 80
idimkovic2: where such overcoding would have impact
idimkovic2: but later surely people would indeed notice drop
sebastian-mares: Second...
idimkovic2: and, if we had fixed frames later - hehe boy people would hear that :)
idimkovic2: ok
sebastian-mares: Trying to find out what's wrong with Guru...
*** idimkovic has signed off IRC (Connection timed out).
sebastian-mares: OK, no idea what's up.
idimkovic2: doh nobody is commenting my IS samples ;-)
sebastian-mares: My main concern is that Nero performs differently when encoding samples than when encoding full tracks.
JohnV: it's Christmas day ffs :) I've been busy all day with my god son. :B
idimkovic2: hahaha
idimkovic2: @Sebastian - thing is I don't believe this bug affects scores significantly
idimkovic2: if it did
sebastian-mares: Well, it does affect them.
idimkovic2: it would be surely not fair
idimkovic2: but not significantly
JohnV: sebastian-mares: well, the same difference should then be audible in the samples also. They are long enough
sebastian-mares: I mean, it might.
idimkovic2: I would propose to announce this bug
idimkovic2: I have nothing against that
sebastian-mares: And that is why I think the best solution would be not to remove Nero entirely like Darryl suggested, but separate from the others.
idimkovic2: but indeed I think it would be too harsh to remove Nero
sebastian-mares: Not sure if that is wise.
sebastian-mares: I mean, announce the bug now.
idimkovic2: not now
sebastian-mares: It will change how the test is run.
idimkovic2: at the end of the test
sebastian-mares: Yeah.
JohnV: sebastian-mares: I don't understand this removing.. The track is there, it has the "bug". People are supposed to rate it.
idimkovic2: no no
idimkovic2: for sure, that would be very bad
idimkovic2: but as for the final results
idimkovic2: I would not remove Nero
idimkovic2: but anyway, it is your call
sebastian-mares: I wouldn't either.
sebastian-mares: And I talked to Guru about this.
sebastian-mares: He has no problem with keeping Nero in.
JohnV: the thing is, this bug takes effect in these samples. If it's significant, people are supposed to notice it. The task is to judge the samples.
sebastian-mares: Anyways, since Tukey's HSD is calculated with Nero, there is no difference if I include Nero or not.
idimkovic2: in which case >we< would be in the lose
sebastian-mares: I mean, it won't affect final rankings.
idimkovic2: not anyone else
sebastian-mares: Not really...
sebastian-mares: You fail to see one thing...
JohnV: It would be different if the bug wouldn't take effect in these samples, BUT IT DOES.
idimkovic2: well as soon as the bugfix is tested
sebastian-mares: If people listen to the first few seconds. They listen to Nero, AoTuV, LAME.
idimkovic2: how do you know if they do that?
idimkovic2: that is a big assumption
sebastian-mares: Same as the assumption that there is no impact.
JohnV: sebastian-mares: how do you know how people listen? They are supposed to judge the quality of the full track, and find artifacts from the full track.
sebastian-mares: They rate AoTuV which encodes in the same way, regardless if the material is a sample or a full track.
sebastian-mares: Then LAME which is similar to AoTuV.
idimkovic2: the assumption that there is no impact is based on the rather common sense
sebastian-mares: Then they listen to Nero which has a bonus of a high bitrate.
idimkovic2: there is no point in overcoding something that is transparent for 99% of people
idimkovic2: it will >still< be transparent
idimkovic2: it will be a waste of space for us
idimkovic2: because we could use those bits somewhere else
idimkovic2: but , like we discussed, people do not find difference in the second part of the track
idimkovic2: what is the point then?
sebastian-mares: But what if the second part that is encoded with 130 kbps and not 155 kbps like the first 10 to 20 seconds is NOT transparent but people didn't test?
sebastian-mares: Also, consider this...
idimkovic2: I would 
idimkovic2: but seriously I don't believe that is the case
idimkovic2: and also
sebastian-mares: In my samples, the difficult part is in the middle or at the beginning most of the time.
idimkovic2: that would be found out in the pre-test
JohnV: Imo if it's assumed that people only listen to the first 2-3 seconds, we can quit these tests altogether.. because then important artifacting will be missed.
idimkovic2: hmmm
sebastian-mares: So in the spot where Nero still has a higher bitrate.
sebastian-mares: In a full track encoding, the difficult part is no longer in that spot.
idimkovic2: I don't think "hardness" can be judged by that
idimkovic2: http://foobar2000.net/divers/tests/2005.12/Vorbis.png
sebastian-mares: Yes.
sebastian-mares: Did you look at the EML I linked to on my home server?
sebastian-mares: It has a PNG showing the behavior sample vs. full track.
idimkovic2: yes I did
idimkovic2: but, like I said
idimkovic2: overcoding won't bring quality impact for 99% of the people
idimkovic2: and the rest of the track still had bit reservoir and 128 kbps
idimkovic2: so it would be very very very hard for serious quality drop
sebastian-mares: What about the beginning?
idimkovic2: well I already said
sebastian-mares: Doesn't it have 155 kbps and reservoir?
idimkovic2: it was overcoded
idimkovic2: it has 128 kbps and bit reservoir ;)
idimkovic2: but if it was overcoded
idimkovic2: there was absolutely no point
idimkovic2: except, for Guru
idimkovic2: and few other people 
idimkovic2: and STILL 
sebastian-mares: But dude...
sebastian-mares: Let me speak out, please.
idimkovic2: I doubt even Guru would rate the second part as bad
idimkovic2: ok
sebastian-mares: In my samples, the complicated part is usually in the middle or towards the beginning. That is, the complicated part is in the range where the bitrate is higher than the rest. On a full track encoding, the complicated part is no longer in a range of 155 kbps, but in the range where Nero allocates ~128 kbps so the quality is worse.
idimkovic2: well according to the VBR plots from Vorbis
idimkovic2: and for Vorbis we can be pretty sure it is pure VBR
sebastian-mares: Guru posted some ABX logs where he managed to find difference between the same part - one time encoded as 20 seconds clip, and one time encoded as full track and then extracted.
sebastian-mares: So there IS a difference between the two modes.
idimkovic2: the demands are not centered
sebastian-mares: Whereas for Vorbis, there isn't.
idimkovic2: there IS of course, I didn't say there is not
sebastian-mares: They are encoded in the same way because Vorbis is VBR.
idimkovic2: hold a sec
idimkovic2: what I claim with Vorbis - is that complexity was not centered at some specific place
JohnV: right, that can be soon from Vorbis' plots
idimkovic2: which can be seen from its bit rate distribution, which is purely quality based
JohnV: *seen
idimkovic2: and also I don't claim there is no difference with Nero encodings
idimkovic2: I just am pretty sure this difference is far from significant
JohnV: yep
sebastian-mares: Have a look at this FYI:
sebastian-mares: E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 directly encoded as short sample) [WMAPro].wav vs E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 extract from full encoding) [WMAPro].wav
sebastian-mares: 9:26:22 PM f 8/16 pval = 0.598
sebastian-mares: So, pretty much guessed with WMA.
sebastian-mares: Means that they are more or less the same.
idimkovic2: say what
idimkovic2: why don't we give this to 10 people
idimkovic2: and ask them the same :)
idimkovic2: would be very good to check
sebastian-mares: E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 extract from full encoding) [Nero Digital].wav vs E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 directly encoded as short sample) [Nero Digital].wav
sebastian-mares: 9:22:40 PM p 14/16 pval = 0.0020
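The two p-values in these ABX logs are consistent with a one-sided binomial test against pure guessing (probability 0.5 per trial). The sketch below assumes that is how the ABX tool computed them; the function name is illustrative.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value: chance of at least `correct` hits by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


p_wma = abx_p_value(8, 16)    # WMA Pro result from the log
p_nero = abx_p_value(14, 16)  # Nero Digital result from the log
print(f"8/16:  p = {p_wma:.3f}")
print(f"14/16: p = {p_nero:.4f}")
```

The first value matches the logged 0.598. The second comes out at about 0.0021, against the logged 0.0020, which suggests the tool truncates rather than rounds.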
sebastian-mares: Another picture.
sebastian-mares: When to do so?
idimkovic2: well - Guru did find difference, I think we all agree on that
sebastian-mares: Do it now?
sebastian-mares: Could harm the test.
idimkovic2: maybe Guru can encode some independent samples
sebastian-mares: Do it later?
sebastian-mares: Too late.
idimkovic2: well at the end of the day - I think it is up to you ;)   I just think it would be good to check that difference, I don't really think it was significant at all
sebastian-mares: Guru is active on HA's PM center...
JohnV: right.. that's why imo it is best that people just rate the samples. The effect is there. Let the group rate the samples. We can't really start assuming how some group will do the testing, i.e. just 2-3 secs or something.
sebastian-mares: Wondering what he's up to. :-P
idimkovic2: :)
sebastian-mares: Well, samples are group-rated anyways.
idimkovic2: well I would propose to keep Nero in the table... if you think it is needed, put some *   and write down that it had a bug - and then let's do verification - we would have bugfixed version by then
sebastian-mares: The idea was just to separate Nero's results.
sebastian-mares: Like, move it far right.
idimkovic2: after the test
idimkovic2: we can for sure compare
idimkovic2: bugfixed version 
idimkovic2: vs non bugfixed
sebastian-mares: And writing something like "We are not sure about XYZ because 123".
JohnV: "we are not sure if these results are valid, because we think people only listen to the first 2-3 seconds"?
idimkovic2: :-) oh my
sebastian-mares: ...
idimkovic2: well I am not sure about positioning
idimkovic2: and where to put it
idimkovic2: I seriously don't think there is a considerable impact - but still,  if you think it is needed... well
sebastian-mares: Did Guru write you a PM?
idimkovic2: let me check
idimkovic2: no
sebastian-mares: Crapola.
sebastian-mares: I am wondering what he's doing.
sebastian-mares: Would've been much better for him to be here.
sebastian-mares: In this way, I am telling you guys what he told me and I am telling him what you told me.
idimkovic2: yeah
sebastian-mares: And during this process, information can be altered.
idimkovic2: anyway 
sebastian-mares: Which sucks.
idimkovic2: let's also make sure >everything is god damn tested< before the next test, please
idimkovic2: I know it has been only a week before this one
sebastian-mares: Yeah.
idimkovic2: but it is a pity these things pop up now
sebastian-mares: Ah, Guru is back in PM center.
sebastian-mares: :-P
idimkovic2: heh :)
idimkovic2: anyways
idimkovic2: Juha did get new version today (jeez... I have to stop working weekends ;)
idimkovic2: if it is OK - @Juha please then send it to Guru and Seb
idimkovic2: I did frame utilization plots today
sebastian-mares: What do you think about the idea of asking people to compare two encodings, but not mentioning why.
idimkovic2: and they are equal for #41 - 7 minute track
sebastian-mares: What did you use?
idimkovic2: idea is OK
sebastian-mares: Guru used matroska or something.
sebastian-mares: :-B
idimkovic2: maybe you can encode few long tracks
idimkovic2: and ask people to compare them
*** gURuBoOleZZ has joined #nda.
idimkovic2: e.g. few full songs
sebastian-mares: YAY!
idimkovic2: Hey Guru :)
gURuBoOleZZ: it was long
sebastian-mares: Guru, you missed the whole discussion. lol
JohnV: hi hi
sebastian-mares: Now, let's start over.
gURuBoOleZZ: OK, back on TV. Bye
sebastian-mares: Nah!
idimkovic2: again?!?!?!? :-)
sebastian-mares: I am getting some sweets... ^^
idimkovic2: @Guru - we did have quite a hot discussion 
gURuBoOleZZ: So I'm too late?
idimkovic2: anyway
sebastian-mares: Well a bit, but we can summarize it for you.
idimkovic2: well - first thing is
idimkovic2: it is a confirmed bug
gURuBoOleZZ: it would be nice :)
idimkovic2: and I did make a fix today along with the IS thing I'm posting on HA
idimkovic2: Juha is testing it
idimkovic2: -- Sebastian can tell you the rest ? :)
sebastian-mares: Oh no... I cannot even eat my cookies in peace...
sebastian-mares: Oh well.
gURuBoOleZZ: I guess that the debate was about 3.1.0.2
sebastian-mares: Yes.
gURuBoOleZZ: not the IS story
idimkovic2: yes
gURuBoOleZZ: OK
gURuBoOleZZ: The problem is that samples used in the test are... problematic
sebastian-mares: So, in one sentence, Ivan and Juha think that the problem doesn't have such a huge impact that Nero should be removed from the test.
gURuBoOleZZ: go on. I'll follow when you finish
idimkovic2: because the undercoding does not happen under the 99% user JND at 128 kbps
idimkovic2: and overcoding at the begin
idimkovic2: could have impact only to very small group of people
idimkovic2: like... you :)
sebastian-mares: Like you for example.
sebastian-mares: hehe.
idimkovic2: because if you overcode above the JND
gURuBoOleZZ: sorry, what's JND?
idimkovic2: for most people it is still "transparent"
idimkovic2: Just Noticeable Distortion
idimkovic2: threshold of audibility of the noise
idimkovic2: now 
idimkovic2: the bug with 3.1.0.2 was
idimkovic2: one parameter in the function was wrong (heh it is always the case, ask developers 
idimkovic2: ;)
idimkovic2: and the bit reservoir was drained to approx 10-15 kbits
idimkovic2: (which is still a bit larger than CBR bit reservoir)
idimkovic2: instead of not being drained
idimkovic2: anyway
idimkovic2: that produced overcoding at the beginning
idimkovic2: but - I still doubt that is really audible afterwards for the huge majority
idimkovic2: let alone significant for the ITU scoring
idimkovic2: because even the rest gets 128+ kbps
idimkovic2: with bit reservoir
idimkovic2: what I do propose
idimkovic2: when we release the fix
idimkovic2: to compare "fixed" vs  3.1.0.2 on a large track
idimkovic2: not telling why of course
idimkovic2: and see the difference on 5-6-7 people
idimkovic2: is it significant or not
idimkovic2: okay that's all from me ;)
sebastian-mares: The reason for this is...
sebastian-mares: I told Ivan and Juha that for my samples, the complicated part of a song is usually in the range where Nero allocates 155 kbps.
idimkovic2: quick comment: but we disagree on that ;)
sebastian-mares: On a whole track however, the complicated part is where Nero allocates the normal ~130 kbps.
sebastian-mares: Where we disagree is on the question whether or not that is significant.
idimkovic2: also on the complexity
sebastian-mares: Ivan and Juha claim that it is not significant for the large majority.
idimkovic2: I honestly think it is not true - judging from the Vorbis VBR plots
idimkovic2: (for Vorbis we can be sure it is Quality-VBR)
sebastian-mares: Still there Guru? :-B
idimkovic2: we bored him :)
idimkovic2: anyway - vorbis plot ;)   http://foobar2000.net/divers/tests/2005.12/Vorbis.png
gURuBoOleZZ: yes, but it's not easy to follow a debate when two people are writing in the same time
sebastian-mares: Ah, OK.
gURuBoOleZZ: finish?
idimkovic2: me : yep
gURuBoOleZZ: seb too?
sebastian-mares: Ivan says that we should let more people compare a full track encoding against a sample encoding and see if there is an audible difference for the majority.
sebastian-mares: If I got him right.
idimkovic2: yep
JohnV: if Vorbis bitrate distribution followed Sebastians thinking about the complexity of the tracks, it would go down. It doesn't.
sebastian-mares: Huh?
sebastian-mares: I don't get it.
gURuBoOleZZ: Let's leave the "complexity" question for the moment
gURuBoOleZZ: IMO, samples are not just "complex" on the beginning and quieter on the end
gURuBoOleZZ: Some of them --maybe most-- are as complex at the end
JohnV: you can see that also from the Vorbis bitrate distribution
idimkovic2: which would make my overcoding argument even more important
idimkovic2: useless waste of bits for most people
gURuBoOleZZ: Not only Vorbis bitrate distribution, but also LAME and iTunes
JohnV: Vorbis is for sure totally quality based VBR. And very good
JohnV: that's why I mentioned Vorbis.
gURuBoOleZZ: LAME too I'd say
gURuBoOleZZ: iTunes less (it doesn't go below 128 kbps)
sebastian-mares: BTW, in case anyone is interested in a bitrate table for iTunes, LAME, Nero and AoTuV: http://maresweb.homeip.net/listening-test/bitrate%20table.htm
gURuBoOleZZ: About the test, the issue and the way to handle this...
JohnV: imo LAME is not that good as Vorbis.. ;)
gURuBoOleZZ: I don't consider it pertinent to make an additional test to see if the problem is audible or not for a majority
gURuBoOleZZ: Just for example, we could put LAME -V4 or even -V2 instead of -V5
gURuBoOleZZ: and see if a majority of people can make a difference
gURuBoOleZZ: between -V2 and -V5
gURuBoOleZZ: If they can't, it would be (according to your logic) perfectly acceptable to substitute -V5 with -V2...
gURuBoOleZZ: ...in a 135 kbps listening test
gURuBoOleZZ: Of course it isn't
sebastian-mares: Second, I think you got something wrong...
sebastian-mares: The problem was to find out if there is a difference (quality wise) between the first part of a song and the second.
sebastian-mares: The first part being over-encoded and the second part being encoded the "normal" way.
gURuBoOleZZ: I understand this
idimkovic2: well in this case it is not "normal" way - it got less bits than it >could<
idimkovic2: if the bit reservoir bug was not there
sebastian-mares: One second folks...
idimkovic2: okay
idimkovic2: just quick info - I think it is very suboptimal to look at "bps" plots - individual frame plots are more precise way of distributing bits
sebastian-mares: Since both Juha and you admit that there is / was a bug in the Nero encoder, why don't we simply do what we wanted to do - include Nero in the plot, but separate it from the rest.
sebastian-mares: Since it was a bug and not a non-tuned psymodel for example.
idimkovic2: well if you think it is a good idea - I think the difference is rather too small for that, but like I said, it is up to you
idimkovic2: I think the difference would not require such harsh measure
gURuBoOleZZ: 1.5 ITU point difference with one sample is not something "small"
gURuBoOleZZ: 1.5 = my difference with sample 10
idimkovic2: with one listener, though 
gURuBoOleZZ: And?
idimkovic2: and very exceptional one
JohnV: that's hard to encode attack sample
idimkovic2: okay - but would it statistically change the results
gURuBoOleZZ: to know this, there's only one accurate way
gURuBoOleZZ: restart a new test, with the same samples and the same listeners
gURuBoOleZZ: and the same competitors
gURuBoOleZZ: The rest is very...
JohnV: if 1.5 is for Guru on a very hard attack sample.. Well.. I don't think it is statistically significant overall
gURuBoOleZZ: well, i don't find the word
gURuBoOleZZ: alchemy?
gURuBoOleZZ: The problems are not "attack"
idimkovic2: huh I don't think it is practical at this point
gURuBoOleZZ: not smearing at least
idimkovic2: if it was discovered 2-3 days in the test
JohnV: sample 10 is attack sample, and at least for me the problems there comes from short blocks
gURuBoOleZZ: yes, but harpsichord is a weird instrument
idimkovic2: it is
idimkovic2: !
idimkovic2: :)
JohnV: but the problems come from short blocks
gURuBoOleZZ: it defeats several lossy encoders and even replaygain
gURuBoOleZZ: there's a problem with the sound located between the attacks
JohnV: but during short blocks anyway
gURuBoOleZZ: what I call "distortion", "tremolo" or trembling effect or sometimes brilliance
JohnV: I know what you mean
gURuBoOleZZ: I'm not sure that smearing or pre-echo is responsible for that
gURuBoOleZZ: not the only cause at least
idimkovic2: it can be draining between short blocks
JohnV: it's because of short blocks anyway
idimkovic2: usually if they eat too much from the bit reservoir,  longs in between would have a problem
gURuBoOleZZ: But sample 10 is not the only one involved. I detect 5 times this problem during the test
gURuBoOleZZ: And I wasn't careful during the 9 first ones
idimkovic2: how do you rate this problem compared with WMA issue?
gURuBoOleZZ: WMA issues may be clearly worse
gURuBoOleZZ: Nero Digital ABR are more subtle
idimkovic2: @Sebastian - this is what I was thinking, too
sebastian-mares: I never said they are the same.
gURuBoOleZZ: But WMA score could benefit or be handicapped by 2-pass method used on short samples
sebastian-mares: I was referring to the fact that both are not representative in real world usage.
gURuBoOleZZ: On the contrary, Nero Digital encodings are always taking profit
gURuBoOleZZ: or at best the effect is inaudible
sebastian-mares: Because both produce different output when fed with different material type (sample vs. full track).
idimkovic2: I would say the second
idimkovic2: effect is inaudible
JohnV: if this is statistically significant, shouldn't all testers in principle notice the same as Guru?
idimkovic2: because, excuse me - if iTunes was always higher bit rate than nero
idimkovic2: it always had more bits
idimkovic2: now if there was something clearly wrong - we would be handicapped a lot and that would be very noticeable
sebastian-mares: What do you mean with iTunes always had a higher bitrate?
sebastian-mares: file:///C:/Server/Freigabe/listening-test/Bitrate%20Table.htm
gURuBoOleZZ: effect is not inaudible when someone noticed it blindly on 5 samples
gURuBoOleZZ: without being informed of this problem
sebastian-mares: Sorry...
gURuBoOleZZ: pure discovery
sebastian-mares: http://maresweb.homeip.net/listening-test/bitrate%20table.htm
JohnV: true, then more people should notice it if it's significant, and judge the score accordingly
gURuBoOleZZ: This is guessing
gURuBoOleZZ: You are probably right
idimkovic2: hmm Seb sorry
idimkovic2: I was still thinking of some HA post
idimkovic2: it was wrong then
gURuBoOleZZ: but between "most" people and "all" people, there's a difference
gURuBoOleZZ: the test is summing the results of all participant, not only the weakest ones
gURuBoOleZZ: by weakest, I mean the less discriminating
gURuBoOleZZ: such people couldn't make any difference between -V5 and -V0
JohnV: the main point is: the samples' complexity is pretty much the same throughout, and if testers find something wrong in the samples, they will score accordingly.
gURuBoOleZZ: but it's not a reason to use the wrong LAME setting
gURuBoOleZZ: either we test the good and valid setting, or we don't test anything
gURuBoOleZZ: but we can't make a mistake, and then ask for a second round to see if a majority of people can hear a difference between the flawed setting and the right one
gURuBoOleZZ: what would you do if 2 persons are able to make a difference and if 8 can't
gURuBoOleZZ: ?
idimkovic2: hmmh well it all goes down to the question - is this bug significant for separating Nero from the overall results
JohnV: people rate the sample worse, if they hear some problems in it. In all our samples there is potentially less quality after about the half of track, than there could be. Which means in principle they should rate the samples worse than after the fix.
gURuBoOleZZ: So what you suggest is to ask to all participating people to compare 18 samples again with oldnero and newnero?
gURuBoOleZZ: newnero = fixednero
gURuBoOleZZ: oldnero = the tested one
JohnV: no, I'm saying that in principle if people do the test as they should, we will be rated worse now, than after the fix.
idimkovic2: well I proposed to leave results and put the (*) mark and note the bug, and then also post the samples with the new encoder
idimkovic2: I think new samples could only be better but still...
JohnV: people lower score when they hear worse quality..
gURuBoOleZZ: Ivan, you can't let Nero in the final test if a problem of representativity was found
gURuBoOleZZ: problem leading to a difference in sound quality
gURuBoOleZZ: *proved* difference
gURuBoOleZZ: even by someone exceptional
sebastian-mares: If I understood correctly, Juha says that if people really noticed a quality drop, they would've rated Nero according to that.
gURuBoOleZZ: the difference exists
idimkovic2: thing is, Francis - I honestly do not believe the problem is significant enough for such a measure
gURuBoOleZZ: Here is how I rated the incriminated samples:
sebastian-mares: And he says that we cannot decide which encoders to throw out and which not based on assumptions that the testers only listened to the beginning of the track.
JohnV: sebastian-mares: sure. We have the track there. And after the half of it, it has lower bit reservoir than it should. People rate the codecs based on the problems they hear..
gURuBoOleZZ: part A = x.xx and part B = y.yy. Final note = (X+Y)/2
idimkovic2: so if the problem is there, it will be rated
gURuBoOleZZ: This is what you think. The fact is that Nero benefits from an unbalanced bitrate distribution
idimkovic2: well I think that is not really true 
gURuBoOleZZ: Yes, the problem is rated, but the impact on mark is limited
idimkovic2: because it is a bug
idimkovic2: it over-allocates where it should not
idimkovic2: so there is a difference in noise/mask ratio which is not really the best thing you need
gURuBoOleZZ: there's a big, therefore the problem is true
gURuBoOleZZ: a bug - sorry
gURuBoOleZZ: the over-allocation has a clear impact on sound quality with some samples
idimkovic2: yes
idimkovic2: but it also means under-allocation
idimkovic2: somewhere else
gURuBoOleZZ: it increases the encoding quality compared to a true (full) encoding with the same encoder
idimkovic2: well if you over-allocated somewhere
idimkovic2: you have to under-allocate somewhere else
idimkovic2: for example - new encoder Juha has
gURuBoOleZZ: There's no under-allocation. Not before 90 seconds
gURuBoOleZZ: curves are showing this
idimkovic2: well it is under-allocation Francis
idimkovic2: because if there was no bug
JohnV: but the whole track has, after about half, a lower bit reservoir, and thus in principle lower quality (maybe not always audible), but if it's there, people should rate the sample based on the problems they hear.
idimkovic2: these bits would be allocated in the perceptual way
idimkovic2: I can prove that
idimkovic2: with the new encoder
idimkovic2: usually - it goes like this
idimkovic2: frame_bit_demand = PE * bit_Reservoir_grant()
gURuBoOleZZ: BTW, if there's a under-allocation we would get:
idimkovic2: so - in case of bug
idimkovic2: you get results not scaled with PE
gURuBoOleZZ: nero samples with over-allocation + under-allocation
gURuBoOleZZ: right?
idimkovic2: I would call that - allocation without regard to perceptual demands
idimkovic2: not scaled
idimkovic2: because
idimkovic2: in ABR mode
sebastian-mares: Ivan, but how come Nero maintains the same bitrate after the first part?
idimkovic2: well like this
idimkovic2: ABR works like following
sebastian-mares: If it under-allocate, it would mean that the bitrate must go down towards the end or whatever.
idimkovic2: you have constant bit rate - so each frame >could< have 3000 bits/frame
idimkovic2: which equals to 128 kbps
idimkovic2: ok?
idimkovic2: and the rest is bit reservoir
sebastian-mares: Allocating 32 kbps too much in the beginning, but then also too few at the end.
idimkovic2: which is 500 kbits 
idimkovic2: no - let me explain please :)
sebastian-mares: OK
gURuBoOleZZ: http://foobar2000.net/divers/tests/2005.12/8_full_tracks.png
idimkovic2: 500 kbits = 3.90625 seconds of pre-buffering
idimkovic2: so it has 128 kbits + 500 kbits of bit reservoir + X  addition to the bit reservoir which is 2.5% of the total size
idimkovic2: okay?
idimkovic2: so each frame could >always< be 300 bits/frame
idimkovic2: 3000 sorry
idimkovic2: and the bit reservoir status is exactly unchanged
idimkovic2: if the frame is larger than 3000 bits - it has to be taken from bit reservoir
idimkovic2: if it is smaller - it goes TO bit reservoir
idimkovic2: this is how it works
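The reservoir bookkeeping Ivan describes can be sketched in Python. This is only an illustration under stated assumptions, not Nero's actual code: the 1024-sample frame length, the 44.1 kHz sample rate, the rule that a frame can never borrow more than the reservoir holds, and the names `mean_frame_bits` and `run_abr` are all assumptions. The 128 kbps rate, the 500 kbit reservoir and the roughly 3000 bits/frame figure come from the discussion.

```python
SAMPLES_PER_FRAME = 1024  # assumed AAC frame length; not stated in the log
SAMPLE_RATE_HZ = 44100    # assumed sample rate; not stated in the log


def mean_frame_bits(bitrate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bits one frame may use without touching the reservoir."""
    return bitrate_bps * SAMPLES_PER_FRAME / sample_rate_hz


def run_abr(demands, budget, capacity, level):
    """Track the reservoir level for a list of per-frame bit demands.

    A frame larger than `budget` borrows from the reservoir; a smaller
    one pays the unused bits back. A frame can never borrow more than
    the reservoir currently holds (hypothetical simplification).
    """
    sizes, levels = [], []
    for demand in demands:
        size = min(demand, budget + level)
        level = min(capacity, level + budget - size)
        sizes.append(size)
        levels.append(level)
    return sizes, levels


if __name__ == "__main__":
    # Per-frame budget at 128 kbps, close to the "3000 bits/frame" quoted.
    print(round(mean_frame_bits(128_000)))
    # Seconds of buffering that 500 kbits represent at 128 kbps.
    print(500_000 / 128_000)
    # Half-full reservoir: the 4000-bit frame gets everything it asks for.
    print(run_abr([3000, 4000, 2000], 3000, 500_000, 250_000))
    # Empty reservoir (illustrative): the first 4000-bit frame is capped.
    print(run_abr([4000, 2000, 4000], 3000, 500_000, 0))
```

In the second run the first demanding frame is held to the per-frame budget because nothing is left to borrow, which is the kind of shortfall the participants call under-coding.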
idimkovic2: now - if you need this, I can provide FAAD2 modified to print out individual frame bit demand
idimkovic2: and you can see for sure we do have frames less than average
idimkovic2: otherwise it would be ridiculous
gURuBoOleZZ: I'd like to see a precise analysis of bitrate
gURuBoOleZZ: I posted mine (imprecise)
gURuBoOleZZ: http://foobar2000.net/divers/tests/2005.12/8_full_tracks.png
gURuBoOleZZ: but eloquent
gURuBoOleZZ: you can use foobar2000 to watch the bitrate evolution
idimkovic2: hmm let me check if I have faad2
gURuBoOleZZ: again, not very precise
gURuBoOleZZ: but it confirms the curve
gURuBoOleZZ: posted
idimkovic2: yes but
idimkovic2: even later
idimkovic2: there is still bit reservoir, and frames do get more than 128 kbits 
idimkovic2: only less than at the beginning
idimkovic2: and this is a undercoding 
idimkovic2: because otherwise, bit reservoir should be at the 50% in our implementation
idimkovic2: so it would always have bits to "borrow' if there is a short block
idimkovic2: or something
idimkovic2: in this case - it does not have
idimkovic2: and therefore at some very isolated cases difference is audible
gURuBoOleZZ: Interesting, but it doesn't change the most important point: samples provided in the test are different from what the end-user would get
idimkovic2: indeed
gURuBoOleZZ: over-coding + under-coding doesn't mean fair samples
idimkovic2: but then again I think the impact is not so significant
gURuBoOleZZ: it's even worse than pure over-coding
sebastian-mares: And this is the reason why we want to separate Nero from the other contenders in the plot.
gURuBoOleZZ: by worse: far from a true encoding (true= full + sample extraction)
gURuBoOleZZ: In not so significant, the "so" matters
idimkovic2: well I can't argue about that ,  it is just my remark that marking the Nero with * and stating the bug is enough - but if you guys disagree..
gURuBoOleZZ: Could we tolerate an approximation?
idimkovic2: well of course not
gURuBoOleZZ: if yes, why bother with strict bitrate rules (+/- 10%)?
idimkovic2: it is just the amount of penalty
JohnV: But I don't get it.. The over+undercoding is in effect in these samples. People rate the sample based on artifacting. So if the undercoding is significant, it will be rated.
gURuBoOleZZ: if yes, why bother with confidence interval?
idimkovic2: well bit rate rules are something pretty objective and required
idimkovic2: quality difference of a bug is something else
gURuBoOleZZ: JohnV: it would be rated only if people are evaluating the full sample
idimkovic2: especially if its significance is questionable
gURuBoOleZZ: 30 seconds are very long
gURuBoOleZZ: I rarely do it myself
gURuBoOleZZ: and rather focus my notation on one or two points
JohnV: gURuBoOleZZ: yes and we can't assume that they aren't. Otherwise we could as well stop these test altogether, because artifacts can be anywhere
idimkovic2: @Guru - and you would probably spot the point which is clearly worse
gURuBoOleZZ: This is why I requested to use 5...6 second long sample
gURuBoOleZZ: without believing in it
gURuBoOleZZ: I don't like either when samples are too short
idimkovic2: well if they were too short
idimkovic2: we would not find this problem
gURuBoOleZZ: Ivan> right
idimkovic2: also, I'm repeating myself - please help us test the encoder for the next test
gURuBoOleZZ: I didn't find it when I encoded my 150 samples gallery with the new encoder
idimkovic2: Guru, your help would be very appreciated
gURuBoOleZZ: samples are 10 sec long on average
JohnV: gURuBoOleZZ: I have pretty much the same problem as you.. I was testing hundreds of samples..
sebastian-mares: WTF?
sebastian-mares: Off Topic...
JohnV: most are 10-20 sec max, and I admit I can't listen all samples through.. I usually know the problematic places where to concentrate.
sebastian-mares: I just received an unencrypted result from someone - how is that possible?
idimkovic2: what the hell
sebastian-mares: From "Antonski".
gURuBoOleZZ: someone who set himself the test
sebastian-mares: Well, I can't use that.
JohnV: because if I listen all samples from beginning to end, it would be like 6 hours or something through. :B
idimkovic2: no , not really
gURuBoOleZZ: no of course
sebastian-mares: Have to write to poor dude...
gURuBoOleZZ: JohnV= my sample gallery is 25 minutes long
gURuBoOleZZ: my official sample gallery I'd say
gURuBoOleZZ: It doesn't include all the other ones ;)
idimkovic2: @Guru - anyways Juha will send you the new enc
idimkovic2: as soon as it is verified 
idimkovic2: (that it does not make something nasty)
idimkovic2: I hope this one's ABR won't have this problem
idimkovic2: also - I told to Sebastian
idimkovic2: I plan to do "pre pre test" 
idimkovic2: for HE-AAC - testing Nero ABR, CBR and 2 VBR modes
gURuBoOleZZ: I hope this too, otherwise testing would become pretty difficult to set
sebastian-mares: Well...
idimkovic2: I find it very hard to tell which one is better of those
sebastian-mares: Second...
idimkovic2: okay
sebastian-mares: The day is almost over, we are derailing the discussion and I still don't know what to do exactly because the opinions are split.
sebastian-mares: So...
sebastian-mares: Trying to summarize...
sebastian-mares: Ivan and Juha claim that the difference is not so important and does not affect the overall ratings and therefore Nero should be included in the plot just like other encoders.
sebastian-mares: Juha moreover says that if people noticed a difference, they would've rated the sample according to that.
JohnV: imo the only argument to separate Nero is, that we assume that people don't listen the tracks, but only 5 seconds from the start. Imo in that case if that is true (we can't know really), in future tests it doesn't make any sense to include more than 5-6 second clips.
sebastian-mares: And we cannot decide something based on assumptions that XYZ listened to the first seconds only.
sebastian-mares: Francis on the other hand claims that the fact that the output Nero produces when it's fed with samples is different from the output produced when fed with full tracks justifies splitting the plot.
JohnV: but the separation will be essentially based on the assumption of testers' listening habit.
sebastian-mares: Is that correct so far?
idimkovic2: from my side - yes
gURuBoOleZZ: I have some points to detail, but I'd like to express them without seeing too much lines appearing when I'm writing
gURuBoOleZZ: Is that possible?
sebastian-mares: OK
sebastian-mares: So people, Psss!
sebastian-mares: ^_^
gURuBoOleZZ: Ivan, Juha?
idimkovic2: I'm all ears :)
idimkovic2: eyes... ;)
gURuBoOleZZ: Juha?
JohnV: y
gURuBoOleZZ: in english...
gURuBoOleZZ: ;)
JohnV: y=yes :)
gURuBoOleZZ: y = yes ?
gURuBoOleZZ: ok
gURuBoOleZZ: I'd like to add few comments to JohnV argument
gURuBoOleZZ: which is "if people are hearing the problem, they're rating this in consequence. Therefore, nothing to worry about"
gURuBoOleZZ: Is that OK?
gURuBoOleZZ: Does it correspond to what you mean?
JohnV: in principle, because people lower the score when they hear problems
gURuBoOleZZ: in principle, yes
gURuBoOleZZ: I'm currently the only listener which noticed the problem
JohnV: I wouldn't say "there's nothing to worry about". Because this is a stupid mistake, so of course I worry..
JohnV: I don't take this lightly.. that's what I mean
gURuBoOleZZ: Therefore I can talk about the way I rank Nero digital
gURuBoOleZZ: Usually, I'm rating one or two "interesting" point of the sample
gURuBoOleZZ: A short range of 2..5 seconds
gURuBoOleZZ: The final mark is highly dependent of this precise range
gURuBoOleZZ: But sometimes, especially when the sample is divided in 2 different parts, I'm splitting the notation
gURuBoOleZZ: for example: piano + electric guitar
gURuBoOleZZ: piano get 4 and electric guitar gets 2
gURuBoOleZZ: final notation: (4+2)/2 = 3
gURuBoOleZZ: only when both part have similar length
gURuBoOleZZ: Now Nero
gURuBoOleZZ: FOr Nero there are sometimes 2 different parts
gURuBoOleZZ: which are corresponding to 2 quality level
gURuBoOleZZ: In the case of LesJoursHeureux (the worse for my taste) there's a big difference between both parts
gURuBoOleZZ: wait 3 seconds
JohnV: But, IMO it doesn't mean much how one person rates, the idea of a group test that everything will be averaged by the group, testing methods included, and then we get the score from this.
*** guruboolez has joined #nda.
guruboolez: hello
JohnV: hi
guruboolez: I was disconnected
guruboolez: dial-up
JohnV: [01:04] <gURuBoOleZZ> wait 3 seconds
JohnV: [01:05] <JohnV> But, IMO it doesn't mean much how one person rates, the idea of a group test that everything will be averaged by the group, testing methods included, and then we get the score from this.
*** You're not a channel operator!
guruboolez: I haven't finished
JohnV: I thought you did :B
guruboolez: What I mean is that someone like me, who found a quality rupture, hasn't ended the test with a balanced mark
guruboolez: In my case, The fact that the first part sounded better implies that the final mark is higher than what Nero should get
JohnV: that is true, but this happens in a group
guruboolez: Your point, and Ivan's one, is that other people shouldn't notice the problem
JohnV: no, we don't know it.
guruboolez: and that only one have found this out
JohnV: The samples are there for everybody to be rated
sebastian-mares: Can I say something?
guruboolez: yes on my side
idimkovic2: yes of course :)
sebastian-mares: Cool...
sebastian-mares: What Guru wants to say is...
sebastian-mares: You have a car...
sebastian-mares: The whole car is worth 100 $ for example, but has some golden rims.
sebastian-mares: So you pay 300 $ for it, just because of the rims.
idimkovic2: hehe economist ;)
JohnV: you have to think this from a group perspective.. That is imo what matters.
sebastian-mares: The golden rims = Nero's advantage.
JohnV: not individuals
idimkovic2: well @Sebastian
idimkovic2: what would happen
idimkovic2: if we had fixed bit reservoir - it would be more balanced, and Guru might not find second part worse
idimkovic2: Group might not notice something significant either
sebastian-mares: Yes.
guruboolez: It's only a conjecture
idimkovic2: anyways Juha can send Guru the new encoder
guruboolez: You can assume that group can't make a difference between plain-vanilla vorbis and aoTuV
sebastian-mares: Well, if Nero didn't have the "bonus", people would've rated it worse.
idimkovic2: @Sebastian? Why?
sebastian-mares: Same - if the car didn't have the golden rims, you would've paid 100 $ for it.
guruboolez: and therefore claim that aoTuV isn't better than SVN encoders
idimkovic2: it might score better on some other parts
idimkovic2: bit reservoir is fixed
idimkovic2: it would just be used differently
guruboolez: most people were unable to make a difference between r3mix and preset standard
idimkovic2: not in overcoding something that should not be overcoded
idimkovic2: my point is - fixed version can only perform better if the psymodel is OK which I honestly believe it is
guruboolez: but it never means that "only group matters" and that r3mix is as good as Dibrom's tuning
idimkovic2: and this could be found out in a day or two
JohnV: we don't have any bonus after half of the track, on the contrary..
idimkovic2: ... and the "bonus" at the beginning
sebastian-mares: You do on the samples.
JohnV: we have deficiency after half of the track cause reservoir isn't working
idimkovic2: is wasted IMO because 
JohnV: as it should
idimkovic2: it is not allocated as it should 
idimkovic2: according to psymodel
idimkovic2: now Sebastian will probably ask me
idimkovic2: why then we don't separate Nero :)
sebastian-mares: :-)
guruboolez: If a deficiency occur, we should see it on curves or foobar2000
idimkovic2: well there is deficiency
guruboolez: but it's not the case
idimkovic2: because these bits used at the beginning would be used in all track
JohnV: It doesn't matter for people if it's overcoded if they can't hear a difference anyway, what matters in rating is artifacting and problems.
idimkovic2: let's do a small test (Guru ) -  Juha will send you the new encoder
idimkovic2: so you can check how the bits are spent
JohnV: and we have a deficiency due to this bug after half of the track length
guruboolez: the bits used in the beginning are not missing in the rest
sebastian-mares: Yes, but what if Nero would produce severe artificats at the beginning if the bitrate was not 155 kbps, but 130 kbps?
idimkovic2: @Guru - they are missing
sebastian-mares: artifacts*
idimkovic2: they WOULD be used
guruboolez: curves between short samples and full tracks are showing this
idimkovic2: well that is easy to find out
idimkovic2: all I am saying
idimkovic2: in ABR mode bit reservoir is >fixed<
guruboolez: anyway, the problem is not here
idimkovic2: variable part is dependent on the file size
idimkovic2: how does it get spent... it is up to encoder
idimkovic2: and the preferable way to spend it
idimkovic2: is according to the psymodel demand
idimkovic2: if not - you get overcoding where it should not happen
idimkovic2: and less bits for coding something else
idimkovic2: because we can't go over or under the bit reservoir
idimkovic2: in ABR mode
idimkovic2: it is not VBR
guruboolez: let's stop with technical debate
guruboolez: Sebastian is not asking for an explanation
guruboolez: The samples are benefiting from a massive bonus bitrate, and are maybe handicapped on the end (not proved by any bitrate measurement tools)
guruboolez: Even if the bonus and the handicap would be perfectly balanced, the samples are still invalid
guruboolez: They don't reflect anything in real life
sebastian-mares: Since they don't represent real-world usage.
sebastian-mares: Yeah.
JohnV: the codec is invalid, samples are not. Samples have the same problems as real life tracks.
guruboolez: This is why Nero could only be disqualified
guruboolez: JohnV: no
sebastian-mares: No, because Nero behaves differently.
JohnV: how so?
guruboolez: The encoder is putting the additional bitrate on the beginning of each track
JohnV: the samples illustrate the problem.
sebastian-mares: Dude...
JohnV: the problem was found from these samples
guruboolez: but we haven't tested beginning of tracks, but samples coming from the middle, the end, etc..
sebastian-mares: When 20 seconds from 30 have a high bitrate, 2/3 of the sample is overencoded.
JohnV: looks very real life to me..
sebastian-mares: If 20 seconds of 240 have a higher bitrate, it's something else.
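Sebastian's point about sample length versus track length can be put into numbers. A rough sketch, assuming a two-level bitrate profile with illustrative values of 150 kbps for the boosted opening and 130 kbps afterwards (round figures of the kind quoted in this discussion, not measurements); `average_kbps` is a hypothetical helper.

```python
def average_kbps(total_s, boosted_s, boosted_kbps=150.0, normal_kbps=130.0):
    """Duration-weighted mean bitrate of a clip whose first `boosted_s`
    seconds run at `boosted_kbps` and the rest at `normal_kbps`."""
    boosted_s = min(boosted_s, total_s)
    kbits = boosted_s * boosted_kbps + (total_s - boosted_s) * normal_kbps
    return kbits / total_s


if __name__ == "__main__":
    # 30-second test sample, first 20 seconds boosted.
    print(round(average_kbps(30, 20), 1))
    # 240-second full track, same 20 boosted seconds.
    print(round(average_kbps(240, 20), 1))
```

With these assumed figures the boosted opening lifts the 30-second sample's mean by roughly 13 kbps, but the full track's mean by under 2 kbps.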
guruboolez: Nero sounds better during 20 seconds only
guruboolez: Problem: we have tested these 20 seconds for all 18 samples
guruboolez: What about the major part of full tracks encodings?
guruboolez: which are encoded at 133 kbps
sebastian-mares: It's encoded like the remaining 10 seconds of the samples.
guruboolez: all the time
JohnV: but the samples are not rated by the length, they are rated according to the artifacting
*** gURuBoOleZZ has signed off IRC (Connection timed out).
*** guruboolez has signed off IRC ().
*** gURuBoOleZZ has joined #nda.
idimkovic2: http://foobar2000.net/divers/tests/2005.12/AAC_Nero_Digital.png   in fact I think it is more leaning towards 10 seconds
sebastian-mares: Yes, but if 1/3 of the track is artifacting, it's different than 98% of the track artifacting.
gURuBoOleZZ: 10 seconds is very much
idimkovic2: - run in
idimkovic2: IIRC the ABCHR app was set to start not at the beginnign?
gURuBoOleZZ: Especially when you see that the remaining part is still encoded with 130...133 kbps, thus NO UNDER-CODING
idimkovic2: beginning, sorry
idimkovic2: well
sebastian-mares: 2 seconds + encoder offset.
idimkovic2: depends what you call "under coding"
idimkovic2: it is under coded compared to the total average bit rate
idimkovic2: which was, IMHO, within the limits
idimkovic2: and in the lower end
gURuBoOleZZ: Pretty simple: target bitrate is 133 kbps
idimkovic2: of all contenders
gURuBoOleZZ: the beginning is at 150 kbps (+15 kbps ~)
idimkovic2: it is not - target bit rate is average bit rate of the sample
JohnV: gURuBoOleZZ: but the reservoir is not flexible enough to give the advantage it should
gURuBoOleZZ: But the second part is not lowered by 15 kbps compared to the target
idimkovic2: it is lowering of course
gURuBoOleZZ: It's not 150 kbps then 115 kbps => average 130 kbps
idimkovic2: it is N % lower than the average rate of the track
gURuBoOleZZ: It's 150 then 130 instead 133
gURuBoOleZZ: the over-coding is much greater than the under-coding
idimkovic2: if you have N kbits/s of the total bit rate,  and the first 10 seconds use 15% more
idimkovic2: how so?
idimkovic2: how it can be?
gURuBoOleZZ: 133-150
idimkovic2: if you measure the total bit rate of the sample
idimkovic2: yes, but Guru
gURuBoOleZZ: simply observe the bitrate
idimkovic2: if the average bit rate of the sample is 137 kbps
idimkovic2: it cannot be "more overcoded than undercoded"
JohnV: well.. imo this overcoding isn't even very unfair bitrate wise if you check aOtUv bitrates of these samples..
gURuBoOleZZ: The average bitrate of the sample MUST be 133 kbps
idimkovic2: this is the fallacy of looking at "average bits per seconds"
*** rjamorim has joined #nda.
idimkovic2: no it must not
rjamorim: BEHOLD
rjamorim: fuckers :)
idimkovic2: heeeey :)
sebastian-mares: Pssst!
rjamorim: hello, earthlings
idimkovic2: CELEBRITY is here :)
gURuBoOleZZ: I got 133 kbps for all tracks I've encoded and which are at least 4 min long
JohnV: noooo
rjamorim: hahaha
gURuBoOleZZ: hello roberto
idimkovic2: guys I'm really tired
rjamorim: Hello, Francis :)
idimkovic2: let someone else tell Roberto what we discussed
gURuBoOleZZ: I must leave too
idimkovic2: :-)
sebastian-mares: lol
sebastian-mares: Poor Roberto.
rjamorim: Aw :(
idimkovic2: anyway... I have to go
idimkovic2: as if anyone is interested
rjamorim: OK, seeya Ivan
idimkovic2: I am still with my opinion
JohnV: bye
idimkovic2: about ranking
idimkovic2: :-)
sebastian-mares: Anyone minds if I send Roberto the logs?
rjamorim: OK, sebastian will send me logs
rjamorim: hehehe
sebastian-mares: Not that it matters, I will do it anyways. :-B
idimkovic2: please!
sebastian-mares: LOL
idimkovic2: cya all
rjamorim: :D
rjamorim: cya, Ivan
gURuBoOleZZ: My opinion is unchanged: unrepresentative sample -> unrepresentative test
gURuBoOleZZ: bye
rjamorim: bye, Francis
* sebastian-mares waves.
rjamorim: Talk to you later
*** idimkovic2 has signed off IRC ().
rjamorim: Got your e-mail BTW :)
rjamorim: Will answer it soon
gURuBoOleZZ: bye :)
*** gURuBoOleZZ has signed off IRC ().
rjamorim: bye :)
rjamorim: heh
sebastian-mares: OK, only the three of us are here, now.
sebastian-mares: Juha, still alive?
JohnV: barely. :B
sebastian-mares: ok
rjamorim: haha
rjamorim: what time is it in .fi?
JohnV: 01:31
sebastian-mares: One hour minus here.
rjamorim: damn
sebastian-mares: Had a real-time conversation with Guru over e-mail yesterday at 3 AM. :-B
rjamorim: sweet :B
sebastian-mares: OK, log is on its way.
rjamorim: goodie
sebastian-mares: I'm such an idiot.
sebastian-mares: I sent it to myself. >_<
rjamorim: will read
rjamorim: :D
rjamorim: oh man
JohnV: lol
sebastian-mares: OK, sent.
rjamorim: danke
sebastian-mares: Keine Ursache. :-)
rjamorim: up yours!
rjamorim: ok, got it
sebastian-mares: Have fun.
sebastian-mares: 64 KB.
sebastian-mares: 1061 lines.
rjamorim: will save it to txt so that I can read it some place that does line breaking :B
rjamorim: I have read worse
rjamorim: Back when I actively participated in #vorbis
rjamorim: LOL FUCK
rjamorim: too big for NotePad :B
sebastian-mares: lol
sebastian-mares: Are you on Windows 95 or what?
sebastian-mares: Works perfectly fine here.
rjamorim: Mom's Win98
sebastian-mares: Ah.
JohnV: I just think it's pretty harsh if we are disqualified, because of a bug which has arguably positive and negative effects. The point is to rate the samples essentially. If people find problems from the samples, then they would score accordingly, that's still my point.
sebastian-mares: But our point is that the first part of the song sounds better than Nero would usually do.
sebastian-mares: And therefore, the overall rating is higher.
JohnV: It can be argued that in a real life track there would be more length for the problem section. According to plots, testers have 10-20 seconds to rate the problems.
sebastian-mares: But that is not the point.
JohnV: How do you know it would be audibly better?
sebastian-mares: Only the fact that the first part has a higher quality leads to a higher ranking.
rjamorim: Because of higher bitrate? :B
JohnV: for most people in most samples?
sebastian-mares: If the whole file was the same, maybe the rating was lower.
JohnV: higher bitrate != transparency for a group
sebastian-mares: Like 4.0 instead of 4.5.
sebastian-mares: That IS a difference.
JohnV: but the samples are not rated so that some part sounds good. There's 10-20 seconds which sound worse than it could.
sebastian-mares: How much worse?
sebastian-mares: And how much better does the first part sound?
rjamorim: I just find that behaviour of spreading higher bitrates at the first few seconds and then evening out highly insidious.
rjamorim: It almost sounds like it was tailored for listening tests
rjamorim: Since it would make little difference on normal tracks
sebastian-mares: And it does.
rjamorim: But makes huge difference for 30-second samples
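A rough sketch of why a front-loaded bitrate weighs far more on a short sample than on a full track. All figures are illustrative assumptions (a 10-second boost at 150 kbps, 133 kbps afterwards, a 30-second sample versus a 4-minute track). The function name is hypothetical, and this is not a model of Nero's actual rate control.

```python
def average_with_boost(total_seconds, boost_seconds, boost_kbps, steady_kbps):
    """Average bitrate of a clip that starts with a boosted section, then runs steady."""
    boost = min(boost_seconds, total_seconds)   # the boost cannot outlast the clip
    steady = total_seconds - boost
    return (boost * boost_kbps + steady * steady_kbps) / total_seconds


# Assumed: 10 s boost at 150 kbps, 133 kbps for the rest.
sample_avg = average_with_boost(30, 10, 150, 133)    # about 138.7 kbps
track_avg = average_with_boost(240, 10, 150, 133)    # about 133.7 kbps

print(round(sample_avg, 1), round(track_avg, 1))
```

With these numbers the same 10-second boost lifts the 30-second sample by several kbps but moves the 4-minute track by well under 1 kbps.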
JohnV: These are impossible to answer. The point is, what decides the rating is the bad sound in a clip, and we have 10-20 sec of potentially worse sound than there should be
sebastian-mares: Guru pointed out that full tracks have totally different bitrate distribution than samples.
sebastian-mares: At least in Nero.
sebastian-mares: Vorbis, LAME and iTunes are almost identical.
JohnV: <rjamorim> It almost sounds like it was tailored for listening tests <- doesn't make sense cause we have 10-20 sec length of worse sound than it could be due to a bug in the reservoir which didn't make it flexible enough
rjamorim: But is that bug related to the higher bitrate distribution in the first 20 seconds?
JohnV: yes, read the log
JohnV: and it's not higher for 20 seconds, it's more towards 10 seconds if you look at the graphs
rjamorim: Too big log, will read later tonight
rjamorim: And if that bug is related to the bitrate distribution: How can it make sound WORSE if bitrate is HIGHER?
sebastian-mares: Last part is worse.
JohnV: because due to the bug the reservoir gets too small and is not flexible enough
sebastian-mares: Because reservoir was drained or what?
sebastian-mares: OK
rjamorim: Yes, but if people tend to listen to the beginning first, the last part won't be taken so much into account
rjamorim: Hell, in some samples, the worse part probably doesn't even kick in until the very last seconds
JohnV: true, but that is an assumption
JohnV: hmm that's not true
rjamorim: Ican claiming quality impact was minimal is an assumption too
rjamorim: *Ivan
*** idimkovic2 has joined #nda.
JohnV: it's 20 seconds max, most are towards 10 sec
sebastian-mares: You again?
idimkovic2: nah - I couldn't resist :)
idimkovic2: cannot sleep while this is on ;)
sebastian-mares: Juha's point is that we cannot exclude Nero just because we think people only listen to the first part.
JohnV: rjamorim: nobody noticed this before Guru, it is quite minimal..
rjamorim: Even if it was 5 seconds!
sebastian-mares: :-D
rjamorim: It would be intolerable!
sebastian-mares: Neither can I - I have to decide based on this shit.
JohnV: rjamorim: the bad section lasts 10-20 sec and is present in the samples. People rate samples based on problems. If there are problems in the samples, they should be rated.
sebastian-mares: I think you fail to see one critical point.
sebastian-mares: The first part sounds better than the rest so this makes the score higher.
idimkovic2: huh 
idimkovic2: just two things
sebastian-mares: Also, what if Nero had artifacts during the first 5 seconds?
JohnV: How do you know it makes the score higher? We can't even know if the second part makes the score lower. Overall the difference is very small. For Guru in the hard attack samples only 1.5 ITU. What is that for a group..
sebastian-mares: But now doesn't because the bitrate is higher than normal.
idimkovic2: 1. Purely statistically, rest would then "sound worse" if we all agree that the AVERAGE bit rate was within limits
idimkovic2: 2. Perceptually,  overcoding something which is already transparent is a WASTE of bits
sebastian-mares: OK, but what about the second claim?
sebastian-mares: "Also, what if Nero had artifacts during the first 5 seconds?"
idimkovic2: what if?
sebastian-mares: Yes.
sebastian-mares: And because we don't know, Nero should be separated.
idimkovic2: well - it would not have artifacts somewhere else if the buffer is the same, then
sebastian-mares: Because the test is bogus.
idimkovic2: anyways
idimkovic2: Guru will get fixed version soon, would be just interesting to see
idimkovic2: if this is the case
sebastian-mares: Could I have the fixed version, too?
idimkovic2: of course
sebastian-mares: Thanks.
idimkovic2: that is the whole point
sebastian-mares: And I don't say to remove Nero totally.
idimkovic2: and also for the HE-AAC  I really don't want these problems :(
sebastian-mares: I say to simply mark it so it stands up from the rest.
idimkovic2: I guess nobody here wants them, too
sebastian-mares: And write a note on the graph why Nero can't be compared 1:1 with the rest.
idimkovic2: well I am for mentioning a bug
idimkovic2: but interpreting what that means
idimkovic2: it would be rather too much speculation for my taste, but that's just my 0.02 euros anyway ;)
sebastian-mares: Well...
sebastian-mares: We excluded WMA Standard from the test because encoding samples would've produced different results than encoding full tracks.
idimkovic2: and btw
idimkovic2: but the effect with ND is much much less
sebastian-mares: Now Guru tested the same with Nero and surprise: it also produces two different outputs.
sebastian-mares: But it is there.
idimkovic2: here we go again ;)
sebastian-mares: And it is much more with ND than with aoTuV or LAME.
idimkovic2: anyways - what I propose
idimkovic2: is to mark it and state that it had a bug concerning bit allocation  that might, or might not affect the results  
idimkovic2: whether it would be better or worse
idimkovic2: I cannot tell, and I do think it would be worse in fact
idimkovic2: because overcoding perceptually is much less important than undercoding
sebastian-mares: We should hurry up anyways. Otherwise, Juha is going to bang his head against the keyboard and fall asleep.
idimkovic2: hahaha
idimkovic2: anyway - one more thing from my side
idimkovic2: for the future of the tests
idimkovic2: please @everyone - add "time-domain bitrate check" 
JohnV: <idimkovic2> because overcoding perceptually is much less important than undercoding <- very important point..
idimkovic2: in the "a must" check list
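A minimal sketch of what such a time-domain bitrate check might look like, assuming the per-frame sizes in bytes have already been extracted from the encoded file by some other tool (not shown). The function names, the 5-second window, the 8% tolerance and the 20 ms frame duration are all hypothetical choices for illustration, not anything used by Nero or by the test organisers.

```python
def windowed_bitrates(frame_bytes, frame_seconds, window_seconds):
    """Average bitrate in kbps for each consecutive, non-overlapping time window."""
    frames_per_window = max(1, round(window_seconds / frame_seconds))
    rates = []
    for start in range(0, len(frame_bytes), frames_per_window):
        chunk = frame_bytes[start:start + frames_per_window]
        kbits = sum(chunk) * 8 / 1000
        rates.append(kbits / (len(chunk) * frame_seconds))
    return rates


def time_domain_bitrate_check(frame_bytes, frame_seconds,
                              window_seconds=5.0, tolerance=0.08):
    """Return (window_index, kbps) for windows deviating from the overall average."""
    overall = sum(frame_bytes) * 8 / 1000 / (len(frame_bytes) * frame_seconds)
    rates = windowed_bitrates(frame_bytes, frame_seconds, window_seconds)
    return [(i, r) for i, r in enumerate(rates)
            if abs(r - overall) > tolerance * overall]


# Synthetic input (assumed): 10 s at roughly 150 kbps, then 20 s at roughly 130 kbps,
# with hypothetical 20 ms frames.
frames = [375] * 500 + [325] * 1000
flagged = time_domain_bitrate_check(frames, frame_seconds=0.02)
print(flagged)   # the first two 5-second windows stand out
```

Running this over each encoder's output for every sample before a test starts would surface a front-loaded bitrate pattern early. The window length and tolerance would need tuning, since VBR encoders legitimately vary their bitrate with content.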
sebastian-mares: For the future of the tests, provide the "to-be-tested" encoder sooner, please. :-B
idimkovic2: yes :)
idimkovic2: well for HE-AAC I can be sure it would be available in mid-January
idimkovic2: hopefully
idimkovic2: so plenty of time for checking
sebastian-mares: Yeah.
sebastian-mares: Well, I am going to see what I will do with the plot.
sebastian-mares: Tukey's HSD is calculated with Nero anyways.
sebastian-mares: The rest is more or less a design issue.
idimkovic2: well
sebastian-mares: Whether or not we move Nero 200 pixels to the right and state that it had a bug or keep it where it is.
idimkovic2: if you see that considerable amount of people notice this by result interpretation
idimkovic2: then do that separation,  my advice
idimkovic2: if not... just mentioning should be ok
sebastian-mares: OK
idimkovic2: that it has (had, hopefully) a bug
idimkovic2: <joke> if we lose,  just disqualify us </joke> hehehe
sebastian-mares: Believe me - you are not losing.
idimkovic2: but I don't think that could ever happen ;)
sebastian-mares: Shine does.
sebastian-mares: ^_^
idimkovic2: lol
idimkovic2: anyway I'm definitely off now
sebastian-mares: Anyways, that should be all.
sebastian-mares: Thanks for participating.
idimkovic2: please check the intensity stereo :)
sebastian-mares: Sure.
idimkovic2: you're welcome
sebastian-mares: What did you get for Christmas, BTW? :-P
idimkovic2: I spent it all alone working ... GF is in the country far away
* idimkovic2 sad
sebastian-mares: Poor you...
sebastian-mares: :-P
idimkovic2: I should go out and have some beer or smth :)
sebastian-mares: Still in Berlin?
idimkovic2: yeah
idimkovic2: I'm prolly moving to Munich, though
sebastian-mares: Ehw...
idimkovic2: but it is ridiculously expensive
sebastian-mares: Bavaria
idimkovic2: real estate
idimkovic2: well.... yeah
sebastian-mares: Stoiber.
sebastian-mares: :-S
idimkovic2: :-)
idimkovic2: lol
idimkovic2: anyways... my GF is there and she's not mobile as I am
idimkovic2: so I have to compromise ;)
idimkovic2: anyway
idimkovic2: I'll be spending 3 days a week in Karlsruhe
sebastian-mares: Here in Karlsruhe, everything is closed. It is since 2 PM actually. Not even Burger King, Pizza Hut and the like are open.
idimkovic2: (gotta find place there, too)
idimkovic2: yeah I know
idimkovic2: I spent 2002 and part of 2003 there ;)
idimkovic2: once I almost starved :)
sebastian-mares: lol
idimkovic2: anywayz guys - have a good night
sebastian-mares: Yeah, I am off too.
idimkovic2: and a happy and successful new year
sebastian-mares: Yeah, for you too.
sebastian-mares: Roberto, still there?
JohnV: happy new year to everybody! :)
*** idimkovic2 has signed off IRC ().
sebastian-mares: :-)
sebastian-mares: Bye!
JohnV: bye :)(
Session Close (#nda): Mon Dec 26 01:07:31 2005