Session Start (FreeNode:#nda): Sun Dec 25 21:39:44 2005
*** #nda: sebastian-mares idimkovic @JohnV
*** #nda was created on Sun Dec 25 21:26:58 2005.
sebastian-mares: Testing, testing...
idimkovic: :)
sebastian-mares: Now I have to get Francis in here. :-P
idimkovic: keeping fingers crossed
idimkovic: :)
sebastian-mares: Roberto and Darryl wanted to join and give some advice, but they are both offline.
sebastian-mares: Anyways...
sebastian-mares: Well, e-mail sent, now I hope he finds his way here.
idimkovic: let's hope so
idimkovic: if he needs some help
idimkovic: he can always send email to me or Juha
sebastian-mares: Sure.
idimkovic: we do online e-mail chatting sometimes, too ;-)
JohnV: well.. I see it like this: there was a bug in the ABR bit reservoir allocation. There is a slight quality drop on hard-to-encode material with the version used in the test. It shouldn't be a big problem once this is fixed.
sebastian-mares: Yeah, but the bug can lead to problems in the current test.
idimkovic: I seriously doubt so
sebastian-mares: When people listen to the first part of the song, which has a higher bitrate and quality than the rest, Nero gets a higher ranking than it would get in the real world.
idimkovic: the bit rate was not that much higher
idimkovic: and the quality impact was minimal
idimkovic: besides
idimkovic: it could be claimed that people listening to the second part found it worse
idimkovic: but in fact the difference was rather minimal
idimkovic: which was the reason
idimkovic: why it wasn't spotted at all :(
idimkovic: before the test
sebastian-mares: The people who listened to the second part rated it the way Nero deserves, since that is the quality used for the remaining track.
JohnV: It's a very small quality impact for anything except high-attack sections etc., where it can be audible to more people.
JohnV: It's not how the ABR will work either, it was bugged.
JohnV: The true quality will be high, with no quality drop in high-attack sections
JohnV: after the bug fix
sebastian-mares: Well, after the bug fix.
idimkovic: well
idimkovic: besides
idimkovic: I worked on this this afternoon as well as on the new IS
sebastian-mares: But so far, the current results will have no relevance to real-world usage.
JohnV: cause the reservoir won't be starved
idimkovic: and the test version was in fact sent to Juha for testing
idimkovic: I would gladly send it to you and Guru as soon as Juha OKs it
idimkovic: so we can be sure ABR works well for the HE-AAC test
sebastian-mares: So, as Darryl, Francis and Roberto suggested, the best thing would be to perform the Tukey's HSD analysis with Nero's results included, but then take it out of the plot.
sebastian-mares: Or instead of taking it out totally, separate it from the other contenders.
idimkovic: well it is up to you Sebastian, I don't really think there was a significant impact, not for such a hard measure
sebastian-mares: Well, the same can be said about the WMA Standard problem.
JohnV: me neither, the bitrate drop doesn't lower quality that much. Even Guru didn't notice it at first, until the attack sample.
idimkovic: in fact I don't really think it can be compared to WMA Standard
idimkovic: the difference is >much< smaller
sebastian-mares: I had to replace it since testing 2-pass on small samples would have given different results than day-to-day usage.
sebastian-mares: Yes, but then he noticed it in other samples, too.
idimkovic: yes, but WMA would generate huge differences, very easy to spot
sebastian-mares: Well, a difference of 25 kbps is pretty high.
sebastian-mares: And as I said, it happens with ALL samples.
idimkovic: well I would be more concerned about the quality impact
JohnV: imo, the listeners SHOULD listen to the whole sample and make the judgement based on that. That is the idea anyway.
sebastian-mares: Whether it's the run-in time or the bit reservoir, that doesn't matter.
sebastian-mares: Yes, but that cannot be controlled.
JohnV: If there were a significant quality drop, they would hear it as artifacting and judge accordingly
sebastian-mares: Imagine the following...
sebastian-mares: Francis and others suggested using smaller samples for listening tests.
sebastian-mares: 5 seconds and that kind of material.
JohnV: but smaller samples are not used
sebastian-mares: In such a case, Nero would actually produce very good quality because of its run-in time.
sebastian-mares: Second...
JohnV: but it's the same as if there were artifacting after 15 sec because of bad psychoacoustics or something. The listener should hear it if it's significant.
JohnV: As far as I see it, the whole track is there to be judged; if there is a quality drop, there is. If it isn't significant for most, they don't notice it.
idimkovic: and in fact, nobody except Francis really noticed - we did as many tests as possible in the preparation time
idimkovic: and nobody really found that out
JohnV: And after the fix, it shouldn't be noticeable in attack samples either.
sebastian-mares: Francis also tested something else...
sebastian-mares: He encoded LesJoursHeureux once as a sample (30-second clip) and once as a full file, and then extracted the 30 seconds from the full-track encoding.
sebastian-mares: iTunes, AoTuV and LAME produced a very similar, almost identical bitrate distribution, while Nero produced something totally different.
idimkovic: well I already said - due to this bug, quality-wise only the hardest samples were affected
JohnV: it is also quite questionable to separate Nero just because something should sound good and something maybe worse, while you think people wouldn't hear the worst-sounding section. I mean.. the whole track is there.
JohnV: Imo let the people judge. Anyway this bug will be fixed.
idimkovic: (if it is not already)
idimkovic: how I see things here
idimkovic: - there was a bug in the bit-rate manager, where the bit reservoir was drained
sebastian-mares: http://maresweb.homeip.net/listening-test/guru.eml
sebastian-mares: Save to disk and open with Outlook, Thunderbird, whatever.
idimkovic: - this bug was not found in the pre-test
idimkovic: - nobody in the internal pre-test complained about quality issues
idimkovic: and, so far, only Francis is complaining - and honestly I don't think many others would notice any degradation, except when fed something pathological
idimkovic: we will of course fix this
idimkovic: but I find excluding it from the test a bit too harsh a measure, since the quality impact is IMHO negligible
sebastian-mares: Others maybe didn't complain because they didn't listen to the whole track.
idimkovic: come on Sebastian
idimkovic: if the quality difference is there
sebastian-mares: And it is.
idimkovic: there is not a single chance that people won't notice it
sebastian-mares: As ABX logs show.
sebastian-mares: I have them.
sebastian-mares: Guru posted additional ones via e-mail.
idimkovic: Guru did
idimkovic: but I find it very questionable that it is of big importance for the majority
idimkovic: nowhere near the WMA bitrate deviation
JohnV: it has to be assumed that people listen to the whole track. People can't know beforehand where artifacting appears.
idimkovic: => CONCLUSION: Nero ABR allocates more bits when the
idimkovic: encoder is starting, fewer after a few seconds. It's sometimes easy
idimkovic: to ABX (sometimes it's not ABXable, at least for me).
idimkovic: that is Guru's comment
idimkovic: if it is "sometimes not ABXable" I can be pretty sure that for the huge majority it is far from ABXable ;)
idimkovic: where is he by the way :)
sebastian-mares: But the main problem is not really whether people listen to the same track or not. The problem is that Nero behaves differently with samples than it does with full tracks.
idimkovic: ok this is true but
idimkovic: how big is this deviation
sebastian-mares: And we removed WMA Std. because we wanted results that are meaningful.
idimkovic: and is it so relevant that it could change the result significantly
idimkovic: I don't think so really
sebastian-mares: So, the problem now is... Why did I remove WMA Std. when the Nero results aren't meaningful either?
idimkovic: if I did I would be the first to ask for the recall honestly
idimkovic: but - "meaningful" is questionable
JohnV: the difference is very small, and it will be fixed. And this problem also appears in the test samples.
sebastian-mares: Well, I believe that you are honest, but the rest see you as a Nero developer.
sebastian-mares: And as a person who tries to make Nero appear in the best light.
idimkovic: well I am practical
JohnV: why aren't the Nero results meaningful? The samples are long enough for this bug to take effect.
sebastian-mares: So, because of this "bonus" Nero has at the beginning of the track, it can be over-rated.
idimkovic: I do believe these deviations are not significant for the huge majority
JohnV: or underrated
idimkovic: it can be under-rated
idimkovic: because of the second part
idimkovic: (and I don't think so)
sebastian-mares: No, it is not under-rated.
JohnV: because after the fix it's better
idimkovic: it would not help me or Nero either ;)
sebastian-mares: The second part is not worse.
sebastian-mares: It's how the encoder performs.
idimkovic: well if the second part is NOT worse
sebastian-mares: Only the first 20 seconds have a higher bitrate.
sebastian-mares: Ivan, I sent you bitrate distribution graphs from Guru.
JohnV: it can be underrated in the sense that if it's bad, people will notice it and give a bad score, but after the fix it will be better
sebastian-mares: They show how the first 10 to 20 seconds have bitrates of 155 kbps, and then they drop to 130 kbps and remain steady at that bitrate.
JohnV: have you checked the Vorbis bitrate distribution? Isn't it giving a high bitrate almost all the time?
sebastian-mares: So it's not like Nero encodes the first part at 192 kbps and the second one at 64 kbps, making an avg. of 128.
JohnV: sebastian-mares: not true
JohnV: it's not like that at all
JohnV: that's a huge exaggeration
sebastian-mares: What is a huge exaggeration?
sebastian-mares: Could you kick Ivan?
JohnV: the quality drop is _minimal_ for most people, and at the moment it's maybe noticeable for some in hard-to-encode samples.
JohnV: it's not at all any 192 vs 64
sebastian-mares: I know.
sebastian-mares: That's what I said.
JohnV: the difference is minimal
sebastian-mares: But you said it can be under-rated, which is not true.
sebastian-mares: Since after the 155 kbps part, Nero encodes at 130 kbps, which it keeps doing over the whole file.
JohnV: well in a way it can, especially after this is fixed. :)
*** idimkovic2 has joined #nda.
idimkovic2: huh
idimkovic2: now we have two idimkovic nicknames
idimkovic2: could someone please paste me what has been discussed?
sebastian-mares: So you can't say "Oh, it is under-rated because people rated the 64 kbps part".
sebastian-mares: sebastian-mares: Only the first 20 seconds have a higher bitrate.
sebastian-mares: Ivan, I sent you bitrate distribution graphs from Guru.
JohnV: it can be underrated in the sense that if it's bad, people will notice it and give a bad score, but after the fix it will be better
sebastian-mares: They show how the first 10 to 20 seconds have bitrates of 155 kbps, and then they drop to 130 kbps and remain steady at that bitrate.
JohnV: have you checked the Vorbis bitrate distribution? Isn't it giving a high bitrate almost all the time?
sebastian-mares: So it's not like Nero encodes the first part at 192 kbps and the second one at 64 kbps, making an avg. of 128.
JohnV: sebastian-mares: not true
JohnV: it's not like that at all
JohnV: that's a huge exaggeration
sebastian-mares: What is a huge exaggeration?
sebastian-mares: Could you kick Ivan?
JohnV: the quality drop is _minimal_ for most people, and at the moment it's maybe noticeable for some in hard-to-encode samples.
JohnV: it's not at all any 192 vs 64
sebastian-mares: I know.
sebastian-mares: That's what I said.
JohnV: the difference is minimal
sebastian-mares: But you said it can be under-rated, which is not true.
sebastian-mares: Since after the 155 kbps part, Nero encodes at 130 kbps, which it keeps doing over the whole file.
JohnV: well in a way it can, especially after this is fixed. :)
idimkovic2: honestly
idimkovic2: the problem is
idimkovic2: what we are talking about here is bitrate deviation
idimkovic2: but it would be more important to check QUALITY deviation
idimkovic2: because
idimkovic2: people are not rating bitrate plots
idimkovic2: they are rating quality
JohnV: right
idimkovic2: bitrate deviation had a bug
sebastian-mares: Yes, but in this case, there is a sudden drop in quality, too.
idimkovic2: this is what IS questionable
sebastian-mares: Otherwise, Guru wouldn't have noticed it.
idimkovic2: not really
idimkovic2: Guru is an extremely trained listener
JohnV: the tracks have a certain quality, and people are gonna judge it. That is the point.
sebastian-mares: Yes.
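A quick arithmetic sketch of Sebastian's concern about samples vs. full tracks. It assumes the figures relayed from Guru's graphs (about 155 kbps for the first 20 seconds, then a steady 130 kbps) hold for every file. The 30-second clip length, the 4-minute track length and the two-phase `rate_at` profile are illustrative assumptions, not measurements from the test.

```python
def average_kbps(segments):
    """Time-weighted average bitrate; segments are (duration_s, kbps) pairs."""
    total_kbits = sum(duration * kbps for duration, kbps in segments)
    total_seconds = sum(duration for duration, _ in segments)
    return total_kbits / total_seconds


def rate_at(t_seconds, startup_s=20, startup_kbps=155, steady_kbps=130):
    """Bitrate at time t under the assumed two-phase profile (hypothetical model)."""
    return startup_kbps if t_seconds < startup_s else steady_kbps


# Assumed profile: 20 s at 155 kbps, then 130 kbps for the rest of the file.
clip = [(20, 155), (10, 130)]          # a 30-second test sample
full_track = [(20, 155), (220, 130)]   # a 4-minute track

print(f"clip average:       {average_kbps(clip):.1f} kbps")
print(f"full-track average: {average_kbps(full_track):.1f} kbps")
# A difficult passage 15 s into the clip is coded during the start-up phase,
# but the same passage falls in the steady phase if it sits 2 minutes into the full track.
print(rate_at(15), rate_at(120))
```

Under these assumptions the clip averages about 146.7 kbps and the full track about 132.1 kbps. The gap between those two numbers is the "bonus" being argued about. Whether it is audible is exactly the point the participants disagree on.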
idimkovic2: I am quite sure he would pick up a 1 or 2 kbps difference
idimkovic2: but this is not the point
idimkovic2: the point is
idimkovic2: would THIS change the results with statistical significance
idimkovic2: I honestly don't think so
idimkovic2: there is absolutely no way
idimkovic2: because - if that were the case, it would have been spotted a long time ago
JohnV: this is the way the track sounds with this bug. After the fix the reservoir won't starve, and it will also give hard-to-encode parts more bits
idimkovic2: so, at the end of the day
idimkovic2: it will just sound better (I hope ;)
idimkovic2: allocating more where it SHOULD
idimkovic2: @Sebastian - did you get in touch with Francis
sebastian-mares: Yes, I am wondering why he's not appearing.
sebastian-mares: Anyways, what I wanted to say is that it's natural that the quality is going to drop.
sebastian-mares: Since Nero is using the reservoir for a reason.
sebastian-mares: If it weren't drained, it would've used it for the rest, too.
JohnV: With this bug the bitrate stays around 130 kbps, which gives very good quality on most material; for very hard-to-encode attacks etc. a higher bitrate of course helps, and it will be like that.
idimkovic2: not really
idimkovic2: it would drop
idimkovic2: if and only if
idimkovic2: the bit reservoir goes to zero
idimkovic2: and that is not the case
idimkovic2: from my work today - it was a bug where it went from 500K to ~10
idimkovic2: and then up and down
idimkovic2: which is enough for most of the material
sebastian-mares: So you claim that the bitrate drops, but the quality stays the same?
idimkovic2: no
sebastian-mares: How should I understand it then?
idimkovic2: the bitrate does drop, but there is still buffering capability, so the impact on quality is minimal
idimkovic2: AAC starts to suck if the bit reservoir is drained
idimkovic2: so to put it this way - at the beginning, where there is a bug
sebastian-mares: Second...
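The reservoir argument above can be made concrete with a toy allocator. This is a hypothetical model, not Nero's bitrate manager. Each frame may spend its mean budget plus whatever the reservoir holds, and bits a frame does not need are saved up to a cap. The frame demands and both reservoir sizes are made-up numbers, chosen only to contrast a large reservoir with a nearly drained one on a short transient.

```python
def allocate(demands, mean_bits, reservoir_max, reservoir=0):
    """Toy ABR allocator with a bit reservoir (hypothetical, not Nero's code).

    Each frame may spend its mean budget plus the current reservoir contents;
    bits the frame does not need are saved for later frames, up to reservoir_max.
    """
    spent = []
    for demand in demands:
        available = mean_bits + reservoir
        used = min(demand, available)
        reservoir = min(reservoir_max, available - used)
        spent.append(used)
    return spent


# Ten easy frames followed by a three-frame transient (illustrative numbers).
demands = [2000] * 10 + [6000] * 3
print(allocate(demands, 3000, reservoir_max=500_000)[-3:])
print(allocate(demands, 3000, reservoir_max=2_000)[-3:])
```

With the large reservoir, the transient frames get all 6000 bits they ask for. With the small one, they get 5000, 3000 and 3000, squeezed back to the mean budget after the first frame. In this toy model, that is what a drained reservoir costs on hard passages.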
idimkovic2: if the frame is overcoded - that won't be noticed as an improvement by many
idimkovic2: only maybe by Guru and a few other people - because it was >above the JND<
idimkovic2: so if you overcode above the JND - that is more or less useless work :( and this is a pity
sebastian-mares: http://foobar2000.net/divers/tests/2005.12/AAC_Nero_Digital.png
idimkovic2: point is
sebastian-mares: After the bitrate drop, there isn't much variation in bitrate.
idimkovic2: quality does not scale linearly with the bit rate
idimkovic2: this is a very wrong idea
idimkovic2: you cannot look at per-second plots
idimkovic2: you must look at individual frame plots
sebastian-mares: And a bitrate drop of 25 kbps must have an impact on quality.
idimkovic2: which is not so significant
JohnV: sebastian-mares: not necessarily
idimkovic2: because the average bitrate requirement is around 128-135 kbps
idimkovic2: so when you overcode - the impact is much smaller
idimkovic2: than undercoding it
idimkovic2: below 128
idimkovic2: that's where noise ABOVE the threshold kicks in
idimkovic2: threshold = psychoacoustic threshold
idimkovic2: suppose that you have a tone
idimkovic2: and a masker
idimkovic2: and the masker is 10 dB below the tone -> that is the JND
idimkovic2: and it requires, say, 10 kbits
idimkovic2: if you overcode - it is still below the average human JND
idimkovic2: impact = small
idimkovic2: if you UNDERcode - it gets above the JND, impact = big
idimkovic2: now, the average music material bitrate requirement for AAC is around 128 kbps
JohnV: "And a bitrate drop of 25 kbps must have an impact on quality." <- if it does, then we should see it in the results. And it's our bad.
idimkovic2: so overcoding at the beginning maybe means something only to very golden ears
idimkovic2: for the rest of the world.... I seriously doubt that they would notice
idimkovic2: and if you want proof about variations
idimkovic2: ask Guru to use some frame plotting tool
idimkovic2: to see the >frame< bit distribution
idimkovic2: if the frame bit distribution were fixed to 3000 bits/frame (128 kbps) -> we would be screwed badly
idimkovic2: this whole argument
idimkovic2: would make much more sense at 96 kbps
idimkovic2: or 80
idimkovic2: where such overcoding would have an impact
idimkovic2: but later people would surely notice the drop
sebastian-mares: Second...
idimkovic2: and, if we had fixed frames later - hehe boy people would hear that :)
idimkovic2: ok
sebastian-mares: Trying to find out what's wrong with Guru...
*** idimkovic has signed off IRC (Connection timed out).
sebastian-mares: OK, no idea what's up.
idimkovic2: doh nobody is commenting on my IS samples ;-)
sebastian-mares: My main concern is that Nero performs differently when encoding samples than when encoding full tracks.
JohnV: it's Christmas day ffs :) I've been busy all day with my godson. :B
idimkovic2: hahaha
idimkovic2: @Sebastian - thing is I don't believe this bug affects scores significantly
idimkovic2: if it did
sebastian-mares: Well, it does affect them.
idimkovic2: it would surely not be fair
idimkovic2: but not significantly
JohnV: sebastian-mares: well then the same difference should be audible in the samples too. They are long enough
sebastian-mares: I mean, it might.
idimkovic2: I would propose to announce this bug
idimkovic2: I have nothing against that
sebastian-mares: And that is why I think the best solution would be not to remove Nero entirely like Darryl suggested, but to separate it from the others.
idimkovic2: but indeed I think it would be too harsh to remove Nero
sebastian-mares: Not sure if that is wise.
sebastian-mares: I mean, announcing the bug now.
idimkovic2: not now
sebastian-mares: It will change how the test is run.
idimkovic2: at the end of the test
sebastian-mares: Yeah.
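The "3000 bits/frame (128 kbps)" figure can be checked with simple arithmetic. This sketch assumes AAC-LC framing (1024 samples per channel per frame) at a 44.1 kHz sample rate. Other profiles or sample rates give different numbers.

```python
AAC_LC_FRAME_SAMPLES = 1024  # samples per channel in one AAC-LC frame


def bits_per_frame(bitrate_bps, sample_rate_hz=44_100):
    """Average bit budget per frame for a given bitrate (assumes AAC-LC framing)."""
    frames_per_second = sample_rate_hz / AAC_LC_FRAME_SAMPLES
    return bitrate_bps / frames_per_second


for kbps in (128, 96, 80):
    print(f"{kbps} kbps -> {bits_per_frame(kbps * 1000):.0f} bits/frame")
```

This gives roughly 2972, 2229 and 1858 bits per frame respectively. A coder pinned to about 2972 bits on every frame is the "fixed frames" case described above. The bit reservoir is what lets individual frames move away from that average.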
JohnV: sebastian-mares: I don't understand this removing.. The track is there, it has the "bug". People are supposed to rate it.
idimkovic2: no no
idimkovic2: for sure, that would be very bad
idimkovic2: but as for the final results
idimkovic2: I would not remove Nero
idimkovic2: but anyway, it is your call
sebastian-mares: I wouldn't either.
sebastian-mares: And I talked to Guru about this.
sebastian-mares: He has no problem with keeping Nero in.
JohnV: the thing is, this bug takes effect in these samples. If it's significant, people are supposed to notice it. The task is to judge the samples.
sebastian-mares: Anyways, since Tukey's HSD is calculated with Nero, it makes no difference whether I include Nero or not.
idimkovic2: in which case >we< would be the ones losing
sebastian-mares: I mean, it won't affect the final rankings.
idimkovic2: not anyone else
sebastian-mares: Not really...
sebastian-mares: You fail to see one thing...
JohnV: It would be different if the bug didn't take effect in these samples, BUT IT DOES.
idimkovic2: well as soon as the bugfix is tested
sebastian-mares: Suppose people listen to the first few seconds. They listen to Nero, AoTuV, LAME.
idimkovic2: how do you know if they do that?
idimkovic2: that is a big assumption
sebastian-mares: Same as the assumption that there is no impact.
JohnV: sebastian-mares: how do you know how people listen? They are supposed to judge the quality of the full track, and find artifacts in the full track.
sebastian-mares: They rate AoTuV, which encodes the same way regardless of whether the material is a sample or a full track.
sebastian-mares: Then LAME, which is similar to AoTuV.
idimkovic2: the assumption that there is no impact is based on rather common sense
sebastian-mares: Then they listen to Nero, which has the bonus of a high bitrate.
idimkovic2: there is no point in overcoding something that is transparent for 99% of people
idimkovic2: it will >still< be transparent
idimkovic2: it will be a waste of space for us
idimkovic2: because we could use those bits somewhere else
idimkovic2: but, like we discussed, people do not find a difference in the second part of the track
idimkovic2: what is the point then?
sebastian-mares: But what if the second part, which is encoded at 130 kbps and not 155 kbps like the first 10 to 20 seconds, is NOT transparent but people didn't test it?
sebastian-mares: Also, consider this...
idimkovic2: I would
idimkovic2: but seriously I don't believe that is the case
idimkovic2: and also
sebastian-mares: In my samples, the difficult part is in the middle or at the beginning most of the time.
idimkovic2: that would have been found in the pre-test
JohnV: Imo if it's assumed that people only listen to the first 2-3 seconds, we can quit these tests altogether.. because then important artifacting will be missed.
idimkovic2: hmmm
sebastian-mares: So in the spot where Nero still has a higher bitrate.
sebastian-mares: In a full-track encoding, the difficult part is no longer in that spot.
idimkovic2: I don't think "hardness" can be judged by that
idimkovic2: http://foobar2000.net/divers/tests/2005.12/Vorbis.png
sebastian-mares: Yes.
sebastian-mares: Did you look at the EML I linked to on my home server?
sebastian-mares: It has a PNG showing the behavior of sample vs. full track.
idimkovic2: yes I did
idimkovic2: but, like I said
idimkovic2: overcoding won't have a quality impact for 99% of the people
idimkovic2: and the rest of the track still had the bit reservoir and 128 kbps
idimkovic2: so a serious quality drop would be very very very hard
sebastian-mares: What about the beginning?
idimkovic2: well I already said
sebastian-mares: Doesn't it have 155 kbps and the reservoir?
idimkovic2: it was overcoded
idimkovic2: it has 128 kbps and bit reservoir ;)
idimkovic2: but if it was overcoded
idimkovic2: there was absolutely no point
idimkovic2: except for Guru
idimkovic2: and a few other people
idimkovic2: and STILL
sebastian-mares: But dude...
sebastian-mares: Let me speak, please.
idimkovic2: I doubt even Guru would rate the second part as bad
idimkovic2: ok
sebastian-mares: In my samples, the complicated part is usually in the middle or towards the beginning. That is, the complicated part is in the range where the bitrate is higher than in the rest. In a full-track encoding, the complicated part is no longer in a range of 155 kbps, but in the range where Nero allocates ~128 kbps, so the quality is worse.
idimkovic2: well according to the VBR plots from Vorbis
idimkovic2: and for Vorbis we can be pretty sure it is pure VBR
sebastian-mares: Guru posted some ABX logs where he managed to find a difference in the same part - once encoded as a 20-second clip, and once encoded as a full track and then extracted.
sebastian-mares: So there IS a difference between the two modes.
idimkovic2: the demands are not centered
sebastian-mares: Whereas for Vorbis, there isn't.
idimkovic2: there IS of course, I didn't say there is not
sebastian-mares: They are encoded in the same way because Vorbis is VBR.
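idimkovic2's overcoding argument rests on an asymmetry around the masking threshold (JND). The model below is deliberately crude, an assumption for illustration and not a real psychoacoustic model: noise below the threshold is inaudible, and only the excess above it counts. The dB values are made up. Shifting the noise by the same 6 dB in each direction then has very different effects.

```python
def audible_excess_db(noise_db: float, threshold_db: float) -> float:
    """Toy model (an assumption, not a psychoacoustic standard): noise below the
    masking threshold is inaudible; only the part above it counts."""
    return max(0.0, noise_db - threshold_db)


THRESHOLD_DB = 0.0  # masking threshold, used as the reference level

baseline = audible_excess_db(-1.0, THRESHOLD_DB)    # "just enough" bits: slightly below threshold
overcoded = audible_excess_db(-7.0, THRESHOLD_DB)   # 6 dB less noise: still inaudible, no gain
undercoded = audible_excess_db(5.0, THRESHOLD_DB)   # 6 dB more noise: now above threshold

print(baseline, overcoded, undercoded)
```

The three values come out as 0.0, 0.0 and 5.0. In this model, only a listener whose personal threshold sits below the average one (the "golden ears" case) would benefit from the overcoded start.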
idimkovic2: hold on a sec
idimkovic2: what I claim with Vorbis - is that complexity was not centered at some specific place
JohnV: right, that can be soon from Vorbis' plots
idimkovic2: which can be seen from its bit rate distribution, which is purely quality-based
JohnV: *seen
idimkovic2: and also I don't claim there is no difference with Nero encodings
idimkovic2: I am just pretty sure this difference is far from significant
JohnV: yep
sebastian-mares: Have a look at this FYI:
sebastian-mares: E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 directly encoded as short sample) [WMAPro].wav vs E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 extract from full encoding) [WMAPro].wav
sebastian-mares: 9:26:22 PM f 8/16 pval = 0.598
sebastian-mares: So, pretty much guessed with WMA.
sebastian-mares: Means that they are more or less the same.
idimkovic2: say what
idimkovic2: why don't we give this to 10 people
idimkovic2: and ask them the same :)
idimkovic2: it would be very good to check
sebastian-mares: E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 extract from full encoding) [Nero Digital].wav vs E:\TESTS D'AUTOMNE 2005\VBR start-up\decoded\LesJoursHeureux (12'00 - 20'00 directly encoded as short sample) [Nero Digital].wav
sebastian-mares: 9:22:40 PM p 14/16 pval = 0.0020
sebastian-mares: Another picture.
sebastian-mares: When to do so?
idimkovic2: well - Guru did find a difference, I think we all agree on that
sebastian-mares: Do it now?
sebastian-mares: Could harm the test.
idimkovic2: maybe Guru can encode some independent samples
sebastian-mares: Do it later?
sebastian-mares: Too late.
idimkovic2: well at the end of the day - I think it is up to you ;) I just think it would be good to check that difference, I don't really think it was significant at all
sebastian-mares: Guru is active on HA's PM center...
JohnV: right.. that's why imo it is best that people just rate the samples. The effect is there. Let the group rate the samples. We can't really start assuming how some group will do the testing, i.e. just 2-3 secs or something.
sebastian-mares: Wondering what he's up to. :-P
idimkovic2: :)
sebastian-mares: Well, samples are group-rated anyways.
idimkovic2: well I would propose to keep Nero in the table... if you think it is needed, put some * and write down that it had a bug - and then let's do verification - we would have the bugfixed version by then
sebastian-mares: The idea was just to separate Nero's results.
sebastian-mares: Like, move it far right.
idimkovic2: after the test
idimkovic2: we can for sure compare
idimkovic2: the bugfixed version
idimkovic2: vs the non-bugfixed one
sebastian-mares: And write something like "We are not sure about XYZ because 123".
JohnV: "we are not sure if these results are valid, because we think people only listen to the first 2-3 seconds"?
idimkovic2: :-) oh my
sebastian-mares: ...
idimkovic2: well I am not sure about positioning
idimkovic2: and where to put it
idimkovic2: I seriously don't think there is a considerable impact - but still, if you think it is needed... well
sebastian-mares: Did Guru write you a PM?
idimkovic2: let me check
idimkovic2: no
sebastian-mares: Crapola.
sebastian-mares: I am wondering what he's doing.
sebastian-mares: It would've been much better for him to be here.
sebastian-mares: This way, I am telling you guys what he told me and I am telling him what you told me.
idimkovic2: yeah
sebastian-mares: And during this process, information can be altered.
idimkovic2: anyway
sebastian-mares: Which sucks.
idimkovic2: let's also make sure >everything is god damn tested< before the next test, please
idimkovic2: I know there was only a week before this one
sebastian-mares: Yeah.
idimkovic2: but it is a pity these things pop up now
sebastian-mares: Ah, Guru is back in the PM center.
sebastian-mares: :-P
idimkovic2: heh :)
idimkovic2: anyways
idimkovic2: Juha did get the new version today (jeez... I have to stop working weekends ;)
idimkovic2: if it is OK - @Juha please then send it to Guru and Seb
idimkovic2: I did frame utilization plots today
sebastian-mares: What do you think about the idea of asking people to compare two encodings, but not mentioning why?
idimkovic2: and they are equal for #41 - a 7-minute track
sebastian-mares: What did you use?
idimkovic2: idea is OK
sebastian-mares: Guru used Matroska or something.
sebastian-mares: :-B
idimkovic2: maybe you can encode a few long tracks
idimkovic2: and ask people to compare them
*** gURuBoOleZZ has joined #nda.
idimkovic2: e.g. a few full songs
sebastian-mares: YAY!
idimkovic2: Hey Guru :)
gURuBoOleZZ: it was long
sebastian-mares: Guru, you missed the whole discussion. lol
JohnV: hi hi
sebastian-mares: Now, let's start over.
gURuBoOleZZ: OK, back on TV. Bye
sebastian-mares: Nah!
idimkovic2: again?!?!?!? :-)
sebastian-mares: I am getting some sweets... ^^
idimkovic2: @Guru - we did have quite a hot discussion
gURuBoOleZZ: So I'm too late?
idimkovic2: anyway
sebastian-mares: Well a bit, but we can summarize it for you.
idimkovic2: well - first thing is
idimkovic2: it is a confirmed bug
gURuBoOleZZ: it would be nice :)
idimkovic2: and I did make a fix today along with the IS thing I'm posting on HA
idimkovic2: Juha is testing it
idimkovic2: -- Sebastian can tell you the rest? :)
sebastian-mares: Oh no... I cannot even eat my cookies in peace...
sebastian-mares: Oh well.
gURuBoOleZZ: I guess that the debate was about 3.1.0.2
sebastian-mares: Yes.
gURuBoOleZZ: not the IS story
idimkovic2: yes
gURuBoOleZZ: OK
gURuBoOleZZ: The problem is that the samples used in the test are... problematic
sebastian-mares: So, in one sentence, Ivan and Juha think that the problem doesn't have such a huge impact that Nero should be removed from the test.
gURuBoOleZZ: go on. I'll follow when you finish
idimkovic2: because at 128 kbps the undercoding stays below the JND for 99% of users
idimkovic2: and overcoding at the beginning
idimkovic2: could have an impact only for a very small group of people
idimkovic2: like... you :)
sebastian-mares: Like you for example.
sebastian-mares: hehe.
idimkovic2: because if you overcode above the JND
gURuBoOleZZ: sorry, what's JND?
idimkovic2: for most people it is still "transparent"
idimkovic2: Just Noticeable Distortion
idimkovic2: threshold of audibility of the noise
idimkovic2: now
idimkovic2: the bug with 3.1.0.2 was
idimkovic2: one parameter in the function was wrong (heh it is always the case, ask developers
idimkovic2: ;)
idimkovic2: and the bit reservoir was drained to approx 10-15 kbits
idimkovic2: (which is still a bit larger than the CBR bit reservoir)
idimkovic2: instead of not being drained
idimkovic2: anyway
idimkovic2: that produced overcoding at the beginning
idimkovic2: but - I still doubt that is really audible afterwards for the huge majority
idimkovic2: let alone significant in the ITU scoring
idimkovic2: because even the rest gets 128+ kbps
idimkovic2: with bit reservoir
idimkovic2: what I do propose
idimkovic2: when we release the fix
idimkovic2: is to compare "fixed" vs 3.1.0.2 on a large track
idimkovic2: not telling why of course
idimkovic2: and see the difference on 5-6-7 people
idimkovic2: is it significant or not
idimkovic2: okay that's all from me ;)
sebastian-mares: The reason for this is...
sebastian-mares: I told Ivan and Juha that for my samples, the complicated part of a song is usually in the range where Nero allocates 155 kbps.
idimkovic2: quick comment: but we disagree on that ;)
sebastian-mares: On a whole track, however, the complicated part is where Nero allocates the normal ~130 kbps.
sebastian-mares: Where we disagree is on the question of whether or not that is significant.
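The two ABX results quoted earlier in the log (WMA Pro, 8/16, pval = 0.598; Nero Digital, 14/16, pval = 0.0020) match a one-sided binomial test against pure guessing (p = 0.5 per trial). Below is a minimal sketch using only the Python standard library. `abx_pvalue` is a name chosen here for illustration, not foobar2000's own code.

```python
from math import comb


def abx_pvalue(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` of `trials` ABX answers right by guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


print(f"WMA Pro       8/16: p = {abx_pvalue(8, 16):.3f}")
print(f"Nero Digital 14/16: p = {abx_pvalue(14, 16):.4f}")
```

The exact value for 14/16 is 137/65536, about 0.00209, which the log shows as 0.0020. By the same formula, a 16-trial run needs at least 12 correct answers to get below p = 0.05 (12/16 gives about 0.038). That threshold is worth keeping in mind for the proposed 5-7 listener comparison.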
idimkovic2: also on the complexity
sebastian-mares: Ivan and Juha claim that it is not significant for the large majority.
idimkovic2: I honestly think it is not true - judging from the Vorbis VBR plots
idimkovic2: (for Vorbis we can be sure it is quality VBR)
sebastian-mares: Still there Guru? :-B
idimkovic2: we bored him :)
idimkovic2: anyway - Vorbis plot ;) http://foobar2000.net/divers/tests/2005.12/Vorbis.png
gURuBoOleZZ: yes, but it's not easy to follow a debate when two people are writing at the same time
sebastian-mares: Ah, OK.
gURuBoOleZZ: finished?
idimkovic2: me: yep
gURuBoOleZZ: seb too?
sebastian-mares: Ivan says that we should let more people compare a full-track encoding against a sample encoding and see if there is an audible difference for the majority.
sebastian-mares: If I got him right.
idimkovic2: yep
JohnV: if the Vorbis bitrate distribution followed Sebastian's thinking about the complexity of the tracks, it would go down. It doesn't.
sebastian-mares: Huh?
sebastian-mares: I don't get it.
gURuBoOleZZ: Let's leave the "complexity" question aside for the moment
gURuBoOleZZ: IMO, samples are not just "complex" at the beginning and quieter at the end
gURuBoOleZZ: Some of them --maybe most-- are as complex at the end
JohnV: you can also see that from the Vorbis bitrate distribution
idimkovic2: which would make my overcoding argument even more important
idimkovic2: useless waste of bits for most people
gURuBoOleZZ: Not only the Vorbis bitrate distribution, but also LAME and iTunes
JohnV: Vorbis is for sure totally quality-based VBR. And very good
JohnV: that's why I mentioned Vorbis.
gURuBoOleZZ: LAME too I'd say
gURuBoOleZZ: iTunes less (it doesn't go below 128 kbps)
sebastian-mares: BTW, in case anyone is interested in a bitrate table for iTunes, LAME, Nero and aoTuV: http://maresweb.homeip.net/listening-test/bitrate%20table.htm
gURuBoOleZZ: About the test, the issue and the way to handle this...
JohnV: imo LAME is not as good as Vorbis.. ;)
gURuBoOleZZ: I don't consider it pertinent to make an additional test to see if the problem is audible or not for a majority
gURuBoOleZZ: Just for example, we could put LAME -V4 or even -V2 instead of -V5
gURuBoOleZZ: and see if a majority of people can tell a difference
gURuBoOleZZ: between -V2 and -V5
gURuBoOleZZ: If they can't, it would be (according to your logic) perfectly acceptable to substitute -V5 with -V2...
gURuBoOleZZ: ...in a 135 kbps listening test
gURuBoOleZZ: Of course it isn't
sebastian-mares: Second, I think you got something wrong...
sebastian-mares: The problem was to find out if there is a difference (quality-wise) between the first part of a song and the second.
sebastian-mares: The first part being over-encoded and the second part being encoded the "normal" way.
gURuBoOleZZ: I understand this
idimkovic2: well in this case it is not the "normal" way - it got fewer bits than it >could<
idimkovic2: if the bit reservoir bug was not there
sebastian-mares: One second folks...
idimkovic2: okay
idimkovic2: just quick info - I think it is very suboptimal to look at "bps" plots - individual frame plots are a more precise way to see how bits are distributed
sebastian-mares: Since both Juha and you admit that there is / was a bug in the Nero encoder, why don't we simply do what we wanted to do - include Nero in the plot, but separate it from the rest.
sebastian-mares: Since it was a bug and not a non-tuned psymodel for example.
idimkovic2: well if you think it is a good idea - I think the difference is rather too small for that, but like I said, it is up to you
idimkovic2: I think the difference would not require such a harsh measure
gURuBoOleZZ: 1.5 ITU point difference with one sample is not something "small"
gURuBoOleZZ: 1.5 = my difference with sample 10
idimkovic2: with one listener, though
gURuBoOleZZ: And?
idimkovic2: and a very exceptional one
JohnV: that's a hard-to-encode attack sample
idimkovic2: okay - but would it statistically change the results?
gURuBoOleZZ: to know this, there's only one accurate way
gURuBoOleZZ: restart a new test, with the same samples and the same listeners
gURuBoOleZZ: and the same competitors
gURuBoOleZZ: The rest is very...
JohnV: if 1.5 is for Guru on a very hard attack sample.. Well.. I don't think it is statistically significant overall
gURuBoOleZZ: well, I can't find the word
gURuBoOleZZ: alchemy?
gURuBoOleZZ: The problems are not "attack"
idimkovic2: huh I don't think it is practical at this point
gURuBoOleZZ: not smearing at least
idimkovic2: if it was discovered 2-3 days into the test
JohnV: sample 10 is an attack sample, and at least for me the problems there come from short blocks
gURuBoOleZZ: yes, but harpsichord is a weird instrument
idimkovic2: it is
idimkovic2: !
idimkovic2: :)
JohnV: but the problems come from short blocks
gURuBoOleZZ: it defeats several lossy encoders and even ReplayGain
gURuBoOleZZ: there's a problem with the sound located between the attacks
JohnV: but during short blocks anyway
gURuBoOleZZ: what I call "distortion", "tremolo" or trembling effect or sometimes brilliance
JohnV: I know what you mean
gURuBoOleZZ: I'm not sure that smearing or pre-echo is responsible for that
gURuBoOleZZ: not the only cause at least
idimkovic2: it can be draining between short blocks
JohnV: it's because of short blocks anyway
idimkovic2: usually if they eat too much from the bit reservoir, the long blocks in between would have a problem
gURuBoOleZZ: But sample 10 is not the only one involved. I detected this problem 5 times during the test
gURuBoOleZZ: And I wasn't careful during the first 9 ones
idimkovic2: how do you rate this problem compared with the WMA issue?
gURuBoOleZZ: WMA issues may be clearly worse
gURuBoOleZZ: Nero Digital ABR issues are more subtle
idimkovic2: @Sebastian - this is what I was thinking, too
sebastian-mares: I never said they are the same.
gURuBoOleZZ: But the WMA score could benefit or be handicapped by the 2-pass method used on short samples
sebastian-mares: I was referring to the fact that both are not representative of real-world usage.
gURuBoOleZZ: On the contrary, Nero Digital encodings are always taking advantage
gURuBoOleZZ: or at best the effect is inaudible
sebastian-mares: Because both produce different output when fed with different material types (sample vs. full track).
idimkovic2: I would say the second
idimkovic2: effect is inaudible
JohnV: if this is statistically significant, shouldn't all testers in principle notice the same as Guru?
idimkovic2: because, excuse me - if iTunes always had a higher bit rate than Nero
idimkovic2: it always had more bits
idimkovic2: now if there was something clearly wrong - we would be handicapped a lot and that would be very noticeable
sebastian-mares: What do you mean by iTunes always had a higher bitrate?
sebastian-mares: file:///C:/Server/Freigabe/listening-test/Bitrate%20Table.htm
gURuBoOleZZ: the effect is not inaudible when someone noticed it blindly on 5 samples
gURuBoOleZZ: without being informed of this problem
sebastian-mares: Sorry...
gURuBoOleZZ: pure discovery
sebastian-mares: http://maresweb.homeip.net/listening-test/bitrate%20table.htm
JohnV: true, then more people should notice it if it's significant, and judge the score accordingly
gURuBoOleZZ: This is guessing
gURuBoOleZZ: You are probably right
idimkovic2: hmm Seb sorry
idimkovic2: I was still thinking of some HA post
idimkovic2: it was wrong then
gURuBoOleZZ: but between "most" people and "all" people, there's a difference
gURuBoOleZZ: the test is summing the results of all participants, not only the weakest ones
gURuBoOleZZ: by weakest, I mean the least discriminating
gURuBoOleZZ: such people couldn't tell any difference between -V5 and -V0
JohnV: the main point is: the samples' complexity is pretty much the same throughout, and if testers find something wrong in the samples, they will score accordingly.
gURuBoOleZZ: but it's not a reason to use the wrong LAME setting
gURuBoOleZZ: either we test the good and valid setting, or we don't test anything
gURuBoOleZZ: but we can't make a mistake, and then ask for a second round to see if a majority of people can hear a difference between the flawed setting and the right one
gURuBoOleZZ: what would you do if 2 people are able to tell a difference and 8 can't
gURuBoOleZZ: ?
idimkovic2: hmmh well it all comes down to the question - is this bug significant enough to separate Nero from the overall results?
JohnV: people rate the sample worse if they hear some problems in it. In all our samples there is potentially lower quality after about the first half of the track than there could be. Which means in principle they should rate the samples worse than after the fix.
gURuBoOleZZ: So what you suggest is to ask all participants to compare 18 samples again with oldnero and newnero?
gURuBoOleZZ: newnero = fixednero
gURuBoOleZZ: oldnero = the tested one
JohnV: no, I'm saying that in principle, if people do the test as they should, we will be rated worse now than after the fix.
idimkovic2: well I proposed to leave the results, put the (*) mark and note the bug, and then also post the samples with the new encoder
idimkovic2: I think the new samples could only be better but still...
JohnV: people lower the score when they hear worse quality..
gURuBoOleZZ: Ivan, you can't leave Nero in the final test if a problem of representativeness was found
gURuBoOleZZ: a problem leading to a difference in sound quality
gURuBoOleZZ: *proved* difference
gURuBoOleZZ: even by someone exceptional
sebastian-mares: If I understood correctly, Juha says that if people really noticed a quality drop, they would've rated Nero according to that.
gURuBoOleZZ: the difference exists
idimkovic2: thing is, Francis - I honestly do not believe the problem is significant enough for such a measure
gURuBoOleZZ: Here is how I rated the affected samples:
sebastian-mares: And he says that we cannot decide which encoders to throw out and which not based on assumptions that the testers only listened to the beginning of the track.
JohnV: sebastian-mares: sure. We have the track there. And after the first half of it, it has a lower bit reservoir than it should. People rate the codecs based on the problems they hear..
gURuBoOleZZ: part A = X and part B = Y. Final mark = (X+Y)/2
idimkovic2: so if the problem is there, it will be rated
gURuBoOleZZ: This is what you think. The fact is that Nero benefits from an unbalanced bitrate distribution
idimkovic2: well I think that is not really true
gURuBoOleZZ: Yes, the problem is rated, but the impact on the mark is limited
idimkovic2: because it is a bug
idimkovic2: it over-allocates where it should not
idimkovic2: so there is a difference in noise-to-mask ratio which is not really the best thing you need
gURuBoOleZZ: there's a big, therefore the problem is true
gURuBoOleZZ: a bug - sorry
gURuBoOleZZ: the over-allocation has a clear impact on sound quality with some samples
idimkovic2: yes
idimkovic2: but it also means under-allocation
idimkovic2: somewhere else
gURuBoOleZZ: it increases the encoding quality compared to a true (full) encoding with the same encoder
idimkovic2: well if you over-allocated somewhere
idimkovic2: you have to under-allocate somewhere else
idimkovic2: for example - the new encoder Juha has
gURuBoOleZZ: There's no under-allocation. Not before 90 seconds
gURuBoOleZZ: curves are showing this
idimkovic2: well, it is under-allocation, Francis
idimkovic2: because if there was no bug
JohnV: but the whole track has a lower bit reservoir after about the halfway point, and thus in principle lower quality (maybe not always audible), but if it's there, people should rate the sample based on the problems they hear.
idimkovic2: these bits would be allocated in the perceptual way
idimkovic2: I can prove that
idimkovic2: with the new encoder
idimkovic2: usually - it goes like this
idimkovic2: frame_bit_demand = PE * bit_Reservoir_grant()
gURuBoOleZZ: BTW, if there's an under-allocation we would get:
idimkovic2: so - in case of the bug
idimkovic2: you get results not scaled with PE
gURuBoOleZZ: Nero samples with over-allocation + under-allocation
gURuBoOleZZ: right?
idimkovic2: I would call that - allocation without regard to perceptual demands
idimkovic2: not scaled
idimkovic2: because
idimkovic2: in ABR mode
sebastian-mares: Ivan, but how come Nero maintains the same bitrate after the first part?
idimkovic2: well like this
idimkovic2: ABR works as follows
sebastian-mares: If it under-allocates, it would mean that the bitrate must go down towards the end or whatever.
idimkovic2: you have a constant bit rate - so each frame >could< have 3000 bits/frame
idimkovic2: which equals ~128 kbps
idimkovic2: ok?
idimkovic2: and the rest is bit reservoir
sebastian-mares: Allocating 32 kbps too much in the beginning, but then also too little at the end.
idimkovic2: which is 500 kbits
idimkovic2: no - let me explain please :)
sebastian-mares: OK
gURuBoOleZZ: http://foobar2000.net/divers/tests/2005.12/8_full_tracks.png
idimkovic2: 500 kbits = 3.90625 seconds of pre-buffering
idimkovic2: so it has 128 kbps + 500 kbits of bit reservoir + X addition to the bit reservoir which is 2.5% of the total size
idimkovic2: okay?
idimkovic2: so each frame could >always< be 300 bits/frame
idimkovic2: 3000 sorry
idimkovic2: and the bit reservoir status is exactly unchanged
idimkovic2: if the frame is larger than 3000 bits - it has to be taken from the bit reservoir
idimkovic2: if it is smaller - it goes TO the bit reservoir
idimkovic2: this is how it works
idimkovic2: now - if you need this, I can provide FAAD2 modified to print out individual frame bit demand
idimkovic2: and you can see for sure we do have frames smaller than average
idimkovic2: otherwise it would be ridiculous
gURuBoOleZZ: I'd like to see a precise analysis of the bitrate
gURuBoOleZZ: I posted mine (imprecise)
gURuBoOleZZ: http://foobar2000.net/divers/tests/2005.12/8_full_tracks.png
gURuBoOleZZ: but eloquent
gURuBoOleZZ: you can use foobar2000 to watch the bitrate evolution
idimkovic2: hmm let me check if I have FAAD2
gURuBoOleZZ: again, not very precise
gURuBoOleZZ: but it confirms the curve
gURuBoOleZZ: posted
idimkovic2: yes but
idimkovic2: even later
idimkovic2: there is still bit reservoir, and frames do get more than 128 kbits
idimkovic2: only less than at the beginning
idimkovic2: and this is undercoding
idimkovic2: because otherwise, the bit reservoir should be at 50% in our implementation
idimkovic2: so it would always have bits to "borrow" if there is a short block
idimkovic2: or something
idimkovic2: in this case - it does not have them
idimkovic2: and therefore in some very isolated cases the difference is audible
gURuBoOleZZ: Interesting, but it doesn't change the most important point: samples provided in the test are different from what the end-user would get
idimkovic2: indeed
gURuBoOleZZ: over-coding + under-coding doesn't mean fair samples
idimkovic2: but then again I think the impact is not so significant
gURuBoOleZZ: it's even worse than pure over-coding
sebastian-mares: And this is the reason why we want to separate Nero from the other contenders in the plot.
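The ABR accounting Ivan describes (a fixed per-frame budget, with larger frames borrowing from the reservoir and smaller ones paying back into it) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Nero's code: the 3000-bit budget (roughly 129 kbps for 1024-sample AAC frames at 44.1 kHz), the 500-kbit reservoir, the 50% starting fill and the ~12-kbit "drained" level come from the chat, while the `simulate_abr` function, the `Frame` record and the 6000-bit burst are hypothetical.

```python
from dataclasses import dataclass

# Figures quoted in the chat; everything else here is an illustrative assumption.
FRAME_BUDGET = 3_000       # average bits granted per frame (~128 kbps)
RESERVOIR_SIZE = 500_000   # reservoir capacity in bits (500 kbits)


@dataclass
class Frame:
    demand: int      # bits the psychoacoustic model asks for (assumed input)
    coded: int       # bits actually spent on the frame
    reservoir: int   # reservoir fill after the frame


def simulate_abr(demands, start_fill=RESERVOIR_SIZE // 2):
    """Toy ABR bit-reservoir accounting; hypothetical, not Nero's implementation."""
    fill = start_fill
    frames = []
    for demand in demands:
        if demand > FRAME_BUDGET:
            # Borrow the excess from the reservoir, but never more than it holds.
            borrowed = min(demand - FRAME_BUDGET, fill)
            coded = FRAME_BUDGET + borrowed
            fill -= borrowed
        else:
            # Return the unused part of the budget; in this toy model any surplus
            # beyond the reservoir capacity is simply discarded.
            coded = demand
            fill = min(fill + FRAME_BUDGET - demand, RESERVOIR_SIZE)
        frames.append(Frame(demand, coded, fill))
    return frames


# 500 kbits of reservoir at 128 kbps is 3.90625 s of pre-buffering, as stated in the chat.
print(RESERVOIR_SIZE / 128_000)

# A burst of 20 demanding frames (e.g. short blocks on an attack), 6000 bits each.
burst = [6_000] * 20
healthy = simulate_abr(burst)                     # reservoir half full (250 kbits)
drained = simulate_abr(burst, start_fill=12_000)  # drained to ~12 kbits, as with the bug
print(sum(f.coded for f in healthy), sum(f.coded for f in drained))
```

With the reservoir at the 50% level Ivan mentions, every frame of the burst gets its full 6000 bits (120000 in total). With the reservoir drained to about 12 kbits, only the first four do and the rest fall back to the 3000-bit average (72000 in total). In this toy model, that is the kind of undercoding Ivan attributes to the later part of each sample, even though its average bitrate stays close to the target.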
gURuBoOleZZ: by worse: far from a true encoding (true = full-track encoding + sample extraction)
gURuBoOleZZ: In "not so significant", the "so" matters
idimkovic2: well I can't argue about that, it is just my remark that marking Nero with * and stating the bug is enough - but if you guys disagree..
gURuBoOleZZ: Could we tolerate an approximation?
idimkovic2: well of course not
gURuBoOleZZ: if yes, why bother with strict bitrate rules (+/- 10%)?
idimkovic2: it is just the amount of penalty
JohnV: But I don't get it.. The over+undercoding is in effect in these samples. People rate the sample based on artifacting. So if the undercoding is significant, it will be rated.
gURuBoOleZZ: if yes, why bother with confidence intervals?
idimkovic2: well bit rate rules are something pretty objective and required
idimkovic2: the quality difference from a bug is something else
gURuBoOleZZ: JohnV: it would be rated only if people are evaluating the full sample
idimkovic2: especially if its significance is questionable
gURuBoOleZZ: 30 seconds is very long
gURuBoOleZZ: I rarely do it myself
gURuBoOleZZ: and rather focus my rating on one or two points
JohnV: gURuBoOleZZ: yes and we can't assume that they aren't. Otherwise we might as well stop these tests altogether, because artifacts can be anywhere
idimkovic2: @Guru - and you would probably spot the point which is clearly worse
gURuBoOleZZ: This is why I requested to use 5-6 second long samples
gURuBoOleZZ: without believing in it
gURuBoOleZZ: I don't like it either when samples are too short
idimkovic2: well if they were too short
idimkovic2: we would not find this problem
gURuBoOleZZ: Ivan> right
idimkovic2: also, I'm repeating myself - please help us test the encoder for the next test
gURuBoOleZZ: I didn't find it when I encoded my 150-sample gallery with the new encoder
idimkovic2: Guru, your help would be very appreciated
gURuBoOleZZ: samples are 10 sec long on average
JohnV: gURuBoOleZZ: I have pretty much the same problem as you.. I was testing hundreds of samples..
sebastian-mares: WTF?
sebastian-mares: Off Topic...
JohnV: most are 10-20 sec max, and I admit I can't listen to all samples all the way through.. I usually know the problematic places where to concentrate.
sebastian-mares: I just received an unencrypted result from someone - how is that possible?
idimkovic2: what the hell
sebastian-mares: From "Antonski".
gURuBoOleZZ: someone who set up the test himself
sebastian-mares: Well, I can't use that.
JohnV: because if I listen to all samples from beginning to end, it would be like 6 hours or something through. :B
idimkovic2: no, not really
gURuBoOleZZ: no of course
sebastian-mares: Have to write to the poor dude...
gURuBoOleZZ: JohnV: my sample gallery is 25 minutes long
gURuBoOleZZ: my official sample gallery I'd say
gURuBoOleZZ: It doesn't include all the other ones ;)
idimkovic2: @Guru - anyways Juha will send you the new enc
idimkovic2: as soon as it is verified
idimkovic2: (that it does not do something nasty)
idimkovic2: I hope this one's ABR won't have this problem
idimkovic2: also - I told Sebastian
idimkovic2: I plan to do a "pre pre test"
idimkovic2: for HE-AAC - testing Nero ABR, CBR and 2 VBR modes
gURuBoOleZZ: I hope this too, otherwise testing would become pretty difficult to set up
sebastian-mares: Well...
idimkovic2: I find it very hard to tell which of those is better
sebastian-mares: Second...
idimkovic2: okay
sebastian-mares: The day is almost over, we are derailing the discussion and I still don't know what to do exactly because the opinions are split.
sebastian-mares: So...
sebastian-mares: Trying to summarize...
sebastian-mares: Ivan and Juha claim that the difference is not so important and does not affect the overall ratings and therefore Nero should be included in the plot just like other encoders.
sebastian-mares: Juha moreover says that if people noticed a difference, they would've rated the sample according to that.
JohnV: imo the only argument to separate Nero is that we assume that people don't listen to the tracks, but only to 5 seconds from the start. Imo, if that is true (we can't really know), then in future tests it doesn't make any sense to include more than 5-6 second clips.
sebastian-mares: And we cannot decide something based on assumptions that XYZ listened to the first seconds only.
sebastian-mares: Francis, on the other hand, claims that the fact that the output Nero produces when fed with samples is different from the output produced when fed with full tracks justifies splitting the plot.
JohnV: but the separation will be essentially based on an assumption about testers' listening habits.
sebastian-mares: Is that correct so far?
idimkovic2: from my side - yes
gURuBoOleZZ: I have some points to detail, but I'd like to express them without seeing too many lines appearing while I'm writing
gURuBoOleZZ: Is that possible?
sebastian-mares: OK
sebastian-mares: So people, Pssst!
sebastian-mares: ^_^
gURuBoOleZZ: Ivan, Juha?
idimkovic2: I'm all ears :)
idimkovic2: eyes... ;)
gURuBoOleZZ: Juha?
JohnV: y
gURuBoOleZZ: in English...
gURuBoOleZZ: ;)
JohnV: y=yes :)
gURuBoOleZZ: y = yes ?
gURuBoOleZZ: ok
gURuBoOleZZ: I'd like to add a few comments to JohnV's argument
gURuBoOleZZ: which is "if people are hearing the problem, they're rating it accordingly. Therefore, nothing to worry about"
gURuBoOleZZ: Is that OK?
gURuBoOleZZ: Does it correspond to what you mean?
JohnV: in principle, because people lower the score when they hear problems
gURuBoOleZZ: in principle, yes
gURuBoOleZZ: I'm currently the only listener who noticed the problem
JohnV: I wouldn't say "there's nothing to worry about". Because this is a stupid mistake, so of course I worry..
JohnV: I don't take this lightly.. that's what I mean
gURuBoOleZZ: Therefore I can talk about the way I rank Nero Digital
gURuBoOleZZ: Usually, I'm rating one or two "interesting" points of the sample
gURuBoOleZZ: A short range of 2-5 seconds
gURuBoOleZZ: The final mark is highly dependent on this precise range
gURuBoOleZZ: But sometimes, especially when the sample is divided into 2 different parts, I split the rating
gURuBoOleZZ: for example: piano + electric guitar
gURuBoOleZZ: piano gets 4 and electric guitar gets 2
gURuBoOleZZ: final rating: (4+2)/2 = 3
gURuBoOleZZ: only when both parts have similar lengths
gURuBoOleZZ: Now Nero
gURuBoOleZZ: For Nero there are sometimes 2 different parts
gURuBoOleZZ: which correspond to 2 quality levels
gURuBoOleZZ: In the case of LesJoursHeureux (the worst to my taste) there's a big difference between both parts
gURuBoOleZZ: wait 3 seconds
JohnV: But, IMO it doesn't mean much how one person rates; the idea of a group test is that everything will be averaged by the group, testing methods included, and then we get the score from this.
*** guruboolez has joined #nda.
guruboolez: hello
JohnV: hi
guruboolez: I was disconnected
guruboolez: dial-up
JohnV: [01:04] wait 3 seconds
JohnV: [01:05] But, IMO it doesn't mean much how one person rates; the idea of a group test is that everything will be averaged by the group, testing methods included, and then we get the score from this.
*** You're not a channel operator!
guruboolez: I haven't finished
JohnV: I thought you did :B
guruboolez: What I mean is that someone like me who found a quality break didn't end the test with a balanced mark
guruboolez: In my case, the fact that the first part sounded better implies that the final mark is higher than what Nero should get
JohnV: that is true, but this happens in a group
guruboolez: Your point, and Ivan's, is that other people shouldn't notice the problem
JohnV: no, we don't know it.
guruboolez: and that only one has found this out
JohnV: The samples are there for everybody to be rated
sebastian-mares: Can I say something?
guruboolez: yes on my side
idimkovic2: yes of course :)
sebastian-mares: Cool...
sebastian-mares: What Guru wants to say is...
sebastian-mares: You have a car...
sebastian-mares: The whole car is worth $100 for example, but has some golden rims.
sebastian-mares: So you pay $300 for it, just because of the rims.
idimkovic2: hehe economist ;)
JohnV: you have to think about this from a group perspective.. That is imo what matters.
sebastian-mares: The golden rims = Nero's advantage.
JohnV: not individuals
idimkovic2: well @Sebastian
idimkovic2: what would happen
idimkovic2: if we had the fixed bit reservoir - it would be more balanced, and Guru might not find the second part worse
idimkovic2: The group might not notice something significant either
sebastian-mares: Yes.
guruboolez: It's only a conjecture
idimkovic2: anyways Juha can send Guru the new encoder
guruboolez: You can assume that the group can't tell a difference between plain-vanilla Vorbis and aoTuV
sebastian-mares: Well, if Nero didn't have the "bonus", people would've rated it worse.
idimkovic2: @Sebastian? Why?
sebastian-mares: Same - if the car didn't have the golden rims, you would've paid $100 for it.
guruboolez: and therefore claim that aoTuV isn't better than SVN encoders
idimkovic2: it might score better on some other parts
idimkovic2: bit reservoir is fixed
idimkovic2: it would just be used differently
guruboolez: most people were unable to tell a difference between r3mix and preset standard
idimkovic2: not in overcoding something that should not be overcoded
idimkovic2: my point is - the fixed version can only perform better if the psymodel is OK, which I honestly believe it is
guruboolez: but it never meant that "only the group matters" and that r3mix is as good as Dibrom's tuning
idimkovic2: and this could be found out in a day or two
JohnV: we don't have any bonus after the first half of the track, on the contrary..
idimkovic2: ... and the "bonus" at the beginning
sebastian-mares: You do on the samples.
JohnV: we have a deficiency after the first half of the track because the reservoir isn't working
idimkovic2: is wasted IMO because
JohnV: as it should
idimkovic2: it is not allocated as it should be
idimkovic2: according to the psymodel
idimkovic2: now Sebastian will probably ask me
idimkovic2: why then we don't separate Nero :)
sebastian-mares: :-)
guruboolez: If a deficiency occurs, we should see it on the curves or in foobar2000
idimkovic2: well there is a deficiency
guruboolez: but it's not the case
idimkovic2: because these bits used at the beginning would be used throughout the track
JohnV: It doesn't matter to people if it's overcoded if they can't hear a difference anyway; what matters in rating is artifacting and problems.
idimkovic2: let's do a small test (Guru) - Juha will send you the new encoder
idimkovic2: so you can check how the bits are spent
JohnV: and we have a deficiency due to this bug after the first half of the track length
guruboolez: the bits used in the beginning are not missing in the rest
sebastian-mares: Yes, but what if Nero would produce severe artificats at the beginning if the bitrate was not 155 kbps, but 130 kbps?
idimkovic2: @Guru - they are missing
sebastian-mares: artifacts*
idimkovic2: they WOULD be used
guruboolez: curves comparing short samples and full tracks are showing this
idimkovic2: well that is easy to find out
idimkovic2: all I am saying
idimkovic2: in ABR mode the bit reservoir is >fixed<
guruboolez: anyway, the problem is not here
idimkovic2: the variable part is dependent on the file size
idimkovic2: how it gets spent... it is up to the encoder
idimkovic2: and the preferable way to spend it
idimkovic2: is according to the psymodel demand
idimkovic2: if not - you get overcoding where it should not happen
idimkovic2: and fewer bits for coding something else
idimkovic2: because we can't go over or under the bit reservoir
idimkovic2: in ABR mode
idimkovic2: it is not VBR
guruboolez: let's stop the technical debate
guruboolez: Sebastian is not asking for an explanation
guruboolez: The samples are benefiting from a massive bitrate bonus, and are maybe handicapped at the end (not proved by any bitrate measurement tool)
guruboolez: Even if the bonus and the handicap were perfectly balanced, the samples are still invalid
guruboolez: They don't reflect anything in real life
sebastian-mares: Since they don't represent real-world usage.
sebastian-mares: Yeah.
JohnV: the codec is invalid, samples are not. Samples have the same problems as real-life tracks.
guruboolez: This is why Nero could only be disqualified
guruboolez: JohnV: no
sebastian-mares: No, because Nero behaves differently.
JohnV: how so?
guruboolez: The encoder is putting the additional bitrate at the beginning of each track
JohnV: the samples illustrate the problem.
sebastian-mares: Dude...
JohnV: the problem was found from these samples
guruboolez: but we haven't tested the beginnings of tracks, but samples coming from the middle, the end, etc..
sebastian-mares: When 20 seconds out of 30 have a high bitrate, 2/3 of the sample is overencoded.
JohnV: looks very real-life to me..
sebastian-mares: If 20 seconds out of 240 have a higher bitrate, it's something else.
guruboolez: Nero sounds better during 20 seconds only
guruboolez: Problem: we have tested these 20 seconds for all 18 samples
guruboolez: What about the major part of full-track encodings?
guruboolez: which are encoded at 133 kbps
sebastian-mares: It's encoded like the remaining 10 seconds of the samples.
guruboolez: all the time
JohnV: but the samples are not rated by length, they are rated according to the artifacting
*** gURuBoOleZZ has signed off IRC (Connection timed out).
*** guruboolez has signed off IRC ().
*** gURuBoOleZZ has joined #nda.
idimkovic2: http://foobar2000.net/divers/tests/2005.12/AAC_Nero_Digital.png - in fact I think it is leaning more towards 10 seconds
sebastian-mares: Yes, but if 1/3 of the track is artifacting, it's different from 98% of the track artifacting.
gURuBoOleZZ: 10 seconds is a lot
idimkovic2: - run in
idimkovic2: IIRC the ABCHR app was set to start not at the beginnign?
gURuBoOleZZ: Especially when you see that the remaining part is still encoded at 130-133 kbps, thus NO UNDER-CODING
idimkovic2: beginning, sorry
idimkovic2: well
sebastian-mares: 2 seconds + encoder offset.
idimkovic2: depends what you call "undercoding"
idimkovic2: it is undercoded compared to the total average bit rate
idimkovic2: which was, IMHO, within the limits
idimkovic2: and at the lower end
gURuBoOleZZ: Pretty simple: the target bitrate is 133 kbps
idimkovic2: of all contenders
gURuBoOleZZ: the beginning is at 150 kbps (~+15 kbps)
idimkovic2: it is not - the target bit rate is the average bit rate of the sample
JohnV: gURuBoOleZZ: but the reservoir is not flexible enough to give the advantage it should
gURuBoOleZZ: But the second part is not lowered by 15 kbps compared to the target
idimkovic2: it is lowered, of course
gURuBoOleZZ: It's not 150 kbps then 115 kbps => average 130 kbps
idimkovic2: it is N % lower than the average rate of the track
gURuBoOleZZ: It's 150 then 130 instead of 133
gURuBoOleZZ: the over-coding is much greater than the under-coding
idimkovic2: if you have N kbits/s of the total bit rate, and the first 10 seconds use 15% more
idimkovic2: how so?
idimkovic2: how can it be?
gURuBoOleZZ: 133-150
idimkovic2: if you measure the total bit rate of the sample
idimkovic2: yes, but Guru
gURuBoOleZZ: simply observe the bitrate
idimkovic2: if the average bit rate of the sample is 137 kbps
idimkovic2: it cannot be "more overcoded than undercoded"
JohnV: well.. imo this overcoding isn't even very unfair bitrate-wise if you check the aoTuV bitrates of these samples..
gURuBoOleZZ: The average bitrate of the sample MUST be 133 kbps
idimkovic2: this is the fallacy of looking at "average bits per second"
*** rjamorim has joined #nda.
idimkovic2: no it must not
rjamorim: BEHOLD
rjamorim: fuckers :)
idimkovic2: heeeey :)
sebastian-mares: Pssst!
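The "more overcoded than undercoded" disagreement comes down to a time-weighted average. A minimal sketch, using illustrative figures loosely taken from the chat (about 10 seconds at 150 kbps, then 130 kbps for the rest); the 30-second clip and 240-second track lengths and the `average_kbps` helper are assumptions for illustration, not measurements from the test:

```python
def average_kbps(segments):
    """Time-weighted average bitrate of (duration_s, kbps) segments."""
    total_kbits = sum(duration * rate for duration, rate in segments)
    total_time = sum(duration for duration, _ in segments)
    return total_kbits / total_time


clip = average_kbps([(10, 150), (20, 130)])    # 30-second test sample
track = average_kbps([(10, 150), (230, 130)])  # 4-minute full track
print(f"{clip:.2f} {track:.2f}")
```

Both readings show up in the numbers. As Ivan argues, a clip's measured average already includes the boost. As Guru argues, the same 10-second boost lifts a 30-second clip about 6.7 kbps above the 130 kbps baseline but a full track by under 1 kbps, so the clip does not reflect what a full-track encode delivers.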
rjamorim: hello, earthlings
idimkovic2: CELEBRITY is here :)
gURuBoOleZZ: I got 133 kbps for all tracks I've encoded that are at least 4 min long
JohnV: noooo
rjamorim: hahaha
gURuBoOleZZ: hello Roberto
idimkovic2: guys I'm really tired
rjamorim: Hello, Francis :)
idimkovic2: let someone else tell Roberto what we discussed
gURuBoOleZZ: I must leave too
idimkovic2: :-)
sebastian-mares: lol
sebastian-mares: Poor Roberto.
rjamorim: Aw :(
idimkovic2: anyway... I have to go
idimkovic2: as if anyone is interested
rjamorim: OK, seeya Ivan
idimkovic2: I am sticking with my opinion
JohnV: bye
idimkovic2: about ranking
idimkovic2: :-)
sebastian-mares: Does anyone mind if I send Roberto the logs?
rjamorim: OK, Sebastian will send me the logs
rjamorim: hehehe
sebastian-mares: Not that it matters, I will do it anyways. :-B
idimkovic2: please!
sebastian-mares: LOL
idimkovic2: cya all
rjamorim: :D
rjamorim: cya, Ivan
gURuBoOleZZ: My opinion is unchanged: unrepresentative sample -> unrepresentative test
gURuBoOleZZ: bye
rjamorim: bye, Francis
* sebastian-mares waves.
rjamorim: Talk to you later
*** idimkovic2 has signed off IRC ().
rjamorim: Got your e-mail BTW :)
rjamorim: Will answer it soon
gURuBoOleZZ: bye :)
*** gURuBoOleZZ has signed off IRC ().
rjamorim: bye :)
rjamorim: heh
sebastian-mares: OK, only the three of us are here now.
sebastian-mares: Juha, still alive?
JohnV: barely. :B
sebastian-mares: ok
rjamorim: haha
rjamorim: what time is it in .fi?
JohnV: 01:31
sebastian-mares: One hour less here.
rjamorim: damn
sebastian-mares: Had a real-time conversation with Guru over e-mail yesterday at 3 AM. :-B
rjamorim: sweet :B
sebastian-mares: OK, log is on its way.
rjamorim: goodie
sebastian-mares: I'm such an idiot.
sebastian-mares: I sent it to myself. >_<
rjamorim: will read
rjamorim: :D
rjamorim: oh man
JohnV: lol
sebastian-mares: OK, sent.
rjamorim: danke [thanks]
sebastian-mares: Keine Ursache [you're welcome]. :-)
rjamorim: up yours!
rjamorim: ok, got it
sebastian-mares: Have fun.
sebastian-mares: 64 KB.
sebastian-mares: 1061 lines.
rjamorim: will save it to txt so that I can read it some place that does line breaking :B
rjamorim: I have read worse
rjamorim: Back when I actively participated in #vorbis
rjamorim: LOL FUCK
rjamorim: too big for Notepad :B
sebastian-mares: lol
sebastian-mares: Are you on Windows 95 or what?
sebastian-mares: Works perfectly fine here.
rjamorim: Mom's Win98
sebastian-mares: Ah.
JohnV: I just think it's pretty harsh if we are disqualified because of a bug which has arguably positive and negative effects. The point is essentially to rate the samples. If people find problems in the samples, then they would score accordingly, that's still my point.
sebastian-mares: But our point is that the first part of the song sounds better than Nero would usually sound.
sebastian-mares: And therefore, the overall rating is higher.
JohnV: It can be argued that in a real-life track there would be more length for the problem section. According to the plots, testers have 10-20 seconds to rate the problems.
sebastian-mares: But that is not the point.
JohnV: How do you know it would be audibly better?
sebastian-mares: Only the fact that the first part has a higher quality leads to a higher ranking.
rjamorim: Because of higher bitrate? :B
JohnV: for most people in most samples?
sebastian-mares: If the whole file was the same, maybe the rating would have been lower.
JohnV: higher bitrate != transparency for a group
sebastian-mares: Like 4.0 instead of 4.5.
sebastian-mares: That IS a difference.
JohnV: but the samples are not rated on the basis that some part sounds good. There are 10-20 seconds which sound worse than they could.
sebastian-mares: How much worse?
sebastian-mares: And how much better does the first part sound?
rjamorim: I just find that behaviour of spreading higher bitrates over the first few seconds and then evening out highly insidious.
rjamorim: It almost sounds like it was tailored for listening tests
rjamorim: Since it would make little difference on normal tracks
sebastian-mares: And it does.
rjamorim: But makes huge difference for 30-second samples
JohnV: These are impossible to answer, The point is, what decides the rating is the bad sound in a clip, and we have 10-20 sec potentially worse sound than there should
sebastian-mares: Guru pointed out that full tracks have totally different bitrate distribution than samples.
sebastian-mares: At least in Nero.
sebastian-mares: Vorbis, LAME and iTunes are almost identical.
JohnV: It almost sounds like it was tailored for listening tests <- doesn't make sense cause we have 10-20 sec lenght worse sound than it could due to bug in reservour which didn't make it flexible enogh
rjamorim: But is that bug related to the higher bitrate distribution in the first 20 seconds?
JohnV: yes, read the log
JohnV: and it's not higher 20 seconds, it's more towards 10 seconds if you look the graphs
rjamorim: Too big log, will read later tonight
rjamorim: And if that bug is related to teh bitrate distribution: How can it make sound WORSE if bitrate is HIGHER?
sebastian-mares: Last part is worse.
JohnV: because reservour goes due to bug too small and not flexible enough
sebastian-mares: Because reservoir was drained or what?
sebastian-mares: OK
rjamorim: Yes, but if people tend to listen to beginning first, teh last part won't be taken so much into account
rjamorim: Hell, in some samples, the worse part probably doesn't even kick in until the very last seconds
JohnV: true, but that is an assumption
JohnV: hmm that's not true
rjamorim: Ican claiming quality impact was minimal is an assumtion too
rjamorim: *Ivan
*** idimkovic2 has joined #nda.
JohnV: it's 20 seconds max, most are towards 10 sec
sebastian-mares: You again?
idimkovic2: nah - I couldn't resist :)
idimkovic2: cannot sleep while this is on ;)
sebastian-mares: Juha's point is that we cannot exclude Nero just because we think people only listen to the first part.
JohnV: rjamorim: nobody noticed this before Guru, it is quite minimal..
rjamorim: Even if it was 5 seconds!
sebastian-mares: :-D
rjamorim: It would be intolerable!
sebastian-mares: Neither can I - I have to decide based on this shit.
JohnV: rjamorim: the bad section last 10-20sec and is present in the samples. People rate samples based on problems. If there are problems in the samples, they should be rated.
sebastian-mares: I think you fail to see one critical point.
sebastian-mares: The first part sounds better than the rest so this makes the score higher.
idimkovic2: huh
idimkovic2: just two things
sebastian-mares: Also, what if Nero had artifacts during the first 5 seconds?
JohnV: How do you know it makes the score higher. We can't even know if the second part makes the score lower. Overall the difference is very small. For Guru in the hard attack samples only 1.5 ITU. What is that for a group..
sebastian-mares: But now doesn't because the bitrate is higher than normal.
idimkovic2: 1. Purely statistically, rest would then "sound worse" if we all agree that the AVERAGE bit rate was within limits
idimkovic2: 2. Perceptually, overcoding something which is already transparent is a WASTE of bits
sebastian-mares: OK, but what about the second claim?
sebastian-mares: "Also, what if Nero had artifacts during the first 5 seconds?"
idimkovic2: what if?
sebastian-mares: Yes.
sebastian-mares: And because we don't know, Nero should be separated.
idimkovic2: well - it would not have artifacts somewhere else if the buffer is the same, then
sebastian-mares: Because the test is boggus.
idimkovic2: anyways
idimkovic2: Guru will get fixed version soon, would be just interesting to see
idimkovic2: if this is the case
sebastian-mares: Could I have the fixed version, too?
idimkovic2: of course
sebastian-mares: Thanks.
idimkovic2: that is the whole point
sebastian-mares: And I don't say to remove Nero totally.
idimkovic2: and also for the HE-AAC I really don't want these problems :(
sebastian-mares: I say to simply mark it so it stands up from the rest.
idimkovic2: I guess nobody here wants them, too
sebastian-mares: And write a note on the graph why Nero can't be compared 1:1 with the rest.
idimkovic2: well I am for mentioning a bug
idimkovic2: but intepreting what that means
idimkovic2: it would be rather too much speculation for my taste, but that's just my 0.02 euros anyway ;)
sebastian-mares: Well...
sebastian-mares: We excluded WMA Standard from the test because encoding samples would've produced different results than encoding full tracks.
idimkovic2: and btw
idimkovic2: but the effect with ND is much much less
sebastian-mares: Now Guru tested the same with Nero and surprise: it also produces two different outputs.
sebastian-mares: But it is there.
idimkovic2: here we go again ;)
sebastian-mares: And it is much more with ND than with AoTuV or LAME.
idimkovic2: anyways - what I propose
idimkovic2: is to mark it and state that it had a bug concerning bit allocation that might, or might not affect the results
idimkovic2: whether it would be better or worse
idimkovic2: I cannot tell, and I do think it would be worse in fact
idimkovic2: because overcoding perceptually is much less important than undercoding
sebastian-mares: We should hurry up anyways. Otherwise, Juha is going to bang his head against the keyboard and fall asleep.
idimkovic2: hahaha
idimkovic2: anyway - one more thing from my side
idimkovic2: for the future of the tests
idimkovic2: please @everyone - add "time-domain bitrate check"
JohnV: because overcoding perceptually is much less important than undercoding <- very important point..
idimkovic2: in "a must" check likst
sebastian-mares: For the future of the tests, provide the "to-be-tested" encoder sooner, please. :-B
idimkovic2: yes :)
idimkovic2: well for HE-AAC I can be sure it would be available in mid-January
idimkovic2: hopefully
idimkovic2: so plenty of time for checking
sebastian-mares: Yeah.
sebastian-mares: Well, I am going to see what I will do with the plot.
sebastian-mares: Tukey's HSD is calculated with Nero anyways.
sebastian-mares: The rest is more or less a design issue.
idimkovic2: well
sebastian-mares: Whether or not we move Nero 200 pixels to the right and state that it had a bug or keep it where it is.
idimkovic2: if you see that considerable amount of people notice this by result interpretation
idimkovic2: then do that separation, my advice
idimkovic2: if not... just menioning should be ok
sebastian-mares: OK
idimkovic2: that it has (had , hopefullY) a bug
idimkovic2: if we lose, just disqualify us hehehe
sebastian-mares: Believe me - you are not losing.
idimkovic2: but I don't think that could ever happen ;)
sebastian-mares: Shine does.
sebastian-mares: ^_^
idimkovic2: lol
idimkovic2: anyway I'm definitely off now
sebastian-mares: Anyways, that should be all.
sebastian-mares: Thanks for participating.
idimkovic2: please check the intensity stereo :)
sebastian-mares: Sure.
idimkovic2: you're welcome
sebastian-mares: What did you get for Christmas, BTW? :-P
idimkovic2: I spent it all alone working ... GF is in the country far away
* idimkovic2 sad
sebastian-mares: Poor you...
sebastian-mares: :-P
idimkovic2: I should go out and have some beer or smth :)
sebastian-mares: Still in Berlin?
idimkovic2: yeah
idimkovic2: I'm prolly moving to Munich, though
sebastian-mares: Ehw...
idimkovic2: but it is ridiculously expensive
sebastian-mares: Bavaria
idimkovic2: real estate
idimkovic2: well.... yeah
sebastian-mares: Stoiber.
sebastian-mares: :-S
idimkovic2: :-)
idimkovic2: lol
idimkovic2: anyways... my GF is there and she's not mobile as I am
idimkovic2: so I have to compromise ;)
idimkovic2: anyway
idimkovic2: I'll be spending 3 days a week in Karlsruhe
sebastian-mares: Here in Karlsruhe, everything is closed. It is since 2 PM actually. Not even Burger King, Pizza Hut and the like are open.
idimkovic2: (gotta find place there, too)
idimkovic2: yeah I know
idimkovic2: I spent 2002 and part of 2003 there ;)
idimkovic2: once I almost starved :)
sebastian-mares: lol
idimkovic2: anywayz guys - have a good night
sebastian-mares: Yeah, I am off too.
idimkovic2: and a happy and successful new year
sebastian-mares: Yeah, for you too.
sebastian-mares: Roberto, still there?
JohnV: happy new year to everybody! :)
*** idimkovic2 has signed off IRC ().
sebastian-mares: :-)
sebastian-mares: Bye!
JohnV: bye :)(
Session Close (#nda): Mon Dec 26 01:07:31 2005