Results of the public AAC listening test @ 96 kbps (July 2011)


These are the summary results of the AAC listening test @ 96 kbps.

You can download a ZIP file containing all results for all samples.

How to interpret the plots: Each plot is drawn with 5 codecs on the X axis and the rating given (1.0 to 5.0) on the Y axis. Each vertical line segment represents the 95% confidence interval (from the ANOVA) for one codec, and the midpoint of the segment marks the mean rating given to that codec. This analysis is identical to the one used in previous listening tests.

One codec can be said to be better than another with greater than 95% confidence if the bottom of its segment is at or above the top of the competing codec's line segment. Note that this is an approximate analysis with some assumptions, notably that the listener gradings are normally distributed. A more reliable, almost assumption-free analysis (bootstrap) is below.

Important note: These plots represent group preferences (for the particular group of people who participated in the test). Individual preferences vary somewhat. The best codec for a given listener depends on that listener's own preferences and preferred type of music.


Plot of the complete result (20 samples, 280 results):

Full plot

Closeup of the interesting results (20 samples, 280 results):

Zoomed plot

Per-sample results

A page with graphics for each sample individually is here.

Bitrate table

The codecs and settings were calibrated to provide ~96 kbps on a large variety of music.

Codec bitrates

These are the bitrates (in kbps) used by the codecs for the samples in the test:

	Sample   Length[s]  nero     QT CVBR  QT TVBR  FhG      CT CBR   Anchor
	--------------------------------------------------------------------------
	01       30         109      108      119      120      100      102
	02        9          75       94       67       77      100       76
	03       13          93      112      102       97      100      101
	04       28         102       99       98      113      100      103
	05       30          95       97       95       99      100       98
	06       20          81       98       84       90      100      105
	07       22         109      107      107      125      100      103
	08       28          94      105       82       95      100       97
	09        9          96       98       95      106      100      104
	10       30          98      106      106      101      100       99
	11       20          96       97       87      104      100      100
	12       15         100      110      101      100      100      100
	13       10         101      101      101       95      100       99
	14       10          89       97       97      105      100      104
	15       19         105      109      113      117      100      101
	16       28          90       96       84       91      100      101
	17       20         104       97       90      105      100      104
	18       18          65       93       67       84      100      102
	19       16         106       98       91      101      100       96
	20       30          90       96       83       83      100       97
	--------------------------------------------------------------------------
	Mean     20.3        95-96   101       93-94   100-101  100      100    
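
The table does not say how the Mean row was computed, or why some entries are given as a range. As a rough sketch, the Python below computes two plausible averages for the nero column: a plain per-sample mean and a mean weighted by sample length. The function names are made up for this illustration. For nero the two values come out at about 94.9 and 95.8, which is consistent with the "95-96" shown, but the source does not confirm that this is how the row was derived.

```python
# Sample lengths in seconds and nero bitrates in kbps, copied from the table.
LENGTHS = [30, 9, 13, 28, 30, 20, 22, 28, 9, 30,
           20, 15, 10, 10, 19, 28, 20, 18, 16, 30]
NERO_KBPS = [109, 75, 93, 102, 95, 81, 109, 94, 96, 98,
             96, 100, 101, 89, 105, 90, 104, 65, 106, 90]


def simple_mean(values):
    """Plain arithmetic mean: every sample counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Mean weighted by sample length: long samples count for more."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    print(f"mean length:         {simple_mean(LENGTHS):.2f} s")
    print(f"nero, per-sample:    {simple_mean(NERO_KBPS):.1f} kbps")
    print(f"nero, length-weighted: {weighted_mean(NERO_KBPS, LENGTHS):.1f} kbps")
```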
    

Bootstrap analysis:

	bootstrap.py v1.0 2011-02-03
	Copyright (C) 2011 Gian-Carlo Pascutto 
	License Affero GPL version 3 or later 

	Reading from: results_AAC_2011.txt
	Read 6 treatments, 280 samples => 15 comparisons
	Means:
	    Nero      CVBR      TVBR       FhG        CT  low_anchor
	   3.698     4.391     4.342     4.253     4.039     1.545

	Unadjusted p-values:
		  CVBR      TVBR      FhG       CT        low_anchor
	Nero      0.000*    0.000*    0.000*    0.000*    0.000*
	CVBR      -         0.128     0.002*    0.000*    0.000*
	TVBR      -         -         0.059     0.000*    0.000*
	FhG       -         -         -         0.000*    0.000*
	CT        -         -         -         -         0.000*

	CVBR is better than Nero (p=0.000)
	TVBR is better than Nero (p=0.000)
	FhG is better than Nero (p=0.000)
	FhG is worse than CVBR (p=0.002)
	CT is better than Nero (p=0.000)
	CT is worse than CVBR (p=0.000)
	CT is worse than TVBR (p=0.000)
	CT is worse than FhG (p=0.000)
	low_anchor is worse than Nero (p=0.000)
	low_anchor is worse than CVBR (p=0.000)
	low_anchor is worse than TVBR (p=0.000)
	low_anchor is worse than FhG (p=0.000)
	low_anchor is worse than CT (p=0.000)

	p-values adjusted for multiple comparison:
		  CVBR      TVBR      FhG       CT        low_anchor
	Nero      0.000*    0.000*    0.000*    0.000*    0.000*
	CVBR      -         0.130     0.005*    0.000*    0.000*
	TVBR      -         -         0.107     0.000*    0.000*
	FhG       -         -         -         0.000*    0.000*
	CT        -         -         -         -         0.000*

	CVBR is better than Nero (p=0.000)
	TVBR is better than Nero (p=0.000)
	FhG is better than Nero (p=0.000)
	FhG is worse than CVBR (p=0.005)
	CT is better than Nero (p=0.000)
	CT is worse than CVBR (p=0.000)
	CT is worse than TVBR (p=0.000)
	CT is worse than FhG (p=0.000)
	low_anchor is worse than Nero (p=0.000)
	low_anchor is worse than CVBR (p=0.000)
	low_anchor is worse than TVBR (p=0.000)
	low_anchor is worse than FhG (p=0.000)
	low_anchor is worse than CT (p=0.000)
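
To give an idea of what a bootstrap comparison of two codecs involves, here is a minimal Python sketch of a paired bootstrap test on the mean rating difference. It is not bootstrap.py: the source does not describe that script's resampling scheme or its multiple-comparison adjustment, so the function name, its parameters, and the ratings in the usage example are all assumptions made for illustration.

```python
import random


def paired_bootstrap_pvalue(ratings_a, ratings_b, n_resamples=9999, seed=0):
    """Two-sided bootstrap p-value for the mean of paired differences.

    ratings_a[i] and ratings_b[i] must belong to the same submitted result
    (same listener, same sample), so each difference is a paired comparison.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be paired")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    observed = sum(diffs) / n
    # Shift the differences to mean zero. Resampling from the shifted values
    # simulates the null hypothesis that the two codecs are rated equally.
    centered = [d - observed for d in diffs]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        resample = rng.choices(centered, k=n)  # draw n values with replacement
        if abs(sum(resample) / n) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical ratings, NOT taken from the test results.
    codec_x = [4.0, 5.0, 4.0, 5.0, 4.0, 5.0, 4.0, 5.0, 5.0, 4.0]
    codec_y = [3.0, 3.0, 2.0, 3.0, 3.0, 2.0, 3.0, 3.0, 3.0, 2.0]
    print(paired_bootstrap_pvalue(codec_x, codec_y))
```

Unlike the ANOVA, this kind of test does not assume normally distributed gradings. The p-values it produces are unadjusted; with 15 pairwise comparisons a multiple-comparison correction is still needed on top.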
    

ANOVA analysis:

	FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
	Blocked ANOVA analysis

	Number of listeners: 280
	Critical significance:  0.05
	Significance of data: 0.00E+00 (highly significant)
	---------------------------------------------------------------
	ANOVA Table for Randomized Block Designs Using Ratings

	Source of         Degrees     Sum of    Mean
	variation         of Freedom  squares   Square    F      p

	Total             1679        3200.32
	Testers (blocks)   279        1020.15
	Codecs eval'd        5        1666.66  333.33   905.53  0.00E+00
	Error             1395         513.51    0.37
	---------------------------------------------------------------
	Fisher's protected LSD for ANOVA:   0.101

	Means:

	CVBR     TVBR     FhG      CT       Nero     low_anch 
	  4.39     4.34     4.25     4.04     3.70     1.55   

	---------------------------- p-value Matrix ---------------------------

		 TVBR     FhG      CT       Nero     low_anch 
	CVBR     0.333    0.007*   0.000*   0.000*   0.000*   
	TVBR              0.084    0.000*   0.000*   0.000*   
	FhG                        0.000*   0.000*   0.000*   
	CT                                  0.000*   0.000*   
	Nero                                         0.000*   
	-----------------------------------------------------------------------

	CVBR is better than FhG, CT, Nero, low_anchor
	TVBR is better than CT, Nero, low_anchor
	FhG is better than CT, Nero, low_anchor
	CT is better than Nero, low_anchor
	Nero is better than low_anchor
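
The "is better than" lines can be reproduced from the means and Fisher's protected LSD printed above: two codecs differ significantly at the 0.05 level when their means differ by more than the LSD. The Python sketch below applies that rule to the rounded means from the output. The function name is an assumption for this illustration, and because the input is rounded the check is only approximate near the threshold.

```python
from itertools import combinations

# Values copied from the ANOVA output above (means rounded to two decimals).
LSD = 0.101
MEANS = {
    "CVBR": 4.39,
    "TVBR": 4.34,
    "FhG": 4.25,
    "CT": 4.04,
    "Nero": 3.70,
    "low_anchor": 1.55,
}


def significant_pairs(means, lsd):
    """Return (better, worse) pairs whose means differ by more than the LSD."""
    ranked = sorted(means, key=means.get, reverse=True)
    # combinations() keeps the ranking order, so the first name in each
    # pair always has the higher (or equal) mean.
    return [(a, b) for a, b in combinations(ranked, 2)
            if means[a] - means[b] > lsd]


if __name__ == "__main__":
    for better, worse in significant_pairs(MEANS, LSD):
        print(f"{better} is better than {worse}")
```

With these inputs, only CVBR vs. TVBR and TVBR vs. FhG fall below the LSD, which agrees with the p-value matrix.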
    

Notes:

The graphs show a simple ANOVA over all submitted and valid results. This keeps them comparable with the graphs of previous listening tests, but they should only be considered a visual aid to the real analysis.

For a correct calculation of statistical significance, and to judge whether any conclusions can safely be drawn, refer to the bootstrap output.

Post-screening:

Invalid results were discarded according to the following criteria, which were made public at the beginning of the test:

Contact

IgorC: igoruso@gmail.com