

Registered · 999 Posts · Discussion Starter #1
Just had to point out some blind testing of the DACs and low-level electronics (stereo only) of two AVRs and a pre-pro, done by Steve Callas here... and now reposted in post #6 below.

Nice work, Steve and company. But as the length of the thread that ensued at AVS shows, as they say, "no good deed goes unpunished." :T

Bob
 

Registered · 703 Posts
Hi Steve, Bob.

Good stuff, I would have never seen it without the link...thanks for posting it here!

I gave up on AVS a long time ago. You have knowledgeable guys (like Steve, Ed, Jack, etc.) who try to keep the discussions on track and fair to all OEMs. Then you have the “mutual admiration society” (and everyone knows who they are.. :) ): a handful of self-appointed experts who validate each other’s opinions 50x a day. Stroke their collective ego and you win their support (and the next “listening test” if you are an OEM). If you don’t play that game… they will collectively target you and do whatever they can to discredit you on the forum(s). Steve refuses to play the game, so you’ll see the same 2-3 guys always sniping at anything he posts now.


Tom V.
SVS
 

Registered · 2,398 Posts
Yep, you nailed it, thanks Tom. The funniest part (or should that be most pathetic?) is that their "leader" will go about creating new screen names so he can voice the same opinion more than once, as if to lend more credibility to his arguments :rolleyes:
 

Registered · 2,398 Posts
Sure:

Yesterday, myself and a few other enthusiasts got together to conduct a processor blind listening test. Our goal was to determine whether or not the different DACs and analog prestages of different receivers and pre/pros can affect sound quality. Amplification was not being tested, just processor sound quality, so a discrete amp was used to do all of the amplification. The units we used, in my opinion, were a good representation of various levels of product that most of us in this hobby will consider.

Processors:

Pioneer VSX 1014 – Essentially the same unit as the newer 1015, this receiver has become the standard for entry-level receivers. Plenty of features, decent amp section, and a reputation as being great for movies and not so great for music.

Harman Kardon AVR 635 – Not a high-end receiver by any means, but definitely regarded as a step up from entry level. Tons of features, a beefy amp section, and a reputation as one of the top receivers with regard to sound quality.

Audio Refinement Pre-2DSP – A dedicated AV preamp processor that is regarded as another step up from receivers. While this unit is not the most expensive and doesn’t have the longest list of features, it is regarded as being one of the most musical pre/pros, having great sound quality.

Amplifier:
PS Audio HCA 2 – Quality 2 channel amp that is also regarded for its great sound quality.

Speakers:
Totem Acoustics Forrests

CD Player:
Panasonic DVD S77

Cables:
RS Gold analog stereo, Dayton Audio digital coax, DIY dual 14 gauge twisted pair speaker wire

The participants were Jon, his girlfriend Gudrun, my friend Tyler, and myself. ---k---, a member from htguide.com, and another professor from Purdue were all scheduled to come as well, but they wimped out…..buncha wimps. Again, since we were only interested in testing processing, we used the digital coax output from the CD player to each unit and then the analog preouts from each unit to the amplifier. All of the equipment (aside from the speakers, of course) was kept in a second room with doors shut so neither it nor the moderator was visible to the listeners – the speaker cables ran under the door. Jon will post a few pictures of the setup when he replies to this thread.

We made sure to eliminate all variables that might affect the sound but aren’t related to the actual processing, so EQ, tone controls, distance settings, subwoofer functions, etc., in each processor were turned off. The units simply had to decode the incoming digital stream and send full-range signals out to the speakers with no post processing. We didn’t use a subwoofer because each unit may have different crossover slopes or bass management methods, and that could affect what we heard in a way not attributable to the DACs or analog prestage. We didn’t test surround sound quality because I have already had discussions with an algorithm engineer from Dolby about whether or not different DSPs can affect what info is steered to different channels or whether they can affect sound quality.

The units were calibrated to each other using a test CD with a wide-band pink noise tone and a digital RS SPL meter mounted to a tripod placed at the main seating position. We plugged in the L channel output from one unit and adjusted the master volume until we registered 66 dB from the tone. Then we unplugged the L channel, plugged in the R channel, and adjusted the individual R channel settings until it also read 66 dB. Then we plugged in both channels and measured the output, which in Jon’s room was 70 dB. We did this with each unit until we had identical output levels.
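
As a quick sanity check on those numbers (my addition, not part of the original write-up, just textbook dB math): two incoherent sources that each measure 66 dB should power-sum to roughly 69 dB, which is in the same ballpark as the 70 dB reading Jon's room produced. A minimal Python sketch:

Code:
import math

def spl_sum(*levels_db):
    """Power-sum of SPL readings in dB, assuming the sources are incoherent."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

print(round(spl_sum(66, 66), 1))  # ~69.0 dB; room reflections and channel correlation
                                  # can push the in-room reading a little higher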

Our test consisted of two parts. The first part was to test whether or not there were sound quality differences between units – not which unit we preferred, not what differences we noticed, JUST whether or not we heard differences. The second part, which was to be conducted only if our first test statistically proved to us that differences did exist (based on a 70% accuracy level), was to test which unit had the best sound quality based on our preferences. We wrote out the three combinations of units – HK vs PI, PI vs AR, and HK vs AR – on three strips of paper, folded them up, and placed them in a basket. Three of us reached into the basket and selected a piece of paper, which we then put in our pockets. When it was someone’s turn to be moderator, they opened up the paper, and were allowed to use only those two units for playback during their test. This way, nobody knew what pieces of equipment were being tested when. The fourth person (who won this spot through rock, paper, scissors) had free rein to use all three units during their testing.
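
Just to illustrate that randomization step (this is my paraphrase of the paper-in-a-basket draw, not code anyone actually ran; the moderator labels are made up):

Code:
import random

pairs = ["HK vs PI", "PI vs AR", "HK vs AR"]
random.shuffle(pairs)  # stands in for folding the strips and drawing them blind

# Three moderators each draw one pairing; the rock-paper-scissors winner
# moderates a fourth test and may use all three units.
assignments = dict(zip(["moderator 1", "moderator 2", "moderator 3"], pairs))
print(assignments)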

The procedure for each test is written down on our log sheets, so see attached. If anyone has any questions or if it is not clear enough, feel free to ask. I'd write it all out here again, but then this post would be nearly twice as long, and none of us want that. So just take a look at the attached log sheets, maybe zoom in a bit, and read the procedure. One difference is that we did not have to start on a silent track; as you'll read later, I inserted 3.5 seconds of silence at the beginning of each track, so the mod just had to cue up the track number and press play. The audible difference part consisted of four tests with three listeners and one moderator for each test, and the moderator changed for each test. There were three listening positions, left, middle, and right, and the listeners rotated their seats between tests so everyone got a chance at each seat. Listeners were not allowed to speak of the test or share any impressions at all until all four tests were done. The moderator could not be seen or heard in the other room behind the doors. We all did a few dry runs as both listeners and mods so that everyone was clear on how to conduct the test – it is difficult to explain, but very easy and intuitive in practice.

The songs we chose for the audible difference testing were selected back in January. Each participant chose a couple of songs that they both enjoyed and were confident that they knew very well. They then sent me these songs, and I isolated a ~35 second clip from each song that we agreed captured its essence and a range that we felt would be easy to distinguish if audible differences were present. I compiled these clips, in addition to the full songs, on CDs and sent them out to each participant in early February. By doing this, each participant was able to listen to and become very familiar with the songs and the exact clips we would be using for this audible difference testing for over three months. Basically, by the time the test finally took place, the participants knew the samples through and through. A note of interest is that I received the HK 635 earlier this week and found that it will mute the first second or so of playback from a digital stream, so at the last minute, I had to pull up the clips again and add 3.5 seconds of silence to the beginning of each clip. In doing this, I eliminated any chance of this oddity tipping us off as to whether the HK was being used. Taking this into consideration, I think we successfully covered all aspects of the test that could have possibly kept it from truly being blind.
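
The post doesn't say which editor was used to pad the clips; for anyone who wants to do the same thing, here is one way with the pydub library (the file names are made up for the example):

Code:
from pydub import AudioSegment  # pip install pydub (needs ffmpeg for non-WAV input)

clip = AudioSegment.from_file("arousing_thunder_clip.wav")

# Prepend 3.5 seconds of silence so a unit that mutes the first second of a
# digital stream doesn't give itself away during playback.
padded = AudioSegment.silent(duration=3500) + clip
padded.export("arousing_thunder_clip_padded.wav", format="wav")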

Before we get to the results, I just want to make some points clear so we can avoid some of the nastiness that resulted from our last test. Whether you agree or disagree with our results is fine, just don’t try to convince us otherwise, as we just spent a 10-hour day testing. We aren’t trying to pass off our test as a given fact in every single circumstance for every single person, but our results are fact in this listening room with this equipment with these people. If you disagree with some part of the methodology, that is fine, just politely express it as a logical point and I will address it. If you don’t agree with our results, DO NOT try to find imaginary faults in our test to justify your position.

The raw results from the audible differences test showed that as a group, we were correct 61 times out of 120, or 51% accuracy. To break it down by comparison:

HK vs PI – correct 21 out of 39, or 54% accuracy
PI vs AR – correct 15 out of 36, or 42% accuracy
HK vs AR – correct 19 out of 30, or 63% accuracy

To break that down further, these are the results we got when removing the trials in which the moderator chose to use the same unit twice in a row. In other words, these results are purely of the direct comparison of switching from one unit to the other, and because of that, the most significant in our opinion.

HK vs PI – correct 18 out of 33, or 55% accuracy
PI vs AR – correct 9 out of 24, or 38% accuracy
HK vs AR – correct 10 out of 18, or 56% accuracy

To examine it a different way, here are the results by person:

Jon – correct 14 out of 30, or 47% accuracy
Gudrun – correct 16 out of 30, or 53% accuracy
Tyler – correct 12 out of 30, or 40% accuracy
Steve – correct 19 out of 30, or 63% accuracy

No combination resulted in 70% or greater accuracy, and no single person achieved greater than 70% accuracy. Because of this, and because we agreed afterwards that it was very difficult to pick anything out to base a decision on, we did not continue on with the sound quality preference testing.

The closest we came to statistically proving there were audible differences was with the HK vs the AR, using the song Arousing Thunder by Grant Lee Buffalo, which has some bass from a drum being struck throughout the clip. As a group, we were correct 12 out of 15 times, or 80% accurate. Tyler had actually taken down a few notes during this test, and on Trials 3 and 5 he jotted that the second playback had heavier or deeper bass – the HK was used for the second playback on both of those trials. Later in the evening, we did a quick test of the HK using its internal amplification vs the AR using the PS Audio amp, and I also noted that the first playback had more punch to the bass – it turned out to be the HK as well. Unfortunately, I don’t know how much significance we can draw from only 15 samples on that combination with that song. Our collective score of the HK vs the AR never got higher than 63%. If we had more time, we could have examined this further, but it was already into the night and we needed to refit the baseplate on Jon’s kickass subwoofer.
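
For anyone who wants a rough read on those scores, here is a quick significance check (my addition, not part of the original test) using scipy's binomial test; it asks how often a score at least that high would come up from pure guessing:

Code:
from scipy.stats import binomtest  # scipy >= 1.7

scores = [
    ("group overall", 61, 120),
    ("HK vs AR, Arousing Thunder clip", 12, 15),
]
for label, correct, trials in scores:
    p = binomtest(correct, trials, p=0.5, alternative="greater").pvalue
    print(f"{label}: {correct}/{trials} correct, p = {p:.3f}")

# 61/120 is essentially what coin-flipping produces; 12/15 would happen by
# guessing only about 2% of the time, but it is a single 15-trial sample.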

To be honest, the results were pretty surprising to me. Had you asked me prior to a few months ago whether DACs made a difference, I would have said no. But in doing my research for a new receiver purchase, I came upon several firsthand user reviews from this website and others, some from users whose opinions I really respect, claiming that different DACs truly do make a difference. So in the last few months, I thought for sure we would be able to identify differences…..I guess not. If we were able to measure level-matched outputs of the same clips from two units on a computer screen, we might notice that small differences do exist, but in actual practice, they were not readily discernible. Will this test affect my purchasing decision as I claimed it would for months leading up to it? Yes. My HK 635 has a couple of glitches and needs to go back. Since I will be using a Carvin HD1800 to power my mains, this test proves to me I can buy a less expensive receiver and still get the same sound quality from the processing. A Pioneer 1015 might be the ticket.

As a side blind test, one that I have always wanted to do but never got around to, mainly because I haven’t drunk a soda in years, we tested Pepsi vs Coke. There was a pretty big difference between the two that we all picked up on: one had a lot more carbonation and had a hint of citrus, the other was sweeter and smoother, almost more syrupy tasting. The only problem was that Gudrun and I assumed Coke was the more carbonated soda, so we were incorrect, but it still stands that the difference between the two is quite evident.

Big thanks to Jon for hosting and providing us with a nice spread of food. And Jon, though I said it like 20 times yesterday, that audio rack looks great! I want to get started on mine asap.
 

Attachments

Plain ole user · 11,121 Posts
Steve,

I appreciate what you are doing. Hope you don't get beat up as much as on AVS. You never replied to my question regarding the value of single subject longitudinal learning studies to identify differences that may be hard to detect. It would also remove the between-subject variability that complicates this kind of research. It would really put the golden ears to the test. Do you have any opinions?
 

Registered · 2,398 Posts
lcaillo said:
You never replied to my question regarding the value of single subject longitudinal learning studies to identify differences that may be hard to detect
Not 100% sure what you are asking, could you elaborate? I think you are asking if it would be best for the listeners to undergo some training to become more adept at discerning differences?
 

Banned · 22,577 Posts
Thanks Steve! Hopefully it will be more welcomed here, although I realize we don't have near as many members to see it... but it is great content and very much appreciated.
 

Plain ole user · 11,121 Posts
SteveCallas said:
Not 100% sure what you are asking, could you elaborate? I think you are asking if it would be best for the listeners to undergo some training to become more adept at discerning differences?
If someone wanted to really begin to understand what can and what cannot be perceived as differences between given components, one would find those who suggest that they can hear differences and use them as subjects in learning trials, where they attempt to hone their skill at determining differences and see if they perform better over time and statistically better than random. If they can't determine differences reliably over time, and cannot improve performance with time, and this is confirmed with a range of individuals, it would be pretty convincing evidence that things are equal. In the cases where differences are discovered, it would be a perfect opportunity to learn more about what affects musical reproduction. Studies that use groups of individuals inherently introduce between-subject variability that is poorly accounted for. Within-subject performance is much more relevant.
 

Registered · 1,585 Posts
I think I got to the 4th or 5th page before I gave up..

however, the test was interesting.. and revealing. I think I would have had a similar 'prejudice' to yours, i.e., I expected a more obvious difference between the different units. Excellent.. thanks for taking the time to do your tests.

For lcaillo:
I think I got lost too.. are you saying that they should run the tests over several days/weeks/etc because after being exposed to the various units for that long, they'll be more likely to see/hear a difference? and with more people in the study the results are more relevant?

JCD
 

Plain ole user · 11,121 Posts
If what you want to know is whether differences exist, you have to consider the variables that might affect that determination. In the human observer you have a test instrument with lots of variables. Add several humans together and you really confuse the picture. It would be like using several uncalibrated scopes to measure something, where each one was drifting with time and you had no idea whether any of them were even similar in their ability to track a given signal.

What I suggest is to pick the best scope that has the most likelihood of finding differences, and tune it over time to get the biggest difference, i.e., let the golden ear learn to distinguish the differences as well as he/she can over time and see if it can be done. If it can, the conditions, listening material, components, levels, etc. will inform our understanding of what is audible and what is not. Now repeat the experiment with others of differing skills and see what you learn.

No one would ever trust an experiment described above with the unreliable scopes, but if that is all you had, how would you resolve the problem of reducing and controlling for the variability? You would need to learn about the within-unit variation for each unit, figure out what it is capable of, and then do what you can to learn what you are trying to learn. You would not simply assume that there are no differences. Since you only need to determine if there are differences, you can get by with the sloppy measuring devices, as long as you control for the within-device variability. Measurements over time can identify this variability. If there is a learning effect, then, since humans are analogous to an expert system, there likely are differences. If the variability remains the same or increases, you can pretty much conclude that the differences, if any, are not significant.
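
To make the idea concrete, here is one way such a single-subject learning analysis could be scored (the numbers are hypothetical, purely to show the shape of it, not data from any actual sessions):

Code:
from scipy.stats import linregress

# Hypothetical: one listener, six sessions of 20 forced-choice trials each.
sessions = [1, 2, 3, 4, 5, 6]
correct = [9, 11, 10, 13, 14, 15]
accuracy = [c / 20 for c in correct]

fit = linregress(sessions, accuracy)
print(f"slope = {fit.slope:+.3f} per session, p = {fit.pvalue:.3f}")

# A positive, statistically significant slope would suggest the listener is
# learning to hear a real difference; a flat line over many sessions is
# evidence the units are effectively indistinguishable for that listener.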
 

Plain ole user · 11,121 Posts
Steve, don't take my suggestion as criticism of what you have done. I think you did a great job. Any research, however, needs to be considered for how we can improve in future experiments. I am just trying to discover how to learn more from this type of research.

My perspective on within- vs between-subject variability is likely a little different than most because I have experience in the matter from my study in the field of motor learning. My MS thesis, in fact, dealt with a very similar issue, looking at the efficacy of human "experts" in the evaluation of EMG signals. It is a very similar problem, in that the goal was to develop better tools to interpret EMG, at a time when virtually all of the interpretation of the type that we were doing was done visually by these experts.

There are some real problems with using multiple human observers as a source for experimental data, particularly when their intentions, goals, and abilities vary. There are some real advantages, in terms of what you can learn, to using single-subject repeated measures over time.
 

Registered · 2,398 Posts
First off, no worries, I don't take it as criticism - this is good discussion.

Ok, now I understand what you are saying. While it definitely makes sense, and would help to get closer to scientifically determining for certain whether audible differences can be detected by one person, I think it falls slightly outside the scope of our test. Rereading our goal, I guess we should have worded it a bit differently, to "Our goal was to determine whether or not the different DACs and analog prestages of different receivers and pre/pros can affect perceived sound quality in a typical listening environment under normal listening conditions", or something to that effect. The testing methods were structured to make any differences that may exist as a result of the DACs and analog prestage more easily noticed than under normal listening conditions, to basically err on the side of making sure we would hear the differences if we could. The test results would likely influence my own purchasing decision, and I didn't want to take any chances of denying myself any potential sound quality benefits.

So basically, Jon and I were doing it more for our own sake and curiosity than to try to establish a scientific fact which can be applied to any and all circumstances. Kinda touched on with "We aren’t trying to pass off our test as a given fact in every single circumstance for every single person, but our results are fact in this listening room with this equipment with these people". If we wanted to go further, or even just know for certain whether differences truly exist, we would have just run the analog preouts into a computer, recorded the same clip played back through the different units, and then examined the results.
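
For anyone curious what that capture-and-compare would look like, here is a rough sketch (the file names and the idea of scoring the leftover residual are my assumptions, not something we actually ran): align the two recordings, level-match them, and see how far below the music the remaining difference sits.

Code:
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

# Two mono captures of the same clip, one recorded from each unit's preouts.
rate_a, a = wavfile.read("unit_a_clip.wav")
rate_b, b = wavfile.read("unit_b_clip.wav")
assert rate_a == rate_b, "captures must share a sample rate"
a = a.astype(np.float64)
b = b.astype(np.float64)

# Align the captures: find the lag that maximizes their cross-correlation.
n = min(len(a), len(b))
lag = int(np.argmax(correlate(a[:n], b[:n], mode="full", method="fft"))) - (n - 1)
if lag > 0:
    a = a[lag:]
else:
    b = b[-lag:]
n = min(len(a), len(b))
a, b = a[:n], b[:n]

# Level-match b to a with a least-squares gain, then look at what's left over.
gain = np.dot(a, b) / np.dot(b, b)
residual = a - gain * b
residual_db = 10 * np.log10(np.sum(residual**2) / np.sum(a**2))
print(f"gain offset {20 * np.log10(abs(gain)):+.2f} dB, "
      f"residual {residual_db:.1f} dB below the signal")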
 

Guest · 0 Posts
Actually, I'm finding (I've quit reading the Amps forum simply due to time constraints so this is the first time I've seen the thread) the responses on AVS quite interesting. Some people will always take the opportunity to argue about something not on topic, but I'm surprised how little of the objectivist/subjectivist war has bled over into the thread.

Interesting tests, btw, Steve. I think the only way you're going to find a difference in DACs is between two with significantly different bit widths.
 

Guest · 0 Posts
One thing that comes through clearly from many of the responders is the extent to which people build their knowledge base from marketing, rather than from learning the scientific principles involved. The most frustrating arguments are with those who quote from marketing materials and will not be disabused of the veracity of those materials. Essentially, we have a country full of people who will believe a snake oil salesman before a scientist. (Of course, with the global warming "debate" we now have a hybrid testifying...)

Makes one even more depressed.

Anyone remember the Duckman drug episode?
 

Banned · 357 Posts
Steve, next thing you know they will be discrediting the nose test for poop vs spaghetti. The thing about AVS is that it is oversaturated with under/un-educated fanboys, trolls, and such. I loved it when someone PM'ed me asking why I built a sub instead of just buying an SVS. I replied that it teaches me nothing to whip out the wallet and drop the coin on something pre-built and marked up, when I can buy the necessary parts and build something equivalent to a unit costing 1.5-2x as much.

~Bob
 

Banned · 22,577 Posts
Okay guys.... we might be heading off topic here. Let's try to stay straight on this and not let it turn into another AVS discussion.... please? With Sweet'N Low on top? :dontknow:
 