
Home Theater Shack 2015 High-End Amplifier Evaluation Event Reporting and Discussion Thread




:fireworks2:
:fireworks1:




This thread is a continuation of the High-End Amplifier Evaluation Event Preparations Thread previously under way.



The event has begun. Coming to you from southern Alabama, the Home Theater Shack Evaluation Team has assembled at Sonnie Parker's Cedar Creek Cinema for the 2015 High-End Amplifier Evaluation Event. We have amps, we have speakers, we have tunes, we have great eats, what more could one ask for?

Be reminded of the first law of audio evaluation event execution. They never go exactly as planned. Not everything gets there, not everything works, but you endeavor to persevere and get things done.

We have dealt with speakers not able to reach us in time, with cabling issues, with equipment not interfacing properly, a laptop crash, with hums and buzzes and clicks and pops, with procedural questions - - - yet we forge ahead, adapt, evolve, redirect, and forge ahead some more - - - and the task of evaluating amplifiers is underway.

Speakers: We were unable to get the Chane A5rx-c and the Acoustic Zen Crescendo Mk II speaker pairs. We are running the Spatial Hologram M1 Turbo v2 and the Martin Logan ESL. Both are very revealing speakers, baring a lot of inner detail in our recordings. They will serve us well. The A5rx-c will be reviewed for HTS when available.

At the moment, the Holograms are serving as our primary evaluation tool. I will post setup details and interesting discoveries a little later. They are giving us a monstrous soundstage, the kind that eats small animals for breakfast, with extremely sharp imaging and very good depth acuity. They are extremely clear, getting into the realm of rivaling electrostatic transparency. Their in-room response is very good, with some expected peaks and dips, but still very listenable. The high frequency response is extended and smooth. The bass gives you that "Are you sure the subs are not on?" feeling on deeper tracks.

We decided to start with sighted comparisons and open discussion today, and blind tests tomorrow. The Audyssey XT32 / Dirac Live comparison has not been completed yet.

Have we heard differences? Yes, some explainable and some not. One amp pairing yielded differences that several evaluators are convinced they could pick out in a blind AB test.

One thing I have learned for sure: The perfect complement to good southern barbeque is a proper peach cobbler. Add great company and you have a perfect get-together.

The Event
  • Date: Thursday evening, March 12th through Saturday evening, March 14th.
  • Place: Cedar Creek Cinema, Alabama, hosted by Sonnie, Angie, and Gracie Parker.
  • Evaluation Panel: Joe Alexander (ALMFamily), Leonard Caillouet (lcaillo), Dennis Young (Tesseract), Sonnie Parker (Sonnie), Wayne Myers (AudiocRaver).

The Amplifiers
  • Behringer EP2500
  • Denon X5200 AVR
  • Emotiva XPA-2
  • Exposure 2010S
  • Krell Duo 175
  • Mark Levinson 532H
  • Parasound HALO A31
  • Pass Labs X250.5
  • Sunfire TGA-7401
  • Van Alstine Fet Valve 400R
  • Wyred 4 Sound ST-500 MK II
The Speakers
  • Spatial Hologram M1 Turbo v2, courtesy Clayton Shaw, Spatial Audio
  • Martin Logan ESL
Other key equipment special for the event:
  • Van Alstine ABX Switch Box, recently updated version (February 2015)
  • miniDSP nanoAVR DL, courtesy Tony Rouget, miniDSP
  • OPPO BDP-105

As mentioned, our deepest appreciation goes to Sonnie, Angie, and Gracie Parker, our hosts, for welcoming us into their home. Look up Southern Hospitality in your dictionary, and they are (or should be) listed as prime role models thereof.

This first posting will be updated with more info and results, so check back from time to time.




Amplifier Observations
These are the observations from our notes about what we heard, limited to those that were consistent between sighted and blind testing and across reviewers. While we failed to identify the amps in ABX testing, the raw observations from the blind comparisons did in some cases correlate with the sighted observations and with the observations of other reviewers. Take these reports for what they are: very subjective assessments and impressions which may or may not be accurate.


Denon X5200 AVR

Compared to other amps, several observations were consistent. The Denon had somewhat higher sibilance, was a bit brighter, and while it had plenty of bass it was noted several times to lack definition found in other amps. At high levels, it did seem to strain a bit more than the other amps, which is expected for an AVR compared to some of the much larger amps. Several times it was noted by multiple reviewers that it had very good detail and presence, as well as revealing ambiance in the recordings.

We actually listened to the Denon more than any other amp, as it was in four of the blind comparisons. It was not reliably identified in general, so one could argue that it held its own quite well, even compared to the most expensive amps. The observations from the blind comparisons that had some common elements, either between blind and sighted comparisons or between observers, are below. The extra presence and slight lack of bass definition seem to be consistent observations of the Denon AVR. Everyone agreed, however, that the differences did not give any one amp a definitive advantage that would lead us to not want to own or listen to another, so I think we can conclude that the Denon held its own and was a worthy amp to consider.

Compared to Behringer
- bass on Denon had more impact than Behr, vocals sounded muted on Behr
- vocals sounded muted on ML compared to Denon
- Denon: crisp highs preferred compared to Behringer which is silky.
- Denon is more present, forward in mids and highs than Behringer.

Compared to Mark Levinson
- Denon seemed to lack low end punch compared to ML.
- Denon is smooth, a certain PUSH in the bass notes, cellos & violins sounded distant, hi-hat stood out, distant vocal echo stood out, compared to ML.
- Denon bass seemed muddy compared to ML which is tighter.
- ML more distant strings than Denon.
- Denon is slightly mushy and fat in bass. String bass more defined on ML.
- ML seems recessed compared to Denon.

Compared to Pass
- vocals sounded muffled on Pass compared to Denon
- crisp bass on Denon compared to Pass
- Denon & Pass both even, accurate, transparent, natural, no difference, like both
- Pass seems soft on vocals but very close.
- Denon has a bit more punch on bottom, maybe not as much very deep bass, more mid bass.

Compared to Van Alstine
- bass on Chant track was crisp for VA while Denon was slightly sloppy
- sibilance not as pronounced on VA as it was on Denon
- VA super clarity & precision, detailed, space around strings, around everything compared to Denon which is not as clear, liked VA better.
- sibilance on Denon, VA has less “air” but more listenable, both very good
- Very deep bass more defined on VA, overall more bass on Denon.


Wyred 4 Sound ST-500 MK II

In the sighted listening we compared the ST-500 MK II to the Van Alstine Fet Valve 400R. The assessments varied but were generally closer to no difference. The Van Alstine got comments of being fatter on the bottom. The Wyred 4 Sound was noted to have slightly better bass definition but apparently less impact there, and slightly less detail in the extreme highs. Most comments about the midrange noted little, if any, difference. An interesting observation here came from Wayne, who noted that he did not think he would be able to tell the difference in a blind comparison. Considering the ST-500 MK II is an ICE design and the Fet Valve 400R is a hybrid, we expected this to be one of the comparisons that would yield differences, if any. As I am always concerned about expectation bias, this was one that I was particularly concerned with. Van Alstine is a personal favorite for a couple of us, so I expected a clear preference for it to show up in the sighted comparison. I felt that the Wyred 4 Sound amp held its own against the much more expensive and likely-to-be-favored VA.

In the blind comparisons, we compared the ST-500 MK II to the Emotiva XPA-2 and the Sunfire TGA-7401 in two separate sessions. Of course, in these sessions we had no idea what we were listening to until after all the listening was done. In the comparison to the Emotiva, some notes revealed not much difference and that these were two of the best sounding amps yet. The ST-500 MK II was noted to have the best midrange yet, along with the Emotiva. It was described as having less sibilance than both the Emotiva and Sunfire. Both the Emotiva and the ST-500 MK II were described as unstrained in terms of dynamics. In comparison to the Emotiva it was noted to have solid highs, lively dynamics, rich string tones, and punch in the bass. The overall preference in comparison to the Emo ranged from no difference to preferring the W4S.

In comparison to the Sunfire, comments ranged from preference for the W4S to not much difference to preference for the Sunfire. The Sunfire was described as having more presence in the midrange, while the Wyred was noted to be shrill, lifeless, and hollow by comparison.

These comments varied a lot, but the points of convergence were generally around the similarities among three amps that would be expected to be the most likely to differ, if we found any differences at all. The objective result is that we failed to identify the amp in ABX comparisons against two other much more expensive amplifiers. I would have to conclude that, based on the results, the ST-500 MK II represents one of the best values and certainly should satisfy most listeners.





Audyssey XT32 vs. Dirac Live Listening Comparison

Last year HTS published a review of the miniDSP DDRC-22D, a two-channel Dirac Live Digital Room Correction (DRC) product. The review included a comparison to Audyssey XT. A number of readers requested a comparison of Dirac Live with Audyssey XT32. That comparison was recently completed during the Home Theater Shack High-End Amplifier Evaluation Event at Sonnie Parker's Cedar Creek Cinema in rural Alabama. This report provides the results of that comparison.

Go to the Audyssey XT32 vs. Dirac Live Listening Comparison Report and Discussion Thread.


Spatial Hologram M1 Turbo Speakers

I was very pleased with the Spatial Hologram M1 speakers we used for the amplifier evaluation, and felt that they more than fulfilled our needs. They did not become "gotta have them" items for any of the evaluators, although I had thoughts in that direction once or twice. But they were speakers we could easily ignore through the weekend. I mean this as a high compliment. Never did an evaluator complain that the M1 speakers were "in the way" or "holding us back," and we were able to focus on the task at hand unhindered. That alone means a lot, and may say more about them than the rest of the review just completed.

Here is what they did for us:
  • Because of their high efficiency, amplifiers were not straining to deliver the volumes we called for. We could be confident that the amps were operating in their linear ranges and that if we heard a difference it was not due to an amp being overdriven (see the rough power estimate after this list).
  • The stretched-out soundstage opened up a lot of useful detail for us to consider in our evaluations. In discussing the soundstage at one point, there was a consensus that it might be stretched a little too far and might be "coming apart at the seams," showing some gaps, although this did not hinder our progress. My final assessment is that this was not the case, all due respect to the fine ears of the other evaluators. I elaborate on this point in the M1 Review.
  • They served well as a full-range all-passive speaker, able to reach deep and deliver 40 Hz frequencies with lots of clean "oomph," all without the need for DSP boosting and without subwoofer support.
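Out of curiosity, here is a rough back-of-the-envelope way to see why a high-efficiency speaker keeps the amps loafing. The sensitivity and listening-distance figures below are assumed illustrative numbers, not measurements from the event, and the simple free-field math ignores room gain, so real rooms need even less power.

Code:
import math

def watts_for_spl(target_spl_db, sensitivity_db_1w_1m, distance_m, speakers=2):
    # Watts per channel needed to hit target_spl_db at the seat, using plain
    # inverse-square distance loss and ~+3 dB for a stereo pair (worst case,
    # since room gain is ignored).
    distance_loss = 20 * math.log10(distance_m)   # dB lost relative to 1 m
    pair_gain = 10 * math.log10(speakers)         # ~+3 dB for two speakers
    required_gain_db = target_spl_db - sensitivity_db_1w_1m + distance_loss - pair_gain
    return 10 ** (required_gain_db / 10)          # dB above 1 W converted to watts

print(f"{watts_for_spl(100, 94, 3.5):.0f} W/ch for a 94 dB speaker")   # ~24 W
print(f"{watts_for_spl(100, 88, 3.5):.0f} W/ch for an 88 dB speaker")  # ~97 W

The point is simply that at our listening levels, a sensitive speaker leaves every amp in the rack with headroom to spare.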
I thoroughly enjoyed spending time with them, and wish to again thank Clayton Shaw of Spatial Audio for loaning them to us. A complete review of the M1 speakers has been posted.

Go to the Spatial Hologram M1 Turbo Version 2 Speaker Review.


A Soundstage Enhancement Experience

Sonnie's MartinLogan ESL hybrid electrostatics were set up very nicely when we arrived, so we avoided moving them through the weekend. There were some improvements made to the soundstage and imaging by way of treatments, and some interesting twists and turns along the way which turned out to be very informative.

I have documented the exercise in a separate post.

Go to the Soundstage Enhancement Experience thread.
I wish I could have said it better! Thank you to all the evaluators for their time and money to accomplish this task, and thanks to Chashint for articulating what I could not. Great job guys.
A complete review of the miniDSP nanoAVR DL has been posted HERE. Where appropriate, comparisons are made to Audyssey XT32, focusing on the end-user experience.

A post has been added to this thread titled Audyssey XT32 (without Pro Kit) vs Dirac Live End User Experience Comparison Summary of Audible Characteristics. Read it HERE.
Your prior evaluation event threads have been beyond outstanding.

What happened? Were you guys issued a legal gag order? 21 pages of nothing conclusive, but hints of both sides are right talk.

Can you at least post the cumulative measured response graphs together on one graph for us pretty please. I appreciate your efforts. This thread ended up a real disappointment for me. :(

I wish you would have tested inefficient bookshelves as I suggested in the preparation thread. Both sets of speakers were high efficiency.

Please give us the data conclusive or not. What are you waiting for? :huh: Does this mean this blind test resulted in a statistical null result?
See post 177. There were virtually no measurable differences that amounted to anything in either frequency response or impulse response. I have spent hours trying to find something and it just is not there. I really expected to find something in the impulse response, but it was actually more similar than the frequency response measures.
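For anyone who wants to repeat that kind of check on their own measurements, here is a minimal sketch of the comparison described above. It assumes two plain-text frequency-response exports with frequency and dB columns; the file names and column format are placeholders, not the event's actual data files.

Code:
import numpy as np

def load_trace(path):
    # Expects two columns: frequency in Hz, level in dB.
    data = np.loadtxt(path, comments=("*", "#"))
    return data[:, 0], data[:, 1]

def deviation_db(path_a, path_b, f_lo=30, f_hi=20000):
    fa, la = load_trace(path_a)
    fb, lb = load_trace(path_b)
    grid = np.geomspace(f_lo, f_hi, 500)      # common log-spaced frequency grid
    diff = np.interp(grid, fa, la) - np.interp(grid, fb, lb)
    diff -= diff.mean()                        # ignore the overall level offset
    return np.abs(diff).max(), np.abs(diff).mean()

mx, avg = deviation_db("amp_A_response.txt", "amp_B_response.txt")
print(f"max deviation {mx:.2f} dB, mean {avg:.2f} dB over 30 Hz-20 kHz")

If the max deviation stays down in small fractions of a dB, the traces are effectively the same, which is consistent with what Leonard describes above.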

The bottom line is that we were not able to reliably detect differences nor reliably identify differences in ABX comparisons. The things we noted hearing that had any correlation at all across listeners and sessions are detailed in the first post. They were mostly about the amps that we heard in more than one comparison, which leads me to believe that with more focused listening on fewer amps we might be able to find some differences that hold up. Frankly, we probably had too many amps and tried to do too much. The next time we probably won't have more than 2 or 3.

Sorry to disappoint, but it is what it is. I don't think that it is fair to post every uncorrelated comment we wrote down on every amp. I am still uneasy with posting as much as I did, because there may have been only one comment that supported each, and that is far from reliable. We can conclude that what we THINK we hear is highly variable and suspect.

We will have a much better idea about how to approach it next time.
Your prior evaluation event threads have been beyond outstanding. What happened? Were you guys issued a legal gag order? 21 pages of nothing conclusive, but hints of both sides are right talk. Can you at least post the cumulative measured response graphs together on one graph for us pretty please. I appreciate your efforts. This thread ended up a real disappointment for me. :( I wish you would have tested inefficient bookshelves as I suggested in the preparation thread. Both sets of speakers were high efficiency. Please give us the data conclusive or not. What are you waiting for? :huh: Does this mean this blind test resulted in a statistical null result?

I think this is being kinda rough on these guys.
This evaluation produced the same results as ALL other level matched amplifier evaluations.
Since you are disappointed/dissatisfied with it, perhaps you will put together your own amplifier comparison and post the methods/results here for critique.
I wish the test provided some conclusion is all. A null result is why blind testing is often criticized. I do think that a proper test method with this many amps would take more listeners, more time and a better room setup. I care less about the listener impressions than seeing the hard data in graph form. In a blind test, fatigue and the stress to answer "right" bias listener results. Are we testing the amps vs listeners, or the test vs. human psychology?

Numbers and hard data are unquestionable. In future I would recommend finding a set of speakers known to give amps trouble at key frequencies (probably bass impedance/inductance swings) and measure output vs frequency over a specific bandwidth. Then vary the output voltages higher and run the test over and over. Some amps will simply not "wake up" speakers until a certain voltage is reached. I recommended bookshelves because bass is often not their strong suit and poor amps can result in thin sounding speaker response.

Remove the human element and the test becomes faster and conclusive. Then we can try to find out what the measurements mean in a listening experience. :)
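To make the impedance-swing idea above concrete, here is a small sketch of why a difficult load can separate amplifiers on paper: the voltage that actually reaches the speaker depends on the divider formed by the amp's output impedance and the speaker's impedance curve. The speaker impedance model and the two output-impedance values are made-up illustrative numbers, not measurements of any amp or speaker in this event.

Code:
import numpy as np

f = np.geomspace(20, 20000, 400)   # frequency axis, Hz

# Crude illustrative impedance magnitude: ~3.2 ohm minimum, a bass resonance
# hump near 55 Hz, and a gently rising top end.
z_speaker = 3.2 + 6.0 / (1.0 + (np.log2(f / 55.0) / 0.5) ** 2) + f / 4000.0

def response_swing_db(z_out):
    # Level shaping caused by the Zout / Zload voltage divider, referenced to
    # its own mean so only the frequency-response shape matters.
    level = 20 * np.log10(z_speaker / (z_speaker + z_out))
    level -= level.mean()
    return level.max() - level.min()

print(f"low output impedance (0.05 ohm): {response_swing_db(0.05):.2f} dB swing")
print(f"higher output impedance (0.8 ohm): {response_swing_db(0.8):.2f} dB swing")

With a benign, flat-impedance speaker both numbers collapse toward zero, which is one reason a hard-to-drive load would be the place to look for measurable differences.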
I wish the test provided some conclusion is all. A null result is why blind testing is often criticized. I do think that a proper test method with this many amps would take more listeners, more time and a better room setup. I care less about the listener impressions than seeing the hard data in graph form. In a blind test, fatigue and the stress to answer "right" bias listener results. Are we testing the amps vs listeners, or the test vs. human psychology? Numbers and hard data are unquestionable. In future I would recommend finding a set of speakers known to give amps trouble at key frequencies (probably bass impedance/inductance swings) and measure output vs frequency over a specific bandwidth. Then vary the output voltages higher and run the test over and over. Some amps will simply not "wake up" speakers until a certain voltage is reached. I recommended bookshelves because bass is often not their strong suit and poor amps can result in thin sounding speaker response. Remove the human element and the test becomes faster and conclusive. Then we can try to find out what the measurements mean in a listening experience. :)
No matter the initial conditions, variables introduced or placebos controlled, someone will always be dissatisfied with test results. That's the nature of testing. Hard data is not always the absolute arbiter of conclusion. Questionable recording practices and post-manipulation can come into play. I seriously doubt any of our panel engaged in such integrity-robbing practices. Rather, I believe they conducted themselves with the highest professionalism and exercised due diligence in set-up. Room acoustics and speaker positioning were already dialed in to the nth degree before the trials began. And why use hard-to-drive, specialty speakers unless their ownership proliferated throughout the mass market?

On one hand, you ask for hard data, but on the other you speak of "waking up" speakers; Where's the hard data for that? Remove the human element, and you have (drum roll, please) The Terminator Syndrome: machines measuring machines producing physical phenomena for other machines. The results can hardly be soothing. Right, Ahhnolt?

Sent from my iPad using HTShack
Whatever the shortcomings of blind listening tests are I have yet to see any other method proposed that is better.
If the proposal is to let measurements be the end of the argument then the electrical measurements on the many and various amplifiers that have been published in HiFi magazines should (if you understand basic electronics and orders of magnitude) lead to the conclusion that properly functioning amplifiers that are not overdriven will sound so similar that they are unidentifiable from each other in listening tests.

Every time any group starts an amplifier "shootout" there seems to be a groundswell of hope that there will finally be a conclusion that's different from all of the other controlled amplifier listening efforts that have come before.
But alas, if the levels are carefully matched and the listeners do not know which machine is powering the speakers the machines all sound the same.
Chashint wrote:

But alas, if the levels are carefully matched and the listeners do not know which machine is powering the speakers the machines all sound the same.
Yep, couldn't have said it better. But those were some pretty serious amps. Judging by the ridiculous price drops Sonnie resorted to....not the price range most of us are at. Maybe it would have been nice to throw in some more reasonably priced contenders. Kudos for making it happen though, it's nice to know the extra dollars are better spent elsewhere. At least that's what I'm taking away from it.
A $300 AVR is not going to provide stable voltage into a difficult load.

Lou, you are right, in the end a machine can't tell us how another machine is interacting with our senses. But we cannot reliably predict any machine will interact between 2 different listeners the same way. I think this is why the test I mentioned is a good place to start, then listen to the key ranges once they are determined by initial screening. Then you can correlate objective to subjective tests and see if a pattern emerges. Wayne has already done this to an extent in his test of an Axiom amplifier into speakers. He would be a key resource in designing this test.
Chashint wrote: Yep, couldn't have said it better. But those were some pretty serious amps. Judging by the ridiculous price drops Sonnie resorted to....not the price range most of us are at. Maybe it would have been nice to throw in some more reasonably priced contenders. Kudos for making it happen though, it's nice to know the extra dollars are better spent elsewhere. At least that's what I'm taking away from it.
Some, maybe even most, people would spend their dollars elsewhere. Some for the reasons you stated; others for reasons dealing with mob mentality. Still others place high value on certain differences, even if only perceived. So perceived or not, there's nothing wrong with someone spending more if the difference is important to them. Sure build quality, craftsmanship, and appearance play an influential role in how one amp sounds over another. Sure blind tests say otherwise. The hobby is big enough for both camps. Each just uses different machines to accomplish the same task.

Sent from my iPad using HTShack
A $300 AVR is not going to provide stable voltage into a difficult load. Lou, you are right, in the end a machine can't tell us how another machine is interacting with our senses. But we cannot reliably predict any machine will interact between 2 different listeners the same way. I think this is why the test I mentioned is a good place to start, then listen to the key ranges once they are determined by initial screening. Then you can correlate objective to subjective tests and see if a pattern emerges. Wayne has already done this to an extent in his test of an Axiom amplifier into speakers. He would be a key resource in designing this test.
Yes, I see that now. It's more complicated than just differences between machines. Two listeners have different perceptions of reality and react to stimuli differently. We can take that one step further by repeating that even one particular person may not react to an amp/song/speaker the same way from moment to moment. Psychological and physiological factors influence how we react to what we hear whether it be ABX stress or wishful thinking. For that matter, something as trivial as a grocery list can distract us from discerning differences. Welcome to the machine! :)

Sent from my iPad using HTShack
A $300 AVR is not going to provide stable voltage into a difficult load. Lou, you are right, in the end a machine can't tell us how another machine is interacting with our senses. But we cannot reliably predict any machine will interact between 2 different listeners the same way. I think this is why the test I mentioned is a good place to start, then listen to the key ranges once they are determined by initial screening. Then you can correlate objective to subjective tests and see if a pattern emerges. Wayne has already done this to an extent in his test of an Axiom amplifier into speakers. He would be a key resource in designing this test.
How do you know a $300 AVR is not going to be stable into a difficult load?
Is there any data to back up that statement?

Since you know what and how you want a test to be conducted, why not do it instead of asking other people to do it?
If you were to publish the methodology, conduct the test, and then publish the results I (and probably others) would read it from start to finish.
BluRockinLou wrote:

So perceived or not, there's nothing wrong with someone spending more if the difference is important to them.
We all have our paradigms! I just bought a very nice set of Michelins for my truck. Could have spent a lot less, but I like them!
I wish the test provided some conclusion is all. A null result is why blind testing is often criticized.
This test did provide a very informative conclusion based on the original criteria. It's just not the conclusion you want it to be.

In future I would recommend finding a set of speakers known to give amps trouble....
You want to do a specialized test to find the best amp to drive hard loads. That is not what this test was about.
BluRockinLou wrote: We all have our paradigms! I just bought a very nice set of Michelins for my truck. Could have spent a lot less, but I like them!
Me too.
I have been meaning to post my own observations and conclusions from the event. The posts over the last few days have prompted me to go ahead and get that done.

Thanks Leonard for being willing to take the time to dig through all of the data the way that he did. He has far more patience at that than I, and no one could have done a better job.

I had hoped that we would have much clearer results than we did, either that there were clear differences that we could prove, or that there were none we could hear at all. Instead we ended up with some of us able to hear some differences some of the time and only a little data to prove it. Concerning data which can be said to support any consistency of findings across the whole listening panel, Leonard has found what was there to be found and reported it already.

I will go ahead and post my individual observations for what they are worth, but only to be taken with a huge grain of salt, because they are impressions and that is all. If the others wish to post their impressions, they are welcome to do so.

First of all let me describe my evaluation process. I personally feel this is quite important because different people seem to have different ways of going about this. And if one has a listening style that works for him, that should probably be taken into account in the design of the blind testing that person will engage in. In other words, I might be able to set up a valid blind test method that would work great for me and throw Joe or Dennis or Leonard completely off, while there may be an approach that would work perfectly for some of them and leave me flat.

And this is one of the great difficulties in setting up tests like this. Someone with a good background in audio, acoustics, psychoacoustics, and testing sits down and figures out a really good double-blind ABX test method. Then five people walk into the room, and it happens to fit the listening approach of only one of them; he does well, but the other four fail miserably, and the test overall shows no statistically significant data supporting the ability to tell a difference. Had the test been set up another way it might give a different result.

The ABX testing that we did made it a necessity that the evaluators rely upon extremely fine details being held in auditory memory for 30 seconds to a minute to be used in an AB comparison. Dennis appeared to do very well with this, while the rest of us did not. For me that was extremely difficult, as the fine differences that we were hearing were simply not something that I could capture in memory and carry forward in that way into a comparison 30 seconds to a minute later. Maybe with practice I could learn to do so but at the event I was not able to.

Here is what worked well for me. I felt fairly confident about the differences that I was hearing between amplifiers in the sighted testing we did on the first day. The two amplifiers were set up, their levels were matched, we knew which was which, and we held the A/B switch in our hands while we listened to our own selected listening tracks. As I listened through my tracks I switched back and forth freely between the two amplifiers. Over time I started to recognize that there were certain passages of each track that seemed more likely than others to help reveal differences between the amplifiers, so I focused more on those parts of the tracks, but I also listened to other passages just in case something new would pop up.
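As an aside on the level matching itself, here is a minimal sketch of how the trim amount could be worked out, assuming each amp's output is measured as an RMS voltage at the speaker terminals with the same test tone. The voltage figures are placeholders, not readings from the event.

Code:
import math

def gain_offset_db(v_ref_rms, v_test_rms):
    # Level of the test amp relative to the reference amp, in dB.
    return 20 * math.log10(v_test_rms / v_ref_rms)

offset = gain_offset_db(2.83, 2.95)   # e.g. 2.83 V reference vs 2.95 V on the test amp
print(f"test amp runs {offset:+.2f} dB hot; trim its input level by that amount")

Matching to within about 0.1 dB is the usual goal for comparisons like this, since even sub-dB level differences tend to be heard as "better" rather than "louder."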

When I heard a difference I tried to make a note of what part of what track I heard it on and what I heard, and when I felt I had time I would go back and repeat it to be sure that the difference was distinctive and easily identifiable. Remember, these judgments were not absolute in any way but extremely comparative in nature, as will be seen in my impressions of some of the amplifiers that follow. By switching back and forth during those critical passages, I felt the contrast almost jumped out sometimes when the switching was done at just the right moment. Given the ability to do that repeatedly with a pair of amplifiers, I got to the point where I was pretty confident I could identify the difference consistently.

So if I were to set up a blind comparison around that listening style and try to get statistical data to show I could do it consistently, here's how I would go about it. I would start out with the pair sighted so I knew which was which, go about the test as I have described, and identify the characteristics comparatively between the two amplifiers. Then I would leave the room and have the test setter-upper flip a coin to decide whether or not to swap the two amplifiers. When I came back into the room I would know it was the same two amplifiers but would not know if they had been switched or not. So my task would be to sit down and listen, switching back and forth, and try to come to the same conclusion as I did before, comparing with the same tracks, and identify which of the amplifiers was A and which was B.

When done, I would leave the room and we would do the whole thing again, maybe 10 times in a row in a day. At the end of the day, if with this process I was able to identify the amplifiers correctly, say, nine times out of ten, that would be a significant result.
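A quick sanity check on that intuition (my own sketch, not part of the event's analysis): under the null hypothesis of pure 50/50 guessing on each trial, the binomial tail probability tells you how unlikely a given score is.

Code:
from math import comb

def p_at_least(k, n, p=0.5):
    # Probability of getting k or more correct out of n trials by chance alone.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"9/10 correct by guessing: p = {p_at_least(9, 10):.4f}")   # ~0.011
print(f"7/10 correct by guessing: p = {p_at_least(7, 10):.4f}")   # ~0.172

So nine out of ten would indeed clear the usual 5% bar, while seven out of ten would not, which is why a short run of trials is so hard to draw conclusions from.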

On another day, the same process could be followed with another pair of amps. You can see that this could turn into quite a long, drawn-out process with multiple people and multiple amplifiers. You can also see that someone else might try the same method and have it absolutely not work for him at all. And it would become difficult to find a way to work with the listening preferences of each listener and still end up with what one can call statistically valid results, because it would almost end up being a different kind of test for each listener.

That is a problem that I see with throwing around broad statements like, can you prove it in a double-blind study? Which double-blind study? Who sets it up? What are the conditions?

Some will say that it is wrong to tailor the test to the listener, that it invalidates the study right off the bat. And again, I would say that it depends on how you define what you are trying to accomplish. For the kind of differences we are talking about, I will go out on a limb and predict that if the only way it is approached is by trying to come up with a single generic test that has to fit all listeners with their different critical listening styles, then the testing is bound to show that those differences cannot be heard consistently across a broad listening audience under that kind of test.

But if somebody gets their gumption together to define an approach that accommodates individual listening styles and crunches the numbers together at the end, and includes the information about what those styles were, then we may someday end up with a real in-depth test that shows that those differences can be discerned consistently. This could even be done in a way which accommodates those who prefer long-term listening tests, as some say that is the only way to really hear some of the fine differences. That has not been my experience so far, but it would be very close-minded of me to assume that it cannot work for someone else, or even to assume that it would not work for me if I really gave it a proper chance over time.

I would like to note one observation that I find somewhat humorous. In the ABX testing, my success rate at identifying the X amplifier was the worst of the whole bunch of us. I was wrong six out of seven times. In a way, that result is the most statistically significant of all the listeners at the event; I just had a mental flip-flop of some kind going on that led me to the wrong answer almost every time.


My Observations

These differences are comparative in nature, and almost impossibly small. I would never expect to be able to walk into a room and hear one of these amplifiers playing and say, "Hey, I recognize that particular sound as being the Parasound amp," or the Krell amp or any other particular amp. And with my experience at this so far, I would be suspicious of anyone who claims that they could.


Day 1, Sighted Pairings:

1 - Krell vs Parasound:
Krell, bigger sound
Parasound, not as big

2 - Denon vs Mark Levinson
Denon, brighter
Mark Levinson, rolled-off high end

3 - Emotiva vs Pass Labs:
Emotiva, slightly bigger bass
Pass labs, tighter
This was a fun pairing, I liked both amps.

4 - Van Alstine vs Wyred4Sound, no difference noted

5 - Behringer vs Sunfire
Behringer, less bass
Sunfire, more bass

6 - Exposure vs Krell, no difference noted


Day 2, Blind Pairings:

1 - Denon vs Behringer
Denon, crisp highs
Behringer, silky highs
I preferred the Denon.

2 - Denon vs Mark Levinson
Denon, bass not as clear
Mark Levinson, bass seemed tighter, clearer
I missed the rolled off high frequencies of the Mark Levinson, which I heard in sighted testing.

3 - Exposure vs Parasound, no difference noted

4 - Wyred 4 Sound vs Emotiva
Wyred 4 Sound, solid highs, lively dynamics, richest string tones, punchy bass
Emotiva, punchy bass
I preferred the Wyred4Sound

5 - Wyred4Sound vs Sunfire
Wyred4Sound, a little shrill
Sunfire, alive, nice highs
I preferred the Sunfire

6 - Denon vs Pass Labs
I noted no difference between these two amplifiers, but my comments were that they were both very even, accurate, transparent, and natural, and that I'd like either of them.

7 - Denon vs Van Alstine
Denon, okay, not quite as clear, a normal amp sound
Van Alstine, super clear and detailed, space around all the sounds
I preferred the Van Alstine


Future Work:

How about removing the room from the equation? Use the same setup, but at the speaker terminals attach an attenuator pad and buffer amp with leads to a different room, feeding a class A headphone amp and low-distortion headphones. With the right headphones, I can readily hear differences between headphone DAC/AMP models I am reviewing. Just an idea.
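For anyone curious how that attenuator pad might be sized, here is a rough sketch of a simple two-resistor divider taken off the speaker terminals, dropping speaker-level voltage to something a line-level buffer stage can take. The voltage and resistance targets are illustrative assumptions on my part, not a tested design.

Code:
import math

def divider(v_speaker_rms, v_target_rms, r_total_ohm=10_000):
    # Series divider of total resistance r_total_ohm splitting off v_target_rms.
    ratio = v_target_rms / v_speaker_rms
    r_bottom = r_total_ohm * ratio
    r_top = r_total_ohm - r_bottom
    atten_db = 20 * math.log10(ratio)
    return r_top, r_bottom, atten_db

# ~50 W into 8 ohms is about 20 V RMS; aim for ~2 V RMS into the buffer stage.
r_top, r_bottom, atten = divider(20.0, 2.0)
print(f"R_top ~ {r_top:.0f} ohm, R_bottom ~ {r_bottom:.0f} ohm ({atten:.0f} dB pad)")

A 10 kohm divider hanging across the terminals loads the amp negligibly next to the 8 ohm speaker already connected, and the buffer keeps the headphone amp from loading the divider.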


Conclusions:

The main takeaway here is that the differences are incredibly small, difficult to hear, and difficult to test for in a provable way. I would probably have been happy with any of these amplifiers if I had walked into a room and heard it all by itself. I doubt I would have been able to say that any one of them was better or worse than any other under normal listening circumstances.
Well written Wayne, that kind of straightforward honesty is what HTS is all about! I saw you had responded to this thread & I immediately started imagining what you might say. I actually had thoughts about setting up a session with cans like you said! Wow, is the cosmos coming together or what! Interesting theory, but like you said, very hard to hear differences.

One thing I might add is to never know which amp is which. If you built a false wall between you and the amps, and just labeled them A, B, C, D etc. Then made the list of pairings so everybody got the same exposure/chance to put them through their paces. It might help to take out any preconceived expectations. No need to know which amp it is at all! Kinda simplistic until you think about it.:ponder:
One thing I might add is to never know which amp is which. If you built a false wall between you and the amps, and just labeled them A, B, C, D etc. Then made the list of pairings so everybody got the same exposure/chance to put them through their paces. It might help to take out any preconceived expectations. No need to know which amp it is at all! Kinda simplistic until you think about it.:ponder:
You are right, as long as they are always called the same name, they could be letters, numbers, mineral names, whatever, to help remove any bias.
Wayne and I are of one mind on this. I was not much better than he was at matching the amps to what I thought I heard. I found myself asking the question, was it A or B that was more like X. I was taking notes but would still get confused. I have terrible short term memory for random data and have to repeat things to myself to be able to recall at all, and the short time was a problem for me. I would focus on what I was hearing and feeling and did not have time to let myself really get into the music. I was more successful when I just listened for feel rather than specific characteristics to compare the amps. The ones that I got right were purely on what the sound felt like to me, with no specific notes.

I have a bit of experience in testing and in behavioral research with multiple trials. One thing I have learned that was confirmed here is that subjects have to get comfortable with the testing context before you can get reliable results. I can see why so many people get hysterical at the idea of AB or ABX testing. It is very different than the way we normally listen.

I really believe that there were differences at times, but VERY small. It will take much more time and focus on just a couple of amps to tease them out consistently, if it is possible at all. Looking back on it, trying to compare so many amps was a fool's errand, even if it was tremendous fun.

I'll pull out my notes from the first day and post them. My impressions of the amps were sighted, but at least I can be sure that they apply to the right amp. I got mixed up too much in the ABX comparisons to be sure that the comments would be about the right amp. They could easily be backwards.