Home Theater Shack 2015 High-End Amplifier Evaluation Event Reporting and Discussion Thread



This thread is a continuation of the High-End Amplifier Evaluation Event Preparations Thread previously under way.

The event has begun. Coming to you from southern Alabama, the Home Theater Shack Evaluation Team has assembled at Sonnie Parker's Cedar Creek Cinema for the 2015 High-End Amplifier Evaluation Event. We have amps, we have speakers, we have tunes, we have great eats, what more could one ask for?

Be reminded of the first law of audio evaluation event execution. They never go exactly as planned. Not everything gets there, not everything works, but you endeavor to persevere and get things done.

We have dealt with speakers not able to reach us in time, with cabling issues, with equipment not interfacing properly, a laptop crash, with hums and buzzes and clicks and pops, with procedural questions - - - yet we forge ahead, adapt, evolve, redirect, and forge ahead some more - - - and the task of evaluating amplifiers is underway.

Speakers: We were unable to get the Chane A5rx-c and the Acoustic Zen Crescendo Mk II speaker pairs. We are running the Spatial Hologram M1 Turbo v2 and the Martin Logan ESL. Both are very revealing speakers, baring a lot of inner detail in our recordings. They will serve us well. The A5rx-c will be reviewed for HTS when available.

At the moment, the Holograms are serving as our primary evaluation tool. I will post setup details and interesting discoveries a little later. They are giving us a monstrous soundstage, the kind that eats small animals for breakfast, with extremely sharp imaging and very good depth acuity. They are extremely clear, getting into the realm of rivaling electrostatic transparency. Their in-room response is very good, with some expected peaks and dips, but still very listenable. The high frequency response is extended and smooth. The bass gives you that "Are you sure the subs are not on?" feeling on deeper tracks.

We decided to start with sighted comparisons and open discussion today, and blind tests tomorrow. The Audyssey XT32 / Dirac Live comparison has not been completed yet.

Have we heard differences? Yes, some explainable and some not. One amp pairing yielded differences that several evaluators are convinced they could pick in a blind AB test.

One thing I have learned for sure: The perfect complement to good southern barbeque is a proper peach cobbler. Add great company and you have a perfect get-together.

The Event
  • Date: Thursday evening, March 12th through Saturday evening, March 14th.
  • Place: Cedar Creek Cinema, Alabama, hosted by Sonnie, Angie, and Gracie Parker.
  • Evaluation Panel: Joe Alexander (ALMFamily), Leonard Caillouet (lcaillo), Dennis Young (Tesseract), Sonnie Parker (Sonnie), Wayne Myers (AudiocRaver).

The Amplifiers
  • Behringer EP2500
  • Denon X5200 AVR
  • Emotiva XPA-2
  • Exposure 2010S
  • Krell Duo 175
  • Mark Levinson 532H
  • Parasound HALO A31
  • Pass Labs X250.5
  • Sunfire TGA-7401
  • Van Alstine Fet Valve 400R
  • Wyred 4 Sound ST-500 MK II
The Speakers
  • Spatial Hologram M1 Turbo v2, courtesy Clayton Shaw, Spatial Audio
  • Martin Logan ESL
Other key equipment special for the event:
  • Van Alstine ABX Switch Box, recently updated version (February 2015)
  • miniDSP nanoAVR DL, courtesy Tony Rouget, miniDSP
  • OPPO BDP-105

As mentioned, our deepest appreciation goes to Sonnie, Angie, and Gracie Parker, our hosts, for welcoming us into their home. Look up Southern Hospitality in your dictionary, and they are (or should be) listed as prime role models thereof.

This first posting will be updated with more info and results, so check back from time to time.

Amplifier Observations
These are the observations from our notes regarding what we heard that were supported by being consistent between sighted and blind testing and across reviewers. While we failed to identify the amps in ABX testing, the raw observations from the blind comparisons did correlate in some cases to the sighted observations and with the observations of other reviewers. Take these reports for what they are, very subjective assessments and impressions which may or may not be accurate.

Denon X5200 AVR

Compared to other amps, several observations were consistent. The Denon had somewhat higher sibilance, was a bit brighter, and while it had plenty of bass it was noted several times to lack definition found in other amps. At high levels, it did seem to strain a bit more than the other amps, which is expected for an AVR compared to some of the much larger amps. Several times it was noted by multiple reviewers that it had very good detail and presence, as well as revealing ambiance in the recordings.

We actually listened to the Denon more than any other amp, as it was in four of the blind comparisons. It was not reliably identified in general, so one could argue that it held its own quite well, compared to even the most expensive amps. The observations from the blind comparisons that had some common elements either between blind and sighted comparisons or between observers are below. The extra presence and slight lack of bass definition seem to be consistent observations of the Denon AVR, but everyone agreed that the differences were not a definitive advantage to any one amp that would lead us to not want to own or listen to another, so I think we can conclude that the Denon held its own and was a worthy amp to consider.

Compared to Behringer
- bass on Denon had more impact than Behr, vocals sounded muted on Behr
- vocals sounded muted on ML compared to Denon
- Denon: crisp highs preferred compared to Behringer which is silky.
- Denon is more present, forward in mids and highs than Behringer.

Compared to Mark Levinson
- Denon seemed to lack low end punch compared to ML.
- Denon is smooth, a certain PUSH in the bass notes, cellos & violins sounded distant, hi-hat stood out, distant vocal echo stood out, compared to ML.
- Denon bass seemed muddy compared to ML which is tighter.
- ML more distant strings than Denon.
- Denon is slightly mushy and fat in bass. String bass more defined on ML.
- ML seems recessed compared to Denon.

Compared to Pass
- vocals sounded muffled on Pass compared to Denon
- crisp bass on Denon compared to Pass
- Denon & Pass both even, accurate, transparent, natural, no difference, like both
- Pass seems soft on vocals but very close.
- Denon has a bit more punch on bottom, maybe not as much very deep bass, more mid bass.

Compared to Van Alstine
- bass on Chant track was crisp for VA while Denon was slightly sloppy
- sibilance not as pronounced on VA as it was on Denon
- VA super clarity & precision, detailed, space around strings, around everything compared to Denon which is not as clear, liked VA better.
- sibilance on Denon, VA has less “air” but more listenable, both very good
- Very deep bass more defined on VA, overall more bass on Denon.

Wyred 4 Sound ST-500 MK II

In the sighted listening we compared the ST-500 MK II to the Van Alstine Fet Valve 400R. The assessments varied but were generally closer to no difference. The Van Alstine got comments of being fatter on the bottom. The Wyred 4 Sound was noted to have slightly better bass definition but apparently less impact there, and slightly less detail in the extreme highs. Most comments about the midrange were not much, if any difference. An interesting observation here was by Wayne, noting that he did not think he would be able to tell the difference in a blind comparison. Considering the ST-500 MK II is an ICE design and the Fet Valve 400R is a hybrid, we expected this to be one of the comparisons that would yield differences if any. As I am always concerned about expectation bias, this was one that I was particularly concerned with. Van Alstine is a personal favorite for a couple of us so I expected a clear preference for it to be present in the sighted comparison. I felt that the Wyred 4 Sound amp held its own with the much more expensive and likely to be favored VA.

In the blind comparisons, we compared the ST-500 MK II to the Emotiva XPA-2 and the Sunfire TGA-7401 in two separate sessions. Of course, in these sessions we had no idea what we were listening to until after all the listening was done. In the comparison to the Emotiva, some notes revealed not much difference and that these were two of the best sounding amps yet. The ST-500 MK II was noted to have the best midrange yet, along with the Emotiva. It was described as having less sibilance than both the Emotiva and Sunfire. Both the Emotiva and the ST-500 MK II were described as unstrained in terms of dynamics. In comparison to the Emotiva it was noted to have solid highs, lively dynamics, rich string tones, and punch in the bass. The overall preference in comparison to the Emo ranged from no difference to preferring the W4S.

In comparison to the Sunfire, comments ranged from preference for the W4S to not much difference to preference for the Sunfire. The Sunfire was described as having more presence in the midrange, while the Wyred was noted to be shrill, lifeless, and hollow by comparison.

These comments varied a lot, but the points of convergence were generally around the similarities to three amps that would be expected to be most likely to be different, if we found any differences at all. The objective result is that we failed to identify the amp in ABX comparisons to two other much more expensive amplifiers. I would have to conclude that based on the results, the ST-500 MK II represents one of the best values and certainly should satisfy most listeners.

Audyssey XT32 vs. Dirac Live Listening Comparison

Last year HTS published a review of the miniDSP DDRC-22D, a two-channel Dirac Live Digital Room Correction (DRC) product. The review included a comparison to Audyssey XT. A number of readers requested a comparison of Dirac Live with Audyssey XT32. That comparison was recently completed during the Home Theater Shack High-End Amplifier Evaluation Event at Sonnie Parker's Cedar Creek Cinema in rural Alabama. This report provides the results of that comparison.

Go to the Audyssey XT32 vs. Dirac Live Listening Comparison Report and Discussion Thread.

Spatial Hologram M1 Turbo Speakers

I was very pleased with the Spatial Hologram M1 speakers we used for the amplifier evaluation, and felt that they more than fulfilled our needs. They did not become "gotta have them" items for any of the evaluators, although I had thoughts in that direction once or twice. But they were speakers we could easily ignore through the weekend. I mean this as a high compliment. Never did an evaluator complain that the M1 speakers were "in the way" or "holding us back," and we were able to focus on the task at hand unhindered. That alone means a lot, and may say more about them than the rest of the review just completed.

Here is what they did for us:
  • Because of their high efficiency, amplifiers were not straining to deliver the volumes we called for. We could be confident that the amps were operating in their linear ranges and that if we heard a difference it was not due to an amp being overdriven.
  • The stretched-out soundstage opened up a lot of useful detail for us to consider in our evaluations. In discussing the soundstage at one point, there was a consensus that it might be stretched a little too far and might be "coming apart at the seams," showing some gaps, although this did not hinder our progress. My final assessment is that this was not the case, all due respect to the fine ears of the other evaluators. I elaborate on this point in the M1 Review.
  • They served well as a full-range all-passive speaker, able to reach deep and deliver 40 Hz frequencies with lots of clean "oomph," all without the need for DSP boosting and without subwoofer support.
I thoroughly enjoyed spending time with them, and wish to again thank Clayton Shaw of Spatial Audio for loaning them to us. A complete review of the M1 speakers has been posted.

Go to the Spatial Hologram M1 Turbo Version 2 Speaker Review.

A Soundstage Enhancement Experience

Sonnie's MartinLogan ESL hybrid electrostatics were set up very nicely when we arrived, so we avoided moving them through the weekend. There were some improvements made to the soundstage and imaging by way of treatments, and some interesting twists and turns along the way which turned out to be very informative.

I have documented the exercise in a separate post.

Go to the Soundstage Enhancement Experience thread.
Yay! Thanks for more details. Very interesting observations about ABX testing and possible criteria for improving reliability. :T

I hoped my initial (abnormal) terse prodding was enough to get a response and not leave a lasting bad taste in your mouths about reader feedback. You guys always do great work for us who like to live vicariously through these sort of reports. :) THANK YOU! :reading:
Leonard makes a good point, that time to really get comfortable and familiar with the test environment is a good thing. Some of those contrasting impressions I feel could be repeated in the right conditions. Some were still quite fresh and could have evolved over time in their description. For instance, "silky highs" vs. "crisp highs," what does that even mean? If I had another half hour or 45 minutes to really investigate that dimension of a pair of amplifiers, would it have ended up a completely different description? Would that contrast have become easier to hear and identify somehow? Might it even have ended up disappearing altogether, something totally imagined? These are all possibilities.

The human imagination is incredibly powerful, I do not understand why it is so difficult for some people to accept that it can affect our hearing, too. I have had it happen to me. I have nothing against faith, nothing against trusting that I can hear something even if it cannot be measured. And I feel no need to prove to someone else something that I know is true and repeatable, especially if it can be replicated from scratch in a different environment. But I am going to need to prove it to myself to be sure I did not make it up. Repeated testing, perhaps over several listening sessions, perhaps over several days, may be needed to get those initial impressions sorted through and settled down to real repeatability and meaningful description.

As a pure guess, I would say that my impressions above are 50% stable and 50% unstable, or in need of more time to mature and even be sure they were real.
So both of us were reliably wrong in identifying amps in ABX comparisons. What conclusion can we draw from that? I know we both worked very hard at trying to get it right, so my feeling is that it is the testing design that is flawed. We should have been closer to 50% if there was not some systematic bias going on.
We should have been closer to 50% if there was not some systematic bias going on.
A completely random pick should result in close to 50/50, monkeys should get those results. You were not randomly picking, you were making educated guesses, which tend to result in more incorrect answers (info from Mometrex testing).
That is my point. The fact that we consistently got them backwards in a choice with even odds is curious in itself.
I have thought about that a LOT. It has to mean something.
It does mean something. It means that you were wrong when you thought you were right (you thought you were picking the correct amp, but you actually picked the incorrect amp). I don't mean this in a demeaning way, it's just what the test results mean. You could not tell which amp was which, and because you tried to pick the correct one, it usually will lead to the incorrect amp.
If the testing design was not flawed, and there was not a difference between the amps, we should average around 50%. My point is that there is something going on beyond chance, which suggests a flaw in the testing design. ABX testing is not as objective a method as many would suggest.

Yes, we were more often wrong than right. It is not demeaning; it is just data. For data to become informative, you have to attach some meaning to it. If the results were closer to random, I would be less critical of our methodology. To be consistently wrong is very curious.
Results would average 50% over a large sample size. In a small sample you will have people who get them all right or all wrong, so I don't think you can draw much from a 2 person sample size. That being said, it is interesting and may warrant further investigation.
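To put rough numbers on the sample-size point, here is a minimal sketch assuming pure coin-flip guessing (p = 0.5) on every trial. The trial counts are illustrative choices, and `score_spread` and `prob_all_same` are hypothetical helper names, not anything used at the event.

```python
from math import sqrt

def score_spread(n: int, p: float = 0.5) -> float:
    """Standard deviation of the fraction correct over n independent trials."""
    return sqrt(p * (1 - p) / n)

def prob_all_same(n: int) -> float:
    """Chance a pure guesser gets all n trials right or all n wrong (p = 0.5)."""
    return 2 * 0.5**n

# Illustrative trial counts: a short session, a pooled panel, a large study.
for n in (7, 28, 1000):
    print(f"n={n:4d}  spread=+/-{score_spread(n):.3f}  "
          f"all-right-or-all-wrong={prob_all_same(n):.2e}")
```

Under these assumptions, short runs scatter widely around 50% and only tighten up as the number of trials grows.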
The sample size was 28 (4 observers x 7 comparisons). Out of those 28 trials we got 11 correct (most of those were thanks to Dennis, BTW). The probability of that if there was a .5 probability on each test would be about 11%. That is certainly not low enough to conclude with a high degree of certainty that the test was biased, but it is still pretty unlikely for a fair test at .5 probability per trial.

I understand statistics better than most, as I was a math teacher and did behavioral research in grad school. All research has hidden biases that are hard to tease out. Increasing the number of trials makes it less likely to err in one's conclusions when all of the significant variables have been controlled for. Below a certain threshold, however, you are still in guessing mode, and that is where we remain. When there, you have to make educated guesses at how you can make the testing more reliable in getting at what you are looking for and how to minimize the effects of unintended variables. Confusion in recall was certainly an issue for Wayne and me, less so for Joe, and not much so for Dennis. If we just considered Dennis' results he was correct enough to be statistically significant, but it would be unfair to do so. You don't throw out some of the data to get the result you want.

The bottom line is that the patterns of the data do suggest a problem with the methodology, at least for some of the subjects. In the future we will account for that.
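The 28-trial figure above can be checked directly. The sketch below is a minimal calculation, assuming every trial is an independent fair coin flip (p = 0.5); the function names `binom_pmf` and `binom_cdf` are illustrative and not from the thread. It reports both the probability of exactly 11 correct and of 11 or fewer, since the two readings give different numbers.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k correct answers in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or fewer correct answers in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n_trials, n_correct = 28, 11  # 4 observers x 7 comparisons, 11 correct

exactly = binom_pmf(n_correct, n_trials)
at_most = binom_cdf(n_correct, n_trials)
two_sided = min(1.0, 2 * at_most)  # doubling is valid because p = 0.5 is symmetric

print(f"P(exactly 11 of 28) = {exactly:.3f}")
print(f"P(11 or fewer)      = {at_most:.3f}")
print(f"two-sided p-value   = {two_sided:.3f}")
```

Under these assumptions the exact-count probability is about 8% and the 11-or-fewer tail about 17%. Neither is low enough to conclude bias with confidence, which is consistent with the caution in the post.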
What if we had a switch box with A, B, and X, could return to X at any time, take auditory memory out of the equation?
My point from post #224 is that you do not have a 50/50 chance at getting the answer correct. To get a 50/50 result, the answers have to be picked at random.

You were not picking your answers at random (i.e. flipping a coin, or choosing an answer before you saw the question), you were using judgmental guesses which alters the outcome from a 50/50 result. If you could correctly distinguish a certain amp, then the results would have been skewed toward more correct answers. If you could not correctly distinguish a certain amp, then the results would have been skewed toward more incorrect answers.

Using random picks usually comes to a 50/50 result. Using judgmental guessing does not come to a 50/50 result.

Your results were skewed toward more incorrect answers, this tells us that you could not correctly identify the amps. It does not mean that the testing was flawed.
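One way to probe this claim is a small simulation. The sketch below assumes that X is assigned to A or B by a fair coin, independently of the listener, and that the listener has no real ability to tell the amps apart. The strategies and names are invented for illustration and are not from the thread. Under those assumptions it compares coin-flip answers with two deliberate but uninformed strategies.

```python
import random

def run_abx(strategy, n_trials: int, rng: random.Random) -> float:
    """Fraction correct when X is a fair coin flip the listener cannot hear."""
    correct = 0
    previous = "A"
    for _ in range(n_trials):
        x = rng.choice("AB")             # hidden identity of X
        guess = strategy(previous, rng)  # guess made with no information about x
        correct += guess == x
        previous = guess
    return correct / n_trials

# Hypothetical guessing strategies, none of which can hear a difference.
strategies = {
    "coin flip": lambda prev, rng: rng.choice("AB"),
    "always A": lambda prev, rng: "A",
    "alternate": lambda prev, rng: "B" if prev == "A" else "A",
}

rng = random.Random(2015)  # fixed seed so runs are repeatable
results = {name: run_abx(s, 100_000, rng) for name, s in strategies.items()}
for name, score in results.items():
    print(f"{name:10s} {score:.3f}")
```

In this model every strategy lands near 50%, so a deliberate guess does not by itself push scores below chance. A score reliably below 50% would need something that correlates with the true answer, such as an audible cue being consistently attributed to the wrong amp.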
(Full disclosure, I think amplifiers sound the same if not driven to distortion)
My training is electr
I do not mean this in an argumentative way at all. Just seeking a little further clarity. I think part of what we are learning from this is that there is an awful lot of fine detail in the "sounds the same" part of that statement, which we all have a tendency to throw around freely, myself included. Under normal listening conditions, we are processing so much so fast, and if an amplifier doesn't sound bad, or if it sounds pretty good, then we think of it as being good enough and we are happy. When we start listening really close for detail, are there little differences that might be audible? Little differences in the way that soundstage shows up as a result of crosstalk in circuitry or in power supply circuitry? Distortion of a slightly different nature in this amplifier vs that amplifier, both good amplifiers but with slightly different sonic characteristics resulting from bias circuitry design? And part of the question along with all of that, is it worth the trouble to try to hear that level of detail? If it is not something that jumps right out at you, why worry about it? That is Sonnie's way of looking at things. And for most of us most of the time that is not a bad way of approaching it.

But remember also that the purpose of this study and studies like it is to try to determine "can we hear a difference" not "is it worthwhile to try to hear a difference?" They're totally different questions.

Just some things to consider.
For those who enjoy the more philosophical side of things, I always enjoyed Robert Pirsig's books, Zen and the Art of Motorcycle Maintenance, and Lila, two books about the philosophy of static and dynamic quality and the way we as humans tend to like to divide things into finer and finer levels of discrimination and categorization. It is kind of our nature, and it can be taken to silly extremes at times. :coocoo:
Sorry for that post, it was incomplete.
I got a phone call and somehow send happened.

I think there are many things to consider about audio/listening testing.
There was another thread a while back discussing acoustic memory and filling in the gaps between two recordings that were supposed to be the same, but one had something subtle added to it.
Once that subtle addition was heard the brain simply added the missing info to the other track.

I may have the finer details of that a little skewed, but if the above scenario can happen with audio that actually is different then trying to differentiate amplifiers which all have stellar electrical specifications is pretty much impossible.
I have yet to see a better listening test method described than the blind listening test method, but I am open to the possibility that once something is heard the brain may fill in any missing pieces on the next essentially similar thing that is heard.
If this is happening it would be reasonable for it to be cumulative.

I do think amplifiers (including AVR amplifiers) sound the same, if there are differences to be heard between systems it would (IMO) be more likely for those differences to be in the front end.
Even if amplifiers do sound the same that does not mean people shouldn't want to own an amplifier, HiFi and HT for the enthusiast are in large part about playing with different things.
I have yet to see a better listening test method described than the blind listening test method...
After our experience, I find myself wanting to ask, What blind test method? What methodology specifically? And I do not mean that as a challenge to you personally, just trying to make the point that the specific approach of the test can make a difference, as we found, and those specifics and conditions become an important part of the "can we hear a difference" question.
Is there any possibility of convincing anyone to pass on short-term ABX tests with many participants and numerous amplifiers? By my definition, they confuse matters more than they clarify.

I think first a reviewer should get very familiar with their software and as was done in the tests performed in these pages, limit what will be listened to.

Second, tests should be limited to no more than two amplifiers. It would be best for one of them to be a fixed reference amplifier that the reviewer is already familiar with. All other equipment should remain the same throughout the testing process.

Time, this takes time and as such, maybe two or three full nights should be spent listening to one amplifier and then the same amount of time for the second. Once the first two time spans were completed, then less time can be used to swap back and forth between the two amps being tested.

I would postulate that if there is a difference of sufficient importance, it will be heard. There will most probably be a difference and some of these have been described above by Craver.
Take the time to make these listening sessions worthwhile, not rushed. I do not see where AB testing works in anything much. Coke/pepsi failed, various types of water were tested and it did not work either. Some things that we are particularly sensitive to are easily recognized, different brands of bacon are good examples. That takes up several senses at once.

I know this method does work, so try it. You might like it.
I find myself more in favor of the idea of limiting variables until you really feel you have your arms around them. Perhaps two amplifiers at a time is a really good way to get started with this.
I am not trying to be argumentative either

I used to be dead set certain blind listening tests were irrefutable until I saw the discussion (could have been another forum and sorry but I cannot find it now) about the music tracks being altered so one had additional sounds and after the listeners had heard both tracks a few times they started also hearing the additional sounds in the track where they did not exist.
I am aware that the psychoacoustic abilities of the human system are very powerful and I am at least open to the possibility that blind A/B testing could be fundamentally flawed.

This is what I have seen as the most often cited amplifier testing protocol.
It seems straightforward and reasonable enough.

I am willing to bet a coke, that if you sat a $300 AVR along with some electronic project boxes that would presumably switch the source and speakers between the AVR and someone's awesome HiFi rig and the owner/listener had a clicker that turned on a green or red LED on one of the boxes to indicate they were listening to the AVR or the awesome HiFi rig they would hear a difference when the AVR LED was lit even though no change had actually occurred.
Repeat the test unsighted, and A/B would become indistinguishable to the listener.