
Home Theater Shack 2015 High-End Amplifier Evaluation Event Reporting and Discussion Thread

#1 ·







This thread is a continuation of the High-End Amplifier Evaluation Event Preparations Thread previously under way.



The event has begun. Coming to you from southern Alabama, the Home Theater Shack Evaluation Team has assembled at Sonnie Parker's Cedar Creek Cinema for the 2015 High-End Amplifier Evaluation Event. We have amps, we have speakers, we have tunes, we have great eats, what more could one ask for?

Be reminded of the first law of audio evaluation events: they never go exactly as planned. Not everything gets there, not everything works, but you endeavor to persevere and get things done.

We have dealt with speakers unable to reach us in time, with cabling issues, with equipment not interfacing properly, a laptop crash, with hums and buzzes and clicks and pops, with procedural questions - - - yet we forge ahead, adapt, evolve, redirect, and forge ahead some more - - - and the task of evaluating amplifiers is underway.

Speakers: We were unable to get the Chane A5rx-c and the Acoustic Zen Crescendo Mk II speaker pairs. We are running the Spatial Hologram M1 Turbo v2 and the Martin Logan ESL. Both are very revealing speakers, baring a lot of inner detail in our recordings. They will serve us well. The A5rx-c will be reviewed for HTS when available.

At the moment, the Holograms are serving as our primary evaluation tool. I will post setup details and interesting discoveries a little later. They are giving us a monstrous soundstage, the kind that eats small animals for breakfast, with extremely sharp imaging and very good depth acuity. They are extremely clear, getting into the realm of rivaling electrostatic transparency. Their in-room response is very good, with some expected peaks and dips, but still very listenable. The high frequency response is extended and smooth. The bass gives you that "Are you sure the subs are not on?" feeling on deeper tracks.

We decided to start with sighted comparisons and open discussion today, and blind tests tomorrow. The Audyssey XT32 / Dirac Live comparison has not been completed yet.

Have we heard differences? Yes, some explainable and some not. One amp pairing yielded differences that several evaluators are convinced they could pick out in a blind A/B test.

One thing I have learned for sure: The perfect complement to good southern barbeque is a proper peach cobbler. Add great company and you have a perfect get-together.

The Event
  • Date: Thursday evening, March 12th through Saturday evening, March 14th.
  • Place: Cedar Creek Cinema, Alabama, hosted by Sonnie, Angie, and Gracie Parker.
  • Evaluation Panel: Joe Alexander (ALMFamily), Leonard Caillouet (lcaillo), Dennis Young (Tesseract), Sonnie Parker (Sonnie), Wayne Myers (AudiocRaver).

The Amplifiers
  • Behringer EP2500
  • Denon X5200 AVR
  • Emotiva XPA-2
  • Exposure 2010S
  • Krell Duo 175
  • Mark Levinson 532H
  • Parasound HALO A31
  • Pass Labs X250.5
  • Sunfire TGA-7401
  • Van Alstine Fet Valve 400R
  • Wyred 4 Sound ST-500 MK II
The Speakers
  • Spatial Hologram M1 Turbo v2, courtesy Clayton Shaw, Spatial Audio
  • Martin Logan ESL
Other key equipment special for the event:
  • Van Alstine ABX Switch Box, recently updated version (February 2015)
  • miniDSP nanoAVR DL, courtesy Tony Rouget, miniDSP
  • OPPO BDP-105

As mentioned, our deepest appreciation goes to Sonnie, Angie, and Gracie Parker, our hosts, for welcoming us into their home. Look up Southern Hospitality in your dictionary, and they are (or should be) listed as prime examples.

This first posting will be updated with more info and results, so check back from time to time.




Amplifier Observations
These are the observations from our notes about what we heard, limited to those supported by consistency between sighted and blind testing and across reviewers. While we failed to identify the amps in ABX testing, the raw observations from the blind comparisons did in some cases correlate with the sighted observations and with the observations of other reviewers. Take these reports for what they are: very subjective assessments and impressions which may or may not be accurate.


Denon X5200 AVR

Compared to other amps, several observations were consistent. The Denon had somewhat higher sibilance, was a bit brighter, and while it had plenty of bass it was noted several times to lack definition found in other amps. At high levels, it did seem to strain a bit more than the other amps, which is expected for an AVR compared to some of the much larger amps. Several times it was noted by multiple reviewers that it had very good detail and presence, as well as revealing ambiance in the recordings.

We actually listened to the Denon more than any other amp, as it was in four of the blind comparisons. It was not reliably identified in general, so one could argue that it held its own quite well compared to even the most expensive amps. The observations from the blind comparisons that had some common elements, either between blind and sighted comparisons or between observers, are below. The extra presence and slight lack of bass definition seem to be consistent observations of the Denon AVR, but everyone agreed that no difference amounted to a definitive advantage that would lead us to refuse to own or listen to another amp, so I think we can conclude that the Denon held its own and is a worthy amp to consider.

Compared to Behringer
- bass on Denon had more impact than Behr, vocals sounded muted on Behr
- vocals sounded muted on ML compared to Denon
- Denon: crisp highs preferred compared to Behringer which is silky.
- Denon is more present, forward in mids and highs than Behringer.

Compared to Mark Levinson
- Denon seemed to lack low end punch compared to ML.
- Denon is smooth, a certain PUSH in the bass notes, cellos & violins sounded distant, hi-hat stood out, distant vocal echo stood out, compared to ML.
- Denon bass seemed muddy compared to ML which is tighter.
- ML more distant strings than Denon.
- Denon is slightly mushy and fat in bass. String bass more defined on ML.
- ML seems recessed compared to Denon.

Compared to Pass
- vocals sounded muffled on Pass compared to Denon
- crisp bass on Denon compared to Pass
- Denon & Pass both even, accurate, transparent, natural, no difference, like both
- Pass seems soft on vocals but very close.
- Denon has a bit more punch on bottom, maybe not as much very deep bass, more mid bass.

Compared to Van Alstine
- bass on Chant track was crisp for VA while Denon was slightly sloppy
- sibilance not as pronounced on VA as it was on Denon
- VA super clarity & precision, detailed, space around strings, around everything compared to Denon which is not as clear, liked VA better.
- sibilance on Denon; VA has less “air” but more listenable; both very good
- Very deep bass more defined on VA, overall more bass on Denon.


Wyred 4 Sound ST-500 MK II

In the sighted listening we compared the ST-500 MK II to the Van Alstine Fet Valve 400R. The assessments varied but were generally closer to no difference. The Van Alstine drew comments of being fatter on the bottom. The Wyred 4 Sound was noted to have slightly better bass definition but apparently less impact there, and slightly less detail in the extreme highs. Most comments about the midrange found little, if any, difference. An interesting observation here came from Wayne, who noted that he did not think he would be able to tell the difference in a blind comparison. Considering the ST-500 MK II is an ICE design and the Fet Valve 400R is a hybrid, we expected this to be one of the comparisons that would yield differences, if any. As I am always concerned about expectation bias, this was one comparison I was particularly concerned with. Van Alstine is a personal favorite for a couple of us, so I expected a clear preference for it to show up in the sighted comparison. I felt that the Wyred 4 Sound amp held its own with the much more expensive and likely-to-be-favored VA.

In the blind comparisons, we compared the ST-500 MK II to the Emotiva XPA-2 and the Sunfire TGA-7401 in two separate sessions. Of course, in these sessions we had no idea what we were listening to until after all the listening was done. In the comparison to the Emotiva, some notes revealed not much difference, and that these were two of the best sounding amps yet. The ST-500 MK II was noted to have the best midrange yet, along with the Emotiva. It was described as having less sibilance than both the Emotiva and Sunfire. Both the Emotiva and the ST-500 MK II were described as unstrained in terms of dynamics. In comparison to the Emotiva it was noted to have solid highs, lively dynamics, rich string tones, and punch in the bass. The overall preference in comparison to the Emo ranged from no difference to preferring the W4S.

In comparison to the Sunfire, comments ranged from preference for the W4S to not much difference to preference for the Sunfire. The Sunfire was described as having more presence in the midrange, while the Wyred was noted to be shrill, lifeless, and hollow by comparison.

These comments varied a lot, but the points of convergence were generally around the similarities among three amps that would be expected to be the most likely to differ, if we found any differences at all. The objective result is that we failed to identify the amp in ABX comparisons to two other much more expensive amplifiers. I would have to conclude that, based on the results, the ST-500 MK II represents one of the best values and certainly should satisfy most listeners.





Audyssey XT32 vs. Dirac Live Listening Comparison

Last year HTS published a review of the miniDSP DDRC-22D, a two-channel Dirac Live Digital Room Correction (DRC) product. The review included a comparison to Audyssey XT. A number of readers requested a comparison of Dirac Live with Audyssey XT32. That comparison was recently completed during the Home Theater Shack High-End Amplifier Evaluation Event at Sonnie Parker's Cedar Creek Cinema in rural Alabama. This report provides the results of that comparison.

Go to the Audyssey XT32 vs. Dirac Live Listening Comparison Report and Discussion Thread.


Spatial Hologram M1 Turbo Speakers

I was very pleased with the Spatial Hologram M1 speakers we used for the amplifier evaluation, and felt that they more than fulfilled our needs. They did not become "gotta have them" items for any of the evaluators, although I had thoughts in that direction once or twice. But they were speakers we could easily ignore through the weekend. I mean this as a high compliment. Never did an evaluator complain that the M1 speakers were "in the way" or "holding us back," and we were able to focus on the task at hand unhindered. That alone means a lot, and may say more about them than the rest of the review just completed.

Here is what they did for us:
  • Because of their high efficiency, amplifiers were not straining to deliver the volumes we called for. We could be confident that the amps were operating in their linear ranges and that if we heard a difference it was not due to an amp being overdriven. (See the rough power sketch after this list.)
  • The stretched-out soundstage opened up a lot of useful detail for us to consider in our evaluations. In discussing the soundstage at one point, there was a consensus that it might be stretched a little too far and might be "coming apart at the seams," showing some gaps, although this did not hinder our progress. My final assessment is that this was not the case, all due respect to the fine ears of the other evaluators. I elaborate on this point in the M1 Review.
  • They served well as a full-range all-passive speaker, able to reach deep and deliver 40 Hz frequencies with lots of clean "oomph," all without the need for DSP boosting and without subwoofer support.
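To put the headroom point in rough numbers, here is a minimal sketch. The 94 dB/W/m sensitivity figure and the 3 m listening distance are assumptions for illustration, not specs published here:

```python
import math

def amp_watts_needed(target_spl_db, sensitivity_db_1w_1m, distance_m):
    # Free-field estimate for a single speaker; room gain and the second
    # channel both reduce the real-world power requirement.
    gain_db = target_spl_db - sensitivity_db_1w_1m + 20 * math.log10(distance_m)
    return 10 ** (gain_db / 10)

# Assumed numbers: ~94 dB/W/m sensitivity, 3 m seat.
print(round(amp_watts_needed(92, 94, 3.0), 1))   # ~5.7 W for 92 dB peaks
print(round(amp_watts_needed(102, 94, 3.0), 1))  # ~56.8 W for 102 dB peaks
```

Every extra 3 dB of sensitivity halves the power needed, which is why a high-efficiency speaker keeps each amp comfortably inside its linear range.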
I thoroughly enjoyed spending time with them, and wish to again thank Clayton Shaw of Spatial Audio for loaning them to us. A complete review of the M1 speakers has been posted.

Go to the Spatial Hologram M1 Turbo Version 2 Speaker Review.


A Soundstage Enhancement Experience

Sonnie's MartinLogan ESL hybrid electrostatics were set up very nicely when we arrived, so we avoided moving them through the weekend. There were some improvements made to the soundstage and imaging by way of treatments, and some interesting twists and turns along the way which turned out to be very informative.

I have documented the exercise in a separate post.

Go to the Soundstage Enhancement Experience thread.
 
#210 ·
A $300 AVR is not going to provide stable voltage into a difficult load.

Lou, you are right; in the end a machine can't tell us how another machine is interacting with our senses. But we cannot reliably predict that any machine will interact with two different listeners the same way. I think this is why the test I mentioned is a good place to start, then listen to the key ranges once they are determined by initial screening. Then you can correlate objective to subjective tests and see if a pattern emerges. Wayne has already done this to an extent in his test of an Axiom amplifier into speakers. He would be a key resource in designing this test.
 
#212 ·
A $300 AVR is not going to provide stable voltage into a difficult load. Lou, you are right; in the end a machine can't tell us how another machine is interacting with our senses. But we cannot reliably predict that any machine will interact with two different listeners the same way. I think this is why the test I mentioned is a good place to start, then listen to the key ranges once they are determined by initial screening. Then you can correlate objective to subjective tests and see if a pattern emerges. Wayne has already done this to an extent in his test of an Axiom amplifier into speakers. He would be a key resource in designing this test.
Yes, I see that now. It's more complicated than just differences between machines. Two listeners have different perceptions of reality and react to stimuli differently. We can take that one step further by noting that even one particular person may not react to an amp/song/speaker the same way from moment to moment. Psychological and physiological factors influence how we react to what we hear, whether it be ABX stress or wishful thinking. For that matter, something as trivial as a grocery list can distract us from discerning differences. Welcome to the machine! :)

Sent from my iPad using HTShack
 
#217 ·
I have been meaning to post my own observations and conclusions from the event. The posts over the last few days have prompted me to go ahead and get that done.

Thanks to Leonard for being willing to take the time to dig through all of the data the way that he did. He has far more patience for that than I, and no one could have done a better job.

I had hoped that we would have much clearer results than we did: either clear differences that we could prove, or none that we could hear at all. Instead we ended up with some of us able to hear some differences some of the time, and only a little data to prove it. As for data that supports any consistent findings across the whole listening panel, Leonard has found what was there to be found and has already reported it.

I will go ahead and post my individual observations for what they are worth, to be taken with a huge grain of salt, because they are impressions and that is all. If the others wish to post their impressions, they are welcome to do so.

First of all, let me describe my evaluation process. I personally feel this is quite important, because different people seem to have different ways of going about this, and if one has a listening style that works for him, that should probably be taken into account in the design of the blind testing that person will engage in. In other words, I might be able to set up a valid blind test method that would work great for me and throw Joe or Dennis or Leonard completely off, while there may be an approach that would work perfectly for some of them and leave me flat.

And this is one of the great difficulties in setting up tests like this. Someone with a good background in audio, acoustics, psychoacoustics, and testing sits down and figures out a really good double-blind ABX method. Then five people walk into the room, it happens to fit the listening approach of only one of them, he does well while the other four fail miserably, and the test overall shows no statistically significant data supporting the ability to tell a difference. Had the test been set up another way, it might have given a different result.

The ABX testing that we did required the evaluators to rely on extremely fine details held in auditory memory for 30 seconds to a minute, to be used in an A/B comparison. Dennis appeared to do very well with this, while the rest of us did not. For me it was extremely difficult, as the fine differences we were hearing were simply not something I could capture in memory and carry forward into a comparison 30 seconds to a minute later. Maybe with practice I could learn to do so, but at the event I was not able to.

Here is what worked well for me. I felt fairly confident about the differences I was hearing between amplifiers in the sighted testing we did on the first day. The two amplifiers were set up, their levels were matched, we knew which was which, and we held the A/B switch in our hands while we listened to our own selected listening tracks. As I listened through my tracks I switched back and forth freely between the two amplifiers. Over time I started to recognize that certain passages of each track seemed more likely than others to reveal differences between the amplifiers, so I focused more on those parts of the tracks, but I also listened to other passages just in case something new popped up.

When I heard a difference I tried to make a note of what part of what track I heard it on, and what I heard, and when I felt I had time I would go back and repeat that passage to be sure the difference was distinctive and easily identifiable. Remember, these judgments were not absolute in any way but extremely comparative in nature, as will be seen in my impressions of some of the amplifiers that follow. By switching back and forth during those critical passages, I felt the contrast almost jumped out at times when the switching was done at just the right moment. Given the ability to do that repeatedly with a pair of amplifiers, I got to the point where I was pretty confident I could identify the difference consistently.

So if I were to set up a blind comparison around that listening style and try to get statistical data showing I could do it consistently, here is how I would go about it. I would start out with the pair sighted, so I knew which was which, and go about the test as I have described, identifying the characteristics comparatively between the two amplifiers. Then I would leave the room and have the test setter-upper flip a coin to decide whether or not to swap the two amplifiers. When I came back into the room I would know it was the same two amplifiers, but not whether they had been switched. My task would be to sit down and listen, switching back and forth, trying to come to the same conclusions as before with the same tracks, and identify which amplifier was A and which was B.

When done, I would leave the room and we would do the whole thing again, maybe 10 times in a row in a day. If at the end of the day this process let me identify the amplifiers correctly, say, nine times out of ten, that would be a significant result.
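For what it is worth, the arithmetic backs that up. A minimal sketch, using the 10-trial, 9-correct numbers from the example above:

```python
from math import comb

# If the listener is purely guessing, each swap-or-not trial is a fair
# coin flip, so count the ways to get 9 or 10 right out of 10.
n, k = 10, 9
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"P(at least {k} of {n} correct by chance) = {p_value:.4f}")  # ~0.0107
```

That is about a 1% chance of happening by luck, so nine out of ten would be hard to dismiss.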

On another day, the same process could be followed with another pair of amps. You can see that this could turn into quite a long, drawn-out process with multiple people and multiple amplifiers. You can also see that someone else might try the same method and have it absolutely not work for him at all. And it would become difficult to find a way to work with the listening preferences of each listener and still end up with what one could call statistically valid results, because it would almost end up being a different kind of test for each listener.

That is a problem I see with throwing around broad statements like, "Can you prove it in a double-blind study?" Which double-blind study? Who sets it up? What are the conditions?

Some will say that it is wrong to tailor the test to the listener, that it invalidates the study right off the bat. And again I would say that it depends on how you define what you are trying to accomplish. For the kind of differences we are talking about, I will go out on a limb and predict that if the only approach tried is a single generic test that has to fit all listeners, with their different critical listening styles, then the testing is bound to show that those differences cannot be heard consistently across a broad listening audience under that kind of test.

But if somebody gets their gumption together to define an approach that accommodates individual listening styles, crunches the numbers together at the end, and includes information about what those styles were, then we may someday end up with a real, in-depth test showing that those differences can be discerned consistently. This could even be done in a way that accommodates those who prefer long-term listening tests, as some say that is the only way to really hear some of the fine differences. That has not been my experience so far, but it would be very close-minded of me to assume it cannot work for someone else, or even that it would not work for me if I really gave it a proper chance over time.

I would like to note one observation that I find somewhat humorous. In the ABX testing, my success rate at identifying the X amplifier was the worst of the whole bunch of us: I was wrong six out of seven times. In a way, that result is the most statistically significant of all the listeners at the event; I just had a mental flip-flop of some kind going on that led me to the wrong answer almost every time.
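The arithmetic actually supports the joke; a quick sketch:

```python
from math import comb

# Chance of doing that badly by luck: 1 or fewer correct out of
# 7 fair-coin ABX trials.
p = sum(comb(7, k) for k in (0, 1)) / 2**7
print(f"P(at most 1 of 7 correct by chance) = {p:.4f}")  # 0.0625
```

Roughly a 6% chance by luck alone, so being that consistently wrong is nearly as unlikely as being that consistently right.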


My Observations

These differences are comparative in nature, and almost impossibly small. I would never expect to be able to walk into a room, hear one of these amplifiers playing, and say, "Hey, I recognize that particular sound as the Parasound amp," or the Krell or any other particular amp. And with my experience at this so far, I would be suspicious of anyone who claims they could.


Day 1, Sighted Pairings:

1 - Krell vs Parasound:
Krell, bigger sound
Parasound, not as big

2 - Denon vs Mark Levinson
Denon, brighter
Mark Levinson, rolled-off high end

3 - Emotiva vs Pass Labs:
Emotiva, slightly bigger bass
Pass labs, tighter
This was a fun pairing, I liked both amps.

4 - Van Alstine vs Wyred 4 Sound, no difference noted

5 - Behringer vs Sunfire
Behringer, less bass
Sunfire, more bass

6 - Exposure vs Krell, no difference noted


Day 2, Blind Pairings:

1 - Denon vs Behringer
Denon, crisp highs
Behringer, silky highs
I preferred the Denon.

2 - Denon vs Mark Levinson
Denon, bass not as clear
Mark Levinson, bass seemed tighter, clearer
I missed the rolled off high frequencies of the Mark Levinson, which I heard in sighted testing.

3 - Exposure vs Parasound, no difference noted

4 - Wyred 4 Sound vs Emotiva
Wyred 4 Sound, solid highs, lively dynamics, richest string tones, punchy bass
Emotiva, punchy bass
I preferred the Wyred 4 Sound

5 - Wyred 4 Sound vs Sunfire
Wyred 4 Sound, a little shrill
Sunfire, alive, nice highs
I preferred the Sunfire

6 - Denon vs Pass Labs
I noted no difference between these two amplifiers, but my comments were that they were both very even, accurate, transparent, and natural, and that I'd like either of them.

7 - Denon vs Van Alstine
Denon, okay, not quite as clear, a normal amp sound
Van Alstine, super clear and detailed, space around all the sounds
I preferred the Van Alstine


Future Work:

How about removing the room from the equation? Use the same setup, but at the speaker terminals attach an attenuator pad and buffer amp, with leads to a different room feeding a class A headphone amp and low-distortion headphones. With the right headphones, I can readily hear differences between the headphone DAC/amp models I review. Just an idea.
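One hypothetical way to sketch the pad: a two-resistor divider sized to a dummy-load value, feeding a buffer whose input impedance is high enough not to load it. The 8-ohm load and 30 dB figures below are illustrative only, and the series resistor would need to be rated for real amplifier power:

```python
def divider_pad(atten_db, total_ohms):
    # Two-resistor divider: presents roughly total_ohms to the power amp
    # and feeds a high-impedance buffer. Resistive approximation only.
    k = 10 ** (-atten_db / 20)       # Vout/Vin voltage ratio
    r_series = total_ohms * (1 - k)  # from the amp's hot terminal
    r_shunt = total_ohms * k         # across the buffer input
    return r_series, r_shunt

# Knock a speaker-level signal down 30 dB into an 8-ohm-ish load:
print(divider_pad(30, 8.0))  # ~ (7.75 ohms series, 0.25 ohms shunt)
```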


Conclusions:

The main takeaway here is that the differences are incredibly small, difficult to hear, and difficult to test for in a provable way. I would probably have been happy with any of these amplifiers if I had walked into a room and heard it all by itself. I doubt I would have been able to say that any one of them was better or worse than any other under normal listening circumstances.
 
#218 ·
Well written, Wayne; that kind of straightforward honesty is what HTS is all about! I saw you had responded to this thread and I immediately started imagining what you might say. I actually had thoughts about setting up a session with cans like you said! Wow, is the cosmos coming together or what? Interesting theory, but like you said, the differences are very hard to hear.

One thing I might add is to never know which amp is which. Build a false wall between you and the amps, and just label them A, B, C, D, etc. Then make the list of pairings so everybody gets the same exposure and chance to put them through their paces. It might help to take out any preconceived expectations. No need to know which amp it is at all! Kinda simplistic until you think about it. :ponder:
 
#219 ·
One thing I might add is to never know which amp is which. Build a false wall between you and the amps, and just label them A, B, C, D, etc. Then make the list of pairings so everybody gets the same exposure and chance to put them through their paces. It might help to take out any preconceived expectations. No need to know which amp it is at all! Kinda simplistic until you think about it. :ponder:
You are right; as long as each amp is always called by the same name, the names could be letters, numbers, mineral names, whatever, to help remove any bias.
 
#220 ·
Wayne and I are of one mind on this. I was not much better than he was at matching the amps to what I thought I heard. I found myself asking the question: was it A or B that was more like X? I was taking notes but would still get confused. I have a terrible short-term memory for random data and have to repeat things to myself to be able to recall them at all, so the short time frame was a problem for me. I would focus on what I was hearing and feeling and did not have time to let myself really get into the music. I was more successful when I just listened for feel rather than comparing the amps on specific characteristics. The ones I got right were judged purely on what the sound felt like to me, with no specific notes.

I have a bit of experience in testing and in behavioral research with multiple trials. One thing I have learned, and it was confirmed here, is that subjects have to get comfortable with the testing context before you can get reliable results. I can see why so many people get hysterical at the idea of A/B or ABX testing. It is very different from the way we normally listen.

I really believe that there were differences at times, but VERY small ones. It will take much more time, focused on just a couple of amps, to tease them out consistently, if it is possible at all. Looking back on it, trying to compare so many amps was a fool's errand, even if it was tremendous fun.

I'll pull out my notes from the first day and post them. My impressions of the amps were sighted, but I can be sure that they apply to the right amp. I got mixed up too much in the ABX comparisons to be sure that those comments would be about the right amp; they could easily be backwards.
 
#221 ·
Yay! Thanks for more details. Very interesting observations about ABX testing and possible criteria for improving reliability. :T

I hoped my initial (abnormally) terse prodding was enough to get a response and not leave a lasting bad taste in your mouths about reader feedback. You guys always do great work for those of us who like to live vicariously through these sorts of reports. :) THANK YOU! :reading:
 
#222 ·
Leonard makes a good point: time to really get comfortable and familiar with the test environment is a good thing. Some of those contrasting impressions, I feel, could be repeated in the right conditions. Others were still quite fresh, and their descriptions could have evolved over time. For instance, "silky highs" vs. "crisp highs": what does that even mean? If I had another half hour or 45 minutes to really investigate that dimension of a pair of amplifiers, would it have ended up a completely different description? Would that contrast have become easier to hear and identify somehow? Might it even have disappeared altogether, something totally imagined? These are all possibilities.

The human imagination is incredibly powerful, I do not understand why it is so difficult for some people to accept that it can affect our hearing, too. I have had it happen to me. I have nothing against faith, nothing against trusting that I can hear something even if it cannot be measured. And I feel no need to prove to someone else something that I know is true and repeatable, especially if it can be replicated from scratch in a different environment. But I am going to need to prove it to myself to be sure I did not make it up. Repeated testing, perhaps over several listening sessions, perhaps over several days, may be needed to get those initial impressions sorted through and settled down to real repeatability and meaningful description.

As a pure guess, I would say that my impressions above are 50% stable and 50% unstable, or in need of more time to mature and even be sure they were real.
 
#223 ·
So both of us were reliably wrong in identifying amps in ABX comparisons. What conclusion can we draw from that? I know we both worked very hard at trying to get it right, so my feeling is that it is the testing design that is flawed. We should have been closer to 50% if there was not some systematic bias going on.
 
#227 ·
It does mean something. It means that you were wrong when you thought you were right (you thought you were picking the correct amp, but you actually picked the incorrect one). I don't mean this in a demeaning way; it's just what the test results mean. You could not tell which amp was which, and because you tried to pick the correct one, that usually led you to the incorrect amp.
 
#228 ·
If the testing design was not flawed, and there was not a difference between the amps, we should average around 50%. There is something going on beyond chance, which means there is either a flaw in the testing design or some real difference that we were systematically mislabeling. My point is that ABX testing is not as objective a method as many would suggest.

Yes, we were more often wrong than right. That is not demeaning; it is just data. For data to become informative, you have to attach some meaning to it. If the results had been closer to random, I would be less critical of our methodology. To be consistently wrong is very curious.
 
#230 ·
The sample size was 28 (4 observers x 7 comparisons). Out of those 28 trials we got 11 correct (most of those were thanks to Dennis, BTW). The probability of that, if there were a .5 probability on each trial, would be about 11%. That is certainly not low enough to conclude with a high degree of certainty that the test was biased, but it is still pretty unlikely for a fair test at .5 probability per trial.
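For anyone who wants to check figures like this, the binomial probabilities for 28 fair-coin trials take only a few lines; a sketch using just the standard library:

```python
from math import comb

n, k = 28, 11
pmf = lambda i: comb(n, i) / 2**n   # P(exactly i correct | fair coin)
print(f"P(exactly {k} of {n}) = {pmf(k):.3f}")
print(f"P({k} or fewer of {n}) = {sum(pmf(i) for i in range(k + 1)):.3f}")
```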

I understand statistics better than most; I was a math teacher and did behavioral research in grad school. All research has hidden biases that are hard to tease out. Increasing the number of trials makes it less likely that you will err in your conclusions, provided all of the significant variables have been controlled for. Below a certain threshold, however, you are still in guessing mode, and that is where we remain. At that point you have to make educated guesses about how to make the testing more reliable at getting at what you are looking for, and how to minimize the effects of unintended variables. Confusion in recall was certainly an issue for Wayne and me, less so for Joe, and not much of one for Dennis. If we considered only Dennis's results, he was correct often enough to be statistically significant, but it would be unfair to do so. You don't throw out some of the data to get the result you want.

The bottom line is that the patterns of the data do suggest a problem with the methodology, at least for some of the subjects. In the future we will account for that.
 
#232 ·
My point from post #224 is that you do not have a 50/50 chance of getting the answer correct. To get a 50/50 result, the answers have to be picked at random.

You were not picking your answers at random (i.e. flipping a coin, or choosing an answer before you saw the question), you were using judgmental guesses which alters the outcome from a 50/50 result. If you could correctly distinguish a certain amp, then the results would have been skewed toward more correct answers. If you could not correctly distinguish a certain amp, then the results would have been skewed toward more incorrect answers.

Using random picks usually comes to a 50/50 result. Using judgmental guessing does not come to a 50/50 result.

Your results were skewed toward more incorrect answers, and this tells us that you could not correctly identify the amps. It does not mean that the testing was flawed.
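A toy simulation (hypothetical numbers, not data from the event) makes the point concrete. A listener who genuinely hears a cue but has the two amps' identities swapped in his head scores well below 50%, while a listener with no cue at all hovers near 50% either way:

```python
import random

def abx_score(trials, hear_p, labels_swapped):
    # hear_p: probability the listener genuinely perceives which amp X is.
    # labels_swapped: the cue is real but mapped to the wrong amp name.
    correct = 0
    for _ in range(trials):
        truth = random.choice("AB")
        other = "B" if truth == "A" else "A"
        perceived = truth if random.random() < hear_p else other
        answer = ("B" if perceived == "A" else "A") if labels_swapped else perceived
        correct += answer == truth
    return correct / trials

random.seed(0)
print(abx_score(100_000, 0.8, False))  # ~0.80: real cue, labels right
print(abx_score(100_000, 0.8, True))   # ~0.20: same cue, labels inverted
print(abx_score(100_000, 0.5, True))   # ~0.50: no cue, chance either way
```

In this toy model, judgmental guessing pulls the score below chance only when there is some consistent cue being mislabeled.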
 
#234 ·
(Full disclosure, I think amplifiers sound the same if not driven to distortion)
My training is electr
I do not mean this in an argumentative way at all; I am just seeking a little more clarity. I think part of what we are learning from this is that there is an awful lot of fine detail in the "sounds the same" part of that statement, which we all have a tendency to throw around freely, myself included. Under normal listening conditions we are processing so much so fast, and if an amplifier doesn't sound bad, or if it sounds pretty good, we think of it as good enough and we are happy. When we start listening really closely for detail, are there little differences that might be audible? Little differences in the way the soundstage shows up as a result of crosstalk in the circuitry or in the power supply? Distortion of a slightly different nature in this amplifier versus that one, both good amplifiers but with slightly different sonic characteristics resulting from bias circuit design? And alongside all of that: is it worth the trouble to try to hear that level of detail? If it is not something that jumps right out at you, why worry about it? That is Sonnie's way of looking at things, and for most of us, most of the time, it is not a bad way of approaching it.

But remember also that the purpose of this study, and studies like it, is to determine "can we hear a difference," not "is it worthwhile to try to hear a difference?" They are totally different questions.

Just some things to consider.
 
#235 ·
For those who enjoy the more philosophical side of things, I have always enjoyed Robert Pirsig's books Zen and the Art of Motorcycle Maintenance and Lila, two books about the philosophy of static and dynamic quality and the way we as humans like to divide things into finer and finer levels of discrimination and categorization. It is kind of our nature, and it can be taken to silly extremes at times. :coocoo:
 
#236 ·
Sorry for that post, it was incomplete.
I got a phone call and somehow send happened.

I think there are many things to consider about audio/listening testing.
There was another thread a while back discussing acoustic memory and filling in the gaps between two recordings that were supposed to be the same, but one had something subtle added to it.
Once that subtle addition was heard the brain simply added the missing info to the other track.

I may have the finer details of that a little skewed, but if the above scenario can happen with audio that actually is different, then trying to differentiate amplifiers which all have stellar electrical specifications is pretty much impossible.
I have yet to see a better listening test method described than the blind listening test method, but I am open to the possibility that once something is heard the brain may fill in any missing pieces on the next essentially similar thing that is heard.
If this is happening it would be reasonable for it to be cumulative.

I do think amplifiers (including AVR amplifiers) sound the same; if there are differences to be heard between systems, it would (IMO) be more likely for those differences to be in the front end.
Even if amplifiers do sound the same, that does not mean people shouldn't want to own an amplifier; HiFi and HT for the enthusiast are in large part about playing with different things.
 
#237 ·
I have yet to see a better listening test method described than the blind listening test method...
After our experience, I find myself wanting to ask: which blind test method? What methodology, specifically? And I do not mean that as a challenge to you personally; I am just trying to make the point that the specific approach of the test can make a difference, as we found, and those specifics and conditions become an important part of the "can we hear a difference" question.
 
#238 ·
Is there any possibility of convincing anyone to pass on short-term ABX tests with many participants and numerous amplifiers, which would, by my definition, confuse matters more than clarify them?

I think a reviewer should first get very familiar with their software and, as was done in the tests performed in these pages, limit what will be listened to.

Second, tests should be limited to no more than two amplifiers. It would be best for one of them to be a static reference amplifier, one the reviewer is already familiar with. All other equipment should remain the same throughout the testing process.

Time: this takes time, and as such, maybe two or three full nights should be spent listening to one amplifier and then the same amount of time with the second. Once those first two time spans are completed, less time is needed to swap back and forth between the two amps being tested.

I would postulate that if there is a difference of sufficient importance, it will be heard. There will most probably be a difference, and some of these have been described above by Craver.
Take the time to make these listening sessions worthwhile, not rushed. I do not see where quick A/B testing works in much of anything. Coke/Pepsi failed; various types of water were tested and that did not work either. Some things that we are particularly sensitive to are easily recognized; different brands of bacon are a good example. That takes up several senses at once.

I know this method does work, so try it. You might like it.
 
#240 ·
I am not trying to be argumentative either

I used to be dead-set certain that blind listening tests were irrefutable, until I saw the discussion (it could have been on another forum, and sorry, but I cannot find it now) about music tracks being altered so that one had additional sounds: after the listeners had heard both tracks a few times, they started hearing the additional sounds in the track where they did not exist.
I am aware that the psychoacoustic abilities of the human system are very powerful and I am at least open to the possibility that blind A/B testing could be fundamentally flawed.

This is what I have seen as the most often cited amplifier testing protocol.
http://tom-morrow-land.com/tests/ampchall/
It seems straight forward and reasonable enough.

I am willing to bet a Coke that if you sat a $300 AVR alongside some electronic project boxes that would presumably switch the source and speakers between the AVR and someone's awesome HiFi rig, and the owner/listener had a clicker that turned on a green or red LED on one of the boxes to indicate whether they were listening to the AVR or the awesome rig, they would hear a difference when the AVR LED was lit even though no change had actually occurred.
Repeat the test unsighted, and A and B would become indistinguishable to the listener.
 
#243 ·
Good morning, Charlie. This is Jack, and I am inviting you to my home to listen to 3 high-quality amplifiers on my present reference systems. With a bit of training on what to listen for, you will be able to tell the difference closer to 100% of the time than you think. And I will buy the Coke (soft drink) for your pleasure once you grasp how easy it can be. No challenge here, just an offer, should you wish to travel just a few hours. :smile:
 
#247 ·
I still contend that the "blind" in these evaluations should be literal. The amps should be boxed or concealed so that no one knows which amp is which until the event is over. Switch as you would, listen, record your impressions, and only after all is done, unbox and see what is what. Of course, the person doing the hookups would know, but they, perhaps, should not participate.
 
#250 ·
Of course that is the best way to do it. It potentially makes procedures far more complex and possibly unmanageable or prohibitive. Blind works for me, generally, if carefully executed and thoroughly reported.

Edit: For me personally, this is partly because I enjoy experimenting but not all of the tedious detail involved in thorough scientific double-blind methods.
 