Home Theater Forum and Systems

221 - 240 of 250 Posts

·
Registered
Joined
·
102 Posts
Yay! Thanks for more details. Very interesting observations about ABX testing and possible criteria for improving reliability. :T

I hoped my initial (abnormally) terse prodding was enough to get a response and not leave a lasting bad taste in your mouths about reader feedback. You guys always do great work for those of us who like to live vicariously through these sorts of reports. :) THANK YOU! :reading:
 

·
Banned
Joined
·
4,838 Posts
Discussion Starter #222
Leonard makes a good point, that time to really get comfortable and familiar with the test environment is a good thing. Some of those contrasting impressions I feel could be repeated in the right conditions. Some were still quite fresh and their descriptions could have evolved over time. For instance, "silky highs" vs. "crisp highs," what does that even mean? If I had another half hour or 45 minutes to really investigate that dimension of a pair of amplifiers, would it have ended up a completely different description? Would that contrast have become easier to hear and identify somehow? Might it even have ended up disappearing altogether, something totally imagined? These are all possibilities.

The human imagination is incredibly powerful, I do not understand why it is so difficult for some people to accept that it can affect our hearing, too. I have had it happen to me. I have nothing against faith, nothing against trusting that I can hear something even if it cannot be measured. And I feel no need to prove to someone else something that I know is true and repeatable, especially if it can be replicated from scratch in a different environment. But I am going to need to prove it to myself to be sure I did not make it up. Repeated testing, perhaps over several listening sessions, perhaps over several days, may be needed to get those initial impressions sorted through and settled down to real repeatability and meaningful description.

As a pure guess, I would say that my impressions above are 50% stable and 50% unstable, or in need of more time to mature and even be sure they were real.
 

·
Plain ole user
Joined
·
11,121 Posts
So both of us were reliably wrong in identifying amps in ABX comparisons. What conclusion can we draw from that? I know we both worked very hard at trying to get it right, so my feeling is that it is the testing design that is flawed. We should have been closer to 50% if there was not some systematic bias going on.
 

·
Registered
Joined
·
918 Posts
We should have been closer to 50% if there was not some systematic bias going on.
A completely random pick should result in close to 50/50; monkeys should get those results. You were not randomly picking, you were making educated guesses, which tend to result in more incorrect answers (info from Mometrex testing).
 

·
Plain ole user
Joined
·
11,121 Posts
That is my point. The fact that we consistently got them backwards in a choice with even odds is curious in itself.
 

·
Registered
Joined
·
918 Posts
It does mean something. It means that you were wrong when you thought you were right (you thought you were picking the correct amp, but you actually picked the incorrect amp). I don't mean this in a demeaning way, it's just what the test results mean. You could not tell which amp was which, and because you tried to pick the correct one, it usually will lead to the incorrect amp.
 

·
Plain ole user
Joined
·
11,121 Posts
If the testing design was not flawed, and there was not a difference between the amps, we should average around 50%. My point is that there is something going on beyond chance, which suggests a flaw in the testing design. ABX testing is not as objective a method as many would suggest.

Yes, we were more often wrong than right. It is not demeaning; it is just data. For data to become informative, you have to attach some meaning to it. If the results were closer to random, I would be less critical of our methodology. To be consistently wrong is very curious.
 

·
Registered
Joined
·
201 Posts
If the testing design was not flawed, and there was not a difference between the amps, we should average around 50%. My point is that there is something going on beyond chance, which suggests a flaw in the testing design. ABX testing is not as objective a method as many would suggest.

Yes, we were more often wrong than right. It is not demeaning; it is just data. For data to become informative, you have to attach some meaning to it. If the results were closer to random, I would be less critical of our methodology. To be consistently wrong is very curious.
Results would average 50% over a large sample size. In a small sample you will have people who get them all right or all wrong, so I don't think you can draw much from a two-person sample. That being said, it is interesting and may warrant further investigation.
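To put a rough number on the small-sample point (my illustration, not a figure from the thread): with only 7 coin-flip trials per listener, extreme scores are not all that rare, which is why a handful of listeners can land far from 50% by chance alone.

```python
# Chance that a single listener goes all-right or all-wrong on 7
# pure coin-flip trials -- small samples swing to extremes easily.
p_all_right = 0.5 ** 7            # every one of 7 guesses correct
p_extreme = 2 * p_all_right       # all right OR all wrong

print(f"all 7 right:            {p_all_right:.4f}")  # 0.0078
print(f"all right or all wrong: {p_extreme:.4f}")    # 0.0156
```

So about 1 listener in 64 would hit one of the two extremes by luck; with several listeners and repeated sessions, the occasional perfect or perfectly-wrong score is expected.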
 

·
Plain ole user
Joined
·
11,121 Posts
The sample size was 28 (4 observers x 7 comparisons). Out of those 28 trials we got 11 correct (most of those were thanks to Dennis, BTW). The probability of that, if there were a .5 probability on each trial, would be about 17% (the chance of 11 or fewer correct out of 28). That is certainly not low enough to conclude with a high degree of certainty that the test was biased, but it is still somewhat unlikely for a fair test at .5 probability per trial.
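The 11-of-28 figure can be checked with the exact binomial tail; here is a quick stdlib-only Python sketch of the chance of doing that badly (or worse) on fair coin flips:

```python
from math import comb

n, k = 28, 11  # 28 trials, 11 correct
# Probability of k or fewer correct if each trial is a fair 50/50 pick
p_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n

print(f"P(11 or fewer correct out of 28) = {p_tail:.3f}")  # about 0.172
```

That is low-ish but nowhere near the conventional 5% threshold, which matches the post's conclusion that the result is suggestive rather than conclusive.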

I understand statistics better than most, as I was a math teacher and did behavioral research in grad school. All research has hidden biases that are hard to tease out. Increasing the number of trials makes it less likely to err in one's conclusions when all of the significant variables have been controlled for. Below a certain threshold, however, you are still in guessing mode, and that is where we remain. When there, you have to make educated guesses at how you can make the testing more reliable in getting at what you are looking for and how to minimize the effects of unintended variables. Confusion in recall was certainly an issue for Wayne and me, less so for Joe, and not much so for Dennis. If we just considered Dennis' results he was correct enough to be statistically significant, but it would be unfair to do so. You don't throw out some of the data to get the result you want.

The bottom line is that the patterns of the data do suggest a problem with the methodology, at least for some of the subjects. In the future we will account for that.
 

·
Registered
Joined
·
918 Posts
My point from post #224 is that you do not have a 50/50 chance at getting the answer correct. To get a 50/50 result, the answers have to be picked at random.

You were not picking your answers at random (i.e. flipping a coin, or choosing an answer before you saw the question); you were making judgmental guesses, which alters the outcome from a 50/50 result. If you could correctly distinguish a certain amp, then the results would have been skewed toward more correct answers. If you could not, then the results would have been skewed toward more incorrect answers.

Using random picks usually comes to a 50/50 result. Using judgmental guessing does not come to a 50/50 result.

Your results were skewed toward more incorrect answers; this tells us that you could not correctly identify the amps. It does not mean that the testing was flawed.
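The random-versus-judgmental distinction can be illustrated with a toy simulation (my sketch, with hypothetical numbers): a coin-flipping listener lands near 50%, while a listener who follows an internal cue that is systematically reversed scores reliably below chance, even though the test procedure itself is fair.

```python
import random

random.seed(1)

def run_trials(n, p_cue_correct):
    """Simulate n two-choice trials where the listener follows an
    internal cue that matches the true answer with probability
    p_cue_correct. Returns the fraction of correct answers."""
    correct = 0
    for _ in range(n):
        truth = random.choice(["A", "B"])
        if random.random() < p_cue_correct:
            guess = truth                          # cue points the right way
        else:
            guess = "B" if truth == "A" else "A"   # cue misleads
        correct += guess == truth
    return correct / n

print(run_trials(100_000, 0.50))  # coin-flip listener: close to 0.50
print(run_trials(100_000, 0.35))  # reversed cue: close to 0.35, below chance
```

The below-chance case is the interesting one: a score consistently under 50% means the listener is latching onto *something*, just with the labels inverted, which is a different finding from "could not hear anything at all."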
 

·
Banned
Joined
·
4,838 Posts
Discussion Starter #234
(Full disclosure, I think amplifiers sound the same if not driven to distortion)
My training is electr
I do not mean this in an argumentative way at all. Just seeking a little further clarity. I think part of what we are learning from this is that there is an awful lot of fine detail in the "sounds the same" part of that statement, which we all have a tendency to throw around freely, myself included. Under normal listening conditions we are processing so much so fast, and if an amplifier doesn't sound bad, or if it sounds pretty good, then we think of it as being good enough and we are happy. When we start listening really closely for detail, are there little differences that might be audible? Little differences in the way the soundstage shows up as a result of crosstalk in the circuitry or in the power supply circuitry? Distortion of a slightly different nature in this amplifier vs. that amplifier, both good amplifiers but with slightly different sonic characteristics resulting from bias circuitry design? And part of the question along with all of that: is it worth the trouble to try to hear that level of detail? If it is not something that jumps right out at you, why worry about it? That is Sonnie's way of looking at things. And for most of us, most of the time, that is not a bad way of approaching it.

But remember also that the purpose of this study and studies like it is to try to determine "can we hear a difference" not "is it worthwhile to try to hear a difference?" They're totally different questions.

Just some things to consider.
 

·
Banned
Joined
·
4,838 Posts
Discussion Starter #235
For those who enjoy the more philosophical side of things, I have always enjoyed Robert Pirsig's books, Zen and the Art of Motorcycle Maintenance and Lila, two books about the philosophy of static and dynamic quality and the way we as humans tend to divide things into finer and finer levels of discrimination and categorization. It is kind of our nature, and it can be taken to silly extremes at times. :coocoo:
 

·
Premium Member
Joined
·
2,539 Posts
Sorry for that post, it was incomplete.
I got a phone call and somehow send happened.

I think there are many things to consider about audio/listening testing.
There was another thread a while back discussing acoustic memory and filling in the gaps between two recordings that were supposed to be the same, but one had something subtle added to it.
Once that subtle addition was heard the brain simply added the missing info to the other track.

I may have the finer details of that a little skewed, but if the above scenario can happen with audio that actually is different then trying to differentiate amplifiers which all have stellar electrical specifications is pretty much impossible.
I have yet to see a better listening test method described than the blind listening test method, but I am open to the possibility that once something is heard the brain may fill in any missing pieces on the next essentially similar thing that is heard.
If this is happening it would be reasonable for it to be cumulative.

I do think amplifiers (including AVR amplifiers) sound the same, if there are differences to be heard between systems it would (IMO) be more likely for those differences to be in the front end.
Even if amplifiers do sound the same that does not mean people shouldn't want to own an amplifier, HiFi and HT for the enthusiast are in large part about playing with different things.
 

·
Banned
Joined
·
4,838 Posts
Discussion Starter #237
I have yet to see a better listening test method described than the blind listening test method...
After our experience, I find myself wanting to ask, What blind test method? What methodology specifically? And I do not mean that as a challenge to you personally, just trying to make the point that the specific approach of the test can make a difference, as we found, and those specifics and conditions become an important part of the "can we hear a difference" question.
 

·
Registered
Joined
·
1,784 Posts
Is there any chance of convincing anyone to pass on the short-term ABX tests that have many participants and numerous amplifiers, which would, by my definition, confuse matters more than clarify them?

I think first a reviewer should get very familiar with their software and, as was done in the tests performed in these pages, limit what will be listened to.

Second, tests should be limited to no more than two amplifiers. It would be best to have a static amplifier, one that the reviewer is familiar with. All other equipment should remain the same throughout the testing process.

Time: this takes time, and as such maybe two or three full nights should be spent listening to one amplifier, and then the same amount of time with the second. Once the first two time spans are completed, less time can be used to swap back and forth between the two amps being tested.

I would postulate that if there is a difference of sufficient importance, it will be heard. There will most probably be a difference and some of these have been described above by Craver.
Take the time to make these listening sessions worthwhile, not rushed. I do not see where A/B testing works for much of anything. Coke/Pepsi failed; various types of water were tested and that did not work either. Some things that we are particularly sensitive to are easily recognized; different brands of bacon are good examples. That takes up several senses at once.

I know this method does work, so try it. You might like it.
 

·
Premium Member
Joined
·
2,539 Posts
I am not trying to be argumentative either.

I used to be dead set certain blind listening tests were irrefutable, until I saw a discussion (it could have been on another forum; sorry, I cannot find it now) about music tracks being altered so that one had additional sounds. After the listeners had heard both tracks a few times, they started hearing the additional sounds in the track where they did not exist.
I am aware that the psychoacoustic abilities of the human system are very powerful and I am at least open to the possibility that blind A/B testing could be fundamentally flawed.

This is what I have seen as the most often cited amplifier testing protocol.
http://tom-morrow-land.com/tests/ampchall/
It seems straightforward and reasonable enough.

I am willing to bet a Coke that if you sat a $300 AVR alongside some electronic project boxes that would presumably switch the source and speakers between the AVR and someone's awesome HiFi rig, and the owner/listener had a clicker that turned on a green or red LED on one of the boxes to indicate whether they were listening to the AVR or the HiFi rig, they would hear a difference when the AVR LED was lit even though no change had actually occurred.
Repeat the test unsighted, and A/B would become indistinguishable to the listener.
 