Customer Reviews Not as Important as Ratings
October 12, 2010
Origin and Intro:
I shop online very often, both in the U.S. and back in China, so I wanted to design an experiment that could help me shop more rationally.
I find customer reviews and ratings an interesting feature. When browsing a product list, I often check the rating stars and only click open the item with the highest rating, or one with no reviews at all. Then I jump to the customer review section and skim through several reviews, paying attention only to the first few and never turning to the next page.
Some retailers like Amazon.com sort reviews by the elusive how-many-people-found-this-review-helpful order, while others like Macy’s and Urban Outfitters display their reviews chronologically. Whatever the order, the first review shown is almost always mysteriously positive, with a star rating of five or four.
Since I only read the first few reviews, they have the biggest influence on me. And as it turns out, those positive reviews always urge me to buy the product as soon as possible.
In the beginning, I designed my experiment to see whether people favor a product more when its most favorable reviews are shown first, since first impressions matter. Unfortunately, most of my respondents saw through the trick immediately. So I redesigned the experiment to find out whether people base their decisions on the reviews themselves or on the rating stars.
I expected that most people would choose the shoe with good reviews shown first if they were given the reviews along with the ratings (control group), whereas if they were given the reviews without ratings, their choices would vary (experimental group).
First, I chose the same shoe as sold on three major online shopping websites. Most online shoppers are women, and shoes, handbags, and clothes are the most common purchases, so a shoe seemed a good choice.
Then I collected the reviews and randomly chose one 5-star review, four 4-star, four 3-star, and one 2-star. Although the overall rating for all three shoes should have been the same, I gave shoe B an overall rating of 4 stars, shoe C 4 1/2 stars, and shoe A 3 stars, just to confuse the respondents. All the good and bad reviews address similar pros and cons. Pros: comfortable, fashionable, the right heel height. Cons: uncomfortable, heel too high, low quality.
For shoe A, I put the bad reviews first and the good reviews last; for shoe B, good reviews first and bad reviews last; for shoe C, I alternated one bad review with one good review.
I emailed the profiles of the three shoes in three separate documents to my friends, whose ages range from 20 to 50, and interviewed some people aged 20-30 at Bobst Library, again using separate files for the three shoes.
The instructions I gave were: assume you are shopping online; in about two minutes, decide which shoe you’d like to buy based solely on the reviews provided.
The two-minute limit is, I think, about the amount of time a consumer would spend comparing three shoes; most shopping decisions are made within seconds. I also didn’t want my respondents to study the shoes carefully, which they would very likely do knowing it was a test, and which would defeat the point of the experiment.
The picture I put on each shoe profile is the same black pump for professional attire. I wanted to give respondents a general impression of the shoe so that they wouldn’t try to figure out what it looked like from the reviews. For the same reason, I provided some basic information about the shoe:
- 1/4″ platform; 3″ heel
- Leather upper; Black
- Man-made sole
- Pump with round toe
For the control group, 12 out of 15 chose shoe B, one chose C, and two chose A. Two of them gave the reason that they only read the first review when shopping online. Several said the reviews of shoe B were more positive.
For the experimental group, people’s choices varied: 7 out of 18 chose shoe B, 7 chose shoe A, and 4 chose shoe C.
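The tallies above can be summarized with a quick script (a minimal sketch; the counts come from the post, with the control group read as 12 for B, 2 for A, and 1 for C so that the total is 15):

```python
# Choice counts as reported in the post (illustrative tally, not new data).
control = {"A": 2, "B": 12, "C": 1}       # reviews shown with rating stars
experimental = {"A": 7, "B": 7, "C": 4}   # reviews shown without rating stars

def shares(counts):
    """Return each shoe's share of choices as a rounded percentage."""
    total = sum(counts.values())
    return {shoe: round(100 * n / total) for shoe, n in counts.items()}

print(shares(control))       # → {'A': 13, 'B': 80, 'C': 7}   shoe B dominates
print(shares(experimental))  # → {'A': 39, 'B': 39, 'C': 22}  roughly even split
```

The contrast between the two dictionaries is the whole result: an 80% pile-up on shoe B when stars are visible, versus a near-even spread when they are not.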
In the control group, 80% of respondents chose shoe B. Even though there is no single rational answer, if the whole group had acted rationally, I’d expect the answers to vary like the experimental group’s, even allowing for the small, self-selected sample.
People probably chose shoe B because its first review is the most positive among the first reviews of the three shoes. Moreover, the overall rating for B is 4 stars, well above shoe A’s 3. Thus B left the most favorable impression on respondents in the two minutes they spent browsing the “webpage”.
In the experimental group, the results are distributed almost evenly, with shoe B still on top. I think the same reasoning as for the control group applies. But the most significant finding here is that the rating stars give customers a strong, quantified sense of value and quality, whereas the content of the reviews plays a minor part.
It is true that people can’t possibly read all the reviews in about two minutes, nor can they make an informed judgment in so short a time. And most of them probably read from the beginning. However, I never instructed them on which reviews to read, nor did I ask them to read in a specific order. This itself suggests that people have a review-reading pattern: start from the beginning and pay attention only to the first few.
Advice to shopping websites:
A. Do show the rating stars.
B. Put positive reviews first.
A follow-up experiment showing people mismatched rating stars and reviews would be even more interesting.