The Wilson score uses a binomial distribution to calculate a lower and an upper bound for ratings. I implemented the formula found here. I'm not sure it's methodologically sound, but I think it does the trick.
We don't use the lower Wilson score directly, though. If we did, plenty of sites would show a zero, as there are simply too few ratings to derive any confidence greater than zero. What we do instead:
* We have a configurable weighting parameter. I think it's 25%. Let's call it wf, for weighting factor.
* The final score is (1 - wf) * Avg Rating + wf * Lower Wilson Score.
You are obviously free to increase or decrease wf. I found that 25% mostly did the trick and sorted the list in a sensible fashion.
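
For illustration, here is a minimal sketch of the blend described above. It is not the original implementation. It assumes each rating is a simple positive/negative vote, so that the average rating is the fraction of positive votes and sits on the same 0 to 1 scale as the Wilson bound. The function names and the z value of 1.96 (roughly 95% confidence) are assumptions made for the example.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    if total == 0:
        return 0.0  # no ratings, so no evidence at all
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, (centre - margin) / denom)


def blended_score(positive: int, total: int, wf: float = 0.25) -> float:
    """(1 - wf) * average rating + wf * lower Wilson score."""
    if total == 0:
        return 0.0
    avg = positive / total  # assumed: average rating as a fraction in [0, 1]
    return (1 - wf) * avg + wf * wilson_lower_bound(positive, total)


if __name__ == "__main__":
    # A single perfect rating versus 90 positive ratings out of 100.
    print(blended_score(1, 1))
    print(blended_score(90, 100))
```

With wf at 0.25, the site with 90 positive ratings out of 100 ranks above the site with a single perfect rating, whereas sorting by the plain average would put the single rating first. If the ratings are stars rather than votes, they would need to be mapped onto a 0 to 1 scale before being combined this way.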