Thanks, Travis... a good contribution. There seem to be two areas of question though:
2) What's the sampling method? It looks like "People who visit this stats page use X", which is susceptible to gaming, although there's not much awareness yet:
If you had to phrase this in a sentence, what would you say it is actually measuring, and how?
Comment by John Dowdell at July 24, 2008 2:06 PM
I knew some questions like these might pop up :)
I tried to cover as many bases as possible, so here are my answers.
For Silverlight, the script must always create an invisible Silverlight instance on the page. The version number is pulled from that plugin instance.
In both cases, no SWF or XAML file is actually loaded and the temporary instances of the plugins are immediately destroyed.
2) The sampling actually takes place across many sites. You can include RIAStats on your own site and get your own stats! The stats collected from your site are aggregated into the public statistics.
The sampling method is per-browser-site-day. So one "unit" is a single browser, on a single site in a 24 hour period. I do this for several reasons.
a) We want to gather penetration stats over time. Basically the counter is reset daily (though the front page of RIAStats shows a collection of the last 30 days).
b) we don't want people gaming the system, so I made the collection happen over many different sites.
c) we don't want a single browser reporting over and over, so if cookies are disabled for riastats.com in a browser, stats for that browser won't be collected.
d) If a person did game the system, then it would only affect a single 24 hour period. They would have to game the system every single day.
e) Hopefully one day the site will have a wide enough deployment that a single user trying to game the system would be a tiny fraction of the results.
f) There are some basic filtering rules on the servers that keep a single IP from flooding the stats.
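To make the per-browser-site-day unit concrete, here is a minimal sketch in Python. It is not the actual RIAStats code: the names `sample_key` and `DailySampler`, the cookie-derived `browser_id`, and the choice of a UTC calendar day as the "24 hour period" are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def sample_key(browser_id: str, site: str, when: datetime) -> tuple[str, str, str]:
    """One sampling unit: a single browser, on a single site, on a single day.

    Assumption: the day is bucketed as a UTC calendar date, and `when` is a
    timezone-aware datetime.
    """
    day = when.astimezone(timezone.utc).date().isoformat()
    return (browser_id, site, day)


class DailySampler:
    """Hypothetical collector that counts each (browser, site, day) unit once."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()
        self.counts: dict[str, int] = {}

    def record(self, browser_id: str, site: str, plugin_version: str,
               when: datetime) -> bool:
        # Without a cookie-backed id the browser cannot be deduplicated,
        # so its report is skipped (point c above).
        if not browser_id:
            return False
        key = sample_key(browser_id, site, when)
        # A repeat report from the same browser on the same site and day
        # is ignored.
        if key in self._seen:
            return False
        self._seen.add(key)
        self.counts[plugin_version] = self.counts.get(plugin_version, 0) + 1
        return True
```

With this shape, the same browser counts again on a different site or on the next day, but never twice for the same site on the same day.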
Hope that answers some of your questions. There will be more info like this available in the FAQ at RIAStats.com; just haven't had the time to post it.
Comment by Travis Collins at July 24, 2008 2:24 PM
Cool, thanks. That federated sampling seems like it would be much less susceptible to gaming. The invocation check is solid too... thanks for watching out for those JS-only gotchas!
It might be good to check referrer headers, or whether there are spikes in visits... the "gaming" I was thinking of was more along the lines of "Hey everybody visit this site!" Hard to see how to avoid that, although the wide site sampling helps.
Comment by John Dowdell at July 24, 2008 2:45 PM
Thanks Travis, very useful stuff.
Any chance you can explain the big dip for Silverlight 2 from October 08 to November 08? Was there a change in your detection methods that might explain a drop so dramatic in a day?
Comment by Chris at January 23, 2009 6:15 PM
The dramatic changes in % numbers happen when a site with a relatively large number of visitors adds or removes the RIAStats script.
Every site has a different visitor pool, and therefore every site's RIAStats result is going to be slightly different. The idea is that with enough contributing sites, we'll have some solid real world numbers (instead of stats from a single site).
So if today RIAStats is collecting stats from 10,000 visitors, and tomorrow a site with 1,000,000 visitors per day adds the RIAStats script, the stats will be much more along the lines of the new, more popular site's visitor trend.
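A quick worked example of that effect, as a Python sketch. The visitor counts are the ones from the paragraph above; the penetration figures (30% and 5%) are made up purely for illustration, and `pooled_share` is a hypothetical helper, not part of RIAStats.

```python
def pooled_share(sites: list[tuple[int, float]]) -> float:
    """Overall plugin share when per-site samples are pooled together.

    Each entry is (daily_visitors, share_of_visitors_with_the_plugin).
    Every visitor counts equally, so large sites dominate the result.
    """
    total_visitors = sum(visitors for visitors, _ in sites)
    visitors_with_plugin = sum(visitors * share for visitors, share in sites)
    return visitors_with_plugin / total_visitors


# Hypothetical shares: the existing pool sits at 30%, the new large site at 5%.
before = pooled_share([(10_000, 0.30)])
after = pooled_share([(10_000, 0.30), (1_000_000, 0.05)])

print(f"before: {before:.1%}")  # 30.0%
print(f"after:  {after:.1%}")   # about 5.2%, close to the large site's own 5%
```

Nothing about detection changed between the two numbers; only the mix of contributing visitors did.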
Right now there are about 45 sites that have the RIAStats script (which is free!). They are a true mix of topics, languages and visitor counts.
The dramatic shift you saw in Silverlight happened when a large foreign job listing site added the script. Their visitor numbers were about double what RIAStats was collecting. However, several sizeable sites have been added since then, and now the largest site makes up only about 55% of the total stats collected.
You'll notice that before the "drop", the lines were much more sporadic. As RIAStats is added to more sites, and thus has more data points, the lines will become smoother. Having big sites, and little sites is a good thing.
The idea is that "we" as a community can collate better and more accurate stats than any single site can individually.
Thanks for asking!
Comment by Travis Collins at January 24, 2009 11:31 AM