June 29th, 2014
DataShine: Census provides a simple, map-based view of the UK's 2011 census data. I could browse this thing for hours….
The DataShine mapping platform is an output from an ESRC Future Research Leaders Project entitled "Big Open Data: Mining and Synthesis". The overall project seeks to promote and develop the use of large and open datasets amongst the social science community. A key part of this initiative is the visualisation of these data in new and informative ways to inspire new uses and generate insights. Phase one has been to create the mapping platform with data from the 2011 Census. The next phases will work on important issues such as representing the uncertainty inherent in many population datasets and also developing tools that will enable the synthesis of data across multiple sources.
[Via Flowing Data]
May 14th, 2014
15 weird things that 9% of Britons say they believe:
If Labour are having a tough time in the polls, the Lib Dems are facing a European wipe out.
The latest YouGov figures on how people are intending to vote in the European Elections put Lib Dem support at 9%. Our friends at UsVsTh3m noticed this was significantly lower than the number of people who would be prepared to have sex with an android.
We wondered what other things more than 9% of the British public believe, would be prepared to do, or have done…
10. Eat testicles
Not just the preserve of Bushtucker Trials in I'm A Celeb, 9% of people in the UK said they would be prepared to eat animal testicles. Remember, that's the same amount of people who say they'll vote Lib Dem.
Gloating? Perhaps. But it's a welcome distraction from contemplating UKIP's polling numbers.
December 1st, 2013
Jonas Lund's Gallery Analytics brings WiFi-based tracking to the cultural sector:
Lund's Gallery Analytics project is a site-specific installation for exhibitions that's able to generate data about behavior of visitors and present this data in a Google Analytics-like environment. By setting up a mesh Wi-Fi network and combining it with custom-made software, Gallery Analytics is able to track every Wi-Fi-enabled device (such as a smartphone) moving around in the area in real-time. [...]
I can see how, with a long-term exhibit, you might want to tinker with the layout if analysis reveals that visitors tend to overlook a particular piece, or perhaps even swap out a piece that people aren't paying attention to for something that might attract more interest. But if you have a short-term exhibit, will you accumulate enough data to draw firm conclusions about what is and isn't working before it moves on? Also, if you're a museum that hosts visits by groups, you might find that a group of students being led through on a tour of your exhibits will end up distorting your stats a bit. What you really need is a real-life equivalent of the Referrer field to help you distinguish between a group being led around and individual, self-directed visitors.
All in all, this could be a heck of a tool for museum and gallery operators, so long as they don't go nuts and start assuming that the data is the whole story.
Now, for extra credit, consider your local shopping mall or town centre doing all of the above. Is that better, or worse, or no different? Please justify your answer.
[Via Extenuating Circumstances]
March 21st, 2013
Thank the Academy: A visualization of how Oscar winners express gratitude.
I have to be honest: that's one of the more straightforward charts on the site. The interactive charts that really let you slice and dice the data are where the action is, but they can't be properly represented by a dinky screenshot over here.
You really should go and have a play for yourself. You can view the differences in the content of speeches and even the behaviour of the recipient, then view the differences between eras, or between different classes of award winner. It's a very well done site.
I'd love to see someone apply these same techniques and style of presentation to another corpus taken from an annual event with a bit of a history to it. Say, Budget speeches by Chancellors of the Exchequer over the last 50 years, or party leaders' speeches to their party conference. Granted, you couldn't do much with an analysis by gender of either of those data sets – you'd have Margaret Thatcher on one side of the stats and generation after generation of middle-aged men on the other – but there would be all sorts of illuminating ways to break the data down.
One thing I can confidently predict: those sorts of data sets would provide fewer opportunities to tally the number of speakers who burst into tears during their speech. Also, some poor devil would have to sit through recordings of each speech taking detailed notes, and I'm pretty sure that'd be a lot less entertaining – a lot less glamorous, certainly – than watching 50 years of excerpts from the Oscars.
Getting back to the more glamorous data set, the site will even tell you who has been thanked by name most frequently in acceptance speeches by directors, leading and supporting actors and actresses.
[Via Flowing Data]
February 1st, 2013
Being a statistician, a feminist and a fan of the outgoing US Secretary of State, Hilary Parker couldn't resist investigating whether it's true that the name Hilary/Hillary is the most poisoned baby name in US history.
A lot of screen-scraping and many R sessions later, she shares her conclusions and reasoning with the rest of us. As a bonus, she explores fascinating side issues, like the reasons why some names saw short-lived leaps in popularity:
For each of the names that "dropped in" I did a little research on the name and the year. "Dewey" popped up in 1898 because of the Spanish-American War – people named their daughters after George Dewey. "Deneen" was one name of a duo with a one-hit wonder in 1968. "Katina" and "Catina" were wildly popular because in 1972 in the soap opera Where the Heart Is a character is born named Katina. "Farrah" became popular in 1976 when Charlie's Angels, starring Farrah Fawcett, debuted (notice that the name becomes popular in 2009 when Farrah Fawcett died).
I couldn't resist doing a quick-and-dirty search across the data files on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number. It appears that in 2008 and 2009 the name 'Barack' saw a tenfold rise in popularity compared to 2007 (albeit from a small base):
Interestingly, no hits came up for the name 'Barack' in the files representing years prior to 2007. Could it be that I've uncovered evidence, from data files supplied by his very own Administration, that Obama wasn't born on American soil after all?
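For anyone who wants to repeat that kind of quick-and-dirty search, here is a minimal sketch in Python. It assumes the per-year files follow the layout of the Social Security Administration's public baby-names download: one `yobYYYY.txt` file per year, each line reading `name,sex,count`. The `names` directory and the `name_counts_by_year` helper are invented for illustration, and this is not necessarily how the search described above was done.

```python
import csv
import re
from pathlib import Path


def name_counts_by_year(data_dir, name):
    """Return {year: total births} for `name` across yobYYYY.txt files.

    Assumes each file holds lines of the form name,sex,count.
    Years in which the name does not appear are omitted.
    """
    counts = {}
    for path in sorted(Path(data_dir).glob("yob*.txt")):
        match = re.fullmatch(r"yob(\d{4})\.txt", path.name)
        if match is None:
            continue  # skip files that don't fit the assumed naming scheme
        year = int(match.group(1))
        total = 0
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                # Case-insensitive match; sum across both sexes.
                if len(row) == 3 and row[0].lower() == name.lower():
                    total += int(row[2])
        if total:
            counts[year] = total
    return counts


if __name__ == "__main__":
    # "names" is a hypothetical directory holding the unzipped yearly files.
    for year, total in name_counts_by_year("names", "Barack").items():
        print(year, total)
```

One caveat when reading the results: as far as I know, the published files leave out any name given to fewer than five babies in a year, so a year with no hits means "rare" rather than "never".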
[Via Waxy.org: Links Miniblog]
November 12th, 2012
Such was the extent of his triumph last week that even 11-year-old girls are crushing on Nate Silver:
I wonder if when you get up in the morning you open your kitchen cabinet and go, I'm feeling 18.5% Rice Chex and 27.9% Frosted Mini-Wheats and 32% one of those whole-grain Kashi cereals which have photos of smiling multicultural people on the boxes, as if smiling multicultural people were a new form of fibre. And then I wonder if you think, But I'm really feeling 58.3% like having a cupcake for breakfast, but then your mom says, "I don't care if you're a fancy statistician with a Times blog and Seattle green-architect eyeglass frames, you still need something heart-healthy to start your day," but then you tell her, "Mom, if you keep nagging me I will never let you meet my new boyfriend, Matt Bomer."
See, I think that because you predicted the election with near-100% accuracy Matt Bomer is way more likely to go out with you than with Dick Morris, who predicted a Romney landslide, or with Karl Rove, who kept predicting that Ohio was still in play a week after the election was over. In fact, right now I bet that you could get anyone to go out with you just by saying something like "I predicted Florida, North Carolina, and Illinois, and now I'm predicting that you'll have dinner with me."
March 15th, 2012
The Body Counter, or, What Statistical Analysis Can Teach Us about Atrocities…
Traditionally, human rights work has been more akin to investigative reporting, but [Patrick] Ball is the most influential of a handful of people around the world who see that world not in terms of words, but of figures. His specialty is applying quantitative analysis to mountains of anecdotes, finding the correlations that coax out a story that cannot easily be dismissed.
In testifying [during the trial of Slobodan Milosevic], Ball was doing something other human rights workers can only fantasize about: He confronted the accused, presented him with evidence, and watched him being held to account. At that point, Milosevic in his four wars had killed some 125,000 people, more than anyone in Europe since Stalin. But now the Butcher of the Balkans sat in a courtroom that looked rather like a community college classroom, with two Dutch police officers behind him and his cell waiting for him at the end of each day's session, rhetorical bluster his only available weapon against Ball's evidence.
Milosevic died before the trial ended. Ball returned to Washington and then went on to Lima to work for Peru's Truth and Reconciliation Commission — one of dozens of truth commissions, tribunals, and investigatory bodies where his methods have changed our understanding of war. [...]
October 30th, 2011
September 13th, 2011
The Curious Science of Counting a Crowd.
Give us another half decade or so of smartphone market penetration and this'll be a solved problem, at least in the 'developed' world. The police will just grab copies of the logs from the mobile phone masts adjacent to the meeting site and count up the number of different devices that tried to access them during the course of the demo/rally/parade.
OK, so strictly speaking they'll be counting mobile phones rather than people, but I bet it'll still produce a count well within the 10% margin of error researchers currently hope to achieve using statistical methods.
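The counting step itself would be the easy part. A minimal sketch in Python, assuming a made-up log format of `timestamp,mast_id,device_id` lines (real operators' logs will look nothing like this, and every name and record here is hypothetical): collect the device IDs seen by the masts adjacent to the site during the event and count the distinct ones.

```python
from datetime import datetime


def count_devices(log_lines, mast_ids, start, end):
    """Count distinct device IDs seen by the given masts between start and end.

    Each log line is assumed to look like: ISO-timestamp,mast_id,device_id
    (a hypothetical format invented for this sketch).
    """
    seen = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        stamp, mast_id, device_id = line.split(",")
        when = datetime.fromisoformat(stamp)
        if mast_id in mast_ids and start <= when <= end:
            seen.add(device_id)  # a set ignores repeat sightings
    return len(seen)


if __name__ == "__main__":
    # Invented sample records, purely for illustration.
    sample = [
        "2011-09-13T12:00:00,mast-A,device-1",
        "2011-09-13T12:05:00,mast-B,device-1",  # same phone, second mast
        "2011-09-13T12:10:00,mast-A,device-2",
        "2011-09-13T18:30:00,mast-A,device-3",  # after the rally ended
        "2011-09-13T12:15:00,mast-Z,device-4",  # mast not near the site
    ]
    total = count_devices(
        sample,
        mast_ids={"mast-A", "mast-B"},
        start=datetime(2011, 9, 13, 11, 0),
        end=datetime(2011, 9, 13, 15, 0),
    )
    print(total)  # prints 2
```

Keeping the IDs in a set means a phone that hops between masts, or pings the same one repeatedly, is still counted once.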
[Via The Morning News]
May 1st, 2011
Marc Tracy on attending the 2011 MIT Sloan Sports Analytics Conference:
There was an unusually small room at the northwest end of the hall devoted to the authors of research papers. Here are the names of some of the papers: "Paired Pitching: The Welcomed Death of the Starting Pitcher"; "Optimizing an NBA Team's Approach to Free Agency Using a Multiple Choice Knapsack Model"; "A Groovy Kind of Golf Club: The Impact of Grooves Rule Changes in 2010 on the PGA Tour"; "An Improved Adjusted Plus-Minus Statistic for NHL Players"; "A Major League Baseball Swing Quality Metric"; "The Effects of Altitude on Soccer Match Outcomes." There is no way David Foster Wallace did not come up with at least one of those titles.
Readers on this side of the pond shouldn't be put off by the fact that the article is almost entirely about American sports; ultimately, Tracy isn't so much writing about sports as he is about geeks who love (analysing) sports. The geekery is the thing.
[Via Give Me Something To Read]
April 25th, 2011
The New York Times has come up with Project Cascade, a program that takes referrer analysis to a whole new level. Both pretty and useful.
I look forward to someone cloning this and building a WordPress plugin to put this sort of analysis slap bang in the middle of site administrators' Dashboards.
[Via Flowing Data]
January 5th, 2011
The year 2010 at MetaFilter Infographic.
- I'm astonished at the proportion of the site's traffic that goes to AskMeFi. It's not a part of the site I ever visit.
- A nice anecdote from Matt Haughey's post about compiling these stats:
In the past few years I've been contacted by representatives at some major companies sniffing around for acquisitions, but when they find out the moderation and maintenance of the community is all done by hand and we have to talk to thousands of people like grown-ups instead of some amazing whiz-bang python script doing it all for us, the conversation ends and we go back to work.
December 26th, 2010
David McCandless has been looking at the translated editions of his book, Information is Beautiful:
The Finnish publishers called their version 'Tieto On Kaunista'. This, I believe, translates as 'Rainbow Information Icicles Pierce The Bubble Of Your Mind'. (maybe)
According to Google Translate it doesn't mean anything of the sort, but IMHO it ought to: 'Rainbow Information Icicles Pierce The Bubble Of Your Mind' is a fantastic title. Perhaps McCandless could use it for his next book instead.
May 5th, 2010
There are lies, damned lies, and automated sensor readings:
A RECORDED downturn in [Adelaide's] Central Market shoppers that had been attributed to the global financial crisis has now been blamed on a faulty doorway sensor system.
The council and traders have been in a panic over the past year over a sharp downturn in visitor figures and fine-tuned advertising campaigns to attract shoppers.
A council report obtained by The Advertiser has found faulty sensors caused the dramatic drop in recorded visitors and ACC has now been forced to review at least a year of data.
The council's best estimate is that the drop in actual visitor numbers over the past year is less than 1 per cent, compared with about 10 per cent previously believed. [...]
[Via RISKS Digest]
March 14th, 2010
Economics of Sainthood (a preliminary investigation):
Saint-making has been a major activity of the Catholic Church for centuries. The pace of sanctifications has picked up noticeably in the last several decades under the last two popes, John Paul II and Benedict XVI. Our goal is to apply social-science reasoning to understand the Church's choices on numbers and characteristics of saints, gauged by location and socio-economic attributes of the persons designated as blessed. [...]
[Via The Browser]
August 20th, 2009
A billion here, a billion there … pretty soon you're talking real money.
[Via Word Magazine newsletter #76]
August 15th, 2009
A timeline of global media scare stories. Judging by that chart, we're living in the Decade of Diseases.
I wonder why the chart doesn't show the numbers for stories about 'terrorism'. Isn't that truly the 'scare story' of the last decade by a country mile?
August 5th, 2009
In the run-up to the 60th anniversary of the founding of the People's Republic of China, the Chinese National Bureau of Statistics has invited staff to write pieces celebrating the anniversary. One statistician submitted this paean to the power of numbers:
Some mock me for doing statistics
Some loathe me and statistics
Some don't understand what statistics are
Why is it that statistics
Put a calm smile on my face?
Because of statistics
I can solve the deepest mysteries
Because of statistics
I will not be lonely again, playing in the data
Because of statistics
I can rearrange the stars in the skies above
Because of statistics
My life is different, more meaningful
I love my life, my statistics
A little corner of my Excel geek's soul stirred when I read that. Sad, but true.