March 21st, 2013
Thank the Academy: A visualization of how Oscar winners express gratitude.
I have to be honest: that's one of the more straightforward charts on the site. The interactive charts that really let you slice and dice the data are where the action is, but they can't be properly represented by a dinky screenshot over here.
You really should go and have a play for yourself. You can view the differences in the content of speeches and even the behaviour of the recipient, then view the differences between eras, or between different classes of award winner. It's a very well done site.
I'd love to see someone apply these same techniques and style of presentation to another corpus taken from an annual event with a bit of a history to it. Say, Budget speeches by Chancellors of the Exchequer over the last 50 years, or party leaders' speeches to their party conference. Granted, you couldn't do much with an analysis by gender of either of those data sets, what with the relevant UK data having Margaret Thatcher on one side of the stats and generation after generation of middle-aged men on the other, but there would be all sorts of illuminating ways to break the data down.
One thing I can confidently predict: those sorts of data sets would provide fewer opportunities to tally the number of speakers who burst into tears during their speech. Also, some poor devil would have to sit through recordings of each speech taking detailed notes, and I'm pretty sure that'd be a lot less entertaining – a lot less glamorous, certainly – than watching 50 years of excerpts from the Oscars.
Getting back to the more glamorous data set, the site will even tell you who has been thanked by name most frequently in acceptance speeches by directors, leading and supporting actors and actresses.
[Via Flowing Data]
February 1st, 2013
Being a statistician, a feminist and a fan of the outgoing US Secretary of State, Hilary Parker couldn't resist investigating whether it's true that the name Hilary/Hillary is the most poisoned baby name in US history.
A lot of screen-scraping and many R sessions later, she shares her conclusions and reasoning with the rest of us. As a bonus, she explores fascinating side issues, like the reasons why some names saw short-lived leaps in popularity:
For each of the names that "dropped in" I did a little research on the name and the year. "Dewey" popped up in 1898 because of the Spanish-American War – people named their daughters after George Dewey. "Deneen" was one name of a duo with a one-hit wonder in 1968. "Katina" and "Catina" were wildly popular because in 1972 in the soap opera Where the Heart Is a character is born named Katina. "Farrah" became popular in 1976 when Charlie's Angels, starring Farrah Fawcett, debuted (notice that the name becomes popular in 2009 when Farrah Fawcett died).
I couldn't resist doing a quick-and-dirty search across the data files on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number. It appears that in 2008 and 2009 the name 'Barack' saw a tenfold rise in popularity compared to 2007 (albeit from a small base):
Interestingly, no hits came up for the name 'Barack' in the files representing years prior to 2007. Could it be that I've uncovered evidence, from data files supplied by his very own Administration, that Obama wasn't born on American soil after all?
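For anyone who fancies repeating the quick-and-dirty search, here is a rough Python sketch of the sort of thing I mean. It assumes one file per year named along the lines of yob2007.txt, with each line holding a name, a sex and a count separated by commas; that layout, the function names and the sample rows are all illustrative assumptions, so check them against whatever files you actually download. The sample figures are invented.

```python
import csv
from pathlib import Path


def count_name(lines, name):
    """Sum the counts for `name` across both sexes in one year's file."""
    total = 0
    for row in csv.reader(lines):
        # Assumed row layout: name, sex, number of births.
        if len(row) == 3 and row[0] == name:
            total += int(row[2])
    return total


def counts_by_year(data_dir, name):
    """Map year -> count for every yobYYYY.txt file found in data_dir."""
    results = {}
    for path in sorted(Path(data_dir).glob("yob*.txt")):
        year = int(path.stem[3:])  # "yob2007" -> 2007
        with path.open(newline="") as handle:
            results[year] = count_name(handle, name)
    return results


# Made-up rows for illustration only; these are not real figures.
sample = ["Alex,M,120", "Alex,F,30", "Sam,M,45"]
print(count_name(sample, "Alex"))
```

Bear in mind that a zero from a search like this only tells you the name isn't in the file. If the publisher leaves out very rare names, "no hits" and "no babies" are not the same thing.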
[Via Waxy.org: Links Miniblog]
November 12th, 2012
Such was the extent of his triumph last week that even 11-year-old girls are crushing on Nate Silver:
I wonder if when you get up in the morning you open your kitchen cabinet and go, I'm feeling 18.5% Rice Chex and 27.9% Frosted Mini-Wheats and 32% one of those whole-grain Kashi cereals which have photos of smiling multicultural people on the boxes, as if smiling multicultural people were a new form of fibre. And then I wonder if you think, But I'm really feeling 58.3% like having a cupcake for breakfast, but then your mom says, "I don't care if you're a fancy statistician with a Times blog and Seattle green-architect eyeglass frames, you still need something heart-healthy to start your day," but then you tell her, "Mom, if you keep nagging me I will never let you meet my new boyfriend, Matt Bomer."
See, I think that because you predicted the election with near-100% accuracy Matt Bomer is way more likely to go out with you than with Dick Morris, who predicted a Romney landslide, or with Karl Rove, who kept predicting that Ohio was still in play a week after the election was over. In fact, right now I bet that you could get anyone to go out with you just by saying something like "I predicted Florida, North Carolina, and Illinois, and now I'm predicting that you'll have dinner with me."
March 15th, 2012
The Body Counter, or, What Statistical Analysis Can Teach Us about Atrocities…
Traditionally, human rights work has been more akin to investigative reporting, but [Patrick] Ball is the most influential of a handful of people around the world who see that world not in terms of words, but of figures. His specialty is applying quantitative analysis to mountains of anecdotes, finding the correlations that coax out a story that cannot easily be dismissed.
In testifying [during the trial of Slobodan Milosevic], Ball was doing something other human rights workers can only fantasize about: He confronted the accused, presented him with evidence, and watched him being held to account. At that point, Milosevic in his four wars had killed some 125,000 people, more than anyone in Europe since Stalin. But now the Butcher of the Balkans sat in a courtroom that looked rather like a community college classroom, with two Dutch police officers behind him and his cell waiting for him at the end of each day's session, rhetorical bluster his only available weapon against Ball's evidence.
Milosevic died before the trial ended. Ball returned to Washington and then went on to Lima to work for Peru's Truth and Reconciliation Commission — one of dozens of truth commissions, tribunals, and investigatory bodies where his methods have changed our understanding of war. [...]
October 30th, 2011
September 13th, 2011
The Curious Science of Counting a Crowd.
Give us another half decade or so of smartphone market penetration and this'll be a solved problem, at least in the 'developed' world. The police will just grab copies of the logs from the mobile phone masts adjacent to the meeting site and count up the number of different devices that tried to access them during the course of the demo/rally/parade.
OK, so strictly speaking they'll be counting mobile phones rather than people, but I bet it'll still produce a count well within the 10% margin of error researchers currently hope to achieve using statistical methods.
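The counting itself would be trivial. Here's a minimal Python sketch, assuming each log entry boils down to a mast ID, a device ID and a timestamp; that record format and every identifier below are hypothetical, since I have no idea what real mast logs look like.

```python
from datetime import datetime


def count_devices(records, start, end):
    """Count distinct device IDs seen by any mast between start and end."""
    seen = set()
    for mast_id, device_id, timestamp in records:
        if start <= timestamp <= end:
            seen.add(device_id)  # a set ignores repeat sightings
    return len(seen)


# Invented log entries, purely for illustration.
records = [
    ("mast-A", "dev-1", datetime(2011, 9, 13, 12, 0)),
    ("mast-B", "dev-1", datetime(2011, 9, 13, 12, 5)),   # same phone, second mast
    ("mast-A", "dev-2", datetime(2011, 9, 13, 12, 10)),
    ("mast-A", "dev-3", datetime(2011, 9, 13, 18, 0)),   # after the event
]
start = datetime(2011, 9, 13, 11, 0)
end = datetime(2011, 9, 13, 14, 0)
print(count_devices(records, start, end))
```

Using a set means a phone that hops between masts, or pings the same one repeatedly, is only counted once. It does nothing about passers-by, people carrying two phones or people carrying none, which is where the margin of error would come from.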
[Via The Morning News]
May 1st, 2011
Marc Tracy on attending the 2011 MIT Sloan Sports Analytics Conference:
There was an unusually small room at the northwest end of the hall devoted to the authors of research papers. Here are the names of some of the papers: "Paired Pitching: The Welcomed Death of the Starting Pitcher"; "Optimizing an NBA Team's Approach to Free Agency Using a Multiple Choice Knapsack Model"; "A Groovy Kind of Golf Club: The Impact of Grooves Rule Changes in 2010 on the PGA Tour"; "An Improved Adjusted Plus-Minus Statistic for NHL Players"; "A Major League Baseball Swing Quality Metric"; "The Effects of Altitude on Soccer Match Outcomes." There is no way David Foster Wallace did not come up with at least one of those titles.
Readers on this side of the pond shouldn't be put off by the fact that the article is almost entirely about American sports; ultimately, Tracy isn't so much writing about sports as he is about geeks who love (analysing) sports. The geekery is the thing.
[Via Give Me Something To Read]
April 25th, 2011
The New York Times has come up with Project Cascade, a program that takes referrer analysis to a whole new level. Both pretty and useful.
I look forward to someone cloning this and building a WordPress plugin to put this sort of analysis slap bang in the middle of site administrators' Dashboards.
[Via Flowing Data]
January 5th, 2011
The year 2010 at MetaFilter Infographic.
- I'm astonished at the proportion of the site's traffic that goes to AskMeFi. It's not a part of the site I ever visit.
- A nice anecdote from Matt Haughey's post about compiling these stats:
In the past few years I've been contacted by representatives at some major companies sniffing around for acquisitions, but when they find out the moderation and maintenance of the community is all done by hand and we have to talk to thousands of people like grown-ups instead of some amazing whiz-bang python script doing it all for us, the conversation ends and we go back to work.
December 26th, 2010
David McCandless has been looking at the translated editions of his book, Information is Beautiful:
The Finnish publishers called their version 'Tieto On Kaunista'. This, I believe, translates as 'Rainbow Information Icicles Pierce The Bubble Of Your Mind'. (maybe)
According to Google Translate it doesn't mean anything of the sort, but IMHO it ought to: 'Rainbow Information Icicles Pierce The Bubble Of Your Mind' is a fantastic title. Perhaps McCandless could use it for his next book instead.
May 5th, 2010
There are lies, damned lies, and automated sensor readings:
A RECORDED downturn in [Adelaide's] Central Market shoppers that had been attributed to the global financial crisis has now been blamed on a faulty doorway sensor system.
The council and traders have been in a panic over the past year over a sharp downturn in visitor figures and fine-tuned advertising campaigns to attract shoppers.
A council report obtained by The Advertiser has found faulty sensors caused the dramatic drop in recorded visitors and ACC has now been forced to review at least a year of data.
The council's best estimate is that the drop in actual visitor numbers over the past year is less than 1 per cent, compared with about 10 per cent previously believed. [...]
[Via RISKS Digest]
March 14th, 2010
Economics of Sainthood (a preliminary investigation):
Saint-making has been a major activity of the Catholic Church for centuries. The pace of sanctifications has picked up noticeably in the last several decades under the last two popes, John Paul II and Benedict XVI. Our goal is to apply social-science reasoning to understand the Church's choices on numbers and characteristics of saints, gauged by location and socio-economic attributes of the persons designated as blessed. [...]
[Via The Browser]
August 20th, 2009
A billion here, a billion there … pretty soon you're talking real money.
[Via Word Magazine newsletter #76]
August 15th, 2009
A timeline of global media scare stories. Judging by that chart, we're living in the Decade of Diseases.
I wonder why the chart doesn't show the numbers for stories about 'terrorism'. Isn't that truly the 'scare story' of the last decade by a country mile?
August 5th, 2009
In the run-up to the 60th anniversary of the founding of the People's Republic of China, the Chinese National Bureau of Statistics has invited staff to write pieces celebrating the anniversary. One statistician submitted this paean to the power of numbers:
Some mock me for doing statistics
Some loathe me and statistics
Some don't understand what statistics are
Why is it that statistics
Put a calm smile on my face?
Because of statistics
I can solve the deepest mysteries
Because of statistics
I will not be lonely again, playing in the data
Because of statistics
I can rearrange the stars in the skies above
Because of statistics
My life is different, more meaningful
I love my life, my statistics
A little corner of my Excel geek's soul stirred when I read that. Sad, but true.
March 4th, 2009
The Times Labs Blog has a lovely graph showing the 50 most expensive footballers, aggregating the fees paid across each player's career and adjusting them for inflation.
Fascinating stuff: the majority of players on the chart have had just one or two moves, but the top four players by cumulative transfer value have had four or more moves. It'd be interesting to see an indication of which countries were involved in the various moves; I can't help but notice that the great majority of the players listed have spent some time in the Premiership.
It'd be fascinating to see an updated version of this chart in five or ten years. A lot of the players in the bottom third of the chart – the likes of Fernando Torres, Michael Essien and Samuel Eto'o – are still active and are likely to have one or two more big money moves before they're done, moving them a long way up the chart.
December 26th, 2008
Over at Crooked Timber, Daniel Davies contemplates questions of mortality:
[We...] have been working out job-related mortality rates for a variety of professions. To date, we have:
By definition, all Popes die in office and it is very hard to get any data about Popes having died from job-related illnesses, so we had to scale back the project to estimate the rate of violent job related deaths of Popes. The consensus of historians has seven popes definitely having been murdered, which would give a job-related mortality rate of 476/100k Pope years since Gregory the Great in 540 AD. However …
There are a further ten Popes listed by Wikipedia as possibly having been murdered (this number includes John Paul I, so it apparently doesn't take much in the way of real evidence to get on this list). Adding them in would boost the Papal fatality rate to 1158/100kPy. And furthermore …
The first 25 Popes were martyred. This is surely the very definition of a job related death, and I don't really see much case for excluding them as an outlier – in any long run of data, you're bound to get a couple of periods under which Europe was dominated by a vehemently anti-Christian empire. In order to take this additional data into account, we have to extend the window back to St Peter and 33AD, but you still get 2126 job related deaths/100kPy. [...]
The entire post is well worth reading, including the copious footnotes and the comments. Fine work all round.
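If you want to check the arithmetic, the sums are simple enough to sketch in a few lines of Python. Two assumptions of mine, not stated in the quoted passage: the window ends in 2008, the year the post was written, and the three death counts are cumulative (7, then 7 + 10, then 7 + 10 + 25). The function name is my own invention.

```python
def deaths_per_100k_pope_years(deaths, start_year, end_year):
    """Job-related deaths per 100,000 years of papacy in the window."""
    pope_years = end_year - start_year  # treats the papacy as one post, always filled
    return deaths / pope_years * 100_000


END_YEAR = 2008  # assumption: the year the quoted post was written

definite = 7                 # murders accepted by historians
with_possible = 7 + 10       # plus the "possibly murdered" list
with_martyrs = 7 + 10 + 25   # plus the first 25 Popes

print(int(deaths_per_100k_pope_years(definite, 540, END_YEAR)))
print(int(deaths_per_100k_pope_years(with_possible, 540, END_YEAR)))
print(int(deaths_per_100k_pope_years(with_martyrs, 33, END_YEAR)))
```

Under those assumptions the three results, truncated to whole numbers, come out at the 476, 1158 and 2126 quoted above.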
September 23rd, 2008
Back in 2005, I thought that Gapminder was a neat tool for mapping demographic, economic and sociological data that badly needed an online, Flash-based version.
Happily, somewhere along the way they added a full-featured web-based version of their software: I can heartily recommend it as a way to pass an hour or two.
[Via The Scout Report]