December 13th, 2014
The associated Reddit comment thread can be found here.
[Via Flowing Data]
DataShine: Census provides a simple, map-based view of the UK's 2011 census data. I could browse this thing for hours….
The DataShine mapping platform is an output from an ESRC Future Research Leaders Project entitled "Big Open Data: Mining and Synthesis". The overall project seeks to promote and develop the use of large and open datasets amongst the social science community. A key part of this initiative is the visualisation of these data in new and informative ways to inspire new uses and generate insights. Phase one has been to create the mapping platform with data from the 2011 Census. The next phases will work on important issues such as representing the uncertainty inherent in many population datasets and also developing tools that will enable the synthesis of data across multiple sources.
[Via Flowing Data]
If Labour are having a tough time in the polls, the Lib Dems are facing a European wipe out.
The latest YouGov figures on how people are intending to vote in the European Elections put Lib Dem support at 9%. Our friends at UsVsTh3m noticed this was significantly lower than the number of people who would be prepared to have sex with an android.
We wondered what other things more than 9% of the British public believe, would be prepared to do, or have done…
10. Eat testicles
Not just the preserve of Bushtucker Trials in I'm A Celeb, 9% of people in the UK said they would be prepared to eat animal testicles. Remember, that's the same amount of people who say they'll vote Lib Dem.
Gloating? Perhaps. But it's a welcome distraction from contemplating UKIP's polling numbers.
Jonas Lund's Gallery Analytics brings WiFi-based tracking to the cultural sector:
Lund's Gallery Analytics project is a site-specific installation for exhibitions that's able to generate data about behavior of visitors and present this data in a Google Analytics-like environment. By setting up a mesh Wi-Fi network and combining it with custom-made software, Gallery Analytics is able to track every Wi-Fi-enabled device (such as a smartphone) moving around in the area in real-time. […]
I can see how with a long-term exhibit you might want to tinker with the layout if analysis reveals that visitors are tending to overlook a particular piece, or perhaps even to swap out a piece that people aren't paying attention to for something that might attract more interest, but if you have a short-term exhibit will you accumulate enough data to draw firm conclusions about what is and isn't working before it moves on? Also, if you're a museum that hosts visits by groups you might find that a group of students being led through on a tour of your exhibits will end up distorting your stats a bit. What you really need is a real-life equivalent of the Referrer field to help you distinguish between a group being led around and individual, self-directed visitors.
All in all, this could be a heck of a tool for museum and gallery operators, so long as they don't go nuts and start assuming that the data is the whole story.
Now, for extra credit, consider your local shopping mall or town centre doing all of the above. Is that better, or worse, or no different? Please justify your answer.
Thank the Academy: A visualization of how Oscar winners express gratitude.
I have to be honest: that's one of the more straightforward charts on the site. The interactive charts that really let you slice and dice the data are where the action is, but they can't be properly represented by a dinky screenshot over here.
You really should go and have a play for yourself. You can view the differences in the content of speeches and even the behaviour of the recipient, then view the differences between eras, or between different classes of award winner. It's a very well done site.
I'd love to see someone apply these same techniques and style of presentation to another corpus taken from an annual event with a bit of a history to it. Say, Budget speeches by Chancellors of the Exchequer over the last 50 years, or party leaders' speeches to their party conference. Granted, you couldn't do much with an analysis by gender of either of those data sets – what with any analysis by gender of the relevant UK data sets having Margaret Thatcher on one side of the stats and generation after generation of middle aged men on the other – but there would be all sorts of illuminating ways to break the data down.
One thing I can confidently predict: those sorts of data sets would provide fewer opportunities to tally the number of speakers who burst into tears during their speech. Also, some poor devil would have to sit through recordings of each speech taking detailed notes, and I'm pretty sure that'd be a lot less entertaining – a lot less glamorous, certainly – than watching 50 years of excerpts from the Oscars.
Getting back to the more glamorous data set, the site will even tell you who has been thanked by name most frequently in acceptance speeches by directors, leading and supporting actors and actresses.
[Via Flowing Data]
In the future, you have access to all your data. Memory, or the lack thereof, is no longer discussed. It is only assumed, a feature of modern life, since you can now relive all your past data as experiences. But because of "technical constraints," all of your experiences are taxonomized and merged for ease of efficiency/retrieval. To access your past, then, is to relive each experience – in real time, all at once.
You spend seven weeks holding your iPhone to your ear on hold.
You pull to refresh for seven months, click to refresh for nine.
You miss 30 Thanksgiving dinners restarting your laptop.
12 Valentine's Days restarting your iPhone.
You swipe past iPad ads for 48 hours before ever seeing content.
Being a statistician, a feminist and a fan of the outgoing US Secretary of State, Hilary Parker couldn't resist investigating whether it's true that the name Hilary/Hillary is the most poisoned baby name in US history.
A lot of screen-scraping and many R sessions later, she shares her conclusions and reasoning with the rest of us. As a bonus, she explores fascinating side issues, like the reasons why some names saw short-lived leaps in popularity:
For each of the names that "dropped in" I did a little research on the name and the year. "Dewey" popped up in 1898 because of the Spanish-American War – people named their daughters after George Dewey. "Deneen" was one name of a duo with a one-hit wonder in 1968. "Katina" and "Catina" were wildly popular because in 1972 in the soap opera Where the Heart Is a character is born named Katina. "Farrah" became popular in 1976 when Charlie's Angels, starring Farrah Fawcett, debuted (notice that the name becomes popular in 2009 when Farrah Fawcett died).
I couldn't resist doing a quick-and-dirty search across the data files on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number. It appears that in 2008 and 2009 the name 'Barack' saw a tenfold rise in popularity compared to 2007 (albeit from a small base):
Interestingly, no hits came up for the name 'Barack' in the files representing years prior to 2007. Could it be that I've uncovered evidence, from data files supplied by his very own Administration, that Obama wasn't born on American soil after all?
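For anyone who fancies repeating that sort of quick-and-dirty search, here is a minimal Python sketch. It is not the search I actually ran; it assumes one plain-text file per year, named along the lines of yob2008.txt, with each row holding name, sex and count separated by commas. The function names and the data directory are made up for illustration.

```python
import csv
from pathlib import Path


def name_share(lines, name):
    """Return (count, share of all listed births) for one name in one year's rows.

    Assumes each row looks like "Barack,M,52": name, sex, count.
    """
    total = 0
    hits = 0
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        given, _sex, count = row
        total += int(count)
        if given == name:
            hits += int(count)
    return hits, (hits / total if total else 0.0)


def shares_by_year(data_dir, name, years):
    """Run name_share over every yobYYYY.txt file that exists (assumed naming)."""
    results = {}
    for year in years:
        path = Path(data_dir) / f"yob{year}.txt"
        if not path.exists():
            continue
        with path.open(newline="") as handle:
            results[year] = name_share(handle, name)
    return results


if __name__ == "__main__":
    # "names" is a hypothetical directory holding the unpacked yearly files.
    for year, (count, share) in sorted(shares_by_year("names", "Barack", range(2000, 2011)).items()):
        print(year, count, f"{share:.6%}")
```

One thing to bear in mind when reading the output: names given to fewer than five babies in a year are left out of the published files, so a zero here means "under the threshold" rather than "none at all".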
[Via Waxy.org: Links Miniblog]
Such was the extent of his triumph last week that even 11 year-old girls are crushing on Nate Silver:
I wonder if when you get up in the morning you open your kitchen cabinet and go, I'm feeling 18.5% Rice Chex and 27.9% Frosted Mini-Wheats and 32% one of those whole-grain Kashi cereals which have photos of smiling multicultural people on the boxes, as if smiling multicultural people were a new form of fibre. And then I wonder if you think, But I'm really feeling 58.3% like having a cupcake for breakfast, but then your mom says, "I don't care if you're a fancy statistician with a Times blog and Seattle green-architect eyeglass frames, you still need something heart-healthy to start your day," but then you tell her, "Mom, if you keep nagging me I will never let you meet my new boyfriend, Matt Bomer."
See, I think that because you predicted the election with near-100% accuracy Matt Bomer is way more likely to go out with you than with Dick Morris, who predicted a Romney landslide, or with Karl Rove, who kept predicting that Ohio was still in play a week after the election was over. In fact, right now I bet that you could get anyone to go out with you just by saying something like "I predicted Florida, North Carolina, and Illinois, and now I'm predicting that you'll have dinner with me."
The Body Counter, or, What Statistical Analysis Can Teach Us about Atrocities…
Traditionally, human rights work has been more akin to investigative reporting, but [Patrick] Ball is the most influential of a handful of people around the world who see that world not in terms of words, but of figures. His specialty is applying quantitative analysis to mountains of anecdotes, finding the correlations that coax out a story that cannot easily be dismissed.
In testifying [during the trial of Slobodan Milosevic], Ball was doing something other human rights workers can only fantasize about: He confronted the accused, presented him with evidence, and watched him being held to account. At that point, Milosevic in his four wars had killed some 125,000 people, more than anyone in Europe since Stalin. But now the Butcher of the Balkans sat in a courtroom that looked rather like a community college classroom, with two Dutch police officers behind him and his cell waiting for him at the end of each day's session, rhetorical bluster his only available weapon against Ball's evidence.
Milosevic died before the trial ended. Ball returned to Washington and then went on to Lima to work for Peru's Truth and Reconciliation Commission — one of dozens of truth commissions, tribunals, and investigatory bodies where his methods have changed our understanding of war. […]
Where is the best place for a player to aim, knowing the inaccuracies of throwing?
What is the optimal strategy? Where should a player aim in order to obtain the highest expected outcome over many throws?
Should they aim for the triple 20, with a big payout on a success, but a low score from a miss? Or, should they aim for the bullseye?
Alternatively, is there some other optimal location on the board they can aim for that, whilst not the highest scoring region, has a large expanse of middle of the road point values? Would aiming for this region, even with an inaccurate shot, get a reasonable number of points such that, on average, the expected score is the highest that can be achieved?
The true answer to this riddle, as we will see, is that "it depends…"
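If you want to poke at the "it depends" for yourself, a rough Monte Carlo sketch in Python is below. This is not the linked article's method, just one simple way to estimate an expected score. It assumes standard board dimensions in millimetres, and it assumes a throw lands at the aim point plus independent Gaussian noise of standard deviation sigma in each direction; the function names are mine.

```python
import math
import random

# Sector values clockwise, starting with the 20 at the top of the board.
SECTORS = [20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5]


def score(x, y):
    """Score for a dart landing at (x, y) mm from the centre, with y pointing up."""
    r = math.hypot(x, y)
    if r <= 6.35:
        return 50  # inner bull
    if r <= 15.9:
        return 25  # outer bull
    if r > 170:
        return 0  # off the scoring area
    # Angle measured clockwise from the top; each sector spans 18 degrees,
    # and the 20 is centred on the vertical, hence the 9 degree offset.
    angle = math.degrees(math.atan2(x, y)) % 360
    sector = SECTORS[int(((angle + 9) % 360) // 18)]
    if 99 <= r <= 107:
        return 3 * sector  # treble ring
    if 162 <= r <= 170:
        return 2 * sector  # double ring
    return sector


def expected_score(aim_x, aim_y, sigma, throws=20000, rng=None):
    """Estimate the mean score when aiming at (aim_x, aim_y) with Gaussian scatter."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(throws):
        total += score(rng.gauss(aim_x, sigma), rng.gauss(aim_y, sigma))
    return total / throws


if __name__ == "__main__":
    for sigma in (5, 20, 60):
        treble_20 = expected_score(0, 103, sigma)
        bullseye = expected_score(0, 0, sigma)
        print(f"sigma={sigma}mm  treble 20: {treble_20:.1f}  bullseye: {bullseye:.1f}")
```

Comparing just two aim points is the lazy version. Sweeping expected_score over a grid of aim points for a given sigma would show where the best spot moves to as the thrower gets wobblier.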
[Via Flowing Data]
Give us another half decade or so of smartphone market penetration and this'll be a solved problem, at least in the 'developed' world. The police will just grab copies of the logs from the mobile phone masts adjacent to the meeting site and count up the number of different devices that tried to access them during the course of the demo/rally/parade.
OK, so strictly speaking they'll be counting mobile phones rather than people, but I bet it'll still produce a count well within the 10% margin of error researchers currently hope to achieve using statistical methods.
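The counting itself would be the easy part. A minimal Python sketch, assuming (and this is purely my invention, not any real operator's log format) that each log record boils down to a timestamp, a mast ID and some device identifier:

```python
from datetime import datetime


def count_devices(log_records, start, end):
    """Count distinct devices seen by any mast between start and end (inclusive).

    log_records is an iterable of (timestamp, mast_id, device_id) tuples;
    that layout is a made-up stand-in for whatever the masts really record.
    """
    seen = set()
    for timestamp, _mast_id, device_id in log_records:
        if start <= timestamp <= end:
            seen.add(device_id)  # a set ignores repeat sightings of one device
    return len(seen)


if __name__ == "__main__":
    records = [
        (datetime(2011, 3, 26, 12, 5), "mast-A", "device-1"),
        (datetime(2011, 3, 26, 12, 7), "mast-B", "device-1"),  # same phone, next mast
        (datetime(2011, 3, 26, 13, 0), "mast-A", "device-2"),
        (datetime(2011, 3, 26, 19, 30), "mast-A", "device-3"),  # after the event
    ]
    print(count_devices(records, datetime(2011, 3, 26, 12, 0), datetime(2011, 3, 26, 16, 0)))
```

Using a set means a phone that hops between masts, or reconnects a dozen times, still only counts once.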
[Via The Morning News]
Marc Tracy on attending the 2011 MIT Sloan Sports Analytics Conference:
There was an unusually small room at the northwest end of the hall devoted to the authors of research papers. Here are the names of some of the papers: "Paired Pitching: The Welcomed Death of the Starting Pitcher"; "Optimizing an NBA Team's Approach to Free Agency Using a Multiple Choice Knapsack Model"; "A Groovy Kind of Golf Club: The Impact of Grooves Rule Changes in 2010 on the PGA Tour"; "An Improved Adjusted Plus-Minus Statistic for NHL Players"; "A Major League Baseball Swing Quality Metric"; "The Effects of Altitude on Soccer Match Outcomes." There is no way David Foster Wallace did not come up with at least one of those titles.
Readers on this side of the pond shouldn't be put off by the fact that the article is almost entirely about American sports; ultimately, Tracy isn't so much writing about sports as he is about geeks who love (analysing) sports. The geekery is the thing.
The New York Times has come up with Project Cascade, a program that takes referrer analysis to a whole new level. Both pretty and useful.
I look forward to someone cloning this and building a WordPress plugin to put this sort of analysis slap bang in the middle of site administrators' Dashboards.
[Via Flowing Data]
In the past few years I've been contacted by representatives at some major companies sniffing around for acquisitions, but when they find out the moderation and maintenance of the community is all done by hand and we have to talk to thousands of people like grown-ups instead of some amazing whiz-bang python script doing it all for us, the conversation ends and we go back to work.
David McCandless has been looking at the translated editions of his book, Information is Beautiful:
The Finnish publishers called their version 'Tieto On Kaunista'. This, I believe, translates as 'Rainbow Information Icicles Pierce The Bubble Of Your Mind'. (maybe)
According to Google Translate it doesn't mean anything of the sort, but IMHO it ought to: 'Rainbow Information Icicles Pierce The Bubble Of Your Mind' is a fantastic title. Perhaps McCandless could use it for his next book instead.
There are lies, damned lies, and automated sensor readings:
A RECORDED downturn in [Adelaide's] Central Market shoppers that had been attributed to the global financial crisis has now been blamed on a faulty doorway sensor system.
The council and traders have been in a panic over the past year over a sharp downturn in visitor figures and fine-tuned advertising campaigns to attract shoppers.
A council report obtained by The Advertiser has found faulty sensors caused the dramatic drop in recorded visitors and ACC has now been forced to review at least a year of data.
The council's best estimate is that the drop in actual visitor numbers over the past year is less than 1 per cent, compared with about 10 per cent previously believed. […]
[Via RISKS Digest]
Saint-making has been a major activity of the Catholic Church for centuries. The pace of sanctifications has picked up noticeably in the last several decades under the last two popes, John Paul II and Benedict XVI. Our goal is to apply social-science reasoning to understand the Church's choices on numbers and characteristics of saints, gauged by location and socio-economic attributes of the persons designated as blessed. […]
[Via The Browser]
A billion here, a billion there … pretty soon you're talking real money.
[Via Word Magazine newsletter #76]
A timeline of global media scare stories. Judging by that chart, we're living in the Decade of Diseases.
I wonder why the chart doesn't show the numbers for stories about 'terrorism'. Isn't that truly the 'scare story' of the last decade by a country mile?