Census 2011

June 29th, 2014

DataShine: Census provides a simple, map-based view of the UK's 2011 census data. I could browse this thing for hours…

The DataShine mapping platform is an output from an ESRC Future Research Leaders Project entitled "Big Open Data: Mining and Synthesis". The overall project seeks to promote and develop the use of large and open datasets amongst the social science community. A key part of this initiative is the visualisation of these data in new and informative ways to inspire new uses and generate insights. Phase one has been to create the mapping platform with data from the 2011 Census. The next phases will work on important issues such as representing the uncertainty inherent in many population datasets and also developing tools that will enable the synthesis of data across multiple sources.

[Via Flowing Data]

9% of Brits think that pop music is better now than it was 20 years ago

May 14th, 2014

15 weird things that 9% of Britons say they believe:

If Labour are having a tough time in the polls, the Lib Dems are facing a European wipe out.

The latest YouGov figures on how people are intending to vote in the European Elections put Lib Dem support at 9%. Our friends at UsVsTh3m noticed this was significantly lower than the number of people who would be prepared to have sex with an android.

We wondered what other things more than 9% of the British public believe, would be prepared to do, or have done…

[...]

10. Eat testicles
Not just the preserve of Bushtucker Trials in I'm A Celeb, 9% of people in the UK said they would be prepared to eat animal testicles. Remember, that's the same amount of people who say they'll vote Lib Dem.

Gloating? Perhaps. But it's a welcome distraction from contemplating UKIP's polling numbers.

[Via LinkMachineGo!]

Gallery Analytics

December 1st, 2013

Jonas Lund's Gallery Analytics brings WiFi-based tracking to the cultural sector:

Gallery Analytics Realtime view

Lund's Gallery Analytics project is a site-specific installation for exhibitions that's able to generate data about behavior of visitors and present this data in a Google Analytics-like environment. By setting up a mesh Wi-Fi network and combining it with custom-made software, Gallery Analytics is able to track every Wi-Fi-enabled device (such as a smartphone) moving around in the area in real-time. [...]

I can see how with a long-term exhibit you might want to tinker with the layout if analysis reveals that visitors are tending to overlook a particular piece, or perhaps even to swap out a piece that people aren't paying attention to for something that might attract more interest, but if you have a short-term exhibit will you accumulate enough data to draw firm conclusions about what is and isn't working before it moves on?[1] Also, if you're a museum that hosts visits by groups[2] you might find that a group of students being led through on a tour of your exhibits will end up distorting your stats a bit. What you really need is a real-life equivalent of the Referrer field to help you distinguish between a group being led around and individual, self-directed visitors.[3]

All in all, this could be a heck of a tool for museum and gallery operators, so long as they don't go nuts and start assuming that the data is the whole story.

Now, for extra credit, consider your local shopping mall or town centre doing all of the above. Is that better, or worse, or no different? Please justify your answer.

[Via Extenuating Circumstances]

  1. Imagine trying to use Google Analytics on a web site where one or more sections of the site is completely repurposed every X weeks. Not just tweaking the colour scheme and fonts, but actually ripping out the content that used to be book reviews and replacing it with knitting patterns, and organising the content by colour one month and by country of origin the next.
  2. e.g. schoolkids. It's been a long time since I was one, but I assume that occasionally classes of schoolchildren still get taken on a visit to their local museum/art gallery.
  3. But then, if a group of a couple of dozen distorts your stats that much then you're getting so few visitors that perhaps you're not going to be open much longer so this software isn't going to have time to help!

Thank the Academy

March 21st, 2013

Thank the Academy: A visualization of how Oscar winners express gratitude.

Thanked at length...

I have to be honest: that's one of the more straightforward charts on the site. The interactive charts that really let you slice and dice the data are where the action is, but they can't be properly represented by a dinky screenshot over here.

You really should go and have a play for yourself. You can view the differences in the content of speeches and even the behaviour of the recipient, then view the differences between eras, or between different classes of award winner. It's a very well done site.

I'd love to see someone apply these same techniques and style of presentation to another corpus taken from an annual event with a bit of a history to it. Say, Budget speeches by Chancellors of the Exchequer over the last 50 years, or party leaders' speeches to their party conference. Granted, you couldn't do much with an analysis by gender of either of those data sets – what with any analysis by gender of the relevant UK data sets having Margaret Thatcher on one side of the stats and generation after generation of middle-aged men on the other – but there would be all sorts of illuminating ways to break the data down.

One thing I can confidently predict: those sorts of data sets would provide fewer opportunities to tally the number of speakers who burst into tears during their speech. Also, some poor devil would have to sit through recordings of each speech taking detailed notes, and I'm pretty sure that'd be a lot less entertaining – a lot less glamorous, certainly – than watching 50 years of excerpts from the Oscars.

Getting back to the more glamorous data set, the site will even tell you who has been thanked by name most frequently in acceptance speeches by directors, leading and supporting actors and actresses.[1]

[Via Flowing Data]

  1. I'll give you a clue. He's still active in the industry today. As is his brother.

Big data

February 17th, 2013

Big:

In the future, you have access to all your data. Memory, or the lack thereof, is no longer discussed. It is only assumed, a feature of modern life, since you can now relive all your past data as experiences. But because of "technical constraints," all of your experiences are taxonomized and merged for ease of efficiency/retrieval. To access your past, then, is to relive each experience – in real time, all at once.

You begin:

You spend seven weeks holding your iPhone to your ear on hold.
You pull to refresh for seven months, click to refresh for nine.
You miss 30 Thanksgiving dinners restarting your laptop.
12 Valentine's Days restarting your iPhone.
You swipe past iPad ads for 48 hours before ever seeing content.
[...]

Poison(ed)

February 1st, 2013

Being a statistician, a feminist and a fan of the outgoing US Secretary of State, Hilary Parker couldn't resist investigating whether it's true that the name Hilary/Hillary is the most poisoned baby name in US history.

A lot of screen-scraping and many R sessions later, she shares her conclusions and reasoning with the rest of us. As a bonus, she explores fascinating side issues, like the reasons why some names saw short-lived leaps in popularity:

For each of the names that "dropped in" I did a little research on the name and the year. "Dewey" popped up in 1898 because of the Spanish-American War – people named their daughters after George Dewey. "Deneen" was one name of a duo with a one-hit wonder in 1968. "Katina" and "Catina" were wildly popular because in 1972 in the soap opera Where the Heart Is a character is born named Katina. "Farrah" became popular in 1976 when Charlie's Angels, starring Farrah Fawcett, debuted (notice that the name becomes popular in 2009 when Farrah Fawcett died).

I couldn't resist doing a quick-and-dirty search across the data files on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number.[1] It appears that in 2008 and 2009 the name 'Barack' saw a tenfold rise in popularity compared to 2007 (albeit from a small base):

Screen shot

Interestingly, no hits came up for the name 'Barack' in the files representing years prior to 2007. Could it be that I've uncovered evidence, from data files supplied by his very own Administration, that Obama wasn't born on American soil after all?[2]
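
For anyone wanting to repeat that sort of quick-and-dirty search, here is a minimal Python sketch. It assumes each year's file holds one `Name,Sex,Count` row per line; the function name `count_name_by_year` and the sample rows are invented for illustration and are not real figures. A year in which the name does not appear comes back as `None` rather than 0, since "not listed in the file" is not necessarily the same thing as "nobody was given that name".

```python
from __future__ import annotations

from collections.abc import Iterable


def count_name_by_year(files: dict[int, Iterable[str]], name: str) -> dict[int, int | None]:
    """Total births recorded for `name` in each year's file.

    Each line is assumed to look like "Name,Sex,Count". Counts for the
    same name under both sexes are added together. A year where the name
    is not listed maps to None, not 0.
    """
    wanted = name.casefold()
    result: dict[int, int | None] = {}
    for year, lines in sorted(files.items()):
        total = None
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            listed_name, _sex, count = line.split(",")
            if listed_name.casefold() == wanted:
                total = (total or 0) + int(count)
        result[year] = total
    return result


# Invented sample rows (NOT real figures), one list of lines per year.
sample = {
    2006: ["Mary,F,4000", "John,M,15000"],
    2007: ["Mary,F,3900", "Zaphod,M,5"],
    2008: ["Zaphod,M,50", "Zaphod,F,6", "Mary,F,3800"],
}
print(count_name_by_year(sample, "Zaphod"))
```

With real files you would pass in open file handles instead of lists, one per year.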

[Via Waxy.org: Links Miniblog]

  1. Downloadable here.
  2. You'll be shocked to hear that the answer to that question is a resounding No. To quote from the documentation that accompanied the data files: "To safeguard privacy, we restrict our list of names to those with at least 5 occurrences." So 2007 – when Obama was a first-term US Senator – was merely the first year the name was given to as many as 5 children.

His #1 11-year-old fan

November 12th, 2012

Such was the extent of his triumph last week that even 11-year-old girls are crushing on Nate Silver:

I wonder if when you get up in the morning you open your kitchen cabinet and go, I'm feeling 18.5% Rice Chex and 27.9% Frosted Mini-Wheats and 32% one of those whole-grain Kashi cereals which have photos of smiling multicultural people on the boxes, as if smiling multicultural people were a new form of fibre. And then I wonder if you think, But I'm really feeling 58.3% like having a cupcake for breakfast, but then your mom says, "I don't care if you're a fancy statistician with a Times blog and Seattle green-architect eyeglass frames, you still need something heart-healthy to start your day," but then you tell her, "Mom, if you keep nagging me I will never let you meet my new boyfriend, Matt Bomer."

See, I think that because you predicted the election with near-100% accuracy Matt Bomer is way more likely to go out with you than with Dick Morris, who predicted a Romney landslide, or with Karl Rove, who kept predicting that Ohio was still in play a week after the election was over. In fact, right now I bet that you could get anyone to go out with you just by saying something like "I predicted Florida, North Carolina, and Illinois, and now I'm predicting that you'll have dinner with me."

[Via kottke.org]

The Body Counter

March 15th, 2012

The Body Counter, or, What Statistical Analysis Can Teach Us about Atrocities…

Traditionally, human rights work has been more akin to investigative reporting, but [Patrick] Ball is the most influential of a handful of people around the world who see that world not in terms of words, but of figures. His specialty is applying quantitative analysis to mountains of anecdotes, finding the correlations that coax out a story that cannot easily be dismissed.

[...]

In testifying [during the trial of Slobodan Milosevic], Ball was doing something other human rights workers can only fantasize about: He confronted the accused, presented him with evidence, and watched him being held to account. At that point, Milosevic in his four wars had killed some 125,000 people, more than anyone in Europe since Stalin. But now the Butcher of the Balkans sat in a courtroom that looked rather like a community college classroom, with two Dutch police officers behind him and his cell waiting for him at the end of each day's session, rhetorical bluster his only available weapon against Ball's evidence.

Milosevic died before the trial ended. Ball returned to Washington and then went on to Lima to work for Peru's Truth and Reconciliation Commission — one of dozens of truth commissions, tribunals, and investigatory bodies where his methods have changed our understanding of war. [...]

One-Hundred-and-Eighty!

January 21st, 2012

A geek plays darts:

Where is the best place for a player to aim, knowing the inaccuracies of throwing?

What is the optimal strategy? Where should a player aim in order to obtain the highest expected outcome over many throws?

Should they aim for the triple 20, with a big payout on a success, but a low score from a miss? Or, should they aim for the bullseye?

Alternatively, is there some other optimal location on the board they can aim for that, whilst not the highest scoring region, has a large expanse of middle of the road point values? Would aiming for this region, even with an inaccurate shot, get a reasonable number of points such that, on average, the expected score is the highest that can be achieved?

The true answer to this riddle, as we will see, is that "it depends…"
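
To get a feel for why "it depends", here is a rough Monte Carlo sketch in Python. It is not the linked article's method, just one way to estimate the expected score for a given aim point. The assumptions are mine: throws scatter around the aim point as independent Gaussians in x and y with the same standard deviation `sigma` (in millimetres), and the ring radii are the commonly quoted dimensions for a standard board. The function names are made up for this example.

```python
import math
import random

# Sector numbers clockwise, starting from the top of the board.
SECTORS = [20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5]

# Ring radii in millimetres from the centre (commonly quoted dimensions).
BULL, OUTER_BULL = 6.35, 15.9
TREBLE_INNER, TREBLE_OUTER = 99.0, 107.0
DOUBLE_INNER, DOUBLE_OUTER = 162.0, 170.0


def score(x: float, y: float) -> int:
    """Score for a dart landing at (x, y) mm, with +y towards the 20."""
    r = math.hypot(x, y)
    if r <= BULL:
        return 50
    if r <= OUTER_BULL:
        return 25
    if r > DOUBLE_OUTER:
        return 0  # missed the scoring area
    # Angle measured clockwise from straight up; each sector spans 18 degrees
    # and the 20 is centred on the vertical, hence the 9 degree offset.
    angle = math.degrees(math.atan2(x, y))
    sector = SECTORS[int(((angle + 9.0) % 360.0) // 18.0) % 20]
    if TREBLE_INNER <= r <= TREBLE_OUTER:
        return 3 * sector
    if DOUBLE_INNER <= r <= DOUBLE_OUTER:
        return 2 * sector
    return sector


def expected_score(aim_x: float, aim_y: float, sigma: float,
                   throws: int = 20000, seed: int = 0) -> float:
    """Average score over simulated throws scattered around the aim point."""
    rng = random.Random(seed)
    total = 0
    for _ in range(throws):
        total += score(rng.gauss(aim_x, sigma), rng.gauss(aim_y, sigma))
    return total / throws


# Compare aiming at the treble 20 with aiming at the bullseye
# for a steady, an average and a wild thrower (sigma values are arbitrary).
for sigma in (5.0, 25.0, 60.0):
    t20 = expected_score(0.0, 103.0, sigma)
    bull = expected_score(0.0, 0.0, sigma)
    print(f"sigma={sigma:>4} mm  treble 20: {t20:5.1f}  bull: {bull:5.1f}")
```

Swapping in other aim points, or a grid of them, turns this into a crude search for the best place to aim at each level of accuracy.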

[Via Flowing Data]

50%?

October 30th, 2011

Best statistics question ever?

Everybody Lies

September 13th, 2011

The Curious Science of Counting a Crowd.

Give us another half decade or so of smartphone market penetration and this'll be a solved problem, at least in the 'developed' world. The police will just grab copies of the logs from the mobile phone masts adjacent to the meeting site and count up the number of different devices that tried to access them during the course of the demo/rally/parade.[1]

OK, so strictly speaking they'll be counting mobile phones rather than people, but I bet it'll still produce a count well within the 10% margin of error researchers currently hope to achieve using statistical methods.
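
The counting itself would be the easy part. A minimal Python sketch, assuming the logs boil down to rows of (mast ID, device ID, timestamp); that layout, the function name and the sample rows are all invented for illustration, not any real operator's format:

```python
from datetime import datetime


def count_devices(log_rows, start, end):
    """Count distinct device IDs seen by any mast between start and end.

    A device that talks to several masts, or to the same mast many times,
    is still counted once.
    """
    seen = set()
    for _mast_id, device_id, seen_at in log_rows:
        if start <= seen_at <= end:
            seen.add(device_id)
    return len(seen)


# Invented sample rows: (mast ID, device ID, time the device was seen).
rows = [
    ("mast-A", "dev-1", datetime(2011, 9, 13, 12, 5)),
    ("mast-B", "dev-1", datetime(2011, 9, 13, 12, 40)),  # same phone, second mast
    ("mast-A", "dev-2", datetime(2011, 9, 13, 13, 15)),
    ("mast-B", "dev-3", datetime(2011, 9, 13, 18, 30)),  # after the rally ended
]
rally_start = datetime(2011, 9, 13, 12, 0)
rally_end = datetime(2011, 9, 13, 15, 0)
print(count_devices(rows, rally_start, rally_end))  # prints 2
```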

[Via The Morning News]

  1. For bonus points, by that time smartphone suppliers will be under a legal obligation to make their devices pass on their GPS coordinates whenever they try to talk to a mobile phone base station, so as to make it easier for the police to distinguish between those situated inside the park and those 50 yards outside, just passing by. What do you mean 'It's no business of the government where I'm standing'? If you've got nothing to hide then you've got nothing to worry about, have you? Well, have you?

The Joy of Stats

May 1st, 2011

Marc Tracy on attending the 2011 MIT Sloan Sports Analytics Conference:

There was an unusually small room at the northwest end of the hall devoted to the authors of research papers. Here are the names of some of the papers: "Paired Pitching: The Welcomed Death of the Starting Pitcher"; "Optimizing an NBA Team's Approach to Free Agency Using a Multiple Choice Knapsack Model"; "A Groovy Kind of Golf Club: The Impact of Grooves Rule Changes in 2010 on the PGA Tour"; "An Improved Adjusted Plus-Minus Statistic for NHL Players"; "A Major League Baseball Swing Quality Metric"; "The Effects of Altitude on Soccer Match Outcomes." There is no way David Foster Wallace did not come up with at least one of those titles.

Readers on this side of the pond shouldn't be put off by the fact that the article is almost entirely about American sports; ultimately, Tracy isn't so much writing about sports as he is about geeks who love (analysing) sports. The geekery is the thing.

[Via Give Me Something To Read]

Think of it as the web analytics equivalent of the Total Perspective Vortex

April 25th, 2011

The New York Times has come up with Project Cascade, a program that takes referrer analysis to a whole new level. Both pretty and useful.

I look forward to someone cloning this and building a WordPress plugin to put this sort of analysis slap bang in the middle of site administrators' Dashboards.[1]

[Via Flowing Data]

  1. On second thoughts, perhaps not. The post title explains why.

MeFi 2010

January 5th, 2011

The year 2010 at MetaFilter Infographic.

Two thoughts:

  1. I'm astonished at the proportion of the site's traffic that goes to AskMeFi. It's not a part of the site I ever visit.
  2. A nice anecdote from Matt Haughey's post about compiling these stats:

    In the past few years I've been contacted by representatives at some major companies sniffing around for acquisitions, but when they find out the moderation and maintenance of the community is all done by hand and we have to talk to thousands of people like grown-ups instead of some amazing whiz-bang python script doing it all for us, the conversation ends and we go back to work.

'Information' is translated

December 26th, 2010

David McCandless has been looking at the translated editions of his book, Information is Beautiful:

The Finnish publishers called their version 'Tieto On Kaunista'. This, I believe, translates as 'Rainbow Information Icicles Pierce The Bubble Of Your Mind'. (maybe)

According to Google Translate it doesn't mean anything of the sort, but IMHO it ought to: 'Rainbow Information Icicles Pierce The Bubble Of Your Mind' is a fantastic title. Perhaps McCandless could use it for his next book instead.

Blinded by technology

May 5th, 2010

There are lies, damned lies, and automated sensor readings:

A RECORDED downturn in [Adelaide's] Central Market shoppers that had been attributed to the global financial crisis has now been blamed on a faulty doorway sensor system.

The council and traders have been in a panic over the past year over a sharp downturn in visitor figures and fine-tuned advertising campaigns to attract shoppers.

A council report obtained by The Advertiser has found faulty sensors caused the dramatic drop in recorded visitors and ACC has now been forced to review at least a year of data.

[...]

The council's best estimate is that the drop in actual visitor numbers over the past year is less than 1 per cent, compared with about 10 per cent previously believed. [...]

[Via RISKS Digest]

Economics of Sainthood

March 14th, 2010

Economics of Sainthood (a preliminary investigation):

1. Introduction

Saint-making has been a major activity of the Catholic Church for centuries. The pace of sanctifications has picked up noticeably in the last several decades under the last two popes, John Paul II and Benedict XVI. Our goal is to apply social-science reasoning to understand the Church's choices on numbers and characteristics of saints, gauged by location and socio-economic attributes of the persons designated as blessed. [...]

[Via The Browser]

Putting it all into perspective

August 20th, 2009

A billion here, a billion there … pretty soon you're talking real money.

[Via Word Magazine newsletter #76]

Scare stories

August 15th, 2009

A timeline of global media scare stories. Judging by that chart, we're living in the Decade of Diseases.

I wonder why the chart doesn't show the numbers for stories about 'terrorism'.[1] Isn't that truly the 'scare story' of the last decade by a country mile?

[Via Waxy.org]

  1. Be that articles about actual terrorist acts, or stories using the prospect of terrorist acts as a justification for whatever course of action the author advocates.

Statistical Feelings

August 5th, 2009

In the run-up to the 60th anniversary of the founding of the People's Republic of China, the Chinese National Bureau of Statistics has invited staff to write pieces celebrating the anniversary. One statistician submitted this paean to the power of numbers:

Life

Some mock me for doing statistics
Some loathe me and statistics
Some don't understand what statistics are

Why is it that statistics
Put a calm smile on my face?

Because of statistics
I can solve the deepest mysteries

Because of statistics
I will not be lonely again, playing in the data

Because of statistics
I can rearrange the stars in the skies above

Because of statistics
My life is different, more meaningful

I love my life, my statistics

A little corner of my Excel[1] geek's soul stirred when I read that. Sad, but true.

  1. Yes, I know: Excel is a horrible tool for statistical work. But sometimes it's the only tool you've got…
