Sunday, October 28, 2012

Conference Tools for Twitter

Recent experience of following (and contributing to) the Twitter stream at the annual meeting of the American Sociological Association inspired a number of ideas (not all original, I'm sure, and some probably already implemented) in connection with Twitter and conferences:
  1. Conference organizers should devise and disseminate a simple hashtag scheme for sessions/presentations.
  2. Set up scheduled tweets that announce sessions, say, 15 minutes ahead of time.  Each tweet can contain a link to a web page with detailed information about the session.
  3. Presenters can submit a brief (say, 5- to 10-tweet) summary of the points they are making, and these can be automatically scheduled to go out during the talk.
  4. If talks are being live streamed, the start of each talk can be marked by a tweet with a URL to the stream.  Major-point tweets could be synced to locations in the recorded video/audio.
  5. Develop an app that aggregates and archives live tweets of presentations for live and followup discussion.
What would you add?
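The scheduling part of items 2 and 3 is easy to sketch. Here is a minimal Python example of computing when each announcement tweet should fire; the session data, the `#asa2012` conference tag, and the `#s<id>` session-hashtag scheme are all hypothetical assumptions, not a real conference program or API.

```python
from datetime import datetime, timedelta

# Hypothetical session records -- in practice these would come from the
# conference program database.
SESSIONS = [
    {"id": "101", "title": "Sociology of Information", "start": datetime(2012, 8, 17, 10, 30)},
    {"id": "214", "title": "Big Data and Society", "start": datetime(2012, 8, 17, 14, 30)},
]

def announcement_tweets(sessions, conference_tag="#asa2012", lead=timedelta(minutes=15)):
    """Build (send_time, text) pairs announcing each session ahead of time."""
    tweets = []
    for s in sessions:
        text = (f"Starting at {s['start']:%H:%M}: {s['title']} "
                f"{conference_tag} #s{s['id']}")
        tweets.append((s["start"] - lead, text))
    return tweets

for when, text in announcement_tweets(SESSIONS):
    print(when.strftime("%H:%M"), "->", text)
```

A real deployment would hand these pairs to a tweet scheduler; the point is only that the schedule itself is a few lines of arithmetic over the program data.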

Sociology of Information Nuggets


Elections and a three-course semester have crowded out blogging over the last few months.  And so, the blogger's cop-out of pointers to some recent interesting reads:

Maya Alexandri has a fun post, "What Thomas Cromwell had in common with the Dewey decimal system," calling attention to the theme of information revolutions noted in Joan Acocella's review of Hilary Mantel's Wolf Hall in The New Yorker.  Alexandri and Acocella note the interesting similarity between Cromwell's time and our own as eras in which "information is being radically reorganized."  It's precisely the desire to clarify such recapitulations that drives my own work on the sociology of information.


Semil Shah offers a panegyric post about Timehop, an app that automatically sends you a photo of what you were doing a year ago today.  It purports, among other things, to be a "solution" to the problem of having boxes of memories that you either never find the time to look at or into which you unintentionally dump hour after hour of time you don't have.  Shah's optimistic take:
The carousel of old slides, the cigar box of warped pictures, and the Instagrams you’ve taken, now in your pocket, delivered to you in just the right way.
There are some great research questions swirling around issues like personal memory, artifacts, the externalization and automation of recall, and search as ever-ready reconstruction of the past.  Stay tuned.

Friday, April 20, 2012

Scoops in Journalism and Everyday Life


Jay Rosen has a post today titled "Four Types of Scoops" that will surely make it into my sociology of information book.  The first is the "enterprise scoop," where the reporter gets the scoop by doing the "finding out."  The information may be deliberately hidden or obscured by routine practice, but it would not have become known to the public without the work of the reporter.  Then there is its opposite, the "ego scoop": the news would have come out anyway, but the scooper gets (or provokes) a tip or its equivalent.  The third type Rosen calls the "trader's" scoop, where early info has instrumental value -- as in a stock tip.  Finally there is the "thought scoop."  This is when the writer puts two and two together or otherwise "connects the dots" to, as he says, "apprehend--name and frame--something that's happening out there before anyone else recognizes it."

The information order of everyday life is conditioned by information exchanges that might be similarly categorized.  But even before that we'd notice exchanges that are NOT experienced as scoops at all.  I think there are two extremes: information passed on bucket-brigade style, with no claim at all to having generated it or deserving any credit for its content or transmission ("Hey, they've run out of eggs, pass it on, eh?"), and statements of a truly personal nature ("I'm not feeling well today") that do not reflect one's position, location, or worth in the world.

Between these there are all manner of instances in which people play the scoop game in everyday interaction.  The difference between an ordinary person and a reporter in this regard is that the reporter's scoop is vis-à-vis "the rest of the media," while the scoopness of the ordinary person's scoop is centered in the information ecology of the recipient.  We have all met the inveterate ego scooper who moves from other to other to other, trying to stay one step ahead of the diffusing information so that s/he can deliver the "scoop" over and over.  And the enterprising gossip who pries information loose from friends and acquaintances and is always ready with the latest tidbit.  In everyday interaction the wielder of the trader's scoop often generates the necessary arbitrage because others are willing to "pay" for information they can use as ego scoops.  Alas, as in the media, the thought scoop is probably the rarest form in everyday life too.  It's probably less self-conscious in everyday interaction and more ephemeral too, which is too bad: those conversational insights are probably more often lost than their counterparts in "print."

Sunday, April 08, 2012

Tomorrow's Social Science Today? By Techies?

If you generate the data, the analysts will come.  And more and more of the technologies of everyday life generate data, lots of it. "Big data" takes big tools and big tools cost big bucks.  The science of big data is mostly social science but, for the most part, it's not being done by social scientists.  What's left out when social scientists leave themselves out of the conversation? And what happens to the funding for non-big-data social science when resource-hungry projects like this emerge?  And what will be the effect on the epistemological status of non-big-data social science?

from the New York Times...
THE BAY CITIZEN
Berkeley Group Digs In to Challenge of Making Sense of All That Data


It comes in “torrents” and “floods” and threatens to “engulf” everything that stands in its path.

No, it is not a tsunami, it is Big Data, the incomprehensibly large amount of raw, often real-time data that keeps piling up faster and faster from scientific research, social media, smartphones — virtually any activity that leaves a digital trace.

The sheer size of the pile (measured in petabytes, one million gigabytes, or even exabytes, one billion gigabytes) combined with its complexity has threatened to overwhelm just about everybody, including the scientists who specialize in wrangling it. “It’s easier to collect data,” said Michael Franklin, a professor of computer science at the University of California, Berkeley, “and harder to make sense of it.”

Friday, March 02, 2012

Is There a Right to Data Collection?

What's more socially harmful: politicians not knowing what sound bite will play well, or voters being misled by scurrilous misinformation?

New Hampshire is one state where legislators listened when voters complained about "push-polling" -- the practice of making campaign calls that masquerade as surveys or polls.  Perhaps the most infamous example is George Bush's campaign calling South Carolinians to ask what they would think if John McCain had fathered an illegitimate black baby.

The gist of M. D. Shear's article, "Law Has Polling Firms Leery of Work in New Hampshire" (NYT, 1 March 2012), is that pollsters and political consultants are whining that "legitimate" operations are getting gun-shy about polling in New Hampshire for fear of being fined.  Actual surveys won't get done, they suggest, because poorly worded legislation creates too much legal liability for legitimate work.

They do not take issue with what the law requires, and some even call it well-intentioned. Paragraph 16a of section 664 of Title 53 of the New Hampshire statutes requires those who administer push-polls to identify themselves as doing so on behalf of a candidate or issue. In other words, if that's what you are up to, you need to say so.

The problem, they say, is that the law is poorly written -- good intentions gone bad, they suggest.  So, what does the statute actually say?  Not so ambiguous, really.  It says if you call pretending to be taking a survey but really you are spreading information about opposition candidates then you are push-polling:

XVII. "Push-polling" means:
  1. Calling voters on behalf of, in support of, or in opposition to, any candidate for public office by telephone; and
  2. Asking questions related to opposing candidates for public office which state, imply, or convey information about the candidates' character, status, or political stance or record; and
  3. Conducting such calling in a manner which is likely to be construed by the voter to be a survey or poll to gather statistical data for entities or organizations which are acting independent of any particular political party, candidate, or interest group.
And so, the question arises: why aren't pollsters themselves taking steps to stamp out the practice?  One supposes the answer is that they still want to use it, even if the "good guys" would not stoop to the level of sleaziness that Bush and Lee Atwater practiced.

Interestingly, one of the objections the pollsters raised was that "complying with the law by announcing the candidate sponsoring the poll would corrupt the data being gathered."  It's interesting because they apparently don't think the data are corrupted by constantly adjusted question wordings and techniques that are technically push-polling even when they stay inside the New Hampshire law.

But this brings me to my real point.  As a practicing social scientist I am consistently disheartened and often angered at the abuse of survey research engaged in by political parties and organizations.   I receive "surveys" from the DNC, DCCC, Greenpeace, Sierra Club, MoveOn.org, etc. etc. that triply insult me:

  • They are, in fact, often push-polls (if gentle ones) whose real purpose is to inform and incite, not to collect data.
  • They are couched disingenuously in terms of providing me an opportunity for input, to have my voice heard.
  • As research instruments they are almost always C- or worse, violating the most basic tenets of survey construction.

Perhaps I should just humor them and wink, since we both know what's really going on.  Sometimes the political actor in me is content to do so.  But at other times the information-order pollution they represent really gets to me.  These things corrupt the data of other, legitimate research efforts.  If the results are used, they amplify the error in the information order.  They undermine social information trust.  They cheapen the very idea of opinion research.  Imagine if a certain amount of what passes for clinical trials were really just PR for pharmaceutical companies.  Or if the "high stakes testing" used to study the education system were really just a ploy to indoctrinate children.  Or if marine biologists were just sending a message to the mollusks they study.

As a consultant helping organizations do research I used to ask "are you trying to find out something or are you trying to show something?" To this we could add "or are you just putting on a show?"

There's something disturbing when an industry like political polling can't do better than suggest that the one state that has taken steps to address a real, democracy-threatening practice within that industry is somehow "the problem."  A Republican pollster whined that the law has “a harmful effect on legitimate survey research and message testing that really impairs our ability to do credible polling,” as if we should care.  It doesn't take a Ph.D. to see that a little ignorance on the part of politicians about attitudes in New Hampshire is a price well worth paying to stop a practice that corrupts public deliberation.

Saturday, February 18, 2012

Should Your Company Tell You Your Secrets?

Nice sociology-of-info two-fer in a Forbes article about Target being able to detect pregnancy based on purchases (see also "How Companies Learn Your Secrets" in the NYT). The first connection is obvious: data mining lets a company detect information "given off" by ordinary behavior. The second is the notification question. In the article the "story" is that Target outs a young woman to her dad by sending a targeted circular for maternity supplies.

So now we are in the situation where data mining companies have to interrogate their notification obligations just like doctors, lawyers, and spouses. I will work up an analysis in a subsequent post. I anticipate insights about how the corporate-ness of the knower figures into the notification-norm calculation.

Monday, February 13, 2012

Is Your Information Your Business?

The Business section is fast becoming the sociology of information section.

In "Twitter Is All in Good Fun, Until It Isn’t," David Carr writes about Roland Martin being sanctioned by CNN because of controversial Twitter posts.  On the Bits page, Nick Bilton's article "So Many Apologies, So Much Data Mining" tells of David Morin, head of the company that produces the social network app Path ("The smart journal that helps you share life with the ones you love."), which got into hot water last week when a programmer in Singapore noticed it hijacked users' address books without asking. On page B3 we find a 14-inch article by T. Vega about new research from Pew on how news media websites fail to make optimal use of online advertising.

More on those in future posts.  Right next to the Pew article, J. Brustein's "Start-Ups Seek to Help Users Put a Price on Their Personal Data" profiles the startup "Personal" -- one of several that are trying to figure out how to let internet users capitalize on their personal data by locking it up in a virtual vault and selling access bit by bit.

This last one is of particular interest to me. Back in the early 90s I floated an idea that alarmed my social science colleagues: why not let study participants own their data? The idea was inspired by complaints that well-meaning researchers at Yale, where I was a graduate student at the time, routinely made their careers on the personal information they, or someone else, had collected from poor people in New Haven. The original source of that complaint was a community activist who had a more colorful way of describing the relationship between researcher and research subject.

The idea would be to tag data garnered in surveys and other forms of observation with an ID that could be matched in an escrow database (which didn't really exist then, but is now routine "Software as a Service" (SaaS) territory). When a researcher wanted to make use of the data, she or he would include in the grant proposal some sort of data fee to be delivered to the intermediary and then distributed as data royalties to the individuals the data concerned. The original researcher would still offer whatever enticements to participation (a bit like an advance for a book). The unique identifier held by the intermediary would allow data linking, producing a valuable tool for research and an opportunity for research subjects to continue to collect royalties as their data was "covered" by new research projects, just as a songwriter does.
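The mechanics of that scheme can be sketched in a few lines. In this toy Python sketch, the participant names, the flat equal-split royalty rule, and the fee amounts are all hypothetical assumptions for illustration, not a design for a real escrow system:

```python
import uuid

class DataEscrow:
    """Toy intermediary holding the mapping from opaque record IDs to participants."""

    def __init__(self):
        self._owners = {}   # record_id -> participant_id (held only by the escrow)
        self.balances = {}  # participant_id -> accumulated royalties

    def register(self, participant_id):
        """Tag a new data record; researchers ever see only the opaque record ID."""
        record_id = str(uuid.uuid4())
        self._owners[record_id] = participant_id
        return record_id

    def distribute_fee(self, record_ids, fee):
        """Split a project's data fee equally among the owners of the records used."""
        share = fee / len(record_ids)
        for rid in record_ids:
            owner = self._owners[rid]
            self.balances[owner] = self.balances.get(owner, 0.0) + share

escrow = DataEscrow()
records = [escrow.register(p) for p in ["alice", "bob", "carol"]]
escrow.distribute_fee(records, fee=300.0)      # the original study pays its data fee
escrow.distribute_fee(records[:2], fee=100.0)  # a later study "covers" two records
print(escrow.balances)  # alice and bob: 150.0 each; carol: 100.0
```

The design point is that only the intermediary holds the link between record and person, so later projects can reuse and link data (and pay royalties) without ever learning identities.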

The most immediate objections were technical -- real but solvable. Then came the reasoned objections. This would make research more expensive! Perhaps, but another way to see it is that it would more fully account for social costs and value, and recognize that researchers were capturing latent value in their act of aggregation (similar to issues raised about Facebook recently). Another objection was that the purpose of the research was already to help these people. True enough. But why should they bear all the risk of that maybe working out, maybe not?

And so the conversation continued. I'm not sure I like the idea of converting personal information into monetary value; I think it sidesteps some important human/social/cultural considerations about privacy, intimacy, and the ways that information behavior is integral to our sense of self and sense of relationships. But I do think it is critically important that we think carefully about the information order, how the value of information is created by surveillance and aggregation, and what we want to happen to the information we give, give up, and give off.


Sunday, January 22, 2012

Prying Information Loose and Dealing with Loose Information

A sociology of information triptych this morning. Disclosure laws that fail to fulfill their manifest/intended function, the secret work of parsing public information, and the pending capacity to record everything all bear on the question of the relationship between states and information.

In a 21 Jan 2012 NYT article, "I Disclose ... Nothing," Elisabeth Rosenthal (@nytrosenthal) suggests that despite increasing disclosure mandates we may not, in fact, be more informed. Among the obviating forces: information overload; a dearth of interpretive expertise; the tendency of organizations to hide behind "you were told..."; formal rules that provide organizations with a blueprint for playing with technicalities (as, she notes, Republican PACs have done, using name changes and re-registration to "reset" their disclosure-obligation clocks); routinization (as in the melodic litanies of side effects in drug adverts); and the simple fact that people are often not in a position to act on information even if it is abundantly available and unambiguous. On the other side, the article notes that there is a whole "industry" out there -- journalists, regulators, researchers -- who can data mine the disclosure information even if individuals cannot take advantage of it.

Rachel Martin's (@rachelnpr) piece on NPR's Weekend Edition Sunday, "CIA Tracks Public Information For The Private Eye," describes almost the mirror image of this: how intelligence agencies are building infrastructure for finding patterns in and making sense of the gadzillions of bits of public information that just sit there for all to see. It's another case that hints at an impossibility theorem about "connecting the dots" a priori.

And finally, in another NPR story, "Technological Innovations Help Dictators See All," Rachel Martin interviews John Villasenor about his paper, "Recording Everything: Digital Storage as an Enabler of Authoritarian Governments," on the idea that data storage has become so inexpensive that there is no reason for governments (he focuses on authoritarian ones, but there's no reason to limit it to them) not to collect everything (even if, as the first two stories remind us, they may currently lack the capacity to do anything with it). I wonder if surveillance uptake and data rot will prove to be competing tendencies.


The first piece suggests research questions: What are the variables that determine whether disclosure is "useful"? What features of disclosure rules generate cynical work-arounds? If "more is not always better," what is? Can we better theorize the relationship between "knowing," open-ness, transparency, disclosure, and democracy than we have so far?

The second piece really cries out for an essay capturing the irony of how the information pajamas get turned inside out with the spy agency trying to see what's in front of everyone (we are reminded in a perverse sort of way of Poe's "The Purloined Letter"). Perhaps we'll no longer associate going "under cover" with the CIA.

And the alarm suggested in the third piece is yet another entry under what I (and maybe others) have called the informational inversion -- when the generation, acquisition, and storage of information dominates by orders of magnitude our capacity to do anything with it.

Sunday, January 08, 2012

Journalism and Research Again

Lots of Twitter and blog activity in response to the NYT article about the Chetty, Friedman, and Rockoff research paper on the effects of teachers on students' lives.

No small amount of the commentary is about how, when journalists pick "interesting" bits out of research reports to construct a "story," they often create big distortions in the social knowledge base.

So what can reporters do when trying to explain the significance of new research, without getting trapped by a poorly-supported sound bite?

Sherman Dorn has an excellent post on the case, "When reporters use (s)extrapolation as sound bites," that ends with some advice:

  1. "If a claim could be removed from the paper without affecting the other parts, it is more likely to be a poorly-justified (s)implification/(s)extrapolation than something that connects tightly with the rest of the paper."
  2. "If a claim is several orders of magnitude larger than the data used for the paper (e.g., taking data on a few schools or a district to make claims about state policy or lifetime income), don’t just reprint it. Give readers a way to understand the likelihood of that claim being unjustified (s)extrapolation."
  3. "More generally, if a claim sounds like something from Freakonomics, hunt for a researcher who has a critical view before putting it in a story."

See also Matthew Di Carlo on ShankerBlog, Bruce Baker on School Finance 101, and Cedar Riener on Cedar's Digest.