Data doesn’t invade people’s lives. Lack of control over how it’s used does.
What’s really driving so-called Big Data isn’t the volume of information. It turns out Big Data doesn’t have to be all that big. Rather, it’s about a reconsideration of the fundamental economics of analyzing data.
For decades, there’s been a fundamental tension between three attributes of databases. You can have the data fast; you can have it big; or you can have it varied. The catch is, you can’t have all three at once.

I’d first heard this as the “three V’s of data”: Volume, Variety, and Velocity. Traditionally, getting two was easy but getting three was very, very, very expensive.
The advent of cloud computing, platforms like Hadoop, and the inexorable march of Moore’s Law mean that analyzing data is now trivially inexpensive. And when things become so cheap that they’re practically free, big changes happen. Just look at steam power, the copying of digital music, or the rise of home printing. Abundance replaces scarcity, and we invent new business models.
In the old, data-is-scarce model, companies had to decide what to collect first, and then collect it. A traditional enterprise data warehouse might have tracked sales of widgets by color, region, and size. This act of deciding what to store and how to store it is called designing the schema, and in many ways, it’s the moment where someone decides what the data is about. It’s the instant of context.
That needs repeating:
You decide what data is about the moment you define its schema.
With the new, data-is-abundant model, we collect first and ask questions later. The schema comes after the collection. Indeed, Big Data success stories like Splunk, Palantir, and others are prized because of their ability to make sense of content well after it’s been collected—sometimes called a schema-less query. This means we collect information long before we decide what it’s for.
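The shift from deciding first to collecting first fits in a few lines of Python. This is a toy illustration built on assumed data. It is not how Splunk, Palantir, or any real warehouse works. The records, field names, and helper functions (`load_with_schema`, `store_raw`, `query`) are all hypothetical. The schema-on-write path keeps only the columns a designer chose in advance. The schema-on-read path keeps everything, so someone can ask an unplanned question long after collection, such as what music the widget buyers listen to.

```python
import json
from collections import Counter

# Hypothetical raw events; every field name and value is illustrative only.
RAW_EVENTS = [
    '{"sku": "widget", "color": "red", "region": "east", "size": "L", "zip": "10001", "music": "jazz"}',
    '{"sku": "widget", "color": "blue", "region": "west", "size": "M", "zip": "94110", "music": "metal"}',
    '{"sku": "widget", "color": "red", "region": "east", "size": "S", "zip": "10001", "music": "jazz"}',
]

# Schema-on-write: a designer chooses the columns before any data arrives.
WAREHOUSE_SCHEMA = ("color", "region", "size")


def load_with_schema(raw_lines, schema):
    """Keep only the fields the schema names; everything else is discarded at load time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append(tuple(record[name] for name in schema))
    return rows


def store_raw(raw_lines):
    """Schema-on-read: keep every field and decide what it means later."""
    return [json.loads(line) for line in raw_lines]


def query(records, name):
    """Count the values of any field, including ones nobody planned to analyze."""
    return Counter(r[name] for r in records if name in r)


warehouse = load_with_schema(RAW_EVENTS, WAREHOUSE_SCHEMA)
lake = store_raw(RAW_EVENTS)
print(warehouse[0])          # ('red', 'east', 'L')
print(query(lake, "music"))  # Counter({'jazz': 2, 'metal': 1})
```

The warehouse drops the `music` and `zip` fields at load time, so no later query can recover them. The raw store can answer that question and any other. That flexibility is what makes it valuable, and also what makes it risky.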
And this is a dangerous thing.
When bank managers tried to restrict loans to residents of certain areas (a practice known as redlining), Congress stepped in to stop it with the Fair Housing Act of 1968. Lawmakers were able to legislate against discrimination, making it illegal to change loan policy based on someone’s race.

“Personalization” is another word for discrimination. We’re not discriminating if we tailor things to you based on what we know about you—right? That’s just better service.
In one case, American Express used purchase history to lower a customer’s credit limit based on where he shopped, despite his excellent credit history.
That customer, Johnson, says his jaw dropped when he read one of the reasons American Express gave for lowering his credit limit: “Other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express.”
We’re seeing the start of this slippery slope everywhere from tailored credit-card limits like this one to car insurance based on driver profiles. In this regard, Big Data is a civil rights issue, but it’s one that society in general is ill-equipped to deal with.

We’re great at using taste to predict things about people. OkCupid’s 2010 blog post “The Real Stuff White People Like” showed just how easily we can use information to guess at race. It’s a real eye-opener (and the guys who wrote it didn’t include everything they learned, because some of it was a bit too controversial). They simply looked at the words one group used that others rarely used. The result was a list of “trigger” words for a particular race or gender.
Now run this backwards. If I know you like these things, or see you mention them in blog posts, on Facebook, or in tweets, then there’s a good chance I know your gender and your race, and maybe even your religion and your sexual orientation. And then I can personalize my marketing efforts towards you.
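Here is a rough Python sketch of both directions. It assumes a simple relative-frequency score with a small additive smoothing constant; OkCupid’s actual analysis may have differed. The two groups and their posts are invented placeholders standing in for any demographic. `distinctive_words` finds the words one group uses far more than the other, and `guess_group` runs that list backwards to label a new piece of text.

```python
from collections import Counter

# Invented placeholder posts; "group_a" and "group_b" stand in for any demographic.
GROUP_A = ["i love sailing and jazz", "jazz brunch then sailing", "weekend sailing trip"]
GROUP_B = ["i love hiking and metal", "metal show then hiking", "weekend hiking trip"]


def word_rates(docs):
    """Fraction of a group's posts that contain each word at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return {word: n / len(docs) for word, n in counts.items()}


def distinctive_words(group_docs, other_docs, top_n=3, smoothing=0.01):
    """Rank words by how much more often this group uses them than the other group."""
    mine = word_rates(group_docs)
    theirs = word_rates(other_docs)
    ratio = {
        word: (rate + smoothing) / (theirs.get(word, 0.0) + smoothing)
        for word, rate in mine.items()
    }
    return sorted(ratio, key=ratio.get, reverse=True)[:top_n]


def guess_group(text, markers):
    """Run it backwards: pick the group whose marker words overlap most with the text."""
    words = set(text.lower().split())
    scores = {group: len(words & set(marker_words)) for group, marker_words in markers.items()}
    return max(scores, key=scores.get)


markers = {
    "group_a": distinctive_words(GROUP_A, GROUP_B),
    "group_b": distinctive_words(GROUP_B, GROUP_A),
}
print(markers["group_a"])                                            # ['sailing', 'jazz', 'brunch']
print(guess_group("Heading out sailing after jazz night", markers))  # group_a
```

No field in the input says which group anyone belongs to. The label is inferred entirely from word choice.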
That makes it a civil rights issue.

If I collect information on the music you listen to, you might assume I will use that data in order to suggest new songs, or share it with your friends. But instead, I could use it to guess at your racial background. And then I could use that data to deny you a loan.
Want another example? Check out “Private Data In Public Ways,” something I wrote a few months ago after seeing a talk at Big Data London. It discusses how publicly available last-name information can be used to generate racial boundary maps.

This TED talk by Malte Spitz does a great job of explaining the challenges of tracking citizens today, and he speculates about whether the Berlin Wall would ever have come down if the Stasi had access to phone records in the way today’s governments do.
So how do we regulate the way data is used?
The only way to deal with this properly is to somehow link what the data is with how it can be used. I might, for example, say that my musical tastes should be used for song recommendation, but not for banking decisions.
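One way to picture linking data to its permitted uses is purpose tagging. Each piece of data carries an allow-list of purposes, and every access must state why it wants the value. The Python sketch below is a hypothetical illustration. The `TaggedData` class, the `use` function, and the purpose names are invented for this example. They are not an existing standard or library.

```python
from dataclasses import dataclass, field


class PurposeViolation(Exception):
    """Raised when data is requested for a use its owner did not permit."""


@dataclass(frozen=True)
class TaggedData:
    # Hypothetical record type: the value travels with the purposes its owner allowed.
    owner: str
    kind: str
    value: object
    allowed_purposes: frozenset = field(default_factory=frozenset)


def use(data: TaggedData, purpose: str):
    """Release the value only if the stated purpose is on the owner's allow-list."""
    if purpose not in data.allowed_purposes:
        raise PurposeViolation(f"{data.owner}'s {data.kind} may not be used for {purpose!r}")
    return data.value


tastes = TaggedData(
    owner="alice",
    kind="music history",
    value=["jazz", "bossa nova"],
    allowed_purposes=frozenset({"song_recommendation"}),
)

print(use(tastes, "song_recommendation"))  # ['jazz', 'bossa nova']
try:
    use(tastes, "loan_decision")
except PurposeViolation as err:
    print("blocked:", err)
```

A check like this only binds code that chooses to call it. Anyone holding the object can still read `tastes.value` directly. Making the tag stick requires either cryptography or law, the two options weighed in the next paragraph.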
Tying data to permissions can be done through encryption, which is slow, riddled with DRM, burdensome, hard to implement, and bad for innovation. Or it can be done through legislation, which has about as much chance of success as regulating spam: it feels great, but it’s damned hard to enforce.
There are brilliant examples of how a quantified society can improve the way we live, love, work, and play. Big Data helps detect disease outbreaks, improve how students learn, reveal political partisanship, and save hundreds of millions of dollars for commuters—to pick just four examples. These are benefits we simply can’t ignore as we try to survive on a planet bursting with people and shaken by climate and energy crises.
But governments need to balance their reliance on data with checks on how that reliance erodes privacy and creates civil and moral issues we haven’t thought through. It’s something most of the electorate isn’t thinking about, and yet it affects every purchase they make.
This should be fun.
Comments
53 responses to “Big Data is our generation’s civil rights issue, and we don’t know it”
Very good work. Your analysis of the unforeseen risks of Big Data is insightful and thorough.
However, what would you say to the criticism that you are seeing lions in the darkness? In other words, the risk of abuse certainly exists, but until we see a clear case of Big Data enabling and fueling discrimination, how do we know there is a real threat worth fighting? Your argument could be seen as the Techno-Moral equivalent to Iraq’s WMDs.
I am just curious what your response is. What is your argument that we really ought to be worried enough to act now?
Cheers.
– Harry
[…] lawsuit to come for our generation. Alistair over on the Solve for Interesting blog has a great article on this very topic; it is a must-read for anyone interested in big data and its […]
This is a problem for sure, and the way around it is perhaps regulation and compliance, but there are also things companies can do to get ahead of the ‘creepiness’ of Big Data:
http://successfulworkplace.com/2012/08/04/big-data-without-process-is-creepy/
Comments welcome.
[…] Alistair Croll recently argued that Big Data is this generation’s civil rights issue. He explains, “In the old, data-is-scarce model, companies had to decide what to collect first, and then collect it. A traditional enterprise data warehouse might have tracked sales of widgets by color, region, and size. This act of deciding what to store and how to store it is called designing the schema, and in many ways, it’s the moment where someone decides what the data is about. It’s the instant of context. That needs repeating: You decide what data is about the moment you define its schema.” […]
Excellent points here. Treating this as a civil (and moral) issue is an inspired way to build safeguards against the types of inferences that can come out of poor use of big data.
To Harry: The benefit I see here is raising awareness–both to help individuals better understand their digital footprints, but also to share ideas to help shape societal norms about use of data. Big data can give fantastic insight with proper methodology and good motivations. It can also further fuel confirmation bias about questionable motivations. Personally identifiable information can be abused–whether identity theft, or inferences drawn about people based on activity on Facebook, etc. Big data just accelerates it, and enables additional indirect inferences.
The indirect part reminds me of some of the underlying aspects of the financial issues caused by Bernie Madoff, or the housing/financial crisis enabled through predatory mortgages that became toxic investments. It’s easy to think everything is legit from what you see one or two steps from you…and also easy to assume everything beyond that is operating with the same expectations you have, even if it may not be.
While I agree with much of the article, I do take issue with the point made right at the start, i.e., “You can have the data fast; you can have it big; or you can have it varied. The catch is, you can’t have all three at once.” Yes you can. And you’ve been able to for really quite a long time. That’s what Teradata is for. And has been for 30 years.
David,
My point was that these three things equal a constant, and it is the constant that’s changing dramatically (a few sentences later I used three Vs again, this time as “very, very, very expensive.”)
The point here is that the upfront investment in, say, a Hadoop cluster as a service, paid for by the drink, significantly lowers the barriers to entry when compared to the data warehouses and BI tools of the past. And this very accessibility is what makes plenty of groups who might not otherwise expend the effort suddenly leverage all the spare data lying around to make decisions—sometimes not fair ones.
[…] Big Data is Our Generation’s Civil Rights Issue, and We Don’t Know It (Solve for Interesting) – Every new technology can be a force for good AND a force for evil – this article discusses some old civil rights issues through the lens of the data explosion… […]
[…] I wrote a post about big data and civil rights, which seems to have hit a nerve. It was posted on Solve for Interesting and here on Radar, and then folks like Boing Boing picked it […]
[…] I recently spoke with author, analyst and serial entrepreneur Alistair Croll, who believes that big data is our generation’s civil rights issue. “Ten years ago we said, ‘don’t put your name on the internet.’ Society is a moving target […]
[…] say data, and how it’s aggregated and used to shape opportunities and experiences differentially is the civil rights issue of our day. Education research is an example of an area where the consequences of this can be disastrous if all […]
Time is a big cure for many things. Computing is growing so fast that the three ‘V’s will eventually be conquered. As for quantifying people as data and using that data to limit their abilities, that is already being done. People over 50 can no longer enter the fast track to advancement, and medical conditions blindly limit your abilities further. Interest rates are set too high, with penalties that stifle all possible advancement. The data collected by the credit companies, basically your creditworthiness, is useful only for making more profit for the credit card companies, not for judging your ability or likelihood to pay back. Insurance is based blindly on age and past accidents, not on the ability of the driver, just incidence. The future will soon offer the common man information that will erode the advantages of many a company. Unfair tactics will be broadcast so that the common man can avoid those companies, severely handicapping them. Information on gas stations is already being shared so that consumers can avoid the high-priced ones. This will escalate, and companies will have to mind their p’s and q’s. Make no bones about it: with information, it is a two-way street. Right now the consumer is at a disadvantage, but with tight enough communications, a network of citizens can put any single business out of business.
[…] points to an article by Alistair Croll in which Croll outlines ways our data can be used against us and argues that big data is a civil […]
[…] weeks ago, I published a startling post about big data and civil rights. It appeared on Solve for Interesting and Radar, and was also picked up by blogs such as Boing Boing. Before this, my posts […]
[…] an important post called “Big Data is our generation’s civil rights issue, and we don’t know […]
Hello, just wanted to say, I loved this post. It was inspiring.
Keep on posting!
[…] You can read Fertik’s piece at Scientific American and McDonald’s piece at PBS Idea Lab — they are this week’s recommended reads. You might also be interested in Alistair Croll’s related posts: “Thin walls and traffic cameras,” “New ethics for a new world” and “Big data is our generation’s civil rights issue, and we don’t know it.” […]
[…] title of another blog post, "Big Data is our generation's civil rights issue, and we don't know it," puts the backlash argument in a different nutshell. This author writes, "With the new, […]
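[…] As Alistair Croll has said, people can use big data to engage in price discrimination, raising many civil rights concerns. In the name of “personalization,” big data may be used to target particular social groups and treat them differently, a kind of discrimination that the law generally prohibits businesses and individuals from practicing. When companies buy online ads to promote credit cards, they may choose specific target audiences based on household income or credit history, leaving everyone else with no way of knowing the offer exists. Google even holds a patent on dynamically setting content prices: for example, if your past purchase history suggests you are willing to spend a lot on shoes, the next time you shop for shoes online, the search results will lean toward high-priced items. Employers now also hope to apply big data to human resources, evaluating how to raise employee productivity purely by analyzing computer usage habits, while employees may be entirely unaware of this data and how it is used. […]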
[…] implications of Big Data in many ways. The best is probably still Alistair Croll’s remarkable article Big Data is our generation’s civil rights issue, and we don’t know it on Solve for […]
I think one thing that bears mentioning is the potential harm the abuses could do to the legitimate and useful uses for Big Data. It won’t be long before tools for defeating these types of data collection come along, and there will be the inevitable tug-of-war between companies that want to use or sell their data stores and people who don’t want to be profiled based on their internet and social media history. That battle will muddy the waters for more principled and altruistic applications of big data (e.g. city planning, demographic science, journalism) and, in some cases, thwart their development.
I think a relatively simple (if simplistic) patch is to make a major push to anonymize the data points during collection. Sure, with effort someone can reconstruct the identity of the person from the fragmentary records, but anonymization makes it much harder to use the data against someone personally. Meanwhile, the valuable aspect of big data, the correlations and relationships stemming from a single node, can be kept intact; you just won’t know the identity of the node.
[…] a smart, year-ago post, entrepreneur Alistair Croll notes that ability to store vast amounts of data, and then throw […]
[…] The tougher question is what we do about predictive analytics…the kind that show that when X and Y happen, Z is n% likely to occur as well. That’s fine when it delivers security or health, but what about when it indicates that someone is likely to do something bad but hasn’t yet? Is that fair? How far is far enough when the issue is the individual versus the collective good? Big Data will certainly redefine long-held expectations about civil rights. […]
[…] out of a demographic. And this goes back to another particularly brilliant point that Croll makes: the use of Big Data to define our target audience creates a Civil Rights issue. Because there’s a very thin line between offering something to people who are statistically […]
[…] Big Data is our generation’s civil rights issue, and we don’t know it – Solve for Interesting. […]
Big data was and is used by certain US state governments to draw voting districts, resulting in clear violations of the principle “one man, one vote” and effectively nullifying the right to vote for millions. If that is not a civil rights issue, I do not know what is.
[…] Big Data is our generation’s civil rights issue, and we don’t know it (Solve for Interes… […]
[…] I wrote a post about big data and civil rights, which seems to have hit a nerve. It was posted on Solve for Interesting and on Radar, and then folks like Boing Boing picked it […]
[…] BIG DATA IS OUR GENERATION’S CIVIL RIGHTS ISSUE, AND WE DON’T KNOW IT […]
[…] came across this article from last year on Big Data. With such advances in how we manage and collect information comes responsibilities, and the […]
[…] and ultimately, to exclude. Technologist Alistair Croll has declared Big Data to be the “civil rights issue” of our […]
[…] in a new way (please read Alistair Croll’s (Twitter; @acroll) blog on it being our ‘Civil Rights Issue‘ – I like this by the […]
[…] For those who think Facebook makes only educated guesses as to what your sponsored feed content should sport, please take a minute to read his post here. […]
Interesting and thought-provoking article. But this is not a civil rights issue.
At the heart of discrimination laws is the principle that you should not be discriminated against for something that you cannot change (e.g. race, gender).
If music listening and shopping habits correlate with high rates of loan default, these are lifestyle choices that are easy to change if one wants to.
Companies exercising intentional racial discrimination through correlated data seems both unlikely and unnecessary.
Could you let me print this article? Thank you. PERUCHO8
Do you mean publish it in another medium? Sure; please provide attribution and a link back to the original post, and send me a scanned copy.
[…] boyd and Kate Crawford did in 2011. Almost two years ago, my colleague Alistair Croll wrote that “big data is our generation’s civil rights issue, and we don’t even know […]
[…] what efforts are made to anonymize that data. As Alistair Croll, an industry watcher and analyst, pointed out back in 2012, “Big Data is our generation’s civil rights issue, and we don’t know […]
[…] on medication side effects How Open Data Can Reveal—And Correct—The Faults In Our Health System Big Data is our Generation’s Civil Rights Issue, and We Don’t Know It. From the Forum Creating Scales for Quantifying Action Sharing Anonymized […]
[…] Big Data is Our Generation’s Civil Rights Issue and We Don’t Know It (print / web) […]
[…] social systems. And I’m drawing upon a bog of knowledge from Cathy O’Neil, Alistair Croll, and danah boyd in case you want to read more of this ilk (you totally should, that’s […]
[…] rates can easily be crafted on the basis of aggregated data, tech analyst and author Alistair Croll cautions that individual personalization is just “another word for discrimination.” Advocates worry that […]
[…] as it does the data itself. A promising trend, but we’ll have to reconsider the fundamental economics of analyzing data first if we hope to address what Alistair Croll calls the civil rights issue of our […]
[…] my friend Alistair Croll wrote a couple of years ago: Big Data doesn’t have to be all that big. Rather it’s about a reconsideration of […]
[…] August, my colleague Alistair Croll provocatively wrote that big data is our generation’s civil rights issue. Robert Kirkpatrick, director of U.N. Global Pulse, broadened his concerns when he delivered […]