COLLECT YOURSELF: Data Storage Centers as the Archive’s Underbelly @pda2013

My lightning talk attempts to bridge practical concerns with archival theory as well as real-life impacts.

Those impacts are varied – they are social, environmental, political and personal.

I frame the digital archive, for this talk, in terms of the perpetual data streams that we feed into. I consider this archive to encompass data, which breaks down into at least content and a measure of our habits. Content being the interface, and habits being the underbelly.

There are numerous examples of the digital archive as defined through social media, aggregators and even cloud storage. For the sake of these five minutes, and because it speaks most aptly to the archival framing I’m attempting to get at, I’m focusing on Facebook.

We’ve come to understand Facebook as a story, about Zuckerberg, through the film The Social Network (2010), a story about ideas and ownership over ideas. A story about social rankings, privilege and belonging.

We’ve also come to understand Facebook as a social network – a platform to engage with others.

Some of us consider Facebook to be foremost an advertising platform with a social network built on top of it.

But few of us consider Facebook a series of ever-expanding, highly protected data storage centers. The most striking consequence of these centers at first appears to be about materiality (and in turn environmental and ecological repercussions), but I want to suggest that these concerns cannot be separated from preservation ideals and politics that are especially pertinent for the (concept of the) archive.

If we consider that:

Facebook accounts for 1 out of every 7 minutes spent online…

We collectively “like” things 2 million times a minute…

We upload 3000 photos to Facebook every second…

Facebook ingests more than 500 terabytes of data every day…

It takes about 1 pound of coal to create, package, store and move 2 megabytes of data…

According to a 2011 Greenpeace report, How Dirty is Your Data?, Facebook’s US-based data centers are each consuming the electricity of approximately 30 000 US homes. Facebook eats up anywhere from 9 percent to 25 percent of Canada’s and the US’s Internet traffic.
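
To get a feel for the scale of the figures above, they can be put on a common daily footing. What follows is a rough back-of-the-envelope sketch, not a measurement. It takes the quoted figures at face value, assumes decimal units (1 terabyte = 1,000,000 megabytes), and assumes that the coal figure applies uniformly to all ingested data, which is a strong simplification. The function and constant names are illustrative only.

```python
# Figures quoted in the talk, taken at face value.
LIKES_PER_MINUTE = 2_000_000
PHOTOS_PER_SECOND = 3_000
TERABYTES_INGESTED_PER_DAY = 500
MEGABYTES_PER_POUND_OF_COAL = 2

# Assumption: decimal units, so 1 TB = 1,000,000 MB.
MEGABYTES_PER_TERABYTE = 1_000_000


def likes_per_day():
    """Scale the per-minute 'like' rate up to a full day."""
    return LIKES_PER_MINUTE * 60 * 24


def photos_per_day():
    """Scale the per-second photo upload rate up to a full day."""
    return PHOTOS_PER_SECOND * 60 * 60 * 24


def coal_pounds_per_day():
    """Pounds of coal per day IF the quoted coal-per-megabyte figure
    applied uniformly to every ingested megabyte (a strong assumption)."""
    megabytes = TERABYTES_INGESTED_PER_DAY * MEGABYTES_PER_TERABYTE
    return megabytes / MEGABYTES_PER_POUND_OF_COAL


print(f"{likes_per_day():,} likes per day")          # 2,880,000,000
print(f"{photos_per_day():,} photos per day")        # 259,200,000
print(f"{coal_pounds_per_day():,.0f} pounds of coal per day")  # 250,000,000
```

The point of the exercise is the order of magnitude rather than the exact totals, since the input figures come from different sources and years.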

The questions that arise are:

What kind of infrastructure and technologies are required to host such large amounts of ‘free’ information, offering up data so rapidly, across so many platforms?

How are Facebook’s servers powered?

How many servers does Facebook have?

Where are Facebook’s servers located?

To support the growing activity of its social network since 2004, Facebook has built several data centers, including its first non-US facility. This offshore storage center is meant to accommodate, metaphorically, the 70 percent of Facebook users who live outside the US. Facebook also leases server space in nine or so data centers on both US coasts (Miller, 2011).

–       Prineville, Oregon: In 2010, Facebook built its first data storage center

–       cost of 210 million dollars

–       built on vacant grounds, on a high plain above the small town, exposing its 147 000 square feet

–       remaining conveniently out of sight.

–       Forest City, North Carolina: double the size of the Prineville center

–       building started before Prineville facility was complete.

–       Lulea, Sweden

–       The third and most recent storage center to be built by Facebook is to be in Lulea, Sweden, a town of 50,000 residents.

–       an ideal location for its cold climate, with the hope of running on electricity derived entirely from renewable sources.

–       Its regional power grid is said to be extraordinarily reliable, with no disruption of service since 1979.

–       It is the size of three US-based complexes and is expected to be fully operational by 2014.

–       Each of the three complexes is equal to the one in Forest City, which was itself double the size of the previous one in Prineville.
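
The relative sizes described in the list above can be worked out from the one absolute figure given, Prineville's 147 000 square feet. This is only a sketch of that arithmetic: it assumes "double" and "three complexes" are meant literally as floor-area multipliers, and the function and key names are made up for illustration.

```python
# Floor area quoted in the talk for the first center, in square feet.
PRINEVILLE_SQ_FT = 147_000


def relative_sizes(first_center_sq_ft):
    """Derive the other two centers' sizes from the first one.

    Assumptions taken from the list above, read literally:
    - the North Carolina center is double the Prineville center;
    - the Swedish center equals three North Carolina-sized complexes.
    """
    second = 2 * first_center_sq_ft
    third = 3 * second
    return {
        "prineville": first_center_sq_ft,
        "north_carolina": second,
        "lulea": third,
    }


sizes = relative_sizes(PRINEVILLE_SQ_FT)
for name, sq_ft in sizes.items():
    print(f"{name}: {sq_ft:,} sq ft")
```

Read this way, the third center would be six times the floor area of the first.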


Like the data growth itself, the storage centers are proliferating at exponential rates, in size and speed.

What’s the relationship between these (dislocated) data centers and the archive?

What choices are we making about the way our lives are archived through Facebook?

What are our expectations of the always-on, always-available archive?

This upgraded archive is always ‘on,’ always able to deliver content. But by the same token, it exists in a state of constant potential.

Facilities operate at full capacity at all times, regardless of actual demand, which means that an incredible amount of energy is reserved for idling. The entire process, much of it redundant, is constantly backed up (often using polluting generators) in case of a power outage, activity surge, or glitch in the system, to ensure immediate and seemingly uninterrupted service.

As recently documented in The New York Times, more than 90 percent of servers are reserved for and used for stand-by only, while the remainder, less than 10 percent, is used for computation (Glanz, 2012).

This may be the single most telling insight from an archival point of view: the ideal of instantaneity is imposed on the archive by users who are simultaneously creating, and subjected to, such an unsustainable modality.

These demands are doubling globally every 18 months…

These figures continue to grow in tandem as demand multiplies: but to what end? Given the expansion rate, the model is set to fail if it’s based on the idea that we can continually match the growth of data to physical storage centers.
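
To make the expansion rate concrete, the doubling claim can be turned into a growth factor. This is a minimal sketch, assuming demand doubles smoothly every 18 months and that the rate holds steady, which is an idealization. The names below are illustrative.

```python
# Doubling period quoted in the talk, in months.
DOUBLING_PERIOD_MONTHS = 18


def growth_factor(months):
    """How many times larger demand is after `months`,
    assuming one doubling every DOUBLING_PERIOD_MONTHS."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)


for years in (3, 6, 9):
    factor = growth_factor(years * 12)
    print(f"after {years} years: {factor:.0f}x")
# after 3 years: 4x
# after 6 years: 16x
# after 9 years: 64x
```

Under that assumption, physical storage would have to grow 64-fold in under a decade just to keep pace.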


Who benefits?

What are the costs?

How is the impact measured?

Why does this matter?


Max Schrems: One telling anecdote that challenges the way Facebook determines layers of data (and user access to the past) is that of law student Max Schrems of Vienna, Austria, who under EU law was entitled to request his dataset from Facebook. In December 2010, after using the site for three years, he demanded from Facebook a copy of all the information it had collected through his profile: he was sent a 1222-page PDF (O’Neill, 2011).

This PDF outlines “records of when Schrems logged in and out of the social network, the times and content of sent and received messages and an accounting of every person and thing he’s ever liked, posted, poked, friended or recorded” (Europe vs Facebook, 2012; Donohue, 2011).

In this same article, Schrems is said to have expressed his amazement at the time about ‘how much it remembers’ and ‘how much it knows’: deleted posts and posts set to ‘private’ fall into the same data bank as public posts in the Facebook archive (Cheng, 2012).

Increasingly, the data generated in Facebook cannot be separated from the network or storage centers required to process, aggregate and preserve it. Tracking at all these levels demonstrates the extent to which the social network itself generates a parallel archive of movement and habits, recording the interactions of the network itself as a simultaneous (but exponentially bigger) living archive. This parallel archive may come to make correlations about us of which we are not yet even aware.

Users are detached from the contradictions embedded in the materialities of the process, and its technological stresses, and therefore necessarily continue to understand themselves, their mediated histories, and their roles within these data flows from this detached purview.