Freedom to Tinker

Robots and the Law

Mon, 11/16/2009 - 10:29am

Stanford Law School held a panel Thursday on "Legal Challenges in an Age of Robotics". I happened to be in town so I dropped by and heard an interesting discussion.

Here's the official announcement:

Once relegated to factories and fiction, robots are rapidly entering the mainstream. Advances in artificial intelligence translate into ever-broadening functionality and autonomy. Recent years have seen an explosion in the use of robotics in warfare, medicine, and exploration. Industry analysts and UN statistics predict equally significant growth in the market for personal or service robotics over the next few years. What unique legal challenges will the widespread availability of sophisticated robots pose? Three panelists with deep and varied expertise discuss the present, near future, and far future of robotics and the law.

The key questions are how robots differ from past technologies, and how those differences change the law and policy issues we face.

Three aspects of robots seemed to recur in the discussion: robots take action that is important in the world; robots act autonomously; and we tend to see robots as beings and not just machines.

The last issue -- robots as beings -- is mostly a red herring for our purposes, notwithstanding its appeal as a conversational topic. Robots are nowhere near having the rights of a person or even of a sentient animal, and I suspect that we can't really imagine what it would be like to interact with a robot that qualified as a conscious being. Our brains seem to be wired to treat self-propelled objects as beings -- witness the surprising acceptance of robot "dogs" that aren't much like real dogs -- but that doesn't mean we should grant robots personhood.

So let's set aside the consciousness issue and focus on the other two: acting in the world, and autonomy. These attributes are already present in many technologies today, even in the purely electronic realm. Consider, for example, the complex of computers, network equipment, and software that makes up Google's data centers. Its actions have significant implications in the real world, and it is autonomous, at least in the sense that the panelists seemed to be using the term "autonomous" -- it exhibits complex behavior without direct, immediate human instruction, and its behavior is often unpredictable even to its makers.

In the end, it seemed to me that the legal and policy issues raised by future robots will not be new in kind, but will just be extrapolations of the issues we're already facing with today's complex technologies -- and not a far extrapolation but more of a smooth progression from where we are now. These issues are important, to be sure, and I was glad to hear smart panelists debating them, but I'm not convinced yet that we need a law of the robot. When it comes to the legal challenges of technology, the future will be like the past, only more so.

Still, if talking about robots will get policymakers to pay more attention to important issues in technology policy, then by all means, let's talk about robots.

Targeted Copyright Enforcement vs. Inaccurate Enforcement

Thu, 11/12/2009 - 12:22pm

Let's continue our discussion about copyright enforcement against online infringers. I wrote last time about how targeted enforcement can deter many possible violators even if the enforcer can only punish a few violators. Clever targeting of enforcement can destroy the safety-in-numbers effect that might otherwise shelter a crowd of would-be violators.

In the online copyright context, the implication is that large copyright owners might be able to use lawsuit threats to deter a huge population of would-be infringers, even if they can only manage to sue a few infringers at a time. In my previous post, I floated some ideas for how they might do this.

Today I want to talk about the implications of this. Let's assume, for the sake of argument, that copyright owners have better deterrence strategies available -- strategies that can deter more users, more effectively, than they have managed so far. What would this imply for copyright policy?

The main implication, I think, is to cast doubt on the big copyright owners' current arguments in favor of broader, less accurate enforcement. These proposed enforcement strategies go by various names, such as "three strikes" and "graduated response". What defines them is that they reduce the cost of each enforcement action, while at the same time reducing the assurance that the party being punished is actually guilty.

Typically the main source of cost reduction is the elimination of due process for the accused. For example, "three strikes" policies typically cut off someone's Internet connection if they are accused of infringement three times -- the theory being that making three accusations is much cheaper than proving one.

There's a hidden assumption underlying the case for cheap, inaccurate enforcement: that the only way to deter infringement is to launch a huge number of enforcement actions, so that most of the would-be violators will expect to face enforcement. The main point of my previous post is that this assumption is not necessarily true -- that it's possible, at least in principle, to deter many people with a moderate number of enforcement actions.

Indeed, one of the benefits of an accurate enforcement strategy -- a strategy that enforces only against actual violators -- is that the better it works, the cheaper it gets. If there are few violators, then few enforcement actions will be needed. A high-compliance, low-enforcement equilibrium is the best outcome for everybody.

Cheap, inaccurate enforcement can't reach this happy state.

Let's say there are 100 million users, and you're using an enforcement strategy that punishes 50% of violators, and 1% of non-violators. If half of the people are violators, you'll punish 25 million violators, and you'll punish 500,000 non-violators. That might seem acceptable to you, if the punishments are small. (If you're disconnecting 500,000 people from modern communications technology, that would be a different story.)

But now suppose that user behavior shifts, so that only 1% of users are violating. Then you'll be punishing 500,000 violators (50% of the 1,000,000 violators) along with 990,000 non-violators (1% of the 99,000,000 non-violators). Most of the people you'll be punishing are innocent, which is clearly unacceptable.
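The arithmetic in these two scenarios is easy to check. Here is a minimal Python sketch; the function name `punished` and its parameter names are illustrative labels, and the rates are simply the hypothetical ones from the example, not measurements of any real enforcement scheme.

```python
def punished(users, violator_rate, hit_rate=0.50, false_positive_rate=0.01):
    """Return (punished violators, punished non-violators) for a population."""
    violators = round(users * violator_rate)
    innocents = users - violators
    return round(violators * hit_rate), round(innocents * false_positive_rate)

# Compare the high-infringement and low-infringement scenarios.
for rate in (0.50, 0.01):
    guilty, innocent = punished(100_000_000, rate)
    share = innocent / (guilty + innocent)
    print(f"{rate:.0%} violating: {guilty:,} guilty and {innocent:,} innocent "
          f"punished ({share:.0%} of those punished are innocent)")
```

The error rates stay fixed while the population of violators shrinks, so the innocent share of those punished grows as compliance improves.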

Any cheap, inaccurate enforcement scheme will face this dilemma: it can be effective, or it can be fair, but it can't be both. The better it works, the more unfair it gets. It can never reach the high-compliance, low-enforcement equilibrium that should be the goal of every enforcement strategy.

Targeted Copyright Enforcement: Deterring Many Users with a Few Lawsuits

Mon, 11/09/2009 - 6:45am

One reason the record industry's strategy of suing online infringers ran into trouble is that there are too many infringers to sue. If the industry can only sue a tiny fraction of infringers, then any individual infringer will know that he is very unlikely to be sued, and deterrence will fail.

Or so it might seem -- until you read The Dynamics of Deterrence, a recent paper by Mark Kleiman and Beau Kilmer that explains how to deter a great many violators despite limited enforcement capacity.

Consider the following hypothetical. There are 26 players, whom we'll name A through Z. Each player can choose whether or not to "cheat". Every player who cheats gets a dollar. There's also an enforcer. The enforcer knows exactly who cheated, and can punish one (and only one) cheater by taking $10 from him. We'll assume that players have no moral qualms about cheating -- they'll do whatever maximizes their expected profit.

This situation has two stable outcomes, one in which nobody cheats, and the other in which everybody cheats. The everybody-cheats outcome is stable because each player figures that he has only a 1/26 chance of facing enforcement, and a 1/26 chance of losing $10 is not enough to scare him away from the $1 he can get by cheating.

It might seem that deterrence doesn't work because the cheaters have safety in numbers. It might seem that deterrence can only succeed by raising the penalty to more than $26. But here comes Kleiman and Kilmer's clever trick.

The enforcer gets everyone together and says, "Listen up, A through Z. From now on, I'm going to punish the cheater who comes first in the alphabet." Now A will stop cheating, because he knows he'll face certain punishment if he cheats. B, knowing that A won't cheat, will then realize that if he cheats, he'll face certain punishment, so B will stop cheating. Now C, knowing that A and B won't cheat, will reason that he had better stop cheating too. And so on ... with the result that nobody will cheat.

Notice that the trick still works even if punishment is not certain. Suppose each cheater has an 80% chance of avoiding detection. Now A is still deterred, because even a 20% chance of being fined $10 outweighs the $1 benefit of cheating. And if A is deterred, then B is deterred for the same reason, and so on.
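The reasoning so far can be written down as a short Python sketch. The function names and parameters here are hypothetical, chosen only for illustration, and the model assumes (as the hypothetical does) that players care only about expected profit.

```python
import string

def random_target_payoff(n_cheaters, gain=1.0, fine=10.0):
    """Expected profit from cheating when one of n cheaters is fined at random."""
    return gain - fine / n_cheaters

def deterred_players(order, gain=1.0, fine=10.0, detection=1.0):
    """Walk the announced order and collect the players who choose not to cheat."""
    deterred = []
    for player in order:
        # Everyone earlier in the order has abstained, so this player would be
        # the first cheater and is fined whenever the cheat is detected.
        if detection * fine > gain:
            deterred.append(player)
        else:
            break  # this player cheats and absorbs the enforcer's one punishment
    return deterred

players = list(string.ascii_uppercase)  # A through Z
print(random_target_payoff(len(players)))              # positive, so everybody cheats
print(len(deterred_players(players)))                  # certain punishment
print(len(deterred_players(players, detection=0.2)))   # 80% chance of escaping
```

With random targeting the fine is diluted across all 26 cheaters. With an announced order, each player in turn faces the undiluted expected fine, which is why the cascade runs all the way from A to Z.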

Notice also that this trick might work even if some of the players don't think things through. Suppose A through J are all smart enough not to cheat, but K is clueless and cheats anyway. K will get punished. If he cheats again, he'll get punished again. K will learn quickly, by experience, that cheating doesn't pay. And once K learns not to cheat, the next clueless player will be exposed and will start learning not to cheat. Eventually, all of the clueless players will learn not to cheat.
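To see how few punishments this learning process needs, here is a rough simulation in Python. It assumes, purely for illustration, that a clueless player gives up after being punished a fixed number of times (`punishments_to_learn`); that learning rule and the function name are assumptions of this sketch, not something taken from the Kleiman and Kilmer paper.

```python
import string

def rounds_until_compliance(order, clueless, punishments_to_learn=2):
    """Count the rounds in which the enforcer had to punish someone."""
    strikes = {player: 0 for player in clueless}
    rounds = 0
    while True:
        # Clueless players keep cheating until they have been punished enough.
        cheaters = [p for p in order
                    if p in strikes and strikes[p] < punishments_to_learn]
        if not cheaters:
            return rounds
        strikes[cheaters[0]] += 1  # punish the first cheater in the announced order
        rounds += 1

order = list(string.ascii_uppercase)
print(rounds_until_compliance(order, {"K", "Q", "X"}))
```

Under this assumption the total number of punishments is just the number of clueless players times the number of punishments each one needs, regardless of how many players there are overall.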

Finally, notice that there's nothing special about using alphabetical order. The enforcer could use reverse alphabetical or any other order, and the same logic would apply. Any ordering will do, as long as each player knows where he is in the order.

Now let's apply this trick to copyright deterrence. Suppose the RIAA announces that from now on they're going to sue the violators who have the lowest U.S. IP addresses. Now users with low IP addresses will have a strong incentive to avoid infringing, which will give users with slightly higher IP addresses a stronger incentive to avoid infringing, and so on.

You might object that infringers aren't certain to get caught, or that infringers might be clueless or irrational, or that IP address order is arbitrary. But I explained above why these objections aren't necessarily showstoppers. Players might still be deterred even if detection is a probability rather than a certainty; clueless players might still learn by experience; and an arbitrary ordering can work perfectly well.

Alternatively, the industry could use time as an ordering, by announcing, for example, that starting at 8:00 PM Eastern time tomorrow evening, they will sue the first 1000 U.S. users they see infringing. This would make infringing at 8:00 PM much riskier than normal, which might keep some would-be infringers offline at that hour, which in turn would make infringing at 8:00 PM even riskier, and so on. The resulting media coverage ("I infringed at 8:02 and now I'm facing a lawsuit") could make the tactic even more effective next time.

(While IP address or time ordering might work, many other orderings are infeasible. For example, they can't use alphabetical ordering on the infringers' names, because they don't learn names until later in the process. The ideal ordering is one that can be applied very early in the investigative process, so that only cases at the beginning of the ordering need to be investigated. IP address and time ordering work well in this respect, as they are evident right away and are evident to would-be infringers.)

I'm not claiming that this trick will definitely work. Indeed, it would be silly to claim that it could drive online infringement to zero. But there's a chance that it would deter more infringers, for longer, than the usual approach of seemingly random lawsuits has managed to do.

This approach has some interesting implications for copyright policy, as well. I'll discuss those next time.

New York AG Files Antitrust Suit Against Intel

Thu, 11/05/2009 - 10:04am

Yesterday, New York's state Attorney General filed what could turn out to be a major antitrust suit against Intel. The suit accuses Intel of taking illegal steps to exclude a competitor, AMD, from the market.

All we have so far is the NYAG's complaint, which tells one side of the case. Intel will have ample opportunity to respond, and the NYAG will ultimately have the burden of backing up its allegations with proof -- so caution is in order at this point. Still, the complaint lays out the shape of the NYAG's case.

The case concerns the market for x86-compatible microprocessors, which are the "brains" of most personal computers. Intel dominates this market but a rival company, AMD, has long been trying to build market share. The complaint offers a long narrative of Intel's (and AMD's) relationships with major PC makers ("OEMs", in the jargon) such as Dell, HP, and IBM -- the customers who buy x86 processors from Intel and AMD.

The crux of the case is the allegation that Intel paid OEMs to not buy from AMD. This is reminiscent of one aspect of the big Microsoft antitrust case of 1998, in which one of the DOJ's claims was that Microsoft had paid people not to do business with Netscape.

I'll leave it to the experts to debate the economic niceties, but as I understand it there is a distinction between paying someone to buy more of your product (e.g. giving a volume discount) and paying someone to buy less of your rival's product. The former is generally fine, but if you have monopoly power the latter is suspect.

As the NYAG tells it, Intel tried to pretend the payments were for something else, but the participants knew what was really going on: that the payments would stop if an OEM started buying more from AMD. The evidence on this point could turn out to be important. Does the NYAG have "smoking gun" emails in which Intel made this explicit? Does the evidence show that OEMs understood the arrangement as the NYAG claims? I assume there's a huge trove of email evidence that both sides will be digesting.

It will be interesting to watch this case develop. Thanks to tools like RECAP, many of the case documents will be available to the public. Stay tuned for more improvements to RECAP that will provide even better access.

Election Day; More Unguarded Voting Machines

Tue, 11/03/2009 - 8:52am

It's Election Day in New Jersey. As usual, I visited several polling places in Princeton over the last few days, looking for unguarded voting machines. It's been well demonstrated that a bad actor who can get physical access to a New Jersey voting machine can modify its behavior to steal votes, so an unguarded voting machine is a vulnerable voting machine.

This time I visited six polling places. What did I find?

The good news -- and there was a little -- is that in one of the six polling places, the machines were properly secured. I'm not sure where the machines were, but I know that they were not visible anywhere in the accessible areas of the building. Maybe the machines were locked in a storage room, or maybe they hadn't been delivered yet, but anyway they were probably safe. This is the first time I have ever found a local polling place, the night before the election, with properly secured voting machines.

At the other five polling places, things weren't so good. At three places, the machines were unguarded in an area open to the public. I walked right up to them and had private time with them. In two other places, the machines were visible from outside the building and protected only by an outside door with an easily defeated lock. I didn't defeat the locks myself -- I wasn't going to cross that line -- but I'll bet you could have opened them quickly with tools you probably have in your car.

The final scorecard: ten machines totally unprotected, eight machines poorly protected, two machines well-protected. That's an improvement, but then again any protection at all would have been an improvement. We still have a long way to go.

Sequoia Announces Voting System with Published Code

Thu, 10/29/2009 - 6:45am

Sequoia Voting Systems, one of the major e-voting companies, announced Tuesday that it will publish all of the source code for its forthcoming Frontier product. This is great news--an important step toward the kind of transparency that is necessary to make today's voting systems trustworthy.

To be clear, this will not be a fully open source system, because it won't give users the right to modify and redistribute the software. But it will be open in a very important sense, because everyone will be free to inspect, analyze, and discuss the code.

Significantly, the promise to publish code covers all of the systems involved in running the election and reporting results, "including precinct and central count digital optical scan tabulators, a robust election management and ballot preparation system, and tally, tabulation, and reporting applications". I'm sure the research community will be eager to study this code.

The trend toward publishing election system source code has been building over the last few years. Security experts have long argued that public scrutiny tends to increase security, and is one of the best ways to justify public trust in a system. Independent studies of major voting vendors' source code have found code quality to be disappointing at best, and vendors' all-out resistance to any disclosure has eroded confidence further. Add to this an increasing number of independent open-source voting systems, and secret voting technologies start to look less and less viable, as the public starts insisting that longstanding principles of election transparency be extended to election technology. In short, the time had come for this step.

Still, Sequoia deserves a lot of credit for being the first major vendor to open its technology. How long until the other major vendors follow suit?

DRM by any other name: The latest from Hollywood

Wed, 10/28/2009 - 7:45am

Sunday's New York Times had an article, Studios' Quest for Life After DVDs. To nobody's surprise, consumers want to have convenient access to "their" media, wherever they happen to be, without all the annoying restrictions that come into play when you add DRM to the picture. To many people's surprise, sales of DVDs (much less Blu-ray) are in trouble.

In the third quarter, studios’ home entertainment divisions generated about $4 billion, down 3.2 percent from a year ago, according to the Digital Entertainment Group, a trade consortium. But digital distribution contributed just $420 million, an increase of 18 percent.

Given that DVDs are really a luxury good (versus, say, food or electricity), the 3.2 percent drop seems like Hollywood is getting off easy. The growth in digital distribution is clearly getting attention, though. What's going on here? I imagine several things. People sometimes miss their shows. Maybe the cable went out. Maybe the TiVo crashed. Maybe they're on the road. Drop $2 at the iTunes Store and you're good to go. That's attractive and it's real money.

Still, the article goes on to talk about... yet more DRM.

Standing in the way are technology hurdles — how to let consumers play a video on various devices without letting them share it with 10,000 close friends on a pirate site — and the reluctance of studios to cooperate too closely with rivals for reasons of antitrust scrutiny and sheer competitiveness.
...
And piracy, at least conceptually, would be less of a worry. The technology [Disney's Keychest] rests on cloud computing, in which huge troves of data are stored on remote servers so users have access from anywhere. Movies would be streamed from the cloud and never downloaded, making them harder to pirate.

Of course, this is baloney. If it's going to work on my iPhone while I'm sitting in an airplane, the entire video needs to be stored there in advance. Furthermore, if the video is supposed to be "high definition," that's a bare minimum of 5 megabits/sec. (Broadcast HD is 20 megabits/sec and Blu-ray is 48 megabits/sec.) Most home DSL or cable modem connections either will never go that fast, or certainly cannot maintain those speeds without hiccups, particularly when sharing the line with other users. To do high quality video, you either have to have a real broadcast medium (cable, over-the-air, or satellite) or you have to download in advance and store on a hard drive.
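A back-of-the-envelope calculation makes the storage and bandwidth point concrete. In this Python sketch the two-hour running time and the 3 megabit/sec home connection are assumptions picked for illustration; only the video bit rates come from the figures above.

```python
def size_gigabytes(video_mbps, minutes):
    """Storage needed for a video at a constant bit rate, in decimal gigabytes."""
    bits = video_mbps * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000

def download_minutes(video_mbps, minutes, link_mbps):
    """Time to fetch the whole video over a link of the given speed."""
    return minutes * video_mbps / link_mbps

for label, mbps in (("minimal HD", 5), ("broadcast HD", 20), ("Blu-ray", 48)):
    print(f"{label}: {size_gigabytes(mbps, 120):.1f} GB for a two-hour movie, "
          f"{download_minutes(mbps, 120, 3):.0f} minutes to download at 3 Mbit/s")
```

Whenever the link is slower than the video's bit rate, the download takes longer than the movie runs, so the video cannot be streamed in real time and has to be stored in advance.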

And, of course, once you've stored the video, it's just not that hard to extract it. And it always will be. The challenge for Hollywood is to change the incentives of the game. Maybe sell me a flat-rate subscription. Maybe bundle it with my DSL provider. But make the experience compelling enough and cheap enough, and I'll do it. I regularly extract video from my TiVo and copy it to my iPhone via third-party software. It's practically painless and it happens to yield files that I could share with the world, but I don't. Why? Because there's real downside (I'd rather not get sued, thanks), and no particular upside.

So, dearest Hollywood executive, consider that selling your content for a reduced price, with no DRM, is not the same thing as "giving it away." If you allow third parties to license your content and distribute it without DRM, you can still go after the "pirates", yet you'll allow normal people to enjoy your work without making them suffer for it. Yes, you may have kids copying content from one to the next, just as we used to do by dubbing cassette tapes, but those incremental losses can and will be offset by the incremental gains of people enjoying your work and hitting the "buy" button.

There’s anonymity on the Internet. Get over it.

Tue, 10/27/2009 - 4:07pm

In a recent interview prominent antivirus developer Eugene Kaspersky decried the role of anonymity in cybercrime. This is not a new claim – it is touched on in the Commission on Cybersecurity for the 44th Presidency Report and Cybersecurity Act of 2009, among others – but it misses the mark. Any Internet design would allow anonymity. What renders our Internet vulnerable is primarily weakness of software security and authentication, not anonymity.

Consider a hypothetical of three Internet users: Alice, Bob, and Charlie. If Alice wants to communicate anonymously with Charlie, she may relay her messages through Bob. While Charlie knows Bob is an intermediary, Charlie does not know with whom he is ultimately communicating. For even greater anonymity Alice can pass her messages through multiple Bobs, and by applying cryptography she can ensure no individual Bob can piece together that she is communicating with Charlie. This basic approach to anonymity is remarkable in its independence of the Internet’s design: it only requires that some Bob(s) can and do run intermediary software. Even on an Internet where users could verify each other’s identity this means of anonymity would remain viable.
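The layered relay idea can be sketched in a few lines of Python. Everything here is a toy: the cipher is a homemade XOR keystream built from SHA-256 that stands in for real cryptography and should not be used to protect anything, the keys are assumed to be shared in advance, and the names `wrap` and `peel` are invented for this illustration rather than taken from any actual anonymity software.

```python
import hashlib
import json

def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (illustration only, not secure)."""
    blocks = []
    counter = 0
    while 32 * len(blocks) < length:
        blocks.append(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice with the same key undoes it."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def wrap(message: bytes, route):
    """Alice builds nested layers for route = [(name, key), ...], recipient last."""
    payload, next_hop = message, None
    for name, key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload, next_hop = toy_crypt(key, layer), name
    return payload

def peel(key: bytes, payload: bytes):
    """A relay removes its own layer and learns only where to forward the rest."""
    layer = json.loads(toy_crypt(key, payload))
    return layer["next"], bytes.fromhex(layer["data"])

route = [("Bob1", b"key-1"), ("Bob2", b"key-2"), ("Charlie", b"key-3")]
packet = wrap(b"hello Charlie", route)
for name, key in route:
    next_hop, packet = peel(key, packet)
    print(name, "forwards to", next_hop)
print(packet)
```

In this sketch the first Bob sees Alice and the second Bob, the second Bob sees the first Bob and Charlie, and neither one sees both ends of the conversation.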

The sad state of software security – the latest DHS weekly bulletin alone identified over 40 “high severity” vulnerabilities – is what enables malicious users to exploit the Internet’s indelible capacity for anonymity. Modifying the prior hypothetical, suppose Alice now wants to spam, phish, denial of service (DoS) attack, or hack Charlie. After compromising Bob’s computer with malicious software (malware), Alice can send emails, host websites, and launch DoS attacks from it; Charlie knows Bob is apparently misbehaving, but has no means of discovering Alice’s role. Nearly all spam, phishing, and DoS attacks are now perpetrated with networks of compromised computers like Bob’s (botnets). At the writing of a July 2009 private sector report, just five botnets sourced nearly 75% of spam. Worse yet, botnets are increasingly self-perpetuating: spam and phishing websites propagate malware that compromises new computers for the botnet.

Shortcomings in authentication, the means of proving one’s identity either when necessary or at all times, are a secondary contributor to the Internet’s ills. Most applications rely on passwords, which are easily guessed or divulged through deception – the very mechanisms of most phishing and account hijacking. There are potential technical solutions that would enable a user to authenticate themselves without the risk of compromising accounts. But any approach will be undermined by weaknesses in underlying software security when a malicious party can trivially compromise a user’s computer.

The policy community is already trending towards acceptance of Internet anonymity and refocusing on software security and authentication; the recent White House Cyberspace Policy Review in particular emphasizes both issues. To the remaining unpersuaded, I can only offer at last a truism: There’s anonymity on the Internet. Get over it.

Net Neutrality: When is Network Management "Reasonable"?

Mon, 10/26/2009 - 4:54pm

Last week the FCC released its much-awaited Notice of Proposed Rulemaking (NPRM) on network neutrality. As expected, the NPRM affirms past FCC neutrality principles, and adds two more. Here's the key language:

1. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from sending or receiving the lawful content of the user's choice over the Internet.

2. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from running the lawful applications or using the lawful services of the user's choice.

3. Subject to reasonable network management, a provider of broadband Internet access service may not prevent any of its users from connecting to and using on its network the user's choice of lawful devices that do not harm the network.

4. Subject to reasonable network management, a provider of broadband Internet access service may not deprive any of its users of the user's entitlement to competition among network providers, application providers, service providers, and content providers.

5. Subject to reasonable network management, a provider of broadband Internet access service must treat lawful content, applications, and services in a nondiscriminatory manner.

6. Subject to reasonable network management, a provider of broadband Internet access service must disclose such information concerning network management and other practices as is reasonably required for users and content, application, and service providers to enjoy the protections specified in this part.

That's a lot of policy packed into (relatively) few words. I expect that my colleagues and I will have a lot to say about these seemingly simple rules over the coming weeks.

Today I want to focus on the all-purpose exception for "reasonable network management". Unpacking this term might tell us a lot about how the proposed rule would operate.

Here's what the NPRM says:

Reasonable network management consists of: (a) reasonable practices employed by a provider of broadband Internet access to (i) reduce or mitigate the effects of congestion on its network or to address quality-of-service concerns; (ii) address traffic that is unwanted by users or harmful; (iii) prevent the transfer of unlawful content; or (iv) prevent the unlawful transfer of content; and (b) other reasonable network management practices.

The key word is "reasonable", and in that respect the definition is nearly circular: in order to be "reasonable", a network management practice must be (a) "reasonable" and directed toward certain specific ends, or (b) "reasonable".

In the FCC's defense, it does seek comments and suggestions on what the definition should be, and it does say that it intends to make case-by-case determinations in practice, as it did in the Comcast matter. Further, it rejects a "strict scrutiny" standard of the sort that David Robinson rightly criticized in a previous post.

"Reasonable" is hard to define because in real life every "network management" measure will have tradeoffs. For example, a measure intended to block copyright-infringing material would in practice make errors in both directions: it would block X% (less than 100%) of infringing material, while as a side-effect also blocking Y% (more than 0%) of non-infringing material. For what values of X and Y is such a measure "reasonable"? We don't know.

Of course, declaring a vague standard rather than a bright-line rule can sometimes be good policy, especially where the facts on the ground are changing rapidly and it's hard to predict what kind of details might turn out to be important in a dispute. Still, by choosing a case-by-case approach, the FCC is leaving us mostly in the dark about where it will draw the line between "reasonable" and "unreasonable".

Intractability of Financial Derivatives

Thu, 10/15/2009 - 6:45am

A new result by Princeton computer scientists and economists shows a striking application of computer science theory to the field of financial derivative design. The paper is Computational Complexity and Information Asymmetry in Financial Products by Sanjeev Arora, Boaz Barak, Markus Brunnermeier, and Rong Ge. Although computation has long been used in the financial industry for program trading and "the thermodynamics of money", this new paper applies an entirely different kind of computer science: Intractability Theory.

A financial derivative is a contract specifying a payoff calculated by some formula based on the yields or prices of a specific collection of underlying assets. Consider the securitization of debt: a CDO (collateralized debt obligation) is a security formed by packaging together hundreds of home mortgages. The CDO is supposedly safer than the individual mortgages, since it spreads the risk (not every mortgage is supposed to default at once). Furthermore, a CDO is usually divided into "senior tranches" which are guaranteed not to drop in value as long as the total defaults in the pool do not exceed some threshold; and "junior tranches" that are supposed to bear all the risk.

Trading in derivatives brought down Lehman Brothers, AIG, and many other buyers, based on mistaken assumptions about the independence of the underlying asset prices; they underestimated the danger that many mortgages would all default at the same time. But the new paper shows that in addition to that kind of danger, risks can arise because a seller can deliberately construct a derivative with a booby trap hiding in plain sight.

It's like encryption: it's easy to construct an encrypted message (your browser does this all the time), but it's hard to decrypt without knowing the key (we believe even the NSA doesn't have the computational power to do it). Similarly, the new result shows that the seller can construct the CDO with a booby trap, but even Goldman Sachs won't have enough computational power to analyze whether a trap is present.

The paper shows the example of a high-volume seller who builds 1000 CDOs from 1000 asset-classes of home mortgages. Suppose the seller knows that a few of those asset classes are "lemons" that won't pay off. The seller is supposed to randomly distribute the asset classes into the CDOs; this minimizes the risk for the buyer, because there's only a small chance that any one CDO has more than a few lemons. But the seller can "tamper" with the CDOs by putting most of the lemons in just a few of the CDOs. This has an enormous effect on the senior tranches of those tampered CDOs.

In principle, an alert buyer can detect tampering even if he doesn't know which asset classes are the lemons: he simply examines all 1000 CDOs and looks for a suspicious overrepresentation of some of the asset classes in some of the CDOs. What Arora et al. show is that this is an NP-complete problem ("densest subgraph"). This problem is believed to be computationally intractable; thus, even the most alert buyer can't have enough computational power to do the analysis.
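The seller's side of this is cheap to simulate, which is part of what makes the result striking. The Python sketch below shows only the seller's view, where the lemons are known; it does not attempt the buyer's detection problem. The numbers not given above (30 asset classes per CDO, 50 lemons, 10 booby-trapped CDOs with 20 lemons each) are assumptions chosen for illustration, not parameters from the paper.

```python
import random

def build_cdos(lemons, n_classes=1000, n_cdos=1000, per_cdo=30,
               trapped=0, planted=0, rng=None):
    """Return a list of CDOs, each a set of asset-class ids.

    The first `trapped` CDOs get exactly `planted` lemons forced in;
    every other CDO is filled uniformly at random.
    """
    if rng is None:
        rng = random.Random(0)
    classes = range(n_classes)
    non_lemons = [c for c in classes if c not in lemons]
    cdos = []
    for i in range(n_cdos):
        if i < trapped:
            forced = set(rng.sample(sorted(lemons), planted))
            cdo = forced | set(rng.sample(non_lemons, per_cdo - planted))
        else:
            cdo = set(rng.sample(classes, per_cdo))
        cdos.append(cdo)
    return cdos

def lemon_counts(cdos, lemons):
    """Number of lemons in each CDO, as seen by someone who knows the lemons."""
    return [len(cdo & lemons) for cdo in cdos]

rng = random.Random(0)
lemons = frozenset(rng.sample(range(1000), 50))
honest = build_cdos(lemons, rng=rng)
rigged = build_cdos(lemons, trapped=10, planted=20, rng=rng)
print("worst honest CDO:", max(lemon_counts(honest, lemons)), "lemons")
print("worst rigged CDO:", max(lemon_counts(rigged, lemons)), "lemons")
```

Someone who knows the lemons can spot the rigged CDOs with a single count. The buyer has no such list and has to search for unusually dense clusters across all the CDOs, which is the hard part.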

Arora et al. show it's even worse than that: even after the buyer has lost a lot of money (because enough mortgages defaulted to devalue his "senior tranche"), he can't prove that tampering occurred: he can't prove that the distribution of lemons wasn't random. This makes it hard to get recourse in court; it also makes it hard to regulate CDOs.

Intractability Theory forms the basis for several of the technologies discussed on Freedom-to-Tinker: cryptography, digital-rights management, watermarking, and others. Perhaps financial policy is now another one.

Sidekick Users' Data Lost: Blame the Cloud?

Wed, 10/14/2009 - 8:42am

Users of Sidekick mobile phones saw much of their data disappear last week due to engineering problems at a Microsoft data center. Sidekick devices lose the contents of their memory when they don't have power (e.g. when the battery is being changed), so all data is transmitted to a data center for permanent storage -- which turned out not to be so permanent.

(The latest news is that some of the data, perhaps most of it, may turn out to be recoverable.)

A common response to this story is that this kind of danger is inherent in "cloud" computing services, where you rely on some service provider to take care of your data. But this misses the point, I think. Preserving data is difficult, and individual users tend to do a mediocre job of it. Admit it: You have lost your own data at some point. I know I have lost some of mine. A big, professionally run data center is much less likely to lose your data than you are.

It's worth noting, too, that many cloud services face lower risk of this sort of problem. My email, for example, lives in the cloud--the "official copy" is on a central server, and copies are downloaded frequently to my desktop and laptop computers. If the server were to go up in flames, along with all of the server backups, I would still be in good shape, because I would still have copies of everything on my desktop and laptop.

For my email and similar services, the biggest risk to data integrity is not that the server will disappear altogether, but that the server will misbehave in subtle ways, causing my stored data to be corrupted over time. Thanks to the automatic synchronization between the server and my two clients (desktop and laptop), bad data could be replicated silently into all copies. In principle, some of the damage could be repaired later, using the server's backups, but that's a best-case scenario.

This risk, of buggy software corrupting data, has always been with us. The question is not whether problems will happen in the cloud -- in any complex technology, trouble comes with the territory -- but whether the cloud makes a problem worse.

PrivAds: Behavioral Advertising without Tracking

Mon, 10/12/2009 - 2:53pm

There's an interesting new paper out of Stanford and NYU, about a system called "PrivAds" that tries to provide behavioral advertising on web sites, without having a central server gather detailed information about user behavior. If the paper's approach turns out to work, it could have an important impact on the debate about online advertising and privacy.

Advertisers have obvious reasons to show you ads that match your interests. You can benefit too, if you see ads that are relevant to your needs, rather than ones you don't care about. The problem, as I argued in my Congressional testimony, comes when sites track your activities, and build up detailed files on you, in order to do the targeting.

PrivAds tries to solve this problem by providing behavioral advertising without having any server track you. The idea is that your own browser will track you, and analyze your online activities to build a model of your interests, but your browser won't reveal this information to anyone else. When a site wants to show you an interest-based ad, your browser will choose the ad from a portfolio of ads offered by the ad service.

The tricky part is how your browser can do all of this without incidentally leaking your activities to the server. For example, the ad agency needs to know how many times each ad was shown. How can you report this to the ad service without revealing which ads you saw? PrivAds offers a solution based on fancy cryptography, so that the ad agency can aggregate reports from many users, without being able to see the users' individual reports. Similarly, every interaction between your browser and the outside must be engineered carefully so that behavioral advertising can occur but the browser doesn't telegraph your actions.
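To give a flavor of how aggregate reporting can work without exposing individual reports, here is a minimal sketch. To be clear, this is not PrivAds's actual protocol, which isn't described here. It uses a stand-in technique, additive secret sharing, and it assumes two hypothetical aggregators that don't collude with each other. The modulus and function names are my own.

```python
import secrets

# Assumption: any modulus comfortably larger than the true totals will do.
MODULUS = 2**61 - 1

def split_report(counts):
    """Split one browser's per-ad view counts into two additive shares.
    Each share on its own is uniformly random and reveals nothing."""
    share_a = [secrets.randbelow(MODULUS) for _ in counts]
    share_b = [(c - a) % MODULUS for c, a in zip(counts, share_a)]
    return share_a, share_b

def add_share(running_total, share):
    """An aggregator adds one incoming share to its running total."""
    return [(t + s) % MODULUS for t, s in zip(running_total, share)]

def combine(total_a, total_b):
    """Adding the two aggregators' totals yields the true per-ad totals."""
    return [(a + b) % MODULUS for a, b in zip(total_a, total_b)]

# Three browsers, three ads; each row is one browser's private view counts.
reports = [[1, 0, 2], [0, 3, 1], [2, 2, 0]]
total_a = [0, 0, 0]
total_b = [0, 0, 0]
for counts in reports:
    share_a, share_b = split_report(counts)
    total_a = add_share(total_a, share_a)
    total_b = add_share(total_b, share_b)
print(combine(total_a, total_b))
```

The random parts of the shares cancel when the two totals are added, so the ad service learns how often each ad was shown but neither aggregator ever sees a single user's report.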

It's not clear at this point whether the PrivAds approach will work, in the sense of protecting privacy without reducing the effectiveness of ad targeting. It's clear, though, that PrivAds is asking an important question.

If the PrivAds approach succeeds, demonstrating that behavioral advertising does not require tracking, this doesn't mean that companies will stop wanting to track you -- but it does mean that they won't be able to use advertising as an excuse to track you.

Chilling and Warming Effects

Fri, 10/09/2009 - 10:41am

For several years, the Chilling Effects Clearinghouse has been cataloging the effects of legal threats on online expression and helping people to understand their rights. Amid all the chilling we continue to see, it's welcome to see rays of sunshine when bloggers stand up to threats, helping to stop the cycle of threat-and-takedown.

The BoingBoing team did this the other day when they got a legal threat from Ralph Lauren's lawyers over an advertisement they mocked on the BoingBoing blog for featuring a stick-thin model. The lawyers claimed copyright infringement, saying "PRL owns all right, title, and interest in the original images that appear in the Advertisements." Other hosts pull content "expeditiously" when they receive these notices (as Google did when notified of the post on Photoshop Disasters), and most bloggers and posters don't counter-notify, even though Chilling Effects offers a handy counter-notification form.

Not BoingBoing: they posted the letter (and the image again) along with copious mockery, including an offer to feed the obviously starved models, and other sources picked up on the fun. The image has now been seen by many more people than would have discovered it in BoingBoing's archives, in a pattern the press has nicknamed the "Streisand Effect."

We use the term "chilling effects" to describe indirect legal restraints, or self-censorship, because most cease-and-desist letters don't go through the courts. The lawyers (and non-lawyers) sending them rely on the in terrorem effects of threatened legal action, and often succeed in silencing speech for the cost of an e-postage stamp.

Actions like BoingBoing's use the court of public opinion to counter this squelching. They fight legalese with public outrage (in support of legal analysis), and at the same time, help other readers to understand they have similar rights. Further, they increase the "cost" of sending cease-and-desists, as they make potential claimants consider the publicity risk of being made to look foolish, bullying, or worse.

For those curious about the underlying legalities here, the Copyright Act makes clear that fair use, including for the purposes of commentary, criticism, and news reporting, is not an infringement of copyright. See Chilling Effects' fair use FAQ. Yet the DMCA notice-and-takedown procedure encourages ISPs to respond to complaints with takedown, not investigation and legal balancing. Providers like BoingBoing's Priority Colo should also get credit for their willingness to back their users' responses.

As a result of the attention, Ralph Lauren apologized for the image: "After further investigation, we have learned that we are responsible for the poor imaging and retouching that resulted in a very distorted image of a woman's body. We have addressed the problem and going forward will take every precaution to ensure that the caliber of our artwork represents our brand appropriately."

May the warming (and proper attention to the health of fashion models) continue!

[cross-posted at Chilling Effects]

Privacy as a Social Problem, Not a Technology Problem

Wed, 10/07/2009 - 11:28am

Bob Blakley had an interesting post Monday, arguing that technologists tend to frame the privacy issue poorly. (I would add that many non-technologists use the same framing.) Here's a sample:

That's how privacy works; it's not about secrecy, and it's not about control: it's about sociability. Privacy is a social good which we give to one another, not a social order in which we control one another.

Technologists hate this; social phenomena aren't deterministic and programmers can't write code to make them come out right. When technologists are faced with a social problem, they often respond by redefining the problem as a technical problem they think they can solve.

...

The privacy framing that's going on in the technology industry today is this:

Social Frame: Privacy is a social problem; the solution is to ensure that people use sensitive personal information only in ways that are beneficial to the subject of the information.

BUT as technologists we can't ... control peoples' behavior, so we can't solve this problem. So instead let's work on a problem that sounds similar:

Technology Frame: Privacy is a technology problem; since we can't make people use sensitive personal information sociably, the solution is to ensure that people never see others' sensitive personal information.

We technologists have tried to solve the privacy problem in this technology frame for about a decade now, and, not surprisingly (information wants to be free!) we have failed.

...

The technology frame isn't the problem. Privacy is the problem. Society can and routinely does solve the privacy problem in the social frame, by getting the vast majority of people to behave sociably.

This is an excellent point, and one that technologists and policymakers would be wise to consider. Privacy depends, ultimately, on people and institutions showing a reasonable regard for the privacy interests of others.

Bob goes on to argue that technologies should be designed to help these social mechanisms work.

A sociable space is one in which people's social and antisocial actions are exposed to scrutiny so that normal human social processes can work.

A space in which tagging a photograph publicizes not only the identities of the people in the photograph but also the identities of the person who took the photograph and the person who tagged the photograph is more sociable than a space in which the only identity revealed is that of the person in the photograph - because when the picture of Jimmy holding a martini washes up on the HR department's desk, Jimmy will know that Johnny took it (at a private party) and Julie tagged him - and the conversations humans have developed over tens of thousands of years to handle these situations will take place.

Again, this is an excellent and underappreciated point. But we need to be careful how far we take it. If we go beyond Bob's argument, and we say that good design of the kind he advocates can completely solve the online privacy problem, then we have gone too far.

Technology doesn't just move old privacy problems online. It also creates new problems and exacerbates old ones. In the old days, Johnny and Julie might have taken a photo of Jimmy drinking at the office party, and snail-mailed the photo to HR. That would have been a pretty hostile act. Now, the same harm can arise from a small misunderstanding: Johnny and Julie might assume that HR is more tolerant, or that HR doesn't watch Facebook; or they might not realize that a site allows HR to search for photos of Jimmy. A photo might be taken by Johnny and tagged by Julie, even though Johnny and Julie don't know each other. All in all, the photo scenario is more likely to happen today than in the pre-Net age.

This is just one example of what James Grimmelmann calls Accidental Privacy Spills. Grimmelmann tells the story of a private email message that was forwarded and re-forwarded to thousands of people, not by malice but because many people made the seemingly harmless decision to forward it to a few friends. This would never have happened with a personal letter. (Personal letters are sometimes publicized against the wishes of the author, but that's very rare and wouldn't have happened in the case Grimmelmann describes.) As the cost of capturing, transmitting, storing, and searching photos and other digital information falls to near-zero, it's only natural that more capturing, transmitting, storing, and searching of information will occur.

Good design is not the whole solution to our privacy problem. But design has the huge advantage that we can get started on it right away, without needing to reach some sweeping societal agreement about what the rules should be. If you're designing a product, or deciding which product to use, you can support good privacy design today.

Introducing FedThread: Opening the Federal Register

Mon, 10/05/2009 - 8:51am

Today we are rolling out FedThread, a new way of interacting with the Federal Register. It's the latest civic technology project from our team at Princeton's Center for Information Technology Policy.

The Federal Register is "[t]he official daily publication for rules, proposed rules, and notices of Federal agencies and organizations, as well as executive orders and other presidential documents." It's published by the U.S. government, five days a week. The Federal Register tells citizens what their government is doing, in a lot more detail than the news media do.

FedThread makes the Federal Register more open and accessible. FedThread gives users:

  • collaborative annotation: Users can attach a note to any paragraph of the Federal Register; a conversation thread hangs off of every paragraph.
  • advanced search: Users can search the Federal Register (going back to 2000) on full text, by date, agency, and other fields.
  • customized feeds: Any search can be turned into an RSS feed. The resulting feed will include any new items that match the search query. Feeds can be delivered by email as well.

I think FedThread is a nice tool, but what's most amazing to me is that the whole project took only ten days to create. Ten days ago we had no code, no HTML, no plan, not even a block diagram on a whiteboard. Today we launched a pretty good service.

How was this possible? Three things enabled it.

First, government provided the necessary data, for bulk download, in a format (XML) that's easy for software to handle. This let us acquire and manipulate the underlying data (Federal Register contents) quickly. Folks at the Government Printing Office, National Archives and Records Administration, and Office of Science and Technology Policy all helped to make this possible. The roll-out of the government's XML-based Federal Register site today is a significant step forward.

Second, we had great tools, such as Linux, Apache, MySql, Python, Django, jQuery, Datejs, and lxml. These tools are capable, flexible, and free, and they fit together in useful ways. More than once we faced a challenging engineering problem, only to find an existing tool that did almost exactly what we needed. When we needed a tool for managing inline discussion threads within a document, Adrian Holovaty, Jacob Kaplan-Moss and Jack Slocum graciously let us use their code from djangobook.com, which served as the basis for our system. Tools like these help small teams build big projects quickly.

Third, we have an amazing team. A project like this needs people who are super-smart, tireless, have great engineering judgment, and know how to work as a team. Joe Calandrino, Ari Feldman, Harlan Yu, and Bill Zeller all did fantastic work building the site. We set an insane schedule -- at the start we guessed we had a 50% chance of having anything at all ready by today -- and they raced ahead of the schedule, to the point that we expanded the project's scope more than once. Great job, guys! Now please get some sleep.

We hope FedThread is a useful tool that brings more people into contact with the operations of their government -- one small step in a larger trend of using technology to make government more transparent.

Antisocial networking

Fri, 10/02/2009 - 6:06pm

I just got my invitation to Google Wave. The prototype that's now public doesn't have all of the amazing features in the original video demos. At this point, it's pretty much just a way of collecting IM-style conversations all in one place. But several of my friends are already there, and I've had a few conversations there already.

How am I supposed to know that there's something new going on at Wave? Right now, I need to keep a tab open in my browser and check in, every once in a while, to see what's up. Right now, my standard set of tabs includes my Gmail, calendar, RSS reader, New York Times homepage, Facebook page, and now Google Wave. Add in the occasional Twitter tab (or dedicated Twitter client, if I feel like running it) plus I'll occasionally have an IM window open. All of these things are competing for my attention when I'm supposed to be getting real work done.

A common way that people try to solve this problem is by building bridges between these services. If you use Twitter and Facebook, there are several ways to arrange for your tweets to show up at Facebook (bewildering Facebook users with all the #hashtags and @references) and there are also a handful of ways for getting data out of Facebook. I'd been using FriendFeed as a central hub for all this, but it would sometimes stop working for days at a time. Now that they've been bought out by Facebook, maybe this will shake itself out.

The bigger problem is that these various vendors and technologies have different data models for visibility and for how metadata is represented. In Twitter, everything is default-public, follow-up comments are first-class objects in the system, and there's effectively no metadata outside of the message, causing Twitter users to have adopted a variety of seemingly obscure conventions (e.g., "RT" to indicate a retweet of some other tweet). Contrast this with Facebook, where comments are a very different sort of message from the parent messages, where they have all sorts of security rules (that nobody really understands) about who can see what, and where there is actually structure to a message. If I link to a Youtube video, it gets magically embedded, versus the annoying URL shorteners that people have to use to shoehorn messages into Twitter.

Comments are a favorite area for people to complain. Twitter comments are often implicit with the @username tags. If I'm following a friend and a friend-of-my-friend comments on one of their tweets, I won't necessarily see it. In Facebook, I have a better shot at seeing those comments. But what if I write a blog post here at Freedom to Tinker, which Facebook nicely picks up and makes look just like a note I posted on my Facebook page? Now we'll have comments on Freedom to Tinker and more comments inside Facebook which won't intermingle. Of course, thanks to FriendFeed, a tweet will (probably) be automatically generated when I post this, causing some small amount of Twitter commenting traffic, and there may be comments within FriendFeed itself as well as Google Reader commentary (which is also different from Google Reader's "share with note" commentary).

Given these disparate data models, there's no easy way to unify Twitter and Facebook, much less the commenting diaspora, even assuming you could sort out the security concerns and you could work around Facebook's tendency to want to restrict the flow of data out of its system. This is all the more frustrating because RSS completely solved the initial problem of distributing new blog posts in the blog universe. I used to keep a bunch of tabs open to various blog-like things that I followed, but that quickly proved unwieldy, whereas an RSS aggregator (Google Reader, for me) solved the problem nicely. Could there ever be a social network/microblogging aggregator?

There is no lack of standards-in-the-wings that would like to do this. (See, for example, OpenMicroBlogging, or our own work on BirdFeeder.) Something like Google Wave could subsume every one of these platforms, although I fear that integrating so many different data models would inevitably result in a deeply clunky UI.

In the end, I think the federation ideas behind Google Wave and BirdFeeder, and good old RSS blog feeds, will ultimately win out, with interoperability between the big vendors, just like they interoperate with email. Getting there, however, isn't going to happen easily.

Breaking Vanish: A Story of Security Research in Action

Tue, 09/29/2009 - 12:16pm

Today, seven colleagues and I released a new paper, "Defeating Vanish with Low-Cost Sybil Attacks Against Large DHTs". The paper's authors are Scott Wolchok (Michigan), Owen Hofmann (Texas), Nadia Heninger (Princeton), me, Alex Halderman (Michigan), Christopher Rossbach (Texas), Brent Waters (Texas), and Emmett Witchel (Texas).

Our paper is the next chapter in an interesting story about the making, breaking, and possible fixing of security systems.

The story started with a system called Vanish, designed by a team at the University of Washington (Roxana Geambasu, Yoshi Kohno, Amit Levy, and Hank Levy). Vanish tries to provide "vanishing data objects" (VDOs) that can be created at any time but will only be usable within a short time window (typically eight hours) after their creation. This is an unusual kind of security guarantee: the VDO can be read by anybody who sees it in the first eight hours, but after that period expires the VDO is supposed to be unrecoverable.

Vanish uses a clever design to do this. It takes your data and encrypts it, using a fresh random encryption key. It then splits the key into shares, so that a quorum of shares (say, seven out of ten shares) is required to reconstruct the key. It takes the shares and stores them at random locations in a giant worldwide system called the Vuze DHT. The Vuze DHT throws away items after eight hours. After that the shares are gone, so the key cannot be reconstructed, so the VDO cannot be decrypted -- at least in theory.
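The key-splitting step can be sketched in a few lines. This is a minimal illustration, assuming Shamir-style threshold secret sharing over a prime field; the prime, the function names, and the 128-bit key size are my own choices for the example and are not taken from the Vanish code. The encryption of the data itself and the storage of shares in the DHT are left out.

```python
import secrets

# Assumption: a Mersenne prime large enough to hold a 128-bit key.
PRIME = 2**521 - 1

def split_secret(secret, threshold, num_shares):
    """Hide the secret as the constant term of a random polynomial of
    degree threshold-1, and hand out points on that polynomial."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):      # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given points."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbits(128)              # stand-in for the fresh random key
shares = split_secret(key, threshold=7, num_shares=10)
print(recover_secret(shares[:7]) == key)  # any seven shares suffice
print(recover_secret(shares[:6]) == key)  # six shares are not enough
```

With seven or more shares the polynomial is fully determined and the key comes back; with six, every candidate key is equally consistent with what you hold. That is why the scheme depends entirely on the shares really disappearing. (The modular inverse via `pow(den, -1, PRIME)` needs Python 3.8 or later.)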

What is this Vuze DHT? It's a worldwide peer-to-peer network, containing a million or so computers, that was set up by Vuze, a company that uses the BitTorrent protocol to distribute (licensed) video content. Vuze needs a giant data store for its own purposes, to help peers find the videos they want, and this data store happens to be open so that Vanish can use it. The million-computer extent of the Vuze data store was important, because it gave the Vanish designers a big haystack in which to hide their needles.

Vanish debuted on July 20 with a splashy New York Times article. Reading the article, Alex Halderman and I realized that some of our past thinking about how to extract information from large distributed data structures might be applied to attack Vanish. Alex's student Scott Wolchok grabbed the project and started doing experiments to see how much information could be extracted from the Vuze DHT. If we could monitor Vuze and continuously record almost all of its contents, then we could build a Wayback Machine for Vuze that would let us decrypt VDOs that were supposedly expired, thereby defeating Vanish's security guarantees.
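The "Wayback Machine" idea is easy to see in a toy model. The sketch below is not our attack code and not the Vuze protocol: `ToyDHT` and `Archive` are hypothetical classes, and the observer here is simply assumed to see everything in the store, whereas a real attacker has to work to observe a large fraction of a million-node DHT.

```python
TTL_SECONDS = 8 * 60 * 60   # items are dropped after eight hours

class ToyDHT:
    """A single-process stand-in for a DHT that expires its items."""
    def __init__(self):
        self._items = {}    # key -> (value, expiry time)

    def put(self, key, value, now):
        self._items[key] = (value, now + TTL_SECONDS)

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None or now >= entry[1]:
            return None     # never stored, or expired
        return entry[0]

    def snapshot(self, now):
        """Everything currently live (assumed visible to the observer)."""
        return {k: v for k, (v, exp) in self._items.items() if now < exp}

class Archive:
    """An observer that records what it sees and never forgets."""
    def __init__(self):
        self._seen = {}

    def crawl(self, dht, now):
        self._seen.update(dht.snapshot(now))

    def get(self, key):
        return self._seen.get(key)

dht = ToyDHT()
archive = Archive()
dht.put("share-1", b"key share bytes", now=0)
archive.crawl(dht, now=3600)             # observer passes by after an hour
print(dht.get("share-1", now=9 * 3600))  # expired in the DHT
print(archive.get("share-1"))            # still in the archive
```

The DHT keeps its promise to forget, but that promise says nothing about anyone who was listening while the data was live. If the archive captures enough shares of a key, the expiry buys nothing.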

Scott's experiments progressed rapidly, and by early August we were pretty sure that we were close to demonstrating a break of Vanish. The Vanish authors were due to present their work in a few days, at the Usenix Security conference in Montreal, and we hoped to demonstrate a break by then. The question was whether Scott's already heroic sleep-deprived experimental odyssey would reach its destination in time.

We didn't want to ambush the Vanish authors with our break, so we took them aside at the conference and told them about our preliminary results. This led to some interesting discussions with the Vanish team about technical details of Vuze and Vanish, and about some alternative designs for Vuze and Vanish that might better resist attacks. We agreed to keep them up to date on any new results, so they could address the issue in their talk.

As it turned out, we didn't establish a break before the Vanish team's conference presentation, so they did not have to modify their presentation much, and Scott finally got to catch up on his sleep. Later, we realized that evidence to establish a break had actually been in our experimental logs before the Vanish talk, but we hadn't been clever enough to spot it at the time. Science is hard.

Some time later, I ran into my ex-student Brent Waters, who is now on the faculty at the University of Texas. I mentioned to Brent that Scott, Alex and I had been studying attacks on Vanish and we thought we were pretty close to making an attack work. Amazingly, Brent and some Texas colleagues (Owen Hofmann, Christopher Rossbach, and Emmett Witchel) had also been studying Vanish and had independently devised attacks that were pretty similar to what Scott, Alex, and I had.

We decided that it made sense to join up with the Texas team, work together on finishing and testing the attacks, and then write a joint paper. Nadia Heninger at Princeton did some valuable modeling to help us understand our experimental results, so we added her to the team.

Today we are releasing our joint paper. It describes our attacks and demonstrates that the attacks do indeed defeat Vanish. We have a working system that can decrypt Vanishing data objects (made with the original version of Vanish) after they are supposedly unrecoverable.

Our paper also discusses what went wrong in the original Vanish design. The people who designed Vanish are smart and experienced, but they obviously made some kind of mistake in their original work that led them to believe that Vanish was secure -- a belief that we now know is incorrect. Our paper talks about where we think the Vanish authors went wrong, and what security practitioners can learn from the Vanish experience so far.

Meanwhile, the Vanish authors went back to the drawing board and came up with a bunch of improvements to Vanish and Vuze that make our attacks much more expensive. They wrote their own paper about their experience with Vanish and their new modifications to it.

Where does this leave us?

For now, Vanish should be considered too risky to rely on. The standard for security is not "no currently demonstrated attacks", it is "strong evidence that the system resists all reasonable attacks". By updating Vanish to resist our attacks, the Vanish authors showed that their system is not a dead letter. But in my view they are still some distance from showing that Vanish is secure. Given the complexity of underlying technologies such as Vuze, I wouldn't be surprised if more attacks turn out to be possible. The latest version of Vanish might turn out to be sound, or to be unsound, or the whole approach might turn out to be flawed. It's too early to tell.

Vanish is an interesting approach to a real problem. Whether this approach will turn out to work is still an open question. It's good to explore this question -- and I'm glad that the Vanish authors and others are doing so. At this point, Vanish is of real scientific interest, but I wouldn't rely on it to secure my data.

[Update (Sept. 30, 2009): I rewrote the paragraphs describing our discussions with the Vanish team at the conference. The original version may have given the wrong impression about our intentions.]

Android Open Source Model Has a Short Circuit

Sat, 09/26/2009 - 3:10pm

Last year, Google entered the mobile phone market with a Linux-based mobile operating system. The company brought together device manufacturers and carriers in the Open Handset Alliance, explaining that, "Together we have developed Android™, the first complete, open, and free mobile platform." There has been considerable engagement from the open source developer community, as well as significant uptake from consumers. Android may have even been instrumental in motivating competing open platforms like LiMo. In addition to the underlying open source operating system, Google chose to package essential (but proprietary) applications with Android-based handsets. These applications include most of the things that make the handsets useful (including basic functions to sync with the data network). This two-tier system of rights has created a minor controversy.

A group of smart open source developers created a modified version of the Android+Apps package, called Cyanogen. It incorporated many useful and performance-enhancing updates to the Android OS, and included unchanged versions of the proprietary Apps. If Cyanogen hadn't included the Apps, the package would have been essentially useless, given that Google doesn't appear to provide a means to install the Apps on a device that has only a basic OS. As Cyanogen gained popularity, Google decided that it could no longer watch the project distribute their copyright-protected works. The lawyers at Google decided that they needed to send a Cease & Desist letter to the Cyanogen developer, which caused him to take the files off of his site and spurred backlash from the developer community.

Android represents a careful balance on the part of Google, in which the company seeks to foster open platforms but maintain control over its proprietary (but free) services. Google has stated as much, in response to the current debate. Android is an exciting alternative to the largely closed-source model that has dominated the mobile market to date. Google closely integrated their Apps with the operating system in a way that makes for a tremendously useful platform, but in doing so hampered the ability of third-party developers to fully contribute to the system. Perhaps the problem is simply that they did not choose the right location to draw the line between open vs. closed source -- or free-to-distribute vs. not.

The latter distinction might offer a way out of the conundrum. Google could certainly grant blanket rights to third-parties to redistribute unchanged versions of their Apps. This might compromise their ability to make certain business arrangements with carriers or handset providers in which they package the software for a fee. That may or may not be worth it from their business perspective, but they could have trouble making the claim that Android is a "complete, open, and free mobile platform" if they don't find a way to make it work for developers.

This all takes place in the context of a larger debate over the extent to which mobile platforms should be open -- voluntarily or via regulatory mandate. Google and Apple have been arguing via letters to the FCC about whether or not Apple should allow the Google Voice application in the iPhone App Store. However, it is yet to be determined whether the Commission has the jurisdiction and political will to do anything about the issue. There is a fascinating sideshow in that particular dispute, in which AT&T has made the very novel claim that Google Voice violates network neutrality (well, either that or common carriage -- they'll take whichever argument they can win). Google has replied. This is a topic for another day, but suffice to say the clear regulatory distinctions between telephone networks, broadband, and devices have become muddied.

(Cross-posted to Managing Miracles)

The Markey Net Neutrality Bill: Least Restrictive Network Management?

Fri, 09/25/2009 - 10:38am

It's an exciting time in the net neutrality debate. FCC Chairman Julius Genachowski's speech on Monday promised a new FCC proceeding that will aim to create a formal rule to replace the Commission's existing policy statement.

Meanwhile, net neutrality advocates in Congress are pondering new legislation for two reasons. First, there is a debate about whether the FCC currently has enough authority to enforce a net neutrality rule. Second, regardless of whether the Commission has such authority today, some would rather see net neutrality rules etched into statute than leave them to the uncertainties of the rulemaking process under this and future Commissions.

One legislative proposal comes from Rep. Ed Markey and colleagues. Called the Internet Freedom Preservation Act of 2009, its current draft is available on the Free Press web site.

I favor the broad goals that motivate this bill -- an Internet that remains friendly to innovation and broadly available. But I personally believe the current draft of this bill would be a mistake, because it embodies a very optimistic view of the FCC's ability to wield regulatory authority and avoid regulatory capture, not only under the current administration but also over the long-run future. It puts a huge amount of statutory weight behind the vague-till-now idea of "reasonable network management" -- something that the FCC's policy statement (and many participants in the debate) have said ISPs should be permitted to do, but whose meaning remains unsettled. Indeed, Ed raised questions back in 2006 about just how hard it might be to decide what this phrase should mean.

The section of the Markey bill that would be labeled as section 12 (d) in statute says that a network management practice

. . . is a reasonable practice only if it furthers a critically important interest, is narrowly tailored to further that interest, and is the means of furthering that interest that is the least restrictive, least discriminatory, and least constricting of consumer choice available.

This language -- particularly the trio of "leasts" -- puts the FCC in a position to intervene if, in the Commission's judgment, any alternative course of action would have been better for consumers than the one an ISP actually took. Normally, to call something "reasonable" means that it is within the broad range of possibilities that might make sense to an imagined "reasonable person." This bill's definition of "reasonable" is very different, since on its terms there is no scope for discretion within reasonableness -- the single best option is the only one deemed reasonable by the statute.

The bill's language may sound familiar -- it is a modified form of the judicial "strict scrutiny" standard the courts use to review government action when the state uses a suspect classification (such as race) or burdens a fundamental right (such as free speech in certain contexts). In those cases, the question is whether or not a "compelling governmental interest" justifies the policy under review. Here, however, it's not totally clear whose interest, in what, must be compelling in order for a given network management practice to count as reasonable. We are discussing the actions of ISPs, which are generally public companies -- do their interests in profit maximization count as compelling? Shareholders certainly think so. What about their interests in R&D? Or does the statute mean to single out the public's interest in the general goods outlined in section 12 (a), such as "protect[ing] the open and interconnected nature of broadband networks"?

I fear the bill would spur a food fight among ISPs, each of whom could complain about what the others were doing. Such a battle would raise the probability that those ISPs with the most effective lobbying shops will prevail over those with the most attractive offerings for consumers, if and when the two diverge.

Why use the phrase "reasonable network management" to describe this exacting standard? I think the most likely answer is simply that many participants in the net neutrality debate use the phrase as a shorthand term for whatever should be allowed -- so that "reasonable" turns out to mean "permitted."

There is also an interesting secondary conversation to be had here about whether it's smart to bar in statute, as the Markey bill would, ". . . any offering that . . . prioritizes traffic over that of other such providers," which could be read to bar evenhanded offers of prioritized packet routing to any customer who wants to pay a premium, something many net neutrality advocates (including, e.g., Prof. Lessig) have said they think is fine.

My bottom line is that we ought to speak clearly. It might or might not make sense to let the FCC intervene whenever it finds ISPs' network management to be less than perfect (I think it would not, but recognize the question is debatable). But whatever its merits, a standard like that -- removing ISP discretion -- deserves a name of its own. Perhaps "least restrictive network management"?

Cross-posted at the Yale ISP Blog.

Netflix's Impending (But Still Avoidable) Multi-Million Dollar Privacy Blunder

Mon, 09/21/2009 - 6:37pm

In my last post, I promised to say more about my article on the limits of anonymization and the power of reidentification. Although I haven't said anything for a few weeks, others have, and I especially appreciate posts by Susannah Fox, Seth Schoen, and Nate Anderson. Not only have these people summarized my article well, they have also added a lot of insightful commentary, and I commend these three posts to you.

Today brings news relating to one of the central examples in my paper: Netflix has announced plans to commit a privacy blunder that could cost it millions of dollars in fines and civil damages.

In my article, I focus on Netflix's 2006 decision to release millions of records containing the movie rating preferences of "anonymized" users to the public, in order to fuel a crowd-sourcing competition called the Netflix Prize. The Netflix Prize has been a huge win for Netflix's public relations, but it has also been a win for academics, who have used the data to improve the science of guessing human behavior from past preferences.

The Netflix Prize was also a watershed event for reidentification research because Arvind Narayanan and Vitaly Shmatikov of U. Texas revealed that they could reidentify some of the "anonymized" users with ease, proving that we are more uniquely tied to our movie rating preferences than intuition would suggest. In my paper, I argue that we should worry about this privacy breach even if we don't think movie ratings are terribly sensitive, because it can be used to enable other, more terrifying privacy breaches.

I never argue, however, that Netflix deserves punishment or sanction for having released this data. In my opinion, Netflix acted pretty responsibly. It consulted with computer scientists in a (failed) attempt to anonymize successfully. It tried perturbing the data in order to make reidentification harder. And other experts seem to have been surprised by how easy it was for Narayanan and Shmatikov to reidentify. Even with the benefit of hindsight, I find nothing to blame in how Netflix handled the privacy implications of what it did.

Although I give Netflix a pass for its past privacy breach, I am astonished to learn from the New York Times that the company plans a second act:

The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.

Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87%, according to Latanya Sweeney's famous study). True, Netflix plans to release age, not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of "information entropy": even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.
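To make the "simple arithmetic" concrete, here is a rough back-of-the-envelope sketch. Every number in it is an illustrative assumption rather than a measured figure: a hypothetical ZIP code of 30,000 residents, two gender values, and ages spread evenly over 80 years. Real populations are not uniform, so actual groups will be larger for common ages and much smaller for rare ones.

```python
import math

def anonymity_set_size(zip_population, genders=2, age_years=80):
    """Expected number of people sharing one (ZIP, gender, age) combination,
    assuming residents are spread evenly across genders and ages."""
    return zip_population / (genders * age_years)

def bits_revealed(pool_before, pool_after):
    """Bits of identifying information gained by narrowing a candidate pool."""
    return math.log2(pool_before / pool_after)

# Hypothetical ZIP code population (an assumption for illustration only).
zip_population = 30_000

group = anonymity_set_size(zip_population)
print(group)                                            # 187.5
print(round(bits_revealed(zip_population, group), 2))   # 7.32
```

Under these assumptions, knowing gender and age on top of ZIP code shrinks the pool from 30,000 people to fewer than 200, which is the "few hundred" scale described above. Each additional attribute that can be matched against an outside database shrinks the pool further.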

I have no doubt that researchers will be able to use the techniques of Narayanan and Shmatikov, together with databases revealing sex, ZIP code, and age, to tie many people directly to these supposedly anonymized new records.

Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710, prohibits a "video tape service provider" (a broadly defined term) from revealing "personally identifiable information" about its customers. Aggrieved customers can sue providers under the VPPA and courts can order "not less than $2500" in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.
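A quick sketch of how the per-violation floor scales to "millions": the class sizes below are hypothetical, and the calculation assumes one violation per class member. Whether a court would actually find a violation, certify a class, or count violations this way is a legal question that this arithmetic does not answer.

```python
# Dollars per violation, taken from the "not less than $2500" floor quoted above.
STATUTORY_MINIMUM = 2_500

def minimum_exposure(class_size, violations_per_member=1):
    """Lowest total damages if every class member prevails at the statutory floor."""
    return class_size * violations_per_member * STATUTORY_MINIMUM

def class_size_for(target_dollars):
    """Smallest class size whose minimum exposure reaches target_dollars
    (one violation per member), using ceiling division."""
    return -(-target_dollars // STATUTORY_MINIMUM)

print(minimum_exposure(1_000))     # 2500000
print(class_size_for(1_000_000))   # 400
```

In other words, under these assumptions a class of only a few hundred reidentified customers would already put the minimum exposure at seven figures.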

The FTC might also decide to fine Netflix for violating its privacy policy as an unfair business practice.

Either a lawsuit under the VPPA or an FTC investigation would turn, in large part, on one sentence in Netflix's privacy policy: "We may also disclose and otherwise use, on an anonymous basis, movie ratings, consumption habits, commentary, reviews and other non-personal information about customers." If sued or investigated, Netflix will surely argue that its acts are immunized by the policy, because the data is disclosed "on an anonymous basis." While this argument might have carried the day in 2006, before Narayanan and Shmatikov conducted their study, the argument is much weaker in 2009, now that Netflix has many reasons to know better, including, in part, my paper and the publicity surrounding it. A weak argument is made even weaker if Netflix includes the kind of data -- ZIP code, age, and gender -- that we have known for over a decade fails to anonymize.

The good news is Netflix has time to avoid this multi-million dollar privacy blunder. As far as I can tell, the Netflix Prize 2 has not yet been launched.

Dear Netflix executives: Don't do this to your customers, and don't do this to your shareholders. Cancel the Netflix Prize 2, while you still have the chance.