CS 261 Homework 2

Instructions

This problem set is due Friday, November 2, at 2pm.

Work on your own for this homework. You may use any source you like (including other papers or textbooks), but if you use any source not discussed in class, you must cite it.

Question 1

Someone recently pointed out to me that Facebook and a few other websites have a rather fascinating feature in their login process: if you inadvertently have caps-lock turned on when entering your password, it will still log you in. For instance, if my password is h0rsebattery, then the system will also log me in if I enter H0RSEBATTERY as my password (but not H0rSeBaTTerY). This got me wondering how they implemented this feature and whether this feature is a security risk.

If Facebook stored each user's password in the clear in their database, it would be easy for them to provide this feature. They could check a password entered into their website against both the stored password and a capslocked version of the stored password. However, storing passwords in the clear is a major security no-no. Fortunately, Facebook doesn't do this. (The conventional defense is to store a hash of the password instead of the password P itself. For example, a website might store the hash h(P), where h(.) is a cryptographic one-way hash function. When the user enters a password P', the system hashes it and then compares h(P') against the stored hash. However, with this conventional approach, if you have capslock on when you enter your password, the hashes won't match.)
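As a minimal sketch of the conventional check described above, the following C fragment hashes the entered password and compares it with the stored hash. The names toy_hash and check_password are hypothetical, and the hash is 64-bit FNV-1a, used here only as a stand-in for h(.); it is not a cryptographic hash and is not what any real site uses.

```c
#include <stdint.h>

/* Stand-in for h(.): 64-bit FNV-1a. ASSUMPTION for illustration only.
   This is NOT a cryptographic one-way hash. */
static uint64_t toy_hash(const char *p) {
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (; *p != '\0'; p++) {
        h ^= (unsigned char)*p;
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* Conventional check: hash the entered password P' and compare it
   against the stored hash h(P). Returns 1 on a match, 0 otherwise. */
static int check_password(uint64_t stored_hash, const char *entered) {
    return toy_hash(entered) == stored_hash;
}
```

With stored = toy_hash("h0rsebattery"), check_password accepts h0rsebattery but rejects H0RSEBATTERY, which is the mismatch the paragraph above describes.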

My next thought was that maybe Facebook has some Javascript running in your browser that tests whether the caps-lock is enabled and, if so, undoes the effect of the capslock key. However, it turns out that this is not the case.

It also occurred to me that maybe Facebook is looking at the username, and if the username is in all capitals, they un-capslock the password before checking it against the hashed version. However, a quick test confirmed that this is not what they are doing: I can still log on with my username in all lowercase but with the capslock enabled when I type my password.

So, this is a bit of a puzzler. It raises an obvious question: is it possible to implement this feature, without major loss of security, and subject to the above constraints (no client-side Javascript, no uppercase/lowercase checks on the username, no passwords stored in the clear in persistent storage anywhere)? More precisely, please answer two variations of this question, for two different scenarios:

On Windows and Linux, capslock reverses the case of all letters. Thus, clOwN becomes CLoWn if you type it with capslock enabled. Can you provide the feature described above for Windows/Linux users without major loss of security? If yes, explain how, and quantify how much security is lost compared to the traditional method of storing the password in hashed form. If no, explain why not.
On Macs, capslock upper-cases all letters. Thus, clOwN becomes CLOWN if you type it with capslock enabled. Can you provide the feature described above for Mac users without major loss of security? If yes, explain how, and quantify how much security is lost compared to the traditional method of storing the password in hashed form. If no, explain why not.
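To make the two capslock behaviors concrete, here is a small sketch of each transformation in C. The function names capslock_swap and capslock_upper are hypothetical; the caller is assumed to supply an output buffer at least as large as the input string including its terminator.

```c
#include <ctype.h>
#include <stddef.h>

/* Windows/Linux behavior as described above: reverse the case of every letter. */
void capslock_swap(const char *in, char *out) {
    size_t i;
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        if (islower(c))
            out[i] = (char)toupper(c);
        else if (isupper(c))
            out[i] = (char)tolower(c);
        else
            out[i] = (char)c;          /* digits and symbols are unchanged */
    }
    out[i] = '\0';
}

/* Mac behavior as described above: upper-case every letter. */
void capslock_upper(const char *in, char *out) {
    size_t i;
    for (i = 0; in[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    out[i] = '\0';
}
```

Applied to clOwN, capslock_swap produces CLoWn and capslock_upper produces CLOWN, matching the examples in the two scenarios.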

Question 2

Steve proposes the following modified procedure for authenticating users on the web, to make password guessing attacks more expensive. When the user creates an account, they specify a username and password, just like is ordinarily done today. The primary difference comes in the login procedure.

To log in, the user and the website engage in a protocol, which works via the following sequence of steps:

Step 1. The user enters their username into the page. This is immediately sent to the website.
Step 2. The website looks up the information about this user in the database. If this user exists and has at most 9 consecutive failed login attempts, and if at most 49 other users share the same password as this user, the web page shows a password entry box. If this user had 10 or more consecutive failed login attempts, or if there are 50 or more other users who share the same password as this user, or if this user does not exist (the username is invalid), the web page shows a password entry box and also displays a CAPTCHA and a textbox where the user is prompted to enter in the text shown in the CAPTCHA.
Step 3. The user enters in their password and, if prompted to do so, the solution to the CAPTCHA. This is sent to the website.
Step 4. The website checks the user's information for validity. If it is all valid, the website logs the user in, displays a welcome message, and sets the number of consecutive failed login attempts for this user to zero. If any of the submitted information is invalid, the site displays an error message, does not log the user in, and increments the number of consecutive failed login attempts for this user.

To support this protocol, the site stores a row in its database for each user, with the following information: the username, a hashed version of the user's password (with the same salt for all users), and a count of the number of consecutive failed login attempts (initialized to zero when the account is created or upon successful login).
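The server-side rules in Steps 2 and 4 can be sketched roughly as follows. The struct layout, field names, and function names are assumptions made for illustration; the handout specifies only what information is stored, not how.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory view of one database row. */
struct user_row {
    const char *username;
    unsigned long long password_hash;  /* salted hash; same salt for all users */
    unsigned failed_attempts;           /* consecutive failed login attempts */
};

/* Step 2: should the login page also show a CAPTCHA?
   row is NULL when the username does not exist.
   others_sharing counts OTHER users with the same password as this user. */
bool needs_captcha(const struct user_row *row, unsigned others_sharing) {
    if (row == NULL)
        return true;                    /* invalid username */
    if (row->failed_attempts >= 10)
        return true;                    /* 10 or more consecutive failures */
    if (others_sharing >= 50)
        return true;                    /* 50 or more other users share it */
    return false;
}

/* Step 4: update the failure counter after a login attempt. */
void record_login_result(struct user_row *row, bool all_valid) {
    if (all_valid)
        row->failed_attempts = 0;
    else
        row->failed_attempts++;
}
```

For example, a user with 9 consecutive failures whose password is shared by 49 others sees only the password box; one more failure, or one more sharer, adds the CAPTCHA.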

I want you to analyze the security of this protocol under two different scenarios, but first, let me give you some background assumptions.

First, you can assume that the attacker can crack CAPTCHAs at the cost of $0.002 per CAPTCHA solved. (There is an underground market that will solve CAPTCHAs for you, and $2 per 1000 CAPTCHAs solved appears to be approximately the going rate.)

Second, you can assume that the site has ten million users. You can assume that users choose their passwords as follows: 1% of users choose a random password with 10 bits of entropy (i.e., choose a password uniformly at random from a set of 1024 candidates), and 99% of users choose a random password with 20 bits of entropy (i.e., choose uniformly at random from a set of size 2²⁰). These sets are publicly known. (This model is based upon recent academic research; it is admittedly a massive simplification of user behavior, but appears to be roughly in the right ballpark.)
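The two background assumptions translate into a few constants and helper functions, sketched below in C. All identifiers are hypothetical; the numbers come directly from the assumptions above, and the code only restates them rather than suggesting any particular attack.

```c
#include <stdint.h>

/* Figures taken from the background assumptions above. */
#define TOTAL_USERS        10000000UL    /* ten million users */
#define WEAK_CANDIDATES    (1UL << 10)   /* 10 bits of entropy: 1024 passwords */
#define STRONG_CANDIDATES  (1UL << 20)   /* 20 bits of entropy */
#define USD_PER_CAPTCHA    0.002         /* $2 per 1000 CAPTCHAs solved */

/* 1% of users pick from the small set. */
unsigned long weak_users(void) {
    return TOTAL_USERS / 100;
}

/* The remaining 99% pick from the large set. */
unsigned long strong_users(void) {
    return TOTAL_USERS - weak_users();
}

/* Dollar cost of having n CAPTCHAs solved on the underground market. */
double captcha_cost_usd(uint64_t n) {
    return (double)n * USD_PER_CAPTCHA;
}
```

Under this model there are 100,000 users drawing from the 1024-password set and 9,900,000 drawing from the 2²⁰-password set, and solving 1000 CAPTCHAs costs $2.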

We can consider two different kinds of sites:

The site is a closed site: the attacker does not have the ability to create an account on the site.
The site is an open site: the attacker can create a new account on the site. (Creating a new account does not require solving a CAPTCHA, email verification, or any other limit; just sign up via a webform.)

We can also consider two different kinds of attack goals:

Targeted attack: the attacker knows a specific username, and wants to find the password for that user.
Untargeted attack: the attacker just wants to find the password for a single account on the site; the attacker doesn't care whose account he cracks.

In all scenarios, assume the list of all usernames is publicly visible (e.g., maybe this is a web forum and each post comes with the user's username, and the attacker can crawl all posts).

In each of the four situations below, describe the lowest-cost attack you can find, estimate the cost of your attack, and estimate about how many requests the attacker has to make to the website in your attack. Your attack needs to have at least a 50% chance of success. By cost, I mean the amount of money the attacker has to spend to solve the CAPTCHAs. (Assume the attacker is not willing to solve CAPTCHAs himself; he uses the underground market to solve all CAPTCHAs. Assume that bandwidth is free, so sending requests to the website does not cost anything.)

Targeted attack on a closed site.
Untargeted attack on a closed site.
Targeted attack on an open site.
Untargeted attack on an open site.

Question 3

I wrote a setuid-root program that requires you to enter a secret password before you can use the program's functionality. The code looks something like this:

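#include <stdio.h>
#include <stdlib.h>

void go(void);  /* the real functionality, defined elsewhere */

int matches(char *s, char *t) {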
    while (1) {
        if (*s != *t)
            return 0;
        if (*s == '\0')
            break;
        s++; t++;
    }
    return 1;
}

int main() {
    char pass[160], *rv, realpass[160];
    FILE *f;

    /* Read secret password from the file. */
    f = fopen("/etc/secretpassword", "r");
    if (!f)
        exit(1);
    rv = fgets(realpass, sizeof(realpass), f);
    fclose(f);
    if (!rv)
        exit(1);

    /* Prompt user for their password. */
    printf("Password: ");
    fflush(stdout);
    rv = fgets(pass, sizeof(pass), stdin);
    if (!rv)
        exit(1);
    printf("Thank you.\n");
    fflush(stdout);
    
    /* Check password. */
    if (!matches(pass, realpass)) {
        printf("Incorrect password.\n");
        fflush(stdout);
        exit(1);
    }

    /* Password was correct.  Invoke the real functionality... */
    go();
}

My friend Angela points out that a malicious user can invoke this program, enter a guess at the password, and time how long the matches() function takes, by measuring the time from when the program prints Thank you to when it prints Incorrect password. (Good catch, Angela!)
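To see what such a timing measurement could reveal, the sketch below mirrors the control flow of matches() but returns the number of loop iterations executed instead of a yes/no answer. The function name matches_iterations is hypothetical, and the sketch assumes the running time of matches() is roughly proportional to this count.

```c
#include <stddef.h>

/* Same loop structure as matches(), but counts how many iterations run
   before the loop exits (either on a mismatch or on the terminator). */
size_t matches_iterations(const char *s, const char *t) {
    size_t n = 0;
    while (1) {
        n++;
        if (*s != *t)
            break;          /* first mismatching position */
        if (*s == '\0')
            break;          /* both strings ended: full match */
        s++; t++;
    }
    return n;
}
```

For instance, comparing "abc" with "xyz" takes 1 iteration, "abc" with "abd" takes 3, and "abc" with "abc" takes 4 (three letters plus the terminator).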

Let's see if it's possible to use Angela's observation to work out the password. In particular, consider the following two scenarios. In both scenarios, the secret password contains a random sequence of lowercase letters, uppercase letters, and digits and is 20 characters long.

Suppose each iteration of the loop takes 10 nanoseconds. Assume you can measure elapsed time to within an accuracy of +/- 1 nanosecond. Describe an attack that recovers the secret password, using Angela's observation. Estimate about how many guesses your attack will need, on average, to recover the password. Explain how you got this estimate.
On a different machine, each iteration of the loop still takes 10 nanoseconds, but measurements of the elapsed time are noisier. In particular, during the ith execution of this program, the measured time Y_i (measured in nanoseconds) depends upon the number X_i of iterations of the loop, like this:
Y_i = 10 X_i + E_i
where E_i represents some random noise due to inaccuracies in the measurement. In particular, E_i is a random variable with a normal distribution with mean 0 and standard deviation 1000 nanoseconds. Assume E_i is random and independent of everything else, and is different on each execution of the program (it has the same distribution, but there is an independent random variable per execution of the program). You can only observe Y_i (you cannot observe X_i or E_i directly). Is it possible to recover the secret password on this machine? If yes, describe an attack that recovers the secret password and estimate about how many guesses your attack will need, on average, to recover the password. If no, explain why not.
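The noisy measurement model Y_i = 10 X_i + E_i can be simulated with a short C sketch like the one below. All function names are hypothetical. The noise is only approximately normal: it uses the sum of 12 uniform samples minus 6 (mean 0, variance 1), chosen here so the sketch needs nothing beyond rand() from the standard library.

```c
#include <stdlib.h>

/* Approximate standard normal sample: sum of 12 uniforms on [0,1) minus 6.
   ASSUMPTION: an approximation to the normal noise in the model above. */
static double approx_std_normal(void) {
    double s = 0.0;
    int i;
    for (i = 0; i < 12; i++)
        s += (double)rand() / ((double)RAND_MAX + 1.0);
    return s - 6.0;
}

/* One simulated measurement Y_i = 10 * X_i + E_i, in nanoseconds,
   where E_i has mean 0 and standard deviation 1000 ns. */
double simulate_measurement(unsigned iterations) {
    return 10.0 * (double)iterations + 1000.0 * approx_std_normal();
}

/* Average of n independent simulated measurements for the same X. */
double mean_measurement(unsigned iterations, unsigned long n) {
    double sum = 0.0;
    unsigned long i;
    for (i = 0; i < n; i++)
        sum += simulate_measurement(iterations);
    return sum / (double)n;
}
```

A single call to simulate_measurement is dominated by the noise term, while the average of n independent measurements has standard deviation 1000/sqrt(n) nanoseconds, which is a standard property of independent samples.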

Question 4

You've been hired to design a protocol for a garage-door opener. There are two endpoints: the remote fob (a small battery-powered gadget that you put in your car), and the garage-door lifter (a device that is installed inside your garage and can open your garage door when you prompt it to do so). The remote and the lifter communicate by radio. When the user presses a button on their remote, it sends a signal to the lifter to open the garage door, so you can drive your car in.

You've been asked to design a secure protocol. We don't want random strangers to find some way to spoof a user's remote and get into their (locked) garage.

Unfortunately, there are strict engineering constraints imposed by cost and power limitations. The remote can store a 128-bit crypto key K in persistent storage, and we can assume the same crypto key can be initialized into the lifter's persistent storage as well. You can store up to 32 bits of additional information in persistent storage in the remote if you want, but that's it: no more. The remote can perform symmetric-key cryptographic operations, but public-key cryptography would cost too much and use up too much battery power, so you are not allowed to use any public-key scheme. The remote has a radio transmitter, but no receiver. Also, using the radio eats up a lot of battery power, so when the user presses a button on the remote, your protocol should require the remote to send no more than 64 bits of information over the radio waves. When the user presses the button on the remote, the remote can do a little bit of computation and send up to 64 bits of information, but to save power, the rest of the time it will be powered down and cannot perform any computation or remember any state (beyond the 128-bit key K and whatever is stored in the 32 bits of persistent storage). The remote does not have access to a clock, GPS, or random-number generator.
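The storage and bandwidth budget above can be written down as C types, as in the following sketch. The type and field names are hypothetical; the sketch records only the stated size limits and does not describe any protocol.

```c
#include <stdint.h>

/* Everything the remote may keep in persistent storage. */
struct remote_storage {
    uint8_t  key[16];   /* 128-bit crypto key K, also stored in the lifter */
    uint32_t extra;     /* up to 32 bits of additional persistent state */
};

/* Everything the remote may transmit per button press: at most 64 bits. */
typedef uint64_t radio_message;

/* Compile-time checks that the types match the stated limits (C11). */
_Static_assert(sizeof(((struct remote_storage *)0)->key) * 8 == 128,
               "K must be 128 bits");
_Static_assert(sizeof(((struct remote_storage *)0)->extra) * 8 == 32,
               "extra state must be 32 bits");
_Static_assert(sizeof(radio_message) * 8 == 64,
               "radio message must be 64 bits");
```

Any design that needs more remote-side state than struct remote_storage holds, or a longer transmission than one radio_message, falls outside the constraints.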

The lifter is plugged into AC power all the time, and it is always listening for something sent over the radio. When it receives a message, it can perform any reasonable amount of computation and store any reasonable amount of information in persistent storage that it wishes.

Describe a protocol to solve this problem. Specify what happens when the user presses a button on the remote: (a) what computation is performed, and (b) what is sent over the air. Make sure that it's clear how long each value that you send over the radio is, in bits. Also, specify what the lifter does when it receives a message over the radio: (c) what computation is performed, and (d) how it decides whether to raise the garage door or not.
Analyze the security of your scheme.
Now let's add a twist. Unfortunately, the radio link is not 100% reliable. Occasionally, a bit of noise may cause a transmission to be lost. If our crypto protocol caused users to become permanently locked out of their garage and caused their remote to stop working from there on when this happens, they'd be really annoyed and they'd stop buying our products. Describe how to modify your protocol so that the user doesn't get permanently locked out of their garage if a transmission is lost or garbled.