xiven.com stating the blatantly obvious since 2002

User Authentication on the World Wide Web

[19:48:04] <draq> Xiven: can authentication only be done with cookies
[19:48:15] <draq> I have decided to disable cookies
⋮
[19:55:30] <draq> why not write a blog entry with the pros and cons ? :)
[19:55:36] * draq nudges Xiven 
[19:56:02] <Xiven> guess that'd give me something to write about

Thus it begins:

Abstract

Many websites, when looking to provide a means for a user to “log in”, make use of cookies: small pieces of data stored on your computer by your web browser that can contain any information the website chooses to store.

Many people have heard of cookies; some people consider cookies to be a privacy concern; a few people disable cookies altogether because of these concerns. Are these concerns valid? What alternatives are there to the use of cookies for user authentication? What are the pros and cons of these methods? This short article will attempt to answer these questions.

This article is not intended to contain an in-depth discussion of cookies themselves. If you are looking for more information, I strongly recommend The Unofficial Cookie FAQ, which is an excellent resource on this subject.

This article will on occasion lean towards Apache HTTP Server and the PHP web application language. Similar conclusions can often be drawn for other technologies, but this may not always be the case.

Baking a cookie

What is a cookie? In this context, a cookie is a small piece of information that is stored on your computer and is linked to a particular website. Every time you request a page from that website, your browser sends any cookies it has for that site along with its request. By default, your browser stores any cookie that a website tells it to store.
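
To make the round trip concrete, here is a minimal sketch using Python's standard http.cookies module. The cookie names and values ("theme", "lang") are made up for illustration; the first function builds what a server would put in a Set-Cookie response header, and the second parses what the browser sends back in its Cookie request header.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name, value):
    """Build the value of a Set-Cookie response header for one cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"  # send the cookie back for every page on the site
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Parse the Cookie request header the browser sends with each request."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


# Hypothetical example values
print(make_set_cookie_header("theme", "dark"))
print(parse_cookie_header("theme=dark; lang=en"))
```

Note that the server only ever gets back what it asked the browser to store in the first place.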

A cookie is not magical: a website cannot store information about you in a cookie that you do not give it. For example, it cannot store your e-mail address unless you have given your e-mail address to that website in the first place.

Big Brother

The main privacy concern about cookies is that they can be used to keep track of a person's browsing habits. Although a website can only ever read a cookie that it itself has set, if a website contains an inline item (such as an image) that comes from a different server, then that server can set a cookie too. Therefore, if a banner advert site (such as oooh I don't know... ad.doubleclick.net) has adverts placed on various sites, then it can keep track of all of those sites that you visit. These are known as “3rd party” cookies, and most modern browsers allow their users to disable these altogether (though this can break some sites).

These concerns aside though, there are those who dislike the fact that a website can store any information it chooses in a cookie.

[16:27:32] <draq> It's like, who are these guys?
[16:27:39] <draq> Why is my site sending them?
[16:27:44] <draq> Why are there so many?

Alternatives

The problem that cookies were introduced to overcome is that the HyperText Transfer Protocol (HTTP) is a stateless protocol. That is, each time you request a page from a web server, it is as if you have never requested a page from that site before. The site cannot remember the fact that it was you who asked for a page from it 10 seconds ago. It has no way of associating a request with a particular unique individual.

In essence, the server “forgets” everything after each request, unless it can somehow mark a visitor (that is, hand him a “laundry ticket”) to help it remember. Cookies can accomplish this.
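
The “laundry ticket” idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the SessionStore class and its method names are hypothetical, and sessions live in memory with no expiry. The server hands out a random ticket once and later uses it to look up who is asking.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory store mapping session IDs to session data."""

    def __init__(self):
        self._sessions = {}

    def issue_ticket(self, username):
        """Create a session and return its hard-to-guess identifier."""
        session_id = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
        self._sessions[session_id] = {"username": username}
        return session_id

    def lookup(self, session_id):
        """Return the session data for a ticket, or None if it is unknown."""
        return self._sessions.get(session_id)


store = SessionStore()
ticket = store.issue_ticket("draq")
print(store.lookup(ticket))      # the server "remembers" the visitor
print(store.lookup("no-such"))   # an unknown ticket means an anonymous visitor
```

How the ticket travels back to the server on each request (cookie, URI, form field) is exactly what the methods below differ on.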

So, are there any alternatives to cookies for the purpose of “Session Identification”?

  1. HTTP Authentication

    As my earlier work on implementing HTTP Digest Authentication in PHP has shown me, there is another way for a website to associate a series of requests with a particular user.

    HTTP Authentication is detailed in RFC 2617. There are 2 kinds, “Basic” and “Digest”. The difference is that Basic sends the username and password across the net in what amounts to plain text (merely Base64-encoded), whereas Digest uses a one-way hash function (MD5 in RFC 2617) to prove that the client knows the password without actually sending the password at all (in fact the server doesn't even need to store the password itself, only a hash derived from it). Optionally, the server can prove in the same way that it knows the secret too.

  2. URI Rewriting

    The popular web application programming language “PHP” has a feature called “Transparent Session ID”, which it can use to keep sessions working when cookies are disabled. It works by rewriting the webpage to append the Session ID to any local URIs.

    For example, the link <a href="foo/bar.php"> might become <a href="foo/bar.php?PHPSESSID=45af781b75d45ce49761f0ce">

  3. Form posting

    I include this method because it's what Voidwars uses. This is not an endorsement of this method though.

    Every time you click a “link” in the page, instead of loading a page directly, the page fills in some parameters in a form and then submits this form to the server. The form also contains the user's session identifier. Essentially this is a POST version of URI Rewriting. It typically requires JavaScript to be enabled (though you could also achieve the same effect by making each link a “submit” <button>).

So, different methods exist. The question is, do any of them actually alleviate the privacy concerns of certain individuals, and do they have other advantages or disadvantages compared to cookies?

Analysis

Cookies

Ease of implementation

Fairly easy to use: most web application languages have built-in support for cookies. Some even have automatic “Session” support, making things even easier. The site will need a login form somewhere for the user to fill in their data.

User experience

The site author can fit the login form neatly into their site's layout. Users normally expect login forms of this kind.

Privacy

Many dislike cookies in general due to privacy concerns with their other uses (eg. tracking browsing habits).

Security

Unless on a secure (SSL) server, cookies are sent and stored in plain text. It is entirely up to the author of the website to encrypt sensitive data.

Other issues of note

Some privacy concerns can be alleviated by disabling 3rd party cookies or allowing only session cookies (session cookies are deleted once you close the browser window). Many browsers allow sites to be added to a “cookie blacklist”, preventing such sites from ever storing a cookie. Additionally the user can choose to make the browser prompt them every time a site wishes to store a cookie (though with the number of sites storing cookies this can be exceptionally annoying).

HTTP Authentication

Ease of implementation

In its simplest form it is fairly easy to use. With Apache HTTP Server the site author can create a file that contains a list of users that are allowed into a certain directory. Apache then deals with it itself using Basic Authentication. Additionally, with the module mod_digest, Digest Authentication can be used in the same way.

Integrating such authentication with web applications is a little trickier though. Basic Authentication is relatively easy, but Digest Authentication is pretty hard because Apache hides the HTTP headers that are needed from the CGI handler. There is a way around this in PHP, but it only works if PHP is run as an Apache module.

User experience

Best illustrated with an example:

An example of an HTTP authentication dialog box: a small window appears asking the user to “Enter username and password for "example application" at http://www.example.com” with a username and password field to fill in and “OK” and “Cancel” buttons.

The above dialog appears in the middle of the screen preventing the user from proceeding until they deal with it, either by filling in their username and password and clicking “OK” or by clicking “Cancel”. The user can also choose to save the password for future access to the site.

Privacy

The user must always fill in this dialog box before proceeding, so privacy concerns with the technology itself are minimal.

Security

Basic Authentication is highly insecure unless it is used on a secure server (SSL). Passwords are sent in plain text.
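
To see just how little protection Basic offers, here is a small Python sketch (the function names are mine, chosen for illustration). The Authorization header is simply “username:password” run through Base64, which anyone who sees the request can reverse.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value a browser sends for Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header_value):
    """Recover the username and password from a Basic Authorization header."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# Credentials from the example in RFC 2617
header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))  # no key or secret needed to read it back
```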

Digest Authentication is much more secure and is therefore best used when security of the password is considered important. Note that it is not the world's answer to web security: the only thing it protects is the password; all other data is open to view. It is also potentially vulnerable to “man in the middle” attacks.
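
For the curious, the Digest calculation itself is short. The sketch below follows my reading of RFC 2617 for the common case (MD5 with qop="auth"); the function names are my own, and it leaves out everything else a real implementation needs, such as nonce generation, header parsing and the other qop modes. The sample values are the worked example from the RFC.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' value for Digest auth (MD5, qop=auth only)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # the server can store this instead of the password
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Values taken from the example in RFC 2617
print(digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
))
```

The password only ever goes in at the first step, so what crosses the network is a hash that is useless for any other nonce. The page contents themselves travel unprotected.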

Other issues of note

Certain browsers deal with HTTP Authentication slightly differently (dialogs look different etc.). Internet Explorer has a number of bugs in its Digest Authentication implementation.

URI Rewriting

Ease of implementation

Certain web application languages do it for you transparently (eg. PHP, which calls it “Transparent Session ID”). Otherwise though, it's quite a lot of work to implement.
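
As a rough idea of what the rewriting involves, here is a Python sketch for a single URI. It is not how PHP does it internally; the function name is hypothetical, and a real implementation would also have to find every link and form in the page. The one rule worth noting is that only local URIs get the session ID appended.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(uri, session_id, param="PHPSESSID"):
    """Append the session ID to a local URI; leave external URIs untouched."""
    parts = urlsplit(uri)
    if parts.scheme or parts.netloc:
        return uri  # external link: never hand the session ID to another site
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_session_id("foo/bar.php", "45af781b75d45ce49761f0ce"))
print(add_session_id("list.php?page=2", "45af781b75d45ce49761f0ce"))
print(add_session_id("http://www.example.com/", "45af781b75d45ce49761f0ce"))
```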

User experience

The user may notice that the page they are visiting has a weird-looking address, but other than that is unlikely to notice anything different from typical cookie logins.

Privacy

The main privacy concerns are related to its insecurity (see below).

Security

There are major security concerns with this method. Sites often keep track of referrers (ie. the website you came from to reach their site). If you follow a link from a site using this method to a site that tracks referrers, the session identifier will appear in their logs. This could then easily be used to hijack the session. Additionally, a user can inadvertently allow others to access their session if they post the address as a link on another site.
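
To show how little effort the referrer leak takes to exploit, here is a Python sketch that scans a list of referrer strings, as they might appear in another site's logs, for session IDs. The URLs and the function name are invented for the example, and PHPSESSID is assumed as the parameter name.

```python
from urllib.parse import parse_qs, urlsplit


def session_ids_in_referers(referers, param="PHPSESSID"):
    """Collect any session IDs that appear in a list of Referer header values."""
    leaked = []
    for referer in referers:
        query = urlsplit(referer).query
        leaked.extend(parse_qs(query).get(param, []))
    return leaked


# Hypothetical referrer values from a third-party site's access log
log = [
    "http://www.example.com/foo/bar.php?PHPSESSID=45af781b75d45ce49761f0ce",
    "http://www.example.org/index.html",
]
print(session_ids_in_referers(log))
```

Anyone who can read those logs can paste the identifier into their own address bar while the session is still alive.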

Other issues of note

URI Rewriting can quite easily break the validity of a site's markup if, for example, the website author is using a host who has the feature enabled by default.

Form Posting

Ease of implementation

Very hard to implement.

User experience

Creates a very non-standard experience for the user (this is a bad thing). With Voidwars, we only use it because the game itself is hardly a standard use of HTML. We really shouldn't be using HTML for it at all, but there is currently no standardised language that is truly suited for the purpose of running a multiplayer online game.

Privacy

Could potentially suffer from the same privacy concerns as cookies, but without the user having a way of controlling it.

Security

Same security issues as cookies.

Other issues of note

Don't do it. Ever.

Other methods

[21:09:28] <draq> Xiven: are you writing still? can't you "log" without cookies? couldn't apache do with some patches to track sessions more appropriately through UA string IP and stuff
[21:10:11] <Xiven> are you insane?

Put simply, it won't work. User agent strings aren't unique and neither are IP addresses. Also, rotating proxies mean that successive requests from the same user could each come from a different IP address.
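
A tiny Python sketch shows both failure modes. The IP addresses are from the ranges reserved for documentation and the user agent string is just a plausible example; none of it is real data.

```python
def naive_visitor_key(ip_address, user_agent):
    """Identify a visitor by IP address and User-Agent string alone."""
    return (ip_address, user_agent)


# Example browser string, assumed for illustration
UA = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040707 Firefox/0.9.2"

# Two different people behind the same office proxy, using the same browser build
alice = naive_visitor_key("192.0.2.10", UA)
bob = naive_visitor_key("192.0.2.10", UA)

# One person whose requests pass through rotating proxies
carol_first = naive_visitor_key("198.51.100.7", UA)
carol_second = naive_visitor_key("198.51.100.8", UA)

print(alice == bob)                 # True: two users are mistaken for one
print(carol_first == carol_second)  # False: one user looks like two
```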

Summary

Method             | Pros                                                                      | Cons
-------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------------------
Cookies            | Fairly easy to implement, allows customised login page                    | Some users disable them due to privacy concerns, poor security unless combined with SSL
HTTP Auth (Basic)  | Works when cookies are disabled, easy to implement, fewer privacy issues  | Unfriendly user interface, poor security unless combined with SSL
HTTP Auth (Digest) | Works when cookies are disabled, improved security, fewer privacy issues  | Unfriendly user interface, hard to implement
URI Rewriting      | Works when cookies are disabled, fairly easy to implement                 | Highly insecure, rewriting can potentially break website validity
Form posting       | Works when cookies are disabled                                           | Very tricky to implement, poor security unless combined with SSL

Conclusions for the user

Cookies are used by 99.9% of sites out there (and 42.7% of all statistics are made up on the spot). If you disable them you severely limit your ability to use a large number of sites (including online shopping sites etc.). Disabling 3rd-party cookies is probably a sensible measure if you are concerned about the privacy issues, as is the use of cookie blacklists and/or whitelists or allowing only session cookies.

Conclusions for web authors

Despite the controversy, cookies are probably the best option for most web authors due to good and relatively consistent support by user agents, as well as by web application languages. If, however, you just need to password-protect a directory from the general public, HTTP Authentication may be more suitable for you (Basic or Digest, the choice is yours). If you're looking for a challenge, by all means go ahead and implement Digest Authentication in the web application language of your choice.

Edited on 2004-07-24 19:02 UTC: added mention of session cookies (thanks for the reminder Kamakaze)

Edited on 2004-08-01 23:27 UTC: see comments for further important considerations

Posted: 2004-07-24 18:16:32 UTC by Xiven

Comments

  • Kai Hendry (2004-07-25 09:13:19 UTC)

    I have noticed at least in Mozilla since I have been careful about cookies, there is no option to deny all. When I visit a news site like the Guardian it is HELL. :(

    Couldn't a UA give a unique identifier? Based on the profile? Like Mozilla's SALT. Be imaginative. I am sure it could be implemented or specced.

  • Henri Sivonen (2004-08-01 17:18:29 UTC)

    The main reason for not using HTTP-level authentication is the lack of logout in browsers. Many webmasters are worried that people who use their account from a kiosk leave their credentials for the next guy to use.

  • Xiven (Registered) (2004-08-01 23:23:23 UTC)

    Ah yes you make a good point, I forgot about that too. The lack of a logout feature is a major flaw IMHO and is something that really should have been addressed in the HTTP spec. Example: an "HTTP/1.1 418 Discontinue Authentication" code.

  • Isaac Schlueter (2004-08-02 02:06:24 UTC)

    As is often the case, I think the best solution is in fact a combination of solutions.

    I am building an application which provides clients with access to sensitive data. Here's what we're doing:
    0. The site is accessed via https://... (SSL)
    1. Password protect the directory with HTTP authentication (.htpasswd). We have one username/password for each client, and each user at the client's office uses the same pair.
    2. The application then requires the user to log in with their unique Username, Client ID, and Password. (Each client ID can only access data with a matching client ID, and this is checked against the clientID used to get through the first gate.)
    3. The Username, Client ID, and MD5'ed password are stored as domain- and directory-specific session cookies.
    4. Every time there's a successful login, the IP Address, UA string, the page visited, and time are recorded to a database. (Only once in a given 3-hour period, or else it's ridiculous.)
    5. The logs are analyzed by a human (the last and greatest anti-hacking system) who runs reports looking for irregularities. These reports can be accessed by the client. We're working on something that will analyze the logs on a cron-job, and send out alerts when it finds "strange" things. (IP Addresses not matching the norm, strange times of day, etc.)

    We were going to use Sessions, but they are difficult to implement in a clean user-oriented fashion, and they wouldn't add any security. (In fact, they would reduce the security level. And, IMNSHO, php's "Transparent Sessions" are anything but transparent, especially if you're doing anything at all interesting with the URIs.)

    All of this really isn't *that* tricky to do. It just takes some careful planning. A combination of methods, along with a firm "You need cookies to access this site" mentality, can create a very secure environment.

  • Kamakaze (2004-08-03 14:52:06 UTC)

    Well, you can turn off PHP's transient ID thing for sessions and simply use cookies. You could implement session IDs and set cookies manually, but I fail to see the point when there is session support.

    Disabling non-session cookies means most of the concerns about them with regards to pages tracking who you are are gone, and if you don't use SSL then your site isn't designed to accept sensitive information. So long as people don't use the same passwords for random messageboards as they do for their bank account there are no serious problems (unfortunately I expect people do =)

  • Anonymous (2005-11-26 20:14:45 UTC)

    could someone tell me how to log into the msn web account using http digest authentication via the www web.

  • Sivakumar (2006-07-14 12:56:42 UTC)

    Hi,

    Useful information. I wish one of the readers here, or the author himself, could help me with a quick solution for the issue below. I have some pages on my site which are restricted with htaccess, so any user who tries to access them will be asked to enter a username/password. At the same time, I have a partner website which already protects some pages with htaccess or another type of authentication. When they cross-link to the pages on my site which are protected with htaccess, the dialog box pops up once again for my client's site visitors. I want a solution for selectively skipping the htaccess process for such requests. Since my client's website visitors come from different IP addresses, I'm not able to allow them by IP. I'm stuck with the HTTP_REFERER variable as well.

    Siva