Issue: 02.2004
The Truth about Sessions
Session Management Exposed
by Chris Shiflett
Nearly every PHP application uses sessions. This article takes a detailed look at implementing a secure session management mechanism with PHP. Following a fundamental introduction to the Web's underlying architecture, the challenge of maintaining state, and the basic operation and intent of cookies, I will step you through some simple and effective methods that can be used to increase the security and reliability of your stateful PHP applications.
It is a common misconception that PHP provides a certain level of security with its native session management features. On the contrary, PHP simply provides a convenient mechanism. It is up to the developer to provide the complete solution, and as you will see, there is no one solution that is best for everyone.

Statelessness

Hypertext Transfer Protocol (HTTP), the protocol that powers the Web, is a stateless protocol. This is because there is nothing within the protocol that requires the browser to identify itself during each request, and there is also no established connection between the browser and the Web server that persists from one page to the next. When a user visits a Web site, the user's browser sends an HTTP request to a Web server, which in turn sends an HTTP response in reply. This is the extent of the communication, and it represents a complete HTTP transaction.

Because the Web relies on HTTP for communication, maintaining state in a Web application can be particularly challenging for developers. Cookies are an extension of HTTP that were introduced to help provide stateful HTTP transactions, but privacy concerns have prompted many users to disable support for cookies. State information can be passed in the URL, but accidental disclosure of this information poses serious security risks. In fact, the very nature of maintaining state requires that the client identify itself, yet the security-conscious among us know that we should never trust information sent by the client.

Despite all of this, there are elegant solutions to the problem of maintaining state. There is no perfect solution, of course, nor is there any one solution that can satisfy everyone's needs. This article introduces some techniques that can reliably provide statefulness as well as defend against session-based attacks such as impersonation (session hijacking). Along the way, you will learn how cookies really work, what PHP sessions do, and what is required to hijack a session.

HTTP Overview

In order to appreciate the challenge of maintaining state as well as choose the best solution for your needs, it is important to understand a little bit about the underlying architecture of the Web - the Hypertext Transfer Protocol (HTTP). A visit to http://www.example.org/ requires the Web browser to send an HTTP request to www.example.org on port 80. The syntax of the request is similar to the following:

    GET / HTTP/1.1
    Host: www.example.org

The first line is called the request line, and the second parameter (a slash in this example) is the path to the resource being requested. The slash represents the document root; the Web server translates the document root to a specific path in the filesystem. Apache users might be familiar with setting this path with the DocumentRoot directive. If http://www.example.org/path/to/script.php is requested, the path to the resource given in the request is /path/to/script.php. If the document root is defined to be /usr/local/apache/htdocs, the complete path to the resource that the Web server uses is /usr/local/apache/htdocs/path/to/script.php.

The second line illustrates the syntax of an HTTP header. The header in this case is Host, and it identifies the domain name of the host from which the browser intends to be requesting a resource. This header is required by HTTP/1.1 and helps to provide a mechanism to support virtual hosting, multiple domains being served by a single IP address (often a single server). There are many other optional headers that can be included in the request, and you may be familiar with referencing these in your PHP code; examples include $_SERVER['HTTP_REFERER'] and $_SERVER['HTTP_USER_AGENT'].

Of particular note, in this example request, is that there is nothing within it that can be used to uniquely identify the client. Some developers resort to information gathered from TCP/IP (such as the IP address) for unique identification, but this approach has many problems.
Most notably, a single user can potentially use a different IP address for each request (as is the case with AOL users), and multiple users can potentially use the same IP address (as is the case in many computer labs using an HTTP proxy). These situations can cause a single user to appear to be many, or many users to appear to be one. For any reliable and secure method of providing state, only information obtained from HTTP can be used.

The first step in maintaining state is to somehow uniquely identify each client. Because the only reliable information that can be used for such identification must come from the HTTP request, there needs to be something within the request that can be used for unique identification. There are a few ways to do this, but the solution designed to solve this particular problem is the cookie.

Cookies

The realization that there must be a method of uniquely identifying clients has resulted in cookies, a fairly creative solution. Cookies are easiest to understand if you consider them to be an extension of the HTTP protocol, which is precisely what they are. Cookies are defined by RFC 2965, although the original specification written by Netscape (http://wp.netscape.com/newsref/std/cookie_spec.html) more closely resembles industry support.

There are two HTTP headers that are necessary to implement cookies, Set-Cookie and Cookie. A Web server includes a Set-Cookie header in a response to request that the browser include this cookie in future requests. A compliant browser that has cookies enabled includes the Cookie header in all subsequent requests (that satisfy the conditions defined in the Set-Cookie header) until the cookie is expired. A typical scenario consists of two transactions (four HTTP messages):

Figure 1: A typical Cookie exchange

The addition of the Cookie header in the client's second request (Step 3) provides information that the server can use to uniquely identify the client. It is also at this point that the server (or a server-side PHP script) can determine whether the user has cookies enabled. Although the user can choose to disable cookies, it is fairly safe to assume that the user's preference will not change while interacting with your application. This fact can prove to be very useful, as will soon be demonstrated.

GET and POST Data

There are two additional methods that a client can use to send data to a server, and these methods predate cookies. A client can include information in the URL being requested, whether in the query string or simply the path, although the latter case requires specific programming that is not covered in this article. As an example of utilizing the query string, consider the following example request:

    GET /index.php?foo=bar HTTP/1.1
    Host: www.example.org

The receiving script, index.php, can reference $_GET['foo'] to get the value of foo. Because of this, most PHP developers refer to this data as GET data (others sometimes refer to it as query data or URL variables). One common point of confusion is that GET data can exist in a POST request, because it is simply part of the URL being requested and not reliant on the actual request method.

The other method that a client can use to send information is by utilizing the content portion of an HTTP request. This technique requires that the request method be POST, and an example of such a request is as follows:

    POST /index.php HTTP/1.1
    Host: www.example.org
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 7

    foo=bar

In this case, the receiving script (index.php) can reference $_POST['foo'] to get the value of foo. PHP developers typically refer to this data as POST data, and this is how a browser passes data submitted from a form where method=post.
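
As a rough illustration of how both kinds of data are decoded, the sketch below runs the query string and the request content from the examples above through PHP's parse_str(), which parses a string as if it were a query string passed in a URL. The helper name decode_form_data() is hypothetical and introduced only for this example; in a real script, PHP has already done this work by the time $_GET and $_POST are available.

```php
<?php

/**
 * Decode a URL-encoded string such as "foo=bar" into an array.
 * decode_form_data() is a hypothetical helper, not part of PHP.
 */
function decode_form_data(string $encoded): array
{
    $data = [];
    parse_str($encoded, $data); // fills $data with the decoded pairs
    return $data;
}

// Query string taken from "GET /index.php?foo=bar HTTP/1.1"
$get = decode_form_data('foo=bar');

// Content of the POST request; its byte length is what Content-Length reports
$body = 'foo=bar';
$post = decode_form_data($body);

echo $get['foo'], "\n";   // bar
echo $post['foo'], "\n";  // bar
echo strlen($body), "\n"; // 7
```

The same decoding applies to both sources; only the place in the request where the encoded string travels is different.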

A request can potentially have both types of data, like this:

    POST /index.php?getvar=foo HTTP/1.1
    Host: www.example.org
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 11

    postvar=bar

These two additional methods of sending data in a request can provide substitutes for cookies. Unlike cookies, GET and POST data support is not optional, so these methods can also be more reliable. Consider a unique identifier called PHPSESSID included in the request URL as follows:

    GET /index.php?PHPSESSID=12345 HTTP/1.1
    Host: www.example.org

This achieves the same goal as the Cookie header, because the client identifies itself, but it is much less automatic for the developer. Once a cookie is set, it is the browser's responsibility to return it in subsequent requests. To propagate the unique identifier through the URL, the developer must ensure that all links, form submission buttons and the like, contain the appropriate query string (PHP can help with this, however, if you enable the PHP directive session.use_trans_sid). In addition, GET data is displayed in the URL and is much more exposed than a cookie. In fact, unsuspecting users might bookmark such a URL and send it to a friend or do any number of things that can accidentally reveal the unique identifier.

Although POST data is less likely to be exposed, propagating the unique identifier as a POST variable requires that all user requests are POST requests. This is typically not a convenient option, although your application design might make it more viable.

Session Management

Until now, I have been discussing state. This is a rather low-level detail that involves associating one HTTP transaction with another. The more useful feature that you are likely to be familiar with is session management. Session management not only relies on the ability to maintain state, but it also requires that you maintain client data for each user session.
This data is more commonly called session data, because it is associated with a specific user session. If you use PHP's built-in session management mechanism, session data is maintained for you (in /tmp by default) and available in the $_SESSION superglobal. A simple example of using sessions involves the persistence of session data from one page to the next. Listing 1, which presents the session_start.php script, demonstrates how this can be done.

Listing 1

    <?php

    session_start();
    $_SESSION['foo'] = 'bar';

    ?>

    <a href="session_continue.php">session_continue.php</a>

Assuming the user clicks the link in session_start.php, the receiving script (session_continue.php) will be able to access the same session variable, $_SESSION['foo']. This is detailed in Listing 2.

Listing 2

    <?php

    session_start();
    echo $_SESSION['foo']; /* bar */

    ?>

Serious security risks exist when you write code, similar to the above, without understanding what PHP is doing for you. Without this knowledge, you will find it difficult to debug session errors or provide any reasonably safe level of security.

Impersonation

It is a common misconception that PHP's native session management mechanism provides safeguards against common session-based attacks. On the contrary, PHP simply provides a convenient mechanism. It is the developer's responsibility to provide the appropriate safeguards for security. As mentioned previously, there is no perfect solution, nor a best solution that is right for everyone.

To explain the risk of impersonation, consider the following series of events:

Figure 2: An Impersonation attack

Of course, this scenario requires that Bad Guy somehow discovers or guesses the valid PHPSESSID that belongs to Good Guy. While this may seem unlikely, it is an example of security through obscurity and is not something that should be relied upon. Obscurity isn't a bad thing, of course, and it can help, but there needs to be something more substantial in place that offers reliable protection against such an attack.

Preventing Impersonation

There are many techniques that can be used to complicate impersonation or other session-based attacks. The general approach is to make things as convenient as possible for your legitimate users and as complicated as possible for the attackers. This can be a very challenging balance to achieve, and the perfect balance largely depends on the application design. So you are ultimately the best judge.

The simplest valid HTTP/1.1 request, as mentioned earlier, consists of a request line and the Host header:

    GET / HTTP/1.1
    Host: www.example.org

If the client is passing the session identifier as PHPSESSID, this can be passed in a Cookie header as follows:

    GET / HTTP/1.1
    Host: www.example.org
    Cookie: PHPSESSID=12345

Alternatively, the client can pass the session identifier in the request URL:

    GET /?PHPSESSID=12345 HTTP/1.1
    Host: www.example.org

The session identifier can also be included as POST data, but this typically involves a less friendly user experience and is the least popular approach.

Because information gathered from TCP/IP cannot be reliably used to help strengthen the security of the mechanism, it seems that there is little that a Web developer can do to complicate impersonation. After all, an attacker must only provide the same unique identifier that a legitimate user would in order to impersonate that user and hijack the session. Thus, it would appear that the only protection is to either keep the session identifier hidden or to make it difficult to guess (preferably both).
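
The three ways of presenting the identifier described above (Cookie header, request URL, POST data) can be sketched as a single lookup. This is only an illustration under stated assumptions: find_session_id() is a hypothetical helper, the lookup order is chosen for this example, and the arrays stand in for $_COOKIE, $_GET, and $_POST. PHP's session extension handles the identifier internally when session_start() is called.

```php
<?php

/**
 * Return the session identifier presented by the client, or null if none.
 * find_session_id() is a hypothetical helper; the three arrays stand in
 * for $_COOKIE, $_GET, and $_POST, checked in that (assumed) order.
 */
function find_session_id(
    array $cookie,
    array $get,
    array $post,
    string $name = 'PHPSESSID'
): ?string {
    foreach ([$cookie, $get, $post] as $source) {
        // Accept only a non-empty string value
        if (isset($source[$name]) && is_string($source[$name]) && $source[$name] !== '') {
            return $source[$name];
        }
    }

    return null;
}

// Cookie: PHPSESSID=12345
var_dump(find_session_id(['PHPSESSID' => '12345'], [], []));

// GET /?PHPSESSID=12345 HTTP/1.1
var_dump(find_session_id([], ['PHPSESSID' => '12345'], []));

// No identifier presented at all
var_dump(find_session_id([], [], []));
```

Whichever channel carries it, the identifier is the only thing such a lookup examines, which is why keeping it hidden and hard to guess matters so much.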
PHP generates a random session identifier that is practically impossible to guess, so this concern is already mitigated. Preventing the attacker from discovering a valid session identifier is much more difficult, because much of this responsibility lies outside of the developer's realm of control.

There are many situations that can result in the exposure of a user's session identifier. GET data can be mistakenly cached, observed by an onlooker, bookmarked, or e-mailed. Cookies provide a somewhat safer mechanism, but users can disable support for cookies, ruling out the possibility of using them, and past browser vulnerabilities have been known to accidentally leak cookie information to unauthorized sites (see http://www.peacefire.org/security/iecookies/ and http://www.solutions.fi/iebug/ for more information). Thus, a developer can be fairly certain that a session identifier cannot be guessed, but the possibility that it can be revealed to an attacker is much more likely, regardless of the method used to propagate it. Something additional is needed to help prevent impersonation.

In practice, a typical HTTP request includes many optional headers in addition to Host. For example, consider the following request:

    GET / HTTP/1.1
    Host: www.example.org
    Cookie: PHPSESSID=12345
    User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020916
    Accept: text/html;q=0.9, */*;q=0.1
    Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
    Accept-Language: en

This example includes four optional headers, User-Agent, Accept, Accept-Charset, and Accept-Language. Because these headers are optional, it is not very wise to rely on their presence. However, if a user's browser does send these headers, is it safe to assume that they will be present in subsequent requests from the same browser? The answer is yes, with very few exceptions.

Assuming that the previous example is a request sent from a current user with an active session, consider the following request sent shortly thereafter:

    GET / HTTP/1.1
    Host: www.example.org
    Cookie: PHPSESSID=12345
    User-Agent: Mozilla/5.0 (compatible; IE 6.0 Microsoft Windows XP)

Because the same unique identifier is being presented, the same PHP session will be accessed. If the browser is identifying itself differently than noted in previous interactions, should it be assumed that this is the same user? It is hopefully clear to you that this is not desirable, yet this is exactly what happens if you do not write code that specifically checks for such situations. Even in cases where you cannot be sure that the request is an impersonation attack, simply prompting the user for a password can help prevent impersonation without adversely affecting your users too much.

You can add user agent checking to your security model with code similar to that in Listing 3.

Listing 3

    <?php

    session_start();

    /* Strict comparison avoids PHP's loose matching of numeric-looking strings */
    if (md5($_SERVER['HTTP_USER_AGENT']) !== $_SESSION['HTTP_USER_AGENT']) {
        /* Prompt for password */
        exit;
    }

    /* Rest of code */

    ?>

Of course, you will need to first store the MD5 digest of the user agent whenever you first begin a session, as shown in Listing 4.

Listing 4

    <?php

    session_start();
    $_SESSION['HTTP_USER_AGENT'] = md5($_SERVER['HTTP_USER_AGENT']);

    ?>

While it is not necessary that you use the MD5 digest instead of the entire user agent, it helps provide consistency and eliminates the necessity to validate $_SERVER['HTTP_USER_AGENT'] before storing it in the session. Because this data originates from the client, it should not be blindly trusted, but the format of an MD5 digest is guaranteed, regardless of the input data.

Now that we have added user agent checking, an attacker must complete two steps in order to hijack a session:

1. Obtain a valid session identifier.
2. Present the same User-Agent header that the legitimate user's browser sends.

Other headers can be added in this way, and you can even use a combination of headers as a fingerprint. If you also include some secret padding of some sort, this fingerprint becomes practically impossible to guess. Consider the example in Listing 5.

Listing 5

    <?php

    session_start();

    $fingerprint = 'SECRETSTUFF' . $_SERVER['HTTP_USER_AGENT'] . $_SERVER['HTTP_ACCEPT_CHARSET'];
    $_SESSION['fingerprint'] = md5($fingerprint . session_id());

    ?>

The Accept header should not be used in the fingerprint, because Microsoft's Internet Explorer is known to vary the value of this header when the user refreshes as opposed to clicking on a link.

Even with a fingerprint that is difficult to guess, little is gained unless this information is used in some way beyond what has been demonstrated thus far. With the existing mechanism, there are still basically two steps required for impersonation, although the second step is more complicated now that the attacker has to reproduce multiple headers. To add increased security, it is necessary to begin including data in addition to the unique identifier.

Consider a session management mechanism where the unique identifier is propagated as GET data. If the fingerprint generated in the previous example is also propagated as GET data, an attacker must complete the following three steps to successfully hijack a session:

1. Obtain a valid session identifier.
2. Obtain the fingerprint that is propagated along with it.
3. Present the same User-Agent and Accept-Charset headers that were used to generate that fingerprint.

There are many more techniques that can be used to help strengthen the security of your session management mechanism. Hopefully you are well on your way to creating some techniques of your own. After all, you are the expert of your own applications, so armed with a good understanding of sessions, you are the best person to implement some added security.

Obscurity

I would like to dispel a common myth about obscurity. The myth is that there is no security through obscurity. As mentioned previously, obscurity is not something that offers adequate protection, nor should it be relied upon. However, this does not mean that there is absolutely no security that can be provided through obscurity. On the contrary, backed by an already secure session management mechanism, obscurity can offer a small degree of additional security.

Simply using misleading variable names for the unique identifier and fingerprint can help. You can also propagate decoy data to mislead a potential attacker. These techniques certainly should never be relied upon for protection, of course, but you will not waste your time by implementing a bit of obscurity in your own mechanism. For those who do not have a basic understanding of session security, it is probably best to support the myth about obscurity, or else someone might be misled into believing that it provides a sufficient level of protection.

Summary

I hope that you have gained several things from this article. Notably, you should now have a basic understanding of how the Web works, how statefulness is achieved, what a cookie really is, how PHP sessions work, and some techniques that you can use to improve the security of your sessions. If you have any questions or comments, my contact information is available on my Web site at http://shiflett.org/; alternatively, you could also post your feedback on this article at the PHP Magazine forum at http://forum.php-mag.net/.
I would love to hear about your own solutions for secure session management, and I hope that this article provides the background information that you need to support your own creativity.

Links and Literature

There are many more resources available on this topic. A few notable ones freely available on the Web are as follows: