Hardware: the parts of a computer that can be kicked. - Jeff Pesis
The second option was to work with PHP's socket interface via the fsockopen() function, hand-coding any request more advanced than a simple HTTP GET. While this worked adequately, it was slow, simply because sockets and socket I/O in PHP are slow. Perhaps more importantly, it required you to re-invent the wheel. When dealing with Internet protocols, re-inventing the wheel is probably the worst thing you can do, as the underlying protocols are subject to constant change. Further, different vendors have different implementations, each with their own bugs, so even basic code can become a large tangled mess.
Finally, the third option was to fork an external process and manipulate the output of the command. Tools such as wget(1) and lynx(1) have existed for a very long time and were frequently used to download URLs from PHP. While these tools were good alternatives to writing the code yourself, they required forking a process for every PHP request, and they added extra I/O, because the output had to be read back from either a file or a pipe. This may be fine for small applications, but as the load increases it becomes unnecessary overhead that is easy to avoid.
Enter cURL. cURL is both a command line program, akin to wget(1), and a feature-rich library that offers a simple API for "grokking" all sorts of Internet protocols, from essentials like HTTP and FTP to lesser-used protocols like DICT and GOPHER. cURL brings new meaning to "feature rich". Here is a list of some of the protocols and options that cURL supports:
- HTTP, including full HTTP GET, POST and PUT support, HTTP proxy support, Cookies, Redirects, Gzip encoding, Chunked data transfers, HTTP file uploads and HTTP over SSL (https).
- FTP, including support for FTP downloads and uploads, Username/password authentication, Resumable transfers, HTTP proxy and HTTP proxy tunneling, Kerberos, and FTP over SSL (ftps).
- LDAP, with full LDAP URL support, allowing you to query LDAP servers for contacts, etc. and retrieve the results from the server.
- DICT, GOPHER, FILE and many more features!
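Exactly which of these protocols are available depends on how your copy of libcurl was built. As a minimal sketch, you can ask at run time using curl_version(), whose returned array contains a 'protocols' list; the helper name curl_supports() is our own invention, not part of the extension.

```php
<?php
// Hypothetical helper: reports whether the libcurl linked into PHP
// was built with support for the given protocol (e.g. "ftp", "ldap").
function curl_supports(string $protocol): bool
{
    $info = curl_version();                        // array describing libcurl
    $protocols = array_map('strtolower', $info['protocols']);
    return in_array(strtolower($protocol), $protocols, true);
}

// Print a small support table for the protocols mentioned above.
foreach (['http', 'ftp', 'ldap', 'dict', 'gopher', 'file'] as $p) {
    printf("%-7s %s\n", $p, curl_supports($p) ? 'yes' : 'no');
}
```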
Not only is cURL feature rich, it is also very fast. Written in highly optimized C, cURL is tuned at a low level to offer performance that no PHP-based solution can match. This is partly due to the way C handles sockets, and to C's socket interface (compared to PHP's), and partly because PHP's string handling is not as optimized for parsing as C's. Unofficial benchmarks comparing cURL to PHP's native implementation (using the Sockets extension) show cURL as being three to four times as fast as the PHP-grown solution.
cURL also works on every major platform, whether it be VMS, OS/2, Windows or one of the Unices, such as Linux, BSD or Solaris. It is written in portable ANSI C and uses the autoconf and automake toolset to abstract away the nitty-gritty details that arise when dealing with multiple operating systems. Even when using PHP socket code (i.e. the Sockets extension), you don't get the same level of portability that cURL offers, because PHP has a much less advanced socket layer and socket abstraction mechanism than cURL.
Finally, cURL is actively developed. As mentioned earlier, Internet protocols are constantly changing, constantly acquiring new implementation bugs that need to be worked around, and constantly needing a little tweaking. Why do all this work yourself? When you use cURL, you get to leverage the work of a team of developers who are well versed in these protocols and are constantly working to make cURL faster and more stable, and to ensure that it works with as many protocols and protocol implementations as possible.
Installation
Installing cURL and PHP is normally quite an easy process, consisting of the standard ./configure, make, make install. However, there are a few catches that you need to be aware of when using cURL and PHP. In this section we'll cover installing cURL and PHP on a machine by compiling both from source. The following section is UNIX specific; on Win32 systems you can use the bundled php_curl.dll file.
The first step is to download both cURL and PHP from their respective sites, curl.haxx.se/ and www.php.net/. Make sure to get the latest version of cURL, as it's most likely to be the most stable and feature-rich version to date (all cURL releases are very stable; it's rare that a new bug is introduced). Once you have downloaded cURL, uncompress and untar it (something like tar xvfj curl-.tar.bz2). Then change into the newly unpacked directory (something like cd curl-).
The next step is to run the configure script, configuring cURL with the options and protocols that you want or need. If you want to compile cURL with SSL or LDAP support, you also need to install the OpenSSL and OpenLDAP libraries, respectively, for cURL to work properly with these protocols. Once configure finishes, type make and then make install (the latter command should be run with superuser privileges). This will compile and install cURL with the values specified in the configure command; the default install paths are /usr/local/lib, /usr/local/bin, /usr/local/include and /usr/local/man.
Now that cURL is installed, go into your PHP source directory (this article assumes that you have compiled and installed PHP before) and re-run configure with your usual parameters, adding --with-curl[=PATH], where PATH is the base cURL installation directory (in the default case, /usr/local). Then proceed as normal, and PHP will be installed with cURL support. Note that if you wish to install the cURL extension as a shared library, you need to add the --enable-shared[=curl] option to your configure line. Once you have installed cURL and PHP together, continue reading and try out the examples in the following sections!
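Before moving on, it is worth confirming that the build actually picked up the extension. Here is a minimal sketch, assuming you can run PHP from the command line or a test page. extension_loaded() and curl_version() are standard PHP functions, while the helper name curl_install_report() is hypothetical.

```php
<?php
// Hypothetical helper for a quick post-installation check: reports whether
// the cURL extension is loaded and which libcurl version PHP is linked to.
function curl_install_report(): string
{
    if (!extension_loaded('curl')) {
        return "cURL extension NOT loaded; check the --with-curl configure option";
    }
    $info = curl_version();                 // 'version' holds e.g. "7.x.y"
    return "cURL extension loaded (libcurl " . $info['version'] . ")";
}

echo curl_install_report(), "\n";
```

Alternatively, look for a "curl" section in the output of phpinfo().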
Listing 1
<?php
// URL to fetch and the local file to save it to
$url = "http://www.sourceforge.net/";
$file = "sf.index.html";
$ch = curl_init ($url);
$fp = fopen ($file, "w") or
    die("Unable to open $file for writing.\n");
curl_setopt ($ch, CURLOPT_FILE, $fp);            // write the body to $fp
curl_setopt ($ch, CURLOPT_FAILONERROR, true);    // HTTP codes >= 400 make curl_exec() fail
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: redirects
if (!curl_exec ($ch)) {
    print("Unable to fetch $url.\n");
}
curl_close ($ch);
fclose ($fp);
?>
A Simple Example
The example in Listing 1 shows how you can use the basic functionality of cURL to download a web page and write it to a file. It starts by defining two variables: $url, which contains the URL we want to fetch, and $file, which specifies where the contents of $url will be saved. We initialize a new cURL session by calling the curl_init() function with the $url argument, the URL we want to fetch (the SourceForge homepage). curl_init() returns a new cURL handle, which we store in the $ch variable; it provides the context for all further operations.
Next we open the destination file, $file, for writing. Then we associate the file handle, $fp, with our cURL handle, $ch, by calling curl_setopt() with the CURLOPT_FILE option. Note: make sure that the user who executes this script has write permission on $file (and on the directory in which $file is created). If you run this script from your web browser, that user will most likely be your web server (www-data, nobody, apache, or similar, depending on your server's setup).
The CURLOPT_FAILONERROR option is set to true, which lets us check whether curl_exec() successfully fetched the URL. If the HTTP response code is 400 or greater, the transfer fails silently and curl_exec() returns false. If this option is not set, cURL simply writes the server's error page into the file specified via the CURLOPT_FILE option.
SourceForge uses the HTTP Location header to redirect web browsers from dummy (or invalid) URLs to the URL where the actual content resides. We therefore set the CURLOPT_FOLLOWLOCATION option, which instructs cURL to follow Location headers until it reaches the document content.
Once we have set all necessary options, we call curl_exec(), which takes the cURL session handle ($ch) and performs the transfer using all relevant options. We then check whether the transfer succeeded, printing an error message if it did not, and finally close the cURL session and the output file with curl_close() and fclose().
Finally, it is time to execute the script. If everything works, you should not see any message during execution, and once the script has finished you should find a file named sf.index.html in your current working directory, containing SourceForge's index page. If an error occurred and cURL was not able to fetch the URL's content, you should get an error message. This message won't tell you exactly what went wrong, but we'll get into debugging with cURL a bit later in the article.
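The same steps can be wrapped in a reusable function. The following is a sketch under our own assumptions: the name fetch_to_file() is hypothetical, it returns true or false instead of printing, and unlike Listing 1 it deletes the partially written file on failure. Because cURL also supports the FILE protocol, you can try it without network access by passing a file:// URL.

```php
<?php
// Hypothetical helper: downloads $url into $file, following redirects.
// Returns true on success, false on any failure (HTTP >= 400, I/O, ...).
function fetch_to_file(string $url, string $file): bool
{
    $fp = fopen($file, 'w');
    if ($fp === false) {
        return false;                               // cannot write destination
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write body to $fp
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // HTTP >= 400 fails
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // obey Location: headers
    $ok = curl_exec($ch) !== false;                 // true on success
    curl_close($ch);
    fclose($fp);
    if (!$ok) {
        unlink($file);                              // don't leave a partial file
    }
    return $ok;
}
```

For example, fetch_to_file("http://www.sourceforge.net/", "sf.index.html") performs the same download as Listing 1.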
Workflow
Now that we have seen a simple example of how to use cURL, we will take a closer look at the individual steps and the functions involved. The basic idea behind the cURL functions is that you initialize a cURL session using curl_init(), set all your options via curl_setopt(), perform the transfer via curl_exec(), and then finish your session using curl_close().
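The four steps map directly onto code. Here is a minimal sketch, with the hypothetical name fetch_string(), that creates the handle without a URL, sets CURLOPT_URL explicitly, and uses the standard CURLOPT_RETURNTRANSFER option so that curl_exec() returns the body as a string. The try/finally ensures curl_close() runs on every path; that structure is our choice, not something the extension requires.

```php
<?php
// Hypothetical helper showing the four-step workflow:
// curl_init() -> curl_setopt() -> curl_exec() -> curl_close().
function fetch_string(string $url): ?string
{
    $ch = curl_init();                                   // 1. initialize
    try {
        curl_setopt($ch, CURLOPT_URL, $url);             // 2. set options
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  //    return body as string
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        $body = curl_exec($ch);                          // 3. transfer
        return $body === false ? null : $body;           // null signals failure
    } finally {
        curl_close($ch);                                 // 4. clean up
    }
}
```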
Initialization
resource curl_init ([string url])
curl_init() initializes a new session and returns a cURL handle for use with all of the other cURL functions, the most important of which are curl_setopt(), curl_exec() and curl_close(). The url parameter is optional; when it is supplied, the CURLOPT_URL option is set internally to its value. Otherwise, you can set this option manually using curl_setopt().
Transfer Options
bool curl_setopt (resource ch, int option, mixed value)
cURL understands over one hundred option settings. Transfer options for a cURL session are set with the curl_setopt() function, which accepts three parameters: the cURL resource handle (ch, initialized via curl_init()), the option identifier (e.g. CURLOPT_FILE or CURLOPT_FAILONERROR), and the value to assign to that option. It returns true on success and false on failure. In the example above we set three options: CURLOPT_FILE, CURLOPT_FAILONERROR and CURLOPT_FOLLOWLOCATION.
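If you are on PHP 5.1.3 or later, curl_setopt_array() sets several options in one call and returns false as soon as one of them cannot be set. A sketch applying the three options from Listing 1 (the function name configure_download() is hypothetical):

```php
<?php
// Hypothetical helper: applies Listing 1's options in a single call.
// $ch is a cURL handle, $fp a file handle opened for writing.
function configure_download($ch, $fp): bool
{
    return curl_setopt_array($ch, [
        CURLOPT_FILE           => $fp,    // write the body to $fp
        CURLOPT_FAILONERROR    => true,   // HTTP >= 400 makes curl_exec() fail
        CURLOPT_FOLLOWLOCATION => true,   // follow Location: redirects
    ]);
}
```

One return value to check is the main benefit over separate curl_setopt() calls, which each return their own status.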
Transfer Data
mixed curl_exec (resource ch)
Once all desired options are set, curl_exec() is called with the cURL session handle as its only parameter. cURL then tries to fetch the URL (specified either by the CURLOPT_URL option or as an argument to curl_init()), obeying the options that were set via curl_setopt(). If any processing callbacks have been specified (e.g. a write callback or a password callback), they are called during the execution of curl_exec(). curl_exec() returns false on failure; on success it returns true or, if the CURLOPT_RETURNTRANSFER option is set, the fetched data as a string.
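To illustrate the callback mechanism, here is a sketch using the standard CURLOPT_WRITEFUNCTION option. cURL calls the function once per received chunk, passing the handle and the data, and expects the number of bytes handled in return. Returning any other value aborts the transfer, which this example uses to enforce a size limit. The name fetch_limited() and the limit logic are our own illustration.

```php
<?php
// Hypothetical helper: fetches $url through a write callback that cURL
// invokes from inside curl_exec() for every chunk it receives.
// Returns the body, or null if the transfer failed or exceeded $limit bytes.
function fetch_limited(string $url, int $limit): ?string
{
    $body = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION,
        function ($handle, string $chunk) use (&$body, $limit): int {
            if (strlen($body) + strlen($chunk) > $limit) {
                return 0;               // fewer bytes than received: abort
            }
            $body .= $chunk;
            return strlen($chunk);      // report the whole chunk as consumed
        });
    $ok = curl_exec($ch);               // true on success, false on abort/error
    curl_close($ch);
    return $ok === false ? null : $body;
}
```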
Destroy resources
void curl_close (resource ch)
After curl_exec() has finished, call curl_close(). It expects the cURL resource handle as its only parameter and frees the session's resources. Also remember to close all associated resources (such as file handles or sockets), as cURL no longer needs them. This is more a question of good practice than a necessity, because PHP automatically frees resources once they are no longer referenced, and in any case at the end of each request.
cURL and Proxies
cURL can handle proxies in two ways: it can either transparently convert all operations to HTTP, or it can tunnel them through an HTTP proxy.
The CURLOPT_PROXY option lets you define which proxy server and port to use. If any of the environment variables http_proxy, ftp_proxy or all_proxy are set, cURL will automatically recognize and respect them. The following example fetches freshmeat.net's latest news feed via a proxy server running on localhost, port 8118, and prints out the contents:
<?php
$ch = curl_init ("http://freshmeat.net/backend/fm.rdf");
curl_setopt ($ch, CURLOPT_PROXY, "http://127.0.0.1:8118");
curl_exec ($ch);
curl_close ($ch);
?>
If your proxy server requires authentication, you can set the username and password with the CURLOPT_PROXYUSERPWD option. At the time of writing, cURL works only with Basic authentication (NTLM and Digest authentication are planned for the near future). The credentials are given as a string in the format [username]:[password]. Extending the example above to use proxy authentication is as simple as:
<?php
$ch = curl_init ("http://freshmeat.net/backend/fm.rdf");
curl_setopt ($ch, CURLOPT_PROXY, "http://127.0.0.1:8118");
curl_setopt ($ch, CURLOPT_PROXYUSERPWD, "login:secretpass");
curl_exec ($ch);
curl_close ($ch);
?>
If you want to tunnel all operations through the given proxy server instead of transparently converting them to HTTP, set the CURLOPT_HTTPPROXYTUNNEL option to a non-zero value, e.g.:
curl_setopt ($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
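The three proxy options can be collected in one place. A minimal sketch, assuming PHP 7.1 or later for the nullable parameter type. use_proxy() is a hypothetical name, and the credentials are expected in the username:password format described above.

```php
<?php
// Hypothetical helper that applies the proxy options discussed above.
// $credentials is "username:password", or null for an open proxy.
function use_proxy($ch, string $proxy, ?string $credentials = null,
                   bool $tunnel = false): bool
{
    $opts = [CURLOPT_PROXY => $proxy];               // e.g. "http://127.0.0.1:8118"
    if ($credentials !== null) {
        $opts[CURLOPT_PROXYUSERPWD] = $credentials;  // Basic proxy auth
    }
    if ($tunnel) {
        $opts[CURLOPT_HTTPPROXYTUNNEL] = true;       // tunnel instead of converting
    }
    return curl_setopt_array($ch, $opts);
}
```

With it, the authenticated example becomes use_proxy($ch, "http://127.0.0.1:8118", "login:secretpass") followed by curl_exec($ch).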
cURL and SSL
One of the reasons cURL is so popular with PHP users is its excellent SSL support. Before PHP 4.3 and Wez Furlong's excellent work on Streams, the only way to talk to a remote server over SSL was the cURL extension. Even now that SSL is implemented through streams, cURL still supports a much larger set of SSL features, including:
- SSL Certificates
- SSL Private Key Authentication
- Remote Certificate Verification
- Multiple SSL encryption mechanisms
- Specialized Random Number Generation
As mentioned earlier in this article, if you want to use cURL with SSL support, you need the OpenSSL library. You can check whether the two work together either by running curl-config --libs | grep ssl, or by checking the output of phpinfo().
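The same check can be made from within a script. curl_version() returns a 'features' bitmask that can be tested against the CURL_VERSION_SSL constant, along with an 'ssl_version' string naming the SSL library. The helper name curl_has_ssl() in this sketch is hypothetical.

```php
<?php
// Hypothetical helper: true if PHP's cURL extension was built with SSL.
function curl_has_ssl(): bool
{
    $v = curl_version();
    return ($v['features'] & CURL_VERSION_SSL) !== 0;
}

$v = curl_version();
echo curl_has_ssl()
    ? "SSL support: {$v['ssl_version']}\n"
    : "cURL was built without SSL support\n";
```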
Listing 2
<?php
$ch = curl_init ("https://www.verisign.com/");
if (!$ch) {
    die ("Cannot initialize a new cURL handle\n");
}
// Return the page as a string instead of printing it directly
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec ($ch);
// curl_exec() returns the data on success and false on failure
if ($data === false) {
    die ("cURL error: " . curl_error ($ch) . "\n");
}
print ($data);
curl_close ($ch);
?>
Basic Usage
While cURL has very advanced SSL support, it only makes the hard things possible; it doesn't make the easy things complex. To use cURL with a normal SSL URL, simply use an HTTPS URL in place of an HTTP URL, as in Listing 2. The only SSL-related part of that listing is the HTTPS URL itself, which tells cURL that this is an SSL transaction; cURL then handles the necessary dirty work for you.
Listing 3
<?php
$url = ".."; // The URL to use certificate authentication with
$certfile = ".."; // The certificate to use for authentication
$ch = curl_init ($url);
if (!$ch) {
    die ("Cannot open a new cURL handle\n");
}
curl_setopt ($ch, CURLOPT_SSLCERT, $certfile);
// curl_exec() returns false on failure
if (curl_exec ($ch) === false) {
    die ("cURL error: " . curl_error ($ch) . "\n");
}
curl_close ($ch);
?>
SSL Certificates
SSL certificates are akin to SSH keys: they allow secure authentication with remote servers without a username and password. When you use SSL certificates with cURL, use the CURLOPT_SSLCERT option to specify the certificate you want to authenticate with. Have a look at the example in Listing 3. To use it, simply set the $url variable to the URL you wish to fetch, and $certfile to the certificate you want to use for authentication.
There are a few different formats for SSL certificates, each with its own merits (beyond the scope of this article). For maximum flexibility, cURL lets you specify the certificate type via the CURLOPT_SSLCERTTYPE option. Currently only PEM and DER certificates are supported (the default is PEM):
curl_setopt ($ch, CURLOPT_SSLCERTTYPE, "DER");
SSL Key Authentication
Digital certificates allow for a host-based trust metric system; however, only SSL keys allow for true user-based authentication in SSL. cURL offers full support for SSL key-based authentication with the CURLOPT_SSLKEY option and a few other helper options. A basic example of using cURL with SSL keys is presented in Listing 4.
Listing 4
<?php
$url = "https://.../"; // URL you want to fetch
$keyfile = "privkey.pem"; // your private key file
$certfile = "sslcert.pem"; // SSL certificate
$passwd = "****"; // password for private key
$ch = curl_init ($url);
if (!$ch) {
    die ("Couldn't initialize cURL handle\n");
}
curl_setopt ($ch, CURLOPT_SSLCERT, $certfile);
curl_setopt ($ch, CURLOPT_SSLKEY, $keyfile);
curl_setopt ($ch, CURLOPT_SSLKEYPASSWD, $passwd);
// curl_exec() returns false on failure
if (curl_exec ($ch) === false) {
    die ("cURL error: " . curl_error ($ch) . "\n");
}
curl_close ($ch);
?>
When using SSL private keys, it is important to protect them with a passphrase: a password that is associated with (and required to decrypt) your private key. A passphrase adds an extra layer of security, because anyone who obtains a copy of your key file would also need the passphrase to use it (especially if you choose a good one!).
If you are using SSL private keys and want to specify an alternate key format or encryption method, you can use a combination of the CURLOPT_SSLKEYTYPE and CURLOPT_SSLENGINE options. The CURLOPT_SSLKEYTYPE option can be set to PEM (the default), DER or ENG:
curl_setopt ($ch, CURLOPT_SSLKEYTYPE, "DER");
If you set the CURLOPT_SSLKEYTYPE option to ENG, you must also specify an alternate SSL processing engine with the CURLOPT_SSLENGINE option. The string passed to this option is used as the engine identifier within OpenSSL.
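The client-side options from Listings 3 and 4 can be gathered into one helper. A sketch under our own assumptions: use_client_cert() is a hypothetical name, the certificate and key types default to PEM, and the file names are placeholders. The files are only read when curl_exec() runs, so setting the options succeeds even before they exist.

```php
<?php
// Hypothetical helper collecting the client-side SSL options:
// certificate, private key, key passphrase and their encodings.
function use_client_cert($ch, string $certfile, string $keyfile,
                         string $passwd, string $certType = 'PEM',
                         string $keyType = 'PEM'): bool
{
    return curl_setopt_array($ch, [
        CURLOPT_SSLCERT      => $certfile,
        CURLOPT_SSLCERTTYPE  => $certType,   // "PEM" (default) or "DER"
        CURLOPT_SSLKEY       => $keyfile,
        CURLOPT_SSLKEYTYPE   => $keyType,    // "PEM", "DER" or "ENG"
        CURLOPT_SSLKEYPASSWD => $passwd,     // passphrase for the key
    ]);
}
```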
Peer Certificate Verification
cURL can validate a remote server's SSL certificate against a local CA certificate via the CURLOPT_SSL_VERIFYPEER and CURLOPT_CAINFO (or CURLOPT_CAPATH) options, as shown in Listing 5.
Listing 5
<?php
$url = "https://.../"; // url to fetch
$criterium = "local.pem"; // CA certificate to verify the server against
$ch = curl_init ($url);
if (!$ch) {
    die ("Cannot initialize a new cURL handle\n");
}
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt ($ch, CURLOPT_CAINFO, $criterium);
// curl_exec() returns false on failure, e.g. if verification fails
if (curl_exec ($ch) === false) {
    die ("cURL error: " . curl_error ($ch) . "\n");
}
curl_close ($ch);
?>
The first step is to enable peer verification by setting the CURLOPT_SSL_VERIFYPEER option to true. Once peer verification is enabled, you must give cURL a certificate to verify the remote host against. The CURLOPT_CAINFO option specifies the certificate file used to authenticate the remote server. If you want to verify the remote server against multiple certificates, use the CURLOPT_CAPATH option instead and specify a directory containing all of your certificate files (with OpenSSL, this directory must be prepared with the c_rehash utility):
curl_setopt ($ch, CURLOPT_CAPATH, "/path/to/certs/");
Hostname Validation
cURL (unlike certain Microsoft-based browsers) allows you to perform hostname validation, which ensures that the certificate presented by the server was actually issued for the host you are requesting. While this is slightly slower than normal SSL operation, if you need a secure transaction (an online payment, for example), it is essential that you enable hostname validation when using cURL and PHP.
To use hostname validation, set the CURLOPT_SSL_VERIFYHOST option to the level of verification you wish to perform. A value of 1 only checks that the certificate contains a common name; a value of 2 also checks that this name matches the hostname you requested. Recent libcurl releases no longer offer the weaker check, so 2 is the value to use.
curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 2);
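Peer verification and hostname validation are meant to be used together. Here is a minimal sketch that enables both and picks CURLOPT_CAPATH or CURLOPT_CAINFO depending on whether it is given a directory or a single file. The name require_verified_peer() and the directory test are our own choices.

```php
<?php
// Hypothetical helper enabling both checks discussed above:
// peer verification against a CA file (or directory) and
// hostname matching against the certificate's name.
function require_verified_peer($ch, string $ca): bool
{
    $caOption = is_dir($ca) ? CURLOPT_CAPATH : CURLOPT_CAINFO;
    return curl_setopt_array($ch, [
        CURLOPT_SSL_VERIFYPEER => true,   // reject untrusted certificates
        CURLOPT_SSL_VERIFYHOST => 2,      // name must match requested host
        $caOption              => $ca,    // where the trusted CAs live
    ]);
}
```

Applied to Listing 5, require_verified_peer($ch, "local.pem") replaces the two curl_setopt() calls and adds hostname validation.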
Conclusion
Developing web applications is increasingly about interoperability: talking to other systems and accessing information from other sources. The ability not only to serve HTTP documents but also to speak HTTP is becoming increasingly critical. When it comes to accessing remote documents, there is nothing better than PHP's cURL extension.
Links and Literature