from PHP Magazine - International Edition Issue: 03.2003

Using the Streams API (Part II)

What's hot about the Streams API in PHP 4.3.
Wez Furlong

Following on from the first part of the article in Issue 2, this part shows you how to implement your own wrapped streams for custom protocols that can be used to access a variety of data sources via a regular fopen() call, implement your own virtual filesystem and tie it into regular PHP functions such as opendir(), readdir() and stat().

Wrappers
Last time we looked at implementing a stream in your extension. If you recall, a stream is an object that exhibits streamable behaviour - you can read or write chunks of data to/from the stream. Wrappers are another (optional) layer on top of these stream objects - some wrappers define their own streams, while others encapsulate (or wrap around) existing stream objects. It is a common mistake to assume that a stream is a wrapper or vice versa. A stream is a stream; a wrapper complements and enhances a stream.

What do wrappers do? Wrappers define a number of attributes for a particular protocol scheme, such as http:// or ftp://, and provide the streams API with a means to open a stream on a resource using that protocol. Additionally, wrappers allow you to implement a virtual filesystem by providing hooks for readdir() and stat().
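To make the idea of a protocol scheme concrete, here is a minimal standalone C sketch of how a path might be split into its scheme and the remainder before a wrapper is chosen. It is not the code PHP uses internally; the helper name split_scheme and its buffer handling are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the PHP streams API.
 * Copies the scheme of a "scheme://rest" path into `scheme` and returns a
 * pointer to the part after "://". Returns NULL when the path carries no
 * scheme or the scheme does not fit into the buffer. */
const char *split_scheme(const char *path, char *scheme, size_t scheme_size)
{
    const char *sep;
    size_t len;

    assert(path != NULL && scheme != NULL);

    sep = strstr(path, "://");
    if (sep == NULL || sep == path) {
        return NULL;            /* plain file name, or empty scheme */
    }

    len = (size_t)(sep - path);
    if (len >= scheme_size) {
        return NULL;            /* scheme would be truncated */
    }

    memcpy(scheme, path, len);
    scheme[len] = '\0';
    return sep + 3;             /* skip the "://" separator */
}
```

Given "home://wrapper_test.txt", this sketch yields the scheme "home" and the remainder "wrapper_test.txt"; a plain path such as "/etc/passwd" yields NULL, which a caller could treat as "use the plain files handler".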

Figures 1 and 2 illustrate some of the differences between a "real" wrapper (one that re-uses an existing generic stream) and a specialized wrapper (one that provides its own private stream implementation). In the latter case, the wrapper is only really present so that the zlib streams can be opened using fopen().


Fig. 1: A Real Wrapper


Fig. 2: A Specialized Stream + wrapper

A Basic Wrapper
Depending on what you learnt from the last article, you should be able to create your own extension which has a function that returns a stream representing a file. In this article we are going to implement a "home://" protocol that will open files from the current user's home directory. This implementation will only work when run from the command line (for the sake of simplicity); it is left as an exercise for the reader to add the appropriate POSIX or Windows API calls to make it work correctly in all cases.

PHP includes a full-featured file stream implementation, so rather than continue working with our basic example, we are going to re-use the plain file stream component. This gives us the advantage of not having to think about writing this code again (and dealing with all the platform specific problems involved), and it also means that we can start from scratch in this article. So if you missed the last article, you won't feel too much in the dark.

First, let's generate a skeleton extension for this article - we can use the ext_skel tool to do this:

cd php-4.3.x/ext
./ext_skel --extname=mystreams2
cd mystreams2
# edit config.m4 and add this line near the top
PHP_ARG_ENABLE(mystreams2, mystreams2 sample code,
[--enable-mystreams2 myStreams2 sample code])

You can either use phpize to build the extension as a self-contained extension, or re-run the buildconf script in the top of the PHP source tree to add your extension to the PHP core. The choice is yours.

Listing 1 shows the source for the myfile_open() function that we wrote in the last article (based on the function from Zeev's article from Issue 1).

Listing 1: myfile_open

PHP_FUNCTION(myfile_open)
{
    char *filename = NULL;
    char *mode = NULL;
    int argc = ZEND_NUM_ARGS();
    int filename_len;
    int mode_len;
    FILE *fp;
    php_stream *stream;

    if (zend_parse_parameters(argc TSRMLS_CC,
            "s|s", &filename, &filename_len, &mode,
            &mode_len) == FAILURE) {
        RETURN_NULL();
    }

    if (!mode) {
        /* Assume mode is read-only if missing */
        mode = "r";
    }

    fp = VCWD_FOPEN(filename, mode);

    if (!fp) {
        RETURN_NULL();
    }

    /* We opened the file. Now we need to create
     * a stream structure that will use our stream
     * operations on the file pointer.
     * This next function call will register the
     * stream as a resource for us.
     */
    stream = php_stream_alloc(&myfile_stream_ops,
        fp, 0, mode);

    /* Now we need to return the stream to the
     * script */
    php_stream_to_zval(stream, return_value);

    /* Increment the number of open_files */
    MYFILE_G(open_files)++;
}

What this code does is:
  • Fetches the name of the file to open,
  • Fetches the mode to use to open the file (defaulting to read-only),
  • Opens a file handle,
  • Allocates a stream,
  • Sets up the stream as the return value, and
  • Increments a global variable.
We don't need the global variable to count the number of opened files anymore in this example, so it has been dropped from the function. Our wrapper implementation needs to do the same tasks, although we need to declare the function slightly differently:

php_stream *mystreams2_wrapper_opener(
    php_stream_wrapper *wrapper,
    char *path, char *mode, int options,
    char **opened_path,
    php_stream_context *context
    STREAMS_DC TSRMLS_DC)
{
    /* ... your code goes here ... */
}

As you can see, the function declaration is a little bit more intimidating, although the good news is that you can generally ignore most of the parameters. I'll briefly explain what they do now before moving on with the code.

The wrapper parameter is a pointer to the wrapper object being used to open the stream; its purpose is to allow your code to store/retrieve wrapper specific data. We won't be using it in this article, so we can ignore it.
The path and mode parameters were obtained using the zend_parse_parameters() call in our original function. We will be using these to open the actual file.
options is used to pass flags all the way from the php_stream_open_wrapper() call. It holds information about whether you should respect safe_mode/open_basedir, if you should raise errors or keep silent, and if you should search the include_path to locate the file.
opened_path is used to return to the caller the path of the actual file that was opened. This is only really important if the caller indicated that the include_path should be searched. Here, we can ignore this parameter.
context is used to carry additional contextual information, such as notification callbacks, http header preferences and so forth. We won't be using any of these either, so we can also ignore this parameter.
STREAMS_DC and TSRMLS_DC are some magical parameters that exist to make streams memory allocation tracking and thread safety work nicely; currently this is only really important on Windows. You can ignore them, but you must always include them in your function definition!

Now that we know that we can ignore most of the parameters, we can write the code. Remember, our custom "protocol" is designed to open files that can be found in your home directory, so we first determine the home directory by inspecting the environment variable HOME, and concatenate it with the requested path. Listing 2 is an implementation of the wrapper opener function - add it to your source file.

Listing 2: mystreams2_wrapper_opener

php_stream *mystreams2_wrapper_opener(
    php_stream_wrapper *wrapper,
    char *path, char *mode, int options,
    char **opened_path,
    php_stream_context *context
    STREAMS_DC TSRMLS_DC)
{
    php_stream *stream;
    char *real_path = NULL;
    char *home_dir;

    home_dir = getenv("HOME");
    if (home_dir == NULL) {
        /* problem; no home dir can be determined */
        php_stream_wrapper_log_error(wrapper,
            options TSRMLS_CC,
            "Could not find home dir");
        return NULL;
    }

    /* build the path relative to home; this
     * emallocs() a string for us.
     * path includes the home:// part, which we
     * need to skip */
    spprintf(&real_path, 0, "%s%c%s", home_dir,
        PHP_DIR_SEPARATOR,
        path + sizeof("home://")-1);

    /* open the file */
    stream = php_stream_fopen(real_path, mode,
        opened_path);

    /* release resources */
    efree(real_path);

    return stream;
}

This looks roughly similar to what we had before; the important things to note here are:
  • We are using php_stream_wrapper_log_error() to record/log errors. The function correctly interprets the REPORT_ERRORS flag that is set in the options parameter and either outputs the error or saves it for later reporting. This is important with wrappers as a number of individual errors might occur while attempting to open the resource; streams can group them together in a single error event for the error handler.
  • We are using PHP_DIR_SEPARATOR to determine the correct kind of "slash" character to use when building up the filename to open. PHP_DIR_SEPARATOR is defined as '/' under UNIX-style systems and '\\' under Windows.
  • We are using php_stream_fopen() to return a file stream, rather than implementing our own. This is always a good thing to do because the PHP-provided file stream implements a number of advanced streams features which are designed to improve performance, and it knows how to work around a number of platform-specific quirks.
Apart from these points, the code is doing the same things as before. All we need to do now is glue this code into the streams API so that the fopen() function will recognize it. To do this we need to declare two structures and add a function call to your extension MINIT and MSHUTDOWN functions. Add the code from Listing 3 to your source file.

Listing 3: wrapper registration structures

php_stream_wrapper_ops mystreams2_wops = {
    mystreams2_wrapper_opener,
    NULL, /* stream_close */
    NULL, /* stream_stat */
    NULL, /* url_stat */
    NULL, /* opendir */
    "home:// dir"
};

php_stream_wrapper mystreams2_wrapper = {
    &mystreams2_wops,
    NULL, /* abstract */
    0 /* is_url */
};

You can change your MINIT_FUNCTION and MSHUTDOWN_FUNCTION to look like that shown in Listing 4.

Listing 4: wrapper registration functions

PHP_MINIT_FUNCTION(mystreams2)
{
    return php_register_url_stream_wrapper("home",
        &mystreams2_wrapper TSRMLS_CC);
}

PHP_MSHUTDOWN_FUNCTION(mystreams2)
{
    php_unregister_url_stream_wrapper("home" TSRMLS_CC);
    return SUCCESS;
}

What was that stuff? - Wrapper Registration
We added a few strange looking things quite quickly just now - what are they? php_stream_wrapper_ops and php_stream_wrapper are the structures that define a particular class of wrapper. If you read my first article, or feel like reading the php_stream.h header file, you may notice that this is similar to the way that streams are structured. The first structure defines the wrapper operations (or functions, if you prefer), while the second declares an instance of a wrapper and links it to the appropriate wrapper ops.

As you can see from the comments, so far our wrapper only implements the wrapper opener function, with the others NULL-ed out. If we wanted to release some resources when our wrapped stream is shut down, we could implement the stream_close operation. Similarly, we can override a number of filesystem related functions (which we will cover shortly).

In the php_stream_wrapper structure you will notice an abstract field and an is_url field. The is_url field is used to distinguish between wrappers that connect to the network and those that operate locally. Opening a wrapper marked with is_url set to 1 will be prohibited if the allow_url_fopen ini setting is disabled by the administrator.

The abstract field is used to hold instance data for dynamic wrappers - we have declared a static wrapper here, as will most extensions, so we have set abstract to NULL. However, it is worth noting that you can create wrappers and register them dynamically at runtime - this is the technique used by the stream_register_wrapper() PHP function. When creating a dynamic wrapper in this way, you will quite often want to associate some data with the wrapper; the abstract field provides you with a place to store a pointer to that data.

Once declared, we need to register our wrapper with the Streams runtime, so that fopen() and php_stream_open_wrapper() will know that our wrapper should be called. This is achieved by calling php_register_url_stream_wrapper() and passing the protocol's name (without ://) and a pointer to the php_stream_wrapper structure. It is particularly important to unregister the wrapper when the module is unloaded (or at request end if you are registering dynamic wrappers); otherwise there is a risk of causing PHP to crash when someone tries to fopen() something using your previously registered protocol. To unregister a wrapper, simply call php_unregister_url_stream_wrapper() and pass the name of the protocol used when it was registered.
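The register/lookup/unregister cycle and the is_url check can be pictured with a small standalone C sketch. This is only a toy model under stated assumptions: the fixed-size table, the demo_ names and the allow_url_fopen argument are all invented for illustration and do not reflect how PHP stores its wrappers internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WRAPPERS 8

/* Stand-in for php_stream_wrapper; only the is_url field is modelled. */
typedef struct {
    int is_url;
} demo_wrapper;

typedef struct {
    const char *protocol;       /* e.g. "home", without "://" */
    demo_wrapper *wrapper;
} demo_entry;

static demo_entry registry[MAX_WRAPPERS];

/* Returns 0 on success, -1 if the name is taken or the table is full. */
int demo_register(const char *protocol, demo_wrapper *wrapper)
{
    int i, free_slot = -1;

    assert(protocol != NULL && wrapper != NULL);

    for (i = 0; i < MAX_WRAPPERS; i++) {
        if (registry[i].protocol == NULL) {
            if (free_slot < 0) {
                free_slot = i;
            }
        } else if (strcmp(registry[i].protocol, protocol) == 0) {
            return -1;
        }
    }
    if (free_slot < 0) {
        return -1;
    }
    registry[free_slot].protocol = protocol;
    registry[free_slot].wrapper = wrapper;
    return 0;
}

/* Returns 0 if the protocol was registered and is now removed, else -1. */
int demo_unregister(const char *protocol)
{
    int i;

    assert(protocol != NULL);

    for (i = 0; i < MAX_WRAPPERS; i++) {
        if (registry[i].protocol != NULL
                && strcmp(registry[i].protocol, protocol) == 0) {
            registry[i].protocol = NULL;
            registry[i].wrapper = NULL;
            return 0;
        }
    }
    return -1;
}

/* Finds the wrapper for a protocol. Wrappers flagged is_url are refused
 * when allow_url_fopen (modelling the ini setting) is 0. */
demo_wrapper *demo_lookup(const char *protocol, int allow_url_fopen)
{
    int i;

    assert(protocol != NULL);

    for (i = 0; i < MAX_WRAPPERS; i++) {
        if (registry[i].protocol != NULL
                && strcmp(registry[i].protocol, protocol) == 0) {
            demo_wrapper *w = registry[i].wrapper;
            return (w->is_url && !allow_url_fopen) ? NULL : w;
        }
    }
    return NULL;
}
```

The point of the unregister step in this model is that a later lookup returns NULL instead of a pointer into a module that may no longer be loaded.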

Compile and Test!
You should be able to compile your extension now, and run the test script in Listing 5.

Listing 5: Test PHP script

<?php
// Create a test file in the home dir first
$fp = fopen(getenv("HOME") . DIRECTORY_SEPARATOR .
    "wrapper_test.txt", "w");
fwrite($fp, "This is a test\n");
fclose($fp);

// Now, let's try our wrapper
readfile("home://wrapper_test.txt");
?>

If everything goes according to plan, you will see "This is a test" on your screen. If that is not what you see, or if it didn't compile at all, please go back and double check that you copied the code correctly. You might find that you either need to move the functions around or provide prototypes for them, depending on your coding style.

Fleshing it out
We have covered how to tie our own streams into the fopen() function. Wouldn't it be nice if we could stat("home://wrapper_test.txt") to find out details about its size and modification time? Or how about opendir("home://") and using readdir() to get a list of the files contained therein? That's what we are going to do now, starting with stat().

fstat() and stat()
If you are reasonably new to working with file handles and descriptors, you may not know that fstat() and stat() are used to query information about the file size, modification time, owner and so on of an open file handle, or a named file. Streams provides its own stat and fstat API for use by extension authors and by the PHP fstat() and stat() functions.

A wrapper can provide an override for fstat() calls on the stream. This is useful when the underlying stream is not connected to a real local resource, such as a socket connection to a file on a remote machine. The HTTP wrapper uses this override to cause all fstat() calls to fail when used on HTTP streams instead of allowing the socket stream to handle it itself and return misleading information. Although this currently fails, it could be extended to fill in information from the HTTP headers, such as Content-Length to set the file size field and fill in modification times based on the cache headers.

In our wrapper implementation, since we are only really providing an alias for the home directory of the user and allowing access to the actual files, we don't have any problem with allowing the stream to handle the fstat() so we will leave the stream_stat field of our php_stream_wrapper_ops structure set to NULL.

However, since we do want to handle stat("home://wrapper_test.txt"), we will fill in the url_stat field. It is important to note that (at the time of writing) only C or PECL extensions can take advantage of this hook. The necessary code to expose it to the PHP script is not yet complete, but may find its way into the PHP 4.3.2 release. You can test it from C by calling php_stream_stat_path() on a URL and then using the result.

We need to declare our stat() function like this:

int mystreams2_url_stat(
    php_stream_wrapper *wrapper,
    char *url,
    php_stream_statbuf *ssb
    TSRMLS_DC)
{
    /* ... your code goes here ... */
}

Streams passes the wrapper (a pointer to our own php_stream_wrapper structure), the URL to be examined, and a stat buffer to be filled in with details. All you need to do is fill in the stat buffer with correct information and return 0. If an error or another problem occurs, you should return -1. These are the same semantics as the POSIX stat() system call. Streams defines its own statbuf structure. In PHP 4.3 it contains a regular struct stat as provided by your OS, but could be extended in the future to hold additional meta-data.
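The return convention can be tried outside of PHP with a few lines of standalone C. This is a sketch for a POSIX system only: demo_statbuf is a hypothetical stand-in that mirrors the "struct stat inside a statbuf" layout described above, and demo_url_stat is an invented name rather than a streams API function.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical stand-in for php_stream_statbuf: a wrapper structure
 * holding the regular struct stat provided by the OS. */
typedef struct {
    struct stat sb;
} demo_statbuf;

/* Fills in *ssb and returns 0 on success; returns -1 on any failure,
 * matching the convention of the POSIX stat() call. */
int demo_url_stat(const char *real_path, demo_statbuf *ssb)
{
    assert(real_path != NULL && ssb != NULL);

    return stat(real_path, &ssb->sb) == 0 ? 0 : -1;
}
```

Keeping the OS structure inside a wrapper structure is what leaves room for additional fields later without changing the function signature.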

All we need to do in our extension is map the home://wrapper_test.txt filename onto the real pathname, just as we did in our wrapper opener, and then call the system stat() function to return the correct information. Since we will need to perform this mapping again a little later, now is a good time to separate the mapping code into a function of its own. Let's define a function that will allocate and return the real path name when given a home:// path. The caller is responsible for efree()ing the memory. Listing 6 provides one implementation. If you were serious about using the home:// protocol, this function would be the ideal place to perform the various system calls to determine an appropriate home directory.

Ideally, you should change your wrapper_opener function to make use of this. I'm not going to do that in this article, but it's quite easy to do (the sample code shows how).

Listing 6: mystreams2_map_path

static char *mystreams2_map_path(
    char *url
    TSRMLS_DC)
{
    char *real_path = NULL;
    char *home_dir;

    home_dir = getenv("HOME");
    if (home_dir == NULL) {
        /* problem; no home dir can be determined */
        return NULL;
    }

    /* build the path relative to home; this
     * emallocs() a string for us.
     * url includes the home:// part, which we
     * need to skip */
    spprintf(&real_path, 0, "%s%c%s", home_dir,
        PHP_DIR_SEPARATOR,
        url + sizeof("home://")-1);

    return real_path;
}
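The prefix-skipping arithmetic in Listing 6 can be exercised on its own with a standalone C sketch. Everything here is an assumption for illustration: demo_map_path is an invented name, it writes into a caller-supplied buffer instead of using emalloc(), it hard-codes '/' as the separator, and it adds a check (not present in Listing 6) that the URL really begins with home://.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOME_PREFIX "home://"

/* Writes "<home_dir>/<rest of url>" into buf.
 * Returns 0 on success, -1 when url does not start with home:// or the
 * result does not fit into buf. */
int demo_map_path(const char *home_dir, const char *url,
                  char *buf, size_t buf_size)
{
    /* sizeof counts the terminating '\0', so subtract 1 to get the
     * number of characters to skip. */
    const size_t prefix_len = sizeof(HOME_PREFIX) - 1;
    int written;

    assert(home_dir != NULL && url != NULL && buf != NULL);

    if (strncmp(url, HOME_PREFIX, prefix_len) != 0) {
        return -1;
    }

    written = snprintf(buf, buf_size, "%s%c%s",
                       home_dir, '/', url + prefix_len);
    if (written < 0 || (size_t)written >= buf_size) {
        return -1;              /* output error or truncated result */
    }
    return 0;
}
```

With a home directory of "/home/wez", the URL "home://wrapper_test.txt" maps to "/home/wez/wrapper_test.txt", while a URL with a different scheme is rejected.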

Now, let's write our stat function. Copy the code from Listing 7 into your source code.

Listing 7: mystreams2_url_stat

int mystreams2_url_stat(
    php_stream_wrapper *wrapper,
    char *url,
    php_stream_statbuf *ssb
    TSRMLS_DC)
{
    char *real_path;
    int retval;

    real_path = mystreams2_map_path(url TSRMLS_CC);
    if (real_path == NULL) {
        return -1;
    }
    retval = stat(real_path, &ssb->sb);
    efree(real_path);
    return retval;
}

And hook into the wrapper operations by updating your code to match Listing 8.

Listing 8: hook url_stat into wrapper operations

php_stream_wrapper_ops mystreams2_wops = {
    mystreams2_wrapper_opener,
    NULL, /* stream_close */
    NULL, /* stream_stat */
    mystreams2_url_stat,
    NULL, /* opendir */
    "home:// dir"
};

It's really pretty simple. Unfortunately, since we can't write a PHP script to test this part of the wrapper, you may want to prove to yourself that this whole stat thing is working by implementing the stream_stat/fstat() function. Listing 9 shows how you would implement it:

Listing 9: sample stream_stat operation

int mystreams2_stream_stat(
    php_stream_wrapper *wrapper,
    php_stream *stream,
    php_stream_statbuf *ssb
    TSRMLS_DC)
{
    int retval = -1;

    /* Call the default implementation */
    if (stream->ops->stat) {
        retval =
            (stream->ops->stat)
                (stream, ssb TSRMLS_CC);
    }
    return retval;
}

Don't forget to hook this into the wrapper operations structure. What we are doing here is hooking into the normal fstat() call and then doing the same thing that the Streams API does for itself - calling the stream defined fstat function. For the sake of testing, you could try doubling the ssb->sb.st_size field before returning from the function, and then run a script like this:

$fp = fopen("home://wrapper_test.txt", "r");
var_dump(fstat($fp));

The size field should then be twice the actual size of the file.
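The delegate-then-adjust pattern behind this experiment can be modelled in standalone C with function pointers. The demo_ structures below are simplified stand-ins invented for this sketch (a single size field instead of a full struct stat, no thread-safety parameters); they are not the real streams structures.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the streams structures. */
typedef struct {
    long size;
} demo_statbuf;

typedef struct demo_stream demo_stream;

typedef struct {
    int (*stat)(demo_stream *stream, demo_statbuf *ssb);
} demo_stream_ops;

struct demo_stream {
    const demo_stream_ops *ops;
    long real_size;             /* pretend size of the underlying file */
};

/* The stream's own stat operation: reports the real size. */
int demo_plain_stat(demo_stream *stream, demo_statbuf *ssb)
{
    ssb->size = stream->real_size;
    return 0;
}

/* Wrapper-level override: call the stream's own stat operation if it has
 * one, then double the reported size so the override is visible. */
int demo_wrapper_stat(demo_stream *stream, demo_statbuf *ssb)
{
    int retval = -1;

    assert(stream != NULL && ssb != NULL);

    if (stream->ops != NULL && stream->ops->stat != NULL) {
        retval = stream->ops->stat(stream, ssb);
    }
    if (retval == 0) {
        ssb->size *= 2;
    }
    return retval;
}
```

In this model a 15 byte file (the length of "This is a test\n") is reported as 30 bytes, and a stream without a stat operation makes the override fail with -1.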

Opendir()
The last thing to cover with wrappers is hooking into opendir(). The opendir() function creates a handle that can be used to enumerate the names of the files contained within a directory. The wrapper hooks allow you to implement this behaviour for your protocol - the default is to issue a warning if you try to opendir() a protocol based filename.

The streams API implements opendir()/readdir() by creating/using a special directory stream. The directory stream is read-only and will only allow you to read chunks that are precisely the size of a php_stream_dirent structure. Each chunk read in this way contains information about a particular entry in the directory.
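The "whole entries only" rule can be illustrated with a standalone C sketch of a read operation for a toy directory stream. The demo_dirent and demo_dirstream types are assumptions made for this example: the entry is modelled as a structure holding just a name, and the directory contents come from an in-memory array rather than from the filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for php_stream_dirent, modelled here as just a name. */
typedef struct {
    char d_name[256];
} demo_dirent;

/* Toy directory stream backed by an array of names. */
typedef struct {
    const char **names;
    size_t count;
    size_t pos;
} demo_dirstream;

/* Read operation: hands out exactly one entry per call. Returns
 * sizeof(demo_dirent) on success, or 0 when the request is not exactly
 * one entry in size or the directory is exhausted. */
size_t demo_dir_read(demo_dirstream *ds, void *buf, size_t count)
{
    demo_dirent *ent = (demo_dirent *)buf;

    assert(ds != NULL && buf != NULL);

    if (count != sizeof(demo_dirent)) {
        return 0;               /* partial or oversized reads are refused */
    }
    if (ds->pos >= ds->count) {
        return 0;               /* no more entries */
    }

    strncpy(ent->d_name, ds->names[ds->pos], sizeof(ent->d_name) - 1);
    ent->d_name[sizeof(ent->d_name) - 1] = '\0';
    ds->pos++;
    return sizeof(demo_dirent);
}
```

A readdir()-style loop over such a stream simply keeps reading one entry-sized chunk at a time until the read returns 0.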

The default plain-files directory stream is implemented by issuing real opendir() and readdir() system calls, and again, takes into account some OS specific quirks. If you are implementing a complete protocol wrapper, perhaps implementing opendir() for FTP resources, you will need to define your own FTP based directory stream. This is a reasonably complicated thing to implement, and I recommend that you read the first part of this article and refer to the source code for the php_plain_files_dirstream implementation in the PHP source code.

Luckily for us, we only need to alias home:// to the real path again, so we are spared from implementing a dirstream and can re-use the PHP-provided implementation. The function declaration is identical to that of the wrapper opener (the first chunk of code we wrote in this article), as it is doing the same job - returning a stream on a particular resource (see Listing 10). The only difference is that we are calling php_stream_opendir() instead of php_stream_fopen().

Listing 10: mystreams2_dir_opener

php_stream *mystreams2_dir_opener(
    php_stream_wrapper *wrapper,
    char *path, char *mode,
    int options, char **opened_path,
    php_stream_context *context
    STREAMS_DC TSRMLS_DC)
{
    char *real_path;
    php_stream *stream;

    real_path = mystreams2_map_path(path TSRMLS_CC);
    if (real_path == NULL) {
        php_stream_wrapper_log_error(
            wrapper, options TSRMLS_CC,
            "Could not map path %s",
            path);
        return NULL;
    }

    stream = php_stream_opendir(real_path,
        options, context);

    efree(real_path);

    return stream;
}

Add Listing 10 to your source code and hook it in to the mystreams2_wops php_stream_wrapper_ops structure as the opendir field, then compile it and test Listing 11.

Listing 11: opendir test script

<?php
$d = opendir("home://");
if (!$d) {
    die("Failed to opendir home://");
}
while (($f = readdir($d)) !== false) {
    echo "Item: $f\n";
}
closedir($d);
?>

Hopefully you will see a list of files and directories from your home directory on your screen.

Wrapping Up *groan*
We have touched on all the key features of wrappers, and you should now be capable of implementing your own protocol wrappers as PECL extensions. This has a number of applications, ranging from just providing useful protocol support, to using these features to implement your own storage for encrypted or encoded PHP source files - remember that the Zend Engine is also tied into the wrappers system, so you can include files via your own custom wrappers.

Where do you go from here? Well, hopefully we've covered enough of the streams features to keep you happy until PHP 5 is released. There will be a whole bunch of new things to sink your teeth into then, such as the stackable filter API, the new socket transport system and the memory mapping API.
Wez Furlong is author/maintainer of the Streams API, mailparse, sysvmsg and openssl extensions, helps to maintain the PHP core and contributes to other OpenSource projects from time to time. His company "The Brain Room" makes heavy use of PHP in their systems building and integration projects.
