from PHP Magazine - International Edition Issue: 06.2003

The eight arms of Shiva

An OO approach for shared memory
Andrey Hristov

PHP becomes more popular as a programming language every day. PHP has modules for performing all kinds of tasks. Two of these modules allow the user to use System V semaphores and shared memory.

By using these modules, scripts running on the server can communicate with each other without using the file system or an RDBMS. In this article I will show what you can gain by using the API provided. To understand this article it is not necessary to be an OO guru, but a little understanding of object-oriented programming will help.

Semaphores
The semaphores allow running processes to synchronize their execution. In most cases they are used in conjunction with shared memory thus providing a powerful tool for inter-process communication (IPC). The semaphores and semaphore operations in System V are a generalization of the semaphores and operations P and V invented by Dijkstra. This generalization provides the opportunity to perform several operations and the semaphores can increase and decrease their values by more than one. Semaphore operations are atomic. This functionality is provided by the kernel. System V semaphores and shared memory modules do not work on Windows.

Shared memory
This kind of memory allows several processes to map part of their virtual memory onto memory segments that are visible to all of them. In this way the processes can communicate directly through that memory. They read from and write to these segments by using the standard commands for memory access. One program creates a memory segment which other processes (if permitted) can access. Because write access may be allowed for more than one process, an outside protocol or mechanism such as a semaphore should be used to prevent inconsistencies and collisions.

Shared memory is an alternative to the message passing mechanism, but the programmer needs to set everything up. A critical section is a block of code which (generally) only one process can run at a time. Every such section has a start and an end. At the start of the section, a check is made to see whether another process is already running this section. If not, the current process is allowed to enter, and other processes are prohibited from entering this section until the current process exits it. The section is critical because only the process that is in it has access to the shared memory; all other processes wait or skip it (depending on the implementation). The problem of accessing the critical section can be solved in this way:
  • P(s) --decrease the value of the binary semaphore
  • ... --whatever operations are needed
  • V(s) --increase the value of the binary semaphore
P and V (the Dijkstra operations) act like brackets around the critical section. The initial value of the semaphore s is 1. While one of the processes is in the critical section, its value is 0. Mutual exclusion is guaranteed because when s is 1, only one process can complete the P operation; the other processes wait while s is 0. The V operation increases the value of s back to 1, which lets one waiting process enter. If a process forgets to release the semaphore, all other processes will wait forever (or until they are terminated). A critical section uses a semaphore with max_acquire 1. Such semaphores are called binary semaphores because they can have only two states.
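With PHP's native System V semaphore functions, the bracket pattern above might look like the following minimal sketch, written for a current PHP version. The helper name with_critical_section() and the key value are assumptions made for illustration; they are not part of the classes described in this article. The sketch needs the sysvsem extension and simply returns null when it is missing.

```php
<?php
// Minimal sketch of a critical section guarded by a binary semaphore.
// with_critical_section() and the key used below are hypothetical.
function with_critical_section(int $key, callable $work)
{
    if (!function_exists('sem_get')) {
        return null;                    // sysvsem extension is not available
    }
    $sem = sem_get($key, 1, 0666);      // max_acquire 1 makes it a binary semaphore
    if ($sem === false || !sem_acquire($sem)) { // P(s): blocks while another process holds it
        return null;
    }
    try {
        return $work();                 // whatever operations are needed
    } finally {
        sem_release($sem);              // V(s): always release, even on an exception
    }
}

$result = with_critical_section(0x74657374, function () {
    return 'inside the critical section';
});
```

Putting the release in a finally block is one way to make sure that the V operation is never forgotten.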

The set of classes presented here makes it easier to write scripts that need to use semaphores and shared memory. There is also a prebuilt class that can be used on heavily loaded servers to protect them from overload.

Shm_SharedObject
This class is the common root of the wrapper classes Shm_Semaphore and Shm_Var. It provides only one method: proto int _memSegSpot(string str). This method returns a memory address that is calculated from the first four characters of the passed string. Thus, giving each object a name that differs within its first four characters gives it a unique memory address, and there will not be any clash. As can be seen from its definition, this method is private. It is used by the classes that derive from Shm_SharedObject.
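The article does not show how _memSegSpot() computes its result. As a rough illustration only, one way to derive an integer from the first four characters of a name is to pack their byte values into one number. The function name mem_seg_spot() and the packing scheme are assumptions, not the actual implementation.

```php
<?php
// Hypothetical stand-in for _memSegSpot(): pack the first four characters
// of a name into one integer key. The article does not show the real formula.
function mem_seg_spot(string $name): int
{
    $prefix = str_pad(substr($name, 0, 4), 4, "\0"); // exactly four bytes
    $key = 0;
    for ($i = 0; $i < 4; $i++) {
        $key = ($key << 8) | ord($prefix[$i]);       // one byte per character
    }
    return $key;
}

// Names that share their first four characters produce the same key.
$collides = mem_seg_spot("test") === mem_seg_spot("testing");
```

With any scheme of this kind, "test" and "testing" refer to the same semaphore or memory segment, so names should be chosen to differ within the first four characters.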

Shm_Semaphore
This class extends Shm_SharedObject. It provides a constructor that is used for initialization of an object by providing the needed semaphore creation information. It has these methods:

proto Shm_Semaphore(string sem_name, int max_acquire, string perm)
proto bool acquire(void)
proto bool release(void)

The string sem_name is passed to _memSegSpot(), the method inherited from the superclass. The int max_acquire is the maximum value that the semaphore can have. If this parameter is not provided, its default value is 1.

The string perm is used to set up the permissions on the semaphore. Its length must be 3 and it must represent an octal number. This parameter is interpreted in the same way as file permissions on *nix systems. If no value is provided, a default of 666 is used. The methods acquire() and release() don't take any parameters. acquire() tries to acquire the semaphore (the P operation); false is returned on error, true on success. No more than max_acquire processes, the value passed to the constructor, can hold the semaphore at the same time. release() releases the semaphore again (the V operation). Like acquire(), it returns true on success and false on error.
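The rule for the perm string can be sketched as a small validation helper. The name parse_perm() is hypothetical and the article does not say how the class handles an invalid string; throwing an exception is an assumption made here.

```php
<?php
// Hypothetical helper: turn a permission string such as "666" into the
// integer mode that functions like sem_get() expect.
function parse_perm(string $perm = "666"): int
{
    // Exactly three octal digits, nothing else.
    if (!preg_match('/\A[0-7]{3}\z/', $perm)) {
        throw new InvalidArgumentException("perm must be three octal digits");
    }
    return (int) octdec($perm); // "666" becomes 0666, "600" becomes 0600
}
```

As with files, the three digits stand for the owner, the group and all other users.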

Shm_Var
Like Shm_Semaphore this class extends Shm_SharedObject. It provides the basic functionality that is needed to perform operations over shared memory variables. All the needed information for object instantiation is passed to the constructor. Additionally, two methods named getVar() and putVar() are exported (i.e. public). The prototypes are:

proto Shm_Var(string shm_name, int memory_size, string perm)
proto mixed getVar(void)
proto bool putVar(mixed val)

The parameter shm_name plays the same role as the sem_name parameter of the Shm_Semaphore constructor. It is used to generate a unique memory address for the variable that will be kept in the shared memory. The integer memory_size is the size of the memory that will be allocated for the variable. It cannot be too small. If you don't know which value to use, just use 1024. Keep in mind that PHP serializes the stored variables internally, so the size needs to be large enough to hold the serialized value of the variable. Therefore, use a number that is bigger than the length of the serialized representation of the largest value that will be put in the shared memory. Once created, the assigned memory segment cannot be resized easily (that is, without detaching and reattaching). For more information, take a look at the serialize() PHP function.
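A quick way to estimate a suitable memory_size is to measure the serialized form of the largest value you expect to store. The helper name suggest_segment_size() and the factor of two are assumptions; the extra room is there because the stored data needs some space beyond the serialized string itself.

```php
<?php
// Hypothetical helper: suggest a segment size for the largest value you
// expect to store. The headroom factor is an assumption, not a documented rule.
function suggest_segment_size($largestValue, int $minimum = 1024): int
{
    $needed = strlen(serialize($largestValue)); // bytes of the serialized form
    return max($minimum, $needed * 2);          // double it to leave headroom
}

$size = suggest_segment_size(array("user" => "andrey", "hits" => 12345));
```

For small values the result stays at the 1024 bytes recommended above; it only grows when the serialized value becomes large.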

The perm parameter sets the access permissions of the shared memory segment. If this parameter is not provided, a default value of 666 is used, which allows all processes to access the variable. If access has to be restricted to scripts run by the web server, 600 should be used. The method getVar() returns the value of the variable stored in the shared memory segment. In case of an error, false is returned and the error message is suppressed. The method putVar() expects one parameter, which can be of any type. However, it is pointless to store variables of type resource in the shared memory, because they cannot be serialized meaningfully: the memory structure associated with a resource identifier is specific to the currently running instance of the script. putVar() returns true on success and false in case of an error. Its error messages are not suppressed.

Shm_Protected_Var
Unlike the classes presented above, this one has no superclass. It aggregates one instance of each of the two classes mentioned above, making it possible to use a shared memory location while restricting access to it to one process at a time. The use of the variable stored in the memory is mutually exclusive. The public interface of this class consists of one constructor and four methods. Prototypes:

proto Shm_Protected_Var(string name, int size)
proto bool startSection()
proto bool endSection()
proto mixed getVal()
proto bool setVal(mixed val)

The parameter name is used to generate the memory address of the protected variable. The same memory address is used in the initialization of the aggregated Shm_Semaphore and Shm_Var instances. The parameter size is used to allocate the required memory. The permission parameter used is 666 thus giving access to all processes in the system.

The method startSection() is used to start a critical section. When one script enters a critical section by invoking this method, all other scripts that use the same name for their Shm_Protected_Var object will wait when they call startSection(). They remain blocked until the script that is in the critical section calls endSection(). Then one and only one of the other scripts will enter the section. Any other waiting scripts stay blocked. Note that PHP will release any semaphores that were acquired and not released. Thus, even if a script ends without calling endSection(), the section ends when the script ends (or when its execution is terminated). However, it is not good to rely on this, and an error message will be shown. The method getVal() is used to get the value of the variable. This method has a prerequisite: startSection() has to be called first. The same is true for the setVal() method. If the object is not in a critical section, a warning message is printed and FALSE is returned by both getVal() and setVal().
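The prerequisite described above can be pictured with a small in-memory stand-in. It mimics only the "are we inside a section?" check and uses neither a semaphore nor shared memory. The class name Section_Check_Sketch and the wording of the warnings are assumptions for illustration.

```php
<?php
// In-memory stand-in that mimics only the section check described above.
// It uses no semaphore or shared memory; the class name is hypothetical.
class Section_Check_Sketch
{
    private $inSection = false;
    private $value = null;

    public function startSection(): bool { $this->inSection = true; return true; }
    public function endSection(): bool { $this->inSection = false; return true; }

    public function getVal()
    {
        if (!$this->inSection) {
            trigger_error("getVal() called outside a critical section", E_USER_WARNING);
            return false;
        }
        return $this->value;
    }

    public function setVal($val): bool
    {
        if (!$this->inSection) {
            trigger_error("setVal() called outside a critical section", E_USER_WARNING);
            return false;
        }
        $this->value = $val;
        return true;
    }
}
```

In the real class, startSection() and endSection() would additionally acquire and release the aggregated semaphore, which is what makes the access mutually exclusive across processes.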

Shm_Load_Protector
This class is an example of how semaphores and shared memory can be used together to control the load of a server. An instance of Shm_Protected_Var is used as a counter of the currently running scripts that use the same name. "Same name" means the same string passed to the constructor of the class, which is used for accessing the same counter. Thus, different scripts can share one counter. By using this technique, the number of running scripts can be limited.

This class exposes four methods and a constructor. Their prototypes are:

proto Shm_Load_Protector(string code_name, int max_processes)
proto bool increaseLoad()
proto bool decreaseLoad()
proto void nullCounterValue()
proto int getCounterValue()

The string code_name identifies the load protector. All instances of a script will use one name, but as noted above, the use of this name is not limited to that script. Note that only the first four characters of the string are significant. The maximum number of processes that will be admitted is passed as the second parameter to the constructor. It is recommended to instantiate an object of this class as early as possible in the script; this can be the second line of the script, right after including the file with the source of the class. Then the increaseLoad() method should be called, which checks how many scripts using the same code_name are running. If the number is lower than max_processes, the internal counter is incremented by one and true is returned to show success. Otherwise, the internal counter is not incremented and false is returned. When a script receives false from this method, it is recommended that it signals the overload and quits immediately. For example, when the script is a back-end for some Java front-end, it may return the string toomanyprocesses20, showing that there are too many running scripts and giving the maximum number. In another case, where the script is a web page, HTML can be output that tells the user about a temporary server overload, or the user may even be forwarded to another server.
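The admission rule just described can be sketched without any shared memory. In the class below a plain property replaces the shared, semaphore-protected counter, so it only works within one process. The class name Load_Counter_Sketch is hypothetical, and returning false from decreaseLoad() when the counter is already zero is an assumption.

```php
<?php
// Sketch of the admission logic only. A plain property replaces the shared,
// semaphore-protected counter, so this version works within one process.
class Load_Counter_Sketch
{
    private $count = 0;
    private $max;

    public function __construct(int $maxProcesses)
    {
        $this->max = $maxProcesses;
    }

    public function increaseLoad(): bool
    {
        if ($this->count >= $this->max) {
            return false;   // limit reached: the caller should report overload and quit
        }
        $this->count++;
        return true;
    }

    public function decreaseLoad(): bool
    {
        if ($this->count === 0) {
            return false;   // assumption: nothing to release is treated as an error
        }
        $this->count--;
        return true;
    }

    public function getCounterValue(): int
    {
        return $this->count;
    }
}
```

In the shared version, the check and the increment have to happen inside one critical section. Otherwise two scripts could both read the same counter value and both be admitted.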

The other two methods, nullCounterValue() and getCounterValue(), can be used by administrative interfaces and for debugging. nullCounterValue(), as its name indicates, resets the internal counter to zero. This is especially useful in administrative interfaces for recovering from a deadlock, which may occur if one or more scripts run for longer than normal. They can be blocked, waiting for an external event, or may have entered an endless loop. getCounterValue() returns the number of running scripts. Here are a few cases where this class is used:
  • Use case 1: Although the web server, for example Apache, may have a directive for limiting the number of running processes (threads), such a directive cannot limit the number of running scripts of a particular type. With this class, if one Apache server serves both PHP and Perl scripts, the number of running PHP scripts can be limited to some number. Therefore, it can be guaranteed that at least some Perl scripts will run when there is a high load. This lower bound is the maximum number of running processes minus the maximum number of PHP scripts.
  • Use case 2: A heavy script may be running. A heavy script is one that uses many system resources which may be easily exhausted. In this case the number of running instances of this script has to be limited to prevent the server from being overloaded. The client side can also contain logic that retries the call after a random delay if the script returns an error message. This is like Ethernet's CSMA/CD, except that the maximum number of talking processes is not limited to 1. By using a random delay, the load should be better distributed over time. The script therefore uses a name for its Shm_Load_Protector that is unique among all PHP scripts on the server.
  • Use case 3: A long time ago, I read an article in Dr. Dobb's Journal about modifying the Apache web server's scheduling function to allow prioritization of processes. The programmer who did it wrote that this is useful for commercial web sites which also offer free resources. These sites may guarantee quality of service (QoS) to their paying members, because the processes that serve registered users are given a higher priority than those that serve content to unregistered users. By using this class, it is easy to provide such QoS without the hassle of modifying Apache's core routines, although this may not be the best solution for this problem.
Listing 1: How to use a semaphore
File : shm_example1.php

<?php
require 'shm.php';
$sem = new Shm_Semaphore("test", 2); // semaphore's name is "test"
// max_acquire is 2. All processes can use
// the semaphore
$sem->acquire();
printf("Entered section at : %s\n", strftime("%D %T", time()));
sleep(5);
printf("Exited section at : %s\n", strftime("%D %T", time()));
$sem->release();
?>

You can test this script, for example, on a *nix console. One option is to call it through a web server with the wget utility, which is not hard to find these days. Another option, rather than testing with a web server, is to use the PHP CLI (Command Line Interface) binary. The latter is actually better for this purpose. Open at least 3 consoles and start the script on each of them by typing php shm_example1.php. Try to start the scripts quickly on all consoles. After they finish their work, you will see that the first 2 scripts ran for about 5 seconds, but the third one ran for more than 5 seconds, possibly 7 or 8, depending on how fast you were in starting the scripts. When the third instance was started and reached the line $sem->acquire(), it was blocked because the max_acquire number had been reached. It remained blocked until the first script invoked the release method of $sem after sleeping for 5 seconds. After that, the third script sleeps for 5 seconds, then releases the semaphore and quits. If we forgot to call the release method, the semaphore would be released automatically at the end of the script and a warning message would be given.

Listing 2: How to use shared memory
File: shm_example2.php

<?php
require 'shm.php';
// allocate 1024 bytes for the variable
$shm_var = new Shm_Var("foo1", 1024);
if (!($str = $shm_var->getVar())) {
    // nothing stored yet: this is the first run
    $str = strftime("First script started at : %H:%M:%S", time());
    $shm_var->putVar($str);
}
echo $str;
?>

Start this script twice or more on the console. On every start, the same message is printed, showing the time of day when the script was first started. Note that the variable is not guarded against simultaneous writes. That is acceptable here because normally only the first instance writes and all later instances read from the shared memory, so inconsistencies do not occur. However, inconsistencies are possible if the scripts are started very quickly one after another on different consoles, or if the script is tested using a web server in conjunction with a tool like ab (Apache Benchmark).

Listing 3: How to use shared memory with write guard
File : shm_example3.php

<?php
require 'shm.php';
// allocate 1024 bytes
$guarded_var = new Shm_Protected_Var("foo2", 1024);
$guarded_var->startSection();
// %s inserts the current microtime() value into the string
$s = sprintf("setVal() at : %s\n", microtime());
echo $s;
$guarded_var->setVal($s);
echo $guarded_var->getVal();
$guarded_var->endSection();
?>

You may test this script using the CLI version of PHP or a load testing tool. In all cases, each script will write the same string twice to the standard output. There will not be any race condition in which one script prints the string generated by another script.

Usage of Shm_Load_Protector
Listing 4.1: Use case 1
File : shm_example41.php

<?php
require 'shm.php';
$protector = new Shm_Load_Protector("bgdt",200);//max 200 scripts using bgdt
if (!$protector->increaseLoad()) {
/* There are 200 scripts running */
include 'high_load.html';
exit;
}
register_shutdown_function("shutdown_function");
....
....
....
function shutdown_function() {
global $protector;
$protector->decreaseLoad();
}
?>

For this use case, all scripts should look like this and use "bgdt" (in our case) as the first parameter to the constructor of Shm_Load_Protector. They must create an instance of the class as early as possible in the script. Then they have to call the increaseLoad() method. If it returns FALSE, an HTML page is shown that reports the high load. If TRUE is returned, the script continues by registering a shutdown function. The use of a shutdown function is not a must, but is highly recommended. What is gained? When the script finishes and is terminated by the Zend Engine, the shutdown function is called and cleans up. In our case, it calls the decreaseLoad() method. So the code can terminate its execution wherever it wants with exit/die, or normally by reaching the end of the script. You are not obliged to add cleanup code everywhere. Therefore, the code will be smaller and easier to maintain. If you use an OO approach for building your scripts, as I do in most cases, make $protector a member of your factory class and register a method of this class as the shutdown function.

Listing 4.2: Use case 2
Because the example file is almost the same as shm_example41.php, I won't include it here. It differs only in that a lower number should be passed as the second parameter of the constructor. The first parameter has to be unique and used only by the heavy script. For example, it can be "heavy".

Listing 4.3: Use case 3
File : shm_example43.php

<?php
require 'consts.php';
require 'shm.php';
if (!empty($_SESSION['user_id'])) {
$code_name = "1log";
$max_scripts = MAX_PROCESSES_1LOG;
} else {
$code_name = "0log";
$max_scripts = MAX_PROCESSES_0LOG;
}
$protector = new Shm_Load_Protector($code_name, $max_scripts);
if (!$protector->increaseLoad()) {
include 'high_load.html';
exit;
}
register_shutdown_function("shutdown_function");
....
....
....
function shutdown_function() {
global $protector;
$protector->decreaseLoad();
}
?>

This example uses the approach of one dispatcher for all pages of a site. The different services are identified by the parameters passed to the script. I want to show how registered users can be differentiated from unregistered ones. The former receive QoS because the limit of running scripts for them is 200, while for unregistered users it is 50. Note that even if you try to pass a different number as the second parameter, this won't change the maximum. The reason is that only the number used during the first call of the script matters. As I mentioned before, the class is not the best fit for this situation: when only a small number of registered users browse the site, still only 50 unregistered users may use it, and thus the system resources will not be used as well as they could be. A better class could be written. One additional parameter should be added: the minimum number of scripts that should receive QoS.

In brief, this thin OO wrapper around the native API for System V semaphores and shared memory may help you when your scripts need to exploit this functionality. It is as easy as 1, 2, 3 to instantiate Shm_Protected_Var and then use only its methods for reading, writing and critical sections. You should never again need to experience the mess of many variables for semaphore IDs and shared memory segment IDs.
Andrey Hristov develops applications using PHP and contributes to the PHP project. He is an author of several extensions that wrap various libraries. He can be reached at andrey@php.net.
