- Updated Optimizing PHP Article
I have just updated my popular Optimizing PHP article with additional information on caching. I discuss memcache and Squid. I also updated the PHP Accelerators section and changed the tone of some parts of the article. I quote:
Perhaps the most significant change to PHP performance I have experienced since I first wrote this article is my use of Squid, a web accelerator that is able to take over the management of all static http files from Apache. You may be surprised to find that the overhead of using Apache to serve both dynamic PHP and static images, javascript, css, html is extremely high. From my experience, 40-50% of our Apache CPU utilisation is in handling these static files (the remaining CPU usage is taken up by PHP running in Apache).
It is better to offload downloading of these static files by using Squid in httpd-accelerator mode. In this scenario, you use Squid as the front web server on port 80, and set all .php requests to be dispatched to Apache on port 81, which I have running on the same server. The static files are served by Squid.

- Malaysian FOSS Conference 2009 Opening Keynote
Last Saturday, I gave the opening keynote of the Malaysian Free & Open Source Software 2009 conference. The speech was prepared the day before, but as usual I improvised some of it, so some parts have been amended from memory:
Ladies and gentlemen, honored guests, good morning!
Today the landscape of information technology has been transformed by the vision of free software and open source. The search engines of Google roar with the sounds of open source Linux. Our Malaysian government encourages the use of open source whenever possible. Sounds of PHP, MySQL, Apache, GPL have become familiar names in the tapestry of IT.
But that was not what it was like when I first started out as a young student in the mid-80s at the University of Melbourne, Australia. Things were different then. Concepts such as open source and the GPL were still unknown. I still remember that a fellow student was expelled from university for making copies of the source code of proprietary Unix software for his personal use. I admit I was disturbed by this, because I too had an insatiable curiosity about how software worked, and it was impossible to learn more without access to the source code. I wanted to find and understand the wiring inside the software.
I remember fondly, and today with a bit of guilt, that I used to crack copy protected games, not for the pursuit of profit, but as an intellectual challenge - well OK, I have to admit I did it to play the games. The trick to doing this (cracking), metaphorically, is finding the wiring behind the copy protection and reversing the wires so that instead of refusing to run, it does the opposite and continues working. Of course, quickly finding the right wires to switch to crack a large program is not easy. Which brings me to the first piece of advice if you want to be successful in software design:
You need to have good taste, which is kind of weird because nerdy programmers are notoriously bad dressers, fond of bad hair days and certainly not fussy about the finer points of fine dining. What I'm talking about, of course, is a taste for good logic. The feel of a beautiful idea, the taste of a mighty logic, or the fun in a great hack. Games designers and coders are a talented bunch of people, and if you understand their logical rhythms and designs, it becomes obvious where the wires you need to reverse to crack the software reside.
The other important element of success is being happy. You have to have passion and enjoy what you are doing. To me cracking games was like cracking walnuts, a fun thing to do, but after a while it got boring. You need to do something with others and share with others to become really passionate about something. Social responsibility is another important element of life. You need to channel your life productively - only then will you find true happiness.
Cracking games became boring and I found other, better diversions. It was around the time my fellow student was expelled that I learned about the international USENET community. To young people, you have to imagine a time before the World Wide Web, when people used the Internet primarily for email. USENET was a fantastic group of mailing lists with forums and archives. USENET was also used to disseminate programming ideas and knowledge in the form of source code. So even before the concepts of Open Source and licenses such as GPL became well known, there was this thriving community of programmers who shared their source code and learnt from others.
Which brings me to the next lesson: the typical image of the best programmers being lonely introverted hackers is misleading. People are only successful in a community. Open source software needs to be grown organically and for that you need social skills.
The classic example here is of course Linus Torvalds, author of Linux, who has skillfully led the Linux community from its inception.
It was through USENET that I released software that I had written, including the one that won runner-up for best Australian Macintosh software in 1988 while I was still a foreign student in Melbourne.
You know, while preparing this speech, at the back of my mind I have always wondered why Malaysia has not had a bigger role in contributing software to the open source community. Was what I achieved due to my overseas education? I was thinking about it last night while writing this speech, and I don't think so. I will tell you why...
Malaysians do not lack ability. I see many smart and interesting people around me here at the conference. And I have seen many sophisticated pieces of software in the commercial world developed by talented teams of Malaysians. English, the language of Science and the Internet, is widely spoken here. However in the open source world, we have many more consumers than contributors.
Is it our education system? Perhaps an over-emphasis on exams is a contributing factor, but I don't think that is the main reason. I studied for 12 years in Malaysian state schools, and I survived sane and reasonably intelligent! And exposure to the Internet has made young people more worldly than any previous generation of Malaysians.
After reflecting, I suspect the reason is primarily economic. After college, it is difficult to sustain a living and have the time to contribute meaningfully to an open source project here in Malaysia. There are companies with strong support for open source here, but most companies here see little value in allowing their staff to contribute to open source.
So let's flash forward from studying in Melbourne in the 80s to working in Malaysia in the year 2000. At that point in time, my company was planning on developing their next generation web application server, called PHPLens.
An application server is a professional software framework which makes it easier for programmers to build high quality software modules. We also wanted PHPLens to support as many databases as possible. That was the reason why we decided to open source our database abstraction library. Contributions from the programming community were encouraged so that we could support more databases. And as this was the 3rd database abstraction library I had developed in my career, I had some meaningful experience in this area. Other developers liked it and today the library has become very popular world-wide and is in use by thousands of developers.
I have been working with and supporting the ADOdb abstraction library for over 9 years. I can tell you working on open source is sometimes not fun. You work for hours to implement some feature and then the feedback you get is that it's not very useful. People will disagree with you. You also get cranky people emailing you in broken English to fix their problems urgently. And if you misunderstand them, it just gets worse. To survive, you need to be passionate about your work, really listen to people (which isn't easy in an email exchange) and be committed to excellence.
I would like to show you now a presentation I did on ADOdb a few years ago.
[presentation here]
In closing, I would like to ask: how can the Malaysian Free & Open Source Software movement advance further? Actually I think we are doing a good job. I see a lot of local companies have already switched to using Open Office or running Linux, Apache, MySQL, PHP for their web-sites. As I mentioned before, the real factors we need to look into are still economic: your take-home pay. What we need is more demand for people with the right skills to support this open source infrastructure, and an ecosystem where the pay is attractive. We need to transition from the idea that "free software is cheap" to "free software is cost-effective".
There is dignity in work, and people deserve to be rewarded. Thank you.
- Monitoring and logging CPU Utilization of Virtual Machines in Xen
Oct 6 update: Added logging of disk [d] and network [n] info.
Oct 4 update: Added availability option. Now uses xentop internally.
Oct 2 update: Added graphing to xenstat.pl. xenstat.pl now detects Guest VM start/shutdown and resets itself. Number of vcpus also shown. Misc bug fixes.
You can download xenstat.pl here.
Syntax
    perl xenstat.pl [$mode] [$intervalsecs=5] [$nsamples=0] [$urlToPostStats]
Quick Guide
    perl xenstat.pl                               -- generate cpu stats every 5 secs
    perl xenstat.pl 10                            -- generate cpu stats every 10 secs
    perl xenstat.pl 5 2                           -- generate cpu stats every 5 secs, 2 samples
    perl xenstat.pl d 3                           -- generate disk stats every 3 secs
    perl xenstat.pl n 3                           -- generate network stats every 3 secs
    perl xenstat.pl a 5                           -- generate cpu avail (e.g. cpu idle) stats every 5 secs
    perl xenstat.pl 3 1 http://server/log.php     -- gather 3 secs cpu stats and send to URL
    perl xenstat.pl d 4 1 http://server/log.php   -- gather 4 secs disk stats and send to URL
    perl xenstat.pl n 5 1 http://server/log.php   -- gather 5 secs network stats and send to URL
Requires xentop from Xen 3.2, or a newer xentop backported to Xen 3.1.
Usage
To use, run "perl xenstat.pl" in domain 0. The following output will be generated, with a new statistic generated every 5 seconds:
    [root@server ~]# perl xenstat.pl
    cpus=2
    40_falcon     2.67%  2.51 cpu hrs in 1.96 days ( 2 vcpu, 2048 M)
    52_python     0.24%  747.57 cpu secs in 1.79 days ( 2 vcpu, 1500 M)
    54_garuda_0   0.44%  2252.32 cpu secs in 2.96 days ( 2 vcpu, 750 M)
    Dom-0         2.24%  9.24 cpu hrs in 8.59 days ( 2 vcpu, 564 M)
                         40_falc  52_pyth  54_garu  Dom-0   Idle
    2009-10-02 19:31:20      0.1      0.1     82.5   17.3    0.0  *****
    2009-10-02 19:31:25      0.1      0.1     64.0    9.3   26.5  ****
    2009-10-02 19:31:30      0.1      0.0     50.0   49.9    0.0  *****
In the above output, the first few lines summarise the CPUs and running domains. Then we have the statistics generated every 5 seconds. At the end of each line is a simple graph.
5 stars means 90% or over CPU utilisation, 4 stars is 70% or over, etc.
You can also define the interval to poll (in seconds), and the number of samples just like vmstat:
    [root@server ~]# perl xenstat.pl 3 2
    cpus=2
    40_falcon     2.67%  2.51 cpu hrs in 1.96 days ( 2 vcpu, 2048 M)
    52_python     0.24%  748.07 cpu secs in 1.79 days ( 2 vcpu, 1500 M)
    54_garuda_0   0.44%  2258.38 cpu secs in 2.96 days ( 2 vcpu, 750 M)
    Dom-0         2.24%  9.24 cpu hrs in 8.59 days ( 2 vcpu, 564 M)
                         40_falc  52_pyth  54_garu  Dom-0   Idle
    2009-10-01 12:14:59      0.0      0.0      1.7    5.7   92.5
    2009-10-01 12:15:02      0.0      0.0      0.3   10.4   89.3  *
    [root@server ~]#
Logging Using REST web service
To log the CPU utilisation using the Perl script, I didn't want to install a database client in Dom-0. So I added another parameter, a URL to a web server to call with the CPU info as GET parameters. I assume wget is installed in your Dom-0.
    [root@server ~]# perl xenstat.pl 10 1 http://192.168.0.1/
    cpus=2
    54_garuda_0   0.49%  165.81 cpu sec over 3.62 days (2 vcpu, 750 M)
    59_gyrfalcon  0.62%  69.03 cpu sec over 0.80 days (2 vcpu, 2000 M)
    Dom-0         1.57%  2.15 cpu hrs over 3.62 days (2 vcpu, 564 M)
    --10:46:42-- http://192.168.0.1/?54_garuda_0=0.1&59_gyrfalcon=2.1&Dom%2D0=2.2&
    Connecting to 192.168.0.1:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 498 [text/html]
    Saving to: `STDOUT'
    100%[============================================>] 498 --.-K/s in 0s
    10:46:42 (67.8 MB/s) - `-' saved [498/498]
    2009-09-29 10:46:42 0.1 2.1 2.2 95.6
This will accumulate statistics for 10 seconds, then send them to the above URL in this format: http://192.168.0.1/?54_garuda_0=0.1&59_gyrfalcon=2.1&Dom%2D0=2.2&. This allows you to log the data using a REST-ful web service.
Network mode [n]
Shows total network reads and writes in KBytes or MBytes for that time period.
    perl xenstat.pl n
    Network I/O (K)      52_pyth  54_garu  59_gyrf  Dom-0
    2009-10-05 19:55:08        7      979        1      3
    2009-10-05 19:55:13        6     1.2M        1      1
    2009-10-05 19:55:18        5      600        2      3
Disk IO mode [d]
Shows total read and write requests for each domain for that time period.
    perl xenstat.pl d
    Disk I/O (Reqs)      52_pyth  54_garu  59_gyrf  Dom-0
    2009-10-05 19:51:02        4        0     1317      0
    2009-10-05 19:51:07       27        0     1140      0
Availability Option [a]
Shows CPU Availability % (which is the same as CPU Idle %) instead of CPU Utilisation %.
The problem with showing CPU Utilisation occurs when you have multiple Guest VMs with different numbers of vcpus. If the CPU Utilisation of a guest VM is 50%, can you tell whether it is already capped (vcpus = 50% of physical cpus), or can it go higher? The solution is to reverse the CPU figures and view the information in terms of Available CPU % left (100 - CPU Utilisation %). The advantage is that you know when the CPUs of a guest VM are exhausted, as the figures approach zero.
In the example below, note that garuda has only 1 vcpu, which means that cpu available is capped at 50% for garuda.
    [server~ ]# xenstat a
    Output:
    -------
    cpus=2
    40_falcon     2.33%  2.53 cpu hrs in 2.26 days (2 vcpu, 2048 M)
    52_python     0.26%  940.55 cpu secs in 2.08 days (2 vcpu, 1500 M)
    54_garuda_0   1.48%  18.47 cpu secs in 0.01 days (1 vcpu, 750 M)
    Dom-0         2.28%  9.73 cpu hrs in 8.89 days (2 vcpu, 564 M)
    Available CPU %      40_falc  52_pyth  54_garu  Dom-0  CPU-free
    2009-10-07 18:25:20    100.0     49.9     99.8   99.5      99.1
    2009-10-07 18:25:22    100.0     48.2     42.1   91.7      32.0  ***
    2009-10-07 18:25:24    100.0     45.2     25.5   79.3       0.0  *****
    2009-10-07 18:25:26     99.9     50.0      0.3   99.8       0.0  *****
    2009-10-07 18:25:28    100.0     50.0     16.7   87.7       4.3  *****
    2009-10-07 18:25:30    100.0     50.0     73.7   99.8      73.3  *
Initially in the first line of statistics below everything is quiet.
With CPU Availability as the statistic, we can immediately notice that garuda has 1 vcpu (50% of 2 physical cpus) and all the others have 2 vcpus:
    Available CPU %      40_falc  52_pyth  54_garu  Dom-0  CPU-free
    2009-10-07 18:25:20    100.0     49.9     99.8   99.5      99.1
In the 2nd line, we can see:
    Available CPU %      40_falc  52_pyth  54_garu  Dom-0  CPU-free
    2009-10-07 18:25:22    100.0     48.2     42.1   91.7      32.0  ***
Now the server is getting busy (with garuda being the busiest), and the amount of CPU-free is less than the availability shown for each of the domains. This means that the python domain has 48.2% virtual idle capacity, but at that point in time only 32% of that idle capacity can be serviced.
In the 3rd line, python is heavily loaded and there is no more spare CPU capacity.
    Available CPU %      40_falc  52_pyth  54_garu  Dom-0  CPU-free
    2009-10-07 18:25:26     99.9     0.03     50.0   99.8       0.0  *****
If we were looking at it in terms of CPU utilisation, it would not be obvious that python is overloaded, as you can see if we look only at CPU usage for the same statistics as the 3rd line:
    [server~]# xenstat
                         40_falc  52_pyth  54_garu  Dom-0   Idle
    2009-10-07 18:25:26      0.1    49.97     50.0    0.2    0.0  *****
I hope this is useful for anyone using Xen. This has been a good trip down memory lane too, as I haven't coded in Perl for nearly 10 years!
Download xenstat.pl.
History
In Sept 2009, we started experimenting with the Xen hypervisor. In my testing, I have found that Linux performance is better on Xen than VMWare and we are considering it for Linux rollouts.
Normally when we roll out a new server for a customer, we have a simple PHP script installed as a cron job that runs vmstat and logs the CPU utilization of the server into our database every 5 minutes. It's very useful for benchmarking, monitoring and troubleshooting mysterious performance problems.
I needed a similar script for Xen. A search in Google revealed a Perl script by Tom Brown to record the Xen domain CPU utilisation.
However the following limitations led me to modify it:
I want total CPU utilisation to be capped at 100%, which is the way "top" works, but not the way "xm top" works.
It does not work properly with multi-core CPUs. CPU utilisation can go over 100%.
Unfortunately sleep() does not sleep for precisely the number of seconds you define, causing the CPU utilization to go over 100% again. There is some perturbation, either because Dom-0 is still virtualised or for some other reason.
There is no easy way of logging to a database.
So I rewrote parts of the script and renamed it xenstat.pl (after vmstat).
Other tools: see xentop, which can run in batch mode, but cannot post to a web server. The original script was written by Tom Brown.
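To make the 100% cap and the star graph concrete, here is a rough PHP sketch. It is not taken from xenstat.pl (which is Perl); the function names are made up for illustration, and it assumes utilisation is derived from the change in a domain's cumulative CPU seconds over the measured elapsed time. Only the 90% and 70% star thresholds come from the post; the 50/30/10 steps are a guess inferred from the sample output.

```php
<?php
// Hypothetical helpers for illustration only; not the xenstat.pl code.

/**
 * Share of total physical CPU used by one domain between two samples.
 * $elapsedSecs should be the measured wall-clock time between samples,
 * not the requested interval, because sleep() is not precise.
 */
function cpuUtilisation(float $cpuSecsBefore, float $cpuSecsAfter, float $elapsedSecs, int $physicalCpus): float
{
    if ($elapsedSecs <= 0 || $physicalCpus < 1) {
        return 0.0;
    }
    // Dividing by the number of physical CPUs makes a fully busy box read 100%.
    $pct = ($cpuSecsAfter - $cpuSecsBefore) / ($elapsedSecs * $physicalCpus) * 100;
    // Clamp to absorb small timing perturbations.
    return max(0.0, min(100.0, $pct));
}

/**
 * Star graph for total utilisation.
 * 90 and 70 are from the post; 50, 30 and 10 are assumed.
 */
function starGraph(float $totalPct): string
{
    foreach ([90, 70, 50, 30, 10] as $i => $threshold) {
        if ($totalPct >= $threshold) {
            return str_repeat('*', 5 - $i);
        }
    }
    return '';
}

// Example: 7.35 cpu secs used over a measured 5 secs on a 2-cpu host.
$used = cpuUtilisation(100.0, 107.35, 5.0, 2);
printf("%.1f%% %s\n", $used, starGraph($used));
```

Using the measured elapsed time as the divisor is what keeps the figure from drifting over 100% when sleep() overruns; the clamp is only a safety net.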
- PHP with Oracle RAC
My article on High Performance and Availability with Oracle RAC and PHP is out on the Oracle web site.
It discusses my experiences creating an Oracle Real Application Cluster and running it with PHP 5.2 and oci8 1.3 for a customer. Since that article was written, I now recommend that oci8.events be turned off in php.ini, as I've had some reliability issues with this setting.

- Boiling down your Computer Science degree into 4000 words
Four thousand words that distill what you really need to understand to build scalable multiuser servers: Server Design by Jeff Darcy.

- HTML5 Gaining Ground?
Tim O'Reilly talks about how Google bets big on HTML5. Sadly, without Internet Explorer support, a lot of this talk is moot to me. Not to say that I wouldn't salivate over using the new HTML5 canvas element. We draw a lot of flow chart graphics and they need to run on as many browsers as possible. We don't use Flash, we use Walter Zorn's excellent jsgraphics library. It's not very fast, but hey, that's why we have multi-gigahertz PCs on our desktops and on our laps.

- The State of Solid State Devices for Databases
Recently I read in AnandTech a good article on Solid State Devices (SSD). It certainly blew away many misconceptions I had about SSD.
From a professional point of view, my main interest would be how databases are affected by the following characteristics of SSD:
Both sequential and random reads are fast. Data is read and written with a granularity of 4K. In other words, to modify 1 byte, you still have to write 4K.
Writes require the block to be erased first. A block is typically 512K. That means if there are no erased (also called trimmed) blocks, you need to erase one first, and erasing is a s-l-o-w operation.
You can only erase a block 10,000 times before it stops working.
Good quality SSD controllers with onboard caches and highly parallel architectures make a big difference.
From this summary, it appears that the SSD is ideal for relatively static data, and we can selectively put certain parts of the database on SSD. Examples include:
The typical publishing web-site, where articles and messages are rarely edited more than a few times.
Systems with large amounts of static data, e.g. multi-player online games such as the EVE Online case-study.
For transaction processing systems it depends on the usage. Assume the table fits into 100 blocks of 512K, the table receives 10,000 block updates a day, and the updates are distributed evenly across all records. Each block is then rewritten about 100 times a day, so it reaches the 10,000-erase limit after about 100 days (this might be acceptable if SSD was sufficiently cheap). And even for data warehouse applications with relatively static data, B-tree index rebalancing will cause frequent rewrites of indexes.
It also appears that operations characterized by sequential writes such as transaction logging should continue to be placed on hard disks.
For some information on potential database performance, see these articles testing SSD with MySQL, DB2 and PostgreSQL. Also see Windows 7 Support and Q&A for Solid-State Drives.
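The 100-day figure in the transaction processing example above is simple arithmetic, and a small PHP sketch makes it easy to try other numbers. The function name is made up for illustration. It assumes every update costs exactly one erase of one block, and it ignores wear levelling, caching and write amplification in the controller, so treat the result as a rough estimate only.

```php
<?php
// Back-of-envelope estimate only: assumes one block erase per update and a
// perfectly even spread of updates over the blocks.
function ssdLifetimeDays(int $eraseLimit, int $updatesPerDay, int $blocks): float
{
    if ($updatesPerDay <= 0 || $blocks <= 0) {
        throw new InvalidArgumentException('updatesPerDay and blocks must be positive');
    }
    $erasesPerBlockPerDay = $updatesPerDay / $blocks;
    return $eraseLimit / $erasesPerBlockPerDay;
}

// Figures from the post: 10,000 erase cycles, 10,000 updates a day, 100 blocks.
echo ssdLifetimeDays(10000, 10000, 100), " days\n";
```

Doubling the update rate halves the estimate, while spreading the same table over more blocks stretches it proportionally.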

- Memories are made of Squid...
Apache is not a particularly fast web server. A single Apache server doesn't handle a mix of static and dynamic data particularly well. Ideally, static data such as gifs, pngs and html pages should be cached in memory for quick access; Apache doesn't do this. And the prefork design of Apache (where we have a simple reliable parent process managing multiple worker processes that do the real work) makes it exceptionally robust, but the overhead of having these parent and child processes makes things run slower.
Squid is a web proxy accelerator. What it does is make Apache look good -- downloads magically become faster because of the caching of static files, and the overhead of connection setup and keep-alives is offloaded from Apache. When a request for a .php page is made, Squid will pass the request to Apache and return the results.
Since 2004, we have had a customer running an Intranet system with Apache 1.3, PHP 4.3, Oracle and Squid. This system runs on a 16 CPU Sun E20K mainframe and has over 3000 users logged in every day.
A few months ago, we upgraded them to PHP 5.2. While planning for the upgrade, I did some benchmarks and found that PHP 5.2 was about 30-40% faster than PHP 4.3. So I confidently recommended that we disable Squid...
On the morning of the rollout, I saw to my horror the CPU utilisation surge, from 50%, to 60%, to 70%, until it hit 98% -- and it stayed there! Only then did I remember that in my testing back in 2004, Squid had improved performance significantly more than an upgrade to PHP5 ever would.
At eleven, we started Squid and modified Apache to listen on port 3000 again. CPU utilisation dropped from 98% to 40%. Squid had saved the day again.
The moral of the story: never underestimate the power of a good squid.
The server setup is as below, with all software running on a single E20K Solaris server:
    Squid listens on port 80 ---- Apache on port 3000 (http) ---- Apache Children ---- Oracle 10g
                                  and port 443 (https)            running PHP 5.2
When web browsers use https for login authentication, they connect directly to Apache. When users are accessing normal data on port 80, Squid will forward the request to Apache, which is listening on port 3000. Squid is set up to cache .png, .gif, .jpg, .js, .htm, .html and .css files.
- ADOdb Active Record and the art of redesign
Merry Christmas and Happy New Year everyone. Looking forward to the new year as I expect to be a father in January :) Let's now talk a little bit about parenting and the past:
Since 2006 ADOdb has supported Active Record, the object-oriented paradigm for processing records using SQL. One of the most powerful features of Active Record is the ability to define parent-child relationships. The old way was:
    $person = new person();
    $person->HasMany('children', 'person_id');
    $person->Load("id=1");
Where "persons" is the parent table, "children" is the child table and "children.person_id" is a field in "children" pointing to "persons". All the children of the person with id=1 would be dynamically loaded into the array $person->children when the property was accessed (lazy loading).
This had many limitations, as was pointed out by Arialdo Martini in this post.
Firstly, it was confusing to the programmer. Should HasMany() be called every time you create a new person()? The answer was no, it's global, but the implementation made it look like it was local to the instance. The HasMany function really should be defined statically, before new person() is used.
Another problem was that you could not override the class of the child objects. So you couldn't modify the behaviour of child objects easily.
My objective was to fix all this, and still keep backward compatibility so your old code continued to work. The good news is that all the metadata to keep track of all the object-table relationships could still be reused. The problem was one of a weak API, but the internals were sound.
The solution implemented in ADOdb 5.07 was to create a new set of static functions that override the default behaviour:
ClassHasMany
The new way defines the relationship in a static function, which makes it clearer that it only needs to be called once in your init php code:
    class person extends ADODB_Active_Record{}
    ADODB_Active_Record::ClassHasMany('person', 'children','person_id');
    $person = new person();
    $person->Load("id=1");
TableHasMany
One of the things that I try to do in ADOdb is maintain backward compatibility. You are able to override the defaults of Active Record (id is the primary key, the name of the table is the plural version of the class name). So if the table name of the parent is not "persons", but "people", you can use:
    ADODB_Active_Record::TableHasMany('people', 'children','person_id');
    $person = new person();
    $person->Load("id=1");
TableKeyHasMany
The default primary key name is "id". You can override it (say "pid" is used) using:
    ADODB_Active_Record::TableKeyHasMany('people', 'pid', 'children', 'person_id');
Child Class Definition
Formerly, the child class was always an ADODB_Active_Record instance. Now you can derive the class of the children like this:
    class childclass extends ADODB_Active_Record {... };
    ADODB_Active_Record::ClassHasMany('person', 'children','person_id', 'childclass');
Works the same way with TableHasMany().
Belongs To
Analogously, there are functions ClassBelongsTo, TableBelongsTo, TableKeyBelongsTo for defining a child pointing to its parent.
Download ADOdb
ADOdb Active Record docs
- Divide-and-conquer and parallel processing in PHP
In my previous post Easy Parallel Processing in PHP, I showed you how to implement parallel batch processing using PHP and a web server. In this post, I want to discuss partitioning your tasks so that they become easily parallelized.
The strategy I prefer is divide-and-conquer. This works by recursively breaking down a problem into two or more sub-problems of the same type, until these become simple (and fast) enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.
To illustrate with an example, let's say you have millions of financial payment data records in a database you want to process in parallel using PHP:
First group your data into logical chunks that need to be processed in one transaction. If you are processing payments for lots of accounts, then grouping them by account number makes lots of sense.
Decide on how many parallel child processes you want to run. For this example, assume we are on a single dual core CPU server, so it makes sense to only run two concurrent child processes.
Split all the records by the median account (the median is a statistical term that means the middle record in a range). To make it easy to split by the median, you can use the GetMedian function shown later in this post.
From the parent process, pass one child process record 1 to median-1 and pass the second child process the median to final records as $_GET parameters. For simplicity's sake, both child processes will run the same code, but receive different parameters.
The results of the processing can be either stored in the database, or returned to the parent process by echo'ing the results in the child process.
To find the median of a set of records in a database, I have extended ADOdb, the popular PHP open source database abstraction library I maintain, with the following function defined in the ADOConnection class:
    function GetMedian($table, $field, $where = '')
    {
        // number of rows that match
        $total = $this->GetOne("select count(*) from $table $where");
        if (!$total) return false;
        // fetch the single row at the middle offset of the sorted values
        $midrow = (int) ($total/2);
        $rs = $this->SelectLimit("select $field from $table $where order by 1", 1, $midrow);
        if ($rs && !$rs->EOF) return reset($rs->fields);
        return false;
    }
If you have a Quad-Core CPU then you can call GetMedian 3 times to break up the data into 4 approximately equal parts, and pass them to 4 child processes:
    $mid = $db->GetMedian('PAYMENTS', 'ACCOUNTNO');
    if (!$mid) return 'Error';
    $lomid = $db->GetMedian('PAYMENTS', 'ACCOUNTNO', "where ACCOUNTNO < $mid");
    $himid = $db->GetMedian('PAYMENTS', 'ACCOUNTNO', "where ACCOUNTNO >= $mid");
The above GetMedian function is not particularly optimal when you need to run it multiple times on the same dataset. Improvements are left to the reader (or to a future blog entry).
PS: Another strategy for parallelization popularised by Google is Map Reduce.
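To see what the three GetMedian calls produce without needing a database, the sketch below applies the same "middle row" rule to a plain PHP array and turns the split points into four ranges. Everything here is illustrative: medianOf() and fourWaySplit() are made-up names rather than ADOdb functions, and the /jobs/payments.php URL and its from/below parameters are assumptions about how a child job might receive its range.

```php
<?php
// In-memory stand-in for GetMedian(): sorts the values and returns the one at
// offset (int)(count/2). Returns false when nothing matches.
function medianOf(array $values, ?callable $where = null)
{
    if ($where !== null) {
        $values = array_filter($values, $where);
    }
    $values = array_values($values);
    sort($values);
    $total = count($values);
    if (!$total) {
        return false;
    }
    return $values[(int) ($total / 2)];
}

// Three medians give four ranges. 'from' is inclusive, 'below' is exclusive,
// and null means the range is open on that side.
function fourWaySplit(array $accounts): array
{
    $mid = medianOf($accounts);
    if ($mid === false) {
        return [];
    }
    $lomid = medianOf($accounts, function ($a) use ($mid) { return $a < $mid; });
    $himid = medianOf($accounts, function ($a) use ($mid) { return $a >= $mid; });
    return [
        ['from' => null,   'below' => $lomid],
        ['from' => $lomid, 'below' => $mid],
        ['from' => $mid,   'below' => $himid],
        ['from' => $himid, 'below' => null],
    ];
}

// Build one (hypothetical) child job URL per range; http_build_query()
// leaves out the null bounds.
foreach (fourWaySplit(range(1, 8)) as $r) {
    echo '/jobs/payments.php?' . http_build_query($r), "\n";
}
```

Each URL could then be handed to a child job, which would translate from/below into its own WHERE clause. With very small or heavily skewed data a split point can come back as false, so a real implementation would need to check for that.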
- Easy Parallel Processing in PHP
The proliferation of multicore CPUs and the inability of our learned CPU vendors to squeeze many more GHz into their designs means that often the only way to get additional performance is by writing clever parallel software.
One problem we were having is that some of our batch processing jobs were taking too long to run. In order to speed up the processing, we tried to split the processing file in half, and let a separate PHP process run each job. Given that we were using a dual core server, each process would be able to run close to full speed (subject to I/O constraints).
Here is our technique for running multiple parallel jobs in PHP. In this example, we have two job files, j1.php and j2.php, that we want to run. The sample jobs don't do anything fancy. The file j1.php looks like this:
    <?php
    $jobname = 'j1';
    set_time_limit(0);
    $secs = 60;
    while ($secs) {
        echo $jobname,'::',$secs,"\n";
        flush(); @ob_flush(); ## make sure that all output is sent in real-time
        $secs -= 1;
        sleep(1); // pause
    }
The reason why we call flush(); @ob_flush(); is that when we echo or print, the strings are sometimes buffered by PHP and not sent until later. These two functions ensure that all data is sent immediately.
We then have a 3rd file, control.php, which does the coordination of jobs j1 and j2. This script will call j1.php and j2.php asynchronously using fsockopen in JobStartAsync(), so we are able to run j1.php and j2.php in parallel. The output from j1.php and j2.php is returned to control.php using JobPollAsync().
    <?php
    #
    # control.php
    #
    function JobStartAsync($server, $url, $port=80, $conn_timeout=30, $rw_timeout=86400)
    {
        $errno = '';
        $errstr = '';
        set_time_limit(0);
        $fp = fsockopen($server, $port, $errno, $errstr, $conn_timeout);
        if (!$fp) {
            echo "$errstr ($errno)<br />\n";
            return false;
        }
        $out = "GET $url HTTP/1.1\r\n";
        $out .= "Host: $server\r\n";
        $out .= "Connection: Close\r\n\r\n";
        stream_set_blocking($fp, false);
        stream_set_timeout($fp, $rw_timeout);
        fwrite($fp, $out);
        return $fp;
    }
    // returns false if HTTP disconnect (EOF), or a string (could be empty string) if still connected
    function JobPollAsync(&$fp)
    {
        if ($fp === false) return false;
        if (feof($fp)) {
            fclose($fp);
            $fp = false;
            return false;
        }
        return fread($fp, 10000);
    }
    ###########################################################################################
    if (1) { /* SAMPLE USAGE BELOW */
        $fp1 = JobStartAsync('localhost','/jobs/j1.php');
        $fp2 = JobStartAsync('localhost','/jobs/j2.php');
        while (true) {
            sleep(1);
            $r1 = JobPollAsync($fp1);
            $r2 = JobPollAsync($fp2);
            if ($r1 === false && $r2 === false) break;
            echo "<b>r1 = </b>$r1<br>";
            echo "<b>r2 = </b>$r2<hr>";
            flush(); @ob_flush();
        }
        echo "<h3>Jobs Complete</h3>";
    }
And the output could look like this:
    r1 = HTTP/1.1 200 OK Date: Wed, 03 Sep 2008 07:20:20 GMT Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8d X-Powered-By: Zend Core/2.5.0 PHP/5.2.5 Connection: close Transfer-Encoding: chunked Content-Type: text/html 7 j1::60
    r2 = HTTP/1.1 200 OK Date: Wed, 03 Sep 2008 07:20:20 GMT Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8d X-Powered-By: Zend Core/2.5.0 PHP/5.2.5 Connection: close Transfer-Encoding: chunked Content-Type: text/html 7 j2::60
    ----
    r1 = 7 j1::59
    r2 = 7 j2::59
    ----
    r1 = 7 j1::58
    r2 = 7 j2::58
    ----
Note that "7 j2::60" is returned by JobPollAsync(). The reason is that the response uses HTTP/1.1 chunked transfer encoding (see the Transfer-Encoding: chunked header above), which puts the length of each chunk (here 7 bytes, written in hexadecimal) on a line before the chunk data.
I hope this was helpful. Have fun!
PS: Also see Divide-and-conquer and parallel processing in PHP. Also see popen for an alternative technique.
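The "7" in front of each job line in the sample output is the chunk size from HTTP/1.1 chunked transfer encoding. If you want only the payload, something like the following sketch could strip the framing. decodeChunked() is a made-up helper, not part of control.php, and it assumes you pass it a complete body with the HTTP headers already removed. Data returned by JobPollAsync() can stop in the middle of a chunk, so in practice you would buffer the reads before decoding.

```php
<?php
// Decode a complete chunked body. Each chunk is
// "<size in hex>\r\n<data>\r\n", and a chunk of size 0 ends the body.
function decodeChunked(string $body): string
{
    $out = '';
    $pos = 0;
    $len = strlen($body);
    while ($pos < $len && ($eol = strpos($body, "\r\n", $pos)) !== false) {
        $sizeLine = substr($body, $pos, $eol - $pos);
        // Ignore optional chunk extensions after ';'
        $size = hexdec(trim(explode(';', $sizeLine)[0]));
        if ($size === 0) {
            break; // last chunk
        }
        $out .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}

// Two 7-byte chunks like the ones in the sample output, then the terminator.
$raw = "7\r\nj1::60\n\r\n7\r\nj1::59\n\r\n0\r\n\r\n";
echo decodeChunked($raw);
```

An alternative that avoids chunked framing altogether is to send an HTTP/1.0 request from JobStartAsync(), since chunked encoding is an HTTP/1.1 feature, but that depends on how the web server is configured.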
- Microsoft contributes to LGPL project for first time: ADOdb mssqlnative drivers
Last week, I got an email from Garrett Serack, Microsoft's Open Source Community Developer. Microsoft have been kind enough to donate a set of ADOdb drivers for the new MSSQL Native Extension for PHP. You can download the extension here and the ADOdb drivers here.
Garrett also mentions that ADOdb is the first LGPL project that Microsoft has ever contributed to. I quote from his email to me:
ADODB is actually the first LGPL Open Source project that Microsoft has ever contributed to. We've got a dozen or so others lined up and ready to go to other open source PHP projects (GPL, BSD and others), But ADODB was the *FIRST*. You could say that contributing to ADODB is Microsoft going from zero to one. We announced it at OSCON, (see the post at http://port25.technet.com/archive/2008/07/25/oscon2008.aspx ) along with Microsoft becoming a platinum sponsor of the Apache Software Foundation. Either of these two steps is such a good move for Microsoft, and both together, is a good sign that the Company is learning.
Thanks Garrett. Story in The Register.
PS: ADOdb is dual licensed as LGPL and BSD. Choose which license you want.
- Perception is 99% of reality
Jeff Atwood writes:
If you've used Windows Vista, you've probably noticed that Vista's file copy performance is noticeably worse than Windows XP. I know it's one of the first things I noticed. Here's the irony-- Vista's file copy is based on an improved algorithm and actually performs better in most cases than XP. So how come it seems so darn slow?
PS: Jeff adds that Vista SP1 has switched back to XP's algorithm. Duhh!
- Octalpussy
In my previous post I asked what would be the output of the following:
    echo 09," => (09) <br>";
    echo 9," => (9) <br>";
The answer is:
    0 => (09)
    9 => (9)
That's because any number preceded by 0 is treated as an octal number, and 9 is an invalid octal digit. Octal numbers are base 8, e.g.:
    Octal Value   Decimal Value
    1             1
    2             2
    3             3
    4             4
    5             5
    6             6
    7             7
    10            8
    11            9
The silly thing is that hardly anyone uses octal nowadays, but it continues to be part of the C, C++, Java and PHP standards. The mistake is also very common. C-style languages pride themselves on their terse and minimalist syntax, but this is one scenario where a language design error was probably made. Perhaps 0c should have been used to represent octal in analogy to 0x for hexadecimal, but this suggestion is sadly 35 years too late. 0 for octal is too deeply imprinted in modern compiler DNA.
PS: Here's the mistaken ADOdb bug report that started it.
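A few ways to stay out of this trap are sketched below. The main point is that a leading zero only means octal in a numeric literal written in source code; digits that arrive as strings (form input, CSV fields, zero-padded IDs) are converted as decimal. Note that the behaviour described above is what PHP did at the time of writing; as far as I know, PHP 7 and later reject an invalid octal literal such as 09 with a parse error instead of quietly evaluating it to 0, which is why the literal itself does not appear in this example.

```php
<?php
// String-to-integer conversion is decimal, so leading zeros are harmless here.
var_dump((int) '09');        // 9
var_dump(intval('09', 10));  // 9, with the base stated explicitly

// When octal really is intended, say so explicitly.
var_dump(octdec('11'));      // octal 11 is decimal 9
var_dump(011);               // octal literal, also decimal 9

// Zero-pad on output rather than writing the padding into the literal.
printf("%02d\n", 9);
```

In short, keep leading zeros in strings and format strings, and keep them out of numeric literals unless octal is really what you mean.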
- Octopussy numbers in PHP
Someone reported a bug in ADOdb, the open source db library I maintain. I went crazy for half an hour until I realised the problem. Here's a little gotcha you can try:
    echo 09," => (09) <br>";
    echo 9," => (9) <br>";
If you expect the above code to produce the same values, you are sadly mistaken. Try it. Also see the followup.

