Learn the most common output functions through examples

Teaching of a Samurai Engineer 10: Output control functions, part 1

This time, we’ll look at the functions related to output control (the ob_* functions).
Let’s start with the most common use cases.

By far the most common use is probably to deal with the messages ‘Warning: session_start(): Cannot start session when headers already sent in’ and ‘Warning: Cannot modify header information - headers already sent by’.
You can of course rewrite the program so that nothing at all is written to standard output before the headers are sent, but depending on the code and logic this can be difficult or demand major rewrites, which is where output buffering comes in.
The code looks something like this.

// Begin buffering
ob_start();
// XXX processing
// Finally, output the buffer contents and end buffering
ob_end_flush();

Sometimes you’ll find that even though ob_start() is never called, and echo, print, var_dump and the like write to standard output before the headers are sent, no error message appears.
This is because setting output_buffering to ‘On’ in php.ini enables output buffering for every script, so things work without errors even without a call to ob_start().
However, if output_buffering is set to an integer rather than a boolean On, the buffer is flushed once the output reaches that byte count, and headers can no longer be sent after that point, so you need to be careful.
Additionally, output_buffering is a PHP_INI_PERDIR setting: it can be set in php.ini, .htaccess, httpd.conf or .user.ini, but not from within the program with ini_set(). A program that relies on a specific php.ini setup may therefore work in one environment and fail in another.
Personally, I usually write ob_start() in the program itself.
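
As a rough illustration of that habit, here is a minimal sketch that starts a buffer only when none is active, so the script behaves the same whether output_buffering is On or Off. The helper name ensure_output_buffer() is made up for this example and is not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): start a buffer only if
// no output buffer is active yet.
function ensure_output_buffer()
{
    if (ob_get_level() > 0) {
        return false; // a buffer (e.g. one enabled through php.ini) already exists
    }
    ob_start();
    return true;
}

$levelBefore = ob_get_level();
$started = ensure_output_buffer();

// The configured value can be read (but not changed) at runtime.
// It comes back as a string such as "0", "1" or "4096".
$iniValue = ini_get('output_buffering');
```

ob_get_level() returns the current nesting depth of output buffers, so a value of 0 means nothing is being buffered yet.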

Another common use is getting the output of var_dump and the like as a string.
var_dump is a very useful function, but its return type is declared as ‘: void’, and it always writes its result to standard output.
Still, when writing a program, you will sometimes want to get that result as a string so you can save it in a file or DB.

In those cases, you can use an output control function to capture it as a string.
The ob_get_clean() function is handy here (ob_get_contents() followed by ob_end_clean() works too, but ob_get_clean() is probably easier since it does both in one call).

Here’s some example code.

// Begin buffering
ob_start();
$obj = new stdClass(); // The variable you want to inspect with var_dump
ob_start(); // Begin a nested buffer to capture the var_dump output
var_dump($obj);
$s = ob_get_clean(); // Get the nested buffer's contents as a string, discard them, and end that buffer
// $s now holds the var_dump output as a string, so it can be saved to a file, DB, and so on. For example, saving to a file:
file_put_contents('./var_dump_string.txt', $s);
// Finally, output the buffer contents and end buffering
ob_end_flush();

As you can see, this technique lets you capture, as a variable, the output of functions that always write to standard output, so it’s worth knowing.
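
If you need this in several places, the capture pattern can be wrapped in a small function. The following is only a sketch, and the name dump_to_string() is an assumption made for this example.

```php
<?php
// Hypothetical helper: return what var_dump() would print, as a string.
function dump_to_string($value)
{
    ob_start();                      // capture everything printed from here on
    var_dump($value);                // writes into the buffer, not to the client
    return (string) ob_get_clean();  // fetch the contents and end the buffer
}

$s = dump_to_string(['id' => 42]);
// $s can now be logged, stored in a DB, written to a file, and so on.
```

For simple cases, print_r($value, true) and var_export($value, true) return a string directly, but var_dump() has no such option, which is why buffering is needed here.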

You can also use these functions to compress output.

This takes some explanation, so let’s begin with the technical background.

HTTP communication comes down to a request and a response, and each of these can be broken down into a header and a body.

Of these, the body of the response is the core of the contents.

That content is typically text such as HTML or JSON, which compresses well, so compressing it can reduce the amount of data transferred.

Strictly speaking, it’s a trade-off between transfer speed, the CPU cost of compressing on the server, and the CPU cost of decompressing on the client, but typically it’s more efficient to compress.
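
To get a feel for how well markup compresses, here is a small sketch using gzencode() and gzdecode() from PHP’s zlib extension (assumed to be installed). The sample HTML is invented for this example, and the actual ratio will vary with your content.

```php
<?php
// Made-up, repetitive HTML fragment standing in for a response body.
$html = str_repeat("<li class=\"item\">Hello, world</li>\n", 200);

// gzip-compress the body, as a web server or ob_gzhandler would.
$compressed = gzencode($html);

// Fraction of the original size that is left after compression.
$ratio = strlen($compressed) / strlen($html);

// Decompressing restores the original bytes exactly.
$restored = gzdecode($compressed);
```

Repetitive text like this shrinks a great deal, while data that is already compressed (JPEG or PNG images, for example) gains little.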

For this reason, web servers usually offer a setting to compress responses (typically with gzip). For example, in nginx, you can compress the response with the ‘gzip on;’ setting.

Depending on the situation, there will be times when you can’t add such settings to the web server.
In that case, by passing a parameter to the ob_start() function, you can tell PHP to apply gzip compression when it outputs the buffered contents.

Let’s write some code to try it out.

// Register ob_gzhandler as the callback function that performs gzip compression, and begin buffering
ob_start('ob_gzhandler');
// XXX processing
// Finally, output the buffer contents and end buffering
ob_end_flush();

Whether compression actually happens depends on the browser too: ob_gzhandler checks which encodings the browser says it accepts, and if gzip is supported, it compresses the output.
This callback function will be explained in more detail later.
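
Until then, the following sketch gives a rough idea of what a callback passed to ob_start() does. The function shout_handler() is a hypothetical example written for this illustration, not part of PHP: it receives the buffered output and returns what should actually be emitted.

```php
<?php
// Hypothetical callback: gets the buffered output as a string and
// returns the string that should be passed on in its place.
function shout_handler($buffer)
{
    return strtoupper($buffer);
}

ob_start();                 // outer buffer, used here only to capture the result
ob_start('shout_handler');  // inner buffer with the callback registered
echo "hello";
ob_end_flush();             // runs the callback and hands its result to the outer buffer
$result = ob_get_clean();   // "HELLO"
```

ob_gzhandler plugs into the same mechanism, except that it returns a compressed version of the buffer rather than an upper-cased one.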

I don’t personally like this last use case much, but please remember it anyway, since real work environments can still call for it.
In short, output control functions are a powerful weapon when you have to include/require problematic files that are hard to edit (for example, because they are all supplied by an outside source) and you need to keep them from interfering with your output.

In PHP, everything from the start tag <?php to the end tag ?> is considered PHP code.
Putting it another way, everything before the start tag or after the end tag is ignored by the PHP parser and output as-is.

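The first problem with this arises when there are line breaks or other whitespace after the end tag.
?>

When a file ends like this, with a blank line after the end tag, that whitespace is output as-is. (PHP swallows a single newline that immediately follows the end tag, but anything beyond that, such as the extra blank line here, is sent to the output.)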

Another issue is the BOM (byte order mark) in UTF-8 files.
These days PHP files are usually saved as UTF-8, but if you save one with a BOM, the file will begin with 3 bytes (0xEF 0xBB 0xBF) that are not visible in a text editor.
As these bytes come before the start tag, they are output as well.


Write a PHP file that produces no output of its own (for example, one containing only the start tag) and try it out.
We’ll execute this file, redirect the output to a file, and inspect that file with the od command.

Executing a PHP file without BOM gets you the following:

[ ~]$ od -x output.txt

The file size is also 0.

However, if you save it with BOM included and then execute it and look at the results…

[ ~]$ od -x output.txt
0000000 bbef 00bf
You get this. The file size will be 3 bytes.
From this you can tell the BOM is being outputted.
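
The same check can be done from PHP by looking at the first three bytes of a file. This is a minimal sketch. The function name has_utf8_bom() and the temporary files are assumptions made for the example.

```php
<?php
// Hypothetical helper: does the file start with the UTF-8 BOM (EF BB BF)?
function has_utf8_bom($path)
{
    // Read only the first 3 bytes of the file.
    return file_get_contents($path, false, null, 0, 3) === "\xEF\xBB\xBF";
}

// Two throwaway files for the demonstration: one with a BOM, one without.
$withBom = tempnam(sys_get_temp_dir(), 'bom');
$withoutBom = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($withBom, "\xEF\xBB\xBF<?php\n");
file_put_contents($withoutBom, "<?php\n");

$bomFound = has_utf8_bom($withBom);        // true
$bomNotFound = has_utf8_bom($withoutBom);  // false

unlink($withBom);
unlink($withoutBom);
```

A check like this could be run over a directory of third-party files to find the ones that will emit stray bytes when included.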

If you’re only outputting HTML, things like unnecessary whitespace and BOM won’t be an issue.
However, in situations where you’re trying to generate an image, unnecessary data will attach itself to the start of the image’s binary, which will break the image.

Since things like this can happen…

  • Save files in UTF-8, without the BOM
  • Do not write an end tag (unless you’re writing HTML in a PHP file)

This is the standard advice. However, when files with end tags, unnecessary whitespace, or a BOM come from a library supplied by another company, or from your own company’s code where these rules are routinely broken, they can be troublesome to deal with.

At times like this, the ob_start() function can fix these problems. The actual method is simple.

  • Before you include (require) files that create unnecessary output, call ob_start()
  • After you have included (required) all the files that create unnecessary output, call ob_end_clean()

That’s it.

Here is how it works…

  • ob_start() begins buffering of the output
  • when you include (require) files with unnecessary line breaks or a BOM, those bytes are output at that moment, but they go into the buffer instead of being sent
  • ob_end_clean() clears (deletes) the output buffer, and ends output buffering

Thus, the unnecessary line breaks and BOM are buffered and then discarded without ever being sent, so in the end there’s no problem.
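
The steps above can be tried out with a sketch like the following. The "library" file is generated on the fly purely for demonstration, and vendor_hello() is a made-up function. In real code the file would already exist and you would simply call ob_end_clean() without inspecting the buffer.

```php
<?php
// Build a stand-in for a problematic third-party file: it starts with a
// UTF-8 BOM and has an extra blank line after the end tag.
$path = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents(
    $path,
    "\xEF\xBB\xBF<?php\nfunction vendor_hello() { return 'hi'; }\n?>\n\n"
);

ob_start();                  // 1. begin buffering before the include
require $path;               // 2. the BOM and stray line break land in the buffer
$stray = ob_get_contents();  //    (peek at them, for demonstration only)
ob_end_clean();              // 3. throw the buffer away and end buffering

unlink($path);

// $stray holds the 3 BOM bytes plus one line break, none of which were
// sent to the client, while vendor_hello() is available as usual.
$greeting = vendor_hello();
```

Only the stray output is discarded: functions, classes and constants defined by the included file remain available afterwards.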

As you can see, output control functions are a very useful group of functions when used well.
Next time, we’ll go into a bit more detail on the individual functions.

Michiaki Furusho