How to find out the opcodes of a PHP script?

To find out the opcodes of a PHP script you can use the "Vulcan Logic Disassembler" (VLD), a PHP extension developed by Derick Rethans that hooks into the Zend Engine and dumps all the opcodes of a script. It is available in the PHP Extension Community Library (PECL).

PHP is an interpreted language. Every time a script is called, the PHP scripting engine (the Zend Engine) scans it and turns it into the tokens known by the PHP language (a.k.a. lexing), then parses the tokens (checks the syntax and builds a data structure), directly invokes opcode compilation routines that create PHP opcodes (bytecode), and finally executes those opcodes. Unlike many other language implementations, the parser of the Zend Engine 2 (PHP 5) doesn't build an abstract syntax tree that then gets compiled; PHP 7 and later do build one before emitting opcodes (see abstract syntax tree, parse tree and one-pass compiler).

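Knowing which opcodes are generated can help you understand why certain things don't work as expected. You can get a table of the opcodes for a script by running the following command (vld.active=1 enables the dump, vld.execute=0 prevents the script from actually being executed):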

php -d vld.active=1 -d vld.execute=0 -f script.php

Let's take a simple example:

<?php
// test.php
$a= 2*3+4;

Let's list all the tokens first:

php -r '$tokens = token_get_all(file_get_contents("test.php")); foreach($tokens as $key => $token) { if(count($token) == 3) { $tokens[$key][0] = token_name($token[0]); } } var_dump($tokens);';
array(12) {
  [0]=>
  array(3) {
    [0]=>
    string(10) "T_OPEN_TAG"
    [1]=>
    string(6) "<?php " 
    [2]=>
    int(1)
  }
  [1]=>
  array(3) {
    [0]=>
    string(12) "T_WHITESPACE"
    [1]=>
    string(1) "
"
    [2]=>
    int(2)
  }
  [2]=>
  array(3) {
    [0]=>
    string(10) "T_VARIABLE"
    [1]=>
    string(2) "$a"
    [2]=>
    int(3)
  }
  [3]=>
  string(1) "="
  [4]=>
  array(3) {
    [0]=>
    string(12) "T_WHITESPACE"
    [1]=>
    string(1) " "
    [2]=>
    int(3)
  }
  [5]=>
  array(3) {
    [0]=>
    string(9) "T_LNUMBER"
    [1]=>
    string(1) "2"
    [2]=>
    int(3)
  }
  [6]=>
  string(1) "*"
  [7]=>
  array(3) {
    [0]=>
    string(9) "T_LNUMBER"
    [1]=>
    string(1) "3"
    [2]=>
    int(3)
  }
  [8]=>
  string(1) "+"
  [9]=>
  array(3) {
    [0]=>
    string(9) "T_LNUMBER"
    [1]=>
    string(1) "4"
    [2]=>
    int(3)
  }
  [10]=>
  string(1) ";"
  [11]=>
  array(3) {
    [0]=>
    string(12) "T_WHITESPACE"
    [1]=>
    string(1) "
"
    [2]=>
    int(3)
  }
}
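
The one-liner above is hard to read, so here is a minimal sketch of the same idea as a small script. The helper name describe_tokens() is made up for this example; token_get_all() and token_name() are the standard tokenizer functions. The sample source is assumed to be the test script without its comment line.

```php
<?php
// Hypothetical helper: returns one readable line per token.
function describe_tokens(string $source): array
{
    $out = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Named tokens arrive as [token id, text, line number].
            [$id, $text, $line] = $token;
            $out[] = sprintf('%d: %s %s', $line, token_name($id), json_encode($text));
        } else {
            // Single-character tokens such as "=" or ";" are plain strings.
            $out[] = sprintf('-: %s', json_encode($token));
        }
    }
    return $out;
}

// Assumed sample input: the example script from above, without the comment.
echo implode("\n", describe_tokens("<?php\n\n\$a= 2*3+4;\n")), "\n";
```

json_encode() is used here only to make whitespace such as line breaks visible in the output.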

Now let's get the table of the opcodes:

php -d vld.active=1 test.php
Finding entry points
Branch analysis from position: 0
Return found
filename:       /home/stefan/projects/tests/test.php
function name:  (null)
number of ops:  5
compiled vars:  !0 = $a
line  # *    op        fetch ext return operands
------------------------------------------------
   3  0  >   EXT_STMT
      1      MUL                  ~0    2, 3
      2      ADD                  ~1    ~0, 4
      3      ASSIGN                     !0, ~1
   4  4    > RETURN                     1

Now let's change the script and add parentheses around the addition of 3 and 4:

<?php
$a= 2*(3+4);

The order of the opcodes changes, ADD is now executed before MUL:

number of ops:  5
compiled vars:  !0 = $a
line  # *    op        fetch ext return operands
------------------------------------------------
   3  0  >   EXT_STMT
      1      ADD                  ~0    3, 4
      2      MUL                  ~1    2, ~0
      3      ASSIGN                     !0, ~1
   4  4    > RETURN                     1
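
To make the two opcode listings more concrete, here is a toy model that executes them by hand. This is only an illustration and not how the Zend Engine executor works (the real one is written in C); the function run_ops() and the array layout of an op are assumptions made for this sketch. Slots named "~n" stand for temporaries and "!n" for compiled variables, as in the VLD output.

```php
<?php
// Toy model only, not the real Zend Engine executor.
// Each op is [opcode, result slot, operand 1, operand 2].
function run_ops(array $ops): array
{
    $slots = [];
    // Resolve an operand: slot names ("~0", "!0") are looked up, literals are used as-is.
    $value = function ($operand) use (&$slots) {
        return is_string($operand) ? $slots[$operand] : $operand;
    };
    foreach ($ops as [$opcode, $result, $op1, $op2]) {
        switch ($opcode) {
            case 'MUL':
                $slots[$result] = $value($op1) * $value($op2);
                break;
            case 'ADD':
                $slots[$result] = $value($op1) + $value($op2);
                break;
            case 'ASSIGN':
                // ASSIGN has no result slot here: it copies operand 2 into operand 1.
                $slots[$op1] = $value($op2);
                break;
            default:
                throw new InvalidArgumentException("Unknown opcode: $opcode");
        }
    }
    return $slots;
}

// $a = 2*3+4;
$first = run_ops([['MUL', '~0', 2, 3], ['ADD', '~1', '~0', 4], ['ASSIGN', null, '!0', '~1']]);
// $a = 2*(3+4);
$second = run_ops([['ADD', '~0', 3, 4], ['MUL', '~1', 2, '~0'], ['ASSIGN', null, '!0', '~1']]);

echo $first['!0'], "\n";  // prints 10
echo $second['!0'], "\n"; // prints 14
```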

Caching

The opcodes can be cached so that a script does not have to be compiled again on every request. Among others, the "Alternative PHP Cache" (APC) was available for this.

Opcode caches might also have an optimizer built in, but the APC configuration option "apc.optimization" was removed in APC 3.0.13.
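
Whether an opcode cache is active depends on the installation. The following is a minimal sketch for checking it from a script. The function name loaded_opcode_caches() is made up, and the extension names are assumptions about a typical setup: "apc" for APC and "Zend OPcache" for the cache bundled with PHP 5.5 and later.

```php
<?php
// Hypothetical helper: returns the names of known opcode cache extensions that are loaded.
// The default candidate names are assumptions; adjust them to your installation.
function loaded_opcode_caches(array $candidates = ['apc', 'Zend OPcache']): array
{
    // extension_loaded() reports whether an extension with the given name is loaded.
    return array_values(array_filter($candidates, 'extension_loaded'));
}

$caches = loaded_opcode_caches();
echo $caches
    ? 'Opcode cache(s) loaded: ' . implode(', ', $caches)
    : 'No known opcode cache loaded', "\n";
```

The result may differ between the command line and the web server, because each can use its own configuration.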

A list of all PHP tokens and a list of all the opcodes generated by the Zend Engine 2 are available on the PHP website.
