Hack and HHVM
Owen Yamauchi
Beijing Cambridge Farnham Köln Sebastopol Tokyo
Chapter 1. Introduction
For most of its history, Facebook has held internal hackathons every few months. For hackathons, engineers are encouraged to come up with ideas that aren't related to their day jobs, form teams, and try to make something cool in the span of a day or two.
One hackathon, in November 2007, resulted in an interesting experiment: a tool that could convert PHP programs into equivalent C++ programs and then compile them with a C++ compiler. The idea was that the C++ program would run a lot faster than the PHP original, since it could take advantage of all the optimization work that has gone into C++ compilers over the years.
This possibility was of great interest to Facebook. It was gaining a lot of new users, and supporting more users requires more CPU cycles. As you run out of available CPU cycles, unless you buy more CPUs, which gets very expensive, you have to find a way to consume fewer CPU cycles per user. Facebook's entire web front-end was written in PHP, and any way to get that PHP code to consume fewer CPU cycles was welcome.
Over the next seven years, the project grew far beyond its hackathon origins. As a PHP-to-C++ transformer called HPHPc, in 2009 it became the sole execution engine powering Facebook's web servers. In early 2010, it was open-sourced under the name "HipHop for PHP." And then, starting in 2010, an entirely new approach to execution (just-in-time compilation to machine code, with no C++ involved) grew out of HPHPc's codebase and eventually superseded it. This just-in-time compiler, called the HipHop Virtual Machine, or HHVM for short, took over Facebook's entire web server fleet in early 2013. The original PHP-to-C++ transformer is gone; it is not deployed anywhere and its code has been deleted.
The origin of Hack is entirely separate. Its roots are in a project that attempted to use static analysis on PHP to automatically detect potential security bugs. Fairly soon, it turned out that the nature of PHP makes it fundamentally difficult to get static analysis that's deep enough to be useful. Thus the idea of "strict mode" was born: a modification of PHP, with some features removed (such as references) and a sophisticated type system added. Authors of PHP code could opt into strict mode, gaining stronger checking of their code while retaining full interoperability.
Hack's direction since then belies its origin as a type system on top of PHP. It has gained new features with significant effects on the way Hack code is structured, like async. It has added new features specifically meant to make the type system more powerful, like collections. Philosophically, it's a different language from PHP, carving out a new position in the space of programming languages.
This is how we got where we are today: a modern, dynamic programming language with robust static typechecking, executing with just-in-time compilation on an engine with full PHP compatibility and interoperability.
What Are Hack and HHVM?
Hack and HHVM are closely related, and there has occasionally been some confusion as to what exactly the terms refer to.
Hack is a programming language. It's based on PHP, shares much of PHP's syntax, and is designed to be fully interoperable with PHP. However, it would be severely limiting to think of Hack as nothing more than some decoration on top of PHP. Hack's main feature is robust static typechecking, which is enough of a difference from PHP to qualify Hack as a language in its own right. Hack is useful for developers working on an existing PHP codebase, and has many affordances for that situation, but it's also an excellent choice for ground-up development of a new project.
Beyond static typechecking, Hack has several other features that PHP doesn't have, and most of this book is about those features: async functions, XHP, and many more. It also intentionally lacks a handful of PHP's features, to smooth some rough edges.
HHVM is an execution engine. It supports both PHP and Hack, and it lets the two languages interoperate: code written in PHP can call into Hack code, and vice versa. When executing PHP, it's intended to be usable as a drop-in replacement for the standard PHP interpreter from php.net. This book has a few chapters that cover HHVM: how to configure and deploy it, and how to use it to debug and profile your code.
Finally, separate from HHVM, there is the Hack typechecker: a program that can analyze Hack code (but not PHP code) for type errors. The typechecker is somewhat stricter than HHVM about what it will accept, although HHVM will become stricter to match the typechecker in future versions. The typechecker doesn't really have a name, other than the command you use to run it, hh_client. I'll refer to it as "the Hack typechecker" or just "the typechecker."
As of now, HHVM is the only execution engine that runs Hack, which is why the two may sometimes be conflated.
Who This Book Is For
This book is for readers who are comfortable with programming. It spends no time explaining concepts common to many programming languages, like control flow, data types, functions, and object-oriented programming.
Hack is a descendant of PHP. This book doesn't specifically explain common PHP syntax, except in areas where Hack differs, so basic knowledge of PHP is helpful. If you've never used PHP, you'll still be able to understand much of the code in this book if you have experience with other programming languages. The syntax is generally very straightforward to understand.
You don't need to have worked on a large PHP codebase. Hack is useful for codebases of all sizes, from simple stand-alone scripts to multi-million-line web apps like Facebook. There's nothing here that you won't understand if you've never worked on a complex high-traffic PHP website.
There is some material that assumes familiarity with typical web app tasks like querying relational databases and memcached. You can skip these parts if they're not relevant to you, but they require no knowledge that you wouldn't get from experience with even a small, basic web app.
I hope to make this book not just an explanation of how things are, but also of how they came to be that way. Programming language design is a hard problem; it's essentially the art of navigating hundreds of tradeoffs at once. It's also subject to a surprising range of pragmatic concerns like backward compatibility, and Hack is no exception. If you're at all interested in a case study of how one programming language made its way through an unusual set of constraints, this book should provide one.
Philosophy
There are a few principles that underlie the design of both Hack and HHVM, which can help you understand how things came to be the way they are.
Program Types
There is a single observation about programs that informs both HHVM's approach to optimizing and executing code, and Hack's approach to verifying it. That is: behind most programs in dynamically typed languages, a statically typed program is hiding.
Consider this code, which works as both PHP and Hack:
    for ($i = 0; $i < 10; $i++) {
      echo $i + 100;
    }
Although it's not explicitly stated anywhere, it's obvious to any human reader that