Copyright 2012-2013, Jason Hickey, Anil Madhavapeddy and Yaron Minsky. Licensed under CC BY-NC-ND 3.0 US.
Prologue
Why OCaml?
The programming languages that you use affect the software you create. They influence your software's reliability, security and efficiency, and how easy it is to read, refactor, and extend. The languages you know can also deeply affect how you think about programming and software design.
But not all ideas about how to design a programming language are created equal. Over the last 40 years, a few key language features have emerged that together form a kind of sweet-spot in language design. These features include:
Garbage collection for automatic memory management, now a feature of almost every modern high-level language.
First-class functions that can be passed around like ordinary values, as seen in JavaScript and C#.
Static type-checking to increase performance and reduce the number of runtime errors, as found in Java and C#.
Parametric polymorphism, which enables the construction of abstractions that work across different datatypes, similar to generics in Java and C# and templates in C++.
Good support for immutable programming, i.e., programming without making destructive updates to data structures. This is present in traditional functional languages like Scheme, and is also found in distributed big data frameworks like Hadoop.
Automatic type inference to avoid having to laboriously define the type of every single variable in a program and instead have them inferred based on how a value is used. Available in C# with implicitly typed local variables and in a limited form in C++11 with its auto keyword.
Algebraic datatypes and pattern matching to define and manipulate complex data structures. Available in Scala and F#.
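As a rough illustration of how several of these features look when they sit side by side, here is a minimal OCaml sketch. It is not taken from the original text, and the names shape, area and map_all are invented for this example only.

```ocaml
(* Algebraic datatype: a shape is exactly one of these two cases. *)
type shape =
  | Circle of float             (* radius *)
  | Rectangle of float * float  (* width and height *)

let pi = 4.0 *. atan 1.0

(* Pattern matching: one branch per case. No type annotations are
   written; the compiler infers area : shape -> float. *)
let area s =
  match s with
  | Circle r -> pi *. r *. r
  | Rectangle (w, h) -> w *. h

(* Parametric polymorphism and first-class functions: map_all takes a
   function f as an ordinary argument and works for any element types.
   Its inferred type is ('a -> 'b) -> 'a list -> 'b list. *)
let rec map_all f = function
  | [] -> []
  | x :: rest -> f x :: map_all f rest

(* Immutable programming: map_all builds a new list and leaves its
   input untouched. *)
let shapes = [ Rectangle (2.0, 3.0); Rectangle (1.0, 1.0) ]
let areas = map_all area shapes
let doubled = map_all (fun x -> x * 2) [ 1; 2; 3 ]
```

The same map_all is applied once to a list of shapes and once to a list of integers without any annotations, and the memory for all of these values is reclaimed by the garbage collector rather than freed by hand.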
Some of you will know and love these features, and others will be completely new to them. Most of you will have seen some of them in other languages that you've used. As we'll demonstrate over the course of this book, there is something transformative about having them all together and able to interact in a single language. Despite their importance, these ideas have made only limited inroads into mainstream languages, and when they do arrive there, like higher-order functions in C# or parametric polymorphism in Java, it's typically in a limited and awkward form. The only languages that completely embody these ideas are statically typed functional programming languages like OCaml, F#, Haskell, Scala and Standard ML.
Among this worthy set of languages, OCaml stands apart because it manages to provide a great deal of power while remaining highly pragmatic. The compiler has a straightforward compilation strategy that produces performant code without requiring heavy optimization and without the complexities of dynamic JIT compilation. This, along with OCaml's strict evaluation model, makes runtime behavior easy to predict. The garbage collector is incremental, letting you avoid large GC-related pauses, and precise, meaning it will collect all unreferenced data (unlike many reference-counting collectors), and the runtime is simple and highly portable.
All of this makes OCaml a great choice for programmers who want to step up to a better programming language, and at the same time get practical work done.
A brief history from the 1960s
OCaml was written in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez and Didier Rémy at INRIA in France. It was inspired by a long line of research into ML starting in the 1960s, and continues to have deep links to the academic community.
ML was originally the meta language of the LCF proof assistant released by Robin Milner in 1972 (at Stanford, and later at Cambridge). ML was turned into a compiler in order to make it easier to use LCF on different machines, and gradually turned into a fully fledged system of its own by the 1980s.
The first implementation of Caml appeared in 1987, initially created by Ascander Suarez and later continued by Pierre Weis and Michel Mauny. In 1990, Xavier Leroy and Damien Doligez built a new implementation called Caml Light that was based on a bytecode interpreter with a fast sequential garbage collector. Over the next few years useful libraries appeared, such as Michel Mauny's syntax manipulation tools, and this helped promote the use of Caml in education and research teams.
Xavier Leroy continued extending Caml Light with new features, which resulted in the 1995 release of Caml Special Light. This improved the executable efficiency significantly by adding a fast native code compiler that made Caml's performance competitive with mainstream languages such as C++. A module system inspired by Standard ML also provided powerful facilities for abstraction and made larger-scale programs easier to construct.
The modern OCaml emerged in 1996, when a powerful and elegant object system was implemented by Didier Rémy and Jérôme Vouillon. This object system was notable for supporting many common OO idioms in a statically type-safe way, whereas the same idioms required runtime checks in languages such as C++ or Java. In 2000, Jacques Garrigue extended OCaml with several new features, including polymorphic methods, polymorphic variants, and labeled and optional arguments.
The last decade has seen OCaml attract a significant user base. Language improvements have been steadily added to support the growing commercial and academic codebases written in OCaml. First-class modules, Generalized Algebraic Data Types (GADTs) and dynamic linking have improved the flexibility of the language. There is also fast native code support for x86_64, ARM, PowerPC, and Sparc, making OCaml a good choice for systems where resource usage, predictability, and performance all matter.
The Core Standard Library
A language on its own isn't enough. You also need a rich set of libraries to base your applications on. A common source of frustration for those learning OCaml is that the standard library that ships with the compiler is limited, covering only a small subset of the functionality you would expect from a general-purpose standard library. That's because the standard library isn't a general-purpose tool; it was developed for use in bootstrapping the compiler, and is purposefully kept small and simple.
Happily, in the world of open-source software nothing stops alternative libraries from being written to supplement the compiler-supplied standard library, and this is exactly what the Core distribution is.
Jane Street, a company that has been using OCaml for more than a decade, developed Core for its own internal use, but designed it from the start with an eye towards being a general-purpose standard library. Like the OCaml language itself, Core is engineered with correctness, reliability and performance in mind.
Core is distributed with syntax extensions which provide useful new functionality to OCaml, and there are additional libraries such as the Async network communications library that extend the reach of Core into building complex distributed systems. All of these libraries are distributed under a liberal Apache 2 license to permit free use in hobby, academic and commercial settings.