This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.
Preface
I have been programming since high school, and since the early 1980s I have worked on artificial intelligence, neural networks, machine learning, and general web engineering projects. Most of my professional work is reflected in the examples in this book. These example programs were also chosen based on their technological importance, i.e., the rapidly changing technical scene of big data, the use of machine learning in systems that touch most parts of our lives, and networked devices. I then narrowed the list of topics based on a public survey I announced on my blog. Many thanks to the people who took the time to take this survey. It is my hope that the Java example programs in this book will be useful in your projects. Hopefully you will also have a lot of fun working through these examples!
Java is a flexible language that has a huge collection of open source libraries and utilities. Java gets some criticism for being a verbose programming language. I have my own coding style that is concise but may break some of the things you have learned about proper use of the language. The Java language has seen many upgrades since its introduction over 20 years ago. This book requires and uses the features of Java 8, so please update to the latest JDK if you have not already done so. You will also need to have Maven installed on your system. I also provide project files for the free Community Edition of IntelliJ IDEA.
Everything you learn in this book can be used with some effort in the alternative JVM languages Clojure, JRuby, and Scala. In addition to Java I frequently use Clojure, Haskell, and Ruby in my work.
Book Outline
This book consists of eight chapters that I believe show the power of the Java language to good effect:
- Network programming techniques for the Internet of Things (IoT)
- Natural Language Processing using OpenNLP including using existing models and creating your own models
- Machine learning using the Spark mllib library
- Anomaly Detection Machine Learning
- Deep Learning using Deeplearning4j
- Web Scraping
- Using rich semantic and linked data sources on the web to enrich the data models you use in your applications
- Java Strategies for Knowledge Management-Lite using Cloud Data Resources
The first chapter on IoT is a tutorial on network programming techniques for IoT development. I have also used these same techniques for multiplayer game development and distributed virtual reality systems, and also in the design and implementation of a world-wide nuclear test monitoring system. This chapter stands on its own and is not connected to any other material in this book.
The second chapter shows you how to use the OpenNLP library to train your own classifiers, tag parts of speech, and generally process English language text. Both this chapter and the next chapter on machine learning using the Spark mllib library use machine learning techniques.
The fourth chapter provides an example of anomaly detection using the University of Wisconsin cancer database. The chapter on web scraping is a short introduction to pulling plain text and semi-structured data from web sites.
The last two chapters are for information architects or developers who would like to develop information design and knowledge management skills. These chapters cover linked data (semantic web) and knowledge management techniques.
The source code for the examples can be found at https://github.com/mark-watson/power-java and is released under the Apache 2 license. In the examples I have tried to use only existing libraries that are either Apache 2 or MIT style licensed. In general I prefer Free Software licenses like GPL, LGPL, and AGPL, but for examples in a book where I expect readers to sometimes reuse entire example programs or at least small snippets of code, a license that allows use in commercial products makes more sense.
There is a subdirectory in this GitHub repository for each chapter, each with its own Maven pom.xml file to build and run the examples.
The chapters are independent of each other, so please feel free to skip around when reading and experimenting with the sample programs.
This book is available for purchase at https://leanpub.com/powerjava.
You might be interested in other books that I have self-published via Leanpub:
- Practical Artificial Intelligence Programming With Java
- Loving Common Lisp, or the Savvy Programmer's Secret Weapon
- Build Intelligent Systems with JavaScript
My older books published by Springer-Verlag, McGraw-Hill, Morgan Kaufmann, Apress, Sybex, M&T Press, and J. Wiley are listed on the books page of my web site.
One of the major themes of this book is machine learning. In addition to my general technical blog I have a separate blog that contains information on using machine learning and cognition technology: blog.cognition.tech and an associated website supporting cognition technology.
If You Did Not Buy This Book
I frequently find copies of my books on the web. If you have a copy of this book and did not buy it please consider paying the minimum purchase price of $4 at leanpub.com/powerjava.
Network Programming Techniques for the Internet of Things
This chapter will show you techniques of network programming relevant to developing Internet of Things (IoT) projects and products using local TCP/IP and UDP networking. We will not cover the design of hardware or designing IoT user experiences. Specifically, we will look at techniques for using local directory services to publish and look up available services and techniques for efficiently communicating using UDP, multicast and broadcast.
This chapter is a tutorial on network programming techniques that I believe you will find useful for developing IoT applications. The material on User Datagram Protocol (UDP) and multicast is also useful for network game development.
I am not covering some important material: the design of user experience and devices, and IoT devices that use local low power radios to connect cooperating devices. That said, it is worth thinking about what motivates the development of IoT devices and we will do this in the next section.
There are emerging standards for communication between IoT devices and open source projects like TinyOS and Contiki that are C language based, not Java based, so I won't discuss them. Oracle supports the Java ME Embedded profile that is used in some IoT products but in this chapter I want to concentrate on network programming techniques and example programs that run on stock Java (including Android devices).
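To give a rough idea of what UDP communication looks like in stock Java, here is a minimal sketch that sends a single datagram over the loopback interface and reads it back. It is not one of the example programs from the book's repository: the class name UdpLoopbackSketch, the method roundTrip, and the sample message are all hypothetical names chosen for illustration, and a real IoT service would normally run the sender and receiver on different devices.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the book's example code.
class UdpLoopbackSketch {

    // Sends one datagram to a receiver socket bound to the loopback
    // interface and returns the text that the receiver read.
    static String roundTrip(String message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system to pick any free port.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so time out rather than block forever.
            receiver.setSoTimeout(2000);

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(
                    payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received);
            // Only the first getLength() bytes of the buffer hold data.
            return new String(received.getData(), received.getOffset(),
                    received.getLength(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("temperature=21.5"));
    }
}
```

Note that there is no connection setup and no acknowledgement: the sender simply addresses a packet to a host and port. That low overhead is what makes UDP attractive for small, frequent messages, at the cost of having to handle lost packets yourself.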
Motivation for IoT
We are used to the physical constraints of using computing devices. When I was in high school in the mid-1960s and took a programming class at a local college, I had to make a pilgrimage to the computer center, wait for my turn to use a keypunch machine, walk over to submit my punch cards, and then stand around waiting until a printout and my punch cards were eventually returned to me. Later, interactive terminals allowed me to work in a more comfortable physical environment. Jumping ahead almost fifty years, I can now use my smartphone to SSH into my servers, watch movies, and even use a Java IDE. As I write this book, perhaps 70% of Internet use is done on mobile devices.