Foreword
Nearly 30 years ago, when I started my career, a 10 MB upgrade on a hard-disk drive was a big purchase and had to go through many approvals in the enterprise. The drawing office of a medium-sized engineering enterprise stored its drawings in this extra-large storage! Over the years, storage became cheaper and bigger. The supply side bore out Moore's law and its variations accurately.
Much more has happened on the demand side though. User organizations have realized the potential of data and analytics. So, the amount of data generated at each level in the enterprise has gone up much more steeply. Some of this data comes through well-defined processes; a large majority of it, however, comes through numerous unstructured forms and, as a result, ends up as unstructured data. Analytics tried to keep pace and mostly succeeded. However, the diversity of both the data and the desired analytics demands newer and smarter methods for working with the data. The Pig platform surely is one of these methods. Nevertheless, the power of such a platform is best tapped by extending it efficiently. Extending requires great familiarity with the platform. More importantly, extending is fun when the process of building such extensions is easy.
The Pig Latin platform offers great simplicity. However, a practitioner's advice is immensely valuable in applying this simplicity to an enterprise's own requirements. This is where I find this book to be very apt. It makes you productive with the platform pretty quickly through very well-researched design patterns. This helps simplify programming in Hadoop and create complex end-to-end enterprise-grade Big Data solutions through a building-block and best-pattern approach.
This book covers the journey of Big Data from the time it enters the enterprise to its eventual use in analytics, either in the form of a dashboard or a predictive model.
I particularly liked the presentation of the content. You need not go sequentially through the book; you can go straight to the pattern of your interest, skipping some of the preceding content. The fact that every pattern you see in this book will be relevant to you at some point in your journey with Big Data should be a good reason to spend time with the other patterns as well. The simplicity of the quoted examples puts the subject in the right perspective, in case you have already browsed through some pages and felt that the examples were not exactly from your domain.
Most likely, you will find a few patterns that exactly fit your requirement. So go ahead, adopt them, and gain productivity right away.
As I write this foreword, the world is still struggling with analyzing incomprehensibly large data, which is like trying to locate a passenger plane that went missing in the sky! This is the way things seem to work. Just when we think we have all the tools and technologies, we realize that we need much more power than what we have available today. Extending this, one would realize that data (creation, collection, and so on) and analytics will both play an extremely important role in our future. A knowledge tool that helps us move toward this future should always be welcomed, and what could be a better tool than a good book like this!
I had a very enriching experience while working with Pradeep earlier in my career. I spotted talent in him that was beyond the ordinary. However, in an environment that is driven primarily by customer projects and where technologies and platforms are defined by the customer, I must admit that we did not give him sufficient room to show his creativity in designing new technologies. Even here, I fondly recollect a very creative piece of work by Pradeep and his colleagues on the distributed processing of a huge vector map dataset. This monster of a job would run overnight on many desktop systems that were otherwise lying unused in our organization. A consolidation engine would later stitch up the results from individual systems to make one seamless large dataset. This might look very trivial today, but more than a decade ago, it was a big innovation that helped greatly compress our release cycles.
Throughout the years, he has pursued his passion for using machine learning on Big Data to solve complex problems and find answers that touch human lives. Possessing a streak of hard-to-hide innovativeness, Pradeep is bold enough to think beyond what is possible. His work on computational linguistics (NLP) and deep-learning techniques to build expert systems is an example of this.
That he made a transition from being the lead of a development-focused team to an established technology author makes me immensely pleased. His constant and unlimited appetite for knowledge is something to emulate for people like me, who are in the technology space! Although not directly related to this book, it is appropriate that I also mention his strong value system as an individual. This quality is what makes him a successful professional, a great leader, and a guru to learn from!
He was kind enough to ask me to review this book. However, the boss in me jumped out and tried to grill him as I often did when he worked in my team. He responded very positively to my critique, which at times was harsh when I look back at it! For you see, both of us share a common belief that it is better to find the existing errors and potential improvements in our processes ourselves, and not simply let them reach our customers or you, the audience of this book.
I have always felt that a good book can be authored only with a specific end-user profile in mind. A book written for beginners may not appeal to a professional at all. The opposite of this is even truer. However, this work by Pradeep benefits beginners and professionals equally well. This is the biggest difference that I found in this book.