About the Author
Garry Turkington has 14 years of industry experience, most of which has been focused on the design and implementation of large-scale distributed systems. In his current roles as VP Data Engineering and Lead Architect at Improve Digital, he is primarily responsible for the realization of systems that store, process, and extract value from the company's large data volumes. Before joining Improve Digital, he spent time at Amazon.co.uk, where he led several software development teams building systems that process Amazon catalog data for every item worldwide. Prior to this, he spent a decade in various government positions in both the UK and USA.
He has BSc and PhD degrees in Computer Science from the Queens University of Belfast in Northern Ireland and an MEng in Systems Engineering from Stevens Institute of Technology in the USA.
I would like to thank my wife Lea for her support and encouragementnot to mention her patiencethroughout the writing of this book and my daughter, Maya, whose spirit and curiosity is more of an inspiration than she could ever imagine.
About the Reviewers
David Gruzman is a Hadoop and big data architect with more than 18 years of hands-on experience, specializing in the design and implementation of scalable high-performance distributed systems. He has extensive expertise of OOA/OOD and (R)DBMS technology. He is an Agile methodology adept and strongly believes that a daily coding routine makes good software architects. He is interested in solving challenging problems related to real-time analytics and the application of machine learning algorithms to the big data sets.
He foundedand is working withBigDataCraft.com, a boutique consulting firm in the area of big data. Visit their site at www.bigdatacraft.com. David can be contacted at david@bigdatacraft.com
. More detailed information about his skills and experience can be found at http://www.linkedin.com/in/davidgruzman.
Muthusamy Manigandan is a systems architect for a startup. Prior to this, he was a Staff Engineer at VMWare and Principal Engineer with Oracle. Mani has been programming for the past 14 years on large-scale distributed-computing applications. His areas of interest are machine learning and algorithms.
Vidyasagar N V has been interested in computer science since an early age. Some of his serious work in computers and computer networks began during his high school days. Later, he went to the prestigious Institute Of Technology, Banaras Hindu University, for his B.Tech. He has been working as a software developer and data expert, developing and building scalable systems. He has worked with a variety of second, third, and fourth generation languages. He has worked with flat files, indexed files, hierarchical databases, network databases, relational databases, NoSQL databases, Hadoop, and related technologies. Currently, he is working as Senior Developer at Collective Inc., developing big data-based structured data extraction techniques from the Web and local information. He enjoys producing high-quality software and web-based solutions and designing secure and scalable data systems. He can be contacted at <>
.
I would like to thank the Almighty, my parents, Mr. N Srinivasa Rao and Mrs. Latha Rao, and my family who supported and backed me throughout my life. I would also like to thank my friends for being good friends and all those people willing to donate their time, effort, and expertise by participating in open source software projects. Thank you, Packt Publishing for selecting me as one of the technical reviewers for this wonderful book. It is my honor to be a part of it.
www.PacktPub.com
Support files, eBooks, discount offers and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at > for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
Why Subscribe?
- Fully searchable across every book published by Packt
- Copy and paste, print and bookmark content
- On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Preface
This book is here to help you make sense of Hadoop and use it to solve your big data problems. It's a really exciting time to work with data processing technologies such as Hadoop. The ability to apply complex analytics to large data setsonce the monopoly of large corporations and government agenciesis now possible through free open source software ( OSS ).
But because of the seeming complexity and pace of change in this area, getting a grip on the basics can be somewhat intimidating. That's where this book comes in, giving you an understanding of just what Hadoop is, how it works, and how you can use it to extract value from your data now.