Pete Warden - Big Data Glossary


Book:
Big Data Glossary
Author:
Pete Warden
Publisher:
O'Reilly Media
Genre:
Books / Computer
Year:
2011

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment. This handy glossary also includes a chapter of key terms that help define many of these tool categories:

NoSQL Databases: Document-oriented databases using a key/value interface rather than SQL

MapReduce: Tools that support distributed computing on large datasets

Storage: Technologies for storing data in a distributed way

Servers: Ways to rent computing power on remote machines

Processing: Tools for extracting valuable information from large datasets

Natural Language Processing: Methods for extracting information from human-created text

Machine Learning: Tools that automatically perform data analyses, based on results of a one-off analysis

Visualization: Applications that present meaningful data graphically

Acquisition: Techniques for cleaning up messy public data sources

Serialization: Methods to convert data structure or object state into a storable format


Big Data Glossary

Pete Warden

Editor

Mike Loukides

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Big Data Glossary, the image of an elephant seal, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

O'Reilly Media

Preface

There's been a massive amount of innovation in data tools over the last few years, thanks to a few key trends:

Learning from the Web

Techniques originally developed by website developers coping with scaling issues are increasingly being applied to other domains.

CS+?=$$$

Google has proven that research techniques from computer science can be effective at solving problems and creating value in many real-world situations. That's led to increased interest in cross-pollination and investment in academic research from commercial organizations.

Cheap hardware

Now that machines with a decent amount of processing power can be hired for just a few cents an hour, many more people can afford to do large-scale data processing. They can't afford the traditional high prices of professional data software, though, so they've turned to open source alternatives.

These trends have led to a Cambrian explosion of new tools, which means that when you're planning a new data project, you have a lot to choose from. This guide aims to help you make those choices by describing each tool from the perspective of a developer looking to use it in an application. Wherever possible, this will be from my firsthand experiences or from those of colleagues who have used the systems in production environments. I've made a deliberate choice to include my own opinions and impressions, so you should see this guide as a starting point for exploring the tools, not the final word. I'll do my best to explain what I like about each service, but your tastes and requirements may well be quite different.

Since the goal is to help experienced engineers navigate the new data landscape, this guide only covers tools that have been created or risen to prominence in the last few years. For example, Postgres is not covered because it's been widely used for over a decade, but its Greenplum derivative is newer and less well-known, so it is included.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This icon signifies a tip, suggestion, or general note.

Caution

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Big Data Glossary by Pete Warden (O'Reilly). Copyright 2011 Pete Warden, 978-1-449-31459-0.

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Safari Books Online

Note

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O'Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O'Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

800-998-9938 (in the United States or Canada)

707-829-0515 (international or local)

707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/9781449314590

To comment or ask technical questions about this book, send email to:

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Watch us on YouTube: http://www.youtube.com/oreillymedia

Chapter 1. Terms

These new tools need some shorthand labels to describe their properties, and since they're likely to be unfamiliar to traditional database users, I'll start off with a few definitions.

Document-Oriented

In a traditional relational database, the user begins by specifying a series of column types and names for a table. Information is then added as rows of values, with each of those named columns as a cell of each row. You can't have additional values that weren't specified when you created the table, and every value must be present, even if it's as a NULL value.
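As a rough illustration of this fixed-schema behavior, here is a minimal sketch using Python's built-in sqlite3 module. The users table and its column names are hypothetical, chosen only for the example, and other relational databases may report the error differently.

```python
import sqlite3

# In-memory database; the "users" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

# A row that omits a column still gets a cell for it, stored as NULL.
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
rows = conn.execute("SELECT name, email FROM users").fetchall()

# A value for a column that was never declared is rejected.
try:
    conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("bob", 42))
    undeclared_column_accepted = True
except sqlite3.OperationalError:
    undeclared_column_accepted = False
```

Here rows holds a single tuple whose email cell is None (Python's view of NULL), and the second insert fails because age was not part of the table definition.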

Document stores instead let you enter each record as a series of names with associated values, which you can picture being like a JavaScript object, a Python dictionary, or a Ruby hash. You don't specify ahead of time what names will be in each table using a schema. In theory, each record could contain a completely different set of named values, though in practice, the application layer often relies on an informal schema, with the client code expecting certain named values to be present.
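To make the idea of an informal schema concrete, the sketch below models records as plain Python dictionaries. It does not use any particular document store, and the field names and helper functions are hypothetical. It only shows how client code ends up carrying the expectations that a table definition would otherwise enforce.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"name": "alice", "email": "alice@example.com"},
    {"name": "bob", "age": 42, "tags": ["admin", "beta"]},
]

def display_name(record):
    """Client code relying on an informal schema: it expects 'name'."""
    return record["name"].title()

def contact_email(record):
    """Optional fields have to be handled by the application itself."""
    return record.get("email", "unknown")

names = [display_name(r) for r in records]
emails = [contact_email(r) for r in records]
```

Both records are accepted even though they share only the name field. A record without name would raise a KeyError in display_name, which is the kind of failure an informal schema pushes into the application layer.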
