
Vincent Granville - Developing Analytic Talent: Becoming a Data Scientist


Vincent Granville Developing Analytic Talent: Becoming a Data Scientist
  • Book:
    Developing Analytic Talent: Becoming a Data Scientist
  • Author:
    Vincent Granville
  • Publisher:
    Wiley
  • Year:
    2014

Developing Analytic Talent: Becoming a Data Scientist: summary, description and annotation

Learn the skills needed for the most in-demand tech job

Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. This guide discusses the essential skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code.

The applications are endless and varied: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one.

  • Explains the finer points of data science, the required skills, and how to acquire them, including analytical recipes, standard rules, source code, and a dictionary of terms
  • Shows what companies are looking for and how the growing importance of big data has increased the demand for data scientists
  • Features job interview questions, sample resumes, salary surveys, and examples of job ads
  • Case studies explore how data science is used on Wall Street, in botnet detection, for online advertising, and in many other business-critical situations

Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.

Chapter 1
What Is Data Science?

Sometimes, understanding what something is includes having a clear picture of what it is not. Understanding data science is no exception. Thus, this chapter begins by investigating what data science is not, because the term has been much abused and a lot of hype surrounds big data and data science. You will first consider the difference between true data science and fake data science. Next, you will learn how new data science training has evolved from traditional university degree programs. Then you will review several examples of how modern data science can be used in real-world scenarios.

Finally, you will review the history of data science and its evolution from computer science, business optimization, and statistics into modern data science and its trends. At the end of the chapter, you will find a Q&A section from recent discussions I've had that illustrate the conflicts between data scientists, data architects, and business analysts.

This chapter asks more questions than it answers, but you will find the answers discussed in more detail in subsequent chapters. The purpose of this approach is for you to become familiar with how data scientists think, what is important in the big data industry today, what is becoming obsolete, and what people interested in a data science career don't need to learn. For instance, you need to know statistics, computer science, and machine learning, but not everything from these domains. You don't need to know the details about the complexity of sorting algorithms (just the general results), and you don't need to know how to compute a generalized inverse matrix, nor even know what a generalized inverse matrix is (a core topic of statistical theory), unless you specialize in the numerical aspects of data science.


Technical Note
This chapter can be read by anyone with minimal mathematical or technical knowledge. More advanced information is presented in Technical Notes like this one, which may be skipped by non-mathematicians.


CROSS-REFERENCE You will find definitions of most terms used in this book in Chapter 8.

Real Versus Fake Data Science

Books, certificates, and graduate degrees in data science are spreading like mushrooms after the rain. Unfortunately, many are just a mirage: people taking advantage of the new paradigm to quickly repackage old material (such as statistics and R programming) with the new label "data science."

Expanding on the R programming example of fake data science, note that R is an open source statistical programming language and environment that is at least 20 years old, and is the successor of the commercial product S+. R was and still is limited to in-memory data processing and has been very popular in the statistical community, sometimes appreciated for the great visualizations that it produces. Modern environments have extended R's capabilities (working around the in-memory limitations) by creating libraries or integrating R in a distributed architecture, such as RHadoop (R + Hadoop). Of course other languages exist, such as SAS, but they haven't gained as much popularity as R. In the case of SAS, this is because of its high price and the fact that it was more popular in government organizations and brick-and-mortar companies than in the fields that experienced rapid growth over the last 10 years, such as digital data (search engine, social, mobile data, collaborative filtering). Finally, R is not unlike the C, Perl, or Python programming languages in terms of syntax (they all share the same syntax roots), and thus it is easy for a wide range of programmers to learn. It also comes with many libraries and a nice user interface. SAS, on the other hand, is more difficult to learn.

To add to the confusion, executives and decision makers building a new team of data scientists sometimes don't know exactly what they are looking for, and they end up hiring pure tech geeks, computer scientists, or people lacking proper big data experience. The problem is compounded by Human Resources (HR) staff who do not know any better and thus produce job ads that repeat the same keywords: Java, Python, MapReduce, R, Hadoop, and NoSQL. But is data science really a mix of these skills?

Sure, MapReduce is just a generic framework to handle big data by reducing data into subsets and processing them separately on different machines, then putting all the pieces back together. So it's the distributed architecture aspect of processing big data, and these farms of servers and machines are called the cloud.
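To make the idea concrete, here is a minimal single-machine sketch of the MapReduce pattern in Python, using word counting as the task. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made for this example, and the "different machines" are simulated by a plain loop over chunks.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run the three steps in sequence over a list of text chunks."""
    mapped = []
    for chunk in chunks:
        # In a real cluster, each chunk would be mapped on a separate machine.
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: two "subsets" of a larger text.
chunks = ["big data big hype", "data science is not hype"]
counts = word_count(chunks)
print(counts)
```

The point of the split is that `map_phase` needs only its own chunk, so the chunks can be processed independently; only the shuffle and reduce steps have to put the pieces back together.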

Hadoop is an implementation of MapReduce, just like C++ is an implementation (still used in finance) of object-oriented programming. NoSQL means "Not Only SQL" and is used to describe database or data management systems that support new, more efficient ways to access data (for instance, MapReduce), sometimes as a layer hidden below SQL (the standard database querying language).


CROSS-REFERENCE See Chapter 2 for more information on what MapReduce can't do.

There are other frameworks besides MapReduce: for instance, graph databases and environments that rely on the concepts of nodes and edges to manage and access data, typically spatial data. These concepts are not necessarily new. Distributed architecture has been used in the context of search technology since before Google existed. I wrote Perl scripts that perform hash joins (a type of NoSQL join, where a join is the operation of joining or merging two tables in a database) more than 15 years ago. Today some database vendors offer hash joins as a fast alternative to SQL joins. Hash joins are discussed later in this book. They use hash tables and rely on name-value pairs. The conclusion is that MapReduce, NoSQL, Hadoop, and Python (a scripting programming language great at handling text and unstructured data) are sometimes presented as Perl's successors and have their roots in systems and techniques that started to be developed decades ago and have matured over the last 10 years. But data science is more than that.
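As a rough illustration of the hash join mentioned above, the following Python sketch joins two small tables using a hash table (a `dict`) of name-value pairs. It is not the author's Perl code; the function name `hash_join`, the tables `users` and `clicks`, and the `user_id` key are all hypothetical and chosen only for this example.

```python
def hash_join(left, right, key):
    """Inner-join two lists of dict rows on `key` using a hash table."""
    # Build phase: index the rows of one table by their join key.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    # Probe phase: look up each row of the other table in the index.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample tables.
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
    {"user_id": 3, "name": "Cy"},
]
clicks = [
    {"user_id": 1, "keyword": "loans"},
    {"user_id": 1, "keyword": "cars"},
    {"user_id": 3, "keyword": "travel"},
]

result = hash_join(users, clicks, "user_id")
for row in result:
    print(row)
```

Each table is scanned once, and every lookup in the hash table takes roughly constant time, which is why this approach can be fast when the indexed table fits in memory.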

Indeed, you can be a real data scientist and have none of these skills. NoSQL and MapReduce are not new concepts; many people embraced them long before these keywords were created. But to be a data scientist, you also need the following:

  • Business acumen
  • Real big data expertise (for example, you can easily process a 50 million-row data set in a couple of hours)
  • Ability to sense the data
  • A distrust of models
  • Knowledge of the curse of big data
  • Ability to communicate and understand which problems management is trying to solve
  • Ability to correctly assess lift or ROI on the salary paid to you
  • Ability to quickly identify a simple, robust, scalable solution to a problem
  • Ability to convince and drive management in the right direction, sometimes against its will, for the benefit of the company, its users, and shareholders
  • A real passion for analytics
  • Real applied experience with success stories
  • Data architecture knowledge
  • Data gathering and cleaning skills
  • Computational complexity basics: how to develop robust, efficient, scalable, and portable architectures
  • Good knowledge of algorithms

A data scientist is also a generalist in business analysis, statistics, and computer science, with expertise in fields such as robustness, design of experiments, algorithm complexity, dashboards, and data visualization, to name a few. Some data scientists are also data strategists: they can develop a data collection strategy and leverage data to develop actionable insights that make business impact. This requires creativity to develop analytics solutions based on business constraints and limitations.
