PYTHON
For Data Science
The Ultimate Step-by-Step Guide to Learn Python In 7 Days & NLP, Data Science from Scratch with Python (Master the Basics of Data Science and Improve Artificial Intelligence)
RICHARD WILSON
Copyright 2021 by Richard Wilson
All rights reserved.
No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the publisher, except for the inclusion of brief quotations in a review.
CONTENTS
CHAPTER 1
Data Scientist: What the Job Is and How to Train for This Increasingly In-Demand Profession
Popular wisdom is clear: a data scientist is a "statistician who works in San Francisco." And for a few years now the profession has been fashionable thanks, in part, to the startup world. But data science goes much further and is becoming one of today's most promising professions.
Data fever has us hearing about this discipline everywhere, and we can't help wondering whether it's a fad or whether data scientists are here to stay. Here we review what exactly data science is, the job opportunities it offers, and the options that exist for training in it.
What is a data scientist?
Josh Wills offers another definition, one that seems much more accurate and intuitive to me: "Data scientist: a person who knows more about statistics than any programmer and, at the same time, knows more about programming than any statistician." A little more seriously, a data scientist is simply a professional dedicated to analyzing and interpreting large databases. In other words, one of the most important professionals in any internet company today.
Why has it become fashionable?
The answer was given by Javi Pastor: current technology needs not only the best talent but also data, lots of data. That is to say, the fashion for openness and the turn towards data are nothing more than the latest mask of the same old corporate spirit, always looking for the next deposit to mine. And what is valid for artificial intelligence and machine learning is valid for almost any technology.
The funny thing is that the great value placed on data contrasts with the fact that data is the most abundant resource on the planet (it is estimated that 2.5 quintillion bytes of new information are created per day). The two facts seem hard to reconcile. How is it possible that something so abundant is so valuable? If it were purely a matter of supply and demand, accumulating data should be trivial. And it is; the complex part is processing it.
Until relatively recently we simply couldn't do it. At the end of the 90s, the field of machine learning began to take shape as a discipline in its own right, the cost of working with immense amounts of data fell, and the arrival of the internet in everyday life did the rest. For a few years now we have been witnessing the first great 'democratization' of these techniques. And with it, the boom in data scientists: nobody wants to sit on an untapped gold mine.
In search of a data scientist
The problem is that, suddenly, there is great demand for a profile that until now practically did not exist. Remember that the role requires statistical knowledge that a programmer does not usually have and computing knowledge that a statistician does not usually even imagine.
Most of the time the gap has been filled by self-teaching, which supplies the basic skills that formal training programs should provide but do not. That is why, today, we find a great diversity of professional profiles in the world of data science. According to Burtch Works, 32% of active data scientists come from the world of mathematics and statistics, 19% from computer engineering, and 16% from other engineering fields.
How to train
Degrees
Today, there are some double degrees in computer engineering and mathematics (Autonomous University of Madrid, Granada, Polytechnic University of Madrid, Polytechnic University of Catalonia, Complutense, Murcia, Autonomous University of Barcelona) or in computer science and statistics (University of Valladolid), which seem the best option if we are considering this specialization. In fact, this option seems more interesting than the 'degrees in data science' that could arise in the future: the possibilities are broader, the training is more diverse, and it keeps us from being pigeonholed.
Postgraduate
Postgraduate study is a very diverse world. We can find postgraduate programs, master's degrees, and specialized courses at almost every university, along with a truly overwhelming private offering. To give some examples, there are postgraduate degrees at the UGR, the UAB, the UAM, the UPM, and the Pompeu Fabra. At this level, however, it is harder to recommend a specific course. The key is to look for something that complements our previous training and, in that sense, diversity is good news.
What postgraduate training offers that earlier training does not is the 'business orientation' component. We must not forget that most data scientists work in companies that seek to make their databases profitable, which is why a market orientation is highly recommended. In fact, many of the 'big data' master's degrees are offered by business schools such as OEI or Instituto Empresa.
MOOCS
Among the most interesting resources you can find are MOOCs (massive open online courses). In fact, we recently saw that this self-training option could have a promising future. Starting with Coursera's Big Data specialization program, we can find online courses from the best universities in the world. All this without mentioning the numerous tools for learning languages like Python or R.
Certificates and other options
There is also a series of certificates and accreditations that attest to our knowledge of data science: Certified Analytics Professional (CAP), Cloudera Certified Professional: Data Scientist (CCP: DS), EMC: Data Science Associate (EMCDSA), or more specific certificates like those of SAS. Some of these certificates have very demanding requirements, but they are a good alternative if we have already been working in this field.
Other interesting resources are the associations (such as R Hispanic or Python Spain) and informal groups such as Databeers, which are proving so successful throughout the country. It is true that the ecosystem of events and meetups in data science is only beginning to develop, but with the experience accumulated in other areas it is sure to catch up soon.
What languages should be learned?
In reality, as any initiate knows, choosing one programming language over another is always complicated. The choice involves everything from technical and educational factors to simple personal preference. What is clear is that some languages are more popular than others.
The three musketeers of Data Science
An irreplaceable
The great division
R: Around 52% of data practitioners use R for their everyday work. In its favor, it has been the statistical language par excellence for many years, and we can find code and packages for almost anything we can think of. Against it, its syntax is older, more complex, and uglier than that of the more modern languages that are gaining ground. It is the language of those who come from a scientific background.