Andreas François Vermeulen - Practical Data Science: A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets

Book:
Practical Data Science: A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets
Author:
Andreas François Vermeulen
Publisher:
Apress
Genre:
Books / Computer
Year:
2018

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets.
The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.
What You'll Learn

Become fluent in the essential concepts and terminology of data science and data engineering
Build and use a technology stack that meets industry criteria
Master the methods for retrieving actionable business knowledge
Coordinate the handling of polyglot data types in a data lake for repeatable results

Who This Book Is For
Data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers

© Andreas François Vermeulen 2018

Andreas François Vermeulen, Practical Data Science

10. Transform Superstep

Andreas François Vermeulen (1)

(1) West Kilbride, North Ayrshire, UK

The Transform superstep allows you, as a data scientist, to take data from the data vault and formulate answers to questions raised by your investigations. The transformation step is the data science process that converts results into insights.

It applies standard data science techniques and methods to attain insight and knowledge about the data, which can then be transformed into actionable decisions. Through storytelling, you can then explain to non-data scientists what you have discovered in the data lake.

Any source code or other supplementary material referenced by me in this book is available to readers on GitHub, via this book's product page, located at .

Transform Superstep

The Transform superstep uses the data vault from the process step as its source data. The transformations are tuned to work with the five dimensions of the data vault. As a reminder of what the structure looks like, see Figure 10-1.

Figure 10-1

Five categories of data

Dimension Consolidation

The data vault consists of five categories of data, with linked relationships and additional characteristics in satellite hubs.

To perform dimension consolidation, you start with a given relationship in the data vault and construct a sun model for that relationship, as shown in Figure 10-2.

Figure 10-2

T-P-O-L-E high-level design

I will cover the example of a person being born, to illustrate the consolidation process.
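The consolidation idea can be sketched in a few lines. This is only an illustration under stated assumptions: it treats the sun model as one central fact holding a key into each of the five categories, it assumes T-P-O-L-E stands for Time, Person, Object, Location, and Event, and every key, value, and the helper name consolidate is a hypothetical placeholder rather than anything taken from the data vault.

```python
# Hypothetical dimension tables, one per assumed category (T-P-O-L-E).
# All keys and values are placeholders, not real data vault content.
dimensions = {
    'Time': {'T1': 'placeholder time'},
    'Person': {'P1': 'placeholder person'},
    'Object': {'O1': 'placeholder object'},
    'Location': {'L1': 'placeholder location'},
    'Event': {'E1': 'placeholder event'},
}

# The central fact holds one key per dimension: the "rays" of the sun model.
fact = {'Time': 'T1', 'Person': 'P1', 'Object': 'O1',
        'Location': 'L1', 'Event': 'E1'}


def consolidate(fact, dimensions):
    """Resolve each dimension key stored in the fact to its dimension record."""
    return {category: dimensions[category][key]
            for category, key in fact.items()}


consolidated = consolidate(fact, dimensions)
print(consolidated)
```

The point of the layout is that the fact itself stays small, while each dimension can be maintained and queried on its own.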

Note

You will need a Python editor to complete this chapter, so please start it; then we can proceed to the data science required.

Open a new file in the Python editor and save it as Transform-Gunnarsson_is_Born.py in directory ..\VKHCG\01-Vermeulen\04-Transform. You will require the Python ecosystem, so set it up by adding the following libraries to your editor:

import sys
import os
from datetime import datetime
from datetime import timedelta
from pytz import timezone, all_timezones
import pandas as pd
import sqlite3 as sq
from pandas.io import sql
import uuid

pd.options.mode.chained_assignment = None

Find the working directory of the examples.

################################################################
if sys.platform == 'linux':
    Base=os.path.expanduser('~') + '/VKHCG'
else:
    Base='C:/VKHCG'
print('################################')
print('Working Base :',Base, ' using ', sys.platform)
print('################################')

Set up the company you are processing.

Company='01-Vermeulen'

Add the company work space:

sDataBaseDir=Base + '/' + Company + '/04-Transform/SQLite'
if not os.path.exists(sDataBaseDir):
    os.makedirs(sDataBaseDir)
sDatabaseName=sDataBaseDir + '/Vermeulen.db'
conn1 = sq.connect(sDatabaseName)

Add the data vault.

sDataVaultDir=Base + '/88-DV'
if not os.path.exists(sDataVaultDir):
    os.makedirs(sDataVaultDir)
sDatabaseName=sDataVaultDir + '/datavault.db'
conn2 = sq.connect(sDatabaseName)

Add the data warehouse.

sDataWarehousetDir=Base + '/99-DW'
if not os.path.exists(sDataWarehousetDir):
    os.makedirs(sDataWarehousetDir)
# The warehouse database belongs in the 99-DW directory, not in the data vault directory.
sDatabaseName=sDataWarehousetDir + '/datawarehouse.db'
conn3 = sq.connect(sDatabaseName)

Execute the Python code now, to set up the basic ecosystem.
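The three setup blocks above repeat one pattern: build a directory path, create it if it is missing, and open a SQLite file inside it. A minimal sketch of that pattern as a single helper follows. The helper name open_store is hypothetical, and a temporary directory stands in for the VKHCG base so the sketch leaves no files behind.

```python
import os
import sqlite3
import tempfile


def open_store(base, *parts, name):
    """Create base/parts... if needed and open the SQLite file `name` inside it."""
    directory = os.path.join(base, *parts)
    os.makedirs(directory, exist_ok=True)  # no error if the directory already exists
    path = os.path.join(directory, name)
    return path, sqlite3.connect(path)


# Assumption: a throwaway base directory instead of ~/VKHCG or C:/VKHCG.
with tempfile.TemporaryDirectory() as base:
    transform_path, conn1 = open_store(
        base, '01-Vermeulen', '04-Transform', 'SQLite', name='Vermeulen.db')
    vault_path, conn2 = open_store(base, '88-DV', name='datavault.db')
    warehouse_path, conn3 = open_store(base, '99-DW', name='datawarehouse.db')

    # Report the paths relative to the base, with forward slashes on every platform.
    paths = [os.path.relpath(p, base).replace(os.sep, '/')
             for p in (transform_path, vault_path, warehouse_path)]

    # Close the connections before the temporary directory is removed.
    for conn in (conn1, conn2, conn3):
        conn.close()

print(paths)
```

Using os.makedirs with exist_ok=True replaces the separate os.path.exists check, and keeping each database name next to its own directory makes it harder to place a file in the wrong store.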

Note

The new data structure, called a data warehouse, is in directory ../99-DW.

The data warehouse is the only data structure delivered from the Transform step. Let's look at a real-world scenario.

Guðmundur Gunnarsson was born on December 20, 1960, at 9:15 in Landspítali, Hringbraut 101, 101 Reykjavík, Iceland. Following is what I would expect to find in the data vault.

Time

You need a date and time of December 20, 1960, at 9:15 in Reykjavík, Iceland. Enter the following code into your editor (you start with a UTC time):

print('Time Category')
print('UTC Time')
BirthDateUTC = datetime(1960,12,20,10,15,0)
BirthDateZoneUTC=BirthDateUTC.replace(tzinfo=timezone('UTC'))
BirthDateZoneUTCStr=BirthDateZoneUTC.strftime("%Y-%m-%d %H:%M:%S (%Z) (%z)")
print(BirthDateZoneUTCStr)

Formulate a Reykjavík local time.

print('Birth Date in Reykjavik :')
BirthZone = 'Atlantic/Reykjavik'
BirthDate = BirthDateZoneUTC.astimezone(timezone(BirthZone))
BirthDateStr=BirthDate.strftime("%Y-%m-%d %H:%M:%S (%Z) (%z)")
# BirthDateLocal depends on BirthDate, so it is computed only after the conversion.
BirthDateLocal=BirthDate.strftime("%Y-%m-%d %H:%M:%S")
print(BirthDateStr)

You have successfully discovered the time key for the time hub and the time zone satellite for Atlantic/Reykjavik.

Time Hub: The UTC date and time that corresponds to December 20, 1960, at 9:15 in Reykjavík, Iceland, is as follows:
1960-12-20 10:15:00 (UTC) (+0000)
Time Satellite: Birth date in Reykjavík:
1960-12-20 09:15:00 (-01) (-0100)
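The relationship between the two values can be checked with the standard library alone. The sketch below is an illustration under one loud assumption: it hard-codes the -01:00 offset shown in the satellite output instead of looking up Atlantic/Reykjavik in a time zone database, so it is not a replacement for the pytz call in the chapter. Note that timezone here is datetime.timezone, not the pytz function of the same name.

```python
from datetime import datetime, timedelta, timezone

# The hub value: the birth moment expressed in UTC.
utc_birth = datetime(1960, 12, 20, 10, 15, 0, tzinfo=timezone.utc)

# Assumption: a fixed -01:00 offset, taken from the satellite output above.
reykjavik_1960 = timezone(timedelta(hours=-1))

# The satellite value: the same instant shown on the local clock.
local_birth = utc_birth.astimezone(reykjavik_1960)

utc_text = utc_birth.strftime("%Y-%m-%d %H:%M:%S (%z)")
local_text = local_birth.strftime("%Y-%m-%d %H:%M:%S (%z)")
print(utc_text)    # 1960-12-20 10:15:00 (+0000)
print(local_text)  # 1960-12-20 09:15:00 (-0100)
```

Both objects describe the same instant, which is why the hub can store the UTC value once and leave local representations to the satellites.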

Now you can save your work, by adding the following to your code.

Build a data frame, as follows:

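IDZoneNumber=str(uuid.uuid4())
# UTC date and time without zone markers, used to build the key.
BirthDateZoneStr=BirthDateZoneUTC.strftime("%Y-%m-%d %H:%M:%S")
sDateTimeKey=BirthDateZoneStr.replace(' ','-').replace(':','-')
TimeLine=[('ZoneBaseKey', ['UTC']),
          ('IDNumber', [IDZoneNumber]),
          ('DateTimeKey', [sDateTimeKey]),
          ('UTCDateTimeValue', [BirthDateZoneUTC]),
          ('Zone', [BirthZone]),
          ('DateTimeValue', [BirthDateStr])]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order.
TimeFrame = pd.DataFrame(dict(TimeLine))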
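To make the key construction concrete, here is a small standard-library sketch of how the date-time key and the identifier come out. The variable names utc_birth, date_time_key, and record are illustrative and not part of the chapter's code, and the dictionary covers only three of the columns.

```python
import uuid
from datetime import datetime, timezone

utc_birth = datetime(1960, 12, 20, 10, 15, 0, tzinfo=timezone.utc)

# Format the UTC value without zone markers, then turn spaces and colons
# into hyphens so the key contains only digits and hyphens.
utc_text = utc_birth.strftime("%Y-%m-%d %H:%M:%S")
date_time_key = utc_text.replace(' ', '-').replace(':', '-')

record = {
    'ZoneBaseKey': 'UTC',
    'IDNumber': str(uuid.uuid4()),  # random identifier, different on every run
    'DateTimeKey': date_time_key,
}
print(record['DateTimeKey'])  # 1960-12-20-10-15-00
```

The date-time key is derived from the data and is therefore repeatable, while the UUID changes on every run.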

Create the time hub.

TimeHub=TimeFrame[['IDNumber','ZoneBaseKey','DateTimeKey','DateTimeValue']]
TimeHubIndex=TimeHub.set_index(['IDNumber'],inplace=False)
sTable = 'Hub-Time-Gunnarsson'
print('\n#################################')
print('Storing :',sDatabaseName,'\n Table:',sTable)
print('\n#################################')
TimeHubIndex.to_sql(sTable, conn2, if_exists="replace")
sTable = 'Dim-Time-Gunnarsson'
TimeHubIndex.to_sql(sTable, conn3, if_exists="replace")
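One practical consequence of table names such as Hub-Time-Gunnarsson is that hand-written SQL must quote them, because an unquoted hyphen is parsed as a minus sign. The sketch below shows this with the standard sqlite3 module and an in-memory database. The column layout mirrors the hub above, but the identifier value 'hypothetical-id-1' is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway database for the sketch
table = 'Hub-Time-Gunnarsson'

# Double quotes make the hyphenated name a valid SQL identifier.
conn.execute(
    f'CREATE TABLE "{table}" '
    '(IDNumber TEXT PRIMARY KEY, ZoneBaseKey TEXT, '
    'DateTimeKey TEXT, DateTimeValue TEXT)')
conn.execute(
    f'INSERT INTO "{table}" VALUES (?, ?, ?, ?)',
    ('hypothetical-id-1', 'UTC', '1960-12-20-10-15-00',
     '1960-12-20 09:15:00 (-01) (-0100)'))

rows = conn.execute(
    f'SELECT ZoneBaseKey, DateTimeKey FROM "{table}"').fetchall()

# Without the quotes, SQLite rejects the statement as a syntax error.
try:
    conn.execute(f'SELECT * FROM {table}')
    unquoted_failed = False
except sqlite3.OperationalError:
    unquoted_failed = True

conn.close()
print(rows, unquoted_failed)
```

Keep the quoting in mind whenever you query these tables directly rather than through pandas.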

Create the time satellite.

TimeSatellite=TimeFrame[['IDNumber','DateTimeKey','Zone','DateTimeValue']]
TimeSatelliteIndex=TimeSatellite.set_index(['IDNumber'],inplace=False)
BirthZoneFix=BirthZone.replace(' ','-').replace('/','-')
sTable = 'Satellite-Time-' + BirthZoneFix + '-Gunnarsson'
print('\n#################################')
print('Storing :',sDatabaseName,'\n Table:',sTable)
print('\n#################################')
TimeSatelliteIndex.to_sql(sTable, conn2, if_exists="replace")
sTable = 'Dim-Time-' + BirthZoneFix + '-Gunnarsson'
