Q. Ethan McCallum - Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work

Year: 2012, publisher: O'Reilly Media.

Q. Ethan McCallum Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work
  • Book:
    Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work
  • Author:
    Q. Ethan McCallum
  • Publisher:
    O'Reilly Media
  • Year:
    2012

Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work: summary, description and annotation

What is bad data? Some people consider it a technical phenomenon, like missing values or malformed records, but bad data includes a lot more. In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they've recovered from nasty data problems.

From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it.

Among the many topics covered, you'll discover how to:

  • Test drive your data to see if it's ready for analysis
  • Work spreadsheet data into a usable form
  • Handle encoding problems that lurk in text data
  • Develop a successful web-scraping effort
  • Use NLP tools to reveal the real sentiment of online reviews
  • Address cloud computing issues that can impact your analysis effort
  • Avoid policies that create data analysis roadblocks
  • Take a systematic approach to data quality analysis

Bad Data Handbook
Q. Ethan McCallum
Published by O'Reilly Media

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

About the Authors

(Guilty parties are listed in order of appearance.)

Kevin Fink is an experienced biztech executive with a passion for turning data into business value. He has helped take two companies public (as CTO of N2H2 in 1999 and SVP Engineering at Demand Media in 2011), in addition to helping grow others (including as CTO of WhitePages.com for four years). On the side, he and his wife run Traumhof, a dressage training and boarding stable on their property east of Seattle. In his copious free time, he enjoys hiking, riding his tandem bicycle with his son, and geocaching.

Paul Murrell is a senior lecturer in the Department of Statistics at the University of Auckland, New Zealand. His research area is Statistical Computing and Graphics and he is a member of the core development team for the R project. He is the author of two books, R Graphics and Introduction to Data Technologies, and is a Fellow of the American Statistical Association.

Josh Levy is a data scientist in Austin, Texas. He works on content recommendation and text mining systems. He earned his doctorate at the University of North Carolina where he researched statistical shape models for medical image segmentation. His favorite foosball shot is banked from the backfield.

Adam Laiacano has a BS in Electrical Engineering from Northeastern University and spent several years designing signal detection systems for atomic clocks before joining a prominent NYC-based startup.

Jacob Perkins is the CTO of Weotta, an NLTK contributor, and the author of Python Text Processing with NLTK Cookbook. He also created the NLTK demo and API site text-processing.com, and periodically blogs at streamhacker.com. In a previous life, he invented the refrigerator.

Spencer Burns is a data scientist/engineer living in San Francisco. He has spent the past 15 years extracting information from messy data in fields ranging from intelligence to quantitative finance to social media.

Richard Cotton is a data scientist with a background in chemical health and safety, and has worked extensively on tools to give non-technical users access to statistical models. He is the author of the R packages assertive, for checking the state of your variables, and sig, to make sure your functions have a sensible API. He runs The Damned Liars statistics consultancy.

Philipp K. Janert was born and raised in Germany. He obtained a Ph.D. in Theoretical Physics from the University of Washington in 1997 and has been working in the tech industry since, including four years at Amazon.com, where he initiated and led several projects to improve Amazon's order fulfillment process. He is the author of two books on data analysis, including the best-selling Data Analysis with Open Source Tools (O'Reilly, 2010), and his writings have appeared on Perl.com, IBM developerWorks, IEEE Software, and in the Linux Magazine. He also has contributed to CPAN and other open-source projects. He lives in the Pacific Northwest.

Jonathan Schwabish is an economist at the Congressional Budget Office. He has conducted research on inequality, immigration, retirement security, data measurement, food stamps, and other aspects of public policy in the United States. His work has been published in the Journal of Human Resources, the National Tax Journal, and elsewhere. He is also a data visualization creator and has made designs on a variety of topics that range from food stamps to health care to education. His visualization work has been featured on the visualizing.org and visual.ly websites. He has also spoken at numerous government agencies and policy institutions about data visualization strategies and best practices. He earned his Ph.D. in economics from Syracuse University and his undergraduate degree in economics from the University of Wisconsin at Madison.

Brett Goldstein is the Commissioner of the Department of Innovation and Technology for the City of Chicago. He has been in that role since June of 2012. Brett was previously the city's Chief Data Officer. In this role, he led the city's approach to using data to help improve the way the government works for its residents. Before coming to City Hall as Chief Data Officer, he founded and commanded the Chicago Police Department's Predictive Analytics Group, which aims to predict when and where crime will happen. Prior to entering the public sector, he was an early employee with OpenTable and helped build the company for seven years. He earned his BA from Connecticut College, his MS in criminal justice at Suffolk University, and his MS in computer science at the University of Chicago. Brett is pursuing his PhD in Criminology, Law, and Justice at the University of Illinois-Chicago. He resides in Chicago with his wife and three children.

Bobby Norton is the co-founder of Tested Minds, a startup focused on products for social learning and rapid feedback. He has built software for over 10 years at firms such as Lockheed Martin, NASA, GE Global Research, ThoughtWorks, DRW Trading Group, and Aurelius. His data science tools of choice include Java, Clojure, Ruby, Bash, and R. Bobby holds an MS in Computer Science from FSU.

Steve Francia is the Chief Evangelist at 10gen where he is responsible for the MongoDB user experience. Prior to 10gen he held executive engineering roles at OpenSky, Portero, Takkle and Supernerd. He is a popular speaker on a broad set of topics including cloud computing, big data, e-commerce, development and databases. He is a published author, syndicated blogger (spf13.com) and frequently contributes to industry publications. Steve's work has been featured by the New York Times, Guardian UK, Mashable, ReadWriteWeb, and more. Steve is a long time contributor to open source. He enjoys coding in Vim and maintains a popular Vim distribution. Steve lives with his wife and four children in Connecticut.

Tim McNamara is a New Zealander with a laptop and a desire to do good. He is an active participant in both local and global open data communities, jumping between organising local meetups and assisting with the global CrisisCommons movement. His skills as a programmer began while assisting with the development of the Sahana Disaster Management System, and were refined helping Sugar Labs, the software which runs the One Laptop Per Child XO. Tim has recently moved into the escience field, where he works to support the research community's uptake of technology.

Marck Vaisman is a data scientist and claims he's been one before the term was en vogue. He is also a consultant, entrepreneur, master munger, and hacker. Marck is the principal data scientist at DataXtract, LLC where he helps clients ranging from startups to Fortune 500 firms with all kinds of data science projects. His professional experience spans the management consulting, telecommunications, Internet, and technology industries. He is the co-founder of Data Community DC, an organization focused on building the Washington DC area data community and promoting data and statistical sciences by running Meetup events (including Data Science DC and R Users DC) and other initiatives. He has an MBA from Vanderbilt University and a BS in Mechanical Engineering from Boston University. When he's not doing something data related, you can find him geeking out with his family and friends, swimming laps, scouting new and interesting restaurants, or enjoying good beer.
