Python for Six Sigma
SHU LIU
Copyright 2020 Shu Liu
All rights reserved.
ISBN: 9781653888245
DEDICATION
To my wife, Gusui Zhang
CHAPTER 1: INTRODUCTION
We are now in the era of Industry 4.0, the revolution that transforms production facilities into smart factories. In these highly digitalized and automated factories, collaborative robotics, artificial intelligence, and the Internet of Things are used to make highly customized products flexibly and efficiently. Digital integration of information in this revolution gives companies real-time access to the continuous flow of data from their production lines and their customers. This abundant data flow enables companies to serve customers with innovative products, asset management, predictive maintenance, and other on-time services.
Then what about you, the individual employee? Do you want to be an active contributor in this new era or, as Yuval Noah Harari puts it, a member of the "useless class"? If you choose the former, you will need to master the skills of unlocking data and using analytic tools in creative and efficient ways. You will need to learn how to combine data from different parts of the business, get value out of data through intelligent systems design, and use real-time analytics to improve your processes continually. In a nutshell, you will need to become a data scientist and a process improvement expert.
If you are a Six Sigma practitioner, you have the skills needed to systematically solve a problem through the approach of define, measure, analyze, improve, and control (DMAIC). You define the problem, collect relevant variables, and measure them in the right fashion. You then analyze the data to identify the root causes of variations. Finally, you develop solutions and implement them. All these skills come in handy, but there is more for you to learn. To become a good data scientist in the new era, you will need to develop some knowledge of machine learning techniques, and you will need to learn at least one programming language for machine learning.
Among the many programming languages available, I would highly recommend Python, which has been ranked by several institutions as the top programming language for machine learning.
The goal of this book is to teach you how to use Python for data analytics in your Six Sigma projects. The book follows Six Sigma's DMAIC roadmap. Chapter two introduces some basic concepts of both Six Sigma and the Python language. Chapters three through seven each show you how to use Python to perform significant tasks in one of the DMAIC phases. Each chapter works through case studies, discussing the mathematics first and then giving step-by-step instructions for the Python code. Chapter eight collects all the Python code described in the book. You can find all the datasets used in this book in my GitHub repository:
https://github.com/shuliu10/python_for_six_sigma
After you have read this book, you will become an expert with knowledge in using Python for your Six Sigma projects. You will set yourself apart from your colleagues who only know some essential statistical software such as Minitab and JMP. You will have a critical new skill to prepare yourself for success in Industry 4.0.
CHAPTER 2: SIX SIGMA AND PYTHON
SIX SIGMA OVERVIEW
Six Sigma is a structured and disciplined process for reducing the output variability of a business process by identifying and minimizing the variability of its inputs. The term sigma refers to the standard deviation, a measure of the dispersion of a dataset. The sigma level indicates how well a given process performs. A Six Sigma process delivers world-class performance, with a defect rate of 3.4 defects per million opportunities (DPMO) after accounting for a 1.5 sigma shift in the mean.
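As a quick check of the 3.4 DPMO figure, the following minimal sketch converts a sigma level into defects per million opportunities. It assumes a normally distributed output, counts only the tail beyond the nearer specification limit, and applies the 1.5 sigma shift. The function name sigma_to_dpmo is an illustrative choice, not something defined elsewhere in this book.

```python
from statistics import NormalDist


def sigma_to_dpmo(sigma_level, shift=1.5):
    """Return defects per million opportunities for a given sigma level.

    Assumes a normal distribution and counts only the tail beyond the
    nearer specification limit after the mean has shifted by `shift` sigma.
    """
    tail_probability = 1.0 - NormalDist().cdf(sigma_level - shift)
    return tail_probability * 1_000_000


# Print the defect rate for several sigma levels.
for level in (3, 4, 5, 6):
    print(f"{level} sigma -> {sigma_to_dpmo(level):,.1f} DPMO")
```

With the default shift, a sigma level of 6 evaluates to roughly 3.4 DPMO, which matches the figure quoted above.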
Every process has its inputs and outputs. Its outputs will be stable if its inputs are well controlled. Six Sigma provides a set of tools for improving the inputs of a process and thereby the quality of its outputs, for example by reducing the defect rate, shortening cycle time, and minimizing costs. From the Six Sigma point of view, all processes can be defined, measured, analyzed, improved, and controlled. Therefore, a typical Six Sigma project consists of five phases: define, measure, analyze, improve, and control, commonly referred to by the acronym DMAIC.
Define phase. In the define phase, the project team defines the problem it wants to work on, the project objectives, and the project scope. The output of the define phase is a project charter that has the following major components:
Business case. A business case explains the importance of the project. It details the costs incurred from the problem and the consequences of taking no action. In the define phase, the team applies statistical techniques to select a business case from many alternatives. These techniques can include cluster analysis, Pareto analysis, time series analysis, and financial analysis.
Problem statement. The problem statement describes the nature of the problem and its impact on the company's business.
Project objective. This statement defines the expected results of the project, including how they will be measured and the target date.
Project scope. The project scope sets the project boundaries and extends one process step beyond them on each side.
Business impact. You can measure your business impact by a metric comprising a set of financial, quality, or safety measures.
Stakeholders. Stakeholders are people who have a stake in the project, including champions, process owners, and team members.
Project plan. The project plan defines the resources, timing, significant activities, and deliverables for every step of the DMAIC process.
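Of the techniques named under the business case, Pareto analysis is the simplest to illustrate. The minimal sketch below ranks problem categories by frequency and adds a cumulative percentage, so the team can see which few categories account for most of the problem. The function name pareto_table and the defect counts are hypothetical, invented only for illustration.

```python
def pareto_table(counts):
    """Sort categories by count (largest first) and add cumulative percent."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    table = []
    running = 0
    for category, count in ordered:
        running += count
        table.append((category, count, 100.0 * running / total))
    return table


# Hypothetical defect counts, invented purely for illustration.
defects = {
    "scratch": 42,
    "dent": 18,
    "misalignment": 9,
    "discoloration": 6,
    "other": 5,
}

for category, count, cumulative in pareto_table(defects):
    print(f"{category:<14} {count:>3} {cumulative:6.1f}%")
```

In this made-up dataset the first two categories account for 75 percent of all defects, which is the kind of concentration a Pareto analysis is meant to reveal.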
Measure phase. In the measure phase, the team assesses the measurement system, defines process inputs and outputs, and determines the capability of process performance. Major tasks in the measure phase are measurement system analysis, process mapping, process capability analysis, and product capability analysis.
Measurement system analysis. Measurement system analysis (MSA) is a series of tests that quantify how much of the variation in measured values comes from the measurement system itself.
For variable data, MSA addresses precision, accuracy, stability, and discrimination.
Precision refers to how close measured values on the same sample are to each other. A gage R&R study assesses precision in terms of repeatability and reproducibility. Repeatability measures variability in the measurement system caused by the measurement device. Reproducibility measures variability caused by operators.
Accuracy refers to how close a measurement is to the actual value. Bias and linearity measure accuracy. Bias examines the difference between the measured value and the actual value, while linearity determines whether bias is constant across the range of actual values.
Stability measures measurement variation over time.
Discrimination, also called resolution, is the ability of a measurement system to detect small changes in the measured characteristic. The guideline provided by the Automotive Industry Action Group (AIAG) states that gage resolution should divide the process tolerance into at least ten parts.
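The accuracy and resolution checks described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a full MSA procedure: bias is taken as the average reading minus the reference value, linearity as the least-squares slope of bias against reference value, and the resolution check applies the ten-part guideline to the tolerance. The function names and the readings are hypothetical.

```python
from statistics import mean


def bias(measurements, reference):
    """Average measured value minus the reference (actual) value."""
    return mean(measurements) - reference


def linearity_slope(references, biases):
    """Least-squares slope of bias versus reference value.

    A slope near zero suggests the bias is constant across the range.
    """
    ref_mean = mean(references)
    bias_mean = mean(biases)
    sxy = sum((r - ref_mean) * (b - bias_mean)
              for r, b in zip(references, biases))
    sxx = sum((r - ref_mean) ** 2 for r in references)
    return sxy / sxx


def resolution_ok(resolution, lower_spec, upper_spec, parts=10):
    """Check that the tolerance spans at least `parts` gage increments."""
    return (upper_spec - lower_spec) / resolution >= parts


# Hypothetical repeated readings of three reference parts (illustrative only).
readings = {
    2.0: [2.1, 2.0, 2.1],
    6.0: [6.1, 6.2, 6.0],
    10.0: [10.3, 10.2, 10.4],
}
references = list(readings)
biases = [bias(values, ref) for ref, values in readings.items()]

print("bias per reference:", [round(b, 3) for b in biases])
print("linearity slope:", round(linearity_slope(references, biases), 4))
print("resolution adequate:", resolution_ok(0.01, 9.5, 10.5))
```

In this invented example the bias grows with the reference value, so the slope is positive, which would indicate a linearity problem worth investigating.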
For attribute data, operators look at a characteristic either to determine its acceptability (yes/no) or to rate it on a scale. MSA for attribute data assesses how consistent operators are with themselves, with one another, and with known standards.
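The three kinds of consistency for attribute data can be illustrated with simple percent agreement. The sketch below is a simplification: a formal attribute agreement analysis typically also reports statistics such as kappa, which are not computed here. The function name agreement_rate, the operator names, and the pass/fail calls are all hypothetical.

```python
def agreement_rate(ratings_a, ratings_b):
    """Fraction of parts on which two rating sequences match."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical pass/fail calls on six parts (illustrative data only).
standard = ["pass", "pass", "fail", "pass", "fail", "fail"]
trials = {
    "operator_a": (
        ["pass", "pass", "fail", "pass", "fail", "fail"],  # first trial
        ["pass", "pass", "fail", "fail", "fail", "fail"],  # second trial
    ),
    "operator_b": (
        ["pass", "fail", "fail", "pass", "fail", "pass"],
        ["pass", "fail", "fail", "pass", "fail", "pass"],
    ),
}

for name, (first, second) in trials.items():
    within = agreement_rate(first, second)        # consistency with themselves
    versus_std = agreement_rate(first, standard)  # consistency with the standard
    print(f"{name}: within={within:.2f}, vs standard={versus_std:.2f}")

# Consistency with one another, using each operator's first trial.
between = agreement_rate(trials["operator_a"][0], trials["operator_b"][0])
print(f"between operators (first trial): {between:.2f}")
```

Note that an operator can be perfectly consistent with themselves and still disagree with the standard, as the second operator does in this made-up data, which is why all three comparisons are needed.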
Process mapping. Process mapping graphically identifies the sequence of steps in a process and the inputs and outputs of each step.
Process capability analysis. Process capability analysis determines whether a process can produce output that meets expected specifications, often measured by capability indices. Capability indices are ratios of the specification range to the process variation range.
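Two widely used capability indices are Cp, the specification width divided by six standard deviations, and Cpk, the distance from the process mean to the nearer specification limit divided by three standard deviations. The minimal sketch below computes both. As a simplifying assumption it uses the overall sample standard deviation, whereas many references estimate sigma within subgroups for Cp and Cpk. The function name, the measurements, and the specification limits are hypothetical.

```python
from statistics import mean, stdev


def capability_indices(data, lower_spec, upper_spec):
    """Return (Cp, Cpk) using the overall sample standard deviation."""
    mu = mean(data)
    sigma = stdev(data)
    cp = (upper_spec - lower_spec) / (6 * sigma)
    cpk = min(upper_spec - mu, mu - lower_spec) / (3 * sigma)
    return cp, cpk


# Hypothetical measurements and specification limits (illustrative only).
sample = [9.9, 10.1, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
cp, cpk = capability_indices(sample, lower_spec=9.4, upper_spec=10.6)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cpk can never exceed Cp. The two are equal only when the process mean sits exactly midway between the specification limits, as it does in this invented sample.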