Lee Baer and Mark A. Blais (eds.) Current Clinical Psychiatry Handbook of Clinical Rating Scales and Assessment in Psychiatry and Mental Health 10.1007/978-1-59745-387-5_1 Humana Press, a part of Springer Science+Business Media, LLC 2009
1. Understanding Rating Scales and Assessment Instruments
Mark Blais (1) and Lee Baer (1)
(1) Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
Abstract
The successful integration of screening tests and brief rating scales into clinical practice requires an adequate understanding of a few basic psychometric and statistical concepts. This chapter provides an overview of such concepts as reliability, validity, diagnostic accuracy and sensitivity to change. In addition we attempt to demonstrate how these concepts relate to clinical care. We believe that a solid understanding of these concepts will increase the utility and benefit patients and clinicians obtain through the integration of brief measurement tools into their practice.
It is only in the past 60 years that measurement has become a routine part of health-care practice and research. In psychiatry and psychology, many of the characteristics we are most interested in, such as quality of life, depression, anxiety, or personality style, are not physical entities like body weight or heart rate. Rather they are subjective experiences or theoretical constructs that cannot be directly measured, but are instead inferred from observable patterns of behavior, such as responses to a rating scale. Self-report instruments and clinician-administered rating scales can aid clinicians in identifying, quantifying, and tracking change in these important but not directly observable variables. The overarching goal of this handbook is to provide you, as a mental health clinician, with the knowledge and tools necessary to integrate measurement into your ongoing clinical practice.
Types of Rating Scales Presented in This Handbook
Measurement tools can take many different forms and serve a variety of important functions. The following chapters focus primarily on two types of measurement instruments: screening tests and symptom- or disorder-specific rating scales. See the table in the front of this handbook (page xix) for a listing of all rating scales reproduced in this handbook; the fourth column of this table describes what type of rating scale each represents.
Screening tests are assessment tools designed to identify the presence or absence of a target disorder (such as ADHD or OCD) or condition (such as personality disorder or cognitive impairment). Clinicians typically administer screening tests to rule out the presence of important co-morbid conditions, such as alcohol abuse in patients with attention deficit disorder, at the initiation of care. Thus, screening tests are similar to diagnostic instruments like structured diagnostic interviews (such as the SCID), but are briefer and typically less precise.
Symptom- or disorder-specific rating scales are designed to quantify the severity of a disorder after the presence of the disorder has been established (for example, quantifying the severity of depressive symptoms for patients treated in a depression clinic). Symptom or disorder rating scales can be administered at any time during treatment to help quantify the severity of the disorder. Information provided by these scales can inform treatment planning (such as helping establish the appropriate level of care or frequency of sessions) and help monitor patients' progress over the course of treatment. Both types of scales can be either patient rated (self-report) or clinician rated (clinician-administered).
Basic Statistical Concepts
The branch of statistics known as psychometrics is concerned with the scientific properties of measurement instruments, such as those described in the following chapters. Some of the questions addressed by psychometrics are as follows: (1) how reproducible is our patient's score on a particular rating scale (reliability), (2) how well is the rating scale measuring the intended construct (construct validity and dimensionality), and (3) how useful is the scale for tracking a patient's progress across the course of treatment (sensitivity to change). Questionnaires that are simply for information gathering and whose responses are not combined into a total score are generally not assessed using psychometric methods (these instruments are noted as Questionnaire in the fourth column of the Table of Rating Scales on p. xix).
Reliability and validity are generally presented in the form of a correlation coefficient with absolute values ranging from 0.00 to 1.00. A coefficient of 0.00 represents no reliability (or validity), while a coefficient of 1.00 indicates perfect reliability (or validity). But measurement is never perfect. Any time we measure a characteristic of a person or object, the value we obtain contains some degree of error.
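To make the idea of a reliability coefficient concrete, the following minimal sketch computes a Pearson correlation between two administrations of the same scale. The scores, the two-week interval, and the function name `pearson_r` are hypothetical and chosen only for illustration; they are not data from this handbook, and published reliability studies may use other coefficients.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical total scores for six patients who completed the same
# self-report scale twice, two weeks apart (illustrative numbers only).
time_1 = [12, 18, 25, 9, 30, 21]
time_2 = [14, 17, 27, 10, 28, 19]

retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {retest_r:.2f}")
```

In this illustration the two sets of scores differ slightly for every patient, so the coefficient falls short of 1.00; the shortfall reflects the measurement error described above.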
Reliability
Reliability statistics provide a means for quantifying this degree of error contained in our measurement and indicating the consistency and stability of the score. Reliability is a necessary, but not a sufficient, quality of useful rating scales. The more reliable an instrument is, the more consistent a patient's score will be over time or across different raters (in the case of self-report scales and clinician-administered scales, respectively, as shown in Table 1.1). Important reliability concepts include internal consistency, test-retest reliability, and inter-rater agreement.
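Internal consistency is commonly summarized with Cronbach's alpha, although the text above does not name a specific statistic, so its use here is an assumption. The sketch below computes alpha for a small set of item ratings; the function name `cronbach_alpha` and the ratings are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds every respondent's rating on one item.
    items = list(zip(*item_scores))
    sum_item_variances = sum(variance(item) for item in items)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings (0-3) from five patients on a four-item scale.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when the items vary together across patients, which is what one would expect if they all measure a single concept.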
Table 1.1
Psychometric considerations for various types of rating scales
| | Screening: self-report | Screening: clinician-administered | Rating scale: self-report | Rating scale: clinician-administered |
|---|---|---|---|---|
| Reliability a | | | | |
| Test-retest | X | | X | |
| Inter-rater | | X | | X |
| Internal consistency | X | X | X | X |
| Validity b | | | | |
| Sensitivity | X | X | | |
| Specificity | X | X | | |
| Positive predictive value | X | X | | |
| Convergent validity | | | X | X |
| Divergent validity | | | X | X |
| Face validity | X | X | X | X |
a Test-retest and inter-rater reliabilities assess the average reproducibility of a score on a particular scale, over time and with different raters, respectively. Internal consistency reliability assesses the degree to which a particular scale is measuring a single concept (such as depression or reading ability).
b Validity statistics assess the usefulness of a scale for a particular purpose. Sensitivity and specificity are related to the rates of false negatives and false positives, respectively, for a given screening or diagnostic test. Positive predictive value is related to the meaning of a positive test, given the base rate of the particular disorder. Convergent and divergent validity is the degree to which a scale is more closely related (correlated) to scales measuring the same construct than it is to scales measuring different or unrelated constructs. Face validity is not assessed statistically, but refers to the degree to which a test seems to the test-taker to be measuring what it is intended to measure.
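The relationship among sensitivity, specificity, positive predictive value, and base rate can be illustrated with a short calculation. The sketch below uses the standard definitions; the function names, the counts, and the base rates are hypothetical and chosen only to show how the same test yields a different positive predictive value in settings where the disorder is more or less common.

```python
def screening_stats(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity, specificity, and positive predictive value from 2x2 counts."""
    sensitivity = true_pos / (true_pos + false_neg)  # low when false negatives are common
    specificity = true_neg / (true_neg + false_pos)  # low when false positives are common
    ppv = true_pos / (true_pos + false_pos)          # share of positive tests that are correct
    return sensitivity, specificity, ppv


def ppv_from_base_rate(sensitivity, specificity, base_rate):
    """Positive predictive value for a given base rate of the disorder."""
    expected_true_pos = sensitivity * base_rate
    expected_false_pos = (1 - specificity) * (1 - base_rate)
    return expected_true_pos / (expected_true_pos + expected_false_pos)


# Hypothetical screening study: 50 patients with the disorder, 100 without.
sens, spec, ppv = screening_stats(true_pos=45, false_pos=10, false_neg=5, true_neg=90)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, PPV = {ppv:.2f}")

# The same hypothetical test applied where the disorder is rare versus common.
for base_rate in (0.10, 0.50):
    value = ppv_from_base_rate(sens, spec, base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {value:.2f}")
```

With these illustrative numbers, a positive result is far less informative when the base rate is low, even though sensitivity and specificity are unchanged.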