SAS Programming with Medicare Administrative Data, by Matthew Gillingham
Chapter 1: Introduction
Introduction and Purpose of This Book
Welcome to the world of SAS® programming with Medicare data! This book provides beginner, intermediate, and advanced SAS users with the information needed to execute a research programming project using Medicare administrative data. It introduces the reader to important concepts that come up frequently when programming with Medicare data. The focus of this book is decidedly on the data and the policies that drive their content, with SAS syntax serving as a tool to answer common research questions. This approach is intentional; in my experience, the SAS syntax is the easiest part of programming with Medicare data! The most challenging part of my job is understanding the data and how to use them properly to answer research questions correctly.¹
Medicare data are unlike any other health care data, and their uniqueness is driven by the uniqueness of the Medicare program. For example, public policy largely determines which services Medicare covers and how those services are paid for. This differs from the decision-making process of a commercial health plan, which seeks to maximize profits. Here are a few examples, some of which we will discuss in coming chapters:
• Medicare generally covers elderly beneficiaries without regard to medical history. Therefore, the Medicare population is generally sicker than the population of most commercial health plans.
• Part D prescription drug data contain information on prescription drugs filled outside the hospital. Prescription drugs administered during a hospital stay may not appear in the claims data at all.
• Services covered under Medicare Part C (Medicare Advantage) may not appear in the administrative claims files because they are paid for by private managed care plans rather than by Medicare Fee-for-Service.
• Medicare pays for some services (e.g., those furnished by home health agencies, hospices, and acute inpatient hospitals) using what are called prospective payment systems (PPS).
These examples are meant to whet your appetite for learning about Medicare data. I hope you have not run screaming from your desk! If you have, please come back! We will learn about all of these concepts and more. The point is that the better you understand the Medicare program, the better you will understand how to use Medicare administrative data properly for research. To this end, this book describes the Medicare program, Medicare data, how the two interact, and how that interaction affects the SAS programmer who uses the data. These concepts and techniques are illustrated through an example research project that we complete over the course of the book.
Learning how to use Medicare data is a lifelong pursuit. It takes many, many years to build subject matter expertise because of the sheer volume of information on topics like Medicare policy, payment systems, and data. What’s more, the world of Medicare data is changing rapidly due to increased attention to Medicare costs, quality, and the use of information technology in health care delivery. For example, the use of electronic health records (EHRs) in measuring utilization, cost, and quality is on the horizon. EHRs will supplement, or perhaps someday replace, the administrative claims data we use in this book. In addition, Medicare is exploring different methods of payment, such as bundling services that were previously paid on an individual basis. As a result, we cannot cover every topic you will run into. However, we will build a very strong foundation in standard concepts, such as specifying and coding continuous enrollment algorithms and identifying and summarizing common services and events. My hope is that readers can apply this foundation to their own programming projects that use Medicare data, and use it to gain a broader perspective on health care administrative data in general.
Framework of This Book
This book is organized around the steps of completing a research programming project. The basic framework below can serve as a blueprint for programming almost any research project, including the example we will use in this book.
1. Plan the project by identifying the research questions to be answered and thinking through how we will construct our code to provide those answers. Before writing any code, we will create a data flow diagram of our programming plan and use that plan to request data. Typically, you would also write programming specifications (or business requirements, or functional requirements) before coding, stating exactly how you plan to answer the research questions using SAS code. For illustrative purposes, we will write the specifications as we code (i.e., the specifications will be our explanation of the code).
2. Obtain the needed data. There are many different types of data you can use, depending on the nature of the project and its requirements. Our project will focus on using administrative claims and enrollment data.
3. Develop code to create the analytic files needed to answer research questions. Analytic files are essentially summaries or subsets of the raw source data that we will use to perform the analysis that answers the research questions.
4. Develop code that uses the analytic files to answer the research questions.
5. Perform quality assurance and quality control of our algorithms.
6. Run our algorithms in production, typically by using a batch submittal.
7. Create documentation, take steps to preserve our output data sets, and complete any contractually required data destruction.
Our Programming Project
During the planning process for this book, I thought long and hard about an example project that would be useful to a variety of users. I wanted the project to address common research questions and to result in algorithms that are almost universally applied in research programming work that uses Medicare data. To that end, I came up with the following criteria for the project:
• The research questions must be applicable and relevant to today’s research environment. For example, the accurate measurement of the utilization and cost of health care services has been consistently important and relevant since the first person used a computer to analyze health care claims.
• The research questions must lend themselves to building a foundation for addressing “real world” questions. The foundation for all research programming projects is the study population, so our example project will include obtaining beneficiary enrollment data to define continuous enrollment.
• The research questions must result in algorithms that are easily adapted to answering research questions in your “real world” work. For example, we will create an algorithm that defines continuously enrolled beneficiaries as those who have had Medicare Fee-for-Service (FFS) coverage for all 12 months of a study year. This kind of algorithm is easily modified to define continuously enrolled beneficiaries as those enrolled in Medicare Advantage (MA) for 6 months of the year.
• Although this is an introductory text, the research questions must illustrate some of the complexity of using Medicare data. If one trait unites all of the projects I have worked on, it is that they always grow in complexity.
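To make the adaptable continuous-enrollment rule concrete, here is a minimal sketch of its logic. It is written in Python only to show the idea; it is not the book's SAS code. It assumes a hypothetical enrollment layout of 12 monthly coverage codes per beneficiary ("FFS" or "MA"). The function and parameter names (`is_continuously_enrolled`, `coverage`, `min_months`) are illustrative and do not correspond to fields in the CMS enrollment files.

```python
from typing import Sequence


def is_continuously_enrolled(
    monthly_coverage: Sequence[str],
    coverage: str = "FFS",
    min_months: int = 12,
) -> bool:
    """Return True if `coverage` appears in at least `min_months` of the
    12 monthly codes (January through December) in `monthly_coverage`."""
    if len(monthly_coverage) != 12:
        raise ValueError("expected one coverage code per month of the study year")
    covered_months = sum(1 for code in monthly_coverage if code == coverage)
    return covered_months >= min_months


# Hypothetical beneficiaries: one in FFS all year, one who switched to MA in June.
all_ffs = ["FFS"] * 12
switched = ["FFS"] * 5 + ["MA"] * 7

print(is_continuously_enrolled(all_ffs))                                # True
print(is_continuously_enrolled(switched))                               # False
print(is_continuously_enrolled(switched, coverage="MA", min_months=6))  # True
```

Because the sketch counts months rather than checking that they are consecutive, the 6-month MA variant accepts any six months of MA coverage. A definition that requires six consecutive months would need a run-length check instead.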
With these criteria in mind, I designed the following example research project:
Let’s imagine that we are working at a university or a policy research company (maybe you don’t need to imagine this because you already are!). Let’s further imagine that the Centers for Medicare & Medicaid Services (CMS) enacted a pilot program during calendar year 2010² designed to reduce costs to Medicare and improve (or at least not harm) quality outcomes for Medicare beneficiaries. The details of the pilot program are not particularly important to our effort, so let’s imagine that the program provides an extra payment to providers that significantly reduce payments and improve quality outcomes compared with groups of their peers. We have been asked to evaluate the effectiveness of the program, and we would like to start by measuring simple payment, utilization, and quality outcomes for the providers that interacted with the beneficiaries in our sample population during the 2010 study year. Because the program was operational for the full 2010 calendar year, we can identify the providers that participated in the pilot. The starting point for our example research programming project is therefore a file provided by CMS that contains identifiers for the providers that participated in the pilot program, along with identifiers for the beneficiaries associated with those providers.³ We must acquire enrollment and claims data for these beneficiaries and then develop algorithms that query the data to produce summaries of payment, utilization, and quality outcomes during the study year. These summaries will be used in our evaluation of the program.⁴
In the end, the goal of this text is not to make any real determinations about utilization, payments, and quality of care; after all, we are using fake data that cannot support any real conclusions and exist solely for developing code.⁵ Rather, our goal is to prepare you for your own real world research projects by using our example research programming project to teach the mechanics of using Medicare data to measure utilization and Medicare payments, and to identify chronic conditions and commonly used indicators of quality outcomes. By the end of this text, the reader will understand concepts that are foundational to using Medicare administrative claims and enrollment data to, say, identify almost any chronic condition or compute almost any quality outcome metric.
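The file of pilot providers and their beneficiaries acts as the filter for data extraction: only records for beneficiaries listed in it are kept. The minimal sketch below shows that filter in Python, purely to illustrate the logic (in SAS the same step could be written as a merge or join, but this is not the book's code). The record types and field names (`FinderRecord`, `Claim`, `provider_id`, `bene_id`, `paid_amount`) are hypothetical and are not the CMS file layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinderRecord:
    provider_id: str  # hypothetical ID of a provider that participated in the pilot
    bene_id: str      # hypothetical ID of a beneficiary associated with that provider


@dataclass(frozen=True)
class Claim:
    claim_id: str
    bene_id: str
    paid_amount: float


def extract_study_claims(finder: list[FinderRecord], claims: list[Claim]) -> list[Claim]:
    """Keep only the claims for beneficiaries listed in the finder file."""
    study_benes = {rec.bene_id for rec in finder}  # set lookup keeps this to one pass over claims
    return [claim for claim in claims if claim.bene_id in study_benes]


finder = [FinderRecord("P01", "B1"), FinderRecord("P01", "B2"), FinderRecord("P02", "B3")]
claims = [Claim("C1", "B1", 120.0), Claim("C2", "B9", 80.0), Claim("C3", "B3", 45.5)]
print([c.claim_id for c in extract_study_claims(finder, claims)])  # ['C1', 'C3']
```

Building the set of study beneficiaries once keeps the pass over the (potentially very large) claims list to a single membership test per claim.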
We can now be more specific about the things we would like to measure. In particular, evaluating the success of the program involves coding the following measurements of utilization and payment for the beneficiaries in the pilot program we are studying. We need to:
• Calculate the number of evaluation and management (E&M) visits in a physician office setting, and the amount paid for those E&M services.
• Calculate measures of inpatient hospital utilization, and the amount paid for inpatient hospital claims.
• Calculate utilization of the professional component of emergency department (ED) visits.
• Calculate the utilization of ambulance services.
• Calculate the number of outpatient visits, as well as utilization of skilled nursing facility (SNF), home health agency (HHA), and hospice care.
• Calculate the total Medicare amount paid for all Part A claims for our population.
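Most of the utilization and payment measures above follow one pattern: select the claim lines that define a service, then count them and sum the amount paid per beneficiary. The minimal Python sketch below (an illustration of the logic, not the book's SAS code) applies that pattern to office E&M visits. It assumes hypothetical line-level fields and assumes that office E&M visits are the HCPCS codes 99201 through 99215 billed with place of service "11" (office). That code selection is an assumption for this sketch, and the appropriate codes depend on the study year.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed code selection for 2010 office E&M visits:
# 99201-99205 (new patient) and 99211-99215 (established patient).
EM_OFFICE_CODES = {str(c) for c in range(99201, 99206)} | {str(c) for c in range(99211, 99216)}
OFFICE_POS = "11"  # place-of-service code for an office setting


@dataclass(frozen=True)
class CarrierLine:  # hypothetical physician claim line layout
    bene_id: str
    hcpcs: str
    place_of_service: str
    paid_amount: float


def summarize_office_em(lines: list[CarrierLine]) -> dict[str, tuple[int, float]]:
    """Return {bene_id: (visit_count, total_paid)} for office E&M lines.

    Simplification: each qualifying claim line counts as one visit."""
    visits: dict[str, int] = defaultdict(int)
    paid: dict[str, float] = defaultdict(float)
    for line in lines:
        if line.hcpcs in EM_OFFICE_CODES and line.place_of_service == OFFICE_POS:
            visits[line.bene_id] += 1
            paid[line.bene_id] += line.paid_amount
    return {bene: (visits[bene], round(paid[bene], 2)) for bene in visits}


lines = [
    CarrierLine("B1", "99213", "11", 70.00),
    CarrierLine("B1", "99214", "11", 105.50),
    CarrierLine("B2", "99213", "22", 70.00),  # E&M code, but not an office setting
    CarrierLine("B2", "71020", "11", 30.00),  # office setting, but not an E&M code
]
print(summarize_office_em(lines))  # {'B1': (2, 175.5)}
```

Swapping the code set and the setting filter adapts the same function to other services in the list, such as ambulance or ED professional services.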
In addition, evaluating the success of the program also entails coding the following measurements of quality outcomes, often at the physician level:
• Measure evaluation and management utilization for beneficiaries with diabetes or chronic obstructive pulmonary disease (COPD).
• Identify the extent to which diabetics received eye exams.
• Calculate the number of hospital readmissions for beneficiaries with COPD.
• Finally, we will provide examples of methods to summarize and present results by beneficiary demographic characteristics, as well as by provider. While these examples are by no means exhaustive (e.g., we do not summarize and present every analysis performed in earlier chapters, we do not endeavor to analyze results using a control population, and we do not look for significant changes in performance over time), they do provide the reader with a foundation for further work.
The above concepts meet our criteria of being relevant, foundational, and adaptable. For example, instead of studying hospital admissions for Medicare beneficiaries with diabetes, you could study the same utilization and cost measurements for beneficiaries with prostate cancer. Similarly, you could adapt the measurement of retinal eye exams for diabetics to examine a different procedure (say, immunization for influenza) for beneficiaries with a different chronic condition (say, beneficiaries with prostate cancer or COPD).
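For the readmission measure listed above, the core step is pairing each inpatient discharge with the same beneficiary's next admission and checking whether it falls inside a fixed window. The minimal Python sketch below illustrates that logic only; it is not the book's SAS code. It assumes a 30-day window and hypothetical stay records (`Stay`, `admit`, `discharge`). Real readmission measure specifications often add exclusions, such as planned readmissions and transfers, that this sketch ignores. Restricting the stays to beneficiaries with COPD would be a filter applied before this function and is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Stay:  # hypothetical inpatient stay record
    bene_id: str
    admit: date
    discharge: date


def count_readmissions(stays: list[Stay], window_days: int = 30) -> int:
    """Count admissions that begin within `window_days` days after the same
    beneficiary's previous discharge."""
    by_bene: dict[str, list[Stay]] = {}
    for stay in stays:
        by_bene.setdefault(stay.bene_id, []).append(stay)

    readmissions = 0
    for bene_stays in by_bene.values():
        bene_stays.sort(key=lambda s: s.admit)  # chronological order per beneficiary
        for prev, nxt in zip(bene_stays, bene_stays[1:]):
            gap = (nxt.admit - prev.discharge).days
            if 0 <= gap <= window_days:
                readmissions += 1
    return readmissions


stays = [
    Stay("B1", date(2010, 1, 1), date(2010, 1, 5)),
    Stay("B1", date(2010, 1, 20), date(2010, 1, 25)),  # 15 days after discharge: counted
    Stay("B1", date(2010, 4, 1), date(2010, 4, 3)),    # 66 days after discharge: not counted
    Stay("B2", date(2010, 2, 1), date(2010, 2, 10)),
    Stay("B2", date(2010, 3, 5), date(2010, 3, 8)),    # 23 days after discharge: counted
]
print(count_readmissions(stays))  # 2
```

Making the window a parameter reflects the adaptability criterion: the same function serves a different window or, after a different pre-filter, a different chronic condition.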
Chapter Outline
Each chapter in this book will address a section of the project:
• Chapter 2 lays a foundation for using and understanding the data by describing the Medicare program. Remember, the guiding principle of this book is that the only proper way to answer research questions about the Medicare program is to understand the program that drives the data.
• Chapter 3 builds on the foundation developed in Chapter 2 by describing the content of Medicare data files in detail.
• Chapter 4 plans the project by describing the initiation, planning, and design phases of the Systems Development Life Cycle (SDLC).
• Chapter 5 covers requesting, obtaining, and loading the necessary data. In this chapter, we begin to work with our source data. This chapter marks the beginning of the discussion on the creation of the analytic files we will use to summarize our results and ultimately to answer our research questions.
• Chapter 6 defines beneficiary enrollment characteristics, including the creation of variables that indicate continuous enrollment, age, and geographic information.
• Chapter 7 presents code to calculate the aforementioned measurements of utilization.
• Chapter 8 presents code to calculate the aforementioned measurements of Medicare payment.
• Chapter 9 identifies common medical conditions by focusing on diabetes and COPD, and develops examples of basic measurements of quality outcomes for beneficiaries who have these conditions.
• Chapter 10 brings the output of Chapter 5 through Chapter 9 together, uses that output to answer the research questions, and presents those answers. We will also discuss the steps involved in finalizing our work: documentation, preservation of code, and compliance with all of CMS’s Data Use Agreement⁶ policies, such as the destruction of data.
As you can see, another way of describing the organization of this book is that the first four chapters are not focused on writing SAS code. Rather, they are focused on learning about the Medicare program, Medicare data, CMS’s systems, and the unique process of planning a research programming project that uses Medicare administrative data. Again, this is intentional. It is critically important to recognize that one must understand the Medicare program to work successfully with Medicare data, and one must understand Medicare data to answer research questions with them properly. The first four chapters therefore lay the foundation for the remaining chapters, with the coding and the actual work of answering our example research questions taking place in Chapter 5 through Chapter 10.
How to Use This Book
Readers will come to this book with different levels of programming experience and varying exposure to Medicare administrative data. I recommend the following approach based on the reader’s experience:
• Readers with experience in SAS but no background in health care or Medicare data could focus more on Chapter 1 through Chapter 5, where we learn about the Medicare program, how the Medicare program drives the content of the data, and how to acquire Medicare administrative claims and enrollment data.
• Readers with knowledge of the Medicare program but not of Medicare data or SAS could focus more on Chapter 3, where we discuss Medicare data, as well as Chapter 6 through Chapter 10, where we develop the majority of the code.
For those readers using the code developed in this book, it is important to note that there will almost certainly be ways of making it more efficient. I have deliberately traded processing efficiency (e.g., shorter CPU and wall-clock time) for clarity, writing code that steps through each process explicitly, even when that requires additional DATA steps. Indeed, the full set of algorithms presented in Chapter 5 through Chapter 10 could be combined into fewer steps that read and write the large Medicare claims files fewer times, which would result in code that runs faster. However, our objective is to learn about programming with Medicare data, so we are sacrificing those efficiency gains in order to develop specific algorithms in a stepwise fashion, chapter by chapter. It would be a good exercise for you to revamp the code developed in Chapter 5 through Chapter 10 to reduce processing time (and I’d love to see your results!).
The online companion to this book is at http://support.sas.com/publishing/authors/gillingham.html. Here, you will find information on creating dummy source data, the code presented in this book, and answers to the exercises in each chapter. I expect you to visit the book’s website, create your own dummy source data, and run the code yourself.
Disclaimer
The synthetic data used for purposes of this book originated with the Centers for Medicare & Medicaid Services’ (CMS) Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF), which is available in the public domain. While the DE-SynPUF is derived from data that is used by CMS for operational purposes, the DE-SynPUF does not permit direct identification of any individuals because all direct identifiers have been removed. The author assumes no responsibility for the accuracy, completeness, or reliability of the DE-SynPUF, and assumes no responsibility for the consequences of any use of the data or algorithms contained in this book. The data are used herein without any representation or endorsement and without warranty of any kind, whether express or implied. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR INCIDENTAL DAMAGES RESULTING FROM, ARISING OUT OF OR IN CONNECTION WITH, THE USE OF THE DATA OR ALGORITHMS CONTAINED HEREIN.
Chapter Summary
In this chapter, we introduced the purpose of the book, described the framework of the book, and specified our example research programming project.
1 The idea of understanding Medicare data before writing any code is so important that the first four chapters of this book focus on laying out the framework of the book; learning about the Medicare program, Medicare data, and CMS’s systems; and the unique planning process of a research programming project that uses Medicare administrative data. We do not begin to write any SAS code until Chapter 5!
2 Updating the year of study requires examining the choice of descriptive codes (such as procedure codes) discussed in later chapters. The need to choose relevant codes based on year of study is very common in health services research. In Chapter 10, we present an exercise that asks the reader to contemplate the changes we would need to make to update the text as if the demonstration program we are researching took place during the year 2015.
3 We will discuss the use of this file more in Chapter 5. Specifically, this file will serve two purposes in our work. First, we will use this file as a “finder file” that serves as the basis for our data extraction. In addition, we will use this file to assign responsibility for a beneficiary’s care to a provider (called attribution). We assume that this finder file was sent to us as a SAS data set and a flat text file.
4 It is worth noting that these types of evaluations are not unique to the Medicare world. Evaluative studies like the sample project we will undertake in this book are frequently performed by private health plans, government purchasing agents, and other entities around the world.
5 For more information, please see the Disclaimer section of this chapter.
6 A Data Use Agreement (DUA) is a contract governing how the user will interact with the data, including data security and data destruction procedures.