Please send us your comments on this issue, ideas for future issues, and news about your professional interests and accomplishments.

Al Dorof, Editor


Biostatistics: Science by the Numbers

Heidi Christ-Schmidt '91

As a reporting biostatistician to an independent data monitoring committee (DMC) for a clinical trial, Heidi Christ-Schmidt '91 was responsible for summarizing interim safety and efficacy data for the committee in written reports and answering questions during their meetings. Sponsored by the biotechnology company Genentech, the clinical trial aimed to show that administration of its drug, Avastin (bevacizumab), to patients with metastatic colorectal cancer would reduce mortality. The data monitoring committee was composed of four members (three clinicians and one biostatistician) who were independent of Genentech. As part of the U.S. Food and Drug Administration's (FDA) clinical trial process, the committee was responsible for reviewing real-time safety data, selecting the experimental arm of the trial at an interim analysis, and assessing efficacy at a second interim analysis.

"My challenge was to present the interim trial data succinctly and accurately so that the DMC could make the appropriate decisions at each meeting," recalls Christ-Schmidt, a biostatistician with Statistics Collaborative, Inc. (SCI), Washington, D.C., a biostatistical consulting firm serving pharmaceutical and biotech companies and government clients.

At any one of their reviews, the committee could have recommended stopping the study if safety data indicated that patients were being exposed to a potentially harmful treatment. Genentech remained blinded to any information by treatment arm during the course of the clinical trial. The trial, which proceeded to its planned end, showed that the risk of death in the treatment group was about two-thirds of the risk in the control group. The probability that the difference in mortality would have occurred by chance if the drug treatment were ineffective was less than one out of 1,000.

"On the basis of this study, the FDA approved Avastin for patients with metastatic colon cancer," Christ-Schmidt says.

The term "biometrics" (often used interchangeably with biostatistics) has been used since early in the 20th century to refer to the application of statistical and mathematical methods to data analysis in the biological sciences, according to the International Biometric Society, which counts almost 6,000 active members, including more than 2,000 in the United States alone. Biostatisticians work in a wide range of fields, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry and allied disciplines.

For this issue of S&T, we talked to five Bryn Mawr alumnae working in several of these areas about the tools, challenges and rewards of their work.

A Solid Foundation

Approximately two-thirds of SCI's contracts involve presenting interim safety and efficacy analyses of clinical trial data to data monitoring committees, says Christ-Schmidt, and the remainder includes assistance in preparing final study reports, protocol development, meta-analyses and epidemiologic analysis.

Data monitoring committees and regulatory groups use the statistical reports generated at SCI to make important decisions about the future use of an experimental drug or device. "This is both exciting and nerve-racking!" Christ-Schmidt says. "We must strive to present data accurately and thoroughly, but also succinctly and quickly. So we are constantly struggling with the competing drives."

Unfortunately, clinical trial results may be inconclusive, not necessarily because a drug was ineffective, but because the sponsor designed the trial poorly, failed to collect the appropriate data, or failed to apply the correct analytical techniques. "They will then have a very difficult time defending their results to the FDA," Christ-Schmidt says. "It is particularly a problem when there are few patients, for example, in a study involving a rare disease. In these cases, it is very difficult to conduct a confirmatory trial." SCI is often consulted after a study fails to demonstrate a statistically conclusive result in order to either advise on the design of a new trial or, when possible, to salvage the results from the original trial.

There is a lot at stake. "Increasingly, sponsors involve my colleagues and me when the research protocol is being written," Christ-Schmidt says.

The protocol not only describes how the trial will be conducted and the clinical questions that the trial is designed to answer, but also specifies the main statistical analyses that will be used to answer these questions. The protocol also states how large the clinical trial will be and the justification for that sample size, which is another statistical component. "Often, clinical researchers think too much in the abstract," notes Christ-Schmidt. As a biostatistician, she can help them design a concrete, practical protocol that will generate clear results.

A Crucial Role

Multidisciplinary collaboration is crucial to the successful design of clinical trials, agrees Wendy M. Leisenring '86, a biostatistician who is a member of the Clinical Research Division (CRD) of the Fred Hutchinson Cancer Research Center in Seattle, a pioneer in bone marrow and other forms of stem-cell transplantation.

In collaboration with principal investigators in the CRD, Leisenring participates in the design and analysis of Phase I, II and III clinical trials, which aim to further expand and improve the center's ability to treat patients who are suffering with a wide range of diseases and prognoses. "My overall research goals are to ensure that scientific research is carried out using appropriate statistical methodology," she says. "Biostatisticians are a well-respected and integral part of the CRD team."

Leisenring has developed new statistical methods that allow researchers to answer clinical questions about diagnostic tests. Because infection is a serious potential complication in patients undergoing stem-cell transplantation, accurate, early detection of infection is an important research topic at the CRD. "It is important that research leading to the development of new tests be rigorous in determining the tests' accuracy," she explains.

For example, Leisenring and her colleagues have developed new methods to compare multiple diagnostic tests, describe the degree to which the outcome of a diagnostic test revises the pre-test odds of disease, and compare predictive values of diagnostic tests when data are obtained under paired-study designs.
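To make the idea of a test result revising the pre-test odds of disease concrete, here is a minimal Python sketch of the standard likelihood-ratio calculation. It is not Leisenring's method, and the sensitivity, specificity and pre-test probability are hypothetical numbers chosen only for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with a positive/negative result."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Revise a pre-test probability of disease using a likelihood ratio."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical test: 90% sensitivity, 80% specificity.
# Hypothetical patient: 10% chance of infection before testing.
lr_positive, lr_negative = likelihood_ratios(0.90, 0.80)
after_positive = post_test_probability(0.10, lr_positive)
after_negative = post_test_probability(0.10, lr_negative)
print("LR+ =", round(lr_positive, 2), " LR- =", round(lr_negative, 3))
print("Probability after a positive result:", round(after_positive, 3))
print("Probability after a negative result:", round(after_negative, 3))
```

In this made-up case a positive result multiplies the odds of infection by about 4.5, while a negative result cuts them to about one-eighth.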

Collaboration also plays a crucial role in the design and analysis of large epidemiological studies, such as the Childhood Cancer Survivor Study (CCSS), a 13-year retrospective study of more than 10,000 five-year survivors of childhood and adolescent cancer and more than 3,000 siblings. Leisenring is the lead statistician for the study, which is supported by a grant from the National Cancer Institute and involves 27 institutions across the United States. The ongoing study involves an analysis of data from patient records and a 289-item questionnaire completed by survivors and siblings. A comprehensive summary of the chronic health conditions among these subjects was published in the New England Journal of Medicine in October 2006, showing that childhood cancer survivors are eight times more likely than a cohort of their siblings to have at least one severe or life-threatening condition.

"For this large study, the investigators work with us to write a proposal for each project we undertake," Leisenring says. "We describe the importance of the study, its specific aims, the population, key data elements and analytical process, even going so far as to mock up tables of potential results. This level of planning does not always happen in clinical studies, but it is essential with so many people involved from all over the country. Throughout this process, lots of discussion between the clinical researchers and statisticians is necessary in order to make sure relevant questions are being answered with the existing data."

Public Health

Rebecca K. Stellato '90

Large epidemiological public-health studies pose similar challenges, says Rebecca K. Stellato '90, a lecturer and statistical consultant to researchers and students in the Center for Biostatistics at the University of Utrecht in The Netherlands. Previously, as a biostatistician/researcher for the Center for Environmental Health Research of the Dutch National Institute for Public Health and the Environment, she was the lead statistician for a study of a 2000 fireworks disaster in the city of Enschede, in which a fire and subsequent explosion at a fireworks depot killed more than 20 persons, injured hundreds and destroyed adjacent properties. The study examined the physical and psychological health impacts on residents, visitors, passersby and first responders (police, firefighters and paramedics) at four weeks, 18 months and four years after the disaster.

"Every study is slightly different," Stellato observes, "and so are the methods of collecting and analyzing the data. The actual research question sometimes gets lost in the sheer volume of information. The biggest challenge for a biostatistician is identifying precisely what question the researcher is trying to answer through the study."

In this case, the data set comprised subjects' responses to a 350-item questionnaire. Among the challenges were sporadic subject participation and drop-outs, associated missing data, and the confounding effects of the relationship between physical and psychological symptoms.

Missing data is a common problem with studies involving questionnaires. "If a questionnaire poses five questions to 20 people and we want to use all five of those variables, yet someone fails to answer one of the questions, then that subject generally will be thrown out of the model," Stellato explains. "Every time a subject is missing a variable, they are eliminated. As a result, we can lose 20, 30 or even 40 percent of our subjects. In the past, statisticians tried to deal with that by taking the average response of all subjects who answered the question, and imputing it to all those who did not. One problem with that is we may be imputing a value to one subject based on subjects who are very different. For this study we used a newer technique, called multiple imputation, which imputes several plausible values, based on a model, plus a little random variation, which more accurately reflects reality."
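The contrast Stellato draws between mean imputation and multiple imputation can be sketched in a few lines of Python. This is a simplified illustration on simulated data, not the analysis used in the Enschede study: the variables x and y, the 30 percent missingness rate, and the linear model are all assumptions, and a full multiple-imputation procedure would also account for uncertainty in the fitted model itself.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical data: x is always observed; y is missing for roughly 30% of subjects.
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
y_full = [2.0 + 1.5 * xi + rng.gauss(0, 1) for xi in x]
y = [yi if rng.random() > 0.3 else None for yi in y_full]
observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

# Older approach: give every non-responder the average of the responders.
observed_mean = statistics.mean(yi for _, yi in observed)
y_mean_imputed = [observed_mean if yi is None else yi for yi in y]


def fit_line(pairs):
    """Least-squares fit of y = intercept + slope * x, plus residual SD."""
    mean_x = statistics.mean(p[0] for p in pairs)
    mean_y = statistics.mean(p[1] for p in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in pairs)
    return intercept, slope, (sse / (len(pairs) - 2)) ** 0.5


intercept, slope, residual_sd = fit_line(observed)


def impute_once():
    """Fill each gap with a model prediction plus a little random variation."""
    return [
        intercept + slope * xi + rng.gauss(0, residual_sd) if yi is None else yi
        for xi, yi in zip(x, y)
    ]


# Multiple imputation: build several completed data sets and pool the estimates.
imputed_sets = [impute_once() for _ in range(5)]
pooled_mean = statistics.mean(statistics.mean(s) for s in imputed_sets)

print("Subjects with y observed:", len(observed), "of", n)
print("SD of observed y:        ", round(statistics.stdev(yi for _, yi in observed), 3))
print("SD after mean imputation:", round(statistics.stdev(y_mean_imputed), 3))
print("Pooled mean of y:        ", round(pooled_mean, 3))
```

Mean imputation makes the completed data look less variable than the observed responses, because every filled-in value sits exactly at the average. The model-based draws avoid that artifact by letting each imputed value depend on the subject's own x and by adding noise.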

Art and Science

The workhorse for biostatistical analyses is SAS, a statistical software package developed by SAS Institute, but there are a number of different packages that can be used, including "R," a free software environment for statistical computing and graphics, which runs on UNIX, Windows and MacOS platforms. "SAS incorporates the most widely used statistical techniques," Stellato says. "The statistician writes the program for the data in the language of that software package, and then the software performs the analyses."

The statistical tools run the gamut from simple descriptive statistics to more complicated methods to specialized techniques used for clinical trials, Christ-Schmidt says. "For example, 'conditional power' is the probability of seeing a statistically significant result at the end of the trial given the current results; that is, 'conditional' on the results from the data available now. Conditional power is a tool that allows the trial sponsor, usually through an independent data monitoring group, to determine whether a trial is very unlikely to show that the new treatment is beneficial."
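One common textbook way to compute conditional power treats the accumulating test statistic as a Brownian motion. The Python sketch below follows that approach under stated assumptions: a single final analysis with a one-sided significance level of 0.025, no adjustment for interim looks, and hypothetical interim values. The function and parameter names are illustrative, and this is not presented as SCI's own procedure.

```python
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, drift=None, alpha_one_sided=0.025):
    """Probability of a significant final result given the interim z-statistic.

    info_fraction is the share of the planned information collected so far
    (strictly between 0 and 1). drift is the assumed expected value of the
    final z-statistic; if omitted, the current trend is assumed to continue.
    """
    normal = NormalDist()
    z_critical = normal.inv_cdf(1.0 - alpha_one_sided)
    b_value = z_interim * info_fraction ** 0.5
    if drift is None:
        drift = b_value / info_fraction  # current-trend estimate
    remaining = 1.0 - info_fraction
    shortfall = z_critical - b_value - drift * remaining
    return 1.0 - normal.cdf(shortfall / remaining ** 0.5)


# Hypothetical interim looks halfway through a trial.
weak_trend = conditional_power(z_interim=0.5, info_fraction=0.5)
strong_trend = conditional_power(z_interim=3.0, info_fraction=0.5)
print("Conditional power, weak interim trend:  ", round(weak_trend, 3))
print("Conditional power, strong interim trend:", round(strong_trend, 3))
```

A low value under the current trend is the kind of signal a monitoring committee might weigh when judging whether a trial is unlikely to show benefit.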

Like other biostatisticians, Stellato says, "Sometimes I have to write an independent program, either because there is no software for it or the software doesn't give me exactly what I need."

Leisenring agrees, "We may need to write a program to do something in a different or more innovative way. Also, a 'canned' statistical routine may not be the best approach to an analysis. For example, to some degree there is an art to building a good statistical model. There are statistical analysis packages that just 'spit out' a multivariable model that one might use, but it's really important to go through a much more thoughtful process in developing those models."

Animal-Human Connection

Yoko Adachi '97

The health of more than a hundred million companion animals, plus millions of poultry, cattle, pigs and fish in the United States, and of the people who dine on them, also depends on the work of biostatisticians. Yoko Adachi '97 is a mathematical statistician in the Office of New Animal Drug Evaluation at the FDA's Center for Veterinary Medicine (CVM), where she reviews study protocols and applications for approval of new animal drugs. The center is responsible for ensuring that animal drugs and medicated feeds are safe and effective, and that foods derived from treated animals are safe for human consumption.

For example, in FY 2005, the center approved a number of new animal drug applications, including an antimicrobial drug for control of mortality due to enteric septicemia in catfish, a drug feed ingredient to increase milk production in dairy cows, and an antiprotozoal drug for the treatment of horses with equine protozoal myeloencephalitis.

Pre-marketing reviewers study data submitted by drug sponsors to determine if they support a drug's approval for commercial marketing. Biostatisticians, microbiologists and pharmacologists typically work closely with veterinarians, who are designated as primary reviewers. "The methods of statistical analyses for animal drug applications are fairly similar to those for human drugs," Adachi says, "although sample sizes are typically smaller and we do not have clinical phases."

As is the case with human clinical trials, however, the experimental design affects the statistical techniques used to analyze the data. "There was a case in which a sponsor was proposing to collect a blood sample once before and once after the treatment, but the primary reviewer requested more frequent sample collections," Adachi recalls. "The study wound up having four post-treatment collections, and we determined that the statistical methods also needed to be changed to analyze these data."

Animal pharmaceutical companies often submit study designs to FDA/CVM during the planning phases of developing a protocol. "For example, they may need input from a biostatistician as to the appropriate end-points and statistical procedures, and sample size for the study," she says. "If we are consulted in advance, we can help prevent problems with study design and data analysis."

Natural Systems

Camilla Lieske '91

There are unique challenges associated with the analysis of data from natural systems. Take the case of population decline among Steller sea lions. The largest of the "eared" seals, they can be found throughout the North Pacific Ocean, through the Aleutian Islands and Bering Sea, and south along the North American coast to central California, although 70 percent of the total population resides in Alaska. Counts of Steller sea lions on rookeries and major haul-outs between the mid-1970s and the present indicate a major population decline. The causes are unknown but may include disease, environmental changes and the effects of commercial fisheries, according to the Alaska Department of Fish and Game. Wildlife veterinarian Camilla Lieske '91 is in Fairbanks studying the decline through biometric analysis of data that have been collected on the species over a period of 20 years.

"The biological systems I work with are not simple and they are not controlled," Lieske says, "so the analyses are often complex both in how they are formed and in how they are interpreted. Rather than bringing an animal into a laboratory where we can control the variables and look at one variable at a time, I am trying to allow all the variables that are in action to be working at the same time, and trying to find patterns and meaning in these natural biological systems."

Lieske also encounters problems with data sets. "They are often incomplete, and they have been collected by various people, so I have to make sure there is consistency," she says. "Analytical techniques have changed over the years, too, and that is one reason why I have focused on the more recent data sets in my work on Steller sea lions. Although I have been looking primarily at data from the most recent five years, I am also trying to include as many data as possible for the last 10 years."

The science of biostatistics brings order to the chaos. "It allows methods of interpretation and understanding," Lieske says. "Using statistical methods, I can filter out the noise to get at the nuggets of truth."

Since the initial decline in the Steller sea lion population as a whole, the western stock has continued to decline, while the eastern stock is rebounding. "There are a lot of questions about the difference in recovery between these two populations," Lieske says. "A lot of scientists have been wondering about juvenile health differences between the two. My interest is in how to define health, so I am looking at the data that have been collected over the years, focusing on recent blood samples, to identify and compare health parameters."

Thus far, Lieske has found little difference in juvenile health of the two populations. "My results suggest that if that was an issue when the population first began to decline in the 1970s, it is not necessarily the ongoing issue," she says. "So we are now looking at other issues, such as reproductive health."

Lieske will undertake a similar analysis of data pertaining to health and respiratory diseases in caribou.

Computational Power

Like professionals in many other fields, biostatisticians have benefited from the enormous strides made in computer hardware and software technology over the past couple of decades. "The statistical software packages have all evolved enormously," Leisenring observes. "There are many more options within those packages for carrying out a greater variety of analyses. When I first started in 1993, I had to write special programs more frequently, and now they are part of the software."

For example, Stellato says, "There is much more available in the way of longitudinal and multi-level data analyses within standard software packages, largely, I think, because of the increasing power and speed of computers. An analysis that once required an overnight run on a mainframe computer might take only five minutes on a PC today. One exception is genetics research, which involves such enormous data sets that the analyses still sometimes have to be run overnight."

The use of computer simulations has also grown. "Thirty years ago, it was difficult to do large simulation studies, in which you create a set of parameters and assumptions about how a population might behave, generate many data sets using random numbers and random variables, and see what happens when you tweak these parameters," Leisenring explains. "Statisticians use modeling to predict what would happen in an epidemic, for example, increasing the number of people who are vaccinated to see how that might affect the spread of the epidemic."
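A simulation study of the kind Leisenring describes can be sketched in Python with a simple chain-binomial (Reed-Frost) epidemic model. Everything here is an assumption made for illustration: the population of 200, the per-contact infection probability of 0.02, a perfectly protective vaccine, and the function names. The sketch generates many random outbreaks at each vaccination level and compares the average outbreak size.

```python
import random


def simulate_outbreak(population, vaccinated_fraction, contact_prob,
                      initial_infected, rng):
    """Run one Reed-Frost outbreak and return the total number ever infected."""
    at_risk = population - initial_infected
    # Assumption: vaccinated people cannot be infected.
    vaccinated = round(at_risk * vaccinated_fraction)
    susceptible = at_risk - vaccinated
    infected = initial_infected
    total_infected = initial_infected
    while infected > 0 and susceptible > 0:
        # A susceptible person escapes only if every infectious contact fails.
        p_infection = 1.0 - (1.0 - contact_prob) ** infected
        new_cases = sum(rng.random() < p_infection for _ in range(susceptible))
        susceptible -= new_cases
        infected = new_cases
        total_infected += new_cases
    return total_infected


def mean_outbreak_size(vaccinated_fraction, replicates=200, seed=1):
    """Average outbreak size over many simulated data sets."""
    rng = random.Random(seed)
    sizes = [
        simulate_outbreak(200, vaccinated_fraction, 0.02, 1, rng)
        for _ in range(replicates)
    ]
    return sum(sizes) / replicates


# Tweak one parameter (vaccination coverage) and see what happens.
for fraction in (0.0, 0.5, 0.9):
    print("Vaccinated:", fraction, " mean outbreak size:", mean_outbreak_size(fraction))
```

Fixing the random seed makes each run repeatable, which matters when simulated results have to be checked independently.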

Patterns Emerge

For all its challenges, the field of biostatistics offers many rewards. For Lieske, biostatistics is a means of understanding the effects of the environment on animal populations. "I love it when patterns emerge," Lieske says. "It's beautiful when you come up with an understanding of what's going on. I've always been driven to understand the why and how. That's what I find rewarding: the moment when I say, 'Ah, that's what's driving it.'"

The work is challenging and rewarding, says Adachi, "especially since animal welfare may affect the welfare of humans. I also find it interesting to gain information about the latest research in biotechnology and medicine."

In her role as a participant in the design and analysis of clinical research trials, Leisenring says, "One of the most rewarding things is that I have really gotten to know an area of medical research and that I have been able to help with the process of developing better treatments."

Then there is the pleasure derived from solving a puzzle. "When somebody gives me a data set, an idea of their questions, and what they want to learn from their data, it is a puzzle for me," Stellato says. "I get to mess around with the computer as long as I need to in order to get the answer."

When Christ-Schmidt audits a colleague's work for quality-control purposes, she says, "It is like detective work or doing a puzzle, trying to find out why my results, which I programmed independently, do not match someone else's results." Similarly, she says, "We have actually been hired by clients to check a competitor's work because the clients were concerned about quality-control issues. We also often audit a trial's randomization during the course of the trial to ensure that the randomizing company is carrying out the randomization as planned. I know that problems with the randomization are serious and I still get a kick out of being the one to find them."

About Our Sources

Yoko Adachi '97 is a mathematical statistician in the Office of New Animal Drug Evaluation at the U.S. Food and Drug Administration Center for Veterinary Medicine. She reviews study protocols and data submissions that pharmaceutical companies submit to fulfill requirements that new animal drugs must be shown to be safe and effective. She has served as the center's representative for the FDA Statistical Association, and she is a member of the 2007 FDA Science Symposium Planning Committee. Adachi earned a Master of Science in biostatistics from the School of Public Health at the University of Pittsburgh.

Camilla Lieske '91, a wildlife veterinarian, is a biometrician for the Alaska Department of Fish and Game in Fairbanks, where she is analyzing data for the department's Wildlife Conservation Sea Lion Program. Lieske is a doctoral candidate investigating the effects of environmental factors on amphibian health at the University of Illinois, Urbana, where she completed a residency in veterinary toxicology. She earned a Doctor of Veterinary Medicine, as well as a Master of Preventive Veterinary Medicine, from the School of Veterinary Medicine at the University of California, Davis. She is a Diplomate of the American Board of Veterinary Toxicology and serves on its testing committee.

Heidi Christ-Schmidt '91 is a biostatistician with the Statistics Collaborative, Inc., a biostatistical consulting firm serving pharmaceutical, biotechnical and government clients. A recipient of a three-year GAANN fellowship, she earned a Master of Science in Engineering in probability and statistics from the Department of Mathematical Sciences at Johns Hopkins University. She is a member of the American Statistical Association, the Association for Women in Mathematics and the Society for Clinical Trials.

Wendy M. Leisenring '86 is a full member of the Clinical Statistics faculty of the Clinical Research Division of Fred Hutchinson Cancer Research Center in Seattle. She serves as lead statistician and a member of the steering committee for the National Cancer Institute-funded Childhood Cancer Survivor Study. An affiliate professor in the Department of Biostatistics at the University of Washington, Seattle, Leisenring earned a Doctor of Science in biostatistics from Harvard University. She is a member of the American Statistical Association, the Biometrics Society and the Society for Clinical Trials.

Rebecca K. Stellato '90 is a lecturer and statistical consultant at the Center for Biostatistics, University of Utrecht, the Netherlands. Previously she was a biostatistician/researcher for the Center for Environmental Health Research, National Institute for Public Health and the Environment in Bilthoven, the Netherlands. She earned a Master of Science in biostatistics from the School of Public Health at Harvard University.


Dorothy Wright contributes news and feature articles on science, technology, engineering and general-interest topics to a variety of publications, including Civil Engineering and Engineering News Record.