Statistically Significant | Teachers College Columbia University



Quantitative techniques have been faulted both for failing to account for real-life nuances and for enabling researchers to generate whatever analysis illustrates a preferred story. Yet researchers grounded in theories of education and human development also are bringing a new rigor to more complex, humanistic questions. TC has hired many of them in recent years. Spanning domestic and international education policy, economics and the science of data analysis itself, they seek to understand individual people and institutions as well as mass trends. 


“You can’t do quantitative work in a vacuum,” says Professor Madhabi Chatterji, Director of TC’s Assessment and Evaluation Research Initiative. “Without knowing a school’s culture, you’ll ask the wrong questions and draw conclusions that don’t help.”

TC’s newest faculty hires get that, Pallas says. “They’re developing and using new tools to harness complexity. They’re driven by questions, not methods — and that’s what TC has always tried to do.”

Here, we bring you some of TC’s most powerful quantitative work.

As quantitative researchers get better at explaining the real world, TC is hiring some of the best young minds in the game 

Long-time education scholar Henry Levin avoids absolutes, but on one point he is unequivocal.

 

“An educational researcher needs a qualitative understanding of the practices being studied, but increasingly policy-makers want quantitative evidence on the impacts of specific policies and practices,” says Levin, William Heard Kilpatrick Professor of Economics & Education. Several new variables factor into that equation. “An explosion of information about education is revealing greater complexity in the world than we could see before,” says Aaron Pallas, Arthur I. Gates Professor of Sociology & Education. “It’s been spurred by development of the internet and high-speed computers and the trend towards evidence-based policy and practice in fields such as medicine and human services.”

 

 

The Economists

THE REAL SCORES ON REMEDIAL ED

Judith Scott-Clayton

 

Problem: Fewer than half of U.S. college students earn a degree six years after enrolling. Colleges spend upwards of $7 billion annually on remedial courses to upgrade basic skills, yet students who take them often fail to earn a two-year degree or transfer to a four-year institution.

Why isn’t remedial education working?

Two years ago, Judith Scott-Clayton, Associate Professor of Economics & Education, found that tens of thousands of students were placed in remedial courses unnecessarily at more than 50 community colleges that based assignments on a brief standardized test. Subsequently, she demonstrated that remedial assignments based on high school grade point averages would be more accurate, and that a quarter of remedial students would pass college courses with a B or higher.

Scott-Clayton employed a technique called regression discontinuity design, which estimates the causal effect of an intervention by focusing on those closest to the cut point that decides who receives treatment: in this case, students just passing or just failing remedial screening. Because those two groups are academically nearly identical, any difference in their subsequent outcomes is attributed to remedial assignment or non-assignment. “A community college degree pays off in the labor market,” Scott-Clayton says, “so we need to target the right students for remediation.”
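For readers who want to see the logic in miniature, here is a rough sketch of a regression discontinuity estimate. It is not Scott-Clayton's analysis: the cutoff, bandwidth, effect size and simulated students are all made up for illustration, and the function names (`fit_line`, `rd_estimate`) are hypothetical. The idea is to fit a straight line to outcomes on each side of the cut point, using only students near it, and read off the gap between the two lines at the cut point itself.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Gap in predicted outcome at the cutoff: assigned (score < cutoff)
    minus not assigned (score >= cutoff), using only nearby students."""
    pairs = list(zip(scores, outcomes))
    # Center scores on the cutoff so each line's intercept is its value there.
    below = [(s - cutoff, y) for s, y in pairs if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in pairs if cutoff <= s <= cutoff + bandwidth]
    at_cutoff_below, _ = fit_line(*zip(*below))
    at_cutoff_above, _ = fit_line(*zip(*above))
    return at_cutoff_below - at_cutoff_above

# Simulated students (all numbers are assumptions, not real data).
rng = random.Random(0)
CUTOFF, TRUE_EFFECT = 50, -0.5
scores = [rng.uniform(0, 100) for _ in range(5000)]
outcomes = [0.02 * s + (TRUE_EFFECT if s < CUTOFF else 0.0) + rng.gauss(0, 0.3)
            for s in scores]
estimate = rd_estimate(scores, outcomes, CUTOFF, bandwidth=10)
print(f"estimated jump at the cutoff: {estimate:.2f}")
```

The bandwidth is the main judgment call: a narrower window keeps the two groups more alike but leaves fewer students to estimate from.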

 

GETTING PARENTS TO TUNE IN

Peter Bergman


When parents are watching them, students attend class, do their homework and get better grades. But getting parents to tune in — especially lower-income parents — can be tough. Many lack time to meet with teachers or the skills to check grades online. Those from other countries may not understand the U.S. grading system.

 

Two years ago, at a low-income, predominantly Latino high school in Los Angeles, TC Assistant Professor of Economics & Education Peter Bergman randomly signed up half the parents to receive bimonthly text messages about their children’s grades and missed assignments. Attendance improved, and GPAs went up. Parents initiated more contact with the school, developed sharper perceptions of how hard their kids were working, and became more likely to use punishment and revoke privileges as motivation.

“It was so effective that kids would ask each other, ‘Have you been Petered yet?’” Bergman recalls. “The school warned me not to park on their street.”

Now, funded by the Smith Richardson Foundation, Bergman is testing the use of electronic grade books that automatically text parents whenever a teacher records a failing grade or absence. The study is “quantitative” in that a computer randomized the parents into the “treatment” and “control” groups, and because Bergman now has a database with which to model future policy approaches through simulation rather than direct observation. Meanwhile, he has addressed what economists call “information friction,” in which only the “seller” (in this case, the student) knows the quality of what he or she is providing. Each text costs just a tenth of a cent, minuscule compared with, say, paying teachers to make calls. “Parents care about their kids’ education,” Bergman says. “So we’re bringing school to them.”

 

 

DOES TEACH FOR AMERICA REALLY GET RESULTS?

Douglas Ready

 

Advocates of Teach for America (TFA), which sends graduates of elite colleges to teach in schools in low-income communities, say it pairs the best and brightest with the neediest. Critics say TFA teachers are untrained, deprive qualified teachers of jobs and quit before learning their craft.

In August 2014, Douglas Ready, Associate Professor of Education & Public Policy, published an eight-year study of the math and reading performance of 500,000 children in Duval County, Florida, which has employed 500 TFA teachers since 2005. He found that students of TFA teachers, who work in the lowest-performing schools, typically scored below others on state assessments. But when he measured student progress, a different picture emerged. By comparing students’ outcomes in years they had a TFA teacher to their own outcomes in years they did not, Ready discounted the potential impact of school quality, student socioeconomic status or teacher inexperience, isolating the impact of whether the teacher was TFA or non-TFA.

The result: a small “adjusted” advantage for TFA teachers in math and literacy.
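The within-student comparison can be illustrated with a toy example. This is a simplified sketch, not Ready's actual model: the records, the field layout and the function names are hypothetical, and the numbers are invented. Each student's average gain in years with a program teacher is compared to that same student's average gain in other years, so anything fixed about the student or school cancels out.

```python
from collections import defaultdict
from statistics import mean

def within_student_effect(records):
    """records: (student_id, had_program_teacher, score_gain) tuples.
    Averages each student's own treated-minus-untreated difference;
    students never observed in both conditions are dropped."""
    by_student = defaultdict(lambda: {True: [], False: []})
    for student_id, had_program_teacher, gain in records:
        by_student[student_id][had_program_teacher].append(gain)
    diffs = [mean(g[True]) - mean(g[False])
             for g in by_student.values() if g[True] and g[False]]
    return mean(diffs)

def naive_effect(records):
    """Raw gap between all treated years and all untreated years."""
    treated = [gain for _, flag, gain in records if flag]
    untreated = [gain for _, flag, gain in records if not flag]
    return mean(treated) - mean(untreated)

# Invented data: students "a" and "b" attend a lower-performing school,
# student "c" attends a higher-performing one and never has a program teacher.
records = [
    ("a", True, 4.0), ("a", False, 3.0),
    ("b", True, 5.0), ("b", False, 4.0),
    ("c", False, 9.0), ("c", False, 8.0),
]
naive = naive_effect(records)
within = within_student_effect(records)
print(f"naive gap: {naive:+.1f}, within-student gap: {within:+.1f}")
```

In this made-up data the raw comparison is negative while the within-student comparison is positive, the same reversal in sign the study describes.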


 

The Policy Analysts

LOOKING UNDER THE HOOD OF INTERNATIONAL ASSESSMENTS

Matthew Johnson, Young-Sun Lee, Oren Pizmony-Levy

 

A slip in the global education rankings can trigger nationwide hand-wringing. Yet there is often more nuance to the story. For example, while U.S. eighth graders trailed 10 other countries in the 2007 Trends in International Mathematics and Science Study (TIMSS), analysis by TC’s Matthew Johnson, Associate Professor of Statistics & Education, and Young-Sun Lee, Associate Professor of Psychology & Education, revealed that Americans outperformed several countries on specific skills such as data analysis, probability, location and movement.

“These distinctions have major implications for how we approach math education,” Johnson says.

 

For Oren Pizmony-Levy, Assistant Professor of International & Comparative Education, such findings highlight another concern: how testing and international ranking of nations became a legitimized global practice. The question has intrigued Pizmony-Levy since graduate school, when he attended a meeting of the International Association for the Evaluation of Educational Achievement (IEA).

“The Association was formed by U.S., European and Israeli scholars to test hypotheses of how social contexts affect education,” he says. “Nations weren’t ranked. Now, it’s all about providing high-quality data, benchmarks and indicators to governments.”

Pizmony-Levy created a quantitative data set mapping countries’ participation in all large-scale international assessments from 1958 to 2012. He interviewed key IEA members and sifted through unopened boxes of the organization’s records. “Until the early ’90s, international assessments were based on a loose network of scholars guided by specific research questions and hypotheses,” he says.

 

“Since then, the work has been framed in terms of global governance and auditing of education systems. Official reports contain more ranking tables and less about research. Rankings can shake public confidence, creating an ‘education crisis’ that may not exist. Schools might narrow their curriculum to focus on test prep.” Now, he’s developing courses on the social analysis of international assessments to show students what we can and cannot learn from these assessments, which affect public discourse, policy and practice. “That’s where the scholarship gets really interesting.”

 

The Methodologists

DESIGNING TRIALS FOR THE REAL WORLD

Elizabeth Tipton and Bryan Keller

Since 2002, the federal Institute of Education Sciences has sought to establish randomized clinical trials as the gold standard for determining what works and why.

In these large-scale experiments, researchers recruit schools and districts for studies to evaluate curricular or after-school programs, whole-school reforms and teacher professional development strategies. Half of schools are randomly assigned to receive a program and half to continue with business as usual. Outcomes are compared — say, student test scores — and the differences, if any, provide evidence that the program works.  
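A bare-bones version of such an experiment can be sketched in a few lines. Everything here is an assumption made for illustration: the 40 schools, their scores and the 3-point program effect are simulated, and the function names are hypothetical. The sketch randomly splits schools in half, compares average outcomes, and then uses a permutation test (reshuffling the group labels many times) to gauge how often chance alone would produce a gap that large.

```python
import random
from statistics import mean

def randomize(units, rng):
    """Randomly split units into two halves: (treatment, control)."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(outcomes, treatment, control):
    """Average outcome of the treatment group minus that of the control group."""
    return mean(outcomes[u] for u in treatment) - mean(outcomes[u] for u in control)

def permutation_p_value(outcomes, treatment, control, rng, draws=2000):
    """Share of random relabelings whose gap is at least as large as the observed one."""
    observed = abs(difference_in_means(outcomes, treatment, control))
    units = list(treatment) + list(control)
    extreme = 0
    for _ in range(draws):
        fake_treatment, fake_control = randomize(units, rng)
        if abs(difference_in_means(outcomes, fake_treatment, fake_control)) >= observed:
            extreme += 1
    return extreme / draws

rng = random.Random(2015)
schools = [f"school_{i:02d}" for i in range(40)]
treatment, control = randomize(schools, rng)

# Simulated scores: a noisy baseline plus a made-up 3-point program effect.
treated = set(treatment)
scores = {s: rng.gauss(70, 5) + (3.0 if s in treated else 0.0) for s in schools}

effect = difference_in_means(scores, treatment, control)
p_value = permutation_p_value(scores, treatment, control, rng)
print(f"estimated effect: {effect:.2f} points, permutation p-value: {p_value:.3f}")
```

Because assignment is random, the two groups differ only by chance before the program starts, which is what lets the difference in averages stand in for the program's effect.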

At TC, Assistant Professor of Applied Statistics Elizabeth Tipton helps researchers make better, more thoughtful generalizations from their experiments. As part of this work, Tipton collaborates with study designers to ensure recruitment of the most broadly representative populations.

“When you evaluate a reading program in 40 schools, you really want to know whether the program works in West Virginia, or Texas or nationwide,” Tipton says. “You want to apply the results to policymaking on the largest scale. So I help think through how a study will be used. Then I work with recruiters, who often aren’t keyed in to the need for generalizability.”

Funded by the Spencer Foundation, Tipton is developing new web-based software (www.thegeneralizer.org) to facilitate this process. “There’s nothing like that right now,” she says. “It could improve the relevance of education research.”

Often, truly randomized trials are impossible or unethical. For example, you wouldn’t withhold a proven math program or make children repeat a grade just to observe the effect on future educational success or earnings. But in real life, kids get held back and schools lack funds for proven programs. When researchers observe such unscripted experiences, they often compare children who differ in income, race, cultural practices, geographic location, and school quality and culture.

 

In such situations, Assistant Professor of Applied Statistics Bryan Keller devises statistical methods, often after data has been gathered, to mimic random assignment to treatment. Keller specializes in separating out the effect caused by an intervention from the impact of differences in race, income or culture: variables that, in real life, may partly dictate why someone receives the intervention. For example, children of color are likelier to be retained in grade, due in part to societal preconceptions.

Keller uses a technique called “propensity score analysis” to identify subjects in both study arms — treatment and control — with the most similar probability, based on all factors, of receiving the treatment. The process yields two groups matched in terms of key variables, isolating the newly introduced variable — the treatment — as the cause of difference in outcome.
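The matching idea can be shown with a deliberately small sketch. It is not Keller's method: the propensity score here is simply the share of treated subjects within each covariate group, standing in for a fitted model such as logistic regression, and the data, the `caliper` parameter and the function names are all hypothetical. Each treated subject is compared with control subjects whose estimated probability of treatment is nearly the same.

```python
from collections import defaultdict
from statistics import mean

def propensity_scores(units):
    """Share of treated units within each covariate cell, used as P(treated | x)."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [treated count, total count]
    for u in units:
        counts[u["x"]][0] += int(u["treated"])
        counts[u["x"]][1] += 1
    return [counts[u["x"]][0] / counts[u["x"]][1] for u in units]

def matched_effect(units, caliper=0.05):
    """Average, over treated units, of the outcome minus the mean outcome of
    control units whose propensity score lies within the caliper."""
    scores = propensity_scores(units)
    controls = [(s, u["y"]) for s, u in zip(scores, units) if not u["treated"]]
    diffs = []
    for score, u in zip(scores, units):
        if not u["treated"]:
            continue
        peers = [y for s_c, y in controls if abs(s_c - score) <= caliper]
        if peers:  # treated units with no comparable control are dropped
            diffs.append(u["y"] - mean(peers))
    return mean(diffs)

def naive_effect(units):
    """Raw gap between all treated and all control outcomes."""
    return (mean(u["y"] for u in units if u["treated"])
            - mean(u["y"] for u in units if not u["treated"]))

# Invented data: the lower-income group is both more likely to be treated
# and starts from lower outcomes, which confounds the raw comparison.
units = [
    {"x": ("lower_income",), "treated": True, "y": 42.0},
    {"x": ("lower_income",), "treated": True, "y": 42.0},
    {"x": ("lower_income",), "treated": False, "y": 40.0},
    {"x": ("higher_income",), "treated": True, "y": 52.0},
    {"x": ("higher_income",), "treated": False, "y": 50.0},
    {"x": ("higher_income",), "treated": False, "y": 50.0},
]
naive = naive_effect(units)
estimate = matched_effect(units)
print(f"naive gap: {naive:+.2f}, matched gap: {estimate:+.2f}")
```

In this toy data the raw gap is negative even though every treated subject does 2 points better than comparable controls. That distortion is what matching is meant to remove.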

Now Keller is combining propensity score estimation with the use of a method harnessing multiple computer processing units to parse big data. The technique “automatically handles complex relationships in the data too difficult for an analyst to detect.”

 


 

 

The Data Miners

TOO MUCH INFORMATION? THEY DON’T THINK SO

Ryan Baker and Alex Bowers

 


 

 

MADHABI CHATTERJI Professor of Measurement, Evaluation & Education

For decades, federal and state agencies have tracked school and student performance. Now smart tutoring systems and other technologies that record every keystroke have spawned the field of learning analytics, in which researchers search data for patterns and correlations to identify challenges facing individual learners, classes and entire school systems.

“Some people dismiss our data as ‘roadkill,’ meaning, figuratively speaking, that we ran it over by accident.” Ryan Baker, Associate Professor of Cognitive Studies, grins. “But I’m from Texas, and I say roadkill can be a delicious meal.”

As Baker puts it, “learning analytics is useful in messier situations, for example, when neither ‘good inquiry’ nor what matters in achieving it has been defined.” TC has emerged as a leader in learning analytics, with Baker and recent hire Alex Bowers, Associate Professor of Educational Leadership, two of the field’s rising stars.

 

Baker focuses on creating computer-based environments that best engage students in their work. He has correlated the in-the-moment intellectual decisions of teens who used intelligent tutoring systems with their subsequent academic success. He’s also taught and analyzed a MOOC (massive open online course) to determine better MOOC teaching strategies.

Now, Baker has created a learning analytics master’s degree program drawing on TC’s broad expertise in making sense of data — including diagram production and comprehension, because teachers and administrators want the reams of new data formatted to highlight key information.

MOUMIÉ MAOULIDI:

Bringing Rigor to the Field

Studying with TC’s Henry Levin, Moumié Maoulidi (Ph.D. ’09) read “Let’s Take the Con out of Econometrics,” an Edward Leamer article calling for more rigorous empirical work. The critique piqued his interest in experimental research methods such as randomized controlled trials. Subsequently, while working at Columbia University’s Earth Institute, Maoulidi noticed how findings from these trials are influencing policy discussions in developing countries. Now at Stanford University’s Institute for Economic Policy Research, he is using knowledge from TC’s Economics & Education program to conduct cross-disciplinary applied research.

Are teachers using such data? In a study of an analytics-informed Texas math program called “Reasoning Mind,” Baker found that students were engaged and on task 89 percent of the time, meaning they received the equivalent of 40 more hours of math instruction than in a typical classroom.

 

Alex Bowers is a former pharmaceutical cell biologist who was among the first to make use of the newly sequenced human genome. His team’s challenge: to identify from among roughly 3 billion base pairs of nucleotides (the strands that form DNA), those that would make likely targets for new drugs.

While patenting two genetic targets for cancer drugs, Bowers encountered a knottier problem moonlighting as a community college science instructor: “My students had never written a paper or taken a class where you talk through science issues.”

Today, helping schools and districts improve students’ long-term learning, Bowers still mines data to find intervention targets. One turns out to be young children’s grades. Researchers and policymakers consider test scores more reliable than grades in predicting future performance. Yet, in his dissertation study, Bowers showed that even a student’s marks in first grade powerfully predict the odds of graduating from high school.

Bowers retrieved overlooked student records and applied a technique called cluster analysis to identify meaningful patterns. The result, now adorning his office, is a chart with horizontal lines representing the entire K-12 academic careers of two districts’ students. On each line are a student’s grades. Pairs of the most similar student performance trajectories over time are grouped together. Each pair is grouped with another to which it, in turn, is most similar, and so on, forming color-coded clusters.
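The pair-by-pair grouping described here is the core of hierarchical (agglomerative) clustering, and a stripped-down version fits in a few lines. This is an illustrative sketch rather than Bowers's procedure: the five grade trajectories are invented, and the use of straight-line (Euclidean) distance with average linkage is an assumption made for the example.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, data):
    """Mean distance between every trajectory in cluster a and every one in cluster b."""
    return sum(dist(data[i], data[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(data, n_clusters):
    """Start with each trajectory alone, then repeatedly merge the two
    most similar clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], data),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected by the deletion
    return clusters

# Invented grade-point trajectories, one value per school year.
trajectories = [
    [3.7, 3.6, 3.8, 3.7],  # student 0
    [3.5, 3.6, 3.4, 3.6],  # student 1
    [2.4, 2.2, 2.0, 1.8],  # student 2
    [2.6, 2.3, 2.1, 2.0],  # student 3
    [3.9, 3.8, 3.9, 4.0],  # student 4
]
clusters = agglomerate(trajectories, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

With these made-up numbers the students separate into a higher-achieving group (0, 1 and 4) and a lower-achieving group (2 and 3). Plotting each student as a row, ordered by cluster, is one way to produce the kind of block pattern the chart shows.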

“The human eye is good at picking out blocks,” Bowers says, and indeed, the chart clearly shows, by third grade, who will and won’t graduate. Students diverge into higher- and lower-achieving clusters, divided by a B in subjects such as reading. Third-graders with B minuses and C pluses fail in high school. Nearly half the third-graders in the lower cluster fail to graduate.

 “Statistics seem impersonal, but I’m showing each student’s experience,” says Bowers. “Like qualitative work, it creates the possibility for tailored interventions.

“People say we need the right data, but I say we have it. Pair grades with standardized assessment, use current data-maximizing techniques, and you can see the problems.”

 

— Joe Levine, Photographs by Deborah Feingold

Published Wednesday, Nov 4, 2015
