

This article has been archived for you by PerformanceAssessment.org

Putting the test to the test

By BENJAMIN WACHS
Brighton-Pittsford Post
August 28, 2003

After almost 10 years of controversy, New York State education officials insist that their new standardized tests have improved classroom teaching and raised the achievement levels of high school graduates.

"Prior to the early 1990s there were no such thing as education standards in New York," said Deputy Education Commissioner Jim Kadamus. "Under the old system, we had a two-track system: some kids got the Regents, some kids didn't."

Yet critics of the state's program say it may actually be doing more harm than good. And Kadamus and others cite virtually no hard statistical evidence behind most of their claims.

Milton Cofield, a member of the New York State Board of Regents, was asked at a December public meeting in Rochester to name even a single study showing that the Regents exams have been accomplishing everything the state claims.

He couldn't.

Kadamus himself acknowledged in February that the state actually has no statistical information showing that the Regents exams have worked. Or that they should work.

"The evidence we have right now is anecdotal," he said.

Despite that lack of evidence, Kadamus and the Board of Regents say the new exams - five mandatory Regents tests that determine whether a student can graduate - are doing everything they are supposed to.

"The Regents adopted the Regents exams as the most reliable and consistent way, on a statewide basis, to measure the standards, and then they set up an assessment review panel with people from around the state to review alternatives," said Alan Ray, the state Education Department's director of communication.

Ray added the exams are clearly improvements over the former student assessment standards, which allowed many students to slip through the school system without a basic mastery of skills.

"The English exams added a lot more writing, the math exams added more multi-step word problems, the social studies exams had more document interpretation problems where you had to do more research," he said.

A close examination of the history of these tests, though, shows that the state's own panel of experts recommended against exactly the kinds of exams the state enacted. Another panel of experts told the state it had to prove that the exams work before it could institute them - which never happened.

A review of scientific research on the subject shows that there is virtually no evidence that standardized tests like the Regents exams are good for education - and a rather large body of scholarly work suggests they aren't.

The research is significant enough that the American Psychological Association, the American Educational Research Association and the National Council on Measurement in Education all determined as far back as 1985 that no important decision about a child's future should ever be made on the basis of a single test.

The state was told as much but did it anyway.

The state says it needed to institute the new, mandatory Regents exams because New York had a "two-tiered" educational system that shut out the poor, minorities and people with the bad luck to be in a bad school system.

"Somebody made a decision about that. Somebody said 'Benjamin's smart and gets to take the Regents, Jim's not so smart,'" Kadamus said. "And many of the people being put on the lower tier were minority kids, urban kids, poor kids from rural areas who people thought 'it just isn't worth it.' When we forced everybody to take the Regents, many of the kids who weren't even getting a shot before got their shot, and many succeeded."

The biggest benefits, he said, are among those who suffered the worst under the previous system.

"In the suburban schools, you're probably not noticing much difference," he said. "Most suburban schools were getting 70, 80 percent of the kids passing the Regents anyway.

"For Pittsford and Brighton it's really not a big change," he said. "But where this has to have an impact is in places like Rochester and Buffalo and New York City and Utica and Jamestown."

Combined with the exams given at the fourth- and eighth-grade levels, designed to make sure students enter high school on track, Kadamus said the Regents exams will eventually raise the bar for all students in New York.

"The minimum level of competency has gone up, and the whole system has shifted up," he said. "We've established that many kids can pass the Regents exam. We haven't established that all kids can. We're still working on that."

And, even though Kadamus could not cite any specific study demonstrating that the Regents exams were having a beneficial impact, he did say his understanding is that surveys from the State University of New York system show improved grade point averages as a result of the exams.

There are a few studies - even one as recent as 2003 - suggesting that a program like the Regents exams can have a good impact on students, though the state did not cite any.

One study, completed last year by Martin Carnoy and Susanna Loeb, showed that some states that enact standardized testing have higher average scores on other standardized tests like the SATs.

But the overwhelming body of research on standardized testing either strongly indicates or states directly that tests like the Regents exams are bad for classrooms and students, especially the poor and minority students the state was most trying to help.

The most recent studies were released this month, and the research as a whole is considered definitive enough that when the American Psychological Association, the American Educational Research Association and the National Council on Measurement in Education revised their testing guidelines in 1999, they retained and updated the warning that no important decision about a child should be based on the result of a single test.

"It's just good practice across the board to make important decisions based on more than one piece of information, to try to get as much information, and different kinds of information, as possible," said Marianne Ernesto, the American Psychological Association's director of testing and assessment. "It's ingrained in everything we know about assessment."

Assemblyman Steven Sanders, D-NYC, who chairs the state Assembly's committee on education, said the evidence presented to his committee on that point is overwhelming.

"Even the companies that make the tests, like McGraw Hill, will tell you that," Sanders said. "It's like the warning on the side of a pack of cigarettes: they warn you that a test should not be taken in and of itself to determine the worth of a student."

Linda Darling Hammond, an educational specialist at Stanford University who chaired the state's council on curriculum and assessment in the early 1990s, likewise said that there's little ambiguity on this point.

"I don't know that there's total unanimity," she said, "but there's a strong body of research that's well known by a lot of scholars."

In fact, prior to implementing the standardized tests in New York, the Regents were specifically warned by their own panel of experts not to put a system like the current Regents exams in place, according to Darling Hammond, who chaired that panel.

The Regents created the new state educational standards, called "A New Compact for Learning," in 1991. The compact called for a number of standards to be met in areas ranging from science, math and technology to career development and the arts; but the question was how to test whether schools and students were meeting those goals.

The state appointed the New York State Council for Curriculum and Assessment that same year. Made up of leading educational policy experts, the council was charged with deciding the best way to gauge whether the standards are being met.

The SUNY system weighed in immediately with a 1992 report that specifically asked the Regents to place a greater emphasis on portfolios and performance-based assessments including research projects, laboratory experiments, essays and exhibitions, "rather than short-duration standardized paper-and-pencil tests." SUNY said the tests provided no useful information about a student's real capabilities.

The recommendations of the council, published in a 1994 report, are a lot like what SUNY wanted, and almost nothing like the system eventually put in place.

Instead of standardized exams, the council determined that students should be required to put together a "Regent's Portfolio" that would include special projects, research papers, writing samples and other evidence of mastery of skills, along with teacher evaluations and some standardized tests. No single component could have guaranteed graduation or kept a student from graduating; the portfolio would have been considered as a whole.

"Every district would have created pieces that would have fit the standards, such as special science projects," said Darling Hammond.

Council member Deborah Meier, who founded Central Park East Elementary School in New York City and wrote the book "In Schools We Trust," called it "a very balanced idea between responsibility of state and local control. It was a real leap in accountability that was tied to good data."

The Regents endorsed the plan in 1994.

But then Richard Mills, appointed commissioner of education in 1995, reversed that decision and took the state in a new direction.

Prior to coming to New York, Mills headed the education system in Vermont and had instituted a portfolio system for that state, which experts said was applied unevenly from school district to school district.

The results of Mills' decision for New York are well known: every public high school student now has to pass five standardized tests in order to graduate. No exceptions.

Mills' office in Albany deferred comment for this article to communication director Alan Ray.

"The long and short of why the decision was made not to use (portfolios)," Ray said, is that portfolios vary too much from school district to school district. "They cannot work on a statewide basis to evaluate student performance in a consistent way from, say, Long Island to Buffalo."

Meier, asked if the curriculum and assessment council supported that conclusion or the testing system that the state then came up with to replace portfolios, said she was shocked.

"I think the state has come up with something so shabby as to be scandalous," she said. "To use one very narrow form of standardized testing is bound to distort schooling. It's like assuming that the only thing a corporate leadership should be interested in is the bottom line this year. That's not the only thing they should be interested in. If they are, you get Enron. Good management wants to know something about the long-term goals, what the numbers mean.

"The more you put your focus on this year's bottom line," she said, "the more you distort, both in the corporate world and in the education world."

Asked if an increased drop-out rate and a narrowing of school curricula to fit the tests were predictable, Darling Hammond pointed out that the Regents exams violate the most basic tenet of testing: the American Psychological Association's guideline that no single test should determine an important decision for a child.

"When this policy direction was adopted, there was already some indication that the use of high-stakes testing, particularly if it doesn't allow for a range of measures of performance, would result in high failure rates for students and the potential for increased drop-out and push-out rates," she said. "That had been documented already in states that had adopted systems like that."

It's been documented in New York, too. According to Walter Haney, a senior researcher at the Center for the Study of Testing, Evaluation and Educational Policy at Boston College, New York's graduation rate has dropped from 61 percent in 1997-1998, when the tests were first being phased in, to 57.6 percent in 2001-2002, the last year for which statistics are available.

That's one of the five worst statewide graduation rates in the United States and, Haney said, represents 250,000 students who dropped out as a direct result of the state's policy.

The state fervently disputes Haney's data.

"The graduation numbers, if you look at the last 10 years, have stayed the same and, in fact, have gone up a little bit," Kadamus said. "We have the same or more graduates and we know now that you've got to complete five Regents tests."

But other independent surveys compiled on the state's drop-out rates also show declining graduation rates.

John Warren, in a 2003 paper presented before the American Sociological Association, found that the state's graduation rate declined by 3 percent from 1995 to 2000 (the last year for which he calculated a rate), a result almost identical to Haney's.

Jay Greene and Greg Forster, analysts at the Manhattan Institute, used yet another method of calculating graduation rates. They found that as of 2001, New York had the ninth-worst graduation rate among all 50 states.

"Official graduation rates going back many years have been highly misleading in New York City, Dallas, the state of California, the state of Washington, several Ohio school districts, and many other jurisdictions," their report said.

To help deal with the technical issues that crop up when designing a major new testing program, the state put together a second committee, also composed of nationally known testing and education experts, called the Technical Advisory Group (TAG).

That group, too, found its recommendations ignored - particularly on the critical issue of determining whether the tests actually measure anything at all.

In testing terms, that's called "validity." If a test is "valid," then an increase or decrease in scores actually measures something, like how much a student has learned. If a test is "invalid," then all a changing test score means is that a number has changed - it doesn't actually refer to anything in the real world.

It's not enough that a test be designed around a curriculum; it also has to be valid.

"Do rises on (standardized) test scores indicate, for example, rises on other measures?" asked Richard Ryan, a psychologist specializing in testing issues at the University of Rochester. "Generally no."

Ray said the state has data showing the tests are valid.

"We've done a whole bunch of validity studies," he said. "We've published 70 or so studies."

He was unable, though, to name a publication the studies appeared in.

After six weeks of requests by Messenger Post Newspapers for those studies, Ray provided five.

Those studies were sent to testing experts, including Ryan, Darling Hammond and Joshua Aronson of New York University, who each independently concluded that they contained no validity data at all.

But the state had been warned that it needed validity data. In at least three separate memos dating from 1998 to 2000, the TAG specifically asked the state both to gather validity data and to create a complete "technical manual" for the tests.

As of the time of this publication, the state has yet to prove that its tests measure anything.

Still, Kadamus and members of the Board of Regents insist the tests work.

"People who are college (admissions) people tell us that they're getting kids who are more capable of doing college-level work," Kadamus said. "Second, business people tell us that people who are going directly into the work force have higher levels of reading and writing skills. Third, we know from other measures, the SAT scores, the NAEP (National Assessment of Educational Progress) scores - a national test given across the country - that those scores are going up. They've gone up significantly."

Donald Hossler, associate vice president for enrollment services at Indiana University, though, said while the university receives many undergraduate applications from New York state, there is no evidence that either the quantity or the quality of those applications is improving.

Dan Shelley, director of undergraduate admissions at Rochester Institute of Technology, said he, too, has seen no evidence that the Regents exams are improving the quality of applicants.

Jonathan Burdick, dean of admissions and financial aid at the University of Rochester, said he believes the quality of New York applicants has actually gone down as a result of the impact the Regents exams are having on curriculum.

"The state efforts have actually made it harder for them to do some of the other creative and innovative things," Burdick said. "I think it's pretty clear that because there's so much testing so many of the other qualities, creative qualities, that we look for in our students - their interest in music, their ability to come up with unorthodox ideas - that's harder to come by because there's so much emphasis on testing."

Members of the business community also did not leap to the tests' defense. Sandra Parker, president of the Rochester Business Alliance, said, "There seems to be enough questions about that, maybe (the tests) should be re-looked at."

New York's SAT scores are rising. But so are everyone else's. In fact, while the state's SAT scores have gone up over the last 10 years, they have risen more slowly than the national average - so relative to the rest of the country, New York's SAT scores have slipped since the Regents exams were instituted.

U of R psychologist Ryan said he's never understood why the state would decide to use a system like the Regents exams.

"It's rather tragic," he said.

Brighton, Pittsford officials say tests are only part of student evaluation
By JEREMY MOULE / jmoule@mpnewspapers.com

Administrators in the Brighton and Pittsford school districts believe Regents exams have their place in evaluation of student learning, but caution that the tests shouldn't be the only measure.

Instead, students should be evaluated using a variety of methods in conjunction with each other.

"One test in one sitting over a few hours is not going to give you a complete picture of what a child is learning," said Jeanne Strining, assistant superintendent for curriculum and instruction in the Brighton Central School District.

Brighton believes there should be multiple measures of a student's progress and learning, Strining said. Those measures should occur over a period of time and should take into consideration both deep thinking and surface knowledge, she said.

Students have different strengths, she said. One student, for instance, may excel in class discussions but not be a good essay writer. It's up to the teacher to recognize those strengths, Strining said, and tools other than tests, such as classwork and problem solving, are available.

Administrators in Pittsford feel the same.

"There's no perfect exam or no perfect test or no perfect testing system," said Deputy Superintendent Bob Kendall.

Test data, however, is important to the district's efforts to improve instruction.

In fact, said Jackie Roblin, director of pupil services, there are flaws with any system of evaluation.

Roblin and Kendall said it's important to use multiple evaluation methods to determine how well a student is learning. Among the systems used in Pittsford are projects, portfolios, writing assessments and class participation.

"The state tests are just one piece of the measures we use in our assessment program," Kendall said.

Strining does believe it is important to take a hard look at tests and determine the best way to evaluate student performance, especially in the wake of the June 2003 Math A exam and the 2002 and 2003 physics exams. Both drew considerable attention amid vocal concerns about their content and the wording of their questions.

Roblin said the state is trying to work within its own system to create better tests and better student evaluation systems.

But for now, district administrators maintain that the best approach is to use several methods at once.

"You can't use only a test to determine what a student knows," Roblin said.

Experts weigh in on portfolios vs. exams
By BENJAMIN WACHS / bwachs@mpnewspapers.com

Maybe the state acted a little too quickly when it dismissed the idea of a portfolio system for evaluating all students.

It certainly refused to study the question in detail - even after a panel of its own experts recommended it.

The Board of Regents' objection to a portfolio system, according to state Education Department Director of Communication Alan Ray, came from the experiences of Richard Mills, who was appointed commissioner of education in 1995. Prior to coming to New York, Mills had headed the education system in Vermont.

"Vermont had conducted a portfolio approach (when Mills was in charge). And, after several years of this, the RAND Corp. conducted analyses, and their conclusion was that portfolios on a large, statewide basis are unreliable," he said.

The RAND Corp. is a nonprofit research institute.

Portfolios, Ray said, "can be valuable for instruction," but have three flaws as a statewide evaluation measure: there's no way to ensure that the portfolios at a school in Rochester are at the same level as those at a school in Brighton - or that the teachers in Greece aren't giving more help than the teachers in Henrietta - or that they're graded the same way.

The state's Council on Curriculum and Assessment addressed the issue by empowering the state to regularly examine the portfolio systems of school districts to make sure that each one met the state standards.

"Are we evaluating the ends or the means?" asked Dan Drmacich, principal of School Without Walls in Rochester. "Who cares what you're doing, as long as the kids are doing well?"

Drmacich is a strong proponent of portfolios. His "consortium" school is one of a number in New York state that, until recently, were allowed to use the portfolio approach instead of the Regents exams.

Consortium schools - most are in New York City - serve students of virtually the same ethnic and economic backgrounds as the average public school in the city; in fact, they have 20 percent more students who qualify for free lunches.

But as of 1999, consortium schools also had a 91 percent college acceptance rate for their students while the average New York City public school had a college acceptance rate of 62.6 percent.

Consortium schools had almost half the dropout rate of city public schools.

But the state Education Department forced consortium schools to drop their portfolios and adopt the Regents exams in 2001.

That went against the recommendation of a panel of experts, this one appointed by the state in 2000 to examine the issue. Called "The Blue Ribbon Panel," its final report recommended that consortium schools be given a three-year extension on their independence from Regents exams so that their methods could be further studied by an organization independent from New York state.

The panel also recommended that the Regents exams be examined closely over those three years by the same outside entity so that, at the end, both methods could be compared side by side, by the same independent organization, using the same criteria.

Making a wrong decision on this issue, the panel concluded, "could have serious negative effects which could echo throughout the state."

But the state Education Department decided neither to allow those studies nor to reverse its earlier decision.

"They're killing us," said Drmacich. "We were doing something that worked!"

But what about the RAND studies?

Daniel Koretz, a professor at the Harvard Graduate School of Education, headed the RAND studies.

Koretz said he disagrees with researchers who favor portfolios: there is no scientific evidence, he said, that portfolios work better for students. But that doesn't mean they don't work, he added; there just isn't enough evidence to know.

"We don't have the right kind of data to answer this question, in my view," he said.

By the same token, Koretz said, there's absolutely no evidence suggesting that the Regents exams are any better.

"I think that using a single test (to determine student promotion) is the wrong answer," he said. "You can use multiple tests, like New York state does, but if you have to pass them all, then that doesn't necessarily help."

Research does show that "teaching to the test" is a real phenomenon and extremely destructive.

"It's clear that it causes real problems," Koretz said. "It's one of the main reasons not to rely on a single measure. It's difficult to devise a test that doesn't raise the risk of that."

As a result, he said, while he finds methodological flaws in the Regents portfolio approach, the RAND studies in no way recommended that the state of New York use the Regents exam system.

Koretz suggested that the best use of standardized tests is to see how schools are doing on an annual basis, and then use that data to determine which schools you want to study - not which students should pass or fail.

"If a school shows sudden test score improvements, go in and see why. Maybe they're doing something right, maybe they're teaching to the test," he said. "If a school just isn't improving, go in and see why. Maybe they just need to have their feet held to the fire, but maybe there's some other factor at work - lots of transient kids, or a huge influx of kids who don't speak English. Teachers can be working very, very hard, and unless you go in, you don't know what's going wrong."

What do the studies say?

Asked for studies proving that tests like the Regents exams work, the State Education Department could not provide a single one, although they do exist.

Those studies, however, are few and far between when compared to the overwhelming body of evidence against standardized testing and high-stakes exams.

The most recent are a group of studies released this month by the University of Chicago, tracking that city's high-stakes standardized exams for third- and sixth-graders.

In the most complete examination of such exams to date, the studies found that after nine years there were virtually no benefits to such a program. Dropout rates increased, special education placements increased and there was no evidence that student learning increased over the long run.

A 2003 study by Tonya Moon, Carolyn Callahan and Carol Tomlinson suggested that increases in standardized test scores are more likely the result of test preparation than student academic achievement, and that test preparation drives out academically useful activities.

An analysis by Monty Neill and Keith Gayler in 2001 showed that states without high-stakes tests are more likely to show improvements on national tests than are states with them. The study specifically suggested that high-stakes tests are likely to widen educational inequities between the rich and the poor.

Similarly, a meta-analysis by C.H. Utman in 1997 suggested that while classrooms that focus on tests can improve rote learning, they actually undermine "performance at more heuristic or complex tasks" like creative thinking and problem solving.

Those results were anticipated by studies by Wendy Grolnick and Richard Ryan in 1987 and by S. Golan and S. Graham in 1990, all of which showed that students motivated by the need to do well on a test tended to gain only a superficial understanding of the material and did poorly on measures of long-term retention and growth.

A 1990 experiment by C. Flink, A.K. Boggiano and M. Barrett showed that the more pressure teachers put on students to perform "up to standards," as opposed to simply understanding the material, the worse their students performed on subsequent tests.

A series of studies by Joshua Aronson, Claude Steele and others demonstrates that the students most likely to underperform on standardized tests are the students who care the most about them.

They have concluded that the more mental energy an individual spends caring about their test performance, the more likely they are to repeatedly underperform.

Many of those students drop out.

A 2002 study by Sean Reardon and Claudia Galindo showed that the imposition of high-stakes tests increased the odds of a student dropping out of school between the eighth and 10th grades by 39 percent. And an analysis by Brian Jacob in 2001 showed that students in the bottom 20 percent were 25 percent more likely to drop out of school in states with high-stakes testing.

There is also evidence about how teachers tend to react to high stakes testing regimens.

A study by Tonya Moon, Catherine Brighton and Carolyn Callahan in 2003 indicated that, when forced to work with high-stakes testing programs, "teachers are not likely to engage in effective classroom practices but instead engage in one-size-fits-all practices."

A study of the Texas school system by Linda McNeil and Angela Valenzuela in 2000 found that teachers were encouraged or required to take time away from core subjects not tested on state exams and to eliminate or curtail special projects, experiments, library research, extensive writing and oral assignments.

David Hoff, in a 2000 article in Education Week, documented instances of outright cheating by teachers on test results.

The bottom line, according to a 1995 study by Iris Rotberg, is that "developing new tests cannot reduce problems in the current standardized testing programs, such as methodological problems, unreliable scoring and expense."

And that's just a partial list.

Copyright © 2002, Brighton-Pittsford Post
