Levenberg-Marquardt Methods and Nonlinear Estimation

Paul Davis

SIAM News, Vol. 26, Number 6, October 1993

A. Conan Doyle might have called it "The Case of the Disappearing Duplicate Diagonal Dominator":

Freshly returned from a cold walk in Regent's Park, Sherlock Holmes and Dr. Watson were warming themselves before the fire at Baker Street when Holmes announced, "Dr. Watson, I am sure you are familiar with the widely used scheme of Dr. Donald Marquardt for robust nonlinear least-squares estimation." Watson shook himself awake from a doze and returned a look of sleepy interest.

Holmes continued, "Now I have lately received a letter inquiring about the fate of an earlier promulgation of quite similar ideas. These too recommended the addition of a positive term to the diagonal of the matrix obtained from the Jacobian. Should we not use our deductive powers to ascertain why this earlier foray attracted so little attention while Dr. Marquardt's later work was so remarkably influential?"

As he reached from his wing chair to accept the letter proffered by Holmes, Watson mumbled, "Why, yes, yes, of course. We should investigate." After a moment's perusal of the letter, he continued in more certain tones. "This letter refers to the work of Kenneth Levenberg whilst he was employed at the Americans' Frankford Army Arsenal. It cites a 1944 paper in their Quarterly of Applied Mathematics. That was nearly twenty years before Marquardt's paper appeared--in '63, in the Journal of the Society for Industrial and Applied Mathematics, if memory serves--and the earlier work went largely unnoticed until Marquardt mentioned it again.

"Now wasn't Marquardt employed by that chemical company with the French-sounding name--DuPont, I think? Possibly a connection with the Jacobins? Definitely a case of a disappearing duplicate diagonal dominator, I should say."

"Yes, yes, Dr. Watson," Holmes agreed. "My belief is that the real clues to the neglect of Levenberg's work and the success of Marquardt's are several. We should study Marquardt's Fortran code, his attention to a certain angle, and perhaps a subtle difference between the two approaches that only the power of modern computation can reveal."

"Certainly," Watson answered, "you know best. But perhaps you could first tell me a bit about the importance of nonlinear estimation. Why all the fuss? And I would like to warm my toes a bit more before we venture out again!"

If he had stayed awake while his feet warmed up, Dr. Watson would have learned that nonlinear estimation is a ubiquitous tool of modern technology and that the Levenberg-Marquardt method is the breakthrough that accounts for much of its commonplace use.

Nonlinear Estimation

Estimation is the process of fitting a mathematical model to experimental data to determine unknown parameters in the model. The parameters are chosen so that the output of the model is the best match in some sense to the observed data.

This best-fit idea is a bit like observers on a beach deciding whether a distant ship on the horizon is a luxury liner or a freighter--the color of the hull, the presence or absence of rigging, and other features of the observed ship are compared with those of known types of ships until the best match is found. The estimation process is often nonlinear because the observed data do not vary in direct proportion to the parameters in question.

Having found accurate values of the estimated parameters, investigators can differentiate among superficially similar situations, or they can make accurate predictions of behavior from the underlying model. These outcomes are an integral part of plant design, process refinement, drug development, quality control, and many other aspects of modern industrial practice.

Processes and Products

Joe Skwish, a senior consultant at DuPont, describes a canonical problem of the chemical industry that hinges on nonlinear estimation. "In chemical reaction rate studies, exponential rate coefficients determine the rate of transition from one chemical product to another. Typically, there is a multiple reaction chain. The sizing and construction of chemical plants depend upon accurate values of those reaction rates. They are determined by using nonlinear estimation to fit the exponential rate coefficients to laboratory data."
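
As a rough illustration of the kind of fit Skwish describes--not any particular DuPont study--the sketch below recovers a rate coefficient from noisy concentration data with SciPy's least-squares routine, whose 'lm' method wraps the MINPACK implementation of the Levenberg-Marquardt algorithm. The first-order model, the "true" coefficient values, and the synthetic data are all invented for the example.

    # Hypothetical example: fit the rate constant k in a first-order
    # decay model c(t) = c0 * exp(-k*t) to noisy concentration data.
    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 5.0, 25)              # sampling times (invented)
    c0_true, k_true = 2.0, 0.8                 # "laboratory" truth (invented)
    data = c0_true * np.exp(-k_true * t) + 0.02 * rng.standard_normal(t.size)

    def residuals(p):
        c0, k = p
        return c0 * np.exp(-k * t) - data      # model minus observations

    # method='lm' calls MINPACK's Levenberg-Marquardt implementation
    fit = least_squares(residuals, x0=[1.0, 1.0], method='lm')
    print(fit.x)                               # recovered estimates of (c0, k)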

Skwish explains that the nonlinearity of most scientific models arises from "powers, exponentials, and ratios, the same kinds of terms that scientists encounter in the course of their educations. They emulate those expressions when they formulate models of their own."

As a specific application of nonlinear estimation, Skwish mentions the design of a waste treatment plant that required the use of mathematical models to characterize the efficiency and thus to determine the size of continuous stirred tank reactors. "Rate constants for these tanks--the coefficients for a model--were estimated from a pilot plant," Skwish explains. "Then data from actual waste streams were used as input to the model to determine the final design. The accuracy of those coefficients was verified by comparison with the actual coefficients measured from the final, full-scale plant."

In an earlier position with Eastman Kodak, Skwish had used nonlinear estimation to predict the properties of photographic paper from an examination of the pulp from which it would be produced. By estimating the parameters in an appropriate model, paper chemists could predict the tear and puncture strength of the final product from such pulp characteristics as fiber length and time required to pass through a filter.

Exploring and Evaluating Multidrug Treatment Strategies

James Minor, an independent consultant with experience in both the chemical and the pharmaceutical industries, points to the Levenberg-Marquardt method as "one of the most robust estimation methods of all those that are available, especially for industrial problems." He has found the method especially useful for accelerating the training of the neural networks he uses to evaluate different multidrug treatment strategies.

"These multidrug treatments are of increasing interest to both the FDA and the drug industry," he explains. "In the past, just one drug was tested, and the remaining components in the blood stream--nutrients, other drugs, and so on--were considered a nuisance. Now biochemists want to take advantage of those interactions to treat complex viruses like HIV. The attitude is, `Why limit the treatment to a single drug? Why handicap yourself when you can evaluate and explore different multidrug treatments?'"

Seth Michelson, research section leader in the Department of Biomathematics at Syntex Drug Discovery, elaborates on the impact of drug-drug combination therapies. "The question is, What is the effect of the second drug? Is it simply additive? Or is the effect of the second drug compounded, whether synergistically or antagonistically?

"For example, HIV is accompanied by a co-virus, cytomegalovirus (CMV), that causes blindness. The drug AZT treats HIV but not CMV. Can a second drug counteract the threat of CMV without interfering with AZT's primary effect? Doctors must know when they can mix and match drugs, and virologists need to narrow down the number of pathways that are interacting (when they are developing therapies)."

The neural net approach based on nonlinear estimation determines the combinations of drugs in which interactions occur and reveals whether those interactions are synergistic or antagonistic. "This information," Michelson suggests, "builds intuition for the virologists so they can formulate and test a more limited range of hypotheses about interaction pathways." Minor says simply, "It is the best tool for postulating new chemical pathways."

Quality Control

Charles Pfeifer, consultant manager for quality management and technology at DuPont, describes the importance of nonlinear estimation from another angle: "The model-based process control approach uses first-principles chemical-physical models or empirically derived models. Such inferential control approaches traditionally focus on determining set points for purposes of process regulation, stabilization, or obtaining complete chemical reactions. The assumption is that controlling for those in-process set points leads to good product.

"But engineering or automatic process control doesn't usually relate those set points to finished product properties important to the customer. From a quality control perspective, the challenge is to estimate process parameters and, thus, process operating windows, to ensure consistent on-aim properties while satisfying traditional goals. Nonlinear estimation often is required in these kinds of models."

Marquardt's Problems

Donald Marquardt was motivated to develop his nonlinear estimation technique by problems in this same mold. One such problem involved fitting the Van Laar model of vapor pressure versus temperature to determine the behavior of systems at temperatures intermediate to those at which the data were recorded.

Another involved fitting paramagnetic spectral data so that relative peaks could be located precisely. From the location of the spectral peaks, the physical behavior of the molecules under study could be predicted with considerable confidence. For a third problem, heats of formation were estimated from laboratory data, providing what Marquardt calls "a real breakthrough."

Still other problems involved the identification of polymers, improvement of processes, reduction of impurities by means of precise identification, and even safety statistics.

Unavoidable Nonlinearity

Eventually, what came to be known as the Levenberg-Marquardt method for nonlinear estimation was found to be useful, in Marquardt's words, in "hundreds and hundreds of applications because it was a technique that worked on most nonlinear problems most of the time. The practical reliability of the method--its ability to converge promptly from a wider range of initial guesses than other typical methods--is a factor in its continued popularity."

Marquardt explains that in February 1953 he joined a consulting group at DuPont, "the premier chemical engineering research organization." At that time, he suggests, the demands of engineering modeling were running far ahead of current computational capabilities: "Even though estimation involving linear models was only beginning to be used, all of the engineering models were nonlinear.

"The laboratory people often took very ingenious data, from which the engineers would extrapolate to plant-sized equipment. But the engineering groups would value our consulting services only if we could solve the nonlinear estimation problems that were crucial to these extrapolations."

The computational problem is one of finding the minimum of the cost function. The most common cost function is the sum of the squares of the differences between the actual data and the values predicted by the current choice of the estimation parameters. Viewed as a surface over the parameter space, this cost function has a lowest point that corresponds to the best estimate of the unknown parameters. Since the problems are nonlinear, the search for the minimum is always iterative.
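
In the notation commonly used today (supplied here for concreteness; the article itself gives no formulas), if f(x; beta) is the model, beta the vector of unknown parameters, and (x_i, y_i) the m data points, the cost is

    S(beta) = sum_{i=1..m} [ y_i - f(x_i; beta) ]^2,

and the iteration searches for the value of beta at which S is smallest.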

The insights leading to the Levenberg-Marquardt method arose from Marquardt's experience with several two-parameter estimation problems, including the Van Laar model mentioned earlier. The intuition of Marquardt's chemical engineering colleagues was often sufficient to provide good starting estimates for schemes like steepest descent, which iterate to the best estimate of the parameters by heading straight down the wall of a valley in the cost function surface. But without a good initial guess, many more iterations were usually needed. Worse, they often would not converge.

Primitive Computing Equipment Proves Beneficial

Marquardt worked with an IBM Card Programmed Calculator operating with 46 words of memory at 3.75 floating-point operations per second; by improvements in the machine's logic, he had increased the speed by 50% from its original 2.5 flops! From the output of that crude equipment, he plotted the contours of the cost function, as well as the individual iterates generated by the various estimation schemes he tried. "We had such slow, primitive equipment, it was easy to see the details of every step," he reports.

He began to observe a generic geometric problem: Methods like steepest descent, which followed the gradient down the cost function surface, marched in a direction that was nearly orthogonal to that of Taylor series methods, which linearized the cost function. This geometric conflict was a consequence of the long, narrow, banana-shaped valleys in the cost function. The ideal method would have to find an angle of descent intermediate between these two extremes. At the same time, the step size would require adjustment to prevent stepping across the valley and entirely missing the floor where the best parameter values lay.
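
Under the same assumed notation, with r(beta) the vector of residuals y_i - f(x_i; beta) and J its Jacobian matrix with respect to beta, the two competing directions are

    steepest descent:  delta = -J^T r                 (straight downhill on the cost surface)
    Gauss-Newton:      solve (J^T J) delta = -J^T r   (from linearizing the residuals)

and in a long, narrow valley these can be nearly perpendicular--the geometric conflict visible in Marquardt's contour plots.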

The Diagonal Term

Eventually, Marquardt recognized that these goals could be met by adding a properly sized parameter to the diagonal of the system of equations defining the iterates. Levenberg, whose earlier work was unknown to Marquardt at the time, had provided some intuitive motivation for this diagonal term by deriving it from the argument that an overly long linearized step should offset the apparent reduction in the cost function.
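
Written out, again in notation supplied here, Levenberg's 1944 proposal is usually rendered as adding a multiple of the identity,

    (J^T J + lambda I) delta = -J^T r,

while Marquardt's 1963 version scales the added term by the diagonal of J^T J,

    (J^T J + lambda diag(J^T J)) delta = -J^T r.

As lambda grows, the step rotates toward steepest descent and shortens; as lambda shrinks toward zero, the step approaches the Gauss-Newton step--exactly the intermediate angle and adjustable step size the geometry demanded.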

But the extraordinary effectiveness of Marquardt's approach hinged on two particular features that were absent from Levenberg's prior work. First, unlike Levenberg, Marquardt did not insist on finding a local minimum of the cost function at each step. In this way he avoided the relatively slow convergence often encountered in steepest descent techniques as they work their way along a narrow zigzag path, crossing and recrossing the floor of the banana-shaped valley in the cost function surface.

Second, and of equal importance, Marquardt implemented his method in Fortran and tested it "on a large number of problems." His code contained a particular feature, mentioned only in a long footnote in his 1963 paper, that treated cases in which the diagonal parameter had grown unreasonably large.

"Many people initially programming the method have omitted the step described in the footnote in their computer software," Marquardt explains, "but it is very critical. The algorithm is not as robust without it. However, I distributed many hundreds of copies of the code I had tested. I am convinced that this technique would not have received as much attention without the Fortran code. And I believe it is still true today that good results won't receive the attention they deserve if they are not packaged in good code."

Marquardt's original ideas have evolved considerably in the hands of other researchers, points out David Gay of AT&T Bell Laboratories. "The success of the Levenberg-Marquardt algorithm led to the development of trust-region algorithms, which are now popular in many contexts," he says. "For hard nonlinear least-squares problems, these algorithms are often more efficient than Marquardt's code. Modern 'Levenberg-Marquardt' codes actually implement trust-region algorithms, in which the step length, rather than the Marquardt parameter (the multiple of a diagonal matrix added to the Hessian approximation), is explicitly controlled."
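
In the trust-region formulation Gay mentions, each iteration solves (notation again supplied here)

    minimize || J delta + r ||^2    subject to    || delta || <= Delta,

where the radius Delta is what the algorithm adjusts directly; the Marquardt parameter reappears behind the scenes as the Lagrange multiplier that enforces the constraint.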

Closing the Case

So Sherlock Holmes and Dr. Watson might have found that the Levenberg-Marquardt method became a ubiquitous tool of nonlinear estimation because Marquardt made a subtle adjustment in the angle at which the method moved downhill and because he provided well-tested code that implemented a robust algorithm.

"Clever insight implemented in robust, well-tested code. Obvious, my dear Dr. Watson," Holmes might conclude. "Not just obvious, Mr. Holmes," Watson could wisely reply, "but critical to a wealth of industrial processes."


Paul Davis is a professor of mathematics at Worcester Polytechnic Institute.

