
NUMERICAL METHODS FOR UNCONSTRAINED OPTIMIZATION AND NONLINEAR EQUATIONS PDF



This book has become the standard for a complete, state-of-the-art description of the methods for unconstrained optimization and systems of nonlinear equations. It appears in SIAM's Classics in Applied Mathematics series, which consists of books that were previously published elsewhere, such as C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, J. E. Dennis, Jr. (Rice University, Houston, Texas) and Robert B. Schnabel.




If the solution to 1. Fur- thermore.

Numerical Methods for Unconstrained Optimization and Nonlinear Equations

We will not consider the constrained problem, because less is known about how it should be solved. The problems are posed over Rn, where Rn denotes n-dimensional Euclidean space. Our emphasis in this book is on the most common difficulties encountered in solving problems in this framework. The typical scenario in the numerical solution of a nonlinear problem is that the user is asked to provide a subroutine to evaluate the problem function(s).

We do not necessarily assume that the derivatives are analytically available, although they often are. Most of the book is concerned with these problems. The third problem that we consider is also a special case of unconstrained minimization. We also give the analysis that we believe is relevant to understanding these methods and extending or improving upon them in the future.

We are concerned exclusively with the very common case when the nonlinear functions F. For further comments on the typical size and other characteristics of nonlinear problems being solved today. Problem 1. We discuss the basic methods and supply details of the algorithms that are currently considered the best ones for solving such problems.

This is the nonlinear least-squares problem: In particular. The techniques for solving the nonlinear equations and unconstrained minimization problems are closely related.

We feel that the field has jelled to a point where these techniques are identifiable. The researcher derives an equation f x giving the potential energy of a possible configuration as a function of the tangent x of the angle between its two components. Thus Chapter 10 is really an extensive worked-out example that illustrates how to apply and extend the preceding portion of the book.

One difficulty with discussing sample problems is that the background and algebraic description of problems in this field is rarely simple.

Then we make some remarks on the size. This is a minimization problem in the single variable x. It might be appropriate to call such methods local or locally convergent. This is a very difficult problem that is not nearly as extensively studied or as successfully solved as the problems we consider. The nonlinear least-squares problem is just a special case of unconstrained minimization. Throughout this book we will use the word "global" to describe methods intended to converge from almost any starting point, not methods that find the global minimizer.

Since the. Therefore we will simplify our examples when possible. First we give three real examples of nonlinear problems and some considerations involved in setting them up as numerical problems. It is likely to be highly nonlinear. For example. One problem that we do not address in this book is finding the "global minimizer" of a nonlinear functional, that is, the point at which the function takes its lowest value over all of Rn.

Any method that is guaranteed to converge from every starting point is probably too inefficient for general use [see Allgower and Georg ]. Although this makes consulting work interesting. The simplest nonlinear problems are those in one variable.

Figure 1. In practice. A second common class of nonlinear problems is the choice of some best one of a family of curves to fit data provided by some experiment or from some sample population. Some comments are in order. Since the general equation for a bell-shaped curve is this means choosing x1. Actually ri is linear in x1 and x In order to draw conclusions from the data. As a final example. Scientists had determined that the cost per unit of energy was modeled by where cl c2.
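Returning to the curve-fitting example, the sketch below shows how such a problem is typically handed to an unconstrained minimizer. The three-parameter bell-shaped model, the synthetic data, and the variable names are illustrative assumptions, not the parameterization used in the text.

```python
import numpy as np

# Hypothetical bell-shaped model m(t; x) = x1 * exp(-(t - x2)**2 / x3); the data are synthetic.
def residuals(x, t, y):
    x1, x2, x3 = x
    return x1 * np.exp(-(t - x2) ** 2 / x3) - y

def objective(x, t, y):
    r = residuals(x, t, y)
    return 0.5 * np.dot(r, r)        # nonlinear least-squares objective f(x) = (1/2) sum_i r_i(x)^2

rng = np.random.default_rng(0)
t = np.linspace(-3.0, 3.0, 25)
y = 2.0 * np.exp(-(t - 0.5) ** 2 / 1.5) + 0.05 * rng.standard_normal(t.size)

x0 = np.array([1.0, 0.0, 1.0])       # starting guess handed to an unconstrained minimizer
print(objective(x0, t, y))
```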

A nuclear fusion reactor would be shaped like a doughnut. There were. An illustrative simplification of the actual problem is that we were asked to find the combination of the inner radius r and the other design parameters that minimized the cost per unit of energy. Two obvious choices are the sum of the absolute values of the residuals and the largest absolute residual. The reasons one usually chooses to minimize f(x) rather than f1(x) or f∞(x) are sometimes statistical and sometimes that the resultant optimization problem is far more mathematically tractable.

The first was that. This is one reason why derivative approximation becomes so important. While certainly there are problems Figure 1. These constant values depended on factors. The first is their size. There were also five more parameters see Figure 1.

And finally. Notice that we said the constraints in the nuclear reactor problem usually made a difference. In the nuclear reactor problem. Altogether there were five simple linear constraints in the three variables. This is because we were actually asked to solve instances of the problem. The problems above give some indication of typical characteristics of nonlinear problems.

It was necessary to run different instances of the problem in order to see how the optimal characteristics of the reactor were affected by changes in these factors.

It also makes one willing to experiment with various algorithms initially. It is important to emphasize that a constrained problem must be treated as such only if the presence of the constraints is expected to affect the solution. In the next portion of the study. Often in practical applications one wants to solve many related instances of a particular problem. The minimization of this sort of function is very common in nonlinear optimization. A second issue is the availability of derivatives.

On the other hand. All our examples had this form. A fifth point. Although it is reasonable to want extra accuracy. In our experience. Intermediate prob- lems in this field are those with from 15 to 50 variables. In this book we try to point out where ignoring the affects of scaling can degrade the performance of nonlinear algorithms. These size estimates are very volatile and depend less on the algorithms than on the availability of fast storage and other aspects of the computing environ- ment.

The state of the art is such that we hope to be able to solve most of the small problems. This is primarily due to the approximate nature of the other parts of the problem: Therefore it is important to have algorithms that work effectively in the absence of analytic derivatives.

Problems with 50 or more variables are large problems in this field. Frequently we deal with problems where the nonlinear function is itself the result of a computer simulation. We have heard of a variable problem in petroleum engineering where each function evaluation costs hours of IBM time.

Computers represent real numbers in the same manner. The implications of storing real numbers to only a finite precision are important. Beale Exceptions are problems where some discrete variables are constrained to take only a few values such as 0. The representation of a real number on a computer is called its floating-point representation. The answer is that there certainly are nonlinear problems where some variables must be integers because they rep- resent things like people. The length of the mantissa.

Garfinkel and Nemhauser In this case. In scientific notation. On IBM machines the base is Since On CDC machines the base is 2.

The theory does not guarantee this approach to solve the corresponding integer problem. On occasion. For more information. Although the effects of round-off can be rather subtle. For an example. This notation is used to indicate that the mag- nitude of each ai. Therefore we often need to measure how close a number x is to another number y. It is important. The second is the taking of the difference of two almost identical numbers.

The concept we will use most often is the relative error in y as an approximation to a nonzero x. For further information see Aho. The third is the solution of nearly singular systems of linear equations.

Thus the inaccuracy due to finite precision may accumulate and further diminish the accuracy of the results. This situation is actually a consequence of the first two. The first is the addition of a sequence of numbers. If one is alert to these three situations in writing and using computer programs. For an exam- ple. Another effect of finite-precision arithmetic is that certain aspects of our algorithms.

A consequence of the use of finite-precision arithmetic. This is preferable. A common notation in the measurement of error and discussion of algorithms will be useful to us.


Such errors are called round-off errors. In the case of an underflow. The concept commonly used is machine epsilon.

One routine. In finite precision. This means that y is zero in the context. In the case of an overflow. Another way to view macheps is as a key to the difficult task of deciding when a finite-precision number could just as well be zero in a certain context.

The latter choice is reasonable sometimes. The quantity. What are the results when How do these compare with the correct "infinite-precision" result? What does this show you about adding sequences of numbers on the computer?

How many correct digits of the real answer do you get? What does this show you about subtracting almost identical numbers on the com- puter? It is known that f t is a sine wave. The task is to determine what combi- nation of these three factors will lead to the lowest inflation rate. You may assume that macheps will be a power of 2. An economist has a complex computer model of the economy which. You are to set up and solve this problem numerically.

Assume you have the same computer as in Exercise 4. Pretend you have a computer with base 10 and precision 4 that truncates after each arithmetic operation. Rephrase as a simultaneous nonlinear equation problem in standard form: Find x 1.

What are the relative and absolute errors of the answer obtained by the computer in Exercise 5 as compared to the correct answer? What if the problem is changed to What does this show about the usefulness of relative versus absolute error?

How would you handle the variable "number of housing starts"? What numerical problem would you set up to determine these characteristics from your experimental data?

Write a program to calculate machine epsilon on your computer or hand calculator. In each of the following calculations an underflow will occur on an IBM machine.

In which cases is it reasonable to substitute zero for the quantity that underflows? The value of macheps will vary by a factor of 2 depending on whether rounding or truncating arithmetic is used. For further information on the computer evaluation of machine environment parameters. Print out this value and the decimal value of macheps.

Some references that consider the problems of this chapter in detail are Avriel, Brent, Conte and de Boor, and Dahlquist, Bjorck, and Anderson. The algorithms for multivariable problems will be more complex than those in this chapter. The reason for studying one-variable problems separately is that they allow us to see those principles for constructing good local, global, and derivative-approximating algorithms that will also be the basis of our algorithms for multivariable problems.

It is also apparent that one will be able to find only approximate solutions to most nonlinear problems. Suppose we wish to calculate the square root of 3 to a reasonable number of places.

In general. It would be wonderful if we had a general-purpose computer routine that would tell us: This is due not only to the finite precision of our computers.

Thus the inability to determine the existence or uniqueness of solutions is usually not the primary concern in practice. Nonlinear Problems in One Variable Chap. It is important to our understanding to take a more abstract view of what we have done. Using 2.

The method we have just developed is called the Newton-Raphson method or Newton's method. In Figure 2. The obvious question is.


It seems reasonable to approximate the indefinite integral by and once more obtain the affine approximation to f x given by 2. Newton's method comes simply and naturally from Newton's theorem. It is unappealing and unnecessary to make assumptions about derivatives of any higher order than those actually used in the iteration. Newton's method is typical of methods for solving nonlinear problems. Pedagogical tradition calls for us to say that we have obtained Newton's method by writing f x as its Taylor series approximation around the current estimate xc.

This type of derivation will be helpful to us in multivariable problems. The reason is that an affine model corresponds to an affine subspace through x. There are several reasons why we prefer a different approach. Again the root is given by 2. Since one has or. Now let us see what it will do for the general square-root problem. Newton's method would find its root in one iteration. The error at each iteration will be approximately the square of the previous error.

Then the sequence. This pattern is known as local q-quadratic convergence. Before deriving the general convergence theorem for Newton's method. Definition 2. This agrees with our experience for finding the square root of 3 in the example that began this section. The pattern of decrease in error given by 2.

For further examples see Exercises 2 and 3. It is worth emphasizing that the utility of q-superlinear convergence is directly related to how many iterations are needed for ck to become small. A definitive reference is Ortega and Rheinboldt []. Newton's method will converge q-quadratically to the root of one nonlinear equation in one unknown.

An iterative method that will converge to the correct answer at a certain rate. The local convergence proof for Newton's method hinges on an estimate of the errors in the sequence of affine models Mc(x) as approximations to f(x). In this book we will be interested mainly in methods that are locally q-superlinearly or q-quadratically convergent and for which this behavior is apparent in practice. Note that 2. Then for any x. Assume that for some for every x ∈ D.

We are now ready to state and prove a fundamental theorem of numeri- cal mathematics. From basic calculus dz. We will prove the most useful form of the result and leave the more general ones as exercises see Exercises This is especially convenient in multiple dimensions. The main advantage of using Lipschitz continuity is that we do not need to discuss this next higher derivative.

To appreciate the difference. The condition in Theorem 2. Let 0. Thus from Lemma 2. Theorem 2. A partially scale-free measure of nonlinearity is the relative rate of change in f' x. Thus Newton's method is useful to us for its fast local convergence. The numerator y. If f is linear. Newton's method will diverge. It makes the somewhat reasonable assumption that one starts with an interval [x0. Programs that use bisection generally do so only until an xk is obtained from which some variant of Newton's method will converge.

We also discuss the stopping tests and other computer-dependent criteria necessary to successful computational algorithms. In this section we discuss two global methods and then show how to combine a global method with Newton's method into a hybrid algorithm. A method more indicative of how we will proceed in n-space is the following. The simplest global method is the method of bisection. This makes the method very marginal for practical use.

This is expressed algebraically as: The method of bisection also does not extend naturally to multiple dimensions. This should be obvious geometrically. It sets x1 to the midpoint of this interval. Iteration 2. Constructing such hybrid algorithms is the key to practical success in solving multivariable nonlinear problems.

Below is the general form of a class of hybrid algorithms for finding a root of one nonlinear equation. Since f'(x) usually is not available then. We will see in Chapter 6 that the criterion in Step 3a has to be chosen with only a little bit of care to assure the global convergence in most cases of the hybrid algorithm to a solution. We will treat it more completely in Chapter 7. Naturally this test is very sensitive to the scale of f(x).

Deciding when to stop is a somewhat ad hoc process that can't be perfect for every problem. Equation 2. Step 2 usually involves calculating the Newton step.

The reader can already see that the choice of stopping rules is quite a can of worms. Partly to guard against this condition's being too restrictive. However. Figure 2. When f'(x) is unavailable. The resultant quasi-Newton algorithm is called a secant method. The quasi-Newton step to the zero of Mc(x) then becomes Two questions immediately arise: If hc is chosen to be a small number.

While it may seem totally ad hoc. It seems reasonable that the finite-difference Newton method. Finite-Difference N. Let us take Then If we define and then we have under the same Lipschitz continuity assumption f' e Lipy D as in Lemma 2.

Notice how similar Newton's method and the finite-difference Newton's method are. From Lemma 2. Substituting 2. Then Proof. Notice also that the above analysis so far is independent of the value of Now let us define ac by 2.

Let xc. Then we have an easy corollary which tells us how close. Dividing both sides by hc gives the desired result. If there exists some constant c1 such that or equivalently. Obviously in practice hc cannot be too small in relation to xc. If there exists some constant c3 such that then the convergence is at least two-step q-quadratic.

We began this discussion with a claim that insight is contained in the existence and simplicity of the analysis. Thus we have lost most of the significant digits in taking the difference of the function values. This is just one of many instances in numerical analysis where theoretical analysis is an ally to algorithmic development. After all. See Exercise It turns out that this problem is so closely related to solving one nonlinear equation in one unknown that we virtually know already how to compute solutions.

The result is that the slope of the model of f at xc may not even have the same sign as f'(xc). A good compromise is to try to balance the nonlinearity error caused by taking hc too large with the finite-precision and function evaluation errors from allowing hc too small.

This often-used rule is generally satisfactory in practice. If this hc is large enough to cast doubt on the utility of 2. If the accuracy of the f-subroutine is suspect. First of all. We develop this as an exercise (Exercise 19) and discuss it further in Section 5. There is a limit to how large hc can be. This gives a more accurate derivative approximation for a given hc (see Exercise 20) than the forward difference approximation 2.

This is due not only to finite-precision arithmetic.

We need only choose t. The rest is immediate from calculus. Once again. A proof of this fact suggests an algorithm. That is. Its global minimizer. It will be useful to denote by C1(D) and C2(D). Thus the step 2. The use of a Taylor series with remainder does not lead to later problems in the multidimensional instance of the minimization problem.

An iteration of the hybrid method starts by applying Newton's method. The Newton step is It is important to note the meaning of this step in terms of model problems.

Our global strategy for minimization will differ from that in Section 2. Since 2. From 2. A quadratic model is more appropriate than an affine model of f(x) for either maximization or minimization because it has at most one extreme point. This time. Again one asks. The solution is so similar to the discussion in Section 2.

The stopping criteria for optimization are a bit different than those for solving nonlinear equations. More advanced global strategies for minimization are discussed in Chapter 6. To understand this. Run your program on: Does this sequence converge g-linearly to 1? What type of convergence is this? Harder There is an interesting relationship between l-step g-order and "1-step" r-order. Give a counterexample to the converse.

For the global strategy. See if you can discover and prove this result. Is q-linear convergence with constant 0. If you use bisection.

What rate of convergence do you observe in each case? It won't always be quadratic. What sort of convergence are you getting? It should follow the hybrid strategy explained in Section 2. To find the result and an application see Gay Do this on paper.

Prove that the r-order of a sequence is always at least as large as its q-order. For the local step. Analyze the convergence rate of the bisection method.

Which root are you converging to? On the IBM series. Which assumptions of Theorem 2. What about the test involving "typx" suggested at the end of Section 2. Prove that the error in the central difference formula 2. What are the major problems in using this method? Do you think a quadratic model is appropriate for this prob- lem?

See also "Muller's method" in most introductory numerical analysis texts. The modifications are explained in Section 2. Suggest a local method for solving this problem. Expand the techniques of Lemma 2.

Suppose D is an open interval containing xc - hc. Find the value of hc that minimizes this bound for fixed y. Modify your program from Exercise 9 to find the minimum of a function of one variable. Part of our interest lies in using problem structure. The solution of the model problem required only a couple of arithmetic operations.

A second program note

It was evident in Chapter 2 that we like to derive iterative algorithms for nonlinear problems from considering the solutions to properly chosen models of the problems.

Another interest is in considering how much problem structure we can safely ignore in the model for the facilitation of its solution and still expect meaningful convergence of the iteration.

This is just a way of stating the obvious fact that we can afford to do more iterations of a cheap method than of an expensive one. This simplicity was our reason for writing Chapter 2. There is a subtle and interesting relationship between the models we build and the means we have to extract information from them.

Chapter 3 presents material from computational linear algebra relevant to the solution of the model problems or the extraction of useful information from them. Chapter 4 is a review of some multivariable calculus theorems that we will find useful in setting up multivariable models and analyzing their approximation properties. We will give examples of both approaches at appropriate times. In multiple dimensions. There seems no pressing need for another exposition of scalar iterations.

The model used must be from a class for which the solution of the model problem is possible by an effective numerical procedure. In Chapter 2. If it is cheap to form and to solve the model problem.

In Section 3. Section 3. Wilkinson Our point of view is that the user may obtain the appropriate algorithms from some subroutine library.

We then describe in Section 3. Excellent references for this material are the books by J. Since these topics will not be needed until Chapters 8 and For reference. In Sections 3. In Section 3.


This section will introduce norms of vectors and matrices as the appropriate generalizations of the absolute value of a real number. There are many different norms. Recall that Rn denotes the vector space of real n x 1 vectors.

Stewart Numerical Linear Algebra Background Chap. These matrices have a special relationship to the Euclidean norm that makes them very important for numerical linear algebra. Definition 3. The superscript T will denote the matrix transpose. Van Loan Thus the reader will easily see from 3. Matrices will be denoted by capital roman letters.

We will use the same symbol 0 for the zero vector as well as the zero scalar. We will generally use Greek lower-case letters for real numbers.

Strarvg Also in this section is a discussion of orthogonal matrices. We will especially be interested in the so-called Euclidean or l2 norm. In the sequel. In Chapter 4 we will see that function values will be vectors or scalars.

Fortunately in R" we have no such problem. The reader can show that 3. It would mean that whether a particular algorithm converged in practice might depend on what norm its stopping criteria used.

We will be thinking of a certain norm when we make such a statement. In our discussions of convergence. We should point out that q-linear convergence is a norm-dependent property. It would be unfortunate if it were necessary to specify the norm.

There exist positive constants a and such that for any v e Rn. This will be useful in analyzing convergence. If v has a certain magnitude v. The important conclusion from relations 3.

We also will want to measure matrices in the context of their roles as operators. We mentioned earlier that we will need to measure matrices as well as vectors. Operator norms must depend on the particular vector norms we use to measure v and Av. Norm 3. This will relate to their function in the local models on which the iterations of our nonlinear methods will be based.

A natural definition of the norm of A induced by a given vector norm is —that is. An easier exercise is the particular set of relations from which 3.

The relation I v can be proven easily. It is easily shown to obey the three defining properties of a matrix norm. It is usual to call these "matrix norms." It is not necessary at all to use the same norm on v and Av. The Frobenius norm of A satisfies. Although it appears that definition 3. Although the Frobenius and the lp-induced norms obey the consistency condition the weighted Frobenius norm does not.
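As a small illustration of these definitions (the vectors and matrices are made up), the following sketch evaluates several vector norms, the Frobenius norm, and the induced l2 norm, and checks the induced-norm definition by sampling unit vectors.

```python
import numpy as np

v = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(v, 1), np.linalg.norm(v, 2), np.linalg.norm(v, np.inf))

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
fro = np.linalg.norm(A, 'fro')       # Frobenius norm: square root of the sum of squared entries
two = np.linalg.norm(A, 2)           # induced l2 norm: maximum of ||A v|| over unit vectors v

# Approximate the induced-norm definition directly by sampling unit vectors in R^2.
angles = np.linspace(0.0, 2.0 * np.pi, 2000)
unit_vectors = np.stack([np.cos(angles), np.sin(angles)])
sampled = np.max(np.linalg.norm(A @ unit_vectors, axis=0))

print(fro, two, sampled)             # sampled is close to two, and two <= fro
```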

Proofs of some are given as exercises with appropriate hints as necessary. Sometimes a linear transformation of the problem will cause us to use a weighted Frobenius norm: Several properties of matrix norms and matrix-vector products are of particular importance in analyzing our algorithms.

For M e Rnxn. There exist positive constants a. The proof of 3. We will give the idea rather than the detail. Furthermore it gives a relation between the norms of the inverses of two nearby matrices that will be useful later in analyzing algorithms.

Inequality 3. Proofs are left as exercises.. Equation 3. Some simple properties of orthogonal matrices in the 12 norm indicate that they will be useful computationally. We have been denoting the inner product of two vectors v. The following theorem contains some properties of orthogonal matrices that make them important in practice. In the following two definitions. Let v. Vectors v Because of the availability of these routines and the careful effort and testing invested in them.

Once we have A factored. It is common practice to write the form A. If A is nearly singu- lar. These libraries include algorithms to solve 3. The trouble is that a reader unfamiliar with numerical computation might assume that we actually compute A-1 and take its product with the vector b. It is also important to have a basic understanding of the structure of these routines in order to interface with them easily and to be aware of techniques useful in other parts of nonlinear algorithms.

These are two reasons why orthogonal matrices are useful in matrix factorizations. For us. D is stored as a vector.. To verify this. Al is a nonsingular block diagonal matrix DB. AI is a nonsingular diagonal matrix D.

AI is a permutation matrix P. A permutation matrix has the same rows and columns as the identity matrix. AI is an orthogonal matrix denoted by Q or U. PM is then just the same permutation of the rows of M. For any matrix M. More will be said about this in the next section.

AI is a nonsingular upper triangular matrix denoted by U or R. R upper triangular. The decomposition is found by Gaussian elimination with partial pivoting. Table 3. Each Q. All are small in comparison to the cost of matrix factorizations. The arithmetic costs of solving linear systems with the above types of matrices are given in Table 3. Q orthogonal. L is called unit lower triangular.. For a general square matrix A. Qi is called a. Q 1 A below the main diagonal. When A is symmetric and positive definite.

A QR decomposition algorithm is given in Algorithm A3. The algorithm is very stable numerically. L lower triangular. The permutation matrix. In our secant algorithms. This makes it very stable numerically and is one reason Householder and other orthogonal transformations are important to numerical linear algebra.

The advantage of the QR decomposition is that. For a detailed discussion of the PLU. In solving them. If A is symmetric but indefinite, i.e. A Cholesky decomposition algorithm is a special case of Algorithm A5. Aasen gives a version in which DB is replaced by T. See Golub and Van Loan for more details. The arithmetic costs of all the factorizations are small multiples of n^3 and are given in Table 3. This form of the decomposition does not require any square roots.
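As a rough illustration of how such factorizations are used in practice (NumPy and SciPy routines here stand in for whichever linear-algebra library is available, and the matrices are made up), one factors the matrix once and then solves by cheap triangular solves.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

A_spd = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])

chol, lower = cho_factor(A_spd)       # Cholesky factorization A = L L^T
x = cho_solve((chol, lower), b)       # two triangular solves reuse the stored factor

A_gen = np.array([[ 2.0,  1.0, 1.0],
                  [ 4.0, -6.0, 0.0],
                  [-2.0,  7.0, 2.0]])
lu, piv = lu_factor(A_gen)            # P A = L U by Gaussian elimination with partial pivoting
y = lu_solve((lu, piv), b)

print(np.allclose(A_spd @ x, b), np.allclose(A_gen @ y, b))
```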

These topics are discussed in this section. Stewart or Dongarra et al P a permutation matrix. For further information on this factorization. D B a 1 x 1 and 2x2 block diagonal matrix. If we change b by Ab.

See Figure 3. Graphing its two equations shows that they represent two nearly parallel lines. It is easily shown that ill-conditioning can be detected in terms of the matrix in the system. Linear systems whose solutions are very sensitive to changes in their data are called ill-conditioned.

If we change b1. It can be shown that some of the methods discussed in Section 3. When a linear system is solved on a computer. This method of analysis is often called the Wilkinson backward error analysis. Since the minimum stretch induced by a singular matrix is zero. In our examples. This term is known as the condition number of A and is denoted by Kp(A). In any induced matrix norm. It has the nice property of being a scale-free measure with respect to scalar multiples.
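A two-dimensional example of this sensitivity (the numbers are illustrative) is sketched below: the two equations represent nearly parallel lines, and a tiny change in the right-hand side moves the solution a long way; the condition number reported at the end quantifies the effect.

```python
import numpy as np

A = np.array([[1.000, 1.000],
              [1.000, 1.001]])        # two nearly parallel lines
b = np.array([2.000, 2.001])

x = np.linalg.solve(A, b)             # solution is (1, 1)
b_perturbed = b + np.array([0.0, 0.001])
x_perturbed = np.linalg.solve(A, b_perturbed)

print(x, x_perturbed)                 # the solution jumps from about (1, 1) to about (0, 2)
print(np.linalg.cond(A))              # K_2(A) = ||A|| * ||A^-1|| is roughly 4000
```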

A problem is that calculating K A involves finding A-l. From the analysis of Wilkinson Our algorithms will check for such a condition and then perturb poorly conditioned models into ones that are better behaved. This may indicate that the underlying problem is not well posed.

In our application. If ill-conditioning occurs far from the solu- tion to the nonlinear problem. If A is almost singular in this sense. If ill-conditioned systems occur near the solu- tion. The idea of the algorithm is not hard at all.

The reader who wants to establish more intuition about this is urged to work through the example given in Exercise The algorithm we use is an instance of a class of condition number estimates given by Cline. The technique then used to estimate R' l 1 is based on the inequality for any nonzero z. Another is that it turns out to have a knack for extracting any large elements of R. The algorithm first computes R. One explanation is that the process for obtaining y is related to the inverse power method for finding the largest eigenvalue of R T R.

It is an easy exercise to show that this is the case. A Jacobi rotation is a matrix J(i, j, a) that agrees with the identity except in rows and columns i and j.

It was necessary to run different instances of the problem in order to see how the optimal characteristics of the reactor were affected by changes in these factors. Often in practical applications one wants to solve many related instances of a particular problem; this makes the efficiency of the algorithm more important. It also makes one willing to experiment with various algorithms initially, to evaluate them on the particular class of problems.

Finally, equation 1. In the next portion of the study, the function giving the cost per unit of energy was not an analytic formula like 1. There were also five more parameters see Figure 1. The minimization of this sort of function is very common in nonlinear optimization, and it has some important influences on our algorithm development.

First, a function like this is probably accurate to only a few places, so it wouldn't make sense to ask for many places of accuracy in the solution.

This is one reason why derivative approximation becomes so important. The problems above give some indication of typical characteristics of nonlinear problems. The first is their size.

While certainly there are larger problems, the state of the art is such that we hope to be able to solve most of the small problems, say those with from 2 to 15 variables, but even 2-variable problems can be difficult. (Figure 1: function evaluation in the refined model of the nuclear reactor.)

Intermediate problems in this field are those with from 15 to 50 variables; current algorithms will solve many of these. Problems with 50 or more variables are large problems in this field; unless they are only mildly nonlinear, or there is a good starting guess, we don't have a good chance of solving them economically.

These size estimates are very volatile and depend less on the algorithms than on the availability of fast storage and other aspects of the computing environment. A second issue is the availability of derivatives. Frequently we deal with problems where the nonlinear function is itself the result of a computer simulation, or is given by a long and messy algebraic formula, and so it is often the case that analytic derivatives are not readily available although the function is several times continuously differentiable.

Therefore it is important to have algorithms that work effectively in the absence of analytic derivatives. In fact, if a computer-subroutine library includes the option of approximating derivatives, users rarely will provide them analytically—who can blame them?

Third, as indicated above, many nonlinear problems are quite expensive to solve, either because an expensive nonlinear function is evaluated repeatedly or because the task is to solve many related problems. We have heard of a variable problem in petroleum engineering where each function evaluation costs hours of IBM time. Efficiency, in terms of algorithm running time and function and derivative evaluations, is an important concern in developing nonlinear algorithms.

Fourth, in many applications the user expects only a few digits of accuracy in the answer. This is primarily due to the approximate nature of the other parts of the problem. On the other hand, users often ask for more digits than they need.

Although it is reasonable to want extra accuracy, just to be reasonably sure that convergence has been attained, the point is that the accuracy required is rarely near the computer's precision. A fifth point, not illustrated above, is that many real problems are poorly scaled, meaning that the sizes of the variables differ greatly.

For example, one variable may always be in the range to and another in the range 1 to In our experience, this happens surprisingly often. However, most work in this field has not paid attention to the problem of scaling. In this book we try to point out where ignoring the effects of scaling can degrade the performance of nonlinear algorithms, and we attempt to rectify these deficiencies in our algorithms. Finally, in this book we discuss only those nonlinear problems where the unknowns can have any real value, as opposed to those where some variables must be integers.

All our examples had this form, but the reader may wonder if this is restrictive. The answer is that there certainly are nonlinear problems where some variables must be integers because they represent things like people, trucks, or large widgits.

However, this restriction makes the problems so much more difficult to solve—because all continuity is lost—that often we can best solve them by regarding the discrete variables as continuous and then rounding the solution values to integers as necessary. The theory does not guarantee this approach to solve the corresponding integer problem, but in practice it often produces reasonable answers.

Exceptions are problems where some discrete variables are constrained to take only a few values such as 0, 1, or 2. In this case, discrete methods must be used. On occasion, arithmetical coding also is influenced by an understanding of computer arithmetic.

Therefore, we need to describe briefly finite-precision arithmetic, which is the computer version of real arithmetic. For more information, see Wilkinson In scientific notation, the number Second, depending on the computer and the compiler, the result of each intermediate arithmetic operation is either truncated or rounded to the accuracy of the machine. Thus the inaccuracy due to finite precision may accumulate and further diminish the accuracy of the results.

Such errors are called round-off errors. Although the effects of round-off can be rather subtle, there are really just three fundamental situations in which it can unduly harm computational accuracy. The first is the addition of a sequence of numbers, especially if the numbers are decreasing in absolute value; the right-hand parts of the smaller numbers are lost, owing to the finite representation of intermediate results. For an example, see Exercise 4.

The second is the taking of the difference of two almost identical numbers; much precision is lost because the leading left-hand digits of the difference are zero. For an example, see Exercise 5. The third is the solution of nearly singular systems of linear equations, which is discussed in Chapter 3.

This situation is actually a consequence of the first two, but it is so basic and important that we prefer to think of it as a third fundamental problem. If one is alert to these three situations in writing and using computer programs, one can understand and avoid many of the problems associated with the use of finite-precision arithmetic.
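The three situations can be seen in a few lines of code; single precision (float32) is used here only so the effects are easy to observe, and the particular numbers are made up.

```python
import numpy as np

# 1. Adding a sequence of decreasing numbers: the accumulation order matters,
#    because small terms added to a large partial sum lose their trailing digits.
terms = (1.0 / np.arange(1, 10001)).astype(np.float32)
s_forward = np.float32(0.0)
for t in terms:                       # largest terms first
    s_forward += t
s_backward = np.float32(0.0)
for t in terms[::-1]:                 # smallest terms first retains more accuracy
    s_backward += t
print(s_forward, s_backward, sum(1.0 / k for k in range(1, 10001)))

# 2. Subtracting two almost identical numbers: the leading digits cancel.
a, b = np.float32(1.0002), np.float32(1.0001)
print(a - b)                          # only a few significant digits of 1.0e-4 survive

# 3. A nearly singular linear system (Chapter 3) amplifies both effects.
A = np.array([[1.0, 1.0], [1.0, 1.0001]], dtype=np.float32)
rhs = np.array([2.0, 2.0001], dtype=np.float32)
print(np.linalg.solve(A, rhs))        # can be noticeably far from the true solution (1, 1)
```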

A consequence of the use of finite-precision arithmetic, and even more, of the iterative nature of our algorithms, is that we do not get exact answers to most nonlinear problems. Therefore we often need to measure how close a number x is to another number y. A common notation in the measurement of error and discussion of algorithms will be useful to us. For further information see Aho, Hopcroft, and Ullman []. Another effect of finite-precision arithmetic is that certain aspects of our algorithms, such as stopping criteria, will depend on the machine precision.

It is important, therefore, to characterize machine precision in a way our algorithms can use. The quantity, macheps, is quite useful when we discuss computer numbers. Similarly, two numbers x and y agree in the leftmost half of their digits approximately when |x - y| / |x| is no larger than the square root of macheps. This test is quite common in our algorithms. Another way to view macheps is as a key to the difficult task of deciding when a finite-precision number could just as well be zero in a certain context. This means that y is zero in the context, and sometimes, as in numerical linear algebra algorithms, it is useful to monitor the computation and actually set y to zero.

Finally, any computer user should be aware of overflow and underflow, the conditions that occur when a computation generates a nonzero number whose exponent is respectively larger than, or smaller than, the extremes allowed on the machine. For example, we encounter an underflow condition when we reciprocate on a CDC machine, and we encounter an overflow condition when we reciprocate 10^-77 on an IBM machine. In the case of an overflow, almost any machine will terminate the run with an error message.

In the case of an underflow, there is often either a compiler option to terminate, or one to substitute zero for the offending expression. The latter choice is reasonable sometimes, but not always see Exercise 8.

Fortunately, when one is using well-written linear algebra routines, the algorithms discussed in this book are not usually prone to overflow or underflow.

One routine, discussed in Section 3. Rephrase as a simultaneous nonlinear equation problem in standard form: It is known that f(t) is a sine wave, but its amplitude, frequency, and displacement in both the f and t directions are unknown. What numerical problem would you set up to determine these characteristics from your experimental data?

The task is to determine what combination of these three factors will lead to the lowest inflation rate. You are to set up and solve this problem numerically. How would you handle the variable "number of housing starts"? Pretend you have a computer with base 10 and precision 4 that truncates after each arithmetic operation; for example, the sum of What are the results when How do these compare with the correct "infinite-precision" result?

What does this show you about adding sequences of numbers on the computer? Assume you have the same computer as in Exercise 4, and you perform the computation 3- — 0. How many correct digits of the real answer do you get? What does this show you about subtracting almost identical numbers on the computer? What are the relative and absolute errors of the answer obtained by the computer in Exercise 5 as compared to the correct answer?

What if the problem is changed to What does this show about the usefulness of relative versus absolute error? Write a program to calculate machine epsilon on your computer or hand calculator. Print out this value and the decimal value of macheps.
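A minimal sketch of one common solution to this exercise is given below; it assumes IEEE double precision and, as the exercise notes, the computed value is a power of 2 and would differ by a factor of 2 under truncating rather than rounding arithmetic.

```python
import numpy as np

# Halve a candidate until adding it to 1 no longer changes 1 in floating point.
eps = 1.0
while 1.0 + eps / 2.0 != 1.0:
    eps /= 2.0

print(eps)                    # about 2.22e-16 in IEEE double precision
print(np.finfo(float).eps)    # library value, for comparison
```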

The value of macheps will vary by a factor of 2 depending on whether rounding or truncating arithmetic is used. For further information on the computer evaluation of machine environment parameters, see Ford In each of the following calculations an underflow will occur on an IBM machine. In which cases is it reasonable to substitute zero for the quantity that underflows? The reason for studying one-variable problems separately is that they allow us to see those principles for constructing good local, global, and derivative-approximating algorithms that will also be the basis of our algorithms for multivariable problems, without requiring knowledge of linear algebra or multivariable calculus.

The algorithms for multivariable problems will be more complex than those in this chapter, but an understanding of the basic approach here should help in the multivariable case. Some references that consider the problems of this chapter in detail are Avriel , Brent , Conte and de Boor , and Dahlquist, Bjorck, and Anderson It would be wonderful if we had a general-purpose computer routine that would tell us: In general, the questions of existence and uniqueness—does a given problem have a solution, and is it unique?

In fact, we must readily admit that for any computer algorithm there exist nonlinear functions (infinitely continuously differentiable, if you wish) perverse enough to defeat the algorithm. Therefore, all a user can be guaranteed from any algorithm applied to a nonlinear problem is the answer, "An approximate solution to the problem is ," or, "No approximate solution to the problem was found in the allotted time."

Thus the inability to determine the existence or uniqueness of solutions is usually not the primary concern in practice. It is also apparent that one will be able to find only approximate solutions to most nonlinear problems.

Therefore, we will develop methods that try to find one approximate solution of a nonlinear problem. Suppose we wish to calculate the square root of 3 to a reasonable number of places. Using 2. The method we have just developed is called the Newton-Raphson method or Newton's method. It is important to our understanding to take a more abstract view of what we have done. In Figure 2. Again the root is given by 2.
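A minimal sketch of the iteration just described, applied to f(x) = x^2 - 3 so that the root is the square root of 3 (the starting guess and tolerance are arbitrary choices):

```python
def newton_sqrt3(x0, tol=1e-12, max_iterations=20):
    x = x0
    for k in range(max_iterations):
        fx = x * x - 3.0                  # f(x) = x^2 - 3
        if abs(fx) <= tol:
            break
        x = x - fx / (2.0 * x)            # Newton step x_+ = x_c - f(x_c) / f'(x_c), with f'(x) = 2x
        print(k, x)
    return x

newton_sqrt3(2.0)   # iterates 1.75, 1.73214..., 1.7320508...; correct digits roughly double each step
```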

There are several reasons why we prefer a different approach. It is unappealing and unnecessary to make assumptions about derivatives of any higher order than those actually used in the iteration. Furthermore, when we consider multivariable problems, higher-order derivatives become so complicated that they are harder to understand than any of the algorithms we will derive.

Instead, Newton's method comes simply and naturally from Newton's theorem, It seems reasonable to approximate the indefinite integral by and once more obtain the affine approximation to f x given by 2. This type of derivation will be helpful to us in multivariable problems, where geometrical derivations become less manageable.

Newton's method is typical of methods for solving nonlinear problems; it is an iterative process that generates a sequence of points that we hope come increasingly close to a solution. The obvious question is, "Will it work?" The reason is that an affine model corresponds to an affine subspace through (x, f(x)), a line that does not necessarily pass through the origin, whereas a linear subspace must pass through the origin.

Since one has the error relation shown below (or, using relative error, its second form), as long as the initial error is small enough the new error will be smaller than the old error, and eventually each new error will be much smaller than the previous error.
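The two relations referred to above were lost in transcription; one consistent reconstruction for the square-root example, assuming the update x_+ = (x_c + a/x_c)/2, is

```latex
x_{+} - \sqrt{a} \;=\; \frac{\left(x_c - \sqrt{a}\right)^{2}}{2\,x_c},
\qquad
\frac{x_{+} - \sqrt{a}}{\sqrt{a}} \;=\; \frac{\sqrt{a}}{2\,x_c}\left(\frac{x_c - \sqrt{a}}{\sqrt{a}}\right)^{2},
```

so the new error is smaller than the old one whenever |x_c - sqrt(a)| < 2 x_c.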

This agrees with our experience for finding the square root of 3 in the example that began this section. The pattern of decrease in error given by 2. The error at each iteration will be approximately the square of the previous error, so that, if the initial guess is good enough, the error will decrease and eventually decrease rapidly.

This pattern is known as local q-quadratic convergence. Before deriving the general convergence theorem for Newton's method, we need to discuss rates of convergence. Definition 2. Then the sequence.

If the order is 2 or 3, the convergence is said to be q-quadratic or q-cubic, respectively. In practice, q-linear convergence can be fairly slow, whereas q-quadratic or q-superlinear convergence is eventually quite fast. However, actual behavior also depends upon the constants c in 2. For further examples see Exercises 2 and 3. It is worth emphasizing that the utility of q-superlinear convergence is directly related to how many iterations are needed for ck to become small.
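The difference between these rates is easy to see numerically; the error sequences below use made-up constants purely for illustration.

```python
# q-linear: e_{k+1} = 0.5 * e_k; q-superlinear: e_{k+1} = c_k * e_k with c_k -> 0;
# q-quadratic: e_{k+1} = e_k ** 2 (constant c = 1).
e_lin = e_sup = e_quad = 1e-1
for k in range(1, 6):
    e_lin = 0.5 * e_lin
    e_sup = (0.5 ** k) * e_sup
    e_quad = e_quad ** 2
    print(k, e_lin, e_sup, e_quad)
# After five steps the errors are roughly 3e-3, 3e-6, and 1e-32:
# the q-quadratic sequence doubles its number of correct digits at every step.
```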

A definitive reference is Ortega and Rheinboldt []. An iterative method that will converge to the correct answer at a certain rate, provided it is started close enough to the correct answer, is said to be locally convergent at that rate. In this book we will be interested mainly in methods that are locally q-superlinearly or q-quadratically convergent and for which this behavior is apparent in practice. However, it may not converge at all from a poor start, so that we need to incorporate the global methods of Section 2.

The local convergence proof for Newton's method hinges on an estimate of the errors in the sequence of affine models Mc x as approximations to f x. Then for any x, v e D Proof. The main advantage of using Lipschitz continuity is that we do not need to discuss this next higher derivative. This is especially convenient in multiple dimensions. We are now ready to state and prove a fundamental theorem of numerical mathematics. We will prove the most useful form of the result and leave the more general ones as exercises see Exercises Assume that for some for every x e D.

Thus from Lemma 2. The condition in Theorem 2. A partially scale-free measure of nonlinearity is the relative rate of change in f'(x), which is obtained by dividing the Lipschitz constant y by f'(x). Theorem 2.

If x0 xc, Newton's method will diverge; i. Thus Newton's method is useful to us for its fast local convergence, but we need to incorporate it into a more robust method that will be successful from farther starting points. In this section we discuss two global methods and then show how to combine a global method with Newton's method into a hybrid algorithm.

We also discuss the stopping tests and other computer-dependent criteria necessary to successful computational algorithms. The simplest global method is the method of bisection. It makes the somewhat reasonable assumption that one starts with an interval [x0, z0] that contains a root.

It sets x1 to the midpoint of this interval, chooses the new interval to be the one of [x0, x1] or [x1, z0] that contains a root, and continues to halve the interval until a root is found see Figure 2. This is expressed algebraically as: This makes the method very marginal for practical use.
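A minimal bisection sketch (the test function and tolerance are arbitrary); it assumes f changes sign on the initial interval, and since each step halves the interval, the error decreases q-linearly with constant 1/2.

```python
def bisect(f, lo, hi, tol=1e-10):
    flo = f(lo)                          # assumes f(lo) and f(hi) have opposite signs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if flo * fmid <= 0.0:            # a root lies in [lo, mid]
            hi = mid
        else:                            # a root lies in [mid, hi]
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)

print(bisect(lambda x: x * x - 3.0, 0.0, 3.0))   # about 1.7320508
```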

Programs that use bisection generally do so only until an xk is obtained from which some variant of Newton's method will converge. The method of bisection also does not extend naturally to multiple dimensions. A method more indicative of how we will proceed in n-space is the following. This should be obvious geometrically; for the simple proof, see Exercise Iteration 2.

Constructing such hybrid algorithms is the key to practical success in solving multivariable nonlinear problems. Below is the general form of a class of hybrid algorithms for finding a root of one nonlinear equation; it is meant to introduce and emphasize those basic techniques for constructing globally and fast locally convergent algorithms that will be the foundation of all the algorithms in this book. Step 2 usually involves calculating the Newton step, or a variant without derivatives see Section 2.
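The sketch below is one rough rendering of such a hybrid strategy, not the book's algorithm: it keeps a bracketing interval, tries the Newton step first, and falls back to a bisection step when the Newton step leaves the interval or fails to reduce |f|. The acceptance test and stopping rule are deliberately simplified.

```python
def hybrid_root(f, fprime, lo, hi, tol=1e-12, max_iterations=100):
    x = 0.5 * (lo + hi)                             # assumes f(lo) and f(hi) have opposite signs
    for _ in range(max_iterations):
        fx = f(x)
        if abs(fx) <= tol:                          # simplified stopping test on |f(x)|
            return x
        if f(lo) * fx <= 0.0:                       # keep the bracket [lo, hi] around a root
            hi = x
        else:
            lo = x
        x_newton = x - fx / fprime(x)               # local step (Newton)
        if lo < x_newton < hi and abs(f(x_newton)) < abs(fx):
            x = x_newton                            # fast local step accepted
        else:
            x = 0.5 * (lo + hi)                     # global fallback: bisection step
    return x

print(hybrid_root(lambda x: x**3 - 2*x - 5, lambda x: 3*x**2 - 2, 1.0, 3.0))  # about 2.0945515
```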

Equation 2. We will see in Chapter 6 that the criterion in Step 3 a has to be chosen with only a little bit of care to assure the global convergence in most cases of the hybrid algorithm to a solution. Deciding when to stop is a somewhat ad hoc process that can't be perfect for every problem, yet it calls for considerable care. The reader can already see that the choice of stopping rules is quite a can of worms, especially for poorly scaled problems.

We will treat it more completely in Chapter 7. The quasi-Newton step to the zero of Mc x then becomes Two questions immediately arise: If nc is chosen to be a small number, ac is called a finite-difference approximation to f' xc. It seems reasonable that the finite-difference Newton method, the quasi-Newton method based on this approximation, should work almost as well as Newton's method, and we will see later that it does.

However, this technique requires an additional evaluation of f at each iteration, and if f(x) is expensive, the extra function evaluation is undesirable. In this case, hc is set to the difference between the previous iterate and xc. The resultant quasi-Newton algorithm is called a secant method. While it may seem totally ad hoc, it also turns out to work well; the convergence of the secant method is slightly slower than a properly chosen finite-difference method, but it is usually more efficient in terms of total function evaluations required to achieve a specified accuracy.
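A minimal secant-method sketch (starting points and tolerance are arbitrary): the derivative in the Newton step is replaced by the slope through the two most recent iterates, so only one new function evaluation is needed per iteration.

```python
def secant(f, x_prev, x_c, tol=1e-12, max_iterations=50):
    f_prev, f_c = f(x_prev), f(x_c)
    for _ in range(max_iterations):
        if abs(f_c) <= tol:
            break
        a_c = (f_c - f_prev) / (x_c - x_prev)   # secant slope; here h_c = x_prev - x_c
        x_prev, f_prev = x_c, f_c
        x_c = x_c - f_c / a_c                   # quasi-Newton step using a_c in place of f'(x_c)
        f_c = f(x_c)
    return x_c

print(secant(lambda x: x * x - 3.0, 1.0, 2.0))  # about 1.7320508
```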

Figure 2. Notice how similar Newton's method and the finite-difference Newton's method are; the secant method is a bit slower. Finite-Difference N. Let us take Then If we define and then we have under the same Lipschitz continuity assumption f' e Lipy D as in Lemma 2. Notice also that the above analysis so far is independent of the value of Now let us define ac by 2.

Then we have an easy corollary which tells us how close, as a function of hc, the finite-difference approximation 2.

Then Proof. From Lemma 2. Substituting 2. Assume that x E D. If there exists some constant c1 such that or equivalently, a constant c2 such that then the convergence is q-quadratic. We began this discussion with a claim that insight is contained in the existence and simplicity of the analysis.

In particular, the finite-difference idea seemed somewhat ad hoc when we introduced it. This is just one of many instances in numerical analysis where theoretical analysis is an ally to algorithmic development. This is due not only to finite-precision arithmetic, but also to the fact that the function values are sometimes themselves just the approximate results returned by a numerical routine. The result is that the slope of the model of f at xc may not even have the same sign as f'(xc).

There is a limit to how large hc can be, since our whole object in computing ac is to use it as an approximation to f'(xc), and 2. A good compromise is to try to balance the nonlinearity error caused by taking hc too large with the finite-precision and function evaluation errors from allowing hc too small. We develop this as an exercise (Exercise 19) and discuss it further in Section 5. This often-used rule is generally satisfactory in practice. If this hc is large enough to cast doubt on the utility of 2.
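The following sketch contrasts the forward- and central-difference approximations using the usual rules of thumb for the step size, hc on the order of sqrt(macheps) times |xc| for the forward difference and macheps**(1/3) times |xc| for the central difference; the constants and test function are illustrative, not values prescribed by the text.

```python
import numpy as np

macheps = np.finfo(float).eps

def forward_difference(f, x):
    h = np.sqrt(macheps) * max(abs(x), 1.0)      # balances nonlinearity error against round-off
    return (f(x + h) - f(x)) / h

def central_difference(f, x):
    h = macheps ** (1.0 / 3.0) * max(abs(x), 1.0)
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
print(forward_difference(np.sin, x) - np.cos(x))   # error around 1e-8: about half the digits
print(central_difference(np.sin, x) - np.cos(x))   # error around 1e-11: noticeably more accurate
```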

It turns out that this problem is so closely related to solving one nonlinear equation in one unknown that we virtually know already how to compute solutions.

First of all, one must again admit that one cannot practically answer questions of existence and uniqueness. Its global minimizer, where the function takes its absolute lowest value, is at but it also has a local minimizer, a minimizing point in an open region, at If we divide the function by x, it becomes a cubic with a local minimum at but no finite global minimizer since In general, it is practically impossible to know if you are at the global minimum of a function. So, just as our algorithms for nonlinear equations can find only one root, our optimization algorithms at best can locate one local minimum; usually this is good enough in practice.

Once again, a closed-form solution is out of the question: Graphically, this just says that the function can't initially decrease in either direction from such a point. A proof of this fact suggests an algorithm, so we give it below. It will be useful to denote by C1(D) and C2(D), respectively, the sets of once and twice continuously differentiable functions from D into R.

An iteration of the hybrid method starts by applying Newton's method, or a modification discussed in Section 2. The Newton step is xN = xc - f'(xc)/f''(xc). It is important to note the meaning of this step in terms of model problems. Since 2. A quadratic model is more appropriate than an affine model of f(x) for either maximization or minimization because it has at most one extreme point. Thus the step 2.
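A minimal sketch of this local step for one-dimensional minimization (the test function and starting point are made up): each iteration minimizes the quadratic model, which requires f''(xc) > 0, and stops when f'(x) is essentially zero.

```python
def newton_minimize(fprime, fsecond, x0, tol=1e-10, max_iterations=50):
    x = x0
    for _ in range(max_iterations):
        g, h = fprime(x), fsecond(x)
        if abs(g) <= tol:                 # first-order condition f'(x) approximately zero
            break
        if h <= 0.0:                      # quadratic model has no minimizer; a global strategy is needed
            raise ValueError("f''(x) <= 0 at the current iterate")
        x = x - g / h                     # Newton step xN = xc - f'(xc) / f''(xc)
    return x

# f(x) = x**4 - 3*x**2 + x has a local minimizer near x = 1.13; start reasonably close to it.
print(newton_minimize(lambda x: 4*x**3 - 6*x + 1, lambda x: 12*x**2 - 6, 1.5))
```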

Our global strategy for minimization will differ from that in Section 2. On the other hand, if f''(xc) < 0, then f(x) initially increases going from xc toward xN, and we should take a step in the opposite direction. Finally, there is the question of what to do if the derivatives f'(x) are not available. The solution is so similar to the discussion in Section 2.

Which root are you converging to? Do this on paper, not on the computer! Is q-linear convergence with constant 0. What type of convergence is this? Prove that the r-order of a sequence is always at least as large as its q-order.

Give a counterexample to the converse. (Harder) There is an interesting relationship between 1-step q-order and "1-step" r-order. See if you can discover and prove this result. To find the result and an application see Gay. On the IBM series, extended-precision multiplication is hardwired, but extended-precision division is done by doing double-precision division, and then using a Newton's method iteration that requires only multiplication and addition.

What sort of convergence are you getting? Analyze the convergence rate of the bisection method. Write, debug, and test a program for solving one nonlinear equation in one unknown. It should follow the hybrid strategy explained in Section 2. For the local step, use either Newton's method, finite-difference Newton's method, or the secant method.

For the global strategy, use bisection, backtracking, a strategy that interpolates between xn and xk, or another strategy of your own choosing. Run your program on: If you use bisection, pick an appropriate initial interval for each problem, and constrain all your iterates to remain in that interval. What rate of convergence do you observe in each case? It won't always be quadratic. Modify your program from Exercise 9 to find the minimum of a function of one variable.

The modifications are explained in Section 2. Which assumptions of Theorem 2. What are the major problems in using this method?
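For the root-finding exercises above, a minimal sketch of the hybrid strategy is given below. It is our own illustration, not the book's Algorithm A: Newton for the local step, bisection on a user-supplied bracket as the global fallback.

```python
def hybrid_solve(f, fprime, a, b, tol=1e-10, max_iter=100):
    """Hybrid Newton/bisection for f(x) = 0 on a bracket [a, b] with
    f(a) * f(b) <= 0.  Each iteration tries a Newton step from the current
    iterate; if the step leaves the bracket (or f' vanishes), it falls back
    to bisection, so the bracket always shrinks around a root.
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0.0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            return x
        if fa * fx <= 0.0:          # keep the half-bracket containing a sign change
            b, fb = x, fx
        else:
            a, fa = x, fx
        g = fprime(x)
        x_new = x - fx / g if g != 0.0 else 0.5 * (a + b)
        if not (min(a, b) < x_new < max(a, b)):
            x_new = 0.5 * (a + b)   # global fallback: bisection
        if abs(x_new - x) <= tol * max(abs(x_new), 1.0):
            return x_new
        x = x_new
    return x

# Example: the root of x**3 - 2 on [1, 2] is 2**(1/3), about 1.2599.
print(hybrid_solve(lambda x: x**3 - 2, lambda x: 3 * x**2, 1.0, 2.0))
```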

Do you think a quadratic model is appropriate for this problem? See also "Muller's method" in most introductory numerical analysis texts. Find the value of hc that minimizes this bound for fixed γ, η, and f(xc).

Prove that the error in the central difference formula 2. Expand the techniques of Lemma 2. Suggest a local method for solving this problem.

A second program note

It was evident in Chapter 2 that we like to derive iterative algorithms for nonlinear problems from considering the solutions to properly chosen models of the problems. The model used must be from a class for which the solution of the model problem is possible by an effective numerical procedure, but it must also be chosen to model the nonlinear problems adequately, so that the iteration sequence will converge to a solution.

In Chapter 2, elementary calculus gave all the necessary techniques for building the relevant linear and quadratic models and analyzing their approximation errors with respect to a realistic class of nonlinear problems.

The solution of the model problem required only a couple of arithmetic operations. This simplicity was our reason for writing Chapter 2.

There seems no pressing need for another exposition of scalar iterations, but we were able to present, without the complications of multivariable calculus and linear algebra, ideas that will extend to the multivariable case. In multiple dimensions, our models will involve derivatives of multivariable functions, and the solution of a system of linear equations will be needed to solve the model problem.

Chapter 3 presents material from computational linear algebra relevant to the solution of the model problems or the extraction of useful information from them. Chapter 4 is a review of some multivariable calculus theorems that we will find useful in setting up multivariable models and analyzing their approximation properties.

There is a subtle and interesting relationship between the models we build and the means we have to extract information from them.

If it is cheap to form and to solve the model problem, the model need not be as good as if the model solution were very expensive. This is just a way of stating the obvious fact that we can afford to do more iterations of a cheap method than of an expensive one. Another interest is in how much problem structure we can safely ignore in the model, in order to make its solution easier, while still expecting meaningful convergence of the iteration.

We will give examples of both approaches at appropriate times. In Section 3. We then describe in Section 3. Our point of view is that the user may obtain the appropriate algorithms from some subroutine library, but the serious user needs to understand the principles behind them well enough to know the tools of numerical linear algebra that can help, which routines to use for specific problems, and the costs and errors involved.

For reference, Section 3. In Sections 3. Since these topics will not be needed until Chapters 8 and 11, respectively, their consideration can be postponed until then. Excellent references for this material are the books by Stewart, Strang, and Golub and Van Loan. In the sequel, we will need similar conditions, but in a context where the absolute value no longer can measure magnitude because the arguments and derivatives are no longer scalar. Similarly, conditions like the stopping criterion on f(xk) require a substitute for the absolute value.

It would be unfortunate if it were necessary to specify the norm, as we would have to if it were possible to converge in one norm but not in another. It would mean that whether a particular algorithm converged in practice might depend on what norm its stopping criteria used. Fortunately in Rn we have no such problem, owing to the following result that is not true in infinite-dimensional space.

We should point out that q-linear convergence is a norm-dependent property. There exist positive constants α and β such that α||v|| <= |||v||| <= β||v|| for any v ∈ Rn, whatever pair of norms ||·|| and |||·||| is used. An easier exercise is the particular set of relations from which 3. The relation ||v||1 <= sqrt(n) ||v||2 can be proven easily, using the Cauchy-Schwarz inequality. The important conclusion from relations 3. is that these norms differ from one another only by constants depending on n. We mentioned earlier that we will need to measure matrices as well as vectors. In fact, the useful Frobenius norm is just the l2 norm of A written as a long vector: ||A||F = (sum of aij^2 over all i, j)^(1/2). It is usual to call these "matrix norms," and they are relevant when we view the matrix as a set of pigeonholes filled with quantitative information.
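As a small numerical illustration (ours, not the book's), the following NumPy snippet computes the common vector norms and the Frobenius norm and checks the equivalence relations just mentioned.

```python
import numpy as np

v = np.array([3.0, -4.0, 12.0])
l1 = np.linalg.norm(v, 1)
l2 = np.linalg.norm(v, 2)
linf = np.linalg.norm(v, np.inf)

# Standard equivalence relations on R^n (here n = 3):
n = v.size
assert linf <= l2 <= l1 <= n * linf
assert l1 <= np.sqrt(n) * l2          # follows from the Cauchy-Schwarz inequality

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# Frobenius norm: the l2 norm of A written out as one long vector.
assert np.isclose(np.linalg.norm(A, 'fro'), np.linalg.norm(A.reshape(-1), 2))
```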

This will relate to their function in the local models on which the iterations of our nonlinear methods will be based.

We also will want to measure matrices in the context of their roles as operators. This will be useful in analyzing convergence; the reader can look back to Section 2. Operator norms must depend on the particular vector norms we use to measure v and Av.

It is not necessary at all to use the same norm on v and Av, but we have no need of such generality. Norm 3. , the induced operator norm ||A|| = max over v ≠ 0 of ||Av|| / ||v||, is easily shown to obey the three defining properties of a matrix norm, which are simply 3. Although it appears that definition 3. Sometimes a linear transformation of the problem will cause us to use a weighted Frobenius norm. Although the Frobenius and the lp induced norms obey the consistency condition ||AB|| <= ||A|| ||B||, the weighted Frobenius norm does not, except in special cases.
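A brief check of these operator-norm facts with NumPy (our illustration; the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

# Induced (operator) norms obey the consistency condition ||AB|| <= ||A|| ||B||.
for p in (1, 2, np.inf):
    assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12

# The l1 induced norm is the maximum absolute column sum.
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())
```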

Several properties of matrix norms and matrix-vector products are of particular importance in analyzing our algorithms, and these properties are listed below. Proofs of some are given as exercises with appropriate hints as necessary. There exist positive constants α and β such that α||A|| <= |||A||| <= β||A|| for every A ∈ Rn×n, whatever pair of matrix norms is used. Furthermore, the next result gives a relation between the norms of the inverses of two nearby matrices that will be useful later in analyzing algorithms.

We will give the idea rather than the detail. Inequality 3. The proof of 3. Some simple properties of orthogonal matrices in the l2 norm indicate that they will be useful computationally. We have been denoting the inner product of two vectors v, w ∈ Rn as vTw, but we will occasionally find some special notation useful.

In the following two definitions, we juxtapose two meanings for the same symbol to warn the reader again to interpret it in context. Definition 3. Let v, w ∈ Rn; the inner product of v and w is defined to be vTw = ||v||2 ||w||2 cos θ, where θ is the angle between v and w; if vTw = 0, v and w are said to be orthogonal or perpendicular.

If Q ∈ Rn×p, then Q is said to be an orthogonal matrix if QTQ = I. An equivalent definition to 3. The following theorem contains some properties of orthogonal matrices that make them important in practice. Proofs are left as exercises. Equation 3. These are two reasons why orthogonal matrices are useful in matrix factorizations. Luckily, excellent subroutine libraries for this problem are available: These libraries include algorithms to solve 3. Because of the availability of these routines and the careful effort and testing invested in them, our approach is that a person who solves nonlinear problems need not, and probably should not, write a personal linear equations solver; rather, that person should know which canned routine to use for the various types of problems 3.

It is also important to have a basic understanding of the structure of these routines in order to interface with them easily and to be aware of techniques useful in other parts of nonlinear algorithms. We will often write the solution of a linear system as x = A−1b. The trouble is that a reader unfamiliar with numerical computation might assume that we actually compute A−1 and take its product with the vector b.
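In a modern setting the point is easy to demonstrate. The sketch below (ours, using NumPy rather than the library routines the book refers to) solves Ax = b with a factorization-based solver rather than by forming the inverse explicitly.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Preferred: solve the system directly (LU with partial pivoting inside).
x = np.linalg.solve(A, b)

# Discouraged: forming A^-1 explicitly costs more and is usually less accurate.
x_bad = np.linalg.inv(A) @ b

assert np.allclose(A @ x, b)
```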

More will be said about this in the next section. To verify this, note what each bi equals. Below we list the six most important choices of Ai and briefly mention special features of solving 3.

Ai is a permutation matrix P. A permutation matrix has the same rows and columns as the identity matrix, although not in the same order. Ai is an orthogonal matrix, denoted by Q or U. Ai is a diagonal matrix D; D is stored as a vector. Ai is a nonsingular upper triangular matrix, denoted by U or R, or the transpose of a nonsingular lower triangular matrix.

The arithmetic costs of solving linear systems with the above types of matrices are given in Table 3. All are small in comparison to the cost of matrix factorizations. The PLU decomposition is found by Gaussian elimination with partial pivoting, or Doolittle reduction, using standard row operations equivalent to premultiplying A by L−1P−1 to transform A into U, which simultaneously yields the decomposition.
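A sketch of using a packaged PLU factorization followed by triangular solves (the routine names here are SciPy's, not the book's):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0])

# Factor once (Gaussian elimination with partial pivoting), ...
lu, piv = lu_factor(A)
# ... then solve cheaply for as many right-hand sides as needed.
x = lu_solve((lu, piv), b)
assert np.allclose(A @ x, b)
```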

Qi is called a Householder transformation. The permutation matrix, which is not always used, is formed from the column permutations necessary to move the column with the largest sum of squares below row i − 1 into column i at the ith iteration. Because orthogonal transformations preserve the l2 norm, they do not magnify errors; this makes the QR factorization very stable numerically and is one reason Householder and other orthogonal transformations are important to numerical linear algebra.

On the other hand, the PLU decomposition is generally quite accurate and is half as expensive as the QR, so both factorizations are used in practice. In our secant algorithms, we will use the QR without column pivots on the derivative matrices because it is cheaper to update, after low-rank changes to the matrix, than the PLU. In fact, we will recommend using the QR algorithm in any algorithm for solving dense systems of nonlinear equations for an implementation reason that is discussed in Section 6.

A QR decomposition algorithm is given in Algorithm A3. The LDLT form of the Cholesky decomposition does not require any square roots. A Cholesky decomposition algorithm is a special case of Algorithm A5.
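For reference, a packaged Cholesky factorization and the solve built on it (SciPy's LLT form, which does take square roots, unlike the LDLT variant mentioned above; the routine names are SciPy's):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])   # symmetric positive definite
b = np.array([6.0, 5.0])

c, low = cho_factor(A)       # A = L L^T, stored compactly
x = cho_solve((c, low), b)   # two triangular solves
assert np.allclose(A @ x, b)
```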

If A is symmetric but indefinite, i.e., it has both positive and negative eigenvalues, then a factorization of the form L DB LT, with DB block diagonal, is used instead; for further information on this factorization, see Bunch and Parlett. Aasen gives a version in which DB is replaced by T, a tridiagonal matrix. See Golub and Van Loan, Stewart, or Dongarra et al. for more details. The arithmetic costs of all the factorizations are small multiples of n^3 and are given in Table 3. For sparse systems, one should use the special subroutines available for sparse matrix problems; the Harwell and Yale packages are widely available. Therefore, it is important to know how much the computed step may be affected by the use of finite-precision arithmetic.

Also, since Ac and F(xc) are sometimes approximations to the quantities one really wants to use, one is interested in how sensitive the computed step is to changes in the data Ac and F(xc). These topics are discussed in this section. If we change b1 to b1 − 0. However, if we change b2 by the same − 0. Similarly, changing the first column of A2 to , 2.

Clearly, the second system is very sensitive to changes in its data. Graphing its two equations shows that they represent two nearly parallel lines, so that moving one a little alters their point of intersection drastically. See Figure 3. Linear systems whose solutions are very sensitive to changes in their data are called ill-conditioned, and we will want a way to recognize such systems. It is easily shown that ill-conditioning can be detected in terms of the matrix in the system: the relevant quantity is ||A|| ||A−1||. This quantity is known as the condition number of A and is denoted by Kp(A) when using the corresponding lp induced matrix norm.

In any induced matrix norm, the condition number is the ratio of the maximum to the minimum stretch induced by A [see equations 3. Since the minimum stretch induced by a singular matrix is zero, the condition number of a singular matrix can be considered infinite. Therefore, the condition number of a nonsingular matrix is a measure of the matrix's nearness to singularity.
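A quick numerical illustration (ours): two nearly parallel lines give a large condition number, and a small perturbation of the right-hand side moves the solution a long way.

```python
import numpy as np

# Two nearly parallel lines: an ill-conditioned 2x2 system.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])

print(np.linalg.cond(A, 1))                           # l1 condition number, about 4e4

x = np.linalg.solve(A, b)                             # [1, 1]
x2 = np.linalg.solve(A, b + np.array([0.0, 0.001]))   # a tiny change in b ...
print(x, x2)                                          # ... moves the solution to about [-9, 11]
```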

This method of analysis is often called the Wilkinson backward error analysis, and the reference is Wilkinson . It can be shown that some of the methods discussed in Section 3. Thus, we will look out for ill-conditioned linear systems in extracting information from our local models, because however simple and accurate the model, the solution of a model problem that is sensitive to small changes is certainly of limited use as an approximation to the solution of the nonlinear problem.

If ill-conditioning occurs far from the solution to the nonlinear problem, where the model is not reckoned to be very accurate anyway, we usually just perturb the linear system into a better-conditioned one, and proceed. If ill-conditioned systems occur near the solution, where the model seems to be good, then this indicates that the solution of the nonlinear problem is itself very sensitive to small changes in its data see Figure 3.

This may indicate that the underlying problem is not well posed. Finally, we need to discuss how we will determine in practice whether a linear system 3. Our algorithms will check for such a condition and then perturb poorly conditioned models into ones that are better behaved. A problem is that calculating K(A) involves finding A−1, and not only may this be unreliable, but its expense is rarely justified.

In our application, we will want Figure 3. It is an easy exercise to show that in this case, Therefore, we estimate the l1 condition number K1, since the l1 norm of a matrix is easy to compute. The algorithm we use is an instance of a class of condition number estimates given by Cline, Moler, Stewart, and Wilkinson and is given by Algorithm A3.

The algorithm first computes ||R||1. The technique then used to estimate ||R−1||1 is based on the inequality ||R−1||1 >= ||R−1 z||1 / ||z||1 for any nonzero z. One explanation is that the process for obtaining y is related to the inverse power method for finding the largest eigenvalue of (RTR)−1.

Another is that it turns out to have a knack for extracting any large elements of R−1. The reader who wants to establish more intuition about this is urged to work through the example given in Exercise . The idea of the algorithm is not hard at all. The tool we need for the factorization 3. A Jacobi rotation is used to zero out one element of a matrix while only changing two rows of the matrix. The two-dimensional rotation matrix and the n-dimensional Jacobi rotation matrix are defined in Definition 3.

First, n − 1 Jacobi rotations are applied to zero out in succession rows n, n − 1, ... The effect of each rotation on Rc is to alter some existing elements and to introduce one new element directly below the diagonal in the (i, i − 1) position. The reader can verify that the entire QR update process requires only O(n^2) operations. When we are using local quadratic modeling to solve the unconstrained minimization problem, we will prefer the Hessians, second-derivative matrices in Rn×n, of the quadratic models to be symmetric and positive definite.
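A minimal sketch (ours) of the two-dimensional rotation at the heart of a Jacobi (Givens) rotation: given (a, b), choose c and s so the rotation maps it to (r, 0).

```python
import math
import numpy as np

def givens(a, b):
    """Return (c, s) with  [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

c, s = givens(3.0, 4.0)
G = np.array([[c, s], [-s, c]])
print(G @ np.array([3.0, 4.0]))   # approximately [5, 0]
```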

The complete algorithm is given in Algorithm A3. These algorithms are collected in Goldfarb . The sequencing of symmetric indefinite factorizations has been studied extensively by Sorensen, and his somewhat complex algorithm seems completely satisfactory.

We also summarize the definitions of positive definite, negative definite, and indefinite symmetric matrices and their characterizations in terms of eigenvalues. This characterization provides insight into the shapes of the multivariable quadratic models introduced in Chapter 4 and used thereafter. Most of the theorems are stated without proof. Proofs can be found in any of the references listed at the beginning of this chapter.

Let A ∈ Rn×n be symmetric. A is said to be indefinite if it is neither positive semidefinite nor negative semidefinite. Then A has n real eigenvalues λ1, ..., λn. Then A is positive definite if, and only if, all its eigenvalues are positive. Let A have eigenvalues λ1, ..., λn with orthonormal eigenvectors v1, ..., vn, and suppose x = Σj αj vj, where at least one αj is nonzero. Then, owing to the orthonormality of the vj, xTAx = Σj αj^2 λj > 0. Thus A is positive definite. Then A is positive semidefinite if, and only if, all its eigenvalues are nonnegative. A is negative definite or negative semidefinite if, and only if, all its eigenvalues are negative or nonpositive, respectively.
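A small check of these eigenvalue characterizations with NumPy (an illustration of ours, not the book's tests):

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semidefinite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # indefinite
```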

A is indefinite if and only if it has both positive and negative eigenvalues. The following definitions and theorems will be needed in Section 5. Let A e Rnxn be symmetric. If A is strictly diagonally dominant, then A is 3. Numerical Linear Algebra Background Chap. The reasons for studying problem 3. In connection with the linear least-squares problem, we introduce the singular value decomposition, a powerful numerical linear algebra tool that is helpful in many situations.

Instead we choose x to minimize some measure of the vector of residuals Ax − b. Note that finding the solution to this instance of 3. This interpretation leads to an easy solution of the linear least-squares problem. This information is summarized in Theorem 3.

Equations 3. , the normal equations, are not the recommended computational approach. This is because forming the matrix ATA can cause underflows and overflows and can square the conditioning of the problem; see Figure 3. The QR decomposition also yields a numerically stable orthonormalization of the columns of A. The following theorem shows how to use 3.

The unique solution to 3. The existence of the QR decomposition of A follows from its derivation by Householder transformations, and the nonsingularity of Ru follows from the full column rank of A. Using equation 3. Problem 3. The SVD is a matrix factorization that is more costly to obtain than the factorizations we have discussed so far, but also more powerful. It is defined in Definition 3. Theorem 3. The quantities σ1, ..., σn are called the singular values of A. Notice that we are using U to denote an orthogonal matrix, even though we used the same symbol in Section 3.

There should be no confusion for the alert reader in this well-established notation. Furthermore, the use of the symbols U and V is traditional for matrices whose columns are to be thought of as eigenvectors, and the following lemma will show that this is the case.

The number of nonzero singular values equals the rank of A. An existence proof for the SVD can be found in the references given at the beginning of the chapter.

Thus, if uj and vj are the respective jth columns of U and V, then Avj = σj uj and ATuj = σj vj. Then the unique solution to problem 3. Thus the solution to 3. The singular value decomposition is found by an iterative process closely related to the algorithm for finding the eigenvalues and eigenvectors of a symmetric matrix, so it is more costly to obtain than our other matrix decompositions.

It is the recommended method for solving 3. If m . As n gets larger, there are more awkward distributions of the singular values to worry about. No one really knows what to do, but it is certainly reasonable to look for "breaks" in the singular values and count all singular values of about the same magnitude as either all zero, or all nonzero.
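An illustration (ours) of these rank decisions using NumPy's SVD: singular values below a tolerance relative to the largest are treated as zero, and a least-squares solution is built from the remaining ones. The tolerance rule shown is a common convention, not the book's prescription.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])            # the second column is twice the first
b = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s[0]
rank = int(np.sum(s > tol))           # count the singular values judged nonzero

# Minimum-norm least-squares solution using only the significant singular values.
x = Vt[:rank].T @ ((U[:, :rank].T @ b) / s[:rank])
print(rank, x)                        # rank 1, x approximately [0.2, 0.4]
```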

It is important that the user of canned programs understand these issues. However, nothing is done behind the user's back, and it is only necessary to read the program documentation to understand what is done. It is safe to say that for internal efficiency reasons, any subroutine to compute an SVD will return zero values for any singular values that are negligible in the context of the computation; see Dongarra et al. (Hard) Prove Theorem 3.

Prove relations 3. Prove that 3. Show that for all v ∈ Rn, with equality for at least one v. Prove 3. Use the techniques of proof of Theorem 3. Complete the proof of Theorem 3. Prove Theorem 3. Solve the block diagonal system: Derive the Householder transformation: Apply Householder transformations to factor A into QR. Use the Cholesky factorization to find out. Find the Cholesky factorization of . What if the 10 is changed to a 7? Calculate the l1 condition number of . Then estimate its condition number by Algorithm A.

Program the upper triangular condition estimator as a module for later use. The simple form we gave for Jacobi rotations is subject to unnecessary finite-precision error. Write an accurate procedure to evaluate R. Write a Jacobi rotation that would introduce a zero into the (3, 1) position of . Show also that if A has full column rank, then ATA is nonsingular. Given the QR decomposition of a rectangular matrix A with full column rank as defined in Theorem 3.
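For the Householder exercises, a compact sketch (ours) of a Householder reflection that zeroes all but the first entry of a vector; repeating this column by column yields a QR factorization.

```python
import numpy as np

def householder(x):
    """Return v, beta so that (I - beta * v v^T) x = (r, 0, ..., 0)^T for some r."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # pick the sign that avoids cancellation
    beta = 2.0 / (v @ v)
    return v, beta

x = np.array([3.0, 1.0, 5.0, 1.0])
v, beta = householder(x)
H = np.eye(4) - beta * np.outer(v, v)
print(H @ x)      # approximately [-6, 0, 0, 0]
```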

Do the problem using QR and the normal equations. Did you do Exercise 15? Use techniques similar to those used in the proofs of Theorems 3. Show how to carry out the LQ decomposition in the last exercise using Householder transformations.

In Section 4. Section 4. Ortega and Rheinboldt is a detailed reference for this material and for the theoretical treatment of nonlinear algebraic problems in general. In this way we managed to have our model problem match the real problem in value, slope, and curvature at the point currently held to be the best estimate of the minimizer.

Taylor series with remainder was a useful analytic device for bounding the approximation error in the quadratic model, and it will be equally useful here. We begin this section with a no-frills exposition of Taylor series up to order two for a real-valued function of a real vector variable.

Then, in order to treat vector-valued functions of several variables, we will need more machinery. For such functions the Taylor series approach is quite unsatisfactory because the mean value theorem fails. It was in anticipation of this fact that we used Newton's theorem in Chapter 2 to derive our affine models.

The second part of this section will be devoted to building the machinery to make the same analysis in multiple dimensions. Consider first a continuous function f: Rn → R. In particular, there is a corresponding mean value theorem that says that the difference of the function values at two points is equal to the inner product of the difference between the points and the gradient at some point on the line connecting them. The exact definitions and lemmas are given below.

Thus the results follow directly from the corresponding theorems in one-variable calculus. We will denote the open and closed line segments connecting two points x, x̄ ∈ Rn by (x, x̄) and [x, x̄], respectively, and we remind the reader that D ⊆ Rn is called a convex set if, for every x, x̄ ∈ D, [x, x̄] ⊆ D. Definition 4. LEMMA 4. Then, for x ∈ D and any nonzero perturbation p ∈ Rn, the directional derivative of f at x in the direction of p is defined by the limit of (f(x + tp) − f(x))/t as t → 0.
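A numerical check of this definition (our sketch): approximate the directional derivative by a small finite step and compare with the gradient inner product.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]          # example function f: R^2 -> R

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
p = np.array([0.5, -1.0])
t = 1e-6

directional = (f(x + t * p) - f(x)) / t            # finite-step estimate of the limit
print(directional, grad_f(x) @ p)                  # both close to 1.0
```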

Finally, by the mean value theorem for functions of one variable, which by the definition of g and 4. This is the reason we were interested in symmetric matrices in our chapter on linear algebra. Then Lemma 4.

In Corollary 4. Now let us proceed to the less simple case of F: Rn → Rm. It will be convenient to have the special notation eiT for the ith row of the identity matrix. There should be no confusion with the natural log base e. Thus F'(x) must be an m × n matrix whose ith row is the transpose of the gradient of the ith component function. The following definition makes this official. A continuous function F: Rn → Rm is continuously differentiable at x if each of its component functions is continuously differentiable at x. The derivative of F at x is sometimes called the Jacobian matrix of F at x, and its transpose is sometimes called the gradient of F at x.

The common notations are: Also, we will often speak of the Jacobian of F rather than the Jacobian matrix of F at x. The only possible confusion might be that the latter term is sometimes used for the determinant of J(x). However, no confusion should arise here, since we will find little use for determinants. For example, consider the function of Example 4. Although the standard mean value theorem is impossible, we will be able to replace it in our analysis by Newton's theorem and the triangle inequality for line integrals.
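Putting the Jacobian definition to work, here is a simple forward-difference Jacobian approximation (a sketch of ours; the book's own finite-difference algorithms appear later in its appendix):

```python
import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Approximate the m x n Jacobian of F at x by forward differences:
    column j is (F(x + h*e_j) - F(x)) / h."""
    x = np.asarray(x, dtype=float)
    Fx = np.asarray(F(x), dtype=float)
    J = np.empty((Fx.size, x.size))
    for j in range(x.size):
        xj = x.copy()
        xj[j] += h
        J[:, j] = (np.asarray(F(xj)) - Fx) / h
    return J

def F(x):
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) * x[1]])

print(fd_jacobian(F, np.array([1.0, 2.0])))
# close to [[2, 1], [2*cos(1), sin(1)]]
```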

