05/08/2006
To solve large-scale optimization problems iteratively in contexts such as planning, operations, and design, we need to generate descent directions that are based on linear system solutions. Irrespective of the optimization algorithm or the solution method employed for the linear systems, ill-conditioning introduced by problem characteristics, by the algorithm, or by both needs to be addressed. In [GL01] we used an intuitive heuristic approach to scaling linear systems that significantly improved the performance of a large-scale interior point algorithm. We saw improvements of a factor of 10^3 in condition number estimates. In this paper, drawing on our experience with optimization problems from a variety of application backgrounds such as economics, finance, engineering, and planning, we examine the theoretical basis for scaling while solving the linear systems. Our goal is to develop reasonably "good" scaling schemes with a sound theoretical basis. We introduce concepts, define "good" scaling schemes, and explain related work in this area in section (1). Scaling has been studied extensively, and though there is broad agreement on its importance, the same cannot be said about what constitutes good scaling. A theoretical framework for scaling an m x n real matrix is established in section (2). We use the first order conditions associated with the Euclidean metric to develop iterative schemes in section (2.3) that approximate the solution in O(mn) time for real matrices. We discuss symmetry-preserving scale factors for an n x n symmetric matrix in section (3). The importance of symmetry preservation is discussed in section (3.1). An algorithm that directly computes symmetry-preserving scale factors in O(n^2) time based on the Euclidean metric is presented in section (3.4). We also suggest scaling schemes based on the rectilinear norm in section (2.4). Though all p-norms are theoretically equivalent, the importance of outliers increases as p increases. 
For barrier methods, because of the large diagonal corrections, we believe that the taxicab metric (p = 1) may be more appropriate. We develop a linear programming model for it and look at a "reduced" dual that can be formulated as a minimum cost flow problem on networks. We are investigating algorithms that solve it in the O(mn) time we require for an efficient scaling procedure. We hope that in the future the special structure of the "reduced" dual can be exploited to solve it quickly. The dual information can then be used to compute the required scale factors. We discuss the Manhattan metric for symmetric matrices in section (3.5); as in the case of real matrices, we are unable to propose an efficient computational scheme for this metric. In section (2.5) we look at a linearized ideal penalty function that uses only the deviations that fall outside the desired range. If such a metric can be used to generate an efficient solution, we would like to see the impact of changing the range on the numerical behavior.
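
As an illustration only, one common least-squares reading of Euclidean-metric scaling chooses row and column factors so that the logarithms of the scaled nonzero magnitudes are as close to zero as possible, and alternates between the first order conditions for the row factors and for the column factors. The sketch below is an assumption and not necessarily the scheme of section (2.3): the function name `log_least_squares_scaling`, the base-2 logarithm, and the fixed number of sweeps are hypothetical choices. Each sweep touches every entry a constant number of times, so it costs O(mn) for a dense m x n matrix.

```python
import numpy as np

def log_least_squares_scaling(A, sweeps=20):
    """Hypothetical sketch: alternating least-squares row/column scaling.

    Minimizes the sum over nonzeros of (log2|a_ij| + r_i + c_j)^2 by
    alternately solving the first order conditions in r and in c.
    Returns multiplicative factors (R, C) with R = 2**r and C = 2**c.
    """
    A = np.asarray(A, dtype=float)
    mask = A != 0                       # only nonzero entries are scaled
    logA = np.zeros_like(A)
    logA[mask] = np.log2(np.abs(A[mask]))

    m, n = A.shape
    r = np.zeros(m)
    c = np.zeros(n)
    # Guard against empty rows/columns (their factor stays at 2**0 = 1).
    row_counts = np.maximum(mask.sum(axis=1), 1)
    col_counts = np.maximum(mask.sum(axis=0), 1)

    for _ in range(sweeps):
        # With c fixed, the optimal r_i is minus the mean of log2|a_ij| + c_j.
        r = -((logA + c[None, :]) * mask).sum(axis=1) / row_counts
        # With r fixed, the optimal c_j is minus the mean of log2|a_ij| + r_i.
        c = -((logA + r[:, None]) * mask).sum(axis=0) / col_counts

    return 2.0 ** r, 2.0 ** c
```

The factors would be applied as `R[:, None] * A * C[None, :]`. Because each update exactly minimizes the objective over one block of variables, the sum of squared log-magnitudes never increases from one sweep to the next.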