1"""Trust Region Reflective algorithm for least-squares optimization.
3The algorithm is based on ideas from paper [STIR]_. The main idea is to
4account for the presence of the bounds by appropriate scaling of the variables (or,
5equivalently, changing a trust-region shape). Let's introduce a vector v:
7 | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf
8 v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf
9 | 1, otherwise
11where g is the gradient of a cost function and lb, ub are the bounds. Its
12components are distances to the bounds at which the anti-gradient points (if
13this distance is finite). Define a scaling matrix D = diag(v**0.5).
14First-order optimality conditions can be stated as
16 D^2 g(x) = 0.
18Meaning that components of the gradient should be zero for strictly interior
19variables, and components must point inside the feasible region for variables
20on the bound.
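For illustration, v can be computed directly from its definition (a minimal
sketch; the actual helper ``CL_scaling_vector`` in ``common.py`` also returns
the derivatives dv used below):

    v = np.ones_like(x)
    mask = (g < 0) & np.isfinite(ub)
    v[mask] = ub[mask] - x[mask]
    mask = (g > 0) & np.isfinite(lb)
    v[mask] = x[mask] - lb[mask]
    D = np.diag(v**0.5)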
Now consider this system of equations as a new optimization problem. If the
point x is strictly interior (not on the bound), then the left-hand side is
differentiable and the Newton step for it satisfies

    (D^2 H + diag(g) Jv) p = -D^2 g

where H is the Hessian matrix (or its J^T J approximation in least squares),
Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all
elements of the matrix C = diag(g) Jv are non-negative. Introduce the change
of variables x = D x_h (_h would be "hat" in LaTeX). In the new variables,
we have a Newton step satisfying

    B_h p_h = -g_h,

where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where
J_h = J D. Note that J_h and g_h are the proper Jacobian and gradient with
respect to the "hat" variables. To guarantee global convergence we formulate
a trust-region problem based on the Newton step in the new variables:

    0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta
In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region
problem is

    0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta

Here, the meaning of the matrix D becomes more clear: it alters the shape
of the trust region, such that large steps towards the bounds are not allowed.
In the implementation, the trust-region problem is solved in "hat" space,
but handling of the bounds is done in the original space (see below and read
the code).
The introduction of the matrix D does not allow us to ignore the bounds: the
algorithm must keep its iterates strictly feasible (to satisfy the
aforementioned differentiability). The parameter theta controls how far the
step backs off from the boundary; in the code it is set to
max(0.995, 1 - g_norm) (see below for details).
The algorithm does another important trick. If the trust-region solution
doesn't fit into the bounds, then a search direction reflected from the first
encountered bound is considered. For motivation and analysis refer to the
[STIR]_ paper (and other papers by the same authors). In practice it doesn't
need much justification: the algorithm simply chooses the best of three
candidate steps: a constrained trust-region step, a reflected step and a
constrained Cauchy step (a minimizer along -g_h in "hat" space, or -D^2 g in
the original space).
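Schematically (mirroring ``select_step`` below), the direction is reflected by
flipping the components of the step that first hit a bound, and the reflected
search then starts from the point where the restricted step lands on that
bound:

    r_h = p_h.copy()
    r_h[hits.astype(bool)] *= -1   # reflect at the first encountered bound
    r = d * r_h                    # reflected direction in the original space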
Another feature is that the trust-region radius control strategy is modified
to account for the appearance of the diagonal matrix C (called diag_h in the
code).

Note that all of the described peculiarities vanish for problems without
bounds (the algorithm then becomes a standard trust-region method, very
similar to the ones implemented in MINPACK).
The implementation supports two methods of solving the trust-region problem.
The first, called 'exact', applies an SVD to the Jacobian and then solves the
problem very accurately using the algorithm described in [JJMore]_. It is not
applicable to large problems. The second, called 'lsmr', uses the 2-D subspace
approach (sometimes called "indefinite dogleg"), where the problem is solved
in a subspace spanned by the gradient and the approximate Gauss-Newton step
found by ``scipy.sparse.linalg.lsmr``. The 2-D trust-region problem is
reformulated as a 4th-order algebraic equation and solved very accurately by
``numpy.roots``. The subspace approach allows solving very large problems
(up to a couple of million residuals on a regular PC), provided the Jacobian
matrix is sufficiently sparse.
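From the user side this module is reached through
``scipy.optimize.least_squares`` with ``method='trf'``; for example (a minimal
sketch with a made-up residual function, bounds and starting point):

    import numpy as np
    from scipy.optimize import least_squares

    def residuals(x):
        return np.array([x[0] - 1.0, 10.0 * (x[1] - x[0]**2)])

    res = least_squares(residuals, np.array([2.0, 2.0]),
                        bounds=([0.0, 0.0], [3.0, 3.0]),
                        method='trf', tr_solver='lsmr')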
References
----------
.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior,
          and Conjugate Gradient Method for Large-Scale Bound-Constrained
          Minimization Problems," SIAM Journal on Scientific Computing,
          Vol. 21, Number 1, pp 1-23, 1999.
.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
            and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
            Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
"""
import numpy as np
from numpy.linalg import norm
from scipy.linalg import svd, qr
from scipy.sparse.linalg import lsmr
from scipy.optimize import OptimizeResult

from .common import (
    step_size_to_bound, find_active_constraints, in_bounds,
    make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region,
    solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d,
    evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator,
    CL_scaling_vector, compute_grad, compute_jac_scale, check_termination,
    update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear,
    print_iteration_nonlinear)

def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
        loss_function, tr_solver, tr_options, verbose):
    # For efficiency, it makes sense to run the simplified version of the
    # algorithm when no bounds are imposed. We decided to write two separate
    # functions. This violates the DRY principle, but keeps each function as
    # readable as possible.
    if np.all(lb == -np.inf) and np.all(ub == np.inf):
        return trf_no_bounds(
            fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale,
            loss_function, tr_solver, tr_options, verbose)
    else:
        return trf_bounds(
            fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
            loss_function, tr_solver, tr_options, verbose)

def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta):
    """Select the best step according to Trust Region Reflective algorithm."""
    if in_bounds(x + p, lb, ub):
        p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
        return p, p_h, -p_value

    p_stride, hits = step_size_to_bound(x, p, lb, ub)

    # Compute the reflected direction.
    r_h = np.copy(p_h)
    r_h[hits.astype(bool)] *= -1
    r = d * r_h

    # Restrict the trust-region step so that it hits the bound.
    p *= p_stride
    p_h *= p_stride
    x_on_bound = x + p

    # The reflected direction will first cross either the feasible-region
    # boundary or the trust-region boundary.
    _, to_tr = intersect_trust_region(p_h, r_h, Delta)
    to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub)

    # Find lower and upper bounds on a step size along the reflected
    # direction, considering the strict feasibility requirement. There is no
    # single correct way to do that; the chosen approach seems to work best
    # on test problems.
    r_stride = min(to_bound, to_tr)
    if r_stride > 0:
        r_stride_l = (1 - theta) * p_stride / r_stride
        if r_stride == to_bound:
            r_stride_u = theta * to_bound
        else:
            r_stride_u = to_tr
    else:
        r_stride_l = 0
        r_stride_u = -1

    # Check if the reflection step is available.
    if r_stride_l <= r_stride_u:
        a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h)
        r_stride, r_value = minimize_quadratic_1d(
            a, b, r_stride_l, r_stride_u, c=c)
        r_h *= r_stride
        r_h += p_h
        r = r_h * d
    else:
        r_value = np.inf

    # Now correct p_h to make it strictly interior.
    p *= theta
    p_h *= theta
    p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)

    ag_h = -g_h
    ag = d * ag_h

    to_tr = Delta / norm(ag_h)
    to_bound, _ = step_size_to_bound(x, ag, lb, ub)
    if to_bound < to_tr:
        ag_stride = theta * to_bound
    else:
        ag_stride = to_tr

    a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h)
    ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride)
    ag_h *= ag_stride
    ag *= ag_stride

    if p_value < r_value and p_value < ag_value:
        return p, p_h, -p_value
    elif r_value < p_value and r_value < ag_value:
        return r, r_h, -r_value
    else:
        return ag, ag_h, -ag_value

def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev,
               x_scale, loss_function, tr_solver, tr_options, verbose):
    x = x0.copy()

    f = f0
    f_true = f.copy()
    nfev = 1

    J = J0
    njev = 1
    m, n = J.shape

    if loss_function is not None:
        rho = loss_function(f)
        cost = 0.5 * np.sum(rho[0])
        J, f = scale_for_robust_loss_function(J, f, rho)
    else:
        cost = 0.5 * np.dot(f, f)

    g = compute_grad(J, f)

    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
    if jac_scale:
        scale, scale_inv = compute_jac_scale(J)
    else:
        scale, scale_inv = x_scale, 1 / x_scale

    v, dv = CL_scaling_vector(x, g, lb, ub)
    v[dv != 0] *= scale_inv[dv != 0]
    Delta = norm(x0 * scale_inv / v**0.5)
    if Delta == 0:
        Delta = 1.0

    g_norm = norm(g * v, ord=np.inf)

    f_augmented = np.zeros((m + n))
    if tr_solver == 'exact':
        J_augmented = np.empty((m + n, n))
    elif tr_solver == 'lsmr':
        reg_term = 0.0
        regularize = tr_options.pop('regularize', True)

    if max_nfev is None:
        max_nfev = x0.size * 100

    alpha = 0.0  # "Levenberg-Marquardt" parameter

    termination_status = None
    iteration = 0
    step_norm = None
    actual_reduction = None

    if verbose == 2:
        print_header_nonlinear()

    while True:
        v, dv = CL_scaling_vector(x, g, lb, ub)

        g_norm = norm(g * v, ord=np.inf)
        if g_norm < gtol:
            termination_status = 1

        if verbose == 2:
            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
                                      step_norm, g_norm)

        if termination_status is not None or nfev == max_nfev:
            break

        # Now compute variables in the "hat" space. Here, we also account for
        # the scaling introduced by the `x_scale` parameter. This part is a
        # bit tricky: you have to write down the formulas and see how the
        # trust-region problem is formulated when the two types of scaling
        # are applied. The idea is that first we apply `x_scale` and then
        # apply the Coleman-Li approach in the new variables.

        # v is recomputed in the variables after applying `x_scale`; note
        # that components which were identically 1 are not affected.
        v[dv != 0] *= scale_inv[dv != 0]

        # Here, we apply two types of scaling.
        d = v**0.5 * scale

        # C = diag(g * scale) Jv
        diag_h = g * dv * scale

        # After all this has been done, we continue normally.

        # "hat" gradient.
        g_h = d * g

        f_augmented[:m] = f
        if tr_solver == 'exact':
            J_augmented[:m] = J * d
            J_h = J_augmented[:m]  # Memory view.
            J_augmented[m:] = np.diag(diag_h**0.5)
            U, s, V = svd(J_augmented, full_matrices=False)
            V = V.T
            uf = U.T.dot(f_augmented)
        elif tr_solver == 'lsmr':
            J_h = right_multiplied_operator(J, d)

            if regularize:
                a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h)
                to_tr = Delta / norm(g_h)
                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
                reg_term = -ag_value / Delta**2

            lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5)
            gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0]
            S = np.vstack((g_h, gn_h)).T
            S, _ = qr(S, mode='economic')
            JS = J_h.dot(S)  # LinearOperator does dot too.
            B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S)
            g_S = S.T.dot(g_h)

        # theta controls how far the step backs off from the bounds.
        theta = max(0.995, 1 - g_norm)

        actual_reduction = -1
        while actual_reduction <= 0 and nfev < max_nfev:
            if tr_solver == 'exact':
                p_h, alpha, n_iter = solve_lsq_trust_region(
                    n, m, uf, s, V, Delta, initial_alpha=alpha)
            elif tr_solver == 'lsmr':
                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
                p_h = S.dot(p_S)

            p = d * p_h  # Trust-region solution in the original space.
            step, step_h, predicted_reduction = select_step(
                x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta)

            x_new = make_strictly_feasible(x + step, lb, ub, rstep=0)
            f_new = fun(x_new)
            nfev += 1

            step_h_norm = norm(step_h)

            if not np.all(np.isfinite(f_new)):
                Delta = 0.25 * step_h_norm
                continue

            # Usual trust-region step quality estimation.
            if loss_function is not None:
                cost_new = loss_function(f_new, cost_only=True)
            else:
                cost_new = 0.5 * np.dot(f_new, f_new)
            actual_reduction = cost - cost_new
            Delta_new, ratio = update_tr_radius(
                Delta, actual_reduction, predicted_reduction,
                step_h_norm, step_h_norm > 0.95 * Delta)

            step_norm = norm(step)
            termination_status = check_termination(
                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
            if termination_status is not None:
                break

            alpha *= Delta / Delta_new
            Delta = Delta_new

        if actual_reduction > 0:
            x = x_new

            f = f_new
            f_true = f.copy()

            cost = cost_new

            J = jac(x, f)
            njev += 1

            if loss_function is not None:
                rho = loss_function(f)
                J, f = scale_for_robust_loss_function(J, f, rho)

            g = compute_grad(J, f)

            if jac_scale:
                scale, scale_inv = compute_jac_scale(J, scale_inv)
        else:
            step_norm = 0
            actual_reduction = 0

        iteration += 1

    if termination_status is None:
        termination_status = 0

    active_mask = find_active_constraints(x, lb, ub, rtol=xtol)
    return OptimizeResult(
        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
        active_mask=active_mask, nfev=nfev, njev=njev,
        status=termination_status)

def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev,
                  x_scale, loss_function, tr_solver, tr_options, verbose):
    x = x0.copy()

    f = f0
    f_true = f.copy()
    nfev = 1

    J = J0
    njev = 1
    m, n = J.shape

    if loss_function is not None:
        rho = loss_function(f)
        cost = 0.5 * np.sum(rho[0])
        J, f = scale_for_robust_loss_function(J, f, rho)
    else:
        cost = 0.5 * np.dot(f, f)

    g = compute_grad(J, f)

    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
    if jac_scale:
        scale, scale_inv = compute_jac_scale(J)
    else:
        scale, scale_inv = x_scale, 1 / x_scale

    Delta = norm(x0 * scale_inv)
    if Delta == 0:
        Delta = 1.0

    if tr_solver == 'lsmr':
        reg_term = 0
        damp = tr_options.pop('damp', 0.0)
        regularize = tr_options.pop('regularize', True)

    if max_nfev is None:
        max_nfev = x0.size * 100

    alpha = 0.0  # "Levenberg-Marquardt" parameter

    termination_status = None
    iteration = 0
    step_norm = None
    actual_reduction = None

    if verbose == 2:
        print_header_nonlinear()

    while True:
        g_norm = norm(g, ord=np.inf)
        if g_norm < gtol:
            termination_status = 1

        if verbose == 2:
            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
                                      step_norm, g_norm)

        if termination_status is not None or nfev == max_nfev:
            break

        d = scale
        g_h = d * g

        if tr_solver == 'exact':
            J_h = J * d
            U, s, V = svd(J_h, full_matrices=False)
            V = V.T
            uf = U.T.dot(f)
        elif tr_solver == 'lsmr':
            J_h = right_multiplied_operator(J, d)

            if regularize:
                a, b = build_quadratic_1d(J_h, g_h, -g_h)
                to_tr = Delta / norm(g_h)
                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
                reg_term = -ag_value / Delta**2

            damp_full = (damp**2 + reg_term)**0.5
            gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0]
            S = np.vstack((g_h, gn_h)).T
            S, _ = qr(S, mode='economic')
            JS = J_h.dot(S)
            B_S = np.dot(JS.T, JS)
            g_S = S.T.dot(g_h)

        actual_reduction = -1
        while actual_reduction <= 0 and nfev < max_nfev:
            if tr_solver == 'exact':
                step_h, alpha, n_iter = solve_lsq_trust_region(
                    n, m, uf, s, V, Delta, initial_alpha=alpha)
            elif tr_solver == 'lsmr':
                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
                step_h = S.dot(p_S)

            predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h)
            step = d * step_h
            x_new = x + step
            f_new = fun(x_new)
            nfev += 1

            step_h_norm = norm(step_h)

            if not np.all(np.isfinite(f_new)):
                Delta = 0.25 * step_h_norm
                continue

            # Usual trust-region step quality estimation.
            if loss_function is not None:
                cost_new = loss_function(f_new, cost_only=True)
            else:
                cost_new = 0.5 * np.dot(f_new, f_new)
            actual_reduction = cost - cost_new

            Delta_new, ratio = update_tr_radius(
                Delta, actual_reduction, predicted_reduction,
                step_h_norm, step_h_norm > 0.95 * Delta)

            step_norm = norm(step)
            termination_status = check_termination(
                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
            if termination_status is not None:
                break

            alpha *= Delta / Delta_new
            Delta = Delta_new

        if actual_reduction > 0:
            x = x_new

            f = f_new
            f_true = f.copy()

            cost = cost_new

            J = jac(x, f)
            njev += 1

            if loss_function is not None:
                rho = loss_function(f)
                J, f = scale_for_robust_loss_function(J, f, rho)

            g = compute_grad(J, f)

            if jac_scale:
                scale, scale_inv = compute_jac_scale(J, scale_inv)
        else:
            step_norm = 0
            actual_reduction = 0

        iteration += 1

    if termination_status is None:
        termination_status = 0

    active_mask = np.zeros_like(x)
    return OptimizeResult(
        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
        active_mask=active_mask, nfev=nfev, njev=njev,
        status=termination_status)
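

# The block below is not part of SciPy; it is a minimal, hypothetical sketch
# showing the argument conventions of ``trf`` as read from the code above
# (in particular, ``jac`` is called as ``jac(x, f)``). Normally this routine
# is reached only through ``scipy.optimize.least_squares(..., method='trf')``.
if __name__ == "__main__":
    def demo_fun(x):
        # Residuals of a small Rosenbrock-like problem.
        return np.array([x[0] - 1.0, 10.0 * (x[1] - x[0]**2)])

    def demo_jac(x, f):
        # Analytic Jacobian of the residuals; the second argument is unused.
        return np.array([[1.0, 0.0], [-20.0 * x[0], 10.0]])

    x0 = np.array([2.0, 2.0])
    f0 = demo_fun(x0)
    J0 = demo_jac(x0, f0)
    # With infinite bounds the dispatcher takes the trf_no_bounds path.
    result = trf(demo_fun, demo_jac, x0, f0, J0,
                 lb=np.full(2, -np.inf), ub=np.full(2, np.inf),
                 ftol=1e-8, xtol=1e-8, gtol=1e-8, max_nfev=None,
                 x_scale=1.0, loss_function=None, tr_solver='exact',
                 tr_options={}, verbose=0)
    print(result.x, result.cost, result.status)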