1"""Trust Region Reflective algorithm for least-squares optimization. 

2 

3The algorithm is based on ideas from paper [STIR]_. The main idea is to 

4account for the presence of the bounds by appropriate scaling of the variables (or, 

5equivalently, changing a trust-region shape). Let's introduce a vector v: 

6 

7 | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf 

8 v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf 

9 | 1, otherwise 

10 

11where g is the gradient of a cost function and lb, ub are the bounds. Its 

12components are distances to the bounds at which the anti-gradient points (if 

13this distance is finite). Define a scaling matrix D = diag(v**0.5). 

14First-order optimality conditions can be stated as 

15 

16 D^2 g(x) = 0. 

17 

18Meaning that components of the gradient should be zero for strictly interior 

19variables, and components must point inside the feasible region for variables 

20on the bound. 

Now consider this system of equations as a new optimization problem. If the
point x is strictly interior (not on a bound), then the left-hand side is
differentiable and the Newton step for it satisfies

    (D^2 H + diag(g) Jv) p = -D^2 g

where H is the Hessian matrix (or its J^T J approximation in least squares),
Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all
elements of the matrix C = diag(g) Jv are non-negative. Introduce the change
of variables x = D x_h (_h would be "hat" in LaTeX). In the new variables,
we have a Newton step satisfying

    B_h p_h = -g_h,

where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where
J_h = J D. Note that J_h and g_h are the proper Jacobian and gradient with
respect to the "hat" variables. To guarantee global convergence we formulate
a trust-region problem based on the Newton step in the new variables:

    0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta

In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region
problem is

    0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta

Here, the meaning of the matrix D becomes clearer: it alters the shape of the
trust region so that large steps towards the bounds are not allowed. In the
implementation, the trust-region problem is solved in "hat" space, but
handling of the bounds is done in the original space (see below and read
the code).

The introduction of the matrix D does not remove the need to handle the
bounds explicitly: the algorithm must keep iterates strictly feasible (to
satisfy the aforementioned differentiability), and the parameter theta
controls how far a step backs off from the boundary (see the code for
details).

The algorithm does another important trick. If the trust-region solution
doesn't fit into the bounds, then a search direction reflected from the first
encountered bound is considered. For motivation and analysis refer to the
[STIR]_ paper (and other papers by the authors). In practice, it does not
need much justification: the algorithm simply chooses the best step among a
constrained trust-region step, a reflected step and a constrained Cauchy step
(a minimizer along -g_h in "hat" space, or -D^2 g in the original space).

Another feature is that the trust-region radius control strategy is modified
to account for the appearance of the diagonal matrix C (called diag_h in the
code).

Note that all the described peculiarities vanish for problems without bounds
(the algorithm becomes a standard trust-region type algorithm, very similar
to the ones implemented in MINPACK).

The implementation supports two methods of solving the trust-region problem.
The first, called 'exact', applies an SVD to the Jacobian and then solves the
problem very accurately using the algorithm described in [JJMore]_. It is not
applicable to large problems. The second, called 'lsmr', uses the 2-D subspace
approach (sometimes called "indefinite dogleg"), where the problem is solved
in a subspace spanned by the gradient and the approximate Gauss-Newton step
found by ``scipy.sparse.linalg.lsmr``. The 2-D trust-region problem is
reformulated as a 4th-order algebraic equation and solved very accurately by
``numpy.roots``. The subspace approach allows solving very large problems
(up to a couple of million residuals on a regular PC), provided the Jacobian
matrix is sufficiently sparse.

References
----------
.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior,
          and Conjugate Gradient Method for Large-Scale Bound-Constrained
          Minimization Problems," SIAM Journal on Scientific Computing,
          Vol. 21, Number 1, pp 1-23, 1999.
.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
            and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
            Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
"""

import numpy as np
from numpy.linalg import norm
from scipy.linalg import svd, qr
from scipy.sparse.linalg import lsmr
from scipy.optimize import OptimizeResult

from .common import (
    step_size_to_bound, find_active_constraints, in_bounds,
    make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region,
    solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d,
    evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator,
    CL_scaling_vector, compute_grad, compute_jac_scale, check_termination,
    update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear,
    print_iteration_nonlinear)
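# The following helper is an illustrative addition, not part of the original
# module: a minimal sketch of the Coleman-Li scaling vector v defined in the
# docstring above, written directly from its piecewise definition. The solver
# itself uses ``CL_scaling_vector`` from ``.common``, which additionally
# returns dv, the derivative of v (with components -1, 1 or 0, as described
# in the docstring); this simplified sketch omits dv.
def _scaling_vector_sketch(x, g, lb, ub):
    """Illustration only: v and d = diag(D) = v**0.5 from the docstring."""
    v = np.ones_like(x)
    # v[i] = ub[i] - x[i] when the anti-gradient points towards a finite ub.
    mask = (g < 0) & np.isfinite(ub)
    v[mask] = ub[mask] - x[mask]
    # v[i] = x[i] - lb[i] when the anti-gradient points towards a finite lb.
    mask = (g > 0) & np.isfinite(lb)
    v[mask] = x[mask] - lb[mask]
    return v, v**0.5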

def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
        loss_function, tr_solver, tr_options, verbose):
    # For efficiency, it makes sense to run the simplified version of the
    # algorithm when no bounds are imposed. We decided to write two separate
    # functions: this violates the DRY principle, but keeps each function as
    # readable as possible.
    if np.all(lb == -np.inf) and np.all(ub == np.inf):
        return trf_no_bounds(
            fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale,
            loss_function, tr_solver, tr_options, verbose)
    else:
        return trf_bounds(
            fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
            loss_function, tr_solver, tr_options, verbose)
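# Illustrative addition, not part of the original module: a hypothetical
# driver showing how ``trf`` might be called directly on a toy problem.
# Normally these arguments are validated and prepared by
# ``scipy.optimize.least_squares``; the parameter values below are plausible
# choices for this sketch, not canonical defaults.
def _example_trf_call():
    # Rosenbrock function written as a two-residual least-squares problem.
    def fun(x):
        return np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])

    def jac(x, f):
        # trf calls the Jacobian as jac(x, f); the residual f is unused here.
        return np.array([[-20.0 * x[0], 10.0],
                         [-1.0, 0.0]])

    x0 = np.array([-1.5, 1.0])
    f0 = fun(x0)
    J0 = jac(x0, f0)
    # Finite box bounds so that the bounded branch (trf_bounds) is exercised.
    lb = np.array([-2.0, -2.0])
    ub = np.array([2.0, 2.0])
    return trf(fun, jac, x0, f0, J0, lb, ub,
               ftol=1e-8, xtol=1e-8, gtol=1e-8, max_nfev=None,
               x_scale=np.ones_like(x0), loss_function=None,
               tr_solver='exact', tr_options={}, verbose=0)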

def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta):
    """Select the best step according to the Trust Region Reflective algorithm."""
    if in_bounds(x + p, lb, ub):
        p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
        return p, p_h, -p_value

    p_stride, hits = step_size_to_bound(x, p, lb, ub)

    # Compute the reflected direction.
    r_h = np.copy(p_h)
    r_h[hits.astype(bool)] *= -1
    r = d * r_h

    # Restrict the trust-region step so that it hits the bound.
    p *= p_stride
    p_h *= p_stride
    x_on_bound = x + p

    # The reflected direction will first cross either the feasible-region
    # boundary or the trust-region boundary.
    _, to_tr = intersect_trust_region(p_h, r_h, Delta)
    to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub)

    # Find lower and upper bounds on a step size along the reflected
    # direction, considering the strict feasibility requirement. There is no
    # single correct way to do that; the chosen approach seems to work best
    # on test problems.
    r_stride = min(to_bound, to_tr)
    if r_stride > 0:
        r_stride_l = (1 - theta) * p_stride / r_stride
        if r_stride == to_bound:
            r_stride_u = theta * to_bound
        else:
            r_stride_u = to_tr
    else:
        r_stride_l = 0
        r_stride_u = -1

    # Check if the reflection step is available.
    if r_stride_l <= r_stride_u:
        a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h)
        r_stride, r_value = minimize_quadratic_1d(
            a, b, r_stride_l, r_stride_u, c=c)
        r_h *= r_stride
        r_h += p_h
        r = r_h * d
    else:
        r_value = np.inf

    # Now correct p_h to make it strictly interior.
    p *= theta
    p_h *= theta
    p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)

    ag_h = -g_h
    ag = d * ag_h

    to_tr = Delta / norm(ag_h)
    to_bound, _ = step_size_to_bound(x, ag, lb, ub)
    if to_bound < to_tr:
        ag_stride = theta * to_bound
    else:
        ag_stride = to_tr

    a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h)
    ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride)
    ag_h *= ag_stride
    ag *= ag_stride

    if p_value < r_value and p_value < ag_value:
        return p, p_h, -p_value
    elif r_value < p_value and r_value < ag_value:
        return r, r_h, -r_value
    else:
        return ag, ag_h, -ag_value
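# Illustrative addition, not part of the original module: a minimal sketch of
# the "hat"-space quantities from the docstring for a dense Jacobian, assuming
# d and diag_h have already been formed as in trf_bounds below
# (d = v**0.5 * scale, diag_h = g * dv * scale). With the least-squares
# approximation H ~ J^T J, the docstring's B_h = D H D + C becomes
# J_h^T J_h + diag(diag_h).
def _hat_space_sketch(J, g, d, diag_h):
    """Illustration only: J_h, g_h and B_h for a dense J."""
    J_h = J * d                             # J_h = J D (column scaling by d)
    g_h = d * g                             # g_h = D g
    B_h = J_h.T.dot(J_h) + np.diag(diag_h)  # B_h = J_h^T J_h + C
    return J_h, g_h, B_h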

def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev,
               x_scale, loss_function, tr_solver, tr_options, verbose):
    x = x0.copy()

    f = f0
    f_true = f.copy()
    nfev = 1

    J = J0
    njev = 1
    m, n = J.shape

    if loss_function is not None:
        rho = loss_function(f)
        cost = 0.5 * np.sum(rho[0])
        J, f = scale_for_robust_loss_function(J, f, rho)
    else:
        cost = 0.5 * np.dot(f, f)

    g = compute_grad(J, f)

    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
    if jac_scale:
        scale, scale_inv = compute_jac_scale(J)
    else:
        scale, scale_inv = x_scale, 1 / x_scale

    v, dv = CL_scaling_vector(x, g, lb, ub)
    v[dv != 0] *= scale_inv[dv != 0]
    Delta = norm(x0 * scale_inv / v**0.5)
    if Delta == 0:
        Delta = 1.0

    g_norm = norm(g * v, ord=np.inf)

    f_augmented = np.zeros((m + n))
    if tr_solver == 'exact':
        J_augmented = np.empty((m + n, n))
    elif tr_solver == 'lsmr':
        reg_term = 0.0
        regularize = tr_options.pop('regularize', True)

    if max_nfev is None:
        max_nfev = x0.size * 100

    alpha = 0.0  # "Levenberg-Marquardt" parameter

    termination_status = None
    iteration = 0
    step_norm = None
    actual_reduction = None

    if verbose == 2:
        print_header_nonlinear()

    while True:
        v, dv = CL_scaling_vector(x, g, lb, ub)

        g_norm = norm(g * v, ord=np.inf)
        if g_norm < gtol:
            termination_status = 1

        if verbose == 2:
            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
                                      step_norm, g_norm)

        if termination_status is not None or nfev == max_nfev:
            break

        # Now compute variables in "hat" space. Here, we also account for
        # scaling introduced by the `x_scale` parameter. This part is a bit
        # tricky: you have to write down the formulas and see how the
        # trust-region problem is formulated when the two types of scaling
        # are applied. The idea is that we first apply `x_scale` and then
        # apply the Coleman-Li approach in the new variables.

        # v is recomputed in the variables after applying `x_scale`; note that
        # components which were identically 1 are not affected.
        v[dv != 0] *= scale_inv[dv != 0]

        # Here, we apply two types of scaling.
        d = v**0.5 * scale

        # C = diag(g * scale) Jv
        diag_h = g * dv * scale

        # After all this has been done, we continue normally.

        # "hat" gradient.
        g_h = d * g

        f_augmented[:m] = f
        if tr_solver == 'exact':
            J_augmented[:m] = J * d
            J_h = J_augmented[:m]  # Memory view.
            J_augmented[m:] = np.diag(diag_h**0.5)
            U, s, V = svd(J_augmented, full_matrices=False)
            V = V.T
            uf = U.T.dot(f_augmented)
        elif tr_solver == 'lsmr':
            J_h = right_multiplied_operator(J, d)

            if regularize:
                a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h)
                to_tr = Delta / norm(g_h)
                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
                reg_term = -ag_value / Delta**2

            lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5)
            gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0]
            S = np.vstack((g_h, gn_h)).T
            S, _ = qr(S, mode='economic')
            JS = J_h.dot(S)  # LinearOperator does dot too.
            B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S)
            g_S = S.T.dot(g_h)

        # theta controls the step-back ratio from the bounds.
        theta = max(0.995, 1 - g_norm)

        actual_reduction = -1
        while actual_reduction <= 0 and nfev < max_nfev:
            if tr_solver == 'exact':
                p_h, alpha, n_iter = solve_lsq_trust_region(
                    n, m, uf, s, V, Delta, initial_alpha=alpha)
            elif tr_solver == 'lsmr':
                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
                p_h = S.dot(p_S)

            p = d * p_h  # Trust-region solution in the original space.
            step, step_h, predicted_reduction = select_step(
                x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta)

            x_new = make_strictly_feasible(x + step, lb, ub, rstep=0)
            f_new = fun(x_new)
            nfev += 1

            step_h_norm = norm(step_h)

            if not np.all(np.isfinite(f_new)):
                Delta = 0.25 * step_h_norm
                continue

            # Usual trust-region step quality estimation.
            if loss_function is not None:
                cost_new = loss_function(f_new, cost_only=True)
            else:
                cost_new = 0.5 * np.dot(f_new, f_new)
            actual_reduction = cost - cost_new
            Delta_new, ratio = update_tr_radius(
                Delta, actual_reduction, predicted_reduction,
                step_h_norm, step_h_norm > 0.95 * Delta)

            step_norm = norm(step)
            termination_status = check_termination(
                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
            if termination_status is not None:
                break

            alpha *= Delta / Delta_new
            Delta = Delta_new

        if actual_reduction > 0:
            x = x_new

            f = f_new
            f_true = f.copy()

            cost = cost_new

            J = jac(x, f)
            njev += 1

            if loss_function is not None:
                rho = loss_function(f)
                J, f = scale_for_robust_loss_function(J, f, rho)

            g = compute_grad(J, f)

            if jac_scale:
                scale, scale_inv = compute_jac_scale(J, scale_inv)
        else:
            step_norm = 0
            actual_reduction = 0

        iteration += 1

    if termination_status is None:
        termination_status = 0

    active_mask = find_active_constraints(x, lb, ub, rtol=xtol)
    return OptimizeResult(
        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
        active_mask=active_mask, nfev=nfev, njev=njev,
        status=termination_status)

def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev,
                  x_scale, loss_function, tr_solver, tr_options, verbose):
    x = x0.copy()

    f = f0
    f_true = f.copy()
    nfev = 1

    J = J0
    njev = 1
    m, n = J.shape

    if loss_function is not None:
        rho = loss_function(f)
        cost = 0.5 * np.sum(rho[0])
        J, f = scale_for_robust_loss_function(J, f, rho)
    else:
        cost = 0.5 * np.dot(f, f)

    g = compute_grad(J, f)

    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
    if jac_scale:
        scale, scale_inv = compute_jac_scale(J)
    else:
        scale, scale_inv = x_scale, 1 / x_scale

    Delta = norm(x0 * scale_inv)
    if Delta == 0:
        Delta = 1.0

    if tr_solver == 'lsmr':
        reg_term = 0
        damp = tr_options.pop('damp', 0.0)
        regularize = tr_options.pop('regularize', True)

    if max_nfev is None:
        max_nfev = x0.size * 100

    alpha = 0.0  # "Levenberg-Marquardt" parameter

    termination_status = None
    iteration = 0
    step_norm = None
    actual_reduction = None

    if verbose == 2:
        print_header_nonlinear()

    while True:
        g_norm = norm(g, ord=np.inf)
        if g_norm < gtol:
            termination_status = 1

        if verbose == 2:
            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
                                      step_norm, g_norm)

        if termination_status is not None or nfev == max_nfev:
            break

        d = scale
        g_h = d * g

        if tr_solver == 'exact':
            J_h = J * d
            U, s, V = svd(J_h, full_matrices=False)
            V = V.T
            uf = U.T.dot(f)
        elif tr_solver == 'lsmr':
            J_h = right_multiplied_operator(J, d)

            if regularize:
                a, b = build_quadratic_1d(J_h, g_h, -g_h)
                to_tr = Delta / norm(g_h)
                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
                reg_term = -ag_value / Delta**2

            damp_full = (damp**2 + reg_term)**0.5
            gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0]
            S = np.vstack((g_h, gn_h)).T
            S, _ = qr(S, mode='economic')
            JS = J_h.dot(S)
            B_S = np.dot(JS.T, JS)
            g_S = S.T.dot(g_h)

        actual_reduction = -1
        while actual_reduction <= 0 and nfev < max_nfev:
            if tr_solver == 'exact':
                step_h, alpha, n_iter = solve_lsq_trust_region(
                    n, m, uf, s, V, Delta, initial_alpha=alpha)
            elif tr_solver == 'lsmr':
                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
                step_h = S.dot(p_S)

            predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h)
            step = d * step_h
            x_new = x + step
            f_new = fun(x_new)
            nfev += 1

            step_h_norm = norm(step_h)

            if not np.all(np.isfinite(f_new)):
                Delta = 0.25 * step_h_norm
                continue

            # Usual trust-region step quality estimation.
            if loss_function is not None:
                cost_new = loss_function(f_new, cost_only=True)
            else:
                cost_new = 0.5 * np.dot(f_new, f_new)
            actual_reduction = cost - cost_new

            Delta_new, ratio = update_tr_radius(
                Delta, actual_reduction, predicted_reduction,
                step_h_norm, step_h_norm > 0.95 * Delta)

            step_norm = norm(step)
            termination_status = check_termination(
                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
            if termination_status is not None:
                break

            alpha *= Delta / Delta_new
            Delta = Delta_new

        if actual_reduction > 0:
            x = x_new

            f = f_new
            f_true = f.copy()

            cost = cost_new

            J = jac(x, f)
            njev += 1

            if loss_function is not None:
                rho = loss_function(f)
                J, f = scale_for_robust_loss_function(J, f, rho)

            g = compute_grad(J, f)

            if jac_scale:
                scale, scale_inv = compute_jac_scale(J, scale_inv)
        else:
            step_norm = 0
            actual_reduction = 0

        iteration += 1

    if termination_status is None:
        termination_status = 0

    active_mask = np.zeros_like(x)
    return OptimizeResult(
        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
        active_mask=active_mask, nfev=nfev, njev=njev,
        status=termination_status)