1"""Trust Region Reflective algorithm for least-squares optimization. 

2 

3The algorithm is based on ideas from paper [STIR]_. The main idea is to 

4account for the presence of the bounds by appropriate scaling of the variables (or, 

5equivalently, changing a trust-region shape). Let's introduce a vector v: 

6 

7 | ub[i] - x[i], if g[i] < 0 and ub[i] < np.inf 

8 v[i] = | x[i] - lb[i], if g[i] > 0 and lb[i] > -np.inf 

9 | 1, otherwise 

10 

11where g is the gradient of a cost function and lb, ub are the bounds. Its 

12components are distances to the bounds at which the anti-gradient points (if 

13this distance is finite). Define a scaling matrix D = diag(v**0.5). 

14First-order optimality conditions can be stated as 

15 

16 D^2 g(x) = 0. 

17 

18Meaning that components of the gradient should be zero for strictly interior 

19variables, and components must point inside the feasible region for variables 

20on the bound. 

Now consider this system of equations as a new optimization problem. If the
point x is strictly interior (not on a bound), then the left-hand side is
differentiable and the Newton step for it satisfies

    (D^2 H + diag(g) Jv) p = -D^2 g

where H is the Hessian matrix (or its J^T J approximation in least squares),
Jv is the Jacobian matrix of v with components -1, 1 or 0, such that all
elements of the matrix C = diag(g) Jv are non-negative. Introduce the change
of variables x = D x_h (_h would be "hat" in LaTeX). In the new variables,
we have a Newton step satisfying

    B_h p_h = -g_h,

where B_h = D H D + C, g_h = D g. In least squares B_h = J_h^T J_h, where
J_h = J D. Note that J_h and g_h are the proper Jacobian and gradient with
respect to the "hat" variables. To guarantee global convergence we formulate
a trust-region problem based on the Newton step in the new variables:

    0.5 * p_h^T B_h p_h + g_h^T p_h -> min, ||p_h|| <= Delta

In the original space B = H + D^{-1} C D^{-1}, and the equivalent trust-region
problem is

    0.5 * p^T B p + g^T p -> min, ||D^{-1} p|| <= Delta

Here, the meaning of the matrix D becomes clearer: it alters the shape of the
trust region so that large steps towards the bounds are not allowed. In the
implementation, the trust-region problem is solved in "hat" space, but
handling of the bounds is done in the original space (see below and read
the code).

The introduction of the matrix D does not remove the need to handle the
bounds explicitly: the algorithm must keep iterates strictly feasible (to
satisfy the aforementioned differentiability), and the parameter theta
controls how far a step backs off from the boundary (see the code for
details).

The algorithm does another important trick. If the trust-region solution
doesn't fit into the bounds, then a search direction reflected from the first
encountered bound is considered. For motivation and analysis refer to the
[STIR]_ paper (and other papers by the authors). In practice, it does not
need much justification: the algorithm simply chooses the best step among a
constrained trust-region step, a reflected step and a constrained Cauchy step
(a minimizer along -g_h in "hat" space, or -D^2 g in the original space).

Another feature is that the trust-region radius control strategy is modified
to account for the appearance of the diagonal matrix C (called diag_h in the
code).

Note that all the described peculiarities vanish for problems without bounds
(the algorithm becomes a standard trust-region type algorithm, very similar
to the ones implemented in MINPACK).

The implementation supports two methods of solving the trust-region problem.
The first, called 'exact', applies an SVD to the Jacobian and then solves the
problem very accurately using the algorithm described in [JJMore]_. It is not
applicable to large problems. The second, called 'lsmr', uses the 2-D subspace
approach (sometimes called "indefinite dogleg"), where the problem is solved
in a subspace spanned by the gradient and the approximate Gauss-Newton step
found by ``scipy.sparse.linalg.lsmr``. The 2-D trust-region problem is
reformulated as a 4th-order algebraic equation and solved very accurately by
``numpy.roots``. The subspace approach allows solving very large problems
(up to a couple of million residuals on a regular PC), provided the Jacobian
matrix is sufficiently sparse.

References
----------
.. [STIR] Branch, M.A., T.F. Coleman, and Y. Li, "A Subspace, Interior,
          and Conjugate Gradient Method for Large-Scale Bound-Constrained
          Minimization Problems," SIAM Journal on Scientific Computing,
          Vol. 21, Number 1, pp 1-23, 1999.
.. [JJMore] More, J. J., "The Levenberg-Marquardt Algorithm: Implementation
            and Theory," Numerical Analysis, ed. G. A. Watson, Lecture
            Notes in Mathematics 630, Springer Verlag, pp. 105-116, 1977.
"""

import numpy as np
from numpy.linalg import norm
from scipy.linalg import svd, qr
from scipy.sparse.linalg import lsmr
from scipy.optimize import OptimizeResult

from .common import (
    step_size_to_bound, find_active_constraints, in_bounds,
    make_strictly_feasible, intersect_trust_region, solve_lsq_trust_region,
    solve_trust_region_2d, minimize_quadratic_1d, build_quadratic_1d,
    evaluate_quadratic, right_multiplied_operator, regularized_lsq_operator,
    CL_scaling_vector, compute_grad, compute_jac_scale, check_termination,
    update_tr_radius, scale_for_robust_loss_function, print_header_nonlinear,
    print_iteration_nonlinear)
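# The following helper is an illustrative addition, not part of the original
# module: a minimal sketch of the Coleman-Li scaling vector v defined in the
# docstring above, written directly from its piecewise definition. The solver
# itself uses ``CL_scaling_vector`` from ``.common``, which additionally
# returns dv, the derivative of v (with components -1, 1 or 0, as described
# in the docstring); this simplified sketch omits dv.
def _scaling_vector_sketch(x, g, lb, ub):
    """Illustration only: v and d = diag(D) = v**0.5 from the docstring."""
    v = np.ones_like(x)
    # v[i] = ub[i] - x[i] when the anti-gradient points towards a finite ub.
    mask = (g < 0) & np.isfinite(ub)
    v[mask] = ub[mask] - x[mask]
    # v[i] = x[i] - lb[i] when the anti-gradient points towards a finite lb.
    mask = (g > 0) & np.isfinite(lb)
    v[mask] = x[mask] - lb[mask]
    return v, v**0.5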

def trf(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
        loss_function, tr_solver, tr_options, verbose):
    # For efficiency, it makes sense to run the simplified version of the
    # algorithm when no bounds are imposed. We decided to write two separate
    # functions: this violates the DRY principle, but keeps each function as
    # readable as possible.
    if np.all(lb == -np.inf) and np.all(ub == np.inf):
        return trf_no_bounds(
            fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev, x_scale,
            loss_function, tr_solver, tr_options, verbose)
    else:
        return trf_bounds(
            fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev, x_scale,
            loss_function, tr_solver, tr_options, verbose)
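# Illustrative addition, not part of the original module: a hypothetical
# driver showing how ``trf`` might be called directly on a toy problem.
# Normally these arguments are validated and prepared by
# ``scipy.optimize.least_squares``; the parameter values below are plausible
# choices for this sketch, not canonical defaults.
def _example_trf_call():
    # Rosenbrock function written as a two-residual least-squares problem.
    def fun(x):
        return np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])

    def jac(x, f):
        # trf calls the Jacobian as jac(x, f); the residual f is unused here.
        return np.array([[-20.0 * x[0], 10.0],
                         [-1.0, 0.0]])

    x0 = np.array([-1.5, 1.0])
    f0 = fun(x0)
    J0 = jac(x0, f0)
    # Finite box bounds so that the bounded branch (trf_bounds) is exercised.
    lb = np.array([-2.0, -2.0])
    ub = np.array([2.0, 2.0])
    return trf(fun, jac, x0, f0, J0, lb, ub,
               ftol=1e-8, xtol=1e-8, gtol=1e-8, max_nfev=None,
               x_scale=np.ones_like(x0), loss_function=None,
               tr_solver='exact', tr_options={}, verbose=0)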

def select_step(x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta):
    """Select the best step according to the Trust Region Reflective algorithm."""
    if in_bounds(x + p, lb, ub):
        p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)
        return p, p_h, -p_value

    p_stride, hits = step_size_to_bound(x, p, lb, ub)

    # Compute the reflected direction.
    r_h = np.copy(p_h)
    r_h[hits.astype(bool)] *= -1
    r = d * r_h

    # Restrict the trust-region step so that it hits the bound.
    p *= p_stride
    p_h *= p_stride
    x_on_bound = x + p

    # The reflected direction will first cross either the feasible-region
    # boundary or the trust-region boundary.
    _, to_tr = intersect_trust_region(p_h, r_h, Delta)
    to_bound, _ = step_size_to_bound(x_on_bound, r, lb, ub)

    # Find lower and upper bounds on a step size along the reflected
    # direction, considering the strict feasibility requirement. There is no
    # single correct way to do that; the chosen approach seems to work best
    # on test problems.
    r_stride = min(to_bound, to_tr)
    if r_stride > 0:
        r_stride_l = (1 - theta) * p_stride / r_stride
        if r_stride == to_bound:
            r_stride_u = theta * to_bound
        else:
            r_stride_u = to_tr
    else:
        r_stride_l = 0
        r_stride_u = -1

    # Check if the reflection step is available.
    if r_stride_l <= r_stride_u:
        a, b, c = build_quadratic_1d(J_h, g_h, r_h, s0=p_h, diag=diag_h)
        r_stride, r_value = minimize_quadratic_1d(
            a, b, r_stride_l, r_stride_u, c=c)
        r_h *= r_stride
        r_h += p_h
        r = r_h * d
    else:
        r_value = np.inf

    # Now correct p_h to make it strictly interior.
    p *= theta
    p_h *= theta
    p_value = evaluate_quadratic(J_h, g_h, p_h, diag=diag_h)

    ag_h = -g_h
    ag = d * ag_h

    to_tr = Delta / norm(ag_h)
    to_bound, _ = step_size_to_bound(x, ag, lb, ub)
    if to_bound < to_tr:
        ag_stride = theta * to_bound
    else:
        ag_stride = to_tr

    a, b = build_quadratic_1d(J_h, g_h, ag_h, diag=diag_h)
    ag_stride, ag_value = minimize_quadratic_1d(a, b, 0, ag_stride)
    ag_h *= ag_stride
    ag *= ag_stride

    if p_value < r_value and p_value < ag_value:
        return p, p_h, -p_value
    elif r_value < p_value and r_value < ag_value:
        return r, r_h, -r_value
    else:
        return ag, ag_h, -ag_value
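# Illustrative addition, not part of the original module: a minimal sketch of
# the "hat"-space quantities from the docstring for a dense Jacobian, assuming
# d and diag_h have already been formed as in trf_bounds below
# (d = v**0.5 * scale, diag_h = g * dv * scale). With the least-squares
# approximation H ~ J^T J, the docstring's B_h = D H D + C becomes
# J_h^T J_h + diag(diag_h).
def _hat_space_sketch(J, g, d, diag_h):
    """Illustration only: J_h, g_h and B_h for a dense J."""
    J_h = J * d                             # J_h = J D (column scaling by d)
    g_h = d * g                             # g_h = D g
    B_h = J_h.T.dot(J_h) + np.diag(diag_h)  # B_h = J_h^T J_h + C
    return J_h, g_h, B_h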

def trf_bounds(fun, jac, x0, f0, J0, lb, ub, ftol, xtol, gtol, max_nfev,
               x_scale, loss_function, tr_solver, tr_options, verbose):
    x = x0.copy()

    f = f0
    f_true = f.copy()
    nfev = 1

    J = J0
    njev = 1
    m, n = J.shape

    if loss_function is not None:
        rho = loss_function(f)
        cost = 0.5 * np.sum(rho[0])
        J, f = scale_for_robust_loss_function(J, f, rho)
    else:
        cost = 0.5 * np.dot(f, f)

    g = compute_grad(J, f)

    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
    if jac_scale:
        scale, scale_inv = compute_jac_scale(J)
    else:
        scale, scale_inv = x_scale, 1 / x_scale

    v, dv = CL_scaling_vector(x, g, lb, ub)
    v[dv != 0] *= scale_inv[dv != 0]
    Delta = norm(x0 * scale_inv / v**0.5)
    if Delta == 0:
        Delta = 1.0

    g_norm = norm(g * v, ord=np.inf)

    f_augmented = np.zeros((m + n))
    if tr_solver == 'exact':
        J_augmented = np.empty((m + n, n))
    elif tr_solver == 'lsmr':
        reg_term = 0.0
        regularize = tr_options.pop('regularize', True)

    if max_nfev is None:
        max_nfev = x0.size * 100

    alpha = 0.0  # "Levenberg-Marquardt" parameter

    termination_status = None
    iteration = 0
    step_norm = None
    actual_reduction = None

    if verbose == 2:
        print_header_nonlinear()

    while True:
        v, dv = CL_scaling_vector(x, g, lb, ub)

        g_norm = norm(g * v, ord=np.inf)
        if g_norm < gtol:
            termination_status = 1

        if verbose == 2:
            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
                                      step_norm, g_norm)

        if termination_status is not None or nfev == max_nfev:
            break

        # Now compute variables in "hat" space. Here, we also account for
        # scaling introduced by the `x_scale` parameter. This part is a bit
        # tricky: you have to write down the formulas and see how the
        # trust-region problem is formulated when the two types of scaling
        # are applied. The idea is that we first apply `x_scale` and then
        # apply the Coleman-Li approach in the new variables.

        # v is recomputed in the variables after applying `x_scale`; note that
        # components which were identically 1 are not affected.
        v[dv != 0] *= scale_inv[dv != 0]

        # Here, we apply two types of scaling.
        d = v**0.5 * scale

        # C = diag(g * scale) Jv
        diag_h = g * dv * scale

        # After all this has been done, we continue normally.

        # "hat" gradient.
        g_h = d * g

        f_augmented[:m] = f
        if tr_solver == 'exact':
            J_augmented[:m] = J * d
            J_h = J_augmented[:m]  # Memory view.
            J_augmented[m:] = np.diag(diag_h**0.5)
            U, s, V = svd(J_augmented, full_matrices=False)
            V = V.T
            uf = U.T.dot(f_augmented)
        elif tr_solver == 'lsmr':
            J_h = right_multiplied_operator(J, d)

            if regularize:
                a, b = build_quadratic_1d(J_h, g_h, -g_h, diag=diag_h)
                to_tr = Delta / norm(g_h)
                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
                reg_term = -ag_value / Delta**2

            lsmr_op = regularized_lsq_operator(J_h, (diag_h + reg_term)**0.5)
            gn_h = lsmr(lsmr_op, f_augmented, **tr_options)[0]
            S = np.vstack((g_h, gn_h)).T
            S, _ = qr(S, mode='economic')
            JS = J_h.dot(S)  # LinearOperator does dot too.
            B_S = np.dot(JS.T, JS) + np.dot(S.T * diag_h, S)
            g_S = S.T.dot(g_h)

        # theta controls the step-back ratio from the bounds.
        theta = max(0.995, 1 - g_norm)

        actual_reduction = -1
        while actual_reduction <= 0 and nfev < max_nfev:
            if tr_solver == 'exact':
                p_h, alpha, n_iter = solve_lsq_trust_region(
                    n, m, uf, s, V, Delta, initial_alpha=alpha)
            elif tr_solver == 'lsmr':
                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
                p_h = S.dot(p_S)

            p = d * p_h  # Trust-region solution in the original space.
            step, step_h, predicted_reduction = select_step(
                x, J_h, diag_h, g_h, p, p_h, d, Delta, lb, ub, theta)

            x_new = make_strictly_feasible(x + step, lb, ub, rstep=0)
            f_new = fun(x_new)
            nfev += 1

            step_h_norm = norm(step_h)

            if not np.all(np.isfinite(f_new)):
                Delta = 0.25 * step_h_norm
                continue

            # Usual trust-region step quality estimation.
            if loss_function is not None:
                cost_new = loss_function(f_new, cost_only=True)
            else:
                cost_new = 0.5 * np.dot(f_new, f_new)
            actual_reduction = cost - cost_new
            Delta_new, ratio = update_tr_radius(
                Delta, actual_reduction, predicted_reduction,
                step_h_norm, step_h_norm > 0.95 * Delta)

            step_norm = norm(step)
            termination_status = check_termination(
                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
            if termination_status is not None:
                break

            alpha *= Delta / Delta_new
            Delta = Delta_new

        if actual_reduction > 0:
            x = x_new

            f = f_new
            f_true = f.copy()

            cost = cost_new

            J = jac(x, f)
            njev += 1

            if loss_function is not None:
                rho = loss_function(f)
                J, f = scale_for_robust_loss_function(J, f, rho)

            g = compute_grad(J, f)

            if jac_scale:
                scale, scale_inv = compute_jac_scale(J, scale_inv)
        else:
            step_norm = 0
            actual_reduction = 0

        iteration += 1

    if termination_status is None:
        termination_status = 0

    active_mask = find_active_constraints(x, lb, ub, rtol=xtol)
    return OptimizeResult(
        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
        active_mask=active_mask, nfev=nfev, njev=njev,
        status=termination_status)

def trf_no_bounds(fun, jac, x0, f0, J0, ftol, xtol, gtol, max_nfev,
                  x_scale, loss_function, tr_solver, tr_options, verbose):
    x = x0.copy()

    f = f0
    f_true = f.copy()
    nfev = 1

    J = J0
    njev = 1
    m, n = J.shape

    if loss_function is not None:
        rho = loss_function(f)
        cost = 0.5 * np.sum(rho[0])
        J, f = scale_for_robust_loss_function(J, f, rho)
    else:
        cost = 0.5 * np.dot(f, f)

    g = compute_grad(J, f)

    jac_scale = isinstance(x_scale, str) and x_scale == 'jac'
    if jac_scale:
        scale, scale_inv = compute_jac_scale(J)
    else:
        scale, scale_inv = x_scale, 1 / x_scale

    Delta = norm(x0 * scale_inv)
    if Delta == 0:
        Delta = 1.0

    if tr_solver == 'lsmr':
        reg_term = 0
        damp = tr_options.pop('damp', 0.0)
        regularize = tr_options.pop('regularize', True)

    if max_nfev is None:
        max_nfev = x0.size * 100

    alpha = 0.0  # "Levenberg-Marquardt" parameter

    termination_status = None
    iteration = 0
    step_norm = None
    actual_reduction = None

    if verbose == 2:
        print_header_nonlinear()

    while True:
        g_norm = norm(g, ord=np.inf)
        if g_norm < gtol:
            termination_status = 1

        if verbose == 2:
            print_iteration_nonlinear(iteration, nfev, cost, actual_reduction,
                                      step_norm, g_norm)

        if termination_status is not None or nfev == max_nfev:
            break

        d = scale
        g_h = d * g

        if tr_solver == 'exact':
            J_h = J * d
            U, s, V = svd(J_h, full_matrices=False)
            V = V.T
            uf = U.T.dot(f)
        elif tr_solver == 'lsmr':
            J_h = right_multiplied_operator(J, d)

            if regularize:
                a, b = build_quadratic_1d(J_h, g_h, -g_h)
                to_tr = Delta / norm(g_h)
                ag_value = minimize_quadratic_1d(a, b, 0, to_tr)[1]
                reg_term = -ag_value / Delta**2

            damp_full = (damp**2 + reg_term)**0.5
            gn_h = lsmr(J_h, f, damp=damp_full, **tr_options)[0]
            S = np.vstack((g_h, gn_h)).T
            S, _ = qr(S, mode='economic')
            JS = J_h.dot(S)
            B_S = np.dot(JS.T, JS)
            g_S = S.T.dot(g_h)

        actual_reduction = -1
        while actual_reduction <= 0 and nfev < max_nfev:
            if tr_solver == 'exact':
                step_h, alpha, n_iter = solve_lsq_trust_region(
                    n, m, uf, s, V, Delta, initial_alpha=alpha)
            elif tr_solver == 'lsmr':
                p_S, _ = solve_trust_region_2d(B_S, g_S, Delta)
                step_h = S.dot(p_S)

            predicted_reduction = -evaluate_quadratic(J_h, g_h, step_h)
            step = d * step_h
            x_new = x + step
            f_new = fun(x_new)
            nfev += 1

            step_h_norm = norm(step_h)

            if not np.all(np.isfinite(f_new)):
                Delta = 0.25 * step_h_norm
                continue

            # Usual trust-region step quality estimation.
            if loss_function is not None:
                cost_new = loss_function(f_new, cost_only=True)
            else:
                cost_new = 0.5 * np.dot(f_new, f_new)
            actual_reduction = cost - cost_new

            Delta_new, ratio = update_tr_radius(
                Delta, actual_reduction, predicted_reduction,
                step_h_norm, step_h_norm > 0.95 * Delta)

            step_norm = norm(step)
            termination_status = check_termination(
                actual_reduction, cost, step_norm, norm(x), ratio, ftol, xtol)
            if termination_status is not None:
                break

            alpha *= Delta / Delta_new
            Delta = Delta_new

        if actual_reduction > 0:
            x = x_new

            f = f_new
            f_true = f.copy()

            cost = cost_new

            J = jac(x, f)
            njev += 1

            if loss_function is not None:
                rho = loss_function(f)
                J, f = scale_for_robust_loss_function(J, f, rho)

            g = compute_grad(J, f)

            if jac_scale:
                scale, scale_inv = compute_jac_scale(J, scale_inv)
        else:
            step_norm = 0
            actual_reduction = 0

        iteration += 1

    if termination_status is None:
        termination_status = 0

    active_mask = np.zeros_like(x)
    return OptimizeResult(
        x=x, cost=cost, fun=f_true, jac=J, grad=g, optimality=g_norm,
        active_mask=active_mask, nfev=nfev, njev=njev,
        status=termination_status)