Coverage for /pythoncovmergedfiles/medio/medio/usr/local/lib/python3.11/site-packages/numpy/lib/_format_impl.py: 17%


"""
Binary serialization

NPY format
==========

A simple format for saving numpy arrays to disk with the full
information about them.

The ``.npy`` format is the standard binary file format in NumPy for
persisting a *single* arbitrary NumPy array on disk. The format stores all
of the shape and dtype information necessary to reconstruct the array
correctly even on another machine with a different architecture.
The format is designed to be as simple as possible while achieving
its limited goals.

The ``.npz`` format is the standard format for persisting *multiple* NumPy
arrays on disk. A ``.npz`` file is a zip file containing multiple ``.npy``
files, one for each array.

Capabilities
------------

- Can represent all NumPy arrays including nested record arrays and
  object arrays.

- Represents the data in its native binary form.

- Supports Fortran-contiguous arrays directly.

- Stores all of the necessary information to reconstruct the array
  including shape and dtype on a machine of a different
  architecture. Both little-endian and big-endian arrays are
  supported, and a file with little-endian numbers will yield
  a little-endian array on any machine reading the file. The
  types are described in terms of their actual sizes. For example,
  if a machine with a 64-bit C "long int" writes out an array with
  "long ints", a reading machine with 32-bit C "long ints" will yield
  an array with 64-bit integers.

- Is straightforward to reverse engineer. Datasets often live longer than
  the programs that created them. A competent developer should be
  able to create a solution in their preferred programming language to
  read most ``.npy`` files that they have been given without much
  documentation.

- Allows memory-mapping of the data. See `open_memmap`.

- Can be read from a filelike stream object instead of an actual file.

- Stores object arrays, i.e. arrays containing elements that are arbitrary
  Python objects. Files with object arrays cannot be memory-mapped, but
  can be read and written to disk.

Limitations
-----------

- Arbitrary subclasses of numpy.ndarray are not completely preserved.
  Subclasses will be accepted for writing, but only the array data will
  be written out. A regular numpy.ndarray object will be created
  upon reading the file.

.. warning::

    Due to limitations in the interpretation of structured dtypes, dtypes
    with fields with empty names will have the names replaced by 'f0', 'f1',
    etc. Such arrays will not round-trip through the format entirely
    accurately. The data is intact; only the field names will differ. We are
    working on a fix for this. This fix will not require a change in the
    file format. The arrays with such structures can still be saved and
    restored, and the correct dtype may be restored by using the
    ``loadedarray.view(correct_dtype)`` method.

File extensions
---------------

We recommend using the ``.npy`` and ``.npz`` extensions for files saved
in this format. This is by no means a requirement; applications may wish
to use these file formats but use an extension specific to the
application. In the absence of an obvious alternative, however,
we suggest using ``.npy`` and ``.npz``.

Version numbering
-----------------

The version numbering of these formats is independent of NumPy version
numbering. If the format is upgraded, the code in `numpy.lib.format` will
still be able to read and write Version 1.0 files.

Format Version 1.0
------------------

The first 6 bytes are a magic string: exactly ``\\x93NUMPY``.

The next 1 byte is an unsigned byte: the major version number of the file
format, e.g. ``\\x01``.

The next 1 byte is an unsigned byte: the minor version number of the file
format, e.g. ``\\x00``. Note: the version of the file format is not tied
to the version of the numpy package.

The next 2 bytes form a little-endian unsigned short int: the length of
the header data HEADER_LEN.

The next HEADER_LEN bytes form the header data describing the array's
format. It is an ASCII string which contains a Python literal expression
of a dictionary. It is terminated by a newline (``\\n``) and padded with
spaces (``\\x20``) to make the total of
``len(magic string) + 2 + len(length) + HEADER_LEN`` be evenly divisible
by 64 for alignment purposes.

The dictionary contains three keys:

    "descr" : dtype.descr
        An object that can be passed as an argument to the `numpy.dtype`
        constructor to create the array's dtype.
    "fortran_order" : bool
        Whether the array data is Fortran-contiguous or not. Since
        Fortran-contiguous arrays are a common form of non-C-contiguity,
        we allow them to be written directly to disk for efficiency.
    "shape" : tuple of int
        The shape of the array.

For repeatability and readability, the dictionary keys are sorted in
alphabetic order. This is for convenience only. A writer SHOULD implement
this if possible. A reader MUST NOT depend on this.

Following the header comes the array data. If the dtype contains Python
objects (i.e. ``dtype.hasobject is True``), then the data is a Python
pickle of the array. Otherwise the data is the contiguous (either C-
or Fortran-, depending on ``fortran_order``) bytes of the array.
Consumers can figure out the number of bytes by multiplying the number
of elements given by the shape (noting that ``shape=()`` means there is
1 element) by ``dtype.itemsize``.
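
As a rough illustration of the version 1.0 layout described above, the following sketch builds a tiny file image by hand and parses it back using only the standard library. The helper names ``build_npy_v1`` and ``parse_npy_v1`` and the small item-size table are assumptions made for this example; they are not part of NumPy, and the sketch handles only simple, non-object dtypes.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"

# Hypothetical lookup of item sizes for a few simple descr strings.
ITEMSIZE = {"<f8": 8, "<i4": 4, "|u1": 1}


def build_npy_v1(descr, shape, payload):
    # Build a minimal version 1.0 image; pad so the data starts on a 64-byte boundary.
    text = "{'descr': %r, 'fortran_order': False, 'shape': %r, }" % (descr, shape)
    unpadded = len(MAGIC) + 2 + 2 + len(text) + 1  # +1 for the newline
    pad = (-unpadded) % 64
    header = (text + " " * pad + "\n").encode("latin1")
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header + payload


def parse_npy_v1(blob):
    # Parse a version 1.0 image into (header dict, raw data bytes).
    if blob[:6] != MAGIC:
        raise ValueError("not an .npy file")
    major, minor = blob[6], blob[7]
    if (major, minor) != (1, 0):
        raise ValueError("this sketch only handles version 1.0")
    (header_len,) = struct.unpack("<H", blob[8:10])
    header = ast.literal_eval(blob[10:10 + header_len].decode("latin1"))
    # Number of elements is the product of the shape; shape=() means 1 element.
    count = 1
    for dim in header["shape"]:
        count *= dim
    nbytes = count * ITEMSIZE[header["descr"]]
    data = blob[10 + header_len:10 + header_len + nbytes]
    return header, data


payload = struct.pack("<3i", 10, 20, 30)
blob = build_npy_v1("<i4", (3,), payload)
header, data = parse_npy_v1(blob)
```

Here ``header`` is the parsed dictionary and ``data`` holds the twelve raw bytes of the three little-endian 32-bit integers.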

Format Version 2.0
------------------

The version 1.0 format only allowed the array header to have a total size of
65535 bytes. This can be exceeded by structured arrays with a large number of
columns. The version 2.0 format extends the header size to 4 GiB.
`numpy.save` will automatically save in 2.0 format if the data requires it,
else it will always use the more compatible 1.0 format.

The description of the fourth element of the header therefore has become:
"The next 4 bytes form a little-endian unsigned int: the length of the header
data HEADER_LEN."

Format Version 3.0
------------------

This version replaces the ASCII string (which in practice was latin1) with
a utf8-encoded string, so it supports structured types with any unicode field
names.

Notes
-----
The ``.npy`` format, including motivation for creating it and a comparison of
alternatives, is described in the
:doc:`"npy-format" NEP <neps:nep-0001-npy-format>`; however, details have
evolved with time and this document is more current.

"""

import io
import os
import pickle
import warnings

import numpy
from numpy._utils import set_module
from numpy.lib._utils_impl import drop_metadata

__all__ = []

drop_metadata.__module__ = "numpy.lib.format"

EXPECTED_KEYS = {'descr', 'fortran_order', 'shape'}
MAGIC_PREFIX = b'\x93NUMPY'
MAGIC_LEN = len(MAGIC_PREFIX) + 2
ARRAY_ALIGN = 64  # plausible values are powers of 2 between 16 and 4096
BUFFER_SIZE = 2**18  # size of buffer for reading npz files in bytes
# allow growth within the address space of a 64 bit machine along one axis
GROWTH_AXIS_MAX_DIGITS = 21  # = len(str(8*2**64-1)) hypothetical int1 dtype

# difference between version 1.0 and 2.0 is a 4 byte (I) header length
# instead of 2 bytes (H) allowing storage of large structured arrays
_header_size_info = {
    (1, 0): ('<H', 'latin1'),
    (2, 0): ('<I', 'latin1'),
    (3, 0): ('<I', 'utf8'),
}

# Python's literal_eval is not actually safe for large inputs, since parsing
# may become slow or even cause interpreter crashes.
# This is an arbitrary, low limit which should make it safe in practice.
_MAX_HEADER_SIZE = 10000


def _check_version(version):
    if version not in [(1, 0), (2, 0), (3, 0), None]:
        msg = "we only support format version (1,0), (2,0), and (3,0), not %s"
        raise ValueError(msg % (version,))


@set_module("numpy.lib.format")
def magic(major, minor):
    """ Return the magic string for the given file format version.

    Parameters
    ----------
    major : int in [0, 255]
    minor : int in [0, 255]

    Returns
    -------
    magic : bytes

    Raises
    ------
    ValueError if the version cannot be formatted.
    """
    if major < 0 or major > 255:
        raise ValueError("major version must be 0 <= major < 256")
    if minor < 0 or minor > 255:
        raise ValueError("minor version must be 0 <= minor < 256")
    return MAGIC_PREFIX + bytes([major, minor])


@set_module("numpy.lib.format")
def read_magic(fp):
    """ Read the magic string to get the version of the file format.

    Parameters
    ----------
    fp : filelike object

    Returns
    -------
    major : int
    minor : int
    """
    magic_str = _read_bytes(fp, MAGIC_LEN, "magic string")
    if magic_str[:-2] != MAGIC_PREFIX:
        msg = "the magic string is not correct; expected %r, got %r"
        raise ValueError(msg % (MAGIC_PREFIX, magic_str[:-2]))
    major, minor = magic_str[-2:]
    return major, minor
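
The pair of functions above can be approximated with the standard library alone. The sketch below is a minimal stand-in under that assumption: `make_magic` and `parse_magic` are hypothetical names, and a plain `fp.read()` replaces the module's `_read_bytes` helper.

```python
import io

MAGIC_PREFIX = b"\x93NUMPY"


def make_magic(major, minor):
    # Eight bytes: the six-byte prefix followed by one byte per version part.
    if not (0 <= major < 256 and 0 <= minor < 256):
        raise ValueError("version parts must fit in one unsigned byte")
    return MAGIC_PREFIX + bytes([major, minor])


def parse_magic(fp):
    # Read eight bytes from a file-like object and split off the version.
    expected = len(MAGIC_PREFIX) + 2
    raw = fp.read(expected)
    if len(raw) != expected or raw[:-2] != MAGIC_PREFIX:
        raise ValueError("not an .npy magic string: %r" % (raw,))
    major, minor = raw[-2:]  # unpacking a bytes object yields ints
    return major, minor


version = parse_magic(io.BytesIO(make_magic(3, 0)))
```

After the call, the stream is positioned just past the eight magic bytes, which is where the header length field begins.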


@set_module("numpy.lib.format")
def dtype_to_descr(dtype):
    """
    Get a serializable descriptor from the dtype.

    The .descr attribute of a dtype object cannot be round-tripped through
    the dtype() constructor. Simple types, like dtype('float32'), have
    a descr which looks like a record array with one field with '' as
    a name. The dtype() constructor interprets this as a request to give
    a default name. Instead, we construct a descriptor that can be passed to
    dtype().

    Parameters
    ----------
    dtype : dtype
        The dtype of the array that will be written to disk.

    Returns
    -------
    descr : object
        An object that can be passed to `numpy.dtype()` in order to
        replicate the input dtype.

    """
    # NOTE: drop_metadata may not return the right dtype, e.g. for user
    # dtypes. In that case our code below would fail in the same way, though.
    new_dtype = drop_metadata(dtype)
    if new_dtype is not dtype:
        warnings.warn("metadata on a dtype is not saved to an npy/npz. "
                      "Use another format (such as pickle) to store it.",
                      UserWarning, stacklevel=2)
    dtype = new_dtype

    if dtype.names is not None:
        # This is a record array. The .descr is fine.  XXX: parts of the
        # record array with an empty name, like padding bytes, still get
        # fiddled with. This needs to be fixed in the C implementation of
        # dtype().
        return dtype.descr
    elif not type(dtype)._legacy:
        # this must be a user-defined dtype since numpy does not yet expose any
        # non-legacy dtypes in the public API
        #
        # non-legacy dtypes don't yet have __array_interface__
        # support. Instead, as a hack, we use pickle to save the array, and lie
        # that the dtype is object. When the array is loaded, the descriptor is
        # unpickled with the array and the object dtype in the header is
        # discarded.
        #
        # a future NEP should define a way to serialize user-defined
        # descriptors and ideally work out the possible security implications
        warnings.warn("Custom dtypes are saved as python objects using the "
                      "pickle protocol. Loading this file requires "
                      "allow_pickle=True to be set.",
                      UserWarning, stacklevel=2)
        return "|O"
    else:
        return dtype.str


@set_module("numpy.lib.format")
def descr_to_dtype(descr):
    """
    Returns a dtype based off the given description.

    This is essentially the reverse of `~lib.format.dtype_to_descr`. Simple
    descriptors, such as the string for dtype('float32'), are converted
    directly. For structured descriptors, the valueless padding fields are
    removed and the description is then converted to its corresponding
    dtype.

    Parameters
    ----------
    descr : object
        The object retrieved by dtype.descr. Can be passed to
        `numpy.dtype` in order to replicate the input dtype.

    Returns
    -------
    dtype : dtype
        The dtype constructed by the description.

    """
    if isinstance(descr, str):
        # No padding removal needed
        return numpy.dtype(descr)
    elif isinstance(descr, tuple):
        # subtype, will always have a shape descr[1]
        dt = descr_to_dtype(descr[0])
        return numpy.dtype((dt, descr[1]))

    titles = []
    names = []
    formats = []
    offsets = []
    offset = 0
    for field in descr:
        if len(field) == 2:
            name, descr_str = field
            dt = descr_to_dtype(descr_str)
        else:
            name, descr_str, shape = field
            dt = numpy.dtype((descr_to_dtype(descr_str), shape))

        # Ignore padding bytes, which will be void bytes with '' as name
        # (Once support for blank names is removed, only "if name == ''"
        # is needed.)
        is_pad = (name == '' and dt.type is numpy.void and dt.names is None)
        if not is_pad:
            title, name = name if isinstance(name, tuple) else (None, name)
            titles.append(title)
            names.append(name)
            formats.append(dt)
            offsets.append(offset)
        offset += dt.itemsize

    return numpy.dtype({'names': names, 'formats': formats, 'titles': titles,
                        'offsets': offsets, 'itemsize': offset})


@set_module("numpy.lib.format")
def header_data_from_array_1_0(array):
    """ Get the dictionary of header metadata from a numpy.ndarray.

    Parameters
    ----------
    array : numpy.ndarray

    Returns
    -------
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    """
    d = {'shape': array.shape}
    if array.flags.c_contiguous:
        d['fortran_order'] = False
    elif array.flags.f_contiguous:
        d['fortran_order'] = True
    else:
        # Totally non-contiguous data. We will have to make it C-contiguous
        # before writing. Note that we need to test for C_CONTIGUOUS first
        # because a 1-D array is both C_CONTIGUOUS and F_CONTIGUOUS.
        d['fortran_order'] = False

    d['descr'] = dtype_to_descr(array.dtype)
    return d


def _wrap_header(header, version):
    """
    Takes a stringified header, and attaches the prefix and padding to it
    """
    import struct
    assert version is not None
    fmt, encoding = _header_size_info[version]
    header = header.encode(encoding)
    hlen = len(header) + 1
    padlen = ARRAY_ALIGN - ((MAGIC_LEN + struct.calcsize(fmt) + hlen) % ARRAY_ALIGN)
    try:
        header_prefix = magic(*version) + struct.pack(fmt, hlen + padlen)
    except struct.error:
        msg = f"Header length {hlen} too big for version={version}"
        raise ValueError(msg) from None

    # Pad the header with spaces and a final newline such that the magic
    # string, the header-length short and the header are aligned on an
    # ARRAY_ALIGN byte boundary.  This supports memory mapping of dtypes
    # aligned up to ARRAY_ALIGN on systems like Linux where mmap()
    # offset must be page-aligned (i.e. the beginning of the file).
    return header_prefix + header + b' ' * padlen + b'\n'
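
The padding arithmetic in `_wrap_header` can be checked in isolation. The sketch below mirrors the same formula using only `struct`; the function name `padded_header_length` is an assumption for this example. Note that the formula yields a pad of 1 to `ARRAY_ALIGN` bytes, so an already aligned header still gains a full 64 bytes.

```python
import struct

ARRAY_ALIGN = 64
MAGIC_LEN = 8  # six-byte prefix plus two version bytes


def padded_header_length(text_len, fmt):
    # text_len: length of the encoded header text, without the trailing newline.
    hlen = text_len + 1  # account for the newline terminator
    used = MAGIC_LEN + struct.calcsize(fmt) + hlen
    padlen = ARRAY_ALIGN - (used % ARRAY_ALIGN)  # always in 1..ARRAY_ALIGN
    return hlen + padlen, padlen


v1_len, v1_pad = padded_header_length(57, "<H")  # version 1.0 length field
v2_len, v2_pad = padded_header_length(57, "<I")  # version 2.0/3.0 length field
```

In both cases the magic string, the length field and the padded header together end on a multiple of 64 bytes, which is where the array data starts.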


def _wrap_header_guess_version(header):
    """
    Like `_wrap_header`, but chooses an appropriate version given the contents
    """
    try:
        return _wrap_header(header, (1, 0))
    except ValueError:
        pass

    try:
        ret = _wrap_header(header, (2, 0))
    except UnicodeEncodeError:
        pass
    else:
        warnings.warn("Stored array in format 2.0. It can only be "
                      "read by NumPy >= 1.9", UserWarning, stacklevel=2)
        return ret

    header = _wrap_header(header, (3, 0))
    warnings.warn("Stored array in format 3.0. It can only be "
                  "read by NumPy >= 1.17", UserWarning, stacklevel=2)
    return header
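
The fallback order above (try 1.0, then 2.0, then 3.0) can be sketched without NumPy. This is a simplified illustration, not the module's code: `oldest_usable_version` is a hypothetical name, padding is ignored, and nothing is written or warned about.

```python
import struct

# (length format, text encoding) per format version, as in _header_size_info
HEADER_INFO = {
    (1, 0): ("<H", "latin1"),
    (2, 0): ("<I", "latin1"),
    (3, 0): ("<I", "utf8"),
}


def oldest_usable_version(header_text):
    # Return the first version whose encoding and length field can hold the text.
    for version in sorted(HEADER_INFO):
        fmt, encoding = HEADER_INFO[version]
        try:
            encoded = header_text.encode(encoding)
            struct.pack(fmt, len(encoded) + 1)  # +1 for the newline terminator
        except (UnicodeEncodeError, struct.error):
            continue
        return version
    raise ValueError("header does not fit any known version")


small = oldest_usable_version("{'descr': '<f8'}")
large = oldest_usable_version("x" * 70000)
unicode_names = oldest_usable_version("{'descr': [('\u65e5', '<f8')]}")
```

A short latin1 header stays on version 1.0, a header longer than 65535 bytes needs the 4-byte length field of 2.0, and a field name outside latin1 forces 3.0.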


def _write_array_header(fp, d, version=None):
    """ Write the header for an array.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    version : tuple or None
        None means use oldest that works. Providing an explicit version will
        raise a ValueError if the format does not allow saving this data.
        Default: None
    """
    header = ["{"]
    for key, value in sorted(d.items()):
        # Need to use repr here, since we eval these when reading
        header.append(f"'{key}': {repr(value)}, ")
    header.append("}")
    header = "".join(header)

    # Add some spare space so that the array header can be modified in-place
    # when changing the array size, e.g. when growing it by appending data at
    # the end.
    shape = d['shape']
    header += " " * ((GROWTH_AXIS_MAX_DIGITS - len(repr(
        shape[-1 if d['fortran_order'] else 0]
    ))) if len(shape) > 0 else 0)

    if version is None:
        header = _wrap_header_guess_version(header)
    else:
        header = _wrap_header(header, version)
    fp.write(header)
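
To see what the header text looks like before it is wrapped, the following standalone sketch repeats the string-building step: sorted keys, `repr` values, and spare spaces reserved for the growth axis. The function name `header_text` is an assumption for this example.

```python
import ast

GROWTH_AXIS_MAX_DIGITS = 21


def header_text(d):
    # Serialize keys in sorted order using repr, so literal_eval can read it back.
    parts = ["{"]
    for key, value in sorted(d.items()):
        parts.append(f"'{key}': {value!r}, ")
    parts.append("}")
    text = "".join(parts)
    # Reserve spare room so the growth axis can gain digits in place.
    shape = d["shape"]
    if len(shape) > 0:
        axis = shape[-1 if d["fortran_order"] else 0]
        text += " " * (GROWTH_AXIS_MAX_DIGITS - len(repr(axis)))
    return text


d = {"shape": (3, 4), "fortran_order": False, "descr": "<f8"}
text = header_text(d)
round_trip = ast.literal_eval(text)
```

For a C-ordered array the growth axis is the first one, so a leading dimension of 3 (one digit) leaves 20 spare spaces after the closing brace.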


@set_module("numpy.lib.format")
def write_array_header_1_0(fp, d):
    """ Write the header for an array using the 1.0 format.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (1, 0))


@set_module("numpy.lib.format")
def write_array_header_2_0(fp, d):
    """ Write the header for an array using the 2.0 format.
        The 2.0 format allows storing very large structured arrays.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (2, 0))


@set_module("numpy.lib.format")
def read_array_header_1_0(fp, max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array header from a filelike object using the 1.0 file format
    version.

    This will leave the file object located just after the header.

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.
    max_header_size : int, optional
        Maximum allowed size of the header.  Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        The array data will be written out directly if it is either
        C-contiguous or Fortran-contiguous. Otherwise, it will be made
        contiguous before writing it out.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(
            fp, version=(1, 0), max_header_size=max_header_size)


@set_module("numpy.lib.format")
def read_array_header_2_0(fp, max_header_size=_MAX_HEADER_SIZE):
    """
    Read an array header from a filelike object using the 2.0 file format
    version.

    This will leave the file object located just after the header.

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.
    max_header_size : int, optional
        Maximum allowed size of the header.  Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        The array data will be written out directly if it is either
        C-contiguous or Fortran-contiguous. Otherwise, it will be made
        contiguous before writing it out.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(
            fp, version=(2, 0), max_header_size=max_header_size)


def _filter_header(s):
    """Clean up 'L' in npz header ints.

    Cleans up the 'L' in strings representing integers. Needed to allow npz
    headers produced in Python2 to be read in Python3.

    Parameters
    ----------
    s : string
        Npy file header.

    Returns
    -------
    header : str
        Cleaned up header.

    """
    import tokenize
    from io import StringIO

    tokens = []
    last_token_was_number = False
    for token in tokenize.generate_tokens(StringIO(s).readline):
        token_type = token[0]
        token_string = token[1]
        if (last_token_was_number and
                token_type == tokenize.NAME and
                token_string == "L"):
            continue
        else:
            tokens.append(token)
        last_token_was_number = (token_type == tokenize.NUMBER)
    return tokenize.untokenize(tokens)


def _read_array_header(fp, version, max_header_size=_MAX_HEADER_SIZE):
    """
    see read_array_header_1_0
    """
    # Read an unsigned, little-endian short int which has the length of the
    # header.
    import ast
    import struct
    hinfo = _header_size_info.get(version)
    if hinfo is None:
        raise ValueError(f"Invalid version {version!r}")
    hlength_type, encoding = hinfo

    hlength_str = _read_bytes(fp, struct.calcsize(hlength_type), "array header length")
    header_length = struct.unpack(hlength_type, hlength_str)[0]
    header = _read_bytes(fp, header_length, "array header")
    header = header.decode(encoding)
    if len(header) > max_header_size:
        raise ValueError(
            f"Header info length ({len(header)}) is large and may not be safe "
            "to load securely.\n"
            "To allow loading, adjust `max_header_size` or fully trust "
            "the `.npy` file using `allow_pickle=True`.\n"
            "For safety against large resource use or crashes, sandboxing "
            "may be necessary.")

    # The header is a pretty-printed string representation of a literal
    # Python dictionary with trailing newlines padded to an ARRAY_ALIGN byte
    # boundary. The keys are strings.
    #   "shape" : tuple of int
    #   "fortran_order" : bool
    #   "descr" : dtype.descr
    # Versions (2, 0) and (1, 0) could have been created by a Python 2
    # implementation before header filtering was implemented.
    #
    # For performance reasons, we try without _filter_header first though
    try:
        d = ast.literal_eval(header)
    except SyntaxError as e:
        if version <= (2, 0):
            header = _filter_header(header)
            try:
                d = ast.literal_eval(header)
            except SyntaxError as e2:
                msg = "Cannot parse header: {!r}"
                raise ValueError(msg.format(header)) from e2
            else:
                warnings.warn(
                    "Reading `.npy` or `.npz` file required additional "
                    "header parsing as it was created on Python 2. Save the "
                    "file again to speed up loading and avoid this warning.",
                    UserWarning, stacklevel=4)
        else:
            msg = "Cannot parse header: {!r}"
            raise ValueError(msg.format(header)) from e
    if not isinstance(d, dict):
        msg = "Header is not a dictionary: {!r}"
        raise ValueError(msg.format(d))

    if EXPECTED_KEYS != d.keys():
        keys = sorted(d.keys())
        msg = "Header does not contain the correct keys: {!r}"
        raise ValueError(msg.format(keys))

    # Sanity-check the values.
    if (not isinstance(d['shape'], tuple) or
            not all(isinstance(x, int) for x in d['shape'])):
        msg = "shape is not valid: {!r}"
        raise ValueError(msg.format(d['shape']))
    if not isinstance(d['fortran_order'], bool):
        msg = "fortran_order is not a valid bool: {!r}"
        raise ValueError(msg.format(d['fortran_order']))
    try:
        dtype = descr_to_dtype(d['descr'])
    except TypeError as e:
        msg = "descr is not a valid dtype descriptor: {!r}"
        raise ValueError(msg.format(d['descr'])) from e

    return d['shape'], d['fortran_order'], dtype
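
The validation steps in `_read_array_header` can be exercised on plain strings. The sketch below is a reduced version under stated assumptions: `validate_header` is a hypothetical name, the Python 2 `L`-suffix fallback is omitted, and the `descr` entry is not converted to a dtype.

```python
import ast

EXPECTED_KEYS = {"descr", "fortran_order", "shape"}


def validate_header(text, max_header_size=10000):
    # Parse header text and apply the size, key, and value checks.
    if len(text) > max_header_size:
        raise ValueError("header too large to parse safely")
    try:
        d = ast.literal_eval(text)
    except SyntaxError as e:
        raise ValueError(f"Cannot parse header: {text!r}") from e
    if not isinstance(d, dict):
        raise ValueError(f"Header is not a dictionary: {d!r}")
    if EXPECTED_KEYS != d.keys():
        raise ValueError(f"Header does not contain the correct keys: {sorted(d)!r}")
    shape = d["shape"]
    if not isinstance(shape, tuple) or not all(isinstance(x, int) for x in shape):
        raise ValueError(f"shape is not valid: {shape!r}")
    if not isinstance(d["fortran_order"], bool):
        raise ValueError(f"fortran_order is not a valid bool: {d['fortran_order']!r}")
    return d


good = validate_header("{'descr': '<f8', 'fortran_order': False, 'shape': (2, 3), }   \n")
```

Comparing the key set for equality rejects both missing and unexpected keys, while the order of keys in the text does not matter.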


@set_module("numpy.lib.format")
def write_array(fp, array, version=None, allow_pickle=True, pickle_kwargs=None):
    """
    Write an array to an NPY file, including a header.

    If the array is neither C-contiguous nor Fortran-contiguous AND the
    file_like object is not a real file object, this function will have to
    copy data in memory.

    Parameters
    ----------
    fp : file_like object
        An open, writable file object, or similar object with a
        ``.write()`` method.
    array : ndarray
        The array to write to disk.
    version : (int, int) or None, optional
        The version number of the format. None means use the oldest
        supported version that is able to store the data.  Default: None
    allow_pickle : bool, optional
        Whether to allow writing pickled data. Default: True
    pickle_kwargs : dict, optional
        Additional keyword arguments to pass to pickle.dump, excluding
        'protocol'. These are only useful when pickling objects in object
        arrays to Python 2 compatible format.

    Raises
    ------
    ValueError
        If the array cannot be persisted. This includes the case of
        allow_pickle=False and array being an object array.
    Various other errors
        If the array contains Python objects as part of its dtype, the
        process of pickling them may raise various errors if the objects
        are not picklable.

    """
    _check_version(version)
    _write_array_header(fp, header_data_from_array_1_0(array), version)

    if array.itemsize == 0:
        buffersize = 0
    else:
        # Set buffer size to 16 MiB to hide the Python loop overhead.
        buffersize = max(16 * 1024 ** 2 // array.itemsize, 1)

    dtype_class = type(array.dtype)

    if array.dtype.hasobject or not dtype_class._legacy:
        # We contain Python objects so we cannot write out the data
        # directly.  Instead, we will pickle it out
        if not allow_pickle:
            if array.dtype.hasobject:
                raise ValueError("Object arrays cannot be saved when "
                                 "allow_pickle=False")
            if not dtype_class._legacy:
                raise ValueError("User-defined dtypes cannot be saved "
                                 "when allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        pickle.dump(array, fp, protocol=4, **pickle_kwargs)
    elif array.flags.f_contiguous and not array.flags.c_contiguous:
        if isfileobj(fp):
            array.T.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='F'):
                fp.write(chunk.tobytes('C'))
    elif isfileobj(fp):
        array.tofile(fp)
    else:
        for chunk in numpy.nditer(
                array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                buffersize=buffersize, order='C'):
            fp.write(chunk.tobytes('C'))

778 

779 

780@set_module("numpy.lib.format") 

781def read_array(fp, allow_pickle=False, pickle_kwargs=None, *, 

782 max_header_size=_MAX_HEADER_SIZE): 

783 """ 

784 Read an array from an NPY file. 

785 

786 Parameters 

787 ---------- 

788 fp : file_like object 

789 If this is not a real file object, then this may take extra memory 

790 and time. 

791 allow_pickle : bool, optional 

792 Whether to allow writing pickled data. Default: False 

793 pickle_kwargs : dict 

794 Additional keyword arguments to pass to pickle.load. These are only 

795 useful when loading object arrays saved on Python 2. 

796 max_header_size : int, optional 

797 Maximum allowed size of the header. Large headers may not be safe 

798 to load securely and thus require explicitly passing a larger value. 

799 See :py:func:`ast.literal_eval()` for details. 

800 This option is ignored when `allow_pickle` is passed. In that case 

801 the file is by definition trusted and the limit is unnecessary. 

802 

803 Returns 

804 ------- 

805 array : ndarray 

806 The array from the data on disk. 

807 

808 Raises 

809 ------ 

810 ValueError 

811 If the data is invalid, or allow_pickle=False and the file contains 

812 an object array. 

813 

814 """ 

815 if allow_pickle: 

816 # Effectively ignore max_header_size, since `allow_pickle` indicates 

817 # that the input is fully trusted. 

818 max_header_size = 2**64 

819 

820 version = read_magic(fp) 

821 _check_version(version) 

    shape, fortran_order, dtype = _read_array_header(
            fp, version, max_header_size=max_header_size)
    if len(shape) == 0:
        count = 1
    else:
        count = numpy.multiply.reduce(shape, dtype=numpy.int64)

    # Now read the actual data.
    if dtype.hasobject:
        # The array contained Python objects. We need to unpickle the data.
        if not allow_pickle:
            raise ValueError("Object arrays cannot be loaded when "
                             "allow_pickle=False")
        if pickle_kwargs is None:
            pickle_kwargs = {}
        try:
            array = pickle.load(fp, **pickle_kwargs)
        except UnicodeError as err:
            # Friendlier error message
            raise UnicodeError("Unpickling a python object failed: %r\n"
                               "You may need to pass the encoding= option "
                               "to numpy.load" % (err,)) from err
    else:
        if isfileobj(fp):
            # We can use the fast fromfile() function.
            array = numpy.fromfile(fp, dtype=dtype, count=count)
        else:
            # This is not a real file. We have to read it the
            # memory-intensive way.
            # crc32 module fails on reads greater than 2 ** 32 bytes,
            # breaking large reads from gzip streams. Chunk reads to
            # BUFFER_SIZE bytes to avoid issue and reduce memory overhead
            # of the read. In non-chunked case count < max_read_count, so
            # only one read is performed.

            # Use np.ndarray instead of np.empty since the latter does
            # not correctly instantiate zero-width string dtypes; see
            # https://github.com/numpy/numpy/pull/6430
            array = numpy.ndarray(count, dtype=dtype)

            if dtype.itemsize > 0:
                # If dtype.itemsize == 0 then there's nothing more to read
                max_read_count = BUFFER_SIZE // min(BUFFER_SIZE, dtype.itemsize)

                for i in range(0, count, max_read_count):
                    read_count = min(max_read_count, count - i)
                    read_size = int(read_count * dtype.itemsize)
                    data = _read_bytes(fp, read_size, "array data")
                    array[i:i + read_count] = numpy.frombuffer(data, dtype=dtype,
                                                               count=read_count)

        if array.size != count:
            raise ValueError(
                "Failed to read all data for array. "
                f"Expected {shape} = {count} elements, "
                f"could only read {array.size} elements. "
                "(file seems not fully written?)"
            )

        if fortran_order:
            array = array.reshape(shape[::-1])
            array = array.transpose()
        else:
            array = array.reshape(shape)

    return array
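
# Illustrative sketch, not part of this module: the chunking arithmetic used
# in the non-file branch above, reproduced with the standard library only.
# The helper name `chunk_plan` and its `buffer_size` parameter are
# hypothetical; the module itself uses its BUFFER_SIZE constant.

```python
def chunk_plan(count, itemsize, buffer_size):
    """Return (start, read_count, read_size) for each chunked read.

    Mirrors the loop above: at most `buffer_size` bytes per read, except
    that a single item larger than the buffer is still read in one piece.
    """
    if itemsize <= 0:
        # Zero-width dtypes carry no bytes, so nothing is read.
        return []
    max_read_count = buffer_size // min(buffer_size, itemsize)
    plan = []
    for start in range(0, count, max_read_count):
        read_count = min(max_read_count, count - start)
        plan.append((start, read_count, read_count * itemsize))
    return plan


# Ten 4-byte items with a 16-byte buffer: two full chunks, one partial.
small_items = chunk_plan(10, 4, 16)
# Items larger than the buffer are read one at a time.
large_items = chunk_plan(3, 32, 16)
```

# The min() guard is what keeps max_read_count from becoming zero when one
# item is larger than the buffer.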



@set_module("numpy.lib.format")
def open_memmap(filename, mode='r+', dtype=None, shape=None,
                fortran_order=False, version=None, *,
                max_header_size=_MAX_HEADER_SIZE):
    """
    Open a .npy file as a memory-mapped array.

    This may be used to read an existing file or create a new one.

    Parameters
    ----------
    filename : str or path-like
        The name of the file on disk. This may *not* be a file-like
        object.
    mode : str, optional
        The mode in which to open the file; the default is 'r+'. In
        addition to the standard file modes, 'c' is also accepted to mean
        "copy on write." See `memmap` for the available mode strings.
    dtype : data-type, optional
        The data type of the array if we are creating a new file in "write"
        mode; otherwise, `dtype` is ignored. The default value is None,
        which results in a data-type of `float64`.
    shape : tuple of int
        The shape of the array if we are creating a new file in "write"
        mode, in which case this parameter is required. Otherwise, this
        parameter is ignored and is thus optional.
    fortran_order : bool, optional
        Whether the array should be Fortran-contiguous (True) or
        C-contiguous (False, the default) if we are creating a new file in
        "write" mode.
    version : tuple of int (major, minor) or None
        If the mode is a "write" mode, then this is the version of the file
        format used to create the file. None means use the oldest
        supported version that is able to store the data. Default: None
    max_header_size : int, optional
        Maximum allowed size of the header. Large headers may not be safe
        to load securely and thus require explicitly passing a larger value.
        See :py:func:`ast.literal_eval()` for details.

    Returns
    -------
    marray : memmap
        The memory-mapped array.

    Raises
    ------
    ValueError
        If the data or the mode is invalid.
    OSError
        If the file is not found or cannot be opened correctly.

    See Also
    --------
    numpy.memmap

    """
    if isfileobj(filename):
        raise ValueError("Filename must be a string or a path-like object."
                         " Memmap cannot use existing file handles.")

    if 'w' in mode:
        # We are creating the file, not reading it.
        # Check if we ought to create the file.
        _check_version(version)
        # Ensure that the given dtype is an authentic dtype object rather
        # than just something that can be interpreted as a dtype object.
        dtype = numpy.dtype(dtype)
        if dtype.hasobject:
            msg = "Array can't be memory-mapped: Python objects in dtype."
            raise ValueError(msg)
        d = {
            "descr": dtype_to_descr(dtype),
            "fortran_order": fortran_order,
            "shape": shape,
        }
        # If we got here, then it should be safe to create the file.
        with open(os.fspath(filename), mode + 'b') as fp:
            _write_array_header(fp, d, version)
            offset = fp.tell()
    else:
        # Read the header of the file first.
        with open(os.fspath(filename), 'rb') as fp:
            version = read_magic(fp)
            _check_version(version)

            shape, fortran_order, dtype = _read_array_header(
                    fp, version, max_header_size=max_header_size)
            if dtype.hasobject:
                msg = "Array can't be memory-mapped: Python objects in dtype."
                raise ValueError(msg)
            offset = fp.tell()

    if fortran_order:
        order = 'F'
    else:
        order = 'C'

    # We need to change a write-only mode to a read-write mode since we've
    # already written data to the file.
    if mode == 'w+':
        mode = 'r+'

    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
                          mode=mode, offset=offset)

    return marray
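
# Illustrative usage sketch, not part of this module. It assumes NumPy is
# installed and that open_memmap is importable from numpy.lib.format, as the
# decorator above suggests. The helper `roundtrip` and the file name
# "example.npy" are hypothetical.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def roundtrip(directory):
    """Create a .npy file through a memmap, then reopen it read-only."""
    path = os.path.join(directory, "example.npy")  # hypothetical file name

    # In 'w+' mode the header is written first, then the data region is
    # mapped for reading and writing; dtype and shape must be supplied.
    out = open_memmap(path, mode="w+", dtype=np.float32, shape=(3, 4))
    out[:] = np.arange(12, dtype=np.float32).reshape(3, 4)
    out.flush()
    del out  # drop the mapping before reopening the file

    # In read mode, dtype and shape are taken from the header on disk.
    back = open_memmap(path, mode="r")
    result = (back.shape, back.dtype, float(back[2, 3]))
    del back
    return result


with tempfile.TemporaryDirectory() as tmp:
    shape, dtype, last = roundtrip(tmp)
```

# Deleting each memmap before the temporary directory is removed releases
# the mapping, which matters on platforms that refuse to delete mapped files.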



def _read_bytes(fp, size, error_template="ran out of data"):
    """
    Read from file-like object until size bytes are read.
    Raises ValueError if EOF is encountered before size bytes are read.
    Non-blocking objects only supported if they derive from io objects.

    Required as e.g. ZipExtFile in python 2.6 can return less data than
    requested.
    """
    data = b""
    while True:
        # io files (default in python3) return None or raise on
        # would-block, python2 file will truncate, probably nothing can be
        # done about that. note that regular files can't be non-blocking
        try:
            r = fp.read(size - len(data))
            data += r
            if len(r) == 0 or len(data) == size:
                break
        except BlockingIOError:
            pass
    if len(data) != size:
        msg = "EOF: reading %s, expected %d bytes got %d"
        raise ValueError(msg % (error_template, size, len(data)))
    else:
        return data
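
# Illustrative sketch, not part of this module: why a read loop is needed
# when a stream may return fewer bytes than requested. `DribbleReader` and
# `read_exactly` are hypothetical names. The sketch is simplified and omits
# the BlockingIOError handling that _read_bytes performs.

```python
import io


class DribbleReader:
    """Hypothetical stream that returns at most `step` bytes per read()."""

    def __init__(self, payload, step=3):
        self._buf = io.BytesIO(payload)
        self._step = step

    def read(self, size=-1):
        if size < 0:
            size = self._step
        return self._buf.read(min(size, self._step))


def read_exactly(fp, size):
    """Keep reading until `size` bytes arrive or the stream is exhausted."""
    data = b""
    while len(data) < size:
        chunk = fp.read(size - len(data))
        if not chunk:
            break  # an empty read signals end of stream
        data += chunk
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data


# A single read() would return only 3 bytes here; the loop collects all 7.
payload = read_exactly(DribbleReader(b"abcdefghij"), 7)
```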



@set_module("numpy.lib.format")
def isfileobj(f):
    if not isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter)):
        return False
    try:
        # BufferedReader/Writer may raise OSError when
        # fetching `fileno()` (e.g. when wrapping BytesIO).
        f.fileno()
        return True
    except OSError:
        return False
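
# Illustrative sketch, not part of this module: how the check above
# separates objects backed by an OS-level file from in-memory streams, using
# the standard library only. `backed_by_os_file` is a hypothetical stand-in
# that repeats the same logic so the example does not depend on NumPy, and
# "probe.bin" is a hypothetical file name.

```python
import io
import os
import tempfile


def backed_by_os_file(f):
    """Same test as isfileobj: right type and a usable file descriptor."""
    if not isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter)):
        return False
    try:
        f.fileno()
        return True
    except OSError:
        return False


# BytesIO is none of the accepted types.
in_memory = backed_by_os_file(io.BytesIO(b"data"))

# A BufferedReader wrapping BytesIO has the right type but no descriptor;
# fileno() raises io.UnsupportedOperation, which is a subclass of OSError.
wrapped = backed_by_os_file(io.BufferedReader(io.BytesIO(b"data")))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "probe.bin")  # hypothetical file name
    with open(path, "wb") as fh:
        fh.write(b"data")
    with open(path, "rb") as fh:
        on_disk = backed_by_os_file(fh)
```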