Coverage for /pythoncovmergedfiles/medio/medio/usr/local/lib/python3.8/site-packages/dulwich/line_ending.py: 62%

64 statements  

# line_ending.py -- Line ending conversion functions
# Copyright (C) 2018-2018 Boris Feld <boris.feld@comet.ml>
#
# Dulwich is dual-licensed under the Apache License, Version 2.0 and the GNU
# General Public License as published by the Free Software Foundation; version 2.0
# or (at your option) any later version. You can redistribute it and/or
# modify it under the terms of either of these two licenses.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# You should have received a copy of the licenses; if not, see
# <http://www.gnu.org/licenses/> for a copy of the GNU General Public License
# and <http://www.apache.org/licenses/LICENSE-2.0> for a copy of the Apache
# License, Version 2.0.
#
r"""All line-ending related functions, from conversions to config processing.

Line-ending normalization is a complex beast. Here are some notes and details
about how it seems to work.

The normalization is a two-fold process that happens at two moments:

- When reading a file from the index into the working directory, for example
  during a ``git clone`` or ``git checkout`` call. We call this process the
  read filter in this module.
- When writing a file from the working directory to the index, for example
  during a ``git add`` call. We call this process the write filter in this
  module.

Note that when checking status (getting unstaged changes), whether or not
normalization is done on write depends on whether or not the file in the
working dir has also been normalized on read:

- For autocrlf=true all files are always normalized on both read and write.
- For autocrlf=input files are only normalized on write if they are newly
  "added". Since files which are already committed are not normalized on
  checkout into the working tree, they are also left alone when staging
  modifications into the index.

One thing to know is that Git does line-ending normalization only on text
files. How does Git know that a file is text? We can either mark a file as a
text file or a binary file, or ask Git to decide automatically. Git has a
heuristic to detect whether a file is a text file or a binary file. It seems
to be based on the percentage of non-printable characters in files.

The code for this heuristic is here:
https://git.kernel.org/pub/scm/git/git.git/tree/convert.c#n46

Dulwich has an implementation with a slightly different heuristic, the
`dulwich.patch.is_binary` function.

The binary detection heuristic implementation is close to the one in JGit:
https://github.com/eclipse/jgit/blob/f6873ffe522bbc3536969a3a3546bf9a819b92bf/org.eclipse.jgit/src/org/eclipse/jgit/diff/RawText.java#L300
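
To illustrate the two styles of heuristic mentioned above, here is a minimal
sketch. ``nul_check`` treats data as binary when a NUL byte appears in the
first 8000 bytes; we believe this is close to what ``dulwich.patch.is_binary``
does, but treat that as an assumption. ``ratio_check`` is a hypothetical,
simplified non-printable ratio test: the control bytes counted as printable and
the 1-in-128 threshold are assumptions loosely modelled on Git's convert.c, not
a copy of it.

```python
FIRST_FEW_BYTES = 8000  # assumed size of the inspected prefix

# Control bytes that this sketch (by assumption) still counts as text.
TEXT_CONTROL_BYTES = frozenset(b"\t\n\r\b\x1b\x0c")


def nul_check(data: bytes) -> bool:
    # Binary if a NUL byte shows up early in the content.
    return b"\0" in data[:FIRST_FEW_BYTES]


def ratio_check(data: bytes) -> bool:
    # Hypothetical ratio test: binary if there is any NUL byte, or if
    # non-printable bytes exceed 1/128th of the printable ones.
    printable = nonprintable = 0
    for byte in data:
        if (byte >= 32 and byte != 127) or byte in TEXT_CONTROL_BYTES:
            printable += 1
        else:
            nonprintable += 1
    return b"\0" in data or (printable >> 7) < nonprintable
```

With these rules ``nul_check`` ignores a NUL byte placed after the first 8000
bytes, while ``ratio_check`` flags any stray control byte in a short file; this
is one way two "slightly different" heuristics can disagree on the same input.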

There are multiple variables that impact the normalization.

First, a repository can contain a ``.gitattributes`` file (or more than one...)
that can further customize the operation on some file patterns, for example:

    \*.txt text

Force all ``.txt`` files to be treated as text files and to have their line
endings normalized.

    \*.jpg -text

Force all ``.jpg`` files to be treated as binary files and to not have their
line endings converted.

    \*.vcproj text eol=crlf

Force all ``.vcproj`` files to be treated as text files and to have their line
endings converted into ``CRLF`` in the working directory, regardless of the
platform's native EOL.

    \*.sh text eol=lf

Force all ``.sh`` files to be treated as text files and to have their line
endings converted into ``LF`` in the working directory, regardless of the
platform's native EOL.

If the ``eol`` attribute is not defined, Git uses the ``core.eol`` configuration
value described later.

    \* text=auto

Force all files to be scanned by the text file detection heuristic and to have
their line endings normalized if they are detected as text files.
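
To make the pattern lookup concrete, here is a minimal sketch that parses
``.gitattributes`` lines like the ones above and resolves the attributes of a
path. Attribute values are modelled as ``True`` (set, e.g. ``text``), ``False``
(unset, e.g. ``-text``) or a string (e.g. ``eol=crlf``). It relies on the
documented rule that a later matching line overrides an earlier one, but it is
a simplification: it ignores macros, ``!attr``, ``.gitattributes`` files in
subdirectories and Git's exact wildmatch semantics. The function names are
hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from typing import Dict, List, Tuple, Union

AttrValue = Union[bool, str]
Rule = Tuple[str, Dict[str, AttrValue]]


def parse_gitattributes(text: str) -> List[Rule]:
    # Each non-empty, non-comment line is "pattern attr1 attr2 ...".
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        values: Dict[str, AttrValue] = {}
        for attr in attrs:
            if attr.startswith("-"):
                values[attr[1:]] = False  # "-text": attribute unset
            elif "=" in attr:
                name, value = attr.split("=", 1)
                values[name] = value  # "eol=crlf": attribute with a value
            else:
                values[attr] = True  # "text": attribute set
        rules.append((pattern, values))
    return rules


def attributes_for(path: str, rules: List[Rule]) -> Dict[str, AttrValue]:
    # Later matching lines override earlier ones, attribute by attribute.
    result: Dict[str, AttrValue] = {}
    for pattern, values in rules:
        # Simplification: slash-less patterns match against the basename.
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            result.update(values)
    return result
```

For instance, with ``* text=auto`` listed before ``*.jpg -text``,
``attributes_for("img/logo.jpg", rules)`` gives ``{"text": False}``; swapping
the two lines would leave ``text`` at ``"auto"``.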

Git also has an obsolete attribute named ``crlf`` that can be translated to the
corresponding ``text`` or ``eol`` attribute value.
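
The gitattributes documentation describes this translation as: ``crlf`` acts
like ``text``, ``-crlf`` like ``-text`` and ``crlf=input`` like ``eol=lf``. A
minimal sketch, with attributes modelled as a dict mapping names to ``True``
(set), ``False`` (unset) or a string value. ``translate_crlf`` is a hypothetical
name, and letting an explicit ``text`` or ``eol`` win over the translated value
is our assumption.

```python
from typing import Dict, Union

AttrValue = Union[bool, str]


def translate_crlf(attrs: Dict[str, AttrValue]) -> Dict[str, AttrValue]:
    # Replace the obsolete "crlf" attribute with its modern equivalent.
    result = {name: value for name, value in attrs.items() if name != "crlf"}
    crlf = attrs.get("crlf")
    # Assumption: an explicit text/eol attribute wins, hence setdefault().
    if crlf is True:
        result.setdefault("text", True)  # crlf -> text
    elif crlf is False:
        result.setdefault("text", False)  # -crlf -> -text
    elif crlf == "input":
        result.setdefault("eol", "lf")  # crlf=input -> eol=lf
    return result
```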

Then there are some configuration options (that can be defined at the
repository or user level):

- core.autocrlf
- core.eol

``core.autocrlf`` is taken into account for all files that don't have a ``text``
attribute defined in ``.gitattributes``; it takes three possible values:

    - ``true``: This forces all files to have CRLF line-endings in the working
      directory and converts line-endings to LF when writing to the index.
      When autocrlf is set to true, the eol value is ignored.
    - ``input``: Quite similar to the ``true`` value but only forces the write
      filter, i.e. new files added to the index get their line-endings
      converted to LF.
    - ``false`` (default): No normalization is done.
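
The three ``core.autocrlf`` values above can be summarised as a pair of
(read filter, write filter) conversions. A minimal sketch: the helper names are
hypothetical, lower-casing the value mirrors what ``BlobNormalizer`` does, and
treating any unrecognised value like ``false`` is an assumption.

```python
from typing import Callable, Optional, Tuple

Convert = Callable[[bytes], bytes]


def to_lf(data: bytes) -> bytes:
    # CRLF -> LF, as used when writing to the index.
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    # LF -> CRLF; existing CRLF pairs are collapsed first so they stay single.
    return to_lf(data).replace(b"\n", b"\r\n")


def autocrlf_filters(core_autocrlf: bytes) -> Tuple[Optional[Convert], Optional[Convert]]:
    # Returns (read filter, write filter); None means "leave the bytes alone".
    value = core_autocrlf.lower()
    if value == b"true":
        return to_crlf, to_lf  # CRLF in the working tree, LF in the index
    if value == b"input":
        return None, to_lf  # only the write filter is active
    return None, None  # b"false" (default) and, by assumption, anything else
```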

``core.eol`` is the top-level configuration to define the line-ending to use
when applying the read filter. It takes three possible values:

    - ``lf``: When normalization is done, force line-endings to be ``LF`` in the
      working directory.
    - ``crlf``: When normalization is done, force line-endings to be ``CRLF`` in
      the working directory.
    - ``native`` (default): When normalization is done, force line-endings to be
      the platform's native line ending.
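
Resolving ``core.eol`` to the bytes that end each line in the working directory
could look like the following minimal sketch. ``resolve_eol`` is a hypothetical
helper; approximating ``native`` with Python's ``os.linesep`` (CRLF on Windows,
LF elsewhere) and treating unrecognised values like ``native`` are assumptions.

```python
import os
from typing import Optional


def resolve_eol(core_eol: Optional[bytes], platform_linesep: str = os.linesep) -> bytes:
    # Map core.eol to the working-directory line ending used when
    # normalization happens. Unset means "native", the documented default.
    value = (core_eol or b"native").lower()
    if value == b"lf":
        return b"\n"
    if value == b"crlf":
        return b"\r\n"
    # "native" (and, by assumption, unknown values) follow the platform.
    return platform_linesep.encode("ascii")
```

Passing ``platform_linesep`` explicitly makes the ``native`` case testable on
any machine.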

One thing to remember is that when line-ending normalization is done on a file,
Git always normalizes line-endings to ``LF`` when writing to the index.

There are sources that seem to indicate that Git won't do line-ending
normalization when a file contains mixed line-endings. I think this logic
might be in the text / binary detection heuristic but couldn't find it yet.
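
Whatever Git's exact rule turns out to be, detecting mixed line-endings only
requires counting each style. A minimal sketch with hypothetical names
(``EolStats``, ``eol_stats``, ``has_mixed_endings``); it does not establish
whether or where Git applies such a check.

```python
from typing import NamedTuple


class EolStats(NamedTuple):
    crlf: int  # CR LF pairs
    lone_lf: int  # LF bytes not preceded by CR
    lone_cr: int  # CR bytes not followed by LF


def eol_stats(data: bytes) -> EolStats:
    # Every CRLF pair accounts for exactly one CR and one LF byte.
    crlf = data.count(b"\r\n")
    return EolStats(crlf, data.count(b"\n") - crlf, data.count(b"\r") - crlf)


def has_mixed_endings(data: bytes) -> bool:
    # "Mixed" means more than one line-ending style occurs in the hunk.
    return sum(1 for count in eol_stats(data) if count) > 1
```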

Sources:
- https://git-scm.com/docs/git-config#git-config-coreeol
- https://git-scm.com/docs/git-config#git-config-coreautocrlf
- https://git-scm.com/docs/gitattributes#_checking_out_and_checking_in
- https://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/
"""

from .object_store import iter_tree_contents
from .objects import Blob
from .patch import is_binary

CRLF = b"\r\n"
LF = b"\n"


def convert_crlf_to_lf(text_hunk):
    """Convert CRLF in text hunk into LF.

    Args:
      text_hunk: A bytes string representing a text hunk
    Returns: The text hunk with the same type, with CRLF replaced by LF
    """
    return text_hunk.replace(CRLF, LF)


def convert_lf_to_crlf(text_hunk):
    """Convert LF in text hunk into CRLF.

    Args:
      text_hunk: A bytes string representing a text hunk
    Returns: The text hunk with the same type, with LF replaced by CRLF
    """
    # TODO find a more efficient way of doing it
    # Collapse existing CRLF pairs first so they do not become CR CR LF.
    intermediary = text_hunk.replace(CRLF, LF)
    return intermediary.replace(LF, CRLF)
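
The TODO in ``convert_lf_to_crlf`` asks for a more efficient approach. One
candidate, sketched below and not something Dulwich is known to use, is a single
regular-expression pass that inserts a CR only before LF bytes that lack one.
It should produce the same bytes as the two ``replace`` calls; whether it is
actually faster depends on the input and has not been measured here. The
function name is hypothetical.

```python
import re

# An LF byte that is not already preceded by a CR byte.
_LONE_LF = re.compile(rb"(?<!\r)\n")


def convert_lf_to_crlf_single_pass(text_hunk: bytes) -> bytes:
    # Equivalent to replace(CRLF, LF) followed by replace(LF, CRLF):
    # existing CRLF pairs are kept as-is and every lone LF gains a CR.
    return _LONE_LF.sub(b"\r\n", text_hunk)
```

Pre-existing CRLF pairs are left untouched, so mixed input such as
``b"a\nb\r\n"`` comes out with uniform CRLF endings.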


def get_checkout_filter(core_eol, core_autocrlf, git_attributes):
    """Returns the correct checkout filter based on the passed arguments."""
    # TODO this function should process the git_attributes for the path and if
    # the text attribute is not defined, fall back on the
    # get_checkout_filter_autocrlf function with the autocrlf value
    return get_checkout_filter_autocrlf(core_autocrlf)


def get_checkin_filter(core_eol, core_autocrlf, git_attributes):
    """Returns the correct checkin filter based on the passed arguments."""
    # TODO this function should process the git_attributes for the path and if
    # the text attribute is not defined, fall back on the
    # get_checkin_filter_autocrlf function with the autocrlf value
    return get_checkin_filter_autocrlf(core_autocrlf)
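
The TODO above could be resolved along the following lines for the checkout
side. This is a hypothetical sketch, not Dulwich's API: attributes are a dict of
``True``/``False``/string values for the path, ``text=auto`` is handled like
``text`` because binary detection happens later in ``normalize_blob``, and
``core.eol=native`` is simplified to LF. The precedence of ``core.autocrlf``
over ``core.eol`` follows the git-config documentation as we understand it.

```python
from typing import Callable, Dict, Optional, Union

Filter = Callable[[bytes], bytes]
AttrValue = Union[bool, str]


def to_lf(data: bytes) -> bytes:
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    # Collapse existing CRLF first so they do not become CR CR LF.
    return to_lf(data).replace(b"\n", b"\r\n")


def checkout_filter_for(
    attrs: Dict[str, AttrValue], core_eol: bytes, core_autocrlf: bytes
) -> Optional[Filter]:
    text = attrs.get("text")
    if text is None:
        # No "text" attribute: fall back on core.autocrlf, as the TODO says.
        return to_crlf if core_autocrlf == b"true" else None
    if text is False:
        return None  # "-text": never convert
    # "text" or "text=auto" (binary detection is left to normalize_blob).
    eol = attrs.get("eol")
    if eol is None:
        if core_autocrlf == b"true":
            return to_crlf  # core.autocrlf=true overrides core.eol
        if core_autocrlf == b"input":
            return None  # core.autocrlf=input keeps LF in the working tree
        eol = core_eol.decode("ascii").lower()
    # Simplification: "native" is treated like "lf" in this sketch.
    return to_crlf if eol == "crlf" else None
```

A checkin counterpart would be simpler: any path with ``text`` set (or no
attribute and ``autocrlf`` set to ``true``/``input``) gets ``to_lf``.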


def get_checkout_filter_autocrlf(core_autocrlf):
    """Returns the correct checkout filter based on autocrlf value.

    Args:
      core_autocrlf: The bytes configuration value of core.autocrlf.
        Valid values are: b'true', b'false' or b'input'.
    Returns: Either None if no filter has to be applied or a function
      accepting a single argument, a binary text hunk
    """
    if core_autocrlf == b"true":
        return convert_lf_to_crlf

    return None


def get_checkin_filter_autocrlf(core_autocrlf):
    """Returns the correct checkin filter based on autocrlf value.

    Args:
      core_autocrlf: The bytes configuration value of core.autocrlf.
        Valid values are: b'true', b'false' or b'input'.
    Returns: Either None if no filter has to be applied or a function
      accepting a single argument, a binary text hunk
    """
    if core_autocrlf == b"true" or core_autocrlf == b"input":
        return convert_crlf_to_lf

    # Checkin filter should never be `convert_lf_to_crlf`
    return None


class BlobNormalizer:
    """An object to store the computed result of which filter to apply, based
    on configuration, gitattributes, path and operation (checkin or checkout).
    """

    def __init__(self, config_stack, gitattributes) -> None:
        self.config_stack = config_stack
        self.gitattributes = gitattributes

        # Compute which filters we need based on parameters
        try:
            core_eol = config_stack.get("core", "eol")
        except KeyError:
            core_eol = "native"

        try:
            core_autocrlf = config_stack.get("core", "autocrlf").lower()
        except KeyError:
            # Unset: False never equals b"true" or b"input", so no autocrlf
            # filter is selected (the same outcome as b"false").
            core_autocrlf = False

        self.fallback_read_filter = get_checkout_filter(
            core_eol, core_autocrlf, self.gitattributes
        )
        self.fallback_write_filter = get_checkin_filter(
            core_eol, core_autocrlf, self.gitattributes
        )

    def checkin_normalize(self, blob, tree_path):
        """Normalize a blob during a checkin operation."""
        if self.fallback_write_filter is not None:
            return normalize_blob(
                blob, self.fallback_write_filter, binary_detection=True
            )

        return blob

    def checkout_normalize(self, blob, tree_path):
        """Normalize a blob during a checkout operation."""
        if self.fallback_read_filter is not None:
            return normalize_blob(
                blob, self.fallback_read_filter, binary_detection=True
            )

        return blob


def normalize_blob(blob, conversion, binary_detection):
    """Takes a blob as input and returns either the original blob, if
    binary_detection is True and the blob content looks binary, or a new
    blob with converted data.
    """
    # Read the original blob
    data = blob.data

    # If we need to detect if a file is binary and the file is detected as
    # binary, do not apply the conversion function and return the original
    # blob
    if binary_detection is True:
        if is_binary(data):
            return blob

    # Now apply the conversion
    converted_data = conversion(data)

    new_blob = Blob()
    new_blob.data = converted_data

    return new_blob
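
To exercise the control flow of ``normalize_blob`` without a repository, here
is a self-contained sketch. ``FakeBlob`` is a hypothetical stand-in for
``dulwich.objects.Blob`` that only models ``data``, and ``looks_binary`` is an
assumed NUL-byte check standing in for ``is_binary``. It shows the two
outcomes: the very same object comes back for binary content, and a new object
otherwise.

```python
from dataclasses import dataclass
from typing import Callable

FIRST_FEW_BYTES = 8000  # assumed size of the inspected prefix


@dataclass
class FakeBlob:
    # Hypothetical stand-in for dulwich.objects.Blob: only data is modelled.
    data: bytes


def looks_binary(data: bytes) -> bool:
    # Assumed NUL-byte heuristic standing in for dulwich.patch.is_binary.
    return b"\0" in data[:FIRST_FEW_BYTES]


def normalize(
    blob: FakeBlob, conversion: Callable[[bytes], bytes], binary_detection: bool
) -> FakeBlob:
    # Binary content is returned untouched (same object); text is converted
    # into a brand new blob, leaving the input blob unchanged.
    if binary_detection and looks_binary(blob.data):
        return blob
    return FakeBlob(conversion(blob.data))
```

Returning a new blob rather than mutating the input keeps the original object
(and its content) intact for callers that still hold a reference to it.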


class TreeBlobNormalizer(BlobNormalizer):
    def __init__(self, config_stack, git_attributes, object_store, tree=None) -> None:
        super().__init__(config_stack, git_attributes)
        if tree:
            self.existing_paths = {
                name for name, _, _ in iter_tree_contents(object_store, tree)
            }
        else:
            self.existing_paths = set()

    def checkin_normalize(self, blob, tree_path):
        # Existing files should only be normalized on checkin if they were
        # previously normalized on checkout
        if (
            self.fallback_read_filter is not None
            or tree_path not in self.existing_paths
        ):
            return super().checkin_normalize(blob, tree_path)
        return blob
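
The condition in ``TreeBlobNormalizer.checkin_normalize`` can be read as a small
truth table. A minimal sketch with hypothetical helpers: in this module a read
filter is active only for ``core.autocrlf=true``, so with ``input`` existing
paths are left alone while new paths are still normalized, matching the status
note in the module docstring.

```python
from typing import AbstractSet


def read_filter_active(core_autocrlf: bytes) -> bool:
    # In this module only autocrlf=true installs a checkout (read) filter.
    return core_autocrlf == b"true"


def normalize_on_checkin(
    core_autocrlf: bytes, tree_path: bytes, existing_paths: AbstractSet[bytes]
) -> bool:
    # Same shape as TreeBlobNormalizer.checkin_normalize: new paths are
    # eligible, existing ones only if they were normalized on checkout too.
    # A write filter must also exist; the base class checks that separately.
    return read_filter_active(core_autocrlf) or tree_path not in existing_paths
```

For ``core.autocrlf=false`` this still returns ``True`` for new paths, but no
conversion happens because the base class finds no write filter.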