Package fr.becpg.common.diff
Class DiffMatchPatch
java.lang.Object
fr.becpg.common.diff.DiffMatchPatch
Class containing the diff, match and patch methods.
Also contains the behaviour settings.
- Author:
- matthieu
-
Nested Class Summary
Nested Classes -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final Patternprivate static final Patternprivate static final floatNumber of seconds to map a diff before giving up (0 for infinity).private static final intHow far to search for a match (0 = exact location, 1000+ = broad match).private static final shortThe number of bits in an int.private static final floatAt what point is no match declared (0.0 = perfection, 1.0 = very loose). -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptiondiffBisect(String text1, String text2, long deadline) Find the 'middle snake' of a diff, split the problem in two and return the recursively constructed diff.diffBisectSplit(String text1, String text2, int x, int y, long deadline) Given the location of the 'middle snake', split the diff in two parts and recurse.private voiddiffCharsToLines(List<Diff> diffs, List<String> lineArray) Rehydrate the dmp in a diff from a string of line hashes to real lines of dmp.private voiddiffCleanupMerge(List<Diff> diffs) Reorder and merge like edit sections.private voiddiffCleanupSemantic(List<Diff> diffs) Reduce the number of edits by eliminating semantically trivial equalities.private voiddiffcleanupSemanticLossless(List<Diff> diffs) Look for single edits surrounded on both sides by equalities which can be shifted sideways to align the edit to a word boundary.private intdiffCleanupSemanticScore(String one, String two) Given two strings, compute a score representing whether the internal boundary falls on logical boundaries.protected intdiffCommonOverlap(String text1, String text2) Determine if the suffix of one string is the prefix of another.private intdiffCommonPrefix(String text1, String text2) Determine the common prefix of two stringsprivate intdiffCommonSuffix(String text1, String text2) Determine the common suffix of two stringsdiffCompute(String text1, String text2, boolean checklines, long deadline) Find the differences between two texts.protected String[]diffHalfMatch(String text1, String text2) Do the two texts share a substring which is at least half the length of the longer dmp?private String[]diffHalfMatchI(String longtext, String shorttext, int i) Does a substring of shorttext exist within longtext such that the substring is at least half the length of longtext?diffLineMode(String text1, String text2, long deadline) Do a quick line-level diff on both strings, then rediff the parts for greater accuracy.diffLinesToChars(String text1, String text2) Split two texts into a list of strings.private StringSplit a dmp into a list of strings.Find the differences between two texts.Find the differences between two texts.Find the differences between two texts.diffPrettyHtml(List<Diff> diffs) Convert a Diff list into a pretty HTML report.matchAlphabet(String pattern) Initialise the alphabet for the Bitap algorithm.protected intmatchbitap(String text, String pattern, int loc) Locate the best instance of 'pattern' in 'dmp' near 'loc' using the Bitap algorithm.private doublematchBitapScore(int e, int x, int loc, String pattern) Compute and return the score for a match with e errors and x location.
-
Field Details
-
DIFFTIMEOUT
private static final float DIFFTIMEOUTNumber of seconds to map a diff before giving up (0 for infinity).- See Also:
-
MATCHTRESHOLD
private static final float MATCHTRESHOLDAt what point is no match declared (0.0 = perfection, 1.0 = very loose).- See Also:
-
MATCHDISTANCE
private static final int MATCHDISTANCEHow far to search for a match (0 = exact location, 1000+ = broad match). A match this many characters away from the expected location will add 1.0 to the score (0.0 is a perfect match).- See Also:
-
MATCHMAXBITS
private static final short MATCHMAXBITSThe number of bits in an int.- See Also:
-
BLANKLINEEND
-
BLANKLINESTART
-
-
Constructor Details
-
DiffMatchPatch
public DiffMatchPatch()
-
-
Method Details
-
diffMain
Find the differences between two texts. Run a faster, slightly less optimal diff. This method allows the 'checklines' of diffmain() to be optional. Most of the time checklines is wanted, so default to true.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.- Returns:
- Linked List of Diff objects.
- Since:
- 23.2.1.26
-
diffMain
Find the differences between two texts.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.checklines- Speedup flag. If false, then don't run a line-level diff first to identify the changed areas. If true, then run a faster slightly less optimal diff.- Returns:
- Linked List of Diff objects.
-
diffMain
Find the differences between two texts. Simplifies the problem by stripping any common prefix or suffix off the texts before diffing.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.checklines- Speedup flag. If false, then don't run a line-level diff first to identify the changed areas. If true, then run a faster slightly less optimal diff.deadline- Time when the diff should be complete by. Used internally for recursive calls. Users should set DiffTimeout instead.- Returns:
- Linked List of Diff objects.
-
diffCompute
Find the differences between two texts. Assumes that the texts do not have any common prefix or suffix.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.checklines- Speedup flag. If false, then don't run a line-level diff first to identify the changed areas. If true, then run a faster slightly less optimal diff.deadline- Time when the diff should be complete by.- Returns:
- Linked List of Diff objects.
-
diffLineMode
Do a quick line-level diff on both strings, then rediff the parts for greater accuracy. This speedup can produce non-minimal diffs.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.deadline- Time when the diff should be complete by.- Returns:
- Linked List of Diff objects.
-
diffBisect
Find the 'middle snake' of a diff, split the problem in two and return the recursively constructed diff. See Myers 1986 paper: An O(ND) Difference Algorithm and Its Variations.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.deadline- Time at which to bail if not yet complete.- Returns:
- List of Diff objects.
- Since:
- 23.2.1.26
-
diffBisectSplit
Given the location of the 'middle snake', split the diff in two parts and recurse.- Parameters:
text1- Old string to be diffed.text2- New string to be diffed.x- Index of split point in text1.y- Index of split point in text2.deadline- Time at which to bail if not yet complete.- Returns:
- List of Diff objects.
-
diffLinesToChars
Split two texts into a list of strings. Reduce the texts to a string of hashes where each Unicode character represents one line.- Parameters:
text1- First string.text2- Second string.- Returns:
- An object containing the encoded text1, the encoded text2 and the List of unique strings. The zeroth element of the List of unique strings is intentionally blank.
-
diffLinesToCharsMunge
private String diffLinesToCharsMunge(String text, List<String> lineArray, Map<String, Integer> lineHash) Split a dmp into a list of strings. Reduce the texts to a string of hashes where each Unicode character represents one line.- Parameters:
text- String to encode.lineArray- List of unique strings.lineHash- Map of strings to indices.- Returns:
- Encoded string.
-
diffCharsToLines
Rehydrate the dmp in a diff from a string of line hashes to real lines of dmp.- Parameters:
diffs- List of Diff objects.lineArray- List of unique strings.
-
diffCommonPrefix
Determine the common prefix of two strings- Parameters:
text1- First string.text2- Second string.- Returns:
- The number of characters common to the start of each string.
-
diffCommonSuffix
Determine the common suffix of two strings- Parameters:
text1- First string.text2- Second string.- Returns:
- The number of characters common to the end of each string.
-
diffCommonOverlap
Determine if the suffix of one string is the prefix of another.- Parameters:
text1- First string.text2- Second string.- Returns:
- The number of characters common to the end of the first string and the start of the second string.
- Since:
- 23.2.1.26
-
diffHalfMatch
Do the two texts share a substring which is at least half the length of the longer dmp? This speedup can produce non-minimal diffs.- Parameters:
text1- First string.text2- Second string.- Returns:
- Five element String array, containing the prefix of text1, the suffix of text1, the prefix of text2, the suffix of text2 and the common middle. Or null if there was no match.
- Since:
- 23.2.1.26
-
diffHalfMatchI
Does a substring of shorttext exist within longtext such that the substring is at least half the length of longtext?- Parameters:
longtext- Longer string.shorttext- Shorter string.i- Start index of quarter length substring within longtext.- Returns:
- Five element String array, containing the prefix of longtext, the suffix of longtext, the prefix of shorttext, the suffix of shorttext and the common middle. Or null if there was no match.
-
diffCleanupSemantic
Reduce the number of edits by eliminating semantically trivial equalities.- Parameters:
diffs- List of Diff objects.
-
diffcleanupSemanticLossless
Look for single edits surrounded on both sides by equalities which can be shifted sideways to align the edit to a word boundary. e.g: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.- Parameters:
diffs- List of Diff objects.
-
diffCleanupSemanticScore
Given two strings, compute a score representing whether the internal boundary falls on logical boundaries. Scores range from 6 (best) to 0 (worst).- Parameters:
one- First string.two- Second string.- Returns:
- The score.
-
diffCleanupMerge
Reorder and merge like edit sections. Merge equalities. Any edit section can move as long as it doesn't cross an equality.- Parameters:
diffs- List of Diff objects.
-
diffPrettyHtml
Convert a Diff list into a pretty HTML report.- Parameters:
diffs- List of Diff objects.- Returns:
- HTML representation.
-
matchbitap
Locate the best instance of 'pattern' in 'dmp' near 'loc' using the Bitap algorithm. Returns -1 if no match found.- Parameters:
text- The dmp to search.pattern- The pattern to search for.loc- The location to search around.- Returns:
- Best match index or -1.
- Since:
- 23.2.1.26
-
matchBitapScore
Compute and return the score for a match with e errors and x location.- Parameters:
e- Number of errors in match.x- Location of match.loc- Expected location of match.pattern- Pattern being sought.- Returns:
- Overall score for match (0.0 = good, 1.0 = bad).
-
matchAlphabet
Initialise the alphabet for the Bitap algorithm.- Parameters:
pattern- The dmp to encode.- Returns:
- Hash of character locations.
- Since:
- 23.2.1.26
-