Class DiffMatchPatch

java.lang.Object
fr.becpg.common.diff.DiffMatchPatch

public class DiffMatchPatch extends Object
Class containing the diff, match and patch methods. Also contains the behaviour settings.
Author:
matthieu
  • Field Details

    • DIFFTIMEOUT

      private static final float DIFFTIMEOUT
      Number of seconds to map a diff before giving up (0 for infinity).
      See Also:
    • MATCHTRESHOLD

      private static final float MATCHTRESHOLD
      At what point is no match declared (0.0 = perfection, 1.0 = very loose).
      See Also:
    • MATCHDISTANCE

      private static final int MATCHDISTANCE
      How far to search for a match (0 = exact location, 1000+ = broad match). A match this many characters away from the expected location will add 1.0 to the score (0.0 is a perfect match).
      See Also:
    • MATCHMAXBITS

      private static final short MATCHMAXBITS
      The number of bits in an int.
      See Also:
    • BLANKLINEEND

      private static final Pattern BLANKLINEEND
    • BLANKLINESTART

      private static final Pattern BLANKLINESTART
  • Constructor Details

    • DiffMatchPatch

      public DiffMatchPatch()
  • Method Details

    • diffMain

      public List<Diff> diffMain(String text1, String text2)
      Find the differences between two texts. Run a faster, slightly less optimal diff. This method allows the 'checklines' of diffmain() to be optional. Most of the time checklines is wanted, so default to true.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      Returns:
      Linked List of Diff objects.
      Since:
      23.2.1.26
    • diffMain

      private List<Diff> diffMain(String text1, String text2, boolean checklines)
      Find the differences between two texts.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      checklines - Speedup flag. If false, then don't run a line-level diff first to identify the changed areas. If true, then run a faster slightly less optimal diff.
      Returns:
      Linked List of Diff objects.
    • diffMain

      private List<Diff> diffMain(String text1, String text2, boolean checklines, long deadline)
      Find the differences between two texts. Simplifies the problem by stripping any common prefix or suffix off the texts before diffing.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      checklines - Speedup flag. If false, then don't run a line-level diff first to identify the changed areas. If true, then run a faster slightly less optimal diff.
      deadline - Time when the diff should be complete by. Used internally for recursive calls. Users should set DiffTimeout instead.
      Returns:
      Linked List of Diff objects.
    • diffCompute

      private List<Diff> diffCompute(String text1, String text2, boolean checklines, long deadline)
      Find the differences between two texts. Assumes that the texts do not have any common prefix or suffix.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      checklines - Speedup flag. If false, then don't run a line-level diff first to identify the changed areas. If true, then run a faster slightly less optimal diff.
      deadline - Time when the diff should be complete by.
      Returns:
      Linked List of Diff objects.
    • diffLineMode

      private List<Diff> diffLineMode(String text1, String text2, long deadline)
      Do a quick line-level diff on both strings, then rediff the parts for greater accuracy. This speedup can produce non-minimal diffs.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      deadline - Time when the diff should be complete by.
      Returns:
      Linked List of Diff objects.
    • diffBisect

      protected List<Diff> diffBisect(String text1, String text2, long deadline)
      Find the 'middle snake' of a diff, split the problem in two and return the recursively constructed diff. See Myers 1986 paper: An O(ND) Difference Algorithm and Its Variations.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      deadline - Time at which to bail if not yet complete.
      Returns:
      List of Diff objects.
      Since:
      23.2.1.26
    • diffBisectSplit

      private List<Diff> diffBisectSplit(String text1, String text2, int x, int y, long deadline)
      Given the location of the 'middle snake', split the diff in two parts and recurse.
      Parameters:
      text1 - Old string to be diffed.
      text2 - New string to be diffed.
      x - Index of split point in text1.
      y - Index of split point in text2.
      deadline - Time at which to bail if not yet complete.
      Returns:
      List of Diff objects.
    • diffLinesToChars

      private DiffMatchPatch.LinesToCharsResult diffLinesToChars(String text1, String text2)
      Split two texts into a list of strings. Reduce the texts to a string of hashes where each Unicode character represents one line.
      Parameters:
      text1 - First string.
      text2 - Second string.
      Returns:
      An object containing the encoded text1, the encoded text2 and the List of unique strings. The zeroth element of the List of unique strings is intentionally blank.
    • diffLinesToCharsMunge

      private String diffLinesToCharsMunge(String text, List<String> lineArray, Map<String,Integer> lineHash)
      Split a dmp into a list of strings. Reduce the texts to a string of hashes where each Unicode character represents one line.
      Parameters:
      text - String to encode.
      lineArray - List of unique strings.
      lineHash - Map of strings to indices.
      Returns:
      Encoded string.
    • diffCharsToLines

      private void diffCharsToLines(List<Diff> diffs, List<String> lineArray)
      Rehydrate the dmp in a diff from a string of line hashes to real lines of dmp.
      Parameters:
      diffs - List of Diff objects.
      lineArray - List of unique strings.
    • diffCommonPrefix

      private int diffCommonPrefix(String text1, String text2)
      Determine the common prefix of two strings
      Parameters:
      text1 - First string.
      text2 - Second string.
      Returns:
      The number of characters common to the start of each string.
    • diffCommonSuffix

      private int diffCommonSuffix(String text1, String text2)
      Determine the common suffix of two strings
      Parameters:
      text1 - First string.
      text2 - Second string.
      Returns:
      The number of characters common to the end of each string.
    • diffCommonOverlap

      protected int diffCommonOverlap(String text1, String text2)
      Determine if the suffix of one string is the prefix of another.
      Parameters:
      text1 - First string.
      text2 - Second string.
      Returns:
      The number of characters common to the end of the first string and the start of the second string.
      Since:
      23.2.1.26
    • diffHalfMatch

      protected String[] diffHalfMatch(String text1, String text2)
      Do the two texts share a substring which is at least half the length of the longer dmp? This speedup can produce non-minimal diffs.
      Parameters:
      text1 - First string.
      text2 - Second string.
      Returns:
      Five element String array, containing the prefix of text1, the suffix of text1, the prefix of text2, the suffix of text2 and the common middle. Or null if there was no match.
      Since:
      23.2.1.26
    • diffHalfMatchI

      private String[] diffHalfMatchI(String longtext, String shorttext, int i)
      Does a substring of shorttext exist within longtext such that the substring is at least half the length of longtext?
      Parameters:
      longtext - Longer string.
      shorttext - Shorter string.
      i - Start index of quarter length substring within longtext.
      Returns:
      Five element String array, containing the prefix of longtext, the suffix of longtext, the prefix of shorttext, the suffix of shorttext and the common middle. Or null if there was no match.
    • diffCleanupSemantic

      private void diffCleanupSemantic(List<Diff> diffs)
      Reduce the number of edits by eliminating semantically trivial equalities.
      Parameters:
      diffs - List of Diff objects.
    • diffcleanupSemanticLossless

      private void diffcleanupSemanticLossless(List<Diff> diffs)
      Look for single edits surrounded on both sides by equalities which can be shifted sideways to align the edit to a word boundary. e.g: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.
      Parameters:
      diffs - List of Diff objects.
    • diffCleanupSemanticScore

      private int diffCleanupSemanticScore(String one, String two)
      Given two strings, compute a score representing whether the internal boundary falls on logical boundaries. Scores range from 6 (best) to 0 (worst).
      Parameters:
      one - First string.
      two - Second string.
      Returns:
      The score.
    • diffCleanupMerge

      private void diffCleanupMerge(List<Diff> diffs)
      Reorder and merge like edit sections. Merge equalities. Any edit section can move as long as it doesn't cross an equality.
      Parameters:
      diffs - List of Diff objects.
    • diffPrettyHtml

      public String diffPrettyHtml(List<Diff> diffs)
      Convert a Diff list into a pretty HTML report.
      Parameters:
      diffs - List of Diff objects.
      Returns:
      HTML representation.
    • matchbitap

      protected int matchbitap(String text, String pattern, int loc)
      Locate the best instance of 'pattern' in 'dmp' near 'loc' using the Bitap algorithm. Returns -1 if no match found.
      Parameters:
      text - The dmp to search.
      pattern - The pattern to search for.
      loc - The location to search around.
      Returns:
      Best match index or -1.
      Since:
      23.2.1.26
    • matchBitapScore

      private double matchBitapScore(int e, int x, int loc, String pattern)
      Compute and return the score for a match with e errors and x location.
      Parameters:
      e - Number of errors in match.
      x - Location of match.
      loc - Expected location of match.
      pattern - Pattern being sought.
      Returns:
      Overall score for match (0.0 = good, 1.0 = bad).
    • matchAlphabet

      protected Map<Character,Integer> matchAlphabet(String pattern)
      Initialise the alphabet for the Bitap algorithm.
      Parameters:
      pattern - The dmp to encode.
      Returns:
      Hash of character locations.
      Since:
      23.2.1.26