Class SimilarityRenameDetector


  • class SimilarityRenameDetector
    extends java.lang.Object
    • Field Detail

      • BITS_PER_INDEX

        private static final int BITS_PER_INDEX
        Number of bits we need to express an index into src or dst list.

        This must be 28, giving us a limit of 2^28 (268,435,456) entries in either list, or 536,870,912 file names across both lists, which is far more than any realistic rename pass will consider. The other 8 bits are used to store the score, which must stay at or below 127 so the long doesn't go negative.

        See Also:
        Constant Field Values
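
The bit budget described for BITS_PER_INDEX can be checked with a few lines of arithmetic. The following is a minimal sketch, not the library's code: only BITS_PER_INDEX comes from the documentation above, while the class name RenameIndexBudget and the helpers SCORE_BITS, maxEntriesPerList and maxScore are hypothetical names introduced for illustration.

```java
// Illustrative sketch only; every name except BITS_PER_INDEX is hypothetical.
final class RenameIndexBudget {
    // Documented value: 28 bits per list index.
    static final int BITS_PER_INDEX = 28;

    // Bits left in a 64-bit long after two indexes: 64 - 2 * 28 = 8.
    static final int SCORE_BITS = Long.SIZE - 2 * BITS_PER_INDEX;

    /** Number of distinct indexes that fit in one 28-bit group. */
    static long maxEntriesPerList() {
        return 1L << BITS_PER_INDEX;
    }

    /** Largest score that leaves the sign bit of the long clear. */
    static int maxScore() {
        return (1 << (SCORE_BITS - 1)) - 1;
    }

    /** True if a score shifted into the top byte keeps the long non-negative. */
    static boolean staysNonNegative(int score) {
        return (((long) score) << (2 * BITS_PER_INDEX)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println("score bits:       " + SCORE_BITS);
        System.out.println("entries per list: " + maxEntriesPerList());
        System.out.println("max score:        " + maxScore());
        System.out.println("127 is safe:      " + staysNonNegative(127));
        System.out.println("128 is safe:      " + staysNonNegative(128));
    }
}
```

A score of 128 would occupy the top bit of the high byte, which is the sign bit of the long, so 127 is the largest value that keeps an entry non-negative.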
      • srcs

        private java.util.List<DiffEntry> srcs
        All sources to consider for copies or renames.

        A source is typically a DiffEntry.ChangeType.DELETE change, but could be another type when trying to perform copy detection concurrently with rename detection.

      • dsts

        private java.util.List<DiffEntry> dsts
        All destinations to consider when looking for a rename.

        A destination is typically a DiffEntry.ChangeType.ADD change, as the name has just come into existence, and we want to discover where its initial content came from.

      • matrix

        private long[] matrix
        Matrix of all examined file pairs, and their scores.

        The upper 8 bits of each long store the score, but the score is bounded to the range [0, 128) so that the highest bit is never set, and all entries are therefore non-negative.

        List indexes into srcs and dsts are encoded as the lower two groups of 28 bits, respectively, but the encoding is inverted, so that 0 is expressed as (1 << 28) - 1. This sorts lower list indexes later in the matrix, giving precedence to files whose names sort earlier in the tree.
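
The packing and the inverted index encoding can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the method names mirror the signatures listed under Method Detail, but their bodies here are guesses consistent with the description above, and the placement of the source index in the upper 28-bit group (with the destination index in the lowest group) is an assumption. The class name MatrixEncodingSketch and the helper sortedCopy are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; bodies are assumptions based on the field description.
final class MatrixEncodingSketch {
    static final int BITS_PER_INDEX = 28;
    static final int INDEX_MASK = (1 << BITS_PER_INDEX) - 1;
    static final int SCORE_SHIFT = 2 * BITS_PER_INDEX;

    // Inverted encoding: index 0 becomes (1 << 28) - 1, larger indexes become smaller values.
    static long encodeFile(int idx) {
        return INDEX_MASK - idx;
    }

    static int decodeFile(int v) {
        return INDEX_MASK - v;
    }

    // Assumed layout: [ score: 8 bits | source index: 28 bits | destination index: 28 bits ].
    static long encode(int score, int srcIdx, int dstIdx) {
        return (((long) score) << SCORE_SHIFT)
                | (encodeFile(srcIdx) << BITS_PER_INDEX)
                | encodeFile(dstIdx);
    }

    static int score(long value) {
        return (int) (value >>> SCORE_SHIFT);
    }

    static int srcFile(long value) {
        return decodeFile(((int) (value >>> BITS_PER_INDEX)) & INDEX_MASK);
    }

    static int dstFile(long value) {
        return decodeFile(((int) value) & INDEX_MASK);
    }

    /** Returns the entries sorted ascending, as a plain numeric sort of the longs. */
    static long[] sortedCopy(long... entries) {
        long[] copy = entries.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        long[] sorted = sortedCopy(encode(60, 1, 3), encode(90, 7, 0), encode(60, 5, 2));
        for (long e : sorted) {
            System.out.println(score(e) + " src=" + srcFile(e) + " dst=" + dstFile(e));
        }
    }
}
```

Because the indexes are inverted, a plain ascending sort of the longs places the entry with source index 5 before the entry with source index 1 when both have a score of 60. A consumer that walks the sorted array from the end (an assumption about how the matrix is read) therefore sees the highest score first and, among equal scores, the lower list index first.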

      • renameScore

        private int renameScore
        Score a pair must exceed to be considered a rename.
      • bigFileThreshold

        private int bigFileThreshold
        File size threshold (in bytes) for detecting renames. Files larger than this size will not be processed for renames.
      • skipBinaryFiles

        private boolean skipBinaryFiles
        Skip content renames for binary files.
    • Method Detail

      • setRenameScore

        void setRenameScore(int score)
      • setBigFileThreshold

        void setBigFileThreshold(int threshold)
      • setSkipBinaryFiles

        void setSkipBinaryFiles(boolean value)
      • getMatches

        java.util.List<DiffEntry> getMatches()
      • getLeftOverSources

        java.util.List<DiffEntry> getLeftOverSources()
      • getLeftOverDestinations

        java.util.List<DiffEntry> getLeftOverDestinations()
      • isTableOverflow

        boolean isTableOverflow()
      • compactSrcList

        private static java.util.List<DiffEntry> compactSrcList(java.util.List<DiffEntry> in)
      • compactDstList

        private static java.util.List<DiffEntry> compactDstList(java.util.List<DiffEntry> in)
      • nameScore

        static int nameScore(java.lang.String a,
                             java.lang.String b)
      • size

        private long size(DiffEntry.Side side,
                          DiffEntry ent)
                   throws java.io.IOException
        Throws:
        java.io.IOException
      • score

        private static int score(long value)
      • srcFile

        static int srcFile(long value)
      • dstFile

        static int dstFile(long value)
      • encode

        static long encode(int score,
                           int srcIdx,
                           int dstIdx)
      • encodeFile

        private static long encodeFile(int idx)
      • decodeFile

        private static int decodeFile(int v)
      • isFile

        private static boolean isFile(FileMode mode)