Class FileEncoding


  • public class FileEncoding
    extends java.lang.Object
    Tries to guess the encoding of a byte sequence. Original code taken from https://github.com/file/file/blob/master/src/encoding.c
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private java.lang.String code  
      private java.lang.String code_mime  
      private static char[] ebcdic_1047_to_8859  
      private static char[] ebcdic_to_ascii  
      private static byte F  
      private static byte I  
      private static byte T  
      private byte[] text_chars  
      private java.lang.String type  
      private static byte X  
    • Constructor Summary

      Constructors 
      Constructor Description
      FileEncoding()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private byte[] from_ebcdic​(byte[] buf, int nbytes)  
      java.lang.String getCode()  
      java.lang.String getCodeMime()  
      java.lang.String getType()  
      boolean guessFileEncoding​(byte[] buf)
      Tries to determine whether the text is in a character encoding that can be identified.
      private boolean looks_ascii​(byte[] buf, int nbytes)  
      private boolean looks_extended​(byte[] buf, int nbytes)  
      private boolean looks_latin1​(byte[] buf, int nbytes)  
      private int looks_ucs16​(byte[] buf, int nbytes)  
      private boolean looks_utf7​(byte[] buf, int nbytes)  
      protected int looks_utf8​(byte[] buf, int nbytes)  
      private boolean looks_utf8_with_BOM​(byte[] buf, int nbytes)  
      private int unsignedByte​(byte value)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • type

        private java.lang.String type
      • code

        private java.lang.String code
      • code_mime

        private java.lang.String code_mime
      • text_chars

        private byte[] text_chars
      • ebcdic_to_ascii

        private static final char[] ebcdic_to_ascii
      • ebcdic_1047_to_8859

        private static final char[] ebcdic_1047_to_8859
    • Constructor Detail

      • FileEncoding

        public FileEncoding()
    • Method Detail

      • getCodeMime

        public java.lang.String getCodeMime()
      • getType

        public java.lang.String getType()
      • getCode

        public java.lang.String getCode()
      • guessFileEncoding

        public boolean guessFileEncoding​(byte[] buf)
        Tries to determine whether the text is in a character encoding that can be identified. EBCDIC input is also recognized, by first converting it to ISO-8859-1.
        Returns:
        true if an encoding could be guessed.
      • looks_ascii

        private boolean looks_ascii​(byte[] buf,
                                    int nbytes)
      • looks_latin1

        private boolean looks_latin1​(byte[] buf,
                                     int nbytes)
      • looks_extended

        private boolean looks_extended​(byte[] buf,
                                       int nbytes)
      • looks_utf8

        protected int looks_utf8​(byte[] buf,
                                 int nbytes)
      • looks_utf8_with_BOM

        private boolean looks_utf8_with_BOM​(byte[] buf,
                                            int nbytes)
      • looks_utf7

        private boolean looks_utf7​(byte[] buf,
                                   int nbytes)
      • looks_ucs16

        private int looks_ucs16​(byte[] buf,
                                int nbytes)
      • from_ebcdic

        private byte[] from_ebcdic​(byte[] buf,
                                   int nbytes)
      • unsignedByte

        private int unsignedByte​(byte value)
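
The page above lists the detection helpers without describing their logic, so the following is a minimal, self-contained sketch of the kind of checks that names such as unsignedByte, looks_ascii, looks_utf8_with_BOM and looks_utf8 suggest. It is not the code of FileEncoding. The class name EncodingGuessSketch, the method names, the returned labels and the order of the checks are all assumptions made for illustration, and UTF-8 validation is delegated to the JDK's strict decoder instead of a hand-written state machine like the one in encoding.c.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not part of the documented FileEncoding class.
public class EncodingGuessSketch {

    /** Widens a signed Java byte to its unsigned value in the range 0..255. */
    public static int unsignedByte(byte value) {
        return value & 0xFF;
    }

    /** Returns true if the buffer starts with the UTF-8 byte order mark EF BB BF. */
    public static boolean startsWithUtf8Bom(byte[] buf) {
        return buf.length >= 3
                && unsignedByte(buf[0]) == 0xEF
                && unsignedByte(buf[1]) == 0xBB
                && unsignedByte(buf[2]) == 0xBF;
    }

    /**
     * Returns true if every byte is below 0x80.
     * Simplification (assumption): control characters are not rejected here.
     */
    public static boolean isSevenBit(byte[] buf) {
        for (byte b : buf) {
            if (unsignedByte(b) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if the JDK's strict UTF-8 decoder accepts the whole buffer. */
    public static boolean isValidUtf8(byte[] buf) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(buf));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns an illustrative label: "us-ascii", "utf-8" or "unknown". */
    public static String guess(byte[] buf) {
        if (isSevenBit(buf)) {
            return "us-ascii";
        }
        if (isValidUtf8(buf)) {
            return "utf-8";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(startsWithUtf8Bom(withBom));
        System.out.println(guess("hello".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(guess("h\u00e9llo".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The mask in unsignedByte matters because Java bytes are signed: without it, a byte such as 0xEF compares as a negative number and never matches the BOM constants.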