module CharDet
Big5 frequency table by Taiwan's Mandarin Promotion Council <www.edu.tw:81/mandr/>
128 --> 0.42261, 256 --> 0.57851, 512 --> 0.74851, 1024 --> 0.89384, 2048 --> 0.97583
Ideal Distribution Ratio = 0.74851 / (1 - 0.74851) = 2.98. Random Distribution Ratio = 512 / (5401 - 512) = 0.105.
Typical Distribution Ratio about 25% of Ideal one, still much higher than RDR
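The arithmetic behind these figures can be reproduced with a short Ruby sketch. The method names below are hypothetical helpers written for illustration and are not part of the CharDet API. The inputs are the Big5 figures quoted above: the 512 most frequent characters cover 0.74851 of the text, out of 5401 characters in total.

```ruby
# Hypothetical helpers, not part of CharDet.

# Ideal ratio: share of text covered by the top-N characters
# divided by the share covered by all remaining characters.
def ideal_distribution_ratio(coverage)
  coverage / (1.0 - coverage)
end

# Random ratio: the same odds if every character were equally likely.
def random_distribution_ratio(top_n, total_chars)
  top_n.to_f / (total_chars - top_n)
end

idr = ideal_distribution_ratio(0.74851)
rdr = random_distribution_ratio(512, 5401)
typical = 0.25 * idr # "about 25% of the ideal one"

puts format("IDR=%.2f RDR=%.3f typical=%.2f", idr, rdr, typical)
```

Even at a quarter of the ideal value, the typical ratio stays several times larger than the random one, which is what makes it usable as a detection threshold.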
BEGIN LICENSE BLOCK ########################
The Original Code is Mozilla Communicator client code.
The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved.
Contributor(s):
Jeff Hodges - port to Ruby Mark Pilgrim - port to Python
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
END LICENSE BLOCK #########################
BEGIN LICENSE BLOCK ########################
The Original Code is Mozilla Universal charset detector code.
The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 2001 the Initial Developer. All Rights Reserved.
Contributor(s):
Jeff Hodges - port to Ruby Mark Pilgrim - port to Python Shy Shalom - original C code
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
END LICENSE BLOCK #########################
BEGIN LICENSE BLOCK ########################
The Original Code is mozilla.org code.
The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved.
Contributor(s):
Jeff Hodges - port to Ruby Mark Pilgrim - port to Python
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
END LICENSE BLOCK #########################
128 --> 0.79, 256 --> 0.92, 512 --> 0.986, 1024 --> 0.99944, 2048 --> 0.99999
Ideal Distribution Ratio = 0.98653 / (1 - 0.98653) = 73.24. Random Distribution Ratio = 512 / (2350 - 512) = 0.279.
Typical Distribution Ratio
128 --> 0.42261, 256 --> 0.57851, 512 --> 0.74851, 1024 --> 0.89384, 2048 --> 0.97583
Ideal Distribution Ratio = 0.74851 / (1 - 0.74851) = 2.98. Random Distribution Ratio = 512 / (5401 - 512) = 0.105.
Typical Distribution Ratio about 25% of Ideal one, still much higher than RDR
Cumulative coverage (incremental gain): 512 --> 0.79 (+0.79), 1024 --> 0.92 (+0.13), 2048 --> 0.98 (+0.06), 6768 --> 1.00 (+0.02)
Ideal Distribution Ratio = 0.79135 / (1 - 0.79135) = 3.79. Random Distribution Ratio = 512 / (3755 - 512) = 0.158.
Typical Distribution Ratio about 25% of Ideal one, still much higher than RDR
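The coverage line above pairs each cumulative figure with the gain over the previous step. A minimal Ruby sketch of that relationship follows; `coverage_increments` is a hypothetical helper written for illustration and is not part of CharDet.

```ruby
# Hypothetical helper, not part of CharDet.
# Takes cumulative coverage keyed by character count and returns
# [count, gain] pairs, where gain is the coverage added at that step.
def coverage_increments(cumulative)
  previous = 0.0
  cumulative.map do |count, covered|
    gain = (covered - previous).round(2)
    previous = covered
    [count, gain]
  end
end

cumulative = { 512 => 0.79, 1024 => 0.92, 2048 => 0.98, 6768 => 1.00 }
coverage_increments(cumulative).each do |count, gain|
  puts format("first %d characters add %.2f", count, gain)
end
```

The gains shrink quickly after the first 512 characters, which is why the ratios in these tables are computed at the 512-character cutoff.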
windows-1255 / ISO-8859-8 code points of interest
128 --> 0.77094, 256 --> 0.85710, 512 --> 0.92635, 1024 --> 0.97130, 2048 --> 0.99431
Ideal Distribution Ratio = 0.92635 / (1 - 0.92635) = 12.58. Random Distribution Ratio = 512 / (2965 + 62 + 83 + 86 - 512) = 0.191.
Typical Distribution Ratio, 25% of IDR
Character Mapping Table: this table is modified based on win1251BulgarianCharToOrderMap, so only numbers < 64 are certain to be valid
BEGIN LICENSE BLOCK ########################
The Original Code is Mozilla Universal charset detector code.
The Initial Developer of the Original Code is Simon Montagu. Portions created by the Initial Developer are Copyright (C) 2005 the Initial Developer. All Rights Reserved.
Contributor(s):
Jeff Hodges - port to Ruby Mark Pilgrim - port to Python Shy Shalom - original C code Shoshannah Forbes - original C code (?)
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
END LICENSE BLOCK #########################
BEGIN LICENSE BLOCK ########################
The Original Code is Mozilla Universal charset detector code.
The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 2001 the Initial Developer. All Rights Reserved.
Contributor(s):
Jeff Hodges - port to Ruby Mark Pilgrim - port to Python Shy Shalom - original C code Proofpoint, Inc.
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
END LICENSE BLOCK #########################
Constants
- ACO
- ACV
- ASC
- ASO
- ASS
- ASV
- BIG5_TABLE_SIZE
Char to FreqOrder table
- BIG5_TYPICAL_DISTRIBUTION_RATIO
- BIG5_cls
BIG5
- BIG5_st
- Big5CharLenTable
- Big5CharToFreqOrder
- Big5SMModel
- BulgarianLangModel
Model Table: total sequences: 100%, first 512 sequences: 96.9392%, first 1024 sequences: 3.0618%, rest sequences: 0.2992%, negative sequences: 0.0020%
- CLASS_NUM
- DONT_KNOW
- EDetecting
- EError
- EEscAscii
- EFoundIt
- EHighbyte
- EItsMe
- ENOUGH_DATA_THRESHOLD
- ENOUGH_REL_THRESHOLD
- ENotMe
- EPureAscii
- EStart
- EUCJPCharLenTable
- EUCJPSMModel
- EUCJP_cls
EUC-JP
- EUCJP_st
- EUCKRCharLenTable
- EUCKRCharToFreqOrder
Char to FreqOrder table
- EUCKRSMModel
- EUCKR_TABLE_SIZE
- EUCKR_TYPICAL_DISTRIBUTION_RATIO
- EUCKR_cls
EUC-KR
- EUCKR_st
- EUCTWCharLenTable
- EUCTWCharToFreqOrder
- EUCTWSMModel
- EUCTW_TABLE_SIZE
Char to FreqOrder table
- EUCTW_TYPICAL_DISTRIBUTION_RATIO
- EUCTW_cls
EUC-TW
- EUCTW_st
- FINAL_KAF
- FINAL_MEM
- FINAL_NUN
- FINAL_PE
- FINAL_TSADI
- FREQ_CAT_NUM
- GB18030CharLenTable
To be accurate, the length of class 6 can be either 2 or 4. But it is not necessary to discriminate between the two, since it is used for frequency analysis only and we are validating each code range there as well. So it is safe to set it to 2 here.
- GB18030CharToFreqOrder
- GB18030SMModel
- GB18030_TABLE_SIZE
- GB18030_TYPICAL_DISTRIBUTION_RATIO
- GB18030_cls
GB18030
- GB18030_st
- GreekLangModel
Model Table: total sequences: 100% first 512 sequences: 98.2851% first 1024 sequences: 1.7001% rest sequences: 0.0359% negative sequences: 0.0148%
- HZCharLenTable
- HZSMModel
- HZ_cls
- HZ_st
- HebrewLangModel
Model Table: total sequences: 100% first 512 sequences: 98.4004% first 1024 sequences: 1.5981% rest sequences: 0.087% negative sequences: 0.0015%
- HungarianLangModel
Model Table: total sequences: 100% first 512 sequences: 94.7368% first 1024 sequences: 5.2623% rest sequences: 0.8894% negative sequences: 0.0009%
- IBM855_CharToOrderMap
- IBM866_CharToOrderMap
- ISO2022CNCharLenTable
- ISO2022CNSMModel
- ISO2022CN_cls
- ISO2022CN_st
- ISO2022JPCharLenTable
- ISO2022JPSMModel
- ISO2022JP_cls
- ISO2022JP_st
- ISO2022KRCharLenTable
- ISO2022KRSMModel
- ISO2022KR_cls
- ISO2022KR_st
- Ibm855Model
- Ibm866Model
- JISCharToFreqOrder
- JIS_TABLE_SIZE
Char to FreqOrder table
- JIS_TYPICAL_DISTRIBUTION_RATIO
- JP2_CHAR_CONTEXT
This is the hiragana 2-char sequence table; the number in each cell represents its frequency category.
- KOI8R_CharToOrderMap
KOI8-R language model Character Mapping Table:
- Koi8rModel
- LOGICAL_HEBREW_NAME
- Latin1ClassModel
0: illegal, 1: very unlikely, 2: normal, 3: very likely
- Latin1_CharToClass
- Latin2HungarianModel
- Latin2_HungarianCharToOrderMap
Character Mapping Table:
- Latin5BulgarianModel
- Latin5CyrillicModel
- Latin5_BulgarianCharToOrderMap
- Latin7GreekModel
- Latin7_CharToOrderMap
Character Mapping Table:
- MAX_REL_THRESHOLD
- MINIMUM_DATA_THRESHOLD
- MINIMUM_THRESHOLD
- MIN_FINAL_CHAR_DISTANCE
Minimum Visual vs Logical final letter score difference. If the difference is below this, don't rely solely on the final letter score distance.
- MIN_MODEL_DISTANCE
Minimum Visual vs Logical model score difference. If the difference is below this, don't rely at all on the model score distance.
- MacCyrillicModel
- NEGATIVE_SHORTCUT_THRESHOLD
- NORMAL_KAF
- NORMAL_MEM
- NORMAL_NUN
- NORMAL_PE
- NORMAL_TSADI
- NUMBER_OF_SEQ_CAT
- NUM_OF_CATEGORY
- ONE_CHAR_PROB
- OTH
- POSITIVE_CAT
- POSITIVE_SHORTCUT_THRESHOLD
- RussianLangModel
Model Table: total sequences: 100% first 512 sequences: 97.6601% first 1024 sequences: 2.3389% rest sequences: 0.1237% negative sequences: 0.0009%
- SAMPLE_SIZE
- SB_ENOUGH_REL_THRESHOLD
- SHORTCUT_THRESHOLD
- SJISCharLenTable
- SJISSMModel
- SJIS_cls
Shift_JIS
- SJIS_st
- SURE_NO
- SURE_YES
- SYMBOL_CAT_ORDER
- TIS620CharToOrderMap
Character Mapping Table:
- TIS620ThaiModel
- ThaiLangModel
Model Table: total sequences: 100% first 512 sequences: 92.6386% first 1024 sequences: 7.3177% rest sequences: 1.0230% negative sequences: 0.0436%
- UCS2BECharLenTable
- UCS2BESMModel
- UCS2BE_cls
UCS2-BE
- UCS2BE_st
- UCS2LECharLenTable
- UCS2LESMModel
- UCS2LE_cls
UCS2-LE
- UCS2LE_st
- UDF
- UTF8CharLenTable
- UTF8SMModel
- UTF8_cls
UTF-8
- UTF8_st
- VERSION
- VISUAL_HEBREW_NAME
- Win1250HungarianCharToOrderMap
- Win1250HungarianModel
- Win1251BulgarianModel
- Win1251CyrillicModel
- Win1253GreekModel
- Win1253_CharToOrderMap
- Win1255HebrewModel
- Win1255_CharToOrderMap
Windows-1255 language model Character Mapping Table:
Public Class Methods
# File lib/rchardet.rb, line 58
def CharDet.detect(aBuf)
  aBuf = aBuf.dup.force_encoding(Encoding::BINARY)
  u = UniversalDetector.new
  u.reset
  u.feed(aBuf)
  u.close
  u.result
end
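The first statement of CharDet.detect copies the buffer and relabels the copy as binary before feeding it to the detector. The sketch below isolates that one step using only core Ruby, so it runs without the gem. The helper name to_binary_copy is hypothetical and not part of rchardet; it only illustrates what dup.force_encoding(Encoding::BINARY) appears to achieve: the caller's string keeps its encoding, while the copy exposes the same bytes with no character interpretation.

```ruby
# Hypothetical helper (not part of rchardet) mirroring the first line
# of CharDet.detect: duplicate the input, then tag the copy as binary.
def to_binary_copy(buf)
  buf.dup.force_encoding(Encoding::BINARY)
end

original = "caf\u00E9"              # UTF-8 text: 4 characters, 5 bytes
copy     = to_binary_copy(original)

# The caller's string is untouched because the relabeling happens on a dup.
puts original.encoding == Encoding::UTF_8   # true
puts copy.encoding == Encoding::BINARY      # true

# force_encoding changes only the label; the underlying bytes are identical.
puts copy.bytes == original.bytes           # true

# With the binary label, length counts bytes rather than characters.
puts original.length                        # 4
puts copy.length                            # 5
```

Working on a binary copy means the detector sees raw bytes regardless of how the input was labeled, and a frozen or shared input string is never mutated.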