libtranscript
Data Structures
struct | transcript_name_t
A structure holding a display name and availability information about a converter.
struct | transcript_t
An opaque structure describing a converter and its state.
Macros
#define | TRANSCRIPT_MIN_BUFFER_SIZE
Minimum required size for an output buffer for either transcript_to_unicode or transcript_from_unicode, if M:N conversions are allowed.
#define | TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE
Minimum required size for an output buffer for transcript_from_unicode, if M:N conversions are allowed.
#define | TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE
Minimum required size for an output buffer for transcript_to_unicode, if M:N conversions are allowed.
#define | TRANSCRIPT_SAVE_STATE_SIZE
Required size of a buffer for saving converter state.
#define | TRANSCRIPT_VERSION
The version of libtranscript encoded as a single integer.
Functions
void | transcript_close_converter (transcript_t *handle)
Close a converter.
int | transcript_equal (const char *name_a, const char *name_b)
Check if two names describe the same converter.
void | transcript_finalize (void)
Finalize the library use.
transcript_error_t | transcript_from_unicode (transcript_t *handle, const char **inbuf, const char *inbuflimit, char **outbuf, const char *outbuflimit, int flags)
Convert a buffer from Unicode to a character set.
transcript_error_t | transcript_from_unicode_flush (transcript_t *handle, char **outbuf, const char *outbuflimit)
Write out any bytes required to create a legal output in a character set.
void | transcript_from_unicode_reset (transcript_t *handle)
Reset the from-Unicode conversion to its initial state.
transcript_error_t | transcript_from_unicode_skip (transcript_t *handle, const char **inbuf, const char *inbuflimit)
Skip the next character in Unicode encoding.
const char * | transcript_get_codeset (void)
Get a character string describing the current character set indicated by the environment.
const transcript_name_t * | transcript_get_names (int *count)
Retrieve the list of display names known to this instantiation of the library.
long | transcript_get_version (void)
Get the value of TRANSCRIPT_VERSION corresponding to the actually used library.
transcript_error_t | transcript_handle_unassigned (transcript_t *handle, uint32_t codepoint, char **outbuf, const char *outbuflimit, int flags)
Handle an unassigned codepoint in a from-Unicode conversion.
transcript_error_t | transcript_init (void)
Initialize the library.
void | transcript_load_state (transcript_t *handle, void *state)
Restore a converter's state.
void | transcript_normalize_name (const char *name, char *normalized_name, size_t normalized_name_max)
Normalize a character set name.
transcript_t * | transcript_open_converter (const char *name, transcript_utf_t utf_type, int flags, transcript_error_t *error)
Open a converter.
int | transcript_probe_converter (const char *name)
Check if a named converter is available.
void | transcript_save_state (transcript_t *handle, void *state)
Save a converter's state.
const char * | transcript_strerror (transcript_error_t error)
Get a localized descriptive string for an error code.
transcript_error_t | transcript_to_unicode (transcript_t *handle, const char **inbuf, const char *inbuflimit, char **outbuf, const char *outbuflimit, int flags)
Convert a buffer from a character set to Unicode.
void | transcript_to_unicode_reset (transcript_t *handle)
Reset the to-Unicode conversion to its initial state.
transcript_error_t | transcript_to_unicode_skip (transcript_t *handle, const char **inbuf, const char *inbuflimit)
Skip the next character in character set encoding.
#define TRANSCRIPT_MIN_BUFFER_SIZE
Minimum required size for an output buffer for either transcript_to_unicode or transcript_from_unicode, if M:N conversions are allowed.
#define TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE
Minimum required size for an output buffer for transcript_from_unicode, if M:N conversions are allowed.
#define TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE
Minimum required size for an output buffer for transcript_to_unicode, if M:N conversions are allowed.
#define TRANSCRIPT_SAVE_STATE_SIZE
Required size of a buffer for saving converter state.
#define TRANSCRIPT_VERSION
The version of libtranscript encoded as a single integer.
The least significant 8 bits represent the patch level. The second 8 bits represent the minor version. The third 8 bits represent the major version.
At runtime, the value of TRANSCRIPT_VERSION can be retrieved by calling transcript_get_version.
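The bit layout described above can be unpacked with shifts and masks. The following is a minimal sketch: the type and function names are hypothetical helpers, not part of libtranscript, and it assumes only the 8/8/8 layout stated in the text.

```c
/* Hypothetical helper type for illustration; not part of libtranscript. */
typedef struct {
    int major;
    int minor;
    int patch;
} sketch_version_t;

/* Split a packed version number into its three 8-bit fields:
   bits 0-7 hold the patch level, bits 8-15 the minor version,
   bits 16-23 the major version. */
sketch_version_t sketch_unpack_version(long version) {
    sketch_version_t v;
    v.patch = (int)(version & 0xff);
    v.minor = (int)((version >> 8) & 0xff);
    v.major = (int)((version >> 16) & 0xff);
    return v;
}

/* Pack three fields back into a single integer using the same layout. */
long sketch_pack_version(int major, int minor, int patch) {
    return ((long)(major & 0xff) << 16) | ((long)(minor & 0xff) << 8) |
           (long)(patch & 0xff);
}
```

In a program linked against the library, the argument would be TRANSCRIPT_VERSION (the version compiled against) or the result of transcript_get_version (the version in use at runtime).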
enum transcript_error_t
Error values.
Enumerator | |
---|---|
TRANSCRIPT_SUCCESS | All OK. |
TRANSCRIPT_NO_SPACE | There was no space left in the output buffer. |
TRANSCRIPT_INCOMPLETE | The buffer ended with an incomplete sequence, or more data was needed to verify an M:N conversion. |
TRANSCRIPT_FALLBACK | The next character to convert is a fallback mapping. |
TRANSCRIPT_UNASSIGNED | The next character to convert is an unassigned sequence. |
TRANSCRIPT_ILLEGAL | The input is an illegal sequence. |
TRANSCRIPT_ILLEGAL_END | The end of the input does not form a valid sequence. |
TRANSCRIPT_INTERNAL_ERROR | An internal error occurred in the transcript library; no recovery is possible. |
TRANSCRIPT_PRIVATE_USE | The next character to convert maps to a private use codepoint. |
TRANSCRIPT_ERRNO | See errno for error code. |
TRANSCRIPT_BAD_ARG | Bad argument. |
TRANSCRIPT_OUT_OF_MEMORY | Out of memory. |
TRANSCRIPT_INVALID_FORMAT | Invalid format while reading conversion map. |
TRANSCRIPT_TRUNCATED_MAP | Tried to read a truncated conversion map. |
TRANSCRIPT_WRONG_VERSION | Conversion map is of an unsupported version. |
TRANSCRIPT_INTERNAL_TABLE | Tried to load a table that is for internal use only. |
TRANSCRIPT_DLOPEN_FAILURE | Opening of the plugin failed. |
TRANSCRIPT_CONVERTER_DISABLED | The converter has been explicitly disabled. |
TRANSCRIPT_PACKAGE_FILE | The converter name references a converter package file, not an actual converter. |
TRANSCRIPT_INIT_DLFCN | Could not initialize dynamic module loading functionality. |
TRANSCRIPT_NOT_INITIALIZED | transcript_init has not been called yet. |
TRANSCRIPT_PART_SUCCESS_MAX | Highest error code which indicates success or end-of-buffer. |
enum transcript_flags_t
Flags for converters and conversions.
Enumerator | |
---|---|
TRANSCRIPT_ALLOW_FALLBACK | Include fallback characters in the conversion. This flag is only used by transcript_from_unicode. |
TRANSCRIPT_SUBST_UNASSIGNED | Automatically replace unmappable characters with substitute characters. |
TRANSCRIPT_SUBST_ILLEGAL | Automatically insert a substitution character on illegal input. |
TRANSCRIPT_ALLOW_PRIVATE_USE | Allow private-use mappings. If not allowed, they are handled like unassigned sequences, with the exception that they return a different error. |
TRANSCRIPT_FILE_START | The beginning of the input buffer is the beginning of a file and a BOM should be expected/generated. |
TRANSCRIPT_END_OF_TEXT | The end of the input buffer is the end of the text. This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode. |
TRANSCRIPT_SINGLE_CONVERSION | Only convert the next character, then return (useful for handling fallback/unassigned characters etc). This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode. |
TRANSCRIPT_NO_MN_CONVERSION | Do not use M:N conversions. This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode. |
TRANSCRIPT_NO_1N_CONVERSION | Do not use 1:N conversions. Implies TRANSCRIPT_NO_MN_CONVERSION. This flag is only valid when passed to transcript_from_unicode or transcript_to_unicode. |
void transcript_close_converter(transcript_t *handle)
Close a converter.
Parameters
handle | The converter to close.
This function releases all memory associated with handle. handle may be NULL.
int transcript_equal(const char *name_a, const char *name_b)
Check if two names describe the same converter.
Parameters
name_a |
name_b |
void transcript_finalize(void)
Finalize the library use.
This function releases all memory used by the library once it has been called as many times as transcript_init has been called. Calling this function is not necessary, but may be useful when trying to find memory leaks.
transcript_error_t transcript_from_unicode(transcript_t *handle, const char **inbuf, const char *inbuflimit, char **outbuf, const char *outbuflimit, int flags)
Convert a buffer from Unicode to a character set.
Parameters
handle | The converter to use.
inbuf | A double pointer to the start of the input buffer.
inbuflimit | A pointer to the end of the input buffer.
outbuf | A double pointer to the start of the output buffer.
outbuflimit | A pointer to the end of the output buffer.
flags | Flags for this conversion (see transcript_flags_t for possible values).
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_NO_SPACE
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_FALLBACK
TRANSCRIPT_UNASSIGNED
TRANSCRIPT_ILLEGAL
TRANSCRIPT_ILLEGAL_END
TRANSCRIPT_INTERNAL_ERROR
TRANSCRIPT_PRIVATE_USE
This function uses the converter indicated by handle to convert data from Unicode to the character set that was named when handle was opened. The interface is designed to work with incomplete buffers, and may return TRANSCRIPT_INCOMPLETE if the bytes at the end of the input buffer do not form a complete sequence. If the output buffer is not large enough to store all the converted data, TRANSCRIPT_NO_SPACE is returned.
If M:N conversions are enabled, the output buffer must be able to hold at least 32 bytes (TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE).
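The start/limit buffer convention can be illustrated without the library itself. The sketch below uses a hypothetical stand-in, toy_convert, which merely copies bytes but takes the same four buffer parameters; the toy_ names and error codes are invented for illustration. It assumes, as the double-pointer parameters suggest, that a conversion call advances *inbuf and *outbuf past the data it consumed and produced, so the caller can drain the output and call again after a "no space" result.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real codes are defined by transcript_error_t. */
typedef enum { TOY_SUCCESS, TOY_NO_SPACE } toy_error_t;

/* Toy "converter" with the same buffer parameters as the conversion
   functions: copies bytes unchanged, advances *inbuf and *outbuf, and
   reports TOY_NO_SPACE when the output buffer fills up first. */
toy_error_t toy_convert(const char **inbuf, const char *inbuflimit,
                        char **outbuf, const char *outbuflimit) {
    while (*inbuf < inbuflimit) {
        if (*outbuf >= outbuflimit)
            return TOY_NO_SPACE;
        *(*outbuf)++ = *(*inbuf)++;
    }
    return TOY_SUCCESS;
}

/* Drive the converter with a small fixed-size output chunk, draining the
   chunk into sink each time it fills. Returns the number of bytes stored. */
size_t toy_convert_all(const char *input, size_t input_len,
                       char *sink, size_t sink_size) {
    char chunk[8];
    const char *in = input;
    const char *in_limit = input + input_len;
    size_t total = 0;
    toy_error_t result;

    do {
        char *out = chunk; /* restart at the beginning of the chunk */
        size_t produced;

        result = toy_convert(&in, in_limit, &out, chunk + sizeof(chunk));
        produced = (size_t)(out - chunk);
        if (total + produced > sink_size)
            break; /* sink too small; stop rather than overflow */
        memcpy(sink + total, chunk, produced);
        total += produced;
    } while (result == TOY_NO_SPACE);
    return total;
}
```

With the real functions, the loop would additionally have to deal with the other return values listed above, and the chunk would need to be at least TRANSCRIPT_MIN_CODEPAGE_BUFFER_SIZE bytes when M:N conversions are enabled.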
transcript_error_t transcript_from_unicode_flush(transcript_t *handle, char **outbuf, const char *outbuflimit)
Write out any bytes required to create a legal output in a character set.
Parameters
handle | The converter to use.
outbuf | A double pointer to the start of the output buffer.
outbuflimit | A pointer to the end of the output buffer.
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_NO_SPACE
TRANSCRIPT_INTERNAL_ERROR
Some stateful encoding converters need to store a shift sequence or some closing bytes at the end of the output, which can only be computed once it is known that there is no more input. For efficiency reasons, this is not done based on the TRANSCRIPT_END_OF_TEXT flag in transcript_from_unicode.
After calling this function, the from-Unicode conversion will be in the initial state.
void transcript_from_unicode_reset(transcript_t *handle)
Reset the from-Unicode conversion to its initial state.
Parameters
handle | The converter to reset.
transcript_error_t transcript_from_unicode_skip(transcript_t *handle, const char **inbuf, const char *inbuflimit)
Skip the next character in Unicode encoding.
Parameters
handle | The converter to use.
inbuf | A double pointer to the start of the input buffer.
inbuflimit | A pointer to the end of the input buffer.
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_INTERNAL_ERROR
This function can be used to resume a stopped from-Unicode conversion if the next input character cannot be converted (either because the input is corrupt, or because the conversion is not permitted by the flag settings).
const char *transcript_get_codeset(void)
Get a character string describing the current character set indicated by the environment.
setlocale or nl_langinfo.
Essentially this function does the same as nl_langinfo(CODESET). However, nl_langinfo may not be available. In those cases, it uses setlocale to retrieve the current value for LC_CTYPE, and tries to retrieve the character set from that. If all else fails, it returns a string representing the ASCII character set.
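The setlocale fallback described above can be sketched as follows. This is an illustration only, not the library's code: the sketch_ function names are hypothetical, the locale name is assumed to follow the common language[_territory][.codeset][@modifier] form, and the spelling "ASCII" for the last-resort result is an assumption.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the codeset part of a locale name of the form
   language[_territory][.codeset][@modifier] into out. Stores "ASCII"
   (an assumed spelling) when the name carries no codeset. */
void sketch_codeset_from_locale(const char *locale_name, char *out,
                                size_t out_max) {
    const char *start = locale_name != NULL ? strchr(locale_name, '.') : NULL;
    size_t len;

    if (out_max == 0)
        return;
    if (start == NULL || start[1] == '\0' || start[1] == '@') {
        strncpy(out, "ASCII", out_max - 1);
        out[out_max - 1] = '\0';
        return;
    }
    start++;                      /* skip the '.' */
    len = strcspn(start, "@");    /* stop before an optional modifier */
    if (len > out_max - 1)
        len = out_max - 1;        /* truncate to fit, keeping room for NUL */
    memcpy(out, start, len);
    out[len] = '\0';
}

/* Query the current LC_CTYPE setting without changing it. */
void sketch_current_codeset(char *out, size_t out_max) {
    sketch_codeset_from_locale(setlocale(LC_CTYPE, NULL), out, out_max);
}
```

Passing NULL as the second argument makes setlocale report the current setting rather than change it, which is why the sketch can inspect LC_CTYPE without side effects.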
const transcript_name_t *transcript_get_names(int *count)
Retrieve the list of display names known to this instantiation of the library.
Parameters
count | A location to store the number of names returned.
long transcript_get_version(void)
Get the value of TRANSCRIPT_VERSION corresponding to the actually used library.
This function can be useful to determine at runtime what version of the library was linked to the program. Although currently there are no known uses for this information, future library additions may prompt library users to want to operate differently depending on the available features.
transcript_error_t transcript_handle_unassigned(transcript_t *handle, uint32_t codepoint, char **outbuf, const char *outbuflimit, int flags)
Handle an unassigned codepoint in a from-Unicode conversion.
This function does a lookup in the generic fall-back table. If no generic fall-back is found, this function simply returns TRANSCRIPT_UNASSIGNED. Otherwise, it handles conversion of the generic fall-back as if it were specified in the converter table.
transcript_error_t transcript_init(void)
Initialize the library.
This function must be called before calling any other function of the library. It is safe to call this function more than once.
void transcript_load_state(transcript_t *handle, void *state)
Restore a converter's state.
Parameters
handle | The converter to restore the state for.
state | A pointer to a buffer of at least TRANSCRIPT_SAVE_STATE_SIZE bytes.
void transcript_normalize_name(const char *name, char *normalized_name, size_t normalized_name_max)
Normalize a character set name.
Parameters
name | The name to normalize.
normalized_name | A pointer to a buffer to store the normalized name.
normalized_name_max | The size of normalized_name.
Any characters in name other than the letters 'a'-'z' (in either upper or lower case) and the digits '0'-'9' are ignored. Leading zeros in numbers are ignored as well. The stored result is NUL-terminated.
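The rule above can be modelled in a few lines. The following is a hypothetical re-implementation for illustration, not the library's code. It makes two assumptions the text does not settle: letters are copied with their case unchanged, and a "number" is taken to be a run of digits, so a separator such as '-' starts a new number.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the normalization rule; not the library's code. */
void sketch_normalize_name(const char *name, char *out, size_t out_max) {
    size_t w = 0;
    int in_number = 0; /* non-zero once the current digit run has a kept digit */

    if (out_max == 0)
        return;
    /* Stop early enough to leave room for the terminating NUL. */
    for (; *name != '\0' && w + 1 < out_max; name++) {
        char c = *name;
        int is_digit = c >= '0' && c <= '9';
        int is_letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

        if (!is_digit && !is_letter) {
            in_number = 0; /* ignored character; it also ends a digit run */
            continue;
        }
        if (c == '0' && !in_number)
            continue;      /* leading zero of a number: ignored */
        out[w++] = c;
        in_number = is_digit;
    }
    out[w] = '\0';
}
```

Under these assumptions, "ISO-8859-01" and "ISO8859-1" both normalize to "ISO88591", which shows how differently punctuated spellings of a name can be made to compare equal.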
transcript_t *transcript_open_converter(const char *name, transcript_utf_t utf_type, int flags, transcript_error_t *error)
Open a converter.
Parameters
name | The name of the converter to open.
utf_type | The UTF type to use for representing Unicode codepoints.
flags | The default flags for the converter (see transcript_flags_t for possible values).
error | The location to store a possible error code.
The name of the converter is in principle free-form. A list of known names can be retrieved through transcript_get_names. The name argument is passed through transcript_normalize_name first, and at most 79 characters of the normalized name are considered.
int transcript_probe_converter(const char *name)
Check if a named converter is available.
Parameters
name | The name of the converter to check.
void transcript_save_state(transcript_t *handle, void *state)
Save a converter's state.
Parameters
handle | The converter to save the state for.
state | A pointer to a buffer of at least TRANSCRIPT_SAVE_STATE_SIZE bytes.
const char *transcript_strerror(transcript_error_t error)
Get a localized descriptive string for an error code.
Parameters
error | The error code to retrieve the descriptive string for.
transcript_error_t transcript_to_unicode(transcript_t *handle, const char **inbuf, const char *inbuflimit, char **outbuf, const char *outbuflimit, int flags)
Convert a buffer from a character set to Unicode.
Parameters
handle | The converter to use.
inbuf | A double pointer to the start of the input buffer.
inbuflimit | A pointer to the end of the input buffer.
outbuf | A double pointer to the start of the output buffer.
outbuflimit | A pointer to the end of the output buffer.
flags | Flags for this conversion (see transcript_flags_t for possible values).
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_NO_SPACE
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_FALLBACK
TRANSCRIPT_UNASSIGNED
TRANSCRIPT_ILLEGAL
TRANSCRIPT_ILLEGAL_END
TRANSCRIPT_INTERNAL_ERROR
TRANSCRIPT_PRIVATE_USE
This function uses the converter indicated by handle to convert data from the character set that was named when handle was opened to Unicode. The interface is designed to work with incomplete buffers, and may return TRANSCRIPT_INCOMPLETE if the bytes at the end of the input buffer do not form a complete sequence. If the output buffer is not large enough to store all the converted data, TRANSCRIPT_NO_SPACE is returned.
If M:N conversions are enabled, the output buffer must be able to hold at least 20 codepoints. This is guaranteed if the size of the output buffer is at least 80 (TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE) bytes.
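The 80-byte figure follows from 20 codepoints at no more than 4 bytes each, a bound that holds for UTF-8, UTF-16 and UTF-32. A small sketch of the arithmetic and of a pre-flight check on the remaining output space; the SKETCH_ constants and function names are hypothetical, and real code would use TRANSCRIPT_MIN_UNICODE_BUFFER_SIZE from the library header instead.

```c
#include <stddef.h>

/* Illustrative constants taken from the text above; not library names. */
enum {
    SKETCH_MAX_CODEPOINTS = 20,        /* codepoints the output must hold */
    SKETCH_MAX_BYTES_PER_CODEPOINT = 4 /* upper bound for UTF-8/16/32 */
};

/* 20 codepoints * 4 bytes per codepoint = 80 bytes. */
size_t sketch_min_unicode_buffer_size(void) {
    return (size_t)SKETCH_MAX_CODEPOINTS * SKETCH_MAX_BYTES_PER_CODEPOINT;
}

/* Return non-zero if the space between the current write position and the
   buffer limit is large enough for a conversion with M:N mappings enabled. */
int sketch_output_space_ok(const char *outbuf, const char *outbuflimit) {
    return (size_t)(outbuflimit - outbuf) >= sketch_min_unicode_buffer_size();
}
```

Because the check looks at the remaining space rather than the total buffer size, a caller that reuses one buffer across calls would need to drain it before the remaining space drops below the minimum.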
void transcript_to_unicode_reset(transcript_t *handle)
Reset the to-Unicode conversion to its initial state.
Parameters
handle | The converter to reset.
transcript_error_t transcript_to_unicode_skip(transcript_t *handle, const char **inbuf, const char *inbuflimit)
Skip the next character in character set encoding.
Parameters
handle | The converter to use.
inbuf | A double pointer to the start of the input buffer.
inbuflimit | A pointer to the end of the input buffer.
Return values
TRANSCRIPT_SUCCESS
TRANSCRIPT_INCOMPLETE
TRANSCRIPT_INTERNAL_ERROR
This function can be used to resume a stopped to-Unicode conversion if the next input character cannot be converted (either because the input is corrupt, or because the conversion is not permitted by the flag settings).