
unicode.h File Reference

Unicode facilities to manage UTF-8, UTF-16 and wide characters.

#include "errno.h"
#include <Datatypes.h>
#include <stdlib.h>
#include <wchar.h>

Functions

int unicode_utf8len (int lead_byte)
 Gets the length of a UTF-8 character.
int unicode_utf8towc (wchar_t *restrict result, const char *restrict string, size_t size)
 UTF-8 to wide character.
int unicode_wctoutf8 (char *s, wchar_t wc, size_t size)
 Wide character to UTF-8.
int unicode_utf16len (int lead_word)
 Gets the length of a UTF-16 character.
int unicode_utf16towc (wchar_t *restrict result, const uint16_t *restrict string, size_t size)
 UTF-16 to wide character.
int unicode_wctoutf16 (uint16_t *s, wchar_t wc, size_t size)
 Wide character to UTF-16.
wchar_t unicode_simple_fold (wchar_t wc)
 Simple case folding of a wide character.


Detailed Description

Unicode facilities to manage UTF-8, UTF-16 and wide characters.


Function Documentation

wchar_t unicode_simple_fold (wchar_t wc)

Simple case folding of a wide character.

Parameters:
wc the wide character to fold.
Returns:
If a simple folding is defined, the folded version of wc is returned, otherwise wc is returned unchanged.
Remarks:
This function performs simple case folding with two lookups into large tables.
Case folding maps characters that differ only in case to a common form, which is useful for case-insensitive comparison. Simple case folding maps a single wide character to another single wide character (usually the lower-case one). Full case folding, by contrast, may map a single wide character to a sequence of several wide characters.
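To illustrate how simple case folding supports case-insensitive comparison, here is a minimal sketch. The names sketch_simple_fold and sketch_casecmp are hypothetical; the stand-in folds only ASCII and the Latin-1 capital letters with plain range checks, and is not the table-driven implementation described above.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for unicode_simple_fold (not the library's code).
 * Assumption: only ASCII A-Z and the Latin-1 capitals U+00C0..U+00DE are
 * folded; U+00D7 (multiplication sign) has no case and is left alone. */
static wchar_t sketch_simple_fold(wchar_t wc)
{
    if (wc >= L'A' && wc <= L'Z')
        return (wchar_t)(wc + 32);
    if (wc >= 0xC0 && wc <= 0xDE && wc != 0xD7)
        return (wchar_t)(wc + 32);
    return wc; /* no simple folding defined in this sketch */
}

/* Compares two NUL-terminated wide strings, ignoring case.
 * Returns a negative value, 0, or a positive value, like wcscmp. */
static int sketch_casecmp(const wchar_t *a, const wchar_t *b)
{
    for (;; a++, b++) {
        wchar_t fa = sketch_simple_fold(*a);
        wchar_t fb = sketch_simple_fold(*b);
        if (fa != fb)
            return fa < fb ? -1 : 1;
        if (fa == 0)
            return 0; /* both strings ended together */
    }
}
```

With the real library, the call to sketch_simple_fold would presumably be replaced by unicode_simple_fold, which covers the full set of simple foldings.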

int unicode_utf16len (int lead_word)

Gets the length of a UTF-16 character.

Parameters:
lead_word the first uint16_t of a UTF-16 character.
Return values:
>0 the length in uint16_t units of the UTF-16 character.
Remarks:
For performance reasons, this function examines only the first uint16_t and does not parse the whole UTF-16 word sequence. To check the validity of the whole sequence, use unicode_utf16towc.
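The lead-word check can be sketched as follows. This is a hypothetical stand-in (sketch_utf16len and sketch_utf16_count are assumed names), and it assumes that any value outside the high-surrogate range counts as a one-unit character, since no error value is documented above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for unicode_utf16len (not the library's code).
 * Assumption: a high surrogate (0xD800..0xDBFF) starts a 2-unit character;
 * every other lead word is reported as a 1-unit character. */
static int sketch_utf16len(int lead_word)
{
    return (lead_word >= 0xD800 && lead_word <= 0xDBFF) ? 2 : 1;
}

/* Counts the characters in a buffer of n uint16_t units by hopping from
 * lead word to lead word. Trail words are skipped, never validated. */
static size_t sketch_utf16_count(const uint16_t *s, size_t n)
{
    size_t i = 0, count = 0;
    while (i < n) {
        i += (size_t)sketch_utf16len(s[i]);
        count++;
    }
    return count;
}
```

Skipping by lead word alone is cheap, but it accepts malformed input such as a high surrogate followed by an ordinary character.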

int unicode_utf16towc (wchar_t *restrict result, const uint16_t *restrict string, size_t size)

UTF-16 to wide character.

Parameters:
result where to store the converted wide character;
string buffer containing the UTF-16 character to convert;
size maximum number of uint16_t units of string to examine;
Return values:
>0 the length in uint16_t units of the processed UTF-16 character (the wide character is stored in result);
-EILSEQ invalid UTF-16 word sequence;
-ENAMETOOLONG size too small to parse the UTF-16 character.
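The return values above can be illustrated with a small decoder. This is a sketch, not the library's implementation: sketch_utf16towc is a hypothetical name, wchar_t is assumed to be at least 21 bits wide, and the order in which the error conditions are tested is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for unicode_utf16towc (not the library's code).
 * Assumption: wchar_t can hold any code point up to U+10FFFF. */
static int sketch_utf16towc(wchar_t *result, const uint16_t *string, size_t size)
{
    uint16_t w1, w2;

    if (size < 1)
        return -ENAMETOOLONG;
    w1 = string[0];
    if (w1 < 0xD800 || w1 > 0xDFFF) {   /* not a surrogate: one unit */
        *result = (wchar_t)w1;
        return 1;
    }
    if (w1 > 0xDBFF)                    /* low surrogate cannot lead */
        return -EILSEQ;
    if (size < 2)                       /* trail word not available */
        return -ENAMETOOLONG;
    w2 = string[1];
    if (w2 < 0xDC00 || w2 > 0xDFFF)     /* trail must be a low surrogate */
        return -EILSEQ;
    /* Combine the two 10-bit halves and add the supplementary-plane offset. */
    *result = (wchar_t)(0x10000 +
              (((uint32_t)(w1 - 0xD800) << 10) | (uint32_t)(w2 - 0xDC00)));
    return 2;
}
```

A caller would typically advance its read position by the positive return value and stop on the first negative one.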

int unicode_utf8len (int lead_byte)

Gets the length of a UTF-8 character.

Parameters:
lead_byte the first byte of a UTF-8 character.
Return values:
>0 the length in bytes of the UTF-8 character;
-EILSEQ invalid UTF-8 lead byte.
Remarks:
For performance reasons, this function examines only the first byte and does not parse the whole UTF-8 byte sequence. To check the validity of the whole sequence, use unicode_utf8towc.
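A lead-byte classification along these lines might look like the sketch below. sketch_utf8len is a hypothetical name; the exact set of lead bytes the library rejects is not stated here, so the ranges follow the current UTF-8 definition (up to four bytes, with 0xC0 and 0xC1 rejected) as an assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for unicode_utf8len (not the library's code).
 * Assumption: lead bytes are classified per the 4-byte UTF-8 definition;
 * masking with 0xFF tolerates a sign-extended char argument. */
static int sketch_utf8len(int lead_byte)
{
    lead_byte &= 0xFF;
    if (lead_byte < 0x80)
        return 1;                       /* ASCII */
    if (lead_byte >= 0xC2 && lead_byte <= 0xDF)
        return 2;
    if (lead_byte >= 0xE0 && lead_byte <= 0xEF)
        return 3;
    if (lead_byte >= 0xF0 && lead_byte <= 0xF4)
        return 4;
    return -EILSEQ;                     /* continuation or invalid lead byte */
}
```

Because only one byte is inspected, a positive result says nothing about whether the continuation bytes that follow are well formed.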

int unicode_utf8towc (wchar_t *restrict result, const char *restrict string, size_t size)

UTF-8 to wide character.

Parameters:
result where to store the converted wide character;
string buffer containing the UTF-8 character to convert;
size maximum number of bytes of string to examine;
Return values:
>0 the length in bytes of the processed UTF-8 character (the wide character is stored in result);
-EILSEQ invalid UTF-8 byte sequence;
-ENAMETOOLONG size too small to parse the UTF-8 character.
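The following sketch shows one way a validating decoder could produce the return values listed above. sketch_utf8towc is a hypothetical name, not the library's code; it assumes a wchar_t of at least 21 bits and rejects overlong forms, surrogates, and values above U+10FFFF, which may be stricter or looser than the real function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for unicode_utf8towc (not the library's code).
 * Assumption: wchar_t can hold any code point up to U+10FFFF. */
static int sketch_utf8towc(wchar_t *result, const char *string, size_t size)
{
    const unsigned char *s = (const unsigned char *)string;
    uint32_t cp, min;   /* decoded value and smallest value for this length */
    int len, i;

    if (size < 1)
        return -ENAMETOOLONG;
    if (s[0] < 0x80)                { len = 1; cp = s[0];        min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return -EILSEQ;             /* continuation or invalid lead byte */
    if (size < (size_t)len)
        return -ENAMETOOLONG;       /* sequence is cut off by size */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* continuation bytes are 10xxxxxx */
            return -EILSEQ;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -EILSEQ;
    *result = (wchar_t)cp;
    return len;
}
```

To walk a whole buffer, a caller would repeat the call, adding each positive return value to the read offset and subtracting it from the remaining size.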

int unicode_wctoutf16 (uint16_t *s, wchar_t wc, size_t size)

Wide character to UTF-16.

Parameters:
s where to store the converted UTF-16 character;
wc the wide character to convert;
size maximum number of uint16_t units to store in s;
Return values:
>0 the length in uint16_t units of the converted UTF-16 character, stored in s;
-EINVAL invalid wide character (it cannot be converted to UTF-16);
-ENAMETOOLONG size too small to store the UTF-16 character.
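As a sketch of the encoding direction, the stand-in below follows the documented return values. sketch_wctoutf16 is a hypothetical name; treating surrogate values and anything above U+10FFFF as the invalid wide characters is an assumption, as is a wchar_t wide enough to carry supplementary code points.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for unicode_wctoutf16 (not the library's code).
 * Assumption: surrogates and values above U+10FFFF are the invalid inputs. */
static int sketch_wctoutf16(uint16_t *s, wchar_t wc, size_t size)
{
    uint32_t cp = (uint32_t)wc;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -EINVAL;
    if (cp < 0x10000) {                 /* BMP: one unit */
        if (size < 1)
            return -ENAMETOOLONG;
        s[0] = (uint16_t)cp;
        return 1;
    }
    if (size < 2)                       /* surrogate pair needs two units */
        return -ENAMETOOLONG;
    cp -= 0x10000;                      /* leaves a 20-bit value */
    s[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high 10 bits */
    s[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low 10 bits */
    return 2;
}
```

A buffer of two uint16_t units is always large enough for a single character in this scheme.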

int unicode_wctoutf8 (char *s, wchar_t wc, size_t size)

Wide character to UTF-8.

Parameters:
s where to store the converted UTF-8 character;
wc the wide character to convert;
size maximum number of bytes to store in s;
Return values:
>0 the length in bytes of the converted UTF-8 character, stored in s;
-EINVAL invalid wide character (it cannot be converted to UTF-8);
-ENAMETOOLONG size too small to store the UTF-8 character.
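A possible shape for the UTF-8 encoder is sketched below, mirroring the return values above. sketch_wctoutf8 is a hypothetical name, and the choice of which wide characters count as invalid (surrogates and values above U+10FFFF) is an assumption rather than documented behavior.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stand-in for unicode_wctoutf8 (not the library's code).
 * Assumption: surrogates and values above U+10FFFF are the invalid inputs.
 * The output is not NUL-terminated. */
static int sketch_wctoutf8(char *s, wchar_t wc, size_t size)
{
    uint32_t cp = (uint32_t)wc;
    int len;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -EINVAL;
    len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (size < (size_t)len)
        return -ENAMETOOLONG;
    switch (len) {
    case 1:
        s[0] = (char)cp;
        break;
    case 2:
        s[0] = (char)(0xC0 | (cp >> 6));
        s[1] = (char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        s[0] = (char)(0xE0 | (cp >> 12));
        s[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        s[2] = (char)(0x80 | (cp & 0x3F));
        break;
    default:
        s[0] = (char)(0xF0 | (cp >> 18));
        s[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        s[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        s[3] = (char)(0x80 | (cp & 0x3F));
        break;
    }
    return len;
}
```

Since the sketch writes no terminator, a caller building a C string would append the NUL byte itself after the last converted character.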


Generated on Fri Feb 24 14:13:22 2006 for VDK Blacksheep by  doxygen 1.4.1