src/linebreak/linebreak.c File Reference

Enumerations

enum  BreakAction {
  DIR_BRK, IND_BRK, CMI_BRK, CMP_BRK,
  PRH_BRK
}

Functions

void init_linebreak (void)
utf32_t lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)
utf32_t lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)
utf32_t lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)
void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)
void set_linebreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)
void set_linebreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)
void set_linebreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)
int is_line_breakable (utf32_t char1, utf32_t char2, const char *lang)

Variables

const int linebreak_version = LINEBREAK_VERSION

Detailed Description

Implementation of the line breaking algorithm as described in Unicode Standard Annex 14.

Version:
2.1, 2011/05/07
Author:
Wu Yongwei

Enumeration Type Documentation

enum BreakAction

Enumeration of break actions. They are used in the break action pair table defined in this file.

Enumerator:
DIR_BRK  Direct break opportunity
IND_BRK  Indirect break opportunity
CMI_BRK  Indirect break opportunity for combining marks
CMP_BRK  Prohibited break for combining marks
PRH_BRK  Prohibited break


Function Documentation

void init_linebreak ( void  )

Initializes the second-level index to the line breaking properties. If it is not called, get_char_lb_class_lang (and thus the main functionality) can perform poorly, especially for large code points such as those of Chinese characters.

utf32_t lb_get_next_char_utf8 ( const utf8_t *  s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-8 sequence. The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-8 sequence.

Parameters:
[in]      s    input UTF-8 string
[in]      len  length of the string in bytes
[in,out]  ip   pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
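
The index contract described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: `sketch_next_utf8` and `SKETCH_EOS` are hypothetical names, the sentinel value is an assumption, and the input is assumed to be well-formed UTF-8 apart from possible truncation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's EOS sentinel; the real value
 * is defined by the library and may differ. */
#define SKETCH_EOS 0xFFFFFFFFu

/* Decodes one code point starting at s[*ip]. A stray tail byte or an
 * invalid lead byte is returned as-is and consumes a single byte. */
static uint32_t sketch_next_utf8(const uint8_t *s, size_t len, size_t *ip)
{
    assert(s != NULL && ip != NULL && *ip <= len);
    if (*ip == len)
        return SKETCH_EOS;              /* nothing left to read */

    uint8_t lead = s[*ip];
    size_t need;                        /* total bytes in this sequence */
    uint32_t cp;                        /* payload bits collected so far */

    if (lead < 0xC2 || lead > 0xF4) {   /* ASCII, stray tail, or invalid */
        *ip += 1;
        return lead;
    } else if (lead < 0xE0) {
        need = 2; cp = lead & 0x1Fu;
    } else if (lead < 0xF0) {
        need = 3; cp = lead & 0x0Fu;
    } else {
        need = 4; cp = lead & 0x07u;
    }

    if (len - *ip < need)
        return SKETCH_EOS;              /* truncated: index is not advanced */

    for (size_t k = 1; k < need; k++)
        cp = (cp << 6) | (s[*ip + k] & 0x3Fu);
    *ip += need;                        /* advance past the whole character */
    return cp;
}
```

Calling the sketch repeatedly with the same index variable walks the string one character at a time; when the input ends in the middle of a sequence, the sentinel is returned and the index is left at the start of the incomplete sequence.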

utf32_t lb_get_next_char_utf16 ( const utf16_t *  s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-16 sequence. The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-16 surrogate pair.

Parameters:
[in]      s    input UTF-16 string
[in]      len  length of the string in words
[in,out]  ip   pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
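
The surrogate-pair handling described above can be sketched as follows. This is an illustration only, not the library's code: `sketch_next_utf16` and `SKETCH_EOS` are hypothetical names, and returning an unpaired high surrogate unchanged is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's EOS sentinel. */
#define SKETCH_EOS 0xFFFFFFFFu

/* Decodes one code point starting at s[*ip], combining a surrogate
 * pair into a single supplementary-plane code point. */
static uint32_t sketch_next_utf16(const uint16_t *s, size_t len, size_t *ip)
{
    assert(s != NULL && ip != NULL && *ip <= len);
    if (*ip == len)
        return SKETCH_EOS;              /* nothing left to read */

    uint32_t hi = s[*ip];
    if (hi < 0xD800 || hi > 0xDBFF) {   /* not a high surrogate */
        *ip += 1;
        return hi;
    }
    if (len - *ip < 2)
        return SKETCH_EOS;              /* pair cut off: index is not advanced */

    uint32_t lo = s[*ip + 1];
    if (lo < 0xDC00 || lo > 0xDFFF) {   /* unpaired high surrogate */
        *ip += 1;
        return hi;
    }
    *ip += 2;                           /* consume both halves of the pair */
    return 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);
}
```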

utf32_t lb_get_next_char_utf32 ( const utf32_t *  s,
size_t  len,
size_t *  ip 
)

Gets the next Unicode character in a UTF-32 sequence. The index will be advanced to the next character.

Parameters:
[in]      s    input UTF-32 string
[in]      len  length of the string in dwords
[in,out]  ip   pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered

void set_linebreaks ( const void *  s,
size_t  len,
const char *  lang,
char *  brks,
get_next_char_t  get_next_char 
)

Sets the line breaking information for a generic input string.

Parameters:
[in]   s              input string
[in]   len            length of the input
[in]   lang           language of the input
[out]  brks           pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
[in]   get_next_char  function to get the next UTF-32 character
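
The role of the get_next_char parameter can be illustrated with a small encoding-agnostic driver. This is a sketch under stated assumptions, not the library's code: `sketch_next_fn`, `sketch_next_utf32`, `sketch_char_ends`, and `SKETCH_EOS` are all hypothetical names, and the callback shape is modelled on the three arguments shared by the lb_get_next_char_* functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the library's EOS sentinel. */
#define SKETCH_EOS 0xFFFFFFFFu

/* Hypothetical callback type: reads one character and advances *ip. */
typedef uint32_t (*sketch_next_fn)(const void *s, size_t len, size_t *ip);

/* Adapter for UTF-32 input: one code unit per character. */
static uint32_t sketch_next_utf32(const void *s, size_t len, size_t *ip)
{
    const uint32_t *p = s;
    assert(p != NULL && ip != NULL && *ip <= len);
    if (*ip == len)
        return SKETCH_EOS;
    return p[(*ip)++];
}

/* Walks the input through the callback and records, for each character,
 * the index of its last code unit. Returns the number of characters. */
static size_t sketch_char_ends(const void *s, size_t len, sketch_next_fn next,
                               size_t *ends, size_t max_ends)
{
    size_t i = 0, n = 0;
    while (next(s, len, &i) != SKETCH_EOS) {
        if (n < max_ends)
            ends[n] = i - 1;            /* i now points past the character */
        n++;
    }
    return n;
}
```

Because the driver only sees `const void *` and a callback, the same loop serves any encoding for which an adapter exists, which is the design the generic function and its three encoding-specific wrappers suggest.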

void set_linebreaks_utf8 ( const utf8_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the line breaking information for a UTF-8 input string.

Parameters:
[in]   s     input UTF-8 string
[in]   len   length of the input
[in]   lang  language of the input
[out]  brks  pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
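
One way a caller might consume the output breaking data is a greedy line wrapper. The sketch below is illustrative only and does not call the library: the `SK_*` constants are hypothetical stand-ins for the LINEBREAK_* values (whose real definitions live in the library header), `sk_line_end` is a hypothetical helper, and it assumes that brks[i] describes the position after byte i, with every non-final byte of a multi-byte character marked as inside a character.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the LINEBREAK_* values; use the library's own constants
 * in real code. */
enum { SK_MUSTBREAK, SK_ALLOWBREAK, SK_NOBREAK, SK_INSIDEACHAR };

/* Returns the index of the byte after which the line starting at `start`
 * should end, fitting at most `width` characters whenever an allowed
 * break makes that possible. */
static size_t sk_line_end(const char *brks, size_t len, size_t start,
                          size_t width)
{
    assert(brks != NULL && start < len);
    size_t chars = 0;
    size_t last_ok = len;               /* len means "no allowed break yet" */

    for (size_t i = start; i < len; i++) {
        if (brks[i] == SK_INSIDEACHAR)
            continue;                   /* not at a character boundary */
        chars++;
        if (chars > width && last_ok != len)
            return last_ok;             /* overflow: back up to last break */
        if (brks[i] == SK_MUSTBREAK)
            return i;                   /* mandatory break */
        if (brks[i] == SK_ALLOWBREAK)
            last_ok = i;                /* remember this opportunity */
    }
    return len - 1;                     /* rest of the input fits */
}
```

A caller would invoke the helper in a loop, starting each new line at the returned index plus one. A line with no allowed break within the width is left overlong rather than split inside a word.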

void set_linebreaks_utf16 ( const utf16_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the line breaking information for a UTF-16 input string.

Parameters:
[in]   s     input UTF-16 string
[in]   len   length of the input
[in]   lang  language of the input
[out]  brks  pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR

void set_linebreaks_utf32 ( const utf32_t *  s,
size_t  len,
const char *  lang,
char *  brks 
)

Sets the line breaking information for a UTF-32 input string.

Parameters:
[in]   s     input UTF-32 string
[in]   len   length of the input
[in]   lang  language of the input
[out]  brks  pointer to the output breaking data, containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR

int is_line_breakable ( utf32_t  char1,
utf32_t  char2,
const char *  lang 
)

Tells whether a line break can occur between two Unicode characters. This is a wrapper function that exposes a simple interface. In general, set_linebreaks_utf32 is preferable, because this function cannot correctly process complicated cases involving combining marks, spaces, etc.

Parameters:
char1  the first Unicode character
char2  the second Unicode character
lang   language of the input
Returns:
one of LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR


Variable Documentation

const int linebreak_version = LINEBREAK_VERSION

Version number of the library.