src/linebreak/linebreakdef.h File Reference


Classes

struct  LineBreakProperties
struct  LineBreakPropertiesLang

Typedefs

typedef utf32_t(* get_next_char_t )(const void *, size_t, size_t *)

Enumerations

enum  LineBreakClass {
  LBP_Undefined, LBP_OP, LBP_CL, LBP_CP,
  LBP_QU, LBP_GL, LBP_NS, LBP_EX,
  LBP_SY, LBP_IS, LBP_PR, LBP_PO,
  LBP_NU, LBP_AL, LBP_ID, LBP_IN,
  LBP_HY, LBP_BA, LBP_BB, LBP_B2,
  LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
  LBP_H3, LBP_JL, LBP_JV, LBP_JT,
  LBP_AI, LBP_BK, LBP_CB, LBP_CR,
  LBP_LF, LBP_NL, LBP_SA, LBP_SG,
  LBP_SP, LBP_XX
}

Functions

utf32_t lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)
utf32_t lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)
utf32_t lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)
void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)

Variables

struct LineBreakProperties lb_prop_default []
struct LineBreakPropertiesLang lb_prop_lang_map []

Detailed Description

Definitions of internal data structures, declarations of global variables, and function prototypes for the line breaking algorithm.

Version:
2.1, 2011/05/07
Author:
Wu Yongwei

Typedef Documentation

typedef utf32_t(* get_next_char_t)(const void *, size_t, size_t *)

Abstract function interface for lb_get_next_char_utf8, lb_get_next_char_utf16, and lb_get_next_char_utf32.
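Because set_linebreaks reads its input only through a get_next_char_t, other encodings can be supported by writing a reader with exactly that signature. The sketch below is a hypothetical ISO-8859-1 (Latin-1) reader, not part of the library. It assumes utf32_t is a 32-bit unsigned type and uses a stand-in sentinel, SKETCH_EOS, in place of the library's EOS, whose actual value is defined in the library headers. The reader takes const void * and casts inside the function, so it can be passed where a get_next_char_t is expected without a function-pointer cast.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: utf32_t is 32-bit unsigned, and end of
 * input is signalled by a sentinel. The library's real EOS may differ. */
typedef uint32_t utf32_t;
#define SKETCH_EOS ((utf32_t)0xFFFFFFFF)

typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/* Hypothetical reader for ISO-8859-1 input. Its parameter types match
 * get_next_char_t exactly; the byte pointer is recovered by a cast. */
static utf32_t sketch_get_next_char_latin1(const void *s, size_t len,
                                           size_t *ip)
{
    const unsigned char *p = (const unsigned char *)s;

    if (*ip >= len)
        return SKETCH_EOS;
    /* Every Latin-1 byte maps directly to U+0000..U+00FF. */
    return (utf32_t)p[(*ip)++];
}

/* Drains any get_next_char_t reader and counts the characters it yields. */
static size_t sketch_count_chars(const void *s, size_t len,
                                 get_next_char_t get_next_char)
{
    size_t i = 0, n = 0;

    while (get_next_char(s, len, &i) != SKETCH_EOS)
        ++n;
    return n;
}
```

In C, calling a function through a pointer of an incompatible type is undefined behaviour, which is why the reader accepts const void * rather than const unsigned char *.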


Enumeration Type Documentation

enum LineBreakClass

Line break classes. This is a direct mapping of Table 1 of Unicode Standard Annex #14, Revision 26.

Enumerator:
LBP_Undefined   Undefined
LBP_OP          Opening punctuation
LBP_CL          Closing punctuation
LBP_CP          Closing parenthesis
LBP_QU          Ambiguous quotation
LBP_GL          Glue
LBP_NS          Non-starters
LBP_EX          Exclamation/Interrogation
LBP_SY          Symbols allowing break after
LBP_IS          Infix separator
LBP_PR          Prefix
LBP_PO          Postfix
LBP_NU          Numeric
LBP_AL          Alphabetic
LBP_ID          Ideographic
LBP_IN          Inseparable characters
LBP_HY          Hyphen
LBP_BA          Break after
LBP_BB          Break before
LBP_B2          Break on either side (but not pair)
LBP_ZW          Zero-width space
LBP_CM          Combining marks
LBP_WJ          Word joiner
LBP_H2          Hangul LV
LBP_H3          Hangul LVT
LBP_JL          Hangul L Jamo
LBP_JV          Hangul V Jamo
LBP_JT          Hangul T Jamo
LBP_AI          Ambiguous (alphabetic or ideograph)
LBP_BK          Break (mandatory)
LBP_CB          Contingent break
LBP_CR          Carriage return
LBP_LF          Line feed
LBP_NL          Next line
LBP_SA          South-East Asian
LBP_SG          Surrogates
LBP_SP          Space
LBP_XX          Unknown
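Several of these classes (AI, SA, SG, XX, and CB) are placeholders: rule LB1 of UAX #14 says they must be resolved into other classes before the pair rules apply, and in the absence of better information it recommends resolving AI, SA, SG, and XX to AL. The sketch below illustrates that step. It is not the library's code: the function names are hypothetical, the test that sends AI to ID for "zh", "ja", and "ko" language tags is an assumed heuristic for an East Asian context, SA is mapped to AL without the refinement that sends SA characters of General_Category Mn or Mc to CM, and CB is left unresolved.

```c
#include <stddef.h>
#include <string.h>

/* Same order as the enum documented above. */
enum LineBreakClass {
    LBP_Undefined, LBP_OP, LBP_CL, LBP_CP, LBP_QU, LBP_GL, LBP_NS, LBP_EX,
    LBP_SY, LBP_IS, LBP_PR, LBP_PO, LBP_NU, LBP_AL, LBP_ID, LBP_IN,
    LBP_HY, LBP_BA, LBP_BB, LBP_B2, LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
    LBP_H3, LBP_JL, LBP_JV, LBP_JT, LBP_AI, LBP_BK, LBP_CB, LBP_CR,
    LBP_LF, LBP_NL, LBP_SA, LBP_SG, LBP_SP, LBP_XX
};

/* Hypothetical heuristic: treat Chinese, Japanese and Korean language
 * tags as an East Asian context. */
static int sketch_is_east_asian(const char *lang)
{
    return lang != NULL &&
           (strncmp(lang, "zh", 2) == 0 || strncmp(lang, "ja", 2) == 0 ||
            strncmp(lang, "ko", 2) == 0);
}

/* Simplified LB1: map classes that depend on outside criteria to a
 * concrete class; every other class is returned unchanged. */
static enum LineBreakClass sketch_resolve_class(enum LineBreakClass c,
                                                const char *lang)
{
    switch (c) {
    case LBP_AI:
        return sketch_is_east_asian(lang) ? LBP_ID : LBP_AL;
    case LBP_SA:    /* no dictionary-based segmentation in this sketch */
    case LBP_SG:
    case LBP_XX:
        return LBP_AL;
    default:
        return c;
    }
}
```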


Function Documentation

utf32_t lb_get_next_char_utf8 (const utf8_t *s, size_t len, size_t *ip)

Gets the next Unicode character in a UTF-8 sequence. The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-8 sequence.

Parameters:
[in]      s    input UTF-8 string
[in]      len  length of the string in bytes
[in,out]  ip   pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
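The contract above can be illustrated with a minimal decoder sketch. It is not the library's implementation: the name sketch_next_char_utf8 and the SKETCH_EOS sentinel are hypothetical, and the sketch neither validates continuation bytes nor rejects overlong forms. Bytes that cannot start a multi-byte sequence are returned as single units. A sequence cut off by the end of the string yields the sentinel and leaves the index unchanged, as the description requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed types and sentinel; the library defines its own. */
typedef uint8_t utf8_t;
typedef uint32_t utf32_t;
#define SKETCH_EOS ((utf32_t)0xFFFFFFFF)

static utf32_t sketch_next_char_utf8(const utf8_t *s, size_t len, size_t *ip)
{
    size_t i = *ip;
    size_t n;       /* sequence length implied by the lead byte */
    utf32_t res;
    utf8_t ch;

    if (i >= len)
        return SKETCH_EOS;
    ch = s[i];
    if (ch < 0xC2 || ch > 0xF4) {
        /* ASCII, stray continuation byte, or invalid lead byte:
           consume one byte and return it unchanged. */
        *ip = i + 1;
        return ch;
    } else if (ch < 0xE0) {
        n = 2;
        res = ch & 0x1F;
    } else if (ch < 0xF0) {
        n = 3;
        res = ch & 0x0F;
    } else {
        n = 4;
        res = ch & 0x07;
    }
    if (len - i < n)
        return SKETCH_EOS;  /* truncated sequence: index left unchanged */
    for (size_t k = 1; k < n; ++k)
        res = (res << 6) | (s[i + k] & 0x3F);
    *ip = i + n;
    return res;
}
```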

utf32_t lb_get_next_char_utf16 (const utf16_t *s, size_t len, size_t *ip)

Gets the next Unicode character in a UTF-16 sequence. The index will be advanced to the next complete character, unless the end of string is reached in the middle of a UTF-16 surrogate pair.

Parameters:
[in]      s    input UTF-16 string
[in]      len  length of the string in 16-bit code units
[in,out]  ip   pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
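A comparable sketch for UTF-16, again with hypothetical names and an assumed sentinel rather than the library's code. A high surrogate (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF) combines into one supplementary-plane character. A pair cut off by the end of the string yields the sentinel without moving the index. Handling of unpaired surrogates is this sketch's own choice: each is returned as a single unit.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed types and sentinel; the library defines its own. */
typedef uint16_t utf16_t;
typedef uint32_t utf32_t;
#define SKETCH_EOS ((utf32_t)0xFFFFFFFF)

static utf32_t sketch_next_char_utf16(const utf16_t *s, size_t len,
                                      size_t *ip)
{
    utf16_t hi, lo;

    if (*ip >= len)
        return SKETCH_EOS;
    hi = s[*ip];
    if (hi < 0xD800 || hi > 0xDBFF) {
        /* BMP character (or a stray low surrogate): one code unit. */
        *ip += 1;
        return hi;
    }
    if (len - *ip < 2)
        return SKETCH_EOS;  /* pair cut off by end of string: index unchanged */
    lo = s[*ip + 1];
    if (lo < 0xDC00 || lo > 0xDFFF) {
        /* Unpaired high surrogate: consume it alone. */
        *ip += 1;
        return hi;
    }
    *ip += 2;
    return 0x10000 + ((utf32_t)(hi - 0xD800) << 10) + (utf32_t)(lo - 0xDC00);
}
```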

utf32_t lb_get_next_char_utf32 (const utf32_t *s, size_t len, size_t *ip)

Gets the next Unicode character in a UTF-32 sequence. The index will be advanced to the next character.

Parameters:
[in]      s    input UTF-32 string
[in]      len  length of the string in 32-bit code units
[in,out]  ip   pointer to the index
Returns:
the Unicode character beginning at the index; or EOS if end of input is encountered
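For UTF-32 each code unit is already a complete character, so a reader reduces to a bounds check and an increment. The sketch below uses hypothetical names and an assumed sentinel, and passes values through without validating them.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed type and sentinel; the library defines its own. */
typedef uint32_t utf32_t;
#define SKETCH_EOS ((utf32_t)0xFFFFFFFF)

static utf32_t sketch_next_char_utf32(const utf32_t *s, size_t len,
                                      size_t *ip)
{
    if (*ip >= len)
        return SKETCH_EOS;
    /* One code unit per character: return it and advance by one. */
    return s[(*ip)++];
}
```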

void set_linebreaks (const void *s, size_t len, const char *lang, char *brks, get_next_char_t get_next_char)

Sets the line breaking information for a generic input string.

Parameters:
[in]   s              input string
[in]   len            length of the input, in code units of its encoding
[in]   lang           language of the input
[out]  brks           pointer to the output breaking data (one entry per code unit), containing LINEBREAK_MUSTBREAK, LINEBREAK_ALLOWBREAK, LINEBREAK_NOBREAK, or LINEBREAK_INSIDEACHAR
[in]   get_next_char  function to get the next UTF-32 character
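The calling convention can be illustrated with a deliberately simplified sketch that walks the input through a get_next_char_t and fills brks. All names are hypothetical, the SK_* values stand in for the LINEBREAK_* constants (whose real values come from the library), and the lang parameter is omitted. The break decision is a naive placeholder, not UAX #14: it must break after LF, may break after a space, and otherwise does not break. Every code unit except the last one of a character is marked as inside the character, and the end of text is a mandatory break, as in UAX #14 rule LB3.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;
#define SKETCH_EOS ((utf32_t)0xFFFFFFFF)
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/* Assumed stand-ins for the LINEBREAK_* constants. */
enum { SK_MUSTBREAK = 0, SK_ALLOWBREAK = 1, SK_NOBREAK = 2, SK_INSIDEACHAR = 3 };

/* Minimal UTF-8 reader with the get_next_char_t signature (no validation). */
static utf32_t sk_utf8_reader(const void *v, size_t len, size_t *ip)
{
    const unsigned char *s = (const unsigned char *)v;
    size_t i = *ip, n, k;
    utf32_t c;

    if (i >= len)
        return SKETCH_EOS;
    c = s[i];
    n = c < 0xC0 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
    if (len - i < n)
        return SKETCH_EOS;              /* truncated: index unchanged */
    if (n > 1)
        c &= 0x7F >> n;                 /* payload bits of the lead byte */
    for (k = 1; k < n; ++k)
        c = (c << 6) | (s[i + k] & 0x3F);
    *ip = i + n;
    return c;
}

/* Fills brks[0..len-1] using a naive placeholder rule, NOT UAX #14. */
static void sketch_set_breaks(const void *s, size_t len, char *brks,
                              get_next_char_t get_next_char)
{
    size_t pos = 0, prev;
    utf32_t ch;

    for (;;) {
        prev = pos;
        ch = get_next_char(s, len, &pos);
        if (ch == SKETCH_EOS)
            break;
        /* Code units before the last one of this character. */
        for (size_t k = prev; k + 1 < pos; ++k)
            brks[k] = SK_INSIDEACHAR;
        brks[pos - 1] = ch == '\n' ? SK_MUSTBREAK
                      : ch == ' '  ? SK_ALLOWBREAK
                                   : SK_NOBREAK;
    }
    if (pos > 0)
        brks[pos - 1] = SK_MUSTBREAK;   /* always break at end of text */
    for (size_t k = pos; k < len; ++k)  /* units of a truncated tail */
        brks[k] = SK_INSIDEACHAR;
}
```

With the input "h\xC3\xA9 x" (5 bytes), the two bytes of "é" produce an inside-character entry followed by a no-break entry, the space allows a break, and the final "x" gets the mandatory end-of-text break.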

Variable Documentation

struct LineBreakProperties lb_prop_default[]

Default line breaking properties, as published on the Unicode Web site.
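lb_prop_default is a table of code point ranges. The sketch below assumes each entry holds a start code point, an end code point, and a LineBreakClass, that the table is sorted by start with non-overlapping ranges, and that it ends with an all-zero sentinel. Under those assumptions a class can be found by binary search. The six-entry table is illustrative only, not the Unicode data, and the library may search its table differently.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;

enum LineBreakClass {
    LBP_Undefined, LBP_OP, LBP_CL, LBP_CP, LBP_QU, LBP_GL, LBP_NS, LBP_EX,
    LBP_SY, LBP_IS, LBP_PR, LBP_PO, LBP_NU, LBP_AL, LBP_ID, LBP_IN,
    LBP_HY, LBP_BA, LBP_BB, LBP_B2, LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
    LBP_H3, LBP_JL, LBP_JV, LBP_JT, LBP_AI, LBP_BK, LBP_CB, LBP_CR,
    LBP_LF, LBP_NL, LBP_SA, LBP_SG, LBP_SP, LBP_XX
};

/* Assumed layout: one entry per contiguous range of code points. */
struct LineBreakProperties {
    utf32_t start;
    utf32_t end;
    enum LineBreakClass prop;
};

/* Illustrative table, sorted by start, with an all-zero sentinel. */
static const struct LineBreakProperties sketch_props[] = {
    { 0x0020, 0x0020, LBP_SP },
    { 0x0028, 0x0028, LBP_OP },
    { 0x0029, 0x0029, LBP_CP },
    { 0x0030, 0x0039, LBP_NU },
    { 0x0041, 0x005A, LBP_AL },
    { 0x0061, 0x007A, LBP_AL },
    { 0, 0, LBP_Undefined }
};

/* Binary search over tbl[0..n-1]; returns LBP_XX if ch is not covered. */
static enum LineBreakClass sketch_lookup(const struct LineBreakProperties *tbl,
                                         size_t n, utf32_t ch)
{
    size_t lo = 0, hi = n;  /* search the half-open interval [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < tbl[mid].start)
            hi = mid;
        else if (ch > tbl[mid].end)
            lo = mid + 1;
        else
            return tbl[mid].prop;
    }
    return LBP_XX;
}
```

The entry count passed as n excludes the sentinel, for example sizeof sketch_props / sizeof sketch_props[0] - 1.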

struct LineBreakPropertiesLang lb_prop_lang_map[]

Table associating language names with their language-specific line breaking properties. This is the definition for the static data in this file. If you want more flexibility, or do not need the data here, you may want to redefine lb_prop_lang_map in your C source file.
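A minimal sketch of how a map like lb_prop_lang_map can be consulted. It assumes each entry stores a language name, the number of leading characters to compare, and a pointer to a table of overriding properties, with a NULL-name sentinel at the end. All sketch_ names are hypothetical, and the English quotation-mark override is illustrative data only. The same shape indicates what a replacement lb_prop_lang_map in your own source file would need to provide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t utf32_t;

enum LineBreakClass {
    LBP_Undefined, LBP_OP, LBP_CL, LBP_CP, LBP_QU, LBP_GL, LBP_NS, LBP_EX,
    LBP_SY, LBP_IS, LBP_PR, LBP_PO, LBP_NU, LBP_AL, LBP_ID, LBP_IN,
    LBP_HY, LBP_BA, LBP_BB, LBP_B2, LBP_ZW, LBP_CM, LBP_WJ, LBP_H2,
    LBP_H3, LBP_JL, LBP_JV, LBP_JT, LBP_AI, LBP_BK, LBP_CB, LBP_CR,
    LBP_LF, LBP_NL, LBP_SA, LBP_SG, LBP_SP, LBP_XX
};

/* Assumed layouts of the two structures. */
struct LineBreakProperties {
    utf32_t start;
    utf32_t end;
    enum LineBreakClass prop;
};

struct LineBreakPropertiesLang {
    const char *lang;                       /* language name, e.g. "en" */
    size_t namelen;                         /* characters of lang to compare */
    const struct LineBreakProperties *lbp;  /* overriding properties */
};

/* Illustrative override table with an all-zero sentinel. */
static const struct LineBreakProperties sketch_prop_en[] = {
    { 0x2018, 0x2018, LBP_OP },  /* LEFT SINGLE QUOTATION MARK */
    { 0x2019, 0x2019, LBP_CL },  /* RIGHT SINGLE QUOTATION MARK */
    { 0, 0, LBP_Undefined }
};

static const struct LineBreakPropertiesLang sketch_lang_map[] = {
    { "en", 2, sketch_prop_en },
    { NULL, 0, NULL }            /* sentinel */
};

/* Returns the override table whose name matches the first namelen
 * characters of lang (so "en" also matches "en-US"), or NULL. */
static const struct LineBreakProperties *
sketch_find_lang_props(const struct LineBreakPropertiesLang *map,
                       const char *lang)
{
    if (lang == NULL)
        return NULL;
    for (; map->lang != NULL; ++map)
        if (strncmp(lang, map->lang, map->namelen) == 0)
            return map->lbp;
    return NULL;
}
```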