# Character Processing

## Use Cases

Character rules vary greatly across languages, which makes it difficult to extract the expected information from text. Character processing lets you handle text with the same logic regardless of the language rules.

## How to Develop


### Character Type Identification Using Character Attributes

Character attributes identify the type of a character, for example, digit, letter, or space. They can also tell whether a character belongs to a right-to-left (RTL) language or is an ideographic character (for example, Chinese, Japanese, or Korean).

These functions are implemented by APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain the character attribute.

   ```ts
   let isDigit: boolean = i18n.Unicode.isDigit('1'); // Pass in the character to check.
   ```

3. Obtain the character type. The following code snippet uses the common type as an example. For details, see the **getType** API reference.

   ```ts
   let type = i18n.Unicode.getType('a'); // Pass in the character to check.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit = i18n.Unicode.isDigit('1'); // isDigit: true

// Check whether a character belongs to an RTL language.
let isRTL = i18n.Unicode.isRTL('a'); // isRTL: false

// Check whether a character is an ideographic character.
let isIdeograph = i18n.Unicode.isIdeograph('华'); // isIdeograph: true

// Obtain the character type.
let type = i18n.Unicode.getType('a'); // type: U_LOWERCASE_LETTER
```


### Transliteration

Transliteration replaces the original content with similarly pronounced content in the local language. This function is implemented through the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module supports transliteration from Chinese characters to pinyin. However, it is not guaranteed that polyphonic characters are correctly processed based on the context.

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Transliterator** object. You can also obtain the list of supported transliteration IDs.
   ```ts
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn'); // Pass in a valid ID to create a Transliterator object.
   let ids: string[] = i18n.Transliterator.getAvailableIDs(); // Obtain the list of IDs supported by Transliterator.
   ```

3. Transliterate text.
   ```ts
   let res: string = transliterator.transform('中国'); // Transliterate the text content.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into Latin script.
let transliterator = i18n.Transliterator.getInstance('Any-Latn');
let wordArray = ["中国", "德国", "美国", "法国"];
for (let i = 0; i < wordArray.length; i++) {
  let res = transliterator.transform(wordArray[i]); // res: zhōng guó, dé guó, měi guó, fǎ guó
}

// Chinese transliteration and tone removal
let transliter = i18n.Transliterator.getInstance('Any-Latn;Latin-Ascii');
let result = transliter.transform('中国'); // result: zhong guo

// Chinese surname pronunciation
let nameTransliter = i18n.Transliterator.getInstance('Han-Latin/Names');
let result1 = nameTransliter.transform('单老师'); // result1: shàn lǎo shī
let result2 = nameTransliter.transform('长孙无忌'); // result2: zhǎng sūn wú jì


// Obtain the list of IDs supported by Transliterator.
let ids = i18n.Transliterator.getAvailableIDs(); // ids: ['ASCII-Latin', 'Accents-Any', ...]
```


### Character Normalization

Character normalization standardizes characters according to the specified normalization form. This function is implemented through the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object by passing in the text normalization form, which can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).
   ```ts
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC); // Pass in the normalization form.
   ```

3. Normalize the text.
   ```ts
   let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // Normalize the text.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize characters in the NFC form.
let normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText = normalizer.normalize('\u1E9B\u0323'); // normalizedText: \u1E9B\u0323
```


### Line Wrapping

Line wrapping obtains the break positions of a text based on the specified boundary rules and wraps lines at those positions. It is implemented by using the APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object.
   Pass in a valid locale to create a **BreakIterator** object. This object wraps lines based on the rules specified by the locale.

   ```ts
   let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB'); // Pass in a valid locale.
   ```

3. Set the text to be processed.
   ```ts
   iterator.setLineBreakText('Apple is my favorite fruit.'); // Set the text to be processed.
   let breakText: string = iterator.getLineBreakText(); // View the text being processed by the BreakIterator object.
   ```

4. Obtain the break positions of the text.
   ```ts
   // Obtain the current position of BreakIterator in the text.
   let currentPos: number = iterator.current();
   // Move BreakIterator to the first break point and return its position.
   // The first break point is always at the beginning of the text, that is, firstPos = 0.
   let firstPos: number = iterator.first();
   // Move BreakIterator by the specified number of break points (default: 1).
   // A positive number moves the iterator toward the end of the text, and a negative number moves it toward the beginning.
   // nextPos is the position after the move. If BreakIterator is moved out of the text length range, -1 is returned.
   let nextPos: number = iterator.next(2);
   // Check whether the specified position is a break point.
   let isBoundary: boolean = iterator.isBoundary(9);
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object.
let iterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move BreakIterator to the beginning of the text.
let firstPos = iterator.first(); // firstPos: 0

// Move BreakIterator by several break points.
let nextPos = iterator.next(2); // nextPos: 9

// Check whether a position is a break point.
let isBoundary = iterator.isBoundary(9); // isBoundary: true

// Obtain the text processed by BreakIterator.
let breakText = iterator.getLineBreakText(); // breakText: Apple is my favorite fruit.
```
<!--RP1--><!--RP1End-->

<!--no_check-->
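
The **first** and **next** calls described above can be combined to list every break point in a text, stopping when **next** returns -1. The following is a minimal sketch rather than part of the kit: the `BreakPointWalker` interface, the `collectBreakPositions` helper, and the `SpaceBreakWalker` class are hypothetical names introduced for illustration. `SpaceBreakWalker` is only a stand-in that breaks after spaces so that the sketch runs without a device. On a device, an object returned by `i18n.getLineInstance` (after `setLineBreakText` is called) should be usable in its place, assuming it is structurally compatible with the interface.

```ts
// Minimal shape of the two BreakIterator methods used below.
// Assumption: i18n.BreakIterator is structurally compatible with this interface.
interface BreakPointWalker {
  first(): number;
  next(index?: number): number;
}

// Walk every break point from the start of the text until next() returns -1.
function collectBreakPositions(walker: BreakPointWalker): number[] {
  const positions: number[] = [walker.first()];
  let pos = walker.next();
  while (pos !== -1) {
    positions.push(pos);
    pos = walker.next();
  }
  return positions;
}

// Hypothetical stand-in for a real BreakIterator: it allows a break after each
// run of spaces and at the end of the text. Real locale rules are more complex.
class SpaceBreakWalker implements BreakPointWalker {
  private points: number[] = [0];
  private cursor = 0;

  constructor(text: string) {
    for (let i = 1; i < text.length; i++) {
      if (text[i - 1] === ' ' && text[i] !== ' ') {
        this.points.push(i);
      }
    }
    if (text.length > 0) {
      this.points.push(text.length);
    }
  }

  first(): number {
    this.cursor = 0;
    return this.points[0];
  }

  next(index: number = 1): number {
    const target = this.cursor + index;
    if (target < 0 || target >= this.points.length) {
      return -1; // Moved out of the text range.
    }
    this.cursor = target;
    return this.points[target];
  }
}

const text = 'Apple is my favorite fruit.';
const positions = collectBreakPositions(new SpaceBreakWalker(text));
// Slice the text at consecutive break points to get the segments a line can end on.
const segments = positions.slice(1).map((end, i) => text.slice(positions[i], end));
console.log(positions); // [0, 6, 9, 12, 21, 27]
console.log(segments);  // ['Apple ', 'is ', 'my ', 'favorite ', 'fruit.']
```

A layout routine could then append segments to the current line until the next one no longer fits, and start a new line at that break point.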