Unicode Support and Character Limits

Unicode Support

Text data in EDS is stored internally in the UTF-8 encoding. This allows characters from all scripts (Latin, Cyrillic, CJK, etc.) to be used in most parts of EDS, while minimising memory usage and remaining compatible with ASCII: text that contains only ASCII characters is byte-for-byte identical in UTF-8.

The UTF-8 encoding uses a variable number of bytes (1-4) per Unicode code point (character). For example:

ASCII characters: 1 byte.
Extended Latin, Greek, Cyrillic, Arabic characters: 2 bytes.
Chinese, Japanese and Korean characters: 3 bytes.
Emoji and other characters outside the Basic Multilingual Plane: 4 bytes (many common symbols, such as "€", take 3 bytes).
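
The byte counts above can be checked outside EDS with any UTF-8 encoder. The following is a minimal Python sketch, not part of EDS; the helper name utf8_len and the sample characters are chosen here purely for illustration.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# One illustrative character from each group listed above.
for ch in ["A", "é", "Ж", "猫", "€", "😀"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")
```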

Character Limits

For compatibility reasons, whenever a character limit is specified in EDS, the limit refers to a number of bytes of UTF-8 encoded text (unless otherwise specified). Therefore, the number of Unicode characters that fit within that limit depends on the actual characters used.

For example, if a 10 'character' (byte) limit is specified, each of the following would fit, and one more character from the same script would not:

ASCII: "CatDogBird" (10 bytes)
Cyrillic: "Кошка" (10 bytes)
Chinese: "猫犬鸟" (9 bytes)
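
To check in advance whether a value fits a given limit, compare its encoded length to the limit rather than its character count. A minimal sketch in Python follows; the function fits_limit is a hypothetical helper written for this example and is not an EDS API.

```python
def fits_limit(text: str, byte_limit: int) -> bool:
    """True if `text` encoded as UTF-8 is no longer than `byte_limit` bytes."""
    return len(text.encode("utf-8")) <= byte_limit

# The three examples above, checked against a 10-byte limit.
for value in ["CatDogBird", "Кошка", "猫犬鸟"]:
    size = len(value.encode("utf-8"))
    print(f"{value}: {len(value)} characters, {size} bytes, fits: {fits_limit(value, 10)}")
```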

EDS will truncate any text that does not fit within the given byte limit.
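
To avoid unexpected truncation, text can be shortened before it is passed to EDS. The Python sketch below is one way to do this and is not a description of how EDS itself truncates: it assumes that a character which would straddle the limit should be dropped whole, so that the result remains valid UTF-8. The function name truncate_utf8 is hypothetical.

```python
def truncate_utf8(text: str, byte_limit: int) -> str:
    """Shorten `text` so that its UTF-8 encoding is at most `byte_limit` bytes.

    Assumption for this sketch: a character cut in half by the limit is
    removed entirely instead of leaving a partial byte sequence.
    """
    encoded = text.encode("utf-8")[:byte_limit]
    # errors="ignore" discards an incomplete trailing multi-byte sequence.
    return encoded.decode("utf-8", errors="ignore")

print(truncate_utf8("CatDogBirdFish", 10))
print(truncate_utf8("猫犬鸟鱼", 10))
```

With a 10-byte limit, the second call keeps three Chinese characters (9 bytes), because the fourth would need bytes 10 to 12.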