National Character Support

10.2

The PL/B Language provides National Character support using the UTF-8 character encoding. Using UTF-8 preserves the basic behavior of the existing PL/B instructions, which minimizes the impact on PL/B program development. UTF-8 offers the following advantages for the PL/B Language:

It can represent every Unicode character; the Unicode code space spans 1,114,112 code points (U+0000 through U+10FFFF).
Since UTF-8 is fully compatible with 7-bit ASCII, PL/B code that uses strings on a byte-by-byte basis still works.
Characters will never require more than four bytes.
String sort order is preserved. In other words, sorting UTF-8 strings per-byte yields the same order as sorting them per-character by logical Unicode value.
There are no byte-order/endianness issues, since UTF-8 data is a byte stream.
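
The properties listed above can be checked outside of PL/B. The following is a minimal sketch in Python, not PL/B code; it relies only on Python's built-in UTF-8 codec, and the sample strings are arbitrary choices made for illustration.

```python
# Illustration of the UTF-8 properties listed above (Python, not PL/B).

# One sample character each for the 1-, 2-, 3-, and 4-byte encoded forms.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# ASCII compatibility: a 7-bit ASCII string encodes to the same bytes.
ascii_compatible = "PL/B".encode("utf-8") == "PL/B".encode("ascii")

# No character needs more than four bytes, including the highest code point.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
max_bytes = len(chr(0x10FFFF).encode("utf-8"))

# Sort order: sorting by raw bytes matches sorting by code point value.
words = ["zebra", "\u00c4rger", "apple", "\u20acuro", "\U0001F600", "\u00e9clair"]
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code_points = sorted(words, key=lambda w: [ord(c) for c in w])

print(ascii_compatible)            # True
print(byte_lengths, max_bytes)     # [1, 2, 3, 4] 4
print(by_bytes == by_code_points)  # True
```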

The National Character support for PL/B adds three data constructs: NCHAR, NINIT, and N"literal". These constructs have the same basic language format and behaviors as the DIM, INIT, and "literal" constructs already used in PL/B programs. The following table compares the normal language constructs with their National Character counterparts:

Normal	National	Description
DIM	NCHAR	An NCHAR has the same structure as a DIM when it is declared in a PL/B program. For the same declared character size, the physical byte size of the NCHAR is four times the byte size of the DIM. Also, the NCHAR has a state flag indicating that it can only be populated with valid UTF-8 characters.
INIT	NINIT	An NINIT has the same structure as an INIT when it is declared in a PL/B program. However, when the literal declaration contains the same number of characters, the physical byte size of the NINIT is four times the byte size of the INIT. Also, the NINIT has a state flag indicating that it can only be populated with valid UTF-8 characters.
"lit"	N"lit"	An N"lit" has the same structure as a "lit" when it is used in a PL/B program. However, the N"lit" literal data can only contain valid UTF-8 encoded characters. Otherwise, the PL/B compiler gives an appropriate compilation error. The N"lit" physical byte size is the exact number of bytes required to encode the UTF-8 characters.
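
The size rules in the table come down to simple arithmetic. The following Python sketch models only the character data area described above. The function names are hypothetical, and the model assumes no additional per-variable overhead, which the table does not describe.

```python
# Size arithmetic for the table above (Python model, hypothetical helper names).

def nchar_data_bytes(declared_chars: int) -> int:
    """Assumed model: an NCHAR or NINIT reserves four bytes per character."""
    return 4 * declared_chars

def nlit_data_bytes(text: str) -> int:
    """An N"lit" occupies exactly the bytes of its UTF-8 encoding."""
    return len(text.encode("utf-8"))

dim_bytes = 10                         # a DIM declared with 10 characters
nchar_bytes = nchar_data_bytes(10)     # an NCHAR declared with 10 characters
ninit_bytes = nchar_data_bytes(len("caf\u00e9"))  # NINIT holding a 4-character literal
nlit_bytes = nlit_data_bytes("caf\u00e9")         # N"lit" with the same 4 characters

print(dim_bytes, nchar_bytes)   # 10 40
print(ninit_bytes, nlit_bytes)  # 16 5
```

In this model the literal takes 5 bytes because the accented character needs two bytes and each of the other three needs one.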

Note:

The NCHAR, NINIT, and N"lit" data constructs can only contain a data string that is a properly formatted UTF-8 encoded character data stream.
The 'F14' format error is generated by the PL/B runtimes if a PL/B program performs an operation that attempts to store an invalid UTF-8 data string into an NCHAR or NINIT data variable.
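
The validity rule in this note can be modeled as a strict UTF-8 check performed before a store. The following Python sketch is an illustration only and does not show how the PL/B runtime is implemented. FormatError and store_nchar are hypothetical names, with FormatError standing in for the 'F14' format error.

```python
# Model of the "valid UTF-8 only" rule (Python, hypothetical names).

class FormatError(Exception):
    """Hypothetical stand-in for the runtime's 'F14' format error."""

def store_nchar(raw: bytes) -> str:
    """Accept the bytes only if they form a well-formed UTF-8 stream."""
    try:
        return raw.decode("utf-8")  # Python's decoder is strict by default
    except UnicodeDecodeError as exc:
        raise FormatError("F14: invalid UTF-8 data") from exc

# A well-formed stream is accepted.
accepted = store_nchar(b"caf\xc3\xa9")

# 0xC3 must be followed by a continuation byte; 0x28 is not one.
try:
    store_nchar(b"\xc3\x28")
    rejected = False
except FormatError:
    rejected = True
```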

The NCHAR, NINIT, and N"literal" data constructs are not supported, or are supported with restrictions, in the following PL/B instructions:

Not Supported:

AND
COMPRESS
CONVERTUTF
DECODE64
DECOMPRESS
DECRYPT
ENCODE64
ENCRYPT
NOT
PARSEFNAME
TEST
XOR

Restricted Support:

HASH	An NCHAR cannot be used as the destination operand.
MOVE	MOVE from integer to NCHAR is not supported.
RESET	The second operand cannot be an NCHAR or N"lit".
SETLPTR	The second operand cannot be an NCHAR or N"lit".

See Also: Introduction