Rational Developer for Power Systems Software V7.6
java.lang.Object
  com.ibm.etools.iseries.util.NlsUtil
public class NlsUtil
This class provides national language support (NLS) functions. These are mostly DBCS/MBCS-related functions, and may not work correctly for other character encodings. Some bidirectional support will also be added gradually, mainly for the editor's internal use.
These NLS functions aim to make it easier for users to handle files that originate on, or are destined for, a remote system, in a manner similar to how such files are handled in that environment (e.g., iSeries members edited with an iSeries editor). This includes emulation of SO/SI control characters and awareness of the text's actual positions, columns, length, and sequence numbers (all of which are byte-determined).
The current implementation, while attempting to be generic, focuses on Windows as the workstation, and zSeries (S/390) and iSeries (AS/400) as the remote system. These remote systems use EBCDIC character encodings: a DBCS character uses two bytes, a DBCS string is delimited by SO/SI controls, certain character combinations (Arabic lam + alef) may translate into a single-byte character (visual lamalef code point). Also, for these systems each byte in the character encoding takes up one display column on the screen (1-byte SBCS character = 1 display column, 2-byte DBCS character = 2-column display width, the SO and SI control characters = 1 display column each).
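Because each EBCDIC byte occupies one display column on these systems, the display width of a line equals its EBCDIC byte length. The following Java sketch computes that length for a Unicode string under the rules just described. It is illustrative only and is not NlsUtil's implementation. It assumes a caller-supplied predicate `isDbcs` (hypothetical; a real implementation would consult the CCSID's code tables) that reports which characters map to double-byte EBCDIC characters, and it does not model lam-alef contraction.

```java
import java.util.function.IntPredicate;

/** Sketch: EBCDIC byte length (= display columns) of a Unicode string. */
public final class EbcdicWidth {
    private EbcdicWidth() {}

    /**
     * Counts 1 byte per SBCS character, 2 bytes per DBCS character, and
     * 1 byte each for the SO and SI that delimit every run of DBCS characters.
     * isDbcs is a hypothetical stand-in for a real CCSID table.
     */
    public static int ebcdicLength(String s, IntPredicate isDbcs) {
        int length = 0;
        boolean inDbcs = false;
        for (int i = 0; i < s.length(); i++) {
            boolean dbcs = isDbcs.test(s.charAt(i));
            if (dbcs && !inDbcs) {
                length++;            // SO opens a DBCS run
            } else if (!dbcs && inDbcs) {
                length++;            // SI closes the DBCS run
            }
            inDbcs = dbcs;
            length += dbcs ? 2 : 1;  // the character itself
        }
        if (inDbcs) {
            length++;                // SI closing a run at the end of the string
        }
        return length;
    }

    public static void main(String[] args) {
        // For illustration only: treat anything above U+00FF as DBCS.
        IntPredicate wide = c -> c > 0xFF;
        System.out.println(ebcdicLength("AB\u65E5\u672CC", wide)); // 9
    }
}
```

In the example, "AB" takes 2 columns, the two ideographs take 1 (SO) + 4 + 1 (SI), and "C" takes 1, for a total of 9.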
For EUC character encodings (UNIX, AIX, Linux), NlsUtil may not provide adequate emulation of their source edit environments. The byte-length of characters differs from their display-column width, so you may have to use native code (JNI), e.g., the *mb* C library functions, for display-width calculations (as only byte-length information is currently obtainable in Java); these calculations affect the (column-based) tabs expansion in LPEX.
Terminology used here:
Unicode | Java programs use Unicode for their internal representation of characters. More specifically, this is the UTF-16 encoding, which encodes the Basic Multilingual Plane directly and uses surrogate pairs as the escape mechanism to encode the 16 supplementary planes. |
encoding | a Java-supported character encoding, e.g., "Cp1252" (Windows Latin-1) |
native encoding | the default character encoding of the platform (host operating system) that LPEX runs on, according to the default locale. This is usually an ASCII-based character encoding on a workstation (Windows, Linux, OS/2, etc.) and an EBCDIC character encoding on a mainframe or midrange system (S/390, AS/400, etc.). This encoding is normally determined from the "file.encoding" Java system property. |
file encoding | the character encoding of the underlying file. The file encoding is normally the native encoding, as files are usually stored in an encoding that is the same as the default encoding of the host operating system (for example, on Japanese Linux, files are typically stored in EUC-JP).
In a heterogeneous platform environment, the encoding of the host operating system may differ from the encoding of the file to be loaded into the editor. In such a case, you must either specify the encoding of the file explicitly or let the editor attempt to detect it; the editor then performs the character code conversion when loading the file, and again when saving the document. |
source encoding | the source file's character encoding: the file being edited may originate from and/or be targeted for a remote system (i.e., a platform different from the one LPEX runs on). Setting the source encoding information in the editor allows LPEX to emulate features of the file's original editing environment (for example, display emulated SO/SI controls), correctly establish the sequence numbers in effect, calculate the length limit of text lines for save operations, etc. |
DBCS | Asian character set/encoding that contains double-byte characters |
MBCS | Asian character set/encoding that contains multi-byte characters |
SO, SI | Shift-out and Shift-in control characters. Only EBCDIC DBCS encodings use SO/SI escape characters. Balanced SO/SI characters enclose sequences of DBCS character bytes. LPEX can display emulated SO/SI characters in order to present the user with an image of the file similar to the one seen in its native environment (e.g., an iSeries member being edited with an iSeries editor). |
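When the file encoding differs from the native encoding, the conversion on load and save described under "file encoding" amounts to decoding and re-encoding bytes with an explicitly named charset rather than the platform default. The following is a minimal Java sketch using only `java.nio.charset`; the class and method names are illustrative, and ISO-8859-1 simply stands in for whatever file encoding is actually in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: loading and saving text in an explicit file encoding. */
public final class FileEncodingDemo {
    private FileEncodingDemo() {}

    /** Decode file bytes using the file encoding instead of the native encoding. */
    public static String load(byte[] fileBytes, Charset fileEncoding) {
        return new String(fileBytes, fileEncoding);
    }

    /** Encode the document text back into the file encoding on save. */
    public static byte[] save(String text, Charset fileEncoding) {
        return text.getBytes(fileEncoding);
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};  // "caf" + e-acute in ISO-8859-1
        String text = load(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(text.length());                                   // 4
        System.out.println(save(text, StandardCharsets.ISO_8859_1).length);  // 4
        System.out.println(save(text, StandardCharsets.UTF_8).length);       // 5
    }
}
```

Decoding the same bytes with the wrong charset would silently produce different characters, which is why the file encoding must be specified or detected rather than assumed.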
Notes:
- … (<SO>D1D2<SI><SO>D1D2<SI>), this information is lost in the conversion.
- … ('\u000e' and '\u000f'), these will be kept (and balanced) in the converted EBCDIC by the Java character-encoding converters.
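In EBCDIC DBCS data, SO (0x0E) opens a run of double-byte characters and SI (0x0F) closes it. The following Java sketch checks that an EBCDIC byte buffer has properly balanced, non-nested SO/SI pairs, each enclosing whole two-byte characters. It is illustrative only and is not part of NlsUtil. It assumes that no DBCS character byte inside a run ever equals 0x0E or 0x0F.

```java
/** Sketch: validate SO/SI structure in an EBCDIC DBCS byte buffer. */
public final class SosiCheck {
    static final byte SO = 0x0E;  // shift-out: start of a DBCS run
    static final byte SI = 0x0F;  // shift-in: end of a DBCS run

    private SosiCheck() {}

    /** True if every SO is closed by an SI, runs do not nest, and each run holds whole 2-byte characters. */
    public static boolean isBalanced(byte[] ebcdic) {
        int runStart = -1;  // index just after the open SO, or -1 when outside a DBCS run
        for (int i = 0; i < ebcdic.length; i++) {
            byte b = ebcdic[i];
            if (b == SO) {
                if (runStart >= 0) {
                    return false;            // SO inside an open run
                }
                runStart = i + 1;
            } else if (b == SI) {
                if (runStart < 0) {
                    return false;            // SI without a matching SO
                }
                if ((i - runStart) % 2 != 0) {
                    return false;            // odd number of DBCS bytes
                }
                runStart = -1;
            }
        }
        return runStart < 0;                 // no run left open at the end
    }
}
```

For example, `isBalanced(new byte[] {0x0E, 0x45, 0x41, 0x0F})` holds, while a buffer that ends inside an open run, or that has an odd number of bytes between SO and SI, fails the check.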
Method Summary | |
---|---|
static int |
countLamAlefs(String buffer,
int ccsid)
Counts the number of Lam-Alefs in a string. |
static int |
encodingCharIndex(String s,
int index,
String encoding)
Return the character index into the encoded string (i.e., as converted from Java Unicode s using the character encoding), which corresponds to the index into text String s. |
static int |
encodingLength(char c,
String encoding)
Get the byte-length for a string consisting of one Java Unicode character c converted to the specified character encoding. |
static int |
encodingLength(String s,
String encoding)
Get the byte-length of a Java Unicode String s in the specified character encoding. |
static int |
getEBCDICLengthOfLogicalBuffer(String buffer)
Return the length of the Unicode buffer in EBCDIC bytes, taking lam-alef characters and LRM and RLM markers into account. |
static byte[] |
getLamAlefBytes(int ccsid)
Retrieves the list of iSeries Lam-Alef bytes |
static int |
getLamAlefsCountInBufferRange(String buffer,
int len)
Count the number of lam-alef characters in the given text buffer, beginning at the first character, for the given length. If the character at index len is an alef and the following character is a lam, the count of lam-alefs found is still incremented. |
static String |
getNativeEncoding()
Retrieve the native (platform's default) character encoding. |
static int |
indexFromEncodingIndex(String s,
int index,
String encoding)
Return the index into the Java Unicode text String s which corresponds to the index into its encoding string (i.e., as converted using the specified character encoding). |
static boolean |
isALEF(char c)
Indicates whether or not the character is an Arabic Alef |
static boolean |
isBidiEncoding(String encoding)
Determine whether a character encoding is bidirectional. |
static boolean |
isEucEncoding(String encoding)
Determine whether a character encoding is EUC (AIX MBCS). |
static boolean |
isIgnoringBidiMarks(String strEncoding,
int CCSID)
Returns whether the document might contain ignorable bidi marks. |
static boolean |
isLAM(char c)
Indicates whether or not the character is an Arabic Lam |
static boolean |
isLamAlefByte(byte bufferByte,
byte[] lamAlefArray)
Indicates whether or not a byte is a lam-alef. |
static boolean |
isLamAlefChar(char c)
Indicates whether or not a character is a joined lam-alef in Unicode. |
static boolean |
isMbcsEncoding(String encoding)
Determine whether a character encoding is DBCS/MBCS. |
static boolean |
isShapedBidiCcsid(int ccsid)
Determine whether this iSeries CCSID contains a Lam-Alef ligature. |
static boolean |
isSosiEncoding(String encoding)
Determine whether a character encoding uses SO/SI control characters - i.e., whether it is an EBCDIC DBCS character encoding. |
static boolean |
isValidEncoding(String encoding)
Validate a character encoding. |
static byte[] |
massageLamAlefs(byte[] buffer,
int ccsid)
Adds a blank for each lam-alef, so that each one is transformed from one iSeries byte into two Unicode characters. |
static String |
massageLamAlefs(String buffer,
int ccsid)
Adds a blank for each lam-alef, so that each one is transformed from one iSeries byte into two Unicode characters. |
static char |
toUpperCase(char c)
Uppercases the character taking into account variant characters (which do not get uppercased) |
static String |
toUpperCase(String text)
Uppercases the text, taking into account variant characters (which do not get uppercased). Also does not uppercase any substring enclosed in double quotes. |
static String |
truncateString(String visualString,
int visualLength)
Truncate the given visual string to the visual length given in EBCDIC bytes. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static boolean isValidEncoding(String encoding)
Validate a character encoding.
Parameters:
encoding - character encoding to validate
Returns:
false if encoding is null or not supported

public static boolean isMbcsEncoding(String encoding)
Determine whether a character encoding is DBCS/MBCS. Use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java run environment.
Parameters:
encoding - canonical name of a character encoding

public static boolean isEucEncoding(String encoding)
Determine whether a character encoding is EUC (AIX MBCS). Use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java run environment.
Parameters:
encoding - canonical name of a character encoding

public static boolean isSosiEncoding(String encoding)
Determine whether a character encoding uses SO/SI control characters, i.e., whether it is an EBCDIC DBCS character encoding. Use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java run environment.
Parameters:
encoding - canonical name of a character encoding

public static boolean isBidiEncoding(String encoding)
Determine whether a character encoding is bidirectional. Use isValidEncoding(java.lang.String) to ensure that this character encoding is supported by the active Java run environment.
Parameters:
encoding - canonical name of a character encoding

public static String getNativeEncoding()
Retrieve the native (platform's default) character encoding, as determined from the "file.encoding" Java system property.
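The validation and native-encoding lookups above can be approximated with standard `java.nio.charset` calls. This is a sketch, not NlsUtil's implementation. `Charset.isSupported` throws an `IllegalArgumentException` (including its subclass `IllegalCharsetNameException`) for `null` or syntactically invalid names, so the sketch maps those cases to `false` to match the documented return value. As a design choice that departs from the description above, it consults the `native.encoding` property first. That property exists on JDK 17 and later, and on JDK 18 and later `file.encoding` defaults to UTF-8 rather than to the platform encoding.

```java
import java.nio.charset.Charset;

/** Sketch: encoding validation and native-encoding lookup with java.nio.charset. */
public final class EncodingChecks {
    private EncodingChecks() {}

    /** False if the name is null, malformed, or not supported by this JVM. */
    public static boolean isValidEncoding(String encoding) {
        if (encoding == null) {
            return false;
        }
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalArgumentException e) {  // includes IllegalCharsetNameException
            return false;
        }
    }

    /** Platform default encoding: "native.encoding" (JDK 17+), else "file.encoding", else the JVM default. */
    public static String nativeEncoding() {
        String enc = System.getProperty("native.encoding");
        if (enc == null) {
            enc = System.getProperty("file.encoding");
        }
        return enc != null ? enc : Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(isValidEncoding("UTF-8"));        // true
        System.out.println(isValidEncoding(null));           // false
        System.out.println(isValidEncoding("no such enc"));  // false (illegal name)
        System.out.println(nativeEncoding());                // platform-dependent
    }
}
```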
public static int encodingLength(String s, String encoding)
Get the byte-length of a Java Unicode String s in the specified character encoding.
For certain character encodings the length returned includes control
bytes. For example, for EBCDIC DBCS encodings the length includes the
SO/SI control characters; for UTF-16, it includes the byte-order mark.
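The effect of control bytes on the reported length can be seen with standard charsets alone. The sketch below, an illustration rather than NlsUtil's code, measures the byte length with `String.getBytes(Charset)`. Java's `UTF-16` charset writes a two-byte big-endian byte-order mark when encoding, while `UTF-16BE` does not. For a stateful EBCDIC DBCS charset (available, for example, in the JDK's optional `jdk.charsets` module), the encoder would likewise be expected to emit SO/SI bytes that count toward the length.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: byte length of a string in a given charset, control bytes included. */
public final class EncodedLength {
    private EncodedLength() {}

    /** Byte length of s in cs, including any byte-order mark or shift bytes the encoder emits. */
    public static int encodingLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println(encodingLength("AB", StandardCharsets.UTF_16));        // 6: 2-byte BOM + 2 x 2
        System.out.println(encodingLength("AB", StandardCharsets.UTF_16BE));      // 4: no BOM
        System.out.println(encodingLength("caf\u00E9", StandardCharsets.UTF_8));  // 5: e-acute is 2 bytes
    }
}
```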
public static int encodingLength(char c, String encoding)
Get the byte-length for a string consisting of one Java Unicode character c converted to the specified character encoding.
For an EBCDIC DBCS character, this method returns 2 (i.e., the length of the two-byte character itself, without the SO/SI controls). For other character encodings, the length returned may include control bytes.
Returns:
1 if c converts to a single-byte character; 2 if the encoding character is double-byte; n if the encoding character is multi-byte.

public static int encodingCharIndex(String s, int index, String encoding)
Return the character index into the encoded string (i.e., as converted from Java Unicode s using the character encoding), which corresponds to the index into text String s.
If the encoding is EBCDIC DBCS, the index returned is positioned away from an SO/SI control character.
Parameters:
s - Java Unicode String
index - ZERO-based index into s
encoding - character encoding
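For a stateless encoding, the mapping from a character index to an encoded index is simply the byte length of the prefix that precedes the character. The following Java sketch is a simplified stand-in for `encodingCharIndex`, not its actual implementation. It assumes an encoding without a byte-order mark (e.g., UTF-8) and an index that does not split a surrogate pair. It does not handle EBCDIC DBCS: for those encodings the encoder would append an SI to an unterminated prefix, and the real method also adjusts the result away from SO/SI bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: char index in a String to byte offset in its encoded form. */
public final class EncodingIndex {
    private EncodingIndex() {}

    /**
     * Byte offset, in the encoded form of s, of the character at the ZERO-based
     * char index. Assumes a stateless, BOM-free encoding such as UTF-8.
     */
    public static int encodingCharIndex(String s, int index, Charset cs) {
        return s.substring(0, index).getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "a\u00E9\u20AC";  // 'a' (1 byte), e-acute (2 bytes), euro sign (3 bytes) in UTF-8
        System.out.println(encodingCharIndex(s, 2, StandardCharsets.UTF_8)); // 3
        System.out.println(encodingCharIndex(s, 3, StandardCharsets.UTF_8)); // 6
    }
}
```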
public static int indexFromEncodingIndex(String s, int index, String encoding)
Return the index into the Java Unicode text String s which corresponds to the index into its encoding string (i.e., as converted using the specified character encoding).
Parameters:
s - Java Unicode String
index - ZERO-based index into the encoding string of s
Returns:
the corresponding index into s
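The inverse mapping can be sketched by walking the string one code point at a time and accumulating encoded byte lengths until the requested encoded index falls inside a character. This is an illustrative sketch under the same assumptions as a stateless, BOM-free encoding such as UTF-8. It is not NlsUtil's implementation, and it does not model SO/SI handling.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch: byte offset in the encoded form back to a char index in the String. */
public final class IndexFromEncoding {
    private IndexFromEncoding() {}

    /**
     * ZERO-based char index into s of the character whose encoded bytes contain
     * byteIndex; returns s.length() when byteIndex is at or past the end.
     */
    public static int indexFromEncodingIndex(String s, int byteIndex, Charset cs) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int next = i + Character.charCount(cp);       // keep surrogate pairs together
            bytes += s.substring(i, next).getBytes(cs).length;
            if (byteIndex < bytes) {
                return i;                                 // byteIndex lies inside this character
            }
            i = next;
        }
        return s.length();
    }

    public static void main(String[] args) {
        String s = "a\u00E9\u20AC";  // UTF-8 bytes: a=[0], e-acute=[1..2], euro=[3..5]
        System.out.println(indexFromEncodingIndex(s, 2, StandardCharsets.UTF_8)); // 1
        System.out.println(indexFromEncodingIndex(s, 3, StandardCharsets.UTF_8)); // 2
    }
}
```

A byte index that lands in the middle of a multi-byte character maps to that character's index, which mirrors the intent of keeping positions on character boundaries.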
public static boolean isIgnoringBidiMarks(String strEncoding, int CCSID)
Returns whether the document might contain ignorable bidi marks.
Bidirectional marks LRM and RLM may be found in files brought over from an iSeries remote, as a result of the conversion from the visual-order EBCDIC iSeries file to a logical-order UTF-8 / Unicode workstation file.
LPEX should ignore these marks for most intents and purposes in this scenario (iSeries Arabic and Hebrew CCSIDs), as they are removed when the file is converted back to the remote.
This method returns true when the source encoding is bidirectional and the source CCSID defines a visual encoding.
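public static boolean isLAM(char c)
Indicates whether or not the character is an Arabic Lam.

public static boolean isALEF(char c)
Indicates whether or not the character is an Arabic Alef.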
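In logical (typing) order, a lam-alef is a lam immediately followed by an alef. The following Java sketch shows character tests of this kind and a pair count over a logical-order string. The code points are standard Unicode: ARABIC LETTER LAM is U+0644 and ARABIC LETTER ALEF is U+0627, and the madda and hamza alef variants are U+0622, U+0623, and U+0625. Whether NlsUtil's `isALEF` also accepts those variants is an assumption, as is the pair-skipping rule in the counter.

```java
/** Sketch: Arabic lam/alef tests and lam-alef pair counting in logical order. */
public final class LamAlef {
    private LamAlef() {}

    /** ARABIC LETTER LAM (U+0644). */
    public static boolean isLam(char c) {
        return c == '\u0644';
    }

    /** ARABIC LETTER ALEF (U+0627) plus, by assumption, the madda/hamza variants U+0622, U+0623, U+0625. */
    public static boolean isAlef(char c) {
        return c == '\u0627' || c == '\u0622' || c == '\u0623' || c == '\u0625';
    }

    /** Counts lam immediately followed by alef; each alef is consumed by at most one pair. */
    public static int countLamAlefPairs(String s) {
        int count = 0;
        for (int i = 0; i + 1 < s.length(); i++) {
            if (isLam(s.charAt(i)) && isAlef(s.charAt(i + 1))) {
                count++;
                i++;  // skip the alef just paired
            }
        }
        return count;
    }
}
```

For example, `countLamAlefPairs("\u0644\u0627x\u0644\u0623")` finds two pairs, while the reversed sequence alef followed by lam finds none.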
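public static boolean isShapedBidiCcsid(int ccsid)
Determine whether this iSeries CCSID contains a Lam-Alef ligature.
Parameters:
ccsid - the iSeries CCSID to check

public static byte[] getLamAlefBytes(int ccsid)
Retrieves the list of iSeries Lam-Alef bytes.
Parameters:
ccsid - the CCSID whose Lam-Alef bytes are retrieved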
public static boolean isLamAlefByte(byte bufferByte, byte[] lamAlefArray)
Indicates whether or not a byte is a lam-alef.
Parameters:
bufferByte - the byte to check
lamAlefArray - the list of lam-alefs to check against
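public static byte[] massageLamAlefs(byte[] buffer, int ccsid)
Adds a blank for each lam-alef, so that each one is transformed from one iSeries byte into two Unicode characters.
Parameters:
buffer - the original buffer to massage
ccsid - the CCSID of the buffer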
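Because one visual lam-alef byte becomes two Unicode characters (lam and alef) after conversion, reserving an extra byte per lam-alef keeps the byte and character counts aligned. The following Java sketch illustrates that expansion. The lam-alef byte values are passed in (in NlsUtil they would come from something like `getLamAlefBytes(ccsid)`), and the test values are arbitrary placeholders. Inserting the blank immediately after each lam-alef byte is an assumption; the actual placement used by NlsUtil is not documented here. The EBCDIC blank is 0x40.

```java
import java.io.ByteArrayOutputStream;

/** Sketch: insert an EBCDIC blank after each lam-alef byte. */
public final class LamAlefExpand {
    static final byte EBCDIC_BLANK = 0x40;

    private LamAlefExpand() {}

    static boolean isLamAlefByte(byte b, byte[] lamAlefBytes) {
        for (byte x : lamAlefBytes) {
            if (x == b) {
                return true;
            }
        }
        return false;
    }

    /** Returns a copy of buffer with one blank added after every lam-alef byte (placement assumed). */
    public static byte[] expandLamAlefs(byte[] buffer, byte[] lamAlefBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(buffer.length + 8);
        for (byte b : buffer) {
            out.write(b);
            if (isLamAlefByte(b, lamAlefBytes)) {
                out.write(EBCDIC_BLANK);  // room for the second Unicode character
            }
        }
        return out.toByteArray();
    }
}
```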
public static int countLamAlefs(String buffer, int ccsid)
Counts the number of Lam-Alefs in a string.
Parameters:
buffer - the string to check
ccsid - the destination CCSID
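public static String massageLamAlefs(String buffer, int ccsid)
Adds a blank for each lam-alef, so that each one is transformed from one iSeries byte into two Unicode characters.
Parameters:
buffer - the original buffer to massage
ccsid - the CCSID of the buffer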
public static boolean isLamAlefChar(char c)
Indicates whether or not a character is a joined lam-alef in Unicode.
Parameters:
c - the character to check
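In Unicode, the joined lam-alef forms are the eight ligatures U+FEF5 through U+FEFC in the Arabic Presentation Forms-B block (isolated and final forms of lam with alef, and of lam with the madda and hamza alef variants). The following Java sketch tests for that range and splits such ligatures back into their letters using NFKC normalization from `java.text.Normalizer`. It illustrates the idea and is not NlsUtil's implementation. Note that NFKC also normalizes other compatibility characters in the string.

```java
import java.text.Normalizer;

/** Sketch: detect and decompose Unicode lam-alef ligatures. */
public final class LamAlefLigature {
    private LamAlefLigature() {}

    /** Lam-alef ligatures in Arabic Presentation Forms-B: U+FEF5 through U+FEFC. */
    public static boolean isLamAlefLigature(char c) {
        return c >= '\uFEF5' && c <= '\uFEFC';
    }

    /** Replaces presentation-form ligatures with lam + alef letters (NFKC affects other compatibility characters too). */
    public static String unshape(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }
}
```

For example, `unshape("\uFEFB")` yields the two letters U+0644 U+0627, and `unshape("\uFEF7")` yields U+0644 U+0623.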
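public static String toUpperCase(String text)
Uppercases the text, taking into account variant characters (which do not get uppercased). Also does not uppercase any substring enclosed in double quotes.
Parameters:
text - the text to uppercase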
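public static char toUpperCase(char c)
Uppercases the character, taking into account variant characters (which do not get uppercased).
Parameters:
c - the character to uppercase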
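The uppercasing rules above (leave variant characters alone, and leave double-quoted substrings untouched) can be sketched as a single pass with a quote toggle. The set of variant characters is passed in as a hypothetical parameter, because the exact set NlsUtil treats as variant is not documented here. The sketch uses `Character.toUpperCase(char)` so that the result keeps the same length as the input, unlike `String.toUpperCase`, which can expand some characters.

```java
/** Sketch: uppercase outside double quotes, skipping caller-specified variant characters. */
public final class QuotedUpper {
    private QuotedUpper() {}

    /**
     * Uppercases text except inside double-quoted substrings and except for the
     * characters listed in variantChars (a hypothetical, caller-supplied set).
     */
    public static String toUpperCase(String text, String variantChars) {
        StringBuilder sb = new StringBuilder(text.length());
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;  // toggle on each double quote
                sb.append(c);
            } else if (inQuotes || variantChars.indexOf(c) >= 0) {
                sb.append(c);          // quoted text and variant characters are kept as-is
            } else {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }
}
```

For example, `toUpperCase("abc \"def\" g", "")` returns `ABC "def" G`.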
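public static int getLamAlefsCountInBufferRange(String buffer, int len)
Count the number of lam-alef characters in the given text buffer, beginning at the first character, for the given length.
Parameters:
buffer - text buffer to count lam-alefs in
len - number of characters from the beginning of the buffer to count lam-alefs in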
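public static int getEBCDICLengthOfLogicalBuffer(String buffer)
Return the length of the Unicode buffer in EBCDIC bytes, taking lam-alef characters and LRM and RLM markers into account.
Parameters:
buffer - the Unicode buffer of interest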
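Two effects make the EBCDIC length of a logical-order buffer differ from its Java length. LRM (U+200E) and RLM (U+200F) marks are dropped on conversion back to the remote, and a lam + alef pair becomes a single visual lam-alef byte. The following Java sketch combines both effects. It assumes a single-byte Arabic or Hebrew CCSID (one byte per remaining character) and treats only plain alef U+0627 and its madda/hamza variants as pairing with lam. It is an approximation for illustration, not NlsUtil's implementation.

```java
/** Sketch: EBCDIC byte length of a logical-order Unicode buffer for an SBCS bidi CCSID. */
public final class LogicalEbcdicLength {
    private LogicalEbcdicLength() {}

    /** LEFT-TO-RIGHT MARK (U+200E) or RIGHT-TO-LEFT MARK (U+200F). */
    static boolean isBidiMark(char c) {
        return c == '\u200E' || c == '\u200F';
    }

    static boolean isAlef(char c) {
        return c == '\u0627' || c == '\u0622' || c == '\u0623' || c == '\u0625';
    }

    /** 1 byte per character, 0 per LRM/RLM, and 1 byte for each lam + alef pair (assumed SBCS CCSID). */
    public static int ebcdicLength(String buffer) {
        int length = 0;
        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            if (isBidiMark(c)) {
                continue;  // removed when converted back to the remote
            }
            length++;
            if (c == '\u0644' && i + 1 < buffer.length() && isAlef(buffer.charAt(i + 1))) {
                i++;       // the alef merges into the single lam-alef byte
            }
        }
        return length;
    }
}
```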
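public static String truncateString(String visualString, int visualLength)
Truncate the given visual string to the visual length given in EBCDIC bytes.
Parameters:
visualString - the visual string to truncate
visualLength - the visual length to truncate to, in EBCDIC bytes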