International Components for Unicode

ICU Unicode String Comparison


This demo application illustrates several of the string comparison functions available in the ICU Unicode support library.

Enter two strings to be compared, then click on the "Submit" button. The results are described in the Key. For more information, see Background.


Strings to be Compared
                          String 1                      String 2
Strings after unescaping  αλφα                          αλφα
Strings in hex format     \u03b1 \u03bb \u03c6 \u03b1   \u03b1 \u03bb \u03c6 \u03b1
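
The hex row above can be reproduced with a short sketch. This is plain Python rather than the demo's own code; the function name to_hex_escapes is hypothetical, and the choice of lowercase digits simply mirrors the output shown above.

```python
def to_hex_escapes(text: str) -> str:
    # Render every code point as an escape, separated by spaces.
    # Code points up to U+FFFF use the 4-digit \u form; larger ones use
    # the 8-digit \U form.
    parts = []
    for ch in text:
        cp = ord(ch)
        parts.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}")
    return " ".join(parts)


print(to_hex_escapes("\u03b1\u03bb\u03c6\u03b1"))  # \u03b1 \u03bb \u03c6 \u03b1
```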


Comparison Result
Binary  Caseless  Equiv  Equiv-Caseless
Y       Y         Y      Y


Key to Results
Result Meaning
Binary Strings have exactly the same code points in Unicode
Caseless Strings are equal under case-insensitive comparison; case differences are ignored. Example: αλφα and Αλφα
Equiv Strings are canonically equivalent, that is, equal after Unicode normalization. Examples: Åland and Åland (which look identical but are encoded with different code point sequences), or \u062f\u0650\u0651 and \u062f\u0651\u0650. Note that Åland and Åland are also equal in a caseless match because they both case-fold to the same string.
Equiv-Caseless Strings are canonically equivalent when case differences are ignored. Example: åland and Åland
Hex Display all input characters as hex values.
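
The four result columns can be approximated with Python's standard library. This is a minimal sketch and not the ICU implementation: it assumes that str.casefold() (full Unicode case folding) and unicodedata.normalize("NFD", ...) stand in for ICU's case-insensitive and normalization options, and the helper names nfd and compare are hypothetical. The Equiv-Caseless test follows the Unicode Standard's definition of a canonical caseless match, NFD(toCasefold(NFD(X))).

```python
import unicodedata


def nfd(text: str) -> str:
    # Canonical decomposition: canonically equivalent strings share one NFD form.
    return unicodedata.normalize("NFD", text)


def compare(a: str, b: str) -> dict:
    return {
        # Identical code point sequences.
        "Binary": a == b,
        # Equal after case folding only (no normalization).
        "Caseless": a.casefold() == b.casefold(),
        # Equal after canonical normalization only (case still matters).
        "Equiv": nfd(a) == nfd(b),
        # Canonical caseless match: normalize, fold, then normalize again.
        "Equiv-Caseless": nfd(nfd(a).casefold()) == nfd(nfd(b).casefold()),
    }


def show(a: str, b: str) -> None:
    flags = compare(a, b)
    print(" ".join(f"{name}={'Y' if ok else 'N'}" for name, ok in flags.items()))


# Escapes make the exact code points explicit.
show("\u03b1\u03bb\u03c6\u03b1", "\u0391\u03bb\u03c6\u03b1")  # lowercase vs capital alpha
show("\u00c5land", "A\u030aland")                 # precomposed vs A + combining ring
show("\u062f\u0650\u0651", "\u062f\u0651\u0650")  # combining marks in swapped order
show("\u00e5land", "A\u030aland")                 # differs in both case and encoding
```

In this sketch the second pair is canonically equivalent but not a caseless match, because case folding alone does not merge a precomposed letter with its decomposed form; only the combined Equiv-Caseless test treats the fourth pair as equal.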


Background
  • The above comparisons use only Unicode properties, and are invariant across locales. To compare two strings according to locale settings, see the ICU Collation Demo.
  • The strings may contain \uhhhh and \Uhhhhhhhh hex escapes for characters that cannot be entered directly from the keyboard. For descriptions of additional escape sequences, see UnicodeString::unescape() in the ICU API reference.
  • This tool is built using ICU's string comparison functions, showing the effects of the options for Unicode Normalization (canonical equivalence) and Case Insensitive comparisons.
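
The \uhhhh and \Uhhhhhhhh escapes mentioned above can be expanded with a short sketch. It is an illustration in plain Python, not ICU's UnicodeString::unescape(): it handles only these two forms and leaves every other escape sequence untouched, and the name unescape_hex is hypothetical.

```python
import re

# Exactly four hex digits after \u, or exactly eight after \U.
_HEX_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_hex(text: str) -> str:
    def replace(match: re.Match) -> str:
        digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(digits, 16))

    return _HEX_ESCAPE.sub(replace, text)


# Raw strings keep the backslashes literal, as if typed into the form.
print(unescape_hex(r"\u03b1\u03bb\u03c6\u03b1"))
print(unescape_hex(r"\U0001F600 and plain text"))
```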