Details


Issue of the Implementation # S0690

Brief

Inconsistent processing of assigned characters (Glib Unicode Manipulation)

Detailed Description

The description of the g_unichar_isdefined function states:

Returns : TRUE if the character has an assigned value

What is called "character" here corresponds to "code point" in the Unicode standard (version 5.0 and later). The standard states the following about the assignment of code points (http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf, Chapter 2 "General Structure", section 2.4 "Code Points and Characters"):

Not all assigned code points represent abstract characters; only Graphic, Format, Control and Private-use do. Surrogates and Noncharacters are assigned code points but are not assigned to abstract characters.

The description of the g_unichar_isdefined function does not make clear which meaning of the term "assigned" is intended.

If "assigned" means code points assigned to abstract characters, g_unichar_isdefined should return FALSE for code points in the "Surrogates" and "Noncharacters" groups.

If "assigned" means simply assigned code points, g_unichar_isdefined should return TRUE for code points in the "Surrogates" and "Noncharacters" groups.
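The two readings can be made concrete for the two groups in question. The following is a minimal sketch, not GLib code: `Reading`, `is_surrogate`, `is_noncharacter` and `expected_for_special` are hypothetical names introduced for illustration. The ranges are those defined by the Unicode standard: surrogates are U+D800..U+DFFF, and noncharacters are U+FDD0..U+FDEF plus the last two code points of every plane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two candidate meanings of "assigned" (hypothetical names). */
typedef enum {
    READING_ABSTRACT_CHARACTER, /* assigned to an abstract character */
    READING_CODE_POINT          /* merely an assigned code point */
} Reading;

/* Surrogate code points: U+D800..U+DFFF. */
bool is_surrogate(uint32_t cp)
{
    assert(cp <= 0x10FFFF); /* must be a valid Unicode code point */
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Noncharacters: U+FDD0..U+FDEF and U+nFFFE, U+nFFFF in every plane n. */
bool is_noncharacter(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Result one would expect for a surrogate or noncharacter under the
 * given reading. Only meaningful for code points in those two groups. */
bool expected_for_special(uint32_t cp, Reading reading)
{
    assert(is_surrogate(cp) || is_noncharacter(cp));
    return reading == READING_CODE_POINT;
}
```

Under either reading, both groups get the same answer; the readings differ only in whether that answer is TRUE or FALSE.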

However, in glib versions up to and including 2.17.3, g_unichar_isdefined returns TRUE for "Surrogates" code points but FALSE for "Noncharacters" code points.

For example, one may check the return value of g_unichar_isdefined when it is called for the following code points:

  • 0xD800 (U+D800) - "surrogate" code point, the function returns TRUE
  • 0xFDD0 (U+FDD0) - "noncharacter" code point, the function returns FALSE

That is, the actual behaviour of the g_unichar_isdefined function matches neither of the meanings of the term "assigned" given in the Unicode standard.
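The reasoning can be summarised as a small decision function. This is an illustrative sketch only; `Match` and `classify_behaviour` are hypothetical names, and the two arguments stand for the values the function under test returns for a surrogate and for a noncharacter code point.

```c
#include <stdbool.h>

/* Which reading of "assigned" an observed pair of results matches. */
typedef enum {
    MATCHES_ABSTRACT_CHARACTER, /* FALSE for both groups */
    MATCHES_CODE_POINT,         /* TRUE for both groups */
    MATCHES_NEITHER             /* the two groups are treated differently */
} Match;

Match classify_behaviour(bool surrogate_result, bool noncharacter_result)
{
    if (!surrogate_result && !noncharacter_result)
        return MATCHES_ABSTRACT_CHARACTER;
    if (surrogate_result && noncharacter_result)
        return MATCHES_CODE_POINT;
    return MATCHES_NEITHER;
}
```

With the values reported above (TRUE for U+D800, FALSE for U+FDD0), `classify_behaviour(true, false)` yields `MATCHES_NEITHER`.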

If this is intentional and "has an assigned value" means something different in the description of g_unichar_isdefined than in the Unicode standard, it should be stated explicitly to avoid confusion.

Problem location(s) in the standard

Linux Standard Base Desktop Specification 3.2, section 15.2.1.1 - "Interfaces for GTK General purpose utility library", which refers to the Glib 2.6.2 Reference Manual (http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html#g-unichar-isdefined)

Component

gtk-glib 2.17.3 and below

Accepted

Gnome Bugzilla 541507

Status

Fixed in glib 2.17.4
