Corrigendum #9: Clarification About Noncharacters
       
              
              
| Corrigendum | Effective Date | Applicable Versions | Fixed Version | Result Documented In |
|---|---|---|---|---|
| Corrigendum #9: Clarification About Noncharacters | 2013-Jan-30 [134-C15] | 3.1.0 to 6.3.0 | 7.0.0 (2014-June) | Chapter 3, Conformance |
              
        Background
                The formal wording of the definition of noncharacter
                in the standard has led some implementers to interpret any presence
                of a noncharacter code point in a Unicode string as causing that
                string to be ill-formed, and thereby has led to inappropriate
                over-rejection of some Unicode strings in APIs, components, or applications
                that should handle (i.e., either process or pass through) all well-formed Unicode
                strings.
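The following is a minimal sketch, in Python, of an input check that follows this intent. It rejects only ill-formed UTF-8 and passes noncharacters through unchanged. The function name is_well_formed_utf8 is hypothetical and not part of any standard API. The sketch assumes CPython's strict UTF-8 decoder, which rejects ill-formed sequences such as encoded surrogates and overlong forms but accepts noncharacters.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper).

    Noncharacters such as U+FFFF, U+FDD0 and U+10FFFF are accepted:
    their presence does not make a string ill-formed.
    """
    try:
        data.decode("utf-8")  # strict mode raises on ill-formed sequences
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "U+FFFF (noncharacter)": "\uffff".encode("utf-8"),
    "U+FDD0 (noncharacter)": "\ufdd0".encode("utf-8"),
    "U+10FFFF (noncharacter)": "\U0010ffff".encode("utf-8"),
    "encoded surrogate U+D800 (ill-formed)": b"\xed\xa0\x80",
    "overlong encoding of '/' (ill-formed)": b"\xc0\xaf",
}

for label, data in samples.items():
    print(f"{label}: {is_well_formed_utf8(data)}")
```

Only the last two samples are rejected. A component that also refused the first three would be over-rejecting well-formed strings in exactly the way described above.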
		Noncharacters in the Unicode Standard are intended for internal use
                and have no standard interpretation when exchanged outside the context
                of internal use. However, they are not illegal in interchange nor do
                they cause ill-formed Unicode text. This has always been the intent
                of the standard, as expressed by the Unicode Technical Committee. This is
                necessary for the effective use of noncharacters, because any time a
                Unicode string crosses an API boundary, it is in effect being
                "interchanged". Furthermore, for distributed software, it is
                often very difficult to determine what constitutes an "internal" versus
                an "external" context for any particular software process.
                The real intent of noncharacters is that they are permanently
                prohibited from being assigned standard, interchangeable meanings,
                rather than that they are prohibited from occurring in Unicode
                strings that happen to be interchanged.
                Corrigendum #9 provides a means for implementations that openly interchange
                noncharacters to claim conformance to versions of the standard in which
                Definition D14 nominally prohibits such interchange. This corrigendum does not
                affect the fact that when so interchanged, the intended semantics of noncharacters
                may not be interpretable.
		
        Changes to the Content of the Core Specification
        Change D14 in Section 3.4, Characters and Encoding, by deleting the phrase "and that should never be interchanged" from the definition below:
        
        Noncharacter: A code point that is permanently reserved for internal use
        and that should never be interchanged. Noncharacters consist of the values 
        U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆) 
        and the values U+FDD0..U+FDEF.
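As an illustration of the definition above, here is a minimal Python sketch of a noncharacter test; the function name is_noncharacter is hypothetical. U+nFFFE and U+nFFFF are the last two code points of each of the 17 planes (n = 0 to 10₁₆), so masking off the lowest bit and comparing with 0xFFFE covers both, and U+FDD0..U+FDEF is a simple range check.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is a noncharacter (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # U+FDD0..U+FDEF: 32 contiguous noncharacters in the BMP
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF: low 16 bits are 0xFFFE or 0xFFFF, in any of the 17 planes
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate all noncharacters: 17 planes x 2, plus the 32 in U+FDD0..U+FDEF
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(NONCHARACTERS))  # 66
```

An explicit test is used here because Python's unicodedata.category reports noncharacters as "Cn", the same category it reports for unassigned code points, so the category alone cannot tell them apart.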
        
        Note that in Unicode 3.1.0 through Unicode 4.1.0, the definition in
        question was labeled D7b, instead of D14.
        There is associated informative text in the Core Specification concerning
        noncharacters. That text will also be clarified when the text of this
        corrigendum is applied in a future revision of the Core Specification.