YC's Knowledge Management: [XML][C#] Handling ASCII Control Characters in XML

February 25, 2014

[XML][C#] Handling ASCII Control Characters in XML

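I recently ran into a rather odd situation. The goal was to convert some data to XML,
but when I checked the output, the data written into one element turned out to contain special characters,
as shown in the figure below: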

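Since the culprit is known to be a special character, replacing or removing it should solve the problem.

But which special characters are not allowed? A bit of searching turned up some likely answers.

It turns out that the ASCII characters from 00H to 1FH are control characters. XML 1.0 allows only three of them (tab 09H, line feed 0AH, and carriage return 0DH), so any of the others causes an error when the XML is read.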

(Source: http://home.educities.edu.tw/wanker742126/asm/ap04.html)

For details, see the W3C definition.
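
To see the read error in isolation, here is a minimal sketch. The names `ControlCharDemo` and `ParsesAsXml` are hypothetical and not part of the original code. The sketch assumes the text contains no `<` or `&`, and relies on `XmlDocument.LoadXml` throwing `XmlException` for characters that XML 1.0 does not allow.

```csharp
using System;
using System.Xml;

public static class ControlCharDemo
{
    // Hypothetical helper: returns true if the text can be parsed
    // as the content of a single XML element.
    // Assumes the text has no '<' or '&', which would need escaping.
    public static bool ParsesAsXml(string text)
    {
        try
        {
            new XmlDocument().LoadXml("<item>" + text + "</item>");
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects characters outside the XML 1.0 Char range.
            return false;
        }
    }
}
```

With this helper, `ParsesAsXml("tab\tand\nnewline")` returns true, while `ParsesAsXml("bad\u0001char")` returns false.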

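Now that the offending characters are identified, we can write a method dedicated to cleaning such strings.
Here is a version based on one found online: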

using System.Text;

public static class YCExtension
{
    /// <summary>
    /// Removes characters that are not allowed in XML 1.0 text
    /// </summary>
    /// <param name="inString">The string to process</param>
    /// <returns>A string with no control characters (other than tab, LF, CR) and nothing above 0xFFFD</returns>
    public static string RemoveTroublesomeCharacters(this string inString)
    {
        if (inString == null) return null;

        StringBuilder newString = new StringBuilder(inString.Length);
        char ch;

        for (int i = 0; i < inString.Length; i++)
        {
            ch = inString[i];
            // Keep tab, line feed, carriage return, and the ranges XML 1.0 allows
            // in the Basic Multilingual Plane (0x0020-0xD7FF and 0xE000-0xFFFD).
            // Surrogate code units (0xD800-0xDFFF) are dropped, so characters
            // outside the BMP are removed as well.
            if ((ch >= 0x0020 && ch <= 0xD7FF) || (ch >= 0xE000 && ch <= 0xFFFD)
                || ch == '\t' || ch == '\n' || ch == '\r')
            {
                newString.Append(ch);
            }
        }
        return newString.ToString();
    }
}

References:
http://stackoverflow.com/questions/20762/how-do-you-remove-invalid-hexadecimal-characters-from-an-xml-based-data-source-p

http://home.educities.edu.tw/wanker742126/asm/ap04.html
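
As an alternative to a hand-written range check, the framework has a built-in test: `XmlConvert.IsXmlChar`, available, as far as I know, since .NET Framework 4. The sketch below is one possible way to use it, not the method from the referenced answer. `XmlSanitizer` and `StripInvalidXmlChars` are hypothetical names. It also keeps valid surrogate pairs, so characters outside the Basic Multilingual Plane (emoji, for example) survive, while lone surrogates are dropped.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper: removes every UTF-16 code unit that cannot
    // appear in XML 1.0 text, keeping well-formed surrogate pairs.
    public static string StripInvalidXmlChars(string input)
    {
        if (input == null) return null;

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char ch = input[i];
            if (i + 1 < input.Length && char.IsSurrogatePair(ch, input[i + 1]))
            {
                // A high + low surrogate pair encodes U+10000..U+10FFFF,
                // which XML 1.0 allows; copy both units and skip the second.
                sb.Append(ch).Append(input[i + 1]);
                i++;
            }
            else if (!char.IsSurrogate(ch) && XmlConvert.IsXmlChar(ch))
            {
                sb.Append(ch);
            }
            // Anything else (control characters, lone surrogates,
            // U+FFFE, U+FFFF) is silently dropped.
        }
        return sb.ToString();
    }
}
```

For instance, `XmlSanitizer.StripInvalidXmlChars("a\u0001b")` returns `"ab"`, and Chinese text passes through unchanged.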

--------------------------------------------------------------------------------------------------------------
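Update, 2014/06/21:
I ran into this again recently, looked it up once more, and found a LINQ version that is short and to the point: one line does the job!
The syntax is: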

string s = "content containing control characters";
// Requires: using System.Linq;
s = new string(s.Where(p => !char.IsControl(p)).ToArray());
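
One caveat with the one-liner: `char.IsControl` also returns true for tab, line feed, and carriage return, so those are removed along with the characters XML rejects. If line breaks and tabs should survive, a variation like the following may help. `LinqSanitizer` and `RemoveControlKeepWhitespace` are hypothetical names, and the sketch only handles control characters; it does not filter lone surrogates or U+FFFE/U+FFFF, which XML 1.0 also disallows.

```csharp
using System;
using System.Linq;

public static class LinqSanitizer
{
    // Hypothetical helper: drops control characters but keeps the three
    // that XML 1.0 permits (tab, line feed, carriage return).
    public static string RemoveControlKeepWhitespace(string s)
    {
        if (s == null) return null;

        return new string(
            s.Where(c => !char.IsControl(c) || c == '\t' || c == '\n' || c == '\r')
             .ToArray());
    }
}
```

For the input `"a\tb\u0001c\r\n"`, this returns `"a\tbc\r\n"`, whereas the plain `char.IsControl` filter would return `"abc"`.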

--------------------------------------------------------------------------------------------------------------

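2 comments:

Anonymous, July 25, 2014, 5:55 PM
Wouldn't it be quicker to wrap the text in CDATA in the XML, to mark it as literal text?...
Just a non-expert passing by...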
Anonymous, September 4, 2017, 1:34 PM
CDATA blows up all the same.