C# - Regex for matching attribute values in an invalid XML file
I have invalid XML files (<, >, & and " characters inside attribute values). I need to parse them into correct XML files in C#.
The only way I can think of is escaping the invalid characters inside the attributes. This works fine for <, > and & (&lt;, &gt;, &amp;). But I have problems detecting and changing the " inside attributes.
Right now I am using this regex for matching attribute values:
/="(.*?)"
My test case is this:
<add sqlquery="select blaat test count == "1"" test="dfsdf"/>
<add sqlquery="select blaat test count == "1"" test="dfsdf" />
<add sqlquery="select blaat test count == "1" , blaat > 3" test="dfsdf"/>
<add xmldiff_action="movenodefrom('1')" alias="jkhkjh" />
<add xmldiff_action="movenodefrom('1')" />
As you can see, in my test the matching stops at the first quote of "1".
If I change the regex to the greedy /="(.*)" it will match the whole line (so including the other attributes on the same line).
It is hard to define the "end quote" of an XML attribute. In my test cases it can end in:
- " (space)
- "/>
- "
- " otherattribute="value"
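The endings listed above can be written as a lookahead, so that a quote only counts as the closing one when another attribute or the end of the tag follows it. The following is only a minimal sketch under assumptions: the names AttributeRepair, EscapeValue and Repair are made up for illustration, the input is assumed to be completely unescaped (an existing &amp; would be escaped a second time), and a value that itself contains text like " name=" remains ambiguous and will be cut too early.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AttributeRepair
{
    // name="value": the value is matched lazily, and the closing quote is only
    // accepted when it is followed by another attribute (whitespace, name, =")
    // or by the end of the tag (optional whitespace, optional /, >) which in
    // turn is followed by the next tag or the end of the input.
    private static readonly Regex Attribute = new Regex(
        @"(?<name>[\w:.-]+)=""(?<value>.*?)""(?=\s+[\w:.-]+=""|\s*/?>\s*(?:<|$))");

    // Escapes the four characters from the question; & has to go first so the
    // entities produced by the later replacements are not escaped again.
    public static string EscapeValue(string raw) =>
        raw.Replace("&", "&amp;")
           .Replace("<", "&lt;")
           .Replace(">", "&gt;")
           .Replace("\"", "&quot;");

    // Rewrites every attribute found in the input with its value escaped.
    public static string Repair(string input) =>
        Attribute.Replace(input, m =>
            m.Groups["name"].Value + "=\"" + EscapeValue(m.Groups["value"].Value) + "\"");
}
```

Calling AttributeRepair.Repair on each test line should leave the names untouched and only rewrite the values, after which the result can be handed to a normal XML parser.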
I know it looks unnecessary to want to parse invalid XML (it is even an invalid SQL query, because it uses double spaces and quotes in == "1"). That is because it comes from an application that saves its data in a CDATA section. For what I am doing, I need to parse that CDATA section into correct XML (with the invalid characters escaped).
Huge thanks in advance if you can solve this in regex or a combination of regex and C#!
Considering that an SQL statement is expected inside the params, I came up with the following regexp using captured groups:
(?<match>"((\g<match>|[^"]*))*?")(?=\s|\/|>)/gm
Proof that it somehow works, though it's insane to try this with regexps.
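One caveat for C#: the \g<match> subroutine call in that pattern is PCRE/Ruby-style syntax which, as far as I know, the .NET Regex class does not support. Below is a rough .NET approximation of the same idea, as a sketch only. It allows just one level of nested "..." pairs instead of real recursion, so it assumes the inner quotes are balanced, and the names QuotedValueRepair and Repair are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class QuotedValueRepair
{
    // A quoted value whose content is made of non-quote characters or complete
    // inner "..." pairs (one nesting level only), closed by a quote that is
    // followed by whitespace, / or >, like the lookahead in the pattern above.
    private static readonly Regex QuotedValue = new Regex(
        @"""(?<inner>(?:[^""]|""[^""]*"")*?)""(?=\s|/|>)");

    // Escapes the content of every quoted value; & goes first so the entities
    // added by the later replacements are not escaped again.
    public static string Repair(string line) =>
        QuotedValue.Replace(line, m =>
            "\"" + m.Groups["inner"].Value
                     .Replace("&", "&amp;")
                     .Replace("<", "&lt;")
                     .Replace(">", "&gt;")
                     .Replace("\"", "&quot;") + "\"");
}
```

A value with an odd number of inner quotes will not be matched correctly by this sketch, so it is worth validating the output with a real XML parser afterwards.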