Recursive group capturing regex with backreference in Java


I am trying to capture multiple groups recursively in a string, using a backreference to a group within the regex. Even though I am using a Pattern, a Matcher, and a "while(matcher.find())" loop, it is still capturing only the last instance instead of all instances. In my case the only possible tags are <sm>, <po>, <pof>, <pos>, <poi>, <pol>, <poif>, <poil>. Since these are formatting tags, I need to capture:

  1. any text outside of a tag (so that I can format it as "normal" text; I am going about this by capturing any text before a tag in one group while capturing the tag in another group, and as I iterate through the occurrences I remove what has been captured from the original string; if any text is left over at the end, I format that as "normal" text too)
  2. the "name" of the tag, so that I know how to format the text inside the tag
  3. the text contents of the tag, so that they can be formatted according to the tag name and its associated rules

Here is my sample code:

        String currenttext = "the man said:<pof>“this one, at last, bone of bones</pof><poi>and flesh of flesh;</poi><po>this one shall called ‘woman,’</po><poil>for out of man one has been taken.”</poil>";
        String remainingtext = currenttext;

        // First check whether the string has any kind of XML tag; if not, the whole string is formatted as "normal" text
        if (currenttext.matches("(?su).*<[/]{0,1}(?:sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1}>.*"))
        {
            // An opening or closing tag has been found, so start the pattern captures.
            // The backreference \\2 makes sure the closing tag is the same as the opening tag.
            Pattern pattern1 = Pattern.compile("(.*)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>", Pattern.UNICODE_CHARACTER_CLASS);
            Matcher matcher1 = pattern1.matcher(currenttext);
            int iteration = 0;
            while (matcher1.find()) {
                System.out.print("iteration ");
                System.out.println(++iteration);
                System.out.println("group1:" + matcher1.group(1));
                System.out.println("group2:" + matcher1.group(2));
                System.out.println("group3:" + matcher1.group(3));
                System.out.println("group4:" + matcher1.group(4));

                if (matcher1.group(1) != null && matcher1.group(1).isEmpty() == false)
                {
                    m_xtext.insertString(xtextrange, matcher1.group(1), false);
                    remainingtext = remainingtext.replaceFirst(matcher1.group(1), "");
                }
                if (matcher1.group(4) != null && matcher1.group(4).isEmpty() == false)
                {
                    switch (matcher1.group(2)) {
                        case "pof": [...]
                        case "pos": [...]
                        case "poif": [...]
                        case "po": [...]
                        case "poi": [...]
                        case "pol": [...]
                        case "poil": [...]
                        case "sm": [...]
                    }
                    remainingtext = remainingtext.replaceFirst("<" + matcher1.group(2) + ">" + matcher1.group(4) + "</" + matcher1.group(2) + ">", "");
                }
            }
        }

The System.out.println calls only run once, printing these results to the console:

    Iteration 1:
    group1:the man said:<pof>“this one, at last, bone of bones</pof><poi>and flesh of flesh;</poi><po>this one shall called ‘woman,’</po>
    group2:poil
    group3:po
    group4:for out of man one has been taken.”

Group 3 can be ignored; the only useful groups are 1, 2, and 4 (group 3 is part of group 2). Why is it capturing only the last tag instance, "poil", while not capturing the preceding "pof", "poi", and "po" tags?
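To see the behavior in isolation, here is a minimal sketch on a shortened, made-up input. The class name GreedyDemo, its helper methods, and the input string are illustrative assumptions, and the character classes are written without the "|" (inside square brackets "|" is a literal character, not alternation). The leading greedy (.*) first consumes the whole string and then backtracks only as far as it must, so it stops at the last opening tag that lets the rest of the pattern match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Same shape as the pattern in the question, with a greedy leading group.
    public static final String GREEDY = "(.*)<((sm|po)[flsi3]?[fl]?)>(.*?)</\\2>";

    // Counts how many times find() succeeds on the input.
    public static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns group 1 of the first match, or null when nothing matches.
    public static String firstLeadingText(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with two tagged spans.
        String input = "intro<po>first</po><poi>second</poi>";
        // Group 1 swallows the first tagged span as plain text.
        System.out.println(firstLeadingText(GREEDY, input)); // prints intro<po>first</po>
        System.out.println(countMatches(GREEDY, input));     // prints 1
    }
}
```

Because that single match ends at the end of the input, the next call to find() has nothing left to search, which is why the loop body runs only once.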

The output I would like to see is this:

    Iteration 1:
    group1:the man said:
    group2:pof
    group3:po
    group4:“this one, at last, bone of bones

    Iteration 2:
    group1:
    group2:poi
    group3:po
    group4:and flesh of flesh;

    Iteration 3:
    group1:
    group2:po
    group3:po
    group4:this one shall called ‘woman,’

    Iteration 4:
    group1:
    group2:poil
    group3:po
    group4:for out of man one has been taken.”

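I found the answer to this problem: I needed a non-greedy quantifier in the first capture group, just as I had in the fourth capture group. The greedy (.*) was consuming everything up to the last opening tag, which is why only one match was found. This is working as needed: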

    Pattern pattern1 = Pattern.compile("(.*?)<((sm|po)[f|l|s|i|3]{0,1}[f|l]{0,1})>(.*?)</\\2>", Pattern.UNICODE_CHARACTER_CLASS);
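As a rough, self-contained sketch of how the non-greedy pattern can drive the whole loop, the example below collects every segment as a (tag name, text) pair. Several things here are assumptions for illustration and not part of the original code: the class name TagSplitter, the split method, the "normal" label for untagged text, the Pattern.DOTALL flag, the simplified character classes, and the non-capturing (?:sm|po), which makes the tag contents group 3 instead of group 4. Leftover text is found with matcher.end() instead of repeatedly calling replaceFirst on a copy of the string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSplitter {
    // Group 1: text before the tag (non-greedy), group 2: tag name, group 3: tag contents.
    // The backreference \\2 requires the closing tag to repeat the opening tag's name.
    private static final Pattern TAGGED = Pattern.compile(
            "(.*?)<((?:sm|po)[flsi3]?[fl]?)>(.*?)</\\2>",
            Pattern.DOTALL | Pattern.UNICODE_CHARACTER_CLASS);

    // Returns segments as {tagName, text}; tagName is "normal" for untagged text.
    public static List<String[]> split(String input) {
        List<String[]> segments = new ArrayList<>();
        Matcher m = TAGGED.matcher(input);
        int consumed = 0; // offset just past the last closing tag matched so far
        while (m.find()) {
            if (!m.group(1).isEmpty()) {
                segments.add(new String[] {"normal", m.group(1)});
            }
            segments.add(new String[] {m.group(2), m.group(3)});
            consumed = m.end();
        }
        // Anything after the last closing tag is untagged text.
        if (consumed < input.length()) {
            segments.add(new String[] {"normal", input.substring(consumed)});
        }
        return segments;
    }

    public static void main(String[] args) {
        // Hypothetical input: leading text, two tags, trailing text.
        String text = "intro<pof>one</pof><poi>two</poi> tail";
        for (String[] segment : split(text)) {
            System.out.println(segment[0] + ": " + segment[1]);
        }
    }
}
```

Tracking the end offset also sidesteps a pitfall in the replaceFirst approach: String.replaceFirst treats its first argument as a regular expression, so captured text containing characters such as "(" or "?" would not be matched literally.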
