python - Between double curly braces: replace particular text -


I've got a string (Python 2.7.3) that is rendered as a template in Django, although I don't think this is specific to Django. The string comes from the document.xml file inside a docx file. I'm extracting the document XML, rendering it, and putting it back inside the docx for simple mail-merge type stuff.

One of the issues, other than the obvious limitations on which template tags I can use, is that Word likes to drop in a whole bunch of XML if you edit the text in Word.

For my needs, I'd be successful if I could:

  1. Find occurrences of &quot; between double curly braces and replace them with a literal quote ".

I'd like to replace &quot; with " in the following:

word_docxml = 'some text here {{form.letterdate|date:&quot;y-m-d&quot;}} , more text' 

I have been reading up on this, but I'm having trouble putting it together.

  2. How do I remove/strip everything inside, and including, the < > that falls between the {{ }}'s in a mess like the following:

    <w:rpr>   <w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/>   <w:color w:val="00000a"/>   <w:sz w:val="22"/>   <w:szcs w:val="22"/>   <w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/> </w:rpr> <w:t>{{form.</w:t>
    </w:r>
    <w:r> <w:rpr>   <w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/>   <w:b w:val="false"/>   <w:bcs w:val="false"/>   <w:color w:val="00000a"/>   <w:sz w:val="22"/>   <w:szcs w:val="22"/>   <w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/> </w:rpr> <w:t>l</w:t>
    </w:r>
    <w:r> <w:rpr>   <w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/>   <w:color w:val="00000a"/>   <w:sz w:val="22"/>   <w:szcs w:val="22"/>   <w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/> </w:rpr> <w:t>etterdate.value|date:"y-m-d"}}</w:t>
    </w:r>

which would result in the following (apologies, I can't seem to highlight the area of interest):

    <w:rpr>   <w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/>   <w:color w:val="00000a"/>   <w:sz w:val="22"/>   <w:szcs w:val="22"/>   <w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/> </w:rpr> <w:t>{{form.letterdate.value|date:"y-m-d"}}</w:t>
    </w:r>

How would one handle this? Is regex the way to go; if so, how do I put the command together?

This is not a duplicate of the question it was marked against, because that question makes no mention of using the double curly braces as the start and end of the search range. That is my real problem: I've read through many examples and am unable to get the pattern substitution formatted correctly. The other post is about parsing a subset of HTML entities in XHTML; there is no XHTML parsing required, mentioned, or questioned in this post. This post asks how to remove and/or replace a repeating pattern between two other known start/end patterns. I provided a brief background and two concrete examples, from simple to complex, hoping to learn how to accomplish my current task. At best I hoped to have part A explained and then apply the method myself to part B. I got an intelligent discussion and super replies from helpful members of the community. This post doesn't involve HTML at all; it is a template I'm rendering in Django, which is then added to a docx archive and saved to a filestore. It is not a duplicate (of the question it was marked a duplicate of, anyhow).

Yes, regex is great for this!

a) Use this:

 re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("&quot;", '"', m.group(1)), word_docxml) 

results:

    >>> import re
    >>> word_docxml = 'some text here {{form.letterdate|date:&quot;y-m-d&quot;}} , &quot; more text'
    >>> re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("&quot;", '"', m.group(1)), word_docxml)
    'some text here {{form.letterdate|date:"y-m-d"}} , &quot; more text'

b) More of the same, but matching different content inside the braces:

re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("<[^>]+>", "", m.group(1)), s) 

results:

    >>> s = """<w:rpr><w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/><w:color w:val="00000a"/><w:sz w:val="22"/><w:szcs w:val="22"/><w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/></w:rpr><w:t>{{form.</w:t></w:r><w:r><w:rpr><w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/><w:b w:val="false"/><w:bcs w:val="false"/><w:color w:val="00000a"/><w:sz w:val="22"/><w:szcs w:val="22"/><w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/></w:rpr><w:t>l</w:t></w:r><w:r><w:rpr><w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/><w:color w:val="00000a"/><w:sz w:val="22"/><w:szcs w:val="22"/><w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/></w:rpr><w:t>etterdate.value|date:"y-m-d"}}</w:t></w:r>"""
    >>> re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("<[^>]+>", "", m.group(1)), s)
    '<w:rpr><w:rfonts w:eastasia="times new roman" w:cs="arial" w:ascii="arial" w:hansi="arial"/><w:color w:val="00000a"/><w:sz w:val="22"/><w:szcs w:val="22"/><w:lang w:val="en-us" w:eastasia="en-us" w:bidi="ar-sa"/></w:rpr><w:t>{{form.letterdate.value|date:"y-m-d"}}</w:t></w:r>'

Explanation, since you asked for guidance and not just an answer:

re.sub(r"(\{\{[^}]+}\})", lambda m: re.sub("&quot;", '"', m.group(1)), word_docxml) 

The way this works is to first match the double-brace interval. The lambda expression then takes the group found in that match and replaces all of the relevant content inside it, so text outside the braces is never touched.
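The same two-stage idea can be written with a named function instead of a lambda, which may be easier to read and extend. This is only a sketch: the names `BRACE_SPAN`, `unescape_quotes_in_braces`, and `fix_span` are made up for illustration, and it uses `str.replace` for the inner step because the inner pattern is a fixed string.

```python
import re

# Outer pattern: "{{", then one or more characters that are not "}", then "}}".
BRACE_SPAN = re.compile(r"\{\{[^}]+\}\}")


def unescape_quotes_in_braces(text):
    """Replace &quot; with a literal double quote, but only inside {{ ... }}."""

    def fix_span(match):
        # match.group(0) is the whole "{{ ... }}" span that was matched.
        return match.group(0).replace("&quot;", '"')

    # re.sub calls fix_span once per span and splices in its return value.
    return BRACE_SPAN.sub(fix_span, text)


sample = "a {{x|date:&quot;y-m-d&quot;}} b &quot; c"
print(unescape_quotes_in_braces(sample))
```

Note that the `&quot;` outside the braces is left alone, and that a span containing a single `}` would not be matched by this outer pattern.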

The smaller regexes explained:

&quot;     # matches exactly that, nothing fancy 

A pattern to match tags:

<     # opening of a tag
[^>]+ # followed by one or more characters that are not a closing bracket
>     # followed by the closing bracket of the tag
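If it helps, both steps can be combined into a single pass, with the patterns written in `re.VERBOSE` form so the comments live next to the regex. This is a sketch under assumptions: `clean_template_tags` and the `broken` sample are hypothetical, and the tag pattern assumes no `>` appears inside an attribute value.

```python
import re

# Matches any single XML tag, opening or closing.
TAG = re.compile(r"""
    <       # opening of a tag
    [^>]+   # one or more characters that are not '>'
    >       # closing of the tag
""", re.VERBOSE)

# Matches one {{ ... }} interval that contains no '}' inside it.
BRACE_SPAN = re.compile(r"""
    \{\{    # literal opening braces
    [^}]+   # one or more characters that are not '}'
    \}\}    # literal closing braces
""", re.VERBOSE)


def clean_template_tags(xml):
    """Strip XML tags and unescape &quot; inside every {{ ... }} span."""

    def clean(match):
        span = TAG.sub("", match.group(0))   # drop the tags Word injected
        return span.replace("&quot;", '"')   # restore literal quotes

    return BRACE_SPAN.sub(clean, xml)


broken = '<w:t>{{form.</w:t></w:r><w:r><w:t>letterdate|date:&quot;y-m-d&quot;}}</w:t>'
print(clean_template_tags(broken))
```

Everything outside the braces, including the surrounding `<w:t>` and `</w:t>`, passes through unchanged.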
