On 4/6/2008 1:49 PM, tom wrote:
> On 6 Apr, 20:46, pk <p...@[EMAIL PROTECTED]> wrote:
>
>>pk wrote:
>>
>>>With the same input format you show, and a bit of extra work:
>>
>>>$ echo '<html>foo<li>A<br></li><li><b>B</b></li><li>C</li>bar</html>' | \
>>>awk -F '</li>' '{for(i=1;i<NF;i++){gsub(/^.*<li>/,"",$i);print $i}}'
>>>A<br>
>>><b>B</b>
>>>C
>>
>>Just to be clear, this of course produces a fourth field (not shown above,
>>since it's not relevant), which you can see by including the NF'th field in
>>the loop:
>>
>>$ echo '<html>foo<li>A<br></li><li><b>B</b></li><li>C</li>bar</html>' | \
>>awk -F '</li>' '{for(i=1;i<=NF;i++){gsub(/^.*<li>/,"",$i);print $i}}'
>>A<br>
>><b>B</b>
>>C
>>bar</html>
>>
>>--
>>All the commands are tested with bash and GNU tools, so they may use
>>nonstandard features. I try to mention when something is nonstandard (if
>>I'm aware of that), but I may miss something. Corrections are welcome.
>
>
> Great! This is what I was looking for, thank you very much !!!

I thought you only wanted the text between <li> and </li>. If so, here is an
alternative with GNU awk:

$ echo '<html>foo<li>A<br></li><li><b>B</b></li><li>C</li>bar</html>' |
gawk -v RS="</li>" -F'<li>' 'RT{print $NF}'
A<br>
<b>B</b>
C
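
RT and a multi-character RS are gawk features, so the one-liner above will not behave the same in other awks. If gawk is not available, something along these lines might do as a rough portable sketch. It assumes each <li>...</li> pair sits on a single input line, as in the sample, and extract_li is just a name made up for illustration:

```shell
# Portable sketch (assumption: plain POSIX awk, no gawk-only RT or regex RS).
# extract_li is a hypothetical helper name, not something from this thread.
extract_li() {
  awk '{
    s = $0
    # Repeatedly find the next <li> ... </li> pair on the current line.
    while ((start = index(s, "<li>")) > 0) {
      s = substr(s, start + 4)       # drop everything up to and including <li>
      end = index(s, "</li>")
      if (end == 0) break            # unterminated item: stop scanning this line
      print substr(s, 1, end - 1)    # the text between the two tags
      s = substr(s, end + 5)         # carry on after the closing </li>
    }
  }'
}

echo '<html>foo<li>A<br></li><li><b>B</b></li><li>C</li>bar</html>' | extract_li
```

Unlike the gawk version, this works line by line, so an item split across lines would be missed.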
Regards,
Ed.

