awk still rocks!

Last week I faced an unusual and challenging problem: I needed to extract multiline SOAP requests from a huge log file. At first I thought about using Java, then realised that loading a huge file in Java could be problematic. Next I considered Groovy or another interpreted language. Finally I found a very simple solution based on awk. I had written some awk scripts in the past for generating huge test files based on random criteria (such as CDR files for load testing), and I was simply amazed at how easily an awk script solved this problem.

Let's assume you have a huge file containing the following SOAP request.

   Payload: <?xml version="1.0" encoding="UTF-8"?><vivek xmlns="com.test.vivek">
	<Template>abc service template</Template>
		<Subject>Technical Service &amp; Repair</Subject>

Here is the awk script to extract all such XML:

awk '/vivek/,/\/vivek>/' test.txt | awk '/Payload:/ { print "*****************"; print } !/Payload/ { print }'

The first awk command prints every line from a line matching vivek through the next line matching /vivek>. Its output is piped to a second awk command, which checks each line for the word Payload. If the word is found, it prints "*****************" and then the line containing the Payload text; if the line does not contain Payload, it simply prints the line. The "*****************" line marks the boundary between one matched request and the next.
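The two steps can also be combined into a single awk process. The sketch below is one possible way to do that, not the command used above. The file name sample.log, its two INFO log lines, the closing </vivek> tag and the function name extract_requests are all made up for illustration.

```shell
# Build a small sample log. The INFO lines and the closing tag are
# hypothetical; they only stand in for a real log file.
cat > sample.log <<'EOF'
2020-01-01 10:00:00 INFO request received
Payload: <?xml version="1.0" encoding="UTF-8"?><vivek xmlns="com.test.vivek">
    <Template>abc service template</Template>
    <Subject>Technical Service &amp; Repair</Subject>
</vivek>
2020-01-01 10:00:01 INFO request handled
EOF

# One awk process: the range pattern selects each request, and the
# separator is printed before any selected line that contains "Payload:".
extract_requests() {
  awk '/vivek/,/\/vivek>/ {
         if (/Payload:/) print "*****************"
         print
       }' "$1"
}

extract_requests sample.log
```

Because awk reads its input one line at a time, this approach does not need to hold the whole file in memory, which is what makes it a good fit for a huge log.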

Here is the output of the command:


Awk Reference



