Regular Expression Lookahead and Lookback

I was confused about the lookahead and lookback (more commonly called lookbehind) regular expression operators. Things became clear once I read the book and tried some examples in Scala.

Lookahead and lookback operators identify a position in the target string: they check the text around that position without consuming any characters. In this they are similar to ^ and $, which match the start and end of a line. All of these are zero-width assertions, meaning they match a position rather than characters.
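To see that these operators match a position rather than characters, here is a minimal sketch using Scala's standard scala.util.matching.Regex. The object name ZeroWidthDemo and the helper firstMatch are made up for illustration. It reports where the first match starts and ends and what text it covers.

```scala
import scala.util.matching.Regex

object ZeroWidthDemo {
  // Both patterns are pure assertions: they match a position, not characters.
  val lineStart: Regex = "^".r
  val lookahead: Regex = "(?=Kumar)".r

  // Hypothetical helper: (start, end, matched text) of the first match, if any.
  def firstMatch(r: Regex, s: String): Option[(Int, Int, String)] =
    r.findFirstMatchIn(s).map(m => (m.start, m.end, m.matched))

  def main(args: Array[String]): Unit = {
    println(firstMatch(lineStart, "VivekKumar")) // Some((0,0,))
    println(firstMatch(lookahead, "VivekKumar")) // Some((5,5,))
  }
}
```

In both cases start equals end and the matched text is empty, so nothing was consumed. The lookahead simply reports the position just before Kumar.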

Let’s see these by example:

Create a pattern with a regular expression. Here ?= is the lookahead operator. It checks that the string “Vivek” starts at the current position without moving the pointer (think of the pointer as the current location in the string), so the pointer stays at the beginning of “Vivek”.

Then, from that same position, the rest of the pattern tries to match the string VivekKumar.

scala> import scala.util.matching.Regex
import scala.util.matching.Regex
scala> val r = new Regex("(?=Vivek)VivekKumar")
r: scala.util.matching.Regex = (?=Vivek)VivekKumar
scala> r.findAllIn("VivekKumar").mkString
res31: String = VivekKumar
scala> r.findAllIn("VivekAbhishek").mkString
res32: String = ""
In the last example the pointer is at the beginning (V). The lookahead succeeds because “Vivek” is there, but the engine then tries to match VivekKumar from that position and fails, so nothing is returned.
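In the pattern above the lookahead is redundant, because VivekKumar already begins with Vivek. A more typical use, sketched here as an illustration, puts the lookahead after the text to be matched: Vivek(?=Kumar) matches Vivek only when Kumar follows, and Kumar is not part of the result. The names LookaheadDemo and matches are made up for this example.

```scala
import scala.util.matching.Regex

object LookaheadDemo {
  // Matches "Vivek" only where "Kumar" follows; "Kumar" is checked but not consumed.
  val vivekBeforeKumar: Regex = "Vivek(?=Kumar)".r

  // Hypothetical helper: all matches of the pattern in s.
  def matches(s: String): List[String] = vivekBeforeKumar.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    println(matches("VivekKumar"))    // List(Vivek)
    println(matches("VivekAbhishek")) // List()
  }
}
```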

Lookback works like lookahead, but it checks the text immediately before the pointer instead of the text after it. In effect, the pointer sits just after the end of the word matched by the lookback.

Here ?<= is the lookback operator.

scala> val r = new Regex("(?<=Vivek)AbhishekKumar")
r: scala.util.matching.Regex = (?<=Vivek)AbhishekKumar
scala> r.findAllIn("VivekAbhishekKumar").mkString
res35: String = AbhishekKumar

After the lookback succeeds, the pointer sits just after the k (the 5th character) of VivekAbhishekKumar. From there the engine tries to match the rest of the expression (AbhishekKumar), hence it prints AbhishekKumar. Note that Vivek itself is not part of the result.
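The pointer position can be checked directly. This is a small sketch using the same pattern as above. The names LookbehindDemo and firstMatch are made up for illustration. It returns the start index of the match along with the matched text.

```scala
import scala.util.matching.Regex

object LookbehindDemo {
  // Matches "AbhishekKumar" only where "Vivek" comes immediately before it.
  val afterVivek: Regex = "(?<=Vivek)AbhishekKumar".r

  // Hypothetical helper: (start index, matched text) of the first match, if any.
  def firstMatch(s: String): Option[(Int, String)] =
    afterVivek.findFirstMatchIn(s).map(m => (m.start, m.matched))

  def main(args: Array[String]): Unit = {
    println(firstMatch("VivekAbhishekKumar")) // Some((5,AbhishekKumar))
    println(firstMatch("AbhishekKumar"))      // None
  }
}
```

The match starts at index 5, right after Vivek, and the second call finds nothing because no Vivek precedes AbhishekKumar.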

2 responses to “Regular Expression Lookahead and Lookback”

  1. Nice one. Looking for more Java articles like this.

  2. I came across another interesting example of lookback. The problem was to split a string into equal parts.
    “foobar”.split(“(? Array(foob, a, r)
    This puts the split anchor after the 4th character and then splits the string. But it does not repeat the match every 4 characters. To repeat the match each time we can use \\G.
    “foobar”.split(“(? Array(foob, ar)
    This repeatedly tries to match 4 characters at a time.