Gammon Forum

lpeg code translate to lpeg re

It is now over 60 days since the last post. This thread is closed.

Posted by Albert Chan   (55 posts)
Date Sat 20 Jan 2018 08:12 PM (UTC)
Message
I enjoyed the LPeg tutorial: http://www.gammon.com.au/lpeg
However, I am puzzled by the LPeg re examples.

How would I translate the LPeg upto function example into LPeg re?
What is the LPeg re equivalent of the LPeg expression p1 - p2?

Posted by Albert Chan   (55 posts)
Date Reply #1 on Sat 20 Jan 2018 10:18 PM (UTC)
Message
To make my LPeg re questions more concrete: I was unable to
translate the Lua pattern "(.*)and(.*)" using LPeg re:

local lpeg = require "lpeg"
local re = require "re"
local C, P = lpeg.C, lpeg.P

-- my attempt for the Lua pattern "(.+)and(.*)"
local lpeg_pat = C((P(1) - 'and')^1) * 'and' * C(P(1)^0)
local re_pat = re.compile "{ (. ! 'and')* . } 'and' {.*}"

-- Lua pattern "(.*)and(.*)"
lpeg_pat = C((P(1) - 'and')^0) * 'and' * C(P(1)^0)

-- what is the LPeg re equivalent code?

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #2 on Sat 20 Jan 2018 10:32 PM (UTC)
Message
It looks like you have to reverse the order. This works:


require "re"
target = "foo and bar"
local re_pat = re.compile "{ (!'and' .)*} 'and' {.*}"
print (lpeg.match (re_pat, target))


Output:


foo   bar


The pattern is basically saying:


{       <-- start of capture
(       <-- start of group
!'and'  <-- assert not matching 'and' without consuming input
.       <-- consume one character
)       <-- end of group
*       <-- match zero or more of the preceding
}       <-- end of capture
'and'   <-- consume the 'and'
{.*}    <-- capture rest of input
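For comparison, the same pattern can be written directly with LPeg operators by translating each re element one for one: `!p` becomes `-p`, `.` becomes `P(1)`, a trailing `*` becomes `^0`, and `{ }` becomes `C()`. This is a sketch that assumes LPeg and re can be loaded with `require` in a standalone Lua.

```lua
local lpeg = require "lpeg"
local re   = require "re"
local C, P = lpeg.C, lpeg.P

local target = "foo and bar"

-- the re version from above
local re_pat = re.compile "{ (!'and' .)*} 'and' {.*}"

-- hand translation: (!'and' .)* becomes (-P'and' * P(1))^0 and {...} becomes C(...)
-- (unary minus binds tighter than *, so -P'and' * P(1) means (-P'and') * P(1))
local lpeg_pat = C((-P'and' * P(1))^0) * 'and' * C(P(1)^0)

print (re_pat:match (target))    -- two captures: "foo " and " bar"
print (lpeg_pat:match (target))  -- the same two captures
```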

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #3 on Sat 20 Jan 2018 10:39 PM (UTC)
Message
I don't see how to implement "upto" in re, since I can't see how to pass arguments to functions.
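One possible workaround, offered as a sketch rather than as the tutorial's method: re has no syntax for calling a Lua function with arguments, but re.compile accepts an optional second argument, a table of extra definitions that the pattern text can refer to as %name. Wrapping the compile in an ordinary Lua function gives an upto-like helper. The function name upto and the definition key what are illustrative choices only.

```lua
local lpeg = require "lpeg"
local re   = require "re"

-- Illustrative re version of the tutorial's upto(): the re text stays fixed,
-- and the delimiter is passed in through the defs table, where the pattern sees it as %what.
-- Like C((P(1) - P(what))^1) * P(what), it needs at least one character before the delimiter.
local function upto (what)
  return re.compile ("{ (!%what .)+ } %what", { what = lpeg.P (what) })
end -- upto

local p = upto ("and")
print (p:match ("foo and bar"))  -- captures "foo " (everything before the first "and")
print (p:match ("and bar"))      -- nil: (...)+ needs at least one character before "and"
```

Because each call compiles a new pattern, it is probably best to call upto once per delimiter and keep the result, as with any compiled pattern.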

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Albert Chan   (55 posts)
Date Reply #4 on Sat 20 Jan 2018 11:24 PM (UTC)
Message
Your solution of reversing the order is very nice!

You should consider putting this trick in the tutorial,
to complement the LPeg upto function example.

With this insight, can I assert the following?

P(1) - 'and' == -P('and') * 1 == re.compile "! 'and' ."

Or, to generalize:

P(1) - (P'and' + 'or' + 'not')
== -(P'and' + 'or' + 'not') * 1
== re.compile "!('and' / 'or' / 'not') ."
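A quick way to spot-check the generalized claim (not a proof) is to run the three forms side by side on a few subjects. Each should accept one character exactly when no keyword starts at that position; lpeg.match returns the position after the match, here 2, or nil on failure. The subject list is an arbitrary choice.

```lua
local lpeg = require "lpeg"
local re   = require "re"
local P    = lpeg.P

local words = P"and" + "or" + "not"

-- three spellings of "one character, provided no keyword starts here"
local diff  = P(1) - words                            -- set difference
local guard = -words * 1                              -- negative lookahead, then any character
local rpat  = re.compile "!('and' / 'or' / 'not') ."  -- re notation

local subjects = { "and", "or", "not", "nothing", "ant", "o", "x", "" }
for _, s in ipairs (subjects) do
  print (string.format ("%q", s), diff:match (s), guard:match (s), rpat:match (s))
end -- for
```

Note that the keywords are matched as prefixes, so "nothing" is rejected because it starts with "not"; all three forms agree on that.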

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #5 on Sat 20 Jan 2018 11:48 PM (UTC)
Message
Yes, I think you are right, although you still need to repeat that pattern. So 'upto' could be written:


function upto (what)
  return C((-P(what) * P(1))^1) * P(what)
end -- upto


Instead of:


function upto (what)
  return C((P(1) - P(what))^1) * P(what)
end -- upto


Those two versions also do capturing, which you can remove by deleting the 'C' character.

I'll try to add this to the LPEG web page explanation.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Albert Chan   (55 posts)
Date Reply #6 on Sun 21 Jan 2018 12:31 AM (UTC)
Message
FYI, both versions of upto generate exactly the same parse tree.

Many thanks to Sean Conner, who actually recompiled a debug version
of LPeg to go through the parse tree code. He also arrived at your
reversed-order answer (by trial and error with the debug parse tree).

His response is on the Lua mailing list, Jan 20, 2018, 5:54 pm.

Posted by Albert Chan   (55 posts)
Date Reply #7 on Sun 21 Jan 2018 01:34 AM (UTC)

Amended on Sun 21 Jan 2018 01:35 AM (UTC) by Nick Gammon

Message
I also noticed that the LPeg upto trick cannot translate the Lua pattern "(.*)and(.*)".

re.compile "{(! 'and' .)*} 'and' {.*}" corresponds to the Lua pattern "(.-)and(.*)",
the non-greedy pattern.

The two examples in your LPeg tutorial return the same match only because
the words are separated by spaces: %a+ is greedy, function upto is NOT.
:-(

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #8 on Sun 21 Jan 2018 01:39 AM (UTC)
Message
What are you expecting?


require "re"
target = "foo and bar"

print "====="

local re_pat = re.compile "{ (!'and' .)*} 'and' {.*}"
print (lpeg.match (re_pat, target))

print "---"

print (string.match (target, "(.*)and(.*)"))
print (string.match (target, "(.-)and(.*)"))


Output is:


=====
foo   bar
---
foo   bar
foo   bar


That looks the same to me. For what input do you expect a difference, and what do you expect that difference to be?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #9 on Sun 21 Jan 2018 01:43 AM (UTC)
Message
Maybe here:


target = "foo and bar and whatever"


Output:


=====
foo   bar and whatever
---
foo and bar   whatever
foo   bar and whatever


So yes, it looks non-greedy.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #10 on Sun 21 Jan 2018 03:10 AM (UTC)
Message
I worked it out with this grammar:


target = "foo and bar and whatever"

c = re.compile [[
   parse     <- {| {noDelim} lastDelim |}  -- look for all up to the last delimiter followed by the last part
   delim     <- 'and'                      -- our delimiter
   noDelim   <- (!lastDelim .)*            -- zero or more characters without the last delimiter
   lastDelim <- delim {(!delim .)*} !.     -- the delimiter without any more delimiters and then end of subject
]]

result = lpeg.match (c, target)

for k, v in ipairs (result) do
  print (k, v)
end -- for



Output:


1 foo and bar
2  whatever

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Albert Chan   (55 posts)
Date Reply #11 on Sun 21 Jan 2018 04:50 AM (UTC)

Amended on Sun 21 Jan 2018 04:52 AM (UTC) by Albert Chan

Message
I have a simpler and faster LPeg re pattern (about 2x the speed), but the
last 'and' is appended to the first string:

pat = re.compile "{g <- . g / 'and'} {.*}"

= pat:match("this and that and this and more")
this and that and this and
more

Is there any way to remove the last "and"?
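One way to keep the last "and" out of the first capture, offered as a sketch: let the recursive rule stop in front of the last "and" by using the and-predicate &'and', which tests without consuming input, and then consume the "and" outside the capture. The rule name g is arbitrary.

```lua
local re = require "re"

-- g either consumes a character and recurses, or (only when that fails)
-- succeeds without consuming if "and" starts here; because '.' is tried first,
-- g ends up stopping just in front of the last "and" in the subject
local pat = re.compile "{g <- . g / &'and'} 'and' {.*}"

print (pat:match ("this and that and this and more"))
-- first capture: "this and that and this ", second capture: " more"
```

Each character costs one level of recursion, so on long subjects a pattern like this may run into LPeg's backtrack-stack limit (adjustable with lpeg.setmaxstack).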

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #12 on Sun 21 Jan 2018 05:48 AM (UTC)

Amended on Sun 21 Jan 2018 06:02 AM (UTC) by Nick Gammon

Message
By my measurements, yours is not 2x faster. In some cases it is slightly faster. Arguably, if it isn't providing the results you want, then the speed doesn't matter. You could always remove the trailing "and" with a string.sub, but that would take time. I made up a test bed:


require "re"
require "tprint"

c = re.compile [[
   parse     <- {| {noDelim} lastDelim |}  -- look for all up to the last delimiter followed by the last part
   delim     <- 'and'                      -- our delimiter
   noDelim   <- (!lastDelim .)*            -- zero or more characters without the last delimiter
   lastDelim <- delim {(!delim .)*} !.     -- the delimiter without any more delimiters and then end of subject
]]

pat = re.compile "{| {g <- . g / 'and'} {.*} |}"  -- Albert Chan pattern

function showResults (result, start, finish)
  if not result then
    print ("no match")
  else
    tprint (result)
  end -- if
  print (string.format ("Time taken = %0.3f us", (finish - start) * 1e6))
end -- showResults 

function test (which)

  print (string.rep ("=", 20))
  print ("Testing:", which)

  print (string.rep ("-", 10))
  print "Nick"

  start = utils.timer ()
  result = lpeg.match (c, which)
  finish = utils.timer ()

  showResults (result, start, finish)

  print (string.rep ("-", 10))
  print "Albert"

  start = utils.timer ()
  result = lpeg.match (pat, which)
  finish = utils.timer ()

  showResults (result, start, finish)

end -- test

tests = {
 "foo and bar and whatever",
 "foo and bar",
 "XandY",
 "foo",
 "Xand",
 "andY",
 "and",
 "",
}

for _, v in ipairs (tests) do
  test (v)
end -- for



You will notice that in the very case you were interested in (multiple instances of the word "and"), your expression is almost 4 times as slow.



====================
Testing: foo and bar and whatever
----------
Nick
1="foo and bar "
2=" whatever"
Time taken = 11.733 us
----------
Albert
1="foo and bar and"
2=" whatever"
Time taken = 43.302 us
====================
Testing: foo and bar
----------
Nick
1="foo "
2=" bar"
Time taken = 5.029 us
----------
Albert
1="foo and"
2=" bar"
Time taken = 4.749 us
====================
Testing: XandY
----------
Nick
1="X"
2="Y"
Time taken = 4.749 us
----------
Albert
1="Xand"
2="Y"
Time taken = 4.749 us
====================
Testing: foo
----------
Nick
no match
Time taken = 3.073 us
----------
Albert
no match
Time taken = 2.794 us
====================
Testing: Xand
----------
Nick
1="X"
2=""
Time taken = 4.470 us
----------
Albert
1="Xand"
2=""
Time taken = 5.867 us
====================
Testing: andY
----------
Nick
1=""
2="Y"
Time taken = 4.749 us
----------
Albert
1="and"
2="Y"
Time taken = 4.470 us
====================
Testing: and
----------
Nick
1=""
2=""
Time taken = 5.029 us
----------
Albert
1="and"
2=""
Time taken = 4.470 us
====================
Testing: 
----------
Nick
no match
Time taken = 3.073 us
----------
Albert
no match
Time taken = 3.073 us



I took the compile part out of the timing, because you should really only do that once, and the speed you are really interested in is execution speed (that is, match speed).
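Single runs of a few microseconds are close to timer resolution, so one variation worth trying is to repeat each match many times and report the average. This is a sketch for plain Lua: it swaps the utils.timer used above (which appears to be MUSHclient's high-resolution timer) for the standard os.clock, which measures CPU time. The helper name timeit and the repeat count are arbitrary choices, and no timing figures are claimed here.

```lua
local re = require "re"

-- compile once, outside the timed loop (same patterns as the test bed above)
local nick = re.compile [[
   parse     <- {| {noDelim} lastDelim |}
   delim     <- 'and'
   noDelim   <- (!lastDelim .)*
   lastDelim <- delim {(!delim .)*} !.
]]
local albert = re.compile "{| {g <- . g / 'and'} {.*} |}"

-- hypothetical helper: average CPU time per match, in microseconds
local function timeit (pat, subject, reps)
  reps = reps or 20000
  local start = os.clock ()
  for _ = 1, reps do
    pat:match (subject)
  end -- for
  return (os.clock () - start) / reps * 1e6
end -- timeit

local subject = "foo and bar and whatever"
print (string.format ("Nick:   %0.3f us", timeit (nick, subject)))
print (string.format ("Albert: %0.3f us", timeit (albert, subject)))
```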




Having said all that, your pattern looks nice and elegant. :)

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Albert Chan   (55 posts)
Date Reply #13 on Sun 21 Jan 2018 06:11 AM (UTC)
Message
My mistake.
I was comparing my pattern against yours,
but mine returns multiple values, while yours saves its captures in a table.

I am new to LPeg...
How do I convert your re pattern to return multiple values?

Posted by Nick Gammon   Australia  (22,973 posts)   Forum Administrator
Date Reply #14 on Sun 21 Jan 2018 06:29 AM (UTC)

Amended on Sun 21 Jan 2018 06:38 AM (UTC) by Nick Gammon

Message
See my "parse" line above. It puts the pattern inside these symbols, which collect the captures into a table:


{| pattern |}


To get multiple return values instead, remove those symbols.
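As a sketch of that change, here is the grammar from Reply #10 with the {| |} table capture removed from the parse rule, so the two string captures come back as separate return values. It assumes re can be loaded with require.

```lua
local re = require "re"

-- same grammar as before, but "parse" no longer wraps the captures in {| |},
-- so match returns the two string captures directly
local c = re.compile [[
   parse     <- {noDelim} lastDelim
   delim     <- 'and'
   noDelim   <- (!lastDelim .)*
   lastDelim <- delim {(!delim .)*} !.
]]

local first, last = c:match ("foo and bar and whatever")
print (first)  -- first capture: "foo and bar "
print (last)   -- second capture: " whatever"
```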

- Nick Gammon

www.gammon.com.au, www.mushclient.com

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).



118,713 views.

This is page 1; the subject is 5 pages long.

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.

Written by Nick Gammon

Comments to: Gammon Software support