Here's my regular expression...
(?<=")[^"]+(?=")|[-+@]?([\w]+(-*\w*)*)
and here's my test input...
"@one one" @two three four "fi-ve five" six se-ven "e-ight" "nine n-ine nine"
I don't want the double-quotes returned in the results, but that seems to make it return the parts between quoted phrases as quoted phrases in themselves. Here are the current results (excluding the single quotes)...
'@one one' ' @two three four ' 'fi-ve five' ' six se-ven ' 'e-ight' ' ' 'nine n-ine nine'
whereas I want it to return these individual results (excluding the single quotes)...
'@one one' '@two' 'three' 'four' 'fi-ve five' 'six' 'se-ven' 'e-ight' 'nine n-ine nine'
Any ideas on what to change to make the double-quotes apply to the phrase itself, and not to the inter-quote words? Thanks.
The problem you've come across is that regexes don't have "memory"; that is, they can't remember whether the last quote was an opening or a closing one (the same reason why regex is a poor fit for parsing HTML/XML). However, if you can assume that quoting follows the standard rules, where there is no space between the quotation marks and the text being quoted (whereas, if there is a space between a quotation mark and the adjacent word, that word is not part of the quote), then you can use the negative look-arounds (?!\s) and (?<!\s) to make sure there's no space in those places:
(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?([\w]+(-*\w*)*)
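To see what the two look-arounds change, here is a minimal Ruby sketch that runs the pattern from the question and the pattern above over the same input. The input uses number words (three, four, six), which is an assumption based on the desired results in the question. The helper name `whole_matches` is made up for illustration; it is only needed because Ruby's `String#scan` returns capture groups instead of whole matches when the pattern contains groups.

```ruby
# Test input from the question. Number words are assumed here because the
# desired results list 'three', 'four' and 'six'.
INPUT = '"@one one" @two three four "fi-ve five" six se-ven "e-ight" "nine n-ine nine"'

# Pattern from the question: nothing stops a match from starting right after
# a closing quote.
ORIGINAL = /(?<=")[^"]+(?=")|[-+@]?([\w]+(-*\w*)*)/

# Same pattern with the two negative look-arounds added.
GUARDED = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?([\w]+(-*\w*)*)/

# Hypothetical helper (not from the question): collects whole matches.
# String#scan returns the capture groups rather than the whole match when the
# pattern contains groups, so the whole match is read from Regexp.last_match.
def whole_matches(str, rex)
  found = []
  str.scan(rex) { found << Regexp.last_match[0] }
  found
end

p whole_matches(INPUT, ORIGINAL)
p whole_matches(INPUT, GUARDED)

# A phrase padded with spaces inside its quotes is not treated as a quote;
# its words are picked up one by one by the second alternative.
p whole_matches('"tight phrase" loose " padded phrase " end', GUARDED)
```

The first call should reproduce the unwanted inter-quote chunks, the second should give the individual results, and the third shows the spacing assumption at work: the padded phrase comes back as separate words.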
To clarify my assumptions (using underscores to mark the spaces in question):
"this is a quote"_this text is not a quote_"another quote"
 ^             ^ ^                        ^ ^           ^
 no space here   |                        |   none here
 between word    ⌞ there is a space here  ⌟
 and mark
Edit: Also, you can simplify the regex a bit by removing the groups and using character classes:
(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?[\w]+[-\w]*
This makes it easier to read (for me anyway). Results:
>> str = "\"@one one\" @two three four \"fi-ve five\" six se-ven \"e-ight\" \"nine n-ine nine\""
=> "\"@one one\" @two three four \"fi-ve five\" six se-ven \"e-ight\" \"nine n-ine nine\""
>> rex = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?[\w]+[-\w]*/
=> /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?[\w]+[-\w]*/
>> str.scan rex
=> ["@one one", "@two", "three", "four", "fi-ve five", "six", "se-ven", "e-ight", "nine n-ine nine"]
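If the spacing assumption turns out to be too fragile for your input, another option is to let the match consume the quotes and capture the text between them. This is a different technique from the look-arounds, offered only as a sketch. Because the closing quote is consumed by the match, the next match can't start right after it, so the inter-quote text is never mistaken for a phrase. The names `TOKEN` and `tokens` are made up for illustration.

```ruby
# Alternative sketch: consume the quotes and capture what is between them.
# Group 1 holds a quoted phrase, group 2 holds a bare word.
TOKEN = /"([^"]+)"|([-+@]?\w+[-\w]*)/

# Hypothetical helper: scan returns [phrase, word] pairs, one of which is nil,
# so keep whichever side matched.
def tokens(str)
  str.scan(TOKEN).map { |phrase, word| phrase || word }
end

input = '"@one one" @two three four "fi-ve five" six se-ven "e-ight" "nine n-ine nine"'
p tokens(input)

# Unlike the look-around version, padding inside the quotes is kept as part
# of the phrase here.
p tokens('"tight phrase" loose " padded phrase " end')
```

The trade-off is that you have to read the results out of the capture groups instead of taking the whole match, which may or may not be convenient in your regex flavor.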