Regex returns inter-phrase words as another quoted phrase


Here's the regex...

(?<=")[^"]+(?=")|[-+@]?([\w]+(-*\w*)*) 

And here's the test input...

"@one one" @two 3 4 "fi-ve five" 6 se-ven "e-ight" "nine n-ine nine" 

I don't want the double quotes returned in the results, but leaving them out seems to make the regex return the parts between quoted phrases as a quoted phrase in their own right. Here are the current results (excluding the single quotes)...

'@one one' ' @two 3 4 ' 'fi-ve five' ' 6 se-ven ' 'e-ight' ' ' 'nine n-ine nine' 

Whereas I want it to return these individual results (excluding the single quotes)...

'@one one' '@two' '3' '4' 'fi-ve five' '6' 'se-ven' 'e-ight' 'nine n-ine nine' 

Any ideas on what to change so that the double quotes apply only to the phrase itself, and not to the inter-quote words? Thanks.

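The problem you've come across is that regexes don't have "memory"; that is, they can't remember whether the last quotation mark was an opening or a closing one (the same reason why regex is not suited to parsing HTML/XML). Because the look-arounds in your pattern don't consume the quotation marks, the closing mark of one phrase can also serve as the opening mark of the next match. However, if you can assume that the quoting follows the standard rules, where there is no space between the quotation marks and the text being quoted (whereas, if there is a space between a quotation mark and the adjacent word, that word is not part of a quote), then you can use the negative look-arounds (?!\s) and (?<!\s) to make sure there's no space in those places: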

(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?([\w]+(-*\w*)*) 
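To see the difference on the test input, here is a small sketch in Ruby. The question doesn't name a language, so Ruby is an assumption, as are the constant names; the groups are also written as non-capturing, which is my adaptation so that scan returns whole matches.

```ruby
# The test input from the question.
TEXT = '"@one one" @two 3 4 "fi-ve five" 6 se-ven "e-ight" "nine n-ine nine"'

# Original pattern, with the groups made non-capturing (an adaptation) so that
# String#scan returns whole matches instead of arrays of captures.
ORIGINAL = /(?<=")[^"]+(?=")|[-+@]?(?:[\w]+(?:-*\w*)*)/

# Same pattern plus the two negative look-arounds around the quoted text.
GUARDED = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?(?:[\w]+(?:-*\w*)*)/

p TEXT.scan(ORIGINAL)
# ["@one one", " @two 3 4 ", "fi-ve five", " 6 se-ven ", "e-ight", " ", "nine n-ine nine"]

p TEXT.scan(GUARDED)
# ["@one one", "@two", "3", "4", "fi-ve five", "6", "se-ven", "e-ight", "nine n-ine nine"]
```

With the guards in place, the text after a closing quotation mark starts with a space, so it can no longer match as a phrase and falls through to the single-word alternative.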

To clarify the assumptions (using underscores to mark the spaces in question):

    "this is a quote"_this text is not a quote_"another quote"
    ^               ^ ^                      ^ ^             ^
      no space here   |                      |    none here
      between word    ⌞  there is one here   ⌟
      and mark

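A short Ruby sketch of what that assumption means in practice. The helper name tokens and the second input are made up for illustration, and the groups are non-capturing so that scan returns whole matches.

```ruby
# Pattern with the two negative look-arounds; groups made non-capturing
# (an adaptation) so that String#scan returns whole matches.
PHRASE_OR_WORD = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?(?:[\w]+(?:-*\w*)*)/

# Hypothetical helper, not from the original post.
def tokens(text)
  text.scan(PHRASE_OR_WORD)
end

# Quotation marks hug the quoted text: the phrases stay whole.
p tokens('"this is a quote" this text is not a quote "another quote"')
# ["this is a quote", "this", "text", "is", "not", "a", "quote", "another quote"]

# Spaces just inside the marks break the assumption: the phrase is split into words.
p tokens('" padded quote " word')
# ["padded", "quote", "word"]
```

So the look-arounds only help when the input follows the spacing rule; quotes padded with spaces are treated as ordinary words.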
Edit: Also, you can simplify the regex a bit by removing the groups and using character classes:

(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?[\w]+[-\w]* 

This makes it easier (for me anyway) to get at the results, since without capture groups Ruby's scan returns the whole matches:

    >> str = "\"@one one\" @two 3 4 \"fi-ve five\" 6 se-ven \"e-ight\" \"nine n-ine nine\""
    => "\"@one one\" @two 3 4 \"fi-ve five\" 6 se-ven \"e-ight\" \"nine n-ine nine\""
    >> rex = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?[\w]+[-\w]*/
    => /(?<=")(?!\s)[^"]+(?<!\s)(?=")|[-+@]?[\w]+[-\w]*/
    >> str.scan rex
    => ["@one one", "@two", "3", "4", "fi-ve five",
        "6", "se-ven", "e-ight", "nine n-ine nine"]
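Why dropping the groups matters in Ruby: String#scan returns an array of capture arrays when the pattern contains groups, and an array of plain strings when it doesn't. A minimal sketch, using only the single-word half of the pattern (the constant names and the sample input are mine):

```ruby
# Single-word alternative as written in the question, with capture groups.
WITH_GROUPS = /[-+@]?([\w]+(-*\w*)*)/

# The simplified form using character classes and no groups.
WITHOUT_GROUPS = /[-+@]?[\w]+[-\w]*/

sample = "@two se-ven"

# With groups, every element of the result is an Array of captures.
p sample.scan(WITH_GROUPS).all? { |match| match.is_a?(Array) }
# true

# Without groups, every element is the matched text itself.
p sample.scan(WITHOUT_GROUPS)
# ["@two", "se-ven"]
```

If the groups have to stay for some reason, writing them as (?:...) keeps them non-capturing and has the same effect on scan.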
