command line – How to use sed to remove newlines above and below a string?


In EREs, a backslash escapes characters (strips their special meaning) so that they match literally. In BREs, however, a backslash also gives special meaning to some characters, for example \( \) for grouping and \{ \} for intervals.

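So the backslash is special in both flavors, and it must itself be escaped (\\) to match a literal \. That matters here: in a sed regex, \n matches a newline character, while each \n in your file is two characters, a backslash followed by n. To match that text, the pattern has to be written as \\n.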
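A minimal sketch illustrating both points with GNU sed. The \+ quantifier in BREs is a GNU extension, and the sample strings are made up for illustration:

```shell
# BRE (GNU sed default): "+" is literal, "\+" means "one or more".
printf '%s\n' 'aa+' | sed 's/a+/X/'       # prints aX  (matched literal "a+")
printf '%s\n' 'aa+' | sed 's/a\+/X/'      # prints X+  (matched "aa")
# ERE (-E): "+" means "one or more", "\+" is a literal plus sign.
printf '%s\n' 'aa+' | sed -E 's/a\+/X/'   # prints aX  (matched literal "a+")

# Single quotes keep the four characters x, backslash, n, y as-is,
# and printf's %s conversion prints them verbatim.
line='x\ny'
# \n in the regex means a newline, so nothing matches here.
printf '%s\n' "$line" | sed 's/\n/<NL>/'        # prints x\ny
# \\n matches a literal backslash followed by n, in BRE and ERE alike.
printf '%s\n' "$line" | sed 's/\\n/<BS-n>/'     # prints x<BS-n>y
printf '%s\n' "$line" | sed -E 's/\\n/<BS-n>/'  # prints x<BS-n>y
```

The last three commands mirror the situation in the question: an unescaped \n never matches text that contains a literal backslash.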

GNU sed has a --debug option that shows what each command matches in your file:

$ cat file
\ntoken1\n
\ntoken2\n

Run it with your regex patterns, \ntoken1\n and \ntoken2\n:

$ sed --debug -e 's/\ntoken1\n/token1/' -e 's/\ntoken2\n/token2/' file
SED PROGRAM:
  s/\ntoken1\n/token1/
  s/\ntoken2\n/token2/
INPUT:   'file' line 1
PATTERN: \\ntoken1\\n
COMMAND: s/\ntoken1\n/token1/
PATTERN: \\ntoken1\\n
COMMAND: s/\ntoken2\n/token2/
PATTERN: \\ntoken1\\n
END-OF-CYCLE:
\ntoken1\n
INPUT:   'file' line 2
PATTERN: \\ntoken2\\n
COMMAND: s/\ntoken1\n/token1/
PATTERN: \\ntoken2\\n
COMMAND: s/\ntoken2\n/token2/
PATTERN: \\ntoken2\\n
END-OF-CYCLE:
\ntoken2\n

Note that no MATCHED REGEX REGISTERS entry appears, so neither pattern matched.

If you escape the backslashes instead:

$ sed --debug -e 's/\\ntoken1\\n/token1/' -e 's/\\ntoken2\\n/token2/' file
SED PROGRAM:
  s/\\\\ntoken1\\\\n/token1/
  s/\\\\ntoken2\\\\n/token2/
INPUT:   'file' line 1
PATTERN: \\ntoken1\\n
COMMAND: s/\\\\ntoken1\\\\n/token1/
MATCHED REGEX REGISTERS
  regex[0] = 0-10 '\ntoken1\n'
PATTERN: token1
COMMAND: s/\\\\ntoken2\\\\n/token2/
PATTERN: token1
END-OF-CYCLE:
token1
INPUT:   'file' line 2
PATTERN: \\ntoken2\\n
COMMAND: s/\\\\ntoken1\\\\n/token1/
PATTERN: \\ntoken2\\n
COMMAND: s/\\\\ntoken2\\\\n/token2/
MATCHED REGEX REGISTERS
  regex[0] = 0-10 '\ntoken2\n'
PATTERN: token2
END-OF-CYCLE:
token2

This time each line reports MATCHED REGEX REGISTERS, and the replacements work.
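If your sed lacks --debug (it is a GNU extension), a plain end-to-end run also confirms the fix. This is a minimal sketch that assumes you are in a scratch directory where overwriting a file named file is harmless. The printf line is just one way to recreate the sample input:

```shell
# Recreate the sample file. printf's %s prints its arguments verbatim,
# so each line contains literal backslash-n sequences, not newlines.
printf '%s\n' '\ntoken1\n' '\ntoken2\n' > file
# The escaped patterns match that literal text and print token1 and token2.
sed -e 's/\\ntoken1\\n/token1/' -e 's/\\ntoken2\\n/token2/' file
```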

By default, s replaces only the first match on each line. To replace every occurrence on a line, add the g (global) flag:

$ sed -e 's/\\ntoken1\\n/token1/g' -e 's/\\ntoken2\\n/token2/g' file
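To see what the g flag changes, here is a minimal sketch using a hypothetical input line that contains \ntoken1\n twice (GNU sed assumed):

```shell
# One line holding two literal "\ntoken1\n" sequences.
line='\ntoken1\n \ntoken1\n'
# Without g, only the first match on the line is replaced.
printf '%s\n' "$line" | sed 's/\\ntoken1\\n/token1/'    # prints token1 \ntoken1\n
# With g, every match on the line is replaced.
printf '%s\n' "$line" | sed 's/\\ntoken1\\n/token1/g'   # prints token1 token1
```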


