[Smaug] CR/LF, wordwrap, newline & ASCII

Rick Moen rick at linuxmafia.com
Fri, 14 Jun 2002 10:37:16 -0700

Quoting Paul Thomas (paul@cuenet.com):

> Right on. I guess what I'm trying to figure out is, if I open a 'plain
> text' file from someone else, does my OS insert the line breaks or do
> they come enclosed with the 'plain text' file. 

When the author composed the file, he or his software inserted what is
locally considered to be an endline sequence of normally-invisible
control characters.  Which control characters are considered the normal
endline sequence is an OS-dependent convention, as described yesterday.

Pico and less (the programs you mentioned) expect to see the
normally-invisible character ASCII #10 decimal (the linefeed = LF
character).  But the convention on MacOS is to use ASCII #13 decimal
(the carriage return = CR character) instead of LF.  So, pico or less 
does _not_ see what it considers a valid endline sequence, but instead a 
peculiar non-displayable character that it reflects by showing the 
Control-key combination that can produce it, Control-M, signified as ^M.
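
If you want to check which convention a given file uses, something like
the following Python sketch would do it.  It's only an illustration:
the function names and the sample text are made up for this example,
and the ^M display is a rough imitation of what pico or less shows, not
their actual code.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


def show_cr_as_caret(data: bytes) -> str:
    """Mimic an LF-expecting viewer: display each CR byte as ^M."""
    return data.decode("ascii").replace("\r", "^M")


# Hypothetical sample: two lines saved with the MacOS (CR) convention.
mac_text = b"first line\rsecond line\r"
print(count_line_endings(mac_text))
print(show_cr_as_caret(mac_text))
```

A file that reports only CRs and no LFs is one that an LF-expecting
program will show as a single long line sprinkled with ^M.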

Other editors might be a little more sophisticated and figure out what's 
going on.  Or you can use that simple "tr" transform I posted yesterday,
to fix the file.  That fix works because it strips out the CR characters
and inserts LF in the place of each.  Next time pico or less encounters
the (changed) file, it sees what it considers valid endline signals
instead of funky control characters it sees no purpose for.  And thus
it acts on the signal at each such point in the text, going back to 
column 1, dropping down a line, and continuing. 
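
For the curious, the same character-for-character swap can be sketched
in Python.  This is not the command I posted, just an illustration of
the idea, and the function name cr_to_lf is invented for the example.

```python
def cr_to_lf(data: bytes) -> bytes:
    """Replace every CR (ASCII 13) byte with LF (ASCII 10)."""
    table = bytes.maketrans(b"\r", b"\n")
    return data.translate(table)


# Hypothetical sample text using the MacOS (CR) convention.
mac_text = b"first line\rsecond line\r"
unix_text = cr_to_lf(mac_text)
print(unix_text.decode("ascii"), end="")
```

Note that a blind one-for-one swap like this suits CR-only files.  Run
on a file that uses the CR+LF pair, it would turn each pair into two
LFs, giving you a blank line after every line.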

Have a look at this page, and especially at the section labelled "ASCII
Table":  http://www.mindspring.com/~jc1/serial/Resources/ASCII.html
That's the 128-character, 7-bit ASCII character set.  Notice that the
first 32 entries are non-printable characters that by convention have 
various programmatic purposes -- generally to control hardware, which is 
why they're called control characters.  One way to generate them (which
may not work in some software) is to hold down the Control shiftkey and 
hit the regular typewriter key associated with each.  E.g., the Backspace
= BS control character, ASCII #8, can be generated by holding down
Control and typing "H".  That's indicated in the table as ^H.
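
The relationship between a Control-key combination and its ASCII code
is simple arithmetic: the control character's code is the letter's code
with only its low five bits kept.  Here is a small Python sketch of
that; the function names are invented for the example.

```python
def control_code(key: str) -> int:
    """ASCII code produced by holding Control and typing `key`."""
    return ord(key.upper()) & 0x1F


def caret_name(code: int) -> str:
    """Caret notation (e.g. ^H) for a control character code 0-31."""
    if not 0 <= code < 32:
        raise ValueError("not one of the first 32 ASCII characters")
    return "^" + chr(code + 64)


for key in "HJM":
    code = control_code(key)
    print(f"Control-{key} -> ASCII #{code} = {caret_name(code)}")
```

So Control-H gives #8 (BS), Control-J gives #10 (LF), and Control-M
gives #13 (CR), matching the table.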