[Smaug] CR/LF, wordwrap, newline & ASCII

Rick Moen rick at linuxmafia.com
Thu, 13 Jun 2002 18:59:29 -0700


Quoting Paul Thomas (paul@cuenet.com):

> I got into this cause some Mac-based person zipped up some 'plain
> text' files with Stuffit and sent them to me as an attachment.
> Apparently the origin of the 'plain text' files were emails the guy
> received with his Mac version of Eudora, and he saved them as 'plain
> text' files.

Specifically, whatever text editor Eudora uses.  Can't remember if
it's internal or what.  But he pushed Enter, the terminal handler
recorded CR, and the other software didn't interfere (for lack of 
fiendish aliens from Betelgeuse bearing key-macro programs or clever
devices for scrambling keyboard scancodes).

> So I unzipped this mess easily enough, but when trying to read the
> text files with 'less' or 'pico', the text all ran together, and where
> the ends of lines should have been there was a ^M character followed by a
> space or perhaps even a tab, sometimes two ^M characters like: ^M^M.

Pico didn't see anything it considered a valid end-of-line sequence,
only CR characters scattered through one incredibly long line.  It
displayed each CR as the Control-key combination that generates it,
Control-M, written as ^M.  (For similar reasons, the LF character is
often displayed as ^J.)
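You can see the same effect without Pico.  The following is a small sketch that assumes a POSIX shell and a `cat` with the `-v` option, as found on GNU and BSD systems.  The sample text is made up.  It feeds CR-terminated "lines" to `cat -v`, which, like Pico, shows each CR as ^M and leaves everything on a single line:

```shell
# Two "lines" ended Mac-style with a bare CR (octal 015) and no LF at all
shown=$(printf 'Dear Paul,\015See attached.\015' | cat -v)
# cat -v renders the non-printing CR in caret notation, i.e. as ^M
printf '%s\n' "$shown"
```

This should print `Dear Paul,^MSee attached.^M` on one line.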

> I was able to:
> 
> strings some.txt > some.fixed.txt
> 
> and then read the file successfully formatted as it should have been
> without the ^M characters visible.
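That works because `strings` treats CR as a non-printing character and ends its current output line there.  The catch is that, by default, `strings` only prints runs of at least four printable characters.  Short lines such as "Hi" therefore vanish, and so does the blank line produced by ^M^M.  Here is a sketch with a made-up sample file:

```shell
# Mac-style sample: CR after each line, plus one blank line (two CRs in a row)
printf 'Hi\015This line survives\015\015Ok\015' > some.txt
# strings splits at each CR but silently drops runs shorter than 4 characters
strings some.txt > some.fixed.txt
cat some.fixed.txt
```

Only "This line survives" comes out.  Passing a smaller minimum, as in `strings -n 1`, keeps the short lines, but blank lines still disappear.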

I mentioned that you can use the "tr" filter.  It reads only standard
input, so redirect the file into it, like this:
tr '\015' '\012'  < macfile  > unixfile
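Here is a quick way to try the conversion and check the result.  It is a sketch that assumes a POSIX shell, and `macfile` and its contents are made up for the test:

```shell
# Build a small Mac-style file: each line ends in a bare CR (octal 015)
printf 'first line\015second line\015' > macfile
# tr reads only standard input, so the file is redirected into it
tr '\015' '\012' < macfile > unixfile
# od -c shows every byte; the former \r characters now appear as \n
od -c unixfile
```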

Going the reverse direction:
tr '\012' '\015'  < unixfile  > macfile
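The mapping is one byte for one byte, so converting a file there and back should give an identical file, provided the Unix file had no stray CRs to begin with.  Here is a sketch that checks this with `cmp`.  The file names are made up:

```shell
# Unix-style sample: each line ends in LF (octal 012)
printf 'first line\nsecond line\n' > unixfile
# LF -> CR gives a Mac-style file; CR -> LF brings it back
tr '\012' '\015' < unixfile > macfile
tr '\015' '\012' < macfile > roundtrip
# cmp prints nothing and exits 0 when the files are byte-for-byte identical
cmp unixfile roundtrip && echo "round trip OK"
```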

> When I reported this to the sender, he replied that I must be using
> some old text reader from the 70's.

Typical Mac-head solipsist.  Any problems cannot possibly have anything
to do with the Mac, because it's perfect.

(Of course, to repeat, this is nobody's fault.  It's just a difference.)