Published on Jul 31, 2011 by Pim Elshoff
News #Tool #nl2lf #PHP #Encoding #GitHub
When I switched to a new IDE to edit an existing project, new files would get the wrong newline characters. This is not such a big deal, but to prevent cross-platform problems I wanted to keep all the files Unix-style. I wrote a conversion script at first, but when it happened again I decided to make a small tool out of it: nl2lf. I also learned to configure my IDE before using it…
I’ve been trying out a lot of things recently: the whole QA stuff, DVCSs with remote repositories, and other IDEs. I previously spent most of my time in PhpDesigner, which is a step up from PSPad but not ideal. Debugging in it is easy, but since I started unit testing I have switched to NetBeans. NetBeans has integrated testing and code coverage functionality, and I loved it from the first moment.
While working on my test project ObjectStore to try out new things, I decided to give PhpStorm a go, as someone suggested in the comments on a recent article. It turns out that, by default, PhpStorm uses the operating system’s default newline character (CR+LF on Windows).
For those of you who don’t know, ‘the’ newline character is the line feed (LF) character. Well, at least, that’s the newline character on *NIX systems. And MacOS X. But not MacOS before X, because earlier versions of MacOS use the carriage return (CR) character. I guess the people at Microsoft wanted to appeal to everyone, because they decided to just use both: CR+LF.
The following characters should be seen as a single new line (courtesy of Wikipedia):
| Character | Description | Unicode | ASCII (hex) | ASCII (dec) | Notable platform(s) |
|---|---|---|---|---|---|
| LF | Line Feed | U+000A | 0x0A | 10 | *NIX, MacOS X |
| VT | Vertical Tab | U+000B | 0x0B | 11 | |
| FF | Form Feed | U+000C | 0x0C | 12 | |
| CR | Carriage Return | U+000D | 0x0D | 13 | MacOS < X |
| CR+LF | CR followed by LF | U+000D U+000A | 0x0D 0x0A | 13 10 | Windows |
| NEL | Next Line | U+0085 | - | - | IBM EBCDIC |
| LS | Line Separator | U+2028 | - | - | |
| PS | Paragraph Separator | U+2029 | - | - | |
Code points above 0x7F (DEL, the last ASCII character) can’t be stored in a single byte in UTF-8, which is part of why NEL, LS and PS are not often used. For more information on these and other control characters, see the Wikipedia page on ASCII.
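To make the byte-count point concrete, here is a small Python sketch that encodes each newline character from the table as UTF-8 and counts the bytes. The `NEWLINES` dictionary and the `utf8_lengths` helper are hypothetical names used only for this illustration.

```python
# Each newline character (or sequence) from the table, keyed by its short name.
NEWLINES = {
    "LF": "\n",
    "VT": "\x0b",
    "FF": "\x0c",
    "CR": "\r",
    "CR+LF": "\r\n",
    "NEL": "\u0085",
    "LS": "\u2028",
    "PS": "\u2029",
}


def utf8_lengths(chars: dict[str, str]) -> dict[str, int]:
    """Map each name to the number of bytes its character(s) occupy in UTF-8."""
    return {name: len(s.encode("utf-8")) for name, s in chars.items()}


# LF, VT, FF and CR take 1 byte; CR+LF and NEL take 2; LS and PS take 3.
for name, size in utf8_lengths(NEWLINES).items():
    print(f"{name:<6} {size} byte(s) in UTF-8")
```

Everything up to 0x7F fits in one byte; NEL (U+0085) already needs two, and LS and PS need three.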
As you can probably guess from most of these names, many control characters were originally commands to hardware: telling the printer to move down a line or to the next tab stop, or to return the print head to the beginning of the line. The ASCII page lists many more.
The traditional ‘real’ way is to tell the printer to return the print head and advance to a new line: CR+LF. Many text-based internet protocols, such as HTTP, SMTP, FTP and IRC, prescribe this sequence. But a lot of people, including me before I read the Wikipedia article, think that \n is enough to indicate a new line everywhere. It isn’t. In fact, ‘\n’ in source code is two characters, a backslash and an n (duh Pim, really!?), which the underlying software maps to whichever newline character actually gets stored on disk.
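Both halves of that paragraph can be checked in a few lines of Python. The first part shows that the escape sequence is two characters as typed but a single LF once the language parses it; the second builds an HTTP/1.1 request whose lines end in CR+LF, as the HTTP specification prescribes. `build_http_request` is a hypothetical helper for illustration, not part of any library.

```python
# A "\n" typed in source code is two characters; the parser turns it into one LF.
raw_escape = r"\n"  # raw string: the backslash and the 'n' are kept literally
parsed = "\n"       # normal string: the escape sequence becomes LF (U+000A)

assert len(raw_escape) == 2 and raw_escape == "\\" + "n"
assert len(parsed) == 1 and ord(parsed) == 0x0A

CRLF = "\r\n"


def build_http_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request in which every line ends in CR+LF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line marks the end of the headers
    ]
    return (CRLF.join(lines) + CRLF).encode("ascii")


print(build_http_request("example.com"))
# b'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
```

A real client would use an HTTP library, which takes care of this framing for you.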
Let me repeat that: the \n escape sequence is not the Unicode LF character itself; the underlying software maps it to some newline character. *NIX, and C compilers in particular, traditionally translated \n to LF, so many people came to assume that \n simply was LF, which caused a lot of problems when transferring files back and forth between systems.
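The mapping step is easy to observe in Python, whose built-in `open()` takes a `newline` argument that decides what each "\n" becomes on disk when writing in text mode. The sketch below writes the same string three ways and reads back the raw bytes; `write_text` is a hypothetical helper for this illustration.

```python
import os
import tempfile
from typing import Optional


def write_text(path: str, text: str, newline: Optional[str]) -> bytes:
    """Write text in text mode using the given newline policy, then return the raw bytes on disk."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        f.write(text)
    with open(path, "rb") as f:
        return f.read()


text = "first\nsecond\n"  # in memory, each \n is a single LF character

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    print(write_text(path, text, newline="\n"))    # b'first\nsecond\n'
    print(write_text(path, text, newline="\r\n"))  # b'first\r\nsecond\r\n'
    # newline=None translates each \n to os.linesep: CR+LF on Windows, LF elsewhere.
    print(write_text(path, text, newline=None))
```

The string in memory is identical in all three cases; only the translation layer decides what lands on disk.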
Good news for us PHP programmers, though: in PHP double-quoted strings, \n always maps to LF and \r always maps to CR. Java and Python also work this way.
If you want to ensure cross-platform compatibility, a common approach is to encode files as UTF-8 and save them with *NIX-style newlines (LF). Most Windows software will understand a single LF as a new line, but other systems may read CR+LF as two new lines, or as a single new line plus a weird character. The CodeSniffer tool that comes with the whole QA shebang already checks the newline character for you.
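A quick way to see both failure modes is to read the same CR+LF text with two naive readers. This Python sketch simulates them by splitting on LF only and on every CR or LF; it does not model any particular editor.

```python
import re

crlf_text = "line one\r\nline two\r\n"

# A reader that only treats LF as a newline keeps each CR as a stray character.
lf_only = crlf_text.split("\n")
print(lf_only)   # ['line one\r', 'line two\r', '']

# A reader that treats CR and LF each as a newline sees an extra empty line per break.
cr_or_lf = re.split(r"[\r\n]", crlf_text)
print(cr_or_lf)  # ['line one', '', 'line two', '', '']

# LF-only text splits the same way under both interpretations.
lf_text = "line one\nline two\n"
assert lf_text.split("\n") == re.split(r"[\r\n]", lf_text) == ["line one", "line two", ""]
```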
Saving all data as UTF-8 ensures that you won’t run into encoding problems with characters that are only slightly foreign, such as accented letters. All the websites at Crowd Surfing are completely UTF-8, and as a result we don’t run into weird ‘?’ signs when someone named Pïèrré \rôllãnd fills out a web form. OK, I lied: we don’t run into them often. We even have some websites hosting Japanese and Chinese characters without problems.
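For the curious, here is a Python sketch of where those ‘?’ signs and their garbled cousins come from, using the first name from the example. Forcing text into a narrower encoding replaces what it cannot represent, and decoding UTF-8 bytes with the wrong codec (Latin-1 is just an assumed example) garbles them; a consistent UTF-8 round trip keeps everything intact.

```python
name = "Pïèrré"

# A consistent UTF-8 round trip preserves every character.
assert name.encode("utf-8").decode("utf-8") == name

# Forcing the name into ASCII replaces what it can't represent with '?'.
print(name.encode("ascii", errors="replace"))  # b'P??rr?'

# Decoding UTF-8 bytes with the wrong codec garbles them instead.
print(name.encode("utf-8").decode("latin-1"))  # PÃ¯Ã¨rrÃ©

# Japanese text survives UTF-8 too, at three bytes per character here.
cjk = "日本語"
assert cjk.encode("utf-8").decode("utf-8") == cjk
print(len(cjk), len(cjk.encode("utf-8")))      # 3 9
```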
Oh yeah, I almost forgot. If you have a project that isn’t stored with the correct newline character, you can download nl2lf (or use dos2unix if you’re on *NIX). If you have PEAR installed, just follow the instructions in the readme and you should be able to run nl2lf from the shell. To get a preview of all the files that use \r, \r\n or \n\r for newlines, run:
nl2lf -p -d
Run it without the -p flag to actually convert them. Enjoy!
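For readers curious what a tool like this has to do, here is a rough Python sketch of the same idea: find files whose bytes contain \r\n, \n\r or a lone \r, only list them in preview mode, and rewrite them with LF otherwise. This is not nl2lf’s actual code; `to_lf`, `convert_tree` and the .php-only default are assumptions for illustration.

```python
import re
from pathlib import Path

# Line endings other than LF that the post mentions: \r\n, \n\r and a lone \r.
# Alternatives are tried left to right, so a \r\n pair is consumed as one unit
# before the lone-\r alternative can match its first byte.
NON_LF = re.compile(rb"\r\n|\n\r|\r")


def to_lf(data: bytes) -> bytes:
    """Replace every CR+LF, LF+CR and lone CR with a single LF."""
    return NON_LF.sub(b"\n", data)


def convert_tree(root: Path, preview: bool = True,
                 suffixes: tuple[str, ...] = (".php",)) -> list[Path]:
    """List files under root (with a matching suffix) that contain non-LF newlines.

    In preview mode nothing is written; otherwise each listed file is rewritten with LF.
    """
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes):
        data = path.read_bytes()
        converted = to_lf(data)
        if converted != data:
            changed.append(path)
            if not preview:
                path.write_bytes(converted)
    return changed
```

For example, `convert_tree(Path("src"))` only lists the .php files under src that would change, and `convert_tree(Path("src"), preview=False)` rewrites them. Because the sketch works on raw bytes, UTF-8 content is left untouched apart from the line endings; UTF-16 files would need decoding first.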
