Remove Byte-Order-Mark

Problems often occur when UTF-8 encoded script files or XML files are opened and edited. After saving the file, the following characters can suddenly appear at its beginning:
  ï»¿

These characters are the Byte Order Mark (BOM): the three bytes EF BB BF, which show up as ï»¿ when the file is displayed as Latin-1 instead of UTF-8. The BOM marks the text as UTF-8 encoded and is inserted automatically by some editors. The downside is that the BOM sometimes creates problems in Java or PHP applications.
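Before removing anything, it can help to check whether a file really starts with a BOM. The following is only a minimal sketch: the function name has_bom is made up for this example, and it assumes that head -c, od and tr are available, as they are on typical Linux systems.

```shell
# has_bom FILE (hypothetical helper name): succeeds (exit status 0)
# if FILE starts with the three bytes EF BB BF, fails otherwise.
has_bom() {
    # Take the first three bytes, dump them as hex, and strip the
    # spaces and the newline that od adds, leaving e.g. "efbbbf".
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

It could be used like this: has_bom inputfile.xml && echo "BOM found"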


It’s relatively easy to remove these characters from PHP or XML files under Linux.


The first solution is using vim:


Open the file in vim and type the following:

:set nobomb
:w

This will remove the BOM and save the file.


An alternative solution without vim uses sed (tr is less suitable, because it deletes the given bytes wherever they occur rather than only at the start of the file):

cp inputfile.xml inputfile.xml.tmp
# GNU sed: strip the bytes EF BB BF, but only at the start of line 1
sed '1s/^\xEF\xBB\xBF//' inputfile.xml.tmp > inputfile.xml
rm inputfile.xml.tmp

This copies the file, replaces the three BOM bytes with the empty string (i.e. removes them), writes the result back to the original file, and finally deletes the temporary file.
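The same copy, strip and write-back steps can be wrapped in a small function that only touches files that actually start with a BOM. This is a sketch, not a definitive implementation: the name strip_bom is invented here, it swaps sed for tail -c +4 (which skips the first three bytes), and it assumes head -c, od, tr and mktemp are available.

```shell
# strip_bom FILE (hypothetical helper name): remove a leading UTF-8 BOM
# from FILE if there is one; files without a BOM are left unchanged.
strip_bom() {
    file=$1
    # Compare the first three bytes (as hex) with the BOM signature EF BB BF.
    if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 prints everything from the fourth byte onwards;
        # writing back with cat keeps the original file's permissions.
        tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    fi
}
```

For example, strip_bom inputfile.xml would clean the file from the example above, and running it a second time would change nothing.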


That’s it!


