Notepad's New UTF-8 Rules since May 2019
- In the old days Notepad was (and browsers, then and now, are) fine with the printable Microsoft ANSI codes 128-255. If a file contained a Unicode character above 255, Notepad used to insist that the file be saved as UTF-8 (or one of the other Unicode encodings) and placed an invisible BOM (Byte Order Mark, three bytes long in UTF-8) at the start of the file.
- Notepad (today) defaults to UTF-8. If you open a file that should be opened specifically as ANSI, Notepad does not load it properly unless characters in the range 128-255 appear early in the file. Instead, those characters are converted to meaningless � characters on the screen, and they stay that way permanently once the file is next saved. See the Wikipedia article on the "Replacement" character for more about �.
A 2012 IEEE article refers to this problem of jumbled, meaningless characters as mojibake, Japanese for “character transformation.”
- On the other hand, if you specify that the file is to be opened as an ANSI file, Notepad loads it correctly. The file can then be saved as UTF-8, with the single-byte characters in the range 128-255 converted automatically to multi-byte UTF-8 sequences.
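
The behaviour described above can be reproduced outside Notepad. The following is a minimal sketch in Python, not Notepad's own logic: it assumes that "ANSI" means the Windows-1252 code page (`cp1252`), and the sample word is purely hypothetical. It shows the same bytes misread as UTF-8, read correctly as ANSI and converted, and saved with a BOM.

```python
import codecs

# Hypothetical sample text: "cafe" with an acute accent on the e (U+00E9).
text = "caf\u00e9"

# Assumption: "ANSI" is Windows-1252, where U+00E9 is the single byte 0xE9.
ansi_bytes = text.encode("cp1252")

# Misreading the ANSI bytes as UTF-8: the lone byte 0xE9 is not valid UTF-8,
# so it is replaced by U+FFFD, the Replacement character.
misread = ansi_bytes.decode("utf-8", errors="replace")

# Saving the misread text makes the damage permanent: the original byte
# 0xE9 no longer exists anywhere in the output.
damaged = misread.encode("utf-8")

# Reading the bytes as ANSI first, then saving as UTF-8, converts the
# single byte 0xE9 into the two-byte UTF-8 sequence 0xC3 0xA9.
converted = ansi_bytes.decode("cp1252").encode("utf-8")

# The "utf-8-sig" codec prepends the three-byte UTF-8 BOM (EF BB BF).
with_bom = text.encode("utf-8-sig")

print(ansi_bytes)      # b'caf\xe9'
print(ascii(misread))  # 'caf\ufffd'
print(damaged)         # b'caf\xef\xbf\xbd'
print(converted)       # b'caf\xc3\xa9'
print(with_bom)        # b'\xef\xbb\xbfcaf\xc3\xa9'
```

Once the file has been saved in its damaged form, every misread character is the same U+FFFD, so the original text cannot be recovered from the file alone.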