Unreadable Characters in VERSION CHANGES file

DarkHorse2 · Post by **DarkHorse2** » Sat Aug 12, 2023 4:26 am

Somebody introduced a number of unreadable characters in your VERSION CHANGES.txt file.

Attachment: CG3_UnreadableCharacters.PNG (175.8 KiB)

Fixed below.

El_Condoro · Post by **El_Condoro** » Sun Aug 13, 2023 2:17 am

They're apostrophes by the context.

DarkHorse2 · Post by **DarkHorse2** » Sun Aug 13, 2023 6:23 pm

Many were, but not all.

There were other unreadable characters fixed as well.

Some were involved with:
- Lérida
- São Jorge
- Lützow
- hyphens (-)
- quotations (")

Hubert Cater · Post by **Hubert Cater** » Mon Aug 14, 2023 7:27 pm

I've seen this with some text editors. Which one are you using, if you don't mind me asking?

You could try Notepad++ to see if that helps?

You can then set it as the default editor via 'editor.ini' in the USER folder, so that it launches for all in-game script files when you open them from the Editor.

e.g. change this line to the following:

#SHELL= C:\Program Files\Notepad++\notepad++.exe

Or wherever your install of Notepad++ has been installed etc.

DarkHorse2 · Post by **DarkHorse2** » Mon Aug 14, 2023 9:06 pm

Visual Studio Code.

It interprets the file as UTF-8 encoded. (most likely because it is the most commonly used on the planet)

UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98.0% of all web pages, over 99.0% of the top 10,000 pages, and up to 100% for many languages, as of 2023. Virtually all countries and languages have 95% or more use of UTF-8 encodings on the web.

But a number of the problem characters are not valid UTF-8. Many of the apostrophes, for example, are stored as the single byte value 0x92 (hex) for the [']

If I force vscode to interpret the file using the windows-1252 code page, it is able to discern the problem characters.
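For anyone who wants to reproduce this outside of an editor, here is a minimal Python sketch. The sample bytes are made up for illustration and are not taken from the actual VERSION CHANGES.txt file; the only assumption is that the apostrophe is stored as the single byte 0x92.

```python
# Hypothetical sample: an apostrophe stored as the single byte 0x92,
# which is the right single quotation mark (U+2019) in Windows-1252.
raw = b"Bill\x92s notes"

# Decoded as Windows-1252, the byte maps to a curly apostrophe.
as_cp1252 = raw.decode("cp1252")

# Decoded as UTF-8, a lone 0x92 is not a valid sequence, so it is
# replaced with U+FFFD, the "unreadable character" placeholder.
as_utf8 = raw.decode("utf-8", errors="replace")

# ascii() keeps the output printable on any console encoding.
print(ascii(as_cp1252))
print(ascii(as_utf8))
```

The second result is roughly what an editor shows when it assumes UTF-8 for a Windows-1252 file.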

It is a little odd that you are using UTF-8-BOM for the event files, but a non-UTF-8 encoding for the other text files.

Hubert Cater · Post by **Hubert Cater** » Thu Aug 17, 2023 1:26 pm

Very strange as the characters all show up fine on my end.

However, you are not wrong that there are some different characters in the text file. The apostrophes can vary a bit, e.g. I use one style, and I believe Bill's keyboard provides him with a different style from mine. But I've never seen it cause a display issue before, like how it is showing incorrectly for you.

The encoding when I look at the original file on my end is ANSI, at least for the Matrix install. Perhaps this is the issue, i.e. the file was re-encoded to UTF-8 on your end by your text viewer and the characters were corrupted in the encoding change?

DarkHorse2 · Post by **DarkHorse2** » Thu Aug 17, 2023 4:48 pm

The actual encoding of the file did not change on my end. I copied the file as "SOE VERSION CHANGES.txt" before making any corrections. The corrections were really prompted by needing to check the file in to my GitHub repo, as I needed to reference it from other markdown files.

There is a range of character codes that is common to both ANSI and UTF-8. As long as only those are used, there are no issues. But as soon as a code goes above 127 (IIRC), you start running into problems.
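A quick way to locate the bytes that fall outside that common range is sketched below in Python. The function name `find_non_ascii` and the sample bytes are hypothetical, chosen only for illustration.

```python
def find_non_ascii(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for every byte outside 0-127."""
    return [(offset, value) for offset, value in enumerate(data) if value >= 0x80]


# Hypothetical sample in Windows-1252: 0xFC is the u-umlaut, 0xE3 is the a-tilde.
sample = b"L\xfctzow - S\xe3o Jorge"
hits = find_non_ascii(sample)

for offset, value in hits:
    print(f"offset {offset}: 0x{value:02X}")
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the result to the function. A file with no hits reads the same under ASCII, Windows-1252, and UTF-8.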

https://bizbrains.com/resources/bizblog ... tf-8-more/

EDIT:
What I had previously posted was not entirely correct (which I deleted)

The issue came about when opening the txt file from a local markdown preview, which loaded it as utf-8. (vscode)

My changes were either to correct annoying misspellings or to change to a character that would be readable by both utf-8 & windows-1252. So far that looks to be the case, but I have not checked 100% of them (and I am a bit leery of the Unicode chars, such as the one used in Lützow).

Yep, checked. There is no common byte encoding for the [ü] (or most likely for any other special character).

Which means, you really should be using utf-8 if you are using an extended character set.

As with Windows-1252, the first 128 code points are identical to ASCII, but above that the two encodings differ considerably. While Windows-1252 only contains 256 code points altogether, UTF-8 has code points for the entire Unicode character set.
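To make the difference concrete, here is a small Python sketch showing how [ü] is encoded in each scheme, plus one possible way to convert Windows-1252 bytes to UTF-8 with a BOM. The helper name `cp1252_to_utf8_bom` is hypothetical, and this is only an illustration, not how the game files are actually produced.

```python
# "L\u00fctzow" is Lützow, written with an escape so the source stays ASCII.
text = "L\u00fctzow"

cp1252_bytes = text.encode("cp1252")  # one byte for the umlaut: 0xFC
utf8_bytes = text.encode("utf-8")     # two bytes for the umlaut: 0xC3 0xBC


def cp1252_to_utf8_bom(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 with a leading BOM.

    Assumes the input really is Windows-1252. The few byte values that
    Python's cp1252 codec leaves undefined will raise UnicodeDecodeError.
    """
    return data.decode("cp1252").encode("utf-8-sig")


converted = cp1252_to_utf8_bom(cp1252_bytes)

print(cp1252_bytes)
print(utf8_bytes)
print(converted)
```

The ASCII letters keep the same bytes in both encodings; only the umlaut changes, which is why the file mostly looks fine until an extended character turns up.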

DarkHorse2 · Post by **DarkHorse2** » Thu Aug 17, 2023 5:33 pm

As you can see from here ChangeLog.md, I've had no issues with links to the event txt files, as they are using the utf-8-bom character set.

As a software engineer, I never used a regular Windows ANSI text file to contain Unicode characters. You can really only depend on the first 128 chars to display correctly, those of the ASCII char set.

Right now, your version file is using Windows-1252 (which contains only the Latin chars, and used to be the default).

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet (with additions) that was used by default in Microsoft Windows for English and many Romance and Germanic languages including Spanish, Portuguese, French, and German (though missing uppercase ẞ). This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. All modern operating systems, including Windows, now use Unicode code points and text encodings by default, which are portable across all of the world's major languages.

https://www.wikiwand.com/en/Windows-1252

Hubert Cater · Post by **Hubert Cater** » Wed Aug 23, 2023 7:21 pm

Yeah it is not something we really ever took into account for the VERSION CHANGES.txt file, but as you noted for the in game script files, we use encoding so there are no issues in game and especially so as we have multiple languages supported.
