Unreadable Characters in VERSION CHANGES file

DarkHorse2
Posts: 1070
Joined: Fri Feb 04, 2022 12:08 pm

Post by DarkHorse2 »

Somebody introduced a number of unreadable characters in your VERSION CHANGES.txt file.


[Attachment: CG3_UnreadableCharacters.PNG]

Fixed below.
[Attachment: SOE VERSION CHANGES.txt]
El_Condoro
Posts: 607
Joined: Sat Aug 03, 2019 4:35 am

Post by El_Condoro »

Judging by the context, they're apostrophes.
DarkHorse2
Posts: 1070
Joined: Fri Feb 04, 2022 12:08 pm

Post by DarkHorse2 »

Many were, but not all.

There were other unreadable characters fixed as well.

Some were involved with:
- Lérida
- São Jorge
- Lützow
- hyphens (-)
- quotations (")
Hubert Cater
Posts: 6047
Joined: Mon Jul 22, 2013 11:42 am

Post by Hubert Cater »

I've seen this with some text editors. Which one are you using, if you don't mind me asking?

You could try Notepad++ to see if that helps?

You can then set it as the default editor via 'editor.ini' in the USER folder, so that it launches for all in-game script files opened from the Editor.

e.g. change this line to the following:

#SHELL= C:\Program Files\Notepad++\notepad++.exe

Or wherever Notepad++ is installed on your machine.
DarkHorse2
Posts: 1070
Joined: Fri Feb 04, 2022 12:08 pm

Post by DarkHorse2 »

Visual Studio Code.

It interprets the file as UTF-8 encoded (most likely because that is the most commonly used encoding on the planet).
UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98.0% of all web pages, over 99.0% of the top 10,000 pages, and up to 100% for many languages, as of 2023. Virtually all countries and languages have 95% or more use of UTF-8 encodings on the web.
But a number of the problem characters are not valid UTF-8. Many of the apostrophes, for example, are stored as the single byte 0x92 (hex) for the [’].

If I force VS Code to interpret the file using the Windows-1252 code page, it renders the problem characters correctly.
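
A minimal Python sketch of the behaviour described above, assuming the apostrophe is stored as the single byte 0x92. The sample text is made up for illustration and is not taken from the actual VERSION CHANGES.txt file.

```python
# Hypothetical sample: "Bill's" with a curly apostrophe, as Windows-1252 bytes.
raw = b"Bill\x92s"

# Windows-1252 maps byte 0x92 to U+2019 (right single quotation mark).
as_cp1252 = raw.decode("cp1252")

# In UTF-8, 0x92 is a continuation byte with no lead byte before it,
# so a strict decode is expected to fail.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient decode substitutes U+FFFD, the usual "unreadable character" glyph.
as_utf8_lenient = raw.decode("utf-8", errors="replace")

# ascii() keeps the output printable on any console.
print(ascii(as_cp1252))
print(utf8_ok)
print(ascii(as_utf8_lenient))
```

This is roughly what an editor does when it guesses UTF-8 for a Windows-1252 file: the text is unchanged on disk, only the interpretation differs.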

It is a little odd that you are using UTF-8-BOM for the event files but a non-UTF-8 encoding for the other text files.

[Attachments: CG3_HexEditor.PNG, CG3_VSCode.png]
Hubert Cater
Posts: 6047
Joined: Mon Jul 22, 2013 11:42 am

Post by Hubert Cater »

Very strange as the characters all show up fine on my end.

However, you are not wrong that there are some different characters in the text file. The apostrophes, for example, vary a bit: I use one style, and I believe Bill's keyboard gives him a different one. But I've never seen that cause a display problem before, as it is doing for you.

When I look at the original file on my end, the encoding is ANSI, at least for the Matrix install. Perhaps this is the issue: could your text viewer have re-encoded the file to UTF-8, corrupting the characters in the process?
DarkHorse2
Posts: 1070
Joined: Fri Feb 04, 2022 12:08 pm

Post by DarkHorse2 »

The actual encoding of the file did not change on my end. I copied the file as "SOE VERSION CHANGES.txt" before making any corrections. What prompted this was needing to check the file in to my GitHub repo, as I needed to reference it from other Markdown files.

There is a range of character codes common to both ANSI and UTF-8 (the ASCII range, 0 to 127). As long as only those are used, there are no issues. But as soon as a code reaches 128 or above, you start running into problems.

https://bizbrains.com/resources/bizblog ... tf-8-more/
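
One way to locate the problem bytes is to scan for anything outside the ASCII range. The sketch below is only an illustration: `find_non_ascii` is a hypothetical helper name and the sample bytes are invented, not read from the real file.

```python
def find_non_ascii(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line_number, column, byte_value) for every byte >= 0x80."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:  # outside the range shared by ASCII, ANSI and UTF-8
                hits.append((line_no, col, byte))
    return hits


# Invented sample: "Lérida" and a curly apostrophe, both as Windows-1252 bytes.
sample = b"plain ascii\nL\xe9rida\nBill\x92s\n"
for line_no, col, byte in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: 0x{byte:02X}")
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.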

EDIT:
What I had previously posted was not entirely correct, so I deleted it. :)

The issue came about when opening the txt file from a local Markdown preview in VS Code, which loaded it as UTF-8.

My changes either corrected annoying misspellings or switched to a character that would be readable as both UTF-8 and Windows-1252. So far that looks to be the case, but I have not checked 100% of them. (And I am a bit leery of the Unicode chars, such as the one used in Lützow.)

Yep, checked. There is no byte representation of [ü] that is common to both encodings (and most likely none for any other special character).

Which means, you really should be using utf-8 if you are using an extended character set.
As with Windows-1252, the first 128 code points are identical to ASCII, but above that the two encodings differ considerably. While Windows-1252 only contains 256 code points altogether, UTF-8 has code points for the entire Unicode character set.
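
To illustrate the point about [ü], here is a short Python sketch comparing the two encodings and converting between them. `cp1252_to_utf8` is a hypothetical helper, and it assumes the input really is Windows-1252.

```python
name = "L\u00fctzow"  # "Lützow", written with an escape to keep this source ASCII

cp1252_bytes = name.encode("cp1252")  # ü becomes the single byte 0xFC
utf8_bytes = name.encode("utf-8")     # ü becomes the two bytes 0xC3 0xBC


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8.

    Assumes the input is Windows-1252. Python's cp1252 codec raises
    UnicodeDecodeError on the few byte values that code page leaves undefined.
    """
    return data.decode("cp1252").encode("utf-8")


print(cp1252_bytes.hex(" "))
print(utf8_bytes.hex(" "))
print(cp1252_to_utf8(cp1252_bytes) == utf8_bytes)
```

The ASCII letters come out identical in both byte strings; only the ü differs, which is why no single byte value can serve both encodings.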
[Attachment: CG3_GitHub_Diff.PNG]
Last edited by DarkHorse2 on Thu Aug 17, 2023 9:00 pm, edited 1 time in total.
DarkHorse2
Posts: 1070
Joined: Fri Feb 04, 2022 12:08 pm

Post by DarkHorse2 »

As you can see in ChangeLog.md, I've had no issues with links to the event txt files, as they use the UTF-8-BOM encoding.

As a software engineer, I have never used a regular Windows ANSI text file to hold Unicode characters. You can really only depend on the first 128 characters, those of the ASCII character set, to display correctly.

Right now, your version file is using Windows-1252, which contains only the Latin characters and used to be the default.
Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet (with additions) that was used by default in Microsoft Windows for English and many Romance and Germanic languages including Spanish, Portuguese, French, and German (though missing uppercase ẞ). This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. All modern operating systems, including Windows, now use Unicode code points and text encodings by default, which are portable across all of the world's major languages.
https://www.wikiwand.com/en/Windows-1252
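
A rough heuristic for telling these encodings apart, sketched in Python. `guess_encoding` is a hypothetical helper, and the Windows-1252 fallback is an assumption that suits this thread rather than a general rule.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess between UTF-8 with BOM, plain UTF-8, and Windows-1252."""
    # A UTF-8 BOM is the byte sequence EF BB BF at the start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Pure ASCII also passes this check, which is harmless because ASCII
    # decodes identically under both encodings.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Windows-1252.
        return "cp1252"


print(guess_encoding(codecs.BOM_UTF8 + b"event script"))
print(guess_encoding(b"L\xc3\xbctzow"))
print(guess_encoding(b"L\xfctzow"))
```

The returned names can be passed straight to `bytes.decode`; "utf-8-sig" strips the BOM on decode.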
Hubert Cater
Posts: 6047
Joined: Mon Jul 22, 2013 11:42 am

Post by Hubert Cater »

Yeah, it is not something we ever really took into account for the VERSION CHANGES.txt file. But as you noted, for the in-game script files we do use UTF-8 encoding so that there are no issues in game, especially as we support multiple languages.