Chinese character leads to corrupt xlsx file

Questions and answers on how to use XLSReadWriteII 5.
d3nton
Posts: 137
Joined: Thu Oct 25, 2012 9:48 am

Chinese character leads to corrupt xlsx file

Post by d3nton »

Hi!

I have an issue when I try to save a folder name containing Chinese characters to an xlsx file.
Reproducing it is a bit tricky, since I cannot reproduce it with a hard-coded string, only with a folder whose name contains Chinese characters.
You can download the sample folder here:
http://www.file-upload.net/download-803 ... r.zip.html

To reproduce the issue, please extract testfolder.zip, e.g. to an empty USB stick.
Use the code below (adjust the drive letter of the USB stick in the code).
FindFirst returns a TSearchRec record with the name of the folder, which I then try to write
as a string into an xlsx file:

Sample code:

var
  ExcelFile5: TXLSReadWriteII5;
  lstring: TSearchRec;

begin
  // Adjust the drive letter of the USB stick here
  if FindFirst('F:\␍簴ꊨ簷*', faAnyFile, lstring) = 0 then
  try
    ExcelFile5 := TXLSReadWriteII5.Create(nil);
    try
      ExcelFile5.Version := xvExcel2007; // xlsx output (xls output works)
      ExcelFile5.Filename := 'D:\output.xlsx';
      ExcelFile5[0].AsString[0, 0] := lstring.Name;
      ExcelFile5.Write;
    finally
      ExcelFile5.Free;
    end;
  finally
    FindClose(lstring);
  end;
end.
Could you please fix this in the next version?
Thank you.
d3nton
Posts: 137
Joined: Thu Oct 25, 2012 9:48 am

Re: Chinese character leads to corrupt xlsx file

Post by d3nton »

Here is another repro, which might be easier.
Please note that you have to set the file format (encoding) of the source file to 'Little Endian UCS-2' in your project.

var
  ExcelFile5: TXLSReadWriteII5;

begin
  ExcelFile5 := TXLSReadWriteII5.Create(nil);
  try
    ExcelFile5.Filename := 'D:\output.xlsx';
    // This literal only survives if the source file is saved as Little Endian UCS-2
    ExcelFile5[0].AsString[0, 0] := 'F:\␍簴ꊨ簷�';
    ExcelFile5.Write;
  finally
    ExcelFile5.Free;
  end;
end.
larsa
Site Admin
Posts: 926
Joined: Mon Jun 27, 2005 9:30 pm

Re: Chinese character leads to corrupt xlsx file

Post by larsa »

Hello

There is something wrong with your Chinese characters. The first "character" is a square, meaning that it's not a valid character. If you want to use or test Chinese, write the characters as Unicode numeric values, such as #$5564 + #$9152 ("beer" in Chinese).
Generally there are no problems using Chinese or other non-Latin alphabets. We have many Chinese users of the component.
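A minimal sketch of that suggestion, assuming a Unicode Delphi (2009 or later) where string is UTF-16. The helper ToCharLiterals is a hypothetical name, not part of XLSReadWriteII 5: it prints every UTF-16 code unit of a string as a #$XXXX literal, so a folder name returned by FindFirst could be posted or hard-coded as numeric values instead of pasted characters.

```pascal
program DumpCharCodes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of XLSReadWriteII 5): renders every UTF-16
// code unit of S as a Delphi numeric character literal.
// A supplementary-plane character shows up as two literals (surrogate pair).
function ToCharLiterals(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + '#$' + IntToHex(Ord(S[I]), 4);
end;

var
  Beer: string;
begin
  Beer := #$5564#$9152; // "beer" in Chinese, written as numeric values
  Writeln(ToCharLiterals(Beer)); // prints #$5564#$9152
  // For the first repro one could print ToCharLiterals(lstring.Name) instead
end.
```

Pasting the printed literals back into source code also removes the dependency on the source file being saved as UCS-2.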
Lars Arvidsson, Axolot Data
d3nton
Posts: 137
Joined: Thu Oct 25, 2012 9:48 am

Re: Chinese character leads to corrupt xlsx file

Post by d3nton »

Are you sure? Exporting to xls instead of xlsx works with XLSReadWrite. The symbol is not a square but the same symbol as in my examples above.
Also, if I copy the folder name into MS Excel and save the file as xlsx, the xlsx file is valid and can be opened again without any errors.
larsa
Site Admin
Posts: 926
Joined: Mon Jun 27, 2005 9:30 pm

Re: Chinese character leads to corrupt xlsx file

Post by larsa »

Hello

Yes, your Chinese characters are not valid. You can test them with Google Translate: https://translate.google.com/
Lars Arvidsson, Axolot Data
d3nton
Posts: 137
Joined: Thu Oct 25, 2012 9:48 am

Re: Chinese character leads to corrupt xlsx file

Post by d3nton »

Hi!

Okay.
I copied the characters into Excel via copy and paste and saved the file as xlsx (the file contains the invalid character but is not corrupted).
Afterwards I extracted the xlsx file created by Excel. Compared to the XML created by XLSReadWrite there is a small difference:
Excel appended _FFFF_ to the characters, which results in a valid Excel file although the character is invalid.
Here is a comparison screenshot:
http://www.file-upload.net/download-863 ... e.png.html

Would it be possible for XLSReadWrite to append this _FFFF_ as well, to avoid an invalid Excel file?

best regards
denton
larsa
Site Admin
Posts: 926
Joined: Mon Jun 27, 2005 9:30 pm

Re: Chinese character leads to corrupt xlsx file

Post by larsa »

Hello

Can you please explain the logic behind appending _FFFF_? Shall this be appended to all Chinese characters? To all invalid characters? How does the component know that the characters are invalid?
Lars Arvidsson, Axolot Data
d3nton
Posts: 137
Joined: Thu Oct 25, 2012 9:48 am

Re: Chinese character leads to corrupt xlsx file

Post by d3nton »

There are 66 noncharacters (some in the Unicode Basic Multilingual Plane, some in the supplementary planes):
http://www.unicode.org/faq/private_use. ... characters
http://en.wikipedia.org/wiki/Unicode#Ch ... l_Category

These characters corrupt the xlsx files.
Noncharacters in Delphi are written e.g. as #$FDD0, #$FDD1, ...
These are all the noncharacters:

#$FDD0,#$FDD1,#$FDD2,#$FDD3,#$FDD4,#$FDD5,#$FDD6,#$FDD7,#$FDD8,#$FDD9,#$FDDA,#$FDDB,#$FDDC,#$FDDD,#$FDDE,#$FDDF,
#$FDE0,#$FDE1,#$FDE2,#$FDE3,#$FDE4,#$FDE5,#$FDE6,#$FDE7,#$FDE8,#$FDE9,#$FDEA,#$FDEB,#$FDEC,#$FDED,#$FDEE,#$FDEF,
#$FFFE,#$FFFF
and the noncharacters in the supplementary planes:
#$1FFFE,#$1FFFF,#$2FFFE,#$2FFFF,#$3FFFE,#$3FFFF,#$4FFFE,#$4FFFF,#$5FFFE,#$5FFFF,#$6FFFE,#$6FFFF,#$7FFFE,#$7FFFF,#$8FFFE,#$8FFFF,
#$9FFFE,#$9FFFF,#$AFFFE,#$AFFFF,#$BFFFE,#$BFFFF,#$CFFFE,#$CFFFF,#$DFFFE,#$DFFFF,#$EFFFE,#$EFFFF,#$FFFFE,#$FFFFF,
#$10FFFE,#$10FFFF
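The list above follows a simple rule: U+FDD0 to U+FDEF, plus the last two code points (xxFFFE and xxFFFF) of each of the 17 planes. A minimal sketch of a check based on that rule, assuming Delphi 2009 or later (UTF-16 strings); IsNoncharacter and FirstNoncharacter are hypothetical names. One caveat worth keeping in mind: the XML 1.0 Char production only excludes U+FFFE and U+FFFF among these 66 (besides most control characters and unpaired surrogates), so this check flags more than what an XML parser strictly rejects.

```pascal
program FindNoncharacters;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: True for the 66 Unicode noncharacters
// (U+FDD0..U+FDEF and xxFFFE/xxFFFF in each of the 17 planes).
function IsNoncharacter(CodePoint: Cardinal): Boolean;
begin
  Result := (CodePoint <= $10FFFF) and
    (((CodePoint >= $FDD0) and (CodePoint <= $FDEF)) or
     ((CodePoint and $FFFE) = $FFFE));
end;

// Hypothetical helper: 1-based index of the first noncharacter in S, or 0.
// Surrogate pairs are combined so that e.g. U+1FFFE is detected.
function FirstNoncharacter(const S: string): Integer;
var
  I: Integer;
  CP: Cardinal;
begin
  I := 1;
  while I <= Length(S) do
  begin
    CP := Ord(S[I]);
    if (CP >= $D800) and (CP <= $DBFF) and (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
    begin
      // Decode the surrogate pair into a supplementary code point
      CP := $10000 + ((CP - $D800) shl 10) + (Cardinal(Ord(S[I + 1])) - $DC00);
      if IsNoncharacter(CP) then
        Exit(I);
      Inc(I, 2);
    end
    else
    begin
      if IsNoncharacter(CP) then
        Exit(I);
      Inc(I);
    end;
  end;
  Result := 0;
end;

begin
  Writeln(FirstNoncharacter('abc'));              // 0
  Writeln(FirstNoncharacter('ab' + #$FFFF));      // 3
  Writeln(FirstNoncharacter('a' + #$D83F#$DFFE)); // 2 (U+1FFFE)
end.
```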
Can you please explain the logic behind appending _FFFF_?
The _FFFF_ is not appended to the string itself; it appears in the extracted xlsx file, in xl\sharedStrings.xml, where the string is stored:
(<sst xmlns="http://schemas.openxmlformats.org/sprea ... /2006/main" count="1" uniqueCount="1"><si><t>␍簴ꊨ簷_xFFFF_</t></si></sst>)
The character is displayed in Excel as a rectangle containing a question mark.
You can reproduce this behaviour by downloading this test folder: http://www.file-upload.net/download-803 ... r.zip.html
If you extract the folder and copy and paste the folder name into an xlsx file, you can see the described behaviour.
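The _xFFFF_ that Excel writes matches the escape format of the ST_Xstring type in ECMA-376 (Office Open XML): a character that XML 1.0 cannot carry is written as _x, four hex digits and _, and an underscore that would otherwise start such a sequence is itself written as _x005F_. A minimal sketch of that escaping, assuming Delphi 2009 or later; EscapeXstring and its helpers are hypothetical names, not part of XLSReadWriteII 5, and escaping unpaired surrogates the same way is an assumption about what Excel accepts.

```pascal
program XstringEscape;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// True if the code unit U is allowed on its own in XML 1.0 text
// (tab, LF, CR, U+0020..U+D7FF, U+E000..U+FFFD).
function IsXmlBmpChar(U: Word): Boolean;
begin
  Result := (U = $9) or (U = $A) or (U = $D) or
    ((U >= $20) and (U <= $D7FF)) or ((U >= $E000) and (U <= $FFFD));
end;

// True if S[I] starts literal "_xHHHH_" text that would be read as an escape.
function StartsEscapePattern(const S: string; I: Integer): Boolean;
var
  K: Integer;
begin
  Result := (I + 6 <= Length(S)) and (S[I] = '_') and (S[I + 1] = 'x') and (S[I + 6] = '_');
  if Result then
    for K := I + 2 to I + 5 do
      if not CharInSet(S[K], ['0'..'9', 'A'..'F', 'a'..'f']) then
        Exit(False);
end;

// Hypothetical helper: escapes S in the ST_Xstring style, e.g. U+FFFF -> _xFFFF_.
function EscapeXstring(const S: string): string;
var
  I: Integer;
  U: Word;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    U := Ord(S[I]);
    if (U >= $D800) and (U <= $DBFF) and (I < Length(S)) and
       (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
    begin
      // Valid surrogate pair: copy both halves unchanged
      Result := Result + S[I] + S[I + 1];
      Inc(I, 2);
      Continue;
    end;
    if StartsEscapePattern(S, I) then
      Result := Result + '_x005F_' // escape the underscore itself
    else if IsXmlBmpChar(U) then
      Result := Result + S[I]
    else
      Result := Result + '_x' + IntToHex(U, 4) + '_';
    Inc(I);
  end;
end;

begin
  Writeln(EscapeXstring('abc' + #$FFFF)); // abc_xFFFF_
  Writeln(EscapeXstring('_x0041_'));      // _x005F_x0041_
end.
```

Such a routine would belong in the code that writes the shared strings XML, not in application code, since a literal _xFFFF_ passed in as cell text is ordinary text from the caller's point of view.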
larsa
Site Admin
Posts: 926
Joined: Mon Jun 27, 2005 9:30 pm

Re: Chinese character leads to corrupt xlsx file

Post by larsa »

Hello

1. There are 66 Noncharacters...
You can write whatever characters you want. The component has no opinion on that.

2. You can reproduce this behaviour by downloading...
Yes, the folder name is invalid. It can't be translated by Google Translate. Congratulations. Do you want the component to make sure that your files contain invalid characters? Sorry, I will not add this feature.

You have still not explained the logic behind _xFFFF_. Please point me to the relevant documentation. An educated guess would however be that _xFFFF_ means that the string is invalid.

As you have the source code, there is nothing that stops you from adding your own routine to check the characters before they are written to the file. You can then add whatever invalid data you want.
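One way to follow that suggestion without changing the component source is to clean the string in application code before assigning it. A minimal sketch, assuming Delphi 2009 or later; ReplaceXmlIllegal is a hypothetical name, and replacing with U+FFFD (the Unicode replacement character) is a design choice here, not what Excel does (Excel escapes the character instead).

```pascal
program SanitizeBeforeWrite;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: replaces every UTF-16 code unit that XML 1.0 does not
// allow (most control characters, U+FFFE, U+FFFF, unpaired surrogates)
// with Replacement. Valid surrogate pairs are kept.
function ReplaceXmlIllegal(const S: string; Replacement: Char = #$FFFD): string;
var
  I: Integer;
  U: Word;
begin
  Result := S;
  I := 1;
  while I <= Length(Result) do
  begin
    U := Ord(Result[I]);
    if (U >= $D800) and (U <= $DBFF) then
    begin
      // High surrogate: keep it only if a low surrogate follows
      if (I < Length(Result)) and (Ord(Result[I + 1]) >= $DC00) and
         (Ord(Result[I + 1]) <= $DFFF) then
      begin
        Inc(I, 2);
        Continue;
      end;
      Result[I] := Replacement;
    end
    else if not ((U = $9) or (U = $A) or (U = $D) or
                 ((U >= $20) and (U <= $D7FF)) or
                 ((U >= $E000) and (U <= $FFFD))) then
      Result[I] := Replacement; // also catches unpaired low surrogates
    Inc(I);
  end;
end;

begin
  Writeln(ReplaceXmlIllegal('abc' + #$FFFF, '?')); // abc?
end.
```

In the first repro the assignment would become ExcelFile5[0].AsString[0, 0] := ReplaceXmlIllegal(lstring.Name); the trade-off is that the cell text changes, whereas escaping keeps the original character recoverable.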

I would also like to underline that there are of course many Chinese users of the component, and I have not received any complaints from any of them about errors in the strings.
Lars Arvidsson, Axolot Data