View Issue Details

ID: 0000769
Project: Main CAcert Website
Category: certificate issuing
View Status: public
Last Update: 2021-08-26 15:29
Reporter: nijel
Assigned To:
Priority: urgent
Severity: crash
Reproducibility: have not tried
Status: new
Resolution: open
Summary: 0000769: Client certificate broken with unicode
Description: While generating a client certificate, I got a certificate with the name "Michal &". I guess something went wrong when processing the unicode name "Michal ?iha?".
[The surname could be for example "Čihař".]
Steps To Reproduce: I tried to generate a client cert on the testsys. My name contains "š" (s-caron). It is taken from my account, where I suppose it is OK (it is displayed correctly in my account). However, in the created client cert it is encoded as a single CP-1250 byte. As the client cert is probably UTF-8 encoded (two bytes for this diacritic), the CP-1250 encoding is wrong.
Additional Information: This error occurs in Win/IE, Win/Chrome, and also in Linux Ubuntu/Mozilla Firefox and Linux OpenSuSE/Mozilla Firefox.
This error can also be seen at the beginning of e-mail notices about the end of a cert's validity, e.g. "Hi Ale�," (etc.); usually I can only read "Hi Ale,". How the last character of my name is shown depends on the mail client. Here the hex representation was FDFF, meaning the information was lost. Other clients show 9A00 (CP-1250) or C2006101. The correct Unicode representation is 6101 only (code point U+0161, with the bytes shown in little-endian order). The correct UTF-8 representation is C5A1. The representation in the cert is 9A (CP-1250). It would be correct if this text in the cert were preceded by a coding type and codepage number (I do not know if it is).
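The byte values quoted above can be checked with a short Python sketch. It assumes the character in question is "š" (U+0161), as the report states, and it reads the four-digit hex values as UTF-16 code units written low byte first; that reading is an assumption, not something the report says explicitly.

```python
# Encodings of "š" (U+0161) as discussed in the report.
s_caron = "\u0161"

cp1250_bytes = s_caron.encode("cp1250")       # one byte in the Windows Central European code page
utf8_bytes = s_caron.encode("utf-8")          # two bytes in UTF-8
utf16le_bytes = s_caron.encode("utf-16-le")   # one 16-bit code unit, low byte first

# What a reader produces when the information is lost: U+FFFD REPLACEMENT CHARACTER.
lost_bytes = "\ufffd".encode("utf-16-le")

# What a reader produces when it treats the CP-1250 byte as Latin-1 (U+009A).
misread_bytes = cp1250_bytes.decode("iso-8859-1").encode("utf-16-le")

for label, value in [
    ("CP-1250", cp1250_bytes),
    ("UTF-8", utf8_bytes),
    ("UTF-16LE", utf16le_bytes),
    ("lost (U+FFFD)", lost_bytes),
    ("misread as Latin-1", misread_bytes),
]:
    print(f"{label}: {value.hex().upper()}")
```

Under these assumptions the script prints 9A, C5A1, 6101, FDFF and 9A00, which matches the values in the report.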
Tags: CATS, certificates, charset, diacritic, iso-8559-1, latin1, names, unicode, utf-8
Reviewed by
Test Instructions

Relationships

related to 0000475 needs work Ted CATS.cacert.org Improvements on some tables
related to 0000610 needs work CATS.cacert.org use utf-8 as encoding
related to 0000869 new Wiki Wrong characters while creating the pdf for COAP
related to 0000932 needs testing BenBE test.cacert.org Get UTF8 failured email subject
related to 0000979 needs feedback BenBE Main CAcert Website HTML <meta> tag states charset=utf-8 on some pages when it is not
related to 0000991 needs work NEOatNHNG Main CAcert Website commonName is wrongly burned on CSR
related to 0001025 needs work NEOatNHNG Main CAcert Website Domain Dispute strange behaviour / Domain Dispute issue
related to 0001039 needs review & testing Main CAcert Website Cyber peretas nomor 085823771018
related to 0001054 needs review & testing Ted Main CAcert Website Review the code regarding the new point calculation in ./includes/general.php
related to 0001311 new Main CAcert Website The check about email during email dispute works incorrect
related to 0001354 needs review BenBE Main CAcert Website Problems with diacretics and non-latin1 characters
related to 0001389 solved? BenBE Main CAcert Website Wrong encoding for mails sent with function sendmail()
related to 0001398 needs work Ted CATS.cacert.org User Interface Translation to Czech
related to 0001401 new BenBE Main CAcert Website Names containing non-ascii characters are displayed incorrectly on the website
related to 0001402 new CATS.cacert.org A comment on deployed Czech translation of the "Assurer's Challenge" test
related to 0001419 new Main CAcert Website Issue with displaying "é" as é in "Client Certificates - View all certificates"
related to 0001441 solved? wytze test.cacert.org umlauts are not stored/displayed correctly in Testsystem
related to 0001461 new Blog Hatchek, etc. not displayed
related to 0000770 new Main CAcert Website PGP keys signing fails with unicode
related to 0001097 closed NEOatNHNG Main CAcert Website Special characters which have no HTML-entities are not properly escaped

Activities

alkas

2015-08-29 20:51

manager   ~0005460

This error is very unpleasant, as it concerns the basic info - user's name.
According to the certs carrying my name, I am not Aleš anymore, but only Ale (beer).

alkas

2021-08-06 16:28

manager   ~0006055

The certificate interpreter supposes that the CN name is coded as UTF-8. In fact, the name is coded in CP-1250 (the Windows counterpart of Latin2). Where does it come from? If you make a CSR with CN=name, then the name is properly coded in UTF-8. So there are the following 2 possibilities:
1. The name is taken from the user's account and possibly converted into a CP-1250 representation. But the CPS document (COD6) says in 3.1.1 that the CN (and OU) do NOT come from the member's account!
2. The name is taken from the CSR and then converted into the member's national code page. This behaviour possibly remains from the days before UTF-8 arose.
Conclusion: Probably an unnecessary conversion is performed.
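The symptom described in this note can be reproduced in a few lines of Python. This is only an illustration, assuming the CN bytes are CP-1250 and the reader decodes them as UTF-8; it says nothing about where in the CAcert code the conversion actually happens.

```python
# A CN encoded in CP-1250, as the note says the certificate contains.
cn_bytes = "Aleš".encode("cp1250")   # b"Ale\x9a"

# A strict UTF-8 reader rejects the lone 0x9A byte.
try:
    cn_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient UTF-8 reader substitutes U+FFFD for the invalid byte.
lenient = cn_bytes.decode("utf-8", errors="replace")

# Only a reader that knows the real code page recovers the name.
correct = cn_bytes.decode("cp1250")

print(strict_ok, repr(lenient), correct)
```

The lenient result ends in the replacement character, which is consistent with the "Hi Ale�," greeting quoted in the issue.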

jandd

2021-08-07 10:54

administrator   ~0006061

Last edited: 2021-08-07 10:55

The web site software has a lot of hardcoded latin1 assumptions; even the database is encoded in latin1/iso-8859-1. Database migrations and a full retest of registration, assurance and certificate issuing are required to fix this. I need to check whether the signer also makes assumptions about the charset. I agree that this is unacceptable in 2021 and that the whole site and software should use UTF-8/Unicode.

At least CATS is affected by this too and uses Latin1 instead of UTF-8.

Golffies

2021-08-25 13:28

manager   ~0006075

On the 2nd of August, a certificate requested by Wacław triggered the bug. It required Dirk and Michaela to travel to Ede and reset the signer script manually.

alkas

2021-08-25 16:17

manager   ~0006076

A little research has uncovered a severe bug.
1. Names and other texts containing characters outside the ASCII table (e.g. from ANSI/ISO code pages) are not encoded or decoded when entering or leaving accounts. BUT the standard representation (at least in Windows) has changed over the years since CAcert was founded. THIS MEANS that the representation of names, which CAcert considers very important, has also changed. In my case, when I created my account in 2014, my Windows sent my name in ANSI/ISO. Now it is UTF-8. When creating a certificate, CAcert's software does no conversion. Certificates are supposed to have text fields in UTF-8. So old accounts could have ANOTHER representation of names than new ones! (Provided that accounts were/are created from Windows.)
2. The CAcert software has/had no tools to detect the encoding and convert it to UTF-8.
3. Even after the signer software is repaired, these names/dates/other texts (security questions) will still suffer. In fact, there are users complaining that they are unable to answer their security questions.
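A tool of the kind point 2 says is missing could start from a simple heuristic: try strict UTF-8 first and fall back to a legacy code page. The sketch below is hypothetical; the function name `decode_account_text` and the default fallback `cp1250` are assumptions made for illustration and are not part of the CAcert code base.

```python
from typing import Tuple


def decode_account_text(raw: bytes, legacy: str = "cp1250") -> Tuple[str, str]:
    """Return (text, encoding_used) for bytes of unknown encoding.

    Heuristic: non-ASCII text in a legacy single-byte code page is
    very unlikely to also be valid UTF-8, so strict UTF-8 is tried first.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: the caller knows which legacy code page applies.
        # The bytes alone cannot distinguish CP-1250 from ISO-8859-1.
        return raw.decode(legacy), legacy


old_account = "Aleš".encode("cp1250")   # how an old account might store the name
new_account = "Aleš".encode("utf-8")    # how a new account might store it

print(decode_account_text(old_account))
print(decode_account_text(new_account))
```

Both calls yield the same text, so the result can then be re-encoded as UTF-8. The fallback code page would have to be chosen per account or per locale, which is exactly the information the accounts do not record.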

jandd

2021-08-25 16:48

administrator   ~0006077

some data points from our research:

- The web application only supports ISO-8859-1 and enforces ISO-8859-1 on incoming requests; everything else is encoded as an HTML entity and written to the database in that encoded representation
- the database stores data with latin1_swedish_ci, the ancient MySQL default collation
- the client.pl that sends data to the signer takes the HTML-encoded form of the SubjectDN and SubjectAlternativeName fields literally and sends it to the signer
- server.pl on the signer has an error condition that fails on &#<number>; encoded entities in the SubjectDN. This crashes server.pl, so no response is sent to the signer client and client.pl runs into a timeout. The request with the broken SubjectDN keeps being sent to the signer and crashes server.pl until there is manual intervention
- the openssl configuration on the signer (ancient Debian 5 openssl) takes the ISO-8859-1 data literally but marks it as ASN.1 T61String, which is wrong. No re-encoding is done, and therefore ISO-8859-1 characters outside the ASCII character set, such as 'â', are not encoded correctly and cannot be displayed by conforming X.509/ASN.1 implementations
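The storage behaviour in the first data point can be mimicked in Python to see which names end up containing numeric entities. This is an illustration of the described behaviour only; the real web application is not written in Python, and the helper name `to_stored_form` is made up for this sketch.

```python
def to_stored_form(name: str) -> bytes:
    """Encode as ISO-8859-1, replacing anything else with &#<number>; entities."""
    return name.encode("iso-8859-1", errors="xmlcharrefreplace")


for name in ["André", "Wacław", "Aleš"]:
    print(name, "->", to_stored_form(name))
```

"André" fits into ISO-8859-1 and is stored as plain bytes, while "Wacław" and "Aleš" each gain a numeric entity (`&#322;` and `&#353;`), the form that the fourth data point says server.pl fails on.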

So we have an encoding problem on all these levels. A proper solution would be to reimplement everything to use UTF-8 and implement data migration procedures at least for all the values in the subject column(s) of the relevant database table(s).
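A migration of a single stored value could look roughly like the following sketch. It assumes that the stored bytes are ISO-8859-1 with HTML entities for everything else, as listed in the data points; the function names are hypothetical.

```python
import html


def stored_to_unicode(stored: bytes) -> str:
    """Decode ISO-8859-1 bytes, then resolve HTML character references."""
    # Caveat: html.unescape also resolves named entities such as &amp;.
    # Whether that is wanted depends on how the values were written.
    return html.unescape(stored.decode("iso-8859-1"))


def stored_to_utf8(stored: bytes) -> bytes:
    """Produce the UTF-8 bytes a migrated column would hold."""
    return stored_to_unicode(stored).encode("utf-8")


print(stored_to_unicode(b"Wac&#322;aw"))
print(stored_to_utf8(b"Andr\xe9"))
```

Values that were really stored in some other code page would not be repaired by this step and would need separate handling.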

To protect the signer's server.pl from future crashes, we need to perform the same checks on the subject string in client.pl and must not send requests to the signer that would crash server.pl. This means:
- that we cannot issue certificates for anybody whose name contains a character that is not in ISO-8859-1 (which the web application encodes as a numbered HTML entity)
- that certificates whose subject DN is valid in ISO-8859-1 but contains characters that have a different meaning in ASN.1 T61String will still be issued, but with a wrong encoding for the CN part of the Subject DN
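The pre-flight check described here could be as small as a single pattern match. The sketch below is written in Python for illustration, although client.pl is a Perl script. The pattern is derived only from the "&#<number>;" description in this note, so the exact condition in server.pl, and whether hexadecimal entities also matter, are assumptions. The "/CN=..." subject format in the example is likewise assumed.

```python
import re

# Decimal numeric HTML entities such as &#322; (assumed to be the failing form).
NUMERIC_ENTITY = re.compile(r"&#\d+;")


def safe_to_send(subject_dn: str) -> bool:
    """Return False if the subject contains a numeric HTML entity."""
    return NUMERIC_ENTITY.search(subject_dn) is None


print(safe_to_send("/CN=Andr\xe9"))       # ISO-8859-1 only
print(safe_to_send("/CN=Wac&#322;aw"))    # contains a numeric entity
```

A request for which the check returns False would be rejected before it reaches the signer, which trades a failed certificate request for a signer that keeps running.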

A UTF-8 implementation should be done in several individually testable steps:

1.) reimplement the signer software (currently server.pl) in a language that supports Unicode properly; it would still speak the old protocol but convert incoming Subject DN information to UTF-8 and create certificates with ASN.1 UTF8String encoding for the Subject DN fields
2.) reimplement the signer client (currently client.pl) to do the conversion, and implement a proper protocol between signer client and server that can properly carry UTF-8 characters and has proper framing, error signaling and so on. I have ideas on how to do this but still need to write the specification.
3.) implement a set of automated tests with representative data, especially for countries outside Western Europe and the English-speaking world (everything not covered by 7-bit ASCII), to have a set of regression tests for step 4
4.) reimplement the Web application and implement data migration code to transform the database to UTF-8/Unicode (in MySQL, the utf8mb4 character set stores full UTF-8 with up to 4 bytes per character). This only makes sense with a proper set of tests from 3.)
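The difference between the current T61String marking and the UTF8String encoding proposed in step 1 can be shown at the byte level. The sketch below builds only the DER tag-length-value for a single string value, using the universal tag numbers 12 (UTF8String) and 20 (T61String); it is not a certificate encoder, and a real signer would rely on an X.509 library. The helper names are hypothetical.

```python
UTF8STRING_TAG = 0x0C   # universal tag 12
T61STRING_TAG = 0x14    # universal tag 20 (TeletexString)


def der_string(tag: int, payload: bytes) -> bytes:
    """Encode tag, short-form length and payload (payload under 128 bytes)."""
    if len(payload) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(payload)]) + payload


def cn_as_utf8string(name: str) -> bytes:
    """What step 1 asks for: UTF-8 bytes tagged as UTF8String."""
    return der_string(UTF8STRING_TAG, name.encode("utf-8"))


def cn_as_signer_does_now(name: str) -> bytes:
    """Behaviour described in the data points: ISO-8859-1 bytes tagged as T61String."""
    return der_string(T61STRING_TAG, name.encode("iso-8859-1"))


print(cn_as_utf8string("Aleš").hex())
print(cn_as_signer_does_now("â").hex())
```

The first value is self-describing for any conforming parser, while the second carries Latin-1 bytes under a tag that promises a different character repertoire.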

I know this is a huge undertaking, but from my point of view it is necessary if we want to stay relevant, gain new community members or implement any new use cases based on our certificates (e.g. OAuth2/OpenID Connect federated login, or any other new standard that uses Unicode/UTF-8).

egal

2021-08-25 20:23

administrator   ~0006078

Following: https://bugs.cacert.org/view.php?id=769#c6075 by Golffies

This issue triggered a visit to the signer for me and Michaela, but (after investigation) it was solved remotely on the webdb server.

This means: whenever it happens again, a visit to the datacenter is not necessary, but manual remote interaction by the critical team is necessary on webdb.

jandd

2021-08-26 15:29

administrator   ~0006083

I implemented a mitigation for the server.pl crash in https://github.com/CAcertOrg/cacert-devel/pull/31

Issue History

Date Modified Username Field Change
2009-08-15 13:18 nijel New Issue
2012-12-22 20:13 Werner Dworak Relationship added related to 0000770
2012-12-22 20:16 Werner Dworak Relationship added related to 0001097
2015-08-29 20:51 alkas Note Added: 0005460
2015-08-29 20:51 alkas Steps to Reproduce Updated
2015-08-29 20:51 alkas Additional Information Updated
2015-08-30 05:58 alkas Tag Attached: certificates
2015-08-30 05:58 alkas Tag Attached: diacritic
2015-08-30 05:58 alkas Tag Attached: names
2015-08-30 06:34 alkas Additional Information Updated
2015-08-30 06:35 alkas Additional Information Updated
2015-08-30 06:39 alkas Description Updated
2021-08-06 16:28 alkas Note Added: 0006055
2021-08-07 10:54 jandd Note Added: 0006061
2021-08-07 10:55 jandd Note Edited: 0006061
2021-08-07 10:55 jandd Tag Attached: CATS
2021-08-07 10:55 jandd Tag Attached: unicode
2021-08-07 10:55 jandd Tag Attached: utf-8
2021-08-07 10:55 jandd Tag Attached: latin1
2021-08-07 10:55 jandd Tag Attached: iso-8559-1
2021-08-07 10:55 jandd Tag Attached: charset
2021-08-25 13:28 Golffies Priority normal => urgent
2021-08-25 13:28 Golffies Severity minor => crash
2021-08-25 13:28 Golffies Note Added: 0006075
2021-08-25 13:34 bdmc Relationship added related to 0000475
2021-08-25 13:34 bdmc Relationship added related to 0000610
2021-08-25 13:34 bdmc Relationship added related to 0000869
2021-08-25 13:35 bdmc Relationship added related to 0000932
2021-08-25 13:35 bdmc Relationship added related to 0000979
2021-08-25 13:36 bdmc Relationship added related to 0000991
2021-08-25 13:36 bdmc Relationship added related to 0001025
2021-08-25 13:37 bdmc Relationship added related to 0001039
2021-08-25 13:37 bdmc Relationship added related to 0001054
2021-08-25 13:37 bdmc Relationship added related to 0001311
2021-08-25 13:38 bdmc Relationship added related to 0001354
2021-08-25 13:38 bdmc Relationship added related to 0001389
2021-08-25 13:38 bdmc Relationship added related to 0001398
2021-08-25 13:39 bdmc Relationship added related to 0001401
2021-08-25 13:39 bdmc Relationship added related to 0001402
2021-08-25 13:39 bdmc Relationship added related to 0001419
2021-08-25 13:40 bdmc Relationship added related to 0001441
2021-08-25 13:41 bdmc Relationship added related to 0001461
2021-08-25 16:17 alkas Note Added: 0006076
2021-08-25 16:48 jandd Note Added: 0006077
2021-08-25 20:23 egal Note Added: 0006078
2021-08-26 15:29 jandd Note Added: 0006083