View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0000769||Main CAcert Website||certificate issuing||public||2009-08-15 13:18||2021-08-26 15:29|
|Priority||urgent||Severity||crash||Reproducibility||have not tried|
|Summary||0000769: Client certificate broken with unicode|
|Description||While generating a client certificate, I got a certificate with the name "Michal &". I guess something went wrong when processing the Unicode name "Michal ?iha?".|
[The surname could be for example "Čihař".]
|Steps To Reproduce||Tried to generate a client cert on the testsys. My name contains "š" (s-caron). It is taken from my account, where I suppose it is OK (it is displayed correctly in my account). However, in the created client cert it is encoded as a single CP-1250 byte. As the client cert is probably UTF-8 encoded (two bytes for this diacritic), the CP-1250 encoding is wrong.|
|Additional Information||This error occurs in Win/IE, Win/Chrome, and also Linux Ubuntu/Mozilla Firefox and Linux OpenSuSE/Mozilla Firefox.|
This error can also be seen at the beginning of e-mail notices about the end of a cert's validity, e.g. "Hi Ale�," (etc.); usually I can only read "Hi Ale,". How the last character of my name is shown depends on the mail client. Here the hex representation was FDFF, meaning the information was lost. Other clients show 9A00 (CP-1250) or C2006101. The correct Unicode representation is 6101 only (U+0161 as little-endian UTF-16). The correct UTF-8 representation is C5A1. The representation in the cert is 9A (CP-1250). It would be correct if this text in the cert were preceded by an encoding type and codepage number (I do not know if it is).
|Tags||CATS, certificates, charset, diacritic, iso-8559-1, latin1, names, unicode, utf-8|
|related to||0000475||needs work||Ted||CATS.cacert.org||Improvements on some tables|
|related to||0000610||needs work||CATS.cacert.org||use utf-8 as encoding|
|related to||0000869||new||Wiki||Wrong characters while creating the pdf for COAP|
|related to||0000932||needs testing||BenBE||test.cacert.org||Get UTF8 failured email subject|
|related to||0000979||needs feedback||BenBE||Main CAcert Website||HTML <meta> tag states charset=utf-8 on some pages when it is not|
|related to||0000991||needs work||NEOatNHNG||Main CAcert Website||commonName is wrongly burned on CSR|
|related to||0001025||needs work||NEOatNHNG||Main CAcert Website||Domain Dispute strange behaviour / Domain Dispute issue|
|related to||0001039||needs review & testing||Main CAcert Website||Cyber peretas nomor 085823771018|
|related to||0001054||needs review & testing||Ted||Main CAcert Website||Review the code regarding the new point calculation in ./includes/general.php|
|related to||0001311||new||Main CAcert Website||The check about email during email dispute works incorrect|
|related to||0001354||needs review||BenBE||Main CAcert Website||Problems with diacretics and non-latin1 characters|
|related to||0001389||solved?||BenBE||Main CAcert Website||Wrong encoding for mails sent with function sendmail()|
|related to||0001398||needs work||Ted||CATS.cacert.org||User Interface Translation to Czech|
|related to||0001401||new||BenBE||Main CAcert Website||Names containing non-ascii characters are displayed incorrectly on the website|
|related to||0001402||new||CATS.cacert.org||A comment on deployed Czech translation of the "Assurer's Challenge" test|
|related to||0001419||new||Main CAcert Website||Issue with displaying "é" as é in "Client Certificates - View all certificates"|
|related to||0001441||solved?||wytze||test.cacert.org||umlauts are not stored/displayed correctly in Testsystem|
|related to||0001461||new||Blog||Hatchek, etc. not displayed|
|related to||0000770||new||Main CAcert Website||PGP keys signing fails with unicode|
|related to||0001097||closed||NEOatNHNG||Main CAcert Website||Special characters which have no HTML-entities are not properly escaped|
This error is very unpleasant, as it concerns the basic info - user's name.
According to the certs containing my name, I am not Aleš anymore, but only Ale (beer).
The certificate interpreter assumes that the CN is encoded as UTF-8. In fact, the name is encoded in CP-1250 (Latin2). Where does it come from? If you make a CSR with CN=name, then the name is properly encoded in UTF-8. So there are the following 2 possibilities:
1. The name is taken from the user's account and possibly converted into its CP-1250 representation. But the document CPS (COD6) says in 3.1.1 that the CN (and OU) do NOT come from the member's account!
2. The name is taken from the CSR and then converted into the member's national code page. This behaviour possibly remains from the days before UTF-8 became common.
Conclusion: Probably an unnecessary conversion is performed.
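The byte values quoted in the report can be reproduced with a few lines of Python. This is only an illustration of the encodings involved, not code from the CAcert software; the last chain (`double`) is just one possible way to arrive at C2006101 and is an assumption about what a mail client might do.

```python
def hexbytes(data: bytes) -> str:
    # Uppercase hex without separators, matching the notation in the report.
    return data.hex().upper()

letter = "š"  # U+0161, LATIN SMALL LETTER S WITH CARON

utf8 = letter.encode("utf-8")        # C5A1
cp1250 = letter.encode("cp1250")     # 9A
utf16le = letter.encode("utf-16-le") # 6101

# A lone 0x9A byte is not valid UTF-8, so a UTF-8 reader substitutes U+FFFD.
lost = cp1250.decode("utf-8", errors="replace")

# Read as ISO-8859-1, the same byte becomes the control character U+009A.
as_latin1 = cp1250.decode("latin-1")

# Assumed chain: U+009A written as UTF-8, then read again as CP-1250.
double = as_latin1.encode("utf-8").decode("cp1250")

for label, data in [
    ("UTF-8", utf8),
    ("CP-1250", cp1250),
    ("UTF-16LE", utf16le),
    ("0x9A read as UTF-8", lost.encode("utf-16-le")),      # FDFF
    ("0x9A read as Latin1", as_latin1.encode("utf-16-le")), # 9A00
    ("double conversion", double.encode("utf-16-le")),      # C2006101
]:
    print(f"{label:22} {hexbytes(data)}")
```

The FDFF case shows why the information is lost: once the replacement character has been substituted, the original byte cannot be recovered.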
The web site software has a lot of hardcoded latin1 assumptions; even the database is encoded in latin1/iso-8859-1. Database migrations and a full retest of registration, assurance and certificate issuing are required to fix this. I need to check whether the signer also makes assumptions about the charset. I agree that this is unacceptable in 2021 and that the whole site and software should use UTF-8/Unicode.
At least CATS is affected by this too and uses Latin1 instead of UTF-8.
On the 2nd of August, a certificate requested by Wacław triggered the bug. Dirk and Michaela had to go to Ede and reset the signer script manually.
A little research has discovered a severe bug.
1. Names and other texts containing characters outside the ASCII table (e.g. in ANSI/ISO code pages) are not encoded/decoded when entering or leaving accounts. BUT the standard representation (at least in Windows) has changed over the years since CAcert was founded. THIS MEANS that the representation of names, which CAcert considers very important, has also changed. In my case, when I created my account in 2014, my Windows sent my name in ANSI/ISO. Now it sends UTF-8. When creating a certificate, CAcert's software does no conversion. Certificates are supposed to have text fields in UTF-8. So old accounts could have ANOTHER representation of names than new ones! (Provided that accounts were/are created from Windows.)
2. The CAcert software has/had no tools to detect the encoding and convert it to UTF-8.
3. Even after repairing the signer software, these names/dates/other texts (security questions) will still be affected. In fact, there are users complaining that they are unable to answer their security questions.
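A tool of the kind missing in point 2 could start from a simple heuristic: bytes that decode as strict UTF-8 are kept, everything else is treated as a legacy single-byte encoding. The sketch below is hypothetical (the function name `to_utf8` and the default `cp1250` are my assumptions), and it cannot tell CP-1250 from ISO-8859-1, so the legacy encoding would have to be guessed per account.

```python
def to_utf8(raw: bytes, legacy_encoding: str = "cp1250") -> bytes:
    """Return raw re-encoded as UTF-8, guessing between UTF-8 and one legacy encoding."""
    try:
        # Strict decoding: legacy bytes such as a lone 0x9A raise an error here.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fallback: interpret the bytes in the assumed legacy code page.
        # This can still raise for bytes that are undefined in that code page.
        text = raw.decode(legacy_encoding)
    return text.encode("utf-8")


old_account = b"Ale\x9a"               # name stored as CP-1250
new_account = "Aleš".encode("utf-8")   # name stored as UTF-8

print(to_utf8(old_account) == to_utf8(new_account))
```

Short legacy strings can accidentally be valid UTF-8, so any real migration would need manual review of doubtful cases.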
some data points from our research:
- the web application only supports ISO-8859-1 and enforces it on incoming requests; everything else is encoded as HTML entities and written to the database in that encoded representation
- the database stores data with latin1_swedish_ci, the ancient MySQL default collation
- the client.pl that sends data to the signer takes the HTML-encoded form of the SubjectDN and SubjectAlternativeName fields literally and sends it to the signer
- server.pl on the signer has an error condition that fails on &#<number>; encoded entities in the SubjectDN. This crashes server.pl, so no response is sent to the signer client and client.pl runs into a timeout. The request with the broken SubjectDN keeps being sent to the signer and keeps crashing server.pl until someone intervenes manually
- the openssl configuration on the signer (ancient Debian 5 openssl) takes the ISO-8859-1 data literally but marks it as ASN.1 T61String, which is wrong. No re-encoding is done, and therefore ISO-8859-1 characters outside the ASCII character set, like 'â', are not encoded correctly and cannot be displayed by conforming X.509/ASN.1 implementations
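To make the first and fourth data points concrete, here is a small Python sketch of what "ISO-8859-1 plus HTML entities" looks like for names from this issue. It only mimics the described behaviour with Python's `xmlcharrefreplace` error handler; it is not the web application's actual code.

```python
def as_stored(name: str) -> bytes:
    # Characters available in ISO-8859-1 become single bytes; everything
    # else becomes a numeric entity of the form &#<number>;
    return name.encode("latin-1", errors="xmlcharrefreplace")


for name in ["René", "Wacław", "Čihař"]:
    print(name, "->", as_stored(name))

# René   -> b'Ren\xe9'            (fits into ISO-8859-1)
# Wacław -> b'Wac&#322;aw'        (ł is U+0142 = 322)
# Čihař  -> b'&#268;iha&#345;'    (Č is U+010C = 268, ř is U+0159 = 345)
```

The last two values are the kind of SubjectDN content that server.pl fails on.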
So we have an encoding problem on all these levels. A proper solution would be to reimplement everything to use UTF-8 and implement data migration procedures at least for all the values in the subject column(s) of the relevant database table(s).
To protect the signer's server.pl from future crashes, we need to perform the same checks of the subject string in client.pl and not send requests to the signer that would crash server.pl. This means:
- that we can not issue certificates for anybody who has a character that is not in ISO-8859-1 in his name (that will be encoded as numbered HTML entity by the web application)
- that certificates that contain a subject DN that is valid in ISO-8859-1, but contains characters that have a different meaning in ASN.1 T61String will still be issued but will have a wrong encoding for the CN part of the Subject DN
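The client-side check described above could look roughly like the following. This is a Python sketch for illustration only; client.pl is Perl, and the function name and the regular expression are my assumptions, not the code of the actual mitigation. Matching the hexadecimal form `&#x...;` is an extra precaution that the notes above do not mention.

```python
import re

# Numeric HTML entities: decimal (&#322;) and, as a precaution, hex (&#x142;).
NUMERIC_ENTITY = re.compile(r"&#(?:[0-9]+|[xX][0-9A-Fa-f]+);")


def is_safe_for_signer(subject_dn: str) -> bool:
    """Return False if the subject contains a numeric HTML entity."""
    return NUMERIC_ENTITY.search(subject_dn) is None


print(is_safe_for_signer("/CN=Ren\xe9"))       # plain ISO-8859-1 name
print(is_safe_for_signer("/CN=Wac&#322;aw"))   # name outside ISO-8859-1
```

Requests for which the check fails would be rejected with an error in the web application instead of being queued for the signer.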
A UTF-8 implementation should be done in several individually testable steps:
1.) reimplement the signer software (currently server.pl) in a language that supports Unicode properly, still speaking the old protocol but converting incoming Subject DN information to UTF-8 and creating certificates with ASN.1 UTF8String encoding for the Subject DN fields
2.) reimplement the signer client (currently client.pl) to do the conversion, and implement a proper protocol between signer client and server that can properly contain UTF-8 characters, has proper framing, error signaling and so on. I have ideas on how to do this but still need to write the specification.
3.) implement a set of automated tests with representative data, especially for non-Western-European / non-English countries (everything not covered by 7-bit ASCII), to have a set of regression tests for step 4
4.) reimplement the web application and implement data migration code to transform the database to UTF-8/Unicode (in MySQL the full Unicode character set is utf8mb4, which uses up to 4 bytes per character). This only makes sense with a proper set of tests from 3.)
I know this is a huge undertaking but from my point of view this is necessary if we want to stay relevant, gain new community members or implement any new use cases based on our certificates (i.e. OAuth2/OpenID connect federated login, or any other new standard that is using Unicode/UTF-8).
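The conversion mentioned in step 1 (and reusable for the data migration in step 4) could be sketched as below. This is a hypothetical Python illustration, assuming the incoming data is ISO-8859-1 bytes with HTML entities for everything else, as described in the data points above; `subject_to_utf8` is a made-up name.

```python
import html


def subject_to_utf8(raw: bytes) -> bytes:
    """Convert an ISO-8859-1 subject with HTML entities to UTF-8 bytes."""
    # Every byte value is valid ISO-8859-1, so this decode cannot fail.
    text = raw.decode("latin-1")
    # Resolve entities such as &#322; to the real character.
    # Note: html.unescape also resolves named entities like &amp;.
    text = html.unescape(text)
    return text.encode("utf-8")


print(subject_to_utf8(b"CN=Wac&#322;aw").decode("utf-8"))
print(subject_to_utf8(b"CN=Ren\xe9").decode("utf-8"))
```

Values that were stored in another code page (for example CP-1250 from old accounts) would not be repaired by this and would need separate detection.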
Following: https://bugs.cacert.org/view.php?id=769#c6075 by Golffies
This issue triggered a visit to the signer for Michaela and me, but (after investigation) it was solved remotely on the webdb server.
This means: whenever it happens again, a visit to the datacenter is not necessary, but manual remote intervention by the critical team on webdb is.
I implemented a mitigation for the server.pl crash in https://github.com/CAcertOrg/cacert-devel/pull/31
|2009-08-15 13:18||nijel||New Issue|
|2012-12-22 20:13||Werner Dworak||Relationship added||related to 0000770|
|2012-12-22 20:16||Werner Dworak||Relationship added||related to 0001097|
|2015-08-29 20:51||alkas||Note Added: 0005460|
|2015-08-29 20:51||alkas||Steps to Reproduce Updated|
|2015-08-29 20:51||alkas||Additional Information Updated|
|2015-08-30 05:58||alkas||Tag Attached: certificates|
|2015-08-30 05:58||alkas||Tag Attached: diacritic|
|2015-08-30 05:58||alkas||Tag Attached: names|
|2015-08-30 06:34||alkas||Additional Information Updated|
|2015-08-30 06:35||alkas||Additional Information Updated|
|2015-08-30 06:39||alkas||Description Updated|
|2021-08-06 16:28||alkas||Note Added: 0006055|
|2021-08-07 10:54||jandd||Note Added: 0006061|
|2021-08-07 10:55||jandd||Note Edited: 0006061|
|2021-08-07 10:55||jandd||Tag Attached: CATS|
|2021-08-07 10:55||jandd||Tag Attached: unicode|
|2021-08-07 10:55||jandd||Tag Attached: utf-8|
|2021-08-07 10:55||jandd||Tag Attached: latin1|
|2021-08-07 10:55||jandd||Tag Attached: iso-8559-1|
|2021-08-07 10:55||jandd||Tag Attached: charset|
|2021-08-25 13:28||Golffies||Priority||normal => urgent|
|2021-08-25 13:28||Golffies||Severity||minor => crash|
|2021-08-25 13:28||Golffies||Note Added: 0006075|
|2021-08-25 13:34||bdmc||Relationship added||related to 0000475|
|2021-08-25 13:34||bdmc||Relationship added||related to 0000610|
|2021-08-25 13:34||bdmc||Relationship added||related to 0000869|
|2021-08-25 13:35||bdmc||Relationship added||related to 0000932|
|2021-08-25 13:35||bdmc||Relationship added||related to 0000979|
|2021-08-25 13:36||bdmc||Relationship added||related to 0000991|
|2021-08-25 13:36||bdmc||Relationship added||related to 0001025|
|2021-08-25 13:37||bdmc||Relationship added||related to 0001039|
|2021-08-25 13:37||bdmc||Relationship added||related to 0001054|
|2021-08-25 13:37||bdmc||Relationship added||related to 0001311|
|2021-08-25 13:38||bdmc||Relationship added||related to 0001354|
|2021-08-25 13:38||bdmc||Relationship added||related to 0001389|
|2021-08-25 13:38||bdmc||Relationship added||related to 0001398|
|2021-08-25 13:39||bdmc||Relationship added||related to 0001401|
|2021-08-25 13:39||bdmc||Relationship added||related to 0001402|
|2021-08-25 13:39||bdmc||Relationship added||related to 0001419|
|2021-08-25 13:40||bdmc||Relationship added||related to 0001441|
|2021-08-25 13:41||bdmc||Relationship added||related to 0001461|
|2021-08-25 16:17||alkas||Note Added: 0006076|
|2021-08-25 16:48||jandd||Note Added: 0006077|
|2021-08-25 20:23||egal||Note Added: 0006078|
|2021-08-26 15:29||jandd||Note Added: 0006083|