View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0001001||Main CAcert Website||misc||public||2011-12-14 20:57||2012-12-30 08:22|
|Platform||Main CAcert Website||OS||N/A|
|Summary||0001001: Need a way to set up redundant OCSP responders|
|Description||Some organisations have offered to host an OCSP server on their infrastructure to overcome some stability problems. The problem is that we can't give them an OCSP cert: they could then produce valid responses for any serial number, even one that is not valid or is revoked in our system.|
A possibility would be to build some kind of caching mechanism that doesn't need an OCSP cert because it just gets the responses from the main server.
Unfortunately there seems to be no existing software to do this, so we will probably have to implement it ourselves.
|Additional Information||Possible architecture:|
- The master OCSP server runs a daemon which sends signed OCSP responses for all serial numbers valid in the system to known slave servers
- If asked for a known serial number, slaves answer with the cached response
- If asked for an unknown serial number, slaves ask the master
* If the serial is not valid, the master answers with a signed response; slaves cache that response for a limited time (maybe even more sophisticated: cache for a long time if revoked, only for a short time if merely unknown) and send the response to the client
* If the serial is valid, the slave caches that response as usual and sends the response to the client
* If the master is unreachable or indicates failure, the slave indicates failure to the client
There is also an IETF draft, but it expired before it made it to standardisation: https://tools.ietf.org/html/draft-ietf-pkix-ocsp-caching-00
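The slave behaviour in the list above could be sketched roughly as follows. This is only an illustration, not an existing implementation: the class and function names, the cache lifetimes and the `ask_master` callback are all assumptions, and the signed response is treated as an opaque blob that the slave never produces itself.

```python
import time
from enum import Enum
from typing import NamedTuple


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"


class SignedResponse(NamedTuple):
    serial: int
    status: Status
    blob: bytes  # opaque bytes signed by the master; the slave holds no OCSP key


# Hypothetical cache lifetimes in seconds: long for revoked, short for unknown.
TTL = {Status.GOOD: 24 * 3600, Status.REVOKED: 7 * 24 * 3600, Status.UNKNOWN: 300}


class MasterUnavailable(Exception):
    """Raised by ask_master when the master is unreachable or reports failure."""


class SlaveCache:
    def __init__(self, ask_master, clock=time.time):
        self._ask_master = ask_master  # callable: serial -> SignedResponse
        self._clock = clock
        self._entries = {}  # serial -> (expiry timestamp, SignedResponse)

    def push(self, response):
        """Store a response pushed by the master (or fetched on demand)."""
        expiry = self._clock() + TTL[response.status]
        self._entries[response.serial] = (expiry, response)

    def lookup(self, serial):
        """Answer from cache if possible, otherwise ask the master.

        MasterUnavailable propagates to the caller, which should then
        indicate failure to the client instead of guessing a status.
        """
        entry = self._entries.get(serial)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        response = self._ask_master(serial)
        self.push(response)
        return response
```

The point of the design is that the slave only ever relays bytes signed by the master, so a compromised slave can withhold or replay responses but cannot forge new ones.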
|Tags||No tags attached.|
The problem that I see with this idea is the traffic it takes. At the moment, CAcert has 700 thousand issued certificates, so we can approximate with 1 million certs. An OCSP response takes approximately 2 KB, so we have 2 GB of traffic that we would want to distribute regularly (daily? hourly? in case of emergency?) to the OCSP caches. If we do it daily, then we have about 60 GB of traffic per month per OCSP responder. I think having "trusted" OCSP servers, where CAcert is sure that they are operated properly, is a better way than running untrusted OCSP caches.
My suggestion is to set up several OCSP servers in various trusted places, monitor them both automatically and manually to check that they behave properly, and (semi-)automatically pull their plug through DNS if they misbehave.
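The arithmetic behind these figures can be checked with a few lines. The sketch assumes decimal units and reads the 60 GB figure as a monthly total of 30 daily pushes; both are assumptions, as are the constant names.

```python
RESPONSE_SIZE_KB = 2      # approximate size of one OCSP response (from the note)
CERT_COUNT = 1_000_000    # issued certificates, rounded up from 700 thousand
DAYS_PER_MONTH = 30       # assumption used for the monthly total


def full_push_gb(cert_count, response_kb=RESPONSE_SIZE_KB):
    """Size of one full distribution in GB (decimal: 1 GB = 1,000,000 KB)."""
    return cert_count * response_kb / 1_000_000


per_push_gb = full_push_gb(CERT_COUNT)
per_month_gb = per_push_gb * DAYS_PER_MONTH

print(f"one full push: {per_push_gb} GB, daily pushes per month: {per_month_gb} GB")
```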
We only need to distribute OCSP responses for _valid_ certs, not for all issued certs. That is 82,807 certs at the moment (one order of magnitude less) and the number is likely to stay more or less the same (it grows with the number of users, not with time). There are also some ways to make distribution more efficient (e.g. compressing multiple responses together in one package).
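As an illustration of the packaging idea: at roughly 2 KB per response, 82,807 responses come to about 166 MB per full push before any compression. A minimal sketch of bundling several responses into one compressed package follows. The length-prefixed framing and the function names are assumptions made up for this example, not an existing format, and since the signatures inside real responses are essentially random, how much compression actually saves is not something the sketch can show.

```python
import struct
import zlib


def pack_responses(responses):
    """Frame each response with a 4-byte big-endian length, then compress."""
    framed = b"".join(struct.pack(">I", len(r)) + r for r in responses)
    return zlib.compress(framed, 9)


def unpack_responses(package):
    """Reverse pack_responses and return the list of response blobs."""
    framed = zlib.decompress(package)
    responses, pos = [], 0
    while pos < len(framed):
        (length,) = struct.unpack_from(">I", framed, pos)
        pos += 4
        responses.append(framed[pos:pos + length])
        pos += length
    return responses
```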
AFAIK the responder we have in mind right now even resides in the same data center as our servers. I don't know about our contracts but this might be even cheaper, at least faster.
Having trusted sites will need serious consideration. OCSP certs cannot be revoked in practice, and DNS changes might take a while to propagate. Apart from that, DNS was not built for high-security applications, as some vulnerabilities have shown in the past. DNSSEC might improve that, but it's not widely implemented yet and the issue of stub resolvers is not handled.
||P.S.: ideally we wouldn't have to do caching on our side at all and every server would support OCSP stapling, but I guess that's not going to happen anytime soon.|
Having several OCSP responders in the same datacenter does not help much. We need OCSP responders in different places, at least in different countries, preferably on different continents.
If one datacenter or even a whole country has an outage, CAcert would jeopardize all CAcert certificate users on the rest of the planet who depend on OCSP being available, by not having enough OCSP servers in other areas. CAcert should provide a highly available OCSP infrastructure on a global scale.
The main problems today are due to crashes of the responder software or machine. In these concrete cases a second responder, even in the same data center, would improve the situation a lot.
For disaster recovery and latency it might indeed be a good idea to spread them around the world, but this is more a "what happens if" matter than a "what is most annoying right now" one. For a global spread we could try to scale up the design proposed in the description; we would then likely need some additional efficiency improvements (e.g. only distribute responses for certs that had at least one request in some fixed time span, to only cover certs in active use). Alternatively we could indeed operate multiple full-blown OCSP responders, but this would probably be quite costly because the environment needs to be secured.
The question is how these full-blown responders get to know which certs are valid. If this required an active connection to the database, it would be as bad as a local solution. If they take a CRL, there are problems if someone just issues certs above our current serial number range, because the responders can't know what the highest valid number is, and so on (this is what the current responder does, IIRC).
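The "only distribute certs in active use" improvement could look roughly like the following sketch. The class name, the in-memory dictionary and the explicit `now` argument are illustrative assumptions; a real master would need persistent request statistics.

```python
class RecentRequests:
    """Track when each serial was last requested, to limit what gets pushed."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._last_seen = {}  # serial -> timestamp of the most recent request

    def record(self, serial, now):
        """Note that a client asked about this serial at time `now`."""
        self._last_seen[serial] = now

    def active_serials(self, now):
        """Serials requested at least once within the window ending at `now`."""
        cutoff = now - self.window
        return {s for s, seen in self._last_seen.items() if seen >= cutoff}
```

Serials outside the window would simply fall back to the "unknown to the slave, ask the master" path from the description.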
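The concern about CRL-only responders can be made concrete with a small sketch. It is not the code of the current responder; the function names and the status strings are assumptions chosen to show the difference between knowing only what is revoked and also knowing what was issued.

```python
def status_from_crl(serial, revoked):
    """Responder that only has a CRL: anything not listed looks fine."""
    return "revoked" if serial in revoked else "good"


def status_from_issued_list(serial, revoked, issued):
    """Responder that also knows which serials were actually issued."""
    if serial in revoked:
        return "revoked"
    return "good" if serial in issued else "unknown"


revoked = {3}
issued = {1, 2, 3}
forged_serial = 999  # a cert issued by an attacker above our serial range

print(status_from_crl(forged_serial, revoked))
print(status_from_issued_list(forged_serial, revoked, issued))
```

With only the CRL, the forged serial is reported as good; with the issued list it is reported as unknown.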
One possible issue with the caching approach I have not looked into yet: the OCSP request allows for an extension which includes a nonce with the request and the response needs to also contain this nonce. I don't know how many clients actually do this as then all sorts of caching (including OCSP stapling) would not work any more. If it's a high percentage we might need to go for the full-blown OCSP responder approach anyway.
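To find out how much the nonce matters, a slave could count nonce-carrying requests and treat them as uncacheable. The sketch below takes the conservative route of never answering a nonce request from cache; the `Request` type and the function names are assumptions, and parsing of real OCSP requests is left out.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    serial: int
    nonce: Optional[bytes] = None  # None means no nonce extension was sent


def can_serve_from_cache(request, cached_serials):
    """A cached response cannot echo a fresh nonce, so only nonce-free hits qualify."""
    return request.nonce is None and request.serial in cached_serials


def nonce_share(requests):
    """Fraction of requests that carry a nonce (0.0 for an empty sample)."""
    requests = list(requests)
    if not requests:
        return 0.0
    return sum(r.nonce is not None for r in requests) / len(requests)
```

If `nonce_share` over a representative sample turns out to be high, most requests would bypass the cache and the full-blown responder approach becomes the more sensible one.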
||Maybe we could also use standard P2P techniques to distribute the traffic between the caching OCSP responders|
Maybe http://www.ejbca.org/installation-ocsp.html shows a possible solution. Every OCSP responder should have its own certificate. If something goes wrong at one responder, its certificate will be revoked. Of course, if this happens it is a very uncomfortable situation, so protection of the OCSP responder's private key is really an issue.
Each responder has its own database, which is of course synced with the master server, so the responder works even if the master is temporarily down.
|2011-12-14 20:57||NEOatNHNG||New Issue|
|2011-12-17 17:43||Sourcerer||Note Added: 0002746|
|2011-12-17 18:59||NEOatNHNG||Note Added: 0002747|
|2011-12-17 19:00||NEOatNHNG||Note Added: 0002748|
|2011-12-18 01:33||Sourcerer||Note Added: 0002749|
|2011-12-18 02:27||NEOatNHNG||Note Added: 0002750|
|2012-02-08 23:02||NEOatNHNG||Note Added: 0002828|
|2012-12-02 16:38||INOPIAE||Relationship added||related to 0001119|
|2012-12-30 08:22||CookieEater||Note Added: 0003581|