View Issue Details

IDProjectCategoryView StatusLast Update
0000793Main CAcert Websitesource codepublic2013-01-15 14:26
Reporteredgarwahn Assigned ToSourcerer  
PrioritynormalSeverityfeatureReproducibilityN/A
Status closedResolutionfixed 
Fixed in Version2009 Q4 
Summary0000793: stats.php consumes big time to finish - add caching feature
Descriptionstats.php gets some statistical data from the tables and produces a nice overview of cacerts "performance" over the last periods of time.
Sadly the page is directly accessible from the net AND it consumes quite a lot of time to finish. Thus its a perfect point to an DOS attack against CACert.

Solution is a small patch which utilizes a caching table in the database.

Cache TTL is currently set to 60 seconds to make it more "testable", see the define directly at the beginning of the file.

Caching table is created on the fly, fallback to get from live if cache fails.

Drawback: statistical data is not deleted from database at all, thus quite a amount of data will aggregate (not a problem, data can be deleted, we just need to add an housekeeper which does the job).
TagsNo tags attached.
Attached Files
stats.php (13,014 bytes)
stats_ph.php (12,456 bytes)
Reviewed by
Test Instructions

Relationships

child of 0000730 closedSourcerer statistics pages are slow 

Activities

Sourcerer

2009-11-25 12:29

administrator   ~0001517

Can you also provide a SQL script that creates the table, please?

edgarwahn

2009-11-25 12:33

developer   ~0001518

Sorry, forgot to mention, its in the source. Table is automatically created if the script gets the "no such table" error message from mysql.

[...]
elseif (mysql_errno() > 0) {
    if (mysql_errno() == 1146) {
    $sql = 'create table statscache (timestamp bigint not null primary key, cache text not null) engine=myisam';
    $res = mysql_query($sql);
    }
}

aphexer

2009-11-25 18:47

reporter   ~0001519

Running the full stats query every minute is still overkill. Increasing the TTL to 1 day seems more reasonable to me, and then this patch would be a great improvement over the current system.

If the choice was either this patch or keeping the system as is, then definitely please apply this patch (but with a higher TTL).

However a few remarks:

-Your caching system can only be used for the stats.php. There are a lot of other pages which are quite slow. Hence a general caching system that can be used for anything would be preferred in my opinion.

-No matter at which value you set the TTL you still execute the *full* heavy query every $TTL time. If there were 1000 assurers on Jan 1, then there is no reason to go calculate this value again each day, or even each minute as you proposed. You just remember this value semi-permanently and just calculate the diff between Jan 1 and today, which is much more efficient.

-You need to make quite some changes to the stats.php code itself. It would be cooler if you could just say: "this query needs to be cached" but not having to change anything else.

So please take a look at my patches too: http://aphexer.ulyssis.org/cacert/

The caching system I implemented there takes care of the above 3 issues. (especially patch 03a_cache.php-create.patch)

Please don't be offended, I'm only opening discussion as to what approach is the most appropriate here. At this point we have 2 different ways of solving things, one more complex than the other, each having it's timeframe of being able to get implemented on the real system :)

Alexander

Sourcerer

2009-11-25 21:24

administrator   ~0001520

stats_ph.php is an improved version that only does one MySQL query in case of a cache-hit instead of 2 queries

edgarwahn

2009-11-27 08:34

developer   ~0001522

Seems a quite intense discussion on caching started, good :)

Alexander, I put my patch together as a "quick hack", knowing that the whole system is to be replaced in the (near?) future. I did not intend to introduce a generic caching mechanism.

By adding some more fields to the cache table, you could start a generic cache, you would need to add an unique identifier to all caching requests and then just get only these data from the cache. I.e. getData('stats.php') vs. getData('wot.php?id=10').

To add detail caching (only get from DB whats really needed, no need to update historical data at all) you'd need to store several distinct bits of data in the cache for stats.php and then get the required bits back.

I.e. getData('stats.php', 'growth_last12', '2009-11');

Its all possible, but do we really need this abstract level of caching? Plus as the code is quite mixed (means php and layout are in one place) adding caching would either make these scripts even harder to maintain OR we would need to restructure all source (see my patched stats.php, 1st code, then layout).

How much optimization / effort do we want to put in the current system?

One last note on caching time: as I wrote in my first post, I intendedly set this to one minute for testing. I think it can well be set to 6 hours or greater.

edgarwahn

2009-11-27 13:24

developer   ~0001523

One more note: we have set up an svn system that should help developers to maintain their patches once they are not submitted to CACert or not accepted by CACert. Since I opened the issue we already seem to have four different versions of the file, I'd suggest we start using an version control system to coordinate changes.

This system is for development and testing only, once patches are approved they are magically migrated to the secure cvs. The published csv tarball will be put into the svn system on a regular basis.

If you agree and would like to use it, drop me a note to my mail address mw at it-sls dot de.

aphexer

2009-11-30 12:46

reporter   ~0001525

Hello,

You don't want to specify getData('stats.php', 'other parameter', 'yet another parameter').

The idea I had is to do this: getData($query);

Then getData checks if there exists a record in the cache with key hash($query). If that record was inserted < TTL seconds ago, use it. Otherwise execute the query, and update the cache.

What about that? That's only a minor change to your suggestion but it makes the caching system transparant.

It then becomes really really easy to use it. Instead of <?=number_format(tc(mysql_extended_query("select count(`id`) as `count` from `notary`")))?> you just write:

<?=number_format(tc(getCachedValue("select count(`id`) as `count` from `notary`")))?>

Which is almost exactly the same! Once you have this caching system it'll be very easy to use it in many places.

Then I also suggest to put the caching related functions in a separate file (not in stats.php).

But I have no problem with just taking your patch as is and when the need to speedup another place with some caching arises, then have this discussion again.

Sourcerer

2009-12-17 10:49

administrator   ~0001537

I have implemented the patch now (due to high server load that had caused problems), I have set the timeout to 10 hours (feel free to complain if you need fresher data, we can do it faster if there is demand for it).
The caching successfully helped to lower the system load considerably, and to make the whole system faster and more responsive. Thanks a lot for all your help!

Sourcerer

2009-12-17 10:54

administrator   ~0001538

I would suggest that we simply add an additional row "type" to the caching table, as soon as we need a more generalized caching system.

Issue History

Date Modified Username Field Change
2009-11-25 12:18 edgarwahn New Issue
2009-11-25 12:18 edgarwahn File Added: stats.php
2009-11-25 12:19 edgarwahn Relationship added child of 0000730
2009-11-25 12:29 Sourcerer Note Added: 0001517
2009-11-25 12:33 edgarwahn Note Added: 0001518
2009-11-25 18:47 aphexer Note Added: 0001519
2009-11-25 21:23 Sourcerer File Added: stats_ph.php
2009-11-25 21:24 Sourcerer Note Added: 0001520
2009-11-27 08:34 edgarwahn Note Added: 0001522
2009-11-27 13:24 edgarwahn Note Added: 0001523
2009-11-30 12:46 aphexer Note Added: 0001525
2009-12-17 10:49 Sourcerer Note Added: 0001537
2009-12-17 10:49 Sourcerer Status new => solved?
2009-12-17 10:54 Sourcerer Note Added: 0001538
2009-12-18 12:58 Sourcerer Status solved? => closed
2009-12-18 12:58 Sourcerer Fixed in Version => production
2009-12-18 12:58 Sourcerer Resolution open => fixed
2009-12-18 12:58 Sourcerer Assigned To => Sourcerer
2013-01-15 14:26 Werner Dworak Fixed in Version => 2009 Q4