View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000793 | Main CAcert Website | source code | public | 2009-11-25 12:18 | 2013-01-15 14:26 |

Field | Value |
---|---|
Reporter | edgarwahn |
Assigned To | Sourcerer |
Priority | normal |
Severity | feature |
Reproducibility | N/A |
Status | closed |
Resolution | fixed |
Fixed in Version | 2009 Q4 |
Summary | 0000793: stats.php consumes big time to finish - add caching feature |
Description | stats.php gathers statistical data from the tables and produces a nice overview of CAcert's "performance" over the last periods of time. Sadly the page is directly accessible from the net AND it takes quite a long time to finish, so it is a perfect target for a DoS attack against CAcert. The solution is a small patch which uses a caching table in the database. The cache TTL is currently set to 60 seconds to make it more "testable"; see the define directly at the beginning of the file. The caching table is created on the fly, with a fallback to the live data if the cache fails. Drawback: cached statistical data is never deleted from the database, so quite an amount of data will accumulate (not a problem, the data can be deleted; we just need to add a housekeeper which does the job). A rough sketch of this flow is shown below the issue details. |
Tags | No tags attached. |
Attached Files | stats.php, stats_ph.php |
Reviewed by | |
Test Instructions | |
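The description above covers the whole mechanism: one cache row per statistics run, a TTL define at the top of the file, on-the-fly table creation, and a fallback to the live queries. A minimal sketch of that flow follows, using the legacy mysql_* API the codebase used at the time; getStats() and computeStatsFromLiveTables() are hypothetical names, and this is not the attached stats.php patch itself.

```php
<?php
// Cache TTL in seconds; the patch keeps it at 60 for easier testing (see description).
define('STATS_CACHE_TTL', 60);

// Hypothetical helper: return the cached statistics if they are fresh enough,
// otherwise recompute them from the live tables and store a new cache row.
function getStats()
{
    $res = mysql_query("select `timestamp`, `cache` from `statscache` ".
                       "order by `timestamp` desc limit 1");
    if ($res && mysql_num_rows($res) > 0) {
        $row = mysql_fetch_assoc($res);
        if ($row['timestamp'] > time() - STATS_CACHE_TTL) {
            return unserialize($row['cache']);          // cache hit
        }
    }

    // Cache miss, stale entry, or missing table: fall back to the live queries.
    $stats = computeStatsFromLiveTables();              // hypothetical; the heavy part
    mysql_query("insert into `statscache` (`timestamp`, `cache`) values (".
                time().", '".mysql_real_escape_string(serialize($stats))."')");
    return $stats;
}
?>
```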
Notes

(0001517) Sourcerer | 2009-11-25 12:29

Can you also provide a SQL script that creates the table, please?
(0001518) edgarwahn | 2009-11-25 12:33

Sorry, forgot to mention, it's in the source. The table is automatically created if the script gets the "no such table" error from MySQL:

```php
[...]
elseif (mysql_errno() > 0) {
    if (mysql_errno() == 1146) {   // 1146 = ER_NO_SUCH_TABLE
        $sql = 'create table statscache (timestamp bigint not null primary key, cache text not null) engine=myisam';
        $res = mysql_query($sql);
    }
}
```
(0001519) aphexer | 2009-11-25 18:47

Running the full stats query every minute is still overkill. Increasing the TTL to 1 day seems more reasonable to me, and then this patch would be a great improvement over the current system. If the choice were either this patch or keeping the system as is, then definitely please apply this patch (but with a higher TTL). However, a few remarks:

- Your caching system can only be used for stats.php. There are a lot of other pages which are quite slow, so a general caching system that can be used for anything would be preferable in my opinion.
- No matter which value you set the TTL to, you still execute the *full* heavy query every $TTL seconds. If there were 1000 assurers on Jan 1, there is no reason to calculate this value again each day, or even each minute as you proposed. You just remember this value semi-permanently and only calculate the diff between Jan 1 and today, which is much more efficient (see the sketch after this note).
- You need to make quite some changes to the stats.php code itself. It would be nicer if you could just say "this query needs to be cached" without having to change anything else.

So please take a look at my patches too: http://aphexer.ulyssis.org/cacert/ The caching system I implemented there takes care of the above three issues (especially patch 03a_cache.php-create.patch).

Please don't be offended, I'm only opening a discussion about which approach is most appropriate here. At this point we have two different ways of solving things, one more complex than the other, each with its own timeframe for being implemented on the real system :)

Alexander
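To illustrate the second remark with a sketch (not code from any of the attached patches; the `when` column and the baseline values are assumptions about the schema): remember an old total once, then only count the rows added since that date.

```php
<?php
// Hypothetical "diff" approach: the baseline was computed once and stored,
// so each request only has to count the assurances made after that date.
$baseline_date  = '2009-01-01';
$baseline_count = 1000;   // placeholder; remembered semi-permanently, e.g. in the cache table

$res = mysql_query("select count(`id`) as `count` from `notary` ".
                   "where `when` >= '".$baseline_date."'");   // assumes a `when` date column
$row = mysql_fetch_assoc($res);
$total_assurances = $baseline_count + intval($row['count']);
?>
```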
(0001520) Sourcerer | 2009-11-25 21:24

stats_ph.php is an improved version that only does one MySQL query in the case of a cache hit, instead of two queries.
(0001522) edgarwahn | 2009-11-27 08:34

Seems quite an intense discussion on caching has started, good :)

Alexander, I put my patch together as a "quick hack", knowing that the whole system is to be replaced in the (near?) future. I did not intend to introduce a generic caching mechanism. By adding some more fields to the cache table you could build a generic cache: you would need to add a unique identifier to all caching requests and then fetch only that data from the cache, i.e. getData('stats.php') vs. getData('wot.php?id=10'). To add detail caching (only get from the DB what is really needed, no need to update historical data at all) you'd need to store several distinct bits of data in the cache for stats.php and then get the required bits back, i.e. getData('stats.php', 'growth_last12', '2009-11');

It's all possible, but do we really need this abstract level of caching? Also, as the code is quite mixed (PHP and layout in one place), adding caching would either make these scripts even harder to maintain OR we would need to restructure all the source (see my patched stats.php: first code, then layout). How much optimization / effort do we want to put into the current system?

One last note on the caching time: as I wrote in my first post, I intentionally set this to one minute for testing. I think it can well be set to 6 hours or greater.
(0001523) edgarwahn | 2009-11-27 13:24

One more note: we have set up an svn system that should help developers maintain their patches as long as they are not yet submitted to CAcert or not accepted by CAcert. Since I opened the issue we already seem to have four different versions of the file, so I'd suggest we start using a version control system to coordinate changes. This system is for development and testing only; once patches are approved they are magically migrated to the secure cvs. The published cvs tarball will be put into the svn system on a regular basis. If you agree and would like to use it, drop me a note at my mail address: mw at it-sls dot de.
(0001525) aphexer | 2009-11-30 12:46

Hello,

You don't want to specify getData('stats.php', 'other parameter', 'yet another parameter'). The idea I had is to do this: getData($query); getData then checks whether a record exists in the cache with key hash($query). If that record was inserted less than TTL seconds ago, use it; otherwise execute the query and update the cache. What about that? That's only a minor change to your suggestion, but it makes the caching system transparent, and it then becomes really easy to use. Instead of

```php
<?=number_format(tc(mysql_extended_query("select count(`id`) as `count` from `notary`")))?>
```

you just write

```php
<?=number_format(tc(getCachedValue("select count(`id`) as `count` from `notary`")))?>
```

which is almost exactly the same! Once you have this caching system it will be very easy to use it in many places. I also suggest putting the caching-related functions in a separate file (not in stats.php). But I have no problem with just taking your patch as is and, when the need arises to speed up another place with some caching, having this discussion again.
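A minimal sketch of what such a transparent getCachedValue() might look like. It assumes the cache table gets a string key column (`id`) as its primary key and that the query returns a single scalar, as in the count(`id`) example above; the statscache table created by the patch only has timestamp and cache columns, so this is an assumption rather than code from either patch set.

```php
<?php
define('QUERY_CACHE_TTL', 36000);   // 10 hours, the value later used in production

// Hypothetical transparent query cache: the cache key is a hash of the SQL text.
// Assumes a cache table with (`id` varchar primary key, `timestamp`, `cache`).
function getCachedValue($query)
{
    $key = md5($query);
    $res = mysql_query("select `cache` from `statscache` where `id` = '".$key."' ".
                       "and `timestamp` > ".(time() - QUERY_CACHE_TTL));
    if ($res && mysql_num_rows($res) > 0) {
        $row = mysql_fetch_assoc($res);
        return unserialize($row['cache']);                   // cache hit
    }

    // Cache miss or stale entry: run the real query, store its first value under the hash key.
    $row = mysql_fetch_row(mysql_query($query));
    $value = $row[0];
    mysql_query("replace into `statscache` (`id`, `timestamp`, `cache`) values ('".
                $key."', ".time().", '".mysql_real_escape_string(serialize($value))."')");
    return $value;
}
?>
```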
(0001537) Sourcerer | 2009-12-17 10:49

I have implemented the patch now (due to the high server load that had been causing problems) and have set the timeout to 10 hours (feel free to complain if you need fresher data; we can do it faster if there is demand for it). The caching successfully helped to lower the system load considerably and to make the whole system faster and more responsive. Thanks a lot for all your help!
(0001538) Sourcerer | 2009-12-17 10:54

I would suggest that we simply add an additional "type" column to the caching table as soon as we need a more generalized caching system.
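If useful, such a change could even be applied on the fly, in the same style as the existing table creation. A sketch only; the column name, size, and default are assumptions, not a reviewed schema change.

```php
<?php
// Hypothetical migration for a generalized cache: add a "type" key so several
// pages/queries can share the statscache table; existing rows default to stats.php.
$sql = 'alter table statscache '.
       'add column `type` varchar(64) not null default \'stats.php\', '.
       'drop primary key, '.
       'add primary key (`type`, `timestamp`)';
$res = mysql_query($sql);
?>
```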
Issue History

Date Modified | Username | Field | Change |
---|---|---|---|
2009-11-25 12:18 | edgarwahn | New Issue | |
2009-11-25 12:18 | edgarwahn | File Added: stats.php | |
2009-11-25 12:19 | edgarwahn | Relationship added | child of 0000730 |
2009-11-25 12:29 | Sourcerer | Note Added: 0001517 | |
2009-11-25 12:33 | edgarwahn | Note Added: 0001518 | |
2009-11-25 18:47 | aphexer | Note Added: 0001519 | |
2009-11-25 21:23 | Sourcerer | File Added: stats_ph.php | |
2009-11-25 21:24 | Sourcerer | Note Added: 0001520 | |
2009-11-27 08:34 | edgarwahn | Note Added: 0001522 | |
2009-11-27 13:24 | edgarwahn | Note Added: 0001523 | |
2009-11-30 12:46 | aphexer | Note Added: 0001525 | |
2009-12-17 10:49 | Sourcerer | Note Added: 0001537 | |
2009-12-17 10:49 | Sourcerer | Status | new => solved? |
2009-12-17 10:54 | Sourcerer | Note Added: 0001538 | |
2009-12-18 12:58 | Sourcerer | Status | solved? => closed |
2009-12-18 12:58 | Sourcerer | Fixed in Version | => production |
2009-12-18 12:58 | Sourcerer | Resolution | open => fixed |
2009-12-18 12:58 | Sourcerer | Assigned To | => Sourcerer |
2013-01-15 14:26 | Werner Dworak | Fixed in Version | => 2009 Q4 |