P6Cache

About P6Cache

P6Cache provides seamless caching to any application running JDBC. P6Cache was designed for easy installation and to require no code changes in existing applications. This module is particularly useful for caching queries in session beans, as well as for applications that do not use EJBs and need a mechanism to provide caching.

The current version of P6Cache is in-process, and the cache is specific to the JVM. It does not implement an LRU algorithm, does not enable distributed caching, does not allow caches to be saved and copied, and supports only time-based caches (caches that expire at a set time or after a set interval). LRU, save/restore, distributed caching, and trigger-based caching support are expected to be added in future releases. (Trigger-based caching means that a cache is automatically invalidated by a particular statement. For example, a cache created around "select id from lookup_table" would be invalidated by the statement "update lookup_table ...".) Distributed caching has already been tested successfully; however, that code is no longer compatible with the current codebase and, for the reasons discussed below, we decided to defer implementing this feature.
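
As an illustration of what time-based expiry without LRU eviction involves, here is a minimal Java sketch. It is not P6Cache's implementation: the class TimedCacheSketch and its methods are hypothetical names, and the caller passes in the current time so the behavior is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not a P6Cache class. Not thread-safe. */
final class TimedCacheSketch<K, V> {

    /** A cached value together with the instant at which it stops being valid. */
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();

    /** Stores a value that stays valid for ttlMillis after nowMillis. */
    void put(K key, V value, long nowMillis, long ttlMillis) {
        entries.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    /** Returns the cached value, or null if it is absent or has expired. */
    V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(key); // expired entries are dropped lazily, on read
            return null;
        }
        return entry.value;
    }

    int size() {
        return entries.size();
    }
}
```

Note that nothing in this sketch bounds the number of entries. An entry leaves the map only when it is read after its expiration time, which is the trade-off described here: less bookkeeping per lookup in exchange for assuming that memory is sufficient.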

Note that the decisions to implement or not implement specific functionality were not arbitrary. We believe that the functionality provided in the first version is the optimal set for most applications. The LRU algorithm, while useful, assumes that your cache will grow large enough to be a problem for the memory available. We are assuming that the system will have enough memory available, and not running an LRU algorithm allows the system to run faster. Trigger-based caching is likewise very useful, but it comes at a higher performance and complexity cost and requires that the system know about all database interactions.

Meanwhile, most systems can do fine without these features. For example, inventory checks on product pages on a website may generate enough traffic that constant database hits would be a problem. With a 15-minute cache on that query, however, the product page is often close enough to accurate. If real inventory values are a concern, the developer can implement a second, uncached query on the checkout page. This is a very common technique; some major online auction sites use it to strike a balance between traffic and the need for accuracy.

The ability to save and copy caches helps avoid huge database hits during startup. Often, however, the reason for restarting the server is to change functionality that may affect the caches. An alternative is to prebuild the caches once during startup and copy them to every application server, but this has a number of complexity implications that we wanted to avoid for this version.

Finally, the decision not to implement a distributed cache was the result of research done on the product. The research showed that the overhead of distributed caching could be significant enough to actually degrade performance when performed unnecessarily. While this overhead can be reduced, a distributed cache will ultimately always be slower than an in-process cache. Distributed caching is particularly important when trigger-based caching is used, since a trigger-based cache would ideally expire when triggered by any of the application servers. Since trigger-based caching was not implemented in this version, distributed caching was not essential.

Turning on P6Cache

The P6Cache module is disabled by default. To enable it, uncomment the module in spy.properties, as in the example below:

#################################################################
# MODULES #
# #
# Modules provide the P6Spy functionality. If a module, such #
# as module_log is commented out, that functionality will not #
# be available. If it is not commented out (if it is active), #
# the functionality will be active. #
# #
# Values set in Modules cannot be reloaded using the #
# reloadproperties variable. Once they are loaded, they remain #
# in memory until the application is restarted. #
# #
#################################################################

module_cache=com.p6spy.engine.outage.P6CacheDriver
#module_log=com.p6spy.engine.logging.P6LogSpyDriver

Using P6Cache

P6Cache was designed to be easy to use, but it is a bit more complex than the logging application. P6Cache works by intercepting JDBC statements and comparing them to query forms that specify what should be cached and for how long. A query form is similar to a prepared statement. P6Cache reads the query forms you want to cache and the expiration time for the query form from the spy.forms file.
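
To make the idea of a query form concrete, the following Java sketch reduces a SQL statement to a form by replacing literal values with ? placeholders. This is only an assumption about what a form looks like, based on the forms.log entries shown in this document; the class QueryFormSketch and the two regular expressions are hypothetical and are not how P6Cache itself derives forms.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; not a P6Cache class. */
final class QueryFormSketch {

    // A single-quoted SQL string literal; '' inside the quotes is an escaped quote.
    private static final Pattern STRING_LITERAL = Pattern.compile("'(?:[^']|'')*'");

    // A standalone integer or decimal number (digits inside an identifier
    // such as col2 are not matched because of the word boundaries).
    private static final Pattern NUMBER_LITERAL = Pattern.compile("\\b\\d+(?:\\.\\d+)?\\b");

    /** Replaces string literals first, then numeric literals, with "?". */
    static String toForm(String sql) {
        String withoutStrings = STRING_LITERAL.matcher(sql).replaceAll("?");
        return NUMBER_LITERAL.matcher(withoutStrings).replaceAll("?");
    }
}
```

Under these assumptions, two statements that differ only in their literal values reduce to the same form, which is what lets a single spy.forms entry cover all of them.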

To determine what you want to cache, set formstrace=true in spy.properties. This causes P6Cache to write every query form that it sees to the file forms.log. (Use this only in development; in production, set formstrace=false.) forms.log will contain entries such as:

00:00:00; select ? from stmt_test

Look through the query forms and copy the ones that interest you. Paste these into spy.forms, which looks like this:

# use form: expiration_time; query from forms.log file
# 00:00:00; select count(*) from prepstmt_test where col2 = ?
#
# We support the following expiration time shorthands:
# 660 - just 660 minutes - the default.
# 1 day - 1 day from now
# 1 hour - 1 hour
# 1 hr - 1 hour
# 1 min - 1 minute
# 1 minute - 1 minute
# 1 sec - 1 second - not much point of course
# 1 second - 1 second
# 2 days - 2 days
# 2 hours - 2 hours
# 2 hrs - 2 hours
# 2 minutes - 2 minutes
# 2 seconds - 2 seconds
# 2 secs - 2 seconds
# 6 days - 6 days
# 60 mins - 60 minutes
# 12:00:00 - at 12:00:00 regardless of when the cache was created
# 23:59:59 - at 23:59:59 regardless of when the cache was created.
# 23:59 - at 23:59:00 regardless of when the cache was created.
# 12:01 - at 12:01:00 regardless of when the cache was created.

00:00:00; select ? from stmt_test

This creates a cache for "select ? from stmt_test" that expires at midnight every day. If you would prefer a cache that is valid for 15 minutes, change "00:00:00; select ? from stmt_test" to read "15 minutes; select ? from stmt_test".
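
The two kinds of expiration (a time of day versus an interval) can be sketched in Java as follows. This is a hedged reading of the shorthand list in the spy.forms comments, not P6Cache's own parser: the class FormsLineSketch is hypothetical, and it assumes that an entry is split at the first semicolon, that a bare number means minutes, and that any value containing a colon is a time of day.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Locale;

/** Hypothetical illustration only; not a P6Cache class. */
final class FormsLineSketch {

    /** Splits "expiration; query form" at the first semicolon. */
    static String[] splitLine(String line) {
        int separator = line.indexOf(';');
        if (separator < 0) {
            throw new IllegalArgumentException("missing ';' in: " + line);
        }
        return new String[] {
            line.substring(0, separator).trim(),
            line.substring(separator + 1).trim()
        };
    }

    /** Assumption: values such as 23:59 or 00:00:00 are times of day. */
    static boolean isTimeOfDay(String expiration) {
        return expiration.contains(":");
    }

    /** Parses HH:mm or HH:mm:ss. */
    static LocalTime parseTimeOfDay(String expiration) {
        return LocalTime.parse(expiration.trim());
    }

    /** Parses values such as "660", "15 minutes", "2 hrs", or "6 days". */
    static Duration parseInterval(String expiration) {
        String[] parts = expiration.trim().toLowerCase(Locale.ROOT).split("\\s+");
        long amount = Long.parseLong(parts[0]);
        if (parts.length == 1) {
            return Duration.ofMinutes(amount); // assumption: bare number = minutes
        }
        String unit = parts[1];
        if (unit.startsWith("sec")) return Duration.ofSeconds(amount);
        if (unit.startsWith("min")) return Duration.ofMinutes(amount);
        if (unit.startsWith("hr") || unit.startsWith("hour")) return Duration.ofHours(amount);
        if (unit.startsWith("day")) return Duration.ofDays(amount);
        throw new IllegalArgumentException("unknown unit: " + unit);
    }
}
```

For the entry "15 minutes; select ? from stmt_test", splitLine yields the expiration "15 minutes" and the form "select ? from stmt_test", and parseInterval turns the former into a 15-minute duration.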

Once the caches are defined in spy.forms, restart your server and P6Cache will begin caching.

Advanced Configuration

You can set additional properties in the spy.properties file. In the file you will find a P6Cache specific section:

################################################################
# P6CACHE SPECIFIC PROPERTIES #
################################################################

# if the driver is loaded, determines if caching is performed
cache=true

# outputs trace information into the p6log file
cachetrace=false

# the SQL command that causes the cache to clear. this can be
# any command, such as "clear cache". It is intercepted and
# never actually executed by the real driver
clearcache=

# the default number of entries expected. use for performance
# purposes
entries=

# the file that contains the query forms to cache
# this file should contain a series of individual
# lines with:
# <expirationtime>;<query form>
# hh:mm:ss; query taken from forms.log
# 00:10:00; select ? from stmt_test
formsfile=spy.forms

# the location of an automatically generated file that contains
# a list of the forms seen by p6cache. you can just copy these
# into the formsfile
formslog=forms.log

# run this during development to determine which query forms
# you want to cache
formstrace=true

cache

When the module is enabled, setting cache=true causes caching to be enabled. Setting cache=false causes caching to be disabled.

cachetrace

Outputs debug information in spy.log.

clearcache

This allows you to define a SQL statement that, when executed, will clear all of the caches. For example, you could set:

clearcache=clear caches

Whenever you then execute the SQL statement "clear caches", P6Cache clears all of the caches. The statement is intercepted and never reaches the real driver.
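
From application code, the clearing statement can be issued like any other SQL. The sketch below assumes clearcache=clear caches as in the example above and a Connection obtained through P6Spy. The class and method names are hypothetical, and the use of Statement.execute rather than another execute variant is an assumption.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper; not part of P6Cache. */
final class ClearCacheSketch {

    // Must match the value of the clearcache property in spy.properties.
    static final String CLEAR_COMMAND = "clear caches";

    /** Sends the configured clearing statement through the given connection. */
    static void clearCaches(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            // Assumption: P6Cache recognizes the text and clears its caches
            // instead of passing the statement on to the database.
            statement.execute(CLEAR_COMMAND);
        }
    }
}
```

Because the command text is arbitrary, it is worth choosing something that could never be valid SQL for your database, so that a misconfigured environment fails loudly instead of running an unintended statement.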

entries

Sets the default number of entries the cache is expected to hold. It is used for performance purposes.

formsfile

Defaults to spy.forms. This is the file that contains the list of query forms to cache.

formslog

Defaults to forms.log. See formstrace.

formstrace

When set to true, formstrace causes every query form that P6Cache sees to be logged to the file named by formslog. This is particularly important during development, so the developer can easily spot queries to cache.
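
Since formstrace is meant for development only, a deployment script or startup check could verify the setting. The following Java sketch reads a properties file with java.util.Properties and reports whether formstrace is on. The class SpyPropertiesCheckSketch is hypothetical, and treating a missing formstrace entry as false is an assumption, not documented P6Cache behavior.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

/** Hypothetical helper; not part of P6Cache. */
final class SpyPropertiesCheckSketch {

    /** Returns true if the given spy.properties content sets formstrace=true. */
    static boolean formsTraceEnabled(Reader spyProperties) throws IOException {
        Properties properties = new Properties();
        properties.load(spyProperties);
        // Assumption: a missing entry is treated as "false" for this check.
        String value = properties.getProperty("formstrace", "false");
        return Boolean.parseBoolean(value.trim());
    }
}
```

A caller would typically pass a FileReader for spy.properties and fail the production deployment when the method returns true.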

Other Properties

P6Cache does not share the common properties. Currently this means P6Cache does not support reloading of properties. Reloading was deliberately left out because there was not enough time to prove that it would not affect performance. Ultimately, the reload check should not affect performance, and it will probably be enabled.

Using Caching and Logging Together

If you have both P6Cache and P6Log enabled, the behavior depends on the order in which the modules are listed in spy.properties:

#################################################################
# MODULES #
# #
# Modules provide the P6Spy functionality. If a module, such #
# as module_log is commented out, that functionality will not #
# be available. If it is not commented out (if it is active), #
# the functionality will be active. #
# #
# Values set in Modules cannot be reloaded using the #
# reloadproperties variable. Once they are loaded, they remain #
# in memory until the application is restarted. #
# #
#################################################################

module_cache=com.p6spy.engine.outage.P6CacheDriver
module_log=com.p6spy.engine.logging.P6LogSpyDriver

In this case P6CacheDriver is listed first. This is probably the behavior you want, since P6Cache will process the request first and only pass it through to P6Log when an actual database query is executed. For example, if you cached the statement "select ? from employee", the first time you executed that query you would see an entry in P6Log. Every time after that, however, there would be no entry, since P6Cache intercepts the statement before it goes to the database.

If you want to see everything that is happening, you can reverse the order and list P6LogSpyDriver first. This will cause every statement, cached or not, to be logged.
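
The effect of module order can be modeled with a small Java sketch. This is a toy model under stated assumptions, not P6Spy code: each module is represented as a function that wraps the next one, the class ModuleOrderSketch and its methods are hypothetical, and the cache here never expires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of module ordering; not P6Spy code. */
final class ModuleOrderSketch {

    /** Wraps next so that every statement passing through is recorded in log. */
    static Function<String, String> logging(List<String> log, Function<String, String> next) {
        return sql -> {
            log.add(sql);
            return next.apply(sql);
        };
    }

    /** Wraps next so that repeated statements are answered from memory. */
    static Function<String, String> caching(Function<String, String> next) {
        Map<String, String> cache = new HashMap<>();
        return sql -> cache.computeIfAbsent(sql, next);
    }

    /** Runs the same query repeatedly and returns how many log entries resulted. */
    static int loggedCount(boolean cacheFirst, int repeats) {
        List<String> log = new ArrayList<>();
        Function<String, String> database = sql -> "rows for " + sql;
        Function<String, String> chain = cacheFirst
                ? caching(logging(log, database))   // cache listed first
                : logging(log, caching(database));  // log listed first
        for (int i = 0; i < repeats; i++) {
            chain.apply("select ? from employee");
        }
        return log.size();
    }
}
```

In this model, running the same cached query three times produces one log entry when the cache comes first and three entries when the log comes first, which mirrors the two behaviors described above.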

Known Limitations/Issues

P6Cache does not support caching of Blobs, Clobs, Arrays, binary streams, Unicode streams, getTimestamp/getTime with a Calendar (standard getTime/getTimestamp is supported), getBigDecimal with a scale (standard getBigDecimal is supported), getDate with a Calendar (standard getDate is supported), or getRef. These may be supported at a later time.

One observed bug involves databases that do not support long VARCHAR values and recommend using BLOBs instead (MySQL, for example). Such a database's driver may let you cheat by calling getString() on a BLOB column, performing the conversion automatically. P6Cache does not prevent that call, since it only blocks getBlob calls on cached data, but it also has no such automatic conversion built in. As a result, if you cache a BLOB column and invoke getString(), P6Cache returns the hashCode of the BLOB rather than the expected String.