The book Enterprise Integration Patterns: Designing, Building and Deploying Messaging Solutions by Gregor Hohpe and Bobby Woolf (Addison-Wesley) describes a number of patterns for enterprise integration, focusing mainly on messaging concepts. It is interesting to note that many of these patterns apply to batch scenarios as well, especially since the batch processing landscape has changed over the last several years from a traditional sequential processing style to a multi-threaded style involving integration with several internal and external systems.
Not so long ago, I used a couple of these patterns while recommending an architecture for the batch processes in a project. The various batch processes in the project consisted of multiple processing steps and followed one of two styles of execution:
- Per-record multi-threaded execution – Every selected record is assigned to a thread from a thread pool and goes through the various steps sequentially. The job ends when all threads join.
- Per-step multi-threaded execution – All selected records execute a given step concurrently via threads from a thread pool, but they move to the next step only after all of them have finished the previous one.
(In the actual project, the steps were further subdivided into tasks; for the sake of simplicity in this article, I treat each step as the lowest unit of work.) The steps in the batch process were time-consuming. Without putting much extra burden on each step, I had to find a way to extract some information whenever a step finished and use that information for purposes such as checkpointing, logging and monitoring.
The approach chosen was to use a wire-tapping mechanism to receive a message with the required information from every step, and then use the Recipient List pattern to forward that information to the registered listeners. The Wire Tap is an integration pattern widely used in message-based integration architectures to capture a copy of messages while they are being forwarded to their ultimate destination. Similarly, the Recipient List pattern is used in the integration domain to route messages to a number of statically or dynamically specified recipients.
The diagram above shows the two batch execution styles modified to incorporate these patterns. From an implementation point of view, the Decorator pattern was used to give each step the added responsibility of sending a message to the wire tap at the end of its processing, as sketched below.
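To make this concrete, here is a minimal Java sketch of such a wire-tapping decorator. All the names here (Step, WireTap, StepListener) are illustrative assumptions, not the actual project code:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical step abstraction; all names here are illustrative,
// not taken from the actual project.
interface Step {
    void execute(String recordId);
}

// Listener interface for step-completion messages
// (checkpointing, logging, monitoring, ...).
interface StepListener {
    void onStepCompleted(String stepName, String recordId, long elapsedMillis);
}

// Recipient List: forwards each tapped message to every registered listener.
class WireTap {
    private final List<StepListener> recipients = new CopyOnWriteArrayList<>();

    void register(StepListener listener) {
        recipients.add(listener);
    }

    void publish(String stepName, String recordId, long elapsedMillis) {
        for (StepListener listener : recipients) {
            listener.onStepCompleted(stepName, recordId, elapsedMillis);
        }
    }
}

// Decorator: wraps a real step and sends a copy of the completion
// information to the wire tap, without burdening the step itself.
class WireTappedStep implements Step {
    private final Step delegate;
    private final WireTap wireTap;

    WireTappedStep(Step delegate, WireTap wireTap) {
        this.delegate = delegate;
        this.wireTap = wireTap;
    }

    @Override
    public void execute(String recordId) {
        long start = System.currentTimeMillis();
        delegate.execute(recordId); // the real processing
        long elapsed = System.currentTimeMillis() - start;
        wireTap.publish(delegate.getClass().getSimpleName(), recordId, elapsed);
    }
}
```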
Interestingly, several other integration patterns may be used in batch architecture. For example:
- Routers allow the data to choose the processing step to be used, based on certain properties or rules.
- Translators allow the data to be transformed, as required by a given processing step.
- Splitters allow the data to be split and fed into various processing steps concurrently.
- Aggregators allow the data from each concurrent processing step to be combined, for the use of subsequent steps.
But in all these cases, the patterns are used in a data-centric rather than message-centric fashion, mainly because batch processing tends to be data-focused, whereas integration is more message-oriented.
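As an illustration of this data-centric usage, a Router might be nothing more than a lookup that selects the next processing step from a property of the record itself. The following is a hypothetical sketch (reusing the Step interface from the earlier example), not code from the project:

```java
import java.util.Map;

// Data-centric Router: a property of the record itself decides which
// processing step handles it next. Names are illustrative assumptions.
class StepRouter {
    private final Map<String, Step> routes; // e.g. record type -> step
    private final Step defaultStep;         // fallback when no rule matches

    StepRouter(Map<String, Step> routes, Step defaultStep) {
        this.routes = routes;
        this.defaultStep = defaultStep;
    }

    Step route(String recordType) {
        return routes.getOrDefault(recordType, defaultStep);
    }
}
```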
For the past few years, I have been working as a solution architect for a large-scale digital preservation programme. One Internet-facing application developed for the programme was a Java servlet-based viewer that displays the contents of archived websites harvested using the Web Curator Tool and stored in the digital preservation system.
A web harvest contains a set of ARC files and one ARC index file (in the CDX file format). Each entry in the CDX file for a given web harvest is an ARC record: a structure that holds the information about a harvested web resource, such as its original URL and the details of where its underlying content stream can be found (e.g. the name of the ARC file, the seek position and the stream length). The CDX file can thus be considered the "table of contents" for all the harvested resources captured inside the various ARC files of that web harvest.
In order to improve the performance of web harvest viewing, we employed a caching solution in the viewer, based on Ehcache. As soon as the user clicks to view a given web harvest, the viewer converts the contents of the corresponding CDX file into a collection of ARC records and stores this collection as an element in the cache (see the class diagram). When the user requests other pages in the same web harvest, the application does not need to parse the CDX file again; it just retrieves the harvest's ARC record collection from the cache, locates the particular ARC record and reads the resource content from the corresponding ARC file on disk. The viewer was deployed in the same application server that also runs the third-party digital preservation software.
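The gist of the lookup logic, sketched against the Ehcache 2.x API, might look like the following. The cache name, the key scheme and the parseCdxFile() helper are assumptions made for illustration; this is not the actual viewer code:

```java
import java.util.List;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

// Sketch of the viewer's cache lookup against the Ehcache 2.x API.
class HarvestCache {
    private final Cache cache =
            CacheManager.getInstance().getCache("arcRecordCollections");

    @SuppressWarnings("unchecked")
    List<ArcRecord> getArcRecords(String harvestId) {
        Element element = cache.get(harvestId);
        if (element != null) {
            // Cache hit: no need to parse the CDX file again.
            return (List<ArcRecord>) element.getObjectValue();
        }
        // Cache miss: parse the harvest's CDX file into ARC records and
        // store the whole collection as a single cache element.
        List<ArcRecord> records = parseCdxFile(harvestId);
        cache.put(new Element(harvestId, records));
        return records;
    }

    private List<ArcRecord> parseCdxFile(String harvestId) {
        // Reads each CDX entry into an ArcRecord; omitted for brevity.
        throw new UnsupportedOperationException("omitted");
    }
}

// Minimal shape of an ARC record, per the description above.
class ArcRecord {
    String originalUrl; // original URL of the harvested resource
    String arcFileName; // ARC file holding the content stream
    long offset;        // seek position within the ARC file
    long length;        // length of the content stream
}
```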
All worked well until the application server started throwing Java "Out of Memory" errors now and then. Analysis of a heap dump taken on one such occasion revealed that the ARC record collections held in the cache occupied about 50% (close to 2 GB) of the used heap space. Because the web archive viewer was "stealing" memory in this way, the digital preservation application was struggling to find enough space for its own processing. The memory-analyser screenshot below shows the heap utilisation of the cached objects.
So what happened here? Why did the cache eat up so much memory?
- The design had a constraint that every cache element must be the collection of ARC records for a given web harvest. Some web harvests were small, containing only a few hundred web resources. However, many "killer" web harvests containing close to 100,000 web resources were also present in the preservation system. Viewing one such web harvest creates a single ARC record collection containing that many ARC record objects, big enough to bloat the cache and, in turn, the Java heap. And since the preservation system may store even bigger web harvests in the future, it was impossible to predict the size of the objects that would be stored in the cache.
- The original configuration parameters of the cache did not match the usage pattern of the application. For example, the cache was configured with a very high value for the "maximum elements in memory" parameter, while the average number of users in the system was only a small fraction of that value. In addition, the cache used Least Frequently Used (LFU) as its "memory store eviction policy". This turned out to be unsuitable: the largest web harvests scored as the most frequently used objects, so they were never evicted from the cache even after they had expired!
- In Ehcache, "expired" elements are not automatically "evicted" elements. Eviction happens only when the maximum-elements threshold is reached. With the cache configured with a large value for the maximum number of elements, it soon filled up with a great many 'dead' ARC record collection objects that occupied space but were never used.
Once the web archive viewer started affecting the stability of the application server and the digital preservation application, I took a very close look at how to tune the cache. The parameters tuned included the maximum elements in memory and the eviction policy.
Interested to know more? Please continue reading Part 2 of this article.
Continued from Part 1 of this article.
As part of this exercise, I wanted to know what really happens when different eviction policies are used with Ehcache. In the local development environment, purely as an experiment, I configured the maximum number of elements to be 5 (meaning the cache would store the ARC record collections of 5 web harvests and start evicting elements once this limit was reached) and tested the effect of LRU versus LFU as the eviction policy. The results were interesting.
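Assuming Ehcache 2.x and its fluent programmatic configuration API, the experiment setup would look roughly like this. The cache name is an assumption; in the real viewer the equivalent settings lived in the external configuration:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;

// Experiment setup: at most 5 elements in memory, with the eviction
// policy switched between runs.
class EvictionExperiment {
    static Cache createCache(String evictionPolicy) { // "LRU" or "LFU"
        CacheConfiguration config =
                new CacheConfiguration("arcRecordCollections", 5)
                        .memoryStoreEvictionPolicy(evictionPolicy);
        Cache cache = new Cache(config);
        CacheManager.getInstance().addCache(cache);
        return cache;
    }
}
```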
The test started by viewing one of the very large web harvests (over 76,000 ARC records) and then continued with smaller ones, up to 8 web harvests in total. Once I started viewing the 6th web harvest (that is, once the cache already held 5 elements), the eviction algorithm kicked in.
- Least Frequently Used (LFU): The last-viewed web harvest was the smallest of those viewed, and thus the least accessed in the cache, so its ARC record collection was the one evicted under LFU. Being the smallest, its eviction removed only a few hundred ARC records from the cache, yielding very little gain in heap space.
- Least Recently Used (LRU): The first-viewed web harvest was the least recently used of those viewed. Under LRU, its ARC record collection was evicted, and since it held the largest number of ARC records, the total number of ARC records referenced by the cache dropped by about 76,000, leaving only about 27,000 live ARC record objects. This was a huge gain.
The tabulated results are given below. The highlighted cells show the records remaining in the cache after eviction started.
Also shown is a graph indicating the gain in heap space under the two eviction policies. In our case, LRU was a far bigger memory-saver than LFU.
Does this mean that LFU is bad and LRU is good? Not at all. LFU was simply not suited to our usage pattern.
Caching is definitely good, provided a disciplined approach is followed when designing, using and configuring caches. Here are a few tips for designing cache-based solutions:
- Don't use a single bloated cache to store all kinds of objects. Instead, use a separate cache for each kind of object, so that each cache can be configured and tuned differently. For example, separate the caches that store short-lived, session-bound user data from the caches that store long-lived, application-wide objects.
- It is wiser to stay away from creating your own cache implementation on top of the Java Collections API. Plenty of products (open-source and commercial) are already available, such as Terracotta Ehcache and Apache JCS.
- As with the Java Collections API, when doing heap analysis of a cache and the objects it holds, make sure to inspect the "retained heap" rather than the shallow heap. For example, in the heap-analysis screenshot below from the production system, the shallow heap of the cache elements was very small, but the retained heap was close to 2 GB: this was the memory used by the ARC records indirectly referenced by the cache.
- Know your objects well. Learn about the objects that are going to be stored in the cache and use this knowledge to choose the expiration/eviction strategy accordingly.
- If the objects are of uniform size, an eviction strategy based on the number of elements in the cache (such as LRU or LFU) may be used.
- On the other hand, if you cannot predict the size of an object (as was the case with the ARC record collections in the web archive viewer), you may have to experiment with various eviction policies. If the cache framework you use supports eviction based on the utilised heap size (as an absolute value or as a percentage of the maximum heap), consider using that instead.
- As in Ehcache, the cache framework may not evict expired elements until the limit on the maximum number of cache elements is reached. If memory is the utmost concern of the application, you could write logic that evicts expired elements and run it off-peak in a low-priority thread (see the sketch after this list). Take care, however: such eviction operations may be synchronized and can therefore create performance bottlenecks.
- With the introduction of annotations in Java, there has been a tendency to provide cache configuration as annotations on a cacheable object (e.g. Ehcache annotations for the Spring framework). While this improves code readability, it can become a maintenance nightmare years later, when the original developers have left the organisation and the application support team needs to investigate the reasons for a bloated cache! Prefer to keep the configuration outside the code, so that the cache can be reconfigured without editing or recompiling the code.
- Cache frameworks provide a facility to overflow elements to secondary storage. While this is a useful feature, use it cautiously, since access to secondary storage is usually slower. If the objects stored in the cache are derived from a time-consuming operation, it may be worth using a cache that overflows to disk; otherwise, this feature can be as good as not caching the objects at all.
- Could we have used this approach? Probably not, because the size of each cache element was unpredictable, and the overflow of just one cache element to disk could cause hundreds of thousands of ARC record objects to be written to disk.
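Here is a minimal sketch of the off-peak eviction idea mentioned in the tips above, using Ehcache's evictExpiredElements() from a low-priority background thread. The schedule and thread setup are assumptions for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import net.sf.ehcache.Cache;

// Off-peak sweeper: periodically asks Ehcache to drop expired elements
// instead of waiting for the maximum-elements threshold to trigger eviction.
class ExpiredElementSweeper {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "cache-sweeper");
                thread.setPriority(Thread.MIN_PRIORITY); // low priority
                thread.setDaemon(true);
                return thread;
            });

    void start(Cache cache, long initialDelayHours, long periodHours) {
        // evictExpiredElements() may lock the store internally, so keep
        // the sweeps to off-peak windows.
        scheduler.scheduleAtFixedRate(cache::evictExpiredElements,
                initialDelayHours, periodHours, TimeUnit.HOURS);
    }
}
```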
Are things like cache tuning a development concern or an architecture concern? The boundary is fuzzy. In the case of the web archive viewer, it was indeed an architecture concern, primarily because the viewer shared the very same deployment environment as the highly business-sensitive digital preservation application.