The Context....
For the past few years, I have been working as a solution architect for a large-scale digital preservation programme. One Internet-facing application developed for the programme was a Java servlet-based viewer that displays the contents of archived websites harvested using the Web Curator Tool and stored in the digital preservation system.
A web harvest contains a set of ARC files and one ARC index file (in the CDX file format). Each entry in the CDX file for a given web harvest is an ARC record - a structure that holds information about a given web resource, such as the original URL of the resource and where the underlying content stream for the resource can be found (e.g. the name of the ARC file, the seek position and the stream length). The CDX file can thus be considered the "table of contents" of all harvested resources captured inside the various ARC files of that particular web harvest.
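To make the idea of an ARC record concrete, here is a minimal sketch of turning index lines into record objects. The four-column layout (URL, ARC file name, offset, length), the class names and the sample values are all assumptions made for illustration; they are not the real CDX layout or the viewer's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: the four-column layout below is an assumption,
// not the real CDX specification.
class CdxSketch {

    /** One harvested resource and where its content stream lives. */
    static final class ArcRecord {
        final String originalUrl;
        final String arcFileName;
        final long offset;  // seek position inside the ARC file
        final long length;  // length of the content stream in bytes

        ArcRecord(String originalUrl, String arcFileName, long offset, long length) {
            this.originalUrl = originalUrl;
            this.arcFileName = arcFileName;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Converts index lines into ARC records, skipping blank, header and short lines. */
    static List<ArcRecord> parse(List<String> lines) {
        List<ArcRecord> records = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("CDX")) {
                continue; // blank line or (hypothetical) header line
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 4) {
                continue; // not enough columns for this assumed layout
            }
            records.add(new ArcRecord(fields[0], fields[1],
                    Long.parseLong(fields[2]), Long.parseLong(fields[3])));
        }
        return records;
    }

    public static void main(String[] args) {
        List<ArcRecord> records = parse(Arrays.asList(
                "CDX url file offset length",
                "http://example.org/ harvest-1.arc 0 512",
                "http://example.org/style.css harvest-1.arc 512 128"));
        for (ArcRecord r : records) {
            System.out.println(r.originalUrl + " -> " + r.arcFileName
                    + " @" + r.offset + " (" + r.length + " bytes)");
        }
    }
}
```

A harvest with 100,000 resources would produce a list of 100,000 such objects, which matters for what follows.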
In order to improve the performance of web harvest viewing, we employed a caching solution based on Ehcache in the viewer. As soon as the user clicks to view a given web harvest, the viewer converts the contents of the corresponding CDX file into a collection of ARC records and stores this collection as an element in the cache (see the class diagram). When the user then requests various pages in the same web harvest, the application doesn’t need to parse the CDX file again. It just retrieves the ARC record collection of the harvest from the cache, locates the particular ARC record and then reads the corresponding ARC file from disk for the resource content. The viewer was deployed on the same application server that also runs the third-party digital preservation software.
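The lookup flow described above can be sketched roughly as follows. A plain `HashMap` stands in for Ehcache so that the example runs on its own, and the class, method and field names (`HarvestCacheSketch`, `recordsFor`, `locate`, `loads`) are hypothetical rather than taken from the viewer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch: one cache element per web harvest, holding all of its ARC records.
class HarvestCacheSketch {

    static final class ArcRecord {
        final String originalUrl;
        final String arcFileName;
        final long offset;
        final long length;

        ArcRecord(String originalUrl, String arcFileName, long offset, long length) {
            this.originalUrl = originalUrl;
            this.arcFileName = arcFileName;
            this.offset = offset;
            this.length = length;
        }
    }

    // Stand-in for the real cache: harvest id -> full ARC record collection.
    private final Map<String, List<ArcRecord>> cache = new HashMap<>();
    // Stand-in for "parse the CDX file of this harvest".
    private final Function<String, List<ArcRecord>> cdxLoader;
    int loads = 0; // how many times the CDX file had to be parsed

    HarvestCacheSketch(Function<String, List<ArcRecord>> cdxLoader) {
        this.cdxLoader = cdxLoader;
    }

    /** Returns the record collection for a harvest, parsing the CDX file only on a cache miss. */
    List<ArcRecord> recordsFor(String harvestId) {
        List<ArcRecord> records = cache.get(harvestId);
        if (records == null) {
            records = cdxLoader.apply(harvestId);
            loads++;
            cache.put(harvestId, records);
        }
        return records;
    }

    /** Finds the record for one URL; the caller would then read the ARC file at that offset. */
    ArcRecord locate(String harvestId, String url) {
        for (ArcRecord record : recordsFor(harvestId)) {
            if (record.originalUrl.equals(url)) {
                return record;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HarvestCacheSketch viewer = new HarvestCacheSketch(id -> Collections.singletonList(
                new ArcRecord("http://example.org/", id + ".arc", 0, 512)));
        viewer.locate("harvest-1", "http://example.org/");
        viewer.locate("harvest-1", "http://example.org/");
        System.out.println("CDX parsed " + viewer.loads + " time(s)");
    }
}
```

Note that the whole collection is cached as a single element, so the memory cost of one element grows with the size of the harvest.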
The Problem....
All worked well until the application server started throwing Java "Out Of Memory" errors every now and then. Analysis of a heap dump taken on one such occasion revealed that the ARC record collections held in the cache occupied about 50% (close to 2 GB) of the used heap space. Because the web archive viewer was "stealing" memory in this way, the digital preservation application was struggling to find enough space for its own processing. The memory analyser screenshot shown below indicates the heap utilisation of the cached objects.
So what caused the cache to eat up so much memory?
- The design has a constraint that every cache element must be a collection of ARC records for a given web harvest. Some of the web harvests were small, containing only a few hundred web resources. However, many "killer" web harvests containing close to 100,000 web resources were also present in the preservation system. Viewing one such web harvest creates a single ARC record collection containing that many ARC record objects. This was big enough to bloat the cache and, in turn, the Java heap. In the future, the preservation system may store even bigger web harvests. Therefore, it was not possible to profile the size of the objects stored in the cache.
- The original configuration parameters of the cache were not in accordance with the usage pattern of the application. For example, the cache was configured with a very high value for the "maximum elements in memory" parameter, but the average number of users in the system was only a small fraction of that value. In addition, the cache used the Least Frequently Used (LFU) strategy as the “memory store eviction policy”. Apparently, this was unsuitable - the largest web harvests scored highly as the most frequently used objects, and thus they were never evicted from the cache even after expiry!
- In Ehcache, an "expired" element is not the same as an "evicted" element. Eviction happens only when the threshold is reached. Because the cache was configured with a large value for the maximum number of elements, it quickly filled up with ‘dead’ ARC record collection objects that occupied space but were not being used at all.
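The interplay between expiry, the element threshold and LFU eviction described in these points can be illustrated with a toy model. This is not Ehcache's implementation or API: the class `ExpiryVsEvictionSketch`, its methods and the logical clock passed in as `now` are all invented for the example, and it only mimics the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT Ehcache): expired entries stay in the store until the
// element threshold forces an eviction, and eviction picks the least-hit entry.
class ExpiryVsEvictionSketch {

    static final class Entry {
        final Object value;
        final long expiresAt;
        long hits;

        Entry(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final int maxElements;
    private final long timeToLive;

    ExpiryVsEvictionSketch(int maxElements, long timeToLive) {
        this.maxElements = maxElements;
        this.timeToLive = timeToLive;
    }

    /** Adds an entry; eviction is triggered only when the threshold is reached. */
    void put(String key, Object value, long now) {
        if (!store.containsKey(key) && store.size() >= maxElements) {
            evictLeastFrequentlyUsed();
        }
        store.put(key, new Entry(value, now + timeToLive));
    }

    /** An expired entry counts as a miss, but in this model it keeps occupying memory. */
    Object get(String key, long now) {
        Entry entry = store.get(key);
        if (entry == null || now >= entry.expiresAt) {
            return null;
        }
        entry.hits++;
        return entry.value;
    }

    int storedCount() {
        return store.size();
    }

    boolean holds(String key) {
        return store.containsKey(key);
    }

    // LFU: the entry with the fewest hits goes, whether or not it has expired.
    private void evictLeastFrequentlyUsed() {
        String victim = null;
        long fewestHits = Long.MAX_VALUE;
        for (Map.Entry<String, Entry> e : store.entrySet()) {
            if (e.getValue().hits < fewestHits) {
                fewestHits = e.getValue().hits;
                victim = e.getKey();
            }
        }
        if (victim != null) {
            store.remove(victim);
        }
    }

    public static void main(String[] args) {
        ExpiryVsEvictionSketch cache = new ExpiryVsEvictionSketch(2, 10);
        cache.put("killer-harvest", "100,000 records", 0);
        for (int i = 0; i < 3; i++) {
            cache.get("killer-harvest", 1); // browsed heavily, so it scores many hits
        }
        cache.put("small-harvest", "300 records", 1);
        cache.get("small-harvest", 2);

        // Much later both entries have expired, yet both are still stored.
        System.out.println("stored after expiry: " + cache.storedCount());

        // A new harvest hits the threshold; LFU removes the small one, not the big one.
        cache.put("new-harvest", "500 records", 100);
        System.out.println("killer-harvest still held: " + cache.holds("killer-harvest"));
    }
}
```

In this model the heavily browsed harvest survives eviction long after it has expired, while a generous `maxElements` value would delay eviction altogether and let dead entries pile up.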
Once the web archive viewer started affecting the stability of the application server and the digital preservation application, I took a very close look at how to tune the cache. The parameters tuned included the maximum elements in memory and the eviction policy.
Interested in knowing more? Please continue reading Part 2 of this article.