From d7e2d9af1950d268a0608da7f4eca9ae9d2b326c Mon Sep 17 00:00:00 2001 From: markt Date: Thu, 24 Feb 2011 16:01:38 +0000 Subject: [PATCH] Add documentation for the Crawler Session Manager Valve. git-svn-id: https://svn.apache.org/repos/asf/tomcat/trunk@1074192 13f79535-47bb-0310-9956-ffa450edef68 --- webapps/docs/changelog.xml | 8 +++++++ webapps/docs/config/valve.xml | 56 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+) diff --git a/webapps/docs/changelog.xml b/webapps/docs/changelog.xml index 5e9415681..a17ac9be2 100644 --- a/webapps/docs/changelog.xml +++ b/webapps/docs/changelog.xml @@ -130,6 +130,14 @@ ServletContext.getResourcePaths() includes static resources packaged in JAR files in its output. (markt) + + Web crawlers can trigger the creation of many thousands of sessions as + they crawl a site, which may result in significant memory consumption. + The new Crawler Session Manager Valve ensures that crawlers are + associated with a single session - just like normal users - regardless + of whether or not they provide a session token with their requests. + (markt) + + diff --git a/webapps/docs/config/valve.xml b/webapps/docs/config/valve.xml index ad90b9772..87046655a 100644 --- a/webapps/docs/config/valve.xml +++ b/webapps/docs/config/valve.xml @@ -880,6 +880,62 @@ +
+ + + +

Web crawlers can trigger the creation of many thousands of sessions as + they crawl a site, which may result in significant memory consumption. This + Valve ensures that crawlers are associated with a single session - just like + normal users - regardless of whether or not they provide a session token + with their requests.

+ +

This Valve may be used at the Engine, Host or + Context level as required. Normally, this Valve would be used + at the Engine level.

+ +

If used in conjunction with the Remote IP valve, the Remote IP valve + should be defined before this valve to ensure that the correct client IP + address is presented to this valve.

+ +
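The idea in the introduction above, one session per crawler regardless of whether it sends a session token, can be sketched as a lookup keyed on client IP. This is a minimal illustration only, not the Tomcat implementation: the class `CrawlerSessionSketch`, its methods, and the generated session IDs are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: reuse one session ID per client IP.
// Not the Tomcat valve; names and ID format are assumptions.
class CrawlerSessionSketch {
    // Maps a crawler's client IP to the session ID first created for it.
    private final Map<String, String> sessionIdByClientIp = new ConcurrentHashMap<>();
    // Counts how many sessions were actually created.
    private final AtomicInteger created = new AtomicInteger();

    // Returns the existing session ID for this IP, creating one only on first sight.
    String sessionIdFor(String clientIp) {
        return sessionIdByClientIp.computeIfAbsent(
                clientIp, ip -> "session-" + created.incrementAndGet());
    }

    int sessionsCreated() {
        return created.get();
    }

    public static void main(String[] args) {
        CrawlerSessionSketch sketch = new CrawlerSessionSketch();
        // A crawler sending 1000 requests without a session token.
        for (int i = 0; i < 1000; i++) {
            sketch.sessionIdFor("192.0.2.10");
        }
        System.out.println(sketch.sessionsCreated()); // prints 1
    }
}
```

Because the lookup depends on the client IP, it only behaves as intended when the address seen is the real client address, which is why the ordering relative to the Remote IP valve matters.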
+ + + +

The Crawler Session Manager Valve supports the + following configuration attributes:

+ + + + +

Java class name of the implementation to use. This MUST be set to + org.apache.catalina.valves.CrawlerSessionManagerValve. +

+
+ + +

Regular expression (using java.util.regex) that the user + agent HTTP request header is matched against to determine if a request + is from a web crawler. If not set, the default of + .*GoogleBot.*|.*bingbot.*|.*Yahoo! Slurp.* is used.

+
+ + +

The minimum time in seconds that the Crawler Session Manager Valve + should keep the mapping of client IP to session ID in memory without any + activity from the client. The client IP / session cache will be + periodically purged of mappings that have been inactive for longer than + this interval. If not specified, the default value of 60 + will be used.

+
+ +
+ +
+ +
+ + -- 2.11.0
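The two tunable attributes described in the patch can be illustrated with a short Java sketch. It applies the default user-agent pattern quoted above with `java.util.regex` and purges mappings that have been idle longer than the inactive interval. The class `CrawlerValveAttributesSketch`, its method names, and the use of whole-string matching via `Matcher.matches()` are assumptions made for illustration and are not taken from the valve's source.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the two attributes; not the Tomcat implementation.
class CrawlerValveAttributesSketch {
    // Default pattern as quoted in the attribute description.
    static final Pattern DEFAULT_CRAWLER_AGENTS =
            Pattern.compile(".*GoogleBot.*|.*bingbot.*|.*Yahoo! Slurp.*");

    // Assumption: the whole User-Agent header must match the pattern.
    static boolean isCrawler(String userAgent) {
        return userAgent != null
                && DEFAULT_CRAWLER_AGENTS.matcher(userAgent).matches();
    }

    // Removes mappings idle for longer than the interval; returns how many were removed.
    static int purge(Map<String, Long> lastAccessMillisByIp,
                     long nowMillis, int inactiveIntervalSeconds) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it =
                lastAccessMillisByIp.entrySet().iterator();
        while (it.hasNext()) {
            long idleMillis = nowMillis - it.next().getValue();
            if (idleMillis > inactiveIntervalSeconds * 1000L) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        System.out.println(isCrawler(
                "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"));
        Map<String, Long> lastAccess = new HashMap<>();
        lastAccess.put("192.0.2.10", 0L);      // idle for 61 s at nowMillis
        lastAccess.put("192.0.2.11", 50_000L); // idle for 11 s at nowMillis
        System.out.println(purge(lastAccess, 61_000L, 60));
    }
}
```

Note that `java.util.regex` patterns are case-sensitive unless flagged otherwise, so in this sketch a user agent containing `Googlebot` does not match the `GoogleBot` alternative. A deployment that needs a different match would supply its own expression instead of the default.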