Extended Common Log File Format

Most Web servers offer the option to store Web log files in either the common log format or a proprietary format. The common log file format is supported by the majority of analysis tools, but the information recorded about each server transaction is fixed. In many cases it is desirable to record more information, while sites sensitive to personal data issues may wish to omit certain data. In addition, ambiguities arise in analyzing the common log file format because field separator characters may in some cases occur within fields. The extended log file format is designed to meet the following needs:

  • Permit control over the data recorded.
  • Support needs of proxies, clients and servers in a common format.
  • Provide robust handling of character escaping issues
  • Allow exchange of demographic data.
  • Allow summary data to be expressed.

Format

An extended log file contains a sequence of lines containing ASCII characters terminated by either the sequence LF or CRLF. Log file generators should follow the line termination convention for the platform on which they are executed. Analyzers should accept either form. Each line may contain either a directive or an entry.

Entries consist of a sequence of fields relating to a single HTTP transaction. Fields are separated by whitespace; the use of tab characters for this purpose is encouraged. If a field is unused in a particular entry, a dash "-" marks the omitted field. Directives record information about the logging process itself.
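The line rules above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `classify_line` is hypothetical, it assumes that directives are the lines beginning with `#` (as in the example file on this page), and it treats every run of whitespace as a field separator, so quoted fields containing spaces are not handled.

```python
def classify_line(raw):
    """Return ("directive", name, value) or ("entry", fields) for one log line."""
    # Accept either LF or CRLF line termination.
    line = raw.rstrip("\r\n")
    if line.startswith("#"):
        # Directive, e.g. "#Version: 1.0" -> name "Version", value "1.0".
        name, _, value = line[1:].partition(":")
        return ("directive", name.strip(), value.strip())
    # Entry: fields are separated by whitespace (tabs or spaces).
    # A lone "-" marks an omitted field; it is mapped to None here.
    fields = [None if f == "-" else f for f in line.split()]
    return ("entry", fields)


print(classify_line("#Version: 1.0\r\n"))
print(classify_line("00:34:23\tGET\t-\n"))
```

Mapping the dash to `None` is only one possible choice; an analyzer could equally keep the literal "-" and interpret it later.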

Example

An extended common log format file is a variant of the common log format file that simply appends two fields to the end of each line: the referer and the user agent.

The following is an example file in the extended log format:

#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html
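
To show how the `#Fields` directive determines the meaning of each entry, here is a small Python sketch that reads a file like the example above into dictionaries. The function name `parse_extended_log` is hypothetical, and the code assumes simple whitespace-separated fields with no quoted strings; it is an illustration under those assumptions, not a complete analyzer.

```python
def parse_extended_log(text):
    """Parse extended log text into (directives, records)."""
    directives = {}
    field_names = []
    records = []
    # splitlines() accepts both LF and CRLF line endings.
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            name, _, value = line[1:].partition(":")
            name = name.strip()
            directives[name] = value.strip()
            if name == "Fields":
                # Entries that follow are interpreted using these names.
                field_names = value.split()
            continue
        # A lone "-" marks an omitted field; it is mapped to None here.
        values = [None if v == "-" else v for v in line.split()]
        records.append(dict(zip(field_names, values)))
    return directives, records


sample = """#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
"""

directives, records = parse_extended_log(sample)
print(directives["Version"])  # 1.0
print(records[0]["cs-uri"])   # /foo/bar.html
```

Because the field names come from the file itself, the same code would read a log that records a different set of fields without any change.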

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Created by Lan Man

Last Modified: Nov 11, 2002