Analyze Web Dispatcher logs with Kibana

SAP Web Dispatcher (WD) is the entry point for users accessing your web-enabled applications. These can be any HTML service or app running on NetWeaver, or on other systems like HANA XS. WD has offered reverse proxy functionality for SAP systems for over a decade, and while until recently its main usage area was SAP Portal and Web Dynpro applications, with the rise of Fiori WD is far more exposed. Naturally, more and more companies will use it. Of course, WD can be integrated into SolMan and therefore managed and monitored from there.

While this is nice, analytical requirements for a web application can be quite complex. A standard approach is to use a web analytics application that helps you find out how your site is used (sessions, entry/exit points, campaigns). While this gives you transparency about the site experience of your end users, it is not very useful for a more administratively driven approach: what kind of content is passed through WD, and what impact configuration parameters have on CSS, JavaScript, response times and data throughput. Besides, your users must be OK with the tracking code, and modern browsers allow users to deactivate tracking cookies and related technology (do not track).

WD is the single point of entry to web applications; it contains valuable information about their usage. This information can heavily influence your understanding of the app. Think about finding the bottlenecks of the app, the most accessed resources, usage patterns, and so on. The log of the Web Dispatcher contains all of this information. You only have to gather it, store it and analyze it.

Basically, WD is a reverse proxy, and in the non-SAP context, Apache is one of the most used reverse proxies. Analyzing HTTP traffic is a common task for web site administrators, so it is no big surprise to find a huge list of Apache log analysis tools available. The Swiss army knife among them is logstash. Now, logstash does not really analyze web server logs. It rather parses them and sends them to another tool for storing and analyzing the data, like elasticsearch.

To learn how to configure your own system for WD and logstash, please read the how-to document I posted here.

This is the default use case of logstash: parse logs, extract the information and send it to elasticsearch for storing and retrieval. Once the information is stored in elasticsearch, Kibana can use it to build statistics and analytical views. Think about access statistics or trends.

The advantage of the combination of logstash, elasticsearch and Kibana over a web analytics app is that you do not have to install a tracking / analyzing part in your web application. You can also analyze parts of your web page that are normally invisible, like resources. Depending on your WD configuration, you gain insight into how WD works, like how long it takes to retrieve files from the SAP system.

Information retrieval

After connecting Kibana to elasticsearch it is easy to surf the data and to create your own dashboard. Drilling down is no problem and while logstash is running in the background adding new data, the dashboard can reflect this instantly. A few sample reports may include:

  • Total number of files served by WD
  • Total number of MB transferred
  • Hits to resources

You can correlate this data to find out interesting stuff like:

  • Number of requests: a cached resource is served locally by the browser, which can drastically decrease the load on WD and the backend.
  • Requests for a specific file / site
  • Average response time for CSS or JS files: does it make sense to use WD as a web cache? Think about it: the data may indicate how long WD waits to retrieve a file from ICM; multiply that by the number of requests it takes for a user to access a resource and you have an idea of the time wasted.
  • Data sent by serving static files: is your cache configuration correct?
  • What is the largest file requested?
  • Usage: your application is accessible only internally; do the access statistics reflect this?
  • Hitting a lot of 304, 404 or 500? What is causing this?
  • Monitor ICM admin resources to find out possible attack vectors.
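
Once the data is indexed, such reports can also be pulled straight from elasticsearch with its query DSL. A minimal sketch, assuming elasticsearch runs locally on the default port 9200 and using the index and fields (wd, bytes, response) defined in the logstash configuration later in this document:

    curl -XPOST 'http://localhost:9200/wd/_search?pretty' -d '{
      "size": 0,
      "aggs": {
        "total_bytes": { "sum":   { "field": "bytes" } },
        "responses":   { "terms": { "field": "response" } }
      }
    }'

Kibana builds exactly this kind of aggregation behind its panels; the dashboard is just a more convenient way to get there.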

SAP WebDispatcher and Logstash – installation and configuration

This document explains how to install and configure an environment for analyzing SAP Web Dispatcher (WD) logs with logstash, elasticsearch and Kibana under Linux. Kibana 3 needs a running web server. The example shown here uses nginx, but this document won't detail how to set up nginx.

Components referred to in this document:

SAP WebDispatcher

“The SAP Web dispatcher lies between the Internet and your SAP system. It is the entry point for HTTP(s) requests into your system, which consists of one or more SAP NetWeaver application servers.” http://help.sap.com/saphelp_nw73ehp1/helpdata/en/48/8fe37933114e6fe10000000a421937/frameset.htm

Logstash

“logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs.” http://logstash.net/

Elasticsearch

“Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.” http://www.elasticsearch.org/

Kibana

“Kibana is an open source, browser based analytics and search dashboard for ElasticSearch.” http://www.elasticsearch.org/overview/kibana/

Nginx

“nginx (pronounced engine-x) is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server.” http://wiki.nginx.org/Main

 

Install Elasticsearch

Installation in 3 steps: http://www.elasticsearch.org/overview/elkdownloads/

  1. Command: wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.tar.gz

  2. Extract archive

    Command: tar -zxvf elasticsearch-1.4.2.tar.gz

  3. Start Elasticsearch

    Command

    cd elasticsearch-1.4.2

    cd bin

    ./elasticsearch
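
To check that elasticsearch came up, a quick test helps (assuming the default HTTP port 9200 has not been changed):

    curl http://localhost:9200/

The response should be a small JSON document containing the cluster name and version.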

Install Logstash

Installation in 3 steps: http://www.elasticsearch.org/overview/elkdownloads/

  1. Command:

    wget https://download.elasticsearch.org/logstash/logstash/logstash-contrib-1.4.2.tar.gz

  2. Extract

    Command: tar -zxvf logstash-contrib-1.4.2.tar.gz

  3. Run logstash. Before logstash can be run, it must be configured. Configuration is done in a config file.

Logstash configuration

The configuration of logstash depends on the log configuration of WD. Logstash comes out of the box with everything it takes to read Apache logs. In case WD is configured to write logs in Apache format, no additional configuration is needed. WD also offers the option to write additional information to the log.
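
For reference, a minimal grok filter for an Apache-style (CLF) log could look like the following sketch; COMMONAPACHELOG is one of the patterns shipped with logstash:

    filter {
      grok {
        # standard pattern for the Apache common log format
        match => { "message" => "%{COMMONAPACHELOG}" }
      }
    }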

Log formats: http://help.sap.com/saphelp_nw73ehp1/helpdata/en/48/442541e0804bb8e10000000a42189b/content.htm?frameset=/en/48/8fe37933114e6fe10000000a421937/frameset.htm&current_toc=/en/ed/2429371ec14c23a7508affa1280d07/plain.htm&node_id=46&show_children=false

  • CLF: this is how Apache logs. It contains most of the information needed.
  • CLFMOD: same format as CLF, but without form fields and parameters, for security reasons.
  • SAP: writes basic information and no client IP, but contains the processing time on the SAP application server. This is a field you will really need.
  • SMD: for SolMan Diagnostics; same as SAP, but additionally contains the correlation ID.
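
The log format is chosen in the WD profile via the icm/HTTP/logging_<xx> parameter. A sketch of what such a parameter could look like (the prefix, file path and switch interval are only examples and must match your setup):

    icm/HTTP/logging_0 = PREFIX=/, LOGFILE=/usr/sap/webdispatcher/access_log, LOGFORMAT=SMD, SWITCHTF=day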

As mentioned before, for CLF logstash comes with everything already configured. A log format that makes sense is SMD because of the response time. In that case, logstash must be configured to parse the WD log correctly. Logstash uses regular expressions to extract information. To make logstash understand the SMD log format, the correct regular expression must be made available. Grok uses the pattern file to extract the information from the log: http://logstash.net/docs/1.4.2/filters/grok. The standard pattern file can be found here: https://github.com/elasticsearch/logstash/tree/v1.4.2/patterns

For instance, to extract the value of the correlation ID when the log format is set to SMD, the regular expression is:

CORRELATIONID [a-zA-Z]\[\-\]

For WD with the SMD log format, the complete regular expression is:

TEST2 \|

WEBDISPATCHER \[%{HTTPDATE:timestamp}\] %{USER:ident} "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \[%{NUMBER:duration}\] %{CORRELATIONID:correlationid} %{TEST2:num1}

 

When the IP is added to the WD log with SMD, the regular expression is

TEST2 \|

CORRELATIONID [a-zA-Z]\[\-\]

WEBDISPATCHERTPP %{IP:ip} \[%{HTTPDATE:timestamp}\] %{USER:ident} "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)\[%{NUMBER:duration}\] %{CORRELATIONID:correlationid} %{TEST2:num1}

 

You can find an example pattern file here: https://github.com/tobiashofmann/wd_logstash. The standard grok pattern file defines regular expressions for user id, IPv4/6, data, etc.
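
Since the filter section below uses patterns_dir => "./patterns", the custom patterns need to be in a patterns directory relative to where logstash is started. A sketch (file name is only an example):

    # in the directory from which logstash will be started
    mkdir -p patterns
    # copy the TEST2, CORRELATIONID and WEBDISPATCHER(TPP) definitions shown
    # above into a file in that directory, e.g. patterns/webdispatcher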

The actual configuration file consists of three sections: input, filter and output. The input part defines the logs to read, the filter part defines the filters to be applied to the input, and the output part specifies where to write the result. All three sections go into a single configuration file (logstash.conf in the run command below).
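
Its overall shape is just the three blocks; each one is filled in in the following subsections:

    input {
      # where to read the log lines from
    }
    filter {
      # how to parse and enrich each line
    }
    output {
      # where to send the parsed events
    }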

Input

input {
  file {
    type => "wd"
    path => ["/usr/sap/webdispatcher/access*"]
    start_position => "beginning"
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

All files starting with access in the directory /usr/sap/webdispatcher are read by logstash. The codec parameter ensures URLs with special characters are read correctly. Every line read is tagged with the type wd.

Filter

filter {
  if [type] == "wd" {
    grok {
      patterns_dir => "./patterns"
      match => { "message" => "%{WEBDISPATCHER}" }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    mutate {
      convert => [ "bytes", "integer" ]
      convert => [ "duration", "integer" ]
    }
  }
}

The filter is applied to all lines with type wd (see input). Grok applies the regular expressions; the patterns_dir parameter tells it where to find the customized patterns for WD. The date of the event is taken from the timestamp field. If this is not set, logstash uses the time the line was read; what you want is the timestamp of the logged HTTP request. To facilitate later analysis, the values bytes and duration are converted to integers.

Output

output {
  elasticsearch {
    host => localhost
    index => "wd"
    index_type => "logs"
    protocol => "http"
  }
}

As output, a local elasticsearch server is defined. The logs are written to the index wd with index type logs. This stores the log lines as documents in elasticsearch and makes them accessible for further processing.

A sample configuration file can be found here https://github.com/tobiashofmann/wd_logstash

 

Run logstash

To run logstash and let it read the WD logs, use the following command:

./logstash -f logstash.conf

This will start logstash. It takes a few seconds for the JVM to come up and read the first log file. Afterwards the log files are parsed and sent over to elasticsearch.
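
To verify that events are actually arriving, the document count of the wd index can be checked (assuming elasticsearch is reachable on localhost:9200, as configured in the output section):

    curl 'http://localhost:9200/wd/_count?pretty'

The count should keep growing while logstash works through the log files.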

Kibana

Installation steps: http://www.elasticsearch.org/overview/elkdownloads/

  1. Go to the HTML directory configured for NGinx, like /var/www/html

    Command: cd /var/www/html

  2. Command: wget https://download.elasticsearch.org/kibana/kibana/kibana-3.1.2.tar.gz

  3. Extract archive

    Command: tar -zxvf kibana-3.1.2.tar.gz

  4. Configure nginx

    Add a location in nginx configuration file to make the kibana application available under /kibana

    location /kibana {

        alias /var/www/html/<dir of kibana>;

    }

  5. Access Kibana in a web browser: http://webserver:port/kibana
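
Kibana 3 reads the elasticsearch URL from config.js in its installation directory; by default it points to port 9200 on the host serving Kibana. If elasticsearch runs on a different host or port, adjust that setting (the host name below is only an example):

    elasticsearch: "http://elasticsearch-host:9200",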