Web Services as a Data Source in Splunk - A How-To Guide

by Tomaso Vasella
on July 25, 2019

Key Points

Use Web Services as Data Sources in Splunk

  • Machine-to-machine communication is becoming increasingly important
  • The interfaces it relies on are often implemented as APIs in the form of web services
  • Data from these sources is often needed for analytics
  • Splunk is a popular tool for such tasks but has no built-in functions for accessing web-service-based data sources

Machine-to-machine communication plays an increasingly important role in the modern digital world. APIs are nothing new: as early as 2000, Salesforce and eBay offered web APIs. In recent years, however, their number has exploded to the point that a veritable API economy has emerged, with companies even building their business models around APIs.

Microservices and IoT would never have become as popular as they are today without the possibilities APIs offer. As more and more data is produced and stored, there is a growing demand to share and analyze this data and to extract useful information from it through standard interfaces. APIs in the form of web services play a key role here, and ever larger volumes of data and information are becoming accessible this way.

Using these data sources for analytics, correlations, etc. is a common task. This article presents one possible approach to accessing data from web-based APIs with the help of a Splunk app developed specifically for this purpose.

Splunk

Splunk is a popular tool for indexing, storing and visualizing machine data such as logs and metrics. It supports comprehensive analytics and correlations as well as complex monitoring and reporting tasks with charts, reports and alerts. Splunk runs on several modern operating systems and is available as a free, full-featured 60-day trial version.

Splunk apps

Splunk has a whole range of built-in capabilities for ingesting data from various sources, such as receiving syslog via UDP/TCP or reading files directly or via forwarders. To ingest data from an interface the product does not support, or to gain complete control over how the data is collected, additional functionality is needed. This is usually provided by an app or add-on.

Splunk has no built-in capability for accessing web-based APIs such as REST interfaces, so an app is needed for this. The Splunk SDK, which is available for various programming languages, can help here. This example uses the Python version.

Data inputs

Data sources are integrated in Splunk using data inputs, which are essentially definitions containing the protocol and, where applicable, the credentials used to access the data sources. An app can define its own data inputs, which can then be configured using the GUI. This method is often used for API-based data sources. The SDK includes functions that make creating these data inputs much easier. The data source used here is the VulDB API; the following information can, however, provide a general foundation and be applied to other data sources in a similar fashion.

Preparing the new app

The easiest way to start building a new app is to use the web GUI: On the home page, click on Manage Apps (gear icon) and then select Create app. In this example, VulDB was selected as the name for the app and for the directory. Click on Create app to create a minimal app with a directory structure that will be saved under $SPLUNK_HOME/etc/apps/VulDB/:

bin/
    README
default/
    app.conf
    data/
        ui/
            nav/
                default.xml
            views/
                README
local/
    app.conf
metadata/
    default.meta
    local.meta

For more information about the various folders and files belonging to apps, refer to the documentation.

Integrating the SDK

The required files from the Python SDK are stored in the bin subfolder. We recommend that you copy to bin only those SDK folders and files which you will actually be using in the app. In this example, the folder splunklib is copied from the SDK to the bin/packages/ directory. Your own scripts will then also be stored in bin.
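Copying only the needed part of the SDK can be scripted. The following is a minimal sketch, assuming the Splunk Python SDK has been checked out or unpacked locally so that its splunklib package sits at <sdk_dir>/splunklib; the helper name vendor_splunklib and both paths are hypothetical.

```python
import shutil
from pathlib import Path


def vendor_splunklib(sdk_dir, app_dir):
    """Copy <sdk_dir>/splunklib to <app_dir>/bin/packages/splunklib.

    Hypothetical helper: sdk_dir is a local copy of the Splunk Python SDK,
    app_dir is the app folder, e.g. $SPLUNK_HOME/etc/apps/VulDB.
    """
    src = Path(sdk_dir) / "splunklib"
    if not src.is_dir():
        raise FileNotFoundError(f"splunklib not found in {sdk_dir}")
    dest = Path(app_dir) / "bin" / "packages" / "splunklib"
    # Replace any previously copied version so the app ships exactly one SDK copy
    if dest.exists():
        shutil.rmtree(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Skip compiled bytecode; Python recreates it when needed
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
    return dest
```

For example, vendor_splunklib("splunk-sdk-python", "/opt/splunk/etc/apps/VulDB") would place the package where the sys.path line in the modular input script expects it.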

Accessing the data source

To access the data source, you need a client (in this case the Python requests library), a suitable way to process the data (here JSON) and a way to pass the results to Splunk for storage and indexing.

A modular input (a specific type of data input) is well suited to this and easy to set up using the SDK. It requires a script that performs the following three tasks:

  1. Return the scheme, which defines the input's configuration, its parameters and a description;
  2. Validate the parameter values entered via the web GUI. This step is optional but strongly recommended;
  3. Pass the data retrieved by the script to Splunk (streaming).

Also refer to the documentation and additional examples.

First, create a class derived from splunklib.modularinput.Script. The three steps just described are then implemented in the methods get_scheme, validate_input and stream_events provided for this purpose. Finally, the script needs a main block (if __name__ == "__main__") that executes it by calling the run() method.

To access APIs programmatically, it often makes sense to implement a dedicated API client. This makes it easier to encapsulate and modularize the details of the API interaction and to handle sessions and errors cleanly. One method of such an API client might look like this:

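import requests

# Method of the API client class. request_error_handler is a decorator
# defined in the client module (not shown) that handles request errors.
@request_error_handler
def get_latest_entries(self, latest=1000, details=0):
    # Ask the API for the most recent entries, returned as JSON
    post_data = {'format'  : 'json',
                 'recent'  : str(latest),
                 'details' : str(details)}

    return requests.post(self.url, data=post_data, headers=self.headers,
                         proxies=self.proxies, verify=self.verify)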
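The decorator request_error_handler used above is not shown. One possible shape, purely as an assumption, is a decorator that checks the HTTP status and turns failures into a logged None, together with a constructor that sets the attributes get_latest_entries relies on (url, headers, proxies, verify). The class name VulDBApi, the endpoint URL and the X-VulDB-ApiKey header name are assumptions to be checked against the VulDB API documentation.

```python
import functools
import logging

logger = logging.getLogger("VulDBApi")


def request_error_handler(func):
    """Hypothetical decorator: return the response on success, None on failure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            response = func(*args, **kwargs)
            # With requests, raise_for_status() raises an HTTPError for 4xx/5xx
            response.raise_for_status()
            return response
        except Exception as e:  # real code would narrow this to requests.RequestException
            logger.error("Request in %s failed: %s", func.__name__, e)
            return None
    return wrapper


class VulDBApi:
    """Hypothetical client skeleton providing the attributes used above."""

    def __init__(self, api_key, url="https://vuldb.com/?api", proxies=None, verify=True):
        self.url = url                              # assumed endpoint
        self.headers = {"X-VulDB-ApiKey": api_key}  # assumed header name
        self.proxies = proxies or {}                # e.g. {"https": "http://proxy:3128"}
        self.verify = verify                        # TLS certificate verification
```

With this design, get_latest_entries would be a method of VulDBApi, and callers must check for None before calling res.json(). Raising a custom exception instead is an equally valid choice if the modular input should log and skip the run.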

The following lines of code are a highly simplified example of a script for a modular input using the three previously mentioned methods from the SDK:

import sys
import os
import json

# Make the SDK copied to bin/packages/ importable
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'packages'))

import splunklib.modularinput as mi
# Hypothetical: the API client class from bin/VulDBApi.py; its constructor is assumed
from VulDBApi import VulDBApi

class MyScript(mi.Script):
    def get_scheme(self):
        # Describe the input and its parameters for the web GUI
        scheme = mi.Scheme("VulDB")
        scheme.description = ("Get information from VulDB, the number one "
                              "vulnerability database.")

        scheme.add_argument(mi.Argument(name="api_key",
                                        title="VulDB API Key",
                                        description="The key for accessing the VulDB API",
                                        data_type=mi.Argument.data_type_string,
                                        required_on_create=True,
                                        required_on_edit=False))
        return scheme

    def validate_input(self, v):
        # Reject API keys that contain anything other than letters and digits
        api_key = v.parameters.get('api_key')
        if api_key is not None and not api_key.isalnum():
            raise ValueError('VulDB API key must be alphanumeric')

    def stream_events(self, inputs, ew):
        for input_name, input_item in inputs.inputs.items():
            # res contains the response of a web request to the VulDB API,
            # such as the get_latest_entries() example above
            res = VulDBApi(input_item['api_key']).get_latest_entries()
            for data in res.json()['result']:
                try:
                    # Write each result entry as one JSON-formatted event
                    event = mi.Event()
                    event.stanza = input_name
                    event.data = json.dumps(data)
                    ew.write_event(event)
                except Exception as e:
                    ew.log('ERROR', 'An error has occurred writing data: {}'.format(e))

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))

Logging is optional but highly recommended. The event writer's log() method accepts different severity levels for this, e.g. INFO or ERROR. The log messages are written to the file splunkd.log ($SPLUNK_HOME/var/log/splunk/splunkd.log).

Pagination

If an API client requests large volumes of data, e.g. all data available for a certain timeframe, it may be useful or necessary to split the request into several partial requests, because many APIs limit the amount of data returned by a single request. This is normally handled through pagination: the client includes parameters in each request that specify where in the sequence of available data the response should begin (page number) and how much data it should contain (number of records per page). This requires corresponding logic on the client side, and the client must also keep track of the data it requested last so that subsequent requests can continue from there.
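A generic pagination loop with a simple checkpoint might look like the following sketch. It assumes a hypothetical fetch_page(page, per_page) callable that returns a list of records and an empty list past the end; the names page and per_page are placeholders, since every API names and counts its paging parameters differently. The next page to request is kept in a small JSON checkpoint file so that a later run resumes where the previous one stopped. In a modular input, the checkpoint directory that Splunk passes to the script (exposed by the Python SDK in inputs.metadata["checkpoint_dir"]) is a natural place for this file.

```python
import json
import os


def load_checkpoint(path):
    """Return the next page to request (1 if no usable checkpoint exists yet)."""
    try:
        with open(path) as f:
            return json.load(f)["next_page"]
    except (FileNotFoundError, KeyError, ValueError):
        return 1


def save_checkpoint(path, next_page):
    # Write to a temporary file first so a crash never leaves a half-written checkpoint
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_page": next_page}, f)
    os.replace(tmp, path)


def fetch_all(fetch_page, checkpoint_path, per_page=100):
    """Yield records page by page, resuming at the page stored in the checkpoint."""
    page = load_checkpoint(checkpoint_path)
    while True:
        records = fetch_page(page, per_page)
        if not records:
            break
        yield from records
        # Advance the checkpoint only after the whole page has been consumed
        page += 1
        save_checkpoint(checkpoint_path, page)
        if len(records) < per_page:
            break  # a short page means the end of the data was reached
```

Because the checkpoint is written only after a page has been fully consumed, an interrupted run repeats at most one page instead of skipping data. APIs that offer an ID or timestamp cursor can use that value as the checkpoint instead of a page number.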

Security for API keys

APIs can often only be used with a corresponding key. Such keys should be stored securely and never appear as plaintext in scripts or configuration files. Splunk offers storage passwords for this purpose: the password or API key is encrypted with a key stored on the Splunk server. Apart from system administrators with access to the file system, only users with admin rights can view these passwords.

Below are examples of how such credentials can be stored and retrieved; the value of self.api_key_label is used as the username.

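# self.service is a splunklib service object; in a modular input,
# the Script base class provides it as self.service

def protect_key(self, key):
    # Store the API key as a storage password under self.api_key_label,
    # replacing any existing entry with that username
    try:
        for storage_password in self.service.storage_passwords:
            if storage_password.username == self.api_key_label:
                self.service.storage_passwords.delete(username=self.api_key_label)
                break

        self.service.storage_passwords.create(key, self.api_key_label)
    except Exception as e:
        raise Exception("An error occurred protecting key: {}".format(e)) from e

def get_protected_key(self):
    # Return the decrypted API key, or None if no matching entry exists
    try:
        for storage_password in self.service.storage_passwords:
            if storage_password.username == self.api_key_label:
                return storage_password.content.clear_password
    except Exception as e:
        raise Exception("An error occurred retrieving protected key: {}".format(e)) from e
    return None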

Once the script is finished, it can be integrated into the app as follows:

  1. Package the script and the SDK
  2. Edit the file app.conf
  3. Create the file inputs.conf.spec
  4. Install the app by saving the package under $SPLUNK_HOME/etc/apps/

Using the directory structure created earlier, the modular input script is copied to the bin directory together with any other scripts you have written, such as the API client mentioned above. The script's file name (here VulDB.py) should match the scheme name used in inputs.conf.spec. The resulting directory tree looks like this:

bin/
    packages/
        splunklib/
    README
    VulDB.py
    VulDBApi.py
default/
    app.conf
    data/
        ui/
            nav/
                default.xml
            views/
                README
local/
    app.conf
metadata/
    default.meta
    local.meta

Next, modify the file app.conf, which defines various aspects of the app.

[install]
is_configured = 0

[ui]
is_visible = 1
label = VulDB

[launcher]
author = VulDB
description = VulDB
version = 0.0.1

The modular input must be declared manually. To do so, first create a README directory at the same level as bin, then create the file inputs.conf.spec in it:

[VulDB://<name>]
vuldb_lang =
api_key =
details =

Its entries must correspond to the parameters defined with scheme.add_argument in the modular input script (the simplified script above defines only api_key; a complete script would also define vuldb_lang and details). For more information, refer to the documentation.
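Because the spec file and the scheme are maintained by hand, a small consistency check can catch mismatches early. The following sketch is a hypothetical helper, not part of Splunk or the SDK: it reads the parameter names from the text of an inputs.conf.spec file and compares them with a list of argument names.

```python
import re


def spec_parameters(spec_text, scheme_name):
    """Return the parameter names listed under [<scheme_name>://<name>]."""
    params = set()
    in_stanza = False
    for line in spec_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "*")):
            continue  # skip blank lines and comment/description lines
        if line.startswith("["):
            in_stanza = line == f"[{scheme_name}://<name>]"
            continue
        if in_stanza:
            match = re.match(r"([A-Za-z_][A-Za-z0-9_]*)\s*=", line)
            if match:
                params.add(match.group(1))
    return params


def compare(spec_text, scheme_name, argument_names):
    """Return (missing_in_spec, missing_in_scheme) as sorted lists."""
    spec = spec_parameters(spec_text, scheme_name)
    args = set(argument_names)
    return sorted(args - spec), sorted(spec - args)
```

The argument names could be collected from the scheme object itself, e.g. [a.name for a in MyScript().get_scheme().arguments]. For the simplified script, which defines only api_key, the check would report details and vuldb_lang as present in the spec but missing from the scheme.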

The directory tree now looks like this:

bin/
    packages/
        splunklib/
    README
    VulDB.py
    VulDBApi.py
default/
    app.conf
    data/
        ui/
            nav/
                default.xml
            views/
                README
local/
    app.conf
metadata/
    default.meta
    local.meta
README/
    inputs.conf.spec

The final step is to store this entire directory tree in $SPLUNK_HOME/etc/apps/. The installation of the app and its modular input is now complete. Splunk must be restarted so that these additions are displayed in the web GUI. The app should now appear in the GUI, and the defined modular input can be found in the Settings menu under Data Inputs.

Configuring the data format

In this example, the API returns data in JSON format. For Splunk to index it correctly, a matching configuration is needed in the file props.conf, which is created in the default/ directory. The stanza name corresponds to the source type assigned to the data (here VulDB). If the data contains a timestamp, props.conf can also define how it is extracted and interpreted.

[VulDB]
TRUNCATE = 0
TIME_PREFIX = timestamp":.*create":
TIME_FORMAT = %s
TZ = UTC
INDEXED_EXTRACTIONS = JSON
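Before deploying, the TIME_PREFIX pattern can be tried out locally. The sketch below uses Python's re module, whose syntax agrees with Splunk's PCRE for a simple pattern like this one, and reads the digits following the match as epoch seconds, which is what TIME_FORMAT = %s means. The sample event is hypothetical and only mimics a timestamp/create layout; this is a rough check, not a reimplementation of Splunk's timestamp extraction.

```python
import re
from datetime import datetime, timezone

# Same pattern as TIME_PREFIX in the props.conf stanza above
TIME_PREFIX = r'timestamp":.*create":'

# Hypothetical sample event; the real VulDB JSON layout may differ
SAMPLE_EVENT = ('{"entry": {"id": "1", "title": "Example", '
                '"timestamp": {"create": 1563984000, "change": 1564000000}}}')


def extract_time(event, prefix=TIME_PREFIX):
    """Return the UTC time that follows the prefix, or None if nothing matches.

    Reads the digits right after the prefix as epoch seconds (TIME_FORMAT = %s);
    it does not reproduce Splunk's own timestamp extraction rules.
    """
    match = re.search(prefix, event)
    if match is None:
        return None
    digits = re.match(r"\s*(\d+)", event[match.end():])
    if digits is None:
        return None
    return datetime.fromtimestamp(int(digits.group(1)), tz=timezone.utc)


print(extract_time(SAMPLE_EVENT))  # 2019-07-24 16:00:00+00:00
```

If the check returns None for real API output, for example because the value is quoted, the pattern or format should be revisited before the data is indexed.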

The last step is to create and configure a new instance of the modular input in the web GUI based on the parameters defined in the script and in the file inputs.conf.spec. Here you can also define which Splunk source type and which index to use when saving data. Once the script has been executed successfully, the data that has been retrieved is indexed, saved and available for further use.

Conclusion

Using web-based APIs as data sources for analytics is a popular use case. With Splunk and its SDK, such solutions can be developed relatively easily in the form of apps. Splunk's native support for common data formats such as JSON, together with powerful libraries for web access, has proven very useful here. To communicate with the API and handle errors, it is usually a good idea to implement a dedicated API client. It is also important to adequately protect credentials and API keys and to take API-specific characteristics such as pagination into account. Once the data has been indexed and stored in Splunk, it is available for analytics, correlations and visualizations. The approach described here can be applied similarly to other data sources, providing a quick-to-implement and broadly useful way of analyzing data.

About the Author

Tomaso Vasella

Tomaso Vasella holds a Master's degree in Organic Chemistry from ETH Zürich. He has been working in the cybersecurity field since 1999 and has worked as a consultant, engineer, auditor and business developer. (ORCID 0000-0002-0216-1268)
