Security Testing
Tomaso Vasella
Use Web Services as Data Sources in Splunk
Microservices and the IoT would never have become as popular as they are today without the possibilities afforded by APIs. With the rapidly increasing production and storage of data, there is also a growing desire to share, analyze and extract useful information from this data using standard interfaces. APIs in the form of web services play a key role in machine-to-machine communication, as ever-increasing volumes of data and information become accessible this way.
Using these data sources for analytics, correlations, etc. is a common task. This article presents one possible approach to accessing data from web-based APIs with the help of a Splunk app developed specifically for this purpose.
Splunk is a popular tool that indexes, saves and visualizes machine data, such as logs, metrics and so forth, making it possible to carry out comprehensive analytics and draw correlations as well as to handle complex monitoring and reporting tasks with graphics, reports, and warnings. Splunk runs on several modern operating systems and is available as a free, full-featured 60-day demo version.
Splunk has a whole range of built-in capabilities for accessing data from various sources, such as syslog via UDP/TCP or reading files with file readers and forwarders. To source data from an interface that is not already supported by the product or to gain complete control over how the data is sourced, however, you need certain additional features. This is usually achieved using an app or add-on.
Splunk does not include built-in capabilities for accessing web-based APIs such as REST interfaces, meaning that you need an app to do this. The SDK, which is available for various programming languages, can help; in our example, we will be using the version for Python.
Data sources are integrated in Splunk using data inputs, which are essentially definitions containing the protocol and, where applicable, the credentials used to access the data sources. An app can define its own data inputs, which can then be configured using the GUI. This method is often used for API-based data sources. The SDK includes functions that make creating these data inputs much easier. The data source used here is the VulDB API; the following information can, however, provide a general foundation and be applied to other data sources in a similar fashion.
The easiest way to start building a new app is to use the web GUI: On the home page, click on Manage Apps (gear icon) and then select Create app. In this example, VulDB was selected as the name for the app and for the directory. Click on Create app to create a minimal app with a directory structure that will be saved under $SPLUNK_HOME/etc/apps/VulDB/:
bin/
    README
default/
    app.conf
    data/
        ui/
            nav/
                default.xml
            views/
                README
local/
    app.conf
metadata/
    default.meta
    local.meta
For more information about the various folders and files belonging to apps, refer to the documentation.
The required files from the Python SDK are stored in the bin subfolder. We recommend copying to bin only those SDK folders and files that will actually be used in the app. In this example, the folder splunklib is copied from the SDK to the bin/packages/ directory. Your own scripts will then also be stored in bin.
To access the data source, you need a client (the requests Python library in this case), a suitable method of processing the data (JSON here), and a way to subsequently save and index it in Splunk.
Creating a modular input (a certain type of data input) is perfect for this and is very easy to set up using the SDK. This requires writing a script that carries out the following three activities:

1. Defining the input scheme, i.e. the configurable parameters of the data input (such as the API key)
2. Validating the configured parameter values
3. Retrieving the data and streaming it to Splunk as events

Also refer to the documentation and additional examples.
First, a class derived from splunklib.modularinput.Script is created. The three steps just described are then implemented with the methods get_scheme, validate_input and stream_events provided for this purpose. You must also add the standard if __name__ == "__main__" block, which executes the script.
To access APIs programmatically, it often makes sense to implement a dedicated API client. That way, the details of the API interaction can be abstracted and modularized more easily, and things like session handling and error handling can be managed more cleanly. One possible method of such an API client might look like this:
@request_error_handler
def get_latest_entries(self, latest=1000, details=0):
    post_data = {'format': 'json',
                 'recent': str(latest),
                 'details': str(details)}
    return requests.post(self.url, data=post_data, headers=self.headers,
                         proxies=self.proxies, verify=self.verify)
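The request_error_handler decorator used above is not shown here in full. A minimal sketch of what it might look like, assuming its job is simply to turn transport errors and HTTP error statuses into a single exception type:

import functools
import requests

def request_error_handler(func):
    """Hypothetical decorator: unify error handling for web requests."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            response = func(*args, **kwargs)
            response.raise_for_status()  # raise on HTTP 4xx/5xx responses
            return response
        except requests.exceptions.RequestException as e:
            raise Exception('API request failed: {}'.format(e))
    return wrapper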
The following lines of code are a highly simplified example of a script for a modular input using the three previously mentioned methods from the SDK:
import sys
import os
sys.path.insert(0, os.path.sep.join([os.path.dirname(__file__), 'packages']))
import splunklib.modularinput as mi
import json

class MyScript(mi.Script):

    def get_scheme(self):
        # Define the input scheme and its configurable parameters
        scheme = mi.Scheme("VulDB")
        scheme.description = "Get information from VulDB, the number one vulnerability database."
        scheme.add_argument(mi.Argument(
            name="api_key",
            title="VulDB API Key",
            description="The key for accessing the VulDB API",
            data_type=mi.Argument.data_type_string,
            required_on_create=True,
            required_on_edit=False))
        return scheme

    def validate_input(self, v):
        # Validate the configured parameter values
        if 'api_key' in v.parameters:
            if not v.parameters['api_key'].isalnum():
                raise ValueError('VulDB API key must be alphanumeric')

    def stream_events(self, inputs, ew):
        for input_name, input_item in inputs.inputs.items():
            # res contains the response of a web request to the VulDB API,
            # such as the example shown above
            for data in res.json()['result']:
                try:
                    event = mi.Event()
                    event.stanza = input_name
                    event.data = json.dumps(data)
                    ew.write_event(event)
                except Exception as e:
                    ew.log('ERROR', 'An error has occurred writing data: {}'.format(e))

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))
Logging is optional but highly recommended. You can use the method log() of the event writer with varying criticality levels for this, e.g. INFO or ERROR. The log events are written to the file splunkd.log.
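For instance, a short status message could be logged at the end of stream_events; this is a sketch, and the variable event_count is hypothetical:

# Write an informational entry to splunkd.log via the event writer
ew.log('INFO', 'VulDB input finished, {} events written'.format(event_count))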
If an API client requests large volumes of data, e.g. all of the data available for a certain timeframe, it may be useful or necessary to split the request into partial requests, because many APIs limit the amount of data returned per request. This is normally managed with pagination: the client includes parameters in each request that specify where in the sequence of available data the request begins (page number) and how much data should be returned (number of records per page). This requires corresponding logic on the client side, and the client must keep track of which data it last requested so that subsequent requests continue where the previous ones left off.
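A minimal sketch of such a pagination loop as a further method of the API client; the parameter names page and page_size are hypothetical and have to be replaced with whatever the particular API actually expects:

def get_all_entries(self, page_size=500):
    """Hypothetical pagination loop: fetch one page per request until a
    partial (last) page is returned."""
    results = []
    page = 1
    while True:
        post_data = {'format': 'json',
                     'page': str(page),            # where in the data set to start
                     'page_size': str(page_size)}  # records per request
        response = requests.post(self.url, data=post_data,
                                 headers=self.headers, proxies=self.proxies,
                                 verify=self.verify)
        batch = response.json().get('result', [])
        results.extend(batch)
        if len(batch) < page_size:  # fewer records than requested: last page
            break
        page += 1
    return results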
APIs often require a key for access. These keys should be stored securely and never appear in plaintext in scripts or configuration files. Splunk offers the option to use what are called storage passwords: the password or API key is encrypted using a key that is stored on the Splunk server. Apart from system administrators with access to the file system, only users with admin rights can view these passwords.
Below are some examples of how these credentials are stored and retrieved; the string api_key_label is used as the username in this example.
def protect_key(self, key):
    try:
        for storage_password in self.service.storage_passwords:
            if storage_password.username == self.api_key_label:
                self.service.storage_passwords.delete(username=self.api_key_label)
                break
        self.service.storage_passwords.create(key, self.api_key_label)
    except Exception as e:
        raise Exception("An error occurred protecting key: {}".format(e))

def get_protected_key(self):
    try:
        for storage_password in self.service.storage_passwords:
            if storage_password.username == self.api_key_label:
                return storage_password.content.clear_password
    except Exception as e:
        raise Exception("An error occurred retrieving protected key: {}".format(e))
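Assuming these two methods live on the modular input class (self.service is made available by the SDK when the script runs under splunkd), the key can then be moved out of the input configuration and read back where needed, for example in stream_events. A sketch:

# Sketch: store the key from the input configuration in encrypted form,
# then only ever work with the protected copy
api_key = input_item.get('api_key')
if api_key:
    self.protect_key(api_key)
api_key = self.get_protected_key()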
After you are done writing the script, it can be integrated in the app by following these steps:

1. Modify the file app.conf
2. Create the file inputs.conf.spec
3. Store the app directory tree in $SPLUNK_HOME/etc/apps/
Using the directory structure created previously, the modular input script is copied to the bin directory along with all other scripts that have been created, such as the previously mentioned API client. The resulting directory tree then looks like this:
bin/
    packages/
        splunklib/
    VulDB.py
    VulDBApi.py
    README
default/
    app.conf
    data/
        ui/
            nav/
                default.xml
            views/
                README
local/
    app.conf
metadata/
    default.meta
    local.meta
Next, modify the file app.conf, which defines various aspects of the app:
[install]
is_configured = 0

[ui]
is_visible = 1
label = VulDB

[launcher]
author = VulDB
description = VulDB
version = 0.0.1
The modular input must be configured manually. This requires first creating the README directory on the same level as bin. The file inputs.conf.spec is then created in that directory:
[VulDB://<name>]
vuldb_lang =
api_key =
details =
These lines must correspond with the parameters defined using the method scheme.add_argument in the modular input script. For more information, refer to the documentation.
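When an instance of the input is later configured, Splunk persists the values as a stanza in an inputs.conf file. A hypothetical example with illustrative values (interval is a standard Splunk input parameter that controls how often the script runs, in seconds; the key would subsequently be moved into secure storage as described above):

[VulDB://vuldb_recent]
vuldb_lang = en
api_key = <api key>
details = 0
interval = 3600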
The following directory tree will then be created:
bin/
    packages/
        splunklib/
    VulDB.py
    VulDBApi.py
    README
default/
    app.conf
    data/
        ui/
            nav/
                default.xml
            views/
                README
local/
    app.conf
metadata/
    default.meta
    local.meta
README/
    inputs.conf.spec
The final step is to store this entire directory tree in $SPLUNK_HOME/etc/apps/. The installation of the app and its modular input is now complete. Splunk must be restarted (e.g. with $SPLUNK_HOME/bin/splunk restart) so that these additions are displayed in the web GUI. The app should now appear in the GUI, and the defined modular input can be found in the Settings menu under Data Inputs.
The data retrieved in this example is returned by the API in JSON format. An appropriate configuration is set up so that Splunk can index the data correctly. This is handled in the file props.conf, which is located in the default/ directory. Here you can also define how the timestamp is extracted from the data and how it should be used (assuming the data contains a timestamp).
[VulDB]
TRUNCATE = 0
TIME_PREFIX = timestamp":.*create":
TIME_FORMAT = %s
TZ = UTC
INDEXED_EXTRACTIONS = JSON
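The TIME_PREFIX expression positions the timestamp extraction just behind the create key inside a timestamp object, where a Unix epoch value (TIME_FORMAT = %s) follows. This implies response data roughly of the following shape; the fragment is reconstructed from the expression and purely illustrative, not an authoritative excerpt of the API response:

"timestamp": {
    "create": "1612345678"
}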
The last step is to create and configure a new instance of the modular input in the web GUI, based on the parameters defined in the script and in the file inputs.conf.spec. Here you can also define which Splunk source type and which index to use when saving the data. Once the script has executed successfully, the retrieved data is indexed, saved and available for further use.
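A simple search then confirms that events are arriving; this assumes VulDB was chosen as the source type when configuring the input:

sourcetype="VulDB" | head 10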
Using web-based APIs as data sources for analytics is a common use case, and it is relatively easy to develop solutions for it with Splunk and its SDK in the form of apps. Splunk's native support for common data formats like JSON as well as the availability of powerful libraries for web access prove very useful here. To communicate with the API and handle errors cleanly, it is usually a good idea to implement a dedicated API client. It is also important to protect credentials and API keys adequately and to consider the specific characteristics of the integrated APIs, such as pagination. Once the data is indexed and saved in Splunk, it is available for analytics, correlations and visualizations. The procedure described here can be applied to other data sources in a similar fashion, providing a quickly implemented and broadly useful solution for analyzing data.