Splunk is a widely used log hoarding and analysis system with a fairly full-featured Software Development Kit (SDK) available for a variety of languages, one of which is Python.

Unfortunately, it is only available for the now ancient Python 2. For the most part this is not a problem, except when a company mandate or some other requirement stipulates that Python 3 must be used.

An issue raised on Splunk’s GitHub in 2014 requests a Python 3 version. At the time of writing, the most recent comment, from March 2017, asks how there is still no Python 3 version in 2017.

Running the 2to3 utility (a tool that converts Python 2 code to Python 3 compatible code) over the Python 2 SDK fails with numerous errors.

There are a few attempted ports kicking around in various git repositories on the internet, but none of them worked for me when I tried. Even if a port did work, we would then be stuck maintaining it every time Splunk changes its upstream software: an undesirable state of affairs!

Cheating

So what to do?

There aren’t too many options since Python 2 does some things quite differently from Python 3.

In the end I decided that the best option was to do the unthinkable…

Yes, that nasty solution you didn’t want to consider.

We call Python 2 from our Python 3 code.

On the face of it, this seems really nasty, with some annoying things to contend with:

  • getting data into and out of Python 2
  • dealing with any errors that crop up

Both pose a challenge, but with a few tricks they can be handled in a way that makes for a fairly seamless integration.

One of the nicest things about Python is the plethora of libraries one can call on.

The one that helps us here is pickle.

Pickle allows us to store a Python data structure in a file on disk.

This removes the problem of having to parse the unpredictable output of a Python 2 script when it is called from Python 3. We really can’t predict all the possible failure modes.

Another nice feature of pickle is that you can specify the protocol number, which keeps the file format compatible between Python 2 and 3: protocol 2 is the highest version Python 2 understands, and Python 3 can still read and write it. The same method works for getting data into the Python 2 part.
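To see the protocol compatibility in action, here is a small Python 3 sketch that pickles a request with protocol=2 and reads it back (pickle.dumps/loads behave the same as dump/load on a file). It also illustrates one wrinkle when data flows back from Python 2: Python 3 decodes Python 2 byte strings (str) using the encoding argument of pickle.loads, which defaults to ASCII, so non-ASCII data may need encoding="bytes" (or "latin1"). The byte string py2_str_pickle is hand-built to resemble what Python 2 emits for a UTF-8 str; it is illustrative, not captured from a real run, and the request keys are made up.

```python
import pickle

# An example request; the keys are illustrative only.
request = {"query": "search index=main | head 10", "count": 10}

# Protocol 2 is the highest pickle protocol that Python 2 can read.
data = pickle.dumps(request, protocol=2)

# Protocol 2 pickles begin with the PROTO opcode (0x80) and the version byte.
header = data[:2]
round_trip = pickle.loads(data)

# A protocol 2 pickle of the Python 2 byte string 'caf\xc3\xa9':
# PROTO 2, SHORT_BINSTRING ("U") of length 5, the raw bytes, then STOP (".").
py2_str_pickle = b"\x80\x02U\x05caf\xc3\xa9."

# With the default encoding="ASCII" these non-ASCII bytes cannot be decoded;
# encoding="bytes" hands them over untouched so we can decode them ourselves.
raw = pickle.loads(py2_str_pickle, encoding="bytes")
text = raw.decode("utf-8")

print(header, raw)
```

If your Splunk results are plain ASCII this never bites, but it is worth knowing before a stray accented hostname turns up.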

Errors and stack traces from the Python 2 part can be collected via Popen and raised as errors by the Python 3 script. This way you don’t lose visibility of what’s going on.

Because these errors travel over stderr, completely separate from the data passed between Python 2 and 3 through pickle, we don’t have to worry about random error messages or other unknowns contaminating the data structures. All necessary parameters, queries and arguments can be passed to the Python 2 part through pickle, so there is no need to bury configuration parameters in the Python 2 code.

Show Me the Code Already!

Here is the critical code snippet.

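import pickle
from subprocess import Popen, PIPE

# args, swap_file, python2 and fragment_file are defined earlier.
with open(swap_file, "wb") as swap:
    pickle.dump(args, swap, protocol=2)
call_status = Popen([python2, fragment_file, swap_file], stdout=PIPE, stderr=PIPE)
stdout, stderr = call_status.communicate()
if call_status.returncode != 0 or stderr:
    raise Exception(stderr.decode(errors="replace"))
with open(swap_file, "rb") as swap:
    output = pickle.load(swap)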

pickle.dump writes the request to swap_file using pickle protocol version 2, the highest version that Python 2 can read.

We then run python2 (here a variable holding the system path of the Python 2 executable), passing it fragment_file, the piece of code that calls the Splunk SDK, and swap_file, which contains the Splunk query.

Any errors returned by Python 2 are picked up from Popen and raised.

If there are no errors, the Python 2 fragment has written the data it received from Splunk back to swap_file, and Python 3 loads it from there into the output variable.
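Putting the pieces together, the following Python 3 sketch wraps the snippet in a reusable function. The name call_python2 and its parameters are hypothetical, not part of any library. It creates the swap file with tempfile.mkstemp and removes it afterwards. One design choice differs from the snippet above: it raises on a non-zero exit status rather than on any output to stderr, so a harmless warning printed by the Python 2 side does not abort the call, while a real traceback is still surfaced in the exception. Which policy suits you depends on how noisy your Python 2 environment is.

```python
import os
import pickle
import subprocess
import tempfile


def call_python2(args, fragment_file, python2="python2"):
    """Pickle args into a swap file, run fragment_file under python2,
    and return whatever the fragment pickled back into the same file."""
    # mkstemp creates the swap file securely and gives us an open descriptor.
    fd, swap_file = tempfile.mkstemp(suffix=".pickle")
    try:
        # Protocol 2 so the Python 2 interpreter can unpickle the request.
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(args, handle, protocol=2)

        result = subprocess.run(
            [python2, fragment_file, swap_file],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        # A non-zero exit code means the fragment failed; surface its
        # traceback (written to stderr) in the Python 3 exception.
        if result.returncode != 0:
            raise RuntimeError(result.stderr.decode("utf-8", "replace"))

        # The fragment overwrote the swap file with its results.
        with open(swap_file, "rb") as handle:
            return pickle.load(handle)
    finally:
        os.remove(swap_file)
```

A call might then look like output = call_python2({"query": "search index=main | head 10"}, "splunk_fragment.py", python2="/usr/bin/python2"), where the fragment name and interpreter path are placeholders for your own.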

Of course you will probably need to customize the Python 2 code depending on your requirements. There are a million ways to use Splunk and the SDK is quite flexible.
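For completeness, here is a minimal sketch of what the Python 2 side might look like. It is not the code from the library linked below: the file name splunk_fragment.py and the request keys (host, port, username, password, query and the optional search_kwargs) are assumptions for illustration. It relies on splunklib.client.connect, service.jobs.oneshot and splunklib.results.ResultsReader from the Splunk SDK, so check these against the SDK version you have installed. The syntax is kept valid for both Python 2.7 and Python 3, and the SDK is imported inside run_search so the swap-file helpers work without it.

```python
# splunk_fragment.py (hypothetical name): the Python 2 side of the bridge.
import pickle
import sys


def read_request(swap_file):
    """Load the request that the Python 3 side pickled into swap_file."""
    with open(swap_file, "rb") as handle:
        return pickle.load(handle)


def write_response(swap_file, data):
    """Overwrite swap_file with data, using protocol 2 for Python 3 to read."""
    with open(swap_file, "wb") as handle:
        pickle.dump(data, handle, protocol=2)


def run_search(request):
    """Run a one-shot search; the request keys are assumptions of this sketch."""
    # Imported here so the helpers above can be used without the SDK installed.
    import splunklib.client as client
    import splunklib.results as results

    service = client.connect(
        host=request["host"],
        port=request["port"],
        username=request["username"],
        password=request["password"]
    )
    # Optional extra search arguments are passed straight through.
    stream = service.jobs.oneshot(request["query"], **request.get("search_kwargs", {}))
    reader = results.ResultsReader(stream)
    # ResultsReader yields dicts for result rows and Message objects for
    # diagnostics; keep only the rows, copied into plain dicts.
    return [dict(row) for row in reader if isinstance(row, dict)]


# Only act when invoked as: python2 splunk_fragment.py SWAP_FILE
if __name__ == "__main__" and len(sys.argv) == 2:
    swap = sys.argv[1]
    write_response(swap, run_search(read_request(swap)))
```

Any exception raised here, for example a failed login, ends up on stderr with a non-zero exit status, which is exactly what the Python 3 side checks for.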

The example I am about to give worked for my use case: relatively simple searches returning a few thousand results at most, all while fitting into an existing Python 3-based tooling framework.

And Here It Is

I made a library that contains this tool and a couple of other bits and pieces I have found useful at one point or another. It can be found here:

https://github.com/sigmatechnica/monifyer