StackStorm Centralized Logging with Graylog

August 23, 2017

By Nick Maludy of Encore Technologies

Want to implement centralized logging for your StackStorm deployment? Read on to find out how to send your StackStorm logs to Graylog, and produce quality dashboards like this:

https://stackstorm.com/wp/wp-content/uploads/2017/08/dashboard.png

Background: Centralized Logging and StackStorm

One of the pillars of modern application deployments is aggregating their logs in a centralized logging application such as the ELK stack, Splunk or Graylog. Centralized logging allows engineers to format, index and query logs from across their stack and distributed applications, and to access them all in a single pane of glass. StackStorm is a distributed application with multiple services, so it can benefit greatly from centralized log aggregation. In this blog post, we’ll look at how to configure StackStorm to output structured logs, set up and configure Fluentd to ship those logs, and finally configure Graylog to receive, index and query them.

Structured Logging

Structured logging is a fancy term for writing log output from an application in JSON format. Emitting logs as JSON gives every piece of information in a log message an explicit, named field. That context lets log shippers save precious CPU cycles, because they no longer have to parse the information out of plain text, and it lets centralized logging applications index the logs effectively and expose multiple fields to query on.
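To make the cost difference concrete, here is a tiny illustration (the JSON line is abbreviated for readability): pulling a field out of plain text means writing and maintaining a regular expression that tracks the log format, while pulling it out of a structured log is a single dictionary lookup.

import json
import re

# The same event, once as plain text (as st2api writes it today) and once as abbreviated JSON
plain = "2017-08-19 11:16:38,767 83927760 INFO mixins [-] Connected to amqp://guest:**@127.0.0.1:5672//"
structured = '{"timestamp": 1503174203, "level": 6, "full_message": "Connected to amqp://guest:**@127.0.0.1:5672//"}'

# Plain text: every field needs a hand-written parser
pattern = r"^(?P<date>\S+) (?P<time>\S+) (?P<thread>\d+) (?P<level>\w+) (?P<logger>\w+) \[-\] (?P<message>.*)$"
print(re.match(pattern, plain).group("message"))

# Structured: one json.loads() call and every field is directly addressable
print(json.loads(structured)["full_message"])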

To demonstrate the difference between plain text logs and structured logs we’ll take an example from st2api. Below is an example of a standard log message that is written to /var/log/st2/st2api.log:

2017-08-19 11:16:38,767 83927760 INFO mixins [-] Connected to amqp://guest:**@127.0.0.1:5672//

As you can see, this contains useful information such as the timestamp, the log level, and several other fields. However, to utilize any of it in a meaningful way, a parser would need to be written to extract those data fields. If the log message were instead written in a standard format (JSON), we could parse it easily and quickly make use of the fields within the message. Below is the structured logging message that corresponds to the plain text log above.

{"version": "1.1", "level": 6, "timestamp": 1503174203, "_python": {"name": "kombu.mixins", "process": 76071, "module": "mixins", "funcName": "Consumer", "processName": "MainProcess", "lineno": 231, "filename": "mixins.py"}, "host": "stackstorm.domain.tld", "full_message": "Connected to amqp://guest:**@127.0.0.1:5672//", "short_message": "Connected to %s"}

This is great, but kind of hard to read. Below is the same log message formatted in a way that’s easier to read.

{
  "version": "1.1",
  "level": 6,
  "timestamp": 1503174203,
  "_python": {
    "name": "kombu.mixins",
    "process": 76071,
    "module": "mixins",
    "funcName": "Consumer",
    "processName": "MainProcess",
    "lineno": 231,
    "filename": "mixins.py"
  },
  "host": "stackstorm.domain.tld",
  "full_message": "Connected to amqp://guest:**@127.0.0.1:5672//",
  "short_message": "Connected to %s"
}

This output is in GELF (Graylog Extended Log Format) JSON. GELF log messages are nothing more than JSON with a few standard fields in the payload. The GELF payload specification can be found here. GELF also defines two wire protocol formats, GELF UDP and GELF TCP, that detail how GELF JSON log messages can be sent to Graylog.

Log Shippers

A log shipper is an application that reads in log messages from some source, usually a log file, potentially transforms the message and then transmits it to some destination, usually a log aggregation or centralized logging application. There are several commonly used log shippers out there including Fluentd, Logstash, and Filebeat.

In this article we’re going to be using Fluentd because it was the easiest one to configure for parsing GELF JSON and shipping to Graylog.

Architecture

The setup detailed in this blog post will adhere to the following architecture:

https://stackstorm.com/wp/wp-content/uploads/2017/08/pipeline.png

First, StackStorm uses the Python logging module to write logs to /var/log/st2/*.log in GELF JSON format. The log shipper Fluentd monitors those log files for changes, reads in any new messages, converts them into GELF UDP format and sends that to Graylog. Finally, Graylog receives GELF UDP and indexes the log messages.

Configuring StackStorm Logging

StackStorm uses Python’s builtin logging module for application level logging. In this module there are two key concepts: formatters and handlers.

A formatter takes a log function call in Python code and translates it into a string of text.

Python Logging Call

server = "stackstorm.domain.tld"

LOG.debug("Connecting to server %s", server)

Log String

2017-08-19 11:16:38,767 DEBUG [-] Connecting to server stackstorm.domain.tld

A handler takes the formatted log string and writes it to some destination. The builtin handlers can write to a file, syslog, UDP, TCP and more.
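Putting the two concepts together, here is a tiny, self-contained sketch (not StackStorm code; the file path is arbitrary) that wires up a formatter and a handler by hand. StackStorm does the equivalent through its logging configuration files, which we'll edit next.

import logging

# A formatter turns a logging call into a string of text
formatter = logging.Formatter("%(asctime)s %(levelname)s [-] %(message)s")

# A handler writes that string to a destination -- a file in this case
handler = logging.FileHandler("/tmp/example.log")
handler.setFormatter(formatter)
handler.setLevel(logging.DEBUG)

LOG = logging.getLogger("example")
LOG.setLevel(logging.DEBUG)
LOG.addHandler(handler)

server = "stackstorm.domain.tld"
LOG.debug("Connecting to server %s", server)
# /tmp/example.log now contains a line like:
# 2017-08-19 11:16:38,767 DEBUG [-] Connecting to server stackstorm.domain.tld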

StackStorm logging configuration files are written in the logging module’s configuration file format. To configure StackStorm to write structured logs, we’ll edit the logging config files stored in /etc/st2/logging.<component>.conf. StackStorm ships with a formatter, st2common.logging.formatters.GelfLogFormatter, that emits structured logs in GELF format. Luckily, the StackStorm AUDIT logs already utilize the GelfLogFormatter, so each config has a formatter reference defined that we can reuse. All we need to do is add another handler to each config that writes the GELF logs to a new file. We can define the new log handler by adding the following to every logging config:


# For all components except actionrunner
[handler_gelfHandler]
class=handlers.RotatingFileHandler
level=DEBUG
formatter=gelfFormatter
args=("/var/log/st2/st2<component>.gelf.log",)

# For actionrunner only (needs a different handler class)
[handler_gelfHandler]
class=st2common.log.FormatNamedFileHandler
level=INFO
formatter=gelfFormatter
args=("/var/log/st2/st2actionrunner.{pid}.gelf.log",)

Now that we have a new handler defined, we need to tell the logger about it. To accomplish this, we’ll add gelfHandler to the following sections:

[handlers]
# add ', gelfHandler' to the end of the following line
keys=consoleHandler, fileHandler, auditHandler, gelfHandler

[logger_root]
level=INFO
# add ', gelfHandler' to the end of the following line
handlers=consoleHandler, fileHandler, auditHandler, gelfHandler
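Before restarting anything, you can sanity-check an edited config by loading it with Python's logging.config.fileConfig and emitting a test record. The helper below is purely illustrative (it is not something StackStorm ships); run it with StackStorm's own Python interpreter (typically /opt/stackstorm/st2/bin/python) so the st2common formatter class is importable, and as a user that can write to /var/log/st2.

# check_logging_conf.py - illustrative sanity check for an edited StackStorm logging config
# usage: sudo /opt/stackstorm/st2/bin/python check_logging_conf.py /etc/st2/logging.<component>.conf
import logging
import logging.config
import sys

config_path = sys.argv[1]

# fileConfig raises an exception if a section, handler or formatter is misconfigured
logging.config.fileConfig(config_path, disable_existing_loggers=False)

logging.getLogger("gelf-smoke-test").info("gelfHandler smoke test")
print("Config loaded OK - check the corresponding *.gelf.log file for the test message")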

StackStorm should now be configured to write structured logs to /var/log/st2/st2<component>.gelf.log. For these changes to take effect, we need to restart the StackStorm services. This can be accomplished either by restarting all StackStorm processes:

st2ctl restart

Or by restarting just the components we’ve modified:

systemctl restart st2<component>

This is a good time to check /var/log/st2/st2<component>.gelf.log and make sure logs are present.

Astute readers may be asking, “If the builtin logging facility provides a UDP handler, why not use it to send logs directly to Graylog?” The answer is fairly simple: the DatagramHandler, which writes log records to UDP, does NOT format the messages in the GELF UDP wire format, which requires a special header at the beginning of every packet. To accommodate this, we’ll use Fluentd in the next section to send the log messages to Graylog in GELF UDP format.
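For the curious, here is a sketch of what a hand-rolled GELF UDP handler might look like; this is purely illustrative and not something StackStorm provides. It subclasses the builtin DatagramHandler and overrides makePickle() so the datagram body is GELF JSON instead of a pickled log record, but it only covers the simplest case and provides none of the chunking, buffering, retries or file-based durability that a log shipper gives us, which is why we use Fluentd instead. The hostname and port are the example values used later in this post.

import json
import logging
import logging.handlers
import socket

# Rough mapping from Python log levels to the syslog severities GELF expects
SYSLOG_LEVELS = {
    logging.DEBUG: 7, logging.INFO: 6, logging.WARNING: 4,
    logging.ERROR: 3, logging.CRITICAL: 2,
}

class GelfUdpHandler(logging.handlers.DatagramHandler):
    """Illustrative only: send each record as a single GELF JSON datagram."""

    def makePickle(self, record):
        # DatagramHandler normally pickles the record; emit GELF JSON instead
        payload = {
            "version": "1.1",
            "host": socket.gethostname(),
            "short_message": record.getMessage(),
            "timestamp": record.created,
            "level": SYSLOG_LEVELS.get(record.levelno, 6),
            "_logger": record.name,
        }
        return json.dumps(payload).encode("utf-8")

# Example usage
LOG = logging.getLogger("gelf-example")
LOG.setLevel(logging.DEBUG)
LOG.addHandler(GelfUdpHandler("graylog.domain.tld", 12202))
LOG.info("hello from a hand-rolled GELF handler")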

Configuring the Log Shipper Fluentd

We’re going to use Fluentd to read from /var/log/st2/st2<component>.gelf.log and transform the log messages into GELF UDP format, then send those UDP packets to Graylog.

First we need to install Fluentd v0.14.

Note Fluentd v0.14 is required if you would like sub-second resolution on your logging timestamps. In Fluentd v0.12 timestamps are rounded to 1-second resolution. This can cause messages in Graylog to be displayed out of order, because Graylog doesn’t know which message came first within a given 1-second interval.

Below are instructions for installation on RHEL 7; for all other platforms, please follow the official documentation here.

Note Fluentd is the name of the log shipping application and it is written by a company called Treasure Data (td). The agent installed on your machine is called td-agent and it wraps Fluentd in a service file that’s specific to your platform.

# add GPG key
rpm --import https://packages.treasuredata.com/GPG-KEY-td-agent

# add the Treasure Data repository to yum
cat >/etc/yum.repos.d/td.repo <<'EOF'
[treasuredata]
name=TreasureData
baseurl=http://packages.treasuredata.com/3/redhat/$releasever/$basearch
gpgcheck=1
gpgkey=https://packages.treasuredata.com/GPG-KEY-td-agent
EOF

# update your sources
yum check-update

# install td-agent
yum install -y td-agent

# start and enable the service
systemctl start td-agent
systemctl enable td-agent

After installation we need to install a Fluentd plugin that implements GELF UDP output formatting.

/usr/sbin/td-agent-gem install fluent-plugin-gelf-hs

Next we need to configure Fluentd to tail the new StackStorm log files we configured in the previous section. The default location for the Fluentd config file is /etc/td-agent/td-agent.conf:

export GRAYLOG_SERVER=graylog.domain.tld
export GRAYLOG_GELF_UDP_PORT=12202

cat >> /etc/td-agent/td-agent.conf << EOF
<source>
  type tail
  format json
  path /var/log/st2/st2actionrunner*.gelf.log
  tag st2actionrunner
  pos_file /var/run/td-agent/st2actionrunner.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2api.gelf.log
  tag st2api
  pos_file /var/run/td-agent/st2api.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2auth.gelf.log
  tag st2auth
  pos_file /var/run/td-agent/st2auth.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2garbagecollector.gelf.log
  tag st2garbagecollector
  pos_file /var/run/td-agent/st2garbagecollector.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2notifier.gelf.log
  tag st2notifier
  pos_file /var/run/td-agent/st2notifier.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2resultstracker.gelf.log
  tag st2resultstracker
  pos_file /var/run/td-agent/st2resultstracker.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2rulesengine.gelf.log
  tag st2rulesengine
  pos_file /var/run/td-agent/st2rulesengine.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2sensorcontainer.gelf.log
  tag st2sensorcontainer
  pos_file /var/run/td-agent/st2sensorcontainer.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<source>
  type tail
  format json
  path /var/log/st2/st2stream.gelf.log
  tag st2stream
  pos_file /var/run/td-agent/st2stream.gelf.log.pos
  enable_watch_timer false
  estimate_current_event true
</source>

<match st2**>
  type gelf
  host $GRAYLOG_SERVER
  port $GRAYLOG_GELF_UDP_PORT
  protocol udp
  flush_interval 5s
  estimate_current_event true
</match>
EOF

Note estimate_current_event true is used in the config file because the timestamps emitted by StackStorm are rounded to 1-second resolution. This is fixed in PR #3662, which adds a new field, timestamp_f, to the GELF logging output. The PR has been merged and should be available starting in StackStorm v2.4. In those versions you can replace estimate_current_event true with:

time_key timestamp_f
keep_time_key true

Finally, we need to restart Fluentd so the config file changes take effect:

systemctl restart td-agent

Fluentd should now be sending log messages to Graylog; however, Graylog is not yet listening.

Configuring Graylog

To configure Graylog to receive GELF UDP messages, we need to add a new Input. In the Graylog WebUI, navigate to System > Inputs:

https://stackstorm.com/wp/wp-content/uploads/2017/08/inputs.png

To add a new input, click the Select a new input type: dropdown, select GELF UDP, then press the Launch new Input button.

https://stackstorm.com/wp/wp-content/uploads/2017/08/configure_input.png

In the new input dialog, configure it with the following settings:

· Global = Yes
· Name = GELF UDP
· Port = 12202

Leave all other settings as defaults, and click Save.

https://stackstorm.com/wp/wp-content/uploads/2017/08/input_options.png

Why did we choose port 12202? By default, Graylog ships its own internal logs to udp/12201, so we need to choose a different port to keep the two inputs separate. Graylog should now be receiving log messages from StackStorm.

https://stackstorm.com/wp/wp-content/uploads/2017/08/log_search.png

If you’re not seeing any messages flowing in, you can always run an action with st2 run or restart a service with systemctl restart st2api; either should force new logs to be written.
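If you want to rule out the network path itself, another option is to hand the input a single GELF message directly. The sketch below is illustrative only; it reuses the example hostname and port from this post (the _source field is arbitrary) and sends one small, uncompressed JSON datagram, which the GELF UDP input should accept. If the message shows up in the search view, the input is working and any remaining problem is upstream in Fluentd or the StackStorm log files.

import json
import socket
import time

GRAYLOG_SERVER = "graylog.domain.tld"  # example value from this post
GRAYLOG_GELF_UDP_PORT = 12202

message = {
    "version": "1.1",
    "host": socket.gethostname(),
    "short_message": "Graylog GELF UDP input smoke test",
    "timestamp": time.time(),
    "level": 6,  # INFO
    "_source": "manual-test",
}

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(message).encode("utf-8"), (GRAYLOG_SERVER, GRAYLOG_GELF_UDP_PORT))
sock.close()
print("Sent test message to %s:%d" % (GRAYLOG_SERVER, GRAYLOG_GELF_UDP_PORT))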

Conclusion

We’ve introduced you to structured logging and log shippers, then walked through configuring and wiring up these technologies to stream StackStorm logs into the centralized logging application Graylog. Now that we have StackStorm logs in Graylog, what can we do with them? In a future blog post I’ll walk you through creating a dashboard that provides insight into, and visualization of, your StackStorm deployment.

About The Author

Nick Maludy is the DevOps Manager at Encore Technologies, a company out of Cincinnati, Ohio, that specializes in Datacenters, Cloud and Managed Services, Professional Services and Hardware Sales. Nick works in the Cloud and Managed Services organization, which is focused on providing customers with tailored IT solutions that accelerate their business through automation and modernization.


Something Old, New, Borrowed, EternalBlue

July 25, 2017

By Mike Schuetter, Vice President of Information Security 

If you are like me, over the last couple of months, you were inundated with emails and webinars from the security industry talking about WannaCry and NotPetya.  While the industry continues to point to restricting Internet exposure, frequency of patching systems and maintaining a solid backup strategy (oh and buying their tools), I cannot help but think we are still missing something in the conversation.


If I go back into my dusty memory banks, I recall fighting Zotob, Conficker and all their variants “back in the day”.  At the time, these worms clogged up network pipes and once cleaned out, we went back to normal operation.  As these attacks started to die out, the industry claimed that criminals were hoarding their zero days to someday monetize them instead of wasting them on simple worms.  A couple years later, we experienced Stuxnet from afar - a worm that seemingly targeted extremely specific nuclear control systems. While we saw the impact as it crossed the virtual-physical boundary, we wrote this off as nation state activity which requires a high degree of sophistication… something most of us in the commercial world would not need to ever worry about.


Since then, we have watched ransomware take center stage – malware with a short life span that needs to be sent to as many people as possible before security vendors catch up and write signatures and decryption programs for it. Even with a small life span, a lot of money can be made. Then of course we had Shadow Brokers release NSA exploits into the wild, including EternalBlue and other SMB exploits, free of charge for anyone to use. Following closely on its heels, we saw WannaCry and NotPetya start to worm their way across the internet.


The Shadow Brokers release was a game changer because it brought a level of nation state sophistication to the noisy-and-opportunistic-but-effective email blasts that characterized ransomware. Complement that with the fact that we are no better at controlling worm-like activities, and we have a recipe for the WannaCry and NotPetya developments that have made recent headlines.


So why was this so surprising, and cause for so much coverage? While hindsight is 20/20, and the threat is constantly evolving, it still seems like we should have been better prepared for this type of attack. Maybe it’s due to the time between events. When we were fighting worms on a quarterly basis, we got really good at locating, quarantining and cleaning infected systems. With such a large gap between the old and the new, we forgot how painful it really was, and we certainly lost our touch at remediating that kind of activity. Out of sight, out of mind leads to skills waning. Or maybe it’s caused by our myopic focus on the newest, shiniest threat. We seemingly have forgotten that worm propagation is even a threat to the environment. Did something change in the environment that reduced the risk of worm-like activity enough for it to be dropped from our risk registers? I don’t think so.


Instead, I think we simply want to put the old in the past and tackle something new. On top of that, we are bombarded in the industry with whatever the newest problem is, and we are all drawn to it as we put the old (yet unresolved) problems behind us. Or have we stopped focusing on the old issues because we believe our tools are taking care of the routine, allowing us to focus our attention on the new hot topic? But just how good is our tool set? Is it doing what it’s supposed to do? My auditors call it “operating effectiveness”. Controls are put in place for a purpose; security needs to monitor them to ensure they are performing as designed. Are they still appropriate to defend against the newest rendition of the attacks we are seeing?


In the end, I tend to believe that our security programs are deficient in providing proper risk management and oversight within the organization.  We need to get out of the weeds and see the bigger picture.


When something like Shadow Brokers releases NSA zero day exploits into the wild, our security management should be attempting to understand its impact on our environments.  How can the newly released exploits be leveraged against us?  How exposed are we to such an attack?  Is our security architecture capable of defending against such exploitation?  If not, what changes should be made to reduce its impact?  We should be proactively consuming the new threat intelligence as it is made available, incorporating it into our risk register while working through our risk management processes to determine if further action is necessary.


Yes, you need to reduce your exposure to the Internet and have a solid patch management and backup strategy in place.  But for many, I feel like immaturity in the risk management and oversight functions is problematic.  While we do not know what the next threat will look like, improvements in these two areas would lead to a more solid stance against the next wave of unknowns.



Categories: Cyber Security

Future of the Data Center

March 02, 2017

The future of the data center lies in a highly flexible, virtualized environment. For the customer, that represents a pool of technological resources rather than physical equipment and locations. Services and products are provisioned quickly, and with little effort, in a way that maximizes resource usage. This flexibility frees a business to continuously adjust its technology to adapt to changing business and financial requirements.

Encore's new data center takes advantage of this agile technology. It is designed to be concurrently maintainable and secure, and it is monitored 24 hours a day by certified staff.
Categories: Data Center

Cisco and Apple Are Fast-Tracking the Mobile Enterprise

August 25, 2016

Two industry giants, Cisco and Apple, are collaborating on solutions for the mobile enterprise. Encore Technologies is one of the few Apple and Cisco certified partners in the nation. Contact our sales team today at sales@encore.tech to see how we can help support your Cisco or Apple environments.

http://www.cisco.com/c/m/en_us/solutions/strategic-partners/apple.html
 
Categories: Apple, Cisco

Encore Technologies Achieves Cisco’s Advanced Collaboration Architecture Specialization

August 08, 2016

As of August 7th, 2016, Encore Technologies has met all necessary criteria to achieve the prestigious Advanced Collaboration Architecture Specialization from Cisco Systems, Inc. By achieving this certification, Encore joins an elite group of value-added Cisco partners able to provide sophisticated Cisco solutions to our customers. If your organization is in need of a complex Cisco-based solution, give us a call today at 1-866-990-3526 or email our sales team at sales@encore.tech.

Categories: Cisco

