Wednesday, November 22, 2017

Spelunking your Splunk – Part II (Disk Usage)

By Tony Lee

In our first article of the series, Spelunking your Splunk Part I (Exploring Your Data), we looked at a clever dashboard that can be used to quickly understand the indexes, sources, sourcetypes, and hosts in any Splunk environment.  Now we will examine disk usage!

You may know this already--Splunk stores data on indexers. But have you ever wanted to visually see indexer capacity?  Or in a distributed environment, have you ever wondered how well the data is distributed across the indexers?  We have a solution for both and will provide the code at the bottom of the article.

Finding disk usage information

There are a number of ways to query disk utilization within Splunk.  For example, you could create scripted input that makes a call to the operating system, but Splunk makes it even simpler than that...  Try copying and pasting this RESTful query into the search bar:

| rest splunk_server=* /services/server/status/partitions-space | eval usage = round((capacity - free) / 1024, 2) | eval capacity = round(capacity / 1024, 2) | eval compare_usage = usage." / ".capacity | eval pct_usage = round(usage / capacity * 100, 2)  | table updated, splunk_server, mount_point, fs_type, capacity, compare_usage, pct_usage | rename mount_point as "Mount Point", fs_type as "File System Type", compare_usage as "Disk Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Disk Usage (%)" | sort splunk_server


This should result in something that looks like the following screenshot which provides information such as the server name, mount point, file system type, drive capacity, disk usage, and percentage of disk usage. If you receive information from non-indexers or mount points that are not related to your actual indexer mount points, you can either ignore them or filter them out of the search.


Figure 1:  The search that starts it all

Adding a gauge

This is pretty interesting information, especially in a distributed environment, but let's take it up a notch so we can see a visual representation.  The dashboard code at the bottom of the page will give you the basic building blocks to customize gauges on your disk usage page.

Figure 2:  Adding a filler gauge for each indexer

Note:  For the gauges, you should change two values:  splunk_server to match the value in the splunk_server column and mount_point to match the value in the Mount Point column in our original search.

For environments with clustered indexers, just add a gauge for each indexer.  The end result should look something like the following:

Figure 3:  Filler gauges across the index cluster

In this example, it is very easy to see one indexer that is not properly load balanced. This dashboard can also be used to trigger alerts based on disk usage.

Conclusion

Splunk provides good visibility into indexer health via the Monitoring Console / DMC (Distributed management console), but we found this visual representation quite helpful for monitoring disk usage and indexer cluster load balancing.   We hope this helps you too.


Dashboard XML code is below:

Below is the dashboard code needed to enumerate your servers and mount point and to create one gauge.  Now just copy the gauge code for as many gauges as needed:

<dashboard stylesheet="custom.css">
  <label>Disk Usage</label>
  <row>
    <panel>
      <chart>
        <title>Indexer-1</title>
        <search>
          <query>| rest splunk_server=* /services/server/status/partitions-space | search splunk_server=server_name_here mount_point="/" | eval usage = round((capacity - free) / 1024, 2) | eval capacity = round(capacity / 1024, 2) | eval compare_usage = usage." / ".capacity | eval pct_usage = round(usage / capacity * 100, 2)  | table pct_usage | rename mount_point as "Mount Point", fs_type as "File System Type", compare_usage as "Disk Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Disk Usage (%)" | sort splunk_server</query>
          <earliest>0</earliest>
          <latest></latest>
        </search>
        <option name="charting.axisLabelsX.majorLabelStyle.overflowMode">ellipsisNone</option>
        <option name="charting.axisLabelsX.majorLabelStyle.rotation">0</option>
        <option name="charting.axisTitleX.visibility">visible</option>
        <option name="charting.axisTitleY.visibility">visible</option>
        <option name="charting.axisTitleY2.visibility">visible</option>
        <option name="charting.axisX.scale">linear</option>
        <option name="charting.axisY.scale">linear</option>
        <option name="charting.axisY2.enabled">0</option>
        <option name="charting.axisY2.scale">inherit</option>
        <option name="charting.chart">fillerGauge</option>
        <option name="charting.chart.bubbleMaximumSize">50</option>
        <option name="charting.chart.bubbleMinimumSize">10</option>
        <option name="charting.chart.bubbleSizeBy">area</option>
        <option name="charting.chart.nullValueMode">gaps</option>
        <option name="charting.chart.rangeValues">[0,50,75,100]</option>
        <option name="charting.chart.showDataLabels">none</option>
        <option name="charting.chart.sliceCollapsingThreshold">0.01</option>
        <option name="charting.chart.stackMode">default</option>
        <option name="charting.chart.style">shiny</option>
        <option name="charting.drilldown">all</option>
        <option name="charting.gaugeColors">["0x84E900","0xFFE800","0xBF3030"]</option>
        <option name="charting.layout.splitSeries">0</option>
        <option name="charting.layout.splitSeries.allowIndependentYRanges">0</option>
        <option name="charting.legend.labelStyle.overflowMode">ellipsisMiddle</option>
        <option name="charting.legend.placement">right</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <table>
        <search>
          <query>| rest splunk_server=* /services/server/status/partitions-space | eval usage = round((capacity - free) / 1024, 2) | eval capacity = round(capacity / 1024, 2) | eval compare_usage = usage." / ".capacity | eval pct_usage = round(usage / capacity * 100, 2)  | table updated, splunk_server, mount_point, fs_type, capacity, compare_usage, pct_usage | rename mount_point as "Mount Point", fs_type as "File System Type", compare_usage as "Disk Usage (GB)", capacity as "Capacity (GB)", pct_usage as "Disk Usage (%)" | sort splunk_server</query>
          <earliest>-15m</earliest>
          <latest>now</latest>
        </search>
        <option name="count">10</option>
        <option name="dataOverlayMode">none</option>
        <option name="drilldown">cell</option>
        <option name="rowNumbers">false</option>
        <option name="wrap">true</option>
      </table>
    </panel>
  </row>
</dashboard>

No comments:

Post a Comment