OS Agent - Troubleshooting

The host does not appear in the Monitor tab

Issue

Solution

1. Check Telegraf is running on the host

# Linux
sudo systemctl status telegraf
journalctl -u telegraf -n 50
 
# Windows (PowerShell)
Get-Service telegraf
Get-EventLog -LogName Application -Source Telegraf -Newest 20

If the service is stopped or in a crash loop, the log shows the reason. Fix it and start the service again.

2. Check Telegraf can reach the Collector

The most common error in the Telegraf log:

Error writing to outputs.http: Post "https://...": dial tcp …: connect: connection refused

Test from the host:

curl -v https://collector.example.com/api/v1/os-agent/push

Even a 401 Unauthorized response is a good sign - it means the network path works.

If you get connection refused or a timeout, check:

  1. The URL in telegraf.conf (scheme, host, port)
  2. The Collector is running and listening on that port
  3. No firewall blocks the connection from the host to the Collector
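
The reachability test above can be scripted when many hosts need checking. The following is a minimal sketch, not part of the product: the function name classify_push_check is hypothetical, and the hints it prints only restate the checklist above. It relies on curl's documented exit codes (6 = host not resolved, 7 = failed to connect, 28 = timeout) and on the fact that curl exits with 0 whenever it receives any HTTP response, including 401.

```shell
#!/bin/sh
# Hypothetical helper: interpret the result of the curl reachability test.
# $1 = curl exit code, $2 = HTTP status reported by curl (000 if none).
classify_push_check() {
  exit_code="$1"
  http_status="$2"
  case "$exit_code" in
    0)  ;;  # curl received an HTTP response; the status is handled below
    6)  echo "dns: the host name in the URL does not resolve"; return 1 ;;
    7)  echo "refused: check the port, the Collector process and the firewall"; return 1 ;;
    28) echo "timeout: a firewall may be dropping the connection"; return 1 ;;
    *)  echo "other: curl failed with exit code $exit_code"; return 1 ;;
  esac
  case "$http_status" in
    401) echo "ok: network path works (401 only means the key was rejected)" ;;
    *)   echo "ok: network path works (HTTP $http_status)" ;;
  esac
}

# Usage (commented out so that sourcing this sketch has no side effects):
# status=$(curl -s -o /dev/null -m 10 -w '%{http_code}' \
#   "https://collector.example.com/api/v1/os-agent/push")
# classify_push_check "$?" "$status"
```

The function only classifies the result; it does not replace reading the Telegraf log on the host.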

3. Check the API key

If the Telegraf log shows "received status code: 401":

  1. Open the API Keys tab in the Collector
  2. Copy the current global key
  3. Update the X-API-Key line in telegraf.conf
  4. Restart Telegraf
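
Step 3 can be done with an editor, or scripted when several hosts need the new key. The sketch below is an illustration under assumptions: the helper name set_api_key is hypothetical, and it assumes the key is stored on a single line of the form X-API-Key = "..." in telegraf.conf. Check the generated file on your host before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: replace the value on the X-API-Key line of a config file.
# Assumes the line looks like:   X-API-Key = "old-key"
# $1 = path to telegraf.conf, $2 = new key copied from the API Keys tab.
set_api_key() {
  conf="$1"
  tmp=$(mktemp) || return 1
  # The key is passed through the environment so that awk does not
  # interpret any special characters in it.
  NEW_KEY="$2" awk '
    /^[ \t]*X-API-Key[ \t]*=/ {
      indent = $0
      sub(/X-API-Key.*/, "", indent)   # keep the original indentation
      printf "%sX-API-Key = \"%s\"\n", indent, ENVIRON["NEW_KEY"]
      next
    }
    { print }
  ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Overwrite in place so the owner and permissions of the file are kept.
  cat "$tmp" > "$conf"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage (run as a user allowed to write the file, then restart Telegraf):
# set_api_key /etc/telegraf/telegraf.conf "PASTE-NEW-KEY-HERE"
```

All other lines of the file are copied unchanged.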

4. Check the Active flag

If the Active checkbox in the API Keys tab is unchecked, every push is rejected with 401. Tick it and click Save.


The host appears but the status is ERROR

Issue

Solution

1. The Telegraf service is stopped

Restart it:

# Linux
sudo systemctl restart telegraf
 
# Windows
Restart-Service telegraf

2. Telegraf is running but pushes fail

Check the Telegraf log on the host - the latest error shows what is wrong (network, authentication, host set to Inactive, ...).
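
As a rough aid for reading that log, the sketch below maps one log line to a likely cause. It is an assumption-laden illustration: the function name classify_telegraf_error is hypothetical, the "connection refused" and "received status code: 401" patterns come from the messages quoted on this page, and "i/o timeout" and "no such host" are the usual Go network error strings, added here as an assumption about what Telegraf prints.

```shell
#!/bin/sh
# Hypothetical helper: map one Telegraf log line to a likely cause.
# $1 = a single log line.
classify_telegraf_error() {
  case "$1" in
    *"connection refused"*|*"i/o timeout"*|*"no such host"*)
      echo "network" ;;   # check URL, Collector process, firewall
    *"received status code: 401"*)
      echo "auth" ;;      # check the API key and its Active flag
    *)
      echo "unknown" ;;   # read the full message
  esac
}

# Usage on Linux (commented out; needs access to the journal):
# last=$(journalctl -u telegraf -n 50 | grep -i 'error' | tail -n 1)
# classify_telegraf_error "$last"
```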

3. The API key was regenerated

After you click Regenerate in the Collector, every deployed agent that still uses the old key is rejected. Update telegraf.conf with the new key and restart Telegraf.

4. The host is set Inactive

Open the host detail in the Monitor tab and click Activate.


The host shows but the Top Processes table is empty

Issue

Solution

1. The "process" input is not enabled

  1. Open the Configuration tab
  2. Tick process in the Enabled Inputs list
  3. Click Save
  4. Re-run the setup script on the host (or copy the new telegraf.conf over) and restart Telegraf

2. The "process" input is enabled but data has not arrived yet

procstat pushes every 30 seconds with the default top-K filter. Wait one cycle and click Refresh.

3. Telegraf runs in a Docker container without SYS_PTRACE

Inside Docker without SYS_PTRACE, the procstat input only sees Telegraf's own process. Add the capability when starting the container:

docker run --cap-add SYS_PTRACE ... telegraf:latest

Or run Telegraf on the host instead of in a container.


Disk I/O on processes is always zero

Issue

Solution

This is not a bug - per-process disk I/O needs read access to /proc/<pid>/io on Linux:

If Telegraf runs in Docker, add --cap-add SYS_PTRACE or --privileged to the docker run command, or accept that the field stays empty.
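
To confirm whether read access is the cause, try to read the file for a given process from the same environment (user or container) that Telegraf runs in. A minimal sketch; the function name can_read_proc_io is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if /proc/<pid>/io can actually be read.
# Reading the file, rather than testing its permission bits, also exercises
# the kernel access check that applies to processes of other users.
# $1 = process ID to probe.
can_read_proc_io() {
  cat "/proc/$1/io" > /dev/null 2>&1
}

# Usage (commented out; run it as the user or in the container Telegraf uses):
# can_read_proc_io 1 || echo "no access to /proc/1/io - disk I/O will stay zero"
```

If the check fails for processes other than Telegraf's own, the zero values are expected.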


One specific input shows nothing

Issue

Solution


Auth Failures counter rises

Issue

Solution

The counter rises every time a push gets 401. Possible causes:

Click Clear in the Statistics tab, then watch the counter. If it stays at zero, the issue is fixed.


Parse Errors counter rises

Issue

Solution

The Collector received a body it cannot parse as Influx Line Protocol. Causes:

Open the Collector log and find the line containing "Failed to parse influx body:" - it shows the offending content. Fix the sender.
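
On a busy Collector the log can be long, so it helps to filter it. The sketch below is an illustration only: the function name last_parse_errors is hypothetical, and the log file path is whatever your installation uses (it is not stated on this page).

```shell
#!/bin/sh
# Hypothetical helper: print the most recent parse failures from a Collector log.
# $1 = path to the Collector log file (depends on the installation),
# $2 = how many matching lines to show (default 5).
last_parse_errors() {
  grep -F 'Failed to parse influx body:' "$1" | tail -n "${2:-5}"
}

# For comparison, a well-formed Influx Line Protocol line has the shape:
#   measurement,tag=value field=value timestamp
# for example:
#   cpu,host=web01 usage_idle=97.5 1700000000000000000
```

Call it as last_parse_errors followed by the log path, with an optional line count as the second argument.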


Host cap reached

Issue

OS Agent: host cap reached (10000), rejecting auto-discovery of <hostname>

Solution

This almost always means that someone (or a script) is pushing with random host tag values using a valid API key. Steps:

  1. Open the Monitor tab and look at the recent entries - delete obvious junk hostnames
  2. Regenerate the global API key
  3. Redeploy the new key to legitimate agents only - the abuser is locked out

Old hosts I do not want anymore

Issue

Solution

Stop Telegraf on the dead host first

If Telegraf is still running with a valid key, the host will re-appear after every Delete. Either:

Then delete in the Collector

  1. Open the Monitor tab
  2. Expand the host
  3. Click Delete

If you cannot stop Telegraf (the host is unreachable), use Deactivate instead - all pushes from an inactive host are rejected, and the host stays listed.


Configuration changes do not reach deployed agents

Issue

Solution

This is by design. The Collector does not push config to agents. Apply changes manually:

# Linux
sudo systemctl restart telegraf
 
# Windows
Restart-Service telegraf
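
When the new telegraf.conf is copied over by hand on Linux, it is worth keeping the previous file. The sketch below is one possible way to do that, not the product's setup script: the function name install_telegraf_conf is hypothetical, and /etc/telegraf/telegraf.conf is assumed to be the config path, as in the standard Linux packages.

```shell
#!/bin/sh
# Hypothetical helper: put a new telegraf.conf in place, keeping a backup.
# $1 = new config file, $2 = target path (e.g. /etc/telegraf/telegraf.conf).
install_telegraf_conf() {
  new="$1"
  target="$2"
  [ -s "$new" ] || { echo "new config is missing or empty: $new" >&2; return 1; }
  if [ -f "$target" ]; then
    cp -p "$target" "$target.bak" || return 1   # keep the previous config
  fi
  cat "$new" > "$target"                        # keeps the target's owner and mode
}

# Usage as root on a Linux host (commented out; it changes a live service):
# install_telegraf_conf ./telegraf.conf /etc/telegraf/telegraf.conf &&
#   systemctl restart telegraf
```

If the restart fails, the previous configuration is still available next to the target with a .bak suffix.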