This post is not really about solving performance problems but rather about finding them in the first place.
“There are items missing data for more than 10 minutes” is one of the default triggers in recent Zabbix installations, and it indicates a problem with your queue (Zabbix has no queue btw, just like there is no spoon :). So you click on “Administration”, then on “Queue”, and discover that the “More than 10 minutes” column on one or more of the rows is very red indeed.
And from here you are stuck: you cannot tell which items Zabbix is failing to update. At least I did not find a more convenient way to do that than selecting the list from the database. Why it is not a clickable option in the Zabbix interface I cannot say.
As it turns out, I had missed the drop-down in the top-right corner of the Administration->Queue screen. Its “Details” option shows exactly what is in the queue. The database way described below still works as well.
What do you have to look for in the database? Unsurprisingly, the field names are not exactly what you’d expect from the GUI. The ones that interest us are:
- hosts.status – which you have to filter on 0, which means active host
- items.status – same as above, has to be 0
- items.lastclock – this is the last update timestamp I suppose.
- items.delay – which is the update period in seconds
To find the items that were not properly updated for more than 10 minutes, I select all active items (hosts.status and items.status both equal to 0) whose last update timestamp is more than delay seconds + 10 minutes in the past. This works for me:
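SELECT
h.status AS hstatus,
i.hostid,
h.name AS host,
i.name AS item,
i.status
FROM items i
LEFT JOIN hosts h
ON h.hostid=i.hostid
WHERE
-- last update is older than the update period plus 10 minutes
UNIX_TIMESTAMP()-i.lastclock > i.delay+(10*60)
AND
-- only active hosts and active items
( h.status = 0 AND i.status = 0 );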
Something similar to the above should give you something similar to the below:
+---------+--------+--------+------------------------------------------------------+--------+
| hstatus | hostid | host   | item                                                 | status |
+---------+--------+--------+------------------------------------------------------+--------+
|       0 |  10057 | MyHost | DB Pool DataSource MaxActive                         |      0 |
|       0 |  10057 | MyHost | DB Pool DataSource MaxIdle                           |      0 |
|       0 |  10057 | MyHost | DB Pool DataSource MaxWait                           |      0 |
|       0 |  10057 | MyHost | DB Pool DataSource MinIdle                           |      0 |
|       0 |  10057 | MyHost | DB Pool DataSource NumActive                         |      0 |
Skip ......
+---------+--------+--------+------------------------------------------------------+--------+
178 rows in set (0.01 sec)
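With this many rows it helps to see which items are furthest behind. The variant below is only a sketch under the same assumptions as the query above (MySQL, and a schema where items.lastclock and a numeric items.delay exist). The last_update and seconds_late aliases and the LIMIT of 20 are my own choices, not anything Zabbix defines.

```sql
SELECT
  h.name AS host,
  i.name AS item,
  i.delay,
  -- human-readable form of the last update timestamp
  FROM_UNIXTIME(i.lastclock) AS last_update,
  -- how many seconds past its expected update the item is
  UNIX_TIMESTAMP() - i.lastclock - i.delay AS seconds_late
FROM items i
JOIN hosts h
  ON h.hostid = i.hostid
WHERE
  h.status = 0 AND i.status = 0
  AND UNIX_TIMESTAMP() - i.lastclock > i.delay + (10*60)
ORDER BY seconds_late DESC
LIMIT 20;
```

Sorting by seconds_late puts the longest-stalled items at the top, which is usually where I would start looking.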
Where do you go from here? That really depends on many factors. Firstly, it depends on what type of items these are. Understanding who is actually collecting and sending them will be the first step. Then you can probably proceed to understanding what prevents a timely update.
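One way to get at the item types is to count the stale items per type, reusing the filter from the query above. Again, this is a sketch rather than a definitive recipe: it assumes the items table has a numeric type column, and the stale_items alias is mine. As far as I know, 0 stands for Zabbix agent, 2 for Zabbix trapper and 7 for Zabbix agent (active), but check the documentation for your version before relying on the codes.

```sql
SELECT
  i.type,
  COUNT(*) AS stale_items
FROM items i
JOIN hosts h
  ON h.hostid = i.hostid
WHERE
  h.status = 0 AND i.status = 0
  -- same staleness condition as in the query above
  AND UNIX_TIMESTAMP() - i.lastclock > i.delay + (10*60)
GROUP BY i.type
ORDER BY stale_items DESC;
```

If nearly all of the stale items share one type, that points at whoever collects that type (an agent, a trapper sender, a poller) as the place to dig further.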