Version: v1.5.x

Monitoring: Apache Yarn

HertzBeat collects and monitors node metrics from Apache Yarn.

Protocol Used: HTTP

Pre-monitoring Actions

Look up the HTTP monitoring port of Apache Yarn. It is the port part of the value of the yarn.resourcemanager.webapp.address property.
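As a rough illustration of the lookup above, the following sketch reads the port from the contents of a yarn-site.xml file. The function name, the sample host rm.example.com, and the sample port 8090 are hypothetical. The sketch assumes the property value has the form host:port and falls back to the default port 8088 when the property is absent.

```python
import xml.etree.ElementTree as ET

DEFAULT_WEBAPP_PORT = 8088  # default port from the parameter table
PROPERTY_NAME = "yarn.resourcemanager.webapp.address"


def webapp_port(yarn_site_xml: str) -> int:
    """Return the port from yarn.resourcemanager.webapp.address, or the default."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            value = prop.findtext("value", "").strip()
            # Assumed form host:port; take the text after the last colon.
            _, sep, port = value.rpartition(":")
            if sep and port.isdigit():
                return int(port)
    return DEFAULT_WEBAPP_PORT


# Hypothetical configuration used only for illustration.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.com:8090</value>
  </property>
</configuration>"""

print(webapp_port(SAMPLE))  # 8090
```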

Configuration Parameters

| Parameter Name | Parameter Description |
| --- | --- |
| Target Host | IPv4 address, IPv6 address, or domain name of the monitored endpoint, without a protocol prefix such as http:// or https://. |
| Port | Monitoring port of Apache Yarn. Default: 8088. |
| Query Timeout | Timeout for querying Apache Yarn, in milliseconds. Default: 6000. |
| Metrics Interval | Interval between monitoring data collections, in seconds. Minimum: 30. |
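To show how these parameters might fit together in a manual check, here is a minimal sketch that builds a request URL and applies the query timeout. It assumes the metrics are read from the /jmx path that Hadoop daemons expose as JSON; this page does not state the path, so treat it as an assumption. The function names are hypothetical and this is not HertzBeat's own collector code.

```python
import json
import urllib.request


def build_url(host: str, port: int = 8088, path: str = "/jmx") -> str:
    """Build the request URL; the host is given without a protocol prefix."""
    # A literal IPv6 address must be wrapped in brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"http://{host}:{port}{path}"


def fetch_metrics(host: str, port: int = 8088, timeout_ms: int = 6000) -> dict:
    """Fetch and decode the JSON document, honouring the query timeout."""
    # urllib expects seconds, while the parameter table uses milliseconds.
    url = build_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout_ms / 1000) as resp:
        return json.load(resp)


print(build_url("192.168.1.10"))
print(build_url("::1", 8090))
```

Calling fetch_metrics requires a reachable ResourceManager, so only build_url is exercised here.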

Collected Metrics

Metric Set: ClusterMetrics

| Metric Name | Unit | Metric Description |
| --- | --- | --- |
| NumActiveNMs | | Number of currently active NodeManagers |
| NumDecommissionedNMs | | Number of currently decommissioned NodeManagers |
| NumDecommissioningNMs | | Number of nodes currently decommissioning |
| NumLostNMs | | Number of lost nodes in the cluster |
| NumUnhealthyNMs | | Number of unhealthy nodes in the cluster |
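One way these counters could be used is to flag nodes that need attention. The sketch below assumes the metrics arrive as a JSON document with a top-level beans list and that the set is published under the bean name shown in CLUSTER_BEAN. Both are assumptions about the Yarn JMX output rather than something this page states, and the sample values are invented.

```python
# Assumed bean name for the ClusterMetrics set in the JMX JSON output.
CLUSTER_BEAN = "Hadoop:service=ResourceManager,name=ClusterMetrics"


def find_bean(payload: dict, name: str) -> dict:
    """Return the first bean with the given name, or an empty dict."""
    for bean in payload.get("beans", []):
        if bean.get("name") == name:
            return bean
    return {}


def problem_nodes(payload: dict) -> int:
    """Sum lost and unhealthy NodeManagers from the ClusterMetrics set."""
    bean = find_bean(payload, CLUSTER_BEAN)
    return bean.get("NumLostNMs", 0) + bean.get("NumUnhealthyNMs", 0)


# Invented sample payload used only for illustration.
sample = {
    "beans": [
        {
            "name": CLUSTER_BEAN,
            "NumActiveNMs": 5,
            "NumDecommissionedNMs": 0,
            "NumDecommissioningNMs": 1,
            "NumLostNMs": 1,
            "NumUnhealthyNMs": 2,
        }
    ]
}

print(problem_nodes(sample))  # 3
```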

Metric Set: JvmMetrics

| Metric Name | Unit | Metric Description |
| --- | --- | --- |
| MemNonHeapCommittedM | MB | Current committed size of non-heap memory in JVM |
| MemNonHeapMaxM | MB | Maximum available non-heap memory in JVM |
| MemNonHeapUsedM | MB | Current used size of non-heap memory in JVM |
| MemHeapCommittedM | MB | Current committed size of heap memory in JVM |
| MemHeapMaxM | MB | Maximum available heap memory in JVM |
| MemHeapUsedM | MB | Current used size of heap memory in JVM |
| GcTimeMillis | ms | Total JVM GC time |
| GcCount | | Number of JVM GC occurrences |
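The heap figures are often easier to read as a percentage. The following sketch derives heap usage from MemHeapUsedM and MemHeapMaxM. The function name is hypothetical, and returning no value when the maximum is missing or not positive is a design choice of this sketch, not documented behaviour.

```python
from typing import Optional


def heap_usage_percent(jvm: dict) -> Optional[float]:
    """Return used heap as a percentage of the maximum heap, if computable."""
    used = jvm.get("MemHeapUsedM")
    maximum = jvm.get("MemHeapMaxM")
    # Skip the calculation when a value is missing or the maximum is not positive.
    if used is None or maximum is None or maximum <= 0:
        return None
    return 100.0 * used / maximum


# Invented sample values, in MB.
print(heap_usage_percent({"MemHeapUsedM": 512.0, "MemHeapMaxM": 2048.0}))  # 25.0
```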

Metric Set: QueueMetrics

| Metric Name | Unit | Metric Description |
| --- | --- | --- |
| queue | | Queue name |
| AllocatedVCores | | Allocated (used) virtual cores |
| ReservedVCores | | Reserved virtual cores |
| AvailableVCores | | Available (unallocated) virtual cores |
| PendingVCores | | Virtual cores pending scheduling |
| AllocatedMB | MB | Allocated (used) memory |
| AvailableMB | MB | Available (unallocated) memory |
| PendingMB | MB | Memory pending scheduling |
| ReservedMB | MB | Reserved memory |
| AllocatedContainers | | Number of allocated (used) containers |
| PendingContainers | | Number of containers pending scheduling |
| ReservedContainers | | Number of reserved containers |
| AggregateContainersAllocated | | Cumulative number of containers allocated |
| AggregateContainersReleased | | Cumulative number of containers released |
| AppsCompleted | | Number of completed applications |
| AppsKilled | | Number of killed applications |
| AppsFailed | | Number of failed applications |
| AppsPending | | Number of pending applications |
| AppsRunning | | Number of currently running applications |
| AppsSubmitted | | Number of submitted applications |
| running_0 | | Number of jobs running for less than 60 minutes |
| running_60 | | Number of jobs running between 60 and 300 minutes |
| running_300 | | Number of jobs running between 300 and 1440 minutes |
| running_1440 | | Number of jobs running for more than 1440 minutes |
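As an example of combining queue metrics, the sketch below summarises one queue's memory usage and its long-running jobs. Treating AllocatedMB plus AvailableMB as the queue's total memory is a simplifying assumption of this sketch (it ignores reserved memory). The function name and the sample values are hypothetical.

```python
def queue_summary(q: dict) -> dict:
    """Summarise memory usage and long-running jobs for one queue."""
    allocated = q.get("AllocatedMB", 0)
    # Assumption: allocated plus available approximates the queue's total memory.
    total = allocated + q.get("AvailableMB", 0)
    return {
        "queue": q.get("queue"),
        "memory_used_pct": round(100 * allocated / total, 1) if total else None,
        # Jobs that have been running for more than 300 minutes.
        "long_running": q.get("running_300", 0) + q.get("running_1440", 0),
    }


# Invented sample values used only for illustration.
sample_queue = {
    "queue": "root",
    "AllocatedMB": 3072,
    "AvailableMB": 1024,
    "running_300": 2,
    "running_1440": 1,
}

print(queue_summary(sample_queue))
```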

Metric Set: runtime

| Metric Name | Unit | Metric Description |
| --- | --- | --- |
| StartTime | | Startup timestamp |
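A startup timestamp can be turned into a readable time and an uptime value. This sketch assumes StartTime is expressed in milliseconds since the Unix epoch, as the JVM runtime usually reports it. The page does not state the unit, so verify it against your own data. The function names are hypothetical.

```python
from datetime import datetime, timezone


def start_time_utc(start_time_ms: int) -> datetime:
    """Convert an assumed epoch-milliseconds timestamp to a UTC datetime."""
    return datetime.fromtimestamp(start_time_ms / 1000, tz=timezone.utc)


def uptime_seconds(start_time_ms: int, now_ms: int) -> float:
    """Return the elapsed time between startup and now, in seconds."""
    return (now_ms - start_time_ms) / 1000


print(start_time_utc(0).isoformat())  # 1970-01-01T00:00:00+00:00
print(uptime_seconds(1_000, 61_000))  # 60.0
```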