Flink On Yarn Monitoring
Measurement and monitoring of general metrics for Flink stream engine in Yarn running mode.
Configuration Parameters
| Parameter Name | Parameter Help Description |
|---|---|
| Monitoring Host | The monitored peer's IPV4, IPV6, or domain name. Note ⚠️ do not include protocol headers (e.g., https://, http://). |
| Task Name | The name identifying this monitoring task. The name must be unique. |
| Yarn Port | The Yarn port, corresponding to the port in yarn.resourcemanager.webapp.address. |
| Query Timeout | The timeout for JVM connections, in milliseconds, default is 3000 ms. |
| Enable SSL | Whether to enable SSL |
| Username | Connection username |
| Password | Connection password |
| Monitoring Interval | Interval for periodic data collection, in seconds, minimum interval is 30 seconds. |
| Tags | Used for categorizing and managing monitoring resources. |
| Description | Additional notes and descriptions for this monitoring task. Users can add notes here. |
Collected Metrics
Metrics Set: JobManager Metrics
| Metric Name | Metric Unit | Metric Help Description |
|---|---|---|
| Status.JVM.Memory.NonHeap.Committed | Bytes | Non-heap memory committed |
| Status.JVM.Memory.Mapped.TotalCapacity | Bytes | Total capacity of mapped memory |
| Status.JVM.Memory.NonHeap.Used | Bytes | Non-heap memory used |
| Status.JVM.Memory.Metaspace.Max | Bytes | Maximum capacity of metaspace |
| Status.JVM.GarbageCollector.G1_Old_Generation.Count | Count | Count of old generation garbage collections |
| Status.JVM.Memory.Direct.MemoryUsed | Bytes | Direct memory used |
| Status.JVM.Memory.Mapped.MemoryUsed | Bytes | Mapped memory used |
| Status.JVM.GarbageCollector.G1_Young_Generation.Count | Count | Count of young generation garbage collections |
| Status.JVM.Memory.Direct.TotalCapacity | Bytes | Total capacity of direct memory |
| Status.JVM.GarbageCollector.G1_Old_Generation.Time | ms | Time spent on old generation garbage collections |
| Status.JVM.Memory.Heap.Committed | Bytes | Heap memory committed |
| Status.JVM.Memory.Mapped.Count | Count | Count of mapped memory |
| Status.JVM.Memory.Metaspace.Used | Bytes | Metaspace memory used |
| Status.JVM.Memory.Direct.Count | Count | Count of direct memory |
| Status.JVM.Memory.Heap.Used | Bytes | Heap memory used |
| Status.JVM.Memory.Heap.Max | Bytes | Maximum capacity of heap memory |
| Status.JVM.GarbageCollector.G1_Young_Generation.Time | ms | Time spent on young generation garbage collections |
| Status.JVM.Memory.NonHeap.Max | Bytes | Maximum capacity of non-heap memory |
Metrics Set: JobManager Config
| Metric Name | Metric Unit | Metric Help Description |
|---|---|---|
| internal.jobgraph-path | - | Internal job graph path |
| env.java.home | - | Java environment path |
| classloader.check-leaked-classloader | - | Whether to check for leaked class loaders |
| env.java.opts | - | Java options |
| high-availability.cluster-id | - | High availability cluster ID |
| jobmanager.rpc.address | - | JobManager's RPC address |
| jobmanager.memory.jvm-overhead.min | Bytes | Minimum JVM overhead for JobManager |
| jobmanager.web.port | Port | JobManager's Web port |
| webclient.port | Port | Web client port |
| execution.savepoint.ignore-unclaimed-state | - | Whether to ignore unclaimed state |
| io.tmp.dirs | Path | Temporary file directories |
| parallelism.default | - | Default parallelism |
| taskmanager.memory.fraction | - | TaskManager memory fraction |
| taskmanager.numberOfTaskSlots | - | Number of task slots for TaskManager |
| yarn.application.name | - | Yarn application name |
| taskmanager.heap.mb | MB | Heap memory size for TaskManager |
| taskmanager.memory.process.size | GB | Process memory size for TaskManager |
| web.port | Port | Web port |
| classloader.resolve-order | - | Class loader resolve order |
| jobmanager.heap.mb | MB | Heap memory size for JobManager |
| jobmanager.memory.off-heap.size | Bytes | Off-heap memory size for JobManager |
| state.backend.incremental | - | Whether the state backend is incremental |
| execution.target | - | Execution target |
| jobmanager.memory.process.size | GB | Process memory size for JobManager |
| web.tmpdir | Path | Web temporary directory |
| yarn.ship-files | Path | Yarn shipped files |
| jobmanager.rpc.port | Port | JobManager's RPC port |
| internal.io.tmpdirs.use-local-default | - | Whether to use local default temporary directories |
| execution.checkpointing.interval | ms | Checkpointing interval |
| execution.attached | - | Whether to execute attached |
| internal.cluster.execution-mode | - | Internal cluster execution mode |
| execution.shutdown-on-attached-exit | - | Whether to shutdown on attached exit |
| pipeline.jars | Path | Pipeline JAR files |
| rest.address | - | REST address |
| state.backend | - | State backend type |
| jobmanager.memory.jvm-metaspace.size | Bytes | JVM metaspace size for JobManager |
| $internal.deployment.config-dir | Path | Internal deployment configuration directory |
| $internal.yarn.log-config-file | Path | Internal Yarn log configuration file path |
| jobmanager.memory.heap.size | Bytes | Heap memory size for JobManager |
| state.checkpoints.dir | Path | State checkpoints directory |
| jobmanager.memory.jvm-overhead.max | Bytes | Maximum JVM overhead for JobManager |
TaskManager Metrics
| Metric Name | Metric Unit | Metric Help Description |
|---|---|---|
| Container ID | - | Container ID for uniquely identifying a container |
| Path | - | Container path |
| Data Port | Port | Data transmission port |
| JMX Port | Port | JMX (Java Management Extensions) port |
| Last Heartbeat | Timestamp | Last heartbeat time |
| All Slots | Count | Total number of task slots in the container |
| Free Slots | Count | Number of free task slots in the container |
| totalResourceCpuCores | Cores | Total number of CPU cores in the container |
| totalResourceTaskHeapMemory | MB | Total task heap memory size in the container |
| totalResourceManagedMemory | MB | Total managed memory size in the container |
| totalResourceNetworkMemory | MB | Total network memory size in the container |
| freeResourceCpuCores | Cores | Number of free CPU cores in the container |
| freeResourceTaskHeapMemory | MB | Free task heap memory size in the container |
| freeResourceTaskOffHeapMemory | MB | Free task off-heap memory size in the container |
| freeResourceManagedMemory | MB | Free managed memory size in the container |
| freeResourceNetworkMemory | MB | Free network memory size in the container |
| CPU Cores | Cores | Number of CPU cores |
| Physical MEM | MB | Size of physical memory |
| JVM Heap Size | MB | Size of JVM heap memory |
| Flink Managed MEM | MB | Size of Flink managed memory |
| Framework Heap | MB | Size of framework heap memory |
| Task Heap | MB | Size of task heap memory |
| Framework Off-Heap | MB | Size of framework off-heap memory |
| memoryConfigurationTaskOffHeap | Bytes | Task off-heap memory configuration |
| Network | MB | Network memory configuration |
| Managed Memory | MB | Managed memory configuration |
| JVM Metaspace | MB | Size of JVM metaspace |
| JVM Overhead | MB | JVM overhead |
| memoryConfigurationTotalFlinkMemory | Bytes | Total Flink memory configuration |
| memoryConfigurationTotalProcessMemory | Bytes | Total process memory configuration |
TaskManager Status Metrics
| Metric Name | Metric Unit | Metric Help Description |
|---|---|---|
| Status.Shuffle.Netty.TotalMemory | MB | Total memory used by Netty Shuffle |
| Status.Flink.Memory.Managed.Used | MB | Managed memory used by Flink |
| Status.JVM.Memory.Metaspace.Used | MB | Used JVM metaspace memory |
| Status.JVM.Memory.Metaspace.Max | MB | Maximum JVM metaspace memory |
| Status.JVM.Memory.Heap.Used | MB | Used JVM heap memory |
| Status.JVM.Memory.Heap.Max | MB | Maximum JVM heap memory |
| Status.Flink.Memory.Managed.Total | MB | Total managed memory by Flink |
| Status.Shuffle.Netty.UsedMemory | MB | Used memory by Netty Shuffle |