Node metrics and monitoring

The Optimism op-node exposes a variety of metrics to help observe the health of the system and debug issues. Metrics are formatted for use with Prometheus and exposed via a metrics endpoint. The default metrics endpoint is http://localhost:7300/metrics.

To enable metrics, pass the --metrics.enabled flag to the op-node. You can customize the metrics port and address via the --metrics.port and --metrics.addr flags, respectively.
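Once metrics are enabled, any HTTP client can read the endpoint. The following is a minimal Python sketch, using only the standard library, that scrapes the documented default endpoint and parses the Prometheus text format into (name, labels, value) tuples. It is not part of the op-node tooling. The function names are hypothetical, and the parser deliberately skips edge cases of the format, such as escaped quotes inside label values.

```python
import re
import urllib.request

# Matches one sample line of the Prometheus text format:
#   metric_name{label="value",...} 12.5 [optional_timestamp]
# Sketch only: escaped quotes inside label values are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\S+)?\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(host="localhost", port=7300, timeout=5.0):
    """Download http://<host>:<port>/metrics; defaults match the documented endpoint."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics())` returns every sample from a local op-node started with --metrics.enabled. If you changed --metrics.addr or --metrics.port, pass the matching host and port.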

Important metrics

To monitor the health of your node, you should monitor the following metrics:

  • op_node_default_refs_number: The op-node's current L1 and L2 reference block numbers, with one time series per reference type (distinguished by the layer and type labels). If it stops increasing, your node is not syncing. If it goes backwards, your node is reorging.
  • op_node_default_p2p_peer_count: The number of p2p peers the op-node is currently connected to. Without peers, the op-node cannot sync unsafe blocks, so your node falls back to syncing purely from L1 and lags behind the sequencer.
  • op_node_default_rpc_client_request_duration_seconds: A histogram of the latency of RPC requests initiated by the op-node, with one time series per RPC method. It is the key metric when debugging sync performance, because it reveals which specific RPC calls are slowing sync down. The most important RPC methods to monitor are:
    • engine_forkchoiceUpdatedV1, engine_getPayloadV1, and engine_newPayloadV1: The op-node uses these methods to execute blocks on op-geth. If they are slow, sync is bottlenecked either by op-geth itself or by your connection to it.
    • eth_getBlockByHash, eth_getTransactionReceipt, and eth_getBlockByNumber: The op-node uses these methods to fetch transaction data from L1. If they are slow, sync is bottlenecked by your L1 RPC.
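The refs and peer-count checks described in this list can be automated by comparing two scrapes taken some time apart. The following Python sketch assumes you have already extracted op_node_default_refs_number into a dict keyed by its (layer, type) label pair, along with the current value of op_node_default_p2p_peer_count. The function name and the label values in the demo are hypothetical. Choose the interval with care: some references (finalized ones, for example) naturally advance more slowly than others, so a short window can flag them as stalled on a healthy node.

```python
# Minimal health-check sketch. Inputs: two snapshots of op_node_default_refs_number
# keyed by (layer, type), plus the current op_node_default_p2p_peer_count.
# check_health is a hypothetical helper, not part of any op-node tooling.


def check_health(prev_refs, curr_refs, peer_count):
    """Compare two refs_number snapshots; return a list of warning strings."""
    warnings = []
    for key in sorted(curr_refs):
        if key not in prev_refs:
            continue  # nothing earlier to compare this series against
        before, after = prev_refs[key], curr_refs[key]
        if after < before:
            warnings.append(f"{key}: block number went backwards ({before} -> {after}), node is reorging")
        elif after == before:
            warnings.append(f"{key}: block number stuck at {after}, node may not be syncing")
    if peer_count == 0:
        warnings.append("no p2p peers: unsafe blocks cannot be synced, node falls back to L1")
    return warnings


if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    prev = {("l2", "l2_unsafe"): 100, ("l2", "l2_safe"): 90}
    curr = {("l2", "l2_unsafe"): 100, ("l2", "l2_safe"): 88}
    for warning in check_health(prev, curr, peer_count=0):
        print(warning)
```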

Available metrics

A complete list of available metrics is below:

METRIC | DESCRIPTION | LABELS | TYPE
op_node_default_info | Pseudo-metric tracking version and config info | version | gauge
op_node_default_up | 1 if the op-node has finished starting up | (none) | gauge
op_node_default_rpc_server_requests_total | Total requests to the RPC server | method | counter
op_node_default_rpc_server_request_duration_seconds | Histogram of RPC server request durations | method | histogram
op_node_default_rpc_client_requests_total | Total RPC requests initiated by the op-node's RPC client | method | counter
op_node_default_rpc_client_request_duration_seconds | Histogram of RPC client request durations | method | histogram
op_node_default_rpc_client_responses_total | Total RPC request responses received by the op-node's RPC client | method, error | counter
op_node_default_l1_source_cache_size | L1 source cache size | type | gauge
op_node_default_l1_source_cache_get | L1 source cache lookups, hitting or not | type, hit | counter
op_node_default_l1_source_cache_add | L1 source cache additions, evicting previous values or not | type, evicted | counter
op_node_default_l2_source_cache_size | L2 source cache size | type | gauge
op_node_default_l2_source_cache_get | L2 source cache lookups, hitting or not | type, hit | counter
op_node_default_l2_source_cache_add | L2 source cache additions, evicting previous values or not | type, evicted | counter
op_node_default_derivation_idle | 1 if the derivation pipeline is idle | (none) | gauge
op_node_default_pipeline_resets_total | Count of derivation pipeline reset events | (none) | counter
op_node_default_last_pipeline_resets_unix | Timestamp of the last derivation pipeline reset event | (none) | gauge
op_node_default_unsafe_payloads_total | Count of unsafe payload events | (none) | counter
op_node_default_last_unsafe_payloads_unix | Timestamp of the last unsafe payload event | (none) | gauge
op_node_default_derivation_errors_total | Count of derivation error events | (none) | counter
op_node_default_last_derivation_errors_unix | Timestamp of the last derivation error event | (none) | gauge
op_node_default_sequencing_errors_total | Count of sequencing error events | (none) | counter
op_node_default_last_sequencing_errors_unix | Timestamp of the last sequencing error event | (none) | gauge
op_node_default_publishing_errors_total | Count of p2p publishing error events | (none) | counter
op_node_default_last_publishing_errors_unix | Timestamp of the last p2p publishing error event | (none) | gauge
op_node_default_unsafe_payloads_buffer_len | Number of buffered L2 unsafe payloads | (none) | gauge
op_node_default_unsafe_payloads_buffer_mem_size | Total estimated memory size of buffered L2 unsafe payloads | (none) | gauge
op_node_default_refs_number | Gauge representing the different L1/L2 reference block numbers | layer, type | gauge
op_node_default_refs_time | Gauge representing the different L1/L2 reference block timestamps | layer, type | gauge
op_node_default_refs_hash | Gauge representing the different L1/L2 reference block hashes, truncated to float values | layer, type | gauge
op_node_default_refs_seqnr | Gauge representing the different L2 reference sequence numbers | type | gauge
op_node_default_refs_latency | Gauge representing the different L1/L2 reference block timestamps minus the current time, in seconds | layer, type | gauge
op_node_default_l1_reorg_depth | Histogram of L1 reorg depths | (none) | histogram
op_node_default_transactions_sequenced_total | Count of total transactions sequenced | (none) | gauge
op_node_default_p2p_peer_count | Count of currently connected p2p peers | (none) | gauge
op_node_default_p2p_stream_count | Count of currently connected p2p streams | (none) | gauge
op_node_default_p2p_gossip_events_total | Count of gossip events by type | type | counter
op_node_default_p2p_bandwidth_bytes_total | P2P bandwidth by direction | direction | gauge
op_node_default_sequencer_building_diff_seconds | Histogram of sequencer building time, minus block time | (none) | histogram
op_node_default_sequencer_building_diff_total | Number of sequencer block building jobs | (none) | counter
op_node_default_sequencer_sealing_seconds | Histogram of sequencer block sealing time | (none) | histogram
op_node_default_sequencer_sealing_total | Number of sequencer block sealing jobs | (none) | counter
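Several rows in this list are histograms. In the standard Prometheus exposition format, each histogram is also exposed as <name>_sum and <name>_count series, so the mean value over an interval is the increase in _sum divided by the increase in _count (in PromQL this is typically written as rate(<name>_sum[5m]) / rate(<name>_count[5m])). The Python sketch below applies that arithmetic to two scrapes of op_node_default_rpc_client_request_duration_seconds to rank RPC methods by average latency, which helps locate a sync bottleneck. It assumes you have already extracted the per-method _sum and _count values; the function names are hypothetical.

```python
# Mean-latency sketch for a Prometheus histogram with a "method" label.
# prev/curr map method -> (sum_seconds, count), read from the histogram's
# _sum and _count series at two scrape times. Helper names are hypothetical.


def mean_latency(prev, curr):
    """Return method -> mean seconds per request between the two scrapes."""
    result = {}
    for method, (curr_sum, curr_count) in curr.items():
        # A method missing from prev is treated as having started from zero.
        prev_sum, prev_count = prev.get(method, (0.0, 0))
        d_sum, d_count = curr_sum - prev_sum, curr_count - prev_count
        if d_count <= 0 or d_sum < 0:
            continue  # no new requests, or counters reset (e.g. op-node restart)
        result[method] = d_sum / d_count
    return result


def slowest(means, n=3):
    """Return up to n (method, mean_seconds) pairs, slowest first."""
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Illustrative numbers only: (sum_seconds, count) per method label.
    before = {"engine_newPayloadV1": (10.0, 100), "eth_getBlockByNumber": (2.0, 50)}
    after = {"engine_newPayloadV1": (16.0, 110), "eth_getBlockByNumber": (2.5, 60)}
    for method, seconds in slowest(mean_latency(before, after)):
        print(f"{method}: {seconds * 1000:.1f} ms")
```

Computing deltas rather than dividing the raw totals keeps the result focused on recent behavior; raw _sum / _count would average over the node's whole uptime and hide a fresh slowdown.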