Version: 1.8.0

Metrics

Overview

Magma gateways and Orc8r generate a large number of metrics, which provide a great deal of visibility into gateways, base stations, subscribers, reliability, throughput, and more. These metrics are regularly pushed to Prometheus, which, together with Grafana, lets us store and query them. By default, all metrics are retained in Prometheus for 30 days. For unlimited retention and a more scalable metrics pipeline, Magma can also be deployed with Thanos.

Metrics Explorer

Metrics Explorer provides an easy way to learn about and explore the metrics available in the system. It can be viewed through the NMS. You can search and filter metrics by name or description. Additionally, clicking the detailed view on a metric lets you explore its current trends via the Grafana explorer.

Grafana

Grafana provides a powerful, configurable, and user-friendly dashboarding solution. Any user within an organization can create and edit custom timeseries dashboards that will be visible to all other users in their organization. An important detail is that Grafana access is limited to users in an organization with the "Super-User" title (you select this when provisioning users in an organization). This is a technical workaround: Grafana allows all users to query across any network that the organization owns, so restricting access ensures that users with additional network visibility restrictions within an organization can't see information from networks they are restricted from.

When you click on the Grafana tab, we have to do some book-keeping on the backend, so the initial load may take a few seconds. You'll then see the built-in dashboards available to you. These dashboards contain dropdown selectors to choose which network(s) and gateway(s) you want to look at. In Grafana you can look at any collection of networks or gateways your organization has access to at once; simply select or deselect the networks/gateways you want to see and the graphs will update. In the top-right corner there is an option to choose the time range the graphs display. The default is 6 hours.

Custom Dashboards

With Grafana, you can create your own custom dashboards and populate them with any graphs and queries you want. These custom dashboards will be visible to all other users in the organization you belong to in the NMS. The simplest way is to click the "+" icon on the left sidebar, then create a new dashboard. There is ample documentation about Grafana dashboards online if you need help creating yours.


If you want to replicate the networkID or gatewayID variables found in the preconfigured dashboards, we provide a "template" dashboard to make that easy. Simply open the Template dashboard and click the gear icon near the top right. From there, click "Save As" and enter the name you want. Your new dashboard will now have the gatewayID and networkID variables. One technical detail: you need to use the =~ operator when matching label names against these variables in order to see more than one network or gateway at a time. This is because the =~ operator tells Prometheus to match the value as a regex.
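For instance, a panel query using both variables could look like the following (the metric name ue_connected is just an illustrative choice; any metric carrying the networkID and gatewayID labels works the same way):

```
ue_connected{networkID=~"$networkID", gatewayID=~"$gatewayID"}
```

Because both matchers use =~, selecting multiple networks or gateways in the dropdowns expands each variable into a regex alternation and the query returns one series per selected gateway.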

Enabling Access

The feature flag is enabled by default for all new organizations created in the NMS. If you want to turn this feature off or on, you can do so from the host organization. Log in to the host organization, navigate to the feature flag page using the left sidebar, then edit the feature flag named "Include tab for Grafana in the Metrics page". Support can be turned on and off for individual organizations.

List of currently available metrics

An up-to-date view is available through the Metrics Explorer.

| Metric Identifier | Description | Category | Service |
|---|---|---|---|
| s1_connection | eNodeB S1 connection status | eNodeB | MME |
| user_plane_bytes_ul | User plane uplink bytes | eNodeB | |
| user_plane_bytes_dl | User plane downlink bytes | eNodeB | |
| enodeb_mgmt_connected | eNodeB management plane connected | eNodeB | enodebd |
| enodeb_mgmt_configured | eNodeB is in configured state | eNodeB | enodebd |
| enodeb_rf_tx_enabled | eNodeB RF transmitter enabled | eNodeB | enodebd |
| enodeb_rf_tx_desired | eNodeB RF transmitter desired state | eNodeB | enodebd |
| enodeb_gps_connected | eNodeB GPS synchronized | eNodeB | enodebd |
| enodeb_ptp_connected | eNodeB PTP/1588 synchronized | eNodeB | enodebd |
| enodeb_opstate_enabled | eNodeB operationally enabled | eNodeB | enodebd |
| enodeb_reboot_timer_active | Is timer for eNodeB reboot active | eNodeB | enodebd |
| enodeb_reboots | eNodeB reboots counter | eNodeB | enodebd |
| rrc_estab_attempts | RRC establishment attempts | eNodeB | enodebd |
| rrc_estab_successes | RRC establishment successes | eNodeB | enodebd |
| rrc_reestab_attempts | RRC re-establishment attempts | eNodeB | enodebd |
| rrc_reestab_attempts_reconf_fail | RRC re-establishment attempts due to reconfiguration failure | eNodeB | enodebd |
| rrc_reestab_attempts_ho_fail | RRC re-establishment attempts due to handover failure | eNodeB | enodebd |
| rrc_reestab_attempts_other | RRC re-establishment attempts due to other cause | eNodeB | enodebd |
| rrc_reestab_successes | RRC re-establishment successes | eNodeB | enodebd |
| erab_estab_attempts | ERAB establishment attempts | eNodeB | enodebd |
| erab_estab_failures | ERAB establishment failures | eNodeB | enodebd |
| erab_estab_successes | ERAB establishment successes | eNodeB | enodebd |
| erab_release_requests | ERAB release requests | eNodeB | enodebd |
| erab_release_requests_user_inactivity | ERAB release requests due to user inactivity | eNodeB | enodebd |
| erab_release_requests_normal | ERAB release requests due to normal cause | eNodeB | enodebd |
| erab_release_requests_radio_resources_not_available | ERAB release requests due to radio resources not available | eNodeB | enodebd |
| erab_release_requests_reduce_load | ERAB release requests due to reducing load in serving cell | eNodeB | enodebd |
| erab_release_requests_fail_in_radio_proc | ERAB release requests due to failure in the radio interface procedure | eNodeB | enodebd |
| erab_release_requests_eutran_reas | ERAB release requests due to EUTRAN generated reasons | eNodeB | enodebd |
| erab_release_requests_radio_conn_lost | ERAB release requests due to radio connection with UE lost | eNodeB | enodebd |
| erab_release_requests_oam_intervention | ERAB release requests due to OAM intervention | eNodeB | enodebd |
| pdcp_user_plane_bytes_ul | User plane uplink bytes at PDCP | eNodeB | enodebd |
| pdcp_user_plane_bytes_dl | User plane downlink bytes at PDCP | eNodeB | enodebd |
| ip_address_allocated | Total IP addresses allocated | AGW | mobilityd |
| ip_address_released | Total IP addresses released | AGW | mobilityd |
| s6a_auth_success | Total successful S6a auth requests | AGW | subscriberdb |
| s6a_auth_failure | Total failed S6a auth requests | AGW | subscriberdb |
| s6a_location_update | Total S6a location update requests | AGW | subscriberdb |
| diameter_capabilities_exchange | Total Diameter capabilities exchange requests | AGW | subscriberdb |
| diameter_watchdog | Total Diameter watchdog requests | AGW | subscriberdb |
| diameter_disconnect | Total Diameter disconnect requests | AGW | subscriberdb |
| dp_send_msg_error | Total datapath message send errors | AGW | pipelined |
| arp_default_gw_mac_error | Total errors with default gateway MAC resolution | AGW | pipelined |
| openflow_error_msg | Total OpenFlow error messages received by code and type | AGW | pipelined |
| unknown_pkt_direction | Counts number of times a packet is missing its flow direction | AGW | pipelined |
| enforcement_rule_install_fail | Counts number of times rule install failed in enforcement app | AGW | pipelined |
| enforcement_stats_rule_install_fail | Counts number of times rule install failed in enforcement stats app | AGW | pipelined |
| network_iface_status | Status of a network interface required for data pipeline | AGW | pipelined |
| subscriber_icmp_latency_ms | Reported latency for subscriber in milliseconds | AGW | monitord |
| magmad_ping_rtt_ms | Gateway ping metrics in milliseconds | AGW | magmad |
| cpu_percent | System-wide CPU utilization as percentage over 1 second | AGW | magmad |
| swap_memory_percent | Percent of memory that can be assigned to processes | AGW | magmad |
| virtual_memory_percent | Percent of memory that can be assigned to processes without the system going to swap | AGW | magmad |
| mem_total | Total memory | AGW | magmad |
| mem_available | Available memory | AGW | magmad |
| mem_used | Used memory | AGW | magmad |
| mem_free | Free memory | AGW | magmad |
| disk_percent | Percent of disk space used for the volume mounted at root | AGW | magmad |
| bytes_sent | System-wide network I/O bytes sent | AGW | magmad |
| bytes_received | System-wide network I/O bytes received | AGW | magmad |
| temperature | Temperature readings from system sensors | AGW | magmad |
| checkin_status | 1 for checkin success, and 0 for failure | AGW | magmad |
| bootstrap_exception | Count for exceptions raised by bootstrapper | AGW | magmad |
| unexpected_service_restarts | Count of unexpected service restarts | AGW | magmad |
| unattended_upgrade_status | Unattended Upgrade status | AGW | magmad |
| service_restart_status | Count of service restarts | AGW | magmad |
| enb_connected | Number of eNodeB connected to MME | AGW | MME |
| ue_registered | Number of UE registered successfully | AGW | MME |
| ue_connected | Number of UE connected | AGW | MME |
| ue_attach | Number of UE attach success | AGW | MME |
| ue_detach | Number of UE detach | AGW | MME |
| s1_setup | Counter for S1 setup success | AGW | MME |
| mme_sgs_eps_detach_indication_sent | SGS EPS detach indication sent | AGW | MME |
| sgs_eps_detach_timer_expired | SGS EPS detach timer expired | AGW | MME |
| sgs_eps_implicit_detach_timer_expired | SGS EPS implicit detach timer expired | AGW | MME |
| mme_sgs_imsi_detach_indication_sent | SGS IMSI detach indication sent | AGW | MME |
| sgs_imsi_detach_timer_expired | SGS IMSI detach timer expired | AGW | MME |
| sgs_imsi_implicit_detach_timer_expired | SGS IMSI implicit detach timer expired | AGW | MME |
| mme_spgw_delete_session_rsp | SPGW delete session response | AGW | MME |
| initial_context_setup_failure_received | Initial context setup failure received | AGW | MME |
| nas_service_reject | NAS service reject | AGW | MME |
| mme_s6a_update_location_ans | S6a update location answer | AGW | MME |
| sgsap_paging_reject | SGS paging reject | AGW | MME |
| duplicate_attach_request | Duplicate attach request | AGW | MME |
| authentication_failure | Auth failure | AGW | MME |
| nas_auth_rsp_timer_expired | NAS auth response timer expired | AGW | MME |
| emm_status_rcvd | EMM status received | AGW | MME |
| emm_status_sent | EMM status sent | AGW | MME |
| nas_security_mode_command_timer_expired | NAS security mode command timer expired | AGW | MME |
| extended_service_request | Extended service request | AGW | MME |
| tracking_area_update_req | Tracking area update request | AGW | MME |
| tracking_area_update | Tracking area update success | AGW | MME |
| service_request | Service request | AGW | MME |
| security_mode_reject_received | Security mode reject received | AGW | MME |
| mme_new_association | New SCTP association | AGW | MME |
| ue_context_release_command_timer_expired | UE context release command timer expired | AGW | MME |
| enb_sctp_shutdown_ue_clean_up_timer_expired | SCTP shutdown UE clean-up timer expired | AGW | MME |
| ue_context_release_req | UE context release request | AGW | MME |
| s1ap_error_ind_rcvd | S1AP error indication received | AGW | MME |
| s1_reset_from_enb | S1 reset from eNB | AGW | MME |
| nas_non_delivery_indication_received | NAS non-delivery indication received | AGW | MME |
| spgw_create_session | SPGW create session success | AGW | MME |
| ue_pdn_connection | UE PDN connection | AGW | MME |
| ue_pdn_connectivity_req | UE PDN connectivity request | AGW | MME |
| ue_reported_usage / up | Reported TX traffic for subscriber / session in bytes | AGW | sessiond |
| ue_reported_usage / down | Reported RX traffic for subscriber / session in bytes | AGW | sessiond |
| ue_dropped_usage / up | Reported dropped TX traffic for subscriber / session in bytes | AGW | sessiond |
| ue_dropped_usage / down | Reported dropped RX traffic for subscriber / session in bytes | AGW | sessiond |

REST API for querying metrics

Metrics can also be queried programmatically through the Orc8r REST API; the relevant endpoints are listed in the Swagger UI.
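As a sketch, a query might look like the following. The hostname, certificate paths, network ID, and metric name here are all illustrative assumptions; substitute the values for your own deployment, and check the Swagger UI for the exact query endpoints exposed by your Orc8r version.

```shell
# Query the current value of an example metric (ue_connected) for one
# network via the Orc8r REST API. The admin operator client certificate
# is required to authenticate; paths and hostname are placeholders.
curl --cert admin_operator.pem --key admin_operator.key.pem \
  "https://api.yourdomain.com/magma/v1/networks/mynetwork/prometheus/query?query=ue_connected"
```

The response follows the standard Prometheus HTTP API format, so the same query strings you use in Grafana panels work here as well.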

Troubleshooting Metrics

On the gateways, the magmad service collects metrics from all the services and pushes them to Orc8r. In Orc8r, metricsd receives the metrics and pushes them to registered metric exporters, of which Prometheus is one of the main ones. Specifically, Orc8r pushes the metrics to an edge hub (prometheus-cache), which is later scraped by the Prometheus instance.

On the query side, when we make queries through the NMS or Swagger, Orc8r queries the Prometheus instance directly.

We can effectively troubleshoot metrics issues by looking at the logs of all the components involved. On the gateways, syslog may contain error logs if a failure occurs during metric upload:

ERROR:root:Metrics upload error! [StatusCode.UNKNOWN] rpc error: code = Unavailable desc = client_loop: send disconnect: Broken pipe
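To dig further on the gateway itself, you can follow the magmad service logs and verify connectivity to Orc8r, which metric upload depends on. The grep filter below is just an illustrative pattern, not an exact log format:

```shell
# Follow magmad logs on the gateway and filter for metrics activity.
sudo journalctl -u magma@magmad -f | grep -i metric

# Verify gateway-to-Orc8r connectivity; upload errors like the one
# above are often a symptom of a broken checkin.
sudo checkin_cli.py
```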

On the Orc8r side, we can debug issues by dumping the logs of prometheus, prometheus-configmanager, prometheus-cache, and metricsd:

kubectl --namespace orc8r logs -l app.kubernetes.io/component=prometheus-configurer -c prometheus-configurer
kubectl --namespace orc8r logs -l app.kubernetes.io/component=prometheus -c prometheus
kubectl --namespace orc8r logs -l app.kubernetes.io/component=metricsd
helm --debug -n orc8r get values orc8r
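If the logs are inconclusive, you can also query the Prometheus instance directly over its HTTP API by port-forwarding it locally. The service name orc8r-prometheus below is an assumption; run kubectl get svc -n orc8r to find the actual name in your deployment:

```shell
# Forward the Prometheus API to localhost (service name may differ).
kubectl --namespace orc8r port-forward svc/orc8r-prometheus 9090:9090 &

# Ask Prometheus directly whether it is ingesting gateway metrics;
# an empty result here points at the push/scrape path rather than
# the NMS or Grafana query path.
curl 'http://localhost:9090/api/v1/query?query=checkin_status'
```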