What is monitoring?
Monitoring means observing and controlling performance. The most important input to decisions about the future is data about the past, so managers need statistics to make the best decisions. The same holds in IT: most strategic decisions are based on information and statistics drawn from the performance records of network elements. Monitoring, however, is not limited to networks.
Why should monitoring be used?
According to the definition above, the answer is efficiency, in any field. For example, in the world’s large industries, monitoring both people and industrial processes contributes greatly to a company’s progress. By carefully monitoring the performance of each department, it is possible to find the weak points and fix them.
It is also possible to find the strong points and make the most of them by investing in them, as a foundation and as a differentiator from other companies. This is why big companies invest in this field and value it. As technology has leapt forward, many specialists have worked on the development of monitoring. Before monitoring technology was introduced in companies, a great deal of time was spent finding the best solution whenever a problem arose.
Preparing a complete and comprehensive report
If we extend this idea to very large industries such as the automotive and aviation industries, we find that monitoring has contributed greatly to the rapid growth of technology in the present era. The good news is that in IT it is possible to have a complete and comprehensive report on almost every part of the network, including computers, services, servers, switching and routing equipment, etc.
How monitoring works
Monitoring collects information from different parts of the network in a centralized database and presents it to the operators in different ways, including charts, tables and text. This leads to less downtime when problems occur, and this reduction in outages is one of the factors driving companies towards monitoring.
If the total cost of a company’s downtime over one year is calculated (including human resources costs, lost sales, falling behind competitors, etc.), it becomes clear that setting up a monitoring section costs far less.
The most widely used monitoring software
The most widely used and popular monitoring systems in the world, which are of very high quality compared to other systems, are as follows:
- OpManager
Among these systems, PRTG and ZABBIX are well ahead of the others. Their power is apparent even to people who have used them only at a basic level, or have merely seen them in operation. What sets these two apart from other monitoring software is how widely they are used: they have a very large user base and are found at many levels, including the infrastructure level, in network operations centers (NOC).
RSMS (Raha Smart Monitoring Solution)
The Raha intelligent monitoring system is developed on top of ZABBIX and has an excellent, functional environment that compares well with any other monitoring software. This environment is so practical and user-friendly that anyone who works in IT and is familiar with the nature of monitoring can work with it comfortably.
One of the features of this software is that, unlike low-level software, it avoids difficult and highly specialized terms. It is also continuously updated and developed according to the needs of Iranian companies. Below, we look at some of the features of this system in more detail.
Considering the efficiency and important role of servers in companies, as well as the heavy expenses of buying and repairing servers, it is logical to monitor the performance and health of servers automatically.
For example, HP servers have a feature called iLO, a management chipset on the server motherboard that supports the SNMP protocol and monitors and controls all of the server hardware.
The following items can be monitored using iLO:
- The health of system fans
- The temperature of system components
- The status of hardware ports and…, monitored and controlled from within RSMS
In addition, the reports obtained from iLO cover the various slots of the system, including Memory, iSCSI and NIC, so that a failure can be checked and fixed as soon as possible when it occurs.
Because a hypervisor is usually installed on servers (to host virtual servers), virtual machines may sometimes have problems.
If the server is not in the company’s own Data Center, it may not be clear whether such a problem lies at the virtualization level (Hypervisor) or in the server hardware. By using the features of the monitoring system and connecting to the server through Telnet or SSH, you can find out at which level the problem lies. If that is not enough, you can check the server hardware using iLO.
All the things mentioned in this topic are automatically observed in the monitoring system, which makes it possible to solve a problem before it becomes acute.
It is possible to communicate with different hardware through the operating system.
For example, if the kernel of a Linux operating system is not working properly, every operation running on that operating system can be affected.
The operating system consists of drivers that communicate with the hardware and code that provides different services.
Therefore, the health of the drivers and of the operating system code must be monitored so that both hardware and services can work properly.
Accordingly, RSMS capabilities can be used to monitor drivers and make sure services are running.
(Later, under the topic of Audit, file monitoring is explained, along with how changes made to files are reported in detail.)
Server virtualization is one of the most important pillars of the network and the foundation of many activities today.
For example, a hypervisor launches different virtual machines that provide different services. As a result, if the performance of hypervisors is not monitored, huge problems can follow.
Hypervisors have many items to control and monitor.
Among them, we can mention vSwitches, hosts, vCenter, etc.
On a scale where High Availability and Fault Tolerance (HA, FT) technologies are implemented, it becomes clear how much management burden and damage can be avoided by monitoring this whole set and preventing problems before they occur.
The amount of hardware resources used by the VMs and by the hypervisor itself is another report that RSMS provides, along with the health of the VMs built on the hypervisor.
The uptime of the machines and any overload of the hypervisor’s own hardware resources, such as memory, are reported in detail, so that network managers can take appropriate action.
As mentioned, any device that can support protocols such as SNMP, SSH, etc. can be monitored.
For example, hardware firewalls, routers, switches, NVR devices, servers, storage systems, etc. from any brand, such as Cisco, QNAP or FortiGate, can be added to the ZABBIX monitoring server and monitored.
As explained in the introduction of different sections of ZABBIX, there is an important section called Audit, where all the file changes in the systems are reported.
In addition, there is another important section where user activities are reported.
These activities include login/logoff events, the number of failed attempts with an incorrect username or password, the number of hours the user has worked, the user’s web browsing activity, and the files the user has edited or opened. All of them are reported in full detail.
This very useful feature can monitor and control all kinds of behavior. For example, it can be attached to your VoIP service and monitor all its activity,
so that it warns immediately if, for example, a user’s SIP connection is disrupted.
It can also be configured to warn if a user’s conversation exceeds a certain limit. The scope of such behavior monitoring is very broad and detailed, which is one of the attractive features of RSMS.
QoS stands for Quality of Service.
RSMS has a powerful feature for this kind of quality control.
QoS covers the stability of communication between two devices, the packets that reach their destination versus those that are lost or arrive with errors (packet loss), the delay in sending and receiving packets, and jitter.
RSMS monitors QoS by measuring these items.
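The QoS items above can be sketched in a few lines of Python. This is only an illustration, not how RSMS computes them: the function name `qos_summary` and its inputs are hypothetical, and jitter is taken here as the mean absolute difference between consecutive round-trip times, which is one common simplification.

```python
from statistics import mean


def qos_summary(rtts_ms, sent):
    """Summarize packet loss, delay and jitter for one probe run.

    rtts_ms: round-trip times (ms) of the replies that came back, in order.
    sent:    number of probes that were sent (assumed to be > 0).
    """
    received = len(rtts_ms)
    loss_pct = 100.0 * (sent - received) / sent
    avg_latency = mean(rtts_ms) if rtts_ms else None
    # Assumption: jitter = mean absolute change between consecutive RTTs.
    changes = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    jitter = mean(changes) if changes else 0.0
    return {
        "loss_pct": loss_pct,
        "avg_latency_ms": avg_latency,
        "jitter_ms": jitter,
    }
```

For example, `qos_summary([10.0, 12.0, 11.0, 15.0], sent=5)` describes a run in which one of five probes was lost.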
It is very important for network managers to know what kind of packets are sent and received by the routers and switches of the networks under their supervision. For example, sometimes it is necessary to monitor the amount of HTTP traffic, SMB traffic (Sharing), DNS traffic, etc. and prepare a complete report on it.
This feature is called NetFlow. (It works in a similar way to the Wireshark software.) With this feature, in addition to counting the packets sent and received on a network interface, the type of traffic is identified, and based on the volume of each type, the engineering needed to improve or control the network can be carried out.
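As a rough sketch of the idea, traffic can be grouped by type once flow records are available. The record layout below (a dict with `dst_port` and `bytes`) is a simplified assumption for illustration and is not the NetFlow export format; the port-to-service table is likewise only a small example.

```python
from collections import Counter

# Well-known destination ports used to label traffic (illustrative subset).
PORT_LABELS = {80: "HTTP", 443: "HTTPS", 53: "DNS", 445: "SMB"}


def bytes_by_service(flows):
    """Total the bytes of each flow under the service its port suggests."""
    totals = Counter()
    for flow in flows:
        label = PORT_LABELS.get(flow["dst_port"], "other")
        totals[label] += flow["bytes"]
    return dict(totals)
```

A report such as "how much of the traffic was SMB" then becomes a lookup in the returned dictionary.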
A link can be monitored continuously by introducing it to RSMS. This can be a link between the physical sites of an Enterprise network (VPN links), a radio link between sites, or any other link that connects two points. This feature is widely used in the map design section, where a link turns red if it fails or degrades.
Another capability of RSMS, which will be mentioned later, is the monitoring of website pages. Naturally, each page contains a number of links to the pages of other websites.
By introducing the URL of these links to RSMS, it can instantly monitor the health of these links.
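A minimal sketch of such a URL health check, using only the Python standard library, could look like this. The function names and the rule that any status from 200 to 399 counts as "up" are assumptions made for the example, not RSMS behavior.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def check_links(urls, fetch=fetch_status):
    """Map each URL to "up" or "down" (assumption: 2xx/3xx means up)."""
    report = {}
    for url in urls:
        status = fetch(url)
        healthy = status is not None and 200 <= status < 400
        report[url] = "up" if healthy else "down"
    return report
```

The `fetch` parameter makes it possible to swap in a different probe, for example in tests, without touching the network.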
With this RSMS feature, it is possible to monitor and control any type of packet, of any protocol, that travels over the network. The current status of these packets is reported and, based on their history, how well they are carried across the network is assessed, so that improvements can be planned for the future.
Hardware health is very important for network administrators. From the smallest network devices, such as cameras, to the largest and most important ones, such as servers, every device needs healthy hardware to keep working.
As mentioned in the Server Monitoring section, different parts of the hardware can be monitored.
For example, the chipsets used with the CPU can be monitored. In general, any secondary device directly connected to the motherboard (such as external disks) can be monitored.
By introducing a piece of software or a process to RSMS, it can be monitored and controlled. It is also possible to define the actions to take if the software is interrupted or crashes, so that it is launched again.
This matters twice as much when users are given access to run software on the server through App Virtualization & Sharing.
In that case a software failure may disrupt the users’ work. Monitoring keeps this under control and allows the necessary action to be taken in case of any incident.
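The "relaunch on crash" idea can be sketched with Python’s `subprocess` module. This is an illustrative watchdog step, not the RSMS mechanism; `ensure_running` and `EXAMPLE_COMMAND` are hypothetical names, and the example command is just a Python process that sleeps.

```python
import subprocess
import sys

# Hypothetical stand-in for the monitored software.
EXAMPLE_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]


def ensure_running(process, command):
    """Return (process, restarted).

    If process is None or has exited, launch command again;
    otherwise leave the running process alone.
    """
    if process is not None and process.poll() is None:
        return process, False  # still alive, nothing to do
    return subprocess.Popen(command), True
```

A monitoring loop would call `ensure_running` at a fixed interval and raise a notification whenever `restarted` is true.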
As mentioned in the Users section of the introduction to the different sections of RSMS, monitoring users and operators can be given access to view different sections of RSMS. It is also possible to define, for different people, what level of notifications and warnings they should receive.
In addition, it is possible to determine how they are notified; for example, one group can receive SMS messages while another is notified via Telegram. Rules can also be defined for the access level of these operators to the different RSMS panels.
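The routing described above can be pictured as a small lookup. The group names, severity levels and channels below are invented for the illustration and do not reflect the actual RSMS configuration format.

```python
SEVERITY_ORDER = ["info", "warning", "critical"]

# Hypothetical operator groups: the minimum severity each group wants
# to hear about, and the channel used to reach it.
GROUPS = {
    "noc": {"min_severity": "warning", "channel": "telegram"},
    "managers": {"min_severity": "critical", "channel": "sms"},
}


def recipients(severity, groups=GROUPS):
    """Return sorted (group, channel) pairs that should get this alert."""
    level = SEVERITY_ORDER.index(severity)
    return sorted(
        (name, cfg["channel"])
        for name, cfg in groups.items()
        if level >= SEVERITY_ORDER.index(cfg["min_severity"])
    )
```

With these example groups, a warning reaches only the NOC over Telegram, while a critical alert also reaches the managers by SMS.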
In addition to sending its display to monitors over the network, RSMS can be configured separately for monitors that are directly connected to the RSMS server itself. This feature becomes valuable when you want to display large and highly detailed maps on a large monitor.
Passing such a large number of pixels over the network requires a lot of bandwidth, so for such comprehensive maps it is recommended to use monitors that are directly connected to RSMS.
RSMS can monitor webpages by inspecting HTTP traffic. It can report on the number of website visits, the time people stay on each page, the packets sent to and received from the website, the cookies used, the volume of downloads and uploads by users, detailed control of individual pages, etc.
One of the most important elements of any system is the central processing unit of that system. It should always be kept in mind that CPU performance monitoring is a priority.
Sometimes the CPU load increases greatly because of heavier processing or because software or a service in the operating system malfunctions, and this may lead to a system crash.
The importance of this becomes clear when an online service (SaaS) is being served from the server.
When the CPU load increases, serious and costly damage can be prevented by taking appropriate action.
The items available in RSMS differ according to the protocol that connects the node to the monitoring server,
but the overall approach is the same and the differences are in the details.
As an example, we use RSMS to monitor the CPU of a Windows server.
A few of the important items that RSMS provides by default for CPU monitoring are described below:
- Context switches per second
Displays and reports the rate at which the CPU switches between the threads of different queued processes. A context switch occurs when a process terminates or when a higher-priority process takes over the hardware resources.
- CPU interrupt time
The amount of time the CPU spent handling hardware interrupts, during which it could not serve regular work.
This can be monitored on servers, and the reason the CPU is spending its time this way can be investigated.
It can have various causes.
- CPU privileged time
The amount of time the CPU has spent processing the operating system itself.
In Linux operating systems, for example, this corresponds to the time spent on kernel processing.
- CPU user time
It reports the amount of time the CPU has spent processing in User mode.
- CPU utilization
This is the most important item in CPU monitoring.
Using this item, you can monitor CPU usage in all situations and create a comprehensive report.
- Number of cores
Based on the real-time report, it can display the number of processor cores that are free and idle.
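To make the relationship between these items concrete, the sketch below derives percentages from two readings of cumulative CPU time counters. The mode names (`user`, `privileged`, `interrupt`, `idle`) and the function itself are assumptions for illustration; real agents read such counters from the operating system, and RSMS may compute its values differently.

```python
def cpu_percentages(prev, curr):
    """Turn two cumulative CPU-time samples into percentages per mode.

    prev, curr: dicts mapping a mode name to the time spent in it so far.
    The result also contains "utilization", taken as everything but idle.
    """
    deltas = {mode: curr[mode] - prev[mode] for mode in curr}
    total = sum(deltas.values())
    if total <= 0:
        return {}  # no time elapsed between the two samples
    pct = {mode: 100.0 * delta / total for mode, delta in deltas.items()}
    pct["utilization"] = 100.0 - pct.get("idle", 0.0)
    return pct
```

Sampling at a fixed interval and feeding consecutive readings to this function yields the time series that a utilization chart is drawn from.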
Using this feature, it is possible to monitor the amount of storage used, IOPS, read and write rates, the remaining storage space, etc.
RSMS does not only monitor the disk as a whole;
it also monitors each partition created on it separately.
Like the CPU, this item has various details, the most important of which are listed below:
- Disk read rate
Displays the read rate from storage.
This is very important and useful for systems that are used as storage.
- Disk write rate
Displays the rate at which systems write to the storage space that is shared with or dedicated to them.
- Disk utilization
Displays the amount of time the disk has been active. This activity includes Read and Write.
- Disk read & write request avg waiting time
Displays the average time that read or write requests sent to the disk wait to be served. Excessive system requests can be brought under control by monitoring this item.
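The disk items listed above are typically derived from cumulative counters sampled at two points in time. The following is a sketch under that assumption; the counter names (`reads`, `writes`, `io_wait_ms`, `busy_ms`) are hypothetical and chosen only for this example.

```python
def disk_rates(prev, curr, interval_s):
    """Derive disk metrics from two cumulative counter samples.

    interval_s is the time between the samples in seconds (assumed > 0).
    """
    reads = curr["reads"] - prev["reads"]
    writes = curr["writes"] - prev["writes"]
    operations = reads + writes
    wait_ms = curr["io_wait_ms"] - prev["io_wait_ms"]
    busy_ms = curr["busy_ms"] - prev["busy_ms"]
    return {
        "read_rate": reads / interval_s,    # read operations per second
        "write_rate": writes / interval_s,  # write operations per second
        # Average time one request waited, in milliseconds.
        "avg_wait_ms": wait_ms / operations if operations else 0.0,
        # Share of the interval during which the disk was active.
        "utilization_pct": 100.0 * busy_ms / (interval_s * 1000.0),
    }
```

Running the same calculation per partition gives the separate, partition-level view described above.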
Another main pillar of system stability is Memory.
This matters twice as much when monitoring Enterprise-level systems such as a Data Center, where all hardware resources are shared.
Weak memory performance can degrade the performance and health of the whole system, especially the CPU.
For this reason, RSMS pays special attention to memory monitoring.
Below, we look at some of the important items that RSMS monitors:
Displays the amount of memory that acts as a cache.
It should be noted that this value shows the final amount of Cache and is not an average value.
Free swap space
This value is of special importance for Linux operating systems.
If free swap space drops below the defined limit, it means that memory usage has increased greatly and running processes are being moved to the Swap space, which is much slower than Memory (because the Swap space is created on the disk). This can slow the system down considerably.
Displays the amount of memory usage as a percentage. This item is one of the most important features that is always monitored in control centers (NOC).
Memory pages per second
This item shows the rate at which pages are read from or written to the hard disk while the CPU is processing.
When the CPU needs data that is not currently in RAM, the page holding it has to be brought into memory from the disk.
If this transfer does not take place, the system will, so to speak, hang.
This value becomes dangerous when it exceeds 1000 pages per second.
Free system page table entries
A page table is a data structure used by the operating system’s virtual memory system.
The value of this item is the number of page table entries that are not currently in use.
If this value is less than 5000, it can be concluded that the memory has problems and functional interference.
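The two limits quoted above (1000 memory pages per second and 5000 free system page table entries) lend themselves to a simple threshold check. The function below is a hypothetical sketch of such a rule, not an RSMS trigger definition, and the message texts are invented.

```python
PAGES_PER_SEC_MAX = 1000  # limit quoted in the text
FREE_PTE_MIN = 5000       # limit quoted in the text


def memory_alerts(pages_per_sec, free_page_table_entries):
    """Return a list of warnings for the two memory items above."""
    alerts = []
    if pages_per_sec > PAGES_PER_SEC_MAX:
        alerts.append("memory pages per second above limit")
    if free_page_table_entries < FREE_PTE_MIN:
        alerts.append("free system page table entries below limit")
    return alerts
```

An empty list means both values are within the limits used in this example.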
Among the most useful things that can be monitored are the services running on nodes.
This is of great importance: on client-level operating systems as well, many services are critical for the operating system to keep running.
Below, we explain a small number of the most important services in a Windows Server operating system:
CryptSvc (Cryptographic Services)
It is one of the most important Windows services. The health and stability of this service guarantees the correct operation of the operating system’s other security services, including encrypting and decrypting files, checking certificates, Windows Update, driver installation, etc.
DNS is one of the most vital services in the network world.
The essence of this service is to convert names to IP addresses and vice versa, according to the records defined in DNS.
If this service fails, on a LAN or on the Internet, the result is catastrophic, because everyone would then have to know the IP address of every destination they want to reach!
The health of this service matters at the highest level, so it should always be monitored.
All network devices communicate through IP addresses. Therefore, the DHCP service must work properly to be able to provide IP addresses to the DHCP clients that request them.
When this service operates correctly, problems such as assigning one IP address to two clients or failing to respond to clients’ IP requests are prevented.
Net Logon Service
Net Logon is one of the important services used to authenticate users in the domain.
This service is also very important, because it is what allows users to work at the Domain level.
Windows Update Service
Operating system updates exist to fix functional defects and security bugs.
Systems must therefore be updated to increase their efficiency and to protect them against the risks that threaten them on the network.
This service is very important because it communicates with upstream servers and receives the updates.
This service is used in Windows Server to share files and printers over the network. If it fails, clients will not be able to use file shares and other shared resources.
Windows Defender Firewall Service
This is one of the services used to protect the system.
If it fails and no other security measures are in place, information can be compromised, which can be very damaging for organizations that hold confidential information!
So far, we have reviewed only a few services and why their health matters.
Even so, the importance of monitoring them is quite clear.
RSMS, with its excellent capabilities, can monitor all these services, report on them and even take the necessary action to restart a service if a problem occurs.
The number of such services is huge, and we have only described some within the Windows operating system! With the addition of Linux operating systems and…, an even larger number of services can be controlled and monitored.
Network cards are the hardware that connects the whole world and forms the big world of the Internet! The importance of this hardware is reason enough to monitor it. RSMS provides a complete package of items that can be monitored, some of which are reviewed below:
- Bits received & sent
It measures the incoming and outgoing traffic of the network card in bits and provides a full report.
- Inbound & Outbound packets discarded
Displays the number of inbound and outbound packets that were dropped. One of the factors in getting the best performance and optimal speed from the network is bringing packet loss down to the lowest possible level. By monitoring this item, we can find the causes, fix them and help improve the network.
Send and Receive speed is very important. This can be monitored and necessary actions can be taken if the speed drops.
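Interface items like these are usually computed from counters that only ever increase and eventually wrap around. The sketch below shows one way to turn two readings into rates; the counter names and the 64-bit default width are assumptions for the example, not details taken from RSMS.

```python
def counter_delta(prev, curr, bits=64):
    """Difference between two readings of a counter that wraps at 2**bits."""
    return (curr - prev) % (1 << bits)


def interface_rates(prev, curr, interval_s, bits=64):
    """Per-second rates from two samples of cumulative interface counters."""
    names = ("bits_in", "bits_out", "discards_in", "discards_out")
    return {
        name + "_per_s": counter_delta(prev[name], curr[name], bits) / interval_s
        for name in names
    }
```

The modulo in `counter_delta` keeps the rate correct when a counter wraps between two samples, which matters on busy links with 32-bit counters.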
The scale of monitoring cannot be conveyed in writing; this is confirmed by the customers who attended the presentation meetings in person and got to know the countless capabilities of this smart system, all gathered in a user-friendly and practical console. And this is not the whole story: it is very important that the system is configured by Raha specialists so that it reaches maximum efficiency.
Raha information technology experts believe that a monitoring system is to a network what the power of vision is to a person. Neither the blind nor the sighted fully know its value, but a sighted person who has lost their sight understands the value of vision. It is therefore strongly recommended to experience this vision with the RSMS smart monitoring system!