Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
First, the terms involved in the present invention will be explained:
GNU C Library: the GNU C Library, also known as glibc, is the C runtime library released by the GNU project under the LGPL license and can be conveniently downloaded from the network. A C runtime library is a set of APIs used during program execution; it is typically pre-compiled and exists in binary form on Linux-like systems. The most important use of glibc is together with the Linux kernel, where it is an important component of the GNU/Linux operating system. glibc is the most widely used C function library on the Linux platform; it includes an implementation of the C standard library and also wraps the system functions. Almost all C programs call glibc library functions, so glibc is the basis on which C programs run on the Linux platform. glibc provides a set of header files and a set of library files containing the most basic and most commonly used C standard library functions and system functions: almost all C programs depend on libc.so, C programs that perform mathematical calculations depend on libm.so, and multithreaded C programs depend on libpthread.so.
Deadlock: when multiple threads access shared resources, locks such as mutexes, read-write locks and spin locks are added to protect those resources. Improper use of these locks easily causes deadlock, which blocks the program abnormally so that it cannot continue to work.
Infinite loop (dead loop): linked-list operations, loop operations, recursion and the like all carry the risk of an infinite loop. The root cause may be a basic logic error in the code, an out-of-bounds memory access, corrupted memory data, improper protection of shared resources, and so on.
Abnormal memory access: illegal operations such as out-of-bounds memory access, repeated release (double free) and wild pointers can have a fatal effect on a program and can cause problems such as business logic errors and stack corruption.
Slow IO: IO operations include disk IO, network IO and the like. A synchronous IO operation blocks the caller, and when the latency of IO operations increases, an unreasonable program design can cause service logic abnormalities.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. In the following description of the embodiments, the meaning of "a plurality" is two and more, unless explicitly defined otherwise.
The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
An embodiment of the invention provides a method for processing a program exception, used to handle the situation in which a program becomes abnormal. The execution body of this embodiment is an apparatus for processing a program exception; the apparatus may be provided in an electronic device, which may be a server or another suitable computer device.
As shown in fig. 1, a flow chart of a method for processing program exception according to the present embodiment is shown, where the method includes:
Step 101, if a first service abnormality is detected, sending a first signal to each currently surviving first thread, so that after receiving the first signal each first thread stores its thread identifier and call stack information.
Specifically, whether a business process in the program is abnormal can be detected by a watchdog thread or by other means. The watchdog thread may be an independent thread or may be implemented in the main thread. In practical application, the key services to be monitored can be defined; because different programs implement different functions, the definition of a key service differs and can be set according to actual requirements, for example a heartbeat service between key components (monitoring whether the heartbeat is normal), a timer execution service (monitoring whether the timer executes normally), network IO blocking time (monitoring whether the blocking time is too long), and the like.
Each process may include one or more threads. In practical application, a complete program and a process are not in one-to-one correspondence: one program may be executed by multiple processes, and one process may also execute multiple programs. In the present invention, the processing of program exceptions targets a process, and each process can be regarded as executing a sub-program of the program; if a program is executed entirely by one process, the sub-program corresponding to that process is the program itself. The method of the present invention may be applied to each process.
The first service may be any service executed by a target process (which may be a process executing any sub-program of a program), or any key service set according to actual requirements; services are divided according to the logical functions of the program. The target process comprises one or more threads, specifically a main thread and possibly other threads. Each process may start an independent watchdog thread, or its main thread may perform the watchdog function, so as to detect the service state. If the first service abnormality is detected, a signal (which may be referred to as the first signal) may be sent through a system function to all threads currently surviving in the target process (which may be referred to as first threads), notifying each first thread to record its own thread identifier and call stack information. For example, the first signal may be a user-defined signal: when the program starts, a signal handling function corresponding to the first signal is registered with the system by calling the system signal function at the program entry, so that the system (such as a Linux system) performs the user-defined action instead of the default action. The thread identifier of the first thread may be a thread ID or a thread name. The call stack information of a thread mainly comprises one or more of a file name, a function offset address, a file offset address, a thread name, a thread ID and other information, and can be set according to actual requirements. The first service abnormality may include problems such as an infinite loop or abnormal suspension.
Optionally, the current surviving thread may be maintained by a thread global table or other implementable manner, for example, during the running process of the program, each thread registers its thread information into the thread global table when starting, and deletes its thread information from the thread global table before the thread exits, so that the thread information of the current surviving thread is maintained in real time in the thread global table.
Because call stack information of all threads can be recorded when the service is abnormal, subsequent related personnel can carry out stack backtracking through the call stack information, so that the root cause of the service abnormality can be more conveniently positioned, and the problem that the service abnormality cannot be positioned in the prior art is solved; in addition, the stack backtracking method in the prior art can only record the call stack information of the thread generating the abnormal signal, and the method can realize stack backtracking of the whole thread, thereby effectively improving the positioning analysis efficiency of the abnormal problem.
Step 102, restarting the target process to which the first service belongs after waiting for a preset time.
Specifically, since the service is abnormal, the target process cannot be executed normally, and thus the target process needs to be recovered, in order to ensure that each thread can record the thread identifier and call stack information, the watchdog thread needs to wait for a certain time to restart the target process to recover the abnormal service processing flow, and the preset time can be set according to actual requirements, for example, can be set to 1 second, 2 seconds and the like according to the system performance.
According to the program exception processing method, when the program is in an exception, all surviving threads in the process of executing the service are informed to record the thread identification and call stack information, so that stack backtracking of all threads can be realized, the root cause of the service exception can be more conveniently located, the exception problem locating analysis efficiency is improved, and the problem that the service exception cannot be located in the prior art is solved.
In order to make the technical scheme of the invention clearer, another embodiment of the invention further supplements the method provided by the embodiment.
As an implementation manner, in order to detect whether related services in a program are abnormal, on the basis of the foregoing embodiment, the method may optionally further include: the watchdog thread periodically detects whether a service abnormality exists.
Specifically, a software watchdog periodically detects whether related services in the program are abnormal; for example, watchdog monitoring can be set for some key services in the program. For each process executing the program, an independent watchdog thread is set in the process, or the main thread of the process implements the watchdog function. For the services of the program executed by the process, a watchdog feeding time can be set for each service. When a service is abnormal, it cannot feed the watchdog on time, the watchdog timer overflows and generates a reset signal, and the watchdog thereby determines that the service is abnormal. The watchdog thread then sends the first signal to each currently surviving first thread, so that each first thread promptly outputs its thread identifier and call stack information to a log file, which facilitates locating and analyzing the abnormal problem. After sending the first signal to each first thread, the watchdog thread waits for a preset time so that the first threads can finish recording their thread identifiers and call stack information, and then restarts the target process to recover the abnormal service.
The watchdog thread can be a brand new independent thread started in the target process, can be directly realized in the main thread of the target process, and can be specifically set according to actual requirements.
Further, in order to maintain the currently surviving threads in real time, on the basis of the above embodiment, optionally, if the first service abnormality is detected, sending a first signal to the currently surviving first thread includes:
if the first business abnormality is detected, traversing a thread global table, and determining a current surviving first thread, wherein the thread global table at least comprises the thread identification of each current surviving thread; for each first thread that is currently alive, a first signal is sent to the first thread.
Specifically, during the running process of the program, the current surviving thread can be maintained in real time through the thread global table, so that after the first service abnormality is monitored, the watchdog can traverse the thread global table to determine the current surviving first thread, and further send a first signal to each first thread, so that the first thread outputs the thread identification and call stack information thereof to the log file; the thread global table can be maintained in any practicable format according to actual requirements, and can be maintained in a memory of a process, wherein the thread global table at least comprises thread identifications of all currently surviving threads so that a watchdog thread can determine which threads need to be sent with a first signal; the thread identifier may be a thread ID or a thread name with uniqueness, and in order to ensure uniqueness of the thread name, the thread name with the uniqueness may be configured for each thread, and specifically may be set according to actual requirements.
Further, to ensure that the current surviving thread information can be maintained in real time in the thread global table, the method further includes:
After each thread is started, registering the thread information of the thread into a thread global table, wherein the thread information at least comprises a thread identifier; before each thread exits, its thread information is deleted from the thread global table.
Specifically, after a thread is started, firstly, thread information of the thread is added to a thread global table, so that the thread global table maintains that the thread is in a surviving state in real time, and before the thread is to be exited, the thread information of the thread is deleted from the thread global table and then is exited, so that the thread information maintained in the thread global table is the information of the current surviving thread, the existence of the non-surviving thread in the thread global table is avoided, and unnecessary work is added to a watchdog thread.
For example, as shown in fig. 2, a schematic thread information registration flow is provided for this embodiment, where pthread_setname_np is used to name a thread, and pthread_t is the type of the thread ID.
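One possible form of the thread global table is sketched below in C, assuming a fixed-capacity array protected by a mutex. The capacity `MAX_THREADS`, the type `thread_table` and the function names are hypothetical and chosen only for illustration; the embodiment does not prescribe any particular format for the table.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_THREADS 64  /* hypothetical capacity for this sketch */

typedef struct {
    pthread_mutex_t lock;
    pthread_t ids[MAX_THREADS];
    bool used[MAX_THREADS];
} thread_table;

/* Initializes an empty table; done once before any thread is created. */
void thread_table_init(thread_table *t) {
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    memset(t->used, 0, sizeof t->used);
}

/* Registers a thread after it starts. Returns 0, or -1 if the table is full. */
int thread_table_register(thread_table *t, pthread_t tid) {
    int rc = -1;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (!t->used[i]) {
            t->ids[i] = tid;
            t->used[i] = true;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Deletes a thread before it exits. Returns 0, or -1 if it was not found. */
int thread_table_unregister(thread_table *t, pthread_t tid) {
    int rc = -1;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (t->used[i] && pthread_equal(t->ids[i], tid)) {
            t->used[i] = false;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Number of threads currently recorded as surviving. */
size_t thread_table_count(thread_table *t) {
    size_t n = 0;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (t->used[i]) {
            n++;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

A thread would call `thread_table_register(&table, pthread_self())` as the first step of its entry function and `thread_table_unregister(&table, pthread_self())` as the last step before returning.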
In one embodiment, to ensure normal use of the first signal, the method further comprises: when the target process is started, registering a signal processing function corresponding to the first signal, and initializing a thread global table.
Specifically, the first signal is a user-defined signal. To ensure that the first signal performs its user-defined function in the system, the program's own signal processing function for the first signal needs to be registered with the system through a system function. Specifically, when the target process is started, the main thread of the target process registers the signal processing function of the first signal at the program entry of the target process; for example, if the first signal is the SIGUSR1 signal, the signal processing function of the SIGUSR1 signal is registered, so that the target process tells the system kernel which function should be called when the signal arrives. In addition, the thread global table also needs to be initialized, so that each thread can register its own thread information in the thread global table after it is started.
Illustratively, as shown in fig. 3, a main thread initialization flowchart of the process provided in this embodiment is provided.
As another embodiment, the method further comprises: and if the signal causing the program crash abnormality is detected, storing the call stack information of the current thread.
Specifically, for a signal that may cause a program crash exception, such as SIGSEGV, SIGABRT, the embodiment of the present invention may also perform positioning of an exception problem, specifically perform positioning by adopting a stack trace back manner, that is, if a system detects a program crash exception executed by a process, the process is interrupted, a related exception signal (such as SIGSEGV, SIGABRT) is sent to the process, and the current thread that generates the exception in the process outputs own call stack information to a log file, so that positioning analysis of a subsequent exception problem is performed, and specific principles are not repeated.
For signals that may cause program crash exceptions, in order to enable stack trace back, signal processing functions of these signals need to be registered with the system, and a specific detection manner for causing program crash exceptions is the prior art and will not be described herein.
The method of the present invention can locate program crash exception problems and, by combining the watchdog thread with stack backtracking of all threads, can also locate exception problems that do not crash the program, such as infinite loops and abnormal suspension, whereas existing watchdog schemes can only detect a service exception and restart, but cannot locate the exception problem. In addition, the method of the present invention realizes stack backtracking of all threads, and recording the stack information of all threads does not introduce the problems of excessive disk space usage or excessive service interruption time; the method therefore also solves the problems of excessive disk space usage and excessive service interruption time of the traditional core dump scheme.
As another embodiment, to determine the anomaly problem, the method further includes: after receiving the first signal, the first thread outputs the thread identification and call stack information to the log file; and positioning and analyzing the abnormal problem of the target process based on the thread identification and the call stack information in the log file.
Specifically, the log file may be a system log file, in the case of an abnormal problem, the thread identifier and the call stack information are output to the system log file, and then the abnormal problem of the target process can be positioned and analyzed based on the thread identifier and the call stack information in the system log file.
For call stack information, the thread may call the backtrace function to obtain the call stack information, which is stored in an array of pointers, and then convert the obtained call stack information into printable string information (comprising the function name, the offset into the function and the actual return address) through the related functions backtrace_symbols or backtrace_symbols_fd, thereby completing stack backtracking.
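The use of backtrace and backtrace_symbols described above may be sketched in C as follows. The function name `print_call_stack` and the frame limit `MAX_FRAMES` are hypothetical. Because backtrace_symbols allocates memory, this variant is suited to ordinary code paths; inside a signal handler, backtrace_symbols_fd, which writes directly to a file descriptor, is generally the more appropriate choice.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64

/* Prints one line per stack frame of the calling thread to `out`.
 * Returns the number of frames, or -1 if symbol lookup fails. */
int print_call_stack(FILE *out) {
    assert(out != NULL);
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, depth);
    if (symbols == NULL) {
        return -1;
    }
    for (int i = 0; i < depth; i++) {
        fprintf(out, "#%d %s\n", i, symbols[i]);
    }
    free(symbols);  /* one allocation holds all strings; free it once */
    return depth;
}
```

With the GNU linker, function names generally appear in the output only when the program is linked with `-rdynamic`; otherwise the lines mostly show raw addresses.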
It should be noted that, in this embodiment, each of the embodiments may be implemented separately, or may be implemented in any combination without conflict, without limiting the invention.
The overall flow of the method of the present invention is described in an exemplary embodiment, as shown in fig. 4, which is an exemplary overall flow diagram of a method for handling program exceptions provided in this embodiment, and for each process of executing a program, the method specifically includes:
1. At process start-up, the program's own signal processing functions need to be registered before the thread is created.
2. A thread global table, such as a thread ID global table, is initialized.
3. And initializing and starting other business processes of the process.
4. For each thread, after startup, the thread may be named through pthread_setname_np and its thread ID (of type pthread_t) registered in the thread ID global table; the thread then enters its processing logic. Before the thread exits, its thread information is deleted from the thread ID global table, and then the thread exits.
5. The watchdog thread periodically detects whether a service is abnormal.
6. When a service abnormality is detected, the watchdog thread traverses the thread ID global table and sends a SIGUSR1 signal to each surviving first thread one by one through the system function pthread_kill. After receiving the SIGUSR1 signal, each first thread performs stack backtracking, that is, outputs its thread information and call stack information to the system log file. After the traversal, the watchdog thread waits for a preset time (such as 1 second) and then restarts the process to restore the service. pthread_kill is a function that sends a signal to a specified thread within the same process.
7. When a signal that can cause a program crash exception (SIGSEGV, SIGABRT, etc.) is detected, the current thread that generated the exception outputs its own call stack information to the system log file, that is, the exception problem is located by stack backtracking.
The specific operations of the above steps are described in detail in the foregoing, and are not repeated here.
According to the program exception processing method, the watchdog is combined with full-thread stack backtracking, so that full-thread stack backtracking for service exception problems is realized and the efficiency of locating and analyzing exception problems is effectively improved; the currently surviving threads are maintained in real time through the thread global table, which effectively ensures that full-thread stack backtracking can be carried out.
Still another embodiment of the present invention provides a processing apparatus for processing a program exception, configured to execute the method of the foregoing embodiment.
Fig. 5 is a schematic structural diagram of a processing device for program exception according to the present embodiment. The apparatus 30 includes: a first processing module 31 and a second processing module 32.
The first processing module is used for sending a first signal to a first thread which survives at present if the first service abnormality is detected, so that the first thread stores the thread identification and call stack information after receiving the first signal; and the second processing module is used for restarting the target process to which the first service belongs after waiting for the preset time.
Specifically, the first processing module may be connected to a detection module for detecting whether a service is abnormal. After detecting that the first service is abnormal, the detection module may notify the first processing module, and the first processing module sends a first signal to each currently surviving first thread, so that after receiving the first signal each first thread stores its thread identifier and call stack information. After sending the first signal to the first threads, the first processing module may notify the second processing module, and the second processing module, after waiting for a preset time, restarts the sub-program to which the first service belongs, that is, restarts the target process.
In practical application, the first processing module and the second processing module are respectively a first sub-module and a second sub-module in the watchdog module executed by the watchdog thread, that is, if the watchdog thread detects that the first service is abnormal, the first processing module is used for sending a first signal to the first thread which is alive at present, so that after the first thread receives the first signal, the thread identification and call stack information of the first thread are stored, and the watchdog thread restarts the target process after waiting for a preset time.
The specific manner in which the respective modules perform the operations in the apparatus of the present embodiment has been described in detail in the embodiments related to the method, and the same technical effects can be achieved, which will not be described in detail herein.
In order to make the device of the present invention clearer, a further embodiment of the present invention provides a further supplementary explanation of the device provided in the above embodiment.
As shown in fig. 6, an exemplary configuration diagram of a processing apparatus for program exception according to the present embodiment is provided.
As an implementation manner, in order to be able to detect whether related traffic in the program is abnormal, the apparatus optionally further includes a detection module 33 on the basis of the above embodiment.
The detection module is used for detecting whether the business abnormality exists or not at regular time.
Specifically, the first processing module is a first sub-module in the watchdog module, the second processing module is a second sub-module in the watchdog module, and the detection module is a third sub-module in the watchdog module. The detection module periodically detects whether a service is abnormal; if the first service abnormality is detected, the detection module notifies the first processing module, and the first processing module sends a first signal to each currently surviving first thread, so that after receiving the first signal each first thread stores its thread identifier and call stack information. After sending the first signal to each first thread, the first processing module may notify the second processing module, and the second processing module restarts the target process after waiting for a preset time.
Further, in order to maintain the current surviving thread situation in real time, the first processing module is specifically configured to:
if the first business abnormality is detected, traversing a thread global table, and determining a current surviving first thread, wherein the thread global table at least comprises the thread identification of each current surviving thread; for each first thread that is currently alive, a first signal is sent to the first thread.
Specifically, the current surviving thread condition is maintained in real time in the system through the thread global table, and after the first service abnormality is detected, the first processing module can traverse the thread global table to determine the current surviving thread (i.e. the first thread) and send a first signal to each first thread; the specific first processing module sends a first signal to each first thread through a communication mode between the watchdog thread and other threads.
Further, to ensure that the current surviving thread information can be maintained in real time in the thread global table, the apparatus further includes: a thread information registration module 34 and a thread information deletion module 35.
The thread information registration module is used for registering the thread information of the thread after the thread is started into a thread global table, wherein the thread information at least comprises a thread identifier; and the thread information deleting module is used for deleting the thread information from the thread global table before the thread exits.
Specifically, each thread in the program may include a thread information registration module and a thread information deletion module, where after the thread is started, the thread information registration module registers thread information in a thread global table, and before the thread exits, the thread information deletion module deletes the thread information from the thread global table.
In one embodiment, to ensure proper use of the first signal, the apparatus further comprises:
and the registration module is used for registering the signal processing function corresponding to the first signal when the target process is started, and initializing the thread global table.
Specifically, when the target process is started, the registration module may register a signal processing function corresponding to the first signal with the system, and initialize a thread global table.
In some embodiments, the apparatus may further include a storage module configured to store the thread global table.
As another embodiment, to improve the security of the system, the apparatus further includes:
And the third processing module is used for storing the call stack information of the current thread if the signal causing the program crash exception is detected.
As another implementation manner, after receiving the first signal, the first thread outputs its thread identification and call stack information to the log file; correspondingly, the device also comprises: and the fourth processing module is used for carrying out positioning analysis on the abnormal problems of the target process based on the thread identification and the call stack information in the log file.
Specifically, each thread may include an output module, configured to output, after receiving the first signal, a thread identifier and call stack information of the first thread to a log file, and when positioning analysis is required, the fourth processing module displays the thread identifier and call stack information in the log file to a related person, where the related person performs positioning analysis on an abnormal problem of the target process.
It should be noted that, in this embodiment, each of the embodiments may be implemented separately, or may be implemented in any combination without conflict, without limiting the invention.
The specific manner in which the respective modules perform the operations in the apparatus of the present embodiment has been described in detail in the embodiments related to the method, and the same technical effects can be achieved, which will not be described in detail herein.
Still another embodiment of the present invention provides an electronic device configured to perform the method provided in the foregoing embodiment. The electronic device may be a server or other computer device that may be implemented.
Fig. 7 is a schematic structural diagram of an electronic device according to the present embodiment. The electronic device 50 includes: a memory 51, a transceiver 52, and at least one processor 53.
The processor, the memory and the transceiver are interconnected through a circuit; the memory stores computer-executable instructions; a transceiver for receiving configuration information of a user; at least one processor executes computer-executable instructions stored in a memory, causing the at least one processor to perform the method as provided in any one of the embodiments above.
Specifically, the configuration information may include relevant information required by the user to configure program execution such as a configuration rule of a thread name for a thread, the transceiver sends the configuration information to the processor, the processor stores the configuration information in a preset area, and the processor reads and executes computer execution instructions stored in the memory to implement the method provided in any embodiment above.
The electronic device can be applied to the exception handling scenario of any program. In addition to locating problems that cause program crash exceptions, it can locate problems that do not cause a program crash, such as deadlock and abnormal suspension, which makes it convenient for relevant personnel to analyze the root cause behind hidden problems. It realizes full-thread stack backtracking, effectively improves the efficiency of locating and analyzing problems, and solves the problem that the prior art cannot locate service exceptions that generate no exception signal.
It should be noted that, the electronic device of this embodiment can implement the method provided in any of the foregoing embodiments, and can achieve the same technical effects, which is not described herein again.
Yet another embodiment of the present invention provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement a method as provided in any of the above embodiments.
It should be noted that, the computer readable storage medium of the present embodiment can implement the method provided in any of the above embodiments, and can achieve the same technical effects, which is not described herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.