
CN113220535B - Program exception processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113220535B
Authority
CN
China
Prior art keywords
thread
signal
exception
information
program
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110596062.3A
Other languages
Chinese (zh)
Other versions
CN113220535A (en)
Inventor
陈志德
刘茂毅
王孟宽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mach Valley Technology Co ltd
Original Assignee
Beijing Mach Valley Technology Co ltd
Application filed by Beijing Mach Valley Technology Co ltd
Priority to CN202110596062.3A
Publication of CN113220535A
Application granted
Publication of CN113220535B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention provides a method, apparatus, device, and storage medium for processing program exceptions. The method includes: if a first service exception is detected, sending a first signal to each currently surviving first thread, so that the first thread stores its thread identifier and call stack information after receiving the first signal; and restarting the target process to which the first service belongs after waiting for a preset time. When a service exception occurs in a program, all surviving threads in the program are notified to record their thread identifiers and call stack information. This enables stack backtracking of all threads, makes it easier to locate the root cause of the service exception, improves the efficiency of locating and analyzing exceptions, and solves the problem that the prior art cannot locate service exceptions.

Description

Program exception processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computers and communications technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing program exceptions.
Background
With the development of multi-core CPU technology, a well-designed multithreaded program can make full use of the computing resources of a multi-core CPU and raise CPU utilization, which improves the program's overall processing efficiency and concurrency. Multithreading has therefore become the norm for server programs.
In actual program development, program exceptions caused by unanticipated scenarios, such as deadlocks, infinite loops, abnormal memory access, and slow IO, are unavoidable. To locate and resolve them, the prior art generally uses stack backtracking: when the program starts, it registers handlers that take over key system signals, and when an exception signal is triggered, a stack backtrace function writes the call stack information of the current thread to a log file, so that the relevant personnel can later analyze the logged stack information to locate the cause of the crash.
However, the stack backtracking used in the prior art can only locate problems that crash the program. It cannot locate service exceptions that commonly occur in programs without generating an exception signal, such as infinite loops and blocked or hung threads.
Disclosure of Invention
The embodiments of the invention provide a method, an apparatus, a device, and a storage medium for processing program exceptions, to solve the problem that the prior art cannot locate service exceptions that do not generate an exception signal.
In a first aspect, an embodiment of the present invention provides a method for processing a program exception, including:
if a first service exception is detected, sending a first signal to each currently surviving first thread, so that the first thread stores its thread identifier and call stack information after receiving the first signal;
and restarting the target process to which the first service belongs after waiting for a preset time.
In a second aspect, an embodiment of the present invention provides a processing apparatus for program exception, including:
a first processing module, configured to send a first signal to each currently surviving first thread if a first service exception is detected, so that the first thread stores its thread identifier and call stack information after receiving the first signal;
and a second processing module, configured to restart the target process to which the first service belongs after waiting for a preset time.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a transceiver, and at least one processor;
the processor, the memory and the transceiver are interconnected by a circuit;
the memory stores computer-executable instructions; the transceiver is used for receiving configuration information of a user;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method described in the first aspect and its various possible designs.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method as described above in the first aspect and the various possible designs of the first aspect.
According to the program exception processing method, apparatus, device, and storage medium provided by the embodiments of the invention, when a service exception occurs in a program, all surviving threads in the program are notified to record their thread identifiers and call stack information. This enables stack backtracking of all threads, makes it easier to locate the root cause of the service exception, improves the efficiency of locating and analyzing exceptions, and solves the prior-art problem that service exceptions cannot be located.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart illustrating a method for handling program exceptions according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a thread information registration process according to an embodiment of the present invention;
FIG. 3 is a main thread initialization flowchart of a process according to one embodiment of the present invention;
FIG. 4 is a flowchart illustrating an exemplary overall procedure of a method for handling program exceptions according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a device for handling program exceptions according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating an exemplary configuration of a device for handling program exceptions according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Specific embodiments of the present invention have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, the terms involved in the present invention will be explained:
GNU C Library: the GNU C Library, also known as glibc, is the GNU implementation of the C runtime library, released under the LGPL license and freely downloadable from the network. A C runtime library is a C function library, that is, a set of APIs used while a program runs; it is typically precompiled and exists in binary form on Linux-like systems. Its most important use is together with the Linux kernel, where it forms an important component of the GNU/Linux operating system. glibc is the most widely used C function library on the Linux platform; it includes an implementation of the C standard library as well as the system functions. Almost all C programs call glibc library functions, so glibc is the foundation on which C programs run on Linux. glibc provides a set of header files and a set of library files containing the most basic and most commonly used C standard library functions and system functions: almost all C programs depend on libc.so, C programs that perform mathematical computation depend on libm.so, and multithreaded C programs depend on libpthread.so.
Deadlock: when multiple threads access shared resources, operations such as mutexes, read-write locks, and spin locks are added to protect those resources. Improper use of locks, however, easily leads to deadlock, which blocks the program abnormally so that it cannot continue to work.
Infinite loop (dead loop): linked-list operations, loop operations, recursive operations, and the like all carry the risk of an infinite loop. The root cause may be a basic logic error in the code, an out-of-bounds memory access, corrupted memory data, improper protection of shared resources, and so on.
Abnormal memory access: illegal operations such as out-of-bounds memory access, double free, and wild pointers can be fatal to a program and can cause problems such as service logic errors and abnormal stacks.
Slow IO: IO operations include disk IO, network IO, and the like. Synchronous IO blocks the caller, and when IO latency grows, a poorly designed program can suffer service logic exceptions.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. In the following description of the embodiments, the meaning of "a plurality" is two and more, unless explicitly defined otherwise.
The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
An embodiment of the invention provides a method for processing program exceptions, used when a program behaves abnormally. The method is executed by a program exception processing apparatus, which may be provided in an electronic device such as a server or another suitable computer device.
FIG. 1 is a flowchart of the program exception processing method provided in this embodiment. The method includes:
Step 101: if a first service exception is detected, send a first signal to each currently surviving first thread, so that the first thread stores its thread identifier and call stack information after receiving the first signal.
Specifically, a watchdog thread or another mechanism can detect whether a service flow in the program is abnormal. The watchdog may be an independent thread or may be implemented in the main thread. In practice, the key services to be monitored can be defined as needed; because different programs implement different functions, their key services differ and can be set according to actual requirements. Examples include heartbeat services between key components (monitoring whether heartbeats are normal), timer-driven services (monitoring whether they execute normally), and network IO blocking time (monitoring whether blocking lasts too long).
Each process may include one or more threads. In practice, a complete program and a process do not correspond one to one: a program may be executed by multiple processes, and a process may execute multiple programs. In the present invention, program exception handling targets a process, and each process can be regarded as corresponding to a subprogram of the program. If a program is executed entirely by one process, the subprogram corresponding to that process is the program itself. The method of the present invention can be applied to each process.
The first service may be any service executed by the target process (which may be a process executing any subprogram of the program), or any key service set according to actual requirements; services are divided according to the program's logical functions. The target process includes one or more threads, specifically a main thread and possibly other threads. Each process can start an independent watchdog thread, or have its main thread perform the watchdog function, to monitor service state.
If the first service exception is detected, a signal (referred to as the first signal) can be sent through a system command to all currently surviving threads in the target process (referred to as first threads), notifying each first thread to record its own thread identifier and call stack information. For example, the first signal may be a user-defined signal: at program start, the program entry calls a system signal function to register a signal handling function for the first signal with the system, so that the system (such as a Linux system) performs the user-defined action instead of the default action.
The thread identifier of a first thread may be a thread ID or a thread name. The call stack information of a thread mainly includes one or more of the file name, function offset address, file offset address, thread name, thread ID, and other information, and can be set according to actual requirements. The first service exception may be, for example, an infinite loop or an abnormal hang.
Optionally, the currently surviving threads may be tracked in a thread global table or in another suitable way. For example, while the program runs, each thread registers its thread information in the thread global table when it starts and deletes that information from the table before it exits, so that the table always holds the thread information of the currently surviving threads.
Because the call stack information of all threads is recorded when a service exception occurs, the relevant personnel can later perform stack backtracking from that information, which makes it easier to locate the root cause of the service exception and solves the prior-art problem that service exceptions cannot be located. In addition, prior-art stack backtracking records only the call stack of the thread that generated the exception signal, whereas the method of the invention backtraces the stacks of all threads, which effectively improves the efficiency of locating and analyzing exceptions.
Step 102: restart the target process to which the first service belongs after waiting for a preset time.
Specifically, because the service is abnormal, the target process cannot run normally and needs to be recovered. To ensure that every thread can record its thread identifier and call stack information, the watchdog thread waits for a certain time before restarting the target process to restore the abnormal service flow. The preset time can be set according to actual requirements, for example to 1 second or 2 seconds depending on system performance.
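A minimal sketch of the wait-then-restart step, assuming a POSIX system. The names `wait_then_restart`, `restart_fn`, and `record_restart` are hypothetical and do not come from the patent, which does not say how the restart itself is carried out. The sketch only illustrates one detail that matters here: a sleeping watchdog can be woken early by a signal, so the wait is resumed until the whole grace period has elapsed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical restart hook; a real one might re-exec the binary or exit
 * so that a supervisor relaunches the process. */
typedef void (*restart_fn)(void);

/* Sleeps for the full grace period, resuming after signal interruptions,
 * then invokes the restart hook. */
static void wait_then_restart(long grace_ms, restart_fn restart) {
    assert(grace_ms >= 0 && restart != NULL);
    struct timespec left = {
        .tv_sec  = grace_ms / 1000,
        .tv_nsec = (grace_ms % 1000) * 1000000L,
    };
    /* When interrupted, nanosleep writes the unslept time back into `left`. */
    while (nanosleep(&left, &left) == -1 && errno == EINTR) {
        /* retry with the remaining time */
    }
    restart();
}

/* Stand-in hook for illustration: it only records that it ran. */
static int g_restart_count = 0;
static void record_restart(void) { g_restart_count++; }
```

Calling `wait_then_restart(1000, record_restart)` would wait about one second and then run the hook once; a production hook would replace `record_restart`.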
With the program exception processing method of this embodiment, when a service exception occurs in the program, all surviving threads in the process executing the service are notified to record their thread identifiers and call stack information. This enables stack backtracking of all threads, makes it easier to locate the root cause of the service exception, improves the efficiency of locating and analyzing exceptions, and solves the prior-art problem that service exceptions cannot be located.
In order to make the technical scheme of the invention clearer, another embodiment of the invention further supplements the method provided by the embodiment.
As one implementation, to detect whether a service in the program is abnormal, on the basis of the foregoing embodiment the method may optionally further include: the watchdog thread periodically checks whether a service exception exists.
Specifically, a software watchdog periodically checks whether services in the program are abnormal. For example, watchdog monitoring can be set up for certain key services: for each process executing the program, an independent watchdog thread is started in the process, or the main thread of the process performs the watchdog function. A feed interval can be set for each service that the process executes. When a service becomes abnormal, it cannot feed the watchdog on time, the watchdog timer overflows and generates a reset signal, and the watchdog determines that the service is abnormal. The watchdog can then send the first signal to the currently surviving first threads, so that each first thread promptly writes its thread identifier and call stack information to a log file for later analysis. After sending the first signal to each first thread, the watchdog thread waits for the preset time, which allows the first threads to finish recording their thread identifiers and call stack information, and then restarts the target process to restore the abnormal service.
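The feed-and-timeout check described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `wd_service`, `wd_feed`, `wd_expired`, and `wd_scan` are hypothetical names, and timestamps are passed in as plain seconds so the logic can be exercised directly; a real watchdog would more likely read a monotonic clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-service watchdog record. */
typedef struct {
    const char *name;        /* key service being monitored */
    time_t      last_feed;   /* last time the service fed the watchdog */
    time_t      timeout_sec; /* longest allowed gap between feeds */
} wd_service;

/* Called by a service on each healthy iteration. */
static void wd_feed(wd_service *s, time_t now) {
    assert(s != NULL);
    s->last_feed = now;
}

/* True when the service has gone longer than its timeout without feeding. */
static bool wd_expired(const wd_service *s, time_t now) {
    assert(s != NULL);
    return (now - s->last_feed) > s->timeout_sec;
}

/* Scans all services; returns the index of the first expired one, or -1. */
static int wd_scan(const wd_service *services, size_t count, time_t now) {
    for (size_t i = 0; i < count; i++) {
        if (wd_expired(&services[i], now)) {
            return (int)i;
        }
    }
    return -1;
}
```

A watchdog thread would call `wd_scan` on a timer; a non-negative result is the point at which it would signal the surviving threads and schedule the restart.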
The watchdog thread can be a brand new independent thread started in the target process, can be directly realized in the main thread of the target process, and can be specifically set according to actual requirements.
Further, to track the currently surviving threads in real time, on the basis of the foregoing embodiment, sending a first signal to the currently surviving first threads if the first service exception is detected optionally includes:
if the first service exception is detected, traversing a thread global table to determine the currently surviving first threads, where the thread global table includes at least the thread identifier of each currently surviving thread; and sending the first signal to each currently surviving first thread.
Specifically, while the program runs, the thread global table tracks the currently surviving threads in real time. After the first service exception is detected, the watchdog traverses the table to determine the currently surviving first threads and sends the first signal to each of them, so that each first thread writes its thread identifier and call stack information to the log file. The thread global table can be kept in any workable format according to actual requirements and can reside in the memory of the process. It includes at least the thread identifiers of all currently surviving threads, so that the watchdog thread can determine which threads need the first signal. The thread identifier may be a thread ID or a unique thread name; to guarantee uniqueness, a unique name may be configured for each thread, as actual requirements dictate.
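The traversal step can be sketched with `pthread_kill`, which sends a signal to one specific thread of the calling process. The helper name `signal_all_threads` is hypothetical, and the table is reduced to a plain array of thread IDs for brevity; how the real table is stored is left open by the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Sends `signo` to every thread ID in `tids`.
 * Returns the number of threads for which pthread_kill reported success. */
static size_t signal_all_threads(const pthread_t *tids, size_t count, int signo) {
    assert(tids != NULL || count == 0);
    size_t delivered = 0;
    for (size_t i = 0; i < count; i++) {
        if (pthread_kill(tids[i], signo) == 0) {
            delivered++;
        }
    }
    return delivered;
}
```

Passing signal number 0 performs only the error check without sending anything, which is handy for a dry run. Using the ID of a thread that has already exited and been joined is undefined behavior, which is one reason the table has to be kept accurate.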
Further, to ensure that the current surviving thread information can be maintained in real time in the thread global table, the method further includes:
after each thread starts, registering its thread information in the thread global table, where the thread information includes at least a thread identifier; and before each thread exits, deleting its thread information from the thread global table.
Specifically, after a thread starts, it first adds its thread information to the thread global table, so that the table records the thread as alive. Before the thread exits, it deletes its thread information from the table and only then exits. The table therefore holds information only for currently surviving threads, which keeps dead threads out of the table and spares the watchdog thread unnecessary work.
For example, FIG. 2 shows a thread information registration flow for this embodiment, where pthread_setname_np is used to name a thread and pthread_t is the type of the thread ID.
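The register-on-start, delete-before-exit pattern might look like the sketch below, assuming Linux with glibc. The fixed-size array, the mutex, and the names `thread_table_register`, `thread_table_unregister`, `thread_table_count`, and `example_worker` are all assumptions made for illustration; the patent leaves the table format open.

```c
#define _GNU_SOURCE  /* for pthread_setname_np on glibc */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64  /* assumed capacity for this sketch */

typedef struct {
    pthread_t tid;
    bool      in_use;
} thread_slot;

static thread_slot     g_table[MAX_THREADS];
static pthread_mutex_t g_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Names the calling thread and adds it to the table; false if the table is full. */
static bool thread_table_register(const char *name) {
    assert(name != NULL);
    pthread_setname_np(pthread_self(), name);  /* Linux rejects names over 15 chars */
    bool added = false;
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS && !added; i++) {
        if (!g_table[i].in_use) {
            g_table[i].tid = pthread_self();
            g_table[i].in_use = true;
            added = true;
        }
    }
    pthread_mutex_unlock(&g_table_lock);
    return added;
}

/* Removes the calling thread from the table; call it just before the thread exits. */
static void thread_table_unregister(void) {
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (g_table[i].in_use && pthread_equal(g_table[i].tid, pthread_self())) {
            g_table[i].in_use = false;
            break;
        }
    }
    pthread_mutex_unlock(&g_table_lock);
}

/* Returns the number of threads currently registered. */
static size_t thread_table_count(void) {
    size_t n = 0;
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (g_table[i].in_use) n++;
    }
    pthread_mutex_unlock(&g_table_lock);
    return n;
}

/* Example thread body: register, do the thread's work, unregister, exit. */
static void *example_worker(void *arg) {
    (void)arg;
    thread_table_register("worker");
    /* ... the thread's processing logic would run here ... */
    thread_table_unregister();
    return NULL;
}
```

A mutex is used because threads start and exit concurrently with the watchdog's traversal; a watchdog reading the table would take the same lock.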
In one embodiment, to ensure that the first signal can be used normally, the method further includes: when the target process starts, registering a signal handling function for the first signal and initializing the thread global table.
Specifically, the first signal is a user-defined signal. For it to perform its user-defined function, the program must register its own signal handling function for the first signal with the system through a system function. When the target process starts, its main thread registers the handling function at the program entry of the target process. For example, if the first signal is SIGUSR1, a signal handling function is registered for SIGUSR1; registration tells the system kernel which function to call when the signal arrives. In addition, the thread global table must be initialized so that each thread can register its thread information in the table after it starts.
Illustratively, FIG. 3 shows the main thread initialization flow of the process provided in this embodiment.
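Registering the handler for SIGUSR1 at process start could be sketched with `sigaction` as below. The handler here is a placeholder that only counts requests; `on_dump_signal`, `install_dump_handler`, and `g_dump_requests` are hypothetical names, and a real handler would record the thread identifier and call stack instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Number of dump requests seen; sig_atomic_t is safe to update in a handler. */
static volatile sig_atomic_t g_dump_requests = 0;

/* Placeholder handler: a real one would record the thread ID and call stack. */
static void on_dump_signal(int signo) {
    (void)signo;
    g_dump_requests++;
}

/* Installs the handler for SIGUSR1; returns 0 on success, -1 on failure. */
static int install_dump_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_dump_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted system calls where possible */
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Signal dispositions are shared by every thread in a process, so installing the handler once in the main thread before other threads are created covers all of them.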
As another embodiment, the method further includes: if a signal that causes a program crash is detected, storing the call stack information of the current thread.
Specifically, for signals that can cause a program crash, such as SIGSEGV and SIGABRT, the embodiment of the invention can also locate the problem by stack backtracking. If the system detects that the program executed by a process has crashed, it interrupts the process and sends the corresponding exception signal (such as SIGSEGV or SIGABRT) to it. The thread in which the exception occurred then writes its own call stack information to a log file for later analysis. The underlying principles are not repeated here.
To enable stack backtracking for signals that can cause a program crash, signal handling functions for these signals must also be registered with the system. The way such crash-causing exceptions are detected is prior art and is not described here.
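One possible shape for the crash-signal path, assuming glibc, is sketched below. The names `on_crash_signal` and `install_crash_handlers` are hypothetical, and standard error stands in for the log file. `backtrace_symbols_fd` is used because it writes straight to a file descriptor without allocating memory, although `backtrace` itself is not formally async-signal-safe, so this is a best-effort dump rather than a guaranteed one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64  /* assumed depth limit for this sketch */

/* Writes the crashing thread's call stack to stderr, then re-raises the signal. */
static void on_crash_signal(int signo) {
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    /* SA_RESETHAND restored the default action on entry, so the re-raised
     * signal terminates the process once this handler returns. */
    raise(signo);
}

/* Installs the handler for SIGSEGV and SIGABRT; returns 0 on success, -1 on failure. */
static int install_crash_handlers(void) {
    static const int crash_signals[] = { SIGSEGV, SIGABRT };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_crash_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  /* one-shot: default action is restored on entry */
    for (size_t i = 0; i < sizeof crash_signals / sizeof crash_signals[0]; i++) {
        if (sigaction(crash_signals[i], &sa, NULL) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Resetting to the default action keeps the normal crash behavior (exit status, optional core dump) after the stack has been written.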
The method of the invention can locate program crash problems and, by combining a watchdog thread with stack backtracking of all threads, can also locate service exceptions such as infinite loops and abnormal hangs, whereas existing watchdog schemes can only detect a service exception and restart, without locating the problem. In addition, recording the stack information of all threads does not introduce excessive disk usage or long service interruption, so the method also avoids the excessive disk usage and long service interruption of the traditional core dump scheme.
As another embodiment, to determine the exception, the method further includes: after receiving the first signal, the first thread writes its thread identifier and call stack information to a log file; and the exception in the target process is located and analyzed based on the thread identifier and call stack information in the log file.
Specifically, the log file may be a system log file. When an exception occurs, the thread identifier and call stack information are written to the system log file, and the exception in the target process can then be located and analyzed from them.
To obtain call stack information, a thread can call the backtrace function, which stores the call stack in an array of pointers. The related functions backtrace_symbols and backtrace_symbols_fd convert this information into printable strings (containing the function name, the offset into the function, and the actual return address), which completes the stack backtrace.
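The `backtrace` and `backtrace_symbols` calls named above can be combined as in this sketch, assuming glibc's `<execinfo.h>`. The function name `dump_call_stack` and the choice of a `FILE *` destination are illustrative assumptions; because `backtrace_symbols` allocates memory, this form suits ordinary code paths rather than signal handlers.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64  /* assumed depth limit for this sketch */

/* Writes the caller's call stack to `out`, one frame per line.
 * Returns the number of frames written, or -1 if symbol lookup fails. */
static int dump_call_stack(FILE *out) {
    assert(out != NULL);
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n);  /* one malloc'd block */
    if (symbols == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        fprintf(out, "#%d %s\n", i, symbols[i]);
    }
    free(symbols);  /* free the array only, not the individual strings */
    return n;
}
```

Function names appear in the output only when the symbols are available to the dynamic linker; with GNU ld this usually means linking with `-rdynamic`, otherwise only addresses are printed.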
It should be noted that, in this embodiment, each of the embodiments may be implemented separately, or may be implemented in any combination without conflict, without limiting the invention.
The overall flow of the method is described below through an exemplary embodiment. FIG. 4 is an exemplary overall flowchart of the program exception processing method provided in this embodiment. For each process executing the program, the method includes:
1. At process start-up, register the program's own signal handling functions before any thread is created.
2. Initialize a thread global table, such as a thread ID global table.
3. Initialize and start the other service flows of the process.
4. For each thread, after start-up the thread may be named with pthread_setname_np and its thread ID (of type pthread_t) registered in the thread ID global table. The thread then enters its processing logic. Before exiting, it deletes its thread information from the thread ID global table, and then exits.
5. The watchdog thread periodically checks whether a service is abnormal.
6. When a service exception is detected, the watchdog thread traverses the thread ID global table and uses pthread_kill to send SIGUSR1 to each surviving first thread in turn. On receiving SIGUSR1, a first thread performs stack backtracking, that is, it writes its thread information and call stack information to the system log file. After the traversal, the watchdog thread waits for a preset time (such as 1 second) and then restarts the process to restore the service. pthread_kill is a system function that sends a signal to a specified thread in the same process.
7. When a signal that can cause a program crash (SIGSEGV, SIGABRT, etc.) is detected, the thread that generated the exception writes its own call stack information to the system log file, that is, the problem is located by stack backtracking.
The specific operations of the above steps are described in detail in the foregoing, and are not repeated here.
According to the program exception processing method, the watchdog is combined with full-thread stack backtracking, so that full-thread stack backtracking is achieved for business exceptions and the efficiency of locating and analyzing exceptions is effectively improved; the currently surviving threads are maintained in real time through the thread global table, which ensures that full-thread stack backtracking can be carried out.
Still another embodiment of the present invention provides a processing apparatus for processing a program exception, configured to execute the method of the foregoing embodiment.
Fig. 5 is a schematic structural diagram of a processing device for program exception according to the present embodiment. The apparatus 30 includes: a first processing module 31 and a second processing module 32.
The first processing module is used for sending a first signal to a first thread which survives at present if the first service abnormality is detected, so that the first thread stores the thread identification and call stack information after receiving the first signal; and the second processing module is used for restarting the target process to which the first service belongs after waiting for the preset time.
Specifically, the first processing module may be connected to a detection module that detects whether the service is abnormal. After detecting that the first service is abnormal, the detection module notifies the first processing module, which sends a first signal to each currently surviving first thread, so that the first thread stores its thread identifier and call stack information after receiving the first signal. After sending the first signal, the first processing module may notify the second processing module, which waits for a preset time and then restarts the process to which the first service belongs, that is, restarts the target process.
In practical application, the first processing module and the second processing module are respectively a first sub-module and a second sub-module in the watchdog module executed by the watchdog thread, that is, if the watchdog thread detects that the first service is abnormal, the first processing module is used for sending a first signal to the first thread which is alive at present, so that after the first thread receives the first signal, the thread identification and call stack information of the first thread are stored, and the watchdog thread restarts the target process after waiting for a preset time.
The specific manner in which the respective modules perform the operations in the apparatus of the present embodiment has been described in detail in the embodiments related to the method, and the same technical effects can be achieved, which will not be described in detail herein.
In order to make the device of the present invention clearer, a further embodiment of the present invention provides a further supplementary explanation of the device provided in the above embodiment.
As shown in fig. 6, an exemplary configuration diagram of a processing apparatus for program exception according to the present embodiment is provided.
As an implementation manner, in order to detect whether a service in the program is abnormal, the apparatus optionally further includes a detection module 33 on the basis of the above embodiment.
The detection module is used for periodically detecting whether a business abnormality exists.
Specifically, the first processing module is a first sub-module in the watchdog module, the second processing module is a second sub-module in the watchdog module, and the detection module is a third sub-module in the watchdog module. The detection module periodically detects whether the business is abnormal and, if a first business abnormality is detected, notifies the first processing module. The first processing module sends a first signal to each currently surviving first thread, and each first thread stores its thread identifier and call stack information after receiving the first signal. After sending the first signal to each first thread, the first processing module may notify the second processing module, which restarts the target process after waiting for a preset time.
Further, in order to maintain the current surviving thread situation in real time, the first processing module is specifically configured to:
if the first business abnormality is detected, traversing a thread global table, and determining a current surviving first thread, wherein the thread global table at least comprises the thread identification of each current surviving thread; for each first thread that is currently alive, a first signal is sent to the first thread.
Specifically, the currently surviving threads are maintained in real time in the system through the thread global table. After the first service abnormality is detected, the first processing module can traverse the thread global table to determine the currently surviving threads (i.e. the first threads) and send a first signal to each of them; specifically, the first processing module sends the first signal to each first thread through the communication mechanism between the watchdog thread and the other threads.
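The traversal-and-signal step can be sketched as follows, assuming the surviving thread IDs have already been copied out of the thread global table into an array. The function name `broadcast_dump_signal` is hypothetical; `pthread_kill` and `SIGUSR1` are the mechanism and signal named in the method embodiment.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Send SIGUSR1 to every thread ID in tids[0..n-1].
 * Returns how many signals were queued successfully.
 * Assumption: every ID refers to a thread that has not been joined yet,
 * which the register/unregister discipline of the table is meant to ensure. */
size_t broadcast_dump_signal(const pthread_t *tids, size_t n)
{
    size_t sent = 0;
    assert(tids != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* pthread_kill returns 0 on success and an error number otherwise. */
        if (pthread_kill(tids[i], SIGUSR1) == 0)
            sent++;
    }
    return sent;
}
```

A handler for SIGUSR1 must be installed before this is called, because the default action for SIGUSR1 terminates the process. In the flow described here, the watchdog would then wait for the preset time before restarting the target process.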
Further, to ensure that the current surviving thread information can be maintained in real time in the thread global table, the apparatus further includes: a thread information registration module 34 and a thread information deletion module 35.
The thread information registration module is used for registering the thread information of the thread after the thread is started into a thread global table, wherein the thread information at least comprises a thread identifier; and the thread information deleting module is used for deleting the thread information from the thread global table before the thread exits.
Specifically, each thread in the program may include a thread information registration module and a thread information deletion module, where after the thread is started, the thread information registration module registers thread information in a thread global table, and before the thread exits, the thread information deletion module deletes the thread information from the thread global table.
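One possible shape for the thread global table is a fixed-size array guarded by a mutex, sketched below. All names (`thread_table_init`, `thread_table_register`, `thread_table_unregister`, `thread_table_snapshot`) and the capacity `MAX_THREADS` are hypothetical; the patent only requires that the table hold at least the thread identifier of each surviving thread. Naming the thread through pthread_setname_np is left out because it is a non-portable GNU extension.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 128  /* illustrative capacity, not from the patent */

struct thread_entry {
    pthread_t tid;
    int in_use;
};

static struct thread_entry g_table[MAX_THREADS];
static pthread_mutex_t g_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Clear the table at process start-up. */
void thread_table_init(void)
{
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++)
        g_table[i].in_use = 0;
    pthread_mutex_unlock(&g_table_lock);
}

/* Called by a thread right after it starts. Returns 0, or -1 if full. */
int thread_table_register(void)
{
    int rc = -1;
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (!g_table[i].in_use) {
            g_table[i].tid = pthread_self();
            g_table[i].in_use = 1;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&g_table_lock);
    return rc;
}

/* Called by a thread just before it exits. Returns 0, or -1 if not found. */
int thread_table_unregister(void)
{
    int rc = -1;
    pthread_t self = pthread_self();
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (g_table[i].in_use && pthread_equal(g_table[i].tid, self)) {
            g_table[i].in_use = 0;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&g_table_lock);
    return rc;
}

/* Copy the IDs of all registered threads into out[]; returns how many. */
size_t thread_table_snapshot(pthread_t *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL || cap == 0);
    pthread_mutex_lock(&g_table_lock);
    for (size_t i = 0; i < MAX_THREADS && n < cap; i++) {
        if (g_table[i].in_use)
            out[n++] = g_table[i].tid;
    }
    pthread_mutex_unlock(&g_table_lock);
    return n;
}
```

Taking a snapshot under the lock lets the watchdog iterate over a stable copy, although a thread may still exit between the snapshot and the moment it is signaled.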
In one embodiment, to ensure proper use of the first signal, the apparatus further comprises:
and the registration module is used for registering the signal processing function corresponding to the first signal when the target process is started, and initializing the thread global table.
Specifically, when the target process is started, the registration module may register a signal processing function corresponding to the first signal with the system, and initialize a thread global table.
In some embodiments, the apparatus may further include a storage module configured to store the thread global table. As another implementation, to improve the security of the system, the apparatus further includes:
And the third processing module is used for storing the call stack information of the current thread if the signal causing the program crash exception is detected.
As another implementation manner, after receiving the first signal, the first thread outputs its thread identification and call stack information to the log file; correspondingly, the device also comprises: and the fourth processing module is used for carrying out positioning analysis on the abnormal problems of the target process based on the thread identification and the call stack information in the log file.
Specifically, each thread may include an output module, configured to output, after receiving the first signal, a thread identifier and call stack information of the first thread to a log file, and when positioning analysis is required, the fourth processing module displays the thread identifier and call stack information in the log file to a related person, where the related person performs positioning analysis on an abnormal problem of the target process.
It should be noted that the implementations in this embodiment may be carried out separately or in any non-conflicting combination; the invention is not limited in this respect.
The specific manner in which the respective modules perform the operations in the apparatus of the present embodiment has been described in detail in the embodiments related to the method, and the same technical effects can be achieved, which will not be described in detail herein.
Still another embodiment of the present invention provides an electronic device configured to perform the method provided in the foregoing embodiment. The electronic device may be a server or another suitable computer device.
Fig. 7 is a schematic structural diagram of an electronic device according to the present embodiment. The electronic device 50 includes: a memory 51, a transceiver 52, and at least one processor 53.
The processor, the memory and the transceiver are interconnected through a circuit; the memory stores computer-executable instructions; a transceiver for receiving configuration information of a user; at least one processor executes computer-executable instructions stored in a memory, causing the at least one processor to perform the method as provided in any one of the embodiments above.
Specifically, the configuration information may include information that the user needs to configure for program execution, such as the rule for assigning thread names to threads. The transceiver sends the configuration information to the processor, the processor stores it in a preset area, and the processor reads and executes the computer-executable instructions stored in the memory to implement the method provided in any embodiment above.
The electronic device can be applied to the exception handling scenario of any program. In addition to locating problems that cause program crash exceptions, it can locate business exceptions that do not crash the program, such as deadlocks and abnormal hangs, which makes it easier for the relevant personnel to analyze the root cause behind such hidden problems. It achieves full-thread stack backtracking, effectively improves the efficiency of locating and analyzing problems, and addresses the inability of the prior art to locate business exceptions that generate no exception signal.
It should be noted that, the electronic device of this embodiment can implement the method provided in any of the foregoing embodiments, and can achieve the same technical effects, which is not described herein again.
Yet another embodiment of the present invention provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement a method as provided in any of the above embodiments.
It should be noted that, the computer readable storage medium of the present embodiment can implement the method provided in any of the above embodiments, and can achieve the same technical effects, which is not described herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (8)

1. A method for processing program exceptions, characterized in that it comprises:
a watchdog thread periodically detecting whether a business exception exists;
if a first business exception is detected, traversing a thread global table to determine the currently surviving first threads, wherein the thread global table includes at least the thread identifier of each currently surviving thread; and, for each currently surviving first thread, sending a user-defined first signal to the first thread, so that the first thread stores its thread identifier and call stack information after receiving the first signal, wherein the first business exception is a business exception that does not cause a program crash and does not generate an exception signal;
the watchdog thread restarting the target process to which the first service belongs after waiting for a preset time.

2. The method according to claim 1, characterized in that the method further comprises:
after each thread is started, registering its thread information in the thread global table, wherein the thread information at least includes a thread identifier;
before each thread exits, deleting its thread information from the thread global table.

3. The method according to claim 1, characterized in that the method further comprises:
when the target process is started, registering a signal processing function corresponding to the first signal and initializing the thread global table.

4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
if a signal causing a program crash exception is detected, storing the call stack information of the current thread.

5. A device for processing program exceptions, characterized in that it comprises a watchdog module used to execute a watchdog thread, the watchdog module specifically including:
a detection module, used by the watchdog thread to periodically detect whether a business exception exists;
a first processing module, configured to, if the watchdog thread detects a first business exception, traverse a thread global table to determine the currently surviving first threads, wherein the thread global table includes at least the thread identifier of each currently surviving thread; and, for each currently surviving first thread, send a first signal to the first thread, so that the first thread stores its thread identifier and call stack information after receiving the first signal, wherein the first business exception is a business exception that does not cause a program crash and does not generate an exception signal;
a second processing module, used by the watchdog thread to restart the target process to which the first service belongs after waiting for a preset time.

6. The device according to claim 5, characterized in that the device further comprises:
a thread information registration module, used to register the thread information of a thread in the thread global table after the thread is started, wherein the thread information at least includes a thread identifier;
a thread information deletion module, used to delete the thread information from the thread global table before the thread exits.

7. An electronic device, characterized in that it comprises: a memory, a transceiver and at least one processor;
the processor, the memory and the transceiver are interconnected through a circuit;
the memory stores computer-executable instructions; the transceiver is used to receive configuration information of a user;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method according to any one of claims 1 to 4.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the method according to any one of claims 1 to 4 is implemented.
CN202110596062.3A 2021-05-31 2021-05-31 Program exception processing method, device, equipment and storage medium Active CN113220535B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110596062.3A CN113220535B (en) 2021-05-31 2021-05-31 Program exception processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113220535A CN113220535A (en) 2021-08-06
CN113220535B true CN113220535B (en) 2024-11-22

Family

ID=77099271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110596062.3A Active CN113220535B (en) 2021-05-31 2021-05-31 Program exception processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113220535B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237825A (en) * 2021-12-07 2022-03-25 航天科技控股集团股份有限公司 The Method of Recording Program Abnormal Data by Full LCD Meter
CN114003470B (en) * 2021-12-30 2022-04-08 北京中科网威信息技术有限公司 User mode process exception handling method, device, equipment and medium
CN114817006B (en) * 2022-04-08 2024-08-20 抖音视界有限公司 Stack information writing method, device, equipment and medium
CN118519836A (en) * 2023-02-20 2024-08-20 华为技术有限公司 Method, device, equipment and storage medium for detecting equipment performance
CN116450367B (en) * 2023-06-19 2023-09-12 建信金融科技有限责任公司 Data processing method, device, equipment and storage medium
CN118113509B (en) * 2024-04-26 2024-08-06 阿里云计算有限公司 System fault detection method
CN119621395A (en) * 2024-11-25 2025-03-14 深圳市芯睿视科技有限公司 Program abnormal crash positioning method, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920195A (en) * 2018-06-19 2018-11-30 Oppo(重庆)智能科技有限公司 starting processing method and related product
CN109002694A (en) * 2018-06-08 2018-12-14 广东小天才科技有限公司 Method and device for positioning problem point after application code confusion

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582486B (en) * 2018-11-20 2023-04-28 厦门科灿信息技术有限公司 Watchdog monitoring method, system and device and storage medium
CN109614290A (en) * 2018-12-10 2019-04-12 苏州思必驰信息科技有限公司 Process exception information recording method and system in container

Also Published As

Publication number Publication date
CN113220535A (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant