CN119254607A

CN119254607A - Fault handling system and method, storage medium, and electronic device

Info

Publication number: CN119254607A
Application number: CN202310757251.3A
Authority: CN
Inventors: 伍元聪
Original assignee: Sanechips Technology Co Ltd
Current assignee: Sanechips Technology Co Ltd
Priority date: 2023-06-25
Filing date: 2023-06-25
Publication date: 2025-01-03
Also published as: WO2025001291A1

Abstract

Embodiments of the present invention provide a fault handling system and method, a storage medium, and an electronic device. The system includes a fault management unit and a central fault management unit. The fault management unit is connected to a security domain system and performs first fault handling on a fault that occurs in the security domain system. When the cumulative number of executions of the first fault handling exceeds a preset threshold and the fault has still not been handled successfully, the fault management unit sends the fault information of the unhandled fault to the central fault management unit. The central fault management unit is connected to the fault management unit and to a system reset management unit, and performs second fault handling on the fault according to the fault information. The system thereby solves the problem that faults cannot be detected and handled in a timely manner.

Description

Fault handling system and method, storage medium, and electronic device

Technical Field

Embodiments of the present invention relate to the technical field of automotive electronic functional safety, and in particular to a fault handling system and method, a storage medium, and an electronic device.

Background

To meet increasingly complex application scenarios, in-vehicle chips in automotive E/E (electrical/electronic) architectures are becoming ever more highly integrated, which poses a great challenge for the design of the functional safety architecture inside the chip. At present, a common fault management approach in in-vehicle chips is a centralized fault management architecture, in which the fault management signals of all subsystems in the chip must be processed centrally. This makes the hardware architecture design and physical implementation of the chip very difficult and places a heavy processing load on the software running on the chip, so that faults cannot be detected and handled in time.

No effective solution has yet been proposed for the problem in the related art that faults cannot be detected and handled in time.

Disclosure of Invention

Embodiments of the present application provide a fault handling system and method, a storage medium, and an electronic device, so as to at least solve the problem in the related art that faults cannot be detected and handled in time.

According to one embodiment of the present application, a fault handling system is provided, comprising a fault management unit, a central fault management unit and a system reset management unit. The fault management unit is connected to a security domain system. When the security domain system fails, the fault management unit performs first fault handling on the resulting fault. When the cumulative number of executions of the first fault handling exceeds a preset threshold and the fault has not been handled successfully, the fault management unit sends the fault information of that fault to the central fault management unit. The central fault management unit is connected to the fault management unit and to the system reset management unit, and performs second fault handling on the fault according to the fault information.

In an exemplary embodiment, the fault management unit comprises a plurality of fault management subunits and the security domain system comprises a plurality of security domain subsystems. The number of fault management subunits equals the number of security domain subsystems, and the security domain subsystems are connected to the fault management subunits in one-to-one correspondence.

In an exemplary embodiment, when a fault occurs in the security domain system, the fault management unit is configured to determine the level and/or the type of the fault information of the fault. It then performs the first fault handling according to the level and/or the type of the fault information.

In an exemplary embodiment, when the fault management unit determines that the first fault handling has not succeeded, it is configured to perform the first fault handling on the fault again. It then obtains the cumulative number of executions of the first fault handling and determines whether that number exceeds the preset threshold.

In an exemplary embodiment, the central fault management unit is configured to determine, according to the fault information, that a fault of a target security domain subsystem has not been handled successfully, and to send a reset instruction to the system reset management unit. The system reset management unit is configured to reset the target security domain subsystem according to the reset instruction, obtain the reset result of the target security domain subsystem, and send the reset result to the central fault management unit.

In an exemplary embodiment, the central fault management unit is configured to control the fault handling system to power down if the reset result indicates that the target security domain subsystem is not successfully reset.

In an exemplary embodiment, the central fault management unit is configured to determine the type of the fault information according to the fault information. If the type is determined to be the shared security mechanism fault type, it performs the second fault handling on the fault in the manner corresponding to the shared security mechanism fault type.

According to another embodiment of the present application, a fault handling method is further provided. The method comprises the following steps. First, when a fault occurs in a security domain system, a fault management unit performs first fault handling on the fault. Second, when the cumulative number of executions of the first fault handling exceeds a preset threshold and the fault has not been handled successfully, a central fault management unit obtains the fault information of the fault. Third, the central fault management unit performs second fault handling on the fault according to the fault information.

According to a further aspect of the embodiments of the present application, a computer-readable storage medium storing a program is also provided, wherein the program, when run, performs the fault handling method described above.

According to yet another aspect of the embodiments of the present application, there is also provided an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the fault handling method described above by using the computer program.

In the embodiments of the present application, the fault management unit performs first fault handling on a fault occurring in the security domain system. When the cumulative number of executions of the first fault handling exceeds a preset threshold and the fault has not been handled successfully, the fault management unit sends the fault information of that fault to the central fault management unit. On receiving this fault information, the central fault management unit performs second fault handling on the fault. Providing both the fault management unit and the central fault management unit allows faults to be detected and handled in time, which solves the problem in the prior art that faults cannot be detected and handled in time.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic diagram of a fault handling system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a fault handling system according to an alternative embodiment of the present application;

FIG. 3 is a schematic structural view of a fault management unit according to an embodiment of the present application;

FIG. 4 is a schematic diagram of the chip hardware environment for a fault handling method according to an embodiment of the present application;

FIG. 5 is a flow chart of a fault handling method according to an embodiment of the present application;

FIG. 6 is a flow chart of a fault handling method according to an alternative embodiment of the present application;

FIG. 7 is a schematic diagram of a fault handling apparatus according to an embodiment of the present application.

Detailed Description

To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Functional safety is particularly important for safety-related electrical/electronic (E/E) systems in the automotive field. During the development of an automotive E/E system, engineers inevitably make human oversights or errors in both software and controller hardware. These can cause system functions to fail, leading to faults and resulting harm. Failures caused by such human oversight are called systematic failures. In addition, controller hardware can develop faults and cause harm through its own aging, or through functional failures induced by external environmental factors. Such hardware failures occur randomly and follow a certain probability distribution, so they are called random hardware failures.

The concept of functional safety arose to avoid these two kinds of failure in automotive E/E systems. Currently, automotive functional safety design generally follows ISO 26262, the international standard for the functional safety of automotive electronic and electrical products. ISO 26262 is derived from IEC 61508 and is aimed specifically at electrical devices, electronic equipment, programmable electronic devices, integrated circuit chips, and other components used in the automotive industry.

ISO 26262 is based on the V-model. Automotive functional safety development begins with the concept phase, which mainly includes item definition, hazard analysis and risk assessment, and development of the functional safety concept. Based on the hazard analysis and risk assessment, a system or system component is assigned an Automotive Safety Integrity Level (ASIL), so that the functional safety of the system product meets automotive safety requirements. There are four ASILs, A, B, C and D, where D is the highest and A the lowest. Similar hazardous events are combined and classified, and safety goals are then derived from them. Similar safety goals may be merged, and the merged goal inherits the highest ASIL among them.

The safety requirements of the automotive E/E system determine its ASIL. The higher the ASIL, the higher the cost of implementing the corresponding safety measures (for example, development cost, development time, and technical requirements), and the higher the required diagnostic coverage. In ISO 26262, ASIL D requires a single-point fault metric of at least 99% and a latent-fault (multiple-point fault) metric of at least 90%. Achieving functional safety in automotive E/E systems is therefore very complex and difficult.
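The two hardware architectural metrics mentioned above are computed from failure rates. The sketch below uses the ISO 26262-5 definitions:

- SPFM = 1 - sum(lambda_SPF + lambda_RF) / sum(lambda)
- LFM = 1 - sum(lambda_MPF,latent) / sum(lambda - lambda_SPF - lambda_RF)

All rates are those of safety-related hardware, in FIT. The failure rates in the example are made-up illustrative values, not data from this application.

```python
from dataclasses import dataclass

@dataclass
class FailureRates:
    """Summed failure rates (in FIT) of the safety-related hardware elements."""
    total: float         # sum of all failure rates
    single_point: float  # single-point faults
    residual: float      # residual faults
    latent_mpf: float    # latent multiple-point faults

# Minimum targets (SPFM, LFM) per ISO 26262-5; no targets are defined for ASIL A.
TARGETS = {"B": (0.90, 0.60), "C": (0.97, 0.80), "D": (0.99, 0.90)}

def spfm(r: FailureRates) -> float:
    """Single-point fault metric."""
    return 1.0 - (r.single_point + r.residual) / r.total

def lfm(r: FailureRates) -> float:
    """Latent-fault metric, over the rates not already counted as SPF/RF."""
    return 1.0 - r.latent_mpf / (r.total - r.single_point - r.residual)

def meets_asil(r: FailureRates, asil: str) -> bool:
    """True if both metrics reach the targets of the given ASIL (B, C or D)."""
    spfm_min, lfm_min = TARGETS[asil]
    return spfm(r) >= spfm_min and lfm(r) >= lfm_min

# Hypothetical failure rates, for illustration only.
rates = FailureRates(total=1000.0, single_point=2.0, residual=3.0, latent_mpf=50.0)
print(f"SPFM={spfm(rates):.2%}  LFM={lfm(rates):.2%}  ASIL D met: {meets_asil(rates, 'D')}")
```

With these numbers, SPFM is 99.5% and LFM is about 95.0%, so both ASIL D targets are met. Raising the combined single-point and residual rate to 15 FIT would lower SPFM to 98.5% and fail ASIL D.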

To meet functional safety requirements, an in-vehicle chip in an automotive E/E system contains software and hardware safety mechanisms that guard against systematic failures and random hardware failures. Safety mechanisms are a key technical means of ensuring the functional safety of an in-vehicle chip system. Examples include hardware redundancy, memory signatures, error correction codes (ECC), parity checks, cyclic redundancy checks (CRC), clock monitoring, voltage monitoring, and safe-state entry. When the system fails, a safety mechanism must detect the fault in time and report it. The system can then apply fault handling appropriate to the fault type and severity, which prevents latent faults or harm caused directly by the fault.
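The following sketch illustrates how one such safety mechanism detects and reports a fault. A data block is protected with a CRC-32 checksum, computed with Python's standard `zlib.crc32`. A fault is reported when the recomputed checksum no longer matches the stored one. The function names and the fault code are hypothetical.

```python
import zlib
from typing import Optional, Tuple

def protect(data: bytes) -> Tuple[bytes, int]:
    """Store a data block together with its CRC-32 checksum."""
    return data, zlib.crc32(data)

def crc_check(data: bytes, stored_crc: int) -> Optional[str]:
    """Safety-mechanism check: return a fault report on mismatch, otherwise None."""
    if zlib.crc32(data) != stored_crc:
        return "CRC_MISMATCH"  # hypothetical fault code reported to fault management
    return None

block, crc = protect(b"\x10\x20\x30\x40")
corrupted = bytes([block[0] ^ 0x01]) + block[1:]  # simulate a single-bit upset
print(crc_check(block, crc), crc_check(corrupted, crc))  # None CRC_MISMATCH
```

A CRC detects every single-bit error, which is why it is a common choice for protecting stored or transmitted data.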

In the related art, the trend toward large-scale integration of in-vehicle chips in automotive E/E architectures poses great challenges for the functional safety architecture inside the chip. The commonly used centralized fault management architecture processes the fault management signals of all subsystems in the chip centrally. This makes the hardware architecture design and physical implementation of the chip difficult and places a heavy processing load on the chip's software, so that faults cannot be detected and handled in time. To address this, the present application provides a fault handling system and method, a storage medium, and an electronic device.

In this embodiment, a fault handling system is provided. The system may be applied to a chip such as an in-vehicle chip, including but not limited to an in-vehicle central gateway chip and an in-vehicle intelligent cockpit chip. As shown in FIG. 1, the system includes a fault management unit 120 and a central fault management unit 140, wherein:

The fault management unit 120 is connected to the security domain system 160. It is configured to perform first fault handling on a fault occurring in the security domain system 160 when the security domain system 160 fails. It is further configured to send the fault information of the fault to the central fault management unit 140 when the cumulative number of executions of the first fault handling exceeds a preset threshold and the fault has not been handled successfully;

the central fault management unit 140 is connected to the fault management unit 120 and the system reset management unit 180, respectively, and is configured to perform a second fault processing on the fault according to the fault information.
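The division of work between the two units can be modeled in a few lines. This minimal sketch rests on three assumptions. First, the first fault handling is modeled as a callable that returns whether it succeeded. Second, "exceeds a preset threshold" is read as the cumulative count becoming strictly greater than the threshold. Third, escalation is a plain method call. All class and method names are hypothetical and do not come from the application.

```python
from typing import Callable, List

class CentralFaultManagementUnit:
    """Receives escalated fault information and performs the second fault handling."""
    def __init__(self) -> None:
        self.received: List[str] = []

    def second_fault_handling(self, fault_info: str) -> None:
        self.received.append(fault_info)  # placeholder for the reset / power-down logic

class FaultManagementUnit:
    """Performs the first fault handling locally and escalates when it keeps failing."""
    def __init__(self, central: CentralFaultManagementUnit, threshold: int) -> None:
        self.central = central
        self.threshold = threshold

    def on_fault(self, fault_info: str, first_handling: Callable[[], bool]) -> bool:
        executions = 0  # cumulative number of executions of the first fault handling
        while True:
            executions += 1
            if first_handling():
                return True  # handled locally
            if executions > self.threshold:
                self.central.second_fault_handling(fault_info)
                return False  # escalated to the central unit

central = CentralFaultManagementUnit()
fmu = FaultManagementUnit(central, threshold=3)
attempts: List[int] = []

def always_fails() -> bool:
    attempts.append(1)
    return False

fmu.on_fault("ECC_UNCORRECTABLE@subsys2", always_fails)
print(len(attempts), central.received)  # 4 ['ECC_UNCORRECTABLE@subsys2']
```

With a threshold of 3, the local handling runs four times before the fault is escalated. A handler that succeeds on a retry never reaches the central unit.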

In the fault handling system of this embodiment, the fault management unit performs first fault handling on a fault occurring in the security domain system. When the cumulative number of executions of the first fault handling exceeds the preset threshold and the fault has not been handled successfully, the fault management unit sends the fault information of the fault to the central fault management unit. On receiving it, the central fault management unit performs second fault handling on the fault. Providing the fault management unit and the central fault management unit allows the system to detect and handle faults in time.

Centralized fault management places a heavy detection load on a single fault management unit. To avoid this, as shown in FIG. 2, in an embodiment of the present application the fault management unit 120 includes a plurality of fault management subunits and the security domain system 160 includes a plurality of security domain subsystems. The number of fault management subunits equals the number of security domain subsystems, and the security domain subsystems are connected to the fault management subunits in one-to-one correspondence.

Each security domain subsystem therefore has its own independent fault management subunit. This allows fault detection and fault handling to be performed on every security domain subsystem in real time, and allows the faults of each security domain subsystem in the chip to be accurately located and responded to. It also avoids the problems of managing the faults of all security domain subsystems centrally: great difficulty in hardware architecture design and physical implementation, and a heavy processing load on the chip's software. Both problems hinder the design and implementation of an efficient, high-coverage functional safety chip with timely fault response.
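The one-to-one arrangement can be modeled as a mapping from each security domain subsystem to its own fault management subunit. Each subunit keeps independent state, and every fault report it produces carries its source subsystem. The subsystem names and data structures below are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FaultManagementSubunit:
    """Independent fault state for exactly one security domain subsystem."""
    subsystem: str
    fault_log: List[str] = field(default_factory=list)

    def report(self, fault: str) -> str:
        self.fault_log.append(fault)
        return f"{self.subsystem}: {fault}"  # the fault is localized by construction

SUBSYSTEMS = ["can_lin_engine", "network_switch", "pcie", "ddr"]  # hypothetical names

# One subunit per subsystem: equal counts, one-to-one correspondence.
subunits: Dict[str, FaultManagementSubunit] = {
    name: FaultManagementSubunit(name) for name in SUBSYSTEMS
}

print(subunits["pcie"].report("link CRC error"))  # pcie: link CRC error
```

Because no subunit sees another subsystem's faults, no single unit has to process the fault signals of the whole chip.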

The security domain system 160 is used for processing and controlling functional safety data. For example, N security domain subsystems with different functions may be designed inside the chip according to different application requirements. Examples include a security domain subsystem for a low-latency CAN/LIN communication engine, one for a network switching hardware engine, one for functional safety peripheral interfaces, one for an on-chip memory controller, one for a Peripheral Component Interconnect Express (PCIe) interface, and one for a Double Data Rate (DDR) memory interface.

In an embodiment of the present application, when a fault occurs in the security domain system, the fault management unit 120 is configured to determine the level and/or the type of the fault information of the fault. It then performs the first fault handling according to the level and/or the type of the fault information.

The level of the fault information may be determined in several ways.

First, it may be determined from a preset database that contains data information and the level corresponding to each item of data information. For example, when the security domain system is determined to have failed, the security domain subsystem in which the fault occurred can be identified. The data information corresponding to the faulty subsystem can then be determined, and the level corresponding to that data information is obtained from the preset database.

Second, the level may be determined from the scope of impact of the fault. For example, a fault with a small impact is assigned a low level, and a fault with a large impact is assigned a high level.

Third, the level may be determined from a display value of the fault information, for example, in a case where the display value is 1.
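The first two ways of determining the level can be combined. The sketch below looks the level up in a preset table and falls back to the scope of impact when the table has no entry. The table contents, the two-level scale, and the impact threshold are hypothetical values chosen for illustration.

```python
from enum import Enum
from typing import Dict, Tuple

class Level(Enum):
    LOW = 0
    HIGH = 1

# Hypothetical preset database: (subsystem, fault data) -> level.
PRESET_LEVELS: Dict[Tuple[str, str], Level] = {
    ("ddr", "ecc_uncorrectable"): Level.HIGH,
    ("ddr", "ecc_corrected"): Level.LOW,
}

def fault_level(subsystem: str, fault_data: str, affected_subsystems: int,
                impact_threshold: int = 2) -> Level:
    """Look up the level in the preset table; otherwise classify by scope of impact."""
    level = PRESET_LEVELS.get((subsystem, fault_data))
    if level is not None:
        return level
    # Fallback: a fault affecting many subsystems is treated as high level.
    return Level.HIGH if affected_subsystems >= impact_threshold else Level.LOW

print(fault_level("ddr", "ecc_corrected", 5), fault_level("pcie", "link_down", 3))
```

A table entry always takes precedence. This is why the DDR corrected-ECC fault stays low level even though it is reported as affecting five subsystems.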

The type of the fault information may be determined from the attribute information of the fault information, from the fault manifestation, or from the fault signal reported by the security domain system. When the type is determined from the attribute information or the fault manifestation, the types include, but are not limited to, mechanical faults, electrical faults, braking faults, steering faults, and tire faults. When the type is determined from the fault signal reported by the security domain system, the types are the shared security mechanism fault type and the non-shared security mechanism fault type. A shared security mechanism fault may be understood as a fault in data communication between security domain subsystems of the security domain system. A non-shared security mechanism fault may be understood as a fault of a security mechanism belonging to an individual security domain subsystem.
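The signal-based classification can be sketched under one assumption: each reported fault signal names the safety mechanism that raised it, and a mechanism counts as shared when more than one security domain subsystem depends on it. The routing rule, with shared faults going to the central unit, follows the later description of the central fault management unit. The mechanism names and the ownership table are hypothetical.

```python
from typing import Dict, Set

# Hypothetical table: which security domain subsystems rely on each safety mechanism.
MECHANISM_USERS: Dict[str, Set[str]] = {
    "shared_sram_ecc": {"subsys1", "subsys2"},  # shared between subsystems
    "ddr_ecc": {"ddr"},                         # private to one subsystem
}

def fault_type(mechanism: str) -> str:
    """Classify a reported fault signal by the mechanism that raised it."""
    users = MECHANISM_USERS.get(mechanism, set())
    return "shared" if len(users) > 1 else "non_shared"

def handler_for(mechanism: str) -> str:
    """Shared-mechanism faults go to the central unit; others stay with the subunit."""
    return "central_fmu" if fault_type(mechanism) == "shared" else "local_subunit"

print(handler_for("shared_sram_ecc"), handler_for("ddr_ecc"))  # central_fmu local_subunit
```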

Faults with high-level fault information may be handled preferentially. Faults with low-level fault information are handled once the high-level faults have been handled. For example, a fault affecting safety performance may be treated as a high-level fault, and a fault not affecting safety performance as a low-level fault. If a high-level fault is detected while a low-level fault is being handled, the high-level fault is handled first.
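The priority rule above behaves like a priority queue. Pending faults are ordered by level, and a newly detected high-level fault is served before any remaining low-level ones. The sketch uses Python's `heapq` module with an insertion counter to keep first-in, first-out order within a level. This simplified model only reorders faults between handling steps; it does not model interrupting a handler that is already running.

```python
import heapq
import itertools
from typing import List, Tuple

HIGH, LOW = 0, 1  # smaller value = higher priority

class FaultQueue:
    """Pending faults ordered by level, then by arrival order."""
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # FIFO tie-breaker within a level

    def push(self, level: int, fault: str) -> None:
        heapq.heappush(self._heap, (level, next(self._seq), fault))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)

q = FaultQueue()
q.push(LOW, "lamp_dim")
q.push(LOW, "tire_pressure_low")
order = [q.pop()]                # start handling the first low-level fault
q.push(HIGH, "brake_ecc_error")  # a high-level fault is detected meanwhile
while q:
    order.append(q.pop())
print(order)  # ['lamp_dim', 'brake_ecc_error', 'tire_pressure_low']
```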

In embodiments of the present application, the first fault handling includes, but is not limited to, reset, restart or power-off, and bulb replacement. For high-level fault information, the first fault handling is a reset. For low-level fault information, the first fault handling is not a reset but an action chosen according to the fault type. For example, if a sensor indicates that the tire pressure is insufficient and the fault type is therefore determined to be a tire fault, the tire is inflated.
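The level-dependent choice of first fault handling can be sketched as a dispatch. High-level faults are reset, while low-level faults get an action looked up by fault type. The action table mirrors the tire and bulb examples in the text. The function name, the action strings, and the `log_only` fallback for unknown types are hypothetical.

```python
from typing import Dict

# Type-specific actions for low-level faults, mirroring the examples in the text.
TYPE_ACTIONS: Dict[str, str] = {
    "tire": "inflate_tire",
    "lamp": "replace_bulb",
}

def first_fault_handling(level: str, fault_type: str) -> str:
    """Choose the first fault handling action from the level and type of the fault."""
    if level == "high":
        return "reset"
    # Low-level faults are not reset; unknown types fall back to a hypothetical log-only action.
    return TYPE_ACTIONS.get(fault_type, "log_only")

print(first_fault_handling("high", "tire"), first_fault_handling("low", "tire"))  # reset inflate_tire
```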

In the embodiment of the present application, when the fault management unit 120 determines that the first fault handling has not succeeded, it is configured to perform the first fault handling on the fault again. It then obtains the cumulative number of executions of the first fault handling and determines whether that number exceeds the preset threshold.

That is, in the case where the fault management unit 120 performs the first fault process on the occurred fault, it may be determined whether the first fault process is successful, and in the case where it is determined that the first fault process is unsuccessful, the first fault process may be performed again on the fault.

Whether the first fault handling has succeeded may be determined in two ways.

First, it may be determined from a trial run over a preset time. For example, after the first fault handling is performed, the security domain subsystem corresponding to the fault is restarted and monitored for the preset time. If the fault recurs within that time, the first fault handling is determined to be unsuccessful; otherwise it is determined to be successful.

Second, success may be determined by restarting the affected function. For example, if the fault is determined to be a lamp fault, the lamp is restarted. If the lamp works normally after the restart, the first fault handling is determined to be successful; if it is still abnormal, the first fault handling is determined to be unsuccessful.
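The first verification approach, watching for recurrence within a preset time after the restart, can be sketched with fault timestamps. The sketch assumes that fault times and the restart time are in seconds on a common clock. The window length in the example is a hypothetical value.

```python
from typing import Iterable

def first_handling_succeeded(restart_time: float, fault_times: Iterable[float],
                             window: float) -> bool:
    """Success means no fault timestamp falls within `window` seconds after the restart."""
    return not any(restart_time <= t <= restart_time + window for t in fault_times)

# Restart at t = 10 s with a 5 s observation window (hypothetical values).
print(first_handling_succeeded(10.0, [2.0, 20.0], window=5.0))  # True: no recurrence in [10, 15]
print(first_handling_succeeded(10.0, [12.5], window=5.0))       # False: fault recurred at 12.5 s
```

An unsuccessful result would feed the retry count kept by the fault management unit.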

To help those skilled in the art understand how the fault management unit monitors the security domain system in the present application, that is, how each fault management subunit monitors its security domain subsystem in real time, an optional embodiment of the present application is described below.

Taking one fault management subunit as an example, FIG. 3 shows a fault management subunit comprising an online hardware self-test module 302, a communication interface module 304, a fault injection module 306, a fault recording module 308, a fault collection and control module 310, a fault processing interface module 312 and a fault processing monitoring module 314, together with a security domain subsystem 316. The modules are connected and assigned as follows:

- The online hardware self-test module 302 is connected to the corresponding security domain subsystem 316. It is responsible for logic built-in self-test (LBIST) and memory built-in self-test (MBIST) of the security domain subsystem 316.
- The communication interface module 304 is connected to the fault management communication bus 1110. It is responsible for transmitting the configuration information and fault information of the fault management subunit.
- The fault injection module 306 is connected to the corresponding security domain subsystem 316. It is responsible for fault injection testing of the internal memory safety mechanisms, hardware redundancy safety mechanisms, and custom safety mechanisms of the security domain subsystem 316.
- The fault recording module 308 is connected to the corresponding security domain subsystem 316. It is responsible for recording fault information from those same safety mechanisms.
- The fault collection and control module 310 is connected to the corresponding security domain subsystem 316, the fault processing interface module 312 and the fault processing monitoring module 314. It is responsible for collecting all faults inside the security domain subsystem 316 and managing them according to fault type.
- The fault processing interface module 312 is connected to the corresponding security domain subsystem 316 and the fault collection and control module 310.
- The fault processing monitoring module 314 is connected to the corresponding security domain subsystem 316 and the fault collection and control module 310.

Together, these modules provide real-time detection, recording, and handling of faults occurring in the security domain subsystem connected to the fault management subunit.

The security domain subsystem 316 contains a logic part and a memory part, each protected by corresponding safety mechanisms. The online hardware self-test module 302 is used to detect latent faults of those safety mechanisms. After the security domain subsystem 316 is powered on, the online hardware self-test module 302 checks whether the chip hardware is faulty.

For the logic of the chip, test vectors are generated automatically and applied, and the resulting outputs are compared with the expected outputs. For the memory of the chip, the embedded memories perform a self-test. Because embedded memories are buried deep within the chip logic, they cannot be tested directly with external stimuli, so dedicated circuitry must be designed into the chip to self-test them automatically.
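Memory BIST engines commonly run March algorithms. The sketch below is a software model of the kind of test an MBIST circuit performs, not this application's self-test design. It applies the standard March C- sequence {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)} to a simulated bit memory. Stuck-at faults can optionally be injected into that memory.

```python
from typing import Dict, List, Optional, Set

class Memory:
    """Simulated bit-addressable memory with optional stuck-at faults."""
    def __init__(self, size: int, stuck_at: Optional[Dict[int, int]] = None) -> None:
        self.cells: List[int] = [0] * size
        self.stuck_at = stuck_at or {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])

def march_c_minus(mem: Memory) -> List[int]:
    """Run March C- and return the addresses whose reads did not match."""
    n = len(mem.cells)
    up, down = range(n), range(n - 1, -1, -1)
    failures: Set[int] = set()
    for a in up:  # M0: write 0 (any order)
        mem.write(a, 0)
    # M1..M4: (address order, expected read value, value to write)
    for order, expect, value in ((up, 0, 1), (up, 1, 0), (down, 0, 1), (down, 1, 0)):
        for a in order:
            if mem.read(a) != expect:
                failures.add(a)
            mem.write(a, value)
    for a in up:  # M5: read 0 (any order)
        if mem.read(a) != 0:
            failures.add(a)
    return sorted(failures)

print(march_c_minus(Memory(8)))                   # []
print(march_c_minus(Memory(8, stuck_at={3: 1})))  # [3]
```

A stuck-at-1 cell fails the first r0 of element M1. A stuck-at-0 cell fails the first r1 of element M2.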

The fault injection module 306 performs fault injection tests on the safety mechanisms inside the security domain subsystem 316. A safety mechanism exists to detect certain faults, but the mechanism itself may be faulty, and the fault injection module 306 is used to check it. For example, to check whether the safety mechanism of the internal memory of the security domain subsystem 316 has failed, a single-bit error and a multi-bit error may be injected into the internal memory. It is then determined whether the memory's safety mechanism reports both errors. If it does, the safety mechanism itself is working normally, that is, it has not failed. If it does not, the safety mechanism itself is faulty, that is, it has failed.
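The single-bit and multi-bit injection check can be sketched in software. As an assumed stand-in for the internal memory's safety mechanism, the code uses an extended Hamming (8,4) SEC-DED code. The injector flips every single bit and every pair of bits of each codeword. It then checks that the mechanism reports the single flips as corrected and the double flips as uncorrectable. This is an illustrative model, not the application's circuit.

```python
from typing import Callable, List

def encode(data: int) -> List[int]:
    """Extended Hamming (8,4) codeword: index 0 is overall parity, 1/2/4 are Hamming parity."""
    c = [0] * 8
    for bit, pos in enumerate((3, 5, 6, 7)):
        c[pos] = (data >> bit) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        c[p] = sum(c[i] for i in range(1, 8) if i & p and i != p) % 2
    c[0] = sum(c[1:]) % 2
    return c

def secded_status(c: List[int]) -> str:
    """Model of the memory's safety mechanism: 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for i in range(1, 8):
        if c[i]:
            syndrome ^= i
    overall_even = sum(c) % 2 == 0
    if syndrome == 0 and overall_even:
        return "ok"
    if not overall_even:
        return "corrected"      # odd number of flipped bits: single-bit error
    return "uncorrectable"      # even number of flips, nonzero syndrome: double-bit error

def inject_and_verify(data: int, mechanism: Callable[[List[int]], str]) -> bool:
    """Fault injection: flip every single bit and every bit pair, check what is reported."""
    base = encode(data)
    if mechanism(base) != "ok":
        return False
    for i in range(8):
        single = base.copy()
        single[i] ^= 1
        if mechanism(single) != "corrected":
            return False
        for j in range(i + 1, 8):
            double = single.copy()
            double[j] ^= 1
            if mechanism(double) != "uncorrectable":
                return False
    return True

print(all(inject_and_verify(d, secded_status) for d in range(16)))  # True: mechanism healthy
print(inject_and_verify(0b1011, lambda c: "ok"))                    # False: mechanism never reports
```

The second call models a failed safety mechanism that never reports anything. The injection test exposes it immediately.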

In the embodiment of the present application, the central fault management unit 140 is configured to determine, according to the fault information, that the fault of the target security domain subsystem has not been handled successfully, and to send a reset instruction to the system reset management unit 180. The system reset management unit 180 is configured to reset the target security domain subsystem according to the reset instruction, obtain the reset result of the target security domain subsystem, and send the reset result to the central fault management unit 140.

Here, a reset may be understood to include, for example, restoring factory settings.

The central fault management unit 140 is configured to power down the fault handling system when the reset result indicates that the target security domain subsystem has not been reset successfully. For example, after the fault handling system has been powered down, the fault may be repaired according to the fault type.

When the reset result indicates that the target security domain subsystem has been reset successfully, real-time fault detection of the security domain subsystem continues.
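The reset path of the second fault handling can be sketched as a small decision function. The system reset management unit is modeled as a callable that resets the named subsystem and returns the reset result. The function name and the action strings are hypothetical labels.

```python
from typing import Callable

def handle_unresolved_fault(target: str, reset_subsystem: Callable[[str], bool]) -> str:
    """Second fault handling for an unresolved fault of `target`.

    `reset_subsystem` stands in for the system reset management unit: it resets the
    subsystem and returns the reset result (True = reset succeeded).
    """
    if reset_subsystem(target):
        return "resume_monitoring"  # keep detecting faults in real time
    return "power_down"             # reset failed: power down the fault handling system

print(handle_unresolved_fault("pcie", lambda s: True))   # resume_monitoring
print(handle_unresolved_fault("pcie", lambda s: False))  # power_down
```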

The central fault management unit 140 is configured to determine the type of the fault information according to the fault information. When the type is determined to be the shared security mechanism fault type, it performs the second fault handling on the fault in the manner corresponding to that type. In other words, when the security domain system fails, the type of the fault is determined first. If it is the shared security mechanism fault type, the fault management unit has no authority to handle it, so the higher-level central fault management unit performs the fault handling. For example, suppose a shared memory between a first security domain subsystem and a second security domain subsystem is found to be faulty. The central fault management unit can accurately determine that the faulty shared memory is located in the first security domain subsystem, and switch the storage path so that the data of the shared memory is stored in the second security domain subsystem instead.
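In the shared-memory example, the second fault handling amounts to redirecting the storage path away from the faulty subsystem. This minimal sketch assumes that the central unit keeps a table mapping each shared buffer to the subsystem that hosts it. The table, the class, and the names are hypothetical.

```python
from typing import Dict

class SharedStorageRouter:
    """Hypothetical central-unit table mapping each shared buffer to its host subsystem."""

    def __init__(self, storage_map: Dict[str, str]) -> None:
        self.storage_map = dict(storage_map)

    def handle_shared_fault(self, buffer: str, faulty: str, fallback: str) -> str:
        """If the buffer lives in the faulty subsystem, move its storage path to `fallback`."""
        if self.storage_map.get(buffer) == faulty:
            self.storage_map[buffer] = fallback
        return self.storage_map[buffer]

router = SharedStorageRouter({"shared_buf": "subsys1"})
print(router.handle_shared_fault("shared_buf", faulty="subsys1", fallback="subsys2"))  # subsys2
```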

In order to make the fault handling system more complete, as shown in fig. 2, the fault handling system further comprises a fault management communication bus 1110, an application domain processor system 1112, a system memory unit 1114, a system bus 1116 and a security domain processor system 1118, wherein the application domain processor system 1112 is connected with the system bus 1116, the system memory unit 1114 is connected with the system bus 1116, the security domain system 160 is connected with the system bus 1116 and the fault management unit 120, the fault management unit 120 is connected with the security domain system 160 and the fault management communication bus 1110, the security domain processor system 1118 is connected with the fault management communication bus 1110, the central fault management unit 140 is connected with the fault management communication bus 1110 and the system reset management unit 180, and the system reset management unit 180 is connected with the central fault management unit 140.
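The connections listed above can be captured as a small undirected graph, which makes it easy to check, for example, that a fault report from the security domain system reaches the system reset management unit without using the system bus. The node names are the unit names from the text; the breadth-first search is only an illustrative analysis aid, not part of the described hardware.

```python
from collections import deque

# Undirected connections as listed for Fig. 2.
EDGES = [
    ("application_domain_processor_system", "system_bus"),
    ("system_memory_unit", "system_bus"),
    ("security_domain_system", "system_bus"),
    ("security_domain_system", "fault_management_unit"),
    ("fault_management_unit", "fault_management_communication_bus"),
    ("security_domain_processor_system", "fault_management_communication_bus"),
    ("central_fault_management_unit", "fault_management_communication_bus"),
    ("central_fault_management_unit", "system_reset_management_unit"),
]


def build_graph(edges):
    """Build an adjacency map from a list of undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, src, dst, avoid=()):
    """Breadth-first search from src to dst that never enters nodes in avoid."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen and nxt not in avoid:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "security_domain_system",
                     "system_reset_management_unit", avoid={"system_bus"})
print(" -> ".join(path))
```

The printed path runs through the fault management unit, the fault management communication bus and the central fault management unit, reflecting the separation of fault signalling from the data-carrying system bus.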

The application domain processor system 1112 is responsible for data processing and control that are not subject to functional safety requirements, and may be implemented by a high-performance CPU system; examples of suitable high-performance CPUs include the ARM Cortex-A53, Cortex-A57 and Cortex-A72.

The system memory unit 1114 is responsible for caching system data and is implemented with on-chip SRAM (static random-access memory).

The system bus 1116 provides the data transmission path among the application domain processor system 1112, the system memory unit 1114 and the security domain system 160. It may be implemented with a high-performance NIC (network interconnect) or NoC (network-on-chip) bus topology, for example the ARM CoreLink NIC-400 or the Arteris FlexNoC, and its communication protocol may be a high-performance bus protocol such as AXI or AHB.

The fault management communication bus 1110 provides the configuration-information and data transmission paths for fault management among the security domain processor system 1118, the fault management unit 120 and the central fault management unit 140. It may likewise be implemented with a high-performance NIC or NoC bus topology, for example the ARM CoreLink NIC-400 or the Arteris FlexNoC, and its communication protocol may be AXI-Lite, AHB-Lite, APB, or the like.

The security domain processor system 1118 is responsible for functional-safety data processing and control and is implemented by a CPU system capable of providing real-time functional safety; examples of suitable CPUs include the ARM Cortex-M0, Cortex-M7, Cortex-R5 and Cortex-R52.

In the fault handling system provided by the embodiment of the application, faults inside the vehicle-mounted chip are handled hierarchically by a functional safety architecture composed of the distributed fault management unit, the central fault management unit, the security domain system, the system reset management unit, the fault management communication bus, the application domain processor system, the system memory unit, the system bus and the security domain processor system. This shortens the transmission paths of fault signals in the chip, reduces the complexity of the chip's hardware architecture design and physical implementation, and improves the chip's speed and energy efficiency. Because each security domain subsystem has its own independent fault management unit, faults can be detected in real time and handled and reported according to the level of their fault information. This ensures that faults in the security domain subsystems of the chip are accurately located and responded to, effectively reduces the load of centralized fault detection, and allows faults to be detected and handled efficiently and in a timely manner with appropriate measures, improving both the availability of the chip system when faults occur and the safety of the system.

According to one aspect of an embodiment of the present application, a fault handling method is provided. The method may be performed in a chip; running on a chip is taken as the example below. Fig. 4 is a block diagram of the hardware structure of a chip running the fault handling method according to an embodiment of the present application. As shown in Fig. 4, the chip may include one or more processors 402 (only one is shown in Fig. 4; the processor 402 may include, but is not limited to, a microprocessor unit (MPU) or a programmable logic device (PLD)) and a memory 404 for storing data. In an exemplary embodiment, the chip may further include a transmission device 406 for communication functions and an input/output device 408. Those skilled in the art will appreciate that the structure shown in Fig. 4 is merely illustrative and does not limit the structure of the chip. For example, the chip may include more or fewer components than shown in Fig. 4, or have a configuration different from that shown in Fig. 4 with equivalent or additional functions.

The memory 404 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the fault handling method in an embodiment of the present invention. The processor 402 runs the computer programs stored in the memory 404 to perform various functional applications and data processing, that is, to implement the method described above. The memory 404 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 404 may further include memory located remotely from the processor 402, which may be connected to the chip via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 406 is used to receive or transmit data via a network. Specific examples of such a network may include a wireless network provided by a communications provider. In one example, the transmission device 406 includes a network interface controller (NIC) that can connect to other network devices through a base station to communicate with the Internet. In another example, the transmission device 406 may be a radio frequency (RF) module configured to communicate with the Internet wirelessly.

Fig. 5 is a flowchart of a fault handling method according to an embodiment of the present application. As shown in Fig. 5, the method, applied to the chip described above, includes the following steps:

Step 520: when a fault occurs in the security domain system, the fault management unit performs first fault processing on the fault.

The fault management unit comprises a plurality of fault management subunits, and the security domain system comprises a plurality of security domain subsystems. The number of fault management subunits equals the number of security domain subsystems, and the security domain subsystems are connected to the fault management subunits in one-to-one correspondence.
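The one-to-one correspondence can be expressed as a simple construction-time check. This sketch uses hypothetical string identifiers for subsystems and subunits and a hypothetical helper `bind_subunits`; it only illustrates the pairing constraint stated in the text.

```python
def bind_subunits(subsystems, subunits):
    """Pair each security domain subsystem with exactly one fault management subunit."""
    if len(subsystems) != len(subunits):
        raise ValueError("the number of subunits must equal the number of subsystems")
    if len(set(subsystems)) != len(subsystems) or len(set(subunits)) != len(subunits):
        raise ValueError("identifiers must be unique for a one-to-one mapping")
    return dict(zip(subsystems, subunits))


binding = bind_subunits(["SS0", "SS1", "SS2"], ["FMU0", "FMU1", "FMU2"])
print(binding["SS1"])  # FMU1
```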

In the embodiment of the application, the fault management subunits monitor their corresponding security domain subsystems in real time, and when a security domain subsystem fails, its fault management subunit performs first fault processing on the fault.

When a security domain subsystem fails, the level and/or the type of the fault information is determined, and the first fault processing is performed according to that level and/or type.
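How the level and/or type might select the first fault processing can be sketched with a lookup table. The levels, fault types and action names below are hypothetical placeholders; the source does not define a specific classification. In this variant the level is always used and the type refines the choice when a type-specific rule exists.

```python
from enum import IntEnum


class Level(IntEnum):  # hypothetical fault-information levels
    INFO = 0
    CORRECTABLE = 1
    UNCORRECTABLE = 2


# Hypothetical (level, type) -> action rules, checked first.
TYPE_RULES = {
    (Level.CORRECTABLE, "ECC"): "correct_and_log",
    (Level.UNCORRECTABLE, "ECC"): "isolate_memory_region",
    (Level.UNCORRECTABLE, "WATCHDOG"): "restart_task",
}

# Fallback actions chosen by level alone when no type-specific rule matches.
LEVEL_RULES = {
    Level.INFO: "log_only",
    Level.CORRECTABLE: "correct_and_log",
    Level.UNCORRECTABLE: "local_recovery",
}


def select_first_action(level, fault_type=None):
    """Pick the first fault processing action from the level and/or the type."""
    return TYPE_RULES.get((level, fault_type), LEVEL_RULES[level])


print(select_first_action(Level.UNCORRECTABLE, "ECC"))    # isolate_memory_region
print(select_first_action(Level.UNCORRECTABLE, "CLOCK"))  # local_recovery
```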

Step 540: when the cumulative number of executions of the first fault processing exceeds the preset threshold and the fault has still not been handled successfully, the central fault management unit obtains the fault information of the unresolved fault.

In the embodiment of the application, after the first fault processing is performed according to the level and/or type of the fault information, it is determined whether the first fault processing succeeded. If it did not, the first fault processing is performed on the fault again, and the cumulative number of executions of the first fault processing is obtained and compared with the preset threshold. When the cumulative number of executions exceeds the preset threshold and the fault has still not been handled successfully, the unresolved fault is reported to the central fault management unit, which thereby obtains its fault information.
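The retry-and-escalate rule of step 540 can be sketched as a bounded loop. In this sketch, `attempt` is a hypothetical callable standing in for one execution of the first fault processing, and `first_fault_processing` is an illustrative name. Escalation happens once the cumulative number of executions exceeds `threshold` (strictly greater than), following the wording of the text.

```python
def first_fault_processing(attempt, threshold):
    """Repeat local handling until it succeeds or the count exceeds the threshold.

    attempt(n) -> bool: one execution of the first fault processing,
                        n is the 1-based cumulative execution count.
    Returns (status, executions) with status "HANDLED" or "ESCALATE".
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    executions = 0
    while True:
        executions += 1
        if attempt(executions):
            return "HANDLED", executions
        if executions > threshold:
            # Cumulative executions exceed the preset threshold and the fault
            # is still unresolved: report it to the central unit.
            return "ESCALATE", executions


print(first_fault_processing(lambda n: False, threshold=2))   # ('ESCALATE', 3)
print(first_fault_processing(lambda n: n == 2, threshold=2))  # ('HANDLED', 2)
```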

In one embodiment of the present application, when the type of the fault information is determined to be the shared security mechanism fault type, the fault information is sent to the central fault management unit, so that the central fault management unit performs second fault processing on the fault according to the processing mode corresponding to the shared security mechanism fault type.

Step 560: the central fault management unit performs second fault processing on the fault according to the fault information.

In the embodiment of the application, once the central fault management unit has obtained the fault information of the unresolved fault, it determines the target security domain subsystem to which that fault information corresponds and, through its connection to the system reset management unit, sends a reset instruction to that unit. The system reset management unit resets the target security domain subsystem according to the reset instruction, obtains the reset result of the target security domain subsystem, and sends the reset result to the central fault management unit.

The central fault management unit receives the reset result and controls the fault handling system to power down when the reset result indicates that the target security domain subsystem was not reset successfully.

In one embodiment of the present application, the type of the fault information is determined, and when it is the shared security mechanism fault type, the second fault processing is performed on the fault according to the processing mode corresponding to that type.

In a specific embodiment of the present application, taking an in-vehicle central gateway chip as the application scenario, the fault handling method proceeds as shown in Fig. 6:

Step 1: while the chip is running, the security mechanism in each security domain subsystem performs real-time fault detection;

Step 2: when the security mechanism inside a security domain subsystem detects a fault in that subsystem, the fault management subunit corresponding to the subsystem identifies the fault type;

Step 3: the fault management subunit corresponding to the security domain subsystem performs first fault processing;

Step 4: determine whether the first fault processing succeeded; if it did, return to step 1; otherwise, go to step 5;

Step 5: the fault management subunit corresponding to the security domain subsystem retries the fault processing and obtains the number of retries; if the number of retries does not exceed the preset number of retries, return to step 4; if it exceeds the preset number, go to step 6;

Step 6: the fault management subunit corresponding to the security domain subsystem reports the information of the unresolved fault to the central fault management unit, which performs second fault processing on that fault;

Step 7: determine whether the second fault processing succeeded; if it did, return to step 1; otherwise, go to step 8;

Step 8: control the system to enter a safe state.
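The flow of steps 1 to 8 can be sketched for a single detected fault as below. This assumes step 5 escalates to step 6 once the retry count exceeds the preset number of retries, consistent with the threshold rule of step 540. `handle_detected_fault`, `local_attempt` and `central_attempt` are hypothetical names; the two callables stand in for the first and second fault processing.

```python
def handle_detected_fault(local_attempt, max_retries, central_attempt):
    """Follow one detected fault through steps 3 to 8 of the flow (sketch).

    local_attempt(n) -> bool : first fault processing; n = 0 is the initial
                               attempt (step 3), n >= 1 are retries (step 5).
    central_attempt() -> bool: second fault processing by the central unit (step 6).
    Returns "MONITOR" (back to step 1) or "SAFE_STATE" (step 8).
    """
    if local_attempt(0):                     # steps 3-4
        return "MONITOR"
    for retry in range(1, max_retries + 1):  # step 5: retry while within the limit
        if local_attempt(retry):             # back to step 4
            return "MONITOR"
    if central_attempt():                    # steps 6-7: limit exceeded, escalate
        return "MONITOR"
    return "SAFE_STATE"                      # step 8


calls = []
state = handle_detected_fault(lambda n: calls.append(n) or False, 3, lambda: False)
print(state, calls)  # SAFE_STATE [0, 1, 2, 3]
```

Returning a next-state label rather than looping forever keeps the sketch testable; on a chip, the "MONITOR" result would correspond to resuming real-time detection in step 1.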

In the fault handling method of the embodiment of the application, the fault management unit performs first fault processing on a fault of the security domain system. When the cumulative number of executions of the first fault processing exceeds the preset threshold and the fault has still not been handled successfully, the fault information of the unresolved fault is sent to the central fault management unit, which, on receiving it, performs second fault processing on the fault. By providing both the fault management unit and the central fault management unit, the method enables faults to be detected and handled in a timely manner.

From the description of the above embodiments, those skilled in the art will clearly understand that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disc) that includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods of the embodiments of the present application.

Fig. 7 is a block diagram of a fault handling apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus includes:

a first processing module 720, configured to perform, through the fault management unit, first fault processing on a fault occurring in the security domain system when the security domain system fails;

an obtaining module 740, configured to obtain, through the central fault management unit, fault information of the unresolved fault when the cumulative number of executions of the first fault processing exceeds the preset threshold and the fault has not been handled successfully; and

a second processing module 760, configured to perform, through the central fault management unit, second fault processing on the fault according to the fault information.

With this apparatus, the first processing module causes the fault management unit to perform first fault processing on a fault of the security domain system when the security domain system fails; the obtaining module causes the central fault management unit to obtain the fault information of the unresolved fault when the cumulative number of executions of the first fault processing exceeds the preset threshold and the fault has not been handled successfully; and the second processing module causes the central fault management unit to perform second fault processing on the fault according to that fault information. By providing both the fault management unit and the central fault management unit, the apparatus effectively reduces the load of centralized fault detection and handling and detects and handles faults efficiently and in a timely manner.

In an exemplary embodiment, the first processing module 720 is configured to determine, when the security domain system fails, the level and/or the type of the fault information of the fault, and to perform the first fault processing according to that level and/or type.

In an exemplary embodiment, the obtaining module 740 is configured to perform the first fault processing on the fault again when it is determined that the first fault processing was not successful, obtain the cumulative number of executions of the first fault processing, and determine whether that number exceeds the preset threshold.

In an exemplary embodiment, the second processing module 760 is configured to determine, according to the fault information, that the fault of the target security domain subsystem has not been handled successfully, and to send a reset instruction to the system reset management unit, which resets the target security domain subsystem according to the reset instruction, obtains the reset result of the target security domain subsystem, and sends the reset result to the central fault management unit.

In an exemplary embodiment, the second processing module 760 is configured to control the fault handling system to power down if the reset result indicates that the target security domain subsystem is not successfully reset.

In an exemplary embodiment, the second processing module 760 is configured to determine the type of the fault information according to the fault information and, when that type is the shared security mechanism fault type, to perform the second fault processing on the fault according to the processing mode corresponding to the shared security mechanism fault type.

An embodiment of the present application also provides a storage medium including a stored program, wherein the program, when run, performs any one of the methods described above.

Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:

S1: when the security domain system fails, performing first fault processing on the fault of the security domain system through the fault management unit;

S2: when the cumulative number of executions of the first fault processing exceeds the preset threshold and the fault has not been handled successfully, obtaining the fault information of the unresolved fault through the central fault management unit;

S3: performing second fault processing on the fault according to the fault information through the central fault management unit.

An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to perform steps S1 to S3 above through the computer program.

Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive (U disk), a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.

Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments and optional implementations; details are not repeated here.

Those skilled in the art will appreciate that the modules or steps of the present application described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented in program code executable by computing devices, so that they can be stored in a storage device and executed by computing devices. In some cases, the steps shown or described may be performed in an order different from that described here; alternatively, they may each be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.

The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the scope of protection of the present application.

Claims

1. A fault handling system, comprising:

a fault management unit connected to the security domain system, and configured to, in the event that a fault occurs in the security domain system, perform a first fault processing on the fault occurring in the security domain system, and, in the event that the cumulative number of executions of the first fault processing exceeds a preset threshold and the fault is not successfully processed, send fault information of the unsuccessfully processed fault to the central fault management unit;

a central fault management unit, connected to the fault management unit and to a system reset management unit respectively, and configured to perform second fault processing on the fault according to the fault information.

2. The fault handling system according to claim 1, characterized in that the fault management unit comprises: a plurality of fault management subunits, and the security domain system comprises: a plurality of security domain subsystems; wherein,

The number of the multiple fault management subunits is the same as the number of the multiple security domain subsystems, and the security domain subsystems are connected to the fault management subunits in a one-to-one correspondence.

3. The fault handling system according to claim 1, characterized in that the fault management unit is configured to determine, when a fault occurs in the security domain system, the level and/or the type of the fault information of the fault, and to perform the first fault processing according to the level and/or the type of the fault information.

4. The fault handling system according to claim 1, characterized in that the fault management unit is configured to, when it determines that the first fault processing was not successful, perform the first fault processing on the fault again, obtain the cumulative number of executions of the first fault processing, and determine whether the cumulative number of executions exceeds the preset threshold.

5. The fault handling system according to claim 2, characterized in that the central fault management unit is used to determine that the fault of the target security domain subsystem has not been successfully handled according to the fault information, and send a reset instruction to the system reset management unit;

The system reset management unit is used to reset the target security domain subsystem according to the reset instruction, obtain the reset result of the target security domain subsystem, and send the reset result to the central fault management unit.

6. The fault handling system according to claim 5, characterized in that the central fault management unit is configured to control the fault handling system to power off when the reset result indicates that the target security domain subsystem has not been reset successfully.

7. The fault handling system according to claim 1, characterized in that the central fault management unit is configured to determine the type of the fault information based on the fault information and, when it is determined that the type of the fault information is a shared safety mechanism fault type, to perform the second fault processing on the fault according to the processing mode corresponding to the shared safety mechanism fault type.

8. A fault handling method, characterized in that it is applied to the fault handling system according to any one of claims 1 to 7, comprising:

In the event that a fault occurs in the security domain system, performing, by the fault management unit, first fault processing on the fault occurring in the security domain system;

when the cumulative number of executions of the first fault processing exceeds a preset threshold and the fault is not successfully processed, obtaining, by the central fault management unit, fault information of the unsuccessfully processed fault;

performing, by the central fault management unit, second fault processing on the fault according to the fault information.

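9. The fault handling method according to claim 8, characterized in that the fault management unit comprises: a plurality of fault management subunits, and the security domain system comprises: a plurality of security domain subsystems; wherein,

the number of the multiple fault management subunits is the same as the number of the multiple security domain subsystems, and the security domain subsystems are connected to the fault management subunits in a one-to-one correspondence.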

10. The fault handling method according to claim 8, characterized in that performing the first fault processing on the fault occurring in the security domain system comprises:

in the event of a fault in the security domain system, determining the level of the fault information of the fault and/or the type of the fault information;

performing the first fault processing according to the level of the fault information and/or the type of the fault information.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program executes the fault handling method described in any one of claims 8 to 10 when running.

12. An electronic device comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to execute the fault handling method described in any one of claims 8 to 10 through the computer program.