Disclosure of Invention
The method for acquiring a hard disk temperature, the server, and the baseboard management controller provided by the embodiments of the present application let the server acquire the temperature information of a hard disk in time when a temperature abnormality occurs. Temperature regulation can then be performed according to the acquired temperature information, which ensures the operating reliability of the hard disk.
In a first aspect, an embodiment of the present application provides a method for obtaining a hard disk temperature, applied to a server. The server includes a baseboard management controller (BMC), a redundant array of independent disks (Raid) card, and a hard disk. The BMC is connected to the Raid card, and the Raid card is connected to the hard disk. The method includes:
the BMC sends a request sensing command to the hard disk through the Raid card, where the request sensing command is used to acquire sensing data of the hard disk;
the BMC obtains, through the Raid card, the sensing data returned by the hard disk in response to the request sensing command, where the sensing data includes a first field configured to indicate temperature information of the hard disk; and
the BMC acquires the temperature information of the hard disk based on the sensing data of the hard disk.
In this embodiment, the BMC controls the Raid card to send a request sensing command to the hard disk, and the hard disk responds by feeding back sensing data that indicates its temperature information. When the temperature of the hard disk is abnormal, its temperature information therefore reaches the server in time. The server can then regulate the temperature accordingly, which ensures the operating reliability of the hard disk.
In some embodiments, before the BMC sends the request sensing command to the hard disk through the Raid card, the method further includes:
the BMC sends a query command to the hard disk through the Raid card, where the query command is used to query the temperature information of the hard disk; and
if the BMC does not acquire the temperature information of the hard disk, the BMC sends the request sensing command to the hard disk through the Raid card.
In this embodiment, the request sensing command is sent only after it is determined that the hard disk has not responded to the temperature query command with its current temperature information. This ensures that the current temperature of the hard disk is obtained in time. It also avoids the resource waste of sending request sensing commands indiscriminately.
In some embodiments, the method further includes: detecting a state of the hard disk; and
if the hard disk is in a formatting state, performing the step in which the BMC sends the request sensing command to the hard disk through the Raid card.
By implementing this embodiment, the hard disk can still feed back its current temperature information while it is in the formatting state, which ensures the operating reliability of the hard disk.
In some embodiments, the first field is a SENSE DATA descriptor field configured to indicate a temperature value of the hard disk, and the method further includes:
the BMC controls a fan based on the temperature value of the hard disk.
In this embodiment, the hard disk feeds back its current temperature value to the BMC. The BMC can therefore regulate the temperature of the hard disk precisely according to that value, which ensures the operating reliability of the hard disk and avoids unnecessary increases in fan power consumption.
In some embodiments, the server further includes a fan connected to the BMC. The sensing data further includes a second field and a third field, where the second field is an additional sense code (ASC) field and the third field is an additional sense code qualifier (ASCQ) field;
the ASC is set to a first value and the ASCQ is set to a second value, the first value and the second value indicating the temperature threshold at which the fan is triggered to perform speed regulation, and the method further includes:
if the temperature value of the hard disk reaches the temperature threshold that triggers the fan to perform speed regulation, the BMC starts the fan to dissipate heat from the hard disk;
or, the ASC is set to a third value and the ASCQ is set to a fourth value, the third value and the fourth value indicating the upper limit of the operating temperature of the hard disk, and the method further includes:
if the temperature value of the hard disk reaches the upper limit of its operating temperature, the BMC controls the fan to increase its rotational speed;
or, the ASC is set to a fifth value and the ASCQ is set to a sixth value, the fifth value and the sixth value indicating the temperature threshold at which the hard disk stops working, and the method further includes:
if the temperature value of the hard disk reaches the temperature threshold at which the hard disk stops working, the BMC determines that the hard disk has stopped working.
By implementing this embodiment, once the temperature information is fed back through the Raid card, the BMC can determine the current working condition of the hard disk from it. The BMC then regulates the temperature of the hard disk according to that condition, which ensures its operating reliability.
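The threshold handling above can be sketched as a lookup from an (ASC, ASCQ) pair to a threshold and a BMC action. This is a minimal Python sketch under stated assumptions. The application only calls these codes the first to sixth values, so the pairs 0x80/0x01 to 0x80/0x03 and the temperatures 45, 55, and 65 degrees Celsius are hypothetical placeholders. The names FanAction, THRESHOLD_TABLE, and fan_action are also illustrative.

```python
from enum import Enum


class FanAction(Enum):
    NONE = "no action"
    START_FAN = "start the fan"
    INCREASE_SPEED = "increase fan speed"
    DISK_STOPPED = "hard disk has stopped working"


# Hypothetical (ASC, ASCQ) pairs standing in for the first..sixth values,
# each mapped to an assumed threshold in degrees Celsius and a BMC action.
THRESHOLD_TABLE = {
    (0x80, 0x01): (45, FanAction.START_FAN),       # fan speed-regulation trigger
    (0x80, 0x02): (55, FanAction.INCREASE_SPEED),  # operating-temperature upper limit
    (0x80, 0x03): (65, FanAction.DISK_STOPPED),    # hard disk stop threshold
}


def fan_action(asc: int, ascq: int, temp_c: int) -> FanAction:
    """Return the BMC action for the reported (ASC, ASCQ) pair and temperature."""
    entry = THRESHOLD_TABLE.get((asc, ascq))
    if entry is None:
        return FanAction.NONE  # the pair is not one of the temperature thresholds
    threshold_c, action = entry
    # Act only once the reported temperature has reached the threshold.
    return action if temp_c >= threshold_c else FanAction.NONE
```

A table keeps the mapping in one place, so adding or retuning a threshold does not touch the decision logic.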
In some embodiments, the sensing data further includes a fourth field, which is a sense key field;
when the ASC is the first value, the ASCQ is the second value, and the sense key field indicates that the temperature of the hard disk requires attention, the method further includes:
the BMC determines that the temperature of the hard disk is abnormal;
or, when the ASC is the third value, the ASCQ is the fourth value, and the sense key field indicates that a recoverable error has occurred, the method further includes:
the BMC determines that a recoverable error has occurred in the hard disk;
or, when the ASC is the fifth value, the ASCQ is the sixth value, and the sense key field indicates a hardware error, the method further includes:
the BMC determines that an unrecoverable error has occurred in the hard disk.
By implementing this embodiment, different error conditions of the hard disk are reflected by different values of the sense key field. The server can determine the error condition of the hard disk from the value of the acquired sense key field and regulate its temperature accordingly, which ensures its operating reliability.
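The classification above combines the sense key with the (ASC, ASCQ) pair. The Python sketch below shows one way a BMC could do this. RECOVERED ERROR (1h) and HARDWARE ERROR (4h) are standard SCSI sense key values. The application does not name the sense key that means "temperature requires attention", so using UNIT ATTENTION (6h) for it is an assumption. The (ASC, ASCQ) pairs are hypothetical placeholders, and all function and constant names are illustrative.

```python
from enum import Enum

# Standard SCSI sense key values.
RECOVERED_ERROR = 0x1
HARDWARE_ERROR = 0x4
UNIT_ATTENTION = 0x6  # assumed choice for "temperature requires attention"

# Hypothetical (ASC, ASCQ) placeholders for the first..sixth values.
FIRST_SECOND = (0x80, 0x01)
THIRD_FOURTH = (0x80, 0x02)
FIFTH_SIXTH = (0x80, 0x03)


class DiskCondition(Enum):
    TEMPERATURE_ABNORMAL = "temperature abnormal"
    RECOVERABLE_ERROR = "recoverable error"
    UNRECOVERABLE_ERROR = "unrecoverable error"
    UNKNOWN = "unknown"


RULES = {
    (UNIT_ATTENTION, FIRST_SECOND): DiskCondition.TEMPERATURE_ABNORMAL,
    (RECOVERED_ERROR, THIRD_FOURTH): DiskCondition.RECOVERABLE_ERROR,
    (HARDWARE_ERROR, FIFTH_SIXTH): DiskCondition.UNRECOVERABLE_ERROR,
}


def classify(sense_key: int, asc: int, ascq: int) -> DiskCondition:
    """Combine the sense key with the (ASC, ASCQ) pair to judge the disk state."""
    # The sense key occupies the low 4 bits of its byte, so mask it first.
    return RULES.get((sense_key & 0x0F, (asc, ascq)), DiskCondition.UNKNOWN)
```

Requiring both the sense key and the code pair to match means a stray code pair under an unexpected sense key is reported as unknown rather than misclassified.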
In some embodiments, the hard disk obtains a temperature value and writes the temperature value to the first field.
In this embodiment, the temperature value is written into the first field of the sensing data fed back by the hard disk. The BMC can therefore regulate the temperature of the hard disk according to the obtained temperature value, which ensures the operating reliability of the hard disk.
In a second aspect, an embodiment of the present application provides a method for obtaining a hard disk temperature, applied to a baseboard management controller (BMC). The method includes:
the BMC sends a sensing data acquisition instruction to a Raid card, where the sensing data acquisition instruction instructs the Raid card to send a request sensing command (REQUEST SENSE command) to a hard disk, and the request sensing command is used to acquire the sensing data (SENSE DATA) of the hard disk;
the BMC receives, from the Raid card, the sensing data returned in response to the request sensing command, where the sensing data includes a first field configured to indicate temperature information of the hard disk; and
the BMC acquires the temperature information of the hard disk based on the sensing data of the hard disk.
In a third aspect, an embodiment of the present application provides a server. The server includes a baseboard management controller (BMC), a redundant array of independent disks (Raid) card, and a hard disk. The BMC is connected to the Raid card, the Raid card is connected to the hard disk, and the server is configured to perform the method for obtaining a hard disk temperature according to any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present application provides a baseboard management controller (BMC). The BMC is configured to perform the method for obtaining a hard disk temperature according to the foregoing embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are illustrative of the application and are not intended to limit the scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
It should be noted that the terms "first/second/third" in the embodiments of the present application are used only to distinguish similar or different objects and do not denote a particular ordering of those objects. Where permitted, their specific order or sequence may be interchanged, so that the embodiments of the present application described herein can be implemented in an order other than the one illustrated or described herein.
In a server, storage devices are indispensable for storing system applications and user data. The server here may range from a large server to a small computer host or another portable device. When the server operates, its electronic components, such as the memory, hard disks, and motherboard, consume a large amount of power, which raises the internal temperature of the server. The operating reliability of each electronic component is strongly related to temperature. Heat dissipation must therefore be considered during operation to keep each component within the temperature range declared in its specification, which in turn ensures the reliability of each component and of the server as a whole.
Heat dissipation in a server can generally be implemented in two ways: by optimizing the structure of the server, or by using heat dissipation components in the server, such as fans, to cool the electronic components. Structural optimization happens when the hardware is designed during the development stage, and once the design is complete the hardware structure is difficult to change in later use. Heat dissipation components, by contrast, can be adjusted while the server is running. Temperatures are collected from the electronic components or from sensors in the server, and the operating state of the heat dissipation components is adjusted according to the collected temperatures to regulate the temperature of each component.
An embodiment of the present application provides a hard disk management method applied to a server.
Fig. 1 is a schematic structural diagram of a server provided in one embodiment.
As shown in Fig. 1, the server may include a redundant array of independent disks (Redundant Array of Independent Disks, Raid) card and a plurality of hard disks.
The hard disk is the main storage device in the server. The type of hard disk is not limited in the embodiments of the present application. For example, the hard disk may be a mechanical hard disk drive (HDD), a solid state drive (SSD), or the like. HDDs may further include Serial Attached SCSI (SAS) hard disks, Serial ATA (SATA) hard disks, and the like, and SSDs may further include Non-Volatile Memory Express (NVMe) hard disks and the like.
The Raid card can be used to manage multiple hard disks in the server to meet the needs of actual services. The Raid card may be connected to at least one hard disk, and that hard disk or those hard disks may form a Raid group under the Raid card. The Raid group may be used to improve performance or to provide data redundancy for the server.
Raid groups fall into two types: Raid groups of non-redundant Raid levels, such as Raid 0, and Raid groups of redundant Raid levels, such as Raid 1 and Raid 5. Raid 0 improves read/write speed through striping but provides no data redundancy. Raid 1 provides complete data redundancy through mirroring but has low disk utilization. Raid 5 combines striping with parity and offers a better balance between data protection and performance.
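As an illustrative sketch, separate from the claimed method, the following Python code shows the capacity trade-off among Raid 0, Raid 1, and Raid 5 and how Raid 5 parity is computed with XOR. It assumes disks of identical size and ignores real-world details such as parity rotation across disks and stripe size. The function names are illustrative.

```python
from functools import reduce
from typing import Sequence


def usable_capacity(level: int, disks: int, disk_size: int) -> int:
    """Usable capacity of a Raid group built from identical disks."""
    if level == 0:
        return disks * disk_size          # striping only, no redundancy
    if level == 1:
        if disks < 2:
            raise ValueError("Raid 1 needs at least 2 disks")
        return disk_size                  # every member holds the same data
    if level == 5:
        if disks < 3:
            raise ValueError("Raid 5 needs at least 3 disks")
        return (disks - 1) * disk_size    # one disk's worth of space holds parity
    raise ValueError(f"unsupported Raid level: {level}")


def xor_parity(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks, as used for Raid 5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
```

XOR is its own inverse, so a lost data block can be rebuilt by XOR-ing the surviving data blocks with the parity block. For example, with data blocks d0, d1, d2 and parity p = xor_parity([d0, d1, d2]), the call xor_parity([d0, d2, p]) returns d1.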
Using the Raid card together with the hard disks can improve data reliability, security, read/write performance, and storage capacity, as well as the availability and stability of the server.
Fig. 2 is a schematic structural diagram of a server according to another embodiment. As shown in fig. 2, the server may include a Raid card, a hard disk, and a management controller.
The management controller is a dedicated microcontroller that operates independently of the operating system and the main processor of the server. It may be implemented as an independent hardware module, or as software or firmware integrated on the server motherboard.
The management controller may include a baseboard management controller (BMC), a service processor (SP), an Integrated Management Module (IMM), a remote access controller such as the integrated Dell Remote Access Controller (iDRAC), an integrated remote management port such as Integrated Lights-Out (iLO), Hardware Device Management (HDM), and the like.
Taking a BMC as the management controller for illustration, the BMC can interact with the Raid card through a specific interface or protocol. For example, the Raid card may be connected to the server motherboard through a bus interface such as Peripheral Component Interconnect (PCI), and the BMC may communicate with other components on the motherboard through a similar interface or another dedicated interface, such as the Intelligent Platform Management Bus (IPMB). Besides hardware interfaces, the BMC and the Raid card may also communicate through software interfaces, which may include drivers, application programming interfaces (APIs), or specific management protocols. Alternatively, the BMC may send instructions to the Raid card through the open-standard Intelligent Platform Management Interface (IPMI) protocol to obtain its status information, change its configuration, and so on.
The BMC may further be used for the following, which the embodiments of the present application do not limit:
device information management, such as recording the model, manufacturer, and date of each component and the information it generates;
monitoring and managing the state of the server, such as detecting the health of each component (for example, temperature and voltage); and
remote control and management of the server, such as power on/off, restart, maintenance, firmware updates, and system installation.
In the server, the Raid card, the management controller, and the hard disks together form an efficient and stable data storage and management platform. The Raid card improves the reliability, performance, and capacity of data storage by combining multiple hard disks. The management controller keeps the system running stably by monitoring and managing the hardware components of the server. The hard disks, as the data storage media, are the most basic and important components of the whole system. Through their interaction and interdependence, the three provide a solid foundation for stable server operation and data security.
Fig. 3 is a schematic diagram of a server structure provided in yet another embodiment. As shown in fig. 3, the server may include a Raid card, a management controller, a processor, a cooling system, a motherboard, a backplane, at least one hard disk, and the like.
The motherboard is one of the core components of the server. It provides hardware support and connectivity, controls and manages resources, and carries out data transmission and processing. Various interfaces and slots are integrated on the motherboard for connecting components such as the processor, memory, hard disks, and Raid card.
The backplane may include a plurality of slots and may be connected to components such as the motherboard, hard disks, and power supply through these slots and interfaces, so as to carry data transmission and power supply among them.
The processor may be a central processing unit (central processing unit, CPU), which is the computational core of the server, responsible for executing program instructions and processing data.
The processor may be connected to the motherboard through a processor socket on the motherboard, and it exchanges data at high speed with other components (such as memory and hard disks) through buses on the motherboard.
The processor may also interact with the management controller, for example through the Intelligent Platform Management Interface (IPMI).
The management controller may also interact with the Raid card in a variety of ways:
the management controller may be connected to the Raid card through an interface, for example an internal bus of the server (such as PCIe, I2C, or GPIO), for data transmission;
the management controller may communicate with the Raid card through a specific communication protocol, such as the IPMI protocol or the System Management Interrupt (SMI) protocol; or
the management controller may exchange data with the Raid card through dedicated firmware or software running on it. Such software provides interfaces and tools for interacting with the Raid card, through which an administrator can query the status of the Raid card, configure the Raid level, monitor disk health, and so on.
The Raid card may be installed in a Peripheral Component Interconnect Express (PCIe) slot of the motherboard and electrically connected to the motherboard.
Upstream, the Raid card communicates with the processor via the PCIe protocol. Downstream, it communicates with the backplane and the hard disks via the Serial Attached SCSI (SAS) protocol to manage the hard disks connected to it.
The cooling system may include a fan, a heat sink, or another cooling device. Taking the fan as an example, the fan is connected to the motherboard through a fan connector on the motherboard. When the management controller detects that the temperature of a component in the server is abnormal, for example that the hard disk temperature is too high, it can control the fan to dissipate heat from the hard disk.
Embodiments of the present invention will be further described below with reference to the accompanying drawings.
Fig. 4 is a schematic implementation flow chart of a method for obtaining a hard disk temperature according to an embodiment of the present application. The hard disk management method can be applied to any one of the servers shown in fig. 1 to 3. As shown in fig. 4, the method may include the steps of:
In step 401, the BMC sends a request sensing command to the hard disk through the Raid card, where the request sensing command is used to acquire the sensing data of the hard disk.
In some embodiments, before sending the request sensing command, the BMC may first send a query command to the hard disk through the Raid card, where the query command is used to query the temperature information of the hard disk. If the BMC does not obtain the temperature information of the hard disk, it then sends the request sensing command to the hard disk through the Raid card.
The specific query command is not limited in the embodiments of the present application. For example, in the SCSI protocol, the temperature query command may be the INQUIRY command (operation code 12h), which may be used to query the temperature information of the hard disk. In other protocols, the query command may be another type of command, which is not described in detail here.
In some embodiments, the temperature information of the hard disk may be detected by a sensor, such as a temperature sensor. The sensor may belong to the hard disk itself or be located elsewhere in the server; this is not limited.
In the embodiments of the present application, the BMC may send the query command through the Raid card to a single hard disk or to multiple hard disks, which is not limited.
The timing of the query command is not limited either. For example, the BMC may send the query command to the hard disk through the Raid card at fixed intervals.
It will be appreciated that after the BMC sends a query command to the hard disk through the Raid card, the hard disk may or may not respond by feeding back temperature information to the BMC.
A hard disk may receive the query command without feeding back temperature information for various reasons. For example, the hard disk itself may have failed (through physical damage, firmware corruption, or the like), or it may be in the formatting (FORMAT) state. In the Small Computer System Interface (SCSI) protocol, the description of the FORMAT command does not specify how a command received while the hard disk is in the FORMAT state should be processed. As a result, a hard disk in the formatting state may not respond to a temperature query command issued by an upper layer and therefore may not report its temperature information.
Accordingly, in some embodiments, the server may first detect the state of the hard disk. If the hard disk is in the formatting state, the server performs the step in which the BMC sends the request sensing command to the hard disk through the Raid card.
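The query-then-fallback flow above, including the formatting-state check, can be sketched in Python as follows. The RaidCardLink interface and its methods is_formatting, query_temperature, and request_sense are hypothetical stand-ins for whatever pass-through channel the BMC actually uses to reach the hard disk through the Raid card. FakeLink exists only to exercise the logic and models a disk that is being formatted.

```python
from typing import Callable, Optional, Protocol


class RaidCardLink(Protocol):
    """Hypothetical BMC-side interface for commands passed through the Raid card."""

    def is_formatting(self, disk_id: int) -> bool: ...
    def query_temperature(self, disk_id: int) -> Optional[int]: ...
    def request_sense(self, disk_id: int) -> bytes: ...


def get_disk_temperature(
    link: RaidCardLink,
    disk_id: int,
    parse_temperature: Callable[[bytes], Optional[int]],
) -> Optional[int]:
    """Query first; fall back to the request sensing command when needed."""
    # A disk in the formatting state may not answer the query command,
    # so skip the query and go straight to the request sensing command.
    if not link.is_formatting(disk_id):
        temp = link.query_temperature(disk_id)
        if temp is not None:
            return temp
    # The Raid card only forwards the raw sensing data; the BMC parses it.
    return parse_temperature(link.request_sense(disk_id))


class FakeLink:
    """Test double: a disk that is currently being formatted."""

    def __init__(self) -> None:
        self.queries = 0

    def is_formatting(self, disk_id: int) -> bool:
        return True

    def query_temperature(self, disk_id: int) -> Optional[int]:
        self.queries += 1
        return None

    def request_sense(self, disk_id: int) -> bytes:
        return bytes([0x80, 0x01, 52])  # hypothetical temperature descriptor


link = FakeLink()
temp = get_disk_temperature(link, 0, lambda data: data[2])
```

The parser is passed in as a function because the layout of the sensing data depends on the protocol. In this usage example, the formatting disk is never queried, and the temperature (52) comes from the request sensing command.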
Thus, different processing methods may be performed based on whether the hard disk responds to the query command.
In some embodiments, if the hard disk feeds back temperature information to the BMC in response to the query command, the processing method shown in Fig. 5 may be performed.
As shown in Fig. 5, if the BMC sends a query command to the hard disk through the Raid card and receives the temperature information fed back by the hard disk, the BMC may select the mode corresponding to that temperature information from a plurality of preset temperature regulation modes and execute it.
For example, the BMC may determine whether the temperature reported by the hard disk reaches a regulation threshold. If it does not, no action is taken and the BMC waits for the next temperature report from the hard disk. If it does, the operating mode of the cooling system in the server may be adjusted according to the corresponding temperature regulation mode to cool the hard disk.
Taking a fan as an example of the cooling system: if the operating temperature of the hard disk exceeds the upper limit of its operating temperature, the fan may be set to rotational speed 1. If the operating temperature reaches the upper limit at which the hard disk should stop working, the fan may be set to rotational speed 2, where speed 1 is lower than speed 2.
In some embodiments, when the temperature of the hard disk is detected to have dropped, for example below the regulation threshold, the fan may be returned to a low-power speed mode to avoid wasting fan power.
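The regulation logic above (no action below the regulation threshold, speed 1 above the operating limit, speed 2 at the stop limit, and a return to low power once the temperature drops) can be sketched in Python as follows. The threshold temperatures and RPM values passed in are deployment-specific, and FanPolicy and next_fan_speed are illustrative names. The text does not fix a speed between the regulation threshold and the operating limit, so this sketch keeps the current speed there as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanPolicy:
    regulation_threshold_c: int  # below this, the fan returns to low power
    operating_limit_c: int       # upper limit of the operating temperature
    stop_limit_c: int            # temperature at which the disk should stop
    low_power_rpm: int
    speed_1_rpm: int             # speed 1 < speed 2, as in the text
    speed_2_rpm: int


def next_fan_speed(policy: FanPolicy, temp_c: int, current_rpm: int) -> int:
    """Pick the next fan speed from the latest hard disk temperature report."""
    if temp_c >= policy.stop_limit_c:
        return policy.speed_2_rpm
    if temp_c > policy.operating_limit_c:
        return policy.speed_1_rpm
    if temp_c < policy.regulation_threshold_c:
        return policy.low_power_rpm  # temperature dropped: save fan power
    # Between the regulation threshold and the operating limit the text does
    # not fix a speed, so this sketch keeps the current one.
    return current_rpm
```

Checking the highest threshold first ensures that a temperature at the stop limit always selects speed 2, even though it also exceeds the operating limit.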
In some embodiments, if the hard disk receives the query command but does not feed back temperature information to the BMC, the BMC may send a request sensing command to the hard disk through the Raid card.
The condition under which the BMC sends the request sensing command through the Raid card is not limited in the embodiments of the present application. In some embodiments, the BMC sends the request sensing command when the hard disk receives the query command but feeds back no temperature information. That is, when the hard disk fails to execute the query command, the BMC then sends the request sensing command to the hard disk through the Raid card.
Alternatively, the BMC may send the request sensing command to the hard disk through the Raid card even when the hard disk has executed the query command and fed back temperature information. In this case, the request sensing command can be used to query the status information of the hard disk after it last executed the query command normally.
The timing of the request sensing command is not limited either. For example, the BMC may send the request sensing command to the hard disk through the Raid card periodically, after receiving error information fed back by the hard disk, or when no response of the hard disk to the query command has been received within a period of time.
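The three example triggers above (periodic sending, an error reported by the hard disk, and a query command left unanswered past a timeout) can be combined into a single check. This Python sketch is illustrative only. DiskPollState and should_send_request_sense are hypothetical names, and the period and timeout values would be chosen by the implementer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskPollState:
    last_sense_at: float = 0.0             # when the request sensing command was last sent
    error_reported: bool = False           # the disk has fed back error information
    query_sent_at: Optional[float] = None  # time of an unanswered query, if any


def should_send_request_sense(state: DiskPollState, now: float,
                              period_s: float, query_timeout_s: float) -> bool:
    """Return True if any of the three example triggers has fired."""
    if now - state.last_sense_at >= period_s:
        return True  # periodic sending
    if state.error_reported:
        return True  # the disk reported an error
    if state.query_sent_at is not None and now - state.query_sent_at >= query_timeout_s:
        return True  # no response to the query command within the timeout
    return False
```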
Commands that perform the same function as the request sensing command may have different names in different protocols or protocol versions. The specific request sensing command is therefore not limited in the embodiments of the present application. For example, in the SCSI protocol, the request sensing command may be the REQUEST SENSE command, which may also be called simply a Request Sense command or a Sense command. In the SCSI Block Commands-5 (SBC-5) protocol, for instance, the REQUEST SENSE command may be used to obtain detailed information about why the device failed to execute a previous command, which helps with debugging and troubleshooting. In other protocols, the request sensing command may be another type of command, which is not described in detail here.
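For reference, this Python sketch builds the 6-byte command descriptor block (CDB) of the SCSI REQUEST SENSE command (operation code 03h). Setting the DESC bit asks the device for descriptor-format sense data, the format that carries descriptor fields such as the first field of this application. How the BMC hands this CDB to the Raid card for pass-through is vendor-specific and is not shown. The function name and the default allocation length of 252 bytes are illustrative choices.

```python
REQUEST_SENSE_OPCODE = 0x03


def build_request_sense_cdb(allocation_length: int = 252,
                            descriptor_format: bool = True) -> bytes:
    """Build the 6-byte SCSI REQUEST SENSE command descriptor block."""
    if not 0 <= allocation_length <= 0xFF:
        raise ValueError("allocation length must fit in one byte")
    return bytes([
        REQUEST_SENSE_OPCODE,                 # byte 0: operation code
        0x01 if descriptor_format else 0x00,  # byte 1, bit 0: DESC
        0x00,                                 # byte 2: reserved
        0x00,                                 # byte 3: reserved
        allocation_length,                    # byte 4: ALLOCATION LENGTH
        0x00,                                 # byte 5: CONTROL
    ])
```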
In step 402, the BMC obtains, through the Raid card, the sensing data returned by the hard disk in response to the request sensing command, where the sensing data includes a first field configured to indicate temperature information of the hard disk.
It should be noted that, in the embodiments of the present application, the Raid card does not parse the sensing data. Instead, it forwards the sensing data to the BMC, and the BMC parses it to obtain the temperature information of the hard disk that it contains.
The temperature information of the hard disk may be the temperature detected by the hard disk when it received the query command, or the temperature detected when it received the request sensing command; this is not limited in the embodiments of the present application.
In step 403, the BMC obtains the temperature information of the hard disk based on the sensing data of the hard disk.
The type of temperature information contained in the sensing data is not limited in the embodiments of the present application. For example, the temperature information may be one or more of a temperature range of the hard disk, a temperature value of the hard disk, or a temperature abnormality type of the hard disk (such as over-temperature or a warning), as described in detail later.
In some embodiments, the sensing data includes a first field, which may be a SENSE DATA descriptor field configured to indicate a temperature value of the hard disk.
In some embodiments, the first field may include a first byte indicating the descriptor type, a second byte indicating the length of the temperature value information, and a third byte indicating the current temperature value of the hard disk.
That is, the hard disk may obtain the temperature value and write it to the first field, for example to the third byte of the first field.
In this way, the BMC may control the connected fans based on the temperature value of the hard disk.
In the embodiment of the present application, the type of the first field is not limited. As in the SCSI protocol, the first field may be a SENSE DATA descriptor field. Of course, in other protocols, the first field may be another type of field, which is not described herein.
As shown in Fig. 6, bytes 8 to n of the sensing data hold the first field.
Fig. 7 is a schematic diagram describing the first field in the SCSI protocol. As shown in Fig. 7, the first field is identified by its descriptor type field (DESCRIPTOR TYPE field). To describe the current temperature value of the hard disk, a value in the range 80h to FFh of the descriptor type list may be selected, so that the content of the first field can be customized.
Fig. 8 is a schematic diagram of the composition of the first field. As shown in Fig. 8, in some embodiments, the first byte may be the descriptor type (80h to FFh), the second byte may be the additional length field ADDITIONAL LENGTH (n-1), and the third byte may hold customizable content. For example, after detecting the temperature value of the hard disk, the temperature sensor may write that value into the third byte.
In some embodiments, the sense data further includes a fifth field indicating the length of the first field, that is, the length of the bytes that follow it.
With continued reference to fig. 6, byte 7 holds the fifth field, which may be denoted ADDITIONAL SENSE LENGTH (n-1); that is, the fifth field indicates the number of bytes occupied by the first field.
For example, if the current temperature value written into the first field occupies 1 byte, the first field consists of 1 byte for the descriptor type, 1 byte for the length of the temperature information and 1 byte for the temperature value, 3 bytes in total, so the fifth field takes the value 3.
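The sketch below places the first field inside complete sense data and computes the fifth field, reproducing the 3-byte example. It assumes the descriptor-format layout of SPC (response code 72h, sense key in the low four bits of byte 1, ASC in byte 2, ASCQ in byte 3, bytes 4 to 6 left as zero); the descriptor type 0x80 and all function names are hypothetical.

```python
from typing import Optional

RESPONSE_CODE_DESCRIPTOR = 0x72  # current error, descriptor-format sense data (SPC)
TEMP_DESCRIPTOR_TYPE = 0x80      # hypothetical vendor-specific descriptor type


def build_sense_data(sense_key: int, asc: int, ascq: int, temp_c: int) -> bytes:
    """Assemble sense data whose bytes 8..n carry the temperature descriptor."""
    first_field = bytes([TEMP_DESCRIPTOR_TYPE, 1, temp_c])
    header = bytes([
        RESPONSE_CODE_DESCRIPTOR,
        sense_key & 0x0F,   # byte 1: sense key in bits 3..0
        asc,                # byte 2: ASC
        ascq,               # byte 3: ASCQ
        0, 0, 0,            # bytes 4..6: left as zero in this sketch
        len(first_field),   # byte 7: fifth field, length of the first field
    ])
    return header + first_field


def extract_temperature(sense: bytes) -> Optional[int]:
    """Walk the descriptors in bytes 8..n and return the temperature, if present."""
    if len(sense) < 8 or (sense[0] & 0x7F) not in (0x72, 0x73):
        return None
    end = min(len(sense), 8 + sense[7])  # the fifth field bounds the descriptor area
    offset = 8
    while offset + 1 < end:
        dtype, dlen = sense[offset], sense[offset + 1]
        if dtype == TEMP_DESCRIPTOR_TYPE and dlen >= 1 and offset + 2 < end:
            return sense[offset + 2]
        offset += 2 + dlen  # skip this descriptor: 2 header bytes plus its payload
    return None


sense = build_sense_data(0x06, 0x80, 0x02, 52)
print(sense[7], len(sense))        # 3 11
print(extract_temperature(sense))  # 52
```

Using the fifth field as the loop bound lets the parser skip any other descriptors the hard disk might also return.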
Of course, in some embodiments, the sense data may report the current temperature range of the hard disk instead of a specific temperature value, so that the BMC determines how to control the fan according to that range.
In a first embodiment, so that the sense data can reflect the temperature information of the hard disk, the sense data may further include a second field and a third field, which may differ depending on the protocol.
Under the SCSI protocol, the second field may be the additional sense code (ASC) field of the SENSE DATA and the third field may be the additional sense code qualifier (ASCQ) field.
Of course, in other protocols, the second field and the third field may be other types of fields, which are not described herein.
It should be noted that the SCSI protocol describes the various values of the ASC and ASCQ fields: some values have specific descriptions defined by the protocol, while others are left open for user-defined descriptions. Hereinafter, values open to user definition are called "custom values", and values with specific descriptions in the protocol are called "defined values".
Fig. 9 shows a description of the ASC field and the ASCQ field.
As shown in the dashed boxes of fig. 9, when the ASC takes any value between 80h and FFh, the ASCQ field may take any value such as 01h, 02h or 03h, and the combination of the ASC and ASCQ fields is described as "VENDOR SPECIFIC", meaning that the associated content is defined by the user.
When both the ASC and ASCQ fields are 00h, the combination is described as "NO ADDITIONAL SENSE INFORMATION", meaning that no further information about the condition is provided; when the ASC field is 0Ch and the ASCQ field is 00h, the combination is described as "WRITE ERROR", indicating that a write error has occurred.
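The distinction between custom values and defined values can be expressed as a small lookup. This is a sketch only: it models just the rule stated above (any ASC from 80h to FFh is vendor specific) and the two defined pairs quoted in the text, whereas the full SPC assignment table is far larger. The names are illustrative.

```python
# Defined (ASC, ASCQ) pairs quoted in the text; the SPC table contains many more.
DEFINED_ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x0C, 0x00): "WRITE ERROR",
}


def describe_asc_ascq(asc: int, ascq: int) -> str:
    """Classify an (ASC, ASCQ) pair using only the rules stated in the text."""
    if 0x80 <= asc <= 0xFF:
        # Custom value: the meaning (e.g. a temperature event) is defined by the user.
        return "VENDOR SPECIFIC"
    return DEFINED_ASC_ASCQ.get((asc, ascq), "not covered by this sketch")


print(describe_asc_ascq(0x80, 0x02))  # VENDOR SPECIFIC
print(describe_asc_ascq(0x0C, 0x00))  # WRITE ERROR
```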
Thus, the SCSI protocol provides both custom values and defined values for the ASC and ASCQ fields. By combining custom values with defined values, the ASC and ASCQ fields can together report the temperature information of the hard disk.
Accordingly, the ASC and ASCQ can be configured with custom values so that their values directly reflect the temperature information of the hard disk.
For example, in some embodiments, the ASC is configured to a first value and the ASCQ to a second value, the first and second values indicating the temperature threshold T2 that triggers fan speed regulation.
Thus, if the ASC in the sense data received by the BMC is the first value and the ASCQ is the second value, the BMC may determine that the temperature value of the hard disk has reached the threshold T2 that triggers fan speed regulation, and may start the fan to cool the hard disk.
For example, as shown in Table 1 below, in one possible embodiment the ASC field may take the value 80h and the ASCQ field the value 02h, which together are described as "the current temperature of the hard disk is greater than T2".
In other embodiments, the ASC is configured to a third value and the ASCQ to a fourth value, the third and fourth values indicating the upper threshold T1 of the hard disk operating temperature.
Thus, if the ASC in the sense data received by the BMC is the third value and the ASCQ is the fourth value, the BMC may determine that the temperature value of the hard disk has reached the upper threshold T1 of the operating temperature, and may control the fan to increase its rotation speed, for example to run at maximum speed.
For example, as shown in Table 1 below, the ASC field may take the value 80h and the ASCQ field the value 01h, which together are described as "the current temperature of the hard disk is greater than T1".
In still other embodiments, the ASC is configured to a fifth value and the ASCQ to a sixth value, the fifth and sixth values indicating the temperature threshold T3 at which the hard disk stops working. Thus, if the BMC receives sense data containing these values, it may determine that the temperature of the hard disk has reached T3 and that the hard disk has stopped working.
For example, as shown in Table 1 below, the ASC field may take the value 80h and the ASCQ field the value 03h, which together are described as "the current temperature of the hard disk is greater than T3".
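TABLE 1
ASC | ASCQ | Combined description
80h | 01h | The current temperature of the hard disk is greater than T1
80h | 02h | The current temperature of the hard disk is greater than T2
80h | 03h | The current temperature of the hard disk is greater than T3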
Here, T1 may be set as the upper threshold of the hard disk operating temperature, T2 as the temperature threshold that triggers fan speed regulation, and T3 as the upper limit at which the hard disk stops (shuts down), with T2 ≤ T1 < T3. The normal operating temperature of a hard disk is typically 40-50 °C, and its operating limit should be below 70 °C.
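A minimal sketch of how a BMC might act on the ASC/ASCQ combinations described for Table 1. The actions follow the embodiments above; the enum and function names are hypothetical, and a real BMC would drive its own fan-control interface rather than return a value.

```python
from enum import Enum
from typing import Dict, Tuple


class FanAction(Enum):
    NONE = "leave fan unchanged"
    START = "start the fan to cool the hard disk"
    MAX_SPEED = "run the fan at maximum speed"
    DISK_STOPPED = "record that the hard disk has stopped working"


# Table 1: (ASC, ASCQ) -> (combined description, BMC action)
TABLE_1: Dict[Tuple[int, int], Tuple[str, FanAction]] = {
    (0x80, 0x01): ("current temperature > T1", FanAction.MAX_SPEED),
    (0x80, 0x02): ("current temperature > T2", FanAction.START),
    (0x80, 0x03): ("current temperature > T3", FanAction.DISK_STOPPED),
}


def fan_action_from_asc(asc: int, ascq: int) -> FanAction:
    """Map a customized ASC/ASCQ pair to a fan decision; unknown pairs change nothing."""
    entry = TABLE_1.get((asc, ascq))
    return entry[1] if entry is not None else FanAction.NONE


print(fan_action_from_asc(0x80, 0x02).name)  # START
print(fan_action_from_asc(0x0C, 0x00).name)  # NONE
```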
With this embodiment, once temperature abnormality information is reported to the BMC, the BMC can determine how to regulate the fan based on the temperature information of the hard disk, thereby handling the abnormal temperature and ensuring the working reliability of the hard disk.
In some embodiments, the sense data may also include a fourth field, which is the sense key (error class code) field.
Of course, in other protocols, the fourth field may be another type of field, which is not described herein.
With continued reference to fig. 6, which shows the composition of the sense data, the parts in the dashed boxes are the positions of the sense key field, the ASC field and the ASCQ field.
The following description takes the sense key field as the fourth field, the ASC field as the second field and the ASCQ field as the third field, and shows how the combination of the second, third and fourth fields reports the temperature information of the hard disk.
In the SCSI protocol, the combination of the sense key, ASC and ASCQ fields reflects both the error condition and the specific error information of the hard disk, giving the server a top-down way to determine the state of the hard disk: the sense key gives the general class of error, and the ASC and ASCQ refine it.
First, the SCSI protocol describes the various values of the sense key field: some values have defined descriptions, while others are left open for user-defined descriptions.
Fig. 10 shows the description information of the sense key field.
As shown in fig. 10, a sense key value of 0h indicates that there is no error. A value of 1h is described as "RECOVERED ERROR", indicating that a recoverable error has occurred, which generally does not affect data integrity. A value of 2h is described as "NOT READY", indicating that the target device (e.g., a hard disk drive) is not currently accessible. A value of 3h is described as "MEDIUM ERROR", indicating that an unrecoverable error was found on the data storage medium while the command was being executed. A value of 4h is described as "HARDWARE ERROR", indicating that an unrecoverable hardware error has occurred. A value of 6h is described as "UNIT ATTENTION", indicating that attention is required. A value of 9h can be defined by the user.
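The sense key values listed above can be decoded with a lookup such as the following. It is a sketch that covers only the subset of sense keys named in the text, labeled with their SPC names; any other value is reported as "other", and the names in the code are illustrative.

```python
# Subset of sense keys named in the text, labeled with their SPC names.
SENSE_KEY_NAMES = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x6: "UNIT ATTENTION",
    0x9: "VENDOR SPECIFIC",
}


def sense_key_name(sense_key: int) -> str:
    """Describe a sense key; it is a 4-bit value, so any higher flag bits are masked off."""
    return SENSE_KEY_NAMES.get(sense_key & 0x0F, "other")


print(sense_key_name(0x6))  # UNIT ATTENTION
print(sense_key_name(0x4))  # HARDWARE ERROR
```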
However, when the sense key field takes a defined value that describes the type of abnormality caused by command execution, it can only indicate that the hard disk is abnormal; from this field alone, the BMC cannot tell that the abnormality is related to temperature.
Therefore, the ASC and ASCQ fields are also needed to help convey the temperature information of the hard disk. The existing SCSI protocol defines no ASC and ASCQ values for reporting hard disk temperature abnormalities, so these fields must take custom values whose descriptions indicate the abnormal temperature scenario.
In the existing SCSI protocol, the description corresponding to any ASC value between 80h and FFh can be customized.
Thus, the SCSI protocol provides both custom values and defined values for the sense key, ASC and ASCQ fields, and combining them allows these three fields together to report the temperature information of the hard disk.
For example, in some embodiments, the ASC may be set to a first value and the ASCQ to a second value, while the sense key field indicates that attention to the hard disk temperature is required. The BMC can then determine that the temperature of the hard disk is abnormal from the combined description of the sense key, ASC and ASCQ fields.
It should be noted that, in some embodiments, if the temperature of the hard disk is abnormal but has not reached the threshold T2 that triggers fan speed regulation, the BMC may not start the fan to cool the hard disk.
For example, as shown in Table 2 below, the sense key field may be set to 6h to indicate that attention to the hard disk temperature is required.
In this case, the ASC and ASCQ fields are both set to custom values, that is, the ASC is the first value and the ASCQ is the second value; for example, the ASC field may be 80h and the ASCQ field 02h, which together are described as "the current temperature of the hard disk is greater than T2".
Thus, when the sense data received by the BMC contains a sense key value of 6h, an ASC value of 80h and an ASCQ value of 02h, the BMC can determine that the current temperature of the hard disk is greater than T2.
In other embodiments, the ASC is a third value, the ASCQ is a fourth value, and the sense key field indicates that a recoverable error has occurred. The BMC can then determine from the combined description of the sense key, ASC and ASCQ fields that a recoverable error has occurred in the hard disk.
For example, as shown in Table 2 below, the sense key field may be set to 1h, indicating that a recoverable error has occurred in the hard disk. To further describe the temperature information of the hard disk, the ASC may be set to a third value and the ASCQ to a fourth value, for example 80h and 01h respectively, where the third and fourth values indicate the upper threshold of the hard disk operating temperature.
Therefore, when the sense data received by the BMC contains a sense key value of 1h, an ASC value of 80h and an ASCQ value of 01h, the BMC can determine that a recoverable error has currently occurred in the hard disk and that its temperature has reached the upper threshold T1 of the operating temperature; based on this, the BMC can control the fan to increase its rotation speed, for example to run at maximum speed.
In still other embodiments, the ASC is a fifth value, the ASCQ is a sixth value, and the sense key field indicates a hardware error. The BMC can then determine from the combined description of the sense key, ASC and ASCQ fields that an unrecoverable error has occurred in the hard disk.
For example, as shown in Table 2 below, the sense key field may be set to 4h, indicating that an unrecoverable error has occurred in the hard disk. To further describe the temperature information of the hard disk, the ASC may be set to a fifth value and the ASCQ to a sixth value, for example 80h and 03h respectively, where the fifth and sixth values indicate the temperature threshold at which the hard disk stops working.
Therefore, when the sense data received by the BMC contains a sense key value of 4h, an ASC value of 80h and an ASCQ value of 03h, the BMC can determine that an unrecoverable error has currently occurred in the hard disk and that its temperature has reached the stop threshold T3.
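TABLE 2
Sense key | ASC | ASCQ | Combined description
6h | 80h | 02h | Attention required; the current temperature of the hard disk is greater than T2
1h | 80h | 01h | Recoverable error; the current temperature of the hard disk is greater than T1
4h | 80h | 03h | Hardware error; the current temperature of the hard disk is greater than T3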
Here, T1 may be set as the upper threshold of the hard disk operating temperature, T2 as the temperature threshold that triggers fan speed regulation, and T3 as the upper limit at which the hard disk stops (shuts down), with T2 ≤ T1 < T3. The normal operating temperature of a hard disk is typically 40-50 °C, and its operating limit should be below 70 °C.
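Decoding the combinations described for Table 2 amounts to looking up the (sense key, ASC, ASCQ) triple. In the sketch below the byte offsets assume the SPC descriptor-format layout (sense key in the low four bits of byte 1, ASC in byte 2, ASCQ in byte 3); the entries restate the three combinations above, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Table 2: (sense key, ASC, ASCQ) -> (combined description, BMC action)
TABLE_2: Dict[Tuple[int, int, int], Tuple[str, str]] = {
    (0x6, 0x80, 0x02): ("attention required, temperature > T2", "start the fan"),
    (0x1, 0x80, 0x01): ("recoverable error, temperature > T1", "run the fan at maximum speed"),
    (0x4, 0x80, 0x03): ("hardware error, temperature > T3", "record that the disk has stopped"),
}


def decode_table_2(sense: bytes) -> Optional[Tuple[str, str]]:
    """Look up the sense key/ASC/ASCQ triple of descriptor-format sense data."""
    if len(sense) < 4:
        return None
    # Assumed layout: sense key in bits 3..0 of byte 1, ASC in byte 2, ASCQ in byte 3.
    triple = (sense[1] & 0x0F, sense[2], sense[3])
    return TABLE_2.get(triple)


reply = bytes([0x72, 0x01, 0x80, 0x01, 0, 0, 0, 0])
print(decode_table_2(reply))
# ('recoverable error, temperature > T1', 'run the fan at maximum speed')
```

A triple that is not in the table returns `None`, so the BMC can fall back on the sense key alone for non-temperature errors.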
With this embodiment, the server can learn the specific range in which the current temperature of the hard disk lies and regulate the temperature accordingly, ensuring the working reliability of the hard disk.
In some embodiments, when the sense key field takes a value defined in the protocol to describe the type of abnormality caused by command execution, it can only indicate that the hard disk is abnormal; from this field alone, the server cannot tell that the abnormality is related to temperature.
Accordingly, in some embodiments, the ASC may also take the value 80h and the ASCQ the value 00h, with the combination predefined as "the current temperature of the hard disk is greater than T1, or greater than T2, or greater than T3".
In this case, if the sense data received by the BMC contains a sense key value of 6h, an ASC value of 80h and an ASCQ value of 00h, the server can determine that an abnormal condition requiring attention has occurred in the hard disk, namely that its current temperature is greater than T1, or greater than T2, or greater than T3.
Of course, in some embodiments, the sense key field may take a custom value indicating an abnormality; for example, the value 9h may be defined to indicate that the hard disk is abnormal, or specifically that it has a temperature abnormality. In this case, the ASC and ASCQ fields may also take custom values.
Referring to Table 3 below, for example, if the sense key field is 9h, the ASC field is 80h and the ASCQ field is 00h, the three fields together can be described as "the hard disk has a temperature anomaly: its current temperature is greater than T1, or greater than T2, or greater than T3".
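TABLE 3
Sense key | ASC | ASCQ | Combined description
9h | 80h | 00h | Temperature anomaly; the current temperature of the hard disk is greater than T1, or greater than T2, or greater than T3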
Here, T1 may be set as the upper threshold of the hard disk operating temperature, T2 as the temperature threshold that triggers fan speed regulation, and T3 as the upper limit at which the hard disk stops (shuts down), with T2 ≤ T1 < T3. The normal operating temperature of a hard disk is typically 40-50 °C, and its operating limit should be below 70 °C.
It can be understood that, in the above embodiment, the combination of the fourth, second and third fields only tells the server that the hard disk has a temperature abnormality, or that the abnormality may be "the current temperature of the hard disk is greater than T1, or greater than T2, or greater than T3". The server cannot determine exactly which range the current temperature falls in, and therefore cannot regulate the cooling system precisely enough to both ensure the working reliability of the hard disk and reduce the power consumption of the cooling system.
Building on the above embodiments, the fan regulation mode may therefore be determined in combination with the temperature value of the hard disk provided by the first field of the sense data.
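A sketch of that combination: when the sense data carries the generic temperature code (sense key 6h or 9h with ASC 80h and ASCQ 00h), the BMC falls back on the temperature value from the first field to pick a fan mode. The numeric thresholds are hypothetical placeholders chosen only to respect T2 ≤ T1 < T3 with T1 below 70 °C; the function name and returned strings are illustrative.

```python
from typing import Optional

# Hypothetical thresholds in degrees Celsius; only T2 <= T1 < T3 comes from the text.
T2, T1, T3 = 50, 60, 70

# Generic temperature code: sense key 6h or 9h with ASC 80h and ASCQ 00h.
GENERIC_TEMPERATURE_CODES = {(0x6, 0x80, 0x00), (0x9, 0x80, 0x00)}


def choose_fan_mode(sense_key: int, asc: int, ascq: int, temp_c: Optional[int]) -> str:
    """Resolve the ambiguous code using the temperature value from the first field."""
    if (sense_key, asc, ascq) not in GENERIC_TEMPERATURE_CODES:
        return "not the generic temperature code"
    if temp_c is None:
        return "temperature abnormal, range unknown"
    # Check the highest threshold first so each reading lands in exactly one range.
    if temp_c > T3:
        return "hard disk stopped"
    if temp_c > T1:
        return "run fan at maximum speed"
    if temp_c > T2:
        return "start fan"
    return "leave fan unchanged"  # abnormal but below T2, so the fan is not started


print(choose_fan_mode(0x6, 0x80, 0x00, 55))  # start fan
print(choose_fan_mode(0x9, 0x80, 0x00, 65))  # run fan at maximum speed
```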
With this embodiment, the hard disk feeds back to the BMC sense data containing its temperature information, such as the occurrence of an abnormality and the temperature value. Temperature abnormalities can thus be reported promptly when they occur, allowing the server to regulate temperature accordingly and further ensuring the working reliability of the hard disk.
It should be understood that although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the foregoing embodiments, an embodiment of the present application provides a method for obtaining a hard disk temperature, applied to a baseboard management controller (BMC), the method including:
the BMC sends a sense data acquisition instruction to the Raid card, where the instruction directs the Raid card to send a request sense command (REQUEST SENSE) to the hard disk, and the request sense command is used to obtain the sense data (SENSE DATA) of the hard disk;
the BMC receives the sense data returned by the Raid card in response to the request sense command, where the sense data includes a first field configured to indicate temperature information of the hard disk; and
the BMC obtains the temperature information of the hard disk based on the sense data of the hard disk.
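The three BMC-side steps can be sketched end to end. The REQUEST SENSE CDB (opcode 03h, DESC bit in byte 1 requesting descriptor-format sense data, allocation length in byte 4) follows SPC, but the `RaidCard.passthrough` interface is a hypothetical stand-in for whatever pass-through mechanism a given Raid card offers, `FakeRaidCard` is a test double returning a fixed reply, and the descriptor type 0x80 is likewise an assumed choice.

```python
from typing import Optional, Protocol

REQUEST_SENSE_OPCODE = 0x03
TEMP_DESCRIPTOR_TYPE = 0x80  # hypothetical vendor-specific descriptor type


def request_sense_cdb(allocation_length: int = 252) -> bytes:
    """Build a 6-byte REQUEST SENSE CDB; DESC=1 asks for descriptor-format sense data."""
    return bytes([REQUEST_SENSE_OPCODE, 0x01, 0x00, 0x00, allocation_length & 0xFF, 0x00])


class RaidCard(Protocol):
    """Hypothetical pass-through interface; real Raid cards expose their own mechanisms."""

    def passthrough(self, disk_id: int, cdb: bytes) -> bytes: ...


def get_disk_temperature(raid: RaidCard, disk_id: int) -> Optional[int]:
    """Send REQUEST SENSE through the Raid card and read the temperature descriptor."""
    sense = raid.passthrough(disk_id, request_sense_cdb())
    if len(sense) < 8:
        return None
    end = min(len(sense), 8 + sense[7])  # byte 7 bounds the descriptor area
    offset = 8
    while offset + 1 < end:
        dtype, dlen = sense[offset], sense[offset + 1]
        if dtype == TEMP_DESCRIPTOR_TYPE and dlen >= 1 and offset + 2 < end:
            return sense[offset + 2]
        offset += 2 + dlen
    return None


class FakeRaidCard:
    """Test double that answers every command with fixed sense data."""

    def __init__(self, temp_c: int) -> None:
        self.temp_c = temp_c
        self.last_cdb = b""

    def passthrough(self, disk_id: int, cdb: bytes) -> bytes:
        self.last_cdb = cdb
        return bytes([0x72, 0x06, 0x80, 0x02, 0, 0, 0, 3,
                      TEMP_DESCRIPTOR_TYPE, 0x01, self.temp_c])


card = FakeRaidCard(58)
print(get_disk_temperature(card, disk_id=0))  # 58
```

Typing the Raid card as a `Protocol` keeps the BMC logic independent of any particular vendor tool, which is also what makes the fake easy to substitute.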
In some embodiments, the baseboard management controller BMC may also perform the method as described in the above embodiments.
An embodiment of the present application provides a server, which includes a baseboard management controller (BMC), a redundant array of independent disks (RAID) card and a hard disk, where the BMC is connected to the RAID card and the RAID card is connected to the hard disk.
The steps in the method provided by the method embodiment can be executed through the BMC, the Raid card and the hard disk in the server.
An embodiment of the present application provides a baseboard management controller (BMC) that can perform the steps of the method provided in the method embodiments.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application. The embodiment numbers above are for description only and do not indicate that any embodiment is better or worse than another. The above description of the various embodiments emphasizes their differences; for parts that are the same or similar, the embodiments may refer to one another, and these are not repeated here for brevity.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The embodiments described above are illustrative only, and the various components shown or discussed as being coupled or directly coupled or communicatively coupled to each other may be indirectly coupled or communicatively coupled via some interface, device or module, whether electrical, mechanical or otherwise.
The methods disclosed in the method embodiments provided by the application can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.
The features disclosed in the embodiments of the method or the apparatus provided by the application can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is merely an embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope of the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.