Disclosure of Invention
Accordingly, the present application aims to provide a method, apparatus, device and medium for storing cluster logs, which can improve the utilization rate of the cluster storage space and the log access efficiency. The specific scheme is as follows:
In a first aspect, the application discloses a cluster log storage method, comprising:
counting the access frequency of log files of each target module in the cluster;
determining a compression attribute corresponding to each target module based on the access frequency, wherein the compression attribute is either compression or non-compression;
when a log to be stored is detected in the cluster, determining the compression attribute corresponding to the log to be stored; and
if the compression attribute is non-compression, storing the log to be stored in a log cache area; otherwise, compressing the log to be stored to obtain a first compressed log, and storing the first compressed log in a static storage area.
Optionally, the counting the access frequency of the log file of each target module in the cluster includes:
counting the access frequency of log files of each functional module and each sub-module of each functional module;
Correspondingly, the determining the compression attribute corresponding to each target module based on the access frequency includes:
determining the compression attribute corresponding to each target module based on the access frequency, wherein the access frequency of a functional module whose compression attribute is non-compression is higher than that of a functional module whose compression attribute is compression, and the access frequency of a sub-module whose compression attribute is non-compression is higher than that of a sub-module whose compression attribute is compression.
Optionally, the determining, based on the access frequency, a compression attribute corresponding to each target module includes:
sorting the functional modules by the access frequency of their log files;
selecting a preset number of functional modules with the highest access frequency to obtain first modules;
determining the compression attribute of the first modules as non-compression, and determining the compression attribute of the functional modules other than the first modules as compression;
sorting, by the access frequency of their log files, all sub-modules of the functional modules other than the first modules; and
determining the compression attribute of the sub-modules in a first preset proportion with the highest access frequency as non-compression, and determining the compression attribute of the other sub-modules of the functional modules other than the first modules as compression.
Optionally, the method further comprises:
storing the compression attribute of each target module in a log base;
correspondingly, the determining the compression attribute corresponding to the log to be stored includes:
looking up, in the log base, the compression attribute of the functional module to which the log to be stored belongs; if the compression attribute of that functional module is non-compression, determining the compression attribute corresponding to the log to be stored as non-compression; if the compression attribute of that functional module is compression, looking up the compression attribute of the sub-module to which the log to be stored belongs; if the compression attribute of that sub-module is non-compression, determining the compression attribute corresponding to the log to be stored as non-compression; otherwise, determining the compression attribute corresponding to the log to be stored as compression.
Optionally, the method further comprises:
when the log to be stored is stored in the log cache area, storing the expiration time of the log to be stored in a log base;
determining a target log from the log cache area based on the expiration time, at preset time intervals or at a specified time;
compressing the target log to obtain a second compressed log; and
migrating the second compressed log to the static storage area.
Optionally, the determining, based on the expiration time, the target log from the log cache area includes:
determining expired logs from the log cache area based on the expiration time; and
randomly selecting a specified number of the expired logs to obtain the target logs.
Optionally, the method further comprises:
when the usage proportion of the log cache area reaches a preset threshold, sorting, by the access frequency of their log files, all sub-modules whose compression attribute is non-compression;
compressing the log files of the sub-modules in a second preset proportion with the lowest access frequency to obtain a third compressed log; and
migrating the third compressed log to the static storage area.
In a second aspect, the present application discloses a cluster log storage device, including:
an access frequency statistics module, used for counting the access frequency of the log files of each target module in the cluster;
a module attribute determining module, used for determining a compression attribute corresponding to each target module based on the access frequency, wherein the compression attribute is either compression or non-compression;
a to-be-stored log monitoring module, used for monitoring whether there is a log to be stored in the cluster;
a compression attribute determining module, used for determining the compression attribute corresponding to the log to be stored when the to-be-stored log monitoring module detects a log to be stored in the cluster; and
a log storage module, used for storing the log to be stored in a log cache area if the compression attribute is non-compression, and otherwise compressing the log to be stored to obtain a first compressed log and storing the first compressed log in a static storage area.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the aforementioned cluster log storage method.
In a fourth aspect, the present application discloses a computer readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned cluster log storage method.
In the present application, the access frequency of the log files of each target module in the cluster is counted, and a compression attribute, either compression or non-compression, is determined for each target module based on the access frequency. When a log to be stored is detected in the cluster, the compression attribute corresponding to that log is determined. If the compression attribute is non-compression, the log is stored in the log cache area; otherwise, the log is compressed to obtain a first compressed log, which is stored in the static storage area. In other words, depending on the access frequency of each module's log files, a log is either compressed before storage or stored uncompressed directly in the log cache area. On the one hand, compressing part of the logs improves the utilization rate of the cluster storage space; on the other hand, because compressed and uncompressed logs are stored in different areas and uncompressed logs can be read directly from the log cache area, log access efficiency is improved.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the application.
In current distributed storage cluster environments, logs are important information in the storage cluster: they contain the records of various business operations, and both the analysis of the cluster's historical running state and the locating of problems during faults rely on them. As time passes, log data grows linearly and occupies a large amount of storage space, which lowers the overall space utilization rate of the distributed storage cluster. The sheer number of logs also makes them harder to manage and retrieve. How massive log data is stored and retrieved therefore has an important impact on the cluster. For this reason, the application provides a cluster log storage scheme that can improve the utilization rate of the cluster storage space and the log access efficiency.
Referring to FIG. 1, an embodiment of the application discloses a cluster log storage method, which comprises the following steps:
Step S11: counting the access frequency of the log files of each target module in the cluster.
Step S12: determining a compression attribute corresponding to each target module based on the access frequency, wherein the compression attribute is either compression or non-compression.
In a specific embodiment, the access frequency of the log files of each functional module and of the sub-modules of each functional module can be counted, and the compression attribute corresponding to each target module is determined based on the access frequency. The access frequency of a functional module whose compression attribute is non-compression is higher than that of a functional module whose compression attribute is compression, and the access frequency of a sub-module whose compression attribute is non-compression is higher than that of a sub-module whose compression attribute is compression.
The functional modules may be, for example, an alarm module and a cache module in the cluster. Each functional module includes a plurality of sub-modules; for example, the alarm module includes an alarm sub-module, an event sub-module, and so on.
In a specific embodiment, the access frequency of the log files of each functional module and of the sub-modules of each functional module may be counted periodically, and the compression attribute corresponding to each target module is then determined based on the latest access frequency. In this way, the access frequency is kept up to date and the compression attributes can be revised in time.
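The counting described above can be sketched as follows. This is a minimal illustration only, assuming that each log read is reported with the name of its functional module and sub-module; the class name `AccessFrequencyCounter` and the method `record_access` are hypothetical and do not appear in the application.

```python
from collections import Counter


class AccessFrequencyCounter:
    """Hypothetical helper: counts log-file reads per functional module and per sub-module."""

    def __init__(self):
        self.module_hits = Counter()      # functional module -> read count
        self.submodule_hits = Counter()   # (functional module, sub-module) -> read count

    def record_access(self, module, submodule):
        # One read of a log file belonging to module/submodule.
        self.module_hits[module] += 1
        self.submodule_hits[(module, submodule)] += 1


counter = AccessFrequencyCounter()
for module, submodule in [("alarm", "event"), ("alarm", "alarm"),
                          ("cache", "evict"), ("alarm", "event")]:
    counter.record_access(module, submodule)
```

A periodic job could read `module_hits` and `submodule_hits` and recompute the compression attributes from them.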
Further, the functional modules can be sorted by the access frequency of their log files, and a preset number of functional modules with the highest access frequency are selected as first modules. The compression attribute of the first modules is determined as non-compression, and the compression attribute of the functional modules other than the first modules is determined as compression. All sub-modules of the functional modules other than the first modules are then sorted by the access frequency of their log files; the compression attribute of the sub-modules in a first preset proportion with the highest access frequency is determined as non-compression, and the compression attribute of the remaining sub-modules of those functional modules is determined as compression.
That is, in the embodiment of the present application, the preset number of functional modules with the highest access frequency are marked as non-compression, and among the sub-modules of the remaining functional modules, those in the first preset proportion with the highest access frequency are also marked as non-compression. This avoids the loss of access efficiency that would occur if frequently accessed log files in the remaining functional modules were stored in compressed form.
The preset number and the first preset proportion can be configured according to actual conditions.
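The two-level selection described above can be sketched as follows. This is an illustrative sketch, not the application's implementation: the function name, the dictionary layout, and the use of rounding up when applying the first preset proportion are all assumptions.

```python
import math

UNCOMPRESSED = "uncompressed"
COMPRESSED = "compressed"


def assign_compression_attributes(module_freq, submodule_freq, preset_number, first_ratio):
    """module_freq: {module: count}; submodule_freq: {(module, submodule): count}."""
    # Rank functional modules by access frequency, highest first.
    ranked_modules = sorted(module_freq, key=module_freq.get, reverse=True)
    first_modules = set(ranked_modules[:preset_number])
    module_attr = {m: UNCOMPRESSED if m in first_modules else COMPRESSED
                   for m in module_freq}

    # Rank only the sub-modules whose parent is not a first module.
    remaining = sorted(
        (key for key in submodule_freq if key[0] not in first_modules),
        key=submodule_freq.get, reverse=True)
    keep = math.ceil(len(remaining) * first_ratio)  # rounding up is an assumption
    submodule_attr = {key: UNCOMPRESSED if rank < keep else COMPRESSED
                      for rank, key in enumerate(remaining)}
    return module_attr, submodule_attr


module_freq = {"alarm": 50, "cache": 20, "network": 5}
submodule_freq = {("alarm", "event"): 30, ("alarm", "alarm"): 20,
                  ("cache", "evict"): 15, ("cache", "fill"): 5,
                  ("network", "link"): 4, ("network", "dns"): 1}
module_attr, submodule_attr = assign_compression_attributes(
    module_freq, submodule_freq, preset_number=1, first_ratio=0.25)
```

With these sample numbers, the alarm module is the only first module, and of the four sub-modules under the remaining modules only the most frequently read one is left uncompressed.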
Step S13: when a log to be stored is detected in the cluster, determining the compression attribute corresponding to the log to be stored.
In a specific embodiment, the compression attribute of each target module may be stored in a log base. The compression attribute of the functional module to which the log to be stored belongs is looked up in the log base. If that attribute is non-compression, the compression attribute corresponding to the log to be stored is determined as non-compression. If that attribute is compression, the compression attribute of the sub-module to which the log to be stored belongs is looked up; if the sub-module's attribute is non-compression, the compression attribute corresponding to the log to be stored is determined as non-compression, and otherwise it is determined as compression.
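The lookup order above, functional module first and sub-module second, can be sketched as follows. The layout of `log_base` as two dictionaries and the treatment of unknown modules as compressed are assumptions made only for this illustration.

```python
def resolve_compression_attribute(log_base, module, submodule):
    """Return "uncompressed" or "compressed" for a log from module/submodule.

    log_base is a hypothetical structure:
      {"modules": {module: attr}, "submodules": {(module, submodule): attr}}
    """
    # Step 1: an uncompressed functional module decides the result immediately.
    if log_base["modules"].get(module) == "uncompressed":
        return "uncompressed"
    # Step 2: otherwise the sub-module's own attribute is consulted.
    if log_base["submodules"].get((module, submodule)) == "uncompressed":
        return "uncompressed"
    # Assumption: anything not found, or marked compressed, is compressed.
    return "compressed"


log_base = {
    "modules": {"alarm": "uncompressed", "cache": "compressed"},
    "submodules": {("cache", "evict"): "uncompressed",
                   ("cache", "fill"): "compressed"},
}
```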
In a specific embodiment, the log storage tasks issued by the cluster can be monitored. A detected log storage task indicates that there is a log to be stored in the cluster, and the compression attribute of the log to be stored that corresponds to this task is then determined so that the log can be stored.
In addition, a globally unique identifier can be generated for each log storage task to facilitate task processing.
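One possible way to attach a globally unique identifier to a log storage task is sketched below. The application does not specify how the identifier is generated; the use of a random UUID and the `LogStorageTask` type with its fields are assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LogStorageTask:
    """Hypothetical task record issued when the cluster has a log to store."""
    module: str
    submodule: str
    payload: bytes
    # Assumption: a random UUID (version 4) serves as the globally unique identifier.
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)


task_a = LogStorageTask("alarm", "event", b"disk threshold exceeded")
task_b = LogStorageTask("cache", "evict", b"evicted 128 entries")
```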
Step S14: if the compression attribute is non-compression, storing the log to be stored in a log cache area; otherwise, compressing the log to be stored to obtain a first compressed log, and storing the first compressed log in a static storage area.
It can be seen that, in this embodiment, the access frequency of the log files of each target module in the cluster is counted, and a compression attribute, either compression or non-compression, is determined for each target module based on the access frequency. When a log to be stored is detected in the cluster, the compression attribute corresponding to that log is determined. If the compression attribute is non-compression, the log is stored in the log cache area; otherwise, the log is compressed to obtain a first compressed log, which is stored in the static storage area. In other words, depending on the access frequency of each module's log files, a log is either compressed before storage or stored uncompressed directly in the log cache area. On the one hand, compressing part of the logs improves the utilization rate of the cluster storage space; on the other hand, because compressed and uncompressed logs are stored in different areas and uncompressed logs can be read directly from the log cache area, log access efficiency is improved.
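Step S14 can be sketched as follows. The application does not name a compression algorithm or a storage backend; this sketch substitutes `zlib` from the Python standard library and uses in-memory dictionaries for the log cache area and the static storage area, both purely as assumptions.

```python
import zlib


def store_log(log_id, data, attribute, log_cache, static_storage):
    """Route one log to the cache (uncompressed) or to static storage (compressed)."""
    if attribute == "uncompressed":
        log_cache[log_id] = data               # hot log: stored as-is for direct reads
        return "cache"
    first_compressed_log = zlib.compress(data)  # zlib is an assumed stand-in
    static_storage[log_id] = first_compressed_log
    return "static"


log_cache, static_storage = {}, {}
hot_area = store_log("log-1", b"alarm raised" * 10, "uncompressed", log_cache, static_storage)
cold_area = store_log("log-2", b"link flapped" * 10, "compressed", log_cache, static_storage)
```

The returned area name could be recorded as location information, so that a later read knows where to look.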
Referring to FIG. 2, an embodiment of the application discloses a cluster log storage method, which comprises the following steps:
Step S21: counting the access frequency of the log files of each target module in the cluster.
Step S22: determining a compression attribute corresponding to each target module based on the access frequency, wherein the compression attribute is either compression or non-compression.
Step S23: when a log to be stored is detected in the cluster, determining the compression attribute corresponding to the log to be stored.
Step S24: if the compression attribute is non-compression, storing the log to be stored in a log cache area; otherwise, compressing the log to be stored to obtain a first compressed log, and storing the first compressed log in a static storage area.
Step S25: when the log to be stored is stored in the log cache area, storing the expiration time of the log to be stored in a log base.
Step S26: determining a target log from the log cache area based on the expiration time, at preset time intervals or at a specified time.
For example, the target log may be determined once a day or once a week, or at a specified time point each day, on a specified day each month, and so on.
In a specific embodiment, expired logs can be determined from the log cache area based on the expiration time, and a specified number of the expired logs are randomly selected as the target logs.
Step S27: compressing the target log to obtain a second compressed log.
Step S28: migrating the second compressed log to the static storage area.
It can be understood that this embodiment processes expired logs. To reduce the pressure of processing expired logs, target logs are selected from the expired logs by random sampling at preset time intervals or at a specified time, and are then compressed and stored in the static storage area.
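Steps S26 to S28 can be sketched as follows. The function name, the dictionary-based storage areas, numeric timestamps, and `zlib` as the compressor are assumptions for illustration; the application only requires that a specified number of expired logs be chosen at random, compressed, and migrated.

```python
import random
import zlib


def migrate_expired(log_cache, expirations, static_storage, now, sample_size, rng=random):
    """Compress a random sample of expired cached logs and move them to static storage."""
    # Expired logs: expiration time reached and still present in the cache.
    expired = [log_id for log_id, expires_at in expirations.items()
               if expires_at <= now and log_id in log_cache]
    # Randomly choose at most sample_size of them as the target logs.
    targets = rng.sample(expired, min(sample_size, len(expired)))
    for log_id in targets:
        second_compressed_log = zlib.compress(log_cache.pop(log_id))
        static_storage[log_id] = second_compressed_log
        del expirations[log_id]
    return targets


log_cache = {"a": b"aaaa", "b": b"bbbb", "c": b"cccc", "d": b"dddd"}
expirations = {"a": 10, "b": 20, "c": 30, "d": 500}
static_storage = {}
migrated = migrate_expired(log_cache, expirations, static_storage,
                           now=100, sample_size=2, rng=random.Random(0))
```

A scheduler would call such a function at the preset interval or at the specified time; expired logs not sampled in one round remain in the cache until a later round.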
In addition, in a specific embodiment, when the usage proportion of the log cache area reaches a preset threshold, all sub-modules whose compression attribute is non-compression may be sorted by the access frequency of their log files; the log files of the sub-modules in a second preset proportion with the lowest access frequency are compressed to obtain a third compressed log, and the third compressed log is migrated to the static storage area.
It can be understood that the storage capacity occupied by logs grows linearly over time. When the capacity usage proportion reaches the threshold, the log files of the sub-modules ranked in the bottom second preset proportion by access frequency are forcibly packed, and storage resources are released. Both the percentage of resources to release and the second preset proportion are configurable.
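The threshold-triggered migration can be sketched as follows. The function signature, the grouping of cached logs by sub-module, the rounding up of the second preset proportion, and `zlib` as the compressor are assumptions for illustration.

```python
import math
import zlib


def evict_cold_submodules(cache_by_submodule, submodule_freq, static_storage,
                          used, capacity, threshold, second_ratio):
    """When cache usage reaches the threshold, compress and migrate the coldest sub-modules."""
    if used / capacity < threshold:
        return []  # below the preset threshold: nothing to do
    # Sort uncompressed (cached) sub-modules by access frequency, lowest first.
    ranked = sorted(cache_by_submodule, key=lambda key: submodule_freq.get(key, 0))
    count = math.ceil(len(ranked) * second_ratio)  # rounding up is an assumption
    evicted = ranked[:count]
    for key in evicted:
        for log_id, data in cache_by_submodule.pop(key).items():
            static_storage[log_id] = zlib.compress(data)  # third compressed log
    return evicted


cache_by_submodule = {
    ("cache", "evict"): {"l1": b"a" * 100},
    ("network", "link"): {"l2": b"b" * 100},
    ("network", "dns"): {"l3": b"c" * 100},
    ("cache", "fill"): {"l4": b"d" * 100},
}
submodule_freq = {("cache", "evict"): 15, ("network", "link"): 4,
                  ("network", "dns"): 1, ("cache", "fill"): 5}
static_storage = {}
untouched = evict_cold_submodules(cache_by_submodule, submodule_freq, static_storage,
                                  used=50, capacity=100, threshold=0.8, second_ratio=0.5)
evicted = evict_cold_submodules(cache_by_submodule, submodule_freq, static_storage,
                                used=90, capacity=100, threshold=0.8, second_ratio=0.5)
```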
Further, in this embodiment, when a log reading request issued by the cluster is obtained, the compression attribute of the module to which the requested log belongs and the location information of the log are read from the log base. If the compression attribute is compression, the log is read from the static storage area according to the location information, decompressed and returned, and a log access record is stored in the log base. If the compression attribute is non-compression, the log is read from the log cache area according to the location information and returned directly, and the expiration time of the log is updated.
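The read path can be sketched as follows. The entry layout in `log_base`, the `ttl` parameter used to refresh the expiration time, recording the access as a simple counter, and `zlib` as the compressor are all assumptions made for this illustration.

```python
import zlib


def read_log(log_id, log_base, log_cache, static_storage, now, ttl):
    """Return the log content, decompressing it only when it lives in static storage."""
    entry = log_base[log_id]  # hypothetical: {"attribute": ..., "location": ...}
    # Record the access so that later frequency statistics can use it.
    entry["access_count"] = entry.get("access_count", 0) + 1
    if entry["attribute"] == "compressed":
        return zlib.decompress(static_storage[entry["location"]])
    # Uncompressed logs are returned directly and their expiration time is refreshed.
    entry["expires_at"] = now + ttl
    return log_cache[entry["location"]]


log_base = {
    "cold": {"attribute": "compressed", "location": "cold"},
    "hot": {"attribute": "uncompressed", "location": "hot", "expires_at": 100},
}
static_storage = {"cold": zlib.compress(b"cold log content")}
log_cache = {"hot": b"hot log content"}
cold_data = read_log("cold", log_base, log_cache, static_storage, now=50, ttl=60)
hot_data = read_log("hot", log_base, log_cache, static_storage, now=50, ttl=60)
```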
Referring to FIG. 3, an embodiment of the present application discloses a cluster log storage device, including:
an access frequency statistics module 11, used for counting the access frequency of the log files of each target module in the cluster;
a module attribute determining module 12, used for determining a compression attribute corresponding to each target module based on the access frequency, wherein the compression attribute is either compression or non-compression;
a to-be-stored log monitoring module 13, used for monitoring whether there is a log to be stored in the cluster;
a compression attribute determining module 14, used for determining the compression attribute corresponding to the log to be stored when the to-be-stored log monitoring module detects a log to be stored in the cluster; and
a log storage module 15, used for storing the log to be stored in a log cache area if the compression attribute is non-compression, and otherwise compressing the log to be stored to obtain a first compressed log and storing the first compressed log in a static storage area.
It can be seen that, in this embodiment, the access frequency of the log files of each target module in the cluster is counted, and a compression attribute, either compression or non-compression, is determined for each target module based on the access frequency. When a log to be stored is detected in the cluster, the compression attribute corresponding to that log is determined. If the compression attribute is non-compression, the log is stored in the log cache area; otherwise, the log is compressed to obtain a first compressed log, which is stored in the static storage area. In other words, depending on the access frequency of each module's log files, a log is either compressed before storage or stored uncompressed directly in the log cache area. On the one hand, compressing part of the logs improves the utilization rate of the cluster storage space; on the other hand, because compressed and uncompressed logs are stored in different areas and uncompressed logs can be read directly from the log cache area, log access efficiency is improved.
The access frequency statistics module 11 is specifically configured to count the access frequency of the log files of each functional module and of the sub-modules of each functional module.
Accordingly, the module attribute determining module 12 is specifically configured to:
determine the compression attribute corresponding to each target module based on the access frequency, wherein the access frequency of a functional module whose compression attribute is non-compression is higher than that of a functional module whose compression attribute is compression, and the access frequency of a sub-module whose compression attribute is non-compression is higher than that of a sub-module whose compression attribute is compression.
In a specific embodiment, the module attribute determining module 12 is specifically configured to:
sort the functional modules by the access frequency of their log files;
select a preset number of functional modules with the highest access frequency to obtain first modules;
determine the compression attribute of the first modules as non-compression, and determine the compression attribute of the functional modules other than the first modules as compression;
sort, by the access frequency of their log files, all sub-modules of the functional modules other than the first modules; and
determine the compression attribute of the sub-modules in a first preset proportion with the highest access frequency as non-compression, and determine the compression attribute of the other sub-modules of the functional modules other than the first modules as compression.
The apparatus further comprises a compression attribute storage module for:
storing the compression attribute of each target module in a log base;
accordingly, the compression attribute determining module 14 is specifically configured to:
look up, in the log base, the compression attribute of the functional module to which the log to be stored belongs; if the compression attribute of that functional module is non-compression, determine the compression attribute corresponding to the log to be stored as non-compression; if the compression attribute of that functional module is compression, look up the compression attribute of the sub-module to which the log to be stored belongs; if the compression attribute of that sub-module is non-compression, determine the compression attribute corresponding to the log to be stored as non-compression; otherwise, determine the compression attribute corresponding to the log to be stored as compression.
The apparatus further comprises:
an expiration time storage module, used for storing the expiration time of the log to be stored in a log base when the log to be stored is stored in the log cache area;
a target log determining module, used for determining a target log from the log cache area based on the expiration time, at preset time intervals or at a specified time;
a log compression module, used for compressing the target log to obtain a second compressed log; and
a first log migration module, used for migrating the second compressed log to the static storage area.
Further, the target log determining module is specifically configured to determine expired logs from the log cache area based on the expiration time, and to randomly select a specified number of the expired logs as the target logs.
Further, the device also comprises a second log migration module, used for sorting, when the usage proportion of the log cache area reaches a preset threshold, all sub-modules whose compression attribute is non-compression by the access frequency of their log files, compressing the log files of the sub-modules in a second preset proportion with the lowest access frequency to obtain a third compressed log, and migrating the third compressed log to the static storage area.
For example, referring to FIG. 4, an embodiment of the present application discloses a specific cluster log storage scheme. In a specific embodiment, the foregoing modules may be integrated, and a log task monitoring module, a log task management module, a log access module, a log update module, a log data statistics module, a log base, and the like may be deployed, wherein:
1) The log task monitoring module provides log task monitoring and log task issuing functions for the cluster. When the cluster issues a log storage task, the task is sent to the log task management module for processing, and a globally unique identifier is generated for the log task.
2) The log task management module manages the issued log storage tasks. When storing a log, it judges from the information in the log base whether compression is needed, acquires the storage location information, and stores the log on the log storage server: a compressed log is stored in the static storage area and its location information is recorded; an uncompressed log is stored in the log cache area, and its location in the log cache area and a configured expiration time are recorded in the log base. When reading a log, it reads the log according to the log's location information and, depending on the compression attribute, returns the log data to the user either after decompression or directly.
3) The log access module receives a log reading request issued by the cluster and acquires the target log through the log task management module. After the log access is completed, it stores a log access record in the log base; if the accessed file is in the log cache area, it also refreshes the expiration time of that log file.
4) The log data statistics module counts and sorts the access frequency of the log files of each module. The logs of the n (configurable) functional modules with the highest access frequency are placed directly in the log cache area, and the newly generated logs of the top m (configurable) sub-modules with the highest access frequency are also placed in the log cache area. To reduce the performance pressure that real-time updating would cause in a large-scale cluster, the statistical analysis and sorting of logs and the updating of the compression attributes of the corresponding modules follow a configurable timed update strategy, while the access frequency counts of the log files are updated immediately.
5) The log update module processes logs that have reached their expiration time. To reduce the pressure on the system of processing expired log files, the module adopts a timing strategy and a periodic strategy. Under the timing strategy, the system processes expired files at scheduled times by random sampling. Under the periodic strategy, expired logs are processed periodically in batches: they are migrated out of the log cache area, packed and stored. In addition, because the storage capacity occupied by logs grows linearly, when the capacity usage proportion reaches a threshold, the log update module forcibly packs the log files of the sub-modules ranked in the bottom k percent (configurable) and releases storage resources (the percentage of resources released is configurable).
6) The log base stores the log access records, the compression attribute of each module, and the expiration time information of the log files.
Through the above modules, the access frequency of log data is counted and the log data is divided into high-frequency and low-frequency data. Low-frequency log data is packed and stored, and high-frequency logs are stored in the log cache area, which improves both the utilization rate of the storage space and the log access efficiency. Expired logs are processed automatically, at scheduled times and periodically, which improves the cluster storage utilization rate and avoids manual migration. As a result, the distributed storage cluster can optimize the storage of log files to a certain extent, reduce the space occupied by logs, improve log retrieval efficiency, and improve the data reliability and stability of the cluster.
Referring to FIG. 5, an embodiment of the present application discloses an electronic device 20, which includes a processor 21 and a memory 22, wherein the memory 22 is used for storing a computer program, and the processor 21 is used for executing the computer program to implement the cluster log storage method disclosed in the foregoing embodiments.
For the specific process of the cluster log storage method, reference may be made to the corresponding content disclosed in the foregoing embodiment, and no further description is given here.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk or an optical disk, and the storage mode may be transient storage or permanent storage.
In addition, the electronic device 20 further includes a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26, where the power supply 23 is configured to provide working voltages for each hardware device on the electronic device 20, the communication interface 24 is capable of creating a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein, and the input/output interface 25 is configured to obtain external input data or output data to the external device, and a specific interface type thereof may be selected according to specific application needs and is not specifically limited herein.
Further, the embodiment of the application also discloses a computer readable storage medium for storing a computer program, wherein the computer program is executed by a processor to realize the cluster log storage method disclosed in the previous embodiment.
For the specific process of the cluster log storage method, reference may be made to the corresponding content disclosed in the foregoing embodiment, and no further description is given here.
In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method, apparatus, device and medium for cluster log storage provided by the application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is intended only to assist in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, following the ideas of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the application.