CN117708809A

CN117708809A - Data recovery method, device, computing device cluster and storage medium

Info

Publication number: CN117708809A
Application number: CN202211449188.9A
Authority: CN
Inventors: 陈克云
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2022-09-15
Filing date: 2022-11-18
Publication date: 2024-03-15

Abstract

Disclosed are a data recovery method, apparatus, computing device cluster, and storage medium. The method comprises: generating a file infection record based on snapshots of a file system at a plurality of snapshot time points; and, in response to a data recovery instruction, using the file infection record to accurately obtain recovery data, which includes an uninfected first file, from at least one of the snapshots recorded for the first file in the file infection record, and performing data recovery based on the recovery data.

Description

Data recovery method, device, computing device cluster and storage medium

This application claims priority to Chinese Patent Application No. 202211124047.X, filed on September 15, 2022 and entitled “A data management method for ransomware”, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of storage technology, and in particular to a data recovery method, apparatus, computing device cluster, and storage medium.

Background Art

With the development of technology, data security has received increasing attention. Ransomware is a computer virus that threatens data security. It attacks files using various encryption algorithms, so that infected files can no longer be read or written normally.

Currently, file systems typically use snapshot technology to keep copies of files at multiple historical time points. When data is found to be infected by a virus, the files in the file system can be restored from the file copies corresponding to a safe historical time point.

However, ransomware is highly stealthy and keeps attacking the file system over time. Because the above technical solution can only restore the file system to a historical time point before the virus attack, files produced while the attack was ongoing are difficult to recover, and data recovery is inefficient.

Summary of the Invention

Embodiments of the present application provide a data recovery method, apparatus, computing device cluster, and storage medium that can improve the efficiency of data recovery. The technical solution is as follows:

In a first aspect, a data recovery method is provided, the method comprising:

Generating a file infection record based on a plurality of snapshots, where the file infection record indicates the pre-infection snapshot of an infected first file, each snapshot is a snapshot of a file system or of a directory, and each snapshot corresponds to one snapshot time point;

In response to a data recovery instruction, obtaining, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is no later than the infection time of the first file, and performing data recovery based on the recovery data, where the recovery data includes the uninfected first file obtained from the pre-infection snapshot.

The software of the storage system (for example, file system software) repeatedly takes snapshots of the file system (or directory) to generate a plurality of snapshots, and compares the contents of these snapshots to detect whether any file has been infected. For an infected file, the snapshots generated before the infection record the file's healthy data (that is, normal data that has not been tampered with), while the snapshots generated after the infection record the file's infected data. In the first aspect, the file infection record records the pre-infection snapshot of the file. When a user wants to restore the infected file, reading the file infection record directly locates the snapshot that holds the file's pre-infection data, and that snapshot is then used to restore the infected file and obtain uninfected data. This technical solution records file infections efficiently, so that infected files can be restored precisely, the data loss caused by a snapshot rollback is avoided, and the efficiency of data recovery is greatly improved.

In a possible implementation, generating the file infection record based on the plurality of snapshots includes:

Determining, based on the snapshots at a pair of adjacent snapshot time points among the plurality of snapshots, the difference between the snapshot at the later snapshot time point and the snapshot at the earlier snapshot time point;

Performing infection detection on the difference to determine that the first file is an infected file;

Recording the first file as an infected file in the file infection record.

By determining the difference between snapshots at adjacent snapshot time points and performing infection detection on it, the infected first file can be identified accurately and in real time, and the file infection record indicates the pre-infection snapshot that can be used to restore the first file. This ensures that the recorded infection status supports precise recovery of infected files.
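The three steps above (diff adjacent snapshots, run infection detection on the difference, record the result) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a snapshot is modeled as a plain dictionary from file path to content, and `InfectionEntry`, `changed_files`, `looks_infected`, and the `ENCRYPTED` marker are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: a snapshot is modeled as {file path: file content}.
Snapshot = Dict[str, bytes]


@dataclass(frozen=True)
class InfectionEntry:
    path: str
    infected_at: int               # index of the first snapshot holding infected data
    clean_snapshot: Optional[int]  # index of the last snapshot holding healthy data,
                                   # or None if the file first appeared already infected


def changed_files(earlier: Snapshot, later: Snapshot) -> List[str]:
    """Files that are new or modified in `later` relative to `earlier`."""
    return sorted(p for p, data in later.items() if earlier.get(p) != data)


def looks_infected(data: bytes) -> bool:
    """Placeholder detector; the marker is hypothetical and for illustration only."""
    return data.startswith(b"ENCRYPTED")


def build_infection_record(snapshots: List[Snapshot]) -> Dict[str, InfectionEntry]:
    """Walk adjacent snapshot pairs in time order and record newly infected files."""
    record: Dict[str, InfectionEntry] = {}
    for i in range(1, len(snapshots)):
        earlier, later = snapshots[i - 1], snapshots[i]
        for path in changed_files(earlier, later):
            # Skip files already recorded and changes that pass the detector.
            if path in record or not looks_infected(later[path]):
                continue
            clean = i - 1 if path in earlier else None
            record[path] = InfectionEntry(path, infected_at=i, clean_snapshot=clean)
    return record
```

Only the difference between adjacent snapshots is inspected, so the cost of each detection pass depends on how much changed between two snapshot time points rather than on the size of the whole file system.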

In a possible implementation, performing infection detection on the difference to determine that the first file is an infected file includes:

A first file whose file suffix contains a virus identifier is determined to be an infected file.

In some embodiments, one way a virus infects a file is by modifying its file suffix. Exploiting this characteristic of the infection method allows infected files to be detected quickly, which speeds up infection detection.

A first file whose file feature has changed is determined to be an infected file, where the file feature is used to characterize the file.

In some embodiments, virus infection causes a file's file feature to change. Exploiting this characteristic of the infection method allows infected files to be detected quickly, which speeds up infection detection.
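The two detection criteria above can be sketched as follows. The source does not say which suffixes count as virus identifiers or what the file feature is, so both are assumptions here: `VIRUS_SUFFIXES` is a hypothetical list, and the "file feature" is taken to be the first four bytes of the file (its signature), purely for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical virus identifiers; a real deployment would maintain its own list.
VIRUS_SUFFIXES = {".locked", ".crypt"}


def has_virus_suffix(path: str) -> bool:
    """Criterion 1: the file suffix carries a virus identifier."""
    return PurePosixPath(path).suffix.lower() in VIRUS_SUFFIXES


def file_feature(data: bytes) -> bytes:
    """Assumed file feature: the leading four bytes (file signature)."""
    return data[:4]


def is_infected(path: str, before: Optional[bytes], after: bytes) -> bool:
    """Apply both criteria to one changed file from the snapshot difference.

    `before` is the content in the earlier snapshot, or None for a new file.
    """
    if has_virus_suffix(path):
        return True
    # Criterion 2: the file feature changed between the two snapshots.
    return before is not None and file_feature(before) != file_feature(after)
```

Both checks read only a name or a few bytes per file, which is consistent with the stated goal of fast detection. A signature comparison would miss infections that preserve the header, so it is best read as one possible choice of feature rather than a complete detector.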

In a possible implementation, the difference includes: files that are newly added or modified in the snapshot at the later of the adjacent snapshot time points relative to the snapshot at the earlier of the adjacent snapshot time points.

Because a virus infects files by encrypting, modifying, or deleting them, both newly added files and modified files in the file system may be infected. Identifying newly added or modified files therefore gives a preliminary set of changes that may have been caused by virus infection, and subsequent infection detection then pinpoints the infected files.

In a possible implementation, the file infection record records the pre-infection file metadata of the first file.

In a possible implementation, obtaining, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is no later than the infection time of the first file includes:

Determining the pre-infection file metadata of the first file from the file infection record;

Obtaining the uninfected first file from the snapshot indicated by the pre-infection file metadata of the first file.

Through this process, an uninfected copy of the file can be obtained quickly from the recorded pre-infection file metadata. This gives the data recovery process an efficient way to look up the information it needs and greatly improves the efficiency of data recovery.
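The lookup described above amounts to two dictionary accesses, as the following minimal sketch shows. The source does not specify the fields of the pre-infection file metadata, so `FileMetadata` with a `snapshot_id` and a `path` is an assumption, as are the sample snapshot names and contents.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FileMetadata:
    snapshot_id: str  # assumed field: the snapshot that holds this file version
    path: str         # assumed field: the file's path inside that snapshot


# Assumed sample data: two snapshots and one recorded infection.
SNAPSHOTS: Dict[str, Dict[str, bytes]] = {
    "T0": {"/doc/1.doc": b"quarterly report"},
    "T1": {"/doc/1.doc": b"\x00\x9f garbage"},
}
INFECTION_RECORD: Dict[str, FileMetadata] = {
    "/doc/1.doc": FileMetadata(snapshot_id="T0", path="/doc/1.doc"),
}


def fetch_uninfected(path: str,
                     record: Dict[str, FileMetadata],
                     snapshots: Dict[str, Dict[str, bytes]]) -> bytes:
    """Read the pre-infection metadata, then read the file from that snapshot."""
    pre = record[path]
    return snapshots[pre.snapshot_id][pre.path]
```

No snapshot has to be scanned at recovery time, because the record already names the snapshot that holds the healthy copy.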

In a possible implementation, the file infection record also records the post-infection file metadata of the first file, and the pre-infection file metadata of the first file is associated with the post-infection file metadata of the first file;

Determining the pre-infection file metadata of the first file from the file infection record includes:

Determining the post-infection file metadata of the first file from the file infection record;

Determining the pre-infection file metadata of the first file based on the post-infection file metadata of the first file.

Through this process, the record holds not only the pre-infection file metadata used to obtain a usable file copy, but also the post-infection file metadata, which indicates when the file was infected. This provides complete infection information for subsequent data recovery to any time point and further improves the efficiency of data recovery.
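One way to picture the association between post-infection and pre-infection metadata is a record keyed by the infected file, where each entry links the two. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, snapshot time points are modeled as integers, and the `.locked` suffix in the usage is invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMetadata:
    snapshot_time: int  # assumed field: snapshot time point, as an integer index
    path: str           # assumed field: file path in that snapshot


@dataclass(frozen=True)
class InfectionEntry:
    post: FileMetadata  # first snapshot in which the file is infected
    pre: FileMetadata   # last snapshot in which the file is healthy


class InfectionRecord:
    """Entries are indexed by the post-infection path (hypothetical layout)."""

    def __init__(self) -> None:
        self._entries: Dict[str, InfectionEntry] = {}

    def add(self, pre: FileMetadata, post: FileMetadata) -> None:
        self._entries[post.path] = InfectionEntry(post=post, pre=pre)

    def pre_infection(self, infected_path: str) -> FileMetadata:
        """Post-infection metadata first, then the associated pre-infection metadata."""
        return self._entries[infected_path].pre

    def infected_no_later_than(self, time_point: int) -> List[InfectionEntry]:
        """Entries whose infection was observed at or before `time_point`."""
        return [e for e in self._entries.values()
                if e.post.snapshot_time <= time_point]
```

For example, after `record.add(FileMetadata(0, "/doc/1.doc"), FileMetadata(1, "/doc/1.doc.locked"))`, calling `record.pre_infection("/doc/1.doc.locked")` returns the metadata that points at the healthy copy in snapshot 0. Keeping the post-infection side also lets a recovery to a chosen time point select only the infections that had occurred by then.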

In a possible implementation, in response to the data recovery instruction, obtaining, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is no later than the infection time of the first file, and performing data recovery based on the recovery data, includes:

Generating a clone file system in response to the data recovery instruction;

Obtaining, based on the file infection record, the recovery data from at least one snapshot whose snapshot time point is no later than the infection time of the first file, and performing data recovery in the clone file system based on the recovery data.

The clone file system is a complete, usable copy of the file system at a specified time point.

Performing data recovery in a clone file system keeps the recovery process from affecting normal services in the current file system and preserves the consistency of the file system.

In a possible implementation, performing data recovery on the file system based on the recovery data includes:

Overwriting the infected first file in the current file system with the uninfected first file.

With this technical solution, infected files can be recovered efficiently. There is no need to manually identify infected files in the saved snapshots, because data recovery is driven directly by the file infection record, which is generated in near real time. This shortens the time data recovery takes, reduces the data loss it causes, and greatly improves its efficiency.
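The two recovery targets described above (overwriting in the current file system, or recovering inside a clone file system) can be sketched as follows. This is a simplified model under stated assumptions: file systems and snapshots are plain dictionaries, the infection record maps each infected path to the identifier of its pre-infection snapshot, and the clone is modeled with `copy.deepcopy`, which stands in for whatever cloning mechanism the storage system actually provides.

```python
import copy
from typing import Dict

FileSystem = Dict[str, bytes]
# Assumption: infected path -> identifier of the pre-infection snapshot.
Record = Dict[str, str]


def restore_in_place(fs: FileSystem, record: Record,
                     snapshots: Dict[str, FileSystem]) -> int:
    """Overwrite each infected file with its uninfected copy; return the count."""
    restored = 0
    for path, snapshot_id in record.items():
        fs[path] = snapshots[snapshot_id][path]
        restored += 1
    return restored


def restore_in_clone(current: FileSystem, record: Record,
                     snapshots: Dict[str, FileSystem]) -> FileSystem:
    """Recover inside a clone so the current file system is left untouched."""
    clone = copy.deepcopy(current)
    restore_in_place(clone, record, snapshots)
    return clone
```

Only the files named in the record are touched, so files that the application wrote legitimately after the attack began are kept, which a whole-snapshot rollback would discard.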

In a possible implementation, the method further includes:

When recovery of the first file has not yet completed, in response to an access request for the first file, accessing the uninfected first file in the pre-infection snapshot.

With this technical solution, the front-end business system continues to receive uninterrupted file access service while data recovery runs in the server background. That is, infected data is recovered precisely without the business system being aware of the recovery process.
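The access redirection described above can be sketched as a read path that checks whether a file is still waiting to be restored. The class below is a hypothetical illustration: `RecoveringFileSystem`, its fields, and the in-memory dictionaries are assumptions made for this example only.

```python
from typing import Dict, Set


class RecoveringFileSystem:
    """Serves reads while infected files are restored in the background."""

    def __init__(self, live: Dict[str, bytes],
                 clean_copies: Dict[str, bytes]) -> None:
        self.live = live                    # current file system contents
        self.clean_copies = clean_copies    # path -> data from the pre-infection snapshot
        self.pending: Set[str] = set(clean_copies)  # files not yet restored

    def read(self, path: str) -> bytes:
        # Not yet restored: answer from the pre-infection snapshot copy.
        if path in self.pending:
            return self.clean_copies[path]
        return self.live[path]

    def restore_one(self, path: str) -> None:
        """Background step: overwrite the infected file, then stop redirecting."""
        self.live[path] = self.clean_copies[path]
        self.pending.discard(path)
```

A caller of `read` sees the same healthy content before and after `restore_one` runs, which is the sense in which the business system is unaware of the recovery.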

In a possible implementation, being infected refers to being infected by ransomware.

In a second aspect, an embodiment of the present application provides a data recovery device, comprising at least one functional module, where the at least one functional module is used to implement the data recovery method of the first aspect or of any optional implementation of the first aspect.

In a third aspect, an embodiment of the present application provides a computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory. The processor of the at least one computing device is used to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster implements the data recovery method of the first aspect or of any optional implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium for storing at least one piece of program code, where the at least one piece of program code is executed by a computing device cluster to implement the data recovery method of the first aspect or of any optional implementation of the first aspect. The storage medium includes but is not limited to volatile memory, such as random access memory, and non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a computing device cluster, causes the computing device cluster to implement the data recovery method of the first aspect or of any optional implementation of the first aspect. The computer program product may be a software installation package. When the data recovery method needs to be implemented, the computer program product can be downloaded and executed on a computing device.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of an implementation environment of a data recovery method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of the principle of a data recovery method provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of the hardware structure of a computing device provided in an embodiment of the present application;

FIG. 4 is a flowchart of a data recovery method provided in an embodiment of the present application;

FIG. 5 is a functional schematic diagram of a file system provided in an embodiment of the present application;

FIG. 6 is a flowchart of another data recovery method provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of the structure of a data recovery device provided in an embodiment of the present application.

Detailed Description

The embodiments of the present application are described in further detail below with reference to the accompanying drawings.

The key terms and concepts involved in this application are explained first.

A snapshot is a usable copy of a specified data set. The copy includes an image of the data set at the snapshot time point. In other words, a snapshot is equivalent to a copy of the data set it refers to at a certain point in time.

Ransomware is a computer virus that spreads mainly through emails, Trojan programs, and web pages carrying malicious code. Ransomware encrypts files using various encryption algorithms, which makes important files unreadable and damages critical data. Users usually cannot decrypt the files themselves and can recover them only by obtaining the private decryption key.

Continuous data protection (CDP) is a method that continuously captures and saves data changes in a file system and stores the changed data independently of the original data.

The present application provides a data recovery method for recovering data in a file system, which can effectively improve the efficiency of data recovery. The implementation environment involved in the present application is introduced below.

FIG. 1 is a schematic diagram of an implementation environment of a data recovery method provided in an embodiment of the present application. Referring to FIG. 1, the implementation environment includes a file system 110, a file system management terminal 120, an application program 130, and a virus program 140.

The file system 110 provides file services and manages a plurality of files. In some embodiments, the file system 110 is implemented on a computing device cluster composed of at least one computing device. In some embodiments, the computing device cluster may be a server, a server cluster composed of multiple physical servers, or a distributed file system, or it may be a cloud server cluster that provides basic cloud computing services such as cloud storage, cloud services, cloud databases, cloud computing, cloud functions, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms. This application does not limit this. In some embodiments, the computing device cluster runs a storage system, which is a system composed of the various storage devices that hold programs and data, the control components, and the hardware and software that manage information scheduling. The file system provides the storage and organization of file data within the storage system, so as to manage the plurality of files in the storage system.

The file system management terminal 120 is used to manage the file system 110. In some embodiments, the file system management terminal 120 sends a data recovery instruction to the file system 110 so that the file system 110 performs data recovery. In some embodiments, the file system management terminal 120 is a terminal. A terminal is a type of device that offers rich human-computer interaction, can access the Internet, usually runs one of various operating systems, and has relatively strong processing capability. In some embodiments, the types of terminal include but are not limited to personal computers, smartphones, tablet computers, and in-vehicle terminals. In some embodiments, the file system management terminal 120 is client software that uses the file services provided by the file system 110, and it can be used to manage the file partitions in the file system to which it has access rights. This application does not limit this.

The application program 130 is an application that uses the file services provided by the file system 110 and can access files in the file system 110. In some embodiments, access means reading or writing existing files in the file system, or writing new files to the file system. In some embodiments, the application program 130 runs on a business system layered above the file system. Multiple application programs 130 run in the business system and implement different business functions by using the file storage service provided by the file system 110.

The virus program 140 can tamper with files in the file system 110 in order to infect normal files in the file system. In some embodiments, a file may be infected by modifying, encrypting, or deleting the original file, or by writing other invalid files. In some embodiments, the virus program 140 may run on any node that accesses the file system. The node may be a virtual machine, a container, or any computing instance that accesses the file system. In other embodiments, the virus program 140 may run in the business system where the application program 130 is located. In some embodiments, the program code of the virus program 140 is hidden in a web page accessed by the business system. When the business system accesses the web page, the program code of the virus program 140 is downloaded to the business system and run there, and can thus invade the file system 110 accessed by the business system. In some embodiments, the virus is ransomware, and being infected means being infected by ransomware.

In some embodiments, the file system management terminal 120, the application program 130, and the virus program 140 can communicate with the file system 110 via a wireless or wired network.

In the embodiments of the present application, the file system 110 supports generating and saving snapshots, so that infected files in the file system can be restored based on the saved snapshots.

In some embodiments, the file system 110 may support any of the following backup features to generate the snapshots used to restore infected files.

High-density snapshot (HyperSnap) feature: a snapshot under this backup feature is a consistent copy of a source file in the file system at a snapshot time point. This fully usable copy contains a static image of the source file at the snapshot time point.

High-density clone (HyperClone) feature: a file system that supports this backup feature can provide a complete copy of a source file in the file system at a certain point in time, or provide incremental synchronization as a backup method. Here, “complete” means that the source file is fully copied to generate the file copy, and “incremental synchronization” means that the file copy can dynamically synchronize the parts of the source file that have changed.

High-density continuous data protection (HyperCDP) feature: a snapshot under this backup feature works in the same way as a snapshot under the high-density snapshot feature.

In some embodiments, the object of a snapshot may be the file system or a directory of the file system. This application does not limit the granularity of snapshots. While the virus program 140 continuously attacks the file system 110, the application program 130 also continues to access the file system 110, and the files generated during this process will be infected. The data recovery method provided in the embodiments of the present application can be applied to the file system 110 to perform precise and efficient data recovery.

Based on the above introduction to the implementation environment, an embodiment of the present application provides a schematic diagram of the principle of a data recovery method. Referring to FIG. 2, the snapshot at snapshot time point T0, taken before the virus attack, includes a directory of files such as “/document/1.doc; /document/10.doc”. While the virus keeps attacking and the business keeps running normally, the snapshot at snapshot time point T1 records the files “/document/a.txt; /document/d.txt” operated on normally by the application program and the infected files “/document/1.doc; /document/10.doc”. The snapshot at snapshot time point T2 records the files “/document/1.txt; /document/10.txt” operated on normally by the application program and the infected files “/document/a.txt; /document/b.txt”. The same applies to snapshot time point Tn, where n is a positive integer giving the sequence number of the snapshot time point. The data recovery method provided in the embodiments of the present application identifies the infected files through infection detection and then restores exactly those files. The detailed recovery process is described later and is not repeated here. For ease of understanding, an example of snapshots at adjacent snapshot time points, corresponding to FIG. 2, is shown in Table 1. In Table 1, “/document/” is abbreviated as “/doc/”, “/picture/” is abbreviated as “/pic/”, and “……” indicates omission.

Table 1

本申请涉及的计算设备包括上述服务器和终端，计算设备具有通信功能，能够接入有线网络或无线网络。在一些实施例中，该无线网络或有线网络使用标准通信技术和/或协议。网络包括但不限于数据中心网络(data center network)、存储区域网(storagearea network，SAN)、局域网(local area network，LAN)、城域网(metropolitan areanetwork，MAN)、广域网(wide area network，WAN)、移动、有线或者无线网络、专用网络或者虚拟专用网络的任何组合。在一些实现方式中，使用包括超级文本标记语言(hyper textmarkup language，HTML)、可扩展标记语言(extensible markup language，XML)等的技术和/或格式来代表通过网络交换的数据。此外还能够使用诸如安全套接字层(securesockets layer，SSL)、传输层安全(transport layer security，TLS)、虚拟专用网络(virtual private network，VPN)、网际协议安全(internet protocol security，IPsec)等常规加密技术来加密所有或者部分链路。在另一些实施例中，还能够使用定制和/或专用数据通信技术取代或者补充上述数据通信技术。The computing device involved in the present application includes the above-mentioned server and terminal, and the computing device has a communication function and can access a wired network or a wireless network. In some embodiments, the wireless network or the wired network uses standard communication technology and/or protocol. The network includes but is not limited to any combination of a data center network, a storage area network (SAN), a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired or wireless network, a private network or a virtual private network. In some implementations, the data exchanged through the network is represented by the technology and/or format including hypertext markup language (HTML), extensible markup language (XML), etc. In addition, conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private network (VPN), and Internet protocol security (IPsec) can be used to encrypt all or part of the links. In other embodiments, customized and/or dedicated data communication technologies can be used to replace or supplement the above-mentioned data communication technologies.

本申请实施例提供了一种计算设备集群。该计算设备集群包括一台或多台计算设备，该计算设备集群能够运行上述文件系统。该计算设备可以是服务器，例如是中心服务器、边缘服务器，或者是本地数据中心中的本地服务器。在一些实施例中，计算设备也可以是台式机、笔记本电脑或者智能手机等终端设备。在一些实施例中，计算设备集群中的一个或多个计算设备可以通过网络连接。下面对该计算设备集群中计算设备的硬件结构进行介绍。An embodiment of the present application provides a computing device cluster. The computing device cluster includes one or more computing devices, and the computing device cluster can run the above-mentioned file system. The computing device can be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device can also be a terminal device such as a desktop, a laptop or a smart phone. In some embodiments, one or more computing devices in the computing device cluster can be connected via a network. The hardware structure of the computing device in the computing device cluster is introduced below.

An embodiment of the present application provides a computing device, which can be implemented as the above-mentioned server or terminal. Referring to Figure 3, Figure 3 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present application. As shown in Figure 3, the computing device 300 includes a memory 301, a processor 302, a communication interface 303, and a bus 304. The memory 301, the processor 302, and the communication interface 303 are communicatively connected to one another through the bus 304.

The memory 301 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The processor 302 implements the data recovery method in the following embodiments either by reading program code stored in the memory 301 or by means of program code stored internally. Where the processor 302 implements the method by reading program code stored in the memory 301, the memory 301 may store the program code that implements the data recovery method provided in the embodiments of the present application. The memory 301 may also store data such as file metadata, which is not limited in the embodiments of the present application.

The processor 302 may be a network processor (NP), a central processing unit (CPU), an application-specific integrated circuit (ASIC), or an integrated circuit for controlling the execution of the program of the present application. The processor 302 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. There may be one processor 302 or multiple processors 302. The communication interface 303 uses a transceiver module, such as a transceiver, to implement communication between the computing device 300 and other devices or communication networks. For example, data may be acquired through the communication interface 303.

The memory 301 and the processor 302 may be provided separately or integrated together.

The bus 304 may include a path for transferring information between the components of the computing device 300 (e.g., the memory 301, the processor 302, and the communication interface 303).

Based on the above implementation environment, while a virus program continuously attacks the files stored in the file system, applications in the upper-layer business system still access the file system, for example by deleting or modifying uninfected files or by writing new files. During a continuous attack, the file system therefore contains both normal files and infected files. In the related art, directly rolling the file system back to a certain snapshot time point means that normal files produced during the continuous virus attack cannot be recovered. For example, see Table 1: (1) at snapshot time point T0, none of the files "1.doc", "10.doc", "a.txt", "d.txt", "a.bmp" and "d.bmp" has been infected; (2) between T0 and T1, some files in the file system are modified through normal access by applications and some are modified by the virus attack, so in the snapshot generated at snapshot time point T1 the files "1.doc" and "10.doc" are infected by the virus program, while the files "a.txt" and "d.txt" have changed compared with T0; (3) between T1 and T2, "a.txt" and "d.txt" remain infected (they are infected files, which applications can hardly access normally, so they usually do not change and stay infected), and "1.doc" and "10.doc" are newly infected, so the snapshot taken at T2 contains four infected files, namely "a.txt", "d.txt", "1.doc" and "10.doc". If the related art is used at this point to roll the file system directly back to snapshot time point T0, the results of the operations on "a.txt" and "d.txt" at T1 are lost, and the modifications to "a.bmp" and "b.bmp" at Tn are lost as well; if the file system is rolled directly back to T1, the restored file system contains infected files and the infection is not cleared. Likewise, when the file system is rolled back to a given time point, other normal operations on files besides modifications (including file deletions and newly written files) are also lost. The related art therefore causes data loss and makes data recovery very inefficient.

In view of this, the present application provides a data recovery method that can recover data accurately and efficiently when a virus continuously attacks the file system, greatly improving the efficiency of data recovery. The data recovery method provided by the present application is described below.

Figure 4 is a flowchart of a data recovery method provided by an embodiment of the present application. As shown in Figure 4, the data recovery method can be applied in the implementation environment shown in Figure 1 above and is executed by a server running the file system. The data recovery method includes the following steps 401 to 404.

401. The server generates a file infection record based on multiple snapshots. The file infection record indicates a snapshot of an infected first file taken before the infection, where each snapshot is a snapshot of the file system or a snapshot of a directory and corresponds to one snapshot time point.

In the embodiments provided in the present application, the object of a snapshot may be the file system (that is, all directories managed by the file system) or individual directories in the file system. The present application does not limit the granularity of the snapshot.

In some embodiments, when there are multiple snapshots of the infected first file taken before the infection, the file infection record may indicate any one of these snapshots, or may indicate the one closest to the infection time (that is, the latest one).
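The second option above (pointing the record at the latest snapshot taken before the infection) can be sketched as follows. This is a minimal illustration, assuming that snapshot time points are sortable values and that `first_infected_time` (a hypothetical name, not taken from the application) is the first snapshot time point at which the file was found infected.

```python
def latest_clean_snapshot(snapshot_times, first_infected_time):
    """Return the latest snapshot time point earlier than the infected one, or None."""
    earlier = [t for t in snapshot_times if t < first_infected_time]
    return max(earlier) if earlier else None


# Hypothetical time points T0..T3 encoded as the integers 0..3.
chosen = latest_clean_snapshot([0, 1, 2, 3], first_infected_time=2)
print(chosen)
```

Choosing the latest pre-infection snapshot keeps as many normal modifications to the file as possible.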

The storage system periodically takes snapshots of the files in the file system, which amounts to protecting the data at regular intervals. When the storage system fails, the data is "rolled back" to its state at a snapshot time point, avoiding large-scale data loss.

In the embodiments of the present application, a snapshot is a usable copy of the file system at a snapshot time point. It should be noted that in this embodiment the object of the snapshot is the file system, that is, a snapshot is taken of all files managed by the file system. In other embodiments, the scope of the snapshot can be narrowed, for example by taking snapshots of only one or more directories. Because file system snapshots and directory snapshots work on the same principle, only the file system is used as an example here.

In some embodiments, the snapshot is a snapshot of the files in the file system. In this example, the snapshot includes a backup of each file at the corresponding snapshot time point together with the file's metadata, where the metadata is used to obtain, from the snapshot, the backup of a specified file at that snapshot time point. In some embodiments, the metadata describes the storage state of the file in the file system at the snapshot time point; for example, the metadata includes information such as the file name, the file suffix, and the data blocks in which the file is located.

In other embodiments, the snapshot is a snapshot of the directories of the file system. In this example, the snapshot includes the directories of the file system at the snapshot time point. Because the directories at each level of the file system form the paths used to access files, the directories recorded in the snapshot can indicate the state of each file and folder in the file system at that snapshot time point. In some embodiments, the directories of the file system can be stored in the form of directory files, and a snapshot of the file system directories can be generated by copying the directory files of the file system at the snapshot time point.

In some embodiments, the multiple snapshots of the file system can be kept in a storage space that does not interfere with the files, for example a snapshot folder under the root directory of the file system that is used for storing snapshots. In some embodiments, the snapshots of the file system can be accessed quickly by accessing this storage space.

In the embodiments of the present application, the snapshots stored periodically in the file system are used to back up files at different time points, so that in most cases a usable historical copy of a file from before the virus attack is saved in time, achieving continuous data protection for the file system.

In the embodiments of the present application, the server compares the multiple snapshots to determine the differences between them, and then performs infection detection on those differences to determine which files in the file system are infected, thereby generating the file infection record.

The file infection record is used to perform data recovery on the file system. In the embodiments of the present application, the file infection record indicates a snapshot of the infected first file taken before the infection; during data recovery, the first file can therefore be restored precisely to its pre-infection state according to the file infection record. In some embodiments, the first file may be an ordinary data file or another type of file such as a directory file, which is not limited in the present application.

In some embodiments, the process in which the server generates the file infection record based on multiple snapshots may include the following steps 1 to 3.

Step 1: Based on the snapshots at a pair of adjacent snapshot time points among the multiple snapshots, the server determines the difference between the snapshot at the later snapshot time point and the snapshot at the earlier snapshot time point of that pair.

A pair of adjacent snapshot time points refers to two consecutive snapshot time points. For example, if the server generates a snapshot every 30 minutes, the snapshot time points 10:30 and 11:00 form a pair of adjacent snapshot time points.
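The notion of adjacent snapshot time points can be illustrated with a short sketch. The 30-minute interval comes from the example above; the calendar date and the helper name `adjacent_pairs` are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def adjacent_pairs(snapshot_times):
    """Return (earlier, later) pairs of consecutive snapshot time points."""
    ordered = sorted(snapshot_times)
    return list(zip(ordered, ordered[1:]))


# Four hypothetical snapshots taken every 30 minutes, from 10:00 to 11:30.
start = datetime(2022, 11, 18, 10, 0)
times = [start + timedelta(minutes=30 * i) for i in range(4)]
pairs = adjacent_pairs(times)

for earlier, later in pairs:
    print(earlier.strftime("%H:%M"), "->", later.strftime("%H:%M"))
```

Each pair is then the input to one snapshot comparison in step 1.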

In some embodiments, the server periodically generates snapshots of the file system. For example, the server generates a snapshot of the file system once every target duration. In some embodiments, the target duration between adjacent snapshot time points can be set according to business requirements; for example, the target duration may be half an hour, one hour, or one day, which is not limited in the present application. In some embodiments, the server may perform step 1 when its load satisfies an idle condition; for example, the server performs step 1 when the kernel CPU usage is below an idle threshold (such as 70%).

In some embodiments, the server compares snapshots in real time after generating a snapshot. In this example, each time the server generates a snapshot, it compares this latest snapshot with the previous one to determine the difference between them, which makes the file infection record more up to date. In other embodiments, at regular intervals, the server compares, in order of snapshot time point and in a single pass, every pair of snapshots at adjacent time points among the multiple snapshots generated during that interval, to determine the difference between each such pair. This reduces the computing load on the server.

In the embodiments of the present application, the difference indicates the changes in the snapshot at the later snapshot time point relative to the snapshot at the earlier snapshot time point. What has changed may be a file, file metadata, or a directory of the file system; the present application does not limit the granularity of the determined difference. Several cases of determining the difference in step 1 are described below as case 1, case 2, and case 3.

Case 1: The difference is a newly added file.

In some embodiments, the server compares the snapshots at the adjacent snapshot time points and determines the files that are newly added in the snapshot at the later snapshot time point relative to the snapshot at the earlier snapshot time point; these newly added files are the difference between the snapshots at the adjacent snapshot time points. In some embodiments, the snapshot is a snapshot of the directories of the file system. By comparing the file system directories included in the snapshots at the adjacent snapshot time points, the server can determine the folders or files newly added in the later snapshot, and thereby determine the newly added files.

Case 2: The difference is a modified file.

In some embodiments, the server compares the snapshots at the adjacent snapshot time points and determines the files that are modified in the snapshot at the later snapshot time point relative to the snapshot at the earlier snapshot time point; these modified files are the difference between the snapshots at the adjacent snapshot time points. In some embodiments, the server can determine the modified files in the later snapshot by comparing the file metadata included in the snapshots at the adjacent snapshot time points. For example, the file metadata includes the file's last modification time and occupied space; by comparing these values across the adjacent snapshots, the server can determine that a given file has been modified.

Case 3: The difference is file metadata.

In some embodiments, the server compares the snapshots at the adjacent snapshot time points and determines the file metadata that has changed in the snapshot at the later snapshot time point relative to the snapshot at the earlier snapshot time point. For example, the changed file metadata may be a file suffix or a file name. In this way, changes that may be caused by virus infection can be detected at metadata granularity as a preliminary screening, and the infected files are then determined precisely through infection detection.
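The three cases above can be sketched together as a comparison of two snapshots. This is only an illustration under stated assumptions: each snapshot is modelled as a dictionary from file path to a small metadata record, and the `FileMeta` fields, the function name `diff_snapshots`, and the sample paths are hypothetical rather than taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    mtime: int  # last modification time (hypothetical unit: seconds)
    size: int   # occupied space in bytes


def diff_snapshots(prev, curr):
    """Compare two snapshots given as {path: FileMeta} dictionaries."""
    added = sorted(p for p in curr if p not in prev)            # case 1
    modified = sorted(p for p in curr
                      if p in prev and curr[p] != prev[p])      # case 2
    removed = sorted(p for p in prev if p not in curr)
    return {"added": added, "modified": modified, "removed": removed}


# Hypothetical snapshots at T0 and T1.
t0 = {"document/1.doc": FileMeta(100, 2048),
      "document/3.doc": FileMeta(100, 4096)}
t1 = {"document/1.doc": FileMeta(160, 2100),
      "document/3.doc.lock": FileMeta(170, 4112)}

difference = diff_snapshots(t0, t1)
print(difference)
```

With path-keyed snapshots, a changed file name or suffix (case 3) shows up as one removed path plus one added path, so a renamed file such as "document/3.doc.lock" still reaches the infection detection in step 2.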

Step 2: The server performs infection detection on the difference to determine that the first file is an infected file.

Infection detection refers to the process of detecting whether a file is infected. In some embodiments, infection detection means determining, according to the way any given virus infects files, whether the file indicated by the difference is infected.

In some embodiments, virus infection causes the file features of a file to change. In this example, the server may determine a first file whose file features have changed to be an infected file, where the file features are used to characterize the file. In some embodiments, the file features can be obtained by analyzing the state of the file, for example whether the file is in an encrypted state. For example, the file feature may be the information entropy of the file; the magnitude of the information entropy characterizes how random the file is and thereby indicates whether the file is in an encrypted state. Understandably, the more random a file is, the more likely it is to be encrypted.

In some embodiments, the virus is ransomware. Ransomware mainly infects files by encrypting them or by tampering with them so that they become inaccessible. Where the infected file is in an encrypted state, its randomness increases greatly, so the infection detection can judge whether a file is encrypted, and hence whether it is infected, based on the file's information entropy. In this example, the process of performing infection detection on the difference may include: for a first file included in the difference, the server determines the information entropy of the first file; when the server detects that the information entropy of the first file falls outside the information entropy interval of the file system, it determines that the first file is an infected file. Here, the information entropy of the first file falling outside the information entropy interval means exceeding the upper limit of the interval.
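The entropy check described above can be sketched as follows. The application does not specify how the information entropy is computed; this sketch assumes byte-level Shannon entropy (in bits per byte, so values lie between 0 and 8), and the function names and the sample upper limit `h_high` are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def exceeds_upper_limit(data: bytes, h_high: float) -> bool:
    """True when the entropy of the content exceeds the interval's upper limit."""
    return shannon_entropy(data) > h_high


plain = b"quarterly report draft " * 50   # repetitive, low-entropy content
scrambled = bytes(range(256)) * 4         # uniform byte histogram, maximal entropy

print(exceeds_upper_limit(plain, h_high=7.0))
print(exceeds_upper_limit(scrambled, h_high=7.0))
```

Only the upper limit is compared, matching the statement that a file counts as infected when its entropy exceeds the upper limit of the interval.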

The above technical solution takes into account that virus infection changes the file features of a file. By exploiting this characteristic of how the virus infects files, infected files can be detected quickly, which speeds up infection detection.

In some embodiments, a reasonable information entropy interval for the file system can be determined by sampling and testing files in the file system. For example, the information entropy interval can be expressed as [H(LOW), H(HIGH)], where H(LOW) is the lower limit of the information entropy and H(HIGH) is the upper limit of the information entropy.

In some embodiments, the sampling can be performed on files in the file system that are updated in real time, so that the information entropy interval is updated dynamically as the business files in the file system change. In this example, the process of determining a reasonable information entropy interval includes: the server samples the files in the file system multiple times, once every sampling interval, to obtain multiple batches of sampled files; based on the information entropy of the multiple batches of sampled files, the server determines multiple reference entropy intervals; and based on these reference entropy intervals, the server determines the information entropy interval of the file system.
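The batch-sampling procedure above can be sketched as follows. The application does not say how a reference entropy interval is derived from a batch or how the reference intervals are combined; this sketch assumes, purely for illustration, that each reference interval is the minimum and maximum entropy in its batch and that the final interval averages the lower and upper limits across batches. All names and numbers are hypothetical.

```python
def reference_interval(batch_entropies):
    """Assumed rule: a batch's reference interval is its (min, max) entropy."""
    return (min(batch_entropies), max(batch_entropies))


def combine_intervals(intervals):
    """Assumed rule: average the lower limits and the upper limits separately."""
    lows = [low for low, _ in intervals]
    highs = [high for _, high in intervals]
    return (sum(lows) / len(lows), sum(highs) / len(highs))


# Hypothetical entropies (bits per byte) of three batches of sampled files.
batches = [[3.0, 5.0, 4.0], [3.5, 5.5], [2.5, 4.5]]
references = [reference_interval(batch) for batch in batches]
h_low, h_high = combine_intervals(references)
print(references, (h_low, h_high))
```

Other combination rules (for example taking the widest span, or using percentiles to discard outliers) would fit the same description; the choice affects how sensitive the resulting upper limit H(HIGH) is to a few already-encrypted files in a sample.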

In other embodiments, considering that some of the files stored in the file system are themselves encrypted files, the sampling can instead be performed on some of the file system's early files, so that the information entropy interval of the file system matches the original file storage characteristics of the file system.

The above technical solution can effectively improve the accuracy of infection detection and its applicability to the file system.

In other embodiments, the ways in which ransomware infects files include modifying the file suffix. In this example, the server may determine a first file whose file suffix contains a virus identifier to be an infected file. In some embodiments, the virus identifier may be a special suffix that indicates ransomware, for example ".lock" or ".encrypt".

In other embodiments, the difference detected in step 1 includes file metadata. For the file metadata of a first file included in the difference, the server detects whether the file suffix included in the file metadata contains a virus identifier. If the virus identifier is present, and the snapshot at the earlier of the adjacent snapshot time points also contains the first file with the same file name, it can be determined that the first file was already infected at the later of the adjacent snapshot time points.
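The suffix check above can be sketched as follows. The two markers are the examples named in the text; interpreting "the same file name" as the path with the virus suffix stripped is an assumption of this sketch, as are the function names and sample paths.

```python
VIRUS_SUFFIXES = (".lock", ".encrypt")  # example markers named in the text


def strip_marker(path):
    """Return the path without its virus suffix, or None if it has no marker."""
    for suffix in VIRUS_SUFFIXES:
        if path.endswith(suffix):
            return path[: -len(suffix)]
    return None


def infected_by_suffix(changed_paths, prev_paths):
    """Map each infected path to the matching path in the earlier snapshot."""
    result = {}
    for path in changed_paths:
        original = strip_marker(path)
        # Require the original file to exist in the earlier snapshot.
        if original is not None and original in prev_paths:
            result[path] = original
    return result


prev_paths = {"document/1.doc", "document/3.doc"}
changed = ["document/3.doc.lock", "notes/new.lock", "document/1.doc"]
matches = infected_by_suffix(changed, prev_paths)
print(matches)
```

Here "notes/new.lock" is not reported because no matching file exists in the earlier snapshot, which mirrors the second condition in the paragraph above.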

The above technical solution takes into account that the ways in which the virus infects files include modifying the file suffix. By exploiting this characteristic, infected files can be detected quickly, which speeds up infection detection.

Step 3: The server records the first file as an infected file in the file infection record.

In the embodiments of the present application, steps 1 and 2 above establish that the first file is infected, and that it was not yet infected in the snapshot at the earlier snapshot time point but is infected in the snapshot at the later snapshot time point. Therefore, by indicating in the file infection record the snapshot of the first file from before the infection, the first file can be restored precisely to its pre-infection state according to the file infection record during data recovery.

In some embodiments, the server stores in the file infection record the file metadata of the first file in the pre-infection snapshot (the pre-infection file metadata). The pre-infection file metadata can be used to locate the pre-infection snapshot, from which the uninfected first file is then obtained.

In some embodiments, the server stores in the file infection record both the file metadata of the first file in the pre-infection snapshot (the pre-infection file metadata) and the file metadata of the first file in the post-infection snapshot (the post-infection file metadata). The post-infection file metadata indicates the snapshot time point at which the first file was already infected; the pre-infection file metadata indicates the snapshot time point at which the first file was not yet infected.

In some embodiments, because it can be determined that the first file was infected during the period between the adjacent snapshot time points, the file infection record can approach the effect of recording file infections in real time when the system's recovery precision meets certain requirements, for example when the target duration for generating snapshots (that is, the interval between adjacent snapshot time points) is sufficiently small.

In some embodiments, the pre-infection file metadata of the first file and the post-infection file metadata of the first file are associated with each other. In some embodiments, the server detects the infected first file based on the snapshots corresponding to a pair of adjacent snapshot time points; when generating the file infection record for this detection, the server can establish in the file infection record a mapping between the post-infection file metadata and the pre-infection file metadata of the first file, so that the two are associated for this detection.
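The association between post-infection and pre-infection file metadata can be sketched as a small lookup structure. This is an illustrative data model only: the class names, field names, and sample paths are assumptions, and the file metadata is reduced to a storage path for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class InfectionEntry:
    infected_snapshot: str  # snapshot in which the file is already infected
    infected_path: str      # post-infection file metadata (path only)
    clean_snapshot: str     # adjacent earlier snapshot, file not yet infected
    clean_path: str         # pre-infection file metadata (path only)


class FileInfectionRecord:
    def __init__(self) -> None:
        self._by_infected_path: Dict[str, InfectionEntry] = {}

    def add(self, entry: InfectionEntry) -> None:
        # Keep the first detection so the mapping points at the clean copy.
        self._by_infected_path.setdefault(entry.infected_path, entry)

    def clean_source(self, infected_path: str) -> Optional[Tuple[str, str]]:
        """Return (snapshot, path) of the uninfected copy, or None."""
        entry = self._by_infected_path.get(infected_path)
        return (entry.clean_snapshot, entry.clean_path) if entry else None


record = FileInfectionRecord()
record.add(InfectionEntry("T1", "document/1.doc", "T0", "document/1.doc"))
record.add(InfectionEntry("T1", "document/3.doc.lock", "T0", "document/3.doc"))
print(record.clean_source("document/3.doc.lock"))
```

During recovery, a lookup by the infected file's metadata returns the snapshot and location from which the uninfected copy can be read.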

In some embodiments, the file infection record can be stored in the form of a file infection metadata database. The present application provides an illustration of such a database in Table 2. The file infection metadata database holds the file infection records obtained by comparing multiple pairs of adjacent snapshot time points and performing infection detection, where T0 and T1 are a pair of adjacent snapshot time points, and T1 and T2 are a pair of adjacent snapshot time points. Taking the records obtained from comparing T0 with T1 as an example, the table records the file storage paths, under the snapshot corresponding to T1, of the files "1.doc", "2.doc" and "3.doc.lock" that are infected at T1, together with the file storage paths, under the snapshot corresponding to T0, of the uninfected files at T0 that correspond to each infected file. Table 2 uses the file storage path included in the file metadata as an example: storing the file storage paths under the corresponding snapshots before and after infection ensures that the storage locations of a file before and after infection can be obtained accurately from the file infection record. In Table 2, "document/1.doc" under T1 and "document/1.doc" under T0 indicate, respectively, the storage location of the infected file "T1-1.doc" at T1 and the storage location of the uninfected file "T0-1.doc" at T0.

As shown in Table 2, the file suffix of the infected file "T1-3.doc.lock" contains a virus identifier, which indicates that the virus tampered with the file "T0-3.doc" at least by modifying its file suffix. In this example, the server can detect the infected file "T1-3.doc.lock" by the virus identifier detection described above. The file suffix of the infected file "T1-1.doc" contains no virus identifier, which indicates that the virus may have tampered with the file "T0-1.doc" in some other way, for example by encrypting the file content or by replacing the file content with garbled text or a ransom note. In this example, the server can detect the infected file "T1-1.doc" by the file information entropy calculation described above. The "……" in Table 2 indicates omitted entries.

Table 2

Through steps 1 to 3 above, the server determines the differences between snapshots at adjacent snapshot time points and runs infection detection on them, which identifies the infected first file accurately and in near real time and indicates, in the file infection record, the pre-infection snapshot that can be used to restore the first file. It can be understood that the above description uses only the first file among the infected files as an example. In practice, by the same principle, multiple infected files may be detected for one pair of adjacent snapshot time points, so the file infection record (entry) for that pair may hold file metadata for multiple infected files. In other embodiments, no infected file may be detected; this application does not limit this.
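
Steps 1 to 3 can be illustrated with a minimal Python sketch. It is not the implementation of this application: the snapshot representation (a dict from path to bytes), the names `detect` and `looks_infected`, the suffix list, the entropy threshold and the rule that pairs a renamed file with its original are all assumptions made for illustration.

```python
import math
from collections import Counter

VIRUS_SUFFIXES = (".lock",)   # assumed list of known virus marks
ENTROPY_THRESHOLD = 7.5       # assumed cutoff, in bits per byte


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def looks_infected(path: str, content: bytes) -> bool:
    """Step 2: virus mark in the suffix, or content that looks encrypted."""
    if path.lower().endswith(VIRUS_SUFFIXES):
        return True
    return shannon_entropy(content) > ENTROPY_THRESHOLD


def detect(prev_snap, next_snap, prev_time, next_time):
    """Run steps 1 to 3 on one pair of adjacent snapshots (path -> bytes)."""
    removed = sorted(p for p in prev_snap if p not in next_snap)
    records = []
    for path, content in next_snap.items():
        if prev_snap.get(path) == content:
            continue  # step 1: no difference between the two snapshots
        if not looks_infected(path, content):
            continue  # step 2: changed, but not judged infected
        if path in prev_snap:
            original = path
        else:
            # Assumed pairing rule: "3.doc.lock" derives from a vanished "3.doc".
            original = next((p for p in removed if path.startswith(p + ".")), None)
        # Step 3: record post-infection and pre-infection metadata together.
        records.append({"post": (next_time, path), "pre": (prev_time, original)})
    return records
```

Each returned entry pairs a path under the later snapshot with a path under the earlier snapshot, which mirrors the two path columns that the description attributes to Table 2.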

In other embodiments, the server can provide a virus infection report to the file system management end based on the file infection record. In some embodiments, the virus infection report includes the results of detecting infected files in the file system from periodic snapshots, that is, which files were infected at which snapshot time points. On this basis, an alarm can be raised promptly to the file system management end so that it initiates the data recovery process. In other embodiments, the file system management end can verify the accuracy of the detection results based on the virus infection report and feed the verification results back to the file system, further improving the accuracy of the generated file infection record.

Based on the above description of step 401, and building on the implementation environment of FIG. 1, this application provides a functional schematic diagram of a file system; see FIG. 5. The file system includes multiple functional modules. The file storage service module 501 provides file services and snapshot services (supporting the aforementioned backup features such as high-density snapshots and high-density cloning) and can also cooperate with the data recovery module to carry out data recovery. The infection detection module 502 determines the infected files and generates the file infection record, for example by executing steps 1 to 3 above. The data recovery module 503 restores the file system based on the file infection record, for example by executing step 402 below. For more on the file system, file system management end, application program and virus program in FIG. 5, see the content corresponding to FIG. 1, which is not repeated here.

402. In response to a data recovery instruction, the server obtains recovery data, based on the file infection record, from at least one snapshot whose snapshot time point is no later than the infection time of the first file, where the recovery data includes the uninfected first file obtained from the pre-infection snapshot.

The data recovery instruction instructs that the file system be restored. In some embodiments it is sent by the file system management end; in other embodiments it can also be sent by an application program with management authority in the business system. This application does not limit this.

The recovery data includes the file copies used to restore the infected files in the file system.

In some embodiments, the file infection record records the pre-infection file metadata of the first file (see step 3 above), and the pre-infection snapshot corresponding to that metadata is a snapshot no later than the infection time of the first file. In this example, in response to the data recovery instruction, the server can determine the pre-infection file metadata of the first file from the file infection record and then obtain the uninfected first file (that is, the uninfected file copy) from the snapshot that the metadata indicates. In this way the uninfected file copy is obtained quickly from the recorded pre-infection file metadata, which gives the data recovery process an efficient lookup and improves the efficiency of data recovery.

In other embodiments, the file infection record also records the post-infection file metadata of the first file, and the pre-infection and post-infection file metadata of the first file correspond to each other (see step 3 above). In this example, determining the pre-infection file metadata of the first file from the file infection record includes: the server determines, based on the data recovery instruction, the post-infection file metadata of the first file from the file infection record. By reading the post-infection metadata, the server learns the earliest snapshot time point at which the first file was recorded after being infected; that is, the snapshot time point corresponding to the post-infection metadata can be taken as the approximate time at which the first file was infected. On this basis, the server can determine the pre-infection file metadata of the first file from its post-infection file metadata.

In some embodiments, using the mapping between post-infection and pre-infection file metadata, the server reads the corresponding pre-infection file metadata directly from the file infection record, given the post-infection file metadata. The record thus holds not only the pre-infection file metadata used to obtain a usable file copy but also the post-infection file metadata that indicates the infection time of the file, which gives later data recovery for any time point a complete picture of the infection and further improves the efficiency of data recovery.

In some embodiments, the server can generate a file recovery task table based on the file infection record and then, according to that table, obtain the recovery data of the file system from the snapshot taken before the first file was infected.

In some embodiments, a task item in the file recovery task table indicates an infected file and the pre-infection file metadata of that file. In other embodiments, the file infection record records the post-infection file metadata of each infected file, and the snapshot time point corresponding to that metadata indicates the infection time of the file. The server can therefore scan the file infection records (entries) of the infected files in the order of the snapshot time points corresponding to their post-infection file metadata and generate the task items of the file recovery task table.
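
Generating the task table in infection-time order might look like the following sketch. The tuple layouts `InfectionEntry` and `RecoveryTask`, and the use of integers for snapshot time points, are hypothetical choices for illustration; the application does not specify a record format.

```python
from typing import List, NamedTuple


class InfectionEntry(NamedTuple):
    post_time: int   # snapshot time point of the post-infection metadata
    post_path: str   # path of the infected file under that snapshot
    pre_time: int    # snapshot time point of the pre-infection metadata
    pre_path: str    # path of the clean file under that snapshot


class RecoveryTask(NamedTuple):
    infected_path: str
    source_snapshot: int
    source_path: str


def build_task_table(entries: List[InfectionEntry]) -> List[RecoveryTask]:
    """Scan entries in order of infection time and emit one task per file."""
    ordered = sorted(entries, key=lambda e: e.post_time)
    return [RecoveryTask(e.post_path, e.pre_time, e.pre_path) for e in ordered]
```

Because `sorted` is stable, entries that share a snapshot time point keep the order in which they were recorded.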

In other embodiments, the server generates a clone file system in response to the data recovery instruction and then, based on the file infection record, obtains recovery data from at least one snapshot no later than the infection time of the first file. This process works in the same way as the process of obtaining recovery data described above and is not repeated here. The clone file system is a complete, usable copy of the file system at a specified time point.

In some embodiments, the server can clone the file system using the aforementioned high-density cloning feature.

In other embodiments, the data recovery instruction targets a target time point; that is, it instructs that the file system be restored to the state corresponding to the target time point. In this example, if the target time point is a snapshot time point, the server directly obtains the snapshot corresponding to the target time point, determines from the file infection record the first file that is infected at the target time point, and then obtains, from at least one snapshot whose snapshot time point is no later than the infection time of the first file, the recovery data for restoring the file system to its state at the target time point. If the target time point is not a snapshot time point, the server determines the snapshot time point closest to the target time point and performs the same process based on that snapshot time point to obtain the recovery data. The snapshot corresponding to the target time point contains both uninfected files and the infected first file; the above process yields the recovery data needed to restore the infected file precisely, so that the full set of clean files of the file system at the target time point is restored efficiently.
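
Choosing the snapshot time point closest to a target time point can be sketched as a binary search over the sorted snapshot times. The function name `nearest_snapshot_time` and the tie-breaking rule (prefer the earlier snapshot) are assumptions; the description only says the closest snapshot time point is used.

```python
from bisect import bisect_left


def nearest_snapshot_time(snapshot_times, target):
    """Return the entry of snapshot_times closest to target.

    snapshot_times must be non-empty and sorted in ascending order.
    On a tie the earlier snapshot is chosen (an assumption of this sketch).
    """
    i = bisect_left(snapshot_times, target)
    if i == 0:
        return snapshot_times[0]
    if i == len(snapshot_times):
        return snapshot_times[-1]
    before, after = snapshot_times[i - 1], snapshot_times[i]
    return before if target - before <= after - target else after
```

When the target is itself a snapshot time point, the function returns it unchanged, so the same call covers both cases described above.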

In some embodiments, the data recovery instruction can also instruct that a file or directory in the file system at the target time point be restored; this application does not limit the granularity of data recovery. In this example, the data recovery instruction can carry the file identifier (such as the file name) of the file to be restored, or the directory to be restored (such as a folder under the root directory), to indicate the object to be restored precisely. In this example, the recovery data can include the directory files at the target time point. This provides recovery at multiple granularities, covering infected files, folders and the whole file system, which improves the efficiency and flexibility of data recovery.
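
The three granularities can be illustrated by a small selection function over the infected paths held in the file infection record. The function `select_targets` and its parameters are hypothetical, and "/"-separated paths are assumed.

```python
def select_targets(infected_paths, file_name=None, directory=None):
    """Pick the infected paths that a recovery instruction refers to.

    file_name: restore only files with this name (file granularity).
    directory: restore everything under this folder (directory granularity).
    Neither given: restore every infected file (file system granularity).
    """
    if file_name is not None:
        return [p for p in infected_paths if p.rsplit("/", 1)[-1] == file_name]
    if directory is not None:
        prefix = directory.rstrip("/") + "/"
        return [p for p in infected_paths if p.startswith(prefix)]
    return list(infected_paths)
```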

Steps 401 and 402 above describe generating the file infection record and obtaining the recovery data. A detailed example of file infection and recovery follows.

In some embodiments, the file system stores files using an object storage structure, in which one object can contain one file together with that file's metadata. For example, an object consists of an object key, an object value and object metadata. The object key is the identifier of the object; it can be understood as the storage path of the file and can be used to look the file up. The object value is the content of the file (object content). The object metadata contains attribute information about the file, such as its modification time and size. Take a file system that generates a snapshot every hour as an example:

In the 10 o'clock snapshot, the file "ABC.TXT" has not yet been infected. Inside the file system, object 1 is used as the object identifier of the file, and in the file system directory the file appears as "ABC.TXT". The object identifier points to the file "ABC.TXT".

In the 11 o'clock snapshot, the suffix of the file "ABC.TXT" has been changed to the ransomware suffix "LOCK", so the infected file appears in the file system directory as "ABC.LOCK". Inside the file system, a new object identifier, object 2, is now used as the object identifier of the file; the file is infected.

In the 12 o'clock snapshot, because the file is already infected, applications can no longer recognize it correctly and will not modify it again. The file still appears as "ABC.LOCK" in the file system directory, and its object identifier inside the file system is still object 2.

Based on the 10 o'clock and 11 o'clock snapshots, the method provided in this embodiment determines in step 1 that "ABC.TXT" has changed, determines in step 2 that "ABC.LOCK", whose file suffix contains the virus mark "LOCK", is an infected file, and in step 3 records the file "ABC.TXT" as an infected file in the file infection record for this detection. In some embodiments, the file infection record can indicate the 10 o'clock snapshot as the pre-infection snapshot of "ABC.TXT" and the 11 o'clock snapshot as its post-infection snapshot. For example, the object metadata corresponding to object 1 can be recorded as the pre-infection file metadata and the object metadata corresponding to object 2 as the post-infection file metadata. In terms of Table 2 above, the file "ABC.TXT" in the 10 o'clock snapshot is the uninfected file and the file "ABC.LOCK" in the 11 o'clock snapshot is the infected file.

Between the 11 o'clock and 12 o'clock snapshots the file "ABC.LOCK" has not changed, so the file infection record does not record the infection of this file again. The same holds between later snapshots after 12 o'clock, which is not repeated here.

Suppose that at some time after 12 o'clock the server receives a data recovery instruction to restore the file "ABC.LOCK". Once infected, the file can no longer be accessed normally by applications, so its content (and, correspondingly, its metadata) stays unchanged, that is, it remains as it was when first infected. At the current moment, therefore, the file to be restored appears in the file system as "ABC.LOCK", and its file metadata is still the object metadata corresponding to object 2 (hereinafter "object 2"). From the metadata "object 2" of the file to be restored, the server can find the post-infection file metadata "object 2" stored in the file infection record. Because the file infection record records that object 2 corresponds to object 1, the server can determine the pre-infection file metadata "object 1" from "object 2" and then obtain the corresponding uninfected file "ABC.TXT" directly from the 10 o'clock snapshot to which "object 1" belongs, restoring the file "ABC.LOCK" precisely at that time. To aid understanding, Table 3 illustrates this search for the pre-infection file metadata.

Table 3

In other embodiments, suppose that at some time after 12 o'clock the server receives a data recovery instruction for the file system. The file still appears in the file system as "ABC.LOCK", so from the post-infection file metadata "object 2" stored in the file infection record the server can determine that the file "ABC.LOCK" is infected. Because the file infection record records a mapping between "object 2" and "object 1", object 1 can be found from object 2, and from the stored pre-infection file metadata "object 1" the corresponding uninfected file "ABC.TXT" can be obtained efficiently from the 10 o'clock snapshot to restore the file "ABC.LOCK" precisely at that time.
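
The "ABC.TXT" example can be traced with a small in-memory model. This is only a sketch: the dictionary layout, the function `find_clean_copy` and the file contents are invented for illustration, while the object identifiers, file names and snapshot times come from the example above.

```python
# Snapshots taken every hour: snapshot time -> {object identifier: content}.
# The byte strings are placeholder contents, not data from this application.
snapshots = {
    "10:00": {"object 1": b"original text"},
    "11:00": {"object 2": b"\x8f\x02\xd1 encrypted"},
    "12:00": {"object 2": b"\x8f\x02\xd1 encrypted"},
}

# File infection record written when the 10:00 and 11:00 snapshots were
# compared: post-infection metadata maps to pre-infection metadata.
infection_record = {
    "object 2": {
        "post_snapshot": "11:00", "post_name": "ABC.LOCK",
        "pre_object": "object 1", "pre_snapshot": "10:00", "pre_name": "ABC.TXT",
    },
}


def find_clean_copy(object_id):
    """Follow object 2 -> object 1 and fetch the clean file from its snapshot."""
    entry = infection_record.get(object_id)
    if entry is None:
        return None  # not recorded as infected
    content = snapshots[entry["pre_snapshot"]][entry["pre_object"]]
    return entry["pre_name"], content
```

Calling `find_clean_copy("object 2")` goes straight to the 10 o'clock snapshot; no other snapshot is opened or tried.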

It should be noted that the above process uses suffix modification as the example infection method; the embodiments of this application do not limit how the virus infects files.

403. The server performs data recovery based on the recovery data.

In some embodiments, based on the recovery data, the server can overwrite the infected first file in the current file system with the uninfected first file, thereby restoring the first file in the file system.

In some embodiments, the server obtains the uninfected first file by executing the task item for the first file in the file recovery task table described above and then overwrites the infected first file in the current file system with it.

In some embodiments, the server performs the data recovery, based on the recovery data, in a clone file system of the current file system; the recovery process works in the same way as above.

404. In response to an access request for the first file, the server accesses the uninfected first file in the pre-infection snapshot.

Step 404 can be performed once step 401 above has completed.

In some embodiments, the access request for the first file carries an identifier of the first file, for example its file name. In some embodiments, the access request is sent by an application program in the business system. In other embodiments, the access request can target the file metadata, for example a directory listing of the folder containing the file; this application does not limit this.

In some embodiments, an application program running in the business system acts as a remote client and accesses the file system over the network. For example, the client can access the file system using network file system (NFS) or common internet file system (CIFS) technology; this application does not limit this.

In some embodiments, the access request may reach the server before the first file has been fully restored, for example before step 403 has completed. In that case, in response to the access request for the first file, the server accesses the uninfected first file in the pre-infection snapshot.

In some embodiments, in response to the access request for the first file, the server first determines from the file infection record whether the first file is infected. If it is, the server determines from the uncompleted task items in its file recovery task table whether the first file has been fully restored. If it has not, the server accesses the uninfected first file in the pre-infection snapshot according to the pre-infection file metadata of the first file. This process is also called redirected access to the first file.

In other embodiments, when the first file has been fully restored, the server can directly access the restored first file in the current file system.

In other embodiments, the file system has a recovery status flag. When data recovery is performed on the file system, for example in response to a data recovery instruction, the server sets the recovery status flag of the file system to the pending recovery state, which indicates that the file system is undergoing data recovery. In this example, in response to an access request for the first file, the server first checks the recovery status flag of the file system. If the flag is in the pending recovery state, the server performs the checks described above: whether the first file is infected and whether it has been fully restored. If the flag is not in the pending recovery state, the server can directly access the restored first file in the current file system. Setting this flag speeds up access to the first file while data recovery runs in the background.

In the embodiments of this application, the file system periodically takes snapshots of the file system (or of a directory), and the storage system compares, periodically or in real time, the pair of snapshots (pre-infection and post-infection) corresponding to each pair of adjacent earlier and later snapshot time points. If the difference in the same file between the two snapshots matches the characteristics of ransomware infection, the file is determined to have been infected within the time range between the earlier and later snapshot time points. The file infection record therefore records the uninfected snapshot (pre-infection snapshot) for the file, which captures precisely the latest data of the file before infection. When the user wants to restore the infected file, reading the file infection record locates directly the last snapshot that recorded the file before it was infected, and the infected file is restored precisely from that pre-infection snapshot. The related art described earlier instead tries snapshots at successive snapshot time points one by one until the latest snapshot before infection is found; the data recovery method provided in the embodiments of this application is more efficient.

The above technical solution records file infection efficiently, so that infected files can be restored precisely, the data loss caused by snapshot rollback is avoided and the efficiency of data recovery is improved. Further, it runs infection detection on periodically obtained file copies, generates file infection records in near real time, and can give users the details of infected files in the form of a virus infection report. It also provides precise, near-real-time data recovery that combines foreground and background processing: applications or users are given clean, normally accessible files in near real time, so the upper-layer business system neither reads infected files nor has to wait for file recovery to complete, which improves the flexibility and applicability of data recovery across business scenarios.

The above process uses ransomware as an example. The data recovery method provided in the embodiments of this application can also recover data from file infections caused by other types of viruses; this application does not limit this.

Based on the embodiment corresponding to FIG. 4 and the file system functions introduced with FIG. 5, this application provides a flowchart of another data recovery method; see FIG. 6.

The data recovery method shown in FIG. 6 includes a protection process, a detection process, a recovery process and an access process, each of which is introduced below.

Referring to FIG. 6, the protection process in the data recovery method provided in this embodiment is executed by the file storage service module 501. In the protection process, the file system generates snapshots at multiple snapshot time points, for example at T0, T1 and T2. The protection capability is the file system's ability to generate consistent copies, for example through the high-density snapshot, high-density cloning and high-density continuous data protection features introduced earlier.

Referring to FIG. 6, the detection process in the data recovery method provided in this embodiment is executed by the infection detection module 502. It has two parts: periodic infection detection and virus infection report generation. Periodic infection detection includes: obtaining the differences in the file system (see step 1); performing infection detection on the differences, including detection based on information entropy (see step 2); and determining the infected files (see step 2). Virus infection report generation includes: generating the file infection record (see step 3), of which Table 2 above is the instance shown in FIG. 6; and sending the virus infection report to the file system management end (see step 3), which can verify the file infection record.

Referring to FIG. 6, the recovery process in the data recovery method provided in this embodiment is executed by the data recovery module 503. The recovery process includes (1) to (3) below.

(1) Generate a clone file system based on the target time point Tn indicated by the data recovery instruction (see step 402).

(2) Set the recovery status flag of the file system to the pending recovery state (Flag).

(3) Perform data recovery, including (3-1) to (3-3) below.

(3-1) The data recovery module determines the infected files in the file system from the file infection record and generates the file recovery task table.

(3-2) The data recovery control module executes the task items in the file recovery task table: each infected file is overwritten with its pre-infection file.

(3-3)待数据恢复完成后，清除文件系统的待恢复状态Flag，清除文件恢复任务表。(3-3) After the data recovery is completed, the pending recovery status Flag of the file system is cleared, and the file recovery task table is cleared.
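Steps (2) to (3-3) above can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class names, field names and dictionary-based file system are hypothetical, and the generation of the clone file system in step (1) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class InfectionEntry:
    path: str            # path of the infected file in the live file system
    clean_snapshot: str  # id of the snapshot holding the pre-infection copy


@dataclass
class FileSystem:
    files: dict                                     # path -> content (live file system)
    snapshots: dict                                 # snapshot id -> {path: content}
    pending_recovery: bool = False                  # the pending-recovery Flag
    task_table: dict = field(default_factory=dict)  # path -> recovery finished?


def recover(fs: FileSystem, infection_record: list) -> None:
    fs.pending_recovery = True                                  # step (2)
    fs.task_table = {e.path: False for e in infection_record}   # step (3-1)
    for e in infection_record:                                  # step (3-2)
        fs.files[e.path] = fs.snapshots[e.clean_snapshot][e.path]
        fs.task_table[e.path] = True
    fs.pending_recovery = False                                 # step (3-3)
    fs.task_table.clear()
```

Only the files listed in the infection record are overwritten; files that were never infected keep their current content, which is what distinguishes this approach from a whole-snapshot rollback.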

Referring to FIG. 6, the access process of the data recovery method provided in the embodiments of the present application is executed by the file storage service module 501. When the file system management end initiates recovery of the file system, that is, while the file system performs data recovery based on the clone file system (generated at Tn), the upper-layer service system accesses the real-time (Tn+1) file system. In this case, the access process includes the following step A and step B.

Step A: An application or any user in the upper-layer service system accesses data in the current file system over NFS or CIFS.

Step B: Check whether the pending-recovery Flag is present. If not, the file to be accessed is accessed normally in the current file system. If so, determine whether the file to be accessed is infected by querying the file infection record to determine whether the file was infected before Tn. If it was not infected, it is accessed normally in the current file system. If it was infected, query the file recovery task table to determine whether recovery of the file is complete: if so, access to the file in the current file system is supported; if not, the access is redirected to the pre-infection file according to the pre-infection file metadata of the file.
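The decision logic of step B can be sketched as a single function. The function name, parameter names and return values are hypothetical and only illustrate the order of the checks described above.

```python
def resolve_read(path: str,
                 pending_recovery: bool,
                 infected_before_tn: set,
                 task_done: dict) -> str:
    """Decide where a read of `path` is served from.

    Returns "live" for the current file system, or "snapshot" when the read
    must be redirected to the pre-infection copy.
    """
    if not pending_recovery:
        return "live"      # no recovery in progress
    if path not in infected_before_tn:
        return "live"      # file was not infected before Tn
    if task_done.get(path, False):
        return "live"      # file has already been restored
    return "snapshot"      # infected and not yet restored: redirect
```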

For the implementation principles of the processes in FIG. 6, refer to the corresponding steps in the embodiment corresponding to FIG. 4 above; details are not repeated here.

FIG. 7 is a schematic structural diagram of a data recovery apparatus provided in an embodiment of the present application. As shown in FIG. 7, the data recovery apparatus includes: a generation module 701, configured to generate a file infection record based on a plurality of snapshots, where the file infection record indicates the snapshot of an infected first file from before the infection, each snapshot is a snapshot of a file system or a snapshot of a directory, and each snapshot corresponds to one snapshot time point; and

a recovery module 702, configured to, in response to a data recovery instruction, obtain recovery data, based on the file infection record, from at least one snapshot whose snapshot time point is not later than the infection time of the first file, and perform data recovery based on the recovery data, where the recovery data includes the uninfected first file obtained from the pre-infection snapshot.

In a possible implementation, the generation module 701 includes:

a difference determining unit, configured to determine, based on the snapshots at a pair of adjacent snapshot time points among the plurality of snapshots, the difference between the snapshot at the later of the two adjacent snapshot time points and the snapshot at the earlier of the two;

an infection detection unit, configured to perform infection detection on the difference to determine that the first file is an infected file; and

a recording unit, configured to record the first file as an infected file in the file infection record.
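The cooperation of the three units can be sketched as follows, assuming each snapshot is modelled as a mapping from file path to a content digest and the difference consists of newly added or modified files. The function names, the record layout and the `is_infected` callback are assumptions for illustration; the actual detection may combine suffix checks, file-characteristic checks and entropy checks.

```python
from typing import Callable, Optional


def snapshot_diff(prev: dict, curr: dict) -> list:
    """Return the paths that are new or modified in `curr` relative to `prev`."""
    return sorted(p for p, digest in curr.items() if prev.get(p) != digest)


def build_infection_record(prev_id: str, prev: dict,
                           curr_id: str, curr: dict,
                           is_infected: Callable[[str], bool]) -> list:
    """Run detection on the difference only and record the infected files."""
    record = []
    for path in snapshot_diff(prev, curr):
        if is_infected(path):
            # A newly added file has no pre-infection copy in the earlier snapshot.
            clean: Optional[str] = prev_id if path in prev else None
            record.append({"path": path,
                           "detected_in": curr_id,
                           "clean_snapshot": clean})
    return record
```

Restricting detection to the difference between adjacent snapshots keeps each periodic scan proportional to the amount of changed data rather than to the size of the whole file system.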

In a possible implementation, the infection detection unit is configured to:

In a possible implementation, the recovery module 702 includes:

a first determining unit, configured to determine the pre-infection file metadata of the first file from the file infection record; and

an acquisition unit, configured to acquire the uninfected first file from the snapshot indicated by the pre-infection file metadata of the first file.
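A minimal sketch of the two units is given below. It assumes the file infection record is keyed by the path of the infected file and that the pre-infection file metadata consists of a snapshot identifier and the file's path inside that snapshot; both the `PreInfectionMeta` type and this record layout are hypothetical.

```python
from typing import NamedTuple


class PreInfectionMeta(NamedTuple):
    snapshot_id: str  # snapshot that still holds the clean copy
    path: str         # path of the file inside that snapshot


def fetch_clean_copy(record: dict, snapshots: dict, infected_path: str) -> bytes:
    """Look up the pre-infection metadata, then read the clean copy from the snapshot."""
    meta = record[infected_path]                    # first determining unit
    return snapshots[meta.snapshot_id][meta.path]   # acquisition unit
```

Keeping the original path in the metadata matters when the infection renames the file, for example by appending a suffix, because the clean copy is then stored under a different name than the infected one.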

In a possible implementation, the file infection record further records post-infection file metadata of the first file, and the pre-infection file metadata of the first file is associated with the post-infection file metadata of the first file; the first determining unit is configured to:

In a possible implementation, the recovery module 702 is configured to:

In a possible implementation, the apparatus further includes:

an access module, configured to, when recovery of the first file is not yet complete, access the uninfected first file in the pre-infection snapshot in response to an access request for the first file.

With the above apparatus, file infections can be recorded efficiently, so that infected files can be restored precisely, the data loss caused by snapshot rollback is effectively avoided, and the efficiency of data recovery is greatly improved. Further, file infection records can be generated in near real time by performing infection detection on periodically obtained file copies, and details of the infected files can be provided to the user in the form of a virus infection report. In addition, near-real-time precise data recovery combining foreground and background processing is provided, which can supply applications or users with clean, normally accessible files in near real time. This ensures that the upper-layer service system neither reads infected files nor has to wait for file recovery to complete, which greatly improves the flexibility and applicability of data recovery across service scenarios.

In addition, in the above data recovery apparatus, both the generation module 701 and the recovery module 702 can be implemented in software or in hardware. By way of example, the implementation of the generation module 701 is described next. The recovery module 702 and the other modules can be implemented in a similar way.

As an example in which a module is a software functional unit, the generation module 701 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more such computing instances. For example, the generation module 701 may include code running on multiple hosts, virtual machines, or containers. It should be noted that the multiple hosts, virtual machines, or containers used to run the code may be distributed in the same region or in different regions. Further, they may be distributed in the same availability zone (AZ) or in different AZs, where each AZ includes one data center or multiple geographically close data centers. A region typically includes multiple AZs.

Similarly, the multiple hosts, virtual machines, or containers used to run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Typically, one VPC is deployed within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway needs to be set up in each VPC, and the VPCs are interconnected through the communication gateways.

As an example in which a module is a hardware functional unit, the generation module 701 may include at least one computing device. Alternatively, the generation module 701 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The multiple computing devices included in the generation module 701 may be distributed in the same region or in different regions, and in the same AZ or in different AZs. Similarly, they may be distributed in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs. In addition, the data recovery apparatus provided in the above embodiment and the data recovery method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.

It should be noted that the information (including but not limited to user equipment information and user personal information), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the files and file metadata involved in this application are all obtained with full authorization.

In this application, terms such as "first" and "second" are used to distinguish between identical or similar items whose roles and functions are substantially the same. It should be understood that "first", "second", and "nth" have no logical or temporal dependency on one another and do not limit quantity or execution order. It should also be understood that although the description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. The terms are only used to distinguish one element from another. For example, without departing from the scope of the various described examples, a first file may be referred to as a second file, and similarly, a second file may be referred to as a first file. Both the first file and the second file may be files and, in some cases, may be separate and different files.

In this application, the term "at least one" means one or more, and the term "a plurality of" means two or more; for example, a plurality of files means two or more files.

The above description covers only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of program structure information. The program structure information includes one or more program instructions. When the program instructions are loaded and executed on a computing device, the processes or functions according to the embodiments of the present application are produced in whole or in part.

A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

In summary, the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A data recovery method, the method comprising:

generating a file infection record based on a plurality of snapshots, the file infection record indicating a snapshot of an infected first file from before the infection, wherein each snapshot is a snapshot of a file system or a snapshot of a directory, and each snapshot corresponds to a snapshot time point; and

in response to a data recovery instruction, acquiring, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is not later than the infection time of the first file, and performing data recovery based on the recovery data, wherein the recovery data comprises the uninfected first file acquired from the pre-infection snapshot.

2. The method of claim 1, wherein generating the file infection record based on the plurality of snapshots comprises:

determining, based on the snapshots at a pair of adjacent snapshot time points among the plurality of snapshots, a difference between the snapshot at the later of the adjacent snapshot time points and the snapshot at the earlier of the adjacent snapshot time points;

performing infection detection on the difference to determine that the first file is an infected file; and

recording the first file as an infected file in the file infection record.

3. The method of claim 2, wherein performing infection detection on the difference to determine that the first file is an infected file comprises:

determining a first file whose file suffix contains a virus identifier as an infected file.

4. The method according to any one of claims 2 to 3, wherein performing infection detection on the difference to determine that the first file is an infected file comprises:

determining a first file whose file characteristics have changed as an infected file, wherein the file characteristics are used to characterize the file.

5. The method of any one of claims 2 to 4, wherein the difference comprises: files that are newly added or modified in the snapshot at the later of the adjacent snapshot time points relative to the snapshot at the earlier of the adjacent snapshot time points.

6. The method according to any one of claims 1 to 5, wherein pre-infection file metadata of the first file is recorded in the file infection record.

7. The method of claim 6, wherein acquiring, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is not later than the infection time of the first file comprises:

determining the pre-infection file metadata of the first file from the file infection record; and

acquiring the uninfected first file from the snapshot indicated by the pre-infection file metadata of the first file.

8. The method of claim 7, wherein post-infection file metadata of the first file is also recorded in the file infection record, and the pre-infection file metadata of the first file is associated with the post-infection file metadata of the first file;

determining the pre-infection file metadata of the first file from the file infection record comprises:

determining the post-infection file metadata of the first file from the file infection record; and

determining the pre-infection file metadata of the first file based on the post-infection file metadata of the first file.

9. The method according to any one of claims 1 to 8, wherein, in response to the data recovery instruction, acquiring, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is not later than the infection time of the first file, and performing data recovery based on the recovery data, comprises:

generating a clone file system in response to the data recovery instruction; and

acquiring, based on the file infection record, the recovery data from at least one snapshot whose snapshot time point is not later than the infection time of the first file, and performing data recovery in the clone file system based on the recovery data.

10. The method according to any one of claims 1 to 9, wherein performing data recovery based on the recovery data comprises:

overwriting the infected first file in the current file system with the uninfected first file.

11. The method according to any one of claims 1 to 10, further comprising:

when recovery of the first file is not yet complete, accessing, in response to an access request for the first file, the uninfected first file in the pre-infection snapshot.

12. The method according to any one of claims 1 to 11, wherein the infection is infection by ransomware.

13. A data recovery apparatus, the apparatus comprising:

a generating module, configured to generate a file infection record based on a plurality of snapshots, wherein the file infection record indicates a snapshot of an infected first file from before the infection, each snapshot is a snapshot of a file system or a snapshot of a directory, and each snapshot corresponds to a snapshot time point; and

a recovery module, configured to, in response to a data recovery instruction, acquire, based on the file infection record, recovery data from at least one snapshot whose snapshot time point is not later than the infection time of the first file, and perform data recovery based on the recovery data, wherein the recovery data comprises the uninfected first file acquired from the pre-infection snapshot.

14. The apparatus of claim 13, wherein the generating module comprises:

a difference determining unit, configured to determine, based on the snapshots at a pair of adjacent snapshot time points among the plurality of snapshots, a difference between the snapshot at the later of the adjacent snapshot time points and the snapshot at the earlier of the adjacent snapshot time points;

an infection detection unit, configured to perform infection detection on the difference to determine that the first file is an infected file; and

a recording unit, configured to record the first file as an infected file in the file infection record.

15. The apparatus of claim 14, wherein the infection detection unit is configured to:

16. The apparatus according to any one of claims 14 to 15, wherein the infection detection unit is configured to:

17. The apparatus of any one of claims 14 to 16, wherein the difference comprises: files that are newly added or modified in the snapshot at the later of the adjacent snapshot time points relative to the snapshot at the earlier of the adjacent snapshot time points.

18. The apparatus according to any one of claims 13 to 17, wherein pre-infection file metadata of the first file is recorded in the file infection record.

19. The apparatus of claim 18, wherein the recovery module comprises:

a first determining unit, configured to determine the pre-infection file metadata of the first file from the file infection record; and

an acquisition unit, configured to acquire the uninfected first file from the snapshot indicated by the pre-infection file metadata of the first file.

20. The apparatus of claim 19, wherein in the file infection record, post-infection file metadata of the first file is also recorded, the pre-infection file metadata of the first file and post-infection file metadata of the first file being associated; the first determining unit is configured to:

21. The apparatus according to any one of claims 13 to 20, wherein the recovery module is configured to:

generate a clone file system in response to the data recovery instruction;

22. The apparatus of any one of claims 13 to 21, wherein the recovery module is configured to:

23. The apparatus according to any one of claims 13 to 22, further comprising:

an access module, configured to, when recovery of the first file is not yet complete, access, in response to an access request for the first file, the uninfected first file in the pre-infection snapshot.

24. The apparatus according to any one of claims 13 to 23, wherein the infection is infection by ransomware.

25. A computing device cluster, comprising at least one computing device, each computing device comprising a processor and a memory;

wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device to cause the computing device cluster to perform the data recovery method of any one of claims 1 to 12.

26. A computer-readable storage medium, configured to store program code for performing the data recovery method according to any one of claims 1 to 12.

27. A computer program product, wherein the computer program product, when run on a computing device, causes the computing device to perform the data recovery method of any one of claims 1 to 12.