CN105933120A - Spark platform-based password hash value recovery method and device

Info

Publication number: CN105933120A
Application number: CN201610211597.3A
Authority: CN
Inventors: 覃征; 李志鹏; 黄凯; 叶树雄; 杨晓; 张任伟; 徐凯平
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2016-04-06
Filing date: 2016-04-06
Publication date: 2016-09-07

Abstract

The invention discloses a Spark platform-based password hash value recovery method and device. The method comprises a rainbow table data generation step and a rainbow table decryption step. The first node of each rainbow chain is recorded as the SV (Start Value) and the last node as the EV (End Value). Relying on the Spark platform's capacity for processing large-scale data, a map function computes the EV corresponding to each SV, so that the rainbow chains are generated and stored in HDFS (Hadoop Distributed File System), which completes rainbow table data generation. A filter function then finds all SVs corresponding to the ciphertext to be decrypted, and a foreach function regenerates the complete rainbow chain from each SV so that the ciphertext can be decrypted.

Description

A password hash value recovery method and device based on the Spark platform

Technical field

The invention belongs to the technical fields of network information security and password reverse recovery in cryptography, and in particular relates to a password hash value recovery method and device based on the Spark platform.

Background

To ensure tamper resistance, one of the security attributes of data, a password is usually not stored as plaintext; instead, the hash value obtained by applying a hash operation to the plaintext is stored. A hash algorithm converts a plaintext input of arbitrary length into a fixed-length hash value through cryptographic building blocks such as permutation and confusion. It has good one-way properties: the input cannot easily be derived from the output. Because of these confidentiality and verification properties, hash functions are widely used in digital signatures, download verification, password storage, and similar applications.

Methods for cracking password hash values include brute-force search and dictionary lookup. Brute-force search is feasible for simple passwords and simple cryptosystems, but for complex passwords and cryptosystems the dictionary to be enumerated becomes practically unbounded, which requires a massive amount of computing time. Dictionary lookup, in turn, requires a massive amount of storage space, so the cost of decryption is too high.

To reduce the size of the required dictionary and the time needed to generate and search it, the prior art provides the rainbow table, a solution aimed at colliding chains that is based on Martin Hellman's time-memory trade-off theory. The rainbow table is a compromise between brute-force search and dictionary lookup that reduces the time cost of password recovery through precomputation. Its core idea is to map the hash value computed from a plaintext back into the plaintext space with a mapping function, and then to compute plaintexts and hash values alternately, so as to reduce the time needed to recover a password from its hash value.

Spark is an open-source cluster computing framework originally developed by the AMPLab at the University of California, Berkeley. Whereas Hadoop MapReduce writes intermediate data to disk after each job, Spark uses in-memory computing and can analyze data in memory before it is written to disk. Programs that Spark runs in memory can be up to 100 times faster than Hadoop MapReduce, and even when they run on disk Spark can be up to 10 times faster. Spark's basic data structure is the RDD. The platform groups its operations into transformations and actions: a transformation converts one RDD into another, and an action performs a computation on an RDD. When the transformation and action APIs are called, the Spark platform carries out the distributed computation by itself; the programmer only needs to configure Spark and does not need to consider how the work is distributed, which greatly lowers the barrier to using the platform.

Summary of the invention

To overcome the shortcomings of the prior art described above, the object of the present invention is to provide a password hash value recovery method and device based on the Spark platform that realize parallel generation of large rainbow tables, parallel filtering of rainbow chains, and parallel decryption of rainbow chains, and thus offer high performance and a low barrier to use.

To achieve the above object, the technical solution adopted by the present invention is as follows.

The Spark platform is used to generate rainbow tables and to decrypt in a distributed, parallel manner. Generating a rainbow table requires computing the chain tail node value for every randomly generated chain head node value. With the Spark platform's map function, each chain head node can be processed independently; that is, rainbow table generation is highly parallelizable, which greatly improves generation efficiency. The rainbow table is stored in HDFS in blocks. By calling the Spark platform's filter function, the rainbow chains corresponding to the ciphertext to be decrypted can be found in all blocks in parallel, which greatly speeds up matching. With the Spark platform's foreach function, the plaintext corresponding to the ciphertext to be decrypted can be computed from the rainbow chains in parallel.

Specifically, the technical solution of the present invention is:

A password hash value recovery method based on the Spark platform, comprising a rainbow table data generation step and a rainbow table decryption step, characterized in that,

with the number of rainbow chains denoted S and the chain length denoted L (L > 1), the rainbow table data generation step includes:

Step A: randomly generate S chain head node values SV from the character set;

Step B: compute the chain tail node value EV corresponding to each chain head node value SV according to the rainbow chain generation rule;

Step C: save all generated (SV, EV) tuples in the Hadoop Distributed File System (HDFS);

The rainbow table decryption step includes:

Step D: read the rainbow table from HDFS;

Step E: filter out of the rainbow table the rainbow chains corresponding to the ciphertext;

Step F: compute the plaintext corresponding to the ciphertext from the rainbow chains obtained.

Specifically, the calculation of the chain tail node value EV includes the following steps:

Step B1: apply the f function L-1 times to each chain head node value SV, generating L-2 intermediate nodes and one chain tail node. The f function consists of two parts, the H function and the R function. The H function is the specified encryption function. The R function depends on the current node position i (1 ≤ i ≤ L-1), and its domain and range are the reverse of those of H, so that it maps a hash value back into the plaintext space. Here R_i denotes the R function with parameter i, and f_i denotes the composition of H followed by R_i. One possible choice is R_i(X) = (X + i) mod N, where X is the string produced by the H function and N is the size of the plaintext range.

Step B2: call the Spark platform's map function to execute step B1 in parallel.
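Steps B1 and B2 can be sketched locally as follows. This is a minimal illustration under stated assumptions, not the patented implementation: MD5 stands in for the unspecified H function, the plaintext space is identified with the integers 0..N-1 (a real table would map each index to a string over the character set), and the names `H`, `R`, `f`, and `chain_end` are hypothetical.

```python
import hashlib

N = 10**6  # assumed size of the plaintext space; plaintexts are the integers 0..N-1


def H(plain):
    """Hash function H. MD5 is only an illustrative choice."""
    return hashlib.md5(str(plain).encode()).hexdigest()


def R(i, digest):
    """Reduction R_i(X) = (X + i) mod N, reading the hex digest X as an integer."""
    return (int(digest, 16) + i) % N


def f(i, plain):
    """f_i: apply H, then R_i."""
    return R(i, H(plain))


def chain_end(sv, L):
    """Apply f_1 .. f_{L-1} to the chain head SV and return the chain tail EV."""
    node = sv
    for i in range(1, L):
        node = f(i, node)
    return node


if __name__ == "__main__":
    sv = 12345
    print((sv, chain_end(sv, 1000)))  # one (SV, EV) tuple of the table
```

Because `chain_end` depends only on its own SV, it is the kind of per-element function that could be handed to a map operation, which is what makes step B2 parallelizable.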

Filtering the rainbow chains corresponding to the ciphertext out of the rainbow table includes the following steps:

Step E1: guess the position i, within a rainbow chain, of the plaintext corresponding to the ciphertext to be decrypted, trying the values from L-1 down to 1 in turn;

Step E2: apply the R_i function to the ciphertext and save the result in the intermediate node M; then apply f_{i+1}, f_{i+2}, ..., f_{L-1} to M in turn, assigning each result back to M;

Step E3: call the Spark platform's filter function to filter out, in parallel, all rainbow chains whose chain tail node value equals M. If the rainbow table contains no such chain, try the next i; if no attempt finds a matching rainbow chain, decryption fails; if a rainbow chain whose chain tail node value equals M exists, proceed to the next step.
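For a single guess i, steps E2 and E3 amount to computing a candidate chain tail M and selecting the table rows whose EV equals it. The sketch below assumes MD5 for H and an integer plaintext space 0..N-1; Python's built-in `filter` is used only as a local stand-in for the platform's filter function, and all names are hypothetical.

```python
import hashlib

N = 10**6  # assumed size of the plaintext space (integers 0..N-1)


def H(plain):
    """Hash function H; MD5 is an illustrative assumption."""
    return hashlib.md5(str(plain).encode()).hexdigest()


def R(i, digest):
    """R_i(X) = (X + i) mod N, with the hex digest X read as an integer."""
    return (int(digest, 16) + i) % N


def f(i, plain):
    """f_i: H followed by R_i."""
    return R(i, H(plain))


def candidate_end(cipher, i, L):
    """Step E2: M = R_i(cipher), then apply f_{i+1} .. f_{L-1} to M."""
    m = R(i, cipher)
    for j in range(i + 1, L):
        m = f(j, m)
    return m


def matching_chains(table, m):
    """Step E3: keep the (SV, EV) tuples whose EV equals M."""
    return list(filter(lambda chain: chain[1] == m, table))
```

An empty result from `matching_chains` corresponds to the "try the next i" branch of step E3.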

Calculating the plaintext from a rainbow chain includes the following step:

Step F1: take the position i obtained in step E for the plaintext corresponding to the ciphertext, and apply f_1, f_2, ..., f_{i-1} in turn to the chain head node value SV of the rainbow chain; the result is the plaintext corresponding to the ciphertext.
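The index ranges in steps E2 and F1 can be checked against the chain definition. In the short derivation below, p_k (the node at position k) and c (the ciphertext) are notation introduced here for illustration; everything else follows the definitions of H, R_i, and f_i given above.

```latex
\begin{align*}
p_1 &= SV, \qquad p_{k+1} = f_k(p_k) = R_k\bigl(H(p_k)\bigr) \quad (1 \le k \le L-1), \qquad p_L = EV,\\
c = H(p_i) \;&\Longrightarrow\; R_i(c) = p_{i+1},\\
\bigl(f_{L-1} \circ \cdots \circ f_{i+1}\bigr)\bigl(R_i(c)\bigr) &= p_L = EV,\\
p_i &= \bigl(f_{i-1} \circ \cdots \circ f_1\bigr)(SV).
\end{align*}
```

For i = L-1 the composition in the third line is empty, so R_{L-1}(c) is compared with EV directly; for i = 1 the composition in the last line is empty and the plaintext is SV itself.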

The present invention also proposes a password hash value recovery device based on the Spark platform, comprising a Spark configuration unit, a rainbow table data generation unit, and a rainbow table decryption unit, wherein:

the Spark configuration unit carries out the preparatory work for the recovery device and configures the computing capacity provided by the Spark platform;

the rainbow table data generation unit produces a set of rainbow chains, each computed by iterating a series of hash operations and mapping functions, and only needs to store the chain head node and chain tail node of each chain;

the rainbow table decryption unit guesses the position of the plaintext corresponding to the ciphertext to be decrypted and computes the chain tail node corresponding to the ciphertext; it then finds in the rainbow table the rainbow chains whose chain tail node equals that chain tail node, and computes the plaintext corresponding to the ciphertext from these rainbow chains.

Specifically, the preparatory work of the Spark configuration unit mainly includes:

building the Spark platform, configuring the Master and Slaves, and then configuring the number and size of the Workers.

The functions performed by the rainbow table data generation unit mainly include:

randomly generating S chain head node values, computing the chain tail node value corresponding to each chain head node value according to the rainbow chain generation rule by calling the Spark platform's map function, and storing the results in the Hadoop Distributed File System (HDFS).

The functions performed by the rainbow table decryption unit mainly include:

guessing the position i, within a rainbow chain, of the plaintext corresponding to the ciphertext to be decrypted; applying the R_i function to the ciphertext and saving the result in the intermediate node M; then applying f_{i+1}, f_{i+2}, ..., f_{L-1} to M and assigning the result to M, which yields the chain tail node corresponding to the ciphertext; and then calling the Spark platform's filter function to filter out of the rainbow table the rainbow chains whose chain tail node equals that chain tail node. If the rainbow table contains no rainbow chain whose chain tail node value equals M, the next i is tried; if no attempt finds a matching rainbow chain, decryption fails; if such a rainbow chain exists, the Spark platform's foreach function is called to compute, for each rainbow chain, the plaintext corresponding to the ciphertext.

Compared with the prior art, the beneficial effect of the present invention is that the efficiency of rainbow table generation and decryption is greatly improved by exploiting the Spark platform's efficient in-memory computation and its high degree of parallelism.

Brief description of the drawings

Fig. 1 is a flowchart of the password hash value recovery method based on the Spark platform.

Fig. 2 is a flowchart of the rainbow table generation part of the password hash value recovery method based on the Spark platform.

Fig. 3 is a flowchart of the part of the password hash value recovery method based on the Spark platform that decrypts the ciphertext to be decrypted.

Fig. 4 is a functional block diagram of the password hash value recovery device based on the Spark platform.

Fig. 5 is a functional block diagram of the rainbow table generation unit of the password hash value recovery device based on the Spark platform.

Fig. 6 is a functional block diagram of the decryption unit of the password hash value recovery device based on the Spark platform.

Detailed description

Embodiments of the present invention are described in detail below with reference to the drawings and examples.

Spark is a general-purpose parallel computing framework, similar to Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. Spark implements distributed computing based on the MapReduce algorithm and retains the advantages of Hadoop MapReduce. In addition, the intermediate output and results of Spark jobs can be kept in memory, which speeds up reads and writes during computation. The RDD is Spark's most basic abstraction: an abstraction over distributed memory that allows distributed data sets to be manipulated in the same way as local collections.

The RDD is the core of Spark. It represents a partitioned, immutable collection of data that can be operated on in parallel; different data set formats correspond to different RDD implementations. An RDD must be serializable. An RDD can be cached in memory: the result of each operation on an RDD data set can be kept in memory, and the next operation can take its input directly from memory, avoiding the large number of disk I/O operations that MapReduce performs.

The present invention uses the in-memory computing and high parallelism of the Spark platform, with RDDs holding intermediate results, to implement rainbow table generation and decryption efficiently. Specifically, the password hash recovery method based on the Spark platform includes the three steps shown in Fig. 1.

Step 1: configure the Spark platform parameters. This mainly includes building the Spark platform, configuring its Master and Slaves (that is, how many hosts the Spark platform controls), and configuring the number and size of its Workers (that is, the number of worker threads on each host and the amount of memory and CPU available to each thread).

Step 2: generate the rainbow table.

Step 3: decrypt the ciphertext to be decrypted.

Specifically, the rainbow table is generated as follows (see Fig. 2; let the size of the rainbow table be S, the length of each rainbow chain be L (L > 1), the chain head node value be SV (Start Value), and the chain tail node value be EV (End Value)):

Step 21: randomly generate S rainbow chain head node values SV and save them in a list named list.

Step 22: following the rainbow chain generation rule, take each SV in list as a chain head node value and compute its chain tail node value EV by calling the map function, one of Spark's transformation operations, so that all (SV, EV) tuples form an RDD data set.

Step 23: save the RDD data set in HDFS.
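Steps 21 to 23 can be sketched on a single machine as below. This is an illustration under assumptions rather than the device itself: MD5 stands in for H, plaintexts are the integers 0..N-1, Python's built-in `map` stands in for Spark's map transformation, and the tab-separated text lines are only one plausible on-disk layout. The function names and the `seed` parameter are hypothetical.

```python
import hashlib
import random

N = 10**6  # assumed size of the plaintext space (integers 0..N-1)


def H(plain):
    """Hash function H; MD5 is an illustrative assumption."""
    return hashlib.md5(str(plain).encode()).hexdigest()


def f(i, plain):
    """f_i: H followed by R_i(X) = (X + i) mod N."""
    return (int(H(plain), 16) + i) % N


def chain_end(sv, L):
    """Apply f_1 .. f_{L-1} to SV and return EV."""
    node = sv
    for i in range(1, L):
        node = f(i, node)
    return node


def generate_table(S, L, seed=0):
    """Step 21: draw S distinct chain heads. Step 22: map each SV to (SV, EV)."""
    svs = random.Random(seed).sample(range(N), S)
    # On Spark this would roughly correspond to sc.parallelize(svs).map(...)
    return list(map(lambda sv: (sv, chain_end(sv, L)), svs))


def to_lines(table):
    """Step 23 stand-in: one 'SV<TAB>EV' text line per chain, ready to be written out."""
    return ["{}\t{}".format(sv, ev) for sv, ev in table]


if __name__ == "__main__":
    for line in to_lines(generate_table(S=5, L=100)):
        print(line)
```

Sampling without replacement keeps the chain heads distinct, which matches the requirement of the chain head node generation unit that nodes are not repeated.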

The rainbow table is used for decryption as follows (see Fig. 3):

Step 31: read the rainbow table from HDFS.

Step 32: guess the position i of the plaintext corresponding to the ciphertext, apply the R_i function to the ciphertext, and save the result in the intermediate node M.

Step 33: apply f_{i+1}, f_{i+2}, ..., f_{L-1} to M in turn and assign the result to M.

Step 34: filter out of the rainbow table the rainbow chains whose chain tail node value EV equals M.

Steps 32 to 34 proceed as follows:

(1) Initialize i to L-1;

(2) Apply the R_i function to the ciphertext to be decrypted to obtain the intermediate node M;

(3) Let j = i+1;

(4) If j ≤ L-1, apply the H function and then the R_j function to M (that is, apply f_j), assign the result to M, and go to (5); otherwise go to (6);

(5) Let j = j+1 and go to (4);

(6) Call the filter function, one of Spark's transformation operations, to filter out of the rainbow table all SVs whose EV equals M. If the number of such SVs is 0, let i = i-1; then, if i > 0, go to (2), and if i = 0, decryption fails. If the number of such SVs is greater than 0, proceed to the next step;

Step 35: call the foreach function, one of Spark's action operations, to apply f_1, f_2, ..., f_{i-1} in turn to the chain head node value SV of each matching rainbow chain; the result is the corresponding plaintext.
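Steps 31 to 35 combine into a single lookup loop, sketched here on an in-memory table. The assumptions are the same as elsewhere: MD5 for H, integer plaintexts 0..N-1, and a list comprehension and plain loop standing in for Spark's filter and foreach. One detail goes beyond the text: the sketch re-hashes each candidate and moves on to the next i when none matches, because two chains can share an EV without both containing the target hash. That check is an addition made here, not something the description states.

```python
import hashlib

N = 10**6  # assumed size of the plaintext space (integers 0..N-1)


def H(plain):
    """Hash function H; MD5 is an illustrative assumption."""
    return hashlib.md5(str(plain).encode()).hexdigest()


def R(i, digest):
    """R_i(X) = (X + i) mod N, with the hex digest X read as an integer."""
    return (int(digest, 16) + i) % N


def f(i, plain):
    """f_i: H followed by R_i."""
    return R(i, H(plain))


def lookup(cipher, table, L):
    """Return a plaintext whose hash is `cipher`, or None if the table does not cover it."""
    for i in range(L - 1, 0, -1):            # guess the position, from L-1 down to 1
        m = R(i, cipher)                     # Step 32
        for j in range(i + 1, L):            # Step 33: f_{i+1} .. f_{L-1}
            m = f(j, m)
        hits = [sv for sv, ev in table if ev == m]   # Step 34 (filter stand-in)
        for sv in hits:                      # Step 35 (foreach stand-in)
            node = sv
            for k in range(1, i):            # f_1 .. f_{i-1}
                node = f(k, node)
            if H(node) == cipher:            # added check: discard false matches
                return node
    return None
```

Returning `None` corresponds to the "decryption fails" outcome in item (6) of the procedure above.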

The present invention also proposes a password hash recovery device based on the Spark platform, comprising the three units shown in Fig. 4: a Spark configuration unit, a rainbow table generation unit, and a decryption unit.

The functions performed by the Spark configuration unit include:

building the Spark platform, configuring the Master and Slaves, and configuring the number and size of the Workers.

As shown in Fig. 5, the rainbow table generation unit comprises:

a chain head node generation unit, which randomly generates S chain head nodes and ensures that no node is repeated;

a chain tail node calculation unit, which computes the chain tail node value corresponding to each chain head node value according to the rainbow chain generation rule by calling the Spark platform's map function, and stores the results in the Hadoop Distributed File System (HDFS).

As shown in Fig. 6, the decryption unit comprises:

a unit for calculating the chain tail node corresponding to the ciphertext, which guesses the position i, within a rainbow chain, of the plaintext corresponding to the ciphertext to be decrypted, applies the R_i function to the ciphertext, saves the result in the intermediate node M, then applies f_{i+1}, f_{i+2}, ..., f_{L-1} to M and assigns the result to M, yielding the chain tail node corresponding to the ciphertext;

a rainbow chain filtering unit, which calls the Spark platform's filter function to filter out of the rainbow table the rainbow chains whose chain tail node equals the chain tail node corresponding to the ciphertext;

a plaintext generation unit, which calls the Spark platform's foreach function to compute, for each rainbow chain, the plaintext corresponding to the ciphertext.

The password hash recovery method and device based on the Spark platform provided by the present invention have been described in detail above, together with the principle and specific embodiments of the invention. The detailed steps above are intended to help in understanding the method and its core idea. Those skilled in the art may, following the idea of the invention, vary and improve the specific implementation, and such variations and improvements all fall within the scope of protection of the present invention.

Claims

1. A password hash value recovery method based on the Spark platform, comprising a rainbow table data generation step and a rainbow table decryption step, characterized in that,

with the number of rainbow chains denoted S and the chain length denoted L (L > 1), the rainbow table data generation step includes:

Step A: randomly generate S chain head node values SV from the character set;

Step B: compute the chain tail node value EV corresponding to each chain head node value SV according to the rainbow chain generation rule;

Step C: save all generated (SV, EV) tuples in the Hadoop Distributed File System (HDFS);

the rainbow table decryption step includes:

Step D: read the rainbow table from HDFS;

Step E: filter out of the rainbow table the rainbow chains corresponding to the ciphertext;

Step F: compute the plaintext corresponding to the ciphertext from the rainbow chains obtained.

2. The password hash value recovery method based on the Spark platform according to claim 1, characterized in that the calculation of the chain tail node value EV comprises the following steps:

Step B1: apply the f function L-1 times to each chain head node value SV, generating L-2 intermediate nodes and one chain tail node; the f function consists of two parts, the H function and the R function; the H function is the specified encryption function; the R function depends on the current node position i (1 ≤ i ≤ L-1), and its domain and range are the reverse of those of the H function; R_i denotes the R function with parameter i, and f_i denotes the composition of the H function followed by the R_i function; let R_i(X) = (X + i) mod N, where X is the string produced by the H function and N is the size of the plaintext range;

Step B2: call the Spark platform's map function to execute step B1 in parallel.

3. The password hash value recovery method based on the Spark platform according to claim 2, characterized in that filtering the rainbow chains corresponding to the ciphertext out of the rainbow table comprises the following steps:

Step E1: guess the position i, within a rainbow chain, of the plaintext corresponding to the ciphertext to be decrypted, trying the values from L-1 down to 1 in turn;

Step E2: apply the R_i function to the ciphertext and save the result in the intermediate node M; then apply f_{i+1}, f_{i+2}, ..., f_{L-1} to M in turn, assigning each result back to M;

Step E3: call the Spark platform's filter function to filter out, in parallel, all rainbow chains whose chain tail node value equals M; if the rainbow table contains no such chain, try the next i; if no attempt finds a matching rainbow chain, decryption fails; if a rainbow chain whose chain tail node value equals M exists, proceed to the next step.

4. The password hash value recovery method based on the Spark platform according to claim 3, characterized in that calculating the plaintext from a rainbow chain comprises the following step:

Step F1: take the position i obtained in step E for the plaintext corresponding to the ciphertext, and apply f_1, f_2, ..., f_{i-1} in turn to the chain head node value SV of the rainbow chain; the result is the plaintext corresponding to the ciphertext.

5. A password hash value recovery device based on the Spark platform, characterized in that it comprises a Spark configuration unit, a rainbow table data generation unit, and a rainbow table decryption unit, wherein:

the Spark configuration unit carries out the preparatory work for the recovery device and configures the computing capacity provided by the Spark platform;

the rainbow table data generation unit produces a set of rainbow chains, each computed by iterating a series of hash operations and mapping functions, and only needs to store the chain head node and chain tail node of each chain;

the rainbow table decryption unit guesses the position of the plaintext corresponding to the ciphertext to be decrypted and computes the chain tail node corresponding to the ciphertext; it then finds in the rainbow table the rainbow chains whose chain tail node equals that chain tail node, and computes the plaintext corresponding to the ciphertext from these rainbow chains.

6. The password hash value recovery device based on the Spark platform according to claim 5, characterized in that the preparatory work of the Spark configuration unit mainly comprises:

building the Spark platform, configuring the Master and Slaves, and then configuring the number and size of the Workers.

7. The password hash value recovery device based on the Spark platform according to claim 5, characterized in that

the functions performed by the rainbow table data generation unit mainly comprise:

randomly generating S chain head node values, computing the chain tail node value corresponding to each chain head node value according to the rainbow chain generation rule by calling the Spark platform's map function, and storing the results in the Hadoop Distributed File System (HDFS).

8. The password hash value recovery device based on the Spark platform according to claim 5, characterized in that the functions performed by the rainbow table decryption unit mainly comprise:

guessing the position i, within a rainbow chain, of the plaintext corresponding to the ciphertext to be decrypted; applying the R_i function to the ciphertext and saving the result in the intermediate node M; then applying f_{i+1}, f_{i+2}, ..., f_{L-1} to M and assigning the result to M, which yields the chain tail node corresponding to the ciphertext; and then calling the Spark platform's filter function to filter out of the rainbow table the rainbow chains whose chain tail node equals that chain tail node; if the rainbow table contains no rainbow chain whose chain tail node value equals M, the next i is tried; if no attempt finds a matching rainbow chain, decryption fails; if a rainbow chain whose chain tail node value equals M exists, the Spark platform's foreach function is called to compute, for each rainbow chain, the plaintext corresponding to the ciphertext.