CN116266180A - Data searching and inquiring method and system based on searchable encryption and homomorphic encryption - Google Patents
Data searching and inquiring method and system based on searchable encryption and homomorphic encryption
- Publication number
- CN116266180A (application CN202210167538.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- entry index
- mapping table
- search
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2219—Large Object storage; Management thereof
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/045—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply hybrid encryption, i.e. combination of symmetric and asymmetric encryption
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
 
- 
        - Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
 
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Artificial Intelligence (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data search and query method and system based on searchable encryption and homomorphic encryption. The method includes: generating a search credential for a keyword based on the keyword of the data item that a data visitor needs to obtain; retrieving, based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword from a pre-built data entry index mapping table; and decrypting that data entry index mapping table to obtain the data entry index, then downloading the required data item based on the data entry index. The data entry index mapping table is built by segmenting text-form data with a word segmentation technique and extracting keywords from the resulting segments. By converting the requested keyword, via the search credential, into a data entry index mapping table and downloading only the required data items based on that table, the invention enables retrieval in the encrypted state and targeted downloading, which avoids wasting download resources.
Description
Technical field
The invention relates to the field of information security, and in particular to a data search and query method and system based on searchable encryption and homomorphic encryption.
Background
As "new digital infrastructure" initiatives continue to roll out, the sources, forms, and volume of data keep growing, and so does the demand for data management capability. Local equipment can no longer handle the storage and management of massive power data, so data middle platforms are widely used to store and manage it. At the same time, this data often includes sensitive items, such as users' personal private information and expert database data. Such data has confidentiality requirements and is not suitable for storage on a data middle platform that is operated and maintained directly by an operator. Today, data owners mostly protect data by encrypting it, but encrypted data must then go through decryption, computation, and re-encryption before it can be used, which increases time cost and system resource consumption, resulting in low efficiency and high cost.
There are currently three mainstream approaches to the secure processing of sensitive data: secure multi-party computation (SMPC), trusted hardware (trusted execution environment, TEE), and fully homomorphic encryption (FHE).
Secure multi-party computation refers to multiple distributed participants jointly computing a joint function. Each participant provides its own input and obtains its corresponding output. The security of SMPC guarantees that each participant's input and output are not leaked to the other participants. This primitive can be used to build secure database queries: the database maintainer and the client each act as a participant in the computation, and the joint function can be designed to take the client's input as the query condition and to output the ciphertext of the query result.
Fully homomorphic encryption is a special encryption scheme. Data encrypted under it can be subjected to arbitrary operations directly on the ciphertext, without decryption, yielding the ciphertext of the operation result; decrypting that ciphertext gives the expected result. Fully homomorphic encryption can likewise be used to build query functions over encrypted data.
The hardware-based trusted computing technology that has shipped widely in new processors in recent years offers another approach to processing encrypted data. It provides three properties that protect security and privacy during data processing: thoroughly isolated execution, data encapsulation (sealing), and remote attestation. How these properties are implemented varies across vendors' products. Specifically, "isolated execution" sets aside a specific region of memory and restricts access to that region to specific processes; other processes, even system processes, the hypervisor, and system management modules, cannot access data in it. "Data encapsulation" uses encryption and authentication so that only a specific process can perform decryption and obtain the original data. "Remote attestation" proves to the user that all code is executing securely and unmodified. Together, these three properties have the effect of the user carving out a "hidden area" on the server: the server cannot observe, modify, or control the data and operations in that area. The technology can also be used to build trusted database queries: the cloud server installs trusted hardware and, after the data is encrypted, specific operations and the data they need are placed in the "hidden area", which enables data search and query. Notably, even when data is decrypted inside the "hidden area", the decrypted plaintext remains invisible to the cloud server.
At present, sensitive power data is protected and used mainly through permission management and access control: sensitive data is stored in a negative-list space, and its security is ensured through user permission approval and operation within a secure zone. For highly sensitive data such as expert databases, however, this simple safeguard can still be broken, and a leak would cause great harm, so it does not meet the needs of business departments. It is therefore necessary to store such highly sensitive data in encrypted form to fully guarantee its security.
Once data is stored in encrypted form, using it directly requires downloading and decrypting it, which is very costly. To keep the data usable, search and query techniques for encrypted data are needed, but existing encrypted-data search techniques cannot meet practical needs.
Summary of the invention
Traditional methods of protecting sensitive data rely on permission management and access control, which is deficient in security, and encrypted storage wastes download resources. To solve these problems, the invention proposes a data search and query method based on searchable encryption and homomorphic encryption, including:
generating a search credential for a keyword based on the keyword of the data item that the data visitor needs to obtain;
retrieving, based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword from a pre-built data entry index mapping table;
decrypting the data entry index mapping table to obtain the data entry index, and downloading the required data item based on the data entry index;
wherein the data entry index mapping table is built by segmenting text-form data with a word segmentation technique and extracting keywords from the resulting segments.
Preferably, building the data entry index mapping table includes:
segmenting text-form data with a word segmentation technique;
extracting keywords from the segments;
building the data entry index mapping table from the keywords and the segments corresponding to the keywords.
Preferably, generating the search credential for the keyword based on the keyword of the data item that the data visitor needs to obtain includes:
verifying, based on the keyword query request sent by the data visitor, whether the data visitor has access permission and, when the data visitor has access permission, generating the search credential corresponding to the keyword for the data visitor.
Preferably, the method further includes:
querying numerical data based on a query instruction issued by the data visitor.
Preferably, querying numerical data based on the query instruction issued by the data visitor includes:
parsing the query instruction issued by the data visitor and, when the parsing result contains only addition operations, computing directly on the homomorphically encrypted columns to obtain the ciphertext corresponding to the numerical data being queried;
when the parsing result contains both addition and non-addition operations, blinding the operands with a generated random number; decrypting the blinded operands using the double trapdoor of the BCP cryptosystem to obtain the plaintext of the blinded data; computing on that plaintext and encrypting the result to obtain a ciphertext that carries the blinding factor; and unblinding that ciphertext to obtain the ciphertext corresponding to the numerical data being queried.
Based on the same inventive concept, the invention also provides a data search and query system based on searchable encryption and homomorphic encryption, including:
a credential generation module, configured to generate a search credential for a keyword based on the keyword of the data item that the data visitor needs to obtain;
an index mapping module, configured to retrieve, based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword from a pre-built data entry index mapping table;
a data download module, configured to decrypt the data entry index mapping table to obtain the data entry index and to download the required data item based on the data entry index;
wherein the data entry index mapping table is built by segmenting text-form data with a word segmentation technique and extracting keywords from the resulting segments.
Preferably, building the data entry index mapping table includes:
segmenting text-form data with a word segmentation technique;
extracting keywords from the segments;
building the data entry index mapping table from the keywords and the segments corresponding to the keywords.
Preferably, the system further includes a numerical data query module;
the numerical data query module is configured to query numerical data based on a query instruction issued by the data visitor.
Preferably, the numerical data query module includes:
an addition operation submodule, configured to parse the query instruction issued by the data visitor and, when the parsing result contains only addition operations, to compute directly on the homomorphically encrypted columns to obtain the ciphertext corresponding to the numerical data being queried;
a mixed operation submodule, configured, when the parsing result contains both addition and non-addition operations, to blind the operands with a generated random number; to decrypt the blinded operands using the double trapdoor of the BCP cryptosystem to obtain the plaintext of the blinded data; to compute on that plaintext and encrypt the result to obtain a ciphertext that carries the blinding factor; and to unblind that ciphertext to obtain the ciphertext corresponding to the numerical data being queried.
Compared with the prior art, the beneficial effects of the invention are as follows:
The invention provides a data search and query method based on searchable encryption and homomorphic encryption, including: generating a search credential for a keyword based on the keyword of the data item that the data visitor needs to obtain; retrieving, based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword from a pre-built data entry index mapping table; and decrypting that data entry index mapping table to obtain the data entry index, then downloading the required data item based on the data entry index. The data entry index mapping table is built by segmenting text-form data with a word segmentation technique and extracting keywords from the resulting segments. By converting the requested keyword, via the search credential, into a data entry index mapping table and downloading only the required data items based on that table, the invention enables retrieval in the encrypted state and targeted downloading, which avoids wasting download resources.
Description of drawings
Fig. 1 is a flow chart of the data search and query method based on searchable encryption and homomorphic encryption of the invention;
Fig. 2 is a schematic diagram of the encrypted data search and query method of the invention;
Fig. 3 is a schematic diagram of the deployment architecture and workflow of the invention.
Detailed description
For a better understanding of the invention, its content is further described below with reference to the accompanying drawings and examples. The invention proposes a data search and query method based on searchable encryption and homomorphic encryption. It adopts efficient searchable encryption and designs a cryptographic protocol: a query scheme for numerical data built from efficient cryptographic primitives. Besides keyword search over files, the invention also supports SQL queries on relational databases. Specifically, for query statements that can be expressed with addition alone, the invention uses homomorphic encryption and obtains the query result by computing directly on the encrypted data; for complex queries that require both addition and multiplication, the invention uses an auxiliary server, thereby constructing a complete database query mechanism.
Embodiment 1:
The invention proposes a data search and query method based on searchable encryption and homomorphic encryption which, as shown in Fig. 1, includes:
Step 1: generating a search credential for a keyword based on the keyword of the data item that the data visitor needs to obtain;
Step 2: retrieving, based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword from a pre-built data entry index mapping table;
Step 3: decrypting the data entry index mapping table to obtain the data entry index, and downloading the required data item based on the data entry index;
wherein the data entry index mapping table is built by segmenting text-form data with a word segmentation technique and extracting keywords from the resulting segments.
Before step 1, the method further includes:
building the data entry index mapping table, which includes:
segmenting text-form data with a word segmentation technique;
extracting keywords from the segments;
building the data entry index mapping table from the keywords and the segments corresponding to the keywords.
The construction of the data entry index mapping table is described in detail below:
For confidential data in text form, the data owner first applies word segmentation to the multiple fields in the data table and extracts the keywords. A keyword is usually an important content word that represents the information in the data entry.
Based on the extracted keyword set, the data owner builds a keyword-based search index for the text file. The index takes the form of a mapping table of [keyword A, indexes of the data items that contain keyword A among all items in the file].
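The index construction described above can be sketched as a plain inverted index, before any encryption is applied. This is a minimal illustration only: the regex tokenizer stands in for a real word segmenter (Chinese text would need a dedicated one), and `STOP_WORDS`, `segment`, `extract_keywords`, and `build_index` are hypothetical names, not part of the source.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the source does not specify how keywords are chosen.
STOP_WORDS = {"the", "a", "of", "in", "and", "for"}

def segment(text):
    """Stand-in for word segmentation: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def extract_keywords(tokens):
    """Keep tokens that are not stop words and are longer than two characters."""
    return {t for t in tokens if t not in STOP_WORDS and len(t) > 2}

def build_index(rows):
    """Map each keyword to the sorted indexes of the data items that contain it."""
    index = defaultdict(set)
    for i, row in enumerate(rows):
        for keyword in extract_keywords(segment(row)):
            index[keyword].add(i)
    return {keyword: sorted(ids) for keyword, ids in index.items()}

rows = [
    "Transformer fault report for substation A",
    "Expert review of transformer maintenance",
    "Load forecast for substation B",
]
index = build_index(rows)
print(index["transformer"])  # [0, 1]
print(index["substation"])   # [0, 2]
```

Each value in `index` is the "data entry index" list that the later steps encrypt and return to the visitor.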
The data owner encrypts all data in the text file row by row with a mainstream encryption algorithm (such as AES), and also encrypts each extracted keyword separately. The index structure is permuted using a cryptographic method (such as bilinear pairings), so that someone who merely holds the index cannot learn from it the correspondence between keywords and data items, while a party that obtains a specific search credential can use it to retrieve the corresponding part of the index.
For character-type fields, the invention designs a search scheme over ciphertext: when a user wants to find the data items that contain a keyword w, a search credential [w] for that keyword is generated and sent to the data storage server; the data storage server uses [w] to perform the search and sends the data items containing w to the user in encrypted form.
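A minimal sketch of searching by credential follows. It is not the patent's construction: it swaps in an HMAC-based pseudorandom function for the bilinear-pairing permutation mentioned above, and an XOR keystream for AES, purely so the example runs with the standard library. All function and key names are hypothetical, and the toy cipher is not secure.

```python
import hashlib
import hmac
import json

def prf(key: bytes, msg: bytes) -> bytes:
    """Pseudorandom function built from HMAC-SHA256."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (encrypt == decrypt); stands in for AES, NOT secure."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += prf(key, counter.to_bytes(4, "big"))
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def credential(k_search: bytes, w: str) -> bytes:
    """Search credential [w]: reveals nothing about w without k_search."""
    return prf(k_search, w.encode())

def build_encrypted_index(index, k_search, k_enc):
    """Replace each keyword by its credential and encrypt its entry list."""
    return {
        credential(k_search, w): xor_crypt(prf(k_enc, w.encode()), json.dumps(ids).encode())
        for w, ids in index.items()
    }

def server_search(enc_index, cred):
    """Storage server: an exact-match lookup on the opaque credential."""
    return enc_index.get(cred)

def decrypt_entry(k_enc, w, blob):
    """Visitor: recover the data entry indexes for keyword w."""
    return json.loads(xor_crypt(prf(k_enc, w.encode()), blob))

k_search, k_enc = b"search-key-demo", b"entry-key-demo"
enc_index = build_encrypted_index({"transformer": [0, 1], "substation": [0, 2]}, k_search, k_enc)
blob = server_search(enc_index, credential(k_search, "transformer"))
print(decrypt_entry(k_enc, "transformer", blob))  # [0, 1]
```

The server only ever handles the credential and the encrypted entry, which mirrors the property described above: holding the index alone does not reveal which keyword maps to which data items.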
In step 1, generating the search credential for the keyword based on the keyword of the data item that the data visitor needs to obtain specifically includes:
Verifying, based on the keyword query request sent by the data visitor, whether the data visitor has access permission and, when the data visitor has access permission, generating the search credential corresponding to the keyword for the data visitor. In this embodiment, the permission manager determines whether the data visitor has access permission, and the data storage party stores and manages the data.
Under the separated permission management mechanism, when a user (data visitor) wants to search for and obtain the data items that contain a keyword, the user must first send a query request to the permission manager. After verifying that the user has query permission, the permission manager generates the search credential corresponding to the keyword and sends it to the data storage party.
In step 2, retrieving, based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword from the pre-built data entry index mapping table specifically includes:
After receiving the search credential for a keyword, the data storage party uses it to look up, in the encrypted index, the corresponding data entry index mapping table that contains the keyword, and returns the encrypted mapping table to the visitor.
In step 3, decrypting the data entry index mapping table to obtain the data entry index and downloading the required data item based on the data entry index specifically includes:
The visitor decrypts the returned result, downloads the corresponding encrypted data according to the resulting data entry indexes, and decrypts it.
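The three-party flow of steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions: the access policy is modeled as a per-user keyword set, the credential is an HMAC of the keyword, `toy_crypt` stands in for AES and is not secure, and the class and method names (`PermissionManager`, `DataStorage`, `handle_request`) are hypothetical.

```python
import hashlib
import hmac
import json

def toy_crypt(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream (encrypt == decrypt); NOT secure."""
    stream, i = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class DataStorage:
    """Holds only ciphertext: the encrypted index and the encrypted rows."""
    def __init__(self, enc_index, enc_rows):
        self._index, self._rows = enc_index, enc_rows

    def search(self, cred):
        return self._index.get(cred)

    def download(self, row_ids):
        return [self._rows[i] for i in row_ids]

class PermissionManager:
    """Checks permission, then forwards the credential to the storage party."""
    def __init__(self, search_key, acl, storage):
        self._key, self._acl, self._storage = search_key, acl, storage

    def handle_request(self, user, keyword):
        if keyword not in self._acl.get(user, set()):
            raise PermissionError(f"{user} may not search for {keyword!r}")
        cred = hmac.new(self._key, keyword.encode(), hashlib.sha256).digest()
        return self._storage.search(cred)  # encrypted mapping table for the visitor

# Data owner setup (keys and rows are demo values).
search_key, data_key = b"demo-search-key", b"demo-data-key"
rows = ["transformer fault report", "load forecast", "transformer maintenance"]
enc_rows = {i: toy_crypt(data_key + str(i).encode(), r.encode()) for i, r in enumerate(rows)}
cred = hmac.new(search_key, b"transformer", hashlib.sha256).digest()
enc_index = {cred: toy_crypt(data_key + b"idx:transformer", json.dumps([0, 2]).encode())}
storage = DataStorage(enc_index, enc_rows)
pm = PermissionManager(search_key, {"alice": {"transformer"}}, storage)

# Visitor: request, decrypt the mapping table, download and decrypt the rows.
blob = pm.handle_request("alice", "transformer")
ids = json.loads(toy_crypt(data_key + b"idx:transformer", blob))
docs = [toy_crypt(data_key + str(i).encode(), c).decode() for i, c in zip(ids, storage.download(ids))]
print(docs)  # ['transformer fault report', 'transformer maintenance']
```

Only rows 0 and 2 are downloaded, which is the targeted download the method aims for; a user without permission never obtains a credential.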
The invention also includes querying numerical data based on a query instruction issued by the data visitor.
Data format and privacy protection requirements:
Data is stored in the form of two-dimensional tables. In a database table, each column is a field, also called an attribute, and each row is a record, also called a tuple. Attributes are of two types, numerical and character, and an attribute of either type may hold sensitive or non-sensitive data. Sensitive character data is encrypted with a searchable encryption scheme, which preserves search over character data. Sensitive numerical data is encrypted with the BCP encryption algorithm, an encryption algorithm with additive homomorphism, key homomorphism, and a double trapdoor, which supports linear and nonlinear operations on numerical data. Additive homomorphism means that, given the ciphertexts of two values encrypted under the same key, an operation on the ciphertexts yields the ciphertext of the sum of the two values without decryption, that is, $\mathrm{Enc}_{pk}(m_1) + \mathrm{Enc}_{pk}(m_2) = \mathrm{Enc}_{pk}(m_1 + m_2)$. Key homomorphism means that data encrypted under a public key formed as the sum of two public keys can be decrypted with a private key corresponding to those two public keys. Double trapdoor means that data encrypted under any public key $pk_i$ can be decrypted not only with the corresponding private key $sk_i$ but also with a "privileged key" $mk$, the second trapdoor, which can decrypt ciphertext produced under any user's key. Linear and nonlinear operations are important components of SQL query statements.
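The additive homomorphism can be checked with a short derivation. The source names the BCP cryptosystem but does not spell out its construction, so the following assumes the standard Bresson-Catalano-Pointcheval form, with modulus $N = pq$, a base $g$, secret key $a$, and encryption randomness $r$; these symbols are assumptions introduced here. In this form the "+" on ciphertexts is componentwise multiplication modulo $N^2$.

```latex
\begin{aligned}
&\text{Keys: } sk = a,\qquad pk = h = g^{a} \bmod N^{2} \\
&\mathrm{Enc}_{pk}(m) = (A, B),\qquad A = g^{r} \bmod N^{2},\qquad B = h^{r}\,(1 + mN) \bmod N^{2} \\
&\mathrm{Dec}_{sk}(A, B) = \frac{\bigl(B \cdot A^{-a} \bmod N^{2}\bigr) - 1}{N} = m \\
&(1 + m_1 N)(1 + m_2 N) = 1 + (m_1 + m_2)N + m_1 m_2 N^{2} \equiv 1 + (m_1 + m_2)N \pmod{N^{2}} \\
&\Rightarrow\ (A_1 A_2,\; B_1 B_2) = \bigl(g^{r_1 + r_2},\; h^{r_1 + r_2}\,(1 + (m_1 + m_2)N)\bigr) \bmod N^{2} = \mathrm{Enc}_{pk}(m_1 + m_2)
\end{aligned}
```

The product of two ciphertexts is therefore a valid encryption of $m_1 + m_2$ (modulo $N$) under the same key, with randomness $r_1 + r_2$.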
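In particular, a query on numerical data may fall into one of three cases: (1) the query statement involves only non-sensitive data; (2) the query statement involves only sensitive data; (3) the query statement involves both sensitive and non-sensitive data. Case 1 can be handled by the query function of an existing database management system.
For cases 2 and 3, given the characteristics of power data, the invention stores the data in a structured database, that is, in data tables. Each record (a row of the table, also called a tuple) contains several fields, also called attributes. Some fields are character type and some are numerical, where numerical covers integer and floating-point data. Some attributes hold sensitive data and others hold ordinary data. To protect the security and privacy of the data while still allowing the data storage provider to perform the necessary operations on it, namely search and query, the invention encrypts sensitive data with a homomorphic encryption scheme and stores non-sensitive data in plaintext to improve query efficiency. The invention also designs corresponding schemes for carrying out specific operations on the data. In particular, because query operations that involve multiple columns are common in real power grid applications, the data query scheme includes a special design for executing cross-column query statements.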
Querying numeric data based on a query instruction issued by the data accessor includes:
parsing the query instruction issued by the data accessor and, when the parsed result contains only addition operations, computing directly on the homomorphically encrypted columns to obtain the ciphertext of the numeric data being queried;
when the parsed result contains both addition and non-addition operations, blinding the operands with generated random numbers; decrypting the blinded operands using the double trapdoor of the BCP cryptosystem to obtain the blinded plaintexts; computing on the blinded plaintexts and encrypting the result to obtain a ciphertext carrying the blinding factor; and unblinding that ciphertext to obtain the ciphertext of the numeric data being queried.
For query statements on numeric data, the present invention designs a query parsing scheme that parses the user's query instruction into basic algebraic operations between data items of the database table.
For each basic operation obtained by parsing, the present invention designs a different execution scheme depending on the operation type and the data involved. First, for linear operations (addition and scalar multiplication) on a single encrypted column, the present invention uses the additive homomorphism of the BCP cryptosystem: from ciphertexts encrypted under the same key, the ciphertext of the sum of the two plaintexts is obtained without decryption, which gives the result directly. Second, for addition and multiplication involving two encrypted columns, the present invention designs an interactive protocol between the storage server and the auxiliary computing server. Specifically, the storage server generates a random number, blinds the operand with it, and sends the blinded operand to the auxiliary computing server; the auxiliary computing server uses the double trapdoor of the BCP cryptosystem to obtain the blinded plaintext, completes the computation, encrypts the result, and sends the ciphertext of the result, which still carries the blinding factor, back to the storage server; the storage server unblinds it to obtain the final result. Finally, for numeric comparisons across multiple encrypted columns, that is, deciding which of two values is larger or whether they are equal, the present invention designs a similar interactive protocol between the storage server and the auxiliary computing server, as shown in Figure 2.
Considering commonly used deployment scenarios, the present invention adopts the deployment architecture shown in Figure 3. The data middle platform is responsible for storing and maintaining the data, and the computing cloud server belongs to a different service provider from the data storage cloud. This embodiment uses a storage cloud and an auxiliary computing cloud server, with an interactive protocol between them to complete nonlinear operations. The computing cloud server only has to perform very lightweight computation, so the deployment is economical.
If the parsed result contains only addition operations, the homomorphic property is used to operate directly on the ciphertexts, the query is completed, and the query result is returned to the user. Here the query result means the numeric results and the data items that satisfy the condition.
Example 2:
The query operation on numeric data is as follows:
For compatibility with other database systems and with user habits, queries on numeric data are expressed as SQL statements. Executing an SQL statement first requires evaluating its query condition, which usually consists of algebraic and logical operations. Algebraic operations include addition, subtraction, multiplication, and exponentiation. Logical operations include "=", which tests whether the values on both sides are equal, and ">", which tests whether the value before the operator is greater than the value after it. The query then outputs the results that satisfy the condition. Because part of the data is encrypted, the present invention designs a dedicated scheme for evaluating the condition over ciphertexts; once the condition has been evaluated, the query result is output using mature techniques from existing database management systems.
[Scenario 1] Executing a statement such as SELECT * WHERE A+B=6 requires two operations: the addition A+B, and an equality test between the result of A+B and the constant 6. Addition is a linear operation; the equality test is a nonlinear operation.
[Scenario 2] Executing a statement such as SELECT * WHERE A>5 or SELECT * WHERE A=B requires comparison and equality tests, both of which are nonlinear operations.
[Scenario 3] Executing a statement such as SELECT COUNT(*) WHERE A>5 or SELECT SUM(*) WHERE 2*B>C requires COUNT (counting), SUM (summation), and scalar multiplication, all of which are linear operations.
[Scenario 4] Executing a statement such as SELECT * WHERE A×B>15 requires the multiplication A×B.
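The four scenarios can be read as a classification of the basic operations found in a query condition. The sketch below shows one possible way to do that classification; it is an illustration only, not the parsing scheme of the invention. The tokenizer, the category names, and the functions `classify` and `is_linear_only` are all hypothetical, and the input is assumed to be a bare condition with `*` as the multiplication sign.

```python
import re

# Identifiers, numbers, two-character comparisons, then single-character operators.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|>=|<=|<>|[-+*=<>]")
COMPARISONS = {"=", ">", "<", ">=", "<=", "<>"}
LINEAR = {"addition", "scalar multiplication"}


def is_number(token):
    """True for numeric literals such as 2 or 15."""
    return token[0].isdigit()


def classify(condition):
    """Return the basic operations found in a WHERE condition, in order."""
    tokens = TOKEN.findall(condition)
    kinds = []
    for i, tok in enumerate(tokens):
        if tok in {"+", "-"}:
            kinds.append("addition")
        elif tok == "*":
            # constant * column is linear; column * column is not.
            if is_number(tokens[i - 1]) or is_number(tokens[i + 1]):
                kinds.append("scalar multiplication")
            else:
                kinds.append("column multiplication")
        elif tok in COMPARISONS:
            kinds.append("comparison")
    return kinds


def is_linear_only(kinds):
    """True when every operation can be done directly on ciphertexts."""
    return all(kind in LINEAR for kind in kinds)


for condition in ["A + B = 6", "A > 5", "2*B > C", "A * B > 15", "A + B"]:
    kinds = classify(condition)
    print(condition, "->", kinds, "| linear only:", is_linear_only(kinds))
```

Conditions for which `is_linear_only` is false are the ones that would need the interactive protocol with the auxiliary computing server.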
Processing type 1: operations involving only a single encrypted attribute:
For algebraic operations over multiple attributes such as A+B and A×B, when attribute A is stored as ciphertext under key $pk_A$ and attribute B is stored as plaintext, the data middle platform uses the properties of BCP homomorphic encryption: it first encrypts each element of attribute B under $pk_A$ and then obtains the ciphertext of the result by operating on ciphertexts. Operations over multiple data items of the same attribute, such as SUM and AVE, can likewise be completed directly by the data middle platform using the homomorphic property. For scalar multiplication of the form α×A, where α is an integer, the mathematical properties of BCP encryption yield the ciphertext of the product.
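As an illustration of processing type 1, the sketch below runs A+B, SUM(A), and α×A over an encrypted column A and a plaintext column B. It reuses a toy BCP-style scheme with hypothetical parameters (`P`, `Q`, `G = 4`) and hypothetical function names, so it should be read as a sketch under those assumptions rather than as the invention's code. For A×B it treats each plaintext element of B as a scalar, which is one possible reading of the text. It assumes Python 3.8 or later.

```python
import secrets
from functools import reduce

P, Q = 1019, 1187          # toy primes, far too small for real use
N = P * Q
N2 = N * N
G = 4                      # hypothetical fixed generator, coprime to N


def keygen():
    sk = secrets.randbelow(N2) + 1
    return pow(G, sk, N2), sk


def encrypt(pk, m):
    r = secrets.randbelow(N2) + 1
    return pow(G, r, N2), (pow(pk, r, N2) * (1 + m * N)) % N2


def decrypt(sk, ct):
    masked = (ct[1] * pow(pow(ct[0], sk, N2), -1, N2)) % N2
    return (masked - 1) // N


def add(ct1, ct2):
    """Ciphertext of m1 + m2 from ciphertexts under the same key."""
    return (ct1[0] * ct2[0]) % N2, (ct1[1] * ct2[1]) % N2


def scalar_mul(ct, k):
    """Ciphertext of k * m for a non-negative plaintext integer k."""
    return pow(ct[0], k, N2), pow(ct[1], k, N2)


pk_a, sk_a = keygen()
col_a = [encrypt(pk_a, v) for v in [3, 5, 8]]   # sensitive column A (ciphertext)
col_b = [10, 20, 30]                            # non-sensitive column B (plaintext)

# A + B: encrypt each element of B under pk_a, then add ciphertexts.
a_plus_b = [add(ca, encrypt(pk_a, b)) for ca, b in zip(col_a, col_b)]
# A x B with plaintext B: each b acts as a scalar on the ciphertext of a.
a_times_b = [scalar_mul(ca, b) for ca, b in zip(col_a, col_b)]
# SUM(A): fold the whole column with the homomorphic addition.
sum_a = reduce(add, col_a)
# 2 x A: scalar multiplication by a constant.
two_a = [scalar_mul(ca, 2) for ca in col_a]

print([decrypt(sk_a, c) for c in a_plus_b])   # [13, 25, 38]
print([decrypt(sk_a, c) for c in a_times_b])  # [30, 100, 240]
print(decrypt(sk_a, sum_a))                   # 16
print([decrypt(sk_a, c) for c in two_a])      # [6, 10, 16]
```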
Processing type 2: algebraic operations involving multiple encrypted columns:
For algebraic operations over multiple attributes such as A+B and A×B, when attributes A and B are both stored as ciphertext, under keys $pk_A$ and $pk_B$ respectively, the data middle platform cannot compute on the ciphertexts by itself. The present invention therefore designs a dedicated interactive protocol that completes the computation with the help of the auxiliary computing server.
Take addition as an example. Computing A+B requires the sum a+b of each pair of data items in the two attributes; denote the result by c. First, the data middle platform generates two random numbers $r_1$ and $r_2$, uses the homomorphism of the BCP encryption algorithm to obtain the ciphertexts of $a+r_1$ and $b+r_2$, and sends them to the auxiliary computing server. The auxiliary computing server holds $mk$, so it can decrypt them to obtain $a+r_1$ and $b+r_2$; it then adds them to obtain $(a+b)+(r_1+r_2)$, encrypts this value under $pk_c$, and sends the ciphertext back to the data middle platform. The encryption key $pk_c$ is computed from $pk_a$ and $pk_b$ using the key homomorphism of the BCP cryptosystem. Because the data middle platform has kept $r_1$ and $r_2$, it can use the homomorphic property to subtract $(r_1+r_2)$ inside the ciphertext and obtain the ciphertext of $a+b$. Other algebraic operations over multiple encrypted columns, such as multiplication, yield the ciphertext of the result through a similar interactive protocol.
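The blinded addition protocol can be sketched end to end with a toy BCP-style scheme in which knowledge of the factorisation of N plays the role of the privileged key mk. Everything below is an assumption made for illustration: the primes, the generator search, the function names, and the way mk decryption is carried out are not taken from the patent, and the parameters are far too small to be secure. It assumes Python 3.8 or later.

```python
import math
import secrets

P, Q = 1019, 1187                                   # toy primes; knowing them acts as mk
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)


def L(x):
    """Map 1 + k*N (mod N^2) to k."""
    return (x - 1) // N


def pick_generator():
    """Find a square g whose LAM-th power 1 + k*N has k invertible mod N."""
    while True:
        alpha = secrets.randbelow(N2 - 2) + 2
        if math.gcd(alpha, N) != 1:
            continue
        g = pow(alpha, 2, N2)
        if math.gcd(L(pow(g, LAM, N2)), N) == 1:
            return g


G = pick_generator()
K_INV = pow(L(pow(G, LAM, N2)), -1, N)


def keygen():
    sk = secrets.randbelow(N2) + 1
    return pow(G, sk, N2), sk


def encrypt(pk, m):
    r = secrets.randbelow(N2) + 1
    return pow(G, r, N2), (pow(pk, r, N2) * (1 + m * N)) % N2


def decrypt(sk, ct):
    masked = (ct[1] * pow(pow(ct[0], sk, N2), -1, N2)) % N2
    return (masked - 1) // N


def master_decrypt(pk, ct):
    """Second trapdoor: decrypt with the factorisation instead of sk."""
    a_mod = (L(pow(pk, LAM, N2)) * K_INV) % N        # sk mod N
    r_mod = (L(pow(ct[0], LAM, N2)) * K_INV) % N     # randomness mod N
    gamma = (a_mod * r_mod) % N
    stripped = (ct[1] * pow(pow(G, gamma, N2), -1, N2)) % N2
    return (L(pow(stripped, LAM, N2)) * pow(LAM, -1, N)) % N


def blind(ct, r):
    """Homomorphically add the known constant r (possibly negative)."""
    return ct[0], (ct[1] * (1 + r * N)) % N2


def secure_add(pk_a, ct_a, pk_b, ct_b):
    # Data middle platform: blind both operands with fresh random values.
    r1, r2 = secrets.randbelow(N), secrets.randbelow(N)
    blinded_a, blinded_b = blind(ct_a, r1), blind(ct_b, r2)
    # Auxiliary computing server: decrypt with mk, add, encrypt under pk_c.
    total = (master_decrypt(pk_a, blinded_a) + master_decrypt(pk_b, blinded_b)) % N
    pk_c = (pk_a * pk_b) % N2
    ct_c = encrypt(pk_c, total)
    # Data middle platform: remove r1 + r2 inside the ciphertext.
    return pk_c, blind(ct_c, -(r1 + r2))


pk_a, sk_a = keygen()
pk_b, sk_b = keygen()
pk_c, ct_c = secure_add(pk_a, encrypt(pk_a, 17), pk_b, encrypt(pk_b, 25))
print(decrypt(sk_a + sk_b, ct_c))   # 42
```

In this sketch the result is decryptable with `sk_a + sk_b`, which follows from `pk_c` being the product of the two public keys.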
Processing type 3: non-algebraic operations involving multiple encrypted columns:
Non-algebraic operations are mainly comparisons: deciding which of two values is larger and deciding whether two values are equal. Depending on the inputs, the database management system has to handle three types: (1) comparing a ciphertext value with a plaintext value; (2) comparing two values encrypted under the same key; (3) comparing two values encrypted under different keys. Because of the homomorphism of the BCP cryptosystem, case (1) can be converted into case (2).
Likewise, the comparison result (the output) comes in two forms: plaintext output and ciphertext output. For a query that outputs a computed result, such as SELECT COUNT(*) WHERE A>B, each comparison result is intermediate data that must be kept secret from the data middle platform, so the comparison result should be a ciphertext. For a query such as SELECT * WHERE A>B, each comparison directly decides whether the tuple is output, so there is no need to hide it from the data middle platform and the comparison result is output in plaintext. The present invention therefore designs the comparison protocols shown in Table 1. In particular, given current research, it is easier to design a protocol whose comparison result is encrypted under the Goldwasser-Micali cryptosystem (also called QR encryption). The present invention follows existing schemes to obtain a result encrypted under the QR scheme, and designs a protocol that changes the encryption mode, converting QR-encrypted data into BCP-encrypted data, as shown in Table 1.
Table 1
At this point the present invention has provided a method for evaluating SQL query conditions over ciphertexts; the corresponding content is then output using methods from mature database management systems.
Example 3:
Based on the same inventive concept, the present invention further provides a data search and query system based on searchable encryption and homomorphic encryption, comprising:
a credential generation module, configured to generate a search credential for a keyword based on the keyword of the data item that the data accessor needs to obtain;
an index mapping module, configured to retrieve, from a pre-built data entry index mapping table and based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword;
a data download module, configured to decrypt the data entry index mapping table to obtain data entry indexes and to download the required data items based on those indexes;
wherein the data entry index mapping table is built by segmenting text data with a word segmentation technique and extracting keywords from the segments.
The data search and query system based on searchable encryption and homomorphic encryption further comprises a table construction module for building the data entry index mapping table.
The table construction module is specifically configured to:
segment text data using a word segmentation technique;
extract keywords from the segments;
build the data entry index mapping table from the keywords and the segments corresponding to the keywords.
The table construction module is described in detail below:
For confidential text data, the data owner first applies word segmentation to the fields of the data table and extracts the keywords. Keywords are usually the important content words that represent the information in a data entry.
Based on the extracted keyword set, the data owner builds a keyword-based search index for the text file. The index is a mapping table of the form [keyword A, indexes within the file of the data items that contain keyword A].
The data owner encrypts all data in the text file row by row with a mainstream encryption algorithm (such as AES) and also encrypts each extracted keyword separately. The previously built index structure is permuted using a cryptographic method (such as bilinear pairings), so that a party holding the index cannot learn from it which keywords correspond to which data items, while a party holding a specific search credential can use it to obtain the corresponding part of the index.
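The index construction above can be sketched in a few lines. This is a simplified illustration under stated assumptions: whitespace splitting with a small stop-word list stands in for real word segmentation, a keyed HMAC-SHA256 tag stands in for the pairing-based permutation of the index, and row encryption is omitted. The names `extract_keywords`, `build_index`, `keyword_tag`, and `protect_index` are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "in", "at", "on"}   # illustrative only


def extract_keywords(text):
    """Stand-in for word segmentation: lowercase, split, drop stop words."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def build_index(rows):
    """Map each keyword to the row indexes of the entries that contain it."""
    index = defaultdict(list)
    for row_id, text in enumerate(rows):
        for keyword in sorted(extract_keywords(text)):
            index[keyword].append(row_id)
    return dict(index)


def keyword_tag(index_key, keyword):
    """Keyed tag that hides the keyword from anyone without index_key."""
    return hmac.new(index_key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_index(index_key, index):
    """Replace every keyword in the index by its keyed tag."""
    return {keyword_tag(index_key, kw): ids for kw, ids in index.items()}


rows = [
    "transformer fault at substation",
    "routine inspection of transformer",
    "meter reading",
]
index = build_index(rows)
protected = protect_index(b"demo-index-key", index)

print(index["transformer"])                                        # [0, 1]
print("transformer" in protected)                                  # False
print(protected[keyword_tag(b"demo-index-key", "transformer")])    # [0, 1]
```

In this sketch the posting lists stay in plaintext; a fuller version would also encrypt them so that only the accessor can read the row indexes.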
For character-typed fields, the present invention designs an encrypted search scheme: when a user wants to find the data items containing a keyword w, a search credential [w] for that keyword is generated and sent to the data storage server, which uses [w] to perform the search and sends the data items containing w to the user in encrypted form.
The credential generation module is specifically configured to:
verify, based on the keyword query request sent by the data accessor, whether the data accessor has access permission and, when it does, generate the search credential corresponding to the keyword for the data accessor. In this embodiment, the permission manager decides whether the data accessor has access permission, and the data storage party stores and manages the data.
Under the separation-of-permissions mechanism, when a user (the data accessor) wants to search for and obtain the data items containing a keyword, the user must first send a query request to the permission manager. After verifying that the user has query permission, the permission manager generates the search credential corresponding to the keyword and sends it to the data storage party.
In the index mapping module, retrieving, from the pre-built data entry index mapping table and based on the search credential, the data entry index mapping table that corresponds to the search credential and contains the keyword specifically includes:
after receiving the search credential for a keyword, the data storage party uses it to retrieve from the encrypted index the corresponding data entry index mapping table containing the keyword, and returns the encrypted mapping table to the accessor.
The data download module is specifically configured as follows:
the accessor decrypts the returned result, downloads the corresponding encrypted data according to the data entry indexes obtained, and decrypts the data.
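The search flow across the permission manager, the data storage party, and the accessor can be sketched as follows. This is an illustration under assumptions, not the invention's protocol: the search credential is modelled as an HMAC-SHA256 tag, the encrypted rows are opaque placeholder bytes, the posting lists are left unencrypted, and the class and method names are hypothetical.

```python
import hashlib
import hmac


def make_credential(index_key, keyword):
    """Search credential for a keyword, modelled here as a keyed tag."""
    return hmac.new(index_key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()


class PermissionManager:
    """Checks access rights and issues search credentials."""

    def __init__(self, index_key, authorized_users):
        self._index_key = index_key
        self._authorized = set(authorized_users)

    def issue_credential(self, user, keyword):
        if user not in self._authorized:
            raise PermissionError(f"{user} has no query permission")
        return make_credential(self._index_key, keyword)


class StorageParty:
    """Holds the protected index and the encrypted rows; never sees keywords."""

    def __init__(self, protected_index, encrypted_rows):
        self._index = protected_index
        self._rows = encrypted_rows

    def search(self, credential):
        return list(self._index.get(credential, []))

    def download(self, row_ids):
        return [self._rows[i] for i in row_ids]


index_key = b"demo-index-key"
protected_index = {make_credential(index_key, "transformer"): [0, 1]}
encrypted_rows = [b"<ciphertext 0>", b"<ciphertext 1>", b"<ciphertext 2>"]

manager = PermissionManager(index_key, authorized_users={"alice"})
storage = StorageParty(protected_index, encrypted_rows)

credential = manager.issue_credential("alice", "transformer")
row_ids = storage.search(credential)
print(row_ids)                     # [0, 1]
print(storage.download(row_ids))   # [b'<ciphertext 0>', b'<ciphertext 1>']
```

A user who is not in `authorized_users` gets a `PermissionError` and therefore never obtains a credential to present to the storage party.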
The present invention further includes a numeric data query module;
the numeric data query module is configured to query numeric data based on query instructions issued by the data accessor.
The numeric data query module includes:
an addition operation submodule, configured to parse the query instruction issued by the data accessor and, when the parsed result contains only addition operations, compute directly on the homomorphically encrypted columns to obtain the ciphertext of the numeric data being queried;
a mixed operation submodule, configured to, when the parsed result contains both addition and non-addition operations, blind the operands with generated random numbers; decrypt the blinded operands using the double trapdoor of the BCP cryptosystem to obtain the blinded plaintexts; compute on the blinded plaintexts and encrypt the result to obtain a ciphertext carrying the blinding factor; and unblind that ciphertext to obtain the ciphertext of the numeric data being queried.
For query statements on numeric data, the present invention designs a query parsing scheme that parses the user's query instruction into basic algebraic operations between data items of the database table.
For each basic operation obtained by parsing, the present invention designs a different execution scheme depending on the operation type and the data involved. First, for linear operations (addition and scalar multiplication) on a single encrypted column, the addition operation submodule uses the additive homomorphism of the BCP cryptosystem: from ciphertexts encrypted under the same key, the ciphertext of the sum of the two plaintexts is obtained without decryption, which gives the result directly. Second, for addition and multiplication involving two encrypted columns, the mixed operation submodule uses an interactive protocol between the storage server and the auxiliary computing server. Specifically, the storage server generates a random number, blinds the operand with it, and sends the blinded operand to the auxiliary computing server; the auxiliary computing server uses the double trapdoor of the BCP cryptosystem to obtain the blinded plaintext, completes the computation, encrypts the result, and sends the ciphertext of the result, which still carries the blinding factor, back to the storage server; the storage server unblinds it to obtain the final result. Finally, for numeric comparisons across multiple encrypted columns, that is, deciding which of two values is larger or whether they are equal, the present invention designs a similar interactive protocol between the storage server and the auxiliary computing server.
The computing cloud server belongs to a different service provider from the data storage cloud. A storage cloud and an auxiliary computing cloud server are used, with an interactive protocol between them to complete nonlinear operations. The computing cloud server only has to perform very lightweight computation, so the deployment is economical.
If the parsed result contains only addition operations, the addition operation submodule uses the homomorphic property to operate directly on the ciphertexts, completes the query, and returns the query result to the user. Here the query result means the numeric results and the data items that satisfy the condition.
Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of the pending claims of the present invention.
Claims (9)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN2021115479235 | 2021-12-17 | ||
| CN202111547923 | 2021-12-17 | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN116266180A true CN116266180A (en) | 2023-06-20 | 
Family
ID=86744083
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN202210167538.6A Pending CN116266180A (en) | 2021-12-17 | 2022-02-23 | Data searching and inquiring method and system based on searchable encryption and homomorphic encryption | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN116266180A (en) | 
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN119358036A (en) * | 2024-12-25 | 2025-01-24 | 中电智能科技有限公司 | A data retrieval method, device, electronic device and storage medium based on homomorphic encryption algorithm | 
| US12386825B1 (en) | 2024-10-07 | 2025-08-12 | International Business Machines Corporation | Parametric searching under fully homomorphic encryption | 
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN101739400A (en) * | 2008-11-11 | 2010-06-16 | 日电(中国)有限公司 | Method and device for generating indexes and retrieval method and device | 
| US20140090081A1 (en) * | 2012-09-24 | 2014-03-27 | Protegrity Usa, Inc. | Privacy Preserving Data Search | 
| CN107622212A (en) * | 2017-10-13 | 2018-01-23 | 上海海事大学 | A Hybrid Ciphertext Retrieval Method Based on Double Trapdoor | 
| CN110427771A (en) * | 2019-06-25 | 2019-11-08 | 西安电子科技大学 | What a kind of search modes were hidden can search for encryption method, Cloud Server | 
| CN111835500A (en) * | 2020-07-08 | 2020-10-27 | 浙江工商大学 | A secure sharing method of searchable encrypted data based on homomorphic encryption and blockchain | 
- 
        2022
        - 2022-02-23 CN CN202210167538.6A patent/CN116266180A/en active Pending
 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |