CN113938266B - Junk mail filter training method and system based on integer vector homomorphic encryption - Google Patents
Junk mail filter training method and system based on integer vector homomorphic encryption
- Publication number
- CN113938266B (application CN202111098997.5A)
- Authority
- CN
- China
- Prior art keywords
- training
- client
- model
- data set
- cloud server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Bioethics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention provides a spam filter training method and system based on integer vector homomorphic encryption, comprising the following steps: S1: preprocessing, including system parameter and key generation; S2: encrypting the training data set; S3: privacy-preserving training of the spam filtering model; S4: once training of the spam filtering model on the cloud server is complete, returning the model weights. The invention ensures the security and reliability of the data while meeting the basic functional requirements, and has high practical value and room for extension. In the model training protocol, the cloud server performs most of the computation, so the client's workload is low and the method remains convenient and fast when the client's computing power is limited. In the client-side protocol, the bulk of the client's computation occurs only once, which also reduces communication overhead to some extent.
Description
Technical field
The present invention relates to the technical field of privacy protection in secure multi-party computation, and specifically to a spam filter training method and system based on integer vector homomorphic encryption.
Background
In recent years, with the continuing development of machine learning, privacy protection has received growing attention. The rise of cloud computing has led to "machine learning as a service", in which enterprise users outsource work to third-party providers. At the same time, cloud computing must not violate the privacy of users' data. One way to protect privacy and security is to encrypt data before uploading it to the cloud. If encryption prevents useful computation, the utility of the data is severely limited. This problem can be solved with a homomorphic encryption scheme, which allows operations to be performed on encrypted data without decrypting it and without revealing any information other than the result.
E-mail is a private communication medium whose messages are intended to be read only by the recipient. Because the content is sensitive, sharing and publishing e-mail data may be subject to personal, strategic, and legal restrictions. These restrictions are a significant barrier for many e-mail processing applications, such as spam filtering, which is usually provided by a separate service provider. The privacy constraints inherent in e-mail are a major obstacle to developing effective spam filtering methods, which require access to large amounts of e-mail data belonging to many users. To alleviate this problem, a privacy-preserving spam filtering system is designed in which the server can train and evaluate a logistic-regression-based spam classifier, while techniques such as homomorphic encryption and randomization ensure that no user who contributes e-mail data can observe any data other than their own.
Some methods addressing these problems already exist, for example data aggregation methods based on homomorphic encryption. Put simply, data aggregation combines multiple pieces of data into one, and a homomorphic encryption algorithm has the following property: performing ring addition and multiplication on plaintexts and then encrypting the result is equivalent to encrypting first and performing the corresponding operations on the ciphertexts. Existing aggregation schemes use the Paillier encryption scheme to train the model, scaling the data by a large integer so that Paillier can be applied to real-valued data. However, such aggregation schemes suffer from low efficiency, cumbersome procedures, and a reduced encryption domain.
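The scaling step mentioned for Paillier-based schemes can be illustrated without any encryption library. The following is a minimal sketch, assuming a hypothetical scale factor `SCALE` (not taken from the patent); it only shows how real values are mapped to integers so that an integer-only scheme can operate on them, and why each multiplication consumes more of the available integer range.

```python
SCALE = 10**6  # hypothetical scaling factor (assumption, not from the patent)

def encode(x: float) -> int:
    """Map a real number to an integer by fixed-point scaling."""
    return round(x * SCALE)

def decode(n: int, scale_power: int = 1) -> float:
    """Undo the scaling; a product of two encodings carries SCALE**2."""
    return n / (SCALE ** scale_power)

a, b = 0.25, 1.5
sum_encoded = encode(a) + encode(b)       # addition keeps the scale at SCALE
product_encoded = encode(a) * encode(b)   # multiplication raises it to SCALE**2

total = decode(sum_encoded)
product = decode(product_encoded, scale_power=2)
```

Because the scale grows with every multiplication, the usable plaintext range shrinks accordingly, which is one way to read the "reduced encryption domain" drawback described above.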
Summary of the invention
The problem to be solved by the present invention is that existing spam filter training methods suffer from poor performance and cumbersome procedures.
To solve this problem, in one aspect, the present invention provides a spam filter training method based on integer vector homomorphic encryption, comprising the following steps:
S1: preprocessing, including system parameter and key generation;
S2: encrypting the training data set;
S3: privacy-preserving training of the spam filtering model;
S4: once training of the spam filtering model on the cloud server is complete, returning the model weights.
Preferably, step S1 specifically comprises the following steps:
S11: Client-side data processing: word vectors are first generated from the content of the text data. A word vector is a vector in which words or phrases from the vocabulary are mapped to real numbers; for a piece of processed text, an entry is 1 if the corresponding vocabulary word occurs in the sentence and 0 otherwise. An encryption key pair is then generated;
S12: Cloud-server-side data processing: a number of vectors and matrices, including unit vectors and encoding vectors of various lengths, are precomputed and stored on the server.
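The 0/1 word-vector encoding described in step S11 can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and a vocabulary built from the messages themselves; the helper names `build_vocabulary` and `to_binary_vector` are hypothetical and do not come from the patent.

```python
def build_vocabulary(messages):
    """Collect every distinct lower-cased word, in sorted order."""
    return sorted({word for msg in messages for word in msg.lower().split()})

def to_binary_vector(message, vocab):
    """Entry is 1 if the vocabulary word occurs in the message, else 0."""
    present = set(message.lower().split())
    return [1 if word in present else 0 for word in vocab]

# Toy e-mail bodies (made up for illustration only).
emails = ["win cash now", "meeting at noon", "win a free prize now"]
vocab = build_vocabulary(emails)
vectors = [to_binary_vector(m, vocab) for m in emails]
```

Each resulting vector has one entry per vocabulary word, so all messages share the same dimension, which is what the later encryption and inner-product steps rely on.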
Preferably, step S2 specifically comprises the following steps:
S21: The client encrypts the vectors with the encryption function, encrypting the original data set record by record with the encryption tool, and then sends the encrypted data set, the encryption utility functions, the public key, and auxiliary information to the server;
S22: The client uploads the encrypted data set and the public key M to the cloud server; the cloud server receives the encrypted data set and then runs the homomorphically encrypted logistic regression algorithm.
Preferably, step S3 specifically comprises the following steps:
S31: The cloud server sets the number of iterations n;
S32: The ciphertext gradient is computed;
S33: The parameters are updated by gradient descent;
S34: Steps S32 and S33 are repeated until n iterations have been performed.
Preferably, step S4 specifically comprises the following steps:
S41: After training is complete, the cloud server holds the encrypted trained model and returns the ciphertext model and the model weights ω to the client;
S42: The client receives the ciphertext model and the model weights ω and decrypts them with the private key to obtain the plaintext model, i.e. the trained spam filtering model, which is then used to make predictions on data.
In another aspect, the present invention also provides a system that uses the spam filter training method based on integer vector homomorphic encryption described above, as shown in Figure 2, wherein the system comprises:
a client and a cloud server;
the client consists of a group of client hosts and is used to provide the training data set and to encrypt it;
the cloud server is used to train the spam filtering model on the encrypted data set;
after encrypting the training data set with the public key, the client sends the encrypted training data set to the cloud server; the cloud server runs the privacy-preserving logistic regression algorithm to train the model and, once training is complete, sends the model to the client, which decrypts it with the private key to obtain the plaintext model.
Compared with the prior art, the spam filter training method and system based on integer vector homomorphic encryption of the present invention have the following beneficial effects:
(1) The method and system comprise a cloud-server-side privacy-preserving logistic regression training protocol and a client-side private data encryption protocol. During execution of the protocol, homomorphic encryption is used to encrypt the client's private data, which is uploaded to the cloud server. The cloud server then runs the privacy-preserving logistic regression protocol on the encrypted data and, once training is complete, obtains the ciphertext of the model weights. Finally, the cloud server transmits this ciphertext to the client, which decrypts it with its private key to obtain the plaintext model weights;
(2) In the logistic regression training method, the least squares method is used to fit the logistic function, so that the final result fits the data better. The invention ensures the security and reliability of the data while meeting the basic functional requirements, and has high practical value and room for extension. In the model training protocol, the cloud server performs most of the computation, so the client's workload is low and the method remains convenient and fast when the client's computing power is limited. In the client-side protocol, the bulk of the client's computation occurs only once, which also reduces communication overhead to some extent.
Description of the drawings
Figure 1 is a flow chart of the method of the present invention;
Figure 2 is a schematic diagram of the system of the present invention.
Detailed description
To make the above objects, features, and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
A spam filter training method based on integer vector homomorphic encryption is provided, as shown in Figure 1, comprising the following steps:
S1: preprocessing, including system parameter and key generation;
S2: encrypting the training data set;
S3: privacy-preserving training of the spam filtering model;
S4: once training of the spam filtering model on the cloud server is complete, returning the model weights.
Step S1 specifically comprises the following steps:
S11: Client-side data processing: word vectors are first generated from the content of the text data. A word vector is a vector in which words or phrases from the vocabulary are mapped to real numbers; for a piece of processed text, an entry is 1 if the corresponding vocabulary word occurs in the sentence and 0 otherwise. An encryption key pair is then generated. Specifically:
The client first initializes the parameters and converts the e-mail data set into a word vector data set. It then constructs the private key $S = [I, T]$, where $I \in \mathbb{Z}^{m \times m}$ is the identity matrix and $T \in \mathbb{Z}^{m \times (n-m)}$ is a random matrix. Key switching is then applied: first, through bit conversion, the key $S$ is converted into a new key $S'$ satisfying $S'c^* = Sc$, where $c$ is the ciphertext and $c^*$ is the bit-converted ciphertext; next, a key-switching matrix $M$ is built satisfying $S'M = S^* + E$, where $E$ is an error term. $M$ is used as the public key for encrypting data.
S12: Cloud-server-side data processing: a number of vectors and matrices, including unit vectors and encoding vectors of various lengths, are precomputed and stored on the server. This reduces the amount of computation required by the algorithm itself, making the integer vector homomorphic encryption algorithm more efficient within the ciphertext logistic regression scheme.
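The key structure in step S11 and the bit-conversion identity can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the dimensions, the value bounds, and the helper names (`make_private_key`, `bit_decompose`, `expand_key`) are hypothetical, and the bit-expanded key is named `S_star` here. It builds $S = [I, T]$, expands a ciphertext into signed bits, and verifies that the expanded key applied to the bit vector gives the same result as $S$ applied to the original ciphertext. The noisy key-switching matrix $M$ itself is not constructed.

```python
import random

def make_private_key(m, n, bound=10, rng=random):
    """S = [I, T]: an m x m identity block next to a random m x (n - m) block."""
    return [[1 if i == j else 0 for j in range(m)]
            + [rng.randint(-bound, bound) for _ in range(n - m)]
            for i in range(m)]

def mat_vec(A, v):
    """Integer matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def bit_decompose(c, nbits):
    """Signed binary expansion of every entry, most significant bit first."""
    out = []
    for value in c:
        sign = -1 if value < 0 else 1
        mag = abs(value)
        out.extend(sign * ((mag >> k) & 1) for k in range(nbits - 1, -1, -1))
    return out

def expand_key(S, nbits):
    """Replace each column S_j by [2^(nbits-1) S_j, ..., 2 S_j, S_j]."""
    return [[s * (1 << k) for s in row for k in range(nbits - 1, -1, -1)]
            for row in S]

rng = random.Random(0)          # fixed seed so the demo is repeatable
S = make_private_key(m=3, n=5, rng=rng)
c = [rng.randint(-100, 100) for _ in range(5)]   # stand-in ciphertext
NBITS = 8                        # assumes every |c_i| < 2**NBITS

c_star = bit_decompose(c, NBITS)
S_star = expand_key(S, NBITS)
identity_holds = mat_vec(S_star, c_star) == mat_vec(S, c)
```

The identity is exact because each entry of `c` is rebuilt from its bits by the matching powers of two in `S_star`; in the scheme described above, noise only enters once the key-switching matrix $M$ with its error term $E$ is introduced.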
Step S2 specifically comprises the following steps:
S21: The client encrypts the vectors with the encryption function, encrypting the original data set record by record with the encryption tool, and then sends the encrypted data set, the encryption utility functions, the public key, and auxiliary information to the server. Specifically:
The word vector data set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$ generated in step S11 is encrypted with the public key $M$ under the integer vector encryption scheme to give $D = \{(c_1, y_1), (c_2, y_2), \dots, (c_m, y_m)\}$, where each $c_i$ satisfies $S c_i = w x_i + e_i$, $w$ is a large integer, and $e_i$ is a random error smaller than $w/2$.
S22: The client uploads the encrypted data set and the public key M to the cloud server; the cloud server receives the encrypted data set and then runs the homomorphically encrypted logistic regression algorithm.
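The ciphertext invariant $S c = w x + e$ from step S21 can be illustrated with small integers. The sketch below is a simplification under stated assumptions: it produces ciphertexts directly from the secret block $T$ rather than through the public key $M$, and the parameter values and function names are hypothetical. It shows that rounding $Sc/w$ recovers $x$ while the error stays below $w/2$, and that adding two ciphertexts adds the underlying plaintexts.

```python
import random

def mat_vec(A, v):
    """Integer matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def encrypt(x, T, w, rng, noise=1):
    """Return c such that [I, T] c = w*x + e with small error e.
    Simplified: uses the secret block T directly (assumption)."""
    tail = [rng.randint(-50, 50) for _ in range(len(T[0]))]
    e = [rng.randint(-noise, noise) for _ in range(len(x))]
    T_tail = mat_vec(T, tail)
    head = [w * xi + ei - ti for xi, ei, ti in zip(x, e, T_tail)]
    return head + tail

def decrypt(c, T, w):
    """Compute S c with S = [I, T], then round each entry to the nearest multiple of w."""
    m = len(T)
    head, tail = c[:m], c[m:]
    Sc = [h + t for h, t in zip(head, mat_vec(T, tail))]
    return [(2 * v + w) // (2 * w) for v in Sc]  # integer rounding of v / w

rng = random.Random(1)
m, n, w = 4, 6, 2**20            # hypothetical sizes and scaling integer
T = [[rng.randint(-10, 10) for _ in range(n - m)] for _ in range(m)]

x1, x2 = [1, 0, 1, 1], [0, 0, 1, 0]   # binary word vectors
c1, c2 = encrypt(x1, T, w, rng), encrypt(x2, T, w, rng)
c_sum = [a + b for a, b in zip(c1, c2)]  # homomorphic addition

recovered = decrypt(c1, T, w)
recovered_sum = decrypt(c_sum, T, w)
```

Adding ciphertexts also adds their error terms, so the number of additions that can be performed before decryption fails depends on how large $w$ is relative to the accumulated error.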
Step S3 specifically comprises the following steps:
S31: The cloud server sets the number of iterations n. Specifically:
The cloud server generates a random vector $\rho$ and sets the number of iterations n.
S32: The ciphertext gradient is computed. Specifically:
The cloud server computes the ciphertext gradient (the formula is not preserved in this text), where $h_\rho$ is the Sigmoid function. This function is approximated by a least-squares fit, i.e. it is evaluated with the following polynomial:
$$g_7(x) = b_0 + b_1 (x/8) + b_3 (x/8)^3 + b_5 (x/8)^5 + b_7 (x/8)^7$$
As a result, all operations reduce to additions and inner products between vectors.
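The patent does not list the coefficients $b_0, \dots, b_7$, so the sketch below derives a set of them rather than quoting any. It is a minimal illustration under stated assumptions: a discrete least-squares fit on a uniform grid over $[-8, 8]$ (the interval is inferred from the $x/8$ scaling), solved through the normal equations in pure Python. The helper names are hypothetical.

```python
import math

POWERS = (0, 1, 3, 5, 7)  # exponents appearing in g7

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_sigmoid(num_points=2001):
    """Least-squares fit of b0 + b1 t + b3 t^3 + b5 t^5 + b7 t^7, t = x/8."""
    xs = [-8.0 + 16.0 * i / (num_points - 1) for i in range(num_points)]
    basis = [[(x / 8.0) ** p for p in POWERS] for x in xs]
    ys = [sigmoid(x) for x in xs]
    k = len(POWERS)
    # Normal equations: (B^T B) coeffs = B^T y
    A = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(k)]
    return solve(A, b)

def g7(x, coeffs):
    """Evaluate the fitted polynomial using only additions and multiplications."""
    t = x / 8.0
    return sum(c * t ** p for c, p in zip(coeffs, POWERS))

coeffs = fit_sigmoid()
max_err = max(abs(g7(x / 10.0, coeffs) - sigmoid(x / 10.0)) for x in range(-80, 81))
```

The point of the polynomial form is that `g7` needs no exponentials or divisions by data-dependent values, which is what makes it compatible with the addition and inner-product operations available on ciphertexts. Outside $[-8, 8]$ the polynomial is not a reliable approximation.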
S33: The parameters are updated by gradient descent. Specifically:
The parameters are updated by gradient descent (the update formula is not preserved in this text), where $\rho_j$ is the value of the training parameters at the j-th iteration.
S34: Steps S32 and S33 are repeated until n iterations have been performed.
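Steps S31 to S34 have the shape of an ordinary fixed-iteration gradient descent loop. The sketch below shows a plaintext counterpart only. Because the patent's gradient and update formulas are not preserved in this text, it assumes the standard logistic regression gradient $\frac{1}{m}\sum_i (h_\rho(x_i) - y_i)\,x_i$ and the update $\rho \leftarrow \rho - \alpha \cdot \text{gradient}$; the learning rate `lr`, the zero initialization, and the toy data are likewise assumptions. In the encrypted setting described above, `sigmoid` would be replaced by the polynomial approximation and the arithmetic would run on ciphertexts.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train(samples, labels, n_iters, lr=0.5):
    """Run exactly n_iters gradient descent updates (S31 to S34)."""
    dim, m = len(samples[0]), len(samples)
    rho = [0.0] * dim  # the patent uses a random start; zeros keep the demo deterministic
    for _ in range(n_iters):
        # S32: gradient of the logistic loss (assumed standard form)
        grad = [0.0] * dim
        for x, y in zip(samples, labels):
            err = sigmoid(dot(rho, x)) - y
            for j in range(dim):
                grad[j] += err * x[j] / m
        # S33: gradient descent update
        rho = [r - lr * g for r, g in zip(rho, grad)]
    return rho

def predict(rho, x):
    """Label 1 (spam) if the model's probability is at least 0.5."""
    return 1 if sigmoid(dot(rho, x)) >= 0.5 else 0

# Toy features: [bias, contains "win", contains "meeting"]; label 1 = spam.
samples = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
labels = [1, 0, 1, 0]
rho = train(samples, labels, n_iters=50)
predictions = [predict(rho, x) for x in samples]
```

Using a fixed iteration count instead of a convergence test matches step S31: a server working on ciphertexts cannot inspect the loss to decide when to stop.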
Step S4 specifically comprises the following steps:
S41: After training is complete, the cloud server holds the encrypted trained model and returns the ciphertext model and the model weights ω to the client;
S42: The client receives the ciphertext model and the model weights ω and decrypts them with the private key to obtain the plaintext model, i.e. the trained spam filtering model, which can then be used to make predictions on data.
The method of this embodiment thus trains a privacy-preserving spam filter based on the integer vector homomorphic encryption algorithm and a privacy-preserving logistic regression algorithm. The client first processes the data, generating word vectors from the content of the text data. It then runs the encryption algorithm to obtain the ciphertext data and sends it to the cloud server. On receiving the ciphertext, the cloud server runs the privacy-preserving logistic regression algorithm to train the model on the ciphertext. When training is complete, the resulting ciphertext model is sent back to the client, which decrypts it with its private key to obtain the plaintext model. This method not only protects the privacy of the training data but also prevents the cloud server from obtaining information about the trained model.
Embodiment 2
A system is provided that uses the spam filter training method based on integer vector homomorphic encryption described in Embodiment 1, as shown in Figure 2, wherein the system comprises:
a client (Client) and a cloud server (Cloud);
the client consists of a group of client hosts and is used to provide the training data set and to encrypt it;
the cloud server is used to train the spam filtering model on the encrypted data set;
after encrypting the training data set with the public key, the client sends the encrypted training data set to the cloud server; the cloud server runs the privacy-preserving logistic regression algorithm to train the model and, once training is complete, sends the model to the client, which decrypts it with the private key to obtain the plaintext model.
The system of this embodiment thus has two entities, the client and the cloud server. The client (Client) owns the e-mail data set used to train the model. Without encryption, the cloud server (Cloud) could read the data set directly; because the data set may contain private and sensitive data, the client does not want the cloud server to have access to it. The client therefore first generates a key, uses it to encrypt its own data set, and sends the encrypted data set, i.e. the ciphertext data, to the cloud server. The cloud server can only process the ciphertext data according to the prescribed computation protocol, so it trains the logistic regression on ciphertext using the preset training method and ends up with a ciphertext logistic regression model. Because the cloud server cannot decrypt this model, it learns nothing about it. Finally, the cloud server sends the ciphertext model to the client.
Although the present invention is disclosed as above, its scope of protection is not limited thereto. Those skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and all such changes and modifications fall within the scope of protection of the present invention.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111098997.5A CN113938266B (en) | 2021-09-18 | 2021-09-18 | Junk mail filter training method and system based on integer vector homomorphic encryption |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111098997.5A CN113938266B (en) | 2021-09-18 | 2021-09-18 | Junk mail filter training method and system based on integer vector homomorphic encryption |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113938266A (en) | 2022-01-14 |
| CN113938266B (en) | 2024-03-26 |
Family
ID=79276257
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111098997.5A (CN113938266B, Expired - Fee Related) | Junk mail filter training method and system based on integer vector homomorphic encryption | 2021-09-18 | 2021-09-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113938266B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119544658B (en) * | 2025-01-22 | 2025-05-09 | 广东盈世计算机科技有限公司 | Method, system, computer equipment and storage medium for identifying junk mail |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5600720A (en) * | 1993-07-20 | 1997-02-04 | Canon Kabushiki Kaisha | Encryption apparatus, communication system using the same and method therefor |
| CN102035753A (en) * | 2009-10-02 | 2011-04-27 | 青岛理工大学 | A Spam Filtering Method Based on Filter Dynamic Integration |
| CN110084063A (en) * | 2019-04-23 | 2019-08-02 | 中国科学技术大学 | A kind of gradient descent algorithm method for protecting private data |
| CN110190946A (en) * | 2019-07-12 | 2019-08-30 | 之江实验室 | A kind of secret protection multimachine structure data classification method based on homomorphic cryptography |
| CN110190945A (en) * | 2019-05-28 | 2019-08-30 | 暨南大学 | Multi-encryption-based linear regression privacy protection method and system |
| CN111460478A (en) * | 2020-03-30 | 2020-07-28 | 西安电子科技大学 | A privacy protection method for collaborative deep learning model training |
| CN111563265A (en) * | 2020-04-27 | 2020-08-21 | 电子科技大学 | Distributed deep learning method based on privacy protection |
| CN112182649A (en) * | 2020-09-22 | 2021-01-05 | 上海海洋大学 | A Data Privacy Protection System Based on Secure Two-Party Computation Linear Regression Algorithm |
| CN112822005A (en) * | 2021-02-01 | 2021-05-18 | 福州大学 | Secure transfer learning system based on homomorphic encryption |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2001211155A (en) * | 2000-01-25 | 2001-08-03 | Murata Mach Ltd | Method and device for generating common key and cipher communication method |
| US8989089B2 (en) * | 2011-08-18 | 2015-03-24 | Ofinno Technologies, Llc | Automobile data transmission |
| US9252942B2 (en) * | 2012-04-17 | 2016-02-02 | Futurewei Technologies, Inc. | Method and system for secure multiparty cloud computation |
| US9405928B2 (en) * | 2014-09-17 | 2016-08-02 | Commvault Systems, Inc. | Deriving encryption rules based on file content |
| US20190007196A1 (en) * | 2017-06-28 | 2019-01-03 | Qatar University | Method and system for privacy preserving computation in cloud using fully homomorphic encryption |
Non-Patent Citations (8)
| Title |
|---|
| A publicly verifiable network coding scheme with null-space HMAC.;Mingwu Zhang;Int. J. Intell. Inf. Database Syst.;20181231;全文 * |
| Spam Filtering Based on Variable Precision Rough Set Decision Tree;Wang Jing;Journal of System Simulation;20170223;全文 * |
| Trust-based Distributed Authentication Middleware in Ubiquitous Mobile Environments;Mingwu Zhang;Third International Conference on Natural Computation (ICNC 2007);20070827;全文 * |
| 一种高效的同态加密方案及其应用;杨浩淼;金保隆;陈诚;吴新沿;;密码学报;20171215(第06期);全文 * |
| 云计算环境下朴素贝叶斯安全分类外包方案研究;陈思;;计算机应用与软件;20200712(第07期);全文 * |
| 基于信道特征和随机插值的物理层算法;吴游;陈诚;金龙;;计算机与现代化;20200415(第04期);全文 * |
| 基于概率神经网络的Drive-bydownload恶意脚本检测技术研究;付垒朋;《百度学术》;20121231;全文 * |
| 粗糙集与决策树在电子邮件分类与过滤中的应用;邓春燕;《计算机工程与应用》;20091231;全文 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113938266A (en) | 2022-01-14 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110008717B (en) | Decision tree classification service system and method supporting privacy protection | |
| CN108737115B (en) | A privacy-preserving method for solving intersection of private attribute sets | |
| CN108632030B (en) | A Fine-grained Access Control Method Based on CP-ABE | |
| CN108847934B (en) | Multi-dimensional quantum homomorphic encryption method | |
| CN111526002B (en) | A lattice-based multi-identity fully homomorphic encryption method | |
| CN111026788A (en) | A multi-keyword ciphertext ranking retrieval method based on homomorphic encryption in hybrid cloud | |
| Wu et al. | Towards efficient secure aggregation for model update in federated learning | |
| WO2016197680A1 (en) | Access control system for cloud storage service platform and access control method therefor | |
| CN108521326A (en) | A privacy-preserving linear SVM model training algorithm based on vector homomorphic encryption | |
| CN112332979B (en) | Ciphertext search method, system and equipment in cloud computing environment | |
| CN113157778B (en) | Proxiable query method, system, device and medium for distributed data warehouse | |
| CN108833077A (en) | Encryption and decryption method based on outsourcing classifier based on homomorphic OU cipher | |
| CN110247767A (en) | Voidable attribute base outsourcing encryption method in mist calculating | |
| CN111159727B (en) | Multi-party cooperation oriented Bayes classifier safety generation system and method | |
| CN110059501A (en) | A kind of safely outsourced machine learning method based on difference privacy | |
| CN115994559A (en) | An Efficient Transformation Method for Inattentive Neural Networks | |
| Zhang et al. | Efficient privacy-preserving federated learning with improved compressed sensing | |
| CN111967514A (en) | Data packaging-based sample classification method for privacy protection decision tree | |
| CN115062323A (en) | Multi-center federal learning method for enhancing privacy protection and computer equipment | |
| CN118381600B (en) | Federal learning privacy protection method and system | |
| CN115238288A (en) | Safety processing method for industrial internet data | |
| CN113794561A (en) | Public key searchable encryption method and system | |
| CN107203723B (en) | File storage and retrieval method on multiple public clouds based on hash table method | |
| CN111339539A (en) | Efficient encrypted image retrieval method under multi-user environment | |
| CN118364873A (en) | Convolutional neural network reasoning method with privacy protection based on edge intelligence and homomorphic encryption |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20240326 |