CN115525697B - A process optimization method based on traditional Chinese medicine production data mining - Google Patents
- Publication number
- CN115525697B (application CN202211273092.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
Abstract
The invention discloses a process optimization method based on traditional Chinese medicine production data mining. First, the collected traditional Chinese medicine production data are sampled with a data sampling function to obtain an initial data set. Second, the validity of each sampled data point in the initial data set is evaluated with a constructed data evaluation model, yielding an evaluation data set. A kernel model from machine learning is then trained on the evaluation data set to establish a prediction evaluation model; based on this model, an adaptive optimizing function is iterated over the data space of interest to obtain optimal data points, and the series of optimal data points obtained forms a high-quality data set. Finally, a traditional Chinese medicine production optimization decision model is trained on the data in the high-quality data set and used to control the traditional Chinese medicine production process accurately, thereby optimizing it.
Description
Technical Field
The invention relates to the technical fields of big data analysis and traditional Chinese medicine production, in particular to a process optimization method based on traditional Chinese medicine production data mining.
Background
Analysis of process parameters is an important link in the production of traditional Chinese medicines. During manufacturing, the process data points generated by each process unit accumulate with every production batch. With the continuing development of big data technology, an important current research problem is how to use it to mine these massive accumulated data points, extract the latent rules and hidden relations in the production process, and apply them to guide the traditional Chinese medicine production process and support production decisions. Data mining is a key link in big data analysis: it explores the initial data, extracts the valuable data, and prepares the data for subsequent links such as modeling and data analysis.
In the analysis of big data from traditional Chinese medicine production, most current research seeks a model relating process parameters to production quality, so that data analysis yields optimal process parameters and improves production quality. Chinese patent application CN111888788A, "A recurrent neural network control method and system suitable for traditional Chinese medicine extraction and concentration" (Zhang Zirui; He Yan; Chen Xuesong; Cai Shuting; Xiong Xiaoming; Zhang Riwei; Teng Xiao; Cao Fan; Xu Yaotong), replaces the conventional PID control system in the extraction and concentration stage with an improved recurrent neural network control system. A neural network model is trained on collected data samples and then used to output optimal parameters for the extraction and concentration stage. However, the training samples are not subjected to data mining analysis, so the quality of the training data still leaves considerable room for improvement. Chinese patent application CN112348360A, "A traditional Chinese medicine production process parameter analysis system based on big data technology" (Xie Zhijian; Zhang Jinghai; Wang Zhenyu; Zhao Feifei; Zhang He), applies data mining to traditional Chinese medicine production data to build a BP neural network model that predicts quality indices from process parameter data, thereby optimizing the production process. That system applies a clustering algorithm to the collected process parameter data and removes invalid data according to the clustering result, improving the quality of the data set.
However, clustering algorithms converge slowly on large-scale data and are sensitive to the K value (the number of clusters), which is difficult to choose; especially for multi-category, high-dimensional data sets, they easily become trapped in local optima.
Traditional Chinese medicine production is an extremely complex chemical process: the production data contain many types of characteristic parameters, the data lie in a high-dimensional space, and effective, feasible parameter regions are very difficult to determine. How to extract effective data from traditional Chinese medicine production data with data mining methods has therefore become a key problem in big data analysis of traditional Chinese medicine production.
Disclosure of Invention
The invention aims to provide a process optimization method based on traditional Chinese medicine production data mining that overcomes the prior-art difficulty of finding the feasible region in a complex data space and achieves accurate control of the traditional Chinese medicine production process, thereby optimizing the production process.
To achieve this aim, the invention adopts the following technical scheme:
A process optimization method based on traditional Chinese medicine production data mining comprises the following steps:
First, the collected traditional Chinese medicine production data are sampled with a data sampling function to obtain an initial data set. Second, the validity of the sampled data points in the initial data set is evaluated with a constructed data evaluation model to obtain an evaluation data set. A kernel model from machine learning is trained on the evaluation data set to establish a prediction evaluation model, and, based on this model, an adaptive optimizing function is iterated over the data space of interest to obtain optimal data points, which together form a high-quality data set. Finally, a traditional Chinese medicine production optimization decision model is trained on the data in the high-quality data set and used to control the production process accurately, thereby optimizing it.
Further, sampling the collected traditional Chinese medicine production data with a data sampling function to obtain the initial data set comprises:
The collected traditional Chinese medicine production data form a multidimensional data space $\mathcal{X}\subseteq\mathbb{R}^p$. A data sampling model $\mathcal{M}(s_1,\dots,s_p)$ is established, where $s_i$ is the sampling step of the $i$-th one-dimensional sub-data space $x_i$ and $\mathcal{G}$ is the sampling grid. The sampling model is applied to the production data, and setting the steps $s_i$ determines the grid $\mathcal{G}$ and hence how many data points are sampled.
Further, evaluating the validity of the sampled data points in the initial data set with the constructed data evaluation model to obtain the evaluation data set comprises:
Define a mapping $\mathcal{D}:\mathcal{X}\subseteq\mathbb{R}^p\rightarrow S$, where $\mathbb{R}$ denotes the real number field and $p$ the dimension; $\mathcal{D}$ maps sampled data points in the $p$-dimensional data space $\mathcal{X}$ to the solution space $S$;
The solution space $S$ is defined as $S=\eta\otimes\tau$, the tensor product of a classification space $\eta$ and a target space $\tau$, where:
The classification space $\eta=\{\text{valid},\text{invalid}\}$ describes whether a sampled data point is a valid or an invalid data point. The data evaluation category $y$ of a sampled data point $x$ is obtained through the mapping $\mathcal{D}$, so $y$ is determined by $x$ and lies in $\eta$. The target space is $\tau=\mathbb{R}$, where $\mathbb{R}$ denotes the real number field;
Applying the data evaluation model to $D_{init}$ gives the evaluation result $d(x)$ of each data point, determined by the sampled data point $x$, its evaluation category $y$ and its target $t$: $d(x)\equiv(x,y,t)$;
The evaluation results of all sampled data points are collected as the evaluation data set $D_{expl}$.
Further, establishing the prediction evaluation model comprises:
An evaluation prediction model $\varepsilon$ is constructed from two independent kernel-model estimators, a support vector machine (SVM) and kernel ridge regression (KRR); it predicts the evaluation of data points in the data space that have not yet been evaluated;
The evaluation prediction model $\varepsilon$ is defined as a mapping from the tensor product of the evaluation data set $D_{expl}$ and the data space $\mathcal{X}$ to the tensor product of the solution space $S$ and the prediction-result probability space $\eta_p$: $\varepsilon: D_{expl}\otimes\mathcal{X}\rightarrow S\otimes\eta_p$.
Further, in the prediction evaluation model, the solution space $S$ contains the predicted evaluation category $\hat{y}\in\eta$ and the predicted target $\hat{t}\in\tau$; the SVM predicts the category $\hat{y}$, and KRR predicts the optimal target parameter $\hat{t}$.
Based on the kernel model, assume a mapping $\phi:\mathcal{X}\rightarrow F$ from the data space to a feature space $F$, whose inner product $\langle\phi(x),\phi(x')\rangle$ gives the Gaussian kernel $k$ of two sampled data points $x$ and $x'$ in $\mathcal{X}$;
The SVM and KRR use two feature-space mappings, $\phi_{SVM}:\mathcal{X}\rightarrow F_S$ and $\phi_{KRR}:\mathcal{X}\rightarrow F_R$: $\phi_{SVM}$ maps the data space to the classification feature space $F_S$, and $\phi_{KRR}$ maps it to the target-regression feature space $F_R$. The hyper-parameters of the Gaussian kernels $k_{SVM}(x,x')$ and $k_{KRR}(x,x')$ are $\gamma_{SVM}$ and $\gamma_{KRR}$, respectively;
The prediction-result probability space $\eta_p$ contains the probability $p_{valid}(x)$ that the predicted evaluation category of sampled data point $x$ is valid:
$$p_{valid}(x)=P(\hat{y}=\text{valid}\mid x)=\frac{1}{1+\exp\left(a f(x)+b\right)}$$
where $f(x)$ is the output of the prediction evaluation model and $a$ and $b$ are probability model parameters.
Further, iterating the adaptive optimizing function over the data space of interest to obtain the optimal data point comprises:
An adaptive optimizing function $U(u,w)$, composed of an optimizing vector $u$ and a weight vector $w$, is constructed to obtain the optimal data point $x_{new}$:
$$U(u,w)=\frac{w^{T}u}{\lVert w\rVert_1}$$
where $\lVert w\rVert_1$ denotes the $\ell_1$ norm of $w$ and the superscript $T$ denotes transposition; the optimal data point is then
$$x_{new}=\arg\max_{x\in\mathcal{X}}U\big(u(x),w\big)$$
Further, the optimizing vector consists of three components: $u=(U_S,U_o,U_r)^{T}$.
$U_S$ is the class-prediction term of the data point. Based on the valid-category probability $p_{valid}(x)$, the reliability of the predicted evaluation category is measured by the Shannon entropy $S$:
$$U_S=S\big(p_{valid}(x)\big)=-p_{valid}(x)\log p_{valid}(x)-\big(1-p_{valid}(x)\big)\log\big(1-p_{valid}(x)\big)$$
where $p_{valid}(x)$ is the predicted probability that $x$ belongs to the valid category;
$U_o$ is the target-prediction reliability, obtained from a judging mechanism that compares the predicted target $\hat{t}$ with the existing targets $t$. The maximum and minimum targets $t_{max}$ and $t_{min}$, obtained under the extremum conditions when evaluating the initial data set $D_{init}$, are used to rescale the predicted target $\hat{t}$ of the evaluation prediction model $\varepsilon$ and thus normalize it: $\tilde{t}=(\hat{t}-t_{min})/(t_{max}-t_{min})$;
$U_r$ is the classification feature-space distance, computed from the $\ell_2$ norm $\lVert\cdot\rVert_2$ through an exponential with base $e$ (the base of the natural logarithm), where $\gamma_C$ is a hyper-parameter;
The weight vector is defined as $w=(s,o,r)^{T}$; its components $s\ge0$, $o\ge0$ and $r\ge0$ set the influence of $U_S$, $U_o$ and $U_r$, respectively, on the data point search.
Further, with the adaptive optimizing function $U(u,w)$ constructed, the optimal data point is obtained as $x_{new}=\arg\max_{x\in\mathcal{X}}U\big(u(x),w\big)$. It is evaluated with the data evaluation model through the established mapping $\mathcal{D}$, and its evaluation result $d(x_{new})$ is returned to the high-quality data set $D_{expl}$.
Further, training an accurate and efficient traditional Chinese medicine production optimization decision model on the data in the high-quality data set and using it to control the production process accurately comprises:
The data in the high-quality data set are used to train a convolutional neural network model. After training, real-time traditional Chinese medicine production data are fed into the network, which computes and outputs the production data values at the next moment, i.e. the predicted production data. The predicted production data are then checked against a preset ideal production data range and a corresponding production decision is made: if they fall outside the range, the relevant process parameter values in the production process are adjusted according to how the ideal range is exceeded, so that the traditional Chinese medicine production process is accurately controlled.
Compared with the prior art, the invention has the following technical features:
The invention extracts effective data from the traditional Chinese medicine production process by data mining. To address the difficulty of determining a feasible exploration region in an unknown data space, a target term is introduced into the data mining model to steer the mining toward data regions of interest. An adaptive optimizing function is constructed on kernel models from machine learning, so that data in the subspace of interest can be predicted and evaluated after training on a relatively small data set, and optimal data points are obtained by sequential iterative mining. Finally, the series of mined data points forms a data set, and the database built from such data sets provides a sound basis for the subsequently established production optimization decision model, enabling accurate control and optimization of the traditional Chinese medicine production process.
Drawings
FIG. 1 is a flow diagram of an evaluation dataset creation process;
FIG. 2 is a graph showing the comparison of effects of an embodiment of the method of the present invention.
Detailed Description
The invention provides a process optimization method based on traditional Chinese medicine production data mining that evaluates and screens production data for validity by sequential iteration, addressing the difficulty of determining a feasible exploration region in a complex data space. The production data are first sampled with a data sampling function to obtain an initial data set, whose sampled points are evaluated for validity by a constructed data evaluation model to give an evaluation data set. A kernel model from machine learning is trained on this evaluation data set to establish a prediction evaluation model, on the basis of which an adaptive optimizing function is iterated over the data space of interest; the optimal data points obtained form a high-quality data set. Finally, an accurate and efficient traditional Chinese medicine production optimization decision model is trained on the high-quality data and used to control, and thereby optimize, the production process.
With reference to the accompanying drawings, the process optimization method based on traditional Chinese medicine production data mining comprises the following specific steps:
Step 1: sample the collected traditional Chinese medicine production data with a data sampling function to obtain an initial data set.
To obtain the initial data set $D_{init}$, the large volume of collected traditional Chinese medicine production data (process data) must be sampled. The production data form a multidimensional data space $\mathcal{X}$ of dimension $p$, which can be regarded as a data set $x$ composed of one-dimensional sub-data spaces $x_i$, $i=1,2,\dots,p$:
$$x=(x_1,x_2,\dots,x_p)$$
where $p$ is the dimension of the data space, i.e. the number of one-dimensional sub-data spaces.
In the invention, the production data are sampled by establishing a data sampling model $\mathcal{M}(s_1,\dots,s_p)$, where $s_i$ is the sampling step of the $i$-th one-dimensional sub-data space $x_i$ and $\mathcal{G}$ is the sampling grid. Within the $p$-dimensional data space $\mathcal{X}$, setting the steps $s_i$ of the sampling model determines the grid $\mathcal{G}$ and hence how many data points are sampled.
Sampling the production data (i.e. the data set $x$) with the data sampling model $\mathcal{M}$ yields $n$ data points, defined as the initial data set $D_{init}$:
$$D_{init}=(x_1,x_2,x_3,\dots,x_n)$$
where $x_n$ is the $n$-th sampled data point in the initial data set.
To illustrate the sampling mechanism more clearly, a specific example is given.
Assume the obtained production data form a two-dimensional data set $x'=(x'_1,x'_2)$ whose data space $\mathcal{X}'$ consists of the one-dimensional sub-data spaces $x'_1=[-3,3]$ and $x'_2=[-3,3]$:
$$\mathcal{X}'=x'_1\times x'_2=[-3,3]\times[-3,3]$$
A data sampling model $\mathcal{M}'$ is built in $\mathcal{X}'$ with the sampling step $s'_i$ of each sub-data space $x'_i$ set to 1. This constructs a $6\times6$ sampling grid $\mathcal{G}'$ (7 nodes per axis), which samples $7\times7=49$ data points in $\mathcal{X}'$.
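The grid sampling in this example can be reproduced with a short Python sketch. It assumes closed intervals and a uniform grid whose nodes are spaced by the step $s_i$ along each axis; the function name `grid_sample` is hypothetical and not taken from the patent.

```python
import itertools

def grid_sample(bounds, steps):
    """Return every node of a regular sampling grid over a box-shaped data space.

    bounds: (low, high) per one-dimensional sub-data space, treated as closed
    steps:  sampling step s_i per sub-data space
    """
    axes = []
    for (low, high), step in zip(bounds, steps):
        n_nodes = int(round((high - low) / step)) + 1  # nodes along this axis
        axes.append([low + k * step for k in range(n_nodes)])
    # The Cartesian product of the per-axis nodes gives the sampled points D_init
    return list(itertools.product(*axes))

d_init = grid_sample([(-3, 3), (-3, 3)], [1, 1])
print(len(d_init))  # 49
```

Halving the step to 0.5 would give 13 nodes per axis and 169 points, which shows how the step $s_i$ controls the sample size.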
Step 2: evaluate the validity of the sampled data points in the initial data set with the constructed data evaluation model to obtain an evaluation data set.
A data evaluation model is built to evaluate the sampled data points $x_1,x_2,x_3,\dots,x_n$ in the initial data set $D_{init}$; the evaluated data set is denoted $D_{expl}$.
The data evaluation model is constructed as follows:
Define a mapping $\mathcal{D}:\mathcal{X}\subseteq\mathbb{R}^p\rightarrow S$, where $\mathbb{R}$ denotes the real number field and $p$ the dimension; $\mathcal{D}$ maps each sampled data point $x$ in the $p$-dimensional data space $\mathcal{X}$ to the solution space $S$.
The solution space $S$ is defined as $S=\eta\otimes\tau$, i.e. the tensor product of a classification space $\eta$ and a target space $\tau$, where:
The classification space $\eta$ describes whether a sampled data point is a valid or an invalid data point and is defined as:
$$\eta=\{\text{valid},\text{invalid}\}$$
The classification space $\eta$ is the set formed by the valid and invalid categories. The data evaluation category $y$ of a sampled data point $x$ is obtained through the mapping $\mathcal{D}$, so it can be written as the identity
$$y\equiv y(x)\in\eta$$
where $\equiv$ denotes identity and $\eta$ is the classification space.
The target space $\tau$ is defined as $\tau=\mathbb{R}$, where $\mathbb{R}$ denotes the real number field.
Each sampled data point has its own target $t\in\tau$. The value of $t$ reflects how effective the data point is, and one precondition for obtaining the target range of valid data points is to find the extreme (maximum and minimum) values of $t$ under the set extremum conditions.
Applying the data evaluation model to $D_{init}$ gives the evaluation result $d(x)$ of each data point, determined by the sampled data point $x$, its evaluation category $y$ and its target $t$, expressed as the identity:
$$d(x)\equiv(x,y,t)$$
The evaluation results of all sampled data points are collected into the evaluation data set $D_{expl}$:
$$D_{expl}=\{d(x_1),d(x_2),d(x_3),\dots,d(x_n)\}$$
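The record $d(x)=(x,y,t)$ and the collection $D_{expl}$ map naturally onto a small data structure. The Python sketch below is illustrative only: the patent does not specify how a point is actually evaluated, so `toy_evaluate` is a hypothetical stand-in for the data evaluation model, and the names `Evaluation` and `build_d_expl` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass(frozen=True)
class Evaluation:
    """One entry d(x) = (x, y, t) of the evaluation data set D_expl."""
    x: Tuple[float, ...]  # sampled data point
    y: str                # evaluation category in eta: "valid" or "invalid"
    t: float              # target value reflecting how effective x is

def build_d_expl(d_init: Sequence[Tuple[float, ...]],
                 evaluate: Callable[[Tuple[float, ...]], Tuple[str, float]]
                 ) -> List[Evaluation]:
    """Apply the data evaluation mapping to every point of D_init."""
    d_expl = []
    for x in d_init:
        y, t = evaluate(x)
        if y not in ("valid", "invalid"):
            raise ValueError(f"category must lie in eta, got {y!r}")
        d_expl.append(Evaluation(tuple(x), y, t))
    return d_expl

# Hypothetical evaluator for illustration only: points inside the unit disc
# are "valid", and the target decreases with squared distance from the origin.
def toy_evaluate(x):
    r2 = sum(v * v for v in x)
    return ("valid" if r2 <= 1.0 else "invalid", -r2)

d_expl = build_d_expl([(0.0, 0.0), (2.0, 0.0)], toy_evaluate)
```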
Step 3: introduce a kernel model from machine learning and train it on the evaluation data set to establish a prediction evaluation model; based on this model, iterate the adaptive optimizing function over the data space of interest to obtain optimal data points, and assemble the optimal data points into a high-quality data set.
Step 3.1: train on the evaluation data set $D_{expl}$ to construct the evaluation prediction model $\varepsilon$.
The evaluation prediction model $\varepsilon$ is built from two independent kernel-model estimators, a support vector machine (SVM) and kernel ridge regression (KRR), and predicts the evaluation of data points in the data space that have not yet been evaluated.
$\varepsilon$ is defined as a mapping from the tensor product of the evaluation data set $D_{expl}$ and the data space $\mathcal{X}$ to the tensor product of the solution space $S$ and the prediction-result probability space $\eta_p$:
$$\varepsilon: D_{expl}\otimes\mathcal{X}\rightarrow S\otimes\eta_p$$
The mechanism of this model is explained as follows:
The solution space $S$ contains the predicted evaluation category $\hat{y}\in\eta$ ($\eta$ being the classification space) and the predicted target $\hat{t}\in\tau$ ($\tau$ being the target space). The SVM predicts the category $\hat{y}$, and KRR predicts the optimal target parameter $\hat{t}$.
Based on the kernel model, assume a mapping $\phi:\mathcal{X}\rightarrow F$ from the data space to a feature space $F$, whose inner product $\langle\phi(x),\phi(x')\rangle$ gives the Gaussian kernel $k$ of two sampled data points $x$ and $x'$ in $\mathcal{X}$:
$$\langle\phi(x),\phi(x')\rangle=k(x,x')=\exp\left(-\gamma\lVert x-x'\rVert_2^2\right)$$
where $\gamma$ is the hyper-parameter of the feature-space metric.
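A Gaussian kernel evaluates the feature-space inner product without constructing $\phi$ explicitly. The sketch below assumes the standard RBF form $k(x,x')=\exp(-\gamma\lVert x-x'\rVert_2^2)$; the function name and the two $\gamma$ values are illustrative only.

```python
import math

def gaussian_kernel(x, x_prime, gamma):
    """k(x, x') = exp(-gamma * ||x - x'||_2^2), the RBF form assumed here."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)

# Separate hyper-parameters for the two estimators, as with k_SVM and k_KRR
gamma_svm, gamma_krr = 0.5, 2.0  # illustrative values only
print(gaussian_kernel((0, 0), (1, 1), gamma_svm))  # exp(-1), about 0.3679
```

A larger $\gamma$ makes the kernel decay faster with distance, so $k_{SVM}$ and $k_{KRR}$ can resolve structure at different length scales, which is one reason to keep $\gamma_{SVM}$ and $\gamma_{KRR}$ separate.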
Because the SVM and KRR do not necessarily share a feature space, the derivation above yields two feature-space mappings:
$$\phi_{SVM}:\mathcal{X}\rightarrow F_S,\qquad \phi_{KRR}:\mathcal{X}\rightarrow F_R$$
$\phi_{SVM}$ maps the data space $\mathcal{X}$ to the classification feature space $F_S$, and $\phi_{KRR}$ maps it to the target-regression feature space $F_R$. The hyper-parameters of the Gaussian kernels $k_{SVM}(x,x')$ and $k_{KRR}(x,x')$ are $\gamma_{SVM}$ and $\gamma_{KRR}$, respectively.
The prediction-result probability space $\eta_p$ contains the probability $p_{valid}(x)$ that data point $x$ is predicted to belong to the valid category:
$$p_{valid}(x)=P(\hat{y}=\text{valid}\mid x)=\frac{1}{1+\exp\left(a f(x)+b\right)}$$
where $f(x)$ is the output of the prediction evaluation model and $a$ and $b$ are probability model parameters.
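A sigmoid of the form $1/(1+\exp(a f(x)+b))$ over a classifier's decision value $f(x)$ is Platt scaling. The text does not say how $a$ and $b$ are fitted (Platt's method fits them by maximum likelihood on held-out decision values), so this sketch simply takes them as inputs; the function name `valid_probability` is hypothetical.

```python
import math

def valid_probability(f_x, a, b):
    """Platt-style sigmoid: P(y_hat = valid | x) = 1 / (1 + exp(a * f(x) + b))."""
    z = a * f_x + b
    # Two algebraically equal forms, chosen so exp() never overflows
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))

# With a < 0, a larger decision value f(x) means a higher "valid" probability
print(valid_probability(0.0, -1.0, 0.0))  # 0.5
```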
In short, based on the previously evaluated data set $D_{expl}$, the evaluation prediction model $\varepsilon$ predicts the evaluation of data points in the data space of interest together with the associated probabilities.
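The KRR half of $\varepsilon$, which predicts the target $\hat{t}$ of unevaluated points from the targets already stored in $D_{expl}$, can be written in a few lines of NumPy. This is a minimal sketch under stated assumptions: the ridge penalty `lam` is not given in the text and is assumed here, and the class name `SimpleKRR` is hypothetical. For the SVM half, scikit-learn's `sklearn.svm.SVC(kernel="rbf", gamma=..., probability=True)` exposes Platt-scaled class probabilities through `predict_proba`, and `sklearn.kernel_ridge.KernelRidge(kernel="rbf", gamma=..., alpha=...)` is a library alternative to the class below.

```python
import numpy as np

def rbf_gram(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

class SimpleKRR:
    """Minimal kernel ridge regression for the target estimate t_hat."""

    def __init__(self, gamma, lam=1e-3):
        self.gamma = gamma  # plays the role of gamma_KRR
        self.lam = lam      # ridge penalty (assumed; not given in the text)

    def fit(self, X, t):
        self.X = np.asarray(X, dtype=float)
        K = rbf_gram(self.X, self.X, self.gamma)
        # Dual coefficients: alpha = (K + lam * I)^-1 t
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(self.X)),
                                     np.asarray(t, dtype=float))
        return self

    def predict(self, X_new):
        K_new = rbf_gram(np.asarray(X_new, dtype=float), self.X, self.gamma)
        return K_new @ self.alpha

# Evaluated points and their targets, as would be read from D_expl
X = [[0.0], [1.0], [2.0], [3.0]]
t = [0.0, 1.0, 4.0, 9.0]
model = SimpleKRR(gamma=1.0).fit(X, t)
t_hat = model.predict([[1.5]])  # predicted target between explored points
```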
Step 3.2: iterate the adaptive optimizing function over the data space of interest to obtain the optimal data point $x_{new}$.
An adaptive optimizing function $U(u,w)$ is constructed to obtain the optimal data point $x_{new}$. It consists of an optimizing vector $u$ and a weight vector $w$:
$$U(u,w)=\frac{w^{T}u}{\lVert w\rVert_1}$$
where $\lVert w\rVert_1$ denotes the $\ell_1$ norm of $w$ and the superscript $T$ denotes transposition. The optimal data point $x_{new}$ is then
$$x_{new}=\arg\max_{x\in\mathcal{X}}U\big(u(x),w\big)$$
The mechanism of this model is explained as follows:
The optimizing vector $u$ consists of three components:
$$u=(U_S,U_o,U_r)^{T}$$
$U_S$ is the class-prediction term of the data point. Based on the valid-category probability $p_{valid}(x)$, the reliability of the predicted evaluation category is measured by the Shannon entropy $S$:
$$U_S=S\big(p_{valid}(x)\big)=-p_{valid}(x)\log p_{valid}(x)-\big(1-p_{valid}(x)\big)\log\big(1-p_{valid}(x)\big)$$
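The entropy term is largest when the classifier is least certain whether a point is valid, i.e. when the predicted probability of the valid category is 0.5. A minimal sketch, assuming the natural logarithm (the text does not state the base); the function name `class_entropy` is hypothetical.

```python
import math

def class_entropy(p_valid):
    """Binary Shannon entropy of the predicted 'valid' probability (natural log)."""
    if p_valid <= 0.0 or p_valid >= 1.0:
        return 0.0  # a certain prediction carries no uncertainty
    q = 1.0 - p_valid
    return -(p_valid * math.log(p_valid) + q * math.log(q))

print(class_entropy(0.5))  # ln 2, about 0.6931, the maximum
```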
$U_o$ is the target-prediction reliability, obtained from a judging mechanism that compares the predicted target $\hat{t}$ with the existing targets $t$. The maximum and minimum targets $t_{max}$ and $t_{min}$ obtained under the extremum conditions are used to rescale the predicted target $\hat{t}$ of the evaluation prediction model $\varepsilon$ and thus normalize it:
$$\tilde{t}=\frac{\hat{t}-t_{min}}{t_{max}-t_{min}}$$
The maximum and minimum targets $t_{max}$ and $t_{min}$ are obtained when evaluating the initial data set $D_{init}$ in step 2, subject to the constraint that the data evaluation category is valid (s.t. $y=\text{valid}$):
$$t_{max}=\max_{x\in D_{init},\;y(x)=\text{valid}}t(x),\qquad t_{min}=\min_{x\in D_{init},\;y(x)=\text{valid}}t(x)$$
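The constrained extremes and the min-max normalization of $\hat{t}$ can be sketched as follows. Evaluations are represented as `(x, y, t)` tuples; the function names and the handling of a degenerate range ($t_{max}=t_{min}$) are assumptions made for illustration.

```python
def target_bounds(d_init_evals):
    """t_min and t_max over points evaluated as valid (s.t. y = valid)."""
    valid_targets = [t for (_x, y, t) in d_init_evals if y == "valid"]
    if not valid_targets:
        raise ValueError("no valid points in D_init")
    return min(valid_targets), max(valid_targets)

def normalize_target(t_hat, t_min, t_max):
    """Min-max normalization of a predicted target t_hat."""
    if t_max == t_min:
        return 0.0  # degenerate range; convention assumed here
    return (t_hat - t_min) / (t_max - t_min)

evals = [((0, 0), "valid", 2.0), ((1, 0), "valid", 6.0), ((2, 0), "invalid", 99.0)]
t_min, t_max = target_bounds(evals)        # the invalid point's 99.0 is ignored
print(normalize_target(5.0, t_min, t_max))  # 0.75
```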
$U_r$ is the classification feature-space distance, computed from the $\ell_2$ norm $\lVert\cdot\rVert_2$ through an exponential with base $e$ (the base of the natural logarithm), where $\gamma_C$ is a hyper-parameter of the kernel estimator.
By definition, the more similar a sampled data point $x$ is to its nearest neighbour in $D_{expl}$, the smaller $U_r$; $U_r$ therefore steers the search toward new data points $x_{new}$ in unexplored parts of the space $\mathcal{X}$.
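The exact expression for $U_r$ is not recoverable from the text, which gives only its ingredients ($\ell_2$ norm, base $e$, $\gamma_C$) and its behaviour (smaller when $x$ is close to an explored point). The sketch below assumes one form with those properties, $U_r=1-\exp(-\gamma_C\min_{x'\in D_{expl}}\lVert x-x'\rVert_2^2)$; both the formula and the name `distance_term` are assumptions, not the patent's definition.

```python
import math

def distance_term(x, explored_points, gamma_c):
    """ASSUMED form: U_r = 1 - exp(-gamma_c * min_x' ||x - x'||_2^2).

    Equals 0 at an already-explored point and tends to 1 far from D_expl.
    """
    nearest_sq = min(sum((a - b) ** 2 for a, b in zip(x, xp))
                     for xp in explored_points)
    return 1.0 - math.exp(-gamma_c * nearest_sq)
```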
The weight vector is defined as:
$$w=(s,o,r)^{T}$$
Its components $s\ge0$, $o\ge0$ and $r\ge0$ set the influence of $U_S$, $U_o$ and $U_r$, respectively, on the data point search; that is, $w$ determines how exploratory the data mining is.
In summary, the adaptive optimizing function $U$ is built from the optimizing vector $u$ and the weight vector $w$. The three components of $u$, each with a different meaning, are weighted by $s$, $o$ and $r$, so the degree of data mining exploration can be controlled by adjusting these weights.
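Scoring with $U(u,w)=w^{T}u/\lVert w\rVert_1$ and picking the best point can be sketched as below. Two assumptions are made: "optimal" is read as the maximum of $U$, and the search runs over a finite set of candidate points (for example, grid nodes not yet evaluated) rather than the continuous space. The function names are hypothetical.

```python
def adaptive_score(u, w):
    """U(u, w) = w^T u / ||w||_1 with non-negative weights w = (s, o, r)."""
    if any(wi < 0 for wi in w):
        raise ValueError("weights s, o, r must be non-negative")
    l1 = sum(abs(wi) for wi in w)
    if l1 == 0:
        raise ValueError("at least one weight must be positive")
    return sum(wi * ui for wi, ui in zip(w, u)) / l1

def select_x_new(candidates, u_of, w):
    """x_new = argmax over candidate points of U(u(x), w) (maximization assumed)."""
    return max(candidates, key=lambda x: adaptive_score(u_of(x), w))

# Precomputed u = (U_S, U_o, U_r) for two candidate points
u_table = {(0, 0): (0.1, 0.9, 0.0), (1, 1): (0.6, 0.2, 0.8)}
print(select_x_new(u_table, u_table.get, (1, 1, 1)))  # (1, 1)
print(select_x_new(u_table, u_table.get, (0, 1, 0)))  # (0, 0)
```

Shifting all weight to $o$ picks the point with the best normalized target, while balanced weights favour the uncertain, unexplored point, which is how $w$ tunes exploration.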
Step 3.3: evaluate the optimal data point $x_{new}$ to obtain the high-quality data set.
The optimal data point obtained with the adaptive optimizing function $U(u,w)$ is
$$x_{new}=\arg\max_{x\in\mathcal{X}}U\big(u(x),w\big)$$
Using the mapping $\mathcal{D}$ established in step 2, i.e. the data evaluation model, $x_{new}$ is evaluated and its evaluation result $d(x_{new})$ is returned to the high-quality data set $D_{expl}$.
Step 3.4: each pass through steps 3.1 to 3.3 yields one optimal data point $x_{new}$, and $d(x_{new})$ is returned to $D_{expl}$, so $D_{expl}$ is updated with the results collected over successive iterations. When the total number of evaluated data points reaches the preset value $N$, the iteration ends and data mining is complete, giving the high-quality data set $D_{expl}$.
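The control flow of steps 3.1 to 3.4 can be sketched as a loop that stops once $D_{expl}$ holds $N$ evaluated points. This shows only the structure: `propose` stands for refitting the prediction model and maximizing $U$, `evaluate` for the data evaluation mapping, and both, along with the name `mine`, are hypothetical placeholders.

```python
def mine(d_expl, candidates, evaluate, propose, n_total):
    """Sequential mining loop for steps 3.1-3.4 (structure only).

    d_expl:     list of (x, y, t) tuples already evaluated
    candidates: set of not-yet-evaluated points
    evaluate:   data evaluation mapping, x -> (y, t)
    propose:    refits the prediction model on d_expl and returns x_new
    n_total:    preset total number N of evaluated points
    """
    d_expl = list(d_expl)
    candidates = set(candidates)
    while len(d_expl) < n_total and candidates:
        x_new = propose(d_expl, candidates)  # steps 3.1 and 3.2
        candidates.discard(x_new)
        y, t = evaluate(x_new)               # step 3.3: compute d(x_new)
        d_expl.append((x_new, y, t))         # return d(x_new) to D_expl
    return d_expl

# Trivial stand-ins: always propose the smallest remaining candidate
result = mine([((0,), "valid", 0.0)], {(1,), (2,), (3,)},
              lambda x: ("valid", float(sum(x))),
              lambda d, c: min(c), n_total=3)
```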
Step 4: train an accurate and efficient traditional Chinese medicine production optimization decision model on the data in the high-quality data set, and use it to control the traditional Chinese medicine production process accurately, thereby optimizing it.
The data in the high-quality data set D_expl are the mined, valid production data, and they are used to train a convolutional neural network model. After training, real-time traditional Chinese medicine production data are fed into the network, which outputs the production data values for the next moment, i.e., the predicted production data. The predicted production data are then checked against a preset ideal production data range (interval judgment), and a corresponding production decision is made: if a value falls outside its ideal range, the corresponding process parameter values in the production process are adjusted according to which ideal range is exceeded. This achieves precise control of the traditional Chinese medicine production process and, ultimately, optimization of the production process.
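The interval judgment can be sketched as follows, assuming each predicted quantity has an ideal interval [low, high]. The function name `interval_check` and the dictionary layout are hypothetical. How a deviation translates into a specific process-parameter adjustment is process knowledge the text does not specify, so the sketch only reports which quantities leave their range and by how much.

```python
from typing import Dict, Tuple

def interval_check(predicted: Dict[str, float],
                   ideal: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return the signed amount by which each predicted value leaves its ideal range.

    Quantities inside their range are omitted. A negative value means the
    prediction is below the lower bound; a positive value means it is above
    the upper bound.
    """
    deviations = {}
    for name, value in predicted.items():
        low, high = ideal[name]
        if value < low:
            deviations[name] = value - low
        elif value > high:
            deviations[name] = value - high
    return deviations
```

An empty result means every predicted value lies inside its ideal interval and no adjustment decision is needed.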
Examples:
Taking the alcohol precipitation step, which is common in traditional Chinese medicine production, as an example, the alcohol adding flow rate, alcohol concentration, alcohol temperature and volume of the alcohol precipitation supernatant are used as the key process parameters. The data mining model for the traditional Chinese medicine production process is applied to explore these process parameters, a production decision method is established, and the alcohol precipitation process is thereby precisely controlled and optimized. The implementation of the invention is described with reference to FIG. 1.
1) First, data points for the key parameters of the alcohol precipitation process are input. The data space is four-dimensional, consisting of the alcohol adding flow rate $F_V \in [a, b]$, the alcohol concentration $C_E \in [c, d]$, the alcohol temperature $T_E \in [e, f]$ and the alcohol precipitation supernatant volume $V_{SL} \in [g, h]$.
Using the parameter vector $x = (F_V, C_E, T_E, V_{SL})^{T}$, the data space can be written as $\mathcal{X} = [a, b] \times [c, d] \times [e, f] \times [g, h]$.
Using the data sampling function established in step 1, a grid sampling window is set up in a parameter subspace, and the input data points in the data space are sampled to obtain the initial data set $D_{init} = D(x_1, x_2, x_3, \ldots, x_n)$.
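A minimal sketch of grid sampling, assuming the sampling window is an axis-aligned box inside the four-dimensional data space and that points are spaced evenly along each axis. The bounds used below are placeholders, since the document leaves a, b, ..., h unspecified, and `grid_sample` is a hypothetical name.

```python
import itertools
from typing import List, Sequence, Tuple

def grid_sample(window: Sequence[Tuple[float, float]],
                points_per_axis: Sequence[int]) -> List[Tuple[float, ...]]:
    """Regular grid over a box-shaped sampling window (one (low, high) pair per parameter)."""
    axes = []
    for (low, high), n in zip(window, points_per_axis):
        if n < 1:
            raise ValueError("need at least one point per axis")
        if n == 1:
            axes.append([(low + high) / 2.0])  # single point: window centre
        else:
            step = (high - low) / (n - 1)
            axes.append([low + i * step for i in range(n)])
    return list(itertools.product(*axes))

# Placeholder window for (F_V, C_E, T_E, V_SL); not values from the patent.
window = [(1.0, 3.0), (0.6, 0.9), (20.0, 30.0), (100.0, 200.0)]
d_init_points = grid_sample(window, [3, 4, 3, 2])
print(len(d_init_points))  # 72
```

The grid size grows as the product of the points per axis, so a window restricted to a parameter subspace keeps the initial data set small.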
In addition, to better support data mining of the alcohol precipitation process parameters, the ratio of concentrated solution to ethanol in the fully mixed alcohol precipitation system is taken as the target t, namely:
where m_0 is the amount of concentrated solution used in alcohol precipitation, m_1 is the amount of ethanol, m_2 is the total mass of the supernatant obtained, S_0 is the total solids content of the concentrated solution, S_2 is the total solids content of the supernatant, and H is the retention rate of the index component.
2) Data evaluation
The data points in the initial data set D_init of the alcohol precipitation process are evaluated with the mapping of the data evaluation model from step 2, giving the evaluation result d(x) for each data point.
The mapping is expressed in terms of the data evaluation category y and the target t as follows:
The optimal data point is the one that maximizes the target t, and the evaluated data set D_expl is obtained.
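Selecting the optimal evaluated point can be sketched as below, assuming each evaluation result d(x) is stored as the triple (x, y, t) with y recorded as a boolean (valid or invalid), and assuming that only points evaluated as valid are candidates for the optimum. `Record` and `best_point` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

Record = Tuple[Tuple[float, ...], bool, float]  # d(x) = (x, y, t); y True means "valid"

def best_point(d_expl: Sequence[Record]) -> Optional[Record]:
    """Return the valid record with the largest target t, or None if no record is valid."""
    valid = [rec for rec in d_expl if rec[1]]
    return max(valid, key=lambda rec: rec[2]) if valid else None
```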
3) Optimization search
Following step 3, an evaluation prediction model ε is constructed from two independent kernel model estimators, a support vector machine (SVM) and kernel ridge regression (KRR). Based on the predictions of ε, the optimal data point x_new of the alcohol precipitation process is obtained through the constructed adaptive optimization function U(u, w).
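The regression half of ε can be sketched in NumPy as kernel ridge regression with a Gaussian kernel whose width is set by γ_KRR. This is an illustrative sketch, not the patented model: the regularization strength `lam` and the class name `GaussianKRR` are assumptions, and the SVM classifier that predicts the evaluation category is omitted (a library implementation such as scikit-learn's `SVC` with an RBF kernel could fill that role).

```python
import numpy as np

def gaussian_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of A and B."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off

class GaussianKRR:
    """Kernel ridge regression: alpha = (K + lam * I)^-1 t, prediction = k(x, X) @ alpha."""

    def __init__(self, gamma: float = 1.0, lam: float = 1e-3):
        self.gamma, self.lam = gamma, lam

    def fit(self, X, t):
        self.X = np.asarray(X, dtype=float)
        K = gaussian_kernel(self.X, self.X, self.gamma)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(self.X)),
                                     np.asarray(t, dtype=float))
        return self

    def predict(self, X_query):
        Kq = gaussian_kernel(np.asarray(X_query, dtype=float), self.X, self.gamma)
        return Kq @ self.alpha
```

Solving the regularized linear system directly is fine for the small evaluated data sets typical of this loop; larger sets would call for a library implementation.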
4) Evaluation of x_new
Using the mapping established in step 2, x_new is evaluated with the data evaluation model and its evaluation result d(x_new) is returned.
5) Iterative computation
$D_{expl} \cup d(x_{new}) \rightarrow D_{expl}$
that is, D_expl is updated.
6) When n = N, data mining ends: D_expl is output and the data mining library is updated.
7) Establishing an alcohol precipitation process optimization decision model based on data mining
The mined valid data of the alcohol precipitation process are used to train the convolutional neural network model. After training, the model computes the production data value of the alcohol precipitation process at the next moment, i.e., the predicted production data. The predicted production data are then checked against a preset ideal production data range, and a corresponding alcohol precipitation production decision is made. If a value falls outside its ideal range, the relevant process parameters (alcohol adding flow rate, alcohol concentration, alcohol temperature and alcohol precipitation supernatant volume) are adjusted in the alcohol precipitation process according to which ideal range is exceeded. This achieves precise control and optimization of the alcohol precipitation process of traditional Chinese medicine.
To verify the effectiveness of the process optimization method based on traditional Chinese medicine production data mining, an optimization decision model for the alcohol precipitation process was built both with this method and with a traditional method (i.e., using the original production data of the alcohol precipitation process), and the two were compared. The experimental results are shown in FIG. 2: once the models converge stably, the optimization decision accuracy of this method is about 98%, versus about 83% for the traditional method. The proposed method therefore outperforms the traditional method in both model convergence and optimization decision accuracy, and the experimental results show that the process optimization method based on traditional Chinese medicine production data mining is accurate and efficient.
Claims (7)
1. A process optimization method based on traditional Chinese medicine production data mining, characterized by comprising the following steps:
first, sampling the collected traditional Chinese medicine production data with a data sampling function to obtain an initial data set, comprising:
inputting data points for the key parameters of the alcohol precipitation process, wherein the data space is a four-dimensional data space consisting of the alcohol adding flow rate $F_V \in [a, b]$, the alcohol concentration $C_E \in [c, d]$, the alcohol temperature $T_E \in [e, f]$ and the alcohol precipitation supernatant volume $V_{SL} \in [g, h]$;
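using the parameter vector $x = (F_V, C_E, T_E, V_{SL})^{T}$, the data space is written as $\mathcal{X} = [a, b] \times [c, d] \times [e, f] \times [g, h]$;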
using the data sampling function, setting up a grid sampling window in a parameter subspace and sampling the input data points in the data space to obtain the initial data set $D_{init} = D(x_1, x_2, x_3, \ldots, x_n)$;
to better support data mining exploration of the alcohol precipitation process parameters, taking the ratio of concentrated solution to ethanol in the fully mixed alcohol precipitation system as the target t, namely:
where m_0 is the amount of concentrated solution used in alcohol precipitation, m_1 is the amount of ethanol, m_2 is the total mass of the supernatant obtained, S_0 is the total solids content of the concentrated solution, S_2 is the total solids content of the supernatant, and H is the retention rate of the index component;
second, performing a validity evaluation of the sampled data points in the initial data set with the constructed data evaluation model to obtain an evaluation data set, and building an evaluation prediction model by training a machine learning kernel model on the evaluation data set, the evaluation prediction model being built from two independent kernel model estimators, a support vector machine and kernel ridge regression;
based on the predictions of the evaluation prediction model, obtaining the optimal data point x_new of the alcohol precipitation process through the adaptive optimization function U(u, w) constructed from the optimization vector u and the weight vector w, and collecting the resulting series of optimal data points into a high-quality data set;
finally, training an alcohol precipitation process optimization decision model, a convolutional neural network model, on the data in the high-quality data set; after training, the convolutional neural network model computes the production data value of the alcohol precipitation process at the next moment, i.e., the predicted production data; the predicted production data are checked against a preset ideal production data range and a corresponding alcohol precipitation production decision is made; if the ideal range is exceeded, the relevant process parameter values in the alcohol precipitation process of traditional Chinese medicine are adjusted according to which ideal range is exceeded, so that the alcohol precipitation process of traditional Chinese medicine is precisely controlled and optimized.
2. The method according to claim 1, wherein performing the validity evaluation of the sampled data points in the initial data set with the constructed data evaluation model to obtain an evaluation data set comprises:
defining a mapping:
where ℝ denotes the real number field and p the dimension, so that the mapping sends sampled data points in the p-dimensional data space ℝ^p to the solution space S;
the solution space S is defined as $S = \eta \otimes \tau$;
the solution space S is the tensor product of two parts, a classification space η and a target space τ, where:
the classification space η describes whether a sampled data point is valid or invalid and is represented by the set formed by valid data and invalid data; the data evaluation category is y, which is obtained from the sampled data point x through the mapping; and the target space τ is defined over the real number field;
mapping D_init through the data evaluation model to obtain the data point evaluation result d(x), which is determined by the sampled data point x, its evaluation category y and the target t, i.e., $d(x) \equiv (x, y, t)$;
collecting the evaluation results of all sampled data points and defining them as the evaluation data set D_expl.
3. The process optimization method based on traditional Chinese medicine production data mining according to claim 2, wherein establishing the evaluation prediction model comprises:
constructing an evaluation prediction model ε from two independent kernel model estimators, a support vector machine (SVM) and kernel ridge regression (KRR), the evaluation prediction model being used to predict the evaluation of data points in the data space that have not yet been evaluated;
defining the evaluation prediction model ε as a mapping from the tensor product of the evaluation data set D_expl and the data space to the tensor product of the solution space S and the prediction result probability space η_p, expressed as:
4. The method according to claim 3, wherein the solution space S comprises the predicted evaluation category and the predicted target of the evaluation prediction model, the SVM being used to predict the evaluation category and the KRR being used to predict the optimal target parameter;
based on the kernel model, assuming a mapping φ from the data space to a feature space F:
where the inner product gives the Gaussian kernel k of two sampled data points x and x' in the data space, i.e., $k(x, x') = \langle \phi(x), \phi(x') \rangle$;
the mappings to the two feature spaces, for the SVM and the KRR respectively, are:
And:
φ_SVM maps the data space to the classification feature space F_S and φ_KRR maps the data space to the target regression feature space F_R; the hyper-parameters of the Gaussian kernels k_SVM(x, x') and k_KRR(x, x') are γ_SVM and γ_KRR, respectively;
the prediction result probability space η_p contains the probability that the sampled data point x is predicted to belong to the valid evaluation category, given by:
In the above formula, f(x) is the prediction output by the evaluation prediction model, and a and b are parameters of the probability model.
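The probability formula itself is not reproduced in this text. A common two-parameter choice applied to a classifier output f(x) is Platt's sigmoid, $P(\text{valid} \mid x) = 1 / (1 + \exp(a f(x) + b))$; the sketch below assumes that form, with a typically negative so that larger decision values give higher probabilities. Fitting a and b is not shown, and `platt_probability` is a hypothetical name.

```python
import math

def platt_probability(f: float, a: float, b: float) -> float:
    """Assumed Platt sigmoid 1 / (1 + exp(a * f + b)), computed without overflow."""
    z = a * f + b
    if z >= 0:
        ez = math.exp(-z)          # at most 1, so this cannot overflow
        return ez / (1.0 + ez)     # algebraically equal to 1 / (1 + e^z)
    return 1.0 / (1.0 + math.exp(z))
```

Splitting on the sign of z keeps the exponent argument non-positive, so extreme decision values return 0.0 or 1.0 instead of raising an overflow error.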
5. The process optimization method based on traditional Chinese medicine production data mining according to claim 1, wherein performing iterative operations in the data space of interest with the adaptive optimization function to obtain optimal data points comprises:
constructing an adaptive optimization function U(u, w) to obtain the optimal data point x_new, wherein the adaptive optimization function U(u, w) consists of the optimization vector u and the weight vector w, as follows:
where ‖w‖_1 denotes the l1 norm of w and the superscript T denotes the transpose;
that is, the optimal data point x_new is expressed as:
6. The process optimization method based on traditional Chinese medicine production data mining according to claim 1, wherein the optimization vector u consists of three parts, defined as:
where U_S is the data point category prediction term: based on the valid predicted evaluation category probability, the reliability of the predicted evaluation category of a data point is represented by the Shannon information entropy S, expressed as:
where the probability in this expression is the valid predicted evaluation category probability;
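A sketch of the entropy term for a two-class (valid/invalid) prediction, assuming S is the binary Shannon entropy of the valid-class probability p, $S = -p \log_2 p - (1 - p)\log_2(1 - p)$. The logarithm base is an assumption (base 2 gives a maximum of 1 at p = 0.5), and `binary_entropy` is a hypothetical name. Under this assumption, points whose category prediction is least certain score highest.

```python
import math

def binary_entropy(p: float, base: float = 2.0) -> float:
    """Shannon entropy of a valid/invalid prediction with valid-class probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # 0 * log 0 is taken as 0
            h -= q * math.log(q, base)
    return h
```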
U_o is the target prediction reliability: a criterion for judging the reliability of the target prediction is established from the predicted target and the existing target t, in the following form:
In the above equation, the maximum and minimum targets t_max and t_min obtained when evaluating the initial data set D_init serve as extremum conditions, and the predicted target produced by the evaluation prediction model ε is rescaled by them so that it is normalized.
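Assuming the normalization is the usual min-max rescaling by the extremes of the initial data set, $\hat{t}_{norm} = (\hat{t} - t_{min}) / (t_{max} - t_{min})$, it can be sketched as below. The exact formula is not reproduced in this text, so treat this as an illustration; predictions outside the initial range map outside [0, 1] because no clipping is applied. `normalize_targets` is a hypothetical name.

```python
from typing import Iterable, List, Sequence

def normalize_targets(predicted: Iterable[float],
                      initial_targets: Sequence[float]) -> List[float]:
    """Min-max normalize predicted targets using t_min and t_max observed in D_init."""
    t_min, t_max = min(initial_targets), max(initial_targets)
    span = t_max - t_min
    if span == 0:
        raise ValueError("D_init targets must not all be equal")
    return [(t - t_min) / span for t in predicted]
```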
U_r is the classification feature space distance, expressed as:
where ‖·‖_2 denotes the l2 norm, e is the base of the natural logarithm, and γ_C is a hyper-parameter;
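The U_r formula is not reproduced in this text either. One form consistent with the description (an l2 distance, the constant e, a hyper-parameter γ_C, and a value that shrinks as the nearest evaluated neighbor gets closer) is $U_r(x) = 1 - \exp(-\gamma_C \min_{x' \in D_{expl}} \lVert x - x' \rVert_2^2)$. The sketch below assumes that form and, for simplicity, measures distance in the input space rather than in the classification feature space. `novelty` is a hypothetical name.

```python
import math
from typing import Sequence

def novelty(x: Sequence[float], evaluated: Sequence[Sequence[float]],
            gamma_c: float) -> float:
    """Assumed U_r: 1 - exp(-gamma_c * squared l2 distance to the nearest evaluated point)."""
    if not evaluated:
        return 1.0  # nothing evaluated yet: treat every point as fully novel
    d2 = min(sum((a - b) ** 2 for a, b in zip(x, xp)) for xp in evaluated)
    return 1.0 - math.exp(-gamma_c * d2)
```

A point that coincides with an evaluated one scores 0, and the score approaches 1 as the nearest evaluated neighbor moves away, which matches the behavior described for U_r.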
the weight vector w is defined as $w = (s, o, r)^{T}$;
the components s ≥ 0, o ≥ 0 and r ≥ 0 of the weight vector determine the degree of influence of U_S, U_o and U_r, respectively, on data point optimization.
7. The process optimization method based on traditional Chinese medicine production data mining according to claim 1, wherein the optimal data point x_new obtained through the constructed adaptive optimization function U(u, w) is expressed as:
and, using the established mapping, x_new is evaluated with the data evaluation model and its evaluation result d(x_new) is returned to the high-quality data set D_expl.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211273092.1A (CN115525697B) | 2022-10-18 | 2022-10-18 | A process optimization method based on traditional Chinese medicine production data mining |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115525697A | 2022-12-27 |
| CN115525697B | 2025-10-03 |
Family
ID=84704515
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211273092.1A (CN115525697B, Active) | A process optimization method based on traditional Chinese medicine production data mining | 2022-10-18 | 2022-10-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN115525697B |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |