CN104820775A - Discovery method of core drug of traditional Chinese medicine prescription
- Publication number: CN104820775A
- Application number: CN201510183745.0A
- Authority: CN (China)
- Prior art keywords: prescription, drug, algorithm, clustering, prescriptions
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Medical Treatment And Welfare Office Work (AREA)
Abstract
A method for discovering the core drugs of traditional Chinese medicine (TCM) prescriptions, consisting of two parts: an improved clustering algorithm and a weighted TF-IDF algorithm. The clustering algorithm comprises three parts: preprocessing of the prescription data, selection of the clustering distance function, and the cluster mining algorithm. Preprocessing converts the prescription data into a model suitable for clustering; the distance selection chooses a reasonable clustering distance function; and the cluster mining algorithm groups similar prescriptions into a cluster. The weighted TF-IDF algorithm computes the weight of each drug, using a weight formula that combines three parts: the clustering result, the order importance of the drug, and TF-IDF. The algorithm achieves high accuracy.
Description
Technical field:
The invention relates to the discovery of core drugs in traditional Chinese medicine (TCM) prescriptions, and is used to mine the core drugs from the prescriptions that treat a given disease.
Background:
Drugs are the basic components of a prescription. The principle of "monarch, minister, assistant, and envoy" (jun, chen, zuo, shi) is the well-known basis of prescription composition in TCM: the drugs in a prescription are classified, according to the role they play, as monarch, minister, assistant, or envoy drugs. Different drugs therefore play different roles in a prescription. Finding the core drugs that play the main role in treating a given disease reveals the medication rules behind prescription compatibility, which is very valuable for young TCM practitioners who want to learn from the experience of renowned senior practitioners, grasp the essence of TCM theory, and study that theory further.
Existing prescription databases contain nearly 100,000 prescriptions involving more than 10,000 drugs. Even the prescriptions for one specific disease often number in the hundreds and involve hundreds of drugs. Extracting the core drugs of these prescriptions manually no longer meets modern needs, and computer-assisted methods are urgently required.
Current approaches to mining the core drugs of TCM prescriptions are mainly frequency-based methods and PageRank-based methods. Frequency-based methods are easily skewed by how often a drug occurs, so their results are not accurate enough. PageRank-based methods produce rankings that are not sufficiently reasonable and are relatively hard to understand, so they do not meet the need well either.
Summary of the invention:
The technical problem to be solved by the invention is to provide a method for discovering the core drugs of TCM prescriptions, specifically an extraction method based on improved K-Means clustering and weighted TF-IDF. Existing methods are easily affected by how often a drug occurs, give insufficiently accurate results, and use complex algorithms; the proposed mining method is general, accurate, effective, and reasonable.
The technical solution adopted by the invention is a method for discovering the core drugs of TCM prescriptions, namely an extraction method based on improved K-Means clustering and weighted TF-IDF, characterized in that it consists of two parts: an improved clustering algorithm and a weighted TF-IDF algorithm. The clustering algorithm comprises three parts: preprocessing of the prescription data, selection of the clustering distance function, and the cluster mining algorithm. Preprocessing converts the prescription data into a model suitable for clustering; the distance selection chooses a reasonable clustering distance function; and the cluster mining algorithm groups similar prescriptions into a cluster.
The weighted TF-IDF algorithm computes the weight of each drug. The weight formula of the invention combines three parts: the clustering result, the order importance of the drug, and TF-IDF.
The preprocessing of the prescription data uses a vector space model. Each prescription is abstracted as a vector, and each drug corresponds to one dimension of that vector. If the prescription contains a drug, the corresponding dimension is 1; otherwise it is 0.
The clustering distance function is the cosine distance, which reasonably measures the similarity of two prescriptions. Its distance is defined as: (formula image not reproduced in this text), where α_i and β_i denote the prescription vectors.
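The formula itself is not present in this text. As a point of reference only, the standard cosine similarity between two prescription vectors, and a distance derived from it, can be written as below. Here α and β are taken to be two 0/1 prescription vectors over n drugs with components α_i and β_i, and using 1 − cos as the distance is an assumption rather than the patent's confirmed definition.

```latex
\cos(\alpha,\beta)=\frac{\sum_{i=1}^{n}\alpha_i\,\beta_i}{\sqrt{\sum_{i=1}^{n}\alpha_i^{2}}\;\sqrt{\sum_{i=1}^{n}\beta_i^{2}}},
\qquad
d(\alpha,\beta)=1-\cos(\alpha,\beta)
```

For 0/1 vectors the numerator is simply the number of drugs the two prescriptions share, and each square root is the square root of the number of drugs in that prescription.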
The cluster mining algorithm is an improved K-Means algorithm based on partial assignment of nodes. A threshold α is set in advance (the parameter a of the detailed procedure). When nodes are assigned to center points, a node whose distance to every center point exceeds α is temporarily left out of all clusters, so some nodes may remain unassigned at the end of a round. In the next round, some seed nodes are randomly selected from these unassigned nodes as new center points. Through repeated iteration, every node in the data set is eventually assigned to a suitable cluster.
The order importance of a drug refers to how important a given drug is within the composition of a prescription. It is defined as: (formula image not reproduced in this text), where h_i is the i-th drug in the prescription and I(h_i) is the order importance of drug h_i. The total importance of drug h over all prescriptions is defined as: (formula image not reproduced in this text).
The TF-IDF algorithm is the term frequency-inverse document frequency algorithm from information science. The weight of a term is defined as: (formula image not reproduced in this text), where n_{i,j} is the term frequency, i.e. the number of times term t_i occurs in document d_j, |D| is the total number of documents in the corpus, and |{j : t_i ∈ d_j}| is the number of documents containing term t_i.
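Since the TF-IDF formula is not reproduced here, the following is a minimal sketch of one common textbook form, weight = n_{i,j} × log(|D| / |{j : t_i ∈ d_j}|), using the raw count as the term frequency. That choice of form, the function name `tf_idf`, and the toy drug names are all assumptions for illustration; the patent's exact variant may differ (for example, it may normalise the term frequency).

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Textbook TF-IDF of `term` in `doc`, with raw-count term frequency (assumed form)."""
    n_ij = Counter(doc)[term]                  # occurrences of the term in this document
    df = sum(1 for d in corpus if term in d)   # number of documents containing the term
    if n_ij == 0 or df == 0:
        return 0.0
    return n_ij * math.log(len(corpus) / df)

# Hypothetical toy corpus: each "document" is a prescription, each "term" a drug.
corpus = [
    ["ginseng", "licorice", "poria"],
    ["ginseng", "licorice"],
    ["licorice", "ephedra"],
]
weights = {herb: tf_idf(herb, corpus[0], corpus) for herb in corpus[0]}
```

In this toy corpus a drug that appears in every prescription gets weight zero, which illustrates how TF-IDF damps the influence of ubiquitous drugs.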
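The weight W(h, x) of drug h, which serves as the weight index of drug h in treating a disease x, is computed by the following formula: (formula image not reproduced in this text).
In the formula, count(h ∈ c_j) is defined as: (formula image not reproduced in this text).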
Beneficial effects of the invention: addressing the problems that existing methods are easily affected by drug frequency, produce insufficiently accurate results, and rely on complex algorithms, the invention proposes a general, accurate, effective, and reasonable method for mining the core drugs of TCM prescriptions. Clustering is used to reduce the influence of similar prescriptions on the mining result. Compared with the prior art, its notable advantages are:
(1) Simple to implement: the implementation process is very simple and clearly structured, and it is suitable for discovering the core drugs of prescriptions for a wide range of diseases.
(2) Not affected by prescription or drug frequency: the invention clusters the prescriptions so that similar prescriptions occurring many times are counted only once, and uses TF-IDF to reduce the influence of frequently used all-purpose drugs.
(3) High accuracy: the invention jointly considers the clustering of similar prescriptions, drug order importance, and TF-IDF in a single drug weight formula; comparative analysis of experimental results shows that its accuracy is high.
Description of drawings:
Fig. 1 is the overall flowchart of the method of the invention.
Fig. 2 is the pseudocode of the clustering algorithm of the invention.
Fig. 3 is the pseudocode of the core drug algorithm of the invention.
Detailed description:
The invention relates to the discovery of core drugs in TCM prescriptions, and is used to mine the core drugs from the prescriptions that treat a given disease. Existing methods are easily affected by drug frequency, give insufficiently accurate results, and use complex algorithms; the invention proposes a general, accurate, effective, and reasonable mining method in response.
The algorithm has three main steps: 1) preprocessing of the prescription data; 2) clustering of the prescription data; 3) extraction of the core drugs. Tests on real data show that the algorithm is highly accurate and can extract the core drugs that are effective against a given disease.
The implementation of the whole system is described in detail below with reference to the drawings:
Referring to Fig. 1, the overall architecture of the invention is:
1. Preprocessing of the TCM prescription data:
① To meet the requirements of clustering, the prescription data is first preprocessed. TCM prescriptions are text data, so the invention adopts the vector space model (VSM). Each prescription is abstracted as a vector in which each drug corresponds to one dimension: if the prescription contains a drug, the corresponding dimension is 1, otherwise 0. The prescription collection is thus abstracted as a 0/1 matrix with drugs as attribute columns and one row per prescription. This corresponds to step 1 in Fig. 1.
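The preprocessing step can be sketched as follows. This is a minimal illustration, assuming each prescription is already available as a list of drug names; the helper name `build_matrix` and the example drugs are hypothetical and not taken from the patent.

```python
def build_matrix(prescriptions):
    """Return (sorted drug list, 0/1 matrix) with one row per prescription."""
    # One column per distinct drug, in a fixed (alphabetical) order.
    drugs = sorted({drug for p in prescriptions for drug in p})
    index = {drug: k for k, drug in enumerate(drugs)}
    matrix = []
    for p in prescriptions:
        row = [0] * len(drugs)
        for drug in p:
            row[index[drug]] = 1   # 1 if the prescription contains the drug
        matrix.append(row)
    return drugs, matrix

# Hypothetical example data.
prescriptions = [
    ["ephedra", "cinnamon twig", "licorice"],
    ["cinnamon twig", "peony", "licorice"],
]
drugs, matrix = build_matrix(prescriptions)
```

Note that this binary encoding discards the order of drugs within a prescription, so any order-based importance has to be computed from the original lists rather than from the matrix.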
2. Clustering of the TCM prescriptions:
② Clustering first requires a measure of the distance between prescriptions. The invention uses the cosine distance, defined as: (formula image not reproduced in this text), where α_i and β_i denote the prescription vectors. This corresponds to step 2 in Fig. 1.
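A minimal sketch of a cosine-based distance between two 0/1 prescription vectors follows. It assumes the distance is taken as one minus the cosine similarity, which is a common convention but is not confirmed by this text; the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0   # treat an empty prescription as dissimilar to everything
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Assumed distance: 1 - cosine similarity (0 = same direction, 1 = no shared drugs)."""
    return 1.0 - cosine_similarity(a, b)

d_same = cosine_distance([1, 1, 0], [1, 1, 0])
d_disjoint = cosine_distance([1, 1, 0, 0], [0, 0, 1, 1])
d_overlap = cosine_distance([1, 1, 1, 0], [1, 0, 1, 1])
```

Two prescriptions with no drug in common are at distance 1, and prescriptions that share most of their drugs are close to 0, which is what lets a base prescription and its modified variants fall into one cluster.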
③ The "K-Means algorithm based on partial assignment of nodes" is invoked to cluster the entire prescription database and, separately, the prescriptions that treat a given disease X. The purpose of clustering is to reduce the influence of similar prescriptions on the drug weight calculation. The clustering algorithm of the invention improves on the original K-Means algorithm and has the following properties: 1) it reduces the influence of noisy data on the clustering result; 2) it reduces the influence of the initial seed points on the clustering result; 3) through the fusion strategy and the strategy of reselecting seed nodes from the nodes that have not yet been assigned, it removes the fixed cluster count of traditional k-means, so the final number of clusters does not depend on the number of initial center points and the result can converge as closely as possible to the true clustering. This corresponds to step 3 in Fig. 1.
3. Invoking the "core drug mining algorithm":
For each drug in the prescriptions that treat disease X, the formula defined by the invention is used to compute its importance, and the drugs whose importance exceeds avg_w, the average importance over all drugs, are extracted as core drugs. This corresponds to steps 4 and 5 in Fig. 1.
Referring to Fig. 2, the "K-Means algorithm based on partial assignment of nodes" is the core clustering algorithm. TCM prescriptions include many modified prescriptions (a base prescription with drugs added or removed), and such frequently occurring similar prescriptions distort the results. The purpose of clustering is therefore to find sets of similar prescriptions so as to reduce the influence of frequently occurring prescriptions and improve the accuracy of core drug extraction. The detailed procedure is:
Step 11 starts the algorithm.
Step 12 inputs the parameters p and a and the data set data_set. p may be 1 or 2, meaning that p% of the points are first randomly chosen as center points. a is the maximum distance from a node to its cluster center; it can be chosen by sampling the data and computing the distances between points inside the same cluster. data_set is the prescription data matrix produced by preprocessing.
Step 13 randomly selects p% of the points in the data set as cluster centers and stores them in k_centers.
Step 14 checks whether the clustering has converged. If so, go to step 32; otherwise go to step 15. Convergence means that the cluster centers no longer change, or that the clustering results of two consecutive iterations are identical.
Step 15 sets i = 0.
Step 16 checks whether i is less than data_set.size(). If so, go to step 17; otherwise jump to step 29.
Step 17 sets xi = data_set[i], the i-th point vector.
Step 18 sets j = 0.
Step 19 checks whether j is less than k_centers.size(). If so, go to step 20; otherwise jump to step 25.
Step 20 sets centerj = k_centers[j], the j-th center vector.
Step 21 computes the distance d between xi and centerj; the distance function defined by the invention is the cosine distance.
Step 22 checks whether j is 0, or j is not 0 and d is smaller than shortest_d. If so, go to step 23; otherwise go to step 24.
Step 23 assigns d to shortest_d and centerj to shortest_c, recording the shortest distance and the corresponding center.
Step 24 increments j by 1 (j++).
Step 25 checks whether shortest_d is smaller than the threshold a. If so, the node is assigned to the cluster of its nearest center (step 26); otherwise the node is left unassigned for now (step 27).
Step 26 adds the vector xi to the cluster to which the center shortest_c belongs.
Step 27 leaves the point unassigned for now, adding the vector xi to the set unallocated_nodes.
Step 28 increments i by 1 (i++).
Step 29 recomputes the center of each cluster. If two clusters satisfy the fusion condition, they are merged into one cluster according to the fusion strategy. This reduces the influence of the randomly chosen initial centers on the clustering result and prevents the number of clusters from growing continually, which would split the result too finely.
The fusion strategy adopted by the invention has four steps:
1) Compute, for each of the two clusters, the average distance from its center to the other points in that cluster.
2) Compute the distance between the two cluster centers. If it is less than twice the average distance from each cluster's center to the other points of that cluster, treat the two clusters as fusion candidates and continue with step 3; otherwise do not merge them.
3) Put every point of one cluster whose distance to the other cluster's center is less than the distance between the two centers into the set to be fused, and put the remaining points into the set of unassigned points.
4) Compute the center of all points in the set to be fused, and add it to the center set as a new center in place of the two original centers.
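The four fusion steps can be sketched as below. This is an illustrative reading, not the patent's pseudocode: it assumes the "twice the average distance" test must hold for both clusters, that the distance is one minus cosine similarity, and that a cluster center is the component-wise mean of its points. The names `try_merge`, `avg_dist`, and `mean_vector` are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Assumed distance: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)

def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def avg_dist(center, points):
    """Average distance from a center to the points of its cluster."""
    return sum(cosine_distance(center, p) for p in points) / len(points)

def try_merge(c1, pts1, c2, pts2):
    """Return (new_center, merged_points, released_points), or None if not merged."""
    d12 = cosine_distance(c1, c2)
    # Sub-steps 1-2: candidates only if the centers are closer than twice
    # the average center-to-point distance of each cluster.
    if not (d12 < 2 * avg_dist(c1, pts1) and d12 < 2 * avg_dist(c2, pts2)):
        return None
    merged, released = [], []
    # Sub-step 3: keep points lying closer to the other center than d12.
    for p in pts1:
        (merged if cosine_distance(p, c2) < d12 else released).append(p)
    for p in pts2:
        (merged if cosine_distance(p, c1) < d12 else released).append(p)
    if not merged:
        return None
    # Sub-step 4: the merged points define the replacement center.
    return mean_vector(merged), merged, released

# Hypothetical clusters over six drugs.
pts1 = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0]]
pts2 = [[0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
result = try_merge(mean_vector(pts1), pts1, mean_vector(pts2), pts2)
far = try_merge([1, 1, 0, 0], [[1, 1, 0, 0]], [0, 0, 1, 1], [[0, 0, 1, 1]])
```

In the example the two clusters share one prescription pattern, so the shared points are fused into a new cluster and the two outlying prescriptions are released back to the unassigned set.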
Step 30 randomly selects p% cluster center points from the data set and stores them in k_centers.
Step 31 adds the centers merged in step 29 to k_centers.
Step 32 ends the clustering algorithm.
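One assignment round (steps 15 to 28) can be sketched as follows. This is a simplified illustration under stated assumptions: the distance is one minus cosine similarity, clusters are keyed by the index of their center, and the function name `assign_pass` is hypothetical. Re-seeding, fusion, and the convergence test are left out.

```python
import math

def cosine_distance(a, b):
    """Assumed distance: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)

def assign_pass(data_set, k_centers, a):
    """Assign each point to its nearest center if closer than a, else leave it unallocated."""
    clusters = {j: [] for j in range(len(k_centers))}
    unallocated_nodes = []
    for xi in data_set:
        shortest_d, shortest_c = None, None
        for j, center in enumerate(k_centers):
            d = cosine_distance(xi, center)
            if shortest_d is None or d < shortest_d:   # steps 22-23
                shortest_d, shortest_c = d, j
        if shortest_d is not None and shortest_d < a:  # step 25
            clusters[shortest_c].append(xi)            # step 26
        else:
            unallocated_nodes.append(xi)               # step 27
    return clusters, unallocated_nodes

# Hypothetical data: one center and a threshold of 0.3.
data_set = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
clusters, unallocated_nodes = assign_pass(data_set, [[1, 1, 0, 0]], 0.3)
```

The points left in `unallocated_nodes` are the pool from which later rounds can draw new seed centers, which is how the number of clusters is allowed to grow beyond the initial choice.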
Referring to Fig. 3, the detailed procedure of the "extraction of core drugs" algorithm is:
Step 40 starts the algorithm.
Step 41 extracts all drugs from all prescriptions that treat disease X and stores them in the set h_set. Step 42 sets i = 0 and total_w = 0.
Step 43 checks whether i is less than h_set.size(), the number of drugs in the prescriptions. If so, go to step 44; otherwise go to step 49.
Step 44 sets h = h_set[i], the i-th drug.
Step 45 computes the weight h_w of drug h using the weight formula W(h, x) defined by the invention (formula image not reproduced in this text).
Step 46 stores drug h and its weight h_w as a key-value pair in the map w_map.
Step 47 adds the computed importance h_w to the total importance total_w.
Step 48 increments i by 1 (i++).
Step 49 divides the total importance by the number of drugs to obtain the average importance avg_w.
Step 50 resets i = 0.
Step 51 checks whether i is less than h_set.size(). If so, go to step 52; otherwise go to step 56.
Step 52 sets h = h_set[i], the i-th drug.
Step 53 checks whether the importance h_w of h is greater than avg_w. If so, go to step 54; otherwise go directly to step 55.
Step 54 adds drug h to the set core_herbs.
Step 55 increments i by 1 (i++).
Step 56 yields the core drug set core_herbs.
Step 57 ends the algorithm.
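Steps 41 to 56 amount to an above-average filter over the drug weights, which can be sketched as follows. Because the weight formula W(h, x) is not reproduced in this text, the sketch takes the weight function as a parameter; the function name `select_core_herbs` and the example weights are hypothetical stand-ins, not values from the patent.

```python
def select_core_herbs(h_set, weight_of):
    """Return (core_herbs, avg_w): the drugs whose weight exceeds the average weight."""
    w_map = {h: weight_of(h) for h in h_set}            # steps 44-46
    total_w = sum(w_map.values())                       # step 47
    avg_w = total_w / len(h_set) if h_set else 0.0      # step 49
    core_herbs = [h for h in h_set if w_map[h] > avg_w] # steps 51-55
    return core_herbs, avg_w

# Hypothetical weights standing in for W(h, x).
example_weights = {
    "ephedra": 0.9,
    "cinnamon twig": 0.6,
    "licorice": 0.1,
    "peony": 0.4,
}
h_set = list(example_weights)
core_herbs, avg_w = select_core_herbs(h_set, example_weights.get)
```

Using the mean as the cut-off means the number of core drugs is not fixed in advance; it depends on how skewed the weight distribution is for the disease in question.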
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201510183745.0A CN104820775A (en) | 2015-04-17 | 2015-04-17 | Discovery method of core drug of traditional Chinese medicine prescription | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201510183745.0A CN104820775A (en) | 2015-04-17 | 2015-04-17 | Discovery method of core drug of traditional Chinese medicine prescription | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN104820775A (en) | 2015-08-05 |
Family
ID=53731070
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201510183745.0A Pending CN104820775A (en) | 2015-04-17 | 2015-04-17 | Discovery method of core drug of traditional Chinese medicine prescription | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN104820775A (en) | 
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN106202888A (en) * | 2016-06-29 | 2016-12-07 | 北京千安哲信息技术有限公司 | A kind of method and device measuring Chinese crude drug similarity | 
| CN107799160A (en) * | 2017-10-26 | 2018-03-13 | 医渡云(北京)技术有限公司 | Medication aid decision-making method and device, storage medium, electronic equipment | 
| CN109271515A (en) * | 2018-09-19 | 2019-01-25 | 南京邮电大学 | A kind of antibiotic medicine method for risk stratification based on clustering | 
| CN110010251A (en) * | 2019-02-01 | 2019-07-12 | 华南师范大学 | A kind of Chinese medicine community information generation method, system, device and storage medium | 
| CN112133382A (en) * | 2020-09-24 | 2020-12-25 | 南京中爱人工智能与生命科学研究院有限公司 | Learning method and system for medicine analysis by using algorithm model | 
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20050028117A1 (en) * | 2003-07-31 | 2005-02-03 | Naoto Yokoyama | Method and apparatus for designing high-frequency circuit, and display method for use in designing high-frequency circuit | 
| CN101408912A (en) * | 2008-11-21 | 2009-04-15 | 天津师范大学 | Method for automatically extracting characteristic function of traditional Chinese medicine pulse manifestation | 
| CN102646168A (en) * | 2012-04-16 | 2012-08-22 | 南京大学 | Hierarchical overlapping community discovery method based on co-nearest neighbor similar triangle agglomeration for traditional Chinese medicine prescription network | 
| CN104199853A (en) * | 2014-08-12 | 2014-12-10 | 南京信息工程大学 | Clustering method | 
Non-Patent Citations (3)
| Title | 
|---|
| 周伟 等: "利用效用度挖掘核心药物及配伍规律", 《计算机科学与探索》 * | 
| 周伟: "中药方剂核心药物及其配伍规律挖掘", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * | 
| 王峰: "用于中药方剂知识发现的若干数据挖掘方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * | 
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN106202888A (en) * | 2016-06-29 | 2016-12-07 | 北京千安哲信息技术有限公司 | A kind of method and device measuring Chinese crude drug similarity | 
| CN106202888B (en) * | 2016-06-29 | 2019-05-07 | 北京千安哲信息技术有限公司 | A kind of method and device for measuring Chinese medicine similitude | 
| CN107799160A (en) * | 2017-10-26 | 2018-03-13 | 医渡云(北京)技术有限公司 | Medication aid decision-making method and device, storage medium, electronic equipment | 
| CN109271515A (en) * | 2018-09-19 | 2019-01-25 | 南京邮电大学 | A kind of antibiotic medicine method for risk stratification based on clustering | 
| CN110010251A (en) * | 2019-02-01 | 2019-07-12 | 华南师范大学 | A kind of Chinese medicine community information generation method, system, device and storage medium | 
| CN110010251B (en) * | 2019-02-01 | 2022-04-15 | 华南师范大学 | A method, system, device and storage medium for generating information of traditional Chinese medicine community | 
| CN112133382A (en) * | 2020-09-24 | 2020-12-25 | 南京中爱人工智能与生命科学研究院有限公司 | Learning method and system for medicine analysis by using algorithm model | 
| CN112133382B (en) * | 2020-09-24 | 2024-02-20 | 南京泛泰数字科技研究院有限公司 | Learning method and system for medical analysis by using algorithm model | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| EXSB | Decision made by sipo to initiate substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20150805 |