Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a medical scheme recommendation method adapted to concept drift. The method accounts for the influence of concept drift on the result, takes factors such as the acquisition time of each sample into consideration, detects the occurrence of concept drift, and corrects outdated samples so that they adapt to the new rule, making subsequent predictions more accurate.
The invention provides a recommendation method for a medical scheme adapted to concept drift, which comprises the following steps:
step one, starting;
step two, inputting case information x of a scheme to be recommended;
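step three, reading historical case data Da from the historical case sample set;
step four, simultaneously calculating the recommendation scheme Ycsd of the concept-sensitive detector (CSD) classifier and the recommendation scheme Yac of the self-adaptive classifier;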
step five, outputting a recommendation result Yac of the self-adaptive classifier, wherein the result of the self-adaptive classifier is output as a recommended medical scheme of the case;
step six, determining the actually adopted scheme y, wherein the doctors discuss and analyze the recommendation result and, taking the individual characteristics of the patient into account, arrive at the diagnosis and treatment scheme that is actually adopted;
step seven, adding the case information x and the scheme y from step six into the historical case sample set, so that the patient's information is stored in the historical data for future use;
step eight, judging whether the result of the self-adaptive classifier differs from the actually adopted scheme while the result of the CSD classifier is the same as the actually adopted scheme; if so, turning to step nine, and otherwise, turning to step ten;
step nine, updating the conflict list Ct of each historical sample Dt, wherein, because the self-adaptive classifier predicted wrongly while the CSD classifier predicted correctly, the relevant data in the historical case sample set need to be updated, so that old samples adapt to the new rule and medical schemes under the current rule are predicted more accurately;
and step ten, finishing.
Preferably, in step two, a model is established using the case information, various clinical indexes and other attributes as features; the model is used to calculate, one by one, the similarity between the new case and every existing case record; the records are ranked by similarity; and the recommended scheme is generated by combining the several results with the highest similarity.
Preferably, the historical case data Da in step three constitute the training sample set, and each piece of historical case data is a training sample.
Compared with the prior art, the invention has the following beneficial effects: the method accounts for the influence of concept drift on the result, takes factors such as the acquisition time of each sample into consideration, detects the occurrence of concept drift, and corrects outdated samples so that they adapt to the new rule, making subsequent predictions more accurate.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
As shown in fig. 1, the recommendation system for a medical scheme adapted to concept drift of the present invention includes:
a user interface module, connected with the recommendation module and used for inputting the case information of the scheme to be recommended;
a recommendation module, connected with the workflow system, which automatically calculates the most appropriate medical scheme for the current case according to the historical case data;
an external database, connected with the recommendation module and used for storing the training samples, namely the detailed information of each case, such as personal information (name, height, weight and the like) and specific medical index information;
a workflow database, connected with the workflow system and used for storing and reading the state of each case, for example whether a medical scheme has been determined, whether the case is waiting for a system recommendation, or whether specific information is insufficient;
a workflow system, used for controlling the basic flow by which the entire system processes each case.
The recommendation module comprises a scheme recommendation module and a scheme formulation module, wherein the scheme recommendation module recommends a scheme for the sample to be predicted by means of the recommendation method, and the scheme formulation module determines the scheme that is actually adopted.
The recommendation method of the medical scheme suitable for concept drift comprises the following steps:
step one, starting;
step two, inputting case information x of a scheme to be recommended;
step three, reading historical case data Da from the historical case sample set;
step four, simultaneously calculating the recommendation scheme Ycsd of the CSD classifier (Concept-Sensitive Detector, that is, a detector sensitive to concepts, which is essentially a classifier for detecting the occurrence of concept drift) and the recommendation scheme Yac of the Adaptive Classifier (AC), wherein the CSD classifier and the AC classifier are each used to perform scheme recommendation, Yac represents the classification result of the adaptive classifier, and Ycsd represents the classification result of the CSD classifier;
step five, outputting a recommendation result Yac of the self-adaptive classifier, wherein the result of the self-adaptive classifier is output as a recommended medical scheme of the case;
step six, determining the actually adopted scheme y, wherein the doctors discuss and analyze the recommendation result and, taking the individual characteristics of the patient into account, arrive at the diagnosis and treatment scheme that is actually adopted;
step seven, adding <x, y> into the historical case sample set, so that the patient's information is stored in the historical data for future use;
step eight, judging whether the result of the self-adaptive classifier differs from the actually adopted scheme while the result of the CSD classifier is the same as the actually adopted scheme; if so, turning to step nine, and otherwise, turning to step ten;
step nine, updating the conflict list Ct of each historical sample Dt, wherein, because the self-adaptive classifier predicted wrongly while the CSD classifier predicted correctly, the relevant data in the historical case sample set need to be updated, so that old samples adapt to the new rule and medical schemes under the current rule are predicted more accurately;
and step ten, finishing.
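By way of illustration only, the flow of steps two to ten can be sketched in Python as follows. The function name process_case and the four callables passed to it are hypothetical and are not part of the invention; the branch in step eight assumes the condition described for step nine, namely that the adaptive classifier is wrong while the CSD classifier is right.

```python
from typing import Callable, List, Tuple

Case = Tuple[tuple, str]  # (attribute values x, adopted scheme y)


def process_case(
    x: tuple,
    history: List[Case],
    ac_predict: Callable[[tuple, List[Case]], str],
    csd_predict: Callable[[tuple, List[Case]], str],
    decide: Callable[[tuple, str], str],
    update_conflicts: Callable[[tuple, str, List[Case]], None],
) -> Tuple[str, str]:
    """Run steps two to ten once for a single case x (illustrative sketch)."""
    # Steps three and four: read the history and query both classifiers.
    y_ac = ac_predict(x, history)
    y_csd = csd_predict(x, history)
    # Step five: Yac is the recommendation shown to the doctor.
    # Step six: the doctor decides the actually adopted scheme y.
    y = decide(x, y_ac)
    # Step seven: store <x, y> for future use.
    history.append((x, y))
    # Step eight (assumed condition): AC wrong while CSD right.
    if y_ac != y and y_csd == y:
        # Step nine: update the conflict lists of the historical samples.
        update_conflicts(x, y, history)
    # Step ten: finish, returning the recommendation and the adopted scheme.
    return y_ac, y
```

Passing the classifiers and the doctor's decision as callables keeps the sketch independent of any particular classifier or user interface.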
In step two, a model is established using the case information, various clinical indexes and other attributes as features; the model is used to calculate, one by one, the similarity between the new case and every existing case record; the records are ranked by similarity; and the recommended scheme is generated by combining the several results with the highest similarity, which is convenient in use.
In step three, the historical case data Da constitute the training sample set, each piece of historical case data is a training sample, and the training sample set D is shown in the following formula (1):
D={D_1,D_2,...,D_N}......(1)
wherein N represents the sample capacity of the training set. In the training set, the samples are arranged in the order of their acquisition time; in other words, the smaller the serial number i, the earlier the sample was acquired. The ith historical sample D_i in the training set is shown in the following formula (2):
D_i=<x_i^1,x_i^2,...,x_i^n,y_i>......(2)
wherein n represents the number of attributes of each sample, x_i^1, x_i^2, ..., x_i^n represent the values of the n different attributes of the ith sample, respectively, and y_i indicates the category to which the ith sample belongs.
In step four, AC denotes the self-adaptive classifier, which uses all samples in the training set; because its classification result takes all historical samples into account, it is more stable. CSD denotes a classifier that uses only the recently acquired samples in the training set; its classification result is more sensitive to new concepts and is used for detecting concept drift. To detect the occurrence of concept drift, a group of kNN classifiers that use different samples is required to classify the target sample.
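The difference between the two classifiers can be illustrated with a minimal Python sketch, in which both are kNN classifiers that differ only in the samples they use. The names knn_predict, ac_predict and csd_predict, the values of k and window, and the squared Euclidean distance over numeric attributes are assumptions made for illustration; the invention uses its own similarity measure.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (attribute values, category)


def knn_predict(x: Sequence[float], samples: List[Sample], k: int) -> str:
    """Majority category among the k samples closest to x."""
    ranked = sorted(
        samples,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def ac_predict(x: Sequence[float], history: List[Sample], k: int = 3) -> str:
    """AC: uses every sample in the time-ordered training set."""
    return knn_predict(x, history, k)


def csd_predict(
    x: Sequence[float], history: List[Sample], k: int = 3, window: int = 5
) -> str:
    """CSD: uses only the `window` most recently acquired samples."""
    return knn_predict(x, history[-window:], k)
```

Because the training set is ordered by acquisition time, slicing the tail of the list is enough to select the recent samples.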
In step seven, for the sample <x(now), y(now)> currently being predicted, if the result of the classifier AC is wrong and the result of the CSD is correct, the historical data that the AC used in generating its result are inconsistent with the current sample because they obey different rules.
Whether Yac is consistent with Ycsd is judged: if Yac is equal to Ycsd, no concept drift is considered to have occurred; conversely, if Yac is not equal to Ycsd, the recently collected samples obey a rule different from that of the historical samples, concept drift may have occurred, and further detection is needed.
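The two judgments described above can be combined in a short sketch. The function name drift_signal and the three returned strings are hypothetical labels chosen for illustration only.

```python
from typing import Optional


def drift_signal(y_ac: str, y_csd: str, y_actual: Optional[str] = None) -> str:
    """Classify the outcome of comparing Yac, Ycsd and the adopted scheme."""
    if y_ac == y_csd:
        # Both classifiers agree: no concept drift is assumed.
        return "no_drift"
    if y_actual is not None and y_ac != y_actual and y_csd == y_actual:
        # AC wrong, CSD right: the historical samples need updating.
        return "update_history"
    # The classifiers disagree but the evidence is not yet conclusive.
    return "possible_drift"
```

Before the doctor's decision is known, y_actual can be omitted, in which case a disagreement is reported only as possible drift.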
In step nine, Dt is represented by the above formula (2) and satisfies t < N, where N is the sample capacity of the current training set; the other samples in the training set are used for predicting the category yt to which Dt belongs. Ct, expressed by the following formula (3), records the samples that are inconsistent with Dt. Each historical sample Dt used by the AC classifier needs to be marked as conflicting with the currently predicted sample; that is, (now, y(now)) is added to Ct. For any historical sample Dt, if the number of items in Ct reaches a certain threshold, concept drift is considered to have had a great influence on that historical sample, and yt in the historical sample is updated with the latest y value so that it adapts to the latest rule and serves subsequent prediction and recommendation more accurately:
Ct={(i,y(i))|i>t∧x(t)∈AC.nearest(k,x(i))∧y(i)≠y(t)}......(3)
Ct is a set used to record the samples that do not match Dt. Formula (3) is written in set notation and means the following: each element of the set has the form (i, y(i)), where i is the serial number of an inconsistent sample and is required to be greater than t; x(i) denotes the feature attributes of that sample, and Dt is required to be among the k samples nearest to x(i) as used by the AC classifier; and the category y(i) of that sample is required to differ from y(t).
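The bookkeeping described by formula (3) can be sketched in Python as follows. The function name update_conflicts, the default threshold of 3, the passing of the AC neighbours as a list of indices, and the clearing of Ct after a sample is relabelled are all assumptions made for illustration; the text above does not fix these details.

```python
from typing import Dict, List, Set, Tuple


def update_conflicts(
    labels: List[str],
    conflicts: Dict[int, Set[Tuple[int, str]]],
    neighbours: List[int],
    now: int,
    y_now: str,
    threshold: int = 3,
) -> List[int]:
    """Record (now, y_now) in Ct of each disagreeing neighbour Dt.

    labels[t] holds y(t); neighbours holds the indices t of the k samples
    the AC classifier used for sample `now`. Returns the relabelled indices.
    """
    relabelled = []
    for t in neighbours:
        # Conditions of formula (3): i > t and y(i) != y(t).
        if t < now and labels[t] != y_now:
            conflicts.setdefault(t, set()).add((now, y_now))
            if len(conflicts[t]) >= threshold:
                labels[t] = y_now      # adopt the latest y value
                conflicts[t].clear()   # assumed: start counting afresh
                relabelled.append(t)
    return relabelled
```

A sample is relabelled only after several later cases have contradicted it, so a single unusual case does not overwrite the history.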
wherein nearest(k, x(i)) denotes the set of the k samples with the highest similarity to the sample x(i) currently being predicted, as shown in the following formula (4).
a and b are the serial numbers of the two samples Da and Db, respectively, and d represents the total number of samples in the training sample set.
For discrete attributes of categorical type, there is no specific magnitude relationship between the values, so the similarity between samples cannot be measured directly with the Mahalanobis distance as simply as for numerical attributes. In general, to calculate the similarity between samples Da and Db, a distance must be defined on each attribute separately, and the final result is obtained by a weighted combination over all attributes. If samples Da and Db have the same value on the k-th attribute, their similarity on this attribute is 1; if the value of either sample on the k-th attribute is unknown or missing, the similarity is 0.5; otherwise, the similarity is 0, as shown in the following formulas (5) and (6):
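As a minimal sketch of the rule described above, the per-attribute similarity and its weighted combination might be written in Python as follows. The use of None to mark an unknown or missing value, the equal default weights and the normalisation by the sum of the weights are assumptions made for illustration and are not taken from formulas (5) and (6).

```python
from typing import Optional, Sequence


def attribute_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Similarity of two samples on one categorical attribute."""
    if a is None or b is None:
        return 0.5  # either value unknown or missing
    return 1.0 if a == b else 0.0


def sample_similarity(
    da: Sequence[Optional[str]],
    db: Sequence[Optional[str]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of the per-attribute similarities of Da and Db."""
    if weights is None:
        weights = [1.0] * len(da)  # assumed: equal weights by default
    total = sum(weights)
    score = sum(
        w * attribute_similarity(a, b) for w, a, b in zip(weights, da, db)
    )
    return score / total
```

For example, two samples that agree on one attribute, differ on a second and have a missing value on a third obtain a similarity of 0.5 under equal weights.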
The foregoing is a description of specific embodiments of the present invention. It is to be understood that the present invention is not limited to the specific embodiments described above, and that those skilled in the art may make various changes and modifications within the scope of the appended claims without departing from the spirit of the invention.