Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a machine-learning-based method for classifying image features of anti-NMDAR encephalitis. The method addresses two problems: volumetric features in existing brain structural images are difficult to identify with the naked eye alone, and existing machine learning methods have limited computing capacity and struggle to process all image parameters.
To achieve the above purpose, the invention adopts the following technical solution: a machine-learning-based method for classifying anti-NMDAR encephalitis image features, comprising the following steps:
S1, acquiring the corresponding MRI images of samples diagnosed with anti-NMDAR encephalitis;
S2, processing the image data of the acquired MRI images, reconstructing the cerebral cortex and calculating the corresponding cortical features;
S3, performing data positioning and feature screening on all the calculated cortical features to form a data set;
S4, constructing an anti-NMDAR encephalitis image feature classification model, and training the model with the data set;
S5, inputting the cortical features to be processed into the trained anti-NMDAR encephalitis image feature classification model to obtain a feature classification result.
Further, in step S2, the method for processing the image data of the MRI image corresponding to one sample specifically comprises:
A1, performing skull stripping on the MRI image, followed by B1 bias field correction and gray/white matter segmentation;
A2, reconstructing a cortical surface model based on the gray/white matter segmentation result;
A3, labeling the cortical surface regions and the subcortical brain structures on the reconstructed cortical surface model;
A4, based on the labeled cortical surface regions and subcortical brain structures, nonlinearly registering the cortical surface in the MRI image of each sample to a stereotaxic atlas, reconstructing the cerebral cortex and calculating the cortical feature parameters of each brain region;
A5, based on the cortical feature parameters, selecting the cortical features of the cerebral cortex that are most relevant to autoimmune encephalitis.
Further, step A5 specifically comprises:
determining a standard brain as the surface template, mapping the annotation of the aparc.a2009 brain atlas onto each cortical region of the surface template, and coloring the cortical regions of interest for anti-NMDAR encephalitis according to the cortical feature parameters, thereby obtaining the cortical features of the cerebral cortex.
Further, in step S3, the method for performing data positioning on the cortical features specifically comprises:
B1, checking the cortical features of each sample for missing values, and deleting the cortical features of any sample that has missing values;
B2, detecting outliers among all cortical features of the current sample by the interquartile range (IQR) method, and deleting the outliers;
B3, standardizing all cortical features of the current sample;
B4, taking all the processed cortical features as the feature information of the sample.
Further, the method for screening the feature information in step S3 specifically comprises:
C1, forming a feature set from the feature information of all samples;
C2, taking a trained lasso model as the feature screening model;
C3, processing the feature information in the feature set with the trained feature screening model, assigning a weight to each item of feature information during processing, and deleting from the feature set the feature information whose weight is smaller than a set weight threshold;
C4, verifying the validity of the feature information retained in the feature set by an ROC regression method, and forming the data set from the verified feature information.
Further, the structure of the anti-NMDAR encephalitis image feature classification model in step S4 is:
a 3D-CNN network in which the fully connected layer is replaced by a support vector machine, and the support vector machine outputs the feature classification result of the anti-NMDAR encephalitis image.
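Further, in step S5, the performance evaluation parameters of the trained anti-NMDAR encephalitis image feature classification model include an accuracy value and a classification performance value;
wherein the accuracy ACC is:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
and the classification performance value F1 is:
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where precision is the precision rate, $\mathrm{precision} = \frac{TP}{TP + FP}$, recall is the recall rate, $\mathrm{recall} = \frac{TP}{TP + FN}$,
TP is the number of true positives, FP the number of false positives, TN the number of true negatives and FN the number of false negatives.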
The beneficial effects of the invention are:
(1) the method uses only features of the cerebral cortex, which are highly correlated with autoimmune encephalitis, and, compared with existing manual classification methods, greatly reduces the workload of feature classification in practice and the dependence on domain expert knowledge;
(2) the method processes medical images and extracts the corresponding cortical parameters for different brain images so as to capture the features of local brain lesions, which reduces the cost of detection instruments and improves the accuracy of feature classification;
(3) the method converts the visual information of medical images into deep data features for quantitative study, and provides doctors with objective, consistent and reproducible reference information.
Detailed Description
The following description of specific embodiments is provided to help those skilled in the art understand the present invention. It should be understood, however, that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined in the appended claims, and all inventions and creations that make use of the inventive concept are protected.
As shown in Fig. 1, a machine-learning-based method for classifying anti-NMDAR encephalitis image features comprises the following steps:
S1, acquiring the corresponding MRI images of samples diagnosed with anti-NMDAR encephalitis;
S2, processing the image data of the acquired MRI images, reconstructing the cerebral cortex and calculating the corresponding cortical features;
S3, performing data positioning and feature screening on all the calculated cortical features to form a data set;
S4, constructing an anti-NMDAR encephalitis image feature classification model, and training the model with the data set;
S5, inputting the cortical features to be processed into the trained anti-NMDAR encephalitis image feature classification model to obtain a feature classification result.
In step S1, data from a number of patients are collected from historical clinical work, and a number of healthy samples are randomly selected from the database as controls. After the T1-weighted images are evaluated, the autoimmune encephalitis (AE) data and the healthy data are analyzed, and the corresponding serum and/or CSF samples of the patients in the cohort are sent to a remote specialized testing center for detection of autoimmune neurological disease markers, thereby identifying the individuals diagnosed with anti-NMDAR encephalitis.
In step S1, the structural MRI images of all subjects are acquired on a Trio or Skyra 3.0 T scanner with a standard 8-channel head coil. The T1-weighted structural brain images are acquired with a three-dimensional Fourier transform fast spoiled gradient-recalled sequence in the steady state, because it provides excellent contrast between gray and white matter. On the Trio, the repetition time is 450-500 ms, the echo time is 20-25 ms, the slice thickness is 5.0 mm, the matrix is 256 × 256, and the number of axial slices is 20. On the Skyra, the repetition time is 1600-1660 ms, the echo time is 8-8.6 ms, the slice thickness is 5.0 mm, the matrix is 256 × 256, and the number of axial slices is 20.
In step S2, the method for processing the image data of the MRI image corresponding to one sample specifically comprises:
A1, performing skull stripping on the MRI image, followed by B1 bias field correction and gray/white matter segmentation;
A2, reconstructing a cortical surface model based on the gray/white matter segmentation result;
A3, labeling the cortical surface regions and the subcortical brain structures on the reconstructed cortical surface model;
A4, based on the labeled cortical surface regions and subcortical brain structures, nonlinearly registering the cortical surface in the MRI image of each sample to a stereotaxic atlas, reconstructing the cerebral cortex and calculating the cortical feature parameters of each brain region;
in this step, the reconstruction result of each subject is visually inspected, and any errors caused by image artifacts are corrected manually;
A5, based on the cortical feature parameters, selecting the cortical features of the cerebral cortex that are most relevant to autoimmune encephalitis;
wherein step A5 specifically comprises:
determining a standard brain as the surface template, mapping the annotation of the aparc.a2009 brain atlas onto each cortical region of the surface template, and coloring the cortical regions of interest for anti-NMDAR encephalitis according to the cortical feature parameters, thereby obtaining the cortical features of the cerebral cortex.
The visualization result obtained from the above image data processing is shown in Fig. 2.
In step S3, the method for performing data positioning on the cortical features specifically comprises:
B1, checking the cortical features of each sample for missing values, and deleting the cortical features of any sample that has missing values;
B2, detecting outliers among all cortical features of the current sample by the interquartile range (IQR) method, and deleting the outliers;
B3, standardizing all cortical features of the current sample;
wherein the standardization subtracts the sample mean of each feature and divides by the population standard deviation of that feature across all individuals; the standard scaler is fitted once, and the fitted transformation is then applied to all data sets;
B4, taking all the processed cortical features as the feature information of the sample.
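As an illustration only, steps B1 to B4 could be sketched as follows using the Python standard library. The function names, the toy data, the 1.5 × IQR fence factor and the choice to drop a whole sample when any of its features is an outlier are assumptions made for this sketch and are not specified in the description.

```python
import math
import statistics

def drop_samples_with_missing(samples):
    """B1: discard every sample that has at least one missing feature value."""
    return [row for row in samples
            if all(v is not None and not math.isnan(v) for v in row)]

def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the interquartile range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outlier_samples(samples, k=1.5):
    """B2 (assumed variant): drop samples with any feature outside its fences."""
    fences = [iqr_bounds(col, k) for col in zip(*samples)]
    return [row for row in samples
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, fences))]

def fit_standardizer(samples):
    """B3: per-feature mean and population standard deviation."""
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard against /0
    return means, stds

def standardize(samples, means, stds):
    """B3: apply an already fitted transformation to any data set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in samples]

# Hypothetical toy data: rows are samples, columns are two cortical features.
raw = [
    [2.5, 1.0],
    [2.6, 1.1],
    [2.4, None],   # missing value, removed in B1
    [2.7, 0.9],
    [2.5, 1.2],
    [9.9, 1.0],    # extreme first feature, removed in B2
    [2.6, 1.1],
]
complete = drop_samples_with_missing(raw)
clean = drop_outlier_samples(complete)
means, stds = fit_standardizer(clean)
scaled = standardize(clean, means, stds)   # B4: feature information per sample
```

Keeping `fit_standardizer` separate from `standardize` mirrors the statement that the fitted transformation is reused on all data sets, so that later data are scaled with the statistics estimated once.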
The method for screening the feature information in step S3 specifically comprises:
C1, forming a feature set from the feature information of all samples;
C2, taking a trained lasso model as the feature screening model;
C3, processing the feature information in the feature set with the trained feature screening model, assigning a weight to each item of feature information during processing, and deleting from the feature set the feature information whose weight is smaller than a set weight threshold;
C4, verifying the validity of the feature information retained in the feature set by an ROC regression method, and forming the data set from the verified feature information.
Specifically, in step C2, when the lasso model is trained, the feature set obtained from the preceding data positioning is randomly divided into k mutually exclusive subsets of similar size. In each round, the union of k-1 subsets is used as the training set of the lasso model and the remaining subset is used as the test set. Because the training content differs from round to round, the generalization capability of the lasso model is improved.
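The k-fold partition described above can be illustrated with a short standard-library sketch. The function name `k_fold_splits`, the seed and the sample count are hypothetical and chosen only for the example.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random assignment to folds
    # Striding gives k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]                    # one fold held out for testing
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx              # remaining k-1 folds for training

# Hypothetical usage: 10 samples split into 5 folds.
splits = list(k_fold_splits(n_samples=10, k=5))
```

Each sample appears in exactly one test fold, so every round trains on a different subset of the data.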
The lasso model is a linear regression (LR) based model that introduces a penalty term to obtain a more refined model. It shrinks some of the regression coefficients by constraining the sum of the absolute values of the coefficients to be less than a fixed value, and sets some of the regression coefficients exactly to zero. It thus retains the advantage of subset shrinkage and is a biased estimator suited to data with multicollinearity.
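A minimal sketch of lasso-based screening is given below, assuming the common penalized form (1/(2n))·||y − Xw||² + α·||w||₁ solved by coordinate descent. The solver, the penalty strength `alpha`, the use of the absolute coefficient as the feature weight and the toy data are all assumptions of this sketch; the description does not fix any of them.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)                               # y - Xw for w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def screen_features(weights, threshold):
    """Keep the indices whose absolute weight reaches the threshold."""
    return [j for j, wj in enumerate(weights) if abs(wj) >= threshold]

# Hypothetical toy data: three mutually orthogonal features,
# target y = 2*x0 + 0.05*x2, so only feature 0 carries a strong signal.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -2.05, 1.95, -1.95]
weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = screen_features(weights, threshold=1e-6)
```

In this toy case the weak and the irrelevant features are driven exactly to zero, which is the behavior that makes the lasso usable as a screening step.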
Fig. 3 is a schematic diagram showing the visualization of the features retained after the feature screening.
The structure of the anti-NMDAR encephalitis image feature classification model in step S4 is:
a 3D-CNN network in which the fully connected layer is replaced by a support vector machine, and the support vector machine outputs the feature classification result of the anti-NMDAR encephalitis image.
The basic model of the support vector machine (SVM) is a linear classifier with maximum margin defined in the feature space. The present invention uses a C-support vector classifier (C-SVM), which can classify samples that are not linearly separable. The learning strategy of the SVM is margin maximization, which can be formalized as a convex quadratic programming problem and is equivalent to minimizing a regularized hinge loss function. The optimal separating hyperplane of the SVM depends only on a subset of the training samples, the support vectors, and, to guard against overfitting, the model is fitted on a limited data set (only 400 samples but 304 features).
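For reference, the textbook soft-margin formulation behind these statements can be written as follows. The notation is an assumption of this note and does not appear in the description: $x_i$ is the feature vector of sample $i$, $y_i \in \{-1, +1\}$ its label, $\phi$ the feature map, $w$ and $b$ the hyperplane parameters, $\xi_i$ the slack variables and $C > 0$ the penalty parameter.

```latex
% Convex quadratic program (C-SVM primal)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{s.t.}\quad
y_{i}\bigl(w^{\top}\phi(x_{i}) + b\bigr) \ge 1 - \xi_{i},\qquad \xi_{i}\ge 0,\quad i = 1,\dots,n

% Equivalent regularized hinge-loss minimization
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2}
+ C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_{i}\bigl(w^{\top}\phi(x_{i}) + b\bigr)\bigr)
```

At the optimum each slack variable equals the hinge loss of its sample, which is why the two problems coincide. Only samples with $y_{i}(w^{\top}\phi(x_{i}) + b) \le 1$ influence the solution; these are the support vectors.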
The invention adopts SVM-based ensemble learning (SVM + AdaBoost). Because of the limitations of medical image data and the difficulty of feature selection, an SVM alone cannot achieve satisfactory classification results and high accuracy, so SVM-based ensemble learning is used to improve the classification results. AdaBoost is a boosting ensemble algorithm that strengthens weak classifiers: it combines a group of weak classifiers into a weighted sum to create a stronger classifier. Its principle is to train each successive weak classifier on the training set with adjusted sample weights, where the weight of each sample is determined from the output of the classifier in the previous step, so that the next classifier copes better with the more challenging examples.
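The reweighting principle can be illustrated with a small self-contained sketch of discrete AdaBoost. To keep the example short and dependency-free, one-feature threshold classifiers (decision stumps) stand in for the SVM weak learners of the invention; the stumps, the function names and the toy data are assumptions of this sketch only.

```python
import math

def train_stump(X, y, weights):
    """Return (error, feature, threshold, polarity) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity

def adaboost_fit(X, y, n_rounds=3):
    n = len(X)
    weights = [1.0 / n] * n                      # start from uniform weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, weights)
        err = min(max(stump[0], 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * t * stump_predict(stump, row))
                   for w, row, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the weighted sum of the weak classifiers' votes."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
```

No single stump classifies this data correctly, but the weighted sum of three stumps does, which is the effect the ensemble is meant to achieve with stronger base learners such as SVMs.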
In step S5, the performance evaluation parameters of the trained anti-NMDAR encephalitis image feature classification model include an accuracy value and a classification performance value;
wherein the accuracy ACC is:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
and the classification performance value F1 is:
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where precision is the precision rate, $\mathrm{precision} = \frac{TP}{TP + FP}$, recall is the recall rate, $\mathrm{recall} = \frac{TP}{TP + FN}$,
TP is the number of true positives, FP the number of false positives, TN the number of true negatives and FN the number of false negatives. The accuracy value characterizes how accurately the model performs the classification task, and the classification performance value F1 characterizes the model's ability to recognize and classify the features. As the harmonic mean of precision and recall, the classification performance value balances the two, thereby providing a sound assessment of the model's performance in classifying images.
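Using the standard definitions of accuracy, precision, recall and F1, the two evaluation values can be computed from the four confusion-matrix counts as sketched below. The function name and the example counts are hypothetical and are not results of the invention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

The guards against zero denominators are a design choice of this sketch; they return 0.0 when a quantity is undefined, for example when the model predicts no positives at all.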