
CN116071296B - Model training method and device - Google Patents

Model training method and device

Info

Publication number
CN116071296B
CN116071296B (application CN202211488759.XA)
Authority
CN
China
Prior art keywords
output
loss function
detection branch
instance
contour
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211488759.XA
Other languages
Chinese (zh)
Other versions
CN116071296A
Inventor
戴亚康
耿辰
戴斌
周志勇
李凤美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Guoke Medical Engineering Technology Development Co ltd
Suzhou Institute of Biomedical Engineering and Technology of CAS
Original Assignee
Jinan Guoke Medical Engineering Technology Development Co ltd
Suzhou Institute of Biomedical Engineering and Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Guoke Medical Engineering Technology Development Co ltd, Suzhou Institute of Biomedical Engineering and Technology of CAS filed Critical Jinan Guoke Medical Engineering Technology Development Co ltd
Priority to CN202211488759.XA priority Critical patent/CN116071296B/en
Publication of CN116071296A publication Critical patent/CN116071296A/en
Application granted granted Critical
Publication of CN116071296B publication Critical patent/CN116071296B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/0012: Physics; Computing or calculating; Image data processing or generation, in general; Image analysis; Inspection of images, e.g. flaw detection; Biomedical image inspection
    • G06F 17/16: Physics; Computing or calculating; Electric digital data processing; Digital computing or data processing equipment or methods, specially adapted for specific functions; Complex mathematical operations; Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/08: Physics; Computing or calculating; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods
    • G06T 7/149: Physics; Computing or calculating; Image data processing or generation, in general; Image analysis; Segmentation; Edge detection involving deformable models, e.g. active contour models
    • G06T 2207/10088: Indexing scheme for image analysis or image enhancement; Image acquisition modality; Tomographic images; Magnetic resonance imaging [MRI]
    • G06T 2207/30096: Indexing scheme for image analysis or image enhancement; Subject of image; Biomedical image processing; Tumor; Lesion
    • G06T 2207/30101: Indexing scheme for image analysis or image enhancement; Subject of image; Biomedical image processing; Blood vessel; Artery; Vein; Vascular

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a model training method and device. The method comprises: acquiring a plurality of first images; extracting a first object in each first image and acquiring a second object instance annotated in the first image as a second object instance label; acquiring the contour of the first object; acquiring the contour of the second object from the contour of the first object and the second object instance, as a second object contour label; inputting the first object into a model to be trained and acquiring the output of the model to be trained, wherein the model to be trained comprises a contour detection branch and an instance detection branch, and the output comprises a second object contour output and a second object instance output; calculating a first loss function value and a second loss function value; calculating a third loss function value from the first and second loss function values; and adjusting the parameters of the model to be trained based on the third loss function value. A model trained with the model training method provided by the invention detects with high efficiency.

Description

Model training method and device
Technical Field
The invention relates to the technical field of intelligent detection, in particular to a model training method and device.
Background
At present, deep-learning-based target object detection methods are mostly limited to preprocessing a complete image block and feeding it directly into a detection model. Detection precision is low, and because the area that can be detected at a time is limited, a single detection takes a long time, i.e., detection efficiency is low.
Disclosure of Invention
The invention therefore aims to overcome the technical problems of low detection precision and low detection efficiency when detecting a target object with existing detection models, and provides a model training method and device.
According to a first aspect, an embodiment of the present invention provides a model training method, including the following steps:
acquiring a plurality of first images;
extracting, for each first image, a first object in the first image, and acquiring a second object instance annotated in the first image as a second object instance label;
acquiring a contour of the first object;
acquiring a contour of the second object according to the contour of the first object and the second object instance, and taking it as a second object contour label;
inputting the first object into a model to be trained and acquiring an output of the model to be trained, wherein the model to be trained comprises a contour detection branch and an instance detection branch, and the output comprises a second object contour output and a second object instance output;
calculating a first loss function value based on the second object instance label and the second object instance output, and a second loss function value based on the second object contour label and the second object contour output;
calculating a third loss function value from the first loss function value and the second loss function value;
and adjusting parameters of the model to be trained based on the third loss function value.
Optionally, the inputting the first object into the model to be trained includes:
dividing a second object obtained from the first object annotation according to the position of the second object in the first object;
Counting the number of the second objects at each position;
based on the numbers, performing position-balanced augmentation of the first object using at least one of flipping along the cross-section, adding discrete Gaussian noise, and histogram equalization;
and inputting the first object and the augmented first object into the model to be trained.
Optionally, the model to be trained includes an encoding block, a feature extraction block and a decoding block, and the contour detection branch and the instance detection branch each include the feature extraction block and the decoding block;
the coding block comprises M groups of sequentially connected downsampling structures for obtaining downsampling results at different scales; the decoding block comprises M groups of upsampling structures in one-to-one correspondence with the downsampling structures; the sampling result of each group of downsampling structures is spliced (concatenated) with the features output by the preceding stage of the corresponding upsampling structure and then used as the input features of that upsampling structure;
The feature extraction block in the contour detection branch is used for extracting deep features based on the output of the coding block, the feature extraction block in the instance detection branch is used for extracting deep features based on the output of the coding block and the up-sampling result of the intermediate layer up-sampling structure of the decoding block in the contour detection branch, and the decoding block further comprises a classification layer used for performing classification detection based on the output of the up-sampling structure of M groups.
Optionally, each group of downsampling structures in the coding block comprises a convolution block and a BiA module connected in sequence. The convolution block is used for downsampling; the BiA module comprises two parallel residual branches, which decouple the features output by the convolution block to obtain two feature maps, and the two feature maps are respectively input into the decoding block in the contour detection branch and the decoding block in the instance detection branch.
Optionally, each residual branch of the BiA module comprises two residual sub-modules connected in sequence. The BiA module further comprises a spatial attention mechanism block, which comprises a max pooling layer and an average pooling layer connected in sequence; the input of the spatial attention mechanism block is the output of the convolution block of the same group of downsampling structures, the output of the spatial attention mechanism block yields a weight map through a Sigmoid function, and the weight map is respectively multiplied with the output of the latter residual sub-module of each of the two residual branches and then added to obtain the two feature maps.
Optionally, the feature extraction block comprises a plurality of downsampling layers and a plurality of upsampling layers connected in sequence; a Swin-Transformer layer is further included before each downsampling layer and each upsampling layer, the last downsampling layer and the first upsampling layer are joined by a convolution layer, and the output of the last downsampling layer is spliced with the output of the preceding downsampling layer through a shortcut connection.
Optionally, the upsampling result of the intermediate-layer upsampling structure of the decoding block in the contour detection branch is downsampled to the output scale of the encoding block and then passed through a Sigmoid function to obtain a weight map; the weight map is added to the output of the encoding block and then multiplied, and the result serves as the input of the feature extraction block of the instance detection branch.
Optionally, the first upsampling result of the decoding block at each scale has its number of channels adjusted by convolution and is then added element-wise to the upsampling result at the larger scale, serving as deep supervision of the model to be trained.
Optionally, the first loss function value is calculated using the following formula:
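The formula referenced here appears as an image in the published document and is not reproduced in the text. A plausible reconstruction, consistent with the variable definitions below and with the connected-domain averaging procedure described later in the detailed description, would be (this is an assumption, not the published formula):

```latex
L_{CDE} =
\begin{cases}
L_{CE}, & K \le 1 \\[4pt]
-\dfrac{1}{K}\displaystyle\sum_{a=1}^{K}\sum_{b\in\{0,1\}} p(x_b)\,\log q(x_{ab}), & K > 1
\end{cases}
```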
wherein a is the a-th connected domain, K is the number of connected domains, p(x_b) is the true value input to the contour detection branch, b is the category of the true value (foreground 1 or background 0), q(x_ab) is the predicted value of the contour detection branch, L_CE is the cross-entropy loss function, and L_CDE is the loss value of the contour detection branch.
Optionally, the second loss function value is calculated using the following formula:
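The formula referenced here also appears as an image in the published document. Given the variable definitions below, it is presumably the standard soft-Dice loss, sketched here as an assumption:

```latex
L_{Dice} = 1 - \frac{2\sum_i X_i Y_i}{\sum_i X_i + \sum_i Y_i}
```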
where X is the matrix of prediction results of the instance detection branch, Y is the true value input to the instance detection branch, and L_Dice is the loss value of the instance detection branch.
According to a second aspect, an embodiment of the present invention provides a model training apparatus, including a first acquisition module configured to acquire a plurality of first images;
the processing module is used for extracting a first object in each first image, and acquiring a second object instance marked from the first image as a second object instance tag;
the second acquisition module is used for acquiring the outline of the first object;
the label module is used for acquiring the outline of the second object according to the outline of the first object and the second object instance and taking the outline of the second object as a second object outline label;
the detection module is used for inputting the first object into a model to be trained and acquiring the output of the model to be trained, wherein the model to be trained comprises a contour detection branch and an instance detection branch, and the output comprises a second object contour output and a second object instance output;
a first calculation module for calculating a first loss function value based on the second object instance label and the second object instance output, respectively, and a second loss function value based on the second object contour label and the second object contour output;
A second calculation module for calculating a third loss function value from the first loss function value and the second loss function value;
and the adjusting module is used for adjusting the parameters of the model to be trained based on the third loss function value.
According to a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory and the processor are communicatively connected to each other, and the memory stores computer instructions, and the processor executes the computer instructions, thereby performing the model training method described above.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing the computer to perform the above-described model training method.
The technical scheme of the invention has the following advantages:
The model training method provided by the embodiment of the invention yields a dual-supervision model. It takes into account that the detection target in the image to be detected, i.e., the second object, is closely related to the first object; for example, an aneurysm is an abnormal bulge on an arterial wall. Therefore, before the second object is detected with the dual-supervision model, the first object is first extracted from the image to be detected and then input into the dual-supervision model, which outputs the detection result. Preprocessing the first object does not require time-consuming operations such as N4 correction, so detection is efficient; the first image does not need to be divided into small blocks during detection, so sensitivity is high; and operations such as image reconstruction are avoided. In addition, the dual-supervision model detects on the first object, which is closely related to the second object, rather than on the entire image content, so detection is both efficient and accurate. The two branches of the dual-supervision model extract the contour features and the full features of the second object, respectively: possible second objects are detected from the contour features, and whether a possible second object is actually a second object is then judged based on the full features, which reduces the false detection rate and improves detection precision.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart showing a specific example of a model training method in embodiment 1 of the present application;
FIG. 2 is a schematic block diagram of a specific example of preprocessing in embodiment 1 of the present application;
FIG. 3 is a schematic block diagram of a specific example of a model to be trained in embodiment 1 of the present application;
FIG. 4 is a schematic block diagram of a model training apparatus according to embodiment 2 of the present application;
fig. 5 is a schematic structural diagram of a specific example of a computer device in embodiment 3 of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made apparent and fully in view of the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected, mechanically connected, electrically connected, directly connected, indirectly connected via an intermediate medium, and in communication with each other between two elements, and wirelessly connected, or wired. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
Example 1
The embodiment provides a model training method, as shown in fig. 1, including the following steps:
Step S101, a plurality of first images are acquired. The first image may be an image obtained by time-of-flight magnetic resonance angiography (TOF-MRA), which is mainly used for imaging head vessels. Of course, the first image in this embodiment may also be a CT image, a DSA image, or the like, and the first object may be a tissue structure contained in the image. This embodiment mainly takes TOF-MRA images as an example, and the plurality of first images include TOF-MRA images containing aneurysms. Images with poor imaging quality or severe artifacts should be avoided.
Step S102, extracting a first object in the first image for each first image, and acquiring a second object instance marked from the first image as a second object instance label.
The first image also needs to be preprocessed before it is subjected to the first object extraction, said preprocessing comprising data normalization. The voxel intensities of the acquired plurality of first images are normalized to the range of 0-1024, and the origin and voxel space distances (i.e., voxel pitches) of all the first images are unified, which can be set to (0, 0) and (1, 1), respectively. Here, the second object instance may be labeled according to the first image before preprocessing, and in this case, normalization of the origin and voxel distance of the labeled second object instance needs to be performed with reference to normalization of the first image. And (3) carrying out origin and voxel distance standardization on the first image and a second object instance obtained based on the first image annotation, wherein the origin and voxel distance standardization is used for standardizing the differences of voxel ranges in different images caused by different acquisition parameters.
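As an illustration of this preprocessing step (not the patented implementation), the following sketch normalizes voxel intensities to the 0-1024 range and resamples every volume to a common voxel spacing; the libraries (NumPy, SciPy) and the 1 mm target spacing are assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess_volume(volume: np.ndarray, spacing: tuple, target_spacing=(1.0, 1.0, 1.0)):
    """Normalize intensities to 0-1024 and resample to a common voxel spacing.

    volume  : 3D array of raw TOF-MRA intensities.
    spacing : original voxel spacing (dz, dy, dx) read from the image header.
    """
    # Intensity normalization to the 0-1024 range described in the text.
    v_min, v_max = float(volume.min()), float(volume.max())
    normalized = (volume - v_min) / max(v_max - v_min, 1e-8) * 1024.0

    # Resample so that all volumes share the same voxel spacing; the origin is
    # handled separately in the image header (set to a common value, e.g. zeros).
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    resampled = zoom(normalized, factors, order=1)  # linear interpolation
    return resampled.astype(np.float32)
```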
When the first image is a TOF-MRA image, the first object may be a vascular structure and the second object may be a hemangioma, such as an aneurysm, accordingly.
Further, the TOF-MRA image can be segmented by setting a voxel-intensity threshold, so that the complete vascular structure in the TOF-MRA image is extracted and taken as the first object. The vascular structure is dilated according to the size of the aneurysm or the diameter of the vascular structure: for smaller aneurysms the dilation radius can be larger, and for larger aneurysms the dilation radius can be smaller, so that the extraction result includes the vascular structure and part of the surrounding tissue.
Before the second object instance is acquired, the second object present in the first image is annotated. In this embodiment, taking the second object as an aneurysm as an example, the aneurysm is annotated in the first image to obtain a second object instance. The obtained second object instance is used as the second object instance label.
Step S103, acquiring a contour of the first object. As described above, the first object may be a vascular structure, and further, the first image is preprocessed, and as shown in fig. 2, the extracted vascular structure is segmented by using Canny operator processing, so as to obtain a vascular contour. In this embodiment, the contour of the first object may be a contour of a blood vessel.
Step S104, according to the outline of the first object and the second object instance, obtaining the outline of the second object as a second object outline tag.
The annotated second object instance is dilated according to the size of the aneurysm: the smaller the aneurysm, the larger the dilation radius; the larger the aneurysm, the smaller the dilation radius. The dilated second object instance is then multiplied by the contour of the first object to obtain the contour of the second object, as shown in fig. 2.
As described above, the contour of the first object may be a vessel contour and the second object instance may be a label of the aneurysm in the first image. In this embodiment, the contour of the second object is obtained according to the contour of the first object and the second object instance, that is, the contour of the aneurysm may be obtained according to the blood vessel contour and the aneurysm marking. In this embodiment, the second object is located on the first object. Further, the contour of the second object is taken as a second image contour label.
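A rough sketch of how the two contour labels described in steps S103 and S104 could be derived, assuming slice-wise Canny edge detection, binary dilation with a fixed radius, and scikit-image/SciPy as the tooling; the exact operators and radii used in the patent are not specified.

```python
import numpy as np
from scipy import ndimage
from skimage import feature

def build_contour_labels(vessel_mask: np.ndarray, aneurysm_mask: np.ndarray, dilate_radius: int = 3):
    """vessel_mask / aneurysm_mask: binary 3D arrays (first object / annotated second object)."""
    # Vessel contour: Canny applied slice by slice (the skimage Canny operator is 2D).
    vessel_contour = np.zeros_like(vessel_mask, dtype=bool)
    for z in range(vessel_mask.shape[0]):
        vessel_contour[z] = feature.canny(vessel_mask[z].astype(float))

    # Dilate the annotated aneurysm instance; per the description, a smaller
    # aneurysm would use a larger radius in practice.
    structure = ndimage.generate_binary_structure(3, 1)
    dilated = ndimage.binary_dilation(aneurysm_mask.astype(bool),
                                      structure=structure, iterations=dilate_radius)

    # Second object contour label = dilated instance multiplied by vessel contour.
    aneurysm_contour = dilated & vessel_contour
    return vessel_contour.astype(np.uint8), aneurysm_contour.astype(np.uint8)
```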
Step S105, inputting the first object to a model to be trained, and obtaining output of the model to be trained, wherein the model to be trained comprises a contour detection branch and an instance detection branch, the output comprises a second object contour output and a second object instance output, and the instance detection branch detects a second object contour based on contour features extracted by the first object and the contour detection branch.
In this embodiment, the model to be trained may employ a deep convolutional encoder-decoder network structure (CEDNet). And capturing contour and texture feature information of the second object by adopting a CEDNet structure, and executing a detection task of the second object.
In this embodiment, the model to be trained comprises two supervision branches, namely a contour detection branch and an instance detection branch. The second object contour label is used as the label of the contour detection branch, and the second object instance label is used as the label of the instance detection branch.
In addition to the first object, the training samples input to the model to be trained may further include normal vascular structures obtained from other images, so as to avoid a high false detection rate after the model is trained.
The contour detection branch extracts the contour features of the second object (in this embodiment, the contour features of the vessel in the aneurysm region) and outputs the second object contour output; the instance detection branch further judges the extracted contour features of the second object and outputs the second object instance output. The second object instance output of the instance detection branch is based on the first object and on the contour features of the second object extracted by the contour detection branch, so that the second object is further detected and the detection result is output as the second object instance output.
Step S106, calculating a first loss function value based on the second object instance label and the second object instance output, and calculating a second loss function value based on the second object contour label and the second object contour output.
In this embodiment, the first loss function L_CDE is an improvement on the cross-entropy loss function: it applies the cross-entropy loss at the level of the connected domains of the prediction result of the model to be trained.
When the first loss function is used, first, the voxels whose probability exceeds a threshold in the probability map predicted by the model to be trained are set to 1 to obtain a prediction result; second, the number of connected domains in this prediction result is counted; when the number of connected domains is 0 or 1, the first loss function simply equals the cross-entropy loss function; when the number of connected domains is greater than 1, the prediction result and the ground truth are added element by element, each connected domain of the sum is taken out and its cross entropy with the ground truth is computed; finally, the resulting series of connected-domain cross-entropy values is averaged to obtain the first loss function value.
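The following PyTorch sketch is one way to realize the procedure just described: threshold the probability map, count the connected domains of the element-wise sum of prediction and ground truth, and average the per-domain cross entropies. The threshold value and the per-domain bookkeeping are assumptions rather than the patented formula.

```python
import torch
import torch.nn.functional as F
from scipy import ndimage

def connected_domain_ce(prob: torch.Tensor, target: torch.Tensor, th: float = 0.5) -> torch.Tensor:
    """prob, target: tensors of shape (D, H, W); prob in [0, 1], target in {0, 1}."""
    binary_pred = (prob.detach() > th).cpu().numpy().astype(int)
    truth = target.detach().cpu().numpy().astype(int)

    # Connected domains of the element-wise sum of prediction and ground truth.
    union = (binary_pred + truth) > 0
    labels, num_domains = ndimage.label(union)

    if num_domains <= 1:
        # With 0 or 1 connected domains, fall back to plain cross entropy.
        return F.binary_cross_entropy(prob, target.float())

    losses = []
    for a in range(1, num_domains + 1):
        mask = torch.from_numpy(labels == a).to(prob.device)
        losses.append(F.binary_cross_entropy(prob[mask], target[mask].float()))
    return torch.stack(losses).mean()
```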
Step S107, calculating a third loss function value from the first loss function value and the second loss function value.
In this embodiment, the third loss function may be a weighted sum of the first loss function L_CDE and the second loss function L_Dice, that is, L = α·L_CDE + β·L_Dice, where α is the calculation weight of the first loss function L_CDE, β is the calculation weight of the second loss function L_Dice, and L is the third loss function.
And step S108, adjusting parameters of the model to be trained based on the third loss function value.
If the third loss function value meets the preset threshold requirement, training ends; otherwise, the parameters are adjusted and the process returns to step S107 to continue training.
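A schematic training step following steps S105 to S108; the optimizer, the loss weights alpha and beta, and the stopping threshold are placeholders, and contour_loss_fn / instance_loss_fn stand in for the branch losses described above.

```python
import torch

def train_until_converged(model, loader, contour_loss_fn, instance_loss_fn,
                          alpha=1.0, beta=1.0, loss_threshold=0.05, max_epochs=100):
    """Schematic dual-supervision training loop; the model is assumed to return
    (contour_output, instance_output) for an input first-object volume."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = None
    for epoch in range(max_epochs):
        for first_object, contour_label, instance_label in loader:
            contour_out, instance_out = model(first_object)
            loss_contour = contour_loss_fn(contour_out, contour_label)      # contour branch loss
            loss_instance = instance_loss_fn(instance_out, instance_label)  # instance branch loss
            loss = alpha * loss_contour + beta * loss_instance              # weighted total loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Stop once the total loss of the last batch meets the preset threshold.
        if loss is not None and loss.item() < loss_threshold:
            break
    return model
```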
In the embodiment, the acquired multiple first images are subjected to first object extraction to acquire a second object instance marked in the first images, and the contours of the second objects are acquired according to the first object contours and the second object instance. The outline of the second object is taken as a second object outline label, and the second object instance is taken as a second object instance label. Further, the first object is input into the model to be trained, and the second object contour output and the second object instance output are output by utilizing the contour detection branches and the instance detection branches included in the model to be trained. Calculating a first loss function value according to the second object instance label and the second object instance output, calculating a second loss function value according to the second object outline label and the second object outline output, and calculating a third loss function value according to the first loss function value and the second loss function value, thereby adjusting parameters of the model to be trained based on the third loss function value.
The dual-supervision model obtained by the model training method provided in this embodiment takes into account that the detection target in the image to be detected, i.e., the second object, is closely related to the first object; for example, an aneurysm is an abnormal bulge on an arterial wall. Therefore, before the second object is detected with the dual-supervision model, the first object is first extracted from the image to be detected and then input into the dual-supervision model, which outputs the detection result. Preprocessing the first object does not require time-consuming operations such as N4 correction, so detection is efficient; the first image does not need to be divided into small blocks during detection, so sensitivity is high; and operations such as image reconstruction are avoided. In addition, the dual-supervision model detects on the first object, which is closely related to the second object, rather than on the entire image content, so detection is both efficient and accurate. The two branches of the dual-supervision model extract the contour features and the full features of the second object, respectively: possible second objects are detected from the contour features, and whether a possible second object is actually a second object is then judged based on the full features, which reduces the false detection rate and improves detection precision.
As an optional implementation manner, in an embodiment of the present invention, the inputting the first object into the model to be trained includes:
dividing a second object obtained from the first object annotation according to the position of the second object in the first object;
Counting the number of the second objects at each position;
based on the numbers, performing position-balanced augmentation of the first object using at least one of flipping along the cross-section, adding discrete Gaussian noise, and histogram equalization;
and inputting the first object and the augmented first object into the model to be trained.
In this embodiment, the first image is a TOF-MRA image and the first object may be a vascular structure; the position of the aneurysm in the vascular structure is annotated to obtain the second object, in this embodiment the aneurysm. The aneurysm positions are divided into regions according to the position of the aneurysm in each vascular structure, and the number of second objects in each region is counted. The first objects are augmented according to the count of each region so as to equalize the number of first objects across regions. Finally, the first objects and the augmented first objects are input to the model to be trained together; 80% of the input images can be selected as the training set of the model to be trained, and the remaining 20% can be used as the validation set.
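A sketch of the position-balanced augmentation described above; which axis is treated as the cross-section, the noise level, and the simple global histogram equalization are all assumptions.

```python
import numpy as np

def equalize_histogram(volume: np.ndarray, bins: int = 256) -> np.ndarray:
    """Simple global histogram equalization of a 3D volume."""
    hist, edges = np.histogram(volume.ravel(), bins=bins)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1e-8)
    return np.interp(volume.ravel(), edges[:-1], cdf).reshape(volume.shape)

def augment(volume: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply one randomly chosen augmentation: axial flip, Gaussian noise, or
    histogram equalization."""
    choice = rng.integers(3)
    if choice == 0:
        return np.flip(volume, axis=0).copy()  # flip along the cross-section (axial axis assumed)
    if choice == 1:
        return volume + rng.normal(0.0, 0.01 * volume.std(), volume.shape)  # discrete Gaussian noise
    return equalize_histogram(volume)

def balance_by_region(samples_by_region: dict, rng: np.random.Generator) -> list:
    """samples_by_region: {region: [volumes]}; augment under-represented regions
    until every region contributes roughly the same number of samples."""
    target = max(len(v) for v in samples_by_region.values())
    balanced = []
    for region, volumes in samples_by_region.items():
        augmented = list(volumes)
        while volumes and len(augmented) < target:
            augmented.append(augment(volumes[rng.integers(len(volumes))], rng))
        balanced.extend(augmented)
    return balanced
```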
As an optional implementation manner, in the embodiment of the present invention, the model to be trained includes an encoding block (Encode Block), a feature extraction block (SC Block) and a decoding block (Decode Block), and the contour detection branch and the instance detection branch each include the feature extraction block and the decoding block;
The coding block comprises M groups of downsampling structures which are sequentially connected, the M groups of downsampling structures are used for respectively obtaining downsampling results of different scales, the decoding block comprises M groups of upsampling structures which are in one-to-one correspondence with the downsampling structures, and the sampling results of each group of downsampling structures are spliced with the corresponding characteristics of the previous-stage structure output of the upsampling structures and then used as input characteristics of the upsampling structures.
In this embodiment, taking fig. 3 as an example, the encoding block includes three groups of downsampling structures with different scales connected in sequence, each group of downsampling structures includes two convolution blocks, the decoding block includes three groups of upsampling structures corresponding to the downsampling structures one to one, and each group of upsampling structures includes two convolution blocks. The decoding block has the same structure as the convolution block in the encoding block, but replaces the convolution structure in the encoding block with a deconvolution structure. Each set of downsampling structures in the encoding block further includes BiA modules, each set of downsampling structures outputting two feature maps via BiA modules. One of the feature maps serves as splicing data in the contour detection branch, and the other feature map serves as splicing data of the example detection branch.
The contour detection branch and the instance detection branch both comprise a feature extraction block and a decoding block, the feature extraction block in each branch is sequentially connected with the decoding block, the feature extraction block is used as the previous stage of the decoding block, and the output result of the feature extraction block is used as the input data of the decoding block.
And splicing the sampling result of each group of downsampling structures with the characteristics output by the structure of the previous stage of the corresponding upsampling structure, wherein the previous stage of the upsampling structure corresponding to the first downsampling structure is a characteristic extraction block.
Taking the decoding block in the contour detection branch in fig. 3 as an example, the sampling results of the first group of downsampling structures are d_11 and d_21; d_11 is spliced with the output of the feature extraction block in the contour detection branch and serves as the input feature of the first group of upsampling structures; the sampling results of the second group of downsampling structures are d_12 and d_22; d_12 is spliced with the features output by the preceding stage of the corresponding upsampling structure, i.e., with the features output by the first group of upsampling structures; and so on.
Taking fig. 3 as an example, j in d_1j and d_2j may be 1, 2 or 3, and i in d_i1, d_i2 and d_i3 may be 1 or 2.
The feature extraction block in the contour detection branch is used for extracting deep features based on the output of the coding block, the feature extraction block in the instance detection branch is used for extracting deep features based on the output of the coding block and the up-sampling result of the intermediate layer up-sampling structure of the decoding block in the contour detection branch, and the decoding block further comprises a classification layer used for performing classification detection based on the output of the up-sampling structure of M groups. Specifically, the classification layer performs classification detection based on the final output of the M groups of up-sampling structures connected in sequence.
In this embodiment, the feature extraction block in the contour detection branch and the feature extraction block in the instance detection branch serve to strengthen the feature extraction capability of the deep network and to extract abstract features of the second object and global dependency relationships in the image.
As shown by the decoding block in fig. 3, the decoding block in the contour detection branch also downsamples its intermediate upsampling result, and the downsampled result, together with the output of the encoding block, is used as the input of the feature extraction block in the instance detection branch, so that the instance detection branch further detects the second object in combination with the contour features detected by the contour detection branch.
The decoding blocks in the contour detection branch and the instance detection branch also include a classification layer, which uses SoftMax to perform classification detection on the output of the upsampling structures in the decoding block.
As an optional implementation manner, in an embodiment of the present invention, each group of the downsampling structures in the coding block includes a convolution block and a BiA module connected in sequence, where the convolution block is used for downsampling, and the BiA module includes two parallel residual branches, which decouple the features output by the convolution block to obtain two feature maps; the two feature maps are respectively input to the decoding block in the contour detection branch and the decoding block in the instance detection branch.
The convolution block comprises a 3D convolution layer, a batch normalization layer and a ReLU activation layer, i.e., a Conv+BN+ReLU fusion layer. As shown in fig. 3, each group of downsampling structures includes two convolution blocks; the output of the first convolution block is taken as the input of the next convolution block and is spliced with the output of the next convolution block, and the output of the last convolution block is taken as the input of the BiA module.
The BiA module includes two residual branches in parallel for decoupling the features of the two dual supervisory branches, the contour detection branch and the instance detection branch.
The two parallel residual branches of the BiA module output two feature maps. One feature map is input into the decoding block in the contour detection branch and spliced there with the features output by the preceding stage of the corresponding upsampling structure; the other feature map is input into the decoding block in the instance detection branch and likewise spliced with the features output by the preceding stage of the corresponding upsampling structure.
As an optional implementation manner, in an embodiment of the present invention, each residual branch of the BiA modules includes two residual sub-modules that are sequentially connected;
The BiA module further comprises a spatial attention mechanism block, which comprises a max pooling layer (MaxPool) and an average pooling layer (AvgPool) connected in sequence. The input of the spatial attention mechanism block is the output of the convolution block of the same group of downsampling structures; the output of the spatial attention mechanism block yields a weight map through a Sigmoid function, and the weight map is respectively multiplied with the output of the latter residual sub-module of each of the two residual branches and then added to obtain the two feature maps. The Sigmoid function is a special case of SoftMax, and other classification functions can also be used for this calculation.
In this embodiment, the residual sub-modules use ResNet blocks, and the two residual sub-modules are connected in sequence. As shown in fig. 3, each residual branch includes two residual sub-modules: the output of the first residual sub-module is the input of the next residual sub-module, and the output of the last residual sub-module is multiplied by the weight map output by the spatial attention mechanism block and then added to obtain the feature map. Here, the output of the spatial attention mechanism block refers to its output after the max pooling layer, the average pooling layer and the Sigmoid function. In this embodiment, the use of the spatial attention mechanism block can strengthen the learning of more discriminative features.
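The following PyTorch sketch is one reading of the BiA module described in this paragraph and the previous one. The size-preserving pooling kernels, the layout of the residual sub-modules, and the interpretation of "multiplied by the weight map and then added" as x + x*w are assumptions.

```python
import torch
import torch.nn as nn

class ResidualSubModule(nn.Module):
    """Minimal 3D ResNet-style block (assumed layout)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))

class SpatialAttention(nn.Module):
    """Max pooling followed by average pooling, then Sigmoid, as described;
    size-preserving kernels are assumed so the weight map matches the feature map."""
    def __init__(self):
        super().__init__()
        self.max_pool = nn.MaxPool3d(kernel_size=3, stride=1, padding=1)
        self.avg_pool = nn.AvgPool3d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.avg_pool(self.max_pool(x)))

class BiAModule(nn.Module):
    """Two parallel residual branches decouple the encoder features for the
    contour branch and the instance branch; a shared spatial attention weight
    map modulates the output of the last residual sub-module of each branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch_contour = nn.Sequential(ResidualSubModule(channels), ResidualSubModule(channels))
        self.branch_instance = nn.Sequential(ResidualSubModule(channels), ResidualSubModule(channels))
        self.attention = SpatialAttention()

    def forward(self, conv_out):
        w = self.attention(conv_out)               # weight map from the conv block output
        f_contour = self.branch_contour(conv_out)
        f_instance = self.branch_instance(conv_out)
        # "multiplied by the weight map and then added": read here as x + x * w.
        return f_contour + f_contour * w, f_instance + f_instance * w
```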
As an optional implementation manner, in an embodiment of the present invention, the feature extraction block includes a plurality of downsampling layers and a plurality of upsampling layers that are sequentially connected;
A Swin-Transformer layer is further included before each downsampling layer and each upsampling layer, the last downsampling layer and the first upsampling layer are joined by a convolution layer, and the output of the last downsampling layer is spliced with the output of the preceding downsampling layer through a shortcut connection.
In this embodiment, as shown by the feature extraction block SC Block in fig. 3, both the downsampling layers and the upsampling layers use Conv+BN+ReLU fusion layers, and a Swin-Transformer layer precedes each downsampling layer and each upsampling layer. The output of the last downsampling layer is spliced with the output of the preceding downsampling layer through a shortcut connection, and the last downsampling layer is joined to the first upsampling layer by a 1x1 convolution layer.
As an optional implementation manner, in the embodiment of the present invention, the upsampling result of the intermediate-layer upsampling structure of the decoding block in the contour detection branch is downsampled to the output scale of the encoding block and then passed through a Sigmoid function to obtain a weight map; the weight map is added to the output of the encoding block and then multiplied, and the result is used as the input of the feature extraction block of the instance detection branch.
As described above, the intermediate-layer upsampling result of the decoding block in the contour detection branch is further downsampled, and the downsampled result, together with the output of the encoding block, is used as the input of the feature extraction block in the instance detection branch to extract deep features. Specifically, after the intermediate-layer result of the decoding block in the contour detection branch is downsampled to the output scale of the encoding block, a weight map is obtained through a Sigmoid function; the weight map is added to the output of the encoding block and then multiplied, and the calculated result is taken as the input of the feature extraction block of the instance detection branch. The feature extraction block in the instance detection branch extracts deep features from this input. The extracted contour features thus act as a weight mapping and influence the extraction of the instance features.
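A sketch of how the contour branch's intermediate decoding result could gate the encoder output before the instance branch's feature extraction block; reading "added ... and then multiplied" as encoder_output + encoder_output*w, and the trilinear interpolation used to match scales, are assumptions.

```python
import torch
import torch.nn.functional as F

def gate_instance_features(contour_mid_upsample: torch.Tensor,
                           encoder_output: torch.Tensor) -> torch.Tensor:
    """contour_mid_upsample: intermediate upsampling result of the contour branch's decoder (5D tensor).
    encoder_output: output of the shared encoding block (5D tensor).
    Channel counts are assumed to match; otherwise a 1x1x1 convolution could reconcile them.
    Returns the gated tensor used as input to the instance branch's feature extraction block."""
    # Downsample the contour feature map to the spatial scale of the encoder output.
    down = F.interpolate(contour_mid_upsample, size=encoder_output.shape[2:],
                         mode="trilinear", align_corners=False)
    w = torch.sigmoid(down)  # weight map from the Sigmoid function
    # "added with the output of the encoding block and then multiplied":
    # interpreted here as element-wise encoder_output + encoder_output * w.
    return encoder_output + encoder_output * w
```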
As an optional implementation manner, in the embodiment of the present invention, the first upsampling result of the decoding block at each scale has its number of channels adjusted by convolution and is then added element-wise to the upsampling result at the larger scale, serving as deep supervision of the model.
In this embodiment, as shown by the Decode Block in fig. 3, the first upsampling result in each upsampling structure of the decoding block has its number of channels adjusted by a 1x1 convolution and is then added element-wise to the upsampling result at the larger scale, thereby realizing deep supervision of the model to be trained.
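A minimal sketch of the deep-supervision wiring described here; the interpolation needed to match spatial sizes before the element-wise addition is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepSupervisionAdd(nn.Module):
    """Adjust the channels of a coarser upsampling result with a 1x1x1 convolution
    and add it element-wise to the finer-scale upsampling result."""
    def __init__(self, coarse_channels: int, fine_channels: int):
        super().__init__()
        self.adjust = nn.Conv3d(coarse_channels, fine_channels, kernel_size=1)

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        coarse = self.adjust(coarse)
        # Match spatial sizes before the element-wise addition (assumed interpolation).
        coarse = F.interpolate(coarse, size=fine.shape[2:], mode="trilinear", align_corners=False)
        return fine + coarse
```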
As an alternative implementation manner, in an embodiment of the present invention, the first loss function value is calculated using the following formula:
where a is the a-th connected domain, K is the number of connected domains, p(x_b) is the true value input to the contour detection branch, b is the category of the true value, foreground (positive) 1 or background (negative) 0; in other words, b equal to 1 represents the second object and b equal to 0 represents a non-second object; q(x_ab) is the predicted value of the contour detection branch, L_CE is the cross-entropy loss function, and L_CDE is the loss value of the contour detection branch.
Specifically:
where X is the prediction result matrix of the contour detection branch, Y is the true value, g(X+Y) is the result of adding the corresponding elements of X and Y, th is a preset threshold on the prediction result of the contour detection branch, and δ is a minimum value.
q(x_ab) takes the value at the position of the a-th connected domain in the prediction result matrix of the contour detection branch when the number of connected domains is greater than or equal to 1 and the sum of the corresponding elements of X and Y is greater than or equal to the prediction threshold; q(x_ab) takes the minimum value δ when the number of connected domains is greater than or equal to 1 and the sum of the corresponding elements of X and Y is less than the prediction threshold.
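Putting the two cases together, one plausible piecewise form of q(x_ab), written here as an assumption since the published formula is an image, is:

```latex
q(x_{ab}) =
\begin{cases}
X\big|_{a}, & K \ge 1 \ \text{and}\ g(X+Y)\big|_{a} \ge th \\[4pt]
\delta, & K \ge 1 \ \text{and}\ g(X+Y)\big|_{a} < th
\end{cases}
```

where the subscript a denotes restriction to the positions of the a-th connected domain.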
Regarding the first loss function: when the computed number of connected domains is 0 or 1, the first loss function is directly calculated with the cross-entropy loss function; when the number of connected domains is greater than 1, the prediction result and the ground truth are added element by element, each connected domain of the sum is taken out and its cross entropy with the true value is computed, and finally the resulting series of connected-domain cross-entropy values is averaged to obtain the first loss function value.
Specifically, L_CDE is an improvement based on the cross-entropy loss function L_CE:
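The cross-entropy loss L_CE referred to here is presumably the usual binary form, sketched as an assumption:

```latex
L_{CE} = -\sum_{b\in\{0,1\}} p(x_b)\,\log q(x_b)
```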
where q(x_b) is the predicted value of the contour detection branch.
The connected domains are the positive regions of the prediction result of the contour detection branch; for the data input to the contour detection branch, the predicted number of connected domains is generally larger than the number of true positive regions. b is 1 when the prediction is positive and 0 when the prediction is negative, where 1 denotes foreground and 0 denotes background. The cross entropy between the true value and the predicted value can be calculated directly with L_CE.
As an alternative implementation, in an embodiment of the present invention, the second loss function value is calculated using the following formula:
where X is the matrix of prediction results of the instance detection branch, Y is the true value input to the instance detection branch, and L_Dice is the loss value of the instance detection branch.
The losses of the contour detection branch and the instance detection branch are calculated with the connected-domain cross-entropy first loss function L_CDE and the second loss function L_Dice, respectively, and the two are combined by weighted summation to obtain the complete loss function of the model to be trained. The connected-domain cross-entropy loss function attends to the cross-entropy loss of all detected positive regions relative to the annotated regions, so that the results detected by the trained model have higher sensitivity.
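For illustration, a standard soft-Dice loss and the weighted combination of the two branch losses could be written as follows; the smoothing constant eps and the default weights are placeholders.

```python
import torch

def soft_dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred: predicted probabilities of the instance branch; target: binary ground truth."""
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def total_loss(l_cde: torch.Tensor, l_dice: torch.Tensor, alpha: float = 1.0, beta: float = 1.0):
    """Weighted sum of the connected-domain cross-entropy loss and the Dice loss."""
    return alpha * l_cde + beta * l_dice
```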
Example 2
The present embodiment provides a model training apparatus, which may be used to perform the model training method in the foregoing embodiment 1, where the apparatus may be disposed inside a server or other devices, and the modules are mutually matched, so as to implement training of a model, as shown in fig. 4, and the apparatus includes:
A first acquisition module 201, configured to acquire a plurality of first images;
A processing module 202, configured to extract, for each of the first images, a first object in the first image, and obtain, as a second object instance tag, a second object instance annotated from the first image;
A second acquiring module 203, configured to acquire a contour of the first object;
A tag module 204, configured to obtain, as a second object profile tag, a profile of the second object according to the profile of the first object and the second object instance;
The detection module 205 is configured to input the first object to a model to be trained, and obtain an output of the model to be trained, where the model to be trained includes a contour detection branch and an instance detection branch, and the output includes a second object contour output and a second object instance output;
A first calculation module 206 for calculating a first loss function value based on the second object instance label and the second object instance output, respectively, and a second loss function value based on the second object contour label and the second object contour output;
A second calculation module 207 for calculating a third loss function value from the first loss function value and the second loss function value;
an adjustment module 208, configured to adjust parameters of the model to be trained based on the third loss function value.
The dual-supervision model obtained by the model training method provided in this embodiment takes into account that the detection target in the image to be detected, i.e., the second object, is closely related to the first object; for example, an aneurysm is an abnormal bulge on an arterial wall. Therefore, before the second object is detected with the dual-supervision model, the first object is first extracted from the image to be detected and then input into the dual-supervision model, which outputs the detection result. Preprocessing the first object does not require time-consuming operations such as N4 correction, so detection is efficient; the first image does not need to be divided into small blocks during detection, so sensitivity is high; and operations such as image reconstruction are avoided. In addition, the dual-supervision model detects on the first object, which is closely related to the second object, rather than on the entire image content, so detection is both efficient and accurate. The two branches of the dual-supervision model extract the contour features and the full features of the second object, respectively: possible second objects are detected from the contour features, and whether a possible second object is actually a second object is then judged based on the full features, which reduces the false detection rate and improves detection precision.
For a specific description of the above device portion, reference may be made to the above method embodiment, and no further description is given here.
Example 3
The present embodiment provides a computer device, as shown in fig. 5, which includes a processor 301 and a memory 302, where the processor 301 and the memory 302 may be connected by a bus or other means, and in fig. 5, the connection is exemplified by a bus.
The processor 301 may be a central processing unit (CPU). The processor 301 may also be another general-purpose processor, a digital signal processor (DSP), a graphics processing unit (GPU), an embedded neural-network processing unit (NPU) or other special-purpose deep learning coprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc., or a combination of the above.
The memory 302, as a non-transitory computer-readable storage medium, stores non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the model training method in the embodiments of the present invention. The processor 301 executes the various functional applications and data processing of the processor, i.e., implements the model training method in the above method embodiments, by running the non-transitory software programs, instructions and modules stored in the memory 302.
The memory 302 may also include a storage program area that may store an operating system, application programs required for at least one function, and a storage data area that may store data created by the processor 301, etc. In addition, memory 302 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 302 may optionally include memory located remotely from processor 301, such remote memory being connectable to processor 301 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The memory 302 stores one or more modules that, when executed by the processor 301, perform the model training method of the embodiment shown in fig. 1.
The details of the above computer device may be understood correspondingly with respect to the corresponding relevant descriptions and effects in the embodiment shown in fig. 1, which are not repeated here.
Embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions that can perform the model training method in any of the above embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like, and may further include a combination of the above types of memories.
The above examples are given by way of illustration only and are not limiting of the embodiments. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (11)

1. A model training method, comprising:
acquiring a plurality of first images;
for each of the first images, extracting a first object in the first image, and obtaining a second object instance annotated in the first image as a second object instance label;
obtaining a contour of the first object;
obtaining, according to the contour of the first object and the second object instance, a contour of the second object as a second object contour label;
inputting the first object into a model to be trained and obtaining an output of the model to be trained, wherein the model to be trained comprises a contour detection branch and an instance detection branch, the output comprises a second object contour output and a second object instance output, and the instance detection branch detects the second object contour based on the first object and contour features extracted by the contour detection branch;
calculating a first loss function value based on the second object instance label and the second object instance output, and calculating a second loss function value based on the second object contour label and the second object contour output;
calculating a third loss function value according to the first loss function value and the second loss function value;
adjusting parameters of the model to be trained based on the third loss function value;
wherein the first loss function value is calculated using the following formula:
LCDE = (1/K) Σ(a=1..K) LCE(p(xb), q(xab)), with LCE = -Σb p(xb)·log q(xab)
where a denotes the a-th connected domain, K is the number of connected domains, p(xb) is the ground truth input to the contour detection branch, b is the class (foreground 1 or background 0) of the ground truth, q(xab) is the predicted value of the contour detection branch, LCE is the cross-entropy loss function, and LCDE is the loss value of the contour detection branch;
the second loss function value is calculated using the following formula:
LDice = 1 - 2|X∩Y| / (|X| + |Y|)
where X is the matrix of prediction results of the instance detection branch, Y is the ground truth input to the instance detection branch, and LDice is the loss value of the instance detection branch;
and the third loss function value is calculated using the following formula:
L = αLCDE + βLDice
where α is the calculation weight of the first loss function and β is the calculation weight of the second loss function.
2. The model training method according to claim 1, wherein inputting the first object into the model to be trained comprises:
dividing the second object obtained by annotation from the first object according to its position in the first object;
counting the number of the second objects at each position;
based on the counts, performing position-balanced augmentation of the first object by at least one of flipping along the transverse plane, adding discrete Gaussian noise, and performing histogram equalization;
and inputting the first object and the augmented first object into the model to be trained.
3. The model training method according to claim 1, wherein the model to be trained comprises an encoding block, a feature extraction block and a decoding block, and the contour detection branch and the instance detection branch each comprise the feature extraction block and the decoding block;
the encoding block comprises M groups of down-sampling structures connected in sequence, the M groups of down-sampling structures being used to obtain down-sampling results at different scales; the decoding block comprises M groups of up-sampling structures in one-to-one correspondence with the down-sampling structures, and the sampling result of each group of down-sampling structures is concatenated with the features output by the preceding stage of the corresponding up-sampling structure to serve as the input features of that up-sampling structure;
the feature extraction block in the contour detection branch is used to extract deep features based on the output of the encoding block, and the feature extraction block in the instance detection branch is used to extract deep features based on the output of the encoding block and the up-sampling result of the intermediate-layer up-sampling structure of the decoding block in the contour detection branch; the decoding block further comprises a classification layer used to perform classification detection based on the outputs of the M groups of up-sampling structures.
4. The method according to claim 3, wherein each group of down-sampling structures in the encoding block comprises a convolution block and a BiA module connected in sequence, the convolution block being used for down-sampling; the BiA module comprises two parallel residual branches used to decouple the features output by the convolution block into two feature maps, which are respectively input to the decoding block in the contour detection branch and the decoding block in the instance detection branch.
5. The method according to claim 4, wherein each residual branch of the BiA module comprises two residual sub-modules connected in sequence;
the BiA module further comprises a spatial attention block, the spatial attention block comprising a max-pooling layer and an average-pooling layer connected in sequence; the input of the spatial attention block is the output of the convolution block of the same group of down-sampling structures, and the output of the spatial attention block is passed through a Sigmoid function to obtain a weight map, which is respectively combined with the output of the latter residual sub-module of each of the two residual branches.
6. The method according to claim 3, wherein the feature extraction block comprises a plurality of down-sampling layers and a plurality of up-sampling layers connected in sequence;
a Swin-Transformer layer is further provided before each of the down-sampling layers and the up-sampling layers; the last down-sampling layer and the first up-sampling layer are joined by a convolution layer; and the output of the last down-sampling layer is concatenated with the output of the preceding down-sampling layer through a skip connection.
7. The method according to claim 3, wherein the up-sampling result of the intermediate-layer up-sampling structure of the decoding block in the contour detection branch is down-sampled to the output scale of the encoding block and then passed through a Sigmoid function to obtain a weight map, which is added to the output of the encoding block and then multiplied with it, the result serving as the input of the feature extraction block of the instance detection branch.
8. The method according to claim 3, wherein the first up-sampling result obtained by the decoding block at each scale has its number of channels adjusted by convolution and is then element-wise added to the up-sampling result at the larger scale, serving as deep supervision of the model to be trained.
9. A model training device, comprising:
a first acquisition module, configured to acquire a plurality of first images;
a processing module, configured to, for each of the first images, extract a first object in the first image and obtain a second object instance annotated in the first image as a second object instance label;
a second acquisition module, configured to obtain a contour of the first object;
a labeling module, configured to obtain, according to the contour of the first object and the second object instance, a contour of the second object as a second object contour label;
a detection module, configured to input the first object into a model to be trained and obtain an output of the model to be trained, wherein the model to be trained comprises a contour detection branch and an instance detection branch, the output comprises a second object contour output and a second object instance output, and the instance detection branch detects the second object contour based on the first object and contour features extracted by the contour detection branch;
a first calculation module, configured to calculate a first loss function value based on the second object instance label and the second object instance output, and calculate a second loss function value based on the second object contour label and the second object contour output;
wherein the first loss function value is calculated using the following formula:
LCDE = (1/K) Σ(a=1..K) LCE(p(xb), q(xab)), with LCE = -Σb p(xb)·log q(xab)
where a denotes the a-th connected domain, K is the number of connected domains, p(xb) is the ground truth input to the contour detection branch, b is the class (foreground 1 or background 0) of the ground truth, q(xab) is the predicted value of the contour detection branch, LCE is the cross-entropy loss function, and LCDE is the loss value of the contour detection branch;
and the second loss function value is calculated using the following formula:
LDice = 1 - 2|X∩Y| / (|X| + |Y|)
where X is the matrix of prediction results of the instance detection branch, Y is the ground truth input to the instance detection branch, and LDice is the loss value of the instance detection branch;
a second calculation module, configured to calculate a third loss function value according to the first loss function value and the second loss function value, the third loss function value being calculated using the following formula:
L = αLCDE + βLDice
where α is the calculation weight of the first loss function and β is the calculation weight of the second loss function;
and an adjustment module, configured to adjust parameters of the model to be trained based on the third loss function value.
10. A computer device, comprising: a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the model training method according to any one of claims 1-8.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the model training method according to any one of claims 1-8.
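As a reading aid, the following is a minimal sketch of how the composite loss recited in claims 1 and 9 might be assembled: a Dice term for the instance detection branch, a cross-entropy term averaged over the K connected domains of the contour ground truth, and the weighted sum L = αLCDE + βLDice. PyTorch is assumed as the framework, and the function names (dice_loss, connected_domain_ce_loss, total_loss), the tensor layouts, and the per-domain averaging are illustrative assumptions rather than details taken from the patent.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), computed on flattened tensors.
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def connected_domain_ce_loss(logits, target, domain_masks):
    # Cross-entropy averaged over the K connected domains of the contour ground truth.
    # logits: (C, H, W) raw scores of the contour detection branch (C = 2: background/foreground)
    # target: (H, W) long tensor of class indices (0 = background, 1 = foreground)
    # domain_masks: list of K boolean (H, W) masks, one per connected domain
    per_domain = []
    for mask in domain_masks:
        if mask.any():
            pixel_logits = logits.permute(1, 2, 0)[mask]  # (N_a, C) pixels inside domain a
            pixel_labels = target[mask]                   # (N_a,)
            per_domain.append(F.cross_entropy(pixel_logits, pixel_labels))
    if not per_domain:
        return logits.sum() * 0.0  # no connected domain in this sample
    return torch.stack(per_domain).mean()

def total_loss(contour_logits, contour_gt, domain_masks,
               instance_pred, instance_gt, alpha=1.0, beta=1.0):
    # Weighted combination L = alpha * L_CDE + beta * L_Dice used to update the model parameters.
    l_cde = connected_domain_ce_loss(contour_logits, contour_gt, domain_masks)
    l_dice = dice_loss(instance_pred, instance_gt)
    return alpha * l_cde + beta * l_dice
```

In a training loop, domain_masks could, for example, be obtained by labelling the connected components of the contour ground truth (e.g. with scipy.ndimage.label) before calling total_loss and back-propagating its result; this choice of connected-component labelling tool is likewise an assumption for illustration.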
CN202211488759.XA 2022-11-25 2022-11-25 Model training method and device Active CN116071296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211488759.XA CN116071296B (en) 2022-11-25 2022-11-25 Model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211488759.XA CN116071296B (en) 2022-11-25 2022-11-25 Model training method and device

Publications (2)

Publication Number Publication Date
CN116071296A CN116071296A (en) 2023-05-05
CN116071296B true CN116071296B (en) 2025-08-08

Family

ID=86182890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211488759.XA Active CN116071296B (en) 2022-11-25 2022-11-25 Model training method and device

Country Status (1)

Country Link
CN (1) CN116071296B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993680B (en) * 2023-07-04 2025-06-17 阿里巴巴达摩院(杭州)科技有限公司 Image processing method, image processing model training method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711288A (en) * 2018-12-13 2019-05-03 西安电子科技大学 Remote Sensing Ship Detection Method Based on Feature Pyramid and Distance Constrained FCN
CN115063411A (en) * 2022-08-04 2022-09-16 湖南自兴智慧医疗科技有限公司 Chromosome abnormal region segmentation detection method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049923A (en) * 2022-05-30 2022-09-13 北京航空航天大学杭州创新研究院 SAR image ship target instance segmentation training method, system and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711288A (en) * 2018-12-13 2019-05-03 西安电子科技大学 Remote Sensing Ship Detection Method Based on Feature Pyramid and Distance Constrained FCN
CN115063411A (en) * 2022-08-04 2022-09-16 湖南自兴智慧医疗科技有限公司 Chromosome abnormal region segmentation detection method and system

Also Published As

Publication number Publication date
CN116071296A (en) 2023-05-05

Similar Documents

Publication Publication Date Title
US11748879B2 (en) Method and system for intracerebral hemorrhage detection and segmentation based on a multi-task fully convolutional network
US10460447B2 (en) Method and system for performing segmentation of image having a sparsely distributed object
CN111784671B (en) Pathological image lesion area detection method based on multi-scale deep learning
US8798345B2 (en) Diagnosis processing device, diagnosis processing system, diagnosis processing method, diagnosis processing program and computer-readable recording medium, and classification processing device
KR20230059799A (en) A Connected Machine Learning Model Using Collaborative Training for Lesion Detection
CN115330813B (en) Image processing method, device, equipment and readable storage medium
CN110490927A (en) For generating the methods, devices and systems of center line for the object in image
CN112949654B (en) Image detection method and related device and equipment
CN112396605B (en) Network training method and device, image recognition method and electronic equipment
EP3570216B1 (en) Devices and methods for identifying an object in an image
CN113012086B (en) Cross-modal image synthesis method
CN110390340A (en) The training method and detection method of feature coding model, vision relationship detection model
US20230115927A1 (en) Systems and methods for plaque identification, plaque composition analysis, and plaque stability detection
CN119206306A (en) Method and electronic device for identifying targets in medical images
CN114119637A (en) Brain white matter high signal segmentation method based on multi-scale fusion and split attention
CN109919915A (en) Retina fundus image abnormal region detection method and device based on deep learning
CN116034398A (en) Attentional multi-arm machine learning model for lesion segmentation
CN118657800B (en) Joint segmentation method of multiple lesions in retinal OCT images based on hybrid network
CN110363776B (en) Image processing method and electronic device
CN116071296B (en) Model training method and device
CN115546239B (en) Target segmentation method and device based on boundary attention and distance transformation
CN113379770B (en) Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device
JP7593177B2 (en) Model generation device, classification device, data generation device, model generation method, and model generation program
CN114140744A (en) Object-based quantity detection method, device, electronic device and storage medium
CN113326749A (en) Target detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant