
CN119155549B - Zoom method, device and storage medium based on deep learning - Google Patents

Zoom method, device and storage medium based on deep learning

Info

Publication number
CN119155549B
Authority
CN
China
Prior art keywords
image
focus
target
image acquisition
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411639743.3A
Other languages
Chinese (zh)
Other versions
CN119155549A (en)
Inventor
陈明珠
杨铭怡
葛艳红
沈剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202411639743.3A
Publication of CN119155549A
Application granted
Publication of CN119155549B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • H04N23/675Focus control based on electronic image sensor signals comprising setting of focusing regions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/69Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automatic Focus Adjustment (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a zoom method, device and storage medium based on deep learning. The method comprises: inputting a target image acquired by an image acquisition device into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model; comparing a target focus area in the target image with a predicted focus area in the focus prediction image to obtain a focus comparison result; and performing zoom processing on the image acquisition device according to the focus comparison result. By means of this scheme, zoom accuracy can be improved.

Description

Deep learning-based zooming method, device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a zoom method, apparatus, and storage medium based on deep learning.
Background
With the rapid development of related technologies in the fields of image processing and computer vision, image acquisition and transmission have spread into many aspects of daily life.
Particularly in certain special situations, such as railway track inspection, power transmission line or pipeline inspection, and instrument reading detection, there is a need to focus quickly and clearly on a target object in a region of interest of the acquired image at long range and high magnification.
However, optical imaging through a lens has inherent limits; with a telephoto lens in particular, the target object and the background in an acquired image often cannot be kept in focus at the same time, which typically manifests as a focus evaluation function with two peaks. During use, this can mean that the user expects to focus on a foreground target object in the acquired image while the lens actually focuses on the distant background, reducing zoom accuracy.
Disclosure of Invention
The application provides at least a deep learning-based zoom method, apparatus, device, and computer-readable storage medium.
The application provides a zoom method based on deep learning, which comprises the steps of inputting a target image acquired by image acquisition equipment into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, comparing a target focus area in the target image with a predicted focus area in the focus prediction image to obtain a focus comparison result, and carrying out zoom processing on the image acquisition equipment according to the focus comparison result.
In an embodiment, before the step of inputting the target image acquired by the image acquisition device into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, the method further comprises the steps of respectively carrying out focus evaluation processing on a first image and a second image acquired by the image acquisition device to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image, wherein image acquisition focal lengths of the first image and the second image are different, judging whether a focus evaluation peak exists between the first focus evaluation value and the second focus evaluation value according to the first focus evaluation value, the second focus evaluation value and a preset focus evaluation threshold, and determining the first image or the second image as the target image in response to the presence of the focus evaluation peak between the first focus evaluation value and the second focus evaluation value.
In one embodiment, before the step of comparing the target focus area in the target image with the predicted focus area in the focus prediction image, the method further comprises: determining the target focus area in the target image according to the sharpness of the target image, and determining the predicted focus area in the focus prediction image according to the sharpness of the focus prediction image. The step of comparing the target focus area with the predicted focus area to obtain a focus comparison result comprises: comparing the position information of the target focus area with the position information of the predicted focus area to obtain a position deviation between the two areas, and determining the focus comparison result according to the position deviation.
In one embodiment, the step of determining the target focus area according to the sharpness of the target image and the predicted focus area according to the sharpness of the focus prediction image comprises: performing image segmentation processing on the target image and the focus prediction image respectively to obtain a plurality of target sub-images in the target image and a plurality of focus prediction sub-images in the focus prediction image; determining the target sub-image with the highest sharpness as the target focus area; and determining the focus prediction sub-image with the highest sharpness as the predicted focus area.
In an embodiment, the step of determining the focus comparison result according to the position deviation includes determining that the image acquisition device fails to focus to obtain the focus comparison result in response to the position deviation being greater than a preset deviation threshold value, and determining that the image acquisition device succeeds in focusing to obtain the focus comparison result in response to the position deviation being smaller than or equal to the preset deviation threshold value.
In an embodiment, performing zoom processing on the image acquisition device according to the focus comparison result comprises: in response to the focus comparison result indicating that the image acquisition device fails to focus, performing zoom processing on the device according to a preset zoom step; obtaining a third image and a fourth image after the zoom processing, the third and fourth images having different image acquisition focal lengths; performing focus evaluation processing on the third and fourth images respectively to obtain a third focus evaluation value and a fourth focus evaluation value; in response to a peak existing between the third and fourth focus evaluation values, reducing the preset zoom step to obtain a reduced zoom step; and performing zoom processing on the device according to the reduced zoom step until a focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than a preset focus error threshold.
In an embodiment, the step of performing zoom processing on the image acquisition device according to the reduced zoom step until the focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than a preset focus error threshold comprises: adjusting the image acquisition focal length of the device according to the reduced zoom step to obtain an adjusted image acquisition focal length, and controlling the device to perform image acquisition according to the adjusted focal length to obtain the fifth image and the sixth image, wherein the image acquisition focal lengths of the fifth image and the sixth image are different.
In one embodiment, after the step of controlling the image acquisition device to capture images according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image, the method further includes: performing focus evaluation processing on the fifth image and the sixth image respectively to obtain a fifth focus evaluation value corresponding to the fifth image and a sixth focus evaluation value corresponding to the sixth image; determining the focus evaluation difference according to the fifth and sixth focus evaluation values; and suspending or stopping zoom processing of the device in response to the focus evaluation difference being less than or equal to the focus error threshold.
In an embodiment, the step of performing focus evaluation processing on the first image and the second image acquired by the image acquisition device to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image includes obtaining a gray value of the first image, and calculating according to a preset focus evaluation function and the gray value of the first image to obtain the first focus evaluation value.
In an embodiment, before the step of inputting the target image acquired by the image acquisition device into a pre-trained diffusion model to obtain the focus prediction image output by the diffusion model, the method further comprises: inputting an obtained sample image, which is focused on a target area, into an initial model to be trained, so that the initial model performs forward diffusion processing on the sample image to obtain a noise image; performing reverse diffusion processing on the noise image according to the initial model to obtain a denoised image and a model loss value corresponding to the denoised image; and performing model training on the initial model according to the loss value to obtain the diffusion model.
The application provides a zoom device based on deep learning, which comprises a focus prediction module, a focus comparison module and a zoom processing module, wherein the focus prediction module is used for inputting a target image acquired by image acquisition equipment into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, the focus comparison module is used for comparing a target focusing area in the target image with a predicted focusing area in the focus prediction image to obtain a focus comparison result, and the zoom processing module is used for carrying out zoom processing on the image acquisition equipment according to the focus comparison result.
A third aspect of the present application provides an electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the above-described deep learning-based zoom method.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the above-described deep learning-based zoom method.
According to the above scheme, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, so that it can be judged whether the target area of the image acquired under the current zoom parameters matches the focus area predicted by the diffusion model. Zoom processing is then performed on the image acquisition device according to the focus comparison result. The device can thus be zoomed on the basis of deep learning, improving the zoom accuracy of the zooming process.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram of an exemplary embodiment of a deep learning based zoom method of the present application;
FIG. 2 is a schematic diagram of a focus evaluation function double peak in the deep learning based zoom method of the present application;
FIG. 3 is an exemplary image segmentation schematic diagram in the deep learning based zoom method of the present application;
FIG. 4 is a schematic illustration of forward diffusion in a deep learning based zoom method of the present application;
FIG. 5 is a schematic diagram of constructing training samples in the deep learning based zoom method of the present application;
FIG. 6 is a schematic diagram of reverse diffusion in a deep learning based zoom method of the present application;
FIG. 7 is a schematic diagram of a deep learning based zoom apparatus shown in accordance with an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of an embodiment of an electronic device of the present application;
FIG. 9 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
For ease of understanding, applicable scenarios of the present application are first described by way of example. With the rapid development of related technologies in the fields of image processing and computer vision, image acquisition and transmission have spread into many aspects of daily life. In particular, in industrial scenarios such as railway track inspection, power transmission line or pipeline inspection, and instrument reading detection, the image acquisition device is required to focus quickly and clearly on the target in the region of interest even at long range and high magnification. However, owing to the limitations of the telephoto lens, the target and background regions cannot be kept in focus simultaneously during image acquisition, and the focus evaluation function exhibits two peaks.
Although the conventional hill-climbing algorithm can find a sharply focused point, it cannot guarantee that this point is the focus position of the region of interest. Consequently, during image acquisition the user may want the foreground target to be sharp while the lens actually focuses on the distant background, and the zoom accuracy of the acquisition process cannot be guaranteed.
Referring to fig. 1, fig. 1 is a flowchart illustrating an exemplary embodiment of a deep learning-based zoom method according to the present application. Specifically, the method may include the steps of:
step S110, inputting the target image acquired by the image acquisition equipment into a pre-trained diffusion model to obtain a focusing prediction image output by the diffusion model.
The target image may be an image acquired by the image acquisition device after some degree of zooming (manual and/or automatic) during image acquisition, or it may be the initial image acquired by the device before zooming. In the present application, while zoom processing is performed from the initial image to obtain the target image, the relevant image acquisition parameters (for example, zoom direction, zoom step, focal length, and pixel gray values) may be recorded and analyzed.
A diffusion model (Diffusion Model) is a type of generative model. In training a typical diffusion model, a Markov chain of diffusion steps is defined to slowly add random noise to the data, and a reverse diffusion process is then learned to construct the desired data samples from the noise. Briefly, a diffusion model can generate data similar to its training data: it works by corrupting the training data with successively added Gaussian noise and then learning to recover the data by reversing this noising process (reverse diffusion). In the zoom method of the present application, the focus area (region of interest) in the image is predicted through the reverse diffusion process of the diffusion model, i.e., the focus prediction image output by the diffusion model is obtained.
Step S120, comparing the target focusing area in the target image with the predicted focusing area in the predicted focusing image to obtain a focusing comparison result.
The target focusing area refers to an area with clear focusing in the target image, and the predicted focusing area refers to an area with clear focusing in the focus predicted image.
Illustratively, in the present application, the area of focus sharpness may be determined by the sharpness of each pixel point in the image. For example, a target focus area of the target image is determined by the sharpness of each pixel in the target image, and a predicted focus area of the focus predicted image is determined by the sharpness of each pixel in the focus predicted image.
It should be noted that the position of the target focus area in the target image and the position of the predicted focus area in the focus prediction image may be the same or different; the present application does not limit this. The position of the predicted focus area generated by the diffusion model serves as reference data for focus accuracy: if the position of the target focus area is the same as that of the predicted focus area, the image acquisition device currently focuses accurately; if the two differ, the device is currently not focused accurately, and zoom processing of the device must continue. The resolution of the target image and the focus prediction image can be kept consistent so that the target focus area and the predicted focus area can be compared accurately.
It should also be noted that, in the present application, the criterion for judging that the image acquisition device focuses accurately may require the position of the target focus area to be identical to that of the predicted focus area, or a position deviation threshold may be set to provide an error interval. For example, the position deviation between the position of the target focus area and the position of the predicted focus area is obtained; if the deviation is less than or equal to the position deviation threshold, the target focus area matches the predicted focus area and the device can be judged to focus accurately, whereas if it is greater than the threshold, the areas do not match and the device can be judged not to focus accurately, i.e., the area the device currently focuses on is not the region of interest.
Step S130, zooming processing is carried out on the image acquisition equipment according to the focusing comparison result.
If the focus comparison result indicates that the image acquisition device is not focused accurately, zoom processing can continue until the actual focus area of the device matches the focus area predicted by the diffusion model and the device focuses accurately.
Illustratively, the zoom processing of the image acquisition device in this step may include, but is not limited to, taking the zoom direction from the initial image to the target image as the first direction and its opposite (the zoom direction from the target image to the initial image) as the second direction, and controlling the device to perform zoom processing in the first direction and/or the second direction.
According to the present application, the target image acquired by the image acquisition device is input into a pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the area the device focuses on under the current zoom parameters matches the focus area predicted by the diffusion model. Zoom processing is then applied to the device according to this result. The device can thus be zoomed on the basis of deep learning, improving the zoom accuracy of the zooming process.
On the basis of the above embodiments, the steps before inputting the target image acquired by the image acquisition device into the diffusion model trained in advance to obtain the focus prediction image output by the diffusion model will be described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
the method comprises the steps of respectively carrying out focusing evaluation processing on a first image and a second image acquired by image acquisition equipment to obtain a first focusing evaluation value corresponding to the first image and a second focusing evaluation value corresponding to the second image, wherein the image acquisition focal lengths of the first image and the second image are different, judging whether a focusing evaluation peak value exists between the first focusing evaluation value and the second focusing evaluation value according to the first focusing evaluation value, the second focusing evaluation value and a preset focusing evaluation threshold value, and determining the first image or the second image as a target image in response to the focusing evaluation peak value existing between the first focusing evaluation value and the second focusing evaluation value.
The focus evaluation processing refers to evaluation processing of the focus degree of the image, and may be evaluation processing based on all pixels in the image, or evaluation processing based on a part of pixels in the image, which is not limited herein. The focusing evaluation processing is carried out on the image to obtain the focusing evaluation value of the image, and the focusing degree of the image can be determined through the focusing evaluation value.
For example, in the present application, the image (which may include the first image and the second image) may be evaluated according to a preset focus evaluation function. The mathematical expression of the focus evaluation function is:

$$FV(u,v) = \frac{2}{N}\,c(u)\,c(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i,j)\,\cos\frac{(2i+1)u\pi}{2N}\,\cos\frac{(2j+1)v\pi}{2N}$$

where f(i, j) is the gray value of the image's digital signal at point (i, j); FV(u, v) is the value after the discrete cosine transform (DCT) of the image in the frequency domain, i.e., the focus evaluation value; u and v are the image's pixel coordinates in the horizontal and vertical directions; N is the number of pixels of the image along each dimension; and c(u) and c(v) are preset compensation coefficients chosen so that the DCT transform matrix is orthogonal, with $c(u) = 1/\sqrt{2}$ if and only if u = 0, and c(u) = 1 otherwise.
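As an illustration, a minimal sketch of such a DCT-based focus evaluation follows. It assumes SciPy's dctn for the transform and sums the absolute AC coefficients as the focus evaluation value; the function name and the exact aggregation are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.fft import dctn

def focus_evaluation_value(gray: np.ndarray) -> float:
    """Score the focus of a grayscale image block via its 2-D DCT.

    Sharper images put more energy into high-frequency DCT coefficients,
    so the sum of absolute AC coefficients rises as focus improves.
    """
    coeffs = dctn(gray.astype(np.float64), norm="ortho")  # orthonormal 2-D DCT
    coeffs[0, 0] = 0.0  # discard the DC term (it encodes mean brightness only)
    return float(np.abs(coeffs).sum())
```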
Specifically, in this embodiment, the image acquisition device may perform image acquisition at two or more different image acquisition focal lengths to obtain at least two images, such as a first image and a second image. For the relationship between the first image and the second image, refer to the relationship between the initial image and the target image in the foregoing embodiment: the device may acquire the second image after acquiring the first image and zooming, or acquire the first image after acquiring the second image and zooming; this is not limited here. For convenience of explanation, the case in which the first image is acquired earlier than the second image is taken as the example in this application.
For example, the acquired first image is denoted P0, and the lens position of the image acquisition device when the first image is acquired may be recorded as the initial position L0 (the corresponding initial focal length may be recorded instead; this is not limited here). Focus evaluation processing is performed on the first image to obtain its first focus evaluation value FV0. After zoom processing of the image acquisition device, the current lens position L1 (or the corresponding current focal length) is obtained, image acquisition is performed to obtain the second image P1, and focus evaluation processing is performed on the second image to obtain its second focus evaluation value FV1.
It should be noted that, in the present application, the zoom direction when the initial image is acquired may be used as the reference direction of the image acquisition apparatus in the subsequent zooming process. For example, the initial moving direction of the focus lens motor, which is the direction corresponding to the automatic zoom command and/or the manual zoom command received when the image capturing apparatus captures the initial image, is denoted as Dira (corresponding to the first direction of the foregoing embodiment), and the corresponding opposite direction is denoted as Dirb (corresponding to the second direction of the foregoing embodiment).
According to the first focus evaluation value FV0, the second focus evaluation value FV1, and a preset focus evaluation threshold FV', it is judged whether a focus evaluation peak exists between the first focus evaluation value and the second focus evaluation value. A hill-climbing algorithm may be used to make this judgment.
Illustratively, with reference to the foregoing description, the lens position when the first image is acquired is L0, and the initial moving direction of the lens is denoted Dira, i.e., Dir = Dira. The lens can be controlled to move by ΔL in the Dir direction according to the preset zoom step (or moving step) to reach the current position L1, and image acquisition is performed to obtain the second image. The focus evaluation values of the first image and the second image are then calculated respectively. If FV0 < FV1 < FV', the FV trend is positive, indicating that the focus accuracy of the images acquired before and after the zoom processing is gradually improving, and the zoom processing steps above are repeated. If FV' > FV0 > FV1, the FV trend is negative, indicating that focus accuracy is gradually decreasing, and the zoom parameters may need to be adjusted (including, but not limited to, reversing the zoom direction and/or shortening the zoom step). For example, the moving direction of the lens becomes Dirb, i.e., Dir = Dirb, the zoom step ΔL is halved, and the zoom processing above is repeated. After n rounds of hill-climbing zoom processing, once FVn-1 and FVn are both greater than the focus evaluation threshold FV' and the FV trend changes from positive to negative, the hill-climbing processing may be stopped. At this point, the lens positions corresponding to FVn-1 and FVn are Ln-1 and Ln respectively, and it can be judged that a peak of the focus evaluation function (i.e., a focus evaluation peak) exists between Ln-1 and Ln.
In summary, the image Pn-1 corresponding to Ln-1, or the image Pn corresponding to Ln, may be determined as the target image; alternatively, a lens position between Ln-1 and Ln may be selected and the image acquired there determined as the target image. The following description mainly takes Pn as the example. It can be understood that if a focus evaluation peak is determined to exist between the first image and the second image, the first image or the second image may be directly determined as the target image, or zoom processing based on the hill-climbing algorithm may be performed as described above to determine a more accurately focused target image.
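For illustration, the hill-climbing search described above can be sketched as follows. The device interface (capture_at) and the scoring function are assumed stand-ins; the patent does not prescribe this exact API.

```python
def hill_climb(capture_at, score, l0: float, step: float,
               fv_threshold: float, max_iters: int = 50):
    """Move the lens until the FV trend flips from rising to falling.

    capture_at(position) returns an image captured at that lens position;
    score(image) returns its focus evaluation value (e.g. the DCT-based FV
    above). Returns (l_prev, l_curr) bracketing the focus evaluation peak.
    """
    direction = +1  # Dira; becomes -1 (Dirb) when the FV trend turns negative
    l_prev, fv_prev = l0, score(capture_at(l0))
    for _ in range(max_iters):
        l_curr = l_prev + direction * step
        fv_curr = score(capture_at(l_curr))
        if fv_prev > fv_threshold and fv_curr > fv_threshold and fv_curr < fv_prev:
            return l_prev, l_curr  # FVn-1, FVn above FV' and trend now negative
        if fv_curr < fv_prev:      # trend negative below FV': reverse, halve step
            direction, step = -direction, step / 2.0
        l_prev, fv_prev = l_curr, fv_curr
    return l_prev, l_prev          # no peak bracketed within the budget
```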
On the basis of the above embodiments, this embodiment describes the step of determining the target focus area in the target image and the predicted focus area in the focus prediction image, and the step of comparing the two areas to obtain a focus comparison result. Specifically, the method of the embodiment comprises the following steps:
The target focus area in the target image is determined according to the sharpness of the target image, and the predicted focus area in the focus prediction image is determined according to the sharpness of the focus prediction image. The position information of the target focus area is compared with the position information of the predicted focus area to obtain the position deviation between the two areas, and the focus comparison result is determined according to the position deviation.
The sharpness may be determined according to the focus evaluation value of each pixel as in the foregoing embodiment, or by reference to existing image sharpness evaluation methods such as the Brenner gradient method, the Tenengrad gradient method, the Laplacian gradient method, the variance method, and the energy gradient method; the present application is not limited in this respect.
As described in the foregoing embodiment, since, given the lens's depth of field, a scene may produce a double peak of the focus evaluation function, it is necessary to determine whether the current lens position is near the focus position corresponding to the region of interest of the scene. Thus, Pn is input into the trained Diffusion Model for reverse diffusion processing, and the output of the diffusion model, namely the focus prediction image Pn', is obtained.
The partial image region with higher sharpness in the target image is determined as the target focus area, and the partial image region with higher sharpness in the focus prediction image is determined as the predicted focus area. Comparing the position information of the target focus area with that of the predicted focus area yields the position deviation between them. If the position deviation is less than or equal to a preset position deviation threshold, the focus comparison result indicates that the target focus area matches the predicted focus area, and it can be judged that the image acquisition device focuses accurately. Referring to fig. 2, a schematic diagram of the focus evaluation function double peak in the deep learning-based zoom method of the present application: although a peak (peak 1) exists between Ln-1 and Ln, if the position deviation between the target focus area of Pn and the predicted focus area of Pn' is greater than the position deviation threshold, the image region corresponding to peak 1 is clearly focused but is not the region of interest. The image acquisition device therefore needs to be controlled to continue zoom processing, and the hill-climbing algorithm continues until the curve passes through the valley of the focus evaluation function and a second peak (peak 2) is found; the lens position range (focal length range) near peak 2 is judged to be the focus range for accurate focusing on the region of interest.
On the basis of the above-described embodiments, the steps of determining a target focus area in a target image from the sharpness of the target image and determining a predicted focus area in a focus predicted image from the sharpness of the focus predicted image will be described. Specifically, the method of the embodiment comprises the following steps:
Image segmentation processing is performed on the target image and the focus prediction image respectively to obtain a plurality of target sub-images in the target image and a plurality of focus prediction sub-images in the focus prediction image; the target sub-image with the highest sharpness among the target sub-images is determined as the target focus area, and the focus prediction sub-image with the highest sharpness among the focus prediction sub-images is determined as the predicted focus area.
As described in connection with the foregoing embodiments, the methods of determining the target focus area and the predicted focus area in the present embodiment may be determined in accordance with the image block processing method, whereby the processing efficiency can be improved.
Image segmentation processing is performed on the target image and the focus prediction image respectively to obtain a plurality of target sub-images (or target image blocks) and a plurality of focus prediction sub-images (or prediction image blocks). As shown in fig. 3, an exemplary image segmentation schematic diagram in the deep learning-based zoom method of the present application, the target image Pn and the focus prediction image Pn' are each divided, according to resolution, into an N×M image grid matrix (each grid cell corresponding to one sub-image). The filling value of each cell may be determined from the sharpness or focus evaluation values of the pixels within it, using an average, a weighted average, a summation, or another calculation method; this is not limited here. For example, each cell is filled with a value from 0 to 10 according to its focus condition: the higher the cell's focus evaluation value or sharpness, the higher its filling value. Two filled N×M matrices are thus obtained, and the argmax function can be used to quickly determine where the maximum cell of each matrix lies, i.e., to find the target sub-image with the highest sharpness (the target focus area) and the focus prediction sub-image with the highest sharpness (the predicted focus area), as sketched below.
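A compact sketch of this grid comparison, assuming a simple gradient-energy score per cell in place of the patent's unspecified per-pixel sharpness measure:

```python
import numpy as np

def focus_cell(image: np.ndarray, n: int, m: int) -> tuple:
    """Split a grayscale image into an n x m grid and return the (row, col)
    of the cell with the highest sharpness score, as argmax does above."""
    h, w = image.shape
    scores = np.zeros((n, m))
    for r in range(n):
        for c in range(m):
            cell = image[r * h // n:(r + 1) * h // n,
                         c * w // m:(c + 1) * w // m].astype(np.float64)
            gy, gx = np.gradient(cell)
            scores[r, c] = np.mean(gy ** 2 + gx ** 2)  # energy-gradient proxy
    return tuple(np.unravel_index(np.argmax(scores), scores.shape))
```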
On the basis of the above embodiments, the steps of determining the focus contrast result from the positional deviation will be described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
In response to the position deviation being greater than a preset deviation threshold, it is judged that the image acquisition device fails to focus, and the focus comparison result is obtained; in response to the position deviation being less than or equal to the preset deviation threshold, it is judged that the image acquisition device focuses successfully, and the focus comparison result is obtained.
The description is given in connection with the previous embodiment and can still be understood in connection with fig. 3. If the position deviation between the target focusing area and the predicted focusing area is larger than a preset deviation threshold, judging that the focusing of the image acquisition equipment fails (or is not focused accurately), and obtaining a focusing comparison result. If the position deviation is smaller than or equal to a preset deviation threshold, judging that the focusing of the image acquisition equipment is successful (or the focusing is accurate), and obtaining a focusing comparison result.
If the position deviation between the target focus area of the target image Pn, selected between Ln-1 and Ln, and the predicted focus area of the focus prediction image Pn' is less than or equal to the preset deviation threshold, it can be determined that the interval [Ln-1, Ln] is the focus range corresponding to accurate focusing on the region of interest; a minimal check of this kind is sketched below. Otherwise, zoom processing of the image acquisition device needs to continue.
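A minimal sketch of the comparison step; the Euclidean distance between grid positions is an assumed deviation measure, since the patent does not fix the metric.

```python
def focus_matches(target_cell, predicted_cell, deviation_threshold: float) -> bool:
    """True when the captured focus area and the diffusion model's predicted
    focus area agree within the preset position deviation threshold."""
    dr = target_cell[0] - predicted_cell[0]
    dc = target_cell[1] - predicted_cell[1]
    return (dr * dr + dc * dc) ** 0.5 <= deviation_threshold
```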
On the basis of the above embodiments, the steps of performing the zooming process on the image pickup apparatus according to the focus comparison result will be described. Specifically, the method of the embodiment comprises the following steps:
In response to the focus comparison result indicating that the image acquisition device fails to focus, zoom processing is performed on the device according to a preset zoom step, and a third image and a fourth image with different image acquisition focal lengths are obtained after the zoom processing. Focus evaluation processing is performed on the third and fourth images respectively to obtain a third focus evaluation value and a fourth focus evaluation value. In response to a peak existing between the third and fourth focus evaluation values, the preset zoom step is reduced to obtain a reduced zoom step, and zoom processing is performed on the device according to the reduced zoom step until the focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than a preset focus error threshold.
As described in connection with the foregoing embodiments, if the focus comparison result indicates that the image acquisition device fails to focus (is not focused accurately), zoom processing of the device needs to continue. For example, following the foregoing embodiment, after it is judged from Ln-1 and Ln that the device is not focused accurately, iterative zoom adjustment is performed on the device until, based on the focus evaluation values of subsequently acquired images, it is judged that the device focuses accurately, at which point the zoom adjustment is suspended or stopped.
Illustratively, if the focus comparison result indicates that the image acquisition device fails to focus, the device may be zoomed according to the preset zoom step ΔL. For the relationship between the third image and the fourth image, refer to the description of the relationship between the first image and the second image in the foregoing embodiment, which is not repeated here. In this embodiment, the case in which the third image is acquired earlier than the fourth image is taken as the example: this is equivalent to adjusting the lens by the step ΔL to acquire the third image, and by a further ΔL to acquire the fourth image.
In response to a peak existing between the third focus evaluation value and the fourth focus evaluation value, the preset zoom step is reduced to obtain a reduced zoom step, and zoom processing is performed on the image acquisition device according to the reduced zoom step; the device then performs image acquisition with the post-zoom acquisition parameters until the focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than the preset focus error threshold. Once the correct focus range has been determined, the preset zoom step is reduced mainly to shrink the focusing amplitude during zooming (the reduced zoom step may be denoted Δl); zoom processing then continues based on the hill-climbing algorithm, so that the device can approach the focus evaluation peak of the focus evaluation function accurately, until the difference between the FVs of two images is smaller than the tolerable focus error threshold.
It can be understood that the embodiment of the present application mainly takes the double-peak case as an example. If the image region corresponding to peak 1, determined between the first image and the second image, is not the region of interest, then after hill-climbing zoom processing, once peak 2 is determined between the third image and the fourth image, the image region corresponding to peak 2 can be directly determined as the region of interest, without inputting the acquired image into the diffusion model again for prediction and comparing the focus area deviation of the two images, thereby improving execution efficiency. Focusing on the region of interest can therefore be completed by determining the lens position (equivalently, the focal length) corresponding to peak 2 through zooming with the reduced zoom step, as in the sketch below. Of course, the present application is not limited to these steps; to improve accuracy, the acquired image may again be input into the diffusion model for prediction, followed by comparison of the focus area deviation of the two images.
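A sketch of this refinement stage, under the same assumed device interface as the hill-climbing sketch above: once peak 2 is bracketed, the step is halved and the worse end of the bracket is pulled inward until two consecutive captures differ by less than the focus error threshold.

```python
def refine_focus(capture_at, score, l_lo: float, l_hi: float,
                 step: float, fv_error_threshold: float,
                 max_iters: int = 50) -> float:
    """Shrink the bracket [l_lo, l_hi] around the focus evaluation peak."""
    step /= 2.0  # the reduced zoom step (denoted delta-l above)
    fv_lo, fv_hi = score(capture_at(l_lo)), score(capture_at(l_hi))
    for _ in range(max_iters):
        if abs(fv_hi - fv_lo) < fv_error_threshold:
            break                  # peak located; stop the lens motor here
        if fv_lo < fv_hi:          # move the worse end of the bracket inward
            l_lo += step
            fv_lo = score(capture_at(l_lo))
        else:
            l_hi -= step
            fv_hi = score(capture_at(l_hi))
        step /= 2.0
    return l_hi
```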
On the basis of the above embodiment, the steps of performing zoom processing on the image capturing device according to the zoomed-out zoom step size until the focus evaluation difference between the fifth image and the sixth image captured by the image capturing device is smaller than the preset focus error threshold are described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
The image acquisition focal length of the image acquisition device is adjusted according to the reduced zoom step to obtain an adjusted image acquisition focal length, and the device is controlled to perform image acquisition according to the adjusted focal length to obtain a fifth image and a sixth image, wherein the image acquisition focal lengths of the fifth image and the sixth image are different.
As described in the foregoing embodiments, after it is determined that peak 2 exists between the third image and the fourth image, zoom processing continues according to the reduced zoom step Δl, and the adjusted image acquisition focal length is obtained based on the hill-climbing zoom method of the foregoing embodiment. The image acquisition device is controlled to acquire images according to the adjusted focal length to obtain the fifth image and the sixth image. Their image acquisition focal lengths differ, and their relationship is the same as that described for the first and second images above, which is not repeated here. The fifth and sixth images may be acquired after zooming one or more times on the basis of the third and fourth images; this is not limited here.
On the basis of the above embodiment, this embodiment describes the steps that follow the controlling of the image acquisition device to perform image acquisition according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image. Specifically, the method of the embodiment comprises the following steps:
The method comprises the steps of respectively carrying out focusing evaluation processing on a fifth image and a sixth image to obtain a fifth focusing evaluation value corresponding to the fifth image and a sixth focusing evaluation value corresponding to the sixth image, determining a focusing evaluation difference value according to the fifth focusing evaluation value and the sixth focusing evaluation value, and suspending or stopping zooming processing on the image acquisition equipment in response to the focusing evaluation difference value being smaller than or equal to a focusing error threshold value.
As described in connection with the foregoing embodiment, after the processing of the foregoing embodiment, a focal length corresponding to the focus evaluation peak lies between the focal length at which the fifth image is acquired and the focal length at which the sixth image is acquired. The fifth focus evaluation value of the fifth image and the sixth focus evaluation value of the sixth image can therefore be obtained, and it can be determined whether the focus evaluation difference between them is less than or equal to the focus error threshold. If the difference is greater than the threshold, the fifth and sixth images still differ considerably, the range containing the focus evaluation peak has not yet been located accurately, and hill-climbing zoom processing must continue. If the difference is less than or equal to the threshold, the two images differ little, the range containing the focus evaluation peak has been located accurately, and a lens position (or focal length) within the interval from the position corresponding to the fifth image to the position corresponding to the sixth image may be selected as the lens position (or focal length) that focuses accurately on the region of interest. For example, the lens motor may be stopped directly at the lens position (or focal length) at which the sixth image was acquired, completing focusing on the region of interest.
On the basis of the above embodiment, the embodiment of the present application describes steps of performing focus evaluation processing on a first image and a second image acquired by an image acquisition device, to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image, respectively. Specifically, the method of the embodiment comprises the following steps:
The gray value of the first image is obtained, and the first focus evaluation value is calculated according to the preset focus evaluation function and the gray value of the first image.
The present embodiment will be described mainly with reference to the foregoing embodiments, in which the focus evaluation processing procedure of the first image is explained. It can be understood that the focus evaluation processing procedure of other images in the present application can be referred to in the same way, and thus will not be described in detail.
Specifically, the gray value of each pixel point of the first image and the number of the pixel points of the first image can be obtained, and the gray value and the number of the pixel points to be subjected to focusing evaluation are input into a preset focusing evaluation function, so that the focusing evaluation value output by the focusing evaluation function can be obtained.
On the basis of the above embodiments, the steps before inputting the target image acquired by the image acquisition device into the diffusion model trained in advance to obtain the focus prediction image output by the diffusion model will be described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
The obtained sample image, which is focused on a target area, is input into an initial model to be trained, so that the initial model performs forward diffusion processing on the sample image to obtain a noise image; reverse diffusion processing is performed on the noise image according to the initial model to obtain a denoised image and a model loss value corresponding to the denoised image; and model training is performed on the initial model according to the loss value to obtain the diffusion model.
In connection with the foregoing embodiment, before inputting the target image into the diffusion model, model training is required using the real image which is clearly focused on the region of interest as a sample, thereby enabling the diffusion model after training to generate a focus prediction image with accurate focus from the input target image.
Illustratively, a training sample (sample image) set first needs to be constructed. A group of real images, sufficient in number and clearly focused on the region of interest, is obtained as training samples. In each forward diffusion pass of the Diffusion Model, Gaussian noise is added to a sample image x0 over T accumulated steps to obtain x1, x2, ..., xT. Referring to fig. 4, a schematic diagram of forward diffusion in the deep learning-based zoom method of the present application, the mathematical expression of the noise superposition is:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)$$

where β_t denotes the variance of the Gaussian distribution at each step of the series, a hyperparameter that must be set manually and that determines the step schedule of training. For the detailed principle of this formula, refer to existing descriptions of the forward diffusion noise superposition method, which are not repeated here.
Referring to fig. 5, a schematic diagram of constructing training samples in the deep learning-based zoom method of the present application: as t increases, xt gets ever closer to pure noise, and as T tends to infinity, xT becomes complete Gaussian noise. To balance training effect against training duration, T may be set to a suitable value according to the actual application scenario; this is not limited here. Applying the forward diffusion operation above to S real images clearly focused on the region of interest yields a training sample set that serves as input for the subsequent reverse diffusion training, as sketched below.
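A minimal sketch of this sample construction, using the standard DDPM closed-form jump from x0 to xt (an assumption consistent with, but not spelled out by, the formula above); the β schedule values are illustrative.

```python
import torch

T = 1000                                          # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)             # variance schedule beta_t
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative alpha products

def forward_diffuse(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) for a sample image x0 that is sharply
    focused on the region of interest."""
    eps = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * eps
```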
Further, the reverse diffusion process in Diffusion Model training is the denoising inference process. Referring to fig. 6, a schematic diagram of reverse diffusion in the deep learning-based zoom method of the present application: the forward-diffused Markov chain is traversed in reverse, sampling from q(x_{t-1} | x_t) to convert noise back into samples of the original target distribution and generate new image data x0'. The conditional probability distribution learned in training can be expressed as:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
By the Bayes formula, it can be known that:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right)$$
The application may select the minimized negative log-likelihood as the training loss, $\mathcal{L} = \mathbb{E}_q\left[-\log p_\theta(x_0)\right]$. When the loss converges to its minimum, model training ends, and the diffusion model of the application is obtained.
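For illustration, one training step might look like the sketch below. The patent states the loss as the minimized negative log-likelihood E_q[−log p_θ(x0)]; the noise-prediction MSE used here is the usual DDPM surrogate for that objective and is an assumption, as is the model signature model(x_t, t).

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0: torch.Tensor, optimizer) -> float:
    """One reverse-diffusion training step on a batch of focused samples x0."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward diffusion
    loss = F.mse_loss(model(xt, t), eps)                 # predict the added noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```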
After the operation that triggers refocusing of the lens of the image acquisition device is completed (for example, when the dome camera's pan-tilt head changes from rotating to stationary and the tracking and zooming processes end), the image acquisition device can be controlled, according to the method of the foregoing embodiments, to acquire an initial image and a target image, and the two images can be input into the diffusion model to obtain the focus evaluation results for the two images output by the diffusion model.
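Elsewhere in this document the model's output is described as a focus prediction image; how the captured image conditions that generation is not spelled out here, so the sketch below makes one plausible assumption: partially noise the captured target image and run the learned reverse chain back to t = 0 (ancestral DDPM sampling). The function name and the choice of starting step are ours, and the schedule tensors come from the earlier sketches.

```python
import torch

@torch.no_grad()
def sample_focus_prediction(eps_model, x_target: torch.Tensor, t_start: int = 250) -> torch.Tensor:
    """Generate a focus prediction image by partially noising the captured target
    image, then denoising step by step with the learned reverse distribution."""
    abar = alpha_bars[t_start]
    x = abar.sqrt() * x_target + (1.0 - abar).sqrt() * torch.randn_like(x_target)
    for t in reversed(range(t_start)):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = eps_model(x, t_batch)
        # Posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```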
It should be further noted that the execution subject of the deep-learning-based zoom method may be a deep-learning-based zoom apparatus. For example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be user equipment (UE), a computer, a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the deep-learning-based zoom method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Fig. 7 is a schematic diagram of a deep-learning-based zoom apparatus according to an exemplary embodiment of the present application. As shown in fig. 7, the exemplary deep-learning-based zoom apparatus 700 includes a focus prediction module 710, a focus comparison module 720, and a zoom processing module 730. Specifically:
The focus prediction module 710 is configured to input the target image acquired by the image acquisition device into a pre-trained diffusion model, and obtain a focus prediction image output by the diffusion model.
The focus comparison module 720 is configured to compare the target focus area in the target image with the predicted focus area in the focus prediction image to obtain a focus comparison result.

The zoom processing module 730 is configured to perform zoom processing on the image acquisition device according to the focus comparison result.
In the deep-learning-based zoom apparatus, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the target area of the target image acquired under the current zoom parameters matches the predicted focus area predicted by the diffusion model. The image acquisition device is then zoomed according to the focus comparison result. In this way, the image acquisition device can be zoomed on the basis of deep learning, improving the zoom accuracy of the zooming process.
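To make the interplay of the three modules concrete, the sketch below strings their responsibilities together: locate the sharpest region in each image, measure the position deviation between the two regions, and decide whether a zoom adjustment is needed. It is a minimal sketch; the grid-based sharpness measure, the pixel threshold, and the function names are our assumptions, not the patent's specification.

```python
import numpy as np

def sharpest_block_center(img: np.ndarray, grid: int = 8) -> tuple:
    """Split a grayscale image into grid x grid blocks and return the centre
    (row, col) of the block with the highest gray-value variance."""
    h, w = img.shape[:2]
    bh, bw = h // grid, w // grid
    best_score, best_ij = -1.0, (0, 0)
    for i in range(grid):
        for j in range(grid):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            score = float(block.var())       # simple sharpness proxy
            if score > best_score:
                best_score, best_ij = score, (i, j)
    i, j = best_ij
    return (i * bh + bh // 2, j * bw + bw // 2)

def needs_zoom(target_img: np.ndarray, predicted_img: np.ndarray,
               deviation_thresh: float = 20.0) -> bool:
    """Return True when the target focus region deviates from the predicted
    focus region by more than the preset threshold (focus judged failed)."""
    ty, tx = sharpest_block_center(target_img)
    py, px = sharpest_block_center(predicted_img)
    deviation = float(np.hypot(ty - py, tx - px))
    return deviation > deviation_thresh
```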
It should be noted that the apparatus provided in the foregoing embodiments and the method provided in the foregoing embodiments belong to the same concept; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiments and is not repeated here. In practical applications, the apparatus provided in the above embodiments may distribute the functions among different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above, which is not limited herein.

For the function of each module, reference may be made to the embodiments of the deep-learning-based zoom method, which are not repeated here.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 100 comprises a memory 101 and a processor 102, where the processor 102 is configured to execute program instructions stored in the memory 101 to implement the steps of any of the embodiments of the deep-learning-based zoom method described above. In a specific implementation scenario, the electronic device 100 may include, but is not limited to, a microcomputer or a server; furthermore, the electronic device 100 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 102 is configured to control itself and the memory 101 to implement the steps in any of the deep-learning-based zoom method embodiments described above. The processor 102 may also be referred to as a CPU (Central Processing Unit). The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 102 may be implemented jointly by integrated circuit chips.
In the electronic device, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the target area of the target image acquired under the current zoom parameters matches the predicted focus area predicted by the diffusion model, and the image acquisition device is zoomed according to the focus comparison result. Zooming of the image acquisition device is thus realized on the basis of deep learning, improving the zoom accuracy of the zooming process.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 110 stores program instructions 111 executable by a processor, the program instructions 111 being for implementing the steps in any of the deep-learning-based zoom method embodiments described above.

By running the program instructions in the storage medium, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the target area of the target image acquired under the current zoom parameters matches the predicted focus area predicted by the diffusion model, and the image acquisition device is zoomed according to the focus comparison result. Zooming of the image acquisition device is thus realized on the basis of deep learning, improving the zoom accuracy of the zooming process.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments emphasizes the differences between them; for what is identical or similar, the embodiments may be referred to one another, and such content is not repeated herein for the sake of brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
In addition, each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If implemented in the form of a software functional unit and sold or used as a stand-alone product, the integrated unit may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims (12)

1. A zoom method based on deep learning, characterized in that the method comprises:

inputting a target image acquired by an image acquisition device into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, wherein the target image is either of a first image and a second image acquired by the image acquisition device, and a focus evaluation peak exists between the focus evaluation values of the first image and the second image;

comparing a target focus area in the target image with a predicted focus area in the focus prediction image to obtain a focus comparison result, wherein this step comprises: comparing position information of the target focus area with position information of the predicted focus area to obtain a position deviation between the target focus area and the predicted focus area, and determining the focus comparison result according to the position deviation; and, in response to the position deviation being greater than a preset deviation threshold, determining that the image acquisition device has failed to focus, thereby obtaining the focus comparison result; and

performing zoom processing on the image acquisition device according to the focus comparison result, comprising: performing zoom processing on the image acquisition device according to a preset zoom step until another focus evaluation peak exists between the focus evaluation values of two adjacent frames of images acquired by the image acquisition device during the zoom processing and the focus evaluation difference between the focus evaluation values of the two adjacent frames is less than a preset focus error threshold.

2. The method according to claim 1, characterized in that before the step of inputting the target image acquired by the image acquisition device into the pre-trained diffusion model to obtain the focus prediction image output by the diffusion model, the method further comprises:

performing focus evaluation processing on a first image and a second image acquired by the image acquisition device respectively to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image, wherein the first image and the second image are acquired at different focal lengths;

determining, according to the first focus evaluation value, the second focus evaluation value and a preset focus evaluation threshold, whether a focus evaluation peak exists between the first focus evaluation value and the second focus evaluation value; and

in response to the focus evaluation peak existing between the first focus evaluation value and the second focus evaluation value, determining the first image or the second image as the target image.

3. The method according to claim 1, characterized in that before the step of comparing the target focus area in the target image with the predicted focus area in the focus prediction image, the method further comprises:

determining the target focus area in the target image according to the sharpness of the target image, and determining the predicted focus area in the focus prediction image according to the sharpness of the focus prediction image.

4. The method according to claim 3, characterized in that the step of determining the target focus area in the target image according to the sharpness of the target image, and determining the predicted focus area in the focus prediction image according to the sharpness of the focus prediction image, comprises:

performing image segmentation processing on the target image and the focus prediction image respectively to obtain a plurality of target sub-images of the target image and a plurality of focus prediction sub-images of the focus prediction image; and

determining the target sub-image with the highest sharpness among the target sub-images as the target focus area, and determining the focus prediction sub-image with the highest sharpness among the focus prediction sub-images as the predicted focus area.

5. The method according to claim 1, characterized in that the step of determining the focus comparison result according to the position deviation comprises:

in response to the position deviation being less than or equal to the preset deviation threshold, determining that the image acquisition device has focused successfully, thereby obtaining the focus comparison result.

6. The method according to claim 5, characterized in that the step of performing zoom processing on the image acquisition device according to the focus comparison result comprises:

in response to the focus comparison result indicating that the image acquisition device has failed to focus, performing zoom processing on the image acquisition device according to a preset zoom step, and acquiring a third image and a fourth image captured by the image acquisition device after the zoom processing, wherein the third image and the fourth image are acquired at different focal lengths;

performing focus evaluation processing on the third image and the fourth image respectively to obtain a third focus evaluation value corresponding to the third image and a fourth focus evaluation value corresponding to the fourth image;

in response to a peak existing between the third focus evaluation value and the fourth focus evaluation value, reducing the preset zoom step to obtain a reduced zoom step; and

performing zoom processing on the image acquisition device according to the reduced zoom step until the focus evaluation difference between a fifth image and a sixth image acquired by the image acquisition device is less than a preset focus error threshold.

7. The method according to claim 6, characterized in that the step of performing zoom processing on the image acquisition device according to the reduced zoom step until the focus evaluation difference between the fifth image and the sixth image acquired by the image acquisition device is less than the preset focus error threshold comprises:

adjusting the image acquisition focal length of the image acquisition device according to the reduced zoom step to obtain an adjusted image acquisition focal length; and

controlling the image acquisition device to perform image acquisition according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image, wherein the fifth image and the sixth image are acquired at different focal lengths.

8. The method according to claim 7, characterized in that after the step of controlling the image acquisition device to perform image acquisition according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image, the method further comprises:

performing focus evaluation processing on the fifth image and the sixth image respectively to obtain a fifth focus evaluation value corresponding to the fifth image and a sixth focus evaluation value corresponding to the sixth image;

determining the focus evaluation difference according to the fifth focus evaluation value and the sixth focus evaluation value; and

in response to the focus evaluation difference being less than or equal to the focus error threshold, pausing or stopping the zoom processing of the image acquisition device.

9. The method according to claim 2, characterized in that the step of performing focus evaluation processing on the first image and the second image acquired by the image acquisition device respectively to obtain the first focus evaluation value corresponding to the first image and the second focus evaluation value corresponding to the second image comprises:

acquiring the gray values of the first image; and

calculating the first focus evaluation value according to a preset focus evaluation function and the gray values of the first image.

10. The method according to claim 1, characterized in that before the step of inputting the target image acquired by the image acquisition device into the pre-trained diffusion model to obtain the focus prediction image output by the diffusion model, the method further comprises:

inputting an acquired sample image into an initial model to be trained, so that the initial model performs forward diffusion processing on the sample image to obtain a noise image, wherein the sample image is focused on a target area;

performing reverse diffusion processing on the noise image according to the initial model to obtain a denoised image and a model loss value corresponding to the denoised image; and

training the initial model according to the loss value to obtain the diffusion model.

11. An electronic device, characterized by comprising a memory and a processor, wherein the processor is configured to execute program instructions stored in the memory to implement the method according to any one of claims 1 to 10.

12. A computer-readable storage medium having program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.
CN202411639743.3A 2024-11-18 2024-11-18 Zoom method, device and storage medium based on deep learning Active CN119155549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411639743.3A CN119155549B (en) 2024-11-18 2024-11-18 Zoom method, device and storage medium based on deep learning

Publications (2)

Publication Number Publication Date
CN119155549A CN119155549A (en) 2024-12-17
CN119155549B true CN119155549B (en) 2025-03-21

Family

ID=93802033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411639743.3A Active CN119155549B (en) 2024-11-18 2024-11-18 Zoom method, device and storage medium based on deep learning

Country Status (1)

Country Link
CN (1) CN119155549B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117119172A (en) * 2023-09-21 2023-11-24 杭州海康慧影科技有限公司 Automatic focusing effect detection method, device and equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4207980B2 (en) * 2006-06-09 2009-01-14 ソニー株式会社 IMAGING DEVICE, IMAGING DEVICE CONTROL METHOD, AND COMPUTER PROGRAM
EP3089449B1 (en) * 2015-04-30 2020-03-04 InterDigital CE Patent Holdings Method for obtaining light-field data using a non-light-field imaging device, corresponding device, computer program product and non-transitory computer-readable carrier medium
CN107888819B (en) * 2016-09-29 2020-07-07 华为技术有限公司 Automatic focusing method and device
CN115567778A (en) * 2021-07-01 2023-01-03 深圳绿米联创科技有限公司 Automatic focusing method and device, electronic equipment and storage medium
CN114387327B (en) * 2021-12-21 2024-03-12 陕西师范大学 Synthetic aperture focusing imaging method based on deep learning parallax prediction
CN116506730A (en) * 2022-01-17 2023-07-28 北京小米移动软件有限公司 Method and device for determining focus area, electronic equipment, and storage medium
CN114760419B (en) * 2022-06-15 2022-09-20 深圳深知未来智能有限公司 Automatic focusing method and system based on deep learning
JP2024104391A (en) * 2023-01-24 2024-08-05 浜松ホトニクス株式会社 Focus position estimation method, focus position estimation program, focus position estimation system, model generation method, model generation program, model generation system, and focus position estimation model
WO2024159082A2 (en) * 2023-01-27 2024-08-02 Google Llc Monocular depth and optical flow estimation using diffusion models
CN117528234A (en) * 2023-10-31 2024-02-06 重庆大学 Deep learning camera automatic focusing method and system based on each pixel supervision


Also Published As

Publication number Publication date
CN119155549A (en) 2024-12-17

Similar Documents

Publication Publication Date Title
CN113837079B (en) Automatic focusing method, device, computer equipment and storage medium of microscope
Wang et al. Deep learning for camera autofocus
KR101643607B1 (en) Method and apparatus for generating of image data
CN103595920B (en) Image acquisition equipment, method and device for assisting focusing during zooming
US20150124059A1 (en) Multi-frame image calibrator
CN112367474B (en) Self-adaptive light field imaging method, device and equipment
JP2013531268A (en) Measuring distance using coded aperture
EP2733923A2 (en) Multiresolution depth from defocus based autofocus
CN112333379A (en) Image focusing method and device and image acquisition equipment
CN114913171A (en) Image out-of-focus detection method and device, electronic equipment and storage medium
CN112351196B (en) Image clarity determination method, image focusing method and device
WO2019104670A1 (en) Method and apparatus for determining depth value
KR20170101532A (en) Method for image fusion, Computer program for the same, and Recording medium storing computer program for the same
CN111885297B (en) Image definition determining method, image focusing method and device
WO2016161734A1 (en) Autofocusing method and device
CN115359108A (en) Depth prediction method and system based on defocusing under guidance of focal stack reconstruction
CN108376394B (en) Camera automatic focusing method and system based on frequency domain histogram analysis
CN119155549B (en) Zoom method, device and storage medium based on deep learning
CN111491105B (en) Focusing method of mobile terminal, mobile terminal and computer storage medium
CN118393711A (en) Real-time bright field focusing method for avoiding influence of impurities on surface of glass slide
CN118301471A (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
Wang et al. Intelligent autofocus
JP2019083580A (en) Image processing apparatus, image processing method, and program
CN116567413A (en) Automatic focusing method, system and storage medium based on convolutional neural network
CN116582750A (en) Focusing method, focusing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant