
CN119155549B - Zoom method, device and storage medium based on deep learning - Google Patents

Zoom method, device and storage medium based on deep learning

Info

Publication number
CN119155549B
Authority
CN
China
Prior art keywords
image
focus
target
image acquisition
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411639743.3A
Other languages
Chinese (zh)
Other versions
CN119155549A (en)
Inventor
陈明珠
杨铭怡
葛艳红
沈剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202411639743.3A
Publication of CN119155549A
Application granted
Publication of CN119155549B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • H04N23/675Focus control based on electronic image sensor signals comprising setting of focusing regions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/69Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automatic Focus Adjustment (AREA)
  • Studio Devices (AREA)

Abstract

The application discloses a zoom method, device and storage medium based on deep learning. The method comprises: inputting a target image acquired by an image acquisition device into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model; comparing a target focus area in the target image with a predicted focus area in the focus prediction image to obtain a focus comparison result; and performing zoom processing on the image acquisition device according to the focus comparison result. By means of this scheme, zoom accuracy can be improved.

Description

Deep learning-based zooming method, device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a zoom method, apparatus, and storage medium based on deep learning.
Background
With the rapid development of related technologies in the fields of image processing and computer vision, image acquisition and transmission have spread into many aspects of daily life.
Particularly in certain special situations, such as railway track inspection, power transmission line or pipeline inspection, and instrument reading detection, there is a need to focus quickly and clearly on a target object in a region of interest of the acquired image at long range and high magnification.
However, optical imaging through a lens has inherent limits; with a telephoto lens in particular, the target object and the background in an acquired image often cannot be kept in focus at the same time, which typically manifests as a focus evaluation function with two peaks. During use, this can mean that the user expects to focus on a foreground target object in the acquired image while the lens actually focuses on the distant background, reducing zoom accuracy.
Disclosure of Invention
The application provides at least a deep learning-based zoom method, apparatus, device, and computer-readable storage medium.
The application provides a zoom method based on deep learning, which comprises the steps of inputting a target image acquired by image acquisition equipment into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, comparing a target focus area in the target image with a predicted focus area in the focus prediction image to obtain a focus comparison result, and carrying out zoom processing on the image acquisition equipment according to the focus comparison result.
In an embodiment, before the step of inputting the target image acquired by the image acquisition device into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, the method further comprises the steps of respectively carrying out focus evaluation processing on a first image and a second image acquired by the image acquisition device to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image, wherein image acquisition focal lengths of the first image and the second image are different, judging whether a focus evaluation peak exists between the first focus evaluation value and the second focus evaluation value according to the first focus evaluation value, the second focus evaluation value and a preset focus evaluation threshold, and determining the first image or the second image as the target image in response to the presence of the focus evaluation peak between the first focus evaluation value and the second focus evaluation value.
In one embodiment, before the step of comparing the target focus area in the target image with the predicted focus area in the focus prediction image, the method further comprises: determining the target focus area in the target image according to the sharpness of the target image, and determining the predicted focus area in the focus prediction image according to the sharpness of the focus prediction image. The step of comparing the target focus area with the predicted focus area to obtain a focus comparison result comprises: comparing the position information of the target focus area with the position information of the predicted focus area to obtain a position deviation between the two areas, and determining the focus comparison result according to the position deviation.
In one embodiment, the step of determining the target focus area according to the sharpness of the target image and the predicted focus area according to the sharpness of the focus prediction image comprises: performing image segmentation processing on the target image and the focus prediction image respectively to obtain a plurality of target sub-images in the target image and a plurality of focus prediction sub-images in the focus prediction image; determining the target sub-image with the highest sharpness as the target focus area; and determining the focus prediction sub-image with the highest sharpness as the predicted focus area.
In an embodiment, the step of determining the focus comparison result according to the position deviation includes determining that the image acquisition device fails to focus to obtain the focus comparison result in response to the position deviation being greater than a preset deviation threshold value, and determining that the image acquisition device succeeds in focusing to obtain the focus comparison result in response to the position deviation being smaller than or equal to the preset deviation threshold value.
In an embodiment, performing zoom processing on the image acquisition device according to the focus comparison result comprises: in response to the focus comparison result indicating that the image acquisition device fails to focus, performing zoom processing on the device according to a preset zoom step; obtaining a third image and a fourth image after the zoom processing, the third and fourth images having different image acquisition focal lengths; performing focus evaluation processing on the third and fourth images respectively to obtain a third focus evaluation value and a fourth focus evaluation value; in response to a peak existing between the third and fourth focus evaluation values, reducing the preset zoom step to obtain a reduced zoom step; and performing zoom processing on the device according to the reduced zoom step until a focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than a preset focus error threshold.
In an embodiment, the step of performing zoom processing on the image acquisition device according to the reduced zoom step until the focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than a preset focus error threshold comprises: adjusting the image acquisition focal length of the device according to the reduced zoom step to obtain an adjusted image acquisition focal length, and controlling the device to perform image acquisition according to the adjusted focal length to obtain the fifth image and the sixth image, wherein the image acquisition focal lengths of the fifth image and the sixth image are different.
In one embodiment, after the step of controlling the image acquisition device to capture images according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image, the method further includes: performing focus evaluation processing on the fifth image and the sixth image respectively to obtain a fifth focus evaluation value corresponding to the fifth image and a sixth focus evaluation value corresponding to the sixth image; determining the focus evaluation difference according to the fifth and sixth focus evaluation values; and suspending or stopping zoom processing of the device in response to the focus evaluation difference being less than or equal to the focus error threshold.
In an embodiment, the step of performing focus evaluation processing on the first image and the second image acquired by the image acquisition device to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image includes obtaining a gray value of the first image, and calculating according to a preset focus evaluation function and the gray value of the first image to obtain the first focus evaluation value.
In an embodiment, before the step of inputting the target image acquired by the image acquisition device into a pre-trained diffusion model to obtain the focus prediction image output by the diffusion model, the method further comprises: inputting an obtained sample image, which is focused on a target area, into an initial model to be trained, so that the initial model performs forward diffusion processing on the sample image to obtain a noise image; performing reverse diffusion processing on the noise image according to the initial model to obtain a denoised image and a model loss value corresponding to the denoised image; and performing model training on the initial model according to the loss value to obtain the diffusion model.
The application provides a zoom device based on deep learning, which comprises a focus prediction module, a focus comparison module and a zoom processing module, wherein the focus prediction module is used for inputting a target image acquired by image acquisition equipment into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, the focus comparison module is used for comparing a target focusing area in the target image with a predicted focusing area in the focus prediction image to obtain a focus comparison result, and the zoom processing module is used for carrying out zoom processing on the image acquisition equipment according to the focus comparison result.
A third aspect of the present application provides an electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the above-described deep learning-based zoom method.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon program instructions which, when executed by a processor, implement the above-described deep learning-based zoom method.
According to the above scheme, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, so that it can be judged whether the target area of the image acquired under the current zoom parameters matches the focus area predicted by the diffusion model. Zoom processing is then performed on the image acquisition device according to the focus comparison result. The device can thus be zoomed on the basis of deep learning, improving the zoom accuracy of the zooming process.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram of an exemplary embodiment of a deep learning based zoom method of the present application;
FIG. 2 is a schematic diagram of a focus evaluation function double peak in the deep learning based zoom method of the present application;
FIG. 3 is an exemplary image segmentation schematic diagram in the deep learning based zoom method of the present application;
FIG. 4 is a schematic illustration of forward diffusion in a deep learning based zoom method of the present application;
FIG. 5 is a schematic diagram of constructing training samples in the deep learning based zoom method of the present application;
FIG. 6 is a schematic diagram of reverse diffusion in a deep learning based zoom method of the present application;
FIG. 7 is a schematic diagram of a deep learning based zoom apparatus shown in accordance with an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of an embodiment of an electronic device of the present application;
FIG. 9 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean that a exists alone, while a and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, may mean including any one or more elements selected from the group consisting of A, B and C.
For ease of understanding, applicable scenarios of the present application are first described by way of example. With the rapid development of related technologies in the fields of image processing and computer vision, image acquisition and transmission have spread into many aspects of daily life. In particular, in industrial scenarios such as railway track inspection, power transmission line or pipeline inspection, and instrument reading detection, the image acquisition device is required to focus quickly and clearly on the target in the region of interest even at long range and high magnification. However, owing to the limitations of the telephoto lens, the target and background regions cannot be kept in focus simultaneously during image acquisition, and the focus evaluation function exhibits two peaks.
Although the conventional hill-climbing algorithm can find a sharply focused point, it cannot guarantee that this point is the focus position of the region of interest. Consequently, during image acquisition the user may want the foreground target to be sharp while the lens actually focuses on the distant background, and the zoom accuracy of the acquisition process cannot be guaranteed.
Referring to fig. 1, fig. 1 is a flowchart illustrating an exemplary embodiment of a deep learning-based zoom method according to the present application. Specifically, the method may include the steps of:
step S110, inputting the target image acquired by the image acquisition equipment into a pre-trained diffusion model to obtain a focusing prediction image output by the diffusion model.
The target image may be an image acquired by the image acquisition device after some degree of zooming (manual and/or automatic) during image acquisition, or it may be the initial image acquired by the device before zooming. In the present application, while zoom processing is performed from the initial image to obtain the target image, the relevant image acquisition parameters (for example, zoom direction, zoom step, focal length, and pixel gray values) may be recorded and analyzed.
A diffusion model (Diffusion Model) is a type of generative model. In training a typical diffusion model, a Markov chain of diffusion steps is defined to slowly add random noise to the data, and a reverse diffusion process is then learned to construct the desired data samples from the noise. Briefly, a diffusion model can generate data similar to its training data: it works by corrupting the training data with successively added Gaussian noise and then learning to recover the data by reversing this noising process (reverse diffusion). In the zoom method of the present application, the focus area (region of interest) in the image is predicted through the reverse diffusion process of the diffusion model, i.e., the focus prediction image output by the diffusion model is obtained.
Step S120, comparing the target focusing area in the target image with the predicted focusing area in the predicted focusing image to obtain a focusing comparison result.
The target focusing area refers to an area with clear focusing in the target image, and the predicted focusing area refers to an area with clear focusing in the focus predicted image.
Illustratively, in the present application, the area of focus sharpness may be determined by the sharpness of each pixel point in the image. For example, a target focus area of the target image is determined by the sharpness of each pixel in the target image, and a predicted focus area of the focus predicted image is determined by the sharpness of each pixel in the focus predicted image.
It should be noted that the position of the target focus area in the target image and the position of the predicted focus area in the focus prediction image may be the same or different; the present application does not limit this. The position of the predicted focus area generated by the diffusion model serves as reference data for focus accuracy: if the position of the target focus area is the same as that of the predicted focus area, the image acquisition device currently focuses accurately; if the two differ, the device is currently not focused accurately, and zoom processing of the device must continue. The resolution of the target image and the focus prediction image can be kept consistent so that the target focus area and the predicted focus area can be compared accurately.
It should also be noted that, in the present application, the criterion for judging that the image acquisition device focuses accurately may require the position of the target focus area to be identical to that of the predicted focus area, or a position deviation threshold may be set to provide an error interval. For example, the position deviation between the position of the target focus area and the position of the predicted focus area is obtained; if the deviation is less than or equal to the position deviation threshold, the target focus area matches the predicted focus area and the device can be judged to focus accurately, whereas if it is greater than the threshold, the areas do not match and the device can be judged not to focus accurately, i.e., the area the device currently focuses on is not the region of interest.
Step S130, zooming processing is carried out on the image acquisition equipment according to the focusing comparison result.
If the focus comparison result indicates that the image acquisition device is not focused accurately, zoom processing can continue until the actual focus area of the device matches the focus area predicted by the diffusion model and the device focuses accurately.
Illustratively, the zoom processing of the image acquisition device in this step may include, but is not limited to, taking the zoom direction from the initial image to the target image as the first direction and its opposite (the zoom direction from the target image to the initial image) as the second direction, and controlling the device to perform zoom processing in the first direction and/or the second direction.
According to the present application, the target image acquired by the image acquisition device is input into a pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the area the device focuses on under the current zoom parameters matches the focus area predicted by the diffusion model. Zoom processing is then applied to the device according to this result. The device can thus be zoomed on the basis of deep learning, improving the zoom accuracy of the zooming process.
On the basis of the above embodiments, the steps before inputting the target image acquired by the image acquisition device into the diffusion model trained in advance to obtain the focus prediction image output by the diffusion model will be described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
the method comprises the steps of respectively carrying out focusing evaluation processing on a first image and a second image acquired by image acquisition equipment to obtain a first focusing evaluation value corresponding to the first image and a second focusing evaluation value corresponding to the second image, wherein the image acquisition focal lengths of the first image and the second image are different, judging whether a focusing evaluation peak value exists between the first focusing evaluation value and the second focusing evaluation value according to the first focusing evaluation value, the second focusing evaluation value and a preset focusing evaluation threshold value, and determining the first image or the second image as a target image in response to the focusing evaluation peak value existing between the first focusing evaluation value and the second focusing evaluation value.
The focus evaluation processing refers to evaluation processing of the focus degree of the image, and may be evaluation processing based on all pixels in the image, or evaluation processing based on a part of pixels in the image, which is not limited herein. The focusing evaluation processing is carried out on the image to obtain the focusing evaluation value of the image, and the focusing degree of the image can be determined through the focusing evaluation value.
For example, in the present application, the image (which may include the first image and the second image) may be evaluated according to a preset focus evaluation function. The mathematical expression of the focus evaluation function is:

$$FV(u,v) = \frac{2}{N}\,c(u)\,c(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i,j)\,\cos\frac{(2i+1)u\pi}{2N}\,\cos\frac{(2j+1)v\pi}{2N}$$

where f(i, j) is the gray value of the image's digital signal at point (i, j); FV(u, v) is the value after the discrete cosine transform (DCT) of the image in the frequency domain, i.e., the focus evaluation value; u and v are the image's pixel coordinates in the horizontal and vertical directions; N is the number of pixels of the image along each dimension; and c(u) and c(v) are preset compensation coefficients chosen so that the DCT transform matrix is orthogonal, with $c(u) = 1/\sqrt{2}$ if and only if u = 0, and c(u) = 1 otherwise.
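As an illustration, a minimal sketch of such a DCT-based focus evaluation follows. It assumes SciPy's dctn for the transform and sums the absolute AC coefficients as the focus evaluation value; the function name and the exact aggregation are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.fft import dctn

def focus_evaluation_value(gray: np.ndarray) -> float:
    """Score the focus of a grayscale image block via its 2-D DCT.

    Sharper images put more energy into high-frequency DCT coefficients,
    so the sum of absolute AC coefficients rises as focus improves.
    """
    coeffs = dctn(gray.astype(np.float64), norm="ortho")  # orthonormal 2-D DCT
    coeffs[0, 0] = 0.0  # discard the DC term (it encodes mean brightness only)
    return float(np.abs(coeffs).sum())
```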
Specifically, in this embodiment, the image acquisition device may perform image acquisition at two or more different image acquisition focal lengths to obtain at least two images, such as a first image and a second image. For the relationship between the first image and the second image, refer to the relationship between the initial image and the target image in the foregoing embodiment: the device may acquire the second image after acquiring the first image and zooming, or acquire the first image after acquiring the second image and zooming; this is not limited here. For convenience of explanation, the case in which the first image is acquired earlier than the second image is taken as the example in this application.
For example, the acquired first image is denoted P0, and the lens position of the image acquisition device when the first image is acquired may be recorded as the initial position L0 (the corresponding initial focal length may be recorded instead; this is not limited here). Focus evaluation processing is performed on the first image to obtain its first focus evaluation value FV0. After zoom processing of the image acquisition device, the current lens position L1 (or the corresponding current focal length) is obtained, image acquisition is performed to obtain the second image P1, and focus evaluation processing is performed on the second image to obtain its second focus evaluation value FV1.
It should be noted that, in the present application, the zoom direction when the initial image is acquired may be used as the reference direction of the image acquisition apparatus in the subsequent zooming process. For example, the initial moving direction of the focus lens motor, which is the direction corresponding to the automatic zoom command and/or the manual zoom command received when the image capturing apparatus captures the initial image, is denoted as Dira (corresponding to the first direction of the foregoing embodiment), and the corresponding opposite direction is denoted as Dirb (corresponding to the second direction of the foregoing embodiment).
According to the first focus evaluation value FV0, the second focus evaluation value FV1, and a preset focus evaluation threshold FV', it is judged whether a focus evaluation peak exists between the first focus evaluation value and the second focus evaluation value. A hill-climbing algorithm may be used to make this judgment.
Illustratively, with reference to the foregoing description, the lens position when the first image is acquired is L0, and the initial moving direction of the lens is denoted Dira, i.e., Dir = Dira. The lens can be controlled to move by ΔL in the Dir direction according to the preset zoom step (or moving step) to reach the current position L1, and image acquisition is performed to obtain the second image. The focus evaluation values of the first image and the second image are then calculated respectively. If FV0 < FV1 < FV', the FV trend is positive, indicating that the focus accuracy of the images acquired before and after the zoom processing is gradually improving, and the zoom processing steps above are repeated. If FV' > FV0 > FV1, the FV trend is negative, indicating that focus accuracy is gradually decreasing, and the zoom parameters may need to be adjusted (including, but not limited to, reversing the zoom direction and/or shortening the zoom step). For example, the moving direction of the lens becomes Dirb, i.e., Dir = Dirb, the zoom step ΔL is halved, and the zoom processing above is repeated. After n rounds of hill-climbing zoom processing, once FVn-1 and FVn are both greater than the focus evaluation threshold FV' and the FV trend changes from positive to negative, the hill-climbing processing may be stopped. At this point, the lens positions corresponding to FVn-1 and FVn are Ln-1 and Ln respectively, and it can be judged that a peak of the focus evaluation function (i.e., a focus evaluation peak) exists between Ln-1 and Ln.
In summary, the image Pn-1 corresponding to Ln-1, or the image Pn corresponding to Ln, may be determined as the target image; alternatively, a lens position between Ln-1 and Ln may be selected and the image acquired there determined as the target image. The following description mainly takes Pn as the example. It can be understood that if a focus evaluation peak is determined to exist between the first image and the second image, the first image or the second image may be directly determined as the target image, or zoom processing based on the hill-climbing algorithm may be performed as described above to determine a more accurately focused target image.
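For illustration, the hill-climbing search described above can be sketched as follows. The device interface (capture_at) and the scoring function are assumed stand-ins; the patent does not prescribe this exact API.

```python
def hill_climb(capture_at, score, l0: float, step: float,
               fv_threshold: float, max_iters: int = 50):
    """Move the lens until the FV trend flips from rising to falling.

    capture_at(position) returns an image captured at that lens position;
    score(image) returns its focus evaluation value (e.g. the DCT-based FV
    above). Returns (l_prev, l_curr) bracketing the focus evaluation peak.
    """
    direction = +1  # Dira; becomes -1 (Dirb) when the FV trend turns negative
    l_prev, fv_prev = l0, score(capture_at(l0))
    for _ in range(max_iters):
        l_curr = l_prev + direction * step
        fv_curr = score(capture_at(l_curr))
        if fv_prev > fv_threshold and fv_curr > fv_threshold and fv_curr < fv_prev:
            return l_prev, l_curr  # FVn-1, FVn above FV' and trend now negative
        if fv_curr < fv_prev:      # trend negative below FV': reverse, halve step
            direction, step = -direction, step / 2.0
        l_prev, fv_prev = l_curr, fv_curr
    return l_prev, l_prev          # no peak bracketed within the budget
```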
On the basis of the above embodiments, this embodiment describes the step of determining the target focus area in the target image and the predicted focus area in the focus prediction image, and the step of comparing the two areas to obtain a focus comparison result. Specifically, the method of the embodiment comprises the following steps:
The target focus area in the target image is determined according to the sharpness of the target image, and the predicted focus area in the focus prediction image is determined according to the sharpness of the focus prediction image. The position information of the target focus area is compared with the position information of the predicted focus area to obtain the position deviation between the two areas, and the focus comparison result is determined according to the position deviation.
The sharpness may be determined according to the focus evaluation value of each pixel as in the foregoing embodiment, or by reference to existing image sharpness evaluation methods such as the Brenner gradient method, the Tenengrad gradient method, the Laplacian gradient method, the variance method, and the energy gradient method; the present application is not limited in this respect.
As described in the foregoing embodiment, since, given the lens's depth of field, a scene may produce a double peak of the focus evaluation function, it is necessary to determine whether the current lens position is near the focus position corresponding to the region of interest of the scene. Thus, Pn is input into the trained Diffusion Model for reverse diffusion processing, and the output of the diffusion model, namely the focus prediction image Pn', is obtained.
The partial image region with higher sharpness in the target image is determined as the target focus area, and the partial image region with higher sharpness in the focus prediction image is determined as the predicted focus area. Comparing the position information of the target focus area with that of the predicted focus area yields the position deviation between them. If the position deviation is less than or equal to a preset position deviation threshold, the focus comparison result indicates that the target focus area matches the predicted focus area, and it can be judged that the image acquisition device focuses accurately. Referring to fig. 2, a schematic diagram of the focus evaluation function double peak in the deep learning-based zoom method of the present application: although a peak (peak 1) exists between Ln-1 and Ln, if the position deviation between the target focus area of Pn and the predicted focus area of Pn' is greater than the position deviation threshold, the image region corresponding to peak 1 is clearly focused but is not the region of interest. The image acquisition device therefore needs to be controlled to continue zoom processing, and the hill-climbing algorithm continues until the curve passes through the valley of the focus evaluation function and a second peak (peak 2) is found; the lens position range (focal length range) near peak 2 is judged to be the focus range for accurate focusing on the region of interest.
On the basis of the above-described embodiments, the steps of determining a target focus area in a target image from the sharpness of the target image and determining a predicted focus area in a focus predicted image from the sharpness of the focus predicted image will be described. Specifically, the method of the embodiment comprises the following steps:
Image segmentation processing is performed on the target image and the focus prediction image respectively to obtain a plurality of target sub-images in the target image and a plurality of focus prediction sub-images in the focus prediction image; the target sub-image with the highest sharpness among the target sub-images is determined as the target focus area, and the focus prediction sub-image with the highest sharpness among the focus prediction sub-images is determined as the predicted focus area.
As described in connection with the foregoing embodiments, the methods of determining the target focus area and the predicted focus area in the present embodiment may be determined in accordance with the image block processing method, whereby the processing efficiency can be improved.
Image segmentation processing is performed on the target image and the focus prediction image respectively to obtain a plurality of target sub-images (or target image blocks) and a plurality of focus prediction sub-images (or prediction image blocks). As shown in fig. 3, an exemplary image segmentation schematic diagram in the deep learning-based zoom method of the present application, the target image Pn and the focus prediction image Pn' are each divided, according to resolution, into an N×M image grid matrix (each grid cell corresponding to one sub-image). The filling value of each cell may be determined from the sharpness or focus evaluation values of the pixels within it, using an average, a weighted average, a summation, or another calculation method; this is not limited here. For example, each cell is filled with a value from 0 to 10 according to its focus condition: the higher the cell's focus evaluation value or sharpness, the higher its filling value. Two filled N×M matrices are thus obtained, and the argmax function can be used to quickly determine where the maximum cell of each matrix lies, i.e., to find the target sub-image with the highest sharpness (the target focus area) and the focus prediction sub-image with the highest sharpness (the predicted focus area), as sketched below.
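A compact sketch of this grid comparison, assuming a simple gradient-energy score per cell in place of the patent's unspecified per-pixel sharpness measure:

```python
import numpy as np

def focus_cell(image: np.ndarray, n: int, m: int) -> tuple:
    """Split a grayscale image into an n x m grid and return the (row, col)
    of the cell with the highest sharpness score, as argmax does above."""
    h, w = image.shape
    scores = np.zeros((n, m))
    for r in range(n):
        for c in range(m):
            cell = image[r * h // n:(r + 1) * h // n,
                         c * w // m:(c + 1) * w // m].astype(np.float64)
            gy, gx = np.gradient(cell)
            scores[r, c] = np.mean(gy ** 2 + gx ** 2)  # energy-gradient proxy
    return tuple(np.unravel_index(np.argmax(scores), scores.shape))
```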
On the basis of the above embodiments, the steps of determining the focus contrast result from the positional deviation will be described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
In response to the position deviation being greater than a preset deviation threshold, it is judged that the image acquisition device fails to focus, and the focus comparison result is obtained; in response to the position deviation being less than or equal to the preset deviation threshold, it is judged that the image acquisition device focuses successfully, and the focus comparison result is obtained.
The description is given in connection with the previous embodiment and can still be understood in connection with fig. 3. If the position deviation between the target focusing area and the predicted focusing area is larger than a preset deviation threshold, judging that the focusing of the image acquisition equipment fails (or is not focused accurately), and obtaining a focusing comparison result. If the position deviation is smaller than or equal to a preset deviation threshold, judging that the focusing of the image acquisition equipment is successful (or the focusing is accurate), and obtaining a focusing comparison result.
If the position deviation between the target focus area of the target image Pn, selected between Ln-1 and Ln, and the predicted focus area of the focus prediction image Pn' is less than or equal to the preset deviation threshold, it can be determined that the interval [Ln-1, Ln] is the focus range corresponding to accurate focusing on the region of interest; a minimal check of this kind is sketched below. Otherwise, zoom processing of the image acquisition device needs to continue.
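A minimal sketch of the comparison step; the Euclidean distance between grid positions is an assumed deviation measure, since the patent does not fix the metric.

```python
def focus_matches(target_cell, predicted_cell, deviation_threshold: float) -> bool:
    """True when the captured focus area and the diffusion model's predicted
    focus area agree within the preset position deviation threshold."""
    dr = target_cell[0] - predicted_cell[0]
    dc = target_cell[1] - predicted_cell[1]
    return (dr * dr + dc * dc) ** 0.5 <= deviation_threshold
```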
On the basis of the above embodiments, the steps of performing the zooming process on the image pickup apparatus according to the focus comparison result will be described. Specifically, the method of the embodiment comprises the following steps:
In response to the focus comparison result indicating that the image acquisition device fails to focus, zoom processing is performed on the device according to a preset zoom step, and a third image and a fourth image with different image acquisition focal lengths are obtained after the zoom processing. Focus evaluation processing is performed on the third and fourth images respectively to obtain a third focus evaluation value and a fourth focus evaluation value. In response to a peak existing between the third and fourth focus evaluation values, the preset zoom step is reduced to obtain a reduced zoom step, and zoom processing is performed on the device according to the reduced zoom step until the focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than a preset focus error threshold.
As described in connection with the foregoing embodiments, if the focus comparison result indicates that the image acquisition device fails to focus (is not focused accurately), zoom processing of the device needs to continue. For example, following the foregoing embodiment, after it is judged from Ln-1 and Ln that the device is not focused accurately, iterative zoom adjustment is performed on the device until, based on the focus evaluation values of subsequently acquired images, it is judged that the device focuses accurately, at which point the zoom adjustment is suspended or stopped.
Illustratively, if the focus comparison result indicates that the image acquisition device fails to focus, the device may be zoomed according to the preset zoom step ΔL. For the relationship between the third image and the fourth image, refer to the description of the relationship between the first image and the second image in the foregoing embodiment, which is not repeated here. In this embodiment, the case in which the third image is acquired earlier than the fourth image is taken as the example: this is equivalent to adjusting the lens by the step ΔL to acquire the third image, and by a further ΔL to acquire the fourth image.
In response to a peak existing between the third focus evaluation value and the fourth focus evaluation value, the preset zoom step is reduced to obtain a reduced zoom step, and zoom processing is performed on the image acquisition device according to the reduced zoom step; the device then performs image acquisition with the post-zoom acquisition parameters until the focus evaluation difference between a fifth image and a sixth image acquired by the device is smaller than the preset focus error threshold. Once the correct focus range has been determined, the preset zoom step is reduced mainly to shrink the focusing amplitude during zooming (the reduced zoom step may be denoted Δl); zoom processing then continues based on the hill-climbing algorithm, so that the device can approach the focus evaluation peak of the focus evaluation function accurately, until the difference between the FVs of two images is smaller than the tolerable focus error threshold.
It can be understood that the embodiment of the present application mainly takes the double-peak case as an example. If the image region corresponding to peak 1, determined between the first image and the second image, is not the region of interest, then after hill-climbing zoom processing, once peak 2 is determined between the third image and the fourth image, the image region corresponding to peak 2 can be directly determined as the region of interest, without inputting the acquired image into the diffusion model again for prediction and comparing the focus area deviation of the two images, thereby improving execution efficiency. Focusing on the region of interest can therefore be completed by determining the lens position (equivalently, the focal length) corresponding to peak 2 through zooming with the reduced zoom step, as in the sketch below. Of course, the present application is not limited to these steps; to improve accuracy, the acquired image may again be input into the diffusion model for prediction, followed by comparison of the focus area deviation of the two images.
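A sketch of this refinement stage, under the same assumed device interface as the hill-climbing sketch above: once peak 2 is bracketed, the step is halved and the worse end of the bracket is pulled inward until two consecutive captures differ by less than the focus error threshold.

```python
def refine_focus(capture_at, score, l_lo: float, l_hi: float,
                 step: float, fv_error_threshold: float,
                 max_iters: int = 50) -> float:
    """Shrink the bracket [l_lo, l_hi] around the focus evaluation peak."""
    step /= 2.0  # the reduced zoom step (denoted delta-l above)
    fv_lo, fv_hi = score(capture_at(l_lo)), score(capture_at(l_hi))
    for _ in range(max_iters):
        if abs(fv_hi - fv_lo) < fv_error_threshold:
            break                  # peak located; stop the lens motor here
        if fv_lo < fv_hi:          # move the worse end of the bracket inward
            l_lo += step
            fv_lo = score(capture_at(l_lo))
        else:
            l_hi -= step
            fv_hi = score(capture_at(l_hi))
        step /= 2.0
    return l_hi
```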
On the basis of the above embodiment, the steps of performing zoom processing on the image capturing device according to the zoomed-out zoom step size until the focus evaluation difference between the fifth image and the sixth image captured by the image capturing device is smaller than the preset focus error threshold are described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
The image acquisition focal length of the image acquisition device is adjusted according to the reduced zoom step to obtain an adjusted image acquisition focal length, and the device is controlled to perform image acquisition according to the adjusted focal length to obtain a fifth image and a sixth image, wherein the image acquisition focal lengths of the fifth image and the sixth image are different.
As described in the foregoing embodiments, after it is determined that peak 2 exists between the third image and the fourth image, zoom processing continues according to the reduced zoom step Δl, and the adjusted image acquisition focal length is obtained based on the hill-climbing zoom method of the foregoing embodiment. The image acquisition device is controlled to acquire images according to the adjusted focal length to obtain the fifth image and the sixth image. Their image acquisition focal lengths differ, and their relationship is the same as that described for the first and second images above, which is not repeated here. The fifth and sixth images may be acquired after zooming one or more times on the basis of the third and fourth images; this is not limited here.
On the basis of the above embodiment, this embodiment describes the steps that follow the controlling of the image acquisition device to perform image acquisition according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image. Specifically, the method of the embodiment comprises the following steps:
The method comprises the steps of respectively carrying out focusing evaluation processing on a fifth image and a sixth image to obtain a fifth focusing evaluation value corresponding to the fifth image and a sixth focusing evaluation value corresponding to the sixth image, determining a focusing evaluation difference value according to the fifth focusing evaluation value and the sixth focusing evaluation value, and suspending or stopping zooming processing on the image acquisition equipment in response to the focusing evaluation difference value being smaller than or equal to a focusing error threshold value.
As described in connection with the foregoing embodiment, after the processing of the foregoing embodiment, a focal length corresponding to the focus evaluation peak lies between the focal length at which the fifth image is acquired and the focal length at which the sixth image is acquired. The fifth focus evaluation value of the fifth image and the sixth focus evaluation value of the sixth image can therefore be obtained, and it can be determined whether the focus evaluation difference between them is less than or equal to the focus error threshold. If the difference is greater than the threshold, the fifth and sixth images still differ considerably, the range containing the focus evaluation peak has not yet been located accurately, and hill-climbing zoom processing must continue. If the difference is less than or equal to the threshold, the two images differ little, the range containing the focus evaluation peak has been located accurately, and a lens position (or focal length) within the interval from the position corresponding to the fifth image to the position corresponding to the sixth image may be selected as the lens position (or focal length) that focuses accurately on the region of interest. For example, the lens motor may be stopped directly at the lens position (or focal length) at which the sixth image was acquired, completing focusing on the region of interest.
On the basis of the above embodiment, the embodiment of the present application describes steps of performing focus evaluation processing on a first image and a second image acquired by an image acquisition device, to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image, respectively. Specifically, the method of the embodiment comprises the following steps:
The gray value of the first image is obtained, and the first focus evaluation value is calculated according to the preset focus evaluation function and the gray value of the first image.
The present embodiment will be described mainly with reference to the foregoing embodiments, in which the focus evaluation processing procedure of the first image is explained. It can be understood that the focus evaluation processing procedure of other images in the present application can be referred to in the same way, and thus will not be described in detail.
Specifically, the gray value of each pixel point of the first image and the number of the pixel points of the first image can be obtained, and the gray value and the number of the pixel points to be subjected to focusing evaluation are input into a preset focusing evaluation function, so that the focusing evaluation value output by the focusing evaluation function can be obtained.
On the basis of the above embodiments, the steps before inputting the target image acquired by the image acquisition device into the diffusion model trained in advance to obtain the focus prediction image output by the diffusion model will be described in the embodiments of the present application. Specifically, the method of the embodiment comprises the following steps:
The obtained sample image, which is focused on a target area, is input into an initial model to be trained, so that the initial model performs forward diffusion processing on the sample image to obtain a noise image; reverse diffusion processing is performed on the noise image according to the initial model to obtain a denoised image and a model loss value corresponding to the denoised image; and model training is performed on the initial model according to the loss value to obtain the diffusion model.
In connection with the foregoing embodiment, before inputting the target image into the diffusion model, model training is required using the real image which is clearly focused on the region of interest as a sample, thereby enabling the diffusion model after training to generate a focus prediction image with accurate focus from the input target image.
Illustratively, a training sample (sample image) set first needs to be constructed. A group of real images, sufficient in number and clearly focused on the region of interest, is obtained as training samples. In each forward diffusion pass of the Diffusion Model, Gaussian noise is added to a sample image x0 over T accumulated steps to obtain x1, x2, ..., xT. Referring to fig. 4, a schematic diagram of forward diffusion in the deep learning-based zoom method of the present application, the mathematical expression of the noise superposition is:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)$$

where β_t denotes the variance of the Gaussian distribution at each step of the series, a hyperparameter that must be set manually and that determines the step schedule of training. For the detailed principle of this formula, refer to existing descriptions of the forward diffusion noise superposition method, which are not repeated here.
Referring to fig. 5, a schematic diagram of constructing training samples in the deep learning-based zoom method of the present application: as t increases, xt gets ever closer to pure noise, and as T tends to infinity, xT becomes complete Gaussian noise. To balance training effect against training duration, T may be set to a suitable value according to the actual application scenario; this is not limited here. Applying the forward diffusion operation above to S real images clearly focused on the region of interest yields a training sample set that serves as input for the subsequent reverse diffusion training, as sketched below.
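A minimal sketch of this sample construction, using the standard DDPM closed-form jump from x0 to xt (an assumption consistent with, but not spelled out by, the formula above); the β schedule values are illustrative.

```python
import torch

T = 1000                                          # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)             # variance schedule beta_t
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative alpha products

def forward_diffuse(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) for a sample image x0 that is sharply
    focused on the region of interest."""
    eps = torch.randn_like(x0)
    return alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * eps
```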
Further, the reverse diffusion process in Diffusion Model training is the denoising inference process. Referring to fig. 6, a schematic diagram of reverse diffusion in the deep learning-based zoom method of the present application: the forward-diffused Markov chain is traversed in reverse, sampling from q(x_{t-1} | x_t) to convert noise back into samples of the original target distribution and generate new image data x0'. The conditional probability distribution learned in training can be expressed as:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
By the Bayes formula, it can be known that:

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t \mathbf{I}\right)$$
The application may select the minimized negative log-likelihood as the training loss, $\mathcal{L} = \mathbb{E}_q\left[-\log p_\theta(x_0)\right]$. When the loss converges to its minimum, model training ends, and the diffusion model of the application is obtained.
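For illustration, one training step might look like the sketch below. The patent states the loss as the minimized negative log-likelihood E_q[−log p_θ(x0)]; the noise-prediction MSE used here is the usual DDPM surrogate for that objective and is an assumption, as is the model signature model(x_t, t).

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, x0: torch.Tensor, optimizer) -> float:
    """One reverse-diffusion training step on a batch of focused samples x0."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # forward diffusion
    loss = F.mse_loss(model(xt, t), eps)                 # predict the added noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```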
After the operation that triggers refocusing of the lens of the image acquisition device is completed (for example, when the dome camera's pan-tilt head changes from rotating to stationary and the tracking and zooming processes end), the image acquisition device can be controlled, according to the method of the foregoing embodiments, to acquire an initial image and a target image, and the two images can be input into the diffusion model to obtain the focus evaluation results for the two images output by the diffusion model.
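Elsewhere in this document the model's output is described as a focus prediction image; how the captured image conditions that generation is not spelled out here, so the sketch below makes one plausible assumption: partially noise the captured target image and run the learned reverse chain back to t = 0 (ancestral DDPM sampling). The function name and the choice of starting step are ours, and the schedule tensors come from the earlier sketches.

```python
import torch

@torch.no_grad()
def sample_focus_prediction(eps_model, x_target: torch.Tensor, t_start: int = 250) -> torch.Tensor:
    """Generate a focus prediction image by partially noising the captured target
    image, then denoising step by step with the learned reverse distribution."""
    abar = alpha_bars[t_start]
    x = abar.sqrt() * x_target + (1.0 - abar).sqrt() * torch.randn_like(x_target)
    for t in reversed(range(t_start)):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = eps_model(x, t_batch)
        # Posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```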
It should be further noted that the execution subject of the deep-learning-based zoom method may be a deep-learning-based zoom apparatus. For example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be user equipment (UE), a computer, a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the deep-learning-based zoom method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Fig. 7 is a schematic diagram of a deep-learning-based zoom apparatus according to an exemplary embodiment of the present application. As shown in fig. 7, the exemplary deep-learning-based zoom apparatus 700 includes a focus prediction module 710, a focus comparison module 720, and a zoom processing module 730. Specifically:
The focus prediction module 710 is configured to input the target image acquired by the image acquisition device into a pre-trained diffusion model, and obtain a focus prediction image output by the diffusion model.
The focus comparison module 720 is configured to compare the target focus area in the target image with the predicted focus area in the focus prediction image to obtain a focus comparison result.

The zoom processing module 730 is configured to perform zoom processing on the image acquisition device according to the focus comparison result.
In the deep-learning-based zoom apparatus, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the target area of the target image acquired under the current zoom parameters matches the predicted focus area predicted by the diffusion model. The image acquisition device is then zoomed according to the focus comparison result. In this way, the image acquisition device can be zoomed on the basis of deep learning, improving the zoom accuracy of the zooming process.
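To make the interplay of the three modules concrete, the sketch below strings their responsibilities together: locate the sharpest region in each image, measure the position deviation between the two regions, and decide whether a zoom adjustment is needed. It is a minimal sketch; the grid-based sharpness measure, the pixel threshold, and the function names are our assumptions, not the patent's specification.

```python
import numpy as np

def sharpest_block_center(img: np.ndarray, grid: int = 8) -> tuple:
    """Split a grayscale image into grid x grid blocks and return the centre
    (row, col) of the block with the highest gray-value variance."""
    h, w = img.shape[:2]
    bh, bw = h // grid, w // grid
    best_score, best_ij = -1.0, (0, 0)
    for i in range(grid):
        for j in range(grid):
            block = img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            score = float(block.var())       # simple sharpness proxy
            if score > best_score:
                best_score, best_ij = score, (i, j)
    i, j = best_ij
    return (i * bh + bh // 2, j * bw + bw // 2)

def needs_zoom(target_img: np.ndarray, predicted_img: np.ndarray,
               deviation_thresh: float = 20.0) -> bool:
    """Return True when the target focus region deviates from the predicted
    focus region by more than the preset threshold (focus judged failed)."""
    ty, tx = sharpest_block_center(target_img)
    py, px = sharpest_block_center(predicted_img)
    deviation = float(np.hypot(ty - py, tx - px))
    return deviation > deviation_thresh
```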
It should be noted that the apparatus provided in the foregoing embodiments and the method provided in the foregoing embodiments belong to the same concept; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiments and is not repeated here. In practical applications, the apparatus provided in the above embodiments may distribute the functions among different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above, which is not limited herein.

For the function of each module, reference may be made to the embodiments of the deep-learning-based zoom method, which are not repeated here.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 100 comprises a memory 101 and a processor 102, where the processor 102 is configured to execute program instructions stored in the memory 101 to implement the steps of any of the embodiments of the deep-learning-based zoom method described above. In a specific implementation scenario, the electronic device 100 may include, but is not limited to, a microcomputer or a server; furthermore, the electronic device 100 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 102 is configured to control itself and the memory 101 to implement the steps in any of the deep-learning-based zoom method embodiments described above. The processor 102 may also be referred to as a CPU (Central Processing Unit). The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 102 may be implemented jointly by integrated circuit chips.
In the electronic device, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the target area of the target image acquired under the current zoom parameters matches the predicted focus area predicted by the diffusion model, and the image acquisition device is zoomed according to the focus comparison result. Zooming of the image acquisition device is thus realized on the basis of deep learning, improving the zoom accuracy of the zooming process.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 110 stores program instructions 111 executable by a processor, the program instructions 111 being for implementing the steps in any of the deep-learning-based zoom method embodiments described above.

By running the program instructions in the storage medium, the target image acquired by the image acquisition device is input into the pre-trained diffusion model, which performs diffusion processing on the received target image to produce the focus prediction image. The target focus area in the target image is compared with the predicted focus area in the focus prediction image to obtain a focus comparison result, from which it can be judged whether the target area of the target image acquired under the current zoom parameters matches the predicted focus area predicted by the diffusion model, and the image acquisition device is zoomed according to the focus comparison result. Zooming of the image acquisition device is thus realized on the basis of deep learning, improving the zoom accuracy of the zooming process.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments emphasizes the differences between them; for what is identical or similar, the embodiments may be referred to one another, and such content is not repeated herein for the sake of brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
In addition, each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If implemented in the form of a software functional unit and sold or used as a stand-alone product, the integrated unit may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims (12)

1. A zoom method based on deep learning, characterized in that the method comprises:

inputting a target image acquired by an image acquisition device into a pre-trained diffusion model to obtain a focus prediction image output by the diffusion model, wherein the target image is either of a first image and a second image acquired by the image acquisition device, and a focus evaluation peak exists between the focus evaluation values of the first image and the second image;

comparing a target focus area in the target image with a predicted focus area in the focus prediction image to obtain a focus comparison result, wherein this step comprises: comparing position information of the target focus area with position information of the predicted focus area to obtain a position deviation between the target focus area and the predicted focus area, and determining the focus comparison result according to the position deviation; and, in response to the position deviation being greater than a preset deviation threshold, determining that the image acquisition device has failed to focus, thereby obtaining the focus comparison result; and

performing zoom processing on the image acquisition device according to the focus comparison result, comprising: performing zoom processing on the image acquisition device according to a preset zoom step until another focus evaluation peak exists between the focus evaluation values of two adjacent frames of images acquired by the image acquisition device during the zoom processing and the focus evaluation difference between the focus evaluation values of the two adjacent frames is less than a preset focus error threshold.

2. The method according to claim 1, characterized in that before the step of inputting the target image acquired by the image acquisition device into the pre-trained diffusion model to obtain the focus prediction image output by the diffusion model, the method further comprises:

performing focus evaluation processing on a first image and a second image acquired by the image acquisition device respectively to obtain a first focus evaluation value corresponding to the first image and a second focus evaluation value corresponding to the second image, wherein the first image and the second image are acquired at different focal lengths;

determining, according to the first focus evaluation value, the second focus evaluation value and a preset focus evaluation threshold, whether a focus evaluation peak exists between the first focus evaluation value and the second focus evaluation value; and

in response to the focus evaluation peak existing between the first focus evaluation value and the second focus evaluation value, determining the first image or the second image as the target image.

3. The method according to claim 1, characterized in that before the step of comparing the target focus area in the target image with the predicted focus area in the focus prediction image, the method further comprises:

determining the target focus area in the target image according to the sharpness of the target image, and determining the predicted focus area in the focus prediction image according to the sharpness of the focus prediction image.

4. The method according to claim 3, characterized in that the step of determining the target focus area in the target image according to the sharpness of the target image, and determining the predicted focus area in the focus prediction image according to the sharpness of the focus prediction image, comprises:

performing image segmentation processing on the target image and the focus prediction image respectively to obtain a plurality of target sub-images of the target image and a plurality of focus prediction sub-images of the focus prediction image; and

determining the target sub-image with the highest sharpness among the target sub-images as the target focus area, and determining the focus prediction sub-image with the highest sharpness among the focus prediction sub-images as the predicted focus area.

5. The method according to claim 1, characterized in that the step of determining the focus comparison result according to the position deviation comprises:

in response to the position deviation being less than or equal to the preset deviation threshold, determining that the image acquisition device has focused successfully, thereby obtaining the focus comparison result.

6. The method according to claim 5, characterized in that the step of performing zoom processing on the image acquisition device according to the focus comparison result comprises:

in response to the focus comparison result indicating that the image acquisition device has failed to focus, performing zoom processing on the image acquisition device according to a preset zoom step, and acquiring a third image and a fourth image captured by the image acquisition device after the zoom processing, wherein the third image and the fourth image are acquired at different focal lengths;

performing focus evaluation processing on the third image and the fourth image respectively to obtain a third focus evaluation value corresponding to the third image and a fourth focus evaluation value corresponding to the fourth image;

in response to a peak existing between the third focus evaluation value and the fourth focus evaluation value, reducing the preset zoom step to obtain a reduced zoom step; and

performing zoom processing on the image acquisition device according to the reduced zoom step until the focus evaluation difference between a fifth image and a sixth image acquired by the image acquisition device is less than a preset focus error threshold.

7. The method according to claim 6, characterized in that the step of performing zoom processing on the image acquisition device according to the reduced zoom step until the focus evaluation difference between the fifth image and the sixth image acquired by the image acquisition device is less than the preset focus error threshold comprises:

adjusting the image acquisition focal length of the image acquisition device according to the reduced zoom step to obtain an adjusted image acquisition focal length; and

controlling the image acquisition device to perform image acquisition according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image, wherein the fifth image and the sixth image are acquired at different focal lengths.

8. The method according to claim 7, characterized in that after the step of controlling the image acquisition device to perform image acquisition according to the adjusted image acquisition focal length to obtain the fifth image and the sixth image, the method further comprises:

performing focus evaluation processing on the fifth image and the sixth image respectively to obtain a fifth focus evaluation value corresponding to the fifth image and a sixth focus evaluation value corresponding to the sixth image;

determining the focus evaluation difference according to the fifth focus evaluation value and the sixth focus evaluation value; and

in response to the focus evaluation difference being less than or equal to the focus error threshold, pausing or stopping the zoom processing of the image acquisition device.

9. The method according to claim 2, characterized in that the step of performing focus evaluation processing on the first image and the second image acquired by the image acquisition device respectively to obtain the first focus evaluation value corresponding to the first image and the second focus evaluation value corresponding to the second image comprises:

acquiring the gray values of the first image; and

calculating the first focus evaluation value according to a preset focus evaluation function and the gray values of the first image.

10. The method according to claim 1, characterized in that before the step of inputting the target image acquired by the image acquisition device into the pre-trained diffusion model to obtain the focus prediction image output by the diffusion model, the method further comprises:

inputting an acquired sample image into an initial model to be trained, so that the initial model performs forward diffusion processing on the sample image to obtain a noise image, wherein the sample image is focused on a target area;

performing reverse diffusion processing on the noise image according to the initial model to obtain a denoised image and a model loss value corresponding to the denoised image; and

training the initial model according to the loss value to obtain the diffusion model.

11. An electronic device, characterized by comprising a memory and a processor, wherein the processor is configured to execute program instructions stored in the memory to implement the method according to any one of claims 1 to 10.

12. A computer-readable storage medium having program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.
CN202411639743.3A 2024-11-18 2024-11-18 Zoom method, device and storage medium based on deep learning Active CN119155549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411639743.3A CN119155549B (en) 2024-11-18 2024-11-18 Zoom method, device and storage medium based on deep learning

Publications (2)

Publication Number Publication Date
CN119155549A CN119155549A (en) 2024-12-17
CN119155549B true CN119155549B (en) 2025-03-21

Family

ID=93802033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411639743.3A Active CN119155549B (en) 2024-11-18 2024-11-18 Zoom method, device and storage medium based on deep learning

Country Status (1)

Country Link
CN (1) CN119155549B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117119172A (en) * 2023-09-21 2023-11-24 杭州海康慧影科技有限公司 Automatic focusing effect detection method, device and equipment

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4207980B2 (en) * 2006-06-09 2009-01-14 ソニー株式会社 IMAGING DEVICE, IMAGING DEVICE CONTROL METHOD, AND COMPUTER PROGRAM
EP3089449B1 (en) * 2015-04-30 2020-03-04 InterDigital CE Patent Holdings Method for obtaining light-field data using a non-light-field imaging device, corresponding device, computer program product and non-transitory computer-readable carrier medium
CN107888819B (en) * 2016-09-29 2020-07-07 华为技术有限公司 Automatic focusing method and device
CN115567778A (en) * 2021-07-01 2023-01-03 深圳绿米联创科技有限公司 Automatic focusing method and device, electronic equipment and storage medium
CN114387327B (en) * 2021-12-21 2024-03-12 陕西师范大学 Synthetic aperture focusing imaging method based on deep learning parallax prediction
CN116506730A (en) * 2022-01-17 2023-07-28 北京小米移动软件有限公司 Method and device for determining focus area, electronic equipment, and storage medium
CN114760419B (en) * 2022-06-15 2022-09-20 深圳深知未来智能有限公司 Automatic focusing method and system based on deep learning
JP2024104391A (en) * 2023-01-24 2024-08-05 浜松ホトニクス株式会社 Focus position estimation method, focus position estimation program, focus position estimation system, model generation method, model generation program, model generation system, and focus position estimation model
WO2024159082A2 (en) * 2023-01-27 2024-08-02 Google Llc Monocular depth and optical flow estimation using diffusion models
CN117528234A (en) * 2023-10-31 2024-02-06 重庆大学 Deep learning camera automatic focusing method and system based on each pixel supervision


Also Published As

Publication number Publication date
CN119155549A (en) 2024-12-17

Similar Documents

Publication Publication Date Title
CN113837079B (en) Automatic focusing method, device, computer equipment and storage medium of microscope
Wang et al. Deep learning for camera autofocus
KR101643607B1 (en) Method and apparatus for generating of image data
CN103595920B (en) Image acquisition equipment, method and device for assisting focusing during zooming
US20150124059A1 (en) Multi-frame image calibrator
CN112367474B (en) Self-adaptive light field imaging method, device and equipment
JP2013531268A (en) Measuring distance using coded aperture
EP2733923A2 (en) Multiresolution depth from defocus based autofocus
CN112333379A (en) Image focusing method and device and image acquisition equipment
CN114913171A (en) Image out-of-focus detection method and device, electronic equipment and storage medium
CN112351196B (en) Image clarity determination method, image focusing method and device
WO2019104670A1 (en) Method and apparatus for determining depth value
KR20170101532A (en) Method for image fusion, Computer program for the same, and Recording medium storing computer program for the same
CN111885297B (en) Image definition determining method, image focusing method and device
WO2016161734A1 (en) Autofocusing method and device
CN115359108A (en) Depth prediction method and system based on defocusing under guidance of focal stack reconstruction
CN108376394B (en) Camera automatic focusing method and system based on frequency domain histogram analysis
CN119155549B (en) Zoom method, device and storage medium based on deep learning
CN111491105B (en) Focusing method of mobile terminal, mobile terminal and computer storage medium
CN118393711A (en) Real-time bright field focusing method for avoiding influence of impurities on surface of glass slide
CN118301471A (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
Wang et al. Intelligent autofocus
JP2019083580A (en) Image processing apparatus, image processing method, and program
CN116567413A (en) Automatic focusing method, system and storage medium based on convolutional neural network
CN116582750A (en) Focusing method, focusing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant