
CN112381083A - Saliency perception image clipping method based on potential region pair - Google Patents

Saliency perception image clipping method based on potential region pair

Info

Publication number
CN112381083A
CN112381083A (application CN202010538411.1A)
Authority
CN
China
Prior art keywords
saliency
network
roi
map
feature
Prior art date
Legal status
Withdrawn
Application number
CN202010538411.1A
Other languages
Chinese (zh)
Inventor
袁峰
徐武将
王冕
徐亦飞
李浬
桑葛楠
Current Assignee
Hangzhou Oying Network Technology Co ltd
Original Assignee
Hangzhou Oying Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Oying Network Technology Co ltd filed Critical Hangzhou Oying Network Technology Co ltd
Priority to CN202010538411.1A priority Critical patent/CN112381083A/en
Publication of CN112381083A publication Critical patent/CN112381083A/en
Priority to CN202110400578.6A priority patent/CN113159028B/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a saliency-aware image cropping method based on potential region pairs, which generates attractive cropped images by constructing a deep-learning-based image cropping framework. The framework includes a multi-scale CNN feature extractor, a deformable saliency-aware position-sensitive ROI (ROD) alignment operator, a twin fully connected network, and a hybrid loss function. The method makes full use of the saliency map: it uses saliency information to eliminate poor candidate cropping maps and prevent the model from over-fitting, and it integrates the saliency map into the pooling operator to help construct a saliency-aware receptive field capable of encoding content preference. The invention reveals the intrinsic mechanism of the cropping process as well as the internal connection of potential region pairs. It not only achieves a better aesthetic effect in image cropping, but also incurs a negligible computational burden.

Description

Saliency perception image clipping method based on potential region pair
Technical Field
The invention belongs to the technical field of artificial intelligence and relates to a saliency-aware image cropping method based on potential region pairs.
Background
Image cropping, which is intended to find an image crop with the best aesthetic quality, is widely used in image post-processing, visual recommendation, and image selection as an important technique. Especially when a large number of images need to be cropped, image cropping becomes a laborious task. Thus, automatic image cropping has recently attracted increased attention within the research community and industry.
Early cropping methods explicitly designed various hand-crafted features based on photographic knowledge (e.g., the rule of thirds and the center rule). With the development of deep learning, many researchers have devoted themselves to developing cropping methods in a data-driven manner, and the release of several benchmark datasets for comparison has greatly facilitated the progress of related research.
However, obtaining the best candidate cropping map is still extremely difficult, mainly for the following three reasons. 1) The potential of image saliency information has not been fully released. Previous saliency-based cropping methods focus on preserving the most important content in the best cropping map, but ignore the case in which the bounding rectangle of the saliency region lies near the boundary of the source image, so that the saliency region and the best cropping map only partially overlap. Moreover, the saliency information is used only for generating the candidate cropping maps and is not reused in the subsequent cropping modules. 2) The potential region pairs (region of interest (ROI) and region of discard (ROD)) and their internal laws are not well represented. In general, pairwise cropping methods explicitly form pairs from a source image and feed them into an automated cropping model, but the performance of such methods is often poor because the selection of source-image pairs is overly dependent on details and uncertain. 3) The traditional metrics for evaluating cropping methods are unreliable and inaccurate. In some cases, intersection over union (IoU) and boundary displacement error (BDE) are not sufficient to reliably evaluate the performance of a cropping method.
Disclosure of Invention
The invention aims to provide a saliency-aware image cropping method based on potential region pairs, so as to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
a salient image cropping method based on potential region pairs comprises the following steps:
step 1), generating a candidate cutting map based on a grid anchor by researching the criteria and procedures of professional photography.
And 2) describing the features of the source image by adopting a multi-scale and lightweight feature extraction network, and then clipping the extracted features by utilizing deformable interesting pooling and deformable uninteresting pooling.
And 3) training a twin aesthetic evaluation network, and predicting the aesthetic scores of the candidate cutting pictures by minimizing a mixing loss function.
Further, saliency-based candidate cropping maps are generated: an initial cropping map is first created based on the saliency region, and candidate cropping maps are then generated in a grid-anchor manner.
Further, the algorithm for creating the initial cropping map is as follows:
Input: the image I of size width (W) × height (H), the enlargement ratio λ_large, the reduction ratio λ_small, the area function area(·), and the closest distance Clo_Dis(Re_1, Re_2) between the outlines of two rectangles Re_1 and Re_2.
Output: the initial cropping map S_init_crop.
(The algorithm listing itself is provided as an image in the original publication; a rough sketch follows below.)
where s_1 ∈ (0, 1] and d_1 ∈ [0, 1] are the thresholds for case (b) and case (a) of Fig. 2, respectively.
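Because the algorithm listing survives only as an image, the following Python sketch illustrates one plausible reading of the three cases of Fig. 2. The helper names (area, clo_dis, create_initial_crop), the default values of λ_large, λ_small, s_1 and d_1, the specialization of Clo_Dis to a box nested inside the image, and the enlarge/shrink rule chosen per case are all illustrative assumptions, not the patented procedure.

# Hypothetical sketch only: the exact initial-crop rules are given as an image
# in the original publication, so the per-case handling below is an assumption.
def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def clo_dis(inner, W, H):
    """Closest distance between the outline of a box nested inside a W x H
    image and the image outline (i.e., the smallest margin)."""
    x1, y1, x2, y2 = inner
    return min(x1, y1, W - x2, H - y2)

def create_initial_crop(sal_box, W, H, lam_large=1.2, lam_small=0.8,
                        s1=0.1, d1=0.05):
    """sal_box: saliency bounding box (x1, y1, x2, y2) inside the W x H image I."""
    diag = (W ** 2 + H ** 2) ** 0.5
    if clo_dis(sal_box, W, H) / diag < d1:
        scale = lam_small      # case (a): saliency box near the image boundary
    elif area(sal_box) / (W * H) < s1:
        scale = lam_large      # case (b): saliency box is a small portion of I
    else:
        scale = 1.0            # case (c): use the saliency box directly
    cx, cy = (sal_box[0] + sal_box[2]) / 2.0, (sal_box[1] + sal_box[3]) / 2.0
    w = (sal_box[2] - sal_box[0]) * scale
    h = (sal_box[3] - sal_box[1]) * scale
    return (max(0.0, cx - w / 2), max(0.0, cy - h / 2),
            min(float(W), cx + w / 2), min(float(H), cy + h / 2))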
Further, the method for generating candidate cropping maps by means of grid anchors is shown in Fig. 2: the input image of size W × H is divided into M × N bins, and m_1, m_2, n_1, n_2 respectively denote the numbers of bins from the initial cropping map to the source image boundaries. The total number of candidate cropping maps follows from m_1, m_2, n_1 and n_2 (the counting formula is provided as an image in the original publication).
Constraints are then set: a qualified cropping map should cover more than a certain proportion of the input image, which excludes candidate cropping maps of unsuitable size:
area(S_crop) ≥ ρ · area(I)    (1)
where area(·) is the area function, and S_crop and S_sal respectively denote the cropping region and the saliency bounding-box region (equation (2) is provided as an image in the original publication).
The aesthetic quality of the cropping maps is further improved by constraining the aspect ratio of the crop to lie between α_1 and α_2 (equation (3) is provided as an image in the original publication), where α_1 and α_2 are set to 0.5 and 2, respectively (a sketch of these constraints follows below).
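As a concrete illustration, the following Python sketch enumerates grid-anchor candidates around the initial crop under the area-ratio and aspect-ratio constraints. The bin granularity (M, N), the value of ρ, and the enumeration over per-side extensions of the initial crop are assumptions for the example, not values taken from the patent.

# Illustrative sketch of grid-anchor candidate generation with the area and
# aspect-ratio constraints; M, N and rho are assumed default values.
def generate_candidates(init_crop, W, H, M=16, N=16, rho=0.4,
                        alpha1=0.5, alpha2=2.0):
    """init_crop: (x1, y1, x2, y2) initial cropping map inside a W x H image."""
    bw, bh = W / M, H / N                      # bin size of the M x N grid
    x1, y1, x2, y2 = init_crop
    m1, n1 = int(x1 // bw), int(y1 // bh)      # bins to the left / top boundary
    m2, n2 = int((W - x2) // bw), int((H - y2) // bh)  # bins to the right / bottom
    candidates = []
    for i in range(m1 + 1):                    # extend the left edge by i bins
        for j in range(m2 + 1):                # extend the right edge by j bins
            for k in range(n1 + 1):            # extend the top edge by k bins
                for l in range(n2 + 1):        # extend the bottom edge by l bins
                    cx1, cy1 = x1 - i * bw, y1 - k * bh
                    cx2, cy2 = x2 + j * bw, y2 + l * bh
                    w, h = cx2 - cx1, cy2 - cy1
                    if w * h < rho * W * H:              # constraint (1): area ratio
                        continue
                    if not (alpha1 <= w / h <= alpha2):  # constraint (3): aspect ratio
                        continue
                    candidates.append((cx1, cy1, cx2, cy2))
    return candidates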
Further, a multi-scale, lightweight feature extraction network is adopted to describe the features of the source image, as shown in module 1 of Fig. 1. Through the feature extraction network, a source image can be converted into an information-rich feature map that simultaneously represents the global context and the local context. The feature extraction network consists of two modules: a base network and a Feature Aggregation Module (FAM).
Further, the multi-scale features can effectively remove local interference elements and enhance the recognition capability of the features by considering the spatial relationship.
Further, the base network may be any effective convolutional neural network (CNN) model that captures image features while preserving a sufficient receptive field. The n-th and (n-1)-th layers are the last two layers of the base network, and skip connections between them can provide global context information to some extent.
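As an illustration, the sketch below builds such a base network from torchvision's MobileNetV2 (the detailed description mentions loading a pre-trained MobileNetV2). The exact split into the n-th and (n-1)-th stages and the concatenation-style skip connection are assumptions, not the patented design; any effective CNN could be substituted.

import torch
import torch.nn as nn
import torchvision

class BaseNetwork(nn.Module):
    """Sketch of the base feature extractor; MobileNetV2 is assumed here."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.mobilenet_v2(weights="DEFAULT").features
        self.stage_prev = backbone[:14]   # earlier, higher-resolution stage
        self.stage_last = backbone[14:18] # deepest stage of the base network

    def forward(self, x):
        f_prev = self.stage_prev(x)
        f_last = self.stage_last(f_prev)
        # Skip connection: upsample the deepest map and fuse it with the
        # previous one so the output keeps some global context.
        f_last = nn.functional.interpolate(
            f_last, size=f_prev.shape[-2:], mode="bilinear", align_corners=False)
        return torch.cat([f_prev, f_last], dim=1)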
Further, the FAM aims to compensate for the loss of global and multi-scale context during feature extraction. The FAM is executed as follows (a sketch is given after these steps):
Step 1, average pooling at different scales is first applied to generate several feature maps, each of which is then passed through a 3 × 3 convolutional layer.
Step 2, the low-dimensional feature maps are directly up-sampled by bilinear interpolation to the same size as the original n-th-layer feature map.
Step 3, the up-sampled feature maps from the different sub-branches are finally concatenated into the final output feature map.
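The three FAM steps above can be read as a pyramid-pooling-style module. The following PyTorch sketch is a minimal rendering of that reading; the pooling scales, the branch channel width, and the inclusion of the input map in the concatenation are assumptions for the example.

import torch
import torch.nn as nn

class FeatureAggregationModule(nn.Module):
    """Minimal sketch of the FAM steps: multi-scale average pooling, a 3 x 3
    convolution per scale, bilinear upsampling back to the input resolution,
    and concatenation. Pool scales and channel widths are illustrative."""
    def __init__(self, in_channels, branch_channels=32, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),                        # step 1: pool to s x s
                nn.Conv2d(in_channels, branch_channels, 3, padding=1),
            )
            for s in pool_sizes
        ])

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [x]
        for branch in self.branches:
            y = branch(x)
            # step 2: bilinear upsampling back to the original feature-map size
            y = nn.functional.interpolate(y, size=(h, w), mode="bilinear",
                                          align_corners=False)
            outs.append(y)
        # step 3: concatenate the upsampled maps into the final output feature map
        return torch.cat(outs, dim=1)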
Further, two saliency-guided alignment operators, namely saliency-aware deformable position-sensitive ROI align and ROD align, are used to focus on the cropped regions. The saliency information is combined with deformable PS ROI (PS ROD) pooling and some lightweight head designs to take full advantage of the feature representation.
Further, saliency-aware deformable PS ROI (ROD) pooling is defined by equations (4) and (5), which are provided as images in the original publication; f'(i, j) and f(i, j) are the output pooled feature map and the original feature map, respectively, (x_lf, y_lf) is the upper-left corner of the ROI (ROD), n is the number of pixels in a bin, (Δx, Δy) is the fractional offset learned from the fully connected (fc) layer, S is the saliency map, and S_ij(x, y) is 0 or 1. A sketch of the pooling is given below.
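Based on the variable definitions above (and on standard deformable position-sensitive ROI pooling), one plausible reading of the pooled value is an average over each bin of the feature values at offset positions, masked by the binary saliency indicator. The following sketch implements that reading; the function name, the (x_lf, y_lf, w, h) ROI convention, the 0.5 binarization threshold, and the single global offset (instead of per-bin learned offsets and position-sensitive score-map groups) are simplifying assumptions.

import torch

def saliency_ps_roi_pool(feature, saliency, roi, k=3, dx=0.0, dy=0.0):
    """Simplified, assumed reading of saliency-aware PS ROI (ROD) pooling:
    average the feature values inside each of the k x k bins of the offset ROI,
    keeping only pixels whose binary saliency indicator is 1.
    feature, saliency: 2-D tensors of the same H x W size.
    roi: (x_lf, y_lf, w, h) with (x_lf, y_lf) the upper-left corner."""
    H, W = feature.shape
    x_lf, y_lf, w, h = roi
    out = torch.zeros(k, k)
    bin_w, bin_h = w / k, h / k
    for i in range(k):
        for j in range(k):
            x0 = int(max(0, min(W - 1, round(x_lf + dx + j * bin_w))))
            y0 = int(max(0, min(H - 1, round(y_lf + dy + i * bin_h))))
            x1 = int(max(x0 + 1, min(W, round(x_lf + dx + (j + 1) * bin_w))))
            y1 = int(max(y0 + 1, min(H, round(y_lf + dy + (i + 1) * bin_h))))
            patch = feature[y0:y1, x0:x1]
            mask = (saliency[y0:y1, x0:x1] > 0.5).float()   # binary S_ij(x, y)
            out[i, j] = (patch * mask).sum() / patch.numel()  # n = pixels in the bin
    return out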
Further, as shown in Fig. 3, C is set to 8 to reduce the computation of the subsequent sub-network, and k is fixed to 3 in accordance with the composition pattern of the 3 × 3 grid. Bilinear interpolation is used to compute the exact values sampled in ROI (ROD) align, which addresses the rounding errors and misalignment issues that occur in saliency-aware deformable PS ROI (ROD) pooling; the resulting operator is named saliency-aware deformable PS ROI (ROD) align.
Further, as shown in block 2 of Fig. 1, F denotes the entire feature map generated by the feature extraction network, and F_ROI and F_ROD denote the feature maps of the ROI and the ROD, respectively. Saliency-aware deformable PS ROI alignment converts F_ROI into an aligned ROI feature map at 8 × 8 resolution. In branch 2, the ROD is first reconstructed according to mode 4, and four separable component ROD alignments with saliency perception are performed to generate the corresponding feature maps, followed by a 1 × 1 convolutional layer to reduce the channel size; all four feature maps are then concatenated into the aligned ROD feature map, denoted ROD_P4. On the one hand, F_ROI and the aligned ROI feature map are concatenated and fed into two fully connected layers for the final MOS prediction. On the other hand, this concatenated ROI feature map is copied, denoted ROI_D_P4, and fed together with ROD_P4 into the twin evaluation network.
Further, the twin network, shown in block 3 of Fig. 1, consists of two identical fully connected networks that share weights when extracting features from ROI_D_P4 and ROD_P4. The twin network takes the aligned feature maps as input and outputs the predicted aesthetic scores. ROI_D_P4 and ROD_P4 denote the input feature maps of the ROI and the ROD, respectively, and their prediction scores are denoted Φ(ROI_D_P4) and Φ(ROD_P4). The twin aesthetic evaluation network is trained under constraints expressed in terms of the area function area(·) and the area ratio γ, which is empirically set to 2/3 (the constraint equations are provided as images in the original publication). After the twin-network processing, the ranking loss for each potential pair is defined as follows:
l_rank(ROI_D_P4, ROD_P4) = max{0, Φ(ROD_P4) - Φ(ROI_D_P4)}    (7)
Let e_ij = g_ij - p_ij, where g_ij and p_ij are respectively the mean opinion score (MOS) and the predicted aesthetic score of the j-th cropping map of image i. To enhance robustness to outliers, a Huber loss is defined on e_ij (equation (8) is provided as an image in the original publication).
The final overall loss function (equation (9), provided as an image in the original publication) combines the Huber loss and the ranking loss, with the balancing parameter between the two terms empirically set to 1. If a saliency map is not available, all values of the saliency map are set to 0.
Compared with the prior art, the invention has the following beneficial technical effects:
the saliency perception image clipping method based on the potential region pair fully utilizes the saliency map, considers saliency information to eliminate poor candidate clipping maps, prevents the problem of excessive fitting of a model, and integrates the saliency information into a pooling operator to help build a saliency perception sense field capable of coding content preference.
The saliency-aware image cropping method based on potential region pairs reveals the intrinsic mechanism of the cropping process and the internal connection of potential region pairs. Specifically, four different ROD modes and various combinations of ROIs and RODs are designed for different cases, and the relative ranking order of ROIs and RODs is then learned through the ranking loss.
The saliency-aware image cropping method based on potential region pairs constructs a deep-learning-based image cropping framework to generate attractive cropped images. The framework includes a multi-scale CNN feature extractor, a deformable saliency-aware position-sensitive ROI (ROD) alignment operator, a twin fully connected network, and a hybrid loss function.
The saliency-aware image cropping method based on potential region pairs incurs a negligible computational burden while outperforming other methods on most metrics.
Drawings
Fig. 1 is a diagram of the overall network architecture of the present invention.
Fig. 2 illustrates saliency-based candidate cropping maps. The solid red boxes represent saliency regions, the dashed red boxes represent candidate cropping maps, and the solid blue boxes represent initial cropping maps. (a) The bounding box of the saliency region is located near the boundary of the given image. (b) The salient region is a small portion of the source image. (c) The salient region is directly used as the initial cropping map.
Fig. 3 illustrates saliency-aware deformable position-sensitive ROI pooling.
Fig. 4 is a diagram of four modes of ROD.
Detailed Description
GAICD_S dataset: the GAICD dataset was built by first collecting about 50K images from the Flickr website and then manually reducing them to 10K well-composed images. For each image, 19 annotators were invited to assign aesthetic scores to cropping maps of various aspect ratios using an annotation tool. The 1,236 images contain a total of 106,800 candidate cropping maps. As a condensed version of GAICD, GAICD_S contains 1,236 photographs with 100,641 reasonably annotated cropping maps.
For all samples, the short edge is resized to 256 by bilinear interpolation, and data augmentation is performed with several conventional operators (random adjustments of contrast, saturation, brightness, and hue, and random horizontal flipping).
In addition, the pixel values of all samples are scaled to [0, 1] and normalized using the mean and standard deviation computed on the ImageNet dataset. During training, a pre-trained MobileNetV2 model is loaded into the feature extraction network of the invention to mitigate overfitting. The network is trained with the Adam optimizer by minimizing the hybrid loss, with all hyper-parameters set to their default values. The initial learning rate lr is 1e-4, and the maximum number of epochs is set to 100. For the saliency maps, PoolNet is used to produce satisfactory saliency bounding boxes. Furthermore, batch normalization and dropout are also used in the twin evaluation network.
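Putting the training settings above together, here is a rough PyTorch sketch of the preprocessing pipeline and the optimization loop. The ColorJitter magnitudes are assumed values, and model, train_loader and loss_fn stand for the hypothetical cropping network, GAICD_S data loader, and hybrid loss sketched earlier; they are placeholders, not components defined by the patent.

import torch
import torchvision.transforms as T

# Preprocessing described above: short edge to 256, photometric jitter and
# horizontal flips (jitter magnitudes assumed), ImageNet mean/std normalization.
IMAGENET_MEAN, IMAGENET_STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
train_transform = T.Compose([
    T.Resize(256),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

def train(model, train_loader, loss_fn, max_epochs=100, lr=1e-4):
    """Adam optimization with the quoted settings; `model`, `train_loader` and
    `loss_fn` are hypothetical stand-ins for the components sketched earlier."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)  # minimize the hybrid loss
            loss.backward()
            optimizer.step()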

Claims (10)

1. A saliency-aware image cropping method based on potential region pairs, characterized by comprising the following steps:
step 1), generating saliency-based candidate cropping anchor boxes by studying the criteria and procedures of professional photography;
step 2), describing the features of the source image with a multi-scale, lightweight feature extraction network, and then pooling the extracted features over the cropped regions with deformable region-of-interest pooling and deformable region-of-discard pooling;
step 3), training a twin aesthetic evaluation network and predicting the aesthetic scores of the candidate cropping maps by minimizing a hybrid loss function.
2. The method according to claim 1, wherein, when generating the grid-anchor-based candidate cropping maps according to the criteria and procedures of professional photography, an initial cropping map is first created based on the salient region, and the candidate cropping maps are then generated in a grid-anchor manner.
3. The method according to claim 2, wherein the algorithm for creating the initial cropping map is as follows:
inputting: the image I of size width (W) × height (H), the enlargement ratio λ_large, the reduction ratio λ_small, the area function area(·), and the closest distance Clo_Dis(Re_1, Re_2) between the outlines of two rectangles Re_1 and Re_2;
outputting: the initial cropping region S_init_crop;
(the algorithm listing is provided as an image in the original publication)
wherein s_1 ∈ (0, 1] and d_1 ∈ (0, 1] are the thresholds for case (b) and case (a), respectively.
4. The method according to claim 2, wherein the candidate cropping maps are generated in the grid-anchor manner shown in Fig. 2:
wherein the input image of size W × H corresponds to an anchor frame of M × N blocks, and m_1, m_2, n_1, n_2 respectively denote the numbers of blocks from the initial cropping map to the source image boundaries; the total number of candidate cropping maps follows from m_1, m_2, n_1 and n_2 (the counting formula is provided as an image in the original publication);
and constraints are set: a qualified cropping map should cover more than a certain proportion of the input image, which excludes candidate cropping maps of unsuitable size:
area(S_crop) ≥ ρ · area(I)    (1)
wherein area(·) is the area function, and S_crop and S_sal respectively denote the cropping region and the saliency bounding-box region (equation (2) is provided as an image in the original publication); and the aesthetic quality of the cropping maps is improved by constraining the aspect ratio of the crop to lie between α_1 and α_2 (equation (3) is provided as an image in the original publication), where α_1 and α_2 are set to 0.5 and 2, respectively.
5. The method according to claim 1, wherein a multi-scale, lightweight feature extraction network is used to describe the features of the source image and two saliency-guided alignment operators are used to focus on the cropped regions, the feature extraction network converts the source image into an information-rich feature map that simultaneously represents the global context and the local context, and the feature extraction network consists of two modules: a base network and a Feature Aggregation Module (FAM).
6. The method according to claim 5, wherein the base network can be any effective convolutional neural network (CNN) model that captures image features while preserving a sufficient receptive field, the n-th and (n-1)-th layers are the last two layers of the base network, and skip connections can provide global context information to some extent.
7. The method according to claim 5, wherein the Feature Aggregation Module (FAM) aims to compensate for the loss of global and multi-scale context during feature extraction and is executed as follows:
step 1, average pooling at different scales is first applied to generate several feature maps, each of which is then passed through a 3 × 3 convolutional layer;
step 2, the low-dimensional feature maps are directly up-sampled by bilinear interpolation to the same size as the original n-th-layer feature map;
step 3, the up-sampled feature maps from the different sub-branches are finally concatenated into the final output feature map.
8. The method according to claim 1, wherein, when the multi-scale, lightweight feature extraction network is used to describe the source image and the two saliency-guided alignment operators are used to focus on the cropped regions, saliency-aware deformable position-sensitive ROI align and ROD align are used, and the saliency information is combined with deformable PS ROI (PS ROD) pooling and lightweight head designs to fully exploit the feature representation; saliency-aware deformable PS ROI (ROD) pooling is defined by equations (4) and (5), which are provided as images in the original publication, wherein f'(i, j) and f(i, j) are the output pooled feature map and the original feature map, respectively, (x_lf, y_lf) is the upper-left corner of the ROI (ROD), n is the number of pixels in a bin, (Δx, Δy) is the offset learned from the fully connected (fc) layer, S is the saliency map, and S_ij(x, y) is 0 or 1; further, as shown in Fig. 3, C is set to 8 to reduce the computation of the subsequent sub-network, k is fixed to 3 according to the composition pattern of the 3 × 3 grid, and bilinear interpolation is used to compute the exact values sampled in ROI (ROD) align, which solves the rounding and misalignment problems of saliency-aware deformable PS ROI (ROD) pooling; the resulting operator is named saliency-aware deformable PS ROI (ROD) align.
9. The method according to claim 1, wherein, when the twin aesthetic evaluation network is trained to predict the aesthetic scores of the candidate cropping maps by minimizing the hybrid loss function, as shown in block 2 of Fig. 1, F denotes the entire feature map generated by the feature extraction network, and F_ROI and F_ROD denote the feature maps of the ROI and the ROD, respectively; saliency-aware deformable PS ROI alignment converts F_ROI into an aligned ROI feature map at 8 × 8 resolution; in branch 2, the ROD is first reconstructed according to mode 4, and four separable component ROD alignments with saliency perception are performed to generate the corresponding feature maps, followed by a 1 × 1 convolutional layer to reduce the channel size, and all four feature maps are concatenated into the aligned ROD feature map, denoted ROD_P4; on the one hand, F_ROI and the aligned ROI feature map are concatenated and fed into the fully connected layers for the final MOS prediction; on the other hand, this concatenated ROI feature map is copied, denoted ROI_D_P4, and fed together with ROD_P4 into the twin evaluation network.
10. The method according to claim 1, wherein, when the twin aesthetic evaluation network is trained to predict the aesthetic scores of the candidate cropping maps by minimizing the hybrid loss function, as shown in block 3 of Fig. 1, the twin network consists of two identical fully connected networks that share weights when extracting features from ROI_D_P4 and ROD_P4; the twin network takes the aligned feature maps as input and outputs the predicted aesthetic scores; ROI_D_P4 and ROD_P4 denote the input feature maps of the ROI and the ROD, respectively, and their prediction scores are denoted Φ(ROI_D_P4) and Φ(ROD_P4); the twin aesthetic evaluation network is trained under constraints expressed in terms of the area function area(·) and the area ratio γ, which is empirically set to 2/3 (the constraint equations are provided as images in the original publication); after the twin-network processing, the ranking loss for each potential pair is defined as
l_rank(ROI_D_P4, ROD_P4) = max{0, Φ(ROD_P4) - Φ(ROI_D_P4)}    (7)
let e_ij = g_ij - p_ij, wherein g_ij and p_ij are respectively the mean opinion score (MOS) and the predicted aesthetic score of the j-th cropping map of image i; to enhance robustness to outliers, a Huber loss is defined on e_ij (equation (8) is provided as an image in the original publication); the final overall loss function (equation (9), provided as an image in the original publication) combines the Huber loss and the ranking loss, with the balancing parameter empirically set to 1; and if a saliency map is not available, all values of the saliency map are set to 0.
CN202010538411.1A 2020-06-12 2020-06-12 Saliency perception image clipping method based on potential region pair Withdrawn CN112381083A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010538411.1A CN112381083A (en) 2020-06-12 2020-06-12 Saliency perception image clipping method based on potential region pair
CN202110400578.6A CN113159028B (en) 2020-06-12 2021-04-14 Saliency-aware image cropping method and apparatus, computing device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010538411.1A CN112381083A (en) 2020-06-12 2020-06-12 Saliency perception image clipping method based on potential region pair

Publications (1)

Publication Number Publication Date
CN112381083A true CN112381083A (en) 2021-02-19

Family

ID=74586331

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010538411.1A Withdrawn CN112381083A (en) 2020-06-12 2020-06-12 Saliency perception image clipping method based on potential region pair
CN202110400578.6A Active CN113159028B (en) 2020-06-12 2021-04-14 Saliency-aware image cropping method and apparatus, computing device, and storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110400578.6A Active CN113159028B (en) 2020-06-12 2021-04-14 Saliency-aware image cropping method and apparatus, computing device, and storage medium

Country Status (1)

Country Link
CN (2) CN112381083A (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763391B (en) * 2021-09-24 2024-03-19 华中科技大学 Intelligent image cutting method and system based on visual element relation
CN115115941B (en) * 2021-11-09 2023-04-18 腾晖科技建筑智能(深圳)有限公司 Laser radar point cloud map rod-shaped target extraction method based on template matching
CN116168207A (en) * 2021-11-24 2023-05-26 北京字节跳动网络技术有限公司 Image clipping method, model training method, device, electronic equipment and medium
CN114119373A (en) * 2021-11-29 2022-03-01 广东维沃软件技术有限公司 Image cropping method and device and electronic equipment
CN114529558B (en) * 2022-02-09 2025-07-11 维沃移动通信有限公司 Image processing method, device, electronic device and readable storage medium
CN117152409B (en) * 2023-08-07 2024-09-27 中移互联网有限公司 Image clipping method, device and equipment based on multi-mode perception modeling

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311364B2 (en) * 2009-09-25 2012-11-13 Eastman Kodak Company Estimating aesthetic quality of digital images
US10002415B2 (en) * 2016-04-12 2018-06-19 Adobe Systems Incorporated Utilizing deep learning for rating aesthetics of digital images
WO2020034663A1 (en) * 2018-08-13 2020-02-20 The Hong Kong Polytechnic University Grid-based image cropping
CN109544524B (en) * 2018-11-15 2023-05-23 中共中央办公厅电子科技学院 Attention mechanism-based multi-attribute image aesthetic evaluation system
CN110084284A (en) * 2019-04-04 2019-08-02 苏州千视通视觉科技股份有限公司 Target detection and secondary classification algorithm and device based on region convolutional neural networks

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222904A (en) * 2021-04-21 2021-08-06 重庆邮电大学 Concrete pavement crack detection method for improving PoolNet network structure
WO2022256020A1 (en) * 2021-06-04 2022-12-08 Hewlett-Packard Development Company, L.P. Image re-composition
CN113724261A (en) * 2021-08-11 2021-11-30 电子科技大学 Fast image composition method based on convolutional neural network
CN113642710A (en) * 2021-08-16 2021-11-12 北京百度网讯科技有限公司 Network model quantification method, device, equipment and storage medium
CN113642710B (en) * 2021-08-16 2023-10-31 北京百度网讯科技有限公司 A quantification method, device, equipment and storage medium for a network model
CN113706546A (en) * 2021-08-23 2021-11-26 浙江工业大学 Medical image segmentation method and device based on lightweight twin network
CN113706546B (en) * 2021-08-23 2024-03-19 浙江工业大学 Medical image segmentation method and device based on lightweight twin network
CN114025099A (en) * 2021-11-25 2022-02-08 努比亚技术有限公司 A method, device and computer-readable storage medium for controlling composition of photographed images

Also Published As

Publication number Publication date
CN113159028B (en) 2022-04-05
CN113159028A (en) 2021-07-23


Legal Events

Code	Description
PB01	Publication
SE01	Entry into force of request for substantive examination
WW01	Invention patent application withdrawn after publication (application publication date: 20210219)