Vivoli et al., 2024 - Google Patents

One missing piece in vision and language: A survey on comics understanding

Document ID: 14708670203999436257
Author: Vivoli E; Souibgui M; Barsky A; Llabrés A; Bertini M; Karatzas D
Publication year: 2024
Publication venue: arXiv preprint arXiv:2409.09502

Snippet

Vision-language models have recently evolved into versatile systems capable of high performance across a range of tasks, such as document understanding, visual question answering, and grounding, often in zero-shot settings. Comics Understanding, a complex …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6201—Matching; Proximity measures
- G06K9/6202—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/34—Segmentation of touching or overlapping patterns in the image field
- G06K9/342—Cutting or merging image elements, e.g. region growing, watershed, clustering-based techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K2209/00—Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints

Similar Documents

Publication	Publication Date	Title
Anantrasirichai et al.	2022	Artificial intelligence in the creative industries: a review
Agnese et al.	2020	A survey and taxonomy of adversarial neural networks for text‐to‐image synthesis
Iyyer et al.	2017	The amazing mysteries of the gutter: Drawing inferences between panels in comic book narratives
Lin et al.	2023	Autoposter: A highly automatic and content-aware design system for advertising poster generation
US12277766B2 (en)	2025-04-15	Information generation method and apparatus
Zhao et al.	2022	Cartoon image processing: A survey
Vivoli et al.	2024	One missing piece in vision and language: A survey on comics understanding
Zhao et al.	2021	Selective region-based photo color adjustment for graphic designs
Tang et al.	2025	Generative ai for cel-animation: A survey
Jing et al.	2015	Content-aware video2comics with manga-style layout
Ueno et al.	2021	Continuous and gradual style changes of graphic designs with generative model
Bai et al.	2024	Intelligent artistic typography: A comprehensive review of artistic text design and generation
Vijendran et al.	2024	Artificial intelligence for geometry-based feature extraction, analysis and synthesis in artistic images: a survey
Zhang et al.	2022	AI video editing: A survey
Diviya et al.	2023	Deep neural architecture for natural language image synthesis for Tamil text using BASEGAN and hybrid super resolution GAN (HSRGAN)
Mei et al.	2020	Vision and language: from visual perception to content creation
Rigaud et al.	2018	Computer vision applied to comic book images
Gao et al.	2022	EL‐GAN: Edge‐Enhanced Generative Adversarial Network for Layout‐to‐Image Generation
Melistas et al.	2021	A Deep Learning Pipeline for the Synthesis of Graphic Novels.
Bousetouane	2025	Generative AI for Vision: A Comprehensive Study of Frameworks and Applications
Zou et al.	2018	Lucss: Language-based user-customized colourization of scene sketches
Sun et al.	2024	Learning Fine-Grained and Semantically Aware Mamba Representations for Tampered Text Detection in Images
Mukherjee et al.	2022	A review on generative adversarial networks
Hu et al.	2024	Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
CN118212326B (en)	2024-09-03	Visual text generation method, device, equipment and storage medium