Vivoli et al., 2024 - Google Patents
One missing piece in vision and language: A survey on comics understanding (Vivoli et al., 2024)
- Document ID
- 14708670203999436257
- Author
- Vivoli E
- Souibgui M
- Barsky A
- LLabres A
- Bertini M
- Karatzas D
- Publication year
- 2024
- Publication venue
- arXiv preprint arXiv:2409.09502
Snippet
Vision-language models have recently evolved into versatile systems capable of high performance across a range of tasks, such as document understanding, visual question answering, and grounding, often in zero-shot settings. Comics Understanding, a complex …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6201—Matching; Proximity measures
- G06K9/6202—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/20—Image acquisition
- G06K9/34—Segmentation of touching or overlapping patterns in the image field
- G06K9/342—Cutting or merging image elements, e.g. region growing, watershed, clustering-based techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K2209/00—Indexing scheme relating to methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
Similar Documents
| Publication | Title |
|---|---|
| Anantrasirichai et al. | Artificial intelligence in the creative industries: a review |
| Agnese et al. | A survey and taxonomy of adversarial neural networks for text‐to‐image synthesis |
| Iyyer et al. | The amazing mysteries of the gutter: Drawing inferences between panels in comic book narratives |
| Lin et al. | Autoposter: A highly automatic and content-aware design system for advertising poster generation |
| US12277766B2 (en) | Information generation method and apparatus |
| Zhao et al. | Cartoon image processing: A survey |
| Vivoli et al. | One missing piece in vision and language: A survey on comics understanding |
| Zhao et al. | Selective region-based photo color adjustment for graphic designs |
| Tang et al. | Generative AI for cel-animation: A survey |
| Jing et al. | Content-aware video2comics with manga-style layout |
| Ueno et al. | Continuous and gradual style changes of graphic designs with generative model |
| Bai et al. | Intelligent artistic typography: A comprehensive review of artistic text design and generation |
| Vijendran et al. | Artificial intelligence for geometry-based feature extraction, analysis and synthesis in artistic images: a survey |
| Zhang et al. | AI video editing: A survey |
| Diviya et al. | Deep neural architecture for natural language image synthesis for Tamil text using BASEGAN and hybrid super resolution GAN (HSRGAN) |
| Mei et al. | Vision and language: from visual perception to content creation |
| Rigaud et al. | Computer vision applied to comic book images |
| Gao et al. | EL‐GAN: Edge‐Enhanced Generative Adversarial Network for Layout‐to‐Image Generation |
| Melistas et al. | A Deep Learning Pipeline for the Synthesis of Graphic Novels |
| Bousetouane | Generative AI for Vision: A Comprehensive Study of Frameworks and Applications |
| Zou et al. | Lucss: Language-based user-customized colourization of scene sketches |
| Sun et al. | Learning Fine-Grained and Semantically Aware Mamba Representations for Tampered Text Detection in Images |
| Mukherjee et al. | A review on generative adversarial networks |
| Hu et al. | Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era |
| CN118212326B (en) | Visual text generation method, device, equipment and storage medium |