CN111343512B - Information acquisition method, display device and server
- Publication number: CN111343512B (application CN202010080037.5A)
- Authority
- CN
- China
- Prior art keywords
- target
- area
- image
- characters
- display
- Prior art date
- Legal status: Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8126—Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
- H04N21/8133—Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The application discloses an information acquisition method, a display device, and a server, belonging to the field of electronic technologies. The method includes: determining an object to be identified in a first area of an image displayed on a display screen; determining a target multimedia file indicated by characters in a second area of the image; determining, among a plurality of objects related to the target multimedia file, at least one candidate object of the same type as the object to be identified; determining, among the at least one candidate object, a target object whose features have a similarity with the features of the object to be identified greater than a similarity threshold; and controlling the display screen to display introduction information of the target object. This addresses the low accuracy of a smart television identifying an object in a movie poster from the object's features alone. The method and device are used for acquiring information.
Description
Technical Field
The present application relates to the field of electronic technologies, and in particular, to an information acquisition method, a display device, and a server.
Background
With the development of electronic technology, the functions of smart televisions have become increasingly rich.
A smart television can display posters of multiple movies, from which a user selects a movie to watch. Further, the smart television may have an image retrieval function, through which the user can obtain more information about these movies and select the one of greatest interest. For example, the user may trigger the smart television to capture the image it displays, which may include the posters of multiple movies. A movie poster usually contains several objects (e.g., the persons starring in the movie); the smart television can identify an object according to its features in the poster (e.g., a person's facial features) and obtain information about the object (e.g., the person's age, height, and filmography). The user can then select the movie best matching his or her interest based on the obtained information.
However, identifying an object in a movie poster according to its facial features alone gives the smart television low recognition accuracy.
Disclosure of Invention
The application provides an information acquisition method, a display device, and a server, which can solve the problem of low accuracy when a smart television identifies an object in a movie poster. The technical solution is as follows:
In one aspect, an information acquisition method is provided (a minimal sketch of this flow follows the steps below), where the method includes:
determining an object to be identified in a first area in an image displayed by a display screen;
determining a target multimedia file indicated by characters of a second area in an image displayed by the display screen;
determining at least one candidate object among a plurality of objects related to the target multimedia file according to the object to be identified, wherein the candidate object is of the same type as the object to be identified;
determining a target object in the at least one candidate object, wherein the similarity between the characteristics of the target object and the characteristics of the object to be identified is greater than a similarity threshold value;
and controlling the display screen to display the introduction information of the target object.
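For illustration only, the candidate screening and matching in the steps above can be sketched as follows; this is a minimal sketch, not the claimed implementation, and the class, the helper names, and the 0.8 threshold are assumptions made for the example:

```python
from dataclasses import dataclass
from typing import Optional

SIMILARITY_THRESHOLD = 0.8  # assumed value; the application only requires "a similarity threshold"

@dataclass
class CandidateObject:
    name: str
    obj_type: str            # e.g. "person"
    features: list           # pre-stored feature vector
    introduction: str = ""   # introduction information to display on a match

def similarity(a, b) -> float:
    # Cosine similarity as one possible measure; the application does not fix a metric.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def identify_target(obj_features, obj_type: str,
                    related_objects: list) -> Optional[CandidateObject]:
    # Keep only objects related to the target multimedia file whose type
    # matches the object to be identified.
    candidates = [c for c in related_objects if c.obj_type == obj_type]
    # Among the candidates, return one whose feature similarity with the
    # object to be identified exceeds the threshold.
    for c in candidates:
        if similarity(obj_features, c.features) > SIMILARITY_THRESHOLD:
            return c
    return None
```

The design point is that `related_objects` is already restricted to the objects of one multimedia file, so far fewer similarity computations are needed than with a full-database search.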
In another aspect, an information acquisition method is provided (a server-side sketch follows these steps), the method including:
receiving a screenshot image sent by a display device;
identifying a face in the screenshot image, and identifying characters in the screenshot image, where the characters are used to screen, in a preset database, the feature data of the persons corresponding to the characters;
comparing the feature data of the face with the feature data of the persons corresponding to the characters, and identifying a target person among the persons corresponding to the characters;
and sending the information of the target person to the display device, so that the display device displays the information of the target person.
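For illustration only, these server-side steps can be sketched as one function. The face detector, OCR engine, database lookup, and similarity measure are passed in as parameters because the application does not prescribe specific libraries, and the 0.8 threshold is an assumed value:

```python
from typing import Callable, List

def identify_person_in_screenshot(
    screenshot: bytes,
    detect_face: Callable[[bytes], list],    # face-feature extractor (hypothetical)
    recognize_text: Callable[[bytes], str],  # OCR engine (hypothetical)
    lookup_people: Callable[[str], List],    # preset-database query screened by the text
    compare: Callable[[list, list], float],  # feature-similarity measure
    threshold: float = 0.8,                  # assumed threshold value
):
    face_features = detect_face(screenshot)
    text = recognize_text(screenshot)
    # The recognized characters screen the preset database: only persons
    # corresponding to the characters are compared against the face.
    for person in lookup_people(text):
        if compare(face_features, person.features) > threshold:
            return person  # the target person
    return None
```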
In another aspect, an information acquisition method is provided (a device-side sketch follows these steps), where the method includes:
receiving a screen capture instruction input by a user while an image corresponding to an audio-video file is displayed;
capturing the image corresponding to the currently displayed audio-video file according to the screen capture instruction to generate a screenshot image;
identifying a face in the screenshot image, and identifying characters in the screenshot image, where the characters are used to screen, in a preset database, the feature data of the persons corresponding to the characters;
comparing the feature data of the face with the feature data of the persons corresponding to the characters, and identifying a target person among the persons corresponding to the characters;
and displaying the information of the target person.
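On the display-device side, the same steps can be orchestrated locally. In this sketch, `device.capture_screen` and `device.render_info` are hypothetical device APIs, and `identify` is an injected recognizer such as the server-side sketch above:

```python
def on_screen_capture_command(device, identify) -> None:
    # Capture the image corresponding to the currently displayed
    # audio-video file (hypothetical device API).
    screenshot = device.capture_screen()
    # Identify the target person; the on-screen characters screen
    # the preset database (see the server-side sketch above).
    person = identify(screenshot)
    # Display the target person's information on the display screen.
    if person is not None:
        device.render_info(person.introduction)
```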
In yet another aspect, a display apparatus is provided, including a display screen and a controller in communication with the display screen; the controller is configured to execute the information acquisition method described above.
In yet another aspect, a server is provided, the server being communicatively connected to a display device, the server including a processor and a memory, the memory storing at least one instruction, the at least one instruction, when executed by the processor, implementing the information acquisition method described above.
The beneficial effects of the technical solution provided by the application include at least the following:
In the information acquisition method provided by the embodiments of the application, the target multimedia file indicated by the characters in the second area displayed on the display screen can be determined; at least one candidate object related to the target multimedia file is then determined, and a target object whose features have a similarity with the features of the object to be identified greater than the similarity threshold is determined among the at least one candidate object. Because the object to be identified in the first area displayed on the display screen is strongly related to the target multimedia file indicated by the characters in the second area, the object to be identified very probably belongs to the at least one candidate object; the target object determined among the candidate objects related to the target multimedia file is therefore determined with high accuracy, and the object to be identified is recognized accurately.
Moreover, only the similarities between the features of the at least one candidate object and the features of the object to be identified need to be determined, so the object can be recognized with far fewer similarity computations; this simplifies the recognition process and increases the recognition speed.
Drawings
Fig. 1 is a schematic diagram of an information acquisition system according to an embodiment of the present application;
fig. 2 is a block diagram of a hardware configuration of a display device according to an embodiment of the present disclosure;
fig. 3 is a block diagram of a configuration of a control device according to an embodiment of the present application;
fig. 4 is a schematic diagram of a functional configuration of a display device according to an embodiment of the present application;
fig. 5 is a block diagram of a configuration of a software system in a display device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 7 is a flowchart of an information obtaining method according to an embodiment of the present application;
fig. 8 is a flowchart of another information acquisition method provided in the embodiment of the present application;
fig. 9 is a flowchart of another information acquiring method provided in an embodiment of the present application;
fig. 10 is a flowchart of another information acquiring method provided in an embodiment of the present application;
Fig. 11 is a schematic view of an interface displayed on a display screen according to an embodiment of the present application;
Fig. 12 is a schematic diagram of an image of a first area according to an embodiment of the present application;
fig. 13 is a flowchart of an information obtaining method according to another embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
At present, smart televisions are used ever more widely. A smart television runs an operating system and can install multiple applications to enrich its functions and meet users' diverse, personalized needs. Typically, a smart television displays posters of several movies at once, and the user selects a poster to trigger playback of the corresponding movie. To help the user pick the movie of greatest interest, the smart television may extract the features of each object (e.g., a person) in a displayed poster and compare them with the features of every object in a preset database to identify the objects in the poster. The smart television can then display introduction information for the identified objects, so the user learns more about each movie and can decide which one to watch. As the number of movies keeps growing, the objects (such as actors) related to them grow accordingly; the preset database usually contains the features of most actors in the performance circle, so its entries can number in the thousands. Identifying an object in a poster therefore requires a large number of feature comparisons, and the comparison process is complex, so the recognition speed is low.
In addition, because a poster is an image of only part of one screen displayed by the smart television, its area is small and its resolution low, and a poster usually contains several objects, so only few features can be extracted for each object. Many objects in the preset database may share those few features; matching such sparse features against a large database therefore yields many possible matches, the smart television cannot reliably determine which database entry an object in the poster corresponds to, and recognition accuracy is low.
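The benefit of screening the database first can be made concrete with a rough count; the numbers below are illustrative assumptions, not figures from the application:

```python
# Illustrative assumptions only, not figures from the application.
database_entries = 10_000   # features of "most actors in the performance circle"
faces_in_poster = 4         # objects detected in one captured poster
cast_after_screening = 20   # entries left once the recognized title screens the database

print("without screening:", faces_in_poster * database_entries)      # 40000 comparisons
print("with screening:   ", faces_in_poster * cast_after_screening)  # 80 comparisons
```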
The following embodiments of the present application provide an information acquisition method, a display device, and a server; by executing the information acquisition method, the display device or the server can improve both the accuracy and the speed of identifying an object in a poster.
The display device provided by the embodiments of the application may be a liquid crystal display, an OLED display, or a projection display device. The particular display device type, size, resolution, etc. are not limiting, and those skilled in the art will appreciate that the display device may be modified in performance and configuration as desired. The display device may be a television product that provides broadcast-receiving functions and, additionally, smart network television functions supported by computing capabilities. For example, the display device may include a web TV, a smart TV, an Internet Protocol Television (IPTV), and the like. Optionally, the display device may also be a mobile terminal, a tablet computer, a notebook computer, or another smart device.
Fig. 1 is a schematic diagram of an information acquisition system according to an embodiment of the present application. The information acquisition system may include a display device 200 and a server 400 communicatively connected. Alternatively, when the display device is a television product, the information acquisition system may further include the control apparatus 100 and the terminal 300 which are communicatively connected to the display device 200. As shown in fig. 1, a user may operate the display apparatus 200 through the terminal 300 and the control device 100.
The control device 100 may be a remote controller that communicates via infrared protocol, Bluetooth protocol, or other short-range methods, and controls the display apparatus 200 wirelessly or by wire. The user may input user commands through keys on the remote controller, voice input, control panel input, etc. to control the display apparatus 200. For example, the user may input corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, power key, etc. on the remote controller to control the display apparatus 200.
In some embodiments, the display device 200 may also be controlled using an application running on a smart device. The application may be configured to provide various controls to the user in an intuitive User Interface (UI) on a screen associated with the smart device. For example, the terminal 300 may install a software application associated with the display device 200, establish connection and communication through a network communication protocol, and thereby achieve one-to-one control operation and data communication. For instance, a control instruction protocol can be established between the terminal 300 and the display device 200, the remote-control keyboard can be synchronized to the terminal 300, and the display device 200 can be controlled through the user interface on the terminal 300. Audio and video content displayed on the terminal 300 can also be transmitted to the display device 200 for synchronized display.
As also shown in fig. 1, the display apparatus 200 also performs data communication with the server 400 through various communication means. The display apparatus 200 may be allowed to make a communication connection through a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may provide various contents and interactions to the display apparatus 200. Illustratively, the display device 200 receives software Program updates, or accesses a remotely stored digital media library by sending and receiving information, and Electronic Program Guide (EPG) interactions. The server 400 may be a group of servers, multiple groups of servers, or one or more types of servers. Other web service content such as video-on-demand and advertising services may be provided by the server 400.
It should be noted that, when the display device is a mobile terminal, a tablet computer, a notebook computer, or other intelligent devices, the user may directly operate on the display device to control the display device, and the display device does not need to be controlled by the control device.
Fig. 2 is a block diagram of a hardware configuration of a display device according to an embodiment of the present disclosure. As shown in fig. 2, the display device 200 includes a controller 210, a tuning demodulator 220, a communication interface 230, a detector 240, an input/output interface 250, a video processor 260-1, an audio processor 260-2, a display 280, an audio output 270, a memory 290, a power supply, and an infrared receiver.
The display 280 receives the image signal output by the video processor 260-1 and displays video content, images, and the components of the menu manipulation interface. The display 280 includes a display screen for presenting the picture and a driving component for driving the display of images. The displayed video content may come from broadcast television or from various broadcast signals received via wired or wireless communication protocols; alternatively, various image content sent from a network server may be received via a network communication protocol and displayed.
Meanwhile, the display 280 simultaneously displays a user manipulation UI interface generated in the display apparatus 200 and used to control the display apparatus 200.
The display also includes a driving component matched to the type of the display 280. If the display 280 is a projection display, a projection device and a projection screen may also be included.
The communication interface 230 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the communication interface 230 may be a WiFi chip 231, a Bluetooth communication protocol chip 232, a wired Ethernet communication protocol chip 233, another network or near-field communication protocol chip, or an infrared receiver (not shown in fig. 2).
The display apparatus 200 may establish transmission and reception of control signals and data signals with an external control apparatus or a content providing apparatus through the communication interface 230. The infrared receiver is an interface device for receiving infrared control signals from the control apparatus 100 (e.g., an infrared remote controller).
The detector 240 is a component the display device 200 uses to collect signals from the external environment or to interact with the outside. The detector 240 includes a light receiver 242, a sensor for collecting ambient light intensity, so that display parameters can adapt to changes in ambient light.
The detector 240 may also include an image collector 241, such as a camera or video camera, which can collect external environment scenes, capture user attributes or gestures for interaction, adaptively change display parameters, and recognize user gestures to enable interaction with the user.
In some other exemplary embodiments, the detector 240 may further include a temperature sensor or the like, so that the display device 200 can adaptively adjust the display color temperature of the image by sensing the ambient temperature. For example, the display apparatus 200 may be adjusted to show a cooler color tone in a high-temperature environment, or a warmer color tone in a low-temperature environment.
In some other exemplary embodiments, the detector 240 may further include a sound collector or the like, such as a microphone, which may be used to receive a user's sound, a voice signal including a control instruction of the user to control the display device 200, or collect an environmental sound for identifying an environmental scene type, and the display device 200 may be adapted to the environmental noise.
The input/output interface 250, under control of the controller 210, manages data transmission between the display device 200 and other external devices, such as receiving video and audio signals or command instructions from an external device.
The input/output interface 250 may include, but is not limited to, any one or more of: a High-Definition Multimedia Interface (HDMI) 251, an analog or digital high-definition component input interface 253, a composite video input interface 252, a USB input interface 254, and an RGB port (not shown in fig. 2).
In other exemplary embodiments, the input/output interface 250 may also form a composite input/output interface with the above-mentioned plurality of interfaces.
The tuner-demodulator 220 receives broadcast television signals by wire or wirelessly, performs processing such as amplification, mixing, and resonance, and demodulates, from among multiple wireless or wired broadcast television signals, the television audio-video signals carried on the frequency of the television channel selected by the user, as well as EPG data signals.
The tuner-demodulator 220 responds to the television signal frequency selected by the user and the television signal carried on that frequency, under control of the controller 210.
The tuner-demodulator 220 may receive signals in various ways according to the broadcasting system of the television signal, such as: terrestrial broadcast, cable broadcast, satellite broadcast, or internet broadcast signals, etc.; and according to different modulation types, the modulation mode can be digital modulation or analog modulation. Depending on the type of television signal received, both analog and digital signals are possible.
In other exemplary embodiments, the tuning demodulator 220 may be in an external device, such as an external set-top box. Thus, the set-top box outputs television audio and video signals after modulation and demodulation, and the television audio and video signals are input into the display device 200 through the input/output interface 250.
The video processor 260-1 is configured to receive an external video signal and, according to the standard codec protocol of the input signal, perform video processing such as decompression, decoding, scaling, noise reduction, frame-rate conversion, resolution conversion, and image synthesis, to obtain a signal that can be displayed or played directly on the display device 200.
Illustratively, the video processor 260-1 includes a demultiplexing module, a video decoding module, an image synthesizing module, a frame rate conversion module, a display formatting module, and the like.
The demultiplexing module demultiplexes the input audio-video data stream; for example, an input MPEG-2 stream is demultiplexed into a video signal, an audio signal, and the like.
The video decoding module processes the demultiplexed video signal, including decoding, scaling, and the like.
The image synthesis module, such as an image synthesizer, superimposes and mixes the GUI signal generated by the graphics generator in response to user input with the scaled video image, to generate an image signal for display.
The frame rate conversion module converts the frame rate of the input video, for example from 60 Hz to 120 Hz or 240 Hz, typically by frame interpolation.
The display formatting module converts the frame-rate-converted video output signal into a signal conforming to the display format, for example an RGB data signal.
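Schematically (this is not the actual chip firmware; every function below is a placeholder), the module chain just described amounts to a composition of stages:

```python
# Schematic only: each stage stands in for a module of the video processor 260-1.
def demultiplex(stream):           # split the A/V container into elementary streams
    return {"video": stream, "audio": stream}

def decode_and_scale(video):       # video decoding module: decode, then scale
    return video

def compose_gui(video, gui=None):  # image synthesis module: overlay GUI on video
    return (video, gui)

def convert_frame_rate(frame, target_hz=120):  # e.g. 60 Hz -> 120 Hz by interpolation
    return frame

def format_for_display(frame):     # display formatting module: e.g. to RGB
    return frame

def video_pipeline(stream):
    streams = demultiplex(stream)
    frame = decode_and_scale(streams["video"])
    frame = compose_gui(frame)
    frame = convert_frame_rate(frame, target_hz=120)
    return format_for_display(frame)
```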
The audio processor 260-2 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, amplification processing, and the like to obtain an audio signal that can be played in the speaker.
In other exemplary embodiments, the video processor 260-1 may comprise one or more chips. The audio processor 260-2 may also comprise one or more chips.
And, in other exemplary embodiments, the video processor 260-1 and the audio processor 260-2 may be separate chips or may be integrated together with the controller 210 in one or more chips.
The audio output 270 receives the sound signal output by the audio processor 260-2 under control of the controller 210. Besides the speaker 272 carried by the display device 200 itself, it may include an external sound output terminal 274 for outputting to a sound-producing device of an external device, such as an external sound interface or an earphone interface.
Under control of the controller 210, the power supply provides power to the display device 200 from the external power input. The power supply may include a built-in power supply circuit installed inside the display device 200, or a power interface installed on the display device 200 for an external power supply.
A user input interface for receiving an input signal of a user and then transmitting the received user input signal to the controller 210. The user input signal may be a remote controller signal received through an infrared receiver, and various user control signals may be received through the network communication module.
Illustratively, the user inputs a command through the remote controller 100 or the terminal 300; the user input interface forwards the input to the controller 210, and the display device 200 responds to it.
In some embodiments, a User may enter a User command at a Graphical User Interface (GUI) displayed on the display 280, and the User input Interface receives the User input command through the GUI. Alternatively, the user may input a user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.
The controller 210 controls the operation of the display device 200 and responds to the user's operation through various software control programs stored on the memory 290.
As shown in fig. 2, the controller 210 includes a RAM 213 and a ROM 214, a graphics processor 216, a CPU processor 212, and a communication interface 218 (e.g., a first interface 218-1 through an nth interface 218-n), connected by a communication bus.
The RAM 213 stores instructions for various system boots. When the display apparatus 200 is powered on upon receiving a power-on signal, the CPU processor 212 executes the system boot instructions in the ROM 214 and copies the operating system stored in the memory 290 into the RAM 213 to start running the boot operating system. After the operating system has started, the CPU processor 212 copies the various applications in the memory 290 into the RAM 213 and then starts running them.
The graphics processor 216 generates various graphics objects, such as icons, operation menus, and graphics displayed in response to user input instructions. It includes an arithmetic unit, which performs operations on the various interactive instructions received from the user and displays objects according to their display attributes, and a renderer, which renders the objects generated by the arithmetic unit and displays the result on the display 280.
The CPU processor 212 executes the operating system and application program instructions stored in the memory 290, and executes various applications, data, and content according to the interactive instructions received from outside, so as to finally display and play various audio-video content.
In some exemplary embodiments, the CPU processor 212 may include multiple processors, for example one main processor and one or more sub-processors. The main processor performs some operations of the display apparatus 200 in a pre-power-up mode and/or displays the screen in normal mode; the sub-processors handle operations in standby mode and the like.
The controller 210 may control the overall operation of the display apparatus 200. For example, in response to receiving a user command for selecting a UI object displayed on the display 280, the controller 210 may perform the operation related to the selected object.
The object may be any selectable object, such as a hyperlink or an icon. The operation related to the selected object may be, for example, displaying the linked hyperlink page, document, or image, or launching the program corresponding to the icon. The user command for selecting the UI object may be input through an input device connected to the display apparatus 200 (e.g., a mouse, keyboard, or touch pad) or may be a voice command corresponding to speech uttered by the user.
The memory 290 stores various software modules for driving the display device 200, including a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.
The basic module is a bottom-layer software module for signal communication among the hardware components of the display device 200 and for sending processing and control signals to the upper-layer modules. The detection module collects various information from sensors or the user input interface, and performs digital-to-analog conversion and analysis management.
For example, the voice recognition module includes a voice analysis module and a voice instruction database module. The display control module controls the display 280 to display image content, and can present multimedia image content, UI interfaces, and other information. The communication module performs control and data communication with external devices. The browser module performs data communication with browsing servers. The service module provides various services and includes the various applications.
Meanwhile, the memory 290 also stores received external data and user data, images of the items in various user interfaces, visual effect maps, focus objects, and the like.
Fig. 3 is a block diagram of a configuration of a control device according to an embodiment of the present application. As shown in fig. 3, the control apparatus 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory 190, and a power supply 180.
The control apparatus 100 is configured to control the display apparatus 200: it receives the user's input operation instructions and converts them into instructions the display apparatus 200 can recognize and respond to, serving as an intermediary between the user and the display apparatus 200. For example, the user operates the channel up/down keys on the control device 100, and the display device 200 responds with a channel up/down operation.
In some embodiments, the control device 100 may be a smart device. Such as: the control apparatus 100 may install various applications that control the display apparatus 200 according to user demands.
In some embodiments, as shown in fig. 1, the terminal 300 or another intelligent electronic device may perform a function similar to that of the control device 100 after installing an application for manipulating the display device 200. For example, by installing such an application, the user can use the various function keys or virtual buttons of the graphical user interface on the terminal 300 or the other device to implement the functions of the physical keys of the control device 100.
The controller 110 includes a processor 112, a RAM 113 and a ROM 114, a communication interface 130, and a communication bus. The controller 110 controls the operation of the control device 100, coordinates communication among its internal components, and handles external and internal data processing.
The communication interface 130 enables communication of control signals and data signals with the display apparatus 200 under the control of the controller 110. Such as: the received user input signal is transmitted to the display apparatus 200. The communication interface 130 may include at least one of a WiFi chip, a bluetooth module, an NFC module, and other near field communication modules.
The user input/output interface 140 includes, on the input side, at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. For example, the user can input instructions through voice, touch, gestures, or key presses; the input interface converts the received analog signal into a digital signal, converts that into a corresponding instruction signal, and sends the instruction signal to the display device 200.
The output interface transmits the received user instruction to the display apparatus 200. In some embodiments, it may be an infrared interface or a radio-frequency interface. With an infrared signal interface, a user input command is converted into an infrared control signal according to the infrared control protocol and sent to the display device 200 through the infrared transmitting module. With a radio-frequency signal interface, a user input command is converted into a digital signal, modulated according to the RF control signal modulation protocol, and then transmitted to the display device 200 through the RF transmitting terminal.
In some embodiments, the control device 100 includes at least one of the communication interface 130 and the output interface. With the communication interface 130, e.g., a WiFi, Bluetooth, or NFC module, the user input command may be encoded according to the corresponding protocol and sent to the display device 200.
The memory 190 stores various operating programs, data, and applications for driving and controlling the control apparatus 100, under control of the controller 110. The memory 190 may store various control signal commands input by the user.
The power supply 180 provides operational power support to the elements of the control device 100 under control of the controller 110; it may comprise a battery and associated control circuitry.
Fig. 4 is a schematic functional configuration diagram of a display device according to an embodiment of the present application. As shown in fig. 4, the memory 290 stores an operating system, an application program, contents, user data, and the like, and performs system operations for driving the display device 200 and various operations in response to a user under the control of the controller 210. The memory 290 may include volatile and/or nonvolatile memory.
The memory 290 is specifically configured to store an operating program for driving the controller 210 in the display device 200, and to store various application programs installed in the display device 200, various application programs downloaded by a user from an external device, various graphical user interfaces related to the applications, various objects related to the graphical user interfaces, user data information, and internal data of various supported applications. The memory 290 is used to store system software such as an OS kernel, middleware, and applications, and to store input video data and audio data, and other user data.
The memory 290 is specifically used for storing drivers and related data such as the audio/video processors 260-1 and 260-2, the display 280, the communication interface 230, the tuning demodulator 220, the input/output interface of the detector 240, and the like.
In some embodiments, memory 290 may store software and/or programs, software programs for representing an Operating System (OS) including, for example: a kernel, middleware, an Application Programming Interface (API), and/or an application program. For example, the kernel may control or manage system resources, or functions implemented by other programs (e.g., the middleware, APIs, or applications), and the kernel may provide interfaces to allow the middleware and APIs, or applications, to access the controller to implement controlling or managing system resources.
The memory 290, for example, includes a broadcast receiving module 2901, a channel control module 2902, a volume control module 2903, an image control module 2904, a display control module 2905, an audio control module 2906, an external instruction recognition module 2907, a communication control module 2908, a light receiving module 2909, a power control module 2910, an operating system 2911, and other application programs 2912, a browser module, and so forth. The controller 210 performs functions such as: a broadcast television signal reception demodulation function, a television channel selection control function, a volume selection control function, an image control function, a display control function, an audio control function, an external instruction recognition function, a communication control function, an optical signal reception function, an electric power control function, a software control platform supporting various functions, a browser function, and the like.
Fig. 5 is a block diagram of the configuration of a software system in a display device according to an embodiment of the present application. As shown in fig. 5, the operating system 2911 includes operating software for handling various basic system services and performing hardware-related tasks, and acts as an intermediary for data processing between application programs and hardware components. In some embodiments, part of the operating system kernel may contain a series of software to manage the display device hardware resources and provide services to other programs or software code.
In other embodiments, part of the operating system kernel may include one or more device drivers, which may be a set of software code in the operating system that assists in operating or controlling the devices or hardware associated with the display device. A driver may contain code to operate video, audio, and/or other multimedia components, for example drivers for the display screen, camera, flash, WiFi, and audio.
The accessibility module 2911-1 is configured to modify or access the application program to achieve accessibility of the application program and operability of the displayed content.
A communication module 2911-2 for connection to other peripherals via associated communication interfaces and a communication network.
The user interface module 2911-3 provides objects that display the user interface, for access by the various applications, enabling user operability.
The control application 2911-4 performs controllable process management, including runtime applications and the like.
The event transmission system 2914 may be implemented within the operating system 2911 or within the application program 2912 (in some embodiments, partly in each). It is configured to listen for various user input events and, according to the recognition of various types of events or sub-events, invoke the handlers that perform one or more predefined sets of operations.
The event monitoring module 2914-1 is configured to monitor an event or a sub-event input by the user input interface.
The event recognition module 2914-2 is configured with definitions of the various types of events for the various user input interfaces; it recognizes the input events or sub-events and passes them to the process that executes the corresponding one or more sets of handlers.
An event or sub-event refers to an input detected by one or more sensors in the display device 200 or an input from an external control device (e.g., the control device 100), such as voice input, gesture input via gesture recognition, or sub-events input through remote-control key commands. Sub-events from the remote control may take various forms, including but not limited to pressing the up/down/left/right keys or the ok key, holding a key, and non-physical-key operations such as move, press, and release.
The interface layout manager 2913 receives, directly or indirectly, the input events or sub-events monitored by the event transmission system 2914 and updates the layout of the user interface, including but not limited to the position of each control or sub-control in the interface and the size, position, and level of containers, as well as other operations related to the interface layout.
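As a schematic illustration of the listening/recognition/dispatch split described above (the class and method names are hypothetical, not the actual middleware):

```python
class EventTransmitter:
    """Schematic sketch: listens for input events and dispatches registered handlers."""
    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_type, handler):
        # A recognizer registers a handler for a defined event type.
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event_type, payload=None):
        # The monitoring side calls this when an input event or sub-event arrives.
        for handler in self._handlers.get(event_type, []):
            handler(payload)

# Example: an "ok"-key sub-event from the remote control updates the layout.
bus = EventTransmitter()
bus.subscribe("key_ok", lambda _: print("interface layout manager: update focused control"))
bus.dispatch("key_ok")
```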
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application. The server 400 includes a Central Processing Unit (CPU) 401, a system memory 404 including a Random Access Memory (RAM) 402 and a Read Only Memory (ROM) 403, and a system bus 405 connecting the system memory 404 and the central processing unit 401. The server 400 also includes a basic input/output system (I/O system) 406, which facilitates the transfer of information between devices within the computer, and a mass storage device 407 for storing an operating system 413, application programs 414, and other program modules 415.
The basic input/output system 406 includes a display 408 for displaying information and an input device 409 such as a mouse, keyboard, etc. for a user to input information. Wherein the display 408 and the input device 409 are connected to the central processing unit 401 through an input output controller 410 connected to the system bus 405. The basic input/output system 406 may also include an input/output controller 410 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input/output controller 410 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 407 is connected to the central processing unit 401 through a mass storage controller (not shown) connected to the system bus 405. The mass storage device 407 and its associated computer-readable media provide non-volatile storage for the server 400. That is, the mass storage device 407 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 404 and mass storage device 407, described above, may collectively be referred to as memory.
In accordance with various embodiments of the present invention, the server 400 may also operate by connecting, through a network such as the Internet, to a remote computer on that network. That is, the server 400 may connect to the network 412 through the network interface unit 411 connected to the system bus 405, or may use the network interface unit 411 to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 401 implements the method provided by the following embodiments of the present application by executing the one or more programs.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions, where the instructions are executable by a processor of the server to perform the information acquisition method in the embodiments of the present application. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 7 is a flowchart of an information acquisition method according to an embodiment of the present application. The method may be used in the display device shown in fig. 1 or fig. 2, for example in the controller of the display device. As shown in fig. 7, the method may include:
Step 601: determining an object to be identified in a first area of an image displayed by the display screen.
Step 602: determining a target multimedia file indicated by characters in a second area of the image displayed by the display screen.
Step 603: determining at least one candidate object, of the same type as the object to be identified, among a plurality of objects related to the target multimedia file.
Step 604: determining a target object among the at least one candidate object, where the similarity between the features of the target object and the features of the object to be identified is greater than a similarity threshold.
Step 605: controlling the display screen to display the introduction information of the target object.
To sum up, in the information acquisition method provided by the embodiments of the application, the target multimedia file indicated by the characters in the second area displayed on the display screen can be determined; at least one candidate object related to the target multimedia file is then determined, and a target object whose features have a similarity with the features of the object to be identified greater than the similarity threshold is determined among the at least one candidate object. Because the object to be identified in the first area displayed on the display screen is strongly related to the target multimedia file indicated by the characters in the second area, the object to be identified very probably belongs to the at least one candidate object; the target object determined among the candidate objects related to the target multimedia file is therefore determined with high accuracy, and the object to be identified is recognized accurately.
In addition, because only the similarities between the features of the at least one candidate object and the features of the object to be identified need to be determined, the object can be recognized with far fewer similarity computations; this simplifies the recognition process and increases the recognition speed.
Fig. 8 is a flowchart of an information acquisition method according to an embodiment of the present application. The method may be used in the display device shown in fig. 1 or fig. 2, for example in the controller of the display device. As shown in fig. 8, the method may include:
Step 801: receiving a screen capture instruction input by a user while an image corresponding to an audio-video file is displayed.
Step 802: capturing the image corresponding to the currently displayed audio-video file according to the screen capture instruction to generate a screenshot image.
Step 803: identifying a face in the screenshot image, and identifying characters in the screenshot image, where the characters are used to screen, in a preset database, the feature data of the persons corresponding to the characters.
Step 804: comparing the feature data of the face with the feature data of the persons corresponding to the characters, and identifying a target person among the persons corresponding to the characters.
Step 805: controlling the display screen to display the information of the target person.
In summary, in the information acquisition method provided by the embodiments of the application, the characters in the screenshot image can be recognized to screen, in the preset database, the feature data of the persons corresponding to the characters; the feature data of the face is then compared with the feature data of those persons to determine the target person among them. Because the face in the screenshot image is strongly related to the persons corresponding to the characters in the same image, the face very probably belongs to one of those persons; the target person determined among them is therefore identified with high accuracy, and the face in the screenshot image is recognized accurately.
In addition, because the feature data of the face only needs to be compared with the feature data of the persons corresponding to the characters, far fewer comparisons are required; this simplifies the process of recognizing the face in the screenshot image and increases the recognition speed.
Fig. 9 is a flowchart of another information acquisition method according to an embodiment of the present application. The method may be used in the server shown in fig. 1 or fig. 6. As shown in fig. 9, the method may include:
Step 901: receiving a screenshot image sent by the display device.
Step 902: identifying a face in the screenshot image, and identifying characters in the screenshot image, where the characters are used to screen, in a preset database, the feature data of the persons corresponding to the characters.
Step 903: comparing the feature data of the face with the feature data of the persons corresponding to the characters, and identifying a target person among the persons corresponding to the characters.
Step 904: sending the information of the target person to the display device, so that the display device displays the information of the target person.
In summary, in the information acquisition method provided by the embodiments of the application, the server can recognize the characters in the screenshot image to screen, in the preset database, the feature data of the persons corresponding to the characters, and then compare the feature data of the face with the feature data of those persons to determine the target person among them. Because the face in the screenshot image is strongly related to the persons corresponding to the characters in the same image, the face very probably belongs to one of those persons; the target person is therefore determined with high accuracy, and the face in the screenshot image is recognized accurately.
In addition, because the feature data of the face only needs to be compared with the feature data of the persons corresponding to the characters, far fewer comparisons are required; this simplifies the process of recognizing the face in the screenshot image and increases the recognition speed.
Fig. 10 is a flowchart of another information acquiring method according to an embodiment of the present application. The method may be used for a display device as shown in fig. 1 or fig. 2, such as a controller that may be used in the display device, as shown in fig. 10, the method may include:
Step 701: acquiring an image of a first area in the image displayed by the display screen.
Alternatively, the display device may have a screen capture function, that is, a function of capturing the image displayed on the display screen of the display device. When the display screen displays an image, the user can control the display device to enable the screen capture function, so that the controller acquires the image displayed by the display screen and further acquires the image of the first area in that image. Illustratively, the controller may receive a screen capture instruction input by a user. A user may send a screen capture instruction to the controller of the display device through the control device 100 shown in fig. 1, so that the controller acquires the image displayed by the display device according to the screen capture instruction. For example, the image displayed on the display screen may include an image corresponding to an audio-video file, and the controller may perform a screen capture on the image corresponding to the currently displayed audio-video file according to the screen capture instruction to generate a screenshot image. Audio-video files are a type of multimedia file: multimedia files may include movies, television shows, music, games, novels, and the like, while audio-video files may include movies, television shows, and the like. The image corresponding to the audio-video file may include a poster of the audio-video file.
Fig. 11 is a schematic view of an interface displayed on a display screen according to an embodiment of the present application; fig. 11 takes a smart television as an example of the display device. As shown in fig. 11, a screenshot icon T, a selection frame K, and a plurality of view display areas Q1 may be displayed on the display screen of the display device, where each view display area Q1 may display a poster of one multimedia file or of one type of multimedia file. Alternatively, the interface shown in fig. 11 may be the home interface displayed by the smart television. It should be noted that the embodiment of the present application takes the arrangement of the view display areas shown in fig. 11 as an example only; the arrangement of the view display areas may be varied arbitrarily.
The user can press a key on the remote controller (i.e., the control device) to move the selection frame left, right, up, or down, so that the selection frame K frames different icons or view display areas on the interface of the display screen. After the selection frame K frames the screenshot icon T, the user may perform a selection operation on the remote controller (e.g., press a confirmation key) to trigger the remote controller to send a screen capture instruction to the controller of the display device. The controller may then obtain the image displayed on the display screen according to the screen capture instruction, where the image may include all the content displayed on the current interface. When the controller acquires the image displayed by the display screen (i.e., the screenshot image), the controller may determine each view display area in the screenshot image and capture the image of each view display area (e.g., segment the screenshot image according to the view display areas), thereby obtaining images of multiple view display areas. Alternatively, each view display area may be a partial area of the screenshot image. Each view display area can be a first area, so that images of a plurality of first areas are obtained.
Optionally, the controller may perform edge detection on the screenshot image, determine each region enclosed by a plurality of edges (e.g., four edges) in the screenshot image, and take each such region as a view display area. Optionally, the controller may instead identify continuous single-color regions in the screenshot image that form a grid, and determine the area enclosed by each grid cell as a view display area; or the controller may determine the view display areas in other manners, which is not limited in this embodiment of the application.
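By way of illustration only, the following is a minimal sketch of the edge-detection approach, assuming OpenCV; the function name find_view_areas and the minimum-size thresholds are assumptions for the example and are not specified by this embodiment.

```python
# A sketch of edge-based view-area detection, assuming OpenCV.
import cv2

def find_view_areas(screenshot_path, min_w=100, min_h=100):
    img = cv2.imread(screenshot_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                       # detect edges
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    areas = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)                   # enclosing rectangle
        if w >= min_w and h >= min_h:                      # drop tiny fragments
            areas.append(img[y:y + h, x:x + w])            # crop one view area
    return areas
```

Filtering by a minimum rectangle size is one simple way to discard spurious contours that do not correspond to view display areas.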
Optionally, the screen capture icon may not be displayed on the display screen, and the user may press a designated key on the remote controller to trigger the remote controller to send a screen capture instruction to the controller of the display device. Alternatively, if the controller receives a screen capture instruction after the selection frame frames a certain view display area, the controller may capture only an image displayed in the view display area. That is, the controller may also directly obtain an image of a first area in the image displayed on the display screen without obtaining all the content in the image displayed on the display screen.
Alternatively, when the display screen displays only one view display area, for example, when the display screen is playing a certain movie, the controller may directly determine that the entire image displayed on the display screen is the first area, and then directly capture the displayed image without segmenting it.
Alternatively, the controller may determine the object to be recognized in the image of the first region after acquiring the image of the first region. For example, the controller may first preliminarily determine the type of the object to be recognized in the image of the first region. For example, the controller may input the image of the first area into a plurality of Support Vector Machines (SVMs) respectively to distinguish different types of objects to be recognized, and each SVM may recognize one type of object. Optionally, the type of the object to be recognized may include: humans, animals, buildings or plants, etc. Optionally, the type of the object to be identified in the image of the first area may also be a preset type, which is not limited in this application.
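As a sketch of the one-SVM-per-type idea described above, the following assumes scikit-learn and a hypothetical feature extractor extract_features; the type list and training interface are illustrative assumptions.

```python
# A sketch of "one SVM per object type", assuming scikit-learn.
import numpy as np
from sklearn.svm import SVC

TYPES = ["person", "animal", "building", "plant"]          # example types only

def train_type_svms(images, labels, extract_features):
    feats = np.array([extract_features(im) for im in images])
    svms = {}
    for t in TYPES:
        y = np.array([1 if lab == t else 0 for lab in labels])
        svms[t] = SVC(probability=True).fit(feats, y)      # binary SVM per type
    return svms

def classify_type(image, svms, extract_features):
    f = extract_features(image).reshape(1, -1)
    # pick the type whose SVM is most confident that the type is present
    return max(TYPES, key=lambda t: svms[t].predict_proba(f)[0][1])
```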
Alternatively, when the images of the plurality of first areas are determined according to the image displayed on the display screen, the controller may process the images of the plurality of first areas in parallel, and the processing procedures of the images of the plurality of first areas are the same. It should be noted that, in the embodiments of the present application, only a process of processing an image of a first area is explained, and in the following embodiments of the present application, an object to be recognized in an image of a first area is taken as an example for explanation.
For example, after determining the image of the first region, the controller may detect a face region in the image of the first region, that is, identify a face in the image of the first region. The face region may be detected using, for example, a Multi-task Cascaded Convolutional Networks (MTCNN) model. This face detection model mainly adopts three cascaded networks: a Proposal Network (P-Net) for generating candidate face regions, a Refinement Network (R-Net) for filtering out non-face regions, and an Output Network (O-Net) for generating the final face region and the face key points.
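A minimal sketch of this face detection step, assuming the open-source mtcnn Python package (which implements the P-Net / R-Net / O-Net cascade); the surrounding function is an illustrative assumption.

```python
# A sketch of MTCNN-based face detection, assuming the `mtcnn` pip package.
import cv2
from mtcnn import MTCNN

def detect_faces(image_bgr):
    detector = MTCNN()
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)       # MTCNN expects RGB
    results = detector.detect_faces(rgb)
    faces = []
    for r in results:
        x, y, w, h = r["box"]                              # final face region (O-Net)
        faces.append({"region": image_bgr[y:y + h, x:x + w],
                      "keypoints": r["keypoints"]})        # eyes, nose, mouth corners
    return faces
```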
And step 703, determining the text in a second area of the image displayed on the display screen.
The image displayed by the display screen may also include text. Besides the first area, the image displayed on the display screen may include a second area corresponding to the first area, and the image in the second area includes text. The text included in the image of the second region is hereinafter referred to as the text of the second region. The text of the second area may include an identifier of the multimedia file corresponding to the image of the first area, used to indicate that multimedia file, and may also include an identifier of an object to be recognized in the image of the first area. For example, when the object to be recognized in the image of the first region is a person, the text of the second region may be used to filter, in a preset database, the feature data of the persons corresponding to that text. The preset database may include feature data of a plurality of persons, and the feature data may include facial features. Optionally, the feature data may also include other features indicative of a person's identity. In the following embodiments of the present application, feature data is simply referred to as features.
Optionally, the second region may be located within the first region, or outside the first region. For example, the controller may analyze the image displayed on the display screen and determine whether an area displaying text exists around each view display area, thereby determining whether a second area exists outside the first area. When areas displaying text exist around the view display areas, the second area is determined to be located outside the first area; when no such areas exist, the second area is determined to be located within the first area.
In an alternative case, the second area is located outside the first area, and the controller may segment the image displayed on the display screen to obtain an image of at least one second area. Alternatively, the controller may segment the image displayed on the display screen once and thereby obtain the image of the first area and the image of the second area simultaneously. The controller may directly perform text recognition on the image of the second region to determine the text of the second region.
Illustratively, the second region is located at a designated position outside the first region. For example, the second region may be located in a specified direction around the first region, such as below, above, to the left of, or to the right of the first region. The second region may comprise a region of a specified width in the specified direction of the first region, and the spacing between the first region and the second region may be less than a spacing threshold; for example, the second region may be immediately adjacent to the first region. With continued reference to fig. 11, the image displayed by the display screen may include a second region Q2, and the second region Q2 may include a region at least one character wide below the first region Q1. The text in the second area Q2 may indicate the multimedia file corresponding to the image of the first area; if the image in the first area Q1 is a poster of the movie Zhangsan Zhuiji, the text in the second area Q2 may be an identifier of that movie, such as the movie title "Zhangsan Zhuiji".
In another optional case, the second region is located within the first region. In this case, because the size, font, color, style, and angle of the characters in the first region may all differ greatly from a standard printed font, an image of the second region (that is, a text region) may be obtained from the image of the first region using a scene text detection technique, and the image of the second region may then be rotated, deformed, or denoised as required to obtain an image convenient for character recognition. The controller can then perform character recognition on the processed image.
For example, scene text detection techniques include the Connectionist Text Proposal Network (CTPN) algorithm, the segment linking (SegLink) algorithm, and the Efficient and Accurate Scene Text detection (EAST) algorithm, among others. The EAST algorithm performs scene text region detection based on a Fully Convolutional Network (FCN) and non-maximum suppression (NMS), and can effectively avoid multi-stage tuning problems. AdvancedEAST, an improved version of the EAST algorithm, is better suited to recognizing both short and long texts and detects text regions more reliably. In the embodiment of the application, the AdvancedEAST algorithm is adopted to determine the image of the second area within the image of the first area. The AdvancedEAST model (namely, the model adopting the AdvancedEAST algorithm) is a deep network model that takes a VGG model as its backbone and is built on the Keras high-level neural network library. Illustratively, posters containing scene text may be retrieved from a media asset database and annotated with the text locations in each poster, and the annotated posters may be combined with an International Conference on Document Analysis and Recognition (ICDAR) data set to form a training set. The AdvancedEAST model is trained on this training set, so that the trained model can determine the text region in an input poster.
The controller may load the AdvancedEAST model in advance. Since the AdvancedEAST model processes images of a fixed size, and the size of the image of the first region may differ from that fixed size, the controller may resize the image of the first region to the fixed size after determining it. After the AdvancedEAST model detects the text region in the image of the first region, the controller may extract the content of that text region, thereby obtaining the image of the text region (that is, the image of the second region). The controller may further detect the tilt angle of the image of the text region and, if it is tilted, rotate it to correct the tilt. Because the image of the text region may include background and redundant graphics in addition to the text, the controller may also apply denoising (for example, a wavelet transform) to obtain an image of the text region containing relatively clear text. The controller may then perform character recognition on the image processed as above.
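A sketch of the tilt-correction and denoising steps, assuming OpenCV; non-local-means denoising is used here as a stand-in for the wavelet-transform denoising mentioned above, and the angle-estimation heuristic is an assumption for the example.

```python
# A sketch of deskewing and denoising a detected text region, assuming OpenCV.
import cv2
import numpy as np

def clean_text_region(region_bgr):
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    # estimate tilt from the minimum-area rectangle of the dark (ink) pixels
    coords = np.column_stack(np.where(gray < 128)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle       # normalize to a small rotation
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)
    # suppress background noise before character recognition
    return cv2.fastNlMeansDenoising(rotated, None, 10, 7, 21)
```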
Optionally, in either of the two cases above, the controller may perform character recognition on the image of the second region using Optical Character Recognition (OCR) technology to obtain the text of the second region. A specific process of the controller performing character recognition on the image of the second area using OCR is described below:
First, the controller may pre-process the image of the determined second region to obtain an image more convenient for character recognition. For example, the controller may increase the resolution of the image of the second region; when the image of the second area has low definition, the controller can sharpen it; and the controller can also remove the background in the image of the second area to make the text more prominent. For example, the controller may increase the resolution of the image of the second region to above 300 dpi (dots per inch, i.e., the number of pixels per inch of image length) through an interpolation algorithm, so that the image includes at least 300 pixels per inch. The controller can also remove the background of the image of the second area using the Retinex algorithm, an image enhancement algorithm, and convert the image into a black-and-white image.
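A sketch of this preprocessing, assuming OpenCV; Otsu binarization stands in here for the Retinex-based background removal, and the scale factor is an illustrative assumption.

```python
# A sketch of OCR preprocessing: upscale, sharpen, binarize. Assumes OpenCV.
import cv2

def preprocess_for_ocr(region_bgr, scale=2.0):
    # upscale by interpolation so small text has enough pixels per character
    up = cv2.resize(region_bgr, None, fx=scale, fy=scale,
                    interpolation=cv2.INTER_CUBIC)
    gray = cv2.cvtColor(up, cv2.COLOR_BGR2GRAY)
    # unsharp masking: subtract a blurred copy to sharpen edges
    sharp = cv2.addWeighted(gray, 1.5,
                            cv2.GaussianBlur(gray, (0, 0), 3), -0.5, 0)
    # convert to black-and-white, separating text from background
    _, bw = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return bw
```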
Then, the controller may perform character recognition, through OCR, on the preprocessed image of the second region to obtain at least one phrase. On one hand, when the second area is located outside the first area, the second area is at a fixed position, and its characters are typically arranged in a fixed direction and form a phrase or sentence; the controller may directly perform text recognition on the image of the second region to obtain a text set representing that phrase or sentence. On the other hand, when the second area is located within the first area, the second area may contain more text, distributed more sparsely, and the controller may determine a plurality of areas containing characters, the second area comprising these areas. The controller may perform character recognition on the image of each of the plurality of regions to obtain the characters of each region, thereby obtaining the text of the second region. Optionally, when character recognition of the images of the plurality of regions yields individual characters, the controller may recombine the recognized characters according to the positions of the respective regions to obtain a text set representing a sentence or phrase. In either case, after obtaining the text set representing a sentence or phrase, the controller may perform word segmentation on the text set and divide it into phrases that can be used for classification. The text set can be segmented, for example, based on the HanLP (Han Language Processing) toolkit.
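A sketch of the recognize-then-segment step, assuming the pytesseract OCR binding and the pyhanlp binding of HanLP; the language code chi_sim assumes a simplified-Chinese traineddata file is installed.

```python
# A sketch of OCR followed by word segmentation.
import pytesseract
from pyhanlp import HanLP

def region_to_phrases(bw_image):
    # OCR the preprocessed black-and-white region
    text = pytesseract.image_to_string(bw_image, lang="chi_sim").strip()
    # split the recognized sentence into phrases usable for classification
    return [term.word for term in HanLP.segment(text)]
```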
For example, fig. 12 is a schematic diagram of an image of a first region provided in an embodiment of the present application, where the second region is assumed to be located within the first region. The controller may perform text region detection on the image of the first region and determine that the second region Q2 includes the regions enclosed by the dashed boxes in fig. 12. The controller may perform text recognition on the image of the second region, for example, on the image of each region enclosed by a dashed box, and the final recognition result may include the following phrases: "Zhangsan Zhuiji", "Zhangsan", "Zhuiji", "Lisi", "Wangwu", and "47 episodes".
Optionally, the controller may determine a text library corresponding to the text in the second region according to the image in the first region, and then the controller may perform text recognition on the image in the second region according to the corresponding text library (for example, compare the text in the preprocessed image in the second region with the text in the text library) to obtain the text in the second region.
For example, a specialized text library may be built for a certain domain, with each word in the library indicating an object related to that domain. For example, a specialized text library may be constructed for the film and television domain, including the identifiers of a plurality of objects in that domain, such as identifiers of films and series (e.g., titles), identifiers of actors (e.g., actor names), identifiers of directors (e.g., director names), identifiers of production companies (e.g., company names), identifiers of editing teams, and the like. As a further example, a specialized text library may be constructed for the music domain, likewise including the identifiers of a plurality of objects in that domain, such as identifiers of musical works (e.g., song titles), identifiers of lyricists and composers (e.g., their names), identifiers of production companies, and the like. The constructed text library may be a traineddata file, that is, a file with the suffix .traineddata; once its training is complete, all the words it contains are identifiers of objects in the corresponding domain.
Optionally, when a text library is constructed, images containing text from the media assets of a certain domain may be converted into the TIF format; when the text in an image has problems (such as missing information or low definition), the image is corrected, and the corrected image is used as training data for constructing the text library. The TIF format is an image format that compresses pictures as little as possible. It should be noted that the construction of the text library may be completed by the controller or by another device, in which case the controller only needs to acquire the text library.
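A sketch of the TIF conversion of training images, assuming Pillow; the function and directory layout are illustrative assumptions.

```python
# A sketch of converting training images to TIF for text-library construction.
from pathlib import Path
from PIL import Image

def convert_to_tif(image_paths, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for p in map(Path, image_paths):
        img = Image.open(p)
        # TIF keeps the characters essentially uncompressed for training
        img.save(out / (p.stem + ".tif"), format="TIFF")
```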
In the embodiment of the application, since the characters in the second area indicate the multimedia file corresponding to the image of the first area, the relevance between the words in the text library determined by the controller according to the image of the first area and the characters of the second area is high, and character recognition performed on the image of the second area directly against the corresponding text library is therefore highly accurate. Moreover, because words in other text libraries do not need to be compared, character recognition on the image of the second area is faster.
And step 704, determining the target multimedia file indicated by the characters in the second area.
After the controller performs text recognition on the image of the second area in step 703 to obtain the text of the second area, the controller may determine the target multimedia file indicated by that text. Alternatively, the controller may determine the type of the target multimedia file in advance. For example, the controller may determine the application program to which the interface currently displayed on the display screen belongs, that is, determine which application program's interface is currently displayed, and then determine the type of the target multimedia file according to the function of that application program. If the controller determines that the display screen currently displays the interface of a music program, it can determine that the type of the target multimedia file is music; if the controller determines that the interface of a film program is currently displayed, it can determine that the type of the target multimedia file is film.
When the second area is located outside the first area, the controller can directly determine that the entire text of the second area is the identifier of the target multimedia file, and can thus directly determine the target multimedia file indicated by that text. For example, if the type of the target multimedia file is film, the image in the first area is a movie poster, and the text in the second area includes "Zhangsan Zhuiji", the controller may directly determine that the target multimedia file indicated by the text of the second area is the movie Zhangsan Zhuiji.
When the second area is located in the first area, the text of the second area may include not only the identification of the multimedia file, but also other information related to the multimedia file (e.g., promotional information of the multimedia file or identification of other objects). At this time, the text in the second area may be further filtered, the identifier of the multimedia file included in the text in the second area is determined, and then the target multimedia file indicated by the identifier is determined.
Illustratively, the controller may determine the identifier of the multimedia file included among the phrases recognized in step 703. For example, the controller may input each phrase into a first identifier recognition model, which outputs those input phrases that are identifiers of multimedia files. For example, the controller may input the phrases "Zhangsan Zhuiji", "Zhangsan", "Zhuiji", "Lisi", "Wangwu", and "47 episodes" into the first identifier recognition model, and the model may output the identifier "Zhangsan Zhuiji" of the target multimedia file, so that the controller determines that the target multimedia file is the movie Zhangsan Zhuiji. It should be noted that the multimedia file indicated by the identifier output by the first identifier recognition model is of the same type as the target multimedia file. The first identifier recognition model may be a Support Vector Machine (SVM). For each type of multimedia file, the identifiers of a plurality of multimedia files of that type can be collected, and the first identifier recognition model can be trained on these identifiers so that it can determine the identifiers of that type of multimedia file in the input text.
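A sketch of such an identifier recognition model as a binary phrase classifier, assuming scikit-learn; the tiny training set below is illustrative only, whereas a real model would be trained on many identifiers of the target type, as described above.

```python
# A sketch of the first identifier-recognition model as an SVM over phrases.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# label 1 = phrase is a multimedia-file identifier (e.g., a series title),
# label 0 = any other phrase (actor name, episode count, ...)
train_phrases = ["Zhangsan Zhuiji", "Lisi", "Wangwu", "47 episodes"]
train_labels = [1, 0, 0, 0]

model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),  # char n-grams suit short phrases
    LinearSVC(),
)
model.fit(train_phrases, train_labels)

def find_file_identifiers(phrases):
    return [p for p, y in zip(phrases, model.predict(phrases)) if y == 1]
```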
After the controller determines the target multimedia file, the controller may determine at least one candidate object among the plurality of objects associated with the target multimedia file, according to the type of the object to be recognized determined in step 702. For different types of multimedia files, the types of associated objects differ. For example, if the multimedia file is a film, the associated object types may include actors, directors, production companies, editing teams, shooting locations, and the like. If the multimedia file is music, the associated object types may include singers, composers, production companies, and the like.
For example, assuming the controller determines in step 702 that the preset type of the object to be recognized is actor, and determines in step 704 that the type of the target multimedia file is film, the controller may obtain, in step 705, the actor list corresponding to the target multimedia file and take the actors in that list as candidate objects, thereby obtaining at least one candidate object. The at least one candidate object also corresponds to the persons in the preset database that correspond to the text of the second area.
It should be noted that, after determining a candidate object, the controller may obtain the information of that candidate. Illustratively, the information of the candidate may include at least the candidate's features and introduction information. For example, if the candidate is an actor, the candidate's features may include at least one of facial features, height, skin color, and body build, and the candidate's introduction information may include at least one of nationality, age, filmography, and character relationships. Alternatively, the controller may look up the information of the candidate in a knowledge graph. It should be noted that the content of the above features and introduction information may be modified arbitrarily according to the actual situation, which is not limited in the embodiment of the present application.
And step 706, determining the auxiliary characteristics of the object to be recognized.
After determining the object to be recognized in step 702, the controller may extract an auxiliary feature of the object to be recognized, where the auxiliary feature may be any feature of that object. For example, if the controller determines in step 702 that the object to be recognized is a person, the controller may determine the face area of the object, perform face recognition on that area, and determine auxiliary features such as skin color, gender, height, weight, and age range. It should be noted that the auxiliary feature may also include other features of the object to be recognized, which is not limited in this application.
Step 707, determining at least one auxiliary object having an auxiliary feature in the at least one candidate object.
After the controller determines the auxiliary feature of the object to be recognized, the controller may further screen the determined at least one candidate object according to the auxiliary feature to obtain at least one auxiliary object more similar to the object to be recognized.
For example, the controller may determine whether each candidate object has the auxiliary feature of the object to be recognized, such as determining whether the auxiliary feature is included in the information of each candidate determined by the controller in step 705. When the information of a certain candidate includes the auxiliary feature, the controller may determine that the candidate has the auxiliary feature and is therefore an auxiliary object. For example, when the auxiliary feature is female, the controller may take the female candidates among the at least one candidate object as auxiliary objects.
It should be noted that, in the embodiment of the present application, the corresponding similarity of any candidate object is: the similarity between the feature of any candidate object and the feature of the object to be recognized, so the similarity corresponding to the auxiliary object is the similarity between the feature of the auxiliary object and the feature of the object to be recognized. The process of determining the similarity between the features of the candidate object and the features of the object to be recognized is a process of comparing the features of the candidate object with the features of the object to be recognized.
It is assumed that the features of the candidate object and the features of the object to be recognized are both face features in the embodiment of the present application. After the controller determines the face region of the object to be recognized in step 702, face feature extraction may also be performed on the face region to determine the face feature of the object to be recognized. Furthermore, the controller may calculate a similarity between the face feature of the object to be recognized and the face feature of each of the at least one auxiliary object, to obtain a similarity corresponding to the at least one auxiliary object.
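A sketch of this comparison, assuming face features are fixed-length embedding vectors; cosine similarity stands in for the embodiment's unspecified similarity measure, and the dictionary field names are assumptions.

```python
# A sketch of comparing the face feature of the object to be recognized
# with each auxiliary object's face feature.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_auxiliary_objects(query_feature, auxiliary_objects):
    # auxiliary_objects: list of dicts with "name" and "feature" keys (assumed layout)
    scored = [(obj["name"], cosine_similarity(query_feature, obj["feature"]))
              for obj in auxiliary_objects]
    return sorted(scored, key=lambda s: s[1], reverse=True)  # most similar first
```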
Optionally, the candidate objects may have priorities, and when determining the similarities corresponding to at least some of the candidate objects, these similarities are determined in order of candidate priority from high to low. The determination order for candidates of equal priority may be chosen randomly by the controller. For example, if the at least one candidate object determined by the controller includes all the actors of a movie, the priorities of the leading actors may be higher than those of the other actors, or all actors may have the same priority. In step 708, the at least some candidate objects include the at least one auxiliary object, and the controller may determine the similarities corresponding to the at least one auxiliary object in order of their priority from high to low.
Optionally, the controller may adjust the priority of the at least one candidate object according to the text of the second region. Optionally, the controller may also adjust only the priority of the at least one auxiliary object (that is, at least part of the at least one candidate object) according to the text in the second region, and then determine the similarity corresponding to the at least one auxiliary object according to the adjusted priority. For example, after the controller performs text recognition on the image of the second area in step 703 to obtain text of the second area, it may further determine whether the text of the second area includes an identifier of any candidate object. When the text of the second area includes an identification of a candidate, the controller may increase the priority of the candidate. For example, the priority change amount of each candidate may be a fixed value, or the priority change amount of a candidate may also be related to the original priority of the candidate, such as the original priority of the candidate is positively or negatively related to the corresponding priority change amount.
For example, the controller may input each phrase obtained by performing text recognition on the image of the second region in step 703 into a second identifier recognition model, which outputs those input phrases that are identifiers of objects. It should be noted that the objects indicated by the identifiers output by the second identifier recognition model are of the same type as the object to be recognized. The second identifier recognition model may be a Support Vector Machine (SVM). For each object type, the identifiers of a plurality of objects of that type can be collected, and the second identifier recognition model trained on them, so that it can determine the identifiers of that type of object in the input text. For example, the controller may input the phrases "Zhangsan Zhuiji", "Zhangsan", "Zhuiji", "Lisi", "Wangwu", and "47 episodes" into the second identifier recognition model, which may output the object identifiers "Zhangsan", "Lisi", and "Wangwu". The controller may then determine whether these object identifiers include identifiers of candidate objects; for example, if the controller determines that "Lisi" and "Wangwu" are both identifiers of candidate objects, the controller may increase the priorities of the candidates Lisi and Wangwu, so as to preferentially calculate the similarity between Lisi's features and the features of the object to be recognized, and between Wangwu's features and the features of the object to be recognized.
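A sketch of the priority adjustment with a fixed priority increment, which is one of the options described above; the data layout and the boost value are illustrative assumptions.

```python
# A sketch of boosting the priority of candidates named in the second-area text.
def order_by_priority(candidates, recognized_identifiers, boost=10):
    # candidates: list of dicts with "name" and a numeric "priority" (assumed layout)
    for c in candidates:
        if c["name"] in recognized_identifiers:
            c["priority"] += boost        # e.g., lift "Lisi" and "Wangwu"
    return sorted(candidates, key=lambda c: c["priority"], reverse=True)
```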
And step 709, determining whether a similarity greater than the similarity threshold exists among the similarities corresponding to the at least one auxiliary object. When such a similarity exists, step 710 is performed; when all similarities corresponding to the at least one auxiliary object are less than or equal to the similarity threshold, step 711 is performed.
After determining the similarity corresponding to each auxiliary object, the controller may compare the similarity corresponding to each auxiliary object with the similarity threshold, so as to determine whether a similarity greater than the similarity threshold exists among the similarities corresponding to the at least one auxiliary object. When the similarity corresponding to an auxiliary object is greater than the similarity threshold, the controller may determine that the auxiliary object is similar to the object to be recognized. For example, the similarity threshold may be 90%, 95%, or another value, which is not limited in this embodiment of the application.
And step 710, determining the auxiliary object corresponding to the maximum similarity as the target object.
The target object is the recognition result obtained by recognizing the object to be recognized. When a similarity greater than the similarity threshold exists among the similarities corresponding to the at least one auxiliary object, the controller may determine that, among the auxiliary objects whose similarities exceed the threshold, the one with the maximum similarity is most likely the same as the object to be recognized, and may therefore determine it as the target object. When the image displayed on the display screen corresponds to an audio-video file and the object to be recognized in the image of the first area is a person, the target object is a target person.
And step 711, determining the similarities corresponding to the objects, among the at least one candidate object, other than the at least one auxiliary object. Step 712 is then performed.
When the similarity corresponding to the at least one auxiliary object is less than or equal to the similarity threshold, the controller may determine that the similarities of the at least one auxiliary object and the object to be recognized are all small, and the at least one auxiliary object may not be the same as the object to be recognized. Further, the controller may match the object to be recognized with other objects in the at least one candidate object except for the at least one auxiliary object, for example, the controller may determine a similarity of the object to be recognized with each of the other objects.
Optionally, the controller may also determine the similarity corresponding to the other objects in turn according to the order of the priorities of the other objects from high to low. For the description of the priority, reference may be made to step 708, and details of the embodiment of the present application are not described herein.
And step 712, when a similarity greater than the similarity threshold exists among the similarities corresponding to the other objects, determining the other object corresponding to the maximum similarity as the target object.
It should be noted that, for step 712, reference may be made to the description of step 710, and details are not repeated in this embodiment of the present application.
Optionally, if none of the similarities corresponding to the at least one candidate object is greater than the similarity threshold, the controller may determine that none of the at least one candidate object is the object to be recognized. In that case, the controller may determine, in the preset database, the remaining objects of the same type as the object to be recognized other than the candidate objects, determine the similarities corresponding to these remaining objects, and determine the object corresponding to the maximum similarity greater than the similarity threshold among them as the target object. The preset database may include the features and introduction information of a plurality of objects of the same type as the object to be recognized.
And step 713, controlling the display screen to display the introduction information of the target object.
When the controller determines the target object, the controller can determine that the object to be recognized has been successfully recognized, and can then acquire the information of the target object. For example, the controller may obtain the information of the target object from the preset database. The controller then controls the display screen to display the information of the target object, so that the user can learn more about the target multimedia file from that information and select the multimedia files of greater interest. Alternatively, the controller may acquire only the introduction information of the target object and control the display screen to display that introduction information.
Alternatively, when the introduction information of the target object is displayed, the display screen may stop displaying the currently displayed screen, or the display screen may display the introduction information of the target object on the currently displayed screen with the currently displayed screen as a background. Alternatively, the display screen may also display the images of the first areas, respectively, and display the introduction information of the target object around the images of the first areas, or on the images of the first areas. Optionally, the introduction information of each target object may be displayed in a list form, or may be displayed in other manners, which is not limited in this application embodiment.
It should be noted that the steps in the embodiment of the present application take the execution sequence shown in fig. 7 as an example; optionally, the order of the steps may be adjusted as required. For example, step 702 may be executed after step 703 or step 704, step 706 may be executed after step 702, or step 701 and step 703 may be executed in parallel.
It should be noted that, in the embodiment of the present application, the controller recognizes the text in the second area of the image displayed on the display screen and thereby obtains text information closely related to the image of the first area (such as the identifier of the target multimedia file or the identifier of the object to be recognized). The objects in the preset database are screened according to this text information, at least one candidate object highly relevant to the object to be recognized is determined, and the object to be recognized is then identified among the at least one candidate object. This reduces the number of objects to be compared with the object to be recognized, raises the probability that the object to be recognized matches a compared object, avoids wasting time comparing the object to be recognized against many objects of low reference value, and avoids misrecognizing other objects that are highly similar to the object to be recognized but unrelated to the target multimedia file. The object to be recognized can therefore be recognized more quickly and accurately, optimizing the smart experience of the end user.
To sum up, in the information obtaining method provided in the embodiment of the present application, the target multimedia file indicated by the text in the second area displayed on the display screen may be determined, and then at least one candidate object related to the target multimedia file may be determined, so that the target object is determined in the at least one candidate object, and the similarity between the feature of the target object and the feature of the object to be identified is greater than the similarity threshold. The relevance between the object to be recognized in the first area displayed by the display screen and the target multimedia file indicated by the characters in the second area is high, and the probability that the object to be recognized belongs to the at least one candidate object is high, so that the accuracy of the target object determined in the at least one candidate object related to the target multimedia file is high, and the accuracy of recognizing the object to be recognized is high.
In addition, because only the similarities between the features of the at least one candidate object and the features of the object to be recognized need to be determined, the object to be recognized can be recognized with fewer similarity computations, which simplifies and speeds up the recognition process.
Fig. 13 is a flowchart of an information obtaining method according to another embodiment of the present application. The method may be used in the information acquisition system shown in fig. 1, and as shown in fig. 13, the method may include:
It should be noted that, for step 1301, reference may be made to the related description of the controller obtaining the screen capture instruction in step 701, and details are not repeated in this embodiment of the present application.
It should be noted that, for step 1302, reference may be made to the related description of the controller obtaining the image displayed by the display device according to the screen capture instruction in step 701, and details are not repeated in this embodiment of the present application.
And step 1303, the display device sends the screen shot image to the server.
For example, after the display device generates the screenshot image, the display device may send the screenshot image to the server through a communication connection between the display device and the server. Optionally, the display device may further send a recognition instruction to the server, instructing the server to recognize the face in the image.
In step 1304, the server determines the view display area in the screenshot image.
Optionally, when receiving the screenshot image and the identification instruction sent by the display device, the server may determine the view display area in the screenshot image according to the identification instruction, and further identify the screenshot image. The number of the view display areas in the screen capture image can be one or more. Alternatively, the view display area may be a partial area in the screen shot image.
It should be noted that, in step 1304, reference may be made to the related description of determining the view display area by the controller in step 701, which is not described again in this embodiment of the present application.
Optionally, for each view display area in the screenshot image, the server may remove the area of the screenshot image outside that view display area, thereby obtaining the image of one view display area.
It should be noted that, in step 1306, reference may be made to the related description of determining the object to be identified in step 702, and details of this embodiment are not described herein again.
It should be noted that, in step 1307, reference may be made to the introduction of step 703 described above, and details of the embodiment of the present application are not described again.
It should be noted that, reference may be made to the description of the above steps 704 and 705 for the step 1308, and details of the embodiment of the present application are not described again.
It should be noted that, in step 1309, reference may be made to descriptions of step 706 to step 712, which is not described again in this embodiment of the present application.
When the server determines the target person, the server can determine that face recognition in the image of the view display area of the screenshot image has succeeded and that the face belongs to the target person, and the server can then acquire the information of the target person. For example, the server may obtain the information of the target person from a preset database.
For example, after obtaining the information of the target person, the server may send the information of the target person to the display device through a communication connection between the display device and the server. Alternatively, the server may also transmit a display instruction to the display device instructing the display device to display information of the target person.
Alternatively, after receiving the information of the target person and the display instruction sent by the server, the display device may display the information of the target person according to the display instruction. For the display manner of the information of the target person, reference may be made to the related description of displaying the introduction information of the target object in step 713, and details are not repeated in this embodiment of the present application.
It should be noted that, in the embodiment of the present application, only processing an image of one view display area in a screenshot image is taken as an example for explanation, processing manners of images of all view display areas in the screenshot image are the same, and details are not repeated in the embodiment of the present application.
In summary, in the information acquisition method provided in the embodiment of the present application, the server may recognize the text in the screenshot image so as to filter, in the preset database, the feature data of the persons corresponding to that text, and may then compare the feature data of the face with the feature data of those persons to determine the information of the target person among them. Because the relevance between the face in the screenshot image and the persons corresponding to the text in the screenshot image is high, and the probability that the face belongs to one of those persons is high, the target person is determined among the persons corresponding to the text with high accuracy, and face recognition in the screenshot image is therefore accurate.
In addition, because only the feature data of the face and the feature data of the persons corresponding to the text need to be compared, a small number of comparisons suffices, which simplifies the process of recognizing the face in the screenshot image and improves the speed of face recognition.
It should be understood that the terms "first," "second," "third," and the like in the description, claims, and drawings of the present application are used to distinguish similar elements and do not necessarily describe a particular sequence or chronological order. It is to be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can, for example, be implemented in sequences other than those illustrated or described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code capable of performing the functionality associated with that element.
The term "remote control" as used in this application refers to a component of an electronic device (such as the display device disclosed in this application) that is typically wirelessly controllable over a relatively short range of distances. Typically using infrared and/or Radio Frequency (RF) signals and/or bluetooth to connect with the electronic device, and may also include WiFi, wireless USB, bluetooth, motion sensor, etc. For example: the hand-held touch remote controller replaces most of the physical built-in hard keys in a common remote control device with a user interface in a touch screen.
The term "gesture" as used in this application refers to a user's behavior through a change in hand shape or an action such as hand motion to convey a desired idea, action, purpose, and/or result.
It should be noted that, when the control device provided in the above embodiments controls information acquisition, the division into the above functional modules is merely illustrative; in practical applications, the above functions may be distributed among different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
It should be noted that the method embodiments provided in the embodiments of the present application and the corresponding apparatus embodiments may be referred to one another, and this is not limited by the embodiments of the present application. The order of the steps of the method embodiments provided in the embodiments of the present application may be adjusted appropriately, and steps may be added or removed as circumstances require; any method that can readily be conceived by those skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application, and details are therefore not repeated.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method provided by the embodiments of the present application.
The above description is intended only to illustrate the alternative embodiments of the present application, and should not be construed as limiting the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (9)
1. An information acquisition method, characterized in that the method comprises:
the method comprises the steps that a screen shot is conducted on an image displayed by a display screen to obtain a screen shot image, the image displayed by the display screen comprises a plurality of first areas and a plurality of second areas, the first areas are used for displaying posters of multimedia files, and the second areas are used for displaying characters indicating the multimedia files;
performing edge detection on the screen shot image to determine images of the plurality of first areas, wherein the image of each first area comprises an object to be identified;
aiming at a target first area in the plurality of first areas, determining a target second area corresponding to the target first area; the target first area is any one of the plurality of first areas, the target second area is a second area which is located in a specified direction around the target first area and is spaced from the target first area by less than a spacing threshold value, and the characters in the target second area comprise an identifier of a target multimedia file to which the poster displayed in the target first area belongs;
identifying characters in the target second area to determine a target multimedia file indicated by the characters;
determining at least one alternative object with the same type as the object to be identified in the image of the target first area in a plurality of objects related to the target multimedia file;
determining a target object in the at least one candidate object, wherein the similarity between the characteristics of the target object and the characteristics of the object to be identified is greater than a similarity threshold value;
and controlling the display screen to display the introduction information of the target object.
2. The method of claim 1, wherein identifying the text in the target second region to determine the target multimedia file indicated by the text comprises:
inputting the characters into a first identification recognition model to obtain the identification of the multimedia file included by the characters;
and determining the target multimedia file indicated by the identification of the multimedia file.
3. The method according to claim 1 or 2, wherein the determining a target object of the at least one candidate object comprises:
determining an auxiliary characteristic of the object to be recognized;
determining at least one auxiliary object having the auxiliary feature among the at least one candidate object;
determining the target object from the at least one auxiliary object.
4. The method of claim 3, wherein determining the target object from the at least one auxiliary object comprises:
when the similarity greater than a similarity threshold exists in the similarities corresponding to the at least one auxiliary object, determining the auxiliary object corresponding to the maximum similarity as the target object;
wherein, the corresponding similarity of any candidate object in the at least one candidate object is: similarity between the features of any candidate object and the features of the object to be identified.
5. The method of claim 4, wherein determining the target object from the at least one auxiliary object further comprises:
when every similarity corresponding to the at least one auxiliary object is smaller than or equal to the similarity threshold and a similarity corresponding to one of the candidate objects is greater than the similarity threshold, determining the candidate object corresponding to the maximum similarity as the target object.
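Claims 4 and 5 together define a two-stage selection rule: pick the best-matching auxiliary object if its similarity clears the threshold, otherwise fall back to the best-matching candidate object above the threshold. A minimal sketch, assuming a caller-supplied similarity function and an illustrative threshold value; the auxiliaries list is assumed to have already been filtered by the auxiliary feature of claim 3.

```python
def pick_target(candidates, auxiliaries, query_features, similarity, threshold=0.8):
    """candidates / auxiliaries: lists of (object_id, features). Returns an id or None."""
    def best_above_threshold(pool):
        scored = [(similarity(feats, query_features), oid) for oid, feats in pool]
        if not scored:
            return None
        top_score, top_id = max(scored)          # entry with the maximum similarity
        return top_id if top_score > threshold else None

    # Claim 4 rule: prefer an auxiliary object; claim 5 rule: fall back to candidates.
    return best_above_threshold(auxiliaries) or best_above_threshold(candidates)
```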
6. An information acquisition method, characterized in that the method comprises:
receiving a screen capture instruction input by a user while a display screen displays the images of a plurality of first areas and the images of a plurality of second areas, wherein the first areas are used for displaying posters of audio-video files;
capturing the image displayed on the display screen according to the screen capture instruction to generate a screenshot image;
performing edge detection on the screenshot image to determine the images of the plurality of first areas, wherein the image of each first area comprises a human face;
for a target first area among the plurality of first areas, recognizing the human face in the image of the target first area and determining a target second area corresponding to the target first area, wherein the target first area is any one of the plurality of first areas, the target second area is a second area that is located in a specified direction around the target first area and whose distance from the target first area is smaller than a distance threshold, and the text in the target second area comprises an identifier of the target multimedia file to which the poster displayed in the target first area belongs;
identifying the text in the target second area, wherein the text is used for screening, in a preset database, the person feature data corresponding to the target multimedia file indicated by the text;
comparing the feature data of the human face with the person feature data corresponding to the target multimedia file indicated by the text, and identifying a target person among the persons corresponding to that target multimedia file; and
controlling the display screen to display information of the target person.
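The comparison step of claim 6 can be sketched as nearest-neighbour matching over face feature vectors. Everything below is an assumption for illustration: cosine similarity as the metric, a dict keyed by media identifier as the "preset database", and a 0.8 threshold.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_target_person(face_vec, person_db, media_id, threshold=0.8):
    """person_db: {media_id: [(person_name, feature_vector), ...]}, pre-filtered by the
    recognised text so only persons of the target multimedia file are compared."""
    scored = [(cosine_similarity(face_vec, vec), name)
              for name, vec in person_db.get(media_id, [])]
    if not scored:
        return None
    top_score, top_name = max(scored)
    return top_name if top_score > threshold else None   # no confident match -> None
```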
7. An information acquisition method, characterized in that the method comprises:
receiving a screenshot image sent by a display device, wherein the screenshot image is obtained by the display device capturing its screen while displaying the images of a plurality of first areas and the images of a plurality of second areas, and the first areas are used for displaying posters of audio-video files;
performing edge detection on the screenshot image to determine the images of the plurality of first areas, wherein the image of each first area comprises a human face;
for a target first area among the plurality of first areas, recognizing the human face in the image of the target first area and determining a target second area corresponding to the target first area, wherein the target first area is any one of the plurality of first areas, the target second area is a second area that is located in a specified direction around the target first area and whose distance from the target first area is smaller than a distance threshold, and the text in the target second area comprises an identifier of the target multimedia file to which the poster displayed in the target first area belongs;
identifying the text in the target second area, wherein the text is used for screening, in a preset database, the person feature data corresponding to the target multimedia file indicated by the text;
comparing the feature data of the human face with the person feature data corresponding to the target multimedia file indicated by the text, and identifying a target person among the persons corresponding to that target multimedia file; and
sending the information of the target person to the display device, so that the display device displays the information of the target person.
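Claim 7 runs the same pipeline server-side. The sketch below frames it as an HTTP endpoint using Flask; the route, the raw-bytes payload, and the detect_and_match placeholder (standing in for the detection and matching steps sketched above) are all illustrative assumptions, not part of the patent.

```python
import cv2
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

def detect_and_match(screenshot):
    """Placeholder tying together edge detection, text recognition and face matching."""
    return {"name": "unknown", "introduction": ""}

@app.route("/identify", methods=["POST"])
def identify():
    # Decode the raw screenshot bytes sent by the display device.
    buf = np.frombuffer(request.data, dtype=np.uint8)
    screenshot = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if screenshot is None:
        return jsonify(error="could not decode screenshot"), 400
    person_info = detect_and_match(screenshot)
    return jsonify(person=person_info)   # the display device renders this payload
```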
8. A display device, characterized in that the display device comprises a display screen and a controller in communication with the display screen, wherein the controller is configured to execute the information acquisition method according to any one of claims 1 to 6.
9. A server communicatively coupled to a display device, characterized in that the server comprises a processor and a memory, the memory storing at least one instruction which, when executed by the processor, implements the information acquisition method according to claim 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010080037.5A CN111343512B (en) | 2020-02-04 | 2020-02-04 | Information acquisition method, display device and server |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111343512A (en) | 2020-06-26 |
| CN111343512B (en) | 2023-01-10 |
Family
ID=71188040
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010080037.5A Active CN111343512B (en) | 2020-02-04 | 2020-02-04 | Information acquisition method, display device and server |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111343512B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111930326B (en) * | 2020-06-30 | 2024-12-20 | Xi'an Wanxiang Electronics Technology Co., Ltd. | Image processing method, device and system |
| CN114945102B (en) * | 2020-07-14 | 2024-11-08 | Hisense Visual Technology Co., Ltd. | Display device and method for character recognition and display |
| CN111787376B (en) * | 2020-07-22 | 2022-10-14 | Juhaokan Technology Co., Ltd. | Display device, server and video recommendation method |
| CN111984801B (en) * | 2020-09-04 | 2023-12-12 | Tencent Technology (Shenzhen) Co., Ltd. | Media information display method, storage medium and electronic display device |
| CN112348077A (en) * | 2020-11-04 | 2021-02-09 | Shenzhen TCL New Technology Co., Ltd. | Image recognition method, device, equipment and computer readable storage medium |
| CN114286142B (en) * | 2021-01-18 | 2023-03-28 | Hisense Visual Technology Co., Ltd. | Virtual reality device and VR scene screen capturing method |
| CN112926420B (en) * | 2021-02-09 | 2022-11-08 | Hisense Visual Technology Co., Ltd. | Display device and menu character recognition method |
| CN113326395B (en) * | 2021-04-23 | 2025-04-29 | Vivo Mobile Communication Co., Ltd. | Information processing method and apparatus, electronic device, and storage medium |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103246678A (en) * | 2012-02-13 | 2013-08-14 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for previewing web page contents |
| CN104125492A (en) * | 2013-04-23 | 2014-10-29 | Shenzhen Qvod Technology Co., Ltd. | Video playing method and device |
| CN108259973A (en) * | 2017-12-20 | 2018-07-06 | Qingdao Hisense Electronics Co., Ltd. | Smart TV and method for displaying graphical user interface of screenshot of TV screen |
| CN109168069A (en) * | 2018-09-03 | 2019-01-08 | Juhaokan Technology Co., Ltd. | Recognition result partition display method and device, and smart television |
| CN109189289A (en) * | 2018-09-03 | 2019-01-11 | Juhaokan Technology Co., Ltd. | Method and device for generating icons based on a screenshot image |
| CN109906445A (en) * | 2016-11-07 | 2019-06-18 | Qualcomm Incorporated | Associating the captured screenshot with application-specific metadata defining a session state of an application contributing image data to the captured screenshot |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102158732A (en) * | 2011-04-22 | 2011-08-17 | Shenzhen Skyworth-RGB Electronics Co., Ltd. | Information search method and system based on television pictures |
| CN104809120B (en) * | 2014-01-24 | 2020-10-30 | Tencent Technology (Shenzhen) Co., Ltd. | Information processing method and device |
| CN105868684A (en) * | 2015-12-10 | 2016-08-17 | Leshi Internet Information & Technology Corp., Beijing | Video information acquisition method and apparatus |
| CN106909548B (en) * | 2015-12-22 | 2021-01-08 | Beijing Qihoo Technology Co., Ltd. | Server-based picture loading method and device |
| CN107480236B (en) * | 2017-08-08 | 2021-03-26 | Shenzhen Skyworth Digital Technology Co., Ltd. | Information query method, device, equipment and medium |
| CN110019899B (en) * | 2017-08-25 | 2023-10-03 | Tencent Technology (Shenzhen) Co., Ltd. | Target object identification method, device, terminal and storage medium |
| CN107862315B (en) * | 2017-11-02 | 2019-09-17 | Tencent Technology (Shenzhen) Co., Ltd. | Subtitle extraction method, video searching method, subtitle sharing method and device |
| CN107967110A (en) * | 2017-11-30 | 2018-04-27 | Guangdong Genius Technology Co., Ltd. | Playing method, playing device, electronic equipment and computer readable storage medium |
| CN109218629B (en) * | 2018-09-14 | 2021-02-05 | Samsung Electronics (China) R&D Center | Video generation method, storage medium and device |
- 2020-02-04: application CN202010080037.5A (CN) — patent CN111343512B (en), status Active
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103246678A (en) * | 2012-02-13 | 2013-08-14 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for previewing web page contents |
| CN104125492A (en) * | 2013-04-23 | 2014-10-29 | Shenzhen Qvod Technology Co., Ltd. | Video playing method and device |
| CN109906445A (en) * | 2016-11-07 | 2019-06-18 | Qualcomm Incorporated | Associating the captured screenshot with application-specific metadata defining a session state of an application contributing image data to the captured screenshot |
| CN108259973A (en) * | 2017-12-20 | 2018-07-06 | Qingdao Hisense Electronics Co., Ltd. | Smart TV and method for displaying graphical user interface of screenshot of TV screen |
| WO2019120008A1 (en) * | 2017-12-20 | 2019-06-27 | Qingdao Hisense Electronics Co., Ltd. | Smart television and method for displaying graphical user interface of television screen shot |
| CN109168069A (en) * | 2018-09-03 | 2019-01-08 | Juhaokan Technology Co., Ltd. | Recognition result partition display method and device, and smart television |
| CN109189289A (en) * | 2018-09-03 | 2019-01-11 | Juhaokan Technology Co., Ltd. | Method and device for generating icons based on a screenshot image |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111343512A (en) | 2020-06-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111343512B (en) | | Information acquisition method, display device and server |
| CN109618206B (en) | | Method and display device for presenting user interface |
| KR102830405B1 (en) | | Image detection apparatus and method thereof |
| CN111818378B (en) | | Display device and person identification display method |
| CN112399212A (en) | | Display device, file sharing method and server |
| CN111770370A (en) | | Display device, server and media asset recommendation method |
| CN111625716A (en) | | Media asset recommendation method, server and display device |
| US11997341B2 (en) | | Display apparatus and method for person recognition and presentation |
| CN112165641A (en) | | Display device |
| CN114079829A (en) | | Display device and generation method of video collection file watermark |
| CN112492390A (en) | | Display device and content recommendation method |
| CN111669662A (en) | | Display device, video call method and server |
| CN114339346B (en) | | Display device and image recognition result display method |
| CN111654732A (en) | | Advertisement playing method and display device |
| CN111556350B (en) | | Intelligent terminal and man-machine interaction method |
| CN112473121A (en) | | Display device and method for displaying dodgeball based on limb recognition |
| CN111163343A (en) | | Method for recognizing pattern recognition code and display device |
| CN113163228A (en) | | Media asset playing type marking method and server |
| CN114554266B (en) | | Display device and display method |
| CN114390329B (en) | | Display device and image recognition method |
| CN113468351B (en) | | Intelligent device and image processing method |
| CN113973216A (en) | | Method for generating a video collection, and display device |
| CN113542899A (en) | | Information display method, display device and server |
| CN113115081A (en) | | Display device, server and media asset recommendation method |
| CN113542900A (en) | | Media information display method and display device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |