Disclosure of Invention
In order to solve at least one technical problem, a first aspect of the present disclosure proposes a method for controlling a display device based on a gesture, which comprises: obtaining skeleton node information of an operator and skeleton node information of a manipulating hand in an image; generating a feature vector describing a hand gesture of the operator according to the skeleton node information of the manipulating hand; matching the feature vector describing the hand gesture of the operator with gesture models in a gesture model library, and dividing the image into gesture type frame images and gesture-free type frame images according to the matching result to obtain gesture type information in the gesture type frame images; obtaining gesture type information and hand position information in the gesture-free type frame images according to the skeleton node information of the operator and the gesture type information of the frames preceding and following the gesture-free type frame images; and generating instructions for controlling the display device according to the gesture type information and hand position information of the gesture type frame images and the gesture type information and hand position information of the gesture-free type frame images.
Preferably, the method further comprises capturing real-time skeleton node information of the operator through a depth camera, wherein the skeleton node information comprises three-dimensional coordinates of joints and connection relations among the joints.
Preferably, the skeleton node information of the operator in the gesture type frame images of the frames preceding and following the gesture-free type frame image is used for interpolation, and the hand skeleton node information of the operator in the gesture-free type frame image is calculated therefrom.
Preferably, the hand skeleton node information of the operator in the gesture-free type frame image is matched with a gesture model in a gesture model library, so that the gesture type of the operator in the gesture-free type frame image is obtained.
Preferably, the wrist node position of the operator in the gesture-free type frame image is obtained according to the skeleton node information of the operator and the skeleton node information of the manipulating hand, and the hand position information of the operator in the gesture-free type frame image is obtained according to the wrist node position of the operator in the gesture-free type frame image.
Preferably, the gesture type information in the gesture type frame images of the frames preceding and following the gesture-free type frame image is compared, and if the gesture type information of the preceding and following frames belongs to the same gesture type, the gesture type information in the gesture-free type frame image is judged to be the same as that of the preceding and following frames.
Preferably, if the gesture type information of the preceding and following frames belongs to two consecutive but different gesture types, it is determined that the gesture type information in the gesture-free type frame image is the same as the gesture type information of either the preceding frame or the following frame.
Preferably, the gestures of the gesture type frame images and the gesture-free type frame images are classified into a first-level response gesture and a second-level response gesture; displacement change data of a first gesture and a second gesture between frames, for the same manipulating hand of the operator, are obtained; when the displacement change data of the first gesture reach the condition for triggering the second gesture, it is judged whether the displacement change data of the second gesture reach the condition for triggering menu display, and if so, an instruction for controlling the display device is generated.
In a second aspect, the present disclosure proposes an apparatus for controlling a display device based on a gesture, comprising an acquisition unit configured to acquire skeleton node information of an operator and skeleton node information of a manipulating hand in an image;
a matching unit configured to generate a feature vector describing the hand gesture of the operator according to the skeleton node information of the manipulating hand, match the feature vector describing the hand gesture of the operator with gesture models in a gesture model library, and divide the image into gesture type frame images and gesture-free type frame images according to the matching result to obtain gesture type information in the gesture type frame images; a judging unit configured to acquire gesture type information and hand position information in the gesture-free type frame images according to the skeleton node information of the operator and the gesture type information of the frames preceding and following the gesture-free type frame images; and a generating unit configured to generate instructions for controlling the display device according to the gesture type information and hand position information of the gesture type frame images and the gesture type information and hand position information of the gesture-free type frame images.
The present disclosure proposes in a third aspect a computer readable medium having stored therein a computer program to be loaded and executed by a processing module to implement the steps of any of the methods described above.
The method for controlling the display device based on gestures has the following advantages: skeleton node information of the operator and skeleton node information of the manipulating hand are obtained from an image; a feature vector describing the hand gesture of the operator is generated according to the skeleton node information of the manipulating hand; the feature vector describing the hand gesture of the operator is matched with gesture models in a gesture model library; the image is divided into gesture type frame images and gesture-free type frame images according to the matching result, and gesture type information in the gesture type frame images is obtained; gesture type information and hand position information in the gesture-free type frame images are obtained according to the skeleton node information of the operator and the gesture type information of the frames preceding and following the gesture-free type frame images; and instructions for controlling the display device are generated according to the gesture type information and hand position information of the gesture type frame images and the gesture type information and hand position information of the gesture-free type frame images. By combining the gesture and hand position information of the gesture-free type frame images, the display device responds to the operation corresponding to the gesture more accurately and rapidly, and response delays are avoided. As a result, the distributed display device can be controlled rapidly and accurately, and delayed reactions of the distributed display device, or false triggering of the display device caused by delay or misjudgment of the control gesture, are avoided.
Detailed Description
Further technical means and technical effects of the present disclosure are described below. It is apparent that the examples (or embodiments) provided are only some, but not all, of the embodiments intended to be covered by the present disclosure. All other embodiments obtained by those skilled in the art without inventive effort, based on the embodiments in this disclosure and the explicit or implicit teaching of the drawings, fall within the scope of the present disclosure.
In conventional methods for controlling scenes such as a large screen by gestures, the performance of the computer vision algorithm may be affected by conditions such as illumination and the speed of hand motion. When the image is captured, the gesture in some frames may therefore be blurred, for example when the arm is waved too quickly. Such blurred gestures may cause a delay in controlling the large screen and may cause false touches. For example, when a rightward hand swipe is used to trigger page turning, the display device sometimes does not respond to the swipe, or turns the page only after a noticeable delay. Judgment errors or response delays may thus occur.
In the existing gesture recognition process, if frames of the gesture-free category appear, for example because of blurring, these frames are discarded. This may result in missing information during continuous gesture recognition, thereby affecting the accuracy and timeliness of gesture recognition.
In this regard, the embodiments of the disclosure provide a method for controlling a display device based on gestures, which can be applied in various fields, such as an emergency command and dispatch center, a public security command and dispatch center, a traffic command and dispatch center, an energy command and dispatch center, or a smart city command and dispatch center, where rapid and accurate man-machine interaction is achieved by controlling the large screen of the display device. By controlling the large screen, the command and dispatch system can, for example, switch a distributed signal source and take over the mouse within the signal source, so that the content of the signal source can be operated arbitrarily. The command center serves as the central brain of command, scheduling and control, plays an extremely important role in social management and civil development, and places high requirements on the accuracy and speed of control. The gesture-based method for controlling the display device according to the embodiments of the present disclosure does not require any complicated control device or wearable sensor; the take-over and control of the large screen of the command center can be completed quickly and accurately solely through recognition of the behavior of a live person, and rapid and accurate interactive operations such as loading, switching and scaling of signal windows on the large screen can be realized efficiently through simple mid-air gesture operations. Of course, the disclosed method can also be applied to general scenarios with lower requirements, such as turning the pages of a PPT presentation.
The following describes an exemplary system architecture to which the gesture-based method of controlling a display device of the present disclosure may be applied. The system architecture may include a camera with a pan/tilt head, a server, and a display device. The display device is a large screen; the camera with the pan/tilt head is connected to the server through a serial port line and a USB line, and the server is connected to the display device through a network cable. The image information captured by the camera is transmitted to the server through the USB line; the server processes and analyzes the received image information and makes decisions to generate information or commands, and the server may also transmit the information to the distributed scheduling and image integrated management platform through the network port. The distributed scheduling and image integrated management platform receives the information and causes the large screen to display the corresponding operation feedback.
Display devices generally require a large-screen, multi-color, high-brightness, high-resolution display effect. Illustratively, the display device is a large-screen display device, which refers to the large screen of a direct-view color television or a rear-projection television; typically, the diagonal size of the screen is 40 inches or more. The display surface of the large-screen display device may be flat or curved. The large-screen display device may also be tiled from multiple panels, the number of which is not limited.
In this embodiment, the camera with the pan/tilt head is located directly above the large screen. The pan/tilt head is a device for carrying the camera.
The target user can interact with the server through the camera by means of mid-air gestures, and the server then exchanges information with the large screen, so that mid-air gesture control of the large screen is realized. The server may be a single server, a server cluster formed by a plurality of servers, or a cloud computing center. The server can provide various services for the display device; for different application programs on the display device, the server can be regarded as the background server providing the corresponding network services, and the method of the present disclosure may be mainly executed on the server side.
The application scene in an embodiment of the disclosure includes a large screen, a camera with a pan/tilt head located above the large screen, and an operable range located in front of the large screen. The operable range is generally an annular region. The operator can manipulate the large screen within the operable range. Outside the operable range, for example if the operator is too far from the large screen, the gesture may be recognized incorrectly and a manipulation error may be caused; if the operator is too close to the large screen, the operator may not be able to observe the entire content of the large screen, which is detrimental to manipulating it. In this embodiment, the width of the large screen is 10 meters, and the operable range is the annular region 3 meters to 12 meters away from the large screen.
As shown in Fig. 1, the method of controlling a display device based on a gesture of the present disclosure includes:
S10, acquiring skeleton node information of an operator and skeleton node information of a manipulating hand in an image;
S20, generating a feature vector describing the hand gesture of the operator according to the skeleton node information of the manipulating hand, matching the feature vector describing the hand gesture of the operator with gesture models in a gesture model library, and dividing the image into gesture type frame images and gesture-free type frame images according to the matching result to obtain gesture type information in the gesture type frame images;
S30, acquiring gesture type information and hand position information in the gesture-free type frame images according to the skeleton node information of the operator and the gesture type information of the frames preceding and following the gesture-free type frame images;
S40, generating instructions for controlling the display device according to the gesture type information and hand position information of the gesture type frame images and the gesture type information and hand position information of the gesture-free type frame images.
S10, acquiring skeleton node information of an operator and skeleton node information of a manipulating hand in an image.
The depth camera captures real-time skeleton node information of the operator, where the skeleton node information comprises the three-dimensional coordinates of the joints and the connection relations among the joints. An image covering the operable range can be acquired by the depth camera; bone recognition is performed on the acquired image, and the skeleton node information of the operator in the image, including the locations of the various nodes, is obtained. The captured skeleton node data are preprocessed, including denoising and smoothing, to improve the accuracy and stability of the data. The image of the operable range, that is, the preset range, obtained by the camera is preprocessed, for example by graying, denoising and edge detection, to improve the efficiency and accuracy of subsequent processing. Skeleton nodes in the image are detected using any suitable computer vision algorithm, which is likewise not restricted. Such algorithms are generally capable of identifying and tracking a number of key skeleton nodes of the human body, including the head and extremities, and in particular the wrist joints. The skeleton node information may be converted from two-dimensional image coordinates to three-dimensional spatial coordinates as desired, and the three-dimensional skeleton node information captured by the depth camera is fused with the two-dimensional skeleton node information detected by the computer vision algorithm, so as to provide more complete and accurate operator skeleton node information, in particular hand skeleton node information.
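By way of illustration only, a minimal sketch of this fusion step is given below; the intrinsic parameters fx, fy, cx, cy, the metric depth map and the node names are assumed to be supplied by the depth camera and the 2D detector, and the specific values are hypothetical rather than part of the disclosed method.

```python
import numpy as np

def pixel_to_camera_xyz(u, v, depth_map, fx, fy, cx, cy):
    """Back-project a 2D skeleton node (u, v) to 3D camera coordinates.

    depth_map is assumed to hold metric depth (meters) per pixel;
    fx, fy, cx, cy are the pinhole intrinsics of the depth camera.
    """
    z = float(depth_map[int(v), int(u)])   # depth at the node's pixel
    x = (u - cx) * z / fx                  # back-project along the x axis
    y = (v - cy) * z / fy                  # back-project along the y axis
    return np.array([x, y, z])

def fuse_skeleton(nodes_2d, depth_map, intrinsics):
    """Fuse 2D nodes detected by the vision algorithm with depth data.

    nodes_2d maps node names (e.g. 'wrist') to pixel coordinates.
    """
    fx, fy, cx, cy = intrinsics
    return {name: pixel_to_camera_xyz(u, v, depth_map, fx, fy, cx, cy)
            for name, (u, v) in nodes_2d.items()}
```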
The acquired image is preprocessed: an image within the preset range is acquired by image acquisition equipment such as a camera, yielding an image containing the operator and the manipulating hand. Denoising, sharpening and the like are then performed on the image to improve the image quality and facilitate subsequent gesture detection. Next, the hand region is detected and extracted from the image using image processing techniques such as color space conversion, edge detection and morphological processing. Within the hand region, key points of the hand, such as fingertips, the wrist and joints, are detected using a deep learning algorithm or the like.
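A minimal preprocessing sketch using OpenCV is shown below, assuming a BGR frame from the camera; the kernel size and Canny thresholds are illustrative choices, not values prescribed by the method.

```python
import cv2

def preprocess_frame(frame_bgr):
    """Grayscale, denoise and edge-detect a captured frame before hand detection."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # graying
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)          # denoising / smoothing
    edges = cv2.Canny(denoised, 50, 150)                  # edge detection
    return denoised, edges
```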
S20, generating a feature vector describing the hand gesture of the operator according to the skeleton node information of the manipulating hand, matching the feature vector describing the hand gesture of the operator with gesture models in a gesture model library, and dividing the image into gesture type frame images and gesture-free type frame images according to the matching result to obtain gesture type information in the gesture type frame images.
The user performs gesture actions according to a preset gesture specification; a gesture action includes a particular shape, speed, direction, and so on. A feature vector describing the hand gesture of the operator is generated according to the skeleton node information of the manipulating hand, that is, according to the position information of the key points. These feature vectors may include the coordinates of the key points, angles, distances, and the like. Matching the feature vector describing the hand gesture of the operator with gesture models in a gesture model library involves gesture matching and classification. First, the gesture model library is established: in a training stage, known gesture images and labels are used to train the gesture model library with a machine learning or deep learning algorithm (such as a support vector machine or a neural network). These models contain the feature vectors and corresponding labels of the various gestures. Gesture matching then matches the extracted feature vector with the models in the gesture model library, which may be achieved by calculating the distance or similarity between feature vectors. Gesture classification determines the gesture type in the image according to the matching result; generally, the model with the smallest distance or highest similarity is selected as the recognition result.
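As a non-limiting sketch, a feature vector may be assembled from the hand key points as follows; the key-point names ('wrist', 'thumb_tip', etc.) are hypothetical labels for whatever the detector outputs, and the specific features (wrist-relative offsets, fingertip distances, inter-finger angles) are only one possible choice.

```python
import numpy as np

def hand_feature_vector(keypoints):
    """Build a pose feature vector from 3D hand key points.

    keypoints: dict mapping names such as 'wrist', 'thumb_tip', ... to 3D arrays.
    The vector concatenates wrist-relative fingertip offsets, fingertip-to-wrist
    distances, and the angles between adjacent fingertip directions.
    """
    wrist = keypoints['wrist']
    tips = [keypoints[k] for k in
            ('thumb_tip', 'index_tip', 'middle_tip', 'ring_tip', 'pinky_tip')]
    offsets = [tip - wrist for tip in tips]                 # relative coordinates
    dists = [np.linalg.norm(o) for o in offsets]            # distances to wrist
    angles = []
    for a, b in zip(offsets[:-1], offsets[1:]):             # angles between fingers
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.concatenate([np.ravel(offsets), dists, angles])
```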
The image frames are divided into gesture type frame images and gesture-free type frame images according to the matching result, and the gesture type information in the gesture type frame images is obtained. The distance data are processed by the algorithm, and characteristic information of the gesture, such as its shape, speed and direction, is extracted. According to the extracted gesture characteristic information, a classifier (such as a support vector machine or a neural network) is used to recognize the gesture. The classifier can judge whether the current gesture matches a certain preset gesture according to a preset gesture template or the gesture samples in a database.
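A possible classification sketch using a support vector machine is given below; the training data (X_train, y_train) are assumed to come from an annotated gesture data set, and the confidence threshold used to mark a frame as gesture-free is an illustrative assumption rather than part of the disclosed method.

```python
import numpy as np
from sklearn.svm import SVC

def train_gesture_model(X_train, y_train):
    """Training stage: fit a classifier on feature vectors of known gestures."""
    model = SVC(kernel='rbf', probability=True)
    model.fit(X_train, y_train)
    return model

def classify_frame(model, feature_vector, min_confidence=0.6):
    """Matching stage: frames whose best score is too low are treated as
    gesture-free type frames instead of being assigned a gesture label."""
    probs = model.predict_proba([feature_vector])[0]
    best = int(np.argmax(probs))
    if probs[best] < min_confidence:
        return None                      # gesture-free type frame
    return model.classes_[best]          # gesture type frame
```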
On the basis of gesture recognition, whether the gesture is completed is judged by comparing the actually measured distance with a preset distance threshold or range. If the actually measured distance meets the preset threshold or range, the gesture is considered to be completed, and the corresponding response action is triggered.
In gesture recognition systems, in order to accurately recognize and respond to a user's gestures, the system typically first processes the video stream or image sequence to segment out the hand region and count the number of fingers or recognize the gesture type. However, not all frames contain valid gesture actions: some frames may merely show the user being stationary, making unrelated gestures, or with the hands absent from the picture. These frames are referred to as "gesture-free class frames".
A gesture-free frame is a frame in the video stream or image sequence that contains no gesture action or only unobvious gesture features. Such frames typically contain only background information, other non-gesture objects, or other parts of the human body, and do not contain the complete morphology or motion trajectory of a gesture. An effective gesture frame, by contrast, is a frame in the video stream or image sequence that contains a clear and complete gesture motion meeting the requirements of the gesture recognition algorithm. Such frames have a definite gesture form, motion track and feature points, and can be accurately identified by the gesture recognition algorithm.
The position information of key nodes such as the wrist is extracted from the video or image sequence, and the motion trails and relative position changes of these nodes in the preceding and following frames are analyzed. The extracted features are compared and matched with the features in a library of known gesture patterns. Pattern matching may be performed using machine learning algorithms (e.g., support vector machines or random forests) or deep learning models (e.g., convolutional neural networks or recurrent neural networks).
For situations where it is necessary to capture gesture dynamics, a motion sensor (e.g., a gyroscope or accelerometer) or the like may be used to aid the analysis. The type and intention of the gesture can be further judged by calculating parameters such as the movement speed and acceleration of the nodes. The gesture type information of the preceding and following frames is combined, and this context information is used to assist in judging the gesture type of the current gesture-free frame. For example, if the preceding frame is the start frame of a gesture and the following frame is the end frame of the same gesture, the current gesture-free class frame is determined to be a transition frame of that gesture.
S30, acquiring gesture type information and hand position information in the gesture-free type frame images according to the skeleton node information of the operator and the gesture type information of the frames preceding and following the gesture-free type frame images.
For frames lacking hand position information, the hand position may be estimated by inter-frame interpolation or prediction. The skeleton node information of the operator in the gesture type frame images of the frames preceding and following the gesture-free type frame image is used for interpolation, and the hand skeleton node information of the operator in the gesture-free type frame image is calculated. The hand skeleton node information of the operator in the gesture-free type frame image is then matched with the gesture models in the gesture model library to obtain the gesture type of the operator in the gesture-free type frame image.
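A minimal sketch of the interpolation step, assuming the nodes are given as dictionaries of 3D coordinates and that linear interpolation is used (other interpolation schemes are equally possible):

```python
import numpy as np

def interpolate_nodes(nodes_prev, nodes_next, alpha=0.5):
    """Estimate hand skeleton nodes in a gesture-free type frame by linear
    interpolation between the preceding and following gesture type frames.

    nodes_prev / nodes_next: dicts of node name -> 3D coordinate.
    alpha: relative temporal position of the gesture-free frame in [0, 1].
    """
    return {name: (1.0 - alpha) * np.asarray(nodes_prev[name])
                  + alpha * np.asarray(nodes_next[name])
            for name in nodes_prev}
```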
For frames without hand position information, the wrist node position of the operator in the gesture-free type frame image can also be obtained from the skeleton node information of the operator and the skeleton node information of the manipulating hand, and the hand position information of the operator in the gesture-free type frame image can be obtained from the wrist node position of the operator in the gesture-free type frame image.
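One possible way to derive a hand position from the wrist node, sketched below under the assumption that an elbow node is also available, is to extend the forearm direction by a small offset; both the use of the forearm direction and the offset length are illustrative assumptions, not requirements of the method.

```python
import numpy as np

def estimate_hand_position(wrist, elbow, hand_length=0.08):
    """Approximate the hand center from arm skeleton nodes in a gesture-free frame.

    The hand is assumed to lie roughly on the line from the elbow through the
    wrist; hand_length (meters) is an illustrative offset, not a fixed value.
    """
    direction = np.asarray(wrist) - np.asarray(elbow)
    direction = direction / (np.linalg.norm(direction) + 1e-9)
    return np.asarray(wrist) + hand_length * direction
```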
The gesture type information in the gesture type frame images preceding and following the gesture-free type frame image is compared; if the gesture type information of the preceding and following frames belongs to the same gesture type, the gesture type information in the gesture-free type frame image is judged to be the same as that of the preceding and following frames. If the gesture type information of the preceding and following frames belongs to two consecutive but different gesture types, the gesture type information in the gesture-free type frame image is judged to be the same as either the gesture type information of the preceding frame or that of the following frame.
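Expressed as a small sketch, the rule above might look as follows; choosing the preceding frame's type in the "two different types" case is only one of the two permitted options.

```python
def infer_gesture_type(prev_type, next_type):
    """Assign a gesture type to a gesture-free type frame from its neighbours.

    If both neighbouring frames carry the same gesture type, that type is used;
    if they carry two consecutive but different types, either neighbour's type
    is acceptable (the preceding frame's type is chosen here).
    """
    if prev_type == next_type:
        return prev_type
    return prev_type  # could equally return next_type, per the rule above
```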
In the video or image sequence, the position information of key nodes such as the wrist is tracked continuously. Algorithms such as optical flow or Kalman filtering can be used to improve the accuracy of node position tracking. For gesture-free class frames, the position information of the key nodes can be calculated by an interpolation algorithm; common interpolation algorithms include linear interpolation and polynomial interpolation. The position information of the gesture-free frame is corrected and optimized by combining the position information of the preceding and following frames, and smoothing filtering, denoising algorithms and the like may be used to improve the accuracy and stability of the position information. In this way, gesture judgment and position calculation for gesture-free frames are realized; the video or image sequence can be processed in real time, and the gesture recognition result and position information are output.
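As an example of the smoothing step, a simple moving-average filter over the tracked wrist trajectory is sketched below; the window size is an illustrative choice, and more elaborate filters (e.g., Kalman filtering) could be substituted.

```python
import numpy as np

def smooth_trajectory(positions, window=5):
    """Smooth a sequence of wrist positions (N x 3 array) with a moving average
    to reduce jitter before gesture judgement; window is an illustrative size."""
    positions = np.asarray(positions, dtype=float)
    kernel = np.ones(window) / window
    # filter each coordinate axis independently, keeping the sequence length
    return np.stack([np.convolve(positions[:, i], kernel, mode='same')
                     for i in range(positions.shape[1])], axis=1)
```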
S40, generating instructions for controlling the display device according to the gesture type information and the hand position information in the gesture type frame images and the gesture-free type frame images.
The gestures in the gesture type frame images and the gesture-free type frame images are classified into a first-level response gesture and a second-level response gesture. Displacement change data of the first gesture and of the second gesture between frames, for the same manipulating hand of the operator, are acquired. When the displacement change data of the first gesture reach the condition for triggering the second gesture, it is judged whether the displacement change data of the second gesture reach the condition for triggering the display of a menu; if so, an instruction for controlling the display device is generated.
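A possible realization of this two-level response rule is sketched below; the displacement thresholds and the instruction name "SHOW_MENU" are hypothetical placeholders for whatever conditions and commands a concrete system defines.

```python
import numpy as np

FIRST_TRIGGER = 0.15   # illustrative displacement (m) that arms the second gesture
MENU_TRIGGER = 0.25    # illustrative displacement (m) that triggers menu display

def check_menu_trigger(first_gesture_track, second_gesture_track):
    """Evaluate the two-level response rule on per-frame hand positions.

    first_gesture_track / second_gesture_track: lists of 3D hand positions of
    the same manipulating hand while the first / second gesture is held.
    Returns a display-device instruction name, or None if nothing is triggered.
    """
    def displacement(track):
        if len(track) < 2:
            return 0.0
        return float(np.linalg.norm(np.asarray(track[-1]) - np.asarray(track[0])))

    if displacement(first_gesture_track) >= FIRST_TRIGGER:      # first-level response
        if displacement(second_gesture_track) >= MENU_TRIGGER:  # second-level response
            return "SHOW_MENU"
    return None
```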
A camera (e.g., a color camera or a depth camera) is used to capture images of the hand or of the gesture being executed in real time. The distance between the hand or gesture and the camera is calculated by an image processing algorithm, such as a depth map algorithm. For a depth camera, the depth information can be acquired directly; for a color camera, the depth has to be estimated by a specific algorithm (such as stereoscopic vision or structured light). A corresponding distance threshold is set according to the expected completion condition of the gesture. The distance between the hand or gesture and the camera is calculated in real time and compared with the set threshold. If the distance data are within the set threshold range, the gesture is deemed to be proceeding as expected; if the distance data exceed the set threshold range, the gesture is deemed not to be completed as expected or to be erroneous.
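A small sketch of the distance check for a depth camera is shown below; the hand bounding box is assumed to come from the hand-region detection step, and the expected distance range is purely illustrative.

```python
import numpy as np

def gesture_within_expected_range(depth_map, hand_bbox, expected=(0.5, 3.0)):
    """Check whether the hand is at the expected distance from the camera.

    hand_bbox: (x, y, w, h) region of the hand in the metric depth map;
    expected is an illustrative range in meters. The median depth of the
    region is compared with it.
    """
    x, y, w, h = hand_bbox
    region = depth_map[y:y + h, x:x + w]
    valid = region[region > 0]                 # ignore invalid zero-depth pixels
    if valid.size == 0:
        return False, float('nan')
    distance = float(np.median(valid))
    lo, hi = expected
    return lo <= distance <= hi, distance
```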
The apparatus for controlling a display device based on gestures comprises an acquisition unit and a matching unit. The acquisition unit is configured to acquire skeleton node information of the operator and skeleton node information of the manipulating hand in an image. The matching unit is configured to generate a feature vector describing the hand gesture of the operator according to the skeleton node information of the manipulating hand, match the feature vector describing the hand gesture of the operator with gesture models in the gesture model library, and divide the image into gesture type frame images and gesture-free type frame images according to the matching result to obtain gesture type information in the gesture type frame images. The apparatus further comprises a judging unit and a generating unit. The judging unit is configured to acquire gesture type information and hand position information in the gesture-free type frame images according to the skeleton node information of the operator and the gesture type information of the frames preceding and following the gesture-free type frame images. The generating unit is configured to generate instructions for controlling the display device according to the gesture type information and hand position information of the gesture type frame images and of the gesture-free type frame images.
In a third aspect, the present disclosure proposes a computer readable medium having stored therein a computer program to be loaded and executed by a processing module to implement the steps of the method described above. It will be appreciated by those skilled in the art that all or part of the steps in the embodiments may be implemented by a computer program instructing related hardware, and the program may be stored in a computer readable medium; the readable medium may include various media capable of storing program code, such as a flash disk, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
Combining the various embodiments or features mentioned herein with one another, where no conflict arises, to form further alternative embodiments is within the knowledge and ability of those skilled in the art; such alternative embodiments, formed by a finite number of combinations of features and not listed one by one, still fall within the scope of the present disclosure, as would be understood or inferred by those skilled in the art in view of the drawings and the foregoing.
In addition, the descriptions of the various embodiments are developed with different emphases; for aspects not described explicitly, reference may be made to the prior art, to other related descriptions herein, or to reasonable inferences drawn from the inventive concept.
It is emphasized that the above-described embodiments are typical and preferred embodiments of the disclosure, set forth merely to describe the disclosed technology in detail for the benefit of the reader, and are not intended to limit the scope or applicability of the disclosure. Any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present disclosure are intended to be encompassed within the scope of the present disclosure.