DE102020116296A1

DE102020116296A1 - METHOD AND DEVICE FOR DETERMINING THE SPATIAL POSITIONS OF A HAND AND THE FINGERS OF THE HAND IN AN AREA OF SPACE

Info

Publication number: DE102020116296A1
Application number: DE102020116296.0A
Authority: DE
Inventors: Julian Eichhorn; Francesco MANTOVANI
Original assignee: Bayerische Motoren Werke AG
Current assignee: Bayerische Motoren Werke AG
Priority date: 2020-06-19
Filing date: 2020-06-19
Publication date: 2021-12-23

Abstract

A method for determining spatial positions of a hand and the fingers of the hand in a spatial region comprises the steps of: a) acquiring pixel data of a plurality of pixels of a two-dimensional image of the spatial region, which includes an image of a hand, and respective depth information of the spatial region assigned to the individual pixels of the two-dimensional image; b) recognizing the hand on the basis of the pixel data of the plurality of pixels of the two-dimensional image and, based on the recognized hand, defining an image section of the two-dimensional image that contains an image of the hand as an area of interest in the two-dimensional image; c) determining a distance of the area of interest on the basis of at least one item of depth information assigned to a pixel of the area of interest; d) removing pixels of the two-dimensional image whose assigned depth information satisfies a predetermined condition with respect to the determined distance of the area of interest, thereby obtaining a reduced two-dimensional image; and e) determining the spatial positions of the hand and the fingers of the hand based on the pixel data of the pixels of the reduced two-dimensional image and the depth information assigned to the pixels of the reduced two-dimensional image.

Description

The present invention relates to a method and a device for determining spatial positions of a hand and the fingers of the hand in a spatial region.

Technical devices available on the market are increasingly being equipped with a gesture recognition function, by means of which a contact-free input by a user in the form of a gesture can be recognized. When a gesture is recognized, the device carries out the switching operation or function assigned to that gesture. In the case of a gesture performed with a user's hand, the hand must be located in a predetermined spatial region monitored by the device so that the gesture can be recognized, for example using machine vision.

In a common approach to detecting a hand by means of machine vision, a region of interest (ROI), or area of interest, is manually restricted in a recorded image. The analysis of the pixel data of the plurality of pixels of the image for the purpose of detecting the hand is to concentrate on this area, and the user is instructed to keep the hand within it.

Another known approach to detecting the hand by means of machine vision is based on the use of depth cameras to capture all or part of the human skeleton or human body and then to recognize the hand as part of the skeleton or body. This upstream detection of the human skeleton or body is necessary because the corresponding hand detection algorithm requires a starting point for the search for the hand.

In the segmentation process, in which the image region containing the image of the hand is extracted from the entire image or separated from the background, a mask is usually generated based on skin color. The mask filters a range of values in the HSV color space, so that the hand can be recognized and extracted by means of the color difference between the hand and the background. However, this process requires a high level of computing power and is prone to errors if the image contains unwanted background noise.

It is therefore an object of the present invention to provide a method and a device with which the computing power required to determine spatial positions of a hand and its fingers in a spatial region can be reduced, and which are less prone to errors if an image of the spatial region contains background noise.

This object is achieved by a method according to claim 1 and a device according to claim 9. Further preferred embodiments, developments or variants are, in particular, the subject of the dependent claims. The subject matter of the claims is expressly made part of the disclosure of the description.

A method for determining spatial positions of a hand and the fingers of the hand in a spatial region according to one embodiment comprises the steps of:

a) acquiring pixel data of a plurality of pixels of a two-dimensional image of the spatial region, which includes an image of a hand, and respective depth information of the spatial region assigned to the individual pixels of the two-dimensional image,
b) recognizing the hand on the basis of the pixel data of the plurality of pixels of the two-dimensional image and, based on the recognized hand, defining an image section of the two-dimensional image that contains an image of the hand as an area of interest in the two-dimensional image,
c) determining a distance of the area of interest on the basis of at least one item of depth information assigned to a pixel of the area of interest,
d) removing pixels of the two-dimensional image whose assigned depth information satisfies a predetermined condition with respect to the determined distance of the area of interest, thereby obtaining a reduced two-dimensional image, and
e) determining the spatial positions of the hand and the fingers of the hand based on the pixel data of the pixels of the reduced two-dimensional image and the depth information assigned to the pixels of the reduced two-dimensional image.

According to the invention, the hand is recognized on the basis of the pixel data of a plurality of pixels of a two-dimensional image of the spatial region, in particular in the entire field of view of the camera recording the image, and, based on the recognized hand, an image section of the two-dimensional image that contains an image of the hand is automatically defined as an area of interest in the two-dimensional image. The image section to be processed in further steps, namely the area of interest, can therefore be adapted dynamically to the current position of the hand when individual images are recorded sequentially, and does not have to be restricted manually.

Furthermore, unlike the known methods, the method according to the invention does not need to recognize other parts of the human body or skeleton, or to compare the pixel data, in particular the color values, of the plurality of pixels with a mask generated based on skin color. This alone considerably reduces the computing power that the corresponding data processing device requires to determine the spatial positions of the hand and the fingers of the hand. It also avoids hand recognition errors that can arise from background noise, for example when the color values of the plurality of pixels are compared with the mask generated based on skin color.

According to the invention, the distance of the area of interest is also determined, and pixels of the two-dimensional image are then removed based on a predetermined condition with respect to the determined distance of the area of interest, for example the condition that the absolute value of the difference between the depth value of a respective pixel and the determined distance of the area of interest is smaller than a predetermined value, for example 5 cm, whereby a reduced two-dimensional image is obtained. This reduction in the number of pixels to be processed further reduces the computing power that the corresponding data processing device requires to determine the spatial positions of the hand and the fingers of the hand.
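The distance determination and the depth-based pixel removal can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of the median as the distance of the area of interest, the convention that a depth of 0.0 means "no reading", and the reading of the 5 cm condition as a keep-band (pixels within the tolerance are retained, all others are removed) are assumptions made for this example.

```python
from statistics import median


def roi_distance(depth, roi):
    """Return the median depth (metres) of the valid pixels inside the ROI.

    depth: 2D list of per-pixel distances in metres (0.0 = no reading, assumed).
    roi:   (row_start, row_end, col_start, col_end), end indices exclusive.
    """
    r0, r1, c0, c1 = roi
    values = [
        depth[r][c]
        for r in range(r0, r1)
        for c in range(c0, c1)
        if depth[r][c] > 0.0
    ]
    return median(values)


def remove_by_depth(image, depth, distance, tolerance=0.05):
    """Replace with None every pixel whose depth is not within `tolerance` of `distance`."""
    reduced = []
    for image_row, depth_row in zip(image, depth):
        reduced.append([
            pixel if abs(d - distance) < tolerance else None
            for pixel, d in zip(image_row, depth_row)
        ])
    return reduced


# Hypothetical 4x4 frame: a hand at roughly 0.5 m in front of a wall at 2 m.
depth = [
    [2.0, 2.00, 2.00, 2.00],
    [2.0, 0.50, 0.52, 2.00],
    [2.0, 0.51, 0.53, 0.54],
    [2.0, 2.00, 2.00, 2.00],
]
image = [[row * 4 + col for col in range(4)] for row in range(4)]

hand_distance = roi_distance(depth, (1, 3, 1, 3))
reduced = remove_by_depth(image, depth, hand_distance)
```

Only the pixels close to the hand survive in `reduced`, so every later step works on a small fraction of the original frame.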

According to a preferred embodiment, the hand is recognized in step b) based on models of a hand obtained by means of machine learning.

In this case, the hand can be recognized in step b) using a neural network.

The pixel data of the plurality of pixels of the two-dimensional image of the spatial region can contain color information. This color information and the depth information of the spatial region assigned to the pixels of the two-dimensional image can be acquired using a depth camera, a stereo camera, or a combination of a color camera and a time-of-flight camera.

The method can further comprise a step d1), which is carried out after step d) and before step e), and in which the reduced two-dimensional image is subjected to a morphological transformation in order to replace the pixel data of defective pixels with pixel data adapted to the pixel data of neighboring pixels and thereby obtain a transformed two-dimensional image.
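The text does not name a specific morphological transformation. As one plausible instance, the sketch below applies a binary closing (dilation followed by erosion) with a 3x3 structuring element to a foreground mask, which fills single-pixel holes from the values of their neighbors. The choice of closing, the 3x3 window, the mask representation and all function names are assumptions for illustration.

```python
def _window(mask, r, c):
    """Yield the values in the 3x3 window around (r, c), skipping cells outside the image."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield mask[rr][cc]


def dilate(mask):
    """Set a pixel to 1 if any pixel in its 3x3 window is 1."""
    return [
        [1 if any(_window(mask, r, c)) else 0 for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def erode(mask):
    """Set a pixel to 1 only if every in-image pixel in its 3x3 window is 1."""
    return [
        [1 if all(_window(mask, r, c)) else 0 for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def close_holes(mask):
    """Binary closing: dilation followed by erosion."""
    return erode(dilate(mask))


# Hypothetical 7x7 mask: a 3x3 blob with one defective (missing) centre pixel.
mask = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        mask[r][c] = 1
mask[3][3] = 0

repaired = close_holes(mask)
```

After closing, the missing centre pixel is restored while the outline of the blob is unchanged. Removing isolated specks instead would call for the opposite order (erosion, then dilation).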

The method can furthermore comprise a step d2), which is carried out after step d1) and before step e), and in which the transformed two-dimensional image is converted into a binary two-dimensional image or into a two-dimensional grayscale image.
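A conversion of this kind might look like the sketch below. It assumes that pixels are (R, G, B) tuples with values from 0 to 255, that removed pixels are represented as None, and that the binary image simply marks every surviving pixel as foreground. The grayscale weights are the common ITU-R BT.601 luma coefficients, chosen here for illustration; the text does not specify a conversion formula.

```python
def to_grayscale(image):
    """Map (R, G, B) pixels to 0-255 luma values; removed pixels (None) become 0."""
    return [
        [
            0 if px is None
            else round(0.299 * px[0] + 0.587 * px[1] + 0.114 * px[2])
            for px in row
        ]
        for row in image
    ]


def to_binary(image):
    """Map every surviving pixel to 1 and every removed pixel (None) to 0."""
    return [[0 if px is None else 1 for px in row] for row in image]


# Hypothetical 2x2 reduced image with one removed pixel.
reduced = [
    [None, (255, 255, 255)],
    [(255, 0, 0), (0, 0, 0)],
]
gray = to_grayscale(reduced)
binary = to_binary(reduced)
```

Note that in the grayscale variant a genuinely black pixel and a removed pixel both map to 0, whereas the binary variant keeps them apart.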

The method preferably further comprises a step d3), which is carried out after step d2) and before step e), and in which, on the basis of the binary two-dimensional image or the two-dimensional grayscale image, a contour of the hand in that image is determined, and a center point of the hand is determined on the basis of an image moment of the image section of the binary two-dimensional image or the two-dimensional grayscale image defined by the contour of the hand.
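A center point derived from image moments is conventionally the centroid (M10/M00, M01/M00), where M_pq is the sum of x^p * y^q * I(x, y) over all pixels. The sketch below computes it for a filled binary mask; it skips the contour tracing described above and assumes the mask already covers the region enclosed by the hand contour. The function names are illustrative.

```python
def raw_moment(mask, p, q):
    """Raw image moment M_pq, with x as the column index and y as the row index."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
    )


def centroid(mask):
    """Return the centroid (x, y) of the mask as (M10 / M00, M01 / M00)."""
    m00 = raw_moment(mask, 0, 0)
    if m00 == 0:
        raise ValueError("an empty mask has no centroid")
    return raw_moment(mask, 1, 0) / m00, raw_moment(mask, 0, 1) / m00


# Hypothetical mask: a 2x3 block of foreground pixels.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
center_x, center_y = centroid(mask)
```

For a grayscale image the same functions weight each pixel by its intensity instead of counting it as 0 or 1.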

The method can further comprise a step d4), which is carried out after step d3) and before step e), and in which, based on curvatures of the contour of the hand, the regions of the binary two-dimensional image or the two-dimensional grayscale image that correspond to individual fingers of the hand are determined, and an orientation of the hand is determined based on the center point of the hand and the image moment.
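The text does not state how the orientation follows from the center point and the image moment. A standard way to obtain an orientation from moments is the principal-axis angle θ = ½·atan2(2·μ11, μ20 − μ02), computed from the second-order central moments about the centroid. The sketch below uses that formula as an assumed stand-in, and the curvature-based finger detection is not covered here.

```python
import math


def orientation(mask):
    """Return the principal-axis angle of the mask in radians.

    The angle is measured from the x axis (columns) towards the y axis (rows),
    so with image rows growing downwards it appears clockwise on screen.
    """
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        raise ValueError("an empty mask has no orientation")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments about the centroid (cx, cy).
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


# Hypothetical elongated shape lying along the image diagonal.
diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
angle = orientation(diagonal)
```

The principal axis is only defined up to 180 degrees, so a further cue, such as the position of the fingers relative to the center point, would be needed to tell which end of the axis the fingers point to.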

A device for determining spatial positions of a hand and the fingers of the hand in a spatial region according to one embodiment comprises a recording device, which is set up to acquire pixel data of a plurality of pixels of a two-dimensional image of the spatial region, which includes an image of a hand, and respective depth information of the spatial region assigned to the individual pixels of the two-dimensional image, and a data processing device, which is set up to carry out the following steps:

a) to recognize the hand on the basis of the pixel data of the plurality of pixels of the two-dimensional image and, based on the recognized hand, to define an image section of the two-dimensional image that contains an image of the hand as an area of interest in the two-dimensional image,
b) to determine a distance of the area of interest on the basis of at least one item of depth information assigned to a pixel of the area of interest,
c) to remove pixels of the two-dimensional image whose assigned depth information satisfies a predetermined condition with respect to the determined distance of the area of interest, thereby obtaining a reduced two-dimensional image, and
e) to determine the spatial positions of the hand and the fingers of the hand based on the pixel data of the pixels of the reduced two-dimensional image and the depth information assigned to the pixels of the reduced two-dimensional image.

The device preferably further comprises a storage device in which a plurality of models of a hand obtained by means of machine learning are stored, the data processing device being set up to recognize the hand in step a) using a neural network and the plurality of models of a hand stored in the storage device.

Further advantages, features and possible applications of the present invention emerge from the following detailed description of at least one preferred embodiment and/or from the figures. Identical components of the embodiments are essentially identified by the same reference signs, unless otherwise described or unless the context indicates otherwise.

The figures show, partly schematically:

FIG. 1 a device for determining spatial positions of a hand and the fingers of the hand in a spatial region according to one embodiment,
FIG. 2 a flowchart illustrating a method for determining spatial positions of a hand and the fingers of the hand in a spatial region according to one embodiment, and
FIGS. 3A and 3B illustrations explaining the method illustrated in FIG. 2.

FIG. 1 illustrates a device 100 for determining spatial positions of a hand and the fingers of the hand in a spatial region according to one embodiment. The device 100 has a recording device 30, which is set up to capture a two-dimensional image in a spatial region, or viewing angle, of the recording device 30 that is predetermined by the configuration of the recording device 30. To capture the two-dimensional image, the recording device 30 can, for example as illustrated in FIG. 1, have a color camera 31, by means of which pixel data of the individual pixels of the two-dimensional image are recorded as color values such as RGB values and, if applicable, stored in a storage device of the recording device 30.

The recording device 30 is further set up to acquire respective depth information for each pixel of the two-dimensional image and to assign the depth information acquired at the time the two-dimensional image was recorded to the respective pixels of the corresponding two-dimensional image. To acquire the depth information, the recording device 30 can, for example as illustrated in FIG. 1, have an infrared transmitter 32 and two infrared sensors 33, 34. In operation, the infrared transmitter 32 emits a coded infrared signal, which is reflected by objects located in the spatial region, which may include, for example, the hand 10 shown in FIG. 1, and is received by the two infrared sensors 33, 34. Based on the transit time of the signal, the recording device 30 then determines the distances of the individual points of the objects from the recording device 30 as the depth information and assigns them to the corresponding pixels of the two-dimensional image.
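The relation between transit time and distance can be illustrated with a short sketch. It assumes that the measured time is the round trip from transmitter to object and back to the sensor, with transmitter and sensor at practically the same position, so that the one-way distance is c·t/2. The function name is illustrative, and the actual signal decoding in the recording device is not described in the text.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds):
    """Return the one-way distance in metres for a measured round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must not be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# A reflection arriving 4 ns after emission corresponds to roughly 0.6 m.
hand_distance_m = distance_from_round_trip(4e-9)
```

The example also shows why such sensors need very fine timing resolution: a depth difference of 5 cm changes the round-trip time by only about a third of a nanosecond.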

A recording device 30 designed in this way, with the color camera 31, the infrared transmitter 32 and the infrared sensors 33, 34, is also referred to as a depth camera. In other embodiments, not shown, the recording device 30 can, instead of the infrared transmitter 32 and the infrared sensors 33, 34, also comprise other suitable devices for detecting the distance of the individual points of the objects, such as a time-of-flight camera. In a further embodiment, not shown, the recording device 30 can also have a stereo camera with which two two-dimensional images are captured simultaneously from different positions, the distances of the individual points of the objects from the recording device 30 being calculated based on these two two-dimensional images.
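For the stereo camera variant, the text does not say how the distances are calculated from the two images. For a rectified stereo pair, the standard pinhole relation is Z = f·B/d, with focal length f in pixels, baseline B between the two camera centers, and disparity d in pixels. The sketch below applies that relation; the parameter values are hypothetical and finding the matching pixels in the two images is not covered.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return the depth in metres of a point seen with the given disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, 6 cm baseline, 84 px disparity.
hand_depth_m = depth_from_disparity(84.0, 700.0, 0.06)
```

Because depth is inversely proportional to disparity, nearby objects such as a hand in front of the camera produce large disparities and are measured more precisely than the distant background.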

The pixel data of the two-dimensional image of the spatial region, which includes the image of the hand 10, are forwarded together with the assigned depth information to a data processing device 40, such as a computer, which determines the spatial positions of the hand 10 and the fingers of the hand 10 based on the acquired pixel data of the two-dimensional image and the respective depth information assigned to each pixel of the two-dimensional image. From the determined spatial positions of the hand 10 and the fingers of the hand 10, the data processing device 40 can, as illustrated in FIG. 1, create a spatial representation 50 of the hand 10 and output it on a display.

The individual steps for determining the spatial positions of the hand 10 and the fingers of the hand 10 are explained below with reference to the flow chart of FIG. 2 and to FIGS. 3A and 3B.

With reference to FIG. 2, first, in step S1, a two-dimensional image 20 of a spatial area, illustrated in FIG. 3A and comprising an image of a hand 10, and depth information assigned to the individual pixels of the two-dimensional image are captured by means of the recording device 30 illustrated in FIG. 1. The pixel data of the individual pixels of the two-dimensional image 20 can be captured, for example, with the color camera 31 illustrated in FIG. 1, and the capture can include color values, such as RGB values, assigned to each pixel of the two-dimensional image. The depth information can be captured by one of the devices described with reference to FIG. 1 for determining the distance of the individual points of the objects located in the spatial area. The captured two-dimensional image 20 is then supplied to the data processing device 40 together with the respective depth information assigned to each pixel of the two-dimensional image.

In step S2, the data processing device 40 performs image processing on the two-dimensional image 20 in order to recognize a hand 10 in the two-dimensional image. To recognize the hand 10, the data processing device 40 uses a plurality of models of a hand which were obtained in advance by machine learning and are stored in the storage device of the data processing device 40. The image processing for recognizing the hand 10 can be carried out using a neural network. Based on the hand 10 recognized in the two-dimensional image, the data processing device 40 then defines an image section of the two-dimensional image 20 that contains an image of the hand 10 as an area of interest 21, or region of interest (ROI) 21, in the two-dimensional image.

Then, in step S3, the data processing device 40 determines a distance of the area of interest 21 from the recording device 30 on the basis of at least one item of depth information assigned to a pixel of the area of interest 21. For this purpose, a center point 22 of the area of interest 21 can be determined, and the distance of the area of interest 21 can be determined based on, or taken to be, the depth information assigned to the center point 22.
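Step S3 can be illustrated with a short sketch. It assumes that the area of interest is an axis-aligned rectangle given as (x, y, width, height) in pixel coordinates and that the depth information is available as a row-major grid of distances in meters; both representations and the function name are assumptions made for this example.

```python
def roi_distance(depth, roi):
    """Return the center pixel of the ROI and the depth assigned to it.

    depth -- list of rows, depth[row][col] in meters (assumed layout)
    roi   -- (x, y, width, height) of the area of interest in pixels
    """
    x, y, width, height = roi
    center_col = x + width // 2
    center_row = y + height // 2
    return (center_col, center_row), depth[center_row][center_col]


# Usage with a hypothetical 4x4 depth map: the hand is at about 0.5 m.
depth_map = [
    [1.2, 1.2, 1.2, 1.2],
    [1.2, 0.5, 0.5, 1.2],
    [1.2, 0.5, 0.5, 1.2],
    [1.2, 1.2, 1.2, 1.2],
]
center, distance = roi_distance(depth_map, (1, 1, 2, 2))
```

Reading a single center pixel is cheap but sensitive to a missing or noisy depth value at that pixel; averaging several pixels of the area of interest would be one possible alternative covered by the wording "at least one" item of depth information.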

Based on the determined distance of the area of interest 21, the data processing device 40 removes, in step S4, pixels of the two-dimensional image whose assigned depth information fulfills a predetermined condition with respect to the determined distance of the area of interest 21, whereby a reduced two-dimensional image is obtained. The predetermined condition can be, for example, the condition that the absolute value of the difference between the value of the depth information of the respective pixel and the determined distance of the area of interest is smaller than a predetermined value, for example 5 cm.
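The depth-based reduction of step S4 can be sketched as follows. The removal condition is passed in as a function, so that any predetermined condition can be plugged in. The example predicate outside_band, which removes pixels lying 5 cm or more away from the distance of the area of interest so that the hand itself is retained, is an assumed reading chosen for illustration; the names reduce_image, outside_band and tolerance_m are likewise hypothetical.

```python
def reduce_image(pixels, depth, roi_dist, should_remove):
    """Return a copy of the image with removed pixels set to None.

    pixels        -- list of rows of pixel data (for example RGB tuples)
    depth         -- list of rows of depth values in meters, same shape
    roi_dist      -- determined distance of the area of interest in meters
    should_remove -- predicate(depth_value, roi_dist) -> bool
    """
    return [
        [None if should_remove(d, roi_dist) else p
         for p, d in zip(pixel_row, depth_row)]
        for pixel_row, depth_row in zip(pixels, depth)
    ]


def outside_band(depth_value, roi_dist, tolerance_m=0.05):
    # Assumed example condition: the pixel is not within 5 cm of the hand.
    return abs(depth_value - roi_dist) >= tolerance_m


# Usage: the third pixel belongs to the background at 1.2 m.
image = [["a", "b", "c"]]
depth_row = [[0.50, 0.52, 1.20]]
reduced = reduce_image(image, depth_row, 0.50, outside_band)
```

Keeping the condition separate from the loop makes it easy to experiment with other thresholds or with one-sided conditions, such as removing only pixels that lie behind the hand.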

In step S5, the data processing device 40 subjects the reduced two-dimensional image to a morphological transformation in which defective pixels 22 are detected and the pixel data of the defective pixels are replaced by pixel data adapted to the pixel data of neighboring pixels, in order to obtain a transformed two-dimensional image.
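The description does not name a specific morphological operation. One common choice for filling small holes is a morphological closing (a dilation followed by an erosion), sketched below on a binary mask with an assumed 3x3 square structuring element. The mask representation (1 for a retained pixel, 0 for a removed or defective one) and the function names are assumptions made for this example.

```python
def _window(mask, row, col):
    """Values in the 3x3 neighborhood of (row, col), clipped at the border."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[r][c]
        for r in range(max(0, row - 1), min(rows, row + 2))
        for c in range(max(0, col - 1), min(cols, col + 2))
    ]


def dilate(mask):
    # A pixel becomes 1 if any pixel in its neighborhood is 1.
    return [[1 if any(_window(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def erode(mask):
    # A pixel stays 1 only if every pixel in its neighborhood is 1.
    return [[1 if all(_window(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def closing(mask):
    """Morphological closing: fills holes smaller than the 3x3 element."""
    return erode(dilate(mask))


# Usage: a single defective pixel in the middle of the hand region.
hand_mask = [[1] * 5 for _ in range(5)]
hand_mask[2][2] = 0
repaired = closing(hand_mask)
```

In this sketch the defective pixel takes the value of its neighbors, which corresponds to replacing its pixel data by data adapted to the neighboring pixels in the binary case.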

In step S6, the data processing device 40 converts the transformed two-dimensional image into a binary two-dimensional image or into a two-dimensional grayscale image.

In step S7, the data processing device 40 determines a contour 11 of the hand 10 in the binary two-dimensional image or in the two-dimensional grayscale image by means of image processing. Furthermore, the data processing device 40 determines an image moment of the image section of the binary two-dimensional image or of the two-dimensional grayscale image defined by the contour 11 of the hand 10, and determines a center point 12 of the hand 10 on the basis of the determined image moment.
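A center point can be obtained from image moments as the centroid, that is, the ratio of the first-order raw moments to the zeroth-order moment. The sketch below assumes that the image section defined by the contour is available as a binary or grayscale grid in which pixels outside the hand are 0; the function name is illustrative.

```python
def centroid_from_moments(image):
    """Centroid (x, y) computed from the raw moments m00, m10 and m01.

    image -- list of rows of non-negative intensities (0 outside the hand)
    """
    m00 = m10 = m01 = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            m00 += value        # total mass (area for a binary image)
            m10 += x * value    # first-order moment in x
            m01 += y * value    # first-order moment in y
    if m00 == 0:
        return None             # empty image: no centroid defined
    return m10 / m00, m01 / m00


# Usage: a 2x2 block of hand pixels inside a 4x4 binary image.
binary = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
center_point = centroid_from_moments(binary)
```

For a grayscale image the same formula yields an intensity-weighted center, so brighter pixels pull the center point toward themselves.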

Subsequently, in step S8, an image analysis carried out by the data processing device 40 determines the curvatures of the contour 11 of the hand 10 and, based on these curvatures, a convex hull of the hand 10. From the convex hull, the fingertips 13 are determined, and the sections of the convex hull that deviate from the convex shape are determined as the sections 14 connecting the individual fingers; the respective areas of the image that correspond to the individual fingers of the hand 10 are thereby determined. Furthermore, by means of an image analysis carried out by the data processing device 40 using a principal component analysis, the principal axes 14-1 and 14-2 are determined based on the center point 12 of the hand and the image moment. Based on the determined principal axes 14-1, 14-2, a rotation or orientation of the hand 10 is then determined.
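Two building blocks of step S8 can be sketched in plain Python: a convex hull of the contour points and the direction of the first principal axis. The hull uses Andrew's monotone chain algorithm, which is a swapped-in standard technique and not necessarily the one used in the embodiment. The axis angle uses the closed-form expression from second-order central moments, which is equivalent to a principal component analysis of the pixel coordinates. The detection of fingertips and of the sections between the fingers is not covered here, and all function names are assumptions.

```python
import math


def convex_hull(points):
    """Convex hull of 2D points in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]       # last point is the start of the other half

    return half_hull(pts) + half_hull(reversed(pts))


def principal_axis_angle(points):
    """Angle in radians of the first principal axis relative to the x axis."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mu20 = sum((x - mean_x) ** 2 for x, _ in points) / n
    mu02 = sum((y - mean_y) ** 2 for _, y in points) / n
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


# Usage with hypothetical contour points: two lie inside the square.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
angle = principal_axis_angle([(0, 0), (1, 1), (2, 2), (3, 3)])
```

Contour points that are not part of the hull lie in concave regions of the outline, which is where the sections connecting the individual fingers would be expected.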

Finally, in step S9, the data processing device 40 determines the spatial positions of the hand 10 and the fingers of the hand 10 based on the determined areas of the image corresponding to the individual fingers of the hand 10, the determined orientation of the hand 10, and the correspondingly assigned depth information. Based on the determined positions of the hand 10 and the fingers of the hand 10, a spatial representation 50 of the hand 10 is then created and output on a display.
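The description does not state how a pixel position and its depth value are combined into a spatial position. A common approach is to deproject the pixel with a pinhole camera model, sketched below. The intrinsic parameters fx, fy, cx and cy (focal lengths and principal point in pixels) are assumptions that would normally come from a camera calibration, and the function name is illustrative.

```python
def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with depth in meters to camera coordinates.

    Assumes an undistorted pinhole camera whose optical axis is the z axis.
    Returns (x, y, z) in meters.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Usage with hypothetical intrinsics for a 640x480 image: a fingertip
# detected at pixel (920, 240) with an assigned depth of 2.0 m.
fingertip_3d = deproject(920, 240, 2.0, 600.0, 600.0, 320.0, 240.0)
```

Applying the same conversion to the center point of the hand and to each fingertip yields a set of spatial points from which a representation of the hand could be assembled.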

Claims

Method for determining spatial positions of a hand (10) and the fingers of the hand (10) in a spatial area, comprising the steps: a) capturing pixel data from a plurality of pixels of a two-dimensional image (20) of the spatial area, which comprises an image of a hand (10), and respective depth information of the spatial area assigned to the individual pixels of the two-dimensional image (20), b) recognizing the hand (10) on the basis of the pixel data of the plurality of pixels of the two-dimensional image (20) and defining an image section of the two-dimensional image (20), which contains an image of the hand (10), as an area of interest (21) in the two-dimensional image (20), based on the recognized hand (10), c) determining a distance of the area of interest (21) on the basis of at least one item of depth information assigned to a pixel of the area of interest (21), d) removing pixels of the two-dimensional image (20) whose assigned depth information fulfills a predetermined condition with respect to the determined distance of the area of interest (21), and thereby obtaining a reduced two-dimensional image, and e) determining the spatial positions of the hand (10) and the fingers of the hand (10) based on the pixel data of the pixels of the reduced two-dimensional image and the depth information assigned to the pixels of the reduced two-dimensional image.

Method according to claim 1, in which the hand (10) is recognized in step b) based on models of a hand obtained by means of machine learning.

Method according to claim 1 or 2, in which the hand (10) is recognized in step b) using a neural network.

Method according to one of claims 1 to 3, in which the pixel data of the plurality of pixels of the two-dimensional image (20) of the spatial area contain color information, and the color information of the pixel data of the plurality of pixels of the two-dimensional image (20) and the depth information of the spatial area assigned to the pixels of the two-dimensional image (20) are captured using a depth camera, a stereo camera or a combination of a color camera and a time-of-flight camera.

Method according to one of claims 1 to 4, further comprising a step d1), which is carried out after step d) and before step e), and in which the reduced two-dimensional image is subjected to a morphological transformation in order to replace pixel data of defective pixels (22) with pixel data adapted to pixel data of neighboring pixels and to obtain a transformed two-dimensional image.

Method according to claim 5, further comprising a step d2), which is carried out after step d1) and before step e), and in which the transformed two-dimensional image is converted into a binary two-dimensional image or into a two-dimensional grayscale image.

Method according to claim 6, further comprising a step d3), which is carried out after step d2) and before step e), and in which a contour (11) of the hand (10) is determined in the binary two-dimensional image or in the two-dimensional grayscale image, and a center point (12) of the hand (10) is determined on the basis of an image moment of the image section of the binary two-dimensional image or of the two-dimensional grayscale image defined by the contour (11) of the hand (10).

Method according to claim 7, further comprising a step d4), which is carried out after step d3) and before step e), and in which the areas of the binary two-dimensional image or of the two-dimensional grayscale image that correspond to the individual fingers of the hand (10) are determined based on the curvatures of the contour (11) of the hand (10), and an orientation of the hand (10) is determined based on the center point (12) of the hand (10) and the image moment.

A device (100) for determining spatial positions of a hand (10) and the fingers of the hand (10) in a spatial area, comprising a recording device (30), which is set up to capture pixel data from a plurality of pixels of a two-dimensional image (20) of the spatial area, which comprises an image of a hand (10), and respective depth information of the spatial area assigned to the individual pixels of the two-dimensional image (20), and a data processing device (40), which is set up to carry out the following steps: a) recognizing the hand (10) on the basis of the pixel data of the plurality of pixels of the two-dimensional image (20) and defining an image section of the two-dimensional image (20), which contains an image of the hand (10), as an area of interest (21) in the two-dimensional image (20), based on the recognized hand (10), b) determining a distance of the area of interest (21) on the basis of at least one item of depth information assigned to a pixel of the area of interest (21), c) removing pixels of the two-dimensional image (20) whose assigned depth information fulfills a predetermined condition with respect to the determined distance of the area of interest (21), and thereby obtaining a reduced two-dimensional image, and e) determining the spatial positions of the hand (10) and the fingers of the hand (10) based on the pixel data of the pixels of the reduced two-dimensional image and the depth information assigned to the pixels of the reduced two-dimensional image.

Device (100) according to claim 9, further comprising a memory device in which a plurality of models of a hand obtained by means of machine learning are stored, the data processing device (40) being set up to a) recognize the hand (10) using a neural network and the plurality of models of a hand stored in the memory device.