
CN109086873B - Training method, identification method, device and processing device of recurrent neural network - Google Patents


Info

Publication number
CN109086873B
Authority
CN
China
Prior art keywords
action
frame
image
neural network
image sequence
Prior art date
Legal status
Active
Application number
CN201810870041.4A
Other languages
Chinese (zh)
Other versions
CN109086873A (en)
Inventor
张弛
曹宇
Current Assignee
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN201810870041.4A priority Critical patent/CN109086873B/en
Publication of CN109086873A publication Critical patent/CN109086873A/en
Application granted granted Critical
Publication of CN109086873B publication Critical patent/CN109086873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a training method, recognition method, apparatus, and processing device for a recurrent neural network, relating to the technical field of action recognition. The method includes: acquiring a training sample, where the training sample includes a multi-frame image sequence of a video and the action identifier corresponding to the video; performing feature extraction on the multi-frame image sequence to obtain image sequence features, which include the features of each frame of image; inputting the image sequence features into a recurrent neural network for action classification to obtain an action classification probability for each frame of image, where the action classes include a no-action class; calculating a loss function from the action classification probabilities according to the connectionist temporal classification (CTC) method; and training the recurrent neural network by back-propagating the loss function. Embodiments of the invention learn the transition relationships between actions better and predict actions in a time sequence more accurately, enabling finer-grained and more accurate action recognition.

Description

Training method, recognition method and device of recurrent neural network and processing equipment
Technical Field
The invention relates to the technical field of motion recognition, in particular to a training method, a recognition method, a device and a processing device of a recurrent neural network.
Background
Existing neural-network-based action recognition techniques fall into two categories: recognition based on a single frame image and recognition based on multi-frame images.
In the single-frame method, a CNN (Convolutional Neural Network) extracts features directly from a single frame of the video and classifies the action. This method trains quickly, converges easily, and classifies actions with obvious static characteristics well; however, it ignores temporal information, tends to overfit the scene (degenerating into mere scene recognition), and cannot distinguish actions such as opening and closing.
In the multi-frame method, frames are sampled from an action video to form an image sequence for training; a CNN extracts image features, mechanisms such as attention are added to learn temporal information, and RNN (Recurrent Neural Network) modules such as the LSTM (Long Short-Term Memory) module can be added to learn and integrate information along the time dimension. However, the output is still a single action classification label: the whole image sequence is assigned one action, so frame-by-frame action prediction cannot be performed and the temporal transition relationships between actions cannot be learned.
In view of these problems with the multi-frame recognition method, no effective solution has yet been proposed.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a training method, a recognition method, an apparatus, and a processing device for a recurrent neural network that can learn the temporal transition relationships between actions, so as to perform finer-grained and more accurate action recognition.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, an embodiment of the present invention provides a training method for a recurrent neural network, where the method includes: acquiring a training sample, where the training sample includes a multi-frame image sequence of a video and the action identifier corresponding to the video; extracting features from the multi-frame image sequence to obtain image sequence features, where the image sequence features include the features of each frame of image; inputting the image sequence features into a recurrent neural network for action classification to obtain an action classification probability for each frame of image, where the action classes include a no-action class; calculating a loss function from the action classification probabilities according to the connectionist temporal classification (CTC) method; and training the recurrent neural network by back-propagating the loss function.
Further, the step of extracting features from the multi-frame image sequence to obtain the image sequence features includes: inputting the multi-frame image sequence into a convolutional neural network for spatial feature extraction; inputting the extracted spatial features into a long short-term memory (LSTM) module for temporal feature extraction; and taking the extracted temporal features as the image sequence features.
Further, the step of inputting the image sequence features into a recurrent neural network for action classification to obtain an action classification probability for each frame of image includes: inputting the features of each frame of image into a Softmax layer of the recurrent neural network to obtain the action classification probability of each frame of image at each time step.
Further, the step of calculating a loss function from the action classification probabilities according to the connectionist temporal classification method includes: multiplying the action classification probabilities across the corresponding time steps to compute the classification probability of the multi-frame image sequence; computing the action identifier error rate from the sequence classification probability by dynamic programming using the connectionist temporal classification method; and taking the action identifier error rate as the loss function.
Further, the step of computing the action identifier error rate from the classification probability of the multi-frame image sequence by dynamic programming using the connectionist temporal classification method includes: based on the classification probability of the multi-frame image sequence, computing, through a forward module, a first probability of the prefix action identifiers corresponding to the k-th action identifier; based on the classification probability of the multi-frame image sequence, computing, through a backward module, a second probability of the suffix action identifiers starting from the (k+1)-th action identifier; and combining the first probability and the second probability to obtain the action identifier error rate.
In a second aspect, an embodiment of the present invention provides a method for performing action recognition using the recurrent neural network obtained in any one of the first aspect and its possible implementations, including: acquiring a video to be recognized and extracting an image sequence of the video; the image sequence includes at least two temporally ordered frames of images; inputting the image sequence into the recurrent neural network, where the recurrent neural network is trained based on the connectionist temporal classification method; classifying each frame of image through the recurrent neural network to obtain the action identifier of each frame of image; and determining the action identifier of the video to be recognized according to the action identifiers of the frames.
Further, the step of classifying each frame of the image through the recurrent neural network to obtain an action identifier of each frame of the image includes: determining, by the recurrent neural network, a classification probability for each frame of the image; and classifying each frame of image according to the maximum classification probability to obtain the action identifier of each frame of image.
Further, the step of determining the motion identifier of the video to be recognized according to the motion identifier of each frame of the image includes: if the action identifier of the image is a no-action identifier, removing the no-action identifier; if the action identifiers of at least two continuous frames of images are repeated, removing the repeated action identifiers; and determining the obtained simplified result as the action identifier of the video to be identified.
In a third aspect, an embodiment of the present invention provides an apparatus for training a recurrent neural network, where the apparatus includes: a sample acquisition module, configured to acquire a training sample, where the training sample includes a multi-frame image sequence of a video and the action identifier corresponding to the video; a feature extraction module, configured to extract features from the multi-frame image sequence to obtain image sequence features, where the image sequence features include the features of each frame of image; an action classification module, configured to input the image sequence features into a recurrent neural network for action classification to obtain an action classification probability for each frame of image, where the action classes include a no-action class; a loss function module, configured to calculate a loss function from the action classification probabilities according to the connectionist temporal classification method; and a back-propagation module, configured to train the recurrent neural network by back-propagating the loss function.
In a fourth aspect, an embodiment of the present invention provides an apparatus for performing action recognition using the recurrent neural network obtained in any one of the first aspect and its possible implementations, including: an acquisition module, configured to acquire a video to be recognized and extract an image sequence of the video, where the image sequence includes at least two temporally ordered frames of images; an input module, configured to input the image sequence into the recurrent neural network, where the recurrent neural network is trained based on the connectionist temporal classification method; a classification module, configured to classify each frame of image through the recurrent neural network to obtain the action identifier of each frame of image; and a recognition module, configured to determine the action identifier of the video to be recognized according to the action identifiers of the frames.
In a fifth aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method according to any one of the first and second aspects when executing the computer program.
In a sixth aspect, the present invention provides a computer readable medium having a program code executable by a processor, the program code causing the processor to perform the steps of the method according to any one of the first and second aspects.
Embodiments of the present invention provide a training method, a recognition method, an apparatus, and a processing device for a recurrent neural network. Through the CTC method, actions are recognized more accurately from an image sequence, no-action frames are learned and distinguished, the transition relationships between actions are learned better, and actions in a time sequence are predicted more accurately, so finer-grained and more accurate action recognition can be performed.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic diagram illustrating an electronic system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a recurrent neural network training method provided by an embodiment of the present invention;
FIG. 3 shows a schematic diagram of a backward and forward algorithm provided by an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a method for motion recognition according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating an exemplary embodiment of a recurrent neural network training apparatus;
fig. 6 is a block diagram illustrating a structure of a motion recognition apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In various applications of video structuring, the analysis of action behaviors is crucial to a machine's true understanding of a scene; in particular, human action recognition plays a core role in numerous fields such as intelligent retail and security.
The existing multi-frame image recognition method can only output a single action classification label, assigning an entire image sequence to one action; it cannot perform frame-by-frame action prediction and does not consider a no-action label, which is inconsistent with real applications. To address these problems, embodiments of the present invention provide a training method, a recognition method, an apparatus, and a processing device for a recurrent neural network, described in detail below.
The first embodiment is as follows:
first, an example electronic system 100 for implementing a motion recognition method, apparatus, and processing device of an embodiment of the present invention and a storage medium thereof will be described with reference to fig. 1.
As shown in fig. 1, an electronic system 100 includes one or more processing devices 102 and one or more memory devices 104. Optionally, the electronic system 100 shown in FIG. 1 may also include an input device 106, an output device 108, and a data acquisition device 110, which may be interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic system 100 shown in fig. 1 are exemplary only, and not limiting, and that the electronic system may have other components and structures as desired.
The processing device 102 may be a gateway, an intelligent terminal, or a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, may process data of other components in the electronic system 100, and may control other components in the electronic system 100 to perform desired functions.
The storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. On which one or more computer program instructions may be stored that may be executed by processing device 102 to implement client functionality (implemented by the processing device) and/or other desired functionality in embodiments of the present invention described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The data capture device 110 may capture a video to be identified, etc., and store the captured video in the storage device 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.
For example, the devices in the exemplary electronic system for implementing the motion recognition method, the motion recognition apparatus, the processing device and the storage medium thereof according to the embodiments of the present invention may be integrally disposed, or may be disposed in a distributed manner, such as integrally disposing the processing device 102, the storage device 104, the input device 106 and the output device 108, and disposing the data acquisition device 110 separately.
Exemplary electronic devices for implementing the training method, the recognition method, the apparatus and the processing device of the recurrent neural network according to the embodiments of the present invention may be implemented as smart terminals such as smart phones, tablet computers, and the like.
Example two:
in accordance with an embodiment of the present invention, an embodiment of an action recognition method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps shown or described may be performed in a different order.
The recurrent neural network used in the embodiment of the present invention is trained by an action recognition network training method based on a CTC (Connectionist Temporal Classification) loss function computed by dynamic programming. Referring to the flowchart of the recurrent neural network training method shown in fig. 2, the method includes the following steps:
step S202, training samples are obtained. The training sample comprises a multi-frame image sequence and an action identifier corresponding to the multi-frame image sequence. The action identifier may use an action tag to represent an action corresponding to each frame of image, and it should be noted herein that if a certain frame of image is not an action, the action tag is a no-action tag. The action identification is subjected to simplification operations including a null operation and a duplicate operation.
Step S204, performing feature extraction on the multi-frame image sequence to obtain image sequence features.
The feature extraction first extracts spatial information and then temporal information from the multi-frame image sequence, and can be performed as follows (a minimal sketch is given after the steps):
(1) Input the multi-frame image sequence into a convolutional neural network for spatial feature extraction, i.e., feature extraction of spatial information. (2) Input the extracted spatial features into a long short-term memory module for temporal feature extraction, i.e., preliminary learning of temporal information. (3) Take the extracted temporal features as the image sequence features.
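The following is a minimal PyTorch sketch of this two-stage extractor, a per-frame CNN followed by an LSTM over time. The layer sizes, module names, and dimensions are illustrative assumptions, not the architecture claimed in the patent.

```python
import torch
import torch.nn as nn

class SequenceFeatureExtractor(nn.Module):
    """Per-frame CNN for spatial features, then an LSTM over time.
    Layer sizes are illustrative assumptions only."""

    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(          # step (1): spatial feature extraction
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # global pooling -> (B*T, 64, 1, 1)
        )
        self.proj = nn.Linear(64, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, frames):             # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1)).flatten(1)  # (B*T, 64)
        x = self.proj(x).view(b, t, -1)    # back to (B, T, feat_dim)
        feats, _ = self.lstm(x)            # step (2): temporal features (B, T, hidden_dim)
        return feats                       # step (3): the image sequence features
```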
Step S206, inputting the image sequence features into the recurrent neural network for action classification and obtaining the action classification probability of each frame of image. The action classes include a no-action class.
The temporal features of each frame of image are input into a Softmax layer of the recurrent neural network to obtain the action classification probability of each frame of image. A CTC layer can be attached after the last layer of the RNN for sequence learning: for a sequence of length T, each sample outputs a softmax vector at every time step of the last RNN layer, representing the per-frame prediction probabilities; with the CTC layer attached, the action label sequence can be predicted correctly. A sketch of this classification head follows.
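This sketch continues the extractor above. Placing the no-action (blank) class at index 0 and using ten action classes are assumptions chosen to match the CTC convention used later, not values from the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class FrameClassifier(nn.Module):
    """Per-frame Softmax classification head over the temporal features.
    num_actions real action classes plus one no-action (blank) class at
    index 0 -- the index is an assumption matching the CTC sketch below."""

    def __init__(self, hidden_dim=128, num_actions=10):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_actions + 1)  # +1 for no-action

    def forward(self, feats):                  # feats: (B, T, hidden_dim)
        logits = self.fc(feats)
        return F.log_softmax(logits, dim=-1)   # (B, T, C) per-frame log-probabilities
```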
Step S208, calculating a loss function from the action classification probabilities according to the connectionist temporal classification method.
Under CTC, the probability of a frame-level label path is the product over time steps of the corresponding per-frame classification probabilities; summing the probabilities of all valid paths gives the classification probability of the labeled image sequence. The action identifier error rate is then computed by dynamic programming using the connectionist temporal classification method and taken as the loss function. For example, based on these probabilities, a forward module computes a first probability over the prefix identifiers up to the k-th action identifier, a backward module computes a second probability over the suffix identifiers from the (k+1)-th identifier onward, and combining the first and second probabilities yields the action identifier error rate.
The label error probability is computed by a forward module and a backward module. The forward module computes the sum of the probabilities of all frame-level prefixes that can correspond to the first k action labels. The backward module computes the sum of the probabilities of all frame-level suffixes, starting from the (k+1)-th label, that can correspond to the remaining labels of the video. Summing over all possibilities gives the label error rate under the current probability distribution.
Referring to the schematic diagram of the forward-backward algorithm shown in fig. 3: since the probabilities of many paths must be computed, the forward-backward algorithm is introduced to compute the probability values efficiently. For example, a video of 50 image frames is labeled ab (the ground truth), where a and b are two different action labels. The dynamic program enumerates all possible frame-level paths and computes the probability that a path is the correct label (i.e., ab) after simplification. The forward part covers, up to time t, the paths whose simplified labels match a prefix of the ground truth; the backward part covers, from time t onward, the paths whose simplified labels match the remaining suffix. Both are solved by dynamic programming.
For example, if the ground truth is a, a sequence of frames all labeled a (aaa...a) simplifies to a and matches the truth, whereas a sequence ending in b (aaa...ab) simplifies to ab and does not. Dynamic programming finds all correct cases and sums their probabilities, and the training objective is to maximize this sum. A didactic sketch of the forward recursion follows.
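As an illustration of the dynamic program, this NumPy sketch computes the CTC forward (alpha) recursion, summing the probability of every frame-level path that simplifies to the ground-truth label sequence. It is didactic only; real implementations work in log space for numerical stability.

```python
import numpy as np

def ctc_forward_prob(probs, label, blank=0):
    """CTC forward (alpha) dynamic program.

    probs: (T, C) array of per-frame class probabilities (rows sum to 1).
    label: ground-truth action sequence, e.g. [1, 2] for "ab".
    Returns P(label | probs): the summed probability of every frame-level
    path that collapses to `label` after merging repeats and removing blanks.
    """
    ext = [blank]                          # extended label with blanks interleaved,
    for c in label:                        # e.g. [_, a, _, b, _]
        ext += [c, blank]
    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]          # a path starts with a blank...
    alpha[0, 1] = probs[0, ext[1]]         # ...or with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]            # stay on the same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]   # advance by one symbol
            # skip the in-between blank, allowed only between different labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]  # end on last label or blank
```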
Step S210, training the recurrent neural network by back-propagating the loss function.
The recurrent neural network is trained by back-propagating the loss function; an existing back-propagation training scheme can be adopted, so the details are not repeated here. A hypothetical training step is sketched below.
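This hypothetical training step wires the sketches above to PyTorch's built-in `nn.CTCLoss`. The module names, shapes, and hyperparameters are assumptions for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

# Assumes the SequenceFeatureExtractor and FrameClassifier sketches above.
extractor, classifier = SequenceFeatureExtractor(), FrameClassifier()
params = list(extractor.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)      # learning rate is illustrative
ctc_loss = nn.CTCLoss(blank=0)                     # blank = no-action class

def train_step(frames, targets, target_lengths):
    # frames: (B, T, 3, H, W); targets: concatenated action ids, no blanks
    log_probs = classifier(extractor(frames))      # (B, T, C)
    log_probs = log_probs.permute(1, 0, 2)         # nn.CTCLoss expects (T, B, C)
    input_lengths = torch.full((frames.size(0),), frames.size(1), dtype=torch.long)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()                                # back-propagate the CTC loss
    optimizer.step()
    return loss.item()
```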
According to this training method for the action recognition network, the CTC method recognizes actions more accurately from an image sequence, learns to distinguish no-action frames, learns the transition relationships between actions better, and predicts actions in a time sequence more accurately, enabling finer-grained and more accurate action recognition.
Fig. 4 is a flowchart of an action recognition method according to an embodiment of the present invention, which performs recognition using the recurrent neural network trained by the above method. As shown in fig. 4, the method includes the following steps:
step S402, acquiring a video to be identified, and extracting an image sequence of the video to be identified. The image sequence comprises at least two frames of temporally ordered images. In this embodiment, the action of the video to be recognized is recognized based on a multi-frame image recognition method, at least two frames of images ordered in time need to be extracted, and the extracted images are used as an image sequence for recognition and detection. All frames in the video can be extracted to form an image sequence, or a part of frames can be extracted to form an image sequence according to actual needs, for example, part of frames are extracted uniformly at fixed time intervals.
Step S404, inputting the image sequence into the recurrent neural network.
The recurrent neural network is trained based on the connectionist temporal classification method. During training, spatial features and temporal features are extracted from the training samples in turn and action classification is performed to obtain the action classification probability of each frame of image, and the action label error rate is computed with the CTC method by dynamic programming. The action label error rate is taken as the loss function, and the recurrent neural network is trained by back-propagating this loss until a preset training-end condition is met. A training sample includes a multi-frame image sequence and the corresponding action identifiers, and the action classes include a no-action class indicating that a frame contains no action. This CTC-loss-based training method learns the transition relationships between actions better and predicts actions in a time sequence more accurately.
Step S406, classifying each frame of image through a recurrent neural network to obtain the action identifier of each frame of image.
The classification probability of each frame of image is determined by the recurrent neural network; each frame is classified according to the maximum classification probability, and the action identifier corresponding to the class with the maximum probability is taken as the action identifier of that frame.
Step S408, determining the action identifier of the video to be recognized according to the action identifier of each frame of image.
Because the frame-level action identifiers obtained by classification may include no-action frames and repeated frames, the identifiers of the image sequence need to be simplified by removing nulls and removing duplicates. If the action identifier of an image is the no-action identifier, it is removed; if the action identifiers of at least two consecutive frames repeat, the repeated identifiers are removed, i.e., adjacent repeated identifiers are merged and only one is kept. The resulting simplified sequence is taken as the action identifier of the video to be recognized and represents the actions the video contains. A sketch of this decoding follows.
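A minimal sketch of this decoding and simplification step, taking the per-frame log-probabilities from the classifier sketched earlier: arg-max class per frame, merge consecutive repeats, drop no-action labels. The blank index 0 is an assumption matching the earlier sketches.

```python
def decode_video_actions(log_probs, blank=0):
    """Greedy frame-wise decoding plus simplification.
    log_probs: (T, C) tensor of per-frame class log-probabilities."""
    per_frame = log_probs.argmax(dim=-1).tolist()  # max-probability class per frame
    video_actions, prev = [], None
    for c in per_frame:
        if c != prev and c != blank:               # new action: not a repeat, not blank
            video_actions.append(c)
        prev = c
    return video_actions                           # action identifiers of the video
```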
In the action recognition method provided by the embodiment of the present invention, the image sequence is classified by a recurrent neural network trained with the connectionist temporal classification method to obtain the action identifier of each frame of image, and the action identifier of the video to be recognized is determined from the frame identifiers. The transition relationships between actions can thus be learned better, sequence actions can be predicted more accurately, and finer-grained and more accurate action recognition can be performed on the image sequence.
Example three:
Corresponding to the training method of the recurrent neural network provided in the second embodiment, an embodiment of the present invention provides a training apparatus for the recurrent neural network. Referring to the structural block diagram of the training apparatus shown in fig. 5, the apparatus includes:
a sample obtaining module 502, configured to obtain a training sample, where the training sample includes a multi-frame image sequence of a video and an action identifier corresponding to the video;
the feature extraction module 504 is configured to perform feature extraction on a multi-frame image sequence to obtain image sequence features, where the image sequence features include features of each frame of image;
the action classification module 506 is used for inputting the image sequence characteristics into a recurrent neural network to carry out action classification, and obtaining action classification probability of each frame of image; wherein the action classification comprises a no action class;
a loss function module 508, configured to calculate a loss function according to a connection timing classification method based on the action classification probability;
a back propagation module 510 for training the recurrent neural network by back propagating the loss function.
The recurrent neural network training apparatus provided by the embodiment of the present invention uses the CTC method to recognize actions more accurately from an image sequence, learns to distinguish no-action frames, learns the transition relationships between actions better, and predicts actions in a time sequence more accurately, enabling finer-grained and more accurate action recognition.
In one embodiment, the feature extraction module is further configured to: input the multi-frame image sequence into a convolutional neural network for spatial feature extraction; input the extracted spatial features into a long short-term memory module for temporal feature extraction; and take the extracted temporal features as the features of each frame of image.
In another embodiment, the action classification module is further configured to input the features of each frame of image into a Softmax layer of the recurrent neural network to obtain the action classification probability of each frame of image at each time step.
In another embodiment, the loss function module is further configured to: multiply the action classification probabilities across the corresponding time steps to compute the classification probability of the multi-frame image sequence; compute the action identifier error rate from the sequence classification probability by dynamic programming using the connectionist temporal classification method; and take the action identifier error rate as the loss function.
In another embodiment, the loss function module is further configured to: based on the classification probability of the multi-frame image sequence, compute, through a forward module, a first probability of the prefix action identifiers corresponding to the k-th action identifier; based on the classification probability of the multi-frame image sequence, compute, through a backward module, a second probability of the suffix action identifiers starting from the (k+1)-th action identifier; and combine the first probability and the second probability to obtain the action identifier error rate.
Corresponding to the action recognition method provided in the second embodiment, an embodiment of the present invention provides an action recognition apparatus, referring to a structural block diagram of an action recognition apparatus shown in fig. 6, including:
an obtaining module 602, configured to obtain a video to be identified, and extract an image sequence of the video to be identified; the image sequence comprises at least two frames of images ordered in time;
an input module 604, configured to input the image sequence into the recurrent neural network, where the recurrent neural network is trained based on a connectionist temporal classification method;
a classification module 606, configured to classify each frame of image through a recurrent neural network to obtain an action identifier of each frame of image;
and the identifying module 608 is configured to determine an action identifier of the video to be identified according to the action identifier of each frame of image.
The action recognition apparatus provided by the embodiment of the present invention classifies each frame of image through a recurrent neural network trained based on the connectionist temporal classification method, obtains the action identifier of each frame of image, and determines the action identifier of the video to be recognized from the frame identifiers; the transition relationships between actions can thus be learned better, and finer-grained, more accurate action recognition can be performed on the image sequence.
In one embodiment, the classification module is further configured to: determining the classification probability of each frame of image through a recurrent neural network; and classifying each frame of image according to the maximum classification probability to obtain the action identifier of each frame of image.
In another embodiment, the identification module is further configured to: if the action identifier of the image is a no-action identifier, removing the no-action identifier; if the action marks of at least two continuous frames of images are repeated, removing the repeated action marks; and determining the obtained simplified result as the action identifier of the video to be recognized.
The implementation principle and technical effects of the apparatus provided in this embodiment are the same as those of the foregoing embodiments; for brevity, refer to the corresponding contents of the foregoing method embodiments for anything not mentioned here.
The embodiment of the present invention further provides a processing device, which includes a memory and a processor, where the memory stores a computer program that can be executed on the processor, and the processor implements the steps of the method provided in the foregoing embodiment when executing the computer program. Optionally, the electronic device may further comprise an image capture device or a fingerprint sensor.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Further, the present embodiment also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the steps of the method provided by the foregoing method embodiment.
The computer program product for the training method, recognition method, apparatus, and processing device of a recurrent neural network provided in the embodiments of the present invention includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods provided in the foregoing method embodiments.
This embodiment also provides a computer program, which may be stored on a storage medium in the cloud or locally. When executed by a computer or a processor, the computer program performs the methods provided in the foregoing method embodiments and implements the corresponding modules of the apparatus according to the embodiments of the present invention. For specific implementation, refer to the method embodiments, which are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
The various apparatus embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the blocks in an apparatus according to embodiments of the present invention. The present application may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. For example, the programs of the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
The above-described functions of the present application, if implemented in the form of software functional units and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that anyone familiar with the technical field may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or substitute equivalents for some technical features within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered thereby. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of training a recurrent neural network, the method comprising:
acquiring a training sample, wherein the training sample comprises a multi-frame image sequence of a video and an action identifier corresponding to the video;
extracting the features of the multi-frame image sequence to obtain image sequence features, wherein the image sequence features comprise the features of each frame of image;
inputting the image sequence characteristics into a recurrent neural network for action classification to obtain action classification probability of each frame of image; wherein the action classification comprises a no action class;
multiplying the action classification probabilities across the corresponding time steps to calculate the classification probability of the multi-frame image sequence;
based on the classification probability of the multi-frame image sequence, calculating, through a forward module, a first probability of the prefix action identifiers corresponding to the k-th action identifier;
based on the classification probability of the multi-frame image sequence, calculating, through a backward module, a second probability of the suffix action identifiers starting from the (k+1)-th action identifier;
combining the first probability and the second probability to obtain an action identifier error rate;
taking the action identifier error rate as a loss function;
training a recurrent neural network by back-propagating the loss function.
2. The method according to claim 1, wherein the step of performing feature extraction on the multi-frame image sequence to obtain image sequence features comprises:
inputting the multi-frame image sequence into a convolutional neural network for spatial feature extraction;
inputting the extracted spatial features into a long short-term memory module for temporal feature extraction;
and taking the extracted temporal features as the image sequence features.
3. The method according to claim 1 or 2, wherein the step of inputting the image sequence features into a recurrent neural network for action classification to obtain action classification probabilities of each frame of image comprises:
and inputting the features of each frame of image into a Softmax layer of a recurrent neural network to obtain the action classification probability of each frame of image at each time step.
4. A method for performing action recognition using the recurrent neural network trained by the method of any one of claims 1-3, comprising:
acquiring a video to be identified, and extracting an image sequence of the video to be identified; the image sequence comprises at least two frames of images ordered in time;
inputting the sequence of images into the recurrent neural network, where the recurrent neural network is trained based on a connectionist temporal classification method;
classifying each frame of image through the recurrent neural network to obtain an action identifier of each frame of image;
and determining the action identifier of the video to be identified according to the action identifier of each frame of image.
5. The method of claim 4, wherein the step of classifying each frame of the image by the recurrent neural network to obtain the motion identifier of each frame of the image comprises:
determining, by the recurrent neural network, a classification probability for each frame of the image;
and classifying each frame of image according to the maximum classification probability to obtain the action identifier of each frame of image.
6. The method according to claim 5, wherein the step of determining the motion identifier of the video to be recognized according to the motion identifier of each frame of the image comprises:
if the action identifier of the image is a no-action identifier, removing the no-action identifier;
if the action identifiers of at least two continuous frames of images are repeated, removing the repeated action identifiers;
and determining the obtained simplified result as the action identifier of the video to be identified.
7. An apparatus for training a recurrent neural network, the apparatus comprising:
a sample acquisition module, configured to acquire a training sample, where the training sample comprises a multi-frame image sequence of a video and an action identifier corresponding to the video;
the characteristic extraction module is used for extracting the characteristics of the multi-frame image sequence to obtain image sequence characteristics, and the image sequence characteristics comprise the characteristics of each frame of image;
the action classification module is used for inputting the image sequence characteristics into a recurrent neural network to carry out action classification so as to obtain action classification probability of each frame of image; wherein the action classification comprises a no action class;
a loss function module, configured to multiply the action classification probabilities across the corresponding time steps to calculate the classification probability of the multi-frame image sequence; calculate, through a forward module based on the classification probability of the multi-frame image sequence, a first probability of the prefix action identifiers corresponding to the k-th action identifier; calculate, through a backward module based on the classification probability of the multi-frame image sequence, a second probability of the suffix action identifiers starting from the (k+1)-th action identifier; combine the first probability and the second probability to obtain an action identifier error rate; and take the action identifier error rate as a loss function;
a back propagation module for training a recurrent neural network by back propagating the loss function.
8. An apparatus for performing action recognition using the recurrent neural network trained by the method of any one of claims 1 to 3, comprising:
an acquisition module, configured to acquire a video to be recognized and extract an image sequence of the video to be recognized, where the image sequence comprises at least two frames of images ordered in time;
an input module for inputting the sequence of images into the recurrent neural network, where the recurrent neural network is trained based on a connectionist temporal classification method;
the classification module is used for classifying each frame of image through the recurrent neural network to obtain an action identifier of each frame of image;
and the identification module is used for determining the action identifier of the video to be identified according to the action identifier of each frame of image.
9. A processing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of the preceding claims 1 to 6 when executing the computer program.
10. A computer-readable medium having program code executable by a processor, the program code causing the processor to perform the method of any of claims 1 to 6.
CN201810870041.4A 2018-08-01 2018-08-01 Training method, identification method, device and processing device of recurrent neural network Active CN109086873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810870041.4A CN109086873B (en) 2018-08-01 2018-08-01 Training method, identification method, device and processing device of recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810870041.4A CN109086873B (en) 2018-08-01 2018-08-01 Training method, identification method, device and processing device of recurrent neural network

Publications (2)

Publication Number Publication Date
CN109086873A CN109086873A (en) 2018-12-25
CN109086873B 2021-05-04

Family

ID=64833862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810870041.4A Active CN109086873B (en) 2018-08-01 2018-08-01 Training method, identification method, device and processing device of recurrent neural network

Country Status (1)

Country Link
CN (1) CN109086873B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815846B (en) * 2018-12-29 2021-08-27 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, storage medium, and electronic apparatus
CN111797655A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 User activity identification method and device, storage medium and electronic equipment
CN110287810B (en) * 2019-06-04 2021-05-25 北京字节跳动网络技术有限公司 Vehicle door motion detection method, device and computer readable storage medium
CN110363159A (en) * 2019-07-17 2019-10-22 腾讯科技(深圳)有限公司 Image-recognizing method, device, electronic equipment and computer readable storage medium
CN110414446B (en) * 2019-07-31 2021-04-16 广东工业大学 Method and device for generating operation instruction sequence of robot
CN110751021A (en) * 2019-09-03 2020-02-04 北京迈格威科技有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN110866509B (en) * 2019-11-20 2023-04-28 腾讯科技(深圳)有限公司 Action recognition method, device, computer storage medium and computer equipment
CN111008280B (en) * 2019-12-04 2023-09-05 北京百度网讯科技有限公司 A video classification method, device, equipment and storage medium
CN111199202B (en) * 2019-12-30 2024-04-26 南京师范大学 Human body action recognition method and recognition device based on circulating attention network
CN111401259B (en) * 2020-03-18 2024-02-02 南京星火技术有限公司 Model training method, system, computer readable medium and electronic device
CN112231516B (en) * 2020-09-29 2024-02-27 北京三快在线科技有限公司 Training method of video abstract generation model, video abstract generation method and device
CN114612811B (en) * 2020-12-04 2025-06-17 丰田自动车株式会社 A target behavior classification method, storage medium and terminal
CN113420813B (en) * 2021-06-23 2023-11-28 北京市机械工业局技术开发研究所 Diagnostic method for particulate matter filter cotton state of vehicle tail gas detection equipment
CN115641640A (en) * 2021-07-20 2023-01-24 北京百度网讯科技有限公司 Motion recognition model training method and device and motion recognition method and device
CN114266997B (en) * 2021-12-23 2025-05-13 厦门市美亚柏科信息股份有限公司 Training method, device, computing device and storage medium for video action recognition model
CN114429675B (en) * 2022-01-21 2025-06-13 京东方科技集团股份有限公司 Action recognition method, model training method, device and electronic equipment
CN114419739A (en) * 2022-03-31 2022-04-29 深圳市海清视讯科技有限公司 Training method of behavior recognition model, behavior recognition method and equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451552A (en) * 2017-07-25 2017-12-08 北京联合大学 A kind of gesture identification method based on 3D CNN and convolution LSTM
CN107862331A (en) * 2017-10-31 2018-03-30 华中科技大学 It is a kind of based on time series and CNN unsafe acts recognition methods and system
CN107766839B (en) * 2017-11-09 2020-01-14 清华大学 Motion recognition method and device based on 3D convolutional neural network
CN108216252B (en) * 2017-12-29 2019-12-20 中车工业研究院有限公司 Subway driver vehicle-mounted driving behavior analysis method, vehicle-mounted terminal and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Sequence Modeling with CTC";Awni Hannun;《DOI:10.23915/distill.00008》;20171027;文章全文 *
Pavlo Molchanov.et al."Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks".《2016 IEEE Conference on Computer Vision and Pattern Recognition》.2016, *

Also Published As

Publication number Publication date
CN109086873A (en) 2018-12-25

Similar Documents

Publication Publication Date Title
CN109086873B (en) Training method, identification method, device and processing device of recurrent neural network
CN112560999B (en) Target detection model training method and device, electronic equipment and storage medium
CN108875676B (en) Living body detection method, device and system
CN109891897B (en) Methods for analyzing media content
CN113591527B (en) Object track recognition method and device, electronic equipment and storage medium
Mandal et al. Scene independency matters: An empirical study of scene dependent and scene independent evaluation for CNN-based change detection
CN110781960B (en) Training method, classification method, device and equipment of video classification model
CN109727275B (en) Object detection method, device, system and computer readable storage medium
CN110163052B (en) Video action recognition method, device and machine equipment
US11354904B2 (en) Spatial-temporal graph-to-sequence learning based grounded video descriptions
CN113449586B (en) Target detection method, device, computer equipment and storage medium
CN110263916A (en) Data processing method and device, storage medium and electronic device
CN118644811B (en) A video object detection method, device, electronic device and storage medium
Ehsan et al. An accurate violence detection framework using unsupervised spatial–temporal action translation network
CN113705293A (en) Image scene recognition method, device, equipment and readable storage medium
CN116704433A (en) Self-supervision group behavior recognition method based on context-aware relationship predictive coding
CN112766218A (en) Cross-domain pedestrian re-identification method and device based on asymmetric joint teaching network
CN113553952A (en) Abnormal behavior identification method and device, equipment, storage medium, program product
CN112784691B (en) Target detection model training method, target detection method and device
Viet‐Uyen Ha et al. High variation removal for background subtraction in traffic surveillance systems
CN119904786A (en) Method, device and apparatus for generating event description text based on video data
CN115115985B (en) Video analysis method and device, electronic equipment and storage medium
CN110490058B (en) Training method, device and system of pedestrian detection model and computer readable medium
CN110489592B (en) Video classification method, apparatus, computer device and storage medium
EP3940586A1 (en) An electronic device and a related method for detecting and counting an action

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Training methods, recognition methods, devices, and processing equipment for recurrent neural networks

Granted publication date: 20210504

Pledgee: Shanghai Yunxin Venture Capital Co.,Ltd.

Pledgor: BEIJING KUANGSHI TECHNOLOGY Co.,Ltd.

Registration number: Y2024110000102

PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20210504

Pledgee: Shanghai Yunxin Venture Capital Co.,Ltd.

Pledgor: BEIJING KUANGSHI TECHNOLOGY Co.,Ltd.

Registration number: Y2024110000102
