Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be appreciated that objects so termed may be interchanged under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Moreover, the terms "first", "second", and the like are used in a generic sense and do not limit the number of objects; for example, a first object may be one object or more than one object. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally indicates that the preceding and succeeding objects are in an "or" relationship.
The video generation method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Please refer to FIG. 1, which is a flowchart illustrating a video generation method according to an embodiment of the present application. The method can be applied to an electronic device, and the electronic device may be a mobile phone, a tablet computer, a notebook computer, or the like. As shown in FIG. 1, the method may include steps S1100-S1300, described in detail below.
S1100, in the case that an original video is played, collecting behavior data of a user, where the user is a user watching the original video.
The original video refers to an online education video played on the electronic device. For example, the original video may be a web course video provided to the user for watching in a live broadcast manner; for another example, the original video may be a video provided to the user for viewing by an online education platform.
The behavior data of the user refers to data representing the behavior of the user while the original video is played. For example, the behavior data may be data on the user's eye gaze focus, facial data, posture data, voice data, and so on. In the embodiments of the application, with the user's authorization, the electronic device may obtain the behavior data by having various data acquisition devices capture images of the user watching the original video and then performing image recognition on the captured images.
In the related art, online education generally presents knowledge points to a user in the form of live video. To prevent the user from missing knowledge points while receiving online education through an electronic device, the electronic device generally detects class-attendance state data of the user, so that a prompt message can be sent to the user in time to urge the user to listen attentively when the user may not be concentrating.
However, even if the electronic device sends the reminder in time at a first moment when the user's concentration drops, the user can only regain concentration and resume attentive listening, in response to the reminder, at a later second moment; the user has therefore still missed the knowledge points of the original video played between the first moment and the second moment.
To enable the user to conveniently review the missed knowledge points in the original video, in the embodiments of the application, while the original video is played, the electronic device may continuously acquire behavior data of the user through a data acquisition device communicatively connected to the electronic device. When it is detected from the behavior data that the user's concentration is a target concentration degree, the video data corresponding to the target concentration degree in the original video is recorded, so that the user can focus their review of the missed knowledge points on the recorded target video.
In an embodiment of the present application, the data acquisition device may be an image sensor, an audio input device, or the like.
Specifically, in one embodiment, referring to FIG. 2, in the case that a user watches a live web lesson on a tablet, the tablet may capture at least one image of the user watching the video through its front camera and/or collect audio through its microphone, and recognize the image through an image recognition process, such as a facial recognition algorithm or a gesture recognition algorithm, to obtain the behavior data.
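To make the collection step concrete, the following is a minimal sketch in Python, assuming OpenCV (cv2) is installed and the front camera is exposed as capture device 0. The stock Haar cascade face detector stands in for whichever recognition pipeline an actual implementation would use, and the returned dictionary is an illustrative shape for the behavior data, not one defined by this disclosure.

```python
import cv2

def collect_behavior_sample():
    # Grab one frame from the (assumed) front camera, device index 0.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    # Run a stock face detector as a stand-in for the recognition step.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Whether and where a face appears is one simple piece of behavior data.
    return {"face_present": len(faces) > 0,
            "face_boxes": [tuple(map(int, f)) for f in faces]}
```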
S1200, obtaining a target concentration degree of the user according to the behavior data, where the target concentration degree represents the degree of concentration with which the user watches the original video.
Optionally, the target concentration degree may be at least one of preset-level concentration degrees associated with the behavior data of the user, where the preset-level concentration degrees may include a first-level concentration degree, a second-level concentration degree, and a third-level concentration degree.
Optionally, the different preset-level concentration degrees respectively represent how attentively the user is watching the original video. For example, the first-level concentration degree corresponds to the user not looking at the screen, or not appearing in front of the camera, for more than a threshold time; the second-level concentration degree corresponds to the user's facial expression being one of frowning, anxiety, puzzlement, or the like; and the third-level concentration degree corresponds to the user exhibiting a distracted action, for example shaking the head, looking around, eating, or playing with a toy.
Hereinafter, how to determine the target concentration degree of the user based on the behavior data of the user will be described.
In one embodiment, the determining the target concentration degree of the user from the behavior data includes: obtaining an eye gaze focus of the user according to the behavior data; acquiring a video position area of the original video; and determining that the target concentration degree is a first-level concentration degree in the case that an offset value between the eye gaze focus and the video position area is greater than a preset offset threshold.
The eye gaze focus may be position data of the focal point of the user's eyes; the electronic device can collect eye data of the user and obtain the eye gaze focus from the eye data through a face recognition algorithm.
The video position area may be position data representing the playing area of the original video on the screen of the electronic device. For example, the video position area may simply be the position data corresponding to the entire screen area of the electronic device; alternatively, for higher accuracy, the video position area may be the position data of the playing area actually occupied by the original video on the screen of the electronic device.
Specifically, the electronic device may convert the eye gaze focus and the video position area into the same coordinate space and determine whether the eye gaze focus is located within the video position area. If not, the electronic device obtains an offset value of the eye gaze focus relative to the boundary of the video position area, and determines that the target concentration degree is the first-level concentration degree when the offset value is greater than the preset offset threshold.
In this embodiment, to avoid erroneous determination, the determining that the target concentration degree is the first-level concentration degree in the case that the offset value between the eye gaze focus and the video position area is greater than the preset offset threshold may include: acquiring an unfocused duration of the user in the case that the offset value is greater than the offset threshold; and determining that the target concentration degree is the first-level concentration degree in the case that the unfocused duration is greater than a preset duration threshold.
The unfocused duration represents the duration for which the user does not watch the original video; that is, the target concentration degree can be determined to be the first-level concentration degree in the case that the user's gaze deviates from the original video and the deviation lasts for at least the preset duration threshold.
For example, while a user attends a web lesson on a tablet computer, if at least one of the following conditions occurs: the user does not watch the screen for more than 20 seconds, the user's gaze frequently leaves the screen, the user's gaze does not fall on the lesson area of the screen, or the like, it may be judged that the user's concentration is low and the user is not listening attentively, and the target concentration degree may be determined to be the first-level concentration degree.
Because the eye gaze focus can be obtained by collecting the user's eye data and computed quickly, the degree of concentration with which the user watches the original video can be judged quickly from the offset value between the eye gaze focus and the video position area corresponding to the original video.
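As a rough illustration of this first-level check, the sketch below combines the offset test with the unfocused-duration test. The shared screen-coordinate space, the threshold values, and the class interface are all assumptions made for the example, not values fixed by the disclosure.

```python
import time

OFFSET_THRESHOLD_PX = 80    # preset offset threshold (illustrative)
UNFOCUSED_SECONDS = 20.0    # preset duration threshold (illustrative)

def gaze_offset(focus, area):
    """Distance in pixels from the gaze focus (x, y) to the boundary of the
    video area (left, top, right, bottom); 0 when the focus lies inside."""
    x, y = focus
    left, top, right, bottom = area
    dx = max(left - x, 0, x - right)
    dy = max(top - y, 0, y - bottom)
    return (dx * dx + dy * dy) ** 0.5

class FirstLevelDetector:
    """Flags the first-level concentration degree once the gaze has been
    off the video area for longer than the duration threshold."""
    def __init__(self):
        self._unfocused_since = None

    def update(self, focus, area, now=None):
        now = time.monotonic() if now is None else now
        if gaze_offset(focus, area) > OFFSET_THRESHOLD_PX:
            if self._unfocused_since is None:
                self._unfocused_since = now
            if now - self._unfocused_since > UNFOCUSED_SECONDS:
                return "first_level"
        else:
            self._unfocused_since = None
        return None
```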
In another embodiment, the determining the target concentration degree of the user from the behavior data includes: obtaining facial feature information of the user according to the behavior data; obtaining emotion information of the user according to the facial feature information; and determining that the target concentration degree is a second-level concentration degree in the case that the emotion information includes preset emotion information, where the preset emotion information includes at least one of anxiety information and confusion information.
Specifically, the electronic device may extract facial feature information of the user from the collected facial data, identify the user's current emotion information from the extracted facial feature information, and, when the recognized emotion information indicates that the user is in an anxious state, a confused state, or the like, determine that the user does not understand the knowledge points being played and that concentration has decreased; therefore, the target concentration degree in this case may be determined to be the second-level concentration degree.
In a specific implementation, an emotion recognition model for recognizing the user's emotion information can be trained in advance, so that the facial feature information in the facial data of the behavior data is extracted through the emotion recognition model, the user's current emotion information is then obtained, and the degree of concentration with which the user watches the original video is judged quickly.
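A minimal sketch of this second-level check follows. Since the disclosure does not specify a model, the trained emotion recognition model is replaced by a toy rule over a hypothetical facial feature ("brow_gap"), so both the feature name and the threshold are illustrative assumptions.

```python
PRESET_EMOTIONS = {"anxiety", "confusion"}

def predict_emotion(features: dict) -> str:
    # Toy stand-in for a pretrained classifier: a furrowed brow narrows the
    # gap between the inner eyebrow landmarks, which we read as confusion.
    if features.get("brow_gap", 1.0) < 0.4:
        return "confusion"
    return "neutral"

def is_second_level(features: dict) -> bool:
    # Second-level concentration degree when a preset emotion is detected.
    return predict_emotion(features) in PRESET_EMOTIONS
```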
In yet another embodiment, the determining the target concentration degree of the user from the behavior data includes: acquiring posture feature information of the user according to the behavior data; acquiring action information of the user according to the posture feature information; and determining that the target concentration degree is a third-level concentration degree in the case that the action information includes preset action information, where the preset action information includes at least one of information representing shaking the head, looking around, eating, and playing with a toy.
The posture feature information of the user may be information indicating the current posture of the user. For example, the posture feature information may be information indicating the state of the user's upper body, as shown in FIG. 2.
In a specific implementation, whether the user exhibits distracted actions such as shaking the head, looking around, eating, or playing with a toy while the original video is played can be identified from the user's posture feature information. If such actions exist, the user's current concentration has decreased, so the target concentration degree in this case may be determined to be the third-level concentration degree.
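The third-level check can be sketched the same way. The pose feature names ("head_yaw_deg", "hand_to_mouth_dist") and the thresholds below are hypothetical stand-ins for the output of a real posture recognition pipeline.

```python
DISTRACTED_ACTIONS = {"shaking_head", "looking_aside", "eating",
                      "playing_with_toy"}

def detect_action(pose: dict) -> str:
    # Toy rules over hypothetical pose features.
    if abs(pose.get("head_yaw_deg", 0.0)) > 30:      # head turned well away
        return "looking_aside"
    if pose.get("hand_to_mouth_dist", 1.0) < 0.1:    # hand held at the mouth
        return "eating"
    return "attentive"

def is_third_level(pose: dict) -> bool:
    # Third-level concentration degree when a distracted action is seen.
    return detect_action(pose) in DISTRACTED_ACTIONS
```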
As can be seen from the above description, while the original video is played, the embodiments of the application can determine the user's concentration quickly and conveniently by acquiring the user's behavior data and using information such as the user's eye gaze focus, emotion information, and posture actions, so that the corresponding video data is recorded to obtain the target video when the user's concentration is the target concentration degree.
It should be noted that the above embodiments of determining the target concentration degree are merely examples provided by the present disclosure; the target concentration degree may also be determined through other steps in a specific implementation, which is not particularly limited herein. In addition, in the embodiments of the present application, the concentration is divided into different levels, each representing a corresponding decrease in the user's concentration; in a specific implementation, the user's concentration may instead be represented directly in numerical form, which is likewise not particularly limited herein.
S1300, generating a target video according to the target concentration degree and the original video, where the target video includes the video data played in the original video at the target concentration degree.
The target video may be a video obtained by recording the original video and/or comment information corresponding to the original video, in which the video data the user missed at the target concentration degree is identified.
Specifically, in the case that the user's concentration obtained through the above steps is the target concentration degree, the corresponding video data in the original video may be recorded according to the target concentration degree, so that the user can review the video and, guided by the key information identified in it, revisit the knowledge points that may have been missed.
In an embodiment of the application, the generating a target video according to the target concentration degree and the original video includes: in the case that the target concentration degree is a preset-level concentration degree, acquiring a target start-stop time of the video data played in the original video at the target concentration degree; and obtaining the target video according to the target start-stop time and the original video.
The target start-stop time may be a start and end time, measured from the start of playback of the original video, that represents the playing progress of the original video during which the user was at the target concentration degree.
In the embodiments of the present application, since the user's concentration may drop multiple times while the original video is played, there may be at least one target concentration degree, and correspondingly at least one target start-stop time.
For example, suppose the user starts a web lesson on the tablet at 19:00. If the tablet determines that the user did not watch the web lesson during 19:01-19:02, the corresponding target start-stop time may be set to [1 min-2 min], indicating that the user was not listening attentively within the 1st to 2nd minute of the lesson's playing progress. If it further determines that the user was eating during 19:06-19:10, the corresponding target start-stop time may be set to [6 min-10 min], indicating that the user was not listening attentively within the 6th to 10th minute of the lesson's playing progress. And if it further determines that during 19:15-19:16 the user's face showed an expression indicating confusion, such as frowning, puzzlement, or pursed lips, the corresponding target start-stop time may be set to [15 min-16 min], indicating that the user did not understand the knowledge points within the 15th to 16th minute of the lesson's playing progress.
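A sketch of this conversion from wall-clock detection times to playing-progress minutes is below; it assumes the lesson plays continuously from its start time, with no pauses or seeking.

```python
from datetime import datetime, timedelta

def to_progress_minutes(lesson_start: datetime,
                        event_start: datetime,
                        event_end: datetime) -> tuple:
    # Wall-clock detection times -> minutes of the lesson's playing progress.
    one_min = timedelta(minutes=1)
    return (int((event_start - lesson_start) / one_min),
            int((event_end - lesson_start) / one_min))

start = datetime(2021, 1, 1, 19, 0)
# The user looked away during 19:01-19:02 -> the [1 min-2 min] segment.
print(to_progress_minutes(start, datetime(2021, 1, 1, 19, 1),
                          datetime(2021, 1, 1, 19, 2)))   # (1, 2)
```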
In one embodiment, the obtaining the target video according to the target start-stop time and the original video includes at least one of: adding, according to the target start-stop time, identification information corresponding to the target concentration degree to a progress bar of a recorded video to obtain the target video, where the recorded video is obtained by recording the original video and the identification information includes color information; and extracting the video data corresponding to the target start-stop time from the original video to generate the target video.
Specifically, while the video is played, the electronic device may record the original video; after the recording is completed, the electronic device adds identification information corresponding to the target concentration degree to the progress bar of the recorded video according to the obtained target start-stop time, so that the user can use the identification information in the target video to review the knowledge points that may have been missed due to the drop in concentration.
For example, in the progress bar of a complete recorded video obtained by recording a web lesson, a progress bar segment corresponding to the first-level concentration degree, i.e., the eyes leaving the lesson playing area, may be marked red; a progress bar segment corresponding to the second-level concentration degree, i.e., a confused facial expression, may be marked orange; and a progress bar segment corresponding to the third-level concentration degree, i.e., a distracted action, may be marked yellow. After the target video is obtained in this way, the user can conveniently locate, from the color information in the progress bar of the target video, the content around the corresponding progress bar segment and review the knowledge points that may have been missed.
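One way to represent such identification information is sketched below. The segment-and-color encoding is a hypothetical data shape, and how a player would render the colored segments is outside the scope of the example.

```python
LEVEL_COLORS = {"first_level": "red",      # eyes off the lesson playing area
                "second_level": "orange",  # confused facial expression
                "third_level": "yellow"}   # distracted action

def mark_progress_bar(segments):
    """segments: iterable of (start_min, end_min, level) tuples."""
    return [{"start_min": s, "end_min": e, "color": LEVEL_COLORS[lvl]}
            for s, e, lvl in segments]

# Using the three segments from the 19:00 lesson example above:
marks = mark_progress_bar([(1, 2, "first_level"),
                           (6, 10, "third_level"),
                           (15, 16, "second_level")])
```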
Of course, to make the target video more targeted, the target video may also be a video obtained by extracting the video data corresponding to the target start-stop times from the original video and editing the extracted data together; the detailed processing procedure is not described herein again.
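For the extraction embodiment, a minimal sketch using the ffmpeg command-line tool (assumed to be installed) is shown below. The file names are placeholders, and `-c copy` cuts without re-encoding, trading frame-exact cuts for speed.

```python
import subprocess

def extract_clip(src: str, start_min: int, end_min: int, dst: str):
    # Seek to the segment start (-ss, in seconds), cut its duration (-t),
    # and copy the streams without re-encoding (-c copy).
    subprocess.run(["ffmpeg", "-y",
                    "-ss", str(start_min * 60),
                    "-t", str((end_min - start_min) * 60),
                    "-i", src, "-c", "copy", dst],
                   check=True)

# e.g. extract_clip("lesson.mp4", 6, 10, "missed_6_10.mp4")
```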
Referring additionally to FIG. 3, in an embodiment of the present application, in the case that the original video is played, the method further includes: receiving a target instruction, where the target instruction is generated by triggering of preset video data in the original video; collecting posture data of the user in response to the target instruction; and performing a preset interactive operation with the user according to the posture data, where the preset interactive operation includes at least one of a question answering operation and a posture correction operation.
Specifically, young users are often not used to web lessons: when the teacher asks students to raise their hands to answer a question, clicking a button on the screen is far less intuitive for children. Therefore, while the original video is played, the electronic device can recognize the user's hand-raising gesture in response to the target instruction generated by triggering of the preset video data, thereby naturally and directly guiding the user to answer the question.
In addition, considering that the original video, i.e., the web lesson, is not limited to subjects such as language classes but may also require physical movement, for example in classroom exercises, eye exercises, or physical education courses, the electronic device may also help the user complete part of the body training remotely by collecting the user's posture data, supporting the user's all-round development. While the original video is played, the electronic device can judge and correct the user's posture in training such as doing exercises or sit-ups, automatically reminding the user, on the teacher's behalf, whether an action is standard.
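Posture judgment of this kind often reduces to comparing joint angles computed from detected keypoints against a reference. The sketch below shows that idea, with the choice of joints, the target angle, and the tolerance all being illustrative assumptions rather than values from the disclosure.

```python
import math

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by keypoints a-b-c, each (x, y)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cosang = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

def check_arm_straight(shoulder, elbow, wrist, target=180.0, tol=15.0):
    # e.g. an exercise step that expects a straight arm; when the check
    # fails, the device would prompt the user to correct the posture.
    angle = joint_angle(shoulder, elbow, wrist)
    return abs(angle - target) <= tol, angle
```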
In this embodiment, accessories can also be used in combination, so that the electronic device can perform posture correction on actions with higher requirements on detail, such as learning taijiquan. Of course, the method provided by this embodiment can also be used in games, guiding the user to exercise their limbs and correct their posture within educational games.
It should be noted that, in the video generation method provided in the embodiments of the present application, the execution subject may be a video generation apparatus, or a control module in the video generation apparatus for executing the video generation method. In the embodiments of the present application, the video generation method is described by taking a video generation apparatus executing the video generation method as an example.
Corresponding to the above embodiments, referring to FIG. 4, an embodiment of the present application further provides a video generation apparatus 400, including:
the collecting module 410 is configured to collect behavior data of a user in a case that an original video is played, where the user is a user watching the original video.
A concentration determination module 420, configured to determine a target concentration degree of the user according to the behavior data, where the target concentration degree represents the degree of concentration with which the user watches the original video.
A target video generation module 430, configured to generate a target video according to the target concentration degree and the original video, where the target video includes the video data played in the original video at the target concentration degree.
In an embodiment, the target video generation module 430 is specifically configured to: in the case that the target concentration degree is a preset-level concentration degree, acquire a target start-stop time of the video data played in the original video at the target concentration degree; and obtain the target video according to the target start-stop time and the original video.
In an embodiment, the target video generation module 430 is specifically configured to obtain the target video by any one of: adding, according to the target start-stop time, identification information corresponding to the target concentration degree to a progress bar of a recorded video, where the recorded video is obtained by recording the original video and the identification information includes color information; and extracting the video data corresponding to the target start-stop time from the original video to generate the target video.
In one embodiment, the concentration determination module 420 is specifically configured to: obtain an eye gaze focus of the user according to the behavior data; acquire a video position area of the original video; and determine that the target concentration degree is a first-level concentration degree in the case that an offset value between the eye gaze focus and the video position area is greater than a preset offset threshold.

In one embodiment, the concentration determination module 420 is specifically configured to: acquire an unfocused duration of the user in the case that the offset value is greater than the offset threshold; and determine that the target concentration degree is the first-level concentration degree in the case that the unfocused duration is greater than a preset duration threshold.

In one embodiment, the concentration determination module 420 is specifically configured to: obtain facial feature information of the user according to the behavior data; obtain emotion information of the user according to the facial feature information; and determine that the target concentration degree is a second-level concentration degree in the case that the emotion information includes preset emotion information, where the preset emotion information includes at least one of anxiety information and confusion information.

In one embodiment, the concentration determination module 420 is specifically configured to: acquire posture feature information of the user according to the behavior data; acquire action information of the user according to the posture feature information; and determine that the target concentration degree is a third-level concentration degree in the case that the action information includes preset action information, where the preset action information includes at least one of information representing shaking the head, looking around, eating, and playing with a toy.
In one embodiment, the apparatus 400 further includes an interaction module, specifically configured to: receive a target instruction, where the target instruction is generated by triggering of preset video data in the original video; collect posture data of the user in response to the target instruction; and perform a preset interactive operation with the user according to the posture data, where the preset interactive operation includes at least one of a question answering operation and a posture correction operation.
The video generation apparatus 400 in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The video generation apparatus in the embodiments of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not specifically limited thereto.
The video generation device provided in the embodiment of the present application can implement each process implemented by the above method embodiment, and is not described here again to avoid repetition.
Corresponding to the foregoing embodiments, optionally, as shown in FIG. 5, an embodiment of the present application further provides an electronic device 500, including a processor 501, a memory 502, and a program or instruction stored in the memory 502 and executable on the processor 501, where the program or instruction, when executed by the processor 501, implements each process of the foregoing video generation method embodiment and can achieve the same technical effect, which is not repeated here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
FIG. 6 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further include a power source (e.g., a battery) for supplying power to its components; the power source may be logically connected to the processor 1010 through a power management system, so that charging, discharging, and power-consumption management are implemented through the power management system. The electronic device structure shown in FIG. 6 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, combine some components, or arrange components differently; details are not repeated here.
The processor 1010 is configured to: collect behavior data of a user in the case that an original video is played, where the user is a user watching the original video; determine a target concentration degree of the user according to the behavior data, where the target concentration degree represents the degree of concentration with which the user watches the original video; and generate a target video according to the target concentration degree and the original video, where the target video includes the video data played in the original video at the target concentration degree.

In an embodiment, the processor 1010 is further configured to: in the case that the target concentration degree is a preset-level concentration degree, obtain a target start-stop time of the video data played in the original video at the target concentration degree; and obtain the target video according to the target start-stop time and the original video.
In one embodiment, processor 1010 is further configured to perform any of the following:
adding, according to the target start-stop time, identification information corresponding to the target concentration degree to a progress bar of a recorded video to obtain the target video, where the recorded video is obtained by recording the original video and the identification information includes color information; and extracting the video data corresponding to the target start-stop time from the original video to generate the target video.

In one embodiment, the processor 1010 is further configured to: obtain an eye gaze focus of the user according to the behavior data; acquire a video position area of the original video; and determine that the target concentration degree is a first-level concentration degree in the case that an offset value between the eye gaze focus and the video position area is greater than a preset offset threshold.

In one embodiment, the processor 1010 is further configured to: obtain an unfocused duration of the user in the case that the offset value is greater than the offset threshold; and determine that the target concentration degree is the first-level concentration degree in the case that the unfocused duration is greater than a preset duration threshold.

In one embodiment, the processor 1010 is further configured to: obtain facial feature information of the user according to the behavior data; obtain emotion information of the user according to the facial feature information; and determine that the target concentration degree is a second-level concentration degree in the case that the emotion information includes preset emotion information, where the preset emotion information includes at least one of anxiety information and confusion information.

In one embodiment, the processor 1010 is further configured to: obtain posture feature information of the user according to the behavior data; obtain action information of the user according to the posture feature information; and determine that the target concentration degree is a third-level concentration degree in the case that the action information includes preset action information, where the preset action information includes at least one of information representing shaking the head, looking around, eating, and playing with a toy.

In one embodiment, the processor 1010 is further configured to: receive a target instruction, where the target instruction is generated by triggering of preset video data in the original video; collect posture data of the user in response to the target instruction; and perform a preset interactive operation with the user according to the posture data, where the preset interactive operation includes at least one of a question answering operation and a posture correction operation.
It should be understood that in the embodiment of the present application, the input unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the graphics processing unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 may include two parts, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. Processor 1010 may integrate an application processor that handles primarily operating systems, user interfaces, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1010.
The embodiments of the present application further provide a readable storage medium, on which a program or instruction is stored; when executed by a processor, the program or instruction implements each process of the video generation method embodiment and can achieve the same technical effect, which is not repeated here to avoid repetition.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
The embodiments of the application further provide a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instruction to implement each process of the above video generation method embodiment and achieve the same technical effect, which is not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, and may include performing the functions in a substantially simultaneous manner or in a reverse order, depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.