CN120687631A - Audio program content playback control method, device, equipment and storage medium - Google Patents
Audio program content playback control method, device, equipment and storage medium
- Publication number
- CN120687631A (application number CN202510803797.7A)
- Authority
- CN
- China
- Prior art keywords
- content
- listening
- audio program
- continuous
- playing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
- Electrically Operated Instructional Devices (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The application relates to the field of computer technologies, and in particular to a method, a device, equipment and a storage medium for controlling the playback of audio program content, which are used to improve the listening efficiency of audio program content. The method comprises: in response to a pause operation triggered for target audio program content, pausing playback of the target audio program content; in response to a resume operation triggered for the target audio program content, displaying a continuous-listening control area in a play control interface and playing continuous-listening summary content corresponding to the target audio program content, where the continuous-listening summary content is summary information generated for the audio content of the already-played portion of the target audio program content, and the review duration corresponding to the continuous-listening summary content is determined according to at least one of the historical listening behavior of the object and the content difficulty level of the target audio program content; and, after the continuous-listening summary content finishes playing, continuing to play the unplayed portion of the target audio program content from the current pause position.
Description
This application is a divisional application. The application number of the original application is 202110541007.4, the original filing date is May 18, 2021, and the original title is "Play control method, device, equipment and storage medium of audio program content". The entire content of the original application is incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to the field of machine learning technologies, and provides a method, an apparatus, a device, and a storage medium for controlling playing of audio program content.
Background
With the rapid development of internet technology, all kinds of social software have also emerged, among which audio program content sharing platforms are attracting growing interest and more and more users. Podcast platforms are a common type of audio program content sharing platform: many users like to record and share audio programs through a podcast platform, or to listen to audio programs shared by other users, including audio novels, commentaries, voice recordings, talk shows, and the like.
However, in the related art, when audio program content is played, if a user pauses the program and does not listen for a long time, playback simply resumes from the last pause position when the user listens again. The user cannot easily reconnect with the forgotten earlier storyline, so the user's thinking cannot quickly keep up with the plot and narrative rhythm of the subsequent part of the program, the previously heard content having been forgotten to varying degrees.
Disclosure of Invention
The embodiments of the application provide a play control method, device, equipment and storage medium for audio program content, which are used to improve the listening efficiency of audio program content.
The first audio program content playing control method provided by the embodiment of the application comprises the following steps:
during the playing of a target audio program, in response to a pause operation triggered for the target audio program content, pausing the playback of the target audio program content;
in response to a resume operation triggered for the target audio program content, displaying a continuous-listening control area in a play control interface and playing continuous-listening summary content corresponding to the target audio program content, where the continuous-listening summary content is summary information generated for the audio content of the already-played portion of the target audio program content, and the review duration corresponding to the continuous-listening summary content is determined according to at least one of the historical listening behavior of the object and the content difficulty level of the target audio program content;
and, after the continuous-listening summary content finishes playing, continuing to play the unplayed portion of the target audio program content from the current pause position.
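The pause–resume–recap flow in the steps above can be sketched as a minimal client-side state machine. This is an illustrative sketch only: the `Player` class, its method names, and the printed messages are assumptions introduced here, not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class Player:
    position: float = 0.0   # current playback position in seconds
    paused: bool = False

    def pause(self):
        # Pause operation: playback stops at the current position.
        self.paused = True

    def show_recap_area(self):
        # Display the continuous-listening control area (stubbed here).
        print("continuous-listening control area shown")

    def play(self, audio):
        print(f"playing {audio}")

    def resume(self, summary_audio):
        # Resume operation: show the control area, play the recap of the
        # already-played portion, then continue from the pause position.
        self.show_recap_area()
        self.play(summary_audio)
        self.paused = False
        return self.position

p = Player(position=620.0)
p.pause()
assert p.paused
resume_at = p.resume("recap.mp3")
assert resume_at == 620.0   # playback continues from the pause position
```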
The second method for controlling playing of audio program content provided by the embodiment of the application comprises the following steps:
during the playing of the target audio program, receiving a pause request and a resume request, sent by a client, for the target audio program content, and determining a review duration according to at least one of the historical listening behavior of the object and the content difficulty level of the target audio program content;
generating, based on the review duration and according to the audio content of the already-played portion of the target audio program content, continuous-listening summary content that is summary information of the played portion;
and feeding the continuous-listening summary content back to the client, so that the client displays a continuous-listening control area in a play control interface, plays the continuous-listening summary content, and continues to play the unplayed portion of the target audio program content from the current pause position after the continuous-listening summary content finishes playing.
The first audio program content playing control device provided by the embodiment of the application comprises:
a pause unit, configured to pause playing target audio program content in response to a pause operation triggered for the target audio program content during the playing of the target audio program;
a continuous-playing unit, configured to, in response to a resume operation triggered for the target audio program content, display a continuous-listening control area in a play control interface and play continuous-listening summary content corresponding to the target audio program content, where the review duration corresponding to the continuous-listening summary content is determined according to at least one of the historical listening behavior of the object and the content difficulty level of the target audio program content, and the continuous-listening control area is used to control the playing state of the continuous-listening summary content; and, after the continuous-listening summary content finishes playing, continue to play the unplayed portion of the target audio program content from the current pause position.
Optionally, the continuous-listening control area includes a summary control, and the continuous-playing unit is further configured to:
before the continuous-listening summary content finishes playing, in response to a closing operation triggered on the summary control, stop playing the continuous-listening summary content and continue to play the audio content of the unplayed portion of the target audio program content.
Optionally, the continuous-playing unit is configured to:
in response to a resume operation triggered for the target audio program content, display, in the play control interface, a continuous-listening control area containing a continuous-listening prompt, to prompt the object that intelligent continuous listening is currently in progress, and play the continuous-listening summary content corresponding to the target audio program content;
and the continuous-playing unit is further configured to:
stop displaying the continuous-listening control area in the play control interface after the continuous-listening summary content finishes playing.
Optionally, the apparatus further includes:
a setting unit, configured to, before the continuous-playing unit responds to the resume operation triggered for the target audio program content, displays the continuous-listening control area in the play control interface and plays the continuous-listening summary content corresponding to the target audio program content: in response to a setting operation on a continuous-playing permission control in a permission setting interface, set a continuous-playing permission for the target object so as to turn on or off an intelligent continuous-listening mode;
and send the corresponding continuous-playing permission information to a server, so that the server stores the continuous-playing permission information in association with the identification information of the target object.
Optionally, the continuous-playing unit is further configured to:
in response to a resume operation triggered for the target audio program content, if it is determined, according to the continuous-playing permission information associated with the target object, that the target object has the continuous-playing permission, determine that the target object is currently in the intelligent continuous-listening mode;
and display the continuous-listening control area in the play control interface, and play the continuous-listening summary content corresponding to the target audio program content.
Optionally, the continuous-playing unit is further configured to determine the continuous-listening summary content as follows:
select, based on the review duration, a segment of audio content from the audio content of the already-played portion as the audio content to be reviewed;
convert the audio content to be reviewed into text information, and generate a summary content text from the text information based on a text summarization technique;
and convert the summary content text into audio according to the key voice characteristics in the audio content to be reviewed, to obtain the continuous-listening summary content.
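The three-stage recap generation described above (select a tail segment of the played audio, transcribe and summarize it, then re-synthesize it in the program's key voice) can be wired together as below. This is a sketch of the wiring only; `asr`, `summarize` and `tts` are placeholder callables standing in for whatever concrete speech-recognition, text-summarization and speech-synthesis engines are actually used.

```python
def build_recap(played_audio, review_duration, asr, summarize, tts, voice):
    # Step 1: select the last `review_duration` seconds of the played
    # portion as the audio content to be reviewed.
    to_review = played_audio[-int(review_duration):]
    # Step 2: speech-to-text, then summarize the transcript.
    transcript = asr(to_review)
    summary_text = summarize(transcript)
    # Step 3: synthesize the summary text with the program's key voice.
    return tts(summary_text, voice=voice)

# Toy stand-ins to show the pipeline end to end (one list element per
# "second" of audio); real engines would replace these lambdas.
recap = build_recap(
    played_audio=list(range(100)),
    review_duration=10,
    asr=lambda seg: f"{len(seg)}s of speech",
    summarize=lambda text: f"summary of {text}",
    tts=lambda text, voice: (voice, text),
    voice="host",
)
assert recap == ("host", "summary of 10s of speech")
```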
Optionally, the continuous-playing unit is specifically configured to:
determine the review duration corresponding to the continuous-listening summary content based on the time interval between the pause time of the pause operation and the continuous-listening time of the resume operation, and on the played duration of the already-played portion of the target audio program content;
where the review duration is positively correlated with both the time interval and the played duration.
Optionally, the continuous-playing unit is specifically configured to:
determine a corresponding first review duration based on the time interval between the pause operation and the resume operation and the played duration of the already-played portion of the target audio program content, where the first review duration is positively correlated with both the time interval and the played duration;
determine a corresponding second review duration based on the content difficulty level of the target audio program content, where the second review duration is positively correlated with the content difficulty level;
and take the sum of the first review duration and the second review duration as the corresponding review duration.
Optionally, the continuous-playing unit is specifically configured to:
if the target audio program content contains the voices of a plurality of objects, determine the voice with the highest proportion by extracting features from the voices of the plurality of objects;
and convert the summary content text into audio based on the voice with the highest proportion, to obtain the continuous-listening summary content.
The second audio program content playing control device provided by the embodiment of the application comprises:
a determining unit, configured to receive, during the playing of the target audio program, a pause request and a resume request, sent by a client, for the target audio program content, and determine the review duration according to at least one of the historical listening behavior of the object and the content difficulty level of the target audio program content;
a generating unit, configured to generate, based on the review duration and according to the audio content of the already-played portion of the target audio program content, continuous-listening summary content that is summary information of the played portion;
and a feedback unit, configured to feed the continuous-listening summary content back to the client, so that the client displays a continuous-listening control area in a play control interface, plays the continuous-listening summary content, and continues to play the unplayed portion of the target audio program content from the current pause position after the continuous-listening summary content finishes playing.
Optionally, the apparatus further includes:
a judging unit, configured to determine, before the generating unit generates the continuous-listening summary content based on the review duration and according to the audio content of the already-played portion of the target audio program content, that the target audio program content meets at least one of the following target conditions:
the played duration of the already-played portion of the target audio program content is not less than a first duration threshold;
and the time interval between the pause time and the continuous-listening time corresponding to the target audio program content is not less than a second duration threshold.
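A hedged sketch of this gating logic follows: a recap is generated only when at least one of the two target conditions holds. The threshold values are illustrative assumptions; the patent does not fix concrete numbers.

```python
FIRST_DURATION_THRESHOLD = 60.0     # minimum seconds already played (example)
SECOND_DURATION_THRESHOLD = 3600.0  # minimum seconds paused (example)

def should_generate_recap(played_duration, pause_time, resume_time):
    # Condition 1: the played portion is long enough to be worth recapping.
    long_enough = played_duration >= FIRST_DURATION_THRESHOLD
    # Condition 2: the pause lasted long enough that the user likely forgot.
    away_long_enough = (resume_time - pause_time) >= SECOND_DURATION_THRESHOLD
    # The text requires at least one of the target conditions to hold.
    return long_enough or away_long_enough

assert should_generate_recap(120.0, 0.0, 10.0)       # played long enough
assert should_generate_recap(30.0, 0.0, 7200.0)      # paused long enough
assert not should_generate_recap(30.0, 0.0, 10.0)    # neither condition
```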
Optionally, the generating unit is specifically configured to:
select, based on the review duration, a segment of audio content from the audio content of the already-played portion as the audio content to be reviewed;
convert the audio content to be reviewed into text information, and generate a summary content text from the text information based on a text summarization technique;
and convert the summary content text into audio according to the key voice characteristics in the audio content to be reviewed, to obtain the continuous-listening summary content.
Optionally, the determining unit is specifically configured to:
determine the review duration corresponding to the continuous-listening summary content based on the time interval between the pause time of the pause operation and the continuous-listening time of the resume operation, and on the played duration of the already-played portion of the target audio program content;
where the review duration is positively correlated with both the time interval and the played duration.
Optionally, the determining unit is specifically configured to:
if the time interval is not greater than a preset interval threshold, determine the review duration according to the played duration, where the review duration is positively correlated with the played duration;
and if the time interval is greater than the preset interval threshold, determine the review duration according to the played duration and the time interval, where the review duration is positively correlated with both the time interval and the played duration.
Optionally, the determining unit is specifically configured to:
if the time interval is not greater than a preset interval threshold, take the product of the played duration and a first preset proportional value as the review duration;
and if the time interval is greater than the preset interval threshold, increase the first preset proportional value by a first set step length for each set time length by which the time interval increases, to obtain a first proportional value, and take the product of the played duration and the first proportional value as the review duration.
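One possible reading of this rule in code: below the interval threshold the review duration is a fixed fraction of the played duration; beyond it, the fraction grows by one step per full "set time length" of additional pause time. All constants are example values; the patent leaves the proportional value, step length and set time length unspecified.

```python
INTERVAL_THRESHOLD = 3600.0   # preset interval threshold, seconds (example)
BASE_RATIO = 0.05             # first preset proportional value (example)
STEP = 0.01                   # first set step length (example)
STEP_UNIT = 3600.0            # set time length per step, seconds (example)

def review_duration(played_duration, time_interval):
    if time_interval <= INTERVAL_THRESHOLD:
        # Short pause: fixed fraction of the played duration.
        return played_duration * BASE_RATIO
    # Long pause: one step of ratio growth per full STEP_UNIT beyond
    # the threshold, so the recap lengthens the longer the user was away.
    steps = int((time_interval - INTERVAL_THRESHOLD) // STEP_UNIT)
    return played_duration * (BASE_RATIO + STEP * steps)

# 1000 s played, 30-minute pause: recap of 5% of the played duration.
assert review_duration(1000.0, 1800.0) == 50.0
```

The monotonicity required by the text (review duration positively correlated with both the time interval and the played duration) holds by construction, since both factors only ever increase the product.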
Optionally, the determining unit is specifically configured to:
determine a corresponding first review duration based on the time interval between the pause operation and the resume operation and the played duration of the already-played portion of the target audio program content, where the first review duration is positively correlated with both the time interval and the played duration;
determine a corresponding second review duration based on the content difficulty level of the target audio program content, where the second review duration is positively correlated with the content difficulty level;
and take the sum of the first review duration and the second review duration as the corresponding review duration.
Optionally, the determining unit is specifically configured to:
if the content difficulty level is not greater than a preset level threshold, determine the second review duration according to the played duration, where the second review duration is positively correlated with the played duration;
and if the content difficulty level is greater than the preset level threshold, determine the second review duration according to the played duration and the content difficulty level, where the second review duration is positively correlated with both the played duration and the content difficulty level.
Optionally, the determining unit is specifically configured to:
if the time interval is not greater than a preset interval threshold, take the product of the played duration and a second preset proportional value as the first review duration;
and if the time interval is greater than the preset interval threshold, increase the second preset proportional value by a second set step length for each set time length by which the time interval increases, to obtain a second proportional value, and take the product of the played duration and the second proportional value as the first review duration.
Optionally, the feedback unit is specifically configured to:
if the content difficulty level is not greater than a preset level threshold, take the product of the played duration and a third preset proportional value as the second review duration;
and if the content difficulty level is greater than the preset level threshold, increase the third preset proportional value by a third set step length for each set level by which the content difficulty level increases, to obtain a third proportional value, and take the product of the played duration and the third proportional value as the second review duration.
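The analogous difficulty-based rule can be sketched the same way, again with illustrative constants (level threshold, proportional value and step are assumptions, not values from the patent):

```python
LEVEL_THRESHOLD = 3       # preset level threshold (example)
THIRD_RATIO = 0.03        # third preset proportional value (example)
LEVEL_STEP = 0.01         # third set step length, per difficulty level (example)

def second_review_duration(played_duration, difficulty_level):
    if difficulty_level <= LEVEL_THRESHOLD:
        # Easy content: fixed fraction of the played duration.
        return played_duration * THIRD_RATIO
    # Harder content: grow the fraction by one step per level above
    # the threshold, so difficult programs get a longer recap.
    extra_levels = difficulty_level - LEVEL_THRESHOLD
    return played_duration * (THIRD_RATIO + LEVEL_STEP * extra_levels)
```

For 1000 seconds played, level 2 content yields roughly a 30-second second review duration, while level 5 content yields roughly 50 seconds, matching the positive correlation the text requires.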
Optionally, the generating unit is specifically configured to:
if the target audio program content contains the voices of a plurality of objects, determine the voice with the highest proportion by extracting features from the voices of the plurality of objects;
and convert the summary content text into audio based on the voice with the highest proportion, to obtain the continuous-listening summary content.
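Selecting the highest-proportion voice presupposes speaker-labelled segments (e.g. from a speaker-diarization tool, which is assumed here rather than implemented). Picking the dominant speaker then reduces to summing per-speaker durations; the segment format is an illustrative assumption:

```python
from collections import defaultdict

def dominant_speaker(segments):
    # segments: list of (speaker_id, start_sec, end_sec) tuples, as a
    # diarization step might produce. Sum total speaking time per
    # speaker and return the speaker with the highest proportion.
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return max(totals, key=totals.get)

segments = [("host", 0, 300), ("guest", 300, 420), ("host", 420, 600)]
assert dominant_speaker(segments) == "host"   # 480 s vs. 120 s
```

That speaker's voice characteristics would then drive the text-to-speech step so the recap sounds like the program itself.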
Optionally, the apparatus further includes:
an association unit, configured to receive a setting request, sent by the client, for the continuous-playing permission control in the permission setting interface, obtain the continuous-playing permission information associated with the target object, and store the continuous-playing permission information in association with the identification information of the target object, where the continuous-playing permission control is used to turn on or off the intelligent continuous-listening mode, and the intelligent continuous-listening mode is an intelligent play control function that plays customized continuous-listening summary content according to the object's operation after the playing of the audio program content is paused.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein the memory stores program codes, and when the program codes are executed by the processor, the processor is caused to execute the steps of any one of the playing control methods of the audio program contents.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the electronic device to perform the steps of any one of the above-described playback control methods of audio program content.
An embodiment of the present application provides a computer-readable storage medium including program code for causing an electronic device to perform the steps of any one of the above-described play control methods of audio program content when the program code runs on the electronic device.
The application has the following beneficial effects:
The embodiments of the application provide a play control method, device, equipment and storage medium for audio program content. First, the method can dynamically adjust the review duration and the way the summary content is generated according to the user's actual listening situation (such as historical behavior, program content difficulty, and the like), improving the personalized experience. Compared with the traditional binary choice of "replay from the beginning" or "resume directly", the application provides users with a more natural and more efficient way of reconnecting with the content.
Second, the continuous-listening summary content is presented in audio form, so users do not need to actively read or search for information, and the original immersive experience of the audio program is maintained. This mechanism is particularly suitable for audio programs with complex content and strong logic, such as knowledge podcasts, audiobooks and lecture courses; it helps users quickly refresh their memory and understand the context, effectively reduces repeated playback caused by forgetting, and improves overall listening efficiency and content absorption.
In addition, a continuous-listening control area is provided in the play control interface, which further enhances the user's control over the playing state: the user can choose to skip, pause or replay the summary content as needed, improving interaction flexibility and ease of use.
In summary, the application supports intelligently generating a review summary when the user taps to continue listening, and converting the review summary into audio for playback. This helps the user review the core ideas of the previously heard program content and connect them with the content to be continued, enhances the user's understanding of the program content, reduces repeated replays caused by forgetting previously heard audio program content, and improves the listening efficiency of audio program content.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a schematic diagram of an audio program continuous playing method in the related art;
FIG. 2 is an alternative schematic diagram of an application scenario in an embodiment of the present application;
FIG. 3 is a flowchart illustrating a first method for controlling playback of audio program content according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a playback control interface according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a playback control interface according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a rights setting interface in an embodiment of the application;
fig. 7 is a schematic diagram of an audio program resumption method according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating a second method for controlling playback of audio program content according to an embodiment of the present application;
FIG. 9 is a flow chart of a method for generating follow-up summary content in an embodiment of the application;
FIG. 10 is a schematic diagram of a speech recognition process according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a model structure according to an embodiment of the present application;
FIG. 12A is a schematic diagram of a parametric speech synthesis flow in an embodiment of the present application;
FIG. 12B is a schematic flow chart of a text analysis in an embodiment of the application;
FIG. 13A is a flowchart of a method for implementing audio program content playback control based on a client and a server in an embodiment of the present application;
FIG. 13B is a timing diagram illustrating interactions between a client and a server according to an embodiment of the present application;
fig. 14 is a schematic diagram of the composition structure of a playback control device for a first audio program according to an embodiment of the present application;
Fig. 15 is a schematic diagram of a composition structure of a playback control apparatus for a second audio program content according to an embodiment of the present application;
FIG. 16 is a schematic diagram showing a hardware configuration of an electronic device to which the embodiment of the present application is applied;
fig. 17 is a schematic diagram of a hardware composition structure of another electronic device to which the embodiment of the present application is applied.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
Some of the concepts involved in the embodiments of the present application are described below.
Audio products: the audio products encompassed by the present application include, but are not limited to, audiobooks, podcasts and other products that deliver content purely through audio interaction.
Program: a unit of content delivered in the form of voice broadcasting in an audio product.
Audio program content: in the embodiments of the present application, this refers to audio programs shared on instant messaging software or podcast platforms, such as audio novels, voice recordings, commentaries, talk shows, radio stations, and the like, stored as ordinary audio program content files. The playing speed can be adjusted in the player, and the position where playing stopped can be remembered automatically for convenient resumption. The audio program content in the embodiments of the application may refer to audio containing speech data (i.e., speech from which text can be obtained through speech recognition, as opposed to pure music).
Continuous listening, also called continuous playing, means that after a user's listening to a program is interrupted, the user clicks play again to continue listening from where playback stopped.
A client refers to a program that corresponds to a server and provides local services to users. Except for some applications that run only locally, clients are typically installed on ordinary user machines and must run in conjunction with a server. Since the development of the internet, commonly used clients include web browsers for the world wide web, email clients for receiving email, and instant messaging software. For such applications, a corresponding server and service program in the network must provide the matching services, such as database or email services, so a specific communication connection needs to be established between the client and the server to ensure normal operation of the application.
The application operation interface is the medium for interaction and information exchange between an application system and the user; it converts between the internal representation of information and a form acceptable to humans, so that the user can conveniently and effectively operate the application and complete the intended work through bidirectional interaction. In the embodiments of the present application, the application operation interface includes human-computer interaction and graphical user interfaces; specific examples include a permission setting interface and a play control interface. Different application operation interfaces display different content to the user, thereby enabling different information interactions between the user and the application.
The audio program content sharing platform is an application of digital broadcasting technology. It can record network broadcasts and similar network audio programs, and users can download these programs to their own players and listen at their leisure, without having to sit in front of a computer and listen in real time. In addition, users can produce audio programs themselves and upload them to the internet through a podcast platform to share with others. The platform can be understood as a client for playing audio program content and video. There are currently many applications of this kind, such as podcast apps.
The play control interface is used for controlling the playing of audio program content. One audio program content sharing platform may provide one or more play control interfaces as needed, with jumps between them following set logic. In the embodiments of the present application, the play control interface mainly refers to the page used for play control of audio program content; it includes a continuous listening control area, which is mainly used for controlling the playing state of the continuous listening summary content.
Speech synthesis, also known as Text-To-Speech (TTS), is a technology that produces artificial speech by mechanical or electronic means. TTS converts text information, whether generated by the computer itself or input externally, into intelligible and fluent spoken output.
Artificial Intelligence (AI) comprises the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in ways similar to human intelligence. Artificial intelligence research covers the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and advancement of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, artificial intelligence will be applied in ever more fields and realize ever greater value.
The scheme provided by the embodiments of the present application involves artificial intelligence, machine learning, and related technologies. The acoustic models, language models, generative neural network models, and similar models used in the embodiments of the present application can be divided into a training part and an application part. The training part belongs to the field of machine learning: models are trained using machine learning techniques, with model parameters continuously adjusted through an optimization algorithm. The application part uses the trained acoustic model, language model, and so on to perform speech recognition, and uses the trained generative neural network model to generate the summary content and the like.
The following briefly describes the design concept of the embodiment of the present application:
When voice is the only input channel, the efficiency with which a user receives information is far lower than with multimodal interaction combining voice, vision, touch, and so on. Among audio programs, podcast episodes mostly run 1 to 3 hours, and audiobook programs can run for dozens of hours, so most users cannot finish an audio program in one sitting. When a user has not listened to a program for a long time, resuming directly at the last pause position means the user cannot reconnect with the forgotten earlier storyline, and their thinking cannot quickly catch up with the plot and narrative rhythm of the subsequent part of the program.
That is, the audio product experience in the related art is that when the user pauses partway through listening and later clicks the play button again, playback resumes from the last paused position, as shown in FIG. 1, which is a schematic diagram of an audio program resume method in the related art. However, when the user pauses the program for a long time, they forget, to varying degrees, the content they have already heard.
In view of this, the embodiments of the present application provide a method, apparatus, device, and storage medium for controlling the playing of audio program content. The application supports intelligently generating a review summary when the user clicks to resume listening and converting that summary into audio for playback. This helps the user review the core ideas of the previously heard program content, connect smoothly with the resumed content, strengthen their understanding of the program, reduce repeated replays caused by forgetting already-heard audio program content, and improve listening efficiency.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present application. The scenario includes two terminal devices 210 and a server 230; the relevant interface 220 for executing the target service can be accessed through a terminal device 210. Communication between the terminal devices 210 and the server 230 may take place over a communication network.
In the embodiments of the present application, the interface 220 may be a play control interface, a permission setting interface, or the like. The user may access the interface 220 through the terminal device 210; the terminal device 210 responds to user operations triggered on the interface 220 and sends corresponding requests to the server 230, and the server 230 feeds related information back to the terminal device. For example, in response to a resume operation triggered for the target audio program content, the terminal device 210 sends a resume request to the server 230; the server 230 generates the continuous listening summary content based on the request and feeds it back to the terminal device 210, which then displays the continuous listening summary content corresponding to the target audio program content in the play control interface and plays it. Details are not repeated here.
In an alternative embodiment, the communication network is a wired network or a wireless network.
In the embodiments of the present application, the terminal device 210 is an electronic device used by a user. The electronic device may be a computing device with a certain computing capability, such as a personal computer, mobile phone, tablet computer, notebook, or e-book reader, running instant messaging software or social software and websites. Each terminal device 210 is connected to the server 230 through a wireless network, where the server 230 is a single server, a server cluster or cloud computing center composed of several servers, or a virtualization platform.
In the embodiments of the present application, the terminal device 210 is provided with a client related to the audio program content. The client may be software, for example instant messaging software, podcast software, an applet, or a web page, which is not limited here. Correspondingly, the server is the server for that software, web page, applet, or the like.
The user can directly search for and play audio program content they like through podcast software, listen to audio program content shared by friends in instant messaging software, or search for and listen to audio program content in official accounts, applets, and the like. It should be noted that in this scenario the audio program content refers to audio recorded by a user: for example, a user reads each chapter of a novel aloud, records the corresponding audio files, and shares them on a podcast platform for others to listen to. For such audio program content, a listener can, by the method in the embodiments of the present application, have the play control interface play the continuous listening summary content corresponding to the content, where the continuous listening summary content is summary information generated by the client or the server for the audio corresponding to the already-played portion of the uploaded audio program content. The recordings may also be other audio, commentary, and the like recorded by the user, which is not specifically limited here.
It should be noted that the number of terminal devices and servers shown in FIG. 2 is merely illustrative; in practice the numbers are not limited, and the embodiments of the present application impose no specific limitation on them.
In the embodiments of the present application, when there are multiple servers, they may form a blockchain, with each server being a node on the blockchain. The play control method for audio program content disclosed in the embodiments of the present application may save related data on the blockchain, such as the pause time, resume time, content difficulty level, review duration, and played duration.
In addition, the embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and other scenes.
It should be further emphasized that the specific embodiments of the present application involve data relating to objects, such as the historical listening behavior and other operational behavior exemplified herein. When the above embodiments are applied to specific products or technologies, the object's permission or consent must be obtained, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Referring to FIG. 3, a flowchart of a first method for controlling the playing of audio program content according to an embodiment of the present application is shown. The method is applied to a terminal device, specifically to a client on the terminal device, and its implementation flow is as follows (S31 to S33):
S31: during the playing of the target audio program, the terminal device pauses the playing of the target audio program content in response to a pause operation triggered for the target audio program content;
S32: in response to a resume operation triggered for the target audio program content, the terminal device displays a continuous listening control area in the play control interface and plays the continuous listening summary content corresponding to the target audio program content, where the continuous listening summary content is summary information generated for the audio content corresponding to the already-played portion of the target audio program content, and the review duration corresponding to the continuous listening summary content is determined according to at least one of the object's historical listening behavior and the content difficulty level corresponding to the target audio program content;
In the embodiments of the present application, the object's historical listening behavior refers to behavioral characteristics exhibited while listening to audio program content in the past, such as listening duration and the times associated with pause and resume operations. From such data, the application can intelligently judge the user's understanding and retention of the played content and adaptively adjust the review duration. The content difficulty level is a rating of the audio program content based on factors such as semantic complexity, density of technical terms, and speaking rate; high-difficulty content generally requires a longer review to help the user reconnect with the context. A review duration determined in this way better fits the user's actual needs and improves the effectiveness of information review.
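As an illustrative sketch only (the embodiment does not prescribe a formula), the review duration could be derived from the pause interval, the object's historical listening behavior, and the content difficulty level along the following lines; the function name, weights, and thresholds here are all hypothetical:

```python
def review_duration_seconds(pause_interval_h: float,
                            avg_session_min: float,
                            difficulty_level: int) -> int:
    """Estimate how long the continuous listening summary should be.

    pause_interval_h: hours between pause and resume (a longer gap -> more review)
    avg_session_min: the object's average listening session length, used here as a
                     rough proxy for how well the played content was retained
    difficulty_level: 1 (easy) .. 5 (dense or technical); harder -> longer review
    """
    base = 15                                        # seconds for a short pause, easy content
    gap_bonus = min(60, int(pause_interval_h * 5))   # cap the gap contribution at 60 s
    difficulty_bonus = (difficulty_level - 1) * 10   # +10 s per difficulty level above 1
    # Listeners with long average sessions tend to retain more; shorten slightly.
    retention_discount = 5 if avg_session_min >= 30 else 0
    return base + gap_bonus + difficulty_bonus - retention_discount
```

A short pause on easy content yields a brief review, while a day-long gap on difficult content yields a noticeably longer one, matching the adaptive behavior described above.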
In the embodiments of the present application, the continuous listening control area is used to control the playing state of the continuous listening summary content, for example skipping it or changing its playback speed, so that the user can flexibly control the review process according to their own level of understanding, enhancing the interactive experience and the absorption of the content.
S33: after the playing of the continuous listening summary content is finished, the terminal device continues to play the un-played portion of the target audio program content from the current pause position.
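The client-side flow of S31 to S33 can be sketched as a small state machine; the class and state names below are illustrative only and do not appear in the embodiment:

```python
from enum import Enum, auto

class PlayerState(Enum):
    PLAYING = auto()
    PAUSED = auto()
    REVIEWING = auto()  # the continuous listening summary is being played

class SmartResumePlayer:
    """Illustrative client-side flow of steps S31 to S33."""

    def __init__(self):
        self.state = PlayerState.PLAYING
        self.pause_position_s = 0.0

    def pause(self, position_s: float) -> None:
        # S31: pause the target audio program content and remember the position
        self.pause_position_s = position_s
        self.state = PlayerState.PAUSED

    def resume(self, summary_available: bool) -> None:
        # S32: on resume, play the continuous listening summary first if one exists
        self.state = (PlayerState.REVIEWING if summary_available
                      else PlayerState.PLAYING)

    def on_summary_finished(self) -> float:
        # S33: after the summary ends, continue the program from the pause position
        self.state = PlayerState.PLAYING
        return self.pause_position_s
```

Note that when no summary is available (for example, the generation conditions are not met), the sketch falls straight back to normal playback, which mirrors the related-art behavior of FIG. 1.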
FIG. 4 is a schematic diagram of a play control interface according to an embodiment of the present application, where both interface 41 and interface 42 are play control interfaces. The user may trigger a pause operation or a resume operation for the audio program content by clicking the pause/play control S410 in interface 41. When the control is in the state shown in interface 41, the playing of the target audio program content is currently paused. When the control is in the state shown in interface 42, the playing of the target audio program content is currently resumed.
Optionally, after responding to the resume operation triggered for the target audio program content, the terminal device may display, in the play control interface, a continuous listening control area containing a continuous listening prompt, thereby informing the object that intelligent continuous listening is currently in progress, and play the continuous listening summary content corresponding to the target audio program content.
Still taking FIG. 4 as an example, as shown in the dashed box S420 in interface 42, the continuous listening control area in this embodiment displays "intelligent continuous listening in progress", i.e., the continuous listening prompt of this embodiment, while the continuous listening summary content is played; that summary is generated based on the already-played portion (the portion before 22:22). In this way the user can intuitively see that intelligent continuous listening is active, improving their awareness and control of the playback process. At the same time, playing the continuous listening summary content helps the user quickly review key information, strengthens their grasp of the content's continuity, and improves the listening experience and learning efficiency for the whole audio program.
After the continuous listening summary content finishes playing, the target audio program content can continue playing normally.
Optionally, after the terminal device responds to a close operation triggered on the summary control, in addition to stopping the playing of the continuous listening summary content and continuing to play the audio content corresponding to the un-played portion of the target audio program content, it may also stop displaying the continuous listening control area in the play control interface.
FIG. 5 is a schematic diagram of a play control interface according to another embodiment of the present application. It shows that after the playing of the continuous listening summary content is completed, the continuous listening control area S420 is no longer displayed and the playing of the target audio program content continues. This keeps the play interface uncluttered, avoids unnecessary information interference, and lets the user focus on the main audio content currently being played, improving overall smoothness of operation and user experience.
In this embodiment, a review summary is intelligently generated when the user clicks to resume listening and is converted into audio for playback. This helps the user review the core ideas of previously heard program content, connect smoothly with the resumed content, strengthen their understanding of the program, reduce repeated replays caused by forgetting already-heard content, improve listening efficiency, and optimize the user experience of audio products.
In an alternative embodiment, the continuous listening control area includes a summary control. As shown in interface 42 in FIG. 4, the "skip" button in S420 is the summary control of this embodiment. The user can end the playing of the continuous listening summary content by clicking "skip".
Specifically, if the user clicks "skip" before the continuous listening summary content finishes playing, the terminal device, in response to the close operation triggered on the summary control, stops playing the continuous listening summary content and continues to play the audio content corresponding to the un-played portion of the target audio program content. It then displays the play control interface shown in FIG. 5, without the continuous listening control area: the playing of the continuous listening summary content is skipped and the target audio program content continues playing, i.e., from 22:22.
In the above embodiment, the user may skip the playing of the continuous listening summary content via the summary control, and may also adjust its playback speed, for example by changing the speed multiplier, thereby improving the playback efficiency of the audio program content.
Optionally, the embodiments of the present application also allow the user to enable the intelligent continuous listening function; when the function is enabled and the user clicks to resume listening, the continuous listening control area is displayed in the play control interface and the continuous listening summary content is played.
For example, FIG. 6 shows a permission setting interface in an embodiment of the present application, in which a continuous playing permission control is shown in the dashed box S60. The user can turn "intelligent continuous listening" on or off by clicking this control. As drawn, FIG. 6 shows "intelligent continuous listening" turned on.
When the user clicks the continuous playing permission control shown in FIG. 6 to enable intelligent continuous listening, the terminal device, in response to the setting operation on the continuous playing permission control in the permission setting interface, sets the continuous playing permission for the target object and sends the corresponding continuous playing permission information to the server, so that the server can store the continuous playing permission information in association with the identification information of the target object.
In the embodiments of the present application, the continuous playing permission control lets the user enable or disable intelligent continuous listening. Thus, in an alternative embodiment, when the terminal device responds to a resume operation triggered by the target object (i.e., the user or the user account) for the target audio program content, it must also determine whether the target object has the continuous playing permission, that is, whether the object is currently in intelligent continuous listening mode or not (i.e., whether the function has been enabled). When the user has enabled intelligent continuous listening, they have the continuous playing permission; otherwise, they do not.
Specifically, if it is determined from the continuous playing permission information associated with the target object that the target object has the continuous playing permission, the continuous listening control area is displayed in the play control interface and the continuous listening summary content corresponding to the target audio program content is played.
The continuous playing permission information associated with the target object may be stored locally on the terminal device when the user sets the permission, or it may be requested by the terminal device from the server and returned by the server.
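The server-side association between the target object's identification information and its continuous playing permission can be sketched as follows; the class and method names are hypothetical and stand in for whatever storage the server actually uses:

```python
class ResumePermissionStore:
    """Illustrative server-side store: object id -> intelligent continuous
    listening permission, mirroring the switch in FIG. 6."""

    def __init__(self):
        self._perm = {}  # identification info -> permission flag

    def set_permission(self, object_id: str, enabled: bool) -> None:
        # Called when the permission setting interface reports a switch change.
        self._perm[object_id] = enabled

    def has_permission(self, object_id: str) -> bool:
        # Objects that have never set the switch are treated as not enabled,
        # so they get the ordinary resume behavior of FIG. 1.
        return self._perm.get(object_id, False)
```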
In this embodiment, by adding an intelligent continuous listening switch, the user can enjoy the intelligently generated review summary after turning the switch on, which can effectively improve the user experience of the product.
FIG. 7 is a schematic diagram of a method for resuming the playing of an audio program according to an embodiment of the present application. Compared with the related-art scheme shown in FIG. 1, the present application provides an intelligent continuous listening function. With the function enabled, when the play button is clicked after the program has been paused, playback does not simply resume at the pause position: intelligent review audio (i.e., the continuous listening summary content) is generated and played first, and playback then continues from the pause position, preventing the user from losing track of already-heard content after a long pause.
Referring to FIG. 8, a flowchart of a second method for controlling the playing of audio program content according to an embodiment of the present application is shown. The method is applied to a server, and its implementation is as follows:
S81: during the playing of the target audio program, the server receives a pause request and a resume request for the target audio program content sent by the client, and determines the review duration according to at least one of the object's historical listening behavior and the content difficulty level corresponding to the target audio program content;
S82: the server generates the continuous listening summary content, based on the review duration, from the audio content corresponding to the already-played portion of the target audio program content, where the continuous listening summary content is summary information for that played portion;
S83: the server feeds the continuous listening summary content back to the client, so that the client displays the continuous listening control area in the play control interface, plays the continuous listening summary content, and, after that playback finishes, continues to play the un-played portion of the target audio program content from the current pause position.
The client is installed on the terminal device, so communication between the client and the server is communication between the terminal device and the server.
In the embodiments of the present application, when the user clicks to pause playback, the terminal pauses the playing of the target audio program content in response to the pause operation and sends a pause request to the server; the pause request may carry the corresponding pause time. On receiving the pause request, the server records the pause time, for example as T1. Similarly, when the user clicks to resume playback, the terminal resumes the playing of the target audio program content in response to the resume operation and sends a resume request to the server; the resume request may carry the corresponding resume time. On receiving the resume request, the server records the resume time, for example as T2. The continuous listening summary content is then generated based on the pause time and the interval Y = T2 - T1 between the pause and resume times.
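The recording of T1 and T2 and the computation of the interval Y = T2 - T1 can be sketched as follows; this is a minimal illustration, and the class name is hypothetical:

```python
class PlaybackTimeline:
    """Illustrative server-side record of pause (T1) and resume (T2) times
    for one object listening to one program."""

    def __init__(self):
        self.t1 = None  # pause time, set when the pause request arrives
        self.t2 = None  # resume time, set when the resume request arrives

    def on_pause_request(self, pause_time_s: float) -> None:
        self.t1 = pause_time_s

    def on_resume_request(self, resume_time_s: float) -> float:
        """Record T2 and return the interval Y = T2 - T1 in seconds."""
        self.t2 = resume_time_s
        return self.t2 - self.t1
```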
In an alternative embodiment, before the continuous listening summary content for the target audio program content is generated, it is also necessary to determine whether the conditions for generating it are satisfied; if so, the continuous listening summary content may be generated based on the pause time and the interval between the pause and resume times.
Specifically, the target conditions include at least one of:
Condition one: the played duration corresponding to the already-played portion of the target audio program content is not less than a first duration threshold;
Condition two: the interval between the pause time and the resume time corresponding to the target audio program content is not less than a second duration threshold.
That is, when the target audio program content satisfies at least one of the above target conditions, the condition for generating the continuous listening summary content is met.
Specifically, when the user has enabled the "intelligent continuous listening" function, if the user pauses partway through listening and later clicks the play button again, the client uploads the played duration corresponding to the target audio program content to the server, stores the times at which the user paused and resumed the same program in the background, and uploads them to the server as the basis for the judgment.
Assume the first duration threshold is 2 minutes and the second duration threshold is 5 hours. When the user pauses and resumes the same program, the time information T1 and T2 is uploaded to the server, and the server calculates the interval between T1 and T2. The content summary is not generated when the program has been played for less than 2 minutes, or when the listening interval Y is within 5 hours, or when both conditions hold.
It should be noted that the above is merely an example; the content summary may also be generated without this judgment, and the present application is not limited in this respect.
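Under the example thresholds above (2 minutes and 5 hours), the judgment can be sketched as follows. One caveat: this sketch requires both conditions to pass, matching the worked example, although the embodiment also contemplates using either condition on its own:

```python
FIRST_THRESHOLD_S = 2 * 60        # condition one: minimum played duration, 2 minutes
SECOND_THRESHOLD_S = 5 * 3600     # condition two: minimum pause interval, 5 hours

def should_generate_summary(played_s: float, interval_s: float) -> bool:
    """Decide whether the continuous listening summary content is needed.

    played_s:   played duration of the already-played portion, in seconds
    interval_s: interval Y = T2 - T1 between pause and resume, in seconds
    """
    # Too little heard (nothing worth summarizing) or too short an absence
    # (nothing forgotten yet) means no summary is generated.
    return played_s >= FIRST_THRESHOLD_S and interval_s >= SECOND_THRESHOLD_S
```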
For the case where the continuous listening summary content needs to be generated, an optional implementation of S82 follows the flowchart shown in FIG. 9, a method for generating the continuous listening summary content in an embodiment of the present application, comprising the following steps S901 to S903:
S901: based on the review duration, the server selects, from the audio content corresponding to the already-played portion, a segment of audio whose playing duration equals the review duration, as the audio content to be reviewed;
S902, converting audio content to be reviewed into text information by a server, and generating summary content text aiming at the text information based on a text summary technology;
and S903, converting the summary content text into audio according to key sound characteristics in the audio content to be reviewed by the server to obtain continuous listening summary content.
That is, in step S901, the paragraph range that needs to be reviewed is determined from the played audio content as the audio content to be reviewed. A summary of this content is then extracted to generate a summary content text, and the summary content text is converted into audio to obtain the continuous listening summary content.
It should be noted that the method for generating the continuous listening summary content in the embodiment of the present application may be executed by the server alone, by the terminal device alone, or by the server and the terminal device together. That is, the continuous listening summary content in the embodiment of the present application may be generated only by the server side, only by the client installed on the terminal device, or jointly based on the interaction between the server and the client.
When the method is executed by the terminal device alone, the terminal device determines the review duration according to at least one of the historical listening behavior of the object and the content difficulty level corresponding to the target audio program content; selects, based on the review duration, a section of audio content whose playing duration equals the review duration from the audio content corresponding to the played portion, as the audio content to be reviewed; converts the audio content to be reviewed into text information and generates a summary content text for the text information based on a text summarization technique; and then converts the summary content text into audio according to the key sound features in the audio content to be reviewed, obtaining the continuous listening summary content.
For example, when the method is executed by the terminal device together with the server, the audio content to be reviewed may be determined by the client and notified to the server by the terminal device, and the continuous listening summary content may be generated by the server based on the audio content to be reviewed; this is not particularly limited here.
If the method for generating the continuous listening summary content shown in fig. 9 is executed by the terminal device alone, the client installed on the terminal device may also determine, based on the pause time and the time interval before continuous listening, whether the condition for generating the continuous listening summary content is satisfied before generating it for the target audio program content; for the specific determination process and conditions, refer to the above embodiments, which are not repeated here.
Likewise, the methods listed below for determining the review duration and converting the summary content text into audio may be performed by the terminal device alone, or by the server and the terminal device together, in addition to the server alone. The following description mainly takes the case where the server performs them alone as an example.
The process of determining the review duration and converting the summary content text into audio is described in detail below:
In the embodiment of the present application, for the case where the scope of the review paragraphs needs to be determined, the corresponding review duration may be determined through step S901 described above. Specifically, the review duration (also referred to as the review paragraph range duration), denoted A1, is determined by at least one of the user's historical listening behavior (such as the listening time interval), the content difficulty, and the like. The determination manners are as follows:
In the first determination mode, the review duration is determined only according to the historical listening behavior of the user.
In this mode, the review duration corresponding to the continuous listening summary content may be determined based on the time interval between the pause time corresponding to the pause operation and the continuous listening time corresponding to the resume operation, together with the played duration corresponding to the played portion of the target audio program content. The review duration is positively correlated with both: the larger the time interval, the longer the review duration, and likewise, the longer the played duration, the longer the review duration.
In an optional embodiment, if the time interval is not greater than the preset interval threshold, the time from pause to resumption is short and the user's memory of the played content is still relatively clear; in this case, the review duration may be determined only according to the played duration, with which it is positively correlated. For example, review content may be generated in proportion to the played duration: if the played duration increases from 10 minutes to 30 minutes, the review duration increases correspondingly to about 3 minutes, helping the user recall the key content more comprehensively. This design allows the length of the review content to adaptively match the user's listening progress and improves the content-linking effect.
If the time interval is greater than the preset interval threshold, the user's pause has lasted longer, part of the key content may have been forgotten, and more comprehensive review is needed to assist understanding. In this case, the review duration may be determined comprehensively according to the played duration and the time interval, with both of which it is positively correlated. For example, the longer the interval between pausing and resuming, the higher the degree of forgetting may be, so a longer review duration is needed to help the user link up the content; meanwhile, the longer the played duration, the more content has been played and the more information needs to be reviewed, so the review duration is extended correspondingly. By comprehensively considering these two factors, the length of the review content can be dynamically adjusted so that the review effect better matches the user's actual memory state and content-understanding needs.
Specifically, if the time interval is not greater than the preset interval threshold, the review duration is determined only according to the played duration and is positively correlated with it; to avoid redundant interference to the user from the review process while ensuring content continuity, the product of the played duration and a first preset proportional value may be used as the review duration. If the time interval is greater than the preset interval threshold, the user's memory decay increases with the time interval, and the review needs to be strengthened to improve the content-linking effect; in this case, each time the time interval increases by a set duration, the first preset proportional value is increased by a first set step to obtain a first proportional value, and the product of the played duration and the first proportional value is used as the review duration. In this way, the review duration can be dynamically adjusted according to the user's actual memory decay, improving understanding while taking playback efficiency into account.
That is, it is first determined whether the listening time interval is greater than the preset interval threshold, assumed here to be 5 h. When the user's listening time interval is less than or equal to 5 h, the basic review content may be set to 20% of the listened paragraph; that is, the first preset proportional value is 20%, and A1 = listened content duration (i.e., the played duration) × 20%.
That is, when y ≤ 5, A1 = x × 20%, where the listened content duration = x, the interval between the two listening sessions = y (in hours), and the review paragraph range duration = A1.
Assume the set duration is 1 h and the first set step is 1%. When the user's listening time interval is longer than 5 hours, the review paragraph range increases accordingly: for each additional hour of the interval, the review proportion increases by 1%.
That is, when y > 5, A1 = x × [20% + (y − 5) × 1%], i.e., the first proportional value is 20% + (y − 5) × 1%.
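Under the assumptions of this example (a 5 h threshold, a 20% base proportion, and a 1% step per hour), the first determination mode can be sketched as follows; the function name is an assumption for illustration:

```python
def review_duration_mode1(x, y, threshold_h=5.0, base=0.20, step=0.01):
    """A1 for the first determination mode.

    x -- listened (played) content duration
    y -- interval between the two listening sessions, in hours
    """
    if y <= threshold_h:
        ratio = base                             # A1 = x * 20%
    else:
        ratio = base + (y - threshold_h) * step  # +1% per extra hour
    return x * ratio
```

For example, with x = 10 minutes and y = 10 h, the proportion becomes 25% and A1 is 2.5 minutes.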
In the second determination mode, the review duration is determined according to both the historical listening behavior of the user and the content difficulty.
Compared with the first determination mode, the difficulty of the program content is also taken into account when calculating the duration. Specifically, in another optional embodiment, a corresponding first review duration may be determined based on the time interval and the played duration corresponding to the played portion of the target audio program content, the first review duration being positively correlated with both; a corresponding second review duration may be determined based on the content difficulty level corresponding to the target program content, where the greater the content difficulty level, the longer the second review duration; and the sum of the first review duration and the second review duration is then used as the corresponding review duration.
The first review duration is determined based on the time interval in the same manner as in the first determination mode, which is not repeated here. For example:
If the time interval is not greater than the preset interval threshold, the first review duration is determined only according to the played duration and is positively correlated with it; to avoid redundant interference to the user from the review process while ensuring content continuity, the product of the played duration and a second preset proportional value may be used as the first review duration. If the time interval is greater than the preset interval threshold, the user's memory decay increases with the time interval, and the review needs to be strengthened to improve the content-linking effect; in this case, each time the time interval increases by the set duration, the second preset proportional value is increased by a second set step to obtain a second proportional value, and the product of the played duration and the second proportional value is used as the first review duration.
For example, let the preset interval threshold be 5 h, the listened content duration = x, the interval between the two listening sessions = y, and the first review duration = A11.
Then, when y ≤ 5, A11 = x × 20% (where the second preset proportional value is 20%);
When y > 5, A11 = x × [20% + (y − 5) × 1%] (where the set duration is 1 h and the second set step is 1%), i.e., the second proportional value is 20% + (y − 5) × 1%.
It should be noted that, in the embodiment of the present application, the first preset proportional value and the second preset proportional value may be the same or different, and are not specifically limited here. Similarly, the first set step and the second set step may be the same or different, and are not specifically limited.
When determining the corresponding second review duration based on the content difficulty level corresponding to the target program content, the specific process is as follows:
If the content difficulty level is not greater than the preset level threshold, the content is relatively simple or easy to understand, and the user's memory and understanding of the played content are relatively good; the second review duration may then be determined only according to the played duration, with which it is positively correlated. For the relationship between the review duration and the played duration, refer to the above embodiments, which are not repeated here.
If the content difficulty level is greater than the preset level threshold, the current audio program content is more complex or more specialized, and the user may need a longer time after resuming playback to review and understand the played content; the second review duration may then be determined according to both the played duration and the content difficulty level, with both of which it is positively correlated. For example, when the user has played 15 minutes, if the program is a popular lifestyle podcast (low difficulty level), the system may generate about 1.5 minutes of review content; if the program is a highly difficult professional course (such as an explanation of quantum physics), the system may increase the review duration according to the content difficulty level, for example generating 3 minutes of review content. In this way, the system can dynamically adjust the review duration according to the complexity of different content and help the user better understand and link up the audio content.
Specifically, if the content difficulty level is not greater than the preset level threshold, the second review duration is determined only according to the played duration and is positively correlated with it; to avoid redundant interference to the user from the review process while ensuring content continuity, the product of the played duration and a third preset proportional value may be used as the second review duration. If the content difficulty level is greater than the preset level threshold, the review needs to be strengthened to improve the content-linking effect; in this case, each time the difficulty increases by a set level, the third preset proportional value is increased by a third set step to obtain a third proportional value, and the product of the played duration and the third proportional value is used as the second review duration.
Assume the second review duration is A12, the preset level threshold is 1, the set level is 1, the third preset proportional value is 0%, and the third set step is 5%. Then:
When z = 1, A12 = x × 0% = 0;
When z > 1, A12 = x × [(z − 1) × 5%].
That is, the higher the content difficulty level, the longer the corresponding second review duration. If the content difficulty level is 1, the second review duration adds 0%; thereafter, for each additional difficulty level, the second review proportion increases by 5%.
It should be noted that the above takes a third preset proportional value of 0% as an example; the third preset proportional value is in fact a non-negative number and, besides 0%, may be 1%, 2%, 3%, etc. The specific value may be set according to the actual situation and is not limited here.
In the embodiment of the present application, the domain is determined according to the program tag, and the content difficulty level is then determined according to the domain. Of course, other ways of determining the content difficulty level are equally applicable, such as language complexity analysis based on the audio text (e.g., vocabulary specialization and sentence structure complexity) or comprehensive evaluation combined with user feedback data (e.g., the user's listening completion rate and number of repeated plays); the present application is not limited in this respect. Table 1 shows an example of the relationship between the content difficulty level and the domain in an embodiment of the present application.
TABLE 1
The above takes three content difficulty levels as an example, so the second review duration corresponding to the highest content difficulty level is increased by 10%.
In summary, A1 = A11 + A12 can be expressed as:
When y ≤ 5, A1 = x × [20% + (z − 1) × 5%];
When y > 5, A1 = x × [20% + (y − 5) × 1% + (z − 1) × 5%].
Note that if the final review duration A1 exceeds 100% of the listened content duration of the program, A1 is taken as 100% of the listened content duration.
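Combining A11, A12, and the 100% cap above, the second determination mode can be sketched as follows (the function name is an assumption for illustration):

```python
def review_duration_mode2(x, y, z):
    """A1 = A11 + A12, capped at 100% of the listened content duration.

    x -- listened (played) content duration
    y -- interval between the two listening sessions, in hours
    z -- content difficulty level (1 = easiest)
    """
    a11 = 0.20 if y <= 5 else 0.20 + (y - 5) * 0.01  # listening-behavior part
    a12 = 0.0 if z <= 1 else (z - 1) * 0.05          # content-difficulty part
    return x * min(a11 + a12, 1.0)                   # A1 <= 100% of x
```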
In the third determination mode, the review duration is determined according to the content difficulty of the audio program content. For a specific implementation, refer to the determination of the second review duration listed above, which is not repeated here.
In the above embodiments, the content range that the user needs to review is determined by evaluating information such as the user's historical listening behavior and the content difficulty of the audio program; the continuous listening summary content is generated by applying speech recognition and automatic summarization technologies, and the audio is finally synthesized by speech synthesis technology. This supports starting the "intelligent continuous listening" function when the user clicks to continue listening, helping the user recall the audio content already heard by playing the summary audio and better connect with the continued content.
In the embodiment of the present application, after the review duration is determined, the paragraph range that needs to be reviewed can be converted into text information. Specifically, the audio content to be reviewed is first uploaded to the server, and the audio content is converted into text information mainly using automatic speech recognition (Automatic Speech Recognition, ASR). The ASR flow is shown in fig. 10, and the specific process is as follows:
First, the audio content to be reviewed, determined based on the review duration, is uploaded to the server (i.e., the speech input in fig. 10). Then, important information reflecting the speech characteristics is extracted from the speech waveform of the audio content to be reviewed, relatively irrelevant information (e.g., background noise) is removed, and the information is converted into a set of discrete parameter vectors (i.e., the encoding (feature extraction) step in fig. 10).
If multiple sound features exist in the audio content to be reviewed, speaker separation is needed at the same time to extract the key sound features in the audio content, such as the sound feature with the highest proportion. Specifically, the speech information is first preprocessed: voice activity detection (Voice Activity Detection, VAD) and framing are performed on the audio content to obtain the sound waveform; the conversion from the time domain to the frequency domain is then completed by applying a Fourier transform to each frame; the spectrum of each frame is obtained via the mel-frequency cepstral coefficient (Mel Frequency Cepstral Coefficient, MFCC) feature parameters; and finally the spectra are assembled into a spectrogram. This process removes background noise, irrelevant voices, and the like from the program audio.
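The framing step, together with a simple energy-based VAD, might look like the following sketch; the energy-threshold criterion is a common simplification assumed here, since the present application does not fix a particular VAD algorithm:

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D waveform into overlapping frames
    (400/160 samples = 25 ms / 10 ms at a 16 kHz sampling rate)."""
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len]
                     for i in range(n_frames)])

def energy_vad(frames, threshold=1e-3):
    """Mark frames whose mean energy exceeds the threshold as speech."""
    energy = np.mean(frames ** 2, axis=1)
    return energy > threshold
```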
After feature extraction is completed, feature recognition and text generation follow (i.e., decoding in fig. 10). The smallest unit of speech is generally referred to here as a "phoneme", such as a vowel or consonant in Mandarin pronunciation. The acoustic model handles the pronunciation-related work on the framed speech; its output includes the basic phoneme states of the sound and their probabilities, covering the acoustic features of the target language. The system recognizes, for each frame, the phoneme it most probably belongs to by judging which phoneme has the largest probability; multiple phonemes are then composed into words, and words into text sentences. The language model, trained on a corpus, helps the system combine the semantic scene and the context to achieve the best recognition result.
Finally, the text output corresponding to the audio content to be reviewed is obtained through decoding.
Further, the server generates the program's continuous listening summary content, hereinafter referred to as the "summary", using a generative text summarization technique.
To obtain a better summary review experience, the present application limits the review summary content finally played to the user to a playing duration of no more than 90 s and a corresponding text length of no more than 1000 words, and these summary length limits are input to the system server.
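These two limits can be enforced with a simple check; the speaking rate used to estimate the playing duration (5 characters per second) is an assumption for illustration, not a value from the present application:

```python
MAX_SUMMARY_CHARS = 1000   # text-length limit from the description above
MAX_SUMMARY_SECONDS = 90   # playing-duration limit from the description above

def within_limits(summary_text, chars_per_second=5.0):
    """Check a candidate summary text against both length limits."""
    est_seconds = len(summary_text) / chars_per_second
    return (len(summary_text) <= MAX_SUMMARY_CHARS
            and est_seconds <= MAX_SUMMARY_SECONDS)
```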
The review paragraphs of the audio product are then used to generate a content summary using a generative (abstractive) text summarization technique.
In the embodiment of the present application, the generative summary is based on natural language generation (Natural Language Generation, NLG) technology: the algorithm model generates a natural language description according to the source document content, rather than extracting sentences from the original text. The generative text summary is realized mainly by means of a deep neural network structure, also called the encoder-decoder (Encoder-Decoder) architecture. An abstract semantic representation is established using natural language processing (Natural Language Processing, NLP) semantic recognition technology; after the machine performs semantic recognition on the article content, a corresponding paragraph summary is generated according to the required summary length.
The generative summarization technique used in the present application is realized by adding an attention mechanism on top of the sequence-to-sequence (Sequence-to-Sequence) model in deep learning. The basic model structure is shown in fig. 11: the generative neural network model mainly consists of an encoder (Encoder) and a decoder (Decoder), and both encoding and decoding are realized by neural networks.
The encoder is responsible for encoding the input original text into a vector C (Context), which is a representation of the original text and contains the text context. The decoder is responsible for extracting the important information from this vector, retrieving the semantic fragments, and generating the text summary.
For example, if the original text is "The XX XX became the largest tech ..." (XX becomes the biggest education school ...), the generated text summary is "XX tech ...", where XX is an abbreviation for XX.
In addition, in view of the problems of non-fluent generation and repeated words and sentences in long-text summarization, the present application addresses them by combining two attention mechanisms: 1) the classical decoder-encoder attention mechanism (intra-temporal attention), and 2) the decoder's internal attention mechanism (intra-decoder attention).
Specifically, intra-temporal attention enables the decoder to dynamically and on-demand obtain information at the input as the result is generated, acting on Encoder, to calculate weights for each word in the input text (input), thus enabling the generated content information to overlay the original text. In the process of calculating the Intra-temporal attention weight, the application adopts a method to punish the word with higher weight obtained in input so as to prevent the word from being given high weight again in the later decoding process. Intra-Decoder attention enables the model to pay attention to the generated words, helps to solve the problem that the same word and sentence are easy to repeat when long sentences are generated, acts on the Decoder, and calculates weights for the generated words, so that repeated contents can be avoided. Then the two are spliced together to decode and generate the next word. For each decoding step t, the sequence generated by the application in the first decoding step is empty. The method is simpler and more widely applicable to other types of recursive networks.
In an optional embodiment, when converting the summary content text into audio to obtain the continuous listening summary content, the TTS audio of the summary may be generated and played by learning the sounds in the target audio program content. Of course, the TTS audio of the summary may also be generated with some other voice, such as a fixed female voice, male voice, cartoon character voice, etc.
The process of generating the continuous listening summary TTS by learning the sounds in the target audio program content is described in detail below:
Specifically, if the target audio program content contains the sound of one object, the summary content text is converted into audio based on that object's sound to obtain the continuous listening summary content. If the target audio program content contains the sounds of multiple objects, feature extraction is performed on the multiple sounds, the sound with the highest proportion (i.e., the highest-proportion sound) is determined, and the summary content text is converted into audio based on that sound to obtain the continuous listening summary content. That is, when the target audio program content contains the sounds of multiple objects, i.e., multiple sound features exist, speaker separation can be performed, the sound with the highest proportion in the audio is obtained, and acoustic feature learning is performed on its audio features by the speech synthesis technology, so that the summary content text is converted into audio to obtain the continuous listening summary content.
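Selecting the highest-proportion sound after speaker separation amounts to picking the speaker with the largest total speaking time. A sketch over hypothetical diarization output (the segment format and function name are assumptions):

```python
def highest_ratio_speaker(segments):
    """Return the speaker id with the largest total speaking time.

    segments -- iterable of (speaker_id, start_s, end_s) tuples, as a
    speaker-separation (diarization) step might produce.
    """
    totals = {}
    for speaker, start, end in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return max(totals, key=totals.get)
```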
Considering the relatively small data volume of the audio program content, the present application adopts a parametric speech synthesis method. This method uses a statistical model to generate the speech parameters at each moment and converts the parameters into sound waveforms. The process abstracts the text into phonetic features, uses a statistical model to learn the correspondence between the phonetic features and the acoustic features, and then restores the predicted acoustic features into waveforms. The last step, from features to waveform, is achieved by using a mainstream neural network for prediction and then a vocoder to generate the waveform.
Referring to fig. 12A, a schematic diagram of the parametric speech synthesis flow in an embodiment of the present application, the process may be summarized as audio feature extraction (parameter extraction) -> hidden Markov model (Hidden Markov Model, HMM) modeling -> parametric synthesis -> waveform reconstruction. The above process is described in detail below with reference to fig. 12A:
First, audio feature extraction needs to be performed on the speech signal of the target audio program content.
For the target audio program content, the present application mainly extracts mel-spectrogram audio features. The MFCC is a relatively common audio feature. A sound is in fact a one-dimensional time-domain signal, from which it is difficult to observe frequency-domain patterns intuitively. A Fourier transform yields frequency-domain information but loses the time-domain information, so the change of the frequency domain over time cannot be seen and the sound cannot be well described. Therefore, the embodiment of the present application uses the short-time Fourier transform.
The short-time Fourier transform (STFT) refers to applying the Fourier transform to the short-time signals obtained by framing a long signal, and is suitable for analyzing stationary signals. In the embodiment of the present application, the speech signal is approximately stationary within a short time span; by framing and windowing, performing a fast Fourier transform (fast Fourier transform, FFT) on each frame, and finally stacking the result of each frame along another dimension, a graph-like two-dimensional signal is obtained. Since the original signal here is a sound signal, the two-dimensional signal obtained by the STFT expansion is the so-called spectrogram.
Since the spectrogram is often very large, it is usually transformed into the mel spectrum through mel-scale filter banks to obtain sound features of a suitable size. Performing cepstral analysis on the mel spectrum (taking the logarithm and applying a discrete cosine transform) yields the mel cepstrum.
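The framing-windowing-FFT-stacking steps that produce the spectrogram can be sketched with NumPy alone (the mel filtering is omitted for brevity, and the frame sizes are assumed values):

```python
import numpy as np

def stft_spectrogram(signal, frame_len=512, hop=128):
    """Frame and window the signal, FFT each frame, and stack the
    magnitude spectra into a (frames x frequency-bins) spectrogram."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the frame_len // 2 + 1 non-redundant frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a pure 1000 Hz tone sampled at 8 kHz, the energy concentrates in frequency bin 1000 × 512 / 8000 = 64.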
Parameters such as the fundamental frequency and voicing parameters can then be extracted based on the mel cepstrum.
Further, HMM modeling is performed. Specifically, a set of continuous-density hidden Markov models (CD-HMMs) is used to model the speech parameters; the output of each HMM state is represented by a single Gaussian function or a mixture of Gaussians (Gaussian Mixed Model, GMM, also known as a Gaussian mixture model). Given a sequence of Gaussian distributions, the objective of the parameter generation algorithm is to calculate the speech parameter sequence that maximizes the likelihood function.
The above two processes correspond to the training module in fig. 12A; through them, a context-dependent HMM model can be obtained by training. Speech synthesis is then performed based on this model, corresponding to the synthesis module in fig. 12A.
After audio feature extraction and HMM modeling, parametric synthesis and waveform reconstruction of the summary content text are required.
Specifically, the summary content text is first input into the synthesis module (corresponding to the input text in fig. 12A); text analysis is then performed on the text and the context features are extracted; a state sequence is generated based on the context-dependent HMM model obtained by the above modeling, from which the speech parameters are generated; finally, the speech parameters are converted into an acoustic waveform by the parametric synthesizer (i.e., parametric synthesis and waveform reconstruction), and the speech (i.e., the continuous listening summary content) is output.
When acoustic feature learning is performed on the audio features of the target audio program content by the speech synthesis technology, the phonemes, word segmentation, part-of-speech acquisition, sentence meaning understanding, prosody prediction, pinyin prediction, and the like in the audio features are decomposed by the end-to-end speech synthesis technology. Referring to fig. 12B, a specific flowchart of text analysis in an embodiment of the present application includes the steps of input, sentence structure analysis, text regularization, text-to-phoneme, and phonetic prediction.
After the text is input, sentence structure analysis is performed on it, including language identification and sentence segmentation. Sentence segmentation is implemented by a statistical word segmentation method:
Formally, a word is a stable combination of characters, so the more often adjacent characters co-occur, the more likely they are to form a word. The frequency or probability of adjacent co-occurrence therefore reflects how plausible a candidate word is, and the co-occurrence frequencies of adjacent character combinations in the corpus can be counted to compute their mutual information. The mutual information of Chinese characters X and Y is computed as M(X, Y) = lg(P(X, Y) / (P(X)P(Y))), where P(X, Y) is the adjacent co-occurrence probability of characters X and Y, and P(X) and P(Y) are the occurrence frequencies of X and Y in the corpus, respectively. The mutual information reflects how tightly two characters are bound; when it exceeds a certain threshold, the character pair is considered likely to form a word. Since this method only requires counting character-pair frequencies in the corpus and needs no dictionary, it is also called a dictionary-free word segmentation method or a statistical word extraction method.
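As an illustration, the mutual information statistic above can be computed directly from corpus counts. The base-10 logarithm (lg) follows the formula in the text; the corpus handling is a simplified sketch:

```python
import math
from collections import Counter

def char_mutual_information(corpus, x, y):
    """M(X, Y) = lg(P(X, Y) / (P(X) * P(Y))) for adjacent characters X, Y.

    corpus: a sequence of characters (e.g. a list of Chinese characters).
    """
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    p_xy = bigrams[(x, y)] / (len(corpus) - 1)   # adjacent co-occurrence probability
    p_x = unigrams[x] / len(corpus)              # occurrence frequency of X
    p_y = unigrams[y] / len(corpus)              # occurrence frequency of Y
    return math.log10(p_xy / (p_x * p_y))
```

Character pairs whose mutual information exceeds a chosen threshold are then taken as word candidates, exactly as described above.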
In the text normalization part, the text is classified by normalization rules and rule-based replacements are applied. In the text-to-phoneme part, language identification is performed first, followed by part-of-speech prediction, after which the text is converted to phonemes.
Part-of-speech prediction is part-of-speech tagging, also called word-class tagging or simply tagging: the process of assigning each word in the word segmentation result its correct part of speech, that is, determining whether each word is a noun, a verb, an adjective, or another part of speech. This serves as preprocessing for syntactic analysis in the present application. Part-of-speech tagging can be performed based on an HMM model, which can be trained using a large corpus with tagged data, where tagged data refers to text in which each word has been assigned its correct part-of-speech tag. In addition, sentence meaning can also be understood through syntactic analysis, whose basic task is to determine the syntactic structure of a sentence or the dependency relationships between the words in the sentence. This step can be accomplished by constructing a syntax tree.
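A minimal HMM tagger of the kind described above can be sketched as count-and-normalize training plus Viterbi decoding. The toy corpus, tag set, and smoothing constant are illustrative; a production system would train on a large annotated corpus as the text notes:

```python
import math
from collections import defaultdict

EPS = 1e-6  # smoothing for unseen transitions/emissions (illustrative)

def train_hmm(tagged_sentences):
    """Estimate transition and emission probabilities from tagged data."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "<S>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    normalize = lambda tbl: {c: {k: v / sum(d.values()) for k, v in d.items()}
                             for c, d in tbl.items()}
    return normalize(trans), normalize(emit)

def viterbi(words, trans, emit):
    """Return the most likely tag sequence for `words`."""
    tags = list(emit)
    # best[tag] = (log-prob of best path ending in tag, that path)
    best = {t: (math.log(trans.get("<S>", {}).get(t, EPS))
                + math.log(emit[t].get(words[0], EPS)), [t]) for t in tags}
    for w in words[1:]:
        best = {t: max((s + math.log(trans.get(p, {}).get(t, EPS))
                        + math.log(emit[t].get(w, EPS)), path + [t])
                       for p, (s, path) in best.items())
                for t in tags}
    return max(best.values())[1]
```

Training on two toy sentences is enough for the decoder to tag an unseen combination such as "the cat barks" correctly.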
Finally, the phonetic prediction part, mainly prosody prediction, is the key to speech synthesis.
In summary, through the processes illustrated in fig. 12A and fig. 12B, the server outputs the continuous listening summary content via TTS to the client, and after the user clicks the "play" button, the client preferentially plays the audio of the review content, achieving the effect of helping the user review the historical listening program content in the present application.
The methods for determining the review duration and converting the summary content text into audio listed in the embodiments of the present application may be performed by the terminal device alone or by the terminal device and the server together; the two implementations are similar processes, and the repeated details are not described again.
In an optional implementation manner, after receiving a setting request for a follow-up playing permission control in a permission setting interface sent by a client, a server acquires follow-up playing permission information associated with a target object, and stores the follow-up playing permission information in association with identification information of the target object.
Specifically, for example, as shown in fig. 6, the user may set the continuous playing authority through the authority setting interface, and the client sends a setting request to the server, where the request carries the identification information of the target object and the related continuous playing authority information, and the server performs association storage.
In this embodiment, the intelligent continuous listening function can be started when the user resumes the audio, and playing the summary audio content helps the user recall the audio content listened to so far and better take in the continued content.
In summary, the playing control method of the audio program content in the present application supports intelligently generating a review summary when the user resumes the audio, providing the user with a quick recall function that helps the user review the program content already listened to. This content connects better with the continued content, thereby enhancing the user's understanding of it.
Fig. 13A is a flowchart of a method for implementing audio program content playing control based on a client and a server according to an embodiment of the present application. The implementation flow of the method is as follows:
On the client side, the intelligent continuous listening function is first enabled; the user pauses the playing program (namely the target audio program content) and later clicks the program play button;
Based on the user's pause and resume, continued playing of the target audio program content is realized. At this point the client first analyzes the played duration of the program, distinguishing the two cases of played duration < 2 min and played duration >= 2 min:
If the played duration is less than 2 minutes, no continuous listening summary content is generated;
If the played duration >= 2 min, the server further judges the user's listening time interval;
If the listening time interval is less than 5 hours, no continuous listening summary content is generated;
If the listening time interval >= 5 h, the range of the review paragraph is determined; for the specific determination method, reference may be made to the first determination method, the second determination method, and the like listed in the above embodiments, and the repeated details are not described again.
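The two-threshold gate in the flow above can be expressed as a simple predicate. The 2-minute and 5-hour values follow the example in the text; they are configuration parameters, not fixed by the method:

```python
def should_generate_summary(played_seconds, listening_gap_seconds,
                            min_played=2 * 60, min_gap=5 * 3600):
    # No summary if the program played < 2 min or the listening gap is < 5 h;
    # otherwise proceed to determine the review paragraph range.
    return played_seconds >= min_played and listening_gap_seconds >= min_gap
```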
The server then converts the paragraph audio to be reviewed into text information, generates the summary content text, determines the primary voice in the program (i.e. the highest duty ratio sound, the voice with the highest proportion of speaking time), and generates the continuous listening summary content based on that voice.
For the specific implementation of the above process, reference may be made to the relevant description above; repeated details are not described again.
Finally, the server feeds the continuous listening summary content back to the client, where it is played.
Based on the above description, taking the played duration >= 2 min and the user listening time interval >= 5 h as an example, the following describes the interaction procedure between the client and the server in detail with reference to fig. 13B. Referring to fig. 13B, a timing diagram of interaction between a client and a server according to an embodiment of the present application specifically includes the following steps:
Step S1301, the client pauses playing the target audio program content in response to a pause operation triggered for the target audio program content, and sends a pause request to the server;
Step S1302, the server records corresponding pause time;
Step S1303, the client responds to the restoration operation triggered by the target audio program content and sends a restoration request to the server;
Step S1304, the server records the corresponding continuous listening time;
Step S1305, the server determines that the target audio program content meets the target condition;
Step S1306, the server generates continuous listening summary content for the target audio program content based on the time interval between the pause time and the continuous listening time, and feeds the continuous listening summary content back to the client;
Step S1307, the client plays the continuous listening summary content corresponding to the target audio program content, and continues playing the unplayed part of the target audio program content after the continuous listening summary content is played.
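Server-side, steps S1302 to S1306 amount to recording the two timestamps and deriving the gap used for summary generation; a minimal sketch (the class and method names are illustrative):

```python
class ListeningSession:
    """Tracks one pause/resume cycle for a target audio program (sketch)."""

    def __init__(self):
        self.pause_time = None    # recorded at step S1302
        self.resume_time = None   # recorded at step S1304

    def on_pause(self, timestamp):
        self.pause_time = timestamp

    def on_resume(self, timestamp):
        self.resume_time = timestamp

    def listening_gap(self):
        # Time interval used at step S1306 to generate the summary content.
        return self.resume_time - self.pause_time
```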
Based on the same inventive concept, the embodiment of the application also provides a playing control device of the audio program content. Fig. 14 is a schematic structural diagram of a playback control apparatus 1400 for audio program content, which may include:
A pause unit 1401 for pausing playing of the target audio program content in response to a pause operation triggered for the target audio program content during playing of the target audio program;
The continuous playing unit 1402 is configured to display a continuous listening control area in a playing control interface in response to a resume operation triggered on the target audio program content, and play the continuous listening summary content corresponding to the target audio program content, where the review duration corresponding to the continuous listening summary content is determined according to at least one of the historical listening behavior of the object and the content difficulty level corresponding to the target audio program content; the continuous listening control area is configured to control the playing state of the continuous listening summary content, and after the continuous listening summary content is played, the unplayed portion of the target audio program content continues playing from the current pause position.
Optionally, the continuous listening control area includes a summary control, and the continuous playing unit 1402 is further configured to:
before the playing of the continuous listening summary content is finished, in response to a closing operation triggered on the summary control, stop playing the continuous listening summary content and continue playing the audio content corresponding to the unplayed part of the target audio program content.
Optionally, the continuous playing unit 1402 is configured to:
responding to a recovery operation triggered by aiming at the target audio program content, displaying a continuous listening control area containing continuous listening prompt information in a playing control interface so as to prompt that an object is currently intelligently continuous listening, and playing continuous listening summary content corresponding to the target audio program content;
And, the continuous playing unit 1402 is further configured to:
and the continuous listening control area is not displayed in the play control interface.
Optionally, the apparatus further includes:
A setting unit 1403, configured to, before the continuous playing unit 1402 responds to the resume operation triggered for the target audio program content, displays the continuous listening control area in the playing control interface, and plays the continuous listening summary content corresponding to the target audio program content, set the continuous playing permission for the target object in response to a setting operation on the continuous playing permission control in the permission setting interface, so as to turn the intelligent continuous listening mode on or off;
And sending the corresponding continuous broadcasting permission information to a server so that the server can store the continuous broadcasting permission information and the identification information of the target object in a correlated way.
Optionally, the continuous playing unit 1402 is further configured to:
Responding to a recovery operation triggered by aiming at the target audio program content, and if the target object is determined to have the continuous playing authority according to the continuous playing authority information associated with the target object, determining that the target object is in an intelligent continuous listening mode currently;
and displaying a continuous listening control area in a playing control interface, and playing continuous listening summary content corresponding to the target audio program content.
Optionally, the continuous playing unit 1402 is further configured to determine the continuous listening summary content by:
Selecting a section of audio content from the audio content corresponding to the played part based on the reviewing time length, and taking the section of audio content as audio content to be reviewed;
Converting the audio content to be reviewed into text information, and generating summary content text for the text information based on a text summarization technology;
and converting the summary content text into audio according to the key sound characteristics in the audio content to be reviewed, so as to obtain the continuous listening summary content.
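One straightforward reading of "selecting a section of audio content from the played part based on the review duration" is to take the tail of the played audio; taking the most recent span is an assumption here, and the embodiments above also describe other determination methods:

```python
def select_review_span(played_seconds, review_seconds):
    """Return (start, end) of the audio to be reviewed, in seconds from t=0.

    Takes the most recent `review_seconds` of the already-played audio,
    clipped at the beginning of the program.
    """
    start = max(0.0, played_seconds - review_seconds)
    return start, played_seconds
```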
Optionally, the continuous playing unit 1402 is specifically configured to:
determining a review duration corresponding to the continuous listening summary content based on a time interval between a pause time corresponding to the pause operation and a continuous listening time corresponding to the resume operation and a played duration corresponding to a played part in the target audio program content;
wherein the review duration is positively correlated with both the time interval and the played duration.
Optionally, the continuous playing unit 1402 is specifically configured to:
Determining a corresponding first review duration based on a time interval between the pause operation and the resume operation and a played duration corresponding to a played part in the target audio program content, wherein the first review duration is positively correlated with the time interval and the played duration;
determining a corresponding second review duration based on a content difficulty level corresponding to the target program content, wherein the second review duration is positively correlated with the content difficulty level;
And taking the sum of the first reviewing time length and the second reviewing time length as the corresponding reviewing time length.
Optionally, the continuous playing unit 1402 is specifically configured to:
If the target audio program content contains the voices of a plurality of objects, determining the highest duty ratio sound (i.e. the voice with the highest proportion of speaking time) by extracting features from the voices of the plurality of objects;
and converting the summary content text into audio based on the highest duty ratio sound to obtain the continuous listening summary content.
Based on the same inventive concept, the embodiment of the application also provides another playing control device of the audio program content. As shown in fig. 15, which is a schematic structural diagram of a playback control apparatus 1500 of audio program content, may include:
A determining unit 1501, configured to receive a pause request and a resume request for a target audio program content sent by a client in a playing process of the target audio program, and determine a review duration according to at least one of a historical listening behavior of an object and a content difficulty level corresponding to the target audio program content;
a generating unit 1502, configured to generate continuous listening summary content according to audio content corresponding to a played part in the target audio program content based on the review duration, where the continuous listening summary content is summary information of the played part;
And the feedback unit 1503 is configured to feed back the continuous-listening summary content to a client, so that the client displays a continuous-listening control area in a playing control interface, plays the continuous-listening summary content, and continues to play the unplayed portion of the target audio program content from the current pause position after the continuous-listening summary content is played.
Optionally, the apparatus further includes:
A determining unit 1504, configured to determine, before the generating unit 1502 generates, based on the review duration, continuous listening summary content according to audio content corresponding to a played portion of the target audio program content, that the target audio program content meets at least one of the following target conditions:
the played duration corresponding to the played part in the target audio program content is not less than a first time duration threshold;
And the time interval between the pause time and the continuous listening time corresponding to the target audio program content is not smaller than a second duration threshold.
Optionally, the generating unit 1502 is specifically configured to:
Selecting a section of audio content from the audio content corresponding to the played part based on the reviewing time length, and taking the section of audio content as audio content to be reviewed;
Converting the audio content to be reviewed into text information, and generating summary content text for the text information based on a text summarization technology;
and converting the summary content text into audio according to the key sound characteristics in the audio content to be reviewed, so as to obtain the continuous listening summary content.
Optionally, the determining unit 1501 is specifically configured to:
determining a review duration corresponding to the continuous listening summary content based on a time interval between a pause time corresponding to the pause operation and a continuous listening time corresponding to the resume operation and a played duration corresponding to a played part in the target audio program content;
wherein the review duration is positively correlated with both the time interval and the played duration.
Optionally, the determining unit 1501 is specifically configured to:
if the time interval is not greater than a preset interval threshold, determining the review duration according to the played duration, wherein the review duration is positively correlated with the played duration;
And if the time interval is greater than the preset interval threshold, determining the review duration according to the played duration and the time interval, wherein the review duration is positively correlated with both the time interval and the played duration.
Optionally, the determining unit 1501 is specifically configured to:
If the time interval is not greater than the preset interval threshold, taking the product of the played duration and a first preset proportional value as the review duration;
If the time interval is greater than the preset interval threshold, increasing the first preset proportional value by a first set step for each set time length by which the time interval increases, obtaining a first proportional value, and taking the product of the played duration and the first proportional value as the review duration.
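The stepped-ratio rule can be sketched as follows. The base ratio 0.10, step 0.01, 5-hour threshold, and 1-hour step unit are all illustrative placeholders for the "preset" values named above:

```python
def review_duration(played_s, gap_s,
                    base_ratio=0.10,         # first preset proportional value
                    step=0.01,               # first set step
                    gap_threshold=5 * 3600,  # preset interval threshold
                    step_unit=3600):         # set time length
    # Below the threshold: a fixed proportion of the played duration.
    if gap_s <= gap_threshold:
        return played_s * base_ratio
    # Above it: raise the ratio by `step` for every extra `step_unit` elapsed,
    # so the review duration grows with both the gap and the played duration.
    extra_units = (gap_s - gap_threshold) // step_unit
    return played_s * (base_ratio + step * extra_units)
```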
Optionally, the determining unit 1501 is specifically configured to:
Determining a corresponding first review duration based on a time interval between the pause operation and the resume operation and a played duration corresponding to a played part in the target audio program content, wherein the first review duration is positively correlated with the time interval and the played duration;
determining a corresponding second review duration based on a content difficulty level corresponding to the target program content, wherein the second review duration is positively correlated with the content difficulty level;
And taking the sum of the first reviewing time length and the second reviewing time length as the corresponding reviewing time length.
Optionally, the determining unit 1501 is specifically configured to:
If the content difficulty level is not greater than a preset level threshold, determining the second review duration according to the played duration, wherein the second review duration is positively correlated with the played duration;
And if the content difficulty level is greater than the preset level threshold, determining the second review duration according to the played duration and the content difficulty level, wherein the second review duration is positively correlated with both the played duration and the content difficulty level.
Optionally, the determining unit 1501 is specifically configured to:
If the time interval is not greater than the preset interval threshold, taking the product of the played duration and a second preset proportional value as the first review duration;
If the time interval is greater than the preset interval threshold, increasing the second preset proportional value by a second set step for each set time length by which the time interval increases, obtaining a second proportional value, and taking the product of the played duration and the second proportional value as the first review duration.
Optionally, the feedback unit 1503 is specifically configured to:
If the content difficulty level is not greater than the preset level threshold, taking the product of the played duration and a third preset proportional value as the second review duration;
If the content difficulty level is greater than the preset level threshold, increasing the third preset proportional value by a third set step for each set level by which the content difficulty level increases, obtaining a third proportional value, and taking the product of the played duration and the third proportional value as the second review duration.
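Analogously, the difficulty-stepped rule and the final sum of the two review durations might be sketched as follows; the thresholds, ratios, and difficulty scale are illustrative assumptions for the "preset" values named above:

```python
def second_review_duration(played_s, difficulty,
                           base_ratio=0.05,     # third preset proportional value
                           step=0.01,           # third set step
                           level_threshold=3):  # preset level threshold
    # At or below the threshold: a fixed proportion of the played duration.
    if difficulty <= level_threshold:
        return played_s * base_ratio
    # Above it: raise the ratio by `step` per extra difficulty level.
    extra_levels = difficulty - level_threshold
    return played_s * (base_ratio + step * extra_levels)

def total_review_duration(first_review_s, second_review_s):
    # The overall review duration is the sum of the two components.
    return first_review_s + second_review_s
```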
Optionally, the generating unit 1502 is specifically configured to:
If the target audio program content contains the voices of a plurality of objects, determining the highest duty ratio sound (i.e. the voice with the highest proportion of speaking time) by extracting features from the voices of the plurality of objects;
and converting the summary content text into audio based on the highest duty ratio sound to obtain the continuous listening summary content.
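Given diarized segments (a speaker label plus start/end times, e.g. produced by an upstream voice-feature extraction step), picking the highest duty ratio sound reduces to summing the speaking time per speaker. The segment format here is an assumption:

```python
from collections import defaultdict

def dominant_speaker(segments):
    """segments: iterable of (speaker_id, start_s, end_s) tuples.

    Returns the speaker with the largest total speaking time, i.e. the
    highest-proportion voice used for TTS of the summary text.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return max(totals, key=totals.get)
```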
Optionally, the apparatus further includes:
The association unit 1505 is configured to, after receiving a setting request for the continuous playing permission control in the permission setting interface sent by the client, acquire the continuous playing permission information associated with the target object and store it in association with the identification information of the target object, where the continuous playing permission control is used to turn the intelligent continuous listening mode on or off, and the intelligent continuous listening mode refers to the intelligent playing control function of playing customized continuous listening summary content according to the object's operation after the audio program content is paused.
For convenience of description, the above parts are described as being functionally divided into modules (or units) respectively. Of course, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein collectively as a "circuit," "module," or "system."
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. The electronic device may be used for playback control of audio program content. In one embodiment, the electronic device may be a terminal device, such as the terminal device 210 shown in fig. 2, and the terminal device 210 may be an electronic device such as a smart phone, a tablet computer, a laptop computer, or a PC.
Referring to fig. 16, the terminal device 210 includes a display unit 1640, a processor 1680 and a memory 1620, wherein the display unit 1640 includes a display panel 1641 for displaying information input by a user or provided to the user, various object selection interfaces of the terminal device 210, and the like, and is mainly used for displaying interfaces, shortcut windows, and the like of applications installed in the terminal device 210 in the embodiment of the present application. Alternatively, the display panel 1641 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
The processor 1680 is configured to read the computer program and then execute a method defined by the computer program, for example, the processor 1680 reads the social application program to run the application on the terminal device 210 and display an interface of the application on the display unit 1640. The processor 1680 may include one or more general-purpose processors and may also include one or more digital signal processors (Digital Signal Processor, DSP) for performing the associated operations to implement the techniques provided by embodiments of the present application.
Memory 1620 generally includes internal memory and external memory; the internal memory may be random access memory (RAM), read-only memory (ROM), cache memory, and the like, while the external memory may be a hard disk, an optical disk, a USB disk, a floppy disk, a tape drive, etc. The memory 1620 is used to store computer programs, including application programs corresponding to the applications and the like, and other data, which may include data generated after the operating system or the application programs are executed, including system data (e.g., configuration parameters of the operating system) and user data. The program instructions in the embodiments of the present application are stored in the memory 1620, and the processor 1680 executes the program instructions stored in the memory 1620 to implement the playback control method of the audio program content discussed above, or to implement the functions of the adaptation application discussed above.
In addition, the terminal device 210 may further include the display unit 1640 for receiving input digital information, character information, or touch operations/contactless gestures, and generating signal inputs related to user settings and function controls of the terminal device 210, and the like. Specifically, in an embodiment of the present application, the display unit 1640 may include the display panel 1641. The display panel 1641, such as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed by the player on or near the display panel 1641 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a predetermined program. Optionally, the display panel 1641 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch operation and sends a corresponding signal to the touch controller; the touch controller converts the touch information into touch point coordinates, sends the coordinates to the processor 1680, and can receive and execute commands sent by the processor 1680. In the embodiment of the present application, if the user triggers the resume operation of the target audio program content by clicking, the touch detection device in the display panel 1641 detects the touch operation and sends a corresponding signal to the touch controller, which converts the signal into touch point coordinates and sends them to the processor 1680; the processor 1680 then determines whether the user's operation succeeded according to the received touch point coordinates.
The display panel 1641 may be implemented with various types of resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 1640, the terminal device 210 may further include an input unit 1630, and the input unit 1630 may include one or more of, but not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, etc. In fig. 16, the input unit 1630 includes an image input device 1631 and other input devices 1632 as an example.
In addition to the above, the terminal device 210 may also include a power supply 1690 for powering the other modules, an audio circuit 1660, a near field communication module 1670, and an RF circuit 1610. The terminal device 210 may also include one or more sensors 1650, such as an acceleration sensor 1651, a distance sensor 1652, a fingerprint sensor 1653, and a temperature sensor 1654. The audio circuit 1660 specifically includes a speaker 1661, a microphone 1662, and the like. For example, the user can use voice control: the terminal device 210 collects the user's voice through the microphone 1662, responds to the voice commands, and plays a corresponding alert sound through the speaker 1661 when the user needs to be alerted.
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. The electronic device may be used for playback control of audio program content. In one embodiment, the electronic device may be a server, such as server 230 shown in FIG. 2. In this embodiment, the electronic device may be configured as shown in fig. 17, including a memory 1701, a communication module 1703, and one or more processors 1702.
A memory 1701 for storing computer programs for execution by the processor 1702. The memory 1701 may mainly include a storage program area in which an operating system, programs required for running an instant messaging function, and the like are stored, and a storage data area in which various instant messaging information, an operation instruction set, and the like are stored.
The memory 1701 may be a volatile memory such as random-access memory (RAM); or a non-volatile memory such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1701 may also be a combination of the above.
The processor 1702 may include one or more central processing units (central processing unit, CPUs) or digital processing units, or the like. Processor 1702 is configured to implement the above-described playback control method for audio program content when calling the computer program stored in memory 1701.
The communication module 1703 is used for communicating with a terminal device and other servers.
The specific connection medium between the memory 1701, the communication module 1703, and the processor 1702 is not limited in the embodiments of the present application. In fig. 17, the memory 1701 and the processor 1702 are connected via the bus 1704, which is shown as a bold line; the connection manner of the other components is merely illustrative and not limiting. The bus 1704 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in fig. 17, but this does not mean there is only one bus or one type of bus.
The memory 1701 serves as a computer storage medium storing computer-executable instructions for implementing the playback control method of audio program content according to the embodiments of the present application. The processor 1702 is configured to perform the above-described playback control method of audio program content, as shown in fig. 8.
In some possible embodiments, aspects of the method for controlling playing of audio program content provided by the present application may also be implemented in the form of a program product, which includes a program code for causing a computer device to perform the steps of the method for controlling playing of audio program content according to the various exemplary embodiments of the present application described in the present specification when the program product is run on the computer device, for example, the computer device may perform the steps as shown in fig. 3 or fig. 8.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of a readable storage medium include an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of the embodiments of the present application may take the form of a portable compact disc read-only memory (CD-ROM) that includes program code and can run on a computing device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (25)
1. A playback control method for audio program content, the method comprising:
during the playing of a target audio program, pausing the playing of target audio program content in response to a pause operation triggered for the target audio program content;
in response to a resume operation triggered for the target audio program content, displaying a continuous listening control area in a playing control interface and playing continuous listening summary content corresponding to the target audio program content, wherein the continuous listening summary content is summary information generated for the audio content corresponding to the played part of the target audio program content, and a review duration corresponding to the continuous listening summary content is determined according to at least one of a historical listening behavior of an object and a content difficulty level corresponding to the target audio program content;
and after the playing of the continuous listening summary content is finished, continuing to play the unplayed part of the target audio program content from the current pause position.
2. The method of claim 1, wherein the continuous listening control area includes a summary control, the method further comprising:
before the playing of the continuous listening summary content is finished, in response to a closing operation triggered on the summary control, stopping the playing of the continuous listening summary content and continuing to play the audio content corresponding to the unplayed part of the target audio program content.
3. The method of claim 2, wherein the displaying a continuous listening control area in a playing control interface in response to a resume operation triggered for the target audio program content, and playing continuous listening summary content corresponding to the target audio program content, comprises:
in response to a resume operation triggered for the target audio program content, displaying, in the playing control interface, a continuous listening control area containing continuous listening prompt information to prompt the object that intelligent continuous listening is currently in progress, and playing the continuous listening summary content corresponding to the target audio program content;
and the stopping the playing of the continuous listening summary content and continuing to play the audio content corresponding to the unplayed part of the target audio program content in response to the closing operation triggered on the summary control further comprises:
ceasing to display the continuous listening control area in the playing control interface.
4. The method of claim 1, wherein before the displaying a continuous listening control area in a playing control interface in response to a resume operation triggered for the target audio program content, and playing continuous listening summary content corresponding to the target audio program content, the method further comprises:
in response to a setting operation on a continuous playing permission control in a permission setting interface, setting the continuous playing permission for a target object to turn an intelligent continuous listening mode on or off, wherein the intelligent continuous listening mode is an intelligent playing control function that plays customized continuous listening summary content according to the operation of the object after the playing of audio program content is paused;
and sending the corresponding continuous playing permission information to a server, so that the server stores the continuous playing permission information in association with identification information of the target object.
5. The method of claim 4, wherein the displaying a continuous listening control area in a playing control interface in response to a resume operation triggered for the target audio program content, and playing continuous listening summary content corresponding to the target audio program content, comprises:
in response to a resume operation triggered for the target audio program content, if it is determined, according to the continuous playing permission information associated with the target object, that the target object has the continuous playing permission, determining that the target object is currently in the intelligent continuous listening mode;
and displaying the continuous listening control area in the playing control interface, and playing the continuous listening summary content corresponding to the target audio program content.
6. The method of any one of claims 1-5, wherein the continuous listening summary content is determined by:
selecting, based on the review duration, a section of audio content from the audio content corresponding to the played part as the audio content to be reviewed;
converting the audio content to be reviewed into text information, and generating a summary content text from the text information based on a text summarization technique;
and converting the summary content text into audio according to key sound features in the audio content to be reviewed, to obtain the continuous listening summary content.
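Read as a pipeline, claim 6 describes three steps: slice a review window out of the played audio, turn it into text and summarize it, then synthesize the summary back into audio. A minimal, non-authoritative sketch of that flow follows; the transcript-bearing `Segment` type and the extractive `summarize` helper are illustrative stand-ins for real ASR, text-summarization, and TTS components, none of which are named in the application.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds from program start
    end: float     # seconds from program start
    speaker: str   # who is speaking in this segment
    text: str      # transcript of the segment (stand-in for ASR output)

def select_review_segments(played, review_duration):
    """Take the trailing slice of the played segments covering roughly review_duration seconds."""
    chosen, total = [], 0.0
    for seg in reversed(played):
        chosen.append(seg)
        total += seg.end - seg.start
        if total >= review_duration:
            break
    return list(reversed(chosen))

def summarize(texts, max_sentences=2):
    """Toy extractive summary: keep the leading sentences (placeholder for a real summarization model)."""
    sentences = [s.strip() for t in texts for s in t.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def build_recap_text(played, review_duration):
    """Steps of claim 6: select the audio to be reviewed, then transcribe and summarize it.
    A real system would next run TTS on the result, conditioned on the key sound features."""
    segs = select_review_segments(played, review_duration)
    return summarize([s.text for s in segs])
```

The review window is taken from the tail of the played part on the assumption that the most recent material is what the listener most needs recapped; the claim itself only requires selecting "a section of audio content" of the review duration.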
7. The method of any one of claims 1-5, wherein determining the review duration from the historical listening behavior comprises:
determining the review duration corresponding to the continuous listening summary content based on a time interval between a pause time corresponding to the pause operation and a continuous listening time corresponding to the resume operation, and a played duration corresponding to the played part of the target audio program content;
wherein the review duration is positively correlated with both the time interval and the played duration.
8. The method of any one of claims 1-5, wherein determining the review duration from the historical listening behavior and the content difficulty level comprises:
determining a corresponding first review duration based on a time interval between the pause operation and the resume operation, and a played duration corresponding to the played part of the target audio program content, wherein the first review duration is positively correlated with both the time interval and the played duration;
determining a corresponding second review duration based on the content difficulty level corresponding to the target audio program content, wherein the second review duration is positively correlated with the content difficulty level;
and taking the sum of the first review duration and the second review duration as the corresponding review duration.
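Claim 8's combination rule can be sketched in a few lines. The weights below are invented purely for illustration; the claim fixes only the structure: two components, each positively correlated with its inputs, summed.

```python
def review_duration(interval_s, played_s, difficulty,
                    base_ratio=0.05, interval_weight=0.0005, difficulty_weight=10.0):
    """Illustrative review-duration rule per claim 8 (all weights are assumptions).

    interval_s: seconds between the pause operation and the resume operation
    played_s:   seconds of the program already played
    difficulty: content difficulty level (e.g. 1-5)
    """
    # First component: grows with both the pause interval and the played duration.
    first = played_s * (base_ratio + interval_weight * interval_s)
    # Second component: grows with the content difficulty level.
    second = difficulty_weight * difficulty
    return first + second
```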
9. The method of claim 6, wherein the converting the summary content text into audio according to key sound features in the audio content to be reviewed, to obtain the continuous listening summary content, comprises:
if the target audio program content contains the sounds of a plurality of objects, determining the sound with the highest duty ratio by extracting features of the sounds of the plurality of objects;
and converting the summary content text into audio based on the sound with the highest duty ratio, to obtain the continuous listening summary content.
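The "highest duty ratio sound" selection in claim 9 amounts to picking the voice that accounts for the largest share of speaking time. A sketch, assuming speaker-labelled segments are already available from diarization (the labelling itself, and the feature extraction, are outside this snippet):

```python
from collections import defaultdict

def dominant_speaker(segments):
    """segments: iterable of (speaker_id, duration_seconds) pairs.
    Returns the speaker whose voice has the highest total duty ratio."""
    totals = defaultdict(float)
    for speaker, duration in segments:
        totals[speaker] += duration
    return max(totals, key=totals.get)
```

The returned speaker id would then select the reference voice for synthesizing the summary text.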
10. A playback control method for audio program content, the method comprising:
during the playing of a target audio program, receiving a pause request and a resume request for target audio program content sent by a client, and determining a review duration according to at least one of a historical listening behavior of an object and a content difficulty level corresponding to the target audio program content;
generating, based on the review duration, continuous listening summary content from the audio content corresponding to the played part of the target audio program content, the continuous listening summary content being summary information of the played part;
and feeding back the continuous listening summary content to the client, so that the client displays a continuous listening control area in a playing control interface, plays the continuous listening summary content, and, after the playing of the continuous listening summary content is finished, continues to play the unplayed part of the target audio program content from the current pause position.
11. The method of claim 10, further comprising, before the generating continuous listening summary content from the audio content corresponding to the played part of the target audio program content based on the review duration:
determining that the target audio program content meets at least one of the following target conditions:
the played duration corresponding to the played part of the target audio program content is not less than a first duration threshold;
and the time interval between the pause time and the continuous listening time corresponding to the target audio program content is not less than a second duration threshold.
12. The method of claim 10, wherein the generating continuous listening summary content from the audio content corresponding to the played part of the target audio program content based on the review duration comprises:
selecting, based on the review duration, a section of audio content from the audio content corresponding to the played part as the audio content to be reviewed;
converting the audio content to be reviewed into text information, and generating a summary content text from the text information based on a text summarization technique;
and converting the summary content text into audio according to key sound features in the audio content to be reviewed, to obtain the continuous listening summary content.
13. The method of claim 10, wherein determining the review duration from the historical listening behavior comprises:
determining the review duration corresponding to the continuous listening summary content based on a time interval between a pause time corresponding to the pause operation and a continuous listening time corresponding to the resume operation, and a played duration corresponding to the played part of the target audio program content;
wherein the review duration is positively correlated with both the time interval and the played duration.
14. The method of claim 13, wherein the determining the review duration corresponding to the continuous listening summary content based on a time interval between a pause time corresponding to the pause operation and a continuous listening time corresponding to the resume operation, and a played duration corresponding to the played part of the target audio program content, comprises:
if the time interval is not greater than a preset interval threshold, determining the review duration according to the played duration, wherein the review duration is positively correlated with the played duration;
and if the time interval is greater than the preset interval threshold, determining the review duration according to the played duration and the time interval, wherein the review duration is positively correlated with both the time interval and the played duration.
15. The method of claim 14, wherein the determining the review duration according to the played duration if the time interval is not greater than a preset interval threshold comprises:
if the time interval is not greater than the preset interval threshold, taking the product of the played duration and a first preset proportional value as the review duration;
and the determining the review duration according to the played duration and the time interval if the time interval is greater than the preset interval threshold comprises:
if the time interval is greater than the preset interval threshold, increasing the first preset proportional value by a first set step length for each set time length by which the time interval increases, to obtain a first proportional value, and taking the product of the played duration and the first proportional value as the review duration.
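Claims 14-15 pin down a concrete rule: below the interval threshold the recap is a fixed fraction of the played duration; above it, that fraction grows by a fixed step for every set time length of additional pause. A sketch with invented constants (the claims do not fix the threshold, ratio, or step values):

```python
def review_duration(played_s, interval_s,
                    interval_threshold=300.0,  # assumed: 5-minute pause threshold
                    base_ratio=0.05,           # assumed first preset proportional value
                    step=0.01,                 # assumed first set step length
                    step_len=60.0):            # assumed set time length (1 minute)
    """Review duration per claims 14-15; all numeric constants are illustrative assumptions."""
    if interval_s <= interval_threshold:
        ratio = base_ratio                     # claim 15, first branch
    else:
        extra_steps = int((interval_s - interval_threshold) // step_len)
        ratio = base_ratio + step * extra_steps  # claim 15, second branch
    return played_s * ratio
```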
16. The method of claim 10, wherein determining the review duration from the historical listening behavior and the content difficulty level comprises:
determining a corresponding first review duration based on a time interval between the pause operation and the resume operation, and a played duration corresponding to the played part of the target audio program content, wherein the first review duration is positively correlated with both the time interval and the played duration;
determining a corresponding second review duration based on the content difficulty level corresponding to the target audio program content, wherein the second review duration is positively correlated with the content difficulty level;
and taking the sum of the first review duration and the second review duration as the corresponding review duration.
17. The method of claim 16, wherein the determining a corresponding second review duration based on the content difficulty level corresponding to the target audio program content comprises:
if the content difficulty level is not greater than a preset level threshold, determining the second review duration according to the played duration, wherein the second review duration is positively correlated with the played duration;
and if the content difficulty level is greater than the preset level threshold, determining the second review duration according to the played duration and the content difficulty level, wherein the second review duration is positively correlated with both the played duration and the content difficulty level.
18. The method of claim 17, wherein the determining the second review duration according to the played duration if the content difficulty level is not greater than a preset level threshold comprises:
if the content difficulty level is not greater than the preset level threshold, taking the product of the played duration and a third preset proportional value as the second review duration;
and the determining the second review duration according to the played duration and the content difficulty level if the content difficulty level is greater than the preset level threshold comprises:
if the content difficulty level is greater than the preset level threshold, increasing the third preset proportional value by a third set step length for each set level by which the content difficulty level increases, to obtain a third proportional value, and taking the product of the played duration and the third proportional value as the second review duration.
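Claims 17-18 apply the same pattern to content difficulty instead of pause length. A sketch with invented constants (the claims fix only the branching and monotonicity, not the numbers):

```python
def second_review_duration(played_s, difficulty,
                           level_threshold=3,   # assumed preset level threshold
                           base_ratio=0.03,     # assumed third preset proportional value
                           step=0.01):          # assumed third set step length
    """Second review duration per claims 17-18; all numeric constants are illustrative assumptions."""
    if difficulty <= level_threshold:
        ratio = base_ratio                        # claim 18, first branch
    else:
        extra_levels = difficulty - level_threshold
        ratio = base_ratio + step * extra_levels  # claim 18, second branch
    return played_s * ratio
```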
19. The method of claim 12, wherein the converting the summary content text into audio according to key sound features in the audio content to be reviewed, to obtain the continuous listening summary content, comprises:
if the target audio program content contains the sounds of a plurality of objects, determining the sound with the highest duty ratio by extracting features of the sounds of the plurality of objects;
and converting the summary content text into audio based on the sound with the highest duty ratio, to obtain the continuous listening summary content.
20. The method according to any one of claims 10-19, further comprising:
after receiving a setting request, sent by the client, for a continuous playing permission control in a permission setting interface, acquiring continuous playing permission information associated with a target object, and storing the continuous playing permission information in association with identification information of the target object, wherein the continuous playing permission control is used to turn an intelligent continuous listening mode on or off, and the intelligent continuous listening mode is an intelligent playing control function that plays customized continuous listening summary content according to the operation of the object after the playing of audio program content is paused.
21. A playback control apparatus for audio program content, comprising:
a pause unit configured to pause, during the playing of a target audio program, the playing of target audio program content in response to a pause operation triggered for the target audio program content;
a continuous playing unit configured to display a continuous listening control area in a playing control interface and play continuous listening summary content corresponding to the target audio program content in response to a resume operation triggered for the target audio program content, wherein a review duration corresponding to the continuous listening summary content is determined according to at least one of a historical listening behavior of an object and a content difficulty level corresponding to the target audio program content, the continuous listening control area is used to control the playing state of the continuous listening summary content, and after the playing of the continuous listening summary content is finished, the playing of the unplayed part of the target audio program content continues from the current pause position.
22. A playback control apparatus for audio program content, comprising:
a determining unit configured to receive, during the playing of a target audio program, a pause request and a resume request for target audio program content sent by a client, and determine a review duration according to at least one of a historical listening behavior of an object and a content difficulty level corresponding to the target audio program content;
a generating unit configured to generate, based on the review duration, continuous listening summary content from the audio content corresponding to the played part of the target audio program content, the continuous listening summary content being summary information of the played part;
and a feedback unit configured to feed back the continuous listening summary content to the client, so that the client displays a continuous listening control area in a playing control interface, plays the continuous listening summary content, and, after the playing of the continuous listening summary content is finished, continues to play the unplayed part of the target audio program content from the current pause position.
23. An electronic device comprising a processor and a memory, wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1-9 or the steps of the method of any one of claims 10-20.
24. A computer readable storage medium, characterized in that it comprises a program code for causing an electronic device to perform the steps of the method of any one of claims 1-9 or the steps of the method of any one of claims 10-20 when the program code is run on the electronic device.
25. A computer program product comprising a computer program stored on a computer readable storage medium, which when read from the computer readable storage medium by a processor of an electronic device, causes the electronic device to perform the steps of the method of any one of claims 1 to 9 or the steps of the method of any one of claims 10 to 20.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510803797.7A CN120687631A (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110541007.4A CN113761268B (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
| CN202510803797.7A CN120687631A (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
Related Parent Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110541007.4A Division CN113761268B (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120687631A true CN120687631A (en) | 2025-09-23 |
Family
ID=78787201
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510803797.7A Pending CN120687631A (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
| CN202110541007.4A Active CN113761268B (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110541007.4A Active CN113761268B (en) | 2021-05-18 | 2021-05-18 | Audio program content playback control method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (2) | CN120687631A (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114416012A (en) * | 2021-12-14 | 2022-04-29 | 阿波罗智联(北京)科技有限公司 | Audio continuous playing method and device |
| CN115022705A (en) * | 2022-05-24 | 2022-09-06 | 咪咕文化科技有限公司 | Video playing method, device and equipment |
| CN114979769A (en) * | 2022-06-01 | 2022-08-30 | 山东福生佳信科技股份有限公司 | Video continuous playing progress management system and method |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7890331B2 (en) * | 2003-05-26 | 2011-02-15 | Koninklijke Philips Electronics N.V. | System and method for generating audio-visual summaries for audio-visual program content |
| CN105898591A (en) * | 2015-12-14 | 2016-08-24 | 乐视网信息技术(北京)股份有限公司 | Video play method and device and mobile terminal equipment |
| WO2019084181A1 (en) * | 2017-10-26 | 2019-05-02 | Rovi Guides, Inc. | Systems and methods for recommending a pause position and for resuming playback of media content |
| US11500917B2 (en) * | 2017-11-24 | 2022-11-15 | Microsoft Technology Licensing, Llc | Providing a summary of a multimedia document in a session |
| CN108305622B (en) * | 2018-01-04 | 2021-06-11 | 海尔优家智能科技(北京)有限公司 | Voice recognition-based audio abstract text creating method and device |
| CN110503991B (en) * | 2019-08-07 | 2022-03-18 | Oppo广东移动通信有限公司 | Voice broadcasting method and device, electronic equipment and storage medium |
2021
- 2021-05-18: CN application CN202510803797.7A, published as CN120687631A (status: Pending)
- 2021-05-18: CN application CN202110541007.4A, granted as CN113761268B (status: Active)
Also Published As
| Publication number | Publication date |
|---|---|
| CN113761268B (en) | 2025-06-06 |
| CN113761268A (en) | 2021-12-07 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination |