CN109841202B - Rhythm generation method and device based on voice synthesis and terminal equipment - Google Patents
- Publication number
- CN109841202B (application number CN201910008106.9A)
- Authority
- CN
- China
- Prior art keywords
- rhythm
- target
- simulated
- note
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Auxiliary Devices For Music (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
The invention is applicable to the technical field of data processing and provides a rhythm generation method and device based on voice synthesis, a terminal device and a computer-readable storage medium. The method comprises the following steps: obtaining target lyrics, a beat type and a number of bars, and calculating a rhythm note duration according to the beat type and the number of bars; acquiring at least two preset notes, generating at least two simulated rhythms based on the target lyrics, the rhythm note duration and the at least two preset notes, and scoring each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note; and determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and outputting the target rhythm. The method generates rhythms automatically and preferentially selects high-scoring rhythms, which improves the convenience and accuracy of rhythm generation.
    Description
Technical Field
      The invention belongs to the technical field of data processing, and particularly relates to a rhythm generation method and device based on voice synthesis, terminal equipment and a computer readable storage medium. 
    Background
      With the development of the times, music has become an indispensable part of people's daily life. Music creation comprises lyric writing and composing. Because lyric writing is simpler and easier to master, composing music for lyrics that have already been written is currently a common way of creating music.
      In the prior art, an existing musical rhythm can only be detected, identified, extracted and the like; no technology exists for generating a musical rhythm. A musical rhythm can therefore only be written manually, which is difficult for people who are not familiar with music theory. In summary, in the prior art the rhythm corresponding to lyrics must be written manually, and generating a rhythm is difficult.
    Disclosure of Invention
      In view of this, the embodiments of the present invention provide a rhythm generation method and device based on voice synthesis, a terminal device and a computer-readable storage medium, so as to solve the problem that generating a rhythm is difficult in the prior art.
      A first aspect of an embodiment of the present invention provides a rhythm generation method based on voice synthesis, including:
      acquiring target lyrics, a beat type and a number of bars, and calculating a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics;
      acquiring at least two preset notes, generating at least two simulated rhythms based on the target lyrics, the rhythm note duration and the at least two preset notes, and scoring each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      and determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and outputting the target rhythm.
      A second aspect of an embodiment of the present invention provides a rhythm generation device based on voice synthesis, including:
      a calculating unit, configured to acquire target lyrics, a beat type and a number of bars, and calculate a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics;
      a scoring unit, configured to acquire at least two preset notes, generate at least two simulated rhythms based on the target lyrics, the rhythm note duration and the at least two preset notes, and score each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      and an output unit, configured to determine the simulated rhythm corresponding to the largest rhythm score value as a target rhythm and output the target rhythm.
      A third aspect of an embodiment of the present invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
      acquiring target lyrics, a beat type and a number of bars, and calculating a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics;
      acquiring at least two preset notes, generating at least two simulated rhythms based on the target lyrics, the rhythm note duration and the at least two preset notes, and scoring each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      and determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and outputting the target rhythm.
      A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
      acquiring target lyrics, a beat type and a number of bars, and calculating a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics;
      acquiring at least two preset notes, generating at least two simulated rhythms based on the target lyrics, the rhythm note duration and the at least two preset notes, and scoring each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      and determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and outputting the target rhythm.
      Compared with the prior art, the embodiment of the invention has the following beneficial effects:
      According to the embodiment of the invention, a rhythm note duration is calculated from the acquired beat type and number of bars; at least two simulated rhythms that reach the rhythm note duration are generated from the target lyrics, the rhythm note duration and at least two preset notes; each generated simulated rhythm is scored to obtain a rhythm score value; and the simulated rhythm with the highest rhythm score value is output. Rhythms are thus generated automatically and high-scoring rhythms are preferentially selected, which improves the convenience and accuracy of rhythm generation.
    Drawings
      In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
      Fig. 1 is a flowchart of an implementation of a rhythm generation method based on voice synthesis according to a first embodiment of the present invention;
      Fig. 2 is a flowchart of an implementation of a rhythm generation method based on voice synthesis according to a second embodiment of the present invention;
      Fig. 3 is a flowchart of an implementation of a rhythm generation method based on voice synthesis according to a third embodiment of the present invention;
      Fig. 4 is a flowchart of an implementation of a rhythm generation method based on voice synthesis according to a fourth embodiment of the present invention;
      Fig. 5 is a block diagram of a rhythm generation device based on voice synthesis according to a fifth embodiment of the present invention;
      Fig. 6 is a schematic diagram of a terminal device according to a sixth embodiment of the present invention.
    Detailed Description
      In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail. 
      In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
      Fig. 1 shows an implementation flow of a rhythm generation method based on voice synthesis according to an embodiment of the present invention, which is described in detail below:
      In S101, target lyrics, a beat type and a number of bars are acquired, and a rhythm note duration is calculated according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics.
      To generate a rhythm automatically, the target lyrics to which a rhythm is to be added, the beat type and the corresponding number of bars are first acquired. The target lyrics are in text format; in the embodiment of the invention, each written line of lyrics may be input separately as one target lyric, or all of the lyrics may be input together as the target lyrics. The beat type (time signature) indicates the rule by which strong and weak beats are combined. In the embodiment of the invention, the beat type includes, but is not limited to, 1/4, 2/4, 3/4, 4/4 and 3/8 time. A bar is a unit of length in the music that is determined by the beat type; for example, if the beat type is 4/4, a quarter note counts as one beat and each bar contains four beats. The number of bars is the count of such bars and may be customized in advance.
      Optionally, the number of words in the target lyrics is obtained, the ratio of the number of words to a preset per-bar measure value is computed, and the ratio is rounded up to obtain the number of bars. In the embodiment of the invention, besides being preset, the number of bars may thus be calculated from the number of words in the target lyrics. Specifically, the per-bar measure value is set first; it may be chosen from experience or according to the beat type, for example 8. The number of words in the target lyrics is then divided by this value. Because any surplus note duration within a bar can be filled with rests, the quotient is rounded up to obtain the number of bars, which prevents the number of bars from being too small and the subsequently generated simulated rhythm from being too compact. For example, if the target lyrics contain 7 words, the ratio is 7/8 = 0.875 and rounding up gives 1, i.e. the simulated rhythm corresponding to the target lyrics is limited to a single bar.
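The bar-count calculation above can be sketched as a ceiling division. This is a minimal illustration only: the function name `bar_count_from_words` and the parameter `words_per_bar` are hypothetical names for the preset per-bar value (8 in the text's example), and the word count is passed in directly rather than derived from the lyric text.

```python
import math


def bar_count_from_words(word_count: int, words_per_bar: int = 8) -> int:
    """Return the number of bars as the rounded-up ratio word_count / words_per_bar.

    `words_per_bar` stands in for the preset per-bar value described in the
    text (hypothetical name); rounding up leaves room that rests can fill.
    """
    if word_count <= 0 or words_per_bar <= 0:
        raise ValueError("word_count and words_per_bar must be positive")
    return math.ceil(word_count / words_per_bar)


# The example from the text: 7 words against a per-bar value of 8 gives 1 bar.
print(bar_count_from_words(7, 8))
```

Rounding up rather than down means a 9-word lyric gets two bars instead of being squeezed into one.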
      The rhythm note duration is calculated from the acquired beat type and number of bars and indicates the expected duration of the rhythm corresponding to the target lyrics. For convenience of explanation, the duration is calculated with the quarter note as the basic unit, i.e. the note duration of a quarter note is taken to be 1. For example, if the beat type is 4/4 and the number of bars is 2, the two bars together contain 8 quarter notes, so the rhythm note duration is 8. This does not limit the embodiment of the invention; in a practical application scenario another note duration, such as that of the eighth note or the sixteenth note, may be used as the basic unit.
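As a hedged illustration of this calculation, the sketch below expresses the rhythm note duration in quarter-note units. Representing the beat type as a pair (`beats_per_bar`, `beat_unit`) is an assumption made for the example, as is the function name.

```python
from fractions import Fraction


def rhythm_note_duration(beats_per_bar: int, beat_unit: int, bar_count: int) -> Fraction:
    """Total duration in quarter-note units (a quarter note counts as 1).

    beat_unit is the note that gets one beat: 4 for a quarter note,
    8 for an eighth note. Both parameter names are illustrative.
    """
    quarters_per_beat = Fraction(4, beat_unit)  # an eighth-note beat is 1/2 quarter
    return beats_per_bar * quarters_per_beat * bar_count


# 4/4 time, 2 bars: 8 quarter notes in total, as in the text.
print(rhythm_note_duration(4, 4, 2))
```

Using `Fraction` keeps beat types such as 3/8 exact instead of relying on floating-point arithmetic.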
      In S102, at least two preset notes are acquired, at least two simulated rhythms are generated based on the target lyrics, the rhythm note duration and the at least two preset notes, and each simulated rhythm is scored to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note.
      After the rhythm note duration is determined, construction of the rhythm begins: at least two preset notes are acquired as the basis for construction. The preset notes may be set freely according to the actual application scenario. To reduce the amount of calculation needed to construct a rhythm, one setting in the embodiment of the invention ignores triplets and grace notes and uses the following preset notes: whole note, dotted half note, half note, dotted quarter note, quarter note, dotted eighth note, eighth note and sixteenth note, whose note durations are 4, 3, 2, 1.5, 1, 0.75, 0.5 and 0.25 respectively. At least two simulated rhythms are then randomly generated from the target lyrics, the rhythm note duration and the preset notes, subject to the following conditions: (1) the sum of the note durations of all notes in the simulated rhythm equals the rhythm note duration; (2) each word in the target lyrics corresponds to at least one preset note in the simulated rhythm, and the preset notes corresponding to different words occupy different positions in the simulated rhythm. Each simulated rhythm obtained is scored to obtain a rhythm score value. The scoring mechanism may be set freely; for example, to favor relatively continuous rhythms, different scores may be preset for different preset notes, with a longer note duration giving a higher score. Note that, to keep the result orderly, the simulated rhythm is generated following the order of the words in the target lyrics.
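The random generation step can be sketched as follows. The text does not say how candidates are drawn, so rejection sampling is an assumption here, as are the simplification of exactly one note per word (the text allows more than one) and the names `generate_rhythms`, `seed` and `max_attempts`.

```python
import random

# Note durations in quarter-note units, as listed in the text.
PRESET_NOTES = [4.0, 3.0, 2.0, 1.5, 1.0, 0.75, 0.5, 0.25]


def generate_rhythms(word_count, target, count=2, seed=0, max_attempts=100_000):
    """Draw random rhythms with one note per word whose durations sum to `target`.

    Candidates that miss the target are rejected (an assumed strategy).
    Position i of each rhythm belongs to word i, so word order is preserved.
    """
    rng = random.Random(seed)
    found = []
    for _ in range(max_attempts):
        if len(found) == count:
            break
        candidate = tuple(rng.choice(PRESET_NOTES) for _ in range(word_count))
        # All durations are multiples of 0.25, so the float sum is exact.
        if sum(candidate) == target and candidate not in found:
            found.append(candidate)
    return found


for rhythm in generate_rhythms(word_count=4, target=4.0):
    print(rhythm)
```

Rejection sampling is simple but wasteful for long lyrics; a constructive sampler would scale better, at the cost of more code.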
      Optionally, if the sum of the note durations of all notes in a generated simulated rhythm does not reach the rhythm note duration, rests are filled into the simulated rhythm until the sum reaches the rhythm note duration. For a given simulated rhythm, if the sum of the note durations of all notes exceeds the rhythm note duration, the simulated rhythm is deleted and excluded from subsequent scoring. If the sum does not reach the rhythm note duration, rests are filled into the simulated rhythm until it does, both to keep the total duration of all simulated rhythms uniform and to enlarge the set of generated simulated rhythms. Before rests are filled in, the absolute value of the difference between the sum of the note durations and the rhythm note duration is calculated, and the type of rest to fill is determined from it: for example, if the absolute difference is 1, a quarter rest is filled into the simulated rhythm; if it is 0.25, a sixteenth rest is filled in. The rest may be placed at the beginning or at the end of the simulated rhythm, which is not limited in the embodiment of the present invention.
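A minimal sketch of the discard-or-pad rule follows. The text gives only two examples of choosing a rest (difference 1 gives a quarter rest, 0.25 a sixteenth rest); the greedy decomposition for other differences, the `REST_VALUES` list, and placing rests at the end are assumptions of this example.

```python
# Rest durations in quarter-note units: whole, half, quarter, eighth, sixteenth.
# Restricting rests to these five values is an assumption of this sketch.
REST_VALUES = [4.0, 2.0, 1.0, 0.5, 0.25]


def pad_with_rests(note_values, target):
    """Return the rhythm as (kind, duration) events padded to `target`.

    Returns None when the notes already exceed the target, mirroring the
    rule that such rhythms are deleted before scoring.
    """
    total = sum(note_values)
    if total > target:
        return None
    gap = target - total
    events = [("note", v) for v in note_values]
    for rest in REST_VALUES:  # greedy: largest rest that still fits
        while gap >= rest:
            events.append(("rest", rest))
            gap -= rest
    return events


print(pad_with_rests([1.0, 1.0, 1.0], 4.0))        # padded with a quarter rest
print(pad_with_rests([1.0, 1.0, 1.0, 0.75], 4.0))  # padded with a sixteenth rest
```

The sketch assumes every gap is a multiple of 0.25, which holds for the preset notes listed in the text.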
      In S103, the simulated rhythm corresponding to the largest rhythm score value is determined as a target rhythm, and the target rhythm is output.
      The largest of all rhythm score values obtained from scoring is determined. A rhythm score value can determine which preset notes a simulated rhythm contains but not the order in which they are arranged, so the largest rhythm score value often corresponds to one or more simulated rhythms. To diversify the output and make selection easier for the user, every simulated rhythm corresponding to the largest rhythm score value is determined as a target rhythm, and all target rhythms are output. In addition, all simulated rhythms may be sorted by rhythm score value and output in sorted order to widen the user's choice; the order may be from largest to smallest rhythm score value or from smallest to largest.
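The selection and optional ranking described above can be sketched as below. The input format, a list of `(rhythm, score)` pairs, and both function names are assumptions made for the illustration.

```python
def select_target_rhythms(scored):
    """Return every rhythm that shares the largest score (ties are all kept)."""
    best = max(score for _, score in scored)
    return [rhythm for rhythm, score in scored if score == best]


def rank_rhythms(scored, descending=True):
    """Return all rhythms ordered by score, largest first by default."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=descending)
    return [rhythm for rhythm, _ in ordered]


# Hypothetical scored rhythms: two share the top score of 95.
scored = [((1.0, 2.0, 1.0), 95), ((2.0, 1.0, 1.0), 95), ((0.5, 0.5, 3.0), 70)]
print(select_target_rhythms(scored))
print(rank_rhythms(scored))
```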
      As can be seen from the embodiment shown in Fig. 1, the embodiment of the present invention acquires target lyrics, a beat type and a number of bars, calculates a rhythm note duration according to the beat type and the number of bars, acquires at least two preset notes, generates at least two simulated rhythms based on the target lyrics, the rhythm note duration and the at least two preset notes, scores each simulated rhythm to obtain a rhythm score value, and finally outputs the simulated rhythm corresponding to the largest rhythm score value as the target rhythm. Rhythms are thus generated automatically and high-scoring rhythms are preferentially selected, which improves the convenience and accuracy of rhythm generation.
      Fig. 2 shows a method obtained by refining, on the basis of the first embodiment of the present invention, the process of scoring each simulated rhythm to obtain a rhythm score value. The embodiment of the invention provides a flowchart of an implementation of the rhythm generation method based on voice synthesis; as shown in Fig. 2, the rhythm generation method may include the following steps:
      In S201, a base score corresponding to each note in the simulated rhythm is obtained, wherein different notes correspond to different base scores.
      In the embodiment of the present invention, to facilitate calculation of the rhythm score value, different base scores may be preset for different notes, for example as follows: the base score of the whole note, the dotted half note and the half note is set to 35; the base score of the dotted quarter note and the quarter note is set to 20; the base score of the dotted eighth note, the eighth note and the sixteenth note is set to 10; and the base score of a rest is set to 0. This setting is merely an example, and other settings may be used in an actual application scenario. For each constructed simulated rhythm, the base score corresponding to each note in the simulated rhythm is obtained.
      In S202, the word type of the word corresponding to each note in the simulated rhythm is determined, the weighting coefficient corresponding to the word type is obtained, and all the base scores are weighted and summed according to the weighting coefficients to obtain the rhythm score value, wherein the word types comprise initial-consonant words and final words, and different word types correspond to different weighting coefficients.
      Because the simulated rhythm is generated from the target lyrics, the embodiment of the invention obtains a weighting coefficient according to the word type of each word and weights the base score by that coefficient, wherein the word types comprise initial-consonant words and final words. In one setting, the weighting coefficient of an initial-consonant word is 2 and the weighting coefficient of a final word is 1. An initial-consonant word is a word whose pronunciation contains an initial consonant; the initial consonants are b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s, y and w, and examples of initial-consonant words are "not", "on" and "return". A final word is a word whose pronunciation contains only a final; the finals are a, o, e, i, u, ü, ai, ei, ui, ao, ou, iu, ie, üe, er, an, en, in, un, ün, ang, eng, ing and ong, and examples of final words are words pronounced "ao", "ou" or "an". The actual setting is not limited to this, but because a final word occupies a shorter pronunciation duration, the weighting coefficient of an initial-consonant word is set higher than that of a final word. In addition, words that are spoken lightly but contain an initial consonant may be assigned the final-word type in advance.
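One way to classify a word, sketched under the assumption that the toneless pinyin of each word is already available (how it is obtained is outside this example). The function name `word_type`, the labels `"initial"` and `"final"`, and the `light_words` override set are hypothetical.

```python
# Every initial consonant listed in the text begins with one of these letters
# (zh, ch and sh are covered by z, c and s); no final begins with any of them.
INITIAL_LETTERS = set("bpmfdtnlgkhjqxzcsryw")

# Example weighting coefficients from the text.
WEIGHTS = {"initial": 2, "final": 1}


def word_type(pinyin, light_words=frozenset()):
    """Classify a toneless pinyin syllable as 'initial' or 'final'.

    `light_words` holds syllables preassigned to the final-word type even
    though they contain an initial consonant.
    """
    syllable = pinyin.strip().lower()
    if not syllable:
        raise ValueError("empty pinyin syllable")
    if syllable in light_words:
        return "final"
    return "initial" if syllable[0] in INITIAL_LETTERS else "final"


for s in ["bu", "shang", "ao", "an"]:
    print(s, word_type(s), WEIGHTS[word_type(s)])
```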
      For each generated simulated rhythm, the word type of the word corresponding to each note is determined and the corresponding weighting coefficient is obtained; the base score of each note is weighted by its weighting coefficient, and the weighted results of all notes in the simulated rhythm are summed to obtain the rhythm score value. The rhythm score value is calculated as follows:
      $$g = \sum_{i=1}^{k} y_{z_i} \, z_i$$
      where g is the rhythm score value, k is the number of notes in the simulated rhythm with k ≥ 1, z_i is the base score corresponding to the i-th note, and y_{z_i} is the weighting coefficient corresponding to the i-th note.
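The weighted summation can be sketched with the example base scores and weighting coefficients given in the text. The data layout (a list of `(note_duration, word_type)` pairs, with `None` marking a rest) and the names used are assumptions of this illustration.

```python
# Example base scores from the text, keyed by note duration in quarter-note units.
BASE_SCORES = {4.0: 35, 3.0: 35, 2.0: 35, 1.5: 20, 1.0: 20, 0.75: 10, 0.5: 10, 0.25: 10}

# Example weighting coefficients from the text.
WEIGHTS = {"initial": 2, "final": 1}


def rhythm_score(notes):
    """Sum weight * base score over all notes; rests contribute 0.

    `notes` is a list of (note_duration, word_type) pairs, where word_type is
    "initial", "final", or None for a rest (an assumed encoding).
    """
    total = 0
    for duration, kind in notes:
        if kind is None:  # rest: base score 0
            continue
        total += WEIGHTS[kind] * BASE_SCORES[duration]
    return total


# 20*2 + 35*1 + 10*2 + 0 = 95
print(rhythm_score([(1.0, "initial"), (2.0, "final"), (0.5, "initial"), (0.5, None)]))
```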
      As can be seen from the embodiment shown in Fig. 2, the embodiment of the present invention obtains the base score and the weighting coefficient corresponding to each note in the simulated rhythm, weights each base score by its weighting coefficient, and sums all weighted results to obtain the rhythm score value. Setting a specific scoring mechanism in this way improves the accuracy of calculating the rhythm score value.
      Fig. 3 shows a method obtained by refining, on the basis of the first embodiment of the present invention, the process of generating at least two simulated rhythms based on the target lyrics, the rhythm note duration and at least two preset notes. The embodiment of the invention provides a flowchart of an implementation of the rhythm generation method based on voice synthesis; as shown in Fig. 3, the rhythm generation method may include the following steps:
      In S301, the conjunctive words and the non-conjunctive words in the target lyrics are identified, wherein a conjunctive word comprises at least two characters.
      When generating a simulated rhythm, the embodiment of the invention matches notes differently for conjunctive words and non-conjunctive words in the target lyrics, wherein a conjunctive word is formed by at least two characters of the target lyrics. To identify them, the target lyrics may be matched against a preset conjunctive-word library: the entries successfully matched in the target lyrics are determined to be conjunctive words, and the remaining words in the target lyrics are determined to be non-conjunctive words. The conjunctive-word library contains at least two conjunctive words, such as "sleeping", "flying" and "eating"; its entries may be set freely, or an open-source library may be used directly.
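Matching against a conjunctive-word library can be sketched as below. The text does not name a matching algorithm, so forward maximum matching is an assumption; the function name, the sample lyric and the Chinese library entries are illustrative stand-ins, not taken from the source.

```python
def split_units(lyrics, lexicon):
    """Split lyrics into (text, is_conjunctive) units by forward maximum matching.

    At each position the longest library entry of two or more characters is
    taken as a conjunctive word; otherwise a single character is emitted as a
    non-conjunctive word.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    units = []
    i = 0
    while i < len(lyrics):
        for size in range(min(max_len, len(lyrics) - i), 1, -1):
            candidate = lyrics[i:i + size]
            if candidate in lexicon:
                units.append((candidate, True))
                i += size
                break
        else:  # no library entry starts here
            units.append((lyrics[i], False))
            i += 1
    return units


# Hypothetical library and lyric for illustration only.
library = {"睡觉", "吃饭"}
print(split_units("我想睡觉", library))
```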
      In S302, at least two simulated rhythms are generated based on the target lyrics, the rhythm note duration and at least two preset notes, wherein each conjunctive word in the target lyrics corresponds to one preset note, and each non-conjunctive word in the target lyrics corresponds to one preset note.
      At least two simulated rhythms are generated based on the target lyrics, the rhythm note duration and at least two preset notes. Because conjunctive words are usually read continuously and take little time, each conjunctive word in the target lyrics is limited to one preset note, and each non-conjunctive word in the target lyrics is likewise limited to one preset note.
      Optionally, the note duration of the preset note corresponding to each conjunctive word in the target lyrics is limited to be greater than or equal to the note duration of a quarter note. This prevents a conjunctive word from lasting too short a time in the rhythm and thereby improves the musicality of the generated simulated rhythm.
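The one-note-per-unit rule and the optional quarter-note floor for conjunctive words can be sketched as follows. The unit format `(text, is_conjunctive)` and the function names are assumptions; the sketch draws notes only and omits the check against the rhythm note duration.

```python
import random

# Note durations in quarter-note units, as listed in the text.
PRESET_NOTES = [4.0, 3.0, 2.0, 1.5, 1.0, 0.75, 0.5, 0.25]
QUARTER = 1.0


def candidate_notes(is_conjunctive):
    """Notes a unit may take; conjunctive words need at least a quarter note."""
    if is_conjunctive:
        return [v for v in PRESET_NOTES if v >= QUARTER]
    return list(PRESET_NOTES)


def draw_notes(units, rng):
    """Assign exactly one randomly chosen note to each unit, in lyric order."""
    return [rng.choice(candidate_notes(flag)) for _, flag in units]


# Hypothetical units: two single words followed by one conjunctive word.
units = [("w1", False), ("w2", False), ("w3w4", True)]
print(draw_notes(units, random.Random(0)))
```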
      As can be seen from the embodiment shown in Fig. 3, the embodiment of the present invention identifies the conjunctive words and the non-conjunctive words in the target lyrics and generates at least two simulated rhythms based on the target lyrics, the rhythm note duration and at least two preset notes, wherein each conjunctive word in the target lyrics corresponds to one preset note, and each non-conjunctive word in the target lyrics corresponds to one preset note.
      Fig. 4 shows a method obtained by refining, on the basis of the first embodiment of the present invention, the process of determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm and outputting the target rhythm. The embodiment of the invention provides a flowchart of an implementation of the rhythm generation method based on voice synthesis; as shown in Fig. 4, the rhythm generation method may include the following steps:
      In S401, a beat strength relationship corresponding to the beat type is obtained.
      To make the target rhythm more pleasant to listen to, the embodiment of the invention also obtains the beat strength relationship corresponding to the beat type. The beat strength relationship indicates the relative strength of each beat within a bar: if the beat type is 4/4, the corresponding beat strength relationship is strong, weak, secondary strong, weak; if the beat type is 3/4, the corresponding beat strength relationship is strong, weak, weak.
      In S402, the intensity value of each beat in each bar of the target rhythm is set according to the beat strength relationship, and the target rhythm is output after the setting is completed.
      The intensity value of each beat in each bar of the target rhythm is set according to the beat strength relationship. Specifically, the target rhythm is input into a Musical Instrument Digital Interface (MIDI), and an intensity value (i.e. velocity) corresponding to the beat strength relationship is set for each beat. For example, the range of intensity values for a strong beat is set to 100-127, for a secondary strong beat to 81-99, and for a weak beat to 60-80, and each intensity value is selected at random within its range. For example, with a beat type of 4/4, one possible result for a single bar is: the first quarter note has an intensity value of 101, the second 61, the third 90, and the fourth 62. After the intensity values of every bar in the target rhythm have been set, the target rhythm is output.
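The velocity assignment can be sketched with the example ranges from the text. Only the 4/4 pattern is included; the dictionary layout, the key names and the function name are assumptions of this illustration, and no MIDI library is involved.

```python
import random

# Beat strength pattern for 4/4 as given in the text; other beat types
# could be added to this table in the same form.
STRENGTH_PATTERNS = {"4/4": ["strong", "weak", "secondary", "weak"]}

# Example velocity ranges (inclusive) from the text.
VELOCITY_RANGES = {"strong": (100, 127), "secondary": (81, 99), "weak": (60, 80)}


def assign_velocities(beat_type, bar_count, seed=None):
    """Return one list of per-beat velocities for each bar.

    Each velocity is drawn at random from the range of its beat strength.
    """
    rng = random.Random(seed)
    pattern = STRENGTH_PATTERNS[beat_type]
    return [
        [rng.randint(*VELOCITY_RANGES[strength]) for strength in pattern]
        for _ in range(bar_count)
    ]


for bar in assign_velocities("4/4", bar_count=2, seed=0):
    print(bar)
```

Drawing within a range rather than using one fixed value per strength gives each bar slightly different dynamics, which matches the random selection described in the text.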
      As can be seen from the embodiment shown in Fig. 4, the embodiment of the present invention obtains the beat strength relationship corresponding to the beat type, sets the intensity value of each beat in each bar of the target rhythm according to the beat strength relationship, and outputs the target rhythm after the setting is completed, which makes the target rhythm more pleasant to listen to and improves its musicality.
      It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention.
      Corresponding to the rhythm generation method based on speech synthesis described in the above embodiments, Fig. 5 shows a block diagram of a rhythm generation device based on speech synthesis according to an embodiment of the present invention. Referring to Fig. 5, the rhythm generation device includes:
      a calculating unit 51, configured to obtain target lyrics, a beat type, and a number of bars, and calculate a rhythm note duration according to the beat type and the number of bars, where the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics;
      a scoring unit 52, configured to obtain at least two preset notes, generate at least two simulated rhythms based on the target lyrics, the rhythm note duration, and the at least two preset notes, and score each simulated rhythm to obtain a rhythm score value, where each word in the target lyrics corresponds to at least one preset note; and
      an output unit 53, configured to determine the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and output the target rhythm.
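As a rough illustration of what the calculating unit computes, the sketch below derives a total duration from a beat type and a number of bars. It assumes that the rhythm note duration is simply the length of one bar multiplied by the number of bars, expressed in whole notes; that reading, and the function name `rhythm_note_duration`, are assumptions made for this example rather than details confirmed by the text.

```python
from fractions import Fraction


def rhythm_note_duration(beat_type, num_bars):
    """Total duration in whole notes for num_bars bars of the given beat type.

    beat_type is a string such as "4/4" or "3/4" (beats per bar / beat unit).
    """
    beats_per_bar, beat_unit = (int(x) for x in beat_type.split("/"))
    # One bar lasts beats_per_bar notes of length 1/beat_unit.
    return Fraction(beats_per_bar, beat_unit) * num_bars
```

Using `Fraction` keeps durations exact, so sums of eighth and sixteenth notes can later be compared with this total without floating-point error.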
      Optionally, the scoring unit 52 includes:
      a base score obtaining unit, configured to obtain a base score corresponding to each note in the simulated rhythm, where different notes correspond to different base scores; and
      a summing unit, configured to determine the word type of the word corresponding to each note in the simulated rhythm, obtain the weighting coefficient corresponding to that word type, and compute a weighted sum of all the base scores according to the weighting coefficients to obtain the rhythm score value, where the word types include initial consonant words and final (vowel) words, and different word types correspond to different weighting coefficients.
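The weighted scoring and the selection of the highest-scoring simulated rhythm can be sketched as follows. The base scores, weighting coefficients, note names, and function names below are hypothetical placeholders chosen for illustration; the text states only that different notes have different base scores and that different word types have different weighting coefficients.

```python
# Hypothetical base score per note type (assumed values).
BASE_SCORE = {"quarter": 3.0, "eighth": 2.0, "sixteenth": 1.0}

# Hypothetical weighting coefficient per word type (assumed values).
WORD_WEIGHT = {"initial": 1.0, "final": 1.5}


def rhythm_score(simulated_rhythm):
    """Weighted sum of base scores.

    simulated_rhythm is a list of (note_name, word_type) pairs, one per note.
    """
    return sum(BASE_SCORE[note] * WORD_WEIGHT[word_type]
               for note, word_type in simulated_rhythm)


def pick_target_rhythm(simulated_rhythms):
    """Return the simulated rhythm with the largest rhythm score value."""
    return max(simulated_rhythms, key=rhythm_score)
```

With these placeholder values, a rhythm of a quarter note on an initial consonant word and an eighth note on a final word scores 3.0 * 1.0 + 2.0 * 1.5.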
      Optionally, the scoring unit 52 includes:
      an analysis unit, configured to identify the conjunctions and non-conjunctions in the target lyrics, where each conjunction comprises at least two characters; and
      a generating unit, configured to generate at least two simulated rhythms based on the target lyrics, the rhythm note duration, and the at least two preset notes, where each conjunction in the target lyrics corresponds to one preset note, and each non-conjunction in the target lyrics corresponds to one preset note.
      Optionally, the calculating unit 51 includes:
      a comparing unit, configured to obtain the word count of the target lyrics, compare the word count with a preset bar value, and perform one operation on the result of the comparison to obtain the number of bars.
      Optionally, the output unit 53 includes:
      a relationship acquisition unit, configured to acquire the beat strength relationship corresponding to the beat type; and
      a dynamics setting unit, configured to set the dynamics value of each beat in each bar of the target rhythm according to the beat strength relationship, and output the target rhythm after the setting is completed.
      Optionally, the scoring unit 52 includes:
      a filling unit, configured to, if the sum of the note durations of all notes in the simulated rhythm does not reach the rhythm note duration, fill rests into the simulated rhythm until the sum of the note durations of all notes in the simulated rhythm reaches the rhythm note duration.
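The rest-filling behaviour of the filling unit can be sketched as below. The representation of a note as a `(name, duration)` pair, the default sixteenth-note rest, and the function name `fill_with_rests` are assumptions made for this example; the text specifies only that rests are added until the total duration reaches the rhythm note duration.

```python
from fractions import Fraction


def fill_with_rests(notes, target_duration, rest_unit=Fraction(1, 16)):
    """Append rests until the total duration reaches target_duration.

    notes is a list of (name, duration) pairs, with durations given as
    Fractions of a whole note. The input list is not modified.
    """
    filled = list(notes)
    total = sum((duration for _, duration in filled), Fraction(0))
    while total < target_duration:
        # Never overshoot: the last rest is shortened if necessary.
        step = min(rest_unit, target_duration - total)
        filled.append(("rest", step))
        total += step
    return filled
```

If the notes already reach or exceed the target duration, the function returns them unchanged.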
      Therefore, the rhythm generating device based on the voice synthesis, provided by the embodiment of the invention, can automatically generate the rhythm and output the rhythm with the highest score preferentially, so that the convenience and accuracy of rhythm generation are improved. 
      Fig. 6 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in Fig. 6, the terminal device 6 of this embodiment includes a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60, for example a rhythm generation program based on speech synthesis. When executing the computer program 62, the processor 60 implements the steps of the above embodiments of the rhythm generation method based on speech synthesis, such as steps S101 to S103 shown in Fig. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the units in the above embodiments of the rhythm generation device based on speech synthesis, such as the functions of the units 51 to 53 shown in Fig. 5.
      Illustratively, the computer program 62 may be partitioned into one or more units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 62 in the terminal device 6. For example, the computer program 62 may be divided into a calculating unit, a scoring unit, and an output unit, whose specific functions are as follows:
      a calculating unit, configured to obtain target lyrics, a beat type, and a number of bars, and calculate a rhythm note duration according to the beat type and the number of bars, where the rhythm note duration indicates the duration of the rhythm corresponding to the target lyrics;
      a scoring unit, configured to obtain at least two preset notes, generate at least two simulated rhythms based on the target lyrics, the rhythm note duration, and the at least two preset notes, and score each simulated rhythm to obtain a rhythm score value, where each word in the target lyrics corresponds to at least one preset note; and
      an output unit, configured to determine the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and output the target rhythm.
      The terminal device 6 may be a computing device such as a desktop computer, a notebook computer, a handheld computer, or a cloud server. The terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that Fig. 6 is merely an example of the terminal device 6 and does not limit it; the terminal device may include more or fewer components than illustrated, may combine certain components, or may use different components. For example, the terminal device may further include an input-output device, a network access device, a bus, and the like.
      The processor 60 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
      The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or internal memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing the computer program and other programs and data required by the terminal device. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
      It will be clearly understood by those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units is illustrated, and in practical application, the above-mentioned functional allocation may be performed by different functional units, that is, the internal structure of the terminal device is divided into different functional units, so as to perform all or part of the above-mentioned functions. The functional units in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present application. The specific working process of the units in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
      In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
      Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
      In the embodiments provided in the present invention, it should be understood that the disclosed terminal device and method may be implemented in other manners. For example, the above-described terminal device embodiments are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms. 
      The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
      In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
      The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, under legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
      The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.
    Claims (6)
1. A rhythm generation method based on speech synthesis, characterized by comprising:
      acquiring target lyrics, a beat type, and a number of bars, and calculating a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of a rhythm corresponding to the target lyrics;
      acquiring at least two preset notes, generating at least two simulated rhythms based on the target lyrics, the rhythm note duration, and the at least two preset notes, and scoring each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and outputting the target rhythm;
      wherein scoring each simulated rhythm to obtain a rhythm score value comprises:
      obtaining a base score corresponding to each note in the simulated rhythm, wherein different notes correspond to different base scores;
      determining the word type of the word corresponding to each note in the simulated rhythm, obtaining a weighting coefficient corresponding to the word type, and performing a weighted summation of all base scores according to the weighting coefficients to obtain the rhythm score value, wherein the word types comprise an initial consonant word and a vowel word, and different word types correspond to different weighting coefficients;
      wherein acquiring the target lyrics, the beat type, and the number of bars comprises:
      obtaining the word count of the target lyrics, comparing the word count with a preset bar value, and performing one operation on the result of the comparison to obtain the number of bars;
      wherein determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm and outputting the target rhythm comprises:
      acquiring a beat strength relationship corresponding to the beat type;
      and setting the intensity value of each beat in each bar of the target rhythm according to the beat strength relationship, and outputting the target rhythm after the setting is completed.
    2. The rhythm generation method of claim 1 wherein said generating at least two simulated rhythms based on said target lyrics, said rhythm note duration and at least two of said preset notes comprises:
      analyzing the conjunctions and non-conjunctions in the target lyrics, wherein the conjunctions comprise at least two characters;
      generating at least two simulated rhythms based on the target lyrics, the rhythm note duration and at least two preset notes, wherein each conjunctive in the target lyrics corresponds to one preset note, and each non-conjunctive word in the target lyrics corresponds to one preset note.
    3. The rhythm generation method of claim 1 wherein said generating at least two simulated rhythms based on said target lyrics, said rhythm note duration and at least two of said preset notes comprises:
      if the sum of the note durations of all notes in the simulated rhythm does not reach the rhythm note duration, filling rests into the simulated rhythm until the sum of the note durations of all notes in the simulated rhythm reaches the rhythm note duration.
    4. A rhythm generation device based on speech synthesis, characterized by comprising:
      a calculating unit, configured to obtain target lyrics, a beat type, and a number of bars, and calculate a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of a rhythm corresponding to the target lyrics;
      a scoring unit, configured to obtain at least two preset notes, generate at least two simulated rhythms based on the target lyrics, the rhythm note duration, and the at least two preset notes, and score each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      an output unit, configured to determine the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and output the target rhythm;
      wherein the scoring unit includes:
      a base score obtaining unit, configured to obtain a base score corresponding to each note in the simulated rhythm, wherein different notes correspond to different base scores;
      a summing unit, configured to determine the word type of the word corresponding to each note in the simulated rhythm, obtain the weighting coefficient corresponding to the word type, and perform a weighted summation of all base scores according to the weighting coefficients to obtain the rhythm score value, wherein the word types comprise an initial consonant word and a final word, and different word types correspond to different weighting coefficients;
      wherein the calculating unit includes:
      a comparing unit, configured to obtain the word count of the target lyrics, compare the word count with a preset bar value, and perform one operation on the result of the comparison to obtain the number of bars;
      wherein the output unit includes:
      a relationship acquisition unit, configured to acquire the beat strength relationship corresponding to the beat type;
      and a dynamics setting unit, configured to set the dynamics value of each beat in each bar of the target rhythm according to the beat strength relationship, and output the target rhythm after the setting is completed.
    5. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
      acquiring target lyrics, a beat type, and a number of bars, and calculating a rhythm note duration according to the beat type and the number of bars, wherein the rhythm note duration indicates the duration of a rhythm corresponding to the target lyrics;
      acquiring at least two preset notes, generating at least two simulated rhythms based on the target lyrics, the rhythm note duration, and the at least two preset notes, and scoring each simulated rhythm to obtain a rhythm score value, wherein each word in the target lyrics corresponds to at least one preset note;
      determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm, and outputting the target rhythm;
      wherein scoring each simulated rhythm to obtain a rhythm score value comprises:
      obtaining a base score corresponding to each note in the simulated rhythm, wherein different notes correspond to different base scores;
      determining the word type of the word corresponding to each note in the simulated rhythm, obtaining a weighting coefficient corresponding to the word type, and performing a weighted summation of all base scores according to the weighting coefficients to obtain the rhythm score value, wherein the word types comprise an initial consonant word and a vowel word, and different word types correspond to different weighting coefficients;
      wherein acquiring the target lyrics, the beat type, and the number of bars comprises:
      obtaining the word count of the target lyrics, comparing the word count with a preset bar value, and performing one operation on the result of the comparison to obtain the number of bars;
      wherein determining the simulated rhythm corresponding to the largest rhythm score value as a target rhythm and outputting the target rhythm comprises:
      acquiring a beat strength relationship corresponding to the beat type;
      and setting the intensity value of each beat in each bar of the target rhythm according to the beat strength relationship, and outputting the target rhythm after the setting is completed.
    6. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the rhythm generation method of any one of claims 1-3.
    Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201910008106.9A CN109841202B (en) | 2019-01-04 | 2019-01-04 | Rhythm generation method and device based on voice synthesis and terminal equipment | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN109841202A (en) | 2019-06-04 |
| CN109841202B (en) | 2023-12-29 |
Family
ID=66883696
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201910008106.9A Active CN109841202B (en) | 2019-01-04 | 2019-01-04 | Rhythm generation method and device based on voice synthesis and terminal equipment | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN109841202B (en) | 
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN110517656B (en) * | 2019-08-02 | 2024-04-26 | 平安科技(深圳)有限公司 | Lyric rhythm generation method, device, storage medium and apparatus | 
| CN110516103B (en) * | 2019-08-02 | 2022-10-14 | 平安科技(深圳)有限公司 | Song rhythm generation method, device, storage medium and apparatus based on classifier | 
| CN113793589A (en) * | 2020-05-26 | 2021-12-14 | 华为技术有限公司 | Speech synthesis method and device | 
| CN113658570B (en) * | 2021-10-19 | 2022-02-11 | 腾讯科技(深圳)有限公司 | Song processing method, apparatus, computer device, storage medium, and program product | 
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN106373580A (en) * | 2016-09-05 | 2017-02-01 | 北京百度网讯科技有限公司 | Method and device for synthesizing singing voice based on artificial intelligence | 
| CN106652984A (en) * | 2016-10-11 | 2017-05-10 | 张文铂 | Automatic song creation method via computer | 
| CN108231048A (en) * | 2017-12-05 | 2018-06-29 | 北京小唱科技有限公司 | Correct the method and device of audio rhythm | 
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO2010048636A1 (en) * | 2008-10-24 | 2010-04-29 | Magnaforte, Llc | Media system with playing component | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN109841202A (en) | 2019-06-04 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| CN109841202B (en) | | Rhythm generation method and device based on voice synthesis and terminal equipment |
| CN109166564B (en) | | Method, apparatus and computer readable storage medium for generating a musical composition for a lyric text |
| JP7642335B2 (en) | | Information processing device, method, and program |
| CN109920449B (en) | | Beat analysis method, audio processing method, device, equipment and medium |
| CN110246472B (en) | | Music style conversion method and device and terminal equipment |
| EP1970895A1 (en) | | Speech synthesis apparatus and method |
| CN109461459A (en) | | Speech assessment method, apparatus, computer equipment and storage medium |
| CN114267318B (en) | | Midi music file generation method, storage medium and terminal |
| CN113010730B (en) | | Music file generation method, device, equipment and storage medium |
| CN109326270A (en) | | Generation method, terminal device and the medium of audio file |
| CN116645957B (en) | | Music generation method, device, terminal, storage medium and program product |
| JP7363107B2 (en) | | Idea support devices, idea support systems and programs |
| EP3779814A1 (en) | | Method and device for training adaptation level evaluation model, and method and device for evaluating adaptation level |
| CN109859739B (en) | | Melody generation method and device based on voice synthesis and terminal equipment |
| CN113140230A (en) | | Method, device and equipment for determining pitch value of note and storage medium |
| CN111399745B (en) | | Music playing method, music playing interface generation method and related products |
| CN119137653A (en) | | Computational systems and methods for music generation |
| CN112989109B (en) | | Music structure analysis method, electronic device and storage medium |
| US9040799B2 (en) | | Techniques for analyzing parameters of a musical performance |
| US20240087549A1 (en) | | Musical score creation device, training device, musical score creation method, and training method |
| JPH0736478A (en) | | Calculating device for similarity between note sequences |
| CN113658570B (en) | | Song processing method, apparatus, computer device, storage medium, and program product |
| JP5513985B2 (en) | | CHARACTER VECTOR GENERATION DEVICE, CHARACTER VECTOR GENERATION METHOD, PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM CONTAINING THE PROGRAM |
| JP3371761B2 (en) | | Name reading speech synthesizer |
| CN112632401A (en) | | Recommendation device, information providing system, recommendation method, and storage medium |
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |