CN109979257B - Method for performing accurate splitting operation correction based on English reading automatic scoring - Google Patents
- Publication number
- CN109979257B (CN201910346958.9A)
- Authority
- CN
- China
- Prior art keywords
- score
- voice
- word
- letter
- english
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- 
        - G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
 
- 
        - G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers
- G09B7/02—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
 
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- General Physics & Mathematics (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention relates to a method for accurate correction by splitting analysis, based on automatic scoring of English reading. Using an HMM posterior probability algorithm as the scoring basis, the method analyzes English reading speech at three levels: whole sentence, word, and syllable/phoneme. Speech features are extracted from the whole-sentence speech and scored against a standard reference model. Sentences whose score fails to reach the standard are split and analyzed to find the misread words, and the pronunciation of each such word is further split at the phoneme and syllable level to locate the mispronounced position precisely. The standard pronunciation and related knowledge points are then retrieved so that the user can correct and learn, and quickly master the key points of English reading.
    Description
Technical Field
      The invention relates to the technical field of speech recognition and reading scoring, and in particular to a method for accurate correction learning by splitting analysis, based on automatic scoring of English reading.
    Background
      Research on Chinese speech recognition technology has developed rapidly in recent years. The overall level of this research is broadly in step with international work, and in certain subfields China holds an advantage and has reached an internationally advanced level. Research in the subfield of automatic scoring and evaluation of English reading, however, lags behind. Searches of the patent search and analysis website of the national intellectual property office, using various combinations of terms such as 'speech and scoring', 'English reading', 'spoken language' and 'English', return very few results. Representative technical schemes for scoring read speech are: CN200810226674, a method for automatic evaluation and diagnosis of text reading level for spoken language tests; CN201711200048, a text-related pronunciation error detection and quality scoring method for spoken English; and CN201811030689, a method for implementing an automatic speech evaluation platform for spoken English teaching. The prior art lacks a technical scheme in this area that is efficient, convenient, systematic and practical.
      Disclosure of Invention
      In view of the problems described in the background, the invention uses an HMM posterior probability algorithm as the scoring basis and analyzes English reading speech at three levels: whole sentence, word, and syllable/phoneme. Speech features are first extracted from the whole-sentence speech, compared with a standard reference model, and scored. Whole-sentence speech whose score fails to reach the standard is split and analyzed to obtain the misread words; the pronunciation of those words is further split into phonemes and syllables to locate the pronunciation error precisely; and speech feature reference libraries are built for the specific phoneme combinations related to the pronunciation rules. Knowledge from the discipline of English pronunciation is integrated into the practice of scoring and testing English reading, and the standard pronunciation and related knowledge points are retrieved for the user to correct and learn, so that reading knowledge is mastered quickly.
      English reading is scored on text-based sentences. Let the observation sequence of a student's English reading speech be $y = (y_1, y_2, \ldots, y_T)$ and a state sequence in the standard reference model be $s = (s_1, s_2, \ldots, s_T)$; the probability that the model generates the observation sequence $y$ along $s$ is $P(y \mid s)$. After the phonemes are aligned during decoding with the Viterbi algorithm, the state sequence $S$ that most probably corresponds to the observation sequence $y$ is selected, and the log posterior probability under the hidden Markov statistical model is computed from it. The posterior probability of phoneme $q_i$ for each frame $y_t$ of the $i$-th speech segment is given by formula 1:

$$P(q_i \mid y_t) = \frac{p(y_t \mid q_i)\,P(q_i)}{\sum_{j=1}^{Z} p(y_t \mid q_j)\,P(q_j)} \tag{1}$$

      Taking the logarithm and summing over the frames gives the log posterior probability score of the speech segment corresponding to phoneme $q_i$, formula 2:

$$\rho_i = \sum_{t=\tau_i}^{\tau_{i+1}-1} \log P(q_i \mid y_t) \tag{2}$$

      where $\tau_i$ is the start time of the $i$-th speech segment corresponding to phoneme $q_i$, $Z$ is the total number of phonemes in the speech, $P(q_j)$ is the prior probability of phoneme $q_j$, and $p(y_t \mid q)$ is the probability distribution of the observation vector $y_t$ given phoneme $q$. The mean log posterior probability score over all phoneme segments is then formula 3:

$$\rho = \frac{1}{Z} \sum_{k=1}^{Z} \frac{\rho_k}{d_k} \tag{3}$$

      where $d_k$ is the number of frames for which the $k$-th phoneme lasts. Whether the reading score reaches the standard is determined by comparing the calculated score with the passing score set by the system.
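The scoring described above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: it assumes the common segment-level form of log posterior scoring (a per-frame posterior, a per-segment sum of logs, then a mean of duration-normalized segment scores), and the function names, the toy likelihoods and the threshold of -0.5 are all hypothetical.

```python
import math

def frame_posterior(likelihoods, priors, i):
    """Formula 1: posterior of phoneme i for a single frame.

    likelihoods[j] stands for p(y_t | q_j) and priors[j] for P(q_j).
    """
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[i] * priors[i] / evidence

def segment_log_posterior(frames, priors, i):
    """Formula 2: sum of log posteriors over the frames of one segment."""
    return sum(math.log(frame_posterior(f, priors, i)) for f in frames)

def sentence_score(segments, priors):
    """Formula 3: mean over segments of the duration-normalized score.

    segments is a list of (phoneme_index, frames) pairs.
    """
    per_segment = [
        segment_log_posterior(frames, priors, i) / len(frames)
        for i, frames in segments
    ]
    return sum(per_segment) / len(per_segment)

# Toy data (hypothetical): two phonemes with uniform priors.
priors = [0.5, 0.5]
segments = [
    (0, [[0.9, 0.1], [0.8, 0.2]]),  # phoneme 0 lasts two frames
    (1, [[0.3, 0.7]]),              # phoneme 1 lasts one frame
]
score = sentence_score(segments, priors)
passed = score >= -0.5  # -0.5 is an arbitrary illustrative passing score
```

In a real system the per-frame likelihoods would come from the acoustic model after Viterbi alignment; here they are supplied by hand so the arithmetic can be followed.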
      The method divides reading scoring into a test mode and a practice evaluation mode. The test mode only tests and scores the reading speech. In practice evaluation mode, when the speech score of an English sentence reaches the standard, the learner is taken to have learned to read that text and proceeds directly to the next text without further processing.
      When the score does not reach the standard, the exact words or syllables that were mispronounced must be identified, so that the student knows the specific error, can learn and correct it precisely, and is convinced by the result of the reading score. The whole-sentence speech therefore needs to be split into words and syllables, and the split speech recognized to compute the word speech segments, syllable speech segments, and so on.
      In the prior art, the unsupervised Bayesian model proposed by Herman Kamper, Aren Jansen and Sharon Goldwater in 2016 can segment unlabeled speech and then cluster the segments into hypothesized words. The error rate of that model is about 20 percent, which does not meet the requirements of English reading scoring. English reading scoring, however, is generally text-based, and the range of the text is limited to a very small scope, which provides a more detailed basis for splitting sentence speech into words and syllables. The specific implementation steps are as follows:
      Step 1. Unlike Chinese text, English text separates words with spaces. The English text is therefore converted into a word array a by a function such as split, using the space as the character that marks substring boundaries, i.e. a = split(text). A run of consecutive letters containing the apostrophe (') used in contractions is treated as one word.
      Step 2. Obtain the speech of the specified English word through a third-party speech interface. In one embodiment, the English text is submitted by POST to the website of the Baidu speech development platform to obtain a speech file in a format such as mp3 (alternatively, the speech of a particular text word is obtained through a text-to-speech engine or the like).
      Step 3. Obtain speech features through pre-analysis and convert them into a new standard reference model M; record the duration S of the word speech, and initially assume that the word is also read for duration S in the tested reading speech.
      Step 4. Take the interval of the tested reading speech with start time 1 and end time S as a new tested reading speech, compare it with M from step 3, and calculate a score J using formulas 1, 2 and 3.
      Step 5. Form a group of new tested reading speech segments by successively adding 1 to both the start time and the end time; compare each with M from step 3, calculating its score using formulas 1, 2 and 3, until the end time equals the duration of the original tested speech.
      Step 6. Compare the scores calculated in steps 4 and 5 to obtain the maximum value A and the corresponding start time T1 and end time T2.
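Steps 1 and 4 to 6 above can be illustrated with a short sketch. It is only an outline under stated assumptions: `split_words`, `best_fixed_window` and `score_fn` are hypothetical names, `score_fn` stands in for the comparison against the reference model M, punctuation stripping is an added convenience, and times are 0-based frame indices with an exclusive end rather than the 1-based times of the text.

```python
import string

def split_words(text):
    """Step 1: split on whitespace and keep apostrophes inside words."""
    strip_chars = string.punctuation.replace("'", "")
    words = [token.strip(strip_chars) for token in text.split()]
    return [w for w in words if w]

def best_fixed_window(frames, window_len, score_fn):
    """Steps 4-6: slide a window of window_len frames over the utterance.

    Returns (best_score, start, end) with a 0-based start and exclusive end.
    """
    if not 0 < window_len <= len(frames):
        raise ValueError("window_len must be between 1 and len(frames)")
    best = None
    for start in range(len(frames) - window_len + 1):
        end = start + window_len
        score = score_fn(frames[start:end])
        if best is None or score > best[0]:
            best = (score, start, end)
    return best

words = split_words("I don't know, Tom.")
# Toy frames (hypothetical): 1 marks frames that match the word, 0 the rest.
best = best_fixed_window([0, 0, 1, 1, 1, 0], 3, sum)
```

Using the reference duration as the window length keeps the search linear in the utterance length; the assumption that the learner reads the word for exactly that duration is relaxed in the later steps.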
      The above steps assume that the word is read for exactly the standard duration, so the result needs to be optimized and corrected: the start time and end time from step 6 are each shifted earlier and later, and the time period and duration of the word speech are corrected by taking the best value found by comparing scores. The specific steps are as follows:
      Step 7. Loop over the tested reading speech, subtracting 1 from the start time T1 each time while keeping the end time T2, to form new tested speech segments, until the decremented start time equals 1. Compare the acoustic features of each segment with M from step 3 to obtain a score, and compare it with the score A from step 6: if the score is greater than A, set A to the current score and T1 to the corresponding start time; if the score is less than A, exit the loop that decrements the start time.
      Step 8. Loop, subtracting 1 from the end time T2 each time while keeping the start time T1, to form new tested speech segments, until the decremented end time equals T1. Compare the acoustic features of each segment with M from step 3 to obtain a score, and compare it with the score A from step 7: if the score is greater than A, set A to the current score and T2 to the corresponding end time; if the score is less than A, exit the loop that decrements the end time.
      Step 9. Loop, adding 1 to the start time T1 each time while keeping the end time T2, to form new tested speech segments, until the incremented start time equals T2. Compare the acoustic features of each segment with M from step 3 to obtain a score, and compare it with the score A from step 8: if the score is greater than A, set A to the current score and T1 to the corresponding start time; if the score is less than A, exit the loop that increments the start time.
      Step 10. Loop, adding 1 to the end time T2 each time while keeping the start time T1, to form new tested speech segments, until the incremented end time equals the overall duration of the original tested speech. Compare the acoustic features of each segment with M from step 3 to obtain a score, and compare it with the score A from step 9: if the score is greater than A, set A to the current score and T2 to the corresponding end time; if the score is less than A, exit the loop that increments the end time.
      Step 11. Record the word together with its start time, end time, score and other data on the reading speech obtained through the above steps. Repeat steps 2 to 10 to obtain, for every word split out in step 1, the corresponding start time $T1_i$, end time $T2_i$ and score $A_i$ on the reading speech, where the index i is the position of the word in the text sentence.
      Step 12. If a word's score is below the error threshold set by the system, i.e. the word is judged to be read incorrectly, retrieve the text mapped to the current word and display it on the designated user interface to alert the user to the pronunciation error. A playback button is linked to the word speech obtained in step 2, with a corresponding program configured so that the student hears the standard word speech by clicking the button. Phoneme- and syllable-level splitting analysis is then performed on the word.
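The boundary correction of steps 7 to 10 and the threshold check of step 12 might look roughly like the sketch below. It assumes a hypothetical `score_fn` that scores a candidate segment against the reference model M, uses 0-based frame indices with an exclusive end, and keeps moving a boundary when the score is unchanged, since the text only specifies the cases "greater than A" and "less than A".

```python
def refine_boundaries(frames, start, end, score_fn):
    """Steps 7-10: move one boundary at a time while the score improves."""
    best = score_fn(frames[start:end])
    # Order follows the text: start earlier, end earlier, start later, end later.
    for side, step in (("start", -1), ("end", -1), ("start", 1), ("end", 1)):
        s, e = start, end
        while True:
            if side == "start":
                s += step
            else:
                e += step
            if s < 0 or e > len(frames) or s >= e:
                break  # ran out of speech, or the segment became empty
            score = score_fn(frames[s:e])
            if score > best:
                best, start, end = score, s, e
            elif score < best:
                break  # the score got worse: stop moving this boundary
    return best, start, end

def flag_misread(word_results, threshold):
    """Step 12: words whose best score is below the error threshold."""
    return [w for w, (score, _s, _e) in word_results.items() if score < threshold]

# Toy frames (hypothetical): 1 matches the word, -1 does not.
frames = [-1, -1, 1, 1, 1, 1, -1]
refined = refine_boundaries(frames, 3, 5, sum)
misread = flag_misread({"cat": (0.9, 0, 3), "dog": (0.2, 3, 6)}, 0.5)
```

Because each boundary is moved greedily and independently, the result is a local optimum around the window found in step 6, not a guaranteed global best segment.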
      A phoneme is the smallest speech segment that makes up a syllable; it has distinctive features that people can recognize and is the smallest linear speech unit divided on acoustic grounds. In terms of the articulatory actions within a syllable, one articulatory action forms one phoneme. The English international phonetic symbols comprise 48 phonemes, of which 20 are vowel phonemes and 28 are consonant phonemes. There are 26 English letters, of which 5 are vowel letters, 2 are semivowel letters and 19 are consonant letters. Splitting English pronunciation at the syllable, phoneme and word levels therefore reaches the most basic core standard of English pronunciation, helps users, especially beginners, quickly grasp the key elements of English reading, and resolves the problems they face in learning it. To help the user locate errors precisely and correct the reading pronunciation, the speech segments of a word are further divided by phoneme for analysis. The phonemes obtained in the prior art by framing, windowing, decoding, discrete Fourier transform and similar techniques are only the result of probability-based observation; they cannot serve as a teaching standard and cannot be matched to the rules of English pronunciation. The invention therefore further splits each word whose pronunciation score fails in step 12 into syllables and phonemes and helps the user analyze it. The specific implementation is as follows:
      s1, separating letters in the word according to bytes, wherein the embodiment is as follows: using the MID function, the MID character string function, the role is to intercept the specified number of characters from a character string, MID (text, start _ num, num _ char) text intercepts num _ char from the 1 st bit from the left to the right for the word start _ num to be split, 1 character length (expressed by numbers) is intercepted from the left to the right, through the cycle of increasing from 1 to the word character length, the character array X (len-1) of the word is obtained, len is the word byte length.
      S2, creating a knowledge base of English phonemic phonetic symbols, wherein the knowledge base comprises 48 international phonetic symbols, namely front vowels/i ː/,/ɪ/,/e/,/æ/; the middle vowel is the first vowel, and the back vowel is the second vowel; opening and closing the vowels: /e ɪ/,/a ɪ/,/ɔ ɪ/,/a ʊ/,/ə ʊ/; .... a.clear consonants.. a.; nasal sound: /,/n/,/ŋ/; ......... Setting corresponding database table columns such as categories, reading knowledge points, matched standard voice storage paths and voice acoustic features thereof for each phoneme record, wherein the recording knowledge points are as follows: the symbol of this phonetic symbol in English phonetic symbol is/e/, the pronunciation symbol corresponding to American phonetic symbol is [ ɛ ], and its pronunciation concrete skill is: 1) the lips are slightly separated towards the two sides, and the distance between the upper teeth and the lower teeth can approximately accommodate the tip of a little finger; 2) the front part of the tongue is lifted in the process of pronunciation, and the tip of the tongue slightly contacts the back of the lower tooth; 3) when the user pronounces, the chin gradually moves downwards to vibrate the vocal cords to give out the sound/e/E. Note that: the/e/short vowel, note the differences with/ɜ ː/,/ə/, and so on, and other phonemes also record their associated knowledge.
      S3, creating a comparison library about the corresponding relation of phonetic symbols, letters and letter combinations, firstly adding 26 letters and common letter combinations in English pronunciation rules and corresponding phonetic symbols to corresponding tables of a database corresponding to the rule library, adding special letter combinations such as oo and ee only reading one phonetic symbol in foot and meet words, and simultaneously creating columns of letter classification and the like, recording information such as the categories of the letters such as vowels, consonants and the like, namely inputting information of English conventional knowledge English phonetic symbols and letter combination comparison into the comparison library.
      S4, creating an English pronunciation rule library, arranging various English pronunciation rules into expression forms which can be logically operated by the convenient program, and classifying according to the mode of logical operation: 1. character characteristic features: the phonetic symbol of the record containing these feature keywords is the default pronunciation of the corresponding letter or letter-letter combination, and such feature keywords are combined in the same record by feature symbols such as & symbol interval, for example: open syllable & stress; 2. enumerating: words or sentences containing the currently recorded letter or letter combination are pronounced as the currently recorded phonetic symbol are listed to the present record, with different words or sentences being separated by characteristic symbols.
      S5, based on the English pronunciation rule of the word end e unvoiced, judging whether the last digit of the X (len-1) character array obtained in the step S1 is e, if so, forcibly subtracting the last digit of the character array, namely len = len-1.
      S6. Create two variables A and Z, and assign the initial value A = 0.
      S7. If len minus A is greater than or equal to 4 (a conventional letter combination has at most 4 letters, e.g. one pronounced [ʃ]), then Z = 4; otherwise Z = len-A. Reassign A: A = A+1. Take the A-th to Z-th characters from the X(len-1) character array, combine them, search for the combination in the comparison library of common pronunciation letter combinations, and proceed according to the search result:
      1. If several records are retrieved, pass the characters of the current combination to the custom function guizefunction(current combination, word, A, Z) of step S9. If A + Z > len, go directly to step S10; otherwise reassign A: A = A+Z-1 and restart this step.
      2. If exactly one record is retrieved, record its phonetic symbol together with the values of A and Z. If A + Z > len, go directly to step S10; otherwise reassign A: A = A+Z-1 and restart this step.
      3. If no record is retrieved, go on to the next step.
      S8. If Z = 1, jump to step S7; otherwise set Z = Z-1, take the A-th to Z-th characters, combine them, search for the combination in the library of common pronunciation letter combinations, and proceed according to the search result:
      1. If exactly one record is retrieved, record its phonetic symbol together with the values of A and Z, reassign A: A = A+Z-1, and jump directly to step S7.
      2. If several records are retrieved, pass the characters of the current combination to the custom function guizefunction(current combination, word, A, Z) of step S9; then reassign A: A = A+Z-1 and jump directly to step S7.
      3. If no record is retrieved, repeat this step, looping until Z is 1.
      S9. The custom rule function guizefunction(str, str1, Index1, Index2):
      a. Search the English pronunciation rule library for the string str. If the 'enumeration' column of a retrieved record contains the current word, return that record's phonetic symbol as the result of the function, record the phonetic symbol together with the values of Index1 and Index2, and terminate the function; otherwise check the next record.
      b. Determine whether the letter immediately after str in str1 is a consonant letter. Set up two string variables tex and texx. If Index1 + Index2 + 1 > len(word), then tex = right(word, 1); otherwise tex = MID(str1, Index1 + Index2 + 1, 1). If tex is "r", "w" or "y", assign texx = "open syllable"; otherwise look up the record for the letter or letter combination tex in the rule library, and assign texx = "open syllable" if it contains "vowel", or texx = "closed syllable" if it does not.
      c. Search the rule library for str and check, record by record, whether the feature keyword column contains the content of texx. If it does, return the phonetic symbol of that record as the result of the function and record it together with the values of Index1 and Index2; otherwise check the next matching record.
      S10. Having separated the English word into letters and letter combinations with their corresponding phonetic symbols, retrieve the related knowledge points and display them on the user interface for the user to learn and master.
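The letter-to-phonetic-symbol procedure of S1 and S5 to S9 can be approximated as a greedy longest-match lookup. The sketch below is an interpretation, not the patent's code: the in-memory `table` and `rules` structures replace the database libraries, `VOWEL_LETTERS` replaces the category lookup of step b, feature keywords are stored as a list rather than an &-separated string, positions are 1-based and inclusive, and all function names are hypothetical.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: stands in for the library's category column

def mid(text, start_num, num_chars):
    """1-based substring, mirroring the MID string function of S1."""
    return text[start_num - 1:start_num - 1 + num_chars]

def resolve(chunk, word, start, length, rules):
    """S9: choose a symbol when a letter combination has several readings.

    start is the 1-based position of chunk in word; rules is a list of
    dicts with keys 'letters', 'symbol', 'examples' and 'features'.
    """
    matching = [r for r in rules if r["letters"] == chunk]
    for rule in matching:                     # a. enumerated words win
        if word in rule["examples"]:
            return rule["symbol"]
    nxt_pos = start + length                  # b. letter after the chunk
    nxt = mid(word, nxt_pos, 1) if nxt_pos <= len(word) else word[-1]
    is_open = nxt in "rwy" or nxt in VOWEL_LETTERS
    syllable = "open syllable" if is_open else "closed syllable"
    for rule in matching:                     # c. match the feature keyword
        if syllable in rule["features"]:
            return rule["symbol"]
    return None

def to_symbols(word, table, rules, max_len=4):
    """S5-S8: greedy longest-match lookup of letter combinations.

    Returns (symbol, start, end) triples with 1-based inclusive positions.
    """
    letters = word[:-1] if len(word) > 1 and word.endswith("e") else word  # S5
    out, pos = [], 1
    while pos <= len(letters):
        for size in range(min(max_len, len(letters) - pos + 1), 0, -1):
            chunk = mid(letters, pos, size)
            symbols = table.get(chunk, [])
            if not symbols:
                continue                      # try a shorter combination
            if len(symbols) == 1:
                symbol = symbols[0]
            else:
                symbol = resolve(chunk, word, pos, size, rules)
            out.append((symbol, pos, pos + size - 1))
            pos += size
            break
        else:
            pos += 1                          # letter missing from the table: skip it
    return out

# Toy libraries (hypothetical entries, far smaller than a real rule base).
table = {"m": ["m"], "t": ["t"], "ee": ["iː"], "e": ["e", "iː"]}
rules = [
    {"letters": "e", "symbol": "iː", "examples": [], "features": ["open syllable"]},
    {"letters": "e", "symbol": "e", "examples": [], "features": ["closed syllable"]},
]
meet = to_symbols("meet", table, rules)
met = to_symbols("met", table, rules)
```

Trying the longest combination first means that "ee" in "meet" is matched as one unit before the single letter "e" is considered, which is the behavior the shrinking Z loop of S7 and S8 is meant to achieve.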
      In actual English reading, a phoneme often departs from its standard pronunciation under the influence of neighboring phonemes, and syllables are more commonly used as the basic unit for learning English reading pronunciation, so splitting a word into syllables effectively helps the user master the correct reading.
      According to the pronunciation rules of English words, splitting a word into syllables follows these principles: 1) "one goes to the next": if there is one consonant letter between two vowels, it is assigned to the following syllable; 2) "two are divided": if there are two consonant letters between two vowels, one is assigned to the preceding syllable and one to the following syllable. A common pitfall: syllables are divided by vowel sounds, so a vowel letter that is not pronounced cannot form a syllable, and two vowel letters that occur together but are pronounced as one vowel still form a single syllable. Boundary rule: vowel phonemes are the body of a syllable, and consonants are the syllable boundaries. Each vowel phoneme can form a syllable, so syllable splitting helps users learn English reading more accurately. Syllable splitting of English words is implemented as follows:
      Step 1. Using the phoneme-splitting scheme above, form the group of phonetic symbols corresponding to the letters and letter combinations of the English word, together with the start position and end position of the letters corresponding to each phonetic symbol.
      Step 2. Look up each phonetic symbol in the knowledge base to obtain the phoneme or group of phonemes classified as vowels.
      Step 3. Using the position values from step 1 and a function such as MID, obtain in turn the letters lying between the letters or letter groups of consecutive vowels. If there is only one letter, subtract 1 from the start position of the following vowel; if there are two letters, add 1 to the end position of the preceding vowel.
      Step 4. Step 3 yields the group of vowel phonemes with new start and end positions; from these positions compute the corresponding letter or letter group for each, and output them to the user interface as the result of splitting the word into syllables.
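A small sketch may help make the position arithmetic of steps 1 to 4 concrete. It takes the vowel letter spans as given (1-based, inclusive) and applies the one-consonant and two-consonant rules. Several choices are assumptions not stated in the text: with two consonants the second is explicitly moved to the following syllable, with three or more the first goes to the preceding syllable and the rest to the following one, and leading and trailing consonants are attached to the first and last syllables. The function name is hypothetical.

```python
def split_syllables(word, vowel_spans):
    """Split word into syllables from the letter spans of its vowel phonemes.

    vowel_spans is an ordered list of (start, end) pairs, 1-based inclusive.
    """
    if not vowel_spans:
        return [word]
    spans = [list(span) for span in vowel_spans]
    for prev, nxt in zip(spans, spans[1:]):
        gap = nxt[0] - prev[1] - 1        # consonant letters between two vowels
        if gap == 1:
            nxt[0] -= 1                   # one consonant: joins the next syllable
        elif gap >= 2:
            prev[1] += 1                  # first consonant closes the previous syllable
            nxt[0] = prev[1] + 1          # the rest open the next syllable
    spans[0][0] = 1                       # leading consonants join the first syllable
    spans[-1][1] = len(word)              # trailing consonants join the last syllable
    return [word[start - 1:end] for start, end in spans]

paper = split_syllables("paper", [(2, 2), (4, 5)])    # vowels: a, er
letter = split_syllables("letter", [(2, 2), (5, 6)])  # vowels: e, er
```

The spans would normally come from the phonetic symbols found in the phoneme-splitting stage, filtered to those the knowledge base classifies as vowels.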
      Particularly, it is stated that: reference throughout this specification to "an embodiment," or the like, means that a particular feature, element, or characteristic described in connection with the embodiment is included in embodiments described generally throughout this application. The appearances of the same phrase in various places in the specification are not necessarily all referring to the same embodiment. That is, when a particular feature, element, or characteristic is described in connection with any embodiment, it is submitted that it is within the purview of the appended claims to effect such feature, element, or characteristic in connection with other ones of the embodiments; the present invention has been described with reference to a number of illustrative embodiments of the logical architecture and concept of the present invention, but the scope of the invention is not limited thereto, and those skilled in the art can devise many other modifications and embodiments within the spirit and scope of the present invention, and various combinations and/or arrangements of the elements of the present invention, and other uses will be apparent to those skilled in the art, and insubstantial changes or substitutions in the implementation can be easily made, which will fall within the spirit and scope of the principles of the present invention.
    Drawings
    FIG. 1 is a diagram of an overall logical framework of a method for performing splitting operation accurate correction learning based on English reading automatic scoring.
  Claims (4)
1. A method for accurate correction learning by splitting analysis, based on automatic scoring of English reading, characterized by comprising the following steps and elements: reading scoring is divided into a test mode and a practice evaluation mode, wherein the test mode only tests and scores the read speech, and in practice evaluation mode, when the speech score of an English sentence reaches the standard, the next text is read directly; when the score does not reach the standard, the mispronounced words are identified by a cyclic, recursive score-and-split recognition method, in which the text is first split into a word array, the standard speech of each word and its acoustic features are obtained as a standard reference model, the speech segment with the highest score is found, and its duration is then corrected by adding and subtracting time forwards and backwards, yielding the speech segment that best matches the word;
      step 1, unlike Chinese text, English text separates words with spaces, so the English text is converted into a word array a by a function such as split, using the space as the character that marks substring boundaries, and a run of consecutive letters containing the apostrophe (') used in contractions is treated as one word;
      step 2, obtaining the speech of the specified English word through a third-party speech interface;
      step 3, obtaining speech features through pre-analysis and converting them into a new standard reference model M, while recording the duration S of the word speech and initially assuming that the word is also read for duration S in the tested reading speech;
      the posterior probability of phoneme $q_i$ at each frame $o_t$ of the $i$-th speech segment is calculated by formula 1:
      $$P(q_i \mid o_t) = \frac{p(o_t \mid q_i)\,P(q_i)}{\sum_{q \in Q} p(o_t \mid q)\,P(q)} \tag{1}$$
      taking the logarithm and accumulating over frames gives the log posterior probability score of phoneme $q_i$ over its $i$-th speech segment, calculated by formula 2:
      $$\rho_i = \sum_{t=\tau_i}^{\tau_{i+1}-1} \log P(q_i \mid o_t) \tag{2}$$
      where $\tau_i$ is the starting time of the $i$-th speech segment corresponding to phoneme $q_i$, $Z$ is the total number of phonemes in the speech, $Q$ is the set of all phonemes, and $p(o_t \mid q)$ is the probability distribution of the observation vector $o_t$ given phoneme $q$, so that the mean log posterior probability score over all phoneme segments is formula 3:
      $$\rho = \frac{1}{Z}\sum_{k=1}^{Z}\frac{\rho_k}{d_k} \tag{3}$$
      where $d_k$ is the number of frames for which the $k$-th phoneme persists; whether the reading speech score reaches the standard is determined by comparing the calculated score with the passing score set by the system;
      step 4, from the tested reading speech, taking the interval from starting time 1 to ending time S as a new tested speech segment, comparing it with M in step 3, and calculating a score J by formula 1, formula 2 and formula 3;
      step 5, from the tested reading speech, cyclically adding 1 to both the starting time and the ending time to form a group of new tested speech segments, comparing each of them with M in step 3 until the ending time equals the duration of the original tested speech, and calculating each score by formula 1, formula 2 and formula 3;
      step 6, comparing the scores calculated in steps 4 and 5 to obtain the maximum value A and the parameters corresponding to it, such as the starting time T1 and the ending time T2.
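Steps 1 and 3 to 6 of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `split_words`, `mean_log_posterior` and `best_window` are hypothetical, frame indices are 0-based with an exclusive end (the claim counts from 1), and the caller-supplied `score_fn` stands in for the comparison against reference model M. `mean_log_posterior` further assumes that each phoneme's summed log posterior is divided by its frame count before averaging over phonemes.

```python
import re
from typing import Callable, List, Sequence, Tuple


def split_words(text: str) -> List[str]:
    """Step 1: split a sentence into words; letters joined by an
    apostrophe (e.g. "don't") are kept together as one word."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)


def mean_log_posterior(log_post: Sequence[float], starts: Sequence[int]) -> float:
    """Average, over phonemes, of (sum of frame log posteriors / frame count).

    log_post[t] is the log posterior of the aligned phoneme at frame t.
    starts[i] is the first frame of phoneme i; the last phoneme is assumed
    to run to the end of log_post.
    """
    bounds = list(starts) + [len(log_post)]
    per_phoneme = [sum(log_post[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]
    return sum(per_phoneme) / len(per_phoneme)


def best_window(total: int, width: int,
                score_fn: Callable[[int, int], float]) -> Tuple[float, int, int]:
    """Steps 4-6: slide a window of `width` frames one frame at a time over a
    recording of `total` frames and return (A, T1, T2) for the best score."""
    best = None
    for start in range(0, total - width + 1):
        end = start + width
        score = score_fn(start, end)
        if best is None or score > best[0]:
            best = (score, start, end)
    if best is None:
        raise ValueError("window is longer than the recording")
    return best
```

In practice `score_fn(start, end)` would extract the frames in that interval, align them against M, and return a score such as the one computed by `mean_log_posterior`; here it is left abstract so the search logic can be read on its own.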
    2. The method for performing accurate correction learning of splitting operation based on automatic English reading scoring according to claim 1, further comprising the following steps and elements: starting from the highest-scoring speech segment obtained in claim 1, the segment duration is corrected by extending and shortening it forwards and backwards to obtain a speech segment that better matches the word, continuing from the steps of claim 1:
      step 7, in the tested reading speech, cyclically decrementing the starting time T1 by 1 while keeping the ending time T2, and taking each resulting interval as a new tested speech segment, until the decremented starting time equals 1; comparing the acoustic characteristics of each speech segment obtained in the cycle with M in step 3 to obtain a score, and comparing the score with the score A of step 6; if the score is greater than A, setting A to the current score and T1 to the starting time corresponding to the current score; if the score is less than A, jumping out of the cycle of decrementing the starting time;
      step 8, cyclically decrementing the ending time T2 by 1 while keeping the starting time T1, and taking each resulting interval as a new tested speech segment, until the decremented ending time equals T1; comparing the acoustic characteristics of each speech segment obtained in the cycle with M in step 3 to obtain a score, and comparing the score with the score A of step 7; if the score is greater than A, setting A to the current score and T2 to the ending time corresponding to the current score; if the score is less than A, jumping out of the cycle of decrementing the ending time;
      step 9, in the tested reading speech, cyclically incrementing the starting time T1 by 1 while keeping the ending time T2, and taking each resulting interval as a new tested speech segment, until the incremented starting time equals T2; comparing the acoustic characteristics of each speech segment obtained in the cycle with M in step 3 to obtain a score, and comparing the score with the score A of step 8; if the score is greater than A, setting A to the current score and T1 to the starting time corresponding to the current score; if the score is less than A, jumping out of the cycle of incrementing the starting time;
      step 10, cyclically incrementing the ending time T2 by 1 while keeping the starting time T1, and taking each resulting interval as a new tested speech segment, until the incremented ending time equals the total duration of the original tested speech; comparing the acoustic characteristics of each speech segment obtained in the cycle with M in step 3 to obtain a score, and comparing the score with the score A of step 9; if the score is greater than A, setting A to the current score and T2 to the ending time corresponding to the current score; if the score is less than A, jumping out of the cycle of incrementing the ending time;
      step 11, recording the word together with the corresponding data obtained through the above steps, such as its starting time, ending time and score on the read speech; repeating steps 2 to 10 to obtain, for every word split in step 1, the corresponding starting time $T1_i$, ending time $T2_i$ and score $A_i$ on the read speech, where the subscript i is the sequence number of the word in the text sentence;
      step 12, if the score of a word is lower than the error threshold set by the system, that is, the word is judged to be mispronounced, retrieving the text mapped to the current word, displaying it on a specific user interface to remind the user of the pronunciation error, providing a clickable playback marker linked to the word speech obtained in step 2 so that the student can hear the standard word speech by clicking the marker, and performing phoneme-level and syllable-level splitting analysis on the word.
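The boundary correction of steps 7 to 10 amounts to four greedy one-directional searches around the best window. The following Python sketch illustrates that reading; the function name `refine_boundaries`, the 0-based indices with exclusive end, and the caller-supplied `score_fn` (a stand-in for the comparison with M) are assumptions made for the example. As in the claim, a candidate that ties with A neither updates the result nor stops the loop.

```python
from typing import Callable, Tuple


def refine_boundaries(total: int, t1: int, t2: int, a: float,
                      score_fn: Callable[[int, int], float]) -> Tuple[float, int, int]:
    """Greedily adjust [t1, t2) around the best window found earlier.

    total    -- number of frames in the whole recording
    t1, t2   -- start (inclusive) and end (exclusive) of the best window
    a        -- score of that window
    score_fn -- hypothetical scorer for a candidate interval
    """
    # Step 7: move the start earlier while the score does not drop.
    for start in range(t1 - 1, -1, -1):
        score = score_fn(start, t2)
        if score > a:
            a, t1 = score, start
        elif score < a:
            break

    # Step 8: move the end earlier, never past the start.
    for end in range(t2 - 1, t1, -1):
        score = score_fn(t1, end)
        if score > a:
            a, t2 = score, end
        elif score < a:
            break

    # Step 9: move the start later, never past the end.
    for start in range(t1 + 1, t2):
        score = score_fn(start, t2)
        if score > a:
            a, t1 = score, start
        elif score < a:
            break

    # Step 10: move the end later, up to the end of the recording.
    for end in range(t2 + 1, total + 1):
        score = score_fn(t1, end)
        if score > a:
            a, t2 = score, end
        elif score < a:
            break

    return a, t1, t2
```

Because each loop stops at the first drop in score, the search is local: it finds the nearest improvement in each direction rather than a global optimum over all intervals.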
    3. A method for splitting and analyzing words with unqualified pronunciation scores is characterized by comprising the following steps and elements:
      S1, splitting the word into individual letters, byte by byte, giving the character array X(len-1) used in the following steps;
      S2, creating a knowledge base of English phonemes and phonetic symbols containing the 48 international phonetic symbols, with database table columns for each phoneme record such as its category, pronunciation knowledge points, storage path of the matching standard speech, and speech acoustic characteristics, and recording the knowledge points;
      S3, creating a comparison library of the correspondences between phonetic symbols and letters or letter combinations: first adding the 26 letters and the common letter combinations of the English pronunciation rules, together with their corresponding phonetic symbols, to the corresponding database table of the rule library; adding special letter combinations, such as two identical vowels placed together; and creating columns such as letter classification to record information such as whether a letter is a vowel or a consonant;
      S4, creating an English pronunciation rule library, arranging the various English pronunciation rules into forms on which a program can conveniently perform logical operations, and classifying them by mode of logical operation: 1. characteristic keywords: the phonetic symbol of a record containing these characteristic keywords is the default pronunciation of the corresponding letter or letter combination, and multiple characteristic keywords in the same record are separated by a characteristic symbol such as &; 2. enumeration: the words or sentences in which the currently recorded letter or letter combination is pronounced as the currently recorded phonetic symbol are listed in the record, with different words or sentences separated by a characteristic symbol;
      S5, based on the English pronunciation rule that e at the end of a word is not pronounced, judging whether the last element of the character array X(len-1) obtained in step S1 is e; if so, removing the last element from the character array, so that len = len-1;
      S6, creating two variables A and Z, and assigning the initial value A = 0;
      S7, if len minus A is greater than or equal to 4, then Z = 4, otherwise Z = len-A; reassigning A: A = A+1; taking the A-th to Z-th characters from the members of the character array X(len-1), combining them, looking the combination up in the comparison library of common pronunciation letter combinations, and proceeding according to the retrieval result:
      if a plurality of records are retrieved, submitting the characters of the current combination to the custom function guizefunction(characters of the current combination, word, A, Z) for the nested operation of step S9; when A+Z > len, directly executing step S10, otherwise assigning A = A+Z-1 and restarting the present step; if only a unique record is retrieved, recording the phonetic symbol of this record together with the values of A and Z; when A+Z > len, directly executing step S10, otherwise assigning A = A+Z-1 and restarting the present step; if no record is retrieved, entering the next step;
      S8, if Z = 1, jumping to step S7; otherwise assigning Z = Z-1, taking the A-th to Z-th characters, combining them, looking the combination up in the rule base of common pronunciation letter combinations, and proceeding according to the retrieval result: if only a unique record is retrieved, recording the phonetic symbol of this record together with the values of A and Z, assigning A = A+Z-1, and jumping directly to step S7;
      if a plurality of records are retrieved, submitting the characters of the current combination to the custom function guizefunction(characters of the current combination, word, A, Z) for the nested operation of step S9, assigning A = A+Z-1, and jumping directly to step S7; if no record is retrieved, restarting the present step, looping until Z is 1;
      S9, the custom rule-operation function guizefunction(str, str1, Index1, Index2):
      a. retrieving the string str in the English pronunciation rule base; if the content of the "enumerate" column of a retrieved record contains the current word, returning the phonetic symbol of that record as the result of the function, recording the phonetic symbol together with the values of Index1 and Index2, and terminating the function; otherwise checking the next record;
      b. judging whether the letter following str in str1 is a consonant letter: first setting two character variables tex and texx; if Index1+Index2+1 > len(word), then tex = right(word, 1), otherwise tex = MID(str1, Index1+Index2+1, 1); if tex is "r", "w" or "y", assigning texx = "open syllable"; otherwise looking up the record whose letter or letter combination is tex in the rule base, and assigning texx = "open syllable" if its category contains "vowel", otherwise texx = "closed syllable";
      c. retrieving str in the rule base and verifying, record by record, whether the characteristic keyword column of the record contains the content of texx; if so, returning the phonetic symbol of that record as the result of the function and recording it together with the values of Index1 and Index2; otherwise verifying the next record that meets the conditions;
      S10, separating the English word text into letters and letter combinations with their corresponding phonemes and phonetic symbols, retrieving the related knowledge points, and displaying them on the user interface for the user to learn and master.
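Steps S5 to S9 describe a longest-match lookup of letter combinations, with a rule function that resolves ambiguous records. The Python sketch below illustrates one reading of that procedure. Everything concrete in it is an assumption for the example: the miniature `RULES` table and its phonetic symbols are hypothetical stand-ins for the comparison and rule libraries, the names `syllable_type`, `pick_symbol` and `to_phonetic` are invented here, the `"?"` marker for unknown letters and the fallback to the first record are not in the claim, and the look-ahead position follows the Index1+Index2+1 expression of step S9(b) with Index1 taken as a 1-based start and Index2 as the combination length.

```python
VOWELS = set("aeiou")

# Hypothetical rule table: letter combination -> list of records.
# Each record is (phonetic symbol, characteristic keyword, enumerated words).
RULES = {
    "c": [("k", None, ())],
    "k": [("k", None, ())],
    "t": [("t", None, ())],
    "p": [("p", None, ())],
    "h": [("h", None, ())],
    "v": [("v", None, ())],
    "i": [("ɪ", None, ())],
    "sh": [("ʃ", None, ())],
    "a": [("eɪ", "open syllable", ()),
          ("æ", "closed syllable", ()),
          ("æ", None, ("have",))],
}


def syllable_type(word: str, start: int, length: int) -> str:
    """S9(b): inspect the letter at 1-based position start+length+1,
    or the last letter of the word if that position is past the end."""
    pos = start + length + 1
    tex = word[-1] if pos > len(word) else word[pos - 1]
    return "open syllable" if tex in "rwy" or tex in VOWELS else "closed syllable"


def pick_symbol(combo: str, word: str, start: int, length: int) -> str:
    """Choose a phonetic symbol for `combo` (unique record, or step S9)."""
    records = RULES[combo]
    if len(records) == 1:
        return records[0][0]
    for phone, _, words in records:      # S9(a): enumerated words win
        if word in words:
            return phone
    kind = syllable_type(word, start, length)
    for phone, keyword, _ in records:    # S9(c): match the keyword
        if keyword == kind:
            return phone
    return records[0][0]                 # assumption: fall back to first record


def to_phonetic(word: str):
    """S5-S8: drop a final 'e', then match combinations of 4 down to 1 letters."""
    letters = word[:-1] if len(word) > 1 and word.endswith("e") else word
    out, i = [], 0
    while i < len(letters):
        for size in range(min(4, len(letters) - i), 0, -1):
            combo = letters[i:i + size]
            if combo in RULES:
                out.append((combo, pick_symbol(combo, word, i + 1, size)))
                i += size
                break
        else:
            out.append((letters[i], "?"))  # assumption: mark unknown letters
            i += 1
    return out
```

With this toy table, "cake" and "cat" differ only in the syllable type found for "a", while "have" is resolved by the enumeration check before the syllable test is reached. A real rule library would be far larger and stored in database tables as the claim describes.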
    4. The method for performing accurate correction learning of splitting operation based on automatic English reading scoring according to claim 1, further comprising the following steps and elements, which provide a technical scheme for splitting English words into syllables:
      step 1, forming, from the records of claim 3, a group of phonetic symbols corresponding to the letters and letter combinations of the English word text, together with the starting and ending letter positions corresponding to each phonetic symbol;
      step 2, looking up each phonetic symbol in the knowledge base to obtain the phoneme or group of phonemes whose phonetic symbols are classified as vowels;
      step 3, according to the position values corresponding to the phonemes in step 1, using a function such as mid to obtain, in turn, the letters lying between the letters or letter groups corresponding to successive vowels; if there is only one such letter, subtracting 1 from the starting position value of the following vowel; if there are two such letters, adding 1 to the ending position value of the preceding vowel;
      step 4, from the group of phonemes and the new starting and ending position values obtained in step 3, calculating the letter or letter group corresponding to each phoneme and outputting it to the user interface as the result of splitting the word into syllables.
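The position arithmetic of claim 4 can be illustrated with a short Python sketch. The function name `split_syllables` and the 0-based inclusive positions are assumptions for the example. The claim only states the one-letter and two-letter cases; the sketch additionally assumes that, with two or more letters between vowels, the letters not given to the preceding vowel go to the following one, and that consonants before the first vowel and after the last vowel attach to the first and last syllable, so that no letter is dropped.

```python
from typing import List, Sequence, Tuple


def split_syllables(word: str, vowel_spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Split `word` into syllables around its vowel letter groups.

    vowel_spans lists, in order, the (start, end) letter positions
    (0-based, inclusive) of each phoneme classified as a vowel.
    """
    if not vowel_spans:
        return [word]
    spans = [list(span) for span in vowel_spans]

    for prev, nxt in zip(spans, spans[1:]):
        gap = nxt[0] - prev[1] - 1          # letters between the two vowels
        if gap == 1:
            nxt[0] -= 1                     # single letter joins the next vowel
        elif gap >= 2:
            prev[1] += 1                    # first letter joins the previous vowel
            nxt[0] = prev[1] + 1            # assumption: the rest join the next vowel

    spans[0][0] = 0                         # assumption: leading consonants
    spans[-1][1] = len(word) - 1            # assumption: trailing consonants
    return [word[start:end + 1] for start, end in spans]
```

For example, "basket" with vowels at letter positions 1 and 4 has two consonants between them and splits into "bas" and "ket", while "paper" with vowel groups at 1 and 3 to 4 has one consonant between them and splits into "pa" and "per".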
    Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201910346958.9A CN109979257B (en) | 2019-04-27 | 2019-04-27 | Method for performing accurate splitting operation correction based on English reading automatic scoring | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN109979257A (en) | 2019-07-05 | 
| CN109979257B (en) | 2021-01-08 | 
Family
ID=67086631
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201910346958.9A Active CN109979257B (en) | 2019-04-27 | 2019-04-27 | Method for performing accurate splitting operation correction based on English reading automatic scoring | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN109979257B (en) | 
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN110889987A (en) * | 2019-12-16 | 2020-03-17 | 安徽必果科技有限公司 | Intelligent comment method for correcting spoken English | 
| CN111402646A (en) * | 2020-03-27 | 2020-07-10 | 深圳小茜智能科技有限公司 | A kind of teaching equipment and its use method | 
| CN113707178B (en) * | 2020-05-22 | 2024-02-06 | 苏州声通信息科技有限公司 | Audio evaluation method and device and non-transient storage medium | 
| US20220223066A1 (en) * | 2021-01-08 | 2022-07-14 | Ping An Technology (Shenzhen) Co., Ltd. | Method, device, and computer program product for english pronunciation assessment | 
| CN113422825B (en) * | 2021-06-22 | 2022-11-08 | 读书郎教育科技有限公司 | System and method for assisting in culturing reading interests | 
| CN114120963B (en) * | 2021-11-25 | 2025-04-15 | 中国银行股份有限公司 | Synthesis method and device for English dubbing, storage medium and electronic device | 
| CN114245194A (en) * | 2021-12-23 | 2022-03-25 | 深圳市优必选科技股份有限公司 | Video teaching interaction method and device and electronic equipment | 
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN101739868A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Automatic evaluation and diagnosis method of text reading level for oral test | 
| CN101739869A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Priori knowledge-based pronunciation evaluation and diagnosis system | 
| CN101739867A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for scoring interpretation quality by using computer | 
| CN107103915A (en) * | 2016-02-18 | 2017-08-29 | 广州酷狗计算机科技有限公司 | A kind of audio data processing method and device | 
| CN107958673A (en) * | 2017-11-28 | 2018-04-24 | 北京先声教育科技有限公司 | A kind of spoken language methods of marking and device | 
| CN108428382A (en) * | 2018-02-14 | 2018-08-21 | 广东外语外贸大学 | It is a kind of spoken to repeat methods of marking and system | 
| CN109300339A (en) * | 2018-11-19 | 2019-02-01 | 王泓懿 | A kind of exercising method and system of Oral English Practice | 
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| KR102399535B1 (en) * | 2017-03-23 | 2022-05-19 | 삼성전자주식회사 | Learning method and apparatus for speech recognition | 
Non-Patent Citations (2)
| Title | 
|---|
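| Design and Implementation of an Automatic Chinese Pronunciation Scoring System Based on Speech Recognition; 吕军 (Lü Jun); Research and Application of Digital Chinese Teaching; 2006-07-19; pp. 541-546 * | 
| Design and Implementation of a Chinese Speech Scoring System; 王娜 (Wang Na); Journal of Mianyang Normal University; 2012-05-31; Vol. 31, No. 5; pp. 88-92 * | 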
Also Published As
| Publication number | Publication date | 
|---|---|
| CN109979257A (en) | 2019-07-05 | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |