WO2018179426A1 - Speech recognition correcting system, method, and program - Google Patents
Speech recognition correcting system, method, and program Download PDFInfo
- Publication number
- WO2018179426A1 (PCT/JP2017/013826)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- voice
- user
- speech
- position information
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 31
- 238000012937 correction Methods 0.000 claims abstract description 87
- 230000006870 function Effects 0.000 description 18
- 238000001514 detection method Methods 0.000 description 13
- 238000004891 communication Methods 0.000 description 12
- 238000005259 measurement Methods 0.000 description 10
- 238000007726 management method Methods 0.000 description 8
- 238000012545 processing Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 230000000694 effects Effects 0.000 description 3
- 239000000284 extract Substances 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000004984 smart glass Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- the present invention relates to a speech recognition correction system, method, and program.
- the sound collecting device of the system may fail to collect part of the voice due to ambient noise or the like. In such a case, a system is required that can correct the speech-recognized content and correctly recognize the voice the user intends to have recognized.
- the present invention has been made in view of such a demand, and an object of the present invention is to provide a system that, even when the sound collecting device of the system cannot collect the sound due to ambient noise or the like, can estimate the content of the voice the user intends to have recognized and correctly recognize that voice.
- the present invention provides the following solutions.
- the invention according to the first feature comprises: position information acquisition means for acquiring position information of a place visited by a user before a specific time; speech recognition means for recognizing speech uttered by the user; and correction means for correcting the speech-recognized content based on the acquired position information;
- a speech recognition correction system comprising:
- the location information acquisition unit acquires location information of a place visited by the user before a specific time.
- the correction unit corrects the speech-recognized content based on the location information acquired by the location information acquisition unit.
- even if the sound collecting device of the system cannot pick up part of the sound due to ambient noise or the like, it is possible to provide a system capable of estimating, from the location information of the place the user visited before a specific time, the content of the voice the user intends to have recognized and correctly recognizing that voice.
- the invention according to the second feature is the invention according to the first feature,
- the position information acquisition means provides a voice recognition correction system that acquires position information of a place visited by the user before a specific time from the user's portable terminal.
- the position information of the place visited by the user before a specific time is acquired from the mobile terminal owned by the user, and the content of the voice the user wants recognized is inferred from that position information. Therefore, it is possible to provide a system that further improves the accuracy of recognizing the voice.
- the invention according to the third feature is the invention according to the first or second feature,
- the correction means provides a voice recognition correction system that corrects the voice-recognized content with reference to Web content related to the acquired position information.
- the content recognized by the voice is corrected by referring to the Web content related to the position information. Therefore, it is possible to provide a system that can further increase the accuracy of recognizing voice.
- the invention according to the fourth feature is the invention according to any one of the first to third features,
- the correction means provides a voice recognition correction system that specifies weather information in the acquired position information and corrects the voice-recognized content.
- the weather information related to the location information is specified, and the speech-recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing voices regarding the weather of a place visited by a user before a specific time.
- the invention according to the fifth feature is the invention according to any one of the first to fourth features,
- the correction means provides a voice recognition correction system that specifies time information in the acquired position information and corrects the voice-recognized content.
- the time information related to the position information is specified, and the speech-recognized content is corrected. For this reason, it is possible to provide a system that can further improve the accuracy of recognizing the voice with respect to the time when the user visited the predetermined place.
- the invention according to a sixth feature is the invention according to any one of the first to fifth features, further comprising state information acquisition means for acquiring, from the portable terminal of the user, state information indicating the state of the user;
- the correction means provides a voice recognition correction system that specifies state information in the acquired position information and corrects the voice-recognized content.
- the state information in the position information is specified, and the speech-recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing voices regarding the state of the user at the place where the user has visited.
- the invention according to a seventh feature is the invention according to any one of the first to sixth features, further comprising payment information acquisition means for acquiring payment information settled by the user;
- the correction means provides a voice recognition correction system that specifies payment information in the acquired position information and corrects the voice-recognized content.
- the settlement information in the position information is specified, and the speech-recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing voices with respect to matters related to the settlement status at the place where the user has visited.
- the invention according to the eighth feature is the invention according to any one of the first to seventh features, Comprising a plurality of portable terminals and a management computer connected to the plurality of portable terminals via a network;
- the plurality of portable terminals include the position information acquisition unit and voice information acquisition unit that acquires voice information related to a voice uttered by the user,
- the management computer is configured to receive the location information and the audio information acquired by the plurality of mobile terminals,
- the management computer includes a determination unit that determines whether the portable terminal that has transmitted the position information and the portable terminal that has transmitted the audio information are the same portable terminal, and the correction unit.
- the correction unit provides a voice recognition correction system that corrects the speech-recognized content based on the acquired position information when the determination unit determines that they are the same portable terminal.
- the voice recognition correction system can thus be configured as a network type system comprising a plurality of portable terminals and a management computer connected to them via a network.
- the invention according to a ninth feature is the invention according to any one of the first to eighth features, further comprising repeating means for repeating the corrected content, and recording means for recording the corrected content when the repetition reveals no problem.
- according to the ninth aspect of the invention, since the corrected content is repeated aloud, the content corrected by the correcting means can be confirmed without paying attention to the screen display, even while the user is moving.
- the recording means records the corrected content only when the repetition reveals no problem. Therefore, according to the ninth aspect of the invention, erroneous corrected content is prevented from being recorded, and as a result a system that further improves the accuracy of recognizing the voice can be provided.
- according to the present invention, it is possible to provide a system capable of estimating, from the position information of a place visited by the user before a specific time, the content of the voice the user intends to have recognized and correctly recognizing that voice.
- FIG. 1 is a block diagram showing a hardware configuration and software functions of a speech recognition correction system 1 according to the first embodiment of the present invention.
- FIG. 2 is a flowchart showing the speech recognition correction method according to this embodiment.
- FIG. 3 is an example of the history information database 31 in the present embodiment.
- FIG. 4 is an example of the stay time measurement area 33 in the present embodiment.
- FIG. 5 is an example for explaining the collected sound contents.
- FIG. 6 is an example of the voice database 34 in the present embodiment.
- FIG. 7 is an example of the dictionary database 35 in the present embodiment.
- FIG. 8 is an example of the classification database 36 in the present embodiment.
- FIG. 9 is an example of display content and audio output content in the speech recognition correction system 1 according to the present embodiment.
- FIG. 10 is an example of the voice database 34 after being overwritten and saved in the present embodiment.
- FIG. 11 is a block diagram showing a hardware configuration and software functions of the speech recognition correction system 100 according to the second embodiment of the present invention.
- the voice recognition correction system may be a stand-alone type system provided integrally with a mobile terminal such as a smartphone, smart glasses, or smart watch, or a network type system including a management computer connected to the mobile terminal via a network.
- in the first embodiment, the speech recognition correction system is described as a stand-alone type system.
- in the second embodiment, the speech recognition correction system is described as a network type system.
- FIG. 1 is a block diagram for explaining the hardware configuration and software functions of a speech recognition correction system 1 according to this embodiment.
- the speech recognition correction system 1 includes a control unit 10 that controls data, a communication unit 20 that communicates with other devices, a storage unit 30 that stores data, an input unit 40 that receives user operations, a sound collection unit 50 that collects the user's voice, a position detection unit 60 that detects the position at which the speech recognition correction system 1 exists, a timer 70 that measures the staying time at a certain place, and an image display unit 80 that outputs and displays the data controlled by the control unit 10.
- the control unit 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
- the communication unit 20 includes a device for enabling communication with other devices, for example, a Wi-Fi (Wireless Fidelity) compatible device compliant with IEEE 802.11.
- the control unit 10 reads a predetermined program and, in cooperation with the communication unit 20 as necessary, realizes a position information acquisition module 11, a state information acquisition module 12, a voice recognition module 13, a correction module 14, a repeat module 15, and a recording module 16.
- the storage unit 30 is a device that stores data and files, and includes a data storage unit such as a hard disk, a semiconductor memory, a recording medium, and a memory card.
- the storage unit 30 stores a history information database 31, a map database 32, a stay time measurement area 33, a voice database 34, a dictionary database 35, and a classification database 36, which will be described later.
- the storage unit 30 also stores image data to be displayed on the image display unit 80.
- the type of the input unit 40 is not particularly limited. Examples of the input unit 40 include a keyboard, a mouse, and a touch panel.
- the type of the sound collecting unit 50 is not particularly limited. Examples of the sound collecting unit 50 include a microphone.
- the position detection unit 60 is not particularly limited as long as it is a device that can detect the latitude and longitude where the voice recognition correction system 1 is located.
- Examples of the position detection unit 60 include a GPS (Global Positioning System).
- the type of the timer 70 is not particularly limited as long as the staying time at a certain place can be measured.
- the type of the image display unit 80 is not particularly limited. Examples of the image display unit 80 include a monitor and a touch panel.
- FIG. 2 is a flowchart showing a voice recognition correction method using the voice recognition correction system 1. The processing executed by each hardware and the software module described above will be described.
- Step S10 Acquisition of Position Information
- the control unit 10 of the voice recognition correction system 1 executes the position information acquisition module 11 and acquires position information of a place visited by the user before a specific time (step S10).
- the position detection unit 60 of the voice recognition correction system 1 detects the latitude and longitude where the voice recognition correction system 1 is located at any time. Then, the control unit 10 refers to the map database 32 and searches for a place corresponding to the latitude and longitude detected by the position detection unit 60. Then, the control unit 10 records the searched place in the history information database 31.
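The map-database search described above can be sketched as follows. This is a minimal illustration only: the place names, coordinates, and nearest-neighbour matching by plain Euclidean distance are assumptions for the sketch, not details taken from the patent.

```python
import math

# Hypothetical stand-in for the map database 32: place name -> (lat, lon).
MAP_DB = {
    "Yurakucho Station": (35.6749, 139.7628),
    "A Department Store Ginza Store": (35.6717, 139.7650),
    "Tokyo Station": (35.6812, 139.7671),
}

def lookup_place(lat, lon):
    """Return the place in the map database closest to the detected
    latitude and longitude (plain Euclidean distance for simplicity)."""
    return min(
        MAP_DB,
        key=lambda name: math.hypot(MAP_DB[name][0] - lat, MAP_DB[name][1] - lon),
    )

place = lookup_place(35.675, 139.763)  # a detected point near Yurakucho
```

The place found this way would then be written to the history information database together with the detection date and time.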
- FIG. 3 shows an example of the history information database 31.
- in the history information database 31, the date and time at which the position detection unit 60 detected the position information and the place corresponding to the detected position are recorded in association with an identification number.
- the date can be recorded by referring to a calendar function (not shown) built into the speech recognition correction system 1.
- the time can be recorded by referring to a clock function (not shown) built into the speech recognition correction system 1.
- the control unit 10 can acquire position information of a place visited by the user before a specific time by referring to the history information database 31.
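Referring to the history information database for places visited before a specific time could look roughly like the sketch below; the table contents and the simple cutoff query are hypothetical stand-ins for the database 31 of FIG. 3.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the history information database 31:
# (identification number, detection date/time, place).
HISTORY_DB = [
    (1, datetime(2017, 3, 1, 10, 0), "Yurakucho Station"),
    (2, datetime(2017, 3, 1, 11, 30), "A Department Store Ginza Store"),
    (3, datetime(2017, 3, 1, 21, 0), "Home"),
]

def places_visited_before(cutoff):
    """Step S10: position information of places the user visited
    before the specific time `cutoff`."""
    return [place for _, ts, place in HISTORY_DB if ts < cutoff]
```

For example, querying with a noon cutoff returns only the morning entries, which is the raw material the correction step later draws on.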
- Step S11 Acquisition of status information and the like
- the control unit 10 executes the state information acquisition module 12 and acquires state information indicating the user's state, current weather information, payment information regarding credit cards and electronic payments, and the like (step S11).
- the timer 70 of the voice recognition correction system 1 measures the time during which the voice recognition correction system 1 stays at a certain place and records it in the stay time measurement area 33.
- FIG. 4 is an example of the stay time measurement area 33.
- in the stay time measurement area 33, the stay location, stay start date and time, and stay end date and time of the speech recognition correction system 1 are recorded.
- the control unit 10 determines that the user is staying at a certain place and updates the item “state” in the history information database 31 to “staying”.
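The stay judgment might be sketched as below; the ten-minute threshold is an assumed value for illustration, since the text does not specify one.

```python
from datetime import datetime, timedelta

def is_staying(stay_start, stay_end, threshold=timedelta(minutes=10)):
    """Judge that the user is staying at a place when the measured
    stay time reaches the threshold (used to update the "state" item)."""
    return stay_end - stay_start >= threshold

# Example: a 45-minute stay recorded in the stay time measurement area.
state = "staying" if is_staying(
    datetime(2017, 3, 1, 11, 30), datetime(2017, 3, 1, 12, 15)
) else "moving"
```

Under this sketch, the "state" column of the history information database would be set to the computed value.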
- the control unit 10 accesses an external weather forecast providing website via the communication unit 20, reads from that website the weather information for the spot corresponding to the latitude and longitude detected by the position detection unit 60, and records the read weather information in the history information database 31.
- the control unit 10 records payment information regarding credit cards or electronic payments in the history information database 31.
- the history information database 31 shown in FIG. 3 records, in association with an identification number, not only the date, time, and place at which the position detection unit 60 detected position information, but also state information indicating the user's state, current weather information, and payment information relating to credit cards and electronic payments.
- the control unit 10 can acquire the state information, weather information, settlement information, and the like by referring to the history information database 31.
- Step S12 Sound collection
- when the sound collection unit 50 of the speech recognition correction system 1 collects the user's voice, the control unit 10 A/D converts the collected voice and sets the A/D converted information in a predetermined area of the storage unit 30.
- the control unit 10 then determines whether a voice has been collected (step S12). If the determination in step S12 is YES, the process proceeds to step S13; if NO, the process returns to step S10.
- Step S13 Speech recognition
- the control unit 10 refers to the voice database 34 shown in FIG. 6 and transcribes the voice collected by the sound collection unit 50 from the sound waveform included in the A/D converted information.
- suppose the A/D converted information is “Kyoha ??? Nidekaketa / Harete Yokatta / ??? Nyotte Brand ??????”, where “???” marks portions that the sound collection unit 50 of the speech recognition correction system 1 could not collect due to ambient noise or the like.
- the control unit 10 refers to the dictionary database 35 shown in FIG. 7, replaces the transcribed information with words, and creates a sentence.
- the documented information thus becomes “I went out to ??? today. It was good to be sunny. ...”, with the uncollected portions still unknown.
- the documented information is set in a predetermined area of the storage unit 30 in association with the A / D converted information.
- Step S14 Correction of Recognized Content
- the control unit 10 executes the correction module 14 and corrects the content recognized in the process of step S13 based on the position information acquired in the process of step S10, the state information acquired in the process of step S11, and the like. (Step S14).
- the control unit 10 refers to the classification database 36.
- FIG. 8 is an example of the classification database 36.
- the classification database 36 records in advance the relationship between words and the like included in the documented contents and items listed in the history information database 31.
- items such as “date”, “time”, “location”, “state”, “weather”, and “payment information” are listed in the history information database 31 (FIG. 3), and word groups related to these items are recorded in the classification database 36.
- the control unit 10 refers to the classification database 36, associates “today” included in this information with the item “date”, and associates “going out” with the item “location”.
- “good” is associated with the item “weather”, and “stop” is associated with the item “location”.
- “clothes” is associated with the item “settlement information”, and “purchase” is associated with the item “settlement information”.
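The word-to-item association using the classification database 36 can be sketched as follows; the word groups shown are illustrative stand-ins built from the examples in this section, not the actual database contents.

```python
# Hypothetical stand-in for the classification database 36: each item of
# the history information database 31 is associated in advance with a
# group of related words.
CLASSIFICATION_DB = {
    "date": {"today", "yesterday"},
    "location": {"going out", "stopping"},
    "weather": {"sunny", "good"},
    "payment information": {"clothes", "purchase"},
}

def classify(words):
    """Associate each word of the documented content with its item."""
    mapping = {}
    for word in words:
        for item, group in CLASSIFICATION_DB.items():
            if word in group:
                mapping[word] = item
    return mapping

result = classify(["today", "going out", "good", "purchase"])
```

The resulting mapping tells the correction module which columns of the history information database to consult for each uncertain word.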
- control unit 10 refers to the history information database 31.
- the control unit 10 refers to the item “date” in the history information database 31 and extracts the entries related to “today” included in the speech-recognized content, so that the date in question can be grasped.
- control unit 10 refers to the item “place” in the history information database 31 and extracts items relating to “going out” and “stopping” included in the speech-recognized content.
- from the content recorded in the history information database 31, the control unit 10 can infer that the “going out” or “stopping” location is one of “Yurakucho”, “Yurakucho Station”, “A Department Store”, “Department Store”, “Ginza”, and “A Department Store Ginza Store”.
- the control unit 10 refers to the voice database 34 (FIG. 6) and synthesizes voice data (waveform data) corresponding to “Yurakucho”, “Yurakucho Station”, “A Department Store”, “Department Store”, “Ginza”, and “A Department Store Ginza Store”. Subsequently, the control unit 10 compares the synthesized voice data with the voice data A/D converted in the process of step S13 and extracts the candidate closest to the voice data corresponding to the “???” portion.
- control unit 10 refers to the item “payment information” in the history information database 31 and extracts items related to “clothes” and “purchase” included in the speech-recognized content.
- from the content recorded in the history information database 31, the control unit 10 can infer that the purchase of “clothes” relates to one of “brand X”, “shirt”, “7560 yen”, “credit card”, and “card payment”.
- the control unit 10 refers to the voice database 34 (FIG. 6) and synthesizes voice data (waveform data) corresponding to “brand X”, “shirt”, “7560 yen”, “credit card”, and “card payment”. Subsequently, the control unit 10 compares the synthesized voice data with the voice data A/D converted in the process of step S13 and extracts the candidate closest to the voice data corresponding to the “???” portion.
- the control unit 10 can thus presume that the “???” preceding “purchases” is “brand X”.
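The candidate selection by comparing synthesized voice data with the unrecognized “???” segment might be sketched as below. The fixed-length feature vectors and Euclidean distance are simplifying assumptions; an actual implementation would compare waveforms or acoustic features.

```python
def closest_candidate(segment, candidates):
    """Return the candidate word whose (hypothetical) synthesized feature
    vector is closest to the features of the unrecognized segment."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(candidates, key=lambda word: dist(segment, candidates[word]))

# Illustrative feature vectors standing in for synthesized voice data.
CANDIDATES = {
    "Ginza": [0.9, 0.1],
    "Yurakucho": [0.2, 0.8],
    "A Department Store": [0.5, 0.5],
}
unknown_segment = [0.85, 0.15]  # features of one "???" portion
best = closest_candidate(unknown_segment, CANDIDATES)
```

The word returned by this comparison is then substituted for the “???” portion of the documented content.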
- as described above, the control unit 10 acquires, in the process of step S10, position information of a place visited by the user before a specific time, and corrects, in the process of step S14, the speech-recognized content based on that position information. As a result, even if the sound collecting device of the system cannot pick up part of the sound due to ambient noise or the like, it is possible to provide the speech recognition correction system 1 that can estimate, from the position information of a place the user visited before a specific time, the content of the voice the user intends to have recognized and correctly recognize the voice.
- in addition, the control unit 10 can correct the speech-recognized content by specifying the weather information, the time information, the state information indicating the user's state, and the payment information settled by the user, all acquired in the process of step S11 in association with the position information. According to the invention described in this embodiment, in addition to the position information of a place visited by the user before a specific time, various information related to that position information can be specified to correct the speech-recognized content. Therefore, it is possible to provide the speech recognition correction system 1 that further increases the accuracy of recognizing the voice.
- the control unit 10 can also refer to Web content related to the position information acquired in the process of step S10 when correcting the content recognized in the process of step S13, thereby providing the speech recognition correction system 1 that further increases the accuracy of recognizing the voice.
- Step S15 Repetition of Corrected Contents
- FIG. 9 shows an example of the state of the speech recognition correction system 1 at that time.
- the corrected content is repeated not only as a screen display on the image display unit 80 but also as sound from a speaker. Therefore, even while the user is moving, the content corrected in the process of step S14 can be confirmed without paying attention to the screen.
- Step S16 Recording of Corrected Contents
- through the processes up to step S15, it was found that the contents of the portions that were unknown from the process of step S13 alone are “Ginza”, “A department store”, and “Brand X”.
- the control unit 10 therefore extracts, from the voice data A/D converted in the process of step S13, the waveforms at the locations corresponding to “Ginza”, “A”, “Department Store”, “A Department Store”, “Brand”, “X”, and “Brand X”, and overwrites and saves them in the originally stored voice database 34.
- FIG. 10 shows an example of the voice database 34 after the overwrite save. Audio data for “Ginza”, “A”, “Department Store”, “A Department Store”, “Brand”, “X”, and “Brand X” has been newly added to the voice database 34.
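The overwrite save of step S16 can be sketched as a simple dictionary update; the waveform values below are placeholders, not real audio data.

```python
def overwrite_save(voice_db, learned):
    """Step S16: add the waveform segments cut from the utterance for the
    newly resolved words to the voice database (overwrite save, FIG. 10)."""
    updated = dict(voice_db)  # keep the originally stored entries
    updated.update(learned)   # new or replacing entries win
    return updated

voice_db = {"today": [0.1, 0.2]}
learned = {"Ginza": [0.9, 0.1], "Brand X": [0.4, 0.6]}
voice_db = overwrite_save(voice_db, learned)
```

After this update, a later utterance containing “Ginza” or “Brand X” can be transcribed directly, which is how the repeated correction cycle improves recognition over time.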
- the corrected content is recorded only when the repetition in step S15 reveals no problem. Therefore, when there is an error in the content corrected in the process of step S14, the erroneous content is prevented from being recorded, and as a result the speech recognition correction system 1 that further improves the accuracy of recognizing the speech can be provided.
- in the first embodiment, the voice recognition correction system was described as a stand-alone type system.
- the second embodiment differs in that the voice recognition correction system is a network type system; the rest is the same.
- FIG. 11 is a block diagram for explaining the hardware configuration and software functions of the speech recognition correction system 100 according to this embodiment.
- the voice recognition correction system 100 includes a plurality of portable terminals 200 and a management computer 300 connected to the plurality of portable terminals 200 via a network.
- the mobile terminal 200 includes a control unit 210, a communication unit 220, a storage unit 230, an input unit 240, a sound collection unit 250, a position detection unit 260, and an image display unit 280, respectively.
- the control unit 210 includes a position information acquisition module 211, a state information acquisition module 212, and a repetition module 215.
- the sound collection unit 250 functions as a voice information acquisition unit that acquires voice information related to the voice uttered by the user.
- the functions of the communication unit 220, the storage unit 230, the input unit 240, the position detection unit 260, and the image display unit 280 are the same as the functions of the communication unit 20, the storage unit 30, the input unit 40, the position detection unit 60, and the image display unit 80 in the first embodiment.
- the functions of the position information acquisition module 211, the state information acquisition module 212, and the repetition module 215 are the same as the functions of the position information acquisition module 11, the state information acquisition module 12, and the repeat module 15 in the first embodiment.
- the management computer 300 includes a control unit 310, a communication unit 320, a storage unit 330, an input unit 340, and an image display unit 380.
- the control unit 310 includes a voice recognition module 313, a correction module 314, and a recording module 316.
- the communication unit 320 is configured to be able to receive position information and audio information acquired by the plurality of mobile terminals 200.
- the storage unit 330 stores a history information database 331, a map database 332, a stay time measurement area 333, a voice database 334, a dictionary database 335, and a classification database 336.
- the control unit 310 determines whether the portable terminal that transmitted the position information and the portable terminal that transmitted the voice information are the same portable terminal.
- when they are the same portable terminal, the correction module 314 of the control unit 310 corrects, based on the position information acquired by that portable terminal, the speech-recognized content of the sound collected by the sound collection unit 250 of that portable terminal.
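The determination made by the management computer 300 might be sketched as follows; the message shapes and the `terminal_id` field are hypothetical, introduced only to illustrate the same-terminal check.

```python
def should_correct(position_msg, voice_msg):
    """The management computer corrects the recognized content only when
    the position information and the voice information were transmitted
    by the same portable terminal (the determination unit)."""
    return position_msg["terminal_id"] == voice_msg["terminal_id"]

ok = should_correct(
    {"terminal_id": "T-001", "place": "Ginza"},
    {"terminal_id": "T-001", "audio": b"..."},
)
```

When the check fails, the position information of one user's terminal is never used to correct another user's utterance.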
- in this way, the speech recognition correction system 100 can be configured as a network type system including a plurality of portable terminals 200 and a management computer 300 connected to the plurality of portable terminals 200 via a network. Therefore, it is possible to provide the network type speech recognition correction system 100 that further increases the accuracy of speech recognition.
- the functions of the input unit 340 and the image display unit 380 are the same as the functions of the input unit 40 and the image display unit 80 in the first embodiment.
- the functions of the speech recognition module 313, the correction module 314, and the recording module 316 are basically the same as the functions of the speech recognition module 13, the correction module 14, and the recording module 16 in the first embodiment.
- the history information database 331, the map database 332, the stay time measurement area 333, the voice database 334, the dictionary database 335, and the classification database 336 have the same configurations as the history information database 31, the map database 32, the stay time measurement area 33, the voice database 34, the dictionary database 35, and the classification database 36 in the first embodiment.
- the means and functions described above are realized by a computer (including a CPU, an information processing apparatus, and various terminals) reading and executing a predetermined program.
- the program is provided in a form recorded on a computer-readable recording medium such as a flexible disk, CD (CD-ROM, etc.), DVD (DVD-ROM, DVD-RAM, etc.).
- the computer reads the program from the recording medium, transfers it to the internal storage device or the external storage device, stores it, and executes it.
- the program may be recorded in advance in a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided from the storage device to a computer via a communication line.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
[Problem] To provide a system capable of correctly recognizing speech that is desired by a user to be recognized, by deducing the details of the speech even when a sound capturing device fails to fully collect the speech due to noise, etc., from the surrounding area. [Solution] In a speech recognition correcting system 1 according to the present invention, a control unit 10 executes a positional information acquisition module 11 to acquire positional information about a place which a user visited prior to a specific time point. Further, the control unit 10 executes a speech recognition module 13 to perform speech recognition of speech uttered by the user. The control unit 10 executes a correction module 14 to correct, on the basis of the positional information acquired by the positional information acquisition module 11, the details of speech recognition performed by execution of the speech recognition module 13.
Description
本発明は、音声認識補正システム、方法及びプログラムに関する。
The present invention relates to a speech recognition correction system, method, and program.
近年、ユーザの音声を認識する音声認識システムが知られている。音声認識システムについては、音声認識の認識精度向上が課題であり、例えば、音声認識の対象となる利用者を撮影した画像をカメラから取得し、前記画像を用いて前記利用者を特定する顔認識手段と、予め記憶された個人毎の口の動きの特徴量を記憶した口の動きデータベースを有し、前記画像から利用者の口の状態を検出し、前記口の動きデータベースに記憶された前記利用者に対応する口の動きの特徴量と前記画像から得られた前記利用者の口の動きの特徴量とを比較し、前記利用者が話しているかどうかを判定する口の動き判定手段と、前記利用者が話していると判定された場合、前記利用者の音声を取得するための音声入力手段に前記利用者の位置を通知する指向方向決定手段と、前記音声を取得し音声認識を行う音声認識手段とを備えることが提案されている(例えば、特許文献1参照)。
In recent years, speech recognition systems that recognize a user's speech have been known. For such systems, improving recognition accuracy is an ongoing issue. For example, a system has been proposed that comprises: face recognition means for acquiring, from a camera, an image of the user who is the target of speech recognition and identifying the user from the image; a mouth movement database storing pre-recorded mouth movement feature quantities for each individual; mouth movement determination means for detecting the state of the user's mouth from the image and determining whether the user is speaking by comparing the mouth movement feature quantity stored for that user in the mouth movement database with the feature quantity obtained from the image; pointing direction determination means for notifying the user's position to voice input means for acquiring the user's voice when the user is determined to be speaking; and voice recognition means for acquiring the voice and performing voice recognition (see, for example, Patent Document 1).
しかしながら、音声認識の精度向上については、なおいっそうの改良の余地がある。例えば、移動中のユーザが音声を認識させようとする場合、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合がある。この場合において、ユーザが認識させようとした音声の内容を補正し、当該音声を正しく認識することの可能なシステムの提供が求められている。
However, there is still room for improvement in the accuracy of speech recognition. For example, when a moving user tries to have speech recognized, the sound collecting device of the system may fail to fully collect the speech because of ambient noise or the like. In such a case, a system is needed that can correct the content of the speech the user intended to have recognized and recognize that speech correctly.
本発明は、このような要望に鑑みてなされたものであり、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合であっても、ユーザが認識させようとした音声の内容を推測し、当該音声を正しく認識することの可能なシステムを提供することを目的とする。
The present invention has been made in view of such a demand, and an object of the present invention is to provide a system capable of inferring the content of the speech that the user intended to have recognized, and of recognizing that speech correctly, even when the sound collecting device of the system cannot fully collect the speech because of ambient noise or the like.
本発明では、以下のような解決手段を提供する。
The present invention provides the following solutions.
第1の特徴に係る発明は、
ユーザが特定の時点以前に訪れた場所の位置情報を取得する位置情報取得手段と、
前記ユーザが発声した音声を音声認識する音声認識手段と、
前記取得された位置情報に基づいて、前記音声認識された内容を補正する補正手段と、
を備える音声認識補正システムを提供する。 The invention according to the first feature is
a speech recognition correction system comprising:
position information acquisition means for acquiring position information of a place visited by a user before a specific time;
speech recognition means for recognizing speech uttered by the user; and
correction means for correcting the recognized content based on the acquired position information.
第1の特徴に係る発明によれば、位置情報取得手段は、ユーザが特定の時点以前に訪れた場所の位置情報を取得し、補正手段は、位置情報取得手段によって取得された位置情報に基づいて、音声認識の内容を補正する。これにより、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合であっても、ユーザが特定の時点以前に訪れた場所の位置情報から、ユーザが認識させようとした音声の内容を推測し、当該音声を正しく認識することの可能なシステムを提供することができる。
According to the invention of the first feature, the position information acquisition means acquires position information of a place the user visited before a specific time, and the correction means corrects the content of the speech recognition based on the acquired position information. As a result, even when the sound collecting device of the system cannot fully collect the speech because of ambient noise or the like, it is possible to provide a system that infers the content of the speech the user intended to have recognized from the position information of places the user visited before a specific time, and recognizes that speech correctly.
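As a non-authoritative sketch of how such a correction might work, recognition candidates could be re-ranked against a vocabulary derived from recently visited places. Everything below (`PLACE_VOCABULARY`, `correct_with_history`, the sample phrases) is a hypothetical illustration, not the claimed implementation:

```python
# Hypothetical illustration: bias ambiguous recognition candidates toward
# terms associated with places the user visited before a specific time.

# Assumed mapping from a visited place to related terms (not from the patent).
PLACE_VOCABULARY = {
    "Ginza": {"department store", "brand", "shopping"},
    "Tsukiji": {"sushi", "market", "fish"},
}

def correct_with_history(candidates, visited_places):
    """Return the recognition candidate containing the most terms related
    to recently visited places; ties keep the recognizer's own top choice."""
    vocab = set()
    for place in visited_places:
        vocab |= PLACE_VOCABULARY.get(place, set())

    def score(text):
        return sum(1 for term in vocab if term in text)

    # max() keeps the first (i.e. originally top-ranked) candidate on ties.
    return max(candidates, key=score)

# A noisy capture left the recognizer unsure between two candidates;
# the Ginza visit history favors the "department store" reading.
candidates = ["I stopped by the apartment store",
              "I stopped by the department store"]
best = correct_with_history(candidates, ["Ginza"])
```

Here the ambiguity is resolved in favor of the candidate consistent with the visit history; a real system would score acoustic hypotheses rather than finished strings.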
第2の特徴に係る発明は、第1の特徴に係る発明であって、
前記位置情報取得手段は、前記ユーザの携帯端末から、当該ユーザが特定の時点以前に訪れた場所の位置情報を取得する、音声認識補正システムを提供する。 The invention according to the second feature is the invention according to the first feature,
wherein the position information acquisition means acquires, from the user's portable terminal, position information of a place visited by the user before a specific time; such a speech recognition correction system is provided.
第2の特徴に係る発明によれば、ユーザ自身が所有する携帯端末から、ユーザが特定の時点以前に訪れた場所の位置情報を取得し、その位置情報から、ユーザが認識させようとした音声の内容を推測する。そのため、当該音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the second feature, position information of places the user visited before a specific time is acquired from the portable terminal owned by the user, and the content of the speech the user intended to have recognized is inferred from that position information. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing that speech.
第3の特徴に係る発明は、第1又は第2の特徴に係る発明であって、
前記補正手段は、前記取得された位置情報に関するWebコンテンツを参照して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to the third feature is the invention according to the first or second feature,
wherein the correction means corrects the recognized content by referring to Web content related to the acquired position information; such a speech recognition correction system is provided.
第3の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報に関するWebコンテンツを参照して、音声認識された内容を補正する。そのため、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the third feature, the recognized content is corrected by referring not only to the position information of places the user visited before a specific time but also to Web content related to that position information. Therefore, it is possible to provide a system that can further increase the accuracy of speech recognition.
第4の特徴に係る発明は、第1から第3のいずれかの特徴に係る発明であって、
前記補正手段は、前記取得された位置情報における天気情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to the fourth feature is the invention according to any one of the first to third features,
wherein the correction means identifies weather information for the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第4の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報に関する天気情報を特定して、音声認識された内容を補正する。そのため、ユーザが特定の時点以前に訪れた場所の天気に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the fourth feature, in addition to the position information of places the user visited before a specific time, weather information related to that position information is identified, and the recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing speech concerning the weather at places the user visited before a specific time.
第5の特徴に係る発明は、第1から第4のいずれかの特徴に係る発明であって、
前記補正手段は、前記取得された位置情報における時間情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to the fifth feature is the invention according to any one of the first to fourth features,
wherein the correction means identifies time information for the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第5の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報に関する時間情報を特定して、音声認識された内容を補正する。そのため、ユーザが所定の場所に訪れた時刻に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the fifth aspect of the invention, in addition to the position information of the place visited by the user before a specific time point, the time information related to the position information is specified, and the speech-recognized content is corrected. For this reason, it is possible to provide a system that can further improve the accuracy of recognizing the voice with respect to the time when the user visited the predetermined place.
第6の特徴に係る発明は、第1から第5のいずれかの特徴に係る発明であって、
前記ユーザの携帯端末から、当該ユーザの状態を示す状態情報を取得する状態情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における状態情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to a sixth feature is the invention according to any one of the first to fifth features,
further comprising state information acquisition means for acquiring, from the user's portable terminal, state information indicating the state of the user,
wherein the correction means identifies the state information associated with the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第6の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報における状態情報を特定して、音声認識された内容を補正する。そのため、ユーザが訪れた場所でのユーザの状態に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the sixth feature, in addition to the position information of places the user visited before a specific time, the state information associated with that position information is identified, and the recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing speech concerning the user's state at the places the user visited.
第7の特徴に係る発明は、第1から第6のいずれかの特徴に係る発明であって、
前記ユーザが決済した決済情報を取得する決済情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における決済情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to a seventh feature is the invention according to any one of the first to sixth features,
further comprising payment information acquisition means for acquiring payment information on payments made by the user,
wherein the correction means identifies the payment information associated with the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第7の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報における決済情報を特定して、音声認識された内容を補正する。そのため、ユーザが訪れた場所での決済状況と繋がる事項に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the seventh feature, in addition to the position information of places the user visited before a specific time, the payment information associated with that position information is identified, and the recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing speech concerning matters connected with the payment status at the places the user visited.
第8の特徴に係る発明は、第1から第7のいずれかの特徴に係る発明であって、
複数の携帯端末と、これら複数の携帯端末とネットワークで接続されている管理コンピュータとを含んで構成され、
前記複数の携帯端末は、前記位置情報取得手段と、前記ユーザが発声した音声に関する音声情報を取得する音声情報取得手段とを有し、
前記管理コンピュータは、前記複数の携帯端末によって取得された前記位置情報及び前記音声情報を受信可能に構成され、
前記管理コンピュータは、前記位置情報を送信した携帯端末と、前記音声情報を送信した携帯端末とが同一の携帯端末であるかを判別する判別手段と、前記補正手段とを有し、
前記補正手段は、前記判別手段により同一の携帯端末であると判別された場合に、前記取得された位置情報に基づいて、前記音声認識された内容を補正する、音声認識補正システム、を提供する。 The invention according to the eighth feature is the invention according to any one of the first to seventh features,
comprising a plurality of portable terminals and a management computer connected to the plurality of portable terminals via a network,
wherein the plurality of portable terminals each have the position information acquisition means and voice information acquisition means for acquiring voice information on the voice uttered by the user,
the management computer is configured to receive the position information and the voice information acquired by the plurality of portable terminals,
the management computer has the correction means and determination means for determining whether the portable terminal that transmitted the position information and the portable terminal that transmitted the voice information are the same portable terminal, and
the correction means corrects the recognized content based on the acquired position information when the determination means determines that they are the same portable terminal; such a speech recognition correction system is provided.
第8の特徴に係る発明によれば、音声認識補正システムが、複数の携帯端末と、これら複数の携帯端末とネットワークで接続されているネットワーク型のシステムである場合における誤認識を抑えることができる。これにより、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the eighth feature, erroneous recognition can be suppressed when the speech recognition correction system is a network type system including a plurality of portable terminals and a management computer connected to them via a network. As a result, it is possible to provide a system capable of further improving the accuracy of speech recognition.
第9の特徴に係る発明は、第1から第8のいずれかの特徴に係る発明であって、
前記補正された内容を復唱する復唱手段と、
前記復唱された結果、問題がない場合に前記補正された内容を記録する記録手段とをさらに備える、音声認識システムを提供する。 The invention according to a ninth feature is the invention according to any one of the first to eighth features,
further comprising read-back means for reading back the corrected content aloud, and recording means for recording the corrected content when the read-back raises no problem; a speech recognition system comprising these is provided.
移動中のユーザが音声を認識させようとする場合、ユーザは、補正手段によって補正された内容を画面表示から確認することが難しい。第9の特徴に係る発明によれば、補正された内容が復唱されるため、ユーザが移動中であっても、画面表示に注視することなく、補正手段によって補正された内容を確認することができる。
When a moving user tries to have speech recognized, it is difficult for the user to confirm the content corrected by the correction means on a screen display. According to the invention of the ninth feature, the corrected content is read back aloud, so even while moving, the user can confirm the content corrected by the correction means without watching the screen display.
また、記録手段は、復唱された結果、問題がない場合に補正された内容を記録する。そのため、第9の特徴に係る発明によれば、補正された内容に誤りがある場合に、補正された内容が記録されることを防ぐことができ、結果として、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
Further, the recording means records the corrected content only when the read-back raises no problem. Therefore, according to the invention of the ninth feature, corrected content that contains an error can be prevented from being recorded, and as a result, it is possible to provide a system capable of further improving the accuracy of speech recognition.
本発明によれば、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合であっても、ユーザが特定の時点以前に訪れた場所の位置情報から、ユーザが認識させようとした音声の内容を推測し、当該音声を正しく認識することの可能なシステムを提供することができる。
According to the present invention, even when the sound collecting device of the system cannot fully collect the speech because of ambient noise or the like, it is possible to provide a system that infers the content of the speech the user intended to have recognized from the position information of places the user visited before a specific time, and recognizes that speech correctly.
以下、本発明を実施するための形態について図を参照しながら説明する。なお、これはあくまでも一例であって、本発明の技術的範囲はこれに限られるものではない。
Hereinafter, modes for carrying out the present invention will be described with reference to the drawings. This is merely an example, and the technical scope of the present invention is not limited to this.
1.第1の実施形態
まず、本発明の第1の実施形態について説明する。 1. First Embodiment First, a first embodiment of the present invention will be described.
音声認識補正システムは、スマートフォン、スマートグラス、スマートウォッチ等の携帯端末に一体的に設けられたスタンドアローン型のシステムであってもよいし、携帯端末と当該携帯端末とネットワークを介して接続される管理コンピュータとを備えるネットワーク型のシステムであってもよい。
The speech recognition correction system may be a stand-alone system provided integrally in a portable terminal such as a smartphone, smart glasses, or a smartwatch, or a network type system comprising a portable terminal and a management computer connected to that terminal via a network.
第1の実施形態では、音声認識補正システムがスタンドアローン型のシステムであるものとして説明する。それに対し、後述する第2の実施形態では、音声認識補正システムがネットワーク型のシステムであるものとして説明する。
In the first embodiment, description will be made assuming that the speech recognition correction system is a stand-alone system. On the other hand, in the second embodiment to be described later, the speech recognition correction system will be described as a network type system.
<音声認識補正システム1の構成>
図1は、本実施形態における音声認識補正システム1のハードウェア構成とソフトウェア機能を説明するためのブロック図である。 <Configuration of voice recognition correction system 1>
FIG. 1 is a block diagram for explaining the hardware configuration and software functions of a speech recognition correction system 1 according to this embodiment.
音声認識補正システム1は、データを制御する制御部10と、他の機器と通信を行う通信部20と、データを記憶する記憶部30と、ユーザの操作を受け付ける入力部40と、ユーザの声を集音する集音部50と、音声認識補正システム1が存在する位置を検出する位置検出部60と、一定の場所での滞在時間を計測するタイマ70と、制御部10で制御したデータや画像を出力表示する画像表示部80とを備える。
The speech recognition correction system 1 includes: a control unit 10 that controls data; a communication unit 20 that communicates with other devices; a storage unit 30 that stores data; an input unit 40 that receives user operations; a sound collection unit 50 that collects the user's voice; a position detection unit 60 that detects the position where the speech recognition correction system 1 is located; a timer 70 that measures the staying time at a given place; and an image display unit 80 that outputs and displays data and images controlled by the control unit 10.
制御部10は、CPU(Central Processing Unit)、RAM(Random Access Memory)、ROM(Read Only Memory)等を備える。
The control unit 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
通信部20は、他の機器と通信可能にするためのデバイス、例えば、IEEE802.11に準拠したWi-Fi(Wireless Fidelity)対応デバイスを備える。
The communication unit 20 includes a device for enabling communication with other devices, for example, a Wi-Fi (Wireless Fidelity) compatible device compliant with IEEE 802.11.
制御部10は、所定のプログラムを読み込み、必要に応じて通信部20と協働することで、位置情報位置情報取得モジュール11と、状態情報等取得モジュール12と、音声認識モジュール13と、補正モジュール14と、復唱モジュール15と、記録モジュール16とを実現する。
The control unit 10 reads a predetermined program and, cooperating with the communication unit 20 as necessary, realizes a position information acquisition module 11, a state information etc. acquisition module 12, a voice recognition module 13, a correction module 14, a read-back module 15, and a recording module 16.
記憶部30は、データやファイルを記憶する装置であって、ハードディスクや半導体メモリ、記録媒体、メモリカード等による、データのストレージ部を備える。記憶部30は、後に説明する履歴情報データベース31、地図データベース32、滞在時間計測領域33、音声データベース34、辞書データベース35、及び分類データベース36を記憶する。また、記憶部30は、画像表示部80に表示させる画像のデータを記憶する。
The storage unit 30 is a device that stores data and files, and includes a data storage unit such as a hard disk, a semiconductor memory, a recording medium, and a memory card. The storage unit 30 stores a history information database 31, a map database 32, a stay time measurement area 33, a voice database 34, a dictionary database 35, and a classification database 36, which will be described later. The storage unit 30 also stores image data to be displayed on the image display unit 80.
入力部40の種類は、特に限定されない。入力部40として、例えば、キーボード、マウス、タッチパネル等が挙げられる。
The type of the input unit 40 is not particularly limited. Examples of the input unit 40 include a keyboard, a mouse, and a touch panel.
集音部50の種類は、特に限定されない。集音部50として、例えば、マイク等が挙げられる。
The type of the sound collecting unit 50 is not particularly limited. Examples of the sound collecting unit 50 include a microphone.
位置検出部60は、音声認識補正システム1が位置する緯度及び経度を検出できる装置であれば、特に限定されない。位置検出部60として、例えば、GPS(Global Positioning System)が挙げられる。
The position detection unit 60 is not particularly limited as long as it is a device that can detect the latitude and longitude where the voice recognition correction system 1 is located. Examples of the position detection unit 60 include a GPS (Global Positioning System).
タイマ70の種類は、一定の場所での滞在時間を計測可能であれば、特に限定されない。
The type of the timer 70 is not particularly limited as long as the staying time at a certain place can be measured.
画像表示部80の種類は、特に限定されない。画像表示部80として、例えば、モニタ、タッチパネル等が挙げられる。
The type of the image display unit 80 is not particularly limited. Examples of the image display unit 80 include a monitor and a touch panel.
<音声認識補正システム1を用いた音声認識補正方法を示すフローチャート>
図2は、音声認識補正システム1を用いた音声認識補正方法を示すフローチャートである。上述した各ハードウェアと、ソフトウェアモジュールが実行する処理について説明する。 <Flowchart of a speech recognition correction method using the speech recognition correction system 1>
FIG. 2 is a flowchart showing a speech recognition correction method using the speech recognition correction system 1. The processing executed by each piece of hardware and each software module described above will be explained.
〔ステップS10:位置情報の取得〕
最初に、音声認識補正システム1の制御部10は、位置情報取得モジュール11を実行し、ユーザが特定の時点以前に訪れた場所の位置情報を取得する(ステップS10)。 [Step S10: Acquisition of Position Information]
First, the control unit 10 of the voice recognition correction system 1 executes the position information acquisition module 11 and acquires position information of a place visited by the user before a specific time (step S10).
音声認識補正システム1の位置検出部60は、音声認識補正システム1が位置する緯度及び経度を随時検出する。そして、制御部10は、地図データベース32を参照し、位置検出部60が検出した緯度及び経度に相当する場所を検索する。そして、制御部10は、検索した場所を履歴情報データベース31に記録する。
The position detection unit 60 of the speech recognition correction system 1 detects, from time to time, the latitude and longitude at which the system is located. The control unit 10 then refers to the map database 32, looks up the place corresponding to the detected latitude and longitude, and records the retrieved place in the history information database 31.
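A minimal sketch of this step S10 flow, with a dictionary standing in for the map database 32 and a list for the history information database 31 (the coordinates, rounding, and record fields are assumptions for illustration):

```python
from datetime import datetime

# Stand-in for the map database 32: (rounded lat, lon) -> place name.
MAP_DB = {(35.67, 139.76): "Ginza", (35.66, 139.70): "Shibuya"}

history = []  # stand-in for the history information database 31

def record_position(lat, lon, now=None):
    """Look up the detected coordinates in the map database and append a
    dated record, as the control unit 10 does in step S10."""
    now = now or datetime.now()
    place = MAP_DB.get((round(lat, 2), round(lon, 2)), "unknown")
    entry = {"id": len(history) + 1,
             "date": now.date().isoformat(),
             "time": now.time().strftime("%H:%M"),
             "place": place}
    history.append(entry)
    return entry

entry = record_position(35.6702, 139.7638, datetime(2017, 3, 30, 14, 5))
```

Each appended record mirrors one row of the history information database shown in FIG. 3: an identification number associated with the date, time, and place.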
図3は、履歴情報データベース31の一例を示す。履歴情報データベース31には、位置検出部60が位置情報を検出したときの日付及び時刻と、位置検出部60が検出した位置に相当する場所との情報が識別番号と関連づけて記録されている。
FIG. 3 shows an example of the history information database 31. In the history information database 31, information on the date and time when the position detection unit 60 detects the position information and the location corresponding to the position detected by the position detection unit 60 is recorded in association with the identification number.
日付は、音声内容補正システム1に内蔵されているカレンダー機能(図示せず)を参照することで、記録可能である。時刻は、音声内容補正システム1に内蔵されている時計機能(図示せず)を参照することで、記録可能である。
The date can be recorded by referring to a calendar function (not shown) built in the audio content correction system 1. The time can be recorded by referring to a clock function (not shown) built in the audio content correction system 1.
制御部10は、履歴情報データベース31を参照することで、ユーザが特定の時点以前に訪れた場所の位置情報を取得できる。
The control unit 10 can acquire position information of a place visited by the user before a specific time by referring to the history information database 31.
〔ステップS11:状態情報等の取得〕
図2に戻る。続いて、制御部10は、状態情報等取得モジュール12を実行し、ユーザの状態を示す状態情報や、現在の天気の情報、クレジットカードや電子決済に関する決済情報等を取得する(ステップS11)。 [Step S11: Acquisition of status information and the like]
Returning to FIG. Subsequently, the control unit 10 executes the status information acquisition module 12, and acquires status information indicating the user status, current weather information, payment information regarding a credit card and electronic payment, and the like (step S11).
図2に戻る。続いて、制御部10は、状態情報等取得モジュール12を実行し、ユーザの状態を示す状態情報や、現在の天気の情報、クレジットカードや電子決済に関する決済情報等を取得する(ステップS11)。 [Step S11: Acquisition of status information and the like]
Returning to FIG. Subsequently, the control unit 10 executes the status information acquisition module 12, and acquires status information indicating the user status, current weather information, payment information regarding a credit card and electronic payment, and the like (step S11).
音声認識補正システム1のタイマ70は、音声認識補正システム1が一定の場所にとどまっている時間を計測し、滞在時間計測領域32に記録する。
The timer 70 of the voice recognition correction system 1 measures the time during which the voice recognition correction system 1 stays at a certain place and records it in the stay time measurement area 32.
図4は、滞在時間計測領域32の一例である。滞在時間計測領域32には、音声認識補正システム1の滞在場所、滞在開始日時、滞在終了日時の情報が記録される。
FIG. 4 is an example of the stay time measurement area 32. In the stay time measurement area 32, information on the stay location, stay start date and time and stay end date and time of the speech recognition correction system 1 is recorded.
音声認識補正システム1が一定の場所に所定時間以上とどまっていることが、滞在時間計測領域32に記録されると、制御部10は、ユーザが一定の場所に滞在しているとして、履歴情報データベース31における「状態」の項目を「滞在中」に更新する。
When it is recorded in the stay time measurement area 32 that the voice recognition correction system 1 has stayed at a given place for a predetermined time or longer, the control unit 10 judges that the user is staying at that place and updates the "state" item of the history information database 31 to "staying".
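This update can be sketched as follows; the 30-minute threshold is an assumed value, since the text only speaks of "a predetermined time":

```python
from datetime import datetime

STAY_THRESHOLD_MIN = 30  # assumed; the patent only says "a predetermined time"

def update_state(history_entry, stay_start, stay_end):
    """Mark the history entry as 'staying' when the measured stay time
    reaches the threshold, mirroring the timer 70 / control unit 10 logic."""
    minutes = (stay_end - stay_start).total_seconds() / 60
    if minutes >= STAY_THRESHOLD_MIN:
        history_entry["state"] = "staying"
    return history_entry

entry = {"id": 1, "place": "Ginza"}
update_state(entry,
             datetime(2017, 3, 30, 14, 0),
             datetime(2017, 3, 30, 14, 45))
```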
また、制御部10は、通信部20を介して外部の天気予報提供Webサイトにアクセスする。そして、制御部10は、当該天気予報提供Webサイトから、位置検出部60が検出した緯度及び経度に相当する地点における天気の情報を読み出す。そして、制御部10は、読み出した天気の情報を履歴情報データベース31に記録する。
The control unit 10 also accesses an external weather forecast website via the communication unit 20, reads from that website the weather information for the point corresponding to the latitude and longitude detected by the position detection unit 60, and records the retrieved weather information in the history information database 31.
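Sketched with the external website replaced by a stubbed lookup table (a real system would issue an HTTP request through the communication unit 20; the function name and data shapes are assumptions):

```python
def fetch_weather(lat, lon, forecast_source):
    """Read the weather at the detected coordinates from a forecast source.
    `forecast_source` stands in for the external weather forecast website."""
    return forecast_source.get((round(lat, 2), round(lon, 2)), "unknown")

# Stubbed forecast data in place of the external website.
stub_forecast = {(35.67, 139.76): "sunny"}
weather = fetch_weather(35.6702, 139.7638, stub_forecast)

# The retrieved weather is then recorded in the history information database 31.
history_entry = {"id": 1, "place": "Ginza", "weather": weather}
```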
また、携帯端末が有するクレジットカード機能や電子決済機能が利用されると、制御部10は、クレジットカードや電子決済に関する決済情報を履歴情報データベース31に記録する。
Further, when the credit card function or the electronic payment function of the mobile terminal is used, the control unit 10 records the payment information regarding the credit card or the electronic payment in the history information database 31.
図3に示す履歴情報データベース31は、位置検出部60が位置情報を検出したときの日付、時刻及び場所の情報だけでなく、ユーザの状態を示す状態情報や、現在の天気の情報、クレジットカードや電子決済に関する決済情報等についても、識別番号と関連づけて記録されている。
The history information database 31 shown in FIG. 3 records, in association with an identification number, not only the date, time, and place information at the time the position detection unit 60 detected the position, but also state information indicating the user's state, current weather information, and payment information relating to credit cards and electronic payments.
By referring to the history information database 31, the control unit 10 can retrieve this state information, weather information, payment information, and so on.
[Step S12: Collecting Voice]
Returning to FIG. 2, the control unit 10 next determines whether the sound collection unit 50 has collected the user's voice (step S12).
When the sound collection unit 50 collects the user's voice, the control unit 10 A/D-converts the collected audio and stores the converted data in a predetermined area of the storage unit 30.
For example, as shown in FIG. 5, suppose the user says, "Today I went out to Ginza. I'm glad it was sunny. I stopped by A Department Store and bought Brand X clothes." The sound collection unit 50 of the speech recognition correction system 1 collects this utterance, and the control unit 10 A/D-converts it and stores the converted data in a predetermined area of the storage unit 30.
If the determination in step S12 is YES, the process proceeds to step S13; if NO, the process returns to step S10.
[Step S13: Speech Recognition]
Returning to FIG. 2, the control unit 10 next executes the speech recognition module 13 and performs speech recognition on the audio collected by the sound collection unit 50 (step S13).
The control unit 10 refers to the voice database 34 shown in FIG. 6 and transcribes the collected audio from the sound-wave waveform contained in the A/D-converted data. This yields "Kyo wa ??? ni dekaketa / harete yokatta / ??? ni yotte burando ??? no fuku wo konyu shita" ("Today I went out to ??? / glad it was sunny / stopped by ??? and bought brand ??? clothes"). Here "???" marks portions that the sound collection unit 50 of the speech content correction system 1 could not fully capture because of ambient noise or the like.
Next, the control unit 10 refers to the dictionary database 35 shown in FIG. 7 and converts the transcription into written text. This yields "Today I went out to ???. I'm glad it was sunny. I stopped by ??? and bought ??? clothes." The resulting text is stored in a predetermined area of the storage unit 30 in association with the A/D-converted data.
[Step S14: Correcting the Recognized Content]
Returning to FIG. 2, the control unit 10 next executes the correction module 14 and corrects the content recognized in step S13 based on the position information acquired in step S10, the state information acquired in step S11, and so on (step S14).
The control unit 10 refers to the classification database 36, an example of which is shown in FIG. 8. The classification database 36 records in advance the relationship between words that may appear in the transcribed text and the items listed in the history information database 31. In the present embodiment, the history information database 31 (FIG. 3) lists items such as "date", "time", "place", "state", "weather", and "payment information", and the classification database 36 records groups of words related to each of these items.
Consider the recognized text "Today I went out to ???. I'm glad it was sunny. I stopped by ??? and bought ??? clothes." Referring to the classification database 36, the control unit 10 associates "today" with the item "date" and "went out" with the item "place". It also associates "glad" ("it was good") with the item "weather" and "stopped by" with the item "place". Further, it associates "clothes" and "bought" with the item "payment information".
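This word-to-item association can be sketched as follows, assuming a flat keyword map for the classification database 36; the actual word groups are not published, so the entries here are illustrative.

```python
# Hypothetical contents of the classification database 36: trigger words
# mapped to history-information-database items.
CLASSIFICATION = {
    "today": "date",
    "went out": "place",
    "stopped by": "place",
    "sunny": "weather",
    "clothes": "payment information",
    "bought": "payment information",
}

def classify(sentence):
    """Return the database items associated with trigger words that
    appear in the (lower-cased) recognized text."""
    s = sentence.lower()
    return {word: item for word, item in CLASSIFICATION.items() if word in s}

tags = classify("Today I went out to ???. I'm glad it was sunny. "
                "I stopped by ??? and bought ??? clothes.")
print(sorted(set(tags.values())))  # ['date', 'payment information', 'place', 'weather']
```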
Next, the control unit 10 refers to the history information database 31. First, it consults the item "date" and extracts the entries corresponding to "today" in the recognized text. The current date can be obtained by reading a calendar (not shown) stored in the storage unit 30; in the present embodiment, today is assumed to be March 20, 2017.
The control unit 10 then consults the item "place" in the history information database 31 and extracts the entries related to "went out" and "stopped by" in the recognized text.
Although the place the user "went out" to and the place "stopped by" cannot be identified immediately from the recognized text, the control unit 10 can infer from the contents recorded in the history information database 31 that each of them is one of "Yurakucho", "Yurakucho Station", "A Department Store", "Department Store", "Ginza", or "A Department Store Ginza Branch".
The control unit 10 then refers to the voice database 34 (FIG. 6) and synthesizes voice data (waveform data) corresponding to "Yurakucho", "Yurakucho Station", "A Department Store", "Department Store", "Ginza", and "A Department Store Ginza Branch". It compares the synthesized voice data with the voice data A/D-converted in step S13 and extracts the candidate closest to the audio at the "???" in "went out to ???" and "stopped by ???".
From this comparison, the control unit 10 can infer that the "???" in "went out to ???" is "Ginza" and that the "???" in "stopped by ???" is "A Department Store".
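The gap-filling comparison can be sketched as follows. Real matching would compare synthesized waveform data from the voice database 34 against the captured A/D-converted audio; here each candidate is reduced to a short feature vector and compared by Euclidean distance, both illustrative simplifications.

```python
import math

def nearest_candidate(gap_features, candidates):
    """Pick the candidate whose (simplified) synthesized waveform is
    closest to the captured audio at a '???' gap."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(candidates, key=lambda name: dist(gap_features, candidates[name]))

# Illustrative feature vectors for three of the place candidates
# (assumed values, not actual waveform data).
candidates = {
    "Ginza": [0.9, 0.1, 0.4],
    "Yurakucho": [0.2, 0.8, 0.6],
    "A Department Store": [0.5, 0.5, 0.9],
}
captured_gap = [0.85, 0.15, 0.45]  # audio at "went out to ???"
print(nearest_candidate(captured_gap, candidates))  # Ginza
```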
Similarly, the control unit 10 consults the item "payment information" in the history information database 31 and extracts the entries related to "clothes" and "bought" in the recognized text.
Although what "clothes" were "bought" cannot be identified immediately from the recognized text, the control unit 10 can infer from the contents recorded in the history information database 31 that the answer is one of "Brand X", "shirt", "7,560 yen", "credit card", or "card payment".
The control unit 10 then refers to the voice database 34 (FIG. 6), synthesizes voice data (waveform data) corresponding to "Brand X", "shirt", "7,560 yen", "credit card", and "card payment", compares the synthesized data with the voice data A/D-converted in step S13, and extracts the candidate closest to the audio at the "???" in "bought ??? clothes".
From this, the control unit 10 can infer that the "???" in "bought ??? clothes" is "Brand X".
As a result, the text recognized in step S13, "Today I went out to ???. I'm glad it was sunny. I stopped by ??? and bought ??? clothes.", can be corrected to "Today I went out to Ginza. I'm glad it was sunny. I stopped by A Department Store and bought Brand X clothes."
According to the invention described in this embodiment, the control unit 10 acquires, in step S10, position information of places the user visited before a specific point in time, and corrects the speech-recognized content in step S14 based on that position information. Even when the system's sound collection device cannot fully capture the user's voice because of ambient noise or the like, the system can infer the content the user intended from the position information of places the user previously visited, thereby providing a speech content correction system 1 capable of recognizing the speech correctly.
In step S14, the control unit 10 can also identify the weather information, time information, state information indicating the user's state, and payment information associated with the position information acquired in step S11, and use them to correct the content recognized in step S13. Because various pieces of information tied to the position information are used in addition to the position information itself, a speech content correction system 1 with even higher recognition accuracy can be provided.
In step S14, the control unit 10 preferably also refers to Web content related to the position information acquired in step S10 when correcting the content recognized in step S13. Doing so further improves the accuracy of the speech content correction system 1.
[Step S15: Reading Back the Corrected Content]
Returning to FIG. 2, the control unit 10 next executes the read-back module 15 and reads back the content corrected in step S14 (step S15).
FIG. 9 shows an example of the state of the speech content correction system 1 at this point.
The image display unit 80 displays the text "Today I went out to Ginza. I'm glad it was sunny. I stopped by A Department Store and bought Brand X clothes.", below which the prompt "Is this correct?" and an "OK" icon are shown. At the same time, a speaker (not shown) of the speech content correction system 1 reads the same corrected text aloud, followed by the spoken prompt "Is this correct? If so, answer 'Yes' or press 'OK'."
When a user on the move wants the system to recognize speech, it is difficult for the user to check the content corrected in step S14 on a screen. According to the invention described in this embodiment, the corrected content is not only displayed on the image display unit 80 but also read aloud from the speaker, so even a moving user can confirm the corrected content without watching the screen.
[Step S16: Recording the Corrected Content]
Returning to FIG. 2, the control unit 10 next executes the recording module 16 and, if no problem is found after the read-back in step S15, records the content corrected in step S14 (step S16).
Among the voice data A/D-converted in step S13, the portions whose content could not be determined by step S13 alone turned out to be "Ginza", "A Department Store", and "Brand X". The control unit 10 extracts from that voice data the waveforms corresponding to "Ginza", "A", "Department Store", "A Department Store", "Brand", "X", and "Brand X" and saves them into the previously stored voice database 34, overwriting it.
FIG. 10 shows an example of the voice database 34 after this update: voice data for "Ginza", "A", "Department Store", "A Department Store", "Brand", "X", and "Brand X" has been newly added.
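The learning step in S16 can be sketched as follows; the database shape (word mapped to waveform samples) is an assumption made for illustration.

```python
# Hypothetical voice database 34: word -> waveform samples.
voice_db = {"today": [0.1, 0.2], "sunny": [0.3, 0.1]}

def record_new_words(voice_db, resolved_segments):
    """Once a '???' gap is resolved, store the corresponding waveform
    segment back into the voice database so that future recognition of
    the same word succeeds directly."""
    for word, waveform in resolved_segments.items():
        voice_db[word] = waveform
    return voice_db

resolved = {"Ginza": [0.9, 0.1],
            "A Department Store": [0.5, 0.9],
            "Brand X": [0.4, 0.6]}
record_new_words(voice_db, resolved)
print("Ginza" in voice_db and "Brand X" in voice_db)  # True
```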
According to the invention described in this embodiment, the corrected content is recorded only when no problem is found after the read-back in step S15. This prevents erroneous content from being recorded when the correction made in step S14 is wrong, and as a result provides a speech recognition correction system 1 with even higher recognition accuracy.
2. Second Embodiment
Next, a second embodiment of the present invention will be described.
In the first embodiment, the speech recognition correction system was described as a stand-alone system. The second embodiment differs only in that the speech recognition correction system is a networked system; everything else is the same.
<Speech recognition correction system 100>
FIG. 11 is a block diagram illustrating the hardware configuration and software functions of the speech recognition correction system 100 according to this embodiment.
The speech recognition correction system 100 comprises a plurality of mobile terminals 200 and a management computer 300 connected to them via a network.
[Mobile terminals 200]
Each mobile terminal 200 includes a control unit 210, a communication unit 220, a storage unit 230, an input unit 240, a sound collection unit 250, a position detection unit 260, and an image display unit 280.
The control unit 210 has a position information acquisition module 211, a state information acquisition module 212, and a read-back module 215.
The sound collection unit 250 functions as voice information acquisition means that acquires voice information on speech uttered by the user.
The functions of the communication unit 220, storage unit 230, input unit 240, position detection unit 260, and image display unit 280 are the same as those of the communication unit 20, storage unit 30, input unit 40, position detection unit 60, and image display unit 80 in the first embodiment.
Likewise, the functions of the position information acquisition module 211, state information acquisition module 212, and read-back module 215 are the same as those of the position information acquisition module 11, state information acquisition module 12, and read-back module 15 in the first embodiment.
[Management computer 300]
The management computer 300 includes a control unit 310, a communication unit 320, a storage unit 330, an input unit 340, and an image display unit 380.
The control unit 310 has a speech recognition module 313, a correction module 314, and a recording module 316.
The communication unit 320 is configured to receive the position information and voice information acquired by the plurality of mobile terminals 200.
The storage unit 330 stores a history information database 331, a map database 332, a stay time measurement area 333, a voice database 334, a dictionary database 335, and a classification database 336.
The control unit 310 determines whether, among the plurality of mobile terminals 200, the terminal that transmitted position information and the terminal that transmitted voice information are the same terminal. When they are, the correction module 314 of the control unit 310 corrects the speech-recognized content of the audio collected by that terminal's sound collection unit 250 based on the position information that terminal acquired.
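The same-terminal gate on the management computer 300 can be sketched as follows; the message layout (a `terminal_id` field on each upload) is an illustrative assumption.

```python
def same_terminal(position_msg, voice_msg):
    """Check that position data and voice data came from one terminal."""
    return position_msg["terminal_id"] == voice_msg["terminal_id"]

def maybe_correct(position_msg, voice_msg, correct_fn):
    """Run the correction module 314 only when both messages share a
    terminal id; otherwise leave the recognized text untouched."""
    if same_terminal(position_msg, voice_msg):
        return correct_fn(voice_msg["text"], position_msg["locations"])
    return voice_msg["text"]

pos = {"terminal_id": "t-001", "locations": ["Ginza", "A Department Store"]}
voice = {"terminal_id": "t-001", "text": "went out to ???"}
result = maybe_correct(pos, voice,
                       lambda text, locs: text.replace("???", locs[0]))
print(result)  # went out to Ginza
```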
This suppresses misrecognition when the speech recognition correction system is a networked system comprising a plurality of mobile terminals 200 and a management computer 300 connected to them, and thus provides a networked speech recognition correction system with even higher recognition accuracy.
The functions of the input unit 340 and image display unit 380 are the same as those of the input unit 40 and image display unit 80 in the first embodiment.
The functions of the speech recognition module 313, correction module 314, and recording module 316 are basically the same as those of the speech recognition module 13, correction module 14, and recording module 16 in the first embodiment.
The configurations of the history information database 331, map database 332, stay time measurement area 333, voice database 334, dictionary database 335, and classification database 336 are the same as those of the history information database 31, map database 32, stay time measurement area 33, voice database 34, dictionary database 35, and classification database 36 in the first embodiment.
The means and functions described above are realized by a computer (including a CPU, an information processing device, and various terminals) reading and executing a predetermined program. The program is provided, for example, in a form recorded on a computer-readable recording medium such as a flexible disk, a CD (CD-ROM, etc.), or a DVD (DVD-ROM, DVD-RAM, etc.). In that case, the computer reads the program from the recording medium, transfers it to an internal or external storage device, stores it, and executes it. Alternatively, the program may be recorded in advance on a storage device (recording medium) such as a magnetic disk, optical disk, or magneto-optical disk and provided from that storage device to the computer via a communication line.
Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments. The effects described in the embodiments merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described here.
1 Speech content recording system
10 Control unit
11 Position information acquisition module
12 State information acquisition module
13 Speech recognition module
14 Correction module
15 Read-back module
16 Recording module
20 Communication unit
30 Storage unit
31 History information database
32 Map database
33 Stay time measurement area
34 Voice database
35 Dictionary database
36 Classification database
40 Input unit
50 Sound collection unit
60 Position detection unit
70 Timer
80 Image display unit
Claims (11)
- ユーザが特定の時点以前に訪れた場所の位置情報を取得する位置情報取得手段と、
前記ユーザが発声した音声を音声認識する音声認識手段と、
前記取得された位置情報に基づいて、前記音声認識された内容を補正する補正手段と、
を備える音声認識補正システム。 Position information acquisition means for acquiring position information of a place visited by a user before a specific time;
Speech recognition means for recognizing speech uttered by the user;
Correction means for correcting the speech-recognized content based on the acquired position information;
A speech recognition correction system comprising: - 前記位置情報取得手段は、前記ユーザの携帯端末から、当該ユーザが特定の時点以前に訪れた場所の位置情報を取得する、請求項1に記載の音声認識補正システム。 The voice recognition correction system according to claim 1, wherein the position information acquisition means acquires position information of a place visited by the user before a specific time from the user's mobile terminal.
- 前記補正手段は、前記取得された位置情報に関するWebコンテンツを参照して、前記音声認識された内容を補正する、請求項1又は2に記載の音声認識補正システム。 The speech recognition correction system according to claim 1 or 2, wherein the correction unit corrects the speech-recognized content with reference to Web content related to the acquired position information.
- 前記補正手段は、前記取得された位置情報における天気情報を特定して、前記音声認識された内容を補正する、請求項1から3のいずれかに記載の音声認識補正システム。 4. The voice recognition correction system according to claim 1, wherein the correction unit specifies weather information in the acquired position information and corrects the voice-recognized content.
- 前記補正手段は、前記取得された位置情報における時間情報を特定して、前記音声認識された内容を補正する、請求項1から4のいずれかに記載の音声認識補正システム。 The speech recognition correction system according to any one of claims 1 to 4, wherein the correction unit specifies time information in the acquired position information and corrects the speech-recognized content.
- 前記ユーザの携帯端末から、当該ユーザの状態を示す状態情報を取得する状態情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における状態情報を特定して、前記音声認識された内容を補正する、請求項1から5のいずれかに記載の音声認識補正システム。 Further comprising state information acquisition means for acquiring state information indicating the state of the user from the portable terminal of the user;
The speech recognition correction system according to claim 1, wherein the correction unit specifies state information in the acquired position information and corrects the speech-recognized content. - 前記ユーザが決済した決済情報を取得する決済情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における決済情報を特定して、前記音声認識された内容を補正する、請求項1から6のいずれかに記載の音声認識補正システム。 Payment information acquisition means for acquiring payment information settled by the user;
- The speech recognition correction system according to claim 1, wherein the correction unit identifies payment information within the acquired position information and corrects the speech-recognized content.
- The speech recognition correction system according to any one of claims 1 to 7, comprising a plurality of portable terminals and a management computer connected to the plurality of portable terminals via a network, wherein: the plurality of portable terminals each include the position information acquisition unit and a voice information acquisition unit that acquires voice information on speech uttered by the user; the management computer is configured to receive the position information and the voice information acquired by the plurality of portable terminals; the management computer includes the correction unit and a determination unit that determines whether the portable terminal that transmitted the position information and the portable terminal that transmitted the voice information are the same terminal; and the correction unit corrects the speech-recognized content based on the acquired position information when the determination unit determines that the two terminals are the same.
- The speech recognition system according to any one of claims 1 to 8, further comprising: a read-back unit that reads back the corrected content; and a recording unit that records the corrected content when the read-back reveals no problem.
- A speech recognition correction method comprising: acquiring position information of places that a user visited before a specific point in time; recognizing speech uttered by the user; and correcting the speech-recognized content based on the acquired position information.
- A program for causing a speech recognition system to execute the steps of: acquiring position information of places that a user visited before a specific point in time; recognizing speech uttered by the user; and correcting the speech-recognized content based on the acquired position information.
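The claims above do not specify how the correction step uses the visited-place history, so the following is only a minimal illustrative sketch of one plausible approach: biasing recognized text toward recently visited place names via simple string similarity. The function name `correct_recognized_text` and the plain-string place list are assumptions made for this example, not part of the claimed invention.

```python
import difflib

def correct_recognized_text(recognized: str, recent_places: list[str],
                            cutoff: float = 0.7) -> str:
    """Replace any word that closely resembles the name of a recently
    visited place with that place name (hypothetical helper)."""
    corrected = []
    for word in recognized.split():
        # get_close_matches returns the best candidate above `cutoff`
        # similarity, or an empty list if nothing is close enough.
        match = difflib.get_close_matches(word, recent_places, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

# The recognizer heard "Shinjiku", but the position history shows the
# user was just in Shinjuku, so the misrecognized word is corrected.
places = ["Shinjuku", "Akihabara", "Ginza"]
print(correct_recognized_text("meet me at Shinjiku station", places))
```

A production system would presumably use phonetic rather than orthographic similarity and restrict correction to the terminal whose identity the determination unit has verified, but the sketch shows the core idea of position-conditioned correction.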
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018516873A JP6457154B1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correction system, method and program |
PCT/JP2017/013826 WO2018179426A1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correcting system, method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/013826 WO2018179426A1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correcting system, method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018179426A1 true WO2018179426A1 (en) | 2018-10-04 |
Family
ID=63674781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/013826 WO2018179426A1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correcting system, method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6457154B1 (en) |
WO (1) | WO2018179426A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110534112A (en) * | 2019-08-23 | 2019-12-03 | 王晓佳 | Distributed speech recongnition error correction device and method based on position and time |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004163265A (en) * | 2002-11-13 | 2004-06-10 | Nissan Motor Co Ltd | Navigation device |
JP2006349427A (en) * | 2005-06-14 | 2006-12-28 | Toyota Motor Corp | In-vehicle speech recognition device |
JP2012093508A (en) * | 2010-10-26 | 2012-05-17 | Nec Corp | Voice recognition support system, voice recognition support device, user terminal, method and program |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3948441B2 (en) * | 2003-07-09 | 2007-07-25 | 松下電器産業株式会社 | Voice recognition method and in-vehicle device |
2017
- 2017-03-31: WO application PCT/JP2017/013826 filed (published as WO2018179426A1; status: active, Application Filing)
- 2017-03-31: JP application JP2018516873 filed (granted as JP6457154B1; status: active)
Patent Citations (3): see the Citations table above.
Non-Patent Citations (1)
Title |
---|
KUMIKO OMORI ET AL.: "A Spoken Dialogue Interface through Natural and Efficient Responses", JOURNAL OF NATURAL LANGUAGE PROCESSING, vol. 10, no. 5, 10 October 2003 (2003-10-10), pages 23 - 40, ISSN: 1340-7619 * |
Also Published As
Publication number | Publication date |
---|---|
JP6457154B1 (en) | 2019-01-23 |
JPWO2018179426A1 (en) | 2019-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112214418B (en) | Application compliance detection method and device and electronic equipment | |
JP6107409B2 (en) | Position specifying processing apparatus and position specifying processing program | |
US9188456B2 (en) | System and method of fixing mistakes by going back in an electronic device | |
US8918320B2 (en) | Methods, apparatuses and computer program products for joint use of speech and text-based features for sentiment detection | |
US8521681B2 (en) | Apparatus and method for recognizing a context of an object | |
US10127907B2 (en) | Control device and message output control system | |
WO2011093025A1 (en) | Input support system, method, and program | |
US20120224707A1 (en) | Method and apparatus for identifying mobile devices in similar sound environment | |
US20140324428A1 (en) | System and method of improving speech recognition using context | |
US20130065611A1 (en) | Method and apparatus for providing information based on a location | |
US10515634B2 (en) | Method and apparatus for searching for geographic information using interactive voice recognition | |
CN110998719A (en) | Information processing apparatus, information processing method, and computer program | |
US11495245B2 (en) | Urgency level estimation apparatus, urgency level estimation method, and program | |
WO2019205398A1 (en) | Method and device for incentivizing user behavior, computer apparatus, and storage medium | |
CN103828400A (en) | Information processing device, information provision method and program | |
JP5929393B2 (en) | Position estimation method, apparatus and program | |
US9224388B2 (en) | Sound recognition method and system | |
CN112951274A (en) | Voice similarity determination method and device, and program product | |
JP6457154B1 (en) | Speech recognition correction system, method and program | |
JP7314975B2 (en) | Voice operation device and its control method | |
KR20150037104A (en) | Point of interest update method, apparatus and system based crowd sourcing | |
CN112863496B (en) | Voice endpoint detection method and device | |
CN110263135B (en) | Data exchange matching method, device, medium and electronic equipment | |
JP4408665B2 (en) | Speech recognition apparatus for speech recognition, speech data collection method for speech recognition, and computer program | |
CN113453135A (en) | Intelligent sound box optimization method, test method, device, equipment and storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| ENP | Entry into the national phase | Ref document number: 2018516873; Country of ref document: JP; Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17902628; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17902628; Country of ref document: EP; Kind code of ref document: A1 |