WO2018179426A1 - Speech recognition correcting system, method, and program - Google Patents
Speech recognition correcting system, method, and program Download PDFInfo
- Publication number
- WO2018179426A1 (PCT/JP2017/013826)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- voice
- user
- speech
- position information
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 31
- 238000012937 correction Methods 0.000 claims abstract description 87
- 230000006870 function Effects 0.000 description 18
- 238000001514 detection method Methods 0.000 description 13
- 238000004891 communication Methods 0.000 description 12
- 238000005259 measurement Methods 0.000 description 10
- 238000007726 management method Methods 0.000 description 8
- 238000012545 processing Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 230000000694 effects Effects 0.000 description 3
- 239000000284 extract Substances 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000004984 smart glass Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- the present invention relates to a speech recognition correction system, method, and program.
- the sound collecting device of the system may fail to collect part of the voice due to ambient noise or the like. In such a case, a system is required that can correct the speech-recognized content and correctly recognize the voice the user intends to have recognized.
- the present invention has been made in view of such a demand, and an object of the present invention is to provide a system that, even when the sound collecting device of the system cannot collect the sound due to ambient noise or the like, can estimate the content of the voice the user intends to have recognized and correctly recognize that voice.
- the present invention provides the following solutions.
- the invention according to the first feature comprises: position information acquisition means for acquiring position information of a place visited by a user before a specific time; speech recognition means for recognizing speech uttered by the user; and correction means for correcting the speech-recognized content based on the acquired position information;
- a speech recognition correction system comprising:
- the location information acquisition unit acquires location information of a place visited by the user before a specific time.
- the correction unit corrects the speech-recognized content based on the location information acquired by the location information acquisition unit.
- even if the sound collecting device of the system cannot pick up part of the sound due to ambient noise or the like, it is possible to provide a system capable of estimating, from the location information of the place the user visited before a specific time, the content of the voice the user intends to have recognized and correctly recognizing that voice.
- the invention according to the second feature is the invention according to the first feature,
- the position information acquisition means provides a voice recognition correction system that acquires position information of a place visited by the user before a specific time from the user's portable terminal.
- the position information of the place visited by the user before a specific time is acquired from the mobile terminal owned by the user, and the content of the voice the user wants recognized is inferred from that position information. Therefore, it is possible to provide a system that further improves the accuracy of recognizing the voice.
- the invention according to the third feature is the invention according to the first or second feature,
- the correction means provides a voice recognition correction system that corrects the voice-recognized content with reference to Web content related to the acquired position information.
- the content recognized by the voice is corrected by referring to the Web content related to the position information. Therefore, it is possible to provide a system that can further increase the accuracy of recognizing voice.
- the invention according to the fourth feature is the invention according to any one of the first to third features,
- the correction means provides a voice recognition correction system that specifies weather information in the acquired position information and corrects the voice-recognized content.
- the weather information related to the location information is specified, and the speech-recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing voices regarding the weather of a place visited by a user before a specific time.
- the invention according to the fifth feature is the invention according to any one of the first to fourth features,
- the correction means provides a voice recognition correction system that specifies time information in the acquired position information and corrects the voice-recognized content.
- the time information related to the position information is specified, and the speech-recognized content is corrected. For this reason, it is possible to provide a system that can further improve the accuracy of recognizing the voice with respect to the time when the user visited the predetermined place.
- the invention according to a sixth feature is the invention according to any one of the first to fifth features, further comprising state information acquisition means for acquiring, from the portable terminal of the user, state information indicating the state of the user;
- the correction means provides a voice recognition correction system that specifies state information in the acquired position information and corrects the voice-recognized content.
- the state information in the position information is specified, and the speech-recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing voices regarding the state of the user at the place where the user has visited.
- the invention according to a seventh feature is the invention according to any one of the first to sixth features, further comprising payment information acquisition means for acquiring payment information settled by the user;
- the correction means provides a voice recognition correction system that specifies payment information in the acquired position information and corrects the voice-recognized content.
- the settlement information in the position information is specified, and the speech-recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing voices with respect to matters related to the settlement status at the place where the user has visited.
- the invention according to the eighth feature is the invention according to any one of the first to seventh features, Comprising a plurality of portable terminals and a management computer connected to the plurality of portable terminals via a network;
- the plurality of portable terminals include the position information acquisition unit and voice information acquisition unit that acquires voice information related to a voice uttered by the user,
- the management computer is configured to receive the location information and the audio information acquired by the plurality of mobile terminals,
- the management computer includes a determination unit that determines whether the portable terminal that has transmitted the position information and the portable terminal that has transmitted the audio information are the same portable terminal, and the correction unit.
- the correction unit provides a voice recognition correction system that corrects the speech-recognized content based on the acquired position information when the determination unit determines that they are the same portable terminal.
- the voice recognition correction system can thus be configured as a network type system comprising a plurality of portable terminals and a management computer connected to them via a network.
- the invention according to a ninth feature is the invention according to any one of the first to eighth features, further comprising repeating means for repeating the corrected content, and recording means for recording the corrected content when the repetition reveals no problem.
- according to the ninth aspect of the invention, since the corrected content is repeated aloud, the content corrected by the correcting means can be confirmed without paying attention to the screen display, even while the user is moving.
- the recording means records the corrected content only when the repetition reveals no problem. Therefore, according to the ninth aspect of the invention, erroneous corrected content is prevented from being recorded, and as a result a system that further improves the accuracy of recognizing the voice can be provided.
- according to the present invention, it is possible to provide a system capable of estimating, from the position information of a place visited by the user before a specific time, the content of the voice the user intends to have recognized and correctly recognizing that voice.
- FIG. 1 is a block diagram showing a hardware configuration and software functions of a speech recognition correction system 1 according to the first embodiment of the present invention.
- FIG. 2 is a flowchart showing the speech recognition correction method according to this embodiment.
- FIG. 3 is an example of the history information database 31 in the present embodiment.
- FIG. 4 is an example of the stay time measurement area 33 in the present embodiment.
- FIG. 5 is an example for explaining the collected sound contents.
- FIG. 6 is an example of the voice database 34 in the present embodiment.
- FIG. 7 is an example of the dictionary database 35 in the present embodiment.
- FIG. 8 is an example of the classification database 36 in the present embodiment.
- FIG. 9 is an example of display content and audio output content in the speech recognition correction system 1 according to the present embodiment.
- FIG. 10 is an example of the voice database 34 after being overwritten and saved in the present embodiment.
- FIG. 11 is a block diagram showing a hardware configuration and software functions of the speech recognition correction system 100 according to the second embodiment of the present invention.
- the voice recognition correction system may be a stand-alone type system provided integrally with a mobile terminal such as a smartphone, smart glasses, or smart watch, or a network type system including a management computer connected to the mobile terminal via a network.
- in the first embodiment, the speech recognition correction system is described as a stand-alone type system.
- in the second embodiment, the speech recognition correction system is described as a network type system.
- FIG. 1 is a block diagram for explaining the hardware configuration and software functions of a speech recognition correction system 1 according to this embodiment.
- the speech recognition correction system 1 includes a control unit 10 that controls data, a communication unit 20 that communicates with other devices, a storage unit 30 that stores data, an input unit 40 that receives user operations, a sound collection unit 50 that collects the user's voice, a position detection unit 60 that detects the position at which the speech recognition correction system 1 exists, a timer 70 that measures the staying time at a certain place, and an image display unit 80 that outputs and displays the data controlled by the control unit 10.
- the control unit 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
- the communication unit 20 includes a device for enabling communication with other devices, for example, a Wi-Fi (Wireless Fidelity) compatible device compliant with IEEE 802.11.
- the control unit 10 reads a predetermined program and, in cooperation with the communication unit 20 as necessary, realizes a position information acquisition module 11, a state information acquisition module 12, a voice recognition module 13, a correction module 14, a repeat module 15, and a recording module 16.
- the storage unit 30 is a device that stores data and files, and includes a data storage unit such as a hard disk, a semiconductor memory, a recording medium, and a memory card.
- the storage unit 30 stores a history information database 31, a map database 32, a stay time measurement area 33, a voice database 34, a dictionary database 35, and a classification database 36, which will be described later.
- the storage unit 30 also stores image data to be displayed on the image display unit 80.
- the type of the input unit 40 is not particularly limited. Examples of the input unit 40 include a keyboard, a mouse, and a touch panel.
- the type of the sound collecting unit 50 is not particularly limited. Examples of the sound collecting unit 50 include a microphone.
- the position detection unit 60 is not particularly limited as long as it is a device that can detect the latitude and longitude where the voice recognition correction system 1 is located.
- Examples of the position detection unit 60 include a GPS (Global Positioning System).
- the type of the timer 70 is not particularly limited as long as the staying time at a certain place can be measured.
- the type of the image display unit 80 is not particularly limited. Examples of the image display unit 80 include a monitor and a touch panel.
- FIG. 2 is a flowchart showing a voice recognition correction method using the voice recognition correction system 1. The processing executed by each hardware and the software module described above will be described.
- Step S10 Acquisition of Position Information
- the control unit 10 of the voice recognition correction system 1 executes the position information acquisition module 11 and acquires position information of a place visited by the user before a specific time (step S10).
- the position detection unit 60 of the voice recognition correction system 1 detects the latitude and longitude where the voice recognition correction system 1 is located at any time. Then, the control unit 10 refers to the map database 32 and searches for a place corresponding to the latitude and longitude detected by the position detection unit 60. Then, the control unit 10 records the searched place in the history information database 31.
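The map-database search described above can be sketched as follows. This is a minimal illustration only: the place names, coordinates, and nearest-neighbour matching by plain Euclidean distance are assumptions for the sketch, not details taken from the patent.

```python
import math

# Hypothetical stand-in for the map database 32: place name -> (lat, lon).
MAP_DB = {
    "Yurakucho Station": (35.6749, 139.7628),
    "A Department Store Ginza Store": (35.6717, 139.7650),
    "Tokyo Station": (35.6812, 139.7671),
}

def lookup_place(lat, lon):
    """Return the place in the map database closest to the detected
    latitude and longitude (plain Euclidean distance for simplicity)."""
    return min(
        MAP_DB,
        key=lambda name: math.hypot(MAP_DB[name][0] - lat, MAP_DB[name][1] - lon),
    )

place = lookup_place(35.675, 139.763)  # a detected point near Yurakucho
```

The place found this way would then be written to the history information database together with the detection date and time.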
- FIG. 3 shows an example of the history information database 31.
- in the history information database 31, the date and time at which the position detection unit 60 detected the position information and the place corresponding to the detected position are recorded in association with an identification number.
- the date can be recorded by referring to a calendar function (not shown) built into the speech recognition correction system 1.
- the time can be recorded by referring to a clock function (not shown) built into the speech recognition correction system 1.
- the control unit 10 can acquire position information of a place visited by the user before a specific time by referring to the history information database 31.
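Referring to the history information database for places visited before a specific time could look roughly like the sketch below; the table contents and the simple cutoff query are hypothetical stand-ins for the database 31 of FIG. 3.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the history information database 31:
# (identification number, detection date/time, place).
HISTORY_DB = [
    (1, datetime(2017, 3, 1, 10, 0), "Yurakucho Station"),
    (2, datetime(2017, 3, 1, 11, 30), "A Department Store Ginza Store"),
    (3, datetime(2017, 3, 1, 21, 0), "Home"),
]

def places_visited_before(cutoff):
    """Step S10: position information of places the user visited
    before the specific time `cutoff`."""
    return [place for _, ts, place in HISTORY_DB if ts < cutoff]
```

For example, querying with a noon cutoff returns only the morning entries, which is the raw material the correction step later draws on.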
- Step S11 Acquisition of status information and the like
- the control unit 10 executes the state information acquisition module 12 and acquires state information indicating the user's state, current weather information, payment information regarding credit cards and electronic payments, and the like (step S11).
- the timer 70 of the voice recognition correction system 1 measures the time during which the voice recognition correction system 1 stays at a certain place and records it in the stay time measurement area 33.
- FIG. 4 is an example of the stay time measurement area 33.
- in the stay time measurement area 33, the stay location, stay start date and time, and stay end date and time of the speech recognition correction system 1 are recorded.
- the control unit 10 determines that the user is staying at a certain place and updates the item “state” in the history information database 31 to “staying”.
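The stay judgment might be sketched as below; the ten-minute threshold is an assumed value for illustration, since the text does not specify one.

```python
from datetime import datetime, timedelta

def is_staying(stay_start, stay_end, threshold=timedelta(minutes=10)):
    """Judge that the user is staying at a place when the measured
    stay time reaches the threshold (used to update the "state" item)."""
    return stay_end - stay_start >= threshold

# Example: a 45-minute stay recorded in the stay time measurement area.
state = "staying" if is_staying(
    datetime(2017, 3, 1, 11, 30), datetime(2017, 3, 1, 12, 15)
) else "moving"
```

Under this sketch, the "state" column of the history information database would be set to the computed value.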
- the control unit 10 accesses an external weather forecast providing website via the communication unit 20, reads from that website the weather information for the spot corresponding to the latitude and longitude detected by the position detection unit 60, and records the read weather information in the history information database 31.
- the control unit 10 records payment information regarding credit cards or electronic payments in the history information database 31.
- the history information database 31 shown in FIG. 3 records, in association with an identification number, not only the date, time, and place at which the position detection unit 60 detected position information, but also state information indicating the user's state, current weather information, and payment information relating to credit cards and electronic payments.
- the control unit 10 can acquire the state information, weather information, settlement information, and the like by referring to the history information database 31.
- Step S12 Sound collection
- when the sound collection unit 50 of the speech recognition correction system 1 collects the user's voice, the control unit 10 A/D converts the collected voice and sets the A/D converted information in a predetermined area of the storage unit 30.
- the control unit 10 then determines whether a voice has been collected (step S12). If the determination in step S12 is YES, the process proceeds to step S13; if NO, the process returns to step S10.
- Step S13 Speech recognition
- the control unit 10 refers to the voice database 34 shown in FIG. 6 and transcribes the voice collected by the sound collection unit 50 from the sound waveform included in the A/D converted information.
- suppose the A/D converted information is “Kyoha ??? Nidekaketa / Harete Yokatta / ??? Nyotte Brand ??????”, where “???” marks portions that the sound collection unit 50 of the speech recognition correction system 1 could not collect due to ambient noise or the like.
- the control unit 10 refers to the dictionary database 35 shown in FIG. 7, replaces the transcribed information with words, and creates a sentence.
- the documented information thus becomes “I went out to ??? today. It was good to be sunny. ...”, with the uncollected portions still unknown.
- the documented information is set in a predetermined area of the storage unit 30 in association with the A / D converted information.
- Step S14 Correction of Recognized Content
- the control unit 10 executes the correction module 14 and corrects the content recognized in the process of step S13 based on the position information acquired in the process of step S10, the state information acquired in the process of step S11, and the like. (Step S14).
- the control unit 10 refers to the classification database 36.
- FIG. 8 is an example of the classification database 36.
- the classification database 36 records in advance the relationship between words and the like included in the documented contents and items listed in the history information database 31.
- items such as “date”, “time”, “location”, “state”, “weather”, and “payment information” are listed in the history information database 31 (FIG. 3), and word groups related to these items are recorded in the classification database 36.
- the control unit 10 refers to the classification database 36, associates “today” included in this information with the item “date”, and associates “going out” with the item “location”.
- “good” is associated with the item “weather”, and “stop” is associated with the item “location”.
- “clothes” is associated with the item “settlement information”, and “purchase” is associated with the item “settlement information”.
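The word-to-item association using the classification database 36 can be sketched as follows; the word groups shown are illustrative stand-ins built from the examples in this section, not the actual database contents.

```python
# Hypothetical stand-in for the classification database 36: each item of
# the history information database 31 is associated in advance with a
# group of related words.
CLASSIFICATION_DB = {
    "date": {"today", "yesterday"},
    "location": {"going out", "stopping"},
    "weather": {"sunny", "good"},
    "payment information": {"clothes", "purchase"},
}

def classify(words):
    """Associate each word of the documented content with its item."""
    mapping = {}
    for word in words:
        for item, group in CLASSIFICATION_DB.items():
            if word in group:
                mapping[word] = item
    return mapping

result = classify(["today", "going out", "good", "purchase"])
```

The resulting mapping tells the correction module which columns of the history information database to consult for each uncertain word.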
- control unit 10 refers to the history information database 31.
- the control unit 10 refers to the item “date” in the history information database 31 and extracts the entries related to “today” included in the speech-recognized content, so that the date in question can be grasped.
- control unit 10 refers to the item “place” in the history information database 31 and extracts items relating to “going out” and “stopping” included in the speech-recognized content.
- from the content recorded in the history information database 31, the control unit 10 can infer that the “going out” or “stopping” location is one of “Yurakucho”, “Yurakucho Station”, “A Department Store”, “Department Store”, “Ginza”, and “A Department Store Ginza Store”.
- the control unit 10 refers to the voice database 34 (FIG. 6) and synthesizes voice data (waveform data) corresponding to “Yurakucho”, “Yurakucho Station”, “A Department Store”, “Department Store”, “Ginza”, and “A Department Store Ginza Store”. Subsequently, the control unit 10 compares the synthesized voice data with the voice data A/D converted in the process of step S13 and extracts the candidate closest to the voice data corresponding to the “???” portion.
- control unit 10 refers to the item “payment information” in the history information database 31 and extracts items related to “clothes” and “purchase” included in the speech-recognized content.
- from the content recorded in the history information database 31, the control unit 10 can infer that the purchase of “clothes” relates to one of “brand X”, “shirt”, “7560 yen”, “credit card”, and “card payment”.
- the control unit 10 refers to the voice database 34 (FIG. 6) and synthesizes voice data (waveform data) corresponding to “brand X”, “shirt”, “7560 yen”, “credit card”, and “card payment”. Subsequently, the control unit 10 compares the synthesized voice data with the voice data A/D converted in the process of step S13 and extracts the candidate closest to the voice data corresponding to the “???” portion.
- the control unit 10 can thus presume that the “???” preceding “purchases” is “brand X”.
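The candidate selection by comparing synthesized voice data with the unrecognized “???” segment might be sketched as below. The fixed-length feature vectors and Euclidean distance are simplifying assumptions; an actual implementation would compare waveforms or acoustic features.

```python
def closest_candidate(segment, candidates):
    """Return the candidate word whose (hypothetical) synthesized feature
    vector is closest to the features of the unrecognized segment."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(candidates, key=lambda word: dist(segment, candidates[word]))

# Illustrative feature vectors standing in for synthesized voice data.
CANDIDATES = {
    "Ginza": [0.9, 0.1],
    "Yurakucho": [0.2, 0.8],
    "A Department Store": [0.5, 0.5],
}
unknown_segment = [0.85, 0.15]  # features of one "???" portion
best = closest_candidate(unknown_segment, CANDIDATES)
```

The word returned by this comparison is then substituted for the “???” portion of the documented content.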
- as described above, the control unit 10 acquires, in the process of step S10, position information of a place visited by the user before a specific time, and corrects, in the process of step S14, the speech-recognized content based on that position information. As a result, even if the sound collecting device of the system cannot pick up part of the sound due to ambient noise or the like, it is possible to provide the speech recognition correction system 1 that can estimate, from the position information of a place the user visited before a specific time, the content of the voice the user intends to have recognized and correctly recognize the voice.
- in addition, the control unit 10 can correct the speech-recognized content by specifying the weather information, the time information, the state information indicating the user's state, and the payment information settled by the user, all acquired in the process of step S11 in association with the position information. According to the invention described in this embodiment, in addition to the position information of a place visited by the user before a specific time, various information related to that position information can be specified to correct the speech-recognized content. Therefore, it is possible to provide the speech recognition correction system 1 that further increases the accuracy of recognizing the voice.
- the control unit 10 can also refer to Web content related to the position information acquired in the process of step S10 when correcting the content recognized in the process of step S13, thereby providing the speech recognition correction system 1 that further increases the accuracy of recognizing the voice.
- Step S15 Repetition of Corrected Contents
- FIG. 9 shows an example of the state of the speech recognition correction system 1 at that time.
- the corrected content is repeated not only as a screen display on the image display unit 80 but also as sound from a speaker. Therefore, even while the user is moving, the content corrected in the process of step S14 can be confirmed without paying attention to the screen.
- Step S16 Recording of Corrected Contents
- through the processes up to step S15, it was found that the contents of the portions that were unknown from the process of step S13 alone are “Ginza”, “A department store”, and “Brand X”.
- the control unit 10 therefore extracts, from the voice data A/D converted in the process of step S13, the waveforms at the locations corresponding to “Ginza”, “A”, “Department Store”, “A Department Store”, “Brand”, “X”, and “Brand X”, and overwrites and saves them in the originally stored voice database 34.
- FIG. 10 shows an example of the voice database 34 after the overwrite save. Audio data for “Ginza”, “A”, “Department Store”, “A Department Store”, “Brand”, “X”, and “Brand X” has been newly added to the voice database 34.
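The overwrite save of step S16 can be sketched as a simple dictionary update; the waveform values below are placeholders, not real audio data.

```python
def overwrite_save(voice_db, learned):
    """Step S16: add the waveform segments cut from the utterance for the
    newly resolved words to the voice database (overwrite save, FIG. 10)."""
    updated = dict(voice_db)  # keep the originally stored entries
    updated.update(learned)   # new or replacing entries win
    return updated

voice_db = {"today": [0.1, 0.2]}
learned = {"Ginza": [0.9, 0.1], "Brand X": [0.4, 0.6]}
voice_db = overwrite_save(voice_db, learned)
```

After this update, a later utterance containing “Ginza” or “Brand X” can be transcribed directly, which is how the repeated correction cycle improves recognition over time.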
- the corrected content is recorded only when the repetition in step S15 reveals no problem. Therefore, when there is an error in the content corrected in the process of step S14, the erroneous content is prevented from being recorded, and as a result the speech recognition correction system 1 that further improves the accuracy of recognizing the speech can be provided.
- in the first embodiment, the voice recognition correction system was described as a stand-alone type system.
- the second embodiment differs in that the voice recognition correction system is a network type system; the rest is the same.
- FIG. 11 is a block diagram for explaining the hardware configuration and software functions of the speech recognition correction system 100 according to this embodiment.
- the voice recognition correction system 100 includes a plurality of portable terminals 200 and a management computer 300 connected to the plurality of portable terminals 200 via a network.
- the mobile terminal 200 includes a control unit 210, a communication unit 220, a storage unit 230, an input unit 240, a sound collection unit 250, a position detection unit 260, and an image display unit 280, respectively.
- the control unit 210 includes a position information acquisition module 211, a state information acquisition module 212, and a repetition module 215.
- the sound collection unit 250 functions as a voice information acquisition unit that acquires voice information related to the voice uttered by the user.
- the functions of the communication unit 220, the storage unit 230, the input unit 240, the position detection unit 260, and the image display unit 280 are the same as the functions of the communication unit 20, the storage unit 30, the input unit 40, the position detection unit 60, and the image display unit 80 in the first embodiment.
- the functions of the position information acquisition module 211, the state information acquisition module 212, and the repetition module 215 are the same as the functions of the position information acquisition module 11, the state information acquisition module 12, and the repeat module 15 in the first embodiment.
- the management computer 300 includes a control unit 310, a communication unit 320, a storage unit 330, an input unit 340, and an image display unit 380.
- the control unit 310 includes a voice recognition module 313, a correction module 314, and a recording module 316.
- the communication unit 320 is configured to be able to receive position information and audio information acquired by the plurality of mobile terminals 200.
- the storage unit 330 stores a history information database 331, a map database 332, a stay time measurement area 333, a voice database 334, a dictionary database 335, and a classification database 336.
- the control unit 310 determines whether the portable terminal that transmitted the position information and the portable terminal that transmitted the voice information are the same portable terminal.
- when they are the same portable terminal, the correction module 314 of the control unit 310 corrects, based on the position information acquired by that portable terminal, the speech-recognized content of the sound collected by the sound collection unit 250 of that portable terminal.
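The determination made by the management computer 300 might be sketched as follows; the message shapes and the `terminal_id` field are hypothetical, introduced only to illustrate the same-terminal check.

```python
def should_correct(position_msg, voice_msg):
    """The management computer corrects the recognized content only when
    the position information and the voice information were transmitted
    by the same portable terminal (the determination unit)."""
    return position_msg["terminal_id"] == voice_msg["terminal_id"]

ok = should_correct(
    {"terminal_id": "T-001", "place": "Ginza"},
    {"terminal_id": "T-001", "audio": b"..."},
)
```

When the check fails, the position information of one user's terminal is never used to correct another user's utterance.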
- in this way, the speech recognition correction system 100 can be configured as a network type system including a plurality of portable terminals 200 and a management computer 300 connected to the plurality of portable terminals 200 via a network. Therefore, it is possible to provide the network type speech recognition correction system 100 that further increases the accuracy of speech recognition.
- the functions of the input unit 340 and the image display unit 380 are the same as the functions of the input unit 40 and the image display unit 80 in the first embodiment.
- the functions of the speech recognition module 313, the correction module 314, and the recording module 316 are basically the same as the functions of the speech recognition module 13, the correction module 14, and the recording module 16 in the first embodiment.
- the history information database 331, the map database 332, the stay time measurement area 333, the voice database 334, the dictionary database 335, and the classification database 336 have the same configurations as the history information database 31, the map database 32, the stay time measurement area 33, the voice database 34, the dictionary database 35, and the classification database 36 in the first embodiment.
- the means and functions described above are realized by a computer (including a CPU, an information processing apparatus, and various terminals) reading and executing a predetermined program.
- the program is provided in a form recorded on a computer-readable recording medium such as a flexible disk, CD (CD-ROM, etc.), DVD (DVD-ROM, DVD-RAM, etc.).
- the computer reads the program from the recording medium, transfers it to the internal storage device or the external storage device, stores it, and executes it.
- the program may be recorded in advance in a storage device (recording medium) such as a magnetic disk, an optical disk, or a magneto-optical disk, and provided from the storage device to a computer via a communication line.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
[Problem] To provide a system capable of correctly recognizing speech that is desired by a user to be recognized, by deducing the details of the speech even when a sound capturing device fails to fully collect the speech due to noise, etc., from the surrounding area. [Solution] In a speech recognition correcting system 1 according to the present invention, a control unit 10 executes a positional information acquisition module 11 to acquire positional information about a place which a user visited prior to a specific time point. Further, the control unit 10 executes a speech recognition module 13 to perform speech recognition of speech uttered by the user. The control unit 10 executes a correction module 14 to correct, on the basis of the positional information acquired by the positional information acquisition module 11, the details of speech recognition performed by execution of the speech recognition module 13.
Description
本発明は、音声認識補正システム、方法及びプログラムに関する。
The present invention relates to a speech recognition correction system, method, and program.
近年、ユーザの音声を認識する音声認識システムが知られている。音声認識システムについては、音声認識の認識精度向上が課題であり、例えば、音声認識の対象となる利用者を撮影した画像をカメラから取得し、前記画像を用いて前記利用者を特定する顔認識手段と、予め記憶された個人毎の口の動きの特徴量を記憶した口の動きデータベースを有し、前記画像から利用者の口の状態を検出し、前記口の動きデータベースに記憶された前記利用者に対応する口の動きの特徴量と前記画像から得られた前記利用者の口の動きの特徴量とを比較し、前記利用者が話しているかどうかを判定する口の動き判定手段と、前記利用者が話していると判定された場合、前記利用者の音声を取得するための音声入力手段に前記利用者の位置を通知する指向方向決定手段と、前記音声を取得し音声認識を行う音声認識手段とを備えることが提案されている(例えば、特許文献1参照)。
In recent years, speech recognition systems that recognize a user's speech have been known. For such systems, improving recognition accuracy is an ongoing issue. For example, a system has been proposed that comprises: face recognition means for acquiring, from a camera, an image of the user who is the target of speech recognition and identifying the user from the image; a mouth movement database storing pre-recorded mouth movement feature quantities for each individual; mouth movement determination means for detecting the state of the user's mouth from the image and determining whether the user is speaking by comparing the mouth movement feature quantity stored for that user in the mouth movement database with the feature quantity obtained from the image; pointing direction determination means for notifying the user's position to voice input means for acquiring the user's voice when the user is determined to be speaking; and voice recognition means for acquiring the voice and performing voice recognition (see, for example, Patent Document 1).
しかしながら、音声認識の精度向上については、なおいっそうの改良の余地がある。例えば、移動中のユーザが音声を認識させようとする場合、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合がある。この場合において、ユーザが認識させようとした音声の内容を補正し、当該音声を正しく認識することの可能なシステムの提供が求められている。
However, there is still room for improvement in the accuracy of speech recognition. For example, when a moving user tries to have speech recognized, the sound collecting device of the system may fail to fully collect the speech because of ambient noise or the like. In such a case, a system is needed that can correct the content of the speech the user intended to have recognized and recognize that speech correctly.
本発明は、このような要望に鑑みてなされたものであり、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合であっても、ユーザが認識させようとした音声の内容を推測し、当該音声を正しく認識することの可能なシステムを提供することを目的とする。
The present invention has been made in view of such a demand, and an object of the present invention is to provide a system capable of inferring the content of the speech that the user intended to have recognized, and of recognizing that speech correctly, even when the sound collecting device of the system cannot fully collect the speech because of ambient noise or the like.
本発明では、以下のような解決手段を提供する。
The present invention provides the following solutions.
第1の特徴に係る発明は、
ユーザが特定の時点以前に訪れた場所の位置情報を取得する位置情報取得手段と、
前記ユーザが発声した音声を音声認識する音声認識手段と、
前記取得された位置情報に基づいて、前記音声認識された内容を補正する補正手段と、
を備える音声認識補正システムを提供する。 The invention according to the first feature is
a speech recognition correction system comprising:
position information acquisition means for acquiring position information of a place visited by a user before a specific time;
speech recognition means for recognizing speech uttered by the user; and
correction means for correcting the recognized content based on the acquired position information.
第1の特徴に係る発明によれば、位置情報取得手段は、ユーザが特定の時点以前に訪れた場所の位置情報を取得し、補正手段は、位置情報取得手段によって取得された位置情報に基づいて、音声認識の内容を補正する。これにより、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合であっても、ユーザが特定の時点以前に訪れた場所の位置情報から、ユーザが認識させようとした音声の内容を推測し、当該音声を正しく認識することの可能なシステムを提供することができる。
According to the invention of the first feature, the position information acquisition means acquires position information of a place the user visited before a specific time, and the correction means corrects the content of the speech recognition based on the acquired position information. As a result, even when the sound collecting device of the system cannot fully collect the speech because of ambient noise or the like, it is possible to provide a system that infers the content of the speech the user intended to have recognized from the position information of places the user visited before a specific time, and recognizes that speech correctly.
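As a non-authoritative sketch of how such a correction might work, recognition candidates could be re-ranked against a vocabulary derived from recently visited places. Everything below (`PLACE_VOCABULARY`, `correct_with_history`, the sample phrases) is a hypothetical illustration, not the claimed implementation:

```python
# Hypothetical illustration: bias ambiguous recognition candidates toward
# terms associated with places the user visited before a specific time.

# Assumed mapping from a visited place to related terms (not from the patent).
PLACE_VOCABULARY = {
    "Ginza": {"department store", "brand", "shopping"},
    "Tsukiji": {"sushi", "market", "fish"},
}

def correct_with_history(candidates, visited_places):
    """Return the recognition candidate containing the most terms related
    to recently visited places; ties keep the recognizer's own top choice."""
    vocab = set()
    for place in visited_places:
        vocab |= PLACE_VOCABULARY.get(place, set())

    def score(text):
        return sum(1 for term in vocab if term in text)

    # max() keeps the first (i.e. originally top-ranked) candidate on ties.
    return max(candidates, key=score)

# A noisy capture left the recognizer unsure between two candidates;
# the Ginza visit history favors the "department store" reading.
candidates = ["I stopped by the apartment store",
              "I stopped by the department store"]
best = correct_with_history(candidates, ["Ginza"])
```

Here the ambiguity is resolved in favor of the candidate consistent with the visit history; a real system would score acoustic hypotheses rather than finished strings.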
第2の特徴に係る発明は、第1の特徴に係る発明であって、
前記位置情報取得手段は、前記ユーザの携帯端末から、当該ユーザが特定の時点以前に訪れた場所の位置情報を取得する、音声認識補正システムを提供する。 The invention according to the second feature is the invention according to the first feature,
wherein the position information acquisition means acquires, from the user's portable terminal, position information of a place visited by the user before a specific time; such a speech recognition correction system is provided.
第2の特徴に係る発明によれば、ユーザ自身が所有する携帯端末から、ユーザが特定の時点以前に訪れた場所の位置情報を取得し、その位置情報から、ユーザが認識させようとした音声の内容を推測する。そのため、当該音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the second feature, position information of places the user visited before a specific time is acquired from the portable terminal owned by the user, and the content of the speech the user intended to have recognized is inferred from that position information. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing that speech.
第3の特徴に係る発明は、第1又は第2の特徴に係る発明であって、
前記補正手段は、前記取得された位置情報に関するWebコンテンツを参照して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to the third feature is the invention according to the first or second feature,
wherein the correction means corrects the recognized content by referring to Web content related to the acquired position information; such a speech recognition correction system is provided.
第3の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報に関するWebコンテンツを参照して、音声認識された内容を補正する。そのため、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the third feature, the recognized content is corrected by referring not only to the position information of places the user visited before a specific time but also to Web content related to that position information. Therefore, it is possible to provide a system that can further increase the accuracy of speech recognition.
第4の特徴に係る発明は、第1から第3のいずれかの特徴に係る発明であって、
前記補正手段は、前記取得された位置情報における天気情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to the fourth feature is the invention according to any one of the first to third features,
wherein the correction means identifies weather information for the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第4の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報に関する天気情報を特定して、音声認識された内容を補正する。そのため、ユーザが特定の時点以前に訪れた場所の天気に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the fourth feature, in addition to the position information of places the user visited before a specific time, weather information related to that position information is identified, and the recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing speech concerning the weather at places the user visited before a specific time.
第5の特徴に係る発明は、第1から第4のいずれかの特徴に係る発明であって、
前記補正手段は、前記取得された位置情報における時間情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to the fifth feature is the invention according to any one of the first to fourth features,
wherein the correction means identifies time information for the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第5の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報に関する時間情報を特定して、音声認識された内容を補正する。そのため、ユーザが所定の場所に訪れた時刻に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the fifth aspect of the invention, in addition to the position information of the place visited by the user before a specific time point, the time information related to the position information is specified, and the speech-recognized content is corrected. For this reason, it is possible to provide a system that can further improve the accuracy of recognizing the voice with respect to the time when the user visited the predetermined place.
第6の特徴に係る発明は、第1から第5のいずれかの特徴に係る発明であって、
前記ユーザの携帯端末から、当該ユーザの状態を示す状態情報を取得する状態情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における状態情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to a sixth feature is the invention according to any one of the first to fifth features,
further comprising state information acquisition means for acquiring, from the user's portable terminal, state information indicating the state of the user,
wherein the correction means identifies the state information associated with the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第6の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報における状態情報を特定して、音声認識された内容を補正する。そのため、ユーザが訪れた場所でのユーザの状態に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the sixth feature, in addition to the position information of places the user visited before a specific time, the state information associated with that position information is identified, and the recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing speech concerning the user's state at the places the user visited.
第7の特徴に係る発明は、第1から第6のいずれかの特徴に係る発明であって、
前記ユーザが決済した決済情報を取得する決済情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における決済情報を特定して、前記音声認識された内容を補正する、音声認識補正システムを提供する。 The invention according to a seventh feature is the invention according to any one of the first to sixth features,
further comprising payment information acquisition means for acquiring payment information on payments made by the user,
wherein the correction means identifies the payment information associated with the acquired position information and corrects the recognized content; such a speech recognition correction system is provided.
第7の特徴に係る発明によれば、ユーザが特定の時点以前に訪れた場所の位置情報に加え、その位置情報における決済情報を特定して、音声認識された内容を補正する。そのため、ユーザが訪れた場所での決済状況と繋がる事項に関し、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the seventh feature, in addition to the position information of places the user visited before a specific time, the payment information associated with that position information is identified, and the recognized content is corrected. Therefore, it is possible to provide a system that can further improve the accuracy of recognizing speech concerning matters connected with the payment status at the places the user visited.
第8の特徴に係る発明は、第1から第7のいずれかの特徴に係る発明であって、
複数の携帯端末と、これら複数の携帯端末とネットワークで接続されている管理コンピュータとを含んで構成され、
前記複数の携帯端末は、前記位置情報取得手段と、前記ユーザが発声した音声に関する音声情報を取得する音声情報取得手段とを有し、
前記管理コンピュータは、前記複数の携帯端末によって取得された前記位置情報及び前記音声情報を受信可能に構成され、
前記管理コンピュータは、前記位置情報を送信した携帯端末と、前記音声情報を送信した携帯端末とが同一の携帯端末であるかを判別する判別手段と、前記補正手段とを有し、
前記補正手段は、前記判別手段により同一の携帯端末であると判別された場合に、前記取得された位置情報に基づいて、前記音声認識された内容を補正する、音声認識補正システム、を提供する。 The invention according to the eighth feature is the invention according to any one of the first to seventh features,
comprising a plurality of portable terminals and a management computer connected to the plurality of portable terminals via a network,
wherein the plurality of portable terminals each have the position information acquisition means and voice information acquisition means for acquiring voice information on the voice uttered by the user,
the management computer is configured to receive the position information and the voice information acquired by the plurality of portable terminals,
the management computer has the correction means and determination means for determining whether the portable terminal that transmitted the position information and the portable terminal that transmitted the voice information are the same portable terminal, and
the correction means corrects the recognized content based on the acquired position information when the determination means determines that they are the same portable terminal; such a speech recognition correction system is provided.
第8の特徴に係る発明によれば、音声認識補正システムが、複数の携帯端末と、これら複数の携帯端末とネットワークで接続されているネットワーク型のシステムである場合における誤認識を抑えることができる。これにより、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
According to the invention of the eighth feature, erroneous recognition can be suppressed when the speech recognition correction system is a network type system including a plurality of portable terminals and a management computer connected to them via a network. As a result, it is possible to provide a system capable of further improving the accuracy of speech recognition.
第9の特徴に係る発明は、第1から第8のいずれかの特徴に係る発明であって、
前記補正された内容を復唱する復唱手段と、
前記復唱された結果、問題がない場合に前記補正された内容を記録する記録手段とをさらに備える、音声認識システムを提供する。 The invention according to a ninth feature is the invention according to any one of the first to eighth features,
further comprising read-back means for reading back the corrected content aloud, and recording means for recording the corrected content when the read-back raises no problem; a speech recognition system comprising these is provided.
移動中のユーザが音声を認識させようとする場合、ユーザは、補正手段によって補正された内容を画面表示から確認することが難しい。第9の特徴に係る発明によれば、補正された内容が復唱されるため、ユーザが移動中であっても、画面表示に注視することなく、補正手段によって補正された内容を確認することができる。
When a moving user tries to have speech recognized, it is difficult for the user to confirm the content corrected by the correction means on a screen display. According to the invention of the ninth feature, the corrected content is read back aloud, so even while moving, the user can confirm the content corrected by the correction means without watching the screen display.
また、記録手段は、復唱された結果、問題がない場合に補正された内容を記録する。そのため、第9の特徴に係る発明によれば、補正された内容に誤りがある場合に、補正された内容が記録されることを防ぐことができ、結果として、音声を認識する精度をよりいっそう高めることの可能なシステムを提供することができる。
Further, the recording means records the corrected content only when the read-back raises no problem. Therefore, according to the invention of the ninth feature, corrected content that contains an error can be prevented from being recorded, and as a result, it is possible to provide a system capable of further improving the accuracy of speech recognition.
本発明によれば、周囲の雑音等から、システムの集音装置が音声を集音しきれない場合であっても、ユーザが特定の時点以前に訪れた場所の位置情報から、ユーザが認識させようとした音声の内容を推測し、当該音声を正しく認識することの可能なシステムを提供することができる。
According to the present invention, even when the sound collecting device of the system cannot fully collect the speech because of ambient noise or the like, it is possible to provide a system that infers the content of the speech the user intended to have recognized from the position information of places the user visited before a specific time, and recognizes that speech correctly.
以下、本発明を実施するための形態について図を参照しながら説明する。なお、これはあくまでも一例であって、本発明の技術的範囲はこれに限られるものではない。
Hereinafter, modes for carrying out the present invention will be described with reference to the drawings. This is merely an example, and the technical scope of the present invention is not limited to this.
1.第1の実施形態
まず、本発明の第1の実施形態について説明する。 1. First Embodiment First, a first embodiment of the present invention will be described.
音声認識補正システムは、スマートフォン、スマートグラス、スマートウォッチ等の携帯端末に一体的に設けられたスタンドアローン型のシステムであってもよいし、携帯端末と当該携帯端末とネットワークを介して接続される管理コンピュータとを備えるネットワーク型のシステムであってもよい。
The speech recognition correction system may be a stand-alone system provided integrally in a portable terminal such as a smartphone, smart glasses, or a smartwatch, or a network type system comprising a portable terminal and a management computer connected to that terminal via a network.
第1の実施形態では、音声認識補正システムがスタンドアローン型のシステムであるものとして説明する。それに対し、後述する第2の実施形態では、音声認識補正システムがネットワーク型のシステムであるものとして説明する。
In the first embodiment, description will be made assuming that the speech recognition correction system is a stand-alone system. On the other hand, in the second embodiment to be described later, the speech recognition correction system will be described as a network type system.
<音声認識補正システム1の構成>
図1は、本実施形態における音声認識補正システム1のハードウェア構成とソフトウェア機能を説明するためのブロック図である。 <Configuration of voice recognition correction system 1>
FIG. 1 is a block diagram for explaining the hardware configuration and software functions of a speech recognition correction system 1 according to this embodiment.
音声認識補正システム1は、データを制御する制御部10と、他の機器と通信を行う通信部20と、データを記憶する記憶部30と、ユーザの操作を受け付ける入力部40と、ユーザの声を集音する集音部50と、音声認識補正システム1が存在する位置を検出する位置検出部60と、一定の場所での滞在時間を計測するタイマ70と、制御部10で制御したデータや画像を出力表示する画像表示部80とを備える。
The speech recognition correction system 1 includes: a control unit 10 that controls data; a communication unit 20 that communicates with other devices; a storage unit 30 that stores data; an input unit 40 that receives user operations; a sound collection unit 50 that collects the user's voice; a position detection unit 60 that detects the position where the speech recognition correction system 1 is located; a timer 70 that measures the staying time at a given place; and an image display unit 80 that outputs and displays data and images controlled by the control unit 10.
制御部10は、CPU(Central Processing Unit)、RAM(Random Access Memory)、ROM(Read Only Memory)等を備える。
The control unit 10 includes a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
通信部20は、他の機器と通信可能にするためのデバイス、例えば、IEEE802.11に準拠したWi-Fi(Wireless Fidelity)対応デバイスを備える。
The communication unit 20 includes a device for enabling communication with other devices, for example, a Wi-Fi (Wireless Fidelity) compatible device compliant with IEEE 802.11.
制御部10は、所定のプログラムを読み込み、必要に応じて通信部20と協働することで、位置情報位置情報取得モジュール11と、状態情報等取得モジュール12と、音声認識モジュール13と、補正モジュール14と、復唱モジュール15と、記録モジュール16とを実現する。
The control unit 10 reads a predetermined program and, cooperating with the communication unit 20 as necessary, realizes a position information acquisition module 11, a state information etc. acquisition module 12, a voice recognition module 13, a correction module 14, a read-back module 15, and a recording module 16.
記憶部30は、データやファイルを記憶する装置であって、ハードディスクや半導体メモリ、記録媒体、メモリカード等による、データのストレージ部を備える。記憶部30は、後に説明する履歴情報データベース31、地図データベース32、滞在時間計測領域33、音声データベース34、辞書データベース35、及び分類データベース36を記憶する。また、記憶部30は、画像表示部80に表示させる画像のデータを記憶する。
The storage unit 30 is a device that stores data and files, and includes a data storage unit such as a hard disk, a semiconductor memory, a recording medium, and a memory card. The storage unit 30 stores a history information database 31, a map database 32, a stay time measurement area 33, a voice database 34, a dictionary database 35, and a classification database 36, which will be described later. The storage unit 30 also stores image data to be displayed on the image display unit 80.
入力部40の種類は、特に限定されない。入力部40として、例えば、キーボード、マウス、タッチパネル等が挙げられる。
The type of the input unit 40 is not particularly limited. Examples of the input unit 40 include a keyboard, a mouse, and a touch panel.
集音部50の種類は、特に限定されない。集音部50として、例えば、マイク等が挙げられる。
The type of the sound collecting unit 50 is not particularly limited. Examples of the sound collecting unit 50 include a microphone.
位置検出部60は、音声認識補正システム1が位置する緯度及び経度を検出できる装置であれば、特に限定されない。位置検出部60として、例えば、GPS(Global Positioning System)が挙げられる。
The position detection unit 60 is not particularly limited as long as it is a device that can detect the latitude and longitude where the voice recognition correction system 1 is located. Examples of the position detection unit 60 include a GPS (Global Positioning System).
タイマ70の種類は、一定の場所での滞在時間を計測可能であれば、特に限定されない。
The type of the timer 70 is not particularly limited as long as the staying time at a certain place can be measured.
画像表示部80の種類は、特に限定されない。画像表示部80として、例えば、モニタ、タッチパネル等が挙げられる。
The type of the image display unit 80 is not particularly limited. Examples of the image display unit 80 include a monitor and a touch panel.
<音声認識補正システム1を用いた音声認識補正方法を示すフローチャート>
図2は、音声認識補正システム1を用いた音声認識補正方法を示すフローチャートである。上述した各ハードウェアと、ソフトウェアモジュールが実行する処理について説明する。 <Flowchart of a speech recognition correction method using the speech recognition correction system 1>
FIG. 2 is a flowchart showing a speech recognition correction method using the speech recognition correction system 1. The processing executed by each piece of hardware and each software module described above will be explained.
〔ステップS10:位置情報の取得〕
最初に、音声認識補正システム1の制御部10は、位置情報取得モジュール11を実行し、ユーザが特定の時点以前に訪れた場所の位置情報を取得する(ステップS10)。 [Step S10: Acquisition of Position Information]
First, the control unit 10 of the voice recognition correction system 1 executes the position information acquisition module 11 and acquires position information of a place visited by the user before a specific time (step S10).
音声認識補正システム1の位置検出部60は、音声認識補正システム1が位置する緯度及び経度を随時検出する。そして、制御部10は、地図データベース32を参照し、位置検出部60が検出した緯度及び経度に相当する場所を検索する。そして、制御部10は、検索した場所を履歴情報データベース31に記録する。
The position detection unit 60 of the speech recognition correction system 1 detects, from time to time, the latitude and longitude at which the system is located. The control unit 10 then refers to the map database 32, looks up the place corresponding to the detected latitude and longitude, and records the retrieved place in the history information database 31.
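A minimal sketch of this step S10 flow, with a dictionary standing in for the map database 32 and a list for the history information database 31 (the coordinates, rounding, and record fields are assumptions for illustration):

```python
from datetime import datetime

# Stand-in for the map database 32: (rounded lat, lon) -> place name.
MAP_DB = {(35.67, 139.76): "Ginza", (35.66, 139.70): "Shibuya"}

history = []  # stand-in for the history information database 31

def record_position(lat, lon, now=None):
    """Look up the detected coordinates in the map database and append a
    dated record, as the control unit 10 does in step S10."""
    now = now or datetime.now()
    place = MAP_DB.get((round(lat, 2), round(lon, 2)), "unknown")
    entry = {"id": len(history) + 1,
             "date": now.date().isoformat(),
             "time": now.time().strftime("%H:%M"),
             "place": place}
    history.append(entry)
    return entry

entry = record_position(35.6702, 139.7638, datetime(2017, 3, 30, 14, 5))
```

Each appended record mirrors one row of the history information database shown in FIG. 3: an identification number associated with the date, time, and place.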
図3は、履歴情報データベース31の一例を示す。履歴情報データベース31には、位置検出部60が位置情報を検出したときの日付及び時刻と、位置検出部60が検出した位置に相当する場所との情報が識別番号と関連づけて記録されている。
FIG. 3 shows an example of the history information database 31. In the history information database 31, information on the date and time when the position detection unit 60 detects the position information and the location corresponding to the position detected by the position detection unit 60 is recorded in association with the identification number.
日付は、音声内容補正システム1に内蔵されているカレンダー機能(図示せず)を参照することで、記録可能である。時刻は、音声内容補正システム1に内蔵されている時計機能(図示せず)を参照することで、記録可能である。
The date can be recorded by referring to a calendar function (not shown) built in the audio content correction system 1. The time can be recorded by referring to a clock function (not shown) built in the audio content correction system 1.
制御部10は、履歴情報データベース31を参照することで、ユーザが特定の時点以前に訪れた場所の位置情報を取得できる。
The control unit 10 can acquire position information of a place visited by the user before a specific time by referring to the history information database 31.
〔ステップS11:状態情報等の取得〕
図2に戻る。続いて、制御部10は、状態情報等取得モジュール12を実行し、ユーザの状態を示す状態情報や、現在の天気の情報、クレジットカードや電子決済に関する決済情報等を取得する(ステップS11)。 [Step S11: Acquisition of status information and the like]
Returning to FIG. Subsequently, the control unit 10 executes the status information acquisition module 12, and acquires status information indicating the user status, current weather information, payment information regarding a credit card and electronic payment, and the like (step S11).
図2に戻る。続いて、制御部10は、状態情報等取得モジュール12を実行し、ユーザの状態を示す状態情報や、現在の天気の情報、クレジットカードや電子決済に関する決済情報等を取得する(ステップS11)。 [Step S11: Acquisition of status information and the like]
Returning to FIG. Subsequently, the control unit 10 executes the status information acquisition module 12, and acquires status information indicating the user status, current weather information, payment information regarding a credit card and electronic payment, and the like (step S11).
音声認識補正システム1のタイマ70は、音声認識補正システム1が一定の場所にとどまっている時間を計測し、滞在時間計測領域32に記録する。
The timer 70 of the voice recognition correction system 1 measures the time during which the voice recognition correction system 1 stays at a certain place and records it in the stay time measurement area 32.
図4は、滞在時間計測領域32の一例である。滞在時間計測領域32には、音声認識補正システム1の滞在場所、滞在開始日時、滞在終了日時の情報が記録される。
FIG. 4 is an example of the stay time measurement area 32. In the stay time measurement area 32, information on the stay location, stay start date and time and stay end date and time of the speech recognition correction system 1 is recorded.
音声認識補正システム1が一定の場所に所定時間以上とどまっていることが、滞在時間計測領域32に記録されると、制御部10は、ユーザが一定の場所に滞在しているとして、履歴情報データベース31における「状態」の項目を「滞在中」に更新する。
When it is recorded in the stay time measurement area 32 that the voice recognition correction system 1 has stayed at a given place for a predetermined time or longer, the control unit 10 judges that the user is staying at that place and updates the "state" item of the history information database 31 to "staying".
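This update can be sketched as follows; the 30-minute threshold is an assumed value, since the text only speaks of "a predetermined time":

```python
from datetime import datetime

STAY_THRESHOLD_MIN = 30  # assumed; the patent only says "a predetermined time"

def update_state(history_entry, stay_start, stay_end):
    """Mark the history entry as 'staying' when the measured stay time
    reaches the threshold, mirroring the timer 70 / control unit 10 logic."""
    minutes = (stay_end - stay_start).total_seconds() / 60
    if minutes >= STAY_THRESHOLD_MIN:
        history_entry["state"] = "staying"
    return history_entry

entry = {"id": 1, "place": "Ginza"}
update_state(entry,
             datetime(2017, 3, 30, 14, 0),
             datetime(2017, 3, 30, 14, 45))
```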
また、制御部10は、通信部20を介して外部の天気予報提供Webサイトにアクセスする。そして、制御部10は、当該天気予報提供Webサイトから、位置検出部60が検出した緯度及び経度に相当する地点における天気の情報を読み出す。そして、制御部10は、読み出した天気の情報を履歴情報データベース31に記録する。
The control unit 10 also accesses an external weather forecast website via the communication unit 20, reads from that website the weather information for the point corresponding to the latitude and longitude detected by the position detection unit 60, and records the retrieved weather information in the history information database 31.
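Sketched with the external website replaced by a stubbed lookup table (a real system would issue an HTTP request through the communication unit 20; the function name and data shapes are assumptions):

```python
def fetch_weather(lat, lon, forecast_source):
    """Read the weather at the detected coordinates from a forecast source.
    `forecast_source` stands in for the external weather forecast website."""
    return forecast_source.get((round(lat, 2), round(lon, 2)), "unknown")

# Stubbed forecast data in place of the external website.
stub_forecast = {(35.67, 139.76): "sunny"}
weather = fetch_weather(35.6702, 139.7638, stub_forecast)

# The retrieved weather is then recorded in the history information database 31.
history_entry = {"id": 1, "place": "Ginza", "weather": weather}
```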
また、携帯端末が有するクレジットカード機能や電子決済機能が利用されると、制御部10は、クレジットカードや電子決済に関する決済情報を履歴情報データベース31に記録する。
Further, when the credit card function or the electronic payment function of the mobile terminal is used, the control unit 10 records the payment information regarding the credit card or the electronic payment in the history information database 31.
図3に示す履歴情報データベース31は、位置検出部60が位置情報を検出したときの日付、時刻及び場所の情報だけでなく、ユーザの状態を示す状態情報や、現在の天気の情報、クレジットカードや電子決済に関する決済情報等についても、識別番号と関連づけて記録されている。
The history information database 31 shown in FIG. 3 records, in association with an identification number, not only the date, time, and place information at the time the position detection unit 60 detected the position, but also state information indicating the user's state, current weather information, and payment information relating to credit cards and electronic payments.
By referring to the history information database 31, the control unit 10 can retrieve this state information, weather information, payment information, and so on.
[Step S12: Collecting Voice]
Returning to FIG. 2, the control unit 10 next determines whether the sound collection unit 50 has collected the user's voice (step S12).
When the sound collection unit 50 collects the user's voice, the control unit 10 A/D-converts the collected audio and stores the converted data in a predetermined area of the storage unit 30.
For example, as shown in FIG. 5, suppose the user says, "Today I went out to Ginza. I'm glad it was sunny. I stopped by A Department Store and bought Brand X clothes." The sound collection unit 50 of the speech recognition correction system 1 collects this utterance, and the control unit 10 A/D-converts it and stores the converted data in a predetermined area of the storage unit 30.
If the determination in step S12 is YES, the process proceeds to step S13; if NO, the process returns to step S10.
[Step S13: Speech Recognition]
Returning to FIG. 2, the control unit 10 next executes the speech recognition module 13 and performs speech recognition on the audio collected by the sound collection unit 50 (step S13).
The control unit 10 refers to the voice database 34 shown in FIG. 6 and transcribes the collected audio from the sound-wave waveform contained in the A/D-converted data. This yields "Kyo wa ??? ni dekaketa / harete yokatta / ??? ni yotte burando ??? no fuku wo konyu shita" ("Today I went out to ??? / glad it was sunny / stopped by ??? and bought brand ??? clothes"). Here "???" marks portions that the sound collection unit 50 of the speech content correction system 1 could not fully capture because of ambient noise or the like.
Next, the control unit 10 refers to the dictionary database 35 shown in FIG. 7 and converts the transcription into written text. This yields "Today I went out to ???. I'm glad it was sunny. I stopped by ??? and bought ??? clothes." The resulting text is stored in a predetermined area of the storage unit 30 in association with the A/D-converted data.
[Step S14: Correcting the Recognized Content]
Returning to FIG. 2, the control unit 10 next executes the correction module 14 and corrects the content recognized in step S13 based on the position information acquired in step S10, the state information acquired in step S11, and so on (step S14).
The control unit 10 refers to the classification database 36, an example of which is shown in FIG. 8. The classification database 36 records in advance the relationship between words that may appear in the transcribed text and the items listed in the history information database 31. In the present embodiment, the history information database 31 (FIG. 3) lists items such as "date", "time", "place", "state", "weather", and "payment information", and the classification database 36 records groups of words related to each of these items.
Consider the recognized text "Today I went out to ???. I'm glad it was sunny. I stopped by ??? and bought ??? clothes." Referring to the classification database 36, the control unit 10 associates "today" with the item "date" and "went out" with the item "place". It also associates "glad" ("it was good") with the item "weather" and "stopped by" with the item "place". Further, it associates "clothes" and "bought" with the item "payment information".
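This word-to-item association can be sketched as follows, assuming a flat keyword map for the classification database 36; the actual word groups are not published, so the entries here are illustrative.

```python
# Hypothetical contents of the classification database 36: trigger words
# mapped to history-information-database items.
CLASSIFICATION = {
    "today": "date",
    "went out": "place",
    "stopped by": "place",
    "sunny": "weather",
    "clothes": "payment information",
    "bought": "payment information",
}

def classify(sentence):
    """Return the database items associated with trigger words that
    appear in the (lower-cased) recognized text."""
    s = sentence.lower()
    return {word: item for word, item in CLASSIFICATION.items() if word in s}

tags = classify("Today I went out to ???. I'm glad it was sunny. "
                "I stopped by ??? and bought ??? clothes.")
print(sorted(set(tags.values())))  # ['date', 'payment information', 'place', 'weather']
```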
Next, the control unit 10 refers to the history information database 31. First, it consults the item "date" and extracts the entries corresponding to "today" in the recognized text. The current date can be obtained by reading a calendar (not shown) stored in the storage unit 30; in the present embodiment, today is assumed to be March 20, 2017.
The control unit 10 then consults the item "place" in the history information database 31 and extracts the entries related to "went out" and "stopped by" in the recognized text.
Although the place the user "went out" to and the place "stopped by" cannot be identified immediately from the recognized text, the control unit 10 can infer from the contents recorded in the history information database 31 that each of them is one of "Yurakucho", "Yurakucho Station", "A Department Store", "Department Store", "Ginza", or "A Department Store Ginza Branch".
The control unit 10 then refers to the voice database 34 (FIG. 6) and synthesizes voice data (waveform data) corresponding to "Yurakucho", "Yurakucho Station", "A Department Store", "Department Store", "Ginza", and "A Department Store Ginza Branch". It compares the synthesized voice data with the voice data A/D-converted in step S13 and extracts the candidate closest to the audio at the "???" in "went out to ???" and "stopped by ???".
From this comparison, the control unit 10 can infer that the "???" in "went out to ???" is "Ginza" and that the "???" in "stopped by ???" is "A Department Store".
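The gap-filling comparison can be sketched as follows. Real matching would compare synthesized waveform data from the voice database 34 against the captured A/D-converted audio; here each candidate is reduced to a short feature vector and compared by Euclidean distance, both illustrative simplifications.

```python
import math

def nearest_candidate(gap_features, candidates):
    """Pick the candidate whose (simplified) synthesized waveform is
    closest to the captured audio at a '???' gap."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(candidates, key=lambda name: dist(gap_features, candidates[name]))

# Illustrative feature vectors for three of the place candidates
# (assumed values, not actual waveform data).
candidates = {
    "Ginza": [0.9, 0.1, 0.4],
    "Yurakucho": [0.2, 0.8, 0.6],
    "A Department Store": [0.5, 0.5, 0.9],
}
captured_gap = [0.85, 0.15, 0.45]  # audio at "went out to ???"
print(nearest_candidate(captured_gap, candidates))  # Ginza
```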
Similarly, the control unit 10 consults the item "payment information" in the history information database 31 and extracts the entries related to "clothes" and "bought" in the recognized text.
Although what "clothes" were "bought" cannot be identified immediately from the recognized text, the control unit 10 can infer from the contents recorded in the history information database 31 that the answer is one of "Brand X", "shirt", "7,560 yen", "credit card", or "card payment".
The control unit 10 then refers to the voice database 34 (FIG. 6), synthesizes voice data (waveform data) corresponding to "Brand X", "shirt", "7,560 yen", "credit card", and "card payment", compares the synthesized data with the voice data A/D-converted in step S13, and extracts the candidate closest to the audio at the "???" in "bought ??? clothes".
From this, the control unit 10 can infer that the "???" in "bought ??? clothes" is "Brand X".
As a result, the text recognized in step S13, "Today I went out to ???. I'm glad it was sunny. I stopped by ??? and bought ??? clothes.", can be corrected to "Today I went out to Ginza. I'm glad it was sunny. I stopped by A Department Store and bought Brand X clothes."
According to the invention described in this embodiment, the control unit 10 acquires, in step S10, position information of places the user visited before a specific point in time, and corrects the speech-recognized content in step S14 based on that position information. Even when the system's sound collection device cannot fully capture the user's voice because of ambient noise or the like, the system can infer the content the user intended from the position information of places the user previously visited, thereby providing a speech content correction system 1 capable of recognizing the speech correctly.
In step S14, the control unit 10 can also identify the weather information, time information, state information indicating the user's state, and payment information associated with the position information acquired in step S11, and use them to correct the content recognized in step S13. Because various pieces of information tied to the position information are used in addition to the position information itself, a speech content correction system 1 with even higher recognition accuracy can be provided.
In step S14, the control unit 10 preferably also refers to Web content related to the position information acquired in step S10 when correcting the content recognized in step S13. Doing so further improves the accuracy of the speech content correction system 1.
[Step S15: Reading Back the Corrected Content]
Returning to FIG. 2, the control unit 10 next executes the read-back module 15 and reads back the content corrected in step S14 (step S15).
FIG. 9 shows an example of the state of the speech content correction system 1 at this point.
The image display unit 80 displays the text "Today I went out to Ginza. I'm glad it was sunny. I stopped by A Department Store and bought Brand X clothes.", below which the prompt "Is this correct?" and an "OK" icon are shown. At the same time, a speaker (not shown) of the speech content correction system 1 reads the same corrected text aloud, followed by the spoken prompt "Is this correct? If so, answer 'Yes' or press 'OK'."
When a user on the move wants the system to recognize speech, it is difficult for the user to check the content corrected in step S14 on a screen. According to the invention described in this embodiment, the corrected content is not only displayed on the image display unit 80 but also read aloud from the speaker, so even a moving user can confirm the corrected content without watching the screen.
[Step S16: Recording the Corrected Content]
Returning to FIG. 2, the control unit 10 next executes the recording module 16 and, if no problem is found after the read-back in step S15, records the content corrected in step S14 (step S16).
Among the voice data A/D-converted in step S13, the portions whose content could not be determined by step S13 alone turned out to be "Ginza", "A Department Store", and "Brand X". The control unit 10 extracts from that voice data the waveforms corresponding to "Ginza", "A", "Department Store", "A Department Store", "Brand", "X", and "Brand X" and saves them into the previously stored voice database 34, overwriting it.
FIG. 10 shows an example of the voice database 34 after this update: voice data for "Ginza", "A", "Department Store", "A Department Store", "Brand", "X", and "Brand X" has been newly added.
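The learning step in S16 can be sketched as follows; the database shape (word mapped to waveform samples) is an assumption made for illustration.

```python
# Hypothetical voice database 34: word -> waveform samples.
voice_db = {"today": [0.1, 0.2], "sunny": [0.3, 0.1]}

def record_new_words(voice_db, resolved_segments):
    """Once a '???' gap is resolved, store the corresponding waveform
    segment back into the voice database so that future recognition of
    the same word succeeds directly."""
    for word, waveform in resolved_segments.items():
        voice_db[word] = waveform
    return voice_db

resolved = {"Ginza": [0.9, 0.1],
            "A Department Store": [0.5, 0.9],
            "Brand X": [0.4, 0.6]}
record_new_words(voice_db, resolved)
print("Ginza" in voice_db and "Brand X" in voice_db)  # True
```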
According to the invention described in this embodiment, the corrected content is recorded only when no problem is found after the read-back in step S15. This prevents erroneous content from being recorded when the correction made in step S14 is wrong, and as a result provides a speech recognition correction system 1 with even higher recognition accuracy.
2. Second Embodiment
Next, a second embodiment of the present invention will be described.
In the first embodiment, the speech recognition correction system was described as a stand-alone system. The second embodiment differs only in that the speech recognition correction system is a networked system; everything else is the same.
<Speech recognition correction system 100>
FIG. 11 is a block diagram illustrating the hardware configuration and software functions of the speech recognition correction system 100 according to this embodiment.
The speech recognition correction system 100 comprises a plurality of mobile terminals 200 and a management computer 300 connected to them via a network.
[Mobile terminals 200]
Each mobile terminal 200 includes a control unit 210, a communication unit 220, a storage unit 230, an input unit 240, a sound collection unit 250, a position detection unit 260, and an image display unit 280.
The control unit 210 has a position information acquisition module 211, a state information acquisition module 212, and a read-back module 215.
The sound collection unit 250 functions as voice information acquisition means that acquires voice information on speech uttered by the user.
The functions of the communication unit 220, storage unit 230, input unit 240, position detection unit 260, and image display unit 280 are the same as those of the communication unit 20, storage unit 30, input unit 40, position detection unit 60, and image display unit 80 in the first embodiment.
Likewise, the functions of the position information acquisition module 211, state information acquisition module 212, and read-back module 215 are the same as those of the position information acquisition module 11, state information acquisition module 12, and read-back module 15 in the first embodiment.
[Management computer 300]
The management computer 300 includes a control unit 310, a communication unit 320, a storage unit 330, an input unit 340, and an image display unit 380.
The control unit 310 has a speech recognition module 313, a correction module 314, and a recording module 316.
The communication unit 320 is configured to receive the position information and voice information acquired by the plurality of mobile terminals 200.
The storage unit 330 stores a history information database 331, a map database 332, a stay time measurement area 333, a voice database 334, a dictionary database 335, and a classification database 336.
The control unit 310 determines whether, among the plurality of mobile terminals 200, the terminal that transmitted position information and the terminal that transmitted voice information are the same terminal. When they are, the correction module 314 of the control unit 310 corrects the speech-recognized content of the audio collected by that terminal's sound collection unit 250 based on the position information that terminal acquired.
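The same-terminal gate on the management computer 300 can be sketched as follows; the message layout (a `terminal_id` field on each upload) is an illustrative assumption.

```python
def same_terminal(position_msg, voice_msg):
    """Check that position data and voice data came from one terminal."""
    return position_msg["terminal_id"] == voice_msg["terminal_id"]

def maybe_correct(position_msg, voice_msg, correct_fn):
    """Run the correction module 314 only when both messages share a
    terminal id; otherwise leave the recognized text untouched."""
    if same_terminal(position_msg, voice_msg):
        return correct_fn(voice_msg["text"], position_msg["locations"])
    return voice_msg["text"]

pos = {"terminal_id": "t-001", "locations": ["Ginza", "A Department Store"]}
voice = {"terminal_id": "t-001", "text": "went out to ???"}
result = maybe_correct(pos, voice,
                       lambda text, locs: text.replace("???", locs[0]))
print(result)  # went out to Ginza
```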
This suppresses misrecognition when the speech recognition correction system is a networked system comprising a plurality of mobile terminals 200 and a management computer 300 connected to them, and thus provides a networked speech recognition correction system with even higher recognition accuracy.
The functions of the input unit 340 and image display unit 380 are the same as those of the input unit 40 and image display unit 80 in the first embodiment.
The functions of the speech recognition module 313, correction module 314, and recording module 316 are basically the same as those of the speech recognition module 13, correction module 14, and recording module 16 in the first embodiment.
The configurations of the history information database 331, map database 332, stay time measurement area 333, voice database 334, dictionary database 335, and classification database 336 are the same as those of the history information database 31, map database 32, stay time measurement area 33, voice database 34, dictionary database 35, and classification database 36 in the first embodiment.
The means and functions described above are realized by a computer (including a CPU, an information processing device, and various terminals) reading and executing a predetermined program. The program is provided, for example, in a form recorded on a computer-readable recording medium such as a flexible disk, a CD (CD-ROM, etc.), or a DVD (DVD-ROM, DVD-RAM, etc.). In that case, the computer reads the program from the recording medium, transfers it to an internal or external storage device, stores it, and executes it. Alternatively, the program may be recorded in advance on a storage device (recording medium) such as a magnetic disk, optical disk, or magneto-optical disk and provided from that storage device to the computer via a communication line.
Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments. The effects described in the embodiments merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described here.
1 Speech content recording system
10 Control unit
11 Position information acquisition module
12 State information acquisition module
13 Speech recognition module
14 Correction module
15 Read-back module
16 Recording module
20 Communication unit
30 Storage unit
31 History information database
32 Map database
33 Stay time measurement area
34 Voice database
35 Dictionary database
36 Classification database
40 Input unit
50 Sound collection unit
60 Position detection unit
70 Timer
80 Image display unit
Claims (11)
- ユーザが特定の時点以前に訪れた場所の位置情報を取得する位置情報取得手段と、
前記ユーザが発声した音声を音声認識する音声認識手段と、
前記取得された位置情報に基づいて、前記音声認識された内容を補正する補正手段と、
を備える音声認識補正システム。 Position information acquisition means for acquiring position information of a place visited by a user before a specific time;
Speech recognition means for recognizing speech uttered by the user;
Correction means for correcting the speech-recognized content based on the acquired position information;
A speech recognition correction system comprising: - 前記位置情報取得手段は、前記ユーザの携帯端末から、当該ユーザが特定の時点以前に訪れた場所の位置情報を取得する、請求項1に記載の音声認識補正システム。 The voice recognition correction system according to claim 1, wherein the position information acquisition means acquires position information of a place visited by the user before a specific time from the user's mobile terminal.
- 前記補正手段は、前記取得された位置情報に関するWebコンテンツを参照して、前記音声認識された内容を補正する、請求項1又は2に記載の音声認識補正システム。 The speech recognition correction system according to claim 1 or 2, wherein the correction unit corrects the speech-recognized content with reference to Web content related to the acquired position information.
- 前記補正手段は、前記取得された位置情報における天気情報を特定して、前記音声認識された内容を補正する、請求項1から3のいずれかに記載の音声認識補正システム。 4. The voice recognition correction system according to claim 1, wherein the correction unit specifies weather information in the acquired position information and corrects the voice-recognized content.
- 前記補正手段は、前記取得された位置情報における時間情報を特定して、前記音声認識された内容を補正する、請求項1から4のいずれかに記載の音声認識補正システム。 The speech recognition correction system according to any one of claims 1 to 4, wherein the correction unit specifies time information in the acquired position information and corrects the speech-recognized content.
- 前記ユーザの携帯端末から、当該ユーザの状態を示す状態情報を取得する状態情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における状態情報を特定して、前記音声認識された内容を補正する、請求項1から5のいずれかに記載の音声認識補正システム。 Further comprising state information acquisition means for acquiring state information indicating the state of the user from the portable terminal of the user;
The speech recognition correction system according to claim 1, wherein the correction unit specifies state information in the acquired position information and corrects the speech-recognized content. - 前記ユーザが決済した決済情報を取得する決済情報取得手段をさらに備え、
前記補正手段は、前記取得された位置情報における決済情報を特定して、前記音声認識された内容を補正する、請求項1から6のいずれかに記載の音声認識補正システム。 Payment information acquisition means for acquiring payment information settled by the user;
- The speech recognition correction system according to claim 1, wherein the correction unit identifies payment information within the acquired position information and corrects the speech-recognized content.
- The speech recognition correction system according to any one of claims 1 to 7, comprising a plurality of portable terminals and a management computer connected to the plurality of portable terminals via a network, wherein: the plurality of portable terminals each include the position information acquisition unit and a voice information acquisition unit that acquires voice information on speech uttered by the user; the management computer is configured to receive the position information and the voice information acquired by the plurality of portable terminals; the management computer includes the correction unit and a determination unit that determines whether the portable terminal that transmitted the position information and the portable terminal that transmitted the voice information are the same terminal; and the correction unit corrects the speech-recognized content based on the acquired position information when the determination unit determines that the two terminals are the same.
- The speech recognition system according to any one of claims 1 to 8, further comprising: a read-back unit that reads back the corrected content; and a recording unit that records the corrected content when the read-back reveals no problem.
- A speech recognition correction method comprising: acquiring position information of places that a user visited before a specific point in time; recognizing speech uttered by the user; and correcting the speech-recognized content based on the acquired position information.
- A program for causing a speech recognition system to execute the steps of: acquiring position information of places that a user visited before a specific point in time; recognizing speech uttered by the user; and correcting the speech-recognized content based on the acquired position information.
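The claims above do not specify how the correction step uses the visited-place history, so the following is only a minimal illustrative sketch of one plausible approach: biasing recognized text toward recently visited place names via simple string similarity. The function name `correct_recognized_text` and the plain-string place list are assumptions made for this example, not part of the claimed invention.

```python
import difflib

def correct_recognized_text(recognized: str, recent_places: list[str],
                            cutoff: float = 0.7) -> str:
    """Replace any word that closely resembles the name of a recently
    visited place with that place name (hypothetical helper)."""
    corrected = []
    for word in recognized.split():
        # get_close_matches returns the best candidate above `cutoff`
        # similarity, or an empty list if nothing is close enough.
        match = difflib.get_close_matches(word, recent_places, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)

# The recognizer heard "Shinjiku", but the position history shows the
# user was just in Shinjuku, so the misrecognized word is corrected.
places = ["Shinjuku", "Akihabara", "Ginza"]
print(correct_recognized_text("meet me at Shinjiku station", places))
```

A production system would presumably use phonetic rather than orthographic similarity and restrict correction to the terminal whose identity the determination unit has verified, but the sketch shows the core idea of position-conditioned correction.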
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018516873A JP6457154B1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correction system, method and program |
PCT/JP2017/013826 WO2018179426A1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correcting system, method, and program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/013826 WO2018179426A1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correcting system, method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018179426A1 true WO2018179426A1 (en) | 2018-10-04 |
Family
ID=63674781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/013826 WO2018179426A1 (en) | 2017-03-31 | 2017-03-31 | Speech recognition correcting system, method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6457154B1 (en) |
WO (1) | WO2018179426A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110534112A (en) * | 2019-08-23 | 2019-12-03 | 王晓佳 | Distributed speech recongnition error correction device and method based on position and time |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004163265A (en) * | 2002-11-13 | 2004-06-10 | Nissan Motor Co Ltd | Navigation device |
JP2006349427A (en) * | 2005-06-14 | 2006-12-28 | Toyota Motor Corp | In-vehicle speech recognition device |
JP2012093508A (en) * | 2010-10-26 | 2012-05-17 | Nec Corp | Voice recognition support system, voice recognition support device, user terminal, method and program |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3948441B2 (en) * | 2003-07-09 | 2007-07-25 | 松下電器産業株式会社 | Voice recognition method and in-vehicle device |
2017
- 2017-03-31: WO application PCT/JP2017/013826 filed (published as WO2018179426A1; status: active, Application Filing)
- 2017-03-31: JP application JP2018516873 filed (granted as JP6457154B1; status: active)
Patent Citations (3): see the Citations table above.
Non-Patent Citations (1)
Title |
---|
KUMIKO OMORI ET AL.: "A Spoken Dialogue Interface through Natural and Efficient Responses", JOURNAL OF NATURAL LANGUAGE PROCESSING, vol. 10, no. 5, 10 October 2003 (2003-10-10), pages 23 - 40, ISSN: 1340-7619 * |
Also Published As
Publication number | Publication date |
---|---|
JP6457154B1 (en) | 2019-01-23 |
JPWO2018179426A1 (en) | 2019-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112214418B (en) | Application compliance detection method and device and electronic equipment | |
JP6107409B2 (en) | Position specifying processing apparatus and position specifying processing program | |
US9188456B2 (en) | System and method of fixing mistakes by going back in an electronic device | |
US8918320B2 (en) | Methods, apparatuses and computer program products for joint use of speech and text-based features for sentiment detection | |
US8521681B2 (en) | Apparatus and method for recognizing a context of an object | |
US10127907B2 (en) | Control device and message output control system | |
WO2011093025A1 (en) | Input support system, method, and program | |
US20120224707A1 (en) | Method and apparatus for identifying mobile devices in similar sound environment | |
US20140324428A1 (en) | System and method of improving speech recognition using context | |
US20130065611A1 (en) | Method and apparatus for providing information based on a location | |
US10515634B2 (en) | Method and apparatus for searching for geographic information using interactive voice recognition | |
CN110998719A (en) | Information processing apparatus, information processing method, and computer program | |
US11495245B2 (en) | Urgency level estimation apparatus, urgency level estimation method, and program | |
WO2019205398A1 (en) | Method and device for incentivizing user behavior, computer apparatus, and storage medium | |
CN103828400A (en) | Information processing device, information provision method and program | |
JP5929393B2 (en) | Position estimation method, apparatus and program | |
US9224388B2 (en) | Sound recognition method and system | |
CN112951274A (en) | Voice similarity determination method and device, and program product | |
JP6457154B1 (en) | Speech recognition correction system, method and program | |
JP7314975B2 (en) | Voice operation device and its control method | |
KR20150037104A (en) | Point of interest update method, apparatus and system based crowd sourcing | |
CN112863496B (en) | Voice endpoint detection method and device | |
CN110263135B (en) | Data exchange matching method, device, medium and electronic equipment | |
JP4408665B2 (en) | Speech recognition apparatus for speech recognition, speech data collection method for speech recognition, and computer program | |
CN113453135A (en) | Intelligent sound box optimization method, test method, device, equipment and storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| ENP | Entry into the national phase | Ref document number: 2018516873; Country of ref document: JP; Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17902628; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17902628; Country of ref document: EP; Kind code of ref document: A1 |