JP2023049059A

JP2023049059A - WEB CONFERENCE SERVER, WEB CONFERENCE METHOD AND WEB CONFERENCE SYSTEM

Info

Publication number: JP2023049059A
Application number: JP2021158558A
Authority: JP
Inventors: 志強賈; zhi-qiang Jia
Original assignee: Asia Star Co Ltd
Current assignee: Asia Star Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2023-04-10

Abstract

[Problem] To improve the speed and accuracy of a web conference. [Solution] A web conference server includes: a speech recognition server speed determination unit that determines the communication speed between the web conference server and a plurality of speech recognition servers and the information processing speed of the plurality of speech recognition servers; a speech recognition server determination unit that determines, from the plurality of speech recognition servers, a specific speech recognition server to be used for a specific user, based on a language attribute, the communication speed with the plurality of speech recognition servers, and the information processing speed of the plurality of speech recognition servers; a translation server speed determination unit that determines the communication speed between the specific speech recognition server used for the specific user and a plurality of translation servers and the information processing speed of the plurality of translation servers; and a translation server determination unit that determines, from the plurality of translation servers, a specific translation server to be used for the specific user, based on the communication speed with the plurality of translation servers and the information processing speed of the plurality of translation servers. [Selected drawing] FIG. 2

Kind Code: A1

Description

The present invention relates to a web conference server, a web conference method, and a web conference system that provide a web conference service.

Web conference systems are known in which multiple users in remote locations hold online meetings using individual user terminals connected via a network. In recent years, some web conference systems, in addition to voice calls accompanied by video, simultaneously display on screen the text obtained by speech recognition of the voice data. As web conferences between remote regions become increasingly common due to the COVID-19 pandemic, there is demand for a technology that translates speech recognition data into the language used by the other conference participants and simultaneously displays the translated text on screen.

Patent Document 1: JP 2019-153099 A
Patent Document 2: JP 2019-061594 A
Patent Document 3: JP 2017-215931 A
Patent Document 4: Japanese Patent No. 6795668

In Patent Document 1, when a video conference system is used, the device (site) that performs text conversion processing is determined based on communication speed and device processing capability. In Patent Document 2, the translation target language is either specified arbitrarily by the user or determined automatically according to the attributes of the attendees of the meeting or of users authorized to view the minutes. In Patent Document 3, the input language and the translation language are set based on user operation. In Patent Document 4, when selecting multiple translation dictionaries, conference participants and others may set a priority for each translation dictionary, and translation processing is executed in the order of the set priorities.

In view of the above circumstances, an object of the present invention is to improve the speed and accuracy of a web conference.

A web conference server according to one aspect of the present invention includes:
a voice acquisition unit that acquires voice data of a specific user, who is one of a plurality of users participating in a web conference, from the user terminal of the specific user;
a speech recognition unit that performs speech recognition on the voice data and determines a language attribute of the specific user;
a speech recognition server speed determination unit that determines the communication speed between the web conference server and a plurality of speech recognition servers and the information processing speed of the plurality of speech recognition servers;
a speech recognition server determination unit that determines, from the plurality of speech recognition servers, a specific speech recognition server to be used for the specific user, based on the language attribute, the communication speed with the plurality of speech recognition servers, and the information processing speed of the plurality of speech recognition servers;
a translation server speed determination unit that determines the communication speed between the specific speech recognition server used for the specific user and a plurality of translation servers and the information processing speed of the plurality of translation servers;
a translation server determination unit that determines, from the plurality of translation servers, a specific translation server to be used for the specific user, based on the communication speed with the plurality of translation servers and the information processing speed of the plurality of translation servers; and
a voice data processing request unit that supplies the specific speech recognition server with the voice data of the specific user and identification information identifying the specific translation server, thereby causing the specific speech recognition server to perform speech recognition on the voice data of the specific user to generate speech recognition data, which is text data, and causing the specific translation server to translate the speech recognition data of the specific user to generate translation data, which is text data,
wherein the combination of the specific speech recognition server and the specific translation server differs for each of the plurality of users participating in the web conference.

According to the present invention, the speed and accuracy of a web conference can be improved.

FIG. 1 shows a web conference system according to one embodiment of the present invention.
FIG. 2 shows the functional configuration of the web conference system.
FIG. 3 shows a first operation flow of the web conference server.
FIG. 4 shows a second operation flow of the web conference server.
FIG. 5 shows a third operation flow of the web conference server.

Embodiments of the present invention are described below with reference to the drawings.

1. Overview of the web conference system

FIG. 1 shows a web conference system according to one embodiment of the present invention.

The web conference system 1 has a plurality of web conference servers 10, a plurality of speech recognition servers 20, and a plurality of translation servers 30. These servers are interconnected via a network N such as the Internet.

The web conference servers 10 are installed in different countries and regions. Each web conference server 10 is a computer that communicates via the network N with the user terminals 40 (personal computers, smartphones, tablet computers, wearable devices, etc.) used by the users participating in a web conference and provides a web conference service to those users. The web conference server 10 that is accessed differs for each user terminal 40, even among users participating in the same web conference. Each web conference server 10 determines, for each user, a specific speech recognition server 20 and a specific translation server 30 to be used for that user. Consequently, the combination of the specific speech recognition server 20 and the specific translation server 30 differs for each user, even among users participating in the same web conference. Each web conference server 10 acquires the voice data of a specific user participating in the web conference from the user terminal 40 used by that user and supplies the voice data to one of the speech recognition servers 20.

The speech recognition servers 20 are installed in different countries and regions. They are typically computers provided by different providers and running different speech recognition software. Each speech recognition server 20 acquires the voice data of a specific user participating in the web conference from one of the web conference servers 10, performs speech recognition on the voice data to generate speech recognition data, which is text data, and supplies the speech recognition data both to the web conference server 10 that supplied the voice data and to the specific translation server 30.

The translation servers 30 are installed in different countries and regions. They are typically computers provided by different providers and running different translation software. Each translation server 30 acquires the speech recognition data of a specific user participating in the web conference from one of the speech recognition servers 20, translates the speech recognition data to generate translation data, which is text data, and supplies the translation data to the web conference server 10 that supplied the voice data.

The web conference server 10 outputs the speech recognition data and translation data obtained from the voice data of the users participating in the web conference to the user terminals 40 of those users in real time during the web conference. After the web conference ends, the web conference server 10 also creates minutes data from the speech recognition data and translation data and outputs the minutes data to the user terminals 40 of the users participating in the web conference.

2. Functional configuration of the web conference system

FIG. 2 shows the functional configuration of the web conference system.

In the web conference server 10, a CPU loads an information processing program recorded in a ROM into a RAM and executes it, whereby the web conference server 10 operates as a voice acquisition unit 101, a speech recognition unit 102, a speech recognition server speed determination unit 103, a speech recognition server determination unit 104, a translation server speed determination unit 105, a content determination unit 106, an emotion determination unit 107, a translation server determination unit 108, a voice data processing request unit 109, a processed data acquisition unit 110, a context check unit 111, a real-time output unit 112, and a minutes creation unit 113. The web conference server 10 has a nonvolatile or volatile storage device 120.

In the user terminal 40, a CPU loads an information processing program recorded in a ROM into a RAM and executes it, whereby the user terminal 40 operates as a voice input unit 401, a real-time input unit 402, and a minutes acquisition unit 403. The user terminal 40 has an external or built-in microphone 411, an external or built-in display 412, and a nonvolatile or volatile storage device 413.

3. Operation flow of the web conference system

FIG. 3 shows a first operation flow of the web conference server (from the start to the end of a web conference).

First, at the start of a web conference, each user terminal 40 of the users participating in the web conference selects, based on the IP address of the user terminal 40, the specific (single) web conference server 10 with the fastest communication speed from the plurality of web conference servers 10, accesses the selected web conference server 10, and requests sign-in to the web conference. Typically, each user terminal 40 selects the web conference server 10 whose IP address identifies the country or region nearest to the country or region identified by the IP address of the user terminal 40. Consequently, the specific web conference server 10 that is accessed differs for each user terminal 40 of the users participating in the web conference. This allows each user terminal 40 to use the web conference server 10 that is optimal in terms of speed, and it also distributes processing across the web conference as a whole so that communication and information processing remain stable.
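The region-proximity rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the source: it assumes each IP address has already been resolved to approximate coordinates (the source does not say how), and it uses great-circle distance as a stand-in for "nearest country or region". The names `Region`, `haversine_km`, and `nearest_server` are hypothetical.

```python
import math
from typing import Dict, NamedTuple


class Region(NamedTuple):
    """Approximate location assumed to be derived from an IP address."""
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Region, b: Region) -> float:
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def nearest_server(terminal: Region, servers: Dict[str, Region]) -> str:
    """Return the ID of the web conference server closest to the terminal."""
    return min(servers, key=lambda sid: haversine_km(terminal, servers[sid]))
```

Geographic proximity is only a proxy for communication speed; a real terminal might instead measure latency to each server directly.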

The web conference server 10 accepts access and a request to sign in to the web conference from the user terminal 40 used by a specific user, who is one of the users participating in the web conference, and permits sign-in to the web conference. Unless otherwise noted, the following description concerns one web conference server 10 and one user terminal 40 between which communication has been established, and the user of that user terminal 40. A uniquely determined web conference server 10, speech recognition server 20, translation server 30, and user terminal 40 may be referred to as the specific web conference server 10, the specific speech recognition server 20, the specific translation server 30, and the specific user terminal 40.

When the web conference starts, the voice input unit 401 of the user terminal 40 begins supplying the user's voice data, input by the user via the microphone 411, to the one web conference server 10 with which communication has been established.

The voice acquisition unit 101 of the web conference server 10 acquires the user's voice data from the user terminal 40 (step S101).

The speech recognition unit 102 of the web conference server 10 performs speech recognition on the voice data acquired from the user terminal 40 and determines the language attribute of the specific user (step S102). The language attribute includes, for example, the language (e.g., English), the dialect (e.g., Australian English), and the accent (e.g., a Dutch accent).

The speech recognition server speed determination unit 103 of the web conference server 10 determines the communication speed between the web conference server 10 and the plurality of speech recognition servers 20 and the information processing speed of the plurality of speech recognition servers 20 (step S103). The communication speed and the information processing speed each include not only a reference value but also the real-time speed.

The speech recognition server determination unit 104 of the web conference server 10 determines, from the plurality of speech recognition servers 20, the specific speech recognition server 20 to be used for the specific user, based on the language attribute determined by the speech recognition unit 102 and on the communication speed between the web conference server 10 and the plurality of speech recognition servers 20 and the information processing speed of the plurality of speech recognition servers 20 determined by the speech recognition server speed determination unit 103 (step S104).

As one example, the speech recognition server determination unit 104 predefines one speech recognition server 20 that has high recognition accuracy for a specific language attribute and is close to the web conference server 10 (and consequently has a fast communication speed). If the information processing speed of this speech recognition server 20 is at or above a predetermined threshold, the speech recognition server determination unit 104 selects it. As another example, the speech recognition server determination unit 104 predefines a plurality of candidate speech recognition servers 20 that have high recognition accuracy for a specific language attribute. The speech recognition server determination unit 104 picks from these candidates the one speech recognition server 20 with the fastest communication speed and, if the information processing speed of that speech recognition server 20 is at or above a predetermined threshold, selects it.
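The second example above (fastest candidate, then a processing-speed threshold) can be sketched as follows. This is a minimal sketch, assuming speeds are plain numbers where higher means faster; `ServerStatus` and `choose_recognition_server` are hypothetical names, and the candidate list is assumed to have been predefined for the user's language attribute.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ServerStatus:
    server_id: str
    comm_speed: float        # communication speed with the web conference server
    processing_speed: float  # information processing speed of the server


def choose_recognition_server(
    candidates: Sequence[ServerStatus],
    min_processing_speed: float,
) -> Optional[ServerStatus]:
    """Pick the candidate with the fastest link if it also processes fast enough."""
    if not candidates:
        return None
    fastest = max(candidates, key=lambda s: s.comm_speed)
    if fastest.processing_speed >= min_processing_speed:
        return fastest
    # The source does not state what happens below the threshold,
    # so this sketch simply reports that no server was selected.
    return None
```

Returning `None` below the threshold is a choice made for this sketch; a caller could, for instance, move on to the next-fastest candidate.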

The translation server speed determination unit 105 of the web conference server 10 determines the communication speed between the specific speech recognition server 20 to be used for the specific user, as determined by the speech recognition server determination unit 104, and the plurality of translation servers 30, and the information processing speed of the plurality of translation servers 30 (step S105). The communication speed and the information processing speed each include not only a reference value but also the real-time speed.

The content determination unit 106 of the web conference server 10 determines the content of the web conference based on the result of speech recognition by the speech recognition unit 102 (step S106). The content determination unit 106 may determine the content of the web conference by, for example, inputting the speech recognition result from the speech recognition unit 102 into an AI model created in advance. The content of the web conference may be, for example, the overall theme of the web conference, such as sports or chemistry.

The emotion determination unit 107 of the web conference server 10 determines the user's emotion (irritated, calm, etc.) based on the result of speech recognition by the speech recognition unit 102 (step S107). The emotion determination unit 107 may determine the user's emotion by, for example, inputting the speech recognition result from the speech recognition unit 102 into an AI model created in advance.

The translation server determination unit 108 of the web conference server 10 determines, from the plurality of translation servers 30, the specific translation server 30 to be used for the specific user, based on the communication speed between the specific speech recognition server 20 and the plurality of translation servers 30 and the information processing speed of the plurality of translation servers 30 determined by the translation server speed determination unit 105. The translation server determination unit 108 may determine the specific translation server 30 to be used for the specific user further based on the content of the web conference determined by the content determination unit 106 and/or the emotion of the specific user determined by the emotion determination unit 107 (step S108).

As one example, the translation server determination unit 108 predefines one translation server 30 that has high translation accuracy for specific content and/or a specific emotion and is close to the web conference server 10 (and consequently has a fast communication speed). If the information processing speed of this translation server 30 is at or above a predetermined threshold, the translation server determination unit 108 selects it. As another example, the translation server determination unit 108 predefines a plurality of candidate translation servers 30 that have high translation accuracy for specific content and/or a specific emotion. The translation server determination unit 108 picks from these candidates the one translation server 30 with the fastest communication speed and, if the information processing speed of that translation server 30 is at or above a predetermined threshold, selects it.
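One way to picture the predefined candidates is a lookup table keyed by content and emotion, followed by the same fastest-link and threshold rule. The sketch below is illustrative only: the table contents, the server IDs, the fallback order in `candidate_servers`, and all function names are assumptions that do not appear in the source.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Key = Tuple[Optional[str], Optional[str]]  # (content, emotion); None means "any"

# Hypothetical table of translation servers known to be accurate for each key.
CANDIDATES: Dict[Key, List[str]] = {
    ("chemistry", None): ["mt-a", "mt-b"],
    ("sports", "irritated"): ["mt-c"],
    (None, None): ["mt-a", "mt-b", "mt-c"],
}


def candidate_servers(content: Optional[str], emotion: Optional[str]) -> List[str]:
    """Look up candidates, falling back from the most to the least specific key."""
    for key in ((content, emotion), (content, None), (None, emotion), (None, None)):
        if key in CANDIDATES:
            return CANDIDATES[key]
    return []


def choose_translation_server(
    content: Optional[str],
    emotion: Optional[str],
    comm_speed: Mapping[str, float],        # link speed from the chosen recognition server
    processing_speed: Mapping[str, float],  # information processing speed per server
    min_processing_speed: float,
) -> Optional[str]:
    pool = [sid for sid in candidate_servers(content, emotion) if sid in comm_speed]
    if not pool:
        return None
    fastest = max(pool, key=lambda sid: comm_speed[sid])
    if processing_speed.get(fastest, 0.0) >= min_processing_speed:
        return fastest
    return None  # behaviour below the threshold is not specified in the source
```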

The voice data processing request unit 109 of the web conference server 10 supplies the speech recognition server 20 determined by the speech recognition server determination unit 104 with the user's voice data acquired by the voice acquisition unit 101 and with identification information identifying the translation server 30 determined by the translation server determination unit 108, and requests processing (step S109). The combination of the specific speech recognition server 20 and the specific translation server 30 differs for each of the users participating in the web conference. This allows each user terminal 40 to use the speech recognition server 20 and translation server 30 that are optimal in terms of both speed and accuracy, and it also distributes processing across the web conference as a whole so that communication and information processing remain stable.

The speech recognition server 20 acquires, from the voice data processing request unit 109 of the web conference server 10, the voice data of the specific user and the identification information (an IP address or the like) identifying the translation server 30 determined by the translation server determination unit 108. The speech recognition server 20 performs speech recognition to generate speech recognition data, which is text data, and supplies the speech recognition data both to the web conference server 10 that supplied the voice data and to the translation server 30 identified by the identification information.

The translation server 30 acquires, from the speech recognition server 20, the speech recognition data and identification information (an IP address or the like) identifying the web conference server 10 that supplied the voice data. The translation server 30 translates the speech recognition data to generate translation data, which is text data, and supplies the translation data to the web conference server 10 identified by the identification information.
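The hand-off described above (the request carries the translation server's identifier, and the recognition result travels on together with the originating web conference server's identity) can be sketched in-process as follows. This is a toy model: real servers would communicate over the network, and here a direct object reference stands in for the identification information (an IP address or the like). All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class WebConferenceServer:
    server_id: str
    inbox: List[Tuple[str, str]] = field(default_factory=list)  # (kind, text)

    def receive(self, kind: str, text: str) -> None:
        self.inbox.append((kind, text))


@dataclass
class TranslationServer:
    server_id: str
    translate: Callable[[str], str]

    def handle(self, text: str, origin: WebConferenceServer) -> None:
        # Translate and return the result to the originating web conference server.
        origin.receive("translation", self.translate(text))


@dataclass
class SpeechRecognitionServer:
    recognize: Callable[[bytes], str]
    translators: Dict[str, TranslationServer]

    def handle(self, audio: bytes, translation_server_id: str,
               origin: WebConferenceServer) -> None:
        text = self.recognize(audio)
        origin.receive("recognition", text)  # back to the server that sent the audio
        # Forward to the translation server named in the request.
        self.translators[translation_server_id].handle(text, origin)
```

With a stub recognizer that decodes bytes and a stub translator that upper-cases text, a single `handle` call leaves both the recognition text and the translation text in the originating server's `inbox`.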

The processed data acquisition unit 110 of the web conference server 10 acquires the speech recognition data of the specific user from the speech recognition server 20 and acquires the translation data of the specific user from the specific translation server 30 (step S110).

The context check unit 111 of the web conference server 10 synchronizes corresponding speech recognition data and translation data, checks their context, and corrects the speech recognition data and/or the translation data according to the check result (step S111). The context check unit 111 may check the context of the speech recognition data and translation data by, for example, inputting them into an AI model created in advance. The context check unit 111 stores the corresponding speech recognition data and translation data, as corrected according to the check result, in the storage device 120.

The storage device 120 need only store the speech recognition data and translation data at least during the web conference and for a predetermined period after it ends, and may be either a nonvolatile or a volatile storage device. Note that data "corrected according to the check result" also includes data that the check found to need no correction and that was therefore left unchanged.

The real-time output unit 112 of the web conference server 10 outputs the corresponding speech recognition data and translation data, as corrected according to the check result, to the user terminals 40 of the users participating in the web conference in real time during the web conference (step S112). That is, the real-time output unit 112 outputs the data not only to the user terminal 40 that input the voice data from which the speech recognition data and translation data were derived, but to the user terminals 40 of all users participating in the web conference.

The real-time input units 402 of the user terminals 40 of all users participating in the web conference acquire the corresponding speech recognition data and translation data, as corrected according to the check result, from the real-time output unit 112 of the web conference server 10 in real time during the web conference. Each real-time input unit 402 displays the speech recognition data and translation data, which are text data, on the display 412 in real time during the web conference.

After the web conference ends, the minutes creation unit 113 of the web conference server 10 reads the corresponding speech recognition data and translation data, as corrected according to the check result, from the storage device 120 and creates minutes data for the web conference based on the data read. Specifically, the minutes creation unit 113 of one of the web conference servers 10 acquires the speech recognition data and translation data of all users from the storage devices 120 of the web conference servers 10 accessed by the user terminals 40 of the users participating in the web conference, and arranges the acquired speech recognition data and translation data in chronological order to create the minutes data. The minutes creation unit 113 outputs the created minutes data to the user terminals 40 of the users participating in the web conference.
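The chronological assembly of minutes from several servers' stored records can be sketched as follows. This is a minimal sketch: it assumes each stored record carries a timestamp (the source only says the data are arranged in chronological order), and `Utterance` and `build_minutes` are hypothetical names.

```python
from itertools import chain
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    timestamp: float   # assumed field; seconds since the conference started
    user: str
    recognized: str    # speech recognition data (text)
    translated: str    # translation data (text)


def build_minutes(per_server_records: Iterable[Iterable[Utterance]]) -> List[Utterance]:
    """Combine the records held by every web conference server, oldest first."""
    return sorted(chain.from_iterable(per_server_records), key=lambda u: u.timestamp)
```

Sorting the combined records, rather than merging pre-sorted lists, keeps the sketch correct even if a server's stored records are not already in time order.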

The minutes acquisition unit 403 of each user terminal 40 acquires the minutes data from the minutes creation unit 113 of the web conference server 10 and stores it in the storage device 413. The storage device 413 need only store the speech recognition data and translation data for at least a predetermined period after the end of the web conference, and may be either a non-volatile or a volatile storage device.

FIG. 4 shows the second operation flow of the web conference server (during the web conference).

During the web conference, the speech recognition server speed determination unit 103 of the web conference server 10 periodically (Loop) determines the communication speed between the web conference server 10 and the specific (currently communicating) speech recognition server 20 and the information processing speed of that specific speech recognition server 20 (step S201). The communication speed and the information processing speed are each real-time speeds. The speech recognition server speed determination unit 103 determines whether the communication speed between the web conference server 10 and the specific speech recognition server 20 and/or the information processing speed of the specific speech recognition server 20 has fallen below a threshold during the web conference (step S202). The threshold is, for example, a value at which the speed is unacceptable for conducting a smooth web conference.
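The "and/or" condition of step S202 can be written as a small Python predicate. This is a minimal sketch, assuming separate thresholds for the two speeds and arbitrary units; the `SpeedSample` type and the threshold parameters are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedSample:
    """One periodic measurement taken in step S201 (units are an assumption)."""
    communication_speed: float
    processing_speed: float


def below_threshold(sample: SpeedSample,
                    communication_threshold: float,
                    processing_threshold: float) -> bool:
    """Step S202: YES if either speed, or both, is strictly below its threshold."""
    return (sample.communication_speed < communication_threshold
            or sample.processing_speed < processing_threshold)
```

A result of `True` corresponds to "step S202, YES", at which point the other candidate servers would be measured.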

If the communication speed between the web conference server 10 and the specific speech recognition server 20 and/or the information processing speed of the specific speech recognition server 20 has fallen below the threshold during the web conference (step S202, YES), the speech recognition server speed determination unit 103 determines the communication speed between the web conference server 10 and the plurality of speech recognition servers 20 (the speech recognition servers 20 other than the one currently in communication) and the information processing speed of the plurality of speech recognition servers 20 (step S203). The communication speed and the information processing speed are each real-time speeds.

The speech recognition server determination unit 104 of the web conference server 10 newly determines, from the plurality of speech recognition servers 20, the specific speech recognition server 20 to be used for the specific user (step S204). The determination is based on the language attribute determined by the speech recognition unit 102 (step S102) and on the communication speed between the web conference server 10 and the plurality of speech recognition servers 20 and the information processing speed of the plurality of speech recognition servers 20 determined by the speech recognition server speed determination unit 103 (step S203). The speech recognition server determination unit 104 may newly determine the specific speech recognition server 20 by the same method as the example described for step S104. Note that the newly determined speech recognition server 20 may turn out to be the same as the speech recognition server 20 currently in communication.

When the specific speech recognition server 20 used for the specific user is changed during the web conference (step S205, YES), the translation server speed determination unit 105 of the web conference server 10 determines the communication speed between the newly determined specific speech recognition server 20 and the plurality of translation servers 30 and the information processing speed of the plurality of translation servers 30 (step S206). The communication speed and the information processing speed are each real-time speeds.

The reason for determining the communication speed between the newly determined specific speech recognition server 20 and the plurality of translation servers 30 and the information processing speed of the plurality of translation servers 30 (step S206) is that the translation server 30 best suited to the new speech recognition server 20 (step S205, YES) may be a translation server 30 other than the one currently in communication. By selecting the translation server 30 best suited to work with the new speech recognition server 20, the pair of speech recognition server 20 and translation server 30 that is optimal overall in both speed and accuracy can be selected in real time.

The translation server determination unit 108 of the web conference server 10 newly determines, from the plurality of translation servers 30, the specific translation server 30 to be used for the specific user, based on the communication speed between the newly determined specific speech recognition server 20 and the plurality of translation servers 30 and the information processing speed of the plurality of translation servers 30 (step S207). The translation server determination unit 108 may newly determine the specific translation server 30 by the same method as the example described for step S108. Note that the newly determined translation server 30 may turn out to be the same as the translation server 30 currently in communication.

If the specific translation server 30 used for the specific user is not changed during the web conference (step S208, NO), the speech data processing request unit 109 supplies the newly determined specific speech recognition server 20 with the voice data of the specific user and identification information identifying the specific (currently communicating) translation server 30 (step S209). As a result, the specific speech recognition server 20 used for the specific user is changed during the web conference, so that the speech recognition server 20 that is optimal in both speed and accuracy can be used in real time. In addition, viewed across the web conference as a whole, processing is distributed and stable communication and information processing can be achieved.

On the other hand, if the specific translation server 30 used for the specific user is changed during the web conference (step S208, YES), the speech data processing request unit 109 of the web conference server 10 supplies the newly determined specific speech recognition server 20 with the voice data of the specific user and identification information identifying the newly determined specific translation server 30 (step S210). As a result, the specific translation server 30 used for the specific user is also changed during the web conference, so that the translation server 30 that is optimal in both speed and accuracy can be used in real time. In addition, viewed across the web conference as a whole, processing is distributed and stable communication and information processing can be achieved.
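The branch structure of steps S204 to S210 can be sketched in Python as follows. This is an illustrative sketch only. The `Speeds` type, the server names, and the `pick_fastest` rule (prefer the server whose slower of the two speeds is highest) are assumptions standing in for the methods of steps S104 and S108, which are not reproduced here. The candidates are assumed to have been filtered by language attribute already, and keeping both servers on "S205, NO" is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Speeds:
    communication: float
    processing: float


def pick_fastest(candidates: Dict[str, Speeds]) -> str:
    """Assumed stand-in for S104/S108: the highest bottleneck speed wins."""
    return max(candidates,
               key=lambda name: min(candidates[name].communication,
                                    candidates[name].processing))


def second_flow(current_asr: str,
                current_mt: str,
                asr_speeds: Dict[str, Speeds],
                mt_speeds_from: Callable[[str], Dict[str, Speeds]],
                ) -> Tuple[str, str, Optional[str]]:
    """Return (speech recognition server, translation server, final step taken)."""
    new_asr = pick_fastest(asr_speeds)                 # S204
    if new_asr == current_asr:                         # S205, NO (assumed: nothing changes)
        return current_asr, current_mt, None
    new_mt = pick_fastest(mt_speeds_from(new_asr))     # S206 and S207
    step = "S210" if new_mt != current_mt else "S209"  # S208
    return new_asr, new_mt, step
```

`mt_speeds_from` is a hypothetical callback returning the speeds measured between a given speech recognition server and each translation server, reflecting that step S206 measures from the newly determined speech recognition server rather than from the web conference server.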

FIG. 5 shows the third operation flow of the web conference server (during the web conference).

During the web conference, the translation server speed determination unit 105 of the web conference server 10 periodically (Loop) determines the communication speed between the specific (currently communicating) speech recognition server 20 and the specific (currently communicating) translation server 30 and the information processing speed of that specific translation server 30 (step S301). The communication speed and the information processing speed are each real-time speeds. The translation server speed determination unit 105 determines whether the communication speed between the specific speech recognition server 20 and the specific translation server 30 and/or the information processing speed of the specific translation server 30 has fallen below a threshold during the web conference (step S302). The threshold is, for example, a value at which the speed is unacceptable for conducting a smooth web conference.

If the communication speed between the specific speech recognition server 20 and the specific translation server 30 and/or the information processing speed of the specific translation server 30 has fallen below the threshold during the web conference (step S302, YES), the translation server speed determination unit 105 determines the communication speed between the specific (currently communicating) speech recognition server 20 and the plurality of translation servers 30 (the translation servers 30 other than the one currently in communication) and the information processing speed of the plurality of translation servers 30 (step S303). The communication speed and the information processing speed are each real-time speeds.

The translation server determination unit 108 of the web conference server 10 newly determines, from the plurality of translation servers 30, the specific translation server 30 to be used for the specific user, based on the communication speed between the specific speech recognition server 20 and the plurality of translation servers 30 and the information processing speed of the plurality of translation servers 30 determined by the translation server speed determination unit 105 (step S303) (step S304). The translation server determination unit 108 may newly determine the specific translation server 30 by the same method as the example described for step S108. Note that the newly determined translation server 30 may turn out to be the same as the translation server 30 currently in communication.

When the specific translation server 30 used for the specific user is changed during the web conference (step S305, YES), the speech recognition server speed determination unit 103 of the web conference server 10 determines the communication speed between the web conference server 10 and the plurality of speech recognition servers 20 and the information processing speed of the plurality of speech recognition servers 20 (step S306). The communication speed and the information processing speed are each real-time speeds.

The reason for determining the speeds of the plurality of speech recognition servers 20 (step S306) is as follows. When the communication speed between the specific speech recognition server 20 and the specific translation server 30 and/or the information processing speed of the specific translation server 30 has fallen below the threshold during the web conference (step S302, YES), the problem may lie with the specific speech recognition server 20, in which case it may be better to change the speech recognition server 20 in use. By selecting the speech recognition server 20 best suited to work with the new translation server 30, the pair of speech recognition server 20 and translation server 30 that is optimal overall in both speed and accuracy can be selected in real time.

The speech recognition server determination unit 104 of the web conference server 10 newly determines, from the plurality of speech recognition servers 20, the specific speech recognition server 20 to be used for the specific user (step S307). The determination is based on the language attribute determined by the speech recognition unit 102 (step S102) and on the communication speed between the web conference server 10 and the plurality of speech recognition servers 20 and the information processing speed of the plurality of speech recognition servers 20 determined by the speech recognition server speed determination unit 103 (step S306). The speech recognition server determination unit 104 may newly determine the specific speech recognition server 20 by the same method as the example described for step S104. Note that the newly determined speech recognition server 20 may turn out to be the same as the speech recognition server 20 currently in communication.

If the specific speech recognition server 20 used for the specific user is not changed during the web conference (step S308, NO), the speech data processing request unit 109 supplies the specific (currently communicating) speech recognition server 20 with the voice data of the specific user and identification information identifying the newly determined translation server 30 (step S309). As a result, the specific translation server 30 used for the specific user is changed during the web conference, so that the translation server 30 that is optimal in both speed and accuracy can be used in real time. In addition, viewed across the web conference as a whole, processing is distributed and stable communication and information processing can be achieved.

On the other hand, if the specific speech recognition server 20 used for the specific user is changed during the web conference (step S308, YES), the speech data processing request unit 109 of the web conference server 10 supplies the newly determined specific speech recognition server 20 with the voice data of the specific user and identification information identifying the newly determined specific translation server 30 (step S310). As a result, the specific speech recognition server 20 used for the specific user is also changed during the web conference, so that the speech recognition server 20 that is optimal in both speed and accuracy can be used in real time. In addition, viewed across the web conference as a whole, processing is distributed and stable communication and information processing can be achieved.
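The third operation flow, including what the speech data processing request unit 109 sends in steps S309 and S310, might be sketched in Python as below. Everything here is a hedged illustration: the `ProcessingRequest` type, the `(communication speed, processing speed)` tuples, the server names, and the `fastest` rule are assumptions, not the selection method of steps S104 and S108. Returning `None` on "S305, NO" is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

SpeedPair = Tuple[float, float]  # (communication speed, processing speed), assumed units


@dataclass(frozen=True)
class ProcessingRequest:
    """What unit 109 supplies to a speech recognition server (hypothetical layout)."""
    asr_server: str             # destination speech recognition server 20
    translation_server_id: str  # identification information of the translation server 30
    voice_data: bytes
    step: str                   # "S309" or "S310"


def fastest(speeds: Dict[str, SpeedPair]) -> str:
    """Assumed selection rule: the server with the highest bottleneck speed."""
    return max(speeds, key=lambda name: min(speeds[name]))


def third_flow(current_asr: str, current_mt: str,
               mt_speeds: Dict[str, SpeedPair],
               asr_speeds: Dict[str, SpeedPair],
               voice_data: bytes) -> Optional[ProcessingRequest]:
    new_mt = fastest(mt_speeds)                          # S304
    if new_mt == current_mt:                             # S305, NO (assumed: no new request)
        return None
    new_asr = fastest(asr_speeds)                        # S306 and S307
    step = "S310" if new_asr != current_asr else "S309"  # S308
    return ProcessingRequest(new_asr, new_mt, voice_data, step)
```

In both outcomes the request carries the identifier of the newly determined translation server; only the destination speech recognition server differs between S309 and S310.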

Although embodiments and modifications of the present technology have been described above, the present technology is not limited to the above-described embodiments, and various changes can of course be made without departing from the gist of the present technology.

Web conference system 1
Web conference server 10
Speech recognition server 20
Translation server 30
User terminal 40

Claims

A web conference server comprising:
a voice acquisition unit that acquires voice data of a specific user included in a plurality of users participating in a web conference from a user terminal of the specific user;
a speech recognition unit that recognizes the voice data and determines a language attribute of the specific user;
a speech recognition server speed determination unit that determines a communication speed between the web conference server and a plurality of speech recognition servers and an information processing speed of the plurality of speech recognition servers;
a speech recognition server determination unit that determines, from the plurality of speech recognition servers, a specific speech recognition server to be used for the specific user, based on the language attribute, the communication speed with the plurality of speech recognition servers, and the information processing speed of the plurality of speech recognition servers;
a translation server speed determination unit that determines a communication speed between the specific speech recognition server used for the specific user and a plurality of translation servers and an information processing speed of the plurality of translation servers;
a translation server determination unit that determines, from the plurality of translation servers, a specific translation server to be used for the specific user, based on the communication speed with the plurality of translation servers and the information processing speed of the plurality of translation servers; and
a speech data processing request unit that supplies the specific speech recognition server with the voice data of the specific user and identification information identifying the specific translation server, thereby causing the specific speech recognition server to recognize the voice data of the specific user and generate speech recognition data, which is text data, and causing the specific translation server to translate the speech recognition data of the specific user and generate translation data, which is text data,
wherein the combination of the specific speech recognition server and the specific translation server differs for each of the plurality of users participating in the web conference.

The web conference server according to claim 1, wherein
the speech recognition server speed determination unit periodically determines, during the web conference, the communication speed between the web conference server and the specific speech recognition server and the information processing speed of the specific speech recognition server,
when the communication speed between the web conference server and the specific speech recognition server and/or the information processing speed of the specific speech recognition server falls below a threshold during the web conference,
the speech recognition server speed determination unit determines the communication speed between the web conference server and the plurality of speech recognition servers and the information processing speed of the plurality of speech recognition servers,
the speech recognition server determination unit newly determines, from the plurality of speech recognition servers, a specific speech recognition server to be used for the specific user, and
the speech data processing request unit supplies the newly determined specific speech recognition server with the voice data of the specific user and identification information identifying the specific translation server, thereby changing the specific speech recognition server used for the specific user during the web conference.

The web conference server according to claim 2, wherein,
when the specific speech recognition server used for the specific user is changed during the web conference,
the translation server speed determination unit determines the communication speed between the newly determined specific speech recognition server and the plurality of translation servers and the information processing speed of the plurality of translation servers,
the translation server determination unit newly determines, from the plurality of translation servers, a specific translation server to be used for the specific user, based on the communication speed between the newly determined specific speech recognition server and the plurality of translation servers and the information processing speed of the plurality of translation servers, and
the speech data processing request unit supplies the newly determined specific speech recognition server with the voice data of the specific user and identification information identifying the newly determined specific translation server, thereby changing the specific translation server used for the specific user during the web conference.

The web conference server according to any one of claims 1 to 3, wherein
the translation server speed determination unit periodically determines, during the web conference, the communication speed between the specific speech recognition server and the specific translation server and the information processing speed of the specific translation server,
when the communication speed between the specific speech recognition server and the specific translation server and/or the information processing speed of the specific translation server falls below a threshold during the web conference,
the translation server speed determination unit determines the communication speed between the specific speech recognition server and the plurality of translation servers and the information processing speed of the plurality of translation servers,
the translation server determination unit newly determines, from the plurality of translation servers, a specific translation server to be used for the specific user, and
the speech data processing request unit supplies the newly determined specific speech recognition server with the voice data of the specific user and identification information identifying the newly determined specific translation server, thereby changing the specific translation server used for the specific user during the web conference.

The web conference server according to claim 3, wherein,
when the specific translation server used for the specific user is changed during the web conference,
the speech recognition server speed determination unit determines the communication speed between the newly determined specific translation server and the plurality of speech recognition servers and the information processing speed of the plurality of speech recognition servers,
the speech recognition server determination unit newly determines, from the plurality of speech recognition servers, a specific speech recognition server to be used for the specific user, based on the communication speed between the newly determined specific translation server and the plurality of speech recognition servers and the information processing speed of the plurality of speech recognition servers, and
the speech data processing request unit supplies the newly determined specific speech recognition server with the voice data of the specific user and identification information identifying the newly determined specific translation server, thereby changing the specific speech recognition server used for the specific user during the web conference.

The web conference server according to any one of claims 1 to 5, further comprising:
a processed data acquisition unit that acquires the speech recognition data of the specific user from the specific speech recognition server and the translation data of the specific user from the specific translation server;
a context check unit that checks the context of the corresponding speech recognition data and translation data and corrects the corresponding speech recognition data and/or translation data according to the check result; and
a real-time output unit that outputs the corresponding speech recognition data and translation data, corrected according to the check result, to the user terminals of the plurality of users participating in the web conference, in real time during the web conference.

The web conference server according to claim 6, further comprising
a minutes creation unit that creates minutes data of the web conference based on the corresponding speech recognition data and translation data corrected according to the check result, and outputs the minutes data to the user terminals of the plurality of users participating in the web conference.

The web conference server according to any one of claims 1 to 7, further comprising:
a content determination unit that determines the content of the web conference based on the result of speech recognition by the speech recognition unit; and
an emotion determination unit that determines the emotion of the specific user based on the result of speech recognition by the speech recognition unit,
wherein the translation server determination unit determines the specific translation server to be used for the specific user further based on the content of the web conference and/or the emotion of the specific user.

The web conference server according to any one of claims 1 to 8, wherein
the user terminal with which the web conference server communicates selects, from a plurality of web conference servers and based on the IP address of the user terminal, the specific web conference server having the fastest communication speed, and accesses the selected specific web conference server, and
the specific web conference server accessed by the user terminal differs for each user terminal of the plurality of users participating in the web conference.

A web conference method comprising:
acquiring voice data of a specific user included in a plurality of users participating in a web conference from a user terminal of the specific user;
recognizing the voice data and determining a language attribute of the specific user;
determining a communication speed between a web conference server and a plurality of speech recognition servers and an information processing speed of the plurality of speech recognition servers;
determining, from the plurality of speech recognition servers, a specific speech recognition server to be used for the specific user, based on the language attribute, the communication speed with the plurality of speech recognition servers, and the information processing speed of the plurality of speech recognition servers;
determining a communication speed between the specific speech recognition server used for the specific user and a plurality of translation servers and an information processing speed of the plurality of translation servers;
determining, from the plurality of translation servers, a specific translation server to be used for the specific user, based on the communication speed with the plurality of translation servers and the information processing speed of the plurality of translation servers; and
supplying the specific speech recognition server with the voice data of the specific user and identification information identifying the specific translation server, thereby causing the specific speech recognition server to recognize the voice data of the specific user and generate speech recognition data, which is text data, and causing the specific translation server to translate the speech recognition data of the specific user and generate translation data, which is text data,
wherein the combination of the specific speech recognition server and the specific translation server differs for each of the plurality of users participating in the web conference.

A web conference system comprising, interconnected via a network:
a web conference server;
a plurality of speech recognition servers; and
a plurality of translation servers,
wherein the web conference server includes:
a voice acquisition unit that acquires voice data of a specific user included in a plurality of users participating in a web conference from a user terminal of the specific user;
a speech recognition unit that recognizes the voice data and determines a language attribute of the specific user;
a speech recognition server speed determination unit that determines a communication speed between the web conference server and the plurality of speech recognition servers and an information processing speed of the plurality of speech recognition servers;
a speech recognition server determination unit that determines, from the plurality of speech recognition servers, a specific speech recognition server to be used for the specific user, based on the language attribute, the communication speed with the plurality of speech recognition servers, and the information processing speed of the plurality of speech recognition servers;
a translation server speed determination unit that determines a communication speed between the specific speech recognition server used for the specific user and the plurality of translation servers and an information processing speed of the plurality of translation servers;
a translation server determination unit that determines, from the plurality of translation servers, a specific translation server to be used for the specific user, based on the communication speed with the plurality of translation servers and the information processing speed of the plurality of translation servers; and
a speech data processing request unit that supplies the specific speech recognition server with the voice data of the specific user and identification information identifying the specific translation server, thereby causing the specific speech recognition server to recognize the voice data of the specific user and generate speech recognition data, which is text data, and causing the specific translation server to translate the speech recognition data of the specific user and generate translation data, which is text data,
and wherein the combination of the specific speech recognition server and the specific translation server differs for each of the plurality of users participating in the web conference.