
CN108184032B - A service method and device for a customer service system - Google Patents

A service method and device for a customer service system

Info

Publication number
CN108184032B
Authority
CN
China
Prior art keywords
voice
customer service
synthesized
text
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611116110.XA
Other languages
Chinese (zh)
Other versions
CN108184032A (en)
Inventor
王朝民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Research Institute of China Mobile Communication Co Ltd
Original Assignee
Research Institute of China Mobile Communication Co Ltd
China Mobile Communications Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Research Institute of China Mobile Communication Co Ltd and China Mobile Communications Corp
Priority to CN201611116110.XA
Publication of CN108184032A
Application granted
Publication of CN108184032B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
    • H04M3/527 Centralised call answering arrangements not requiring operator intervention
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a service method and device for a customer service system. The method comprises the following steps:

- receiving a speech synthesis instruction;
- determining, according to the received instruction, the text to be synthesized;
- synthesizing speech for that text that carries the timbre of the customer service agent currently handling the call, using the text and a speech parameter model library built in advance from that agent's timbre;
- receiving an instruction from the agent and, according to it, playing a sentence composed of the synthesized speech and/or the agent's own live speech.

The user hears synthesized speech in the agent's timbre and/or the agent's live speech. As a result, the amount the agent must say during manual service is greatly reduced, which lowers agent fatigue. The user also naturally assumes that the agent has been speaking the whole time. Together these improve the service quality of the customer service system and the user experience.

Description

A service method and device for a customer service system

Technical Field

The present invention relates to the field of communication technologies, and in particular to a service method and device for a customer service system.

Background

The customer service systems of China's three major carriers (China Mobile, China Unicom and China Telecom) usually consist of automated (machine) customer service and human customer service. During telephone service, the automated service handles a user's session first. If the user finds that the automated service cannot solve the problem, the user manually selects human customer service and consults an agent.

In current systems of this kind, the automated voice is rather monotonous and lacks the liveliness of natural speech. Automated service also cannot adapt to situations on the spot and can solve only a limited range of problems, so human agents play an important role in telephone service. However, a human agent typically works more than 6 hours continuously per shift. During that time the agent must give many answers and explanations tailored to each user's particular questions and circumstances, which is highly fatiguing. Fatigue leads to inaccurate pronunciation or misreadings, which lowers the quality of customer service and degrades the user experience.

Therefore, how to improve the service quality of the customer service system, and thereby the user experience, is a technical problem that urgently needs to be solved.

Summary of the Invention

Embodiments of the present invention provide a service method and device for a customer service system. They solve the prior-art problem of how to improve the service quality of the customer service system and thereby the user experience.

An embodiment of the present invention provides a service method for a customer service system, comprising:

receiving a speech synthesis instruction;

determining, according to the received speech synthesis instruction, the text to be synthesized;

synthesizing speech for the text that carries the timbre characteristics of the customer service agent currently handling the call, according to the determined text and a speech parameter model library built in advance from that agent's timbre; and

receiving an instruction from the agent and, according to the instruction, playing a sentence composed of the synthesized speech and/or the agent's own live speech.

In a possible implementation of the above service method, determining the text to be synthesized according to the received speech synthesis instruction specifically includes:

determining whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence;

if so, taking the standard script sentence corresponding to the instruction as the text to be synthesized;

if not, filling the text carried by the instruction into a fill-in-the-blank script sentence and taking the result as the text to be synthesized.
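
A minimal Python sketch of this branch follows. It assumes the instruction carries an ID that names either a stored standard script or a fill-in-the-blank template. The IDs, script texts, and `{amount}` slot syntax are hypothetical, because the patent does not specify a data format.

```python
# Hypothetical standard scripts, keyed by synthesis-instruction ID.
STANDARD_SCRIPTS = {
    "greeting": "Glad to be of service.",
    "ask_id": "Please enter your ID card number.",
}

# Hypothetical fill-in-the-blank scripts; {name} marks a blank.
TEMPLATES = {
    "balance": "Your current balance is {amount} yuan.",
}


def determine_text(instruction_id, carried=None):
    """Return the text to be synthesized for one synthesis instruction."""
    if instruction_id in STANDARD_SCRIPTS:
        # Standard script sentence: use the stored text as-is.
        return STANDARD_SCRIPTS[instruction_id]
    if instruction_id in TEMPLATES:
        # Otherwise fill the text carried by the instruction into the blanks.
        return TEMPLATES[instruction_id].format(**(carried or {}))
    raise KeyError(f"unknown synthesis instruction: {instruction_id}")
```

For example, `determine_text("balance", {"amount": "35.20"})` yields the sentence with the amount filled in, ready for synthesis.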

In a possible implementation of the above service method, synthesizing speech with the agent's timbre characteristics for the determined text specifically includes the following steps. The synthesis uses the speech parameter model library built in advance from the timbre of the agent currently handling the call.

performing word segmentation on the determined text with a text analyzer to obtain a word label file corresponding to the text;

determining the speech feature parameters corresponding to the text, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently handling the call;

synthesizing, according to the speech feature parameters, speech for the text that carries the agent's timbre characteristics.

In a possible implementation of the above service method, determining the speech feature parameters corresponding to the text according to the word label file and the speech parameter model library specifically includes:

looking up, in the speech parameter model library built in advance from the timbre of the agent currently handling the call, the speech parameter model corresponding to each word in the word label file;

determining the following parameters for the text to be synthesized with a parameter generation algorithm, according to the speech parameter model of each word:

- LF0, the fundamental frequency (F0) converted to the log domain;
- BAP, the band aperiodicity, i.e. the aperiodic component spectrum averaged over different frequency bands;
- LSP, the 18-dimensional line spectral pair parameters extracted frame by frame from the vocal tract spectrum.
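
The patent does not say which parameter generation algorithm it uses. A common choice in HMM-based synthesis is maximum-likelihood parameter generation (MLPG). MLPG picks the trajectory that best fits both the static means and the delta (slope) means of the models.

The Python sketch below applies MLPG to a single feature dimension, such as LF0. It rests on several assumptions:

- the per-frame Gaussian means and variances have already been read from the looked-up models;
- covariances are diagonal;
- the delta window is 0.5 * (c[t+1] - c[t-1]), with the edges clamped.

All names are illustrative.

```python
# Minimal MLPG sketch for one feature dimension (e.g. LF0).
# Assumptions: diagonal Gaussians over [static, delta] features and the
# delta window delta_c[t] = 0.5 * (c[t+1] - c[t-1]) with clamped edges.


def build_window_matrix(T):
    """Stack static (identity) rows and delta rows into a 2T x T matrix W."""
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[t][t] = 1.0
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        W[T + t][nxt] += 0.5
        W[T + t][prev] -= 0.5
    return W


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def mlpg(static_mean, static_var, delta_mean, delta_var):
    """Return the trajectory c that maximizes the likelihood of W c,
    i.e. the solution of (W^T P W) c = W^T P mu with P = diag(1 / var)."""
    T = len(static_mean)
    W = build_window_matrix(T)
    mu = list(static_mean) + list(delta_mean)
    prec = [1.0 / v for v in list(static_var) + list(delta_var)]
    A = [[sum(W[k][i] * prec[k] * W[k][j] for k in range(2 * T))
          for j in range(T)] for i in range(T)]
    b = [sum(W[k][i] * prec[k] * mu[k] for k in range(2 * T))
         for i in range(T)]
    return solve(A, b)
```

Tight delta variances smooth the output. With a delta variance of 0.01, a step input of [0, 0, 1, 1] comes out as roughly [0.49, 0.49, 0.51, 0.51].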

In a possible implementation of the above service method, synthesizing speech with the agent's timbre characteristics for the text according to the speech feature parameters specifically includes:

forming, from the determined LF0 and BAP, a mixed excitation source corresponding to the text to be synthesized;

feeding the mixed excitation source into a filter and controlling the filter with the determined LSP, to synthesize speech for the text that carries the agent's timbre characteristics.
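
The Python sketch below shows one way to picture the excitation-plus-filter step. It mixes a pitch-synchronous pulse train with white noise and passes the result through an all-pole filter whose coefficients are converted from the LSPs. It is a simplification under these assumptions:

- a 16 kHz sample rate and 5 ms frames;
- a single aperiodicity ratio per frame instead of per-band BAP mixing;
- LF0 = None for unvoiced frames;
- a plain LPC synthesis filter, rather than whatever filter the patent's system actually uses.

All function names are hypothetical.

```python
import math
import random

FS = 16000  # assumed sample rate in Hz


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsf):
    """Sorted LSFs in radians (even order p) -> [1, a1, ..., ap] of A(z)."""
    p_poly, q_poly = [1.0, 1.0], [1.0, -1.0]  # (1 + z^-1) and (1 - z^-1)
    for i, w in enumerate(lsf):
        quad = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:  # the lowest LSF and every other one belong to P(z)
            p_poly = poly_mul(p_poly, quad)
        else:
            q_poly = poly_mul(q_poly, quad)
    a = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]
    return a[:-1]  # the z^-(p+1) terms of P and Q cancel


def mixed_excitation(lf0, ap, n, phase, rng):
    """n samples of excitation; lf0 is None when unvoiced, ap in [0, 1] is
    the noise share (single-band stand-in for BAP)."""
    if lf0 is None:
        return [rng.gauss(0.0, 1.0) for _ in range(n)], phase
    period = FS / math.exp(lf0)  # pitch period in samples
    out = []
    for _ in range(n):
        phase += 1.0
        pulse = 0.0
        if phase >= period:
            phase -= period
            pulse = math.sqrt(period)  # roughly unit average power
        noise = rng.gauss(0.0, 1.0)
        out.append(math.sqrt(1.0 - ap) * pulse + math.sqrt(ap) * noise)
    return out, phase


def all_pole(x, a, state):
    """Filter x through 1/A(z); state holds previous outputs, newest first."""
    y = []
    for s in x:
        v = s - sum(ak * yk for ak, yk in zip(a[1:], state))
        state.insert(0, v)
        state.pop()
        y.append(v)
    return y


def synthesize(frames, frame_len=80, seed=0):
    """frames: one (lf0, ap, lsf) tuple per 5 ms frame; returns samples."""
    rng = random.Random(seed)
    state = [0.0] * len(frames[0][2])
    phase = 0.0
    out = []
    for lf0, ap, lsf in frames:
        exc, phase = mixed_excitation(lf0, ap, frame_len, phase, rng)
        out.extend(all_pole(exc, lsp_to_lpc(lsf), state))
    return out
```

A flat set of LSFs (k·π/19 for k = 1 to 18) converts to A(z) = 1, so the filter passes the excitation unchanged. This makes a convenient sanity check.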

In a possible implementation, the above service method further includes building the speech parameter model library with the agent's timbre as follows:

decomposing the original speech waveform files in the agent's speech database to obtain, for each syllable in the files, the fundamental frequency information, aperiodic component spectrum information, and vocal tract spectrum information;

converting the fundamental frequency information of each syllable to the log domain to obtain LF0;

averaging the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP;

extracting, frame by frame, 18-dimensional line spectral pair parameters (LSP) from the vocal tract spectrum information of each syllable;

building hidden Markov model (HMM) speech parameter models from the LF0, BAP and LSP determined for each syllable, according to the word label file corresponding to the original waveform files;

performing model clustering and model training on the built speech parameter models to obtain the speech parameter model library with the agent's timbre.
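
The LF0 and BAP conversions above are simple enough to sketch directly. The Python below works frame by frame, treating a syllable as a run of frames. It makes these assumptions:

- the F0 track is in Hz, with 0 marking unvoiced frames;
- the aperiodicity spectrum is sampled from 0 Hz to Nyquist by some vocoder analysis that the patent does not name;
- the band edges are hypothetical, because the patent only says "preset frequency bands".

```python
import math

# Hypothetical band edges in Hz; the patent only says "preset frequency bands".
BAND_EDGES = [0.0, 1000.0, 2000.0, 4000.0, 6000.0, 8000.0]


def to_lf0(f0_track):
    """Per-frame F0 in Hz (0 = unvoiced) -> LF0, with None marking unvoiced."""
    return [math.log(f0) if f0 > 0 else None for f0 in f0_track]


def to_bap(ap_spectrum, sample_rate=16000):
    """Average one frame's aperiodicity spectrum (bins 0..Nyquist) per band."""
    n_bins = len(ap_spectrum)
    nyquist = sample_rate / 2.0
    freqs = [i * nyquist / (n_bins - 1) for i in range(n_bins)]
    bap = []
    for b, (lo, hi) in enumerate(zip(BAND_EDGES[:-1], BAND_EDGES[1:])):
        last = b == len(BAND_EDGES) - 2  # the top band includes Nyquist
        vals = [v for f, v in zip(freqs, ap_spectrum)
                if lo <= f < hi or (last and f == hi)]
        bap.append(sum(vals) / len(vals))
    return bap
```

The band edges assume 16 kHz audio. A different sample rate would need its own edges so that no band is empty.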

An embodiment of the present invention further provides a service device for a customer service system, comprising:

a receiving unit, configured to receive a speech synthesis instruction;

a determining unit, configured to determine the text to be synthesized according to the received speech synthesis instruction;

a synthesis unit, configured to synthesize speech for the text that carries the agent's timbre characteristics, according to the determined text and the speech parameter model library built in advance from the timbre of the agent currently handling the call;

a playback unit, configured to receive an instruction from the agent and, according to the instruction, play a sentence composed of the synthesized speech and/or the agent's own live speech.
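
The following Python sketch shows how the four units could fit together. Each unit is modeled as a plain callable or method. The agent's playback instruction is assumed to be a list of ("tts", clip_id) and ("live", recording_id) items. These names and the plan format are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Audio = List[float]


@dataclass
class ServiceDevice:
    """Hypothetical wiring of the receiving, determining, synthesis and
    playback units; the units are injected as plain callables."""
    determine: Callable[[str, Dict[str, str]], str]  # determining unit
    synthesize: Callable[[str], Audio]               # synthesis unit
    clips: List[Audio] = field(default_factory=list)

    def receive(self, instruction: str,
                fields: Optional[Dict[str, str]] = None) -> int:
        """Receiving unit: accept an instruction, synthesize it, return a clip id."""
        text = self.determine(instruction, fields or {})
        self.clips.append(self.synthesize(text))
        return len(self.clips) - 1

    def play(self, plan: List[Tuple[str, int]], live: List[Audio]) -> Audio:
        """Playback unit: join synthesized clips and the agent's live
        recordings in the order given by the agent's plan."""
        out: Audio = []
        for kind, idx in plan:
            if kind == "tts":
                out.extend(self.clips[idx])
            elif kind == "live":
                out.extend(live[idx])
            else:
                raise ValueError(f"unknown segment kind: {kind}")
        return out
```

Injecting the units as callables keeps the sketch agnostic about which synthesizer or script store sits behind them.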

In a possible implementation of the above service device, the determining unit is specifically configured to determine whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence. If so, it takes the corresponding standard script sentence as the text to be synthesized. If not, it fills the text carried by the instruction into a fill-in-the-blank script sentence and takes the result as the text to be synthesized.

In a possible implementation of the above service device, the synthesis unit includes:

a first synthesis subunit, configured to perform word segmentation on the determined text with a text analyzer to obtain a word label file corresponding to the text;

a second synthesis subunit, configured to determine the speech feature parameters corresponding to the text, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently handling the call;

a third synthesis subunit, configured to synthesize, according to the speech feature parameters, speech for the text that carries the agent's timbre characteristics.

In a possible implementation of the above service device, the second synthesis subunit is specifically configured to look up the speech parameter model corresponding to each word in the word label file. The lookup is made in the speech parameter model library built in advance from the timbre of the agent currently handling the call. Then, according to the model of each word, the subunit uses a parameter generation algorithm to determine three parameters for the text to be synthesized: LF0 (log-domain F0), BAP (the aperiodic component spectrum averaged over different frequency bands), and the 18-dimensional LSP (extracted frame by frame from the vocal tract spectrum).

In a possible implementation of the above service device, the third synthesis subunit is specifically configured to form, from the determined LF0 and BAP, a mixed excitation source corresponding to the text to be synthesized. It then feeds the mixed excitation source into a filter controlled by the determined LSP, synthesizing speech for the text that carries the agent's timbre characteristics.

In a possible implementation, the above service device further includes a modeling unit, configured to:

- decompose the original speech waveform files in the agent's speech database to obtain the fundamental frequency information, aperiodic component spectrum information and vocal tract spectrum information of each syllable;
- convert each syllable's fundamental frequency information to the log domain to obtain LF0;
- average each syllable's aperiodic component spectrum information within each preset frequency band to obtain BAP;
- extract 18-dimensional LSP frame by frame from each syllable's vocal tract spectrum information;
- build HMM-based speech parameter models from the LF0, BAP and LSP of each syllable, according to the word label file corresponding to the original waveform files;
- perform model clustering and model training on the built models to obtain the speech parameter model library with the agent's timbre.
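
For the 18-dimensional LSP step, one standard route runs in four stages:

1. window the frame;
2. compute its autocorrelation;
3. fit LPC coefficients with the Levinson-Durbin recursion;
4. locate the roots of the sum and difference polynomials P(z) and Q(z) on the unit circle.

The patent does not specify how its LSPs are computed, so treat this Python sketch as an assumption. The Hamming window, the grid-plus-bisection root search, and the function names are illustrative choices.

```python
import cmath
import math


def levinson(r, p):
    """Levinson-Durbin: autocorrelation r[0..p] -> [1, a1, ..., ap] of A(z)."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def _zero_phase(coeffs, w, use_imag):
    """Real-valued form of a (anti)symmetric polynomial on the unit circle."""
    half = (len(coeffs) - 1) / 2.0
    v = sum(c * cmath.exp(-1j * w * (k - half)) for k, c in enumerate(coeffs))
    return v.imag if use_imag else v.real


def _sign_change_roots(f, n_grid=4096, iters=40):
    """Roots of f on (0, pi): grid search for sign changes, then bisection."""
    ws = [math.pi * (i + 0.5) / n_grid for i in range(n_grid)]
    vals = [f(w) for w in ws]
    roots = []
    for i in range(n_grid - 1):
        if vals[i] * vals[i + 1] < 0:
            lo, hi, flo = ws[i], ws[i + 1], vals[i]
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                fm = f(mid)
                if fm * flo <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            roots.append(0.5 * (lo + hi))
    return roots


def lpc_to_lsf(a):
    """[1, a1, ..., ap] -> sorted LSFs in radians via P(z) and Q(z)."""
    p = len(a) - 1
    ext = list(a) + [0.0]
    P = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]  # symmetric
    Q = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]  # antisymmetric
    roots = _sign_change_roots(lambda w: _zero_phase(P, w, False))
    roots += _sign_change_roots(lambda w: _zero_phase(Q, w, True))
    return sorted(roots)


def extract_lsf(frame, order=18):
    """Hamming window -> autocorrelation -> LPC -> order-dimensional LSP."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    r = [sum(win[i] * win[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    return lpc_to_lsf(levinson(r, order))
```

A flat spectrum (A(z) = 1 at order 2) gives LSFs of π/3 and 2π/3, which is a quick check of the root search.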

The beneficial effects of the present invention are as follows:

The service method and device provided by the embodiments of the present invention include:

- receiving a speech synthesis instruction;
- determining the text to be synthesized according to the instruction;
- synthesizing speech that carries the agent's timbre, from that text and a speech parameter model library built in advance from the timbre of the agent currently handling the call;
- receiving an instruction from the agent and, accordingly, playing a sentence composed of the synthesized speech and/or the agent's own live speech.

Speech in the agent's timbre is synthesized from the agent-specific model library. On the agent's instruction, sentences made of that synthesized speech and/or the agent's live speech can be played to the user. This reduces the amount the agent must say during manual service and lowers agent fatigue, which improves service quality and the user experience. In addition, the speech played to the user has the agent's timbre, so it sounds natural and lively. The user does not perceive that a machine is doing much of the talking and assumes the agent has been speaking throughout, which further improves service quality and the user experience.

Brief Description of the Drawings

FIG. 1 is a flowchart of a service method for a customer service system according to an embodiment of the present invention;

FIG. 2 is a flowchart of synthesizing speech with the agent's timbre characteristics for the text to be synthesized, according to an embodiment of the present invention;

FIG. 3 is a flowchart of building the speech parameter model library with the agent's timbre, according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a service device for a customer service system according to an embodiment of the present invention;

FIG. 5 shows the framework of an HMM-based parametric speech synthesis system according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an agent being assisted by the service device of the customer service system, according to an embodiment of the present invention.

Detailed Description of the Embodiments

Specific implementations of the service method and device for a customer service system provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

A service method for a customer service system provided by an embodiment of the present invention, as shown in FIG. 1, specifically includes the following steps:

S101. Receive a speech synthesis instruction;

S102. Determine the text to be synthesized according to the received speech synthesis instruction;

S103. Synthesize speech for the text that carries the agent's timbre characteristics, according to the determined text and the speech parameter model library built in advance from the timbre of the agent currently handling the call;

S104. Receive an instruction from the agent and, according to the instruction, play a sentence composed of the synthesized speech and/or the agent's own live speech.

Specifically, in the above service method, speech in the agent's timbre is obtained for the text to be synthesized from the model library built in advance from the timbre of the agent currently handling the call. On the agent's instruction, sentences composed of the synthesized speech and/or the agent's live speech can be played to the user. This reduces how much the agent must say during manual service and lowers agent fatigue, which improves the service quality of the customer service system and the user experience. In addition, the played speech has the agent's timbre, so it sounds natural and lively. The user does not perceive that a machine is doing much of the talking and assumes the agent has been speaking throughout, which further improves service quality and the user experience.

In a specific implementation of the above service method, step S102 (determining the text to be synthesized according to the received speech synthesis instruction) may be carried out as follows:

determining whether the text to be synthesized that corresponds to the received speech synthesis instruction is a standard script sentence;

if so, taking the standard script sentence corresponding to the instruction as the text to be synthesized;

if not, filling the text carried by the instruction into a fill-in-the-blank script sentence and taking the result as the text to be synthesized.

Specifically, in the above service method, the standard script sentences in step S102 are basic phrases that agents use when serving users by telephone, for example "Glad to be of service" or "Please enter your ID card number". While a standard script sentence is being played to the user, playback can be stopped at any moment if either the user or the agent starts speaking. This keeps the interaction between agent and user natural and improves the user experience.
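
The interruptible playback described above can be sketched as chunked playback, with an energy check on both parties' audio before each chunk. This Python sketch makes these assumptions:

- 16 kHz audio in 20 ms chunks (320 samples);
- echo-free microphone signals;
- an arbitrary RMS threshold.

A production system would need echo cancellation and a proper voice activity detector.

```python
import math


def rms(chunk):
    """Root-mean-square level of a list of samples (0.0 for an empty chunk)."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def play_with_barge_in(prompt, user_mic, agent_mic, chunk=320, threshold=0.05):
    """Play `prompt` chunk by chunk, stopping as soon as the user or the
    agent starts speaking. Returns how many prompt samples were played."""
    played = 0
    for start in range(0, len(prompt), chunk):
        if (rms(user_mic[start:start + chunk]) > threshold
                or rms(agent_mic[start:start + chunk]) > threshold):
            break  # someone is talking: stop the synthesized prompt
        # A real system would send this chunk to the call here.
        played += len(prompt[start:start + chunk])
    return played
```

For example, if the agent starts speaking 40 ms into a 100 ms prompt, only the first 640 samples are played.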

Specifically, in the above service method, the fill-in-the-blank script sentences in step S102 are sentences assembled according to the user's actual spending or data usage. For example, in "Your current balance is XX yuan", XX is data from the billing system. It is filled into the fixed sentence pattern, and the result is synthesized online by personalized speech synthesis. Fill-in-the-blank sentences can also be implemented in other ways. Again taking "Your current balance is XX yuan" as an example, only the fixed part of the sentence (without the amount) may be synthesized, while the agent speaks the balance "XX" personally. This is not limited here.
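
Both options for a fill-in-the-blank sentence can be expressed as a list of playback segments. The Python sketch below assumes `{name}`-style slots and a hypothetical `agent_speaks` set naming the slots the agent will say personally. It either fills every slot and synthesizes the whole sentence, or splits the sentence around the agent's slots.

```python
import string


def build_segments(template, values, agent_speaks=frozenset()):
    """Turn a '{slot}' template into ("tts", text) and ("live", slot) segments.

    Slots in `agent_speaks` are left for the agent to say live; every other
    slot is filled from `values` and synthesized with the fixed text.
    Format specs and conversions in the template are ignored."""
    segments = []
    buf = ""
    for literal, name, _spec, _conv in string.Formatter().parse(template):
        buf += literal
        if name is None:
            continue
        if name in agent_speaks:
            if buf:
                segments.append(("tts", buf))
            segments.append(("live", name))
            buf = ""
        else:
            buf += str(values[name])
    if buf:
        segments.append(("tts", buf))
    return segments
```

With the amount left to the agent, the sentence becomes synthesized text, then the agent's live slot, then synthesized text again.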

In a specific implementation of the above service method, step S103 synthesizes speech with the agent's timbre characteristics for the determined text, using the speech parameter model library built in advance from the timbre of the agent currently handling the call. As shown in FIG. 2, it may include the following steps:

S201. Perform word segmentation on the determined text with a text analyzer to obtain a word label file corresponding to the text;

S202. Determine the speech feature parameters corresponding to the text, according to the word label file and the speech parameter model library built in advance from the timbre of the agent currently handling the call;

S203: From the speech feature parameters, synthesize speech of the text to be synthesized that has the agent's timbre characteristics.

Specifically, take the text to be synthesized "很高兴为您服务" ("Glad to serve you") as an example.

- The text analyzer yields the units "很", "高", "兴", "为", "您", "服", "务" and their corresponding label files.
- Using the label files, the speech feature parameters for each of these units are then looked up in the speech parameter model library built in advance from the timbre of the agent currently answering the call.
- Finally, from the parameters found, the sentence "Glad to serve you" is synthesized with the agent's timbre characteristics.
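Steps S201 to S203 and the example above can be sketched as a toy pipeline. The sketch rests on three assumptions:

- segmentation is character-level, matching the example;
- the model library is a plain dictionary from unit to parameters;
- the caller supplies the vocoder.

`UnitParams`, `analyze_text`, `lookup_params`, and `synthesize` are hypothetical names, and the parameter values are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UnitParams:
    """Hypothetical per-unit entry in the agent's speech parameter model library."""
    lf0: float          # log-domain fundamental frequency
    bap: list[float]    # 5 band-averaged aperiodicity values
    lsp: list[float]    # 18 line spectral pair parameters


def analyze_text(text: str) -> list[str]:
    """S201 stand-in: character-level segmentation, one label per character."""
    return [ch for ch in text if not ch.isspace()]


def lookup_params(labels: list[str], library: dict[str, UnitParams]) -> list[UnitParams]:
    """S202 stand-in: look up each label; raises KeyError for units missing from the library."""
    return [library[label] for label in labels]


def synthesize(params: list[UnitParams], vocoder: Callable[[list[UnitParams]], bytes]) -> bytes:
    """S203 stand-in: hand the parameter sequence to a vocoder (excitation + filter)."""
    return vocoder(params)


text = "很高兴为您服务"
library = {ch: UnitParams(lf0=5.3, bap=[0.1] * 5, lsp=[0.1 * (i + 1) for i in range(18)]) for ch in text}
labels = analyze_text(text)
audio = synthesize(lookup_params(labels, library), vocoder=lambda ps: bytes(len(ps)))
print(labels)      # ['很', '高', '兴', '为', '您', '服', '务']
print(len(audio))  # 7
```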

In a specific implementation, step S202 may be implemented as follows. Step S202 determines the speech feature parameters corresponding to the text to be synthesized from the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call.

Look up, in the speech parameter model library built in advance from the timbre of the agent currently answering the call, the speech parameter model corresponding to each word in the word label file;

From the speech parameter models of the words, determine the following parameters for the text to be synthesized with a parameter generation algorithm:

- LF0, obtained by converting the fundamental frequency information to the log domain;
- BAP, the average of the aperiodic component spectrum information over different frequency bands;
- LSP, the 18-dimensional line spectral pair parameters extracted frame by frame from the vocal tract spectrum information.
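The lookup-then-generate step can be illustrated with a deliberately simplified parameter generator. It assumes each word model is a sequence of HMM states. Each state holds a duration in frames and a 24-dimensional mean vector laid out as [LF0, 5 BAP values, 18 LSP values]. `HmmState` and `generate_parameters` are hypothetical names.

HMM-based synthesizers commonly use maximum-likelihood parameter generation with dynamic (delta) features to obtain smooth trajectories. This sketch omits that step and simply repeats each state mean for its duration.

```python
from dataclasses import dataclass

import numpy as np

N_BAP, N_LSP = 5, 18
DIM = 1 + N_BAP + N_LSP  # [LF0 | BAP x 5 | LSP x 18] = 24 values per frame


@dataclass
class HmmState:
    """Hypothetical HMM state: mean observation vector and duration in frames."""
    mean: np.ndarray  # shape (DIM,)
    duration: int


def generate_parameters(word_models: list[list[HmmState]]):
    """Concatenate the state sequences of all words and emit each state's mean
    for its duration (no delta features, no smoothing)."""
    frames = [np.tile(state.mean, (state.duration, 1))
              for model in word_models for state in model]
    trajectory = np.vstack(frames)            # shape (n_frames, DIM)
    lf0 = trajectory[:, 0]                    # (n_frames,)
    bap = trajectory[:, 1:1 + N_BAP]          # (n_frames, 5)
    lsp = trajectory[:, 1 + N_BAP:]           # (n_frames, 18)
    return lf0, bap, lsp


word = [HmmState(np.full(DIM, 0.2), 3), HmmState(np.full(DIM, 0.4), 2)]
lf0, bap, lsp = generate_parameters([word, word])
print(lf0.shape, bap.shape, lsp.shape)  # (10,) (10, 5) (10, 18)
```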

Specifically, to improve the quality of the synthesized speech, the BAP in step S202 may be obtained by averaging the aperiodic component spectrum Ap over 5 frequency bands. The bands may be, for example, 0-1000 Hz, 1000-2000 Hz, 2000-4000 Hz, 4000-6000 Hz, and 6000-8000 Hz; they are not limited to these.
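The band averaging in S202 and S303 might look like the following sketch. It rests on three assumptions:

- the aperiodicity spectrum is sampled on a linear frequency grid from 0 Hz to Nyquist;
- the sample rate is 16 kHz, so that the top band ends at 8000 Hz;
- values are averaged as given.

`band_aperiodicity` is a hypothetical helper, not part of STRAIGHT or any named toolkit.

```python
import numpy as np

# Band edges from the description; assumes 16 kHz audio so the top band ends at Nyquist.
BANDS_HZ = [(0, 1000), (1000, 2000), (2000, 4000), (4000, 6000), (6000, 8000)]


def band_aperiodicity(ap: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Average an aperiodicity spectrum over the five bands.

    ap: shape (n_frames, n_bins), bins spaced linearly from 0 Hz to sample_rate / 2.
    Returns BAP with shape (n_frames, 5).
    """
    freqs = np.linspace(0.0, sample_rate / 2, ap.shape[1])
    bap = np.empty((ap.shape[0], len(BANDS_HZ)))
    for i, (lo, hi) in enumerate(BANDS_HZ):
        last = i == len(BANDS_HZ) - 1
        # half-open bands [lo, hi); the last band also includes the Nyquist bin
        in_band = (freqs >= lo) & ((freqs <= hi) if last else (freqs < hi))
        bap[:, i] = ap[:, in_band].mean(axis=1)
    return bap
```

For a 1024-point FFT analysis the input has 513 bins per frame, and the output always has five columns regardless of the FFT size.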

In a specific implementation, step S203 may be implemented as follows. Step S203 synthesizes, from the speech feature parameters, speech of the text to be synthesized with the agent's timbre characteristics.

Form a mixed excitation source for the text to be synthesized from the determined LF0 and BAP;

Feed the mixed excitation source into a filter controlled by the determined LSP, thereby synthesizing speech of the text to be synthesized with the agent's timbre characteristics.
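A minimal sketch of the excitation-plus-filter step follows. It makes several assumptions not stated in the patent:

- audio is 16 kHz with 80-sample (5 ms) frames;
- unvoiced frames are marked with LF0 = -inf;
- BAP is a linear aperiodicity ratio in [0, 1];
- each frame uses a single noise share (the mean of its five BAP values) rather than true band-wise mixing.

The LSP-controlled filter is realized as an all-pole filter 1/A(z), with coefficients recomputed from each frame's LSPs. All function names are hypothetical.

```python
import numpy as np

FS = 16000        # assumed sample rate
FRAME_LEN = 80    # assumed 5 ms frame shift at 16 kHz


def lsp_to_lpc(lsp: np.ndarray) -> np.ndarray:
    """Convert sorted LSP frequencies (radians in (0, pi), even order p) to
    LPC coefficients a[0..p] of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    p_poly = np.array([1.0, 1.0])    # P(z) has the trivial root z = -1
    q_poly = np.array([1.0, -1.0])   # Q(z) has the trivial root z = +1
    for w in lsp[0::2]:              # odd-numbered LSPs (w1, w3, ...) are roots of P
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsp[1::2]:              # even-numbered LSPs (w2, w4, ...) are roots of Q
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    return (0.5 * (p_poly + q_poly))[:-1]  # A(z) = (P + Q) / 2; top coefficient cancels


def mixed_excitation(lf0, bap, seed: int = 0) -> np.ndarray:
    """Pulse train at F0 mixed with white noise; the noise share is the mean BAP,
    assumed to be a linear aperiodicity ratio in [0, 1]. Unvoiced frames
    (lf0 = -inf, an assumed convention) get noise only."""
    rng = np.random.default_rng(seed)
    frames, next_pulse = [], 0.0
    for lf0_t, bap_t in zip(lf0, bap):
        noise = rng.standard_normal(FRAME_LEN)
        if np.isfinite(lf0_t):
            period = FS / np.exp(lf0_t)            # pitch period in samples
            pulses = np.zeros(FRAME_LEN)
            while next_pulse < FRAME_LEN:
                pulses[int(next_pulse)] = np.sqrt(period)  # roughly unit average power
                next_pulse += period
            next_pulse -= FRAME_LEN                # carry pulse phase into the next frame
            ratio = float(np.clip(np.mean(bap_t), 0.0, 1.0))
            frames.append(np.sqrt(1.0 - ratio) * pulses + np.sqrt(ratio) * noise)
        else:
            frames.append(noise)
            next_pulse = 0.0
    return np.concatenate(frames)


def lsp_filter(excitation: np.ndarray, lsp_frames: np.ndarray) -> np.ndarray:
    """All-pole synthesis filter 1/A(z), with A(z) updated every frame from the LSPs."""
    order = lsp_frames.shape[1]
    out = np.zeros(len(excitation))
    history = np.zeros(order)                      # history[k] = y[n - 1 - k]
    for t, lsp in enumerate(lsp_frames):
        a = lsp_to_lpc(lsp)
        for n in range(t * FRAME_LEN, min((t + 1) * FRAME_LEN, len(excitation))):
            y = excitation[n] - np.dot(a[1:], history)
            history = np.roll(history, 1)
            history[0] = y
            out[n] = y
    return out
```

Sorted, interlaced LSPs in (0, π) always yield a minimum-phase A(z). The all-pole filter therefore stays stable as long as each frame's LSP vector is strictly increasing within that interval.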

In a specific implementation, the above service method may further include building the speech parameter model library with the agent's timbre as follows, as shown in FIG. 3:

S301: Decompose the original speech waveform files in the agent's speech database to obtain the following information for each syllable in the files: fundamental frequency information, aperiodic component spectrum information, and vocal tract spectrum information;

S302: Convert the fundamental frequency information of each syllable to the log domain to obtain LF0;

S303: Average the aperiodic component spectrum information of each syllable over each preset frequency band to obtain BAP. The preset bands may be 0-1000 Hz, 1000-2000 Hz, 2000-4000 Hz, 4000-6000 Hz, and 6000-8000 Hz, without limitation;

S304: Extract the 18-dimensional line spectral pair parameters LSP frame by frame from the vocal tract spectrum information of each syllable;

S305: Using the word label files corresponding to the original speech waveform files, build hidden Markov model (HMM) speech parameter models from the LF0, BAP, and LSP determined for each syllable;

S306: Perform model clustering and model training on the built speech parameter models to obtain the speech parameter model library with the agent's timbre.
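Steps S302 and S304 can be sketched per frame as below. S301 (STRAIGHT decomposition) and S305 to S306 (HMM modeling, clustering, and training) are outside its scope. The sketch assumes unvoiced frames carry F0 = 0. It obtains the 18-dimensional LSPs in three stages:

1. Fit an 18th-order all-pole model to the vocal tract power spectrum, with the autocorrelation computed by inverse FFT.
2. Solve for the coefficients with the Levinson-Durbin recursion.
3. Take the root angles of the symmetric and antisymmetric polynomials P(z) and Q(z).

This is one standard route from a spectrum to LSPs, not necessarily the one used in the patent. The function names are hypothetical.

```python
import numpy as np


def f0_to_lf0(f0) -> np.ndarray:
    """S302: convert F0 in Hz to the log domain; unvoiced frames (F0 == 0) become -inf
    (a common convention, assumed here)."""
    f0 = np.asarray(f0, dtype=float)
    lf0 = np.full_like(f0, -np.inf)
    voiced = f0 > 0
    lf0[voiced] = np.log(f0[voiced])
    return lf0


def spectrum_to_lpc(power_spec: np.ndarray, order: int = 18) -> np.ndarray:
    """Fit an all-pole model A(z) to one frame of a one-sided power spectrum
    (n_fft // 2 + 1 bins) via autocorrelation and the Levinson-Durbin recursion."""
    r = np.fft.irfft(power_spec)[: order + 1]   # autocorrelation = inverse FFT of power
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsp(a: np.ndarray) -> np.ndarray:
    """S304: LSP frequencies (radians) are the angles of the unit-circle roots of
    P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z)."""
    a_ext = np.r_[a, 0.0]
    p_poly = a_ext + a_ext[::-1]
    q_poly = a_ext - a_ext[::-1]
    angles = np.angle(np.r_[np.roots(p_poly), np.roots(q_poly)])
    eps = 1e-6  # drop trivial roots at z = +1 and z = -1 and negative-angle conjugates
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

A flat spectrum gives A(z) = 1, whose LSPs are evenly spaced at kπ/19 for k = 1..18. This makes a convenient sanity check.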

Note that steps S302 to S304 may be performed in any order and are not limited to the order described above.

Based on the same inventive concept, an embodiment of the present invention further provides a service device for a customer service system. The device solves the problem on the same principle as the service method above. Its implementation can therefore refer to that of the method, and repeated details are omitted.

As shown in FIG. 4, the service device of the customer service system provided by the embodiment of the present invention may include:

a receiving unit 401, configured to receive a speech synthesis instruction;

a determining unit 402, configured to determine the text to be synthesized according to the received speech synthesis instruction;

a synthesis unit 403, configured to synthesize speech of the determined text with the agent's timbre characteristics, using the speech parameter model library built in advance from the timbre of the agent currently answering the call;

a playing unit 404, configured to receive an instruction from the agent and, according to that instruction, play a sentence composed of the synthesized speech and/or the agent's own live speech.
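One way to wire units 401 to 404 together is sketched below, with each unit's behaviour injected as a function. `SynthesisInstruction`, `ServiceDevice`, and the `"play"` command are hypothetical. The sketch shows only the release mechanism:

- synthesized speech is prepared when an instruction arrives;
- it is released to the call only when the agent asks for it;
- the agent's live speech goes out on the call directly and is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SynthesisInstruction:
    """Hypothetical instruction format: which script to use and, for
    fill-in-the-blank scripts, the text that fills the blank."""
    script_id: str
    fill_text: Optional[str] = None


@dataclass
class ServiceDevice:
    determine_text: Callable[[SynthesisInstruction], str]  # determining unit 402
    synthesize: Callable[[str], bytes]                     # synthesis unit 403 (agent's model library)
    output: Callable[[bytes], None]                        # audio sink used by playing unit 404
    queue: list = field(default_factory=list)

    def receive(self, instruction: SynthesisInstruction) -> None:
        """Receiving unit 401: accept an instruction and prepare agent-timbre speech."""
        self.queue.append(self.synthesize(self.determine_text(instruction)))

    def on_agent_command(self, command: str) -> None:
        """Playing unit 404: play the next prepared utterance only when the agent says so."""
        if command == "play" and self.queue:
            self.output(self.queue.pop(0))


played = []
device = ServiceDevice(
    determine_text=lambda ins: "Hello, glad to serve you",
    synthesize=lambda text: text.encode("utf-8"),
    output=played.append,
)
device.receive(SynthesisInstruction("greeting"))
device.on_agent_command("play")
print(played)  # [b'Hello, glad to serve you']
```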

In a specific implementation, the determining unit 402 may be configured to determine whether the text to be synthesized corresponding to the received speech synthesis instruction is a standard script sentence. If so, it takes the standard script sentence corresponding to the instruction as the text to be synthesized. If not, it completes the fill-in-the-blank script sentence with the text carried by the instruction and takes the result as the text to be synthesized.
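The decision made by the determining unit 402 can be sketched as a lookup against two script tables. The table contents are taken from the examples in this description. The script IDs and the function name are hypothetical, and the blank is assumed to be a single `{}` format field.

```python
from typing import Optional

# Hypothetical script tables; a deployment would load these from its script database.
STANDARD_SCRIPTS = {
    "greeting": "Hello, glad to serve you",
    "ask_id": "Please enter your ID card number",
}
FILL_IN_SCRIPTS = {
    "balance": "Your current call-charge balance is {} yuan",
}


def determine_text(script_id: str, fill_text: Optional[str] = None) -> str:
    """Determining unit 402: standard scripts are used as-is; otherwise the
    fill-in-the-blank script is completed with the text carried by the instruction."""
    if script_id in STANDARD_SCRIPTS:
        return STANDARD_SCRIPTS[script_id]
    if script_id not in FILL_IN_SCRIPTS:
        raise KeyError(f"unknown script: {script_id}")
    if fill_text is None:
        raise ValueError(f"script {script_id!r} needs fill-in text")
    return FILL_IN_SCRIPTS[script_id].format(fill_text)


print(determine_text("greeting"))          # Hello, glad to serve you
print(determine_text("balance", "35.20"))  # Your current call-charge balance is 35.20 yuan
```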

In a specific implementation, the synthesis unit 403 may include:

a first synthesis subunit 4031, configured to perform word segmentation on the determined text to be synthesized with a text analyzer to obtain a corresponding word label file;

a second synthesis subunit 4032, configured to determine the speech feature parameters corresponding to the text to be synthesized from the word label file and the speech parameter model library built in advance from the timbre of the agent currently answering the call;

a third synthesis subunit 4033, configured to synthesize, from the speech feature parameters, speech of the text to be synthesized with the agent's timbre characteristics.

In a specific implementation, the second synthesis subunit 4032 may be configured to work in two steps. First, it looks up, in the speech parameter model library built in advance from the timbre of the agent currently answering the call, the speech parameter model corresponding to each word in the word label file. Then, from these models, it determines the following parameters for the text to be synthesized with a parameter generation algorithm:

- LF0, obtained by converting the fundamental frequency information to the log domain;
- BAP, the average of the aperiodic component spectrum information over different frequency bands;
- LSP, the 18-dimensional line spectral pair parameters extracted frame by frame from the vocal tract spectrum information.

In a specific implementation, the third synthesis subunit 4033 may be configured to form a mixed excitation source for the text to be synthesized from the determined LF0 and BAP. It then feeds the excitation source into a filter controlled by the determined LSP, synthesizing speech of the text with the agent's timbre characteristics.

In a specific implementation, the above service device may further include a modeling unit 405, configured to:

- decompose the original speech waveform files in the agent's speech database to obtain the fundamental frequency information, aperiodic component spectrum information, and vocal tract spectrum information of each syllable;
- convert the fundamental frequency information of each syllable to the log domain to obtain LF0;
- average the aperiodic component spectrum information of each syllable over each preset frequency band to obtain BAP;
- extract the 18-dimensional line spectral pair parameters LSP frame by frame from the vocal tract spectrum information of each syllable;
- using the word label files corresponding to the original speech waveform files, build HMM speech parameter models from the LF0, BAP, and LSP determined for each syllable;
- perform model clustering and model training on the built models to obtain the speech parameter model library with the agent's timbre.

To aid understanding of the technical solution, the present invention provides a specific embodiment of two parts of the above service method: building the speech parameter model library with the agent's timbre, and synthesizing speech of the text with the agent's timbre characteristics. This embodiment is the framework of an HMM-based parametric speech synthesis system shown in FIG. 5:

Part A of FIG. 5 shows a specific embodiment of building the speech parameter model library with the agent's timbre. The speech database of the target agent contains original speech waveform files in WAV format and their corresponding label files.

Each original waveform file is decomposed into source information and vocal tract information by adaptive weighted spectral interpolation, i.e., STRAIGHT analysis. The source information comprises the fundamental frequency F0 and the aperiodic component spectrum AP; the vocal tract information is the vocal tract spectrum SP.

These are then processed further:

- F0 is converted to the log domain to obtain LF0.
- AP is averaged over 5 frequency bands (0-1000 Hz, 1000-2000 Hz, 2000-4000 Hz, 4000-6000 Hz, and 6000-8000 Hz) to obtain BAP.
- 18-dimensional LSP parameters are extracted frame by frame from SP.

Finally, the combined LF0, BAP, and LSP parameters are modeled with hidden Markov models using the label files. The resulting speech parameter models then undergo model clustering and model training. After about three such iterations, the speech parameter models of the target agent are obtained.

Part B of FIG. 5 shows a specific embodiment of synthesizing speech of the text with the agent's timbre characteristics.

1. The text to be synthesized is passed through the text analyzer to obtain the label file required for synthesis.
2. Using the target agent's speech parameter model library built in Part A of FIG. 5, the speech feature parameters LF0, BAP, and LSP corresponding to the text are obtained.
3. LF0 and BAP form a mixed excitation source for the text. The excitation source is fed into a filter controlled by the LSP, producing speech of the text with the agent's timbre characteristics.

In addition, the present invention provides a specific embodiment in which a customer service agent delivers voice service using the above service method and device, as shown in FIG. 6:

After the agent connects a user's call, standard script sentences and fill-in-the-blank script sentences can be used as texts to be synthesized. The above service device synthesizes them into speech with the agent's timbre and plays it to the user. Examples:

- The standard script sentence "Hello, glad to serve you" is synthesized by the device in the agent's voice and played to the user.
- When the user needs to set up or change a service, the device can generate the standard script sentence "Please enter your ID card number" in the agent's voice and play it to the user. To ensure a good user experience, voice playback can be stopped at any time.
- When the user asks about the call-charge balance, the balance data XX for the calling user is taken from the billing system. It is filled into the fixed pattern "Your current call-charge balance is ___ yuan", and the device synthesizes and outputs the completed sentence "Your current call-charge balance is XX yuan".

The agent therefore only needs to speak to the user when the reply must be adapted on the spot to how the user is communicating. This is typically basic conversational phrases such as "OK" or "Here is the situation". In both the standard and fill-in-the-blank cases above, speech with the agent's own timbre is played to the customer. The user still perceives the agent as the one speaking, which gives a better experience.
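The requirement that voice playback can be stopped at any time suggests playing synthesized audio in short chunks and checking a stop flag between them. Below is a minimal sketch, assuming 16-bit mono PCM at 16 kHz and a caller-supplied `write_chunk` function that sends audio to the call. `play_interruptible` is hypothetical.

```python
import threading
from typing import Callable


def play_interruptible(audio: bytes, write_chunk: Callable[[bytes], None],
                       stop: threading.Event, chunk_size: int = 320) -> int:
    """Send audio to the call in small chunks, checking the stop flag between chunks.

    320 bytes is 10 ms of 16-bit mono audio at 16 kHz (an assumed format).
    Returns the number of bytes actually played.
    """
    played = 0
    for start in range(0, len(audio), chunk_size):
        if stop.is_set():          # playback was asked to stop
            break
        chunk = audio[start:start + chunk_size]
        write_chunk(chunk)
        played += len(chunk)
    return played
```

In a live system, `write_chunk` would typically block for about the chunk's duration, while another thread (for example, the agent's console handler) calls `stop.set()`. Interruption granularity is then one chunk, about 10 ms here.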

The service method and device of the customer service system provided by the embodiments of the present invention comprise:

- receiving a speech synthesis instruction;
- determining the text to be synthesized according to the received instruction;
- synthesizing speech of the determined text with the agent's timbre characteristics, using a speech parameter model library built in advance from the timbre of the agent currently answering the call;
- receiving an instruction from the agent and, according to it, playing a sentence composed of the synthesized speech and/or the agent's live speech.

The model library is built from the timbre of the agent currently answering the call, so the synthesized speech carries that agent's timbre characteristics. Sentences composed of synthesized and/or live speech can be played to the user on the agent's instruction. The agent therefore has to speak less during manual service, which lowers agent fatigue and stress and thereby improves the service quality of the customer service system and the user experience.

Moreover, the speech played to the user carries the agent's timbre and sounds natural. The user does not perceive substantial machine involvement in the interaction and assumes the agent has been speaking throughout, which further improves service quality and user experience.

In addition, personalized speech synthesis is a technique that synthesizes a target speaker's voice by building a model of that speaker's speech characteristics. It proceeds in four stages:

1. Recordings with adequate phoneme coverage are collected.
2. Speaker-specific speech features are extracted and a feature model of the target speaker is built.
3. For any input sentence, the model generates the speech parameters of the text.
4. A vocoder synthesizes the text as speech with the target speaker's characteristics.

Current speech synthesis technologies are mainly waveform concatenation synthesis and parametric synthesis.

However, speech synthesis is currently used in the customer service field only for voice announcements and is not widely used in other customer service applications. The service method and device provided by the embodiments of the present invention open a new application scenario for speech synthesis in customer service: personalized speech synthesis is used during inbound and outbound customer service calls. This greatly reduces agent workload and improves the service quality and user experience of the customer service system, giving the approach broad application prospects.

Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them.

Claims (10)

1. A service method of a customer service system, comprising:
receiving a speech synthesis instruction;
determining a text to be synthesized according to the received speech synthesis instruction;
synthesizing, according to the determined text to be synthesized and a speech parameter model library established in advance according to the timbre of a customer service agent currently answering the call, speech of the text to be synthesized having the timbre characteristics of the customer service agent;
receiving an instruction of the customer service agent, and playing, according to the instruction, a sentence composed of the synthesized speech and/or live speech of the customer service agent;
wherein determining the text to be synthesized according to the received speech synthesis instruction specifically comprises:
determining whether the text to be synthesized corresponding to the received speech synthesis instruction is a standard script sentence;
if so, determining the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized;
and if not, taking a fill-in-the-blank script sentence, filled in with the text carried by the speech synthesis instruction, as the text to be synthesized.
2. The service method according to claim 1, wherein synthesizing the speech of the text to be synthesized having the timbre characteristics of the customer service agent, according to the determined text to be synthesized and the speech parameter model library established in advance according to the timbre of the customer service agent currently answering the call, specifically comprises:
performing word segmentation on the determined text to be synthesized with a text analyzer to obtain a word label file corresponding to the text to be synthesized;
determining speech feature parameters corresponding to the text to be synthesized according to the word label file and the speech parameter model library established in advance according to the timbre of the customer service agent currently answering the call;
and synthesizing, according to the speech feature parameters, the speech of the text to be synthesized having the timbre characteristics of the customer service agent.
3. The service method according to claim 2, wherein determining the speech feature parameters corresponding to the text to be synthesized according to the word label file and the speech parameter model library established in advance according to the timbre of the customer service agent currently answering the call specifically comprises:
searching, in the speech parameter model library established in advance according to the timbre of the customer service agent currently answering the call, for a speech parameter model corresponding to each word in the word label file;
and determining, through a parameter generation algorithm and according to the speech parameter model corresponding to each word: LF0, obtained by converting the fundamental frequency information corresponding to the text to be synthesized into the log domain; BAP, the average values of the aperiodic component spectrum information over different frequency bands; and the 18-dimensional line spectral pair parameters LSP extracted per frame from the vocal tract spectrum information.
4. The service method according to claim 3, wherein synthesizing, according to the speech feature parameters, the speech of the text to be synthesized having the timbre characteristics of the customer service agent specifically comprises:
forming a mixed excitation source corresponding to the text to be synthesized from the determined LF0 and BAP;
inputting the determined mixed excitation source into a filter, controlling the filter with the determined LSP, and synthesizing the speech of the text to be synthesized having the timbre characteristics of the customer service agent.
5. The service method according to any one of claims 1-4, further comprising establishing the speech parameter model library having the timbre of the customer service agent by:
decomposing original speech waveform files contained in a speech database of the customer service agent to obtain fundamental frequency information, aperiodic component spectrum information, and vocal tract spectrum information of each syllable in the original speech waveform files;
converting the fundamental frequency information of each syllable into the log domain to obtain LF0;
averaging the aperiodic component spectrum information of each syllable over each preset frequency band to obtain BAP;
extracting the 18-dimensional line spectral pair parameters LSP per frame from the vocal tract spectrum information of each syllable;
establishing, according to the word label files corresponding to the original speech waveform files, speech parameter models based on hidden Markov models for the LF0, BAP, and LSP determined for each syllable;
and performing model clustering and model training on each established speech parameter model to obtain the speech parameter model library having the timbre of the customer service agent.
6. A service device of a customer service system, comprising:
a receiving unit, configured to receive a speech synthesis instruction;
a determining unit, configured to determine a text to be synthesized according to the received speech synthesis instruction;
a synthesis unit, configured to synthesize, according to the determined text to be synthesized and a speech parameter model library established in advance according to the timbre of a customer service agent currently answering the call, speech of the text to be synthesized having the timbre characteristics of the customer service agent;
a playing unit, configured to receive an instruction of the customer service agent and play, according to the instruction, a sentence composed of the synthesized speech and/or live speech of the customer service agent;
wherein the determining unit is specifically configured to determine whether the text to be synthesized corresponding to the received speech synthesis instruction is a standard script sentence; if so, to determine the standard script sentence corresponding to the speech synthesis instruction as the text to be synthesized; and if not, to take a fill-in-the-blank script sentence, filled in with the text carried by the speech synthesis instruction, as the text to be synthesized.
7. The service device according to claim 6, wherein the synthesis unit comprises:
a first synthesis subunit, configured to perform word segmentation on the determined text to be synthesized with a text analyzer to obtain a word label file corresponding to the text to be synthesized;
a second synthesis subunit, configured to determine speech feature parameters corresponding to the text to be synthesized according to the word label file and the speech parameter model library established in advance according to the timbre of the customer service agent currently answering the call;
and a third synthesis subunit, configured to synthesize, according to the speech feature parameters, the speech of the text to be synthesized having the timbre characteristics of the customer service agent.
8. The service device according to claim 7, wherein the second synthesis subunit is specifically configured to search, in the speech parameter model library established in advance according to the timbre of the customer service agent currently answering the call, for a speech parameter model corresponding to each word in the word label file; and to determine, through a parameter generation algorithm and according to the speech parameter model corresponding to each word: LF0, obtained by converting the fundamental frequency information corresponding to the text to be synthesized into the log domain; BAP, the average values of the aperiodic component spectrum information over different frequency bands; and the 18-dimensional line spectral pair parameters LSP extracted per frame from the vocal tract spectrum information.
9. The service device according to claim 8, wherein the third synthesis subunit is specifically configured to form a mixed excitation source corresponding to the speech text to be synthesized from the determined LF0 and BAP; and to input the mixed excitation source into a filter controlled by the determined LSP, thereby synthesizing speech of the speech text to be synthesized having the timbre characteristics of the customer service agent.
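A simplified sketch of this source-filter step follows. It assumes 16 kHz audio, a 5 ms frame shift (80 samples), LSPs given as line spectral frequencies in radians, and LF0 marked `None` on unvoiced frames. To stay short it collapses BAP to one aperiodicity ratio in [0, 1] per frame instead of mixing pulses and noise band by band, so it shows the structure (pulse train plus noise, filtered by an all-pole 1/A(z) whose coefficients come from the LSPs) rather than the patent's actual filter. The function names are hypothetical.

```python
from __future__ import annotations
import math
import random


def poly_mul(a: list[float], b: list[float]) -> list[float]:
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsf: list[float]) -> list[float]:
    """Even-order LSFs (radians, ascending) -> A(z) coefficients [1, a1, ..., ap]."""
    p_poly, q_poly = [1.0, 1.0], [1.0, -1.0]   # (1 + z^-1), (1 - z^-1)
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:                          # 1st, 3rd, ... LSF belong to P(z)
            p_poly = poly_mul(p_poly, factor)
        else:                                   # 2nd, 4th, ... LSF belong to Q(z)
            q_poly = poly_mul(q_poly, factor)
    # A(z) = (P(z) + Q(z)) / 2; the z^-(p+1) coefficients cancel and are dropped.
    return [(x + y) / 2 for x, y in zip(p_poly, q_poly)][: len(lsf) + 1]


def synthesize(lf0, bap, lsf, fs=16000, frame_shift=80, seed=0):
    """Mixed excitation (pulses + noise) through a per-frame all-pole filter 1/A(z).

    lf0[t]: log F0 or None (unvoiced); bap[t]: aperiodicity ratio in [0, 1],
    simplified to one scalar per frame; lsf[t]: LSFs in radians.
    """
    rng = random.Random(seed)
    hist = [0.0] * len(lsf[0])                  # hist[k-1] holds y[n-k]
    out, phase = [], 0.0
    for t in range(len(lf0)):
        a = lsp_to_lpc(lsf[t])
        f0 = math.exp(lf0[t]) if lf0[t] is not None else 0.0
        ap = bap[t] if f0 > 0 else 1.0          # unvoiced frames: noise only
        for _ in range(frame_shift):
            pulse = 0.0
            if f0 > 0:
                phase += f0 / fs
                if phase >= 1.0:                # one pitch period has elapsed
                    phase -= 1.0
                    pulse = math.sqrt(fs / f0)  # scales the pulse train to unit power
            e = math.sqrt(1.0 - ap) * pulse + math.sqrt(ap) * rng.gauss(0.0, 1.0)
            y = e - sum(a[k] * hist[k - 1] for k in range(1, len(a)))
            hist = [y] + hist[:-1]
            out.append(y)
    return out
```

With evenly spaced LSFs (iπ/19 for i = 1..18), A(z) reduces to 1 and the filter passes the excitation through unchanged, which makes a convenient sanity check for the conversion.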
10. The service apparatus according to any one of claims 6 to 9, further comprising a modeling unit, configured to: decompose the original speech waveform files contained in a speech database of the customer service agent to obtain the fundamental frequency information, aperiodic component spectrum information, and vocal tract spectrum information of each syllable in the original speech waveform files; convert the fundamental frequency information of each syllable into the log domain to obtain LF0; average the aperiodic component spectrum information of each syllable within each preset frequency band to obtain BAP; extract 18-dimensional line spectral pair parameters (LSP) per frame from the vocal tract spectrum information of each syllable; build, using hidden Markov models and according to the word annotation file corresponding to the original speech waveform files, speech parameter models for the LF0, BAP, and LSP determined for each syllable; and perform model clustering and model training on the established speech parameter models to obtain a speech parameter model library having the timbre of the customer service agent.
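Three of the analysis steps above can be sketched without the HMM training: converting F0 to LF0, averaging an aperiodicity spectrum within preset bands to obtain BAP, and extracting LSPs. The sketch assumes the waveform decomposition (for example by a STRAIGHT-style vocoder) has already produced per-frame F0 values, aperiodicity spectra, and 18th-order LPC coefficients for the vocal tract spectrum; the decomposition itself and the HMM modeling, clustering, and training are not shown. The band edges, the per-frame (rather than per-syllable) granularity, the function names, and the grid-plus-bisection root search are illustrative assumptions.

```python
from __future__ import annotations
import math


def to_lf0(f0_track: list[float]) -> list[float | None]:
    """Log-domain F0; unvoiced frames (F0 == 0) become None."""
    return [math.log(f) if f > 0 else None for f in f0_track]


def band_average(ap_spectrum: list[float], sample_rate: float,
                 band_edges_hz: list[float]) -> list[float]:
    """Average one frame's aperiodicity spectrum (bins from 0 Hz to Nyquist) per band."""
    hz_per_bin = (sample_rate / 2) / (len(ap_spectrum) - 1)
    bap = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        vals = [v for k, v in enumerate(ap_spectrum) if lo <= k * hz_per_bin < hi]
        bap.append(sum(vals) / len(vals) if vals else 0.0)
    return bap


def lpc_to_lsp(a: list[float], grid: int = 4096, iters: int = 50) -> list[float]:
    """Line spectral frequencies (radians, ascending) of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    p = len(a) - 1
    ext = list(a) + [0.0]                      # A(z), padded to degree p + 1
    rev = ext[::-1]                            # z^-(p+1) A(1/z)
    sym = [x + y for x, y in zip(ext, rev)]    # P(z): symmetric
    anti = [x - y for x, y in zip(ext, rev)]   # Q(z): antisymmetric
    half = (p + 1) / 2

    def p_real(w):  # P(e^jw) with its linear phase removed (real-valued)
        return sum(c * math.cos(w * (half - k)) for k, c in enumerate(sym))

    def q_real(w):  # Q(e^jw) with its linear phase and a factor j removed
        return sum(c * math.sin(w * (half - k)) for k, c in enumerate(anti))

    # Grid avoids w = 0 and w = pi, where Q and P have trivial roots for even p.
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    roots = []
    for f in (p_real, q_real):
        vals = [f(w) for w in ws]
        for i in range(grid - 1):
            if vals[i] * vals[i + 1] < 0:      # a sign change brackets one root
                lo, hi = ws[i], ws[i + 1]
                for _ in range(iters):         # refine by bisection
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0:
                        hi = mid
                    else:
                        lo = mid
                roots.append(0.5 * (lo + hi))
    return sorted(roots)
```

For a flat spectrum, A(z) = 1 with 18 coefficients, the extracted LSFs are evenly spaced at iπ/19 for i = 1..18, interleaving the roots of P(z) and Q(z).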

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611116110.XA CN108184032B (en) 2016-12-07 2016-12-07 A service method and device for a customer service system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611116110.XA CN108184032B (en) 2016-12-07 2016-12-07 A service method and device for a customer service system

Publications (2)

Publication Number Publication Date
CN108184032A (en) 2018-06-19
CN108184032B (en) 2020-02-21

Family

ID=62544670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611116110.XA Active CN108184032B (en) 2016-12-07 2016-12-07 A service method and device for a customer service system

Country Status (1)

Country Link
CN (1) CN108184032B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109785823B (en) * 2019-01-22 2021-04-02 中财颐和科技发展(北京)有限公司 Speech synthesis method and system
CN109933658B (en) * 2019-03-21 2021-05-11 中国联合网络通信集团有限公司 Customer service call analysis method and device
CN110085209B (en) * 2019-04-11 2021-07-23 广州多益网络股份有限公司 Tone screening method and device
CN110610720B (en) * 2019-09-19 2022-02-25 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN113808576A (en) * 2020-06-16 2021-12-17 阿里巴巴集团控股有限公司 Voice conversion method, device and computer system
CN111883133B (en) * 2020-07-20 2023-08-29 深圳乐信软件技术有限公司 Customer service voice recognition method, device, server and storage medium
CN112988998B (en) * 2021-03-15 2023-06-16 中国联合网络通信集团有限公司 Response method and device

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1336750A (en) * 2000-07-27 2002-02-20 霈捷科技股份有限公司 Multiplex telephone service system and mechanism
CN102231275A (en) * 2011-06-01 2011-11-02 北京宇音天下科技有限公司 Embedded speech synthesis method based on weighted mixed excitation
CN103065619A (en) * 2012-12-26 2013-04-24 安徽科大讯飞信息科技股份有限公司 Speech synthesis method and speech synthesis system
CN105261355A (en) * 2015-09-02 2016-01-20 百度在线网络技术(北京)有限公司 Voice synthesis method and apparatus
CN105304080A (en) * 2015-09-22 2016-02-03 科大讯飞股份有限公司 Speech synthesis device and speech synthesis method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9042921B2 (en) * 2005-09-21 2015-05-26 Buckyball Mobile Inc. Association of context data with a voice-message component

Also Published As

Publication number Publication date
CN108184032A (en) 2018-06-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100053 53A, Xibianmennei Street, Xuanwu District, Beijing

Patentee after: CHINA MOBILE COMMUNICATION LTD., Research Institute

Patentee after: CHINA MOBILE COMMUNICATIONS GROUP Co.,Ltd.

Address before: 100053 53A, Xibianmennei Street, Xuanwu District, Beijing

Patentee before: CHINA MOBILE COMMUNICATION LTD., Research Institute

Patentee before: CHINA MOBILE COMMUNICATIONS Corp.