CN119276978B - Control method for communication device, and storage medium - Google Patents
- Publication number: CN119276978B
- Application number: CN202411796530.1A
- Authority: CN (China)
- Prior art keywords: voice, telephone, recognition result, target, recording file
- Prior art date
- Legal status: Active (the listed status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Classifications
- H04M3/2281—Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
- G10L15/07—Adaptation to the speaker
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G10L15/1822—Parsing for meaning understanding
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/26—Speech to text systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- H04M3/42221—Conversation recording systems
- H04M3/4936—Speech interaction details
Abstract
The application discloses a control method for a communication device, a communication device, and a storage medium, and relates to the technical field of speech recognition. In the method, if the establishment of a voice session is detected, voice data corresponding to the voice session is collected through a voice recognition chip; if a preset keyword is identified in the voice data, a recording file corresponding to the voice session is generated; the recording file is uploaded to a cloud server through the voice gateway, and a recognition result fed back by the cloud server is received; finally, a processing action corresponding to the recognition result is executed. In this method, the voice data is collected through the voice recognition chip for preliminary processing and judgment, and the cloud server is combined to recognize the recording file of the voice session, which improves the accuracy of abnormal telephone recognition while reducing the processing cost of intelligent voice.
Description
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a method for controlling a call device, and a storage medium.
Background
At present, the conventional scheme for identifying abnormal telephones marks telephone numbers that have been reported an excessive number of times as abnormal telephones and stores the marked numbers. When a call dialed from a marked number is received, the label reminds the user that it is an abnormal call. With this approach, when the party behind an abnormal call uses software to generate multiple different virtual numbers, it is impossible to identify whether a virtual number is an abnormal call, so the accuracy of abnormal-call identification is low.
With the development and application of AI technology, AI-based abnormal-telephone identification has also appeared, but it requires intelligent detection of the real-time audio or the recording of every call, so the computational overhead and the processing cost are high.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present application and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The application provides a control method for a call device, a call device, and a storage medium, and aims to solve the problems that related schemes have low accuracy in abnormal-telephone identification and a high identification cost.
In order to achieve the above object, the present application provides a control method for a call device, which is applied to a call device, wherein the call device is provided with a voice recognition chip and is communicatively connected to a cloud server through a voice gateway. The control method of the call device comprises the following steps:
if the establishment of a voice session is detected, acquiring voice data corresponding to the voice session through the voice recognition chip;
if a preset keyword is identified in the voice data, generating a recording file corresponding to the voice session;
uploading the recording file to the cloud server through the voice gateway, and receiving a recognition result fed back by the cloud server;
and executing a processing action corresponding to the recognition result.
In an embodiment, the step of generating the recording file corresponding to the voice session if the preset keyword is identified in the voice data includes:
performing noise reduction and enhancement processing on the collected voice data by a signal processing module to obtain target voice data;
detecting sentence endpoints in the target voice data based on a voice activity detection technology, and extracting corresponding voice features;
performing forward inference according to the sentence endpoints and the voice features to obtain a posterior probability distribution, and determining keywords in the voice data according to the posterior probability distribution;
and when the keywords match preset keywords, generating a recording file corresponding to the voice session.
In an embodiment, when the keyword matches a preset keyword, the step of generating the recording file corresponding to the voice session includes:
determining a recognition result in a decoding search space according to the keywords;
determining the accuracy of the recognition result according to the keywords, and when the accuracy is higher than a preset threshold, judging that the keywords match the preset keywords;
and starting recording of the voice session to generate the recording file.
In an embodiment, the step of executing the processing action corresponding to the recognition result includes:
acquiring the semantic recognition result and the intention recognition result from the recognition result;
judging, according to a preset abnormal-telephone feature library, whether the semantic recognition result and the intention recognition result match;
when the semantic recognition result and the intention recognition result match features in the abnormal-telephone feature library, judging that the telephone corresponding to the recognition result is an abnormal telephone;
and executing the processing action corresponding to the recognition result being an abnormal telephone.
In an embodiment, the step of executing the processing action when the recognition result indicates an abnormal telephone includes:
receiving a call termination instruction sent by the voice gateway to terminate the call process corresponding to the voice session;
or receiving a prompt instruction sent by the voice gateway, and starting a voice prompt function according to the prompt instruction;
or acquiring an associated target telephone through the voice gateway, and sending a prompt voice to the target telephone through a call;
and identifying the telephone number corresponding to the voice session, determining the voiceprint data in the recording file, and storing the telephone number and the voiceprint data in a blacklist library in an associated manner.
In an embodiment, the method further comprises:
identifying the voiceprint of the recording file through the voice gateway, and comparing the voiceprint with a pre-stored voiceprint blacklist library;
when a target voiceprint matching the voiceprint exists in the voiceprint blacklist library, judging that the target telephone corresponding to the voiceprint is an abnormal telephone;
and executing the processing action corresponding to the judgment that the telephone is an abnormal telephone.
In an embodiment, after the step of generating the recording file corresponding to the voice session if the preset keyword is identified in the voice data, the method further includes:
inputting the recording file into an imitated-voice recognition model, and judging, according to the imitated-voice recognition model, whether the recording file is AI-synthesized voice;
extracting the timbre parameters of the recording file according to the imitated-voice recognition model, and comparing the timbre parameters with pre-stored target timbre parameters;
and when the timbre parameters match the target timbre parameters, judging that the recording file is AI-synthesized voice and outputting a corresponding voice prompt.
In an embodiment, after the step of executing, by the voice gateway, the processing action corresponding to the recognition result, the method further includes:
when a misjudgment instruction fed back by a user is received, determining the misjudged telephone corresponding to the misjudgment instruction;
querying, in the blacklist library, the target number matching the misjudged telephone and the associated target voiceprint data;
and deleting the target number and the target voiceprint data from the blacklist library.
In addition, in order to achieve the above object, the present application provides a call device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the control method of a call device as described above.
In order to achieve the above object, the present application further provides a storage medium, which is a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the control method of a call device as described above.
The application provides a control method for a call device, a call device, and a storage medium. If the establishment of a voice session is detected, voice data corresponding to the voice session is collected through the voice recognition chip; if a preset keyword is recognized in the voice data, a recording file corresponding to the voice session is generated; the recording file is uploaded to the cloud server through the voice gateway, and the recognition result fed back by the cloud server is received; finally, the processing action corresponding to the recognition result is executed. The voice data is collected through the voice recognition chip for preliminary processing and judgment, and the cloud server is combined to recognize the recording file of the voice session, which improves the accuracy of abnormal telephone recognition.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a flowchart of a first embodiment of the method for controlling a call device according to the present application;
Fig. 2 is a flowchart of a second embodiment of the method for controlling a call device according to the present application;
Fig. 3 is a flowchart of a third embodiment of the method for controlling a call device according to the present application;
Fig. 4 is a flowchart of a fourth embodiment of the method for controlling a call device according to the present application;
Fig. 5 is a schematic diagram of the architecture of a hardware operating environment of a call device according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In order that the above-described aspects may be better understood, exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Embodiment One
Referring to fig. 1, in a first embodiment, the method for controlling a call device includes the following steps:
Step S10, if the establishment of a voice session is detected, acquiring voice data corresponding to the voice session through the voice recognition chip.
In this embodiment, the processing actions are performed by the call device, which is provided with a voice recognition chip and is communicatively connected to the cloud server through the voice gateway. The call device detects that a voice session is established when it receives a call request, which can be confirmed by detecting a state change of the telephone line. It should be noted that the call device may be a landline phone, a mobile phone, a telephone watch, or any other device capable of voice communication, which is not limited herein. The voice recognition chip may be built into the call device or provided on equipment connected to it, and can collect the voice data of the call device in real time during a voice session.
When the establishment of a voice session is detected, a corresponding collection prompt is fed back to the call device, that is, the user is asked to grant permission to collect voice data. The user receives the collection prompt on the front end of the call device and taps agree or refuse in the front-end interface. When the user taps the agree control, the user has authorized the voice recognition chip, that is, the chip has the collection permission, and it then collects the voice data corresponding to the voice session; otherwise, the voice recognition chip does not collect the voice data of the voice session. In another optional application scenario, the authorization is performed by the user's guardian: after the call device is powered on, it requests permission from the guardian's terminal, and once the guardian grants it, the voice chip in the call device is permitted to collect voice data.
Before the voice data of the call device is collected through the voice recognition chip, it is ensured that the built-in voice recognition chip functions normally and is correctly connected to the audio input of the call device. Whether the driver and algorithm library of the voice recognition chip have been installed and configured correctly can be checked, and the software part of the chip is started so that it is in a standby state. Parameters of the speech recognition front end, such as the sampling frequency, quantization accuracy, and number of channels, are then configured to improve the performance of the voice recognition chip.
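Purely as an illustrative sketch, such a front-end configuration might be expressed in Python as shown below; the class, field names, and default values are hypothetical and are not part of the disclosed chip.

```python
from dataclasses import dataclass

@dataclass
class FrontEndConfig:
    """Capture parameters for the on-device speech front end (illustrative values)."""
    sample_rate_hz: int = 16_000   # sampling frequency
    bit_depth: int = 16            # quantization accuracy
    channels: int = 1              # mono capture from the handset microphone
    frame_ms: int = 25             # analysis frame length
    hop_ms: int = 10               # frame shift

def validate(cfg: FrontEndConfig) -> None:
    # Reject settings the recognition chip is assumed not to support.
    if cfg.sample_rate_hz not in (8_000, 16_000):
        raise ValueError("unsupported sampling frequency")
    if cfg.bit_depth not in (8, 16):
        raise ValueError("unsupported quantization depth")

cfg = FrontEndConfig()
validate(cfg)
```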
After it is confirmed that the call device has received the call request, a voice session is established, so the call device detects that the voice session is established. The voice collection function of the voice recognition chip is then started; the chip receives voice signals from the audio input of the call device and performs preliminary preprocessing such as denoising and filtering. The voice recognition chip converts the preprocessed voice signals into digital signals, which are transmitted through an internal bus or an external interface to the voice gateway for further processing.
In addition, the incoming number can be identified to judge whether it falls within a preset range of numbers for which collection is allowed, for example whether it is a recorded relative's number; otherwise, that is, for a strange number, voice collection is performed on the call.
Step S20, if the preset keywords are identified in the voice data, a recording file corresponding to the voice session is generated.
In this embodiment, the voice recognition chip first receives the collected voice data from the audio input of the call device, and then the keywords in the voice data are extracted through the voice recognition chip built into the call device, where the voice data is generated during the session between the user of the call device and the initiator of the call request. The voice data is collected by the voice recognition chip, and the keywords in the voice data can be extracted based on it.
Optionally, in this embodiment, the step of generating the recording file corresponding to the voice session if a preset keyword is identified in the voice data includes:
performing noise reduction and enhancement processing on the collected voice data by the signal processing module to obtain target voice data; detecting sentence endpoints in the target voice data based on a voice activity detection technology, and extracting corresponding voice features; performing forward inference according to the sentence endpoints and the voice features to obtain a posterior probability distribution, and determining keywords in the voice data according to the posterior probability distribution; and when the keywords match preset keywords, generating a recording file corresponding to the voice session.
Specifically, the voice recognition chip built into the call device is based on a deep neural network algorithm. It first picks up voice data from the microphone of the call device and, through the signal processing module, performs noise reduction, enhancement, and sentence-endpoint detection based on VAD (Voice Activity Detection) technology. Voice features are extracted from the processed signal, and forward inference is performed on the feature data to obtain the posterior probability distribution of phonemes. Finally, the keywords are determined according to the posterior probability distribution. The collected voice data is processed using a digital filter.
Noise reduction and enhancement are performed on the voice data by the signal processing module: the collected voice data can be filtered with a digital filter to remove background noise. Specifically, a Fast Fourier Transform (FFT) converts the voice signal into the frequency domain, filtering is applied there, and an inverse FFT converts the signal back into the time domain, completing the noise reduction of the voice data. Gain control is then applied to the filtered voice data to improve the clarity and recognizability of the voice signal; specifically, an adaptive gain control algorithm may be used to dynamically adjust the gain according to real-time changes in the speech signal. After noise reduction and enhancement, clear and highly recognizable target voice data is obtained.
As for the specific way in which the voice activity detection technology detects sentence endpoints in the target voice data: the target voice data is processed by a VAD algorithm, which detects the active segments (i.e., segments containing speech information) and inactive segments (i.e., silence or background noise) in the voice signal. Within the active segments, the start and end points of each sentence are further detected; algorithms such as the energy-threshold method and the zero-crossing-rate method can be used. Corresponding voice features are then extracted from the target voice data according to the detected sentence endpoints, including frequency, energy, zero-crossing rate, Mel-frequency cepstral coefficients (MFCC), and the like.
Finally, forward inference is performed according to the sentence endpoints and voice features. A speech model is constructed, and the extracted voice features are input into the trained speech model for forward inference; the model may be a convolutional neural network (CNN) or a recurrent neural network (RNN). The model outputs a corresponding posterior probability distribution based on the input voice features, where the posterior probability distribution represents the model's confidence in different text sequences given the input voice features. The output of the model can be converted into a probability distribution via a softmax function. The text sequence with the highest probability is selected from the posterior probability distribution as the recognition result, and the target keywords are determined in the recognized text sequence according to a preset keyword library or rules. In addition, the speech model needs to be trained in advance with a large amount of voice data to learn the mapping between voice signals and text.
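As a non-limiting sketch of this pipeline, the Python snippet below chains an energy-threshold VAD, crude spectral features, and a forward pass followed by a softmax; the synthetic audio, the random-projection "model", and all thresholds are stand-ins and do not represent the disclosed chip algorithm.

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold=1e-3):
    """Energy-threshold VAD: return (start, end) sample indices of the active segment."""
    starts = range(0, len(signal) - frame_len, hop)
    energies = np.array([np.mean(signal[i:i + frame_len] ** 2) for i in starts])
    active = np.where(energies > threshold)[0]
    if active.size == 0:
        return None
    return int(active[0]) * hop, int(active[-1]) * hop + frame_len

def toy_posterior(features, num_keywords=3):
    """Stand-in for the chip's forward pass: random projection followed by a softmax."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=(features.shape[-1], num_keywords))   # hypothetical model weights
    logits = features.mean(axis=0) @ w
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                                    # posterior distribution

# Synthetic 1-second, 16 kHz "call": half a second of silence, then a voiced tone.
sr = 16_000
audio = np.concatenate([np.zeros(sr // 2),
                        0.1 * np.sin(2 * np.pi * 220 * np.arange(sr // 2) / sr)])
endpoints = energy_vad(audio)
if endpoints:
    seg = audio[endpoints[0]:endpoints[1]]
    n_frames = min(len(seg) // 400, 10)
    feats = np.abs(np.fft.rfft(seg[:n_frames * 400].reshape(n_frames, 400), axis=1))
    posterior = toy_posterior(feats)
    print(int(np.argmax(posterior)), float(posterior.max()))  # most likely keyword index
```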
As an alternative embodiment, the extracted feature vectors may be matched against an acoustic model, and the score of each feature vector on the acoustic features is calculated. The acoustic feature sequence most likely to correspond to the voice signal is determined from the scoring result. The probabilities of the phrase sequences corresponding to the voice signal are then calculated based on a language model; such models typically take into account information such as word co-occurrence probabilities and grammatical structure. Phrase-sequence decoding means decoding the phrase sequence from the acoustic feature sequence and the language model; dynamic programming algorithms such as the Viterbi algorithm may be employed. Finally, keywords are extracted from the decoded phrase sequence according to a preset keyword extraction algorithm.
Further, in this embodiment, when the keyword matches a preset keyword, the step of generating the recording file corresponding to the voice session includes:
determining a recognition result in a decoding search space according to the keywords; determining the accuracy of the recognition result according to the keywords, and judging that the keywords match the preset keywords when the accuracy is higher than a preset threshold; and starting recording of the voice session to generate the recording file.
Specifically, after a keyword is extracted, the extracted keyword is matched against a preset keyword library. The preset keyword library is defined in advance and contains the specific keywords or phrases that should trigger the recording function. If the extracted keyword successfully matches a keyword in the preset keyword library, the system proceeds to the next step; if the match fails, no recording is made and the call continues normally.
Specifically, when a preset keyword matching the extracted keyword is found in the preset keyword library, that is, after the match succeeds, the telephone recognition system initializes the recording module, which includes configuring recording parameters (such as the sampling rate and bit rate) and ensuring that the recording module is in a normal working state. The recording module may be arranged in the voice gateway, which is communicatively connected to the call device, so that the call can be recorded through the recording module.
In addition, a new recording file needs to be created for the call request to be recorded. This file is typically stored in a designated storage location and given a unique file name for subsequent management and lookup. After the recording device is initialized, the system starts recording the voice data corresponding to the call request, continuously writing voice data into the previously created recording file. After recording is completed, the system saves the recording file to the designated storage location, which may be a local hard disk, network storage, or another type of storage device. To facilitate subsequent searches and management, the system typically names and classifies the recording files: naming can be based on information such as the recording time, the calling party, and the keywords, and classification can be based on factors such as call type and importance. To prevent recording files from being lost or damaged, the system can also back them up periodically; backups may be stored in another physical location or transmitted over a network to a remote server.
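By way of illustration only, one possible naming and storage scheme of this kind is sketched below in Python; the directory layout and naming pattern are hypothetical choices rather than part of the disclosure.

```python
import hashlib
from datetime import datetime
from pathlib import Path

def new_recording_path(caller_id: str, keyword: str, root: str = "recordings") -> Path:
    """Build a unique, sortable file name from the recording time, caller, and keyword."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    digest = hashlib.sha1(f"{stamp}{caller_id}".encode()).hexdigest()[:8]  # uniqueness suffix
    path = Path(root) / keyword / f"{stamp}_{caller_id}_{digest}.wav"
    path.parent.mkdir(parents=True, exist_ok=True)  # classify by keyword sub-directory
    return path

print(new_recording_path("13800000000", "transfer"))
```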
It should be noted that, before the recording file is recorded, recording permission must also be requested from the user of the call device. The corresponding request can be fed back to the front end of the call device and displayed in its interface; the user agrees to the recording through the input device of the call device, and the call device records the recording file only after detecting the user's authorization.
Optionally, when determining whether the keyword matches a preset keyword, a decoder is first designed that can convert the input audio or text data into a series of keyword combinations or keyword sequences; this decoder can be constructed based on an acoustic model and a language model. The extracted keywords are obtained and used as the basis of the search space, and the decoder decodes the input preset keyword library to generate a search space containing all keyword combinations. At least one evaluation criterion is then determined, including accuracy and recall.
Search algorithms such as depth-first search and breadth-first search are applied in the search space to find the best keyword recognition result according to the set evaluation criteria. Each recognition result obtained during the search is evaluated: a preset threshold is set according to actual requirements and serves as the evaluation standard for judging whether a recognition result meets the requirements. Specifically, the recognition result is compared with the preset threshold; if it meets the threshold requirements, for example the accuracy is above a certain value and the recall is within a certain range, it is taken as the final result, the keyword is judged to match the preset keyword, and the steps of starting recording and generating the recording file corresponding to the call request are then executed.
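A minimal sketch of such threshold-gated keyword matching is given below in Python; the keyword entries, the similarity measure (difflib), and the threshold value are assumptions made for the example only.

```python
from difflib import SequenceMatcher

PRESET_KEYWORDS = {"safe account", "verification code", "transfer money"}  # illustrative entries
ACCURACY_THRESHOLD = 0.85

def best_match(candidate: str):
    """Score the decoded keyword against the preset library and return the best entry."""
    scored = [(kw, SequenceMatcher(None, candidate, kw).ratio()) for kw in PRESET_KEYWORDS]
    return max(scored, key=lambda x: x[1])

def should_start_recording(candidate: str) -> bool:
    kw, accuracy = best_match(candidate)
    return accuracy > ACCURACY_THRESHOLD   # only a confident match triggers recording

print(should_start_recording("verification code"))  # True: exact hit exceeds the threshold
```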
In addition, the final result can be output to the user interface of the telephone recognition system for further processing; the user can check the output matched keywords and the preset keywords in the user interface and feed back corresponding modification information based on how well they match.
Step S30, uploading the recording file to a cloud server through the voice gateway, and receiving the recognition result fed back by the cloud server.
In this embodiment, when the keyword matches a preset keyword, the voice gateway obtains the recorded file and uploads it to the cloud server. After receiving the recording file uploaded by the voice gateway, the cloud server recognizes it with a large model deployed on the cloud server and feeds the recognition result back to the voice gateway.
Specifically, after the call recording is completed, the telephone recognition system generates at least one recording file, typically stored in a common audio format such as MP3 or WAV. Before uploading, the integrity and readability of the recording file are checked, for example whether the file is damaged and whether its format is correct. During upload, the voice gateway sends the recording file to the designated cloud server address over the network connection; the voice gateway acts as the bridge between the local device and the cloud server and is responsible for transmitting the recording file to the cloud.
After receiving the recording file uploaded by the voice gateway, the cloud server checks the integrity of the file and verifies its format; if the file is damaged or the format is incorrect, the cloud server returns an error message. The cloud server then recognizes the recording file using its deployed speech recognition technology and generates a corresponding text recognition result. After recognition is completed, the cloud server returns the recognition result to the voice gateway in text form.
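As an illustrative sketch of the integrity check and upload, assuming a WAV recording, the requests library, and a hypothetical endpoint URL and field names that are not part of the disclosure:

```python
import hashlib
import wave
from pathlib import Path

import requests  # assumed to be available on the voice gateway

UPLOAD_URL = "https://cloud.example.com/api/v1/recordings"  # hypothetical endpoint

def check_recording(path: Path) -> str:
    """Verify the WAV file opens and is non-empty, then return its SHA-256 checksum."""
    with wave.open(str(path), "rb") as wav:
        if wav.getnframes() == 0:
            raise ValueError("empty recording")
    return hashlib.sha256(path.read_bytes()).hexdigest()

def upload_recording(path: Path) -> dict:
    checksum = check_recording(path)
    with path.open("rb") as fh:
        resp = requests.post(
            UPLOAD_URL,
            files={"recording": (path.name, fh, "audio/wav")},
            data={"sha256": checksum},   # the server can re-check integrity on arrival
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()                    # assumed to carry the recognition result
```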
Optionally, a large language model (LLM) is associated with the cloud server; its principle is mainly based on deep learning and natural language processing technology. The training process of an LLM mainly comprises two stages, unsupervised pre-training and supervised fine-tuning. Unsupervised pre-training trains the model on a large amount of unlabeled text data so that it can capture the grammar, semantics, and contextual relations of natural language. Supervised fine-tuning adapts the pre-trained model to specific tasks (such as text classification or named entity recognition); this stage typically adds some task-specific layers or parameters on top of the pre-trained model and trains them with annotated data.
As an optional implementation of determining the recognition result of the recording file through the LLM, an LLM service is first deployed on the cloud server and kept in a normal running state, that is, a pre-trained language model suitable for semantic and intent recognition is loaded. After the recording file is received, it is parsed and the audio data is converted into text data, which can be done through automatic speech recognition technology. Semantic and intent recognition is then performed on the converted text data using the LLM. By understanding context, meaning, and the subtleties of language, the LLM can analyze the key information in the dialogue content, such as requests, threats, and promises, so as to judge the intent of the dialogue and determine the intent recognition result and the semantic recognition result.
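The cloud-side semantic and intent recognition step might be sketched as follows; the prompt wording, the JSON schema, and the injected LLM client are hypothetical, and an offline stand-in replaces the real model so the sketch runs as-is.

```python
import json
from typing import Callable

INTENT_PROMPT = (
    "You are screening a phone-call transcript for abnormal (fraud-like) behaviour.\n"
    "Return JSON with keys 'semantics' (short summary) and 'intent' "
    "(one of: request_info, demand_transfer, threat, false_promise, normal).\n"
    "Transcript:\n{transcript}"
)

def recognise(transcript: str, llm_complete: Callable[[str], str]) -> dict:
    """Run semantic and intent recognition over an ASR transcript with an injected LLM client."""
    raw = llm_complete(INTENT_PROMPT.format(transcript=transcript))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"semantics": raw.strip(), "intent": "unknown"}

# Offline stand-in for the cloud model, used here only so the example executes.
fake_llm = lambda prompt: '{"semantics": "caller asks for a bank transfer", "intent": "demand_transfer"}'
print(recognise("please transfer 5000 yuan to this safe account", fake_llm))
```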
Step S40, executing the processing action corresponding to the recognition result.
In this embodiment, after the recognition result fed back by the cloud server is received, whether the telephone corresponding to the call request is an abnormal telephone is determined from the recognition result. When the telephone recognition system judges, according to its preset processing logic, that the call request is an abnormal telephone, it executes the corresponding processing action on the call request.
Before the caller's telephone number is obtained, the user must be asked for permission to obtain the telephone number. Similarly, the request can be fed back to the front-end interface of the call device, where the user acts on it; after the call device detects the user's authorization, the caller's telephone number is obtained. The user here may be a guardian, in which case the corresponding call device is the guardian's terminal.
Specifically, keywords or phrases related to abnormal behaviour are searched for in the recognition result and compared against the preset abnormal-telephone feature library to further confirm whether the telephone is an abnormal telephone. After the telephone recognition system completes the judgment, the result can be sent to a review end for a second judgment to ensure its accuracy; the review end can perform manual review, for example by listening to the recording file, viewing the call record, and analyzing user feedback.
Further, in this embodiment, after the step S40, the method further includes:
When a misjudgment instruction fed back by a user is received, the misjudged telephone corresponding to the misjudgment instruction is determined, the target number matching the misjudged telephone and the associated target voiceprint data are queried in the blacklist library, and the target number and the target voiceprint data are deleted from the blacklist library.
Specifically, a user feedback channel is established first: a feedback module is arranged in the telephone recognition system and associated with a corresponding feedback interface, through which the user can feed information back to the system. When the telephone recognition system receives a misjudgment instruction fed back by a user, it searches the system log or database for the corresponding misjudged-telephone record according to the misjudgment information provided by the user, so as to confirm the specific misjudged number. The blacklist library is then accessed; it contains known bad or misjudged telephone numbers together with their associated target numbers and target voiceprint data. A query is executed in the blacklist library using the misjudged telephone as the query keyword to find the matching target number and the associated target voiceprint data. After the error is confirmed, the deletion operation is executed and the target number and the associated target voiceprint data are removed from the blacklist library.
In addition, information related to the deletion operation, such as the deletion time, the operator, and the reason for deletion, can be recorded for subsequent audit or query, and the configuration or status of the associated system is automatically updated to ensure that the misjudged telephone is no longer intercepted or mishandled. The user is then notified that the misjudged telephone has been successfully processed and deleted from the blacklist.
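A minimal sketch of this deletion and audit-trail handling, assuming a SQLite store whose table names and columns are invented for the example:

```python
import sqlite3
from datetime import datetime

def remove_from_blacklist(db: sqlite3.Connection, number: str, operator: str, reason: str) -> int:
    """Delete a misjudged number and its voiceprints, recording an audit entry."""
    with db:  # one transaction covers the deletion and the audit trail
        deleted = db.execute("DELETE FROM blacklist WHERE number = ?", (number,)).rowcount
        db.execute("DELETE FROM voiceprints WHERE number = ?", (number,))
        db.execute(
            "INSERT INTO audit_log(number, operator, reason, deleted_at) VALUES (?, ?, ?, ?)",
            (number, operator, reason, datetime.now().isoformat()),
        )
    return deleted

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE blacklist(number TEXT PRIMARY KEY);
    CREATE TABLE voiceprints(number TEXT, embedding BLOB);
    CREATE TABLE audit_log(number TEXT, operator TEXT, reason TEXT, deleted_at TEXT);
    INSERT INTO blacklist VALUES ('13800000000');
""")
print(remove_from_blacklist(db, "13800000000", "reviewer-01", "user reported misjudgment"))  # 1
```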
In the technical solution provided by this embodiment, if the establishment of a voice session is detected, voice data corresponding to the voice session is collected through the voice recognition chip; if a preset keyword is recognized in the voice data, a recording file corresponding to the voice session is generated; the recording file is uploaded to the cloud server through the voice gateway and the recognition result fed back by the cloud server is received; and finally, the processing action corresponding to the recognition result is executed. The voice data is collected through the voice recognition chip for preliminary processing and judgment, and the cloud server is combined to recognize the recording file of the voice session, which improves the accuracy of abnormal telephone recognition.
In the technical solution provided by this embodiment, a low-cost local voice recognition chip is added to the telephone recognition system; only after suspicious information is recognized through sensitive words is the recording file uploaded to the cloud for further semantic and intent recognition, which reduces the cost of abnormal telephone recognition.
Embodiment Two
Referring to fig. 2, in a second embodiment, the step of determining from the recognition result whether the telephone corresponding to the call request is an abnormal telephone and executing the corresponding processing action includes:
Step S50, acquiring the semantic recognition result and the intention recognition result from the recognition result.
Step S60, judging, according to a preset abnormal-telephone feature library, whether the semantic recognition result and the intention recognition result match.
Step S70, when the semantic recognition result and the intention recognition result match features in the abnormal-telephone feature library, judging that the telephone corresponding to the recognition result is an abnormal telephone.
In this embodiment, the recognition result is fed back by the cloud server: the cloud server determines the recognition result of the recording file with the deployed LLM model and feeds it back to the telephone recognition system. The specific recognition process is described in the first embodiment and is not repeated here. After the telephone recognition system receives the recognition result, it obtains the semantic recognition result and the intention recognition result, then uses them as keywords to query whether matching abnormal-telephone features exist in the preset abnormal-telephone feature library; if so, the telephone corresponding to the recognition result is judged to be an abnormal telephone.
Specifically, the recognition result is received from the speech recognition system and parsed, and it is ensured that it contains two parts, the semantic recognition result and the intention recognition result. The semantic recognition result is the content obtained after the voice is converted into text and reflects the specific language information in the call; the intention recognition result is an analysis of the user's intent in the voice content, such as a service request, an information query, or a transaction. The abnormal-telephone feature library is then obtained; it contains semantic features and intent features of various abnormal telephones. Semantic features may include common abnormal utterances, keywords, and phrases, and intent features include intents such as obtaining personal information, demanding money transfers, and making false service commitments. The semantic recognition result is then compared with the semantic features in the abnormal-telephone feature library, which can be implemented with a text matching algorithm (such as keyword matching or regular-expression matching); meanwhile, the intention recognition result is matched against the intent features in the feature library, for which a pre-trained model can be used.
During matching, different matching weights may be set; for example, a lower threshold may be set for highly suspicious semantic and intent features to trigger an alarm. If the semantic recognition result and the intention recognition result match the features in the abnormal-telephone feature library, that is, the preset matching weight is reached, the telephone corresponding to the recognition result is judged to be an abnormal telephone. If the result does not match completely or does not reach the preset matching standard, a final judgment can be made through a further manual review step.
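For illustration, the weighted matching against the feature library might look like the sketch below; the feature entries, weights, and alarm threshold are invented for the example and are not the disclosed feature library.

```python
ABNORMAL_FEATURES = {
    # hypothetical feature library: keyword -> weight (higher means more suspicious)
    "safe account": 0.9, "verification code": 0.8, "overdue payment": 0.5,
}
ABNORMAL_INTENTS = {"demand_transfer": 0.9, "request_info": 0.6, "threat": 0.9}
ALARM_THRESHOLD = 0.8

def is_abnormal(semantics: str, intent: str) -> bool:
    """Combine the strongest weighted semantic keyword hit with the intent weight."""
    semantic_score = max(
        (w for kw, w in ABNORMAL_FEATURES.items() if kw in semantics.lower()), default=0.0
    )
    intent_score = ABNORMAL_INTENTS.get(intent, 0.0)
    return max(semantic_score, intent_score) >= ALARM_THRESHOLD

print(is_abnormal("please move the money to a safe account", "demand_transfer"))  # True
```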
Step S80, executing the processing action corresponding to the recognition result being an abnormal telephone.
In this embodiment, when it is determined that the telephone corresponding to the recognition result is an abnormal telephone, a corresponding instruction is triggered so as to execute the preset action associated with that instruction.
Optionally, in this embodiment, the step S80 includes:
the call device receives a call termination instruction sent by the voice gateway to terminate the call process corresponding to the voice session; or receives a prompt instruction sent by the voice gateway and starts a voice prompt function according to the prompt instruction; or obtains an associated target telephone through the voice gateway and sends a prompt voice to the target telephone through a call; and it identifies the telephone number corresponding to the voice session, determines the voiceprint data in the recording file, and stores the telephone number and the voiceprint data in the blacklist library in an associated manner.
Specifically, when the telephone corresponding to the call request is determined to be an abnormal telephone, a call termination instruction can be sent to the call device through the voice gateway so that the call device terminates the call process corresponding to the call request; or a prompt instruction can be sent to the call device through the voice gateway so that the call device starts a voice prompt function according to the instruction and reminds the user in the call that this is an abnormal telephone. Alternatively, the target telephone associated with the call device, for example the telephone number of the user's family member, can be obtained through the voice gateway, and a prompt voice is sent to the target telephone through a call. The telephone number corresponding to the call request is identified, the voiceprint data in the recording file is determined, and the telephone number and the voiceprint data are stored in the blacklist library in an associated manner.
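The dispatch of these processing actions might be sketched as follows; the gateway methods (terminate, play_prompt, call_and_play, blacklist_store) are hypothetical names, not an actual gateway API.

```python
from enum import Enum, auto

class Action(Enum):
    TERMINATE_CALL = auto()
    PLAY_LOCAL_WARNING = auto()
    NOTIFY_GUARDIAN = auto()

def handle_abnormal_call(gateway, call_id, caller, voiceprint, action, guardian_number=None):
    """Dispatch one processing action and always blacklist the number with its voiceprint."""
    if action is Action.TERMINATE_CALL:
        gateway.terminate(call_id)                      # hang up the suspicious session
    elif action is Action.PLAY_LOCAL_WARNING:
        gateway.play_prompt(call_id, "This call has been flagged as abnormal.")
    elif action is Action.NOTIFY_GUARDIAN and guardian_number:
        gateway.call_and_play(guardian_number, "A flagged call is in progress.")
    gateway.blacklist_store(number=caller, voiceprint=voiceprint)  # association kept for reuse
```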
In the technical solution provided by this embodiment, the semantic recognition result and the intention recognition result in the recognition result are obtained, whether they match is judged according to the preset abnormal-telephone feature library, and when they match features in the abnormal-telephone feature library, the telephone corresponding to the recognition result is judged to be an abnormal telephone and the corresponding processing action is executed. By receiving the recognition result of the cloud server and judging whether it matches the features in the abnormal-telephone feature library, whether the telephone is abnormal is determined, which improves the accuracy of abnormal-telephone recognition.
Embodiment Three
Referring to fig. 3, in a third embodiment, the method further includes:
Step S90, identifying the voiceprint of the recording file through the voice gateway, and comparing the voiceprint with a pre-stored voiceprint blacklist library.
Step S100, when a target voiceprint matching the voiceprint exists in the voiceprint blacklist library, judging that the target telephone corresponding to the voiceprint is an abnormal telephone.
Step S110, executing the processing action corresponding to the judgment that the telephone is an abnormal telephone.
In this embodiment, the processing actions are performed by the voice gateway. After the voiceprints of identified abnormal telephones have been stored in the voiceprint blacklist library, judging whether the telephone of a call request is an abnormal telephone starts with identifying the voiceprint of the recording file and comparing it with the pre-stored voiceprint blacklist library. Whether a target voiceprint matching the voiceprint exists in the voiceprint blacklist library is determined; when it exists, the target telephone corresponding to the voiceprint is judged to be an abnormal telephone, and the processing action corresponding to that judgment is executed, as described in the second embodiment and not repeated here.
Specifically, a speech passage to be identified is extracted from the recording file, and acoustic features that can represent individual identity, such as Mel-frequency cepstral coefficients and perceptual linear prediction, are extracted from it. The extracted acoustic features are trained and optimized with a machine learning algorithm to obtain the voiceprint, which is then compared with the voiceprint blacklist library to find an identical target voiceprint.
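A greatly simplified voiceprint comparison is sketched below; the band-energy embedding stands in for a real MFCC or speaker-embedding front end, and the similarity threshold is an assumption.

```python
import numpy as np

def voice_embedding(signal: np.ndarray, dims: int = 20) -> np.ndarray:
    """Very simplified stand-in for an MFCC/speaker-embedding front end: averaged log spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(np.log1p(spectrum), dims)      # coarse band energies
    emb = np.array([b.mean() for b in bands])
    return emb / (np.linalg.norm(emb) + 1e-9)

def matches_blacklist(emb: np.ndarray, blacklist, threshold: float = 0.95) -> bool:
    """Cosine similarity against every stored voiceprint."""
    return any(float(emb @ stored) >= threshold for stored in blacklist)

sr = 16_000
known_bad = voice_embedding(np.sin(2 * np.pi * 180 * np.arange(sr) / sr))
incoming = voice_embedding(np.sin(2 * np.pi * 180 * np.arange(sr) / sr) + 0.01)
print(matches_blacklist(incoming, [known_bad]))  # True: near-identical "speaker" signal
```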
In the technical solution provided by this embodiment, the voiceprint of the recording file is identified and compared with the pre-stored voiceprint blacklist library; when a target voiceprint matching the voiceprint exists in the voiceprint blacklist library, the target telephone corresponding to the voiceprint is determined to be an abnormal telephone and the corresponding processing action is executed. When a recorded file is received again, whether it belongs to an abnormal telephone is judged directly through the local voiceprint blacklist library, which improves the efficiency of abnormal-telephone recognition.
Embodiment Four
Referring to fig. 4, in a fourth embodiment, after the step of uploading the recording file to a cloud server through the voice gateway and receiving the recognition result fed back by the cloud server, the method further includes:
Step S120, inputting the recording file into an imitated-voice recognition model, and judging, according to the imitated-voice recognition model, whether the recording file is AI-synthesized voice.
Step S130, extracting the timbre parameters of the recording file according to the imitated-voice recognition model, and comparing the timbre parameters with pre-stored target timbre parameters.
Step S140, when the timbre parameters match the target timbre parameters, judging that the recording file is AI-synthesized voice and outputting a corresponding voice prompt.
In this embodiment, the format of the recording file is first ensured to be compatible with the imitated-voice recognition model; if it is not, the recording is converted into the format required by the model. The trained imitated-voice recognition model is loaded and the preprocessed recording file is input into it. The model automatically extracts timbre parameters from the recording file, including voice quality, pitch, and intensity, and compares the extracted timbre parameters with the pre-stored target timbre parameters to judge whether the recording file is synthesized voice, where the target timbre parameters are extracted in advance from human voice. When the timbre parameters match the target timbre parameters, the imitated-voice recognition model judges that the recording file is AI-synthesized voice. If the judgment result is synthesized voice, corresponding voice prompt content is prepared, for example "The recording is synthesized voice and should be handled carefully." The prepared voice prompt content is converted into a voice signal using speech synthesis technology, and the generated voice prompt signal is transmitted to the call device through a communication interface (such as a telephone line or a network interface). After receiving the voice prompt signal, the call device automatically plays it for the user.
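As a rough sketch only, a timbre-parameter comparison of this kind might be written as follows; the descriptors (spectral centroid and RMS intensity), the stored target values, and the tolerance are assumptions rather than the disclosed model.

```python
import numpy as np

def timbre_params(signal: np.ndarray, sr: int = 16_000) -> dict:
    """Crude timbre descriptors: spectral centroid (pitch/brightness proxy) and RMS intensity."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = float((freqs * spectrum).sum() / (spectrum.sum() + 1e-9))
    rms = float(np.sqrt(np.mean(signal ** 2)))
    return {"centroid_hz": centroid, "rms": rms}

def looks_synthetic(params: dict, reference: dict, tol: float = 0.05) -> bool:
    """Flag the recording when its parameters fall within tolerance of the stored targets."""
    return all(abs(params[k] - reference[k]) <= tol * abs(reference[k]) + 1e-9 for k in reference)

reference = {"centroid_hz": 440.0, "rms": 0.1}   # hypothetical stored target parameters
sr = 16_000
clip = 0.1 * np.sqrt(2) * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
print(looks_synthetic(timbre_params(clip), reference))  # True for this constructed clip
```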
In the technical solution provided by this embodiment, the recording file is input into the voice-imitation recognition model, which judges whether the recording is synthesized speech; the judgment result output by the model is then obtained, and if the result is synthesized speech, a corresponding voice prompt is fed back to the calling device. In this way, the voice-imitation recognition model can determine whether the voice was produced by a robot, which improves the recognition accuracy for abnormal telephones.
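A minimal sketch of the timbre-parameter comparison is given below, assuming librosa and numpy; the concrete parameters (fundamental frequency, RMS intensity, spectral centroid), the matching tolerance, the file names, and the prompt text are illustrative assumptions rather than values specified by the embodiment.

```python
import numpy as np
import librosa

def extract_timbre_parameters(path, sr=16000):
    """Extract coarse timbre parameters: mean pitch, mean intensity, mean spectral centroid."""
    audio, _ = librosa.load(path, sr=sr, mono=True)
    f0 = librosa.yin(audio, fmin=60, fmax=400, sr=sr)            # fundamental frequency per frame
    rms = librosa.feature.rms(y=audio)[0]                         # frame-level intensity
    centroid = librosa.feature.spectral_centroid(y=audio, sr=sr)[0]
    return np.array([np.nanmean(f0), rms.mean(), centroid.mean()])

def is_ai_synthesized(timbre, target_timbre, rel_tol=0.1):
    """Apply the embodiment's decision rule: a match with the pre-stored target timbre
    parameters is judged as AI-synthesized speech. rel_tol is an assumed tolerance."""
    return bool(np.all(np.abs(timbre - target_timbre) <= rel_tol * np.abs(target_timbre)))

if __name__ == "__main__":
    target = np.load("target_timbre.npy")                 # hypothetical pre-stored target parameters
    timbre = extract_timbre_parameters("recording.wav")   # hypothetical recording file
    if is_ai_synthesized(timbre, target):
        print("Prompt: the recording is synthesized voice and should be handled carefully.")
```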
Since the system described in the embodiments of the present application is used to implement the method of the embodiments of the present application, a person skilled in the art can understand its specific structure and modifications based on the method described herein, so the description is omitted here. All systems used in the method of the embodiments of the present application fall within the scope of the present application.
The present application provides a communication device comprising at least one processor and a memory in communication connection with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the control method of the communication device in the first embodiment.
Referring now to fig. 5, a schematic diagram of a communication device suitable for implementing embodiments of the present application is shown. The communication device in the embodiments of the present application may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (Personal Digital Assistant), a tablet computer (PAD), a PMP (Portable Media Player), and a vehicle-mounted terminal (e.g., a car navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The communication device shown in fig. 5 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.
As shown in fig. 5, the communication device may include a processing device 1001 (e.g., a central processing unit, a graphics processor, etc.) that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data necessary for the operation of the communication device. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to each other by a bus 1005, and an input/output (I/O) interface 1006 is also connected to the bus. In general, the following may be connected to the I/O interface 1006: an input device 1007 such as a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, or a gyroscope; an output device 1008 including a liquid crystal display (LCD), a speaker, a vibrator, and the like; the storage device 1003 including a magnetic tape, a hard disk, and the like; and a communication means 1009. The communication means 1009 may allow the communication device to communicate with other devices wirelessly or by wire to exchange data. Although a communication device having various systems is shown in the figure, it should be understood that not all of the illustrated systems are required to be implemented or provided; more or fewer systems may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the above-described functions defined in the method of the disclosed embodiments of the present application are performed.
The communication device provided by the present application adopts the control method of the communication device in the above embodiment, and can solve the technical problem that related solutions have low accuracy in identifying abnormal telephones. Compared with the prior art, the beneficial effects of the communication device provided by the present application are the same as those of the control method of the communication device provided by the above embodiment, and its other technical features are the same as those disclosed in the method of the previous embodiment, which are not repeated herein.
It is to be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing is merely illustrative of the present application and does not limit it; any variation or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
The present application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon for executing the control method of the communication device in the above-described embodiment.
The computer-readable storage medium provided by the present application may be, for example, a USB flash drive, or, without limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any suitable combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device. Program code embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber-optic cable, RF (radio frequency), or any suitable combination of the foregoing.
The computer readable storage medium may be included in the communication device or may exist alone without being incorporated in the communication device.
The computer-readable storage medium carries one or more programs which, when executed by a communication device, cause the communication device to: collect voice data corresponding to a voice session through the voice recognition chip if establishment of the voice session is detected; generate a recording file corresponding to the voice session if a preset keyword is recognized in the voice data; upload the recording file to a cloud server through the voice gateway and receive a recognition result fed back by the cloud server; and execute a processing action corresponding to the recognition result.
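For readability, the following pseudocode-style Python sketch outlines the control flow that such a program would cause the communication device to perform; the object and function names (detect_voice_session, voice_chip.collect, cloud.recognize, execute_action, and so on) are hypothetical placeholders, not APIs defined by the present application.

```python
def run_call_guard(device, cloud, keyword_list):
    """Illustrative control loop mirroring the steps carried out by the stored program."""
    session = device.detect_voice_session()          # hypothetical: returns an active voice session or None
    if session is None:
        return

    voice_data = device.voice_chip.collect(session)  # collect voice data via the voice recognition chip
    transcript = device.voice_chip.transcribe(voice_data)  # hypothetical keyword-spotting transcript
    if not any(kw in transcript for kw in keyword_list):
        return                                       # no preset keyword recognized; nothing further to do

    recording = device.record(session)               # generate the recording file for this session
    result = cloud.recognize(recording)              # upload through the voice gateway and receive the result
    device.execute_action(result)                    # e.g. terminate the call, play a prompt, notify a contact
```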
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The readable storage medium provided by the present application is a computer-readable storage medium storing computer-readable program instructions (i.e., a computer program) for executing the control method of the communication device, and can therefore solve the technical problem that related solutions have low accuracy in identifying abnormal telephones. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the present application are the same as those of the control method of the communication device provided by the above embodiment, and are not described herein.
An embodiment of the present application provides a computer program product, including a computer program, where the computer program, when executed by a processor, implements the steps of the control method of the communication device as described above.
The computer program product provided by the application can solve the technical problem that the accuracy of the related scheme for identifying the abnormal telephone is low. Compared with the prior art, the beneficial effects of the computer program product provided by the embodiment of the present application are the same as the beneficial effects of the control method of the communication device provided by the above embodiment, and are not described herein.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit its scope; any equivalent structure or equivalent process transformation made using the contents of the present application, or any direct or indirect application thereof in other related technical fields, shall likewise fall within the protection scope of the present application.
Claims (7)
1. A control method of a communication device, characterized in that the method is applied to a communication device, wherein the communication device is provided with a voice recognition chip and is in communication connection with a cloud server through a voice gateway, and the control method of the communication device comprises the following steps:
if establishment of a voice session is detected, acquiring voice data corresponding to the voice session through the voice recognition chip;
if a keyword matching a preset keyword is identified in the voice data, determining an identification result in a decoding search space according to the keyword;
determining the accuracy of the identification result according to the keyword, and judging that the keyword matches the preset keyword when the accuracy is higher than a preset threshold;
starting recording and recording the voice session to generate a recording file;
uploading the recording file to the cloud server through the voice gateway, and receiving a recognition result fed back by the cloud server, wherein the recognition result comprises a semantic recognition result and an intention recognition result;
acquiring the semantic recognition result and the intention recognition result from the recognition result;
judging whether the semantic recognition result and the intention recognition result match a preset abnormal telephone feature library;
when the semantic recognition result and the intention recognition result match features in the abnormal telephone feature library, judging that the telephone corresponding to the recognition result is an abnormal telephone;
executing a processing action corresponding to the recognition result being an abnormal telephone;
identifying a telephone number corresponding to the voice session, determining voiceprint data in the recording file, and storing the telephone number and the voiceprint data in a blacklist library in an associated manner;
inputting the recording file into a voice-imitation recognition model, and judging whether the recording file is AI-synthesized speech according to the voice-imitation recognition model;
extracting timbre parameters of the recording file according to the voice-imitation recognition model, and comparing the timbre parameters with pre-stored target timbre parameters; and
when the timbre parameters match the target timbre parameters, judging that the recording file is AI-synthesized speech, and outputting a corresponding voice prompt.
2. The method of claim 1, wherein the step of identifying a keyword matching a preset keyword in the voice data comprises:
performing, by the voice recognition chip through a signal processing module, noise reduction and enhancement processing on the collected voice data to obtain target voice data;
detecting sentence endpoints in the target voice data based on a voice activation detection technique, and extracting corresponding voice features;
and performing forward inference according to the sentence endpoints and the voice features to obtain a posterior probability distribution, and determining the keyword in the voice data according to the posterior probability distribution.
3. The method of claim 1, wherein the step of executing a processing action corresponding to the recognition result being an abnormal telephone comprises:
receiving a call termination instruction sent by the voice gateway, so as to terminate a call process corresponding to the voice session;
or receiving a prompt instruction sent by the voice gateway, and starting a voice prompt function according to the prompt instruction;
or acquiring an associated target telephone through the voice gateway, and sending a prompt voice to the target telephone through a call.
4. The method of claim 1, wherein the method further comprises:
identifying a voiceprint of the recording file through the voice gateway, and comparing the voiceprint with a pre-stored voiceprint blacklist library;
when a target voiceprint matching the voiceprint exists in the voiceprint blacklist library, judging that a target telephone corresponding to the voiceprint is an abnormal telephone;
and executing a processing action corresponding to the judgment result being an abnormal telephone.
5. The method of claim 1, further comprising, after the step of executing a processing action corresponding to the recognition result being an abnormal telephone:
when a misjudgment instruction fed back by a user is received, determining a misjudged telephone corresponding to the misjudgment instruction;
querying the blacklist library for a target number matching the misjudged telephone and associated target voiceprint data;
and deleting the target number and the target voiceprint data from the blacklist library.
6. A communication device, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the computer program is configured to implement the steps of the control method of the communication device according to any one of claims 1 to 5.
7. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the control method of the communication device according to any one of claims 1 to 5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411796530.1A CN119276978B (en) | 2024-12-09 | 2024-12-09 | Control method for communication device, and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119276978A CN119276978A (en) | 2025-01-07 |
| CN119276978B true CN119276978B (en) | 2025-04-11 |
Family
ID=94115163
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411796530.1A Active CN119276978B (en) | 2024-12-09 | 2024-12-09 | Control method for communication device, and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119276978B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107147813A (en) * | 2017-05-25 | 2017-09-08 | 广东工业大学 | Method and device for preventing telecommunication fraud |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103458412B (en) * | 2012-06-04 | 2017-03-15 | 百度在线网络技术(北京)有限公司 | Prevent system, method and mobile terminal, the high in the clouds Analysis server of telephone fraud |
| CN106686191A (en) * | 2015-11-06 | 2017-05-17 | 北京奇虎科技有限公司 | A processing method and system for adaptively identifying harassing calls |
| CN106303058A (en) * | 2016-08-24 | 2017-01-04 | 成都中英锐达科技有限公司 | Anti-swindle audio recognition method and system |
| CN113257250A (en) * | 2021-05-11 | 2021-08-13 | 歌尔股份有限公司 | Fraud behavior detection method, device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119276978A (en) | 2025-01-07 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |