CN118072741A - Method, device and equipment for identifying AI clone sound - Google Patents
Method, device and equipment for identifying AI clone sound
- Publication number
- CN118072741A CN202410139903.1A
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- matching
- voice information
- sound
- judging
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a method, a device and equipment for identifying an AI clone sound. The method comprises the following steps: acquiring voice information to be recognized, and performing voiceprint extraction on the voice information to obtain voiceprint features to be detected; matching the voiceprint features to be detected against pre-stored original voiceprint features, and judging whether the matching is successful; if the matching is successful, judging that the voice information to be recognized is not an AI clone sound; if the matching fails, judging that the voice information to be recognized is an AI clone sound, and sending the judgment result to the user terminal. The embodiment of the invention can identify whether a segment of voice or an ongoing voice call involves AI voiceprint cloning technology. When the model's analysis indicates suspected use, the analysis result can be sent to the user, thereby preventing fraud and avoiding the risk of being defrauded. It also prevents criminals from collecting an individual user's voiceprint and using it for forgery and cloning, which would cause the user property loss.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for identifying an AI clone sound.
Background
In recent years, technology has developed rapidly, and its drawbacks have gradually emerged. A common phenomenon when watching short videos online is that an ordinary video is presented with dubbing in the voices of different characters. AI voiceprint cloning technology is widely used in everyday life and entertainment, and it has also attracted the attention of criminals.
Recently, AI voiceprint cloning technology has been used to defeat a bank's voice verification and operate the target bank account. As the barrier to using AI technology keeps falling, criminals mainly use AI speech synthesis to leave voice messages for victims or to impersonate relatives and friends in phone calls, using pretexts such as a car accident or a robbery to lure the victims into transferring money.
In the prior art, if criminals use an AI clone sound to defraud a user, the user cannot identify the AI clone sound, and property loss results.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a method, device and equipment for identifying an AI clone sound, so as to solve the technical problem in the prior art that, when criminals use an AI clone sound to defraud a user, the user cannot identify the AI clone sound and property loss results.
The technical scheme of the invention is as follows:
A method of identifying an AI clone sound, the method comprising:
acquiring voice information to be recognized, and performing voiceprint extraction on the voice information to obtain voiceprint features to be detected;
matching the voiceprint features to be detected against pre-stored original voiceprint features, and judging whether the matching is successful;
if the matching is successful, judging that the voice information to be recognized is not an AI clone sound;
if the matching fails, judging that the voice information to be recognized is an AI clone sound, and sending the judgment result to the user terminal.
Further, before the step of acquiring the voice information to be recognized and performing voiceprint extraction on the voice information to obtain the voiceprint features to be detected, the method includes:
obtaining in advance a random audio file uploaded by the user, or a special-statement recording file corresponding to a random statement, and performing voiceprint extraction on the random audio file or the special-statement recording file to obtain the original voiceprint features of the user;
storing the original voiceprint features.
Further preferably, performing voiceprint extraction on the random audio file or the special-statement recording file to obtain the original voiceprint features of the user includes:
performing noise elimination, signal enhancement and feature extraction, in that order, on the random audio file or the special-statement recording file to obtain the original voiceprint features of the user.
Further preferably, the matching the voiceprint feature to be detected with the pre-stored original voiceprint feature, and determining whether the matching is successful includes:
and calculating the similarity between the voiceprint features to be detected and the pre-stored original voiceprint features, and judging whether the matching is successful or not according to the similarity.
Preferably, the determining whether the matching is successful according to the similarity includes:
If the similarity is greater than or equal to a preset similarity threshold, judging that the matching is successful;
If the similarity is smaller than a preset similarity threshold, judging that the matching fails.
Further, if the matching fails, determining that the voice information to be recognized is an AI cloned sound, and sending a determination result to the user terminal, including:
if the matching fails, the voice information to be identified is judged to be AI cloning sound, and the judging result is sent to the user terminal through one or more of short message, communication or WeChat modes.
Further, after the step of acquiring the voice information to be recognized and performing voiceprint extraction on the voice information to obtain the voiceprint features to be detected, the method further includes:
encrypting the voice information to be identified to obtain encrypted voice information;
encrypting the voiceprint features to be detected to obtain encrypted voiceprint features.
Another embodiment of the present invention provides an apparatus for recognizing an AI clone sound, the apparatus comprising:
The voiceprint extraction module is used for acquiring voice information to be identified, and carrying out voiceprint extraction on the voice information to obtain voiceprint characteristics to be detected;
The feature matching module is used for matching the voiceprint features to be detected with the original voiceprint features stored in advance and judging whether the matching is successful or not;
the first judging module is used for judging that the voice information to be recognized is not the AI cloning sound if the matching is successful;
And the second judging module is used for judging that the voice information to be recognized is the AI cloning sound if the matching fails and sending the judging result to the user terminal.
Another embodiment of the present invention provides an apparatus for identifying AI clone sound, the apparatus including at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of identifying AI clone sound described above.
Another embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described method of identifying AI clone sound.
Beneficial effects: the embodiment of the invention can identify whether a segment of voice or an ongoing voice call involves AI voiceprint cloning technology. When the model's analysis indicates suspected use, the analysis result can be sent to the user, thereby preventing fraud and avoiding the risk of being defrauded. It also prevents criminals from collecting an individual user's voiceprint and using it for forgery and cloning, which would cause the user property loss.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a preferred embodiment of a method for identifying an AI clone sound according to the present invention;
FIG. 2 is a schematic functional block diagram of a preferred embodiment of a device for identifying an AI clone sound according to the present invention;
FIG. 3 is a schematic hardware structure diagram of a preferred embodiment of equipment for identifying an AI clone sound according to the present invention.
Detailed Description
The present invention will be described in further detail below in order to make the objects, technical solutions and effects of the present invention more clear and distinct. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Embodiments of the present invention are described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart of a preferred embodiment of a method for identifying an AI clone sound. As shown in fig. 1, it comprises the steps of:
Step S100, voice information to be recognized is obtained, and voiceprint extraction is carried out on the voice information to obtain voiceprint characteristics to be detected;
Step S200, matching the voiceprint features to be detected against the pre-stored original voiceprint features and judging whether the matching is successful; if so, executing step S300, and if not, executing step S400;
Step S300, judging that the voice information to be recognized is not an AI clone sound;
Step S400, judging that the voice information to be recognized is an AI clone sound, and sending the judgment result to the user terminal.
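Steps S100 to S400 can be sketched as a single control flow. The following is only an illustrative outline, not the patented implementation: the function name `identify_clone_voice` and the three callables `extract`, `match` and `notify` are hypothetical placeholders for the extraction, matching and notification stages, which the source does not specify in code.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def identify_clone_voice(
    audio: bytes,
    original_features: Vector,
    extract: Callable[[bytes], Vector],
    match: Callable[[Vector, Vector], bool],
    notify: Callable[[str], None],
) -> bool:
    """Return True when the audio is judged to be an AI clone sound.

    All three callables are hypothetical stand-ins supplied by the caller.
    """
    # S100: extract the voiceprint features to be detected.
    features = extract(audio)
    # S200: match against the pre-stored original voiceprint features.
    if match(features, original_features):
        # S300: matching succeeded, so the voice is judged not to be a clone.
        return False
    # S400: matching failed, so judge as clone and notify the user terminal.
    notify("Warning: the voice was judged to be an AI clone sound.")
    return True
```

Passing the stages in as callables keeps the sketch independent of any particular feature extractor or messaging channel.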
In a specific implementation, the embodiment of the invention acquires the voice information to be recognized, which is a recording, made by the user on a device with a recording function, of a special statement that is refreshed in real time. The statement may be any specific piece of text or sound feature, such as a phrase, a word, or a sound sample. Voiceprint extraction is then performed on the voice information to obtain the voiceprint features to be detected. Voiceprint feature extraction means extracting, from a speech signal, features that can represent the identity of the speaker. These features may include the sound spectrum, frequency spectrum, sound source information, and sound quality.
After the voiceprint is extracted, the background algorithm matches it against the audio recorded during registration and, from the matching result, judges whether the recorded special statement matches the registration audio. If the matching succeeds, the voice information to be recognized is judged not to be an AI clone sound and the speaker is verified as the same speaker; if the matching fails, verification fails, the voice information to be recognized is judged to be an AI clone sound, and the judgment result is sent to the user terminal.
The embodiment of the invention is used to identify whether a segment of voice or an ongoing voice call involves AI voiceprint cloning technology. When the model's analysis indicates suspected use, the analysis result can be sent to the user by short message, call, mobile push notification, and the like, so as to prevent fraud and avoid the risk of being defrauded. A secondary function is to prevent criminals from collecting an individual user's voiceprint and using it for counterfeit cloning.
Further, before acquiring the voice information to be recognized and performing voiceprint extraction on the voice information to obtain the voiceprint features to be detected, the method includes:
A random audio file uploaded by the user, or a special-statement recording file corresponding to a random statement, is obtained in advance, and voiceprint extraction is performed on the random audio file or the special-statement recording file to obtain the original voiceprint features of the user;
Storing the original voiceprint features.
In implementation, the user can upload, through the operation interface, a random audio file of their ordinary speech or a special-statement recording file corresponding to a random statement. Voiceprint extraction is performed on the random audio file or the special-statement recording file to obtain the original voiceprint features of the user, which are then stored. These original voiceprint features can be kept in the system for purposes such as notarization, forensic identification, and evidence preservation.
Further, the voiceprint extraction is performed on the random audio file or the special sentence recording file to obtain original voiceprint characteristics of the user, including:
And carrying out noise elimination, signal enhancement and feature extraction on the random audio file or the special statement recording file in sequence to obtain the original voiceprint features of the user.
In practice, after uploading the audio file, the background system needs to perform preprocessing. This includes noise cancellation, signal enhancement, feature extraction, etc. Through the steps, the original voiceprint features in the audio file can be extracted, and a data basis is provided for subsequent comparison.
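The three preprocessing stages can be illustrated with a deliberately simple pipeline. This is a toy sketch under stated assumptions: a noise gate stands in for noise elimination, peak normalisation for signal enhancement, and per-frame log energy for feature extraction. The source names none of these techniques; the function names, the `gate` value and the `frame_len` default are hypothetical, and a practical system would more likely use spectral or learned features.

```python
import math
from typing import List, Sequence


def remove_noise(samples: Sequence[float], gate: float = 0.02) -> List[float]:
    """Crude noise gate: zero out samples whose magnitude is below `gate`."""
    return [s if abs(s) >= gate else 0.0 for s in samples]


def enhance(samples: Sequence[float]) -> List[float]:
    """Peak-normalise so that the loudest sample has magnitude 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    return [s / peak for s in samples]


def extract_features(samples: Sequence[float], frame_len: int = 160) -> List[float]:
    """Log energy of each complete, non-overlapping frame (toy feature vector)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return feats


def preprocess(samples: Sequence[float], frame_len: int = 160) -> List[float]:
    """Apply the three stages in the order given in the text."""
    return extract_features(enhance(remove_noise(samples)), frame_len)
```

The order matters in this sketch: gating before normalisation keeps low-level noise from being amplified along with the speech.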
Further, matching the voiceprint feature to be detected with the original voiceprint feature stored in advance, and judging whether the matching is successful or not, including:
and calculating the similarity between the voiceprint features to be detected and the pre-stored original voiceprint features, and judging whether the matching is successful or not according to the similarity.
In practice, the similarity between the extracted voiceprint features and the pre-stored original voiceprint features is computed. This process may be implemented with a specific algorithm. Common choices include similarity calculations based on distance metrics and classifiers based on probabilistic models.
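One widely used similarity measure for feature vectors is cosine similarity. The sketch below assumes both voiceprints are represented as equal-length lists of floats; the source does not prescribe this measure or representation, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)
```

Because the measure depends only on direction, a louder or quieter recording that scales every feature by the same factor yields the same score.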
Further, judging whether the matching is successful according to the similarity comprises the following steps:
If the similarity is greater than or equal to a preset similarity threshold, judging that the matching is successful;
If the similarity is smaller than a preset similarity threshold, judging that the matching fails.
In implementation, if the similarity between the sound collected on site and the audio file uploaded by the user is greater than or equal to the preset similarity threshold, the condition is met and the two can be considered to come from the same person. Otherwise, the verification fails.
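The threshold rule can be written down directly. In this sketch the default threshold of 0.8 is an arbitrary illustrative value, not one given in the source, and the function names are hypothetical; the point is that a similarity exactly equal to the threshold counts as a successful match.

```python
def is_match(similarity: float, threshold: float = 0.8) -> bool:
    """Matching succeeds when similarity is greater than or equal to the threshold."""
    return similarity >= threshold


def judge(similarity: float, threshold: float = 0.8) -> str:
    """Map the matching outcome to the judgment described in the text."""
    if is_match(similarity, threshold):
        return "not AI clone sound"
    return "AI clone sound"
```

In practice the threshold would be tuned on real recordings, since a value that is too low accepts clones and one that is too high rejects the genuine speaker.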
Further, if the matching fails, determining that the voice information to be recognized is an AI cloned sound, and transmitting a determination result to the user terminal, including:
if the matching fails, the voice information to be identified is judged to be AI cloning sound, and the judging result is sent to the user terminal through one or more of short message, communication or WeChat modes.
In implementation, if the matching fails, the voice information to be recognized is judged to be an AI clone sound, and the analysis result can be sent to the user by short message, call, mobile push notification, and the like, so as to prevent fraud and avoid the risk of being defrauded.
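Sending the result through one or more channels can be sketched as a small dispatcher. Everything here is hypothetical: the channel names, the `Sender` signature and the function `send_result` are illustrative, and no real SMS or WeChat API is called; actual senders would be supplied by the integrating system.

```python
from typing import Callable, Dict, Iterable, List

# A sender takes (recipient, message); real implementations are supplied by the caller.
Sender = Callable[[str, str], None]


def send_result(
    recipient: str,
    message: str,
    channels: Iterable[str],
    senders: Dict[str, Sender],
) -> List[str]:
    """Send the message on every requested channel that has a registered sender.

    Returns the names of the channels actually used, in request order.
    """
    delivered = []
    for name in channels:
        sender = senders.get(name)
        if sender is None:
            continue  # channel not configured in this deployment; skip it
        sender(recipient, message)
        delivered.append(name)
    return delivered
```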
Further, obtaining voice information to be recognized, performing voiceprint extraction on the voice information, and after obtaining voiceprint characteristics to be detected, further comprising:
encrypting the voice information to be identified to obtain encrypted voice information;
encrypting the voiceprint features to be detected to obtain encrypted voiceprint features.
In practice, after the voiceprint features of a speaker are extracted, they may be encrypted with an encryption algorithm. Common encryption algorithms include symmetric algorithms (e.g., AES) and asymmetric algorithms (e.g., RSA). The encrypted voiceprint features are stored in a secure location, such as the user's device or a cloud server.
In addition to the voiceprint features, the voice signal itself may also be encrypted. This may be implemented with an audio encryption algorithm, such as a frequency-domain or time-domain encryption algorithm. The purpose is to keep criminals from stealing and cloning the voice signal.
When the encrypted voice signal needs to be played, it must first be decrypted with the corresponding decryption algorithm. The decrypted voice signal can then be played on a playback device (such as a mobile phone or a computer). Likewise, the voiceprint features must be decrypted with the corresponding decryption algorithm before voiceprint recognition and comparison.
Voiceprint recognition and comparison: after the voiceprint features are decrypted, they can be recognized and compared using an AI voiceprint recognition algorithm. The comparison aims to confirm the identity of the speaker and prevent criminals from disguising themselves and impersonating others.
The application scenarios of the embodiment of the invention are as follows:
Telecom fraud protection: voiceprint recognition technology can verify the identity of a caller, ensuring that the caller is the real speaker rather than a cloned voice, and preventing fraud carried out with robot voices or cloned voices of acquaintances.
Financial services: today's financial service scenarios require identity verification, or the recording of a special voice sample, to confirm the real identity of the customer and keep funds secure. The identification technique can be used to verify that the customer transacting business is who they claim to be, ensuring that only the authorized person can pass verification.
Access control systems: certain important sites, such as government agencies, restricted military areas, vaults, and confidential units, must ensure that only authorized personnel can enter. The technology can be used in an access control system that opens only for a verified voice and that triggers an alarm when an abnormal sound is detected.
Smart home: smart home devices must ensure that only authorized persons can operate them. The technology can verify the identity of the user and control the operation of smart home devices accordingly.
The invention also provides specific application embodiment I of the method for identifying an AI clone sound, as follows:
The user speaks on their own initiative to verify the speaker's identity. This applies to various verification systems, such as access control and unlocking a privacy app: the user records, on a device with a recording function, a special statement that is refreshed in real time; the background algorithm compares it with the audio recorded at registration and verifies by voiceprint whether it is the same speaker.
The verification procedure is as follows:
The user records a special statement: the user uses a device with a recording function to record a special statement that is refreshed in real time. The statement may be any specific piece of text or sound feature, such as a phrase, a word, or a sound sample.
Voiceprint extraction: the background algorithm performs voiceprint extraction on the recorded special statement. Voiceprint extraction means extracting, from a speech signal, features that can represent the identity of the speaker. These features may include the sound spectrum, frequency spectrum, sound source information, and sound quality.
Voiceprint matching: after the voiceprint is extracted, the background algorithm matches it against the audio recorded at registration. This is typically done by calculating the similarity between the two voiceprints. The similarity may be computed with different algorithms and features, for example distance metrics or probabilistic models.
Verifying speaker identity: from the matching result, the background algorithm judges whether the recorded special statement matches the registration audio. If the matching succeeds, the speaker is verified as the same speaker; if the matching fails, the verification fails.
The invention also provides specific application embodiment II of the method for identifying an AI clone sound, as follows:
The user uploads an audio file on their own initiative and the result is checked. This applies to notarization, forensic identification, evidence preservation, and the like: the user can upload an audio file of their ordinary speech or a special-statement recording file, and by comparing it with sound collected on site, it is verified whether the speaker on site is the same person as the other speaker under investigation.
The verification procedure is as follows:
The user uploads an audio file: the user can upload, through the operation interface, an audio file of their ordinary speech or a special-statement recording file. These files may be stored in the system for notarization, forensic identification, evidence preservation, and the like.
Audio file preprocessing: after the audio file is uploaded, the background system preprocesses it. This includes noise elimination, signal enhancement, feature extraction, and so on. These steps extract the voiceprint features from the audio file and provide the data basis for the subsequent comparison.
On-site sound collection: when an event occurs, the sound on site is collected and recorded. This can be done with specialized recording equipment or with tools such as mobile phones. The collected sound is converted into a digital signal, and the corresponding feature extraction is performed.
Sound comparison: the extracted voiceprint features are compared with the features of the sound collected on site. This process may be implemented with a specific algorithm. Common choices include similarity calculations based on distance metrics and classifiers based on probabilistic models.
Result output: the background system outputs a verification result according to the comparison. If the sound collected on site is sufficiently similar to the audio file uploaded by the user, the two can be considered to come from the same person. Otherwise, the verification fails.
Recording and reporting: all comparison processes and results are recorded, and corresponding reports are generated. These reports may be used for notarization, forensic identification, evidence preservation, and the like.
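The recording and reporting step could, for example, serialize each comparison as a JSON record. This is a minimal sketch with an assumed schema: the field names, the `sample_id` parameter and the function `build_report` are all hypothetical, since the source does not define a report format.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_report(
    sample_id: str,
    similarity: float,
    threshold: float,
    when: Optional[datetime] = None,
) -> str:
    """Return one comparison record as a JSON string (hypothetical schema)."""
    when = when or datetime.now(timezone.utc)
    record = {
        "sample_id": sample_id,
        "compared_at": when.isoformat(),
        "similarity": round(similarity, 4),
        "threshold": threshold,
        # Same rule as the result output step: at or above threshold means same person.
        "result": "same speaker" if similarity >= threshold else "verification failed",
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True)
```

Storing the threshold alongside the score lets a later reviewer see why a given comparison passed or failed.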
The invention also provides specific application embodiment III of the method for identifying an AI clone sound, as follows:
The technique is embedded in a chat tool or communication device. When the user becomes suspicious of the other party in a voice call, the user can authorize or enable the function on the mobile phone, and identification is performed online in real time to determine whether the other party is a fraudster.
The verification procedure is as follows:
Technical integration: AI voiceprint recognition technology is integrated into the chat tool or communication device. This may be achieved through software development or application integration. After integration, the user can turn on the AI voiceprint recognition function in the settings of the chat tool or communication device.
Sample collection: with the user's authorization, voice samples of the user are collected. This may be done by having the user record a specific sound sample, or by collecting automatically while the user is in a voice call. The collected voice samples are converted into digital signals and feature extraction is performed.
Voiceprint model training: an AI voiceprint recognition model is trained on the collected voice samples. The goal of the model is to learn and recognize the user's voiceprint features for subsequent comparison.
Online comparison: when the user becomes suspicious, the AI voiceprint recognition function can be turned on in the chat tool or communication device. By comparing the voice of the other party with the user's voiceprint model, it can be judged whether the other party matches the voice recognized by the user.
Result output: the chat tool or communication device outputs a corresponding prompt according to the comparison. If the comparison succeeds, it prompts that the voice of the other party matches the user's voiceprint model; otherwise, it prompts that the voice of the other party does not match the user's voiceprint model and advises the user to proceed with caution.
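The source does not say how the voiceprint model is trained. As a much simpler stand-in, the sketch below builds an enrolled voiceprint by averaging the feature vectors of several collected samples; this averaging approach and the function name `enroll` are assumptions made for illustration and are not the trained AI model the text refers to.

```python
from typing import List, Sequence


def enroll(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average several equal-length feature vectors into one enrolled voiceprint."""
    if not samples:
        raise ValueError("at least one sample is required")
    dim = len(samples[0])
    if any(len(s) != dim for s in samples):
        raise ValueError("all samples must have the same length")
    count = len(samples)
    # Element-wise mean across all collected samples.
    return [sum(s[i] for s in samples) / count for i in range(dim)]
```

Averaging over several recordings reduces the influence of any single noisy sample on the stored voiceprint.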
The invention also provides specific application embodiment IV of the method for identifying an AI clone sound, as follows:
The technology is applied in reverse to protect the speaker: a reverse algorithm runs on the user's communication device or tool to encrypt the speaker's voice and voiceprint so that criminals cannot clone them.
The procedure is as follows:
Voiceprint feature extraction: first, voiceprint features are extracted from the speaker's speech signal. These features may include spectral features, cepstral features, linear prediction coefficients, and the like. The extracted voiceprint features serve as the basis for encryption.
Voiceprint encryption: after the voiceprint features of the speaker are extracted, they may be encrypted with an encryption algorithm. Common encryption algorithms include symmetric algorithms (e.g., AES) and asymmetric algorithms (e.g., RSA). The encrypted voiceprint features are stored in a secure location, such as the user's device or a cloud server.
Voice signal encryption: in addition to the voiceprint features, the voice signal itself may also be encrypted. This may be implemented with an audio encryption algorithm, such as a frequency-domain or time-domain encryption algorithm. The purpose is to keep criminals from stealing and cloning the voice signal.
Decryption and playback: when the encrypted voice signal needs to be played, it must first be decrypted with the corresponding decryption algorithm. The decrypted voice signal can then be played on a playback device (such as a mobile phone or a computer). Likewise, the voiceprint features must be decrypted with the corresponding decryption algorithm before voiceprint recognition and comparison.
Voiceprint recognition and comparison: after the voiceprint features are decrypted, they can be recognized and compared using an AI voiceprint recognition algorithm. The comparison aims to confirm the identity of the speaker and prevent criminals from disguising themselves and impersonating others.
It should be noted that the steps above do not necessarily have a fixed order. Those skilled in the art will understand that, in different embodiments, the steps may be performed in different orders, for example in parallel or with their order exchanged.
Another embodiment of the present invention provides an apparatus for recognizing an AI clone sound, as shown in fig. 2, the apparatus 1 includes:
The voiceprint extraction module 11 is configured to obtain voice information to be identified, and perform voiceprint extraction on the voice information to obtain voiceprint features to be detected;
the feature matching module 12 is configured to match the voiceprint feature to be detected with a pre-stored original voiceprint feature, and determine whether the matching is successful;
a first determining module 13, configured to determine that the voice information to be recognized is not an AI clone sound if the matching is successful;
And a second determining module 14, configured to determine that the voice information to be recognized is an AI clone sound if the matching fails, and send the determination result to the user terminal.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the device further comprises an original voiceprint feature extraction and storage module, and the original voiceprint feature extraction and storage module is used for:
A random audio file or a special statement recording file corresponding to a random statement uploaded by a user is obtained in advance, and voiceprint extraction is carried out on the random audio file or the special statement recording file to obtain original voiceprint characteristics of the user;
Storing the original voiceprint features.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the voiceprint extraction module 11 is specifically configured to:
And carrying out noise elimination, signal enhancement and feature extraction on the random audio file or the special statement recording file in sequence to obtain the original voiceprint features of the user.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the feature matching module 12 is specifically configured to:
and calculating the similarity between the voiceprint features to be detected and the pre-stored original voiceprint features, and judging whether the matching is successful or not according to the similarity.
The specific implementation is shown in the method embodiment, and will not be described herein.
Further, the feature matching module 12 is further configured to:
if the similarity is greater than or equal to a preset similarity threshold, determine that the matching is successful;
if the similarity is less than the preset similarity threshold, determine that the matching fails.
The specific implementation is shown in the method embodiment, and will not be described herein.
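The similarity comparison and threshold rule can be sketched as follows. The text does not name a similarity measure, so cosine similarity is used here as an assumption, and the default threshold of 0.8 is an arbitrary illustrative value.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_match(detected: Sequence[float], original: Sequence[float],
             threshold: float = 0.8) -> bool:
    """Matching succeeds when similarity >= threshold, and fails otherwise."""
    return cosine_similarity(detected, original) >= threshold
```

Note that the comparison is inclusive: a similarity exactly equal to the threshold counts as a successful match, as stated above.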
Further, the second determination module 14 is further configured to:
if the matching fails, determine that the voice information to be identified is an AI clone sound, and send the determination result to the user terminal through one or more of short message, communication, or WeChat.
The specific implementation is shown in the method embodiment, and will not be described herein.
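Sending the result over one or more channels can be sketched as a simple dispatch loop. The channel names and the sender callables are hypothetical placeholders; no real SMS or WeChat API is called, and the embodiment does not describe the delivery mechanism.

```python
from typing import Callable, Dict, Iterable, List


def send_result(message: str,
                channels: Iterable[str],
                senders: Dict[str, Callable[[str], None]]) -> List[str]:
    """Send `message` over each requested channel that has a registered sender.

    Returns the names of the channels that were actually used.
    """
    delivered = []
    for name in channels:
        sender = senders.get(name)
        if sender is None:
            continue  # no sender registered for this channel; skip it
        sender(message)
        delivered.append(name)
    return delivered


# Toy senders that record messages instead of contacting a real service.
outbox: Dict[str, List[str]] = {"sms": [], "wechat": []}
senders = {"sms": outbox["sms"].append, "wechat": outbox["wechat"].append}
used = send_result("Suspected AI clone sound detected",
                   ["sms", "wechat", "fax"], senders)
```

Unknown channels are skipped silently in this sketch; a production system would more likely log or report them.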
Further, the device also comprises an encryption module, wherein the encryption module is used for:
encrypting the voice information to be identified to obtain encrypted voice information;
encrypting the voiceprint features to be detected to obtain encrypted voiceprint features.
The specific implementation is shown in the method embodiment, and will not be described herein.
Another embodiment of the present invention provides a device for identifying an AI clone sound. As shown in fig. 3, the device 10 includes:
one or more processors 110 and a memory 120. Fig. 3 takes one processor 110 as an example; the processor 110 and the memory 120 may be connected via a bus or by other means, and fig. 3 takes a bus connection as an example.
The processor 110 carries out the control logic of the device 10. It may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a single-chip microcomputer, an ARM (Acorn RISC Machine) processor or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. The processor 110 may also be any conventional processor, microprocessor, or state machine, and may be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The memory 120 is used as a non-volatile computer readable storage medium for storing a non-volatile software program, a non-volatile computer executable program, and modules, such as program instructions corresponding to the method for identifying AI clone sound in the embodiment of the present invention. The processor 110 performs various functional applications of the device 10 and data processing, i.e., implements the method of recognizing AI clone sound in the above-described method embodiment, by running nonvolatile software programs, instructions, and units stored in the memory 120.
The memory 120 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created from the use of the device 10, etc. In addition, the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 120 may optionally include memory located remotely from the processor 110, which may be connected to the device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120 that, when executed by the one or more processors 110, perform the method of identifying AI clone sound in any of the method embodiments described above, e.g., perform method steps S100-S400 in fig. 1 described above.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, e.g., to perform the method steps S100-S400 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The disclosed memory components or memories of the operating environments described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of identifying AI clone sound of the above-described method embodiment. For example, the above-described method steps S100 to S400 in fig. 1 are performed.
The embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may exist in a computer-readable storage medium such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the respective embodiments or some parts of the embodiments.
Conditional language such as "can," "could," "might," or "may," among others, is generally intended to convey that particular embodiments can include, while other embodiments do not include, particular features, elements, and/or operations, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is generally not intended to imply that features, elements, and/or operations are in any way required for one or more embodiments, or that one or more embodiments must include logic for deciding, with or without input or prompting, whether these features, elements, and/or operations are included or are to be performed in any particular embodiment.
What has been described in this specification and the drawings includes examples of methods and devices capable of identifying AI clone sounds. It is, of course, not possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the present disclosure, but it may be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications may be made thereto without departing from the scope or spirit of the disclosure. Further, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings, and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and drawings be considered in all respects as illustrative and not limiting. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (10)
1. A method of identifying AI clone voices, the method comprising:
acquiring voice information to be identified, and performing voiceprint extraction on the voice information to obtain voiceprint characteristics to be detected;
matching the voiceprint features to be detected with the pre-stored original voiceprint features, and judging whether the matching is successful or not;
if the matching is successful, judging that the voice information to be recognized is not the AI cloning sound;
if the matching fails, the voice information to be recognized is judged to be AI cloning sound, and the judging result is sent to the user terminal.
2. The method for identifying an AI clone sound according to claim 1, wherein before the acquiring of the voice information to be identified and the performing of voiceprint extraction on the voice information to obtain the voiceprint features to be detected, the method further comprises:
acquiring in advance a random audio file uploaded by a user, or a special statement recording file corresponding to a random statement, and performing voiceprint extraction on the random audio file or the special statement recording file to obtain the original voiceprint features of the user;
storing the original voiceprint features.
3. The method for identifying an AI clone sound according to claim 2, wherein the voiceprint extraction of the random audio file or the special sentence recording file to obtain an original voiceprint feature of a user includes:
carrying out noise elimination, signal enhancement, and feature extraction, in that order, on the random audio file or the special statement recording file to obtain the original voiceprint features of the user.
4. The method for identifying an AI clone sound according to claim 3, wherein the matching the voiceprint feature to be detected with a pre-stored original voiceprint feature, determining whether the matching is successful, includes:
and calculating the similarity between the voiceprint features to be detected and the pre-stored original voiceprint features, and judging whether the matching is successful or not according to the similarity.
5. The method of claim 4, wherein said determining whether the match is successful based on the similarity comprises:
if the similarity is greater than or equal to a preset similarity threshold, judging that the matching is successful;
if the similarity is smaller than the preset similarity threshold, judging that the matching fails.
6. The method for recognizing an AI-cloned sound according to claim 5, wherein if the matching fails, determining that the voice information to be recognized is an AI-cloned sound, and transmitting the determination result to the user terminal, comprises:
if the matching fails, the voice information to be identified is judged to be AI cloning sound, and the judging result is sent to the user terminal through one or more of short message, communication or WeChat modes.
7. The method for identifying an AI clone sound according to claim 6, wherein after the acquiring of the voice information to be identified and the performing of voiceprint extraction on the voice information to obtain the voiceprint features to be detected, the method further comprises:
encrypting the voice information to be identified to obtain encrypted voice information;
encrypting the voiceprint features to be detected to obtain encrypted voiceprint features.
8. An apparatus for identifying AI clone voices, the apparatus comprising:
a voiceprint extraction module, configured to acquire voice information to be identified and perform voiceprint extraction on the voice information to obtain voiceprint features to be detected;
a feature matching module, configured to match the voiceprint features to be detected against pre-stored original voiceprint features and determine whether the matching is successful;
a first determining module, configured to determine that the voice information to be identified is not an AI clone sound if the matching is successful; and
a second determining module, configured to determine that the voice information to be identified is an AI clone sound if the matching fails, and to send the determination result to the user terminal.
9. An apparatus for identifying AI clone sounds, said apparatus comprising at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of identifying AI clone sound of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the method of identifying AI clone sound of any of claims 1-7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410139903.1A CN118072741A (en) | 2024-01-31 | 2024-01-31 | Method, device and equipment for identifying AI clone sound |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410139903.1A CN118072741A (en) | 2024-01-31 | 2024-01-31 | Method, device and equipment for identifying AI clone sound |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118072741A (en) | 2024-05-24 |
Family
ID=91106693
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410139903.1A Pending CN118072741A (en) | 2024-01-31 | 2024-01-31 | Method, device and equipment for identifying AI clone sound |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118072741A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119935453A (en) * | 2025-01-15 | 2025-05-06 | 武汉理工大学 | A detection device for a check valve at the bottom of a water channel |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Ali et al. | Edge-centric multimodal authentication system using encrypted biometric templates | |
| US20230326462A1 (en) | Speaker recognition in the call center | |
| US11468901B2 (en) | End-to-end speaker recognition using deep neural network | |
| US20180146370A1 (en) | Method and apparatus for secured authentication using voice biometrics and watermarking | |
| EP1902442B1 (en) | Selective security masking within recorded speech utilizing speech recognition techniques | |
| US8396711B2 (en) | Voice authentication system and method | |
| EP3373176B1 (en) | Tamper-resistant element for use in speaker recognition | |
| KR102250460B1 (en) | Methods, devices and systems for building user glottal models | |
| US9728191B2 (en) | Speaker verification methods and apparatus | |
| US9237152B2 (en) | Systems and methods for secure and efficient enrollment into a federation which utilizes a biometric repository | |
| US20210089635A1 (en) | Biometric identity verification and protection software solution | |
| US6292782B1 (en) | Speech recognition and verification system enabling authorized data transmission over networked computer systems | |
| US8812319B2 (en) | Dynamic pass phrase security system (DPSS) | |
| CN117807576A (en) | Power communication network data management method and device, electronic equipment and storage medium | |
| CN102377741A (en) | Network security verification method combined with speaker voice identity verification and account password protection during Internet payment | |
| Revathi et al. | Person authentication using speech as a biometric against play back attacks | |
| CN107147499A (en) | The method and system verified using phonetic entry | |
| CN118072741A (en) | Method, device and equipment for identifying AI clone sound | |
| KR20140029990A (en) | System and method for authetificate the user using biometrics | |
| CN119182533B (en) | Security management method and system based on communication networking | |
| CN119207430A (en) | Voice anti-counterfeiting verification method, device, equipment and medium | |
| CN119992701A (en) | Method and system for remote authorization of intelligent access control applied to banking system | |
| CN118250377A (en) | Voiceprint recognition-based voice anti-fraud method, voiceprint recognition-based voice anti-fraud equipment, storage medium and product | |
| CN118474256A (en) | Identity authentication method, system, equipment, medium and product based on voiceprint recognition | |
| TR2021019581A2 (en) | AN AI-SUPPORTED AUTHENTICATION SYSTEM |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||