WO2020151317A1 - Voice verification method and apparatus, computer device and storage medium - Google Patents

Voice verification method and apparatus, computer device and storage medium

Info

Publication number
WO2020151317A1
WO2020151317A1 (application PCT/CN2019/117613, also filed as CN2019117613W)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
verification
information
preset
voice information
Prior art date
Application number
PCT/CN2019/117613
Other languages
English (en)
Chinese (zh)
Inventor
黎立桂
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020151317A1 publication Critical patent/WO2020151317A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 - Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 - User authentication
    • G06F21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/04 - Training, enrolment or model building
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/18 - Artificial neural networks; Connectionist approaches
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/22 - Interactive procedures; Man-machine interfaces
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • This application relates to the technical field of security verification, in particular to a voice verification method, device, computer equipment and storage medium.
  • A traditional voice verification system directly calls the user's client after receiving a verification request and broadcasts the verification information to the user by voice. After listening to the broadcast, the user returns to the client and fills in the verification information to complete verification.
  • In such voice verification the user must listen to the verification information, write it down, and then switch back to the client to enter it, which is too cumbersome; moreover, verification information broadcast by voice generally supports only digits.
  • The content of the information is therefore limited and carries a certain risk of leakage. In short, traditional voice verification systems suffer from cumbersome operation and a high risk of leakage.
  • To address this, a verification method based on speech recognition was derived.
  • In this method, the user produces voice content according to dynamic verification information; a speech recognition algorithm in the background parses the user's audio and compares it with the dynamic verification information to check its accuracy.
  • The main function of this technique is to recognize the semantic content of the user's speech, replacing the original manual entry of the verification information and simplifying the verification steps.
  • However, the effectiveness of speech-recognition verification rests on the assumption that the speaker is genuine: it cannot recognize whether the current voice content was produced by a real human or by an intelligent AI. Once the scheme is deciphered, an AI can imitate a human voice reading the verification information, so the security of the verification cannot be guaranteed.
  • The embodiments of the present application therefore provide a voice verification method, device, computer equipment, and storage medium that effectively guarantee the authenticity of the verified user and improve the security of the verification system.
  • A technical solution adopted in the embodiments of this application is a voice verification method that includes the following steps:
  • The step of judging whether the voice content is of a preset sound category includes: parsing the verification voice information to obtain characteristic data, where the characteristic data are the time-domain data and spectrum data obtained by processing the voice information; inputting the characteristic data into a preset human-voice judgment model, where the human-voice judgment model is a neural network model trained to convergence that judges, from the input characteristic data, whether the voice information is a human voice; and determining, from the output of the human-voice judgment model, whether the voice content is of the preset sound category.
  • an embodiment of the present application also provides a voice verification device, including:
  • The obtaining module is used to obtain verification voice information, where the verification voice information is the voice content collected by the target terminal while the verification user reads the verification information aloud. The processing module is used to determine, from the verification voice information, whether the voice content is of a preset sound category, where the preset sound category is the category characterizing the voice content as a human voice. The execution module is used to determine that voice verification fails when the voice content does not belong to the preset sound category. The first parsing submodule parses the verification voice information to obtain characteristic data, i.e. the time-domain data and spectrum data obtained by processing the voice information. The first input submodule inputs the characteristic data into a preset human-voice judgment model, a neural network model trained to convergence that judges, from the input characteristic data, whether the voice information is a human voice. The first processing submodule determines, from the output of the human-voice judgment model, whether the voice content is of the preset sound category.
  • an embodiment of the present application further provides a computer device including a memory and a processor.
  • the memory stores computer-readable instructions.
  • the processor executes the steps of the voice verification method described above.
  • The embodiments of the present application also provide a storage medium storing computer-readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of the voice verification method described above.
  • The beneficial effect of the embodiments of the present application, compared with the prior art, is that the technical solution focuses on mining the user's biological voice features, which distinguish a simulated human voice from a real one, so real users can be identified effectively on this basis.
  • Malicious actors such as machines, AIs, and crawlers can thus be effectively excluded, preventing them from attacking websites and platforms, guaranteeing the validity and authenticity of verified users, and improving the security of voice verification.
  • Figure 1 is a schematic diagram of the basic flow of a voice verification method according to an embodiment of this application.
  • FIG. 2 is a schematic diagram of the flow of obtaining verification information according to an embodiment of the application.
  • FIG. 3 is a block diagram of the basic structure of a voice verification device according to an embodiment of the application.
  • Fig. 4 is a block diagram of the basic structure of a computer device according to an embodiment of the application.
  • The terms "terminal" and "terminal device" used herein include devices that are portable, transportable, installed in a vehicle, or otherwise suitable and/or configured to operate locally and/or in distributed form at any location on earth and/or in space.
  • The "terminals" and "terminal devices" used here may also be communication terminals, internet terminals, music/video playback terminals, and other such devices.
  • FIG. 1 is a schematic diagram of the basic flow of the voice verification method in this embodiment.
  • a voice verification method includes the following steps:
  • S1100: Obtain verification voice information, where the verification voice information is the voice content collected by the target terminal while the verification user reads the verification information aloud.
  • After the verified user requests verification, the server receives the verification request from the terminal, sends verification information to the terminal, triggers a prompt instruction to guide the user through voice verification, and collects the verification voice entered by the user.
  • The verification information may be one or more randomly generated words, or a random word or combination of words retrieved from a preset verification information database.
  • After receiving the verification information, the terminal displays it on the screen and issues a reminder at the same time. The reminder can be a specific voice broadcast or a specific guiding sentence, such as "Please read the verification information on the screen", after which voice collection starts.
  • The end point of collection is determined from the level of the user's voice: for example, when no sound is detected for longer than a preset time (such as 1 second, though not limited to this), sound collection is judged to be over, and the collected sound is used as the verification voice information.
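The end-of-collection rule above (stop once no sound is detected for longer than a preset interval) can be sketched with a simple frame-energy threshold. The frame size, energy threshold, and 1-second window below are illustrative choices, not values from the patent:

```python
import numpy as np

def capture_until_silence(frames, frame_ms=20, silence_ms=1000, energy_thresh=0.01):
    """Collect audio frames until the energy stays below a threshold for
    `silence_ms` milliseconds (a stand-in for the patent's 1-second rule).
    `frames` is an iterable of 1-D numpy arrays, one per `frame_ms` frame."""
    collected = []
    silent_run = 0
    max_silent = silence_ms // frame_ms
    for frame in frames:
        collected.append(frame)
        rms = np.sqrt(np.mean(frame ** 2))   # root-mean-square frame energy
        silent_run = silent_run + 1 if rms < energy_thresh else 0
        if silent_run >= max_silent:
            # Trailing silence is trimmed before the voice is returned.
            return np.concatenate(collected[:-silent_run])
    return np.concatenate(collected) if collected else np.array([])
```

In a real deployment the threshold would be calibrated against ambient noise rather than fixed.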
  • S1200: Determine, from the verification voice information, whether the voice content is of a preset sound category, where the preset sound category is the category characterizing the voice content as a human voice.
  • The verification voice information is parsed to obtain the corresponding characteristic data.
  • The characteristic data include, but are not limited to, the time-domain data and spectrum data of the verification voice.
  • The human-voice judgment model is a neural network model trained to convergence that judges, from the input characteristic data, whether the voice information is a human voice; whether the verification voice information is a human voice is then determined from the model's output.
  • The preset sound category is the human-voice class: when the judged sound category is the human-voice class, the voice content is a human voice.
  • During training, human-voice feature data are used as positive samples,
  • while non-human-voice feature data, such as voices synthesized by speech-synthesis technology, animal sounds, and noise, are used as negative samples.
  • In one embodiment, the 7x7 convolutions of an Inception-v3 neural network are decomposed into two one-dimensional convolutions (1x7 and 7x1), and the 3x3 convolutions are likewise decomposed into two one-dimensional convolutions (1x3 and 3x1), and the Inception-v3 model is trained in this form.
  • The human-voice judgment model can be set to only two classes, human voice and non-human voice, or to more than two classes, such as human voice, synthetic voice, animal voice, and noise, but is not limited to this; the class settings can be adjusted to suit the actual application scenario.
  • When the output sound category is human voice,
  • it is determined that the voice content belongs to the preset sound category;
  • otherwise, it is determined that the voice content does not belong to the preset sound category.
  • In the latter case, the current verification user is judged to be an abnormal user, and the voice verification fails.
  • Step S1200 specifically includes the following steps:
  • Step a: Parse the verification voice information to obtain characteristic data, where the characteristic data are the time-domain data and spectrum data obtained by processing the voice information.
  • The obtained verification voice information is parsed into original time-domain data: anti-aliasing filtering, sampling, and A/D conversion digitize the original voice data; pre-emphasis then boosts the high-frequency part and filters out unimportant components; the beginning and end of the speech signal are located; and windowing and framing are performed.
  • A short-time Fourier transform then converts the processed time-domain data into a frequency-domain signal.
  • A Mel-spectrum transformation converts the frequencies into the linear relationship perceived by the human ear.
  • A DCT is used to separate the DC signal component from the sinusoidal signal components, and the sound-spectrum features are extracted as the spectrum data; the time-domain data and the spectrum data together serve as the characteristic data of the verification voice information.
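The parsing chain just described (pre-emphasis, overlapping frames, windowing, short-time Fourier transform, Mel transformation, DCT) can be sketched in NumPy as follows. All parameter values here (16 kHz rate, 25 ms frames with 10 ms hop, 26 Mel bands, 13 coefficients) are common illustrative choices, not values taken from the patent:

```python
import numpy as np

def extract_features(signal, sr=16000, frame_len=400, hop=160,
                     n_fft=512, n_mels=26, n_ceps=13):
    """MFCC-style sketch of the patent's feature pipeline."""
    # 1. Pre-emphasis boosts the high-frequency part of the signal.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Short-time Fourier transform -> power spectrum per frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Triangular Mel filterbank (filters spaced linearly on the Mel scale).
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 5. DCT-II separates the slowly varying (DC-like) component from the
    #    oscillating components of the log-Mel spectrum.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    ceps = log_mel @ dct.T
    return frames, ceps   # time-domain frames plus spectral features
```

Both outputs together correspond to the "characteristic data" the text feeds into the human-voice judgment model.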
  • Step b: Input the characteristic data into a preset human-voice judgment model, where the human-voice judgment model is a neural network model trained to convergence that judges, from the input characteristic data, whether the voice information is a human voice.
  • Human-voice feature data are used as positive samples,
  • while non-human-voice feature data, such as voices synthesized by speech-synthesis technology, animal sounds, and noise, are used as negative samples to train the neural network model.
  • The neural network model used in this embodiment may be a CNN convolutional neural network model, a VGG convolutional neural network model, or an Inception-v3 neural network model, but is not limited to these.
  • In one embodiment, the 7x7 convolutions of the Inception-v3 network are decomposed into two one-dimensional convolutions (1x7 and 7x1), and the 3x3 convolutions are likewise decomposed into two one-dimensional convolutions (1x3 and 3x1), and the Inception-v3 model is trained in this form.
  • The human-voice judgment model can be set to only two classes, human voice and non-human voice, or to more than two classes, such as human voice, synthetic voice, animal voice, and noise, but is not limited to this; the class settings can be adjusted to suit the actual application scenario.
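The motivation for the convolution factorization mentioned above is parameter savings: an nxn convolution costs n² weights per input/output channel pair, while a 1xn followed by an nx1 costs only 2n. A quick check for the 7x7 case (the channel count of 64 is chosen arbitrarily for illustration; the patent does not specify one):

```python
def conv_params(kh, kw, c_in, c_out):
    """Weight count of a 2-D convolution layer (bias ignored)."""
    return kh * kw * c_in * c_out

c = 64
full = conv_params(7, 7, c, c)                                 # one 7x7 layer
factored = conv_params(1, 7, c, c) + conv_params(7, 1, c, c)   # 1x7 then 7x1
print(full, factored, factored / full)   # the factored pair needs 2/7 of the weights
```

The same 2/n ratio holds for the 3x3 case (2/3 of the weights), which is why Inception-v3 applies the trick throughout.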
  • the feature data is input into the human voice judgment model, and then the output result of the human voice judgment model is obtained.
  • Step c: Determine, from the output of the human-voice judgment model, whether the voice content is of the preset sound category.
  • The preset sound category can be the human-voice class.
  • The human-voice class characterizes the voice content as a human voice. After the output of the human-voice judgment model is obtained, whether the voice content belongs to the human-voice class is determined from that output.
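Mapping the model's raw outputs to this pass/fail decision might look like the following sketch. The four-class layout mirrors the example classes mentioned in the text (human, synthetic, animal, noise), but the class ordering and the confidence floor are assumptions for illustration, not specified by the patent:

```python
import numpy as np

# Hypothetical class order; only index 0 counts as the preset "human voice" category.
CLASSES = ["human", "synthetic", "animal", "noise"]

def is_human_voice(logits, human_index=0, min_prob=0.5):
    """Softmax the model's raw outputs and accept only when the argmax is the
    human class and its probability clears a confidence floor."""
    exp = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs = exp / exp.sum()
    return bool(int(np.argmax(probs)) == human_index and probs[human_index] >= min_prob)

print(is_human_voice(np.array([3.2, 0.1, -1.0, 0.4])))  # confidently human -> True
print(is_human_voice(np.array([0.1, 2.9, -0.5, 0.2])))  # synthetic wins -> False
```

A rejected sample would then trigger the "abnormal user" branch and fail the verification.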
  • Using the human-voice judgment model to judge the verification voice makes it possible to determine quickly and accurately whether the verification voice is a human voice.
  • An abnormal verification voice can thus be detected in time as soon as it is obtained, and an attempt by an abnormal user can be intercepted according to the classification result of the verification voice.
  • Step a includes the following steps:
  • Step a1: Process the verification voice information according to a preset first processing rule to obtain time-domain data, where the first processing rule is a voice-information processing rule that parses the voice information into time-domain data and boosts its high-frequency part.
  • The obtained verification voice information is parsed into original time-domain data: anti-aliasing filtering, sampling, and A/D conversion digitize the original voice data; pre-emphasis then boosts the high-frequency part and filters out unimportant components. This also removes effects introduced by the vocal cords and lips during vocalization, compensating for the high-frequency part of the voice signal suppressed by the articulatory system and highlighting the high-frequency formants.
  • Step a2: Process the time-domain data according to a preset second processing rule to obtain the sound spectrum, where the second processing rule is a data processing rule that converts time-domain data into spectrum data via the Fourier transform.
  • The Fourier transform requires the input signal to be stationary.
  • A voice signal is not stationary at the macroscopic level but is stationary at the microscopic level: it exhibits short-term stationarity (a voice signal can be considered approximately unchanged within 10-30 ms).
  • The speech signal can therefore be divided into short segments, called frames, for processing. Because the subsequent operation applies a window, adjacent frames are cut so that they partially overlap; each frame is then multiplied by a preset window function, so that the originally aperiodic speech signal exhibits some characteristics of a periodic function, and each frame is Fourier transformed to obtain its frequency spectrum.
  • The frequencies are then converted into the linear relationship perceived by the human ear; through Mel-cepstrum analysis, a DCT separates the DC signal component from the sinusoidal signal components, and the sound-spectrum features are extracted as the spectrum data.
  • Step a3: Define the time-domain data and the spectrum data as the characteristic data.
  • The time-domain data and spectrum data obtained by parsing the verification voice information are used together as the characteristic data of the verification voice information.
  • Parsing the verification voice into time-domain data and spectrum data effectively removes the influence of environmental noise and other irrelevant sounds, while characterizing the verification voice from multiple angles, so the characteristic data reflect the verification voice more faithfully and the subsequent human-voice judgment is more accurate.
  • Before step S1100, the method further includes the following steps:
  • When the target terminal needs to perform voice verification, it sends a verification request to the server, and the server obtains the verification request sent by the terminal.
  • a verification database is set in the server.
  • the verification database contains a large number of preset texts (for example, 1000).
  • Each text can be a word or a random combination of characters.
  • When a verification request from the target terminal is obtained, a text is randomly retrieved from the verification database and used as the verification information for this round of voice verification.
  • Alternatively, multiple words or characters can be randomly retrieved from the verification database and randomly combined to generate the verification information, giving it higher randomness.
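A minimal sketch of this retrieval-and-combination step, with a toy word list standing in for the preset database (which the text describes as holding on the order of 1000 entries):

```python
import random

# Miniature stand-in for the patent's preset verification database.
VERIFICATION_DB = ["apple", "river", "sunset", "piano", "orbit", "copper"]

def make_verification_info(db, n_words=3, rng=random):
    """Randomly combine several distinct entries so the spoken challenge is
    harder to pre-record, as the text suggests for higher randomness."""
    return " ".join(rng.sample(db, n_words))

info = make_verification_info(VERIFICATION_DB)
print(info)   # e.g. "orbit apple piano"
```

The resulting string is what gets sent to the terminal and shown (or rendered as an obfuscated picture) for the user to read aloud.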
  • the verification information is sent to the target terminal according to the obtained verification request.
  • The terminal displays the verification information on the screen and triggers the reminder instruction at the same time; the reminder can be issued through a specific voice broadcast or by displaying a specific guiding sentence, such as "Please read the verification information on the screen."
  • In some embodiments the verification information may be preprocessed into a verification-information picture, for example by obfuscation, though not limited to this; the preprocessed picture is shown to the verification user to guide him or her through voice verification.
  • After step S1200, the method further includes the following steps:
  • Step d: When the voice content is judged to belong to the preset sound category, verify the voice information according to a preset verification rule, where the verification rule is a data-comparison rule that judges whether the similarity between the content of the verification voice information and the verification information is greater than a preset similarity threshold.
  • In that case the preliminary verification is passed, and the voice content itself is then verified.
  • The verification voice information is input into a natural language analysis model to identify its content; the model outputs text corresponding to the voice content, and this text is used as the verification text to compare against the verification information of this round of voice verification.
  • It is then judged whether the resulting similarity is greater than the preset similarity threshold.
  • When the similarity is greater than the preset threshold, the verification rule is met; when it is not, the verification rule is not met.
  • Step e: When the verification voice information meets the verification rule, it is determined that the voice verification is passed.
  • Step f: When the verification voice information does not meet the verification rule, it is determined that the voice verification fails.
  • This prevents malicious users from arbitrarily obtaining permissions and damaging the platform or website.
  • Voice verification also effectively reduces the chance that most crawlers or intelligent AIs bypass verification, improving confidence in the authenticity of users.
  • Step d specifically includes the following steps:
  • Step d1: Generate a verification text from the verification voice information, where the verification text is the text corresponding to the content of the verification voice information, obtained by content recognition of the verification voice information.
  • The voice information is input into the speech recognition model, and the verification text is determined from the model's output.
  • The verification text is the text corresponding to the content of the voice information; that is, the voice information is converted into text.
  • The speech recognition model used may be an existing model that generates the corresponding text by recognizing the content of the speech, such as a natural language analysis model or a neural network model trained to convergence; it is not limited here.
  • Step d2: Determine the text similarity from the verification text, where the text similarity is the similarity between the verification text and the verification information.
  • The verification text is compared with the verification information to obtain the corresponding text similarity.
  • In one embodiment, the verification text is converted into Unicode characters or GBK/GB2312 characters and compared with the characters of the verification information to determine the Hamming distance.
  • The text similarity is then determined from the ratio of the Hamming distance to the total number of characters in the verification information.
  • Alternatively, each word or individual Chinese character in the text can be sorted and compared, character by character, with the word or character at the corresponding position in the verification information to obtain the Hamming distance between them.
  • When the Hamming distance obtained for a pair is greater than zero, the corresponding words or characters are judged not to match; the number of non-matching words or characters between the verification text and the verification information is counted, its ratio to the total word count of the verification information is calculated, and that ratio is used as the text similarity.
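A position-wise comparison of this kind can be sketched as follows. Note one assumption: this sketch reports a match ratio (higher means closer to the verification information), which keeps it consistent with the "similarity greater than the threshold passes" rule in the next step, whereas the paragraph above describes the complementary mismatch ratio:

```python
def text_similarity(verify_text, verify_info):
    """Position-wise (Hamming-style) comparison: count mismatched characters
    against the verification information and convert to a match ratio."""
    total = max(len(verify_info), 1)
    # Pad or trim the recognized text so positions line up for comparison.
    padded = verify_text.ljust(total)[:total]
    mismatches = sum(1 for a, b in zip(padded, verify_info) if a != b)
    return 1 - mismatches / total

def passes(verify_text, verify_info, threshold=0.8):
    """Threshold value is illustrative; the patent leaves it configurable."""
    return text_similarity(verify_text, verify_info) > threshold

print(text_similarity("hello world", "hello there"))  # 6 of 11 positions match
```

A production system would normally use an alignment-tolerant measure (e.g. edit distance) instead, since recognized text and reference text rarely line up position by position.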
  • Step d3: Verify whether the text similarity is greater than the preset similarity threshold.
  • A similarity threshold is preset in the system to judge whether the similarity between the verification text and the verification information meets the verification rule.
  • The value of the similarity threshold can be adjusted to the actual situation: when a more accurate similarity-determination method is chosen, the threshold can be raised; when a rougher method is chosen, the threshold can be lowered.
  • The comparison between the text similarity and the similarity threshold determines whether the voice information meets the verification rule: when the text similarity is greater than the threshold, the voice information meets the rule and verification passes; when it is less than or equal to the threshold, the voice information does not meet the rule and verification fails.
  • Step d1 specifically includes the following steps:
  • Step d11: Input the verification voice information into a preset speech recognition model, where the speech recognition model is a natural language analysis model that converts the input voice information into the text corresponding to its content.
  • The voice information is entered into the speech recognition model and first segmented; segmentation can be based on pauses in the speech or on its syllables. The segmented speech is then fed through the speech recognition model for word segmentation, extracting fragmented words or syllables.
  • The speech recognition model can be an existing natural language analysis model that converts the input speech into text.
  • Step d12: Determine the verification text from the output of the speech recognition model.
  • The words or syllables output by the speech recognition model are spliced in segment order, and homophones are replaced and adjusted according to the semantics of the whole sentence to obtain a complete sentence as the text information.
  • Homophone adjustment can be based on preset word-collocation relationships, or on similarity matching against preset example sentences, replacing words with those in the matched similar sentences.
  • By using the speech model to extract the content of the voice information and convert it into text, the corresponding text content can be obtained accurately, making comparison with the verification information more convenient and the voice verification judgment more accurate.
  • an embodiment of the present application also provides a voice verification device. Please refer to Figure 3 for details.
  • Figure 3 is a block diagram of the basic structure of the voice verification device in this embodiment.
  • the voice verification device includes: an acquisition module 2100, a processing module 2200, and an execution module 2300.
  • The obtaining module 2100 is used to obtain verification voice information, where the verification voice information is the voice content collected by the target terminal while the verification user reads the verification information aloud.
  • The processing module 2200 is used to determine, from the verification voice information, whether the voice content is of a preset sound category, where the preset sound category is the category characterizing the voice content as a human voice.
  • The execution module 2300 is used to determine that voice verification fails when it is determined that the voice content does not belong to the preset sound category.
  • The technical solution of the embodiments of the present application focuses on mining the user's biological voice features, which distinguish a simulated human voice from a real one, so real users can be identified effectively on this basis.
  • Malicious actors such as machines, AIs, and crawlers can thus be effectively excluded, preventing them from attacking websites and platforms, guaranteeing the validity and authenticity of verified users, and improving the security of voice verification.
  • the voice verification device further includes: a first parsing submodule, a first input submodule, and a first processing submodule.
  • The first parsing submodule is used to parse the verification voice information to obtain characteristic data, where the characteristic data are the time-domain data and spectrum data obtained by processing the voice information.
  • The first input submodule is used to input the characteristic data into a preset human-voice judgment model, where the human-voice judgment model is a neural network model trained to convergence that judges, from the input characteristic data, whether the voice information is a human voice.
  • The first processing submodule is used to determine, from the output of the human-voice judgment model, whether the voice content is of the preset sound category.
  • the voice verification device further includes: a second processing submodule, a third processing submodule, and a first execution submodule.
  • the second processing sub-module is configured to process the verified voice information according to a preset first processing rule to obtain time-domain data, where the first processing rule is to parse the voice information into time-domain data and improve Among them, the high-frequency part of the voice information processing rules;
  • the third processing submodule is used to process the time-domain data according to a preset second processing rule to obtain the sound spectrum, wherein the second processing rule is a data processing rule that converts the time-domain data into spectrum data based on the Fourier transform;
  • the first execution submodule is used to define the time domain data and the spectrum data as the characteristic data.
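The two processing rules can be sketched as follows. This is an illustrative stand-in (a standard pre-emphasis filter for the high-frequency boost and a naive discrete Fourier transform for the spectrum); the coefficient `alpha` is an assumed typical value, not one specified by the application:

```python
import cmath
import math

def pre_emphasize(samples, alpha=0.97):
    # First processing rule: boost the high-frequency part of the
    # time-domain signal with the filter y[n] = x[n] - alpha * x[n-1].
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def magnitude_spectrum(samples):
    # Second processing rule: convert time-domain data into spectrum data
    # with a (naive, O(n^2)) discrete Fourier transform, keeping magnitudes.
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]
```

For example, a period-4 tone sampled 8 times concentrates its energy in bin 2 (and its mirror bin 6) of the resulting spectrum, which is the kind of structure the characteristic data capture.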
  • the voice verification device further includes: a first acquiring submodule, a first searching submodule, and a first sending submodule.
  • the first obtaining submodule is used for obtaining the verification request of the target terminal;
  • the first searching submodule is used to randomly select a text from a preset verification database as the verification information according to the verification request;
  • the first sending submodule is used to trigger a preset reminder instruction after sending the verification information to the target terminal, so as to guide the verifying user to perform voice verification according to the verification information.
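A minimal sketch of the random lookup step might read as below. The database contents and function name are hypothetical, and sending the text to the terminal / triggering the reminder instruction are reduced here to simply returning the chosen text:

```python
import random

# Hypothetical preset verification database of candidate texts.
VERIFICATION_DATABASE = [
    "please read these words aloud",
    "the quick brown fox",
    "verify my account today",
]

def issue_verification_text(database=VERIFICATION_DATABASE, rng=random):
    # Randomly select one text from the preset database as the
    # verification information for this request.
    return rng.choice(database)
```

In a real deployment the selected text would then be pushed to the target terminal together with the reminder instruction described above.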
  • the voice verification device further includes: a second execution submodule, a third execution submodule, and a fourth execution submodule.
  • the second execution submodule is configured to verify the voice information according to a preset verification rule when it is judged that the voice content belongs to the preset sound category, wherein the verification rule is to judge whether the similarity between the content of the verification voice information and the verification information is greater than a preset similarity threshold;
  • the third execution submodule is used to determine that the voice verification passes when the verification voice information meets the verification rule;
  • the fourth execution submodule is used to determine that the voice verification fails when the verification voice information does not meet the verification rule.
  • the voice verification device further includes: a fourth processing submodule, a fifth processing submodule, and a first verification submodule.
  • the fourth processing submodule is configured to generate a verification text according to the verification voice information, wherein the verification text is the text information corresponding to the content of the verification voice information, obtained by performing content recognition on the verification voice information;
  • the fifth processing submodule is used to determine the text similarity according to the verification text, wherein the text similarity is the similarity information between the verification text and the verification information;
  • the first verification submodule is used to verify whether the text similarity is greater than the preset similarity threshold.
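One way to realize the similarity comparison is sketched below using the ratio measure from Python's standard `difflib`; the 0.8 threshold and function name are assumed for illustration, not values fixed by the application:

```python
from difflib import SequenceMatcher

def passes_verification(recognized_text, verification_info, threshold=0.8):
    # Text similarity between the recognized verification text and the
    # issued verification information; pass only if it exceeds the
    # (assumed) similarity threshold.
    similarity = SequenceMatcher(None, recognized_text, verification_info).ratio()
    return similarity > threshold
```

`SequenceMatcher.ratio()` returns a value in [0, 1], so the threshold comparison maps directly onto the pass/fail decision of the verification rule.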
  • the voice verification device further includes: a second input submodule and a sixth processing submodule.
  • the second input submodule is used to input the verification voice information into a preset voice recognition model, wherein the voice recognition model is a natural language analysis model that converts the input voice information into text corresponding to the content of the voice information;
  • the sixth processing submodule is used to determine the verification text according to the output result of the voice recognition model.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device includes a processor, a nonvolatile storage medium, a memory, and a network interface connected through a system bus.
  • the non-volatile storage medium of the computer device stores an operating system, a database, and computer-readable instructions.
  • the database may store control information sequences.
  • when the computer-readable instructions are executed by the processor, the processor can implement a voice verification method.
  • the processor of the computer device is used to provide calculation and control capabilities and supports the operation of the entire computer device.
  • computer-readable instructions may be stored in the memory of the computer device, and when the computer-readable instructions are executed by the processor, the processor may execute a voice verification method.
  • the network interface of the computer device is used to connect and communicate with the terminal.
  • the structure shown in the figure is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • a specific computer device may include more or fewer parts than shown in the figure, combine certain parts, or have a different arrangement of parts.
  • the processor is used to execute the specific functions of the acquisition module 2100, the processing module 2200, and the execution module 2300 in FIG. 3, and the memory stores the program codes and various data required to execute the above modules.
  • the network interface is used for data transmission between user terminals or servers.
  • the memory in this embodiment stores the program codes and data required to execute all the submodules in the voice verification device, and the server can call the program codes and data to execute the functions of all the submodules.
  • the present application also provides a storage medium storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the voice verification method described in any of the above embodiments.
  • the storage medium may be a non-volatile readable storage medium.
  • the computer program can be stored in a computer-readable storage medium, and when executed, it may include the procedures of the above-mentioned method embodiments.
  • the aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), etc.

Abstract

Disclosed are a voice verification method and apparatus, a computer device, and a storage medium. The method comprises the following steps: obtaining verification voice information, the verification voice information being voice content collected by a target terminal when a verifying user reads verification information aloud (S1100); determining, according to the verification voice information, whether the voice content belongs to a preset sound category, the preset sound category being a sound class representing that the voice content is a human voice (S1200); and when it is determined that the voice content does not belong to the preset sound category, determining that the voice verification has failed (S1300). Verifying whether the verification voice is a real human voice can effectively exclude malicious users such as machines, AI, and crawlers, prevent such malicious users from attacking websites and platforms, ensure the validity and authenticity of verifying users, and improve the security of voice verification.
PCT/CN2019/117613 2019-01-24 2019-11-12 Procédé et appareil de vérification vocale, dispositif informatique et support d'enregistrement WO2020151317A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910068827.9A CN109801638B (zh) 2019-01-24 2019-01-24 语音验证方法、装置、计算机设备及存储介质
CN201910068827.9 2019-01-24

Publications (1)

Publication Number Publication Date
WO2020151317A1 true WO2020151317A1 (fr) 2020-07-30

Family

ID=66560320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117613 WO2020151317A1 (fr) 2019-01-24 2019-11-12 Procédé et appareil de vérification vocale, dispositif informatique et support d'enregistrement

Country Status (2)

Country Link
CN (1) CN109801638B (fr)
WO (1) WO2020151317A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109801638B (zh) * 2019-01-24 2023-10-13 平安科技(深圳)有限公司 语音验证方法、装置、计算机设备及存储介质
CN110727934A (zh) * 2019-10-22 2020-01-24 成都知道创宇信息技术有限公司 一种反爬虫方法及装置
CN110931020B (zh) * 2019-12-11 2022-05-24 北京声智科技有限公司 一种语音检测方法及装置
CN112185417A (zh) * 2020-10-21 2021-01-05 平安科技(深圳)有限公司 人工合成语音检测方法、装置、计算机设备及存储介质
CN112948788A (zh) * 2021-04-13 2021-06-11 网易(杭州)网络有限公司 语音验证方法、装置、计算设备以及介质
CN117201879B (zh) * 2023-11-06 2024-04-09 深圳市微浦技术有限公司 机顶盒显示方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 一种声纹识别方法
CN106954136A (zh) * 2017-05-16 2017-07-14 成都泰声科技有限公司 一种集成麦克风接收阵列的超声定向发射参量阵
US20170248955A1 (en) * 2016-02-26 2017-08-31 Ford Global Technologies, Llc Collision avoidance using auditory data
CN109801638A (zh) * 2019-01-24 2019-05-24 平安科技(深圳)有限公司 语音验证方法、装置、计算机设备及存储介质

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402985A (zh) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 提高声纹识别安全性的声纹认证系统及其实现方法
US9390245B2 (en) * 2012-08-02 2016-07-12 Microsoft Technology Licensing, Llc Using the ability to speak as a human interactive proof
EP3078026B1 (fr) * 2013-12-06 2022-11-16 Tata Consultancy Services Limited Système et procédé permettant la classification de données de bruit d'une foule humaine
CN104660413A (zh) * 2015-01-28 2015-05-27 中国科学院数据与通信保护研究教育中心 一种声纹口令认证方法和装置
CN107404381A (zh) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 一种身份认证方法和装置
AU2018226844B2 (en) * 2017-03-03 2021-11-18 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions
CN108877813A (zh) * 2017-05-12 2018-11-23 阿里巴巴集团控股有限公司 人机识别的方法、装置和系统
CN109218269A (zh) * 2017-07-05 2019-01-15 阿里巴巴集团控股有限公司 身份认证的方法、装置、设备及数据处理方法
CN108198561A (zh) * 2017-12-13 2018-06-22 宁波大学 一种基于卷积神经网络的翻录语音检测方法
CN108039176B (zh) * 2018-01-11 2021-06-18 广州势必可赢网络科技有限公司 一种防录音攻击的声纹认证方法、装置及门禁系统
CN108281158A (zh) * 2018-01-12 2018-07-13 平安科技(深圳)有限公司 基于深度学习的语音活体检测方法、服务器及存储介质
CN108711436B (zh) * 2018-05-17 2020-06-09 哈尔滨工业大学 基于高频和瓶颈特征的说话人验证系统重放攻击检测方法
CN109065030B (zh) * 2018-08-01 2020-06-30 上海大学 基于卷积神经网络的环境声音识别方法及系统
CN109147799A (zh) * 2018-10-18 2019-01-04 广州势必可赢网络科技有限公司 一种语音识别的方法、装置、设备及计算机存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 一种声纹识别方法
US20170248955A1 (en) * 2016-02-26 2017-08-31 Ford Global Technologies, Llc Collision avoidance using auditory data
CN106954136A (zh) * 2017-05-16 2017-07-14 成都泰声科技有限公司 一种集成麦克风接收阵列的超声定向发射参量阵
CN109801638A (zh) * 2019-01-24 2019-05-24 平安科技(深圳)有限公司 语音验证方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN109801638A (zh) 2019-05-24
CN109801638B (zh) 2023-10-13

Similar Documents

Publication Publication Date Title
WO2020151317A1 (fr) Procédé et appareil de vérification vocale, dispositif informatique et support d'enregistrement
CN109119063B (zh) 视频配音生成方法、装置、设备及存储介质
CN111128223B (zh) 一种基于文本信息的辅助说话人分离方法及相关装置
WO2020034526A1 (fr) Procédé d'inspection de qualité, appareil, dispositif et support de stockage informatique pour l'enregistrement d'une assurance
WO2015005679A1 (fr) Procédé, appareil et système de reconnaissance vocale
WO2020207035A1 (fr) Procédé, appareil et dispositif d'interception de canular téléphonique, et support d'informations
WO2020139058A1 (fr) Reconnaissance d'empreinte vocale parmi des dispositifs
WO2021114841A1 (fr) Procédé de génération de rapport d'utilisateur, et dispositif terminal
CN107886951B (zh) 一种语音检测方法、装置及设备
CN110853615A (zh) 一种数据处理方法、装置及存储介质
CN109448704A (zh) 语音解码图的构建方法、装置、服务器和存储介质
Kopparapu Non-linguistic analysis of call center conversations
WO2021251539A1 (fr) Procédé permettant de mettre en œuvre un message interactif en utilisant un réseau neuronal artificiel et dispositif associé
CN110232921A (zh) 基于生活服务的语音操作方法、装置、智能电视及系统
Pao et al. Combining acoustic features for improved emotion recognition in mandarin speech
CN113129895B (zh) 一种语音检测处理系统
CN112231440A (zh) 一种基于人工智能的语音搜索方法
CN114125506A (zh) 语音审核方法及装置
CN107885736A (zh) 翻译方法及装置
CN112087726A (zh) 彩铃识别的方法及系统、电子设备及存储介质
CN111949778A (zh) 一种基于用户情绪的智能语音对话方法、装置及电子设备
KR102407055B1 (ko) 음성인식 후 자연어 처리를 통한 대화 품질지수 측정장치 및 그 방법
WO2022154217A1 (fr) Procédé d'auto-entraînement vocal et dispositif de terminal utilisateur pour patient souffrant de troubles vocaux
CN114120425A (zh) 一种情绪识别方法、装置、电子设备及存储介质
CN112383770A (zh) 一种通过语音识别技术的影视版权监测比对方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19911634

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19911634

Country of ref document: EP

Kind code of ref document: A1