WO2021196477A1 - Risk user identification method and apparatus based on voiceprint features and associated graph data - Google Patents
Risk user identification method and apparatus based on voiceprint features and associated graph data (基于声纹特征与关联图谱数据的风险用户识别方法、装置)
- Publication number
- WO2021196477A1 (PCT/CN2020/106017)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voiceprint
- feature
- voice information
- voiceprint feature
- user
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a risk user identification method, apparatus, electronic device, and computer-readable storage medium based on voiceprint features and associated graph data.
- This application provides a method for identifying risky users based on voiceprint features and associated graph data, including: obtaining standard voice information of a user; extracting a first voiceprint feature of the standard voice information; inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature; vectorizing the associated graph data to obtain an associated feature vector; judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, and judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector; and, if either match exists, determining that the user is a risky user.
- To solve the above problem, this application also provides an electronic device, which includes: a memory storing at least one instruction; and a processor that executes the instructions stored in the memory to implement the risk user identification method based on voiceprint features and associated graph data described above.
- To solve the above problem, this application also provides a computer-readable storage medium storing at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the risk user identification method based on voiceprint features and associated graph data described above.
- To solve the above problem, this application also provides a risk user identification device based on voiceprint features and associated graph data, the device including:
- a voice information acquisition module, used to obtain standard voice information of a user;
- a voiceprint feature extraction module, used to extract a first voiceprint feature of the standard voice information;
- a graph data acquisition module, configured to input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
- a vector conversion module, used to vectorize the associated graph data to obtain an associated feature vector;
- a judging module, used to judge whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, and further used to judge whether a preset black relationship graph contains a label feature vector matching the associated feature vector;
- a determining module, configured to determine that the user is a risky user if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector.
- FIG. 1 is a schematic flowchart of a method for identifying risky users based on voiceprint features and associated graph data according to an embodiment of this application;
- FIG. 2 is a schematic module diagram of a risk user identification device based on voiceprint features and associated graph data provided by an embodiment of this application;
- FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing a method for identifying risky users based on voiceprint features and associated graph data according to an embodiment of this application.
- This application provides a method for identifying risky users based on voiceprint features and associated graph data. FIG. 1 is a schematic flowchart of the method provided by an embodiment of this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.
- In this embodiment, the risk user identification method based on voiceprint features and associated graph data includes the following steps.
- S1: Obtain standard voice information of the user.
- In this embodiment, the standard voice information of the user may be obtained from a voice database.
- Further, obtaining the standard voice information of the user includes: obtaining original voice information of the user; sampling the original voice information with an analog/digital converter to obtain a digital voice signal; performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal; and performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
- In this embodiment, the user's original voice information is audio information containing the user's voice; the original voice may be voice information captured during a voice call with the user.
- For example, when a bank loan officer conducts a telephone credit review of a loan applicant, a recording of the voice conversation between the reviewer and the applicant is obtained, and that recording is the original voice information.
- In detail, the purpose of sampling the original voice information is to convert it into a digital signal, which facilitates further processing of the voice information.
- In this embodiment, an analog/digital converter samples the original voice information at a rate of tens of thousands of times per second; each sample records the state of the original voice information at a certain moment, so that a digital voice signal covering the speech at different moments is obtained.
- Because the human vocal system attenuates the high-frequency part of speech, the pre-emphasis operation is performed to increase the energy of the high-frequency part, so that the speech energy of the high-frequency part and that of the low-frequency part have similar amplitudes; this flattens the frequency spectrum of the signal and keeps the same signal-to-noise ratio over the whole band from low to high frequencies.
- In this embodiment, the pre-emphasis operation compensates the digital voice signal. Specifically, it can be calculated as y(t) = x(t) - μx(t-1), where x(t) is the digital voice signal, t is time, y(t) is the digitally filtered voice signal, and μ is the adjustment value of the pre-emphasis operation, with μ in the range [0.9, 1.0].
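- As a minimal illustration of this step, the following Python sketch applies the pre-emphasis formula y(t) = x(t) - μx(t-1) stated above; the default μ = 0.97 is only an illustrative choice within the stated [0.9, 1.0] range, not a value fixed by this application:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, mu: float = 0.97) -> np.ndarray:
    """Return y(t) = x(t) - mu * x(t-1); boosts the high-frequency energy."""
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(x[0], x[1:] - mu * x[:-1])
```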
- In this embodiment, the framing and windowing operation removes the overlapping part of the speech in the digitally filtered voice signal.
- For example, when a bank loan officer conducts a telephone credit review, the original voice information contains overlapping speech from the loan officer and the loan applicant, so the framing and windowing operation can remove the loan officer's voice and keep the applicant's voice.
- Further, performing the framing and windowing operation on the digitally filtered voice signal includes applying an objective function to the signal, in which n is the frame index of the digitally filtered voice signal, N is the total number of frames of the digitally filtered voice signal, and w(n) is a single frame of standard voice information, that is, w(n) represents the standard voice information of each frame. A sketch is given after this paragraph.
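- A short sketch of framing and windowing, assuming a Hamming window; the application does not reproduce its exact objective function, so the window choice, the 25 ms frame length, and the 10 ms hop at a 16 kHz sampling rate are all conventional assumptions:

```python
import numpy as np

def frame_and_window(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split the digitally filtered signal into overlapping frames, then
    multiply each frame by a window w(n) (a Hamming window is assumed)."""
    if len(x) < frame_len:                        # pad very short signals
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)
```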
- S2: Extract the first voiceprint feature of the standard voice information.
- In detail, extracting the first voiceprint feature of the standard voice information includes: performing a discrete Fourier transform on the standard voice information to obtain its spectrum information; performing triangular filtering on the standard voice information with a triangular filter to obtain its frequency response values; performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies; and performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
- Preferably, in the calculation function of the discrete Fourier transform, N is the total number of frames of the digitally filtered voice signal, n is the frame index, w(n) is a single frame of standard voice information (that is, w(n) represents the standard voice information of each frame), j is the weight of the Fourier transform, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
- Preferably, in this embodiment, a filter bank of M filters (which may be triangular filters) is defined; the center frequency of each filter is f(i), i = 1, 2, …, M, where the center frequency is the cutoff frequency of the filter, and the triangular filtering calculation is performed through the triangular filters.
- Because the triangular filter smooths the frequency spectrum, eliminates the effect of harmonics, and highlights the formants of the sound, the tone or pitch of a voice is not reflected in the voiceprint feature; in other words, the voiceprint feature is not affected by differences in the pitch of the input voice, so the recognition result is not affected either.
- In the triangular filtering calculation, f(i) is the center frequency of the triangular filter, i is the index of the triangular filter, H(k) is the frequency response value, and k is the sound frequency of a single frame in the digitally filtered voice signal, that is, k can represent the sound frequency of each frame.
- Further, the logarithmic transformation computes the log energy output by each filter bank.
- In general, people's response to sound pressure is logarithmic, and people are less sensitive to subtle changes at high sound pressure than at low sound pressure. Therefore, using logarithms in this embodiment reduces the sensitivity of the extracted features to changes in the energy of the input sound.
- Specifically, in the logarithm calculation, i is the index of the triangular filter, k is the sound frequency of a single frame of the original voice information, N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, D is the spectrum information, and S(i) is the log energy output by each filter.
- Preferably, S(i) undergoes a discrete cosine transform to obtain the voiceprint feature, where n is the frame index of the original voice information, i is the index of the triangular filter, M is the total number of triangular filters, S(i) is the log energy output by each filter, and x is the voiceprint feature.
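- Putting the four sub-steps together, the following sketch computes voiceprint features in the MFCC style described above (DFT, triangular filtering, logarithm, discrete cosine transform). The mel-scale spacing of the triangular filters, M = 26 filters, and 13 output coefficients are common defaults assumed here; the application leaves the exact formulas and the filter count open:

```python
import numpy as np

def triangular_filterbank(M: int, n_fft: int, sr: int) -> np.ndarray:
    """M triangular filters with center frequencies f(i), spaced on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    centers = np.floor((n_fft + 1) * hz(np.linspace(mel(0), mel(sr / 2), M + 2)) / sr).astype(int)
    fb = np.zeros((M, n_fft // 2 + 1))
    for i in range(1, M + 1):
        lo, c, hi = centers[i - 1], centers[i], centers[i + 1]
        fb[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)   # rising edge
        fb[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)   # falling edge
    return fb

def voiceprint_features(frames: np.ndarray, sr: int = 16000,
                        M: int = 26, n_coeffs: int = 13) -> np.ndarray:
    """DFT -> triangular filtering -> log energy S(i) -> DCT -> features x."""
    D = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # spectrum information D
    S = np.log(D @ triangular_filterbank(M, frames.shape[1], sr).T + 1e-10)
    i = np.arange(M)
    dct_basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * i + 1) / (2 * M))
    return S @ dct_basis.T                                 # one feature row per frame
```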
- Further, in another embodiment of this application, extracting the first feature of the standard voice information includes extracting it with an LSTM (Long Short-Term Memory) network.
- The LSTM has three "gate" structures, namely the forget gate, the input gate, and the output gate, which apply different processing to the input information. Through the forget gate, as the name implies, part of the passing information is forgotten by the neural unit, so that part of the speech features of the previous frame disappears during transmission, that is, it no longer enters the next neural unit for training. The role of the input gate is to add new useful information to the state of the neural unit, that is, the speech features newly learned in this frame are processed and added to the transmitted information. Finally, the output gate produces the output based on the neural-unit state and the processed information: according to the output at the previous moment and the information to be output from the input at this moment, the output information at this moment is finally obtained as the first voiceprint feature.
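- The gate behavior described above is exactly what a standard LSTM cell implements. The following PyTorch sketch shows how the output at the last moment can be taken as the first voiceprint feature; the input size (13 coefficients per frame) and the 128-dimensional hidden state are illustrative assumptions, not values given by this application:

```python
import torch
import torch.nn as nn

class VoiceprintLSTM(nn.Module):
    """Runs framed features through an LSTM (forget/input/output gates built in)
    and returns the output at the last moment as the voiceprint embedding."""

    def __init__(self, n_feats: int = 13, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_feats, hidden_size=hidden, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, n_feats) -> embedding: (batch, hidden)
        _, (h_last, _) = self.lstm(frames)
        return h_last[-1]

embedding = VoiceprintLSTM()(torch.randn(1, 200, 13))  # shape (1, 128)
```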
- S3: Input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature.
- In this embodiment, the associated graph data related to the first voiceprint feature may include, but is not limited to, user label data corresponding to the first voiceprint feature and dialing records corresponding to the first voiceprint feature. Specifically, the user label data includes user attribute data such as gender, age, region, and work data.
- In detail, the association graph model may be constructed with a convolutional neural network, using sample voiceprint features as a training set and sample voiceprint features marked with user label data as a label set to train the association graph model.
- For example, inputting a user's first voiceprint feature into the preset association graph model yields the associated graph data related to that feature, such as the user's information (name, gender, age, region, work, and so on) or the historical dialing times and counts corresponding to the first voiceprint feature.
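- As an illustration only, the preset association graph model could be realized as a small convolutional network mapping a voiceprint feature to user-label scores; the application states only that a convolutional neural network is used, so every layer size and the 32-label output below are assumptions:

```python
import torch
import torch.nn as nn

class AssociationGraphModel(nn.Module):
    """Hypothetical CNN: voiceprint feature -> scores over user labels
    (gender, age band, region, work, ...)."""

    def __init__(self, feat_dim: int = 128, n_labels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(),
            nn.Linear(16 * 8, n_labels),
        )

    def forward(self, voiceprint: torch.Tensor) -> torch.Tensor:
        # voiceprint: (batch, feat_dim) -> label scores: (batch, n_labels)
        return self.net(voiceprint.unsqueeze(1))
```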
- S4: Vectorize the associated graph data to obtain an associated feature vector.
- In detail, the vectorization is performed through an expression in which i denotes the index of the associated graph data, v_i denotes the N-dimensional matrix vector of associated graph data i, and v_j is the j-th element of the N-dimensional matrix vector.
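- Since the vectorization expression itself is not reproduced in this text, the sketch below shows just one plausible reading: each associated graph record i becomes an N-dimensional vector v_i whose j-th element encodes one field. The field list is hypothetical:

```python
import numpy as np

FIELDS = ["gender", "age", "region_code", "work_code", "dial_count"]  # hypothetical

def vectorize(record: dict) -> np.ndarray:
    """Map one associated-graph record to a fixed N-dimensional vector v_i."""
    return np.array([float(record.get(f, 0)) for f in FIELDS])

v = vectorize({"gender": 1, "age": 35, "dial_count": 12})  # array of length 5
```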
- S5: Judge whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, and judge whether the preset black relationship graph contains a label feature vector matching the associated feature vector.
- In detail, judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of the multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
- Alternatively, the judgment may include: performing similarity calculation between the first voiceprint feature and the voiceprint features in the preset black voiceprint library to obtain a first similarity set, whose maximum value is the first target similarity; if the first target similarity is greater than the first similarity threshold, it is determined that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
- In this embodiment, the blacklist voiceprint library is a voiceprint database obtained by extracting voiceprint feature vectors from the voices of blacklisted persons. For example, it may contain the voiceprint features of a bank's untrustworthy persons and/or the voiceprint feature library of criminals held by the public security department.
- Further, in the similarity function, x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
- Similarly, judging whether the preset black relationship graph contains a label feature vector matching the associated feature vector includes: calculating, through the similarity function, the second similarity between the associated feature vector and each of the multiple label feature vectors in the preset black relationship graph; and if a second similarity greater than a second similarity threshold exists, determining that the preset black relationship graph contains a label feature vector matching the associated feature vector.
- Alternatively, the judgment may include: performing similarity calculation between the associated feature vector and the label feature vectors in the preset black relationship graph to obtain a second similarity set, whose maximum value is the second target similarity; if the second target similarity is greater than the second similarity threshold, it is determined that the preset black relationship graph contains a label feature vector matching the associated feature vector.
- In this embodiment, the black relationship graph database is obtained by extracting label feature vectors from the label data of blacklisted persons; therefore, it contains the label feature vectors of the blacklisted persons' label data.
- In this embodiment, the second similarity threshold may be the same as or different from the first similarity threshold: it may be greater than or smaller than the first similarity threshold. For example, the first similarity threshold may be 80% while the second similarity threshold is 90%.
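- The matching logic of S5 and the decision that follows can be summarized as below. Cosine similarity is assumed for sim(x, y_i), since the similarity function itself is not reproduced in this text, and the 80%/90% thresholds merely echo the example above:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Assumed form of sim(x, y_i)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

def is_risky_user(voiceprint, assoc_vec, black_prints, black_tags,
                  t1: float = 0.80, t2: float = 0.90) -> bool:
    """Take the maximum (target) similarity against each blacklist and flag the
    user if either target similarity exceeds its threshold (the OR decision)."""
    s1 = max((cosine_sim(voiceprint, y) for y in black_prints), default=0.0)
    s2 = max((cosine_sim(assoc_vec, t) for t in black_tags), default=0.0)
    return s1 > t1 or s2 > t2
```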
- S6: If the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determine that the user is a risky user. Flagging on either match identifies risky users more comprehensively and reduces the missed detections caused by single-item verification.
- Further, if the user is determined to be a risky user, a risk-user reminder message is sent.
- In the embodiments of this application, the standard voice information of the user is obtained; the first voiceprint feature of the standard voice information is extracted; the first voiceprint feature is input into a preset association graph model to obtain associated graph data related to the first voiceprint feature; the associated graph data is vectorized to obtain an associated feature vector; it is judged whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature and whether the preset black relationship graph contains a label feature vector matching the associated feature vector; and if either match exists, the user is determined to be a risky user. Through this dual verification over two channels, the missed-detection rate in identifying risky users is reduced, and the security of information is thereby enhanced.
- As shown in FIG. 2, it is a functional module diagram of the risk user identification device based on voiceprint features and associated graph data of this application.
- The risk user identification device 100 based on voiceprint features and associated graph data described in this application can be installed in an electronic device. According to the implemented functions, the device may include a voice information acquisition module 101, a voiceprint feature extraction module 102, a graph data acquisition module 103, a vector conversion module 104, a judging module 105, and a determining module 106. A module in this application, also called a unit, refers to a series of computer program segments that can be executed by the processor of an electronic device and can complete fixed functions, and that are stored in the memory of the electronic device.
- In this embodiment, the functions of the modules/units are as follows:
- The voice information acquisition module 101 is used to obtain standard voice information of the user.
- In this embodiment, the standard voice information of the user may be obtained from a voice database.
- Further, obtaining the standard voice information of the user includes: obtaining original voice information of the user; sampling the original voice information with an analog/digital converter to obtain a digital voice signal; performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal; and performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
- In this embodiment, the user's original voice information is audio information containing the user's voice; the original voice may be voice information captured during a voice call with the user.
- In detail, the purpose of sampling the original voice information is to convert it into a digital signal, which facilitates further processing. In this embodiment, an analog/digital converter samples the original voice information at a rate of tens of thousands of times per second; each sample records the state of the original voice information at a certain moment, so that a digital voice signal covering the speech at different moments is obtained.
- Because the human vocal system attenuates the high-frequency part of speech, the pre-emphasis operation increases the energy of the high-frequency part, so that the speech energies of the high-frequency and low-frequency parts have similar amplitudes; this flattens the frequency spectrum of the signal and keeps the same signal-to-noise ratio over the whole band from low to high frequencies. The pre-emphasis operation compensates the digital voice signal and can be calculated as y(t) = x(t) - μx(t-1), where x(t) is the digital voice signal, t is time, y(t) is the digitally filtered voice signal, and μ is the adjustment value of the pre-emphasis operation, with μ in the range [0.9, 1.0].
- In this embodiment, the framing and windowing operation removes the overlapping part of the speech in the digitally filtered voice signal. Further, it is performed through an objective function in which n is the frame index of the digitally filtered voice signal, N is the total number of frames, and w(n) is a single frame of standard voice information, that is, w(n) represents the standard voice information of each frame.
- The voiceprint feature extraction module 102 is used to extract the first voiceprint feature of the standard voice information.
- In detail, extracting the first voiceprint feature of the standard voice information includes: performing a discrete Fourier transform on the standard voice information to obtain its spectrum information; performing triangular filtering on the standard voice information with a triangular filter to obtain its frequency response values; performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies; and performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
- Preferably, in the calculation function of the discrete Fourier transform, N is the total number of frames of the digitally filtered voice signal, n is the frame index, w(n) is a single frame of standard voice information (that is, w(n) represents the standard voice information of each frame), j is the weight of the Fourier transform, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
- Because the triangular filter smooths the frequency spectrum, eliminates the effect of harmonics, and highlights the formants of the sound, the tone or pitch of a voice is not reflected in the voiceprint feature; in other words, the recognition result is not affected by differences in the pitch of the input voice. In the triangular filtering calculation, f(i) is the center frequency of the triangular filter, i is the index of the triangular filter, H(k) is the frequency response value, and k is the sound frequency of a single frame in the digitally filtered voice signal, that is, k can represent the sound frequency of each frame.
- Further, the logarithmic transformation computes the log energy output by each filter bank. Since people's response to sound pressure is logarithmic and people are less sensitive to subtle changes at high sound pressure than at low sound pressure, using logarithms reduces the sensitivity of the extracted features to changes in the energy of the input sound. In the logarithm calculation, i is the index of the triangular filter, k is the sound frequency of a single frame of the original voice information, N is the total number of frames of the digitally filtered voice signal, n is the frame index, D is the spectrum information, and S(i) is the log energy output by each filter.
- Preferably, S(i) undergoes a discrete cosine transform to obtain the voiceprint feature, where n is the frame index of the original voice information, i is the index of the triangular filter, M is the total number of triangular filters, S(i) is the log energy output by each filter, and x is the voiceprint feature.
- Further, in another embodiment of this application, extracting the first feature of the standard voice information includes extracting it with an LSTM (Long Short-Term Memory) network. The LSTM has three "gate" structures, namely the forget gate, the input gate, and the output gate, which apply different processing to the input information: through the forget gate, part of the passing information is forgotten by the neural unit, so that part of the speech features of the previous frame disappears during transmission and no longer enters the next neural unit for training; the input gate adds new useful information to the neural-unit state, that is, the speech features newly learned in this frame are processed and added to the transmitted information; finally, the output gate produces the output based on the neural-unit state and the processed information, and according to the output at the previous moment and the information to be output from the input at this moment, the output information at this moment is finally obtained as the first voiceprint feature.
- The graph data acquisition module 103 is configured to input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature.
- In this embodiment, the associated graph data related to the first voiceprint feature may include, but is not limited to, user label data corresponding to the first voiceprint feature and dialing records corresponding to the first voiceprint feature. Specifically, the user label data includes user attribute data such as gender, age, region, and work data.
- In detail, the association graph model may be constructed with a convolutional neural network, using sample voiceprint features as a training set and sample voiceprint features marked with user label data as a label set to train the association graph model.
- The vector conversion module 104 is configured to vectorize the associated graph data to obtain an associated feature vector. In detail, the vectorization is performed through an expression in which i denotes the index of the associated graph data, v_i denotes the N-dimensional matrix vector of associated graph data i, and v_j is the j-th element of the N-dimensional matrix vector.
- The judging module 105 is used to judge whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; the judging module is also used to judge whether the preset black relationship graph contains a label feature vector matching the associated feature vector.
- In detail, judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of the multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that a matching voiceprint feature exists. Alternatively, a first similarity set may be obtained by similarity calculation between the first voiceprint feature and the voiceprint features in the library; its maximum value is the first target similarity, and if the first target similarity is greater than the first similarity threshold, a matching voiceprint feature is determined to exist.
- In this embodiment, the blacklist voiceprint library is a voiceprint database obtained by extracting voiceprint feature vectors from the voices of blacklisted persons; for example, it may contain the voiceprint features of a bank's untrustworthy persons and/or the voiceprint feature library of criminals held by the public security department. Further, in the similarity function, x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the library, and sim(x, y_i) represents the first similarity.
- Similarly, judging whether the preset black relationship graph contains a label feature vector matching the associated feature vector includes: calculating, through the similarity function, the second similarity between the associated feature vector and each of the multiple label feature vectors in the preset black relationship graph; and if a second similarity greater than a second similarity threshold exists, determining that a matching label feature vector exists. Alternatively, a second similarity set may be obtained by similarity calculation between the associated feature vector and the label feature vectors in the graph; its maximum value is the second target similarity, and if it is greater than the second similarity threshold, a matching label feature vector is determined to exist.
- In this embodiment, the black relationship graph database is obtained by extracting label feature vectors from the label data of blacklisted persons; therefore, it contains the label feature vectors of the blacklisted persons' label data.
- The determining module 106 is configured to determine that the user is a risky user if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector. In this way, risky users can be identified more comprehensively and accurately.
- Further, if the user is determined to be a risky user, a risk-user reminder message is sent.
- As shown in FIG. 3, it is a schematic structural diagram of an electronic device implementing the method for identifying risky users based on voiceprint features and associated graph data of this application.
- The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program that is stored in the memory 11 and can run on the processor 10.
- The memory 11 includes at least one type of readable storage medium, including flash memory, mobile hard disks, multimedia cards, card-type memories (for example, SD or DX memories), magnetic memories, magnetic disks, and optical discs. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a mobile hard disk of the electronic device 1; in other embodiments, it may be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed in the electronic device 1 and various kinds of data, such as the code of the risk user identification program based on voiceprint features and associated graph data, but also to temporarily store data that has been output or will be output.
- In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the risk user identification program based on voiceprint features and associated graph data) and calling data stored in the memory 11.
- The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to implement connection and communication between the memory 11, the at least one processor 10, and other components.
- FIG. 3 shows only an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
- For example, the electronic device 1 may also include a power source (such as a battery) for supplying power to the components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, which implements functions such as charge management, discharge management, and power consumption management. The power source may also include one or more DC or AC power supplies, recharging devices, power-failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not repeated here.
- Further, the electronic device 1 may also include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface) and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
- Optionally, the electronic device 1 may also include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also properly be called a display screen or display unit, and is used to display the information processed in the electronic device 1 and to display a visualized user interface.
- The risk user identification program 12 based on voiceprint features and associated graph data stored in the memory 11 of the electronic device 1 is a combination of multiple instructions; when running in the processor 10, it can implement the risk user identification method described above, up to and including determining that the user is a risky user when either blacklist match exists.
- If the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM). The computer-readable storage medium may be non-volatile or volatile.
- The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
- In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
Abstract
A risk user identification method based on voiceprint features and associated graph data, including: obtaining standard voice information of a user (S1); extracting a first voiceprint feature of the standard voice information (S2); inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature (S3); vectorizing the associated graph data to obtain an associated feature vector (S4); and, if a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or a preset black relationship graph contains a label feature vector matching the associated feature vector, identifying the user as a risky user. Also provided are a risk user identification device based on voiceprint features and associated graph data, an electronic device, and a computer-readable storage medium. The method can reduce the missed-detection rate in identifying risky users and helps enhance the security of information.
Description
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on April 1, 2020, with application number 202010253799.0 and invention title "Risk user identification method and apparatus based on voiceprint features and associated graph data", the entire contents of which are incorporated in this application by reference.
This application relates to the field of artificial intelligence technology, and in particular to a risk user identification method and apparatus based on voiceprint features and associated graph data, an electronic device, and a computer-readable storage medium.
At present, information data is growing exponentially, and with this growth it becomes necessary to verify the security of user information so as to identify potentially risky users. The inventors realized that in the prior art, user information is verified mainly through single-item verification methods to identify risky users; this approach has security loopholes, is prone to missed detections, and leaves user information vulnerable to theft.
Summary of the Invention
This application provides a risk user identification method based on voiceprint features and associated graph data, including:
obtaining standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
vectorizing the associated graph data to obtain an associated feature vector;
judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and
judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector;
if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
To solve the above problem, this application also provides an electronic device, the electronic device including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the following risk user identification method based on voiceprint features and associated graph data:
obtaining standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
vectorizing the associated graph data to obtain an associated feature vector;
judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and
judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector;
if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
To solve the above problem, this application also provides a computer-readable storage medium storing at least one instruction, where the at least one instruction is executed by a processor in an electronic device to implement the following risk user identification method based on voiceprint features and associated graph data:
obtaining standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
vectorizing the associated graph data to obtain an associated feature vector;
judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and
judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector;
if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
To solve the above problem, this application also provides a risk user identification device based on voiceprint features and associated graph data, the device including:
a voice information acquisition module, used to obtain standard voice information of a user;
a voiceprint feature extraction module, used to extract a first voiceprint feature of the standard voice information;
a graph data acquisition module, used to input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
a vector conversion module, used to vectorize the associated graph data to obtain an associated feature vector;
a judging module, used to judge whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature;
the judging module, further used to judge whether a preset black relationship graph contains a label feature vector matching the associated feature vector;
a determining module, used to determine that the user is a risky user if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector.
FIG. 1 is a schematic flowchart of a risk user identification method based on voiceprint features and associated graph data provided by an embodiment of this application;
FIG. 2 is a schematic module diagram of a risk user identification device based on voiceprint features and associated graph data provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the internal structure of an electronic device implementing a risk user identification method based on voiceprint features and associated graph data provided by an embodiment of this application.
The realization of the purposes, functional features, and advantages of this application will be further described with reference to the accompanying drawings in combination with the embodiments.
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides a risk user identification method based on voiceprint features and associated graph data. Referring to FIG. 1, it is a schematic flowchart of the method provided by an embodiment of this application. The method can be executed by a device, and the device can be implemented by software and/or hardware.
In this embodiment, the risk user identification method based on voiceprint features and associated graph data includes:
S1: Obtain standard voice information of the user.
In this embodiment, the standard voice information of the user may be obtained from a voice database.
Further, obtaining the standard voice information of the user includes:
obtaining original voice information of the user;
sampling the original voice information with an analog/digital converter to obtain a digital voice signal;
performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal;
performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
In this embodiment, the user's original voice information is audio information containing the user's voice; the original voice may be voice information captured during a voice call with the user.
For example, when a bank loan officer conducts a telephone credit review of a loan applicant, a recording of the voice conversation between the reviewer and the applicant is obtained, and that recording is the original voice information.
In detail, the purpose of sampling the original voice information is to convert it into a digital signal, which facilitates further processing of the voice information.
In this embodiment, an analog/digital converter samples the original voice information at a rate of tens of thousands of times per second; each sample records the state of the original voice information at a certain moment, so that a digital voice signal covering the speech at different moments is obtained.
Because the human vocal system attenuates the high-frequency part of speech, in this embodiment the pre-emphasis operation performed in the above manner can increase the energy of the high-frequency part, so that the speech energies of the high-frequency and low-frequency parts have similar amplitudes; this flattens the frequency spectrum of the signal and keeps the same signal-to-noise ratio over the whole band from low to high frequencies.
In this embodiment, the pre-emphasis operation can compensate the digital voice signal. Specifically, it can be calculated as y(t) = x(t) - μx(t-1), where x(t) is the digital voice signal, t is time, y(t) is the digitally filtered voice signal, and μ is the adjustment value of the pre-emphasis operation, with μ in the range [0.9, 1.0].
In this embodiment, the framing and windowing operation removes the overlapping part of the speech in the digitally filtered voice signal.
For example, when a bank loan officer conducts a telephone credit review of a loan applicant, the original voice information contains overlapping speech from the loan officer and the applicant, so the framing and windowing operation can remove the loan officer's voice and keep the applicant's voice.
Further, performing the framing and windowing operation on the digitally filtered voice signal includes applying an objective function to the signal, where n is the frame index of the digitally filtered voice signal, N is the total number of frames of the digitally filtered voice signal, and w(n) is a single frame of standard voice information, that is, w(n) represents the standard voice information of each frame.
S2: Extract the first voiceprint feature of the standard voice information.
In detail, extracting the first voiceprint feature of the standard voice information includes:
performing a discrete Fourier transform on the standard voice information to obtain the spectrum information of the standard voice information;
performing triangular filtering on the standard voice information with a triangular filter to obtain the frequency response values of the standard voice information;
performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies;
performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
Preferably, in the calculation function of the discrete Fourier transform, N is the total number of frames of the digitally filtered voice signal, n is the frame index, w(n) is a single frame of standard voice information (that is, w(n) represents the standard voice information of each frame), j is the weight of the Fourier transform, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
Preferably, in this embodiment, a filter bank of M filters (which may be triangular filters) is defined; the center frequency of each filter is f(i), i = 1, 2, …, M, where the center frequency is the cutoff frequency of the filter, and the triangular filtering calculation is performed through the triangular filters.
Because the triangular filter smooths the frequency spectrum, eliminates the effect of harmonics, and highlights the formants of the sound, the tone or pitch of a voice is not reflected in the voiceprint feature; in other words, the recognition result is not affected by differences in the pitch of the input sound.
Preferably, in the triangular filtering calculation, f(i) is the center frequency of the triangular filter, i is the index of the triangular filter, H(k) is the frequency response value, and k is the sound frequency of a single frame in the digitally filtered voice signal, that is, k can represent the sound frequency of each frame.
Further, the logarithmic transformation computes the log energy output by each filter bank.
In general, people's response to sound pressure is logarithmic, and people are less sensitive to subtle changes at high sound pressure than at low sound pressure. Therefore, using logarithms in this embodiment can reduce the sensitivity of the extracted features to changes in the energy of the input sound.
Specifically, in the logarithm calculation, i is the index of the triangular filter, k is the sound frequency of a single frame of the original voice information, N is the total number of frames of the digitally filtered voice signal, n is the frame index, D is the spectrum information, and S(i) is the log energy output by each filter.
Preferably, S(i) undergoes a discrete cosine transform to obtain the voiceprint feature, where n is the frame index of the original voice information, i is the index of the triangular filter, M is the total number of triangular filters, S(i) is the log energy output by each filter, and x is the voiceprint feature.
Further, in another embodiment of this application, extracting the first feature of the standard voice information includes:
extracting the first feature of the standard voice information with an LSTM (Long Short-Term Memory) network. The LSTM has three "gate" structures, namely the forget gate, the input gate, and the output gate, which apply different processing to the input information: through the forget gate, part of the passing information is forgotten by the neural unit, so that part of the speech features of the previous frame disappears during transmission and no longer enters the next neural unit for training; the input gate adds new useful information to the neural-unit state, that is, the speech features newly learned in this frame are processed and added to the transmitted information; finally, the output gate produces the output based on the neural-unit state and the processed information, and according to the output at the previous moment and the information to be output from the input at this moment, the output information at this moment is finally obtained as the first voiceprint feature.
S3: Input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature.
In this embodiment, the associated graph data related to the first voiceprint feature may include, but is not limited to, user label data corresponding to the first voiceprint feature and dialing records corresponding to the first voiceprint feature. Specifically, the user label data includes user attribute data such as gender, age, region, and work data.
In detail, in this embodiment, the association graph model may be constructed with a convolutional neural network, using sample voiceprint features as a training set and sample voiceprint features marked with user label data as a label set to train the association graph model.
For example, inputting a user's first voiceprint feature into the preset association graph model yields the associated graph data related to that feature, such as the user's information (name, gender, age, region, work, and so on) or the historical dialing times and counts corresponding to the first voiceprint feature.
S4: Vectorize the associated graph data to obtain an associated feature vector.
In detail, the vectorization is performed through an expression in which i denotes the index of the associated graph data, v_i denotes the N-dimensional matrix vector of associated graph data i, and v_j is the j-th element of the N-dimensional matrix vector.
S5: Judge whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and judge whether the preset black relationship graph contains a label feature vector matching the associated feature vector.
In detail, judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of the multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
Alternatively, the judgment may include: performing similarity calculation between the first voiceprint feature and the voiceprint features in the preset black voiceprint library to obtain a first similarity set, whose maximum value is the first target similarity; if the first target similarity is greater than the first similarity threshold, it is determined that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
In this embodiment, the blacklist voiceprint library is a voiceprint database obtained by extracting voiceprint feature vectors from the voices of blacklisted persons.
For example, the blacklist voiceprint library may contain the voiceprint features of a bank's untrustworthy persons and/or the voiceprint feature library of criminals held by the public security department.
Further, in the similarity function, x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
Similarly, judging whether the preset black relationship graph contains a label feature vector matching the associated feature vector includes: calculating, through the similarity function, the second similarity between the associated feature vector and each of the multiple label feature vectors in the preset black relationship graph; and if a second similarity greater than a second similarity threshold exists, determining that the preset black relationship graph contains a label feature vector matching the associated feature vector.
Alternatively, the judgment may include: performing similarity calculation between the associated feature vector and the label feature vectors in the preset black relationship graph to obtain a second similarity set, whose maximum value is the second target similarity; if the second target similarity is greater than the second similarity threshold, it is determined that the preset black relationship graph contains a label feature vector matching the associated feature vector.
In this embodiment, the black relationship graph database is obtained by extracting label feature vectors from the label data of blacklisted persons; therefore, it contains the label feature vectors of the blacklisted persons' label data.
In this embodiment, the second similarity threshold may be the same as or different from the first similarity threshold: it may be greater than or smaller than the first similarity threshold. For example, the first similarity threshold may be 80% while the second similarity threshold is 90%.
S6: If the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determine that the user is a risky user.
Identifying the user as risky when either the black voiceprint library or the black relationship graph produces a match catches risky users more comprehensively and reduces the missed detections caused by single-item verification.
Further, if the user is determined to be a risky user, a risk-user reminder message is sent.
In the embodiments of this application, the standard voice information of the user is obtained; the first voiceprint feature of the standard voice information is extracted; the first voiceprint feature is input into a preset association graph model to obtain associated graph data related to the first voiceprint feature; the associated graph data is vectorized to obtain an associated feature vector; it is judged whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature and whether the preset black relationship graph contains a label feature vector matching the associated feature vector; and if either match exists, the user is determined to be a risky user. Through this dual verification over two channels, the missed-detection rate in identifying risky users is reduced, and the security of information is thereby enhanced.
As shown in FIG. 2, it is a functional module diagram of the risk user identification device based on voiceprint features and associated graph data of this application.
The risk user identification device 100 based on voiceprint features and associated graph data described in this application can be installed in an electronic device. According to the implemented functions, the device may include a voice information acquisition module 101, a voiceprint feature extraction module 102, a graph data acquisition module 103, a vector conversion module 104, a judging module 105, and a determining module 106. A module in this application, also called a unit, refers to a series of computer program segments that can be executed by the processor of an electronic device and can complete fixed functions, and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The voice information acquisition module 101 is used to obtain standard voice information of the user.
In this embodiment, the standard voice information of the user may be obtained from a voice database.
Further, obtaining the standard voice information of the user includes:
obtaining original voice information of the user;
sampling the original voice information with an analog/digital converter to obtain a digital voice signal;
performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal;
performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
In this embodiment, the user's original voice information is audio information containing the user's voice; the original voice may be voice information captured during a voice call with the user.
In detail, the purpose of sampling the original voice information is to convert it into a digital signal, which facilitates further processing. In this embodiment, an analog/digital converter samples the original voice information at a rate of tens of thousands of times per second; each sample records the state of the original voice information at a certain moment, so that a digital voice signal covering the speech at different moments is obtained.
Because the human vocal system attenuates the high-frequency part of speech, the pre-emphasis operation performed in the above manner can increase the energy of the high-frequency part, so that the speech energies of the high-frequency and low-frequency parts have similar amplitudes; this flattens the frequency spectrum of the signal and keeps the same signal-to-noise ratio over the whole band from low to high frequencies. In this embodiment, the pre-emphasis operation can compensate the digital voice signal; specifically, it can be calculated as y(t) = x(t) - μx(t-1), where x(t) is the digital voice signal, t is time, y(t) is the digitally filtered voice signal, and μ is the adjustment value of the pre-emphasis operation, with μ in the range [0.9, 1.0].
In this embodiment, the framing and windowing operation removes the overlapping part of the speech in the digitally filtered voice signal. Further, it is performed through an objective function in which n is the frame index of the digitally filtered voice signal, N is the total number of frames, and w(n) is a single frame of standard voice information, that is, w(n) represents the standard voice information of each frame.
The voiceprint feature extraction module 102 is used to extract the first voiceprint feature of the standard voice information.
In detail, extracting the first voiceprint feature of the standard voice information includes:
performing a discrete Fourier transform on the standard voice information to obtain the spectrum information of the standard voice information;
performing triangular filtering on the standard voice information with a triangular filter to obtain the frequency response values of the standard voice information;
performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies;
performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
Preferably, in the calculation function of the discrete Fourier transform, N is the total number of frames of the digitally filtered voice signal, n is the frame index, w(n) is a single frame of standard voice information (that is, w(n) represents the standard voice information of each frame), j is the weight of the Fourier transform, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
Preferably, in this embodiment, a filter bank of M filters (which may be triangular filters) is defined; the center frequency of each filter is f(i), i = 1, 2, …, M, where the center frequency is the cutoff frequency of the filter, and the triangular filtering calculation is performed through the triangular filters.
Because the triangular filter smooths the frequency spectrum, eliminates the effect of harmonics, and highlights the formants of the sound, the tone or pitch of a voice is not reflected in the voiceprint feature; in other words, the recognition result is not affected by differences in the pitch of the input sound.
Preferably, in the triangular filtering calculation, f(i) is the center frequency of the triangular filter, i is the index of the triangular filter, H(k) is the frequency response value, and k is the sound frequency of a single frame in the digitally filtered voice signal, that is, k can represent the sound frequency of each frame.
Further, the logarithmic transformation computes the log energy output by each filter bank.
In general, people's response to sound pressure is logarithmic, and people are less sensitive to subtle changes at high sound pressure than at low sound pressure. Therefore, using logarithms in this embodiment can reduce the sensitivity of the extracted features to changes in the energy of the input sound.
Specifically, in the logarithm calculation, i is the index of the triangular filter, k is the sound frequency of a single frame of the original voice information, N is the total number of frames of the digitally filtered voice signal, n is the frame index, D is the spectrum information, and S(i) is the log energy output by each filter.
Preferably, S(i) undergoes a discrete cosine transform to obtain the voiceprint feature, where n is the frame index of the original voice information, i is the index of the triangular filter, M is the total number of triangular filters, S(i) is the log energy output by each filter, and x is the voiceprint feature.
Further, in another embodiment of this application, extracting the first feature of the standard voice information includes:
extracting the first feature of the standard voice information with an LSTM (Long Short-Term Memory) network. The LSTM has three "gate" structures, namely the forget gate, the input gate, and the output gate, which apply different processing to the input information: through the forget gate, part of the passing information is forgotten by the neural unit, so that part of the speech features of the previous frame disappears during transmission and no longer enters the next neural unit for training; the input gate adds new useful information to the neural-unit state, that is, the speech features newly learned in this frame are processed and added to the transmitted information; finally, the output gate produces the output based on the neural-unit state and the processed information, and according to the output at the previous moment and the information to be output from the input at this moment, the output information at this moment is finally obtained as the first voiceprint feature.
The graph data acquisition module 103 is used to input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature.
In this embodiment, the associated graph data related to the first voiceprint feature may include, but is not limited to, user label data corresponding to the first voiceprint feature and dialing records corresponding to the first voiceprint feature. Specifically, the user label data includes user attribute data such as gender, age, region, and work data.
In detail, in this embodiment, the association graph model may be constructed with a convolutional neural network, using sample voiceprint features as a training set and sample voiceprint features marked with user label data as a label set to train the association graph model.
The vector conversion module 104 is used to vectorize the associated graph data to obtain an associated feature vector.
In detail, the vectorization is performed through an expression in which i denotes the index of the associated graph data, v_i denotes the N-dimensional matrix vector of associated graph data i, and v_j is the j-th element of the N-dimensional matrix vector.
The judging module 105 is used to judge whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; the judging module is also used to judge whether the preset black relationship graph contains a label feature vector matching the associated feature vector.
In detail, judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of the multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
Alternatively, the judgment may include: performing similarity calculation between the first voiceprint feature and the voiceprint features in the preset black voiceprint library to obtain a first similarity set, whose maximum value is the first target similarity; if the first target similarity is greater than the first similarity threshold, it is determined that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
In this embodiment, the blacklist voiceprint library is a voiceprint database obtained by extracting voiceprint feature vectors from the voices of blacklisted persons.
For example, the blacklist voiceprint library may contain the voiceprint features of a bank's untrustworthy persons and/or the voiceprint feature library of criminals held by the public security department.
Further, in the similarity function, x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
Similarly, judging whether the preset black relationship graph contains a label feature vector matching the associated feature vector includes: calculating, through the similarity function, the second similarity between the associated feature vector and each of the multiple label feature vectors in the preset black relationship graph; and if a second similarity greater than a second similarity threshold exists, determining that the preset black relationship graph contains a label feature vector matching the associated feature vector.
Alternatively, the judgment may include: performing similarity calculation between the associated feature vector and the label feature vectors in the preset black relationship graph to obtain a second similarity set, whose maximum value is the second target similarity; if the second target similarity is greater than the second similarity threshold, it is determined that the preset black relationship graph contains a label feature vector matching the associated feature vector.
In this embodiment, the black relationship graph database is obtained by extracting label feature vectors from the label data of blacklisted persons; therefore, it contains the label feature vectors of the blacklisted persons' label data.
The determining module 106 is used to determine that the user is a risky user if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector.
Identifying the user as risky when either the black voiceprint library or the black relationship graph produces a match catches risky users more comprehensively and accurately.
Further, if the user is determined to be a risky user, a risk-user reminder message is sent.
As shown in FIG. 3, it is a schematic structural diagram of an electronic device implementing the risk user identification method based on voiceprint features and associated graph data of this application.
The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program that is stored in the memory 11 and can run on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, mobile hard disks, multimedia cards, card-type memories (for example, SD or DX memories), magnetic memories, magnetic disks, and optical discs. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a mobile hard disk of the electronic device 1; in other embodiments, it may be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 can be used not only to store application software installed in the electronic device 1 and various kinds of data, such as the code of the risk user identification program based on voiceprint features and associated graph data, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple integrated circuits packaged with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control core (control unit) of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (for example, the risk user identification program based on voiceprint features and associated graph data) and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to implement connection and communication between the memory 11, the at least one processor 10, and other components.
FIG. 3 shows only an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 3 does not limit the electronic device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
For example, the electronic device 1 may also include a power source (such as a battery) for supplying power to the components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, which implements functions such as charge management, discharge management, and power consumption management. The power source may also include one or more DC or AC power supplies, recharging devices, power-failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device 1 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not repeated here.
Further, the electronic device 1 may also include a network interface, which may optionally include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface) and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may also include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also properly be called a display screen or display unit, and is used to display the information processed in the electronic device 1 and to display a visualized user interface.
It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.
The risk user identification program 12 based on voiceprint features and associated graph data stored in the memory 11 of the electronic device 1 is a combination of multiple instructions; when running in the processor 10, it can implement:
obtaining standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
vectorizing the associated graph data to obtain an associated feature vector;
judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and
judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector;
if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
Specifically, for the specific implementation of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.
Further, if the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM). The computer-readable storage medium may be non-volatile or volatile.
In the several embodiments provided in this application, it should be understood that the disclosed equipment, devices, and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative; for example, the division of the modules is only a logical functional division, and there may be other divisions in actual implementation.
The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that this application can be implemented in other specific forms without departing from its spirit or basic features.
Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalent elements of the claims be included in this application. Any reference sign in the claims shall not be regarded as limiting the claim concerned.
In addition, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims can also be implemented by one unit or device through software or hardware. Words such as "second" are used to indicate names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of this application and not to limit it. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of this application can be modified or equivalently replaced without departing from the spirit and scope of the technical solution of this application.
Claims (20)
- A risk user identification method based on voiceprint features and associated graph data, wherein the method includes: obtaining standard voice information of a user; extracting a first voiceprint feature of the standard voice information; inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature; vectorizing the associated graph data to obtain an associated feature vector; judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector; and if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
- The risk user identification method based on voiceprint features and associated graph data according to claim 1, wherein obtaining the standard voice information of the user includes: obtaining original voice information of the user; sampling the original voice information with an analog/digital converter to obtain a digital voice signal; performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal; and performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
- The risk user identification method based on voiceprint features and associated graph data according to claim 1, wherein extracting the first voiceprint feature of the standard voice information includes: performing a discrete Fourier transform on the standard voice information to obtain the spectrum information of the standard voice information; performing triangular filtering on the standard voice information with a triangular filter to obtain the frequency response values of the standard voice information; performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies; and performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
- The risk user identification method based on voiceprint features and associated graph data according to any one of claims 1 to 3, wherein judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
- An electronic device, wherein the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the following risk user identification method based on voiceprint features and associated graph data: obtaining standard voice information of a user; extracting a first voiceprint feature of the standard voice information; inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature; vectorizing the associated graph data to obtain an associated feature vector; judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector; and if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
- The electronic device according to claim 8, wherein obtaining the standard voice information of the user includes: obtaining original voice information of the user; sampling the original voice information with an analog/digital converter to obtain a digital voice signal; performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal; and performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
- The electronic device according to claim 8, wherein extracting the first voiceprint feature of the standard voice information includes: performing a discrete Fourier transform on the standard voice information to obtain the spectrum information of the standard voice information; performing triangular filtering on the standard voice information with a triangular filter to obtain the frequency response values of the standard voice information; performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies; and performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
- The electronic device according to any one of claims 8 to 10, wherein judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
- A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following risk user identification method based on voiceprint features and associated graph data is implemented: obtaining standard voice information of a user; extracting a first voiceprint feature of the standard voice information; inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature; vectorizing the associated graph data to obtain an associated feature vector; judging whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature; and judging whether a preset black relationship graph contains a label feature vector matching the associated feature vector; and if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector, determining that the user is a risky user.
- The computer-readable storage medium according to claim 14, wherein obtaining the standard voice information of the user includes: obtaining original voice information of the user; sampling the original voice information with an analog/digital converter to obtain a digital voice signal; performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal; and performing a framing and windowing operation on the digitally filtered voice signal to obtain the standard voice information.
- The computer-readable storage medium according to claim 14, wherein extracting the first voiceprint feature of the standard voice information includes: performing a discrete Fourier transform on the standard voice information to obtain the spectrum information of the standard voice information; performing triangular filtering on the standard voice information with a triangular filter to obtain the frequency response values of the standard voice information; performing a logarithmic calculation on the spectrum information and the frequency response values to obtain log energies; and performing a discrete cosine calculation on the log energies to obtain the first voiceprint feature.
- The computer-readable storage medium according to any one of claims 14 to 16, wherein judging whether the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature includes: calculating, through a similarity function, the first similarity between the first voiceprint feature and each of multiple voiceprint features in the preset black voiceprint library; and if a first similarity greater than a first similarity threshold exists, determining that the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature.
- A risk user identification device based on voiceprint features and associated graph data, wherein the device includes: a voice information acquisition module, used to obtain standard voice information of a user; a voiceprint feature extraction module, used to extract a first voiceprint feature of the standard voice information; a graph data acquisition module, used to input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature; a vector conversion module, used to vectorize the associated graph data to obtain an associated feature vector; a judging module, used to judge whether a preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, and further used to judge whether a preset black relationship graph contains a label feature vector matching the associated feature vector; and a determining module, used to determine that the user is a risky user if the preset black voiceprint library contains a voiceprint feature matching the first voiceprint feature, or the preset black relationship graph contains a label feature vector matching the associated feature vector.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010253799.0A CN111552832A (zh) | 2020-04-01 | 2020-04-01 | Risk user identification method and apparatus based on voiceprint features and associated graph data |
CN202010253799.0 | 2020-04-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021196477A1 (zh) | 2021-10-07 |
Family
ID=72004275
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/106017 WO2021196477A1 (zh) | 2020-04-01 | 2020-07-30 | Risk user identification method and apparatus based on voiceprint features and associated graph data |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111552832A (zh) |
WO (1) | WO2021196477A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113393318A (zh) * | 2021-06-10 | 2021-09-14 | Industrial and Commercial Bank of China Limited | Bank card application risk control method and apparatus, electronic device, and medium |
CN113590873A (zh) * | 2021-07-23 | 2021-11-02 | China CITIC Bank Corporation Limited | Processing method and apparatus for a whitelist voiceprint feature library, and electronic device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9767455B1 (en) * | 2001-07-06 | 2017-09-19 | Hossein Mohsenzadeh | Secure authentication and payment system |
CN109428719A (zh) * | 2017-08-22 | 2019-03-05 | Alibaba Group Holding Limited | Identity verification method, apparatus, and device |
CN110738998A (zh) * | 2019-09-11 | 2020-01-31 | Shenzhen OneConnect Smart Technology Co., Ltd. | Voice-based personal credit evaluation method, apparatus, terminal, and storage medium |
CN110855740A (zh) * | 2019-09-27 | 2020-02-28 | Shenzhen Huole Technology Development Co., Ltd. | Information push method and related device |
CN110896352A (zh) * | 2018-09-12 | 2020-03-20 | Alibaba Group Holding Limited | Identity recognition method, apparatus, and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107481720B (zh) * | 2017-06-30 | 2021-03-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Explicit voiceprint recognition method and apparatus |
CN107993071A (zh) * | 2017-11-21 | 2018-05-04 | Ping An Technology (Shenzhen) Co., Ltd. | Electronic apparatus, voiceprint-based identity verification method, and storage medium |
CN110047490A (zh) * | 2019-03-12 | 2019-07-23 | Ping An Technology (Shenzhen) Co., Ltd. | Voiceprint recognition method, apparatus, and device, and computer-readable storage medium |
CN110767238B (zh) * | 2019-09-19 | 2023-07-04 | Ping An Technology (Shenzhen) Co., Ltd. | Blacklist identification method, apparatus, device, and storage medium based on address information |
- 2020-04-01: CN application CN202010253799.0A filed; published as CN111552832A (status: pending)
- 2020-07-30: PCT application PCT/CN2020/106017 filed; published as WO2021196477A1 (status: application filing)
Also Published As
Publication number | Publication date |
---|---|
CN111552832A (zh) | 2020-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021208287A1 (zh) | Voice endpoint detection method and apparatus for emotion recognition, electronic device, and storage medium | |
US11610394B2 | Neural network model training method and apparatus, living body detecting method and apparatus, device and storage medium | |
WO2022116420A1 (zh) | Speech event detection method and apparatus, electronic device, and computer storage medium | |
US10522136B2 | Method and device for training acoustic model, computer device and storage medium | |
WO2021000678A1 (zh) | Enterprise credit review method, apparatus, and device, and computer-readable storage medium | |
CN109087670B (zh) | Emotion analysis method, system, server, and storage medium | |
CN110619568A (zh) | Risk assessment report generation method, apparatus, device, and storage medium | |
WO2020238046A1 (zh) | Intelligent human voice detection method and apparatus, and computer-readable storage medium | |
CN108550065B (zh) | Comment data processing method, apparatus, and device | |
WO2021151310A1 (zh) | Noise cancellation method and apparatus for voice calls, electronic device, and storage medium | |
CN108962231B (zh) | Speech classification method, apparatus, server, and storage medium | |
WO2021196477A1 (zh) | Risk user identification method and apparatus based on voiceprint features and associated graph data | |
CN113903363B (zh) | Artificial-intelligence-based violation detection method, apparatus, device, and medium | |
CN109947971B (zh) | Image retrieval method, apparatus, electronic device, and storage medium | |
CN113807103B (zh) | Artificial-intelligence-based recruitment method, apparatus, device, and storage medium | |
WO2020140609A1 (zh) | Speech recognition method and device, and computer-readable storage medium | |
CN113327586A (zh) | Speech recognition method and apparatus, electronic device, and storage medium | |
CN108847251B (zh) | Speech deduplication method, apparatus, server, and storage medium | |
CN112489628B (zh) | Voice data selection method, apparatus, electronic device, and storage medium | |
CN116450797A (zh) | Multimodal-dialogue-based emotion classification method, apparatus, device, and medium | |
CN116542783A (zh) | Artificial-intelligence-based risk assessment method, apparatus, device, and storage medium | |
CN116306656A (zh) | Entity relation extraction method, apparatus, device, and storage medium | |
CN115631748A (zh) | Emotion recognition method and apparatus based on voice dialogue, electronic device, and medium | |
CN113555026B (zh) | Voice conversion method and apparatus, electronic device, and medium | |
CN111985231B (zh) | Unsupervised role recognition method and apparatus, electronic device, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20928963; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 190123) |
122 | Ep: pct application non-entry in european phase | Ref document number: 20928963; Country of ref document: EP; Kind code of ref document: A1 |