CN111552832A - Risk user identification method and device based on voiceprint features and associated map data - Google Patents

Risk user identification method and device based on voiceprint features and associated map data

Info

Publication number
CN111552832A
CN111552832A (application CN202010253799.0A)
Authority
CN
China
Prior art keywords
voiceprint
voice information
user
feature
voiceprint features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010253799.0A
Other languages
Chinese (zh)
Inventor
刘微微
马坤
赵之砚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010253799.0A priority Critical patent/CN111552832A/en
Priority to PCT/CN2020/106017 priority patent/WO2021196477A1/en
Publication of CN111552832A publication Critical patent/CN111552832A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/049 - Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 - Credit; Loans; Processing thereof
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating

Abstract

The invention relates to artificial intelligence technology and discloses a risk user identification method based on voiceprint features and associated map data, which comprises the following steps: acquiring standard voice information of a user; extracting a first voiceprint feature of the standard voice information; inputting the first voiceprint feature into a preset association map model to obtain associated map data related to the first voiceprint feature; vectorizing the associated map data to obtain an associated feature vector; and if a voiceprint feature matching the first voiceprint feature exists in a preset black voiceprint library, or a label feature vector matching the associated feature vector exists in a preset black relation map, determining that the user is a risk user. The invention also provides a risk user identification device based on voiceprint features and associated map data, an electronic device, and a computer-readable storage medium. The method and device can reduce the miss rate for risk users and help enhance information security.

Description

Risk user identification method and device based on voiceprint features and associated map data
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a risk user identification method and device based on voiceprint features and associated map data, electronic equipment and a computer-readable storage medium.
Background
With information data growing exponentially, it has become necessary to perform security verification on user information in order to identify potential risk users. In the prior art, user information is verified mainly by a single verification method to identify risk users. This approach has security holes, is prone to missed detections, and leaves user information vulnerable to theft.
Disclosure of Invention
The invention provides a method and a device for identifying risk users based on voiceprint features and associated map data, an electronic device, and a computer-readable storage medium, with the main aim of reducing the miss rate for risk users and helping to enhance information security.
In order to achieve the above object, the present invention provides a method for identifying a risky user based on voiceprint features and associated map data, comprising:
acquiring standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain association graph data related to the first voiceprint feature;
vectorizing the associated map data to obtain associated feature vectors;
judging whether a preset black voiceprint library has a voiceprint feature matched with the first voiceprint feature; and
judging whether a label feature vector matched with the associated feature vector exists in a preset black relation map;
and if the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library or the label feature vectors matched with the associated feature vectors exist in the preset black relation atlas, determining that the user is a risk user.
Optionally, the acquiring the standard voice information of the user includes:
acquiring original voice information of the user;
sampling the original voice information by using an analog/digital converter to obtain a digital voice signal;
carrying out pre-emphasis operation on the digital voice signal to obtain a digital filtering voice signal;
and performing frame division and windowing operation on the digital filtering voice signal to obtain the standard voice information.
Optionally, the performing a framing windowing operation on the digitally filtered speech signal includes:
performing frame-division windowing operation on the digital filtering voice signal through an objective function, wherein the objective function is as follows:
w(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1
wherein n is the frame index of the digitally filtered voice signal, N is the total number of frames of the digitally filtered voice signal, and w(n) is the single-frame data of the standard voice information.
Optionally, the extracting the first voiceprint feature of the standard voice information includes:
performing discrete Fourier transform on the standard voice information to obtain frequency spectrum information of the standard voice information;
performing triangular filtering calculation on the standard voice information by using a triangular filter to obtain a frequency response value of the standard voice information;
carrying out logarithmic calculation on the frequency spectrum information and the frequency response value to obtain logarithmic energy;
and performing discrete cosine calculation on the logarithmic energy to obtain the first voiceprint characteristic.
Optionally, the discrete fourier transform comprises a calculation function of:
D(k) = Σ_{n=0}^{N-1} w(n)·e^(-j2πkn/N), k = 0, 1, …, N-1
wherein N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, w(n) is the single-frame data of the standard voice information, j is the imaginary unit, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
Optionally, the determining whether the voiceprint feature matched with the first voiceprint feature exists in the preset black voiceprint library includes:
respectively calculating first similarity of the first voiceprint features and a plurality of voiceprint features in a preset black voiceprint library through a similarity function;
and if the first similarity which is larger than the first similarity threshold exists, determining that the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library.
Optionally, the similarity function is:
sim(x, y_i) = (x·y_i) / (‖x‖·‖y_i‖), i = 1, 2, …, n
wherein x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
In order to solve the above problem, the present invention further provides a risk user identification device based on voiceprint features and associated map data, the device comprising:
the voice information acquisition module is used for acquiring standard voice information of a user;
the voiceprint feature extraction module is used for extracting a first voiceprint feature of the standard voice information;
the spectrum data acquisition module is used for inputting the first voiceprint characteristics to a preset associated spectrum model to obtain associated spectrum data related to the first voiceprint characteristics;
the vector conversion module is used for vectorizing the associated map data to obtain associated feature vectors;
the judging module is used for judging whether the preset black voiceprint library has the voiceprint characteristics matched with the first voiceprint characteristics;
the judging module is also used for judging whether a label feature vector matched with the associated feature vector exists in a preset black relation map;
and the determining module is used for determining that the user is a risk user if the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library or the label feature vectors matched with the associated feature vectors exist in the preset black relation atlas.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the risk user identification method based on voiceprint features and associated map data described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, having at least one instruction stored therein, where the at least one instruction is executed by a processor in an electronic device to implement any one of the above risk user identification methods based on voiceprint features and associated atlas data.
In the embodiment of the invention, standard voice information of a user is acquired; a first voiceprint feature of the standard voice information is extracted; the first voiceprint feature is input into a preset association map model to obtain associated map data related to the first voiceprint feature; the associated map data is vectorized to obtain an associated feature vector; it is judged whether a voiceprint feature matching the first voiceprint feature exists in a preset black voiceprint library, and whether a label feature vector matching the associated feature vector exists in a preset black relation map; and if either match exists, the user is determined to be a risk user. Through this double verification over two channels, the miss rate for risk users is reduced and information security is further enhanced.
Drawings
Fig. 1 is a schematic flowchart of a method for identifying a risky user based on voiceprint features and associated map data according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a risk user identification apparatus based on voiceprint features and associated map data according to an embodiment of the present invention;
fig. 3 is a schematic internal structural diagram of an electronic device implementing a risk user identification method based on voiceprint features and associated map data according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a risk user identification method based on voiceprint features and associated map data. Referring to fig. 1, a schematic flow chart of a risk user identification method based on voiceprint features and associated map data according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for identifying a risk user based on voiceprint features and associated map data includes:
and S1, acquiring the standard voice information of the user.
In this embodiment, the standard voice information of the user may be acquired from a voice database.
Further, the acquiring the standard voice information of the user includes:
acquiring original voice information of the user;
sampling the original voice information by using an analog/digital converter to obtain a digital voice signal;
carrying out pre-emphasis operation on the digital voice signal to obtain a digital filtering voice signal;
and performing frame division and windowing operation on the digital filtering voice signal to obtain the standard voice information.
In this embodiment, the original voice information of the user is audio information including a voice of the user, and the original voice may be voice information acquired in a voice call with the user.
For example, when a bank loan officer audits the credit of a loan user, the officer obtains a recording of the call between the loan auditor and the loan user; this recording is the original voice information.
In detail, the sampling of the original voice information is to convert the original voice information into a digital signal, which is convenient for processing the voice information.
In this embodiment, the original voice information is sampled ten thousand times per second by the analog/digital converter, and each sample records the state of the original voice information at a given moment, so that digital voice signals at different times are obtained.
Since the human vocal system suppresses the high-frequency part of speech, the pre-emphasis operation in this embodiment increases the energy of the high-frequency part so that it has an amplitude similar to that of the low-frequency part; this flattens the signal spectrum and maintains the same signal-to-noise ratio over the whole band from low to high frequency.
In this embodiment, the pre-emphasis operation may compensate the digital speech signal.
Specifically, the pre-emphasis operation may be calculated as y(t) = x(t) - μ·x(t-1), where x(t) is the digital voice signal, t is time, y(t) is the digitally filtered voice signal, and μ is the adjustment coefficient of the pre-emphasis operation, with μ in the range [0.9, 1.0].
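As an illustration, a minimal NumPy sketch of this pre-emphasis step follows; the coefficient value 0.97 is an assumption chosen from the stated range [0.9, 1.0], not a value fixed by the text:

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, mu: float = 0.97) -> np.ndarray:
    # y(t) = x(t) - mu * x(t-1); the first sample passes through unchanged
    return np.append(signal[:1], signal[1:] - mu * signal[:-1])
```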
In this embodiment, the frame windowing is performed to remove the overlapping portion of the speech in the digitally filtered speech signal.
For example, when a bank loan officer audits a loan user's credit by telephone, the original voice information contains portions where the officer's voice and the loan user's voice overlap; the framing and windowing operation can therefore be used to remove the officer's voice and retain the loan user's voice.
Further, the performing a framing windowing operation on the digitally filtered speech signal comprises:
performing frame-division windowing operation on the digital filtering voice signal through an objective function, wherein the objective function is as follows:
w(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1
wherein n is the frame index of the digitally filtered voice signal, N is the total number of frames of the digitally filtered voice signal, and w(n) is the single-frame data of the standard voice information, i.e., w(n) represents the standard voice information of each frame.
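A minimal sketch of the framing and windowing operation, treating the objective function above as a Hamming window; the frame length and hop size (25 ms and 10 ms at a 16 kHz sampling rate) are illustrative assumptions:

```python
import numpy as np

def frame_and_window(signal: np.ndarray,
                     frame_len: int = 400,    # assumed: 25 ms at 16 kHz
                     frame_step: int = 160    # assumed: 10 ms hop
                     ) -> np.ndarray:
    # Split the digitally filtered signal into overlapping frames
    # (assumes len(signal) >= frame_len) ...
    num_frames = 1 + (len(signal) - frame_len) // frame_step
    frames = np.stack([signal[i * frame_step: i * frame_step + frame_len]
                       for i in range(num_frames)])
    # ... and apply w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) to each frame
    n = np.arange(frame_len)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))
    return frames * window
```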
And S2, extracting a first voiceprint feature of the standard voice information.
In detail, extracting a first voiceprint feature of the standard speech information includes:
performing discrete Fourier transform on the standard voice information to obtain frequency spectrum information of the standard voice information;
performing triangular filtering calculation on the standard voice information by using a triangular filter to obtain a frequency response value of the standard voice information;
carrying out logarithmic calculation on the frequency spectrum information and the frequency response value to obtain logarithmic energy;
and performing discrete cosine calculation on the logarithmic energy to obtain the first voiceprint characteristic.
Preferably, the discrete fourier transform comprises the computational function:
D(k) = Σ_{n=0}^{N-1} w(n)·e^(-j2πkn/N), k = 0, 1, …, N-1
wherein N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, w(n) is the single-frame data of the standard voice information, i.e., w(n) represents the standard voice information of each frame, j is the imaginary unit, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
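In NumPy this transform can be sketched with the built-in FFT; using rfft, which keeps only the non-redundant half of the spectrum of a real signal, is an implementation choice rather than something the text specifies:

```python
import numpy as np

def frame_spectrum(frames: np.ndarray) -> np.ndarray:
    # D(k) = sum_n w(n) * exp(-j*2*pi*k*n/N) for each windowed frame
    return np.fft.rfft(frames, axis=-1)
```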
Preferably, in this embodiment, a filter bank with M filters is defined (the filters may be triangular filters), with center frequencies f(i), i = 1, 2, …, M; the center frequency is also the cut-off frequency of the filter, and the triangular filtering calculation is performed by these triangular filters.
Because the triangular filter smooths the frequency spectrum and suppresses the effect of harmonics, it highlights the formants of the sound. The tone or pitch of a segment of sound is therefore not reflected in the voiceprint feature; that is, the voiceprint feature is unaffected by pitch differences in the input sound.
Preferably, the triangular filtering is calculated as follows:
H_i(k) = 0 for k < f(i-1);
H_i(k) = (k - f(i-1)) / (f(i) - f(i-1)) for f(i-1) ≤ k ≤ f(i);
H_i(k) = (f(i+1) - k) / (f(i+1) - f(i)) for f(i) < k ≤ f(i+1);
H_i(k) = 0 for k > f(i+1)
where f(i) is the center frequency of the i-th triangular filter, i indexes the triangular filters, H(k) is the frequency response value, and k is the sound frequency of a single frame in the digitally filtered voice signal, i.e., k represents the sound frequency of each frame.
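A sketch of building such a filter bank. It assumes the center frequencies have already been mapped to spectral bin indices and that two edge points f(0) and f(M+1) are supplied alongside the M centers; the text does not specify how the f(i) are spaced:

```python
import numpy as np

def triangular_filterbank(centers: np.ndarray, num_bins: int) -> np.ndarray:
    # `centers` holds M+2 strictly increasing bin indices
    # f(0), f(1), ..., f(M), f(M+1); filter i rises from f(i-1) to f(i)
    # and falls from f(i) to f(i+1), matching the piecewise formula above
    M = len(centers) - 2
    H = np.zeros((M, num_bins))
    for i in range(1, M + 1):
        lo, c, hi = centers[i - 1], centers[i], centers[i + 1]
        for k in range(num_bins):
            if lo <= k <= c:
                H[i - 1, k] = (k - lo) / (c - lo)
            elif c < k <= hi:
                H[i - 1, k] = (hi - k) / (hi - c)
    return H
```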
Further, the log transform is to compute the log energy of each filter bank output.
Generally, the human response to sound pressure is logarithmic: people are less sensitive to fine changes at high sound pressure than at low sound pressure. Using logarithms in this embodiment therefore reduces the sensitivity of the extracted features to variations in the energy of the input sound.
Specifically, the logarithmic calculation can be performed by the following formula:
S(i) = ln( Σ_{k=0}^{N-1} |D(k)|²·H_i(k) ), i = 1, 2, …, M
wherein i indexes the triangular filters, k is the sound frequency of a single frame of the original voice information, N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, D is the spectrum information, and S(i) is the logarithmic energy output by each filter.
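A corresponding sketch; the small epsilon guarding against log(0) is an implementation detail, not part of the text:

```python
import numpy as np

def log_energies(spectrum: np.ndarray, H: np.ndarray) -> np.ndarray:
    # S(i) = log of the H_i(k)-weighted power |D(k)|^2 summed over k
    power = np.abs(spectrum) ** 2        # (num_frames, num_bins)
    return np.log(power @ H.T + 1e-10)   # (num_frames, M)
```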
Preferably, s (i) is subjected to a discrete cosine transform to obtain the voiceprint features, the discrete cosine transform being as follows:
x(n) = Σ_{i=1}^{M} S(i)·cos(πn(i - 0.5)/M)
wherein n is the frame index of the original voice information, i indexes the triangular filters, M is the total number of triangular filters, S(i) is the logarithmic energy output by each filter, and x is the voiceprint feature.
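A sketch of this discrete cosine step; keeping only the first 13 coefficients per frame is a conventional choice in voiceprint pipelines, not one stated in the text:

```python
import numpy as np

def voiceprint_features(S: np.ndarray, num_coeffs: int = 13) -> np.ndarray:
    # x(n) = sum_{i=1..M} S(i) * cos(pi*n*(i - 0.5)/M)
    M = S.shape[-1]
    i = np.arange(1, M + 1)                      # filter index
    n = np.arange(num_coeffs)[:, None]           # coefficient index
    basis = np.cos(np.pi * n * (i - 0.5) / M)    # (num_coeffs, M)
    return S @ basis.T                           # (num_frames, num_coeffs)
```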
Further, in another embodiment of the present invention, the extracting the first feature of the standard voice information includes:
and extracting the first characteristic of the standard voice information by using an LSTM (Long Short-Term Memory) network. The LSTM has three "gate" structures, which are a forgetting gate (forget gate), an input gate (input gate), and an output gate (output gate), and is used to perform different processing on input information. The forgetting gate has the advantages that part of information passing through as the name implies is forgotten from the neural unit, so that part of the voice features of the previous frame disappear in transmission, and the training can not be carried out in the next neural unit; the input gate is used for adding new useful information into the state of the neural unit, namely adding the newly learned speech features of the frame into the transmitted information after processing; and finally, the output gate is used for outputting information based on the state of the nerve unit and the processed information, and finally obtaining the output information at the moment as the first voiceprint characteristic according to the output at the previous moment and the information to be output in the input at the moment.
And S3, inputting the first voiceprint feature into a preset association map model to obtain association map data related to the first voiceprint feature.
In this embodiment, the associated map data related to the first voiceprint feature may include, but is not limited to, user tag data corresponding to the first voiceprint feature and a dialing record corresponding to the first voiceprint feature. Specifically, the user tag data includes attribute feature data of the user, such as: gender, age, location, job data, etc.
In detail, in this embodiment, the association map model may be constructed using a convolutional neural network, with training completed using sample voiceprint features as the training set and the sample voiceprint features annotated with user tag data as the label set.
For example: inputting the first voiceprint feature of a user into the preset association map model yields associated map data related to that feature, such as the user information corresponding to the first voiceprint feature (name, gender, age, region, work, etc.), or the historical dialing times and counts corresponding to the first voiceprint feature.
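As a sketch of what such a model might look like, the toy 1-D CNN below maps a voiceprint feature vector to user-tag logits; the layer sizes, tag-class count, and exact architecture are assumptions, since the text only states that a convolutional neural network is trained on tagged sample voiceprint features:

```python
import torch
import torch.nn as nn

class AssociationMapModel(nn.Module):
    # Predicts user tag data (e.g. gender/age-band/region classes)
    # from a voiceprint feature vector.
    def __init__(self, feat_dim: int = 128, num_tags: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8 * feat_dim, num_tags),
        )

    def forward(self, voiceprint: torch.Tensor) -> torch.Tensor:
        # voiceprint: (batch, feat_dim) -> add a channel axis for Conv1d
        return self.net(voiceprint.unsqueeze(1))
```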
And S4, vectorizing the associated map data to obtain associated feature vectors.
In detail, vectorization is performed by the following expression:
v_i = (v_i1, v_i2, …, v_iN)
wherein i indexes the associated map data, v_i is the N-dimensional vector representing associated map data i, and v_ij is the j-th element of that N-dimensional vector.
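A minimal sketch of one possible vectorization, mapping each associated-map attribute to a fixed slot of v_i = (v_i1, …, v_iN); the slot names and the numeric encoding are hypothetical, as the text does not specify them:

```python
import numpy as np

def vectorize_map_data(tag_data: dict, slots: list) -> np.ndarray:
    # One vector element per attribute; missing attributes default to 0
    return np.array([float(tag_data.get(key, 0)) for key in slots])

# Hypothetical usage with already-numeric attribute values:
vec = vectorize_map_data({"age": 35, "region_code": 21, "dial_count": 7},
                         slots=["gender", "age", "region_code", "dial_count"])
```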
S5, judging whether a preset black voiceprint library has a voiceprint feature matched with the first voiceprint feature; and judging whether a label feature vector matched with the associated feature vector exists in a preset black relation map.
In detail, the determining whether the voiceprint feature matched with the first voiceprint feature exists in the preset black voiceprint library includes: respectively calculating first similarity of the first voiceprint features and a plurality of voiceprint features in a preset black voiceprint library through a similarity function; and if the first similarity which is larger than the first similarity threshold exists, determining that the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library.
Or, the judging whether the voiceprint features matched with the first voiceprint features exist in a preset black voiceprint library comprises: and performing similarity calculation on the first voiceprint features and voiceprint features in a preset black voiceprint library to obtain a first similarity set, wherein the maximum value in the first similarity set is a first target similarity, and if the first target similarity is greater than a first similarity threshold, determining that voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library.
In this embodiment, the blacklist voiceprint library is a voiceprint database obtained by extracting a voiceprint feature vector of a voice of a blacklist person.
For example, the blacklisted voiceprint library may contain the voiceprint features of persons distrusted by banks and/or a criminal voiceprint feature library from the public security department.
Further, the similarity function is:
sim(x, y_i) = (x·y_i) / (‖x‖·‖y_i‖), i = 1, 2, …, n
wherein x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
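A sketch of this matching step against the black voiceprint library, assuming the similarity function above is cosine similarity; the 0.8 threshold follows the 80% example given later in the text:

```python
import numpy as np

def match_black_library(x: np.ndarray, black_lib: np.ndarray,
                        threshold: float = 0.8) -> bool:
    # sim(x, y_i) = (x . y_i) / (||x|| * ||y_i||) for all n rows y_i;
    # a match exists if any similarity exceeds the threshold
    sims = black_lib @ x / (np.linalg.norm(black_lib, axis=1)
                            * np.linalg.norm(x) + 1e-10)
    return bool(np.max(sims) > threshold)
```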
Similarly, judging whether a label feature vector matching the associated feature vector exists in the preset black relation map includes: respectively calculating, through a similarity function, second similarities between the associated feature vector and the plurality of label feature vectors in the preset black relation map; and if a second similarity greater than the second similarity threshold exists, determining that a label feature vector matching the associated feature vector exists in the preset black relation map.
Alternatively, judging whether a label feature vector matching the associated feature vector exists in the preset black relation map includes: performing similarity calculation between the associated feature vector and the label feature vectors in the preset black relation map to obtain a second similarity set, whose maximum value is the second target similarity; if the second target similarity is greater than the second similarity threshold, it is determined that a label feature vector matching the associated feature vector exists in the preset black relation map.
In this embodiment, the black-relation graph database is obtained by extracting the tag feature vectors of the tag data of the blacklist people, and therefore, the black-relation graph database includes the tag feature vectors of the tag data of the blacklist people.
In this embodiment, the second similarity threshold may be the same as or different from the first similarity threshold, and it may be either greater or smaller than the first similarity threshold. For example, the first similarity threshold may be 80% and the second similarity threshold 90%.
S6, if the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library or the label feature vectors matched with the associated feature vectors exist in the preset black relation map, determining that the user is a risk user.
If a voiceprint feature matching the first voiceprint feature exists in the preset black voiceprint library, or a label feature vector matching the associated feature vector exists in the preset black relation map, the user is identified as a risk user. Risk users can thus be identified more comprehensively, reducing the missed detections caused by single-channel verification.
And further, if the user is determined to be a risk user, sending a risk user reminding message.
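Putting the two channels together, a sketch of the dual-verification decision, reusing match_black_library from the sketch above; the 80%/90% thresholds follow the example in the text:

```python
def identify_risk_user(first_voiceprint, assoc_vector,
                       black_voiceprints, black_tag_vectors,
                       t1: float = 0.8, t2: float = 0.9) -> bool:
    # The user is a risk user if EITHER the voiceprint matches the
    # black voiceprint library OR the associated feature vector
    # matches the black relation map.
    voiceprint_hit = match_black_library(first_voiceprint, black_voiceprints, t1)
    relation_hit = match_black_library(assoc_vector, black_tag_vectors, t2)
    return voiceprint_hit or relation_hit
```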
In the embodiment of the invention, standard voice information of a user is acquired; a first voiceprint feature of the standard voice information is extracted; the first voiceprint feature is input into a preset association map model to obtain associated map data related to the first voiceprint feature; the associated map data is vectorized to obtain an associated feature vector; it is judged whether a voiceprint feature matching the first voiceprint feature exists in a preset black voiceprint library, and whether a label feature vector matching the associated feature vector exists in a preset black relation map; and if either match exists, the user is determined to be a risk user. Through this double verification over two channels, the miss rate for risk users is reduced and information security is further enhanced.
Fig. 2 is a functional block diagram of the risk user identification apparatus based on voiceprint features and associated map data according to the present invention.
The risk user identification device 100 based on the voiceprint characteristics and the associated map data can be installed in an electronic device. According to the realized functions, the risk user identification device based on the voiceprint features and the associated graph data can comprise a voice information acquisition module 101, a voiceprint feature extraction module 102, a graph data acquisition module 103, a vector conversion module 104, a judgment module 105 and a determination module 106. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the voice information obtaining module 101 is configured to obtain standard voice information of a user.
In this embodiment, the standard voice information of the user may be acquired from a voice database.
Further, the acquiring the standard voice information of the user includes:
acquiring original voice information of the user;
sampling the original voice information by using an analog/digital converter to obtain a digital voice signal;
carrying out pre-emphasis operation on the digital voice signal to obtain a digital filtering voice signal;
and performing frame division and windowing operation on the digital filtering voice signal to obtain the standard voice information.
In this embodiment, the original voice information of the user is audio information including a voice of the user, and the original voice may be voice information acquired in a voice call with the user.
In detail, the sampling of the original voice information is to convert the original voice information into a digital signal, which is convenient for processing the voice information.
In this embodiment, the original voice information is sampled ten thousand times per second by the analog/digital converter, and each sample records the state of the original voice information at a given moment, so that digital voice signals at different times are obtained.
Since the human vocal system suppresses the high-frequency part of speech, the pre-emphasis operation in this embodiment increases the energy of the high-frequency part so that it has an amplitude similar to that of the low-frequency part; this flattens the signal spectrum and maintains the same signal-to-noise ratio over the whole band from low to high frequency.
In this embodiment, the pre-emphasis operation may compensate the digital speech signal.
Specifically, the pre-emphasis operation may be calculated as y(t) = x(t) - μ·x(t-1), where x(t) is the digital voice signal, t is time, y(t) is the digitally filtered voice signal, and μ is the adjustment coefficient of the pre-emphasis operation, with μ in the range [0.9, 1.0].
In this embodiment, the frame windowing is performed to remove the overlapping portion of the speech in the digitally filtered speech signal.
Further, the performing a framing windowing operation on the digitally filtered speech signal comprises:
performing frame-division windowing operation on the digital filtering voice signal through an objective function, wherein the objective function is as follows:
w(n) = 0.54 - 0.46·cos(2πn/(N-1)), 0 ≤ n ≤ N-1
wherein n is the frame index of the digitally filtered voice signal, N is the total number of frames of the digitally filtered voice signal, and w(n) is the single-frame data of the standard voice information, i.e., w(n) represents the standard voice information of each frame.
The voiceprint feature extraction module 102 is configured to extract a first voiceprint feature of the standard voice information.
In detail, extracting a first voiceprint feature of the standard speech information includes:
performing discrete Fourier transform on the standard voice information to obtain frequency spectrum information of the standard voice information;
performing triangular filtering calculation on the standard voice information by using a triangular filter to obtain a frequency response value of the standard voice information;
carrying out logarithmic calculation on the frequency spectrum information and the frequency response value to obtain logarithmic energy;
and performing discrete cosine calculation on the logarithmic energy to obtain the first voiceprint characteristic.
Preferably, the discrete fourier transform comprises the computational function:
D(k) = Σ_{n=0}^{N-1} w(n)·e^(-j2πkn/N), k = 0, 1, …, N-1
wherein N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, w(n) is the single-frame data of the standard voice information, i.e., w(n) represents the standard voice information of each frame, j is the imaginary unit, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
Preferably, in this embodiment, a filter bank with M filters is defined (the filters may be triangular filters), with center frequencies f(i), i = 1, 2, …, M; the center frequency is also the cut-off frequency of the filter, and the triangular filtering calculation is performed by these triangular filters.
Because the triangular filter smooths the frequency spectrum and suppresses the effect of harmonics, it highlights the formants of the sound. The tone or pitch of a segment of sound is therefore not reflected in the voiceprint feature; that is, the voiceprint feature is unaffected by pitch differences in the input sound.
Preferably, the triangular filtering is calculated as follows:
H_i(k) = 0 for k < f(i-1);
H_i(k) = (k - f(i-1)) / (f(i) - f(i-1)) for f(i-1) ≤ k ≤ f(i);
H_i(k) = (f(i+1) - k) / (f(i+1) - f(i)) for f(i) < k ≤ f(i+1);
H_i(k) = 0 for k > f(i+1)
where f(i) is the center frequency of the i-th triangular filter, i indexes the triangular filters, H(k) is the frequency response value, and k is the sound frequency of a single frame in the digitally filtered voice signal, i.e., k represents the sound frequency of each frame.
Further, the log transform is to compute the log energy of each filter bank output.
Generally, the human response to sound pressure is logarithmic: people are less sensitive to fine changes at high sound pressure than at low sound pressure. Using logarithms in this embodiment therefore reduces the sensitivity of the extracted features to variations in the energy of the input sound.
Specifically, the logarithmic calculation can be performed by the following formula:
S(i) = ln( Σ_{k=0}^{N-1} |D(k)|²·H_i(k) ), i = 1, 2, …, M
wherein i indexes the triangular filters, k is the sound frequency of a single frame of the original voice information, N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, D is the spectrum information, and S(i) is the logarithmic energy output by each filter.
Preferably, s (i) is subjected to a discrete cosine transform to obtain the voiceprint features, the discrete cosine transform being as follows:
x(n) = Σ_{i=1}^{M} S(i)·cos(πn(i - 0.5)/M)
wherein n is the frame index of the original voice information, i indexes the triangular filters, M is the total number of triangular filters, S(i) is the logarithmic energy output by each filter, and x is the voiceprint feature.
Further, in another embodiment of the present invention, the extracting the first feature of the standard voice information includes:
and extracting the first characteristic of the standard voice information by using an LSTM (Long Short-Term Memory) network. The LSTM has three "gate" structures, which are a forgetting gate (forget gate), an input gate (input gate), and an output gate (output gate), and is used to perform different processing on input information. The forgetting gate has the advantages that part of information passing through as the name implies is forgotten from the neural unit, so that part of the voice features of the previous frame disappear in transmission, and the training can not be carried out in the next neural unit; the input gate is used for adding new useful information into the state of the neural unit, namely adding the newly learned speech features of the frame into the transmitted information after processing; and finally, the output gate is used for outputting information based on the state of the nerve unit and the processed information, and finally obtaining the output information at the moment as the first voiceprint characteristic according to the output at the previous moment and the information to be output in the input at the moment.
The atlas data obtaining module 103 is configured to input the first voiceprint feature to a preset associated atlas model, so as to obtain associated atlas data associated with the first voiceprint feature.
In this embodiment, the associated map data related to the first voiceprint feature may include, but is not limited to, user tag data corresponding to the first voiceprint feature and a dialing record corresponding to the first voiceprint feature. Specifically, the user tag data includes attribute feature data of the user, such as: gender, age, location, job data, etc.
In detail, in this embodiment, the association map model may be constructed using a convolutional neural network, with training completed using sample voiceprint features as the training set and the sample voiceprint features annotated with user tag data as the label set.
The vector conversion module 104 is configured to vectorize the associated map data to obtain an associated feature vector.
In detail, vectorization is performed by the following expression:
v_i = (v_i1, v_i2, …, v_iN)
wherein i indexes the associated map data, v_i is the N-dimensional vector representing associated map data i, and v_ij is the j-th element of that N-dimensional vector.
The judging module 105 is configured to judge whether a preset black voiceprint library has a voiceprint feature matched with the first voiceprint feature; the judging module is further configured to judge whether a label feature vector matched with the associated feature vector exists in a preset black relation map.
In detail, the determining whether the voiceprint feature matched with the first voiceprint feature exists in the preset black voiceprint library includes: respectively calculating first similarity of the first voiceprint features and a plurality of voiceprint features in a preset black voiceprint library through a similarity function; and if the first similarity which is larger than the first similarity threshold exists, determining that the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library.
Or, the judging whether the voiceprint features matched with the first voiceprint features exist in a preset black voiceprint library comprises: and performing similarity calculation on the first voiceprint features and voiceprint features in a preset black voiceprint library to obtain a first similarity set, wherein the maximum value in the first similarity set is a first target similarity, and if the first target similarity is greater than a first similarity threshold, determining that voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library.
In this embodiment, the blacklist voiceprint library is a voiceprint database obtained by extracting a voiceprint feature vector of a voice of a blacklist person.
For example, the blacklisted voiceprint library contains voiceprint characteristics of a distrusted person at a bank and/or a library of criminal voiceprint characteristics of a public security department.
Further, the similarity function is:
sim(x, y_i) = (x·y_i) / (‖x‖·‖y_i‖), i = 1, 2, …, n
wherein x represents the first voiceprint feature, y_i represents a voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
Similarly, judging whether a label feature vector matching the associated feature vector exists in the preset black relation map includes: respectively calculating, through a similarity function, second similarities between the associated feature vector and the plurality of label feature vectors in the preset black relation map; and if a second similarity greater than the second similarity threshold exists, determining that a label feature vector matching the associated feature vector exists in the preset black relation map.
Alternatively, judging whether a label feature vector matching the associated feature vector exists in the preset black relation map includes: performing similarity calculation between the associated feature vector and the label feature vectors in the preset black relation map to obtain a second similarity set, whose maximum value is the second target similarity; if the second target similarity is greater than the second similarity threshold, it is determined that a label feature vector matching the associated feature vector exists in the preset black relation map.
In this embodiment, the black-relation graph database is obtained by extracting the tag feature vectors of the tag data of the blacklist people, and therefore, the black-relation graph database includes the tag feature vectors of the tag data of the blacklist people.
The determining module 106 is configured to determine that the user is a risk user if a voiceprint feature matched with the first voiceprint feature exists in the preset black voiceprint library or a label feature vector matched with the associated feature vector exists in the preset black relation map.
If a voiceprint feature matching the first voiceprint feature exists in the preset black voiceprint library, or a label feature vector matching the associated feature vector exists in the preset black relation map, the user is identified as a risk user; risk users can thus be identified more comprehensively and accurately.
And further, if the user is determined to be a risk user, sending a risk user reminding message.
Fig. 3 is a schematic structural diagram of an electronic device implementing a risk user identification method based on voiceprint features and associated atlas data according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as the code of the risk user identification program based on voiceprint features and associated map data, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processing data of the electronic device 1 by running or executing programs or modules stored in the memory 11 (for example, executing a risk user identification program based on voiceprint features and associated map data, etc.), and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11, the at least one processor 10, and other components.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The risk user identification program 12 based on voiceprint features and associated map data, stored in the memory 11 of the electronic device 1, is a combination of instructions that, when executed by the processor 10, may implement:
acquiring standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain association graph data related to the first voiceprint feature;
vectorizing the associated map data to obtain associated feature vectors;
judging whether a preset black voiceprint library has a voiceprint feature matched with the first voiceprint feature; and
judging whether a label feature vector matched with the associated feature vector exists in a preset black relation map;
and if the voiceprint features matched with the first voiceprint features exist in the preset black voiceprint library or the label feature vectors matched with the associated feature vectors exist in the preset black relation atlas, determining that the user is a risk user.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A risk user identification method based on voiceprint features and associated graph data, characterized by comprising the following steps:
acquiring standard voice information of a user;
extracting a first voiceprint feature of the standard voice information;
inputting the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
vectorizing the associated graph data to obtain an associated feature vector;
judging whether a voiceprint feature matching the first voiceprint feature exists in a preset black voiceprint library; and
judging whether a label feature vector matching the associated feature vector exists in a preset black relation graph;
and if a voiceprint feature matching the first voiceprint feature exists in the preset black voiceprint library, or a label feature vector matching the associated feature vector exists in the preset black relation graph, determining that the user is a risk user.
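Read as an algorithm, claim 1 is a two-branch decision rule. The following Python sketch is illustrative only: every name in it (extract_voiceprint, graph_model, vectorize, the two blacklist stores, and the 0.8 threshold) is a hypothetical stand-in, since the patent does not disclose a concrete implementation.

```python
import numpy as np

def identify_risk_user(standard_voice: np.ndarray,
                       extract_voiceprint,       # claim-4-style feature extractor
                       graph_model,              # preset association graph model
                       vectorize,                # graph data -> feature vector
                       black_voiceprints: list,  # preset black voiceprint library
                       black_label_vectors: list,  # preset black relation graph
                       sim_threshold: float = 0.8) -> bool:
    """Hypothetical sketch of the claim-1 decision flow."""
    first_voiceprint = extract_voiceprint(standard_voice)
    graph_data = graph_model(first_voiceprint)
    assoc_vector = vectorize(graph_data)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Branch 1: the voiceprint matches the black voiceprint library (claims 6-7)
    voiceprint_hit = any(cosine(first_voiceprint, y) > sim_threshold
                         for y in black_voiceprints)
    # Branch 2: the associated vector matches a label vector in the black relation graph
    graph_hit = any(cosine(assoc_vector, v) > sim_threshold
                    for v in black_label_vectors)
    return voiceprint_hit or graph_hit
```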
2. The risk user identification method based on voiceprint features and associated graph data according to claim 1, wherein the acquiring standard voice information of the user comprises:
acquiring original voice information of the user;
sampling the original voice information by using an analog-to-digital converter to obtain a digital voice signal;
performing a pre-emphasis operation on the digital voice signal to obtain a digitally filtered voice signal;
and performing framing and windowing operations on the digitally filtered voice signal to obtain the standard voice information.
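A minimal sketch of the claim-2 preprocessing chain, assuming a pre-emphasis coefficient of 0.97 and 25 ms frames with a 10 ms hop at 16 kHz (all assumed values; the claim fixes none of them):

```python
import numpy as np

def preprocess(digital_signal: np.ndarray,
               frame_len: int = 400,   # 25 ms at 16 kHz (assumed)
               hop: int = 160,         # 10 ms hop (assumed)
               alpha: float = 0.97) -> np.ndarray:
    """Pre-emphasis, then framing and windowing (claim 2, last two steps)."""
    # Pre-emphasis filter y[t] = x[t] - alpha * x[t-1] boosts high frequencies
    emphasized = np.append(digital_signal[0],
                           digital_signal[1:] - alpha * digital_signal[:-1])
    # Split into overlapping frames (assumes len(signal) >= frame_len)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    # Apply a Hamming window to each frame (the claim-3 objective function)
    return frames * np.hamming(frame_len)
```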
3. The method of claim 2, wherein the performing framing and windowing operations on the digitally filtered voice signal comprises:
performing the framing and windowing operations on the digitally filtered voice signal through an objective function, wherein the objective function is:
w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1
wherein n is the frame index of the digitally filtered voice signal, N is the total number of frames of the digitally filtered voice signal, and w(n) is the single-frame data of the standard voice information.
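Assuming the objective function is the standard Hamming window shown above, its values can be checked against NumPy's built-in implementation:

```python
import numpy as np

N = 400  # assumed frame length
n = np.arange(N)
w_manual = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
# np.hamming uses the same definition, so the two agree elementwise
assert np.allclose(w_manual, np.hamming(N))
```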
4. The risk user identification method based on voiceprint features and associated graph data according to claim 1, wherein the extracting a first voiceprint feature of the standard voice information comprises:
performing a discrete Fourier transform on the standard voice information to obtain spectrum information of the standard voice information;
performing triangular filtering on the standard voice information by using a triangular filter to obtain a frequency response value of the standard voice information;
performing logarithmic calculation on the spectrum information and the frequency response value to obtain logarithmic energy;
and performing discrete cosine calculation on the logarithmic energy to obtain the first voiceprint feature.
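Claim 4 describes what amounts to MFCC-style feature extraction. A sketch under assumed parameters (16 kHz sample rate, 512-point FFT, 26 triangular mel filters, 13 coefficients, none of which are specified by the claim):

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters: int, n_fft: int, sr: int) -> np.ndarray:
    """Triangular filters spaced on the mel scale (assumed parameterization)."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(mel(0), mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return fb

def extract_voiceprint(frames: np.ndarray, sr: int = 16000,
                       n_fft: int = 512, n_filters: int = 26,
                       n_coeffs: int = 13) -> np.ndarray:
    """Claim-4 steps: DFT -> triangular filtering -> log energy -> DCT."""
    spectrum = np.abs(np.fft.rfft(frames, n_fft)) ** 2      # power spectrum per frame
    response = spectrum @ mel_filterbank(n_filters, n_fft, sr).T
    log_energy = np.log(response + 1e-10)                   # avoid log(0)
    mfcc = dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_coeffs]
    return mfcc.mean(axis=0)                                # one vector per utterance
```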
5. The risk user identification method based on voiceprint features and associated graph data according to claim 4, wherein the discrete Fourier transform comprises the following calculation function:
D(k) = \sum_{n=0}^{N-1} w(n)\, e^{-j\frac{2\pi kn}{N}}, \quad 0 \le k \le N-1
wherein N is the total number of frames of the digitally filtered voice signal, n is the frame index of the digitally filtered voice signal, w(n) is the single-frame data of the standard voice information, j is the imaginary unit of the Fourier transform, k is the sound frequency of a single frame in the digitally filtered voice signal, and D is the spectrum information.
6. The risk user identification method based on voiceprint features and associated graph data according to any one of claims 1 to 3, wherein the judging whether a voiceprint feature matching the first voiceprint feature exists in the preset black voiceprint library comprises:
calculating, through a similarity function, a first similarity between the first voiceprint feature and each of a plurality of voiceprint features in the preset black voiceprint library;
and if a first similarity greater than a first similarity threshold exists, determining that a voiceprint feature matching the first voiceprint feature exists in the preset black voiceprint library.
7. The method of claim 6, wherein the similarity function is:
\mathrm{sim}(x, y_i) = \frac{x \cdot y_i}{\|x\|\,\|y_i\|}, \quad i = 1, 2, \ldots, n
wherein x represents the first voiceprint feature, y_i represents the i-th voiceprint feature in the preset black voiceprint library, n represents the number of voiceprint features in the preset black voiceprint library, and sim(x, y_i) represents the first similarity.
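A direct sketch of the claim-6/7 matching step, with the first similarity threshold set to an assumed placeholder value of 0.8 (the patent does not fix it):

```python
import numpy as np

def matches_black_library(x: np.ndarray,
                          black_library: list,
                          threshold: float = 0.8) -> bool:
    """Claims 6-7: cosine similarity of x against every library voiceprint.

    The 0.8 threshold is an assumed placeholder; the patent leaves the
    first similarity threshold unspecified.
    """
    for y in black_library:
        sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        if sim > threshold:
            return True
    return False
```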
8. A risk user identification apparatus based on voiceprint features and associated graph data, characterized in that the apparatus comprises:
a voice information acquisition module, configured to acquire standard voice information of a user;
a voiceprint feature extraction module, configured to extract a first voiceprint feature of the standard voice information;
a graph data acquisition module, configured to input the first voiceprint feature into a preset association graph model to obtain associated graph data related to the first voiceprint feature;
a vector conversion module, configured to vectorize the associated graph data to obtain an associated feature vector;
a judging module, configured to judge whether a voiceprint feature matching the first voiceprint feature exists in a preset black voiceprint library;
the judging module being further configured to judge whether a label feature vector matching the associated feature vector exists in a preset black relation graph;
and a determining module, configured to determine that the user is a risk user if a voiceprint feature matching the first voiceprint feature exists in the preset black voiceprint library or a label feature vector matching the associated feature vector exists in the preset black relation graph.
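One way to read the claim-8 apparatus is as a thin wrapper over the claim-1 steps; the following grouping is purely illustrative, with each attribute a hypothetical callable standing in for the corresponding module:

```python
class RiskUserIdentifier:
    """Hypothetical grouping of the claim-8 modules."""

    def __init__(self, acquire, extract, graph_model, vectorize,
                 match_voiceprint, match_label_vector):
        self.acquire = acquire                        # voice information acquisition module
        self.extract = extract                        # voiceprint feature extraction module
        self.graph_model = graph_model                # graph data acquisition module
        self.vectorize = vectorize                    # vector conversion module
        self.match_voiceprint = match_voiceprint      # judging module (black voiceprint library)
        self.match_label_vector = match_label_vector  # judging module (black relation graph)

    def is_risk_user(self, raw_audio) -> bool:        # determining module
        voice = self.acquire(raw_audio)
        voiceprint = self.extract(voice)
        assoc_vector = self.vectorize(self.graph_model(voiceprint))
        return (self.match_voiceprint(voiceprint)
                or self.match_label_vector(assoc_vector))
```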
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the risk user identification method based on voiceprint features and associated graph data according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the risk user identification method based on voiceprint features and associated graph data according to any one of claims 1 to 7.
CN202010253799.0A 2020-04-01 2020-04-01 Risk user identification method and device based on voiceprint features and associated map data Pending CN111552832A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010253799.0A CN111552832A (en) 2020-04-01 2020-04-01 Risk user identification method and device based on voiceprint features and associated map data
PCT/CN2020/106017 WO2021196477A1 (en) 2020-04-01 2020-07-30 Risk user identification method and apparatus based on voiceprint characteristics and associated graph data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010253799.0A CN111552832A (en) 2020-04-01 2020-04-01 Risk user identification method and device based on voiceprint features and associated map data

Publications (1)

Publication Number Publication Date
CN111552832A (en) 2020-08-18

Family

ID=72004275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010253799.0A Pending CN111552832A (en) 2020-04-01 2020-04-01 Risk user identification method and device based on voiceprint features and associated map data

Country Status (2)

Country Link
CN (1) CN111552832A (en)
WO (1) WO2021196477A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8346659B1 (en) * 2001-07-06 2013-01-01 Hossein Mohsenzadeh Secure authentication and payment system
CN115766031A (en) * 2017-08-22 2023-03-07 创新先进技术有限公司 Identity verification method, device and equipment
CN110896352B (en) * 2018-09-12 2022-07-08 阿里巴巴集团控股有限公司 Identity recognition method, device and system
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium
CN110855740B (en) * 2019-09-27 2021-03-19 深圳市火乐科技发展有限公司 Information pushing method and related equipment

Also Published As

Publication number Publication date
WO2021196477A1 (en) 2021-10-07

Similar Documents

Publication Publication Date Title
US11610394B2 (en) Neural network model training method and apparatus, living body detecting method and apparatus, device and storage medium
CN110619568A (en) Risk assessment report generation method, device, equipment and storage medium
US10817707B2 (en) Attack sample generating method and apparatus, device and storage medium
CN110020009B (en) Online question and answer method, device and system
CN111523389A (en) Intelligent emotion recognition method and device, electronic equipment and storage medium
CN112447189A (en) Voice event detection method and device, electronic equipment and computer storage medium
CN110633991A (en) Risk identification method and device and electronic equipment
CN111754982A (en) Noise elimination method and device for voice call, electronic equipment and storage medium
CN113064994A (en) Conference quality evaluation method, device, equipment and storage medium
CN112560453A (en) Voice information verification method and device, electronic equipment and medium
CN113807103A (en) Recruitment method, device, equipment and storage medium based on artificial intelligence
CN112233700A (en) Audio-based user state identification method and device and storage medium
CN113327586A (en) Voice recognition method and device, electronic equipment and storage medium
CN113568934B (en) Data query method and device, electronic equipment and storage medium
CN112738338A (en) Telephone recognition method, device, equipment and medium based on deep learning
CN113903363B (en) Violation behavior detection method, device, equipment and medium based on artificial intelligence
CN111552832A (en) Risk user identification method and device based on voiceprint features and associated map data
CN111985231B (en) Unsupervised role recognition method and device, electronic equipment and storage medium
US20210019553A1 (en) Information processing apparatus, control method, and program
CN113808577A (en) Intelligent extraction method and device of voice abstract, electronic equipment and storage medium
CN113704430A (en) Intelligent auxiliary receiving method and device, electronic equipment and storage medium
CN113990313A (en) Voice control method, device, equipment and storage medium
CN113221990A (en) Information input method and device and related equipment
CN113228164A (en) Safety early warning method and device based on voice recognition and terminal equipment
CN111753872A (en) Method, device, equipment and storage medium for analyzing association of serial and parallel cases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination