CN104992095B - Information Authentication method and system - Google Patents

Information Authentication method and system

Info

Publication number
CN104992095B
CN104992095B CN201510367441.XA
Authority
CN
China
Prior art keywords
audio
audio clip
identified
feature data
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510367441.XA
Other languages
Chinese (zh)
Other versions
CN104992095A (en)
Inventor
宋辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510367441.XA priority Critical patent/CN104992095B/en
Publication of CN104992095A publication Critical patent/CN104992095A/en
Application granted granted Critical
Publication of CN104992095B publication Critical patent/CN104992095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2133Verifying human interaction, e.g., Captcha

Abstract

The present invention proposes an information verification method and system. The method includes: acquiring an audio clip to be identified; performing feature extraction on the audio clip to be identified to obtain feature data corresponding to it; and obtaining a verification result according to the feature data and a pre-generated data model. The method can improve the security and ease of use of information verification.

Description

Information verification method and system
Technical Field
The invention relates to the technical field of information security processing, in particular to an information verification method and an information verification system.
Background
To keep a user account secure, authentication is typically required when the account is used. For example, in the payment stage, the payment server issues a verification code through a short message (SMS) server to the mobile phone number reserved by the user. After receiving the code, the user fills it into the corresponding field and clicks confirm; the code is uploaded to the payment server, and once the code is verified the payment server proceeds to the next step of the payment process.
Although this scheme improves payment security to a certain extent, it is limited by inherent defects of the SMS channel: codes are easy to steal and must be read and typed in manually. In practice, the SMS verification-code mode therefore suffers from low security and inconvenient operation.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide an information verification method, which can improve the security and convenience of information verification.
Another object of the present invention is to provide an information verification system.
In order to achieve the above object, an information verification method provided in an embodiment of a first aspect of the present invention includes: acquiring an audio clip to be identified; extracting the characteristics of the audio clip to be identified to obtain characteristic data corresponding to the audio clip to be identified; and obtaining a verification result according to the characteristic data and a pre-generated data model.
With the information verification method of the first-aspect embodiment, audio-based verification is realized by extracting features from the audio clip to be recognized and verifying against the extracted feature data. This avoids several problems of the SMS verification mode and improves both the security and the convenience of information verification.
In order to achieve the above object, an information verification system according to an embodiment of a second aspect of the present invention includes: the acquisition module is used for acquiring the audio clip to be identified; the extraction module is used for extracting the characteristics of the audio clip to be identified and acquiring the characteristic data corresponding to the audio clip to be identified; and the verification module is used for acquiring a verification result according to the characteristic data and a pre-generated data model.
The information verification system of the second-aspect embodiment realizes audio-based verification by extracting features from the audio clip to be recognized and verifying against the extracted feature data, thereby avoiding several problems of the SMS verification mode and improving both the security and the convenience of information verification.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of an information verification method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating an information verification method according to another embodiment of the present invention;
FIG. 3 is a flow chart illustrating an information verification method according to another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an information verification system according to another embodiment of the present invention;
fig. 5 is a schematic structural diagram of an information verification system according to another embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar modules or modules having the same or similar functionality throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Fig. 1 is a schematic flow chart of an information verification method according to an embodiment of the present invention, where the method includes:
s11: and acquiring the audio clip to be identified.
Unlike the prior-art approach of sending a verification code by short message, this embodiment performs verification through audio.
Optionally, the acquiring the audio segment to be recognized includes:
and the client records the audio clip transmitted in the telephone channel to obtain the audio clip to be identified.
The client may be specifically located in the mobile phone, for example, the user may perform verification based on the audio clip before using the mobile phone to pay. At this time, the audio clip transmitted in the telephone channel of the mobile phone can be recorded to obtain the audio clip to be identified.
Because the audio clip is recorded directly from the telephone channel, the audio clip to be identified is independent of the acoustic environment around the user and is not affected by environmental noise. This improves the quality of the recorded clip and, in turn, the verification accuracy.
Optionally, the recording, by the client, the audio segment transmitted in the telephone channel to obtain the audio segment to be identified includes:
after detecting the incoming call, the client automatically answers the call and starts a recording module;
and after receiving the call, the client receives the audio clip in the telephone channel and records the audio clip by adopting the recording module.
The automatic answering and starting of the recording module can be carried out synchronously.
In this embodiment, the call is answered and the recording module is started automatically, with no human participation; compared with manually waiting for and entering an SMS verification code, this improves efficiency.
In this embodiment, the incoming call to the mobile phone may be initiated by the server: after the mobile phone automatically answers, the server establishes a communication connection with it and transmits the audio clip over the telephone channel through this connection. When the user wants to use audio verification, they can click an audio verification button on the client. The client then sends an audio acquisition request to the server. On receiving the request, the server randomly selects one clip from the 10000 prepared audio clips, records its SliceID, and hands the clip, together with the telephone number reserved by the user, to a call center; the call center dials the number through an operator and plays the selected clip on the telephone channel. After the client automatically answers the call, it records the clip played on the channel to obtain the audio clip to be identified.
In addition, after the client finishes recording, the telephone can be hung up.
The above describes how the client obtains the audio clip to be identified. Since the clip needs to be verified, and verification is usually performed at the server, the server also needs to obtain the clip first; specifically, the server may receive the audio clip to be identified sent by the client.
Therefore, the following steps can also be performed for the client:
and after the client finishes recording and obtains the audio fragments to be identified, sending a verification request message to a server, wherein the verification request message comprises the audio fragments to be identified.
In the embodiment, the client automatically sends the audio clip to be identified to the server after recording is completed, so that the efficiency can be improved.
S12: extracting features of the audio clip to be identified to obtain feature data.
The method for extracting the features of the audio segment to be recognized may be the same as the method for extracting the features of the audio segment serving as the sample when the data model is established in the following description, and therefore, the specific contents may refer to the following related description.
S13: and obtaining a verification result according to the characteristic data and a pre-generated data model.
The verification process is performed online, while the data model can be generated offline.
Referring to fig. 2, the process of building the data model may include:
s21: a preset number of audio clips serving as samples are acquired.
Verification information generally consists of four digits, or a combination of digits and letters; to simplify the discussion, a four-digit verification code is taken as an example.
There are 10000 random combinations of four digits (0000-9999), so the preset number is 10000. The 10000 audio clips can then be selected from various audio resources such as songs, music, and accompaniments.
In addition, after the 10000 audio clips are acquired, unique identification information can be generated for each clip; if this identification information is denoted SliceID, the SliceIDs can range from 1 to 10000.
It will be appreciated that, because the audio clips need to be played over the telephone channel, each clip preferably is not too long: a long clip would increase the verification time and hurt the user experience. Therefore, on the premise of ensuring the recognition rate, the length of each clip can be set to 1 s. In addition, to improve verification accuracy, clips that differ strongly from one another can be selected.
In this way, a set of 10000 1-s audio clips, which may be called a closed verification set, can be constructed.
S22: and extracting the characteristics of each audio clip to obtain the characteristic data corresponding to each audio clip.
Referring to fig. 3, extracting the features of the audio segment to obtain feature data corresponding to the audio segment includes:
s221: the audio segment is subjected to Fast Fourier Transform (FFT) to obtain spectral data.
The short-time FFT multiplies the time-domain audio segment by a time window to obtain audio time-domain signals for multiple time periods, and performs an FFT on the signal of each period. The window can be set according to the actual situation so that the signal of each period is short-time stationary. Music changes relatively slowly compared with speech, so in this embodiment the window is generally wider than in speech recognition.
The basic unit of audio data processing is a frame. Each frame is 64 ms long; the number of sample points N is 1024 and the sampling rate Fs is 16 kHz. Considering the symmetry of the spectrum, 513 frequency values can be retained per frame.
The i-th frequency value f_i is then: f_i = i * Fs / N, for i = 0, 1, ..., 512.
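As a concrete sketch of the stated parameters (64 ms frames, N = 1024 sample points, Fs = 16 kHz): the i-th FFT bin corresponds to frequency i·Fs/N. This is the standard short-time-FFT bin relation, reconstructed here rather than quoted from the original; it reproduces the 513 retained values per frame.

```python
# Frame parameters as stated in the text: 64 ms frames,
# N = 1024 sample points, sampling rate Fs = 16 kHz.
N = 1024
FS = 16000

def bin_frequency(i: int) -> float:
    """Frequency in Hz of the i-th FFT bin: f_i = i * Fs / N."""
    return i * FS / N

# By spectral symmetry, only bins 0..N/2 (513 values) carry unique information.
NUM_BINS = N // 2 + 1
```

With these values each bin is 15.625 Hz wide and the last retained bin sits at the 8 kHz Nyquist limit.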
s222: and extracting tone color (chroma) characteristic data of the frequency value to obtain chroma characteristic data.
Music, and singers performing a song, follow fixed frequencies, and there are fixed relationships between different frequencies. Thus, in a spectrogram, the pitches (pitch) are not random but strongly correlated; this correlation is what makes music sound pleasing. It also means the spectrogram contains much redundant information, so the spectrum can be compressed to reduce storage space.
In this embodiment, the spectrogram is compressed based on knowledge of musical notes (midi note). In midi notation, each octave (octave) has 12 semitones (semitone), and the frequency ratio of adjacent octaves is 2. In this embodiment, the FFT spectrum is folded into a single octave, resulting in a 12-dimensional chroma feature. For example, the FFT bands corresponding to A4 (440 Hz), A5 (880 Hz), A6 (1760 Hz) and A7 (3520 Hz) are added together to produce one midi-note spectrum value. This operation improves robustness to noise to some extent: noise or other filtering may corrupt the spectrum of one octave segment, i.e. all midi-note spectra in that segment, but the unaffected octave segments still provide information.
For a frequency value f, the dimension index k of the corresponding chroma feature is:
chroma feature data may be expressed as:
chroma(t, k) = log( Σ_{f ∈ C(k)} fft(t, f) + minimal_value )
where t is the frame index, k is the dimension index, C(k) is the set of frequencies mapped to dimension k, and minimal_value is a small preset value used to prevent numerical overflow.
Up to this point, 12-dimensional chroma feature data is calculated for each frame of data for each audio clip.
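The exact bin-to-chroma mapping formula is not reproduced in the text, so the sketch below assumes the standard MIDI convention (A4 = note 69 = 440 Hz), which matches the A4/A5/A6/A7 folding example above; `chroma_frame` then follows the chroma(t, k) = log(Σ + minimal_value) construction. Both helper names are illustrative.

```python
import math

def chroma_index(f: float) -> int:
    """Chroma (semitone-class) index of frequency f, via the MIDI note
    number p = 69 + 12*log2(f/440). Octave-equivalent pitches
    (A4, A5, A6, ...) share the same index."""
    p = 69 + 12 * math.log2(f / 440.0)
    return int(round(p)) % 12

def chroma_frame(fft_mag, bin_freqs, minimal_value=1e-10):
    """12-dim chroma vector for one frame: sum the FFT magnitudes of all
    bins mapped to each of the 12 classes, then take the log (with a small
    additive constant to avoid log(0))."""
    sums = [0.0] * 12
    for mag, f in zip(fft_mag, bin_freqs):
        if f > 0:                       # skip the DC bin
            sums[chroma_index(f)] += mag
    return [math.log(s + minimal_value) for s in sums]
```

Under this convention all A-pitches fold onto one dimension, which is exactly the compression the paragraph describes.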
S223: and carrying out bit map feature extraction on the chroma feature data to obtain bit map feature data, wherein the bit map feature data comprise bit map feature data corresponding to a plurality of frames.
Comparing the chroma spectrum with the FFT spectrum, the positions where a 1 appears generally correspond to important frequency points, i.e. important events, in the spectrum. These important "events" are marked by the bit "1" and all others by the bit "0", forming bitmap (bitmap) feature data.
Specifically, after the chroma feature is extracted, statistics of the feature in a long time window can be calculated, and the chroma feature can be further converted into a 0/1 bit map by using the statistics.
The statistics here are the mean and the variance. Assuming a long-term window of length M (settable), the bitmap feature data bitmap(t, k) for the M frames chroma(t, k) to chroma(t + M - 1, k) can be calculated as: bitmap(t, k) = 1 if chroma(t, k) > μ + bitmapThreshold · σ, and 0 otherwise,
where μ is the mean of the chroma feature data for the M frames, σ is the standard deviation of the chroma feature data for the M frames, and bitmapThreshold is a set constant value, currently set to 2.0.
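A minimal sketch of the thresholding step, assuming one plausible reading of the statistics rule (a bit is 1 when a chroma value exceeds the window mean by more than bitmapThreshold standard deviations); the helper name `bitmap_window` is illustrative, and the original formula is not quoted here.

```python
import statistics

BITMAP_THRESHOLD = 2.0   # the constant called bitmapThreshold in the text

def bitmap_window(chroma_window):
    """Convert an M-frame window of 12-dim chroma vectors into 0/1 bits.
    mu and sigma are the mean and (population) standard deviation of all
    chroma values in the window; a bit is set where the value stands out
    by more than BITMAP_THRESHOLD sigmas above the mean."""
    values = [v for frame in chroma_window for v in frame]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [[1 if v > mu + BITMAP_THRESHOLD * sigma else 0 for v in frame]
            for frame in chroma_window]
```

With threshold 2.0 only clear outliers survive, which matches the observation below that the resulting bitmap is sparse.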
S224: cascading the bitmap feature data corresponding to a preset number of frames to obtain cascaded feature data, and taking the cascaded feature data as the feature data corresponding to the audio segment.
After the bitmap feature data are extracted, it turns out that the bitmap is sparse: only about 10% of the bits are 1, so 12-dimensional bitmap feature data are not discriminative enough. To improve discrimination, more bits are needed.
In this embodiment, a basic index unit is formed by concatenating W (settable) consecutive bitmap frames. For example, when W is 14, 14 consecutive frames of bitmap feature data are concatenated into 168-dimensional bitmap feature data, and these 168-dimensional data are the feature data corresponding to one audio clip.
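The concatenation step can be sketched as follows (`concat_unit` is a hypothetical helper name; W = 14 yields the 168-dimensional basic index unit):

```python
W = 14   # consecutive frames per basic index unit (settable)

def concat_unit(bitmap_frames, start=0, w=W):
    """Concatenate w consecutive 12-dim bitmap frames, beginning at
    `start`, into a single 12*w-dim feature vector (168-dim for w=14)."""
    unit = []
    for frame in bitmap_frames[start:start + w]:
        unit.extend(frame)
    return unit
```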
S23: and acquiring related information according to the characteristic data, storing the related information and generating a data model.
Optionally, the related information may include the feature data and the identification information of the corresponding audio clips, stored in correspondence with each other. For example, if the identification information of the first audio clip is 1 and its feature data is a first 168-dimensional vector, then 1 and the first vector are stored together. Or,
optionally, since 1s still make up only a small fraction of the 168-dimensional feature data, the feature data and the corresponding identification information may instead be recorded as an inverted document index (inverted document index), to avoid wasting storage resources and to facilitate subsequent comparison. In this case, the related information is: for each bit position, the identification information of all audio clips whose data at that position is 1. For example, each of the 10000 audio clips corresponds to a 168-dimensional vector in which most of the 168 values (0 or 1) are 0; assuming the clips whose first dimension is 1 have identification information 1, 10, …, the identification information stored for the first dimension is: 1, 10, ….
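A sketch of the inverted-document-index layout just described (short illustrative vectors stand in for the 168-dimensional ones):

```python
def build_inverted_index(features):
    """features: mapping SliceID -> 0/1 feature vector. Returns, for each
    bit position, the list of SliceIDs whose vector has a 1 at that
    position -- the inverted-document-index layout described in the text."""
    index = {}
    for slice_id, vec in sorted(features.items()):
        for pos, bit in enumerate(vec):
            if bit:
                index.setdefault(pos, []).append(slice_id)
    return index
```

Because each vector is sparse, storing only the positions that are 1 is far cheaper than storing 10000 full vectors.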
Thus, a data model can be obtained.
During verification, as described above, feature extraction is first performed on the audio clip to be recognized; the feature-extraction flow is identical to the one used when generating the data model. Referring to fig. 3, after receiving the audio clip to be identified (recorded by the client), the server performs feature extraction (S31): short-time FFT, chroma feature extraction, bitmap feature extraction, and 14-frame concatenation, producing 168-dimensional feature data. The 168-dimensional feature data of the clip is then compared with the feature-data information stored in the data model to obtain the verification result.
Referring to fig. 3, obtaining the verification result according to the feature data and the pre-generated data model includes:
s32: and determining an audio clip matched with the audio clip to be recognized according to the feature data and the related information, and determining the identification information of the matched audio clip.
After the feature data corresponding to the audio segments to be identified are obtained, the feature data corresponding to 10000 audio segments in the data model can be sequentially compared, so that the matched audio segments are obtained.
When comparing audio segments, the dot product of the feature data of the two segments can be calculated, and the segment with the largest dot-product value is determined as the matched segment.
For convenience of explanation, assume the following simplified scenario: the feature data of the audio clip to be recognized is 5-dimensional, namely 10100, and the data model holds feature data for 3 audio clips, namely 10010, 01000 and 10100. The dot product of 10100 and 10010 is 1, that of 10100 and 01000 is 0, and that of 10100 and 10100 is 2; since 2 is the maximum, the matched clip is the one corresponding to 10100.
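The matching rule can be sketched directly from the simplified scenario above (function names are illustrative):

```python
def dot(a, b):
    """Dot product of two equal-length 0/1 feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def best_match(query, index_segments):
    """index_segments: mapping SliceID -> stored 0/1 feature vector.
    Return the SliceID whose vector has the largest dot product with
    the query vector."""
    return max(index_segments, key=lambda sid: dot(query, index_segments[sid]))
```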
In addition, because of the transmission delay of the telephone channel and other factors, the recorded audio clip to be identified may not be synchronized with the originally played clip. Call the feature data of each audio clip stored in the data model an index segment, and the feature data of the clip to be identified the original segment; both comprise 14 frames of data. After the original segment is obtained, time-shift operations can be applied to it, for example shifting by 1 to 7 frames, yielding 14 new query segments; together with the original segment, this gives 15 query segments in total. The dot product of each query segment with each index segment is then calculated, 150000 dot products in total, and the audio clip whose index segment yields the largest dot product is taken as the matched clip. For example, if the largest dot product is between the 6th query segment and the 100th index segment, and the 100th index segment is the feature data of the 100th audio clip, then the matched clip is the 100th audio clip.
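The time-shift step can be sketched as follows; zero-padding at the edges of the recording is an assumption of this sketch, not stated in the text, and the helper name is illustrative:

```python
W = 14   # frames per query/index segment

def query_segments(bitmap_frames, center):
    """Return the original segment starting at frame `center`, plus
    versions time-shifted by -7..+7 frames: 15 query segments in total,
    each a concatenation of W 12-dim bitmap frames. Frames that fall
    outside the recording are replaced by all-zero frames."""
    zero = [0] * 12
    def segment(start):
        out = []
        for t in range(start, start + W):
            out.extend(bitmap_frames[t] if 0 <= t < len(bitmap_frames) else zero)
        return out
    return [segment(center + s) for s in range(-7, 8)]
```

Each of the 15 query segments is then scored against all 10000 index segments, giving the 150000 dot products mentioned above.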
This way of generating new query segments is flexible and extensible. The database is known and changes little once the index is built, while conditions at query time are unpredictable; therefore, difficult cases can be handled on the query side instead of by changing the index.
Because 10000 audio fragments are obtained, corresponding identification information can be generated, and therefore, after the matched audio fragments are determined, the identification information of the matched audio fragments can be obtained.
S33: and comparing the identification information of the matched audio clip with the identification information of the audio clip to be identified.
Because the audio clip to be recognized is played to the client by the server, the server can record the identification information of the played audio clip, so that the server can compare the identification information of the matched audio clip with the identification information of the audio clip to be recognized.
S34: if the two are consistent, the verification result is determined to be successful, otherwise, the verification result is determined to be failed.
For example, the identification information of the matching audio clip is 100, if the identification information of the recorded audio clip to be recognized is also 100, the verification is successful, otherwise the verification fails.
In this embodiment, audio-based verification is realized by extracting features from the audio clip to be recognized and verifying against the extracted feature data, which avoids several problems of the SMS verification mode and improves both the security and the convenience of information verification. The method of this embodiment may be applied to payment-related mobile products. Recording the clip played in the telephone channel avoids interference from the external environment and improves accuracy. Automatically answering and recording the call requires no manual participation, which improves efficiency, accuracy and user experience.
Fig. 4 is a schematic structural diagram of an information verification system according to another embodiment of the present invention, where the system 40 may include: an acquisition module 41, an extraction module 42 and a verification module 43. The obtaining module may be located at the client, and the extracting module and the verifying module may be located at the server, or the obtaining module, the extracting module and the verifying module are located at the server.
An obtaining module 41, configured to obtain an audio segment to be identified;
Unlike the prior-art approach of sending a verification code by short message, this embodiment performs verification through audio.
Optionally, the obtaining module is located at the client, and the obtaining module is specifically configured to:
and recording the audio clip transmitted in the telephone channel to obtain the audio clip to be identified.
The client may be specifically located in the mobile phone, for example, the user may perform verification based on the audio clip before using the mobile phone to pay. At this time, the audio clip transmitted in the telephone channel of the mobile phone can be recorded to obtain the audio clip to be identified.
Because the audio clip is recorded directly from the telephone channel, the audio clip to be identified is independent of the acoustic environment around the user and is not affected by environmental noise. This improves the quality of the recorded clip and, in turn, the verification accuracy.
Optionally, the obtaining module is further specifically configured to:
after detecting the incoming call, automatically answering the call and starting a recording module;
and after receiving the call, receiving the audio clip in the telephone channel, and recording the audio clip by adopting the recording module.
The automatic answering and starting of the recording module can be carried out synchronously.
In this embodiment, the call is answered and the recording module is started automatically, with no human participation; compared with manually waiting for and entering an SMS verification code, this improves efficiency.
In this embodiment, the incoming call to the mobile phone may be initiated by the server: after the mobile phone automatically answers, the server establishes a communication connection with it and transmits the audio clip over the telephone channel through this connection. When the user wants to use audio verification, they can click an audio verification button on the client. The client then sends an audio acquisition request to the server. On receiving the request, the server randomly selects one clip from the 10000 prepared audio clips, records its SliceID, and hands the clip, together with the telephone number reserved by the user, to a call center; the call center dials the number through an operator and plays the selected clip on the telephone channel. After the client automatically answers the call, it records the clip played on the channel to obtain the audio clip to be identified.
In addition, after the client finishes recording, the telephone can be hung up.
The above describes how the client obtains the audio clip to be identified. Since the clip needs to be verified, and verification is usually performed at the server, the server also needs to obtain the clip first; specifically, the server may receive the audio clip to be identified sent by the client.
In another embodiment, referring to fig. 5, the system 40 further comprises:
and the sending module 44, located at the client and configured to, after recording is completed and the audio segment to be identified is obtained, send a verification request message to the server, where the verification request message includes the audio segment to be identified.
In the embodiment, the client automatically sends the audio clip to be identified to the server after recording is completed, so that the efficiency can be improved.
An extracting module 42, configured to perform feature extraction on the audio segment to be identified, and acquire feature data corresponding to the audio segment to be identified;
optionally, the extracting module 42 is specifically configured to:
performing a short-time FFT on the audio clip to be identified to obtain frequency values;
extracting pitch chroma features from the frequency values to obtain chroma feature data;
performing bitmap feature extraction on the chroma feature data to obtain bitmap feature data, where the bitmap feature data includes bitmap feature data corresponding to a plurality of frames;
concatenating the bitmap feature data corresponding to a preset number of frames to obtain concatenated feature data, and using the concatenated feature data as the feature data corresponding to the audio segment to be identified.
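A rough sketch of this four-step pipeline in Python/NumPy follows. The patent does not fix the frame length, hop size, binarization rule, or the mapping of FFT bins to pitch classes, so all of those are assumptions; the output is a 168-dimensional binary vector, matching 14 frames times 12 chroma bits.

```python
import numpy as np

FRAME = 512      # assumed frame length in samples
HOP = 256        # assumed hop size
N_CHROMA = 12    # 12 pitch classes; 12 x 14 frames = 168 dimensions
N_FRAMES = 14

def chroma_bitmap_features(signal, sr=8000):
    """Sketch of the pipeline: short-time FFT -> pitch chroma ->
    per-frame binary 'bitmap' -> concatenation of 14 frames."""
    frames = []
    for start in range(0, len(signal) - FRAME + 1, HOP):
        spectrum = np.abs(np.fft.rfft(signal[start:start + FRAME]))
        freqs = np.fft.rfftfreq(FRAME, d=1.0 / sr)
        chroma = np.zeros(N_CHROMA)
        for f, mag in zip(freqs[1:], spectrum[1:]):
            # fold each frequency bin onto one of the 12 pitch classes
            pitch_class = int(round(12 * np.log2(f / 440.0))) % 12
            chroma[pitch_class] += mag
        # 'bitmap' step (assumed rule): mark classes above the frame mean
        frames.append((chroma > chroma.mean()).astype(np.uint8))
    frames = frames[:N_FRAMES]
    return np.concatenate(frames)  # 168-dim binary vector for 14 frames

rng = np.random.default_rng(0)
feat = chroma_bitmap_features(rng.standard_normal(8000))  # ~1 s at 8 kHz
print(feat.shape)  # (168,)
```

The 1-second clip length from the description fits this layout: at an assumed 8 kHz sampling rate and 256-sample hop, one second yields comfortably more than the 14 frames that are concatenated.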
The method for extracting features from the audio segment to be recognized may be the same as the method used on the sample audio segments when the data model is established, which is described below; for the specific contents, refer to that description.
And the verification module 43 is configured to obtain a verification result according to the feature data and a pre-generated data model.
The verification process can be performed online, while the data model can be generated offline.
In another embodiment, referring to fig. 5, the system 40 further comprises: a generating module 45 for generating a data model, the generating module 45 being specifically configured to:
acquiring a preset number of audio fragments serving as samples;
Verification information generally consists of four digits, or of a combination of digits and letters; to simplify the problem, a four-digit verification code is taken as an example in the description.
There are 10000 possible four-digit combinations (0000-9999) in total, so the preset number is 10000. Accordingly, 10000 audio clips can be selected, which may be obtained from various audio resources such as songs, music, and accompaniments.
In addition, after the 10000 audio clips are acquired, unique identification information can be generated for each clip; if the identification information is denoted SliceID, the SliceIDs can range from 1 to 10000.
It will be appreciated that, because the audio segments need to be played over the telephone channel, each segment should preferably not be too long: an overly long segment would increase the verification time and harm the user experience. Therefore, on the premise of ensuring the recognition rate, the length of each audio segment can be set to 1 s. In addition, to improve verification accuracy, audio segments that differ strongly from one another can be selected.
In this way, a set of 10000 audio segments of 1 s each can be constructed, which may be referred to as a verification closed set.
Extracting the characteristics of each audio clip to obtain characteristic data corresponding to each audio clip;
the process of extracting the feature data of the audio segment can comprise the following steps:
performing a short-time FFT on each audio clip to obtain frequency values;
extracting pitch chroma features from the frequency values to obtain chroma feature data;
performing bitmap feature extraction on the chroma feature data to obtain bitmap feature data, where the bitmap feature data includes bitmap feature data corresponding to a plurality of frames;
concatenating the bitmap feature data corresponding to a preset number of frames to obtain concatenated feature data, and using the concatenated feature data as the feature data corresponding to the audio segment.
For details, see S221-S224; they are not repeated here.
And acquiring related information according to the characteristic data, storing the related information and generating a data model.
Optionally, the related information may include: the feature data together with the identification information of the corresponding audio clip, stored in correspondence with each other. For example, if the identification information of the first audio segment is 1 and the feature data corresponding to the first audio segment is a 168-dimensional first vector, then 1 and the first vector may be stored correspondingly. Or,
Optionally, since 1s make up only a small fraction of each 168-dimensional feature vector, the feature data and the corresponding identification information of the audio clips may instead be recorded with an inverted document index, which avoids wasting storage resources and facilitates subsequent comparison. In this case the related information is: for each bit position, the identification information of all audio clips whose data at that position is 1, obtained in an inverted-document-index manner. For example, each of the 10000 audio segments corresponds to a 168-dimensional vector in which most of the 168 entries (each 0 or 1) are 0; if the audio segments whose first dimension is 1 have identification information 1, 10, …, then the entry corresponding to the first dimension is the list 1, 10, ….
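A minimal sketch of such an inverted index follows, with illustrative names and toy 5-dimensional vectors standing in for the 168-dimensional ones: each bit position maps to the list of SliceIDs whose feature vector has a 1 at that position.

```python
from collections import defaultdict

def build_inverted_index(features_by_id):
    """features_by_id: {slice_id: binary feature tuple}.
    Returns {bit_position: [slice_ids with a 1 at that position]}."""
    index = defaultdict(list)
    for slice_id, vec in features_by_id.items():
        for pos, bit in enumerate(vec):
            if bit:
                index[pos].append(slice_id)
    return index

# toy example with 5-dim vectors instead of 168-dim ones
model = build_inverted_index({1: (1, 0, 0, 1, 0),
                              10: (1, 1, 0, 0, 0)})
print(model[0])  # [1, 10] -> both clips have a 1 in the first dimension
```

Because only the positions holding a 1 are stored, sparse binary vectors cost far less space than storing all 168 entries per clip, and a lookup immediately yields the candidate clips sharing a given bit.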
Thus, a data model can be obtained.
During verification, as described in the related steps above, feature extraction is first performed on the audio segment to be recognized, using the same flow as when the data model was generated. Referring to fig. 3, after receiving the audio clip to be identified (obtained by the client's recording) from the client, the server performs feature extraction (S31), which includes: short-time FFT, chroma feature extraction, bitmap feature extraction, and 14-frame concatenation, producing 168-dimensional feature data. Once the 168-dimensional feature data corresponding to the audio clip to be identified is obtained, it can be compared with the related information of the feature data stored in the data model to obtain the verification result.
In another embodiment, referring to fig. 5, the data model has stored therein: related information of the feature data of the audio segments serving as samples, where the audio segment to be identified is obtained by recording one of the sample audio segments. The verification module 43 includes:
a first unit 431, configured to determine, according to the feature data and the related information, an audio segment matching the audio segment to be recognized, and determine identification information of the matching audio segment;
optionally, the determining, by the first unit, an audio segment matching the audio segment to be identified according to the feature data and the related information includes:
time-shifting the feature data, and forming a plurality of query segments from the time-shifted feature data and the original feature data;
and calculating a dot product value between each query segment and the feature data in each relevant information, and determining the audio segment with the maximum dot product value as a matched audio segment.
After the feature data corresponding to the audio segment to be identified is obtained, it can be compared in turn with the feature data of the 10000 audio segments in the data model to find the matched audio segment.
When comparing audio segments, the dot product of the feature data of the two segments can be computed, and the audio segment yielding the largest dot product value is determined to be the match.
For convenience of explanation, assume the following simplified scenario: the feature data corresponding to the audio segment to be recognized is 5-dimensional, namely 10100, and the data model contains feature data for 3 audio segments: 10010, 01000, and 10100. The dot product of 10100 and 10010 is 1, that of 10100 and 01000 is 0, and that of 10100 and 10100 is 2. Since 2 is the maximum, the matched audio segment is the one corresponding to 10100.
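This toy comparison can be checked directly by treating each 5-bit string as a binary vector:

```python
import numpy as np

query = np.array([1, 0, 1, 0, 0])           # 10100, the clip to identify
candidates = {"10010": np.array([1, 0, 0, 1, 0]),
              "01000": np.array([0, 1, 0, 0, 0]),
              "10100": np.array([1, 0, 1, 0, 0])}
# dot product counts the bit positions where both vectors hold a 1
scores = {name: int(query @ vec) for name, vec in candidates.items()}
print(scores)                                # {'10010': 1, '01000': 0, '10100': 2}
print(max(scores, key=scores.get))           # '10100' -> the matched segment
```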
In addition, because of transmission delay in the telephone channel and other factors, the recorded audio clip to be identified may not be synchronized with the originally played clip. Call the feature data of each audio clip stored in the data model an index segment, and the feature data of the audio clip to be identified the original segment; both consist of 14 frames of data. After the original segment is obtained, it can be time-shifted, for example by 1 to 7 frames forward and backward, yielding 14 new segments; together with the original segment these form 15 query segments in total. The dot product of each query segment with each index segment can then be computed, giving 150000 dot product values in total, and the audio segment whose index segment yields the largest dot product value is determined as the match. For example, if the largest dot product value is between the 6th query segment and the 100th index segment, and the 100th index segment is the feature data of the 100th audio segment, then the matched audio segment is the 100th audio segment.
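A sketch of this time-shift search follows, assuming 14 frames of 12 chroma bits each and zero-padding of shifted-in frames (the patent does not specify the padding rule). It builds the 15 query segments, dots each against every index segment, and returns the SliceID with the largest value; the toy index gives each clip its own pair of chroma columns so clips stay distinguishable under any frame shift.

```python
import numpy as np

N_FRAMES, DIM = 14, 12   # 14 frames x 12 chroma bits = 168 dimensions

def query_segments(original, max_shift=7):
    """original: (14, 12) binary matrix. Returns the original segment plus
    14 versions shifted forward/backward by 1..7 frames (zero-padded)."""
    segments = [original]
    for s in range(1, max_shift + 1):
        fwd = np.roll(original, s, axis=0)
        fwd[:s] = 0
        bwd = np.roll(original, -s, axis=0)
        bwd[-s:] = 0
        segments += [fwd, bwd]
    return segments  # 1 + 14 = 15 query segments

def best_match(original, index_segments):
    """index_segments: {slice_id: (14, 12) matrix}. Returns the SliceID whose
    index segment has the largest dot product over all 15 query segments."""
    queries = query_segments(original)
    return max(index_segments,
               key=lambda sid: max(int((q * index_segments[sid]).sum())
                                   for q in queries))

# toy index of 5 clips: clip sid marks columns 2*(sid-1) and 2*sid-1
index = {sid: np.zeros((N_FRAMES, DIM), dtype=int) for sid in range(1, 6)}
for sid in index:
    index[sid][:, 2 * (sid - 1):2 * sid] = 1

recorded = np.roll(index[3], 2, axis=0)  # simulate a 2-frame channel delay
recorded[:2] = 0                         # silence before playback started
print(best_match(recorded, index))       # 3
```

With a full 10000-clip index this is the 15 x 10000 = 150000 dot products described above; the shift search recovers the correct clip even though the recording is misaligned with the original.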
Because corresponding identification information is generated when the 10000 audio segments are obtained, the identification information of the matched audio segment can be retrieved once the match is determined.
A second unit 432, configured to compare the identification information of the matched audio segment with the identification information of the audio segment to be identified;
Because the audio clip to be recognized is played to the client by the server, the server records the identification information of the played clip; it can therefore compare the identification information of the matched audio clip with that of the audio clip to be recognized.
A third unit 433, configured to determine that the verification result is successful if the two are consistent, and otherwise, determine that the verification result is failed.
For example, if the identification information of the matched audio clip is 100 and the identification information of the recorded audio clip to be recognized is also 100, the verification succeeds; otherwise it fails.
In this embodiment, audio-based verification is achieved by extracting features from the audio clip to be recognized and verifying against the extracted feature data, which avoids some problems of the short-message verification mode and improves both the security and the convenience of information verification. The method of this embodiment may be applied to payment-related mobile products. Recording the audio clip played in the telephone channel avoids interference from external environmental factors and improves accuracy, and automatically answering and recording the call without manual participation further improves efficiency, accuracy, and the user experience.
It should be noted that the foregoing describes flows involving both the client and the server, or the server alone. From a single-side perspective, embodiments of the present invention may likewise provide a client-only flow and a corresponding apparatus. The client-only flow may include: obtaining the audio clip to be identified, sending it to the server, and receiving the verification result returned by the server. For how the client obtains the audio clip to be identified, and for how the server derives the verification result from it, refer to the related descriptions above.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternative implementations are included within the scope of the preferred embodiments of the present invention, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (15)

1. An information verification method, comprising:
acquiring an audio clip to be identified;
extracting the characteristics of the audio clip to be identified to obtain characteristic data corresponding to the audio clip to be identified;
obtaining a verification result according to the characteristic data and a pre-generated data model;
wherein extracting the features of an audio clip and acquiring the feature data corresponding to the audio clip comprises the following steps:
performing a short-time fast Fourier transform on an audio clip to obtain frequency values, wherein the audio clip comprises: the audio segment to be identified, or each audio segment serving as a sample;
extracting pitch chroma feature data from the frequency values to obtain chroma feature data;
performing bitmap feature extraction on the chroma feature data to obtain bitmap feature data, wherein the bitmap feature data comprises bitmap feature data corresponding to a plurality of frames;
concatenating the bitmap feature data corresponding to a preset number of frames to obtain concatenated feature data, and taking the concatenated feature data as the feature data corresponding to the audio segment.
2. The method of claim 1, wherein the obtaining the audio segment to be identified comprises:
and the client records the audio clip transmitted in the telephone channel to obtain the audio clip to be identified.
3. The method of claim 2, wherein the client recording the audio clip transmitted in the telephone channel to obtain the audio clip to be identified comprises:
after detecting the incoming call, the client automatically answers the call and starts a recording module;
and after receiving the call, the client receives the audio clip in the telephone channel and records the audio clip by adopting the recording module.
4. The method of claim 2 or 3, further comprising:
and after the client finishes recording and obtains the audio fragments to be identified, sending a verification request message to a server, wherein the verification request message comprises the audio fragments to be identified.
5. The method of claim 1, wherein the data model has stored therein: related information of feature data of audio segments serving as samples, the audio segment to be identified being obtained by recording one of the sample audio segments; and the obtaining of the verification result according to the feature data and a pre-generated data model comprises:
determining an audio clip matched with the audio clip to be recognized according to the feature data and the related information, and determining identification information of the matched audio clip;
comparing the identification information of the matched audio clip with the identification information of the audio clip to be identified;
if the two are consistent, the verification result is determined to be successful, otherwise, the verification result is determined to be failed.
6. The method of claim 5, wherein determining the audio segment matching the audio segment to be identified according to the feature data and the related information comprises:
time shifting the characteristic data, and forming a plurality of query segments by the time-shifted characteristic data and the original characteristic data;
and calculating a dot product value between each query segment and the feature data in each relevant information, and determining the audio segment with the maximum dot product value as a matched audio segment.
7. The method of claim 1, further comprising: generating a data model, the generating a data model comprising:
acquiring a preset number of audio fragments serving as samples;
extracting the characteristics of each audio clip to obtain characteristic data corresponding to each audio clip;
and acquiring related information according to the characteristic data, storing the related information and generating a data model.
8. The method according to any one of claims 5 to 7, wherein the information related to the feature data comprises:
the feature data and the identification information of the audio clip corresponding to the feature data, wherein the feature data and the identification information of the audio clip are stored in correspondence with each other; or,
the identification information of all audio clips whose data at the same bit position is 1, obtained in an inverted document index manner.
9. An information verification system, comprising:
the acquisition module is used for acquiring the audio clip to be identified;
the extraction module is used for extracting the characteristics of the audio clip to be identified and acquiring the characteristic data corresponding to the audio clip to be identified;
the verification module is used for obtaining a verification result according to the characteristic data and a pre-generated data model;
the extraction module is specifically configured to:
performing a short-time fast Fourier transform on the audio segment to be identified to obtain frequency values;
extracting pitch chroma feature data from the frequency values to obtain chroma feature data;
performing bitmap feature extraction on the chroma feature data to obtain bitmap feature data, wherein the bitmap feature data comprises bitmap feature data corresponding to a plurality of frames;
concatenating the bitmap feature data corresponding to a preset number of frames to obtain concatenated feature data, and taking the concatenated feature data as the feature data corresponding to the audio segment to be identified.
10. The system of claim 9, wherein the obtaining module is located at a client, and the obtaining module is specifically configured to:
and recording the audio clip transmitted in the telephone channel to obtain the audio clip to be identified.
11. The system of claim 10, wherein the acquisition module is further specifically configured to:
after detecting the incoming call, automatically answering the call and starting a recording module;
and after receiving the call, receiving the audio clip in the telephone channel, and recording the audio clip by adopting the recording module.
12. The system of claim 10 or 11, further comprising:
and the sending module is positioned at the client and used for sending a verification request message to the server after the audio clip to be identified is obtained after the recording is finished, wherein the verification request message comprises the audio clip to be identified.
13. The system of claim 9, wherein the data model has stored therein: related information of feature data of audio segments serving as samples, the audio segment to be identified being obtained by recording one of the sample audio segments, and the verification module comprises:
a first unit, configured to determine, according to the feature data and the related information, an audio segment that matches the audio segment to be recognized, and determine identification information of the matched audio segment;
the second unit is used for comparing the identification information of the matched audio clip with the identification information of the audio clip to be identified;
and the third unit is used for determining that the verification result is successful if the two are consistent, and otherwise, determining that the verification result is failed.
14. The system of claim 13, wherein the first unit is configured to determine an audio segment matching the audio segment to be identified according to the feature data and the related information, and comprises:
time shifting the characteristic data, and forming a plurality of query segments by the time-shifted characteristic data and the original characteristic data;
and calculating a dot product value between each query segment and the feature data in each relevant information, and determining the audio segment with the maximum dot product value as a matched audio segment.
15. The system according to any one of claims 9-11, further comprising: a generation module for generating a data model, the generation module being specifically configured to:
acquiring a preset number of audio fragments serving as samples;
extracting the characteristics of each audio clip to obtain characteristic data corresponding to each audio clip;
and acquiring related information according to the characteristic data, storing the related information and generating a data model.
CN201510367441.XA 2015-06-29 2015-06-29 Information Authentication method and system Active CN104992095B (en)


Publications (2)

Publication Number  Publication Date
CN104992095A        2015-10-21
CN104992095B        2018-07-10


