WO2021051533A1 - Address information-based blacklist identification method, apparatus, device, and storage medium - Google Patents

Publication number
WO2021051533A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
blacklist
feature
address information
file
Application number
PCT/CN2019/117117
Other languages
French (fr)
Chinese (zh)
Inventor
李江
王健宗
彭俊清
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051533A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876 Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a blacklist identification method, apparatus, device, and storage medium based on address information.
  • Voiceprint recognition distinguishes unknown voices by analyzing the characteristics of one or more speech signals. Simply put, it is a technology for determining whether a given utterance was spoken by a given person.
  • The theoretical basis of voiceprint recognition is that each voice has unique features that can effectively distinguish the voices of different people.
  • The two most important factors that determine voiceprint characteristics are the size of the vocal cavity and the way the vocal organs are manipulated.
  • A voiceprint is an important biometric feature of the human body. In theory, no two people have exactly the same voiceprint characteristics.
  • Speaker identification based on voiceprint recognition is of great practical significance. For example, a bank's credit card business can record the voiceprint features of blacklisted users in a feature database and compare a user's voice against that database to determine whether the user is on the blacklist, which guides credit card staff in choosing a handling strategy.
  • A blacklist feature database covers the whole country and all age groups, so it becomes very large.
  • This application provides a blacklist identification method, apparatus, device, and storage medium based on address information, which divide the blacklist feature database into smaller blacklist feature sub-databases and compare voiceprint features against the sub-database corresponding to the address information, thereby improving the efficiency of voiceprint recognition.
  • A first aspect of the embodiments of the present application provides a blacklist identification method based on address information, including: acquiring a voice file of a target user, the voice file including audio and address information, and the address information including an incoming phone segment and/or Internet Protocol (IP) address information; performing feature extraction on the audio through a preset algorithm to obtain a feature file; determining whether the feature file is valid; if the feature file is invalid, generating an extraction-failure status code, the status code indicating the reason for the extraction failure; and if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, calling a preset blacklist model with the target address range to score the similarity of the feature file, and performing a corresponding operation according to the scoring result.
  • A second aspect of the embodiments of the present application provides a blacklist identification apparatus based on address information, including: an acquiring unit, configured to acquire a voice file of a target user, the voice file including audio and address information; an extracting unit, configured to perform feature extraction on the audio through a preset algorithm to obtain a feature file; a judging unit, configured to determine whether the feature file is valid; a first generating unit, configured to generate, if the feature file is invalid, an extraction-failure status code indicating the reason for the extraction failure; and a scoring unit, configured to, if the feature file is valid, determine the target address range to which the target user belongs according to the incoming phone segment or the Internet Protocol (IP) address information, call a preset blacklist model with the target address range to score the similarity of the feature file, and perform a corresponding operation according to the scoring result.
  • A third aspect of the embodiments of the present application provides a blacklist identification device based on address information, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the above blacklist identification method based on address information when executing the computer program.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing instructions. When the instructions are run on a computer, the computer executes the steps of the above blacklist identification method based on address information.
  • The voice file of the target user is obtained, where the voice file includes audio and address information, and the address information includes an incoming phone segment and/or Internet Protocol (IP) address information. Feature extraction is performed on the audio through a preset algorithm to obtain a feature file, and whether the feature file is valid is determined. If the feature file is invalid, an extraction-failure status code is generated to indicate the reason for the extraction failure. If the feature file is valid, the target address range to which the target user belongs is determined according to the incoming phone segment or the IP information, the preset blacklist model is called with the target address range to score the similarity of the feature file, and a corresponding operation is performed according to the scoring result.
  • Because the blacklist feature database is divided into finer-grained blacklist feature sub-databases and voiceprint features are compared against the sub-database corresponding to the address information, the efficiency of voiceprint recognition is improved.
  • FIG. 1 is a schematic diagram of an embodiment of a blacklist identification method based on address information in an embodiment of this application;
  • FIG. 2 is a schematic diagram of another embodiment of a blacklist identification method based on address information in an embodiment of this application;
  • FIG. 3 is a schematic diagram of an embodiment of a blacklist identification device based on address information in an embodiment of this application;
  • FIG. 4 is a schematic diagram of another embodiment of a blacklist identification device based on address information in an embodiment of this application;
  • FIG. 5 is a schematic diagram of an embodiment of a blacklist identification device based on address information in an embodiment of this application.
  • This application provides a blacklist identification method, apparatus, device, and storage medium based on address information.
  • The blacklist feature database is divided into finer-grained blacklist feature sub-databases, and voiceprint features are compared against the sub-database corresponding to the address information, which improves the efficiency of voiceprint recognition.
  • Referring to FIG. 1, a flowchart of a method for identifying a blacklist based on address information provided by an embodiment of the present application specifically includes the following steps.
  • The server obtains the voice file of the target user. The voice file includes audio and address information, and the address information includes an incoming phone segment and/or Internet Protocol (IP) address information.
  • Specifically, the server receives the voice file of the target user; the server parses the voice file to obtain the audio and the address identifier of the target user; the server queries a preset table according to the address identifier to obtain the corresponding address information, which includes the incoming phone segment and/or IP information.
  • For example, when the server obtains the audio of the target user by phone or over the network, it also determines the specific address of the target user according to the incoming phone segment or the Internet Protocol (IP) address information. For certain services, basic information about the target user (excluding sensitive information) is also maintained. For example, when the server obtains the target user's voice file over the network, the voice file includes, in addition to the audio and the address identifier, an identity identifier that indicates the target user's basic information, such as age and gender.
  • The execution subject of this application may be a blacklist identification apparatus based on address information, or a terminal or a server, which is not specifically limited here.
  • The embodiments of the present application are described with the server as the execution subject.
  • The server performs feature extraction on the audio through a preset algorithm to obtain a feature file. Specifically, the server converts the audio from analog signal form to digital signal form; the server pre-emphasizes the audio in digital signal form; the server performs windowing on the pre-emphasized audio; the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number; the server maps the target complex number to the mel spectrum to obtain the logarithmic energy; the server converts the logarithmic energy to obtain the cepstral coefficients; the server calculates the energy and the differences according to the cepstral coefficients, and generates the feature file.
  • The server needs to sample and quantize the collected audio, that is, convert the continuous audio waveform into discrete data points at a given sampling rate and bit depth. Since sounds in daily life are generally below 8 kHz, the Nyquist sampling theorem implies that a 16 kHz sampling rate is sufficient for the sampled data to contain most of the sound information. A 16 kHz rate means that 16,000 samples are taken per second. The samples are stored as amplitude values, which must be quantized to integers to be stored efficiently. A 16-bit sample can represent an integer between -32768 and 32767, so each sampled amplitude is quantized to the nearest integer value.
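  • The sampling and quantization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the normalized amplitude range [-1.0, 1.0], and the 440 Hz test tone are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 16_000               # sampling rate named in the text
INT16_MIN, INT16_MAX = -32768, 32767  # range of a 16-bit signed sample


def quantize_16bit(amplitude: float) -> int:
    """Map an amplitude in [-1.0, 1.0] (assumed range) to the nearest 16-bit integer."""
    scaled = round(amplitude * INT16_MAX)
    # Clip so that out-of-range input cannot leave the 16-bit range.
    return max(INT16_MIN, min(INT16_MAX, scaled))


def sample_and_quantize(signal, duration_s: float, rate_hz: int = SAMPLE_RATE_HZ):
    """Sample a continuous signal (a function of time in seconds) and quantize it."""
    n_samples = int(duration_s * rate_hz)
    return [quantize_16bit(signal(n / rate_hz)) for n in range(n_samples)]


def tone(t: float) -> float:
    """Hypothetical input: a 440 Hz sine tone standing in for recorded speech."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


samples = sample_and_quantize(tone, duration_s=1.0)
```

  • One second of audio yields 16,000 integer samples, each of which fits in 16 bits.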
  • In the spectrum of an audio signal, the energy of the low-frequency part is usually higher than that of the high-frequency part: for every tenfold increase in frequency, the spectral energy is attenuated by 20 dB. Circuit noise in the microphone during capture further raises the low-frequency energy. To give the high-frequency and low-frequency parts similar amplitudes, the high-frequency energy of the collected sound must be boosted in advance, that is, the audio in digital signal form is pre-emphasized.
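  • Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - αx[n-1]. The sketch below assumes that form and the conventional coefficient α = 0.97; the text does not specify the filter or its coefficient, so both are assumptions.

```python
def pre_emphasize(samples, alpha: float = 0.97):
    """Boost high frequencies with y[n] = x[n] - alpha * x[n-1] (assumed filter form)."""
    if not samples:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out
```

  • A slowly varying (low-frequency) signal is strongly attenuated because consecutive samples nearly cancel, while a rapidly alternating (high-frequency) signal is amplified.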
  • Within a sufficiently short interval, the pre-emphasized audio can be regarded as stationary; extracting such an interval is called windowing.
  • The window is described by three parameters: window length (in milliseconds), offset, and shape.
  • Each windowed audio segment is called a frame, the number of milliseconds in each frame is called the frame length, and the distance between the left edges of two adjacent frames is called the frame shift. The frame length is denoted L.
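  • Framing and windowing can be sketched as below. The text does not state the window shape, so the Hamming window used here is an assumption, as are the function names and the example frame length and frame shift (25 ms and 10 ms at 16 kHz, i.e. 400 and 160 samples).

```python
import math


def hamming_window(L: int):
    """Hamming window of frame length L samples (assumed window shape)."""
    if L == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]


def split_into_frames(samples, frame_length: int, frame_shift: int):
    """Cut samples into overlapping frames and apply the window to each frame."""
    window = hamming_window(frame_length)
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_length + 1, frame_shift):
        frame = samples[start:start + frame_length]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


# 1000 samples, 400-sample frames, 160-sample shift.
frames = split_into_frames([1.0] * 1000, frame_length=400, frame_shift=160)
```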
  • The process in which the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number specifically includes: the server obtains the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0; the server calls the first preset formula to generate the target complex number X[k]. The first preset formula is: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2\pi}{N} k n}$, where N is a power of 2, k is an integer, and X[k] represents the amplitude and phase of a certain frequency component in the windowed audio signal.
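  • The discrete Fourier transform of one frame can be sketched as a direct sum. This is an illustrative O(N²) version; requiring N to be a power of 2, as the text does, suggests a fast Fourier transform in practice, which computes the same values. The function name is hypothetical.

```python
import cmath


def dft(x):
    """Return X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N), for k = 0..N-1."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def magnitude_and_phase(X_k: complex):
    """Each complex value encodes the amplitude and phase of one frequency component."""
    return abs(X_k), cmath.phase(X_k)
```

  • For a constant frame all energy lands in X[0]; for a cosine completing one cycle per frame the energy lands in X[1] and its mirror bin.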
  • The process in which the server maps the target complex number to the mel spectrum to obtain the logarithmic energy specifically includes:
  • The server smooths the target complex numbers through a preset filter bank; the server maps the smoothed values onto the mel scale of the mel spectrum, where the mel is a unit of pitch; specifically, the server uses the second preset formula to map the smoothed values onto the mel scale and obtain the target scale.
  • The second preset formula is: $\mathrm{mel}(f) = 1127 \ln\left(1 + \frac{f}{700}\right)$, where f is the frequency in Hz.
  • The server calculates the logarithmic energy of the target scale according to the third preset formula.
  • The third preset formula is: $s(m) = \ln\left(\sum_{k=0}^{N-1} |X[k]|^2 H_m(k)\right)$, $0 \le m < M$, where $H_m(k)$ is the frequency response of the m-th filter in the filter bank and M is the number of filters in the preset filter bank. Note that human response to sound pressure is roughly logarithmic: people are less sensitive to small changes at high sound pressure than at low sound pressure.
  • The filter bank is a set of triangular filters on the mel scale. The 10 filters below 1000 Hz are linearly spaced, and the remaining filters above 1000 Hz are logarithmically spaced.
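  • A mel filter bank and its logarithmic energies can be sketched as follows. Several choices here are assumptions rather than the patent's stated method: the widely used mapping mel(f) = 1127 ln(1 + f/700), filter edges spaced uniformly on the mel axis (which is approximately linear below 1000 Hz and logarithmic above, rather than exactly 10 linear filters), the small floor that avoids taking the logarithm of zero, and all function names.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Commonly used mel mapping (assumed form)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def triangular_filters(num_filters: int, n_bins: int, sample_rate_hz: float):
    """Return num_filters triangular responses H_m(k) over n_bins bins from 0 Hz to Nyquist."""
    top_mel = hz_to_mel(sample_rate_hz / 2)
    # num_filters + 2 edge frequencies, evenly spaced on the mel axis.
    edges_hz = [mel_to_hz(top_mel * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = (sample_rate_hz / 2) / (n_bins - 1)
    filters = []
    for m in range(1, num_filters + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        response = []
        for k in range(n_bins):
            f = k * bin_hz
            if left < f <= center:
                response.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                response.append((right - f) / (right - center))  # falling edge
            else:
                response.append(0.0)
        filters.append(response)
    return filters


def log_filterbank_energies(spectrum, filters, floor: float = 1e-10):
    """Log of the filter-weighted power spectrum, one value per filter."""
    power = [abs(X_k) ** 2 for X_k in spectrum]
    return [
        math.log(max(floor, sum(p * h for p, h in zip(power, H))))
        for H in filters
    ]
```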
  • The process in which the server calculates the energy and the differences according to the cepstral coefficients and generates the feature file includes:
  • The energy of a frame is defined as the sum of the squares of the sample points in that frame.
  • The energy from sample point t1 to sample point t2 is: $E = \sum_{t=t_1}^{t_2} x^2[t]$.
  • The features extracted above are computed for each frame separately and are static, whereas actual speech is continuous and adjacent frames are related. Features that represent such dynamic changes between frames therefore need to be added.
  • This is usually done by computing, for each frame, the first-order or even second-order difference of its 13 features (12 cepstral features plus 1 energy).
  • A simple way to calculate the difference is to take the difference between the 13 features of the frames after and before the current frame: $d(t) = \frac{c(t+1) - c(t-1)}{2}$, where c(t) is a feature value of frame t. If the second-order difference is not considered, the final mel-frequency cepstral coefficient feature of each frame has 26 dimensions: 12-dimensional cepstral coefficients, 12-dimensional cepstral coefficient differences, 1-dimensional energy, and 1-dimensional energy difference.
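  • Frame energy and first-order differences can be sketched as below. The halved symmetric difference and the clamping at the first and last frames are assumptions (the text only says the difference is taken between the frames before and after the current one), and the function names are hypothetical.

```python
def frame_energy(samples, t1: int, t2: int) -> float:
    """Sum of squared samples from index t1 to t2 inclusive."""
    return sum(samples[t] ** 2 for t in range(t1, t2 + 1))


def delta(features):
    """First-order difference per frame: (next frame - previous frame) / 2.

    At the first and last frame the missing neighbor is replaced by the
    frame itself (an assumed edge rule).
    """
    T = len(features)
    out = []
    for t in range(T):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, T - 1)]
        out.append([(b - a) / 2 for a, b in zip(prev, nxt)])
    return out


def with_deltas(static):
    """Append each frame's differences to its static features (13 -> 26 dimensions)."""
    return [c + d for c, d in zip(static, delta(static))]
```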
  • The server determines whether the feature file is valid. Specifically, the server determines whether the format of the feature file meets the preset quality requirements; if it does not, the server determines that the feature file is invalid; if it does, the server determines whether the feature file contains the voices of multiple users; if it does not contain the voices of multiple users, the server determines that the feature file is valid; if it does, the server determines that the feature file is invalid.
  • If the feature file is invalid, the server generates an extraction-failure status code.
  • The status code indicates the reason for the extraction failure, such as poor voice quality or multiple people talking. Different failure reasons correspond to different status codes. For example, if the format of the feature file does not meet the preset quality requirements, the server determines that the feature file is invalid and generates a first extraction-failure status code; if the feature file contains the voices of multiple users, the server determines that the feature file is invalid and generates a second extraction-failure status code, where the first status code and the second status code are different.
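  • The validity check and its status codes can be sketched as follows. The enum, its numeric values, and the two boolean/integer inputs are hypothetical; the text does not say how quality or speaker count is measured, so those are passed in as precomputed values.

```python
from enum import Enum
from typing import Optional


class ExtractionFailure(Enum):
    """Hypothetical status codes; the text only requires that they differ."""
    POOR_QUALITY = 1        # first status code: format/quality requirement not met
    MULTIPLE_SPEAKERS = 2   # second status code: several users' voices present


def validate_feature_file(meets_quality: bool, speaker_count: int) -> Optional[ExtractionFailure]:
    """Return None if the feature file is valid, otherwise the failure status code."""
    # Quality is checked first; speaker count only matters for well-formed files.
    if not meets_quality:
        return ExtractionFailure.POOR_QUALITY
    if speaker_count > 1:
        return ExtractionFailure.MULTIPLE_SPEAKERS
    return None
```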
  • If the feature file is valid, the server determines the target address range to which the target user belongs according to the incoming phone segment or IP information, calls the preset blacklist model with the target address range to score the similarity of the feature file, and performs a corresponding operation according to the scoring result.
  • Specifically, the server determines the target address range to which the target user belongs based on the incoming phone segment or IP information; the server determines the corresponding target blacklist model among the preset blacklist models according to the target address range, where each preset blacklist model corresponds to a different blacklist feature sub-database; the server scores the similarity of the feature file through the target blacklist model to obtain the target score; if the target score is greater than or equal to the first threshold, the server determines that the target user is in the blacklist feature sub-database corresponding to the target blacklist model and returns a first prompt message.
  • The first prompt message indicates that the target user is prohibited from receiving normal services. If the target score is less than the first threshold, the server determines that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model and returns a second prompt message.
  • The second prompt message indicates that the target user may receive normal services.
  • The target user's total score is obtained by comparing the voiceprint features (feature file) extracted from the voice with the voiceprint features in the blacklist feature database, combined with the address information.
  • Scoring calculates the similarity of voiceprint features. A threshold is usually determined during model training. When the score is higher than the threshold, the two voiceprint features are close and can be considered a match.
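  • Routing a call to one sub-database and scoring against a threshold can be sketched as below. Everything concrete here is a stand-in: the phone segments, region names, reference vectors, the 0.8 threshold, and the use of cosine similarity in place of the unspecified preset blacklist model are all assumptions.

```python
import math

# Hypothetical routing table: incoming phone segment -> target address range.
PHONE_SEGMENT_TO_REGION = {"021": "east_china", "010": "north_china"}

# Hypothetical blacklist feature sub-databases, one per address range.
SUB_DATABASES = {
    "east_china": {"blacklisted_a": [1.0, 0.0, 0.0]},
    "north_china": {"blacklisted_b": [0.0, 1.0, 0.0]},
}


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def target_region(phone_number: str):
    """Determine the address range from the incoming phone segment."""
    for segment, region in PHONE_SEGMENT_TO_REGION.items():
        if phone_number.startswith(segment):
            return region
    return None


def identify(phone_number: str, features, threshold: float = 0.8):
    """Compare only against the sub-database for the caller's address range."""
    candidates = SUB_DATABASES.get(target_region(phone_number), {})
    score = max(
        (cosine_similarity(features, ref) for ref in candidates.values()),
        default=0.0,
    )
    # score >= threshold: caller matches a blacklisted voiceprint (first prompt).
    return ("prohibit" if score >= threshold else "allow"), score
```

  • Because only the sub-database for the caller's address range is searched, the number of comparisons depends on the size of that sub-database rather than on the size of the whole blacklist.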
  • This application can be applied mainly to bank loan business: whether to blacklist a user is decided according to the user's credit rating, and the user's voiceprint features are registered in the blacklist feature sub-database that matches the user's region, age, and gender.
  • Once a user's voiceprint features are registered in a blacklist feature sub-database, a later incoming call from that person can be identified as coming from a blacklisted user based on the voiceprint features, so the loan business may be declined.
  • Because the blacklist feature database is divided into finer-grained blacklist feature sub-databases and voiceprint features are compared against the sub-database corresponding to the address information, the efficiency of voiceprint recognition is improved.
  • Referring to FIG. 2, another flowchart of a method for identifying a blacklist based on address information provided by an embodiment of the present application specifically includes the following steps.
  • The server generates a preset blacklist model, and the preset blacklist model is used for blacklist registration.
  • Specifically, the server registers the blacklist by sub-database: it obtains key information such as the user's age, region, and gender from (non-sensitive) customer information, the phone segment, the network IP address, and so on, and extracts voiceprint features from the call audio.
  • The server then saves the voiceprint features in the sub-database that corresponds to the registered user's basic information. This is sub-database registration of the blacklist.
  • Sub-database registration saves only the blacklist feature sub-databases of the finest granularity, for example males in East China over 50 years old. Coarser sub-databases are synthesized from the finest ones; for example, the sub-database for males in East China can be synthesized from the finest-granularity sub-databases for males in East China, including the one for those over 50 years old.
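  • Sub-database registration and the synthesis of coarser sub-databases can be sketched as follows. The class, the (region, gender, age band) key, and the two-way age banding at 50 are hypothetical choices made for illustration.

```python
from collections import defaultdict


class BlacklistRegistry:
    """Stores voiceprints only in finest-granularity sub-databases."""

    def __init__(self):
        # Key: (region, gender, age_band); value: list of voiceprint feature vectors.
        self._finest = defaultdict(list)

    @staticmethod
    def age_band(age: int) -> str:
        return "over_50" if age > 50 else "50_or_under"  # assumed banding

    def register(self, region: str, gender: str, age: int, voiceprint):
        self._finest[(region, gender, self.age_band(age))].append(voiceprint)

    def synthesize(self, region=None, gender=None, band=None):
        """Build a coarser sub-database by merging matching finest ones (None = any)."""
        merged = []
        for (r, g, b), prints in self._finest.items():
            if region in (None, r) and gender in (None, g) and band in (None, b):
                merged.extend(prints)
        return merged


registry = BlacklistRegistry()
registry.register("east_china", "male", 62, [0.1, 0.2])
registry.register("east_china", "male", 35, [0.3, 0.4])
registry.register("east_china", "female", 41, [0.5, 0.6])
east_china_males = registry.synthesize(region="east_china", gender="male")
```

  • Storing each voiceprint once at the finest granularity avoids duplicated entries, at the cost of merging lists whenever a coarser sub-database is requested.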
  • The execution subject of this application may be a blacklist identification apparatus based on address information, or a terminal or a server, which is not specifically limited here.
  • The embodiments of the present application are described with the server as the execution subject.
  • The server obtains the voice file of the target user. The voice file includes audio and address information, and the address information includes an incoming phone segment and/or Internet Protocol (IP) address information.
  • Specifically, the server receives the voice file of the target user; the server parses the voice file to obtain the audio and the address identifier of the target user; the server queries a preset table according to the address identifier to obtain the corresponding address information, which includes the incoming phone segment and/or IP information.
  • For example, when the server obtains the audio of the target user by phone or over the network, it also determines the specific address of the target user according to the incoming phone segment or the Internet Protocol (IP) address information. For certain services, basic information about the target user (excluding sensitive information) is also maintained. For example, when the server obtains the target user's voice file over the network, the voice file includes, in addition to the audio and the address identifier, an identity identifier that indicates the target user's basic information, such as age and gender.
  • The server performs feature extraction on the audio through a preset algorithm to obtain a feature file. Specifically, the server converts the audio from analog signal form to digital signal form; the server pre-emphasizes the audio in digital signal form; the server performs windowing on the pre-emphasized audio; the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number; the server maps the target complex number to the mel spectrum to obtain the logarithmic energy; the server converts the logarithmic energy to obtain the cepstral coefficients; the server calculates the energy and the differences according to the cepstral coefficients, and generates the feature file.
  • The server needs to sample and quantize the collected audio, that is, convert the continuous audio waveform into discrete data points at a given sampling rate and bit depth. Since sounds in daily life are generally below 8 kHz, the Nyquist sampling theorem implies that a 16 kHz sampling rate is sufficient for the sampled data to contain most of the sound information. A 16 kHz rate means that 16,000 samples are taken per second. The samples are stored as amplitude values, which must be quantized to integers to be stored efficiently. A 16-bit sample can represent an integer between -32768 and 32767, so each sampled amplitude is quantized to the nearest integer value.
  • In the spectrum of an audio signal, the energy of the low-frequency part is usually higher than that of the high-frequency part: for every tenfold increase in frequency, the spectral energy is attenuated by 20 dB. Circuit noise in the microphone during capture further raises the low-frequency energy. To give the high-frequency and low-frequency parts similar amplitudes, the high-frequency energy of the collected sound must be boosted in advance, that is, the audio in digital signal form is pre-emphasized.
  • Within a sufficiently short interval, the pre-emphasized audio can be regarded as stationary; extracting such an interval is called windowing.
  • The window is described by three parameters: window length (in milliseconds), offset, and shape.
  • Each windowed audio segment is called a frame, the number of milliseconds in each frame is called the frame length, and the distance between the left edges of two adjacent frames is called the frame shift. The frame length is denoted L.
  • The process in which the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number specifically includes: the server obtains the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0; the server calls the first preset formula to generate the target complex number X[k]. The first preset formula is: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2\pi}{N} k n}$, where N is a power of 2, k is an integer, and X[k] represents the amplitude and phase of a certain frequency component in the windowed audio signal.
  • The process in which the server maps the target complex number to the mel spectrum to obtain the logarithmic energy specifically includes:
  • The server smooths the target complex numbers through a preset filter bank; the server maps the smoothed values onto the mel scale of the mel spectrum, where the mel is a unit of pitch; specifically, the server uses the second preset formula to map the smoothed values onto the mel scale and obtain the target scale.
  • The second preset formula is: $\mathrm{mel}(f) = 1127 \ln\left(1 + \frac{f}{700}\right)$, where f is the frequency in Hz.
  • The server calculates the logarithmic energy of the target scale according to the third preset formula.
  • The third preset formula is: $s(m) = \ln\left(\sum_{k=0}^{N-1} |X[k]|^2 H_m(k)\right)$, $0 \le m < M$, where $H_m(k)$ is the frequency response of the m-th filter in the filter bank and M is the number of filters in the preset filter bank. Note that human response to sound pressure is roughly logarithmic: people are less sensitive to small changes at high sound pressure than at low sound pressure.
  • The filter bank is a set of triangular filters on the mel scale. The 10 filters below 1000 Hz are linearly spaced, and the remaining filters above 1000 Hz are logarithmically spaced.
  • The process in which the server calculates the energy and the differences according to the cepstral coefficients and generates the feature file includes:
  • The energy of a frame is defined as the sum of the squares of the sample points in that frame.
  • The energy from sample point t1 to sample point t2 is: $E = \sum_{t=t_1}^{t_2} x^2[t]$.
  • The features extracted above are computed for each frame separately and are static, whereas actual speech is continuous and adjacent frames are related. Features that represent such dynamic changes between frames therefore need to be added.
  • This is usually done by computing, for each frame, the first-order or even second-order difference of its 13 features (12 cepstral features plus 1 energy).
  • A simple way to calculate the difference is to take the difference between the 13 features of the frames after and before the current frame: $d(t) = \frac{c(t+1) - c(t-1)}{2}$, where c(t) is a feature value of frame t. If the second-order difference is not considered, the final mel-frequency cepstral coefficient feature of each frame has 26 dimensions: 12-dimensional cepstral coefficients, 12-dimensional cepstral coefficient differences, 1-dimensional energy, and 1-dimensional energy difference.
  • The server determines whether the feature file is valid. Specifically, the server determines whether the format of the feature file meets the preset quality requirements; if it does not, the server determines that the feature file is invalid; if it does, the server determines whether the feature file contains the voices of multiple users; if it does not contain the voices of multiple users, the server determines that the feature file is valid; if it does, the server determines that the feature file is invalid.
  • If the feature file is invalid, the server generates an extraction-failure status code.
  • The status code indicates the reason for the extraction failure, such as poor voice quality or multiple people talking. Different failure reasons correspond to different status codes. For example, if the format of the feature file does not meet the preset quality requirements, the server determines that the feature file is invalid and generates a first extraction-failure status code; if the feature file contains the voices of multiple users, the server determines that the feature file is invalid and generates a second extraction-failure status code, where the first status code and the second status code are different.
  • If the feature file is valid, the server determines the target address range to which the target user belongs according to the incoming phone segment or IP information, calls the preset blacklist model and the target address range to score the similarity of the feature file, and performs the corresponding operation according to the scoring result.
  • Specifically, the server determines the target address range to which the target user belongs based on the incoming phone segment or IP information. The server then determines the corresponding target blacklist model among the preset blacklist models according to the target address range, where each preset blacklist model corresponds to a different blacklist feature sub-database. The server scores the similarity of the feature file through the target blacklist model to obtain the target score.
  • If the target score is greater than or equal to the first threshold, the server determines that the target user is in the blacklist feature sub-database corresponding to the target blacklist model and returns a first prompt message, which indicates that the target user is prohibited from receiving normal services.
  • If the target score is less than the first threshold, the server determines that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model and returns a second prompt message, which indicates that the target user may receive normal services.
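The routing and threshold decision above can be sketched as follows. Everything here is illustrative: the threshold value, the region keys, the message strings, and the stand-in scoring functions are assumptions, since the application does not fix any of them.

```python
FIRST_THRESHOLD = 0.8  # hypothetical value; the application leaves it unspecified


def identify(feature, address_range, models, threshold=FIRST_THRESHOLD):
    """Score a feature file against the sub-database for one address range.

    models maps an address range to a scoring function that stands in for the
    target blacklist model of that range.
    """
    score_fn = models[address_range]   # select the target blacklist model
    target_score = score_fn(feature)   # similarity score against that sub-database
    if target_score >= threshold:
        return "first prompt: normal service prohibited"
    return "second prompt: normal service allowed"


# Stand-in models that return fixed scores, for illustration only.
models = {
    "region-A": lambda feature: 0.92,
    "region-B": lambda feature: 0.15,
}
result_a = identify([0.1, 0.2], "region-A", models)
result_b = identify([0.1, 0.2], "region-B", models)
```

Only the model for the caller's address range is consulted, which is the source of the efficiency gain the application claims over searching a single nationwide library.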
  • The target user's total score is obtained by comparing the voiceprint features (the feature file) extracted from the voice with the voiceprint features in the blacklist feature library, in combination with the address information.
  • Scoring calculates the similarity of voiceprint features. A threshold is usually set during model training. When the score is higher than the threshold, the two voiceprint features are close and can be considered a match.
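The application does not say which similarity measure the blacklist model uses. As one possible illustration, the sketch below assumes cosine similarity between fixed-length voiceprint vectors; the function names and the dictionary layout of the sub-database are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(probe, sub_database):
    """Return (user_id, score) for the enrolled voiceprint closest to the probe."""
    scored = ((uid, cosine_similarity(probe, ref)) for uid, ref in sub_database.items())
    return max(scored, key=lambda pair: pair[1])


sub_database = {"u1": [0.0, 1.0], "u2": [2.0, 0.0]}
match_id, match_score = best_match([1.0, 0.0], sub_database)
```

The returned score would then be compared with the trained threshold to decide whether the two voiceprints match.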
  • This application can mainly be applied to bank loan business: whether to include a user in the blacklist is decided according to the user's credit rating, and the user's voiceprint features are registered in a blacklist feature sub-database according to the user's region, age, and gender.
  • Once a user's voiceprint features are registered in the blacklist feature sub-database, any later incoming call from that person can be identified as belonging to the blacklist based on the voiceprint features, and the loan business may be declined.
  • the blacklist feature database is divided into blacklist feature sub-bases with smaller dimensions, and voiceprint features are compared according to the blacklist feature sub-bases corresponding to address information, which improves the efficiency of voiceprint recognition.
  • An embodiment of the list identification device includes:
  • the obtaining unit 301 is configured to obtain a voice file of a target user, where the voice file includes audio and address information, and the address information includes incoming phone segment or/and Internet Protocol (IP) address information;
  • the extraction unit 302 is configured to perform feature extraction on the audio by using a preset algorithm to obtain a feature file;
  • the judging unit 303 is configured to judge whether the feature file is valid;
  • the first generating unit 304 is configured to generate, if the feature file is invalid, an extraction failure status code, where the status code is used to indicate the reason for the extraction failure;
  • the scoring unit 305 is configured to determine, if the feature file is valid, the target address range to which the target user belongs according to the incoming phone segment or the IP information, call a preset blacklist model and the target address range to score the similarity of the feature file, and perform the corresponding operation based on the scoring result.
  • the blacklist feature database is divided into blacklist feature sub-bases with smaller dimensions, and voiceprint features are compared according to the blacklist feature sub-bases corresponding to address information, which improves the efficiency of voiceprint recognition.
  • another embodiment of the device for identifying a blacklist based on address information in an embodiment of the present application includes:
  • the obtaining unit 301 is configured to obtain a voice file of a target user, where the voice file includes audio and address information, and the address information includes incoming phone segment or/and Internet Protocol (IP) address information;
  • the extraction unit 302 is configured to perform feature extraction on the audio by using a preset algorithm to obtain a feature file;
  • the judging unit 303 is configured to judge whether the feature file is valid;
  • the first generating unit 304 is configured to generate, if the feature file is invalid, an extraction failure status code, where the status code is used to indicate the reason for the extraction failure;
  • the scoring unit 305 is configured to determine, if the feature file is valid, the target address range to which the target user belongs according to the incoming phone segment or the IP information, call a preset blacklist model and the target address range to score the similarity of the feature file, and perform the corresponding operation based on the scoring result.
  • the scoring unit 305 is specifically used for:
  • The target address range to which the target user belongs is determined based on the incoming phone segment or the IP information; the corresponding target blacklist model is determined among the preset blacklist models according to the target address range, where each preset blacklist model corresponds to a different blacklist feature sub-database; the target blacklist model is used to score the similarity of the feature file to obtain the target score.
  • If the target score is greater than or equal to the first threshold, it is determined that the target user is in the blacklist feature sub-database corresponding to the target blacklist model, and a first prompt message is returned, which indicates that the target user is prohibited from receiving normal services; if the target score is less than the first threshold, it is determined that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model, and a second prompt message is returned, which indicates that the target user may receive normal services.
  • the obtaining unit 301 is specifically configured to:
  • Receive the voice file of the target user; parse the voice file to obtain the audio and address identifier of the target user; query a preset table according to the address identifier to obtain the address information corresponding to the address identifier, where the address information includes incoming phone segment or/and IP information.
  • the extraction unit 302 includes:
  • the first conversion module 3021 is used to convert the audio from an analog signal form to a digital signal form
  • the pre-emphasis module 3022 is used to pre-emphasize audio in the form of digital signals
  • the windowing module 3023 is used for windowing the pre-emphasized audio
  • the transform module 3024 is used to perform discrete Fourier transform on the windowed audio to obtain the target complex number
  • the corresponding module 3025 is used to map the target complex number to the Mel spectrum to obtain logarithmic energy
  • the second conversion module 3026 is configured to convert the logarithmic energy to obtain the cepstral coefficient
  • the calculation module 3027 is used to calculate the energy and the difference according to the cepstral coefficients to generate a feature file.
  • the transformation module 3024 is specifically used for:
  • the corresponding module 3025 is specifically used for:
  • The target complex number is smoothed through the preset filter bank; the smoothed complex number is mapped onto the mel scale of the Mel spectrum according to a second preset formula to obtain the target scale, where one mel is a unit of pitch.
  • The logarithmic energy of the target scale is calculated according to a third preset formula, in which H_m(k) is the frequency response of the filter bank and M is the number of filters in the preset filter bank.
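The second and third preset formulas are not reproduced in this text. As a hedged illustration, the sketch below uses the widely used conversion mel(f) = 2595 log10(1 + f/700) and a log filter-bank energy of the form ln(sum over k of |X[k]|^2 H_m(k)). Both are standard MFCC conventions assumed here, and they may differ from the formulas in the application; the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common Hz-to-mel conversion (assumed; not taken from the application)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def log_filterbank_energies(power_spectrum, filters):
    """Log energy per filter.

    power_spectrum[k] holds |X[k]|^2 for DFT bin k.
    filters[m][k] holds H_m(k), the response of filter m at bin k.
    """
    energies = []
    for h_m in filters:
        energy = sum(p * h for p, h in zip(power_spectrum, h_m))
        energies.append(math.log(max(energy, 1e-10)))  # floor avoids log(0)
    return energies


# Two toy filters over a 4-bin power spectrum.
toy_energies = log_filterbank_energies(
    [1.0, 1.0, 1.0, 1.0],
    [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
)
```

With M filters, the result is a list of M log energies, one per filter, which the next stage converts into cepstral coefficients.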
  • the blacklist identification device based on address information further includes:
  • the second generating unit 306 is configured to generate a preset blacklist model, and the preset blacklist model is used for blacklist registration.
  • the blacklist feature database is divided into blacklist feature sub-bases with smaller dimensions, and voiceprint features are compared according to the blacklist feature sub-bases corresponding to address information, which improves the efficiency of voiceprint recognition.
  • FIG. 5 is a schematic structural diagram of a blacklist recognition device based on address information provided by an embodiment of the present application.
  • The blacklist recognition device 500 based on address information may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506.
  • the memory 509 and the storage medium 508 may be short-term storage or persistent storage.
  • the program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the blacklist identification device based on address information.
  • the processor 501 may be configured to communicate with the storage medium 508, and execute a series of instruction operations in the storage medium 508 on the blacklist recognition device 500 based on address information.
  • The blacklist identification device 500 based on address information may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input and output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on.
  • The structure shown in FIG. 5 does not constitute a limitation on the blacklist recognition device based on address information, which may include more or fewer components than shown in the figure, combine some components, or arrange the components differently.
  • The processor 501 can perform the functions of the obtaining unit 301, the extraction unit 302, the judging unit 303, the first generating unit 304, the scoring unit 305, and the second generating unit 306 in the foregoing embodiments.
  • the processor 501 is the control center of the blacklist identification device based on address information, and can perform processing according to the set blacklist identification method based on address information.
  • The processor 501 uses various interfaces and lines to connect the parts of the entire blacklist identification device based on address information. By running or executing the software programs and/or modules stored in the memory 509 and calling the data stored in the memory 509, it performs the various functions of the device: dividing the blacklist feature database into smaller blacklist feature sub-databases and comparing voiceprint features against the sub-database corresponding to the address information, which improves voiceprint recognition efficiency.
  • The storage medium 508 and the memory 509 are both carriers for storing data. In the embodiment of the present application, the memory 509 may refer to an internal memory with a small storage capacity but a fast speed, and the storage medium 508 may be an external memory with a large storage capacity but a slow storage speed.
  • the memory 509 may be used to store software programs and modules.
  • the processor 501 executes various functional applications and data processing of the blacklist identification device 500 based on address information by running the software programs and modules stored in the memory 509.
  • The memory 509 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required by a function (for example, feature extraction of the audio through a preset algorithm to obtain a feature file).
  • The data storage area may store data created through use of the blacklist identification device based on address information (such as the extraction failure status code).
  • the memory 509 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium, and the computer-readable storage medium may also be a volatile computer-readable storage medium.
  • the computer-readable storage medium stores instructions, and when the instructions run on the computer, the computer executes the following steps of the blacklist identification method based on address information:
  • the voice file includes audio and address information, and the address information includes incoming phone segment or/and Internet Protocol address IP information;
  • The target address range to which the target user belongs is determined according to the incoming phone segment or the IP information, the preset blacklist model and the target address range are called to score the similarity of the feature file, and the corresponding operation is performed based on the scoring result.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired means (such as coaxial cable, optical fiber, or twisted pair) or wireless means (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state disk (SSD)).

Abstract

The present application relates to the field of artificial intelligence, and disclosed therein are an address information-based blacklist identification method, an apparatus, a device, and a storage medium, which are used for dividing a blacklist feature library into blacklist feature sub-libraries having smaller dimensions and comparing voiceprint features according to the blacklist feature sub-library corresponding to address information, thereby improving the voiceprint identification efficiency. The method according to the present application comprises: acquiring a voice file of a target user, the voice file comprising audio and address information, and the address information comprising an incoming telephone segment or/and Internet Protocol (IP) address information; carrying out feature extraction on the audio by means of a preset algorithm to obtain a feature file; determining whether the feature file is valid or not; if the feature file is invalid, generating a status code that extraction has failed; and if the feature file is valid, determining according to the incoming telephone segment or the IP information a target address range to which the target user belongs, calling a preconfigured blacklist model and the target address range to perform similarity scoring on the feature file, and performing a corresponding operation according to the scoring result.

Description

Blacklist identification method, device, equipment and storage medium based on address information
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 19, 2019, with application number 201910884630.2 and the invention title "Blacklist identification method, device, equipment and storage medium based on address information", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a blacklist identification method, device, equipment and storage medium based on address information.
Background
Voiceprint recognition distinguishes unknown voices by analyzing the characteristics of one or more speech signals. Simply put, it is a technology for determining whether a given sentence was spoken by a given person. Its theoretical basis is that each voice has unique features by which the voices of different people can be effectively distinguished. The two most important factors that determine voiceprint features are the size of the vocal cavity and the way the vocal organs are manipulated.
A voiceprint is a very important human characteristic. In theory, no two people have exactly the same voiceprint features. Speaker identification based on voiceprint recognition is of great practical significance. In a bank's credit card business, for example, the voiceprint features of blacklisted users can be recorded in a feature database, and a user's voice can be compared against that database to determine whether the user is on the blacklist, which guides the bank's credit card staff in choosing a corresponding handling strategy.
In existing solutions, the blacklist feature database covers the whole country and all age groups, so it becomes very large. The inventor realized that comparison against such a huge blacklist feature database is very slow, which makes a fast response difficult to obtain.
Summary of the invention
This application provides a blacklist identification method, device, equipment and storage medium based on address information, which divide the blacklist feature database into smaller blacklist feature sub-databases and compare voiceprint features against the sub-database corresponding to the address information, improving the efficiency of voiceprint recognition.
The first aspect of the embodiments of the present application provides a blacklist identification method based on address information, including: acquiring a voice file of a target user, where the voice file includes audio and address information, and the address information includes incoming phone segment or/and Internet Protocol (IP) address information; performing feature extraction on the audio through a preset algorithm to obtain a feature file; determining whether the feature file is valid; if the feature file is invalid, generating an extraction failure status code, where the status code is used to indicate the reason for the extraction failure; if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, calling the preset blacklist model and the target address range to score the similarity of the feature file, and performing the corresponding operation according to the scoring result.
The second aspect of the embodiments of the present application provides a blacklist identification device based on address information, including: an obtaining unit, configured to obtain a voice file of a target user, where the voice file includes audio and address information; an extraction unit, configured to perform feature extraction on the audio through a preset algorithm to obtain a feature file; a judging unit, configured to judge whether the feature file is valid; a first generating unit, configured to generate, if the feature file is invalid, an extraction failure status code, where the status code is used to indicate the reason for the extraction failure; and a scoring unit, configured to determine, if the feature file is valid, the target address range to which the target user belongs according to the incoming phone segment or the Internet Protocol (IP) address information, call the preset blacklist model and the target address range to score the similarity of the feature file, and perform the corresponding operation according to the scoring result.
The third aspect of the embodiments of the present application provides a blacklist identification equipment based on address information, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above blacklist identification method based on address information when executing the computer program.
The fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the steps of the above blacklist identification method based on address information.
In the technical solution provided by the embodiments of the present application, a voice file of the target user is obtained, where the voice file includes audio and address information, and the address information includes incoming phone segment or/and Internet Protocol (IP) address information; feature extraction is performed on the audio through a preset algorithm to obtain a feature file; whether the feature file is valid is determined; if the feature file is invalid, an extraction failure status code is generated, which indicates the reason for the extraction failure; if the feature file is valid, the target address range to which the target user belongs is determined according to the incoming phone segment or IP information, the preset blacklist model and the target address range are called to score the similarity of the feature file, and the corresponding operation is performed according to the scoring result. In the embodiments of this application, the blacklist feature database is divided into smaller blacklist feature sub-databases, and voiceprint features are compared against the sub-database corresponding to the address information, which improves the efficiency of voiceprint recognition.
Description of the drawings
FIG. 1 is a schematic diagram of an embodiment of a blacklist identification method based on address information in an embodiment of the application;
FIG. 2 is a schematic diagram of another embodiment of a blacklist identification method based on address information in an embodiment of the application;
FIG. 3 is a schematic diagram of an embodiment of a blacklist identification device based on address information in an embodiment of the application;
FIG. 4 is a schematic diagram of another embodiment of a blacklist identification device based on address information in an embodiment of the application;
FIG. 5 is a schematic diagram of an embodiment of a blacklist identification equipment based on address information in an embodiment of the application.
Detailed description
This application provides a blacklist identification method, device, equipment and storage medium based on address information, which divide the blacklist feature database into smaller blacklist feature sub-databases and compare voiceprint features against the sub-database corresponding to the address information, improving the efficiency of voiceprint recognition.
To enable those skilled in the art to better understand the solution of the present application, the embodiments of the present application are described below in conjunction with the accompanying drawings.
The terms "first", "second", "third", "fourth", and so on (if any) in the description, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way can be interchanged where appropriate, so that the embodiments described herein can be implemented in an order other than the one illustrated or described. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, and may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
Referring to FIG. 1, a flowchart of a blacklist identification method based on address information provided by an embodiment of the present application, the method specifically includes:
101. Acquire a voice file of the target user. The voice file includes audio and address information, and the address information includes incoming phone segment or/and Internet Protocol (IP) address information.
The server obtains the voice file of the target user. The voice file includes audio and address information, and the address information includes incoming phone segment or/and IP information. Specifically, the server receives the voice file of the target user; the server parses the voice file to obtain the audio and address identifier of the target user; the server queries a preset table according to the address identifier to obtain the address information corresponding to the address identifier, where the address information includes incoming phone segment or/and IP information.
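The parse-and-lookup step can be sketched as below. The layout of the voice file (a dictionary with `audio` and `address_id` keys), the table contents, and all names are hypothetical; the application does not describe the file format or the preset table.

```python
# Hypothetical preset table: address identifier -> address information.
ADDRESS_TABLE = {
    "A01": {"phone_segment": "0755", "ip": None},
    "A02": {"phone_segment": None, "ip": "203.0.113.7"},
}


def parse_voice_file(voice_file, table=ADDRESS_TABLE):
    """Split a voice file into audio and address information.

    voice_file is assumed to be a dict with 'audio' and 'address_id' keys.
    """
    audio = voice_file["audio"]
    address_id = voice_file["address_id"]
    address_info = table.get(address_id)
    if address_info is None:
        raise KeyError(f"unknown address identifier: {address_id}")
    return audio, address_info


audio, address_info = parse_voice_file({"audio": b"\x00\x01", "address_id": "A01"})
```

The returned address information is what the later scoring step uses to choose a target address range.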
For example, when the server obtains the audio of the target user by phone or over the network, it also determines the specific address of the target user according to the incoming phone segment or Internet Protocol (IP) address information. A given service also maintains basic information about the target user (excluding sensitive information). For example, when the server obtains the target user's voice file over the network, the voice file includes, in addition to the audio and the address identifier, an identity identifier that indicates the basic information of the target user, where the basic information includes age, gender, and so on.
It is understandable that the execution subject of this application may be a blacklist identification device based on address information, or may be a terminal or a server, which is not specifically limited here. The embodiments of the present application take the server as the execution subject as an example for description.
102. Perform feature extraction on the audio through a preset algorithm to obtain a feature file.
The server performs feature extraction on the audio through a preset algorithm to obtain a feature file. Specifically, the server converts the audio from analog signal form to digital signal form; the server pre-emphasizes the audio in digital signal form; the server performs windowing on the pre-emphasized audio; the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number; the server maps the target complex number onto the Mel spectrum to obtain the logarithmic energy; the server converts the logarithmic energy to obtain the cepstral coefficients; the server calculates the energy and the differences according to the cepstral coefficients and generates the feature file.
It should be noted that, before performing feature extraction, the server needs to sample and quantize the collected audio, that is, convert the continuous audio waveform into discrete data points at a given sampling rate and bit depth. Since sounds in daily life are generally below 8 kHz, by the Nyquist sampling theorem a 16 kHz sampling rate is sufficient for the sampled data to contain most of the sound information. A 16 kHz rate means that 16,000 samples are taken per second. The samples are stored as amplitude values, which must be quantized to integers for efficient storage. A 16-bit sample can represent integer values between -32768 and 32767, so each sampled amplitude is quantized to the nearest integer value.
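The quantization step can be sketched as follows. This sketch assumes that the input samples are floating-point amplitudes normalized to [-1.0, 1.0]; the scaling factor and the clipping of out-of-range values are illustrative choices, and the function name is hypothetical.

```python
def quantize_16bit(samples):
    """Map float amplitudes in [-1.0, 1.0] to the nearest 16-bit integers.

    Values outside the range are clipped to [-32768, 32767].
    """
    quantized = []
    for s in samples:
        value = int(round(s * 32767))                 # scale, round to nearest integer
        quantized.append(max(-32768, min(32767, value)))
    return quantized


pcm = quantize_16bit([0.0, 1.0, -1.0, 2.0])
```

At a 16 kHz sampling rate, one second of audio produces 16,000 such integers.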
For example, in the spectrum of a sound signal the low-frequency part usually carries more energy than the high-frequency part: the spectral energy drops by 20 dB for every tenfold increase in frequency. Noise in the microphone circuit during capture also adds energy to the low-frequency part. To bring the high-frequency and low-frequency energy to similar magnitudes, the high-frequency energy of the captured sound needs to be boosted in advance, that is, the digital audio is pre-emphasized.
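The text does not specify the pre-emphasis filter. A common choice, assumed here purely for illustration, is the first-order high-pass filter y[n] = x[n] - a*x[n-1] with a close to 1 (0.97 is a typical default, not a value taken from this application).

```python
def pre_emphasize(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha is an assumed parameter; the source does not state a value.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out

# A constant (low-frequency) signal is attenuated,
# while a rapidly alternating (high-frequency) signal is boosted.
flat = pre_emphasize([1.0, 1.0, 1.0], alpha=0.5)
alternating = pre_emphasize([1.0, -1.0, 1.0], alpha=0.5)
```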
Over a sufficiently short interval, the pre-emphasized audio can be treated as stationary; this is the basis for windowing. A window is described by three parameters: window length (in milliseconds), offset, and shape. Each windowed segment of audio is called a frame, the number of milliseconds in a frame is the frame length, and the distance between the left edges of two adjacent frames is the frame shift. Extracting a frame from the audio signal s[n] can be written as y[n] = w[n]s[n]. If w[n] is a rectangular window, the signal is cut off abruptly at the boundaries, and these discontinuities affect the Fourier analysis. Therefore, when computing mel-frequency cepstral coefficients, windowing generally uses a Hamming window, which tapers smoothly toward zero at the edges, with the following expression:
Figure PCTCN2019117117-appb-000001
where L is the frame length.
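The equation image is not reproduced in this text. As a sketch, the code below uses one common textbook form of the Hamming window, w[n] = 0.54 - 0.46 cos(2πn/(L-1)) for 0 ≤ n ≤ L-1; references differ on whether the denominator is L or L-1, so the exact form in the figure may differ. The 25 ms frame length and 10 ms frame shift (400 and 160 samples at 16 kHz) are assumed values.

```python
import math

def hamming(L):
    """Hamming window of length L (assumed form with denominator L - 1)."""
    if L == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def frames(signal, frame_len, frame_shift):
    """Split the signal into overlapping frames, y[n] = w[n] * s[n]."""
    w = hamming(frame_len)
    out = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        out.append([w[i] * signal[start + i] for i in range(frame_len)])
    return out

# One second of audio at 16 kHz with 25 ms frames and a 10 ms frame shift.
window = hamming(400)
framed = frames([1.0] * 16000, frame_len=400, frame_shift=160)
```

Note that the Hamming window tapers to 0.08 rather than exactly 0 at its edges.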
It can be understood that the process in which the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number specifically includes: the server obtains the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0; the server invokes a first preset formula to generate the target complex number X[k], the first preset formula being:
Figure PCTCN2019117117-appb-000002
where N is a power of 2, k is an integer, and X[k] represents the magnitude and phase of one frequency component of the windowed audio signal.
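The first preset formula is only available as an image. Assuming it is the standard discrete Fourier transform, X[k] = Σ x[n]·e^(-j2πkn/N) summed over n = 0..N-1, a direct sketch looks like this. The direct sum is O(N²); choosing N as a power of 2 is what allows a fast Fourier transform to be used in practice.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

# One cycle of a cosine sampled at N = 4 points.
X = dft([1.0, 0.0, -1.0, 0.0])
magnitudes = [abs(v) for v in X]       # magnitude of each frequency component
phases = [cmath.phase(v) for v in X]   # phase of each frequency component
```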
The process in which the server maps the target complex number onto the mel spectrum to obtain the log energy specifically includes:
The server smooths the target complex number with a preset filter bank; the server maps the smoothed complex number to the mel scale of the mel spectrum, where a mel is a unit of pitch; the server uses a second preset formula to map the smoothed complex number onto the mel scale to obtain a target scale value, the second preset formula being:
Figure PCTCN2019117117-appb-000003
The server computes the log energy of the target scale value according to a third preset formula, the third preset formula being:
Figure PCTCN2019117117-appb-000004
where H_m(k) is the frequency response of the filter bank and M is the number of filters in the preset filter bank. It should be noted that human response to sound pressure is roughly logarithmic: people are less sensitive to small changes at high sound pressure than at low sound pressure. In addition, taking the logarithm makes the extracted features less sensitive to variations in input energy, since the distance between the speaker and the microphone varies and so does the captured sound energy. Human hearing is not equally sensitive across frequency bands; it is less sensitive to high frequencies than to low frequencies, with the dividing line at about 1000 Hz. Modeling this property when extracting sound features can improve recognition performance. The filter bank is a set of triangular filters on the mel scale: the 10 filters below 1000 Hz are linearly spaced, and the remaining filters above 1000 Hz are logarithmically spaced.
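The second and third preset formulas are only available as images. The sketch below assumes the widely used mapping mel(f) = 1127 ln(1 + f/700) and a log filter-bank energy of the form ln(Σ |X[k]|²·H_m(k)) for each of the M filters; both are assumptions, not formulas confirmed by this text. For simplicity the filters are spaced uniformly in mel, which only approximates the "linear below 1000 Hz, logarithmic above" layout described above. All function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Assumed mel mapping: mel(f) = 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)

def triangular_filterbank(num_filters, n_fft, sample_rate):
    """Return num_filters rows of n_fft // 2 + 1 weights H_m[k]."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft  # center frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_energies(power_spectrum, bank, floor=1e-10):
    """Log of the filter-weighted power in each band (floored to avoid log 0)."""
    return [
        math.log(max(floor, sum(p * h for p, h in zip(power_spectrum, row))))
        for row in bank
    ]

bank = triangular_filterbank(num_filters=26, n_fft=512, sample_rate=16000)
energies = log_energies([1.0] * 257, bank)
```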
The process in which the server computes the energy and the differences from the cepstral coefficients to generate the feature file specifically includes:
Specifically, the energy of a frame is defined as the sum of the squares of the sample points in that frame. For a windowed signal x, the energy from sample point t1 to sample point t2 is:
Figure PCTCN2019117117-appb-000005
The features extracted above are computed for each frame in isolation and are therefore static, whereas real sound is continuous and adjacent frames are related. Additional features are needed to represent this dynamic change between frames. This is usually done by computing the first-order difference, or even the second-order difference, of the 13 features of each frame (12 cepstral features plus 1 energy). A simple way to compute the difference is to take the difference between the 13 features of the frame before and the frame after the current frame:
Figure PCTCN2019117117-appb-000006
If the second-order difference is not considered, the final mel-frequency cepstral coefficient feature of each frame has 26 dimensions: 12 cepstral coefficients, 12 cepstral coefficient differences, 1 energy, and 1 energy difference.
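A minimal sketch of the energy and first-order difference computation follows. The energy is the sum of squares stated above. The difference is taken as (c[t+1] - c[t-1]) / 2; the division by 2 and the handling of the first and last frames (reusing the nearest frame) are assumptions, since the equation image is not reproduced here.

```python
def frame_energy(x, t1, t2):
    """Energy of windowed signal x from sample t1 to sample t2 inclusive."""
    return sum(x[t] ** 2 for t in range(t1, t2 + 1))

def deltas(features):
    """First-order differences of per-frame feature vectors.

    Assumed form: d[t] = (c[t+1] - c[t-1]) / 2, with edge frames
    reusing their nearest neighbor.
    """
    T = len(features)
    out = []
    for t in range(T):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, T - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def with_deltas(features):
    """Append each frame's differences to its static features."""
    return [f + d for f, d in zip(features, deltas(features))]

# Three frames of 13 static features each become three 26-dimensional frames.
static = [[float(t)] * 13 for t in range(3)]
full = with_deltas(static)
```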
103. Determine whether the feature file is valid.
The server determines whether the feature file is valid. Specifically, the server determines whether the format of the feature file meets a preset quality requirement. If the format does not meet the preset quality requirement, the server determines that the feature file is invalid. If the format meets the preset quality requirement, the server determines whether the feature file contains the voices of multiple users: if it does not, the server determines that the feature file is valid; if it does, the server determines that the feature file is invalid.
104. If the feature file is invalid, generate an extraction-failure status code, the status code indicating the reason for the extraction failure.
If the feature file is invalid, the server generates an extraction-failure status code that indicates the reason for the failure, that is, the status code briefly reports why extraction did not succeed, such as poor voice quality or multiple people speaking. Different failure reasons correspond to different status codes. For example, if the format of the feature file does not meet the preset quality requirement, the server determines that the feature file is invalid and generates a first extraction-failure status code; if the feature file contains the voices of multiple users, the server determines that the feature file is invalid and generates a second extraction-failure status code, where the first status code differs from the second status code.
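The validity check and status codes described in steps 103 and 104 can be sketched as follows. The enum names and numeric values are hypothetical; the text only requires that the two failure reasons map to different codes. How the quality check and speaker count are obtained is outside this sketch.

```python
from enum import Enum

class ExtractionStatus(Enum):
    # Names and values are illustrative assumptions.
    OK = 0
    POOR_QUALITY = 1        # "first status code": format fails the quality requirement
    MULTIPLE_SPEAKERS = 2   # "second status code": more than one voice detected

def validate(meets_quality, speaker_count):
    """Apply the two checks in the order the text describes."""
    if not meets_quality:
        return ExtractionStatus.POOR_QUALITY
    if speaker_count > 1:
        return ExtractionStatus.MULTIPLE_SPEAKERS
    return ExtractionStatus.OK

status = validate(meets_quality=True, speaker_count=2)
```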
105. If the feature file is valid, determine the target address range to which the target user belongs according to the incoming telephone number segment or the IP information, invoke the preset blacklist model and the target address range to score the similarity of the feature file, and perform a corresponding operation according to the scoring result.
If the feature file is valid, the server determines the target address range to which the target user belongs according to the incoming telephone number segment or the IP information, invokes the preset blacklist model and the target address range to score the similarity of the feature file, and performs a corresponding operation according to the scoring result. Specifically, if the feature file is valid, the server determines the target address range to which the target user belongs based on the incoming telephone number segment or the IP information; the server selects the corresponding target blacklist model from the preset blacklist models according to the target address range, each preset blacklist model corresponding to a different blacklist feature sub-library; and the server scores the similarity of the feature file using the target blacklist model to obtain a target score. If the target score is greater than or equal to a first threshold, the server determines that the target user is in the blacklist feature sub-library corresponding to the target blacklist model and returns a first prompt message, which indicates that the target user is barred from normal service. If the target score is less than the first threshold, the server determines that the target user is not in the blacklist feature sub-library corresponding to the target blacklist model and returns a second prompt message, which indicates that the target user receives normal service.
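The selection and thresholding logic of step 105 can be sketched as follows. The text does not say how similarity is computed, so cosine similarity over fixed-length voiceprint vectors is used here only as a stand-in; the threshold of 0.8, the region keys, and the returned message strings are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Stand-in similarity measure (an assumption, not the source's model)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def check_blacklist(feature, address_range, sub_libraries, threshold=0.8):
    """Score the feature against only the sub-library for its address range."""
    candidates = sub_libraries.get(address_range, [])
    score = max((cosine_similarity(feature, v) for v in candidates), default=0.0)
    if score >= threshold:
        return score, "DENY_SERVICE"      # first prompt message
    return score, "NORMAL_SERVICE"        # second prompt message

# Each address range has its own sub-library of blacklisted voiceprints.
sub_libraries = {"east": [[1.0, 0.0]], "north": [[0.0, 1.0]]}
hit = check_blacklist([1.0, 0.0], "east", sub_libraries)
miss = check_blacklist([1.0, 0.0], "north", sub_libraries)
```

Only the sub-library for the caller's address range is searched, which is the source of the efficiency gain the application claims.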
It should be noted that different addresses correspond to different target blacklist models, and the corresponding blacklist feature sub-libraries also differ. A user who is not in the blacklist feature sub-library is treated as a normal user and receives normal service; if the service does not go smoothly, the user may also be registered in the blacklist feature sub-library. A user who is in the blacklist feature sub-library is treated as a blacklisted user and is denied normal service (for example, a loan application is not processed).
The target user's total score is obtained by comparing the voiceprint features extracted from the speech (the feature file) with the voiceprint features in the blacklist feature sub-library and then scoring in combination with the address information. The score here measures the similarity between voiceprint features. A threshold is usually set on the basis of model training; when the score is above the threshold, the two voiceprint features are considered close, and the comparison is treated as a match.
It can be understood that the present application is mainly applicable to bank loan services. Whether a user is added to the blacklist is decided according to the user's credit rating, and the user's voiceprint features are registered in a blacklist feature sub-library according to the user's region, age, and gender. Once a user's voiceprint features have been registered in a blacklist feature sub-library, a later call from that person can be identified as belonging to the blacklist on the basis of the voiceprint features, and the loan application may therefore be declined.
In this embodiment of the present application, the blacklist feature library is divided into finer-grained blacklist feature sub-libraries, and voiceprint features are compared against the sub-library corresponding to the address information, which improves the efficiency of voiceprint recognition.
Referring to FIG. 2, another flowchart of the address information-based blacklist identification method provided by an embodiment of the present application specifically includes:
201. Generate a preset blacklist model, the preset blacklist model being used for blacklist registration.
The server generates a preset blacklist model, which is used for blacklist registration. Specifically, the server registers the blacklist into sub-libraries: it learns key information such as the user's age, region, and gender from (non-sensitive) customer information, the telephone number segment, the network IP, and the like, and extracts voiceprint features from the user's call audio; it then saves the voiceprint features to the corresponding library according to the registered user's basic information. This is sub-library registration of the blacklist. Sub-library registration stores only the blacklist feature sub-libraries of the finest granularity, for example (East China, male, over 50), while coarser libraries are synthesized from the finest ones; for example, (East China, male) can be synthesized from the two sub-libraries (East China, male, over 50) and (East China, male, under 50). With this scheme, matching can be performed directly in the corresponding small library according to the user's region, gender, and age. If one of these three elements cannot be determined, the libraries to be synthesized are identified from the known elements, and matching is performed across those blacklist feature sub-libraries. In practice, this scheme avoids the drawback of matching against one huge blacklist library in a single pass, and it also fits the actual business scenario of bank credit cards well.
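The sub-library registration scheme can be sketched with a dictionary keyed by the finest-grained tuple (region, gender, age band). Coarser libraries are not stored; they are synthesized on demand as the union of the matching finest-grained sub-libraries. The class name, the dimension values, and the one-element voiceprint vectors are hypothetical placeholders.

```python
from itertools import product

# Illustrative dimension values; a real deployment would define its own.
REGIONS = ("east", "north")
GENDERS = ("male", "female")
AGE_BANDS = ("under_50", "over_50")

class BlacklistStore:
    def __init__(self):
        # Only the finest-grained sub-libraries are stored.
        self._subs = {}

    def register(self, region, gender, age_band, voiceprint):
        """Save a voiceprint into its finest-grained sub-library."""
        self._subs.setdefault((region, gender, age_band), []).append(voiceprint)

    def candidates(self, region=None, gender=None, age_band=None):
        """Return voiceprints to match against.

        A dimension passed as None is unknown, so the result is the
        union over all values of that dimension.
        """
        out = []
        for key in product(
            (region,) if region else REGIONS,
            (gender,) if gender else GENDERS,
            (age_band,) if age_band else AGE_BANDS,
        ):
            out.extend(self._subs.get(key, []))
        return out

store = BlacklistStore()
store.register("east", "male", "over_50", [0.1])
store.register("east", "male", "under_50", [0.2])
store.register("north", "female", "under_50", [0.3])

exact = store.candidates("east", "male", "over_50")   # single sub-library
east_male = store.candidates("east", "male")          # age unknown: two sub-libraries
```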
It should be noted that, in addition to dimensions such as region, age group, and gender, other dimensions can also be obtained, such as the customer's occupation or the customer's ID in the system. However, sensitive customer information should not be used as a dimension.
It can be understood that the execution subject of the present application may be an address information-based blacklist identification apparatus, or may be a terminal or a server, which is not specifically limited here. The embodiments of the present application are described using a server as the execution subject.
202. Obtain a voice file of the target user, the voice file including audio and address information, and the address information including the incoming telephone number segment and/or Internet Protocol (IP) address information.
The server obtains the voice file of the target user. The voice file includes audio and address information, and the address information includes the incoming telephone number segment and/or IP information. Specifically, the server receives the voice file of the target user; the server parses the voice file to obtain the target user's audio and an address identifier; and the server looks up the address identifier in a preset table to obtain the corresponding address information, which includes the incoming telephone number segment and/or IP information.
For example, when the server obtains the target user's audio over the telephone or the network, it also determines the target user's specific address from the incoming telephone number segment or the Internet Protocol (IP) address information. For a given service, basic information about the target user (excluding sensitive information) is also maintained. For example, when the server obtains the target user's voice file over the network, the voice file includes, in addition to the audio and the address identifier, an identity identifier that points to the target user's basic information, where the basic information includes age, gender, and so on.
203. Perform feature extraction on the audio using a preset algorithm to obtain a feature file.
The server performs feature extraction on the audio using a preset algorithm to obtain a feature file. Specifically, the server converts the audio from an analog signal into a digital signal; pre-emphasizes the digital audio; applies windowing to the pre-emphasized audio; performs a discrete Fourier transform on the windowed audio to obtain a target complex number; maps the target complex number onto the mel spectrum to obtain the log energy; transforms the log energy to obtain cepstral coefficients; and computes the energy and the differences from the cepstral coefficients to generate the feature file.
It should be noted that, before feature extraction, the server needs to sample and quantize the captured audio, that is, convert the continuous audio waveform into discrete data points at a given sampling rate and bit depth. Since everyday sounds generally lie below 8 kHz, by the Nyquist theorem a 16 kHz sampling rate is sufficient for the sampled data to contain most of the sound information. A rate of 16 kHz means that 16,000 samples are taken per second. Each sample is stored as an amplitude value, and to store these values efficiently they are quantized to integers. A 16-bit sample can represent integers from -32768 to 32767, so each sampled amplitude is quantized to the nearest integer value.
For example, in the spectrum of a sound signal the low-frequency part usually carries more energy than the high-frequency part: the spectral energy drops by 20 dB for every tenfold increase in frequency. Noise in the microphone circuit during capture also adds energy to the low-frequency part. To bring the high-frequency and low-frequency energy to similar magnitudes, the high-frequency energy of the captured sound needs to be boosted in advance, that is, the digital audio is pre-emphasized.
Over a sufficiently short interval, the pre-emphasized audio can be treated as stationary; this is the basis for windowing. A window is described by three parameters: window length (in milliseconds), offset, and shape. Each windowed segment of audio is called a frame, the number of milliseconds in a frame is the frame length, and the distance between the left edges of two adjacent frames is the frame shift. Extracting a frame from the audio signal s[n] can be written as y[n] = w[n]s[n]. If w[n] is a rectangular window, the signal is cut off abruptly at the boundaries, and these discontinuities affect the Fourier analysis. Therefore, when computing mel-frequency cepstral coefficients, windowing generally uses a Hamming window, which tapers smoothly toward zero at the edges, with the following expression:
Figure PCTCN2019117117-appb-000007
where L is the frame length.
It can be understood that the process in which the server performs a discrete Fourier transform on the windowed audio to obtain the target complex number specifically includes: the server obtains the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0; the server invokes a first preset formula to generate the target complex number X[k], the first preset formula being:
Figure PCTCN2019117117-appb-000008
where N is a power of 2, k is an integer, and X[k] represents the magnitude and phase of one frequency component of the windowed audio signal.
The process in which the server maps the target complex number onto the mel spectrum to obtain the log energy specifically includes:
The server smooths the target complex number with a preset filter bank; the server maps the smoothed complex number to the mel scale of the mel spectrum, where a mel is a unit of pitch; the server uses a second preset formula to map the smoothed complex number onto the mel scale to obtain a target scale value, the second preset formula being:
Figure PCTCN2019117117-appb-000009
The server computes the log energy of the target scale value according to a third preset formula, the third preset formula being:
Figure PCTCN2019117117-appb-000010
where H_m(k) is the frequency response of the filter bank and M is the number of filters in the preset filter bank. It should be noted that human response to sound pressure is roughly logarithmic: people are less sensitive to small changes at high sound pressure than at low sound pressure. In addition, taking the logarithm makes the extracted features less sensitive to variations in input energy, since the distance between the speaker and the microphone varies and so does the captured sound energy. Human hearing is not equally sensitive across frequency bands; it is less sensitive to high frequencies than to low frequencies, with the dividing line at about 1000 Hz. Modeling this property when extracting sound features can improve recognition performance. The filter bank is a set of triangular filters on the mel scale: the 10 filters below 1000 Hz are linearly spaced, and the remaining filters above 1000 Hz are logarithmically spaced.
The process in which the server computes the energy and the differences from the cepstral coefficients to generate the feature file specifically includes:
Specifically, the energy of a frame is defined as the sum of the squares of the sample points in that frame. For a windowed signal x, the energy from sample point t1 to sample point t2 is:
Figure PCTCN2019117117-appb-000011
The features extracted above are computed for each frame in isolation and are therefore static, whereas real sound is continuous and adjacent frames are related. Additional features are needed to represent this dynamic change between frames. This is usually done by computing the first-order difference, or even the second-order difference, of the 13 features of each frame (12 cepstral features plus 1 energy). A simple way to compute the difference is to take the difference between the 13 features of the frame before and the frame after the current frame:
Figure PCTCN2019117117-appb-000012
If the second-order difference is not considered, the final mel-frequency cepstral coefficient feature of each frame has 26 dimensions: 12 cepstral coefficients, 12 cepstral coefficient differences, 1 energy, and 1 energy difference.
204. Determine whether the feature file is valid.
The server determines whether the feature file is valid. Specifically, the server determines whether the format of the feature file meets a preset quality requirement. If the format does not meet the preset quality requirement, the server determines that the feature file is invalid. If the format meets the preset quality requirement, the server determines whether the feature file contains the voices of multiple users: if it does not, the server determines that the feature file is valid; if it does, the server determines that the feature file is invalid.
205. If the feature file is invalid, generate an extraction-failure status code, the status code indicating the reason for the extraction failure.
If the feature file is invalid, the server generates an extraction-failure status code that indicates the reason for the failure, that is, the status code briefly reports why extraction did not succeed, such as poor voice quality or multiple people speaking. Different failure reasons correspond to different status codes. For example, if the format of the feature file does not meet the preset quality requirement, the server determines that the feature file is invalid and generates a first extraction-failure status code; if the feature file contains the voices of multiple users, the server determines that the feature file is invalid and generates a second extraction-failure status code, where the first status code differs from the second status code.
206. If the feature file is valid, determine the target address range to which the target user belongs according to the incoming telephone number segment or the IP information, invoke the preset blacklist model and the target address range to score the similarity of the feature file, and perform a corresponding operation according to the scoring result.
If the feature file is valid, the server determines the target address range to which the target user belongs according to the incoming telephone number segment or the IP information, invokes the preset blacklist model and the target address range to score the similarity of the feature file, and performs a corresponding operation according to the scoring result. Specifically, if the feature file is valid, the server determines the target address range to which the target user belongs based on the incoming telephone number segment or the IP information; the server selects the corresponding target blacklist model from the preset blacklist models according to the target address range, each preset blacklist model corresponding to a different blacklist feature sub-library; and the server scores the similarity of the feature file using the target blacklist model to obtain a target score. If the target score is greater than or equal to a first threshold, the server determines that the target user is in the blacklist feature sub-library corresponding to the target blacklist model and returns a first prompt message, which indicates that the target user is barred from normal service. If the target score is less than the first threshold, the server determines that the target user is not in the blacklist feature sub-library corresponding to the target blacklist model and returns a second prompt message, which indicates that the target user receives normal service.
It should be noted that different addresses correspond to different target blacklist models, and the corresponding blacklist feature sub-libraries also differ. A user who is not in the blacklist feature sub-library is treated as a normal user and receives normal service; if the service does not go smoothly, the user may also be registered in the blacklist feature sub-library. A user who is in the blacklist feature sub-library is treated as a blacklisted user and is denied normal service (for example, a loan application is not processed).
To obtain the target user's overall score, the voiceprint features extracted from the speech (the feature file) are compared with the voiceprint features in the blacklist feature sub-database, and the score is then given in combination with the address information. The score here measures the similarity of the voiceprint features. A threshold is usually obtained from model training; when the score is higher than the threshold, the two voiceprint features are close and can be regarded as a match.
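As a rough illustration of similarity scoring against one sub-database, the sketch below uses cosine similarity and takes the best score over all enrolled voiceprints. The source does not name the similarity measure, so cosine similarity, the function names, and the dictionary layout of the sub-database are assumptions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_score(probe: Sequence[float],
               sub_database: Dict[str, Sequence[float]]) -> float:
    """Highest similarity between the probe and any voiceprint in the sub-database."""
    return max((cosine_similarity(probe, ref) for ref in sub_database.values()),
               default=0.0)


# Hypothetical enrolled voiceprints for one address range.
sub_db = {"user_a": [1.0, 0.0], "user_b": [0.0, 1.0]}
score = best_score([1.0, 0.0], sub_db)
```

The resulting score would then be compared with the trained threshold to decide whether the probe matches a blacklisted voiceprint.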
It can be understood that this application is mainly applicable to bank loan business: whether to blacklist a user is decided according to the user's credit rating, and the user's voiceprint features are registered in a blacklist feature sub-database according to the user's region, age, and gender. Once a user's voiceprint features have been registered in a blacklist feature sub-database, a later incoming call from that person can be identified as belonging to the blacklist on the basis of the voiceprint features, and the loan business can therefore be declined.
In the embodiments of this application, the blacklist feature database is divided into smaller blacklist feature sub-databases, and voiceprint features are compared against the sub-database corresponding to the address information, which improves the efficiency of voiceprint recognition.
The blacklist identification method based on address information in the embodiments of this application has been described above; the blacklist identification apparatus based on address information is described below. Referring to FIG. 3, one embodiment of the blacklist identification apparatus based on address information in the embodiments of this application includes:
an acquiring unit 301, configured to acquire a voice file of a target user, the voice file including audio and address information, and the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
an extraction unit 302, configured to perform feature extraction on the audio by a preset algorithm to obtain a feature file;
a judging unit 303, configured to determine whether the feature file is valid;
a first generating unit 304, configured to generate, if the feature file is invalid, a status code for the extraction failure, the status code indicating the reason for the extraction failure;
a scoring unit 305, configured to, if the feature file is valid, determine the target address range to which the target user belongs according to the incoming phone segment or the IP information, use the preset blacklist model and the target address range to score the similarity of the feature file, and perform a corresponding operation according to the scoring result.
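The validity check and failure status code handled by the judging unit and the first generating unit could look roughly like the sketch below. The validity criteria and the status code values are hypothetical; this section says only that a status code indicates the reason for the extraction failure.

```python
import math
from enum import IntEnum
from typing import Sequence


class ExtractStatus(IntEnum):
    """Hypothetical status codes; the source does not list concrete values."""
    OK = 0
    EMPTY_FEATURE = 1      # no frames, or a frame without coefficients
    NON_FINITE_VALUE = 2   # NaN or infinity found in the coefficients


def check_feature_file(frames: Sequence[Sequence[float]]) -> ExtractStatus:
    """Return OK if the feature file looks usable, else a code naming the reason."""
    if not frames or any(len(frame) == 0 for frame in frames):
        return ExtractStatus.EMPTY_FEATURE
    if any(not math.isfinite(v) for frame in frames for v in frame):
        return ExtractStatus.NON_FINITE_VALUE
    return ExtractStatus.OK


status = check_feature_file([[0.1, 0.2], [0.3, float("nan")]])
```

Only a feature file that returns `ExtractStatus.OK` would be passed on to the scoring unit; any other code would be reported as the reason for the failure.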
In the embodiments of this application, the blacklist feature database is divided into smaller blacklist feature sub-databases, and voiceprint features are compared against the sub-database corresponding to the address information, which improves the efficiency of voiceprint recognition.
Referring to FIG. 4, another embodiment of the blacklist identification apparatus based on address information in the embodiments of this application includes:
an acquiring unit 301, configured to acquire a voice file of a target user, the voice file including audio and address information, and the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
an extraction unit 302, configured to perform feature extraction on the audio by a preset algorithm to obtain a feature file;
a judging unit 303, configured to determine whether the feature file is valid;
a first generating unit 304, configured to generate, if the feature file is invalid, a status code for the extraction failure, the status code indicating the reason for the extraction failure;
a scoring unit 305, configured to, if the feature file is valid, determine the target address range to which the target user belongs according to the incoming phone segment or the IP information, use the preset blacklist model and the target address range to score the similarity of the feature file, and perform a corresponding operation according to the scoring result.
Optionally, the scoring unit 305 is specifically configured to:
if the feature file is valid, determine the target address range to which the target user belongs based on the incoming phone segment or the IP information; select, from the preset blacklist models and according to the target address range, the corresponding target blacklist model, each preset blacklist model corresponding to a different blacklist feature sub-database; score the similarity of the feature file with the target blacklist model to obtain a target score; if the target score is greater than or equal to a first threshold, determine that the target user is in the blacklist feature sub-database corresponding to the target blacklist model and return a first prompt message, the first prompt message indicating that the target user is barred from normal service; if the target score is less than the first threshold, determine that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model and return a second prompt message, the second prompt message indicating that the target user receives normal service.
Optionally, the acquiring unit 301 is specifically configured to:
receive the voice file of the target user; parse the voice file to obtain the audio and the address identifier of the target user; and query a preset table according to the address identifier to obtain the address information corresponding to the address identifier, the address information including an incoming phone segment and/or IP information.
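The acquiring step (parse the voice file, then look up the address identifier in a preset table) might be sketched as below. The in-memory dictionary used as the "voice file", the field names, the table contents, and the sample identifier are all hypothetical; the source does not describe the file format or the table layout.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class AddressInfo(NamedTuple):
    phone_segment: str   # incoming phone segment
    ip: str              # IP information


# Hypothetical preset table mapping an address identifier to address information.
PRESET_TABLE: Dict[str, AddressInfo] = {
    "A001": AddressInfo(phone_segment="0755", ip="203.0.113.7"),
}


def parse_voice_file(voice_file: dict) -> Tuple[bytes, Optional[AddressInfo]]:
    """Split a received voice file into audio and the looked-up address information."""
    audio = voice_file["audio"]
    address_id = voice_file["address_id"]
    # None signals that the identifier is not present in the preset table.
    return audio, PRESET_TABLE.get(address_id)


audio, info = parse_voice_file({"audio": b"\x00\x01", "address_id": "A001"})
```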
Optionally, the extraction unit 302 includes:
a first conversion module 3021, configured to convert the audio from analog signal form to digital signal form;
a pre-emphasis module 3022, configured to pre-emphasize the audio in digital signal form;
a windowing module 3023, configured to apply windowing to the pre-emphasized audio;
a transform module 3024, configured to perform a discrete Fourier transform on the windowed audio to obtain a target complex number;
a mapping module 3025, configured to map the target complex number onto the Mel spectrum to obtain logarithmic energy;
a second conversion module 3026, configured to convert the logarithmic energy to obtain cepstral coefficients;
a calculation module 3027, configured to calculate the energy and the differences from the cepstral coefficients to generate the feature file.
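Three of the steps listed above (pre-emphasis, windowing, and the difference calculation) can be sketched in plain Python. The pre-emphasis coefficient 0.97, the Hamming window, and the simple frame-to-frame difference are common choices assumed here for illustration; this section does not state which coefficient, window, or difference formula the embodiment uses.

```python
import math
from typing import List


def pre_emphasis(signal: List[float], alpha: float = 0.97) -> List[float]:
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming_window(frame: List[float]) -> List[float]:
    """Taper one frame with a Hamming window to reduce spectral leakage."""
    n_len = len(frame)
    if n_len == 1:
        return list(frame)
    return [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1)))
            for n, s in enumerate(frame)]


def deltas(cepstra: List[List[float]]) -> List[List[float]]:
    """First-order difference between consecutive frames (the first frame gets zeros)."""
    if not cepstra:
        return []
    out = [[0.0] * len(cepstra[0])]
    for prev, cur in zip(cepstra, cepstra[1:]):
        out.append([c - p for c, p in zip(cur, prev)])
    return out


emphasized = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
windowed = hamming_window([1.0, 1.0, 1.0])
diffs = deltas([[1.0, 2.0], [3.0, 5.0]])
```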
Optionally, the transform module 3024 is specifically configured to:
obtain the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0; and call a first preset formula to generate the target complex number X[k], the first preset formula being:
Figure PCTCN2019117117-appb-000013
where N is a power of 2, k is an integer, and X[k] represents the amplitude and phase of a frequency component of the windowed audio signal.
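The formula image did not survive extraction. Assuming it is the standard discrete Fourier transform, X[k] = sum over n from 0 to N-1 of x[n]·exp(-j·2π·k·n/N), the transform can be sketched as a direct sum. The function name and the test signal are illustrative only; a production system would normally use an FFT, which is where the power-of-2 requirement on N usually comes from.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Direct O(N^2) discrete Fourier transform of a real-valued frame."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n in range(n_len))
            for k in range(n_len)]


# One cycle of a cosine sampled at N = 4 points.
spectrum = dft([1.0, 0.0, -1.0, 0.0])
magnitudes = [abs(c) for c in spectrum]        # amplitude of each frequency component
phases = [cmath.phase(c) for c in spectrum]    # phase of each frequency component
```

Each X[k] is complex, which is how it carries both the amplitude (its modulus) and the phase (its argument) of one frequency component.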
Optionally, the mapping module 3025 is specifically configured to:
smooth the target complex number through a preset filter bank; associate the smoothed complex number with the mel scale of the Mel spectrum, one mel being a unit of pitch; and map the smoothed complex number onto the mel scale by a second preset formula to obtain a target scale value, the second preset formula being:
Figure PCTCN2019117117-appb-000014
calculate the logarithmic energy of the target scale value according to a third preset formula, the third preset formula being:
Figure PCTCN2019117117-appb-000015
where H_m(k) is the frequency response of the filter bank, and M is the number of filters in the preset filter bank.
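Neither formula image survived extraction. The sketch below assumes the forms commonly used in MFCC extraction: mel(f) = 2595·log10(1 + f/700) for the mel mapping, and s(m) = ln(sum over k of |X[k]|²·H_m(k)) for the logarithmic energy of filter m, with 0 ≤ m < M. These forms, the function names, and the small floor that guards against log(0) are assumptions and may differ from the applicant's figures.

```python
import math
from typing import List, Sequence


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (assumed 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def log_filterbank_energies(power_spectrum: Sequence[float],
                            filter_bank: Sequence[Sequence[float]],
                            floor: float = 1e-12) -> List[float]:
    """ln of the power spectrum weighted by each filter response H_m(k)."""
    energies = []
    for h_m in filter_bank:                       # one row per filter, M rows in total
        energy = sum(p * h for p, h in zip(power_spectrum, h_m))
        energies.append(math.log(max(energy, floor)))
    return energies


# Two toy filters over a two-bin power spectrum |X[k]|^2.
log_energies = log_filterbank_energies([1.0, 1.0], [[1.0, 0.0], [0.5, 0.5]])
```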
Optionally, the blacklist identification apparatus based on address information further includes:
a second generating unit 306, configured to generate the preset blacklist model, the preset blacklist model being used for blacklist registration.
In the embodiments of this application, the blacklist feature database is divided into smaller blacklist feature sub-databases, and voiceprint features are compared against the sub-database corresponding to the address information, which improves the efficiency of voiceprint recognition.
FIG. 3 and FIG. 4 above describe the blacklist identification apparatus based on address information in the embodiments of this application in detail from the perspective of modular functional entities. The blacklist identification device based on address information in the embodiments of this application is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a blacklist identification device based on address information provided by an embodiment of this application. The blacklist identification device 500 based on address information may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may provide transient or persistent storage. The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the blacklist identification device based on address information. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the blacklist identification device 500 based on address information, the series of instruction operations in the storage medium 508.
The blacklist identification device 500 based on address information may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on. Those skilled in the art will understand that the structure shown in FIG. 5 does not limit the blacklist identification device based on address information, which may include more or fewer components than shown, combine certain components, or arrange the components differently. The processor 501 can perform the functions of the acquiring unit 301, the extraction unit 302, the judging unit 303, the first generating unit 304, the scoring unit 305, and the second generating unit 306 in the foregoing embodiments.
The components of the blacklist identification device based on address information are described in detail below with reference to FIG. 5:
The processor 501 is the control center of the blacklist identification device based on address information and can perform processing according to the configured blacklist identification method based on address information. The processor 501 connects the various parts of the device through various interfaces and lines and, by running or executing the software programs and/or modules stored in the memory 509 and calling the data stored in the memory 509, performs the various functions of the device: the blacklist feature database is divided into smaller blacklist feature sub-databases, and voiceprint features are compared against the sub-database corresponding to the address information, which improves the efficiency of voiceprint recognition. The storage medium 508 and the memory 509 are both carriers for storing data. In the embodiments of this application, the storage medium 508 may be an internal memory with a small capacity but high speed, and the memory 509 may be an external memory with a large capacity but low speed.
The memory 509 may be used to store software programs and modules; the processor 501 performs the various functional applications and data processing of the blacklist identification device 500 based on address information by running the software programs and modules stored in the memory 509. The memory 509 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application program required by at least one function (for example, performing feature extraction on the audio by a preset algorithm to obtain a feature file); the data storage area may store data created through use of the blacklist identification device based on address information (for example, the status code for an extraction failure). In addition, the memory 509 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. The program of the blacklist identification method based on address information provided in the embodiments of this application and the received data stream are stored in the memory, and the processor 501 calls them from the memory 509 when they are needed.
This application also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to perform the following steps of the blacklist identification method based on address information:
acquiring a voice file of a target user, the voice file including audio and address information, and the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
performing feature extraction on the audio by a preset algorithm to obtain a feature file;
determining whether the feature file is valid;
if the feature file is invalid, generating a status code for the extraction failure, the status code indicating the reason for the extraction failure;
if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, using the preset blacklist model and the target address range to score the similarity of the feature file, and performing a corresponding operation according to the scoring result.
When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or twisted pair) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium on which a computer can store data, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state disk (SSD)).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

Claims (20)

  1. A blacklist identification method based on address information, comprising:
    acquiring a voice file of a target user, the voice file including audio and address information, the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
    performing feature extraction on the audio by a preset algorithm to obtain a feature file;
    determining whether the feature file is valid;
    if the feature file is invalid, generating a status code for the extraction failure, the status code indicating the reason for the extraction failure;
    if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, using a preset blacklist model and the target address range to score the similarity of the feature file, and performing a corresponding operation according to the scoring result.
  2. The blacklist identification method based on address information according to claim 1, wherein, if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the Internet Protocol (IP) address information, using the preset blacklist model and the target address range to score the similarity of the feature file, and performing a corresponding operation according to the scoring result comprises:
    if the feature file is valid, determining the target address range to which the target user belongs based on the incoming phone segment or the IP information;
    selecting, from the preset blacklist models and according to the target address range, the corresponding target blacklist model, each preset blacklist model corresponding to a different blacklist feature sub-database;
    scoring the similarity of the feature file with the target blacklist model to obtain a target score;
    if the target score is greater than or equal to a first threshold, determining that the target user is in the blacklist feature sub-database corresponding to the target blacklist model, and returning a first prompt message, the first prompt message indicating that the target user is barred from normal service;
    if the target score is less than the first threshold, determining that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model, and returning a second prompt message, the second prompt message indicating that the target user receives normal service.
  3. The blacklist identification method based on address information according to claim 1, wherein acquiring the voice file of the target user, the voice file including audio and address information and the address information including an incoming phone segment and/or IP information, comprises:
    receiving the voice file of the target user;
    parsing the voice file to obtain the audio and the address identifier of the target user;
    querying a preset table according to the address identifier to obtain the address information corresponding to the address identifier, the address information including an incoming phone segment and/or IP information.
  4. The blacklist identification method based on address information according to claim 1, wherein performing feature extraction on the audio by the preset algorithm to obtain the feature file comprises:
    converting the audio from analog signal form to digital signal form;
    pre-emphasizing the audio in digital signal form;
    applying windowing to the pre-emphasized audio;
    performing a discrete Fourier transform on the windowed audio to obtain a target complex number;
    mapping the target complex number onto the Mel spectrum to obtain logarithmic energy;
    converting the logarithmic energy to obtain cepstral coefficients;
    calculating the energy and the differences from the cepstral coefficients to generate the feature file.
  5. The blacklist identification method based on address information according to claim 4, wherein performing the discrete Fourier transform on the windowed audio to obtain the target complex number comprises:
    obtaining the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0;
    calling a first preset formula to generate the target complex number X[k], the first preset formula being:
    Figure PCTCN2019117117-appb-100001
    where N is a power of 2, k is an integer, and X[k] represents the amplitude and phase of a frequency component of the windowed audio signal.
  6. The blacklist identification method based on address information according to claim 5, wherein mapping the target complex number onto the Mel spectrum to obtain the logarithmic energy comprises:
    smoothing the target complex number through a preset filter bank;
    associating the smoothed complex number with the mel scale of the Mel spectrum, one mel being a unit of pitch;
    mapping the smoothed complex number onto the mel scale by a second preset formula to obtain a target scale value, the second preset formula being:
    Figure PCTCN2019117117-appb-100002
    calculating the logarithmic energy of the target scale value according to a third preset formula, the third preset formula being:
    Figure PCTCN2019117117-appb-100003
    where H_m(k) is the frequency response of the filter bank, and M is the number of filters in the preset filter bank.
  7. The blacklist identification method based on address information according to any one of claims 1-6, wherein, before acquiring the voice file of the target user, the voice file including audio and address information and the address information including an incoming phone segment and/or IP information, the method further comprises:
    generating the preset blacklist model, the preset blacklist model being used for blacklist registration.
  8. A blacklist identification apparatus based on address information, comprising:
    an acquiring unit, configured to acquire a voice file of a target user, the voice file including audio and address information, the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
    an extraction unit, configured to perform feature extraction on the audio by a preset algorithm to obtain a feature file;
    a judging unit, configured to determine whether the feature file is valid;
    a first generating unit, configured to generate, if the feature file is invalid, a status code for the extraction failure, the status code indicating the reason for the extraction failure;
    a scoring unit, configured to, if the feature file is valid, determine the target address range to which the target user belongs according to the incoming phone segment or the IP information, use a preset blacklist model and the target address range to score the similarity of the feature file, and perform a corresponding operation according to the scoring result.
  9. The blacklist identification apparatus based on address information according to claim 8, characterized in that the scoring unit is specifically configured to:
    if the feature file is valid, determine the target address range to which the target user belongs based on the incoming phone segment or the IP information;
    select, from the preset blacklist models and according to the target address range, the corresponding target blacklist model, each preset blacklist model corresponding to a different blacklist feature sub-database;
    score the similarity of the feature file with the target blacklist model to obtain a target score;
    if the target score is greater than or equal to a first threshold, determine that the target user is in the blacklist feature sub-database corresponding to the target blacklist model, and return a first prompt message, the first prompt message indicating that the target user is barred from normal service;
    if the target score is less than the first threshold, determine that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model, and return a second prompt message, the second prompt message indicating that the target user receives normal service.
  10. The device for identifying a blacklist based on address information according to claim 8, wherein the acquiring unit is specifically configured to:
    receive the voice file of the target user;
    parse the voice file to obtain the audio and an address identifier of the target user;
    query a preset table according to the address identifier to obtain address information corresponding to the address identifier, the address information including an incoming phone segment and/or IP information.
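A minimal sketch of the acquisition steps in claim 10 follows. The claim does not define the voice file layout or the preset table, so the container format (an ASCII address identifier, a newline, then raw audio bytes), the `PRESET_TABLE` contents, and all names here are assumptions made for illustration.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class AddressInfo(NamedTuple):
    phone_segment: Optional[str]  # incoming phone segment, if known
    ip: Optional[str]             # IP information, if known


class VoiceFile(NamedTuple):
    audio: bytes
    address_id: str


# Hypothetical preset table mapping address identifiers to address information.
PRESET_TABLE: Dict[str, AddressInfo] = {
    "addr-001": AddressInfo(phone_segment="0755", ip=None),
    "addr-002": AddressInfo(phone_segment=None, ip="203.0.113.7"),
}


def parse_voice_file(raw: bytes) -> VoiceFile:
    """Assumed layout: ASCII address identifier, one newline, then audio bytes."""
    header, _, audio = raw.partition(b"\n")
    return VoiceFile(audio=audio, address_id=header.decode("ascii"))


def acquire(raw: bytes,
            table: Dict[str, AddressInfo] = PRESET_TABLE) -> Tuple[bytes, AddressInfo]:
    voice = parse_voice_file(raw)
    # Look up the address information that corresponds to the identifier.
    return voice.audio, table[voice.address_id]
```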
  11. The device for identifying a blacklist based on address information according to claim 8, wherein the extracting unit comprises:
    a first conversion module, configured to convert the audio from analog signal form to digital signal form;
    a pre-emphasis module, configured to pre-emphasize the audio in digital signal form;
    a windowing module, configured to apply windowing to the pre-emphasized audio;
    a transform module, configured to perform a discrete Fourier transform on the windowed audio to obtain a target complex number;
    a mapping module, configured to map the target complex number onto the mel spectrum to obtain a logarithmic energy;
    a second conversion module, configured to convert the logarithmic energy to obtain cepstral coefficients;
    a calculation module, configured to calculate energy and differences according to the cepstral coefficients and generate the feature file.
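The pre-emphasis and windowing stages of claim 11 can be illustrated with a short sketch. The claim names the stages but not their parameters, so the first-order filter with coefficient 0.97, the Hamming window, and the frame length and hop arguments are conventional choices assumed here, not values taken from the patent.

```python
import math
from typing import List, Sequence


def pre_emphasize(samples: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1] (alpha is an assumption)."""
    if not samples:
        return []
    out = [samples[0]]
    out.extend(samples[i] - alpha * samples[i - 1] for i in range(1, len(samples)))
    return out


def hamming(n_points: int) -> List[float]:
    """Hamming window; the patent does not name a window type."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def window_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping frames and taper each with the window."""
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * win[i] for i in range(frame_len)])
    return frames
```

Pre-emphasis boosts the high-frequency part of the speech signal, and tapering each frame reduces spectral leakage in the transform that follows.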
  12. The device for identifying a blacklist based on address information according to claim 11, wherein the transform module is specifically configured to:
    obtain the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0;
    invoke a first preset formula to generate the target complex number X[k], the first preset formula being:
    Figure PCTCN2019117117-appb-100004
    where N is a power of 2, k is an integer, and X[k] represents the amplitude and phase of a frequency component in the windowed audio signal.
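The first preset formula is available here only as a figure reference, so the sketch below assumes the textbook discrete Fourier transform, X[k] = sum over n from 0 to N-1 of x[n]·exp(-j·2π·k·n/N), which matches the description of X[k] as carrying amplitude and phase. Zero-padding the frame up to a power of 2 is also an assumption, made so that N satisfies the stated condition.

```python
import cmath
from typing import List, Sequence


def next_power_of_two(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p


def dft(frame: Sequence[float]) -> List[complex]:
    """Direct O(N^2) DFT of a frame zero-padded to N, a power of 2 (assumed form)."""
    n_points = next_power_of_two(len(frame))
    padded = list(frame) + [0.0] * (n_points - len(frame))
    return [
        sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]
```

For a given bin, `abs(X[k])` gives the amplitude and `cmath.phase(X[k])` the phase. Requiring N to be a power of 2 is what a radix-2 fast Fourier transform needs; the direct sum above trades speed for readability.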
  13. The device for identifying a blacklist based on address information according to claim 12, wherein the mapping module is specifically configured to:
    smooth the target complex number through a preset filter bank;
    map the smoothed complex number to the mel scale of the mel spectrum, where one mel is a unit of pitch;
    map the smoothed complex number onto the mel scale through a second preset formula to obtain a target scale, the second preset formula being:
    Figure PCTCN2019117117-appb-100005
    calculate the logarithmic energy of the target scale according to a third preset formula, the third preset formula being:
    Figure PCTCN2019117117-appb-100006
    where H_m(k) is the frequency response of the filter bank and M is the number of filters in the preset filter bank.
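Because the second and third preset formulas appear only as figure references, the following sketch substitutes the commonly used forms as assumptions: mel(f) = 2595·log10(1 + f/700) for the mel mapping, and S(m) = ln(sum over k of |X[k]|²·H_m(k)) for the log energy of filter m, with 0 ≤ m < M. The triangular filter shape, the energy floor, and every function name are likewise illustrative and may differ from the patent's figures.

```python
import math
from typing import List, Sequence


def hz_to_mel(freq_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_filter_bank(num_filters: int, n_fft: int,
                           sample_rate: float) -> List[List[float]]:
    """Build M triangular responses H_m(k), evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1)
                  for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        response = [0.0] * n_bins
        for k in range(left, center):          # rising edge
            response[k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            response[k] = (right - k) / (right - center)
        bank.append(response)
    return bank


def log_mel_energies(spectrum: Sequence[complex],
                     bank: Sequence[Sequence[float]]) -> List[float]:
    """Log of the filter-weighted power spectrum, floored to avoid log(0)."""
    power = [abs(c) ** 2 for c in spectrum]
    return [math.log(max(sum(p * h for p, h in zip(power, filt)), 1e-10))
            for filt in bank]
```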
  14. The device for identifying a blacklist based on address information according to any one of claims 8-13, wherein the device further comprises:
    a second generating unit, configured to generate the preset blacklist model, the preset blacklist model being used for blacklist registration.
  15. A blacklist identification device based on address information, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    acquiring a voice file of a target user, the voice file including audio and address information, the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
    performing feature extraction on the audio through a preset algorithm to obtain a feature file;
    determining whether the feature file is valid;
    if the feature file is invalid, generating a status code for the extraction failure, the status code indicating the reason for the extraction failure;
    if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, invoking a preset blacklist model and the target address range to score the similarity of the feature file, and performing a corresponding operation based on the scoring result.
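The validity check and address-range lookup in claim 15 can be sketched as below. The claim defines neither the validity criteria, nor the status codes, nor how ranges are stored, so `ExtractStatus`, the empty/non-finite checks, `PHONE_PREFIXES`, and `IP_RANGES` (filled with documentation-only IP networks) are all hypothetical.

```python
import ipaddress
import math
from enum import Enum
from typing import Dict, List, Optional, Sequence


class ExtractStatus(Enum):
    OK = 0
    EMPTY_FEATURES = 1     # hypothetical failure reason
    NON_FINITE_VALUE = 2   # hypothetical failure reason


# Hypothetical range tables; real deployments would load these from configuration.
PHONE_PREFIXES: Dict[str, str] = {"0755": "south", "010": "north"}
IP_RANGES: Dict[str, List[ipaddress.IPv4Network]] = {
    "south": [ipaddress.ip_network("203.0.113.0/24")],
    "north": [ipaddress.ip_network("198.51.100.0/24")],
}


def validate(features: Sequence[float]) -> ExtractStatus:
    if not features:
        return ExtractStatus.EMPTY_FEATURES
    if any(not math.isfinite(v) for v in features):
        return ExtractStatus.NON_FINITE_VALUE
    return ExtractStatus.OK


def target_range(phone_segment: Optional[str], ip: Optional[str]) -> Optional[str]:
    """Resolve the address range from the phone segment, else from the IP."""
    if phone_segment is not None:
        for prefix, name in PHONE_PREFIXES.items():
            if phone_segment.startswith(prefix):
                return name
    if ip is not None:
        addr = ipaddress.ip_address(ip)
        for name, networks in IP_RANGES.items():
            if any(addr in net for net in networks):
                return name
    return None


def handle(features: Sequence[float], phone_segment: Optional[str],
           ip: Optional[str]) -> Dict[str, Optional[str]]:
    status = validate(features)
    if status is not ExtractStatus.OK:
        # Invalid feature file: report why extraction failed and stop.
        return {"status": status.name}
    return {"status": "OK", "range": target_range(phone_segment, ip)}
```

Returning early with a status code keeps invalid feature files away from the scoring step, which only runs once a target address range is known.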
  16. The blacklist identification device based on address information according to claim 15, wherein the processor, when executing the computer program to implement the step of, if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, invoking a preset blacklist model and the target address range to score the similarity of the feature file, and performing a corresponding operation based on the scoring result, performs the following steps:
    if the feature file is valid, determining the target address range to which the target user belongs based on the incoming phone segment or the IP information;
    determining a corresponding target blacklist model among the preset blacklist models according to the target address range, each preset blacklist model corresponding to a different blacklist feature sub-database;
    scoring the similarity of the feature file through the target blacklist model to obtain a target score;
    if the target score is greater than or equal to a first threshold, determining that the target user is in the blacklist feature sub-database corresponding to the target blacklist model, and returning a first prompt message, the first prompt message indicating that the target user is prohibited from receiving normal service;
    if the target score is less than the first threshold, determining that the target user is not in the blacklist feature sub-database corresponding to the target blacklist model, and returning a second prompt message, the second prompt message indicating that the target user may receive normal service.
  17. The blacklist identification device based on address information according to claim 15, wherein the processor, when executing the computer program to implement the step of acquiring the voice file of the target user, the voice file including audio and address information and the address information including an incoming phone segment and/or IP information, performs the following steps:
    receiving the voice file of the target user;
    parsing the voice file to obtain the audio and an address identifier of the target user;
    querying a preset table according to the address identifier to obtain address information corresponding to the address identifier, the address information including an incoming phone segment and/or IP information.
  18. The blacklist identification device based on address information according to claim 15, wherein the processor, when executing the computer program to implement the step of performing feature extraction on the audio through a preset algorithm to obtain a feature file, performs the following steps:
    converting the audio from analog signal form to digital signal form;
    pre-emphasizing the audio in digital signal form;
    applying windowing to the pre-emphasized audio;
    performing a discrete Fourier transform on the windowed audio to obtain a target complex number;
    mapping the target complex number onto the mel spectrum to obtain a logarithmic energy;
    converting the logarithmic energy to obtain cepstral coefficients;
    calculating energy and differences according to the cepstral coefficients to generate the feature file.
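The last two steps of claim 18, converting log energies to cepstral coefficients and computing differences, are commonly done with a type-II discrete cosine transform followed by frame-to-frame deltas. The claim does not name the transform or the difference scheme, so both are assumptions in this sketch, as are the function names.

```python
import math
from typing import List, Sequence


def dct_ii(log_energies: Sequence[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II of the log filter-bank energies (assumed transform)."""
    m = len(log_energies)
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / m)
            for j, e in enumerate(log_energies))
        for c in range(num_coeffs)
    ]


def deltas(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Central difference (c[t+1] - c[t-1]) / 2, repeating the edge frames."""
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out


def feature_vectors(cepstra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each frame's cepstral coefficients with its deltas."""
    return [list(c) + d for c, d in zip(cepstra, deltas(cepstra))]
```

Appending the deltas gives each frame a rough measure of how the spectrum is changing over time, which static coefficients alone do not capture.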
  19. The blacklist identification device based on address information according to claim 18, wherein the processor, when executing the computer program to implement the step of performing a discrete Fourier transform on the windowed audio to obtain the target complex number, performs the following steps:
    obtaining the windowed audio signal x[n], ..., x[m], where n and m are integers greater than 0;
    invoking a first preset formula to generate the target complex number X[k], the first preset formula being:
    Figure PCTCN2019117117-appb-100007
    where N is a power of 2, k is an integer, and X[k] represents the amplitude and phase of a frequency component in the windowed audio signal.
  20. A computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the following steps:
    acquiring a voice file of a target user, the voice file including audio and address information, the address information including an incoming phone segment and/or Internet Protocol (IP) address information;
    performing feature extraction on the audio through a preset algorithm to obtain a feature file;
    determining whether the feature file is valid;
    if the feature file is invalid, generating a status code for the extraction failure, the status code indicating the reason for the extraction failure;
    if the feature file is valid, determining the target address range to which the target user belongs according to the incoming phone segment or the IP information, invoking a preset blacklist model and the target address range to score the similarity of the feature file, and performing a corresponding operation based on the scoring result.
PCT/CN2019/117117 2019-09-19 2019-11-11 Address information-based blacklist identification method, apparatus, device, and storage medium WO2021051533A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910884630.2A CN110767238B (en) 2019-09-19 2019-09-19 Blacklist identification method, device, equipment and storage medium based on address information
CN201910884630.2 2019-09-19

Publications (1)

Publication Number Publication Date
WO2021051533A1 true WO2021051533A1 (en) 2021-03-25

Family

ID=69329776

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117117 WO2021051533A1 (en) 2019-09-19 2019-11-11 Address information-based blacklist identification method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110767238B (en)
WO (1) WO2021051533A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741457B (en) * 2020-07-16 2023-06-09 Oppo广东移动通信有限公司 Bluetooth communication method and device and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015007231A1 (en) * 2013-07-19 2015-01-22 腾讯科技(深圳)有限公司 Method and device for identification of malicious url
CN106372572A (en) * 2016-08-19 2017-02-01 北京旷视科技有限公司 Monitoring method and apparatus
CN107105108A (en) * 2017-04-21 2017-08-29 天维尔信息科技股份有限公司 A kind of processing method and its system of anti-alarm harassing call
CN109858917A (en) * 2019-02-13 2019-06-07 苏州意能通信息技术有限公司 A kind of anti-fake system and its method based on artificial intelligence

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101262524A (en) * 2008-04-23 2008-09-10 沈阳东软软件股份有限公司 Rubbish voice filtration method and system
CN105872185A (en) * 2016-04-20 2016-08-17 乐视控股(北京)有限公司 Information prompting method, device and system
CN109995732A (en) * 2017-12-30 2019-07-09 中国移动通信集团安徽有限公司 Web portal security access monitoring method, device, equipment and medium

Also Published As

Publication number Publication date
CN110767238A (en) 2020-02-07
CN110767238B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
WO2021139425A1 (en) Voice activity detection method, apparatus and device, and storage medium
WO2018149077A1 (en) Voiceprint recognition method, device, storage medium, and background server
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
CN109147796B (en) Speech recognition method, device, computer equipment and computer readable storage medium
CN109215665A (en) A kind of method for recognizing sound-groove based on 3D convolutional neural networks
Leu et al. An MFCC-based speaker identification system
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
CN110880329A (en) Audio identification method and equipment and storage medium
WO2019232826A1 (en) I-vector extraction method, speaker recognition method and apparatus, device, and medium
CN113327626A (en) Voice noise reduction method, device, equipment and storage medium
CN113035202B (en) Identity recognition method and device
Siam et al. A novel speech enhancement method using Fourier series decomposition and spectral subtraction for robust speaker identification
WO2021051533A1 (en) Address information-based blacklist identification method, apparatus, device, and storage medium
Nirjon et al. sMFCC: exploiting sparseness in speech for fast acoustic feature extraction on mobile devices--a feasibility study
Jeyalakshmi et al. HMM and K-NN based automatic musical instrument recognition
CN106228984A (en) Voice recognition information acquisition methods
WO2021217979A1 (en) Voiceprint recognition method and apparatus, and device and storage medium
CN112309404B (en) Machine voice authentication method, device, equipment and storage medium
CN113782005B (en) Speech recognition method and device, storage medium and electronic equipment
CN111402898B (en) Audio signal processing method, device, equipment and storage medium
Upadhyay et al. Robust recognition of English speech in noisy environments using frequency warped signal processing
Zhipeng et al. Voiceprint recognition based on BP Neural Network and CNN
Ahmad et al. The impact of low-pass filter in speaker identification
CN114512133A (en) Sound object recognition method, sound object recognition device, server and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19945695

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19945695

Country of ref document: EP

Kind code of ref document: A1