CN110769425B - Method and device for judging abnormal call object, computer equipment and storage medium - Google Patents


Info

Publication number
CN110769425B
CN110769425B
Authority
CN
China
Prior art keywords
call
preset
sound
object end
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910882722.7A
Other languages
Chinese (zh)
Other versions
CN110769425A (en)
Inventor
王珏
彭俊清
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910882722.7A priority Critical patent/CN110769425B/en
Priority to PCT/CN2019/116342 priority patent/WO2021051504A1/en
Publication of CN110769425A publication Critical patent/CN110769425A/en
Application granted granted Critical
Publication of CN110769425B publication Critical patent/CN110769425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W12/00Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12Detection or prevention of fraud
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/08Use of distortion metrics or a particular distance between probe pattern and reference templates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/14Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W12/00Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12Detection or prevention of fraud
    • H04W12/121Wireless intrusion detection systems [WIDS]; Wireless intrusion prevention systems [WIPS]
    • H04W12/122Counter-measures against attacks; Protection against rogue devices
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W12/00Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12Detection or prevention of fraud
    • H04W12/128Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a method and an apparatus for judging an abnormal call object, a computer device and a storage medium, wherein the method includes the following steps: acquiring a first call voice and a second call voice; respectively extracting first sound data of the first object end and second sound data of the second object end; if the first sound data and the second sound data are both electronic sound, establishing a communication channel; recording the call content and inputting the call content into a preset emotion fluctuation recognition model for processing, to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end; and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold, judging that the first object end and the second object end are both abnormal call objects. The accuracy of judging abnormal call objects is thereby improved.

Description

Method and device for judging abnormal call object, computer equipment and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for determining an abnormal call object, a computer device, and a storage medium.
Background
Abnormal calls, such as malicious telemarketing calls, fraudulent calls and phishing calls, have long troubled users, causing billions of dollars of financial loss around the world every year. The development of artificial intelligence technology has driven the rapid maturation of the automatic outbound-call robot industry in recent years. According to statistics, an automatic outbound robot product can currently be introduced on the market for as little as a few thousand yuan, and more and more companies use outbound robot products instead of human agents to complete telephone sales tasks, which makes the abnormal-call problem even more serious. The industry mostly adopts the standard "mark and intercept" approach to handle abnormal calls. Although this approach can effectively reduce the impact of harassing calls on users, it still has many defects, for example: the accuracy of the marks cannot be verified, so if the mark information is wrong, a user may miss important calls; and it is difficult to apply a uniform interception policy to different users. Therefore, the accuracy of judging abnormal calls is currently low.
Disclosure of Invention
The application mainly aims to provide a method, a device, a computer device and a storage medium for judging an abnormal call object, and aims to improve the accuracy of judging an abnormal call.
In order to achieve the above object, the present application provides a method for determining an abnormal call object, which is applied to a server, and includes:
acquiring a first call voice and a second call voice, wherein the first call voice is the call voice of a first user end and a first object end, and the second call voice is the call voice of a second user end and a second object end;
according to a preset sound data extraction method, extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively;
judging whether the first sound data is electronic sound or not according to a preset electronic sound judgment method, and judging whether the second sound data is electronic sound or not;
if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed, and the communication channel is used for connecting the first object end and the second object end;
recording the conversation contents of the first object terminal and the second object terminal, and inputting the conversation contents into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object terminal and a second emotion fluctuation value of the second object terminal;
judging whether the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value;
and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects.
Further, the step of extracting first sound data of the first object side and second sound data of the second object side from the first call voice and the second call voice respectively according to a preset sound data extraction method includes:
acquiring a first voiceprint characteristic corresponding to a pre-stored first user side and a second voiceprint characteristic corresponding to a pre-stored second user side;
clustering the first call voice according to a preset speaker clustering technology to obtain two first voice sets with different voiceprint characteristics, and recording the first voice set which does not match the first voiceprint characteristic as the first sound data of the first object end;
clustering the second call voice according to the preset speaker clustering technology to obtain two second voice sets with different voiceprint characteristics, and recording the second voice set which does not match the second voiceprint characteristic as the second sound data of the second object end;
the first sound data and the second sound data are extracted.
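The clustering step can be illustrated with a toy sketch. It assumes each voice segment has already been reduced to a fixed-length embedding vector and that Euclidean distance approximates voiceprint dissimilarity (real systems use dedicated speaker-embedding models); the segments are split into two clusters, and the cluster that does not match the enrolled user's voiceprint is kept as the object end's sound data:

```python
import math

def separate_object_audio(segments, user_voiceprint):
    """Toy 2-way speaker clustering: split the call's segment embeddings
    into two clusters, then return the cluster that does NOT match the
    enrolled user's voiceprint."""
    dist = math.dist  # Euclidean distance between embedding vectors

    # Seed the two clusters with the most dissimilar pair of segments.
    seed_a, seed_b = max(
        ((a, b) for a in segments for b in segments if a is not b),
        key=lambda pair: dist(pair[0], pair[1]),
    )
    clusters = ([], [])
    for seg in segments:
        nearer = 0 if dist(seg, seed_a) <= dist(seg, seed_b) else 1
        clusters[nearer].append(seg)

    def centroid(cluster):
        return [sum(col) / len(cluster) for col in zip(*cluster)]

    # The cluster whose centroid is farther from the user's voiceprint is
    # recorded as the object end's sound data.
    if dist(centroid(clusters[0]), user_voiceprint) > dist(centroid(clusters[1]), user_voiceprint):
        return clusters[0]
    return clusters[1]
```

The same routine applied to the second call voice yields the second sound data.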
Further, the step of determining whether the first sound data is an electronic sound according to a preset electronic sound determination method includes:
generating an expression function F(t) of a waveform diagram corresponding to the first sound data;
according to the formula H(t) = min(G(t), m), obtaining a function H(t), wherein f(t) is the expression function of the waveform diagram of a preset electronic sound, E(t) = F(t) - f(t) is the difference function of the function F(t) and the function f(t), G(t) = |dE(t)/dt| is the absolute value of the derivative of the difference function with respect to time, t is the time, and m is a preset error parameter value greater than 0;
acquiring a first time length during which the function H(t) is not equal to m on the time axis and a second time length during which the function H(t) is equal to m, and according to the formula: fit degree value = first time length / (first time length + second time length), calculating the fit degree value and judging whether the fit degree value is greater than a preset fit threshold value;
and if the fit degree value is greater than the preset fit threshold value, judging that the first sound data is an electronic sound.
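A minimal sketch of this fit-degree check, assuming E(t) is the difference between the captured waveform F(t) and the preset electronic-sound waveform f(t), G(t) = |dE/dt|, and H(t) = min(G(t), m), with waveforms represented as discretely sampled lists (the formula images in the original text did not survive extraction, so these readings are a reconstruction):

```python
def fit_degree(captured, reference, m, dt=1.0):
    """Fit between a captured waveform F(t) and a preset electronic-sound
    waveform f(t):
      E(t) = F(t) - f(t)   difference of the two waveforms
      G(t) = |dE/dt|       absolute derivative of the difference
      H(t) = min(G(t), m)  clipped at the error parameter m > 0
    The fit degree is the share of time during which H(t) stays below m."""
    E = [a - b for a, b in zip(captured, reference)]
    # Finite-difference approximation of |dE/dt| between adjacent samples.
    G = [abs((E[i + 1] - E[i]) / dt) for i in range(len(E) - 1)]
    H = [min(g, m) for g in G]
    first_len = sum(1 for h in H if h < m)   # time where H(t) != m
    second_len = len(H) - first_len          # time where H(t) == m
    return first_len / (first_len + second_len)

def is_electronic(captured, reference, m=0.05, threshold=0.9):
    """Judge the sound electronic when the fit degree exceeds the preset
    fit threshold (the values of m and threshold here are illustrative)."""
    return fit_degree(captured, reference, m) > threshold
```

A captured waveform identical to the reference yields a fit degree of 1.0; a waveform that drifts faster than m everywhere yields 0.0.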
Further, after the step of determining whether the first sound data is an electronic sound and determining whether the second sound data is an electronic sound according to a preset electronic sound determining method, the method includes:
if only one of the first sound data and the second sound data is an electronic sound, marking the first sound data or the second sound data which is the electronic sound as suspected sound data, and marking an object end corresponding to the suspected sound data as a suspected object end;
establishing a communication channel to connect the suspect object end with a preset response robot;
recording the conversation content of the suspected object end and a preset answering robot, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a suspected emotion fluctuation value of the suspected object end;
judging whether the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value or not;
and if the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value, judging that the suspected object end is an abnormal call object.
Further, the step of recording the call content between the suspected object end and a preset response robot, and inputting the call content into a preset emotion fluctuation recognition model for processing to obtain a suspected emotion fluctuation value of the suspected object end includes:
injecting a stimulus sound into the communication channel through the response robot, wherein the stimulus sound includes noise, sound whose volume is greater than a preset volume threshold, or sound whose frequency is higher than a preset frequency threshold;
and generating a call recording instruction, wherein the call recording instruction is used for instructing recording of the call content between the suspected object end and the preset response robot, and the call content at least includes the reply of the suspected object end to the stimulus sound.
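A hypothetical sketch of generating one such stimulus, a tone whose frequency is meant to exceed the preset frequency threshold (all parameter values here are illustrative, not taken from the text):

```python
import math

def make_stimulus(freq_hz=6000.0, sample_rate=16000, duration_s=0.25, amplitude=0.9):
    """Generate a loud high-frequency sine tone as a stimulus sound,
    returned as a list of samples in [-amplitude, amplitude]."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
```

A natural person would normally react to such a tone, while a scripted robot would not, which is what the recorded reply is later examined for.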
Further, the step of inputting the call content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object side and a second emotion fluctuation value of the second object side includes:
separating a first voice fragment set of the first object end and a second voice fragment set of the second object end from the call content;
collecting first sound characteristic data of the first voice fragment set and second sound characteristic data of the second voice fragment set;
and calculating, according to the formula: emotion fluctuation value = (maximum value of the sound characteristic data - minimum value of the sound characteristic data) / average value of the sound characteristic data, a first emotion fluctuation value corresponding to the first object end and a second emotion fluctuation value corresponding to the second object end.
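The formula above reduces to a few lines; which sound characteristic is used (for example, per-fragment volume) is left open by the text:

```python
def emotion_fluctuation(feature_values):
    """(max - min) / mean of a sound-feature series, e.g. the per-fragment
    volume of one speaker's voice fragments."""
    mx, mn = max(feature_values), min(feature_values)
    mean = sum(feature_values) / len(feature_values)
    return (mx - mn) / mean
```

A flat series (robot-like speech) yields 0, while a varying series (natural speech) yields a larger value.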
Further, the step of acquiring a first call voice and a second call voice, where the first call voice is a call voice of a first user end and a first object end, and the second call voice is a call voice of a second user end and a second object end, includes:
acquiring the telephone numbers and the telephone number activation time of the first object terminal and the second object terminal;
judging whether the telephone numbers of the first object terminal and the second object terminal both belong to a preset abnormal database;
if the telephone numbers of the first object terminal and the second object terminal do not belong to a preset abnormal database, judging whether the activation time of the telephone numbers is later than a preset time point;
and if the activation time of the telephone number is later than a preset time point, generating a call voice acquisition instruction, wherein the call voice acquisition instruction is used for indicating to acquire a first call voice and a second call voice.
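The pre-check above can be sketched as a small gate function (the names and the comparison on activation time are assumptions; the text does not fix a data type for time points):

```python
def should_capture_call(number, activation_time, abnormal_db, preset_time):
    """Trigger call-voice acquisition only for numbers outside the abnormal
    database whose activation time is later than the preset time point
    (recently activated numbers are the suspicious ones worth examining)."""
    if number in abnormal_db:
        return False  # already known abnormal, no further judgment needed
    return activation_time > preset_time
```

Only when both object ends pass this gate would the call voice acquisition instruction be generated.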
The application provides a judgment device of abnormal call object, which is applied to a server and comprises:
the device comprises a call voice acquisition unit, a first call voice acquisition unit and a second call voice acquisition unit, wherein the call voice acquisition unit is used for acquiring a first call voice and a second call voice, the first call voice is the call voice of a first user end and a first object end, and the second call voice is the call voice of a second user end and a second object end;
a sound data extraction unit, configured to extract first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice, respectively, according to a preset sound data extraction method;
the electronic sound judging unit is used for judging whether the first sound data is electronic sound or not according to a preset electronic sound judging method and judging whether the second sound data is electronic sound or not;
a communication channel construction unit, configured to construct a communication channel if the first sound data and the second sound data are both electronic tones, where the communication channel is used to connect the first object end and the second object end;
the call content recording unit is used for recording the call content of the first object terminal and the second object terminal, and inputting the call content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object terminal and a second emotion fluctuation value of the second object terminal;
the emotion fluctuation threshold value judging unit is used for judging whether the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value;
and the abnormal call object judging unit is used for judging that the first object end and the second object end are abnormal call objects if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value.
The present application provides a computer device comprising a memory storing a computer program and a processor implementing the steps of any of the above methods when the processor executes the computer program.
The present application provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the method and the device for judging the abnormal call object, the computer equipment and the storage medium, the first call voice is obtained, and the second call voice is obtained; extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively; if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed, and the communication channel is used for connecting the first object end and the second object end; recording the conversation content of the first object end and the second object end, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end; and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects. Therefore, the accuracy of judging the abnormal call object is improved.
Drawings
Fig. 1 is a schematic flowchart illustrating a method for determining an abnormal call object according to an embodiment of the present application;
fig. 2 is a schematic block diagram of a structure of an abnormal call target determination device according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for determining an abnormal call object, which is applied to a server and includes:
the method comprises the steps of S1, obtaining a first call voice and a second call voice, wherein the first call voice is the call voice of a first user end and a first object end, and the second call voice is the call voice of a second user end and a second object end;
s2, according to a preset sound data extraction method, extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively;
s3, judging whether the first sound data is electronic sound or not according to a preset electronic sound judging method, and judging whether the second sound data is electronic sound or not;
s4, if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed and used for connecting the first object end and the second object end;
s5, recording the conversation contents of the first object terminal and the second object terminal, and inputting the conversation contents into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object terminal and a second emotion fluctuation value of the second object terminal;
s6, judging whether the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value;
and S7, if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects.
According to the present application, a communication channel is constructed to connect the first object end with the second object end, and emotion fluctuation values are used to judge whether the first object end and the second object end are abnormal call objects. This relieves computational pressure (no response robot needs to be constructed or used) and improves the accuracy of judging abnormal calls (a natural person shows emotional fluctuation, while a robot does not). The idea of the technical scheme is as follows: suppose the first object end and the second object end are both robots. Since the voice adopted by a robot is an electronically synthesized sound (electronic sound), the two robots can be connected to each other; owing to their mechanical nature, they will keep talking to each other in order to carry out malicious promotion or information collection, yet show no emotional fluctuation during the call, and this is the basis for judging whether they are abnormal call objects. An abnormal call object refers to a call object that meets the judging criteria of the present application.
As described in the above step S1, a first call voice and a second call voice are acquired, where the first call voice is the call voice between a first user end and a first object end, and the second call voice is the call voice between a second user end and a second object end. Both call voices are acquired from calls that remain connected at the current time. The first user end and the second user end are consumers of the service provided by the server of the present application, while the first object end and the second object end are the objects to be judged.
As described in step S2, according to a preset sound data extraction method, the first sound data of the first object end and the second sound data of the second object end are extracted from the first call voice and the second call voice, respectively. Because the voiceprint characteristics of different people differ, voiceprint characteristics can serve as a basis for identity authentication and can distinguish the voices of the two parties in a call, so that the sound data can be extracted. The sound data extraction method may be any feasible method, for example: acquiring a first voiceprint characteristic corresponding to the pre-stored first user end and a second voiceprint characteristic corresponding to the pre-stored second user end; clustering the first call voice according to a preset speaker clustering technology to obtain two first voice sets with different voiceprint characteristics, and recording the first voice set which does not match the first voiceprint characteristic as the first sound data of the first object end; clustering the second call voice according to the preset speaker clustering technology to obtain two second voice sets with different voiceprint characteristics, and recording the second voice set which does not match the second voiceprint characteristic as the second sound data of the second object end; and extracting the first sound data and the second sound data.
As described in step S3, according to a preset electronic sound judgment method, it is judged whether the first sound data is an electronic sound and whether the second sound data is an electronic sound. The speech used by a robot is an electronically synthesized sound (electronic sound), which generally differs clearly from the speech of a natural person; therefore, if the sound data is judged to be an electronic sound, its source can be suspected of being an abnormal call object. The preset electronic sound judgment method may be any feasible method, for example, comparing the sound data with electronic sounds in a pre-stored electronic sound database, and judging the sound data to be an electronic sound if the comparison result is sufficiently similar. More specifically, the preset electronic sound judgment method may include: recognizing the first sound data to obtain a text; generating a reference voice from the text using a preset electronic voice; judging the degree of similarity between the reference voice and the first sound data; and if the degree of similarity is greater than a preset threshold value, judging that the first sound data is an electronic sound. The degree of similarity may be any feasible measure, such as the similarity of voiceprint features or the similarity of waveform diagrams.
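The recognize-then-resynthesize judgment method in the preceding paragraph can be sketched as below. `asr`, `tts` and `similarity` are hypothetical stand-ins for real speech components, and the toy versions here treat the audio as plain text purely so the pipeline can be exercised:

```python
from difflib import SequenceMatcher

def is_electronic_by_resynthesis(sound_data, asr, tts, similarity, threshold=0.8):
    """Recognize the sound data to a text, re-synthesize the text with the
    preset electronic voice, and judge electronic if the captured audio is
    sufficiently similar to the re-synthesized reference."""
    text = asr(sound_data)        # recognize the sound data to obtain a text
    reference = tts(text)         # re-synthesize with the preset electronic voice
    return similarity(sound_data, reference) > threshold

# Toy stand-ins: "audio" is a string, so recognition and synthesis are
# identity mappings and similarity is plain string similarity.
toy_asr = lambda audio: audio
toy_tts = lambda text: text
text_similarity = lambda a, b: SequenceMatcher(None, a, b).ratio()

result = is_electronic_by_resynthesis("ni hao, qing wen", toy_asr, toy_tts, text_similarity)
```

With real components, `similarity` would compare voiceprint features or waveform diagrams rather than strings.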
As described in step S4, if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed, and the communication channel is used to connect the first object end and the second object end. If the first sound data and the second sound data are both electronic sounds, the first object end and the second object end may be both robots, and a communication channel for connecting the first object end and the second object end is constructed accordingly. Therefore, the conversation content can be acquired on the premise of not constructing a response robot and using the response robot.
In step S5, the call content of the first object end and the second object end is recorded, and the call content is input into a preset emotion fluctuation recognition model for processing, so as to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end. The preset emotion fluctuation recognition model may be any feasible model, such as one based on a machine learning model, for example a neural network model, a convolutional neural network model, or a long short-term memory network model, which will not be described in detail herein. Inputting the call content into the preset emotion fluctuation recognition model for processing includes: separating a first voice fragment set of the first object end and a second voice fragment set of the second object end from the call content; collecting first sound characteristic data of the first voice fragment set and second sound characteristic data of the second voice fragment set; and calculating, according to the formula: emotion fluctuation value = (maximum value of the sound characteristic data - minimum value of the sound characteristic data) / average value of the sound characteristic data, a first emotion fluctuation value corresponding to the first object end and a second emotion fluctuation value corresponding to the second object end. The voice characteristics of a natural person are related to emotion; for example, the volume when a person is irritated is generally larger than when the person is calm, so the emotion fluctuation value can be calculated from the call content.
As described in step S6 above, it is judged whether both the first emotion fluctuation value and the second emotion fluctuation value are smaller than the preset emotion fluctuation threshold. The emotion fluctuation value reflects the amplitude of emotional change, which is characteristic of natural people, while a robot shows no emotional fluctuation. Therefore, the emotion fluctuation value is used as a basis for judging whether an object end is abnormal.
As described in step S7, if both the first emotion fluctuation value and the second emotion fluctuation value are smaller than the preset emotion fluctuation threshold, neither the first object end nor the second object end has shown any significant emotional change, and accordingly both are judged to be abnormal call objects.
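Putting steps S3 to S7 together, a minimal end-to-end sketch (every callable is a stand-in for a component described in the text, not an implementation of it):

```python
def judge_abnormal_objects(sound1, sound2, is_electronic, bridge_and_record,
                           emotion_value, threshold=0.2):
    """Return True when both object ends are judged abnormal call objects."""
    # S3: both object ends must use electronic sound before bridging.
    if not (is_electronic(sound1) and is_electronic(sound2)):
        return False
    # S4-S5: connect the two object ends via a communication channel and
    # record their conversation, obtaining each side's call content.
    content1, content2 = bridge_and_record()
    # S6-S7: both are abnormal call objects only if both emotion
    # fluctuation values stay below the preset threshold.
    return emotion_value(content1) < threshold and emotion_value(content2) < threshold
```

The threshold value of 0.2 is illustrative; the text only requires some preset emotion fluctuation threshold.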
In one embodiment, the step S2 of extracting the first sound data of the first object side and the second sound data of the second object side from the first call voice and the second call voice respectively according to a preset sound data extraction method includes:
s201, obtaining a first voiceprint characteristic corresponding to a pre-stored first user side and obtaining a second voiceprint characteristic corresponding to a pre-stored second user side;
s202, clustering the first call voice according to a preset speaker clustering technology to obtain two first voice sets with different voiceprint characteristics, and recording the first voice set which does not accord with the first voiceprint characteristics as first sound data of the first object end;
s203, clustering the second call voice according to a preset speaker clustering technology to obtain two second voice sets with different voiceprint characteristics, and recording the second voice set which does not accord with the second voiceprint characteristics as second sound data of the second object end;
and S204, extracting the first sound data and the second sound data.
As described above, extraction of the first sound data of the first object end and the second sound data of the second object end from the first call voice and the second call voice is realized. Since the first user end and the second user end are customers of the server of the present application, their corresponding voiceprint features are stored in the server in advance, so the voices of the first user end and the second user end can be recognized. The preset speaker clustering technology classifies voice segments with the same voiceprint features into one class to form a voice set, so that a first call voice mixing two speakers is separated into two first voice sets. One of the two first voice sets corresponds to the first user end and the other to the first object end, and the first voice set that does not conform to the first voiceprint feature is the first sound data of the first object end. The second sound data of the second object end is acquired in the same way.
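The separation described above can be illustrated with a minimal sketch, assuming each voice segment has already been reduced to a voiceprint embedding vector; the cosine-similarity comparison and the 0.8 match threshold are assumptions standing in for the preset speaker clustering technology.

```python
import math

def cosine(a, b):
    # Cosine similarity between two voiceprint embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def extract_object_segments(segments, user_voiceprint, match_threshold=0.8):
    """Keep only segments whose (hypothetical) voiceprint embedding does NOT
    match the pre-stored user voiceprint; these form the object end's data."""
    return [emb for emb in segments
            if cosine(emb, user_voiceprint) < match_threshold]

user_vp = [1.0, 0.0]                       # pre-stored first-user voiceprint
call = [[0.99, 0.05], [0.1, 0.98], [0.95, 0.1], [0.0, 1.0]]
object_data = extract_object_segments(call, user_vp)
print(len(object_data))  # 2 segments attributed to the first object end
```

A production system would cluster segments first and then label whole clusters against the stored voiceprint; the per-segment comparison above is a simplification for illustration.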
In one embodiment, the step S3 of determining whether the first sound data is an electronic sound according to a preset electronic sound determination method includes:
s301, generating an expression function F (t) of a waveform diagram corresponding to the first sound data according to the first sound data;
s302, according to the formula:
H(t) = min(G(t), m), wherein
G(t) = |dE(t)/dt|,
obtaining a function H(t), wherein F0(t) is the expression function of the waveform diagram of a preset electronic sound, E(t) = F(t) - F0(t) is the difference function of the function F(t) and the function F0(t), G(t) is the derivative of the difference function with respect to time, t is time, and m is a preset error parameter value greater than 0;
s303, acquiring a first time length on the time axis during which the function H (t) is not equal to m and a second time length during which it is equal to m, and according to the formula: fit degree value = first time length/(first time length + second time length), calculating the fit degree value and judging whether the fit degree value is greater than a preset fit threshold value;
s304, if the fit degree value is larger than a preset fit threshold value, determining that the first sound data is electronic sound.
As described above, the determination of whether the first sound data is an electronic sound is realized. Sound is generated by mechanical vibration, and every sound has a corresponding waveform diagram with a corresponding functional expression, from which the expression function F (t) of the waveform diagram corresponding to the first sound data is generated. Then, according to the formula H(t) = min(G(t), m), wherein G(t) = |dE(t)/dt| is the derivative with respect to time of the difference function E(t) = F(t) - F0(t) between F(t) and the expression function F0(t) of the waveform diagram of the preset electronic sound, the function H(t) is acquired. The first time length during which H(t) is not equal to m and the second time length during which it is equal to m are acquired, the fit degree value = first time length/(first time length + second time length) is calculated, and whether the fit degree value is greater than a preset fit threshold value is judged; if it is, the first sound data is judged to be an electronic sound. The function H(t) thus measures how closely the first sound data tracks the preset electronic sound, and accordingly whether the first sound data is an electronic sound is determined.
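A discrete sketch of the fit-degree computation is given below; it assumes uniformly sampled waveforms, writes F0(t) for the preset electronic-sound waveform, and approximates the derivative of the difference function by finite differences, all of which are reconstruction assumptions rather than details fixed by the present application.

```python
def fit_degree(samples, preset, dt, m):
    """Discrete sketch of the electronic-sound judgment:
    E(t) = F(t) - F0(t), G(t) = |dE/dt|, H(t) = min(G(t), m).
    The fit degree is the fraction of time during which H(t) != m."""
    e = [f - p for f, p in zip(samples, preset)]
    g = [abs((e[i + 1] - e[i]) / dt) for i in range(len(e) - 1)]
    h = [min(gi, m) for gi in g]
    first = sum(1 for hi in h if hi != m)   # duration where H(t) != m
    second = len(h) - first                 # duration where H(t) == m
    return first / (first + second)

# Identical waveforms: the difference is constantly zero, so the fit degree is 1.0
preset_wave = [0.0, 0.5, 1.0, 0.5, 0.0]
print(fit_degree(preset_wave, preset_wave, dt=0.01, m=0.2))  # 1.0
```

Intuitively, the longer the difference between the two waveforms changes more slowly than the error parameter m allows, the higher the fit degree, and a fit degree above the preset fit threshold marks the sound as electronic.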
In one embodiment, after the step S3 of determining whether the first sound data is an electronic sound and determining whether the second sound data is an electronic sound according to a preset electronic sound determination method, the method includes:
s31, if only one of the first sound data and the second sound data is an electronic sound, marking the first sound data or the second sound data which is the electronic sound as suspect sound data, and marking an object end corresponding to the suspect sound data as a suspect object end;
s32, establishing a communication channel to connect the suspect object end with a preset response robot;
s33, recording the conversation content of the suspected object end and a preset response robot, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a suspected emotion fluctuation value of the suspected object end;
s34, judging whether the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value or not;
and S35, if the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value, judging that the suspected object end is an abnormal call object.
As described above, the determination of the abnormal call object when only a single suspect object end exists is realized. If only one of the first sound data and the second sound data is an electronic sound, the call content cannot be acquired by connecting the first object end to the second object end. Instead, the present application uses a preset answering robot to converse with the suspect object end, thereby acquiring the call content. Whether the suspect object end is a natural person or a robot, the call is conducted with the answering robot, so the privacy of the user ends is not leaked. Whether the suspect emotion fluctuation value is smaller than the preset emotion fluctuation threshold value is then judged from the call content, and if it is, the suspect object end is determined to be an abnormal call object.
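The marking rule of step S31 can be sketched as follows; the return labels are hypothetical names for illustration only.

```python
def mark_suspect(first_is_electronic, second_is_electronic):
    """Step S31: when exactly one of the two sound data is an electronic
    sound, the corresponding side is marked as the suspect object end."""
    if first_is_electronic == second_is_electronic:
        return None  # neither or both electronic: handled by other branches
    return "first" if first_is_electronic else "second"

print(mark_suspect(True, False))   # first object end becomes the suspect
print(mark_suspect(False, False))  # no single suspect
```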
In one embodiment, before the step S33 of recording call contents between the suspected object end and a preset response robot, and inputting the call contents into a preset emotion fluctuation recognition model for processing, to obtain a suspected emotion fluctuation value of the suspected object end, the method includes:
s321, inputting a stimulating sound in a communication channel by using the response robot, wherein the stimulating sound comprises noise, a sound with a volume larger than a preset volume threshold value or a sound with a frequency higher than a preset frequency threshold value;
and S322, generating a call record instruction, wherein the call record instruction is used for instructing to record call contents of the suspected object end and a preset response robot, and the call contents at least comprise a reply of the suspected object end to the stimulus sound.
As described above, the method of inputting a stimulus sound is realized, which improves the accuracy of the judgment. If the object end is a natural person who has not noticed that the call partner is a robot, or whose self-control is strong, its emotion fluctuation may remain small, causing the abnormal call object to be misjudged. By having the answering robot input a stimulus sound into the call channel, a natural person can be provoked into a stress response (such as screaming) that is difficult to suppress, whereas a robot is unaffected by the stimulus. This raises the emotion fluctuation value of a natural person and thus improves the accuracy of judging abnormal call objects.
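Step S321 can be illustrated by synthesising a tone above the frequency threshold; the sample rate, amplitude, and 2 kHz threshold below are assumed example values, not parameters specified by the present application.

```python
import math

def stimulus_tone(freq_hz, duration_s, sample_rate=8000, amplitude=0.9):
    """Sketch of step S321: synthesise a pure tone whose frequency exceeds
    the preset frequency threshold, to be fed into the call channel."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

FREQ_THRESHOLD_HZ = 2000            # assumed preset frequency threshold
tone = stimulus_tone(3000, 0.5)     # 3 kHz tone, above the assumed threshold
print(len(tone))                    # 4000 samples at 8 kHz
```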
In one embodiment, the step S5 of inputting the call content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object side and a second emotion fluctuation value of the second object side includes:
s501, separating a first voice fragment set of the first object end and a second voice fragment set of the second object end from the call content;
s502, collecting first sound characteristic data of the first voice fragment set and second sound characteristic data of the second voice fragment set;
s503, calculating, according to the formula: emotion fluctuation value = (maximum value of the sound characteristic data - minimum value of the sound characteristic data)/average value of the sound characteristic data, a first emotion fluctuation value corresponding to the first object end and a second emotion fluctuation value corresponding to the second object end.
As described above, the call content is input into a preset emotion fluctuation recognition model for processing, so that the first emotion fluctuation value of the first object end and the second emotion fluctuation value of the second object end are obtained. The sound characteristic data can be any suitable data, such as pitch, volume, or speech rate. If the emotion is steady, the sound characteristic data during the call stays within a certain range; if the emotion is excited, the sound characteristic data changes greatly. Accordingly, the first emotion fluctuation value corresponding to the first object end and the second emotion fluctuation value corresponding to the second object end are calculated according to the formula: emotion fluctuation value = (maximum value of the sound characteristic data - minimum value of the sound characteristic data)/average value of the sound characteristic data.
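The formula of step S503 can be sketched directly; the volume readings below are invented example values, assuming volume is the chosen sound characteristic.

```python
def emotion_fluctuation(feature_values):
    """Emotion fluctuation value = (max - min) / mean of the sound
    characteristic data (e.g. per-segment volume) of one object end."""
    mean = sum(feature_values) / len(feature_values)
    return (max(feature_values) - min(feature_values)) / mean

calm = [60, 62, 61, 63]        # assumed volume readings (dB), steady speech
agitated = [55, 80, 48, 90]    # large swings -> much higher fluctuation value
print(round(emotion_fluctuation(calm), 3))      # 0.049
print(round(emotion_fluctuation(agitated), 3))  # 0.615
```

Dividing the range by the mean makes the value dimensionless, so the same threshold can be applied regardless of the absolute scale of the chosen feature.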
In one embodiment, before the step S1 of obtaining a first call voice and obtaining a second call voice, where the first call voice is the call voice of a first user end and a first object end and the second call voice is the call voice of a second user end and a second object end, the method includes:
s01, acquiring the telephone numbers and the telephone number activation time of the first object terminal and the second object terminal;
s02, judging whether the telephone numbers of the first object terminal and the second object terminal belong to a preset abnormal database;
s03, if the telephone numbers of the first object end and the second object end do not belong to a preset abnormal database, judging whether the activation time of the telephone numbers is later than a preset time point;
and S04, if the activation time of the telephone number is later than a preset time point, generating a call voice acquisition instruction, wherein the call voice acquisition instruction is used for instructing to acquire a first call voice and a second call voice.
As described above, a preliminary judgment is adopted to identify abnormal call ends. Generally speaking, if a telephone number has been marked and stored in a preset abnormal database, the number has frequently been used for abnormal calls such as malicious promotion or fraud, and it can be directly determined to be an abnormal call object; otherwise, whether it is an abnormal call object cannot yet be determined, and further judgment is needed. If the activation time of the telephone number is later than the preset time point, the number belongs to a newly registered user that lacks sufficient call record information and may be an abnormal call object, so further identification is needed: a call voice acquisition instruction is generated, which instructs acquisition of the first call voice and the second call voice.
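The pre-screening of steps S01-S04 can be sketched as follows; the database contents, cutoff date, and return labels are hypothetical examples introduced for illustration.

```python
from datetime import datetime

ABNORMAL_DB = {"+8613800000001"}     # hypothetical preset abnormal database
CUTOFF = datetime(2019, 1, 1)        # assumed preset time point

def needs_voice_check(number, activation_time):
    """Steps S01-S04: numbers already in the abnormal database are abnormal
    outright; newly activated numbers lack call records, so a call voice
    acquisition instruction is generated for them."""
    if number in ABNORMAL_DB:
        return "abnormal"                # directly judged abnormal
    if activation_time > CUTOFF:
        return "acquire_call_voice"      # new user: needs further identification
    return "pass"

print(needs_voice_check("+8613800000001", datetime(2018, 5, 1)))  # abnormal
print(needs_voice_check("+8613800000002", datetime(2019, 6, 1)))  # acquire_call_voice
```

This cheap filter lets the server skip voice acquisition entirely for established numbers, reserving the heavier electronic-sound and emotion-fluctuation analysis for new or unknown ones.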
According to the method for judging the abnormal call object, a first call voice is obtained, and a second call voice is obtained; extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively; if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed, and the communication channel is used for connecting the first object end and the second object end; recording the conversation content of the first object end and the second object end, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end; and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects. Therefore, the accuracy of judging the abnormal call object is improved.
Referring to fig. 2, an apparatus for determining an abnormal call target according to an embodiment of the present application is applied to a server, and includes:
a call voice acquiring unit 10, configured to acquire a first call voice, which is a call voice of a first user terminal and a first object terminal, and acquire a second call voice, which is a call voice of a second user terminal and a second object terminal;
a sound data extraction unit 20, configured to extract first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively according to a preset sound data extraction method;
an electronic sound determination unit 30, configured to determine whether the first sound data is an electronic sound according to a preset electronic sound determination method, and determine whether the second sound data is an electronic sound;
a communication channel constructing unit 40, configured to construct a communication channel if the first sound data and the second sound data are both electronic sounds, where the communication channel is used to connect the first object end and the second object end;
a call content recording unit 50, configured to record call content of the first object side and the second object side, and input the call content into a preset emotion fluctuation recognition model for processing, so as to obtain a first emotion fluctuation value of the first object side and a second emotion fluctuation value of the second object side;
an emotion fluctuation threshold value determination unit 60 configured to determine whether the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value;
an abnormal call object determining unit 70, configured to determine that the first object end and the second object end are both abnormal call objects if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value.
The operations executed by the above units correspond to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, and are not described herein again.
In one embodiment, the sound data extracting unit 20 includes:
the voiceprint feature acquisition subunit is configured to acquire a first voiceprint feature corresponding to a pre-stored first user side and acquire a second voiceprint feature corresponding to a pre-stored second user side;
a first sound data obtaining subunit, configured to perform clustering processing on the first call voice according to a preset speaker clustering technique, so as to obtain two first voice sets with different voiceprint characteristics, and mark the first voice set that does not conform to the first voiceprint characteristics as first sound data of the first object end;
a second sound data obtaining subunit, configured to perform clustering processing on the second call voice according to a preset speaker clustering technique, so as to obtain two second voice sets with different voiceprint characteristics, and mark the second voice set that does not conform to the second voiceprint characteristics as second sound data of the second object end;
a sound data extraction subunit operable to extract the first sound data and the second sound data.
The operations executed by the sub-units respectively correspond to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, and are not described herein again.
In one embodiment, the electronic sound determination unit 30 includes:
an expression function F (t) generation subunit, configured to generate, according to the first sound data, an expression function F (t) of a waveform corresponding to the first sound data;
a function H (t) obtaining subunit, configured to, according to the formula:
H(t) = min(G(t), m), wherein
G(t) = |dE(t)/dt|,
obtain a function H(t), wherein F0(t) is the expression function of the waveform diagram of a preset electronic sound, E(t) = F(t) - F0(t) is the difference function of the function F(t) and the function F0(t), G(t) is the derivative of the difference function with respect to time, t is time, and m is a preset error parameter value greater than 0;
a fit degree value calculating subunit, configured to acquire a first time length on the time axis during which the function H (t) is not equal to m and a second time length during which it is equal to m, calculate, according to the formula: fit degree value = first time length/(first time length + second time length), the fit degree value, and judge whether the fit degree value is greater than a preset fit threshold value;
and an electronic sound judging subunit, configured to judge that the first sound data is an electronic sound if the fit degree value is greater than the preset fit threshold value.
The operations executed by the sub-units respectively correspond to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, and are not described herein again.
In one embodiment, the apparatus comprises:
a suspect sound marking unit, configured to mark, if only one of the first sound data and the second sound data is an electronic sound, whichever of the two is the electronic sound as suspect sound data, and mark the object end corresponding to the suspect sound data as the suspect object end;
a call channel construction unit for constructing a call channel to connect the suspect target end with a preset answering robot;
the suspected emotion fluctuation value acquisition unit is used for recording conversation contents of the suspected object end and a preset response robot, inputting the conversation contents into a preset emotion fluctuation identification model for processing, and obtaining a suspected emotion fluctuation value of the suspected object end;
the suspected emotion fluctuation value judgment unit is used for judging whether the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value or not;
and the suspected object side determining unit is used for determining that the suspected object side is an abnormal call object if the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value.
The operations executed by the above units correspond to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, and are not described herein again.
In one embodiment, the apparatus comprises:
a stimulus sound input unit for inputting a stimulus sound in a communication channel by using the answering robot, wherein the stimulus sound comprises noise, a sound with a volume greater than a preset volume threshold value, or a sound with a frequency greater than a preset frequency threshold value;
and the call record instruction generating unit is used for generating a call record instruction, wherein the call record instruction is used for indicating and recording the call content of the suspected object end and a preset response robot, and the call content at least comprises the reply of the suspected object end to the stimulation sound.
The operations executed by the above units correspond to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, which is not described herein again.
In one embodiment, the call content recording unit 50 includes:
a voice fragment set acquiring subunit, configured to separate, from the call content, a first voice fragment set of the first object end and a second voice fragment set of the second object end;
a sound feature data acquisition subunit, configured to acquire first sound feature data of the first voice fragment set and second sound feature data of the second voice fragment set;
an emotion fluctuation value calculating subunit, configured to calculate, according to the formula: emotion fluctuation value = (maximum value of the sound characteristic data - minimum value of the sound characteristic data)/average value of the sound characteristic data, a first emotion fluctuation value corresponding to the first object end and a second emotion fluctuation value corresponding to the second object end.
The sub-units are respectively used for executing operations corresponding to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, and are not described herein again.
In one embodiment, the apparatus comprises:
a telephone number obtaining unit, configured to obtain a telephone number and a telephone number activation time of the first object side and the second object side;
an abnormal database judging unit, configured to judge whether the telephone numbers of the first object side and the second object side both belong to a preset abnormal database;
the time point judging unit is used for judging whether the activation time of the telephone number is later than a preset time point if the telephone numbers of the first object end and the second object end do not belong to a preset abnormal database;
and the call voice acquisition instruction generating unit is used for generating a call voice acquisition instruction if the activation time of the telephone number is later than a preset time point, wherein the call voice acquisition instruction is used for indicating to acquire a first call voice and a second call voice.
The operations executed by the above units correspond to the steps of the method for determining an abnormal call object in the foregoing embodiment one by one, and are not described herein again.
The judging device for the abnormal call object acquires a first call voice and a second call voice; extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively; if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed, and the communication channel is used for connecting the first object end and the second object end; recording the conversation content of the first object end and the second object end, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end; and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects. Therefore, the accuracy of judging the abnormal call object is improved.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in the figure. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the data used by the method for judging an abnormal call object. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the method for judging an abnormal call object.
The processor executes the method for determining the abnormal call object, wherein the steps of the method are in one-to-one correspondence with the steps of the method for determining the abnormal call object in the foregoing embodiment, and are not described herein again.
It will be appreciated by those skilled in the art that the architecture shown in the figures is merely a block diagram of some of the structures associated with the embodiments of the present application and is not intended to limit the scope of the present application.
The computer equipment acquires a first call voice and a second call voice; extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively; if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed and used for connecting the first object end and the second object end; recording the conversation contents of the first object terminal and the second object terminal, and inputting the conversation contents into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object terminal and a second emotion fluctuation value of the second object terminal; and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects. Therefore, the accuracy of judging the abnormal call object is improved.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored thereon, and when the computer program is executed by a processor, the method for determining an abnormal call object is implemented, where steps included in the method correspond to steps of the method for determining an abnormal call object in the foregoing embodiment one to one, and are not described herein again.
The computer-readable storage medium of the application acquires a first call voice and acquires a second call voice; extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively; if the first sound data and the second sound data are both electronic sounds, a communication channel is constructed and used for connecting the first object end and the second object end; recording the conversation content of the first object end and the second object end, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end; and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects. Therefore, the accuracy of judging the abnormal call object is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, apparatus, article, or method. Without further limitation, an element preceded by the phrase "comprising a" or "comprising an" does not exclude the presence of additional identical elements in the process, apparatus, article, or method that comprises the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for judging an abnormal call object is applied to a server and is characterized by comprising the following steps:
acquiring a first call voice and a second call voice, wherein the first call voice is the call voice of a first user end and a first object end, and the second call voice is the call voice of a second user end and a second object end;
according to a preset sound data extraction method, extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice respectively;
judging whether the first sound data is electronic sound or not according to a preset electronic sound judging method, and judging whether the second sound data is electronic sound or not;
if the first sound data and the second sound data are both electronic sounds, constructing a communication channel for connecting the first object end and the second object end;
recording the conversation content of the first object end and the second object end, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end;
judging whether the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value;
and if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value, judging that the first object end and the second object end are both abnormal call objects.
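The decision logic of claim 1 reduces to a simple conjunction once the per-end electronic-sound flags and emotion fluctuation values have been computed. A minimal Python sketch follows; the threshold value and all names are hypothetical placeholders, not taken from the patent:

```python
# Illustrative sketch of the final decision step of claim 1.
PRESET_EMOTION_THRESHOLD = 0.5  # hypothetical preset emotion fluctuation threshold

def judge_abnormal_pair(first_fluct, second_fluct,
                        first_is_electronic, second_is_electronic):
    """Both object ends are judged abnormal call objects only when both
    sounds are electronic AND both emotion fluctuation values are below
    the preset threshold."""
    if not (first_is_electronic and second_is_electronic):
        return False  # at least one side is not an electronic sound
    return (first_fluct < PRESET_EMOTION_THRESHOLD
            and second_fluct < PRESET_EMOTION_THRESHOLD)

print(judge_abnormal_pair(0.1, 0.2, True, True))   # True: both electronic, both flat
print(judge_abnormal_pair(0.1, 0.8, True, True))   # False: second side fluctuates
```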
2. The method for judging an abnormal call object according to claim 1, wherein the step of extracting first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice, respectively, according to a preset sound data extraction method comprises:
acquiring a first voiceprint characteristic corresponding to a pre-stored first user side and a second voiceprint characteristic corresponding to a pre-stored second user side;
clustering the first call voice according to a preset speaker clustering technology to obtain two first voice sets with different voiceprint characteristics, and recording the first voice set which does not accord with the first voiceprint characteristics as first voice data of the first object end;
clustering the second call voice according to a preset speaker clustering technology to obtain two second voice sets with different voiceprint characteristics, and recording the second voice set which does not accord with the second voiceprint characteristics as second sound data of the second object end;
the first sound data and the second sound data are extracted.
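The selection step of claim 2 keeps whichever clustered voice set does not match the pre-stored user voiceprint. A sketch under assumptions: voiceprints are embedding vectors, and cosine similarity stands in for the unspecified voiceprint comparison; all names are hypothetical:

```python
import numpy as np

def pick_object_cluster(user_voiceprint, cluster_a, cluster_b):
    """Return whichever clustered voice set is LESS similar to the
    pre-stored user voiceprint; that set is taken as the object end's
    sound data. Inputs are embedding vectors (an assumption)."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    if cosine(user_voiceprint, cluster_a) < cosine(user_voiceprint, cluster_b):
        return cluster_a
    return cluster_b

# The cluster orthogonal to the user's voiceprint is selected as the object end.
user = np.array([1.0, 0.0])
print(pick_object_cluster(user, np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # [0. 1.]
```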
3. The method for judging an abnormal call object according to claim 1, wherein the step of judging whether the first sound data is electronic sound according to a preset electronic sound judging method comprises:
generating an expression function f(t) of the waveform corresponding to the first sound data;
according to the formula:
H(t) = min(G(t), m), wherein
G(t) = |dE(t)/dt|,
E(t) = F(t) - f(t),
obtaining the function H(t), wherein F(t) is the expression function of the waveform of the preset electronic sound, E(t) is the difference function of the function F(t) and the function f(t), G(t) is the absolute value of the derivative of the difference function with respect to time, t is time, and m is a preset error parameter value greater than 0;
acquiring a first time length on the time axis during which the function H(t) is not equal to m and a second time length during which it is equal to m, and according to the formula: fit degree value = first time length/(first time length + second time length), calculating the fit degree value, and judging whether the fit degree value is greater than a preset fit threshold value;
and if the fit degree value is greater than the preset fit threshold value, judging that the first sound data is electronic sound.
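On uniformly sampled waveforms, the fit-degree test of claim 3 can be approximated numerically as below. This is a sketch, not the patented implementation: the sample spacing, default parameter values, and the discrete gradient used for the time derivative are all assumptions made for illustration.

```python
import numpy as np

def is_electronic_sound(f_samples, F_samples, m=0.05, fit_threshold=0.9, dt=1.0):
    """Numerical sketch of claim 3.
    f_samples: sampled waveform f(t) of the sound under test.
    F_samples: sampled waveform F(t) of the preset electronic sound.
    m: preset error parameter (> 0); fit_threshold: preset fit threshold.
    """
    E = np.asarray(F_samples, float) - np.asarray(f_samples, float)  # E(t) = F(t) - f(t)
    G = np.abs(np.gradient(E, dt))     # |dE/dt|, discrete approximation
    H = np.minimum(G, m)               # H(t) = min(G(t), m)
    first = np.count_nonzero(H != m)   # duration where the waveforms track each other
    second = np.count_nonzero(H == m)  # duration where they diverge
    fit = first / (first + second)     # fit degree value
    return fit > fit_threshold

t = np.linspace(0.0, 1.0, 100)
wave = np.sin(2.0 * np.pi * 5.0 * t)
print(is_electronic_sound(wave, wave))   # True: identical waveforms fit perfectly
print(is_electronic_sound(wave, -wave))  # False: inverted waveform diverges
```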
4. The method for judging an abnormal call object according to claim 1, wherein after the step of judging whether the first sound data is electronic sound and judging whether the second sound data is electronic sound according to a preset electronic sound judging method, the method further comprises:
if only one of the first sound data and the second sound data is an electronic sound, marking the first sound data or the second sound data which is the electronic sound as suspected sound data, and marking an object end corresponding to the suspected sound data as a suspected object end;
establishing a communication channel to connect the suspected object end with a preset answering robot;
recording the conversation content of the suspected object end and a preset answering robot, and inputting the conversation content into a preset emotion fluctuation recognition model for processing to obtain a suspected emotion fluctuation value of the suspected object end;
judging whether the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value or not;
and if the suspected emotion fluctuation value is smaller than a preset emotion fluctuation threshold value, judging that the suspected object end is an abnormal call object.
5. The method for judging an abnormal call object according to claim 4, wherein before the step of recording the call content between the suspected object end and a preset answering robot and inputting the call content into a preset emotion fluctuation recognition model for processing to obtain the suspected emotion fluctuation value of the suspected object end, the method comprises:
inputting a stimulation sound into the communication channel by using the answering robot, wherein the stimulation sound comprises noise, sound with a volume greater than a preset volume threshold value, or sound with a frequency higher than a preset frequency threshold value;
and generating a call record instruction, wherein the call record instruction is used for instructing the recording of the call content between the suspected object end and the preset answering robot, and the call content at least comprises the reply of the suspected object end to the stimulation sound.
6. The method for judging an abnormal call object according to claim 1, wherein the step of inputting the call content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object end and a second emotion fluctuation value of the second object end comprises:
separating a first voice fragment set of the first object end and a second voice fragment set of the second object end from the call content;
collecting first sound characteristic data of the first voice fragment set and second sound characteristic data of the second voice fragment set;
according to the formula: the emotion fluctuation value = (the maximum value of the sound characteristic data-the minimum value of the sound characteristic data)/the average value of the sound characteristic data, and a first emotion fluctuation value corresponding to the first object end and a second emotion fluctuation value corresponding to the second object end are obtained through calculation.
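The fluctuation formula of claim 6 is straightforward to compute once the sound characteristic data have been collected. In the sketch below, the example values (per-segment pitch in Hz) are hypothetical:

```python
def emotion_fluctuation(characteristic_values):
    """Emotion fluctuation value per the formula in claim 6:
    (max - min) / mean of the sound characteristic data."""
    if not characteristic_values:
        raise ValueError("sound characteristic data must be non-empty")
    mean = sum(characteristic_values) / len(characteristic_values)
    return (max(characteristic_values) - min(characteristic_values)) / mean

# Hypothetical per-segment pitch values (Hz) for one object end:
print(emotion_fluctuation([200.0, 220.0, 180.0]))  # (220 - 180) / 200 = 0.2
```

A flat electronic voice yields nearly identical characteristic values across segments, so its fluctuation value stays near zero and falls below the preset threshold.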
7. The method for judging an abnormal call object according to claim 1, wherein before the step of acquiring a first call voice and a second call voice, the method comprises:
acquiring the telephone numbers and the telephone number activation time of the first object end and the second object end;
judging whether the telephone numbers of the first object terminal and the second object terminal both belong to a preset abnormal database;
if the telephone numbers of the first object end and the second object end do not belong to a preset abnormal database, judging whether the activation time of the telephone numbers is later than a preset time point;
and if the activation time of the telephone number is later than a preset time point, generating a call voice acquisition instruction, wherein the call voice acquisition instruction is used for instructing to acquire a first call voice and a second call voice.
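The pre-check of claim 7 gates voice acquisition on two number-level conditions. A sketch follows, with a hypothetical abnormal database and cutoff time point:

```python
from datetime import datetime

PRESET_ABNORMAL_DB = {"+8613800000000"}   # hypothetical preset abnormal database
PRESET_TIME_POINT = datetime(2019, 1, 1)  # hypothetical preset time point

def should_acquire_call_voices(number_a, number_b, activated_a, activated_b):
    """Generate a call voice acquisition instruction only when neither
    number is already in the abnormal database AND both numbers were
    activated later than the preset time point (recently activated,
    unclassified numbers are the case worth checking further)."""
    if number_a in PRESET_ABNORMAL_DB or number_b in PRESET_ABNORMAL_DB:
        return False  # already classified as abnormal; skip acquisition
    return activated_a > PRESET_TIME_POINT and activated_b > PRESET_TIME_POINT
```

Numbers already in the abnormal database need no further analysis, so the acquisition instruction is generated only for the unclassified, recently activated case.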
8. An apparatus for judging an abnormal call object, applied to a server, characterized by comprising:
the device comprises a call voice acquisition unit and a call voice processing unit, wherein the call voice acquisition unit is used for acquiring a first call voice and a second call voice, the first call voice is the call voice of a first user end and a first object end, and the second call voice is the call voice of a second user end and a second object end;
a sound data extraction unit, configured to extract first sound data of the first object end and second sound data of the second object end from the first call voice and the second call voice, respectively, according to a preset sound data extraction method;
the electronic sound judging unit is used for judging whether the first sound data is electronic sound or not according to a preset electronic sound judging method and judging whether the second sound data is electronic sound or not;
a communication channel constructing unit, configured to construct a communication channel if the first sound data and the second sound data are both electronic sounds, where the communication channel is used to connect the first object end and the second object end;
the call content recording unit is used for recording the call content of the first object terminal and the second object terminal, and inputting the call content into a preset emotion fluctuation recognition model for processing to obtain a first emotion fluctuation value of the first object terminal and a second emotion fluctuation value of the second object terminal;
the emotion fluctuation threshold value judging unit is used for judging whether the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value;
and the abnormal call object judging unit is used for judging that the first object end and the second object end are abnormal call objects if the first emotion fluctuation value and the second emotion fluctuation value are both smaller than a preset emotion fluctuation threshold value.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201910882722.7A 2019-09-18 2019-09-18 Method and device for judging abnormal call object, computer equipment and storage medium Active CN110769425B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910882722.7A CN110769425B (en) 2019-09-18 2019-09-18 Method and device for judging abnormal call object, computer equipment and storage medium
PCT/CN2019/116342 WO2021051504A1 (en) 2019-09-18 2019-11-07 Method for identifying abnormal call party, device, computer apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910882722.7A CN110769425B (en) 2019-09-18 2019-09-18 Method and device for judging abnormal call object, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110769425A CN110769425A (en) 2020-02-07
CN110769425B true CN110769425B (en) 2022-11-04

Family

ID=69330162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910882722.7A Active CN110769425B (en) 2019-09-18 2019-09-18 Method and device for judging abnormal call object, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110769425B (en)
WO (1) WO2021051504A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112637428A (en) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 Invalid call judgment method and device, computer equipment and storage medium
CN112735431B (en) * 2020-12-29 2023-12-22 三星电子(中国)研发中心 Model training method and device and artificial intelligent dialogue recognition method and device
CN114512144B (en) * 2022-01-28 2024-05-17 中国人民公安大学 Method, device, medium and equipment for identifying malicious voice information
CN116886819B (en) * 2023-08-07 2024-02-02 云南电网有限责任公司 Multi-dimensional telephone traffic data monitoring method, device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106714178A (en) * 2015-07-24 2017-05-24 中兴通讯股份有限公司 Abnormal call judgment method and device
CN106919821A (en) * 2015-12-25 2017-07-04 阿里巴巴集团控股有限公司 User authentication method and device
CN107799120A (en) * 2017-11-10 2018-03-13 北京康力优蓝机器人科技有限公司 Service robot identifies awakening method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108619B2 (en) * 2013-12-19 2018-10-23 Gracenote, Inc. Station library creaton for a media service
CN108280089A (en) * 2017-01-06 2018-07-13 阿里巴巴集团控股有限公司 Identify the method and apparatus sent a telegram here extremely
CN107154996B (en) * 2017-06-30 2020-02-11 Oppo广东移动通信有限公司 Incoming call interception method and device, storage medium and terminal
CN109493882A (en) * 2018-11-04 2019-03-19 国家计算机网络与信息安全管理中心 A kind of fraudulent call voice automatic marking system and method


Also Published As

Publication number Publication date
WO2021051504A1 (en) 2021-03-25
CN110769425A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN110769425B (en) Method and device for judging abnormal call object, computer equipment and storage medium
CN109451188B (en) Method and device for differential self-help response, computer equipment and storage medium
KR102250460B1 (en) Methods, devices and systems for building user glottal models
CN109450850B (en) Identity authentication method, identity authentication device, computer equipment and storage medium
JP2023511104A (en) A Robust Spoofing Detection System Using Deep Residual Neural Networks
CN109036435B (en) Identity authentication and identification method based on voiceprint information
JP2019532354A (en) End-to-end speaker recognition using deep neural networks
CN109326058A (en) Identification check method, apparatus, terminal and readable medium based on wisdom automatic teller machine
CN111858892B (en) Voice interaction method, device, equipment and medium based on knowledge graph
CN111883140A (en) Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
KR20200004826A (en) Voice conversation based context acquisition method and device
CN111105782A (en) Session interaction processing method and device, computer equipment and storage medium
CN107864121A (en) User ID authentication method and application server
CN111179936B (en) Call recording monitoring method
CN113744742A (en) Role identification method, device and system in conversation scene
KR20190119521A (en) Electronic apparatus and operation method thereof
CN113571096A (en) Speech emotion classification model training method and device, computer equipment and medium
CN113191787A (en) Telecommunication data processing method and device, electronic equipment and storage medium
CN110556114B (en) Speaker identification method and device based on attention mechanism
CN112597889A (en) Emotion processing method and device based on artificial intelligence
CN107154996B (en) Incoming call interception method and device, storage medium and terminal
CN114023331A (en) Method, device, equipment and storage medium for detecting performance of voiceprint recognition system
CN108877768B (en) Method and device for identifying stationary telephone prompt tone and computer equipment
CN110459209B (en) Voice recognition method, device, equipment and storage medium
CN113452847A (en) Crank call identification method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant