CN117354419A - Voice telephone recognition processing method and device, electronic equipment and storage medium

Info

Publication number
CN117354419A
Authority
CN
China
Prior art keywords
voiceprint
voice
telephone
calling
recognition processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311310806.6A
Other languages
Chinese (zh)
Inventor
田波
田春平
吴永梅
文颢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202311310806.6A
Publication of CN117354419A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10 Architectures or entities
    • H04L65/1016 IP multimedia subsystem [IMS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Technology Law (AREA)
  • Computer Security & Cryptography (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice telephone recognition processing method and device, electronic equipment and a storage medium. The method comprises: acquiring voice traffic of an IMS core network, and parsing the voice traffic to obtain voice data, the voice data comprising a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet; restoring a voice file from the calling call traffic packets, and extracting voiceprint features from the voice file to obtain a full-scale main voiceprint, the full-scale main voiceprint comprising a plurality of voiceprint features; obtaining a voiceprint black sample, and matching it against the full-scale main voiceprint to obtain a target voiceprint feature; and performing service suspension and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature. The invention can accurately perform voice telephone recognition processing, helps reduce telephone fraud cases through the resulting countermeasures, and can be widely applied in the technical field of data processing.

Description

Voice telephone recognition processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for voice phone recognition processing, an electronic device, and a storage medium.
Background
Telecom and network fraud is mostly carried out through voice calls. However, fraudsters often rely on black- and gray-market groups that provide call termination schemes remotely. In particular, fraudsters use cloud SIM technology (a scenario in which the device and the SIM card are separated): dedicated equipment such as GoIP gateways and SIM banks separates the mobile phone card from the access network, which creates many obstacles for operators trying to identify fraudulent voice calls.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the related art to some extent. Therefore, the invention provides a voice telephone recognition processing method, a voice telephone recognition processing device, electronic equipment and a storage medium, which can accurately perform voice telephone recognition processing.
In one aspect, an embodiment of the present invention provides a voice telephone recognition processing method, including:
acquiring voice traffic of an IMS core network, and parsing the voice traffic to obtain voice data; the voice data comprises a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet;
restoring a voice file from the calling call traffic packets, and extracting voiceprint features from the voice file to obtain a full-scale main voiceprint; the full-scale main voiceprint comprises a plurality of voiceprint features;
obtaining a voiceprint black sample, and matching the voiceprint black sample against the full-scale main voiceprint to obtain a target voiceprint feature;
and performing service suspension and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature.
Optionally, acquiring voice traffic of the IMS core network includes:
aggregating, through a session border controller, the optically split traffic of each optical link of the IMS core network, and obtaining, through a loaded policy, all traffic of the Gm interface and of the RTP protocol of the session border controller as the voice traffic.
Optionally, the voice traffic includes Gm interface traffic and RTP protocol traffic; analyzing the voice flow to obtain voice data, including:
parsing the RTP protocol traffic, and extracting the calling call traffic packets based on the source IP address;
parsing the signaling-plane Session Initiation Protocol (SIP) messages in the Gm interface traffic to obtain the call time, the calling mobile phone number, the called mobile phone number and the call duration, and collating these to obtain the telephone information corresponding to each calling call traffic packet.
Optionally, extracting voiceprint features of the voice file to obtain a full-scale main voiceprint, including:
inputting the voice file into a preset sound processing model for voiceprint feature extraction to obtain a voiceprint feature;
and collating the voiceprint features corresponding to all voice files to obtain the full-scale main voiceprint.
Optionally, matching the target voiceprint features from the full-scale main voiceprint according to the voiceprint black samples includes:
identifying each voiceprint feature in the full-scale main voiceprint one by one according to the voiceprint black sample through a distortion judgment criterion;
and obtaining the target voiceprint characteristic based on the recognition result of the distortion judgment criterion.
Optionally, the telephone information includes a calling mobile phone number and a called mobile phone number; performing service suspension and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature includes:
suspending service for the calling mobile phone number according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature, and sending early-warning information to the called mobile phone number.
Optionally, the method further comprises:
acquiring network access information of the target object according to the calling mobile phone number in combination with operator signaling messages.
In another aspect, an embodiment of the present invention provides a voice telephone recognition processing device, including:
a first module, configured to acquire voice traffic of an IMS core network and parse the voice traffic to obtain voice data; the voice data comprises a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet;
a second module, configured to restore a voice file from the calling call traffic packets and extract voiceprint features from the voice file to obtain a full-scale main voiceprint; the full-scale main voiceprint comprises a plurality of voiceprint features;
a third module, configured to obtain a voiceprint black sample and match the voiceprint black sample against the full-scale main voiceprint to obtain a target voiceprint feature;
and a fourth module, configured to perform service suspension and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature.
Optionally, the first module is specifically configured to:
aggregate, through a session border controller, the optically split traffic of each optical link of the IMS core network, and obtain, through a loaded policy, all traffic of the Gm interface and of the RTP protocol of the session border controller as the voice traffic.
Optionally, the voice traffic includes Gm interface traffic and RTP protocol traffic; the first module is further specifically configured to:
parse the RTP protocol traffic, and extract the calling call traffic packets based on the source IP address;
parse the signaling-plane Session Initiation Protocol (SIP) messages in the Gm interface traffic to obtain the call time, the calling mobile phone number, the called mobile phone number and the call duration, and collate these to obtain the telephone information corresponding to each calling call traffic packet.
Optionally, the second module is specifically configured to:
input the voice file into a preset sound processing model for voiceprint feature extraction to obtain a voiceprint feature;
and collate the voiceprint features corresponding to all voice files to obtain the full-scale main voiceprint.
Optionally, the third module is specifically configured to:
identifying each voiceprint feature in the full-scale main voiceprint one by one according to the voiceprint black sample through a distortion judgment criterion;
and obtaining the target voiceprint characteristic based on the recognition result of the distortion judgment criterion.
Optionally, the telephone information includes a calling mobile phone number and a called mobile phone number; the fourth module is specifically configured to:
suspend service for the calling mobile phone number according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature, and send early-warning information to the called mobile phone number.
Optionally, the apparatus further comprises:
a fifth module, configured to acquire network access information of the target object according to the calling mobile phone number in combination with operator signaling messages.
In another aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory; the memory is used for storing programs; the processor executes the program to realize the voice telephone recognition processing method.
In another aspect, an embodiment of the present invention provides a computer storage medium in which a processor-executable program is stored, which when executed by a processor is configured to implement the above-described voice telephone recognition processing method.
According to the embodiment of the invention, voice traffic of the IMS core network is acquired and parsed to obtain voice data; the voice data comprises a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet; a voice file is restored from the calling call traffic packets, and voiceprint features are extracted from the voice file to obtain a full-scale main voiceprint; the full-scale main voiceprint comprises a plurality of voiceprint features; a voiceprint black sample is obtained and matched against the full-scale main voiceprint to obtain a target voiceprint feature; and service suspension and early-warning processing is performed according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature. The embodiment of the invention thus indirectly verifies the identity of a caller through voiceprint feature matching, so that corresponding countermeasures can be taken.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate and do not limit the invention.
FIG. 1 is a schematic diagram of an implementation environment for performing voice telephone recognition processing according to an embodiment of the present invention;
FIG. 2 is a flow chart of a voice telephone recognition processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of parsing voice traffic of an IMS core network to obtain voice data according to an embodiment of the present invention;
FIG. 4 is a flow chart of a voice telephone recognition processing method combined with determining network access information according to an embodiment of the present invention;
FIG. 5 is an overall flow chart of a voice telephone recognition processing method according to an embodiment of the present invention;
FIG. 6 is a schematic flow diagram of voiceprint feature extraction and matching according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a voice telephone recognition processing device according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a system architecture for voice telephone recognition processing according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 10 is a block diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that although functional block diagrams are depicted as block diagrams, and logical sequences are shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the block diagrams in the system. The terms first/S100, second/S200, and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
It can be understood that the voice telephone recognition processing method provided by the embodiment of the invention can be applied to any computer device with data processing and computing capabilities, and the computer device can be any of various terminals or servers. When the computer device in the embodiment is a server, the server is an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Alternatively, the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like, but is not limited thereto.
FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of the invention. Referring to FIG. 1, the implementation environment includes at least one terminal 102 and a server 101. The terminal 102 and the server 101 can be connected through a network in a wireless or wired manner to transmit and exchange data.
The server 101 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In addition, server 101 may also be a node server in a blockchain network. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like.
The terminal 102 may be, but is not limited to, a smart phone, tablet, notebook, desktop, smart box, smart watch, etc. The terminal 102 and the server 101 may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present invention.
Based on the implementation environment shown in FIG. 1, the embodiment of the present invention provides a voice telephone recognition processing method. The following description takes as an example the case in which the method is applied to the server 101; it will be understood that the method may also be applied to the terminal 102.
Referring to fig. 2, fig. 2 is a flowchart of a voice phone recognition processing method applied to a server according to an embodiment of the present invention, where an execution body of the voice phone recognition processing method may be any one of the foregoing computer devices (including a server or a terminal). Referring to fig. 2, the method includes the steps of:
S100, acquiring voice traffic of an IMS core network, and parsing the voice traffic to obtain voice data;
It should be noted that the voice data includes a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet. In some embodiments, acquiring voice traffic of an IMS core network includes: aggregating, through a session border controller, the optically split traffic of each optical link of the IMS core network, and obtaining, through a loaded policy, all traffic of the Gm interface and of the RTP protocol of the session border controller as the voice traffic.
In some embodiments, IMS voice traffic acquisition is performed first: link splitting is implemented by installing an optical splitter and an optical amplifier on the optical link on one side of the SBC network element of the IMS core network; the corresponding system consists of a main-link optical splitter, a replica-link optical amplifier and a replica-link optical splitter. The main-link optical splitter is a 1:2 splitter with a split ratio of 2:8 between its branches, and the replica-link optical amplifier regenerates and amplifies the signal. IMS voice traffic aggregation is then performed: the split traffic is collected by the aggregation and distribution equipment, and all signaling-plane SIP traffic and all media-plane RTP traffic of the Gm interface of the SBC network element are obtained through a loaded policy. YD/T1980 specifies the definitions of the M-series interfaces and the Gm interface in the IMS of mobile communication networks. The Real-time Transport Protocol (RTP) is a network transport protocol published in 1996 by the IETF's multimedia transport working group as RFC 1889.
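The "loaded policy" that keeps only the SBC's SIP signaling and RTP media can be pictured as a packet classifier. The following is a minimal sketch under stated assumptions, not the patent's implementation: the `Packet` record, the `classify` function, the set of SBC addresses and the use of port 5060 for SIP are all hypothetical choices made for illustration (a real Gm interface may use other, negotiated ports), and the RTP test is only a crude heuristic based on the version bits of the fixed 12-byte header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical record for one mirrored packet (illustration only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    transport: str  # "udp" or "tcp"
    payload: bytes


def classify(pkt, sbc_ips, sip_ports=frozenset({5060})):
    """Return "sip", "rtp", or None for traffic the policy drops.

    Assumptions: sbc_ips is the set of SBC access-side addresses and
    sip_ports the signaling ports in use; both are illustrative.
    """
    # Keep only traffic that enters or leaves the SBC.
    if pkt.src_ip not in sbc_ips and pkt.dst_ip not in sbc_ips:
        return None
    # Signaling plane: anything on a configured SIP port.
    if pkt.src_port in sip_ports or pkt.dst_port in sip_ports:
        return "sip"
    # Media plane (heuristic): UDP, at least a 12-byte header, version 2.
    if pkt.transport == "udp" and len(pkt.payload) >= 12 and pkt.payload[0] >> 6 == 2:
        return "rtp"
    return None
```

A production policy would more likely learn the media ports from the SDP bodies carried in the SIP signaling than guess from header bits; the heuristic here only keeps the sketch short.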
It should be further noted that IMS (IP Multimedia Subsystem) is an IP multimedia system and a new form of multimedia service that can meet end customers' demand for newer and more diversified multimedia services. IMS is regarded as a core technology of the next-generation network, and as an important way to achieve fixed-mobile convergence and to introduce differentiated services such as triple play of voice, data and video. However, most IMS networks worldwide are still at an early stage, and the industry is still discussing how to apply them. IMS is a network architecture proposed by Lucent for realizing a broad convergence scheme for the next-generation network (NGN); the advanced nature of the Lucent IMS convergence solution derives from a number of patented service-enhancement-layer technologies, which are Bell Labs innovations in key areas of IMS. IMS solutions have many advantages over softswitch solutions and play an increasingly important role in the NGN market. By 2003, international standards organizations had generally converged on IMS as the core standard for NGN network, business and technical innovation. In terms of the technology itself, IMS is mature enough for large-scale commercial deployment. IMS can not only deliver basic VoIP services but also manage network, user and application resources more effectively and make the network more intelligent, so that users enjoy a converged communication experience across different networks and terminals. As a communication architecture, IMS creates a new telecommunications business model and expands the development space of the whole information industry. IMS is essentially a network architecture. The technology is rooted in the mobile domain and was originally defined by 3GPP for mobile networks, whereas under the NGN framework IMS should support both fixed and mobile access.
A session border controller (SBC) is an IP service gateway in VoIP communications, commonly deployed in operator IMS networks and enterprise VoIP; an SBC can act as both a VoIP session signaling proxy and a media proxy. As an important network element in NGN/IMS systems, the SBC addresses NAT traversal, security, QoS, interworking and similar problems in operator call services. As operators extend IMS services to enterprises and enterprise VoIP services grow richer (for example converged communication and call centers), SBCs are increasingly used in enterprise VoIP networks as well. An SBC is a SIP-based VoIP session control product that can be deployed at key nodes of a VoIP network; by parsing and processing the session signaling (SIP) and media (RTP) on both sides of the node, it provides signaling interworking, flow control, NAT traversal, call encryption and decryption, interception of unauthorized access, quality of service (QoS) and related functions. SBCs help enterprises and VoIP service providers establish seamless, secure and high-quality VoIP connections efficiently and reliably, and play a key role in enterprise VoIP network access (IMS), resolving interworking anomalies, protecting communication security and extending the network architecture.
In some embodiments, as shown in FIG. 3, the voice traffic includes Gm interface traffic and RTP protocol traffic; parsing the voice traffic to obtain voice data includes: S101, parsing the RTP protocol traffic, and extracting the calling call traffic packets based on the source IP address; S102, parsing the signaling-plane Session Initiation Protocol (SIP) messages in the Gm interface traffic to obtain the call time, the calling mobile phone number, the called mobile phone number and the call duration, and collating these to obtain the telephone information corresponding to each calling call traffic packet.
In some embodiments, the signaling plane SIP protocol in the Gm interface traffic obtained in the previous step is parsed to obtain call records including call time, calling phone number, called phone number, call duration, etc.; and then the calling call flow packet is extracted based on the source IP address by analyzing the media plane RTP protocol.
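Extraction of the calling party's media can be sketched as parsing the fixed RTP header and keeping only datagrams whose source IP is the caller's. This is an illustrative sketch, not the patent's code: the function names and the `(src_ip, datagram)` input shape are assumptions, and how the caller's IP is learned (for example from the signaling) is outside the sketch. The header layout follows the RTP specification: a 12-byte fixed header, optional CSRC entries, an optional header extension, and optional padding.

```python
import struct


def parse_rtp(datagram: bytes):
    """Parse one RTP datagram; return a dict of fields, or None if malformed."""
    if len(datagram) < 12:
        return None
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", datagram[:12])
    if b0 >> 6 != 2:                      # RTP version must be 2
        return None
    offset = 12 + 4 * (b0 & 0x0F)         # skip CSRC identifiers
    if b0 & 0x10:                         # header extension present
        if len(datagram) < offset + 4:
            return None
        (ext_words,) = struct.unpack("!H", datagram[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    if offset > len(datagram):
        return None
    payload = datagram[offset:]
    if b0 & 0x20:                         # padding: last byte holds the pad count
        if not payload or payload[-1] == 0 or payload[-1] > len(payload):
            return None
        payload = payload[:-payload[-1]]
    return {"payload_type": b1 & 0x7F, "seq": seq, "timestamp": timestamp,
            "ssrc": ssrc, "payload": payload}


def extract_calling_packets(datagrams, calling_ip):
    """Keep parsed RTP packets whose source IP equals the caller's IP.

    datagrams: iterable of (src_ip, raw_udp_payload) pairs (assumed shape).
    """
    packets = []
    for src_ip, raw in datagrams:
        if src_ip != calling_ip:
            continue
        parsed = parse_rtp(raw)
        if parsed is not None:
            packets.append(parsed)
    return packets
```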
SIP (Session Initiation Protocol) is a multimedia communication protocol formulated by the IETF (Internet Engineering Task Force). It is a text-based application-layer control protocol for creating, modifying and releasing sessions with one or more participants. SIP is an IP voice session control protocol that originated on the Internet, and it is flexible, easy to implement and easy to extend.
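Because SIP is text-based, the calling and called numbers can be read from the headers of an INVITE request. The sketch below is a simplified illustration under stated assumptions: `parse_invite` is a hypothetical helper, it reads only the `From`/`To`/`Call-ID` headers (and their compact forms `f`, `t`, `i`), ignores header line folding, and takes the digits after a `sip:`, `sips:` or `tel:` scheme as the number. An IMS deployment may carry the asserted caller identity in other headers, and call time and duration would come from packet timestamps of the INVITE, answer and BYE messages, which this sketch does not cover.

```python
import re

# Digits (with optional leading "+") that follow a sip:, sips: or tel: scheme.
_URI_USER = re.compile(r"(?:sips?|tel):(\+?[0-9]+)")


def parse_invite(message: str):
    """Extract calling number, called number and Call-ID from a SIP INVITE.

    Returns None when the message is not an INVITE request.
    """
    head = message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    if not lines[0].startswith("INVITE "):
        return None
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence of each header, names lower-cased.
            headers.setdefault(name.strip().lower(), value.strip())

    def number(*names):
        for header_name in names:
            match = _URI_USER.search(headers.get(header_name, ""))
            if match:
                return match.group(1)
        return None

    return {
        "calling": number("from", "f"),
        "called": number("to", "t"),
        "call_id": headers.get("call-id") or headers.get("i"),
    }
```

The Call-ID is kept because it gives a natural key for pairing each call record with the media packets of the same call.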
S200, restoring a voice file from the calling call traffic packets, and extracting voiceprint features from the voice file to obtain a full-scale main voiceprint;
It should be noted that the full-scale main voiceprint includes a plurality of voiceprint features. In some embodiments, extracting voiceprint features from the voice file to obtain the full-scale main voiceprint includes: inputting the voice file into a preset sound processing model for voiceprint feature extraction to obtain a voiceprint feature; and collating the voiceprint features corresponding to all voice files to obtain the full-scale main voiceprint.
In some embodiments, the traffic packets are restored to a voice file and voiceprint features are extracted. Illustratively, the calling call traffic packets obtained in the previous step are restored into a voice file in AWB format. Voiceprint features are then extracted from the voice file through feature extraction, model training and similar processes.
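Restoring an AWB file from RTP payloads can be sketched as follows. The source does not say how the restoration is done, so this is a sketch under several loudly stated assumptions: the media is AMR-WB carried in octet-aligned RTP payload mode with exactly one speech frame per packet, sequence numbers do not wrap around within the call, and lost packets are simply skipped. Under those assumptions each payload is one CMR byte, one table-of-contents (ToC) byte and the frame data, and the stored frame header is the ToC byte with its follow bit and padding bits cleared. The function names are hypothetical.

```python
AMR_WB_MAGIC = b"#!AMR-WB\n"  # magic number of a single-channel AMR-WB file


def rtp_payload_to_storage_frame(payload: bytes) -> bytes:
    """Convert one octet-aligned, single-frame AMR-WB RTP payload to a file frame."""
    if len(payload) < 2:
        raise ValueError("payload too short for CMR and ToC bytes")
    toc = payload[1]
    if toc & 0x80:
        # Follow bit set: more than one frame in this packet (not handled here).
        raise ValueError("multi-frame payloads are outside this sketch")
    # Keep frame type and quality bits, clear the follow bit and padding bits.
    return bytes([toc & 0x7C]) + payload[2:]


def restore_awb(packets) -> bytes:
    """Build AWB file contents from (sequence_number, payload) pairs.

    Duplicates are dropped and frames are ordered by sequence number.
    """
    by_seq = {}
    for seq, payload in packets:
        by_seq.setdefault(seq, payload)
    frames = [rtp_payload_to_storage_frame(by_seq[seq]) for seq in sorted(by_seq)]
    return AMR_WB_MAGIC + b"".join(frames)
```

The returned bytes can be written to a file opened in binary mode. Bandwidth-efficient payload mode, which packs the same fields at bit rather than byte granularity, would need a bit-level unpacker instead.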
S300, obtaining a voiceprint black sample, and matching the voiceprint black sample against the full-scale main voiceprint to obtain a target voiceprint feature;
it should be noted that, in some embodiments, the matching the target voiceprint feature from the full-scale main voiceprint according to the voiceprint black sample includes: identifying each voiceprint feature in the full-scale main voiceprint one by one according to the voiceprint black sample through a distortion judgment criterion; and obtaining the target voiceprint characteristic based on the recognition result of the distortion judgment criterion.
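The one-by-one comparison can be sketched with Euclidean distance as the distortion criterion. This is an illustration only: the source does not fix a criterion or a threshold at this point, so the fixed-length feature vectors, the `threshold` value and the dictionary shapes (call identifier to feature vector, sample identifier to feature vector) are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_black_samples(main_voiceprint, black_samples, threshold):
    """Compare every feature in the full-scale main voiceprint with every black sample.

    Returns a list of (call_id, sample_id, distance) for each call whose
    nearest black sample lies within the (assumed) distance threshold.
    """
    hits = []
    for call_id, feature in main_voiceprint.items():
        best_id, best_dist = None, None
        for sample_id, sample in black_samples.items():
            dist = euclidean(feature, sample)
            if best_dist is None or dist < best_dist:
                best_id, best_dist = sample_id, dist
        if best_dist is not None and best_dist <= threshold:
            hits.append((call_id, best_id, best_dist))
    return hits
```

The exhaustive loop costs one distance computation per (call, sample) pair; a deployment handling core-network volumes would likely need an index or batching, which this sketch leaves out.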
In some embodiments, based on the case-related fraud calls periodically reported by the supervisory authority, the call record in the telephone information is used as an index to locate the corresponding calling call traffic packet, and the calling-party voiceprint feature of that call is extracted (following the same extraction principle as step S200) as the voiceprint black sample of the target person (target object), thereby building a voiceprint black sample library. Other fraud calls are then identified by matching against the target person's voiceprint black samples: the voiceprint black sample library is matched against the full-scale main voiceprint, and the call records that are hit are extracted.
In some embodiments, for the overall flow of the voiceprint feature extraction related to step S200 and the voiceprint matching related to step S300, the implementation flow may be implemented by a voice recognition technology, for example, as follows:
Voice recognition technology enables intelligent devices to understand human speech. It is an interdisciplinary science involving digital signal processing, artificial intelligence, linguistics, mathematical statistics, acoustics, the study of emotion, psychology and other fields. The technology supports many applications such as automated customer service, automatic speech translation, command control and voice verification codes. In recent years, with the rise of artificial intelligence, speech recognition has made major breakthroughs in theory and application, has begun to move from the laboratory to the market, and has gradually entered daily life. Speech recognition is now used in many fields, mainly including speech dictation, voice paging and answering platforms, autonomous advertising platforms and intelligent customer service.
Principle of speech recognition:
In essence, speech recognition is pattern recognition based on speech feature parameters: through learning, the system classifies input speech according to certain patterns and then finds the best matching result according to a decision criterion. Currently, the pattern matching principle is applied in most speech recognition systems.
General pattern recognition comprises basic modules such as preprocessing, feature extraction, and pattern matching. The input speech is first preprocessed; preprocessing includes framing, windowing, pre-emphasis, etc. Feature extraction follows, so the selection of suitable feature parameters is particularly important. Common feature parameters include: pitch period, formants, short-time average energy or amplitude, linear prediction coefficients (LPC), perceptual linear prediction coefficients (PLP), short-time average zero-crossing rate, linear prediction cepstral coefficients (LPCC), autocorrelation functions, Mel-frequency cepstral coefficients (MFCC), wavelet transform coefficients, empirical mode decomposition coefficients (EMD), gammatone frequency cepstral coefficients (GFCC), and the like. During actual recognition, templates are generated for the test speech in the same way as in the training process, and recognition is finally performed according to a distortion judgment criterion. Common distortion judgment criteria include Euclidean distance, covariance matrix, and Bayesian distance. After the voiceprint features are extracted, training and recognition can be performed on them through an audio event recognition algorithm such as a Gaussian Mixture Model (GMM) or a Support Vector Machine (SVM).
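The preprocessing steps named above (pre-emphasis, framing, windowing) and two of the listed feature parameters (short-time average energy and short-time average zero-crossing rate) can be sketched in plain Python. This is only an illustrative sketch: the pre-emphasis coefficient 0.97, the 16 kHz sampling rate, the 25 ms frame with 10 ms hop, and the Hamming window are common choices assumed here, not values given in the source.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts high frequencies."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames, dropping the tail."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def short_time_energy(frame):
    """Average squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Synthetic 200 Hz tone standing in for decoded call audio.
sample_rate = 16000
signal = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]

emphasized = pre_emphasis(signal)
frames = frame_signal(emphasized, frame_len=400, hop=160)  # 25 ms / 10 ms
window = hamming(400)
windowed = [[x * w for x, w in zip(frame, window)] for frame in frames]
features = [(short_time_energy(f), zero_crossing_rate(f)) for f in windowed]
print(len(frames), features[0])
```

Richer parameters such as MFCC would be computed per windowed frame in the same way, typically with a signal-processing library rather than by hand.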
S400, performing shutdown and early warning processing according to the telephone information paired with the calling call flow packet corresponding to the target voiceprint feature.
It should be noted that the telephone information includes a calling mobile phone number and a called mobile phone number. In some embodiments, step S400 may include: shutting down the calling mobile phone number according to the telephone information paired with the calling call flow packet corresponding to the target voiceprint feature, and sending early warning information to the called mobile phone number.
In some embodiments, the shutdown treatment may be performed for the calling number, i.e. the fraud number, in the call record identified in the foregoing step, and the early warning may be performed for the called number, i.e. the victim number.
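As a rough sketch of how step S400 could turn hit calls into actions, the snippet below maps hit call identifiers to the calling numbers to shut down and the called numbers to warn. The `CallRecord` structure, the `plan_actions` function, and the phone numbers are hypothetical placeholders; how shutdown and warning are actually delivered is operator-specific and not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    call_id: str
    caller: str  # calling mobile phone number (suspected fraud number)
    callee: str  # called mobile phone number (potential victim)


def plan_actions(hit_call_ids, records):
    """Return (numbers_to_shut_down, numbers_to_warn) for the hit calls."""
    by_id = {record.call_id: record for record in records}
    to_shut_down, to_warn = set(), set()
    for call_id in hit_call_ids:
        record = by_id.get(call_id)
        if record is None:
            continue  # no paired telephone information for this packet
        to_shut_down.add(record.caller)
        to_warn.add(record.callee)
    return sorted(to_shut_down), sorted(to_warn)


# Placeholder records; the numbers are made up for illustration.
records = [
    CallRecord("call-001", "13800000001", "13900000001"),
    CallRecord("call-002", "13800000002", "13900000002"),
    CallRecord("call-003", "13800000001", "13900000003"),
]
shut_down, warn = plan_actions(["call-001", "call-003"], records)
print(shut_down, warn)
```

Using sets means a fraud number that hits on several calls is shut down once, while every distinct called number still receives a warning.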
In some embodiments, as shown in fig. 4, the method may further include: S500, obtaining network access information of the target object according to the calling mobile phone number in combination with operator signaling messages. Operations such as identification and auxiliary positioning of the target object can then be performed based on the network access information.
In some embodiments, the location of the GOIP fraud site may be further obtained: the network access information of the mobile phone number used by the target person can be obtained from that number together with operator signaling messages. The network access information mainly includes the ECGI (E-UTRAN Cell Global Identifier) and the like; the ECGI can be resolved to a specific geographic cell, thereby locating the GOIP fraud site.
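To illustrate how an ECGI identifies a cell, the sketch below splits it into its standard parts: the PLMN identity (MCC and MNC) and the 28-bit E-UTRAN Cell Identifier, which for a macro eNB consists of a 20-bit eNB ID followed by an 8-bit cell ID. The textual form `MCC-MNC-ECIhex`, the function names, and the sample value are assumptions made for illustration; mapping the result to geographic coordinates would need the operator's own cell database, which is not shown.

```python
def split_eci(eci):
    """Split a 28-bit ECI into (eNB ID, cell ID), assuming a macro eNB."""
    if not 0 <= eci < (1 << 28):
        raise ValueError("ECI must fit in 28 bits")
    return eci >> 8, eci & 0xFF


def parse_ecgi(ecgi):
    """Parse a hypothetical 'MCC-MNC-ECIhex' string into its fields."""
    mcc, mnc, eci_hex = ecgi.split("-")
    enb_id, cell_id = split_eci(int(eci_hex, 16))
    return {"mcc": mcc, "mnc": mnc, "enb_id": enb_id, "cell_id": cell_id}


# Placeholder ECGI value, not taken from any real network.
cell = parse_ecgi("460-11-0ABCDE1")
print(cell)
```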
For the purpose of illustrating the principles of the present invention in detail, the following general flow chart of the present invention is described in connection with certain specific embodiments, and it is to be understood that the following is illustrative of the principles of the present invention and is not to be construed as limiting the present invention.
In some embodiments, as shown in fig. 5, the overall flow of the voice phone recognition processing according to the embodiment of the present invention is as follows:
Step one: IMS voice traffic acquisition: link splitting is implemented by deploying an optical splitter and an optical amplifier on the optical link on one side of the SBC network element of the IMS core network; the corresponding system consists of a main-link optical splitter, a replica-link optical amplifier, and a replica-link optical splitter. The main-link optical splitter is a 1:2 splitter with a branch split ratio of 2:8, and the replica-link optical amplifier regenerates and amplifies the signal.
Step two: IMS voice traffic aggregation: the split traffic from step one is collected by aggregation and distribution equipment, and all signaling-plane SIP traffic and all media-plane RTP traffic of the Gm interface of the SBC network element are obtained by loading a policy.
Step three: IMS voice traffic analysis: the signaling-plane SIP protocol in the Gm interface traffic obtained in step two is parsed to obtain call records, including call time, calling mobile phone number, called mobile phone number, call duration, etc.; and the media-plane RTP protocol is parsed to extract the calling call traffic packets based on the source IP address.
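The RTP side of step three can be sketched as below: parse the fixed RTP header defined in RFC 3550 and keep only packets whose source IP address is the caller's. The input format (an iterable of `(source_ip, udp_payload)` pairs already taken from the capture), the helper names, and the sample addresses are assumptions; SIP parsing and capture handling are outside this sketch.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(data):
    """Parse one RTP packet (RFC 3550 fixed header, CSRCs, extension, padding)."""
    if len(data) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)  # skip the CSRC list
    if b0 & 0x10:  # header extension present
        if len(data) < offset + 4:
            raise ValueError("truncated header extension")
        _, ext_words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if b0 & 0x20:  # padding: the last octet gives the padding length
        pad = data[-1]
        if pad == 0 or offset + pad > end:
            raise ValueError("invalid padding length")
        end -= pad
    if offset > end:
        raise ValueError("truncated packet")
    return RtpPacket(b1 & 0x7F, seq, ts, ssrc, data[offset:end])


def caller_stream(packets, caller_ip):
    """Keep packets sent from caller_ip, ordered by sequence number.

    Sequence-number wraparound is ignored in this simplified sketch.
    """
    parsed = [parse_rtp(udp) for src_ip, udp in packets if src_ip == caller_ip]
    return sorted(parsed, key=lambda p: p.sequence)


def make_packet(seq, payload):
    """Build a minimal test packet (dynamic payload type 96)."""
    return struct.pack("!BBHII", 0x80, 96, seq, seq * 320, 0x1234ABCD) + payload


captured = [
    ("192.0.2.10", make_packet(2, b"second")),
    ("198.51.100.7", make_packet(1, b"callee")),
    ("192.0.2.10", make_packet(1, b"first")),
]
stream = caller_stream(captured, "192.0.2.10")
print([p.payload for p in stream])
```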
Step four: restoring the traffic packets into a voice file and extracting voiceprint features: the calling call traffic packets obtained in step three are restored into a voice file in AWB format. As shown in the "model training" part of fig. 6, voiceprint features are then extracted from the voice file through feature extraction, model training, and so on.
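One possible way to restore RTP payloads into an AWB file is sketched below, following the AMR-WB storage format of RFC 4867: a `#!AMR-WB\n` magic header followed by frames, each preceded by a one-octet frame header. The sketch assumes the octet-aligned RTP payload format without interleaving or CRC; the source does not say which payload format the network uses, and bandwidth-efficient payloads would need bit-level unpacking that is not shown. Function names are hypothetical.

```python
AMR_WB_MAGIC = b"#!AMR-WB\n"

# Speech-data octets per frame type (FT) for AMR-WB; 9 is SID,
# 14 (speech lost) and 15 (no data) carry no speech octets.
FRAME_BYTES = {0: 17, 1: 23, 2: 32, 3: 36, 4: 40, 5: 46,
               6: 50, 7: 58, 8: 60, 9: 5, 14: 0, 15: 0}


def payload_to_storage_frames(payload):
    """Convert one octet-aligned AMR-WB RTP payload to storage-format frames."""
    if len(payload) < 2:
        raise ValueError("payload too short")
    pos = 1  # skip the CMR octet
    toc = []
    while True:  # table of contents: one octet per frame, F bit = more follow
        if pos >= len(payload):
            raise ValueError("truncated table of contents")
        entry = payload[pos]
        pos += 1
        toc.append(entry)
        if not entry & 0x80:
            break
    out = bytearray()
    for entry in toc:
        frame_type = (entry >> 3) & 0x0F
        if frame_type not in FRAME_BYTES:
            raise ValueError("reserved frame type")
        size = FRAME_BYTES[frame_type]
        if pos + size > len(payload):
            raise ValueError("truncated frame data")
        out.append(entry & 0x7C)  # keep FT and Q, clear F and padding bits
        out += payload[pos:pos + size]
        pos += size
    return bytes(out)


def build_awb(payloads):
    """Concatenate the magic header and all converted frames."""
    return AMR_WB_MAGIC + b"".join(payload_to_storage_frames(p) for p in payloads)


# One dummy 12.65 kbit/s frame (FT=2, Q=1) with all-zero speech data.
dummy_payload = bytes([0xF0, 0x14]) + bytes(32)
awb_bytes = build_awb([dummy_payload])
print(len(awb_bytes))
```

The resulting bytes can be written to a file with an `.awb` extension; lost packets would additionally need "no data" frames inserted to keep timing, which this sketch omits.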
Step five: building the target person voiceprint black sample library: based on the case-related fraud calls periodically notified by the supervision unit, the corresponding call records are looked up in the call record table established in step three, and the calling-party voiceprint features of those calls are extracted to form the target person voiceprint black sample library.
Step six: identifying other fraud-related calls by matching against the target person voiceprint black sample library: as shown in the "model identification" part of fig. 6, the target person voiceprint black sample library constructed in step five is matched again against the full-scale main voiceprint obtained in step four, and the hit call records are extracted. The calling number in a hit call record, i.e. the fraud number, is shut down, and an early warning is issued to the called number, i.e. the victim number.
Step seven: acquiring the GOIP fraud site location: the network access information of the mobile phone number used by the target person can be obtained from that number together with operator signaling messages. The network access information mainly includes the ECGI (E-UTRAN Cell Global Identifier) and the like; the ECGI can be resolved to a specific geographic cell, thereby locating the GOIP fraud site.
Step eight: identifying the fraud call type: based on the target call records obtained in step six, the corresponding voice files are looked up and converted to text using ASR technology; keywords are then extracted using NLP technology and matched against keywords of fraud call types (such as credit, bank card, verification code, loan, refund, etc.), thereby identifying the fraud type.
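The keyword-matching part of step eight can be sketched as follows, taking as input a transcript that an ASR system has already produced. The grouping of the example keywords into named fraud types and the simple substring counting are assumptions for illustration; the source does not specify the type taxonomy or the NLP method, and a real system would likely use proper word segmentation and keyword extraction.

```python
# Hypothetical mapping from fraud type to keywords; the keywords come from
# the examples in the text, the grouping is assumed.
FRAUD_KEYWORDS = {
    "loan_fraud": ["loan", "credit"],
    "card_fraud": ["bank card", "verification code"],
    "refund_fraud": ["refund"],
}


def classify_fraud_type(transcript, keyword_map=FRAUD_KEYWORDS):
    """Return fraud types whose keywords occur, most frequent first."""
    text = transcript.lower()
    scores = {}
    for fraud_type, keywords in keyword_map.items():
        count = sum(text.count(keyword) for keyword in keywords)
        if count:
            scores[fraud_type] = count
    # Sort by descending hit count, then by name for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))


transcript = (
    "Your loan is approved, please tell me the verification code "
    "sent to your bank card."
)
fraud_types = classify_fraud_type(transcript)
print(fraud_types)
```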
In summary, the invention addresses the following problem: target persons engaged in telephone fraud use special equipment such as GOIP and SIMBANK together with cloud SIM technology to place calls through the operator's IMS voice core network, which causes the operator's anti-fraud models to identify fraud mobile phone numbers with low accuracy. The invention adopts voiceprint feature matching. First, the full set of calling call traffic packets in the IMS voice core network is collected using traffic acquisition technology; the traffic packets are then restored into playable voice files using traffic restoration technology, and voiceprint features are extracted from the voice files. Meanwhile, based on fraud calls that have already occurred and been reported to the operator by the supervision department, the voiceprint features of those fraud calls are looked up and marked as belonging to telephone fraud target persons. The target persons' voiceprint features are then monitored in the IMS voice core network: a call that hits such a voiceprint is regarded as a fraud call, and measures such as cutting off the fraud call and warning the victim number can be taken, thereby effectively reducing the occurrence of telephone fraud cases. Voiceprint feature matching is a voice-based biometric technology that can be used to verify the identity of a speaker. Its principle is to identify and distinguish speakers by characteristics such as the frequency, intonation, and speed of speech. Voiceprint feature matching is widely used, for example in user authentication and voice transfer. In an anti-fraud scheme, it may be used to verify the identity of a caller and so help prevent fraud.
In the prior art, fraud calls are identified mainly through telecom operators' big data behavior analysis models. That approach is effective when fraud calls originate from a relatively fixed location (the scenario where device and card are together), but its identification accuracy drops sharply when target persons widely adopt the cloud SIM mode (the scenario where device and card are separated). In the invention, target persons are identified in the operator's IMS voice core network by means of calling-party voiceprints. Because voiceprints are unique, the operator's data advantage can be fully exploited to identify target persons and their accomplices, and further to determine the telecommunication resources they use, such as fraud mobile phone numbers. Compared with the prior art, the invention has at least the following beneficial effects:
1. High accuracy in identifying target persons: the acoustic and behavioral characteristics of a person's speech are almost unique, and even when imitated, the most essential pronunciation characteristics and vocal tract characteristics of the speaker are difficult to change. A person's voiceprint is independent of the spoken content and language, so the target person can readily be identified even when speaking in a dialect or local language;
2. Through natural language analysis of the target person's speech, the keywords used by the target person can be extracted and the fraud technique and fraud type identified, which makes warnings and dissuasion directed at victims more targeted;
3. By obtaining the mobile phone number used by the target person and extracting the ECGI recorded when that number accesses the network, the GOIP equipment can be located with high precision, providing technical support for crackdowns by the supervision department.
On the other hand, as shown in fig. 7, an embodiment of the present invention provides a voice telephone recognition processing device 700, including: a first module 710, configured to obtain voice traffic of the IMS core network and analyze the voice traffic to obtain voice data, where the voice data comprises a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet; a second module 720, configured to restore the calling call traffic packets to obtain a voice file and extract voiceprint features of the voice file to obtain a full-scale main voiceprint, where the full-scale main voiceprint comprises a plurality of voiceprint features; a third module 730, configured to obtain a voiceprint black sample and match a target voiceprint feature from the full-scale main voiceprint according to the voiceprint black sample; and a fourth module 740, configured to perform shutdown and early warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature.
In some embodiments, the apparatus may further include: a fifth module, configured to obtain network access information of the target object according to the calling mobile phone number in combination with operator signaling messages.
In some embodiments, the apparatus of the present invention may be applied to the system architecture shown in fig. 8 to implement voice telephone recognition processing. As shown in fig. 8, the system architecture may take a browser/server (B/S) form, and the back end is designed as a distributed architecture of independent services. The back-end services communicate with each other through RESTful interfaces and RPC (remote procedure call). Development is based on multiple languages, including Java, C/C++, JavaScript, HTML, and the like. Isolation between services achieves a low-coupling design and reduces complexity.
The system architecture may comprise five major parts, namely a data acquisition layer, an association and synthesis layer, a voiceprint recognition layer, a system application layer, and system management, wherein the key modules are as follows:
Session log generation module: responsible for associating the IMS core network signaling plane with the service plane to generate voice session logs, which contain key elements such as call start time, calling and called mobile phone numbers, and call duration.
Voice restoration module: responsible for restoring the voice packets carried in the RTP protocol of the voice media plane into a playable and readable voice file; the AWB format is used by default.
Voiceprint acquisition module: used to collect and store voiceprint data of clients and to generate voiceprint models. The module needs to guarantee the security and integrity of the data to avoid data leakage and tampering.
Identity authentication module: for verifying the identity information of the client to ensure that the voiceprint characteristics of the client match with their identity information. The module needs to have a high-precision voiceprint recognition algorithm to improve the accuracy and safety of authentication.
Real-time fraud detection module: used to monitor and identify fraudulent conduct during a call in real time. The module needs to be able to identify various fraudulent activities, such as spurious promotions, information leakage, and fraudulent transactions, and to raise an alert or take appropriate action in time.
Risk assessment module: for evaluating the risk of the customer based on the voiceprint characteristics of the customer and the fraud detection results. The module needs to be able to take corresponding measures, e.g. to improve security verification, etc., according to different risk levels.
Data management and update module: used to manage and periodically update the voiceprint database so as to ensure the accuracy and reliability of the voiceprint recognition system. The module needs efficient data management and update algorithms to ensure timely updating and maintenance of the voiceprint database.
Personnel portrait module: builds a voiceprint portrait of fraudsters, profiling them comprehensively by combining analysis of their voiceprint features, gender, age, penetration region, and the like.
The content of the method embodiment of the invention is suitable for the device embodiment, the specific function of the device embodiment is the same as that of the method embodiment, and the achieved beneficial effects are the same as those of the method.
On the other hand, as shown in fig. 9, an embodiment of the present invention further provides an electronic device 900, which includes at least one processor 910 and at least one memory 920 for storing at least one program; one processor 910 and one memory 920 are taken as an example.
The processor 910 and the memory 920 may be connected by a bus or other means.
Memory 920 acts as a non-transitory computer readable storage medium that may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, memory 920 may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, the memory 920 may optionally include memory located remotely from the processor, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The above described embodiments of the electronic device are merely illustrative, wherein the units described as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In particular, FIG. 10 schematically shows a block diagram of a computer system for implementing an electronic device of an embodiment of the invention.
It should be noted that, the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present invention.
As shown in fig. 10, the computer system 1000 includes a central processing unit 1001 (Central Processing Unit, CPU) which can execute various appropriate actions and processes according to a program stored in a read-only memory 1002 (Read-Only Memory, ROM) or a program loaded from a storage section 1008 into a random access memory 1003 (Random Access Memory, RAM). The random access memory 1003 also stores various programs and data necessary for system operation. The central processing unit 1001, the read-only memory 1002, and the random access memory 1003 are connected to each other via a bus 1004. An input/output interface 1005 (i.e., an I/O interface) is also connected to the bus 1004.
The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a local area network card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the input/output interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the invention. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. The computer programs, when executed by the central processor 1001, perform the various functions defined in the system of the present invention.
It should be noted that, the computer readable medium shown in the embodiments of the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The content of the method embodiment of the invention is suitable for the system embodiment, the specific function of the system embodiment is the same as that of the method embodiment, and the achieved beneficial effects are the same as those of the method.
Another aspect of the embodiments of the present invention also provides a computer-readable storage medium storing a program that is executed by a processor to implement the foregoing method.
The content of the method embodiment of the invention is applicable to the computer readable storage medium embodiment, the functions of the computer readable storage medium embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the foregoing method.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present invention.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical device and/or software module or may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and the equivalent modifications or substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A voice telephone recognition processing method, comprising:
acquiring voice traffic of an IMS core network, and parsing the voice traffic to obtain voice data, wherein the voice data comprises a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet;
restoring a voice file from each calling call traffic packet, and performing voiceprint feature extraction on the voice file to obtain a full-scale main voiceprint, wherein the full-scale main voiceprint comprises a plurality of voiceprint features;
obtaining a voiceprint black sample, and matching the voiceprint black sample against the full-scale main voiceprint to obtain a target voiceprint feature; and
performing shutdown and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature.
2. The voice telephone recognition processing method according to claim 1, wherein the acquiring voice traffic of the IMS core network comprises:
aggregating the optical-splitter traffic of each optical link of the IMS core network through a session border controller, and acquiring, by means of a loaded policy, all Gm interface traffic and RTP (Real-time Transport Protocol) traffic of the session border controller as the voice traffic.
3. The voice telephone recognition processing method according to claim 1, wherein the voice traffic comprises Gm interface traffic and RTP protocol traffic, and the parsing the voice traffic to obtain voice data comprises:
parsing the RTP protocol traffic, and extracting the calling call traffic packets based on a source IP address; and
parsing the signaling-plane Session Initiation Protocol (SIP) messages of the Gm interface traffic to obtain the call time, the calling mobile phone number, the called mobile phone number and the call duration, and collating these to obtain the telephone information corresponding to each calling call traffic packet.
4. The voice telephone recognition processing method according to claim 1, wherein the performing voiceprint feature extraction on the voice file to obtain a full-scale main voiceprint comprises:
inputting the voice file into a preset sound processing model for voiceprint feature extraction to obtain a voiceprint feature; and
collating the voiceprint features corresponding to all the voice files to obtain the full-scale main voiceprint.
5. The voice telephone recognition processing method according to claim 1, wherein the matching the voiceprint black sample against the full-scale main voiceprint to obtain a target voiceprint feature comprises:
identifying each voiceprint feature in the full-scale main voiceprint one by one against the voiceprint black sample using a distortion judgment criterion; and
obtaining the target voiceprint feature based on the recognition result of the distortion judgment criterion.
6. The voice telephone recognition processing method according to claim 1, wherein the telephone information comprises a calling mobile phone number and a called mobile phone number, and the performing shutdown and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature comprises:
shutting down the calling mobile phone number according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature, and sending early-warning information to the called mobile phone number.
7. The voice telephone recognition processing method according to claim 6, wherein the method further comprises:
acquiring network access information of a target object according to the calling mobile phone number in combination with operator signaling messages.
8. A voice telephone recognition processing apparatus, comprising:
a first module, configured to acquire voice traffic of an IMS core network and parse the voice traffic to obtain voice data, wherein the voice data comprises a plurality of calling call traffic packets and telephone information paired with each calling call traffic packet;
a second module, configured to restore a voice file from each calling call traffic packet and perform voiceprint feature extraction on the voice file to obtain a full-scale main voiceprint, wherein the full-scale main voiceprint comprises a plurality of voiceprint features;
a third module, configured to obtain a voiceprint black sample and match the voiceprint black sample against the full-scale main voiceprint to obtain a target voiceprint feature; and
a fourth module, configured to perform shutdown and early-warning processing according to the telephone information paired with the calling call traffic packet corresponding to the target voiceprint feature.
9. An electronic device, comprising a processor and a memory, wherein:
the memory is configured to store a program; and
the processor implements the method according to any one of claims 1 to 7 by executing the program.
10. A computer storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, implements the method according to any one of claims 1 to 7.
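
The matching and handling steps recited in claims 5 and 6 can be illustrated with a minimal Python sketch. This is not the implementation disclosed in the application; it only shows the general shape of the logic under stated assumptions. The claims do not define the distortion judgment criterion, so squared Euclidean distance with a fixed threshold is used here as a stand-in. The names CallRecord, distortion, match_black_samples and plan_actions, the threshold value, and the phone numbers are all hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CallRecord:
    """Telephone information paired with one calling call traffic packet."""
    calling_number: str
    called_number: str
    voiceprint: Tuple[float, ...]  # feature vector from an unspecified sound model


def distortion(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, an ASSUMED stand-in for the distortion measure."""
    return dist(a, b) ** 2


def match_black_samples(
    records: Sequence[CallRecord],
    black_samples: Sequence[Sequence[float]],
    threshold: float,
) -> List[CallRecord]:
    """Check every voiceprint one by one against each black sample."""
    return [
        record
        for record in records
        if any(distortion(record.voiceprint, s) <= threshold for s in black_samples)
    ]


def plan_actions(matches: Sequence[CallRecord]) -> List[Tuple[str, str]]:
    """For each match, shut down the calling number and warn the called number."""
    actions: List[Tuple[str, str]] = []
    for record in matches:
        actions.append(("shutdown", record.calling_number))
        actions.append(("warn", record.called_number))
    return actions


# Hypothetical data: one record close to the black sample, one far from it.
records = [
    CallRecord("13800000001", "13900000002", (1.0, 0.1)),
    CallRecord("13800000003", "13900000004", (0.0, 1.0)),
]
black_samples = [(1.0, 0.0)]
matches = match_black_samples(records, black_samples, threshold=0.05)
actions = plan_actions(matches)
```

In this sketch only the first record falls within the threshold, so the planned actions concern its calling and called numbers alone. A real system would replace the distance function and threshold with whatever criterion the chosen voiceprint model calls for.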
CN202311310806.6A 2023-10-10 2023-10-10 Voice telephone recognition processing method and device, electronic equipment and storage medium Pending CN117354419A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311310806.6A CN117354419A (en) 2023-10-10 2023-10-10 Voice telephone recognition processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311310806.6A CN117354419A (en) 2023-10-10 2023-10-10 Voice telephone recognition processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117354419A true CN117354419A (en) 2024-01-05

Family

ID=89360670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311310806.6A Pending CN117354419A (en) 2023-10-10 2023-10-10 Voice telephone recognition processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117354419A (en)

Similar Documents

Publication Publication Date Title
US10249304B2 (en) Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
US9842590B2 (en) Face-to-face communication analysis via mono-recording system and methods
US9571652B1 (en) Enhanced diarization systems, media and methods of use
CN105027196B Fast out-of-vocabulary search in automatic speech recognition systems
US8145562B2 (en) Apparatus and method for fraud prevention
JP2023511104A (en) A Robust Spoofing Detection System Using Deep Residual Neural Networks
US8005676B2 (en) Speech analysis using statistical learning
US11715460B2 (en) Z-vectors: speaker embeddings from raw audio using sincnet, extended CNN architecture and in-network augmentation techniques
CN109873907A (en) Call processing method, device, computer equipment and storage medium
CN109192216A Voiceprint recognition training dataset simulation acquisition method and acquisition device
CN112511696A (en) System and method for identifying bad content of call center AI engine
CN116631412A (en) Method for judging voice robot through voiceprint matching
US20100172479A1 (en) Dynamically improving performance of an interactive voice response (ivr) system using a complex events processor (cep)
CN117424960A (en) Intelligent voice service method, device, terminal equipment and storage medium
Hamidi et al. Interactive voice application-based amazigh speech recognition
CN117354419A (en) Voice telephone recognition processing method and device, electronic equipment and storage medium
CN113314103B (en) Illegal information identification method and device based on real-time speech emotion analysis
US20220182485A1 (en) Method for training a spoofing detection model using biometric clustering
US20180096679A1 (en) Electronic speech recognition name directory prognostication system
Shaikh et al. Language independent on–off voice over IP source model with lognormal transitions
Suendermann et al. Crowdsourcing for industrial spoken dialog systems
US20240205330A1 (en) Source agnostic call recording and chat ingestion
Wang et al. Applying Feature Extraction of Speech Recognition on VOIP Auditing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination