CN116996613A - Audio information detection method and device, storage medium and electronic equipment - Google Patents

Audio information detection method and device, storage medium and electronic equipment

Info

Publication number
CN116996613A
Authority
CN
China
Prior art keywords
target
information
voiceprint
account
audio information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210447094.1A
Other languages
Chinese (zh)
Inventor
陈星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210447094.1A
Publication of CN116996613A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a method and a device for detecting audio information, a storage medium, and electronic equipment. The method comprises: acquiring target audio information to be detected; matching target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used to match the target voiceprint information in the target audio information against labeled voiceprint information; when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result; and transferring virtual resources corresponding to a target account when the target recognition result indicates that the target account is an invalid account. The application addresses the technical problem of low detection efficiency for audio information in the related art.

Description

Audio information detection method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computers, and in particular, to a method and apparatus for detecting audio information, a storage medium, and an electronic device.
Background
At present, the audio information of outbound voice calls is detected mainly in one of two ways: the hang-up reason is judged from the hang-up cause status code returned by the operator, or the ringing audio of each call to a user is recorded in full and recognized, and the exact hang-up reason is then determined by keyword checking or by manual review.
With the status-code approach, the hang-up cause status codes are customized by different operators in different regions, so the hang-up reason cannot be judged reliably. The detection accuracy of the audio information therefore cannot be guaranteed, which in turn reduces the detection efficiency.
With the record-and-recognize approach, the ringing audio of every call must be recorded, so the complete ringing audio must be stored and a large amount of computing resources is needed for recognition, which makes detection of the audio information inefficient.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a method and a device for detecting audio information, a storage medium and electronic equipment, which are used for at least solving the technical problem of low detection efficiency of the audio information in the related technology.
According to an aspect of an embodiment of the present application, there is provided a method for detecting audio information, including: acquiring target audio information to be detected, wherein the target audio information is audio information generated after a call is initiated to a target account; matching target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used to match the target voiceprint information in the target audio information against labeled voiceprint information; when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, wherein the semantic recognition model is used to recognize the semantics of the target audio information and the target recognition result indicates whether the target account is an invalid account; and transferring a virtual resource corresponding to the target account when the target recognition result indicates that the target account is an invalid account, wherein the virtual resource is configured to be transferable when the target account is an invalid account.
According to another aspect of the embodiment of the present application, there is also provided an apparatus for detecting audio information, including: an acquisition module, configured to acquire target audio information to be detected, wherein the target audio information is audio information generated after a call is initiated to a target account; a matching module, configured to match target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used to match the target voiceprint information in the target audio information against labeled voiceprint information; a recognition module, configured to perform semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, wherein the semantic recognition model is used to recognize the semantics of the target audio information and the target recognition result indicates whether the target account is an invalid account; and a transfer module, configured to transfer the virtual resource corresponding to the target account when the target recognition result indicates that the target account is an invalid account, wherein the virtual resource is configured to be transferable when the target account is an invalid account.
Optionally, the device is configured to obtain the target audio information to be detected by: acquiring a target account set, wherein the target account set is an account set provided by a media information publisher; selecting the target account from the target account set, and initiating a call request to the target account; and acquiring the target audio information in the process of initiating a call request to the target account.
Optionally, the device is configured to select the target account from the target account set and initiate a call request to the target account as follows: acquiring an invalid account identifier provided by the media information publisher, wherein the invalid account identifier is an identifier with which the media information publisher has marked an account in the target account set as invalid; and selecting the target account from the target account set according to the invalid account identifier and initiating a call request to the target account, wherein the target account is an account in the target account set that carries the invalid account identifier.
Optionally, the device is further configured to: when the target recognition result indicates that the target account is not an invalid account, prohibit the transfer of the virtual resource corresponding to the target account to the media information publisher.
Optionally, the device is configured to match target voiceprint information in the target audio information through a preset voiceprint matching model in the following manner: analyzing the real-time voice stream of the target audio information according to a preset time period to obtain the target voiceprint information; and inputting the target voiceprint information into the voiceprint matching model, and matching the target voiceprint information with voiceprint information marked in the voiceprint matching model through the voiceprint matching model.
Optionally, the device is configured to input the target voiceprint information into the voiceprint matching model and to match it against the voiceprint information labeled in the voiceprint matching model as follows: when the target voiceprint information matches first voiceprint information, determining a first labeling result of the first voiceprint information as the target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the first voiceprint information and the first labeling result comprises a calling-party abnormality. The device is further configured to: when the target matching result indicates that the call state of the call with the target account is a calling-party abnormality, stop the call with the target account and generate alarm information, wherein the alarm information indicates that the account currently calling the target account is abnormal.
Optionally, the device is configured to input the target voiceprint information into the voiceprint matching model and to match it against the voiceprint information labeled in the voiceprint matching model as follows: when the target voiceprint information matches second voiceprint information, determining a second labeling result of the second voiceprint information as the target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the second voiceprint information and the second labeling result comprises a called-party abnormality. The device is further configured to: when the target matching result indicates that the call state of the call with the target account is a called-party abnormality, stop the call with the target account and determine the target account to be an invalid account.
Optionally, the device is configured to perform semantic recognition on the target audio information according to a preset semantic recognition model, when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, as follows: storing the target audio information when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information; and inputting the target audio information into the semantic recognition model and performing semantic recognition on it to obtain the target recognition result, wherein the semantic recognition model is used to obtain text information corresponding to the target audio information and to perform semantic recognition on the text information.
Optionally, the device is further configured to: when the target recognition result also indicates that the call state of the call with the target account is a calling-party abnormality, stop the call with the target account and generate alarm information, wherein the alarm information indicates that the account currently calling the target account is abnormal; and when the target recognition result also indicates that the call state of the call with the target account is a called-party abnormality, stop the call with the target account and determine the target account to be an invalid account.
Optionally, the device is further configured to: when the target recognition result also indicates that the call state of the call with the target account is a calling-party abnormality, add third voiceprint information corresponding to the target audio information to the voiceprint matching model and set the labeling result of the third voiceprint information to calling-party abnormality; and when the target recognition result also indicates that the call state of the call with the target account is a called-party abnormality, add fourth voiceprint information corresponding to the target audio information to the voiceprint matching model and set the labeling result of the fourth voiceprint information to called-party abnormality.
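The feedback step described above, in which a voiceprint that had to be resolved by semantic recognition is added back to the voiceprint matching model with its label, can be sketched as follows. This is only an illustrative sketch: the class `VoiceprintStore`, the function `learn_from_semantic_result`, and the label strings are hypothetical names introduced here and are not defined by the application.

```python
from typing import List, Sequence, Tuple

# Hypothetical label values for the two labeling results named in the text.
CALLER_ABNORMAL = "calling_party_abnormal"
CALLEE_ABNORMAL = "called_party_abnormal"


class VoiceprintStore:
    """Illustrative stand-in for the labeled voiceprints held by the matching model."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Tuple[float, ...], str]] = []

    def add(self, voiceprint: Sequence[float], label: str) -> None:
        if label not in (CALLER_ABNORMAL, CALLEE_ABNORMAL):
            raise ValueError(f"unknown label: {label}")
        self._entries.append((tuple(voiceprint), label))

    def labels(self) -> List[str]:
        return [label for _, label in self._entries]


def learn_from_semantic_result(store: VoiceprintStore,
                               voiceprint: Sequence[float],
                               semantic_label: str) -> bool:
    """Add the voiceprint to the store when semantic recognition found an abnormality.

    Returns True when a new labeled voiceprint was added.
    """
    if semantic_label in (CALLER_ABNORMAL, CALLEE_ABNORMAL):
        store.add(voiceprint, semantic_label)
        return True
    return False
```

With such a store, a prompt that was recognized once by the slower semantic path could be caught by voiceprint matching on later calls.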
According to still another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described method of detecting audio information when run.
According to yet another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the detection method of the audio information as above.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device including a memory in which a computer program is stored, and a processor configured to execute the above-described audio information detection method by the computer program.
In the embodiment of the application, target audio information to be detected is acquired, wherein the target audio information is audio information generated after a call is initiated to a target account. Target voiceprint information in the target audio information is matched through a preset voiceprint matching model, wherein the voiceprint matching model is used to match the target voiceprint information in the target audio information against labeled voiceprint information. When the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, semantic recognition is performed on the target audio information according to a preset semantic recognition model to obtain a target recognition result, wherein the semantic recognition model is used to recognize the semantics of the target audio information and the target recognition result indicates whether the target account is an invalid account. When the target recognition result indicates that the target account is an invalid account, the virtual resource corresponding to the target account is transferred.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic view of an application environment of an alternative audio information detection method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative method for detecting audio information according to an embodiment of the application;
FIG. 3 is a schematic diagram of an alternative method of detecting audio information according to an embodiment of the present application;
FIG. 4 is a schematic diagram of yet another alternative method of detecting audio information according to an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative method of detecting audio information according to an embodiment of the present application;
FIG. 6 is a schematic diagram of yet another alternative method of detecting audio information according to an embodiment of the present application;
FIG. 7 is a schematic diagram of yet another alternative method of detecting audio information according to an embodiment of the present application;
FIG. 8 is a schematic structural view of an alternative audio information detecting apparatus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an alternative audio information detection product according to an embodiment of the present application;
FIG. 10 is a schematic structural view of an alternative electronic device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms that appear in the description of the embodiments of the application are explained as follows:
ASR: Automatic Speech Recognition, the conversion of speech into text.
Ringing audio: in telephone communication, the tone played to the calling party before the call is connected.
MRCP: Media Resource Control Protocol, a protocol by which a voice server provides various services to clients; it is used to control the resources that process media streams (e.g., ASR, TTS) and to define how to communicate with those resources.
Voiceprint recognition: a technique that automatically verifies a match based on voice characteristics and the content of the speech.
Freeswitch: a free, open-source software telephone switching system.
Intelligent cleaning system: a system in which the media information publishing platform uses a robot in place of human customer service to call customers on behalf of the media information publisher and, based on the user's answers, to complete a preliminary screening and scoring of the customer's intent.
The application is illustrated below with reference to examples:
According to an aspect of the embodiment of the present application, there is provided a method for detecting audio information. Optionally, in this embodiment, the method may be applied to a hardware environment composed of the server 101 and the terminal device 103 as shown in fig. 1. As shown in fig. 1, the server 101 is connected to the terminal device 103 through a network and may be used to provide services to the terminal device or to an application installed on the terminal device, which may be a video application, an instant messaging application, a browser application, an educational application, a game application, or the like. The database 105 may be provided on the server or separately from it and provides data storage services for the server 101, for example as a game data storage server. The network may include, but is not limited to, a wired network and a wireless network, wherein the wired network comprises local area networks, metropolitan area networks, and wide area networks, and the wireless network comprises Bluetooth, WIFI, and other wireless communication networks. The terminal device 103 may be a terminal configured with an application program; the application program 107 that uses the audio information detection method is displayed through the terminal device 103 or another connected display device.
As shown in fig. 1, the above method for detecting audio information may be implemented in the terminal device 103 by:
S1, acquiring target audio information to be detected on the terminal device 103, wherein the target audio information is audio information generated after a call is initiated to a target account;
S2, matching target voiceprint information in the target audio information through a preset voiceprint matching model on the terminal device 103, wherein the voiceprint matching model is used to match the target voiceprint information in the target audio information against labeled voiceprint information;
S3, when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model on the terminal device 103 to obtain a target recognition result, wherein the semantic recognition model is used to recognize the semantics of the target audio information and the target recognition result indicates whether the target account is an invalid account;
S4, when the target recognition result indicates that the target account is an invalid account, transferring a virtual resource corresponding to the target account on the terminal device 103, wherein the virtual resource is configured to be transferable when the target account is an invalid account.
Alternatively, in the present embodiment, the above-described method for detecting audio information may also be implemented by a server, for example, the server 101 shown in fig. 1; or by both the terminal device and the server.
The above is merely an example, and the present embodiment is not particularly limited.
Optionally, as an optional implementation manner, as shown in fig. 2, the method for detecting audio information includes:
S202, acquiring target audio information to be detected, wherein the target audio information is audio information generated after a call is initiated to a target account;
S204, matching target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used to match the target voiceprint information in the target audio information against labeled voiceprint information;
S206, when the voiceprint matching model finds no labeled voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, wherein the semantic recognition model is used to recognize the semantics of the target audio information and the target recognition result indicates whether the target account is an invalid account;
S208, transferring the virtual resource corresponding to the target account when the target identification result indicates that the target account is an invalid account, wherein the virtual resource is configured to be allowed to be transferred when the target account is the invalid account.
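The control flow of steps S202 to S208 can be sketched as follows. This is a minimal sketch under stated assumptions: the two models are passed in as plain callables, a voiceprint miss is represented by `None`, and all names (`DetectionResult`, `detect`, `settle`) are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DetectionResult:
    source: str            # "voiceprint" or "semantic": which stage produced the result
    account_invalid: bool  # True when the target account is judged invalid


def detect(audio: bytes,
           match_voiceprint: Callable[[bytes], Optional[bool]],
           recognize_semantics: Callable[[bytes], bool]) -> DetectionResult:
    """S204: try the voiceprint match first; S206: fall back to semantics on a miss.

    match_voiceprint is assumed to return None when no labeled voiceprint matches.
    """
    matched = match_voiceprint(audio)
    if matched is not None:
        return DetectionResult("voiceprint", matched)
    return DetectionResult("semantic", recognize_semantics(audio))


def settle(account: str,
           result: DetectionResult,
           transfer: Callable[[str], None]) -> bool:
    """S208: transfer the virtual resource only when the account is invalid."""
    if result.account_invalid:
        transfer(account)
        return True
    return False
```

The point of the ordering is that the semantic model, which is the more expensive stage, runs only when the voiceprint match cannot decide.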
Optionally, in the embodiment of the present application, the method for detecting audio information may include, but is not limited to, application scenarios in which whether the target account is valid or not needs to be determined, such as a media information publishing scenario, a customer service promotion scenario, a telemarketing scenario, a stream call information query scenario, and the like.
For example, fig. 3 is a schematic diagram of an alternative audio information detection method according to an embodiment of the present application. As shown in fig. 3, a media information publisher publishes media information 302 through a media information publishing platform, where the media information 302 is an "xx exhibition entry form" that provides the user with fields for entering information such as a name and a mobile phone number. After the user fills in the name "Zhang San" and a mobile phone number, media information 304 is obtained, and the user provides the related information of the target account to the media information publisher or the media information publishing platform through the "submit" button displayed on the media information 304.
Optionally, in the embodiment of the present application, the target audio information may include, but is not limited to, audio information generated after a call is initiated from the calling account to the target account. Specifically, it may be the audio prompt returned by the operator when the call to the target account is not connected, which may include, but is not limited to, prompts such as "in a call", "powered off", "out of service", "busy", and "number not in use", or prompts such as "your number has been suspended", "in arrears", and "service suspended".
It should be noted that, the calling account may include, but is not limited to, an account associated with an execution body of the audio information detection method, and the calling account may perform a call initiation function on the obtained target account.
For example, fig. 4 is a schematic diagram of another alternative audio information detection method according to an embodiment of the present application, as shown in fig. 4, specifically including, but not limited to, the following steps:
S1, when a media information publisher publishes media information through a media information publishing platform, the media information publishing platform receives an account set provided by the media information publisher;
S2, the media information publishing platform initiates a call to each account in the account set in sequence through the calling account;
S3, the media information publishing platform judges the validity of the called target account according to the audio information generated while the call is being initiated;
S4, when the target account is judged to be an invalid account, the virtual resources associated with the target account are transferred to the media information publisher.
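The per-account loop in the steps above can be sketched as follows. The sketch assumes that call placement, validity judgment, and resource transfer are supplied as callables; `clean_accounts`, `place_call`, `is_invalid`, and `transfer_to_publisher` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, List


def clean_accounts(accounts: Iterable[str],
                   place_call: Callable[[str], bytes],
                   is_invalid: Callable[[bytes], bool],
                   transfer_to_publisher: Callable[[str], None]) -> List[str]:
    """Call each account in sequence and settle the ones judged invalid.

    Returns the accounts that were judged invalid, in call order.
    """
    invalid: List[str] = []
    for account in accounts:                # S2: call each account in turn
        audio = place_call(account)         # audio generated while the call is initiated
        if is_invalid(audio):               # S3: judge validity from that audio
            transfer_to_publisher(account)  # S4: transfer the associated resource
            invalid.append(account)
    return invalid
```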
Optionally, in the embodiment of the present application, the preset voiceprint matching model may include, but is not limited to, a recognition model based on voiceprint recognition, a biometric technology also called speaker recognition, in which the acoustic signal is converted into an electrical signal and then recognized by a computer. A voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by an electro-acoustic instrument. Training of the voiceprint matching model may generally be accomplished by, but is not limited to, the following methods:
Template matching method: dynamic time warping (DTW) is used to align the training and test feature sequences;
Nearest neighbor method: all feature vectors are retained during training, and during recognition the K nearest training vectors are found for each vector and recognition is performed on that basis;
Neural network method: there are many forms, such as the multilayer perceptron and radial basis function (RBF) networks;
Hidden Markov model (HMM) method: typically a single-state HMM, or Gaussian mixture model (GMM), is used;
VQ clustering method (e.g., LBG): the algorithm complexity is low, and it can be combined with the HMM method to achieve a better effect;
Polynomial classifier method.
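As an illustration of the first method in the list, the following is a minimal sketch of a dynamic time warping (DTW) distance between two feature sequences. It assumes one-dimensional features and an absolute-difference local cost, which are simplifications chosen here for brevity; the application does not specify the features or the cost used.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: minimum cumulative cost of aligning sequence a with sequence b."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because the alignment may repeat frames, a test sequence that is a slower or faster rendition of a template can still reach a distance of zero, which is why DTW suits prompts played at slightly different speeds.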
It should be noted that, in the embodiment of the present application, the trained voiceprint matching model may match voiceprint information in the target audio information, and match different voiceprint information with parameters indicating a current voice call state.
For example, the labeled voiceprint information includes three voiceprints A, B, and C, where A is labeled as matching the "in a call" prompt returned by the operator, B is labeled as matching the "powered off" prompt returned by the operator, C is labeled as matching the "your phone is in arrears" prompt returned by the operator, and so on.
The voiceprint information carried in the target audio information is matched against the labeled voiceprints A, B, and C to obtain a matching result corresponding to the target voiceprint information, so that the validity of the target account can be determined directly from the target voiceprint information.
Optionally, in the embodiment of the present application, the target voiceprint information is voiceprint information extracted from the target audio information in real time through a voiceprint feature extraction model. The labeled voiceprint information means that each piece of voiceprint information in a preset voiceprint information set used for matching has been labeled in advance with a corresponding voice call state parameter. Matching the target voiceprint information in the target audio information may include, but is not limited to, matching the target voiceprint information with the voiceprint information in the preset voiceprint information set, and using the voice call state parameter of the matched voiceprint information as the voice call state parameter of the target voiceprint information, thereby determining whether the target account is a valid account according to the matched voice call state parameter, so as to facilitate execution of subsequent operations.
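The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: voiceprints are represented as plain numeric vectors, similarity is measured by cosine similarity, and a match requires a score of at least 0.95. The application does not specify the representation, the similarity measure, or the threshold, and the names `match_call_state` and `LABELED` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def match_call_state(target, labeled_set, threshold=0.95):
    """Return the call state of the best match, or None if nothing reaches the threshold."""
    best_state, best_score = None, threshold
    for vector, state in labeled_set:
        score = cosine_similarity(target, vector)
        if score >= best_score:
            best_state, best_score = state, score
    return best_state


# Preset voiceprint information set: each entry carries its call state parameter.
LABELED = [
    ([1.0, 0.0, 0.0], "in_call"),      # voiceprint A
    ([0.0, 1.0, 0.0], "powered_off"),  # voiceprint B
    ([0.0, 0.0, 1.0], "arrears"),      # voiceprint C
]
```

Returning `None` when no entry is similar enough corresponds to the case in which the voice call state parameter cannot be determined.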
Optionally, in the embodiment of the present application, the case in which the voiceprint matching model does not match labeled voiceprint information corresponding to the target voiceprint information in the target audio information may include, but is not limited to, the case in which the voice call state parameter corresponding to the target voiceprint information cannot be determined.
Optionally, in the embodiment of the present application, performing semantic recognition on the target audio information according to the preset semantic recognition model may include, but is not limited to, recording the target audio information and inputting it into the semantic recognition model to obtain a target recognition result when the voice call state parameter corresponding to the target voiceprint information cannot be determined. The semantic recognition model may include, but is not limited to, recognizing the target audio information as text information and searching a database for the recognition result corresponding to the text information. The target recognition result may include, but is not limited to, an indication of whether the target account is a valid account or an invalid account. Whether a given recognition result corresponds to a valid account or an invalid account may be preconfigured by the media information publishing platform, or may be further determined from the recognition result of the target audio information.
For example, taking application to a media information publishing platform as an example, after the platform publishes media information provided by a media information publisher, the platform compensates the publisher with virtual resources, in a certain proportion, for invalid accounts (customer numbers that are powered off, out of service, or unallocated) among the accounts collected through that media information, thereby making up for the loss of the media information publisher and encouraging the media information publisher to actively label and return invalid accounts.
Fig. 5 is a schematic diagram of yet another alternative method for detecting audio information according to an embodiment of the present application. As shown in Fig. 5, "Zhang San" and "Li Si" are accounts marked as invalid by a media information publisher. Using the above method for detecting audio information, the media information publishing platform initiates calls to "Zhang San" and "Li Si". Taking "Zhang San" as an example, the method includes, but is not limited to, the following steps:
S1, acquiring the account of "Zhang San", and initiating a call to "Zhang San";
S2, acquiring target audio information;
S3, matching, through a voiceprint matching model, target voiceprint information of the target audio information in the call initiated to "Zhang San";
S4, under the condition that a matching result is not obtained, inputting the target audio information into a semantic recognition model for recognition;
S5, under the condition that the matching result of S3 or the recognition result of S4 indicates that "Zhang San" is a valid account, marking the state of "Zhang San" as "valid", and making no compensation.
Taking "Li Si" as an example, the method includes, but is not limited to, the following steps:
S1, acquiring the account of "Li Si", and initiating a call to "Li Si";
S2, acquiring target audio information;
S3, matching, through a voiceprint matching model, target voiceprint information of the target audio information in the call initiated to "Li Si";
S4, under the condition that a matching result is not obtained, inputting the target audio information into a semantic recognition model for recognition;
S5, under the condition that the matching result of S3 or the recognition result of S4 indicates that "Li Si" is an invalid account, marking the state of "Li Si" as "invalid", and transferring the virtual resource associated with "Li Si" to the media information publisher as compensation.
The foregoing is merely an example, and embodiments of the present application are not limited in any way.
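Steps S3 to S5 of the two examples above can be sketched as a two-stage decision. This is a minimal sketch: the two models are passed in as plain callables, the result values "valid" and "invalid" and the "undetermined" state for the case where neither stage decides are assumptions made for illustration, and the names `detect_validity` and `settle` are hypothetical.

```python
def detect_validity(audio, match_voiceprint, recognize_semantics):
    """Return 'valid', 'invalid', or None when neither stage decides."""
    result = match_voiceprint(audio)           # S3: voiceprint matching first
    if result is None:                         # S4: fall back only when unmatched
        result = recognize_semantics(audio)
    return result


def settle(account, result, compensated):
    """S5: record the state; compensate only for accounts confirmed invalid."""
    if result == "invalid":
        compensated.append(account)
    return {"account": account, "state": result or "undetermined"}
```

Passing the models as callables keeps the control flow independent of how voiceprint matching and semantic recognition are actually implemented.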
Optionally, in the embodiment of the present application, whether the target account is valid may be determined by, but is not limited to, the matching result of the voiceprint matching model together with the target recognition result: matching is first performed by the voiceprint matching model, and recognition by the semantic recognition model is performed only if no match is obtained.
Optionally, in the embodiment of the present application, determining whether the target account is valid may include, but is not limited to, determining that the call state identified from the target audio information, by voiceprint matching or by semantic recognition, is a called-party abnormality, for example, in a call, powered off, busy, or an unallocated number. The same or different subsequent operations may be performed for different called-party abnormal states. For example, the virtual resource corresponding to the target account may be transferred, where the receiver of the virtual resource may include, but is not limited to, the media information publisher that provided the target account. The virtual resource being configured to allow transfer when the target account is an invalid account may be understood as follows: when the target account is an invalid account, the virtual resource may be transferred directly to the media information publisher; alternatively, detection of the audio information of other accounts continues, and the virtual resources are finally transferred to the media information publisher in a single batch.
Optionally, in the embodiment of the present application, the virtual resource may include, but is not limited to, virtual gold coins, virtual gifts, virtual services, and the like.
According to the embodiment of the present application, target audio information to be detected is obtained, where the target audio information is audio information generated after a call is initiated to a target account. Target voiceprint information in the target audio information is matched through a preset voiceprint matching model, where the voiceprint matching model is used for matching the target voiceprint information with labeled voiceprint information. If the voiceprint matching model does not match labeled voiceprint information corresponding to the target voiceprint information, semantic recognition is performed on the target audio information according to a preset semantic recognition model to obtain a target recognition result, where the semantic recognition model is used for recognizing the semantics of the target audio information, and the target recognition result indicates whether the target account is an invalid account. When the target recognition result indicates that the target account is an invalid account, the virtual resource corresponding to the target account is transferred, where the virtual resource is configured to allow transfer when the target account is an invalid account. By acquiring the audio information generated after the call is initiated, matching it first with the voiceprint matching model, and using the semantic recognition model only when no voiceprint information is matched, the validity of the target account is determined, so that whether the virtual resource needs to be transferred for a target account marked as invalid can be judged quickly. This achieves the technical effect of improving the detection efficiency of audio information and solves the technical problem of low detection efficiency of audio information in the related art.
As an alternative, acquiring the target audio information to be detected includes:
acquiring a target account set, wherein the target account set is an account set provided by a media information publisher;
selecting a target account from the target account set, and initiating a call request to the target account;
and in the process of initiating a call request to the target account, acquiring target audio information.
Optionally, in the embodiment of the present application, the target account set may include, but is not limited to, an account set obtained by a media information publisher according to provided media information, where account information corresponding to the account set is actively provided by a user, or the account information is obtained under the condition that the user allows.
In particular, the set of target accounts may include, but is not limited to, a set of accounts labeled as invalid accounts by the media information publisher.
Optionally, in the embodiment of the present application, the selecting the target account from the target account set may include, but is not limited to, randomly selecting the target account from the target account set, and deleting the detected target account from the target account set, so as to avoid duplicate detection.
Optionally, in the embodiment of the present application, the foregoing call request to the target account may include, but is not limited to, a call request to the target account through a calling account associated with the media information publishing platform, and may also include, but is not limited to, a call request to the target account by a third party platform.
Optionally, in the embodiment of the present application, the process of initiating the call request to the target account may include, but is not limited to, the period in which the call has been placed to the target account but has not yet been answered by the target account.
According to the embodiment of the present application, a target account set is acquired, where the target account set is the account set provided by the media information publisher; a target account is selected from the target account set and a call request is initiated to the target account; and the target audio information is acquired in the process of initiating the call request to the target account.
As an alternative solution, selecting a target account from a target account set, and initiating a call request to the target account, including:
acquiring an invalid account identifier provided by a media information publisher, wherein the invalid account identifier is obtained by the media information publisher marking invalid accounts in the target account set;
and selecting a target account from the target account set according to the invalid account identifier, and initiating a call request to the target account, wherein the target account is an account with the invalid account identifier in the target account set.
Optionally, in the embodiment of the present application, the invalid account identifier is an identifier of an invalid account marked by the media information publisher. The media information publisher may thereby preprocess the account set, which reduces the number of accounts to be detected, achieves the technical effect of improving the detection efficiency of audio information, and further solves the technical problem of low detection efficiency of audio information in the related art.
According to the embodiment of the present application, the invalid account identifier provided by the media information publisher is acquired, where the invalid account identifier is obtained by the media information publisher marking invalid accounts in the target account set; a target account is selected from the target account set according to the invalid account identifier, and a call request is initiated to the target account, where the target account is an account with the invalid account identifier in the target account set. Because only the target accounts marked in advance as invalid by the media information publisher are detected, the compensation cost of the media information publishing platform is reduced and the detection accuracy is improved.
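The account selection described above (restricting candidates to accounts carrying the invalid account identifier, choosing one at random, and removing it from the set to avoid duplicate detection) can be sketched as follows. The set-based representation and the name `select_target` are assumptions made for illustration.

```python
import random


def select_target(accounts, invalid_ids, rng=random):
    """Pick one not-yet-detected account that carries the invalid identifier.

    `accounts` is a mutable set of pending accounts and `invalid_ids` is the
    set of accounts marked invalid by the publisher. Returns None when no
    candidate is left.
    """
    # Sort so that the candidates form a sequence, as random.choice requires.
    candidates = sorted(a for a in accounts if a in invalid_ids)
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    accounts.remove(chosen)  # remove the detected account to avoid duplicate detection
    return chosen
```

Calling the function repeatedly visits every marked account exactly once and leaves the unmarked accounts untouched.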
As an alternative, the method further comprises:
and under the condition that the target identification result indicates that the target account is not an invalid account, prohibiting the transfer of virtual resources corresponding to the target account to the media information publisher.
Optionally, in the embodiment of the present application, the target account not being an invalid account may be understood as the recognition result of the target audio information indicating that the target account can be connected. In this case the target account is a valid account, so that the media information publisher can accept the result and additional compensation for incorrectly marked accounts is avoided.
According to the embodiment of the present application, transfer of the virtual resource corresponding to the target account to the media information publisher is prohibited when the target recognition result indicates that the target account is not an invalid account. This avoids compensating for an account that is valid but was marked as invalid by the media information publisher, reduces the compensation cost of the media information publishing platform, and improves the detection accuracy of the target audio information.
As an alternative, matching, by a preset voiceprint matching model, target voiceprint information in target audio information includes:
Analyzing the real-time voice stream of the target audio information according to a preset time period to obtain target voiceprint information;
inputting the target voiceprint information into a voiceprint matching model, and matching the target voiceprint information with the voiceprint information marked in the voiceprint matching model through the voiceprint matching model.
Optionally, in the embodiment of the present application, the preset time period may include, but is not limited to, every second, every 5 seconds, and so on, and specifically, may be flexibly adjusted according to actual needs.
Optionally, in the embodiment of the present application, the parsing manner may include, but is not limited to, performing an off-system synchronization process on the target audio information, and performing matching by converting a real-time voice stream of the target audio information into voiceprints.
According to the embodiment of the application, the real-time voice stream of the target audio information is analyzed according to the preset time period to obtain the target voiceprint information, the target voiceprint information is input into the voiceprint matching model, and the target voiceprint information is matched with the voiceprint information marked in the voiceprint matching model by the voiceprint matching model, so that the detection of the target audio information can be rapidly completed, the detection timeliness of the target audio information is improved, the payment cost of a media information issuing platform is reduced, and the detection accuracy of the target audio information is improved.
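The period-by-period parsing described above can be sketched as follows. This is only an illustration: the audio is assumed to be a flat list of samples, the matcher is passed in as a callable that returns `None` until it recognizes something, and the audio from each period is accumulated before matching. The names `split_periods` and `match_incrementally` are hypothetical.

```python
def split_periods(samples, sample_rate, period_seconds=1.0):
    """Split a sample list into consecutive chunks of one preset period each."""
    step = max(1, int(sample_rate * period_seconds))
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def match_incrementally(samples, sample_rate, match, period_seconds=1.0):
    """Feed accumulated audio to `match` after every period; stop at the first hit.

    Returns (result, periods_used); result is None if nothing matched.
    """
    accumulated = []
    periods_used = 0
    for chunk in split_periods(samples, sample_rate, period_seconds):
        accumulated.extend(chunk)
        periods_used += 1
        result = match(accumulated)
        if result is not None:
            return result, periods_used
    return None, periods_used
```

Stopping at the first successful match is what allows the call to be ended before the whole prompt has been played.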
As an alternative,
inputting the target voiceprint information into a voiceprint matching model, and matching the target voiceprint information with the voiceprint information labeled in the voiceprint matching model through the voiceprint matching model, includes: under the condition that the target voiceprint information is matched with first voiceprint information, determining a first labeling result of the first voiceprint information as a target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the first voiceprint information, and the first labeling result comprises a calling-party abnormality;
the method further comprises: under the condition that the target matching result indicates that the call state of the call with the target account is a calling-party abnormality, stopping the call with the target account and generating alarm information, wherein the alarm information is used for indicating that the account currently calling the target account is abnormal.
Optionally, in the embodiment of the present application, the first voiceprint information is voiceprint information labeled in advance with the first labeling result. For example, the voiceprint corresponding to "your phone has been suspended" is labeled in advance as first voiceprint information whose first labeling result is a calling-party abnormality.
Optionally, in the embodiment of the present application, the calling-party abnormality may include, but is not limited to, an abnormality of the calling number, for example, the calling number applied for by the media information publisher or the media information publishing platform being in arrears or blocked.
Optionally, in the embodiment of the present application, the stopping the call with the target account may include, but is not limited to, hanging up the current phone, and the generating the alert information may include, but is not limited to, generating an alert prompt to alert that the current caller is abnormal.
According to the embodiment of the present application, under the condition that the target voiceprint information is matched with the first voiceprint information, the first labeling result of the first voiceprint information is determined as the target matching result output by the voiceprint matching model, where the labeled voiceprint information comprises the first voiceprint information and the first labeling result comprises a calling-party abnormality. Under the condition that the target matching result indicates that the call state of the call with the target account is a calling-party abnormality, the call with the target account is stopped and alarm information is generated, where the alarm information is used for indicating that the account currently calling the target account is abnormal. In this way, the call can be actively hung up when a calling-party abnormality occurs, and the alarm information is generated in time to facilitate subsequent detection, which improves the timeliness of detecting the target audio information, reduces the compensation cost of the media information publishing platform, and improves the detection accuracy of the target audio information.
As an alternative,
inputting the target voiceprint information into a voiceprint matching model, and matching the target voiceprint information with the voiceprint information labeled in the voiceprint matching model through the voiceprint matching model, includes: under the condition that the target voiceprint information is matched with second voiceprint information, determining a second labeling result of the second voiceprint information as a target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the second voiceprint information, and the second labeling result comprises a called-party abnormality;
the method further comprises: under the condition that the target matching result indicates that the call state of the call with the target account is a called-party abnormality, stopping the call with the target account and determining that the target account is an invalid account.
Optionally, in the embodiment of the present application, the second voiceprint information is voiceprint information labeled in advance with the second labeling result. For example, the voiceprint corresponding to "the number you dialed is powered off" is labeled in advance as second voiceprint information whose second labeling result is a called-party abnormality.
Optionally, in the embodiment of the present application, the called-party abnormality may include, but is not limited to, an abnormality of the called number, for example, in a call, powered off, busy, or an unallocated number.
Optionally, in the embodiment of the present application, the stopping the call with the target account may include, but is not limited to, hanging up the current phone, and the determining that the target account is the invalid account may include, but is not limited to, marking the target account as the invalid account on the media information publishing platform.
According to the embodiment of the application, when the target voiceprint information is matched with the second voiceprint information, the second labeling result of the second voiceprint information is determined to be the target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the second voiceprint information, the second labeling result comprises the abnormality of the called party, when the target matching result indicates that the call state of the call with the target account is the abnormality of the called party, the call with the target account is stopped, and the target account is determined to be an invalid account, so that when the abnormality of the called party occurs, the call is actively hung up, and confirmation information is generated, so that the subsequent reimbursement to the media information publisher is facilitated, and the detection accuracy of the target audio information is improved.
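The two branches described above (hang up and raise an alarm for a calling-party abnormality; hang up and mark the account invalid for a called-party abnormality) can be sketched as follows. The `CallSession` class, the result constants, and the alert text are hypothetical names introduced for illustration; a real system would hang up through its telephony platform.

```python
from dataclasses import dataclass, field

CALLER_ABNORMAL = "caller_abnormal"  # e.g. the calling number is in arrears or blocked
CALLEE_ABNORMAL = "callee_abnormal"  # e.g. powered off, busy, unallocated number


@dataclass
class CallSession:
    account: str
    active: bool = True
    alerts: list = field(default_factory=list)
    invalid_accounts: set = field(default_factory=set)

    def hang_up(self):
        """Stop the call with the target account."""
        self.active = False


def handle_match_result(session, result):
    """Apply the follow-up action for a labeling result; other results keep the call going."""
    if result == CALLER_ABNORMAL:
        session.hang_up()
        session.alerts.append(f"calling number abnormal while dialing {session.account}")
    elif result == CALLEE_ABNORMAL:
        session.hang_up()
        session.invalid_accounts.add(session.account)
    return session
```

Only the called-party branch affects the validity of the target account; a calling-party abnormality says nothing about the account being dialed.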
As an alternative, in the case that the voiceprint matching model does not match the annotated voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, including:
Storing the target audio information under the condition that the marked voiceprint information corresponding to the target voiceprint information in the target audio information is not matched through the voiceprint matching model;
inputting the target audio information into a semantic recognition model, and performing semantic recognition on the target audio information to obtain a target recognition result, wherein the semantic recognition model is used for acquiring text information corresponding to the target audio information and performing semantic recognition on the text information.
Optionally, in the embodiment of the present application, the storing the target audio information may include, but is not limited to, recording the target audio information simultaneously in a process of voiceprint matching the target audio information, and finally storing the target audio information.
Optionally, in the embodiment of the present application, the above recognition manner may include, but is not limited to, identifying the semantics of the target audio information according to keywords in the target audio information, and determining the final target recognition result according to the semantics.
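Keyword-based recognition of the transcribed text can be sketched as follows. The sketch assumes the audio has already been converted to text by a speech recognizer, which is outside its scope; the keyword list, the result labels, and the name `classify_transcript` are hypothetical examples rather than the rules used by the application.

```python
# (keyword, recognition result) pairs, checked in order; all entries are examples.
KEYWORD_RULES = [
    ("in arrears", "caller_abnormal"),
    ("has been suspended", "caller_abnormal"),
    ("powered off", "callee_abnormal"),
    ("not in service", "callee_abnormal"),
    ("does not exist", "callee_abnormal"),
    ("busy", "callee_abnormal"),
]


def classify_transcript(text, rules=KEYWORD_RULES):
    """Return the result of the first keyword found in the text, or None."""
    lowered = text.lower()
    for keyword, result in rules:
        if keyword in lowered:
            return result
    return None
```

A transcript that contains none of the keywords yields `None`, leaving the decision to whatever default the platform has configured.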
As an alternative, the method further comprises:
when the target recognition result further indicates that the call state of the call with the target account is a calling-party abnormality, stopping the call with the target account and generating alarm information, wherein the alarm information is used for indicating that the account currently calling the target account is abnormal;
and when the target recognition result further indicates that the call state of the call with the target account is a called-party abnormality, stopping the call with the target account and determining that the target account is an invalid account.
As an alternative, the method further comprises:
when the target recognition result further indicates that the call state of the call with the target account is a calling-party abnormality, adding third voiceprint information corresponding to the target audio information into the voiceprint matching model, and determining the labeling result corresponding to the third voiceprint information as a calling-party abnormality;
and when the target recognition result further indicates that the call state of the call with the target account is a called-party abnormality, adding fourth voiceprint information corresponding to the target audio information into the voiceprint matching model, and determining the labeling result corresponding to the fourth voiceprint information as a called-party abnormality.
Optionally, in the embodiment of the present application, when the target recognition result further indicates that the call state of the call with the target account is a calling-party abnormality, the target recognition result may further be used to re-label the third voiceprint information, for which no corresponding labeled voiceprint information was matched; and when the target recognition result further indicates that the call state of the call with the target account is a called-party abnormality, the target recognition result may further be used to re-label the fourth voiceprint information, for which no corresponding labeled voiceprint information was matched.
According to the embodiment of the application, for the target voiceprint information which is not matched with the corresponding marked voiceprint information, the target voiceprint information is input into the semantic recognition model to obtain the target recognition result, and the target voiceprint information is re-marked according to the target recognition result, so that when other target accounts acquire the voiceprint information, the corresponding marked voiceprint information can be matched.
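The re-labeling loop described above can be sketched as follows. For brevity the voiceprint library is a dictionary keyed by the exact voiceprint vector, which is a simplification: a real matcher would tolerate small differences between voiceprints. The names `lookup`, `learn`, and `ABNORMAL_RESULTS` are hypothetical.

```python
ABNORMAL_RESULTS = {"caller_abnormal", "callee_abnormal"}


def lookup(library, voiceprint):
    """Return the labeling result stored for this voiceprint, or None."""
    return library.get(tuple(voiceprint))


def learn(library, voiceprint, recognition_result):
    """Add an unmatched voiceprint to the library, labeled with the recognition result."""
    if recognition_result in ABNORMAL_RESULTS:
        library[tuple(voiceprint)] = recognition_result
    return library
```

After `learn` has stored a voiceprint, a later call that produces the same voiceprint is resolved by `lookup` without invoking semantic recognition again.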
Embodiments of the present application are further illustrated below in conjunction with specific examples:
Under the invalid-clue compensation policy of the media information publishing platform, after the platform publishes media information for a media information publisher, virtual currency (corresponding to the foregoing virtual resource) is compensated, in a certain proportion, for invalid clues (customer numbers that are powered off, out of service, or unallocated) among the collected target customer numbers (corresponding to the foregoing target account set). This makes up for the loss of the media information publisher, encourages the media information publisher to label and return clues, and allows the returned labeling information to be fed back into advertisement delivery.
In the advertisement information collecting process, intelligent verification judgment is performed on the invalid number marked and returned by the media information publisher, wherein the intelligent verification judgment comprises the following contents:
detecting the invalidity of customer-collection clues in the advertisement scenario;
quickly matching the hang-up reason through the ringing tone, based on MRCP;
completing automatic analysis and labeling of the voiceprint library based on a Java project and ASR technology.
In the intelligent cleaning system, an abnormal calling number (for example, a calling number applied for by the media information publisher that is in arrears or blocked) needs to be found and reported quickly. For an abnormal called number (a clue submitted by a customer), such as an unallocated number, a powered-off number, or a busy number, intelligent cleaning can hang up faster and thereby improve call efficiency, including but not limited to the following:
real-time early warning of abnormal calling numbers in the advertisement scenario;
call acceleration during intelligent cleaning in the advertisement scenario.
The voiceprint matching part of the embodiment of the present application adopts a minimum voiceprint matching mode: the voiceprint of each second is parsed and accumulated for voiceprint matching, so that the corresponding hang-up reason can be matched in the shortest time.
Under the scene of advertisement delivery, if the target of the delivery is customer information collection, the media information publisher can collect form information actively filled by a user, and after the user submits clues to the media information publisher, the clues of the media information publisher can be automatically collected and synchronized to a clue platform, and can be checked, exported and dialed.
In the invalid-clue compensation process, validity detection is required for the data that the media information publisher has marked for compensation, in order to judge whether compensation is due for the clues marked as invalid. Detection of clue invalidity requires attention to the following points, so that the media information publisher can trust the result and the media information publishing platform avoids additional compensation caused by incorrectly marked clues.
1. Real-time detection of invalid clues (the state of a mobile phone number can change at any time, so the timeliness requirement for detection is high);
2. Accuracy of invalid-clue judgment: the compensation policy only compensates for mobile phone numbers that are unallocated, out of service, or powered off for 3 consecutive days, so the accuracy requirement is very high; otherwise, extra compensation is paid and extra loss is caused;
3. Evidence of the ringing audio at the time: the ringing audio of the user dialed during invalidity detection is recorded and backed up, so that it can be presented as evidence if the media information publisher raises an objection;
4. Autonomous optimization and improvement of the model (mobile phone ringing prompts are updated and iterated continuously, so the system needs to keep learning and iterating, recognizing more and more kinds of ringing prompts and judging them accurately).
Fig. 6 is a schematic diagram of yet another alternative method for detecting audio information according to an embodiment of the present application. As shown in Fig. 6, a real-time invalid-clue detection system is constructed, which includes real-time clue detection, real-time recording of ringing audio, automatic ASR recognition by the model, intelligent analysis of ringing-audio voiceprints, intelligent optimization of the voiceprint model library, and the like, and specifically includes, but is not limited to, the following steps:
S1, obtaining clues submitted by a media information publisher;
S2, judging whether the clues submitted by the media information publisher are marked as invalid clues;
S3, if the judgment result of S2 is yes, performing invalidity detection on the clue;
S4, initiating a voice call to the clue, and acquiring voiceprint information for voiceprint recognition;
S5, judging whether the voiceprint information is matched successfully; if the match succeeds, directly executing step S10, and if the match fails, executing step S7;
S6, recording the ringing audio during the execution of S4;
S7, performing ASR recognition on the recorded ringing audio;
S8, performing voiceprint analysis on the ringing audio while performing ASR recognition on the recorded ringing audio;
S9, optimizing the model according to the recognition result and the voiceprint analysis result;
S10, storing the recognition result and transmitting it back to the system for subsequent pay.
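The control flow of steps S1 to S10 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the application: the names `Clue`, `detect_clue`, `dial_and_record`, `match_voiceprint`, `asr_classify`, and `learn`, as well as the result strings, are all hypothetical, and the telephony, voiceprint, and ASR components are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Clue:
    phone_number: str
    marked_invalid: bool  # S2: flag set by the media information publisher


def detect_clue(
    clue: Clue,
    dial_and_record: Callable[[str], bytes],             # S4 + S6 (hypothetical)
    match_voiceprint: Callable[[bytes], Optional[str]],  # S5 (hypothetical)
    asr_classify: Callable[[bytes], Optional[str]],      # S7 (hypothetical)
    learn: Callable[[bytes, str], None],                 # S8 + S9 (hypothetical)
) -> str:
    """Return a result label for one clue; the caller stores it (S10)."""
    if not clue.marked_invalid:
        return "not_checked"                    # S2: only marked clues are tested
    audio = dial_and_record(clue.phone_number)  # S3, S4, S6
    result = match_voiceprint(audio)            # S5: try the voiceprint library
    if result is not None:
        return result                           # match succeeded: go to S10
    result = asr_classify(audio)                # S7: fall back to ASR
    if result is None:
        return "unknown"
    learn(audio, result)                        # S8, S9: feed the model
    return result
```

Passing the components as callables keeps the sketch independent of any particular call platform or recognition engine.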
The foregoing is merely an example, and embodiments of the present application are not limited in any way.
According to this embodiment, the labor cost of verification is reduced, verification efficiency is raised, detection accuracy and effectiveness are improved, the collection of ringing audio can keep growing, and the model can be optimized continuously and intelligently. In the intelligent cleaning scenario, abnormal numbers can be hung up quickly, and the accurate hang-up result can be shown to the client, so that the client can intuitively eliminate invalid clues.
Fig. 7 is a schematic diagram of yet another alternative method for detecting audio information according to an embodiment of the present application. As shown in fig. 7, a clue validity detection system is constructed. The system runs on a FreeSWITCH-based outbound call platform, and performs recognition, analysis, and recording by mounting a module and rewriting the voice-stream interception method so as to intercept the ringing audio produced when the user's mobile phone is called.
It should be noted that, while a number is being dialed and before the user answers, the ringing audio quickly reveals both the state of the dialed number (normal, in a call, powered off, busy, empty number, etc.) and any abnormality of the calling number (for example, prompts that the caller's own number has been shut down, is in arrears, or has had its service suspended). A traditional calling system keeps calling an invalid called party until the operator actively hangs up. The present system instead uses a freely embeddable FreeSWITCH module to process the ringing audio of the calling system synchronously outside that system: the detection module forwards the voice stream in real time to the detection system, which converts the ringing audio into a voiceprint in real time to obtain the current state of the calling or called party. If a calling-party abnormality is found, an alarm is triggered for the task immediately. If a called-party abnormality is found, the corresponding result is marked, the business logic of the corresponding service (invalid pay or intelligent cleaning) is executed, and the call is hung up. If the voiceprint is not matched, the ringing audio is written to a recording and ASR recognition is performed on it automatically, and the recognized text is matched against the text model to obtain the result.
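The branching described above can be summarized as a small dispatch table. The sketch below is only illustrative: the `Status` enum, the action names, and the `continue_call` default for a normal number are assumptions introduced here and do not come from the application.

```python
from enum import Enum
from typing import List


class Status(Enum):
    NORMAL = "normal"
    CALLER_ABNORMAL = "caller_abnormal"  # e.g. the calling number is in arrears
    CALLEE_ABNORMAL = "callee_abnormal"  # e.g. empty number, powered off
    UNMATCHED = "unmatched"              # no voiceprint in the library matched


def actions_for(status: Status) -> List[str]:
    """Map a detection status to the (hypothetical) actions taken by the system."""
    if status is Status.CALLER_ABNORMAL:
        return ["hang_up", "raise_alarm"]
    if status is Status.CALLEE_ABNORMAL:
        return ["mark_result", "run_business_logic", "hang_up"]
    if status is Status.UNMATCHED:
        return ["record_audio", "run_asr", "match_text_model"]
    return ["continue_call"]  # assumed default for a normal number
```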
1. Forwarding mode of ringing audio:
When a scheduled task is acquired, the system uses a Lua script built into FreeSWITCH to start recording and forwarding ringing audio according to the service type, so that the different services are processed separately.
2. Autonomous voiceprint training and verification mode:
The system is connected to an MRCP voiceprint engine; during autonomous training and learning, audio is submitted for training automatically, and the model is trained by analyzing the voiceprints.
3. Autonomous learning mode for unmatched audio:
For ringing audio that does not match the voiceprint library, the detection system module records part of the ringing audio on its own. ASR speech recognition is performed on the recorded part through a toolkit, and preliminary learning is carried out on the ASR result using text keywords; if a learning result is obtained, model training is carried out autonomously.
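The keyword-based preliminary learning on ASR text might look like the sketch below. The rule table `KEYWORD_RULES`, its labels, and its English phrases are purely illustrative assumptions; the application does not list the keywords it uses, and real operator prompts would differ.

```python
from typing import Dict, List, Optional

# Hypothetical keyword rules: label -> phrases that suggest that label.
KEYWORD_RULES: Dict[str, List[str]] = {
    "empty_number": ["does not exist", "not in service"],
    "powered_off": ["powered off", "switched off"],
    "caller_abnormal": ["your number has been suspended", "arrears"],
}


def classify_transcript(text: str) -> Optional[str]:
    """Return the first label whose keyword occurs in the ASR text, else None."""
    lowered = text.lower()
    for label, keywords in KEYWORD_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None
```

A `None` result corresponds to the case where no learning result is obtained, so no model training would be triggered for that recording.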
Under the invalid pay policy, the system serves as a detection tool: it not only encourages media information publishers to give feedback on clues, which in turn benefits advertising, but also increases the publishers' confidence in placing advertisements through the invalid pay policy.
In the intelligent cleaning system, the ability to proactively detect abnormalities and raise early warnings is improved, and the calling efficiency of the system is greatly improved.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
According to another aspect of the embodiment of the present application, there is also provided an audio information detection apparatus for implementing the above audio information detection method. As shown in fig. 8, the apparatus includes:
an obtaining module 802, configured to obtain target audio information to be detected, where the target audio information is audio information generated after a call is initiated to a target account;
a matching module 804, configured to match target voiceprint information in the target audio information through a preset voiceprint matching model, where the voiceprint matching model is configured to match target voiceprint information in the target audio information with annotated voiceprint information;
an identifying module 806, configured to, when the voiceprint matching model does not match the annotated voiceprint information corresponding to the target voiceprint information in the target audio information, perform semantic identification on the target audio information according to a preset semantic identification model, so as to obtain a target identification result, where the semantic identification model is used to identify the semantics of the target audio information, and the target identification result is used to indicate whether the target account is an invalid account;
and a transferring module 808, configured to transfer a virtual resource corresponding to the target account if the target identification result indicates that the target account is the invalid account, where the virtual resource is configured to be allowed to be transferred when the target account is the invalid account.
As an alternative, the device is configured to obtain the target audio information to be detected by:
acquiring a target account set, wherein the target account set is an account set provided by a media information publisher;
selecting the target account from the target account set, and initiating a call request to the target account;
and acquiring the target audio information in the process of initiating a call request to the target account.
As an optional solution, the device is configured to select the target account from the target account set, and initiate a call request to the target account by:
acquiring an invalid account identifier provided by the media information publisher, wherein the invalid account identifier is an identifier with which the media information publisher marks an invalid account in the target account set;
and selecting the target account from the target account set according to the invalid account identifier, and initiating a call request to the target account, wherein the target account is an account with the invalid account identifier in the target account set.
As an alternative, the device is further configured to:
and under the condition that the target identification result indicates that the target account is not the invalid account, prohibiting the transfer of virtual resources corresponding to the target account to the media information publisher.
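Selecting the target accounts by the publisher's invalid-account identifiers, and then allowing or prohibiting the transfer per account, can be sketched as below. The function names and the use of plain strings as account identifiers are assumptions made for illustration; `is_invalid` stands in for the whole detection procedure.

```python
from typing import Callable, Dict, Iterable, List


def select_targets(account_set: Iterable[str], invalid_ids: Iterable[str]) -> List[str]:
    """Keep only the accounts that the publisher marked as invalid."""
    flagged = set(invalid_ids)
    return [account for account in account_set if account in flagged]


def transfer_decisions(
    targets: Iterable[str], is_invalid: Callable[[str], bool]
) -> Dict[str, bool]:
    """True means the transfer is allowed, False means it is prohibited."""
    return {account: is_invalid(account) for account in targets}
```

Identifiers that do not belong to the account set are ignored, so a publisher cannot obtain a transfer for an account that was never provided.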
As an optional solution, the device is configured to match target voiceprint information in the target audio information through a preset voiceprint matching model in the following manner:
analyzing the real-time voice stream of the target audio information according to a preset time period to obtain the target voiceprint information;
and inputting the target voiceprint information into the voiceprint matching model, and matching the target voiceprint information with voiceprint information marked in the voiceprint matching model through the voiceprint matching model.
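A toy version of analyzing the stream by a preset period and matching against annotated voiceprints is sketched below. It assumes, purely for illustration, that each fixed-length window of samples is itself used as the feature vector and that matching is done by cosine similarity against a threshold; the application does not specify the feature extraction or the similarity measure, and a real system would use a proper voiceprint or audio-fingerprint model.

```python
import math
from typing import Dict, Iterator, Optional, Sequence


def windows(samples: Sequence[float], period: int) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks of `period` samples."""
    for start in range(0, len(samples) - period + 1, period):
        yield samples[start:start + period]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(
    voiceprint: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off
) -> Optional[str]:
    """Return the label of the most similar annotated voiceprint, or None."""
    best_label, best_score = None, threshold
    for label, reference in library.items():
        score = cosine(voiceprint, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

Because each window is matched as soon as it is complete, a decision can be reached while the call is still ringing rather than after it ends.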
As an alternative,
the device is configured to input the target voiceprint information into the voiceprint matching model, and match the target voiceprint information with the voiceprint information marked in the voiceprint matching model through the voiceprint matching model, in the following manner: in the case that the target voiceprint information matches first voiceprint information, determining a first labeling result of the first voiceprint information as the target matching result output by the voiceprint matching model, wherein the annotated voiceprint information includes the first voiceprint information, and the first labeling result includes a calling-party abnormality;
the device is further configured to: in the case that the target matching result indicates that the call state of the call with the target account is a calling-party abnormality, stop the call with the target account and generate alarm information, wherein the alarm information is used to indicate that the account currently calling the target account is abnormal.
As an alternative,
the device is configured to input the target voiceprint information into the voiceprint matching model, and match the target voiceprint information with the voiceprint information marked in the voiceprint matching model through the voiceprint matching model, in the following manner: in the case that the target voiceprint information matches second voiceprint information, determining a second labeling result of the second voiceprint information as the target matching result output by the voiceprint matching model, wherein the annotated voiceprint information includes the second voiceprint information, and the second labeling result includes a called-party abnormality;
the device is further configured to: in the case that the target matching result indicates that the call state of the call with the target account is a called-party abnormality, stop the call with the target account and determine the target account as the invalid account.
As an optional solution, the device is configured to perform, in the following manner, semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result when the voiceprint matching model does not match the annotated voiceprint information corresponding to the target voiceprint information in the target audio information:
storing the target audio information if the voiceprint matching model does not match the annotated voiceprint information corresponding to the target voiceprint information in the target audio information;
inputting the target audio information into the semantic recognition model, and performing semantic recognition on the target audio information to obtain the target recognition result, wherein the semantic recognition model is used for acquiring text information corresponding to the target audio information and performing semantic recognition on the text information.
As an alternative, the device is further configured to:
in the case that the target identification result further indicates that the call state of the call with the target account is a calling-party abnormality, stopping the call with the target account and generating alarm information, wherein the alarm information is used to indicate that the account currently calling the target account is abnormal;
in the case that the target identification result further indicates that the call state of the call with the target account is a called-party abnormality, stopping the call with the target account and determining the target account as the invalid account.
As an alternative, the device is further configured to:
in the case that the target identification result further indicates that the call state of the call with the target account is a calling-party abnormality, adding third voiceprint information corresponding to the target audio information to the voiceprint matching model, and determining the labeling result corresponding to the third voiceprint information as a calling-party abnormality;
and in the case that the target identification result further indicates that the call state of the call with the target account is a called-party abnormality, adding fourth voiceprint information corresponding to the target audio information to the voiceprint matching model, and determining the labeling result corresponding to the fourth voiceprint information as a called-party abnormality.
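Feeding a semantically recognized result back into the voiceprint library can be sketched as follows. The `VoiceprintLibrary` class, the exact-match lookup, and the label strings are simplifying assumptions for illustration only; a real voiceprint matching model would store and compare voiceprints by similarity rather than by equality.

```python
from typing import Dict, Optional, Tuple

Voiceprint = Tuple[float, ...]
LEARNABLE_LABELS = {"caller_abnormal", "callee_abnormal"}


class VoiceprintLibrary:
    """Toy library mapping a voiceprint to its labeling result."""

    def __init__(self) -> None:
        self._labels: Dict[Voiceprint, str] = {}

    def add(self, voiceprint: Voiceprint, label: str) -> None:
        self._labels[voiceprint] = label

    def lookup(self, voiceprint: Voiceprint) -> Optional[str]:
        return self._labels.get(voiceprint)


def learn_from_recognition(
    library: VoiceprintLibrary, voiceprint: Voiceprint, recognized_label: str
) -> bool:
    """Add the voiceprint only when semantic recognition found an abnormality."""
    if recognized_label not in LEARNABLE_LABELS:
        return False
    library.add(voiceprint, recognized_label)
    return True
```

After such an update, the same ringing prompt can be resolved by voiceprint matching on the next call, without falling back to semantic recognition.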
According to one aspect of the present application, there is provided a computer program product comprising a computer program/instruction containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. When the computer program is executed by the central processor 901, various functions provided by the embodiments of the present application are performed.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
Fig. 9 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the application.
It should be noted that, the computer system 900 of the electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 9, the computer system 900 includes a central processing unit 901 (Central Processing Unit, CPU), which can perform various appropriate actions and processes according to a program stored in a read-only memory 902 (Read-Only Memory, ROM) or a program loaded from a storage section 908 into a random access memory 903 (Random Access Memory, RAM). The random access memory 903 also stores various programs and data required for system operation. The central processing unit 901, the read-only memory 902, and the random access memory 903 are connected to each other via a bus 904. An input/output interface 905 (Input/Output interface, i.e., I/O interface) is also connected to the bus 904.
The following components are connected to the input/output interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a cathode ray tube (Cathode Ray Tube, CRT) or a liquid crystal display (Liquid Crystal Display, LCD), a speaker, and the like; a storage section 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a local area network card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the input/output interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it is installed into the storage section 908 as needed.
In particular, according to embodiments of the application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 909 and/or installed from the removable medium 911. When the computer program is executed by the central processing unit 901, the various functions defined in the system of the present application are performed.
According to still another aspect of the embodiment of the present application, there is also provided an electronic device for implementing the above-mentioned method for detecting audio information, where the electronic device may be a terminal device or a server as shown in fig. 1. The present embodiment is described taking the electronic device as a terminal device as an example. As shown in fig. 10, the electronic device comprises a memory 1002 and a processor 1004, the memory 1002 having stored therein a computer program, the processor 1004 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the above processor may be configured to execute the following steps by means of a computer program:
S1, acquiring target audio information to be detected, wherein the target audio information is generated after a call is initiated to a target account;
S2, matching the target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used for matching the target voiceprint information in the target audio information with the marked voiceprint information;
S3, in the case that the voiceprint matching model does not match the marked voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, wherein the semantic recognition model is used for recognizing the semantics of the target audio information, and the target recognition result is used for indicating whether the target account is an invalid account;
S4, transferring the virtual resource corresponding to the target account when the target recognition result indicates that the target account is an invalid account, wherein the virtual resource is configured to be allowed to be transferred when the target account is the invalid account.
Optionally, as will be appreciated by those skilled in the art, the structure shown in fig. 10 is merely illustrative, and the electronic device may be a terminal device such as a smart phone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), or a PAD. Fig. 10 does not limit the structure of the above electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces) than shown in fig. 10, or have a configuration different from that shown in fig. 10.
The memory 1002 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for detecting audio information in the embodiments of the present application. The processor 1004 executes the software programs and modules stored in the memory 1002 to perform various functional applications and data processing, that is, to implement the method for detecting audio information described above. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1002 may be used to store, but is not limited to storing, information such as the target account. As an example, as shown in fig. 10, the memory 1002 may include, but is not limited to, the obtaining module 802, the matching module 804, the identifying module 806, and the transferring module 808 of the above audio information detection device. It may further include, but is not limited to, other module units of the above audio information detection device, which are not described in detail in this example.
Optionally, the transmission device 1006 is configured to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission means 1006 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1006 is a Radio Frequency (RF) module for communicating with the internet wirelessly.
In addition, the electronic device further includes: a display 1008 for displaying the transferred virtual resource and the like; and a connection bus 1010 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting the plurality of nodes through a network communication. Among them, the nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server, a terminal, etc., may become a node in the blockchain system by joining the Peer-To-Peer network.
According to an aspect of the present application, there is provided a computer-readable storage medium storing computer instructions. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method of detecting audio information provided in the various optional implementations described above.
Optionally, in this embodiment, the above computer-readable storage medium may be configured to store a computer program for performing the following steps:
S1, acquiring target audio information to be detected, wherein the target audio information is generated after a call is initiated to a target account;
S2, matching the target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used for matching the target voiceprint information in the target audio information with the marked voiceprint information;
S3, in the case that the voiceprint matching model does not match the marked voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, wherein the semantic recognition model is used for recognizing the semantics of the target audio information, and the target recognition result is used for indicating whether the target account is an invalid account;
S4, transferring the virtual resource corresponding to the target account when the target recognition result indicates that the target account is an invalid account, wherein the virtual resource is configured to be allowed to be transferred when the target account is the invalid account.
Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be performed by a program instructing a terminal device to execute them. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and the like.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations shall also fall within the scope of the present application.

Claims (15)

1. A method for detecting audio information, comprising:
acquiring target audio information to be detected, wherein the target audio information is generated after a call is initiated to a target account;
matching the target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used for matching the target voiceprint information in the target audio information with the marked voiceprint information;
in the case that the voiceprint matching model does not match the marked voiceprint information corresponding to the target voiceprint information in the target audio information, performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, wherein the semantic recognition model is used for recognizing the semantics of the target audio information, and the target recognition result is used for indicating whether the target account is an invalid account;
and transferring a virtual resource corresponding to the target account in the case that the target recognition result indicates that the target account is the invalid account, wherein the virtual resource is configured to be allowed to be transferred when the target account is the invalid account.
2. The method of claim 1, wherein the obtaining the target audio information to be detected comprises:
acquiring a target account set, wherein the target account set is an account set provided by a media information publisher;
selecting the target account from the target account set, and initiating a call request to the target account;
and acquiring the target audio information in the process of initiating a call request to the target account.
3. The method of claim 2, wherein selecting the target account from the set of target accounts and initiating a call request to the target account comprises:
acquiring an invalid account identifier provided by the media information publisher, wherein the invalid account identifier is an identifier with which the media information publisher marks an invalid account in the target account set;
and selecting the target account from the target account set according to the invalid account identifier, and initiating a call request to the target account, wherein the target account is an account with the invalid account identifier in the target account set.
4. The method according to claim 2, wherein the method further comprises:
and under the condition that the target identification result indicates that the target account is not the invalid account, prohibiting the transfer of virtual resources corresponding to the target account to the media information publisher.
5. The method according to claim 1, wherein the matching the target voiceprint information in the target audio information by a preset voiceprint matching model includes:
analyzing the real-time voice stream of the target audio information according to a preset time period to obtain the target voiceprint information;
and inputting the target voiceprint information into the voiceprint matching model, and matching the target voiceprint information with voiceprint information marked in the voiceprint matching model through the voiceprint matching model.
6. The method of claim 5, wherein:
inputting the target voiceprint information into the voiceprint matching model, and matching the target voiceprint information with the voiceprint information labeled in the voiceprint matching model through the voiceprint matching model, comprises: in the case that the target voiceprint information matches first voiceprint information, determining a first labeling result of the first voiceprint information as a target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the first voiceprint information, and the first labeling result comprises calling-party abnormality;
the method further comprises: in the case that the target matching result indicates that the call state of the call with the target account is calling-party abnormality, stopping the call with the target account and generating alarm information, wherein the alarm information is used to indicate that the account currently in the call with the target account is abnormal.
7. The method of claim 5, wherein:
inputting the target voiceprint information into the voiceprint matching model, and matching the target voiceprint information with the voiceprint information labeled in the voiceprint matching model through the voiceprint matching model, comprises: in the case that the target voiceprint information matches second voiceprint information, determining a second labeling result of the second voiceprint information as a target matching result output by the voiceprint matching model, wherein the labeled voiceprint information comprises the second voiceprint information, and the second labeling result comprises called-party abnormality;
the method further comprises: in the case that the target matching result indicates that the call state of the call with the target account is called-party abnormality, stopping the call with the target account and determining the target account as the invalid account.
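The branching described in claims 6 and 7 can be illustrated with a small dispatch function. This is a sketch only: the Label enum and the action strings are hypothetical placeholders for whatever call control and alarm facilities an implementation provides.

```python
from enum import Enum


class Label(Enum):
    CALLING_PARTY_ABNORMAL = "calling_party_abnormal"
    CALLED_PARTY_ABNORMAL = "called_party_abnormal"


def handle_match(label):
    """Map a voiceprint matching result to the follow-up actions of claims 6 and 7."""
    if label is Label.CALLING_PARTY_ABNORMAL:
        # Claim 6: stop the call and raise an alarm about the calling account.
        return ["stop_call", "raise_alarm"]
    if label is Label.CALLED_PARTY_ABNORMAL:
        # Claim 7: stop the call and treat the target account as invalid.
        return ["stop_call", "mark_account_invalid"]
    # No labeled voiceprint matched: fall through to semantic recognition.
    return ["run_semantic_recognition"]
```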
8. The method according to any one of claims 1 to 7, wherein performing semantic recognition on the target audio information according to a preset semantic recognition model to obtain a target recognition result, in the case that the voiceprint matching model does not match the annotated voiceprint information corresponding to the target voiceprint information in the target audio information, comprises:
storing the target audio information in the case that the voiceprint matching model does not match the annotated voiceprint information corresponding to the target voiceprint information in the target audio information;
and inputting the target audio information into the semantic recognition model, and performing semantic recognition on the target audio information to obtain the target recognition result, wherein the semantic recognition model is used for acquiring text information corresponding to the target audio information and performing semantic recognition on the text information.
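The fallback of claim 8 (store the audio, obtain its text, then recognize the semantics of the text) might look like the sketch below. The transcribe and classify callables are injected stand-ins for a speech-to-text engine and a semantic model, neither of which is specified in the claims, and the keyword rule in keyword_classifier is a purely hypothetical example.

```python
def recognize(audio, transcribe, classify, store):
    """Store unmatched audio, convert it to text, and classify the text."""
    store.append(audio)       # keep the audio that the voiceprint model could not match
    text = transcribe(audio)  # text information corresponding to the audio
    return classify(text)     # semantic recognition on the text


def keyword_classifier(text):
    """Toy semantic model: one hypothetical phrase signals an abnormal called party."""
    if "not in service" in text.lower():
        return "called_party_abnormal"
    return "normal"


store = []
result = recognize(
    b"\x00\x01",
    transcribe=lambda audio: "The number you dialed is not in service",
    classify=keyword_classifier,
    store=store,
)
```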
9. The method of claim 8, wherein the method further comprises:
stopping the call with the target account and generating alarm information in the case that the target identification result also indicates that the call state of the call with the target account is calling-party abnormality, wherein the alarm information is used to indicate that the account currently in the call with the target account is abnormal;
and stopping the call with the target account and determining the target account as the invalid account in the case that the target identification result also indicates that the call state of the call with the target account is called-party abnormality.
10. The method according to claim 9, wherein the method further comprises:
in the case that the target identification result also indicates that the call state of the call with the target account is calling-party abnormality, adding third voiceprint information corresponding to the target audio information to the voiceprint matching model, and determining the labeling result corresponding to the third voiceprint information as calling-party abnormality;
and in the case that the target identification result also indicates that the call state of the call with the target account is called-party abnormality, adding fourth voiceprint information corresponding to the target audio information to the voiceprint matching model, and determining the labeling result corresponding to the fourth voiceprint information as called-party abnormality.
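Claim 10 describes feeding semantically recognized abnormal calls back into the voiceprint matching model. A minimal sketch of that feedback step follows, assuming the model keeps a simple list of (voiceprint, label) pairs. The VoiceprintLibrary class and the label strings are hypothetical.

```python
ABNORMAL_LABELS = ("calling_party_abnormal", "called_party_abnormal")


class VoiceprintLibrary:
    """Hypothetical store of labeled voiceprints used by the matching model."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, label):
        self.entries.append((tuple(embedding), label))


def learn_from_recognition(library, embedding, result):
    """Add the voiceprint under its label only when the recognition result is abnormal."""
    if result in ABNORMAL_LABELS:
        library.add(embedding, result)
        return True
    return False


library = VoiceprintLibrary()
added = learn_from_recognition(library, [0.1, 0.2], "called_party_abnormal")
skipped = learn_from_recognition(library, [0.3, 0.4], "normal")
```

Under this reading, a later call with the same voiceprint could be resolved by voiceprint matching alone, without repeating semantic recognition.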
11. An apparatus for detecting audio information, comprising:
the acquisition module is used for acquiring target audio information to be detected, wherein the target audio information is generated after a call is initiated to a target account;
the matching module is used for matching the target voiceprint information in the target audio information through a preset voiceprint matching model, wherein the voiceprint matching model is used for matching the target voiceprint information in the target audio information with the marked voiceprint information;
the identification module is used for carrying out semantic identification on the target audio information according to a preset semantic identification model to obtain a target identification result when the voiceprint matching model does not match the marked voiceprint information corresponding to the target voiceprint information in the target audio information, wherein the semantic identification model is used for identifying the semantic of the target audio information, and the target identification result is used for indicating whether the target account is the invalid account;
and the transferring module is used for transferring the virtual resource corresponding to the target account when the target identification result indicates that the target account is the invalid account, wherein the virtual resource is configured to be allowed to be transferred when the target account is the invalid account.
12. The apparatus of claim 11, wherein the apparatus is configured to obtain the target audio information to be detected by:
acquiring a target account set, wherein the target account set is an account set provided by a media information publisher;
selecting the target account from the target account set, and initiating a call request to the target account;
and acquiring the target audio information in the process of initiating a call request to the target account.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program is executable by a terminal device or a computer to perform the method of any one of claims 1 to 10.
14. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 10.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 10 by means of the computer program.
CN202210447094.1A 2022-04-26 2022-04-26 Audio information detection method and device, storage medium and electronic equipment Pending CN116996613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210447094.1A CN116996613A (en) 2022-04-26 2022-04-26 Audio information detection method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210447094.1A CN116996613A (en) 2022-04-26 2022-04-26 Audio information detection method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116996613A true CN116996613A (en) 2023-11-03

Family

ID=88530761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210447094.1A Pending CN116996613A (en) 2022-04-26 2022-04-26 Audio information detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116996613A (en)

Similar Documents

Publication Publication Date Title
US10244111B1 (en) System for providing data to an interactive response system
EP2297933B1 (en) Method and system for handling a telephone call
KR20190011570A (en) Method for providing chatting service with chatbot assisted by human agents
AU2017415315B2 (en) Integrating virtual and human agents in a multi-channel support system for complex software applications
CN101341532A (en) Sharing voice application processing via markup
US9270811B1 (en) Visual options for audio menu
WO2014140970A2 (en) Voice print tagging of interactive voice response sessions
CN109697243A (en) Ring-back tone clustering method, device, medium and calculating equipment
CN110381221A (en) Call processing method, device, system, equipment and computer storage medium
CN110176252A (en) Intelligent sound quality detecting method and system based on risk management and control mode
CN109271503A (en) Intelligent answer method, apparatus, equipment and storage medium
CN111126071B (en) Method and device for determining questioning text data and method for processing customer service group data
JP2019144400A (en) Controller, control method and computer program
CN116996613A (en) Audio information detection method and device, storage medium and electronic equipment
CN106371905B (en) Application program operation method and device and server
CN114067842B (en) Customer satisfaction degree identification method and device, storage medium and electronic equipment
CN113840040B (en) Man-machine cooperation outbound method, device, equipment and storage medium
CN110765242A (en) Method, device and system for providing customer service information
JP7237381B1 (en) Program, information processing system and information processing method
CN110047486A (en) Sound control method, device, server, system and storage medium
CN114726635A (en) Authority verification method, device, electronic equipment and medium
CN114202363A (en) Artificial intelligence based call method, device, computer equipment and medium
CN114157763A (en) Information processing method and device in interactive process, terminal and storage medium
US20220417360A1 (en) Suspicious call handling system, suspicious call handling method, outgoing/incoming call information collection server and program
CN114819981A (en) Customer service problem processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination