CN114512144B - Method, device, medium and equipment for identifying malicious voice information - Google Patents

Method, device, medium and equipment for identifying malicious voice information

Info

Publication number
CN114512144B
CN114512144B (application CN202210106763.9A)
Authority
CN
China
Prior art keywords
conversation
information
data
real
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210106763.9A
Other languages
Chinese (zh)
Other versions
CN114512144A (en)
Inventor
吴昌明
管彦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Original Assignee
PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA filed Critical PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority to CN202210106763.9A priority Critical patent/CN114512144B/en
Publication of CN114512144A publication Critical patent/CN114512144A/en
Application granted granted Critical
Publication of CN114512144B publication Critical patent/CN114512144B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 - Pitch determination of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a method for identifying malicious voice information. The method comprises: collecting real-time session data of a monitored group; when the deviation between the emotion curve of a conversation participant in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking that participant as a key monitoring person; obtaining voiceprint feature information and corpus feature information of the key monitoring person; and performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, to determine whether they contain malicious information or a malicious tone. The invention identifies malicious information in voice messages within a WeChat group, as well as behaviors such as threats carried out through the group, thereby protecting the security of the WeChat group. The invention also relates to a device for identifying malicious voice information, a storage medium, and equipment.

Description

Method, device, medium and equipment for identifying malicious voice information
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a medium, and a device for identifying malicious voice information.
Background
With the development of the mobile internet, mobile terminals have become a primary platform for interpersonal communication. In recent years in particular, with the rapid development of chat software and network infrastructure and ever-lower costs, communication between users has shifted from traditional short messages and phone calls to chat software such as WeChat, where the cost of sending a message is essentially zero. At the same time, malicious information, including various kinds of malicious marketing and harassment, has flooded the network. Taking WeChat as an example, some voice messages in a WeChat group are messages containing malicious information sent by users to the group or to individuals. How to identify such malicious information is a technical problem that currently needs to be solved.
Disclosure of Invention
The invention aims to solve the technical problem of providing a method, a device, a medium and equipment for identifying malicious voice information, aiming at the defects of the prior art.
The technical scheme for solving the technical problems is as follows:
a method of identifying malicious speech information, the method comprising:
Collecting real-time session data of a monitored group, and drawing an emotion curve of the monitored group based on the real-time session data;
Splitting the real-time conversation data according to the conversation participants to obtain conversation data of each conversation participant, and drawing emotion curves of the conversation participants based on the conversation data of each conversation participant;
When the deviation between the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking the conversation person as a key monitoring person;
Voiceprint feature recognition is carried out on the conversation data of the key monitoring personnel to obtain voiceprint feature information of the key monitoring personnel, and corpus feature information is extracted from the conversation data of the key monitoring personnel by utilizing a preset algorithm;
And performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, and determining whether the voiceprint feature information and the corpus feature information of the key monitoring person contain malicious information or a malicious tone, wherein the preset security rules comprise at least one type of rule covering text and tone.
On the basis of the technical scheme, the invention can be improved as follows.
Further, the drawing the emotion curve of the monitored group based on the real-time session data specifically includes:
Performing intonation recognition on the real-time session data according to the time sequence of the real-time session data, and determining the emotional states in the real-time session data according to the intonation recognition result, wherein the emotional states comprise calm, happiness, sadness, anger and fear;
And drawing the emotion curve of the monitored group according to the emotional states and the time sequence of the real-time session data, wherein a preset value is set for each emotional state.
Further, the drawing of the emotion curve of the conversation person based on the conversation data of each conversation person specifically includes:
According to the time sequence of the real-time session data, performing intonation recognition on the real-time session data of the conversation participant, and determining the emotional states in the real-time session data according to the intonation recognition result, wherein the emotional states comprise calm, happiness, sadness, anger and fear;
And drawing the emotion curves of the conversation personnel according to the emotion states and the time sequence of the real-time conversation data.
Further, when the deviation between the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group is greater than a preset deviation value, taking the conversation person as a key monitoring person, specifically including:
performing deviation comparison on the emotion curves of the conversation staff in the monitored group and the emotion curves of the monitored group to obtain time sequence deviation data;
judging whether the time sequence deviation data has deviation data larger than a preset deviation value or not;
if yes, taking the session personnel as key monitoring personnel.
Further, the voice print feature recognition is performed on the session data of the key monitoring personnel to obtain voice print feature information of the key monitoring personnel, and corpus feature information is extracted from the session data of the key monitoring personnel by using a preset algorithm, which specifically comprises the following steps:
Performing voiceprint recognition on session data of the key monitoring personnel by using the established voiceprint model to obtain voiceprint characteristic information of the key monitoring personnel;
Converting the session data of the key monitoring personnel into text information, and segmenting the text information to obtain a plurality of words;
And inputting the words into a trained word model to obtain the word embedding features corresponding to the words, thereby obtaining the corpus feature information of the key monitoring person.
Further, according to a preset security rule, performing security check on the voiceprint feature information and the corpus feature information of the key monitoring personnel, and determining whether the voiceprint feature information and the corpus feature information of the key monitoring personnel contain malicious information or malicious language, specifically including:
Calculating the similarity between the voiceprint feature information of the key monitoring person and the early-warning voiceprint feature information in the preset security rules; if the similarity value is larger than a preset similarity threshold, determining that a malicious tone exists in the voiceprint feature information of the key monitoring person;
Judging whether malicious information defined in the preset security rules exists in the corpus feature information; if so, determining that the corpus feature information contains malicious information.
The method has the following beneficial effects: the method for identifying malicious voice information comprises collecting real-time session data of a monitored group, and drawing an emotion curve of the monitored group based on the real-time session data; splitting the real-time session data by conversation participant to obtain the session data of each participant, and drawing each participant's emotion curve based on that participant's session data; when the deviation between a participant's emotion curve and the monitored group's emotion curve is larger than a preset deviation value, taking that participant as a key monitoring person; performing voiceprint feature recognition on the session data of the key monitoring person to obtain voiceprint feature information, and extracting corpus feature information from the session data of the key monitoring person with a preset algorithm; and performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, determining whether they contain malicious information or a malicious tone, wherein the preset security rules comprise at least one type of rule covering text and tone. The invention identifies malicious information in voice messages within a WeChat group, as well as behaviors such as threats carried out through the group, thereby protecting the security of the WeChat group.
The other technical scheme for solving the technical problems is as follows:
An apparatus for identifying malicious speech information, the apparatus comprising:
The acquisition module is used for acquiring real-time session data of the monitored group;
The analysis module is used for drawing an emotion curve of the monitored group based on the real-time session data; splitting the real-time conversation data according to the conversation participants to obtain conversation data of each conversation participant, and drawing emotion curves of the conversation participants based on the conversation data of each conversation participant; when the deviation between the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking the conversation person as a key monitoring person;
the judging module is used for carrying out voiceprint feature recognition on the conversation data of the key monitoring personnel to obtain voiceprint feature information of the key monitoring personnel, and extracting corpus feature information from the conversation data of the key monitoring personnel by utilizing a preset algorithm;
And performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, and determining whether the voiceprint feature information and the corpus feature information of the key monitoring person contain malicious information or a malicious tone, wherein the preset security rules comprise at least one type of rule covering text and tone.
Further, the analysis module is specifically configured to perform intonation recognition on the real-time session data according to the time sequence of the real-time session data, and determine the emotional states in the real-time session data according to the intonation recognition result, where the emotional states include calm, happiness, sadness, anger and fear,
And draw the emotion curve of the monitored group according to the emotional states and the time sequence of the real-time session data, wherein a preset value is set for each emotional state.
Furthermore, the present invention provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements the steps of the method of any of the above-mentioned technical solutions.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any of the above technical solutions when executing the program.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for identifying malicious voice information according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of an apparatus for recognizing malicious voice information according to another embodiment of the present invention.
Detailed Description
The following clearly and fully describes the embodiments of the present invention with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
As shown in fig. 1, a method for identifying malicious voice information according to an embodiment of the present invention includes the following steps:
110. and collecting real-time session data of the monitored group, and drawing an emotion curve of the monitored group based on the real-time session data.
120. Splitting the real-time conversation data according to the conversation participants to obtain conversation data of each conversation participant, and drawing emotion curves of the conversation participants based on the conversation data of each conversation participant.
130. And when the deviation between the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking the conversation person as a key monitoring person.
140. And carrying out voiceprint feature recognition on the session data of the key monitoring personnel to obtain voiceprint feature information of the key monitoring personnel, and extracting corpus feature information from the session data of the key monitoring personnel by utilizing a preset algorithm.
150. Performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, and determining whether the voiceprint feature information and the corpus feature information contain malicious information or a malicious tone, wherein the preset security rules comprise at least one type of rule covering text and tone.
Based on the above embodiment, step 110 specifically includes:
111. Performing intonation recognition on the real-time session data according to the time sequence of the real-time session data, and determining the emotional states in the real-time session data according to the intonation recognition result, wherein the emotional states comprise calm, happiness, sadness, anger and fear.
It is understood that the emotional states may also include frightened, threatening, excited, etc. Each emotional state is assigned a preset value, e.g., calm is set to 0, happy to 5, excited to 10, angry to -5, and frightened to -20. There are many methods for intonation recognition, which are not described in detail here.
112. Drawing the emotion curve of the monitored group according to the emotional states and the time sequence of the real-time session data, wherein a preset value is set for each emotional state.
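As a minimal illustration of steps 111 and 112, the mapping from recognized emotional states to a time-ordered emotion curve could look like the sketch below. The state names, most of the preset values, and the (timestamp, state) input format are assumptions for illustration only; the patent leaves the concrete values configurable.

```python
# Illustrative preset values per emotional state. These are assumptions: the
# description only gives a few example values and leaves the rest configurable.
EMOTION_VALUES = {
    "calm": 0,
    "happy": 5,
    "excited": 10,
    "sad": -5,
    "angry": -10,
    "frightened": -20,
}

def draw_emotion_curve(recognized_states):
    """Turn a time-ordered list of (timestamp, state) pairs into an emotion
    curve: a list of (timestamp, value) points sorted by timestamp."""
    return [(t, EMOTION_VALUES.get(state, 0))
            for t, state in sorted(recognized_states)]

curve = draw_emotion_curve([(0, "calm"), (5, "happy"), (10, "angry")])
# curve == [(0, 0), (5, 5), (10, -10)]
```

The same helper can be reused for the per-participant curves of step 120, since only the input session data differs.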
Further, in step 120, the drawing the emotion curve of the conversation person based on the conversation data of each conversation person specifically includes:
121. According to the time sequence of the real-time session data, performing intonation recognition on the real-time session data of the conversation participant, and determining the emotional states in the real-time session data according to the intonation recognition result, wherein the emotional states comprise calm, happiness, sadness, anger and fear.
122. And drawing the emotion curves of the conversation personnel according to the emotion states and the time sequence of the real-time conversation data.
Further, step 130 specifically includes:
And carrying out deviation comparison on the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group to obtain time sequence deviation data.
And judging whether the time sequence deviation data has deviation data larger than a preset deviation value or not.
If yes, taking the session personnel as key monitoring personnel.
It will be appreciated that some deviation always exists between the emotion curve of the group session and the emotion curve of an individual conversation participant, but attention is required when the deviation between the participant's curve and the group's curve becomes large.
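Under the assumption that both curves are sampled at the same timestamps, the deviation comparison of step 130 can be sketched as follows. The curve format and the function names are hypothetical; the patent does not fix a data representation.

```python
def deviation_series(person_curve, group_curve):
    """Time-series deviation data: absolute difference between the two curves
    at each shared point. Curves are lists of (timestamp, value) pairs,
    assumed aligned on the same timestamps."""
    return [(t, abs(p - g))
            for (t, p), (_, g) in zip(person_curve, group_curve)]

def is_key_monitoring_person(person_curve, group_curve, preset_deviation):
    """Flag the participant as a key monitoring person if any point of the
    deviation series exceeds the preset deviation value."""
    return any(d > preset_deviation
               for _, d in deviation_series(person_curve, group_curve))

group = [(0, 0), (5, 5), (10, 5)]
person = [(0, 0), (5, -5), (10, -20)]
is_key_monitoring_person(person, group, preset_deviation=15)
# deviations are 0, 10, 25, so the participant is flagged
```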
Further, step 140 specifically includes:
and carrying out voiceprint recognition on the session data of the key monitoring personnel by using the established voiceprint model to obtain voiceprint characteristic information of the key monitoring personnel.
And converting the session data of the key monitoring personnel into text information, and segmenting the text information to obtain a plurality of words.
And inputting the words into a trained word model to obtain the word embedding features corresponding to the words, thereby obtaining the corpus feature information of the key monitoring person.
It should be understood that there are many voiceprint recognition methods for session data, and likewise many methods for obtaining the corpus feature information of the key monitoring person; these are not described in detail in the present application.
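The corpus-feature branch of step 140 (transcribed text is segmented into words, and each word is looked up in a trained word-embedding model) can be sketched as below. The whitespace segmenter and the toy embedding table are stand-ins for real components, such as a Chinese word segmenter and a word2vec model; the patent does not specify which to use.

```python
# Toy embedding table standing in for a trained word model (assumption).
TOY_EMBEDDINGS = {
    "transfer": [0.9, 0.1],
    "money": [0.8, 0.3],
    "now": [0.1, 0.2],
}

def segment(text):
    # Placeholder segmenter: a real system would use a proper word segmenter.
    return text.lower().split()

def extract_corpus_features(text):
    """Word-embedding features for each segmented word; unknown words map to
    a zero vector."""
    return [TOY_EMBEDDINGS.get(word, [0.0, 0.0]) for word in segment(text)]

features = extract_corpus_features("Transfer money now")
# one 2-dimensional embedding per word
```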
Further, step 150 specifically includes:
Calculating the similarity between the voiceprint feature information of the key monitoring person and the early-warning voiceprint feature information in the preset security rules; if the similarity value is larger than a preset similarity threshold, determining that a malicious tone exists in the voiceprint feature information of the key monitoring person.
Judging whether malicious information defined in the preset security rules exists in the corpus feature information; if so, determining that the corpus feature information contains malicious information.
It should be appreciated that a number of early-warning voiceprint feature entries are stored in the preset security rules; this early-warning voiceprint information describes tones such as frightening, threatening and deception. When malicious information exists in the corpus feature information and a malicious tone exists in the voiceprint feature information, the key monitoring person requires intensive investigation.
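The voiceprint branch of step 150 compares a feature vector against the stored early-warning voiceprints using a similarity threshold. The patent does not name a similarity measure; cosine similarity is a common assumption and is used in the sketch below, with hypothetical function names and sample vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def has_malicious_tone(voiceprint, warning_voiceprints, threshold):
    """True if the person's voiceprint is more similar than the preset
    threshold to any early-warning voiceprint in the security rules."""
    return any(cosine_similarity(voiceprint, w) > threshold
               for w in warning_voiceprints)

warning_voiceprints = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
has_malicious_tone([0.95, 0.05, 0.0], warning_voiceprints, threshold=0.9)
# close to the first warning voiceprint, so a malicious tone is flagged
```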
The method for identifying malicious voice information provided by this embodiment comprises collecting real-time session data of a monitored group, and drawing an emotion curve of the monitored group based on the real-time session data; splitting the real-time session data by conversation participant to obtain the session data of each participant, and drawing each participant's emotion curve based on that participant's session data; when the deviation between a participant's emotion curve and the monitored group's emotion curve is larger than a preset deviation value, taking that participant as a key monitoring person; performing voiceprint feature recognition on the session data of the key monitoring person to obtain voiceprint feature information, and extracting corpus feature information from the session data of the key monitoring person with a preset algorithm; and performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, determining whether they contain malicious information or a malicious tone, wherein the preset security rules comprise at least one type of rule covering text and tone. The invention identifies malicious information in voice messages within a WeChat group, as well as behaviors such as threats carried out through the group, thereby protecting the security of the WeChat group.
As shown in fig. 2, an apparatus for recognizing malicious voice information, the apparatus comprising:
and the acquisition module is used for acquiring the real-time session data of the monitored group.
The analysis module is used for drawing an emotion curve of the monitored group based on the real-time session data; splitting the real-time conversation data according to the conversation participants to obtain conversation data of each conversation participant, and drawing emotion curves of the conversation participants based on the conversation data of each conversation participant; and when the deviation between the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking the conversation person as a key monitoring person.
And the judging module is used for carrying out voiceprint feature recognition on the session data of the key monitoring personnel to obtain voiceprint feature information of the key monitoring personnel, and extracting corpus feature information from the session data of the key monitoring personnel by utilizing a preset algorithm.
And performing a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, and determining whether the voiceprint feature information and the corpus feature information of the key monitoring person contain malicious information or a malicious tone, wherein the preset security rules comprise at least one type of rule covering text and tone.
Further, the analysis module is specifically configured to perform intonation recognition on the real-time session data according to the time sequence of the real-time session data, and determine the emotional states in the real-time session data according to the intonation recognition result, where the emotional states include calm, happiness, sadness, anger and fear,
And draw the emotion curve of the monitored group according to the emotional states and the time sequence of the real-time session data, wherein a preset value is set for each emotional state.
Furthermore, the present invention provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements the steps of the method of any of the above-mentioned technical solutions.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any of the above technical solutions when executing the program.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or illustrated in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium.
Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program. The computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.
The present invention is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and these modifications and substitutions are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A method of identifying malicious speech information, the method comprising:
Collecting real-time session data of a monitored group, and drawing an emotion curve of the monitored group based on the real-time session data;
Splitting the real-time conversation data according to the conversation participants to obtain conversation data of each conversation participant, and drawing emotion curves of the conversation participants based on the conversation data of each conversation participant;
When the deviation between the emotion curve of a conversation participant in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking the conversation participant as a key monitoring person;
Voiceprint feature recognition is carried out on the conversation data of the key monitoring personnel to obtain voiceprint feature information of the key monitoring personnel, and corpus feature information is extracted from the conversation data of the key monitoring personnel by utilizing a preset algorithm;
And carrying out a security check on the voiceprint feature information and the corpus feature information of the key monitoring person according to preset security rules, and determining whether the voiceprint feature information and the corpus feature information of the key monitoring person contain malicious information or malicious language, wherein the preset security rules comprise at least one type of rule for text and speech.
2. The method for identifying malicious voice information according to claim 1, wherein the drawing the emotion curve of the monitored group based on the real-time session data specifically comprises:
Performing intonation recognition on the real-time conversation data in the time order of the real-time conversation data, and determining the emotional states in the real-time conversation data according to the intonation recognition result, wherein the emotional states comprise calm, happiness, sadness, anger and fear,
And drawing the emotion curve of the monitored group according to the emotional states and the time order of the real-time conversation data, wherein a preset value is set for each emotional state.
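The curve-drawing step of claim 2 can be sketched as follows. This is an illustrative Python sketch, not part of the claims: the numeric preset values in `EMOTION_VALUES` are assumptions, since the claim only requires that some preset value be set for each emotional state.

```python
# Map each recognized emotional state to an assumed preset value and
# build an emotion curve as a time-ordered series of (timestamp, value).
EMOTION_VALUES = {"calm": 0, "happiness": 1, "sadness": -1, "anger": -2, "fear": -3}

def emotion_curve(states):
    """states: list of (timestamp, emotional_state) pairs in time order."""
    return [(t, EMOTION_VALUES[s]) for t, s in states]

# Example: a group whose tone shifts from calm to happy to angry.
curve = emotion_curve([(0, "calm"), (5, "happiness"), (10, "anger")])
```

With the assumed values above, the example yields the points `(0, 0)`, `(5, 1)`, `(10, -2)`, i.e. a curve that dips when anger is detected.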
3. The method for identifying malicious voice information according to claim 2, wherein the drawing the emotion curve of the conversation person based on the conversation data of each conversation person specifically comprises:
According to the time order of the real-time conversation data, performing intonation recognition on the real-time conversation data of the conversation participant, and determining the emotional states in the real-time conversation data according to the intonation recognition result, wherein the emotional states comprise calm, happiness, sadness, anger and fear,
And drawing the emotion curve of the conversation participant according to the emotional states and the time order of the real-time conversation data.
4. The method for identifying malicious voice information according to claim 3, wherein taking a conversation participant as a key monitoring person when the deviation between the emotion curve of the conversation participant in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value specifically comprises:
Performing a deviation comparison between the emotion curve of each conversation participant in the monitored group and the emotion curve of the monitored group to obtain time-series deviation data;
Judging whether the time-series deviation data contains deviation data larger than the preset deviation value;
If yes, taking the conversation participant as a key monitoring person.
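The deviation check of claim 4 amounts to a point-by-point comparison of two time series against a threshold. The sketch below is illustrative only: the curve representation follows the earlier sketch, and the threshold value is an assumption, since the claim leaves the preset deviation value unspecified.

```python
# Compare a participant's emotion curve against the group curve and flag
# the participant as a key monitoring person if any point-wise deviation
# exceeds the preset threshold. PRESET_DEVIATION is an assumed value.
PRESET_DEVIATION = 2.0

def is_key_monitored(person_curve, group_curve, threshold=PRESET_DEVIATION):
    """Curves are time-aligned lists of (timestamp, emotion_value)."""
    deviations = [abs(p - g) for (_, p), (_, g) in zip(person_curve, group_curve)]
    return any(d > threshold for d in deviations)
```

For example, a participant sitting at fear (-3) while the group stays calm (0) deviates by 3 and would be flagged; a deviation of 1 would not exceed the assumed threshold.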
5. The method for identifying malicious voice information according to claim 4, wherein performing voiceprint feature recognition on the conversation data of the key monitoring person to obtain the voiceprint feature information of the key monitoring person, and extracting corpus feature information from the conversation data of the key monitoring person by using a preset algorithm, specifically comprises:
Performing voiceprint recognition on session data of the key monitoring personnel by using the established voiceprint model to obtain voiceprint characteristic information of the key monitoring personnel;
Converting the session data of the key monitoring personnel into text information, and segmenting the text information to obtain a plurality of words;
And inputting each word into a trained word model to obtain the word embedding feature corresponding to that word, thereby obtaining the corpus feature information of the key monitoring person.
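The corpus-feature step of claim 5 (segment the transcribed text into words, then look up an embedding per word) can be sketched as follows. Everything here is an illustrative assumption: the whitespace tokenizer stands in for a real word segmenter, and the tiny `EMBEDDINGS` table stands in for a trained word model.

```python
# Toy stand-ins: a real system would use a trained segmenter and a
# trained word-embedding model instead of these assumed tables.
EMBEDDINGS = {"transfer": [0.9, 0.1], "money": [0.8, 0.2], "hello": [0.1, 0.9]}
UNKNOWN = [0.0, 0.0]  # assumed vector for out-of-vocabulary words

def corpus_features(text):
    """Segment text into words and return one embedding vector per word."""
    words = text.lower().split()  # stand-in for real word segmentation
    return [EMBEDDINGS.get(w, UNKNOWN) for w in words]
```

The resulting list of vectors is the per-person corpus feature information that the security check of claim 6 then inspects.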
6. The method for identifying malicious voice information according to claim 5, wherein the performing security check on the voiceprint feature information and the corpus feature information of the key monitor according to a preset security rule to determine whether the voiceprint feature information and the corpus feature information of the key monitor contain malicious information or malicious language, specifically includes:
Calculating the similarity between the voiceprint feature information of the key monitoring person and the early warning voiceprint feature information in the preset security rule, and if the similarity value is larger than a preset similarity threshold, determining that malicious language exists in the voiceprint feature information of the key monitoring person;
Judging whether malicious information in the preset security rule exists in the corpus feature information, and if so, determining that malicious information is contained.
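The voiceprint comparison of claim 6 reduces to scoring two feature vectors against a threshold. The claim does not specify the similarity measure; cosine similarity is shown below as one common choice, and the 0.8 threshold is an assumed illustrative value.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def has_malicious_language(voiceprint, warning_print, threshold=0.8):
    """Flag the speaker when similarity to an early-warning voiceprint
    exceeds the preset similarity threshold."""
    return cosine_similarity(voiceprint, warning_print) > threshold
```

An identical vector pair scores 1.0 and is flagged; an orthogonal pair scores 0.0 and is not.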
7. An apparatus for identifying malicious speech information, the apparatus comprising:
The acquisition module is used for acquiring real-time session data of the monitored group;
The analysis module is used for drawing an emotion curve of the monitored group based on the real-time session data; splitting the real-time conversation data according to the conversation participants to obtain conversation data of each conversation participant, and drawing emotion curves of the conversation participants based on the conversation data of each conversation participant; when the deviation between the emotion curve of the conversation person in the monitored group and the emotion curve of the monitored group is larger than a preset deviation value, taking the conversation person as a key monitoring person;
The judging module is used for carrying out voiceprint feature recognition on the conversation data of the key monitoring person to obtain the voiceprint feature information of the key monitoring person, and extracting corpus feature information from the conversation data of the key monitoring person by using a preset algorithm;
And carrying out security check on the voiceprint characteristic information and the corpus characteristic information of the key monitoring personnel according to preset security rules, and determining whether the voiceprint characteristic information and the corpus characteristic information of the key monitoring personnel contain malicious information or malicious language, wherein the preset security rules comprise at least one type of rules in characters and language.
8. The apparatus for identifying malicious voice information according to claim 7, wherein the analysis module is configured to perform intonation recognition on the real-time conversation data in the time order of the real-time conversation data, and determine the emotional states in the real-time conversation data according to the intonation recognition result, the emotional states comprising calm, happiness, sadness, anger and fear,
And draw the emotion curve of the monitored group according to the emotional states and the time order of the real-time conversation data, wherein a preset value is set for each emotional state.
9. A storage medium having stored thereon computer instructions which, when executed, perform the steps of the method of any one of claims 1 to 6.
10. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any of claims 1 to 6.
CN202210106763.9A 2022-01-28 2022-01-28 Method, device, medium and equipment for identifying malicious voice information Active CN114512144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210106763.9A CN114512144B (en) 2022-01-28 2022-01-28 Method, device, medium and equipment for identifying malicious voice information


Publications (2)

Publication Number Publication Date
CN114512144A CN114512144A (en) 2022-05-17
CN114512144B true CN114512144B (en) 2024-05-17

Family

ID=81552068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210106763.9A Active CN114512144B (en) 2022-01-28 2022-01-28 Method, device, medium and equipment for identifying malicious voice information

Country Status (1)

Country Link
CN (1) CN114512144B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103701999A (en) * 2012-09-27 2014-04-02 中国电信股份有限公司 Method and system for monitoring voice communication of call center
JP2017135642A (en) * 2016-01-29 2017-08-03 株式会社日立システムズ Telephone voice monitoring evaluation system
CN107680602A (en) * 2017-08-24 2018-02-09 平安科技(深圳)有限公司 Voice fraud recognition methods, device, terminal device and storage medium
CN108764010A (en) * 2018-03-23 2018-11-06 姜涵予 Emotional state determines method and device
CN110769425A (en) * 2019-09-18 2020-02-07 平安科技(深圳)有限公司 Method and device for judging abnormal call object, computer equipment and storage medium
CN110837813A (en) * 2019-11-14 2020-02-25 爱驰汽车有限公司 Environment equipment control method and device, electronic equipment and storage medium
CN113343058A (en) * 2021-05-31 2021-09-03 平安普惠企业管理有限公司 Voice session supervision method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN114512144A (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN110910901B (en) Emotion recognition method and device, electronic equipment and readable storage medium
CN107222865B (en) Communication swindle real-time detection method and system based on suspicious actions identification
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
CN109769099B (en) Method and device for detecting abnormality of call person
CN107395352B (en) Personal identification method and device based on vocal print
CN108682420B (en) Audio and video call dialect recognition method and terminal equipment
CN109327632A (en) Intelligent quality inspection system, method and the computer readable storage medium of customer service recording
CN112468659B (en) Quality evaluation method, device, equipment and storage medium applied to telephone customer service
CN113191787A (en) Telecommunication data processing method, device, electronic equipment and storage medium
CN113077821B (en) Audio quality detection method and device, electronic equipment and storage medium
CN110704618B (en) Method and device for determining standard problem corresponding to dialogue data
CN111739558A (en) Monitoring system, method, device, server and storage medium
CN110047518A (en) A kind of speech emotional analysis system
CN113903363A (en) Violation detection method, device, equipment and medium based on artificial intelligence
CN115171731A (en) Emotion category determination method, device and equipment and readable storage medium
CN115050457A (en) Method, device, equipment, medium and product for evaluating quality of on-line inquiry service
CN114512144B (en) Method, device, medium and equipment for identifying malicious voice information
CN117119104A (en) Telecom fraud active detection processing method based on virtual character orientation training
JP6733901B2 (en) Psychological analysis device, psychological analysis method, and program
US20160203121A1 (en) Analysis object determination device and analysis object determination method
CN111353147A (en) Password strength evaluation method, device, equipment and readable storage medium
CN111464687A (en) Strange call request processing method and device
CN110580899A (en) Voice recognition method and device, storage medium and computing equipment
CN114254088A (en) Method for constructing automatic response model and automatic response method
CN112380323A (en) Junk information removing system and method based on Chinese word segmentation recognition technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant