CN112784016A - Method and equipment for detecting speech information

Method and equipment for detecting speech information

Info

Publication number
CN112784016A
Authority
CN
China
Prior art keywords
information, sensitive, word, speech, speech information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110112955.6A
Other languages
Chinese (zh)
Inventor
林征尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lianshang Network Technology Co Ltd
Original Assignee
Shanghai Lianshang Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lianshang Network Technology Co Ltd filed Critical Shanghai Lianshang Network Technology Co Ltd
Priority to CN202110112955.6A
Publication of CN112784016A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The application aims to provide a method for detecting speech information, the method comprising: acquiring speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object; performing sensitive word detection on the speech information through a sensitive information decision tree and determining sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library; detecting, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object; and determining whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.

Description

Method and equipment for detecting speech information
Technical Field
The present application relates to the field of communications, and in particular, to a technique for detecting speech information.
Background
With the development of the times, networks have become essential tools in people's work and life. However, unhealthy elements have also appeared on the network; for example, serious cyber-violence problems may exist in various forums and communication groups, and existing methods for detecting such uncivilized language still have significant gaps.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for detecting speech information.
According to an aspect of the present application, there is provided a method of detecting speech information, the method comprising:
acquiring speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object;
performing sensitive word detection on the speech information through a sensitive information decision tree and determining sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library;
detecting, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object;
and determining whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
According to an aspect of the present application, there is provided a network device for detecting speech information, the device including:
a one-one module, configured to acquire speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object;
a one-two module, configured to perform sensitive word detection on the speech information through a sensitive information decision tree and determine sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library;
a one-three module, configured to detect, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object;
and a one-four module, configured to determine whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
According to an aspect of the present application, there is provided an apparatus for detecting speech information, wherein the apparatus includes:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object;
performing sensitive word detection on the speech information through a sensitive information decision tree and determining sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library;
detecting, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object;
and determining whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
According to one aspect of the application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to:
acquiring speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object;
performing sensitive word detection on the speech information through a sensitive information decision tree and determining sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library;
detecting, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object;
and determining whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
According to another aspect of the application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the method of:
acquiring speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object;
performing sensitive word detection on the speech information through a sensitive information decision tree and determining sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library;
detecting, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object;
and determining whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
Compared with the prior art, the present application performs sensitive word detection, through a sensitive information decision tree, on the speech information that a user proposes to publish and determines the sensitive information corresponding to that speech information; then, in combination with the object-related information, it detects the offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, and finally determines whether to publish the speech information according to the sensitive information and the offensiveness recognition result information. In this way, emotion model detection can accurately check whether the speech information the user intends to publish contains obvious personal attacks, so that cyber violence is effectively prevented, people who are vulnerable to online attacks are better protected, and attackers can be subjected to legal sanctions, thereby cleaning up the network environment through warning and regulating effects.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flowchart of a method for detecting speech information according to one embodiment of the present application;
FIG. 2 illustrates a block diagram of a network device that detects speech information according to one embodiment of the present application;
FIG. 3 illustrates a flowchart of a method for detecting speech information according to one embodiment of the present application;
FIG. 4 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in this application includes, but is not limited to, user equipment, a network device, or a device formed by integrating user equipment and a network device through a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smartphone or a tablet computer, and the mobile electronic product may run any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, a kind of distributed computing in which a collection of loosely coupled computers forms one virtual supercomputer. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user equipment, on the network device, or on a device formed by integrating the user equipment and the network device, or the user equipment and a touch terminal, or the network device and a touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Fig. 1 shows a flowchart of a method for detecting speech information according to an embodiment of the present application; the method includes step S11, step S12, step S13, and step S14. In step S11, the network device acquires speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object; in step S12, the network device performs sensitive word detection on the speech information through a sensitive information decision tree to determine sensitive information corresponding to the speech information, where the sensitive information includes at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library; in step S13, the network device detects, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, where the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object; in step S14, the network device determines whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
In step S11, the network device acquires the speech information that the user proposes to publish for the comment object and the object-related information corresponding to the comment object. In some embodiments, the comment object includes, but is not limited to, an article, a post, a video, a picture, a commodity, an application, a comment, and the like, and the related information of the comment object includes, but is not limited to, an article title, article abstract, article content, post title, post content, video title, video summary, picture title, picture summary, commodity name, commodity title, commodity profile, application name, application profile, comment content, and the like. In some embodiments, a user inputs, on the user equipment he or she uses, the speech information to be published for a certain comment object and sends the speech information and the object-related information of the comment object to the network device. In some embodiments, a user inputs, on the user equipment he or she uses, the speech information to be published for a certain comment object and sends the speech information and the identification information of the comment object to the network device, and the network device uses the identification information to look up, in its local storage (e.g., memory, cache, database, file, etc.), the object-related information of the comment object identified by that identification information.
In step S12, the network device performs sensitive word detection on the speech information through a sensitive information decision tree and determines sensitive information corresponding to the speech information, where the sensitive information includes at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library. In some embodiments, sensitive words are uncivilized words with sensitive political, violent, personal-attack, or otherwise unhealthy tendencies, and the sensitive word library includes a plurality of sensitive word entries, each of which is an identified uncivilized word. In some embodiments, the sensitive information decision tree may be constructed from the plurality of sensitive word entries in the sensitive word library; once constructed, the decision tree no longer depends on the sensitive word library, so it only needs to be built once. Each node in the sensitive information decision tree is one character, and a word composed of the characters on a chain of sequentially connected nodes is a sensitive word entry in the library. For example, if node 1 is "M1", node 2 "M2" is a child of node 1, and node 3 "M3" is a child of node 2, then the word "M1M2M3" formed by nodes 1, 2, and 3 is a sensitive word entry in the library. In some embodiments, the speech information is first split into a plurality of words; for each word, if a chain of sequentially connected nodes whose characters compose that word exists in the sensitive information decision tree, the word can be determined to be a sensitive word. In this way it can be determined whether the speech information contains sensitive words, how many it contains, and which sensitive word or words are contained.
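The patent discloses no source code; the following is a minimal illustrative sketch, in Python, of such a character-level decision tree (a trie), where the class and method names are assumptions introduced here only for illustration:

    # Minimal sketch (assumed names): each trie node is one character; a chain of
    # linked nodes spells out one sensitive word entry from the library.
    class SensitiveTrie:
        def __init__(self):
            self.root = {}

        def add(self, entry):
            node = self.root
            for ch in entry:
                node = node.setdefault(ch, {})  # one child node per character
            node["#end"] = True                 # marks a complete entry

        def contains(self, word):
            node = self.root
            for ch in word:
                if ch not in node:
                    return False
                node = node[ch]
            return "#end" in node

    trie = SensitiveTrie()
    trie.add("M1M2M3")              # placeholder entry, as in the example above
    print(trie.contains("M1M2M3"))  # True
    print(trie.contains("M1M2"))    # False: not a complete entry

Once built, the trie is queried directly, matching the description that the decision tree no longer depends on the sensitive word library after construction.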
In step S13, the network device detects, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, where the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object. In some embodiments, in addition to sensitive word detection, the speech information also needs to be examined as a whole to detect whether it is offensive to its corresponding comment object. In some embodiments, the speech information and the object-related information of the comment object may be input into a trained first emotion model based on a Bayesian classifier, which outputs whether the speech information is offensive to the comment object. In some embodiments, the speech information and the object-related information may instead be input into a trained second emotion model based on a CNN convolutional neural network, which likewise outputs whether the speech information is offensive to the comment object. In some embodiments, a large amount of sample emotion information data is needed to train the emotion model, using a sentence-level training and classification approach; the trained emotion model can predict the emotion a certain piece of speech information expresses toward a certain comment object, and can therefore determine whether that speech information constitutes an obvious personal attack on that comment object, where the sample emotion information data is sample data that must be understood in combination with related information and that yields different emotional readings when combined with different related information. In some embodiments, a Bayesian emotion model based on a Bayesian classifier is used as the base model and is upgraded with a CNN convolutional neural network; the speech information and the object-related information of the comment object can then be input into the upgraded, trained emotion model based on both the convolutional neural network and the Bayesian classifier, which outputs whether the speech information is offensive to the comment object. The upgraded model has better performance and higher accuracy than the first emotion model based only on a Bayesian classifier or the second emotion model based only on a CNN convolutional neural network.
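As a rough, hypothetical sketch of the Bayesian branch only (the patent specifies no implementation; scikit-learn, the character n-gram features, and the toy data are all assumptions), the comment text and the object-related information can be concatenated into a single classifier input:

    # Hypothetical sketch of the first emotion model (Bayesian classifier branch).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Toy training data (assumption): (object_related_info, comment_text) pairs,
    # labeled 1 = offensive, 0 = not offensive.
    samples = [("article title A", "great insight, thanks"),
               ("article title B", "the author is an idiot")]
    labels = [0, 1]

    texts = [info + " " + comment for info, comment in samples]
    model = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(1, 2)),  # character n-grams
        MultinomialNB(),                                       # Bayesian classifier
    )
    model.fit(texts, labels)

    def is_offensive(comment, info):
        # Combines the speech information with the comment object's related
        # information, as the embodiments describe.
        return bool(model.predict([info + " " + comment])[0])

The convolutional branch would replace the classifier with a trained CNN; the combined, upgraded model described above is beyond this sketch.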
In step S14, the network device determines whether to publish the speech information according to the sensitive information and the offensiveness recognition result information. In some embodiments, the speech information may be published on the network device only if the sensitive words contained in it satisfy a predetermined sensitivity condition, where the predetermined sensitivity condition includes, but is not limited to: the speech information contains no sensitive words, the number of sensitive words it contains is less than or equal to a predetermined number threshold, or the word frequency of the sensitive words it contains is less than or equal to a predetermined word-frequency threshold. In some embodiments, the speech information may be published only if it is not offensive to its corresponding comment object. In some embodiments, the speech information may be published only when both conditions hold: the sensitive words it contains satisfy the predetermined sensitivity condition, and it is not offensive to its corresponding comment object. Using emotion model detection, whether the speech information a user intends to publish contains an obvious personal attack can be accurately checked, so that cyber violence is effectively prevented, people who are vulnerable to online attacks are better protected, and attackers can be subjected to legal sanctions, thereby cleaning up the network environment through warning and regulating effects.
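A minimal sketch of this publishing gate follows; the thresholds and parameter names are assumptions, not values given in the patent:

    # Sketch of the publishing decision in step S14 (assumed thresholds).
    def may_publish(sensitive_word_freqs, offensive,
                    max_count=2, max_total_freq=3):
        # Predetermined sensitivity condition: the number of distinct sensitive
        # words and their total word frequency must stay under the thresholds.
        if len(sensitive_word_freqs) > max_count:
            return False
        if sum(sensitive_word_freqs.values()) > max_total_freq:
            return False
        # The speech information must also be non-offensive to its comment object.
        return not offensive

    print(may_publish({"M1": 2, "M3": 1}, offensive=False))  # True
    print(may_publish({"M1": 2, "M3": 1}, offensive=True))   # False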
In some embodiments, the sensitive information further includes sensitivity information corresponding to the speech information; the method then further comprises step S15 (not shown). In step S15, the network device determines the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information. In some embodiments, the more sensitive words the speech information contains, the greater its corresponding sensitivity, and the higher the word frequency of each sensitive word in the speech information, the greater its corresponding sensitivity. For example, if the speech information is "M1M2M3M2M1", it contains 2 sensitive words, "M1" and "M3"; the sensitive word "M1" appears twice in the speech information, so its word frequency is 2, and the sensitive word "M3" appears once, so its word frequency is 1; therefore, the sensitivity corresponding to the speech information is 1 × 2 + 1 × 1 = 3. In some embodiments, a piece of speech information may be published on the network device only if its corresponding sensitivity is less than or equal to a predetermined sensitivity threshold.
In some embodiments, step S15 includes: the network device determines the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information, in combination with the position information of each sensitive word in the speech information. In some embodiments, when determining the sensitivity corresponding to a certain piece of speech information, the position of each sensitive word contained in it must also be considered, with different positions corresponding to different weight coefficients. In some embodiments, positions further forward correspond to larger weight coefficients, or positions further back correspond to larger weight coefficients, or positions closer to the two ends correspond to larger weight coefficients while positions closer to the middle correspond to smaller ones. For example, if the speech information is "M1M2M3M4M5", it contains 2 sensitive words, "M1" and "M3"; the position of the sensitive word "M1" in the speech information is 0, with a corresponding weight coefficient of 1.5, and the position of the sensitive word "M3" is 2, with a corresponding weight coefficient of 0.8; therefore, the sensitivity corresponding to the speech information is 1 × 1.5 + 1 × 0.8 = 2.3.
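The weighted sum in these examples can be written compactly; a small sketch follows, assuming the position-to-weight table is supplied by the deployment:

    # Sketch of position-weighted sensitivity: each sensitive word contributes
    # word_frequency * position_weight, per the examples above.
    def sensitivity(hits):
        # hits: list of (word_frequency, position_weight) pairs
        return sum(freq * weight for freq, weight in hits)

    print(sensitivity([(2, 1.0), (1, 1.0)]))  # unweighted example: 3
    print(sensitivity([(1, 1.5), (1, 0.8)]))  # weighted example: 2.3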
In some embodiments, step S14 includes: the network device determines, according to the sensitive information and the offensiveness recognition result information, whether the speech information can be published directly; if not, it sends a legal agreement corresponding to the speech information to the user and receives feedback information about the legal agreement returned by the user, and if the feedback information indicates that the user has signed the legal agreement, the speech information is published. In some embodiments, if it is determined that the speech information cannot be published directly, a legal agreement corresponding to the speech information is generated and sent to the user equipment corresponding to the user for presentation; if the user refuses to sign the legal agreement, the speech information is rejected and cannot be published on the network device, and if the user signs the legal agreement, the speech information is published on the network device.
In some embodiments, publishing the speech information if the feedback information indicates that the user has signed the legal agreement includes: if the feedback information indicates that the user has signed the legal agreement, publishing the speech information, confirming the user as a potentially dangerous user, and storing and sending the speech information, the object-related information, and the user-related information of the user to the network device corresponding to a designated national institution for dangerous-user archiving. In some embodiments, if the user signs the legal agreement corresponding to the speech information, the speech information is published on the network device, the user is identified as a potentially dangerous user, and the speech information, the object-related information of the corresponding comment object, and the user-related information of the user (for example, a user ID, a user name, the user's personal information, account-related information of the account used to publish the speech information, device-related information of the user equipment used to publish it, and the like) are stored on the network device and pushed to a server corresponding to the designated national institution for dangerous-user archiving.
In some embodiments, the method further comprises: the network device receives dangerous-user determination result information about the user sent by the network device corresponding to the designated national institution, determines penalty information corresponding to the user according to the dangerous-user determination result information, and penalizes the user according to the penalty information. In some embodiments, the designated national institution may determine whether the user has violated the law; if so, it may identify the user as a dangerous user, generate dangerous-user determination result information indicating this, and send it to the network device that published the speech information. The network device may then determine the penalty information corresponding to the user according to the dangerous-user determination result information and penalize the user accordingly, where the penalty information includes, but is not limited to, a warning, a point deduction, a credit downgrade, a temporary ban, a permanent ban, and the like. In some embodiments, the dangerous-user determination result information further includes a danger level, a danger degree, and the like corresponding to the user; the higher the user's danger level and danger degree, the heavier the corresponding penalty. As an example, as shown in Fig. 3, the application end sends the speech information published by the user to the server end; the server end pulls the speech information and related information such as the article title of the corresponding comment object, intelligently detects the speech information, and checks whether it carries offensive speech. If so, a legal agreement is returned to the user for signing; if the user refuses, the speech information is rejected. If the user agrees to sign the legal agreement, the speech information is published, related information such as the user information, device number, and account number is stored and pushed to a national institution for dangerous-user archiving, the national institution determines whether the user has violated the law and sends the determination result to the server, and if the user has violated the law, the server notifies the application end to impose a corresponding penalty on the user according to the situation.
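The Fig. 3 flow can be summarized as the following hypothetical server-side sketch; every helper below is a stand-in stub for the subsystem it names, and all names are assumptions, not the patent's interfaces:

    # Hypothetical sketch of the Fig. 3 flow (all helper names are assumptions).
    def pull_related_info(obj): return obj.get("title", "")
    def detect_offensive(text, info): return "idiot" in text  # stub for the model
    def publish(text): print("published:", text)
    def reject(text): print("rejected:", text)
    def request_legal_agreement_signature(user, text): return user.get("signs", False)
    def store_dangerous_user(user, text, info):
        return {"user": user, "text": text, "obj": info}      # user info, device, account
    def push_to_national_institution(record): print("pushed record for review")

    def handle_comment(user, speech_info, comment_object):
        obj_info = pull_related_info(comment_object)          # e.g. the article title
        if not detect_offensive(speech_info, obj_info):
            publish(speech_info)                              # clean: publish directly
        elif request_legal_agreement_signature(user, speech_info):
            publish(speech_info)                              # signed: publish, then archive
            push_to_national_institution(
                store_dangerous_user(user, speech_info, obj_info))
        else:
            reject(speech_info)                               # user refused to sign

    handle_comment({"signs": True}, "the author is an idiot", {"title": "some article"})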
In some embodiments, the method further comprises: the network device clusters the plurality of sensitive word entries in the sensitive word library according to the pinyin initial of the first character of each entry, takes each pinyin initial as a root node, takes the first character of each sensitive word entry as a child node of the corresponding pinyin initial, takes the second character of the entry as a child node of the first character, and so on, thereby establishing the sensitive information decision tree. In some embodiments, all sensitive word entries in the library are clustered by the pinyin initial of their first character to obtain a set of entries for each pinyin initial, and each pinyin initial serves as a root node of the sensitive information decision tree. In some embodiments, the sensitive information decision tree includes 26 root nodes, namely the 26 pinyin initials from "a" to "z". In some embodiments, for the root node corresponding to each pinyin initial, the first character of each sensitive word entry in that initial's entry set becomes a child node of the root, the second character of the entry becomes a child node of the first character, and so on, with each subsequent character of the entry becoming a child node of the previous character, thereby constructing the sensitive information decision tree. For example, for a sensitive word entry "M1M2M3" whose first character "M1" has the pinyin initial "a", the root node is "a", the child of root node "a" is "M1", the child of "M1" is "M2", and the child of "M2" is "M3". In some embodiments, if a sensitive word entry is an English entry, the English entries are clustered according to their first letter to obtain an English entry set for each letter; each letter serves as a root node of the sensitive information decision tree, the second letter becomes a child node of the root node corresponding to the first letter, and so on, with each subsequent letter of the English entry becoming a child node of the previous letter.
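A sketch of this construction follows; `pinyin_initial` is a placeholder for whatever romanization lookup the deployment uses (the patent does not specify one), stubbed here so the sketch runs on the ASCII placeholders:

    # Sketch: pinyin initials as root nodes, one character per child level.
    def pinyin_initial(ch):
        # Assumed helper: a real system would map a Chinese character to its
        # pinyin initial (e.g. via a pinyin package); this stub only lower-cases
        # the ASCII placeholders used in this sketch.
        return ch.lower()

    def build_decision_tree(sensitive_word_entries):
        roots = {}                              # up to 26 roots, "a".."z"
        for entry in sensitive_word_entries:
            node = roots.setdefault(pinyin_initial(entry[0]), {})
            for ch in entry:                    # first char under the root,
                node = node.setdefault(ch, {})  # each next char one level down
            node["#end"] = True
        return roots

    tree = build_decision_tree(["M1M2M3"])      # root "m" -> "M" -> "1" -> ...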
In some embodiments, step S12 includes step S121 (not shown) and step S122 (not shown). In step S121, the network device filters meaningless character information out of the speech information to obtain filtered speech information; in step S122, the network device performs sensitive word detection on the filtered speech information through the sensitive information decision tree and determines the sensitive information corresponding to the speech information. In some embodiments, the speech information needs to be preprocessed by filtering out its meaningless characters to obtain the filtered speech information, where the meaningless characters include, but are not limited to, punctuation marks and meaningless function words, such as structural particles (e.g., the Chinese particles 的, 地, 得), modal particles (e.g., 了, 吗, 呀), and connectives used only to strengthen the tone (e.g., "although", "but", "even more"). In speech information these meaningless characters appear frequently but are never sensitive words, and their presence greatly increases sensitive word detection time, so they need to be filtered out.
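A sketch of this preprocessing filter, where the stop-character set merely illustrates the categories above and is an assumption rather than a list from the patent:

    # Sketch of step S121: strip punctuation and meaningless function characters.
    MEANINGLESS = (set("，。！？、；：…") | set(",.!?;:'\"")
                   | set("的地得了吗呀"))  # assumed stop-character set

    def filter_speech(text):
        return "".join(ch for ch in text if ch not in MEANINGLESS)

    print(filter_speech("M1，M2的M3！"))  # "M1M2M3"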
In some embodiments, step S122 includes: the network device splits the filtered speech information into a plurality of pieces of word information and determines the sensitive information corresponding to the speech information by looking up those pieces of word information in the sensitive information decision tree. In some embodiments, if one or more of the pieces of word information are found in the sensitive information decision tree, they are determined to be the sensitive information corresponding to the speech information; in some embodiments, if a synonym or near-synonym of one or more pieces of word information is found in the sensitive information decision tree, those pieces of word information are likewise determined to be the sensitive information corresponding to the speech information. In some embodiments, the plurality of pieces of word information may be split from the filtered speech information based on predetermined word types (e.g., adjectives, verbs, etc.).
In some embodiments, splitting the filtered speech information into a plurality of pieces of word information includes: starting from the first character of the filtered speech information, determining a piece of word information formed by the first character and several consecutive characters following it, then continuing in the same way from the next character after that word, thereby splitting the filtered speech information into a plurality of pieces of word information. In some embodiments, starting with the first character of the filtered speech information, if that character and the N characters immediately following it can form a word, the combination of those characters is determined to be one piece of word information, and the next piece of word information is determined starting from the character after them. As an example (see the sketch below): starting from the first character of the filtered speech information, it is determined that the first character and the following 3 characters form a word, so those 4 characters together become one piece of word information; then, starting from the 5th character, if the 5th character cannot form a word with the 6th character, the 5th character alone is determined to be a piece of word information, and the next piece is determined starting from the 6th character, and so on, until the filtered speech information is split into a plurality of pieces of word information.
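A sketch of this greedy, longest-match-first split, against an assumed lexicon of known words and an assumed maximum word length:

    # Sketch of the splitting in step S122: longest match first, single characters
    # as the fallback (the lexicon and max word length are assumptions).
    def split_words(text, lexicon, max_len=4):
        words, i = [], 0
        while i < len(text):
            for n in range(min(max_len, len(text) - i), 0, -1):
                if n == 1 or text[i:i + n] in lexicon:
                    words.append(text[i:i + n])
                    i += n
                    break
        return words

    print(split_words("M1M2M3M4M5", {"M1M2", "M3"}, max_len=4))
    # ['M1M2', 'M3', 'M', '4', 'M', '5']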
In some embodiments, the method further comprises step S16 (not shown). In step S16, the network device performs learning and training using the object-related information of a plurality of comment objects and one or more pieces of sample emotion information data corresponding to each comment object, to obtain the emotion model. In some embodiments, a large number of comment objects and a large amount of sample emotion information data corresponding to each comment object are required for the learning and training of the emotion model, using a sentence-level training and classification approach, where the sample emotion information data is sample data that must be understood in combination with a specific comment object and that yields different emotional readings when combined with different comment objects.
In some embodiments, each piece of sample emotion information data includes score information and comment information corresponding to that score information; step S16 then includes: the network device performs learning and training using the object-related information of the plurality of comment objects, one or more pieces of score information corresponding to each comment object, and the comment information corresponding to each piece of score information, to obtain the emotion model. In some embodiments, the comment object may be a commodity in an e-commerce website or e-commerce app; the commodity name, commodity title, and commodity profile of each commodity, a plurality of pieces of user score information corresponding to the commodity, and the user comment information corresponding to each piece of user score information may be used as training data for the learning and training of the emotion model. In some embodiments, the comment object may also be an application in an application market; the application name and application profile of each application, the user score information corresponding to the application, and the user comment information corresponding to each piece of user score information may likewise be used as training data. In some embodiments, the user score information may be vectorized: for example, user scores greater than or equal to a predetermined score threshold are quantized to 1 and scores below the threshold are quantized to 0, so that the user score information is quantized into positive and negative classes, which are then used for model training (a small sketch follows).
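The vectorization step can be sketched in one line; the 3-star threshold is an assumption, as the patent names no specific value:

    # Sketch: quantize user scores into binary sentiment labels (assumed 3-star
    # threshold); 1 = positive, 0 = negative.
    def binarize_scores(scores, threshold=3):
        return [1 if s >= threshold else 0 for s in scores]

    print(binarize_scores([5, 4, 2, 1]))  # [1, 1, 0, 0]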
In some embodiments, the emotion model comprises a Bayesian emotion model based on a Bayesian classifier and an emotion model based on a convolutional neural network; step S13 then includes: the network device detects, in combination with the object-related information, content offensiveness recognition result information corresponding to the speech information through the Bayesian emotion model, where the content offensiveness recognition result information is used to indicate whether the content of the speech information is offensive to the comment object; and detects, in combination with the object-related information, emotional offensiveness recognition result information corresponding to the speech information through the convolutional-neural-network emotion model, where the emotional offensiveness recognition result information is used to indicate whether the emotion of the speech information is offensive to the comment object; step S14 then includes: the network device determines whether to publish the speech information according to the sensitive information, the content offensiveness recognition result information, and the emotional offensiveness recognition result information. In some embodiments, because the Bayesian emotion model based on a Bayesian classifier judges the literal meaning of the speech information as positive or negative and cannot handle neutral words or words carrying ironic meanings and emotions, the Bayesian emotion model is used to detect whether the speech information is offensive in content toward its corresponding comment object; after that detection, the emotion model based on the CNN convolutional neural network is used to detect whether the emotion expressed by the speech information is positive or negative, i.e., whether the speech information is emotionally offensive toward its corresponding comment object. In some embodiments, a piece of speech information is determined to be offensive only if it is both content-offensive and emotionally offensive toward its corresponding comment object. In some embodiments, a piece of speech information is determined to be offensive as long as it is either content-offensive or emotionally offensive toward its corresponding comment object.
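Combining the two verdicts is then a one-liner; both the conjunctive and disjunctive policies described in these embodiments appear in this sketch behind a flag:

    # Sketch of merging the two offensiveness verdicts from the two models.
    def overall_offensive(content_offensive, emotion_offensive, require_both=True):
        if require_both:                  # offensive only if both models agree
            return content_offensive and emotion_offensive
        return content_offensive or emotion_offensive  # either one suffices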
In some embodiments, the method further comprises performing step S17 (not shown) after step S14. In step S17, if it is determined that the speech information is to be published and the speech information contains at least one sensitive word, the network device performs sensitive word processing on the speech information according to the at least one sensitive word and publishes the processed speech information. In some embodiments, if it is determined that the speech information can be published and contains at least one sensitive word, the at least one sensitive word may be removed from the speech information, and the speech information with those words removed is displayed.
In some embodiments, step S17 includes: if the network device determines that the speech information is to be published and the speech information contains at least one sensitive word, sending replacement prompt information corresponding to the at least one sensitive word to the user; receiving replacement word information corresponding to the at least one sensitive word returned by the user, where the replacement word information is determined by the user according to the replacement prompt information; and performing a replacement operation on the at least one sensitive word according to the replacement word information and publishing the replaced speech information. In some embodiments, if it is determined that the speech information can be published and contains at least one sensitive word, replacement prompt information about the at least one sensitive word is generated and sent to the user who is about to publish the speech information, prompting the user to determine a replacement word for the at least one sensitive word; the at least one sensitive word in the speech information is then replaced with the replacement word determined by the user, and the replaced speech information is published. In some embodiments, the replacement word corresponding to the at least one sensitive word may be entered manually by the user, or several candidate replacement words corresponding to the at least one sensitive word may be provided for the user to choose from, or, if the user does not determine a replacement word in time, the network device may set a predetermined default character (e.g., "*") as the replacement word for the at least one sensitive word.
Fig. 2 shows a structural diagram of a network device for detecting speech information according to an embodiment of the present application; the network device includes a one-one module 11, a one-two module 12, a one-three module 13, and a one-four module 14. The one-one module 11 is configured to acquire speech information that a user proposes to publish for a comment object and object-related information corresponding to the comment object; the one-two module 12 is configured to perform sensitive word detection on the speech information through a sensitive information decision tree and determine sensitive information corresponding to the speech information, where the sensitive information includes at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library; the one-three module 13 is configured to detect, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, where the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object; and the one-four module 14 is configured to determine whether to publish the speech information according to the sensitive information and the offensiveness recognition result information.
The one-one module 11 is configured to acquire the speech information that the user proposes to publish for the comment object and the object-related information corresponding to the comment object. In some embodiments, the comment object includes, but is not limited to, an article, a post, a video, a picture, a commodity, an application, a comment, and the like, and the related information of the comment object includes, but is not limited to, an article title, article abstract, article content, post title, post content, video title, video summary, picture title, picture summary, commodity name, commodity title, commodity profile, application name, application profile, comment content, and the like. In some embodiments, a user inputs, on the user equipment he or she uses, the speech information to be published for a certain comment object and sends the speech information and the object-related information of the comment object to the network device. In some embodiments, a user inputs, on the user equipment he or she uses, the speech information to be published for a certain comment object and sends the speech information and the identification information of the comment object to the network device, and the network device uses the identification information to look up, in its local storage (e.g., memory, cache, database, file, etc.), the object-related information of the comment object identified by that identification information.
The one-two module 12 is configured to perform sensitive word detection on the speech information through a sensitive information decision tree and determine sensitive information corresponding to the speech information, where the sensitive information includes at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed from a sensitive word library. In some embodiments, sensitive words are uncivilized words with sensitive political, violent, personal-attack, or otherwise unhealthy tendencies, and the sensitive word library includes a plurality of sensitive word entries, each of which is an identified uncivilized word. In some embodiments, the sensitive information decision tree may be constructed from the plurality of sensitive word entries in the sensitive word library; once constructed, the decision tree no longer depends on the sensitive word library, so it only needs to be built once. Each node in the sensitive information decision tree is one character, and a word composed of the characters on a chain of sequentially connected nodes is a sensitive word entry in the library; for example, if node 1 is "M1", node 2 "M2" is a child of node 1, and node 3 "M3" is a child of node 2, then the word "M1M2M3" formed by nodes 1, 2, and 3 is a sensitive word entry in the library. In some embodiments, the speech information is first split into a plurality of words; for each word, if a chain of sequentially connected nodes whose characters compose that word exists in the sensitive information decision tree, the word can be determined to be a sensitive word, so that it can be determined whether the speech information contains sensitive words, how many it contains, and which sensitive word or words are contained.
The one-three module 13 is configured to detect, in combination with the object-related information, offensiveness recognition result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, where the offensiveness recognition result information is used to indicate whether the speech information is offensive to the comment object. In some embodiments, in addition to sensitive word detection, the speech information also needs to be examined as a whole to detect whether it is offensive to its corresponding comment object. In some embodiments, the speech information and the object-related information of the comment object may be input into a trained first emotion model based on a Bayesian classifier, which outputs whether the speech information is offensive to the comment object. In some embodiments, the speech information and the object-related information may instead be input into a trained second emotion model based on a CNN convolutional neural network, which likewise outputs whether the speech information is offensive to the comment object. In some embodiments, a large amount of sample emotion information data is needed to train the emotion model, using a sentence-level training and classification approach; the trained emotion model can predict the emotion a certain piece of speech information expresses toward a certain comment object, and can therefore determine whether that speech information constitutes an obvious personal attack on that comment object, where the sample emotion information data is sample data that must be understood in combination with related information and that yields different emotional readings when combined with different related information. In some embodiments, a Bayesian emotion model based on a Bayesian classifier is used as the base model and is upgraded with a CNN convolutional neural network; the speech information and the object-related information of the comment object can then be input into the upgraded, trained emotion model based on both the convolutional neural network and the Bayesian classifier, which outputs whether the speech information is offensive to the comment object. The upgraded model has better performance and higher accuracy than the first emotion model based only on a Bayesian classifier or the second emotion model based only on a CNN convolutional neural network.
A one-four module 14, configured to determine whether to publish the speech information according to the sensitive information and the offensiveness identification result information. In some embodiments, the speech information may be published on the network device only if the sensitive words it contains satisfy a predetermined sensitive condition, where the predetermined sensitive condition includes, but is not limited to: the speech information contains no sensitive words; the number of sensitive words it contains is less than or equal to a predetermined number threshold; or the word frequency of the sensitive words it contains is less than or equal to a predetermined word-frequency threshold. In some embodiments, the speech information may be published only if it is not offensive to its corresponding comment object. In some embodiments, the speech information may be published only when both conditions hold: its sensitive words satisfy the predetermined sensitive condition and it is not offensive to its corresponding comment object. By using emotion model detection, obvious personal attacks in the speech information a user intends to publish can be accurately screened, which effectively prevents network violence, better protects groups vulnerable to online attacks, and subjects attackers to legal sanction, thereby cleaning up the network environment through warning and normative effects. A sketch of the predetermined sensitive condition check follows.
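A minimal sketch of one possible "predetermined sensitive condition" check follows; the threshold values and the frequency definition are illustrative assumptions, not values specified by the patent:

    def satisfies_sensitive_condition(sensitive_words, speech,
                                      max_count=3, max_word_freq=0.05):
        # Sketch of the predetermined sensitive condition; the thresholds
        # are illustrative assumptions, not values given by the patent.
        if not sensitive_words:          # no sensitive words at all
            return True
        count = len(sensitive_words)     # number of sensitive words contained
        # Word frequency here: share of the speech taken up by sensitive words.
        freq = sum(len(w) for w in sensitive_words) / max(len(speech), 1)
        return count <= max_count and freq <= max_word_freq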
In some embodiments, the sensitive information further includes sensitivity information corresponding to the speech information; wherein the device further comprises a one-five module 15 (not shown), configured to determine the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, the one-five module 15 is configured to: determine the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information, in combination with the position information of each sensitive word in the speech information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, the one-four module 14 is configured to: determine whether to publish the speech information according to the sensitive information and the offensiveness identification result information; and if not, send the legal agreement corresponding to the speech information to the user, receive feedback information related to the legal agreement returned by the user, and publish the speech information if the feedback information indicates that the user has signed the legal agreement. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, publishing the speech information if the feedback information indicates that the user has signed the legal agreement comprises: if the feedback information indicates that the user has signed the legal agreement, publishing the speech information, confirming the user as a potentially dangerous user, and storing the speech information, the object related information, and the user related information of the user and sending them to a network device corresponding to a designated national institution for dangerous-user record keeping, as in the sketch below. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
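The flow of the two preceding paragraphs could be sketched as follows; the callables passed in are hypothetical hooks into the surrounding system, not interfaces named by the patent:

    def moderate(speech, user, sensitive_ok, not_offensive,
                 publish, send_agreement, await_feedback, report):
        # Hedged sketch of the flow above; all callables are hypothetical.
        if sensitive_ok and not_offensive:
            publish(speech)                  # both checks passed: publish directly
            return
        send_agreement(user, speech)         # a check failed: offer the legal agreement
        if await_feedback(user).signed:      # user signs and takes legal responsibility
            publish(speech)
            report(user, speech)             # store as potentially-dangerous-user data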
In some embodiments, the apparatus is further configured to: receive dangerous user identification result information about the user sent by the network device corresponding to the designated national institution, determine penalty information corresponding to the user according to the dangerous user identification result information, and impose the corresponding penalty on the user according to the penalty information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, the apparatus is further configured to: cluster the plurality of sensitive word entries in the sensitive word library according to the pinyin initial of the first character of each entry, take each pinyin initial as a root node, take the first character of each entry as a child node of its pinyin initial, take the second character of the entry as a child node of the first character, and so on, to construct the sensitive information decision tree, as in the sketch below. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
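A sketch of this clustered construction follows, assuming the third-party pypinyin package as one possible way to obtain pinyin initials; the patent does not name a tool:

    from pypinyin import lazy_pinyin  # assumed helper package (pip install pypinyin)

    def pinyin_initial(ch):
        # First letter of the pinyin of one Chinese character, e.g. "敏" -> "m".
        return lazy_pinyin(ch)[0][0]

    def build_clustered_tree(sensitive_word_library):
        # Root layer: one branch per pinyin initial; below it, one node per character.
        root = {}
        for word in sensitive_word_library:
            node = root.setdefault(pinyin_initial(word[0]), {})
            for ch in word:
                node = node.setdefault(ch, {})
            node["$end"] = True   # marks a complete sensitive word entry
        return root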
In some embodiments, the one-two module 12 includes a one-two-one module 121 (not shown) and a one-two-two module 122 (not shown): the one-two-one module 121 is configured to filter meaningless character information out of the speech information to obtain filtered speech information, and the one-two-two module 122 is configured to perform sensitive word detection on the filtered speech information through the sensitive information decision tree and determine the sensitive information corresponding to the speech information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, the one-two-two module 122 is configured to: split the filtered speech information into a plurality of pieces of word information and look each piece up in the sensitive information decision tree to determine the sensitive information corresponding to the speech information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, splitting the filtered speech information into a plurality of pieces of word information includes: starting from the first character of the filtered speech information, forming one piece of word information from that character and several consecutive characters after it, then continuing from the character after that piece, and so on, until the filtered speech information is split into a plurality of pieces of word information, as in the sketch below. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
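A sketch of the filtering and splitting steps follows; the regular expression and the fixed word length are illustrative assumptions:

    import re

    def filter_meaningless(speech):
        # Strip punctuation, whitespace and similar filler that could be used
        # to break a sensitive word apart ("meaningless character information").
        return re.sub(r"[\W_]+", "", speech)

    def split_into_words(speech, length=4):
        # Take the first character plus the next few consecutive characters as
        # one word, then continue from the character after that word; the
        # word length of 4 is an illustrative choice, not from the patent.
        return [speech[i:i + length] for i in range(0, len(speech), length)]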
In some embodiments, the apparatus further comprises a one-six module 16 (not shown), configured to perform learning and training on the object related information of a plurality of comment objects and one or more pieces of sample emotion information data corresponding to each comment object, to obtain the emotion model. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, each piece of sample emotion information data includes score information and the comment information corresponding to that score information; wherein the one-six module 16 is configured to: perform learning and training on the object related information of the plurality of comment objects, the one or more pieces of score information corresponding to each comment object, and the comment information corresponding to each piece of score information, to obtain the emotion model, for example as in the sketch below. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
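One possible way to train the Bayesian emotion model from such (score, comment) samples is sketched below using scikit-learn; the tool choice, the character n-gram features, and the binarization of scores into labels are all assumptions:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def train_bayes_emotion_model(samples):
        # samples: iterable of (object_related_info, score, comment) triples.
        # Binarizing a 1-5 score into offensive / non-offensive labels and
        # using character n-grams are illustrative choices, not from the patent.
        texts  = [f"{obj} {comment}" for obj, score, comment in samples]
        labels = [1 if score <= 2 else 0 for obj, score, comment in samples]
        model = make_pipeline(
            CountVectorizer(analyzer="char", ngram_range=(1, 2)),
            MultinomialNB(),
        )
        model.fit(texts, labels)
        return model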
In some embodiments, the emotion model comprises a Bayesian emotion model based on a Bayesian classifier and an emotion model based on a convolutional neural network. The one-three module 13 is configured to: detect, in combination with the object related information, content offensiveness identification result information corresponding to the speech information through the Bayesian emotion model, where the content offensiveness identification result information indicates whether the speech information is offensive in content toward the comment object; and detect, in combination with the object related information, emotion offensiveness identification result information corresponding to the speech information through the convolutional-neural-network-based emotion model, where the emotion offensiveness identification result information indicates whether the speech information is offensive in emotion toward the comment object. The one-four module 14 is then configured to determine whether to publish the speech information according to the sensitive information, the content offensiveness identification result information, and the emotion offensiveness identification result information, as in the sketch below. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
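A sketch of running both detectors and handing both results to the publish decision follows; cnn_predict stands in for any wrapper around the CNN-based model and is hypothetical:

    def detect_offensiveness(speech, object_info, bayes_model, cnn_predict):
        # Run both detectors and return both results; the publish decision
        # then combines them with the sensitive information (see module 14).
        content_offensive = bayes_model.predict([f"{object_info} {speech}"])[0] == 1
        emotion_offensive = bool(cnn_predict(speech, object_info))  # hypothetical CNN hook
        return content_offensive, emotion_offensive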
In some embodiments, the apparatus further comprises a one-seven module 17 (not shown), configured to, if it is determined that the speech information is to be published and the speech information contains at least one sensitive word, perform sensitive word processing on the speech information according to the at least one sensitive word and publish the processed speech information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
In some embodiments, the one-seven module 17 is configured to: if it is determined that the speech information is to be published and the speech information contains at least one sensitive word, send replacement prompt information corresponding to the at least one sensitive word to the user; receive replacement word information corresponding to the at least one sensitive word returned by the user, where the replacement word information is determined by the user according to the replacement prompt information; and perform a replacement operation on the at least one sensitive word according to the replacement word information and publish the replaced speech information, as in the sketch below. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1 and are therefore not described again, but are incorporated herein by reference.
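A sketch of this replacement flow follows; ask_user is a hypothetical callback representing the replacement prompt and the user's returned replacement word:

    def replace_sensitive_words(speech, sensitive_words, ask_user):
        # Prompt the user for a substitute for each sensitive word and
        # rewrite the speech before publishing it.
        for word in sensitive_words:
            speech = speech.replace(word, ask_user(word))
        return speech   # the processed speech information to be published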
FIG. 4 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
In some embodiments, as shown in FIG. 4, the system 300 can be implemented as any of the devices in the various embodiments described. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform the actions described herein.
For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or any suitable device or component in communication with system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 315 may include a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed, or storage resources that are accessible by the device without necessarily being part of it. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).
In various embodiments, system 300 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch-screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application-specific integrated circuit (ASIC), and speakers.
The present application also provides a computer-readable storage medium having stored thereon computer code which, when executed, performs the method described in any of the preceding embodiments.
The present application also provides a computer program product which, when executed by a computer device, performs the method described in any of the preceding embodiments.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the preceding embodiments.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or similar mechanism such as is embodied as part of spread spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (18)

1. A method for detecting speech information, applied to a network device side, wherein the method comprises:
acquiring speech information that a user intends to publish for a comment object, and object related information corresponding to the comment object;
performing sensitive word detection on the speech information through a sensitive information decision tree and determining sensitive information corresponding to the speech information, wherein the sensitive information comprises at least one sensitive word contained in the speech information, and the sensitive information decision tree is constructed according to a sensitive word library;
detecting, in combination with the object related information, offensiveness identification result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier, wherein the offensiveness identification result information is used for indicating whether the speech information is offensive to the comment object;
and determining whether to publish the speech information according to the sensitive information and the offensiveness identification result information.
2. The method of claim 1, wherein the sensitive information further comprises sensitivity information corresponding to the speech information;
wherein the method further comprises:
and determining the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information.
3. The method of claim 2, wherein the determining the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information comprises:
and determining the sensitivity information according to the quantity information of the at least one sensitive word and the word frequency information of each sensitive word in the speech information, in combination with the position information of each sensitive word in the speech information.
4. The method of claim 1, wherein the determining whether to publish the speech information according to the sensitive information and the offensiveness identification result information comprises:
determining whether to publish the speech information according to the sensitive information and the offensiveness identification result information; and if not, sending the legal agreement corresponding to the speech information to the user, receiving feedback information related to the legal agreement returned by the user, and publishing the speech information if the feedback information indicates that the user has signed the legal agreement.
5. The method of claim 4, wherein the publishing the speech information if the feedback information indicates that the user has signed the legal agreement comprises:
and if the feedback information indicates that the user has signed the legal agreement, publishing the speech information, confirming the user as a potentially dangerous user, and storing the speech information, the object related information, and the user related information of the user and sending them to a network device corresponding to a designated national institution for dangerous-user record keeping.
6. The method of claim 5, wherein the method further comprises:
receiving dangerous user identification result information about the user sent by the network device corresponding to the designated national institution, and determining penalty information corresponding to the user according to the dangerous user identification result information;
and imposing the corresponding penalty on the user according to the penalty information.
7. The method of claim 1, wherein the method further comprises:
clustering the plurality of sensitive word entries in the sensitive word library according to the pinyin initial of the first character of each entry, taking each pinyin initial as a root node, taking the first character of each entry as a child node of the corresponding pinyin initial, taking the second character as a child node of the first character, and so on, to construct the sensitive information decision tree.
8. The method of claim 1, wherein the performing sensitive word detection on the speech information through a sensitive information decision tree to determine sensitive information corresponding to the speech information comprises:
filtering meaningless character information in the speech information to obtain filtered speech information;
and performing sensitive word detection on the filtered speech information through the sensitive information decision tree to determine the sensitive information corresponding to the speech information.
9. The method of claim 8, wherein the performing sensitive word detection on the filtered speech information through the sensitive information decision tree to determine the sensitive information corresponding to the speech information comprises:
and splitting the filtered speech information into a plurality of pieces of word information, and searching for each piece in the sensitive information decision tree to determine the sensitive information corresponding to the speech information.
10. The method of claim 9, wherein the splitting the filtered speech information into a plurality of pieces of word information comprises:
and determining, starting from the first character of the filtered speech information, one piece of word information formed by that character and several consecutive characters after it, then continuing from the character after that piece, and so on, until the filtered speech information is split into a plurality of pieces of word information.
11. The method of claim 1, wherein the method further comprises:
and performing learning training through object related information of a plurality of comment objects and one or more sample emotion information data corresponding to each comment object to obtain the emotion model.
12. The method of claim 11, wherein each piece of sample emotion information data includes score information and comment information corresponding to the score information;
wherein the performing learning training through object related information of a plurality of comment objects and one or more pieces of sample emotion information data corresponding to each comment object to obtain the emotion model comprises:
and performing learning training through the object related information of the plurality of comment objects, the one or more pieces of score information corresponding to each comment object, and the comment information corresponding to each piece of score information, to obtain the emotion model.
13. The method of claim 1, wherein the emotion model comprises a Bayesian emotion model based on a Bayesian classifier and an emotion model based on a convolutional neural network;
wherein the detecting, in combination with the object related information, offensiveness identification result information corresponding to the speech information through an emotion model based on a convolutional neural network and/or a Bayesian classifier comprises:
detecting, in combination with the object related information, content offensiveness identification result information corresponding to the speech information through the Bayesian emotion model, wherein the content offensiveness identification result information is used for indicating whether the speech information is offensive in content toward the comment object;
detecting, in combination with the object related information, emotion offensiveness identification result information corresponding to the speech information through the convolutional-neural-network-based emotion model, wherein the emotion offensiveness identification result information is used for indicating whether the speech information is offensive in emotion toward the comment object;
wherein the determining whether to publish the speech information according to the sensitive information and the offensiveness identification result information comprises:
and determining whether to publish the speech information according to the sensitive information, the content offensiveness identification result information, and the emotion offensiveness identification result information.
14. The method of claim 1, wherein, after the determining whether to publish the speech information according to the sensitive information and the offensiveness identification result information, the method further comprises:
and if it is determined that the speech information is to be published and the speech information contains at least one sensitive word, performing sensitive word processing on the speech information according to the at least one sensitive word and publishing the processed speech information.
15. The method of claim 14, wherein the performing sensitive word processing on the speech information according to the at least one sensitive word and publishing the processed speech information, if it is determined that the speech information is to be published and the speech information contains at least one sensitive word, comprises:
if it is determined that the speech information is to be published and the speech information contains at least one sensitive word, sending replacement prompt information corresponding to the at least one sensitive word to the user;
receiving replacement word information corresponding to the at least one sensitive word returned by the user, wherein the replacement word information is determined by the user according to the replacement prompt information;
and performing a replacement operation on the at least one sensitive word according to the replacement word information, and publishing the replaced speech information.
16. An apparatus for detecting speech information, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any of claims 1 to 15.
17. A computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform operations of any of the methods of claims 1-15.
18. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method according to any one of claims 1 to 15 when executed by a processor.
CN202110112955.6A 2021-01-27 2021-01-27 Method and equipment for detecting speech information Pending CN112784016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110112955.6A CN112784016A (en) 2021-01-27 2021-01-27 Method and equipment for detecting speech information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110112955.6A CN112784016A (en) 2021-01-27 2021-01-27 Method and equipment for detecting speech information

Publications (1)

Publication Number Publication Date
CN112784016A true CN112784016A (en) 2021-05-11

Family

ID=75758445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110112955.6A Pending CN112784016A (en) 2021-01-27 2021-01-27 Method and equipment for detecting speech information

Country Status (1)

Country Link
CN (1) CN112784016A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10037491B1 (en) * 2014-07-18 2018-07-31 Medallia, Inc. Context-based sentiment analysis
CN108768840A (en) * 2018-06-12 2018-11-06 北京京东金融科技控股有限公司 A kind of method and apparatus of account management
CN110162621A (en) * 2019-02-22 2019-08-23 腾讯科技(深圳)有限公司 Disaggregated model training method, abnormal comment detection method, device and equipment
CN111368038A (en) * 2020-03-09 2020-07-03 广州市百果园信息技术有限公司 Keyword extraction method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUE Pengqiang; NURBOL; WUSHOUR Silamu: "Sensitive Information Filtering Algorithm Based on Network Text Information" (基于网络文本信息的敏感信息过滤算法), Computer Engineering and Design, no. 09 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239674A (en) * 2021-06-15 2021-08-10 中国银行股份有限公司 User comment management method and device
CN113407658A (en) * 2021-07-06 2021-09-17 北京容联七陌科技有限公司 Method and system for filtering and replacing text content sensitive words in online customer service scene
CN117056522A (en) * 2023-10-11 2023-11-14 青岛网信信息科技有限公司 Internet language optimizing processing method, medium and system
CN117056522B (en) * 2023-10-11 2024-03-15 青岛网信信息科技有限公司 Internet language optimizing processing method, medium and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination