WO2021169499A1 - Method, apparatus and system for monitoring network bad data, and storage medium - Google Patents

Method, apparatus and system for monitoring network bad data, and storage medium

Info

Publication number
WO2021169499A1
Authority
WO
WIPO (PCT)
Prior art keywords
bad
word
words
vocabulary
preset
Prior art date
Application number
PCT/CN2020/136403
Other languages
English (en)
Chinese (zh)
Inventor
张国辉
钱柏丞
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-02-26
Filing date
2020-12-15
Publication date
2021-09-02
Application filed by 平安科技(深圳)有限公司
Publication of WO2021169499A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Definitions

  • This application relates to the technical field of big data processing, and in particular to a method, device and computer-readable storage medium for monitoring network bad data.
  • This application provides a method, device, and computer-readable storage medium for monitoring network bad data.
  • The main idea of the method is as follows. After word segmentation of the target text, each segmented word is compared with the bad words in a preset bad vocabulary comparison table, and identical bad words are loaded into the first bad vocabulary list. Because the comparison table contains only a limited number of bad words, the text may contain bad words that are merely similar to listed ones, so the segmented words of the target text are evaluated again with a word similarity calculation formula, and words that fall within the preset similarity threshold range are also loaded into the first bad vocabulary list. Since bad words found by similarity calculation are not certain, a sentiment analysis algorithm and a word position structure method are then used to screen the non-bad words out of the first bad vocabulary list, and the third bad vocabulary list is finally output.
  • In this way, unregistered bad words (bad words not yet recorded in the comparison table) can be found more accurately, and the recorded bad vocabulary is more accurate than in the prior art.
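  • Specifically, this application provides a method for monitoring network bad data, which includes:
  • performing word segmentation processing on a target text to obtain a word segmentation set;
  • comparing the words in the word segmentation set with a preset bad vocabulary comparison table, screening bad words out of the word segmentation set, loading the bad words into a first bad vocabulary list, and taking the remaining words in the word segmentation set as candidate words;
  • calculating, through a word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and loading the candidate words whose average similarity is greater than a preset similarity threshold into the first bad vocabulary list;
  • screening out from the first bad vocabulary list the words that do not meet a preset emotional tendency rule for bad words, to obtain a second bad vocabulary list; and
  • screening out from the second bad vocabulary list the words that do not conform to the sentence position structure of bad words, to obtain and output a third bad vocabulary list.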
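The five steps form a linear pipeline in which each stage consumes the previous stage's output. As a rough sketch only, assuming each step is supplied as a plain Python function (the names `monitor_bad_data`, `segment`, `exact_screen`, `similar_screen`, `sentiment_filter` and `structure_filter` are hypothetical and not defined by the application), the chaining might look like this:

```python
from typing import Callable, List, Tuple

# Hypothetical stage signatures; the application does not define an API.
Segmenter = Callable[[str], List[str]]
ExactScreen = Callable[[List[str]], Tuple[List[str], List[str]]]
WordFilter = Callable[[List[str]], List[str]]


def monitor_bad_data(
    text: str,
    segment: Segmenter,
    exact_screen: ExactScreen,
    similar_screen: WordFilter,
    sentiment_filter: WordFilter,
    structure_filter: WordFilter,
) -> List[str]:
    """Chain the five steps and return the third bad vocabulary list."""
    words = segment(text)                          # word segmentation set
    first_list, candidates = exact_screen(words)   # exact matches + candidates
    first_list = first_list + similar_screen(candidates)  # add similar words
    second_list = sentiment_filter(first_list)     # drop non-bad sentiment
    return structure_filter(second_list)           # drop wrong sentence structure
```

Passing the stages in as functions keeps the chaining independent of any particular segmenter, vector model or corpus; a stage that needs extra context, such as the tagged sentences used in step S150, can capture it in a closure.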
  • This application also provides an electronic device comprising a memory and a processor, where a network bad data monitoring program is stored in the memory and, when executed by the processor, implements the steps of the method described above: word segmentation of the target text, exact-match screening against the preset bad vocabulary comparison table, average-similarity screening of the candidate words, sentiment analysis screening, and word position structure screening, finally outputting the third bad vocabulary list.
  • This application also provides a network bad data monitoring system, including:
  • a word segmentation processing unit, used to perform word segmentation processing on the target text to obtain a word segmentation set;
  • a bad word screening unit, used to compare the words in the word segmentation set with a preset bad vocabulary comparison table, screen bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words;
  • a word similarity calculation unit, used to calculate, through a word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list;
  • a sentiment analysis unit, used to screen out from the first bad vocabulary list, through a sentiment analysis algorithm, the words that do not meet the preset emotional tendency rule for bad words, to obtain a second bad vocabulary list; and
  • a word position structure screening unit, used to screen out from the second bad vocabulary list, through the word position structure method, the words that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
  • This application also provides a computer-readable storage medium storing a network bad data monitoring program which, when executed by a processor, implements any step of the method for monitoring network bad data described above.
  • In summary, the network bad data monitoring method, device and computer-readable storage medium proposed in this application first match each segmented word of the target text exactly against the preset bad vocabulary comparison table, then use the word similarity calculation formula to catch bad words that are similar to, but absent from, the limited table, and finally use the sentiment analysis algorithm and the word position structure method to screen the non-bad words out of the first bad vocabulary list before outputting the third bad vocabulary list. Unregistered bad words can therefore be found more accurately than in the prior art.
  • FIG. 1 is a flowchart of a preferred embodiment of a method for monitoring bad network data according to this application;
  • FIG. 2 is a schematic diagram of an application environment of a preferred embodiment of a method for monitoring bad network data according to this application;
  • FIG. 3 is a schematic diagram of modules of a preferred embodiment of the network bad data monitoring program in FIG. 2;
  • FIG. 4 is a system logic diagram corresponding to the method for monitoring bad network data in this application.
  • This application provides a method for monitoring bad network data.
  • Referring to FIG. 1, which is a flowchart of a preferred embodiment of the method for monitoring bad network data according to this application:
  • The method can be executed by a device, and the device can be implemented by software and/or hardware.
  • In this embodiment, the method for monitoring network bad data includes steps S110 to S150.
  • Step S110: perform word segmentation processing on the target text to obtain a word segmentation set.
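The application does not say how segmentation is done beyond relying on the part-of-speech tagging of a word segmentation tool in step S150. Purely as an illustration, the sketch below substitutes dictionary-based forward maximum matching, a classic technique for Chinese text; the function name `fmm_segment`, the dictionary and the `max_len` parameter are assumptions, and a real system would more likely use a full segmenter that also tags parts of speech.

```python
from typing import Iterable, List


def fmm_segment(text: str, dictionary: Iterable[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word (up to max_len characters), falling back to a single character."""
    vocab = set(dictionary)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words
```

For example, `fmm_segment("你真是智障", {"智障", "真是"})` yields `["你", "真是", "智障"]`; characters not covered by the dictionary are emitted one by one.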
  • Step S120: compare the words in the word segmentation set with the preset bad vocabulary comparison table, screen bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words.
  • Specifically, each word in the word segmentation set is compared with the bad words in the preset bad vocabulary comparison table.
  • The preset bad vocabulary comparison table stores a large number of bad words. Through the comparison, the bad words in the word segmentation set can be identified, and the words identified as bad words are filtered out of the word segmentation set and loaded into the first bad vocabulary list.
  • The bad words in the preset bad vocabulary comparison table can be derived from bad words commonly used on the Internet.
  • If a word in the word segmentation set is exactly the same as a bad word in the preset bad vocabulary comparison table, the word is selected from the word segmentation set and loaded into the first bad vocabulary list. For example, if the word "mentally retarded" exists in the word segmentation set and is also recorded in the preset bad vocabulary comparison table, the "mentally retarded" in the word segmentation set is filtered out and recorded in the first bad vocabulary list.
  • In one embodiment, the step of comparing the words in the word segmentation set with the preset bad vocabulary comparison table, selecting bad words from the word segmentation set, loading them into the first bad vocabulary list, and taking the remaining words as candidate words is performed by a preset same-word screening model, which includes:
  • a first input layer for inputting the words in the word segmentation set;
  • a second input layer for inputting the preset bad vocabulary comparison table;
  • a same-word filtering layer for comparing and analyzing the words input by the first input layer against the preset bad vocabulary comparison table input by the second input layer;
  • a first output layer for outputting the bad words that the same-word filtering layer filters out of the word segmentation set; and
  • a second output layer for outputting the words that remain after the same-word filtering layer has filtered out the bad words.
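Functionally, the same-word screening model amounts to an exact-membership split of the segmented words. A minimal sketch, assuming the comparison table fits in memory as a set of strings (the function name `screen_same_words` is hypothetical), maps the layers onto plain Python:

```python
from typing import Iterable, List, Tuple


def screen_same_words(
    segmented: Iterable[str], bad_table: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split segmented words into (first bad vocabulary list, candidate words).

    A word goes to the first list only if it exactly matches an entry of the
    bad vocabulary comparison table; every other word becomes a candidate.
    """
    table = set(bad_table)          # second input: the comparison table
    first_bad_list: List[str] = []  # first output: exact bad-word matches
    candidates: List[str] = []      # second output: remaining words
    for word in segmented:          # first input + same-word filtering
        if word in table:
            first_bad_list.append(word)
        else:
            candidates.append(word)
    return first_bad_list, candidates
```

Using a set makes each comparison an average O(1) lookup, so the cost grows with the length of the text rather than with the size of the table.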
  • Step S130: calculate, through the word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list.
  • The bad words in the preset bad vocabulary comparison table are usually bad words that have already been recorded, and the recorded bad words are limited. If the word segmentation set contains bad words that are not recorded in the comparison table, screening by exact comparison alone is not thorough enough. The word similarity calculation formula is therefore used to find, among the remaining words of the word segmentation set, words similar to the bad words in the comparison table. For example, the remaining words may contain a word meaning "mentally retarded" that is not recorded in the comparison table, while a different word with the same meaning is recorded; by calculating their similarity and comparing it with the preset similarity threshold, words similar to bad words are finally screened out of the remaining words.
  • The step of calculating, through the word similarity calculation formula, the mean similarity between each candidate word and the words in the preset bad vocabulary comparison table includes:
  • calculating, with the word similarity calculation formula, the similarity between the word vector of each candidate word and each bad-word vector in a preset bad-word vector set, to obtain N similarity values, where the preset bad-word vector set is obtained by vectorizing the words in the preset bad vocabulary comparison table; and
  • obtaining, from the N similarity values, the mean similarity between the candidate word and the words in the preset bad vocabulary comparison table.
  • Obtaining the mean similarity includes:
  • adding the N similarity values to obtain a total similarity value, where N is the number of words in the preset bad vocabulary comparison table, and dividing the total by N.
  • In other words, each candidate word is vectorized to obtain its word vector, and the words in the preset bad vocabulary comparison table are vectorized in advance to obtain the preset bad-word vector set. Taking any candidate word as an example, the similarity between its word vector and each bad-word vector in the set is calculated with the word similarity calculation formula, giving N similarity values; the N values are then added and averaged, and the result is the mean similarity of that candidate word.
  • Each candidate word is processed in this way to obtain its mean similarity.
  • The symbols in the word similarity calculation formula are defined as follows:
  • W1 is the word vector of the candidate word;
  • W2 is any word vector in the preset bad-word vector set;
  • n is the word vector dimension;
  • W1_i is the value of W1 in the i-th dimension;
  • W2_i is the value of W2 in the i-th dimension.
  • A similarity threshold range is preset, and the remaining words of the word segmentation set that fall within it are selected and loaded into the first bad vocabulary list.
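The extracted text preserves only the symbols of the word similarity calculation formula (W1, W2, n, W1_i, W2_i), not the formula itself. Those symbols match the standard cosine similarity, sim(W1, W2) = Σ W1_i·W2_i / (√(Σ W1_i²)·√(Σ W2_i²)) for i = 1..n, and the sketch below assumes that formula. It also assumes the word vectors are already available as lists of floats and that the comparison table is non-empty; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(w1: Sequence[float], w2: Sequence[float]) -> float:
    """Assumed word similarity formula: sum(W1_i * W2_i) / (|W1| * |W2|)."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm1 * norm2)


def mean_similarity(candidate: Sequence[float], bad_vectors: List[Sequence[float]]) -> float:
    """Add the N similarity values and divide by N (N = table size)."""
    total = sum(cosine(candidate, v) for v in bad_vectors)
    return total / len(bad_vectors)


def screen_similar_words(
    candidates: Dict[str, Sequence[float]],
    bad_vectors: List[Sequence[float]],
    threshold: float,
) -> List[str]:
    """Candidate words whose mean similarity exceeds the preset threshold."""
    return [
        word for word, vec in candidates.items()
        if mean_similarity(vec, bad_vectors) > threshold
    ]
```

Because the mean is taken over all N table entries, a candidate that is very close to one bad word but unrelated to the rest can still score low, so the threshold has to be chosen with the size and diversity of the table in mind.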
  • Step S140: through the sentiment analysis algorithm, screen out from the first bad vocabulary list the words that do not meet the preset emotional tendency rule for bad words, to obtain the second bad vocabulary list.
  • Words selected by similarity may include non-bad words, so the words in the first bad vocabulary list need to be screened again. The sentiment analysis algorithm, abbreviated SO-PMI (semantic orientation from pointwise mutual information), is used to filter out of the first bad vocabulary list the words that do not satisfy the preset emotional tendency rule for bad words. SO-PMI is based on pointwise mutual information and computes the emotional tendency intensity value of a word.
  • The step of filtering out of the first bad vocabulary list the words that do not meet the preset emotional tendency rule for bad words, to obtain the second bad vocabulary list, includes:
  • calculating, with the word co-occurrence frequency calculation formula, the co-occurrence frequency between each word to be evaluated and the words in the pre-built civilized vocabulary, and between each word to be evaluated and the words in the pre-built uncivilized vocabulary; and
  • calculating the emotional tendency intensity value of each word in the first bad vocabulary list with the sentiment analysis calculation formula.
  • In the word co-occurrence frequency calculation formula, F(N1, N2) is the frequency with which N1 and N2 appear together within a window of a set size across all n articles, and F(N1) and F(N2) are the frequencies with which N1 and N2, respectively, appear across all n articles.
  • In the sentiment analysis calculation formula:
  • Q is a word in the first bad vocabulary list;
  • Cwords is the pre-built civilized vocabulary;
  • Iwords is the pre-built uncivilized vocabulary;
  • PMI(Q, cword) is the co-occurrence frequency of Q with a word of the pre-built civilized vocabulary;
  • PMI(Q, Iword) is the co-occurrence frequency of Q with a word of the pre-built uncivilized vocabulary;
  • SO-PMI(Q) is the emotional tendency intensity value of the word Q in the first bad vocabulary list.
  • The emotional tendency intensity threshold rule classifies each word by its SO-PMI(Q) value:
  • on one side of the threshold, the word does not meet the preset emotional tendency rule for bad words;
  • on the other side, the word meets the preset emotional tendency rule for bad words.
  • The sentiment analysis algorithm determines word polarity by mining a large-scale corpus: the polarity of an unregistered word is judged from how often it co-occurs with existing words whose polarity has already been determined.
  • Word co-occurrence means that two words appear together within a word window of a given size.
  • Determining word polarity with the sentiment analysis algorithm requires seed vocabularies: the pre-built uncivilized vocabulary serves as the bad seed vocabulary, and the pre-built civilized vocabulary, containing the same number of words, serves as the civilized seed vocabulary. The emotional tendency intensity value of a word w is then calculated with the sentiment analysis calculation formula, and the likelihood that the word is a bad word is judged according to the emotional tendency intensity threshold rule.
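The formulas of step S140 were also lost in extraction; only their variable definitions survive. The sketch below assumes the usual forms consistent with those definitions: PMI(N1, N2) = log2(P(N1, N2) / (P(N1)·P(N2))), with each probability estimated as the corresponding frequency F divided by the number of articles n, and SO-PMI(Q) = Σ PMI(Q, cword) over Cwords minus Σ PMI(Q, Iword) over Iwords. Further choices are assumptions, not taken from the application: articles are pre-tokenized lists, F counts articles, a pair with zero co-occurrence contributes 0 instead of minus infinity, and a word is kept as bad only when SO-PMI(Q) < 0 (closer to the uncivilized seeds), since the text does not give the threshold values. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Doc = Sequence[str]  # one pre-tokenized article


def cooccur(doc: Doc, a: str, b: str, window: int) -> bool:
    """True if a and b occur within `window` tokens of each other in doc."""
    pos_a = [i for i, w in enumerate(doc) if w == a]
    pos_b = [i for i, w in enumerate(doc) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def pmi(docs: List[Doc], n1: str, n2: str, window: int) -> float:
    """log2(P(n1, n2) / (P(n1) * P(n2))), each P = article count / n."""
    n = len(docs)
    f1 = sum(1 for d in docs if n1 in d)
    f2 = sum(1 for d in docs if n2 in d)
    f12 = sum(1 for d in docs if cooccur(d, n1, n2, window))
    if f12 == 0:
        return 0.0  # assumption: unseen pairs contribute nothing
    return math.log2((f12 / n) / ((f1 / n) * (f2 / n)))


def so_pmi(docs: List[Doc], q: str, civilized: Sequence[str],
           uncivilized: Sequence[str], window: int = 5) -> float:
    """SO-PMI(Q) = sum of PMI(Q, cword) - sum of PMI(Q, Iword)."""
    pos = sum(pmi(docs, q, c, window) for c in civilized)
    neg = sum(pmi(docs, q, u, window) for u in uncivilized)
    return pos - neg


def sentiment_filter(first_list: Sequence[str], docs: List[Doc],
                     civilized: Sequence[str], uncivilized: Sequence[str],
                     window: int = 5) -> List[str]:
    """Second bad vocabulary list: words whose SO-PMI(Q) is negative."""
    return [q for q in first_list
            if so_pmi(docs, q, civilized, uncivilized, window) < 0]
```

In a toy corpus where "idiot" appears next to the uncivilized seed "stupid" and "obstacle" appears next to the civilized seed "kind", "idiot" gets a negative score and is kept, while "obstacle" gets a positive score and is screened out.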
  • Step S150: through the word position structure method, screen out from the second bad vocabulary list the words that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
  • That is, the word position structure method filters the second bad vocabulary list so that only words conforming to the sentence position structure of bad words remain, which gives the third bad vocabulary list.
  • The step of screening out the words that do not conform to the sentence position structure of bad words is as follows.
  • The method relies on the part-of-speech tagging and syntactic analysis functions provided by the word segmentation tool. Take "You are really mentally retarded" as an example: its part-of-speech tagging is you (r) / ⁇ (d) / ⁇ (q) / mentally retarded (n), and this sentence pattern and part-of-speech structure can be recorded as a template. The part-of-speech tagging of "this is an obstacle" is this (r) / is (v) / a (q) / obstacle (n). "Obstacle" and "mentally retarded" are used in different syntactic and part-of-speech structures, so screening against the summarized lexical, syntactic and part-of-speech structure templates reduces the introduction of such "impurities".
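A minimal sketch of the template check, assuming sentences arrive already tagged as (word, tag) pairs and that a template is a part-of-speech sequence plus the index of the slot where the bad word sits (both representations, and the function names `conforms` and `structure_filter`, are assumptions rather than the application's definitions):

```python
from typing import List, Sequence, Set, Tuple

# A tagged sentence is a list of (word, part-of-speech tag) pairs.
TaggedSentence = Sequence[Tuple[str, str]]
# A template is (tag sequence, index of the slot holding the bad word).
Template = Tuple[Tuple[str, ...], int]


def conforms(word: str, sentence: TaggedSentence, templates: Set[Template]) -> bool:
    """True if `word` occurs in `sentence` at a slot that some template allows."""
    tags = tuple(tag for _, tag in sentence)
    return any(w == word and (tags, i) in templates
               for i, (w, _) in enumerate(sentence))


def structure_filter(
    second_list: Sequence[str],
    sentences: List[TaggedSentence],
    templates: Set[Template],
) -> List[str]:
    """Third bad vocabulary list: words that conform in at least one sentence."""
    return [
        word for word in second_list
        if any(conforms(word, s, templates) for s in sentences)
    ]
```

With the template taken from the first example, ("r", "d", "q", "n") with the bad word in the last slot, the word "obstacle" from the second example is dropped because its sentence is tagged r/v/q/n.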
  • This application provides a method for monitoring bad network data, which is applied to an electronic device 1.
  • FIG. 2 is a schematic diagram of the application environment of the preferred embodiment of the method for monitoring network bad data according to this application.
  • the electronic device 1 may be a terminal device with a computing function such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 1 includes a processor 12, a memory 11, a network interface 13, and a communication bus 14.
  • the memory 11 includes at least one type of readable storage medium.
  • The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory.
  • the readable storage medium may be an internal storage unit of the electronic device 1, for example, the hard disk of the electronic device 1.
  • The readable storage medium may also be an external memory 11 of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the network bad data monitoring program 10 installed in the electronic device 1, a preset bad word comparison table, and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • The processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run program code or process data stored in the memory 11, for example, to execute the network bad data monitoring program 10.
  • The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the communication bus 14 is used to realize the connection and communication between the above-mentioned components.
  • FIG. 2 only shows the electronic device 1 with the components 11-14, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • The memory 11, as a computer storage medium, may include an operating system and the network bad data monitoring program 10; when the processor 12 executes the network bad data monitoring program 10 stored in the memory 11, it implements the steps of the method for monitoring bad network data in Embodiment 1, for example as shown in FIG. 1.
  • When executing the network bad data monitoring program 10, the processor 12 also implements the functions of the modules/units in the foregoing device embodiments.
  • The network bad data monitoring program 10 shown in FIG. 3 can be divided into: a word segmentation processing module 110, a bad word screening module 120, a word similarity calculation module 130, a sentiment analysis module 140, and a word position structure screening module 150.
  • The functions of modules 110-150 are similar to those described above and are not described in detail here. For example:
  • Word segmentation processing module 110: used to perform word segmentation processing on the target text to obtain a word segmentation set.
  • Bad word screening module 120: used to compare the words in the word segmentation set with the preset bad vocabulary comparison table, screen bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words.
  • Word similarity calculation module 130: used to calculate, through the word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list.
  • Sentiment analysis module 140: used to screen out from the first bad vocabulary list, through the sentiment analysis algorithm, the words that do not meet the preset emotional tendency rule for bad words, to obtain the second bad vocabulary list.
  • Word position structure screening module 150: used to screen out from the second bad vocabulary list, through the word position structure method, the words that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
  • An embodiment of this application also proposes a network bad data monitoring system 400, which includes a word segmentation processing unit 410, a bad word screening unit 420, a word similarity calculation unit 430, a sentiment analysis unit 440, and a word position structure screening unit 450. The functions implemented by units 410-450 correspond one-to-one to the steps of the network bad data monitoring method in the embodiment.
  • The word segmentation processing unit 410 is configured to perform word segmentation processing on the target text to obtain a word segmentation set;
  • the bad word screening unit 420 is configured to compare the words in the word segmentation set with the preset bad vocabulary comparison table, screen bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words;
  • the word similarity calculation unit 430 is configured to calculate, through a word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list;
  • the sentiment analysis unit 440 is configured to screen out from the first bad vocabulary list, through the sentiment analysis algorithm, the words that do not meet the preset emotional tendency rule for bad words, to obtain the second bad vocabulary list; and
  • the word position structure screening unit 450 is configured to screen out from the second bad vocabulary list, through the word position structure method, the words that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • The computer-readable storage medium may be non-volatile or volatile and includes a network bad data monitoring program which, when executed by a processor, implements the network bad data monitoring method in Embodiment 1; to avoid repetition, the details are not repeated here. Alternatively, when the computer program is executed by the processor, the functions of the modules/units of the network bad data monitoring system in Embodiment 4 are implemented; these are likewise not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a method and apparatus for monitoring network bad data, and a computer-readable storage medium. The method comprises: performing word segmentation on a target text (S110); comparing words in a word segmentation set with a preset bad vocabulary comparison table, screening bad words out of the word segmentation set, and loading the bad words into a first bad vocabulary list (S120); by means of a word similarity calculation formula, calculating an average similarity of each candidate word, and loading each candidate word whose average similarity is greater than a preset similarity threshold into the first bad vocabulary list (S130); screening out, by means of a sentiment analysis algorithm, words that do not satisfy a preset emotional tendency rule for bad words (S140); and screening out, by means of a word position structure method, words that do not conform to the sentence position structure of bad vocabulary (S150). The method can discover unregistered bad vocabulary more accurately, and compared with the prior art, the precision and accuracy of the recorded bad vocabulary are higher.
PCT/CN2020/136403 2020-02-26 2020-12-15 Method, apparatus and system for monitoring network bad data, and storage medium WO2021169499A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010119614.7A CN111400439A (zh) 2020-02-26 2020-02-26 Network bad data monitoring method, device and storage medium
CN202010119614.7 2020-02-26

Publications (1)

Publication Number Publication Date
WO2021169499A1 (fr) 2021-09-02

Family

ID=71428466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136403 WO2021169499A1 (fr) 2020-02-26 2020-12-15 Method, apparatus and system for monitoring network bad data, and storage medium

Country Status (2)

Country Link
CN (1) CN111400439A (fr)
WO (1) WO2021169499A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897566A (zh) * 2022-03-21 2022-08-12 晨雨初听(武汉)文化艺术传播有限公司 Big-data-based online diagnosis and analysis method and system for short-video compliance

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
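CN111400439A (zh) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Network bad data monitoring method, device and storage medium
CN113627179B (zh) * 2021-10-13 2021-12-21 广东机电职业技术学院 Big-data-based threat intelligence early-warning text analysis method and system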

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
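CN101520802A (zh) * 2009-04-13 2009-09-02 腾讯科技(深圳)有限公司 Method and system for evaluating the quality of question-answer pairs
CN101639824A (zh) * 2009-08-27 2010-02-03 北京理工大学 Text filtering method for bad information based on sentiment tendency analysis
CN107992471A (zh) * 2017-11-10 2018-05-04 北京光年无限科技有限公司 Information filtering method and device in a human-computer interaction process
CN108984600A (zh) * 2018-06-04 2018-12-11 百度在线网络技术(北京)有限公司 Interaction processing method and apparatus, computer device and readable medium
US10262059B2 (en) * 2014-03-14 2019-04-16 Tencent Technology (Shenzhen) Company Limited Method, apparatus, and storage medium for text information processing
CN111400439A (zh) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Network bad data monitoring method, device and storage medium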

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
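CN101464898B (zh) * 2009-01-12 2011-09-21 腾讯科技(深圳)有限公司 Method for extracting text topic words
CN104142913A (zh) * 2013-05-07 2014-11-12 株式会社日立制作所 Word polarity discrimination method and discrimination system
CN110825840B (zh) * 2019-11-08 2023-02-17 北京声智科技有限公司 Lexicon expansion method, apparatus, device and storage medium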

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
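CN114897566A (zh) * 2022-03-21 2022-08-12 晨雨初听(武汉)文化艺术传播有限公司 Big-data-based online diagnosis and analysis method and system for short-video compliance
CN114897566B (zh) * 2022-03-21 2023-08-04 深圳市单仁牛商科技股份有限公司 Big-data-based online diagnosis and analysis method and system for short-video compliance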

Also Published As

Publication number Publication date
CN111400439A (zh) 2020-07-10

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20921970

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20921970

Country of ref document: EP

Kind code of ref document: A1