WO2021169499A1 - Method, apparatus and system for monitoring network bad data, and storage medium - Google Patents
- Publication number
- WO2021169499A1 (PCT/CN2020/136403)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bad
- word
- words
- vocabulary
- preset
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Definitions
- This application relates to the technical field of big data processing, and in particular to a method, device and computer-readable storage medium for monitoring network bad data.
- This application provides a method, device and computer-readable storage medium for monitoring network bad data.
- The main idea of the method is to segment the target text into words, compare each word with the bad words in a preset bad vocabulary comparison table, and load identical bad words into the first bad vocabulary list. Because the comparison table contains only a limited number of bad words, the text may contain other bad words similar to them, so a word similarity calculation formula is applied to the segmented words of the target text again, and the words that fall within the preset similarity threshold range are also loaded into the first bad vocabulary list.
- Since the bad words found by the similarity calculation are not certain, a sentiment analysis algorithm and a word position structure method are used to screen the non-bad words out of the first bad vocabulary list, and the third bad vocabulary list is finally output.
- In this way, unregistered bad words (bad words not yet recorded in the comparison table) can be found more accurately, and compared with the prior art, the precision and accuracy of the recorded bad vocabulary are improved.
- This application provides a method for monitoring network bad data, which includes:
- calculating, through a word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and loading the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list;
- screening out of the first bad vocabulary list the words that do not satisfy the preset bad-word sentiment tendency rule, to obtain the second bad vocabulary list;
- screening out of the second bad vocabulary list the words that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
- The present application also provides an electronic device comprising a memory and a processor. A network bad data monitoring program is stored in the memory, and when the program is executed by the processor, it implements the same steps as the method above.
- This application also provides a network bad data monitoring system, including:
- a word segmentation processing unit, used to perform word segmentation on the target text to obtain a word segmentation set;
- a bad word screening unit, used to compare the words in the word segmentation set with a preset bad vocabulary comparison table, filter bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words;
- a word similarity calculation unit, used to calculate, through a word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list;
- a sentiment analysis unit, used to screen out, through a sentiment analysis algorithm, the words in the first bad vocabulary list that do not satisfy the preset bad-word sentiment tendency rule, to obtain a second bad vocabulary list;
- a word position structure screening unit, used to screen out, through the word position structure method, the words in the second bad vocabulary list that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
- The present application also provides a computer-readable storage medium storing a network bad data monitoring program; when the program is executed by a processor, any step of the method for monitoring network bad data described above is implemented.
- In the network bad data monitoring method, device and computer-readable storage medium proposed in this application, the target text is segmented, each segmented word is compared with the bad words in the preset bad vocabulary comparison table, and identical bad words are loaded into the first bad vocabulary list.
- Because the comparison table contains only a limited number of bad words, the text may contain bad words similar to them; the word similarity calculation formula is therefore applied to the segmented words of the target text again, and the words within the preset similarity threshold range are loaded into the first bad vocabulary list.
- Since the bad words found by the similarity calculation are not certain, the sentiment analysis algorithm and the word position structure method are used to screen the non-bad words out of the first bad vocabulary list, and the third bad vocabulary list is finally output.
- FIG. 1 is a flowchart of a preferred embodiment of a method for monitoring bad network data according to this application;
- FIG. 2 is a schematic diagram of an application environment of a preferred embodiment of a method for monitoring bad network data according to this application;
- FIG. 3 is a schematic diagram of modules of a preferred embodiment of the network bad data monitoring program in FIG. 2;
- FIG. 4 is a system logic diagram corresponding to the method for monitoring bad network data in this application.
- The present application provides a method for monitoring bad network data.
- Referring to FIG. 1, it is a flowchart of a preferred embodiment of the method for monitoring bad network data according to this application.
- The method can be executed by a device, and the device can be implemented by software and/or hardware.
- The method for monitoring network bad data includes steps S110 to S150.
- Step S110: perform word segmentation on the target text to obtain a word segmentation set.
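The text does not name the segmentation tool used in step S110. As a hypothetical stand-in, the sketch below uses forward maximum matching, a classic dictionary-based segmentation technique that, at each position, takes the longest dictionary word starting there. The function name `forward_max_match`, the toy dictionary and the `max_len` parameter are illustrative assumptions, not part of the patent.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Segment unspaced text by forward maximum matching.

    At each position, try the longest window (up to max_len characters)
    first and shrink it until it is a dictionary word; a single character
    is always accepted so that unknown text still advances.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# An unspaced English string stands in for unsegmented Chinese text.
print(forward_max_match("thisisanobstacle", {"this", "is", "an", "obstacle"}, max_len=8))
# ['this', 'is', 'an', 'obstacle']
```

Real segmenters add statistical disambiguation on top of dictionary lookup; greedy matching is shown only because it is short and easy to follow.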
- Step S120: compare the words in the word segmentation set with a preset bad vocabulary comparison table, filter bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words.
- Specifically, each word in the word segmentation set is compared with the bad words in the preset bad vocabulary comparison table.
- The preset bad vocabulary comparison table stores a large number of bad words. The comparison determines which words in the word segmentation set are bad words; these words are filtered out of the word segmentation set and loaded into the first bad vocabulary list.
- The bad words in the preset bad vocabulary comparison table can be collected from bad words commonly used on the Internet.
- If a word in the word segmentation set is exactly the same as a bad word in the preset bad vocabulary comparison table, the word is selected from the word segmentation set and loaded into the first bad vocabulary list. For example, if the word "mentally retarded" exists in the word segmentation set and is also recorded in the preset bad vocabulary comparison table, the "mentally retarded" in the word segmentation set is filtered out and recorded in the first bad vocabulary list.
- The step of comparing the words in the word segmentation set with the preset bad vocabulary comparison table, selecting bad words from the word segmentation set, loading the bad words into the first bad vocabulary list, and taking the remaining words in the word segmentation set as candidate words is performed with a preset same-word screening model.
- The preset same-word screening model includes:
- a first input layer, used to input the words in the word segmentation set;
- a second input layer, used to input the preset bad vocabulary comparison table;
- a same-word filtering layer, used to compare and analyze the words input by the first input layer against the preset bad vocabulary comparison table input by the second input layer;
- a first output layer, used to output the bad words that the same-word filtering layer filters out of the word segmentation set;
- a second output layer, used to output the remaining words after the bad words are filtered out.
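The same-word screening model described above has two inputs and two outputs. The text does not describe the model's internals, so the minimal sketch below assumes the filtering layer reduces to an exact-match lookup; `same_word_screen` and the sample words are hypothetical.

```python
def same_word_screen(segmented: list[str], bad_table: set[str]) -> tuple[list[str], list[str]]:
    """Split the word segmentation set by exact comparison with the table.

    First input: the words of the word segmentation set.
    Second input: the preset bad vocabulary comparison table.
    Returns (first output, second output) = (bad words, candidate words).
    """
    bad_words: list[str] = []
    candidates: list[str] = []
    for word in segmented:
        # Same-word filtering layer: only an exact match counts as a bad word.
        if word in bad_table:
            bad_words.append(word)
        else:
            candidates.append(word)
    return bad_words, candidates


first_bad_list, candidate_words = same_word_screen(
    ["you", "are", "really", "an", "idiot"], {"idiot", "moron"}
)
print(first_bad_list, candidate_words)  # ['idiot'] ['you', 'are', 'really', 'an']
```

Using a `set` for the table keeps each lookup constant-time on average, which matters when the comparison table is large.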
- Step S130: calculate, through the word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list.
- The bad words in the preset bad vocabulary comparison table are usually bad words that have already been recorded, and the number of recorded bad words is limited. If the word segmentation set contains bad words that are not recorded in the table, screening the word segmentation set by exact comparison alone is not thorough enough. The word similarity calculation formula can therefore filter, from the remaining words in the word segmentation set, words that are similar to the bad words in the preset bad vocabulary comparison table. For example, the remaining words may contain a word meaning "mentally retarded" that is not recorded in the table, while a differently written near-synonym of it is recorded. By calculating the similarity and comparing it with the preset similarity threshold, words similar to bad words are finally screened out of the remaining words in the word segmentation set.
- The step of calculating, through the word similarity calculation formula, the mean similarity between each candidate word and the words in the preset bad vocabulary comparison table includes:
- calculating, through the word similarity calculation formula, the similarity between the word vector of each candidate word and each bad-word vector in the preset bad-word vector set, to obtain N similarity values, where the preset bad-word vector set is obtained by vectorizing the words in the preset bad vocabulary comparison table;
- obtaining the mean similarity between the candidate word and the words in the preset bad vocabulary comparison table.
- Obtaining the mean similarity between the candidate word and the words in the preset bad vocabulary comparison table includes:
- adding the N similarity values to obtain a total similarity value, where N is the number of words in the preset bad vocabulary comparison table;
- dividing the total similarity value by N to obtain the mean similarity.
- Specifically, each candidate word is vectorized to obtain its word vector, and the words in the preset bad vocabulary comparison table are vectorized in advance to obtain the preset bad-word vector set. Taking the word vector of any candidate word as an example, its similarity to each bad-word vector in the preset bad-word vector set is calculated through the word similarity calculation formula, giving N similarity values, where N is the number of words in the preset bad vocabulary comparison table. The N similarity values are then added and averaged, and the result is the mean similarity of that candidate word.
- The mean similarity of every candidate word is calculated in the same way.
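Written as a formula, the averaging described above is shown below. $W_1$ and $W_2$ are the W1 and W2 defined in the list that follows. The symbols $\mathcal{B}$ (the preset bad-word vector set, containing N vectors), $T$ (the preset similarity threshold) and $\mathrm{sim}$ (the word similarity calculation formula, whose exact form is not preserved in this text) are notation introduced only for this illustration.

```latex
\overline{\mathrm{sim}}(W_1) = \frac{1}{N} \sum_{W_2 \in \mathcal{B}} \mathrm{sim}(W_1, W_2),
\qquad |\mathcal{B}| = N,
\qquad \text{load the candidate if } \overline{\mathrm{sim}}(W_1) > T
```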
- In the word similarity calculation formula, W1 is the word vector of the candidate word;
- W2 is any word vector in the preset bad-word vector set;
- n is the dimension of the word vectors;
- W1_i is the value of W1 in the i-th dimension;
- W2_i is the value of W2 in the i-th dimension.
- The similarity threshold range is preset, and the words that fall within it are selected from the remaining words in the word segmentation set and loaded into the first bad vocabulary list.
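The word similarity formula itself is not preserved in this text. Its variables (W1, W2, n, W1_i, W2_i) are consistent with cosine similarity, which the sketch below assumes; the function names, the toy vectors and the threshold value are hypothetical. It computes N similarity values per candidate, averages them, and keeps candidates whose mean exceeds the threshold.

```python
import math


def cosine_similarity(w1: list[float], w2: list[float]) -> float:
    # Assumed form of the word similarity formula:
    # sum_i W1_i * W2_i / (||W1|| * ||W2||) over the n dimensions.
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def mean_similarity(candidate_vec: list[float], bad_vecs: list[list[float]]) -> float:
    # N similarity values (one per bad-word vector), summed and divided by N.
    sims = [cosine_similarity(candidate_vec, b) for b in bad_vecs]
    return sum(sims) / len(sims)


def screen_by_similarity(candidates: dict[str, list[float]],
                         bad_vecs: list[list[float]],
                         threshold: float) -> list[str]:
    """Return the candidate words whose mean similarity exceeds the threshold."""
    return [w for w, vec in candidates.items()
            if mean_similarity(vec, bad_vecs) > threshold]


bad_vecs = [[1.0, 0.0], [0.0, 1.0]]
candidates = {"a": [1.0, 1.0], "b": [1.0, -1.0]}
print(screen_by_similarity(candidates, bad_vecs, threshold=0.5))  # ['a']
```

Averaging over the whole table means a candidate must resemble the bad vocabulary as a group, not just one entry; a max-similarity rule would be a different design choice with different false-positive behavior.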
- Step S140: through the sentiment analysis algorithm, screen out of the first bad vocabulary list the words that do not satisfy the preset bad-word sentiment tendency rule, to obtain the second bad vocabulary list.
- The words selected by similarity may include non-bad words, so the words in the first bad vocabulary list need to be screened to remove them. Through the sentiment analysis algorithm, the words that do not satisfy the preset bad-word sentiment tendency rule are filtered out of the first bad vocabulary list. The sentiment analysis algorithm used here is SO-PMI (semantic orientation based on pointwise mutual information), which calculates the sentiment tendency strength value of a word.
- The step of screening out of the first bad vocabulary list the words that do not satisfy the preset bad-word sentiment tendency rule, to obtain the second bad vocabulary list, includes:
- calculating, through the word co-occurrence frequency calculation formula, the co-occurrence frequency between each word to be evaluated and the words in the pre-built civilized vocabulary, and between the same word and the words in the pre-built uncivilized vocabulary;
- calculating the sentiment tendency strength value of each word in the first bad vocabulary list through the sentiment analysis calculation formula.
- In the word co-occurrence formula, F(N1, N2) is the frequency with which N1 and N2 appear together within a window of a set size across all n articles;
- F(N1) and F(N2) are the frequencies with which N1 and N2, respectively, appear across all n articles.
- In the sentiment analysis formula, Q is a word in the first bad vocabulary list;
- Cwords is the pre-built civilized vocabulary;
- Iwords is the pre-built uncivilized vocabulary;
- PMI(Q, cword) is the co-occurrence frequency of Q with a word cword in the pre-built civilized vocabulary;
- PMI(Q, Iword) is the co-occurrence frequency of Q with a word Iword in the pre-built uncivilized vocabulary;
- SO-PMI(Q) is the sentiment tendency strength value of the word Q in the first bad vocabulary list.
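The formulas themselves are not preserved in this text. For reference, the standard PMI and SO-PMI formulation (Turney and Littman) that matches the variables defined above is shown below, assuming F values are relative frequencies (fractions of the n articles) and assuming the sign convention "civilized minus uncivilized", so that a negative SO-PMI(Q) leans toward the uncivilized seed words.

```latex
\begin{aligned}
\mathrm{PMI}(N_1, N_2) &= \log_2 \frac{F(N_1, N_2)}{F(N_1)\,F(N_2)} \\
\text{SO-PMI}(Q) &= \sum_{\mathrm{cword} \in \mathrm{Cwords}} \mathrm{PMI}(Q, \mathrm{cword})
  - \sum_{\mathrm{Iword} \in \mathrm{Iwords}} \mathrm{PMI}(Q, \mathrm{Iword})
\end{aligned}
```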
- The threshold rule for the sentiment tendency strength is:
- the word is a word that does not satisfy the preset bad-word sentiment tendency rule;
- the word is a word that satisfies the preset bad-word sentiment tendency rule.
- The sentiment analysis algorithm determines the polarity of words by mining large-scale corpora: the polarity of an unregistered word is judged from how frequently it co-occurs with existing words whose polarity has already been determined.
- Word co-occurrence means that two words appear together within a word window of a given size.
- Determining word polarity with the sentiment analysis algorithm requires seed vocabularies: the pre-built uncivilized vocabulary serves as the bad seed vocabulary, and the pre-built civilized vocabulary contains the same number of civilized words. The sentiment tendency strength value of each word is then calculated with the sentiment analysis calculation formula, and the likelihood that the word is a bad word is judged according to the sentiment tendency strength threshold rule.
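A minimal sketch of the SO-PMI screening, under several stated assumptions: F values are fractions of articles, PMI uses the standard log2 form, a pair that never co-occurs contributes 0 (a simplification to avoid log 0), and the hypothetical rule keeps a word as bad when its SO-PMI is below a threshold of 0. The seed words, toy articles and function names are illustrative only.

```python
import math


def doc_freq(articles: list[list[str]], word: str) -> float:
    """F(word): fraction of the n articles in which the word appears."""
    return sum(word in art for art in articles) / len(articles)


def co_freq(articles: list[list[str]], w1: str, w2: str, window: int) -> float:
    """F(w1, w2): fraction of articles where w1 and w2 occur within `window` tokens."""
    hits = 0
    for art in articles:
        pos1 = [i for i, tok in enumerate(art) if tok == w1]
        pos2 = [j for j, tok in enumerate(art) if tok == w2]
        if any(abs(i - j) <= window for i in pos1 for j in pos2):
            hits += 1
    return hits / len(articles)


def pmi(articles: list[list[str]], w1: str, w2: str, window: int) -> float:
    joint = co_freq(articles, w1, w2, window)
    if joint == 0:
        return 0.0  # simplification: a pair that never co-occurs adds nothing
    return math.log2(joint / (doc_freq(articles, w1) * doc_freq(articles, w2)))


def so_pmi(articles, q, cwords, iwords, window=5) -> float:
    """Summed PMI with civilized seeds minus summed PMI with uncivilized seeds."""
    return (sum(pmi(articles, q, c, window) for c in cwords)
            - sum(pmi(articles, q, w, window) for w in iwords))


def sentiment_screen(first_bad_list, articles, cwords, iwords, threshold=0.0, window=5):
    """Hypothetical rule: keep words whose SO-PMI is below the threshold."""
    return [q for q in first_bad_list
            if so_pmi(articles, q, cwords, iwords, window) < threshold]


articles = [
    ["you", "are", "kind", "nice"],
    ["you", "idiot", "stupid"],
    ["kind", "nice", "polite"],
    ["stupid", "idiot", "rude"],
]
print(sentiment_screen(["idiot", "nice"], articles, ["kind"], ["stupid"]))  # ['idiot']
```

Here "idiot" co-occurs only with the uncivilized seed (SO-PMI = -1) and "nice" only with the civilized seed (SO-PMI = 1), so only "idiot" survives into the second list.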
- Step S150: through the word position structure method, screen out of the second bad vocabulary list the words that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
- That is, the word position structure method removes from the second bad vocabulary list the words that do not conform to the sentence position structure of bad words, and the remaining words form the third bad vocabulary list.
- The step of screening out of the second bad vocabulary list the words that do not conform to the sentence position structure of bad words includes:
- The method relies on the part-of-speech tagging and syntactic analysis functions provided by the word segmentation tool. Take "You are really mentally retarded" as an example: its part-of-speech tagging is you (r) / really (d) / a (q) / mentally retarded (n). This sentence pattern and part-of-speech structure can be collected as a template. The part-of-speech tagging of "this is an obstacle" is this (r) / is (v) / a (q) / obstacle (n). The words "obstacle" and "mentally retarded" are used differently in terms of syntactic structure and part-of-speech structure, so the summarized lexical-syntactic part-of-speech templates can reduce the introduction of such "impurities" into the bad vocabulary list.
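The template idea above can be sketched as follows, assuming a template is a part-of-speech tag sequence plus the index of the slot the bad word must occupy, and assuming the word's sentence contexts are already tagged. The template set, `matches_template` and `position_screen` are hypothetical illustrations, not the patent's implementation.

```python
# A template is (part-of-speech tag sequence, index of the bad-word slot).
Template = tuple[tuple[str, ...], int]

# Hypothetical template summarized from "you (r) / really (d) / a (q) / <bad word> (n)".
BAD_WORD_TEMPLATES: set[Template] = {(("r", "d", "q", "n"), 3)}


def matches_template(tagged: list[tuple[str, str]], word: str,
                     templates: set[Template] = BAD_WORD_TEMPLATES) -> bool:
    """True if the sentence's tag sequence equals a template and `word` fills its slot."""
    words = [w for w, _ in tagged]
    tags = tuple(t for _, t in tagged)
    return any(tags == t_tags and words[slot] == word for t_tags, slot in templates)


def position_screen(second_bad_list: list[str],
                    contexts: dict[str, list[list[tuple[str, str]]]],
                    templates: set[Template] = BAD_WORD_TEMPLATES) -> list[str]:
    """Keep a word only if some sentence containing it matches a template."""
    return [w for w in second_bad_list
            if any(matches_template(s, w, templates) for s in contexts.get(w, []))]


s1 = [("you", "r"), ("really", "d"), ("a", "q"), ("mentally retarded", "n")]
s2 = [("this", "r"), ("is", "v"), ("a", "q"), ("obstacle", "n")]
contexts = {"mentally retarded": [s1], "obstacle": [s2]}
print(position_screen(["mentally retarded", "obstacle"], contexts))  # ['mentally retarded']
```

"obstacle" is dropped because its sentence has tag sequence r/v/q/n, which matches no template, mirroring the "impurity" example in the text.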
- This application provides a method for monitoring bad network data, which is applied to an electronic device 1.
- Referring to FIG. 2, it is a schematic diagram of the application environment of the preferred embodiment of the method for monitoring network bad data according to the present application.
- The electronic device 1 may be a terminal device with computing capability, such as a server, a smartphone, a tablet computer, a portable computer or a desktop computer.
- The electronic device 1 includes a processor 12, a memory 11, a network interface 13 and a communication bus 14.
- The memory 11 includes at least one type of readable storage medium.
- The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory.
- The readable storage medium may be an internal storage unit of the electronic device 1, for example the hard disk of the electronic device 1.
- The readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card or a flash card equipped on the electronic device 1.
- The readable storage medium of the memory 11 is generally used to store the network bad data monitoring program 10 installed in the electronic device 1, the preset bad vocabulary comparison table, and the like.
- The memory 11 can also be used to temporarily store data that has been output or will be output.
- The processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, used to run program code or process data stored in the memory 11, for example to execute the network bad data monitoring program 10.
- The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
- The communication bus 14 is used to implement connection and communication between the above components.
- FIG. 2 shows only the electronic device 1 with components 11-14, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
- The memory 11, as a computer storage medium, may include an operating system and the network bad data monitoring program 10; when the processor 12 executes the network bad data monitoring program 10 stored in the memory 11, the steps of the method for monitoring bad network data in Embodiment 1 are implemented, for example as shown in FIG. 1.
- When executing the network bad data monitoring program 10, the processor 12 implements the functions of the modules/units in the foregoing device embodiments.
- The network bad data monitoring program 10 shown in FIG. 3 can be divided into: a word segmentation processing module 110, a bad word screening module 120, a word similarity calculation module 130, a sentiment analysis module 140 and a word position structure screening module 150.
- The functions of modules 110-150 correspond to the method steps described above and are not described in detail again here. For example:
- The word segmentation processing module 110 is used to perform word segmentation on the target text to obtain a word segmentation set.
- The bad word screening module 120 is used to compare the words in the word segmentation set with a preset bad vocabulary comparison table, filter bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words.
- The word similarity calculation module 130 is used to calculate, through the word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list.
- The sentiment analysis module 140 is used to screen out, through the sentiment analysis algorithm, the words in the first bad vocabulary list that do not satisfy the preset bad-word sentiment tendency rule, to obtain the second bad vocabulary list.
- The word position structure screening module 150 is used to screen out, through the word position structure method, the words in the second bad vocabulary list that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
- An embodiment of the present application also proposes a network bad data monitoring system 400, which includes a word segmentation processing unit 410, a bad word screening unit 420, a word similarity calculation unit 430, a sentiment analysis unit 440 and a word position structure screening unit 450. The functions implemented by units 410-450 correspond one-to-one to the steps of the network bad data monitoring method in the embodiment.
- the word segmentation processing unit 410 is configured to perform word segmentation on the target text to obtain a word segmentation set;
- the bad word screening unit 420 is configured to compare the words in the word segmentation set with a preset bad vocabulary comparison table, filter bad words out of the word segmentation set, load the bad words into the first bad vocabulary list, and take the remaining words in the word segmentation set as candidate words;
- the word similarity calculation unit 430 is configured to calculate, through a word similarity calculation formula, the average similarity between each candidate word and the words in the preset bad vocabulary comparison table, and load the candidate words whose average similarity is greater than the preset similarity threshold into the first bad vocabulary list;
- the sentiment analysis unit 440 is configured to screen out, through the sentiment analysis algorithm, the words in the first bad vocabulary list that do not satisfy the preset bad-word sentiment tendency rule, to obtain the second bad vocabulary list;
- the word position structure screening unit 450 is configured to screen out, through the word position structure method, the words in the second bad vocabulary list that do not conform to the sentence position structure of bad words, to obtain and output the third bad vocabulary list.
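To make the data flow between units 410-450 concrete, the sketch below wires five stage functions into one pipeline that produces the first, second and third bad vocabulary lists in order. It is a hypothetical composition, not the patent's code: `BadDataMonitor` and its field names are assumptions, and each stage is injected as a plain function (for example, functions like the ones sketched for the individual steps).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BadDataMonitor:
    """Hypothetical wiring of units 410-450; each field is one stage function."""
    segment: Callable[[str], list[str]]                               # unit 410 (S110)
    screen_exact: Callable[[list[str]], tuple[list[str], list[str]]]  # unit 420 (S120)
    screen_similar: Callable[[list[str]], list[str]]                  # unit 430 (S130)
    screen_sentiment: Callable[[list[str]], list[str]]                # unit 440 (S140)
    screen_position: Callable[[list[str]], list[str]]                 # unit 450 (S150)

    def run(self, text: str) -> list[str]:
        words = self.segment(text)
        first, candidates = self.screen_exact(words)
        # S130 adds similar candidates to the same first bad vocabulary list.
        first = first + self.screen_similar(candidates)
        second = self.screen_sentiment(first)
        third = self.screen_position(second)
        return third


monitor = BadDataMonitor(
    segment=str.split,
    screen_exact=lambda ws: ([w for w in ws if w == "bad"], [w for w in ws if w != "bad"]),
    screen_similar=lambda cs: [c for c in cs if c == "baad"],
    screen_sentiment=lambda ws: ws,
    screen_position=lambda ws: ws,
)
print(monitor.run("a bad baad word"))  # ['bad', 'baad']
```

Injecting the stages keeps each unit independently replaceable, which matches the one-to-one correspondence between units and method steps described above.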
- An embodiment of the present application also proposes a computer-readable storage medium.
- The computer-readable storage medium may be non-volatile or volatile and includes a network bad data monitoring program; when executed by the processor, the program implements the network bad data monitoring method in Embodiment 1, which is not repeated here. Alternatively, when the computer program is executed by the processor, the functions of the modules/units of the network bad data monitoring system in Embodiment 4 are implemented, which is likewise not repeated here.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a method and an apparatus for monitoring network bad data, and a computer-readable storage medium. The method comprises: performing word segmentation on a target text (S110); comparing words in a word segmentation set with a preset bad vocabulary comparison table, screening bad words from the word segmentation set, and loading the bad words into a first bad vocabulary list (S120); calculating, by means of a word similarity calculation formula, an average similarity of each candidate word, and loading the candidate words whose average similarity is greater than a preset similarity threshold into the first bad vocabulary list (S130); screening out, by means of a sentiment analysis algorithm, words that do not satisfy a preset bad-word sentiment tendency rule (S140); and screening out, by means of a word position structure method, words that do not conform to the sentence position structure of bad words (S150). The method can discover unregistered bad vocabulary more accurately, and compared with existing technology, the precision and accuracy of the recorded bad vocabulary are higher.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010119614.7A CN111400439A (zh) | 2020-02-26 | 2020-02-26 | Network bad data monitoring method, device and storage medium |
CN202010119614.7 | 2020-02-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021169499A1 (fr) | 2021-09-02 |
Family
ID=71428466
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/136403 WO2021169499A1 (fr) | 2020-02-26 | 2020-12-15 | Method, apparatus and system for monitoring network bad data, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111400439A (fr) |
WO (1) | WO2021169499A1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
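CN111400439A (zh) * | 2020-02-26 | 2020-07-10 | 平安科技(深圳)有限公司 | Network bad data monitoring method, device and storage medium |
CN113627179B (zh) * | 2021-10-13 | 2021-12-21 | 广东机电职业技术学院 | Big-data-based threat intelligence early-warning text analysis method and system |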
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101520802A (zh) * | 2009-04-13 | 2009-09-02 | 腾讯科技(深圳)有限公司 | Method and system for evaluating the quality of question-answer pairs |
CN101639824A (zh) * | 2009-08-27 | 2010-02-03 | 北京理工大学 | Text filtering method for bad information based on sentiment tendency analysis |
CN107992471A (zh) * | 2017-11-10 | 2018-05-04 | 北京光年无限科技有限公司 | Information filtering method and device for human-computer interaction |
CN108984600A (zh) * | 2018-06-04 | 2018-12-11 | 百度在线网络技术(北京)有限公司 | Interaction processing method, apparatus, computer device and readable medium |
US10262059B2 (en) * | 2014-03-14 | 2019-04-16 | Tencent Technology (Shenzhen) Company Limited | Method, apparatus, and storage medium for text information processing |
CN111400439A (zh) * | 2020-02-26 | 2020-07-10 | 平安科技(深圳)有限公司 | Network bad data monitoring method, device and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101464898B (zh) * | 2009-01-12 | 2011-09-21 | 腾讯科技(深圳)有限公司 | Method for extracting topic words from text |
CN104142913A (zh) * | 2013-05-07 | 2014-11-12 | 株式会社日立制作所 | Method and system for determining word polarity |
CN110825840B (zh) * | 2019-11-08 | 2023-02-17 | 北京声智科技有限公司 | Lexicon expansion method, apparatus, device and storage medium |
2020
- 2020-02-26 CN CN202010119614.7A patent/CN111400439A/zh active Pending
- 2020-12-15 WO PCT/CN2020/136403 patent/WO2021169499A1/fr active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
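CN114897566A (zh) * | 2022-03-21 | 2022-08-12 | 晨雨初听(武汉)文化艺术传播有限公司 | Big-data-based online diagnosis and analysis method and system for short-video compliance |
CN114897566B (zh) * | 2022-03-21 | 2023-08-04 | 深圳市单仁牛商科技股份有限公司 | Big-data-based online diagnosis and analysis method and system for short-video compliance |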
Also Published As
Publication number | Publication date |
---|---|
CN111400439A (zh) | 2020-07-10 |
Similar Documents
Publication | Title |
---|---|
WO2021169499A1 (fr) | Method, apparatus and system for monitoring network bad data, and storage medium |
US11093854B2 (en) | Emoji recommendation method and device thereof |
US10262059B2 (en) | Method, apparatus, and storage medium for text information processing |
CN109446517B (zh) | Coreference resolution method, electronic device and computer-readable storage medium |
CN109471944B (zh) | Training method and device for a text classification model, and readable storage medium |
Altowayan et al. | Improving Arabic sentiment analysis with sentiment-specific embeddings |
CN110083832B (zh) | Method, apparatus, device and readable storage medium for identifying article reposting relationships |
CN107924398B (zh) | System and method for providing a comment-centric news reader |
CN110059156A (zh) | Collaborative retrieval method, apparatus, device and readable storage medium based on associated words |
CN108959259B (zh) | New word discovery method and system |
US20230114673A1 (en) | Method for recognizing token, electronic device and storage medium |
CN112818686A (zh) | Domain phrase mining method, apparatus and electronic device |
CN106663123B (zh) | Comment-centric news reader |
CN112989235A (zh) | Knowledge-base-based internal link construction method, apparatus, device and storage medium |
CN114330343A (zh) | Part-of-speech-aware nested named entity recognition method, system, device and storage medium |
CN115730597A (zh) | Multi-level semantic intent recognition method and related device |
WO2019041528A1 (fr) | Method, electronic apparatus and computer-readable storage medium for determining the sentiment polarity of news |
CN114048288A (zh) | Fine-grained sentiment analysis method, system, computer device and storage medium |
US11669574B2 (en) | Method, apparatus, and computer-readable medium for determining a data domain associated with data |
US9336197B2 (en) | Language recognition based on vocabulary lists |
CN116451072A (zh) | Structured sensitive data identification method and device |
US20220335070A1 (en) | Method and apparatus for querying writing material, and storage medium |
US20190087086A1 (en) | Method for providing cognitive semiotics based multimodal predictions and electronic device thereof |
CN114491038A (zh) | Conversation-scenario-based process mining method, apparatus and device |
CN112926297B (zh) | Method, apparatus, device and storage medium for processing information |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20921970; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20921970; Country of ref document: EP; Kind code of ref document: A1 |