CN114797114A - Real-time intelligent identification method and system for game chat advertisement - Google Patents
- Publication number
- CN114797114A (CN202110133090.1A)
- Authority
- CN
- China
- Prior art keywords
- chat
- text
- layer
- player
- advertisement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/70—Game security or game management aspects
- A63F13/79—Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/85—Providing additional services to players
- A63F13/87—Communicating with other players during game play, e.g. by e-mail or chat
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention discloses a method and a system for real-time intelligent identification of game chat advertisements. Chat feature data for each player is extracted in real time from the game chat content of a game server and distributed to a group of chat risk-control servers for processing, using the load-balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis. Each chat risk-control server asynchronously processes the chat feature data flushed to its Redis instance using the multiple advertisement identification rules/models configured on it, and identifies in real time whether advertisement content is present. Compared with existing game chat advertisement identification schemes, the technical scheme provided by the invention offers good stability, high processing speed, strong real-time performance, high identification accuracy and a high degree of automation.
Description
Technical Field
The invention relates to the field of text processing and semantic recognition, and is used to recognize/detect content with a specific semantic tendency in text; in particular, it relates to a method and a system for real-time intelligent identification of game chat advertisements.
Background
Games often contain "players" who promote goods or other services, as well as advertisers who try to divert users to other games. Such players are often professionals who promote items/equipment to other players, directly or "implicitly", during play. If left unchecked, these promotions can cause losses to the game operator or to other players, and they also degrade the player experience.
At present, advertisement content in game chat is generally identified either manually or by matching chat text against a recognition model trained with a deep neural network algorithm. For example, a traditional data mining model feeds each player's real-time speech text into a random forest model to judge whether the speech violates the rules, and all features used by that model are extracted manually. Existing game chat advertisement identification schemes tend to be slow, poor in real-time performance and insufficiently stable. Because they rely on a single recognition model, they cannot accurately identify advertisement content in chat; in particular, they are weak at recognizing multi-sentence text whose advertising nature only becomes apparent across several sentences.
Disclosure of Invention
To overcome the shortcomings of existing game chat advertisement identification schemes, the invention provides a scheme for real-time intelligent identification of game chat advertisements. The scheme uses a distributed storage system and a real-time computing engine to obtain each player's chat data in real time, and uses load balancing to flush the extracted chat data in real time to distributed Redis instances on a group of chat risk-control servers, where advertisement identification is performed with multiple rules/models. The scheme offers good stability, high processing speed, strong real-time performance, high identification accuracy and a high degree of automation.
The technical scheme provided by the invention is specifically realized as follows:
A method for real-time intelligent identification of game chat advertisements comprises the following steps: extracting each player's chat feature data in real time from the game chat content of a game server, and distributing the chat feature data to a group of chat risk-control servers for processing, using the load-balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis; and asynchronously processing the chat feature data flushed to Redis on each chat risk-control server using the multiple advertisement identification rules/models configured on that server, identifying in real time whether advertisement content is present. The chat feature data extracted in real time for each player includes at least: the role, the IP address, and the speech text at chat time.
A player's chat feature data can reach the chat risk-control servers from the game server by two paths:
1. Chat features that can be obtained without computation over player history are sent directly to the Nginx server by the stream processing platform Kafka, and the Nginx server's load-balancing mechanism sends the chat feature data to Redis on the group of chat risk-control servers.
2. Chat feature data that depends on player history, and can only be obtained by computation over that history, takes a longer path. The stream processing platform Kafka sends the relevant chat data to the distributed store Cassandra or Kudu, which holds player history data. The real-time computing engine Presto computes the history-related chat feature data for a specific player from all of that player's data in Cassandra or Kudu and writes it to the search and analysis engine ES (Elasticsearch). ES flushes the chat feature data to the Nginx server in real time, and the Nginx server's load-balancing mechanism sends it to Redis on the group of chat risk-control servers.
Further, the multiple advertisement identification rules/models include: a whitelist matching rule based on player information; a blacklist matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on a Word2Vec + BiLSTM architecture; and a random forest advertisement recognition model based on real-time feature input.
The single-text semantic advertisement recognition model is obtained by training a multi-head self-attention architecture, based on the Transformer architecture, on a large number of manually labeled positive and negative samples. The model includes: a preprocessing layer, which cleans the text to be recognized and feeds it to the word embedding layer; a word embedding layer, which extracts a numeric feature matrix from the input text and records a position matrix giving the position of each word in the text, and whose word2vec model is learned with the CBOW or Skip-Gram method from a large number of historical in-game chat conversations collected by operation and maintenance personnel; a multi-layer perceptron (MLP), which receives the numeric feature matrix and the position matrix, produces an attention-based text feature matrix and feeds it to the fully connected layer, and which is formed by cascading processing units each consisting of a multi-head self-attention module and an FFN layer; and a fully connected layer with a multilayer structure, in which the value output by the last layer is passed through a sigmoid function to produce the probability that the text is an advertisement.
The multi-text semantic advertisement recognition model is obtained by training a Word2Vec + BiLSTM deep learning architecture on a large number of manually labeled positive and negative samples. It comprises a preprocessing layer, a word embedding layer, a normalization layer (norm), a bidirectional long short-term memory layer and a fully connected layer. The preprocessing layer cleans the multi-sentence text formed by concatenating a preset number of the most recent speech texts of a specific player and feeds it to the word embedding layer. The word embedding layer extracts a numeric feature matrix from the input text and a position matrix recording the position of each word in the text, and feeds both to the normalization layer for standard normalization; its word2vec model is learned with the CBOW or Skip-Gram method from a large number of historical in-game chat conversations collected by operation and maintenance personnel. The bidirectional long short-term memory layer (BiLSTM) extracts a context information matrix capturing the context dependencies of the multi-sentence text and outputs it to the fully connected layer, which predicts whether the multi-sentence text is an advertisement.
Further, the chat risk-control server identifies chat advertisements in a layer-by-layer progressive manner based on the multiple advertisement identification rules/models. The specific process is as follows. First, a player's chat data is matched against the whitelist matching rule based on player information; if the player information matches the whitelist, the chat data passes directly without further detection. Otherwise, the chat data is matched against the blacklist matching rule; an alarm is raised directly if the player is on the blacklist, or the player's chat speech text contains a sensitive word from the blacklist, or the chat data matches a blacklist rule formulated from the long-term experience of game operators. Otherwise, the single-text semantic advertisement recognition model based on the multi-head attention mechanism is applied to the speech text in the chat data; if it classifies the text as an advertisement, an alarm is raised directly. Otherwise, a preset number of the player's most recent speech texts are concatenated into a multi-sentence text, which is classified by the multi-text semantic advertisement recognition model. Preferably, for a multi-sentence text that the multi-text semantic advertisement recognition model predicts to be a non-advertisement, a traditional data mining model is additionally applied: each player's chat data is input into a random forest model, which outputs whether the player has violated the rules, so as to improve the recall rate of the system. All features used by the random forest model are extracted manually, and its decision process is well founded and reliable.
Corresponding to the method, the invention also provides a system for real-time intelligent identification of game chat advertisements. The system comprises the stream processing platform Kafka, the distributed storage system Cassandra or Kudu, the real-time computing engine Presto, a group of chat risk-control servers, an Nginx server and the search and analysis engine ES;
wherein, for chat feature data that can be obtained without computation over player history, the stream processing platform Kafka sends the chat feature data directly to the Nginx server, and the Nginx server's load-balancing mechanism sends it to Redis on the group of chat risk-control servers; for chat feature data that depends on player history and can only be obtained by computation over that history, the stream processing platform Kafka sends the relevant chat data to the distributed store Cassandra or Kudu, which holds player history data, the real-time computing engine Presto computes the history-related chat feature data for a specific player from all of that player's data in Cassandra or Kudu and writes it to the search and analysis engine ES (Elasticsearch), ES flushes the chat feature data to the Nginx server in real time, and the Nginx server's load-balancing mechanism sends it to Redis on the group of chat risk-control servers;
the real-time computing engine Presto extracts the chat data of different players from the game chat content in real time and stores it in the distributed storage system Cassandra;
the Nginx server communicates with the distributed storage system Cassandra and, using its own load-balancing framework, distributes the chat data to the group of chat risk-control servers for processing through the asynchronous mechanism of distributed Redis;
each server in the group of chat risk-control servers is configured with multiple advertisement identification rules/models, uses them to asynchronously process the chat data flushed to its Redis instance, and identifies in real time whether advertisement content is present.
The implementation details of the system for real-time intelligent identification of game chat advertisements correspond to those of the method described above.
Drawings
Fig. 1 is a schematic diagram of the various advertisement identification rules/models provided in a chat risk-control server according to an embodiment of the present invention.
Detailed Description
To make the technical problems solved by the invention, its technical solutions and its advantages clearer, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.
A method for real-time intelligent identification of game chat advertisements comprises the following steps: extracting each player's chat feature data in real time from the game chat content of a game server, and distributing the chat feature data to a group of chat risk-control servers for processing, using the load-balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis; and asynchronously processing the chat feature data flushed to Redis on each chat risk-control server using the multiple advertisement identification rules/models configured on that server, identifying in real time whether advertisement content is present. The chat feature data extracted in real time for each player includes at least: the role, the IP address, and the speech text at chat time.
A player's chat feature data can reach the chat risk-control servers from the game server by two paths:
1. Chat features that can be obtained without computation over player history are sent directly to the Nginx server by the stream processing platform Kafka, and the Nginx server's load-balancing mechanism sends the chat feature data to Redis on the group of chat risk-control servers.
2. Chat feature data that depends on player history, and can only be obtained by computation over that history, takes a longer path. The stream processing platform Kafka sends the relevant chat data to the distributed store Cassandra or Kudu, which holds player history data. The real-time computing engine Presto computes the history-related chat feature data for a specific player from all of that player's data in Cassandra or Kudu and writes it to the search and analysis engine ES (Elasticsearch). ES flushes the chat feature data to the Nginx server in real time, and the Nginx server's load-balancing mechanism sends it to Redis on the group of chat risk-control servers.
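The two delivery paths can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class and field names are hypothetical, a plain list stands in for the Kafka -> Nginx -> Redis path, a dict stands in for Cassandra/Kudu, and the two history features shown are invented examples, since the patent does not enumerate them.

```python
from dataclasses import dataclass, field


@dataclass
class ChatEvent:
    role: str
    ip: str
    text: str


@dataclass
class Pipeline:
    """In-memory stand-ins for the two delivery paths (all names hypothetical)."""
    direct_queue: list = field(default_factory=list)   # stands in for Kafka -> Nginx -> Redis
    history_store: dict = field(default_factory=dict)  # stands in for Cassandra/Kudu

    def ingest(self, event: ChatEvent) -> None:
        # Path 1: features that need no history are forwarded immediately.
        self.direct_queue.append(
            {"role": event.role, "ip": event.ip, "text": event.text}
        )
        # Path 2 (first half): the raw text is stored so that history-based
        # features can be computed later.
        self.history_store.setdefault(event.role, []).append(event.text)

    def history_features(self, role: str) -> dict:
        # Path 2 (second half): stands in for the Presto computation whose
        # result would be written to Elasticsearch and pushed onward.
        texts = self.history_store.get(role, [])
        distinct_ratio = len(set(texts)) / len(texts) if texts else 0.0
        return {
            "role": role,
            "message_count": len(texts),
            "distinct_ratio": distinct_ratio,
        }
```

A low `distinct_ratio` in this sketch would indicate a player repeating the same message, which is one plausible signal that only becomes visible once history is taken into account.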
Cassandra is an open-source distributed NoSQL database system. It was originally developed by Facebook to store simply formatted data such as inbox data, combining the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008, and thanks to its good scalability it was adopted by well-known Web 2.0 sites such as Digg and Twitter, becoming a popular distributed structured data storage scheme. Cassandra is a hybrid non-relational database, similar to Google's BigTable. Its functionality is richer than that of Dynamo (a distributed key-value storage system); among non-relational databases it is comparatively feature-rich and close to a relational database. Its supported data structures are loosely constrained, so it can store relatively complex data types. The Presto query engine has a master-slave architecture consisting of one Coordinator node, one Discovery Server node and multiple Worker nodes, with the Discovery Server usually embedded in the Coordinator node. The Coordinator parses SQL statements, generates execution plans, and distributes execution tasks to the Worker nodes. These characteristics make Presto well suited to working with the distributed storage system Cassandra to capture, store and query each player's chat data.
Redis (Remote Dictionary Server) is a key-value storage system. Data is held in memory, and incremental operations are supported, which avoids full operations when querying data. Redis also supports a rich set of data operations such as push/pop and add/remove, and these operations are atomic, which avoids data inconsistency. The Nginx server has a built-in load-balancing module, so load-balanced data distribution can be configured conveniently for different application scenarios.
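As a rough illustration of how load-balanced distribution and per-server queues fit together, the sketch below uses only the Python standard library. It is not Redis or Nginx code: the `RiskControlServer` class is hypothetical, a `deque` stands in for a Redis list on each server, and round-robin is assumed as the balancing policy (the patent does not say which policy is configured).

```python
from collections import deque
from itertools import cycle


class RiskControlServer:
    """Hypothetical worker; the deque stands in for the server's Redis list."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def push(self, item):
        # Analogous to pushing onto a Redis list.
        self.queue.append(item)

    def pop(self):
        # Analogous to popping from a Redis list; returns None when empty.
        return self.queue.popleft() if self.queue else None


def distribute(items, servers):
    """Round-robin distribution across servers (assumed balancing policy)."""
    targets = cycle(servers)
    for item in items:
        next(targets).push(item)


servers = [RiskControlServer(f"rc-{i}") for i in range(3)]
distribute([f"msg-{i}" for i in range(7)], servers)
```

Each worker then drains only its own queue, so no server has to scan the full set of chat data.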
As shown in Fig. 1, in the technical solution provided by the invention, the multiple advertisement identification rules/models in the chat risk-control server include:
A direct-pass model, which mainly contains the whitelist matching rule based on player information: players who satisfy the rule, and players on the IP and account whitelists, are directly returned as normal.
A direct-ban model, which contains the blacklist matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators. When a player is on the IP or account blacklist, or the player's speech contains sensitive words or matches one of the rules formulated from the operators' long-term experience, the system directly returns alarm information.
The single-text semantic advertisement recognition model is obtained by training a Transformer-based architecture on a large number of manually labeled positive and negative samples. The model includes: a preprocessing layer, which cleans the text to be recognized and feeds it to the word embedding layer; a word embedding layer, which extracts a numeric feature matrix from the input text and records a position matrix giving the position of each word in the text, and whose word2vec model is learned with the CBOW or Skip-Gram method from a large number of historical in-game chat conversations collected by operation and maintenance personnel; a multi-layer perceptron (MLP), which receives the numeric feature matrix and the position matrix, produces an attention-based text feature matrix and feeds it to the fully connected layer, and which is formed by cascading processing units each consisting of a multi-head self-attention module and an FFN layer; and a fully connected layer with a multilayer structure, in which the value output by the last layer is passed through a sigmoid function to produce the probability that the text is an advertisement. The Transformer-based single-text deep learning model can use the semantics of a player's speech to identify players who were not caught by the rules and the blacklist but whose speech does in fact violate the rules.
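The patent does not write out the computations inside the multi-head self-attention module or the output layer. Assuming the standard Transformer formulation, with $X$ the combined word and position features, $h$ heads, learned projections $W_i^{Q}, W_i^{K}, W_i^{V}, W^{O}$, key dimension $d_k$, and $z$ the scalar output of the last fully connected layer (all symbols introduced here for illustration), they would read:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

\mathrm{head}_i = \mathrm{Attention}\!\left(X W_i^{Q},\; X W_i^{K},\; X W_i^{V}\right),
\qquad
\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}

P(\text{advertisement} \mid \text{text}) = \sigma(z) = \frac{1}{1 + e^{-z}}
```

The sigmoid maps the final value to the interval $(0, 1)$, which is why it can be read as the probability that the text is an advertisement.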
The multi-text semantic advertisement recognition model is obtained by training a Word2Vec + BiLSTM deep learning architecture on a large number of manually labeled positive and negative samples. It comprises a preprocessing layer, a word embedding layer, a normalization layer (norm), a bidirectional long short-term memory layer and a fully connected layer. The preprocessing layer cleans the multi-sentence text formed by concatenating a preset number of the most recent speech texts of a specific player and feeds it to the word embedding layer. The word embedding layer extracts a numeric feature matrix from the input text and a position matrix recording the position of each word in the text, and feeds both to the normalization layer for standard normalization; its word2vec model is learned with the CBOW or Skip-Gram method from a large number of historical in-game chat conversations collected by operation and maintenance personnel. The bidirectional long short-term memory layer (BiLSTM) extracts a context information matrix capturing the context dependencies of the multi-sentence text and outputs it to the fully connected layer, which predicts whether the multi-sentence text is an advertisement. This model evaluates a player's speech over a period of time to judge whether the player has violated the rules, which greatly improves the recall rate of the system.
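The input to this model, a multi-sentence text built from a player's most recent utterances, can be sketched with a bounded per-player buffer. The class name, the default window of 5 and the space separator are assumptions for illustration; the patent only says a "preset number" of recent speech texts are concatenated.

```python
from collections import defaultdict, deque


class RecentUtterances:
    """Keeps the last `window` utterances per player (hypothetical helper)."""

    def __init__(self, window: int = 5, sep: str = " "):
        self.sep = sep
        # deque(maxlen=...) silently drops the oldest entry once it is full.
        self._buffers = defaultdict(lambda: deque(maxlen=window))

    def add(self, player_id: str, text: str) -> str:
        """Record one utterance and return the concatenated multi-sentence text."""
        buf = self._buffers[player_id]
        buf.append(text.strip())
        return self.sep.join(buf)
```

The returned string is what would be cleaned by the preprocessing layer and passed to the word embedding layer.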
A traditional data mining model, namely a random forest model, all of whose features are extracted manually. Each player's real-time chat data is input into the random forest model to judge whether the player's speech violates the rules.
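The patent does not list the manually extracted features. Purely as an illustration of what hand-crafted real-time features for such a model might look like, the sketch below computes four hypothetical ones; the feature names, the URL pattern and the `messages_last_minute` input are all assumptions, and the random forest itself is not shown.

```python
import re

# Hypothetical pattern: treats "http(s)://..." or "www...." as a link.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def extract_features(text: str, messages_last_minute: int) -> dict:
    """Return a small dict of hand-crafted features for one chat message."""
    n = len(text)
    digits = sum(ch.isdigit() for ch in text)
    return {
        "length": n,
        # Share of digits; contact numbers tend to raise this value.
        "digit_ratio": digits / n if n else 0.0,
        "has_url": int(bool(URL_RE.search(text))),
        # Sending frequency, supplied by the caller from real-time counters.
        "messages_last_minute": messages_last_minute,
    }
```

A feature vector of this kind would then be passed to a trained random forest classifier.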
The Word2Vec + BiLSTM deep learning architecture and the Transformer-based multi-head self-attention architecture are both relatively mature, and the models can be trained by conventional methods known to a person skilled in the art, so the training process is not described here.
Together, these rules/models comprise: a whitelist matching rule based on player information; a blacklist matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on a Word2Vec + BiLSTM architecture; and a random forest advertisement recognition model based on real-time feature input.
Further, to improve the accuracy of advertisement identification, the chat risk-control server identifies chat advertisements in a layer-by-layer progressive manner based on the multiple advertisement identification rules/models. The specific process is as follows. First, a player's chat data is matched against the whitelist matching rule based on player information; if the player information matches the whitelist, the chat data passes directly without further detection. Otherwise, the chat data is matched against the blacklist matching rule; an alarm is raised directly if the player is on the blacklist, or the player's chat speech text contains a sensitive word from the blacklist, or the chat data matches a blacklist rule formulated from the long-term experience of game operators. Otherwise, the single-text semantic advertisement recognition model based on the multi-head attention mechanism is applied to the speech text in the chat data; if it classifies the text as an advertisement, an alarm is raised directly. Otherwise, a preset number of the player's most recent speech texts are concatenated into a multi-sentence text, which is classified by the multi-text semantic advertisement recognition model. For a multi-sentence text that the multi-text semantic advertisement recognition model predicts to be a non-advertisement, a traditional data mining model is additionally applied: each player's chat data is input into a random forest model, which outputs whether the player has violated the rules, so as to improve the recall rate of the system.
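The layer-by-layer process can be sketched as a short cascade of early returns. This is a sketch under stated assumptions, not the patented implementation: the function name, the `"pass"`/`"alarm"` labels and the 0.5 threshold are hypothetical, the three models are passed in as plain callables, and the blacklist check is reduced to account membership plus substring matching of sensitive words.

```python
from typing import Callable, Iterable, Set


def classify(
    player_id: str,
    text: str,
    recent_texts: Iterable[str],
    *,
    whitelist: Set[str],
    blacklist: Set[str],
    sensitive_words: Set[str],
    single_model: Callable[[str], float],      # returns P(ad) for one message
    multi_model: Callable[[str], float],       # returns P(ad) for joined messages
    forest_model: Callable[[str, str], bool],  # returns True on a violation
    threshold: float = 0.5,                    # assumed decision threshold
) -> str:
    # Layer 1: whitelisted players pass without further detection.
    if player_id in whitelist:
        return "pass"
    # Layer 2: blacklisted players or sensitive words raise an alarm directly.
    if player_id in blacklist or any(w in text for w in sensitive_words):
        return "alarm"
    # Layer 3: single-text semantic model on the current message.
    if single_model(text) >= threshold:
        return "alarm"
    # Layer 4: multi-text semantic model on the concatenated recent messages.
    if multi_model(" ".join(recent_texts)) >= threshold:
        return "alarm"
    # Layer 5: random forest as a final recall-oriented check.
    if forest_model(player_id, text):
        return "alarm"
    return "pass"
```

Ordering the cheap list lookups first means most traffic never reaches the neural models, which is consistent with the speed and real-time goals stated for the scheme.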
Corresponding to the method, the invention also provides a real-time intelligent identification system for game chat advertisements, characterized by comprising a stream processing platform Kafka, a distributed storage system Cassandra or Kudu, a real-time computing engine Presto, a group of chat risk-control servers, an Nginx server, and a search and analysis engine Elasticsearch (ES).
Wherein, chat feature data that can be obtained without calculation over the player's historical information is sent by the stream processing platform Kafka directly to the Nginx server, whose load balancing mechanism forwards it to Redis on the group of chat risk-control servers; for chat feature data that can only be obtained by calculation over the player's historical data, Kafka sends the relevant chat data to the distributed storage Cassandra or Kudu, which stores player historical data; the real-time computing engine Presto computes the history-related chat feature data for a specific player from all of that player's relevant data in Cassandra or Kudu and writes it to the search and analysis engine Elasticsearch (ES); ES flushes the chat feature data to the Nginx server in real time, and the load balancing mechanism of the Nginx server sends it to Redis on the group of chat risk-control servers;
the real-time computing engine Presto extracts the chat data of different players from the game chat content in real time and stores it in the distributed storage system Cassandra;
the Nginx server communicates with the distributed storage system Cassandra and, based on its own load balancing framework, distributes the chat data to the group of chat risk-control servers for processing through a distributed Redis asynchronous mechanism;
each server in the group of chat risk-control servers is configured with multiple advertisement identification rules/models, based on which it asynchronously processes the chat data flushed to its Redis and identifies in real time whether advertisement content is present.
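The two data paths and the load-balanced hand-off to the servers can be illustrated with a small in-memory simulation. This is a sketch under several assumptions, not the described deployment: plain Python queues stand in for each server's Redis, a dictionary stands in for Cassandra/Kudu, round-robin is assumed as the balancing policy (the text does not name one), and the names `RiskControlServer`, `Balancer`, `route`, `needs_history`, and `history_count` are hypothetical.

```python
from collections import deque
from itertools import cycle
from typing import Dict, List


class RiskControlServer:
    """One chat risk-control server; `queue` stands in for its Redis list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.queue: deque = deque()


class Balancer:
    """Stand-in for the load balancer, assuming a round-robin policy."""

    def __init__(self, servers: List[RiskControlServer]) -> None:
        self._next = cycle(servers)

    def dispatch(self, record: dict) -> str:
        server = next(self._next)
        server.queue.append(record)  # consumed asynchronously by the server
        return server.name


def route(record: dict, balancer: Balancer, history: Dict[str, List[str]]) -> str:
    """Sends a record down path (a) or path (b) and returns the target server."""
    if record.get("needs_history"):
        # Path (b): persist the message, derive a history-based feature,
        # then forward the enriched record. The feature is a toy example.
        past = history.setdefault(record["player"], [])
        past.append(record["text"])
        record = {**record, "history_count": len(past)}
    # Path (a) records reach the balancer unchanged.
    return balancer.dispatch(record)
```

A short usage example: with two servers, `route` alternates between them, and only records flagged `needs_history` carry the derived `history_count` field when they arrive.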
The specific details of how the system performs advertisement recognition on player chat data correspond to those of the method described above.
In the real-time intelligent identification scheme for game chat advertisements provided by the invention, the distributed storage system and the real-time computing engine work together to obtain the chat data of each player in the game chat in real time, which speeds up data extraction and meets the requirement of extracting in-game chat data in real time. The extracted chat data is flushed in real time, through load balancing, to distributed Redis on a group of chat risk-control servers for processing, which improves system stability and keeps the chat risk-control servers from having to perform full-volume computation over the players' chat data. Each chat risk-control server is configured with multiple advertisement identification rules/models, including deep learning models, and performs advertisement recognition on player chat data layer by layer, which greatly improves recognition accuracy and provides a high degree of automation.
Claims (12)
1. A real-time intelligent identification method for game chat advertisements, characterized by comprising the following steps: extracting the chat feature data of each player in real time from the game chat content of a game server, and distributing the chat feature data to a group of chat risk-control servers for processing through the load balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis; and, on each chat risk-control server, asynchronously processing the chat feature data flushed to its Redis using the multiple advertisement identification rules/models configured on that server, and identifying in real time whether advertisement content is present.
2. The method of claim 1, wherein the chat feature data of a player travels from the game server to the chat risk-control servers along two paths:
a. for chat feature data that can be obtained without calculation over the player's historical information, the stream processing platform Kafka sends the chat feature data directly to the Nginx server, and the load balancing mechanism of the Nginx server sends it to Redis on the group of chat risk-control servers;
b. for chat feature data that can only be obtained by calculation over the player's historical data, the stream processing platform Kafka sends the relevant chat data to the distributed storage Cassandra or Kudu, which stores player historical data; the real-time computing engine Presto computes the chat feature data for a specific player from all of that player's relevant data in Cassandra or Kudu and writes it to the search and analysis engine Elasticsearch (ES), and ES then flushes the chat feature data in real time to Redis on the group of chat risk-control servers.
3. The method of claim 1 or 2, wherein the chat feature data of each player extracted in real time comprises at least: the role, the IP address, and the speech text sent during chat.
4. The method of claim 3, wherein the multiple advertisement identification rules/models comprise: a whitelist matching rule based on player information; a blacklist matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on a Word2Vec + BiLSTM architecture; and a random forest advertisement recognition model based on real-time feature input.
5. The method of claim 4, wherein the single-text semantic advertisement recognition model is based on the multi-head self-attention framework of the Transformer architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a multi-layer perceptron MPL, and a fully connected layer;
the preprocessing layer is used to clean the speech text to be recognized and input the cleaned text to the word embedding layer; the word embedding layer is used to extract a numerical feature matrix of the input text and a position matrix recording the position of each word in the text, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of in-game historical chat sessions collected by operation and maintenance personnel;
the multi-layer perceptron MPL is formed by cascading processing units, each consisting of a multi-head self-attention module and an FFN layer, and is used to receive and process the numerical feature matrix and the position matrix, obtain an attention-based text feature matrix, and input the text feature matrix into the fully connected layer to predict whether the text is an advertisement;
the fully connected layer is a multi-layer fully connected layer used to map the text feature matrix to the sample label space, and the value output by its last layer is passed through a sigmoid function to produce the probability that the text is an advertisement.
6. The method of claim 4 or 5, wherein the multi-text semantic advertisement recognition model is based on the Word2Vec + BiLSTM deep learning architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a normalization layer (norm), a bidirectional long short-term memory (BiLSTM) layer, and a fully connected layer; the preprocessing layer is used to clean the multi-sentence text formed by splicing together the most recent preset number of speech texts of a specific player and to input it into the word embedding layer; the word embedding layer is used to extract a numerical feature matrix of the input text and a position matrix recording the position of each word in the text, both of which are input into the normalization layer (norm) for standard normalization, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of in-game historical chat sessions collected by operation and maintenance personnel; the BiLSTM layer is used to extract a context information matrix capturing the context dependencies of the multi-sentence text and to output it to the fully connected layer, which generates the prediction of whether the multi-sentence text is an advertisement.
7. The method of claim 6, wherein chat advertisements are identified with the multiple advertisement identification rules/models in a layer-by-layer, progressive manner, comprising: first, matching the chat data of a player against the whitelist matching rule based on player information, and passing the chat data directly without further detection if the player information matches an entry in the whitelist; otherwise, performing a second match with the blacklist matching rule, and raising an alarm directly if the player is on the blacklist, if the player's chat speech text contains a sensitive word on the blacklist, or if the chat data satisfies a blacklist rule formulated from the long-term experience of game operators; otherwise, performing advertisement recognition on the speech text in the chat data with the single-text semantic advertisement recognition model based on the multi-head attention mechanism, and raising an alarm directly if the text is classified as an advertisement; otherwise, splicing together the player's most recent preset number of speech texts into a multi-sentence text and performing advertisement recognition on it with the multi-text semantic advertisement recognition model.
8. A real-time intelligent identification system for game chat advertisements, characterized by comprising a stream processing platform Kafka, a distributed storage system Cassandra or Kudu, a real-time computing engine Presto, a group of chat risk-control servers, an Nginx server, and a search and analysis engine Elasticsearch (ES);
wherein chat feature data that can be obtained without calculation over the player's historical information is sent by the stream processing platform Kafka directly to the Nginx server, whose load balancing mechanism forwards it to Redis on the group of chat risk-control servers; for chat feature data that can only be obtained by calculation over the player's historical data, Kafka sends the relevant chat data to the distributed storage Cassandra or Kudu, which stores player historical data; the real-time computing engine Presto computes the history-related chat feature data for a specific player from all of that player's relevant data in Cassandra or Kudu and writes it to the search and analysis engine Elasticsearch (ES); ES flushes the history-related chat feature data to the Nginx server in real time, and the load balancing mechanism of the Nginx server sends it to Redis on the group of chat risk-control servers;
the real-time computing engine Presto extracts the chat data of different players from the game chat content in real time and stores it in the distributed storage system Cassandra;
the Nginx server communicates with the distributed storage system Cassandra and, based on its own load balancing framework, distributes the chat data to the group of chat risk-control servers for processing through a distributed Redis asynchronous mechanism;
each server in the group of chat risk-control servers is configured with multiple advertisement identification rules/models, based on which it asynchronously processes the chat data flushed to its Redis and identifies in real time whether advertisement content is present.
9. The system of claim 8, wherein the multiple advertisement identification rules/models comprise: a whitelist matching rule based on player information; a blacklist matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on the multi-head self-attention mechanism; and a random forest advertisement recognition model based on real-time feature input.
10. The system of claim 9, wherein the single-text semantic advertisement recognition model is based on the multi-head self-attention framework of the Transformer architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a multi-layer perceptron MPL, and a fully connected layer; the preprocessing layer is used to clean the speech text to be recognized and input the cleaned text to the word embedding layer; the word embedding layer is used to extract a numerical feature matrix of the input text and a position matrix recording the position of each word in the text, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of in-game historical chat sessions collected by operation and maintenance personnel; the multi-layer perceptron MPL is formed by cascading processing units, each consisting of a multi-head self-attention module and an FFN layer, and is used to receive and process the numerical feature matrix and the position matrix, obtain an attention-based text feature matrix, and input the text feature matrix into the fully connected layer to generate the prediction of whether the text is an advertisement; the fully connected layer is a multi-layer fully connected layer used to map the text feature matrix to the sample label space, and the value output by its last layer is passed through a sigmoid function to produce the probability that the text is an advertisement.
11. The system of claim 9 or 10, wherein the multi-text semantic advertisement recognition model is based on the Word2Vec + BiLSTM deep learning architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a normalization layer (norm), a bidirectional long short-term memory (BiLSTM) layer, and a fully connected layer;
the preprocessing layer is used to clean the multi-sentence text formed by splicing together the most recent preset number of speech texts of a specific player and to input it into the word embedding layer;
the word embedding layer is used to extract a numerical feature matrix of the input text and a position matrix recording the position of each word in the text, both of which are input into the normalization layer (norm) for standard normalization, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of in-game historical chat sessions collected by operation and maintenance personnel;
the BiLSTM layer is used to extract a context information matrix capturing the context dependencies of the multi-sentence text and to output it to the fully connected layer, which generates the prediction of whether the multi-sentence text is an advertisement.
12. The system of claim 11, wherein the chat risk-control server identifies advertisements in a layer-by-layer, progressive manner based on the multiple advertisement identification rules/models, the identification comprising: first, matching the chat data of a player against the whitelist matching rule based on player information, and passing the chat data directly without further detection if the player information matches an entry in the whitelist; otherwise, performing a second match with the blacklist matching rule, and raising an alarm directly if the player is on the blacklist, if the player's chat speech text contains a sensitive word on the blacklist, or if the chat data satisfies a blacklist rule formulated from the long-term experience of game operators; otherwise, performing advertisement recognition on the speech text in the chat data with the single-text semantic advertisement recognition model based on the multi-head attention mechanism, and raising an alarm directly if the text is classified as an advertisement; otherwise, splicing together the player's most recent preset number of speech texts into a multi-sentence text and performing advertisement recognition on it with the multi-text semantic advertisement recognition model.
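The core computation of the single-text model recited in the claims above (multi-head self-attention followed by a sigmoid output) can be sketched in plain Python. This is a deliberately reduced illustration, not the claimed model: it assumes the word-embedding and position matrices have already been summed into one input matrix `x`, it omits the FFN layer, the output projection, and the multi-layer fully connected head, and it uses mean pooling plus a single linear unit before the sigmoid. All function names and weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(x: Matrix, heads: List[Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Runs every head and concatenates the outputs along the feature axis."""
    outs = [attention_head(x, wq, wk, wv) for wq, wk, wv in heads]
    return [sum((out[i] for out in outs), []) for i in range(len(x))]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ad_probability(
    x: Matrix,
    heads: List[Tuple[Matrix, Matrix, Matrix]],
    w_out: Sequence[float],
    bias: float,
) -> float:
    """Mean-pools token features (an assumption) and applies a sigmoid unit."""
    feats = multi_head(x, heads)
    pooled = [sum(col) / len(col) for col in zip(*feats)]
    return sigmoid(sum(p * w for p, w in zip(pooled, w_out)) + bias)
```

With `x` of shape (tokens, d) and h heads whose value matrices map d to d_v, `multi_head` returns a (tokens, h * d_v) matrix, so `w_out` must have h * d_v entries.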
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110133090.1A CN114797114A (en) | 2021-01-29 | 2021-01-29 | Real-time intelligent identification method and system for game chat advertisement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110133090.1A CN114797114A (en) | 2021-01-29 | 2021-01-29 | Real-time intelligent identification method and system for game chat advertisement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114797114A (en) | 2022-07-29 |
Family
ID=82527002
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110133090.1A Pending CN114797114A (en) | 2021-01-29 | 2021-01-29 | Real-time intelligent identification method and system for game chat advertisement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114797114A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116506244A (en) * | 2023-05-24 | 2023-07-28 | 北京比邻星空科技有限公司 | Chat room configuration method capable of self-adapting to number of people in room |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107657286A (en) * | 2017-10-19 | 2018-02-02 | 北京深极智能科技有限公司 | A kind of advertisement recognition method and computer-readable recording medium |
CN108197966A (en) * | 2017-11-01 | 2018-06-22 | 上海新数网络科技股份有限公司 | A kind of accurate advertisement analysis method based on crowd's label data |
US10417667B1 (en) * | 2012-06-27 | 2019-09-17 | Groupon, Inc. | Method and apparatus for creating web content and identifying advertisements for users creating and viewing content |
CN110719221A (en) * | 2019-10-16 | 2020-01-21 | 北京蚂蜂窝网络科技有限公司 | Instant messaging method, device, equipment and storage medium |
CN111538836A (en) * | 2020-04-22 | 2020-08-14 | 哈尔滨工业大学(威海) | Method for identifying financial advertisements in text advertisements |
-
2021
- 2021-01-29 CN CN202110133090.1A patent/CN114797114A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10417667B1 (en) * | 2012-06-27 | 2019-09-17 | Groupon, Inc. | Method and apparatus for creating web content and identifying advertisements for users creating and viewing content |
CN107657286A (en) * | 2017-10-19 | 2018-02-02 | 北京深极智能科技有限公司 | A kind of advertisement recognition method and computer-readable recording medium |
CN108197966A (en) * | 2017-11-01 | 2018-06-22 | 上海新数网络科技股份有限公司 | A kind of accurate advertisement analysis method based on crowd's label data |
CN110719221A (en) * | 2019-10-16 | 2020-01-21 | 北京蚂蜂窝网络科技有限公司 | Instant messaging method, device, equipment and storage medium |
CN111538836A (en) * | 2020-04-22 | 2020-08-14 | 哈尔滨工业大学(威海) | Method for identifying financial advertisements in text advertisements |
Non-Patent Citations (1)
Title |
---|
高扬: "《智能摘要与深度学习》", 30 April 2019, 北京理工大学出版社, pages: 23 - 38 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116506244A (en) * | 2023-05-24 | 2023-07-28 | 北京比邻星空科技有限公司 | Chat room configuration method capable of self-adapting to number of people in room |
CN116506244B (en) * | 2023-05-24 | 2023-11-17 | 北京比邻星空科技有限公司 | Chat room configuration method capable of self-adapting to number of people in room |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109657054B (en) | Abstract generation method, device, server and storage medium | |
CN108717408B (en) | Sensitive word real-time monitoring method, electronic equipment, storage medium and system | |
CN110232109A (en) | A kind of Internet public opinion analysis method and system | |
CN110598037B (en) | Image searching method, device and storage medium | |
CN110598070B (en) | Application type identification method and device, server and storage medium | |
CN110427461A (en) | Intelligent answer information processing method, electronic equipment and computer readable storage medium | |
CN114238573A (en) | Information pushing method and device based on text countermeasure sample | |
CN109408574B (en) | Complaint responsibility confirmation system based on text mining technology | |
CN111522915A (en) | Extraction method, device and equipment of Chinese event and storage medium | |
CN111666400B (en) | Message acquisition method, device, computer equipment and storage medium | |
CN117609479B (en) | Model processing method, device, equipment, medium and product | |
CN111026840A (en) | Text processing method, device, server and storage medium | |
CN112069324B (en) | Classification label adding method, device, equipment and storage medium | |
CN115688920A (en) | Knowledge extraction method, model training method, device, equipment and medium | |
US11200264B2 (en) | Systems and methods for identifying dynamic types in voice queries | |
CN111488501A (en) | E-commerce statistical system based on cloud platform | |
CN114797114A (en) | Real-time intelligent identification method and system for game chat advertisement | |
CN113821612A (en) | Information searching method and device | |
WO2024055603A1 (en) | Method and apparatus for identifying text from minor | |
CN117764373A (en) | Risk prediction method, apparatus, device and storage medium | |
CN111639494A (en) | Case affair relation determining method and system | |
CN117033626A (en) | Text auditing method, device, equipment and storage medium | |
CN116484105A (en) | Service processing method, device, computer equipment, storage medium and program product | |
CN114444609B (en) | Data processing method, device, electronic equipment and computer readable storage medium | |
CN110188201A (en) | A kind of information matching method and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||