CN114797114A - Real-time intelligent identification method and system for game chat advertisement - Google Patents

Real-time intelligent identification method and system for game chat advertisement

Info

Publication number
CN114797114A
Authority
CN
China
Prior art keywords
chat
text
layer
player
advertisement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110133090.1A
Other languages
Chinese (zh)
Inventor
夏聃
孔融
胡天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENGQU INFORMATION TECHNOLOGY (SHANGHAI) CO LTD
Original Assignee
SHENGQU INFORMATION TECHNOLOGY (SHANGHAI) CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENGQU INFORMATION TECHNOLOGY (SHANGHAI) CO LTD
Priority to CN202110133090.1A
Publication of CN114797114A
Legal status: Pending

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70 Game security or game management aspects
    • A63F13/79 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/85 Providing additional services to players
    • A63F13/87 Communicating with other players during game play, e.g. by e-mail or chat
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a real-time intelligent identification method and system for game chat advertisements. Chat feature data of each player is extracted in real time from the game chat content of a game server and distributed to a group of chat risk-control servers for processing, using the load-balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis. Each chat risk-control server asynchronously processes the chat feature data refreshed into its Redis with the multiple advertisement identification rules/models configured on it, and identifies in real time whether advertisement content is present. Compared with existing game chat advertisement identification schemes, the technical scheme provided by the invention offers good stability, high processing speed, strong real-time performance, high identification accuracy and a high degree of automation.

Description

Real-time intelligent identification method and system for game chat advertisement
Technical Field
The scheme provided by the invention relates to the field of text processing and semantic recognition and is used to recognize/detect content with a specific semantic tendency in text; in particular, it relates to a real-time intelligent identification method and system for game chat advertisements.
Background
Games often attract "players" who promote goods or other services, as well as advertisers who try to divert users to other games. Such players, who are often professionals, promote items/equipment to other players directly or covertly during play. If left unchecked, this promotion can cause losses to the game operator or to other players, and it also degrades the players' experience of the game.
At present, advertisement content in game chat is generally identified either manually or by matching chat text against a recognition model trained with a deep neural network algorithm. For example, a traditional data mining model inputs the real-time message text of each player into a random forest model to judge whether the player's message violates the rules, and all of the features used by the model are extracted manually. Existing game chat advertisement identification schemes are often slow, have poor real-time performance and are not stable enough. Because they rely on a single identification model, they cannot accurately identify advertisement content in chat, and they are particularly weak at recognizing text whose advertising nature only becomes apparent across multiple sentences.
Disclosure of Invention
To overcome the shortcomings of existing game chat advertisement identification schemes, the invention provides a real-time intelligent identification scheme for game chat advertisements. The scheme uses a distributed storage system and a real-time computing engine to obtain the chat data of each player in game chat in real time, and uses load balancing to refresh the extracted chat data in real time into distributed Redis on a group of chat risk-control servers, where multi-rule/model advertisement identification is performed. The scheme offers good stability, high processing speed, strong real-time performance, high identification accuracy and a high degree of automation.
The technical scheme provided by the invention is specifically realized as follows:
A real-time intelligent identification method for game chat advertisements comprises the following steps: extracting chat feature data of a player in real time from the game chat content of a game server, and distributing the chat feature data to a group of chat risk-control servers for processing, using the load-balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis; and asynchronously processing the chat feature data refreshed into Redis on each chat risk-control server with the multiple advertisement identification rules/models configured on that server, identifying in real time whether advertisement content is present. The chat feature data extracted in real time from the game chat content for each player comprises at least: role, IP, and the text of the player's chat message.
The chat feature data of a player travels from the game server to the chat risk-control servers along one of two paths:
1. Chat features that do not need to be computed from the player's historical information are sent directly to the Nginx server by the stream processing platform Kafka, and the load-balancing mechanism of the Nginx server sends the chat feature data to Redis in the group of chat risk-control servers.
2. For chat feature data that depends on player history and can only be obtained by computing over that history, the stream processing platform Kafka sends the relevant chat data to the distributed storage Cassandra or Kudu, which stores player history data. The real-time computing engine Presto computes the history-related chat feature data of a specific player from all of that player's related data in Cassandra or Kudu and writes it to the search and analytics engine ES (Elasticsearch). ES pushes the chat feature data to the Nginx server in real time, and the load-balancing mechanism of the Nginx server sends it to Redis in the group of chat risk-control servers.
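The two paths above can be sketched as a simple routing decision. This is only an illustrative sketch: the feature names in `HISTORY_FEATURES` and the function `route_feature` are hypothetical, because the text does not enumerate which features require player history, and each hop is represented by a plain string rather than a call to the real system.

```python
# Hypothetical feature names: the source does not list which features
# need player history, so these are assumptions for illustration only.
HISTORY_FEATURES = {"messages_last_hour", "repeated_text_ratio"}

# Path 1: features usable as-is (for example role, IP, message text).
DIRECT_PATH = ["kafka", "nginx", "redis"]

# Path 2: features that must be computed over stored player history.
HISTORY_PATH = [
    "kafka",
    "cassandra_or_kudu",
    "presto",
    "elasticsearch",
    "nginx",
    "redis",
]


def route_feature(feature_name):
    """Return the ordered hops a chat feature takes to reach Redis."""
    if feature_name in HISTORY_FEATURES:
        return HISTORY_PATH
    return DIRECT_PATH
```

Both paths end at the same two hops, Nginx and then Redis, which is why the risk-control servers can consume direct and history-derived features in the same way.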
Further, the plurality of advertisement identification rules/models includes: a whitelist matching rule based on player information; a blacklist matching rule based on player information, sensitive words and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on a Word2Vec + BiLSTM architecture; and a random forest advertisement recognition model based on real-time feature input.
The single-text semantic advertisement recognition model is obtained by training a multi-head self-attention architecture, based on the Transformer architecture, on a large number of manually labeled positive and negative samples. The model includes: a preprocessing layer, which cleans the text to be recognized and inputs it to the word embedding layer; a word embedding layer, which extracts a numeric feature matrix for the input text and records a position matrix giving the position of each word in the text, and whose word2vec model is learned with the CBOW or Skip-Gram method from a large amount of historical in-game chat collected by operation and maintenance personnel; a multilayer perceptron (MLP), which receives and processes the numeric feature matrix and the position matrix to obtain an attention-based text feature matrix and inputs it to the fully connected layer, and which is formed by cascading processing units each consisting of a multi-head self-attention module and an FFN layer; and a fully connected layer with a multilayer structure, in which the value output by the last layer is passed through a sigmoid function to produce the probability that the text is an advertisement.
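The text does not give equations for this model. As a reference point only, the standard Transformer formulation of multi-head self-attention and the final sigmoid can be written as follows, where X denotes the input built from the numeric feature matrix and the position matrix (how the two are combined is not specified in the text), h is the number of heads, d_k is the per-head key dimension, and z is the scalar output of the last fully connected layer. All symbols are notation introduced here, not taken from the source.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

\mathrm{head}_i = \mathrm{Attention}\!\left(X W_i^{Q},\; X W_i^{K},\; X W_i^{V}\right), \qquad i = 1, \dots, h

\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}

p(\text{advertisement} \mid \text{text}) = \sigma(z) = \frac{1}{1 + e^{-z}}
```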
The multi-text semantic advertisement recognition model is obtained by training a Word2Vec + BiLSTM deep learning architecture on a large number of manually labeled positive and negative samples. It comprises a preprocessing layer, a word embedding layer, a normalization layer (layer norm), a bidirectional long short-term memory (BiLSTM) layer and a fully connected layer. The preprocessing layer cleans the multi-sentence text formed by splicing together the latest preset number of messages of a specific player and inputs it to the word embedding layer. The word embedding layer extracts a numeric feature matrix for the input text and a position matrix recording the position of each word in the text, and inputs both to the normalization layer for normalization; the word2vec model used by the word embedding layer is learned with the CBOW or Skip-Gram method from a large amount of historical in-game chat collected by operation and maintenance personnel. The BiLSTM layer extracts a context information matrix capturing the contextual dependencies of the multi-sentence text and outputs it to the fully connected layer, which predicts whether the multi-sentence text is an advertisement.
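The splicing step that builds the multi-sentence input can be sketched as a per-player sliding window. This is a minimal illustration under assumptions: the class name `RecentMessages`, the default window size of 3, the whitespace-only cleaning and the space separator are all hypothetical, since the text only says that the "latest preset number" of messages is spliced together.

```python
from collections import deque


class RecentMessages:
    """Keeps the latest `limit` messages per player and splices them."""

    def __init__(self, limit=3):
        self.limit = limit
        self._by_player = {}

    def add(self, player_id, text):
        # deque(maxlen=...) drops the oldest message once the window is full.
        window = self._by_player.setdefault(player_id, deque(maxlen=self.limit))
        # Placeholder cleaning step: only trims surrounding whitespace.
        window.append(text.strip())

    def spliced_text(self, player_id, sep=" "):
        """Join the player's retained messages, oldest first."""
        return sep.join(self._by_player.get(player_id, ()))
```

For example, after adding four messages for one player with `limit=3`, `spliced_text` returns only the last three, joined in the order they were sent.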
Further, the chat risk-control server identifies chat advertisements in a layer-by-layer progressive manner based on the multiple advertisement identification rules/models. The specific process is as follows. First, the chat data of a player is matched against the whitelist matching rule based on player information; if the player information matches the whitelist, the chat data passes directly without further detection. Otherwise, the chat data is matched against the blacklist matching rule: if the player is on the blacklist, or the player's chat message contains a sensitive word on the blacklist, or the message matches a blacklist rule formulated from the long-term experience of game operators, an alarm is raised directly. Otherwise, the single-text semantic advertisement recognition model performs advertisement recognition on the message text in the chat data. If the single-text model, which is based on the multi-head attention mechanism, classifies the text as an advertisement, an alarm is raised directly; otherwise, the latest preset number of messages of the player are spliced together into a multi-sentence text, which is checked by the multi-text semantic advertisement recognition model. Preferably, for a multi-sentence text that the multi-text model predicts is not an advertisement, a traditional data mining model is additionally applied: the chat data of each player is input to a random forest model, which outputs whether the player has violated the rules, in order to improve the recall of the system. The features used by the random forest model are all manually extracted, and its decision process is described as scientific and reliable.
Corresponding to the method, the invention also provides a real-time intelligent identification system for game chat advertisements. The system comprises a stream processing platform Kafka, a distributed storage system Cassandra or Kudu, a real-time computing engine Presto, a group of chat risk-control servers, an Nginx server and a search and analytics engine ES.
For chat feature data that can be obtained without computing over the player's historical information, the stream processing platform Kafka sends the chat feature data directly to the Nginx server, and the load-balancing mechanism of the Nginx server sends it to Redis in the group of chat risk-control servers. For chat feature data that depends on player history and can only be obtained by computing over that history, the stream processing platform Kafka sends the relevant chat data to the distributed storage Cassandra or Kudu, which stores player history data; the real-time computing engine Presto computes the history-related chat feature data of a specific player from all of that player's related data in Cassandra or Kudu and writes it to the search and analytics engine ES (Elasticsearch); ES pushes the chat feature data to the Nginx server in real time, and the load-balancing mechanism of the Nginx server sends it to Redis in the group of chat risk-control servers.
The real-time computing engine Presto extracts the chat data of different players from the game chat content in real time and stores it in the distributed storage system Cassandra.
The Nginx server communicates with the distributed storage system Cassandra and, using its own load-balancing framework, distributes the chat data to the group of chat risk-control servers for processing through the asynchronous mechanism of distributed Redis.
Each server in the group of chat risk-control servers is configured with multiple advertisement identification rules/models, with which it asynchronously processes the chat data refreshed into its Redis and identifies in real time whether advertisement content is present.
The specific implementation of the real-time intelligent identification system for game chat advertisements corresponds to the details of the real-time intelligent identification method for game chat advertisements.
Drawings
Fig. 1 is a schematic diagram of the multiple advertisement recognition rules/models provided in a chat risk-control server according to an embodiment of the present invention.
Detailed Description
To make the technical problems solved by the present invention, its technical solutions and its advantages clearer, the invention is described in further detail below with reference to the accompanying drawing. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.
A real-time intelligent identification method for game chat advertisements comprises the following steps: extracting chat feature data of each player in real time from the game chat content of a game server, and distributing the chat feature data to a group of chat risk-control servers for processing, using the load-balancing framework of an Nginx server and the asynchronous mechanism of the distributed database Redis; and asynchronously processing the chat feature data refreshed into Redis on each chat risk-control server with the multiple advertisement identification rules/models configured on that server, identifying in real time whether advertisement content is present. The chat feature data extracted in real time from the game chat content for each player comprises at least: role, IP, and the text of the player's chat message.
The chat feature data of a player travels from the game server to the chat risk-control servers along one of two paths:
1. Chat features that do not need to be computed from the player's historical information are sent directly to the Nginx server by the stream processing platform Kafka, and the load-balancing mechanism of the Nginx server sends the chat feature data to Redis in the group of chat risk-control servers.
2. For chat feature data that depends on player history and can only be obtained by computing over that history, the stream processing platform Kafka sends the relevant chat data to the distributed storage Cassandra or Kudu, which stores player history data. The real-time computing engine Presto computes the history-related chat feature data of a specific player from all of that player's related data in Cassandra or Kudu and writes it to the search and analytics engine ES (Elasticsearch). ES pushes the chat feature data to the Nginx server in real time, and the load-balancing mechanism of the Nginx server sends it to Redis in the group of chat risk-control servers.
Cassandra is an open-source distributed NoSQL database system. It was originally developed by Facebook to store simply formatted data such as inbox data, combining the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008, and thanks to its good scalability it was adopted by well-known Web 2.0 sites such as Digg and Twitter, becoming a popular distributed structured data storage solution. Cassandra is a hybrid non-relational database similar to Google's BigTable, and its functionality is richer than that of Dynamo, a distributed key-value storage system. Its data model is loosely structured, so relatively complex data types can be stored. The Presto query engine uses a master-slave architecture consisting of one Coordinator node, one Discovery Server node and multiple Worker nodes, with the Discovery Server usually embedded in the Coordinator node. The Coordinator parses SQL statements, generates execution plans and distributes execution tasks to the Worker nodes. These characteristics make Presto well suited to working with the distributed storage system Cassandra to capture, store and query the chat data of each player.
Redis (Remote Dictionary Server) is a key-value storage system that keeps data in memory. It supports incremental operations on data, which avoids full recomputation at query time, and it offers a rich set of data operations such as push/pop and add/remove. These operations are atomic, which avoids data inconsistency. The Nginx server has a built-in load-balancing module, so load-balanced data distribution can be conveniently configured for different application scenarios.
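The incremental style of update described above can be illustrated with a small in-memory stand-in. This sketch does not talk to a real Redis server: `InMemoryStore` is a hypothetical class whose methods merely mimic the behavior of the Redis commands INCR, LPUSH, LTRIM and LRANGE for non-negative indices and a stop index of -1, and the key names are assumptions.

```python
class InMemoryStore:
    """Toy stand-in that mimics a few Redis commands for illustration."""

    def __init__(self):
        self.counters = {}
        self.lists = {}

    def incr(self, key, amount=1):
        # Like INCR: add to the counter and return the new value.
        self.counters[key] = self.counters.get(key, 0) + amount
        return self.counters[key]

    def lpush(self, key, *values):
        # Like LPUSH: insert each value at the head of the list.
        lst = self.lists.setdefault(key, [])
        for value in values:
            lst.insert(0, value)
        return len(lst)

    def ltrim(self, key, start, stop):
        # Like LTRIM: keep only the inclusive range [start, stop].
        lst = self.lists.get(key, [])
        end = len(lst) if stop == -1 else stop + 1
        self.lists[key] = lst[start:end]

    def lrange(self, key, start, stop):
        # Like LRANGE: return the inclusive range [start, stop].
        lst = self.lists.get(key, [])
        end = len(lst) if stop == -1 else stop + 1
        return lst[start:end]


def record_message(store, player_id, text, keep=5):
    """Update per-player state incrementally instead of rescanning history."""
    count = store.incr(f"chat:count:{player_id}")
    store.lpush(f"chat:recent:{player_id}", text)
    store.ltrim(f"chat:recent:{player_id}", 0, keep - 1)
    return count
```

Each arriving message touches only a counter and the head of a short list, so no pass over the player's full chat history is needed. In a real Redis deployment each single command is atomic; making the three commands atomic as a group would additionally require a transaction or script.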
As shown in Fig. 1, in the technical solution provided by the present invention, the multiple advertisement recognition rules/models in the chat risk-control server include:
Pass-through rules. These mainly comprise the whitelist matching rule based on player information. Players who satisfy the rule, that is, players on the IP and account whitelists, are directly returned as normal.
Block rules. These comprise the blacklist matching rule based on player information, sensitive words and rules formulated from the long-term experience of game operators. When a player is on the IP or account blacklist, or the player's message contains a sensitive word or matches one of the operator-formulated rules, the system directly returns alarm information.
The single-text semantic advertisement recognition model, obtained by training on a large number of manually labeled positive and negative samples based on the Transformer architecture. The model includes: a preprocessing layer, which cleans the text to be recognized and inputs it to the word embedding layer; a word embedding layer, which extracts a numeric feature matrix for the input text and records a position matrix giving the position of each word in the text, and whose word2vec model is learned with the CBOW or Skip-Gram method from a large amount of historical in-game chat collected by operation and maintenance personnel; a multilayer perceptron (MLP), which receives and processes the numeric feature matrix and the position matrix to obtain an attention-based text feature matrix and inputs it to the fully connected layer, and which is formed by cascading processing units each consisting of a multi-head self-attention module and an FFN layer; and a fully connected layer with a multilayer structure, in which the value output by the last layer is passed through a sigmoid function to produce the probability that the text is an advertisement. By analyzing the semantics of a player's message, this Transformer-based single-text deep learning model can identify players who are not caught by the rules and the blacklist but whose messages do violate the rules.
The multi-text semantic advertisement recognition model, obtained by training a Word2Vec + BiLSTM deep learning architecture on a large number of manually labeled positive and negative samples. It comprises a preprocessing layer, a word embedding layer, a normalization layer (layer norm), a bidirectional long short-term memory (BiLSTM) layer and a fully connected layer. The preprocessing layer cleans the multi-sentence text formed by splicing together the latest preset number of messages of a specific player and inputs it to the word embedding layer. The word embedding layer extracts a numeric feature matrix for the input text and a position matrix recording the position of each word in the text, and inputs both to the normalization layer for normalization; the word2vec model used by the word embedding layer is learned with the CBOW or Skip-Gram method from a large amount of historical in-game chat collected by operation and maintenance personnel. The BiLSTM layer extracts a context information matrix capturing the contextual dependencies of the multi-sentence text and outputs it to the fully connected layer, which predicts whether the multi-sentence text is an advertisement. This model examines a player's messages over a period of time to judge whether the player has posted violating messages, which greatly improves the recall of the system.
The traditional data mining model, which is a random forest model whose features are all manually extracted. The real-time chat data of each player is input to the random forest model to judge whether the player's messages violate the rules.
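The pass-through and block rules at the top of this list can be sketched as plain set lookups and substring checks. This is a minimal illustration, not the patented rule engine: the `RuleSet` fields, the example entries and the choice to treat a match on either account or IP as sufficient are assumptions, and the operator-formulated rules are omitted because the text does not describe them.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    # All entries are placeholders; real lists would come from game operators.
    whitelist_accounts: set = field(default_factory=set)
    whitelist_ips: set = field(default_factory=set)
    blacklist_accounts: set = field(default_factory=set)
    blacklist_ips: set = field(default_factory=set)
    # Assumed to be stored in lowercase for case-insensitive matching.
    sensitive_words: set = field(default_factory=set)


def apply_rules(rules, account, ip, text):
    """Return 'pass', 'alarm', or 'undecided' (hand over to the models)."""
    # Whitelist is checked first, so it wins over any blacklist entry.
    if account in rules.whitelist_accounts or ip in rules.whitelist_ips:
        return "pass"
    if account in rules.blacklist_accounts or ip in rules.blacklist_ips:
        return "alarm"
    lowered = text.lower()
    if any(word in lowered for word in rules.sensitive_words):
        return "alarm"
    return "undecided"
```

Messages that come back as `"undecided"` are the ones that would go on to the semantic models.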
The Word2Vec + BiLSTM deep learning architecture and the Transformer-based multi-head self-attention architecture are both relatively mature, and the models can be trained by a person skilled in the art using conventional training procedures, so training is not described further here.
In summary, the rules/models comprise: a whitelist matching rule based on player information; a blacklist matching rule based on player information, sensitive words and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on a Word2Vec + BiLSTM architecture; and a random forest advertisement recognition model based on real-time feature input.
Further, to improve the accuracy of advertisement identification, the chat risk-control server identifies chat advertisements in a layer-by-layer progressive manner based on the multiple advertisement identification rules/models. The specific process is as follows. First, the chat data of a player is matched against the whitelist matching rule based on player information; if the player information matches the whitelist, the chat data passes directly without further detection. Otherwise, the chat data is matched against the blacklist matching rule: if the player is on the blacklist, or the player's chat message contains a sensitive word on the blacklist, or the message matches a blacklist rule formulated from the long-term experience of game operators, an alarm is raised directly. Otherwise, the single-text semantic advertisement recognition model performs advertisement recognition on the message text in the chat data. If the single-text model, which is based on the multi-head attention mechanism, classifies the text as an advertisement, an alarm is raised directly; otherwise, the latest preset number of messages of the player are spliced together into a multi-sentence text, which is checked by the multi-text semantic advertisement recognition model. For a multi-sentence text that the multi-text model predicts is not an advertisement, a traditional data mining model is additionally applied: the chat data of each player is input to a random forest model, which outputs whether the player has violated the rules, in order to improve the recall of the system.
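The layer-by-layer process can be sketched as an ordered list of stages in which the first stage to reach a verdict ends the check. This is an illustrative sketch only: the stage functions below are stubs with hypothetical names, the score fields in the message dictionary stand in for real model inference, and the 0.5 threshold is an assumption that does not appear in the text.

```python
def identify(message, stages):
    """Run stages in order and stop at the first one that reaches a verdict.

    Each stage is a (name, function) pair. The function returns "pass",
    "alarm", or None when it cannot decide and the next stage should run.
    """
    for name, stage in stages:
        verdict = stage(message)
        if verdict is not None:
            return verdict, name
    # No stage objected, so the message is treated as normal.
    return "pass", "default"


def whitelist(msg):
    return "pass" if msg["account"] in {"gm01"} else None


def blacklist(msg):
    return "alarm" if "cheap gold" in msg["text"].lower() else None


def score_stage(score_key, threshold=0.5):
    # Stub for a trained model: reads a precomputed score from the message.
    def stage(msg):
        return "alarm" if msg.get(score_key, 0.0) >= threshold else None
    return stage


STAGES = [
    ("whitelist", whitelist),
    ("blacklist", blacklist),
    ("single_text_model", score_stage("single_score")),
    ("multi_text_model", score_stage("multi_score")),
    ("random_forest", score_stage("forest_score")),
]
```

Ordering the cheap rule checks before the models means most whitelisted or blacklisted traffic never reaches the more expensive stages, and each later stage sees only the messages that every earlier stage left undecided.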
Corresponding to the method, the invention also provides a real-time intelligent identification system of the game chat advertisement, which is characterized by comprising a stream processing platform kafka, a distributed storage system cassandra or kudu, a real-time computing engine presto, a group of chat wind control servers, an ngnix server and a search analysis engine es.
For chat characteristic data that can be obtained without calculation over the player's historical information, the stream processing platform Kafka sends the data directly to the nginx server, and the load balancing mechanism of the nginx server sends it to Redis in the group of chat wind control servers. For chat characteristic data that can only be obtained by calculation over the player's historical data, Kafka sends the related chat data to the distributed storage Cassandra or Kudu, which stores the player historical data; the real-time computing engine Presto calculates the history-related chat characteristic data for a specific player from all of that player's related data in Cassandra or Kudu and writes it into the search and analysis engine ES (Elasticsearch); ES flushes the data to the nginx server in real time, and the load balancing mechanism of the nginx server sends it to Redis in the group of chat wind control servers.
The real-time computing engine Presto extracts the chat data of different players from the game chat content in real time and stores it in the distributed storage system Cassandra.
The nginx server communicates with the distributed storage system Cassandra and, based on its own load balancing framework, distributes the chat data to the group of chat wind control servers for processing through a distributed Redis asynchronous mechanism.
Each server in the group of chat wind control servers is provided with a plurality of advertisement identification rules/models, based on which it asynchronously processes the chat data flushed to its Redis and identifies in real time whether advertisement content is present.
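The two data paths can be illustrated with an in-memory simulation. This is only a sketch under stated assumptions: plain Python lists and dicts stand in for Kafka, Cassandra/Kudu, Presto, ES, and Redis; round-robin stands in for the nginx load balancing policy, which the text does not specify; and the history-derived features `message_count` and `distinct_ips` are hypothetical examples, not features named in the source.

```python
from itertools import cycle
from typing import Dict, List, Tuple


class Router:
    """Toy model of the direct path and the history-dependent path."""

    def __init__(self, server_names: List[str]) -> None:
        # One list per server stands in for that server's Redis instance.
        self.queues: Dict[str, List[dict]] = {name: [] for name in server_names}
        # Round-robin is an assumed load balancing policy.
        self._next_server = cycle(server_names)
        # Per-player record lists stand in for Cassandra/Kudu.
        self.history_store: Dict[str, List[dict]] = {}

    def dispatch(self, features: dict) -> str:
        target = next(self._next_server)
        self.queues[target].append(features)
        return target

    def ingest(self, record: dict) -> Tuple[str, str]:
        # Path a: features available without history go straight out.
        direct = {"player": record["player"], "ip": record["ip"], "text": record["text"]}
        # Path b: store the record, then derive features from the full history
        # (the role Presto and ES play in the described system).
        past = self.history_store.setdefault(record["player"], [])
        past.append(record)
        derived = {
            "player": record["player"],
            "message_count": len(past),                  # hypothetical feature
            "distinct_ips": len({r["ip"] for r in past}),  # hypothetical feature
        }
        return self.dispatch(direct), self.dispatch(derived)
```

Calling `Router(["s1", "s2"]).ingest(...)` returns the pair of servers that received the direct and the derived feature records, which makes the split between the two paths easy to inspect.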
The specific details of how the system identifies advertisements in player chat data correspond to those of the real-time intelligent identification method described above.
The real-time intelligent identification scheme for game chat advertisements provided by the invention combines the distributed storage system with the real-time computing engine so that the chat data of each player is obtained in real time, which increases the data extraction speed and meets the requirement of extracting in-game chat data in real time. The extracted chat data is flushed in real time, through load balancing, to the distributed Redis on a group of chat wind control servers for processing, which improves system stability and spares each chat wind control server from processing the full volume of player chat data. Each chat wind control server has multiple built-in advertisement identification rules/models, including deep learning models, and detects advertisements in player chat data layer by layer, which improves identification accuracy and provides a high degree of automation.
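The asynchronous processing of queued chat data might look like the following sketch. It assumes an `asyncio.Queue` as a stand-in for a server's Redis queue and a plain callable `detect` as a stand-in for the layered rules/models; the function names `worker` and `process_all` and the use of `None` as a shutdown sentinel are illustrative choices, not details from the source.

```python
import asyncio
from typing import Callable, List, Optional, Tuple


async def worker(name: str, queue: "asyncio.Queue[Optional[dict]]",
                 detect: Callable[[str], bool],
                 alerts: List[Tuple[str, str]]) -> None:
    # Each worker pulls items as they arrive, so producers never wait on detection.
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more work for this worker
            break
        if detect(item["text"]):
            alerts.append((name, item["player"]))


async def process_all(messages: List[dict], detect: Callable[[str], bool],
                      n_workers: int = 2) -> List[Tuple[str, str]]:
    queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
    alerts: List[Tuple[str, str]] = []
    tasks = [asyncio.create_task(worker(f"w{i}", queue, detect, alerts))
             for i in range(n_workers)]
    for message in messages:
        await queue.put(message)
    for _ in tasks:
        await queue.put(None)  # one sentinel per worker
    await asyncio.gather(*tasks)
    return alerts
```

A caller would run it with `asyncio.run(process_all(messages, detect))`; which worker handles a given message is not fixed, so only the set of flagged players should be relied upon.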

Claims (12)

1. A real-time intelligent identification method for game chat advertisements, characterized by comprising the following steps: extracting chat characteristic data of each player in real time from the game chat content of a game server, and distributing the chat characteristic data to a group of chat wind control servers for processing through the load balancing framework of an nginx server and the asynchronous mechanism of a distributed Redis database; and, through the multiple advertisement identification rules/models set in each chat wind control server, asynchronously processing the chat characteristic data flushed to that server's Redis and identifying in real time whether advertisement content is present.
2. The method of claim 1, wherein the chat characteristic data of a player travels from the game server to the chat client along two paths:
a. for chat characteristic data that can be obtained without calculation over the player's historical information, a stream processing platform Kafka sends the data directly to an nginx server, and the load balancing mechanism of the nginx server sends it to Redis in the group of chat wind control servers;
b. for chat characteristic data that must be obtained by calculation over the player's historical data, the stream processing platform Kafka sends the related chat data to a distributed storage Cassandra or Kudu that stores player historical data; a real-time computing engine Presto calculates the chat characteristic data for a specific player from all of that player's related data in Cassandra or Kudu and writes it into a search and analysis engine ES (Elasticsearch), and ES then flushes the chat characteristic data in real time into the Redis of the group of chat wind control servers.
3. A method as recited in claim 1 or 2, wherein the chat characteristic data of each player extracted in real time comprises at least: role, IP, and the speech text at chat time.
4. The method of claim 3, wherein the plurality of advertisement identification rules/models comprises: a white list matching rule based on player information; a black list matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on a Word2Vec + BiLSTM architecture; and a random forest advertisement recognition model based on real-time feature input.
5. The method of claim 4, wherein the single-text semantic advertisement recognition model is based on the multi-head self-attention framework of the Transformer architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a multi-layer perceptron MPL, and a full connection layer;
the preprocessing layer is used for cleaning the speech text to be recognized and inputting the cleaned text to the word embedding layer; the word embedding layer is used for extracting a numeric feature matrix of the input text and a position matrix recording the position of each word in the text, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of historical in-game chat conversations collected by operation and maintenance personnel;
the multi-layer perceptron MPL is formed by cascading processing units, each consisting of a multi-head self-attention module and an FFN layer, and is used for receiving and processing the numeric feature matrix and the position matrix to obtain a text feature matrix reflecting attention over the text, which is input into the full connection layer to predict whether the text is an advertisement;
the full connection layer is a multi-layer full connection layer used for mapping the text feature matrix to the sample label space; the value output by the last layer is passed through a sigmoid function to produce the probability that the text is an advertisement.
6. The method of claim 4 or 5, wherein the multi-text semantic advertisement recognition model is based on a Word2Vec + BiLSTM deep learning architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a normalization layer norm, a bidirectional long short-term memory layer BiLSTM, and a full connection layer; the preprocessing layer is used for cleaning the multi-sentence text formed by splicing the most recent preset number of speech texts of a specific player and inputting it into the word embedding layer; the word embedding layer is used for extracting a numeric feature matrix of the input text and a position matrix recording the position of each word in the text, both of which are input into the normalization layer norm for normalization, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of historical in-game chat conversations collected by operation and maintenance personnel; the bidirectional long short-term memory layer BiLSTM is used for extracting a context information matrix capturing the context dependencies of the multi-sentence text and outputting it to the full connection layer to generate the prediction of whether the multi-sentence text is an advertisement.
7. The method of claim 6, wherein identifying chat advertisements in a layer-by-layer progressive manner based on the plurality of advertisement identification rules/models comprises: first, matching the chat data of a player against the white list matching rule based on player information, and passing the chat data directly without further detection if the player information matches an entry in the white list; otherwise, matching the chat data a second time using the black list matching rule, and raising an alarm directly when the player is in the black list, when the player's chat speech text contains a sensitive word in the black list, or when the chat data matches a black list rule formulated from the long-term experience of game operators; otherwise, performing advertisement recognition on the speech text in the chat data using the single-text semantic advertisement recognition model based on the multi-head attention mechanism, and raising an alarm directly if the model classifies the text as an advertisement; otherwise, splicing the player's most recent preset number of speech texts together into a multi-sentence text and performing advertisement recognition using the multi-text semantic advertisement recognition model.
8. A real-time intelligent identification system for game chat advertisements, characterized by comprising a stream processing platform Kafka, a distributed storage system Cassandra or Kudu, a real-time computing engine Presto, a group of chat wind control servers, an nginx server, and a search and analysis engine ES;
wherein, for chat characteristic data that can be obtained without calculation over the player's historical information, the stream processing platform Kafka sends the data directly to the nginx server, and the load balancing mechanism of the nginx server sends it to Redis in the group of chat wind control servers; for chat characteristic data that can only be obtained by calculation over the player's historical data, Kafka sends the related chat data to the distributed storage Cassandra or Kudu, which stores the player historical data, the real-time computing engine Presto calculates the history-related chat characteristic data for a specific player from all of that player's related data in Cassandra or Kudu and writes it into the search and analysis engine ES (Elasticsearch), ES flushes the history-related chat characteristic data to the nginx server in real time, and the load balancing mechanism of the nginx server sends it to Redis in the group of chat wind control servers;
the real-time computing engine Presto extracts the chat data of different players from the game chat content in real time and stores it in the distributed storage system Cassandra;
the nginx server communicates with the distributed storage system Cassandra and, based on its own load balancing framework, distributes the chat data to the group of chat wind control servers for processing through a distributed Redis asynchronous mechanism;
each server in the group of chat wind control servers is provided with a plurality of advertisement identification rules/models, based on which it asynchronously processes the chat data flushed to its Redis and identifies in real time whether advertisement content is present.
9. The system of claim 8, wherein the plurality of advertisement identification rules/models comprises: a white list matching rule based on player information; a black list matching rule based on player information, sensitive words, and rules formulated from the long-term experience of game operators; a single-text semantic advertisement recognition model based on a multi-head self-attention mechanism; a multi-text semantic advertisement recognition model based on the multi-head self-attention mechanism; and a random forest advertisement recognition model based on real-time feature input.
10. The system of claim 9, wherein the single-text semantic advertisement recognition model is based on the multi-head self-attention framework of the Transformer architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a multi-layer perceptron MPL, and a full connection layer; the preprocessing layer is used for cleaning the speech text to be recognized and inputting the cleaned text to the word embedding layer; the word embedding layer is used for extracting a numeric feature matrix of the input text and a position matrix recording the position of each word in the text, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of historical in-game chat conversations collected by operation and maintenance personnel; the multi-layer perceptron MPL is formed by cascading processing units, each consisting of a multi-head self-attention module and an FFN layer, and is used for receiving and processing the numeric feature matrix and the position matrix to obtain a text feature matrix reflecting attention over the text, which is input into the full connection layer to generate the prediction of whether the text is an advertisement; the full connection layer is a multi-layer full connection layer used for mapping the text feature matrix to the sample label space, and the value output by the last layer is passed through a sigmoid function to produce the probability that the text is an advertisement.
11. The system of claim 9 or 10, wherein the multi-text semantic advertisement recognition model is based on a Word2Vec + BiLSTM deep learning architecture and is trained on a large number of manually labeled positive and negative samples; the model comprises: a preprocessing layer, a word embedding layer, a normalization layer norm, a bidirectional long short-term memory layer BiLSTM, and a full connection layer;
the preprocessing layer is used for cleaning the multi-sentence text formed by splicing the most recent preset number of speech texts of a specific player and inputting it into the word embedding layer;
the word embedding layer is used for extracting a numeric feature matrix of the input text and a position matrix recording the position of each word in the text, both of which are input into the normalization layer norm for normalization, and the word2vec model used by the word embedding layer is obtained by learning, with the CBOW or Skip-Gram method, from a large number of historical in-game chat conversations collected by operation and maintenance personnel;
the bidirectional long short-term memory layer BiLSTM is used for extracting a context information matrix capturing the context dependencies of the multi-sentence text and outputting it to the full connection layer to generate the prediction of whether the multi-sentence text is an advertisement.
12. The system of claim 11, wherein the chat client identifies advertisements in a layer-by-layer progressive manner based on the plurality of advertisement identification rules/models, the identification comprising: first, matching the chat data of a player against the white list matching rule based on player information, and passing the chat data directly without further detection if the player information matches an entry in the white list; otherwise, matching the chat data a second time using the black list matching rule, and raising an alarm directly when the player is in the black list, when the player's chat speech text contains a sensitive word in the black list, or when the chat data matches a black list rule formulated from the long-term experience of game operators; otherwise, performing advertisement recognition on the speech text in the chat data using the single-text semantic advertisement recognition model based on the multi-head attention mechanism, and raising an alarm directly if the model classifies the text as an advertisement; otherwise, splicing the player's most recent preset number of speech texts together into a multi-sentence text and performing advertisement recognition using the multi-text semantic advertisement recognition model.
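The core computation of the single-text model described in the claims (embeddings plus position information, multi-head self-attention, and a sigmoid output) can be sketched in pure Python. This is a heavily simplified illustration under several assumptions: the weights are random and untrained, so the output is meaningful only as a value in (0, 1); the sinusoidal position encoding is an assumed choice, since the text only says a position matrix is used; the FFN layer, the cascading of multiple units, and the multi-layer full connection stage are reduced to one attention pass, mean pooling, and a single linear output; and the function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(n: int, d: int) -> Matrix:
    # Sinusoidal encoding (an assumption; the source does not name a scheme).
    return [[math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
             else math.cos(pos / 10000 ** ((i - 1) / d))
             for i in range(d)] for pos in range(n)]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def self_attention_head(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # one row of scores per query token
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def ad_probability(embeddings: Matrix, n_heads: int = 2, seed: int = 0) -> float:
    rng = random.Random(seed)  # untrained placeholder weights
    n, d = len(embeddings), len(embeddings[0])
    d_head = d // n_heads
    pos = positional_encoding(n, d)
    x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pos)]
    heads = [self_attention_head(x, rand_matrix(rng, d, d_head),
                                 rand_matrix(rng, d, d_head),
                                 rand_matrix(rng, d, d_head))
             for _ in range(n_heads)]
    concat = [sum((head[i] for head in heads), []) for i in range(n)]
    pooled = [sum(col) / n for col in zip(*concat)]  # mean over tokens
    w_out = [rng.uniform(-0.1, 0.1) for _ in pooled]
    logit = sum(w * p for w, p in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid
```

In a real system the `embeddings` argument would come from the trained word2vec model and the weights from supervised training on labeled chat samples; here a small list of toy vectors is enough to exercise the data flow.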

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110133090.1A CN114797114A (en) 2021-01-29 2021-01-29 Real-time intelligent identification method and system for game chat advertisement

Publications (1)

Publication Number Publication Date
CN114797114A true CN114797114A (en) 2022-07-29

Family

ID=82527002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110133090.1A Pending CN114797114A (en) 2021-01-29 2021-01-29 Real-time intelligent identification method and system for game chat advertisement

Country Status (1)

Country Link
CN (1) CN114797114A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657286A (en) * 2017-10-19 2018-02-02 北京深极智能科技有限公司 A kind of advertisement recognition method and computer-readable recording medium
CN108197966A (en) * 2017-11-01 2018-06-22 上海新数网络科技股份有限公司 A kind of accurate advertisement analysis method based on crowd's label data
US10417667B1 (en) * 2012-06-27 2019-09-17 Groupon, Inc. Method and apparatus for creating web content and identifying advertisements for users creating and viewing content
CN110719221A (en) * 2019-10-16 2020-01-21 北京蚂蜂窝网络科技有限公司 Instant messaging method, device, equipment and storage medium
CN111538836A (en) * 2020-04-22 2020-08-14 哈尔滨工业大学(威海) Method for identifying financial advertisements in text advertisements

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gao Yang (高扬): "Intelligent Summarization and Deep Learning" (《智能摘要与深度学习》), 30 April 2019, Beijing Institute of Technology Press (北京理工大学出版社), pages: 23 - 38 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116506244A (en) * 2023-05-24 2023-07-28 北京比邻星空科技有限公司 Chat room configuration method capable of self-adapting to number of people in room
CN116506244B (en) * 2023-05-24 2023-11-17 北京比邻星空科技有限公司 Chat room configuration method capable of self-adapting to number of people in room

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination