CN113011167B - Cheating identification method, device, equipment and storage medium based on artificial intelligence - Google Patents

Cheating identification method, device, equipment and storage medium based on artificial intelligence

Info

Publication number
CN113011167B
Authority
CN
China
Prior art keywords
node
drainage
identified
article
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110176521.2A
Other languages
Chinese (zh)
Other versions
CN113011167A (en
Inventor
邓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110176521.2A
Publication of CN113011167A
Application granted
Publication of CN113011167B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Abstract

The application provides an artificial intelligence-based cheating identification method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: performing feature extraction on an article to be identified to obtain text features of the article; determining node features of drainage nodes of the article based on the drainage relationship of the article; constructing a drainage relationship graph of the article based on the article and the drainage nodes; updating the text features of the article and the node features of the drainage nodes based on the drainage relationship graph; fusing the updated text features of the article with the updated node features of the drainage nodes to obtain fusion features; and performing cheating prediction based on the fusion features to obtain the probability that the article to be identified is a cheating article. The method and apparatus enable efficient, automated cheating identification.

Description

Cheating identification method, device, equipment and storage medium based on artificial intelligence
Technical Field
The present application relates to artificial intelligence technology, and in particular to an artificial intelligence-based cheating identification method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Artificial intelligence (AI) is a comprehensive technology of computer science. By studying the design principles and implementation methods of various intelligent machines, it gives machines the ability to perceive, reason and make decisions. Artificial intelligence is a broad discipline that covers a wide range of fields, such as natural language processing and machine learning/deep learning. As the technology develops, artificial intelligence will be applied in more fields and become increasingly valuable.
Cheating identification is an important research direction in the field of artificial intelligence. It refers to the process of identifying cheating articles among a large number of articles.
However, the related art lacks a scheme for identifying cheating articles based on artificial intelligence and relies mainly on manually set rules.
Disclosure of Invention
The embodiment of the application provides an artificial intelligence-based cheating identification method and apparatus, an electronic device, and a computer-readable storage medium, which enable efficient and automated cheating identification.
The technical scheme of the embodiment of the application is realized as follows:
The embodiment of the application provides an artificial intelligence-based cheating identification method, which comprises the following steps:
performing feature extraction on an article to be identified to obtain text features of the article to be identified;
determining node features of drainage nodes of the article to be identified based on the drainage relationship of the article to be identified;
constructing a drainage relationship graph of the article to be identified based on the article to be identified and the drainage nodes;
updating the text features of the article to be identified and the node features of the drainage nodes based on the drainage relationship graph of the article to be identified;
fusing the updated text features of the article to be identified with the updated node features of the drainage nodes to obtain fusion features; and
performing cheating prediction based on the fusion features to obtain the probability that the article to be identified is a cheating article.
In the above technical solution, fusing the updated text features of the article to be identified with the updated node features of the drainage nodes to obtain fusion features comprises:
splicing (concatenating) the updated text features of the article to be identified with the updated node features of the drainage nodes to obtain the fusion features; or
adding the updated text features of the article to be identified and the updated node features of the drainage nodes to obtain the fusion features.
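The two fusion options and the final prediction step can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function names, the toy vectors, and the logistic-regression prediction head are all assumptions, since the text above only states that fusion is done by splicing or by addition and that a probability is produced.

```python
import math

def fuse_concat(text_feat, node_feats):
    """Splice (concatenate) the article feature with every node feature."""
    fused = list(text_feat)
    for feat in node_feats:
        fused.extend(feat)
    return fused

def fuse_add(text_feat, node_feats):
    """Element-wise addition; every vector must have the same dimension."""
    fused = list(text_feat)
    for feat in node_feats:
        if len(feat) != len(fused):
            raise ValueError("addition requires equal dimensions")
        fused = [a + b for a, b in zip(fused, feat)]
    return fused

def cheat_probability(fused, weights, bias):
    """Hypothetical prediction head: a logistic regression over the fused vector."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Toy values (assumed): updated article feature and updated node features.
text = [0.2, -0.1]
initial_node, termination_node = [0.5, 0.3], [-0.4, 0.9]

concat = fuse_concat(text, [initial_node, termination_node])  # 6 values
added = fuse_add(text, [initial_node, termination_node])      # 2 values
p = cheat_probability(added, weights=[1.0, 1.0], bias=0.0)
```

Splicing keeps the article and node information in separate coordinates at the cost of a larger vector, while addition keeps the dimension fixed but requires all features to share one dimension.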
The embodiment of the application provides an artificial intelligence-based cheating identification apparatus, which comprises:
a feature extraction module, configured to perform feature extraction on an article to be identified to obtain text features of the article to be identified;
a determining module, configured to determine node features of drainage nodes of the article to be identified based on the drainage relationship of the article to be identified;
a construction module, configured to construct a drainage relationship graph of the article to be identified based on the article to be identified and the drainage nodes;
an updating module, configured to update the text features of the article to be identified and the node features of the drainage nodes based on the drainage relationship graph of the article to be identified;
a fusion module, configured to fuse the updated text features of the article to be identified with the updated node features of the drainage nodes to obtain fusion features; and
a classification module, configured to perform cheating prediction based on the fusion features to obtain the probability that the article to be identified is a cheating article.
In the above technical solution, the feature extraction module is further configured to perform feature extraction on the title of the article to be identified to obtain title features of the article to be identified;
perform feature extraction on the body text of the article to be identified to obtain body features of the article to be identified; and
fuse the title features and the body features of the article to be identified to obtain the text features of the article to be identified.
In the above technical solution, the feature extraction module is further configured to perform word segmentation on the title of the article to be identified to obtain a plurality of words of the title;
map the words of the title to obtain a word vector for each word;
splice the word vectors of the words to obtain a vector matrix of the title; and
perform keyword extraction based on the vector matrix of the title to obtain the title features of the article to be identified.
In the above technical solution, the feature extraction module is further configured to perform convolution on the vector matrix of the title to obtain a plurality of feature maps of the title;
perform keyword extraction on the feature maps of the title to obtain a plurality of keyword features of the title; and
splice the plurality of keyword features to obtain the title features of the article to be identified.
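The title pipeline above (word vectors, vector matrix, convolution, keyword extraction, splicing) resembles a text CNN, and a minimal pure-Python sketch is given below under several assumptions: whitespace splitting stands in for a real word segmenter, the embedding table and kernels are toy values, and "keyword extraction" is interpreted as max-over-time pooling of each feature map, which the text does not state explicitly.

```python
def embed(words, table, dim):
    """Map each word to its vector; unknown words map to zeros (an assumption)."""
    return [table.get(w, [0.0] * dim) for w in words]

def conv1d(matrix, kernel, bias=0.0):
    """Slide a (k x dim) kernel over the rows of the vector matrix.

    Returns one feature map, with ReLU applied to each position.
    """
    k = len(kernel)
    feature_map = []
    for i in range(len(matrix) - k + 1):
        s = bias
        for r in range(k):
            s += sum(a * b for a, b in zip(matrix[i + r], kernel[r]))
        feature_map.append(max(0.0, s))
    return feature_map

def title_features(words, table, kernels, dim):
    """Convolve with every kernel, pool each feature map, splice the results."""
    matrix = embed(words, table, dim)              # vector matrix of the title
    maps = [conv1d(matrix, k) for k in kernels]    # one feature map per kernel
    # Max-over-time pooling keeps the strongest response of each kernel;
    # an empty map (title shorter than the kernel) contributes 0.0.
    return [max(m) if m else 0.0 for m in maps]

# Hypothetical embedding table and kernels of window sizes 1 and 2.
table = {"scan": [1.0, 1.0], "free": [1.0, 0.0], "gift": [0.0, 1.0]}
kernels = [
    [[1.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
features = title_features("scan free gift".split(), table, kernels, dim=2)
print(features)  # [1.0, 1.5]
```

Using kernels of several window sizes lets each keyword feature respond to phrases of a different length.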
In the above technical solution, the drainage nodes of the article to be identified include an initial drainage node and a termination drainage node; the determining module is further configured to determine the initial drainage node and the termination drainage node of the article to be identified based on the drainage relationship of the article to be identified; and
perform feature extraction on the initial drainage node and the termination drainage node respectively to obtain node features of the initial drainage node and node features of the termination drainage node.
In the above technical solution, the drainage nodes of the article to be identified include an initial drainage node and a termination drainage node; the construction module is further configured to determine neighbor nodes of the article to be identified based on the article to be identified, the initial drainage node and the termination drainage node;
take the article to be identified as an edge between the initial drainage node and the termination drainage node; and
construct the drainage relationship graph of the article to be identified based on the edge between the initial drainage node and the termination drainage node, the initial drainage node, the termination drainage node, and the neighbor nodes of the article to be identified.
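One way to hold such a graph in memory is sketched below. The `DrainageGraph` class, its field names, and the account identifiers are hypothetical; the only points taken from the text are that drainage nodes are the vertices, that each article is an edge from its initial node to its termination node, and that neighbor nodes are part of the graph.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class DrainageGraph:
    """Nodes are propagation carriers; each article is an edge between two nodes."""
    node_feats: dict = field(default_factory=dict)   # node id -> node feature vector
    edge_feats: dict = field(default_factory=dict)   # (initial, termination, article id) -> text feature
    neighbors: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, feat):
        # Keep the first feature registered for a node.
        self.node_feats.setdefault(node_id, feat)

    def add_article(self, article_id, initial, termination, text_feat):
        """Register an article as the edge from its initial to its termination node."""
        self.edge_feats[(initial, termination, article_id)] = text_feat
        self.neighbors[initial].add(termination)
        self.neighbors[termination].add(initial)

g = DrainageGraph()
g.add_node("account_A", [0.1])
g.add_node("account_B", [0.2])
g.add_node("account_C", [0.3])
g.add_article("article_1", "account_A", "account_B", [0.7])  # the article to be identified
g.add_article("article_2", "account_C", "account_B", [0.4])  # makes C a neighbor of B
print(sorted(g.neighbors["account_B"]))  # ['account_A', 'account_C']
```

Keying edges by the article identifier allows several articles to connect the same pair of nodes.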
In the above technical solution, the drainage nodes of the article to be identified include an initial drainage node and a termination drainage node; the updating module is further configured to determine the initial drainage node of the article to be identified, the termination drainage node of the article to be identified, and the neighbor nodes of the article to be identified based on the drainage relationship graph of the article to be identified;
update the text features of the article to be identified based on the article to be identified, the initial drainage node and the termination drainage node; and
update the node features of the initial drainage node and the node features of the termination drainage node based on the neighbor nodes of the article to be identified, the initial drainage node and the termination drainage node.
In the above technical solution, the updating module is further configured to splice the text features of the article to be identified, the node features of the initial drainage node, and the node features of the termination drainage node to obtain a spliced feature;
perform mapping based on the spliced feature to obtain a mapped feature; and
update the text features of the article to be identified based on the mapped feature.
In the above technical solution, the updating module is further configured to multiply the spliced feature by a learnable matrix; and
apply a nonlinear mapping to the product to obtain the mapped feature.
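The text-feature (edge) update described above can be sketched as a single linear layer followed by a nonlinearity. In the sketch below the helper names, the matrix `W`, and the toy vectors are assumptions, and ReLU is chosen only for illustration because the text does not name the nonlinear mapping.

```python
def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def update_text_feature(text_feat, initial_feat, termination_feat, W):
    """Splice the three features, multiply by a learnable matrix, apply ReLU."""
    spliced = text_feat + initial_feat + termination_feat   # splicing
    projected = matvec(W, spliced)                          # product with learnable matrix
    return [max(0.0, v) for v in projected]                 # nonlinear mapping (ReLU, assumed)

# Toy learnable matrix (2 x 4); in training these values would be learned.
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, -1.0]]
updated = update_text_feature([1.0, -1.0], [0.5], [2.0], W)
print(updated)  # [1.5, 0.0]
```

The number of rows of `W` sets the dimension of the updated text feature, so the update can keep or change the feature size.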
In the above technical solution, the updating module is further configured to determine neighbor-related features of the initial drainage node based on the neighbor nodes of the article to be identified;
fuse the neighbor-related features of the initial drainage node with the node features of the initial drainage node to obtain fusion features of the initial drainage node;
update the node features of the initial drainage node based on the fusion features of the initial drainage node;
determine neighbor-related features of the termination drainage node based on the neighbor nodes of the article to be identified;
fuse the neighbor-related features of the termination drainage node with the node features of the termination drainage node to obtain fusion features of the termination drainage node; and
update the node features of the termination drainage node based on the fusion features of the termination drainage node.
In the above technical solution, the updating module is further configured to determine the neighbor nodes of the termination drainage node based on the neighbor nodes of the article to be identified;
determine edge features of the edges between the termination drainage node and its neighbor nodes;
splice the node features of the neighbor nodes of the termination drainage node with the edge features to obtain spliced features; and
perform attention processing based on the spliced features and the node features of the termination drainage node to obtain the neighbor-related features of the termination drainage node.
In the above technical solution, the updating module is further configured to multiply the node features of the termination drainage node by a learnable matrix; and
splice the product with the neighbor-related features of the termination drainage node to obtain the fusion features of the termination drainage node.
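The node update above can be sketched as attention-weighted neighbor aggregation followed by splicing. The text does not specify how the attention scores are computed, so the dot-product scoring with a projection `W_q`, the softmax normalization, and all names and values below are assumptions made for illustration.

```python
import math

def matvec(matrix, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_feature(node_feat, neighbor_feats, edge_feats, W_q):
    """Attention over [neighbor feature ; edge feature] pairs (at least one neighbor)."""
    spliced = [n + e for n, e in zip(neighbor_feats, edge_feats)]
    query = matvec(W_q, node_feat)  # project the node into the spliced space
    scores = [sum(q * s for q, s in zip(query, sp)) for sp in spliced]
    weights = softmax(scores)
    dim = len(spliced[0])
    return [sum(w * sp[i] for w, sp in zip(weights, spliced)) for i in range(dim)]

def fuse_node(node_feat, neighbor_feat, W_self):
    """Multiply the node feature by a learnable matrix, then splice with the neighbor feature."""
    return matvec(W_self, node_feat) + neighbor_feat

# Toy termination node with two neighbors; every value here is assumed.
node = [1.0, 0.0]
nbr_feats = [[1.0], [0.0]]
edge_feats = [[0.0], [1.0]]
W_q = [[1.0, 0.0], [0.0, 1.0]]
nf = neighbor_feature(node, nbr_feats, edge_feats, W_q)
fused = fuse_node(node, nf, W_self=[[2.0, 0.0]])
```

The same two functions would apply unchanged to the initial drainage node, since the text describes its update symmetrically.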
In the above technical solution, the fusion module is further configured to splice the updated text features of the article to be identified with the updated node features of the drainage nodes to obtain the fusion features; or
add the updated text features of the article to be identified and the updated node features of the drainage nodes to obtain the fusion features.
The embodiment of the application provides an electronic device for cheating identification, which comprises:
a memory for storing executable instructions; and
a processor configured to implement the artificial intelligence-based cheating identification method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the artificial intelligence-based cheating identification method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
The probability that the article to be identified is a cheating article is obtained by combining the text features of the article with its drainage relationship, which makes the identification of cheating articles efficient and improves its accuracy.
Drawings
FIG. 1 is a schematic diagram of an application scenario of a cheating identification system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative architecture of a distributed system for use in a blockchain system in accordance with embodiments of the present application;
FIG. 3 is a schematic diagram of an alternative block structure according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device for cheating identification according to an embodiment of the present application;
FIGS. 5A-5C are schematic flow diagrams of an artificial intelligence based cheating identification method according to embodiments of the present application;
FIG. 6 is a schematic flow chart of drainage provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a neighboring node provided by an embodiment of the present application;
FIG. 8A is a schematic diagram of a search entry interface provided by an embodiment of the present application;
FIG. 8B is a schematic diagram of a search main interface provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a cheating drainage article provided by an embodiment of the present application;
FIG. 10 is a schematic flow chart of a method for detecting cheating drainage articles according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a method for detecting cheating drainage articles according to an embodiment of the present application;
FIG. 12 is an algorithmic schematic of the attention mechanism provided by embodiments of the present application;
FIG. 13 is a flow diagram of batch sub-graph reasoning provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the application.
In the following description, the terms "first", "second", and the like merely distinguish between similar objects and do not represent a particular ordering of the objects. It is understood that "first", "second", and the like may be interchanged where permitted, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms involved in the embodiments are explained; the following explanations apply to these terms.
1) Convolutional neural network (CNN, Convolutional Neural Network): a class of feedforward neural networks (FNN, Feedforward Neural Network) that involve convolution computation and have a deep structure; it is one of the representative algorithms of deep learning. Convolutional neural networks have representation learning capability and can perform shift-invariant classification of input images according to their hierarchical structure.
2) Supervised learning: the process of adjusting the parameters of a classifier using a set of samples of known classes so that the classifier reaches the required performance. In supervised learning, each instance consists of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new instances. Supervised learning algorithms mainly include neural network propagation algorithms, decision tree learning algorithms and the like. The supervised learning in the embodiments of the present application is the process of training a supervised recognition model based on white samples (non-cheating article samples) and black samples (cheating article samples).
3) Drainage (that is, traffic diversion): a means of directing the attention of a user through some kind of propagation carrier, or by means of various platforms, using text, picture links, "read the original" links, menu jumps, audio, video and the like. For example, after a user clicks official account A (a propagation carrier; "public number" in a literal translation), a drainage article of the official account is presented. The drainage article includes a QR code picture, and after the user scans the QR code picture, the user jumps to another official account B. That is, official account A is drained to official account B through the QR code picture in the drainage article.
4) Drainage relationship: the association between an article and its drainage nodes (propagation carriers of various kinds, including but not limited to official accounts and external links). Drainage relationships include active drainage relationships and passive drainage relationships, and drainage nodes include an initial drainage node (the drainage node that carries the article) and a termination drainage node (the drainage node reached after drainage through the article). For example, after a user clicks official account A (a propagation carrier), a drainage article of the official account is presented. The drainage article includes a QR code picture, and after the user scans the QR code picture, the user jumps to another official account B. That is, official account A is drained to official account B through the QR code picture in the drainage article, so official account A, official account B and the drainage article have a drainage relationship, in which official account A is the initial drainage node and official account B is the termination drainage node.
5) Blockchain: an encrypted, chained transaction storage structure formed of blocks.
6) Blockchain network: the set of nodes that incorporate new blocks into the blockchain by consensus.
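As an illustration of the supervised learning described in term 2), the sketch below trains a binary classifier on white samples (label 0) and black samples (label 1). It is a deliberately simplified stand-in, assuming a logistic regression trained by stochastic gradient descent on two-dimensional toy features; the model in the embodiments is a neural network, and none of the names or values below come from the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, dim, lr=0.5, epochs=200):
    """Stochastic gradient descent on the binary cross-entropy loss."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(x, w, b):
    """Probability that the sample is a black (cheating) sample."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy fused features (assumed): white samples near the origin, black samples near (1, 1).
white = [[0.0, 0.1], [0.1, 0.0]]
black = [[1.0, 0.9], [0.9, 1.0]]
w, b = train(white + black, [0, 0, 1, 1], dim=2)
p_black = predict([1.0, 0.9], w, b)
p_white = predict([0.0, 0.1], w, b)
```

After training, `p_black` lies above 0.5 and `p_white` below it on this separable toy data; real article features would need a held-out set to measure performance.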
The embodiment of the application provides an artificial intelligence-based cheating identification method and apparatus, an electronic device, and a computer-readable storage medium, which enable efficient and automated cheating identification.
The artificial intelligence-based cheating identification method provided by the embodiment of the application can be implemented by a terminal or a server alone, or by the terminal and the server in cooperation. For example, the terminal alone performs the artificial intelligence-based cheating identification method described below. Alternatively, the terminal sends a cheating identification request for an article to the server, and the server executes the artificial intelligence-based cheating identification method according to the received request: it performs cheating prediction on the article to be identified based on the text and the drainage relationship of the article, thereby identifying cheating articles, and filters out the cheating articles so that subsequent operations such as search and recommendation can be performed.
The electronic device for cheating identification provided by the embodiment of the application can be various types of terminal devices or servers, wherein the servers can be independent physical servers, can be a server cluster or a distributed system formed by a plurality of physical servers, and can be cloud servers for providing cloud computing services; the terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
Taking a server as an example, the server may be a server cluster deployed in the cloud that opens artificial intelligence cloud services (AIaaS, AI as a Service) to users. An AIaaS platform splits several types of common AI services and provides them in the cloud independently or as packages. This service mode is similar to an AI-themed mall: all users can access one or more of the artificial intelligence services provided by the AIaaS platform through an application programming interface.
For example, one of the artificial intelligence cloud services may be a cheating identification service, that is, a cloud server packages the cheating identification program provided by the embodiment of the present application. The user invokes the cheating identification service through a terminal (which runs a client, such as a search client or a recommendation client), so that the server deployed in the cloud invokes the packaged cheating identification program and performs cheating prediction on the articles to be identified based on their text and drainage relationships, thereby identifying cheating articles. For example, for a recommendation application, before articles are recommended to the user, cheating prediction is performed on the articles to be recommended in the database based on their text and drainage relationships, and the identified cheating articles are filtered out of the database so that only normal articles are recommended to the user. This improves the quality of the recommended articles, allows user behavior data to be obtained quickly, and improves the effect of later recommendation based on user behavior.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of a cheating identification system 10 according to an embodiment of the present application, where a terminal 200 is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 (running with a client, e.g., search client, recommendation client, etc.) may be used to obtain a request for cheating identification for an article, e.g., after a user enters a search keyword at an input interface of the terminal, the terminal automatically obtains the request for cheating identification for an article.
In some embodiments, a cheating identification plug-in can be embedded in the client running in the terminal to implement the artificial intelligence-based cheating identification method locally on the client. For example, after obtaining a cheating identification request for an article, the terminal 200 invokes the cheating identification plug-in to implement the artificial intelligence-based cheating identification method and performs cheating prediction on the article to be identified based on its text and drainage relationship, thereby identifying cheating articles. For example, for a recommendation application, before articles are recommended to the user, cheating prediction is performed on the articles to be recommended in the database based on their text and drainage relationships, and the identified cheating articles are filtered out of the database so that recommendation is based on the normal articles in the database. This improves the quality of the recommended articles, allows user behavior data to be obtained quickly, and improves the effect of later recommendation based on user behavior.
In some embodiments, after obtaining a cheating identification request for an article, the terminal 200 invokes the cheating identification interface of the server 100 (which may be provided in the form of a cloud service, that is, the cheating identification service), and the server 100 performs cheating prediction on the article to be identified based on its text and drainage relationship, thereby identifying cheating articles. For example, for a search application, the terminal 200 automatically generates a cheating identification request for articles (including the search keyword) from the search keyword entered by the user and sends the request to the server 100. The server 100 parses the request to obtain the search keyword, recalls articles from the database based on the search keyword, and performs cheating prediction on the recalled articles based on their text and drainage relationships. It then removes the identified cheating articles from the recalled articles and sends the remaining normal articles to the terminal 200, which improves the quality of the searched articles and makes full use of the user's normal behavior.
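The server-side filtering step of the search scenario can be sketched as follows. The function name, the 0.5 decision threshold, and the stub predictor are assumptions for illustration; the text states only that cheating articles are removed from the recalled articles before the normal ones are returned.

```python
def filter_recalled(recalled, predict, threshold=0.5):
    """Split recalled articles into normal and cheating ones by predicted probability."""
    normal, cheating = [], []
    for article in recalled:
        if predict(article) >= threshold:
            cheating.append(article)
        else:
            normal.append(article)
    return normal, cheating

# Stub predictor: a lookup table standing in for the trained recognition model.
scores = {"a1": 0.03, "a2": 0.91, "a3": 0.40}
normal, cheating = filter_recalled(["a1", "a2", "a3"], scores.get)
print(normal)    # ['a1', 'a3']
print(cheating)  # ['a2']
```

Lowering the threshold removes more borderline articles at the cost of discarding more normal ones, so in practice it would be tuned on labeled data.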
The cheating identification system according to the embodiment of the present application may be a distributed system formed by connecting a client and a plurality of nodes (any form of computing device in an access network, such as a server or a user terminal) through network communication.
Taking a blockchain system as an example of the distributed system, refer to fig. 2. Fig. 2 is a schematic diagram of an alternative architecture of a distributed system 200 applied to a blockchain system according to an embodiment of the present application. The architecture is formed by a plurality of nodes 201 (any form of computing device in an access network, such as servers and user terminals) and clients 202. A peer-to-peer (P2P, Peer To Peer) network is formed between the nodes, where the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server or a terminal, may join to become a node, and a node includes a hardware layer, an intermediate layer, an operating system layer, and an application layer.
Referring to the functionality of each node in the blockchain system shown in fig. 2, the functions involved include:
1) Routing, a basic function of a node, used to support communication between nodes.
Besides the routing function, the node can also have the following functions:
2) Application, which is deployed in the blockchain to implement specific services according to actual service requirements. The application records data related to the implemented function to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and the integrity of the record data are successfully verified.
For example, the services implemented by the application include:
2.1) Shared ledger, used to provide functions such as storing, querying, and modifying account data. Record data of the operations on the account data is sent to other nodes in the blockchain system; after the other nodes verify that the record data is valid, they store the record data in a temporary block as a response acknowledging that the account data is valid, and may also send a confirmation to the node that initiated the operation.
2.2) Smart contract, a computerized agreement that can execute the terms of a contract. It is implemented by code that is deployed on the shared ledger and executed when certain conditions are met, and is used to complete automated transactions according to actual business requirements, such as querying for cheating articles. Of course, the smart contract is not limited to executing contracts for transactions; it may also execute contracts that process received information.
3) Blockchain, comprising a series of blocks (Block) that are connected to one another in the chronological order of their generation. Once a new block is added to the blockchain, it is not removed, and the blocks record the record data submitted by the nodes in the blockchain system.
Referring to fig. 3, fig. 3 is a schematic diagram of an optional block structure (Block Structure) provided in an embodiment of the present application. Each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by hash values to form a blockchain. In addition, a block may include information such as a timestamp of when the block was generated. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains related information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
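The hash-linked block structure described above can be illustrated with a minimal sketch. The field names (`records`, `prev_hash`, `timestamp`, `hash`), the use of SHA-256, and the choice to hash the previous hash and timestamp together with the records are assumptions made for illustration only; the embodiment does not specify them.

```python
import hashlib
import json
import time

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(records, prev_hash, timestamp):
    """Hash the block's records together with the previous block's hash."""
    payload = json.dumps(
        {"records": records, "prev_hash": prev_hash, "timestamp": timestamp},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records, timestamp=None):
    """Create a block linked to the current tail of the chain and append it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    timestamp = time.time() if timestamp is None else timestamp
    block = {
        "records": records,
        "prev_hash": prev_hash,
        "timestamp": timestamp,
        "hash": block_hash(records, prev_hash, timestamp),
    }
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Check each block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(
            block["records"], block["prev_hash"], block["timestamp"]
        )
        if block["hash"] != recomputed:
            return False
    return True
```

Because every block stores the hash of its predecessor, modifying the records of an earlier block makes `chain_is_valid` fail, which is the anti-counterfeiting property mentioned above.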
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device 500 for cheating identification according to an embodiment of the present application. Taking the case where the electronic device 500 is a server as an example, the electronic device 500 for cheating identification shown in fig. 4 includes: at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It is appreciated that the bus system 540 is used to enable connection and communication between these components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are all labeled as bus system 540 in fig. 4.
The processor 510 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The memory 550 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 550 described in the embodiments of the present application is intended to comprise any suitable type of memory. The memory 550 may optionally include one or more storage devices physically located remote from the processor 510.
In some embodiments, memory 550 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
A network communication module 552, used to reach other computing devices via one or more (wired or wireless) network interfaces 520; exemplary network interfaces 520 include Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB);
In some embodiments, the cheating identification device provided by the embodiments of the present application may be implemented in software, for example, as the cheating identification plug-in in the terminal described above or as the cheating identification service in the server described above. Of course, without being limited thereto, the cheating identification device provided by the embodiments of the present application may be provided in various software forms, including application programs, software modules, scripts, or code.
FIG. 4 illustrates a cheating identification device 555 stored in the memory 550, which may be software in the form of a program, a plug-in, or the like, such as a cheating identification plug-in, and which comprises a series of modules: a feature extraction module 5551, a determination module 5552, a construction module 5553, an update module 5554, a fusion module 5555, and a classification module 5556. These modules are configured to implement the cheating identification function provided by the embodiments of the present application.
As described above, the artificial intelligence based cheating identification method provided by the embodiments of the present application may be implemented by various types of electronic devices. Referring to fig. 5A, fig. 5A is a schematic flow chart of an artificial intelligence based cheating identification method according to an embodiment of the present application, and is described with reference to the steps shown in fig. 5A.
In the following steps, the drainage nodes are various propagation carriers, including but not limited to public numbers (official accounts), external links, and the like. The drainage nodes comprise an initial drainage node and a termination drainage node, where the initial drainage node represents the node that carries the article to be identified, and the termination drainage node represents the node reached after drainage through the article to be identified.
In step 101, feature extraction processing is performed on the article to be identified, so as to obtain text features of the article to be identified.
As an example of acquiring the article to be identified, the terminal automatically generates a cheating identification request for articles (including the search keyword) from a search keyword input by the user at an input interface of the terminal, and sends the request to the server. The server 100 parses the request to obtain the search keyword, recalls the articles to be identified from a database based on the search keyword, and performs feature extraction on each article to be identified to obtain its text features, so that cheating identification is subsequently performed based on the text features of the article to be identified.
Referring to fig. 5B, fig. 5B is a schematic flow chart of an alternative artificial intelligence-based cheating identification method according to an embodiment of the present application. Fig. 5B illustrates that step 101 in fig. 5A may be implemented through steps 1011-1013: in step 1011, feature extraction processing is performed on the title of the article to be identified to obtain the title features of the article to be identified; in step 1012, feature extraction processing is performed on the body text of the article to be identified to obtain the body features of the article to be identified; in step 1013, the title features and the body features of the article to be identified are fused to obtain the text features of the article to be identified.
The body text includes the text, the number of pictures, video information, and the like in the article. To extract the text features of the article to be identified more fully, both the title features and the body features of the article can be extracted and then spliced, added, or averaged to obtain accurate text features of the article to be identified. The text features are thus represented by features of multiple dimensions, and cheating identification is performed based on these accurate text features.
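The three fusion options mentioned above (splicing, adding, averaging) can be sketched as follows. The function name `fuse` and its `mode` parameter are hypothetical names introduced for illustration; features are represented as plain lists of floats.

```python
def fuse(title_feat, body_feat, mode="concat"):
    """Fuse title features and body-text features into one text feature.

    mode="concat" splices the two vectors; "add" and "average" combine them
    element-wise and therefore require vectors of equal length.
    """
    if mode == "concat":
        return list(title_feat) + list(body_feat)
    if len(title_feat) != len(body_feat):
        raise ValueError("add/average require equal-length features")
    if mode == "add":
        return [t + b for t, b in zip(title_feat, body_feat)]
    if mode == "average":
        return [(t + b) / 2.0 for t, b in zip(title_feat, body_feat)]
    raise ValueError(f"unknown fusion mode: {mode}")
```

Splicing keeps both sources intact at the cost of a longer vector, while adding or averaging keeps the dimension unchanged.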
In some embodiments, performing feature extraction processing on a title of an article to be identified to obtain title features of the article to be identified, including: word segmentation processing is carried out on the titles of the articles to be identified, so that a plurality of words of the titles are obtained; mapping the words of the title to obtain word vectors corresponding to the words respectively; performing splicing processing on word vectors corresponding to the words respectively to obtain a vector matrix of the title; and extracting keywords based on the vector matrix of the title to obtain the title characteristics of the article to be identified.
For example, the title of the article to be identified is first segmented by a word segmenter to obtain a plurality of words of the title, e.g., the title is divided into K words w1, w2, …, wK. Each of the K words is then mapped to an M-dimensional vector, and the K M-dimensional vectors are spliced to obtain an M × K matrix, i.e., the vector matrix of the title. Finally, keyword extraction processing is performed on the M × K matrix through a text recognition model to obtain the title features of the article to be identified (i.e., the initialization features of the article to be identified).
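As a rough illustration of how the M × K vector matrix of a title could be assembled, the sketch below uses two stand-ins that are assumptions and not part of the embodiment: a whitespace split in place of a real word segmenter, and a hash-derived vector in place of a trained word embedding.

```python
import hashlib


def segment(title):
    """Hypothetical word segmenter: a whitespace split stands in for a real one."""
    return title.split()


def word_vector(word, dim):
    """Hypothetical embedding: derive `dim` deterministic values in [0, 1)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 256.0 for i in range(dim)]


def title_matrix(title, dim):
    """Return the M x K title matrix as a list of M rows, one column per word."""
    columns = [word_vector(w, dim) for w in segment(title)]  # K vectors of length M
    return [[col[m] for col in columns] for m in range(dim)]  # transpose to M x K
```

For a title of K = 4 words and M = 8, `title_matrix` returns 8 rows of 4 values each; identical words map to identical columns.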
In some embodiments, performing keyword extraction processing based on the vector matrix of the title to obtain the title features of the article to be identified includes: performing convolution processing based on the vector matrix of the title to obtain a plurality of feature maps of the title; performing keyword extraction processing on the plurality of feature maps of the title to obtain a plurality of keyword features of the title; and splicing the plurality of keyword features to obtain the title features of the article to be identified.
For example, the core key features in the title are extracted by using the strong local feature extraction capability of a text classification model based on a convolutional neural network (TextCNN). Specifically, the vector matrix of the title is convolved by convolution layers with different convolution kernels in TextCNN to obtain a plurality of feature maps of the title; the feature maps are pooled by a pooling layer in TextCNN to obtain a plurality of keyword features of the title; and the keyword features are spliced to obtain the title features of the article to be identified. This exploits the strong local feature extraction capability of TextCNN, reduces the number of parameters of the title features, and reduces the consumption of computing resources.
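The convolve, pool, and splice sequence described above can be sketched in plain Python. This is a simplified illustration, not the embodiment's model: the title is passed as a list of K word vectors (one row per word, i.e., the transpose of the M × K matrix), each kernel is a list of `width` rows of length M, ReLU is assumed as the activation, and max-over-time pooling is assumed as the pooling operation.

```python
def conv_feature_map(word_vectors, kernel, bias=0.0):
    """Slide one kernel over consecutive words and return its feature map."""
    width = len(kernel)
    feature_map = []
    for start in range(len(word_vectors) - width + 1):
        total = bias
        for offset in range(width):
            row = word_vectors[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], row))
        feature_map.append(max(0.0, total))  # ReLU activation (assumed)
    return feature_map


def title_features(word_vectors, kernels):
    """Max-pool each kernel's feature map and splice the pooled values."""
    features = []
    for kernel in kernels:
        feature_map = conv_feature_map(word_vectors, kernel)
        # Max-over-time pooling keeps the strongest local response per kernel.
        features.append(max(feature_map) if feature_map else 0.0)
    return features
```

Each kernel contributes a single pooled value regardless of title length, which is why the resulting title feature stays small.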
In step 102, node characteristics of the drainage nodes of the articles to be identified are determined based on the drainage relationships of the articles to be identified.
The drainage nodes of the article to be identified include an initial drainage node and a termination drainage node. For example, the initial drainage node represents the public number that publishes the article to be identified, and the termination drainage node represents the public number reached after drainage through the two-dimensional code in the article to be identified.
In some embodiments, determining node features of the drainage nodes of the articles to be identified based on the drainage relationships of the articles to be identified includes: determining an initial drainage node and a termination drainage node of the article to be identified based on the drainage relation of the article to be identified; and respectively carrying out feature extraction processing on the initial drainage node and the termination drainage node to obtain node features of the initial drainage node and node features of the termination drainage node.
For example, as shown in fig. 6, the public number 601 publishes an article 602 to be identified, and drainage is performed through the two-dimensional code in the article 602 to be identified, jumping to the public number 603. That is, based on the drainage relationship of the article to be identified, the initial drainage node of the article is determined to be the public number 601 and the termination drainage node is determined to be the public number 603. After the initial drainage node and the termination drainage node of the article to be identified are determined, feature extraction is performed on each of them to obtain the node features of the initial drainage node and the node features of the termination drainage node.
In step 103, a drainage relationship diagram of the articles to be identified is constructed based on the articles to be identified and the drainage nodes.
For example, after determining the drainage nodes of the articles to be identified, a drainage relation diagram of the articles to be identified is constructed based on the articles to be identified and the drainage nodes so as to perform cheating identification based on the drainage relation diagram.
In some embodiments, the drainage nodes of the article to be identified include an initial drainage node and a termination drainage node; constructing the drainage relation diagram of the article to be identified based on the article to be identified and the drainage nodes includes: determining neighbor nodes of the article to be identified based on the article to be identified, the initial drainage node, and the termination drainage node; taking the article to be identified as the edge between the initial drainage node and the termination drainage node; and constructing the drainage relation diagram of the article to be identified based on the edge between the initial drainage node and the termination drainage node, the initial drainage node, the termination drainage node, and the neighbor nodes of the article to be identified.
For example, as shown in fig. 7, based on the article to be identified, the initial drainage node, and the termination drainage node, multi-level neighbor nodes of the article to be identified may be determined, such as first-level neighbor nodes (drainage nodes pointing to the initial drainage node or the termination drainage node) and second-level neighbor nodes (drainage nodes pointing to the first-level neighbor nodes). The article to be identified is taken as the edge between the initial drainage node and the termination drainage node, and the drainage relation diagram of the article to be identified is constructed based on this edge, the initial drainage node, the termination drainage node, and the neighbor nodes of the article to be identified.
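The construction above, with articles as directed edges between drainage nodes and neighbors collected level by level, can be sketched as follows. The tuple layout `(article_id, start_node, end_node)` and the function names are assumptions for illustration; the embodiment does not prescribe a data structure.

```python
from collections import defaultdict


def build_drainage_graph(articles):
    """Index directed edges by target node.

    articles: iterable of (article_id, start_node, end_node) tuples, where the
    article is the edge from its initial node to its termination node.
    Returns a mapping: node -> [(source_node, article_id), ...].
    """
    incoming = defaultdict(list)
    for article_id, start, end in articles:
        incoming[end].append((start, article_id))
    return incoming


def neighbor_levels(incoming, start, end, depth=2):
    """Collect multi-level neighbors: nodes pointing to the previous level."""
    seen = {start, end}
    frontier = [start, end]
    levels = []
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for source, _article_id in incoming.get(node, []):
                if source not in seen:
                    seen.add(source)
                    next_frontier.append(source)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels
```

With `depth=2`, the first returned list holds the nodes pointing to the initial or termination drainage node, and the second holds the nodes pointing to those first-level neighbors.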
In step 104, based on the drainage relationship diagram of the article to be identified, the text features of the article to be identified and the node features of the drainage nodes are updated.
For example, after determining the drainage relation diagram of the article to be identified, updating the text features of the article to be identified and the node features of the drainage nodes through the graph attention network model based on the drainage relation diagram of the article to be identified, and performing cheating identification based on the updated text features of the article to be identified and the updated node features of the drainage nodes.
Referring to fig. 5C, fig. 5C is a schematic flow chart of an alternative artificial intelligence-based cheating identification method according to an embodiment of the present application. Fig. 5C illustrates that step 104 in fig. 5A may be implemented by steps 1041 to 1043: in step 1041, the initial drainage node of the article to be identified, the termination drainage node of the article to be identified, and the neighbor nodes of the article to be identified are determined based on the drainage relation diagram of the article to be identified; in step 1042, the text features of the article to be identified are updated based on the article to be identified, the initial drainage node, and the termination drainage node; in step 1043, the node features of the initial drainage node and the termination drainage node are updated based on the neighbor nodes of the article to be identified, the initial drainage node, and the termination drainage node.
The graph attention network model comprises multiple attention layers, and the text features of the article to be identified, the node features of the initial drainage node, and the node features of the termination drainage node are updated layer by layer through these attention layers.
In some embodiments, updating the text features of the article to be identified based on the article to be identified, the initial drainage node, and the termination drainage node includes: splicing the text features of the article to be identified, the node features of the initial drainage node, and the node features of the termination drainage node to obtain splicing features; performing mapping processing based on the splicing features to obtain mapping features; and updating the text features of the article to be identified based on the mapping features.
After the text features of the article to be identified, the node features of the initial drainage node, and the node features of the termination drainage node are spliced to obtain the splicing features, mapping is performed based on the splicing features to obtain the mapping features, and in the updating process the text features of the article to be identified are updated to the mapping features.
In some embodiments, mapping processing is performed based on the splice feature to obtain a mapped feature, including: performing product processing on the spliced characteristics and the learnable matrix; and carrying out nonlinear mapping processing on the product processing result to obtain mapping characteristics.
Following the above example, the mapping features are calculated as $e^{k} = \sigma\left(W_{e}^{k} \cdot \mathrm{concat}\left(e^{k-1}, h_{src}^{k-1}, h_{dst}^{k-1}\right)\right)$, where $W_{e}^{k}$ denotes the learnable matrix of the k-th layer, $\sigma(\cdot)$ denotes the nonlinear mapping processing, k denotes the layer index of the attention layer, concat denotes the splicing processing, $e^{k-1}$ denotes the text features of the article to be identified at layer k-1, $h_{src}^{k-1}$ denotes the node features of the initial drainage node at layer k-1, $h_{dst}^{k-1}$ denotes the node features of the termination drainage node at layer k-1, and $e^{k}$ denotes the text features of the article to be identified at the k-th layer (i.e., the updated text features).
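The text feature update described above (splice, multiply by a learnable matrix, apply a nonlinear mapping) can be sketched for a single layer as follows. The sigmoid is an assumed choice for the nonlinear mapping, which the embodiment does not specify, and the weight matrix is passed in as a plain list of rows rather than learned.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    """Assumed nonlinear mapping; the embodiment only requires a nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def update_edge_feature(e_prev, h_src_prev, h_dst_prev, weight):
    """One layer of the text (edge) feature update.

    e_prev: text features of the article at the previous layer
    h_src_prev / h_dst_prev: node features of the initial / termination node
    weight: learnable matrix with one row per output dimension
    """
    spliced = list(e_prev) + list(h_src_prev) + list(h_dst_prev)
    return [sigmoid(v) for v in matvec(weight, spliced)]
```

The number of rows in `weight` sets the dimension of the updated text feature, so stacked layers can keep or change the feature size.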
In some embodiments, updating the node features of the initial drainage node and the termination drainage node based on the neighbor nodes of the article to be identified, the initial drainage node, and the termination drainage node includes: determining neighbor related features of the initial drainage node based on the neighbor nodes of the article to be identified; fusing the neighbor related features of the initial drainage node with the node features of the initial drainage node to obtain fusion features of the initial drainage node; updating the node features of the initial drainage node based on the fusion features of the initial drainage node; determining neighbor related features of the termination drainage node based on the neighbor nodes of the article to be identified; fusing the neighbor related features of the termination drainage node with the node features of the termination drainage node to obtain fusion features of the termination drainage node; and updating the node features of the termination drainage node based on the fusion features of the termination drainage node.
For example, for updating the node characteristics of the initial drainage node, firstly determining the neighbor related characteristics of the initial drainage node based on the neighbor nodes of the article to be identified, then splicing the neighbor related characteristics of the initial drainage node and the node characteristics of the initial drainage node to obtain the fusion characteristics of the initial drainage node, and updating the node characteristics of the initial drainage node to the fusion characteristics of the initial drainage node.
For example, for updating the node characteristics of the termination drainage node, determining the neighbor related characteristics of the termination drainage node based on the neighbor nodes of the article to be identified, then splicing the neighbor related characteristics of the termination drainage node and the node characteristics of the termination drainage node to obtain the fusion characteristics of the termination drainage node, and updating the node characteristics of the termination drainage node to the fusion characteristics of the termination drainage node.
In some embodiments, determining neighbor related features of a termination drainage node based on neighbor nodes of an article to be identified includes: determining neighbor nodes of the termination drainage node based on neighbor nodes of articles to be identified; determining edge characteristics of edges between neighboring nodes of the termination drainage node and the termination drainage node; splicing the node characteristics and the edge characteristics of the neighbor nodes of the termination drainage node to obtain splicing characteristics; and performing attention processing based on the splicing characteristics and the node characteristics of the termination drainage nodes to obtain neighbor related characteristics of the termination drainage nodes.
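For example, an edge between nodes is an article, and the edge features characterize the text features of that article. The neighbor related features of the termination drainage node are calculated as shown in formula (1) and formula (2):
$z_{src'}^{k-1} = \mathrm{concat}\left(h_{src'}^{k-1}, e_{src',dst}^{k-1}\right), \quad src' \in E(dst)$ (1)
$h_{N(dst)}^{k-1} = \mathrm{ATTN}\left(h_{dst}^{k-1}, \left\{z_{src'}^{k-1}\right\}_{src' \in E(dst)}; W_{a}^{k}\right)$ (2)
where src' denotes a neighbor node of the termination drainage node, E(dst) denotes the set of all neighbor nodes of the termination drainage node, $h_{src'}^{k-1}$ denotes the node features of the neighbor node at layer k-1, $e_{src',dst}^{k-1}$ denotes the edge features of layer k-1, $z_{src'}^{k-1}$ denotes the splicing features, ATTN denotes the attention processing, $W_{a}^{k}$ denotes the learnable matrix of the k-th layer, and $h_{N(dst)}^{k-1}$ denotes the neighbor related features of the termination drainage node.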
In some embodiments, the fusing processing is performed on the neighbor related features of the termination and drainage node and the node features of the termination and drainage node to obtain the fused features of the termination and drainage node, including: performing product processing on node characteristics of the termination drainage nodes and the learnable matrix; and splicing the product processing result with the neighbor related features of the termination drainage node to obtain the fusion features of the termination drainage node.
For example, the fusion processing is $h_{dst}^{k} = \mathrm{concat}\left(W_{n}^{k} \cdot h_{dst}^{k-1}, h_{N(dst)}^{k-1}\right)$, where $W_{n}^{k}$ denotes the learnable matrix of the k-th layer, k denotes the layer index of the attention layer, concat denotes the splicing processing, $h_{N(dst)}^{k-1}$ denotes the neighbor related features of the termination drainage node at layer k-1, $h_{dst}^{k-1}$ denotes the node features of the termination drainage node at layer k-1, and $h_{dst}^{k}$ denotes the node features of the termination drainage node at the k-th layer (i.e., the updated node features of the termination drainage node).
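The node feature update for the termination drainage node (splice each neighbor's node features with the connecting edge features, apply attention against the node's own features, then splice the result with the transformed node features) can be sketched as follows. The embodiment does not define the internals of the attention processing, so the form used here is an assumption: each spliced neighbor vector is projected by a learnable matrix, scored by a dot product with the node features, normalized with softmax, and summed.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_feature(h_dst, neighbors, w_attn):
    """Neighbor related feature of a node.

    neighbors: list of (node_feature, edge_feature) pairs, one per neighbor.
    w_attn: assumed projection with len(h_dst) rows.
    """
    if not neighbors:
        return [0.0] * len(h_dst)  # assumed fallback for isolated nodes
    # Splice each neighbor's node feature with the connecting edge feature.
    spliced = [list(h) + list(e) for h, e in neighbors]
    # Assumed attention: project, score against h_dst, softmax, weighted sum.
    projected = [matvec(w_attn, z) for z in spliced]
    scores = [sum(p * q for p, q in zip(proj, h_dst)) for proj in projected]
    weights = softmax(scores)
    return [
        sum(w * proj[i] for w, proj in zip(weights, projected))
        for i in range(len(h_dst))
    ]


def update_node_feature(h_dst, neighbors, w_attn, w_node):
    """Splice the transformed node feature with its neighbor related feature."""
    return matvec(w_node, h_dst) + neighbor_feature(h_dst, neighbors, w_attn)
```

The same function can be applied to the initial drainage node by passing its own features and neighbors, since the update steps described for the two nodes are symmetric.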
In step 105, fusion processing is performed on the text features of the updated article to be identified and the node features of the updated drainage node, so as to obtain fusion features.
After the text features of the updated article to be identified and the node features of the updated drainage node are obtained, the text features of the article to be identified and the node features of the updated drainage node are fused to obtain fusion features, so that the cheating identification is performed based on the fusion features.
For example, the text features of the updated articles to be identified and the node features of the updated drainage nodes are spliced to obtain fusion features, so that fusion processing is realized through simple splicing operation, and the computing resources are saved.
For example, the text features of the updated articles to be identified and the node features of the updated drainage nodes are added to obtain fusion features, so that fusion processing is realized through simple addition operation, and the computing resources are saved.
In step 106, the cheating prediction process is performed based on the fusion feature, so as to obtain the probability that the article to be identified belongs to the cheating article.
For example, after the fusion features are determined, they are classified by a classifier to obtain the probability that the article to be identified is a cheating article. When this probability is greater than a probability threshold, the article to be identified is determined to be a cheating article. This achieves accurate cheating identification and enables subsequent post-processing based on the identified cheating articles, such as filtering the cheating articles out of a database.
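The classification and thresholding step can be sketched with a single logistic unit standing in for the classifier. The classifier type, the weights, and the default threshold of 0.5 are assumptions for illustration; the embodiment only requires a classifier that outputs a probability and a comparison against a probability threshold.

```python
import math


def cheating_probability(fused, weights, bias=0.0):
    """Assumed logistic classifier over the fusion feature."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # stable branch for negative logits
    return z / (1.0 + z)


def is_cheating(fused, weights, bias=0.0, threshold=0.5):
    """Flag the article when its probability exceeds the threshold."""
    return cheating_probability(fused, weights, bias) > threshold
```

Raising `threshold` trades recall for precision, which may matter when filtered articles are removed from search results.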
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The cheating article identification of the embodiment of the present application can be applied to various scenarios. For example, as shown in fig. 1, for a search application, the terminal automatically generates a cheating identification request for articles (including the search keyword) from the search keyword input by the user, and sends the request to the server. The server parses the request to obtain the search keyword and recalls articles from the database based on the search keyword to obtain a plurality of recalled articles. The server first performs cheating prediction on the recalled articles based on their text and drainage relationships, thereby identifying the cheating articles, then filters the cheating articles out of the recalled articles and sends the normal articles among the recalled articles to the terminal. The terminal presents the normal articles to the user, which improves the quality of the searched articles and allows user behavior data to be acquired quickly, so that computing resources are fully utilized.
The cheating article identification scheme in the related art comprises the following steps:
Method 1: manually mine a batch of cheating search engine optimization (SEO, Search Engine Optimization) keywords (such as "public number") and set related rules for judgment;
Method 2: adopt a text classification method, in which the extracted text features are input into a machine learning model for discrimination, so as to identify cheating articles.
The applicant has found the following problems in practicing embodiments of the present application:
1) Manually mining cheating SEO keywords and setting related rules is labor-intensive and inefficient, and cheating articles are easily missed. In addition, whether an article contains cheating content often cannot be determined from a small number of keywords, so keyword-based judgment easily causes large-scale misjudgment: many normal articles are considered cheating articles merely because they contain a few such keywords;
2) Classifying text with a machine learning model can solve the problems caused by manual mining to a certain extent. However, for articles published by public numbers, as black-market groups continuously adapt their countermeasures, cheating articles can easily bypass text-based detection, so that the machine learning model misses them.
In order to solve the problems, the embodiment of the application provides a method for detecting cheating drainage articles based on a graph neural network, which can identify the cheating drainage articles (the cheating articles in the drainage process), construct an account article drainage relation graph (the drainage relation graph between accounts and articles) according to the drainage relation among the articles, extract the titles of the articles or the effective text features in the text through a text classification model (TextCNN) based on the convolutional neural network, update the feature vectors of nodes and edges in the drainage relation graph based on a graph attention network (GAT, graph Attention Network) model, and finally splice the updated feature vectors of the nodes and edges together to perform the two classification of the cheating articles so as to identify the cheating articles. By constructing an account drainage relation graph through the article drainage relation, the back partner operation relation can be deeply mined, and the coverage rate is improved. And modeling the text and the drainage relationship of the article simultaneously by using the graph attention network, so as to ensure the precision and recall rate.
The method for detecting the cheating drainage articles based on the graph neural network can be applied to the search process of one search, and the cheating articles are identified, so that the cheating articles are filtered according to search results. The main search portal 801 shown in fig. 8A is searched for, and the main interface 802 shown in fig. 8B after searching for the search term (query) can be accessed by clicking on the main search portal 801, so that the user can read the article of interest by clicking on the article in fig. 8B.
For search-class products, the quality of the search results greatly affects the user experience. The more good quality articles in the search results, the better the user experience. As shown in fig. 9, when a user clicks a cheating guide article (a cheating guide article maliciously) carried by a public number and then enters a text, the user is induced to scan a two-dimensional code to jump to another public number, the public number continues to induce the jump, the user reaches a destination to recharge after a plurality of jumps, and finally the user is induced to recharge through menu jumps. The user finds it deceived in cases where it takes much time, which is extremely harmful to the user search experience. The masking of such cheating drainage articles is necessary to enhance the user experience. The embodiment of the application combines the text of the article with the drainage relationship, and can filter the cheating drainage article, thereby avoiding the cheating drainage article from appearing in the search result and improving the search quality.
As shown in FIG. 10, the method for detecting the cheating drainage articles based on the graph neural network provided by the embodiment of the application mainly comprises 4 parts, namely feature extraction, construction of an account article drainage relation graph, graph annotation force network model learning and classification of the cheating drainage articles. The following specifically describes the algorithm flow of the cheating drainage article detection method based on the graph neural network:
1) Acquire all articles published within a period of time, and construct the account article drainage relation graph according to two-dimensional code drainage relations (drainage is not limited to two-dimensional codes; it may also take other forms, such as text in the article, picture links, links to the original text, menu jumps and the like).
As shown in FIG. 11, the circled nodes in the graph represent public numbers (not limited to public numbers; they may also be other external links). When a public number publishes an article that drains to another public number through a two-dimensional code, a directed edge is created. The edges in the account article drainage relation graph represent articles: solid-line edges represent cheating drainage articles, and broken-line edges represent normal drainage articles.
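As an illustration of the graph construction described above, the following is a minimal sketch that stores accounts as nodes and articles as directed edges. The `Article` record, the account names and the sample titles are hypothetical and serve only to show the data structure; they are not taken from the embodiment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """One directed edge: an article published by `src` that drains to `dst`."""
    article_id: str
    src: str    # publishing account (start drainage node)
    dst: str    # account being drained to (termination drainage node)
    title: str


def build_drainage_graph(articles):
    """Return (nodes, in_edges, out_edges) of a directed multigraph."""
    nodes = set()
    in_edges = defaultdict(list)    # dst -> articles pointing to dst
    out_edges = defaultdict(list)   # src -> articles published by src
    for art in articles:
        nodes.update((art.src, art.dst))
        out_edges[art.src].append(art)
        in_edges[art.dst].append(art)
    return nodes, in_edges, out_edges


# Hypothetical sample data.
articles = [
    Article("a1", "acct_A", "acct_B", "Scan the code for a free gift"),
    Article("a2", "acct_B", "acct_C", "One more step to claim it"),
    Article("a3", "acct_D", "acct_C", "Weekly reading list"),
]
nodes, in_edges, out_edges = build_drainage_graph(articles)
print(sorted(nodes))
print([a.article_id for a in in_edges["acct_C"]])
```

Keeping an incoming-edge index per node is a design choice of this sketch: it makes it cheap to enumerate all articles that drain to a given account when node features are later updated.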
2) Feature vector initialization
For each article, the title is first segmented into k words, denoted w1, w2, …, wk. Each word is mapped to a 300-dimensional real vector using a word vector model (word2vec), so that the article title is mapped to a k-by-300 matrix, i.e., the vector representation of the article. This representation is then input into the TextCNN model for calculation, and keyword features are extracted as the initial features of the edge. Article feature extraction is not limited to the title; other text-related features may also be used, such as the body text, the number of pictures, videos and other statistical features. The feature vectors of the account nodes are randomly initialized.
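The title-to-feature step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the embedding table is filled with random numbers instead of a trained word2vec model, the dimension is 4 instead of 300, and the convolution uses ReLU with max-over-time pooling, which is the usual TextCNN design but is not spelled out in the text.

```python
import random

EMBED_DIM = 4  # the text uses 300 dimensions; 4 keeps the toy example small


def embed_title(words, table):
    """Map each word to its vector, giving a k-by-EMBED_DIM matrix."""
    return [table[w] for w in words]


def conv_max_pool(matrix, kernel):
    """Slide one kernel (h rows by EMBED_DIM columns) over the title matrix,
    apply ReLU, and keep the largest response (max-over-time pooling)."""
    h = len(kernel)
    responses = []
    for start in range(len(matrix) - h + 1):
        total = sum(
            kernel[i][j] * matrix[start + i][j]
            for i in range(h)
            for j in range(EMBED_DIM)
        )
        responses.append(max(0.0, total))
    return max(responses)


def textcnn_features(matrix, kernels):
    """One pooled value per kernel, spliced into a single feature vector."""
    return [conv_max_pool(matrix, kernel) for kernel in kernels]


rng = random.Random(0)
words = ["scan", "code", "free", "gift"]  # hypothetical segmented title
table = {w: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in words}
kernels = [
    [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(h)]
    for h in (2, 3)  # window sizes of 2 and 3 words (assumed)
]
title_matrix = embed_title(words, table)
edge_init = textcnn_features(title_matrix, kernels)
print(len(title_matrix), len(title_matrix[0]), len(edge_init))
```

The sketch assumes the title has at least as many words as the largest window; shorter titles would need padding.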
3) Graph attention network model learning
As shown in fig. 11, the node and edge vectors are updated by means of a graph attention network. An edge is updated by combining the issuing public number (the start drainage node $h_{src}$), the public number being guided to (the termination drainage node $h_{dst}$) and the article itself ($h_e$); a node is updated by combining all public numbers that guide to that public number and the corresponding article titles. Here $h'_{src}$ denotes the updated $h_{src}$, $h'_e$ denotes the updated $h_e$, and $h'_{dst}$ denotes the updated $h_{dst}$. As shown in fig. 12, the attention mechanism assigns different weights to different neighbors; for example, if a public number publishes a normal drainage article, the corresponding weight will be lower. In fig. 12, $h_1$ represents $h_{dst}$ (the termination drainage node), $h'_1$ represents the updated $h_{dst}$, $h_2$, $h_3$, $h_4$, $h_5$ and $h_6$ represent the neighbor nodes of $h_{dst}$, and $\alpha$ represents the attention weight.
The update formula of the edge is shown in formula (3):
$$h_e^{k} = \sigma\left(W_e^{k} \cdot \mathrm{concat}\left(h_e^{k-1},\, h_{src}^{k-1},\, h_{dst}^{k-1}\right)\right) \tag{3}$$
Wherein, $W_e^{k}$ represents a learnable parameter, $\sigma(\cdot)$ represents an activation function, concat represents a splicing process, $h_e^{k-1}$ represents the vector of edge e (i.e., article e) at layer k-1, $h_{src}^{k-1}$ represents the vector of node src (the issuing public number) at layer k-1, $h_{dst}^{k-1}$ represents the vector of node dst (the public number being guided to) at layer k-1, and $h_e^{k}$ represents the vector of edge e (i.e., article e) at layer k.
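The edge update described above (splice the edge vector with the vectors of its two end nodes, multiply by a learnable matrix, then apply an activation function) can be sketched as follows. ReLU is assumed for the activation, which the text leaves unspecified, and the weight values are made-up numbers chosen so that the result is easy to check by hand.

```python
def concat(*vectors):
    """Splice several vectors into one."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def update_edge(W_e, h_e, h_src, h_dst):
    """Layer-k edge vector from the layer-(k-1) edge and end-node vectors."""
    return relu(matvec(W_e, concat(h_e, h_src, h_dst)))


h_e, h_src, h_dst = [1.0, 0.0], [0.5], [-1.0]
W_e = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, -1.0, -1.0],
]
new_h_e = update_edge(W_e, h_e, h_src, h_dst)
print(new_h_e)  # [1.0, 0.5]
```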
The update formula of the node is shown in formulas (4) - (6):
Wherein, the weight matrices represent learnable parameters, σ(·) represents an activation function, concat represents a splicing process, and E(dst) represents all edges pointing to dst; that is, for an edge (src′, dst) in E(dst), src′ represents a neighbor node pointing to dst.
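A node update of this kind can be sketched as follows. This is one plausible form only, not the exact formulas (4)-(6): it assumes a GAT-style score (LeakyReLU of a learnable vector `a` applied to the spliced target and neighbor-plus-edge vectors), a softmax over all incoming edges in E(dst), and a final splice of the projected target vector with the weighted neighbor summary. The names `a`, `W_n` and `W_s` are hypothetical.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def update_node(h_dst, incoming, a, W_n, W_s):
    """incoming: list of (h_src_prime, h_e) pairs, one per edge in E(dst)."""
    if not incoming:
        neigh, alphas = [0.0] * len(W_n), []
    else:
        # Splice each neighbor vector with the vector of the connecting edge.
        messages = [h_src + h_e for h_src, h_e in incoming]
        scores = [
            leaky_relu(sum(ai * xi for ai, xi in zip(a, h_dst + m)))
            for m in messages
        ]
        alphas = softmax(scores)  # attention weight per incoming edge
        projected = [matvec(W_n, m) for m in messages]
        neigh = [
            sum(al * p[i] for al, p in zip(alphas, projected))
            for i in range(len(W_n))
        ]
    # Splice the projected target vector with the neighbor summary.
    fused = matvec(W_s, h_dst) + neigh
    return [max(0.0, x) for x in fused], alphas


h_dst = [1.0, 0.0]
incoming = [([1.0], [0.5]), ([1.0], [0.5])]  # two identical neighbors
a = [0.1, 0.2, 0.3, 0.4]
W_n = [[1.0, 0.0], [0.0, 1.0]]
W_s = [[1.0, 0.0]]
new_h_dst, alphas = update_node(h_dst, incoming, a, W_n, W_s)
print(new_h_dst, alphas)
```

Because the two neighbors are identical, they receive equal attention weights. In this sketch the output dimension is the sum of the row counts of `W_s` and `W_n`, so stacked layers would need matching matrix shapes.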
4) Classification of cheating drainage articles
For each article, the updated node vector of the public number and the edge vector of the article are spliced together and input into a classifier formed by a fully connected layer for binary classification, and the corresponding cheating probability score is output.
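A minimal sketch of this classification step, assuming a single fully connected layer followed by a sigmoid, with both end-node vectors spliced around the edge vector. The parameter names `weights` and `bias` and the numeric values are illustrative only.

```python
import math


def cheating_score(h_src, h_e, h_dst, weights, bias):
    """Splice node and edge vectors, apply one linear layer and a sigmoid."""
    x = h_src + h_e + h_dst
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up updated vectors and parameters.
score = cheating_score([0.2], [1.0, 0.5], [0.7], [0.5, 1.5, -0.5, 1.0], -0.3)
neutral = cheating_score([0.2], [1.0, 0.5], [0.7], [0.0, 0.0, 0.0, 0.0], 0.0)
print(round(score, 3), neutral)
```

With all-zero parameters the score is exactly 0.5, i.e., the classifier expresses no preference before training.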
Before application, the model in the algorithm flow needs to be trained. In a supervised learning manner, about 20,000 labeled public number articles are collected, where the label indicates whether the article is a cheating drainage article. In addition, public number articles published within a period of time (800,000 in total) are selected, and a relation graph containing 430,000 public numbers and 490,000 articles is constructed. During training, the model is trained on the training set until convergence, using the Adam optimization algorithm (a stochastic gradient descent-based method) with binary cross entropy loss (binary cross entropy loss) as the optimization objective. During training, the model learns, for each article, a probability score that the article is a malicious drainage article.
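For reference, the binary cross entropy objective named above can be written out as a short function. This is the standard definition rather than anything specific to the embodiment; the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all labeled articles."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


labels = [1, 0]  # 1 = cheating drainage article, 0 = normal article
good = binary_cross_entropy([0.9, 0.1], labels)
bad = binary_cross_entropy([0.1, 0.9], labels)
print(good < bad)
```

The loss is small when the predicted probability agrees with the label and grows quickly for confident mistakes, which is what drives the probability scores toward the labels during training.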
After model training, public number articles can be processed by an online service according to the algorithm flow. The number of GAT layers is first set to 2. Because the model only considers second-order neighbors during prediction, pruning can be performed before prediction; that is, predicting one article does not require computing over the whole drainage relation graph, only over the subgraph of its second-order neighbors. In addition, simply feeding the subgraphs one by one into a graphics processing unit (GPU, Graphics Processing Unit) during prediction is inefficient, so the batch subgraph inference (Batch Subgraph Inference) shown in fig. 13 is adopted: multiple subgraphs are combined into one large graph whose components are mutually independent, and computed in parallel.
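The pruning and batching ideas can be sketched on plain edge lists. The sketch assumes that neighbors are collected by following incoming edges for two hops from both end nodes of the target article, and that batching simply relabels nodes so that subgraphs never share an id; the actual implementation behind fig. 13 is not described in this much detail.

```python
from collections import defaultdict


def two_hop_subgraph(edges, src, dst, hops=2):
    """edges: list of (u, v, article_id). Keep the edges that can influence
    the target article (src -> dst) within `hops` rounds of aggregation."""
    incoming = defaultdict(list)
    for u, v, art in edges:
        incoming[v].append((u, v, art))
    frontier, seen = {src, dst}, {src, dst}
    kept, kept_ids = [], set()
    for _ in range(hops):
        next_frontier = set()
        for node in sorted(frontier):
            for u, v, art in incoming[node]:
                if art not in kept_ids:
                    kept_ids.add(art)
                    kept.append((u, v, art))
                if u not in seen:
                    seen.add(u)
                    next_frontier.add(u)
        frontier = next_frontier
    return kept


def batch_subgraphs(subgraphs):
    """Merge subgraphs into one graph with disjoint integer node ids.
    Returns (merged_edges, owner), where owner[i] is the subgraph of node i."""
    merged, owner = [], []
    for g_idx, edges in enumerate(subgraphs):
        local = {}
        for u, v, art in edges:
            for name in (u, v):
                if name not in local:
                    local[name] = len(owner)
                    owner.append(g_idx)
            merged.append((local[u], local[v], art))
    return merged, owner


edges = [("A", "B", "a1"), ("C", "A", "a2"), ("D", "C", "a3"),
         ("E", "D", "a4"), ("F", "G", "a5")]
sub = two_hop_subgraph(edges, "A", "B")
merged, owner = batch_subgraphs([[("A", "B", "a1")], [("B", "C", "a2")]])
print(sorted(art for _, _, art in sub))
print(merged, owner)
```

In the batch, account "B" appears in both subgraphs but receives two different ids, so no message can cross between subgraphs and the merged graph gives the same result as processing each subgraph separately.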
The specific application process is as follows:
Step 1, input an article for feature extraction, obtain the article title and drainage relation, and look up the second-order neighbors to obtain the subgraph required for calculation;
Step 2, input the subgraph into the graph attention network model, and calculate the cheating probability score of the input article to determine whether it is a cheating drainage article;
Step 3, filter articles identified as cheating drainage articles out of the search results.
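The final filtering step amounts to a threshold on the cheating probability score. In the sketch below, `score_fn` stands in for the whole model pipeline, and the 0.5 threshold and the toy scores are assumptions; the text does not state a threshold.

```python
def filter_search_results(results, score_fn, threshold=0.5):
    """Drop every result whose cheating probability reaches the threshold."""
    return [r for r in results if score_fn(r) < threshold]


# Hypothetical precomputed scores keyed by article id.
toy_scores = {"a1": 0.93, "a2": 0.08, "a3": 0.51}
kept = filter_search_results(["a1", "a2", "a3"], toy_scores.get)
print(kept)  # ['a2']
```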
In summary, the method for detecting cheating drainage articles based on the graph neural network provided by the embodiment of the application uses the strong local feature extraction capability of TextCNN to obtain the core keywords of the title, which reduces the number of parameters, improves performance and reduces the consumption of computing resources; when updating nodes with the graph attention network, all articles that drain to a node are considered and different weights are assigned to different neighbors, so that node representations can be generalized directly; and when the articles are finally classified, the article text and the drainage relation are considered at the same time, which improves precision and recall compared with a method that considers only the article text.
The artificial intelligence-based cheating identification method provided by the embodiment of the application has been described so far in connection with the exemplary application and implementation of the cheating identification system provided by the embodiment of the application. The embodiment of the application also provides a cheating identification device. In practical application, each functional module in the cheating identification device can be cooperatively realized by the hardware resources of an electronic device (such as a server or a server cluster), for example computing resources such as a processor, communication resources (for example, those supporting communication over optical cables, cellular networks and the like) and a memory. The cheating identification device (fig. 4 shows the cheating identification device 555 stored in the memory 550) may be software in the form of a program, a plug-in, etc., for example, a software module designed in a programming language such as C/C++ or Java, application software designed in a programming language such as C/C++ or Java, a dedicated software module in a large software system, an application program interface, a plug-in, a cloud service, etc. Different implementations are exemplified below.
Example one, the cheating identification means is a mobile end application and module
The cheating identification device in the embodiment of the application can be provided as a software module designed in a programming language such as C/C++ or Java, and embedded into various mobile terminal applications based on systems such as Android or iOS (stored as executable instructions in a storage medium of the mobile terminal and executed by a processor of the mobile terminal), so that the related cheating identification task is completed directly with the computing resources of the mobile terminal, and the processing result is transmitted to a remote server periodically or aperiodically through various network communication modes, or is stored locally at the mobile terminal.
Example two, the cheating identification device is a server application and platform
The cheating identification device in the embodiment of the application can be provided as application software designed in a programming language such as C/C++ or Java, or as a dedicated software module in a large software system, running on the server side (stored as executable instructions in a storage medium of the server side and run by a processor of the server side); the server uses its own computing resources to complete the related cheating identification tasks.
The embodiment of the application can also provide a customized, easy-to-interact network (Web) interface or other user interfaces (UI, User Interface) on a distributed, parallel computing platform formed by a plurality of servers, so as to form a cheating identification platform for use by individuals, groups or organizations.
Example three, the cheating identification device is a server-side application program interface (API, Application Program Interface) and plug-in
The cheating identification device in the embodiment of the application can be provided as an API or plug-in on the server side for a user to call so as to execute the cheating identification method based on artificial intelligence in the embodiment of the application and be embedded into various application programs.
Example four, the cheating identification means is a mobile device client API and plug-in
The cheating identification device in the embodiment of the application can be provided as an API or a plug-in of the mobile equipment end for a user to call so as to execute the cheating identification method based on the artificial intelligence in the embodiment of the application.
Example five, the cheating identification device is a cloud open service
The cheating identification device in the embodiment of the application can be provided as a cheating identification cloud service open to users, for use by individuals, groups or organizations.
The cheating identification device 555 comprises a series of modules, including a feature extraction module 5551, a determination module 5552, a construction module 5553, an update module 5554, a fusion module 5555 and a classification module 5556. The following continues to describe how the modules in the cheating identification device 555 provided by the embodiment of the present application cooperate to implement the cheating identification scheme.
The feature extraction module 5551 is configured to perform feature extraction processing on an article to be identified, so as to obtain text features of the article to be identified; a determining module 5552, configured to determine node characteristics of a drainage node of the article to be identified based on the drainage relationship of the article to be identified; a construction module 5553, configured to construct a drainage relationship diagram of the article to be identified based on the article to be identified and the drainage node; an updating module 5554, configured to update text features of the article to be identified and node features of the drainage node based on the drainage relationship diagram of the article to be identified; the fusion module 5555 is configured to perform fusion processing on the updated text feature of the article to be identified and the updated node feature of the drainage node, so as to obtain a fusion feature; and the classification module 5556 is configured to perform a cheating prediction process based on the fusion feature, so as to obtain a probability that the article to be identified belongs to a cheating article.
In some embodiments, the feature extraction module 5551 is further configured to perform feature extraction processing on the title of the article to be identified, so as to obtain the title feature of the article to be identified; performing feature extraction processing on the text of the article to be identified to obtain the text features of the article to be identified; and carrying out fusion processing on the title features of the article to be identified and the text features of the article to be identified to obtain the text features of the article to be identified.
In some embodiments, the feature extraction module 5551 is further configured to perform word segmentation on the title of the article to be identified, so as to obtain a plurality of words of the title; mapping the words of the title to obtain word vectors corresponding to the words respectively; performing splicing processing on word vectors corresponding to the words respectively to obtain a vector matrix of the title; and extracting keywords based on the vector matrix of the title to obtain the title characteristics of the article to be identified.
In some embodiments, the feature extraction module 5551 is further configured to perform convolution processing based on the vector matrix of the title to obtain a plurality of feature maps of the title; perform keyword extraction processing on the feature maps of the title to obtain a plurality of keyword features of the title; and perform splicing processing on the plurality of keyword features to obtain the title features of the article to be identified.
In some embodiments, the drainage nodes of the article to be identified include a start drainage node and an end drainage node; the determining module 5552 is further configured to determine an initial drainage node and an end drainage node of the article to be identified based on the drainage relationship of the article to be identified; and respectively carrying out feature extraction processing on the initial drainage node and the termination drainage node to obtain node features of the initial drainage node and node features of the termination drainage node.
In some embodiments, the drainage nodes of the article to be identified include a start drainage node and an end drainage node; the building module 5553 is further configured to determine a neighboring node of the article to be identified based on the article to be identified, the start drainage node, and the end drainage node; taking the article to be identified as an edge between the initial drainage node and the termination drainage node; and constructing a drainage relation diagram of the article to be identified based on edges between the starting drainage node and the ending drainage node, the starting drainage node, the ending drainage node and neighbor nodes of the article to be identified.
In some embodiments, the drainage nodes of the article to be identified include a start drainage node and an end drainage node; the updating module 5554 is further configured to determine, based on the drainage relationship diagram of the article to be identified, a start drainage node of the article to be identified, an end drainage node of the article to be identified, and a neighboring node of the article to be identified; updating text features of the article to be identified based on the article to be identified, the starting drainage node and the ending drainage node; and updating node characteristics of the initial drainage node and node characteristics of the termination drainage node based on the neighbor nodes, the initial drainage node and the termination drainage node of the article to be identified.
In some embodiments, the update module 5554 is further configured to splice the text feature of the article to be identified, the node feature of the start drainage node, and the node feature of the end drainage node to obtain a spliced feature; mapping processing is carried out based on the splicing characteristics to obtain mapping characteristics; and updating the text characteristics of the article to be identified based on the mapping characteristics.
In some embodiments, the updating module 5554 is further configured to perform product processing on the splicing feature and a learnable matrix; and perform nonlinear mapping processing on the product processing result to obtain the mapping feature.
In some embodiments, the updating module 5554 is further configured to determine a neighbor related feature of the starting drainage node based on a neighbor node of the article to be identified; carrying out fusion processing on the neighbor related characteristics of the initial drainage node and the node characteristics of the initial drainage node to obtain fusion characteristics of the initial drainage node; updating node characteristics of the initial drainage node based on the fusion characteristics of the initial drainage node; determining neighbor related features of the termination drainage node based on neighbor nodes of the article to be identified; carrying out fusion processing on the neighbor related characteristics of the termination drainage node and the node characteristics of the termination drainage node to obtain fusion characteristics of the termination drainage node; and updating the node characteristics of the termination drainage node based on the fusion characteristics of the termination drainage node.
In some embodiments, the updating module 5554 is further configured to determine a neighbor node of the termination drainage node based on a neighbor node of the article to be identified; determining edge characteristics of edges between neighbor nodes of the termination drainage node and the termination drainage node; splicing the node characteristics of the neighbor nodes of the termination drainage node with the edge characteristics to obtain splicing characteristics; and performing attention processing based on the splicing characteristics and the node characteristics of the termination drainage node to obtain neighbor related characteristics of the termination drainage node.
In some embodiments, the updating module 5554 is further configured to multiply the node characteristics of the termination drainage node with a learnable matrix; and splicing the product processing result with the neighbor related characteristics of the termination drainage node to obtain the fusion characteristics of the termination drainage node.
In some embodiments, the fusion module 5555 is further configured to splice the updated text feature of the article to be identified with the updated node feature of the drainage node, so as to obtain the fusion feature; or adding the updated text features of the articles to be identified and the updated node features of the drainage nodes to obtain the fusion features.
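The two fusion options mentioned above, splicing and adding, differ in their shape requirements, as the following minimal sketch shows. The function name `fuse` and its `mode` parameter are hypothetical.

```python
def fuse(text_feat, node_feat, mode="concat"):
    """Fuse an updated text feature with an updated node feature."""
    if mode == "concat":
        # Splicing keeps both vectors intact; output length is the sum.
        return list(text_feat) + list(node_feat)
    if mode == "add":
        # Element-wise addition requires equal dimensions.
        if len(text_feat) != len(node_feat):
            raise ValueError("element-wise addition needs equal dimensions")
        return [t + n for t, n in zip(text_feat, node_feat)]
    raise ValueError(f"unknown mode: {mode}")


spliced = fuse([1.0, 2.0], [3.0, 4.0])
added = fuse([1.0, 2.0], [3.0, 4.0], mode="add")
print(spliced, added)
```

Splicing works for features of any dimensions at the cost of a wider classifier input, while addition keeps the dimension fixed but only applies when both features have the same length.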
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the artificial intelligence-based cheating identification method according to the embodiment of the present application.
Embodiments of the present application provide a computer readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform the artificial intelligence based cheating identification method provided by embodiments of the present application, for example, as shown in fig. 5A-5C.
In some embodiments, the computer readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be any of various devices including one or any combination of the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, executable instructions may, but need not, correspond to files in a file system. They may be stored as part of a file that holds other programs or data, for example in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (12)

1. An artificial intelligence based cheating identification method, the method comprising:
performing feature extraction processing on an article to be identified to obtain text features of the article to be identified;
Determining node characteristics of drainage nodes of the articles to be identified based on the drainage relations of the articles to be identified;
constructing a drainage relation diagram of the article to be identified based on the article to be identified and the drainage node; the drainage nodes of the articles to be identified comprise an initial drainage node and a termination drainage node;
Determining an initial drainage node of the article to be identified, a termination drainage node of the article to be identified and a neighbor node of the article to be identified based on the drainage relation diagram of the article to be identified;
Splicing the text features of the articles to be identified, the node features of the starting drainage nodes and the node features of the ending drainage nodes to obtain splicing features;
Mapping processing is carried out based on the splicing characteristics to obtain mapping characteristics; updating the text features of the article to be identified based on the mapping features;
Determining neighbor related features of the initial drainage node based on neighbor nodes of the article to be identified;
carrying out fusion processing on the neighbor related characteristics of the initial drainage node and the node characteristics of the initial drainage node to obtain fusion characteristics of the initial drainage node;
updating node characteristics of the initial drainage node based on the fusion characteristics of the initial drainage node;
determining neighbor related features of the termination drainage node based on neighbor nodes of the article to be identified;
Carrying out fusion processing on the neighbor related characteristics of the termination drainage node and the node characteristics of the termination drainage node to obtain fusion characteristics of the termination drainage node;
Updating node characteristics of the termination drainage node based on the fusion characteristics of the termination drainage node;
Carrying out fusion processing on the updated text characteristics of the article to be identified, the updated node characteristics of the initial drainage node and the updated node characteristics of the termination drainage node to obtain fusion characteristics;
and performing cheating prediction processing based on the fusion characteristics to obtain the probability that the article to be identified belongs to the cheating article.
2. The method of claim 1, wherein the feature extraction of the article to be identified to obtain text features of the article to be identified comprises:
performing feature extraction processing on the titles of the articles to be identified to obtain the title features of the articles to be identified;
Performing feature extraction processing on the text of the article to be identified to obtain the text features of the article to be identified;
And carrying out fusion processing on the title features of the article to be identified and the text features of the article to be identified to obtain the text features of the article to be identified.
3. The method of claim 2, wherein the performing feature extraction on the title of the article to be identified to obtain the title feature of the article to be identified comprises:
Word segmentation processing is carried out on the titles of the articles to be identified, so that a plurality of words of the titles are obtained;
mapping the words of the title to obtain word vectors corresponding to the words respectively;
Performing splicing processing on word vectors corresponding to the words respectively to obtain a vector matrix of the title;
and extracting keywords based on the vector matrix of the title to obtain the title characteristics of the article to be identified.
4. The method of claim 3, wherein the keyword extraction processing is performed based on the vector matrix of the title to obtain the title features of the article to be identified, including:
performing convolution processing based on the vector matrix of the title to obtain a plurality of feature maps of the title;
keyword extraction processing is carried out on the feature maps of the title to obtain a plurality of keyword features of the title;
And performing splicing processing on the plurality of keyword features to obtain the title features of the articles to be identified.
5. The method of claim 1, wherein the determining node characteristics of the drainage nodes of the articles to be identified based on the drainage relationships of the articles to be identified comprises:
Determining an initial drainage node and a termination drainage node of the article to be identified based on the drainage relation of the article to be identified;
and respectively carrying out feature extraction processing on the initial drainage node and the termination drainage node to obtain node features of the initial drainage node and node features of the termination drainage node.
6. The method of claim 1, wherein the constructing a drainage relationship graph of the article to be identified based on the article to be identified and the drainage node comprises:
determining neighbor nodes of the article to be identified based on the article to be identified, the starting drainage node and the ending drainage node;
taking the article to be identified as an edge between the initial drainage node and the termination drainage node;
And constructing a drainage relation diagram of the article to be identified based on edges between the starting drainage node and the ending drainage node, the starting drainage node, the ending drainage node and neighbor nodes of the article to be identified.
7. The method of claim 1, wherein the mapping based on the stitching feature to obtain a mapping feature comprises:
performing product processing on the spliced features and the learnable matrix;
and carrying out nonlinear mapping processing on the product processing result to obtain the mapping characteristic.
8. The method of claim 1, wherein the determining the neighbor related features of the termination drainage node based on the neighbor nodes of the article to be identified comprises:
Determining neighbor nodes of the termination drainage node based on the neighbor nodes of the article to be identified;
determining edge characteristics of edges between neighbor nodes of the termination drainage node and the termination drainage node;
splicing the node characteristics of the neighbor nodes of the termination drainage node with the edge characteristics to obtain splicing characteristics;
And performing attention processing based on the splicing characteristics and the node characteristics of the termination drainage node to obtain neighbor related characteristics of the termination drainage node.
9. The method according to claim 1, wherein the fusing the neighbor related features of the termination drainage node and the node features of the termination drainage node to obtain the fusion features of the termination drainage node includes:
performing product processing on node characteristics of the termination drainage nodes and a learnable matrix;
and splicing the product processing result with the neighbor related characteristics of the termination drainage node to obtain the fusion characteristics of the termination drainage node.
10. An artificial intelligence based cheating identification device, the device comprising:
a feature extraction module, configured to perform feature extraction processing on an article to be identified to obtain text features of the article to be identified;
a determining module, configured to determine node features of drainage nodes of the article to be identified based on the drainage relation of the article to be identified;
a construction module, configured to construct a drainage relation diagram of the article to be identified based on the article to be identified and the drainage nodes, wherein the drainage nodes of the article to be identified comprise an initial drainage node and a termination drainage node;
an updating module, configured to determine the initial drainage node of the article to be identified, the termination drainage node of the article to be identified and the neighbor nodes of the article to be identified based on the drainage relation diagram of the article to be identified;
splice the text features of the article to be identified, the node features of the initial drainage node and the node features of the termination drainage node to obtain splicing features;
perform mapping processing based on the splicing features to obtain mapping features, and update the text features of the article to be identified based on the mapping features;
determine neighbor related features of the initial drainage node based on the neighbor nodes of the article to be identified;
perform fusion processing on the neighbor related features of the initial drainage node and the node features of the initial drainage node to obtain fusion features of the initial drainage node;
update the node features of the initial drainage node based on the fusion features of the initial drainage node;
determine neighbor related features of the termination drainage node based on the neighbor nodes of the article to be identified;
perform fusion processing on the neighbor related features of the termination drainage node and the node features of the termination drainage node to obtain fusion features of the termination drainage node; and
update the node features of the termination drainage node based on the fusion features of the termination drainage node;
a fusion module, configured to perform fusion processing on the updated text features of the article to be identified, the updated node features of the initial drainage node and the updated node features of the termination drainage node to obtain fusion features; and
a classification module, configured to perform cheating prediction processing based on the fusion features to obtain the probability that the article to be identified is a cheating article.
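The text feature update and the final prediction performed by the updating, fusion and classification modules can be sketched as below. Several details are assumptions rather than statements of the claimed device: the mapping is taken to be a linear layer followed by tanh, the final fusion is taken to be concatenation, and the classifier is taken to be a single logistic unit. All function and parameter names are hypothetical.

```python
import math

def linear(matrix, vector, bias):
    """Affine map: matrix (list of rows) times vector, plus bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(matrix, bias)]

def update_text_features(text_feat, start_feat, end_feat, w_map, b_map):
    """Splice the three feature vectors, then map them to new text features."""
    spliced = text_feat + start_feat + end_feat
    # Assumed mapping: linear layer followed by tanh.
    return [math.tanh(v) for v in linear(w_map, spliced, b_map)]

def cheating_probability(text_feat, start_feat, end_feat, w_cls, b_cls):
    """Fuse the updated features and predict the probability of cheating."""
    # Assumed fusion: plain concatenation of the updated features.
    fused = text_feat + start_feat + end_feat
    # Assumed classifier: one logistic unit over the fused vector.
    logit = sum(w * x for w, x in zip(w_cls, fused)) + b_cls
    return 1.0 / (1.0 + math.exp(-logit))
```

A caller would first update the text and node features, possibly over several rounds, and then pass the updated vectors to `cheating_probability`, comparing the result against a chosen threshold.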
11. An electronic device, the electronic device comprising:
a memory for storing executable instructions; and
a processor for implementing the artificial intelligence based cheating identification method of any one of claims 1 to 9 when executing the executable instructions stored in the memory.
12. A computer readable storage medium storing executable instructions for implementing the artificial intelligence based cheating identification method of any of claims 1 to 9 when executed by a processor.
CN202110176521.2A 2021-02-09 2021-02-09 Cheating identification method, device, equipment and storage medium based on artificial intelligence Active CN113011167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110176521.2A CN113011167B (en) 2021-02-09 2021-02-09 Cheating identification method, device, equipment and storage medium based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110176521.2A CN113011167B (en) 2021-02-09 2021-02-09 Cheating identification method, device, equipment and storage medium based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN113011167A (en) 2021-06-22
CN113011167B (en) 2024-04-23

Family

ID=76384576

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202110176521.2A Active CN113011167B (en) 2021-02-09 2021-02-09 Cheating identification method, device, equipment and storage medium based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN113011167B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536087B (en) * 2021-06-30 2022-05-17 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for identifying cheating sites
CN115269851B (en) * 2022-08-04 2024-04-16 腾讯科技(深圳)有限公司 Article classification method, apparatus, electronic device, storage medium and program product
CN117172245A (en) * 2023-05-26 2023-12-05 国家计算机网络与信息安全管理中心 Control method and control system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127046A (en) * 2007-09-25 2008-02-20 腾讯科技(深圳)有限公司 Method and system for sequencing to blog article
CN107491432A (en) * 2017-06-20 2017-12-19 北京百度网讯科技有限公司 Low quality article recognition methods and device, equipment and medium based on artificial intelligence
CN110688540A (en) * 2019-10-08 2020-01-14 腾讯科技(深圳)有限公司 Cheating account screening method, device, equipment and medium
CN112035671A (en) * 2020-11-05 2020-12-04 腾讯科技(深圳)有限公司 State detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113011167A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN113011167B (en) Cheating identification method, device, equipment and storage medium based on artificial intelligence
CN112165462A (en) Attack prediction method and device based on portrait, electronic equipment and storage medium
KR20200002332A (en) Terminal apparatus and method for searching image using deep learning
CN111538929B (en) Network link identification method and device, storage medium and electronic equipment
WO2010040125A1 (en) Systems and methods for automatic creation of agent-based systems
CN115511501A (en) Data processing method, computer equipment and readable storage medium
CN111563192A (en) Entity alignment method and device, electronic equipment and storage medium
CN115080836A (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN115757991A (en) Webpage identification method and device, electronic equipment and storage medium
CN112015896A (en) Emotion classification method and device based on artificial intelligence
CN114580794B (en) Data processing method, apparatus, program product, computer device and medium
CN114756768B (en) Data processing method, device, equipment, readable storage medium and program product
CN116226850A (en) Method, device, equipment, medium and program product for detecting virus of application program
CN113935738B (en) Transaction data processing method, device, storage medium and equipment
CN113010772A (en) Data processing method, related equipment and computer readable storage medium
Rungta et al. Two-phase multimodal neural network for app categorization using APK resources
CN114139165B (en) Intelligent contract vulnerability detection method based on multi-target recognition in machine learning
CN114820085B (en) User screening method, related device and storage medium
CN117435995B (en) Biological medicine classification method based on residual map network
CN117009954A (en) Data processing method and device
CN113722636A (en) Data processing method and device, computer equipment and storage medium
CN117390577A (en) Feature acquisition method, device, apparatus, storage medium, and program product
Sachdeva et al. Identifying Patterns of Vulnerability Incidence in Foundational Machine Learning Repositories on GitHub: An Unsupervised Graph Embedding Approach
CN117171562A (en) Training method and device of intent prediction model, electronic equipment and storage medium
CN114445757A (en) Nomination obtaining method, network training method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant