CN108830630A - Method and equipment for identifying false messages - Google Patents
- Publication number
- CN108830630A (application CN201810309691.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- propagation
- user
- matrix
- carrier
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0207—Discounts or incentives, e.g. coupons or rebates
- G06Q30/0225—Avoiding frauds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0248—Avoiding fraud
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0609—Buyer or seller confidence or verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Marketing (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Tourism & Hospitality (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Primary Health Care (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention is applicable to the technical field of information processing and provides a method and equipment for identifying false messages. The method includes: acquiring a plurality of carrier texts containing a target message and the propagation path of each carrier text; obtaining a text matrix of each carrier text based on the carrier text and the identifiers of the propagating users; importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message; generating a user propagation matrix for the target message according to the propagation paths of all carrier texts; importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message; calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector; and, if the authenticity index is within a preset false index range, identifying the target message as a false message. The invention requires no manual investigation or evidence collection, which reduces labor cost and investigation time and improves the accuracy of false message identification.
Description
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a method and equipment for identifying false messages.
Background
A false message, or "rumor," is a message fabricated without any factual basis. False messages can mislead public opinion and lead people to make wrong choices. In financial investment in particular, false messages may lead investors to make wrong investment decisions or even cause panic among them, disrupt investment in the economic market, and increase users' risk of property loss. Therefore, it is important to accurately identify whether a target message is false.
Existing techniques for identifying false messages require an investigation of the target message to determine whether it is false. However, such an investigation requires substantial manpower to follow up on leads, and when the target message involves multiple locations that differ from the investigator's location, it costs considerable time and labor, so identification efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for identifying false messages, to solve the problems that existing false message identification methods consume a large amount of time and labor and have low identification efficiency.
A first aspect of an embodiment of the present invention provides a method for identifying a false message, including:
acquiring a plurality of carrier texts containing a target message and the propagation path of each carrier text, where the propagation path comprises an identifier of each propagating user who propagated the carrier text;
obtaining a text matrix of each carrier text based on the carrier text and the identifiers of the propagating users;
importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message;
generating a user propagation matrix for the target message according to the propagation paths of all the carrier texts, where each element of the user propagation matrix is the number of carrier texts propagated by the corresponding propagating user;
importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message;
calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector; and
if the authenticity index is within a preset false index range, identifying the target message as a false message.
A second aspect of embodiments of the present invention provides an apparatus for identifying false messages, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the first aspect when executing the computer program.
A third aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the first aspect.
The method and device for identifying false messages provided by the embodiments of the present invention have the following beneficial effects:
The embodiments acquire all carrier texts containing a target message and the propagation path of each carrier text, obtain a text matrix of each carrier text from the carrier text and the identifiers of the propagating users contained in its propagation path, and obtain the text feature vector of the target message from the text matrices. Meanwhile, a user propagation matrix is obtained from the propagation paths of the carrier texts, and the user propagation feature vector of the target message is calculated from it. Finally, the authenticity index of the target message is calculated from the user propagation feature vector and the text feature vector, and is used to identify whether the target message is false. Compared with existing false message identification techniques, the embodiments need no manual investigation or evidence collection, which reduces labor cost and investigation time. Instead, they collect the text features of the carrier texts that transmit the target message and analyze the features of each user who propagates it: the text feature vector shows whether the target message is inflammatory, and the user propagation feature vector shows whether it spreads explosively. The authenticity index obtained from these two feature vectors is then used to identify whether the target message is false, improving the accuracy of false message identification.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of the false message identification method according to the first embodiment of the present invention;
Fig. 2 is a flowchart of a specific implementation of S103 of the false message identification method according to the second embodiment of the present invention;
Fig. 3 is a flowchart of a specific implementation of S105 of the false message identification method according to the third embodiment of the present invention;
Fig. 4a is a flowchart of a specific implementation of S106 of the false message identification method according to the fourth embodiment of the present invention;
Fig. 4b is a block diagram of an authenticity index calculation model according to an embodiment of the present invention;
Fig. 5 is a flowchart of a specific implementation of S102 of the false message identification method according to a fourth embodiment of the present invention;
Fig. 6 is a structural block diagram of a false message identification device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a false message identification device according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiments of the present invention acquire all carrier texts containing a target message and the propagation path of each carrier text, obtain a text matrix of each carrier text from the carrier text and the identifiers of the propagating users contained in its propagation path, and obtain the text feature vector of the target message from the text matrices. Meanwhile, a user propagation matrix is obtained from the propagation paths of the carrier texts, and the user propagation feature vector of the target message is calculated from it. Finally, the authenticity index of the target message is calculated from the user propagation feature vector and the text feature vector and used to identify whether the target message is false. This solves the problems that existing false message identification methods consume a large amount of time and labor and have low identification efficiency.
In this embodiment of the invention, the flow is executed by the false message identification device, which includes but is not limited to notebook computers, servers, tablet computers, and smartphones. In particular, the false message identification device may be a server of a network platform, so that it can obtain propagation parameters of each text propagated on the platform, such as the forwarding count, propagation speed, and propagation path. Fig. 1 shows a flowchart of an implementation of the false message identification method according to the first embodiment of the present invention, which is detailed as follows:
In S101, a plurality of carrier texts containing the target message and the propagation path of each carrier text are acquired; the propagation path includes the identifier of each propagating user who propagated the carrier text.
In this embodiment, the target message may be specified by the user: when the user needs to determine the authenticity of a certain message, the user may input the content of the target message into the false message identification device provided in this embodiment, or send a message carrier containing the message, such as an article or a link, to the identification device, which then determines the target message from the carrier. Optionally, the identification device may also set a detection period and periodically check the authenticity of messages propagated on the network platform where it is located. In this case, the identification device collects the carrier texts on the network platform within each preset detection period, extracts target messages from the carrier texts propagated on the platform based on a preset target message extraction condition, and performs the operations of S101.
Optionally, the preset target message extraction condition may be as follows: text keywords are extracted from each carrier text based on a semantic recognition algorithm, and the number of occurrences of each keyword across the carrier texts is counted; if the number of occurrences of a keyword exceeds a preset frequency threshold, the message corresponding to that keyword is determined to be a target message.
In this embodiment, a message may be propagated through various carriers, for example text forms such as articles, comments, and chat records; a text carrying the target message is the carrier text described above. After determining the target message, the identification device may check whether each text contains the target message and, if a text on the network platform contains it, identify that text as a carrier text. Preferably, because false messages are time-limited, that is, the burst period of false message propagation falls within a short window of about one week to a little over ten days rather than persisting for a long time (for example, a false message is unlikely to have started spreading a year or more earlier without being discovered), a valid time range is set to reduce the number of carrier texts the identification device must process: only texts that contain the target message and whose creation time falls within the valid time range are acquired as carrier texts, and texts created outside that range are ignored. This improves processing efficiency and effectively screens out a large number of irrelevant texts.
In this embodiment, the identification device may obtain the propagation path of each carrier text, that is, the path along which the carrier text flows between propagating users on the network platform; the propagation path therefore includes the identifiers of the propagating users who propagated the carrier text. The identifier of a propagating user may be a user name, a user account, or user information of the propagating user. Preferably, this embodiment uses user information. Because the same real-world person can register several user accounts on a network platform and hold several user names, different user names or accounts may correspond to the same person. Using user information such as an identity card number, which is unique, avoids this problem: the same user information always corresponds to the same person, which improves the efficiency of false message identification.
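The keyword-frequency extraction condition above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: plain whitespace tokenization stands in for the semantic recognition algorithm, occurrences are counted across all carrier texts, and the function name `extract_target_keywords` and parameter `freq_threshold` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def extract_target_keywords(carrier_texts: Iterable[str], freq_threshold: int) -> List[str]:
    """Return keywords whose total count across all texts exceeds freq_threshold.

    Assumption: whitespace tokenization replaces the semantic recognition
    algorithm described in the text.
    """
    counts: Counter = Counter()
    for text in carrier_texts:
        # Lower-case so "Bank" and "bank" count as the same keyword.
        counts.update(word.lower() for word in text.split())
    # Keep only keywords strictly above the preset frequency threshold.
    return sorted(word for word, n in counts.items() if n > freq_threshold)


texts = ["bank collapse rumor", "Bank closes tomorrow", "bank collapse confirmed"]
print(extract_target_keywords(texts, freq_threshold=2))  # ['bank']
```

A real deployment would also need a stopword list, so that common words are not mistaken for target-message keywords.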
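A minimal sketch of the valid-time-range filter, assuming that "contains the target message" can be approximated by substring matching and that the valid time range is a hypothetical 14-day window ending at the detection time (the text only says the burst period lasts roughly one week to a little over ten days). `Text`, `select_carrier_texts`, and `valid_window` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Text:
    content: str
    created_at: datetime


def select_carrier_texts(texts: List[Text], target_message: str, now: datetime,
                         valid_window: timedelta = timedelta(days=14)) -> List[Text]:
    """Keep texts that contain the target message and were created inside the window."""
    start = now - valid_window
    return [
        t for t in texts
        # Both conditions must hold: recent enough AND carrying the target message.
        if start <= t.created_at <= now and target_message in t.content
    ]


now = datetime(2018, 4, 10)
texts = [
    Text("rumor: bank X will collapse", datetime(2018, 4, 5)),   # kept
    Text("rumor: bank X will collapse", datetime(2017, 4, 5)),   # too old
    Text("weather report", datetime(2018, 4, 8)),                # no target message
]
print(len(select_carrier_texts(texts, "bank X will collapse", now)))  # 1
```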
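The preference for user information over account names can be illustrated by resolving each account on a propagation path to an entity-level identity. This is a hypothetical sketch: the lookup table `account_to_identity`, the fallback to the raw account id, and the rule that an entity appearing twice on one path (through two accounts) is kept once at its first position are all assumptions, not details from the text.

```python
from typing import Dict, List


def resolve_path(path_accounts: List[str], account_to_identity: Dict[str, str]) -> List[str]:
    """Map accounts on one propagation path to entity-level identities.

    Assumption: accounts missing from the lookup are treated as their own identity,
    and an entity appearing twice on the same path (via two accounts) is kept once,
    at its first position.
    """
    seen = set()
    resolved = []
    for account in path_accounts:
        identity = account_to_identity.get(account, account)
        if identity not in seen:
            seen.add(identity)
            resolved.append(identity)
    return resolved


identities = {"acct_a1": "ID-1", "acct_a2": "ID-1", "acct_b1": "ID-2"}
print(resolve_path(["acct_a1", "acct_a2", "acct_b1"], identities))  # ['ID-1', 'ID-2']
```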
In S102, a text matrix of each carrier text is obtained based on the carrier text and the identifier of the propagation user.
In this embodiment, the same propagating user may propagate several carrier texts related to the target message, and the same carrier text may be propagated by several different propagating users. Therefore, to accurately determine how the target message propagates, the identification device may determine, from the propagation path of each carrier text, the user identifiers of all propagating users of that carrier text, and construct a text matrix for each carrier text based on those user identifiers.
Preferably, in addition to the identifiers of the users who propagated the carrier text, the text matrix may also include information about the text content of the carrier text. In this case, the identification device performs keyword extraction on the carrier text to determine the keywords it contains. Note that the extracted keywords are those associated with the target message: after the target message is determined, the identification device determines candidate keywords associated with it, identifies which candidate keywords appear in the carrier text, determines content characteristic parameters of the carrier text from the identified candidate keywords, and then constructs the text matrix of the carrier text from the content characteristic parameters and the identifiers of the propagating users.
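The text does not fix the layout of the text matrix, so the sketch below assumes one possible layout purely for illustration: one row per propagating user of the carrier text, where each row is a one-hot encoding of that user within a global user index followed by binary flags for the candidate keywords found in the text (the "content characteristic parameters"). `build_text_matrix`, `user_index`, and `candidate_keywords` are hypothetical names.

```python
from typing import Dict, List


def build_text_matrix(text: str, propagators: List[str], user_index: Dict[str, int],
                      candidate_keywords: List[str]) -> List[List[int]]:
    """One row per propagating user: [one-hot user id | candidate-keyword flags]."""
    # Content characteristic parameters: 1 if the candidate keyword occurs in the text.
    keyword_flags = [1 if kw in text else 0 for kw in candidate_keywords]
    matrix = []
    for user in propagators:
        one_hot = [0] * len(user_index)
        one_hot[user_index[user]] = 1
        matrix.append(one_hot + keyword_flags)
    return matrix


index = {"u1": 0, "u2": 1, "u3": 2}
print(build_text_matrix("bank will collapse", ["u1", "u3"], index, ["collapse", "refund"]))
# [[1, 0, 0, 1, 0], [0, 0, 1, 1, 0]]
```

Concatenating identity and content in each row keeps both kinds of information in one matrix that a downstream model can consume row by row.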
In S103, importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message.
In this embodiment, an important characteristic of a false message is that it spreads explosively and widely, so the carrier texts containing the false message share these two characteristics. The text matrix generated for each carrier text from the identifiers of its propagating users represents the characteristics of that carrier text from the perspective of user propagation, and can be used to judge whether the carrier text spreads explosively and widely; if it does, the carrier text is likely to carry a false message. Because one carrier text may contain several different messages, the text matrices of the carrier texts need to be analyzed to determine whether the explosive spreading is caused by the target message. Therefore, after generating the text matrix of each carrier text, the identification device imports each text matrix into the preset feature vector calculation model and determines the text feature vector of the target message, which serves as one of the reference parameters for identifying the authenticity of the target message.
It should be noted that S102 and S103 calculate the text feature vector of the target message, while S104 and S105 calculate its user propagation feature vector, so there is no required order between these two groups of steps: the identification device may execute S102 and S103 first and then S104 and S105, or execute S104 and S105 first and then S102 and S103. Preferably, if the identification device supports concurrent dual-thread computation, the two groups, starting from S102 and S104 respectively, can be executed simultaneously.
In S104, a user propagation matrix for the target message is generated according to the propagation paths of all the carrier texts; each element of the user propagation matrix is the number of carrier texts propagated by the corresponding propagating user.
In this embodiment, as described above, one propagating user may propagate several carrier texts containing the target message. To determine how many carrier texts each propagating user propagated, the number of carrier texts propagated by each user is counted from the propagation paths of all carrier texts, yielding the user propagation matrix for the target message. For a false message, its originator (the rumormonger) deliberately and repeatedly spreads carrier texts related to the false message, so the carrier texts spread by the rumormonger account for a large proportion of the total propagation volume; ordinary users who are not rumormongers propagate only a limited number of carrier texts, which is a scattered propagation behavior. The user propagation matrix therefore reveals whether the target message is being maliciously spread by a rumormonger, which helps judge whether it is a false message.
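Because the two branches are independent, they can be submitted to a small thread pool, as in the sketch below. The branch callables are placeholders for the text-feature and user-feature computations, and `run_both_branches` is a hypothetical name. In CPython, threads only overlap well when the branches release the GIL (for example, I/O or native numeric libraries); for CPU-bound pure-Python work, `concurrent.futures.ProcessPoolExecutor` is the usual alternative, though it requires picklable callables, so lambdas would not work there.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def run_both_branches(text_branch: Callable[[], A], user_branch: Callable[[], B]) -> Tuple[A, B]:
    """Run the text branch (S102-S103) and the user branch (S104-S105) concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(text_branch)
        user_future = pool.submit(user_branch)
        # result() blocks until each branch finishes and re-raises any exception.
        return text_future.result(), user_future.result()


text_vec, user_vec = run_both_branches(lambda: [0.2, 0.7], lambda: [0.9])
print(text_vec, user_vec)  # [0.2, 0.7] [0.9]
```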
Alternatively, the identification device may create a propagating-user mesh and draw the propagation path of each carrier text on it; whenever a propagation path passes through a propagating user in the mesh, that user's count of propagated texts is incremented by 1. After all propagation paths have been drawn, the number of carrier texts propagated by each user is known, and the user propagation matrix can be generated.
Preferably, in this embodiment, the order of the propagating users in the user propagation matrix is consistent with their order on the propagation path. That is, if a propagating user is the author of the carrier text, i.e. the first propagator, that user's order in the matrix is 1, and so on. If several users share the same propagation order, they may be further sorted by the number of carrier texts they propagated; alternatively, the array formed by the numbers of carrier texts propagated by the users of that order may be used as the element for that order in the user propagation matrix.
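Counting carrier texts per propagating user, whether phrased as a direct count over the propagation paths or as incrementing users on a mesh, reduces to the sketch below. The input format (a mapping from a carrier-text id to the list of user ids on its path) and the rule that a user is counted at most once per carrier text are assumptions; `user_propagation_counts` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def user_propagation_counts(paths: Dict[str, List[str]]) -> Counter:
    """paths maps a carrier-text id to the user ids on its propagation path."""
    counts: Counter = Counter()
    for users in paths.values():
        # Walking the path is the "draw it on the mesh" step: each user it passes
        # through gets +1 for this carrier text (counted once even if repeated).
        for user in set(users):
            counts[user] += 1
    return counts


paths = {"t1": ["u1", "u2"], "t2": ["u1", "u3"], "t3": ["u1"]}
print(dict(sorted(user_propagation_counts(paths).items())))  # {'u1': 3, 'u2': 1, 'u3': 1}
```

A user like `u1` who appears on every path is exactly the concentrated pattern the text associates with a rumormonger.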
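The ordering rule can be sketched as follows, implementing the first tie-breaking option (sort users of the same propagation order by how many carrier texts they propagated). Since a user may sit at different positions on different paths, this sketch assumes each user takes the earliest position seen on any path, and it breaks any remaining ties by user id so the output is deterministic; both choices, and the function name, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ordered_user_propagation_matrix(paths: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Return (user, propagation order, carrier-text count) rows in matrix order."""
    counts: Counter = Counter()
    first_order: Dict[str, int] = {}
    for users in paths.values():
        # Position 1 is the author (first propagator) of the carrier text.
        for position, user in enumerate(users, start=1):
            first_order[user] = min(position, first_order.get(user, position))
        counts.update(set(users))  # one count per carrier text per user
    # Sort by propagation order, then by descending count, then by id.
    ordered = sorted(first_order, key=lambda u: (first_order[u], -counts[u], u))
    return [(u, first_order[u], counts[u]) for u in ordered]


paths = {"t1": ["u1", "u2"], "t2": ["u1", "u3"], "t3": ["u4", "u3"]}
print(ordered_user_propagation_matrix(paths))
# [('u1', 1, 2), ('u4', 1, 1), ('u3', 2, 2), ('u2', 2, 1)]
```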
In S105, the user propagation matrix is imported to a preset user feature calculation model, so as to obtain a user propagation feature vector corresponding to the target message.
In this embodiment, the user propagation matrix captures how the target message propagates among users. To extract the user propagation characteristics, the identification device imports the user propagation matrix of the target message into the user feature calculation model and determines the user propagation feature vector of the target message. This vector indicates whether the propagation of the target message among users matches the propagation characteristics of a false message, and serves as one of the reference parameters for subsequently calculating the authenticity index.
In S106, the authenticity index of the target message is calculated according to the user propagation feature vector and the text feature vector.
In this embodiment, after determining the user propagation feature vector and the text feature vector of the target message, the identification device may calculate the authenticity index of the target message as follows: the user propagation feature vector and the text feature vector are imported into a preset authenticity index calculation model, which transforms them into the authenticity index of the target message. Preferably, the authenticity index calculation model is a neural network. An administrator generates user propagation feature vectors and text feature vectors from training messages, feeds them into the neural network that calculates the authenticity index, adjusts the network's parameters to minimize the value of its loss function, and then uses the adjusted network as the authenticity index calculation model. The loss function of the neural network is expressed as follows:
where L_j is the actual authenticity index of the j-th training message; the loss includes a preset regularization term; the predicted authenticity index is the value obtained by importing the user propagation feature vector and the text feature vector of the training message into the authenticity index calculation model; and N is the total number of training messages.
Optionally, instead of using the authenticity index calculation model, the identification device may compare each parameter value in the text feature vector and the user propagation feature vector with a preset false parameter range, count how many parameter values fall within the false parameter range, and use that count as the authenticity index of the target message; the authenticity index then represents the similarity between the target message and a false message.
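The loss expression is not reproduced in this text, so the sketch below only illustrates the training objective under explicit assumptions: a single sigmoid unit over the concatenated feature vectors stands in for the neural network, the error term is assumed to be the mean squared difference between L_j and the predicted index over the N training messages, and the "preset regularization term" is assumed to be an L2 penalty `reg_lambda * sum(w**2)`. `predict_index`, `training_loss`, `weights`, `bias`, and `reg_lambda` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float], float]  # (user_vec, text_vec, actual L_j)


def predict_index(user_vec: Sequence[float], text_vec: Sequence[float],
                  weights: Sequence[float], bias: float) -> float:
    """Stand-in model: one sigmoid unit over the concatenated feature vectors."""
    features = list(user_vec) + list(text_vec)
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def training_loss(samples: List[Sample], weights: Sequence[float], bias: float,
                  reg_lambda: float = 0.01) -> float:
    """Assumed loss: (1/N) * sum_j (L_j - predicted_j)^2 + reg_lambda * ||weights||^2."""
    n = len(samples)
    squared_error = sum(
        (actual - predict_index(u, t, weights, bias)) ** 2 for u, t, actual in samples
    )
    return squared_error / n + reg_lambda * sum(w * w for w in weights)


samples = [([1.0], [0.0], 1.0), ([0.0], [1.0], 0.0)]
print(training_loss(samples, weights=[0.0, 0.0], bias=0.0))  # 0.25
```

Training would then adjust `weights` and `bias` (for example, by gradient descent) to minimize `training_loss`, matching the "adjust parameters to minimize the loss" step described above.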
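The counting alternative can be sketched in a few lines. The text does not say how the false parameter range is organized, so this sketch assumes one inclusive (low, high) range per parameter value, in the order text vector then user propagation vector; `count_based_index` and `false_ranges` are hypothetical names.

```python
from typing import Sequence, Tuple


def count_based_index(text_vec: Sequence[float], user_vec: Sequence[float],
                      false_ranges: Sequence[Tuple[float, float]]) -> int:
    """Count parameter values that fall inside their (low, high) false range, inclusive."""
    values = list(text_vec) + list(user_vec)
    if len(values) != len(false_ranges):
        raise ValueError("expected one false range per parameter value")
    return sum(1 for v, (low, high) in zip(values, false_ranges) if low <= v <= high)


print(count_based_index([0.9, 0.1], [0.8], [(0.7, 1.0), (0.7, 1.0), (0.5, 1.0)]))  # 2
```

A higher count means more parameters resemble those of a false message, which is how the count serves as a similarity measure.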
In S107, if the authenticity index is within a preset false index range, the target message is identified as a false message.
In this embodiment, the identification device has only a false index range, and if the authenticity index calculated from a certain target message is in the false index range, it indicates that the target message conforms to the characteristics of the false message in both text characteristics and propagation user characteristics, so that the target message is identified as the false message; otherwise, if the authenticity index of the target message is out of the false index range, the target message is not consistent with the characteristics of the false message, and the target message is identified as a real message.
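The final decision in S107 is a range check. The sketch below assumes the false index range is inclusive at both ends and that the labels "false"/"real" are returned as strings; `classify_message` and `false_index_range` are hypothetical names.

```python
from typing import Tuple


def classify_message(authenticity_index: float, false_index_range: Tuple[float, float]) -> str:
    """Return "false" if the index lies in the preset false index range, else "real"."""
    low, high = false_index_range
    return "false" if low <= authenticity_index <= high else "real"


print(classify_message(0.85, (0.7, 1.0)))  # false
print(classify_message(0.30, (0.7, 1.0)))  # real
```

The same check works for either way of computing the authenticity index, whether it is a model output or a count of parameters falling in their false ranges.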
As can be seen from the above, the false message identification method provided by this embodiment of the present invention acquires all carrier texts containing the target message and the propagation path of each carrier text, obtains the text matrix of each carrier text from the carrier text and the identifications of the propagation users contained in its propagation path, and derives the text feature vector of the target message from the text matrices. In parallel, it builds a user propagation matrix from the propagation paths of the carrier texts and calculates the user propagation feature vector of the target message from it. Finally, it calculates the authenticity index of the target message from the user propagation feature vector and the text feature vector, and uses the authenticity index to determine whether the target message is a false message. Compared with existing false message identification techniques, this embodiment requires no manual investigation or evidence collection, which reduces labor cost and investigation time. Instead, it collects the text features of the carrier texts that relay the target message and analyzes the user features of each propagation user who spreads it: the text feature vector indicates whether the target message has inflammatory characteristics, and the user feature vector indicates whether it shows explosive spreading during propagation. The authenticity index obtained from these two feature vectors is then used to identify whether the target message is false, which improves the accuracy of false message identification.
Fig. 2 shows a flowchart of a specific implementation of S103 of the false message identification method according to the second embodiment of the present invention. Referring to Fig. 2, compared with the embodiment described in Fig. 1, S103 of the false message identification method provided in this embodiment includes S1031 to S1034, detailed as follows:
In S1031, the propagation times, the content feature parameter, and the propagation time parameter of each carrier text are acquired.
In this embodiment, to improve the accuracy of the text feature vector, the false message identification device obtains not only the propagation path of each carrier text but also its propagation times, content feature parameter, and propagation time parameter, so that the authenticity of the carrier text can be evaluated from several aspects.
Specifically, the propagation times include not only the number of times users forward the carrier text but also the number of times users comment on it and the number of times users like it, that is, the number of all behaviors that contribute to propagating the carrier text. The content feature parameter represents the content the carrier text expresses. It may be extracted as described in S102: the keywords contained in the carrier text are identified, and the content feature parameter is determined from the extracted keywords. The propagation time parameter includes, but is not limited to, at least one of the following: the creation time of the carrier text, the average propagation interval, and the total propagation duration.
In S1032, the carrier texts are sorted based on the propagation time parameter, and the import order of each carrier text is determined.
Because this embodiment uses a multilayer feedback recurrent neural network to determine the text feature vector of the target message, the order in which the carrier texts are imported into the network, that is, the recurrent level at which each carrier text is placed, must be determined in advance. If the number of levels of the multilayer feedback recurrent neural network exceeds the number of carrier texts, the number of levels is reduced during the import operation to match the number of carrier texts.
In this embodiment, the recognition device determines the import order of the carrier texts from the propagation time parameter, and the ordering criterion depends on the type of parameter the propagation time parameter contains. For example, if the propagation time parameter is the creation time of the carrier text, the carrier texts may be ordered by creation time; if it is the total propagation duration, they may be ordered by the length of the total propagation duration.
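The ordering step can be illustrated with a short sketch. The `CarrierText` record, its field units, and the choice of ascending order (earliest or shortest first) are assumptions made for illustration; the text only says that the order follows the selected propagation time parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CarrierText:
    text_id: str
    creation_time: float   # assumed: Unix timestamp of creation
    total_duration: float  # assumed: total propagation duration in seconds


def import_order(texts: List[CarrierText], key: str = "creation_time") -> List[str]:
    """Return carrier-text ids in ascending order of the chosen time parameter.

    `key` names the propagation time parameter to sort by; ascending order
    (earliest / shortest first) is an assumption.
    """
    return [t.text_id for t in sorted(texts, key=lambda t: getattr(t, key))]


texts = [
    CarrierText("a", creation_time=30.0, total_duration=100.0),
    CarrierText("b", creation_time=10.0, total_duration=300.0),
    CarrierText("c", creation_time=20.0, total_duration=50.0),
]
by_creation = import_order(texts)                    # ["b", "c", "a"]
by_duration = import_order(texts, "total_duration")  # ["c", "a", "b"]
```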
In S1033, the propagation times, the content feature parameter, the propagation time parameter, and the text matrix are imported into a text time sequence vector conversion model to obtain the text time sequence vector of each carrier text. The text time sequence vector conversion model is specifically:
$$\tilde{x}_t = \tanh\left(W_a x_t + b_a\right), \qquad x_t = \left(\eta,\ \Delta t,\ x_u,\ x_\tau\right)$$
where $\tilde{x}_t$ is the text time sequence vector of the carrier text whose import order is $t$; $\eta$ is the propagation times; $\Delta t$ is the propagation time parameter; $x_u$ is the text matrix; $x_t$ is the fusion matrix of the carrier text whose import order is $t$; $x_\tau$ is the content feature parameter; and $W_a$ and $b_a$ are preset adjustment coefficients of the text time sequence vector conversion model.
In this embodiment, the recognition device first constructs the text feature matrix of the carrier text, namely $x_t$ above, from the propagation times, the content feature parameter, the propagation time parameter, and the text matrix. One way to construct it is to append three rows to the text matrix that store, respectively, the propagation times, the content feature parameter, and the propagation time parameter; that is, if the text matrix is n-dimensional, the corresponding text feature matrix is (n + 3)-dimensional.
Because the multilayer recurrent neural network is a neural network with a temporal relationship, each text feature matrix must be converted into a time-sequence form before it is imported, that is, the text time sequence vector of each carrier text must be determined. This embodiment uses the tanh function because its nonlinearity matches temporal characteristics well. The recognition device therefore imports the text feature matrix into the tanh function to determine the text time sequence vector of each carrier text.
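A minimal sketch of S1033 under simplifying assumptions: the text matrix of one carrier text is a flat list (one column of the global propagation matrix), the content feature parameter is reduced to a single number, and the three extra entries are appended in the order given in the construction above. `W_a` is a list of rows and `b_a` a list of biases; the function names are illustrative, not taken from the patent.

```python
import math
from typing import List, Sequence


def text_feature_matrix(text_matrix: Sequence[float], eta: float,
                        content_param: float, delta_t: float) -> List[float]:
    """Append the propagation times, content feature parameter and propagation
    time parameter to the text matrix, giving an (n + 3)-entry vector x_t."""
    return list(text_matrix) + [eta, content_param, delta_t]


def text_time_sequence_vector(x_t: Sequence[float],
                              W_a: Sequence[Sequence[float]],
                              b_a: Sequence[float]) -> List[float]:
    """Compute tanh(W_a x_t + b_a) row by row."""
    return [math.tanh(sum(w * x for w, x in zip(row, x_t)) + b)
            for row, b in zip(W_a, b_a)]


x_t = text_feature_matrix([1, 0, 1], eta=4, content_param=0.5, delta_t=2.0)
W_a = [[0.1] * len(x_t)]  # one output row, illustrative weights
x_tilde = text_time_sequence_vector(x_t, W_a, [0.0])
```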
In S1034, the text time sequence vectors of the carrier texts are imported, in the import order, into the levels of the multilayer feedback recurrent neural network to obtain the text feature vector of the target message. The multilayer feedback recurrent neural network is specifically:
$$h_t = \tanh\left(W \tilde{x}_t + U h_{t-1} + b\right)$$
where $h_0$ is a preset initial text vector; $\tilde{x}_t$ is the text time sequence vector of each carrier text; $h_1, h_2, \ldots, h_{t-1}$ are the intermediate text feature iteration values output by the levels of the multilayer feedback recurrent neural network; $h_t$ is the text feature vector of the target message; and $W$, $U$, and $b$ are adjustment coefficients.
In this embodiment, the recognition device imports the text time sequence vectors of the carrier texts into the levels of the multilayer feedback recurrent neural network one after another, in the import order. The output of each level serves as the input of the next, so the temporal features of the carrier texts accumulate; the resulting text feature vector therefore reflects the combined influence of all carrier texts and fully fuses their text features.
In this embodiment, the recognition device takes the output of the last level of the recurrent neural network as the text feature vector of the target message. Before importing the data, the recognition device adjusts the number of levels of the multilayer recurrent neural network according to the number of carrier texts of the target message, so that the number of levels matches the number of carrier texts.
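The recurrence of S1034 can be sketched in plain Python. The sketch assumes the tanh activation of the formula $h_t = \tanh(W \tilde{x}_t + U h_{t-1} + b)$ and represents $W$, $U$, and $b$ as nested lists. Running one iteration per carrier text makes the number of levels equal to the number of carrier texts, which is the matching described above; the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def text_feature_vector(timing_vectors: Sequence[Sequence[float]],
                        W: Matrix, U: Matrix, b: Sequence[float],
                        h0: Sequence[float]) -> List[float]:
    """Run h_t = tanh(W x_t + U h_{t-1} + b) over the carrier texts.

    `timing_vectors` must already be in import order. One iteration is done
    per carrier text; the last h_t is returned as the text feature vector.
    """
    h = list(h0)
    for x_t in timing_vectors:
        wx, uh = matvec(W, x_t), matvec(U, h)
        h = [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]
    return h


h = text_feature_vector([[1.0], [1.0]], W=[[1.0]], U=[[1.0]], b=[0.0], h0=[0.0])
```

With two carrier texts, the first level outputs $\tanh(1)$ and the second outputs $\tanh(1 + \tanh(1))$, which is returned as the text feature vector.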
In this embodiment of the invention, the text time sequence vector of each carrier text is determined from several parameter values collected for that carrier text, and the text feature vector of the target message is calculated with the multilayer recurrent neural network. This enriches the text characteristics captured by the text feature vector and improves the accuracy of false message identification.
Fig. 3 shows a flowchart of a specific implementation of S105 of the false message identification method according to the third embodiment of the present invention. Referring to Fig. 3, compared with the embodiment shown in Fig. 1, S105 of the false message identification method provided in this embodiment includes S1051 to S1055, detailed as follows:
In S1051, singular value decomposition is performed on the user propagation matrix to obtain the user propagation coefficient of each propagation user.
In this embodiment, because the user propagation matrix is a global matrix covering all propagation users, singular value decomposition must be performed on it to determine the user propagation coefficient of each propagation user, that is, the contribution of each propagation user to spreading the target message. Specifically, if the user propagation matrix is a 1 × N matrix, the diagonal matrix obtained by the decomposition is a 1 × 1 matrix, so the user propagation matrix can be decomposed into N matrices of size 1 × 1, from which the user propagation coefficient of each propagation user is identified.
In S1052, each user propagation coefficient is imported into a user feature vector conversion model to determine the user feature vector of each propagation user. The user feature vector conversion model is specifically:
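$$\tilde{y}_i = \tanh\left(W_u y_i + b_u\right), \qquad s_i = \frac{1}{1 + e^{-\left(W_s \tilde{y}_i + b_s\right)}}$$
where $s_i$ is the user feature vector of the $i$-th propagation user; $y_i$ is the user propagation coefficient of the $i$-th propagation user; $\tilde{y}_i$ is the user timing vector of the $i$-th propagation user; $W_u$, $b_u$, $W_s$, and $b_s$ are preset coefficients of the user feature vector conversion model; and $e$ is the base of the natural logarithm.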
In this embodiment, the identification device first performs a time-domain transformation on the user propagation coefficient of each propagation user to obtain its user timing vector, namely $\tilde{y}_i$. As noted above, the nonlinearity of the tanh function matches temporal characteristics well, so the tanh function is also used for this time-domain conversion in S1052, with its preset coefficients, namely $W_u$ and $b_u$, adjusted to suit the user feature vector.
In this embodiment, after determining the user timing vector of each propagation user, the recognition device passes it through the sigmoid function, namely $s_i = 1/(1 + e^{-(W_s \tilde{y}_i + b_s)})$, to determine the user feature vector corresponding to each user timing vector, where $W_s$ and $b_s$ are preset parameter values.
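The two-stage conversion of S1052 can be sketched as follows, assuming that the user propagation coefficient $y_i$ is a scalar, that $W_u$ is a k × 1 column (given as a flat list), and that $W_s$ is a list of rows. The function names and shapes are illustrative assumptions, not definitions from the patent.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def user_feature_vector(y_i: float,
                        W_u: Sequence[float], b_u: Sequence[float],
                        W_s: Sequence[Sequence[float]],
                        b_s: Sequence[float]) -> List[float]:
    """Map a scalar user propagation coefficient y_i to a user feature vector.

    Step 1: user timing vector   y~ = tanh(W_u * y_i + b_u)   (W_u is k x 1)
    Step 2: user feature vector  s  = sigmoid(W_s y~ + b_s)
    """
    y_tilde = [math.tanh(w * y_i + b) for w, b in zip(W_u, b_u)]
    return [sigmoid(sum(w * t for w, t in zip(row, y_tilde)) + b)
            for row, b in zip(W_s, b_s)]


s = user_feature_vector(0.5, W_u=[1.0], b_u=[0.0], W_s=[[2.0]], b_s=[0.1])
```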
In S1053, a user feature matrix is generated based on the user feature vectors of the respective propagation users.
In this embodiment, once the user feature vectors of the propagation users are determined, the identification device can characterize each propagation user, for example by identifying from its user feature vector whether the user is a rumor-spreading user or an ordinary propagation user. The user feature matrix formed by these user feature vectors therefore shows at a glance the characteristics of all users who propagated the target message, which makes it more efficient to identify whether the target message is a false message.
Specifically, if the user feature vectors of many of the propagation users who spread the target message match the feature vector of a rumor-spreading user, it can be concluded that the target message is spread mainly by rumor-spreading users, which means the target message is very likely a false message.
In S1054, the mask vector of each carrier text is obtained from its text matrix, and the mask vector and the user feature matrix are imported into a user propagation feature value calculation model to determine the user propagation feature value of each carrier text. The user propagation feature value calculation model is specifically:
$$p_j = \frac{\sum \left([s_i] * m_j\right)}{d\left([s_i] * m_j\right)}$$
where $[s_i]$ is the user feature matrix; $m_j$ is the mask vector of the $j$-th carrier text; $p_j$ is the user propagation feature value of the $j$-th carrier text; and $d(\cdot)$ is a statistical function that counts non-null elements.
In this embodiment, because the text matrix is generated from the identifications of the propagation users, a non-null i-th element in the text matrix indicates that the i-th user has propagated the carrier text. To determine the user propagation feature value of each carrier text, it is therefore first necessary to determine which users propagated it, that is, to generate the mask vector described above. For example, if the text matrix of a carrier text is [5, 0, 0, 5, 0, 7, 5, 6], five propagation users have performed propagation operations on it, and the corresponding mask vector is [1, 0, 0, 1, 0, 1, 1, 1]. The mask vector is then used to extract, from the user feature matrix, the user feature vectors of the propagation users associated with the carrier text, which gives $[s_i] * m_j$.
In this embodiment, after determining the propagation users who contributed to propagating the carrier text, the recognition device calculates the mean of their user feature vectors: the function $d([s_i] * m_j)$ counts the non-null elements of $[s_i] * m_j$, so the calculated user propagation feature value is the average of the feature vectors of those users.
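The masking and averaging of S1054 can be sketched as follows. This sketch interprets the divisor $d(\cdot)$ as the number of users selected by the mask, and it returns a zero vector when no user propagated the carrier text. Both choices are assumptions, and the function names are hypothetical.

```python
from typing import List, Sequence


def mask_vector(text_matrix: Sequence[float]) -> List[int]:
    """1 where the propagation user propagated the carrier text, else 0."""
    return [1 if v != 0 else 0 for v in text_matrix]


def user_propagation_value(user_features: Sequence[Sequence[float]],
                           mask: Sequence[int]) -> List[float]:
    """Average the feature vectors of the users selected by the mask.

    `user_features[i]` is the feature vector s_i of propagation user i. The
    divisor (the role of d(.)) is taken to be the number of selected users.
    """
    dim = len(user_features[0])
    selected = [f for f, m in zip(user_features, mask) if m]
    if not selected:
        return [0.0] * dim  # assumption: no propagating users -> zero vector
    return [sum(f[k] for f in selected) / len(selected) for k in range(dim)]


m = mask_vector([5, 0, 0, 5, 0, 7, 5, 6])  # [1, 0, 0, 1, 0, 1, 1, 1]
p = user_propagation_value([[1.0, 2.0], [5.0, 5.0], [3.0, 4.0]], [1, 0, 1])
```

Here `p` averages users 0 and 2 only, giving `[2.0, 3.0]`.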
In S1055, a user propagation feature vector of the target message is generated according to each of the user propagation feature values.
In this embodiment, after determining the user propagation feature values of all the carrier texts, the recognition device aggregates all the user propagation feature values to form a user propagation feature vector corresponding to the target message.
In this embodiment of the invention, the user feature vector of each propagation user is calculated, and from these vectors the average user feature vector of each carrier text, namely its user propagation feature value, is determined. The user propagation feature vector therefore captures not only user characteristics but also the propagation characteristics of the carrier texts, which improves the accuracy of false message identification.
Fig. 4a shows a flowchart of a specific implementation of S106 of the false message identification method according to the fourth embodiment of the present invention. Referring to Fig. 4a, compared with the embodiments shown in Figs. 1 to 3, in the false message identification method provided in this embodiment, calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector includes S1061 and S1062, detailed as follows:
Further, calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector includes:
In S1061, the user propagation feature vector and the text feature vector are aggregated to obtain the authenticity identification matrix of the target message.
In this embodiment, after determining the user propagation feature vector and the text feature vector, the recognition device aggregates the two into an authenticity identification matrix that contains both types of features. Specifically, if the user propagation feature vector is $n_1 \times m_1$ and the text feature vector is $n_2 \times m_2$, the aggregated authenticity identification matrix is $(n_1 + n_2) \times \max(m_1, m_2)$. Any blank elements in the aggregated matrix are filled with a preset character, preferably 0.
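The aggregation of S1061 amounts to stacking the two matrices row-wise and zero-padding the shorter rows. In this minimal sketch, placing the user propagation rows above the text rows is an assumption (the text fixes only the resulting shape), and the function name `aggregate` is hypothetical.

```python
from typing import List, Sequence


def aggregate(user_mat: Sequence[Sequence[float]],
              text_mat: Sequence[Sequence[float]],
              fill: float = 0.0) -> List[List[float]]:
    """Stack an n1 x m1 and an n2 x m2 matrix into (n1 + n2) x max(m1, m2),
    padding short rows with `fill` (the preset character, 0 by default)."""
    rows = [list(r) for r in user_mat] + [list(r) for r in text_mat]
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


R = aggregate([[1, 2, 3]], [[4], [5]])
# R == [[1, 2, 3], [4, 0.0, 0.0], [5, 0.0, 0.0]]
```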
In S1062, the authenticity identification matrix is imported into the authenticity index calculation model to obtain the authenticity index of the target message. The authenticity index calculation model is specifically:
$$\hat{L}_j = \frac{1}{1 + e^{-\left(W_c [v_j] + b_c\right)}}$$
where $\hat{L}_j$ is the authenticity index; $[v_j]$ is the authenticity identification matrix; $W_c$ and $b_c$ are preset coefficients of the authenticity index calculation model; and $e$ is the base of the natural logarithm.
In this embodiment, after determining the authenticity identification matrix, the identification device imports it into the authenticity index calculation model, which is specifically a sigmoid function, namely $\hat{L}_j = 1/(1 + e^{-(W_c [v_j] + b_c)})$. Here $W_c$ and $b_c$ are preset coefficients of the authenticity index calculation model; they can be determined through training and can also be adjusted manually as the administrator requires.
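A sketch of S1062 together with the range check of S107. It assumes that $W_c$ has the same shape as the authenticity identification matrix, so that $W_c [v_j]$ reduces to an element-wise weighted sum, and it uses a numerically stable form of the sigmoid. The false index range in the usage line is illustrative, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def authenticity_index(R: Sequence[Sequence[float]],
                       W_c: Sequence[Sequence[float]], b_c: float) -> float:
    """Sigmoid of the weighted sum of all entries of the authenticity
    identification matrix R (W_c is assumed to have the same shape as R)."""
    z = sum(w * r for w_row, r_row in zip(W_c, R)
            for w, r in zip(w_row, r_row)) + b_c
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_false_message(index: float, false_range: Tuple[float, float]) -> bool:
    """S107: the message is false if its index lies in the closed false range."""
    low, high = false_range
    return low <= index <= high


idx = authenticity_index([[1.0, 2.0]], [[1.0, 1.0]], b_c=-3.0)  # z = 0 -> 0.5
flagged = is_false_message(idx, (0.5, 1.0))                     # illustrative range
```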
For example, Fig. 4b shows a calculation block diagram of the authenticity index calculation model according to an embodiment of the present invention, in which the text feature vector and the user propagation feature vector are aggregated into the authenticity identification matrix that is fed to the model.
In this embodiment of the invention, the text feature vector and the user propagation feature vector are aggregated into the authenticity identification matrix, so the two inputs are combined into a single one. This reduces the number of calculations and improves the efficiency of calculating the authenticity index.
Fig. 5 shows a flowchart of a specific implementation of S102 of the false message identification method according to the fifth embodiment of the present invention. Referring to Fig. 5, compared with the embodiment described in Fig. 1, S102 of the false message identification method provided in this embodiment includes S1021 and S1022, detailed as follows:
Further, obtaining the text matrix of each carrier text based on the carrier text and the identification of the propagation user includes:
In S1021, a global propagation matrix $[a_{ij}]_{n \times m}$ of the target message is constructed based on the carrier texts and the identifications of the propagation users, where $a_{ij}$ is the propagation flag value of the $i$-th propagation user for the $j$-th carrier text, $n$ is the number of propagation users, and $m$ is the number of carrier texts.
In this embodiment, after obtaining the propagation path of each carrier text, the identification device can determine each propagation user who propagated that carrier text and generate a sequence from their user numbers. Performing this propagation-user statistic for every carrier text yields a global propagation matrix for the target message. The set of elements in the i-th row of the global propagation matrix indicates which carrier texts the i-th propagation user propagated, and the set of elements in the j-th column indicates which propagation users propagated the j-th carrier text. The global propagation matrix thus describes how the target message spread on the network platform: splitting it by columns gives the propagation information of each carrier text, and splitting it by rows gives the propagation information of each propagation user.
In this embodiment, $a_{ij}$ is the propagation flag value of the $i$-th propagation user for the $j$-th carrier text: if the $i$-th propagation user propagated the $j$-th carrier text, the flag value is 1; otherwise it is 0. These flag values form the global propagation matrix $[a_{ij}]_{n \times m}$, from which the propagation contribution of any propagation user to each carrier text can be looked up.
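The construction of the global propagation matrix in S1021 can be sketched with 0/1 flags. In this sketch each propagation path is represented simply as the list of user ids that propagated one carrier text, and rows are ordered by sorted user id. Both choices are assumptions for illustration, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def global_propagation_matrix(
        paths: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build [a_ij] (n x m) from the propagation path of each carrier text.

    paths[j] lists the ids of the users who propagated carrier text j.
    Rows are users (sorted ids, an arbitrary but fixed order); columns are
    carrier texts. a_ij = 1 if user i propagated text j, else 0.
    """
    users = sorted({u for path in paths for u in path})
    row_of = {u: i for i, u in enumerate(users)}
    A = [[0] * len(paths) for _ in users]
    for j, path in enumerate(paths):
        for u in path:
            A[row_of[u]][j] = 1
    return users, A


users, A = global_propagation_matrix([["u1", "u2"], ["u2", "u3"]])
# users == ["u1", "u2", "u3"]; A == [[1, 0], [1, 1], [0, 1]]
```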
In S1022, the submatrix formed by each column of the global propagation matrix $[a_{ij}]_{n \times m}$ is used as the text matrix of the corresponding carrier text.
In this embodiment, the set of elements in the $j$-th column of the global propagation matrix $[a_{ij}]_{n \times m}$ gives the propagation users through whom the $j$-th carrier text was propagated. The global propagation matrix $[a_{ij}]_{n \times m}$ can therefore be divided into $m$ submatrices, each of which is the text matrix of one carrier text.
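Splitting the global propagation matrix into per-text matrices (S1022) is a column extraction. The following sketch assumes that the matrix is stored as a list of rows, one row per propagation user. The function name is hypothetical.

```python
from typing import List, Sequence


def split_text_matrices(A: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the m columns of the n x m global propagation matrix A;
    column j is the text matrix of carrier text j."""
    m = len(A[0]) if A else 0
    return [[row[j] for row in A] for j in range(m)]


cols = split_text_matrices([[1, 0], [1, 1], [0, 1]])
# cols == [[1, 1, 0], [0, 1, 1]]
```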
In the embodiment of the invention, the propagation conditions of each carrier text and each propagation user can be conveniently determined by constructing the global propagation matrix, and the text matrix of each carrier text can be obtained by dividing based on the global propagation matrix, so that the generation efficiency of the text matrix is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 6 shows a structural block diagram of a false message identification device according to an embodiment of the present invention. The false message identification device includes units for performing the steps in the embodiment corresponding to Fig. 1; for details, refer to Fig. 1 and the related description of that embodiment. For ease of explanation, only the parts related to this embodiment are shown.
Referring to fig. 6, the apparatus for identifying the false message includes:
a target message parameter obtaining unit 61, configured to obtain multiple carrier texts containing target messages and propagation paths of the carrier texts; the propagation path comprises an identification of a propagation user propagating the carrier text;
a text matrix generating unit 62, configured to obtain a text matrix of each carrier text based on the carrier text and the identifier of the propagation user;
a text feature vector generating unit 63, configured to import each text matrix into a preset feature vector calculation model, so as to obtain a text feature vector of the target message;
a user propagation matrix generating unit 64, configured to generate a user propagation matrix about the target message according to propagation paths of all the carrier texts; each element contained in the user propagation matrix is specifically the number of the carrier texts propagated by each propagation user;
the user propagation feature vector calculation unit 65 is configured to import the user propagation matrix into a preset user feature calculation model, so as to obtain a user propagation feature vector corresponding to the target message;
an authenticity index calculation unit 66, configured to calculate the authenticity index of the target message according to the user propagation feature vector and the text feature vector;
a false message identification unit 67, configured to identify the target message as a false message if the authenticity index is within a preset false index range.
Optionally, the text feature vector generating unit 63 includes:
the text parameter acquisition unit is used for respectively acquiring the propagation times, the content characteristic parameters and the propagation time parameters of the carrier texts;
the import sequence determining unit is used for sequencing each carrier text based on the propagation time parameter and determining the import sequence of each carrier text;
the text time sequence vector calculation unit is used for importing the propagation times, the content feature parameter, the propagation time parameter, and the text matrix into a text time sequence vector conversion model to obtain the text time sequence vector of each carrier text; the text time sequence vector conversion model is specifically:
$$\tilde{x}_t = \tanh\left(W_a x_t + b_a\right), \qquad x_t = \left(\eta,\ \Delta t,\ x_u,\ x_\tau\right)$$
where $\tilde{x}_t$ is the text time sequence vector of the carrier text whose import order is $t$; $\eta$ is the propagation times; $\Delta t$ is the propagation time parameter; $x_u$ is the text matrix; $x_t$ is the fusion matrix of the carrier text whose import order is $t$; $x_\tau$ is the content feature parameter; and $W_a$ and $b_a$ are preset adjustment coefficients of the text time sequence vector conversion model;
the text feature vector calculation unit is used for importing the text time sequence vector of each carrier text, in the import order, into the levels of the multilayer feedback recurrent neural network to obtain the text feature vector of the target message; the multilayer feedback recurrent neural network is specifically:
$$h_t = \tanh\left(W \tilde{x}_t + U h_{t-1} + b\right)$$
where $h_0$ is a preset initial text vector; $\tilde{x}_t$ is the text time sequence vector of each carrier text; $h_1, h_2, \ldots, h_{t-1}$ are the intermediate text feature iteration values output by the levels of the multilayer feedback recurrent neural network; $h_t$ is the text feature vector of the target message; and $W$, $U$, and $b$ are adjustment coefficients.
Optionally, the user propagation feature vector calculation unit 65 includes:
a propagation coefficient determining unit, configured to perform singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each propagation user;
the user feature vector calculation unit is used for importing each user propagation coefficient into the user feature vector conversion model to determine the user feature vector of each propagation user; the user feature vector conversion model is specifically:
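$$\tilde{y}_i = \tanh\left(W_u y_i + b_u\right), \qquad s_i = \frac{1}{1 + e^{-\left(W_s \tilde{y}_i + b_s\right)}}$$
where $s_i$ is the user feature vector of the $i$-th propagation user; $y_i$ is the user propagation coefficient of the $i$-th propagation user; $\tilde{y}_i$ is the user timing vector of the $i$-th propagation user; $W_u$, $b_u$, $W_s$, and $b_s$ are preset coefficients of the user feature vector conversion model; and $e$ is the base of the natural logarithm;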
a user feature matrix generating unit, configured to generate a user feature matrix based on the user feature vector of each propagation user;
the user propagation feature value calculation unit is used for obtaining the mask vector of each carrier text from its text matrix, and importing the mask vector and the user feature matrix into a user propagation feature value calculation model to determine the user propagation feature value of each carrier text; the user propagation feature value calculation model is specifically:
$$p_j = \frac{\sum \left([s_i] * m_j\right)}{d\left([s_i] * m_j\right)}$$
where $[s_i]$ is the user feature matrix; $m_j$ is the mask vector of the $j$-th carrier text; $p_j$ is the user propagation feature value of the $j$-th carrier text; and $d(\cdot)$ is a statistical function that counts non-null elements;
and the user propagation characteristic vector determining unit is used for generating the user propagation characteristic vector of the target message according to each user propagation characteristic value.
Optionally, the authenticity index calculation unit 66 includes:
the authenticity identification matrix generating unit is used for aggregating the user propagation characteristic vector and the text characteristic vector to obtain an authenticity identification matrix of the target message;
the authenticity index calculation unit is used for importing the authenticity identification matrix into the authenticity index calculation model to obtain the authenticity index of the target message; the authenticity index calculation model is specifically:
$$\hat{L}_j = \frac{1}{1 + e^{-\left(W_c [v_j] + b_c\right)}}$$
where $\hat{L}_j$ is the authenticity index; $[v_j]$ is the authenticity identification matrix; $W_c$ and $b_c$ are preset coefficients of the authenticity index calculation model; and $e$ is the base of the natural logarithm.
Optionally, the text matrix generating unit 62 includes:
a global propagation matrix creating unit, configured to construct a global propagation matrix $[a_{ij}]_{n \times m}$ of the target message based on the carrier texts and the identifications of the propagation users, where $a_{ij}$ is the propagation flag value of the $i$-th propagation user for the $j$-th carrier text, $n$ is the number of propagation users, and $m$ is the number of carrier texts;
a text matrix segmentation unit, configured to use the submatrix formed by each column of the global propagation matrix $[a_{ij}]_{n \times m}$ as the text matrix of the corresponding carrier text.
Therefore, the false message identification device provided by this embodiment of the invention likewise identifies false messages without manual investigation and evidence collection, which reduces labor cost and investigation time. It analyzes the text features of the carrier texts that relay the target message and the user features of each propagation user who spreads it: the text feature vector indicates whether the target message has inflammatory characteristics, and the user feature vector indicates whether it shows explosive spreading during propagation. The authenticity index obtained from these two feature vectors is used to identify whether the target message is a false message, which improves the accuracy of false message identification.
Fig. 7 is a schematic diagram of a device for identifying false messages according to another embodiment of the present invention. As shown in fig. 7, the false message identification device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72, such as a false message identification program, stored in said memory 71 and executable on said processor 70. The processor 70, when executing the computer program 72, implements the steps in the above-described embodiments of the method for identifying false messages, such as S101 to S107 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the units in the above-described device embodiments, such as the functions of the modules 61 to 67 shown in fig. 6.
Illustratively, the computer program 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to accomplish the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 72 in the identification device 7 of false messages. For example, the computer program 72 may be divided into a target message parameter acquiring unit, a text matrix generating unit, a text feature vector generating unit, a user propagation matrix generating unit, a user propagation feature vector calculating unit, a authenticity index calculating unit, and a false message identifying unit, each of which functions as described above.
The false message identification device 7 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The false message identification device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that Fig. 7 is merely an example of the false message identification device 7 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the false message identification device may also include input/output devices, network access devices, a bus, and the like.
The processor 70 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 71 may be an internal storage unit of the false message identification device 7, such as a hard disk or main memory of the device 7. The memory 71 may also be an external storage device of the false message identification device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the false message identification device 7. The memory 71 stores the computer program and the other programs and data required by the false message identification device, and may also temporarily store data that has been output or is to be output.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and are intended to be included within the scope of the present invention.
Claims (10)
1. A method for identifying false messages, comprising:
acquiring a plurality of carrier texts containing a target message and a propagation path of each carrier text, the propagation path comprising the identifiers of the propagation users who propagate the carrier text;
obtaining a text matrix of each carrier text based on the carrier text and the identifier of the propagation user;
importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message;
generating a user propagation matrix related to the target message according to the propagation paths of all the carrier texts; each element contained in the user propagation matrix is specifically the number of the carrier texts propagated by each propagation user;
importing the user propagation matrix into a preset user characteristic calculation model to obtain a user propagation characteristic vector corresponding to the target message;
calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector;
and if the authenticity index is within a preset false index range, identifying the target message as a false message.
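Claim 1 does not fix the layout of the user propagation matrix. The sketch below assumes one plausible reading: each propagation path is given as a list of user identifiers, and the matrix is a users × users co-propagation count matrix whose entry (i, k) is the number of carrier texts propagated by both users, so the diagonal holds the number of carrier texts each user propagated. The function names and the `false_range` parameter are hypothetical; the final check only tests whether an authenticity index lies within a preset range, as the last step of claim 1 describes.

```python
from typing import Dict, List, Sequence, Tuple


def build_user_propagation_matrix(
    paths: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical users x users co-propagation count matrix.

    paths[j] lists the identifiers of the users who propagated carrier text j.
    Entry [i][k] counts the carrier texts propagated by both user i and user k;
    the diagonal entry [i][i] is the number of carrier texts user i propagated.
    """
    users = sorted({u for path in paths for u in path})
    index: Dict[str, int] = {u: i for i, u in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for path in paths:
        present = {index[u] for u in path}  # count each user once per carrier text
        for i in present:
            for k in present:
                matrix[i][k] += 1
    return users, matrix


def is_false_message(authenticity_index: float, false_range: Tuple[float, float]) -> bool:
    """Last step of claim 1: flag the message if its index lies in the preset range."""
    low, high = false_range
    return low <= authenticity_index <= high


if __name__ == "__main__":
    users, matrix = build_user_propagation_matrix(
        [["alice", "bob"], ["bob", "carol"], ["bob"]]
    )
    print(users)   # ['alice', 'bob', 'carol']
    print(matrix)  # [[1, 1, 0], [1, 3, 1], [0, 1, 1]]
```

Which end of the index scale counts as "false" depends on the preset range, which the claims leave open; the sketch therefore takes the range as a parameter.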
2. The identification method according to claim 1, wherein the importing each text matrix into a preset feature vector calculation model to obtain the text feature vector of the target message comprises:
respectively acquiring the propagation count, the content feature parameter, and the propagation time parameter of each carrier text;
sorting the carrier texts by the propagation time parameter to determine the import order of each carrier text;
importing the propagation count, the content feature parameter, the propagation time parameter, and the text matrix into a text time-sequence vector conversion model to obtain a text time-sequence vector of each carrier text; the text time-sequence vector conversion model is specifically:
wherein the output of the model is the text time-sequence vector of the carrier text whose import order is $t$; $\eta$ is the propagation count; $\Delta t$ is the propagation time parameter; $x_u$ is the text matrix; $x_t$ is the fusion matrix of the carrier text whose import order is $t$; $x_\tau$ is the content feature parameter; and $W_a$ and $b_a$ are preset adjustment coefficients of the text time-sequence vector conversion model;
importing, in the import order, the text time-sequence vector of each carrier text into each level of a multilayer feedback recurrent neural network to obtain the text feature vector of the target message; the multilayer feedback recurrent neural network is specifically:
wherein $h_0$ is a preset initial text vector; the input at each level is the text time-sequence vector of the corresponding carrier text; $h_1, h_2, \ldots, h_{t-1}$ are the intermediate text feature iteration values output by the successive levels of the multilayer feedback recurrent neural network; $h_t$ is the text feature vector of the target message; and $W$, $U$, and $b$ are adjustment coefficients.
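The formulas of claim 2 are not reproduced in this text, so the following is only a sketch under stated assumptions: the fusion matrix $x_t$ is taken to be the concatenation of $\eta$, $\Delta t$, $x_u$, and $x_\tau$; the conversion model is assumed to be $\tilde{x}_t = \tanh(W_a x_t + b_a)$, where $\tilde{x}_t$ is hypothetical notation for the text time-sequence vector; and each level of the feedback recurrent network is read as one recurrent step $h_t = \tanh(W \tilde{x}_t + U h_{t-1} + b)$. The activation choices, the tuple layout of a carrier text, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
# Hypothetical layout of one carrier text: (eta, delta_t, x_u, x_tau)
CarrierText = Tuple[float, float, Sequence[float], Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def tanh_of_sum(parts: Sequence[Sequence[float]]) -> Vector:
    """Element-wise tanh of the sum of equally sized vectors."""
    return [math.tanh(sum(values)) for values in zip(*parts)]


def text_timing_vector(text: CarrierText, W_a: Matrix, b_a: Sequence[float]) -> Vector:
    """Assumed conversion model: tanh(W_a x_t + b_a)."""
    eta, delta_t, x_u, x_tau = text
    x_t = [float(eta), float(delta_t), *x_u, *x_tau]  # assumed fusion by concatenation
    return tanh_of_sum([matvec(W_a, x_t), b_a])


def text_feature_vector(
    texts: Sequence[CarrierText],
    W_a: Matrix, b_a: Sequence[float],
    W: Matrix, U: Matrix, b: Sequence[float],
    h0: Sequence[float],
) -> Vector:
    """Feed the carrier texts, in import order, through a simple recurrent network."""
    ordered = sorted(texts, key=lambda text: text[1])  # import order from delta_t
    h = list(h0)
    for text in ordered:
        x_tilde = text_timing_vector(text, W_a, b_a)
        h = tanh_of_sum([matvec(W, x_tilde), matvec(U, h), b])  # h_t from h_{t-1}
    return h
```

Because the texts are sorted by $\Delta t$ before the loop, the caller's input order does not affect the result; only the import order derived from the propagation time parameter does.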
3. The identification method according to claim 1, wherein importing the user propagation matrix into a preset user feature calculation model to obtain the user propagation feature vector corresponding to the target message comprises:
performing singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each propagation user;
importing each user propagation coefficient into a user feature vector conversion model to determine the user feature vector of each propagation user; the user feature vector conversion model is specifically:
wherein $s_i$ is the user feature vector of the $i$-th propagation user; $y_i$ is the user propagation coefficient of the $i$-th propagation user; the model also involves a user timing vector of the $i$-th propagation user; $W_u$, $b_u$, and $b_s$, together with one further coefficient, are preset coefficients of the user feature vector conversion model; and $e$ is the base of the natural logarithm;
generating a user feature matrix from the user feature vectors of the propagation users;
obtaining a mask vector of each carrier text from its text matrix, importing the mask vectors and the user feature matrix into a user propagation feature value calculation model, and determining the user propagation feature value of each carrier text; the user propagation feature value calculation model is specifically:
wherein $[s_i]$ is the user feature matrix; $m_j$ is the mask vector of the $j$-th carrier text; $p_j$ is the user propagation feature value of the $j$-th carrier text; and $d([s_i] * m_j)$ is a function that counts the non-null elements of $[s_i] * m_j$;
and generating the user propagation feature vector of the target message from the user propagation feature values.
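The formulas of claim 3 are likewise not reproduced here. A minimal NumPy sketch, assuming that: the user propagation coefficient $y_i$ is row $i$ of the rank-$k$ SVD factor $U_k \Sigma_k$; the user timing vector is $\tanh(W_u y_i + b_u)$; the user score is a logistic function of that vector with a hypothetical weight vector `w_s` and bias $b_s$ (consistent with the claim's mention of $e$, but not confirmed by it); and the mask $m_j$ is the non-zero pattern of column $j$ of the global propagation matrix, so $p_j$ becomes the average score of the users who propagated carrier text $j$ (assuming the numerator is a sum). The sketch simplifies $s_i$ to one scalar per user, and all function names are hypothetical.

```python
import numpy as np


def user_propagation_coefficients(user_matrix: np.ndarray, rank: int) -> np.ndarray:
    """Row i is a rank-k SVD embedding y_i of propagation user i (assumed reading)."""
    u, s, _ = np.linalg.svd(np.asarray(user_matrix, dtype=float))
    return u[:, :rank] * s[:rank]  # U_k scaled column-wise by the singular values


def user_scores(y: np.ndarray, W_u: np.ndarray, b_u: np.ndarray,
                w_s: np.ndarray, b_s: float) -> np.ndarray:
    """One score per user: logistic function of an assumed tanh 'user timing vector'."""
    timing = np.tanh(y @ W_u.T + b_u)  # shape (n_users, hidden)
    return 1.0 / (1.0 + np.exp(-(timing @ w_s + b_s)))


def text_propagation_values(scores: np.ndarray, global_matrix: np.ndarray) -> np.ndarray:
    """p_j: mean score of the users whose mask entry for carrier text j is non-zero."""
    scores = np.asarray(scores, dtype=float)
    masks = (np.asarray(global_matrix) != 0).astype(float)  # column j is m_j
    values = []
    for j in range(masks.shape[1]):
        masked = scores * masks[:, j]
        count = np.count_nonzero(masked)  # d(.): number of non-null elements
        values.append(masked.sum() / count if count else 0.0)
    return np.array(values)
```

The array returned by `text_propagation_values` plays the role of the user propagation feature vector; a carrier text with no propagating users is given 0.0 here to avoid dividing by zero, a choice the claim does not address.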
4. The identification method according to any one of claims 1 to 3, wherein the calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector comprises:
aggregating the user propagation feature vector and the text feature vector to obtain an authenticity identification matrix of the target message;
importing the authenticity identification matrix into an authenticity index calculation model to obtain the authenticity index of the target message; the authenticity index calculation model is specifically:
wherein the output of the model is the authenticity index; $[c_j]$ is the authenticity identification matrix; $b_c$ and one further coefficient are preset coefficients of the authenticity index calculation model; and $e$ is the base of the natural logarithm.
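As a sketch only, assuming that the aggregation is a plain concatenation of the text feature vector and the user propagation feature vector, and that the authenticity index model is a logistic function $1/(1 + e^{-(w_c \cdot c + b_c)})$ (suggested by, but not confirmed from, the claim's reference to $e$), the index could be computed as follows. The weight vector `w_c` and the function name are hypothetical.

```python
import math
from typing import Sequence


def authenticity_index(
    text_vec: Sequence[float],
    user_vec: Sequence[float],
    w_c: Sequence[float],
    b_c: float,
) -> float:
    """Logistic score of the aggregated (concatenated) feature vector."""
    c = [*text_vec, *user_vec]  # assumed aggregation: concatenation
    if len(w_c) != len(c):
        raise ValueError("w_c must have one weight per aggregated feature")
    z = sum(w * x for w, x in zip(w_c, c)) + b_c
    # Numerically stable logistic function: avoids overflow in math.exp
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

The result lies in [0, 1]; whether a message is then flagged depends on the preset false index range, which the claims do not specify.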
5. The identification method according to claim 1, wherein obtaining a text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users comprises:
constructing a global propagation matrix $[a_{ij}]_{n \times m}$ of the target message based on the carrier texts and the identifiers of the propagation users, wherein $a_{ij}$ is the propagation mark value of the $i$-th propagation user for the $j$-th carrier text, $n$ is the number of propagation users, and $m$ is the number of carrier texts;
and using the submatrix formed by each column of the global propagation matrix $[a_{ij}]_{n \times m}$ as the text matrix of the corresponding carrier text.
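Claim 5 can be illustrated with a small sketch. It assumes a binary propagation mark value ($a_{ij} = 1$ if user $i$ propagated carrier text $j$, else $0$) and users ordered alphabetically; the actual mark values are not specified in this text, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def global_propagation_matrix(
    paths: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[int]]]:
    """n x m matrix [a_ij]: rows are propagation users, columns are carrier texts."""
    users = sorted({u for path in paths for u in path})
    row = {u: i for i, u in enumerate(users)}
    a = [[0] * len(paths) for _ in users]
    for j, path in enumerate(paths):
        for u in path:
            a[row[u]][j] = 1  # assumed binary propagation mark value
    return users, a


def text_matrix(a: List[List[int]], j: int) -> List[List[int]]:
    """Column j of [a_ij] as an n x 1 submatrix: the text matrix of carrier text j."""
    return [[r[j]] for r in a]
```

For example, three carrier texts propagated by `["alice", "bob"]`, `["bob", "carol"]`, and `["bob"]` give the rows `[1, 0, 0]`, `[1, 1, 1]`, and `[0, 1, 0]`, and the text matrix of the second carrier text is the column `[[0], [1], [1]]`.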
6. A false message identification device, characterized in that the false message identification device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a plurality of carrier texts containing a target message and a propagation path of each carrier text, the propagation path comprising the identifiers of the propagation users who propagate the carrier text;
obtaining a text matrix of each carrier text based on the carrier text and the identifier of the propagation user;
importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message;
generating a user propagation matrix related to the target message according to the propagation paths of all the carrier texts; each element contained in the user propagation matrix is specifically the number of the carrier texts propagated by each propagation user;
importing the user propagation matrix into a preset user characteristic calculation model to obtain a user propagation characteristic vector corresponding to the target message;
calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector;
and if the authenticity index is within a preset false index range, identifying the target message as a false message.
7. The false message identification device according to claim 6, wherein importing each text matrix into a preset feature vector calculation model to obtain the text feature vector of the target message comprises:
respectively acquiring the propagation count, the content feature parameter, and the propagation time parameter of each carrier text;
sorting the carrier texts by the propagation time parameter to determine the import order of each carrier text;
importing the propagation count, the content feature parameter, the propagation time parameter, and the text matrix into a text time-sequence vector conversion model to obtain a text time-sequence vector of each carrier text; the text time-sequence vector conversion model is specifically:
wherein the output of the model is the text time-sequence vector of the carrier text whose import order is $t$; $\eta$ is the propagation count; $\Delta t$ is the propagation time parameter; $x_u$ is the text matrix; $x_t$ is the fusion matrix of the carrier text whose import order is $t$; $x_\tau$ is the content feature parameter; and $W_a$ and $b_a$ are preset adjustment coefficients of the text time-sequence vector conversion model;
importing, in the import order, the text time-sequence vector of each carrier text into each level of a multilayer feedback recurrent neural network to obtain the text feature vector of the target message; the multilayer feedback recurrent neural network is specifically:
wherein $h_0$ is a preset initial text vector; the input at each level is the text time-sequence vector of the corresponding carrier text; $h_1, h_2, \ldots, h_{t-1}$ are the intermediate text feature iteration values output by the successive levels of the multilayer feedback recurrent neural network; $h_t$ is the text feature vector of the target message; and $W$, $U$, and $b$ are adjustment coefficients.
8. The false message identification device according to claim 6, wherein importing the user propagation matrix into a preset user feature calculation model to obtain the user propagation feature vector corresponding to the target message comprises:
performing singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each propagation user;
importing each user propagation coefficient into a user feature vector conversion model to determine the user feature vector of each propagation user; the user feature vector conversion model is specifically:
wherein $s_i$ is the user feature vector of the $i$-th propagation user; $y_i$ is the user propagation coefficient of the $i$-th propagation user; the model also involves a user timing vector of the $i$-th propagation user; $W_u$, $b_u$, and $b_s$, together with one further coefficient, are preset coefficients of the user feature vector conversion model; and $e$ is the base of the natural logarithm;
generating a user feature matrix from the user feature vectors of the propagation users;
obtaining a mask vector of each carrier text from its text matrix, importing the mask vectors and the user feature matrix into a user propagation feature value calculation model, and determining the user propagation feature value of each carrier text; the user propagation feature value calculation model is specifically:
wherein $[s_i]$ is the user feature matrix; $m_j$ is the mask vector of the $j$-th carrier text; $p_j$ is the user propagation feature value of the $j$-th carrier text; and $d([s_i] * m_j)$ is a function that counts the non-null elements of $[s_i] * m_j$;
and generating the user propagation feature vector of the target message from the user propagation feature values.
9. The false message identification device according to any one of claims 6 to 8, wherein calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector comprises:
aggregating the user propagation feature vector and the text feature vector to obtain an authenticity identification matrix of the target message;
importing the authenticity identification matrix into an authenticity index calculation model to obtain the authenticity index of the target message; the authenticity index calculation model is specifically:
wherein the output of the model is the authenticity index; $[c_j]$ is the authenticity identification matrix; $b_c$ and one further coefficient are preset coefficients of the authenticity index calculation model; and $e$ is the base of the natural logarithm.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810309691.1A CN108830630B (en) | 2018-04-09 | 2018-04-09 | False message identification method and equipment |
PCT/CN2018/097540 WO2019196259A1 (en) | 2018-04-09 | 2018-07-27 | Method for identifying false message and device thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810309691.1A CN108830630B (en) | 2018-04-09 | 2018-04-09 | False message identification method and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108830630A (en) | 2018-11-16 |
CN108830630B (en) | 2020-04-10 |
Family ID: 64154438
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810309691.1A Active CN108830630B (en) | 2018-04-09 | 2018-04-09 | False message identification method and equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108830630B (en) |
WO (1) | WO2019196259A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188194A (en) * | 2019-04-26 | 2019-08-30 | 哈尔滨工业大学(深圳) | A kind of pseudo event detection method and system based on multi-task learning model |
CN110750735A (en) * | 2019-10-23 | 2020-02-04 | 腾讯科技(深圳)有限公司 | False event identification method, device, equipment and storage medium based on block chain network |
CN111428151A (en) * | 2020-04-20 | 2020-07-17 | 浙江工业大学 | False message identification method and device based on network acceleration |
CN111831790A (en) * | 2020-06-23 | 2020-10-27 | 广东工业大学 | False news identification method based on low threshold integration and text content matching |
TWI731469B (en) * | 2019-11-11 | 2021-06-21 | 財團法人資訊工業策進會 | Apparatus and method for verfication of information |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160212163A1 (en) * | 2015-01-16 | 2016-07-21 | The Trustees Of The Stevens Institute Of Technology | Method and Apparatus to Identify the Source of Information or Misinformation in Large-Scale Social Media Networks |
CN106354845A (en) * | 2016-08-31 | 2017-01-25 | 上海交通大学 | Microblog rumor recognizing method and system based on propagation structures |
CN106980692A (en) * | 2016-05-30 | 2017-07-25 | 国家计算机网络与信息安全管理中心 | A kind of influence power computational methods based on microblogging particular event |
CN107797998A (en) * | 2016-08-29 | 2018-03-13 | 腾讯科技(深圳)有限公司 | The recognition methods of user-generated content containing rumour and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902621B (en) * | 2012-12-28 | 2017-02-08 | 深圳先进技术研究院 | Method and device for identifying network rumor |
CN105045857A (en) * | 2015-07-09 | 2015-11-11 | 中国科学院计算技术研究所 | Social network rumor recognition method and system |
2018
- 2018-04-09: CN CN201810309691.1A, patent CN108830630B, status: active
- 2018-07-27: WO PCT/CN2018/097540, publication WO2019196259A1, status: application filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188194A (en) * | 2019-04-26 | 2019-08-30 | 哈尔滨工业大学(深圳) | A kind of pseudo event detection method and system based on multi-task learning model |
CN110750735A (en) * | 2019-10-23 | 2020-02-04 | 腾讯科技(深圳)有限公司 | False event identification method, device, equipment and storage medium based on block chain network |
CN110750735B (en) * | 2019-10-23 | 2024-07-16 | 腾讯科技(深圳)有限公司 | False event identification method, device, equipment and storage medium based on blockchain network |
TWI731469B (en) * | 2019-11-11 | 2021-06-21 | 財團法人資訊工業策進會 | Apparatus and method for verfication of information |
CN111428151A (en) * | 2020-04-20 | 2020-07-17 | 浙江工业大学 | False message identification method and device based on network acceleration |
CN111428151B (en) * | 2020-04-20 | 2022-05-17 | 浙江工业大学 | False message identification method and device based on network acceleration |
CN111831790A (en) * | 2020-06-23 | 2020-10-27 | 广东工业大学 | False news identification method based on low threshold integration and text content matching |
CN111831790B (en) * | 2020-06-23 | 2023-07-14 | 广东工业大学 | False news identification method based on low threshold integration and text content matching |
Also Published As
Publication number | Publication date |
---|---|
WO2019196259A1 (en) | 2019-10-17 |
CN108830630B (en) | 2020-04-10 |
Similar Documents
Publication | Title |
---|---|
CN108830630B (en) | False message identification method and equipment | |
CN110162593B (en) | Search result processing and similarity model training method and device | |
CN109101620B (en) | Similarity calculation method, clustering method, device, storage medium and electronic equipment | |
US20210182611A1 (en) | Training data acquisition method and device, server and storage medium | |
CN108427708B (en) | Data processing method, data processing apparatus, storage medium, and electronic apparatus | |
CN105022754B (en) | Object classification method and device based on social network | |
WO2019200782A1 (en) | Sample data classification method, model training method, electronic device and storage medium | |
CN108021651B (en) | Network public opinion risk assessment method and device | |
CN112650923A (en) | Public opinion processing method and device for news events, storage medium and computer equipment | |
CN108595688A (en) | Across the media Hash search methods of potential applications based on on-line study | |
CN109492217B (en) | Word segmentation method based on machine learning and terminal equipment | |
WO2022247955A1 (en) | Abnormal account identification method, apparatus and device, and storage medium | |
CN109918498B (en) | Problem warehousing method and device | |
CN111178077A (en) | Corpus generation method, corpus generation device and intelligent device | |
EP2786221A2 (en) | Classifying attribute data intervals | |
CN112131322B (en) | Time sequence classification method and device | |
CN110135681A (en) | Risk subscribers recognition methods, device, readable storage medium storing program for executing and terminal device | |
CN112307860A (en) | Image recognition model training method and device and image recognition method and device | |
CN110909125A (en) | Media rumor detection method for shoji society | |
CN111428151B (en) | False message identification method and device based on network acceleration | |
CN114416998A (en) | Text label identification method and device, electronic equipment and storage medium | |
CN116881430A (en) | Industrial chain identification method and device, electronic equipment and readable storage medium | |
CN114925286A (en) | Public opinion data processing method and device | |
CN111523586A (en) | Noise-aware-based full-network supervision target detection method | |
CN114896977A (en) | Dynamic evaluation method for entity service trust value of Internet of things |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||