WO2019196259A1 - Method for identifying false message and device thereof - Google Patents

Method for identifying false message and device thereof

Info

Publication number
WO2019196259A1
Authority
WO
WIPO (PCT)
Prior art keywords
propagation
user
text
carrier
matrix
Prior art date
Application number
PCT/CN2018/097540
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
黄章成
吴天博
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019196259A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/265 Personal security, identity or safety

Definitions

  • The present application belongs to the field of information processing technologies and, in particular, relates to a method for identifying a false message and a device thereof.
  • A false message is a message fabricated without any factual basis. False messages can mislead public opinion and lead people to make wrong choices. In financial investment in particular, false messages may cause investors to make wrong investment decisions or even to panic, which creates chaos in the market and increases the risk of financial loss for users. Accurately identifying whether a target message is false is therefore of great significance.
  • Existing techniques for identifying false messages require investigating matters related to the target message in order to determine whether it is false.
  • This approach requires a great deal of manpower to follow up leads, especially when the investigation covers multiple locations that are not in the investigator's own area; it therefore consumes a large amount of time and labor, and its recognition efficiency is low.
  • The embodiments of the present application provide a method for identifying a false message and a device thereof, to solve the problem that existing methods for identifying false messages require a large amount of time and labor and have low recognition efficiency.
  • a first aspect of the embodiment of the present application provides a method for identifying a fake message, including:
  • each element included in the user propagation matrix is specifically a number of carrier texts propagated by each of the propagation users;
  • the target message is identified as a false message.
  • The embodiments of the present application acquire all carrier texts containing the target message together with the propagation path of each carrier text, obtain a text matrix of each carrier text from the carrier text and the identifiers of the propagating users included in its propagation path, and use the multiple text matrices to characterize the target message. An authenticity index of the target message is then calculated and used to identify whether the target message is a false message.
  • This embodiment does not require manual investigation and evidence collection, which reduces the labor cost and the time the investigation takes. It analyzes both the text features of the carrier texts of the target message and the user features of each propagating user who spreads it: the text feature vector can indicate whether the target message has inflammatory characteristics, and the user feature vector can indicate whether the target message spread in bursts during propagation.
  • From these two aspects the authenticity index of the target message is obtained, which identifies whether the target message is a false message and improves the accuracy of false message identification.
  • FIG. 1 is a flowchart of an implementation of a method for identifying a false message according to a first embodiment of the present application;
  • FIG. 2 is a flowchart of a specific implementation of step S103 of the method for identifying a false message according to a second embodiment of the present application;
  • FIG. 3 is a flowchart of a specific implementation of step S105 of the method for identifying a false message according to a third embodiment of the present application;
  • FIG. 4 is a flowchart of a specific implementation of a method for identifying a false message according to a fourth embodiment of the present application;
  • FIG. 4b is a block diagram of the authenticity index calculation model provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of a specific implementation of a method for identifying a false message according to a fourth embodiment of the present application;
  • FIG. 6 is a structural block diagram of a device for identifying a false message according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a device for identifying a false message according to another embodiment of the present application.
  • In the embodiments of the present application, the process is executed by a device for identifying false messages (the identification device).
  • The identification device includes, but is not limited to, a notebook computer, a desktop computer, a server, a tablet computer, or a smartphone.
  • In particular, the identification device may be a server of a network platform, so that propagation parameters such as the forwarding count, the propagation speed, and the propagation path of each text published on the network platform can be obtained.
  • FIG. 1 is a flowchart of an implementation of a method for identifying a fake message according to a first embodiment of the present application, which is described in detail as follows:
  • In S101, a plurality of carrier texts containing a target message, and the propagation path of each carrier text, are acquired; the propagation path includes the identifiers of the propagating users that propagate the carrier text.
  • The target message may be set by the user: when the user needs to determine the authenticity of a certain message, the user may input the content of the target message into the identification device provided in this embodiment, or send a message carrier containing the message, such as an article or a link, to the identification device, which then determines the target message from the message carrier.
  • The identification device may also set a detection period and periodically check the authenticity of the messages propagated on the network platform where it is located. In this case, the identification device collects the carrier texts on the network platform once per preset detection period, extracts target messages from the carrier texts propagated on the platform based on a preset target message extraction condition, and then performs the operations of S101.
  • The preset target message extraction condition may be: extract text keywords from each carrier text with a semantic recognition algorithm and count the number of occurrences of the same text keyword across the carrier texts; if the number of occurrences of a text keyword is greater than a preset count threshold, the message corresponding to that text keyword is determined to be a target message.
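  • The extraction condition above can be sketched as follows. This is a minimal illustration, assuming keyword extraction has already been performed by some semantic recognition step; the function name `extract_target_keywords` and the sample data are hypothetical and not part of the application.

```python
from collections import Counter

def extract_target_keywords(carrier_keywords, count_threshold):
    """Return keywords whose total occurrence count exceeds the threshold.

    carrier_keywords: one list of already-extracted keywords per carrier text
    (the semantic recognition step itself is assumed to happen elsewhere).
    """
    counts = Counter()
    for keywords in carrier_keywords:
        counts.update(keywords)
    return sorted(k for k, n in counts.items() if n > count_threshold)

# Hypothetical keywords extracted from three carrier texts.
texts = [
    ["merger", "bank", "merger"],
    ["merger", "dividend"],
    ["bank", "merger"],
]
print(extract_target_keywords(texts, 3))  # ['merger']
```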
  • The propagation of a message depends on carriers such as articles, comments, and chat records; a text that carries the target message is a carrier text as referred to above.
  • The identification device may check whether each text contains the target message. If a text on the network platform contains the target message, that text is identified as a carrier text.
  • A false message is time-limited: its burst of propagation usually falls within a short window, such as a week or ten days, and it is unlikely that a false message has been spreading for a long time, for example a year, without being discovered.
  • Therefore, a valid time range is set: only texts that were created within the valid time range and contain the target message are recognized as carrier texts, and texts created outside the valid time range are not examined. This improves processing efficiency and filters out a large number of invalid texts.
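  • A minimal sketch of this carrier-text selection, assuming each platform text is a dictionary with hypothetical `id`, `content`, and `created` fields and that containment is tested with a plain substring match (the application does not specify the matching method):

```python
from datetime import datetime

def select_carrier_texts(texts, target_message, start, end):
    """Keep texts that contain the target message and were created in [start, end]."""
    return [
        t for t in texts
        if start <= t["created"] <= end and target_message in t["content"]
    ]

# Made-up platform texts for illustration only.
platform_texts = [
    {"id": "a", "content": "Breaking: X Corp to be delisted", "created": datetime(2018, 4, 2)},
    {"id": "b", "content": "X Corp to be delisted, sources say", "created": datetime(2017, 1, 5)},
    {"id": "c", "content": "Weather update", "created": datetime(2018, 4, 3)},
]
carriers = select_carrier_texts(
    platform_texts, "X Corp to be delisted",
    datetime(2018, 4, 1), datetime(2018, 4, 10),
)
print([t["id"] for t in carriers])  # ['a']
```

    Text "b" contains the target message but falls outside the valid time range, so it is never examined further.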
  • The identification device obtains the propagation path of each carrier text. The propagation path is the path along which the carrier text flows between the propagating users on the network platform, so it includes the identifiers of the propagating users that propagate the carrier text.
  • The identifier of a propagating user may be the user name, the user account, or the user information of the propagating user.
  • The user information of the propagating user may be used as the identifier. The same real-world user can register several different user accounts on the network platform and hold several user names, so different user names or user accounts may correspond to the same entity. User information such as an ID number is unique, so the same user information always corresponds to the same entity, which avoids this problem and improves the efficiency of false message identification.
  • In S102, a text matrix of each carrier text is obtained based on the carrier text and the identifiers of the propagating users.
  • The same propagating user can propagate several carrier texts about the target message, and the same carrier text can be propagated by several different propagating users. Therefore, to determine accurately how the target message has spread, the identification device determines, from the propagation path of each carrier text, the user identifiers of all propagating users that propagated it, and constructs a text matrix for each carrier text from those identifiers.
  • In addition to the identifiers of the users who propagated the carrier text, the text matrix may include information about the text content of the carrier text.
  • In this case, the identification device performs keyword extraction on the carrier text to determine the keywords it contains. The extracted keywords are keywords associated with the target message: after identifying the target message, the identification device determines candidate keywords associated with it and determines which of the candidate keywords appear in the carrier text. It then determines a content feature parameter of the carrier text from the candidate keywords found, and constructs the text matrix of the carrier text from the content feature parameter and the identifiers of the propagating users.
  • In S103, each text matrix is imported into a preset feature vector calculation model to obtain the text feature vector of the target message.
  • Because an important feature of a false message is that it spreads explosively and widely, a carrier text containing the false message also has these two features.
  • The text matrix generated for each carrier text from the identifiers of its propagating users characterizes the carrier text from the perspective of user propagation and shows whether its spread is explosive and extensive; if so, the carrier text is very likely to carry a false message. Since a carrier text may contain several different messages, the text matrices of all carrier texts need to be examined to determine whether the burst propagation is caused by the target message. Therefore, after the text matrix of each carrier text is generated, the identification device imports each text matrix into the preset feature vector calculation model and determines the text feature vector of the target message, which serves as one of the reference parameters for identifying the authenticity of the target message.
  • S102 and S103 calculate the text feature vector of the target message, and S104 and S105 calculate the user propagation feature vector of the target message. The identification device may first execute S102 and S103 and then S104 and S105, or execute S104 and S105 first and then S102 and S103.
  • If the identification device can run two threads concurrently, the operations of S102 and S104 can be performed simultaneously.
  • In S104, a user propagation matrix for the target message is generated from the propagation paths of all the carrier texts; each element of the user propagation matrix is the number of carrier texts propagated by one of the propagating users.
  • One propagating user may propagate several carrier texts containing the target message. To determine how many carrier texts each propagating user has propagated, the number of carrier texts propagated by each user is counted from the propagation path of each carrier text, which yields the user propagation matrix corresponding to the target message.
  • A false message is generally spread by its originator, i.e. the rumor-monger, who consciously and continuously disseminates carrier texts of the false message; that is, the carrier texts propagated by the rumor-monger account for a large proportion of the total number of carrier texts.
  • An ordinary propagating user who is not the rumor-monger propagates only a limited number of carrier texts, which is scattered propagation behavior. The user propagation matrix can therefore reflect whether a rumor-monger is maliciously spreading the target message, and thus help judge whether the target message is false.
  • The identification device may create a propagating-user network graph and draw the propagation path of each carrier text on it; each time a propagation path passes through a propagating user in the graph, the count of carrier texts propagated by that user is incremented by 1. After all propagation paths have been drawn, the number of carrier texts propagated by each propagating user is known, and the user propagation matrix is generated.
  • The order of the propagating users in the user propagation matrix is consistent with their order on the propagation path. That is, if a propagating user is the author of the carrier text, i.e. the first propagator, that user's position in the user propagation matrix is 1, and so on. If several users have the same propagation order, they may be sorted again by the number of carrier texts they propagated, and the numbers of carrier texts propagated by the users of the same propagation order may be stored as an array that serves as the element at that position in the user propagation matrix.
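  • The counting and ordering described above can be sketched as follows. This is only one reading of the text: each propagation path is assumed to be a list of user identifiers in propagation order (author first), users are ordered by their earliest position on any path and then by descending count, and the function name and sample paths are hypothetical.

```python
from collections import Counter

def user_propagation_matrix(propagation_paths):
    """Count, per propagating user, how many carrier texts that user propagated.

    propagation_paths: one list of user identifiers per carrier text, in
    propagation order (author first). Users are ordered by their earliest
    position on any path, then by descending count (an assumed tie-break).
    """
    counts = Counter()
    first_position = {}
    for path in propagation_paths:
        for position, user in enumerate(path):
            first_position[user] = min(position, first_position.get(user, position))
        counts.update(set(path))  # a user counts once per carrier text
    users = sorted(counts, key=lambda u: (first_position[u], -counts[u], u))
    return users, [counts[u] for u in users]

# Three carrier texts; "u1" authored all of them.
paths = [
    ["u1", "u2", "u3"],
    ["u1", "u4"],
    ["u1", "u2"],
]
users, matrix = user_propagation_matrix(paths)
print(users, matrix)  # ['u1', 'u2', 'u4', 'u3'] [3, 2, 1, 1]
```

    In this made-up example "u1" accounts for every carrier text, which is the concentrated pattern the text associates with a rumor-monger.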
  • In S105, the user propagation matrix is imported into a preset user feature calculation model to obtain the user propagation feature vector corresponding to the target message.
  • The user propagation matrix reveals how the target message propagates among the propagating users.
  • The identification device imports the user propagation matrix of the target message into the user feature calculation model to determine the user propagation feature vector of the target message. This vector shows whether the propagation of the target message among users conforms to the user propagation characteristics of a false message, so it can serve as one of the reference parameters for the subsequent calculation of the authenticity index.
  • An authenticity index of the target message is calculated from the user propagation feature vector and the text feature vector.
  • After determining the two feature vectors of the target message, the identification device calculates its authenticity index.
  • The calculation may be performed as follows: the user propagation feature vector and the text feature vector are imported into a preset authenticity index calculation model, and the output of the model is the authenticity index of the target message.
  • The authenticity index calculation model may be a neural network.
  • In this case, an administrator generates the corresponding user propagation feature vectors and text feature vectors from training messages, imports them into the neural network used to calculate the authenticity index, and adjusts the parameters of the neural network to minimize the value of its loss function. The adjusted neural network is then used as the authenticity index calculation model.
  • The loss function of the neural network is specifically:
  • where L_j is the actual authenticity index of the j-th training message; the regularization term is preset; the predicted authenticity index is the value obtained after the user propagation feature vector and the text feature vector of the training message are imported into the authenticity index calculation model; and N is the total number of training messages.
  • Alternatively, each parameter value in the text feature vector and the user propagation feature vector may be compared with a preset false parameter range; the number of parameter values that fall within the false parameter range is counted, and that number is used as the authenticity index of the target message.
  • The authenticity index can be used to characterize how similar the target message is to a false message.
  • The identification device is preset with a false index range. If the authenticity index calculated for a target message falls within the false index range, the target message matches the characteristics of a false message in both its text features and its propagating-user features, and it is therefore identified as a false message; conversely, if the authenticity index falls outside the false index range, the target message does not match the characteristics of a false message and is identified as a true message.
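  • The count-based index and the range check can be sketched together. This is a minimal illustration that assumes the feature vectors are flat lists of numbers and that both ranges are closed intervals; the function names and all numeric values are hypothetical.

```python
def count_based_index(text_vector, user_vector, false_range):
    """Count how many parameter values fall inside the preset false parameter range."""
    low, high = false_range
    return sum(1 for v in list(text_vector) + list(user_vector) if low <= v <= high)

def is_false_message(index, false_index_range):
    """A message is identified as false when its index lies in the false index range."""
    low, high = false_index_range
    return low <= index <= high

# Made-up feature values and ranges, for illustration only.
index = count_based_index([0.9, 0.2, 0.8], [0.7, 0.95], false_range=(0.6, 1.0))
print(index, is_false_message(index, false_index_range=(3, 5)))  # 4 True
```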
  • As described above, the method for identifying a false message acquires all carrier texts containing the target message together with the propagation path of each carrier text, obtains a text matrix of each carrier text from the carrier text and the identifiers of the propagating users included in its propagation path, and obtains the text feature vector of the target message from the multiple text matrices. At the same time, it obtains a user propagation matrix from the propagation paths of the carrier texts and calculates the user propagation feature vector of the target message. Finally, the authenticity index of the target message is calculated from the user propagation feature vector and the text feature vector and is used to identify whether the target message is a false message.
  • This embodiment does not require manual investigation and evidence collection, which reduces the labor cost and the time the investigation takes. It analyzes both the text features of the carrier texts of the target message and the user features of each propagating user who spreads it: the text feature vector can indicate whether the target message has inflammatory characteristics, and the user feature vector can indicate whether the target message spread in bursts during propagation.
  • From these two aspects the authenticity index of the target message is obtained, which identifies whether the target message is a false message and improves the accuracy of false message identification.
  • FIG. 2 is a flowchart of a specific implementation of step S103 of the method for identifying a false message according to the second embodiment of the present application.
  • S103 includes S1031 to S1034, detailed as follows:
  • In addition to the propagation path of each carrier text, the identification device acquires the number of propagations, the content feature parameter, and the propagation time parameter of the carrier text, so that the authenticity of the carrier text can be judged from multiple aspects.
  • The number of propagations includes not only the number of times users forwarded the carrier text but also the number of times users commented on it and the number of times it was liked, i.e. the counts of all behaviors that contribute to the propagation of the carrier text.
  • The content feature parameter represents the content that the carrier text is meant to express. It may be extracted as described in S102: the keywords contained in the carrier text are determined, and the content feature parameter of the carrier text is then determined from the extracted keywords.
  • The propagation time parameter includes, but is not limited to, at least one of the following: the creation time of the carrier text, the average propagation interval, and the total propagation duration.
  • The carrier texts are sorted based on the propagation time parameter, and the import order of each carrier text is determined.
  • Because this embodiment uses a multi-layer feedback recurrent neural network to determine the text feature vector of the target message, the order in which each carrier text is imported into the network, i.e. the recurrent level at which it is placed, must be determined in advance. If the number of levels of the multi-layer feedback recurrent neural network is greater than the number of carrier texts, the number of levels is reduced during the import operation to match the number of carrier texts.
  • The identification device determines the import order of the carrier texts according to the propagation time parameter, and the way the order is determined depends on which parameter the propagation time parameter contains. For example, if the propagation time parameter is the creation time of the carrier text, the import order may follow the order of creation times; if the propagation time parameter is the total propagation duration, the import order may follow the length of the total propagation duration.
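  • A minimal sketch of determining the import order, assuming each carrier text is a dictionary whose hypothetical `created` and `total_duration` fields hold the propagation time parameters and that sorting is ascending (the text does not state the direction):

```python
def import_order(carrier_texts, key="created"):
    """Return carrier text ids sorted by the chosen propagation time parameter."""
    return [t["id"] for t in sorted(carrier_texts, key=lambda t: t[key])]

# Made-up carrier texts; times are arbitrary units.
carrier_texts = [
    {"id": "a", "created": 3, "total_duration": 40},
    {"id": "b", "created": 1, "total_duration": 90},
    {"id": "c", "created": 2, "total_duration": 10},
]
print(import_order(carrier_texts))                    # ['b', 'c', 'a']
print(import_order(carrier_texts, "total_duration"))  # ['c', 'a', 'b']
```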
  • The number of propagations, the content feature parameter, the propagation time parameter, and the text matrix are imported into a text timing vector conversion model to obtain the text timing vector of each carrier text;
  • the text timing vector conversion model is specifically:
  • The identification device first constructs the text feature matrix of the carrier text, i.e. the above-mentioned x_t, from the number of propagations, the content feature parameter, the propagation time parameter, and the text matrix. It may do so by adding three rows to the text matrix to store the three feature quantities: the number of propagations, the content feature parameter, and the propagation time parameter. If the text matrix is an n-dimensional matrix, the corresponding text feature matrix is an (n+3)-dimensional matrix.
  • Because the multi-layer recurrent neural network is a neural network with timing relationships, the text feature matrices must be converted into a time-series form, i.e. the text timing vector of each carrier text must be determined.
  • The tanh function is used in this embodiment because it has good nonlinearity and matches timing characteristics well. The identification device therefore imports the text feature matrix into the tanh function to determine the text timing vector corresponding to each carrier text.
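  • The construction of the text feature matrix and its tanh conversion can be sketched as follows. The exact conversion formula is not reproduced in the text, so this sketch assumes an element-wise form tanh(w * x + b) with hypothetical weights and bias, treats the text matrix as a flat list of n numbers, and uses made-up feature values.

```python
import math

def text_feature_matrix(text_matrix, propagation_count, content_feature, time_param):
    """Append the three extra feature quantities to an n-element text matrix."""
    return list(text_matrix) + [propagation_count, content_feature, time_param]

def text_timing_vector(x_t, weights, bias):
    """Element-wise tanh(w * x + b); the weights and bias are hypothetical."""
    return [math.tanh(w * x + bias) for w, x in zip(weights, x_t)]

# n = 8 elements from the text matrix plus 3 feature quantities gives n + 3 = 11.
x_t = text_feature_matrix([5, 0, 0, 5, 0, 7, 5, 6], 120, 0.8, 36.0)
h_t = text_timing_vector(x_t, [0.01] * len(x_t), 0.0)
print(len(x_t))  # 11
```

    Because tanh is bounded, every element of the timing vector lies strictly between -1 and 1 regardless of the scale of the raw features.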
  • The identification device imports the text timing vectors of the carrier texts, in the import order of the carrier texts, into successive levels of the multi-layer feedback recurrent neural network, with the output of each level serving as input to the next level. In this way the timing characteristics of the carrier texts are superimposed step by step, so the resulting text feature vector reflects the accumulated influence of every carrier text and fully integrates their text features.
  • The identification device uses the output of the last level of the recurrent neural network as the text feature vector of the target message. Note that before using the multi-layer recurrent neural network, the identification device adjusts its number of levels according to the number of carrier texts of the target message, so that the number of levels matches the number of carrier texts.
  • In this embodiment, the text timing vector of each carrier text is determined from several parameter values collected for the carrier text, and the text feature vector of the target message is calculated with the multi-layer recurrent neural network, which enriches the text characteristics captured by the text feature vector and thereby improves the accuracy of false message identification.
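  • The level-by-level superposition can be illustrated with a deliberately simplified recurrent update. This is not the network of the application: it assumes one level per carrier text, an element-wise update state = tanh(w_in * x + w_state * state), and hypothetical scalar weights standing in for trained parameters.

```python
import math

def recurrent_text_feature(timing_vectors, w_in=0.5, w_state=0.5):
    """Fold the text timing vectors, in import order, into one feature vector.

    Each level combines its input with the previous level's output:
    state = tanh(w_in * x + w_state * state). The scalar weights are
    hypothetical stand-ins for trained parameters.
    """
    if not timing_vectors:
        return []
    state = [0.0] * len(timing_vectors[0])
    for x in timing_vectors:  # one level per carrier text
        state = [math.tanh(w_in * xi + w_state * si) for xi, si in zip(x, state)]
    return state  # output of the last level

# Three made-up text timing vectors, already in import order.
vectors = [[0.1, 0.0, -0.2], [0.3, 0.0, 0.1], [0.0, 0.0, 0.4]]
feature = recurrent_text_feature(vectors)
print(len(feature))  # 3
```

    The number of loop iterations equals the number of carrier texts, which mirrors the requirement that the number of levels match the number of carrier texts.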
  • FIG. 3 is a flowchart of a specific implementation of step S105 of the method for identifying a false message according to the third embodiment of the present application.
  • In this embodiment, S105 includes S1051 to S1055, detailed as follows:
  • The user propagation matrix is subjected to singular value decomposition to obtain the user propagation coefficient of each propagating user.
  • Because the user propagation matrix is a global matrix covering all propagating users, singular value decomposition must be performed on it to determine the user propagation coefficient of each propagating user, and thus the contribution of each user to the propagation of the target message. Specifically, if the user propagation matrix is a 1*N matrix, the diagonal matrix of the singular value decomposition is a regular matrix of 1*1, which can be decomposed into N 1*1 matrices, each identified as the user propagation coefficient of one propagating user.
  • Each user propagation coefficient is imported into the user feature vector conversion model to determine the user feature vector of each propagating user;
  • the user feature vector conversion model is specifically:
  • s_i is the user feature vector of the i-th propagating user;
  • y_i is the user propagation coefficient of the i-th propagating user;
  • W_u, b_u, and b_s are preset coefficients of the user feature vector conversion model;
  • e is the base of the natural logarithm.
  • The identification device first performs a time-domain conversion on the calculated user propagation coefficient of each propagating user to obtain the user timing vector of that user. As described above, the nonlinearity of the tanh function matches timing characteristics well, so the tanh function is also used for the time-domain conversion of the user propagation coefficients in S1052; the preset coefficients W_u and b_u adjust the conversion to meet the needs of the user feature vector.
  • After determining the user timing vector of each propagating user, the identification device applies the sigmoid function to determine the user feature vector corresponding to each user timing vector, where b_s is a preset parameter value.
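  • The two-stage conversion described above (tanh followed by a sigmoid) can be sketched as follows. The formula itself is not reproduced in the text, so this is an assumed reading: the coefficients are treated as scalars, and the weight inside the sigmoid, here called `w_s`, is a hypothetical name for the coefficient that accompanies b_s.

```python
import math

def user_feature(y_i, w_u, b_u, w_s, b_s):
    """Map a user propagation coefficient to a user feature value in (0, 1)."""
    timing = math.tanh(w_u * y_i + b_u)                     # time-domain conversion
    return 1.0 / (1.0 + math.exp(-(w_s * timing + b_s)))    # sigmoid

# Made-up propagation coefficients and coefficients of the conversion model.
coefficients = [0.0, 1.5, 4.0]
features = [user_feature(y, w_u=0.8, b_u=0.0, w_s=2.0, b_s=0.0) for y in coefficients]
print([round(s, 3) for s in features])
```

    With these illustrative settings a larger propagation coefficient yields a larger feature value, and a coefficient of zero maps to exactly 0.5.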
  • a user feature matrix is generated based on user feature vectors of each of the propagation users.
  • the user property of each propagation user can be determined from its user feature vector, for example, whether the user is a rumor user or a general propagation user. Through the user feature matrix formed by the user feature vectors, the user properties of all users that propagated the target message can be determined intuitively, which improves the efficiency of identifying whether the target message is a fake message.
  • if the target message is mainly propagated by rumor users, the probability that the target message is a false message is high.
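The two-stage transformation described above (tanh, then sigmoid) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes scalar coefficients, and the names `w_u`, `b_u`, `w_s`, `b_s` and their default values are hypothetical (the text names W u , b u and b s only).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def user_feature(y: float, w_u: float, b_u: float, w_s: float, b_s: float) -> float:
    """Map one user propagation coefficient y to a user feature in (0, 1)."""
    timing = math.tanh(w_u * y + b_u)   # time-domain transformation
    return sigmoid(w_s * timing + b_s)  # squashing step


def user_feature_matrix(coeffs, w_u=1.0, b_u=0.0, w_s=1.0, b_s=0.0):
    """Collect the feature of every propagation user into one list.

    The default coefficient values are placeholders, not values from the source.
    """
    return [user_feature(y, w_u, b_u, w_s, b_s) for y in coeffs]


# Hypothetical user propagation coefficients for three propagation users.
features = user_feature_matrix([0.0, 2.0, 5.0])
```

With these placeholder coefficients, a larger propagation coefficient yields a larger feature value, and every value stays between 0 and 1.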
  • a mask vector of each of the carrier texts is obtained according to a text matrix, and the mask vector and the user feature matrix are imported into a user propagation feature value calculation model to determine the user propagation feature value of each of the carrier texts.
  • the user propagation feature value calculation model is specifically:
  • [s i ] is the user feature matrix
  • m j is the mask vector of the j-th carrier text
  • p j is the user propagation feature value of the j-th carrier text
  • d([s i ] *m j ) is a non-empty element statistical function.
  • the text matrix is generated based on the identifiers of the propagation users: if the i-th element in the text matrix is non-empty, the i-th user has propagated the carrier text. Therefore, to determine the user propagation feature value of each carrier text, it is first necessary to determine which users have propagated the carrier text, that is, to generate the mask vector described above. For example, if the text matrix of a certain carrier text is [5, 0, 0, 5, 0, 7, 5, 6], five propagation users have propagated the carrier text, so the corresponding mask vector is [1, 0, 0, 1, 0, 1, 1, 1]. The mask vector is then used to extract from the user feature matrix the user feature vector of each propagation user associated with the carrier text, that is, to obtain [s i ]*m j .
  • after identifying the propagation users that contributed to the propagation of the carrier text, the identification device calculates the mean of their user feature vectors. To do so, it counts the number of non-empty elements in [s i ]*m j with the function d([s i ]*m j ), so that the calculated user propagation feature value is the mean of the selected user feature vectors.
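The masking and averaging steps above can be sketched as follows, using the example text matrix from the description. The user feature values are made up for illustration, and each user feature is simplified to a single number; the function names are hypothetical.

```python
def mask_vector(text_matrix):
    """1 where a propagation user propagated the carrier text, else 0."""
    return [1 if element != 0 else 0 for element in text_matrix]


def user_propagation_feature_value(user_features, mask):
    """Mean of the user features selected by the mask (0.0 if none is selected)."""
    selected = [s * m for s, m in zip(user_features, mask)]
    # Counting the 1s in the mask plays the role of d([s_i] * m_j); the two
    # counts agree as long as no selected user feature is exactly zero.
    count = sum(mask)
    return sum(selected) / count if count else 0.0


text_matrix = [5, 0, 0, 5, 0, 7, 5, 6]  # example from the description
mask = mask_vector(text_matrix)

# Hypothetical user features, one per propagation user.
user_features = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7, 0.6, 0.5]
p_j = user_propagation_feature_value(user_features, mask)
```

Here five users are selected, so `p_j` is the mean of their five feature values.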
  • a user propagation feature vector of the target message is generated according to each of the user propagation feature values.
  • the identification device aggregates all the user propagation feature values to form a user propagation feature vector corresponding to the target message.
  • the user feature vector of each propagation user is calculated, and the average user feature vector of each carrier text, that is, the user propagation feature value, is determined based on the user feature vectors, so that the user propagation feature vector contains not only the user features but also the propagation characteristics of the carrier texts, thereby improving the accuracy of false message recognition.
  • FIG. 4a is a flowchart showing a specific implementation of step S106 of the method for identifying a fake message according to the fourth embodiment of the present application.
  • in the method for identifying a fake message provided by this embodiment, calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector (S106) includes S1061 to S1062, which are described in detail as follows:
  • the calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector includes:
  • the user propagation feature vector and the text feature vector are aggregated to obtain an authenticity recognition matrix of the target message.
  • the identification device performs an aggregation operation on the two vectors to form an authenticity recognition matrix that includes both types of features. Specifically, if the user propagation feature vector is an n 1 *m 1 matrix and the text feature vector is an n 2 *m 2 matrix, the authenticity recognition matrix obtained by the aggregation has size (n 1 +n 2 )*max(m 1 , m 2 ). If the aggregated authenticity recognition matrix has blank elements, they can be filled with a preset character; preferably, the preset character is 0.
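The aggregation rule above, stacking the rows and padding to the wider of the two inputs, can be sketched as follows. The function name and the input values are hypothetical.

```python
def aggregate(user_propagation, text_feature, fill=0):
    """Stack two lists of rows and pad the shorter rows with `fill`."""
    rows = [list(r) for r in user_propagation] + [list(r) for r in text_feature]
    width = max(len(r) for r in rows)
    return [r + [fill] * (width - len(r)) for r in rows]


user_propagation = [[0.7, 0.4, 0.9]]      # n1 = 1, m1 = 3 (made-up values)
text_feature = [[0.2, 0.5], [0.1, 0.3]]   # n2 = 2, m2 = 2 (made-up values)
recognition = aggregate(user_propagation, text_feature)
```

The result has n1 + n2 = 3 rows and max(m1, m2) = 3 columns, with the blank positions filled by 0.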
  • the authenticity recognition matrix is introduced into the authenticity index calculation model to obtain an authenticity index of the target message;
  • the authenticity index calculation model is specifically:
  • the output of the model is the authenticity index; [c j ] is the authenticity recognition matrix;
  • b c is a preset coefficient of the authenticity index calculation model;
  • e is the base of the natural logarithm.
  • the authenticity recognition matrix is imported into the authenticity index calculation model, which is specifically a sigmoid function. Here, b c is a preset coefficient of the authenticity index calculation model, which can be determined through training and learning, or adjusted manually according to the needs of the administrator.
  • FIG. 4b shows a calculation block diagram of an authenticity index calculation model provided by an embodiment of the present application.
  • [c j ] is the authenticity recognition matrix
  • [p j ] is a user propagation feature vector.
  • the text feature vector and the user propagation feature vector are aggregated to obtain an authenticity recognition matrix, so that two parameters are integrated into one, the number of calculations is reduced, and the efficiency of calculating the authenticity index is improved.
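The final scoring step can be sketched as follows. The text states only that the model is a sigmoid with a preset coefficient b c ; the weighted sum over the matrix elements, the `weights` argument, and the bounds of the false index range are all assumptions made for illustration.

```python
import math


def authenticity_index(recognition_matrix, weights, b_c):
    """Sigmoid of a weighted sum over all matrix elements plus the bias b_c."""
    z = b_c
    for row, w_row in zip(recognition_matrix, weights):
        z += sum(c * w for c, w in zip(row, w_row))
    return 1.0 / (1.0 + math.exp(-z))


def is_false_message(index, false_range=(0.5, 1.0)):
    """True if the index lies inside the preset false index range.

    The default range is a placeholder, not a value from the source.
    """
    low, high = false_range
    return low <= index <= high


matrix = [[0.7, 0.4, 0.9], [0.2, 0.5, 0.0]]    # made-up recognition matrix
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 0.0]]   # hypothetical weights
index = authenticity_index(matrix, weights, b_c=-1.0)
flagged = is_false_message(index)
```

In practice the weights and b c would come from training, and the false index range would be set by the administrator, as the description notes for b c .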
  • FIG. 5 is a flowchart showing a specific implementation of step S102 of the method for identifying a fake message according to the fifth embodiment of the present application.
  • in the method for identifying a fake message provided by this embodiment, S102 further includes S1021 and S1022, which are described in detail as follows:
  • the method further includes:
  • a global propagation matrix [a ij ] n×m of the target message is constructed based on the carrier texts and the identifiers of the propagation users, where a ij is the propagation flag value of the i-th propagation user for the j-th carrier text, n is the number of propagation users, and m is the number of carrier texts;
  • the identification device may determine each of the propagation users that propagated a carrier text, and generate a sequence based on the user numbers of those propagation users. The propagation users of each carrier text are counted in this way, so as to obtain the global propagation matrix of the target message.
  • the global propagation matrix can be used to determine how the target message propagates on the network platform: dividing it by columns yields the propagation information of each carrier text, and dividing it by rows yields the propagation information of each propagation user.
  • a ij is the propagation flag value of the i-th propagation user for the j-th carrier text. Specifically, if the i-th propagation user has propagated the j-th carrier text, the propagation flag value is 1; otherwise, the propagation flag value is 0. This yields a global propagation matrix [a ij ] n×m composed of 1s and 0s, through which the propagation contribution of any propagation user to each carrier text can be looked up.
  • a sub-matrix formed by each column of the global propagation matrix [a ij ] n×m is used as the text matrix of the corresponding carrier text.
  • in the global propagation matrix, the elements of the j-th column indicate which users have propagated the j-th carrier text, so the global propagation matrix [a ij ] n×m can be divided into m sub-matrices, each of which is the text matrix of the corresponding carrier text.
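The construction and column-wise division of the global propagation matrix can be sketched as follows. The user identifiers and propagation paths are invented, and each propagation path is simplified to the list of users that propagated the carrier text. The row sums are included because the user propagation matrix is described as holding the number of carrier texts propagated by each user.

```python
def global_propagation_matrix(paths, users):
    """Build [a_ij]: a_ij = 1 if user i propagated carrier text j, else 0."""
    return [[1 if user in path else 0 for path in paths] for user in users]


def text_matrices(matrix):
    """Split the matrix by column: one text matrix per carrier text."""
    return [list(column) for column in zip(*matrix)]


def user_propagation_matrix(matrix):
    """Number of carrier texts propagated by each user (row sums)."""
    return [sum(row) for row in matrix]


users = ["u1", "u2", "u3"]  # hypothetical user identifiers (n = 3)
# One propagation path per carrier text (m = 4).
paths = [["u1", "u2"], ["u2"], ["u1", "u2", "u3"], ["u3"]]

a = global_propagation_matrix(paths, users)
columns = text_matrices(a)
counts = user_propagation_matrix(a)
```

Each entry of `columns` is the 0/1 text matrix of one carrier text, and `counts` gives one element per propagation user.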
  • FIG. 6 is a structural block diagram of a device for identifying a fake message according to an embodiment of the present application.
  • the units included in the device for identifying a fake message are used to execute the steps in the embodiment corresponding to FIG. 1.
  • For details, please refer to FIG. 1 and the related description in the embodiment corresponding to FIG. 1. For convenience of explanation, only the parts related to the present embodiment are shown.
  • the device for identifying a fake message includes:
  • a target message parameter obtaining unit 61 configured to acquire a plurality of carrier texts including a target message, and a propagation path of each of the carrier texts; the propagation path includes an identifier of a propagation user that propagates the carrier text;
  • a text matrix generating unit 62 configured to obtain a text matrix of each of the carrier texts based on the carrier text and the identifier of the propagating user;
  • a text feature vector generating unit 63 configured to import each of the text matrices into a preset feature vector calculation model to obtain a text feature vector of the target message;
  • a user propagation matrix generating unit 64 configured to generate a user propagation matrix about the target message according to the propagation paths of all the carrier texts; each element included in the user propagation matrix is specifically the number of carrier texts propagated by each of the propagation users;
  • the user propagation feature vector calculation unit 65 is configured to import the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message.
  • the authenticity index calculation unit 66 is configured to calculate an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
  • the false message identifying unit 67 is configured to identify the target message as a fake message if the authenticity index is within a preset false index range.
  • the text feature vector generating unit 63 includes:
  • a text parameter obtaining unit configured to respectively acquire a propagation number, a content feature parameter, and a propagation time parameter of each of the carrier texts;
  • an import order determining unit configured to sort each of the carrier texts based on the propagation time parameter, and determine an import order of each of the carrier texts
  • a text timing vector calculation unit configured to import the number of propagation times, the content feature parameter, the propagation time parameter, and the text matrix into a text time series vector conversion model to obtain a text timing vector of each of the carrier texts;
  • the text time series vector conversion model is specifically:
  • a text feature vector calculation unit configured to import text timing vectors of each of the carrier texts into each layer of the multi-layer feedback loop neural network based on the import order, to obtain a text feature vector of the target message;
  • the feedback cyclic neural network is specifically:
  • the user propagation feature vector calculation unit 65 includes:
  • a propagation coefficient determining unit configured to perform singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each of the propagation users
  • a user feature vector calculation unit configured to import each of the user propagation coefficients into a propagation feature vector transformation model to determine a user feature vector of each of the propagation users;
  • the user feature vector transformation model is specifically:
  • s i is the user feature vector of the i-th propagation user
  • y i is the user propagation coefficient of the i-th propagation user
  • W u , b u and b s are preset coefficients of the user feature vector transformation model
  • e is the base of the natural logarithm
  • a user feature matrix generating unit configured to generate a user feature matrix based on user feature vectors of each of the propagation users
  • a user propagation feature value calculation unit configured to obtain a mask vector of each of the carrier texts according to a text matrix, and import the mask vector and the user feature matrix into a user propagation feature value calculation model to determine the user propagation feature value of each of the carrier texts;
  • the user propagation feature value calculation model is specifically:
  • [s i ] is the user feature matrix
  • m j is the mask vector of the j-th carrier text
  • p j is the user propagation feature value of the j-th carrier text
  • d([s i ] *m j ) is a non-empty element statistical function
  • a user propagation feature vector determining unit configured to generate a user propagation feature vector of the target message according to each of the user propagation feature values.
  • the authenticity index calculation unit 66 includes:
  • An authenticity recognition matrix generating unit configured to aggregate the user propagation feature vector and the text feature vector to obtain an authenticity recognition matrix of the target message
  • the authenticity index calculation unit is configured to import the authenticity recognition matrix into the authenticity index calculation model to obtain an authenticity index of the target message;
  • the authenticity index calculation model is specifically:
  • the output of the model is the authenticity index; [c j ] is the authenticity recognition matrix;
  • b c is a preset coefficient of the authenticity index calculation model;
  • e is the base of the natural logarithm.
  • the text matrix generating unit 62 includes:
  • a global propagation matrix creating unit configured to construct a global propagation matrix [a ij ] n×m of the target message based on the carrier texts and the identifiers of the propagation users, where a ij is the propagation flag value of the i-th propagation user for the j-th carrier text, n is the number of propagation users, and m is the number of carrier texts;
  • a text matrix dividing unit configured to use a sub-matrix formed by each column of the global propagation matrix [a ij ] n×m as the text matrix of the corresponding carrier text.
  • the device for identifying a fake message provided by this embodiment likewise requires no manual investigation or evidence collection, which reduces labor cost and the time required for the investigation. Instead, it collects the text features of the carrier texts that convey the target message and analyzes the user features of each propagation user that has propagated the target message.
  • the text feature vector indicates whether the target message is inflammatory, and the user feature vector indicates whether the target message spread explosively during propagation.
  • from these two feature vectors, the false index of the target message can be obtained, so that whether the target message is a false message can be identified, which improves the recognition accuracy of the fake message.
  • FIG. 7 is a schematic diagram of an apparatus for identifying a fake message according to another embodiment of the present application.
  • the identification device 7 of the fake message of this embodiment includes a processor 70, a memory 71, and computer readable instructions 72 stored in the memory 71 and operable on the processor 70, for example, a program for identifying false messages.
  • the processor 70 executes the computer readable instructions 72 to implement the steps in each of the foregoing embodiments of the method for identifying a fake message, such as S101 to S107 shown in FIG. 1.
  • alternatively, the processor 70, when executing the computer readable instructions 72, implements the functions of the units in the apparatus embodiments described above, such as the functions of the units 61 to 67 shown in FIG. 6.
  • the computer readable instructions 72 may be partitioned into one or more units, the one or more units being stored in the memory 71 and executed by the processor 70 to complete the application.
  • the one or more units may be a series of computer readable instruction segments capable of performing a particular function, which describe the execution of the computer readable instructions 72 in the identification device 7 of the fake message.
  • the computer readable instructions 72 may be segmented into a target message parameter acquisition unit, a text matrix generation unit, a text feature vector generation unit, a user propagation matrix generation unit, a user propagation feature vector calculation unit, an authenticity index calculation unit, and a false message identification unit, each of which has the specific functions described above.

Abstract

The present application applies to the technical field of information processing, and provides a method for identifying a false message and a device thereof. The method comprises: obtaining multiple carrier texts comprising a target message, and propagation paths of the carrier texts; obtaining text matrices of the carrier texts on the basis of the carrier texts and an identifier of a propagation user; importing the text matrices into a preset feature vector calculation model to obtain a text feature vector of the target message; generating a user propagation matrix with respect to the target message according to the propagation paths of all the carrier texts; importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message; calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector; and if the authenticity index falls within a preset authenticity index range, identifying the target message as a false message. According to the present application, there is no need for manual forensic investigation, thereby reducing labor costs and an investigation duration and improving the accuracy of false message identification.

Description

Method for identifying false message and device thereof
This application claims priority to Chinese Patent Application No. 201810309691.1, filed on April 9, 2018 and entitled "A Method for Identifying False Messages and Device Thereof", the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the field of information processing technologies, and in particular relates to a method for identifying a false message and a device thereof.
Background
A false message, or "rumor", is a fabricated message with no factual basis. False messages wrongly influence public opinion and lead people to make wrong choices. In the field of financial investment in particular, false messages may cause investors to make wrong investment choices or even panic, disrupting investment in the market and increasing the risk of financial loss for users. Accurately identifying whether a target message is a false message is therefore of great significance.
Existing techniques for identifying false messages can determine whether a target message is false only after an investigation of the target message has been carried out. This approach requires a great deal of manpower for on-site investigation. In particular, when the target message originates from multiple places that are not in the same region as the investigators, it incurs substantial time and labor costs, and the identification efficiency is low.
Technical Problem
In view of this, the embodiments of the present application provide a method for identifying a false message and a device thereof, to solve the problem that existing methods for identifying false messages incur substantial time and labor costs and have low identification efficiency.
Technical Solution
A first aspect of the embodiments of the present application provides a method for identifying a false message, including:
acquiring a plurality of carrier texts containing a target message, and a propagation path of each of the carrier texts, the propagation path including an identifier of a propagation user that propagated the carrier text;
obtaining a text matrix of each of the carrier texts based on the carrier text and the identifier of the propagation user;
importing each of the text matrices into a preset feature vector calculation model to obtain a text feature vector of the target message;
generating a user propagation matrix about the target message according to the propagation paths of all the carrier texts, each element of the user propagation matrix being the number of carrier texts propagated by each of the propagation users;
importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message;
calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
if the authenticity index is within a preset false index range, identifying the target message as a false message.
Beneficial Effects
In the embodiments of the present application, all carrier texts containing the target message and the propagation path of each carrier text are acquired; a text matrix of each carrier text is obtained from the carrier text and the identifiers of the propagation users contained in the propagation path; and the text feature vector of the target message is obtained from the multiple text matrices. At the same time, a user propagation matrix is obtained from the propagation paths of the carrier texts, from which the user propagation feature vector of the target message is calculated. Finally, the authenticity index of the target message is calculated from the user propagation feature vector and the text feature vector, and whether the target message is a false message is identified by the authenticity index. Compared with existing false message identification techniques, this embodiment requires no manual investigation or evidence collection, which reduces labor cost and the time required for investigation. Instead, it collects the text features of the carrier texts that convey the target message and analyzes the user features of each propagation user that has propagated the target message. The text feature vector indicates whether the target message is inflammatory, and the user feature vector indicates whether the target message spread explosively during propagation. From these two feature vectors, the false index of the target message can be obtained, so that whether the target message is a false message can be identified, which improves the accuracy of false message identification.
Brief Description of the Drawings
FIG. 1 is a flowchart of an implementation of a method for identifying a false message according to the first embodiment of the present application;
FIG. 2 is a flowchart of a specific implementation of step S103 of a method for identifying a false message according to the second embodiment of the present application;
FIG. 3 is a flowchart of a specific implementation of step S105 of a method for identifying a false message according to the third embodiment of the present application;
FIG. 4a is a flowchart of a specific implementation of step S106 of a method for identifying a false message according to the fourth embodiment of the present application;
FIG. 4b is a calculation block diagram of an authenticity index calculation model provided by an embodiment of the present application;
FIG. 5 is a flowchart of a specific implementation of step S102 of a method for identifying a false message according to the fourth embodiment of the present application;
FIG. 6 is a structural block diagram of a device for identifying a false message according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a device for identifying a false message according to another embodiment of the present application.
Embodiments of the Invention
In the embodiments of the present application, the process is executed by a device for identifying false messages. Such a device includes, but is not limited to, a notebook computer, a computer, a server, a tablet computer, or a smartphone. In particular, the device may be a server of a network platform, so that it can obtain various propagation parameters of each text propagated on the platform, such as the forwarding count, propagation speed, and propagation path. FIG. 1 shows a flowchart of an implementation of the method for identifying a false message according to the first embodiment of the present application, detailed as follows:
In S101, a plurality of carrier texts containing a target message, and a propagation path of each of the carrier texts, are acquired; the propagation path includes an identifier of a propagation user that propagated the carrier text.
In this embodiment, the target message may be set by the user. That is, when the user needs to determine the authenticity of a message, the content of the target message may be input into the identification device provided in this embodiment, or a message carrier containing the message, such as an article or a link, may be sent to the identification device, which then determines the target message from the message carrier. Optionally, the identification device may also set a detection period and periodically check the authenticity of the messages propagated on the network platform where it is located. In that case, the identification device collects the carrier texts on the network platform at the preset detection period, extracts the target message from the carrier texts propagated on the platform based on a preset target message extraction condition, and performs the operations of S101.
Optionally, the preset target message extraction condition may be as follows: text keywords are extracted from each carrier text based on a semantic recognition algorithm, and the number of occurrences of the same text keyword across the carrier texts is counted; if the number of occurrences of a text keyword is greater than a preset count threshold, the message corresponding to that text keyword is determined to be the target message.
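The extraction condition above can be sketched as follows. The source does not specify the semantic recognition algorithm, so `extract_keywords` is a deliberately naive, hypothetical stand-in that matches whitespace-separated words against a given vocabulary; the sample texts and threshold are also invented.

```python
from collections import Counter


def extract_keywords(text, vocabulary):
    """Hypothetical stand-in for a semantic recognition algorithm:
    returns the vocabulary words that occur in the text."""
    words = set(text.lower().split())
    return {kw for kw in vocabulary if kw in words}


def target_keywords(carrier_texts, vocabulary, threshold):
    """Keywords whose number of occurrences across the texts exceeds the threshold."""
    counts = Counter()
    for text in carrier_texts:
        counts.update(extract_keywords(text, vocabulary))
    return sorted(kw for kw, n in counts.items() if n > threshold)


texts = [
    "bank default rumor spreads",
    "rumor about bank default",
    "weather is fine",
    "bank opens new branch",
]
hot = target_keywords(texts, {"bank", "default", "rumor", "weather"}, threshold=2)
```

In this sketch a keyword is counted once per carrier text that contains it, which is one possible reading of "number of occurrences".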
In this embodiment, a message propagates through various carriers, for example in text form such as articles, comments, and chat records; a text that carries the target message is a carrier text as described above. After determining the target message, the identification device may check whether each text contains the target message; if a text on the network platform contains the target message, that text is identified as a carrier text. Preferably, a valid time range is set, because false messages are time-limited: the outbreak period of a false message falls within a short time range of about one week to a dozen days, and a false message does not persist for long (it is unlikely, for example, that a false message began spreading a year or more ago without being discovered). To reduce the number of carrier texts the identification device has to process, only a text that contains the target message and whose creation time is within the valid time range is identified as a carrier text, while a text whose creation time is outside the valid time range is not, which improves processing efficiency and filters out a large number of invalid texts.
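The carrier text selection with a valid time range can be sketched as follows. The record layout (`content`, `created`), the 14-day window, and the sample data are assumptions for illustration; the source only says the outbreak period is about one week to a dozen days.

```python
from datetime import datetime, timedelta


def select_carrier_texts(texts, target, now, window=timedelta(days=14)):
    """Keep texts that contain the target message and were created within the window."""
    earliest = now - window
    return [
        t for t in texts
        if target in t["content"] and earliest <= t["created"] <= now
    ]


now = datetime(2018, 4, 9)
texts = [
    {"content": "X will default next week", "created": datetime(2018, 4, 5)},
    {"content": "X will default next week", "created": datetime(2017, 1, 1)},
    {"content": "unrelated post", "created": datetime(2018, 4, 8)},
]
carriers = select_carrier_texts(texts, "X will default", now)
```

Only the first record is kept: the second contains the target message but is too old, and the third is recent but does not contain it.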
在本实施例中,识别设备会获取载体文本的传播路径,该传播路径具体为该载体文本在网络平台中各个传播用户之间流转的路径,因此传播路径会包含传播了该载体文本的传播用户的标识。其中,该传播用户的标识可以为传播用户的用户名、用户账户或该传播用户的用户信息。优选地,在本实施例中采用传播用户的用户信息,由于同一实体用户可以在网络平台中注册多个不同的用户账户,并存在多个用户名,因此不同的用户名或用户账户可能对应的实体人是相同给的,但采用用户信息则可以避免上述情况的发生,因为用户信息,例如身份证号码等,是具有唯一性的,从而保证了相同的用户信息对应的实体人也是相同的,提高了虚假消息识别的效率。In this embodiment, the identification device obtains a propagation path of the carrier text, where the propagation path is specifically a path in which the carrier text flows between the various propagation users in the network platform, so the propagation path may include a propagation user that propagates the carrier text. Logo. The identifier of the propagating user may be a user name of the propagating user, a user account, or user information of the propagating user. Preferably, in this embodiment, the user information of the user is used. Since the same entity user can register a plurality of different user accounts in the network platform, and multiple user names exist, different user names or user accounts may correspond. The entity is the same, but the user information can avoid the above situation, because the user information, such as the ID number, is unique, so that the entity corresponding to the same user information is the same. Improve the efficiency of false message recognition.
In S102, a text matrix of each carrier text is obtained based on the carrier text and the identifiers of the propagating users.
In this embodiment, one propagating user may propagate several carrier texts about the target message, and one carrier text may be propagated by several different propagating users. To determine accurately how the target message has propagated, the identification device therefore uses the propagation path of each carrier text to determine the user identifiers of all propagating users who propagated that carrier text, and constructs a text matrix for each carrier text from those user identifiers.
Preferably, in addition to the user identifiers of the users who propagated the carrier text, the text matrix may include information about the content of the carrier text. In that case, the identification device performs keyword extraction on the carrier text to determine the keywords it contains. The extracted keywords are keywords associated with the target message: after determining the target message, the identification device determines the candidate keywords associated with it, determines which of those candidate keywords appear in the carrier text, derives the content feature parameter of the carrier text from the candidate keywords found, and then constructs the text matrix of the carrier text from the content feature parameter and the identifiers of the propagating users.
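A minimal sketch of S102 follows. The source does not specify what each element of the text matrix holds, so this example assumes one slot per known propagating user, with a nonzero entry (here, the number of appearances on the propagation path) meaning that the user propagated the text. The keyword flags used as a content feature are likewise a hypothetical choice.

```python
def build_text_matrix(path, all_users):
    """One slot per known propagating user; a nonzero entry means that
    user propagated this carrier text (assumed: number of appearances
    on the propagation path). Every user on the path must be in all_users."""
    counts = {user: 0 for user in all_users}
    for user in path:
        counts[user] += 1
    return [counts[user] for user in all_users]


def content_feature(text, candidate_keywords):
    """Hypothetical content feature: a 1/0 flag per candidate keyword."""
    return [1 if kw in text else 0 for kw in candidate_keywords]


all_users = ["u1", "u2", "u3", "u4"]
path = ["u1", "u3", "u1"]  # u1 authored the text and later shared it again
text_matrix = build_text_matrix(path, all_users)
features = content_feature(
    "shocking recall of product X", ["recall", "shocking", "poison"]
)
```

With these inputs, `text_matrix` marks u1 and u3 as propagators and leaves u2 and u4 at zero.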
In S103, each text matrix is imported into a preset feature vector calculation model to obtain the text feature vector of the target message.
In this embodiment, an important characteristic of a false message is that its propagation is explosive and widespread, and a carrier text containing a false message shares these two characteristics. The text matrix generated for each carrier text from the identifiers of its propagating users characterizes the carrier text from the perspective of user propagation and makes it possible to judge whether the propagation is explosive and widespread; if it is, the carrier text probably carries a false message. Because one carrier text may contain several different messages, the text matrices of the carrier texts must be analyzed to determine whether the explosive propagation is caused by the target message. Therefore, after generating the text matrix of each carrier text, the identification device imports the text matrices into the preset feature vector calculation model and determines the text feature vector of the target message, which serves as one of the reference parameters for identifying the authenticity of the target message.
It should be noted that S102 and S103 calculate the text feature vector of the target message, while S104 and S105 calculate the user propagation feature vector of the target message, so no order is imposed between these two groups of steps. The identification device may execute S102 and S103 first and then S104 and S105, or execute S104 and S105 first and then S102 and S103. Preferably, if the identification device supports concurrent computation on two threads, S102 and S104 may be executed at the same time.
In S104, a user propagation matrix for the target message is generated from the propagation paths of all the carrier texts; each element of the user propagation matrix is the number of carrier texts propagated by one propagating user.
In this embodiment, as described above, one propagating user may propagate several carrier texts that contain the target message. To determine how many carrier texts each propagating user has propagated, the identification device counts, from the propagation path of each carrier text, the number of carrier texts propagated by each propagating user, and thereby obtains the user propagation matrix of the target message. A false message is generally spread deliberately and repeatedly by the user who created it, that is, the rumor-monger, so the rumor-monger accounts for a large share of the total propagation of the carrier texts. An ordinary propagating user who is not a rumor-monger propagates only a limited number of carrier texts, and that behavior is sporadic. The user propagation matrix therefore reflects well whether a rumor-monger is maliciously spreading the target message, which helps determine whether the target message is a false message.
Optionally, the identification device may create a propagating-user network graph and draw the propagation path of each carrier text on it. Whenever a propagation path passes through a propagating user in the graph, the count of texts propagated by that user is incremented by 1. After all propagation paths have been drawn, the number of carrier texts propagated by each propagating user is known, and the user propagation matrix is generated.
Preferably, in this embodiment, the order of the propagating users in the user propagation matrix matches their order of propagation on the propagation path. That is, if a propagating user is the author of a carrier text, i.e., the first propagator, that user's position in the user propagation matrix is 1, and so on. If several users share the same propagation order, they may be further sorted by the number of carrier texts each has propagated; alternatively, an array made up of the numbers of carrier texts propagated by the users that share a propagation order may be used as the element at that position in the user propagation matrix.
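The counting and ordering described for S104 can be sketched as below. The sketch makes two assumptions that the text leaves open: a user's propagation order across several paths is taken to be the earliest position at which the user appears on any path, and ties are broken by descending text count (the first of the two tie-handling options mentioned above).

```python
def build_user_propagation_matrix(paths):
    """paths: dict mapping a carrier text id to the list of user ids in
    propagation order. Returns (users, counts), where counts[i] is the
    number of distinct carrier texts propagated by users[i]."""
    first_rank = {}  # earliest position at which a user appears on any path
    text_count = {}  # number of carrier texts each user propagated
    for path in paths.values():
        for user in set(path):
            text_count[user] = text_count.get(user, 0) + 1
        for rank, user in enumerate(path, start=1):
            first_rank[user] = min(rank, first_rank.get(user, rank))
    # Sort by propagation order, then by descending text count for ties;
    # the user id is a final tie-breaker so the result is deterministic.
    users = sorted(text_count, key=lambda u: (first_rank[u], -text_count[u], u))
    return users, [text_count[u] for u in users]


paths = {
    "t1": ["u1", "u2", "u3"],
    "t2": ["u1", "u3"],
    "t3": ["u4", "u1"],
}
users, matrix = build_user_propagation_matrix(paths)
```

Here u1 appears on all three paths, so its count dominates the matrix, which is the pattern the embodiment associates with a rumor-monger.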
In S105, the user propagation matrix is imported into a preset user feature calculation model to obtain the user propagation feature vector of the target message.
In this embodiment, the user propagation matrix reveals how the target message propagates among the propagating users. To extract the user propagation features of the propagating users, the identification device imports the user propagation matrix of the target message into the user feature calculation model and determines the user propagation feature vector of the target message. The user propagation feature vector indicates whether the target message matches the user propagation characteristics of a false message, so it can serve as one of the reference parameters for the subsequent calculation of the authenticity index.
In S106, the authenticity index of the target message is calculated from the user propagation feature vector and the text feature vector.
In this embodiment, after determining the user propagation feature vector and the text feature vector of the target message, the identification device can calculate the authenticity index of the target message. One way to do so is to import the user propagation feature vector and the text feature vector into a preset authenticity index calculation model, which converts them into the authenticity index of the target message. Preferably, the authenticity index calculation model is a neural network. An administrator generates the corresponding user propagation feature vectors and text feature vectors from training messages, imports them into the neural network used to calculate the authenticity index, and adjusts the parameters of the neural network to minimize the value of its loss function; the adjusted neural network is then used as the authenticity index calculation model. The loss function of the neural network is:
Figure PCTCN2018097540-appb-000001
where L_j is the actual authenticity index of training message j;
Figure PCTCN2018097540-appb-000002
is the preset regularization term;
Figure PCTCN2018097540-appb-000003
is the authenticity index computed after the user propagation feature vector and the text feature vector of the training message are imported into the authenticity index calculation model; and N is the total number of training messages.
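The exact loss expression is given only in the figure, which is not reproduced in this text. As a rough sketch of a loss built from the quantities defined above (actual index, computed index, regularization term, and N), the following assumes a mean squared error plus an L2 penalty; both the squared-error form and the `lam` weighting are assumptions, not the patent's formula.

```python
def loss(actual, predicted, weights, lam=0.01):
    """Assumed form: mean squared error over N training messages plus an
    L2 regularization term on the network weights."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    reg = lam * sum(w * w for w in weights)  # hypothetical regularizer
    return mse + reg


perfect = loss([1.0, 0.0], [1.0, 0.0], [0.0])
imperfect = loss([1.0, 0.0], [0.5, 0.5], [1.0, 2.0], lam=0.1)
```

Training would adjust the network parameters to drive this value down; a perfect prediction with zero weights gives a loss of zero.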
Optionally, instead of using the authenticity index calculation model, the identification device may compare each parameter value contained in the text feature vector and the user feature vector with a preset false parameter range, count the parameter values that fall within the false parameter range, and use that count as the authenticity index of the target message. The authenticity index then expresses how closely the target message resembles a false message.
In S107, if the authenticity index is within a preset false index range, the target message is identified as a false message.
In this embodiment, the identification device stores a preset false index range. If the authenticity index calculated for a target message falls within the false index range, the target message matches the characteristics of a false message in both its text features and its propagating-user features, and the target message is identified as a false message. Conversely, if the authenticity index of the target message falls outside the false index range, the target message does not match the characteristics of a false message and is identified as a true message.
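The counting alternative for the authenticity index and the range check of S107 can be sketched together. The per-parameter ranges and the false index range below are made-up values for illustration only.

```python
def count_in_false_ranges(values, false_ranges):
    """Count the parameter values that fall inside their preset false
    parameter range (inclusive bounds assumed)."""
    return sum(1 for v, (lo, hi) in zip(values, false_ranges) if lo <= v <= hi)


def is_false_message(index, false_index_range):
    """S107: a message is false if its index lies in the preset range."""
    lo, hi = false_index_range
    return lo <= index <= hi


values = [0.9, 0.2, 0.7]  # parameters from the two feature vectors
false_ranges = [(0.8, 1.0), (0.5, 1.0), (0.6, 1.0)]  # hypothetical presets
index = count_in_false_ranges(values, false_ranges)
verdict = is_false_message(index, (2, 3))
```

Two of the three parameters fall inside their false ranges, so the index is 2, which lies in the assumed false index range of 2 to 3.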
As can be seen from the above, the false message identification method provided by this embodiment of the present application obtains all carrier texts that contain the target message together with the propagation path of each carrier text, obtains the text matrix of each carrier text from the carrier text and the identifiers of the propagating users on its propagation path, and derives the text feature vector of the target message from the text matrices. In parallel, it obtains the user propagation matrix from the propagation paths of the carrier texts and from it calculates the user propagation feature vector of the target message. Finally, it calculates the authenticity index of the target message from the user propagation feature vector and the text feature vector, and uses the authenticity index to identify whether the target message is a false message. Compared with existing false-message identification techniques, this embodiment requires no manual investigation or evidence gathering, which reduces labor cost and investigation time. Instead, it analyzes the text features of the carrier texts that convey the target message and the user features of the users who propagated it: the text feature vector shows whether the target message is inflammatory, and the user feature vector shows whether the target message spread explosively. The authenticity index of the target message is obtained from these two feature vectors and determines whether the target message is a false message, which improves the accuracy of false-message identification.
FIG. 2 shows a detailed implementation flowchart of S103 of the false message identification method provided by the second embodiment of the present application. Referring to FIG. 2, relative to the embodiment of FIG. 1, S103 in the method provided by this embodiment comprises S1031 to S1034, detailed as follows:
In S1031, the number of propagations, the content feature parameter, and the propagation time parameter of each carrier text are obtained.
In this embodiment, to improve the accuracy of the text feature vector, the identification device obtains, in addition to the propagation path of each carrier text, the number of propagations, the content feature parameter, and the propagation time parameter of that carrier text, so that authenticity can be judged from several aspects of the carrier text.
Specifically, the number of propagations covers not only the number of times users forward the carrier text but also the number of times propagating users comment on it and the number of times it is liked, that is, the number of occurrences of every kind of behavior that contributes to the propagation of the carrier text. The content feature parameter represents the content that the carrier text is meant to express. It may be extracted as described in S102: the keywords contained in the carrier text are determined, and the content feature parameter is derived from those keywords. The propagation time parameter includes, but is not limited to, at least one of the following: the creation time of the carrier text, the average propagation interval, and the total propagation duration.
In S1032, the carrier texts are sorted based on the propagation time parameter to determine the import order of each carrier text.
Because this embodiment uses a multi-layer feedback recurrent neural network to determine the text feature vector of the target message, the import order of each carrier text into the multi-layer recurrent neural network, that is, the recurrent layer at which each carrier text enters, must be determined in advance. If the number of layers of the multi-layer feedback recurrent neural network exceeds the number of carrier texts, the number of layers is reduced during the import operation to match the number of carrier texts.
In this embodiment, the identification device determines the import order of the carrier texts from the propagation time parameter, and the way the order is determined depends on which type of parameter the propagation time parameter contains. For example, if the propagation time parameter is the creation time of the carrier text, the import order may follow the chronological order of the creation times; if the propagation time parameter is the total propagation duration, the import order may follow the order of the durations by length.
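The ordering step S1032 amounts to a sort on whichever propagation time parameter is available. In this sketch the carrier texts are plain dictionaries with hypothetical field names, and ascending order is assumed for both parameter types because the text does not fix a direction for durations.

```python
from datetime import datetime


def import_order(carriers, key):
    """Return the carrier texts sorted in ascending order of the chosen
    propagation time parameter (field name is an assumption)."""
    return sorted(carriers, key=lambda c: c[key])


carriers = [
    {"id": "t1", "created_at": datetime(2018, 7, 3), "total_hours": 40},
    {"id": "t2", "created_at": datetime(2018, 7, 1), "total_hours": 72},
    {"id": "t3", "created_at": datetime(2018, 7, 2), "total_hours": 5},
]
by_creation = [c["id"] for c in import_order(carriers, "created_at")]
by_duration = [c["id"] for c in import_order(carriers, "total_hours")]
```

The two parameter types produce different import orders for the same carrier texts, which is why the embodiment ties the ordering rule to the parameter type.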
In S1033, the number of propagations, the content feature parameter, the propagation time parameter, and the text matrix are imported into a text timing vector conversion model to obtain the text timing vector of each carrier text; the text timing vector conversion model is:
Figure PCTCN2018097540-appb-000004
where
Figure PCTCN2018097540-appb-000005
is the text timing vector of the carrier text whose import order is t; η is the number of propagations; ΔT is the propagation time parameter; x_u is the text matrix; x_t is the fusion matrix of the carrier text whose import order is t; x_τ is the content feature parameter; and W_a and b_a are preset adjustment coefficients of the text timing vector conversion model.
In this embodiment, the identification device first constructs the text feature matrix of the carrier text, that is, x_t above, from the number of propagations, the content feature parameter, the propagation time parameter, and the text matrix. One way to construct it is to append three rows to the text matrix that store the number of propagations, the content feature parameter, and the propagation time parameter respectively; if the text matrix is an n-dimensional matrix, the corresponding text feature matrix is an (n+3)-dimensional matrix.
Because the multi-layer recurrent neural network is a neural network with temporal dependencies, the text feature matrix must undergo a time-series conversion before it is imported, that is, the text timing vector of the carrier text must be determined. This embodiment uses the tanh function because its nonlinearity matches time-series characteristics well. The identification device therefore feeds the text feature matrix into the tanh function to determine the text timing vector of each carrier text.
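The conversion model itself appears only in a figure, so the following is a sketch under stated assumptions: the fusion step appends the three scalar features to the text matrix (treated here as a flat vector), and the timing vector is taken to be tanh(W_a · x_t + b_a). The affine form and the concrete shapes of W_a and b_a are assumptions consistent with the symbol definitions, not a reproduction of the figure.

```python
import math


def fuse(text_matrix, eta, x_tau, delta_t):
    """Build x_t by appending the number of propagations, the content
    feature parameter, and the propagation time parameter (n -> n+3)."""
    return list(text_matrix) + [eta, x_tau, delta_t]


def timing_vector(x_t, W_a, b_a):
    """Assumed form: element-wise tanh of an affine map of x_t."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, x_t)) + b)
        for row, b in zip(W_a, b_a)
    ]


x_t = fuse([1, 0], eta=3, x_tau=0.5, delta_t=2.0)
W_a = [[0.1, 0.1, 0.1, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0, 0.0]]  # hypothetical
b_a = [0.0, 0.0]
x_tilde = timing_vector(x_t, W_a, b_a)
```

Because tanh maps every input into the open interval (-1, 1), features with very different scales (counts, durations, flags) end up in a common bounded range before entering the recurrent network.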
In S1034, based on the import order, the text timing vector of each carrier text is imported into the corresponding layer of the multi-layer feedback recurrent neural network to obtain the text feature vector of the target message; the multi-layer feedback recurrent neural network is:
Figure PCTCN2018097540-appb-000006
where h_0 is the preset initial text vector;
Figure PCTCN2018097540-appb-000007
denotes the text timing vectors of the carrier texts; h_1, h_2, …, h_{t-1} are the intermediate iteration values of the text feature output by each layer of the multi-layer feedback recurrent neural network; h_t is the text feature vector of the target message; and W, U, and b are adjustment coefficients.
In this embodiment, the identification device imports the text timing vectors of the carrier texts, in their import order, into the successive layers of the multi-layer feedback recurrent neural network. The output of each layer serves as input to the next layer, so the time-series characteristics of the carrier texts accumulate layer by layer. The resulting text feature vector is therefore the output of the combined influence of all carrier texts and fully integrates the text features of each of them.
In this embodiment, the identification device uses the output of the last layer of the recurrent neural network as the text feature vector of the target message. It should be noted that, before loading the multi-layer recurrent neural network, the identification device adjusts the number of layers of the network according to the number of carrier texts of the target message so that the two match.
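The recurrence is defined in a figure that this text does not reproduce. A minimal sketch, assuming the standard recurrent update h_k = tanh(W · x_k + U · h_{k-1} + b) with one step per carrier text, is shown below; the update rule and the matrix shapes are assumptions based on the symbols W, U, b, and h_0 named above.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def text_feature_vector(timing_vectors, W, U, b, h0):
    """Feed the timing vectors through the recurrence in import order and
    return the output of the last step as the text feature vector."""
    h = list(h0)
    for x in timing_vectors:  # one recurrent level per carrier text
        wx, uh = matvec(W, x), matvec(U, h)
        h = [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]
    return h


W = [[1.0]]   # hypothetical 1x1 coefficients for a tiny example
U = [[0.5]]
b = [0.0]
h_final = text_feature_vector([[0.5], [0.2]], W, U, b, h0=[0.0])
```

Since the loop runs once per carrier text, the number of recurrent levels matches the number of carrier texts automatically, which mirrors the layer adjustment described above.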
In this embodiment of the present application, the text timing vector of each carrier text is determined from several parameter values collected for the carrier text, and the text feature vector of the target message is calculated with the multi-layer recurrent neural network. The text feature vector therefore captures the characteristics of the texts more fully, which improves the accuracy of false-message identification.
FIG. 3 shows a detailed implementation flowchart of S105 of the false message identification method provided by the third embodiment of the present application. Referring to FIG. 3, relative to the embodiment of FIG. 1, S105 in the method provided by this embodiment further comprises S1051 to S1055, detailed as follows:
In S1051, singular value decomposition is performed on the user propagation matrix to obtain the user propagation coefficient of each propagating user.
In this embodiment, the user propagation matrix is a global matrix covering all propagating users. To determine the user propagation coefficient of each propagating user, singular value decomposition is performed on the user propagation matrix, which reveals how much each propagating user contributes to the propagation of the target message. Specifically, if the user propagation matrix is a 1*N matrix, the diagonal matrix of the singular value decomposition is a 1*1 regular matrix, so the user propagation matrix can be decomposed into N 1*1 matrices, each of which is identified as the user propagation coefficient of one propagating user.
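For a 1*N matrix the singular value decomposition has a closed form: the single singular value is the Euclidean norm of the row, and the right singular vector is the row divided by that norm. The sketch below computes this directly; reading the N components of the right singular vector as the per-user propagation coefficients is an assumption, since the text does not state exactly how the N coefficients are extracted.

```python
import math


def svd_row_vector(a):
    """Compact SVD of a 1xN matrix: a = [1] * sigma * v^T.
    Returns (sigma, v); v is all zeros when a is the zero vector."""
    sigma = math.sqrt(sum(x * x for x in a))
    if sigma == 0.0:
        return 0.0, [0.0] * len(a)
    return sigma, [x / sigma for x in a]


# User propagation matrix: texts propagated by each of two users.
sigma, coefficients = svd_row_vector([3, 4])
```

Under this reading each coefficient is the user's count normalized by the overall propagation volume, so a user responsible for most of the propagation has a coefficient close to 1.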
In S1052, each user propagation coefficient is imported into a user feature vector conversion model to determine the user feature vector of each propagating user; the user feature vector conversion model is:
Figure PCTCN2018097540-appb-000008
where s_i is the user feature vector of the i-th propagating user; y_i is the user propagation coefficient of the i-th propagating user;
Figure PCTCN2018097540-appb-000009
is the user timing vector of the i-th propagating user; W_u, b_u,
Figure PCTCN2018097540-appb-000010
and b_s are preset coefficients of the user feature vector conversion model; and e is the base of the natural logarithm.
In this embodiment, the identification device first applies a time-domain transformation to the calculated user propagation coefficient of each propagating user to obtain that user's user timing vector, namely
Figure PCTCN2018097540-appb-000011
As described above, the nonlinearity of the tanh function matches time-series characteristics well, so the tanh function is also used for the time-domain transformation of the user propagation coefficients in S1052. To suit the user feature vector, the preset coefficients are adjusted to W_u and b_u.
In this embodiment, after determining the user timing vector of each propagating user, the identification device applies the signal function, namely
Figure PCTCN2018097540-appb-000012
to determine the user feature vector corresponding to each user timing vector, where
Figure PCTCN2018097540-appb-000013
and b_s are preset parameter values.
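The two-stage conversion of S1052 can be sketched as follows. The formulas are in figures not reproduced here, so the sketch assumes that the "signal function" is the logistic sigmoid (which the definition of e suggests), that the time-domain step is tanh(W_u · y_i + b_u), and that all quantities are scalars. The name `W_s` for the coefficient paired with b_s is hypothetical.

```python
import math


def user_timing(y, W_u, b_u):
    """Assumed time-domain transformation of the propagation coefficient."""
    return math.tanh(W_u * y + b_u)


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def user_feature(y, W_u, b_u, W_s, b_s):
    """Assumed form: s_i = sigmoid(W_s * tanh(W_u * y_i + b_u) + b_s)."""
    return sigmoid(W_s * user_timing(y, W_u, b_u) + b_s)


neutral = user_feature(0.0, W_u=1.0, b_u=0.0, W_s=1.0, b_s=0.0)
heavy = user_feature(0.9, W_u=2.0, b_u=0.0, W_s=3.0, b_s=0.0)
```

With positive coefficients, a larger propagation coefficient yields a user feature closer to 1, so the output can be read as a score between 0 and 1 for how strongly a user drives propagation.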
In S1053, a user feature matrix is generated from the user feature vectors of the propagating users.
In this embodiment, once the user feature vector of each propagating user has been determined, the nature of each propagating user can be judged from it, for example whether the user is a rumor-mongering user or an ordinary propagating user. The user feature matrix formed from the user feature vectors therefore shows directly the nature of all the users who propagated the target message, which improves the efficiency of identifying whether the target message is a false message.
Specifically, if the user feature vectors of several of the users who propagated the target message match the feature vector of a rumor-mongering user, the target message is being propagated mainly by rumor-mongers, and the probability that it is a false message is high.
In S1054, the mask vector of each carrier text is obtained from its text matrix, and the mask vector and the user feature matrix are imported into a user propagation feature value calculation model to determine the user propagation feature value of each carrier text; the user propagation feature value calculation model is:
Figure PCTCN2018097540-appb-000014
where [s_i] is the user feature matrix; m_j is the mask vector of the j-th carrier text; p_j is the user propagation feature value of the j-th carrier text; and d([s_i]*m_j) is a function that counts non-empty elements.
In this embodiment, the text matrix is generated from the identifiers of the propagating users, so a non-empty i-th element of the text matrix means that the i-th user has propagated the carrier text. To determine the user propagation feature value of each carrier text, the identification device must first determine which users have propagated that carrier text, that is, generate the mask vector described above. For example, if the text matrix of a carrier text is [5,0,0,5,0,7,5,6], five propagating users have propagated the carrier text, and the corresponding mask vector is [1,0,0,1,0,1,1,1]. The mask vector is then used to extract from the user feature matrix the user feature vectors of the propagating users associated with the carrier text, which gives [s_i]*m_j.
In this embodiment, after determining the propagating users who contributed to the propagation of a carrier text, the identification device calculates the mean of their user feature vectors. The function d([s_i]*m_j) counts the non-empty elements of [s_i]*m_j, so the calculated user propagation feature value is the mean of the user feature vectors of those users.
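The masking and averaging of S1054 can be sketched with the text matrix from the example above. The user features are treated as scalars here for brevity (an assumption), and the divisor counts the nonzero mask entries rather than the nonzero products, so that a selected user whose feature happens to be exactly 0 is still counted.

```python
def mask_vector(text_matrix):
    """1 where the user propagated the carrier text, 0 elsewhere."""
    return [1 if x != 0 else 0 for x in text_matrix]


def user_propagation_feature(user_features, mask):
    """Mean of the user features selected by the mask (p_j)."""
    selected = [s for s, m in zip(user_features, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


text_matrix = [5, 0, 0, 5, 0, 7, 5, 6]
mask = mask_vector(text_matrix)
# Hypothetical scalar user features s_1 ... s_8.
user_features = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7, 0.6, 0.5]
p_j = user_propagation_feature(user_features, mask)
```

The five selected features sum to 3.5, so the user propagation feature value of this carrier text is their mean, about 0.7.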
In S1055, the user propagation feature vector of the target message is generated from the user propagation feature values.
In this embodiment, after determining the user propagation feature values of all the carrier texts, the identification device aggregates them to form the user propagation feature vector of the target message.
In the embodiment of the present application, the user feature vector of each propagating user is calculated, and the average user feature vector of each carrier text, that is, the user propagation feature value described above, is determined from these user feature vectors. As a result, the user propagation feature vector reflects not only the user features but also how each carrier text was propagated, which improves the accuracy of false message identification.
FIG. 4a shows a flowchart of a specific implementation of step S106 of a method for identifying a false message according to the fourth embodiment of the present application. Referring to FIG. 4a, relative to the embodiments of FIG. 1 to FIG. 3, in the method provided by this embodiment, calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector includes S1061 to S1062, which are detailed as follows:
Further, calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector includes:
在S1061中,将所述用户传播特征向量以及所述文本特征向量进行聚合,得到所述目标消息的真伪识别矩阵。In S1061, the user propagation feature vector and the text feature vector are aggregated to obtain an authenticity recognition matrix of the target message.
In this embodiment, after determining the user propagation feature vector and the text feature vector, the identification device aggregates the two to form an authenticity recognition matrix containing both types of features. Specifically, if the user propagation feature vector is an n_1*m_1 matrix and the text feature vector is an n_2*m_2 matrix, the aggregated authenticity recognition matrix has size (n_1+n_2)*max(m_1,m_2). If the aggregated authenticity recognition matrix contains blank elements, they may be filled with a preset character; preferably, the preset character is 0.
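The aggregation in S1061 can be sketched in Python as follows. This is only an illustration under stated assumptions: matrices are lists of rows, the rows of the user propagation feature vector are stacked above those of the text feature vector (the application fixes only the resulting size, not the order), and the function name `aggregate` is hypothetical.

```python
def aggregate(user_prop, text_feat, fill=0):
    """Stack two row-lists into one (n1 + n2) x max(m1, m2) matrix.

    Shorter rows are right-padded with `fill` (0 by default, the preferred
    preset character in the description).
    """
    rows = list(user_prop) + list(text_feat)
    width = max(len(row) for row in rows)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


user_prop = [[1, 2], [3, 4]]   # n1 = 2, m1 = 2
text_feat = [[5, 6, 7]]        # n2 = 1, m2 = 3
c = aggregate(user_prop, text_feat)
```

The result `c` has 3 rows and 3 columns, with the blank positions of the narrower block filled with 0.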
In S1062, the authenticity recognition matrix is imported into an authenticity index calculation model to obtain the authenticity index of the target message. The authenticity index calculation model is specifically:
Figure PCTCN2018097540-appb-000015
where the symbol shown in Figure PCTCN2018097540-appb-000016 is the authenticity index; [c_j] is the authenticity recognition matrix; the symbol shown in Figure PCTCN2018097540-appb-000017 and b_c are preset coefficients of the authenticity index calculation model; and e is the base of the natural logarithm.
In this embodiment, after determining the authenticity recognition matrix, the device for identifying a false message imports the matrix into the authenticity index calculation model. The authenticity index calculation model is specifically a sigmoid function, namely the function shown in Figure PCTCN2018097540-appb-000018, where the symbol shown in Figure PCTCN2018097540-appb-000019 and b_c are preset coefficients of the authenticity index calculation model. These coefficients may be determined through training, or adjusted manually according to the needs of the administrator.
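Because the model formula itself survives here only as an image placeholder, the following Python sketch is an assumption-laden illustration rather than the formula of the application. It assumes that the coefficients form a weight matrix of the same shape as the authenticity recognition matrix, that the two are combined by an element-wise weighted sum, and that the bias b_c is a scalar; the names `authenticity_index`, `weights` and `bias` are hypothetical.

```python
import math


def authenticity_index(c, weights, bias):
    """Sigmoid of a weighted sum of the authenticity recognition matrix.

    Assumption: `weights` has the same shape as `c`; the real model may
    combine the coefficients with the matrix differently.
    """
    z = bias + sum(
        w * x
        for w_row, c_row in zip(weights, c)
        for w, x in zip(w_row, c_row)
    )
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


c = [[1.0, 2.0], [3.0, 4.0]]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
neutral = authenticity_index(c, zero_weights, 0.0)
```

With all coefficients set to zero the index is exactly 0.5; trained or manually adjusted coefficients move it towards 0 or 1.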
For example, FIG. 4b shows a calculation block diagram of an authenticity index calculation model provided by an embodiment of the present application, where [v_j] is the text feature vector, [c_j] is the authenticity recognition matrix, and [p_j] is the user propagation feature vector.
In the embodiment of the present application, the text feature vector and the user propagation feature vector are aggregated into an authenticity recognition matrix, so that the two parameters are combined into one. This reduces the number of calculations and improves the efficiency of calculating the authenticity index.
FIG. 5 shows a flowchart of a specific implementation of step S102 of a method for identifying a false message according to the fifth embodiment of the present application. Referring to FIG. 5, relative to the embodiment of FIG. 1, step S102 of the method provided by this embodiment includes S1021 and S1022, which are detailed as follows:
Further, obtaining the text matrix of each carrier text based on the carrier texts and the identifiers of the propagating users includes:
In S1021, a global propagation matrix [a_ij]_(n×m) of the target message is constructed based on the carrier texts and the identifiers of the propagating users, where a_ij is the propagation flag value of the i-th propagating user for the j-th carrier text, n is the number of propagating users, and m is the number of carrier texts.
In this embodiment, after acquiring the propagation path of each carrier text, the identification device can determine the propagating users that propagated the carrier text and generate a sequence based on the user numbers of those propagating users. The propagating users of every carrier text are counted in this way, so that the global propagation matrix of the target message can be constructed. The elements of the i-th row of the global propagation matrix indicate which carrier texts the i-th propagating user has propagated, and the elements of the j-th column indicate which propagating users have propagated the j-th carrier text. The global propagation matrix therefore describes how the target message has propagated on the network platform: dividing it by columns yields the propagation information of each carrier text, and dividing it by rows yields the propagation information of each propagating user.
In this embodiment, a_ij is the propagation flag value of the i-th propagating user for the j-th carrier text. Specifically, if the i-th propagating user has propagated the j-th carrier text, the propagation flag value is 1; otherwise, it is 0. This yields a global propagation matrix [a_ij]_(n×m) composed of 1s and 0s, from which the propagation contribution of any propagating user to each carrier text can be looked up.
In S1022, the sub-matrix formed by each column of the global propagation matrix [a_ij]_(n×m) is used as the text matrix of the corresponding carrier text.
In this embodiment, the elements of the j-th column of the global propagation matrix [a_ij]_(n×m) indicate which propagating users have propagated the j-th carrier text. The global propagation matrix [a_ij]_(n×m) can therefore be divided into m sub-matrices, each of which is the text matrix of the corresponding carrier text.
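The construction in S1021 and the column split in S1022 can be sketched in Python as follows. The sketch assumes that each propagation path is available as a list of user identifiers and that the matrix is stored as a list of rows; the function names and the sample identifiers are hypothetical.

```python
def build_global_matrix(paths, user_ids):
    """Build the n x m matrix a[i][j]: 1 if user i propagated carrier text j, else 0.

    `paths[j]` is assumed to be the list of user identifiers on the
    propagation path of the j-th carrier text.
    """
    row_of = {user: i for i, user in enumerate(user_ids)}
    n, m = len(user_ids), len(paths)
    a = [[0] * m for _ in range(n)]
    for j, path in enumerate(paths):
        for user in path:
            a[row_of[user]][j] = 1
    return a


def text_matrices(a):
    """Split the global matrix by columns: one text matrix per carrier text."""
    return [list(column) for column in zip(*a)]


user_ids = ["u1", "u2", "u3"]
paths = [["u1", "u3"], ["u2"]]   # two carrier texts
a = build_global_matrix(paths, user_ids)
columns = text_matrices(a)
```

Each entry of `columns` is the text matrix of one carrier text, and each row of `a` lists the carrier texts propagated by one user.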
在本申请实施例中,通过构建全局传播矩阵,能够方便确定各个载体文本以及各个传播用户的传播情况,并且基于该全局传播矩阵可以划分得到各个载体文本的文本矩阵,提高了文本矩阵的生成效率。In the embodiment of the present application, by constructing a global propagation matrix, it is convenient to determine the propagation of each carrier text and each propagation user, and the text matrix of each carrier text can be divided based on the global propagation matrix, thereby improving the generation efficiency of the text matrix. .
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the sequence of the steps in the above embodiments does not mean that the order of execution is performed. The order of execution of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiments of the present application.
FIG. 6 shows a structural block diagram of a device for identifying a false message according to an embodiment of the present application. The units included in the device are used to perform the steps in the embodiment corresponding to FIG. 1. For details, refer to FIG. 1 and the related description in the embodiment corresponding to FIG. 1. For ease of explanation, only the parts related to this embodiment are shown.
参见图6,所述虚假消息的识别设备包括:Referring to FIG. 6, the device for identifying a fake message includes:
目标消息参数获取单元61,用于获取包含目标消息的多个载体文本,以及各个所述载体文本的传播路径;所述传播路径包括传播所述载体文本的传播用户的标识;a target message parameter obtaining unit 61, configured to acquire a plurality of carrier texts including a target message, and a propagation path of each of the carrier texts; the propagation path includes an identifier of a propagation user that propagates the carrier text;
文本矩阵生成单元62,用于基于所述载体文本以及所述传播用户的标识,得到各个所述载体文本的文本矩阵;a text matrix generating unit 62, configured to obtain a text matrix of each of the carrier texts based on the carrier text and the identifier of the propagating user;
文本特征向量生成单元63,用于将各个所述文本矩阵导入至预设的特征向量计算模型,得到所述目标消息的文本特征向量;a text feature vector generating unit 63, configured to import each of the text matrices into a preset feature vector calculation model to obtain a text feature vector of the target message;
用户传播矩阵生成单元64,用于根据所有所述载体文本的传播路径,生成关于所述目标消息的用户传播矩阵;所述用户传播矩阵中包含的各元素具体为每个所述传播用户传播的载体文本的个数;a user propagation matrix generating unit 64, configured to generate a user propagation matrix about the target message according to a propagation path of all the carrier texts; each element included in the user propagation matrix is specifically propagated for each of the propagation users The number of vector texts;
用户传播特征向量计算单元65,用于将所述用户传播矩阵导入到预设的用户特征计算模型,得到所述目标消息对应的用户传播特征向量;The user propagation feature vector calculation unit 65 is configured to import the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message.
真伪指数计算单元66,用于根据所述用户传播特征向量以及所述文本特征向量,计算所述目标消息的真伪指数;The authenticity index calculation unit 66 is configured to calculate an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
虚假消息识别单元67,用于若所述真伪指数在预设的虚假指数范围内,则识别所述目标消息为虚假消息。The false message identifying unit 67 is configured to identify the target message as a fake message if the authenticity index is within a preset false index range.
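As a hedged illustration of units 64 and 67, the following Python sketch counts the carrier texts propagated by each user and checks an authenticity index against a false index range. The function names are hypothetical, and the range (0.5, 1.0) is an arbitrary placeholder; the application does not specify the bounds of the preset false index range.

```python
def user_propagation_counts(paths, user_ids):
    """Number of carrier texts each propagating user has propagated."""
    return [sum(1 for path in paths if user in path) for user in user_ids]


def is_false_message(index, false_range=(0.5, 1.0)):
    """True if the authenticity index lies within the preset false index range.

    The default bounds are placeholders chosen for this sketch only.
    """
    low, high = false_range
    return low <= index <= high


user_ids = ["u1", "u2", "u3"]
paths = [["u1", "u3"], ["u1"]]
counts = user_propagation_counts(paths, user_ids)
```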
可选地,所述文本特征向量生成单元63包括:Optionally, the text feature vector generating unit 63 includes:
文本参数获取单元,用于分别获取各个所述载体文本的传播次数、内容特征参数以及传播时间参数;a text parameter obtaining unit, configured to respectively acquire a propagation number, a content feature parameter, and a propagation time parameter of each of the carrier texts;
导入次序确定单元,用于基于所述传播时间参数对各个所述载体文本进行排序,确定各个所述载体文本的导入次序;And an import order determining unit, configured to sort each of the carrier texts based on the propagation time parameter, and determine an import order of each of the carrier texts;
a text timing vector calculation unit, configured to import the number of propagations, the content feature parameter, the propagation time parameter and the text matrix into a text timing vector conversion model to obtain the text timing vector of each carrier text; the text timing vector conversion model is specifically:
Figure PCTCN2018097540-appb-000020
where the symbol shown in Figure PCTCN2018097540-appb-000021 is the text timing vector of the carrier text whose import order is t; η is the number of propagations; ΔT is the propagation time parameter; x_u is the text matrix; x_t is the fusion matrix of the carrier text whose import order is t; x_τ is the content feature parameter; and W_a and b_a are preset adjustment coefficients of the text timing vector conversion model;
a text feature vector calculation unit, configured to import, based on the import order, the text timing vector of each carrier text into the corresponding level of a multi-layer feedback recurrent neural network to obtain the text feature vector of the target message; the multi-layer feedback recurrent neural network is specifically:
Figure PCTCN2018097540-appb-000022
where h_0 is a preset initial text vector; the symbol shown in Figure PCTCN2018097540-appb-000023 is the text timing vector of each carrier text; h_1, h_2 … h_(t-1) are the intermediate iteration values of the text feature output by each level of the multi-layer feedback recurrent neural network; h_t is the text feature vector of the target message; and W, U and b are adjustment coefficients.
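The recurrence of the network is given only as an image in this text, so the following Python sketch substitutes a standard Elman-style update, h_t = tanh(W·x_t + U·h_(t-1) + b), purely for illustration. The tanh activation, the matrix shapes and the function names are assumptions of the sketch, not details confirmed by the application.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def recurrent_text_feature(timing_vectors, W, U, b, h0):
    """Feed text timing vectors, already sorted by import order, through
    an assumed recurrence h_t = tanh(W x_t + U h_{t-1} + b)."""
    h = list(h0)
    for x in timing_vectors:
        wx = matvec(W, x)
        uh = matvec(U, h)
        h = [math.tanh(p + q + c) for p, q, c in zip(wx, uh, b)]
    return h  # final state, used as the text feature vector in this sketch


W = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
h0 = [0.0, 0.0]
h_t = recurrent_text_feature([[1.0, 0.0]], W, U, b, h0)
```

The intermediate values of `h` inside the loop correspond to h_1, h_2 … h_(t-1), and the returned value corresponds to h_t.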
可选地,用户传播特征向量计算单元65包括:Optionally, the user propagation feature vector calculation unit 65 includes:
传播系数确定单元,用于对所述用户传播矩阵进行奇异值分解,得到各个所述传播用户的用户传播系数;a propagation coefficient determining unit, configured to perform singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each of the propagation users;
a user feature vector calculation unit, configured to import each user propagation coefficient into a user feature vector conversion model to determine the user feature vector of each propagating user; the user feature vector conversion model is specifically:
Figure PCTCN2018097540-appb-000024
where s_i is the user feature vector of the i-th propagating user; y_i is the user propagation coefficient of the i-th propagating user; the symbol shown in Figure PCTCN2018097540-appb-000025 is the user timing vector of the i-th propagating user; W_u, b_u, the symbol shown in Figure PCTCN2018097540-appb-000026 and b_s are preset coefficients of the user feature vector conversion model; and e is the base of the natural logarithm;
用户特征矩阵生成单元,用于基于各个所述传播用户的用户特征向量,生成用户特征矩阵;a user feature matrix generating unit, configured to generate a user feature matrix based on user feature vectors of each of the propagation users;
a user propagation feature value calculation unit, configured to obtain the mask vector of each carrier text from its text matrix, and to import the mask vector and the user feature matrix into a user propagation feature value calculation model to determine the user propagation feature value of each carrier text; the user propagation feature value calculation model is specifically:
Figure PCTCN2018097540-appb-000027
where [s_i] is the user feature matrix; m_j is the mask vector of the j-th carrier text; p_j is the user propagation feature value of the j-th carrier text; and d([s_i]*m_j) is a function that counts non-empty elements;
用户传播特征向量确定单元,用于根据各个所述用户传播特征值,生成所述目标消息的用户传播特征向量。And a user propagation feature vector determining unit, configured to generate a user propagation feature vector of the target message according to each of the user propagation feature values.
可选地,真伪指数计算单元66包括:Optionally, the authenticity index calculation unit 66 includes:
真伪识别矩阵生成单元,用于将所述用户传播特征向量以及所述文本特征向量进行聚合,得到所述目标消息的真伪识别矩阵;An authenticity recognition matrix generating unit, configured to aggregate the user propagation feature vector and the text feature vector to obtain an authenticity recognition matrix of the target message;
an authenticity index calculation unit, configured to import the authenticity recognition matrix into an authenticity index calculation model to obtain the authenticity index of the target message; the authenticity index calculation model is specifically:
Figure PCTCN2018097540-appb-000028
where the symbol shown in Figure PCTCN2018097540-appb-000029 is the authenticity index; [c_j] is the authenticity recognition matrix; the symbol shown in Figure PCTCN2018097540-appb-000030 and b_c are preset coefficients of the authenticity index calculation model; and e is the base of the natural logarithm.
可选地,文本矩阵生成单元62包括:Optionally, the text matrix generating unit 62 includes:
a global propagation matrix creating unit, configured to construct a global propagation matrix [a_ij]_(n×m) of the target message based on the carrier texts and the identifiers of the propagating users, where a_ij is the propagation flag value of the i-th propagating user for the j-th carrier text, n is the number of propagating users, and m is the number of carrier texts;
文本矩阵分割单元,用于将所述全局传播矩阵[a ij] n×m中各列构成的子矩阵作为各个所述载体文本的文本矩阵。 And a text matrix dividing unit configured to use a sub-matrix formed by each of the global propagation matrices [a ij ] n×m as a text matrix of each of the carrier texts.
Therefore, the device for identifying a false message provided by the embodiment of the present application likewise requires no manual investigation or evidence collection, which reduces labor cost and the time needed for investigation. Instead, it collects the text features of the carrier texts that convey the target message and analyzes the user features of the propagating users that have propagated the target message. The text feature vector indicates whether the target message is inflammatory, and the user feature vector indicates whether the target message spreads explosively during propagation. From these two feature vectors the false index of the target message is obtained, so that it can be determined whether the target message is a false message, which improves the accuracy of false message identification.
图7是本申请另一实施例提供的一种虚假消息的识别设备的示意图。如图7所示,该实施例的虚假消息的识别设备7包括:处理器70、存储器71以及存储在所述存储器71中并可在所述处理器70上运行的计算机可读指令72,例如虚假消息的识别程序。所述处理器70执行所述计算机可读指令72时实现上述各个虚假消息的识别方法实施例中的步骤,例如图1所示的S101至S107。或者,所述处理器70执行所述计算机可读指令72时实现上述各装置实施例中各单元的功能,例如图6所示模块61至67功能。FIG. 7 is a schematic diagram of an apparatus for identifying a fake message according to another embodiment of the present application. As shown in FIG. 7, the identification device 7 of the fake message of this embodiment includes a processor 70, a memory 71, and computer readable instructions 72 stored in the memory 71 and operable on the processor 70, for example The identification procedure for false messages. The processor 70 executes the computer readable instructions 72 to implement the steps in the foregoing method for identifying the respective fake messages, such as S101 to S107 shown in FIG. 1. Alternatively, the processor 70, when executing the computer readable instructions 72, implements the functions of the various units in the various apparatus embodiments described above, such as the functions of the modules 61 through 67 shown in FIG.
Illustratively, the computer readable instructions 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to carry out the present application. The one or more units may be a series of computer readable instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer readable instructions 72 in the device 7 for identifying a false message. For example, the computer readable instructions 72 may be divided into a target message parameter obtaining unit, a text matrix generating unit, a text feature vector generating unit, a user propagation matrix generating unit, a user propagation feature vector calculation unit, an authenticity index calculation unit and a false message identifying unit, whose specific functions are as described above.
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。The above-mentioned embodiments are only used to explain the technical solutions of the present application, and are not limited thereto; although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still implement the foregoing embodiments. The technical solutions described in the examples are modified or equivalently replaced with some of the technical features; and the modifications or substitutions do not deviate from the spirit and scope of the technical solutions of the embodiments of the present application, and should be included in Within the scope of protection of this application.

Claims (20)

  1. 一种虚假消息的识别方法,其特征在于,包括:A method for identifying a false message, comprising:
    获取包含目标消息的多个载体文本,以及各个所述载体文本的传播路径;所述传播路径包括传播所述载体文本的传播用户的标识;Acquiring a plurality of carrier texts containing the target message, and a propagation path of each of the carrier texts; the propagation path comprising an identifier of a propagation user propagating the carrier text;
    基于所述载体文本以及所述传播用户的标识,得到各个所述载体文本的文本矩阵;Obtaining a text matrix of each of the carrier texts based on the carrier text and the identifier of the propagating user;
    将各个所述文本矩阵导入至预设的特征向量计算模型,得到所述目标消息的文本特征向量;Importing each of the text matrices into a preset feature vector calculation model to obtain a text feature vector of the target message;
    根据所有所述载体文本的传播路径,生成关于所述目标消息的用户传播矩阵;所述用户传播矩阵中包含的各元素具体为每个所述传播用户传播的载体文本的个数;Generating, according to a propagation path of all the carrier texts, a user propagation matrix about the target message; each element included in the user propagation matrix is specifically a number of carrier texts propagated by each of the propagation users;
    将所述用户传播矩阵导入到预设的用户特征计算模型,得到所述目标消息对应的用户传播特征向量;Importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message;
    根据所述用户传播特征向量以及所述文本特征向量,计算所述目标消息的真伪指数;Calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
    若所述真伪指数在预设的虚假指数范围内,则识别所述目标消息为虚假消息。If the authenticity index is within a preset false index range, the target message is identified as a false message.
  2. 根据权利要求1所述的识别方法,其特征在于,所述将各个所述文本矩阵导入至预设的特征向量计算模型,得到所述目标消息的文本特征向量,包括:The identification method according to claim 1, wherein the importing each of the text matrices into a preset feature vector calculation model to obtain a text feature vector of the target message comprises:
    分别获取各个所述载体文本的传播次数、内容特征参数以及传播时间参数;Obtaining, respectively, the number of times of propagation of each of the carrier texts, content feature parameters, and propagation time parameters;
    基于所述传播时间参数对各个所述载体文本进行排序,确定各个所述载体文本的导入次序;And sorting each of the carrier texts according to the propagation time parameter, and determining an import order of each of the carrier texts;
    importing the number of propagations, the content feature parameter, the propagation time parameter and the text matrix into a text timing vector conversion model to obtain the text timing vector of each carrier text; the text timing vector conversion model is specifically:
    Figure PCTCN2018097540-appb-100001
    where the symbol shown in Figure PCTCN2018097540-appb-100002 is the text timing vector of the carrier text whose import order is t; η is the number of propagations; ΔT is the propagation time parameter; x_u is the text matrix; x_t is the fusion matrix of the carrier text whose import order is t; x_τ is the content feature parameter; and W_a and b_a are preset adjustment coefficients of the text timing vector conversion model;
    importing, based on the import order, the text timing vector of each carrier text into the corresponding level of a multi-layer feedback recurrent neural network to obtain the text feature vector of the target message; the multi-layer feedback recurrent neural network is specifically:
    Figure PCTCN2018097540-appb-100003
    where h_0 is a preset initial text vector; the symbol shown in Figure PCTCN2018097540-appb-100004 is the text timing vector of each carrier text; h_1, h_2 … h_(t-1) are the intermediate iteration values of the text feature output by each level of the multi-layer feedback recurrent neural network; h_t is the text feature vector of the target message; and W, U and b are adjustment coefficients.
  3. 根据权利要求1所述的识别方法,其特征在于,所述将所述用户传播矩阵导入到预设的用户特征计算模型,得到所述目标消息对应的用户传播特征向量,包括:The identification method according to claim 1, wherein the importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message comprises:
    对所述用户传播矩阵进行奇异值分解,得到各个所述传播用户的用户传播系数;Performing singular value decomposition on the user propagation matrix to obtain user propagation coefficients of each of the propagation users;
    importing each user propagation coefficient into a user feature vector conversion model to determine the user feature vector of each propagating user; the user feature vector conversion model is specifically:
    Figure PCTCN2018097540-appb-100005
    where s_i is the user feature vector of the i-th propagating user; y_i is the user propagation coefficient of the i-th propagating user; the symbol shown in Figure PCTCN2018097540-appb-100006 is the user timing vector of the i-th propagating user; W_u, b_u, the symbol shown in Figure PCTCN2018097540-appb-100007 and b_s are preset coefficients of the user feature vector conversion model; and e is the base of the natural logarithm;
    基于各个所述传播用户的用户特征向量,生成用户特征矩阵;Generating a user feature matrix based on user feature vectors of each of the propagation users;
    obtaining the mask vector of each carrier text from its text matrix, and importing the mask vector and the user feature matrix into a user propagation feature value calculation model to determine the user propagation feature value of each carrier text; the user propagation feature value calculation model is specifically:
    Figure PCTCN2018097540-appb-100008
    where [s_i] is the user feature matrix; m_j is the mask vector of the j-th carrier text; p_j is the user propagation feature value of the j-th carrier text; and d([s_i]*m_j) is a function that counts non-empty elements;
    根据各个所述用户传播特征值,生成所述目标消息的用户传播特征向量。Generating a user propagation feature vector of the target message according to each of the user propagation feature values.
  4. 根据权利要求1-3任一项所述的识别方法,其特征在于,所述根据所述用户传播特征向量以及所述文本特征向量,计算所述目标消息的真伪指数,包括:The identification method according to any one of claims 1-3, wherein the calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector comprises:
    将所述用户传播特征向量以及所述文本特征向量进行聚合,得到所述目标消息的真伪识别矩阵;And synthesizing the user propagation feature vector and the text feature vector to obtain an authenticity recognition matrix of the target message;
    importing the authenticity recognition matrix into an authenticity index calculation model to obtain the authenticity index of the target message; the authenticity index calculation model is specifically:
    Figure PCTCN2018097540-appb-100009
    where the symbol shown in Figure PCTCN2018097540-appb-100010 is the authenticity index; [c_j] is the authenticity recognition matrix; the symbol shown in Figure PCTCN2018097540-appb-100011 and b_c are preset coefficients of the authenticity index calculation model; and e is the base of the natural logarithm.
  5. The identification method according to claim 1, wherein obtaining the text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users comprises:
    constructing a global propagation matrix [a_ij]_{n×m} of the target message based on the carrier texts and the identifiers of the propagation users, where a_ij is the propagation tag value of the i-th propagation user for the j-th carrier text, n is the number of propagation users, and m is the number of carrier texts;
    taking the submatrix formed by each column of the global propagation matrix [a_ij]_{n×m} as the text matrix of the corresponding carrier text.
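A minimal sketch of the matrix construction in this claim is given below. The claim does not define the propagation tag value, so the sketch assumes 1 when a user propagated a carrier text and 0 otherwise; the function names and the sample user ids are hypothetical.

```python
def build_global_matrix(paths, users):
    """Build the n x m matrix: rows are users, columns are carrier texts.

    paths: one list of user ids per carrier text (its propagation path).
    users: ordered list of all propagation user ids.
    """
    row = {u: i for i, u in enumerate(users)}
    a = [[0] * len(paths) for _ in users]
    for j, path in enumerate(paths):
        for u in path:
            a[row[u]][j] = 1  # assumed tag value: 1 = user propagated text j
    return a


def text_matrices(a):
    """Split the global matrix into one n x 1 column submatrix per carrier text."""
    m = len(a[0]) if a else 0
    return [[[r[j]] for r in a] for j in range(m)]


paths = [["u1", "u2"], ["u2", "u3"]]
users = ["u1", "u2", "u3"]
A = build_global_matrix(paths, users)
cols = text_matrices(A)
```

Here `cols[j]` is the text matrix of the j-th carrier text, one row per propagation user.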
  6. A device for identifying a false message, comprising:
    a target message parameter obtaining unit, configured to acquire a plurality of carrier texts containing a target message and a propagation path of each carrier text, the propagation path including an identifier of each propagation user who propagated the carrier text;
    a text matrix generating unit, configured to obtain a text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users;
    a text feature vector generating unit, configured to import each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message;
    a user propagation matrix generating unit, configured to generate a user propagation matrix for the target message according to the propagation paths of all the carrier texts, each element of the user propagation matrix being the number of carrier texts propagated by the corresponding propagation user;
    a user propagation feature vector calculation unit, configured to import the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message;
    an authenticity index calculation unit, configured to calculate an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
    a false message identifying unit, configured to identify the target message as a false message if the authenticity index is within a preset false index range.
  7. The device for identifying a false message according to claim 6, wherein the text feature vector generating unit comprises:
    a text parameter obtaining unit, configured to acquire the number of propagations, a content feature parameter, and a propagation time parameter of each carrier text;
    an import order determining unit, configured to sort the carrier texts by the propagation time parameter to determine the import order of each carrier text;
    a text time series vector calculation unit, configured to import the number of propagations, the content feature parameter, the propagation time parameter, and the text matrix into a text time series vector conversion model to obtain a text time series vector of each carrier text, the text time series vector conversion model being:
    Figure PCTCN2018097540-appb-100012
    where the symbol in Figure PCTCN2018097540-appb-100013 is the text time series vector of the carrier text whose import order is t; η is the number of propagations; ΔT is the propagation time parameter; x_u is the text matrix; x_t is the fusion matrix of the carrier text whose import order is t; x_τ is the content feature parameter; and W_a and b_a are preset adjustment coefficients of the text time series vector conversion model;
    a text feature vector calculation unit, configured to import, in the import order, the text time series vector of each carrier text into the corresponding level of a multi-layer feedback recurrent neural network to obtain the text feature vector of the target message, the multi-layer feedback recurrent neural network being:
    Figure PCTCN2018097540-appb-100014
    where h_0 is a preset initial text vector; the symbol in Figure PCTCN2018097540-appb-100015 is the text time series vector of each carrier text; h_1, h_2, …, h_{t-1} are the intermediate text feature iteration values output by the levels of the multi-layer feedback recurrent neural network; h_t is the text feature vector of the target message; and W, U, and b are adjustment coefficients.
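The level-by-level iteration from h_0 to h_t can be sketched as a plain recurrent update. The network formula itself is only available as a figure, so the update h = tanh(W·x + U·h_prev + b) used here is an assumption (the standard Elman form), and `matvec` and `rnn_text_feature` are hypothetical helper names.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_text_feature(xs, h0, W, U, b):
    """Fold text time series vectors, already in import order, into h_t."""
    h = list(h0)
    for x in xs:
        wx = matvec(W, x)
        uh = matvec(U, h)
        # Assumed activation: tanh. Each pass yields one intermediate h_k.
        h = [math.tanh(p + q + r) for p, q, r in zip(wx, uh, b)]
    return h


h = rnn_text_feature(xs=[[1.0]], h0=[0.0], W=[[1.0]], U=[[1.0]], b=[0.0])
```

The value returned after the last carrier text plays the role of h_t, the text feature vector of the target message.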
  8. The device for identifying a false message according to claim 6, wherein the user propagation feature vector calculation unit comprises:
    a propagation coefficient determining unit, configured to perform singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each propagation user;
    a user feature vector calculation unit, configured to import each user propagation coefficient into a propagation feature vector conversion model to determine a user feature vector of each propagation user, the user feature vector conversion model being:
    Figure PCTCN2018097540-appb-100016
    where s_i is the user feature vector of the i-th propagation user; y_i is the user propagation coefficient of the i-th propagation user; the symbol in Figure PCTCN2018097540-appb-100017 is the user time series vector of the i-th propagation user; W_u, b_u, the symbol in Figure PCTCN2018097540-appb-100018, and b_s are preset coefficients of the user feature vector conversion model; and e is the base of the natural logarithm;
    a user feature matrix generating unit, configured to generate a user feature matrix based on the user feature vectors of the propagation users;
    a user propagation feature value calculation unit, configured to obtain a mask vector of each carrier text from the text matrix, and to import the mask vector and the user feature matrix into a user propagation feature value calculation model to determine a user propagation feature value of each carrier text, the user propagation feature value calculation model being:
    Figure PCTCN2018097540-appb-100019
    where [s_i] is the user feature matrix; m_j is the mask vector of the j-th carrier text; p_j is the user propagation feature value of the j-th carrier text; and d([s_i]*m_j) is a statistical function that counts non-empty elements;
    a user propagation feature vector determining unit, configured to generate the user propagation feature vector of the target message according to the user propagation feature values.
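For reference, the singular value decomposition named in this claim factors the user propagation matrix, written A here, as shown below. The claim does not state how the user propagation coefficient y_i is read off from the factors; taking the i-th row of UΣ is an assumption used only to make the notation concrete.

```latex
A = U \Sigma V^{\top}, \qquad
U^{\top} U = I, \quad V^{\top} V = I, \quad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
\sigma_1 \ge \dots \ge \sigma_r > 0

% Assumed reading of the per-user coefficient (not stated in the claim):
y_i = \left( U \Sigma \right)_{i,:}
```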
  9. The device for identifying a false message according to any one of claims 6-8, wherein the authenticity index calculation unit comprises:
    an authenticity recognition matrix generating unit, configured to aggregate the user propagation feature vector and the text feature vector to obtain an authenticity recognition matrix of the target message;
    an authenticity index calculation unit, configured to import the authenticity recognition matrix into an authenticity index calculation model to obtain the authenticity index of the target message, the authenticity index calculation model being:
    Figure PCTCN2018097540-appb-100020
    where the symbol in Figure PCTCN2018097540-appb-100021 is the authenticity index; [c_j] is the authenticity recognition matrix; the symbol in Figure PCTCN2018097540-appb-100022 and b_c are preset coefficients of the authenticity index calculation model; and e is the base of the natural logarithm.
  10. The device for identifying a false message according to claim 6, wherein the text matrix generating unit comprises:
    a global propagation matrix creating unit, configured to construct a global propagation matrix [a_ij]_{n×m} of the target message based on the carrier texts and the identifiers of the propagation users, where a_ij is the propagation tag value of the i-th propagation user for the j-th carrier text, n is the number of propagation users, and m is the number of carrier texts;
    a text matrix dividing unit, configured to take the submatrix formed by each column of the global propagation matrix [a_ij]_{n×m} as the text matrix of the corresponding carrier text.
  11. A device for identifying a false message, comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer readable instructions:
    acquiring a plurality of carrier texts containing a target message and a propagation path of each carrier text, the propagation path including an identifier of each propagation user who propagated the carrier text;
    obtaining a text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users;
    importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message;
    generating a user propagation matrix for the target message according to the propagation paths of all the carrier texts, each element of the user propagation matrix being the number of carrier texts propagated by the corresponding propagation user;
    importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message;
    calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
    if the authenticity index is within a preset false index range, identifying the target message as a false message.
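The user propagation matrix step above, where each element is the number of carrier texts a user propagated, can be sketched as a simple count over the propagation paths. Representing a path as a list of user ids and counting a user at most once per carrier text are assumptions; `user_propagation_counts` is a hypothetical name.

```python
from collections import Counter


def user_propagation_counts(paths, users):
    """Return, for each user in order, how many carrier texts they propagated."""
    counts = Counter()
    for path in paths:
        # Assumption: a user appearing twice in one path still counts once.
        counts.update(set(path))
    return [counts[u] for u in users]


paths = [["u1", "u2"], ["u2", "u3"], ["u2"]]
users = ["u1", "u2", "u3"]
counts = user_propagation_counts(paths, users)
```

In this sample, user `u2` appears in all three propagation paths and therefore has the largest entry.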
  12. The device for identifying a false message according to claim 11, wherein importing each text matrix into the preset feature vector calculation model to obtain the text feature vector of the target message comprises:
    acquiring the number of propagations, a content feature parameter, and a propagation time parameter of each carrier text;
    sorting the carrier texts by the propagation time parameter to determine the import order of each carrier text;
    importing the number of propagations, the content feature parameter, the propagation time parameter, and the text matrix into a text time series vector conversion model to obtain a text time series vector of each carrier text, the text time series vector conversion model being:
    Figure PCTCN2018097540-appb-100023
    where the symbol in Figure PCTCN2018097540-appb-100024 is the text time series vector of the carrier text whose import order is t; η is the number of propagations; ΔT is the propagation time parameter; x_u is the text matrix; x_t is the fusion matrix of the carrier text whose import order is t; x_τ is the content feature parameter; and W_a and b_a are preset adjustment coefficients of the text time series vector conversion model;
    importing, in the import order, the text time series vector of each carrier text into the corresponding level of a multi-layer feedback recurrent neural network to obtain the text feature vector of the target message, the multi-layer feedback recurrent neural network being:
    Figure PCTCN2018097540-appb-100025
    where h_0 is a preset initial text vector; the symbol in Figure PCTCN2018097540-appb-100026 is the text time series vector of each carrier text; h_1, h_2, …, h_{t-1} are the intermediate text feature iteration values output by the levels of the multi-layer feedback recurrent neural network; h_t is the text feature vector of the target message; and W, U, and b are adjustment coefficients.
  13. The device for identifying a false message according to claim 11, wherein importing the user propagation matrix into the preset user feature calculation model to obtain the user propagation feature vector corresponding to the target message comprises:
    performing singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each propagation user;
    importing each user propagation coefficient into a propagation feature vector conversion model to determine a user feature vector of each propagation user, the user feature vector conversion model being:
    Figure PCTCN2018097540-appb-100027
    where s_i is the user feature vector of the i-th propagation user; y_i is the user propagation coefficient of the i-th propagation user; the symbol in Figure PCTCN2018097540-appb-100028 is the user time series vector of the i-th propagation user; W_u, b_u, the symbol in Figure PCTCN2018097540-appb-100029, and b_s are preset coefficients of the user feature vector conversion model; and e is the base of the natural logarithm;
    generating a user feature matrix based on the user feature vectors of the propagation users;
    obtaining a mask vector of each carrier text from the text matrix, and importing the mask vector and the user feature matrix into a user propagation feature value calculation model to determine a user propagation feature value of each carrier text, the user propagation feature value calculation model being:
    Figure PCTCN2018097540-appb-100030
    where [s_i] is the user feature matrix; m_j is the mask vector of the j-th carrier text; p_j is the user propagation feature value of the j-th carrier text; and d([s_i]*m_j) is a statistical function that counts non-empty elements;
    generating the user propagation feature vector of the target message according to the user propagation feature values.
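One possible reading of the masking step in this claim is sketched below. The feature value formula is only available as a figure, so treating p_j as the mean of the user features selected by the mask is an assumption, as are the use of one scalar feature per user and the reading of d(·) as the number of positions the mask selects; the function name is hypothetical.

```python
def user_propagation_feature_value(s, m_j):
    """Average the features of the users who propagated carrier text j.

    s:   one scalar feature per user (simplified stand-in for [s_i]).
    m_j: mask vector for carrier text j (1 = user propagated it, else 0).
    """
    masked = [s_i * m for s_i, m in zip(s, m_j)]
    # Assumed meaning of d(.): count of positions kept by the mask.
    d = sum(1 for m in m_j if m != 0)
    return sum(masked) / d if d else 0.0


p_j = user_propagation_feature_value([0.25, 0.75, 0.9], [1, 1, 0])
```

Collecting `p_j` over all carrier texts would then give the user propagation feature vector of the target message.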
  14. The device for identifying a false message according to any one of claims 11-13, wherein calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector comprises:
    aggregating the user propagation feature vector and the text feature vector to obtain an authenticity recognition matrix of the target message;
    importing the authenticity recognition matrix into an authenticity index calculation model to obtain the authenticity index of the target message, the authenticity index calculation model being:
    Figure PCTCN2018097540-appb-100031
    where the symbol in Figure PCTCN2018097540-appb-100032 is the authenticity index; [c_j] is the authenticity recognition matrix; the symbol in Figure PCTCN2018097540-appb-100033 and b_c are preset coefficients of the authenticity index calculation model; and e is the base of the natural logarithm.
  15. The device for identifying a false message according to claim 11, wherein obtaining the text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users comprises:
    constructing a global propagation matrix [a_ij]_{n×m} of the target message based on the carrier texts and the identifiers of the propagation users, where a_ij is the propagation tag value of the i-th propagation user for the j-th carrier text, n is the number of propagation users, and m is the number of carrier texts;
    taking the submatrix formed by each column of the global propagation matrix [a_ij]_{n×m} as the text matrix of the corresponding carrier text.
  16. A computer readable storage medium storing computer readable instructions, wherein the computer readable instructions, when executed by a processor, implement the following steps:
    acquiring a plurality of carrier texts containing a target message and a propagation path of each carrier text, the propagation path including an identifier of each propagation user who propagated the carrier text;
    obtaining a text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users;
    importing each text matrix into a preset feature vector calculation model to obtain a text feature vector of the target message;
    generating a user propagation matrix for the target message according to the propagation paths of all the carrier texts, each element of the user propagation matrix being the number of carrier texts propagated by the corresponding propagation user;
    importing the user propagation matrix into a preset user feature calculation model to obtain a user propagation feature vector corresponding to the target message;
    calculating an authenticity index of the target message according to the user propagation feature vector and the text feature vector;
    if the authenticity index is within a preset false index range, identifying the target message as a false message.
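The final decision step reduces to a range check on the authenticity index. The bounds below are hypothetical placeholders, since the claim only says the range is preset, and treating both ends as inclusive is likewise an assumption.

```python
def is_false_message(index, false_range=(0.0, 0.5)):
    """Return True when the authenticity index falls in the preset false range.

    false_range is a hypothetical (low, high) pair; both ends are inclusive here.
    """
    low, high = false_range
    return low <= index <= high


flagged = is_false_message(0.3)
passed = is_false_message(0.8)
```

Keeping the range as a parameter lets the same check be reused if the preset false index range is tuned later.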
  17. The computer readable storage medium according to claim 16, wherein importing each text matrix into the preset feature vector calculation model to obtain the text feature vector of the target message comprises:
    acquiring the number of propagations, a content feature parameter, and a propagation time parameter of each carrier text;
    sorting the carrier texts by the propagation time parameter to determine the import order of each carrier text;
    importing the number of propagations, the content feature parameter, the propagation time parameter, and the text matrix into a text time series vector conversion model to obtain a text time series vector of each carrier text, the text time series vector conversion model being:
    Figure PCTCN2018097540-appb-100034
    where the symbol in Figure PCTCN2018097540-appb-100035 is the text time series vector of the carrier text whose import order is t; η is the number of propagations; ΔT is the propagation time parameter; x_u is the text matrix; x_t is the fusion matrix of the carrier text whose import order is t; x_τ is the content feature parameter; and W_a and b_a are preset adjustment coefficients of the text time series vector conversion model;
    importing, in the import order, the text time series vector of each carrier text into the corresponding level of a multi-layer feedback recurrent neural network to obtain the text feature vector of the target message, the multi-layer feedback recurrent neural network being:
    Figure PCTCN2018097540-appb-100036
    where h_0 is a preset initial text vector; the symbol in Figure PCTCN2018097540-appb-100037 is the text time series vector of each carrier text; h_1, h_2, …, h_{t-1} are the intermediate text feature iteration values output by the levels of the multi-layer feedback recurrent neural network; h_t is the text feature vector of the target message; and W, U, and b are adjustment coefficients.
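The ordering step in this claim, sorting carrier texts by propagation time to fix their import order, can be sketched as below. The dictionary fields `id` and `time`, the earliest-first direction, and the 1-based order t are assumptions for illustration.

```python
def import_order(carrier_texts):
    """Map each carrier text id to its import order t (1 = earliest)."""
    # Assumption: earlier propagation time means earlier import.
    ranked = sorted(carrier_texts, key=lambda c: c["time"])
    return {c["id"]: t for t, c in enumerate(ranked, start=1)}


texts = [
    {"id": "a", "time": 30},
    {"id": "b", "time": 10},
    {"id": "c", "time": 20},
]
order = import_order(texts)
```

The resulting order decides which level of the recurrent network receives each text time series vector.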
  18. The computer readable storage medium according to claim 16, wherein importing the user propagation matrix into the preset user feature calculation model to obtain the user propagation feature vector corresponding to the target message comprises:
    performing singular value decomposition on the user propagation matrix to obtain a user propagation coefficient of each propagation user;
    importing each user propagation coefficient into a propagation feature vector conversion model to determine a user feature vector of each propagation user, the user feature vector conversion model being:
    Figure PCTCN2018097540-appb-100038
    where s_i is the user feature vector of the i-th propagation user; y_i is the user propagation coefficient of the i-th propagation user; the symbol in Figure PCTCN2018097540-appb-100039 is the user time series vector of the i-th propagation user; W_u, b_u, the symbol in Figure PCTCN2018097540-appb-100040, and b_s are preset coefficients of the user feature vector conversion model; and e is the base of the natural logarithm;
    generating a user feature matrix based on the user feature vectors of the propagation users;
    obtaining a mask vector of each carrier text from the text matrix, and importing the mask vector and the user feature matrix into a user propagation feature value calculation model to determine a user propagation feature value of each carrier text, the user propagation feature value calculation model being:
    Figure PCTCN2018097540-appb-100041
    where [s_i] is the user feature matrix; m_j is the mask vector of the j-th carrier text; p_j is the user propagation feature value of the j-th carrier text; and d([s_i]*m_j) is a statistical function that counts non-empty elements;
    generating the user propagation feature vector of the target message according to the user propagation feature values.
  19. The computer readable storage medium according to any one of claims 16-18, wherein calculating the authenticity index of the target message according to the user propagation feature vector and the text feature vector comprises:
    aggregating the user propagation feature vector and the text feature vector to obtain an authenticity recognition matrix of the target message;
    importing the authenticity recognition matrix into an authenticity index calculation model to obtain the authenticity index of the target message, the authenticity index calculation model being:
    Figure PCTCN2018097540-appb-100042
    where the symbol in Figure PCTCN2018097540-appb-100043 is the authenticity index; [c_j] is the authenticity recognition matrix; the symbol in Figure PCTCN2018097540-appb-100044 and b_c are preset coefficients of the authenticity index calculation model; and e is the base of the natural logarithm.
  20. The computer readable storage medium according to claim 16, wherein obtaining the text matrix of each carrier text based on the carrier texts and the identifiers of the propagation users comprises:
    constructing a global propagation matrix [a_ij]_{n×m} of the target message based on the carrier texts and the identifiers of the propagation users, where a_ij is the propagation tag value of the i-th propagation user for the j-th carrier text, n is the number of propagation users, and m is the number of carrier texts;
    taking the submatrix formed by each column of the global propagation matrix [a_ij]_{n×m} as the text matrix of the corresponding carrier text.
PCT/CN2018/097540 2018-04-09 2018-07-27 Method for identifying false message and device thereof WO2019196259A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810309691.1 2018-04-09
CN201810309691.1A CN108830630B (en) 2018-04-09 2018-04-09 False message identification method and equipment

Publications (1)

Publication Number Publication Date
WO2019196259A1 (en) 2019-10-17

Family

ID=64154438

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/097540 WO2019196259A1 (en) 2018-04-09 2018-07-27 Method for identifying false message and device thereof

Country Status (2)

Country Link
CN (1) CN108830630B (en)
WO (1) WO2019196259A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188194B (en) * 2019-04-26 2020-12-01 哈尔滨工业大学(深圳) False news detection method and system based on multitask learning model
CN110750735A (en) * 2019-10-23 2020-02-04 腾讯科技(深圳)有限公司 False event identification method, device, equipment and storage medium based on block chain network
TWI731469B (en) * 2019-11-11 2021-06-21 財團法人資訊工業策進會 Apparatus and method for verfication of information
CN111428151B (en) * 2020-04-20 2022-05-17 浙江工业大学 False message identification method and device based on network acceleration
CN111831790B (en) * 2020-06-23 2023-07-14 广东工业大学 False news identification method based on low threshold integration and text content matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902621A (en) * 2012-12-28 2014-07-02 深圳先进技术研究院 Method and device for identifying network rumor
CN105045857A (en) * 2015-07-09 2015-11-11 中国科学院计算技术研究所 Social network rumor recognition method and system
US20160212163A1 (en) * 2015-01-16 2016-07-21 The Trustees Of The Stevens Institute Of Technology Method and Apparatus to Identify the Source of Information or Misinformation in Large-Scale Social Media Networks
CN106354845A (en) * 2016-08-31 2017-01-25 上海交通大学 Microblog rumor recognizing method and system based on propagation structures
CN107797998A (en) * 2016-08-29 2018-03-13 腾讯科技(深圳)有限公司 The recognition methods of user-generated content containing rumour and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980692B (en) * 2016-05-30 2020-12-08 国家计算机网络与信息安全管理中心 Influence calculation method based on microblog specific events

Also Published As

Publication number Publication date
CN108830630B (en) 2020-04-10
CN108830630A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
WO2019196259A1 (en) Method for identifying false message and device thereof
US11475143B2 (en) Sensitive data classification
CN110162593B (en) Search result processing and similarity model training method and device
CN103258000B (en) Method and device for clustering high-frequency keywords in webpages
Zhang et al. Organizing books and authors by multilayer SOM
CN105335496B (en) Method for processing repeated customer service calls based on a cosine-similarity text mining algorithm
CN104573130B (en) The entity resolution method and device calculated based on colony
TW201839628A (en) Method, system and apparatus for discovering and tracking hot topics from network media data streams
CN108647322B (en) Method for identifying similarity of mass Web text information based on word network
CN107577688A (en) Original article influence analysis system based on media information collection
TW201214169A (en) Recognition of target words using designated characteristic values
CN107844533A (en) Intelligent question answering system and analysis method
CN112434151A (en) Patent recommendation method and device, computer equipment and storage medium
CN109918621B (en) News text infringement detection method and device based on digital fingerprints and semantic features
CN110175221B (en) Junk short message identification method by combining word vector with machine learning
Qu et al. Efficient online summarization of large-scale dynamic networks
CN109472027A (en) Social bot detection system and method based on blog post similarity
CN110197389A (en) User identification method and device
CN105389341A (en) Text clustering and analysis method for repeating caller work orders of customer service calls
CN109657116A (en) Public opinion search method, search device, storage medium and terminal device
CN110390044A (en) Method and device for searching similar web pages
CN109992676B (en) Cross-media resource retrieval method and retrieval system
CN111177559A (en) Text travel service recommendation method and device, electronic equipment and storage medium
Bansal et al. User tweets based genre prediction and movie recommendation using LSI and SVD
CN112380344A (en) Text classification method, topic generation method, device, equipment and medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01/02/2021)
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 18914661; Country of ref document: EP; Kind code of ref document: A1
122 Ep: pct application non-entry in european phase
Ref document number: 18914661; Country of ref document: EP; Kind code of ref document: A1