CN111914645A - Method and device for identifying false information, electronic equipment and storage medium - Google Patents
- Publication number
- CN111914645A (application number CN202010615617.XA)
- Authority
- CN
- China
- Prior art keywords
- information
- user
- identified
- false
- obtaining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/604—Tools and structures for managing or administering access control systems
Abstract
The invention discloses a method and a device for identifying false information, electronic equipment and a storage medium. The method comprises the following steps: obtaining information to be identified and a plurality of historical information issued by a user of the information to be identified; obtaining information action characteristics corresponding to the information to be identified according to multiple operations of the user on the information to be identified; obtaining user behavior characteristics according to multiple operations of the user on the plurality of pieces of published historical information; and determining whether the information to be identified is false information or not according to the information action characteristics and the user behavior characteristics. By adopting the technical scheme of the invention, the identification accuracy of the false information can be improved.
Description
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for identifying false information, an electronic device, and a storage medium.
Background
With the development of the internet, people have become increasingly accustomed to publishing and browsing information online in order to handle daily business. Taking a recruitment website as an example, a user may publish recruitment information on the website to recruit staff, or publish job-hunting information to find suitable work. Taking a housing agency website as an example, a merchant or a house owner may publish house source information on the website to sell or rent out a house. In practice, however, false information that does not match reality, such as false recruitment information or false house source information, is often found on such websites.
At present, when identifying whether the information is false information, the content contained in the information itself is generally extracted, and whether the information is false information is judged by identifying the content. However, the accuracy of this false information identification method is low.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for identifying false information, which are used to solve the problem in the related art that the accuracy of identifying false information is low.
In order to solve the technical problem, the invention adopts the following scheme:
in a first aspect, an embodiment of the present invention provides a method for identifying false information, where the method includes:
obtaining information to be identified and a plurality of historical information issued by a user of the information to be identified;
obtaining the information action characteristics according to multiple operations of the user on the information to be identified;
obtaining user behavior characteristics according to multiple operations of the user on the plurality of pieces of published historical information;
and determining whether the information to be identified is false information or not according to the information action characteristics and the user behavior characteristics.
Optionally, obtaining the information action feature according to multiple operations of the user on the information to be identified includes:
reading an operation record of a first preset operation performed on the information to be identified by the user, wherein the first preset operation comprises one or more of the following operations: modifying operation, refreshing operation and closing operation;
and determining the frequency and the times of the first preset operation of the user on the information to be identified according to the operation record so as to obtain the information action characteristics.
Optionally, obtaining user behavior characteristics according to multiple operations performed by the user on the plurality of pieces of published historical information includes:
determining behavior characteristics corresponding to each historical information in the plurality of historical information;
and aggregating the behavior characteristics corresponding to the plurality of historical information to obtain the user behavior characteristics.
Optionally, the method further comprises:
obtaining a parameter value of a second preset operation performed by the user within a preset time period, wherein the second preset operation comprises one or more of the following operations: a login operation, a registration operation, adding newly published information, a modification operation on published information, a refresh operation on published information, and a close operation on published information;
obtaining the number of categories and the proportion of the information published by the user for different scenes within the preset time period;
obtaining the proportion of the total number of the published information of the user in a preset time period to the total number of the published information outside the preset time period;
aggregating the behavior characteristics corresponding to the plurality of historical information to obtain the user behavior characteristics, including:
and determining, as the user behavior characteristics, the aggregation result obtained by aggregating the behavior characteristics corresponding to the plurality of pieces of historical information, together with at least one of the parameter value, the number of categories and proportion, and the ratio obtained above.
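The aggregation described in the optional steps above can be illustrated with a minimal sketch. It assumes that each piece of historical information has already been reduced to a dictionary of numeric behavior characteristics, and that the aggregation uses the mean and the maximum; the function name `aggregate_user_behavior`, the feature names, and the choice of mean and maximum are hypothetical and are not specified in this disclosure.

```python
from collections import Counter
from statistics import mean


def aggregate_user_behavior(per_post_features, op_counts_in_window,
                            scenes_in_window, n_in_window, n_outside_window):
    """Combine per-post behavior characteristics with window statistics.

    per_post_features: one dict of numeric characteristics per historical post.
    op_counts_in_window: counts of login/registration/publish/... operations.
    scenes_in_window: publishing scene of each post published in the window.
    """
    features = {}

    # Aggregate each characteristic across all historical posts
    # (mean and max are assumptions; other aggregations are possible).
    keys = sorted({k for f in per_post_features for k in f})
    for key in keys:
        values = [f.get(key, 0.0) for f in per_post_features]
        features[f"{key}_mean"] = mean(values)
        features[f"{key}_max"] = max(values)

    # Parameter values of the second preset operations within the window.
    for op, count in op_counts_in_window.items():
        features[f"window_{op}_count"] = count

    # Number of scene categories and the share of each scene within the window.
    scene_counts = Counter(scenes_in_window)
    total = sum(scene_counts.values())
    features["scene_category_count"] = len(scene_counts)
    for scene, count in scene_counts.items():
        features[f"scene_share_{scene}"] = count / total

    # Ratio of posts inside the window to posts outside it
    # (None marks the undefined case; this fallback is an assumption).
    features["in_vs_out_window_ratio"] = (
        n_in_window / n_outside_window if n_outside_window else None
    )
    return features


user_features = aggregate_user_behavior(
    per_post_features=[
        {"modify_count": 2, "refresh_per_day": 1.0},
        {"modify_count": 4, "refresh_per_day": 3.0},
    ],
    op_counts_in_window={"login": 12, "publish": 4},
    scenes_in_window=["house_sale", "house_sale", "rental", "house_sale"],
    n_in_window=4,
    n_outside_window=8,
)
```

The resulting dictionary can then serve as the user behavior characteristics passed to the identification step.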
Optionally, the method further comprises:
analyzing the attribute parameter value of the user to obtain the attribute characteristic of the user;
determining whether the information to be identified is false information according to the information action characteristics and the user behavior characteristics, wherein the determining comprises the following steps:
and determining whether the information to be identified is false information or not according to the information action characteristic, the user behavior characteristic and the user attribute characteristic.
Optionally, determining whether the information to be identified is false information according to the information action feature and the user behavior feature, including:
inputting the information action characteristics and the user behavior characteristics into a first false information identification model;
classifying the information action characteristics and the user behavior characteristics respectively by using the first false information identification model by adopting a decision tree algorithm;
obtaining a false degree score output by the first false information identification model according to a classification result so as to determine whether the information to be identified is false information;
the first false information identification model is obtained by taking a plurality of information samples carrying labels as training samples and training a preset model based on a decision tree algorithm, wherein the label carried by each information sample represents whether the information is false information.
Optionally, determining whether the information to be identified is false information according to the information action feature, the user behavior feature, and the user attribute feature, includes:
inputting the information action characteristics, the user behavior characteristics and the user attribute characteristics into a second false information identification model;
classifying the information action characteristics, the user behavior characteristics and the user attribute characteristics by using the second false information identification model by adopting a decision tree algorithm;
obtaining a false degree score output by the second false information identification model according to a classification result so as to determine whether the information to be identified is false information;
the second false information identification model is obtained by taking a plurality of information samples as training samples and training a preset model based on a decision tree algorithm, wherein each information sample carries a label for representing whether the information is false information, and each information sample carries user attribute characteristics of a user who issues the sample information.
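As an illustration of training a model based on a decision tree algorithm on labeled information samples and reading out a false degree score, the following is a minimal sketch. It uses a single small CART-style tree with Gini impurity as a stand-in; the disclosure does not specify the tree variant, the depth, or how the score is computed, so the leaf score (the fraction of false samples in the leaf), the feature layout, and all function names are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = false information)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; each leaf stores the share of false samples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"score": sum(labels) / len(labels)}
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def falsity_score(tree, row):
    """Walk the tree and return the false degree score of the reached leaf."""
    while "score" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["score"]


# Hypothetical samples: [modify_count, refresh_per_day, user_mean_close_count]
samples = [[1, 0.5, 0], [0, 1.0, 1], [9, 6.0, 4],
           [12, 8.0, 5], [2, 0.2, 0], [10, 7.5, 6]]
labels = [0, 0, 1, 1, 0, 1]  # 1 = labeled as false information

model = build_tree(samples, labels)
score = falsity_score(model, [11, 7.0, 5])
is_false = score >= 0.5  # the 0.5 cut-off is an assumption
```

For the second model, the user attribute characteristics would simply be appended as further columns of each sample row.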
Optionally, after determining whether the information to be identified is false information according to the information action feature and the user behavior feature, the method further includes:
when the information to be identified is determined not to be false information and is not issued, issuing the information to be identified;
and when the information to be identified is determined to be false information and the information to be identified is issued, setting the access authority of the information to be identified as forbidden access.
Optionally, the method further comprises:
after the access authority of the information to be identified is set to be access-prohibited, identifying whether new information sent by the user is false information or not when the new information is received again;
and after the information to be identified is issued, when new information sent by the user is received again, marking the new information as identification-free information and issuing the identification-free information.
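The handling after identification described above can be sketched as a small state holder. The class name `Platform`, its fields, and the returned status strings are hypothetical; the case of false information that has not yet been published is not spelled out above, so the sketch simply does not publish it, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    published: set = field(default_factory=set)
    access_forbidden: set = field(default_factory=set)
    exempt_users: set = field(default_factory=set)  # users whose new posts skip identification

    def handle_result(self, post_id, user_id, is_false):
        """Apply the identification result to one piece of information."""
        if is_false:
            # Already-published false information becomes inaccessible.
            if post_id in self.published:
                self.access_forbidden.add(post_id)
            # Later posts from this user must be identified again.
            self.exempt_users.discard(user_id)
        else:
            # Information that is not false is published if it is not yet published.
            self.published.add(post_id)
            self.exempt_users.add(user_id)

    def receive_new(self, post_id, user_id, identify):
        """Handle new information; `identify` returns True if it is false."""
        if user_id in self.exempt_users:
            self.published.add(post_id)
            return "published_identification_free"
        self.handle_result(post_id, user_id, identify(post_id))
        return "identified"


platform = Platform()
platform.handle_result("post-1", "user-1", is_false=False)
status = platform.receive_new("post-2", "user-1", identify=lambda _: True)
```

Here `status` is `"published_identification_free"`, because the earlier information from the same user was published after being judged not false.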
In a second aspect of the embodiments of the present invention, there is provided an apparatus for identifying false information, the apparatus including:
the information acquisition module is used for acquiring information to be identified and a plurality of historical information issued by a user of the information to be identified;
the first behavior feature obtaining module is used for obtaining the information action characteristics according to multiple operations of the user on the information to be identified;
the second behavior characteristic obtaining module is used for obtaining the behavior characteristics of the user according to a plurality of times of operations of the user on the plurality of pieces of published historical information;
and the information determining module is used for determining whether the information to be identified is false information or not according to the information action characteristics and the user behavior characteristics.
Optionally, the first behavior feature obtaining module includes:
a record reading unit, configured to read an operation record of a first preset operation performed on the information to be identified by the user, where the first preset operation includes one or more of the following: modifying operation, refreshing operation and closing operation;
and the behavior characteristic obtaining unit is used for determining the frequency and the times of the first preset operation of the user on the information to be identified according to the operation record so as to obtain the information action characteristic.
Optionally, the second behavior feature obtaining module includes:
the determining unit is used for determining the behavior characteristics corresponding to each piece of historical information in the plurality of pieces of historical information;
and the aggregation unit is used for aggregating the behavior characteristics corresponding to the plurality of historical information to obtain the user behavior characteristics.
Optionally, the apparatus further comprises:
a parameter value obtaining module, configured to obtain a parameter value of a second preset operation performed by the user within a preset time period, where the second preset operation includes one or more of the following: a login operation, a registration operation, adding newly published information, a modification operation on published information, a refresh operation on published information, and a close operation on published information;
the first statistical module is used for obtaining the category number and the proportion of the published information of the user aiming at different scenes in the preset time period;
the second statistical module is used for obtaining the proportion of the total number of the published information of the user in a preset time period to the total number of the published information outside the preset time period;
the aggregation unit is specifically configured to determine, as the user behavior characteristics, the aggregation result obtained by aggregating the behavior characteristics corresponding to the plurality of pieces of historical information, together with at least one of the parameter value, the number of categories and proportion, and the ratio obtained by the above modules.
Optionally, the apparatus further comprises:
the attribute characteristic obtaining module is used for analyzing the attribute parameter value of the user to obtain the attribute characteristic of the user;
the information determining module is specifically configured to determine whether the information to be identified is false information according to the information action feature, the user behavior feature, and the user attribute feature.
Optionally, the information determining module includes:
the first input unit is used for inputting the information action characteristics and the user behavior characteristics into a first false information identification model;
the classification unit is used for classifying the information action characteristics and the user behavior characteristics by using the first false information identification model through a decision tree algorithm;
the result output unit is used for obtaining a false degree score output by the first false information identification model according to the classification result so as to determine whether the information to be identified is false information;
the first false information identification model is obtained by taking a plurality of information samples carrying labels as training samples and training a preset model based on a decision tree algorithm, wherein the label carried by each information sample represents whether the information is false information.
Optionally, the information determining module includes:
the second input unit is used for inputting the information action characteristics, the user behavior characteristics and the user attribute characteristics into a second false information identification model;
the second determining unit is used for classifying the information action characteristics, the user behavior characteristics and the user attribute characteristics by using the second false information identification model through a decision tree algorithm;
the result output unit is used for obtaining a false degree score output by the second false information identification model according to the classification result so as to determine whether the information to be identified is false information;
the second false information identification model is obtained by taking a plurality of information samples as training samples and training a preset model based on a decision tree algorithm, wherein each information sample carries a label used for representing whether the information is false information, and each information sample carries user attribute characteristics of a user who issues the sample information.
Optionally, the apparatus further comprises:
the information issuing module is used for issuing the information to be identified when the information to be identified is determined not to be false information and the information to be identified is not issued;
and the access prohibition module is used for setting the access permission of the information to be identified as prohibited to access when the information to be identified is determined to be false information and the information to be identified is issued.
Optionally, the apparatus further comprises:
the first identification triggering module is used for identifying whether the new information is false information or not when the new information sent by the user is received again after the access authority of the information to be identified is set to be access-prohibited;
and the second identification triggering module is used for marking the new information as identification-free information and issuing the identification-free information when the new information sent by the user is received again after the information to be identified is issued.
In a third aspect, an embodiment of the present invention provides an electronic device, including: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when being executed by the processor, carries out the method steps of identifying false information as described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method for identifying false information according to the first aspect.
Compared with the prior art, the embodiment of the invention at least has the following advantages:
in the embodiment of the invention, information to be identified and a plurality of pieces of historical information issued by a user of the information to be identified can be obtained, and the information action characteristics are obtained according to a plurality of times of operations of the user on the information to be identified; acquiring user behavior characteristics according to multiple operations of the user on the published multiple pieces of historical information; and determining whether the information to be identified is false information or not according to the information action characteristics and the user behavior characteristics.
The user behavior characteristics can reflect the multiple operation behaviors of the user on the plurality of historical information, and the information action characteristics reflect the multiple operation behaviors of the user on the current information to be identified, so that when the information to be identified is identified, the authenticity of the information to be identified, which is to be issued by the user at present, can be identified by comprehensively considering all the operation behaviors of the user (including the operation behaviors of the user on the historical information and the operation behaviors of the user on the current information to be identified). The operation behavior of the user directly reflects the personal behavior characteristics of the information issued or to be issued by the user, and can accurately reflect the motivation of the user to issue the information, so that the authenticity of the information to be identified is identified from the dimension of the operation behavior of the user, and the accuracy rate of identifying the information is improved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of steps of a method of identifying false information in an embodiment of the present invention;
FIG. 2 is a flow chart of steps of a further method of identifying false information in an embodiment of the present invention;
FIG. 3 is a flow chart of steps for obtaining information action characteristics in an embodiment of the invention;
FIG. 4 is a flow chart of steps for obtaining user behavior characteristics in an embodiment of the invention;
FIG. 5 is a schematic structural diagram of an apparatus for identifying false information in an embodiment of the present invention;
FIG. 6 is a schematic hardware structure diagram of a server in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the related art, when identifying whether information is false, the content contained in the information itself is generally identified. For example, keywords in the information are identified to determine whether the content is fraudulent, and pictures in the information are identified to determine whether they have been edited (a so-called "P picture", that is, a retouched image).
However, there is no necessary correlation between the content contained in the information itself and whether the information is false information, and in some cases, false information identification is performed according to the content contained in the information itself, and the false information may be mistaken as real information. Taking the information to be identified as the house source post as an example, the house source post issued by the user may be a real house source, but the authorized handler of the house source is not the user who issues the house source post, and in this case, it should be determined that the house source post is false and cannot be pushed to the platform to be browsed by other users. If the content contained in the house source post is identified, the situation that the user who issues the house source post is actually not the authorized handler of the house source may not be identified, and then the house source post is judged to be real. Therefore, the way of identifying the content contained in the information itself results in a low accuracy of information identification, and cannot cover more false types existing in the actual scene.
Based on the technical problems to be solved, the inventor proposes the core concept of the application: the operation behavior of the user is characterized, so that whether the information that the user intends to publish, or has published, is false information is judged according to the operation behavior of the user, and the accuracy of identifying the information is improved.
Based on the technical conception, the scheme for identifying the false information is provided, and the method for identifying the false information is clearly and completely described below.
The method for identifying the false information in the embodiment of the invention can be applied to a server, and particularly can be applied to an application scene for determining whether the issued house source information is the false house source information. Referring to fig. 1, a flow chart illustrating steps of a method for identifying false information in an embodiment of the present invention is shown. As shown in fig. 1, the method for identifying false information may specifically include the following steps:
Step S11: obtaining information to be identified and a plurality of pieces of historical information published by the user of the information to be identified.
The information to be identified may be information to be published or already-published information received by the server. The information to be published may be information that the user has finished editing and has sent to the background service for review. In practice, when the server receives a piece of information to be published, or a piece of published information that is to be identified, it may correspondingly obtain a plurality of pieces of historical information that the user who published the information to be identified sent before the current time.
In practice, the plurality of pieces of historical information may be information published by the user within a preset time period before the current time, for example within the two months before the current time. In one embodiment, the plurality of pieces of historical information and the information to be identified may be information in the same publishing scene; that is, they are published in the same publishing scene, where a publishing scene may be understood as the section of the platform in which the information is published. In practice, the publishing scene may be, but is not limited to, a recruitment scene, a house selling scene, or a house renting scene. For example, if the information to be identified is house source information and the publishing scene is the house selling scene, all historical house source information published by the user in the house selling scene may be obtained.
In the embodiment of the present invention, the information to be identified may be house source information, where the house source information refers to information that describes basic conditions of a house and is edited by a user, and includes, but is not limited to, a house address, a house type map, an orientation, a rental price, a selling price, an indoor picture, a house cell environment, and the like.
In practice, the information to be identified may carry a user ID, and the server may determine the user who sends the information according to the user ID carried by the information to be identified. Further, the server may acquire a plurality of pieces of history information transmitted by the user before the current time.
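The retrieval in step S11 can be sketched as a filter over stored posts. The record layout `Post`, the function name `select_history`, and the 60-day window (standing in for the "two months" example) are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    post_id: str
    user_id: str
    scene: str              # e.g. "house_sale", "rental", "recruitment"
    published_at: datetime


def select_history(posts, target, now, window=timedelta(days=60)):
    """Return posts by the same user, in the same scene, within the window before `now`."""
    start = now - window
    return [
        p for p in posts
        if p.user_id == target.user_id          # user determined from the carried user ID
        and p.scene == target.scene             # same publishing scene
        and p.post_id != target.post_id         # exclude the information to be identified
        and start <= p.published_at < now       # preset time period before the current time
    ]


now = datetime(2020, 6, 30)
stored = [
    Post("a", "u1", "house_sale", datetime(2020, 6, 1)),
    Post("b", "u1", "rental", datetime(2020, 6, 10)),      # different scene
    Post("c", "u1", "house_sale", datetime(2020, 3, 1)),   # outside the window
    Post("d", "u2", "house_sale", datetime(2020, 6, 5)),   # different user
]
target = Post("t", "u1", "house_sale", now)
history = select_history(stored, target, now)
```

With this example data, only post `"a"` is selected as historical information.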
Step S12: obtain the information action feature according to multiple operations performed by the user on the information to be identified.
In this embodiment, the information action feature may be obtained according to the multiple operations that the user performs on the information to be identified within its life cycle, and it characterizes how the user operates on that information.
The life cycle of the information to be identified represents how long the information has been valid online, that is, the period between the time the information was first sent and its deadline. The deadline is the time at which the information is deleted by the user or the server, or closed by the user. If the information has not been deleted by the user or the server and has not been closed by the user, it is still valid, and the deadline may be taken as the current time.
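As a minimal sketch of the life cycle described above, the following Python computes the deadline and the effective duration. The function names and the `deleted_at` and `closed_at` parameters are hypothetical and chosen only for illustration; taking the earlier of the two times when both are recorded is likewise an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def lifecycle_deadline(deleted_at: Optional[datetime],
                       closed_at: Optional[datetime],
                       now: datetime) -> datetime:
    """Deadline of the life cycle: deletion or close time if any, else now."""
    ended = [t for t in (deleted_at, closed_at) if t is not None]
    # Assumption: if both times are recorded, the earlier one ends the life cycle.
    return min(ended) if ended else now


def lifecycle_duration(first_sent_at: datetime,
                       deleted_at: Optional[datetime],
                       closed_at: Optional[datetime],
                       now: datetime) -> timedelta:
    """Effective duration between the initial sending time and the deadline."""
    return lifecycle_deadline(deleted_at, closed_at, now) - first_sent_at
```

Information that is still valid simply passes `None` for both optional times, so the duration runs up to the current time.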
In practice, the multiple operations performed by the user on the information to be identified may be modification, refresh, or close operations performed within the life cycle. A modification operation means that the user edits the information to be identified; a refresh operation means that the user refreshes the post page on which the information is located; a close operation means that the user closes the information, after which it is no longer visible to other users.
In one implementation of this embodiment, the information action feature may be obtained by analyzing the user behavior data corresponding to the information to be identified. The information action feature may characterize the operation frequency, the total number of modifications, the total number of closes, the refresh frequency, the time intervals between operations, and the like, individually or in combination.
Step S13: obtain the user behavior feature according to multiple operations performed by the user on the plurality of pieces of published history information.
In one implementation, the user behavior feature may be obtained by analyzing the user behavior data corresponding to each piece of history information; it characterizes how the user operates on the plurality of pieces of history information.
The operation performed by the user on the plurality of pieces of history information may include, but is not limited to, the following operations: a modify operation, a refresh operation, and a close operation. The modifying operation refers to that a user modifies and edits the history information, the refreshing operation refers to that the user refreshes a post page where the history information is located, the closing operation refers to that the user closes the history information, and in practice, after the history information is closed, the history information is invisible to other users.
Accordingly, in this embodiment, the user behavior feature may characterize the operation frequency, the total number of modifications, the total number of closes, the refresh frequency, the time intervals between operations, and the like for the plurality of pieces of history information, individually or in combination.
Step S14: determine whether the information to be identified is false information according to the information action feature and the user behavior feature.
In this embodiment, the information action feature and the user behavior feature may be combined, and whether the information to be identified is false information is determined from the combined result. In one specific implementation, a weight is set for each of the two features, their weighted average is computed, and the weighted average is used to determine whether the information is false.
In practice, a score threshold may be set in advance, and when the weighted average does not exceed the score threshold, the information may be determined to be real information. For example, with a score threshold of 0.1, the information is determined to be real when the weighted average of the information action feature and the user behavior feature is between 0 and 0.1, and false when it is greater than 0.1.
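The weighted-average decision can be sketched as follows. This is only an illustration: it assumes that each feature has already been reduced to a single score in the range 0 to 1, and the function names and default weights are hypothetical rather than taken from the embodiment.

```python
def weighted_false_score(action_score: float,
                         behavior_score: float,
                         w_action: float = 0.5,
                         w_behavior: float = 0.5) -> float:
    """Weighted average of the two feature scores (weights are assumptions)."""
    total = w_action + w_behavior
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_action * action_score + w_behavior * behavior_score) / total


def is_false_information(score: float, threshold: float = 0.1) -> bool:
    """Following the 0.1 example: values above the threshold are judged false."""
    return score > threshold
```

For instance, scores of 0.05 and 0.09 with equal weights average to 0.07, which stays below the 0.1 threshold, so the information would be treated as real.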
In the embodiment of the invention, the information action characteristics are obtained according to the operation of the user on the information to be identified, so that the action characteristics of the user on the information to be identified are described; the user behavior characteristics are obtained according to multiple operations of the user on the plurality of historical information, the behavior characteristics of the user on the plurality of historical information are comprehensively depicted, and finally whether the information to be identified is false information or not can be judged by combining the user behavior characteristics and the information action characteristics.
Therefore, when the authenticity of information is identified, all operation behaviors of the user are considered together, including the user's operations on the history information and on the information to be identified. The user's operation behavior directly reflects how the user handles the information that has been or is about to be published, and can accurately reflect the user's motivation for publishing it. Identifying authenticity from the dimension of user operation behavior therefore improves the accuracy of identification.
Fig. 2 is a flowchart of the steps of a method for identifying false information according to another embodiment of the present invention. Referring to fig. 2, the method may specifically include the following steps:
Step S21: obtain the information to be identified and a plurality of pieces of history information published by the user who published the information to be identified.
The specific process of step S21 is similar to step S11, and reference may be made to the description of step S11, which is not repeated herein.
Step S22: obtain the information action feature according to multiple operations performed by the user on the information to be identified.
In this embodiment, fig. 3 shows the steps for obtaining the information action feature. As shown in fig. 3, step S22 may specifically include the following steps:
Step S221: read the operation record of a first preset operation performed by the user on the information to be identified.
The first preset operation includes one or more of the following: a modification operation, a refresh operation, and a close operation.
Step S222: determine, according to the operation record, the frequency and the number of times the user performed the first preset operation on the information to be identified, so as to obtain the information action feature.
In this embodiment, the operation record of the first preset operation may be the record kept over the life cycle of the information to be identified, which logs each operation the user performed on that information during the life cycle.
In one embodiment, the operation record may include the operation time and the operation type of each operation performed by the user on the information to be identified. The number of times each type of operation was performed, and hence the operation frequency, may be determined from the operation times and operation types. The frequency represents how often the user performed the first preset operation within the life cycle. For example, if the target user performs three modification operations and three refresh operations on the house source information W to be identified within its life cycle, the frequency of the modification operation is 3, and the frequency of the refresh operation is also 3.
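Counting operations by type from such a record can be sketched in a few lines. The record layout (a list of time and type pairs) and the names `record_w` and `operation_counts` are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical operation record for information W: (operation time, operation type).
record_w = [
    (datetime(2019, 2, 10, 10, 5), "refresh"),
    (datetime(2019, 2, 11, 12, 8), "refresh"),
    (datetime(2019, 2, 12, 13, 0), "refresh"),
    (datetime(2019, 3, 11, 11, 5), "modify"),
    (datetime(2019, 3, 13, 9, 0), "modify"),
    (datetime(2019, 3, 13, 13, 0), "modify"),
]


def operation_counts(record):
    """Number of times each operation type appears in the record."""
    return Counter(op_type for _, op_type in record)


counts = operation_counts(record_w)
```

Because `Counter` returns 0 for missing keys, an operation type that never occurred (here, the close operation) can be read without special handling.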
In practice, the time interval between two adjacent operations of the same type may also be determined, and statistics over these intervals yield the average, maximum, and minimum time interval and the variance of the intervals for each operation type. In this case, the information action feature may include the average time interval, the maximum time interval, the minimum time interval, or the variance for each operation type. These statistics reflect the behavior characteristics of the user when operating on the information to be identified.
For example, suppose a user performs three modification operations and three refresh operations on information W to be identified within its life cycle, and the information W is not closed during the life cycle. The three modification operations occur at 11:05 on March 11, 2019, 9:00 on March 13, 2019, and 13:00 on March 13, 2019, so the intervals between adjacent modification operations are 45 hours 55 minutes (2755 minutes) and 4 hours (240 minutes), with an average of about 1498 minutes. The three refresh operations occur at 10:05 on February 10, 2019, 12:08 on February 11, 2019, and 13:00 on February 12, 2019, so the intervals between adjacent refresh operations are 26 hours 3 minutes (1563 minutes) and 24 hours 52 minutes (1492 minutes), with an average of about 1528 minutes. The averages of about 1498 minutes and about 1528 minutes may then be included in the information action feature. The maximum time interval of 2755 minutes for the modification operation and the maximum time interval of 1563 minutes for the refresh operation may also be used.
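The interval statistics can be reproduced with a short sketch that uses the three modification times from the example. The helper names are hypothetical, and the population variance is used here as one possible reading of the variance statistic.

```python
from datetime import datetime
from statistics import mean, pvariance


def interval_minutes(timestamps):
    """Minutes between each pair of adjacent operations of one type."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def interval_stats(timestamps):
    """Mean, max, min and variance of the intervals, or None if fewer than two operations."""
    gaps = interval_minutes(timestamps)
    if not gaps:
        return None
    return {"mean": mean(gaps), "max": max(gaps),
            "min": min(gaps), "variance": pvariance(gaps)}


modify_times = [
    datetime(2019, 3, 11, 11, 5),
    datetime(2019, 3, 13, 9, 0),
    datetime(2019, 3, 13, 13, 0),
]
stats = interval_stats(modify_times)
```

For the modification operations this gives intervals of 2755 and 240 minutes and a mean of 1497.5 minutes, which rounds to the 1498 minutes mentioned in the example.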
With this embodiment, the frequency, number of times, and/or time intervals of the first preset operation can be counted to obtain the information action feature of the information to be identified, from which a preliminary judgment can be made as to whether the user has a motive for faking information. For example, if modifications or refreshes are frequent and the intervals between them are short, a motive for faking may exist (a user publishing false information generally modifies and refreshes it frequently so that it looks more like real information), and the information to be identified is more likely to be false.
Step S23: obtain the user behavior feature according to multiple operations performed by the user on the plurality of pieces of published history information.
Fig. 4 shows a flowchart of the steps for obtaining the user behavior feature. As shown in fig. 4, step S23 may specifically include the following steps:
Step S231: determine the behavior feature corresponding to each piece of history information in the plurality of pieces of history information.
Step S232: aggregate the behavior features corresponding to the plurality of pieces of history information to obtain the user behavior feature.
In this embodiment, the behavior feature corresponding to each piece of history information characterizes the multiple operations the user performed on that piece of history information. In a specific implementation, the operation record of the first preset operation performed by the user on each piece of history information may be obtained, and the behavior feature corresponding to each piece of history information is derived from that record. Specifically, the behavior feature corresponding to each piece of history information may be obtained as described in steps S221 to S222 above.
When the behavior features of the plurality of pieces of history information are aggregated, the components that represent the same operation characteristic may be processed statistically, for example by taking their mean, maximum, minimum, or variance, to obtain the user behavior feature. The user behavior feature may include at least the operation frequency and the mean, maximum, minimum, or variance of the operation time intervals of the first preset operation performed on the plurality of pieces of history information.
For example, take the plurality of pieces of history information to be history information A, history information B, and history information C. The behavior feature of history information A includes 122 minutes (average time interval of the modification operation), 128 minutes (average time interval of the refresh operation), and 20 minutes (average time interval of the close operation); the behavior feature of history information B includes 135 minutes, 141 minutes, and 38 minutes for the same three operations; and the behavior feature of history information C includes 28 minutes, 48 minutes, and 0 minutes.
Aggregating the behavior features of the three pieces of history information means, for the modification operation, aggregating the values 122, 135, and 28. In practice, their mean or variance may be calculated; for example, the mean of 122, 135, and 28 is 95, so 95 may be used as one component of the user behavior feature. Alternatively, the maximum or minimum value may be used directly. With the mean, the resulting user behavior feature includes 95, about 105.7, and about 19.3 for the modification, refresh, and close operations, respectively.
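The aggregation step can be sketched as follows, using the per-information values for A, B, and C from the example. The dictionary layout and the `aggregate` helper are assumptions for illustration; note that the mean is taken over all three pieces of history information.

```python
from statistics import mean, pvariance

# Average time interval (minutes) per operation type for each piece of history information.
history_features = {
    "A": {"modify": 122, "refresh": 128, "close": 20},
    "B": {"modify": 135, "refresh": 141, "close": 38},
    "C": {"modify": 28, "refresh": 48, "close": 0},
}

REDUCERS = {"mean": mean, "max": max, "min": min, "variance": pvariance}


def aggregate(per_info, how="mean"):
    """Combine components that describe the same operation across all history information."""
    reduce = REDUCERS[how]
    keys = sorted({k for feats in per_info.values() for k in feats})
    return {k: reduce([feats[k] for feats in per_info.values() if k in feats])
            for k in keys}


user_behavior_mean = aggregate(history_features, "mean")
user_behavior_max = aggregate(history_features, "max")
```

Switching `how` between the mean, maximum, minimum, and variance gives the alternative aggregates mentioned above without changing the rest of the pipeline.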
With this embodiment, the behavior features of the individual pieces of history information are aggregated into the user behavior feature, so that the user's operations on other history information serve as a reference from which the user's motivation for publishing the information to be identified can be inferred further. For example, if the user did not modify or refresh the pieces of history information frequently or at short intervals (which suggests that the history information is real), but does modify or refresh the information to be identified frequently or at short intervals, it may be determined that the user may have a motive for faking when publishing the information to be identified (false information is generally modified frequently so that it looks more real).
In an optional example, in order to expand the operation characteristics represented by the obtained user behavior characteristics to accurately abstract the behavior motivation of the user, the corresponding other information may be obtained through the following steps while obtaining multiple operations of the user on multiple pieces of historical information:
Step S233: obtain the parameter value of a second preset operation performed by the user within a preset time period, where the second preset operation includes one or more of the following: a login operation, a registration operation, newly adding published information, a modification operation on published information, a refresh operation on published information, and a close operation on published information.
In this embodiment, the second preset operation performed within the preset time period may refer to a second preset operation performed by the user on the plurality of pieces of history information within that period, and the parameter value of the second preset operation describes the operation characteristics of the second preset operation within that period.
In practice, if a user registers several different accounts under the same identity (for example, the same identity card number) and publishes history information through the different accounts, the user may be trying to conceal the fact that false information is being published. The close operation on published information refers to setting the published information to be invisible to other users.
Newly adding published information refers to the user adding a new piece of information within the preset time period. The plurality of pieces of history information are pieces of information that the user sent within the preset time period, but sending only means that the user transmitted the information to the server; it does not necessarily mean that the information was newly added, that is, sent for the first time. Therefore, in the embodiment of the present invention, whether a piece of history information was newly added within the preset time period may be determined from the time at which it was first added.
For example, take house source information W and a preset time period of two months. If the user first sent house source information W more than two months ago, but modified it within the last two months and sent the modified version to the server again to republish it, then house source information W is determined not to be newly added in the last two months. If the initial sending time of house source information W falls within the last two months, it is determined to be newly added in the last two months.
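The newly-added check reduces to comparing the initial sending time with the start of the preset time period. The sketch below assumes a two-month window approximated as 60 days; the function name and the window default are hypothetical.

```python
from datetime import datetime, timedelta


def is_newly_added(first_sent_at: datetime,
                   now: datetime,
                   window: timedelta = timedelta(days=60)) -> bool:
    """True if the information was first sent within the preset time period.

    Later modifications and re-sends do not matter; only the initial
    sending time is compared with the window (assumed here to be 60 days).
    """
    return now - window <= first_sent_at <= now
```

A piece of information first sent on February 1 and re-sent in June would therefore not count as newly added on July 1, whereas one first sent on June 1 would.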
In this embodiment, the operation frequency, the time interval, and the like of the second preset operation may be processed statistically to obtain the parameter value of the second preset operation. This statistical processing is similar to that of steps S221 to S222 and is not described again here.
Step S234: obtain the number of categories and the proportions of the information published by the user in different scenarios within the preset time period.
In this embodiment, in order to further improve the accuracy of identifying false information, and to capture the association between false information and a falsified identity, a falsified information state, and the like, information that the user published in other scenarios, that is, scenarios different from the one in which the information to be identified is published, may be obtained. For example, if the information to be identified is house source information published in a house selling scenario, recruitment information that the user published in a recruitment scenario, or job-seeking information published in a job-seeking scenario, within the preset time period may also be obtained.
In a specific implementation, the number of pieces of published information in each category may be counted, and the proportion of each category may then be determined from the total number of pieces published within the preset time period. For example, if 12 pieces of information were published in total, of which 5 belong to the recruitment category, 4 to the job application category, and 3 to the second-hand goods trading category, the distribution of the 12 pieces over the categories is obtained. In practice, when one category accounts for a high proportion, for example when most of the information was published in the recruitment scenario, the user may be concealing his or her actual role, and the information to be identified is more likely to be false.
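The category count and proportions for the 12-piece example can be computed as in the following sketch. The category labels and the `category_distribution` helper are illustrative assumptions.

```python
from collections import Counter


def category_distribution(categories):
    """Return (number of distinct categories, proportion of each category)."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0, {}
    return len(counts), {cat: n / total for cat, n in counts.items()}


# The 12 published pieces from the example: 5 + 4 + 3.
published = (["recruitment"] * 5
             + ["job application"] * 4
             + ["second-hand goods"] * 3)
num_categories, shares = category_distribution(published)
```

Here the recruitment category accounts for 5/12 of the published information, the job application category for 4/12, and the second-hand goods category for 3/12.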
In this embodiment, counting the information published by the user in different scenarios establishes an association between the information to be identified and published information of different categories, from which the user's usual identity can be inferred. The authenticity of the information to be identified is then assessed in light of that identity, which improves identification accuracy.
For example, if a user publishes a large amount of information in the recruitment scenario, the user may actually be an intermediary. An intermediary may be posing as an individual to publish the information to be identified, in which case the information to be identified may be false house source information.
Step S235: obtain the ratio of the total number of pieces of information the user published within the preset time period to the total number published outside the preset time period.
In this embodiment, the published information may refer to history information published by the user before the current time. The preset time period here may be an earlier portion of the period over which the plurality of pieces of history information are obtained. For example, if the history information covers the last two months, the preset time period in this embodiment may be the first month or the first 20 days of those two months, and the time outside the preset time period is then the second month or the remaining 40 days.
In this embodiment, the distribution of the number of pieces of history information over publication time may be counted. For example, take the 60 days from May 1, 2019 to July 1, 2019, with May 1 to May 30 as the preset time period. If 10 pieces of history information were published from May 1 to May 30 and 45 pieces were published from June 1 to July 1, it may be determined that the user's publications are unevenly distributed over the 60 days.
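The ratio in step S235 can be sketched as a simple count of publication dates inside and outside the preset time period. The function name is hypothetical, and returning `None` when nothing was published outside the period is an assumption about how to handle division by zero.

```python
from datetime import date


def period_ratio(publish_dates, start: date, end: date):
    """Ratio of pieces published inside [start, end] to pieces published outside it."""
    inside = sum(1 for d in publish_dates if start <= d <= end)
    outside = len(publish_dates) - inside
    if outside == 0:
        return None  # assumption: ratio is undefined without outside publications
    return inside / outside


# 10 pieces in May 1 to May 30 and 45 pieces in June 1 to July 1, as in the example.
dates = [date(2019, 5, 15)] * 10 + [date(2019, 6, 15)] * 45
ratio = period_ratio(dates, date(2019, 5, 1), date(2019, 5, 30))
```

A ratio of 10/45 (about 0.22) indicates that most of the history information was published in the later part of the window.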
By adopting the embodiment, the frequency degree of the historical information issued by the user can be obtained, so that the relevance between the information to be identified and the frequency degree of posting of the user is established, and the accuracy rate of identifying the information to be identified can be further improved.
For example, if a user frequently publishes a large amount of similar information in the time period in which the information to be identified is published, the information to be identified is more likely to be false; conversely, if only a small amount of similar information is published in that time period, the information to be identified is more likely to be real.
Accordingly, after the corresponding information is obtained through steps S233 to S235, the result of aggregating the behavior features corresponding to the plurality of pieces of history information with at least one of the items obtained above may be determined as the user behavior feature.
That is, once the ratio of the total number of pieces published within the preset time period to the total number published outside it, the number of categories and the proportions of the information published in different scenarios within the preset time period, and the parameter value of the second preset operation performed within the preset time period have been obtained, one or more of these items may be aggregated with the behavior feature corresponding to each piece of history information, and the result used as the user behavior feature.
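One possible way to combine these items is to merge them into a single flat feature mapping, with the optional items included only when they are available. The function, its parameters, and the key prefixes below are all hypothetical; the embodiment does not prescribe a particular data layout.

```python
from typing import Dict, Optional


def build_user_behavior_feature(
    aggregated_history: Dict[str, float],
    second_op_params: Optional[Dict[str, float]] = None,
    category_count: Optional[int] = None,
    category_shares: Optional[Dict[str, float]] = None,
    period_ratio: Optional[float] = None,
) -> Dict[str, float]:
    """Merge the aggregated history features with any optional items supplied."""
    feature = {f"hist_{k}": v for k, v in aggregated_history.items()}
    if second_op_params:
        feature.update({f"op_{k}": v for k, v in second_op_params.items()})
    if category_count is not None:
        feature["category_count"] = category_count
    if category_shares:
        feature.update({f"share_{k}": v for k, v in category_shares.items()})
    if period_ratio is not None:
        feature["period_ratio"] = period_ratio
    return feature
```

Keeping the optional items separate in this way reflects the wording that one or more of them may be aggregated with the history-based behavior features.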
With this embodiment, associations between the information to be identified and several kinds of other information can be established: published information in different scenarios, the user's posting frequency, and the user's behavior. The association with published information in different scenarios makes it possible to judge whether the information to be identified involves a faked user identity; the association with posting frequency makes it possible to judge whether the faking is confined to a particular period; and the association with user behavior makes it possible to judge whether the user fakes information habitually. More types of false information are thereby covered, and identification accuracy improves.
Step S24: determine whether the information to be identified is false information according to the information action feature and the user behavior feature.
The process of step S24 is similar to the process of step S14, and reference may be made to the process of step S14, which is not described herein again.
Accordingly, in order to improve the efficiency and intelligence of information identification, in one implementation of this embodiment, the information action feature and the user behavior feature may be input into a first false information identification model. Using a decision tree algorithm, the first false information identification model classifies the information action feature and the user behavior feature respectively, and a false degree score output by the first false information identification model is obtained from the classification result to determine whether the information to be identified is false information.
The first false information identification model is obtained by taking a plurality of information samples carrying labels as training samples and training a preset model based on a decision tree algorithm, wherein the label carried by each information sample represents whether the information is false information.
In this embodiment, the first false information identification model classifies the information action feature and the user behavior feature respectively based on a decision tree algorithm. With this embodiment, the first false information identification model can be used to identify the information to be identified, and because the model outputs a false degree score, the method is easy to deploy in practice and improves the operating efficiency of the platform.
The preset model may be a LightGBM model; LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. In practice, training the preset model yields the first false information identification model, which can then be used to identify the authenticity of information.
In this embodiment, a threshold of the false degree score may be preset, and when the false degree score is greater than or equal to the threshold of the false degree score, it may be determined that the information to be identified is false, and if the false degree score is less than the threshold of the false degree score, it may be determined that the information to be identified is true.
In practice, when the preset model is trained, feature extraction may be performed on each of the plurality of information samples to obtain the information action feature and the user behavior feature corresponding to each sample, and these features are then input into the preset model for training to obtain the first false information identification model. The process of obtaining the information action feature of each information sample is similar to step S12, and the process of obtaining the user behavior feature of each information sample is similar to step S13; reference may be made to the descriptions above, which are not repeated here.
In another practical case, the preset model itself may be configured to perform feature extraction on an information sample according to the method of the foregoing embodiments, obtaining the behavior feature of the information sample and the user behavior features of the other information samples associated with it. The model then analyzes these features, for example through a 1 × 1 convolution, to obtain an identification result, determines a loss from the label carried by the information sample, and is updated according to the loss.
In this way, the obtained first false information identification model can directly identify the information to be identified and the plurality of pieces of historical information, and under the condition, the information to be identified and the plurality of pieces of historical information can also be directly input into the first false information identification model, so that the information action characteristics and the user behavior characteristics are obtained through the first false information identification model, and the false degree score is output.
In other embodiments, in addition to determining whether the information to be identified is false information according to the information action feature and the user behavior feature, a user attribute feature may be added, that is, determining whether the information to be identified is false information according to the information action feature, the user behavior feature, and the user attribute feature.
Specifically, the attribute parameter values of the user are analyzed to obtain the user attribute feature. Then, when determining whether the information to be identified is false information, the determination is made according to the information action feature, the user behavior feature, and the user attribute feature.
In a specific implementation, the server may obtain the user information of the user and perform feature extraction on it to obtain the user attribute features. The user information may include, but is not limited to: registration time, the user's year and month of birth, contact information, home address, work unit, occupational property, gender, user list type, and the like. The user attribute features are data extracted from the user information that characterize the individual traits of the user. In a specific implementation, the following user attribute features can be extracted from the user information: the time interval from the registration time to the expiration date; the number of filled-in items among the three items of birth year and month, contact information, and occupational property; the gender of the user; and the user list type.
The user list type may be one of black, white, and gray, each representing a credit level of the user: the lighter the color, the higher the credit level, the more trustworthy the user, and the more likely the issued information is true. The more of the three items (birth year and month, contact information, and occupational property) are filled in, the more complete the user's information is; in practice, higher completeness indicates a more reliable user and more authentic issued information. The expiration date may refer to the date on which the user's account was cancelled or the user was added to the blacklist by the server; if neither has occurred, the expiration date may refer to the current time. The longer the time interval from the registration time to the expiration date, the longer the lifetime of the user, which in practice indicates a more reliable user.
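The user attribute features listed above can be sketched as follows. This is only an illustration under assumptions: the dictionary keys (`registration_date`, `expiration_date`, `birth_year_month`, `contact`, `occupation`, `list_type`) and the function name are hypothetical, and the lifetime is measured in days.

```python
from datetime import date

# Hypothetical keys for the three profile items whose completeness is counted.
PROFILE_FIELDS = ("birth_year_month", "contact", "occupation")

def user_attribute_features(user, today):
    """Extract the attribute features described in the text from a user record."""
    # If the user was never cancelled or blacklisted, the expiration date
    # falls back to the current time.
    expiration = user.get("expiration_date") or today
    return {
        "lifetime_days": (expiration - user["registration_date"]).days,
        # Number of the three profile items that are actually filled in.
        "completeness": sum(1 for f in PROFILE_FIELDS if user.get(f)),
        "gender": user.get("gender"),
        "list_type": user.get("list_type"),  # "black", "white" or "gray"
    }

example = user_attribute_features(
    {
        "registration_date": date(2020, 1, 1),
        "contact": "a@example.com",
        "occupation": "",
        "list_type": "white",
    },
    today=date(2020, 1, 31),
)
```

Categorical values such as gender and list type would still need to be encoded numerically before being passed to a model.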
In the embodiment, because the user attribute characteristics are obtained, the individual credit degree of the user can be abstracted according to the user attribute characteristics, so that the information to be identified is identified by referring to the individual credit degree of the user, and the accuracy rate of identifying the information to be identified is further improved.
In an implementation of this embodiment, to improve identification efficiency and to identify the authenticity of information intelligently, the determination based on the information action feature, the user behavior feature, and the user attribute feature may proceed as follows: the information action feature, the user behavior feature, and the user attribute feature are input into a second false information identification model; the second false information identification model classifies these features using a decision tree algorithm; and a false degree score output by the second false information identification model according to the classification result is obtained, so as to determine whether the information to be identified is false information.
The second false information identification model is obtained by taking a plurality of information samples as training samples and training a preset model based on a decision tree algorithm, wherein each information sample carries a label for representing whether the information is false information, and each information sample carries user attribute characteristics of a user who issues the sample information.
In this embodiment, the second false information identification model performs classification based on a decision tree algorithm according to the information action features, the user behavior features, and the user attribute features. The process of obtaining the second false information identification model may refer to the process of obtaining the first false information identification model, which is not repeated here. It should be noted that the second false information identification model may also directly process the information to be identified and the plurality of pieces of historical information; in this case, the information to be identified, the plurality of pieces of historical information, and the user attribute feature may be input directly into the second false information identification model, which outputs the false degree score.
In this embodiment, after the identification result of the information to be identified is obtained, the information to be identified may be processed according to that result, and information subsequently sent by the user may also be handled according to it.
Specifically, after step S24, when it is determined that the information to be identified is not false information and the information to be identified is not published, step S25 may be executed; when it is determined that the information to be identified is false information and the information to be identified is issued, step S26 may be executed.
Step S25: and issuing the information to be identified.
Specifically, if the information to be identified has not been published and is determined to be real information, it may be published. Here, publishing may refer to publishing to a network platform for viewing by other users.
Accordingly, in an embodiment, after the information to be identified is published, when new information sent by the user is received again, the new information is marked as identification-free information and published.
In practice, when the information to be identified is determined to be real information, the user can be marked as a trustworthy user. When new information sent by the user is received again, it can then be published directly because the user is marked as trustworthy; that is, false-information identification is skipped for the new information, which improves publishing efficiency for these users and optimizes the user experience.
Step S26: and setting the access authority of the information to be identified as forbidden access.
Specifically, if the information to be identified has been published, that is, it can be viewed publicly, and it is determined to be false, the server may set its access permission to prohibited so that other users cannot view it. The information thus changes from publicly viewable to not publicly viewable, which prevents the false information from being seen by the public. Of course, in practice, the user may also be added to the blacklist.
Accordingly, in one embodiment, after the access right of the information to be identified is set to be access-prohibited, when new information sent by the user is received again, whether the new information is false information or not is identified.
With this embodiment, when new information sent by the user is received again, false-information identification of the new information may be started because the user has been added to the blacklist. This ensures the authenticity of published information and creates a good environment of mutual trust on the network.
Of course, in some embodiments, if the information to be identified is recognized as false information and has not been published, the information to be identified may be closed, so that the user can no longer edit it and other users cannot browse it.
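The handling in steps S25 and S26, together with the treatment of later information from the same user, can be sketched as a small state update. The class `Info`, the status strings, the `user_flags` dictionary, and the choice to leave already published real information unchanged are assumptions for illustration; the text does not prescribe a data model.

```python
from dataclasses import dataclass

@dataclass
class Info:
    published: bool
    status: str = "pending"

def handle_identification(info, is_false, user_flags):
    """Apply the identification result to the information and to the user."""
    if not is_false and not info.published:
        # Step S25: publish real, unpublished information and trust the user.
        info.published = True
        info.status = "published"
        user_flags["trusted"] = True
    elif is_false and info.published:
        # Step S26: prohibit access to false, already published information.
        info.status = "access_prohibited"
        user_flags["trusted"] = False
        user_flags["blacklisted"] = True  # optional, per the description
    elif is_false and not info.published:
        # False and unpublished: close it so it can no longer be edited or browsed.
        info.status = "closed"
    # Real and already published: left unchanged (assumption, not stated in the text).
    return info.status

def needs_identification(user_flags):
    """New information from a trusted user is identification-free."""
    return not user_flags.get("trusted", False)

flags = {}
first = Info(published=False)
first_status = handle_identification(first, is_false=False, user_flags=flags)
skip_next = not needs_identification(flags)

second = Info(published=True, status="published")
second_status = handle_identification(second, is_false=True, user_flags=flags)
check_next = needs_identification(flags)
```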
In the embodiment of the invention, the user behavior features include not only the behavior features of the multiple operations performed on the plurality of pieces of historical information, but also the parameter value of the second preset operation, the number of categories and the proportions of the information published for different scenes within the preset time period, and the ratio of the total number of pieces of information published by the user within the preset time period to the total number published outside it. This enriches the features describing the user's operations on other information, improves the accuracy with which the user's motivation for publishing the information to be identified is abstracted, and thus improves the identification accuracy for the information to be identified.
In addition, because the information to be identified can be identified by the first false information identification model or the second false information identification model, identification becomes more intelligent and efficient, and using a model to distinguish true from false information also improves the generalization of the identification.
Referring to fig. 5, a schematic structural diagram of an apparatus for identifying false information according to an embodiment of the present invention is shown, and as shown in fig. 5, the apparatus may be applied to a server, and specifically may include the following modules:
an information obtaining module 501, configured to obtain information to be identified and a plurality of pieces of history information that has been issued by a user of the information to be identified;
a first behavior feature obtaining module 502, configured to obtain an information behavior feature corresponding to the information to be identified according to multiple operations performed on the information to be identified by the user;
a second behavior feature obtaining module 503, configured to obtain a user behavior feature according to multiple operations performed by the user on the multiple pieces of published historical information;
the information determining module 504 may be configured to determine whether the information to be identified is false information according to the information action feature and the user behavior feature.
Optionally, the first behavior feature obtaining module 502 may specifically include the following units:
a record reading unit, configured to read an operation record of a first preset operation performed on the information to be identified by the user, where the first preset operation includes one or more of: modifying operation, refreshing operation and closing operation;
and the behavior characteristic obtaining unit may be configured to determine, according to the operation record, a frequency and a number of times of a first preset operation performed by the user on the information to be identified, so as to obtain the information behavior characteristic.
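As a rough sketch of what the record reading unit and the behavior feature obtaining unit compute, the following counts each first preset operation in an operation record and derives a frequency from the count. The record layout (a list of `(operation, timestamp)` pairs), the operation names, and the definition of frequency as operations per day over an observation window are assumptions.

```python
from collections import Counter

# Hypothetical names for the first preset operations.
FIRST_PRESET_OPS = ("modify", "refresh", "close")

def information_action_features(records, window_days):
    """Count each first preset operation and derive a per-day frequency."""
    counts = Counter(op for op, _ in records if op in FIRST_PRESET_OPS)
    features = {}
    for op in FIRST_PRESET_OPS:
        features[f"{op}_count"] = counts[op]
        features[f"{op}_per_day"] = counts[op] / window_days
    return features

# Operation record for one piece of information; "view" is ignored
# because it is not a first preset operation.
records = [("modify", 1), ("modify", 2), ("refresh", 3), ("view", 4)]
action_features = information_action_features(records, window_days=2)
```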
Optionally, the second behavior feature obtaining module 503 may specifically include the following units:
the determining unit may be configured to determine a behavior feature corresponding to each of the plurality of pieces of history information;
the aggregation unit may be configured to aggregate behavior characteristics corresponding to the plurality of pieces of history information, to obtain the user behavior characteristics.
Optionally, the apparatus may further include the following modules:
a parameter value obtaining module, configured to obtain a parameter value of a second preset operation performed by the user within a preset time period, where the second preset operation includes one or more of: the method comprises the following steps of login operation, registration operation, newly-added published information, modification operation on the published information, refreshing operation on the published information and closing operation on the published information;
the first statistical module can be used for obtaining the category number and the proportion of the published information of the user for different scenes in the preset time period;
the second statistical module may be configured to obtain a ratio of a total number of pieces of published information of the user in a preset time period to a total number of pieces of published information outside the preset time period;
the aggregation unit may be specifically configured to determine, as the user behavior feature, the aggregation result obtained by aggregating the behavior features corresponding to the plurality of pieces of history information, together with at least one of the parameter value, the category number and proportion, and the ratio.
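One possible reading of the aggregation unit is sketched below. The aggregation function (a per-feature mean), the `avg_` prefix, and the names of the extra features are assumptions; the text only states that the per-information behavior features are aggregated and may be combined with the statistics obtained by the parameter value obtaining module and the two statistical modules.

```python
def aggregate_user_behavior(per_info_features, extra_features=None):
    """Average the behavior features of each piece of history information
    and merge in optional user-level statistics."""
    if not per_info_features:
        raise ValueError("at least one piece of history information is required")
    n = len(per_info_features)
    aggregated = {
        f"avg_{key}": sum(f[key] for f in per_info_features) / n
        for key in sorted(per_info_features[0])
    }
    # Optional statistics, e.g. login count in the preset time period or the
    # ratio of information published inside versus outside that period.
    aggregated.update(extra_features or {})
    return aggregated

history = [
    {"modify_count": 2, "refresh_count": 0},
    {"modify_count": 4, "refresh_count": 2},
]
user_behavior = aggregate_user_behavior(
    history, extra_features={"login_count": 5, "in_out_publish_ratio": 0.5}
)
```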
Optionally, the apparatus may further include the following modules:
the attribute characteristic obtaining module can be used for analyzing the attribute parameter value of the user to obtain the attribute characteristic of the user;
the information determining module may be specifically configured to determine whether the information to be identified is false information according to the information action feature, the user behavior feature, and the user attribute feature.
Optionally, the information determining module 504 may specifically include the following units:
a first input unit, which can be used for inputting the information action characteristics and the user behavior characteristics into a first false information identification model;
the classification unit is used for classifying the information action characteristics and the user behavior characteristics by using the first false information identification model through a decision tree algorithm;
the result output unit is used for obtaining a false degree score output by the first false identification model according to the classification result so as to determine whether the information to be identified is false information;
the first false information identification model is obtained by taking a plurality of information samples carrying labels as training samples and training a preset model based on a decision tree algorithm, wherein the label carried by each information sample represents whether the information is false information.
Optionally, the information determining module 504 may specifically include the following units:
the second input unit can be used for inputting the information action characteristics, the user behavior characteristics and the user attribute characteristics into a second false information identification model;
the second determining unit may be configured to classify the information action feature, the user behavior feature, and the user attribute feature by using the second false information identification model through a decision tree algorithm;
the result output unit is used for obtaining a false degree score output by the second false information identification model according to the classification result so as to determine whether the information to be identified is false information;
the second false information identification model is obtained by taking a plurality of information samples as training samples and training a preset model based on a decision tree algorithm, wherein each information sample carries a label which can be used for representing whether the information is false information, and each information sample carries user attribute characteristics of a user who issues the sample information.
Optionally, the apparatus may further include the following modules:
the information issuing module can be used for issuing the information to be identified when the information to be identified is determined not to be false information and the information to be identified is not issued;
and the access prohibition module can be used for setting the access permission of the information to be identified as prohibited to access when the information to be identified is determined to be false information and the information to be identified is issued.
Optionally, the apparatus may further include the following modules:
the first identification triggering module can be used for identifying whether the new information is false information or not when the new information sent by the user is received again after the access authority of the information to be identified is set to be access-prohibited;
the second identification triggering module may be configured to mark the new information as identification-free information and issue the identification-free information when the new information sent by the user is received again after the information to be identified is issued.
Fig. 6 is a schematic structural diagram of a server 600 for implementing various embodiments of the present invention, where the server 600 may include a device 61 for identifying false information and a database 62, and may further include a network interface 64 and a data interface 63. A plurality of information issued by the user can be stored in the database 62, and the false information identifying device 61 can be used for executing the false information identifying method. Specifically, the means 61 for identifying false information may be a combination of software and hardware, the hardware may include a physical key, the physical key may be used for providing functions such as returning, confirming and the like, and the software includes an application program; wherein, the device 61 for identifying false information can cooperate with the database 62 through software and hardware to implement the method for identifying false information described in the above embodiments.
An embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above method embodiment for identifying false information and can achieve the same technical effect; to avoid repetition, the details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements each process of the above method for identifying false information, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (20)
1. A method of identifying false information, the method comprising:
obtaining information to be identified and a plurality of historical information issued by a user of the information to be identified;
obtaining information action characteristics corresponding to the information to be identified according to multiple operations of the user on the information to be identified;
obtaining user behavior characteristics according to multiple operations of the user on the plurality of pieces of published historical information;
and determining whether the information to be identified is false information or not according to the information action characteristics and the user behavior characteristics.
2. The method according to claim 1, wherein obtaining the information action feature according to a plurality of operations of the user on the information to be identified comprises:
reading an operation record of a first preset operation performed on the information to be identified by the user, wherein the first preset operation comprises one or more of the following operations: modifying operation, refreshing operation and closing operation;
and determining the frequency and the times of the first preset operation of the user on the information to be identified according to the operation record so as to obtain the information action characteristics.
3. The method according to claim 1 or 2, wherein obtaining the user behavior characteristics according to a plurality of operations performed by the user on the plurality of pieces of published history information comprises:
determining behavior characteristics corresponding to each historical information in the plurality of historical information;
and aggregating the behavior characteristics corresponding to the plurality of historical information to obtain the user behavior characteristics.
4. The method of claim 3, further comprising:
obtaining a parameter value of a second preset operation performed by the user within a preset time period, wherein the second preset operation comprises one or more of the following operations: the method comprises the following steps of login operation, registration operation, newly-added published information, modification operation on the published information, refreshing operation on the published information and closing operation on the published information;
obtaining the category number and the occupation ratio of the published information of the user aiming at different scenes in the preset time period;
obtaining the proportion of the total number of the published information of the user in a preset time period to the total number of the published information outside the preset time period;
aggregating the behavior characteristics corresponding to the plurality of historical information to obtain the user behavior characteristics, including:
and determining, as the user behavior characteristics, the aggregation result obtained by aggregating the behavior characteristics corresponding to the plurality of pieces of historical information, together with at least one of the parameter value, the category number and occupation ratio, and the proportion.
5. The method of claim 1, further comprising:
analyzing the attribute parameter value of the user to obtain the attribute characteristic of the user;
determining whether the information to be identified is false information according to the information action characteristics and the user behavior characteristics, wherein the determining comprises the following steps:
and determining whether the information to be identified is false information or not according to the information action characteristic, the user behavior characteristic and the user attribute characteristic.
6. The method of claim 1, wherein determining whether the information to be identified is false information according to the information action characteristic and the user behavior characteristic comprises:
inputting the information action characteristics and the user behavior characteristics into a first false information identification model;
classifying the information action characteristics and the user behavior characteristics by using the first false information identification model with a decision tree algorithm;
obtaining a false degree score output by the first false identification model according to a classification result so as to determine whether the information to be identified is false information;
the first false information identification model is obtained by taking a plurality of information samples carrying labels as training samples and training a preset model based on a decision tree algorithm, wherein the label carried by each information sample represents whether the information is false information.
7. The method of claim 5, wherein determining whether the information to be identified is false information according to the information action feature, the user behavior feature and the user attribute feature comprises:
inputting the information action characteristics, the user behavior characteristics and the user attribute characteristics into a second false information identification model;
classifying the information action characteristics, the user behavior characteristics and the user attribute characteristics by using the second false information identification model by adopting a decision tree algorithm;
obtaining a false degree score output by the second false recognition model according to the classification result so as to determine whether the information to be recognized is false information;
the second false information identification model is obtained by taking a plurality of information samples as training samples and training a preset model based on a decision tree algorithm, wherein each information sample carries a label for representing whether the information is false information, and each information sample carries user attribute characteristics of a user who issues the sample information.
8. The method according to claim 1, wherein after determining whether the information to be identified is false information according to the information action characteristic and the user behavior characteristic, the method further comprises:
when the information to be identified is determined not to be false information and is not issued, issuing the information to be identified;
and when the information to be identified is determined to be false information and the information to be identified is issued, setting the access authority of the information to be identified as forbidden access.
9. The method of claim 8, further comprising:
after the access authority of the information to be identified is set to be access-prohibited, identifying whether new information sent by the user is false information or not when the new information is received again;
and after the information to be identified is issued, when new information sent by the user is received again, marking the new information as identification-free information and issuing the identification-free information.
10. An apparatus for identifying false information, the apparatus comprising:
the information acquisition module is used for acquiring information to be identified and a plurality of pieces of historical information issued by the user who issued the information to be identified;
the first behavior feature obtaining module is used for obtaining information action characteristics corresponding to the information to be identified according to multiple operations of the user on the information to be identified;
the second behavior feature obtaining module is used for obtaining user behavior characteristics according to multiple operations of the user on the plurality of pieces of issued historical information;
and the information determining module is used for determining whether the information to be identified is false information or not according to the information action characteristics and the user behavior characteristics.
11. The apparatus of claim 10, wherein the first behavior feature obtaining module comprises:
a record reading unit, configured to read an operation record of a first preset operation performed on the information to be identified by the user, where the first preset operation includes one or more of the following: modifying operation, refreshing operation and closing operation;
and the behavior characteristic obtaining unit is used for determining the frequency and the times of the first preset operation of the user on the information to be identified according to the operation record so as to obtain the information action characteristic.
12. The apparatus according to claim 10 or 11, wherein the second behavior feature obtaining module comprises:
the determining unit is used for determining the behavior characteristics corresponding to each piece of historical information in the plurality of pieces of historical information;
and the aggregation unit is used for aggregating the behavior characteristics corresponding to the plurality of pieces of historical information to obtain the user behavior characteristics.
13. The apparatus of claim 12, further comprising:
a parameter value obtaining module, configured to obtain a parameter value of a second preset operation performed by the user within a preset time period, where the second preset operation includes one or more of the following: a login operation, a registration operation, an operation of newly publishing information, a modification operation on published information, a refreshing operation on published information, and a closing operation on published information;
the first statistical module is used for obtaining the number of categories and the proportions of the information published by the user for different scenes within the preset time period;
the second statistical module is used for obtaining the ratio of the total number of pieces of information published by the user within the preset time period to the total number of pieces of information published outside the preset time period;
the aggregation unit is specifically configured to determine, as the user behavior feature, an aggregation result obtained by aggregating the behavior features corresponding to the plurality of pieces of historical information, together with at least one of the parameter value, the category number and proportion, and the ratio obtained by the foregoing modules.
14. The apparatus of claim 10, further comprising:
the attribute characteristic obtaining module is used for analyzing the attribute parameter value of the user to obtain the attribute characteristic of the user;
the information determining module is specifically configured to determine whether the information to be identified is false information according to the information action feature, the user behavior feature, and the user attribute feature.
15. The apparatus of claim 10, wherein the information determining module comprises:
the first input unit is used for inputting the information action characteristics and the user behavior characteristics into a first false information identification model;
the classification unit is used for classifying the information action characteristics and the user behavior characteristics by using the first false information identification model through a decision tree algorithm;
the result output unit is used for obtaining a false degree score output by the first false information identification model according to the classification result so as to determine whether the information to be identified is false information;
the first false information identification model is obtained by taking a plurality of information samples carrying labels as training samples and training a preset model based on a decision tree algorithm, wherein the label carried by each information sample represents whether the information is false information.
16. The apparatus of claim 14, wherein the information determining module comprises:
the second input unit is used for inputting the information action characteristics, the user behavior characteristics and the user attribute characteristics into a second false information identification model;
the second determining unit is used for classifying the information action characteristics, the user behavior characteristics and the user attribute characteristics by using the second false information identification model through a decision tree algorithm;
the result output unit is used for obtaining a false degree score output by the second false information identification model according to the classification result so as to determine whether the information to be identified is false information;
the second false information identification model is obtained by taking a plurality of information samples as training samples and training a preset model based on a decision tree algorithm, wherein each information sample carries a label for representing whether the information is false information, and each information sample carries user attribute characteristics of a user who issues the sample information.
17. The apparatus of claim 10, further comprising:
the information issuing module is used for issuing the information to be identified when the information to be identified is determined not to be false information and has not yet been issued;
and the access prohibition module is used for setting the access authority of the information to be identified to access-prohibited when the information to be identified is determined to be false information and has already been issued.
18. The apparatus of claim 17, further comprising:
the first identification triggering module is used for identifying whether the new information is false information or not when the new information sent by the user is received again after the access authority of the information to be identified is set to be access-prohibited;
and the second identification triggering module is used for marking the new information as identification-free information and issuing the identification-free information when the new information sent by the user is received again after the information to be identified is issued.
19. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for identifying false information according to any one of claims 1-9.
20. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for identifying false information according to any one of claims 1-9.
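As an illustration of the feature-extraction and handling steps recited in claims 8, 11 and 12 (and mirrored in the apparatus claims), the following is a minimal Python sketch. It is not the applicant's implementation: the names `Operation`, `TRACKED_OPS`, `information_action_features`, `user_behavior_features` and `handle_information` are hypothetical, the operation record is assumed to be a list of timestamped entries, frequency is assumed to mean count per second of an observation window, and the aggregation over historical information is assumed to be a simple mean. The decision-tree model that turns these features into a false degree score is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean

# Assumed labels for the "first preset operation" types (modify, refresh, close).
TRACKED_OPS = ("modify", "refresh", "close")


@dataclass
class Operation:
    kind: str         # operation type, e.g. "modify"
    timestamp: float  # seconds; unused here but typical of an operation record


def information_action_features(ops, window_seconds):
    """Count and frequency of each tracked operation on one piece of information."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = Counter(op.kind for op in ops if op.kind in TRACKED_OPS)
    features = {}
    for kind in TRACKED_OPS:
        features[f"{kind}_count"] = counts[kind]
        features[f"{kind}_freq"] = counts[kind] / window_seconds
    return features


def user_behavior_features(history_ops, window_seconds):
    """Aggregate per-item features over the user's historical information.

    The mean is an assumed aggregation; the claims do not fix a particular one.
    """
    per_item = [information_action_features(ops, window_seconds)
                for ops in history_ops]
    if not per_item:
        empty = information_action_features([], window_seconds)
        return {f"avg_{name}": 0.0 for name in empty}
    return {f"avg_{name}": mean(item[name] for item in per_item)
            for name in per_item[0]}


def handle_information(is_false, already_issued):
    """Handling rule of claim 8: issue, prohibit access, or leave unchanged."""
    if not is_false and not already_issued:
        return "issue"
    if is_false and already_issued:
        return "prohibit_access"
    return "no_change"
```

In this sketch, operations outside the tracked set (for example a login) are ignored when building the information action features, and the two feature dictionaries could be concatenated into a single input vector for whatever classifier is used.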
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010615617.XA CN111914645A (en) | 2020-06-30 | 2020-06-30 | Method and device for identifying false information, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010615617.XA CN111914645A (en) | 2020-06-30 | 2020-06-30 | Method and device for identifying false information, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111914645A (en) | 2020-11-10 |
Family
ID=73227010
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010615617.XA | Pending | CN111914645A (en) | 2020-06-30 | 2020-06-30 | Method and device for identifying false information, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111914645A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113902457A (en) * | 2021-11-19 | 2022-01-07 | 北京房江湖科技有限公司 | Method and device for evaluating reliability of house source information, electronic equipment and storage medium |
CN115409104A (en) * | 2022-08-25 | 2022-11-29 | 贝壳找房(北京)科技有限公司 | Method, apparatus, device, medium and program product for identifying object type |
CN115482014A (en) * | 2022-09-15 | 2022-12-16 | 广东数鼎科技有限公司 | Method and device for identifying false vehicle source of second-hand vehicle |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103064987A (en) * | 2013-01-31 | 2013-04-24 | 五八同城信息技术有限公司 | Bogus transaction information identification method |
CN103793484A (en) * | 2014-01-17 | 2014-05-14 | 五八同城信息技术有限公司 | Fraudulent conduct identification system based on machine learning in classified information website |
CN106952190A (en) * | 2017-03-22 | 2017-07-14 | 国信优易数据有限公司 | False source of houses typing Activity recognition and early warning system |
CN107437223A (en) * | 2017-08-17 | 2017-12-05 | 重庆小雨点小额贷款有限公司 | Credit information checking method, device and equipment |
CN108711013A (en) * | 2018-05-24 | 2018-10-26 | 深圳市买买提信息科技有限公司 | Abnormal behaviour determines method, apparatus, equipment and storage medium |
US20190014071A1 (en) * | 2016-10-13 | 2019-01-10 | Tencent Technology (Shenzhen) Company Limited | Network information identification method and apparatus |
US20190155851A1 (en) * | 2016-09-09 | 2019-05-23 | Tencent Technology (Shenzhen) Company Limited | Information filtering |
CN111104963A (en) * | 2019-11-22 | 2020-05-05 | 贝壳技术有限公司 | Target user determination method and device, storage medium and electronic equipment |
2020-06-30: CN application CN202010615617.XA (patent/CN111914645A/en), status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Pacheco et al. | Uncovering coordinated networks on social media: methods and case studies | |
CN106022834B (en) | Advertisement anti-cheating method and device | |
CN111914645A (en) | Method and device for identifying false information, electronic equipment and storage medium | |
CN109034583A (en) | Abnormal transaction identification method, apparatus and electronic equipment | |
CN112150014B (en) | Enterprise risk early warning method, device, equipment and readable storage medium | |
CN111522724B (en) | Method and device for determining abnormal account number, server and storage medium | |
CN103064987A (en) | Bogus transaction information identification method | |
CN104836781A (en) | Method distinguishing identities of access users, and device | |
CN110084468B (en) | Risk identification method and device | |
CN109828958A (en) | Event recording method and record system based on block chain | |
CN111401447A (en) | Artificial intelligence-based flow cheating identification method and device and electronic equipment | |
CN110457601B (en) | Social account identification method and device, storage medium and electronic device | |
CN104935578A (en) | Website malicious attack prevention method and system | |
CN112734161A (en) | Method, equipment and storage medium for accurately identifying empty-shell enterprises | |
CN111859234A (en) | Illegal content identification method and device, electronic equipment and storage medium | |
TW202105303A (en) | Fraud deduction system, fraud deduction method, and program | |
CN112561565A (en) | User demand identification method based on behavior log | |
CN110458401A (en) | Information processing unit, method and storage medium based on block chain | |
CN108737138B (en) | Service providing method and service platform | |
CN113420789B (en) | Method and device for predicting risk account number, storage medium and computer equipment | |
CN112511632B (en) | Object pushing method, device and equipment based on multi-source data and storage medium | |
CN112347457A (en) | Abnormal account detection method and device, computer equipment and storage medium | |
WO2021048902A1 (en) | Learning model application system, learning model application method, and program | |
CN111784360B (en) | Anti-fraud prediction method and system based on network link backtracking | |
CN106294406A (en) | A kind of method and apparatus accessing data for processing application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||