CN110674414A - Target information identification method, device, equipment and storage medium - Google Patents

Target information identification method, device, equipment and storage medium

Publication number
CN110674414A
Authority
CN
China
Prior art keywords
data
identified
information
dimension
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910891826.4A
Other languages
Chinese (zh)
Inventor
李建波
项亮
张予
宝腾飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201910891826.4A priority Critical patent/CN110674414A/en
Publication of CN110674414A publication Critical patent/CN110674414A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide a target information identification method, apparatus, device, and storage medium. The method includes: acquiring data to be identified that is uploaded by a terminal corresponding to a target user; performing data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension; and determining, based on a target information identification model, whether the data to be identified is target information according to the classification information of the data to be identified in each dimension. The target information identification model is obtained by training a first neural network model on the classification information of sample data uploaded by the terminal corresponding to each of a plurality of historical users, together with the identification result corresponding to that classification information, where the identification result indicates whether the sample data is target information. The embodiments address the problem in the prior art that whether an article is an abnormal target article cannot be identified in a timely and effective manner.

Description

Target information identification method, device, equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the field of identification technologies, and in particular to a target information identification method, apparatus, device, and storage medium.
Background
With the continuous development of internet technology, more and more internet service platforms have emerged. At present, in the services these platforms provide, users can publish custom content, and each platform can display the accounts of its users and the data they publish.
Users usually attract followers through published data such as titles and description information. As a result, many abnormal articles have appeared. These abnormal articles attract followers by repeating content published by other, original users, or by publishing content highly similar to it, which has an adverse effect.
In the prior art, abnormal articles are generally identified manually. Manual identification generally cannot determine in time whether an article is a target article (an abnormal article), which may harm the original user and allow harmful content to spread on the platform. The prior art therefore cannot identify in a timely and effective manner whether an article is an abnormal target article.
Disclosure of Invention
The embodiment of the disclosure provides a target information identification method, a target information identification device and a storage medium, so as to overcome the problem that whether an article is an abnormal target article cannot be timely and effectively identified in the prior art.
In a first aspect, an embodiment of the present disclosure provides a target information identification method, including:
acquiring data to be identified uploaded by a terminal corresponding to a target user;
performing data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension;
determining whether the data to be identified is target information according to classification information of the data to be identified on each dimension based on a target information identification model, wherein the target information identification model is obtained by training a first neural network model according to the obtained classification information of sample data uploaded by a terminal corresponding to each historical user in a plurality of historical users and an identification result corresponding to the classification information of the sample data, and the identification result is used for indicating whether the sample data is the target information.
In a second aspect, an embodiment of the present disclosure provides a target information identification apparatus, including:
the data acquisition module to be identified is used for acquiring data to be identified uploaded by a terminal corresponding to a target user;
the classification information determining module is used for carrying out data processing on the data to be identified on a plurality of dimensions to obtain classification information of the data to be identified on each dimension;
the identification module is used for determining whether the data to be identified is target information or not according to the classification information of the data to be identified on each dimension based on a target information identification model, the target information identification model is obtained by training a first neural network model according to the classification information of the sample data uploaded by a terminal corresponding to each historical user in a plurality of acquired historical users and the identification result corresponding to the classification information of the sample data, and the identification result is used for indicating whether the sample data is the target information or not.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method for identifying target information as set forth in the first aspect above and in various possible designs of the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the target information identification method according to the first aspect and various possible designs of the first aspect is implemented.
In the target information identification method, apparatus, device, and storage medium provided by the embodiments of the present disclosure, data to be identified that is uploaded by a terminal corresponding to a target user is first acquired. The data to be identified is then processed in multiple dimensions to obtain its classification information in each dimension, which classifies the data in that dimension. Finally, whether the data to be identified is target information is determined from the classification information in each dimension and the trained target information identification model. This realizes a machine-learning-based target information identification method that can effectively identify whether the data to be identified is abnormal. Because the classification information in each dimension is obtained first and then input into the target information identification model, whether the data to be identified is target information can be identified quickly, which saves time and resources.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an architecture of a target information identification system according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a target information identification method according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of a target information identification method according to yet another embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a target information identification method according to another embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of a target information identification method according to yet another embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of a target information identification method according to another embodiment of the present disclosure;
Fig. 7 is a schematic flowchart of a target information identification method according to another embodiment of the present disclosure;
Fig. 8 is a block diagram illustrating the structure of a target information identification apparatus according to an embodiment of the present disclosure;
Fig. 9 is a block diagram illustrating the structure of a target information identification apparatus according to still another embodiment of the present disclosure;
Fig. 10 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
At present, in the services provided by a platform, users can publish custom content, and they usually attract followers through published data such as titles and description information. This has given rise to many abnormal articles (the target articles include at least abnormal articles, and an abnormal article may be an article that carries a risk). Abnormal articles generally attract followers by repeating content published by other, original users, or by publishing content highly similar to it, which has an adverse effect. In the prior art, target articles are generally identified manually, but manual identification generally cannot determine in time whether an article is abnormal, which may harm the interests of the original user and allow harmful content to spread on the platform. The embodiments of the present disclosure provide a target information identification method to solve this problem.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the architecture of a target information identification system according to an embodiment of the present disclosure. The embodiment provides a target information identification system 10 that includes a server 102 and a client 101, or alternatively a terminal device and a client. The client or the terminal device may be a mobile terminal, a fixed terminal, or another electronic device, and the target information identification system 10 may be configured to carry out the target information identification process.
The embodiment of the present disclosure does not limit the type of the model, the algorithm of the model, the model identification algorithm, and the like. The client may be considered as a terminal corresponding to an account (a terminal corresponding to a user, for example, a terminal of a user who issues a statement for an article uploaded by the terminal corresponding to the user), and the client may also be considered as a terminal corresponding to a user who generates an association with the account.
In practical applications, the target article may be an abnormal article, for example an article that carries a risk, so the target information identification method can at least be applied to identifying risky articles. Applicable scenarios therefore include reviewing an article that a user wants to publish, reviewing an article that a user wants to promote, and the like. The client uploads data to be identified to a terminal device or a server. The terminal device or server acquires the data to be identified uploaded by the terminal corresponding to the target user and processes it in multiple dimensions to obtain its classification information in each dimension, which classifies the data in that dimension. Whether the data to be identified is target information is then determined from this classification information and the trained target information identification model. This realizes a machine-learning-based target information identification method that can effectively identify whether the data to be identified is abnormal. Once the data to be identified is determined to be target information, the terminal device or server can apply target processing to it, reducing its adverse effects on the platform and on other users.
In the embodiments of the present disclosure, the data to be identified is processed in multiple dimensions to obtain classification information in each dimension, and that classification information is then input into the target information identification model. Whether the data to be identified is target information can thus be identified quickly, which saves time and resources, and target processing can be applied to the data to reduce its adverse effects on the platform and on other users.
Referring to fig. 2 in conjunction with fig. 1, fig. 2 is a schematic flow chart of a target information identification method provided by the disclosed embodiment. The method of the embodiment of the present disclosure may be applied to a terminal device or a server, that is, the execution subject may be the terminal device or the server, which is not limited herein. The target information identification method comprises the following steps:
S101, acquiring data to be identified uploaded by a terminal corresponding to a target user.
In the embodiments of the present disclosure, the terminal corresponding to the target user is not limited and may be any terminal capable of uploading data, such as a mobile phone, a computer, or a tablet. The data to be identified may be data already published on a preset platform, or data that has not yet been published and is awaiting review by the preset platform. The target information identification method performs identification through real-time detection: the user who publishes or uploads data to the preset platform through a terminal at the current moment is taken as the target user, and the data published or uploaded at the current moment is taken as the target user's data to be identified.
The articles or comments uploaded at the current moment by the terminal corresponding to the target user are acquired for data processing. There may be one article or several. Because the data to be identified is detected in real time, it can be handled effectively. For simplicity, the detailed processing is described below for a single article.
S102, carrying out data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension.
In the embodiment of the disclosure, data processing is performed on the data to be identified uploaded by the terminal corresponding to the user through multiple dimensions, and the specific process of the data processing may be different in each dimension.
In one embodiment of the present disclosure, the plurality of dimensions include a keyword dimension, a title dimension, a sensitive field dimension, a description object dimension, a text sending time dimension, and a text sending frequency dimension; the data to be identified includes text information and a text sending time.
Specifically, the keyword dimension indicates whether the data contains high-risk words, which are then combined with IP words; the title dimension refers to the wording of the title; the sensitive field dimension indicates whether the data contains sensitive information; the description object dimension refers to the object being described; the text sending time dimension covers the time at which the text was sent; and the text sending frequency dimension covers the frequency at which text is sent.
Processing the data to be identified in each dimension yields its classification information in that dimension. The data processing method is not limited, provided that it can classify the data to be identified in each dimension.
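As an illustration of what the per-dimension classification information might look like in code, the following is a minimal sketch. The class name, the field names, and the assumption that each dimension yields a single score between 0.0 and 1.0 are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class ClassificationInfo:
    """Hypothetical container: one score per dimension named in the text."""
    keyword: float
    title: float
    sensitive_field: float
    description_object: float
    sending_time: float
    sending_frequency: float

    def as_vector(self) -> List[float]:
        """Return the scores in a fixed order, ready to be fed to a model."""
        return list(astuple(self))


# Made-up scores for a single article.
info = ClassificationInfo(0.9, 0.4, 0.0, 0.2, 0.7, 0.8)
vector = info.as_vector()
```

Keeping the dimensions in a fixed order matters because the identification model described in S103 consumes them as one input.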
S103, determining whether the data to be recognized is target information or not according to classification information of the data to be recognized on each dimension based on a target information recognition model, wherein the target information recognition model is obtained by training a first neural network model according to the classification information of sample data uploaded by a terminal corresponding to each historical user in a plurality of acquired historical users and a recognition result corresponding to the classification information of the sample data, and the recognition result is used for indicating whether the sample data is the target information or not.
In the embodiments of the present disclosure, the target information identification model is constructed as follows. Historical data uploaded by the terminal corresponding to each of a plurality of historical users is acquired together with the identification result corresponding to that historical data. The historical data, which serves as the sample data, is classified in each dimension to obtain its classification information, which may be a score from a binary classification model. The first neural network model is then trained on the classification information of the sample data and the corresponding identification results.
Specifically, the classification information corresponding to the data to be identified (which may be a score from a binary classification model) is input into the target information identification model, which outputs a predicted identification result. The result either states whether the data to be identified is target information, that is, whether it is abnormal or risky data, or gives the probability that the data is an abnormal or risky article.
In practical applications, the target information identification method is not limited to identifying risk information. A server or terminal device of a preset platform collects, in real time, the data to be identified uploaded by the terminal corresponding to a user; the data may be one article or several. A single article is processed as follows: the data to be identified is processed to obtain its classification information in multiple dimensions, and the target information identification model then produces the output, namely that the data to be identified is target information or is not target information (or the probability that it is target information). The target information identification model may be any neural network model, classification model, or the like.
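The construction process can be sketched as follows. The disclosure does not fix the architecture of the first neural network model, so this sketch assumes the simplest possible one, a single sigmoid unit trained by stochastic gradient descent on log loss. The function names, hyperparameters, and toy sample data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5, seed=0):
    """Fit a single sigmoid unit by stochastic gradient descent on log loss."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(len(samples[0]))]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_proba(model, x):
    """Return the estimated probability that x is target information."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy, made-up sample data: six per-dimension scores per historical article,
# and the identification result (1 = target information, 0 = not).
samples = [
    [0.9, 0.8, 0.7, 0.6, 0.9, 0.8],
    [0.8, 0.9, 0.6, 0.7, 0.8, 0.9],
    [0.1, 0.2, 0.0, 0.1, 0.2, 0.1],
    [0.2, 0.1, 0.1, 0.0, 0.1, 0.2],
]
labels = [1, 1, 0, 0]
model = train(samples, labels)
```

In a real deployment the sample data would come from reviewed historical uploads, and a deeper network or another classifier could replace the single unit without changing the surrounding flow.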
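The flow for a single article, from per-dimension processing (S102) to the model decision (S103), might be wired together as in the sketch below. The function `identify`, the two stand-in processors, the averaging "model", and the 0.5 threshold are hypothetical placeholders and not part of the disclosure; a trained network would take the place of the placeholder model.

```python
from typing import Callable, Dict, List

# Hypothetical type: a processor maps the uploaded data to one score.
DimensionProcessor = Callable[[dict], float]


def identify(data: dict,
             processors: Dict[str, DimensionProcessor],
             model: Callable[[List[float]], float],
             threshold: float = 0.5) -> dict:
    """Run S102 (per-dimension processing), then S103 (model decision)."""
    classification = {name: fn(data) for name, fn in processors.items()}
    probability = model(list(classification.values()))
    return {
        "classification": classification,
        "probability": probability,
        "is_target": probability >= threshold,
    }


# Stand-in processors, for illustration only.
processors = {
    "keyword": lambda d: 1.0 if "free followers" in d["text"].lower() else 0.0,
    "sending_frequency": lambda d: min(d["posts_last_hour"] / 10.0, 1.0),
}


def placeholder_model(scores: List[float]) -> float:
    """Placeholder for the trained model: just averages the scores."""
    return sum(scores) / len(scores)


result = identify(
    {"text": "Get FREE followers now", "posts_last_hour": 8},
    processors,
    placeholder_model,
)
```

Returning both the probability and the thresholded decision mirrors the two output forms mentioned in the text.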
In the embodiments of the present disclosure, data to be identified that is uploaded by a terminal corresponding to a target user is first acquired. The data is then processed in multiple dimensions to obtain its classification information in each dimension, which classifies the data in that dimension. Whether the data to be identified is target information is then determined from this classification information and the trained target information identification model. This realizes a machine-learning-based target information identification method that can effectively identify whether the data to be identified is abnormal. Because the classification information in each dimension is obtained first and then input into the target information identification model, whether the data to be identified is target information can be identified quickly, which saves time and resources.
In the embodiments of the present disclosure, S102 may be implemented in either of the following two ways.
The first way: referring to Fig. 3, Fig. 3 is a schematic flowchart of a target information identification method according to yet another embodiment of the present disclosure. This embodiment describes S102 in detail on the basis of the embodiments above, for example the embodiment described with reference to Fig. 2. Performing data processing on the data to be identified in the multiple dimensions to obtain the classification information of the data to be identified in each dimension includes:
S201, through a preset high-risk word list, when the data to be identified contains a preset keyword, obtaining classification information of the data to be identified in the keyword dimension;
S202, through a preset title word segmentation table, when the data to be identified contains a preset title, obtaining classification information of the data to be identified in the title dimension;
S203, through a preset sensitive field table, when the data to be identified contains a preset sensitive field, obtaining classification information of the data to be identified in the sensitive field dimension;
S204, identifying the description object in the data to be identified through a preset description object table to obtain classification information of the data to be identified in the description object dimension;
S205, querying a preset text sending time table to obtain classification information of the data to be identified in the text sending time dimension;
S206, querying a preset text sending frequency table to obtain classification information of the data to be identified in the text sending frequency dimension.
In the embodiments of the present disclosure, each dimension may be queried through a preset table. For example, the terminal device or the server stores a preset high-risk word list, a preset title word segmentation table, a preset sensitive field table, a preset description object table, a preset text sending time table, and a preset text sending frequency table. Each table stores a mapping between its preset entries and the corresponding classification information, which may be a score from a binary classification model. The preset high-risk word list stores a first mapping between keywords containing high-risk words and their classification information, from which the classification information of the data to be identified in the keyword dimension can be queried. The preset title word segmentation table stores a second mapping between categories of preset titles and their classification information, from which the classification information in the title dimension can be queried. The preset sensitive field table stores a third mapping between preset sensitive fields and their classification information, from which the classification information in the sensitive field dimension can be queried. The preset description object table stores a fourth mapping between preset description objects and their classification information, from which the classification information in the description object dimension can be queried. The preset text sending time table stores a fifth mapping between preset text sending times and their classification information, from which the classification information in the text sending time dimension can be queried. The preset text sending frequency table stores a sixth mapping between preset text sending frequencies and their classification information, from which the classification information in the text sending frequency dimension can be queried.
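The table lookups of the first way can be sketched as below for two of the six dimensions. The table contents, the scores, the choice of taking the highest matching score, and the bucketing of the text sending frequency by an upper bound are all assumptions made for illustration; the disclosure only states that each table maps preset entries to classification information.

```python
# Hypothetical preset high-risk word list: keyword -> score.
HIGH_RISK_WORDS = {"guaranteed profit": 0.9, "click here": 0.6}

# Hypothetical preset text sending frequency table:
# (upper bound in posts per day, score), in ascending order.
SENDING_FREQUENCY_TABLE = [(5, 0.1), (20, 0.5), (float("inf"), 0.9)]


def keyword_score(text, table=HIGH_RISK_WORDS):
    """Return the highest score of any preset keyword found in the text."""
    lowered = text.lower()
    return max(
        (score for word, score in table.items() if word in lowered),
        default=0.0,  # no preset keyword present
    )


def bucket_score(value, table):
    """Return the score of the first bucket whose upper bound covers value."""
    for upper_bound, score in table:
        if value <= upper_bound:
            return score
    raise ValueError("table must end with an unbounded bucket")


kw = keyword_score("Guaranteed profit, click here!")
freq = bucket_score(50, SENDING_FREQUENCY_TABLE)
```

The same two lookup shapes (exact or substring match, and range bucket) would cover the remaining tables under these assumptions.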
The second way: referring to Fig. 4, Fig. 4 is a schematic flowchart of a target information identification method according to another embodiment of the present disclosure. This embodiment describes S102 in detail on the basis of the embodiments above, for example the embodiment described with reference to Fig. 2. Performing data processing on the data to be identified in the multiple dimensions to obtain the classification information of the data to be identified in each dimension includes:
S301, extracting the title of the text information;
S302, inputting the title into a pre-trained word segmentation model to obtain feature information of the title;
S303, acquiring the text sending frequency of the target user;
S304, taking the text sending time and the text sending frequency as feature information of a text sending rule corresponding to the target user, wherein the feature information of the title and the feature information of the text sending rule form feature information of the data to be identified;
S305, performing data processing on the feature information of the data to be identified in the multiple dimensions to obtain classification information of the data to be identified in each dimension.
In the embodiment of the disclosure, the text information includes the content of the article and the title of the article. The title is first extracted from the text information by a recognition technique and then segmented, that is, the title is input into the pre-trained word segmentation model, and the segmentation result of the title is the feature information of the title. The historical data uploaded by the target user through the terminal is then obtained and counted to obtain the text sending frequency of the target user. The text sending time of the data to be identified corresponding to the target user is combined with the text sending frequency of the target user to obtain the feature information of the text sending rule corresponding to the target user. The feature information of the title and the feature information of the text sending rule together form the feature information of the data to be identified, that is, the total feature information of the data to be identified. Finally, the feature information of the data to be identified is subjected to data processing in each dimension to obtain the classification information of the data to be identified in each dimension.
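Steps S301 to S304 can be sketched in Python as follows. This is only an illustration under stated assumptions: the title is taken to be the first non-empty line, whitespace splitting stands in for the pre-trained word segmentation model, and the text sending frequency is taken to be uploads per day over the span of the historical data. None of these choices, nor any of the function names, come from the disclosure.

```python
from datetime import datetime
from typing import Dict, List


def extract_title(text_information: str) -> str:
    """Assumption: the title is the first non-empty line of the text information."""
    for line in text_information.splitlines():
        if line.strip():
            return line.strip()
    return ""


def segment_title(title: str) -> List[str]:
    """Stand-in for the pre-trained word segmentation model: split on whitespace."""
    return title.lower().split()


def sending_frequency(history: List[datetime]) -> float:
    """Uploads per day over the period covered by the historical uploads."""
    if len(history) < 2:
        return float(len(history))
    span_days = (max(history) - min(history)).total_seconds() / 86400.0
    # Treat any span shorter than one day as one day to avoid inflated rates.
    return len(history) / max(span_days, 1.0)


def build_features(text_information: str, sending_time: datetime,
                   history: List[datetime]) -> Dict[str, object]:
    """Combine the title features and the text sending rule features."""
    return {
        "title_tokens": segment_title(extract_title(text_information)),
        "sending_rule": {
            "sending_hour": sending_time.hour,
            "sending_frequency": sending_frequency(history),
        },
    }
```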
Specifically, how to perform data processing on feature information of data to be identified in each dimension to obtain classification information of the data to be identified in each dimension is shown in fig. 5, where fig. 5 is a schematic flow chart of a target information identification method according to still another embodiment of the present disclosure, and S305 is described in detail in the embodiment of the present disclosure on the basis of the above-described disclosed embodiment, for example, on the basis of the disclosed embodiment described in fig. 4. The data processing is carried out on the characteristic information of the data to be identified in the multiple dimensions to obtain the classification information of the data to be identified in each dimension, and the method comprises the following steps:
S401, acquiring feature information of a plurality of historical data and classification information of the feature information of each historical data on each dimension;
S402, taking the feature information of the plurality of historical data and the classification information of the feature information of each historical data on each dimension as a training sample of a second neural network model, and training the second neural network model to obtain a multi-classification model;
S403, inputting the characteristic information of the data to be identified into the multi-classification model to obtain classification information of the target user on each dimension, wherein the classification information is a classification model score.
In the embodiment of the present disclosure, the classification information of the target user in each dimension is obtained by a trained multi-classification model, which is built as follows. First, a plurality of historical data are acquired, where the historical data may be articles published or unpublished on a preset platform by a plurality of different users, together with the classification information of the articles in each dimension. From the historical data, the feature information of the plurality of historical data is obtained through the data processing process described above, namely a high-risk feature score indicating whether the article contains high-risk words, a title feature score indicating whether the article contains a preset title, a sensitive field feature score indicating whether the article contains sensitive words, a description object feature score indicating whether the article contains a description object, a time feature score of the text sending time, and a frequency feature score of the text sending frequency. Then, the feature information of the plurality of historical data and the classification information of the feature information of each historical data on each dimension are taken as training samples of a second neural network model and input into the second neural network model for training, and the model outputs the predicted classification information of the feature information of each historical data on each dimension. When the difference between the predicted classification information and the input classification information of the feature information of each historical data on each dimension becomes stable, the trained second neural network model is taken as the multi-classification model.
Furthermore, the feature information of the data to be identified is used as an input quantity in the multi-classification model, and the output quantity is the predicted classification information of the target user on each dimension, wherein the classification information of the target user on each dimension can be a classification model score.
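The training and prediction flow of S401 to S403 can be sketched as below. The disclosure does not specify the architecture of the second neural network model, so this sketch substitutes a single layer with one sigmoid output per dimension, trained by full-batch gradient descent on cross-entropy loss. The function names, learning rate, and epoch count are assumptions for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Return one score in (0, 1) per dimension for a feature vector."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def loss(weights: Matrix, bias: List[float], xs: Matrix, ys: Matrix) -> float:
    """Mean cross-entropy between predicted and labelled per-dimension scores."""
    total = 0.0
    for x, y in zip(xs, ys):
        for p, t in zip(predict(weights, bias, x), y):
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(xs)


def train(xs: Matrix, ys: Matrix, epochs: int = 200,
          lr: float = 0.5) -> Tuple[Matrix, List[float]]:
    """Fit the stand-in model on historical features and per-dimension labels."""
    n_in, n_out = len(xs[0]), len(ys[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        grad_w = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, y in zip(xs, ys):
            for k, (p, t) in enumerate(zip(predict(weights, bias, x), y)):
                err = p - t  # gradient of cross-entropy w.r.t. the pre-activation
                grad_b[k] += err
                for j, v in enumerate(x):
                    grad_w[k][j] += err * v
        for k in range(n_out):
            bias[k] -= lr * grad_b[k] / len(xs)
            for j in range(n_in):
                weights[k][j] -= lr * grad_w[k][j] / len(xs)
    return weights, bias
```

Training would stop in practice once the loss stabilizes; a fixed epoch count is used here only to keep the sketch short.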
After the classification information of the data to be identified in each dimension is obtained, the classification information needs to be identified, and then it is determined whether the data to be identified is the target information, see fig. 6, where fig. 6 is a schematic flow diagram of a target information identification method provided by another embodiment of the present disclosure, and the embodiment of the present disclosure details S103 on the basis of the above-described disclosed embodiment, for example, on the basis of any one of the disclosed embodiments described in fig. 2 to 5. The identifying model based on the target information and determining whether the data to be identified is the target information according to the classification information of the data to be identified on each dimension comprises the following steps:
S501, obtaining target characteristic value information corresponding to the data to be identified of the target user according to the classification information of the target user in each dimension, wherein the target characteristic information is used for representing the classification information of the target user after the classification information of the target user in each dimension is fused;
S502, inputting the target characteristic information into the target information identification model to obtain the probability that the data to be identified is the target information;
S503, if the probability that the data to be identified is the target information is larger than the preset target probability, determining that the data to be identified is the target information.
In the embodiment of the disclosure, weighted fusion is performed on the classification information of the target user in each dimension, that is, the classification information of the data to be identified in the keyword dimension, the title dimension, the sensitive field dimension, the description object dimension, the text sending time dimension, and the text sending frequency dimension. For example, if the weights of the classification information in these six dimensions are 0.2, 0.3, 0.2, 0.1, 0.1, and 0.1 respectively, the target characteristic value information corresponding to the data to be identified of the target user is: 0.2 × classification information of the data to be identified in the keyword dimension + 0.3 × classification information of the data to be identified in the title dimension + 0.2 × classification information of the data to be identified in the sensitive field dimension + 0.1 × classification information of the data to be identified in the description object dimension + 0.1 × classification information of the data to be identified in the text sending time dimension + 0.1 × classification information of the data to be identified in the text sending frequency dimension.
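The weighted fusion above can be written as a short Python sketch. The weights are the example values from this embodiment; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
from typing import Dict

# Example weights from the embodiment; the key names are hypothetical labels.
WEIGHTS: Dict[str, float] = {
    "keyword": 0.2,
    "title": 0.3,
    "sensitive_field": 0.2,
    "description_object": 0.1,
    "sending_time": 0.1,
    "sending_frequency": 0.1,
}


def fuse(scores: Dict[str, float]) -> float:
    """Weighted sum of the per-dimension classification scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
```

Because the example weights sum to 1, the fused value stays in the same range as the per-dimension scores.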
And then inputting the target characteristic information into a target information identification model to obtain the probability that the data to be identified is target information, and determining whether the data to be identified is the target information or non-target information according to the probability of the target information and a preset target probability.
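The decision in S502 and S503 can be sketched as follows. The trained target information identification model is represented by an arbitrary callable, since its structure is not given here; the function name and parameters are assumptions. The comparison is strict, matching "larger than the preset target probability".

```python
from typing import Callable


def is_target_information(target_feature: float,
                          model: Callable[[float], float],
                          preset_target_probability: float) -> bool:
    """Return True when the model's probability exceeds the preset threshold."""
    probability = model(target_feature)
    if not 0.0 <= probability <= 1.0:
        raise ValueError("model output must be a probability in [0, 1]")
    return probability > preset_target_probability
```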
After determining that the data to be recognized is the target information, the data to be recognized may be further subjected to target processing to ensure that data of other users or a preset platform is complete, see fig. 7, where fig. 7 is a schematic flow diagram of a target information recognition method according to another embodiment of the present disclosure, and the embodiment of the present disclosure describes the target information recognition method in detail on the basis of the above-described disclosed embodiment, for example, on the basis of the disclosed embodiment described in fig. 6. After the determining that the data to be identified is the target information, the method further comprises:
S601, inquiring a processing priority corresponding to the data to be identified through a preset processing priority table according to the probability of the target information corresponding to the data to be identified, wherein the preset processing priority table stores a mapping relation between the probability of the preset target information and the processing priority corresponding to the probability of the preset target information.
In the embodiment of the disclosure, different processing priorities are set for the data to be identified according to the probability output by the target information identification model. In practical applications, a plurality of pieces of data to be identified may be detected as target information at the same time, and since processing the target information takes time, a processing priority may be assigned to each piece of data to be identified. For example, a higher processing priority may be set for data to be identified with a higher output probability, so that such articles are audited first to prevent the target articles from spreading. The processing priority of the data to be identified may be obtained as follows: the terminal device or the server stores a preset processing priority table, which stores a mapping relation between the probability of preset target information and the processing priority corresponding to that probability, and the processing priority corresponding to the data to be identified is obtained by querying the preset processing priority table according to this mapping relation. The preset processing priority may include a high level, a medium level, and a low level.
S602, determining a processing mode of the data to be identified according to the processing priority corresponding to the data to be identified; the processing mode comprises banning the account of the target user, or prohibiting the publishing, forwarding, or sharing of the data to be identified of the target user.
In the embodiment of the present disclosure, after the processing priority corresponding to the data to be identified is obtained, the processing time and the processing mode of the data to be identified may be selected according to the preset processing priority. If the processing priority corresponding to the data to be identified is high, the account of the target user needs to be banned immediately. If the processing priority is medium, publishing of the data to be identified of the target user needs to be prohibited as soon as the high-level data to be identified has been processed. If the processing priority is low, forwarding or sharing of the data to be identified of the target user needs to be prohibited after the high-level and medium-level data to be identified have been processed. This reduces the impact of the target information on the interests of original users and the adverse effect of its propagation on the platform.
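Steps S601 and S602 can be sketched together in Python. The probability boundaries of 0.9 and 0.7 in the preset processing priority table are invented for illustration, as are the mode labels and function names; only the three levels and their associated processing modes come from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical preset processing priority table: (lower bound, priority).
PRIORITY_TABLE: List[Tuple[float, str]] = [(0.9, "high"), (0.7, "medium"), (0.0, "low")]

# Processing mode per priority, following the description above.
PROCESSING_MODE: Dict[str, str] = {
    "high": "ban_account",
    "medium": "prohibit_publishing",
    "low": "prohibit_forwarding_or_sharing",
}

PRIORITY_ORDER: Dict[str, int] = {"high": 0, "medium": 1, "low": 2}


def processing_priority(probability: float) -> str:
    """Look up the processing priority for a target information probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    for lower_bound, priority in PRIORITY_TABLE:
        if probability >= lower_bound:
            return priority
    raise AssertionError("unreachable: the table covers [0, 1]")


def schedule(items: List[Tuple[str, float]]) -> List[Tuple[str, str, str]]:
    """Order flagged items high, medium, low and attach the processing mode."""
    ranked = sorted(items, key=lambda item: PRIORITY_ORDER[processing_priority(item[1])])
    result = []
    for item_id, probability in ranked:
        priority = processing_priority(probability)
        result.append((item_id, priority, PROCESSING_MODE[priority]))
    return result
```

Because Python's `sorted` is stable, items with the same priority keep their arrival order.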
The target information identification method provided by the embodiment of the disclosure obtains the data to be identified uploaded by the terminal corresponding to the target user, and performs data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension, so that the data to be identified is classified in each dimension. Whether the data to be identified is the target information can then be identified quickly according to the classification information of the data to be identified in each dimension and the target information identification model obtained through training. After the target information is identified, it can be processed in time, which ensures the completeness of the data of other users or of the preset platform and further reduces the impact of the target information on the interests of original users and the adverse effect of its propagation on the platform.
Fig. 8 is a block diagram of a target information recognition apparatus according to an embodiment of the present disclosure, corresponding to the target information recognition method of the embodiments disclosed above. For ease of illustration, only the portions relevant to the embodiments of the present disclosure are shown. Referring to fig. 8, the target information recognition apparatus 80 includes: a to-be-identified data acquisition module 801, a classification information determination module 802, and an identification module 803. The to-be-identified data acquisition module 801 is configured to acquire the data to be identified uploaded by the terminal corresponding to the target user; the classification information determination module 802 is configured to perform data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension; and the identification module 803 is configured to determine, based on the target information identification model obtained through training, whether the data to be identified is target information according to the classification information of the data to be identified in each dimension.
The apparatus provided in the embodiment of the present disclosure may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again in the embodiment of the present disclosure.
In an embodiment of the present disclosure, on the basis of the above-described disclosed embodiment, for example, on the basis of the embodiment of fig. 8, the embodiment of the present disclosure specifies a plurality of dimensions and data to be identified. The multiple dimensions comprise a keyword dimension, a title dimension, a sensitive field dimension, a description object dimension, a text sending time dimension and a text sending frequency dimension, and the data to be identified comprises text information and text sending time.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 8, the embodiment of the present disclosure describes the classification information determining module 802 in detail. The classification information determining module 802 is specifically configured to:
through a preset high-risk word list, when the data to be recognized contain preset keywords, obtaining classification information of the data to be recognized on the keyword dimension; through a preset title word segmentation table, when the data to be identified contains a preset title, obtaining the classification information of the data to be identified on the title dimension; through a preset sensitive field table, when the data to be recognized contain a preset sensitive field, obtaining classification information of the data to be recognized on the sensitive field dimension; identifying the description object in the data to be identified through a preset description object table to obtain the classification information of the data to be identified on the dimension of the description object; inquiring a preset text sending time table to obtain classification information of the data to be identified on the dimension of the text sending time; and inquiring a preset text sending frequency table to obtain the classification information of the data to be identified on the text sending frequency dimension.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 8, the embodiment of the present disclosure describes the classification information determining module 802 in detail. The classification information determining module 802 includes: a title extraction unit, a first feature information determination unit, a text sending frequency acquisition module, a second feature information determination unit, and a classification information determination unit. The title extraction unit is used for extracting the title of the text information; the first feature information determination unit is used for inputting the title into a pre-trained word segmentation model to obtain the feature information of the title; the text sending frequency acquisition module is used for acquiring the text sending frequency of the target user; the second feature information determination unit is used for taking the text sending time and the text sending frequency as feature information of a text sending rule corresponding to the target user, where the feature information of the title and the feature information of the text sending rule form the feature information of the data to be identified; and the classification information determination unit is used for performing data processing on the feature information of the data to be identified in the plurality of dimensions to obtain the classification information of the data to be identified in each dimension.
In an embodiment of the present disclosure, on the basis of the above-mentioned disclosed embodiment, for example, on the basis of the embodiment of fig. 8, the embodiment of the present disclosure describes the classification information determining unit in detail. The classification information determining unit is specifically configured to: acquiring characteristic information of a plurality of historical data and classification information of the characteristic information of each historical data on each dimension; taking the feature information of the plurality of historical data and the classification information of the feature information of each historical data on each dimension as a training sample of a second neural network model, and training the second neural network model to obtain a multi-classification model; and inputting the characteristic information of the data to be identified into the multi-classification model to obtain classification information of the target user on each dimension, wherein the classification information is a classification model score.
In an embodiment of the present disclosure, the identification module 803 is described in detail in the embodiment of the present disclosure on the basis of the above disclosed embodiment, for example, on the basis of the disclosed embodiment shown in fig. 8 or corresponding to any one of the above target information identification apparatuses. The identifying module 803 is specifically configured to:
obtaining target characteristic value information corresponding to the data to be identified of the target user according to the classification information of the target user in each dimension, wherein the target characteristic information is used for representing the classification information of the target user after the fusion of the classification information of the target user in each dimension; inputting the target characteristic information into the target information identification model to obtain the probability that the data to be identified is target information; and if the probability that the data to be identified is the target information is greater than the preset target probability, determining that the data to be identified is the target information.
Referring to fig. 9, fig. 9 is a block diagram illustrating a structure of the target information recognition apparatus according to still another embodiment of the present disclosure. The embodiment of the present disclosure describes the target information recognition apparatus in detail on the basis of the above-described disclosed embodiment. The apparatus further comprises a processing module 804. The processing module 804 is configured to: query, according to the probability of target information corresponding to the data to be identified, a processing priority corresponding to the data to be identified through a preset processing priority table, where the preset processing priority table stores a mapping relationship between the probability of preset target information and the processing priority corresponding to the probability of preset target information; and determine a processing mode of the data to be identified according to the processing priority corresponding to the data to be identified. The processing mode comprises banning the account of the target user, or prohibiting the publishing, forwarding, or sharing of the data to be identified of the target user.
Referring to fig. 10, a schematic structural diagram of an electronic device 1000 suitable for implementing the embodiment of the present disclosure is shown, where the electronic device 1000 may be a terminal device or a server. Among them, the terminal Device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a Digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), a car terminal (e.g., car navigation terminal), etc., and a fixed terminal such as a Digital TV, a desktop computer, etc. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various suitable actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage device 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the electronic device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 10 illustrates an electronic device 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the disclosed embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, an embodiment of the present disclosure provides a target information identification method, including:
acquiring data to be identified uploaded by a terminal corresponding to a target user;
performing data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension;
determining whether the data to be identified is target information according to classification information of the data to be identified on each dimension based on a target information identification model, wherein the target information identification model is obtained by training a first neural network model according to the obtained classification information of sample data uploaded by a terminal corresponding to each historical user in a plurality of historical users and an identification result corresponding to the classification information of the sample data, and the identification result is used for indicating whether the sample data is the target information.
According to one or more embodiments of the present disclosure, the multiple dimensions include a keyword dimension, a title dimension, a sensitive field dimension, a description object dimension, a text sending time dimension, and a text sending frequency dimension, and the data to be identified includes text information and a text sending time.
According to one or more embodiments of the present disclosure, the performing data processing on the data to be identified in the multiple dimensions to obtain classification information of the data to be identified in each dimension includes:
when the data to be identified contains a preset keyword, obtaining, through a preset high-risk word list, classification information of the data to be identified in the keyword dimension;
when the data to be identified contains a preset title, obtaining, through a preset title word segmentation table, classification information of the data to be identified in the title dimension;
when the data to be identified contains a preset sensitive field, obtaining, through a preset sensitive field table, classification information of the data to be identified in the sensitive field dimension;
identifying the description object in the data to be identified through a preset description object table to obtain classification information of the data to be identified in the description object dimension;
querying a preset text sending time table to obtain classification information of the data to be identified in the text sending time dimension;
and querying a preset text sending frequency table to obtain classification information of the data to be identified in the text sending frequency dimension.
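The table-driven classification above can be illustrated with a minimal sketch. The table contents, the function name `classify_by_tables`, and the label values are all hypothetical; the disclosure does not specify what the preset tables contain or how classification information is encoded.

```python
from datetime import datetime

# Hypothetical preset tables; their contents are assumptions for illustration only.
HIGH_RISK_WORDS = {"free money", "guaranteed win"}
TITLE_SEGMENTS = {"shocking", "must see"}
SENSITIVE_FIELDS = {"finance", "medicine"}
DESCRIPTION_OBJECTS = {"bank": "institution", "doctor": "person"}
# (start_hour, end_hour, label) rows of an assumed text sending time table.
SEND_TIME_TABLE = [(0, 6, "night"), (6, 18, "day"), (18, 24, "evening")]
# (minimum posts per day, label) rows of an assumed frequency table, ascending.
SEND_FREQ_TABLE = [(0, "low"), (10, "medium"), (50, "high")]


def classify_by_tables(title, body, sent_at, posts_per_day):
    """Return one classification entry per dimension using table lookups."""
    text = f"{title} {body}".lower()
    hour = sent_at.hour
    # First row whose hour range contains the sending hour.
    time_label = next(label for lo, hi, label in SEND_TIME_TABLE if lo <= hour < hi)
    # Last row whose minimum is not above the observed frequency
    # (assumes posts_per_day is non-negative).
    freq_label = [label for minimum, label in SEND_FREQ_TABLE
                  if posts_per_day >= minimum][-1]
    return {
        "keyword": any(word in text for word in HIGH_RISK_WORDS),
        "title": any(seg in title.lower() for seg in TITLE_SEGMENTS),
        "sensitive_field": any(field in text for field in SENSITIVE_FIELDS),
        "description_object": sorted(
            {kind for word, kind in DESCRIPTION_OBJECTS.items() if word in text}
        ),
        "send_time": time_label,
        "send_frequency": freq_label,
    }
```

Substring matching is used here only to keep the sketch short; a real system would more plausibly match on segmented words.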
According to one or more embodiments of the present disclosure, the performing data processing on the data to be recognized in multiple dimensions to obtain classification information of the data to be recognized in each dimension includes:
extracting a title of the text information;
inputting the title into a pre-trained word segmentation model to obtain feature information of the title;
acquiring the text sending frequency of the target user;
taking the text sending time and the text sending frequency as feature information of a text sending rule corresponding to the target user, wherein the feature information of the title and the feature information of the text sending rule form the feature information of the data to be identified;
and performing data processing on the feature information of the data to be identified in the multiple dimensions to obtain classification information of the data to be identified in each dimension.
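The feature extraction steps above might look as follows in a minimal sketch. Several points are assumptions not stated in the disclosure: the title is taken to be the first non-empty line of the text, a regular-expression tokenizer stands in for the pre-trained word segmentation model, and the text sending frequency is defined as the number of posts in the 24 hours up to the sending time.

```python
import re
from datetime import datetime, timedelta


def extract_title(text_info):
    """Assumption: the title is the first non-empty line of the text."""
    for line in text_info.splitlines():
        if line.strip():
            return line.strip()
    return ""


def segment_title(title):
    """Stand-in for the pre-trained word segmentation model: lowercase word tokens."""
    return re.findall(r"\w+", title.lower())


def sending_frequency(history, now, window=timedelta(hours=24)):
    """Count the user's posts in the window ending at `now` (assumed definition)."""
    return sum(1 for sent in history if now - window <= sent <= now)


def build_features(text_info, sent_at, history):
    """Combine title features and sending-rule features into one record."""
    title = extract_title(text_info)
    return {
        "title_tokens": segment_title(title),
        "sending_rule": {
            "hour": sent_at.hour,
            "frequency": sending_frequency(history, sent_at),
        },
    }
```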
According to one or more embodiments of the present disclosure, the performing data processing on the feature information of the data to be identified in the multiple dimensions to obtain classification information of the data to be identified in each dimension includes:
acquiring feature information of a plurality of pieces of historical data and classification information of the feature information of each piece of historical data in each dimension;
taking the feature information of the plurality of pieces of historical data and the classification information of the feature information of each piece of historical data in each dimension as training samples of a second neural network model, and training the second neural network model to obtain a multi-classification model;
and inputting the feature information of the data to be identified into the multi-classification model to obtain classification information of the target user in each dimension, wherein the classification information is a classification model score.
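As a rough illustration of training a model that outputs one score per dimension, the sketch below uses a single-layer network with a sigmoid output for each dimension, trained by stochastic gradient descent in plain Python. This is a deliberately small stand-in: the disclosure does not describe the architecture of the second neural network, and the class name, hyperparameters, and toy data are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiDimensionClassifier:
    """Single-layer network with one sigmoid score per dimension (a stand-in)."""

    def __init__(self, n_features, n_dimensions, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_dimensions)]
        self.b = [0.0] * n_dimensions

    def scores(self, x):
        """Return the classification model score for each dimension."""
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bias)
                for row, bias in zip(self.w, self.b)]

    def fit(self, samples, labels, epochs=500, lr=0.5):
        """Train on historical feature vectors and their per-dimension labels."""
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                for d, (p, target) in enumerate(zip(self.scores(x), y)):
                    grad = p - target  # cross-entropy gradient w.r.t. the pre-activation
                    self.w[d] = [wi - lr * grad * xi for wi, xi in zip(self.w[d], x)]
                    self.b[d] -= lr * grad


# Toy historical data: features are [risky title token present, high frequency],
# labels are per-dimension classes for [title dimension, frequency dimension].
samples = [[1, 0], [0, 1], [1, 1], [0, 0]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
model = MultiDimensionClassifier(n_features=2, n_dimensions=2)
model.fit(samples, labels)
```

Calling `model.scores([1, 0])` then yields a high score for the first dimension and a low score for the second.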
According to one or more embodiments of the present disclosure, the determining, based on a target information identification model and according to classification information of the data to be identified in each dimension, whether the data to be identified is target information includes:
obtaining target feature information corresponding to the data to be identified of the target user according to the classification information of the target user in each dimension, wherein the target feature information represents the result of fusing the classification information of the target user across the dimensions;
inputting the target feature information into the target information identification model to obtain the probability that the data to be identified is target information;
and if the probability that the data to be identified is target information is greater than a preset target probability, determining that the data to be identified is target information.
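The fusion and thresholding steps can be sketched as below. Fusion is assumed here to be a fixed-order concatenation of the per-dimension scores, and a logistic function with caller-supplied weights stands in for the trained target information identification model; neither choice is specified by the disclosure, and the default threshold of 0.5 is only a placeholder.

```python
import math

DIMENSIONS = [
    "keyword", "title", "sensitive_field",
    "description_object", "send_time", "send_frequency",
]


def fuse(classification):
    """Fuse per-dimension scores into one vector (assumed: fixed-order concatenation)."""
    return [float(classification[d]) for d in DIMENSIONS]


def identification_model(features, weights, bias):
    """Stand-in for the trained model: logistic function of a weighted sum."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_target_information(classification, weights, bias, preset_probability=0.5):
    """Return (decision, probability); the decision is strict 'greater than'."""
    probability = identification_model(fuse(classification), weights, bias)
    return probability > preset_probability, probability
```

Note that a probability exactly equal to the preset value does not count as target information, matching the "greater than" wording of the step.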
According to one or more embodiments of the present disclosure, after the determining that the data to be identified is the target information, the method further includes:
querying, according to the probability that the data to be identified is target information, a preset processing priority table for the processing priority corresponding to the data to be identified, wherein the preset processing priority table stores a mapping between preset target-information probabilities and their corresponding processing priorities;
determining a processing mode for the data to be identified according to the processing priority corresponding to the data to be identified;
wherein the processing mode comprises banning the account of the target user, or prohibiting the publishing, forwarding, or sharing of the data to be identified of the target user.
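The priority lookup could be sketched as a range table searched with `bisect`. The probability bounds, priority names, and the assignment of processing modes to priorities are all hypothetical; the disclosure only states that a preset table maps probabilities to priorities and that the priority determines the processing mode.

```python
import bisect

# Hypothetical priority table: ascending probability lower bounds and their priorities.
PROBABILITY_BOUNDS = [0.5, 0.7, 0.9]
PRIORITIES = ["low", "medium", "high"]
# Hypothetical mapping from processing priority to processing mode.
PROCESSING_MODES = {
    "low": "prohibit_sharing",
    "medium": "prohibit_publishing",
    "high": "ban_account",
}


def processing_priority(probability):
    """Find the row whose lower bound is the largest one not above `probability`."""
    index = bisect.bisect_right(PROBABILITY_BOUNDS, probability) - 1
    if index < 0:
        raise ValueError("probability is below the lowest bound in the table")
    return PRIORITIES[index]


def processing_mode(probability):
    """Map a target-information probability to a processing mode."""
    return PROCESSING_MODES[processing_priority(probability)]
```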
In a second aspect, an embodiment of the present disclosure provides a target information identification apparatus, including:
a to-be-identified data acquisition module, configured to acquire data to be identified uploaded by a terminal corresponding to a target user;
a classification information determining module, configured to perform data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension;
and an identification module, configured to determine, based on a target information identification model, whether the data to be identified is target information according to the classification information of the data to be identified in each dimension, wherein the target information identification model is obtained by training a first neural network model according to acquired classification information of sample data uploaded by a terminal corresponding to each historical user in a plurality of historical users and an identification result corresponding to the classification information of the sample data, and the identification result indicates whether the sample data is target information.
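One way to picture how the three modules fit together is a small composition in which each module is a callable. The class name `TargetInformationIdentifier` and the callable signatures are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TargetInformationIdentifier:
    # Each field plays the role of one module of the apparatus.
    acquire: Callable[[str], dict]                 # to-be-identified data acquisition module
    classify: Callable[[dict], Dict[str, float]]   # classification information determining module
    identify: Callable[[Dict[str, float]], bool]   # identification module

    def run(self, user_id: str) -> bool:
        """Acquire the data, classify it per dimension, then decide."""
        data = self.acquire(user_id)
        classification = self.classify(data)
        return self.identify(classification)
```

Keeping the modules as interchangeable callables reflects the statement that units may be implemented in software or hardware and that their names do not limit them.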
According to one or more embodiments of the present disclosure, the multiple dimensions include a keyword dimension, a title dimension, a sensitive field dimension, a description object dimension, a text sending time dimension, and a text sending frequency dimension, and the data to be identified includes text information and a text sending time.
According to one or more embodiments of the present disclosure, the classification information determining module is specifically configured to:
when the data to be identified contains a preset keyword, obtain, through a preset high-risk word list, classification information of the data to be identified in the keyword dimension;
when the data to be identified contains a preset title, obtain, through a preset title word segmentation table, classification information of the data to be identified in the title dimension;
when the data to be identified contains a preset sensitive field, obtain, through a preset sensitive field table, classification information of the data to be identified in the sensitive field dimension;
identify the description object in the data to be identified through a preset description object table to obtain classification information of the data to be identified in the description object dimension;
query a preset text sending time table to obtain classification information of the data to be identified in the text sending time dimension;
and query a preset text sending frequency table to obtain classification information of the data to be identified in the text sending frequency dimension.
According to one or more embodiments of the present disclosure, the data to be identified includes text information and a text sending time;
the classification information determination module includes:
a title extracting unit, configured to extract a title of the text information;
a first feature information determining unit, configured to input the title into a pre-trained word segmentation model to obtain feature information of the title;
a text sending frequency acquisition unit, configured to acquire the text sending frequency of the target user;
a second feature information determining unit, configured to use the text sending time and the text sending frequency as feature information of a text sending rule corresponding to the target user, wherein the feature information of the title and the feature information of the text sending rule form the feature information of the data to be identified;
and a classification information determining unit, configured to perform data processing on the feature information of the data to be identified in the multiple dimensions to obtain the classification information of the data to be identified in each dimension.
According to one or more embodiments of the present disclosure, the classification information determining unit is specifically configured to:
acquire feature information of a plurality of pieces of historical data and classification information of the feature information of each piece of historical data in each dimension;
take the feature information of the plurality of pieces of historical data and the classification information of the feature information of each piece of historical data in each dimension as training samples of a second neural network model, and train the second neural network model to obtain a multi-classification model;
and input the feature information of the data to be identified into the multi-classification model to obtain classification information of the target user in each dimension, wherein the classification information is a classification model score.
According to one or more embodiments of the present disclosure, the identification module is specifically configured to:
obtain target feature information corresponding to the data to be identified of the target user according to the classification information of the target user in each dimension, wherein the target feature information represents the result of fusing the classification information of the target user across the dimensions;
input the target feature information into the target information identification model to obtain the probability that the data to be identified is target information;
and if the probability that the data to be identified is target information is greater than a preset target probability, determine that the data to be identified is target information.
According to one or more embodiments of the present disclosure, the apparatus further comprises: a processing module;
the processing module is configured to query, according to the probability that the data to be identified is target information, a preset processing priority table for the processing priority corresponding to the data to be identified, wherein the preset processing priority table stores a mapping between preset target-information probabilities and their corresponding processing priorities;
and to determine a processing mode for the data to be identified according to the processing priority corresponding to the data to be identified;
wherein the processing mode comprises banning the account of the target user, or prohibiting the publishing, forwarding, or sharing of the data to be identified of the target user.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method for identifying target information as set forth in the first aspect above and in various possible designs of the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the target information identification method according to the first aspect and various possible designs of the first aspect is implemented.
The foregoing description is merely a description of the preferred embodiments of the disclosure and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features of similar function disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A target information identification method, comprising:
acquiring data to be identified uploaded by a terminal corresponding to a target user;
performing data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension;
determining whether the data to be identified is target information according to classification information of the data to be identified on each dimension based on a target information identification model, wherein the target information identification model is obtained by training a first neural network model according to the obtained classification information of sample data uploaded by a terminal corresponding to each historical user in a plurality of historical users and an identification result corresponding to the classification information of the sample data, and the identification result is used for indicating whether the sample data is the target information.
2. The method of claim 1, wherein the multiple dimensions include a keyword dimension, a title dimension, a sensitive field dimension, a description object dimension, a text sending time dimension, and a text sending frequency dimension, and the data to be identified includes text information and a text sending time.
3. The method according to claim 2, wherein the performing data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension comprises:
when the data to be identified contains a preset keyword, obtaining, through a preset high-risk word list, classification information of the data to be identified in the keyword dimension;
when the data to be identified contains a preset title, obtaining, through a preset title word segmentation table, classification information of the data to be identified in the title dimension;
when the data to be identified contains a preset sensitive field, obtaining, through a preset sensitive field table, classification information of the data to be identified in the sensitive field dimension;
identifying the description object in the data to be identified through a preset description object table to obtain classification information of the data to be identified in the description object dimension;
querying a preset text sending time table to obtain classification information of the data to be identified in the text sending time dimension;
and querying a preset text sending frequency table to obtain classification information of the data to be identified in the text sending frequency dimension.
4. The method according to claim 2, wherein the performing data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension comprises:
extracting a title of the text information;
inputting the title into a pre-trained word segmentation model to obtain feature information of the title;
acquiring the text sending frequency of the target user;
taking the text sending time and the text sending frequency as feature information of a text sending rule corresponding to the target user, wherein the feature information of the title and the feature information of the text sending rule form the feature information of the data to be identified;
and performing data processing on the feature information of the data to be identified in the multiple dimensions to obtain classification information of the data to be identified in each dimension.
5. The method according to claim 4, wherein the performing data processing on the feature information of the data to be identified in the multiple dimensions to obtain classification information of the data to be identified in each dimension comprises:
acquiring feature information of a plurality of pieces of historical data and classification information of the feature information of each piece of historical data in each dimension;
taking the feature information of the plurality of pieces of historical data and the classification information of the feature information of each piece of historical data in each dimension as training samples of a second neural network model, and training the second neural network model to obtain a multi-classification model;
and inputting the feature information of the data to be identified into the multi-classification model to obtain classification information of the target user in each dimension, wherein the classification information is a classification model score.
6. The method according to any one of claims 1 to 5, wherein the determining whether the data to be identified is target information according to the classification information of the data to be identified in each dimension based on the target information identification model comprises:
obtaining target feature information corresponding to the data to be identified of the target user according to the classification information of the target user in each dimension, wherein the target feature information represents the result of fusing the classification information of the target user across the dimensions;
inputting the target feature information into the target information identification model to obtain the probability that the data to be identified is target information;
and if the probability that the data to be identified is target information is greater than a preset target probability, determining that the data to be identified is target information.
7. The method of claim 6, wherein after the determining that the data to be identified is target information, the method further comprises:
querying, according to the probability that the data to be identified is target information, a preset processing priority table for the processing priority corresponding to the data to be identified, wherein the preset processing priority table stores a mapping between preset target-information probabilities and their corresponding processing priorities;
determining a processing mode for the data to be identified according to the processing priority corresponding to the data to be identified;
wherein the processing mode comprises banning the account of the target user, or prohibiting the publishing, forwarding, or sharing of the data to be identified of the target user.
8. A target information identification apparatus, comprising:
a to-be-identified data acquisition module, configured to acquire data to be identified uploaded by a terminal corresponding to a target user;
a classification information determining module, configured to perform data processing on the data to be identified in multiple dimensions to obtain classification information of the data to be identified in each dimension;
and an identification module, configured to determine, based on a target information identification model, whether the data to be identified is target information according to the classification information of the data to be identified in each dimension, wherein the target information identification model is obtained by training a first neural network model according to acquired classification information of sample data uploaded by a terminal corresponding to each historical user in a plurality of historical users and an identification result corresponding to the classification information of the sample data, and the identification result indicates whether the sample data is target information.
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the target information identification method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium and, when executed by a processor, implement the target information identification method according to any one of claims 1 to 7.
CN201910891826.4A 2019-09-20 2019-09-20 Target information identification method, device, equipment and storage medium Pending CN110674414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910891826.4A CN110674414A (en) 2019-09-20 2019-09-20 Target information identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910891826.4A CN110674414A (en) 2019-09-20 2019-09-20 Target information identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110674414A true CN110674414A (en) 2020-01-10

Family

ID=69078447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910891826.4A Pending CN110674414A (en) 2019-09-20 2019-09-20 Target information identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110674414A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256257A (en) * 2017-06-12 2017-10-17 上海携程商务有限公司 Abnormal user generation content identification method and system based on business datum
CN109214501A (en) * 2017-06-29 2019-01-15 北京京东尚科信息技术有限公司 The method and apparatus of information for identification
US20190258783A1 (en) * 2018-02-21 2019-08-22 International Business Machines Corporation Stolen machine learning model identification
CN109582788A (en) * 2018-11-09 2019-04-05 北京京东金融科技控股有限公司 Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing
CN109657243A (en) * 2018-12-17 2019-04-19 江苏满运软件科技有限公司 Sensitive information recognition methods, system, equipment and storage medium
CN109858039A (en) * 2019-03-01 2019-06-07 北京奇艺世纪科技有限公司 A kind of text information identification method and identification device
CN109977416A (en) * 2019-04-03 2019-07-05 中山大学 A kind of multi-level natural language anti-spam text method and system
CN110222170A (en) * 2019-04-25 2019-09-10 平安科技(深圳)有限公司 A kind of method, apparatus, storage medium and computer equipment identifying sensitive data
CN110210022A (en) * 2019-05-22 2019-09-06 北京百度网讯科技有限公司 Header identification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
段大高 等: "基于神经网络的微博虚假消息识别模型", 《信息网络安全》 *
许志华,吴立新: "《地灾与建筑损毁的无人机与地面LiDAR协同观测及评估》", 31 March 2019, 北京理工大学出版社 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291071B (en) * 2020-01-21 2023-10-17 北京字节跳动网络技术有限公司 Data processing method and device and electronic equipment
CN111291071A (en) * 2020-01-21 2020-06-16 北京字节跳动网络技术有限公司 Data processing method and device and electronic equipment
US20220383427A1 (en) * 2020-02-13 2022-12-01 Beijing Dajia Internet Information Technology Co., Ltd. Method and apparatus for group display
CN111459780B (en) * 2020-04-01 2023-04-07 北京字节跳动网络技术有限公司 User identification method and device, readable medium and electronic equipment
CN111459780A (en) * 2020-04-01 2020-07-28 北京字节跳动网络技术有限公司 User identification method and device, readable medium and electronic equipment
CN111460267B (en) * 2020-04-01 2023-04-07 腾讯科技(深圳)有限公司 Object identification method, device and system
CN111460267A (en) * 2020-04-01 2020-07-28 腾讯科技(深圳)有限公司 Object identification method, device and system
CN111311290B (en) * 2020-04-17 2023-08-08 广州信天翁信息科技有限公司 Article digitizing and verifying method and related device
CN111311290A (en) * 2020-04-17 2020-06-19 广州信天翁信息科技有限公司 Method for digitizing and verifying articles and related device
CN111858905A (en) * 2020-07-20 2020-10-30 北京百度网讯科技有限公司 Model training method, information identification method, device, electronic equipment and storage medium
CN111858905B (en) * 2020-07-20 2024-05-07 北京百度网讯科技有限公司 Model training method, information identification device, electronic equipment and storage medium
CN111815066A (en) * 2020-07-21 2020-10-23 上海数鸣人工智能科技有限公司 User click prediction method based on gradient lifting decision tree
CN111815066B (en) * 2020-07-21 2021-03-26 上海数鸣人工智能科技有限公司 User click prediction method based on gradient lifting decision tree
CN112738567A (en) * 2020-12-22 2021-04-30 北京百度网讯科技有限公司 Platform content processing method and device, electronic equipment and storage medium
CN112738567B (en) * 2020-12-22 2023-03-10 北京百度网讯科技有限公司 Platform content processing method and device, electronic equipment and storage medium
CN115588115A (en) * 2022-09-27 2023-01-10 北京羽乐创新科技有限公司 Method and device for identifying trademark picture

CN111581381B (en) Method and device for generating training set of text classification model and electronic equipment
CN113283115B (en) Image model generation method and device and electronic equipment
CN111259659B (en) Information processing method and device
CN112766285B (en) Image sample generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200110