WO2023240878A1 - Resource identification method and apparatus, and device and storage medium - Google Patents

Publication number
WO2023240878A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
resource
identified
identification
result
Prior art date
Application number
PCT/CN2022/127332
Other languages
English (en)
Chinese (zh)
Inventor
张琳
孙想
谢强
邓天生
于天宝
贠挺
陈国庆
林赛群
Original Assignee
北京百度网讯科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023240878A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a resource identification method, device, equipment and storage medium in the field of artificial intelligence technology.
  • at present, major media and platforms mainly identify resources based on a priori information, such as the text semantic features and image features of the resource to be identified, so as to determine whether the resource to be identified is a low-quality title resource.
  • the present disclosure provides a resource identification method, device, equipment and storage medium with higher accuracy.
  • a resource identification method, including: obtaining posterior information and a priori information of the resource to be identified, where the posterior information is used to reflect the user's feedback information on the resource to be identified, and the a priori information is used to reflect the semantic information of the resource to be identified; identifying the resource to be identified according to the first identification model and the posterior information to obtain a first identification result; identifying the resource to be identified according to the second identification model and the a priori information to obtain a second identification result; and generating a third identification result based on the first identification result and the second identification result.
  • a resource identification device, including: a first acquisition module for acquiring posterior information and a priori information of the resource to be identified, where the posterior information is used to reflect the user's feedback information on the resource to be identified, and the a priori information is used to reflect the semantic information of the resource to be identified;
  • a first identification module for identifying the resource to be identified according to the first identification model and the posterior information to obtain a first identification result;
  • a second identification module for identifying the resource to be identified according to the second identification model and the a priori information to obtain a second identification result; and
  • a generation module for generating a third identification result based on the first identification result and the second identification result.
  • an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method described in the present disclosure.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method described in the present disclosure.
  • a computer program product including a computer program that, when executed by a processor, implements the method described in the present disclosure.
  • the present disclosure provides a resource identification method, device, equipment and storage medium that combines a posteriori information and a priori information to identify whether the resource to be identified is a specific type of resource, thereby improving the accuracy of resource identification and user experience.
  • Figure 1 is a schematic flowchart of a resource identification method according to the first embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of a resource identification method according to the third embodiment of the present disclosure.
  • Figure 3 is a schematic flowchart of a resource identification method according to the fourth embodiment of the present disclosure.
  • Figure 4 is a schematic flowchart of a resource identification method according to the sixth embodiment of the present disclosure.
  • Figure 5 is a schematic flowchart of a resource identification method according to the seventh embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of a resource identification device according to the tenth embodiment of the present disclosure.
  • Figure 7 is a block diagram of an electronic device used to implement a resource identification method according to an embodiment of the present disclosure.
  • Figure 1 is a schematic flow chart of a resource identification method according to the first embodiment of the present disclosure. As shown in Figure 1, the method mainly includes:
  • Step S101 Obtain posterior information and prior information of the resource to be identified.
  • the posterior information is used to reflect the user's feedback information of the resource to be identified, and the prior information is used to reflect the semantic information of the resource to be identified.
  • in this embodiment, the posterior information and prior information of the resource to be identified are obtained first. The resource to be identified can be an article, a video, or a photo album published on a self-media platform.
  • the posterior information is used to reflect the user's feedback information on the resource to be identified; that is, it can reflect whether the user likes the resource to be identified. Specifically, the posterior information can include the user's likes, dislikes, comments, shares and reports of the resource to be identified.
  • the prior information is used to reflect the semantic information of the resource to be identified. Specifically, the prior information may include information such as the title text, content graphics and subtitles of the resource to be identified.
  • the posterior information and prior information of the resource to be identified can be stored in the backend database of the self-media platform to which the resource belongs, and can be obtained from the backend database based on the unique identifier of the resource to be identified.
  • the unique identifier may be the resource number of the resource to be identified, which is used to search for and extract the posterior information and prior information of the resource to be identified from the backend database.
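The lookup described above can be sketched as follows. This is only an illustration: the `ResourceRecord` layout, the in-memory `BACKEND_DB` dictionary standing in for the backend database, and the `fetch_information` helper are all assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """Hypothetical backend row for one resource (layout is an assumption)."""
    title: str                                      # prior information: title text
    subtitles: str                                  # prior information: subtitles
    feedback: list = field(default_factory=list)    # posterior: (user_id, action) pairs


# Stand-in for the backend database, keyed by resource number.
BACKEND_DB = {
    "res-001": ResourceRecord(
        title="You won't believe this",
        subtitles="A short cooking clip",
        feedback=[("u1", "like"), ("u2", "report")],
    ),
}


def fetch_information(resource_number, db=BACKEND_DB):
    """Return (posterior, prior) information for the given resource number."""
    record = db[resource_number]
    prior = {"title": record.title, "subtitles": record.subtitles}
    posterior = list(record.feedback)
    return posterior, prior
```

A real platform would replace the dictionary with a database query keyed on the same resource number.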
  • Step S102 Identify the resource to be identified based on the first identification model and posterior information, and obtain a first identification result.
  • the first recognition model can be generated by training a machine learning model. The machine learning model can be a neural network model, a decision tree model, a support vector machine model, etc.; an appropriate machine learning model can be selected according to actual needs to train the first recognition model, and this disclosure does not limit the first recognition model.
  • the first recognition model is used to identify the posterior information, and the obtained first recognition result can reflect, from the user's perspective, whether the title of the resource to be identified matches its content, that is, whether the title is false, exaggerated, or distorted, etc.
  • the first identification result may be "yes" or "no", indicating respectively that the resource to be identified is a specific type of resource and that it is not. The first identification result may also include a corresponding confidence level.
  • if the resource to be identified is a resource of a specific type, the first identification result may also include the reason why, such as the title not matching the content or the title being exaggerated.
  • Step S103 Identify the resource to be identified based on the second identification model and a priori information, and obtain a second identification result.
  • the second recognition model can also be generated by training a machine learning model. The machine learning model can be a neural network model, a decision tree model, a support vector machine model, etc.; an appropriate machine learning model can be selected according to actual needs to train the second recognition model, and this disclosure does not limit the second recognition model.
  • the second recognition model is used to identify the prior information, and the obtained second recognition result can reflect whether the title of the resource to be identified contains typos, missing words, or unclear sentences, and whether the semantics of the title overlap with the semantics of the content, etc.
  • the second identification result may be "yes" or "no", indicating respectively that the resource to be identified is a specific type of resource and that it is not.
  • the second identification result may also include a corresponding confidence level. If the resource to be identified is a resource of a specific type, the second identification result may also include the reason why, such as a typo in the title or the title not matching the content.
  • Step S104 Generate a third recognition result based on the first recognition result and the second recognition result.
  • the first recognition result and the second recognition result can be combined to generate a third recognition result. The third recognition result can be used to reflect whether the resource to be identified belongs to a specific type of resource, for example, whether the resource to be identified is a clickbait resource.
  • the first identification result reflects whether the resource to be identified is a specific type of resource from the user's perspective, and the second identification result reflects whether it is a specific type of resource from a semantic perspective.
  • that is to say, as long as one of the first identification result and the second identification result shows that the resource to be identified is a resource of a specific type, the resource to be identified can be considered to be a resource of a specific type; if both the first identification result and the second identification result indicate that the resource to be identified is not a resource of a specific type, it is considered that the resource to be identified is not a specific type of resource.
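The combination rule just described amounts to a logical OR of the two results. A minimal sketch, assuming each result is reduced to a boolean ("yes" as `True`, "no" as `False`); the function name `combine_results` is hypothetical, and confidence levels and reasons are left out:

```python
def combine_results(first_result: bool, second_result: bool) -> bool:
    """Third result: a specific-type resource if either result says so."""
    return first_result or second_result
```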
  • in this embodiment, the first recognition result is obtained by using the first recognition model and the posterior information, the second recognition result is obtained by using the second recognition model and the a priori information, and the third recognition result is generated from the two.
  • the resource identification method thus combines not only a priori information related to the semantics of the resource to be identified but also posterior information related to the user, so that identification results consistent with the user's cognition can be obtained. This makes up for the insufficient ability to identify exaggerated or false resource titles when only a priori information is used, and can further improve the accuracy of resource identification.
  • the first recognition model and the second recognition model are obtained in the following manner:
  • the first training set includes the first sample resource for which the resource identification result has been obtained and its posterior information.
  • the second training set includes the second sample resource for which the resource identification result has been obtained and its prior information.
  • the first training set is input into the first machine learning model for training to obtain the first recognition model
  • the second training set is input into the second machine learning model for training to obtain the second recognition model
  • in this embodiment, a machine learning model is used to train and generate the first recognition model and the second recognition model.
  • the first training set and the second training set need to be obtained first; the first training set is used to train the first recognition model, and the second training set is used to train the second recognition model.
  • the first training set includes the first sample resource for which the resource identification result has been obtained and its posterior information, and the second training set includes the second sample resource for which the resource identification result has been obtained and its prior information.
  • some resources can be manually selected and marked whether they are specific types of resources, and then the posterior information or prior information of these resources can be obtained from the existing database to form the first training set and the second training set. It should be emphasized that the first sample resource and the second sample resource may be the same or different, and this disclosure does not limit them.
  • the first machine learning model and the second machine learning model can be trained according to the first training set and the second training set respectively until the first machine learning model and the second machine learning model converge, thereby obtaining the first recognition model and the second recognition model.
  • the machine learning model can be a neural network model, a decision tree model or a natural language processing model, etc., which can be selected according to actual needs.
  • a first test set and a second test set can also be obtained to test the trained first recognition model and second recognition model.
  • the first test set can include the first resource to be tested and its posterior information, and the second test set may include the second resource to be tested and its prior information.
  • the first resource to be tested and the second resource to be tested may be the same or different.
  • the first test set and the second test set are used to test the first recognition model and the second recognition model respectively, and whether the accuracy of the first recognition model and the second recognition model meets the requirements is determined based on the recognition results.
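The test step above can be sketched as a simple accuracy check. The disclosure does not say how accuracy is measured or what the requirement is, so the `(features, label)` sample layout, the callable `model`, and the `required_accuracy` default of 0.9 are assumptions made for illustration.

```python
def accuracy(model, test_set):
    """Fraction of test samples whose predicted label equals the annotated label."""
    if not test_set:
        raise ValueError("test set must not be empty")
    correct = sum(1 for features, label in test_set if model(features) == label)
    return correct / len(test_set)


def meets_requirement(model, test_set, required_accuracy=0.9):
    """True if the model's accuracy on the test set reaches the required level."""
    return accuracy(model, test_set) >= required_accuracy
```

The same check would be run once for the first recognition model on the first test set and once for the second recognition model on the second test set.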
  • in this embodiment, the machine learning models are trained according to the first training set and the second training set respectively to obtain the first recognition model and the second recognition model. The first recognition model and the second recognition model can then be used to identify the resource to be identified, so as to determine whether it is a specific type of resource and improve the accuracy of resource identification.
  • FIG. 2 is a schematic flow chart of a resource identification method according to the third embodiment of the present disclosure. As shown in Figure 2, step S102 specifically includes:
  • Step S201 Obtain user information corresponding to the posterior information.
  • since the posterior information includes the user's likes, dislikes, comments, shares, and reports of the resource to be identified, the user corresponding to each like, dislike, comment, sharing, and reporting behavior can be determined, and the user information of each user can then be obtained based on each user's unique identifier.
  • in this way, all user information related to the posterior information is obtained based on the posterior information of the resource to be identified.
  • the user information may include the user's unique identifier and the user's total number of likes, total number of dislikes, total number of comments, total number of shares, total number of reports, etc.
  • Step S202 Identify abnormal user information in the user information according to the user identification model.
  • abnormal users are users who often perform negative operations on resources, such as giving a thumbs-down, making negative comments, or reporting, that is, "troll users".
  • the user recognition model can be generated by training a machine learning model, which can be selected according to actual needs. The training method is similar to that of the first recognition model and the second recognition model, and will not be described again here.
  • Step S203 Delete the posterior information corresponding to the abnormal user information in the posterior information to obtain effective posterior information.
  • after the abnormal user information is obtained, the posterior information corresponding to the abnormal user information needs to be deleted from the posterior information, thereby obtaining effective posterior information.
  • the posterior information corresponding to the abnormal user information cannot accurately and truly reflect whether the user likes the resource to be identified, so the posterior information corresponding to the abnormal user needs to be deleted to obtain effective posterior information.
  • Step S204 Identify the resource to be identified based on the first identification model and valid posterior information, and obtain a first identification result.
  • after the effective posterior information is obtained, the resource to be identified can be identified based on the first identification model and the effective posterior information to obtain the first identification result.
  • in this embodiment, abnormal user information in the user information is identified according to the user identification model, and the posterior information corresponding to the abnormal user information is then deleted to obtain effective posterior information. Finally, the first recognition result is obtained according to the first identification model and the effective posterior information.
  • because the posterior information corresponding to the abnormal user information is deleted and the resource to be identified is identified only based on the effective posterior information, the accuracy of resource identification can be further improved.
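Steps S201 to S203 can be sketched as a filter over the posterior information. The data layout (posterior information as `(user_id, action)` pairs, user information as a dictionary of per-user totals) and the rule inside `toy_user_model` are assumptions; the disclosure only states that a trained user identification model flags abnormal users.

```python
def toy_user_model(info):
    """Stand-in for the trained user identification model (rule is an assumption).

    Flags a user whose historical actions are almost entirely negative.
    """
    negative = info["dislikes"] + info["reports"]
    total = negative + info["likes"]
    return total > 0 and negative / total > 0.9


def filter_posterior(posterior, users, is_abnormal=toy_user_model):
    """Drop posterior entries produced by users the model flags as abnormal."""
    abnormal_ids = {uid for uid, info in users.items() if is_abnormal(info)}
    return [(uid, action) for uid, action in posterior if uid not in abnormal_ids]
```

The returned list plays the role of the effective posterior information passed on to step S204.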
  • Figure 3 is a schematic flow chart of a resource identification method according to the fourth embodiment of the present disclosure.
  • in this embodiment, the posterior information includes the user's click information and comment information for the resource to be identified.
  • the click information includes the number of clicks corresponding to the user's different operations on the resource to be identified and the display information of the resource to be identified.
  • the different operations include likes, dislikes, comments, reports, collections, etc.
  • the display information includes the length of stay, frequency of stays, and playback or browsing completion of the resource to be identified, etc.
  • step S204 specifically includes:
  • Step S301 Add a first label to the valid comment information according to the first classification model to obtain a first label result.
  • the posterior information includes the user's click information, comment information and reporting information of the resource to be identified. Therefore, the effective posterior information includes the user's effective click information, valid comment information and effective reporting information of the resource to be identified.
  • This embodiment first adds a first label to the effective comment information according to the first classification model to obtain the first label result.
  • the first classification model is generated by training a machine learning model, which can be selected according to actual needs. The training method of the first classification model is similar to that of the first recognition model and the second recognition model, and will not be repeated here.
  • the first label result divides the valid comment information into valid negative comment information and valid general comment information.
  • Step S302 Count the number of valid comment information corresponding to all negative comment tags in the first tag result to obtain the number of valid negative comment information.
  • Step S303 Input the number of valid negative comment information and the valid click information into the first recognition model, identify the resource to be identified, and obtain the first recognition result.
  • after the first label result is obtained, the number of valid comment information corresponding to all negative comment tags in the first label result can be counted to obtain the number of valid negative comment information; the number of valid negative comment information and the valid click information are then input into the first recognition model to identify the resource to be identified and obtain the first recognition result.
  • the click counts in the valid click information include the number of likes, dislikes, comments, reports, collections, blocks, etc. of the resource to be identified.
  • inputting the valid click information and the number of valid negative comment information into the first recognition model is equivalent to jointly identifying the resource to be identified based on the user's positive evaluations (including the number of likes, the number of collections, the number of valid general comment information, etc.) and negative evaluations (including the number of dislikes, the number of valid negative comment information, etc.), thereby obtaining the first identification result.
  • if the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than the preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; if the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource.
  • the preset threshold can be set according to actual needs.
  • in this embodiment, the number of valid negative comment information is counted, and the number of valid negative comment information and the valid click information are input into the first identification model for identification, which can improve the accuracy of resource identification.
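The ratio rule in this embodiment can be illustrated with a small function. In the disclosure the counts are fed to a trained first recognition model; the explicit ratio test below is only a sketch of the decision it is said to be equivalent to. The function name, the default threshold of 1.0, and the handling of a resource with no positive evaluations are assumptions.

```python
def first_result_from_counts(likes, collections, general_comments,
                             dislikes, negative_comments, threshold=1.0):
    """Return True ("yes") if negative/positive evaluations exceed the threshold."""
    positive = likes + collections + general_comments
    negative = dislikes + negative_comments
    if positive == 0:
        # Assumption: with no positive evaluations, any negative one counts as "yes".
        return negative > 0
    return negative / positive > threshold
```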
  • the posterior information includes the user's click information and reporting information for the resource to be identified. The click information includes the number of clicks corresponding to the user's different operations on the resource to be identified and the display information of the resource to be identified.
  • the different operations include likes, dislikes, comments, reports, collections, etc.
  • the display information includes the length of stay, frequency of stays, and playback or browsing completion of the resource to be identified, etc.
  • the valid reporting information is counted to obtain the number of valid reporting information; the number of valid reporting information and the valid click information are input into the first identification model, the resource to be identified is identified, and the first identification result is obtained.
  • the click counts in the valid click information include the number of likes, dislikes, comments, reports, collections, blocks, etc. of the resource to be identified. Inputting the valid click information and the number of valid reporting information into the first identification model is equivalent to identifying the resource to be identified based on the user's positive evaluations (including the number of likes and collections) and negative evaluations (including the number of dislikes, the number of valid reporting information, etc.), thereby obtaining the first recognition result.
  • if the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than the preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; if the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource.
  • the preset threshold can be set according to actual needs.
  • in this embodiment, the number of valid reporting information is counted, and the number of valid reporting information and the valid click information are input into the first identification model for identification, which can improve the accuracy of resource identification.
  • Figure 4 is a schematic flow chart of a resource identification method according to the sixth embodiment of the present disclosure.
  • in this embodiment, the posterior information includes the user's click information, comment information and report information for the resource to be identified.
  • the click information includes the number of clicks corresponding to the user's different operations on the resource to be identified and the display information of the resource to be identified.
  • the different operations include likes, dislikes, comments, reports, collections, etc.
  • the display information includes the length of stay, frequency of stays, and playback or browsing completion of the resource to be identified, etc.
  • Step S204 specifically includes:
  • Step S401 Add a first label to the valid comment information according to the first classification model to obtain a first label result.
  • Step S401 is similar to step S301 and will not be described again here.
  • Step S402 Count the number of valid comment information and the number of valid report information corresponding to all negative review tags in the first tag result, and obtain the number of valid negative review information and the number of valid report information.
  • Step S403 Input the number of valid negative review information, the number of valid report information, and the valid click information into the first identification model, identify the resource to be identified, and obtain the first identification result.
  • after the first tag result is obtained, the number of valid comment information corresponding to all negative review tags in the first tag result and the number of valid report information can be counted to obtain the number of valid negative review information and the number of valid report information; these two numbers and the valid click information are then input into the first identification model to identify the resource to be identified and obtain the first identification result.
  • the click counts in the valid click information include the number of likes, dislikes, comments, reports, collections, blocks, etc. of the resource to be identified.
  • inputting the valid click information, the number of valid negative review information and the number of valid report information into the first identification model is equivalent to jointly identifying the resource to be identified based on the user's positive evaluations (including the number of likes, the number of collections, the number of valid general comment information, etc.) and negative evaluations (including the number of dislikes, the number of valid report information, the number of valid negative review information, etc.), thereby obtaining the first identification result.
  • if the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than the preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; if the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource.
  • the preset threshold can be set according to actual needs.
  • in this embodiment, the number of valid negative review information and the number of valid report information are counted, and the number of valid negative review information, the number of valid report information, and the valid click information are input into the first identification model for identification, which can further improve the accuracy of resource identification.
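Steps S401 to S403 can be sketched as assembling one input record for the first identification model. Everything concrete here is an assumption made for illustration: the `NEGATIVE_LABELS` set, the `classify_comment` callable standing in for the first classification model, the representation of click information as a list of operation names, and the feature key names.

```python
from collections import Counter

# Hypothetical set of labels treated as negative review tags.
NEGATIVE_LABELS = {"article fabrication", "title flaw",
                   "content fabrication", "title inconsistency"}


def build_features(valid_clicks, valid_comments, valid_reports, classify_comment):
    """Assemble the counts that are input to the first identification model."""
    labels = [classify_comment(text) for text in valid_comments]
    features = dict(Counter(valid_clicks))            # clicks per operation
    features["valid_negative_comments"] = sum(
        1 for label in labels if label in NEGATIVE_LABELS)
    features["valid_reports"] = len(valid_reports)    # number of valid reports
    return features
```

The resulting dictionary would then be passed to whatever trained model plays the role of the first identification model.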
  • FIG. 5 is a schematic flowchart of a resource identification method according to the seventh embodiment of the present disclosure. As shown in Figure 5, step S204 specifically includes:
  • Step S501 Add a first label to the valid comment information according to the first classification model to obtain a first label result.
  • Step S502 Count the number of valid comment information corresponding to different negative comment labels in the first label result to obtain the first statistical result.
  • when training the first classification model, the first label can be set to article fabrication, title flaw, content fabrication, title incompatibility, average, excellent, etc., where negative labels such as article fabrication, title flaw, content fabrication and title incompatibility can be classified as negative review labels. After the first label is added to the valid comment information according to the first classification model, the obtained first label result classifies the valid comment information into categories such as article fabrication and title falsification.
  • Step S503 Add a second label to the valid reporting information according to the second classification model to obtain a second label result.
  • Step S504 Count the number of valid reporting information corresponding to different second tags in the second tag result to obtain a second statistical result.
  • when training the second classification model, the second label can be set to fabricated articles, flawed titles, low-quality titles, inconsistent titles, etc. After the second label is added to the valid reporting information according to the second classification model, the obtained second label result divides the valid reporting information into multiple categories such as fabricated articles, inaccurate titles, low-quality titles, inconsistent titles, etc.
  • the number of valid reporting information corresponding to each second label in the second label result is then counted, for example, the number of valid reporting information corresponding to the article fabrication tag, the number corresponding to the false title tag, the number corresponding to the low-quality title tag, etc., thereby obtaining the second statistical result.
  • the second classification model is generated based on machine learning model training.
  • the machine learning model can be selected according to actual needs.
  • The training method of the second classification model is similar to that of the first recognition model and the second recognition model and is not repeated here.
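  • The counting in steps S502 and S504 can be illustrated with a minimal sketch. It assumes the two classification models have already assigned one label to each valid comment and each valid report; the label strings, the `count_by_label` helper, and the sample data are hypothetical and only mirror the examples given in the text.

```python
from collections import Counter

# Hypothetical negative review labels, following the examples in the text.
NEGATIVE_REVIEW_LABELS = {
    "article fabrication",
    "false title",
    "content fabrication",
    "title mismatch",
}


def count_by_label(labels, keep=None):
    """Count items per label; if `keep` is given, count only labels in `keep`."""
    counts = Counter(labels)
    if keep is not None:
        counts = Counter({lab: n for lab, n in counts.items() if lab in keep})
    return counts


# Labels a first classification model might assign to valid comments (made-up data).
comment_labels = ["false title", "average", "false title", "excellent", "content fabrication"]
# Labels a second classification model might assign to valid reports (made-up data).
report_labels = ["low-quality title", "false title", "false title"]

# Step S502: per-label counts restricted to negative review labels.
first_statistical_result = count_by_label(comment_labels, keep=NEGATIVE_REVIEW_LABELS)
# Step S504: per-label counts over all second labels.
second_statistical_result = count_by_label(report_labels)
```

  • Both results are plain per-label counts, so they can be flattened into a feature vector together with the valid click information before being passed to the first recognition model.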
  • Step S505 Input the first statistical result, the second statistical result and the valid click information into the first recognition model, identify the resources to be identified, and obtain the first recognition result.
  • The valid click information includes the number of likes, dislikes, comments, reports, collections, blocks, and so on of the resource to be identified. Inputting the first statistical result, the second statistical result, and the valid click information into the first recognition model is equivalent to identifying the resource to be identified jointly from the users' positive evaluations (including the number of likes, the number of collections, the number of valid general comment information, etc.) and negative evaluations (including the number of dislikes, the number of valid reporting information, the number of valid negative review information, etc.), thereby obtaining the first identification result.
  • If the ratio of the number of negative evaluations to the number of positive evaluations of the resource to be identified is greater than a preset threshold, the first identification result is "yes", that is, the resource to be identified is a specific type of resource; in this case the first identification result may also include the reason why the resource to be identified is a specific type of resource, such as a title mismatch or an exaggerated title. If the ratio is not greater than the preset threshold, the first identification result is "no", that is, the resource to be identified is not a specific type of resource. The preset threshold can be set according to actual needs.
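  • The ratio comparison described above can be written down as a small sketch. This only illustrates the intuition the text gives for the first recognition model, which in the disclosure is a trained model; the function name `ratio_rule`, the sample counts, and the handling of a zero positive count are assumptions.

```python
def ratio_rule(positive_count, negative_count, threshold):
    """Return True ("yes") when negative/positive is greater than the threshold."""
    if positive_count == 0:
        # Assumption: with no positive evaluations, any negative evaluation
        # is treated as exceeding the threshold.
        return negative_count > 0
    return negative_count / positive_count > threshold


# Made-up counts: likes + collections + valid general comments (positive),
# dislikes + valid reports + valid negative reviews (negative).
positive = 80 + 15 + 5
negative = 40 + 12 + 8
first_result = ratio_rule(positive, negative, threshold=0.5)
```

  • A ratio exactly equal to the threshold yields "no", matching the "not greater than" wording in the text.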
  • The number of valid comment information corresponding to different negative review labels and the number of valid reporting information corresponding to different second labels are counted to obtain the first statistical result and the second statistical result, and the first statistical result, the second statistical result, and the valid click information are input into the first identification model to obtain the first identification result, which can further improve the accuracy of resource identification.
  • the second recognition model may be a model generated based on natural language processing model training.
  • Step S103 specifically includes:
  • the prior information is segmented and converted into a vector matrix; based on the vector matrix and the second recognition model, the resources to be identified are identified to obtain the second recognition result.
  • The prior information can be used as the semantic feature of the resource to be identified. The prior information is segmented into words and converted into a vector matrix, that is, into data a computer can process, and the resource to be identified is then identified based on the vector matrix and the second identification model to obtain the second identification result.
  • The natural language processing model used to train the second recognition model can be a long short-term memory (LSTM) model, a Transformer model, a BERT (Bidirectional Encoder Representations from Transformers) model, etc.; the present disclosure does not limit the choice of natural language processing model.
  • The prior information can be segmented manually or with a word segmentation tool, and a word-vector model such as Word2Vec or GloVe can be used to convert the segmented prior information into a vector matrix.
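  • The conversion from prior information to a vector matrix can be sketched as follows. This is a toy stand-in, not the method of the disclosure: whitespace splitting replaces a real word segmentation tool, and a hash-derived pseudo-embedding replaces a trained Word2Vec or GloVe lookup. The function names and the sample title are hypothetical.

```python
import hashlib


def segment(text):
    """Whitespace segmentation; a stand-in for a real word segmentation tool."""
    return text.lower().split()


def toy_word_vector(word, dim=4):
    """Deterministic pseudo-embedding from a hash (dim must be at most 32).

    A real system would look the word up in a trained Word2Vec or GloVe table.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def to_vector_matrix(text, dim=4):
    """One row per segmented word, one column per embedding dimension."""
    return [toy_word_vector(word, dim) for word in segment(text)]


matrix = to_vector_matrix("You will not believe this title")
```

  • The resulting matrix has one row per word, which is the shape a sequence model such as an LSTM or a Transformer encoder would typically consume.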
  • The eighth embodiment of the present disclosure trains a natural language processing model to generate the second recognition model, identifies the prior information of the resource to be identified according to the second recognition model, and obtains the second recognition result, so that the first recognition result and the second recognition result can subsequently be combined to determine whether the resource to be identified is a specific type of resource, thereby improving the accuracy of resource identification.
  • step S104 specifically includes:
  • If either the first recognition result or the second recognition result indicates that the resource to be identified belongs to a specific type of resource, a third recognition result is generated to indicate that the resource to be identified belongs to a specific type of resource; otherwise, a third recognition result is generated to indicate that the resource to be identified does not belong to a specific type of resource.
  • The first identification result reflects whether the resource to be identified is a specific type of resource from the user's perspective, and the second identification result reflects whether the resource to be identified is a specific type of resource from a semantic perspective. If either the first identification result or the second identification result shows that the resource to be identified is a specific type of resource, the resource to be identified can be considered a specific type of resource; if both the first identification result and the second identification result show that the resource to be identified is not a specific type of resource, the resource to be identified is considered not to be a specific type of resource.
  • the ninth embodiment of the present disclosure combines the first recognition result and the second recognition result to determine whether the resource to be identified is a specific type of resource, which can improve the accuracy of title recognition of the resource to be identified.
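  • The combination rule of step S104 amounts to a logical OR of the two results. A minimal sketch, assuming both results are represented as booleans (the function name is hypothetical):

```python
def generate_third_result(first_result: bool, second_result: bool) -> bool:
    """True if either result marks the resource as the specific type."""
    return first_result or second_result


# Enumerate all four combinations of the first and second results.
truth_table = {
    (first, second): generate_third_result(first, second)
    for first in (False, True)
    for second in (False, True)
}
```

  • Only the case where both results are negative produces a negative third result.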
  • Taking a video resource as an example, the posterior information and the prior information of the video resource are obtained first. The posterior information is the users' feedback on the video resource and includes the click information, comment information, and report information of the resource, such as the number of likes, dislikes, comments, reports, collections, and blocks; the prior information is the semantic information of the video resource, such as title text and subtitles.
  • The posterior information corresponding to abnormal user information is then deleted from the posterior information to obtain valid posterior information, and the valid posterior information is input into the first recognition model to obtain the first recognition result, which reflects whether the video resource is a specific type of resource. Before the valid posterior information is input into the first recognition model, labels can also be added to the valid negative review information and the valid report information.
  • The prior information is then input into the second recognition model to obtain the second recognition result, which also reflects whether the video resource is a specific type of resource. If either the first recognition result or the second recognition result shows that the video resource is a specific type of resource, the third recognition result is that the video resource is a specific type of resource, such as a clickbait resource.
  • Figure 6 is a schematic structural diagram of a resource identification device according to the tenth embodiment of the present disclosure. As shown in Figure 6, the device mainly includes:
  • the first acquisition module 60 is used to obtain posterior information and prior information of the resource to be identified, the posterior information is used to reflect the user's feedback information of the resource to be identified, and the prior information is used to reflect the semantic information of the resource to be identified;
  • the first identification module 61 is used to identify the resource to be identified based on the first identification model and the posterior information, and obtain the first identification result;
  • the second identification module 62 is used to identify the resource to be identified based on the second identification model and the prior information, and obtain the second identification result;
  • the generation module 63 is configured to generate a third recognition result based on the first recognition result and the second recognition result.
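  • The way modules 60 to 63 fit together can be sketched as a small composition of callables. This is an illustrative arrangement only; the class name, field names, and the stub lambdas standing in for data access and trained models are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class ResourceIdentificationDevice:
    # Module 60: returns (posterior information, prior information).
    acquire: Callable[[str], Tuple[dict, str]]
    # Module 61: first identification model applied to posterior information.
    identify_from_posterior: Callable[[dict], bool]
    # Module 62: second identification model applied to prior information.
    identify_from_prior: Callable[[str], bool]
    # Module 63: produces the third result from the first two.
    generate: Callable[[bool, bool], bool]

    def run(self, resource_id: str) -> Dict[str, bool]:
        posterior, prior = self.acquire(resource_id)
        first = self.identify_from_posterior(posterior)
        second = self.identify_from_prior(prior)
        return {"first": first, "second": second, "third": self.generate(first, second)}


# Stub callables with made-up data, standing in for real components.
device = ResourceIdentificationDevice(
    acquire=lambda rid: ({"likes": 10, "dislikes": 40}, "You will not believe this"),
    identify_from_posterior=lambda p: p["dislikes"] > p["likes"],
    identify_from_prior=lambda text: False,
    generate=lambda first, second: first or second,
)
result = device.run("video-1")
```

  • Keeping each module as a separate callable lets either model be retrained or replaced without touching the others.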
  • the device further includes:
  • the second acquisition module is used to acquire a first training set and a second training set, where the first training set includes resources for which resource identification results have been obtained and their posterior information, and the second training set includes resources for which resource identification results have been obtained and their prior information.
  • the first identification module 61 mainly includes:
  • the acquisition sub-module is used to obtain the user information corresponding to the posterior information; the user identification sub-module is used to identify abnormal user information in the user information according to the user identification model; the deletion sub-module is used to delete the posterior information corresponding to the abnormal user information from the posterior information to obtain valid posterior information; the first identification sub-module is used to identify the resource to be identified based on the first identification model and the valid posterior information, and obtain the first identification result.
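  • The deletion step performed by these sub-modules can be sketched as a simple filter. The record layout, the `user_id` field, and the `is_abnormal_user` predicate (a stand-in for the trained user identification model) are hypothetical.

```python
def filter_posterior(records, is_abnormal_user):
    """Keep only posterior records whose user is not flagged as abnormal."""
    return [rec for rec in records if not is_abnormal_user(rec["user_id"])]


# Made-up posterior records: one feedback action per record.
posterior_records = [
    {"user_id": "u1", "kind": "comment", "text": "the title is misleading"},
    {"user_id": "bot7", "kind": "report", "text": "spam"},
    {"user_id": "u2", "kind": "like", "text": ""},
]

# Stand-in for the user identification model: a fixed set of flagged users.
flagged_users = {"bot7"}
valid_posterior = filter_posterior(posterior_records, lambda uid: uid in flagged_users)
```

  • Filtering before counting ensures that feedback from abnormal users does not inflate the statistics fed into the first identification model.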
  • the posterior information includes the user's click information, comment information and report information of the resource to be identified, and the click information includes the number of clicks corresponding to the user's different operations on the resource to be identified and the display information of the resource to be identified.
  • the first identification sub-module is used to: add a first label to the valid comment information according to the first classification model to obtain the first label result; count the number of valid comment information corresponding to all negative review labels in the first label result to obtain the number of valid negative review information; input the number of valid negative review information and the valid click information into the first identification model, identify the resource to be identified, and obtain the first identification result.
  • the first identification sub-module is also used to: count the number of valid reporting information to obtain the number of valid reporting information; input the number of valid reporting information and the effective click information into the first identification model to identify the resources to be identified , get the first recognition result.
  • the first identification sub-module is also used to: add a first label to the valid comment information according to the first classification model to obtain a first label result; count the number of valid comment information corresponding to all negative review labels in the first label result and the number of valid report information, to obtain the number of valid negative review information and the number of valid report information; input the number of valid negative review information, the number of valid report information, and the valid click information into the first identification model, identify the resource to be identified, and obtain the first identification result.
  • the first identification sub-module is also used to: add a first label to the valid comment information according to the first classification model to obtain a first label result; count the number of valid comment information corresponding to different negative review labels in the first label result to obtain the first statistical result; add a second label to the valid reporting information according to the second classification model to obtain the second label result; count the number of valid reporting information corresponding to different second labels in the second label result to obtain the second statistical result; input the first statistical result, the second statistical result, and the valid click information into the first recognition model, identify the resource to be identified, and obtain the first recognition result.
  • the second identification module 62 mainly includes:
  • the conversion submodule is used to perform word segmentation processing on the prior information and convert it into a vector matrix; the second identification submodule is used to identify the resources to be identified based on the vector matrix and the second identification model, and obtain the second identification result.
  • the generation module 63 mainly includes:
  • the first generation submodule is used to generate a third identification result used to represent that the resource to be identified belongs to a specific type of resource if any of the first identification result and the second identification result indicates that the resource to be identified belongs to a specific type of resource;
  • the second generation submodule is used to otherwise generate a third identification result indicating that the resource to be identified does not belong to a specific type of resource.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The device 700 includes a computing unit 701 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700.
  • Computing unit 701, ROM 702 and RAM 703 are connected to each other via bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • Multiple components of the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard, a mouse, etc.; an output unit 707, such as various types of displays, speakers, etc.; a storage unit 708, such as a magnetic disk, optical disk, etc.; and a communication unit 709, such as a network card, modem, wireless communication transceiver, etc.
  • the communication unit 709 allows the device 700 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
  • Computing unit 701 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 701 performs various methods and processes described above, such as a resource identification method.
  • a resource identification method may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as storage unit 708.
  • part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709.
  • When the computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the resource identification method described above may be performed.
  • Alternatively, computing unit 701 may be configured to perform the resource identification method in any other suitable manner (e.g., by means of firmware).
  • Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor, which may be a special purpose or general purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • The systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, a distributed system server, or a server combined with a blockchain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the technical field of computers and, in particular, to the technical field of artificial intelligence. Provided are a resource recognition method and apparatus, and a device and a storage medium. The specific implementation scheme is: acquiring posterior information and prior information of a resource to be recognized, the posterior information being used to reflect user feedback information for said resource, and the prior information being used to reflect semantic information of said resource; recognizing said resource according to a first recognition model and the posterior information, so as to obtain a first recognition result; recognizing said resource according to a second recognition model and the prior information, so as to obtain a second recognition result; and generating a third recognition result according to the first recognition result and the second recognition result. By means of the resource recognition method and apparatus, and the device and storage medium provided in the present disclosure, the accuracy of resource recognition can be improved.
PCT/CN2022/127332 2022-06-16 2022-10-25 Procédé et appareil de reconnaissance de ressource, et dispositif et support d'enregistrement WO2023240878A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210694398.8A CN115099239B (zh) 2022-06-16 2022-06-16 一种资源识别方法、装置、设备以及存储介质
CN202210694398.8 2022-06-16

Publications (1)

Publication Number Publication Date
WO2023240878A1 true WO2023240878A1 (fr) 2023-12-21

Family

ID=83290393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127332 WO2023240878A1 (fr) 2022-06-16 2022-10-25 Procédé et appareil de reconnaissance de ressource, et dispositif et support d'enregistrement

Country Status (2)

Country Link
CN (1) CN115099239B (fr)
WO (1) WO2023240878A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099239B (zh) * 2022-06-16 2023-10-31 北京百度网讯科技有限公司 一种资源识别方法、装置、设备以及存储介质
CN117172245A (zh) * 2023-05-26 2023-12-05 国家计算机网络与信息安全管理中心 控制方法及控制系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170295189A1 (en) * 2016-04-11 2017-10-12 International Business Machines Corporation Identifying security breaches from clustering properties
US20180365574A1 (en) * 2017-06-20 2018-12-20 Beijing Baidu Netcom Science And Technology Co., L Td. Method and apparatus for recognizing a low-quality article based on artificial intelligence, device and medium
CN110705257A (zh) * 2019-09-16 2020-01-17 腾讯科技(深圳)有限公司 媒体资源的识别方法、装置、存储介质及电子装置
WO2022068600A1 (fr) * 2020-09-30 2022-04-07 百果园技术(新加坡)有限公司 Procédé et appareil d'apprentissage de modèle de détection d'utilisateur anormal, et procédé et appareil d'évaluation d'utilisateur anormal
CN115099239A (zh) * 2022-06-16 2022-09-23 北京百度网讯科技有限公司 一种资源识别方法、装置、设备以及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106412B (zh) * 2013-01-11 2016-04-20 广州广电运通金融电子股份有限公司 薄片类介质识别方法和识别装置
CN109684513B (zh) * 2018-12-14 2021-08-24 北京奇艺世纪科技有限公司 一种低质量视频识别方法及装置
CN113590968A (zh) * 2021-08-10 2021-11-02 平安普惠企业管理有限公司 资源推荐方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN115099239A (zh) 2022-09-23
CN115099239B (zh) 2023-10-31

Similar Documents

Publication Publication Date Title
TWI732271B (zh) 人機對話方法、裝置、電子設備及電腦可讀媒體
US10795939B2 (en) Query method and apparatus
CN108874776B (zh) 一种垃圾文本的识别方法及装置
US11200269B2 (en) Method and system for highlighting answer phrases
CN109670163B (zh) 信息识别方法、信息推荐方法、模板构建方法及计算设备
WO2023240878A1 (fr) Procédé et appareil de reconnaissance de ressource, et dispositif et support d'enregistrement
WO2020253350A1 (fr) Procédé et appareil de vérification de publication de contenu de réseau, dispositif informatique et support de stockage
WO2020155423A1 (fr) Procédé et appareil d'extraction d'informations inter-modes, et support de stockage
US10803253B2 (en) Method and device for extracting point of interest from natural language sentences
US11521603B2 (en) Automatically generating conference minutes
US20220027569A1 (en) Method for semantic retrieval, device and storage medium
US10803252B2 (en) Method and device for extracting attributes associated with centre of interest from natural language sentences
EP4053802A1 (fr) Procédé et appareil de classification de vidéos, dispositif et support d'informations
US11397952B2 (en) Semi-supervised, deep-learning approach for removing irrelevant sentences from text in a customer-support system
US11977567B2 (en) Method of retrieving query, electronic device and medium
CN112966081B (zh) 处理问答信息的方法、装置、设备和存储介质
CN114861889B (zh) 深度学习模型的训练方法、目标对象检测方法和装置
WO2023284327A1 (fr) Procédé d'entraînement d'un modèle d'évaluation de qualité de texte et procédé de détermination de qualité de texte
WO2023040230A1 (fr) Procédé et appareil d'évaluation de données, procédé et appareil d'entraînement et dispositif électronique et support de stockage
CN112579729A (zh) 文档质量评价模型的训练方法、装置、电子设备和介质
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
CN113204956B (zh) 多模型训练方法、摘要分段方法、文本分段方法及装置
CN113111658A (zh) 校验信息的方法、装置、设备和存储介质
US20230004715A1 (en) Method and apparatus for constructing object relationship network, and electronic device
WO2023060954A1 (fr) Procédé et appareil de traitement de données, procédé et appareil d'inspection de qualité de données, et support de stockage lisible

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22946548

Country of ref document: EP

Kind code of ref document: A1