CN113569557B - Information quality identification method, device, equipment, storage medium and program product

Info

Publication number: CN113569557B
Application number: CN202111127146.9A
Authority: CN
Inventors: 王晨琛
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2022-01-04
Anticipated expiration: 2041-09-26
Also published as: CN113569557A

Abstract

The application provides a method, an apparatus, a device, a computer-readable storage medium, and a computer program product for identifying the quality of information, applicable to the field of Internet of Vehicles and the technical field of artificial intelligence. The method comprises: in a first stage, obtaining features of at least two dimensions of the information, performing feature combination processing on the features of the at least two dimensions to obtain combined features of the information, and determining a first quality parameter of the information based on the combined features, the first stage being a period before the information is recommended online; and, in a second stage, obtaining interactive features related to the recommendation process of the information, determining a second quality parameter of the information based on the interactive features, and determining a quality identification result of the information by combining the first quality parameter and the second quality parameter, the second stage being a period during which the information is recommended online. With the application, the quality of information can be accurately identified and recommendation precision improved.

Description

Information quality identification method, device, equipment, storage medium and program product

Technical Field

The present application relates to the field of Internet of Vehicles and the technical field of artificial intelligence, and in particular to a method, an apparatus, a device, a computer-readable storage medium, and a computer program product for identifying the quality of information.

Background

Artificial Intelligence (AI) comprises the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Artificial intelligence technology is widely applied in recommendation systems, for example to recommend information to users. Because the quality of information is uneven, a uniform recommendation scheme ends up recommending a large amount of low-quality information, which degrades the precision of the recommendation system and the user experience.

The related art lacks an effective scheme for identifying the quality of information so as to improve the recommendation precision and the user experience of the recommendation system.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment, a computer readable storage medium and a computer program product for identifying the quality of information, which can accurately identify the quality of the information so as to improve the recommendation precision and user experience of a recommendation system.

The technical scheme of the embodiment of the application is realized as follows:

The embodiment of the application provides a method for identifying the quality of information, which comprises the following steps:

acquiring features of at least two dimensions of the information in a first stage, performing feature combination processing on the features of the at least two dimensions to obtain combined features of the information, and determining a first quality parameter of the information based on the combined features; wherein the first stage is a period before online recommendation of the information;

acquiring interactive characteristics related to the recommendation process of the information in a second stage, determining a second quality parameter of the information based on the interactive characteristics, and determining a quality identification result of the information by combining the first quality parameter and the second quality parameter; wherein the second stage is a period of online recommendation of the information.

The embodiment of the application provides an apparatus for identifying the quality of information, including:

the first determining module is used for acquiring the features of at least two dimensions of the information in a first stage, performing feature combination processing on the features of the at least two dimensions to obtain the combined features of the information, and determining a first quality parameter of the information based on the combined features; wherein the first stage is a period before online recommendation of the information;

the second determining module is used for acquiring interactive characteristics related to the information recommendation process in a second stage and determining a second quality parameter of the information based on the interactive characteristics;

a third determining module, configured to determine a quality identification result of the information in combination with the first quality parameter and the second quality parameter; wherein the second stage is a period of online recommendation of the information.

In the above scheme, the second determining module is further configured to periodically obtain, in a second stage, interactive data related to the information recommendation process, and extract corresponding interactive features from the interactive data, where the period is divided based on a sampling duration or a collection number of the interactive data;

determining a second quality parameter of the information in each period based on the interaction characteristics of each period;

the third determining module is further configured to determine, in combination with the first quality parameter and the second quality parameter of each period, a period quality identification result of the information in each period;

and determining, according to the period quality identification result of the information in each period, a normal distribution that the quality trend of the information follows, and taking the parameters of the normal distribution as the overall quality identification result of the information, wherein the parameters comprise a quality parameter mean and a quality parameter variance.
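As an illustration of the period-based scheme above, the following minimal sketch summarizes per-period quality results by the mean and variance of a fitted normal distribution. The function name `overall_quality` and the sample scores are hypothetical, and the use of the maximum-likelihood estimates (sample mean and population variance) is an assumption; the application does not specify how the distribution is fitted.

```python
from statistics import fmean, pvariance


def overall_quality(period_scores):
    """Return (mean, variance) of a normal distribution fitted to per-period results.

    Assumption: the distribution is fitted by maximum likelihood, i.e. the
    sample mean and the population variance of the per-period scores.
    """
    if not period_scores:
        raise ValueError("at least one period result is required")
    mean = fmean(period_scores)
    variance = pvariance(period_scores, mu=mean)
    return mean, variance


# Hypothetical per-period quality identification results for one item.
period_scores = [0.2, 0.4, 0.6]
mean, variance = overall_quality(period_scores)
```

A small variance would indicate that the quality of the information is stable across periods, while a large variance would indicate that it fluctuates.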

In the foregoing solution, the second determining module is further configured to collect, in a sampling window at a second stage, interaction data related to the information recommendation process, and extract corresponding interaction features from the interaction data, where the types of the sampling window include: a sampling window of a set duration and a sampling window of a set data volume;

determining a second quality parameter in the sampling window based on the interaction data;

and the third determining module is further configured to perform fusion processing on the first quality parameter and the second quality parameter in the sampling window, and use an obtained third quality parameter as a quality identification result.

In the above scheme, the apparatus further comprises:

the information shielding module is used for, when the information is determined according to the quality identification result to be low-quality information that needs to be shielded, determining a corresponding shielding mode according to the low-quality grade of the information and applying the corresponding shielding mode to the information;

wherein the shielding mode comprises at least one of: down-weighting the information in the ranking stage of a recommendation system; temporarily filtering the information from the recall results of the recommendation system; permanently filtering the information from the recall results of the recommendation system.
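The selection of a shielding mode from a low-quality grade can be sketched as a simple lookup. The three modes come from the text above; the grade numbering (1 to 3) and the particular grade-to-mode mapping are hypothetical and serve only as an illustration.

```python
from enum import Enum


class ShieldingMode(Enum):
    DOWN_WEIGHT = "down-weight in the ranking stage"
    TEMPORARY_FILTER = "temporarily filter from recall results"
    PERMANENT_FILTER = "permanently filter from recall results"


# Hypothetical mapping: a higher grade means more severe low quality.
GRADE_TO_MODE = {
    1: ShieldingMode.DOWN_WEIGHT,
    2: ShieldingMode.TEMPORARY_FILTER,
    3: ShieldingMode.PERMANENT_FILTER,
}


def select_shielding_mode(low_quality_grade):
    """Return the shielding mode for a grade, or None if no shielding applies."""
    return GRADE_TO_MODE.get(low_quality_grade)
```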

In the foregoing solution, the first determining module is further configured to perform at least two of the following operations in the first stage:

acquiring content structure characteristics representing the information, wherein the content structure characteristics are used for representing the quality of a content structure of the information;

acquiring account characteristics of the information, wherein the account characteristics comprise the grade of an account issuing the information;

and acquiring content understanding features of the information, wherein the content understanding features characterize at least one quality class to which the information belongs, and each quality class is a class whose recommendation is to be shielded in the second stage.

In the foregoing solution, the first determining module is further configured to obtain at least one of the following continuous features of the information: title length, image number, image-text proportion and text length;

discretizing the continuous features to obtain corresponding discrete features;

and taking the discrete characteristic corresponding to at least one continuous characteristic as the content structure characteristic of the information.
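The discretization of continuous content structure features can be sketched with fixed bucket boundaries. The feature names and all boundary values below are hypothetical; the application does not state how the buckets are chosen.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for each continuous feature.
BUCKET_EDGES = {
    "title_length": [10, 20, 30],
    "image_count": [1, 3, 9],
    "text_length": [200, 800, 2000],
}


def discretize(name, value):
    """Map a continuous value to the index of the bucket it falls into."""
    return bisect_right(BUCKET_EDGES[name], value)


def content_structure_features(raw):
    """Discretize every continuous feature in `raw` (a name -> value dict)."""
    return {name: discretize(name, value) for name, value in raw.items()}
```

With these boundaries, a title of 15 characters falls into bucket 1 (at least 10 and below 20), and an item with no images falls into bucket 0.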

In the above scheme, the first determining module is further configured to perform encoding processing on at least two words in the information to obtain a vector representation of each word, and perform iterative encoding processing based on a position of each word and the vector representation to obtain an encoding characteristic of the information;

and mapping the coding characteristics of the information to obtain the mapping characteristics of the information, and performing bias processing on the mapping characteristics to obtain the content understanding characteristics of the information.

In the above scheme, the first determining module is further configured to perform word segmentation processing on the information to obtain at least two words, and perform vector conversion on the at least two words to obtain a vector representation corresponding to each word;

according to the position of each word in the information, carrying out position embedding processing on the vector representation corresponding to each word to obtain a position code corresponding to each word;

and summing the vector representation corresponding to each word and the position code to determine the coding characteristics corresponding to each word.

In the above scheme, the dimension of the position code is the same as the dimension of the vector representation of the word; the first determining module is further configured to determine, when a sequence number of a dimension in the position code is an even number, a code value corresponding to the dimension in the position code according to a sine function, where the sine function takes a sorting position of the word in the information and a position code dimension as parameters;

and when the serial number of the dimensionality in the position code is an odd number, determining the code value corresponding to the dimensionality in the position code according to a cosine function, wherein the cosine function takes the sequencing position of the words in the information and the position code dimensionality as parameters.
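The position encoding described above (sine on even-numbered dimensions, cosine on odd-numbered dimensions) and its summation with the word vectors can be sketched as follows. The exact frequency term, with base 10000 and paired dimensions sharing one frequency, follows the common Transformer convention and is an assumption; the text only states that the functions take the word position and the position code dimension as parameters. The names `position_code` and `encode_words` are hypothetical.

```python
import math


def position_code(pos, d_model, base=10000.0):
    """Position code of length d_model for the word at sorting position `pos`."""
    code = []
    for k in range(d_model):
        # Dimensions 2i and 2i+1 share the frequency base ** (2i / d_model).
        angle = pos / base ** ((k - k % 2) / d_model)
        code.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return code


def encode_words(word_vectors):
    """Sum each word vector with the position code of the same dimension."""
    d_model = len(word_vectors[0])
    return [
        [v + p for v, p in zip(vec, position_code(pos, d_model))]
        for pos, vec in enumerate(word_vectors)
    ]
```

Because the position code has the same dimension as the word vector, the two can be added element by element, which is the summation step described in the preceding scheme.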

In the above scheme, the first determining module is further configured to perform feature splitting on a first partial feature in the combined features to obtain a split feature of the information;

performing feature combination processing on the split features of the information and second partial features in the combined features to obtain logistic regression features of the information, wherein the discrimination of the first partial features is greater than that of the second partial features;

and performing quality parameter prediction processing on the information based on the logistic regression characteristics to obtain a first quality parameter of the information.

In the above solution, the interactive features include positive interactive features representing preference for the information and negative interactive features representing non-preference for the information; the second determination module is further configured to perform weighted summation processing on the positive interactive features of each dimension based on the first weight of the positive interactive features of each dimension, and determine a positive quality parameter negatively correlated with the first weighted summation result;

based on the second weight of the negative interactive features of each dimension, performing weighted summation processing on the negative interactive features of each dimension, and determining a negative quality parameter positively correlated with a second weighted summation result;

summing the positive quality parameter and the negative quality parameter to obtain a second quality parameter of the information;

the value of the positive interactive feature is in a negative correlation with the positive quality parameter, the value of the negative interactive feature is in a positive correlation with the negative quality parameter, and the value of the second quality parameter is in a negative correlation with the quality of the information.
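The computation of the second quality parameter can be sketched as follows. The text only requires that the positive quality parameter be negatively correlated with the first weighted sum; taking its negation is one simple choice and is an assumption, as are the function names and the feature names used in the usage note.

```python
def weighted_sum(features, weights):
    """Weighted sum over the dimensions of a name -> value feature dict."""
    return sum(weights[name] * value for name, value in features.items())


def second_quality_parameter(positive, negative, first_weights, second_weights):
    """Larger result means lower quality (the convention used in this description)."""
    # Assumption: negation realizes the required negative correlation.
    positive_param = -weighted_sum(positive, first_weights)
    # The negative quality parameter is the second weighted sum itself.
    negative_param = weighted_sum(negative, second_weights)
    return positive_param + negative_param
```

For instance, with hypothetical features `{"likes": 2, "shares": 1}` weighted `{"likes": 0.5, "shares": 1.0}` and `{"reports": 3}` weighted `{"reports": 1.0}`, the positive quality parameter is -2.0, the negative quality parameter is 3.0, and the second quality parameter is 1.0.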

In the above scheme, the apparatus further comprises:

a feature processing module for obtaining at least one of the following continuous positive interactive features characterizing preference for the information: the numbers of likes, clicks, comments, and shares;

discretizing the continuous positive interactive features to obtain corresponding discrete positive features;

and taking the discrete positive features corresponding to at least one continuous positive interactive feature as the positive interactive features.

In the above scheme, the apparatus further comprises: the weight determination module is used for determining the corresponding first weight and the second weight according to the influence degree of the positive interactive features of each dimension and the negative interactive features of each dimension on the quality identification result of the information; or, determining a quality class to which the information belongs, and determining the first weight and the second weight which are adapted to the quality class according to the quality class; or determining the information category to which the information belongs, and determining the first weight and the second weight which are matched with the information category according to the information category.

In the above scheme, the third determining module is further configured to multiply the first quality parameter and the second quality parameter, and use an obtained third quality parameter as a quality identification result of the information; or, the first quality parameter and the second quality parameter are subjected to weighted summation processing, and an obtained third quality parameter is used as a quality identification result of the information.
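The two fusion options named above, multiplication and weighted summation, can be sketched as follows. The function names and the default weights of 0.5 are hypothetical; the application does not state how the fusion weights are set.

```python
def fuse_by_product(first_quality, second_quality):
    """Third quality parameter as the product of the first and second."""
    return first_quality * second_quality


def fuse_by_weighted_sum(first_quality, second_quality, w_first=0.5, w_second=0.5):
    """Third quality parameter as a weighted sum of the first and second."""
    return w_first * first_quality + w_second * second_quality
```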

An embodiment of the present application provides an electronic device, including:

a memory for storing executable instructions;

and a processor for implementing the method for identifying the quality of information provided by the embodiment of the application when executing the executable instructions stored in the memory.

The embodiment of the application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for identifying the quality of information provided by the embodiment of the application.

Embodiments of the present application provide a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the method for identifying quality of information provided by embodiments of the present application is implemented.

The embodiment of the application has the following beneficial effects:

before online recommendation is carried out on information, a first quality parameter of the information is determined according to characteristics of multiple dimensionalities of the information, a second quality parameter of the information is determined according to interactive characteristics related to a recommendation process of the information during online recommendation of the information, and a quality identification result of the information is determined by combining the first quality parameter and the second quality parameter; quality parameters of information before online recommendation and in the online recommendation process are comprehensively considered from multiple dimensions, accuracy of a finally obtained quality identification result can be improved, accurate reference data are provided for a recommendation system, and recommendation precision and user experience are improved.

Drawings

Fig. 1 is a schematic architecture diagram of an information recommendation system 10 provided in an embodiment of the present application;

Fig. 2 is a schematic structural diagram of an electronic device 500 for quality identification of information according to an embodiment of the present application;

Fig. 3A is a schematic flowchart of a method for identifying quality of information according to an embodiment of the present application;

Fig. 3B is a schematic diagram of a method for obtaining content understanding features according to an embodiment of the present application;

Fig. 3C is a schematic diagram of a method for determining a first quality parameter according to an embodiment of the present application;

Fig. 3D is a schematic diagram of a method for determining a quality identification result according to an embodiment of the present application;

Fig. 3E is a schematic diagram of a method for determining a quality identification result according to an embodiment of the present application;

Fig. 3F is a schematic diagram of a method for determining a second quality parameter according to an embodiment of the present application;

Fig. 4 is a schematic diagram of quality parameter prediction provided by an embodiment of the present application;

Fig. 5 is a schematic diagram of information shielding recommendation processing provided in an embodiment of the present application;

Fig. 6 is a schematic diagram of BERT pre-training provided in an embodiment of the present application;

Fig. 7 is a schematic diagram of training a text classification model according to an embodiment of the present application;

Fig. 8 is a schematic diagram of obtaining a prior quality score according to an embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

In the following description, the terms "first", "second", and "third" are used only to distinguish similar objects and do not denote a particular order. It is understood that, where permitted, the specific order or sequence of "first", "second", and "third" may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to them.

1) Terminal program: any of various applications that run on a terminal and can receive messages and information streams, such as an instant messaging application or a news browsing application.

2) Information flow product: a product form of a terminal program, on which various video, audio, text, and image-text information can be obtained.

3) Machine Learning (ML): the core of artificial intelligence and one of its branches. It gives a computer a human-like ability to learn, so that it can simulate and implement human learning behavior and acquire human-like recognition and judgment; in this sense it can be regarded as a form of bionics. Machine learning rests on data, algorithms (models), and computing power. It is an interdisciplinary field drawing on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects, and it specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance.

4) Natural Language Processing (NLP): an important direction in computer science and artificial intelligence that studies theories and methods for effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics.

5) Gradient Boosting Decision Tree (GBDT): an iterative decision tree algorithm composed of a plurality of decision trees, in which the conclusions of all trees are accumulated to produce the final answer. XGBoost is one kind of gradient boosting tree model: it applies a second-order Taylor expansion to the loss function, optimizes the loss using its second-order derivative information, and greedily decides whether to split a node according to whether the split reduces the loss. To prevent overfitting, XGBoost also adds regularization, a learning rate, column sampling, and approximate optimal split points.

6) Logistic Regression (LR) model: a log-odds model used to predict a binary classification result from input features.

7) Quality parameter: a statistical indicator reflecting the quality of information that is to be recommended or has been recommended, which can be represented by the quality score of the information or the conversion rate of the information. The quality parameter may be positively or negatively correlated with the quality of the information.

For example, the larger the value of the quality parameter, the lower the quality of the information, i.e., the two are negatively correlated. In this case, the quality parameter is obtained by a quality parameter model, and the quality parameter model is trained in the model training stage on information samples labeled with quality parameters that are negatively correlated with information quality.

For another example, the larger the value of the quality parameter, the higher the quality of the information, i.e., the two are positively correlated. In this case, the quality parameter model is trained on information samples labeled with quality parameters that are positively correlated with information quality.

For convenience of description, unless otherwise specified, the following description will take the case where the value of the quality parameter is inversely related to the quality of the information as an example.

The embodiment of the application provides a method, a device, equipment, a storage medium and a computer program product for identifying the quality of information, which can accurately identify the quality of the information so as to improve the recommendation precision and user experience of a recommendation system.

The quality identification method of the information provided by the embodiment of the application can be implemented by various electronic devices, for example, the quality identification method can be implemented by a terminal alone, a server alone, or a terminal and a server cooperatively. For example, the terminal alone performs a quality recognition method of information described below, or the terminal transmits a recognition request to the server, and the server performs the quality recognition method of information based on the received recognition request.

The electronic device for quality identification of information provided by the embodiment of the application can be various types of terminal devices or servers, wherein the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service; the terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, etc., but is not limited thereto. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.

Taking a server as an example, a server cluster may be deployed in the cloud to open an artificial intelligence cloud service (AI as a Service, AIaaS) to users. The AIaaS platform splits several types of common AI services and provides them in the cloud independently or as packages. This service mode is similar to an AI-themed mall: all users can access one or more of the artificial intelligence services provided by the AIaaS platform through an application programming interface.

For example, one of the artificial intelligence cloud services may be an information quality identification service, that is, the quality identification program provided by the embodiment of the present application is encapsulated in a cloud server. A user calls the quality identification service through a terminal (running a client, such as an instant messaging client, a live broadcast client, a short video client, or a social client), so that the server deployed in the cloud calls the encapsulated quality identification program, determines a quality identification result of the information, and performs a corresponding operation on the information based on that result: for example, filtering the identified low-quality information in the recall stage, or down-weighting or filtering the identified low-quality information in the ranking stage.

In some embodiments, the quality identification method of the information provided by the embodiments of the present application is implemented by a server alone. In a first stage before online recommendation of information, a server acquires characteristics of at least two dimensions of the information, performs characteristic combination processing on the characteristics of the at least two dimensions to obtain combined characteristics of the information, and determines a first quality parameter of the information based on the combined characteristics; in a second stage after the information is recommended online, the server acquires interactive characteristics related to the information recommendation process, determines a second quality parameter of the information based on the interactive characteristics, and determines a quality identification result of the information by combining the first quality parameter and the second quality parameter.

In some embodiments, a quality identification method for information provided by the embodiments of the present application is implemented by a server and a terminal in a cooperative manner. Referring to fig. 1, fig. 1 is a schematic structural diagram of an information recommendation system 10 provided in an embodiment of the present application. The terminal 400 is connected to the server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.

The terminal 400 (running a client, such as an instant messaging client, a live client, a short video client, a social client, etc.) may be used to obtain an information recommendation request for a user, for example, when the user opens a news client running on the terminal, the terminal automatically obtains a news recommendation request for the user.

In some embodiments, after obtaining the information recommendation request, the terminal invokes an information recommendation interface (which may be provided in the form of a cloud service, that is, an information recommendation service). Based on the information recommendation request, and according to the user's user data (attribute data such as age, gender, occupation, education level, and consumption level, or behavior data such as browsing, clicking, collecting, and purchasing), item data (such as tags, categories, or related interaction data of the information), and contextual information (such as the recommendation scenario), candidate information matching these characteristics is recalled from the library of information to be recommended. Quality identification is performed on the candidate information to determine, according to the quality identification result, whether the candidate information is low-quality information, and a corresponding shielding mode is applied to candidate information that is. For example, in the recall stage of the recommendation system, low-quality information is temporarily or permanently filtered and the remaining candidate information is ranked; in the ranking stage, low-quality information is down-weighted. High-quality information is thus recommended to the terminal for display, the wide spread of low-quality information is avoided, the overall information quality is indirectly improved, the user experience is improved, and first-time and returning users are effectively retained.

In some embodiments, the information quality identification method provided by the embodiment of the application can also be applied to information recommendation scenes related to internet of vehicles services (such as refueling, navigation, parking, maintenance and the like), for example, when information recommendation is performed on a vehicle-mounted terminal, the quality identification method provided by the embodiment of the application is used for performing quality identification on a plurality of candidate information to be recommended so as to determine whether the candidate information belongs to low-quality information according to a quality identification result, and a corresponding shielding mode is applied to the candidate information belonging to the low-quality information, so that the candidate information not belonging to the low-quality information is recommended to the vehicle-mounted terminal, thereby avoiding the wide spread of low-quality information, indirectly improving the overall information quality, and improving the user experience.

The structure of the electronic device for quality identification of information provided in the embodiment of the present application is described below, referring to fig. 2, fig. 2 is a schematic structural diagram of the electronic device 500 for quality identification of information provided in the embodiment of the present application, and taking the electronic device 500 as a server as an example, the electronic device 500 for quality identification of information shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.

The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in embodiments herein is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.

In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.

An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), etc.;

a presentation module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;

an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.

In some embodiments, the quality identification apparatus for information provided in the embodiments of the present application may be implemented in a software manner, for example, the quality identification apparatus may be the quality identification service or the information recommendation service in the server described above, or may be the quality identification plug-in or the information recommendation plug-in the terminal described above. Of course, without limitation, the quality identification device of the information provided by the embodiments of the present application may be provided as various software embodiments, including various forms of applications, software modules, scripts or code.

In some embodiments, the quality identification device for information provided by the embodiments of the present application may be implemented in software. Fig. 2 shows the quality identification device 555 for information stored in the memory 550, which may be software in the form of programs, plug-ins, and the like, and which includes the following software modules: a first determination module 5551, a second determination module 5552, and a third determination module 5553. These modules are logical and thus may be arbitrarily combined or further split depending on the functions implemented. The function of each module will be explained below.

In other embodiments, the quality recognition Device of the information provided by the embodiments of the present Application may be implemented in hardware, and for example, the quality recognition Device of the information provided by the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the quality recognition method of the information provided by the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

The following describes the quality identification method for information provided by an embodiment of the present application with reference to the accompanying drawings. The execution subject of the method described below may be a server, and specifically, the method may be implemented by the server running the various computer programs described above; of course, as will be understood from the following description, the method may also be implemented by a terminal and a server in cooperation.

Referring to fig. 3A, fig. 3A is a schematic flowchart of a quality identification method of information provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 3A.

In step 101, the server obtains features of at least two dimensions of the information in a first stage, performs feature combination processing on the features of the at least two dimensions to obtain a combined feature of the information, and determines a first quality parameter of the information based on the combined feature.

The first stage is the period before the information is recommended online, that is, the offline stage in which the information has not yet entered the recommendation process.

In some embodiments, the features of the at least two dimensions of the information in step 101 may be obtained by performing at least two of the following operations in the first stage: acquiring content structure features of the information, wherein the content structure features characterize the quality of the content structure of the information; acquiring account features of the information, wherein the account features include the level of the account that published the information; and acquiring content understanding features of the information, wherein the content understanding features characterize at least one quality category to which the information belongs, each quality category being a category whose recommendation is shielded in the second stage.

In practical applications, the content structure of the information characterizes the content style of the information. For example, when the information is text, the content structure may include at least one of the following: title length, text length, number of images, and image-text ratio; when the information is a video, the content structure may include at least one of the following: title length, video duration, and video-related description information. Features that characterize the quality of the content structure are obtained by performing feature extraction on the content structure of the information. The account features of the information can be characterized by the level of the account that published the information. The level of an account can be assigned according to the starting rate of the account, the activity of the account, the number or frequency of information items published by the account, and the like. Generally, the level of an account is positively correlated with the recommended amount of the information it publishes; for example, the recommended amount of information published by an authoritative account is greater than that of information published by an ordinary account. The content understanding features of the information indicate whether the information belongs to a quality category whose recommendation is shielded in the second stage; for example, when the information belongs to at least one of the quality categories of title parties (clickbait), non-nutritive texts, advertisement texts, old news, and the like, recommendation of the information is shielded.

In some embodiments, the server may obtain the content structure features of the information as follows: acquiring at least one of the following continuous features of the information: title length, number of images, image-text ratio, and text length; discretizing the continuous features to obtain corresponding discrete features; and taking the discrete feature corresponding to the at least one continuous feature as the content structure feature of the information.

In practical applications, when a plurality of continuous features have different scales, the continuous features need to be discretized in order to balance the influence of each continuous feature on the quality identification result of the information, and the resulting discrete features are used as the content structure features of the information. In practical implementation, different continuous features may be discretized in different ways. For example, according to statistics, the title length of the information generally ranges from 0 to 50 words and the number of images from 0 to 50; because these values are small, the title length and the number of images can be used directly as content structure features of the information. When the range of the title length is large (e.g., greater than 100) or the number of images is large (e.g., greater than 100), the title length or the number of images needs to be scaled down, where the scaling factor may be set as appropriate, e.g., to 5 or 10. Similarly, when the text length ranges from 100 to 10000 words, the text length may be scaled down, e.g., with a scaling factor of 100. The image-text ratio represents the ratio of the text length to the number of images, and can be binned by using the following formula (1):

（1）

where the quantities appearing in formula (1) are, in order, the image-text ratio, the text length, and the number of images.
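The discretization described above can be illustrated with a short sketch. The scaling factors follow the examples given in the text (1 for small ranges, 100 for text length); the bin edges used for the image-text ratio are purely hypothetical, since formula (1) is not reproduced here, and all function names are assumptions introduced for illustration.

```python
def scale_bucket(value, factor=1):
    """Discretize a count by integer-dividing it by a scaling factor."""
    return int(value // factor)


def ratio_bucket(text_length, image_count, edges=(50, 200, 1000)):
    """Bin the image-text ratio (text length / number of images).
    The bin edges are hypothetical and are NOT formula (1) of the application."""
    if image_count == 0:
        return len(edges) + 1  # dedicated bucket for text without images
    ratio = text_length / image_count
    return sum(ratio >= edge for edge in edges)


def content_structure_features(title_length, image_count, text_length):
    """Return the discrete content structure features of one piece of information."""
    return {
        "title_length": scale_bucket(title_length),        # small range, used directly
        "image_count": scale_bucket(image_count),          # small range, used directly
        "text_length": scale_bucket(text_length, 100),     # scaled down by 100
        "image_text_ratio": ratio_bucket(text_length, image_count),
    }
```

For instance, a text of 1250 words with 4 images has a ratio of 312.5, which falls into bucket 2 under the hypothetical edges above.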

In some embodiments, referring to fig. 3B, fig. 3B is a schematic diagram of an obtaining method of a content understanding feature provided in an embodiment of the present application, where the content understanding feature of the obtained information may be implemented through steps 201 to 202 shown in fig. 3B:

in step 201, at least two words in the information are encoded to obtain a vector representation of each word, and iterative encoding processing is performed based on the position of each word and the vector representation to obtain an encoding characteristic of the information.

In some embodiments, encoding at least two words in the information in step 201 to obtain a vector representation of each word may be implemented as follows: performing word segmentation on the information to obtain at least two words, and performing vector conversion on the at least two words to obtain a vector representation corresponding to each word. Correspondingly, the coding feature corresponding to each word, on which the iterative coding processing of step 201 is based, may be obtained as follows: performing position embedding processing on the vector representation corresponding to each word according to the position of the word in the information, to obtain a position code corresponding to each word; and adding the vector representation corresponding to each word to its position code to determine the coding feature corresponding to each word.

The subsequent coding processing is carried out by taking the words obtained by segmenting the information as dimensions, so that the semantic understanding capacity of the information can be improved, and the vector representations of the words appearing at different positions in the information are distinguished by adding corresponding position codes in consideration of the difference of semantic information carried by the words appearing at different positions in the information, so that the real meaning of the information can be better expressed.

In some embodiments, the dimensions of the position encoding of a word are the same as the dimensions of the vector representation of the word; the position embedding processing is performed on each vector representation according to the position of each word in the information, and the position code corresponding to each word is obtained by the following method: when the serial number of the dimensionality in the position code is an even number, determining the code value of the corresponding dimensionality in the position code according to a sine function, wherein the sine function takes the sequencing position of the words in the information and the position code dimensionality as parameters; and when the serial number of the dimensionality in the position code is an odd number, determining a code value corresponding to the dimensionality in the position code according to a cosine function, wherein the cosine function takes the sequencing position of the words in the information and the position code dimensionality as parameters.

As an example, when the serial number of a dimension in the position code is an even number, the code value corresponding to the dimension in the position code is determined according to the following sine function (2):

（2）

when the serial number of the dimensionality in the position code is an odd number, determining the code value of the corresponding dimensionality in the position code according to the following cosine function (3):

（3）

where PE(i) is the coded value of the ith dimension in the position code, pos is the ordinal position of the word in the information, i is the serial number of a dimension in the position code (an integer not less than 0), and d_model is the dimension of the position code.

By the mode, the coding value of the corresponding dimension in the position code is determined by adopting a coding mode of a trigonometric function such as a sine function or a cosine function, so that not only can the absolute position information of the information be expressed, but also the relative position relation of the information can be expressed. In addition, the above is only one embodiment of the position coding, and in practical applications, the position coding is not limited to the coding using the trigonometric function, and the embodiment of the position coding is not limited in the present application.
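As an illustration of sine/cosine position coding, the following sketch assumes the widely used sinusoidal form in which the dimension pair (2k, 2k+1) shares the angle pos / base^(2k / d_model) with base = 10000. The base constant and the exact form of the exponent are assumptions taken from that common formulation; they are not stated in the text above, and the function names are hypothetical.

```python
import math


def position_encoding(pos, d_model, base=10000.0):
    """Position code of one word: sine on even dimensions, cosine on odd ones.
    The base of 10000 is an assumption borrowed from the common sinusoidal form."""
    code = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the same frequency.
        angle = pos / (base ** ((i - i % 2) / d_model))
        code.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return code


def add_position(word_vectors):
    """Add each word's position code to its vector representation (same dimension)."""
    d_model = len(word_vectors[0])
    return [
        [v + p for v, p in zip(vector, position_encoding(pos, d_model))]
        for pos, vector in enumerate(word_vectors)
    ]
```

Because the position code has the same dimension as the word vector, the two can be added element by element, as the text describes.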

After the coding feature of each word is obtained, iterative coding processing is performed on the coding features of the words to obtain the coding feature of the information. In some embodiments, this iterative coding processing can be implemented as follows: encoding the input of the nth neural network model through the nth neural network model among N cascaded neural network models, and passing the nth coding processing result output by the nth neural network model to the (n+1)th neural network model for further encoding; and taking the Nth coding processing result output by the Nth neural network model as the coding feature of the information.

Here, n is an integer increasing from 1 whose value range satisfies 1 ≤ n ≤ N−1, and N is an integer greater than or equal to 2. When n is 1, the input of the nth neural network model is the coding feature of each word; when 2 ≤ n ≤ N−1, the input of the nth neural network model is the coding processing result of the (n−1)th neural network model. The number N of cascaded neural network models may be set, for example, to 3: the output of each neural network model is the input of the next, the input of the first neural network model is the coding feature of each word, and the output of the last neural network model is the result of encoding the coding features of the words.

In some embodiments, each neural network model includes an attention layer, a first normalization layer, a forward transmission layer, and a second normalization layer. Encoding the input of the nth neural network model through the nth neural network model among the N cascaded neural network models can be implemented as follows: performing attention processing on the input of the nth neural network model through the attention layer to obtain the attention feature corresponding to that input; performing residual connection processing and normalization processing on the attention feature and the input of the nth neural network model through the first normalization layer to obtain the normalized feature corresponding to that input; performing linear rectification processing on the normalized feature through the forward transmission layer to obtain the linear rectification processing result corresponding to that input; and performing residual connection processing and normalization processing on the normalized feature and the linear rectification processing result through the second normalization layer to obtain the nth coding processing result output by the nth neural network model.

As an example, as shown in fig. 6, a cascaded neural network model may be obtained by cascading N neural network models, each neural network model includes an attention layer, a first normalization layer, a forward transmission layer, and a second normalization layer, then, taking a processing flow of the first neural network model as an example, taking a coding feature corresponding to each word as an input of the first neural network model, performing self-attention processing on the coding feature corresponding to each word through a self-attention mechanism of the attention layer to obtain an attention feature corresponding to each word, and learning a dependency relationship between words in the coding features corresponding to each word through the self-attention mechanism, thereby mining an important feature in information to be used for subsequent quality recognition processing and implementing an accurate recognition function.

The first normalization layer and the second normalization layer are both used for residual connection and normalization processing. For example, the first normalization layer transposes the attention feature of each word to obtain the transposed feature corresponding to each word; adds the transposed feature corresponding to each word to the coding feature corresponding to that word to obtain an addition result for each word; and normalizes the addition result for each word to obtain the normalized feature corresponding to each word. A deeper network helps the model extract richer, more abstract features that carry more semantic information, but depth cannot be increased simply by stacking more layers: doing so can cause gradients to vanish or explode and, more seriously, can cause model degradation. The residual connection addresses the degradation problem by preserving the original input of the previous layer as far as possible. The normalization processing normalizes the features of each word input to the current layer, with the number of neurons in the current layer as the normalization factor, and it can speed up the convergence of the model.

The forward transmission layer comprises a two-layer Deep Neural Network (DNN) structure and an activation function layer. The activation function layer performs linear rectification processing on the input of the layer, which can be implemented through an activation function (such as ReLU). Stacking a plurality of forward transmission layers can increase the precision of the representation of each word. The linear rectification result obtained through the forward transmission layer is input into the second normalization layer, which performs residual connection processing and normalization processing on the linear rectification processing result corresponding to each word to obtain the first coding processing result output by the first neural network model, namely the first coding feature of the information. The first coding feature of the information is input into the subsequent cascaded neural network models until the Nth coding processing result output by the last cascaded neural network model is obtained as the coding feature of the information, which is subsequently used for quality category identification of the information.
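The structure of one neural network model (attention layer, first normalization layer, forward transmission layer, second normalization layer) and the cascading of N such models can be sketched in plain Python. This is a simplified illustration under stated assumptions: the self-attention uses the input directly as queries, keys, and values (no learned projections), the layers have no bias terms, the normalization has no learned gain, and all function names and weight matrices are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(row, eps=1e-6):
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def self_attention(x):
    """Scaled dot-product self-attention without learned projections (simplification)."""
    d = len(x[0])
    scores = matmul(x, [list(col) for col in zip(*x)])  # x times x transposed
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)


def add_and_norm(x, sublayer_out):
    """Residual connection followed by normalization, applied per word."""
    return [layer_norm([a + b for a, b in zip(rx, rs)]) for rx, rs in zip(x, sublayer_out)]


def feed_forward(x, w1, w2):
    """Two linear layers with a ReLU (linear rectification) in between."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def encoder_block(x, w1, w2):
    normed = add_and_norm(x, self_attention(x))                # attention + first normalization
    return add_and_norm(normed, feed_forward(normed, w1, w2))  # forward layer + second normalization


def encode(x, blocks):
    """Cascade: the output of each block is the input of the next."""
    for w1, w2 in blocks:
        x = encoder_block(x, w1, w2)
    return x
```

Here `x` holds one row per word (its coding feature), and `blocks` is a list of N weight pairs, so `encode(x, blocks)` returns the Nth coding processing result.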

In step 202, the encoding characteristic of the information is mapped to obtain a mapping characteristic of the information, and the mapping characteristic is biased to obtain a content understanding characteristic of the information.

After the coding feature of the information is obtained, linear or nonlinear mapping processing can be performed on it through multiple fully connected layers to obtain the corresponding mapping feature. For example, the coding feature of the information is passed through an input layer to a hidden layer by full connection processing, the corresponding hidden layer feature is obtained through the hidden layer, and feature mapping is performed on the hidden layer feature to obtain the mapping feature of the information. Then, quality category prediction is performed on the obtained mapping feature through an activation function (such as ReLU) to obtain the content understanding feature characterizing the quality category to which the information belongs.

In some embodiments, after the coding feature of the information is obtained, quality category prediction can be performed on it through a trained text classification model to obtain a prediction result representing the quality category to which the information belongs, and the prediction result is used as the content understanding feature of the information. The quality categories are low-quality categories of information whose recommendation needs to be shielded in the second stage, such as title parties (clickbait), non-nutritive texts, advertisement texts, old news, and the like. When training the text classification models, a separate model can be trained for each quality category, that is, the text classification model of a given quality category is trained using training samples carrying the corresponding quality category label. For example, when training the text classification model for the title party category, each training sample carries a category label indicating whether it is a title party (1 if it is, 0 if it is not). A corresponding loss function, for example a binary cross-entropy loss function, can be constructed based on the predicted quality category and the category label carried by the training sample, and the model parameters of the text classification model are updated by minimizing the loss function.
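The binary cross-entropy loss mentioned above can be written out as a short sketch. The function name and the clipping constant `eps` are assumptions introduced for illustration; `probs` stands for the predicted probabilities that each sample is a title party, and `labels` for the 0/1 category labels.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)
```

The loss is small when the predicted probability agrees with the label and grows as the two diverge, which is what minimization during training exploits.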

After the features of at least two dimensions of the information are obtained, the features of at least two dimensions can be subjected to feature combination processing in the following mode to obtain the combined features of the information: performing feature splicing processing on the features of at least two dimensions to obtain combined features of information; or, carrying out feature weighted summation processing on the features of at least two dimensions to obtain the combined features of the information. In practical implementation, when performing weighted summation, the weighting parameters corresponding to the features of each of the at least two dimensions may be determined first, and the weighted summation is performed on the features of the at least two dimensions based on the weighting parameters of each dimension, so as to obtain the corresponding combined features.
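The two combination modes (feature splicing and weighted summation) can be sketched as follows. The function names are hypothetical, and the weighted summation assumes that the feature vectors of all dimensions have the same length, which the text does not state explicitly.

```python
def concat_features(*features):
    """Feature splicing: concatenate the feature vectors of all dimensions."""
    combined = []
    for feature in features:
        combined.extend(feature)
    return combined


def weighted_sum_features(features, weights):
    """Weighted summation: element-wise sum of equally long feature vectors,
    each multiplied by the weighting parameter of its dimension."""
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("weighted summation needs feature vectors of equal length")
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(length)]
```

Splicing preserves every individual feature value at the cost of a longer vector, whereas weighted summation keeps the vector length fixed but mixes the dimensions together.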

In some embodiments, referring to fig. 3C, fig. 3C is a schematic diagram of a method for determining a first quality parameter provided in an embodiment of the present application, and the determining of the first quality parameter based on the combined feature in step 101 may be implemented by steps 1011 to 1013 shown in fig. 3C: in step 1011, feature splitting is performed on the first partial feature in the combined features, so as to obtain split features of the information; in step 1012, performing feature combination processing on the split features and the second partial features in the combined features of the information to obtain logistic regression features of the information, wherein the discrimination of the first partial features is greater than that of the second partial features; in step 1013, a quality parameter prediction process is performed on the information based on the logistic regression feature to obtain a first quality parameter of the information.

For example, if 80% of the information with a title length of more than 50 is low-quality content, while 30% of the information with more than 20 images is low-quality content, then the quality of the information is more easily determined from the title length than from the number of images, so the degree of distinction of the title length can be considered greater than that of the number of images. When feature splitting is performed, the combined features are input into a gradient boosting tree model (such as GBDT, XGBoost, and the like). Referring to fig. 4, fig. 4 is a schematic diagram of quality parameter prediction provided in the embodiment of the present application. The gradient boosting tree model first performs feature splitting on the first partial features, which have a relatively high degree of distinction among the combined features, to obtain the corresponding split features. The split features and the second partial features, which have a relatively low degree of distinction among the combined features, are then input into a logistic regression model, which performs feature fusion on them to obtain the logistic regression features, and performs quality parameter prediction processing on the logistic regression features through an activation function (such as ReLU) to obtain the first quality parameter characterizing how high or low the quality of the information is.

The split feature is composed of elements with a value of 0 or 1, and each element corresponds to a leaf node of a tree in the gradient boosting tree model. When a feature passes through a tree and finally falls on one of its leaf nodes, the element corresponding to that leaf node in the split feature vector is 1, and the elements corresponding to the other leaf nodes of the tree are 0. The length of the split feature equals the total number of leaf nodes of all the trees in the gradient boosting tree model. Assuming that the gradient boosting tree model includes two trees, where the first tree has 3 leaf nodes and the second tree has 2 leaf nodes, and that a feature in the first partial features is input into the model and finally falls on the second leaf node of the first tree and on the first leaf node of the second tree, the split feature vector obtained by the gradient boosting tree model is [0, 1, 0, 1, 0], where the first three elements correspond to the 3 leaf nodes of the first tree and the last two elements correspond to the 2 leaf nodes of the second tree.
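The construction of the split feature vector from leaf positions can be reproduced with a few lines of code. This sketch only covers the one-hot encoding step, not the training of the trees; the function name and the zero-based leaf indices are assumptions introduced for illustration.

```python
def split_feature(leaf_counts, leaf_indices):
    """Build the 0/1 split feature vector.

    leaf_counts[t]  : number of leaf nodes of tree t
    leaf_indices[t] : zero-based index of the leaf that the sample falls on in tree t
    """
    vector = []
    for n_leaves, index in zip(leaf_counts, leaf_indices):
        one_hot = [0] * n_leaves
        one_hot[index] = 1
        vector.extend(one_hot)
    return vector


# Two trees with 3 and 2 leaves; the sample falls on the second leaf of the
# first tree and on the first leaf of the second tree.
example = split_feature([3, 2], [1, 0])  # [0, 1, 0, 1, 0]
```

The resulting vector has exactly one element equal to 1 per tree, and its length is the total number of leaf nodes across all trees.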

Through the mode, after the characteristics of part of the characteristics in the combined characteristics are split, richer split characteristics can be learned, and the first quality parameters obtained by predicting the quality parameters based on the split characteristics can be more accurate.

In some embodiments, the first quality parameter may be obtained by predicting on the combined feature of the information using only one of the gradient boosting tree model and the logistic regression model. For example, the combined feature of the information is input into the logistic regression model, which projects the combined feature to obtain the corresponding projection feature; for instance, linear logistic regression is performed on the combined feature, where the linear logistic regression may be a linear summation, or a linear summation whose result is substituted into the logistic regression function, to obtain the logistic regression feature. Quality parameter prediction processing is then performed on the logistic regression feature through an activation function to obtain the first quality parameter characterizing how high or low the quality of the information is.
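The "linear summation substituted into the logistic regression function" can be sketched as below. The sigmoid is used here as the logistic function, which is an assumption about the concrete function intended; the weights, the bias, and the function name are hypothetical.

```python
import math


def logistic_score(features, weights, bias=0.0):
    """Linear summation of the features followed by the logistic (sigmoid) function.
    Returns a value in (0, 1) that can serve as a quality parameter."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

When the split feature from the gradient boosting tree model is used, `features` would be the split feature vector concatenated with the second partial features.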

In some embodiments, the information may be subjected to first quality parameter prediction based on a Factorization Machine (FM) model, for example, the FM model performs feature intersection processing on two features of the multiple dimensions of the information to obtain intersection features, and performs quality prediction on the information based on the intersection features to obtain the first quality parameter.
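The pairwise feature intersection performed by a factorization machine can be sketched in its standard second-order form, in which each feature has a latent vector and the weight of a feature pair is the dot product of the two latent vectors. The parameter names (`w0`, `w`, `v`) follow that standard form and are assumptions, not notation from the application.

```python
def fm_score(x, w0, w, v):
    """Second-order factorization machine score.

    x  : feature values
    w0 : global bias
    w  : one linear weight per feature
    v  : one latent vector per feature; the interaction weight of features
         i and j is the dot product of v[i] and v[j]
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            interaction = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += interaction * x[i] * x[j]
    return linear + pairwise
```

The returned score could then be passed through a function such as the sigmoid to obtain a bounded quality parameter.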

In some embodiments, an end-to-end neural network model may also be used to perform a first quality prediction on the information, for example, the information to be identified is input into the trained neural network model, features of at least two dimensions of the information are extracted and obtained through the neural network model, the features of the at least two dimensions are subjected to feature combination processing to obtain a combined feature of the information, and the quality of the information is predicted based on the combined feature to obtain a first quality parameter.

It can be understood that, in actual implementation, other prediction methods may be used to predict the quality of the information, and the embodiment of the present application does not limit the prediction method of the quality parameter.

In step 102, interactive features related to the information recommendation process are acquired in the second stage, a second quality parameter of the information is determined based on the interactive features, and a quality identification result of the information is determined by combining the first quality parameter and the second quality parameter.

The second stage is the period during which the information is recommended online, namely the online stage of the information.

In some embodiments, referring to fig. 3D, fig. 3D is a schematic diagram of a method for determining a quality identification result provided in an embodiment of the present application. In step 102, acquiring the interactive features related to the recommendation process of the information in the second stage and determining the second quality parameter of the information based on the interactive features may be implemented through steps 1021 to 1022 shown in fig. 3D: in step 1021, in the second stage, interaction data related to the recommendation process of the information is acquired periodically, and corresponding interactive features are extracted from the interaction data, where the periods are divided based on the sampling duration or the collected quantity of the interaction data; in step 1022, the second quality parameter of the information in each period is determined based on the interactive features of that period.

In step 102, determining the quality identification result of the information by combining the first quality parameter and the second quality parameter may be implemented through steps 1023 to 1024 shown in fig. 3D: in step 1023, a periodic quality identification result of the information in each period is determined by combining the first quality parameter with the second quality parameter of that period; in step 1024, the normal distribution satisfied by the quality variation trend of the information is determined according to the periodic quality identification results of the information in the respective periods, and the parameters of the normal distribution, which include a quality parameter mean and a quality parameter variance, are taken as the overall quality identification result of the information.

In practical implementation, in the second stage, the second quality parameter of the information in each period is calculated periodically from interaction data related to the recommendation process of the information, such as the praise amount, click amount, comment amount, share amount, negative feedback amount, and report amount. The second quality parameter of each period is combined with the first quality parameter obtained in the first stage, by multiplication or by weighted summation, to obtain the periodic identification result of the information in that period, so that the quality identification result of the information is continuously updated. For example, when the periodic identification result of the current period indicates that the quality of the information is unqualified, the result is recalculated in the next one or more periods, and the information is determined to be low-quality content when it is found unqualified several times in a row. When whether the information is low-quality content is determined from the normal distribution satisfied by its quality variation trend, the trend is represented by the quality parameter mean and the quality parameter variance among the normal distribution parameters; when the quality parameter mean is higher than a quality parameter threshold and the quality parameter variance is larger than a variance threshold, the information is determined to be low-quality content and is shielded. In this way, the quality of the information can be accurately identified from the interaction data of multiple periods, and misjudgment caused by occasional quality jitter is avoided.
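The two multi-period decision rules described above can be sketched as follows. This is a hedged illustration: the function names, the thresholds, and the use of the population variance as the variance of the fitted normal distribution are assumptions, and the periodic results are taken to be negatively correlated with quality (a larger value means lower quality), as in this embodiment.

```python
from statistics import mean, pvariance

def overall_quality(period_results):
    """Return (mean, variance) of the periodic quality identification results."""
    return mean(period_results), pvariance(period_results)

def low_by_consecutive(period_results, threshold, times):
    """True if the result is unqualified in `times` consecutive periods."""
    run = 0
    for result in period_results:
        # A result at or above the threshold counts as unqualified.
        run = run + 1 if result >= threshold else 0
        if run >= times:
            return True
    return False

def low_by_distribution(period_results, mean_threshold, variance_threshold):
    """Apply the mean and variance thresholds as stated in the description."""
    mu, var = overall_quality(period_results)
    return mu > mean_threshold and var > variance_threshold

# Hypothetical periodic results for one piece of information.
results = [1.0, 2.0, 3.0]
```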

In some embodiments, referring to fig. 3E, fig. 3E is a schematic diagram of a method for determining a quality identification result provided in an embodiment of the present application. In step 102, acquiring the interactive features related to the recommendation process of the information in the second stage and determining the second quality parameter of the information based on the interactive features may be implemented through steps 1025 to 1026 shown in fig. 3E: in step 1025, interaction data related to the recommendation process of the information is collected in a sampling window of the second stage, and corresponding interactive features are extracted from the interaction data, where the types of sampling window include a set-duration sampling window and a set-data-volume sampling window; in step 1026, the second quality parameter in the sampling window is determined based on the interaction data. In step 102, determining the quality identification result of the information by combining the first quality parameter and the second quality parameter may be implemented by step 1027 shown in fig. 3E: in step 1027, the first quality parameter and the second quality parameter in the sampling window are fused, and the resulting third quality parameter is used as the quality identification result.

A set-duration sampling window has a fixed sampling duration, for example 1 hour, while the amount of sampled data is not fixed: it is the number of interaction data items related to the recommendation process of the information collected within that hour. For different information, the amount of sampled data may differ even when the sampling duration is the same; for the same information, the amount of sampled data may differ between different sampling periods of the same duration. A set-data-volume sampling window has a fixed amount of sampled data, while the sampling duration is not fixed; for example, for the same information, the sampling duration required to collect the same amount of interaction data may differ between time periods.
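The difference between the two window types can be illustrated with a small sketch. The event representation (a list of `(timestamp_in_seconds, kind)` tuples sorted by timestamp) and the function names are assumptions made for illustration.

```python
def duration_windows(events, window_seconds):
    """Fixed-duration windows: the number of events per window varies."""
    windows = {}
    for ts, kind in events:
        # Events are grouped by the index of the window their timestamp falls in.
        windows.setdefault(int(ts // window_seconds), []).append((ts, kind))
    return [windows[k] for k in sorted(windows)]

def count_windows(events, window_size):
    """Fixed-volume windows: the time span covered by each window varies."""
    return [events[i:i + window_size] for i in range(0, len(events), window_size)]

# Hypothetical interaction events for one piece of information.
events = [(0, "click"), (10, "praise"), (3700, "click"), (3800, "report"), (3900, "click")]
by_duration = duration_windows(events, 3600)   # 1-hour windows
by_count = count_windows(events, 2)            # 2 events per window
```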

In the second stage, the second quality parameter is calculated from the interaction data collected in the sampling window and is fused with the first quality parameter to obtain the third quality parameter. For example, the first quality parameter and the second quality parameter are multiplied, and the resulting third quality parameter is used as the quality identification result of the information; alternatively, the first quality parameter and the second quality parameter are weighted and summed, and the resulting third quality parameter is used as the quality identification result of the information. In this way, as soon as a single calculation result indicates that the information is unqualified, the information is determined to be low-quality content and is shielded, so that the wide spread of low-quality information can be prevented in time.
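The two fusion options can be sketched as below. The function names, the default weights, and the threshold comparison are assumptions; the quality parameters are taken to be negatively correlated with quality, as in this embodiment.

```python
def fuse(first_param, second_param, mode="multiply", first_weight=0.5, second_weight=0.5):
    """Fuse the first and second quality parameters into a third quality parameter."""
    if mode == "multiply":
        return first_param * second_param
    if mode == "weighted_sum":
        return first_weight * first_param + second_weight * second_param
    raise ValueError(f"unknown fusion mode: {mode}")

def is_low_quality(third_param, threshold):
    """A larger parameter means lower quality, so reaching the threshold is unqualified."""
    return third_param >= threshold

# Hypothetical first (prior) and second (window) quality parameters.
product = fuse(0.8, 2.0)
weighted = fuse(0.8, 2.0, mode="weighted_sum", first_weight=0.25, second_weight=0.75)
```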

In some embodiments, the interactive features include positive interactive features characterizing preference for the information and negative interactive features characterizing non-preference for the information, and the positive interactive features and the negative interactive features each have at least two dimensions. Referring to fig. 3F, fig. 3F is a schematic diagram of a method for determining a second quality parameter according to an embodiment of the present application. Determining the second quality parameter of the information based on the interactive features in step 102 may be implemented by steps 1028 to 1030: in step 1028, the positive interactive features of the respective dimensions are weighted and summed based on the first weight of the positive interactive feature of each dimension, and a positive quality parameter negatively correlated with the first weighted summation result is determined; in step 1029, the negative interactive features of the respective dimensions are weighted and summed based on the second weight of the negative interactive feature of each dimension, and a negative quality parameter positively correlated with the second weighted summation result is determined; in step 1030, the positive quality parameter and the negative quality parameter are summed to obtain the second quality parameter of the information.

As an example, the reciprocal of the first weighted summation result is taken as the positive quality parameter, and the exponential with the natural constant e as the base and the second weighted summation result as the exponent is taken as the negative quality parameter. The value of the positive interactive feature is negatively correlated with the positive quality parameter, the value of the negative interactive feature is positively correlated with the negative quality parameter, and the value of the second quality parameter is negatively correlated with the information quality.
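Steps 1028 to 1030, with the example forms given above (reciprocal and natural exponential), can be sketched as follows. The function names, the feature values, the weights, and the handling of a non-positive weighted sum are assumptions for illustration.

```python
import math

def positive_quality(features, weights):
    """Reciprocal of the first weighted summation result."""
    total = sum(w * x for w, x in zip(weights, features))
    if total <= 0:
        # Assumption: the description does not say how a zero sum is handled.
        raise ValueError("weighted sum of positive features must be positive")
    return 1.0 / total

def negative_quality(features, weights):
    """e raised to the second weighted summation result."""
    return math.exp(sum(w * x for w, x in zip(weights, features)))

def second_quality(pos_features, pos_weights, neg_features, neg_weights):
    """Sum of the positive and negative quality parameters."""
    return positive_quality(pos_features, pos_weights) + negative_quality(neg_features, neg_weights)

# Hypothetical discretized positive features and raw negative features.
param = second_quality([2.0, 2.0], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```

More positive interaction lowers the result and more negative interaction raises it, consistent with a second quality parameter that is negatively correlated with quality.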

When the continuous interactive features of the respective dimensions are on different scales, in order to balance the influence of each interactive feature on the quality identification result of the information, the features obtained by discretizing the interactive features are used as the final interactive features. For example, at least one of the following continuous forward interactive features characterizing preference for the information is obtained: the praise amount, click amount, comment amount, and share amount; the continuous forward interactive features are discretized to obtain corresponding discrete forward features; and the discrete forward feature corresponding to the at least one continuous forward interactive feature is used as the forward interactive feature.

As an example, since the amount of praise and the amount of clicks can reach the order of 100,000, the following formula (4) can be adopted to discretize them:

（4）

where the symbols in formula (4) denote, in order: the amount of praise for the information; the forward interactive feature corresponding to the amount of praise; the amount of clicks on the information; and the forward interactive feature corresponding to the amount of clicks.

The comment amount and the share amount can be discretized by the following formula (5):

（5）

where the symbols in formula (5) denote, in order: the amount of comments on the information; the forward interactive feature corresponding to the amount of comments; the amount of sharing of the information; and the forward interactive feature corresponding to the share amount.

For the negative feedback amount and the report amount, because these two dimensions are negative interactive features that represent non-preference for the information and reflect users' negative emotions, they are used directly as the negative interactive features without discretization.

In some embodiments, the first weight of the positive-going interactive features for each dimension and the second weight of the negative-going interactive features for each dimension may be determined by: determining a corresponding first weight and a second weight according to the influence degree of the positive interactive feature of each dimension and the negative interactive feature of each dimension on the quality identification result of the information; or determining the quality class to which the information belongs, and determining a first weight and a second weight which are matched with the quality class according to the quality class; or determining the information category to which the information belongs, and determining a first weight and a second weight which are matched with the information category according to the information category.

In practical application, no matter the interactive feature is a positive interactive feature or a negative interactive feature, the weight of each dimension has a positive correlation with the influence degree of the dimension on the quality identification result of the information, that is, the influence degree of the dimension on the quality identification result of the information is large, and correspondingly, the weight of the corresponding dimension is large.

Because users' interaction behavior is distributed differently across information of different quality categories, weights adapted to the quality category to which the information belongs are set. For example, users' interaction behavior is distributed differently for quality categories such as title party (clickbait), non-nutritional article, and advertisement article, and its degree of influence on the quality identification result of the information differs accordingly, so different weights are set. For instance, between the two quality categories of title party and advertisement article, the weight of at least one dimension differs among the positive interactive features of the four dimensions (praise amount, click amount, comment amount, and share amount), and the weights of the two dimensions differ among the negative interactive features of the two dimensions (negative feedback amount and report amount).

The distribution of the interaction behavior of the user on the information of different information categories is also different, for example, the distribution of the interaction behavior of the user on different information categories such as entertainment, society, sports and the like is different, so when the weight of the feature of each dimension is set, different weights are set for the information of different information categories.
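One way to organize category-specific weights is a lookup table keyed by quality category, with a fallback to default weights. The sketch below is purely illustrative: every category key, dimension name, and weight value is hypothetical and not taken from the embodiment.

```python
# Hypothetical default weights for the positive and negative dimensions.
DEFAULT_WEIGHTS = {
    "positive": {"praise": 0.25, "click": 0.25, "comment": 0.25, "share": 0.25},
    "negative": {"negative_feedback": 0.5, "report": 0.5},
}

# Hypothetical per-category overrides; values are for illustration only.
WEIGHTS_BY_QUALITY_CATEGORY = {
    "title_party": {
        "positive": {"praise": 0.4, "click": 0.1, "comment": 0.25, "share": 0.25},
        "negative": {"negative_feedback": 0.6, "report": 0.4},
    },
    "advertisement": {
        "positive": {"praise": 0.25, "click": 0.25, "comment": 0.25, "share": 0.25},
        "negative": {"negative_feedback": 0.3, "report": 0.7},
    },
}

def select_weights(quality_category=None):
    """Return the weights adapted to the category, or the defaults if none match."""
    return WEIGHTS_BY_QUALITY_CATEGORY.get(quality_category, DEFAULT_WEIGHTS)
```

The same structure could be keyed by information category (entertainment, society, sports) instead of quality category.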

In some embodiments, after the quality identification result of the information is determined, when the quality identification result indicates that the information is low-quality information that needs to be shielded, a corresponding shielding mode is determined according to the low-quality level of the information and is applied to the information. The shielding mode includes at least one of the following: down-weighting the information in the ranking stage of the recommendation system; temporarily filtering the information out of the recall results of the recommendation system; permanently filtering the information out of the recall results of the recommendation system.

In addition, the information quality identification method provided by the embodiments of the present application can be applied to the recall stage of the recommendation system: when it is determined that the recalled information contains low-quality information to be shielded, the low-quality information is temporarily or permanently filtered out of the recalled information, and subsequent ranking and recommendation are performed on the filtered information. Temporary filtering means that the information is recalled again once its filtering duration reaches a target duration. The target duration is positively correlated with the quality parameter of the information, that is, the larger the quality parameter (the lower the quality of the information), the longer the target duration; for example, if the quality parameter of information 1 is larger than that of information 2 (that is, the quality of information 1 is lower than that of information 2), information 1 may be recalled after being filtered for 2 days and information 2 after being filtered for 1 day. The method can also be applied to the ranking stage of the recommendation system: when it is determined that the ranked information contains low-quality information to be shielded, the low-quality information is down-weighted in the ranking to reduce the number or frequency of its recommendations; for example, information that would be recommended to 100 people in one week without down-weighting may be recommended to only 20 people in one week after down-weighting. The down-weighting amplitude is negatively correlated with the quality of the information, that is, the lower the quality of the information, the larger the down-weighting amplitude and the lower the number or frequency of recommendations of the information within a given time after down-weighting. In this way, low-quality information is either barred from recommendation or recommended with reduced weight, which avoids its wide spread, indirectly improves the overall information quality and the user experience, and helps retain both first-time and returning users.
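The two monotonic relationships above (filter duration grows with the quality parameter, ranking weight shrinks with it) can be sketched with simple functions. The linear and reciprocal forms, the function names, and the `days_per_unit` and `strength` parameters are assumptions; the description only requires the direction of the correlation.

```python
def temporary_filter_days(quality_param, days_per_unit=1.0):
    """Target filter duration, positively correlated with the quality parameter."""
    return quality_param * days_per_unit

def downweighted_rank_score(rank_score, quality_param, strength=1.0):
    """Ranking score after down-weighting; a larger parameter gives a larger reduction."""
    return rank_score / (1.0 + strength * quality_param)

# Hypothetical quality parameters: information 1 is of lower quality than information 2.
days_info_1 = temporary_filter_days(2.0)
days_info_2 = temporary_filter_days(1.0)
reduced = downweighted_rank_score(10.0, 4.0)
```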

It should be noted that, in the embodiments of the present application, the values of the first quality parameter and the second quality parameter are negatively correlated with the quality of the information, that is, the larger the quality parameter, the lower the quality of the information. The value of the third quality parameter obtained by combining the first and second quality parameters is therefore also negatively correlated with the quality of the information, and when the value of the third quality parameter reaches a first parameter threshold, the information may be determined to be low-quality information. In practical applications, however, if the quality parameter is obtained through a quality parameter model that was trained on information samples labeled with quality parameters positively correlated with information quality, the values of the first and second quality parameters are positively correlated with the information quality. In that case, the value of the positive interactive feature is positively correlated with the positive quality parameter, and the value of the negative interactive feature is negatively correlated with the negative quality parameter; for example, the first weighted summation result is used as the positive quality parameter, and the reciprocal of the exponential with the natural constant e as the base and the second weighted summation result as the exponent is used as the negative quality parameter. The value of the resulting third quality parameter is then positively correlated with the information quality, that is, the smaller the quality parameter, the lower the information quality, and when the value of the third quality parameter falls below a second parameter threshold, the information may be determined to be low-quality information.

Next, an exemplary application of the embodiments of the present application in a practical application scenario will be described. Referring to fig. 5, fig. 5 is a schematic diagram of information shielding and recommendation processing provided in an embodiment of the present application. In an information flow recommendation scenario, before information is recommended online, the information flow platform performs a quality audit on the information to be recommended, so as to eliminate poor-quality information and keep it from going online. The audit includes manual audit and machine audit. Manual audit finds low-quality information through audit patrols, but it is limited by the auditors' personal knowledge: for some specialized vertical content, such as entertainment gossip and discussions of anime plots, auditors lack the background knowledge to judge the information effectively, so the accuracy of information quality identification is low.
Machine audit identifies the quality of the information based on a natural language processing model. Because machine audit is limited in precision and recall, its identification accuracy is low (for example, less than 80 percent) and it cannot reach 100 percent recall, so low-quality information cannot be taken off the shelf (that is, shielded and filtered) directly on the basis of machine audit; otherwise, a lot of normal information would be removed by mistake. After the information is recommended online, whether it is low-quality can be judged by monitoring the report amount and negative feedback amount for each piece of information, but this approach shields the corresponding low-quality information from recommendation only once the report amount and negative feedback amount have accumulated to a certain threshold, so low-quality information is identified inefficiently and is not shielded in time.

Therefore, before online recommendation is performed on information, the prior quality score (namely a first quality parameter) of the information is obtained by combining the content structure characteristics (including title length, text length, image quantity, image-text ratio), account number characteristics, content understanding characteristics and other characteristics of multiple dimensions of the information; in the period of online recommendation of the information, the interactive characteristics related to the recommendation process of the information are obtained, the posterior quality score (namely, the second quality parameter) of the information is determined based on the interactive characteristics, and the comprehensive quality score (namely, the third quality parameter) of the information is determined by combining the prior quality score, so that the identification efficiency and accuracy of the information quality can be improved, and the low-quality information is shielded and filtered in time, so that the recommendation precision and the user experience of a recommendation system are improved.

First, the method for acquiring the features of multiple dimensions of the information before the information is recommended online is described. Content structure features of the information, such as the title length, text length, number of images, and image-text ratio, are acquired. When these features are on different scales, in order to balance the influence of each feature on the quality identification result of the information, the features need to be discretized, and the resulting discrete features are used as the content structure features of the information. In practical implementation, different features may be discretized in different ways. For example, in general, the title length of the information ranges from 0 to 50 words and the number of images ranges from 0 to 50; because these values are small, the title length and the number of images can be used directly as content structure features. The text length ranges from 100 to 10000 words, so the text length is scaled by a scaling factor, which may be set to 100 for example, and the scaled text length is used as a content structure feature. The image-text ratio represents the ratio of the text length to the number of images, and may be binned using formula (1). The account feature of the information is the account level of the information publisher; account levels range from 1 to 5 and can be used directly as the account feature.
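The preparation of content structure and account features described above can be sketched as follows. The function name, the dictionary keys, and the handling of items with no images are assumptions; the binning of the image-text ratio by formula (1) is not reproduced here, so the raw ratio is returned instead.

```python
def content_structure_features(title_length, text_length, image_count,
                               account_level, text_scale=100):
    """Build content structure and account features for one piece of information."""
    return {
        # Small-valued features are used directly.
        "title_length": title_length,
        "image_count": image_count,
        "account_level": account_level,
        # The text length is scaled down by the scaling factor.
        "scaled_text_length": text_length / text_scale,
        # Raw ratio of text length to number of images (binning not shown).
        # Assumption: items without images get None.
        "text_image_ratio": text_length / image_count if image_count else None,
    }

# Hypothetical item: 20-word title, 2500-word text, 5 images, account level 3.
features = content_structure_features(20, 2500, 5, 3)
```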

Next, the process of acquiring the content understanding features of the information is described. In actual implementation, the content understanding features may be obtained by performing feature extraction on the information through a trained Bidirectional Encoder Representations from Transformers (Bert) model, where the Bert model includes a Bert pre-training model and a text classification model. Before the Bert pre-training model is trained, its network structure is defined in advance. Referring to fig. 6, fig. 6 is a schematic diagram of Bert pre-training provided in an embodiment of the present application. The network structure may stack multiple Transformer encoders. In an unsupervised training manner, each unlabeled training sample in the training sample set is input into the encoder layers for forward propagation; each encoder includes an attention layer, a first normalization layer, a feed-forward layer, and a second normalization layer, and training proceeds layer by layer starting from the attention layer. The parameters of each layer in the encoder are then adjusted by back propagation until convergence, which completes the training.

Referring to fig. 7, fig. 7 is a schematic diagram of training a text classification model provided in an embodiment of the present application. The text classification model is used to predict the quality category to which the information belongs, where a quality category represents a low-quality category of the information, such as title party (clickbait), non-nutritional article, advertisement article, news, and the like. A corresponding text classification model is trained for each quality category in a supervised manner, that is, the text classification model for a given quality category is trained with training samples carrying labels for that category. For example, when the text classification model for the title party category is trained, each training sample carries a category label indicating whether it is a title party sample (1 if it is, 0 if it is not). A corresponding loss function, such as a binary cross-entropy loss function, can be constructed from the predicted quality category and the category label carried by the training sample, and the model parameters of the text classification model are updated by minimizing the loss function.
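The binary cross-entropy loss mentioned above can be written out as a short sketch. The function name, the clipping constant `eps`, and the example labels and predictions are assumptions for illustration; in practice the loss would be computed by the training framework.

```python
import math

def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        # Clip the probability to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical labels (1 = belongs to the category, 0 = does not) and predictions.
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
uncertain = binary_cross_entropy([1, 0], [0.6, 0.4])
```

Predictions closer to the labels give a smaller loss, which is what minimizing the loss function exploits.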

After the Bert pre-training model and the text classification model in the Bert model are trained, the information to be identified is input into the Bert model. The Bert pre-training model encodes at least two words in the information to obtain a vector representation of each word, and performs position embedding on the vector representation of each word according to the position of the word in the information to obtain the position code of each word. The vector representation and the position code of each word are added to determine the coding feature of each word. The coding features of the words are iteratively encoded multiple times to obtain the coding feature of the information, which is input into the text classification model. The text classification model performs quality category identification on the coding feature of the information to obtain content understanding features characterizing at least one quality category to which the information belongs; for example, the coding feature of the information is mapped to obtain a mapping feature of the information, and the mapping feature is offset to obtain the content understanding feature of the information.

Referring to fig. 8, fig. 8 is a schematic diagram of obtaining a prior quality score provided in an embodiment of the present application. After the content structure features, account features, and content understanding features of the information are obtained, they are combined to obtain combined features, and the combined features are input into a quality score prediction model that predicts the prior quality score. The quality score prediction model includes a gradient boosting tree (XGBoost) model and a logistic regression (LR) model. The gradient boosting tree model first performs feature splitting on a first part of the combined features, which has a relatively high degree of discrimination, to obtain corresponding split features. The split features and a second part of the combined features, which has a relatively low degree of discrimination, are then input into the logistic regression model, which fuses the split features with the second part of the features to obtain logistic regression features. Finally, quality score prediction is performed on the logistic regression features through an activation function (such as ReLU) to obtain the prior quality score, which represents how high or low the quality of the information is.

During the period in which the information is recommended online, interactive features related to the recommendation process of the information are obtained, for example from positive interaction data such as the praise amount, click amount, comment amount, and share amount, and from negative interaction data such as the negative feedback amount and report amount. After feature extraction is performed on the positive and negative interaction data respectively, the extracted features may be further processed with a corresponding discretization method. For example, because the praise amount and click amount can reach the order of 100,000, they may be discretized using formula (4), and the comment amount and share amount may be discretized using formula (5); the discretized features are used as the forward interactive features. This prevents an overly large forward interactive feature in one dimension from crowding out the other features. Because the discretized features are on the same scale, they can be combined by weighted summation into a value characterizing preference for the information, and the reciprocal of that value is taken as the forward quality score (i.e., the forward quality parameter):

$$C = \frac{1}{\sum_{i=1}^{n} w_i x_i}$$

where $n$ represents the total number of dimensions of the forward interactive features, $x_i$ represents the forward interactive feature of the $i$-th dimension, and $w_i$ represents the weight of the forward interactive feature of the $i$-th dimension. C is negatively correlated with the quality of the information, i.e., the larger C is, the lower the quality of the information.

For the negative feedback amount and the report amount, because these two dimensions are negative interactive features that represent non-preference for the information and reflect users' negative emotions, they are used directly as the negative interactive features, and the negative quality score (i.e., the negative quality parameter) is

$$D = e^{\,w_A A + w_B B}$$

where A represents the report amount, B represents the negative feedback amount, $w_A$ represents the weight of the report amount, and $w_B$ represents the weight of the negative feedback amount. D is negatively correlated with the quality of the information, i.e., the larger D is, the lower the quality of the information.

The positive quality score C and the negative quality score D are combined to obtain the posterior quality score of the information, for example as the sum $C + D$. The posterior quality score is then combined with the prior quality score of the information to obtain the comprehensive quality score (i.e., the third quality parameter) of the information; for example, the prior quality score and the posterior quality score are multiplied to obtain the comprehensive quality score. The comprehensive quality score is negatively correlated with the quality of the information, i.e., the larger the comprehensive quality score, the lower the quality of the information.
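The scoring described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, the inputs are assumed to be already-discretized positive features, and rejecting a non-positive weighted sum is an assumption made here to avoid division by zero.

```python
from typing import Sequence


def positive_quality(features: Sequence[float], weights: Sequence[float]) -> float:
    """C: reciprocal of the weighted sum of discretized positive features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    total = sum(w * x for w, x in zip(weights, features))
    if total <= 0:
        # Assumption: discretized features and weights yield a positive sum.
        raise ValueError("weighted sum of positive features must be positive")
    return 1.0 / total


def negative_quality(report: float, negative_feedback: float,
                     w_report: float, w_feedback: float) -> float:
    """D: weighted sum of the raw report and negative feedback counts."""
    return w_report * report + w_feedback * negative_feedback


def posterior_quality(c: float, d: float) -> float:
    """Posterior quality score, taken here as the sum C + D."""
    return c + d


def comprehensive_quality(prior: float, posterior: float) -> float:
    """Comprehensive score as the product of prior and posterior scores."""
    return prior * posterior
```

In every function a larger return value indicates lower quality, matching the negative correlation stated in the text.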

After the comprehensive quality score of the information is determined, the quality of the information is identified according to that score. Information whose comprehensive quality score exceeds a score threshold is determined to be low-quality information; alternatively, the information recommended online is sorted by comprehensive quality score from high to low, and the top N items (for example, N = 100) are determined to be low-quality information. The low-quality information is then blocked from recommendation to prevent it from spreading widely.
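The two selection rules above (score threshold and top N) might look like the following sketch. The function names and the dictionary of item identifiers to comprehensive quality scores are assumptions made for illustration.

```python
from typing import Dict, List


def low_quality_by_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Items whose comprehensive quality score exceeds the threshold."""
    return [item for item, score in scores.items() if score > threshold]


def low_quality_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """The n items with the highest comprehensive quality scores."""
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)
    return ranked[:n]
```

Because a higher score means lower quality, both functions return the candidates to be blocked from recommendation.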

The following continues the description of an exemplary structure in which the information quality identification apparatus 555 provided by the embodiments of the present application is implemented as software modules. In some embodiments, as shown in fig. 2, the software modules of the information quality identification apparatus 555 stored in the memory 550 may include:

a first determining module 5551, configured to obtain features of at least two dimensions of the information in a first stage, perform feature combination processing on the features of the at least two dimensions to obtain a combined feature of the information, and determine a first quality parameter of the information based on the combined feature; wherein the first stage is a period before online recommendation of the information;

a second determining module 5552, configured to obtain, in a second stage, an interaction feature related to the recommendation process of the information, and determine a second quality parameter of the information based on the interaction feature;

a third determining module 5553, configured to determine a quality identification result of the information by combining the first quality parameter and the second quality parameter; wherein the second stage is a period of online recommendation of the information.

In some embodiments, the second determining module 5552 is further configured to, in the second stage, periodically obtain interaction data related to the recommendation process of the information, and extract corresponding interaction features from the interaction data, where the periods are divided based on a sampling duration or a collected amount of the interaction data;

the third determining module 5553 is further configured to determine, by combining the first quality parameter and the second quality parameter of each period, a period quality identification result of the information in each period;

In some embodiments, the second determining module 5552 is further configured to collect interaction data related to the recommendation process of the information in a sampling window of the second stage, and extract corresponding interaction features from the interaction data, where the types of the sampling window include a set-duration sampling window and a set-data-volume sampling window;

the third determining module 5553 is further configured to perform fusion processing on the first quality parameter and the second quality parameter in the sampling window, and use an obtained third quality parameter as a quality identification result.
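The two window types can be illustrated with the sketch below, which splits a stream of interaction events either by elapsed time or by event count. The event layout (timestamp in seconds, interaction type) and the function names are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, interaction type)


def split_by_duration(events: List[Event], duration: float) -> List[List[Event]]:
    """Set-duration windows: a new window opens once `duration` has elapsed."""
    windows: List[List[Event]] = []
    window_start: Optional[float] = None
    for event in sorted(events):  # order by timestamp
        timestamp = event[0]
        if window_start is None or timestamp - window_start >= duration:
            windows.append([])
            window_start = timestamp
        windows[-1].append(event)
    return windows


def split_by_count(events: List[Event], size: int) -> List[List[Event]]:
    """Set-data-volume windows: each window holds at most `size` events."""
    ordered = sorted(events)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

A second quality parameter would then be computed from the interaction features of each window and fused with the first quality parameter.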

In some embodiments, the apparatus further comprises:

In some embodiments, the first determining module 5551 is further configured to perform at least two of the following operations in the first stage:

In some embodiments, the first determining module 5551 is further configured to obtain at least one of the following continuous features of the information: title length, number of images, image-to-text ratio, and text length;

discretizing the continuous features to obtain corresponding discrete features;

In some embodiments, the first determining module 5551 is further configured to perform encoding processing on at least two words in the information to obtain a vector representation of each word, and perform iterative encoding processing based on a position of each word and the vector representation to obtain an encoding characteristic of the information;

In some embodiments, the first determining module 5551 is further configured to perform word segmentation on the information to obtain at least two words, and perform vector transformation on the at least two words to obtain a vector representation corresponding to each word;

In some embodiments, the dimensions of the position code are the same as the dimensions of the vector representation of the word; the first determining module is further configured to determine, when a sequence number of a dimension in the position code is an even number, a code value corresponding to the dimension in the position code according to a sine function, where the sine function takes a sorting position of the word in the information and a position code dimension as parameters;
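A sinusoidal position code of this kind can be sketched as follows. The passage above only states the rule for even-numbered dimensions; the cosine used for odd-numbered dimensions and the base of 10000 are assumptions borrowed from the standard Transformer positional encoding, and the function name is hypothetical.

```python
import math
from typing import List


def position_code(pos: int, dim: int, base: float = 10000.0) -> List[float]:
    """Position code for the word at index `pos`, with `dim` dimensions."""
    code = []
    for i in range(dim):
        # Paired dimensions (2k, 2k + 1) share the same wavelength.
        angle = pos / base ** (2 * (i // 2) / dim)
        if i % 2 == 0:
            code.append(math.sin(angle))  # even dimension: sine (from the text)
        else:
            code.append(math.cos(angle))  # odd dimension: cosine (assumed)
    return code
```

Since the code has the same dimension as the word vector, the two can be added element-wise before the iterative encoding.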

In some embodiments, the first determining module 5551 is further configured to perform feature splitting on a first partial feature in the combined feature, so as to obtain a split feature of the information;

In some embodiments, the interaction features include positive interaction features characterizing a preference for the information and negative interaction features characterizing a non-preference for the information; the second determining module is further configured to perform weighted summation processing on the positive interaction features of each dimension based on the first weight of the positive interaction features of each dimension, and determine a positive quality parameter that is negatively correlated with the result of the first weighted summation processing;

In some embodiments, the apparatus further comprises:

In some embodiments, the apparatus further comprises a weight determination module, configured to determine the corresponding first weight and second weight according to the degree of influence of the positive interaction features of each dimension and the negative interaction features of each dimension on the quality identification result of the information; or to determine a quality class to which the information belongs, and determine the first weight and the second weight adapted to that quality class; or to determine the information category to which the information belongs, and determine the first weight and the second weight adapted to that information category.
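The category-based option can be pictured as a lookup table of weights, as in the sketch below. The category name, the weight values, and all identifiers are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, NamedTuple


class InteractionWeights(NamedTuple):
    first: Dict[str, float]   # first weights: positive interaction features
    second: Dict[str, float]  # second weights: negative interaction features


# Illustrative values only; they are not taken from the source.
DEFAULT_WEIGHTS = InteractionWeights(
    first={"like": 0.25, "click": 0.25, "comment": 0.25, "share": 0.25},
    second={"report": 0.5, "negative_feedback": 0.5},
)
CATEGORY_WEIGHTS: Dict[str, InteractionWeights] = {
    "news": InteractionWeights(
        first={"like": 0.2, "click": 0.2, "comment": 0.3, "share": 0.3},
        second={"report": 0.7, "negative_feedback": 0.3},
    ),
}


def weights_for_category(category: str) -> InteractionWeights:
    """Return the weights adapted to a category, or the defaults."""
    return CATEGORY_WEIGHTS.get(category, DEFAULT_WEIGHTS)
```

The same table shape would work for the quality-class option, keyed by quality class instead of information category.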

In some embodiments, the third determining module 5553 is further configured to multiply the first quality parameter and the second quality parameter, and use a third quality parameter obtained as a quality identification result of the information; or, the first quality parameter and the second quality parameter are subjected to weighted summation processing, and an obtained third quality parameter is used as a quality identification result of the information.
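The two fusion alternatives can be sketched as a single function. The function name, the `mode` switch, and the default fusion weights are assumptions made for this illustration.

```python
def fuse_quality(first: float, second: float, mode: str = "product",
                 w_first: float = 0.5, w_second: float = 0.5) -> float:
    """Fuse the first and second quality parameters into a third one."""
    if mode == "product":
        return first * second
    if mode == "weighted_sum":
        return w_first * first + w_second * second
    raise ValueError(f"unknown fusion mode: {mode}")
```

With the product, a near-zero first quality parameter suppresses the result regardless of the second; the weighted sum lets either parameter raise the result on its own.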

Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the quality identification method of the information described in the embodiment of the present application.

Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method for quality identification of information provided by embodiments of the present application, for example, the method for quality identification of information as illustrated in fig. 3A.

In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

According to the present application, before the information is recommended online, a first quality parameter of the information is determined from features of multiple dimensions of the information; while the information is being recommended online, a second quality parameter of the information is determined from interaction features related to the recommendation process; and the quality identification result of the information is determined by combining the first quality parameter and the second quality parameter. Because the quality parameters from before online recommendation and from during online recommendation are considered together across multiple dimensions, the accuracy of the final quality identification result is improved. Low-quality information can then be blocked from recommendation according to the quality identification result, preventing it from spreading widely.

The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims

1. A method for quality identification of information, the method comprising:

obtaining, in a first stage, features of at least two dimensions of the information, performing feature combination processing on the features of the at least two dimensions to obtain combined features of the information, and

determining a first quality parameter of the information based on the combined features; wherein the first stage is a period before online recommendation of the information;

collecting, in a sampling window in a second stage, interaction data related to the recommendation process of the information, extracting corresponding interaction features from the interaction data, determining a second quality parameter of the information in the sampling window based on the interaction features, and

fusing the first quality parameter and the second quality parameter in the sampling window, and taking the obtained third quality parameter as the quality identification result of the information; wherein the second stage is a period of online recommendation of the information, and the types of the sampling window include a set-duration sampling window and a set-data-volume sampling window.

2. The method of claim 1, wherein the method further comprises:

when the quality identification result indicates that the information is low-quality information that needs to be blocked, determining a corresponding blocking mode according to the low-quality level of the information, and applying the corresponding blocking mode to the information;

3. The method of claim 1, wherein said obtaining features of at least two dimensions of said information in a first phase comprises:

performing in a first phase at least two of the following operations:

4. The method of claim 3, wherein said obtaining content structure features characterizing said information comprises:

acquiring at least one of the following continuous features of the information: title length, number of images, image-to-text ratio, and text length;

discretizing the continuous features to obtain corresponding discrete features;

5. The method of claim 3, wherein said obtaining a content understanding feature of said information comprises:

coding at least two words in the information to obtain vector representation of each word, and carrying out iterative coding processing based on the position and the vector representation of each word to obtain coding features of the information;

6. The method of claim 5, wherein said encoding at least two words of said information to obtain a vector representation of each said word comprises:

performing word segmentation processing on the information to obtain at least two words, and performing vector conversion on the at least two words to obtain a vector representation corresponding to each word;

the performing iterative coding processing based on the position and the vector representation of each word to obtain the coding features of the information comprises:

7. The method of claim 6, wherein the dimensions of the position code are the same as the dimensions of the vector representation of the word;

the performing, according to the position of each word in the information, position embedding processing on each vector representation to obtain a position code corresponding to each word includes:

when the serial number of the dimensionality in the position code is an even number, determining a code value corresponding to the dimensionality in the position code according to a sine function, wherein the sine function takes the sequencing position of the words in the information and the position code dimensionality as parameters;

8. The method of claim 1, wherein said determining a first quality parameter for the information based on the combined features comprises:

performing feature splitting on a first part of features in the combined features to obtain splitting features of the information;

9. The method of claim 1, wherein the interaction features comprise positive interaction features characterizing a preference for the information and negative interaction features characterizing a non-preference for the information;

the determining a second quality parameter of the information in the sampling window based on the interaction feature comprises:

based on the first weight of the positive interaction feature of each dimension, performing weighted summation processing on the positive interaction features of each dimension, and determining a positive quality parameter that is negatively correlated with the result of the first weighted summation processing;

summing the positive quality parameter and the negative quality parameter to obtain a second quality parameter of the information in the sampling window;

10. The method of claim 9, wherein the method further comprises:

obtaining at least one of the following continuous positive interaction features characterizing a preference for the information: like count, click count, comment count, and share count;

11. The method of claim 9, wherein the method further comprises:

determining the corresponding first weight and second weight according to the degree of influence of the positive interaction features of each dimension and the negative interaction features of each dimension on the quality identification result of the information; or,

determining a quality class to which the information belongs, and determining the first weight and the second weight adapted to the quality class according to the quality class; or,

determining the information category to which the information belongs, and determining the first weight and the second weight adapted to the information category according to the information category.

12. The method according to claim 1, wherein the fusing the first quality parameter and the second quality parameter in the sampling window, and using the obtained third quality parameter as the quality identification result of the information, comprises:

multiplying the first quality parameter and the second quality parameter in the sampling window, and taking the obtained third quality parameter as the quality identification result of the information; or,

performing weighted summation processing on the first quality parameter and the second quality parameter in the sampling window, and taking the obtained third quality parameter as the quality identification result of the information.

13. An apparatus for quality identification of information, the apparatus comprising:

a second determining module, configured to collect, in a sampling window in a second stage, interaction data related to the recommendation process of the information, extract corresponding interaction features from the interaction data, and determine a second quality parameter of the information in the sampling window based on the interaction features;

a third determining module, configured to perform fusion processing on the first quality parameter and the second quality parameter in the sampling window, and use the obtained third quality parameter as the quality identification result of the information; wherein the second stage is a period of online recommendation of the information, and the types of the sampling window include a set-duration sampling window and a set-data-volume sampling window.

14. An electronic device, characterized in that the electronic device comprises:

a memory for storing executable instructions;

a processor for implementing the method of quality identification of information of any one of claims 1 to 12 when executing executable instructions stored in the memory.

15. A computer-readable storage medium storing executable instructions, wherein the executable instructions when executed by a processor implement the method of quality identification of information of any one of claims 1 to 12.