CN114329219A - Data processing method, method and device for outputting knowledge content - Google Patents


Info

Publication number
CN114329219A
Authority
CN
China
Prior art keywords
knowledge content
sample
determining
preset
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111673243.8A
Other languages
Chinese (zh)
Inventor
徐伟
程鸣权
杨海涛
步君昭
蒋俊翔
刘欢
骆金昌
何伯磊
和为
陈坤斌
毛丽媛
周敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111673243.8A priority Critical patent/CN114329219A/en
Publication of CN114329219A publication Critical patent/CN114329219A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data processing method and a method and device for outputting knowledge content, and relates to the technical field of artificial intelligence, in particular to the technical field of content recommendation. The specific implementation scheme is as follows: acquiring a sample knowledge content set and the sample annotation category corresponding to each sample knowledge content in the set; determining initial sample knowledge content from the sample knowledge content set; performing the following model training step on the model to be trained: determining, based on the initial sample knowledge content and the model to be trained, the predicted category corresponding to the initial sample knowledge content; and, in response to the predicted category and the sample annotation category corresponding to the initial sample knowledge content satisfying a preset convergence condition, determining the trained model to be trained as the weight generation model. This implementation can improve the degree of intelligence of knowledge content display.

Description

Data processing method, method and device for outputting knowledge content
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical field of content recommendation.
Background
At present, with the continuous development of information technology, more and more contents are obtained by users from various channels.
In practice, it has been found that displayed contents are often disordered, requiring users to select a content viewing mode according to their own needs, for example selecting the order in which contents are viewed. Existing content display modes therefore offer a low degree of intelligence.
Disclosure of Invention
The disclosure provides a data processing method, a method and a device for outputting knowledge content.
According to an aspect of the present disclosure, there is provided a data processing method including: acquiring a sample knowledge content set and the sample annotation category corresponding to each sample knowledge content in the set; determining initial sample knowledge content from the sample knowledge content set; performing the following model training step on the model to be trained: determining, based on the initial sample knowledge content and the model to be trained, the predicted category corresponding to the initial sample knowledge content; and, in response to the predicted category and the sample annotation category corresponding to the initial sample knowledge content satisfying a preset convergence condition, determining the trained model to be trained as the weight generation model.
According to another aspect of the present disclosure, there is provided a method for outputting knowledge content, including: acquiring a knowledge content set; determining the weight corresponding to each knowledge content in the knowledge content set based on the weight generation model obtained by the data processing method; determining an output sequence based on the weight corresponding to each knowledge content; and outputting the knowledge contents based on the output sequence.
According to another aspect of the present disclosure, there is provided a data processing apparatus including: the sample acquiring unit is configured to acquire a sample knowledge content set and a sample labeling category corresponding to each sample knowledge content in the sample knowledge content set; an initial sample determination unit configured to determine initial sample knowledge content from a set of sample knowledge content; a model training unit configured to perform the following model training steps on a model to be trained: determining a prediction category corresponding to the knowledge content of the initial sample based on the knowledge content of the initial sample and the model to be trained; and determining the model to be trained as a weight generation model after training is finished in response to the fact that the prediction type and the sample marking type corresponding to the initial sample knowledge content meet the preset convergence condition.
According to another aspect of the present disclosure, there is provided an apparatus for outputting knowledge content, including: a content acquisition unit configured to acquire a knowledge content set; a weight determination unit configured to determine a weight corresponding to each knowledge content in the knowledge content set based on the weight generation model obtained by the data processing method; an order determination unit configured to determine an output order based on weights corresponding to the respective knowledge contents; a content output unit configured to output the respective knowledge contents based on the output order.
According to another aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method or the method for outputting knowledge content as any one of the above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data processing method or the method for outputting knowledge content as any one of the above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a data processing method or a method for outputting knowledge content as any one of the above.
According to the technology disclosed by the invention, the data processing method and the method for outputting the knowledge content are provided, the intelligent degree of content display can be improved, and the user experience is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a data processing method according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of a data processing method according to the present disclosure;
FIG. 4 is a flow diagram of one embodiment of a method for outputting knowledge content in accordance with the present disclosure;
FIG. 5 is a schematic diagram of one application scenario of a method for outputting knowledge content according to the present disclosure;
FIG. 6 is a software interface schematic of an application software for a method for outputting knowledge content according to the present disclosure;
FIG. 7 is a schematic block diagram of one embodiment of a data processing apparatus according to the present disclosure;
FIG. 8 is a schematic block diagram illustrating one embodiment of an apparatus for outputting knowledge content according to the present disclosure;
fig. 9 is a block diagram of an electronic device for implementing the data processing method or the method for outputting knowledge content of the embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may obtain, through the network 104, each knowledge content returned by the server 105 and an output order corresponding to each knowledge content, and the terminal devices 101, 102, 103 may output each knowledge content according to the output order corresponding to each knowledge content.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, a mobile phone, a computer, a tablet, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, for example, the server 105 may acquire a knowledge content set that needs to be output, determine a weight corresponding to each knowledge content in the knowledge content set based on a weight generation model, determine an output order based on the weight corresponding to each knowledge content, and transmit each knowledge content and the corresponding output order to the terminal devices 101, 102, and 103 via the network 104 based on the output order, so that the terminal devices 101, 102, and 103 output each knowledge content in the output order corresponding to each knowledge content.
The server 105 may further obtain, through the network 104, knowledge contents historically output by the terminal devices 101, 102, and 103, determine a user click condition corresponding to each knowledge content historically output, obtain a sample knowledge content set and sample label categories corresponding to each sample knowledge in the sample knowledge content set based on the user click condition, train the model to be trained based on the sample data, and obtain the weight generation model.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the data processing method or the method for outputting knowledge content provided by the embodiment of the present disclosure may be executed by the terminal devices 101, 102, and 103, or may be executed by the server 105, and the data processing apparatus or the apparatus for outputting knowledge content may be provided in the terminal devices 101, 102, and 103, or may be provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a data processing method according to the present disclosure is shown. The data processing method of this embodiment comprises the following steps:
step 201, a sample knowledge content set and a sample labeling category corresponding to each sample knowledge content in the sample knowledge content set are obtained.
In this embodiment, the knowledge content may include technical knowledge and job-post information matching the output object, which is not limited by this embodiment.
In order to determine the display order of the knowledge contents and preferentially display those a user is more interested in, the execution subject may train a model in advance to obtain a weight generation model for determining the weight of each knowledge content, and rank the contents to be displayed based on those weights.
In the model training stage, the execution subject needs to acquire a sample knowledge content set and the sample annotation category corresponding to each sample knowledge content in the set. The sample annotation category is either positive or negative: positive samples are historical knowledge contents that the user clicked and viewed, and negative samples are historical knowledge contents that the user did not click and view.
In step 202, initial sample knowledge content is determined from the set of sample knowledge content.
In this embodiment, the execution subject may randomly determine the initial sample knowledge content from the sample knowledge content set; the model to be trained classifies this initial sample knowledge content during model training.
Step 203, executing the following model training steps on the model to be trained: determining a prediction category corresponding to the knowledge content of the initial sample based on the knowledge content of the initial sample and the model to be trained; and determining the model to be trained as a weight generation model after training is finished in response to the fact that the prediction type and the sample marking type corresponding to the initial sample knowledge content meet the preset convergence condition.
In this embodiment, the model to be trained may be a classification model whose categories are "predicted to be clicked by the user" and "predicted not to be clicked by the user". The execution subject may input the initial sample knowledge content into the model to be trained to obtain the predicted category it outputs for that content. The prediction may take the form of a probability that the initial sample knowledge content belongs to the clicked category: a value between zero and one inclusive, where values closer to one indicate a prediction closer to the clicked category. The prediction is then compared with the sample annotation category, which may be one for positive samples and zero for negative samples. If the preset convergence condition is met, the difference between the predicted category and the sample annotation category is small; training is then determined to be finished, and the trained model to be trained is determined to be the weight generation model.
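The training step described above can be sketched as follows. This is a hedged, minimal illustration in Python; the linear scorer, the field names, and the convergence threshold are assumptions for illustration, not details taken from the patent:

```python
# Minimal sketch of the training step: a binary classifier scores the
# click probability of the initial sample knowledge content, and training
# finishes when the prediction is close enough to the annotated label
# (the "preset convergence condition").
def predict_click_probability(model, sample):
    # A real model would embed the knowledge content and run it through
    # trained layers; here, a simple dot product over assumed features.
    return sum(w * f for w, f in zip(model["weights"], sample["features"]))

def converged(prediction, label, epsilon=0.1):
    # Preset convergence condition: prediction close to the 0/1 label.
    return abs(prediction - label) < epsilon

def training_step(model, sample):
    prediction = predict_click_probability(model, sample)
    # label is 1.0 for positive samples (clicked) and 0.0 for negatives
    return converged(prediction, sample["label"])
```

When `training_step` returns `True`, the model to be trained would be taken as the weight generation model; otherwise another initial sample would be drawn and the step repeated.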
According to the data processing method provided by the embodiment of the disclosure, the weight generation model for predicting the weight of the knowledge content can be obtained, the knowledge content is output based on the weight generation model, and the intelligent degree of knowledge content display is improved.
With continued reference to FIG. 3, a flow 300 of another embodiment of a data processing method according to the present disclosure is shown. As shown in fig. 3, the data processing method of the present embodiment may include the following steps:
step 301, based on the preset data mining dimension, positive sample knowledge content and negative sample knowledge content are determined from a preset knowledge content base, and a sample knowledge content set is obtained.
In this embodiment, the sample labeling category includes a positive sample or a negative sample.
In this embodiment, the execution subject may perform data mining on knowledge-related data along preset data mining dimensions to obtain sample knowledge data of different dimensions, and train the model to be trained on this data, so that the trained weight generation model can generate weights for knowledge contents from different dimensions, improving the diversity of the determined weights.
Optionally, after the cold start, the execution subject may obtain click data of each knowledge content of the real output object, update the sample knowledge data, and update the weight generation model.
In some optional implementations of this embodiment, the preset data mining dimension includes at least one of: information dimension, non-information dimension, output object dimension, skill tag dimension.
In this implementation manner, the information knowledge data may be work information corresponding to the output object, the non-information knowledge data may be technical knowledge corresponding to the output object, the output object data may be work attribute data corresponding to the output object, and the skill tag data may be a skill tag of each output object.
In other optional implementations of this embodiment, determining the positive sample knowledge content and the negative sample knowledge content from a preset knowledge content library based on a preset data mining dimension includes: determining a target information theme based on the information dimension; determining positive sample knowledge content from a preset knowledge content base based on information data matched with a target information theme; and determining the negative sample knowledge content based on the non-information data and the information data which is not matched with the target information theme from a preset knowledge content base.
In this implementation, for each output object, the executing agent may determine the department in which the output object is located; determining a target information theme corresponding to the department; determining corresponding candidate information knowledge data from a preset information knowledge base based on the target information theme; determining positive sample information knowledge data from the candidate information knowledge data; and selecting candidate information data corresponding to other subjects as first negative sample information data, and selecting non-information data as second negative sample information data. Therefore, for the positive and negative sample construction of the information knowledge data, the topic matching corresponding to the department and whether the information knowledge data is the information data are considered, and therefore the determination accuracy of the information knowledge data is improved.
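The positive/negative sample mining along the information dimension can be sketched as follows; the field names (`is_information`, `topic`) and the flat-list representation of the knowledge content base are illustrative assumptions:

```python
# Sketch of sample mining for one department: information content whose
# topic matches the department's target information topic becomes a
# positive sample; information content on other topics and non-information
# content become negative samples.
def mine_information_samples(content_library, target_topic):
    positives, negatives = [], []
    for item in content_library:
        if item["is_information"] and item["topic"] == target_topic:
            positives.append(item)   # matches the target information topic
        else:
            negatives.append(item)   # other topics, or non-information data
    return positives, negatives
```

The mining along the non-information dimension described below is symmetric: swap the roles of information and non-information data.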
In other optional implementations of this embodiment, determining the positive sample knowledge content and the negative sample knowledge content from a preset knowledge content library based on a preset data mining dimension includes: determining a target non-information theme based on the non-information dimensions; determining positive sample knowledge content based on non-information data matched with a target non-information theme from a preset knowledge content base; and determining the negative sample knowledge content based on the information data and the non-information data which is not matched with the target non-information theme from a preset knowledge content base.
In this implementation, for each output object, the executing agent may determine the department in which the output object is located; determining a target non-information theme corresponding to the department; determining corresponding candidate non-information knowledge data from a preset non-information knowledge base based on the target non-information theme; determining positive sample non-informative knowledge data from the candidate non-informative knowledge data; and selecting candidate non-information data corresponding to other subjects as first negative sample non-information data, and selecting information data as second negative sample non-information data. Therefore, for the positive and negative sample construction of the non-information knowledge data, two aspects of theme matching corresponding to departments and whether the non-information knowledge data is the information data are considered, and therefore the determination accuracy of the non-information knowledge data is improved.
In other optional implementations of this embodiment, determining the positive sample knowledge content and the negative sample knowledge content from a preset knowledge content library based on a preset data mining dimension includes: determining at least one output object class based on the output object dimensions; and determining positive sample knowledge content and negative sample knowledge content corresponding to each output object category from a preset knowledge content base.
In this implementation, the execution body obtains different output object categories; for each output object class, corresponding output object positive sample data and output object negative sample data are determined based on the class matching.
In other optional implementations of this embodiment, determining the positive sample knowledge content and the negative sample knowledge content from a preset knowledge content library based on a preset data mining dimension includes: determining each initial skill tag based on the skill tag dimensions; clustering each initial skill label to obtain each clustered target skill label; and determining positive sample knowledge content and negative sample knowledge content corresponding to each target skill tag from a preset knowledge content base.
In this implementation, the execution subject may determine, from a preset knowledge content library, positive sample knowledge content and negative sample knowledge content corresponding to each target skill tag based on the character string matching.
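The skill-tag dimension described above can be sketched as follows; the naive lowercase clustering and substring matching are stand-ins for whatever clustering and string-matching strategies the execution subject actually uses:

```python
# Sketch of the skill-tag dimension: near-duplicate initial tags are
# clustered (here, naively by normalized name) into target tags, and
# sample contents are matched to each target tag by substring matching.
def cluster_tags(initial_tags):
    clusters = {}
    for tag in initial_tags:
        clusters.setdefault(tag.strip().lower(), []).append(tag)
    return list(clusters)  # one target tag per cluster

def samples_for_tag(content_library, target_tag):
    positives = [c for c in content_library if target_tag in c["text"].lower()]
    negatives = [c for c in content_library if target_tag not in c["text"].lower()]
    return positives, negatives
```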
Step 302, determine initial sample knowledge content from a set of sample knowledge content.
Step 303, executing the following model training steps on the model to be trained: determining a prediction category corresponding to the knowledge content of the initial sample based on the knowledge content of the initial sample and the model to be trained; and determining the model to be trained as a weight generation model after training is finished in response to the fact that the prediction type and the sample marking type corresponding to the initial sample knowledge content meet the preset convergence condition.
In this embodiment, for the detailed description of step 302 to step 303, please refer to the detailed description of step 202 to step 203, which is not repeated herein.
And 304, in response to the fact that the prediction category and the sample marking category corresponding to the initial sample knowledge content do not meet the preset convergence condition, re-determining the initial sample knowledge content from the sample knowledge content set, and repeatedly executing the model training step.
In this embodiment, if the preset convergence condition is not satisfied, indicating that the difference between the predicted category and the sample annotation category is still large, initial sample knowledge content is re-determined for further model training.
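The retry behavior of steps 303 and 304 can be sketched as a loop; here `training_step` stands for any callable that returns `True` when the preset convergence condition is met, and the bound on rounds is an assumption added for illustration:

```python
import random

# Sketch of the retraining loop: when the convergence condition is not
# met, a new initial sample is randomly re-determined from the sample set
# and the model training step is repeated.
def train_until_converged(sample_set, training_step, max_rounds=1000):
    for _ in range(max_rounds):
        sample = random.choice(sample_set)  # re-determine initial sample
        if training_step(sample):
            return True                     # convergence condition met
    return False
```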
According to the data processing method provided by this embodiment of the disclosure, the weight generation model accounts for targeted processing during the cold-start stage, which can improve the precision of weight generation. Determining sample data from multiple data mining dimensions further improves that precision.
With continued reference to FIG. 4, a flow 400 of one embodiment of a method for outputting knowledge content in accordance with the present disclosure is shown. The method of this embodiment comprises the following steps:
step 401, acquiring a knowledge content set.
In this embodiment, the execution subject (such as the terminal devices 101, 102, 103 or the server 105 in fig. 1) may obtain, from local storage or a pre-connected electronic device, the knowledge content set that needs to be output in a personalized manner. The knowledge content set comprises at least one knowledge content.
Step 402, determining the weight corresponding to each knowledge content in the knowledge content set based on the weight generation model obtained by the data processing method.
In this embodiment, the weight generation model trained in advance is used to generate the weight of each knowledge content in the knowledge content set. This model may be determined based on each user's clicks on historical knowledge contents, predicting future clicks from actual historical clicks. The larger the weight, the higher the probability that the user clicks the knowledge content, i.e. the more important the content, and the earlier its output order can be set. Output parameters such as the output order of each knowledge content are thus determined from the corresponding weights, realizing targeted output of knowledge content. The output parameters may include, but are not limited to, output order, output pattern, output time, and the like, which is not limited by this embodiment.
In step 403, an output order is determined based on the weights corresponding to the respective knowledge contents.
In this embodiment, after obtaining the weight corresponding to each knowledge content, the execution subject may set the output order of the knowledge contents from front to back in descending order of weight.
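The ordering rule described above amounts to a descending sort by weight; a minimal sketch, with an assumed `weight` field on each content record:

```python
# Contents with larger weights (higher predicted click probability) are
# output first; ties keep their original relative order (stable sort).
def output_order(weighted_contents):
    return sorted(weighted_contents, key=lambda c: c["weight"], reverse=True)
```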
And step 404, outputting the knowledge contents based on the output sequence.
In this embodiment, the executing subject may output the knowledge content to at least one object in a targeted manner; and determining the weight corresponding to each knowledge content in the knowledge content set based on the weight generation model trained in advance, wherein the determining may include: for each object, determining the weight of each knowledge content in the knowledge content set relative to the object based on the object information of the object and a weight generation model trained in advance; and outputting the knowledge contents based on the weight of the knowledge contents relative to the object.
With continued reference to fig. 5, a schematic diagram of one application scenario of the method for outputting knowledge content according to the present disclosure is shown. In the application scenario of fig. 5, for the knowledge content set 501, the weight of each knowledge content corresponding to the knowledge content set 501 may be determined, and the output order of each knowledge content may be determined based on the weight of each knowledge content. And, the execution subject may output the knowledge contents according to the output order of the knowledge contents, resulting in the output interface 502. As shown in the output interface 502, the knowledge content 1, the knowledge content 2, the knowledge content 3 and the knowledge content 4 are displayed in turn according to the output sequence.
Referring to fig. 6 together, fig. 6 is a software interface diagram of application software using the method for outputting knowledge content according to the present disclosure. As shown in fig. 6, the knowledge content set may include various knowledge contents to be pushed to the user, such as a "2021 Q3 Mobile Advertising Traffic Observation White Paper". The execution subject may determine the weight corresponding to each knowledge content and determine the output order based on the weight.
The method for outputting knowledge content provided by the above embodiment of the disclosure can improve the degree of intelligence of knowledge content output.
With further reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a data processing apparatus, which corresponds to the method embodiment shown in fig. 2, and which may be specifically applied in a terminal device or a server.
As shown in fig. 7, the data processing apparatus 700 of the present embodiment includes: a sample acquisition unit 701, an initial sample determination unit 702, and a model training unit 703.
The sample acquiring unit 701 is configured to acquire the sample knowledge content set and a sample label category corresponding to each sample knowledge content in the sample knowledge content set.
An initial sample determination unit 702 configured to determine initial sample knowledge content from a set of sample knowledge content.
A model training unit 703 configured to perform the following model training steps on the model to be trained: determining a prediction category corresponding to the knowledge content of the initial sample based on the knowledge content of the initial sample and the model to be trained; and determining the model to be trained as a weight generation model after training is finished in response to the fact that the prediction type and the sample marking type corresponding to the initial sample knowledge content meet the preset convergence condition.
In some optional implementations of this embodiment, the model training unit 703 is further configured to: in response to determining that the prediction category and the sample annotation category corresponding to the initial sample knowledge content do not satisfy the preset convergence condition, re-determine the initial sample knowledge content from the sample knowledge content set and repeat the model training step.
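The training loop performed by the model training unit 703 — predict a category for a sample, stop when predictions match the annotated categories (the preset convergence condition), and otherwise re-determine a sample and repeat — can be sketched as follows. The perceptron-style weight update, the numeric feature vectors, and the 0/1 positive/negative labels are illustrative assumptions; the disclosure does not fix a particular model architecture.

```python
def train_weight_model(samples, labels, epochs=100, lr=0.1):
    """Sketch of the model training step: predict a category for each sample
    and stop once predictions match the annotated categories (the 'preset
    convergence condition'). The perceptron update rule is an assumption.

    samples: list of numeric feature vectors
    labels:  1 for a positive sample, 0 for a negative sample
    """
    dim = len(samples[0])
    weights = [0.0] * dim
    for _ in range(epochs):
        converged = True
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x))
            pred = 1 if score > 0 else 0
            if pred != y:              # convergence condition not met:
                converged = False      # adjust and repeat the training step
                for i in range(dim):
                    weights[i] += lr * (y - pred) * x[i]
        if converged:                  # predictions match the annotations
            break
    return weights                     # plays the role of the trained model
```

For linearly separable toy samples the loop converges within a few epochs; the returned weight vector then plays the role of the trained weight generation model used to score knowledge contents.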
In some optional implementations of this embodiment, the sample annotation category includes a positive sample or a negative sample; and the sample acquiring unit 701 is further configured to: determine positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on a preset data mining dimension.
In some optional implementations of this embodiment, the preset data mining dimension includes at least one of: information dimension, non-information dimension, output object dimension, skill tag dimension.
In some optional implementations of this embodiment, the sample acquiring unit 701 is further configured to: determine a target information theme based on the information dimension; determine positive sample knowledge content from a preset knowledge content base based on information data matched with the target information theme; and determine negative sample knowledge content from the preset knowledge content base based on non-information data and on information data that does not match the target information theme.
In some optional implementations of this embodiment, the sample acquiring unit 701 is further configured to: determine a target non-information theme based on the non-information dimension; determine positive sample knowledge content from a preset knowledge content base based on non-information data matched with the target non-information theme; and determine negative sample knowledge content from the preset knowledge content base based on information data and on non-information data that does not match the target non-information theme.
In some optional implementations of this embodiment, the sample acquiring unit 701 is further configured to: determine at least one output object category based on the output object dimension; and, for each output object category, determine positive sample knowledge content and negative sample knowledge content corresponding to the output object category from a preset knowledge content base.
In some optional implementations of this embodiment, the sample acquiring unit 701 is further configured to: determine each initial skill tag based on the skill tag dimension; cluster the initial skill tags to obtain clustered target skill tags; and determine positive sample knowledge content and negative sample knowledge content corresponding to each target skill tag from a preset knowledge content base.
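As a concrete illustration of the skill-tag mining dimension, the sketch below clusters initial skill tags into target tags and then splits a preset knowledge content base into positive and negative samples per target tag. Clustering by lower-cased tag name is a deliberate placeholder for a real clustering algorithm, and the "tags" field on each content item is a hypothetical schema, not one specified by the disclosure.

```python
def mine_samples_by_skill_tags(initial_tags, content_base):
    """Sketch of skill-tag sample mining: cluster initial skill tags into
    target tags, then split the content base into positive/negative samples
    per target tag. 'Clustering' by lower-cased name is a placeholder.

    initial_tags: list of raw skill-tag strings
    content_base: list of dicts, each with a hypothetical "tags" field
    """
    # Step 1: "cluster" initial tags (placeholder: merge case variants).
    target_tags = {tag.lower() for tag in initial_tags}
    # Step 2: for each target tag, content carrying it is a positive sample,
    # everything else in the preset base is a negative sample.
    samples = {}
    for tag in target_tags:
        positives = [c for c in content_base
                     if tag in [t.lower() for t in c["tags"]]]
        negatives = [c for c in content_base if c not in positives]
        samples[tag] = {"positive": positives, "negative": negatives}
    return samples
```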
It should be understood that the units 701 to 703 recited in the data processing apparatus 700 correspond to the respective steps in the method described with reference to fig. 2, respectively. Thus, the operations and features described above for the data processing method are equally applicable to the apparatus 700 and the units included therein and will not be described again here.
With further reference to fig. 8, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for outputting knowledge content, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 4, and the apparatus may be specifically applied to a terminal device or a server.
As shown in fig. 8, the apparatus 800 for outputting knowledge content of the present embodiment includes: a content acquisition unit 801, a weight determination unit 802, an order determination unit 803, and a content output unit 804.
A content acquisition unit 801 configured to acquire a knowledge content set.
The weight determination unit 802 is configured to determine a weight corresponding to each knowledge content in the knowledge content set based on the weight generation model obtained by the data processing method described above.
An order determination unit 803 configured to determine an output order based on the weights corresponding to the respective knowledge contents.
A content output unit 804 configured to output the respective knowledge contents based on the output order.
It should be understood that units 801 to 804 recited in the apparatus 800 for outputting knowledge content correspond to respective steps in the method described with reference to fig. 4. Thus, the operations and features described above for the method for outputting knowledge content are equally applicable to the apparatus 800 and the units included therein, and are not described in detail here.
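The cooperation of units 801 to 804 can be sketched as a single scoring-and-sorting function: score each knowledge content with the weight generation model, derive an output order from the weights, and emit the contents in that order. Applying the model as a dot product over a hypothetical "features" field is an assumption about the scoring step, which the disclosure leaves unspecified.

```python
def output_knowledge_contents(contents, weight_model):
    """Sketch of units 801-804: weight determination, order determination,
    and content output. Dot-product scoring over a hypothetical "features"
    field is an assumption, not the disclosure's specified mechanism."""
    weighted = []
    for content in contents:                      # weight determination (802)
        weight = sum(w * f for w, f in zip(weight_model, content["features"]))
        weighted.append((weight, content))
    # order determination (803): higher weight is output first
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in weighted]   # content output (804)
```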
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the data processing method or the method for outputting knowledge content. For example, in some embodiments, the data processing method or the method for outputting knowledge content may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the data processing method or the method for outputting knowledge content described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the data processing method or the method for outputting knowledge content.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of data processing, comprising:
acquiring a sample knowledge content set and a sample annotation category corresponding to each sample knowledge content in the sample knowledge content set;
determining initial sample knowledge content from the set of sample knowledge content;
performing the following model training step on a model to be trained: determining a prediction category corresponding to the initial sample knowledge content based on the initial sample knowledge content and the model to be trained; and in response to determining that the prediction category and the sample annotation category corresponding to the initial sample knowledge content satisfy a preset convergence condition, determining the trained model as a weight generation model.
2. The method of claim 1, further comprising:
and in response to determining that the prediction category and the sample annotation category corresponding to the initial sample knowledge content do not satisfy the preset convergence condition, re-determining the initial sample knowledge content from the sample knowledge content set, and repeatedly performing the model training step.
3. The method of claim 1, wherein the sample annotation category comprises a positive sample or a negative sample; and
the acquiring a sample knowledge content set and a sample annotation category corresponding to each sample knowledge content in the sample knowledge content set comprises:
determining positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on a preset data mining dimension.
4. The method of claim 3, wherein the preset data mining dimensions include at least one of: information dimension, non-information dimension, output object dimension, skill tag dimension.
5. The method of claim 4, wherein the determining positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on preset data mining dimensions comprises:
determining a target information theme based on the information dimension;
determining positive sample knowledge content from the preset knowledge content base based on information data matched with the target information theme;
and determining negative sample knowledge content from the preset knowledge content base based on non-information data and on information data that does not match the target information theme.
6. The method of claim 4, wherein the determining positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on preset data mining dimensions comprises:
determining a target non-information theme based on the non-information dimension;
determining positive sample knowledge content from the preset knowledge content base based on non-information data matched with the target non-information theme;
and determining negative sample knowledge content from the preset knowledge content base based on information data and on non-information data that does not match the target non-information theme.
7. The method of claim 4, wherein the determining positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on preset data mining dimensions comprises:
determining at least one output object category based on the output object dimension;
and for each output object category, determining, from the preset knowledge content base, positive sample knowledge content and negative sample knowledge content corresponding to the output object category.
8. The method of claim 4, wherein the determining positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on preset data mining dimensions comprises:
determining each initial skill tag based on the skill tag dimensions;
clustering the initial skill tags to obtain clustered target skill tags;
and determining positive sample knowledge content and negative sample knowledge content corresponding to each target skill tag from the preset knowledge content base.
9. A method for outputting knowledge content, comprising:
acquiring a knowledge content set;
determining a weight corresponding to each knowledge content in the knowledge content set based on a weight generation model obtained by the data processing method of any one of claims 1 to 8;
determining an output order based on the weight corresponding to each knowledge content;
and outputting each knowledge content based on the output order.
10. A data processing apparatus comprising:
the system comprises a sample acquisition unit, a storage unit and a display unit, wherein the sample acquisition unit is configured to acquire a sample knowledge content set and a sample labeling category corresponding to each sample knowledge content in the sample knowledge content set;
an initial sample determination unit configured to determine initial sample knowledge content from the set of sample knowledge content;
a model training unit configured to perform the following model training step on a model to be trained: determining a prediction category corresponding to the initial sample knowledge content based on the initial sample knowledge content and the model to be trained; and in response to determining that the prediction category and the sample annotation category corresponding to the initial sample knowledge content satisfy a preset convergence condition, determining the trained model as a weight generation model.
11. The apparatus of claim 10, wherein the model training unit is further configured to:
and in response to determining that the prediction category and the sample annotation category corresponding to the initial sample knowledge content do not satisfy the preset convergence condition, re-determining the initial sample knowledge content from the sample knowledge content set, and repeatedly performing the model training step.
12. The apparatus of claim 10, wherein the sample annotation category comprises a positive sample or a negative sample; and
the sample acquisition unit is further configured to:
determine positive sample knowledge content and negative sample knowledge content from a preset knowledge content base based on a preset data mining dimension.
13. The apparatus of claim 12, wherein the preset data mining dimensions comprise at least one of: information dimension, non-information dimension, output object dimension, skill tag dimension.
14. The apparatus of claim 13, wherein the sample acquisition unit is further configured to:
determining a target information theme based on the information dimension;
determining positive sample knowledge content from the preset knowledge content base based on information data matched with the target information theme;
and determining negative sample knowledge content from the preset knowledge content base based on non-information data and on information data that does not match the target information theme.
15. The apparatus of claim 13, wherein the sample acquisition unit is further configured to:
determining a target non-information theme based on the non-information dimension;
determining positive sample knowledge content from the preset knowledge content base based on non-information data matched with the target non-information theme;
and determining negative sample knowledge content from the preset knowledge content base based on information data and on non-information data that does not match the target non-information theme.
16. The apparatus of claim 13, wherein the sample acquisition unit is further configured to:
determining at least one output object category based on the output object dimension;
and for each output object category, determining, from the preset knowledge content base, positive sample knowledge content and negative sample knowledge content corresponding to the output object category.
17. The apparatus of claim 13, wherein the sample acquisition unit is further configured to:
determining each initial skill tag based on the skill tag dimensions;
clustering the initial skill tags to obtain clustered target skill tags;
and determining positive sample knowledge content and negative sample knowledge content corresponding to each target skill tag from the preset knowledge content base.
18. An apparatus for outputting knowledge content, comprising:
a content acquisition unit configured to acquire a knowledge content set;
a weight determination unit configured to determine a weight corresponding to each knowledge content in the knowledge content set based on a weight generation model obtained by the data processing method according to any one of claims 1 to 8;
an order determination unit configured to determine an output order based on weights corresponding to the respective knowledge contents;
a content output unit configured to output the respective knowledge contents based on the output order.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
CN202111673243.8A 2021-12-31 2021-12-31 Data processing method, method and device for outputting knowledge content Pending CN114329219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111673243.8A CN114329219A (en) 2021-12-31 2021-12-31 Data processing method, method and device for outputting knowledge content

Publications (1)

Publication Number Publication Date
CN114329219A true CN114329219A (en) 2022-04-12

Family

ID=81021173

Country Status (1)

Country Link
CN (1) CN114329219A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination