CN110598011B - Data processing method, device, computer equipment and readable storage medium


Info

Publication number
CN110598011B
Authority
CN
China
Prior art keywords
audio
video
text
sample
type
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910925299.4A
Other languages
Chinese (zh)
Other versions
CN110598011A (en)
Inventor
余志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910925299.4A
Publication of CN110598011A
Application granted
Publication of CN110598011B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F16/45 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a data processing method, apparatus, computer device and readable storage medium. The method includes: acquiring a plurality of initial characters from the historical release text of a text object in a text resource platform; determining a text type label for the text object according to the word frequency and inverse document frequency of each initial character in the historical release text; acquiring a classification model based on an audio/video resource platform; determining, based on the label similarity between the text type label and a plurality of audio/video type labels, a similar audio/video type label corresponding to the text type label; generating an input feature from the text name, the text type label and the similar audio/video type label of the text object; and determining the audio/video object to recommend to the target user based on the target audio/video type label that the classification model outputs for the input feature. The embodiments of the application improve both the diversity of recommendation modes and the accuracy of the recommended data.

Description

Data processing method, device, computer equipment and readable storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method, a data processing device, a computer device, and a readable storage medium.
Background
With the development of data informatization, data volumes have grown rapidly, and big data is becoming increasingly diverse and dispersed. In the context of large-scale video data, a user's personal preferences can be determined from the user's portrait, so that video data matching those preferences can be selected from the large volume available. However, for a cold-start user, who has not yet produced any video viewing records, it is usually impossible to determine what video data should be recommended.
In the prior art, video data can be recommended to a cold-start user according to factors such as attention, propagation, and whether a video is currently popular. Because such recommendations rely only on information about the videos themselves, the recommendation mode is too narrow, and the recommended videos may not match the cold-start user's interests, so the accuracy of the video data recommended to the user is too low.
Disclosure of Invention
The embodiments of the application provide a data processing method, a data processing apparatus, a computer device and a readable storage medium, which can improve the diversity of recommendation modes and the accuracy of the recommended data.
In one aspect, an embodiment of the present application provides a data processing method, including:
acquiring a text object associated with a target user from a text resource platform, and acquiring a plurality of initial characters in the historical release text corresponding to the text object;
acquiring the word frequency and the inverse document frequency corresponding to each initial character according to the historical release text, and selecting a text type label corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency;
acquiring a classification model based on an audio/video resource platform, the classification model being trained on the audio/video titles and audio/video type labels corresponding to a plurality of audio/video objects in the audio/video resource platform;
acquiring the label similarity between the text type label and a plurality of audio/video type labels, and determining, from the plurality of audio/video type labels and based on the label similarity, a similar audio/video type label corresponding to the text type label;
generating an input feature according to the text name, the text type label and the similar audio/video type label corresponding to the text object; and
inputting the input feature into the classification model, outputting a target audio/video type label corresponding to the input feature based on the classification model, and determining an audio/video recommended object corresponding to the target user according to the target audio/video type label, the audio/video recommended object being an audio/video object in the audio/video resource platform.
Acquiring the plurality of initial characters in the historical release text corresponding to the text object includes:
acquiring the historical release text associated with the text object within a target time range; and
filtering the historical release text against a stop-word lexicon, and dividing the filtered historical release text into the plurality of initial characters.
Acquiring the word frequency and the inverse document frequency corresponding to each initial character according to the historical release text, and selecting the text type label corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency, includes:
counting the number of occurrences of each initial character in the historical release text, and determining the word frequency corresponding to each initial character based on that number and the total number of characters in the historical release text;
determining the number of documents corresponding to each initial character in a corpus, and determining the inverse document frequency corresponding to each initial character based on that number of documents and the total number of documents in the corpus; and
determining a weight value corresponding to each initial character according to the word frequency and the inverse document frequency, and selecting the text type label corresponding to the text object from the plurality of initial characters based on the weight values.
Acquiring the label similarity between the text type label and the plurality of audio/video type labels, and determining the similar audio/video type label corresponding to the text type label from the plurality of audio/video type labels based on the label similarity, includes:
dividing the text type label and the plurality of audio/video type labels into unit characters, and converting each unit character into a unit character vector;
generating a first vector based on the unit character vectors corresponding to the text type label, and generating a second vector for each audio/video type label based on the unit character vectors corresponding to that label; and
determining the label similarity between the first vector and each second vector, and determining the audio/video type label whose second vector has the greatest label similarity as the similar audio/video type label corresponding to the text type label.
Generating the input feature according to the text name, the text type label and the similar audio/video type label corresponding to the text object includes:
acquiring the text name corresponding to the text object, and generating a third vector from the unit character vectors corresponding to the unit characters contained in the text name; and
splicing the first vector, the second vector corresponding to the similar audio/video type label, and the third vector into the input feature.
Inputting the input feature into the classification model, outputting the target audio/video type label corresponding to the input feature based on the classification model, and determining the audio/video recommended object corresponding to the target user according to the target audio/video type label, includes:
inputting the input feature into the classification model, and generating an attribute feature vector corresponding to the input feature in the classification model;
acquiring the matching degree between the attribute feature vector and each attribute type label in the classification model, and determining the attribute type label with the maximum matching degree as the target audio/video type label associated with the text object; and
determining the audio/video recommended object corresponding to the target user according to the target audio/video type label.
Determining the audio/video recommended object corresponding to the target user according to the target audio/video type label includes:
determining, according to the target audio/video type label, a target user portrait of the target user for the audio/video resource platform; and
determining an audio/video object in the audio/video resource platform that matches the target user portrait as the audio/video recommended object corresponding to the target user.
Determining the audio/video object matching the target user portrait from the audio/video resource platform as the audio/video recommended object corresponding to the target user includes:
obtaining, based on a recommendation model in the audio/video resource platform, the user similarity between the target user portrait and each sample user portrait, a sample user portrait being the user portrait corresponding to a registered user of the audio/video resource platform;
determining the sample user portrait with the maximum user similarity as the similar user portrait corresponding to the target user portrait; and
acquiring, from the audio/video resource platform, the historical audio/video objects of the registered user corresponding to the similar user portrait, and determining the audio/video recommended object corresponding to the target user from those historical audio/video objects.
The method further includes:
acquiring a sample audio/video object from the audio/video resource platform;
acquiring sample label information corresponding to the sample audio/video object, the sample label information marking the attribute type of the sample audio/video object; and
training the classification model according to the mapping relation between the sample audio/video object and the sample label information.
Acquiring the sample label information corresponding to the sample audio/video object includes:
acquiring the sample name and sample type corresponding to the sample audio/video object; and
performing semantic analysis on the sample name and the sample type, and setting the sample label information for the sample audio/video object based on the semantic analysis result.
Training the classification model according to the mapping relation between the sample audio/video object and the sample label information includes:
generating a sample input feature corresponding to the sample audio/video object based on the sample name and the sample type;
inputting the sample input feature into the classification model, and acquiring a sample feature vector corresponding to the sample input feature in the classification model; and
training the classification model based on the error between the sample feature vector and the feature vector corresponding to the sample label information.
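Figs. 6a and 6b later depict this training flow; as an illustrative aside, a minimal sketch of such a training loop is given below. The architecture, loss function, optimizer and all tensor shapes are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 30-dim sample input features, 4 attribute type labels.
model = nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # error between sample feature vector and label

sample_features = torch.randn(8, 30)       # built from sample names and types
sample_labels = torch.randint(0, 4, (8,))  # attribute types marked by the label info

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(sample_features)        # sample feature vectors
    loss = loss_fn(logits, sample_labels)
    loss.backward()                        # train on the mapping relation
    optimizer.step()
```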
In one aspect, an embodiment of the present application provides a data processing apparatus, including:
a first acquisition module, configured to acquire a text object associated with a target user from a text resource platform, and acquire a plurality of initial characters in the historical release text corresponding to the text object;
a label selection module, configured to acquire the word frequency and the inverse document frequency corresponding to each initial character according to the historical release text, and select a text type label corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency;
a second acquisition module, configured to acquire a classification model based on an audio/video resource platform, the classification model being trained on the audio/video titles and audio/video type labels corresponding to a plurality of audio/video objects in the audio/video resource platform;
a similar label determining module, configured to acquire the label similarity between the text type label and a plurality of audio/video type labels, and determine, from the plurality of audio/video type labels and based on the label similarity, a similar audio/video type label corresponding to the text type label;
a generation module, configured to generate an input feature according to the text name, the text type label and the similar audio/video type label corresponding to the text object; and
a determining module, configured to input the input feature into the classification model, output a target audio/video type label corresponding to the input feature based on the classification model, and determine an audio/video recommended object corresponding to the target user according to the target audio/video type label, the audio/video recommended object being an audio/video object in the audio/video resource platform.
The first acquisition module includes:
a historical text acquisition unit, configured to acquire the historical release text associated with the text object within a target time range; and
a filtering unit, configured to filter the historical release text against a stop-word lexicon and divide the filtered historical release text into the plurality of initial characters.
The label selection module includes:
a first statistics unit, configured to count the number of occurrences of each initial character in the historical release text, and determine the word frequency corresponding to each initial character based on that number and the total number of characters in the historical release text;
a second statistics unit, configured to determine the number of documents corresponding to each initial character in a corpus, and determine the inverse document frequency corresponding to each initial character based on that number of documents and the total number of documents in the corpus; and
a weight value determining unit, configured to determine a weight value corresponding to each initial character according to the word frequency and the inverse document frequency, and select the text type label corresponding to the text object from the plurality of initial characters based on the weight values.
The similar label determining module includes:
a conversion unit, configured to divide the text type label and the plurality of audio/video type labels into unit characters and convert each unit character into a unit character vector;
a vector generation unit, configured to generate a first vector based on the unit character vectors corresponding to the text type label, and generate a second vector for each audio/video type label based on the unit character vectors corresponding to that label; and
a label similarity determining unit, configured to determine the label similarity between the first vector and each second vector, and determine the audio/video type label whose second vector has the greatest label similarity as the similar audio/video type label corresponding to the text type label.
The generation module includes:
a name acquisition unit, configured to acquire the text name corresponding to the text object and generate a third vector from the unit character vectors corresponding to the unit characters contained in the text name; and
a vector splicing unit, configured to splice the first vector, the second vector corresponding to the similar audio/video type label, and the third vector into the input feature.
The determining module includes:
an input unit, configured to input the input feature into the classification model and generate an attribute feature vector corresponding to the input feature in the classification model;
a matching degree determining unit, configured to acquire the matching degree between the attribute feature vector and each attribute type label in the classification model, and determine the attribute type label with the maximum matching degree as the target audio/video type label associated with the text object; and
a recommended object determining unit, configured to determine the audio/video recommended object corresponding to the target user according to the target audio/video type label.
The recommended object determining unit includes:
a user portrait determining subunit, configured to determine, according to the target audio/video type label, a target user portrait of the target user for the audio/video resource platform; and
a matching subunit, configured to determine an audio/video object in the audio/video resource platform that matches the target user portrait as the audio/video recommended object corresponding to the target user.
The matching subunit includes:
a user similarity obtaining subunit, configured to obtain, based on a recommendation model in the audio/video resource platform, the user similarity between the target user portrait and each sample user portrait, a sample user portrait being the user portrait corresponding to a registered user of the audio/video resource platform;
a similar user determining subunit, configured to determine the sample user portrait with the maximum user similarity as the similar user portrait corresponding to the target user portrait; and
a historical object acquisition subunit, configured to acquire, from the audio/video resource platform, the historical audio/video objects of the registered user corresponding to the similar user portrait, and determine the audio/video recommended object corresponding to the target user from those historical audio/video objects.
The apparatus further includes:
a sample object acquisition module, configured to acquire a sample audio/video object from the audio/video resource platform;
a sample label acquisition module, configured to acquire sample label information corresponding to the sample audio/video object, the sample label information marking the attribute type of the sample audio/video object; and
a training module, configured to train the classification model according to the mapping relation between the sample audio/video object and the sample label information.
The sample label acquisition module includes:
an information acquisition unit, configured to acquire the sample name and sample type corresponding to the sample audio/video object; and
a sample label setting unit, configured to perform semantic analysis on the sample name and the sample type, and set the sample label information for the sample audio/video object based on the semantic analysis result.
The training module includes:
a sample feature generating unit, configured to generate a sample input feature corresponding to the sample audio/video object based on the sample name and the sample type;
a sample input unit, configured to input the sample input feature into the classification model and obtain a sample feature vector corresponding to the sample input feature in the classification model; and
a model training unit, configured to train the classification model based on the error between the sample feature vector and the feature vector corresponding to the sample label information.
An aspect of an embodiment of the present application provides a computer device, including a processor and a memory, the memory storing a computer program which, when executed by the processor, causes the processor to perform the method of the above aspect.
An aspect of an embodiment of the present application provides a computer-readable storage medium storing a computer program comprising program instructions which, when executed by a processor, perform the method of the above aspect.
According to the embodiments of the application, a text object associated with a target user can be obtained from a text resource platform, and a plurality of initial characters can be obtained from the historical release text corresponding to that text object. A text type label corresponding to the text object is selected from the plurality of initial characters according to the word frequency and the inverse document frequency of each initial character in the historical release text; a similar audio/video type label is then determined according to the label similarity between the text type label and the audio/video type labels; an input feature is generated from the text name, the text type label and the similar audio/video type label; and the input feature is fed into a classification model for the audio/video resource platform, from which the audio/video recommended object of the target user in the audio/video resource platform can be determined. In this way, the text names and text types of resource objects in other resource platforms (namely, text resource platforms) can be used to determine audio/video recommended objects for a user in the audio/video resource platform, which improves both the diversity of recommendation modes and the accuracy of the recommended data.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a data processing scenario provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIGS. 4a and 4b are schematic diagrams illustrating a process for constructing a video user portrait according to an embodiment of the present application;
FIG. 5 is a flowchart of another data processing method according to an embodiment of the present application;
FIGS. 6a and 6b are schematic flow diagrams of training a classification model according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application. The network architecture may include a server 10d and a plurality of terminal devices (shown in fig. 1 as terminal device 10a, terminal device 10b and terminal device 10c), and the server 10d may exchange data with each terminal device through a network.
Taking the terminal device 10a as an example: when the terminal device 10a detects that the user is a cold-start user on a certain audio/video resource platform (i.e., the user has registered and logged in to the audio/video resource platform for the first time), it may send the user's information and cold-start state to the server 10d. After receiving this data, the server 10d may obtain text object information about the user from other text resource platforms (such as a public account information platform or a book information platform) based on the user information, classify that text object information, determine the audio/video objects to recommend to the user in the audio/video resource platform based on the classification result, and send the recommended audio/video objects to the terminal device 10a, so that the terminal device 10a can display them in its interface.
Of course, if the terminal device 10a integrates the data classification function, it may directly obtain the text object information that the user follows on other text resource platforms and determine recommended audio/video objects for the user in the audio/video resource platform based on the classification result of that information. The following describes how the terminal device 10a determines the audio/video recommended objects of the cold-start user in the audio/video resource platform. The terminal device 10a, the terminal device 10b, the terminal device 10c, and the like may include mobile phones, tablet computers, notebook computers, palmtop computers, mobile internet devices (MID), wearable devices (e.g., smart watches and smart bracelets), and the like.
Fig. 2 is a schematic diagram of a data processing scenario provided in an embodiment of the present application. As shown in fig. 2, when a user registers and logs in to a video platform for the first time, the user needs to enter user information on the registration interface 20a provided by the video platform to complete account registration; the user information may include the account name, the user's mobile phone number, a login password, and the like. After the user completes registration and successfully logs in, the terminal device 10a may detect that the user is a cold-start user and collect the user's information 20b (which may include the account and mobile phone number registered on the video platform). Based on the collected user information 20b, the terminal device 10a may obtain a text object 20d associated with the user information 20b from the resource platform 20c; the resource platform 20c may be a public account information platform, and the text object 20d may be a public account that the user follows on the resource platform 20c. It will be appreciated that the video platform and the resource platform 20c may be different functional modules of the same software application, or different software applications.
After acquiring the user's text object 20d from the resource platform 20c, that is, the public account the user follows on the public account information platform, the terminal device 10a may acquire the text name 20e corresponding to the text object 20d (e.g., "Eating World") and all article content published by that public account over a recent period (e.g., one week), and aggregate that content as the historical release text 20q. The terminal device 10a may split the historical release text 20q into a plurality of words (which may also be called initial characters), filter out common words (such as "is", "in", and the like), count the number of occurrences of each remaining word in the historical release text 20q, and determine the word frequency of each word based on those counts. The terminal device 10a may then determine a weight value for each word according to its word frequency and its frequency of use in everyday language (the larger the weight value, the more important the word is in the historical release text 20q), rank all the words by weight value in descending order, and take the first n words (n being a positive integer, e.g., n = 2) with the largest weight values as the text type labels 20f of the text object 20d. For example, if "snack" and "delicacy" have the highest weight values among all the words contained in the historical release text 20q, "snack" and "delicacy" are determined as the text type labels 20f corresponding to the text object 20d.
The terminal device 10a may obtain the video type labels corresponding to video objects on the video platform (a video type label may be set by the uploader when a video object is uploaded, or by the video platform operator), calculate the label similarity between the text type labels 20f and each video type label on the platform, and take the video type label with the maximum label similarity, "food", as the similar video type label 20r of the text object 20d. The terminal device 10a may then split the text name 20e, the text type labels 20f and the similar video type label 20r into the unit character set 20g: "eat", "world", "snack", "delicacy", "food". Since the historical release text 20q and the text name 20e are written in Chinese, and Chinese sentences contain no separators between words, the terminal device 10a also needs to apply a Chinese word segmentation algorithm to the historical release text 20q and the text name 20e. The Chinese word segmentation algorithm may be dictionary-based, statistics-based, or the like, and is not limited here.
Taking a statistics-based word segmentation algorithm as an example: the text name 20e is taken as input, a label sequence such as "BEBE" composed of the symbols "B", "E", "M" and "S" is output, and the text name 20e is then cut according to that sequence, yielding the unit characters of the text name 20e: "eat goods" and "world". Here B denotes the first character of a word, M a middle character, E the last character, and S a single-character word.
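As a minimal illustration of how a BEMS label sequence cuts a string into words, the following Python sketch may help; the function name and inputs are hypothetical.

```python
def cut_by_bems(text, tags):
    """Cut a string into words using a BEMS label sequence.

    B: first character of a word, M: middle, E: last, S: single-character word.
    A word ends wherever the label is E or S.
    """
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in ("E", "S"):          # word boundary
            words.append(text[start:i + 1])
            start = i + 1
    return words

# A four-character text name labeled "BEBE" yields two two-character words,
# matching the segmentation of text name 20e described above.
print(cut_by_bems("ABCD", "BEBE"))  # ['AB', 'CD']
```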
Since the unit character set 20g is natural-language text, the terminal device 10a may convert each unit character in the unit character set 20g into a word vector that a computer can process, that is, a numerical representation of the unit character: based on word embedding (Word Embedding), each unit character is converted into a vector representation of fixed length. The terminal device 10a may convert the unit character "eat" into the word vector 20h, "world" into the word vector 20i, "snack" into the word vector 20j, "delicacy" into the word vector 20k, and "food" into the word vector 20l. The word vectors corresponding to the unit characters in the unit character set 20g are spliced into the input feature corresponding to the text object 20d; the word vectors may be spliced in the order of the unit characters' positions in the text name 20e, the text type labels 20f, and the similar video type label 20r.
The terminal device 10a may obtain a classification model 20m, which can identify the video attribute type, i.e., video category, for the input feature; categories may include sports, food, comedy, children, and so on. The classification model 20m has been trained on video data from the video platform and can be used for classification; it may be a convolutional neural network (Convolutional Neural Network, CNN), a machine-learning classifier (e.g., a support vector machine, naive Bayes, or a random forest), or the like. The input feature is fed into the classification model 20m, which produces a corresponding attribute feature vector and computes the matching degree between that vector and each attribute type; the attribute type with the largest matching degree is determined as the video category corresponding to the text object 20d, namely, the food category. The terminal device 10a may then select recommended videos for the user (such as the video 20n and the video 20p) from all food videos on the video platform and display them on the video platform's display page so that the user can watch them. By recommending video data to the user on the video platform based on the public account information the user follows on the public account information platform, the diversity of video recommendation modes can be improved, and the accuracy of video recommendation for cold-start users can be improved at the same time.
Fig. 3 is a schematic flow chart of a data processing method according to an embodiment of the application. As shown in fig. 3, the data processing method may include the steps of:
step S101, acquiring a text object associated with a target user from a text resource platform, and acquiring a plurality of initial characters in a historical release text corresponding to the text object;
Specifically, the terminal device (the terminal device 10a in the embodiment of fig. 2 above) may obtain the text object (the text object 20d in the embodiment of fig. 2) associated with the target user from the text resource platform (the resource platform 20c in the embodiment of fig. 2). The text resource platform may be a public account information platform and/or a book information platform; the text object may be a public account the target user follows on the public account information platform, and/or a book followed on the book information platform, and so on.
The terminal device may acquire the historical release text associated with the text object within a target time range, filter the historical release text against a stop-word lexicon, and divide the filtered historical release text into a plurality of initial characters. Specifically, the terminal device may acquire the content published by the text object on the text resource platform together with each item's publication time, and aggregate the content whose publication time falls within the target time range as the historical release text corresponding to the text object. The target time range may be preset according to actual needs, for example, the last 3 days, the last week, or the last month, and is not specifically limited here. Of course, the terminal device may also aggregate all content the text object has published on the text resource platform into the historical release text. For example, where the text object is a public account the target user follows on a public account information platform, the historical release text may be all article content published by that public account in the last week.
After obtaining the historical release text, the terminal device can label its character sequence based on a hidden Markov model (Hidden Markov Model, HMM), segment the historical release text according to the label sequence to obtain a plurality of characters, and then filter the characters against the stop-word lexicon to obtain the plurality of initial characters corresponding to the historical release text. Alternatively, the terminal device may first filter the historical release text against the stop-word lexicon and then divide the filtered text into the initial characters. An HMM can be described by a five-tuple: the set of hidden states, the observation sequence, the initial probabilities of the hidden states, the probabilities of transitions between hidden states (transition probabilities), and the probabilities with which hidden states generate observed values (emission probabilities); the initial, transition and emission probabilities can be obtained from large-scale corpus statistics. Starting from the initial hidden state, the probability of each next hidden state is computed in turn over all transitions, and the hidden state sequence with the highest overall probability is taken as the labeling result (which may be called a BEMS label sequence). For example, for the historical release text "sports make me happy, sports make me healthy", the HMM may produce the label sequence BESSBEBESSBE. Since a sentence can only end in E or S, the cut is: BE/S/S/BE/BE/S/S/BE, so the text is segmented as: sports / make / me / happy, sports / make / me / healthy, giving the characters: sports, make, me, happy, sports, make, me, healthy. The stop-word lexicon contains common words with no substantive meaning, such as "is" and "can"; filtering "sports make me happy, sports make me healthy" against the stop-word lexicon leaves the initial characters: sports, happy, sports, healthy.
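A minimal sketch of segmentation followed by stop-word filtering is shown below. It uses the open-source jieba segmenter, whose default mode combines a dictionary with an HMM, as a stand-in; the patent does not name a specific tool, and the tiny stop-word lexicon is an illustrative assumption.

```python
import jieba  # open-source Chinese segmenter (dictionary + HMM); a stand-in choice

# A tiny illustrative stop-word lexicon; a real one would be far larger.
STOP_WORDS = {"使", "我", "的", "是", "，", "。"}

def initial_characters(history_text, stop_words=STOP_WORDS):
    """Segment the historical release text and drop stop words."""
    tokens = jieba.lcut(history_text, HMM=True)  # HMM-based BEMS segmentation
    return [t for t in tokens if t not in stop_words]

# "Sports make me happy, sports make me healthy"
print(initial_characters("运动使我快乐，运动使我健康"))
# expected: ['运动', '快乐', '运动', '健康']
```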
Step S102, acquiring the word frequency and the inverse document frequency corresponding to each initial character according to the historical release text, and selecting the text type label corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency;
Specifically, the terminal device may count the number of occurrences of each initial character in the historical release text, and determine the word frequency of each initial character based on that count and the total number of characters in the historical release text; determine the number of documents corresponding to each initial character in a corpus, and determine the inverse document frequency of each initial character based on that document count and the total number of documents in the corpus; and then determine a weight value for each initial character from its word frequency and inverse document frequency, and select the text type label corresponding to the text object from the initial characters based on the weight values. The corpus is used to simulate the language environment; that is, it contains documents in common use, such as the text documents of widely used domestic websites.
To extract keywords from the historical release text, the number of occurrences (i.e., the unit number) of each initial character in the text can be counted, and the ratio of that count to the total number of characters in the text is the word frequency of the initial character, that is: TF_i = n_i / N, where TF_i is the word frequency of the i-th initial character, n_i is its number of occurrences in the historical release text, and N is the total number of characters in the text. The larger the word frequency, the more likely the initial character is a keyword of the historical release text. In actual language use, however, common characters such as "we" appear very frequently; their counts in the historical release text may equal or even exceed those of the real keywords, which hinders keyword extraction. The terminal device can therefore take a corpus (used to simulate the language environment), count the number of documents in which each initial character appears, and take the logarithm of the ratio of the total number of documents to that count, obtaining the inverse document frequency: IDF_i = log(P / (P_i + 1)), where IDF_i is the inverse document frequency of the i-th initial character, P is the total number of documents in the corpus, and P_i is the number of corpus documents containing the i-th initial character. The more documents an initial character appears in, the more common it is and the lower its inverse document frequency. The terminal device may take the product of word frequency and inverse document frequency as the weight value of each initial character, sort all initial characters by weight value, determine the first n initial characters (n a positive integer, e.g., n = 3) with the largest weight values as the keywords of the historical release text, and use the extracted keywords as the text type labels of the text object.
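The weighting just described is standard TF-IDF. A minimal sketch, assuming the corpus is available as a list of pre-tokenized documents and using hypothetical names:

```python
import math
from collections import Counter

def tf_idf_labels(initial_chars, corpus_docs, n=3):
    """Rank initial characters by TF * IDF and return the top n as labels.

    initial_chars: tokens of the historical release text.
    corpus_docs:   one set of tokens per corpus document.
    """
    N = len(initial_chars)            # total characters in the text
    counts = Counter(initial_chars)   # n_i: occurrences per character
    P = len(corpus_docs)              # total documents in the corpus
    weights = {}
    for char, n_i in counts.items():
        tf = n_i / N                                        # TF_i = n_i / N
        P_i = sum(1 for doc in corpus_docs if char in doc)  # docs containing char
        weights[char] = tf * math.log(P / (P_i + 1))        # TF_i * IDF_i
    return sorted(weights, key=weights.get, reverse=True)[:n]
```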
Step S103, acquiring a classification model based on an audio/video resource platform; the classification model is trained on the audio/video titles and audio/video type labels corresponding to a plurality of audio/video objects in the audio/video resource platform;
Specifically, the terminal device may acquire a classification model (the classification model 20m in the embodiment of fig. 2) based on an audio/video resource platform, which may contain both video data and music data. The classification model may be trained on the video titles and video types of the video objects contained in the platform; on the audio titles and audio type labels of its audio objects; or on both together. It can be understood that the text resource platform and the audio/video resource platform may be different functional modules of the same software application, or different software applications. The target user may be a cold-start user of the audio/video resource platform (a user who registers and logs in for the first time), a user who has logged in only a few times (e.g., no more than 3 times in the last half year), or a user who has not logged in for a long time (e.g., whose last login was more than a year ago).
Step S104, acquiring the label similarity between the text type label and a plurality of audio/video type labels, and determining the similar audio/video type label corresponding to the text type label from the plurality of audio/video type labels based on the label similarity;
Specifically, the terminal device may obtain the audio/video type labels corresponding to a plurality of audio/video objects from the audio/video platform, calculate the label similarity between the text type label and each audio/video type label, and determine the audio/video type label with the maximum similarity as the similar audio/video type label corresponding to the text type label. The label similarity may be computed with the Euclidean distance, the Manhattan distance, the Minkowski distance, the cosine similarity, the Pearson correlation coefficient, and the like, without limitation here. Optionally, because the audio/video platform contains a large number of audio/video type labels, only a target number of them (a preset threshold, e.g., 100) may be obtained from the platform.
The terminal device may divide the text type label and the plurality of audio/video type labels into unit characters and convert each unit character into a unit character vector; generate a first vector based on the unit character vectors of the text type label, and a second vector for each audio/video type label based on its unit character vectors; determine the label similarity between the first vector and each second vector; and determine the audio/video type label whose second vector has the greatest label similarity as the similar audio/video type label corresponding to the text type label. The division of the labels into unit characters follows the text segmentation method of step S101 and is not repeated here. The terminal device may look up the one-hot code corresponding to each unit character in a character bag-of-words, which contains the unit characters of the text type label and the audio/video type labels together with their one-hot codes; a one-hot code is a vector containing a single 1 with all other entries 0. For example, suppose the unit characters of the labels are "snack", "delicious", "motion" and "dessert", and the character bag-of-words contains only these four. Then the one-hot code of the unit character "snack" can be expressed as [1, 0, 0, 0]; of "delicious" as [0, 1, 0, 0]; of "motion" as [0, 0, 1, 0]; and of "dessert" as [0, 0, 0, 1]. The one-hot code of each unit character can be used directly as its word vector, also called the unit character vector; the one-hot codes of the unit characters contained in the text type label are spliced into the first vector, and the one-hot codes of the unit characters contained in each audio/video type label are spliced into that label's second vector.
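A minimal sketch of building one-hot unit character vectors over a character bag-of-words and splicing them into label vectors; all names are hypothetical:

```python
def one_hot_codes(characters):
    """Assign a one-hot code to every unit character in the character bag-of-words."""
    vocab = sorted(set(characters))
    dim = len(vocab)
    return {c: [1 if j == i else 0 for j in range(dim)]
            for i, c in enumerate(vocab)}

def label_vector(label_chars, codes):
    """Splice the one-hot codes of a label's unit characters into one vector."""
    vec = []
    for c in label_chars:
        vec.extend(codes[c])
    return vec

codes = one_hot_codes(["snack", "delicious", "motion", "dessert"])
# First vector for a text type label made of the unit characters "snack", "delicious":
first_vector = label_vector(["snack", "delicious"], codes)  # 8-dimensional
```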
Alternatively: using the one-hot code directly as the word vector of a unit character cannot capture the relationships between unit characters (such as semantic relationships), and when the character bag-of-words contains many unit characters, the dimension of a one-hot word vector becomes very large, which is unfavorable for the operations in the subsequent classification model. The terminal device can therefore use a word vector conversion model to reduce the high-dimensional one-hot code to a low-dimensional word vector: the input one-hot code is multiplied by the weight matrix of the model's hidden layer, and the resulting vector is the word vector of the unit character, i.e., the unit character vector. The word vector conversion model may be obtained by training with word2vec or GloVe (word embedding tools); the number of rows of the weight matrix equals the dimension of the one-hot code, and the number of columns equals the dimension of the word vector (i.e., the unit character vector). For example, if the one-hot code of a unit character has size 1 × 100 and the weight matrix has size 100 × 10, the unit character vector has size 1 × 10.
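A sketch of this dimensionality reduction, with a randomly initialized weight matrix standing in for one learned by word2vec or GloVe training (an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 10))  # hidden-layer weights: 100-dim one-hot -> 10-dim vector

one_hot = np.zeros(100)
one_hot[42] = 1.0               # one-hot code of some unit character (index is arbitrary)

unit_char_vector = one_hot @ W  # multiplying selects row 42 of W: the 1 x 10 word vector
assert np.allclose(unit_char_vector, W[42])
```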
After the first vector of the text type label and the second vectors of the audio/video type labels are generated, the similarity between the first vector and each second vector, i.e., the label similarity between the text type label and each audio/video type label, can be calculated with a similarity measure, and the audio/video type label with the highest label similarity is determined as the similar audio/video type label of the text type label. For example, if the label similarity between the text type label and audio/video type label 1 is 0.01, with label 2 is 0.50, with label 3 is 0.95, and with label 4 is 0.33, then audio/video type label 3 is determined as the similar audio/video type label corresponding to the text type label.
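Taking cosine similarity as the measure, a minimal sketch of choosing the similar audio/video type label follows; it assumes the first and second vectors have been brought to a common length (for example by averaging rather than splicing unit character vectors):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similar_label_index(first_vector, second_vectors):
    """Index of the audio/video type label whose second vector is most similar."""
    sims = [cosine(first_vector, v) for v in second_vectors]
    return int(np.argmax(sims))

# With similarities 0.01, 0.50, 0.95 and 0.33 as in the example above,
# index 2 (audio/video type label 3) is returned.
```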
Step S105, generating an input feature according to the text name, the text type label and the similar audio/video type label corresponding to the text object;
Specifically, the terminal device may use the text name, the text type label and the similar audio/video type label corresponding to the text object as the input information of the classification model; before they are input, they must be turned into an input feature. In other words, the terminal device may acquire the text name corresponding to the text object, divide it into unit characters, and convert each unit character into a unit character vector; splice the unit character vectors of the text name into a third vector; and then splice the first vector of the text type label, the second vector of the similar audio/video type label, and the third vector into the input feature for the classification model.
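A one-function sketch of the splicing step, assuming the three vectors are NumPy arrays; the splicing order follows the wording above and is otherwise an assumption:

```python
import numpy as np

def build_input_feature(first_vector, second_vector, third_vector):
    """Splice the text type label vector (first), the similar audio/video type
    label vector (second) and the text name vector (third) into one input
    feature, in the order named in the text above."""
    return np.concatenate([first_vector, second_vector, third_vector])
```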
It should be noted that if there are multiple text objects, the text name, text type label and similar audio/video type label of each text object may be obtained, and an input feature may be generated for each of them based on those three items. In other words, each text object corresponds to one input feature.
Step S106, inputting the input feature into the classification model, outputting the target audio/video type label corresponding to the input feature based on the classification model, and determining the audio/video recommended objects corresponding to the target user according to the target audio/video type label; an audio/video recommended object is an audio/video object in the audio/video resource platform.
Specifically, after the input feature of the text object is generated, the terminal device may input it into the classification model, generate the corresponding attribute feature vector in the model, obtain the matching degree between that vector and each attribute type label in the model, determine the attribute type label with the maximum matching degree as the target audio/video type label associated with the text object, and determine the audio/video recommended objects of the target user according to that label; an audio/video recommended object is an audio/video object (audio and/or video) in the audio/video resource platform. In other words, inputting the input feature into the classification model yields multiple classification results, namely the matching degrees between the attribute feature vector and the attribute type labels; the largest of them is the final classification result, and its attribute type label is the target audio/video type label of the text object. From the target audio/video type label, a target user portrait of the target user for the audio/video resource platform can be constructed (the target user portrait may include the target user's information, preferences and similar attributes), and the audio/video objects in the platform matching this portrait are determined as the audio/video recommended objects of the target user (such as the video 20n and the video 20p in the embodiment of fig. 2). When the classification model is trained with the video titles and video type labels of video objects, the attribute types may at least include: comedy, sports, children, food, etc. A user portrait is a tagged user model abstracted from information such as the user's attributes, preferences and habits.
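A minimal sketch of this inference step is given below. The patent fixes no architecture (CNNs and classic classifiers are all allowed), so the small feed-forward network, the feature dimension and the label set are illustrative assumptions:

```python
import torch
import torch.nn as nn

LABELS = ["comedy", "sports", "children", "food"]  # illustrative attribute types

class TinyClassifier(nn.Module):
    """Stand-in for classification model 20m (architecture is an assumption)."""
    def __init__(self, feature_dim, num_labels):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_labels))

    def forward(self, x):
        return self.net(x)  # one matching degree per attribute type label

model = TinyClassifier(feature_dim=30, num_labels=len(LABELS))
input_feature = torch.randn(1, 30)            # spliced vectors from step S105
scores = model(input_feature)                 # matching degrees
target = LABELS[scores.argmax(dim=1).item()]  # maximum matching degree wins
```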
Optionally, the terminal device may obtain a recommendation model in the audio/video resource platform and, based on the recommendation model, obtain the user similarity between the target user portrait and the other user portraits in the audio/video resource platform (which may be called sample user portraits, i.e. the portraits of users who have recently logged in to the audio/video resource platform multiple times). The sample user portrait with the maximum user similarity is determined as the similar user portrait corresponding to the target user portrait; the historical audio/video objects of the user to whom the similar user portrait belongs (i.e. the resource objects that user is watching or has watched) are then obtained from the audio/video resource platform, and the audio/video recommended object of the target user is determined from these historical audio/video objects, for example, the historical audio/video objects of approximately the last 3 days are determined as the audio/video recommended objects corresponding to the target user.
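For illustration only, this portrait-matching step can be sketched as below; cosine similarity and the data layout are assumptions of the sketch, since the recommendation model is not limited to a particular similarity measure.

```python
import numpy as np

def recommend_for(target_portrait: np.ndarray,
                  sample_portraits: dict[str, np.ndarray],
                  watch_histories: dict[str, list[str]],
                  top_n: int = 10) -> list[str]:
    # User similarity between the target user portrait and each sample
    # user portrait (recently active users in the platform).
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    sims = {uid: cosine(target_portrait, p)
            for uid, p in sample_portraits.items()}
    similar_user = max(sims, key=sims.get)  # maximum user similarity
    # Recommend from that user's historical audio/video objects, most
    # recent first (e.g. objects watched within roughly the last 3 days).
    return watch_histories[similar_user][:top_n]
```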
Fig. 4a and fig. 4b are schematic views of a process for constructing a video user portrait according to an embodiment of the present application. Taking a text resource platform comprising public number information and book information as an example, the construction of a video user portrait is described below. As shown in fig. 4a, the terminal device may obtain the public numbers and books (i.e. text objects) followed by the target user in the public number information platform and the book information platform, obtain the titles (i.e. the text names of the text objects) and labels (i.e. the text type labels of the text objects) of these public numbers and/or books, perform a word segmentation operation on the titles and labels to obtain a plurality of unit characters, and convert each unit character into a unit character vector, thereby obtaining the input feature corresponding to each public number and the input feature corresponding to each book. The input feature corresponding to a public number is input into the classification model for classification, yielding video category 1 (i.e. the target audio/video type label) of that public number in the video platform; the input feature corresponding to a book is input into the classification model for classification, yielding video category 2 of that book in the video platform (video category 1 and video category 2 may be the same video category or different video categories). Based on video category 1 and video category 2 determined by the classification model, a video user portrait can be built for the target user, and recommended video data in the video platform (i.e. audio/video recommended objects) can then be determined for the target user.
As shown in fig. 4b, the terminal device 10a may obtain, from the text resource platform 30a, a text object 30b associated with user A (i.e. the target user, who may be a cold-start user in the video platform). The text object 30b may include the public number 30c followed by user A in the text resource platform 30a and the followed book 30d. The text name corresponding to public number 30c is: "the natural world of eating"; the text type label corresponding to public number 30c is: food public number; the similar video type label corresponding to public number 30c is: food. The text name corresponding to book 30d is: "humorous master"; the text type label corresponding to book 30d is: joke book; the similar video type label corresponding to book 30d is: funny. The terminal device 10a may perform word segmentation on the text name, text type label and similar video type label corresponding to public number 30c and convert each unit character obtained from the segmentation into a unit character vector, so as to obtain the input feature 30e corresponding to public number 30c; the same word segmentation operation is performed on the text name, text type label and similar video type label corresponding to book 30d, and each unit character obtained from the segmentation is converted into a unit character vector, so as to obtain the input feature 30f corresponding to book 30d. For the specific word segmentation operation, reference may be made to step S101 in the embodiment corresponding to fig. 3, and for the vector conversion process to step S104 in the embodiment corresponding to fig. 3, which are not repeated here.
The terminal device 10a may acquire a classification model 30g trained on video information in the audio/video platform (the video information including video titles and video type labels). The input feature 30e is input into the classification model 30g to obtain the classification result corresponding to the input feature 30e, based on which the video category of public number 30c in the audio/video platform is determined to be the food category; the input feature 30f is input into the classification model 30g to obtain the classification result corresponding to the input feature 30f, based on which the video category of book 30d in the audio/video platform is determined to be the funny category. Based on the determined food category and funny category, a user portrait A (i.e. the target user portrait) corresponding to user A may be constructed. Based on a recommendation model in the audio/video platform, the similarity between user portrait A and the remaining user portraits in the audio/video platform may be determined, and the terminal device 10a may determine the user portrait with the greatest similarity as the similar user portrait corresponding to user A, i.e. determine the user B to whom that portrait belongs as the similar user corresponding to user A. The historical video information 30h (i.e. historical resource data) corresponding to user B is obtained from the audio/video platform; the historical video information 30h may include video information that user B is watching or has watched, for example, the name corresponding to video information 1 is: "In summer, eat more mung beans: no porridge, no bean paste, still nutritious and delicious". The terminal device 10a may select videos from the historical video information 30h corresponding to user B as the recommended video data corresponding to user A, for example, take the 10 most recently watched videos in the historical video information 30h as the recommended videos for user A, and display the recommended videos (for example, video 30i, video 30j, etc.) on a display page of the audio/video platform. The video information in the historical video information 30h may be ordered based on user B's viewing time records.
Optionally, the classification model may instead be trained on music information in the audio/video resource platform (the music information including music titles and music types), in which case the multiple attribute types may at least include: light music, lyrical, rap, rock, and the like. The data processing procedure is the same as that described above and is not repeated here.
According to the embodiments of the application, a text object associated with a target user can be obtained from a text resource platform, and a plurality of initial characters in the historical release text corresponding to the text object can be acquired. According to the word frequency and inverse document frequency of each initial character in the historical release text, the text type label corresponding to the text object is selected from the plurality of initial characters; the similar audio/video type label corresponding to the text object is then determined according to the label similarity between the text type label and the audio/video type labels; an input feature is generated from the text name, text type label and similar audio/video type label corresponding to the text object; and by inputting the input feature into the classification model for the audio/video resource platform, the audio/video recommended object of the target user in the audio/video resource platform can be determined. In this way, through the text names and text types of resource objects in another resource platform (namely the text resource platform), corresponding audio/video recommended objects can be determined for the user in the audio/video resource platform, which improves both the diversity of the recommendation modes and the accuracy of the recommended data.
Fig. 5 is a flowchart of another data processing method according to an embodiment of the present application. As shown in fig. 5, the data processing method may include the steps of:
Step S201, acquiring a sample audio/video object from the audio/video resource platform;
Before the text object is classified by using the classification model, the classification model needs to be trained; the training process of the classification model is described in the following steps S201 to S203.
Specifically, the terminal device may obtain sample audio/video objects from the audio/video resource platform, that is, obtain the resource objects in the audio/video resource platform to be used as samples for training the classification model; a sample audio/video object may be an audio object and/or a video object. For example, all audio objects contained in the audio/video resource platform (the audio objects may be music objects) are acquired as sample audio/video objects; or all video objects contained in the audio/video resource platform are acquired as sample audio/video objects; or all video objects and all audio objects contained in the audio/video resource platform are acquired as sample audio/video objects.
Step S202, sample label information corresponding to the sample audio/video object is obtained; the sample tag information is used for marking the attribute type of the sample audio/video object;
Specifically, after the terminal device obtains the sample audio/video objects used to train the classification model, it needs to obtain the sample tag information corresponding to each sample audio/video object. The sample name and sample type corresponding to a sample audio/video object can be acquired, semantic analysis can be performed on the sample name and sample type, and sample tag information can be set for the sample audio/video object based on the semantic analysis result. The sample type may refer to the attribute information set for a resource object, when it is uploaded to the audio/video resource platform, by the uploader or by an operator of the audio/video resource platform.
Taking the video objects in the audio/video resource platform as an example, the sample audio/video objects may refer to all video information in the audio/video platform; the sample name of a sample audio/video object may refer to the title text of the video, and the sample type may refer to the tag text of the video, i.e. the video tags (such as dessert, joke, children, dishes, drinks, basketball, football, and the like) set for the video, based on its content, by a platform operator or by the uploader when the video is uploaded to the audio/video platform. The terminal device may divide all videos into a plurality of video categories based on the title text and tag text corresponding to the videos; a video category may be the same as the video tag set when the video was uploaded (i.e. the video tag corresponding to a video may serve as the sample tag information of the video) or may be different. If all videos in the audio/video platform are divided into the four categories "food", "fun", "children" and "sports", sample tag information needs to be reset for each video based on its title text and tag text.
Step S203, training a classification model according to the mapping relation between the sample audio/video object and the sample label information;
Specifically, the terminal device may train the classification model based on the mapping relationship between the sample audio/video objects and the sample tag information. In the model training process, the terminal device may generate the sample input feature corresponding to each sample audio/video object based on the sample name and sample type of that object, input each sample input feature into the classification model, and perform feature extraction on the sample input feature in the classification model to obtain the sample feature vector corresponding to the sample input feature; the model parameters of the classification model are continuously adjusted based on the error between the sample feature vector and the feature vector corresponding to the sample tag information until convergence is reached, at which point the training process is complete. For the process of generating the sample feature vector, reference may be made to the description of the attribute feature vector generation process in the embodiment corresponding to fig. 3, which is not repeated here.
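For illustration only, this training loop can be sketched as follows; the single tanh layer, the squared-error objective and the learning rate are assumptions of the sketch standing in for the actual model architecture.

```python
import numpy as np

def train_classifier(samples, n_labels: int, feat_dim: int,
                     lr: float = 0.05, tol: float = 2e-4,
                     max_epochs: int = 200):
    # samples: (sample input feature, one-hot label vector) pairs built
    # from each sample object's name/type and its sample tag information.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(n_labels, feat_dim))
    b = np.zeros(n_labels)
    for _ in range(max_epochs):
        total_err = 0.0
        for x, y in samples:
            feat = np.tanh(W @ x + b)        # sample feature vector
            err = feat - y                   # error vs. the label's vector
            total_err += float(err @ err)
            grad = err * (1.0 - feat ** 2)   # back-propagate through tanh
            W -= lr * np.outer(grad, x)      # continuously adjust parameters
            b -= lr * grad
        if total_err / len(samples) < tol:   # stop once converged
            break
    return W, b
```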
Fig. 6a and fig. 6b are schematic flow diagrams of training a classification model according to an embodiment of the application. Taking the video objects in the audio/video resource platform as an example, as shown in fig. 6a, the terminal device may obtain all video information in the audio/video platform, that is, obtain the title text and tag text corresponding to each video in the audio/video platform, and set sample tag information for each video based on its title text and tag text. As shown in fig. 6b, the audio/video platform contains n videos (n is a natural number), and each video may be used as sample data. The title text corresponding to video 1 is: "Potatoes with eggs: nutritious and satisfying", the tag text is: breakfast, and the sample tag information set for it is: food. The title text corresponding to video 2 is: "An effortless gymnastics performance; the girl is cool", the tag text is: gymnastics, and the sample tag information set for it is: sports. The title text corresponding to video 3 is: "Cake easily made at home, without any additives", the tag text is: dessert, and the sample tag information set for it is: food. The title text corresponding to video n is: "Cat and Mouse with Sichuan-dialect dubbing, hilarious from the start", the tag text is: creative dubbing, and the sample tag information set for it is: joke; and so on. The terminal device may perform word segmentation on the title text and tag text contained in each video sample to obtain a plurality of unit characters, convert each unit character into a vector representation, that is, convert each unit character into a unit character vector, and combine the converted unit character vectors to obtain the sample input feature corresponding to each video sample. The sample input features are input into the classification model, and the parameters in the classification model are adjusted based on the error between the actual output result obtained in the classification model (the sample feature vector) and the expected output result (i.e. the feature vector corresponding to the sample tag information).
Taking a convolutional neural network model as the classification model as an example: after the word segmentation, vector conversion and other operations are performed on each video sample to generate the corresponding sample input feature, the sample input feature is input into the convolutional neural network, and the sample feature vector corresponding to each video sample is computed in the forward pass through convolution and pooling operations (that is, feature extraction is performed on the input feature based on the convolution and pooling operations in the convolutional neural network). By classifying the sample feature vector, the actual output result corresponding to the video sample can be obtained; the sample tag information serves as the expected output result for the video sample. Back-propagation is performed according to the error between the actual output result and the expected output result, and the model parameters of each layer in the convolutional neural network are updated until the error between the actual output value and the expected output value of the network is sufficiently small (smaller than a set error value, such as 0.0002). Training is then complete, all parameter information of the convolutional neural network model is saved, and the convolutional neural network has a good classification capability.
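For illustration only, a convolutional variant can be sketched with PyTorch as below; the layer sizes, the optimizer and the use of cross-entropy in place of the raw output error are assumptions of this sketch, not details fixed by the embodiment.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, embed_dim: int = 64, num_classes: int = 4):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, 32, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)    # pooling operation
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, embed_dim, seq_len) of unit character vectors
        h = torch.relu(self.conv(x))           # convolution operation
        h = self.pool(h).squeeze(-1)           # pooled sample feature vector
        return self.fc(h)                      # classify the feature vector

def train(model: TextCNN, batches, max_epochs: int = 50,
          target_err: float = 2e-4) -> None:
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for x, y in batches:                   # y: sample tag indices
            opt.zero_grad()
            loss = loss_fn(model(x), y)        # actual vs. expected output
            loss.backward()                    # back-propagation of the error
            opt.step()                         # update each layer's parameters
            epoch_loss += loss.item()
        if epoch_loss / max(len(batches), 1) < target_err:
            break                              # error small enough; stop
    torch.save(model.state_dict(), "classifier.pt")  # save the parameters
```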
Alternatively, the classification model may be a support vector machine. Since a support vector machine is generally used to solve binary classification problems, if it is used to solve the video classification problem (a video may belong to one of multiple categories, so video classification is a multi-class problem), a one-versus-rest method may be used to construct the multi-class classifier.
For example, suppose the audio/video platform contains k video categories, that is, the sample tag information takes k values, and videos with the same sample tag information belong to the same category. When the terminal device treats the video samples of one category as the positive class, all the video samples of the remaining categories can be treated as the negative class, so k classifiers can be constructed from the k categories of video samples. If the sample video data fall into 4 video categories, namely categories 1, 2, 3 and 4, the training sets can be extracted as follows: (1) the video samples corresponding to category 1 are used as the positive sample set, and the video samples corresponding to categories 2, 3 and 4 are used as the negative sample set; (2) the video samples corresponding to category 2 are used as the positive sample set, and the video samples corresponding to categories 1, 3 and 4 are used as the negative sample set; (3) the video samples corresponding to category 3 are used as the positive sample set, and the video samples corresponding to categories 1, 2 and 4 are used as the negative sample set; (4) the video samples corresponding to category 4 are used as the positive sample set, and the video samples corresponding to categories 1, 2 and 3 are used as the negative sample set. Training is performed with these four training sets respectively, yielding four training result files and completing the training of the classifier. If the support vector machine is used as the classification model, the sample input feature corresponding to each video may be input into the classification model, and the output result corresponding to the sample input feature is output directly.
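For illustration only, this one-versus-rest construction is available off the shelf; the sketch below uses scikit-learn, and the synthetic stand-in features are assumptions of the example rather than the features described above.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))     # stand-in sample input features per video
y = rng.integers(1, 5, size=200)   # video categories 1, 2, 3 and 4

# OneVsRestClassifier fits one LinearSVC per category: the videos of
# category k form the positive sample set, all remaining videos the
# negative sample set, giving k classifiers for k categories.
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)

pred = clf.predict(X[:5])          # directly outputs the category
```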
Step S204, acquiring a text object associated with a target user from a text resource platform, and acquiring a plurality of initial characters in a historical release text corresponding to the text object;
Step S205, acquiring word frequency and inverse document frequency corresponding to each initial character according to the historical release text, and selecting text type labels corresponding to the text objects from the plurality of initial characters according to the word frequency and the inverse document frequency;
Step S206, a classification model based on an audio and video resource platform is obtained; the classification model is obtained by training based on audio and video titles and audio and video type labels corresponding to a plurality of audio and video objects in the audio and video resource platform;
Step S207, obtaining the label similarity between the text type label and a plurality of audio and video type labels, and determining similar audio and video type labels corresponding to the text type label from the plurality of audio and video type labels based on the label similarity;
Step S208, generating input features according to the text names, the text type labels and the similar audio and video type labels corresponding to the text objects;
Step S209, inputting the input features into the classification model, outputting target audio and video type labels corresponding to the input features based on the classification model, and determining audio and video recommended objects corresponding to the target users according to the target audio and video type labels; the audio and video recommended object is a resource object in the audio and video resource platform.
The specific implementation process of step S204 to step S209 may refer to the description of step S101 to step S106 in the embodiment corresponding to fig. 3, and will not be described herein.
According to the embodiments of the application, a text object associated with a target user can be obtained from a text resource platform, and a plurality of initial characters in the historical release text corresponding to the text object can be acquired. According to the word frequency and inverse document frequency of each initial character in the historical release text, the text type label corresponding to the text object is selected from the plurality of initial characters; the similar audio/video type label corresponding to the text object is then determined according to the label similarity between the text type label and the audio/video type labels; an input feature is generated from the text name, text type label and similar audio/video type label corresponding to the text object; and by inputting the input feature into the classification model for the audio/video resource platform, the audio/video recommended object of the target user in the audio/video resource platform can be determined. In this way, through the text names and text types of resource objects in another resource platform (namely the text resource platform), corresponding audio/video recommended objects can be determined for the user in the audio/video resource platform, which improves both the diversity of the recommendation modes and the accuracy of the recommended data.
Fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 7, the data processing apparatus 1 may include: the device comprises a first acquisition module 11, a label selection module 12, a second acquisition module 13, a similar label determination module 14, a generation module 15 and a determination module 16;
a first obtaining module 11, configured to obtain a text object associated with a target user in a text resource platform, and obtain a plurality of initial characters in a historical publication text corresponding to the text object;
The tag selection module 12 is configured to obtain a word frequency and an inverse document frequency corresponding to each initial character according to the historical published text, and select a text type tag corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency;
The second acquisition module 13 is used for acquiring a classification model based on the audio and video resource platform; the classification model is obtained by training based on audio and video titles and audio and video type labels corresponding to a plurality of audio and video objects in the audio and video resource platform;
the similar tag determining module 14 is configured to obtain tag similarities between the text type tag and a plurality of audio/video type tags, and determine similar audio/video type tags corresponding to the text type tag from the plurality of audio/video type tags based on the tag similarities;
the generating module 15 is configured to generate an input feature according to a text name corresponding to the text object, the text type tag, and the similar audio/video type tag;
the determining module 16 is configured to input the input feature to the classification model, output a target audio/video type tag corresponding to the input feature based on the classification model, and determine an audio/video recommended object corresponding to the target user according to the target audio/video type tag; the audio and video recommended object is a resource object in the audio and video resource platform.
The specific functional implementation manner of the first obtaining module 11, the tag selecting module 12, the second obtaining module 13, the similar tag determining module 14, the generating module 15, and the determining module 16 may refer to step S101 to step S106 in the embodiment corresponding to fig. 3, which are not described herein.
Referring also to fig. 7, the data processing apparatus 1 may further include: a sample object acquisition module 17, a sample tag acquisition module 18, a training module 19;
the sample object obtaining module 17 is configured to obtain a sample audio/video object from the audio/video resource platform;
The sample tag obtaining module 18 is configured to obtain sample tag information corresponding to the sample audio/video object; the sample tag information is used for marking the attribute type of the sample audio/video object;
and the training module 19 is used for training the classification model according to the mapping relation between the sample audio/video object and the sample label information.
The specific functional implementation manners of the sample object obtaining module 17, the sample tag obtaining module 18, and the training module 19 may refer to step S201 to step S203 in the embodiment corresponding to fig. 5, which are not described herein.
Referring to fig. 7, the first obtaining module 11 may include: a history text acquisition unit 111, a screening unit 112;
a history text obtaining unit 111, configured to obtain a history release text associated with the text object in a target time range;
and a screening unit 112, configured to filter the historical release text based on a stop-word lexicon, and divide the filtered historical release text into the plurality of initial characters.
The specific function implementation manner of the history text obtaining unit 111 and the filtering unit 112 may refer to step S101 in the embodiment corresponding to fig. 3, which is not described herein.
Referring also to fig. 7, the tag selection module 12 may include: a first statistical unit 121, a second statistical unit 122, and a weight value determination unit 123;
a first statistics unit 121, configured to respectively count a unit number of each initial character in the historical published text, and determine word frequencies corresponding to each initial character based on a total number of characters corresponding to the unit number and the historical published text;
A second statistics unit 122, configured to determine, from a corpus, a number of documents corresponding to each initial character, and determine, based on the number of documents and a total number of documents in the corpus, an inverse document frequency corresponding to each initial character;
and a weight value determining unit 123, configured to determine a weight value corresponding to each initial character according to the word frequency and the inverse document frequency, and select a text type tag corresponding to the text object from the plurality of initial characters based on the weight value.
The specific functional implementation manner of the first statistics unit 121, the second statistics unit 122, and the weight value determining unit 123 may refer to step S102 in the embodiment corresponding to fig. 3, and will not be described herein.
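For illustration only, the tf-idf weighting performed by these three units can be sketched as follows; the logarithm base and the +1 smoothing of the document count are assumptions of the sketch.

```python
import math
from collections import Counter

def tag_weights(initial_chars: list[str],
                corpus: list[set[str]]) -> dict[str, float]:
    counts = Counter(initial_chars)
    total_chars = len(initial_chars)   # total characters of the history text
    total_docs = len(corpus)           # total documents in the corpus
    weights = {}
    for ch, n in counts.items():
        tf = n / total_chars           # word frequency: unit number / total
        doc_n = sum(1 for doc in corpus if ch in doc)
        idf = math.log(total_docs / (1 + doc_n))  # inverse document frequency
        weights[ch] = tf * idf         # weight value of the initial character
    # The initial character(s) with the highest weight value become the
    # text type tag.
    return weights
```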
Referring also to fig. 7, the similar tag determination module 14 may include: a conversion unit 141, a vector generation unit 142, a tag similarity determination unit 143;
A conversion unit 141, configured to divide the text type tag and the plurality of audio/video type tags into a plurality of unit characters, and convert each unit character into a unit character vector;
The vector generating unit 142 is configured to generate a first vector based on the unit character vector corresponding to the text type tag, and generate a second vector corresponding to each audio/video type tag based on the unit character vector corresponding to each audio/video type tag;
a tag similarity determining unit 143, configured to determine a tag similarity between the first vector and each second vector, and determine an audio/video type tag corresponding to the second vector with the largest tag similarity as the similar audio/video type tag corresponding to the text type tag.
The specific functional implementation manner of the conversion unit 141, the vector generation unit 142, and the tag similarity determination unit 143 may refer to step S104 in the embodiment corresponding to fig. 3, which is not described herein.
Referring to fig. 7, the generating module 15 may include: a name acquisition unit 151, a vector concatenation unit 152;
a name obtaining unit 151, configured to obtain the text name corresponding to the text object, and generate a third vector according to unit character vectors corresponding to a plurality of unit characters included in the text name;
And a vector stitching unit 152, configured to stitch the first vector, the second vector corresponding to the similar audio/video type tag, and the third vector into the input feature.
The specific function implementation manner of the name obtaining unit 151 and the vector stitching unit 152 may refer to step S105 in the embodiment corresponding to fig. 3, which is not described herein.
Referring also to fig. 7, the determining module 16 may include: an input unit 161, a matching degree determination unit 162, a recommendation object determination unit 163;
an input unit 161, configured to input the input feature into the classification model, and generate an attribute feature vector corresponding to the input feature in the classification model;
A matching degree determining unit 162, configured to obtain matching degrees between the attribute feature vector and multiple attribute type tags in the classification model, and determine an attribute type tag corresponding to the maximum matching degree as a target audio/video type tag associated with the text object;
And a recommended object determining unit 163, configured to determine the audio and video recommended object corresponding to the target user according to the target audio and video type tag.
The specific function implementation manner of the input unit 161, the matching degree determining unit 162, and the recommended object determining unit 163 may refer to step S106 in the embodiment corresponding to fig. 3, which is not described herein.
Referring also to fig. 7, the sample tag acquisition module 18 may include: an information acquisition unit 181, a sample tag setting unit 182;
An information obtaining unit 181, configured to obtain a sample name and a sample type corresponding to the sample audio/video object;
And a sample tag setting unit 182, configured to perform semantic analysis on the sample name and the sample type, and set the sample tag information for the sample audio/video object based on a semantic analysis result.
The specific functional implementation manner of the information obtaining unit 181 and the sample tag setting unit 182 may refer to step S202 in the embodiment corresponding to fig. 5, which is not described herein.
Referring also to fig. 7, the training module 19 may include: a sample feature generation unit 191, a sample input unit 192, a model training unit 193;
a sample feature generating unit 191, configured to generate a sample input feature corresponding to the sample audio/video object based on the sample name and the sample type;
A sample input unit 192, configured to input the sample input feature to the classification model, and obtain a sample feature vector corresponding to the sample input feature in the classification model;
a model training unit 193 for training the classification model based on an error between the sample feature vector and a feature vector corresponding to the sample tag information.
The specific functional implementation manner of the sample feature generating unit 191, the sample input unit 192, and the model training unit 193 may refer to step S203 in the embodiment corresponding to fig. 5, which is not described herein.
Referring to fig. 7 together, the recommended object determining unit 163 may include: a user representation determination sub-unit 1631, a matching sub-unit 1632;
a user portrait determining sub-unit 1631, configured to determine, according to the target audio/video type tag, a target user portrait of the target user for the audio/video resource platform;
and the matching subunit 1632 is configured to determine, from the audio and video resource platform, an audio and video object that matches the portrait of the target user as an audio and video recommended object corresponding to the target user.
The specific functional implementation manner of the user portrait determining subunit 1631 and the matching subunit 1632 may refer to step S106 in the embodiment corresponding to fig. 3, which is not described herein.
Referring also to fig. 7, the matching sub-unit 1632 may include: a user similarity acquisition sub-unit 16321, a similar user determination sub-unit 16322, a history object acquisition sub-unit 16323;
a user similarity obtaining subunit 16321, configured to obtain, based on a recommendation model in the audio and video resource platform, a similarity between the target user portrait and a sample user portrait; the sample user portrait is a user portrait corresponding to a registered user in the audio and video resource platform;
A similar user determination sub-unit 16322, configured to determine the sample user portrait with the greatest similarity as a similar user portrait corresponding to the target user portrait;
a historical object obtaining sub-unit 16323, configured to obtain, from the audio and video resource platform, a historical audio and video object of a registered user corresponding to the similar user portrait, and determine the audio and video recommended object corresponding to the target user from the historical audio and video object.
The specific functional implementation manner of the user similarity obtaining subunit 16321, the similar user determining subunit 16322, and the history object obtaining subunit 16323 may refer to step S106 in the embodiment corresponding to fig. 3, which is not described herein.
According to the embodiments of the application, a text object associated with a target user can be obtained from a text resource platform, and a plurality of initial characters in the historical release text corresponding to the text object can be acquired. According to the word frequency and inverse document frequency of each initial character in the historical release text, the text type label corresponding to the text object is selected from the plurality of initial characters; the similar audio/video type label corresponding to the text object is then determined according to the label similarity between the text type label and the audio/video type labels; an input feature is generated from the text name, text type label and similar audio/video type label corresponding to the text object; and by inputting the input feature into the classification model for the audio/video resource platform, the audio/video recommended object of the target user in the audio/video resource platform can be determined. In this way, through the text names and text types of resource objects in another resource platform (namely the text resource platform), corresponding audio/video recommended objects can be determined for the user in the audio/video resource platform, which improves both the diversity of the recommendation modes and the accuracy of the recommended data.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 1000 may include: a processor 1001, a network interface 1004 and a memory 1005; in addition, the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in fig. 8, the memory 1005, which is a type of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
In the computer device 1000 shown in FIG. 8, the network interface 1004 may provide network communication functions; while user interface 1003 is primarily used as an interface for providing input to a user; and the processor 1001 may be used to invoke a device control application stored in the memory 1005 to implement:
acquiring a text object associated with a target user from a text resource platform, and acquiring a plurality of initial characters in a historical release text corresponding to the text object;
Acquiring word frequency and inverse document frequency corresponding to each initial character according to the historical release text, and selecting a text type label corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency;
Acquiring a classification model based on an audio and video resource platform; the classification model is obtained by training based on audio and video titles and audio and video type labels corresponding to a plurality of audio and video objects in the audio and video resource platform;
Acquiring the label similarity between the text type label and a plurality of audio and video type labels, and determining similar audio and video type labels corresponding to the text type label from the plurality of audio and video type labels based on the label similarity;
generating input features according to the text names, the text type labels and the similar audio and video type labels corresponding to the text objects;
Inputting the input features into the classification model, outputting target audio and video type labels corresponding to the input features based on the classification model, and determining audio and video recommended objects corresponding to the target users according to the target audio and video type labels; the audio and video recommended object is a resource object in the audio and video resource platform.
It should be understood that the computer device 1000 described in the embodiment of the present application may perform the description of the data processing method in any of the embodiments corresponding to fig. 3 and 5, and may also perform the description of the data processing apparatus 1 in the embodiment corresponding to fig. 7, which is not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.
Furthermore, it should be noted that an embodiment of the present application also provides a computer-readable storage medium, in which the computer program executed by the aforementioned data processing apparatus 1 is stored. The computer program includes program instructions which, when executed by a processor, can perform the data processing method described in any of the embodiments corresponding to fig. 3 and fig. 5, which is therefore not repeated here. Likewise, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, reference is made to the description of the method embodiments of the present application.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (13)

1. A method of data processing, comprising:
When the target user is detected to be a cold start user in the audio and video resource platform, acquiring a text object associated with the target user in the text resource platform, and acquiring a plurality of initial characters in a historical release text corresponding to the text object; the audio and video resource platform and the text resource platform are different functional modules in the same software application, or the audio and video resource platform and the text resource platform are different software applications;
Acquiring word frequency and inverse document frequency corresponding to each initial character according to the historical release text, and selecting a text type label corresponding to the text object from the plurality of initial characters according to the word frequency and the inverse document frequency;
Acquiring a classification model based on an audio and video resource platform; the classification model is obtained by training based on audio and video titles and audio and video type labels corresponding to a plurality of audio and video objects in the audio and video resource platform;
Acquiring the label similarity between the text type label and a plurality of audio and video type labels, and determining similar audio and video type labels corresponding to the text type label from the plurality of audio and video type labels based on the label similarity;
generating input features according to the text names, the text type labels and the similar audio and video type labels corresponding to the text objects;
Inputting the input features into the classification model, outputting a target audio and video type label corresponding to the input features based on the classification model, constructing a target user portrait of the target user in an audio and video resource platform according to the target audio and video type label, and determining an audio and video object matched with the target user portrait in the audio and video resource platform as an audio and video recommended object corresponding to the target user; the target user portrait refers to a labeled user model of the target user in the audio and video resource platform.
2. The method of claim 1, wherein the obtaining the plurality of initial characters in the historical publication text corresponding to the text object comprises:
acquiring a historical release text associated with the text object in a target time range;
and screening the historical release text based on a stop-word lexicon, and dividing the screened historical release text into the plurality of initial characters.
3. The method according to claim 2, wherein the obtaining, according to the history published text, a word frequency and an inverse document frequency corresponding to each initial character, and selecting, according to the word frequency and the inverse document frequency, a text type tag corresponding to the text object from the plurality of initial characters, includes:
Respectively counting the unit number of each initial character in the historical release text, and determining word frequency respectively corresponding to each initial character based on the unit number and the total number of characters corresponding to the historical release text;
determining the number of documents corresponding to each initial character from a corpus, and determining the inverse document frequency corresponding to each initial character based on the number of documents and the total number of documents in the corpus;
and determining a weight value corresponding to each initial character according to the word frequency and the inverse document frequency, and selecting a text type label corresponding to the text object from the plurality of initial characters based on the weight value.
4. The method of claim 1, wherein the obtaining the tag similarity between the text type tag and the plurality of audio and video type tags, and determining, from the plurality of audio and video type tags, a similar audio and video type tag corresponding to the text type tag based on the tag similarity, comprises:
Dividing the text type tag and the audio/video type tags into a plurality of unit characters respectively, and converting each unit character into a unit character vector;
Generating a first vector based on the unit character vector corresponding to the text type tag, and generating a second vector corresponding to each audio-video type tag based on the unit character vector corresponding to each audio-video type tag;
and determining the label similarity between the first vector and each second vector, and determining the audio-video type label corresponding to the second vector with the largest label similarity as the similar audio-video type label corresponding to the text type label.
5. The method of claim 4, wherein generating the input feature according to the text name, the text type tag, and the similar audio-video type tag corresponding to the text object comprises:
acquiring the text name corresponding to the text object, and generating a third vector according to unit character vectors respectively corresponding to a plurality of unit characters contained in the text name;
And splicing the first vector, the second vector corresponding to the similar audio and video type tag and the third vector into the input feature.
6. The method of claim 1, wherein the inputting the input feature into the classification model and outputting the target audio-video type tag corresponding to the input feature based on the classification model comprises:
inputting the input features into the classification model, and generating attribute feature vectors corresponding to the input features in the classification model;
And acquiring the matching degree between the attribute feature vector and various attribute type labels in the classification model, and determining the attribute type label corresponding to the maximum matching degree as a target audio/video type label associated with the text object.
7. The method of claim 1, wherein determining the audio-video object in the audio-video resource platform that matches the target user representation as the audio-video recommended object corresponding to the target user comprises:
Based on a recommendation model in the audio and video resource platform, obtaining user similarity between the target user portrait and the sample user portrait; the sample user portrait is a user portrait corresponding to a registered user in the audio and video resource platform;
determining the sample user portrait with the maximum user similarity as a similar user portrait corresponding to the target user portrait;
And acquiring a historical audio and video object of the registered user corresponding to the similar user portrait from the audio and video resource platform, and determining the audio and video recommended object corresponding to the target user from the historical audio and video object.
8. The method as recited in claim 1, further comprising:
Acquiring a sample audio and video object from the audio and video resource platform;
acquiring sample tag information corresponding to the sample audio/video object; the sample tag information is used for marking the attribute type of the sample audio/video object;
And training the classification model according to the mapping relation between the sample audio/video object and the sample label information.
9. The method of claim 8, wherein the obtaining sample tag information corresponding to the sample audio-video object comprises:
Acquiring a sample name and a sample type corresponding to the sample audio and video object;
And carrying out semantic analysis on the sample name and the sample type, and setting the sample label information for the sample audio/video object based on a semantic analysis result.
10. The method of claim 9, wherein training the classification model based on the mapping between the sample audio video object and the sample tag information comprises:
Generating a sample input feature corresponding to the sample audio-video object based on the sample name and the sample type;
Inputting the sample input features into the classification model, and acquiring sample feature vectors corresponding to the sample input features in the classification model;
And training the classification model based on errors between the sample feature vectors and feature vectors corresponding to the sample label information.
11. A data processing apparatus, comprising:
The first acquisition module is used for acquiring a text object associated with a target user from the text resource platform when the target user is detected to be a cold start user in the audio and video resource platform, and acquiring a plurality of initial characters in a history release text corresponding to the text object; the audio and video resource platform and the text resource platform are different functional modules in the same software application, or the audio and video resource platform and the text resource platform are different software applications;
the tag selection module is used for acquiring word frequency and inverse document frequency corresponding to each initial character according to the historical release text, and selecting text type tags corresponding to the text objects from the plurality of initial characters according to the word frequency and the inverse document frequency;
The second acquisition module is used for acquiring a classification model based on the audio and video resource platform; the classification model is obtained by training based on audio and video titles and audio and video type labels corresponding to a plurality of audio and video objects in the audio and video resource platform;
The similar tag determining module is used for obtaining tag similarity between the text type tag and a plurality of audio/video type tags, and determining similar audio/video type tags corresponding to the text type tag from the plurality of audio/video type tags based on the tag similarity;
The generation module is used for generating input characteristics according to the text names, the text type labels and the similar audio and video type labels corresponding to the text objects;
The determining module is used for inputting the input features into the classification model, outputting target audio and video type labels corresponding to the input features based on the classification model, constructing target user portraits of the target users in the audio and video resource platform according to the target audio and video type labels, and determining audio and video objects matched with the target user portraits in the audio and video resource platform as audio and video recommended objects corresponding to the target users; the target user portrait refers to a labeled user model of the target user in the audio and video resource platform.
12. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 10.
13. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the steps of the method according to any of claims 1 to 10.
CN201910925299.4A 2019-09-27 2019-09-27 Data processing method, device, computer equipment and readable storage medium Active CN110598011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910925299.4A CN110598011B (en) 2019-09-27 2019-09-27 Data processing method, device, computer equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110598011A CN110598011A (en) 2019-12-20
CN110598011B true CN110598011B (en) 2024-05-28

Family

ID=68864232

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125460B (en) * 2019-12-24 2022-02-25 腾讯科技(深圳)有限公司 Information recommendation method and device
CN111177569B (en) * 2020-01-07 2021-06-11 腾讯科技(深圳)有限公司 Recommendation processing method, device and equipment based on artificial intelligence
CN111261167B (en) * 2020-01-16 2023-05-30 广州荔支网络技术有限公司 Automatic label generation method for audio hot content
CN111368138A (en) * 2020-02-10 2020-07-03 北京达佳互联信息技术有限公司 Method and device for sorting video category labels, electronic equipment and storage medium
CN111291688B (en) * 2020-02-12 2023-07-14 咪咕文化科技有限公司 Video tag acquisition method and device
CN111310065A (en) * 2020-02-18 2020-06-19 深圳市随手金服信息科技有限公司 Social contact recommendation method and device, server and storage medium
CN111382352B (en) * 2020-03-02 2021-03-26 腾讯科技(深圳)有限公司 Data recommendation method and device, computer equipment and storage medium
CN111401448B (en) * 2020-03-16 2024-05-24 中科天玑数据科技股份有限公司 Transaction platform classification method and device
CN111368141B (en) * 2020-03-18 2023-06-02 腾讯科技(深圳)有限公司 Video tag expansion method, device, computer equipment and storage medium
CN111274442B (en) * 2020-03-19 2023-10-27 聚好看科技股份有限公司 Method for determining video tag, server and storage medium
CN111695422B (en) * 2020-05-06 2023-08-18 Oppo(重庆)智能科技有限公司 Video tag acquisition method and device, storage medium and server
CN111625716B (en) * 2020-05-12 2023-10-31 聚好看科技股份有限公司 Media asset recommendation method, server and display device
CN111611388A (en) * 2020-05-29 2020-09-01 北京学之途网络科技有限公司 Account classification method, device and equipment
CN111708884A (en) * 2020-06-02 2020-09-25 上海硬通网络科技有限公司 Text classification method and device and electronic equipment
CN111738800B (en) * 2020-06-30 2021-09-28 腾讯科技(深圳)有限公司 Data recommendation method and device, computer equipment and storage medium
CN111898031B (en) * 2020-08-14 2024-04-05 腾讯科技(深圳)有限公司 Method and device for obtaining user portrait
CN112328746A (en) * 2020-11-06 2021-02-05 广东智源机器人科技有限公司 Dish label warehousing method and device, computer equipment and storage medium
CN113064679B (en) * 2020-12-06 2021-12-31 曲建波 User-set character size adaptive matching system
CN112818218B (en) * 2021-01-21 2023-06-27 中国平安人寿保险股份有限公司 Information recommendation method, device, terminal equipment and computer readable storage medium
CN113515696A (en) * 2021-05-17 2021-10-19 上海众源网络有限公司 Recommendation method and device, electronic equipment and storage medium
CN113722535B (en) * 2021-09-02 2022-07-26 掌阅科技股份有限公司 Method for generating book recommendation video, electronic device and computer storage medium
CN113688951B (en) * 2021-10-25 2022-01-21 腾讯科技(深圳)有限公司 Video data processing method and device
CN113986189A (en) * 2021-10-27 2022-01-28 北京乐驾科技有限公司 Method and device for dynamically adjusting navigation volume of vehicle-mounted terminal
CN115859128B (en) * 2023-02-23 2023-05-09 成都瑞安信信息安全技术有限公司 Analysis method and system based on interaction similarity of archive data
CN117540101B (en) * 2023-12-04 2024-06-04 深圳市二一教育科技有限责任公司 Online bookend management method and system based on artificial intelligence

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984437A * 2010-11-23 2011-03-09 亿览在线网络技术(北京)有限公司 Personalized music resource recommendation method and system
CN105138627A (en) * 2015-08-18 2015-12-09 耿懿超 Music playing method and music playing system based on shopping site
CN108595660A (en) * 2018-04-28 2018-09-28 腾讯科技(深圳)有限公司 Label information generation method, device, storage medium and the equipment of multimedia resource
CN109299290A * 2018-12-07 2019-02-01 广东小天才科技有限公司 A knowledge-graph-based background music recommendation method and electronic device
CN109582822A * 2018-10-19 2019-04-05 百度在线网络技术(北京)有限公司 A music recommendation method and device based on user speech
CN110134820A * 2019-04-26 2019-08-16 湖南大学 A feature-increment-based hybrid personalized music recommendation method
CN110263242A (en) * 2019-01-04 2019-09-20 腾讯科技(深圳)有限公司 Content recommendation method, device, computer readable storage medium and computer equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100822376B1 (en) * 2006-02-23 2008-04-17 삼성전자주식회사 Method and system for classifying music theme using title of music

Also Published As

Publication number Publication date
CN110598011A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110598011B (en) Data processing method, device, computer equipment and readable storage medium
US11893071B2 (en) Content recommendation method and apparatus, electronic device, and storage medium
Min et al. Being a supercook: Joint food attributes and multimodal content modeling for recipe retrieval and exploration
Sharghi et al. Query-focused extractive video summarization
US20230009814A1 (en) Method for training information recommendation model and related apparatus
CN109271518B (en) Method and equipment for classified display of microblog information
CN110019794B (en) Text resource classification method and device, storage medium and electronic device
US10311479B2 (en) System for producing promotional media content and method thereof
US12001474B2 (en) Information determining method and apparatus, computer device, and storage medium
CN110325986A (en) Article processing method, device, server and storage medium
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
Merler et al. Snap, Eat, RepEat: A food recognition engine for dietary logging
Abousaleh et al. Multimodal deep learning framework for image popularity prediction on social media
CN103207917B Method for marking multimedia content, and method and system for generating content recommendations
CN107562939A Vertical-domain news recommendation method, apparatus and readable storage medium
CN110503162A A media information popularity prediction method, device and equipment
CN113766281A (en) Short video recommendation method, electronic device and computer-readable storage medium
Li et al. Learning latent multi-criteria ratings from user reviews for recommendations
CN113204624A (en) Multi-feature fusion text emotion analysis model and device
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN115168568B (en) Data content identification method, device and storage medium
CN113536785B (en) Text recommendation method, intelligent terminal and computer readable storage medium
CN112445921A (en) Abstract generation method and device
CN116610871B (en) Media data recommendation method, device, computer equipment and storage medium
CN109460483B (en) Automatic picture news cover selection method based on deep attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40018770)
SE01 Entry into force of request for substantive examination
GR01 Patent grant