CN110852047A

CN110852047A - Text score method, device and computer storage medium

Info

Publication number: CN110852047A
Application number: CN201911089616.XA
Authority: CN
Inventors: 缪畅宇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2020-02-28

Abstract

The embodiment of the application discloses a text score method, a text score device and a computer storage medium. The method relates to the natural language processing direction in the field of artificial intelligence and comprises the following steps: obtaining a sample text and multi-dimensional sample characteristic information corresponding to the sample text; predicting multi-dimensional user feedback information of a browsing user for the sample text based on a text score model and the sample characteristic information; obtaining the loss corresponding to the user feedback information of each dimension based on the sample characteristic information and the user feedback information; training the text score model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model; and predicting the target score of a text to be scored based on the trained text score model. The scheme uses the multi-dimensional sample characteristic information corresponding to the sample text as model input and sets a plurality of optimization targets, thereby improving the accuracy of the text score.

Description

Text score method, device and computer storage medium

Technical Field

The application relates to the technical field of computers, in particular to a text score method, a text score device and a computer storage medium.

Background

When a user reads paragraphs and articles or chats interactively in a chat application, playing suitable background music can create a good reading experience and can noticeably increase the user's reading time, number of interactions, and the like. At present, however, the author must personally select background music suited to the text or the user when editing the text, which is costly, and the selected background music is not necessarily accurate.

Disclosure of Invention

The embodiment of the application provides a text score method, a text score device and a computer storage medium, which can improve the accuracy of the text score.

The embodiment of the application provides a text score method, which comprises the following steps:

acquiring a sample text and multi-dimensional sample characteristic information corresponding to the sample text;

predicting multi-dimensional user feedback information of a browsing user for the sample text based on a text score model and the sample characteristic information;

based on the sample characteristic information and the user feedback information, obtaining the loss corresponding to the user feedback information of each dimension;

training the text score model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model;

and predicting the target score of the text to be scored based on the trained text score model.
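The loss step above can be sketched as follows. This is a minimal illustration only: it assumes a mean squared error per feedback dimension and an equally weighted sum as the training objective. The loss form, the dimension names, and the functions `per_dimension_loss` and `total_loss` are hypothetical and are not specified by the application.

```python
from typing import Dict, List

# Hypothetical names for the feedback dimensions used as optimization targets.
DIMENSIONS = ["browse_duration", "comment_count", "appreciation_amount", "share_count"]


def per_dimension_loss(
    predicted: List[Dict[str, float]],
    observed: List[Dict[str, float]],
) -> Dict[str, float]:
    """Mean squared error for each feedback dimension over a batch of samples."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and equally long")
    losses = {}
    for dim in DIMENSIONS:
        squared = [(p[dim] - o[dim]) ** 2 for p, o in zip(predicted, observed)]
        losses[dim] = sum(squared) / len(squared)
    return losses


def total_loss(losses: Dict[str, float]) -> float:
    """Sum the per-dimension losses into one objective (equal weights assumed)."""
    return sum(losses.values())
```

Keeping one loss per dimension makes each optimization target visible during training; how the losses are combined is a design choice left open here.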

Correspondingly, the embodiment of the present application further provides a text score device, including:

the acquisition module is used for acquiring a sample text and multi-dimensional sample characteristic information corresponding to the sample text;

the first prediction module is used for predicting multi-dimensional user feedback information of a browsing user for the sample text based on a text score model and the sample characteristic information;

the loss obtaining module is used for obtaining the loss corresponding to the user feedback information of each dimension based on the sample characteristic information and the user feedback information;

the training module is used for training the text score model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model;

and the second prediction module is used for predicting the target score of the text to be scored based on the trained text score model.

Optionally, in some embodiments, the acquisition module may include an acquisition submodule and an extraction submodule, as follows:

the acquisition submodule is used for acquiring a sample text and a plurality of sample score information corresponding to the sample text;

and the extraction submodule is used for extracting features of the sample text and the sample score information to obtain multi-dimensional sample characteristic information.

At this time, the extraction submodule may be specifically configured to extract sample score feature information corresponding to the sample score information based on a preset database, and to extract features of the sample text to obtain sample text feature information corresponding to the sample text.

Optionally, in some embodiments, the first prediction module may include a first prediction sub-module, a second prediction sub-module, and a fusion sub-module, as follows:

the first prediction sub-module is used for predicting attribute prediction information of a browsing user aiming at the sample text based on the linear sub-model and the sample attribute information;

the second prediction submodule is used for predicting label prediction information of a browsing user aiming at the sample text based on the deep neural network submodel and the sample label information;

and the fusion submodule is used for fusing the attribute prediction information and the label prediction information to obtain multi-dimensional user feedback information.

At this time, the second prediction sub-module may be specifically configured to convert the sample label information into a sample label feature vector, and predict, based on the deep neural network sub-model and the sample label feature vector, label prediction information of the browsing user for the sample text.

Optionally, in some embodiments, the second prediction module may include a third prediction sub-module and a determination sub-module, as follows:

the third prediction submodule is used for predicting multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored, based on the trained text score model, the music library and the text to be scored;

and the determining submodule is used for determining the target score of the text to be scored from the music library according to the feedback information of the target user.

At this time, the third prediction submodule may be specifically configured to obtain a text to be scored and a plurality of text features corresponding to the text to be scored, obtain a music library and music features corresponding to a plurality of pieces of music in the music library, and predict multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored based on the trained text score model, the text features, and the music features.

At this time, the determining submodule may be specifically configured to perform weighted fusion on the multidimensional target user feedback information corresponding to each piece of music in the music library to obtain fused user feedback information corresponding to each piece of music, and determine the target score of the text to be scored from the pieces of music in the music library according to the fused user feedback information.
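The weighted fusion and selection performed by the determining submodule can be sketched as below. This is a minimal illustration, assuming the predicted feedback for each piece of music is already available as a mapping from dimension name to value. The weight values, dimension names, and function names are hypothetical.

```python
from typing import Dict


def fuse_feedback(feedback: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the predicted feedback dimensions for one piece of music.

    Dimensions without a weight contribute nothing to the fused value.
    """
    return sum(weights.get(dim, 0.0) * value for dim, value in feedback.items())


def select_target_score(
    predictions: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> str:
    """Return the id of the piece of music with the highest fused feedback."""
    if not predictions:
        raise ValueError("the music library must contain at least one piece of music")
    return max(predictions, key=lambda music_id: fuse_feedback(predictions[music_id], weights))
```

Changing the weights changes which feedback dimension dominates the choice, so the same trained model can serve different priorities, for example favoring sharing over browsing duration.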

In addition, a computer storage medium is provided in an embodiment of the present application, where a plurality of instructions are stored in the computer storage medium, and the instructions are suitable for being loaded by a processor to perform steps in any one of the text score methods provided in the embodiment of the present application.

According to the embodiment of the application, a sample text and multi-dimensional sample characteristic information corresponding to the sample text can be obtained; multi-dimensional user feedback information of a browsing user for the sample text is predicted based on a text score model and the sample characteristic information; the loss corresponding to the user feedback information of each dimension is obtained based on the sample characteristic information and the user feedback information; the text score model is trained based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model; and the target score of a text to be scored is predicted based on the trained text score model. The scheme uses the multi-dimensional sample characteristic information corresponding to the sample text as model input and sets a plurality of optimization targets, thereby improving the accuracy of the text score.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic diagram of a scene of a text score system provided in an embodiment of the present application;

Fig. 2 is a first flowchart of a text score method provided in an embodiment of the present application;

Fig. 3 is a second flowchart of a text score method provided in an embodiment of the present application;

Fig. 4 is a flowchart of applying the trained text score model provided in an embodiment of the present application;

Fig. 5 is a flowchart of training the text score model provided in an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a text score device provided in an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a network device provided in an embodiment of the present application.

Detailed Description

Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.

In the description that follows, specific embodiments of the present application are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations are at times referred to as being computer-executed, which means that a processing unit of the computer manipulates electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which may be reconfigured or otherwise altered in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the data format. However, while the principles of the application are described in the foregoing terms, this is not meant to be limiting, and those of ordinary skill in the art will recognize that various steps and operations described below may also be implemented in hardware.

The term "module" as used herein may be considered a software object executing on the computing system. The different components, modules, engines, and services described herein may be considered as implementation objects on the computing system. The apparatus and method described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.

The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The text score method may be executed by the text score device provided by the embodiment of the present application, or by a network device integrated with the text score device, where the text score device can be implemented in hardware or software. The network device may be a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer. Network devices include, but are not limited to, computers, network hosts, a single network server, multiple sets of network servers, or a cloud of multiple servers.

Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a text score method according to an embodiment of the present application. Taking as an example a text score device integrated in a network device, the network device may obtain a sample text and multi-dimensional sample feature information corresponding to the sample text, predict multi-dimensional user feedback information of a browsing user for the sample text based on a text score model and the sample feature information, obtain the loss corresponding to the user feedback information of each dimension based on the sample feature information and the user feedback information, train the text score model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model, and predict the target score of a text to be scored based on the trained text score model.

The text score method provided by the embodiment of the application relates to the natural language processing direction in the field of artificial intelligence. According to the embodiment of the application, a text score model can be trained on the multi-dimensional sample feature information corresponding to a sample text, and the target score of a text to be scored can then be predicted.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence software technology mainly includes computer vision, machine learning/deep learning, and the like.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.

Referring to fig. 2, fig. 2 is a schematic flow chart of a text score method provided in the embodiment of the present application, which is specifically described by the following embodiments:

201. Acquiring a sample text and multi-dimensional sample characteristic information corresponding to the sample text.

The sample text can be a text that has already been matched with music and can be used as a sample for model training. Text is a representation of written language; a sentence or a combination of sentences having a complete, systematic meaning may be referred to as text. The text may be a sentence, a paragraph, an article, or the like. The sample text may be of various kinds: for example, it may be a complete article written by an author in an official account or a mini program, a passage excerpted or written by a user, or a conversation or chat carried out by a user in a chat application.

The sample feature information is information that is related to the sample text and can be used for model training. To enable the trained model to determine music matched with the text more accurately, the model may be trained using sample feature information of multiple dimensions. For example, the sample feature information may include feature information corresponding to the sample music matched with the sample text, feature information corresponding to the user group browsing the sample text, feature information corresponding to the publication context of the sample text, and the like.

In practical applications, for example, before the model is trained, training samples may be determined: a plurality of texts may be extracted from a database as sample texts, and sample feature information corresponding to the sample music matched with each sample text, sample feature information corresponding to the user group browsing the sample text, and sample feature information corresponding to publication content such as the time and platform of publication of the sample text are obtained. To take into account factors such as the user's experience of reading with the score, various kinds of sample feedback information from users browsing the sample text can be collected as optimization targets for the model, and the sample feedback information can be embodied as the labels of the training samples. For example, the sample feedback information may include one or more of the average browsing duration of the user, the number of user comments, the user appreciation amount, the number of user shares, and the like.

In an embodiment, the sample feedback information is not limited to the types listed above; other types of sample feedback information may be used as optimization targets according to the actual situation or as specified by the author.

In an embodiment, in order to improve the modeling capability of the model, sample feature information of multiple dimensions needs to be acquired, so that the sample feature information can be obtained by acquiring multiple types of sample score information corresponding to a sample text and further performing feature extraction on the sample score information. Specifically, the step of "obtaining a sample text and multi-dimensional sample feature information corresponding to the sample text" may include:

acquiring a sample text and a plurality of sample score information corresponding to the sample text;

and extracting the sample text and the characteristics of the sample score information to obtain multi-dimensional sample characteristic information.

The sample score information can be information related to the score corresponding to the sample text; to improve the modeling capability of the model, sample score information of multiple dimensions corresponding to the sample text can be acquired. For example, the sample score information may include sample score information of the music dimension, that is, the sample music matched with the sample text; sample score information of the user dimension, that is, the user group that browses the sample text; and sample score information of the context dimension, that is, information such as the time and platform of publication of the sample text.

In practical applications, for example, several texts may be extracted from the database as sample texts, and then sample score information such as the sample music matched with the sample texts, the user group browsing the sample texts, and the time and platform of publication of the sample texts is obtained. The number of pieces of sample music matched with a sample text is not limited; if the sample text is 'happy new year', the sample music can be a plurality of songs related to the new year, such as 'coming new year' and 'good new year'. The user group browsing the sample text can be a plurality of users who browse the sample text. Using information about this user group as training data optimizes the overall experience of users with the text score, so that the model is not disturbed by the behavior of any single user. After the sample score information is obtained, feature extraction can be performed on it to obtain the sample feature information.
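One way to work at the level of the user group rather than a single user is to aggregate per-user records into group-level values before they are used as training labels. The following is a minimal sketch under that assumption. The record fields (`browse_seconds`, `comments`, `appreciation`, `shares`) and the function `aggregate_feedback` are hypothetical and are not taken from the application.

```python
from statistics import mean
from typing import Dict, List


def aggregate_feedback(user_logs: List[Dict[str, float]]) -> Dict[str, float]:
    """Collapse per-user records for one sample text into group-level labels."""
    if not user_logs:
        raise ValueError("at least one user record is required")
    return {
        # Average over users, so one unusually long session has limited influence.
        "avg_browse_duration": mean(log["browse_seconds"] for log in user_logs),
        # Totals over the whole user group.
        "comment_count": sum(log["comments"] for log in user_logs),
        "appreciation_amount": sum(log["appreciation"] for log in user_logs),
        "share_count": sum(log["shares"] for log in user_logs),
    }
```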

In an embodiment, the sample score information may include not only a music dimension, a user dimension, and a context dimension, but also sample score information of other dimensions may be added as a training sample according to an actual situation, so as to train the model.

In an embodiment, because the sample score information includes a plurality of dimensions, and the methods for extracting features of sample score information with different dimensions are different, sample feature information can be extracted by using different feature extraction methods according to the type of the sample score information. Specifically, the step of "extracting features of the sample text and the sample score information to obtain multi-dimensional sample feature information" may include:

extracting sample score feature information corresponding to the sample score information based on a preset database;

and extracting the characteristics of the sample text to obtain sample text characteristic information corresponding to the sample text.

The preset database can be a pre-established database related to the sample score information; sample feature information corresponding to the sample score information can be obtained by searching the preset database. For example, the preset database may include a music library, a music comment library, a user profile library, and the like. The music library can be a database including a plurality of pieces of sample music. The music comment library may be a database including users' comments on the sample music. The user profile library may be a database including user information corresponding to a plurality of users.

The sample feature information may include sample text feature information and sample score feature information.

The sample text feature information may be feature information extracted according to the sample text, for example, the sample text feature information may include text label information, context attribute information, and the like.

The text label information may be a label that represents a feature of the sample text in text form. For example, the text label information corresponding to the sample text may be "emotion category", which indicates that the sample text belongs to the emotion category. The text label information corresponding to the sample text can also be "history", "gossip", and the like.

The context attribute information may be attribute information that represents a characteristic of the sample text in the form of a specific category. For example, the context attribute information corresponding to the sample text may be "publication time: January 1, 2019", which indicates that the sample text was published on January 1, 2019, or "publication platform: xx website", which indicates that the sample text was published on the xx website.

The sample score feature information may be feature information extracted from the sample score information. For example, the sample score feature information may include music tag information, music attribute information, reader group tag information, reader group attribute information, and the like.

The music tag information may be a tag that represents, in text form, a feature of the sample music matched with the sample text. For example, the music tag information corresponding to the sample music may be "jazz", which indicates that the sample music matched with the sample text is jazz. The music tag information corresponding to the sample music can also be "sad", "instrumental music", "for listening after a breakup", "quiet", and the like.

The music attribute information may be attribute information that represents, in the form of a specific category, a characteristic of the sample music matched with the sample text. For example, the music attribute information corresponding to the sample music may be "duration: 20 min", which indicates that the duration of the sample music matched with the sample text is 20 minutes. The music attribute information corresponding to the sample music may also be "category: Mandarin", "key: C major", and the like.

The reader group tag information may be a tag that represents, in text form, a characteristic of the user group browsing the sample text. For example, the reader group tag information corresponding to the user group may be "middle-aged", which indicates that most of the users browsing the sample text are middle-aged. The reader group tag information corresponding to the user group may also be "ACG (anime, comics and games) fans", "act cool", and so on.

The reader group attribute information may be attribute information that represents, in the form of a specific category, a characteristic of the user group browsing the sample text. For example, the reader group attribute information corresponding to the user group may be "average age: 28", which indicates that the average age of the user group browsing the sample text is 28 years. The reader group attribute information corresponding to the user group can also be "average height: 170 cm", "main cities: Beijing/Shanghai", and the like.

In practical applications, for example, as shown in fig. 5, after a sample text, the sample music matched with the sample text, and the user group browsing the sample text are obtained, music tag information may be obtained by searching song names, lyrics, music comments, and the like in the music library and the music comment library; music attribute information may be obtained by searching the music library; reader group tag information may be obtained by searching the user profile library for the articles that users have read, the comments they have published, and the like; and reader group attribute information may be obtained from the user profile library.
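The database lookups described above can be sketched with in-memory dictionaries standing in for the preset databases. This is a toy illustration under stated assumptions: the library contents, the tag vocabulary, and the simple substring matching used to derive tags from comments are all invented for the example, and the application does not specify how tags are mined.

```python
from typing import Dict, List, Union

# Toy stand-ins for the preset databases; all contents are invented.
MUSIC_LIBRARY: Dict[str, Dict[str, Union[str, float]]] = {
    "song_001": {"duration_min": 4.0, "language": "Mandarin", "key": "C major"},
}
MUSIC_COMMENT_LIBRARY: Dict[str, List[str]] = {
    "song_001": ["so quiet and sad", "lovely instrumental"],
}
# Hypothetical closed vocabulary of music tags.
TAG_VOCABULARY: List[str] = ["quiet", "sad", "instrumental", "jazz"]


def music_attributes(music_id: str) -> Dict[str, Union[str, float]]:
    """Look up music attribute information in the music library."""
    return dict(MUSIC_LIBRARY[music_id])


def music_tags(music_id: str) -> List[str]:
    """Derive music tag information from the comments stored for a piece of music."""
    comments = " ".join(MUSIC_COMMENT_LIBRARY.get(music_id, [])).lower()
    return [tag for tag in TAG_VOCABULARY if tag in comments]
```

Reader group tag and attribute information could be looked up from a user-information database in the same way.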

Because the number of sample texts keeps growing, text label information is not obtained from a sample text library established in advance, but is instead obtained by mining the sample texts used as training samples. Since the context attribute information is also derived from the sample text, it can likewise be obtained by mining the sample texts used as training samples.

202. Predicting multi-dimensional user feedback information of the browsing user for the sample text based on the text score model and the sample characteristic information.

The text score model can be a network model that can match target music to the text to be scored. The embodiment of the application does not limit the type of the text score model: any supervised model that can match target music to the text to be scored can be used as the text score model. For example, the text score model may be a wide & deep model, which performs classification and regression by combining the memorization capability of a linear model with the generalization capability of a deep neural network model. The wide & deep model includes a linear sub-model and a deep neural network sub-model. The linear sub-model is a simple shallow model, such as logistic regression or an SVM (support vector machine), and can be used to process non-text features such as numerical and categorical features. The deep neural network sub-model is a DNN (deep neural network) and can be used to process label words, extract feature vectors, and perform forward propagation. During training of the wide & deep model, the parameters of the two sub-models are optimized simultaneously, so that the prediction capability of the whole model is optimized.

The user feedback information is information related to feedback of the user after reading the text under the background music, for example, the user feedback information may include one or more of average browsing duration of the user, user comment times, user appreciation amount, and user sharing times.

In practical applications, for example, after the sample feature information of multiple dimensions is acquired, it may be input into the text score model, which predicts the multi-dimensional user feedback information of a user browsing the sample text, such as one or more of the average browsing duration of the user, the number of user comments, the user appreciation amount, and the number of user shares.

In one embodiment, when the text score model is the wide & deep model, since the text score model includes a linear sub-model and a deep neural network sub-model, sample feature information needs to be classified and then input into the model. Specifically, the step of predicting multi-dimensional user feedback information of the browsing user for the sample text based on the text score model and the sample feature information may include:

predicting attribute prediction information of a browsing user for the sample text based on the linear submodel and the sample attribute information;

predicting label prediction information of a browsing user for the sample text based on the deep neural network submodel and the sample label information;

and fusing the attribute prediction information and the label prediction information to obtain multi-dimensional user feedback information.

The sample characteristic information comprises sample label information and sample attribute information.

The sample label information may be a label representing the sample characteristics in a text form, for example, the sample label information may include music label information, reader group label information, text label information, and the like.

The sample attribute information may be attribute information representing characteristics of the sample in a specific category, for example, the sample attribute information may include music attribute information, readership attribute information, context attribute information, and the like.

The text score model comprises a linear submodel and a deep neural network submodel.

The linear sub-model is a simple shallow model, such as logistic regression or an SVM, and can be used to process non-text features such as numerical and categorical features. The linear sub-model has a memorization capability, that is, it can discover correlations between features from historical data.

The deep neural network sub-model is a DNN and can be used to process label words, extract feature vectors, and perform forward propagation. It takes low-dimensional dense features, obtained through an embedding method, as input, and can generalize better to feature combinations that do not appear in the training samples.

In practical applications, for example, as shown in fig. 5, the sample feature information may be divided into sample label information and sample attribute information, the sample attribute information is input into the linear sub-model, the attribute prediction information of the browsing user for the sample text is predicted, the sample label information is input into the deep neural network sub-model, the label prediction information of the browsing user for the sample text is predicted, and then the attribute prediction information and the label prediction information are fused to obtain multi-dimensional user feedback information.
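The flow above can be sketched as a toy forward pass. This is a minimal illustration, not the model of the embodiment: the weights, the feature values, the single ReLU hidden layer, and fusion by element-wise addition are all assumptions, and the names `wide_part`, `deep_part`, and `fuse` are hypothetical.

```python
# Toy wide & deep forward pass; all weights and feature values are made up.
FEEDBACK_DIMS = ["avg_browse_duration", "comment_count", "reward_amount", "share_count"]

def wide_part(attr_values, weights, bias):
    """Linear sub-model: one weighted sum of attribute features per feedback dimension."""
    return [sum(w * x for w, x in zip(row, attr_values)) + b
            for row, b in zip(weights, bias)]

def deep_part(tag_vector, hidden_w, out_w):
    """Deep sub-model: one ReLU hidden layer followed by a linear output layer."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, tag_vector))) for row in hidden_w]
    return [sum(w * h for w, h in zip(row, hidden)) for row in out_w]

def fuse(attr_pred, label_pred):
    """Fuse the two predictions by element-wise addition (an assumed fusion rule)."""
    return [a + l for a, l in zip(attr_pred, label_pred)]

# Attribute prediction from the wide part (two numeric attributes).
attr_pred = wide_part([1.0, 2.0],
                      [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0], [0.0, 0.0]],
                      [0.0, 0.0, 0.0, 1.0])
# Label prediction from the deep part (a two-dimensional label feature vector).
label_pred = deep_part([1.0, -1.0],
                       [[1.0, 0.0], [0.0, 1.0]],
                       [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0], [0.5, 0.5]])
feedback = dict(zip(FEEDBACK_DIMS, fuse(attr_pred, label_pred)))
```

Each sub-model emits one value per feedback dimension, so the fused output is itself multi-dimensional.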

In one embodiment, in the deep neural network submodel, the prediction of the user feedback information may be performed by converting the sample label information into a vector form. Specifically, the step "predicting label prediction information of the browsing user for the sample text based on the deep neural network submodel and the sample label information" may include:

converting the sample label information into a sample label feature vector;

and predicting label prediction information of a browsing user for the sample text based on the deep neural network submodel and the sample label feature vector.

The deep neural network submodel comprises an embedding layer and hidden layers.

The embedding layer is the network structure in the deep neural network submodel that processes sparse features; it reduces the dimensionality of the sparse features by multiplying them with the weight matrix of an embedding algorithm.

The hidden layers are located in the deep neural network submodel: every layer other than the input layer and the output layer is a hidden layer. Hidden layers neither receive external signals directly nor send signals directly to the outside. They abstract the input features over multiple levels so that, in the end, data of different types can be separated linearly.

In practical applications, for example, the sample label information may be input into the deep neural network sub-model and converted by the embedding layer into a sample label feature vector, and the hidden layers then predict the label prediction information of the browsing user for the sample text from that vector.
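As a rough illustration of the embedding and hidden layers, the sketch below looks up each label word in a small table, averages the rows into one dense vector, and applies one fully connected ReLU layer. The vocabulary, the table values, the averaging step, and the function names are assumptions made for the example; the embodiment does not prescribe them.

```python
# Hypothetical vocabulary and embedding table; the values are illustrative only.
VOCAB = {"jazz": 0, "quiet": 1, "history": 2}
EMBEDDING = [
    [0.25, 0.5],   # jazz
    [0.75, 0.0],   # quiet
    [-0.25, 1.0],  # history
]

def embed_tags(tags):
    """Look up each known tag and average the rows into one dense vector."""
    rows = [EMBEDDING[VOCAB[t]] for t in tags if t in VOCAB]
    if not rows:
        return [0.0] * len(EMBEDDING[0])  # no known tag: fall back to zeros
    return [sum(col) / len(rows) for col in zip(*rows)]

def hidden_layer(vector, weights, bias):
    """Fully connected layer with ReLU activation."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]

label_vector = embed_tags(["jazz", "quiet"])
activations = hidden_layer(label_vector, [[1.0, -4.0], [2.0, 2.0]], [0.0, 0.5])
```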

203. And obtaining the loss corresponding to the user feedback information of each dimension based on the sample characteristic information and the user feedback information.

The objective function may be a function used to reach a desired target in machine learning. To accomplish a given target, an objective function is constructed and then maximized or minimized, which yields the model parameters of the machine learning algorithm.

In the embodiment of the application, in order to take various user feedback conditions into consideration, a plurality of objective functions can be set by taking a multi-task learning framework as a reference. For example, the embodiment of the application may construct the objective function around the core idea of "feedback after the user reads the text under the background music", and the constructed objective functions are all related to the behavior of the user after reading. For example, the objective function can be constructed according to the average browsing duration of the user, the number of times of user comments, the user reward amount, the number of times of user sharing and the like. In the embodiment of the present application, the objective function is not limited, and any objective function that satisfies the core idea may be used, and the type of the objective function may be adjusted according to the actual situation.

In practical application, for example, after the user feedback information of multiple dimensions is obtained based on the text score model, multiple objective functions corresponding to the user feedback information of multiple dimensions can be constructed according to the user feedback information and the sample feedback information corresponding to the sample characteristic information, and then the objective functions are solved to obtain a solved result.
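One way to picture "one objective function per feedback dimension" is a separate loss per dimension. The sketch below uses mean squared error, which is an assumption: the text does not name a loss function, and `per_dimension_losses` is a hypothetical helper.

```python
def per_dimension_losses(predicted, actual):
    """Mean squared error per feedback dimension over a batch.

    predicted and actual are lists of dicts keyed by feedback dimension.
    """
    n = len(predicted)
    return {
        dim: sum((p[dim] - a[dim]) ** 2 for p, a in zip(predicted, actual)) / n
        for dim in predicted[0]
    }

# Predicted feedback versus real sample feedback for two sample texts.
predicted = [{"share_count": 3.0, "comment_count": 1.0},
             {"share_count": 5.0, "comment_count": 2.0}]
actual = [{"share_count": 1.0, "comment_count": 1.0},
          {"share_count": 5.0, "comment_count": 3.0}]
losses = per_dimension_losses(predicted, actual)
```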

204. And training the text score model based on the loss corresponding to the user feedback information of each dimension to obtain the trained text score model.

In practical application, for example, after solving a plurality of objective functions, parameters in the text score model can be adjusted according to the solving result of the objective functions, so as to achieve the purpose of training the text score model, and when the text score model is trained to be convergent, the trained text score model can be obtained.
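Parameter adjustment until convergence can be illustrated with plain stochastic gradient descent on the sum of per-dimension squared errors. This is a deliberately small stand-in, assuming one linear head per feedback dimension with no shared parameters; a real wide & deep network would be trained with a deep learning framework, and the learning rate and epoch count here are arbitrary.

```python
def train(samples, targets, n_dims, lr=0.1, epochs=200):
    """Fit one linear head per feedback dimension with stochastic gradient descent.

    The total loss is the sum of the squared errors of all dimensions.
    """
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_dims)]
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            for d in range(n_dims):
                pred = sum(w * xi for w, xi in zip(weights[d], x))
                err = pred - y[d]  # gradient of 0.5 * err**2 with respect to pred
                weights[d] = [w - lr * err * xi for w, xi in zip(weights[d], x)]
    return weights

# Toy data generated from weights [2, 3] (dimension 0) and [1, -1] (dimension 1).
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [[2.0, 1.0], [3.0, -1.0], [5.0, 0.0]]
learned = train(samples, targets, n_dims=2)
```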

205. And predicting the target score of the text to be scored based on the trained text score model.

In practical applications, for example, after the text score model is trained to obtain a trained text score model, the trained text score model can be used to predict the target score of the text to be scored.

In one embodiment, the target score can be determined from the music library by predicting, for each piece of music in the library, the target user feedback information for the text to be scored. Specifically, the step "predicting the target score of the text to be scored based on the trained text score model" may include:

predicting multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored, based on the trained text score model, the music library and the text to be scored;

and determining the target score of the text to be scored from the music library according to the target user feedback information.

In practical application, for example, a music library including a plurality of pieces of music may be obtained, and the trained text score model may predict the target user feedback information for each piece of music in the library. The target score of the text to be scored is then determined from the music library according to the obtained target user feedback information.

In an embodiment, features of the text to be scored and of the music library can be extracted so that the trained text score model can predict the target user feedback information. Specifically, the step "predicting multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored, based on the trained text score model, the music library and the text to be scored" may include:

acquiring the text to be scored and a plurality of text features corresponding to the text to be scored;

acquiring a music library and the music features corresponding to the pieces of music in the music library;

and predicting multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored, based on the trained text score model, the text features and the music features.

In practical applications, for example, as shown in fig. 4, music label information and music attribute information corresponding to each piece of music in the music library may be acquired from a preset music library and a music comment library. After an author writes a text to be scored, the text features corresponding to it can be obtained by the same method used in model training. The text features can cover text, readership, author and context feature information; specifically, they can include text label information, readership attribute information, context attribute information and author feature information. The music label information and music attribute information of each piece of music in the music library, together with the text label information, reader group label information and context attribute information of the text to be scored, are then input into the trained text score model, which predicts multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored, such as one or more of the user's average browsing duration, the number of user comments, the user reward amount, and the number of user shares.

In the model training stage, only the sample music matched with the sample text is input into the model together with the sample text for model training. In the using stage of the model, each piece of music in the music library and the text to be matched can be input into the model together to predict the feedback information of the user.
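The difference between the two stages can be sketched as a loop over the library at prediction time. Here `toy_model` is a hypothetical stand-in for the trained model (it merely counts shared tags), and the feature layout is assumed for illustration.

```python
def toy_model(text_features, music_features):
    """Stand-in for the trained model: feedback grows with the number of shared tags."""
    overlap = len(set(text_features["tags"]) & set(music_features["tags"]))
    return {"share_count": float(overlap), "reward_amount": 2.0 * overlap}

def predict_for_library(model, text_features, music_library):
    """Pair the text with every piece of music and collect the predicted feedback."""
    return {name: model(text_features, music_features)
            for name, music_features in music_library.items()}

text = {"tags": ["quiet", "history"]}
library = {
    "music_1": {"tags": ["quiet", "jazz"]},
    "music_2": {"tags": ["rock"]},
}
predictions = predict_for_library(toy_model, text, library)
```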

In an embodiment, to make the selection of the target score more flexible, the target score may be selected, according to the author's settings, based on the user feedback information that the author cares about most. Specifically, the step "determining the target score of the text to be scored from the music library according to the target user feedback information" may include:

performing weighted fusion on the multi-dimensional target user feedback information corresponding to each piece of music in the music library to obtain fused user feedback information corresponding to each piece of music;

and determining the target music of the text to be matched from the plurality of pieces of music in the music library according to the fused user feedback information.

In practical application, for example, after multidimensional target user feedback information corresponding to each piece of music in a music library is obtained based on a trained text music score model, such as average browsing duration, target user comment times, target user appreciation amount, and target user sharing times of a target user corresponding to each piece of music in the music library, a plurality of pieces of music in the music library can be sorted according to the designation of an author, and the target music score is recommended.

For example, after the average browsing duration, number of comments, reward amount, and number of shares predicted for the target users are obtained for each piece of music in the music library, the author may choose to have recommendations made according to the target user reward amount. The system may then sort the pieces of music in the music library by the predicted target user reward amount of each piece and recommend the one or more pieces with the highest predicted reward amount to the author as target scores; the author may select the score for the text to be scored from the recommended target scores. In this way, the author can easily learn which background music is likely to bring the text more rewards. The type of the target user feedback information can be adjusted according to actual conditions; for example, user feedback information that the author does not care about can be deleted, or new user feedback information can be added.
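Sorting by a single feedback dimension chosen by the author might look like the sketch below. The predicted values and the helper name `recommend_by_dimension` are invented for the example.

```python
def recommend_by_dimension(predictions, dimension, top_k=2):
    """Return the top_k music names ranked by one predicted feedback dimension."""
    ranked = sorted(predictions,
                    key=lambda name: predictions[name][dimension],
                    reverse=True)
    return ranked[:top_k]

# Hypothetical predicted feedback for three pieces of music.
predicted_feedback = {
    "music_1": {"reward_amount": 3.0, "share_count": 9.0},
    "music_2": {"reward_amount": 8.0, "share_count": 1.0},
    "music_3": {"reward_amount": 5.0, "share_count": 4.0},
}
by_reward = recommend_by_dimension(predicted_feedback, "reward_amount")
```

Passing `"share_count"` instead re-ranks the same predictions without running the model again.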

For another example, after the average browsing duration, number of comments, reward amount, and number of shares predicted for the target users are obtained for each piece of music in the music library, the author may further set how much attention to pay to each type of user feedback information. For example, the author may give the highest attention to the number of target user shares, followed by the target user reward amount, then the number of target user comments, and finally the target user average browsing duration. The system can then assign a weight to each type of user feedback information according to the attention set by the author, with higher attention receiving a higher weight; for example, the weight of the number of target user shares is set to 40%, the weight of the target user reward amount to 30%, the weight of the number of target user comments to 20%, and the weight of the target user average browsing duration to 10%. The various types of target user feedback information are weighted and fused according to these weights to obtain the fused user feedback information corresponding to each piece of music. The pieces of music in the music library are then sorted by the fused user feedback information, the one or more pieces with the largest fused user feedback information are recommended to the author as target scores, and the author selects the score for the text to be scored from the recommended target scores. In this way, the author can select the target score according to how much attention the author pays to the different types of user feedback information.
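The weighted fusion can be sketched with the 40% / 30% / 20% / 10% weights from the example. The predicted values are invented, and fusing raw values directly is an assumption: in practice the dimensions sit on different scales and may need normalizing first, which the text does not discuss.

```python
# Weights taken from the example in the text.
WEIGHTS = {
    "share_count": 0.4,
    "reward_amount": 0.3,
    "comment_count": 0.2,
    "avg_browse_duration": 0.1,
}

def fused_score(feedback, weights):
    """Weighted sum of the predicted feedback dimensions for one piece of music."""
    return sum(weight * feedback[dim] for dim, weight in weights.items())

def recommend_by_fusion(predictions, weights, top_k=1):
    """Rank the music by fused score and return the top_k names."""
    ranked = sorted(predictions,
                    key=lambda name: fused_score(predictions[name], weights),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical multi-dimensional predictions for two pieces of music.
library_feedback = {
    "music_1": {"share_count": 10.0, "reward_amount": 5.0,
                "comment_count": 2.0, "avg_browse_duration": 30.0},
    "music_2": {"share_count": 2.0, "reward_amount": 20.0,
                "comment_count": 1.0, "avg_browse_duration": 60.0},
}
best = recommend_by_fusion(library_feedback, WEIGHTS)
```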

In an embodiment, the author may specify not only the order of attention paid to each type of user feedback information but also, directly, the weight of each type. In addition, the weights assigned by the system are not limited to the values above, as long as they reflect the author's requirements.
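The text leaves open how the system turns an attention order into weights. One possible rule, shown purely as an assumption, is rank-sum weighting, which for four dimensions happens to give 40%, 30%, 20%, and 10%; weights specified directly by the author can simply be normalized to sum to 1. Both helper names are hypothetical.

```python
def weights_from_order(ordered_dims):
    """Rank-sum weights: the i-th of n dimensions gets (n - i) / (1 + 2 + ... + n)."""
    n = len(ordered_dims)
    total = n * (n + 1) / 2
    return {dim: (n - i) / total for i, dim in enumerate(ordered_dims)}

def normalize_weights(raw_weights):
    """Scale author-specified weights so that they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {dim: value / total for dim, value in raw_weights.items()}

# Highest attention first, as in the ordering example.
ordered = weights_from_order(
    ["share_count", "reward_amount", "comment_count", "avg_browse_duration"])
direct = normalize_weights({"share_count": 2, "reward_amount": 2})
```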

The text music matching method can be applied to a text music matching scene, on one hand, the time of an author can be saved, the author can be assisted to find out the target music which is most matched with the text to be matched, and on the other hand, more potential target music which can be used as background music can be found out for the author through intelligent recommendation of a system.

In an embodiment, the text score method can be applied not only to text scoring scenarios but also to a multi-modal chat system. For example, when a user chats through a chat application, the system can automatically recommend music or audio books for the user to choose from, and the user can play music or audio books that match the current chat content according to the system's recommendation. For example, when the user inputs 'happy birthday' in the chat system, the system can automatically recommend songs related to birthdays, such as 'happy birthday of you' and 'happy birthday', so that the user can select one to play, which greatly improves the user experience during chatting.

As can be seen from the above, the embodiment of the application can obtain a sample text and the multi-dimensional sample feature information corresponding to the sample text; predict multi-dimensional user feedback information of the browsing user for the sample text based on the text score model and the sample feature information; obtain the loss corresponding to the user feedback information of each dimension based on the sample feature information and the user feedback information; train the text score model based on the loss corresponding to the user feedback information of each dimension to obtain the trained text score model; and predict the target score of the text to be scored based on the trained text score model. By using the multi-dimensional sample feature information corresponding to the sample text as model input and setting a plurality of optimization targets, the scheme improves the modeling capacity of the text score model. Scoring the text to be scored with the text score model helps the author find appropriate background music, saves the author's time, surfaces more potential music that could serve as background music, makes the author's choice of target score more flexible, and improves the accuracy of the score for the text to be scored.

The method described in the foregoing embodiment is described in further detail below, taking as an example the case in which the text score device is integrated into a network device.

Referring to fig. 3, a specific flow of the text score method according to the embodiment of the present application may be as follows:

301. The network equipment acquires a training sample, wherein the training sample comprises a sample text, sample music matched with the sample text and a sample readership browsing the sample text.

In practical applications, for example, a text that has already been scored may be extracted from the database as a sample text, the song matched with the sample text may be obtained as sample music, and information on the sample readership that browsed the sample text may be obtained; the sample text, the sample music, and the sample readership together form a training sample.

In an embodiment, the accuracy of the text score model can be improved by enriching the training samples, so a plurality of sample texts can be obtained from the database to form a plurality of training samples for training the text score model. In addition, the sample music matched with one sample text is not limited to one piece, that is, not limited to the single piece of music that was historically used as the background music of the sample text; several pieces of music that are related to the content of the sample text and could serve as its background music can all be used as sample music to enrich the training sample. Meanwhile, to keep the individual behavior of a single reader from affecting model training, the sample readership information can cover a plurality of readers who browsed the sample text; enlarging the sample readership ensures its generality and accuracy.

In an embodiment, during training of the text score model, objective functions are constructed in multiple dimensions from the multiple types of user feedback information predicted by the model and the real sample feedback information, and the model is optimized through these objective functions. The training stage therefore needs to collect multiple types of sample feedback information from readers who browsed the sample text. The collected information is the real sample feedback information corresponding to the sample text, such as the readers' average browsing duration, the number of reader comments, the reader reward amount, and the number of reader shares. This real sample feedback information can serve as the labels for training the model.
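A training sample and its feedback labels could be represented as in the sketch below. The field names, the per-reader event layout, and the aggregation rule (browsing duration averaged over readers, the other dimensions summed) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TrainingSample:
    """One training sample: text, matched music, readers, and feedback labels."""
    text: str
    music_ids: list
    reader_ids: list
    feedback_labels: dict = field(default_factory=dict)

def aggregate_feedback(reader_events):
    """Collapse per-reader events into the four label dimensions."""
    n = len(reader_events)
    return {
        "avg_browse_duration": sum(e["browse_seconds"] for e in reader_events) / n,
        "comment_count": sum(e["comments"] for e in reader_events),
        "reward_amount": sum(e["reward"] for e in reader_events),
        "share_count": sum(e["shares"] for e in reader_events),
    }

# Hypothetical behaviour of two readers of the same sample text.
events = [
    {"browse_seconds": 60.0, "comments": 1, "reward": 2.0, "shares": 0},
    {"browse_seconds": 120.0, "comments": 0, "reward": 0.0, "shares": 3},
]
sample = TrainingSample(
    text="sample text",
    music_ids=["music_1"],
    reader_ids=["reader_a", "reader_b"],
    feedback_labels=aggregate_feedback(events),
)
```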

302. The network equipment acquires sample characteristic information, wherein the sample characteristic information comprises music label information, readership label information, text label information, music attribute information, readership attribute information and context attribute information.

In practical applications, for example, after the sample text, the sample music, and the sample readership are obtained, the music label information, the readership label information, the music attribute information, and the readership attribute information may be obtained by searching a preset music library, a preset music comment library, and a preset reader portrait library. Because the sample texts in the database keep growing, no library needs to be built for them in advance; the text label information and the context attribute information are obtained by mining the sample texts directly.

For example, as shown in fig. 5, the music library and the music comment library may be searched, and music tag information may be mined from the name, lyrics, comments, and the like of each piece of music; music attribute information is obtained by searching the music library; and the reader portrait library is searched, and texts such as articles the readers have viewed and comments they have published are mined to obtain reader group tag information and reader group attribute information.

The tag information is a feature expressed in text form and can be represented as words. For example, the music tag information can be "jazz", "sad", "instrumental", "for listening after a breakup", "quiet", and the like; the readership tag information may be "anime fan", "middle-aged", "play cool", etc.; the text label information may be "emotional", "historical", "gossip", and the like.

The attribute information is a feature expressed as a specific category and can be represented as a pair of feature name and feature value. For example, the music attribute information may be "duration: 20 min", "category: Mandarin", "key: C major", and the like; the readership attribute information may be "average age: 28 years", "average height: 170 cm", "main cities: Beijing/Shanghai", etc.; the context attribute information may be "publication time: January 1, 2019", "publication platform: xx website", and the like.
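The two kinds of feature information can be pictured as a word list and a name-to-value mapping. The sketch below splits a nested feature record accordingly; the record layout, the source prefixes, and the name `split_features` are hypothetical.

```python
# Hypothetical feature record grouped by source; values echo the examples in the text.
sample_features = {
    "music": {"tags": ["jazz", "quiet"], "attributes": {"duration_min": 20}},
    "readership": {"tags": ["middle-aged"], "attributes": {"average_age": 28}},
    "text": {"tags": ["history"], "attributes": {}},
    "context": {"tags": [], "attributes": {"platform": "xx website"}},
}

def split_features(features):
    """Return (label_info, attribute_info) with source-prefixed entries."""
    label_info, attribute_info = [], {}
    for source, parts in features.items():
        # Tags are words, destined for the deep part.
        label_info += [f"{source}:{tag}" for tag in parts["tags"]]
        # Attributes are name/value pairs, destined for the wide part.
        for name, value in parts["attributes"].items():
            attribute_info[f"{source}.{name}"] = value
    return label_info, attribute_info

label_info, attribute_info = split_features(sample_features)
```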

303. And the network equipment inputs the sample characteristic information into the text score model, and predicts the multi-dimensional user feedback information corresponding to the sample text.

In practical applications, for example, when the text score model is a wide & deep model, the wide & deep model includes a wide part and a deep part. The deep part is a deep neural network used for processing text features such as tag information and for performing embedding and forward propagation; the wide part is a simple shallow model, such as logistic regression or an SVM, used for processing non-text features such as attribute information. Therefore, the sample characteristic information can be divided into sample label information and sample attribute information, wherein the sample label information includes music label information, reader group label information and text label information; the sample attribute information includes music attribute information, readership attribute information, and context attribute information.

The sample label information is then input into the deep part of the wide & deep model, converted into a sample label feature vector by an embedding algorithm, and passed through a multilayer neural network to predict the label prediction information corresponding to the sample label information. The sample attribute information is input into the wide part of the wide & deep model, whose linear model predicts the attribute prediction information corresponding to the sample attribute information. The label prediction information and the attribute prediction information are then fused to obtain multi-dimensional user feedback information.

In the embodiment of the application, the structure or the type of the text score model is not limited; any network model can be used as long as it is a supervised model and can predict the multi-dimensional user feedback information corresponding to the sample text.

304. And the network equipment constructs a plurality of objective functions according to the multidimensional user feedback information and the sample labels corresponding to the training samples.

In practical application, in the process of training the network model, an objective function is needed to make the model obtain a required target. Therefore, the embodiment of the application can build a plurality of objective functions related to the behavior of the user after reading by taking the frame of multi-task learning as a reference and surrounding the core idea of feedback after the user reads the article under the background music.

For example, after multi-dimensional user feedback information is obtained, the multi-dimensional user feedback information can be compared with sample feedback information corresponding to a training sample label, and a multi-dimensional objective function is constructed according to the multi-dimensional user feedback information and the sample feedback information corresponding to the training sample, wherein each objective function represents optimization of one type of user feedback information, that is, the objective functions represent optimization of user feedback information such as average browsing duration of readers, reader comment times, reader reward amount, reader sharing times and the like. In order to improve the flexibility of the text score model, the type of the objective function can be adjusted according to actual conditions, and the objective function meeting the core idea of 'feedback after a user reads an article under background music' can be incorporated into the system.

305. And the network equipment solves the plurality of objective functions, and adjusts parameters of the text music score model according to the solving result to obtain the trained text music score model.

In practical application, for example, after a plurality of objective functions are constructed, the plurality of objective functions can be solved, and parameters in the text score model are adjusted according to the solving result of the solved objective functions until the text score model converges, so that the trained text score model is obtained.

306. The network equipment acquires the text characteristics corresponding to the text to be dubbed music and the music characteristics corresponding to each piece of music in the music library.

In practical application, for example, after model training is completed and the trained text score model is obtained, the trained text score model can be used to predict the target score of the text to be scored. A text to be scored that needs background music is obtained, and its text features are mined; the text features can include the text label information, reader group attribute information, context attribute information and author feature information corresponding to the text to be scored. A music library comprising a plurality of pieces of music can also be obtained, and the music features corresponding to each piece of music are mined from the preset music library, music comment library and reader portrait library; the music features can include the music label information and music attribute information corresponding to each piece of music in the music library.

307. And the network equipment inputs the text characteristics and the music characteristics into the trained text score model, and multi-dimensional target user feedback information is obtained through prediction.

In practical application, for example, the text features corresponding to the text to be scored and the music features corresponding to each piece of music in the music library may be input into the trained text score model, which then predicts multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored.

If the music library comprises a plurality of pieces of music such as music 1, music 2, music 3 …, the trained text score model can separately predict the multi-dimensional target user feedback information when music 1, music 2, and music 3 are each used as the background music of the text to be scored. The multi-dimensional target user feedback information comprises the readers' average browsing duration, the number of reader comments, the reader reward amount and the number of reader shares.

During training of the text score model, the music input as part of a training sample is the sample music matched with the sample text. When the trained text score model is used for prediction, the input music is not limited to music already matched with the text to be scored; all music in the music library can be input, so that multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored can be obtained.

308. And the network equipment determines the target score from the plurality of pieces of music in the music library according to the feedback information of the target user.

In practical application, for example, after multi-dimensional target user feedback information of each piece of music in the music library for the text to be dubbed is acquired, multiple sequencing methods can be provided for the author of the text to be dubbed, when the author of the text to be dubbed pays more attention to the amount of money to be enjoyed by the user, the pieces of music in the music library can be sequenced according to the value of the amount of money to be enjoyed by the user, and one or more pieces of music in the music library are recommended to the author as target dubbing music, so that the author selects more appropriate music as background music of the text to be dubbed.

In an embodiment, for example, after the multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored is obtained, the multi-dimensional target user feedback information may be fused according to weights specified by the author of the text to be scored to obtain fused user feedback information. The pieces of music in the music library are then sorted by the value of the fused user feedback information, and one or more pieces of music in the music library are recommended to the author as the target score, so that the author can select more suitable music as the background music of the text to be scored.
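The weighted fusion described above can be sketched as follows. The description does not specify the fusion formula, so a plain weighted sum is assumed here; the function names, dimension keys, weights, and predicted values are all invented for the example. The sketch also ignores that the dimensions are measured on different scales, which a real system would probably need to normalize.

```python
from typing import Dict, List, Tuple


def fuse(feedback: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed fusion rule: weighted sum over the dimensions the author weighted."""
    return sum(weights[dim] * feedback[dim] for dim in weights)


def rank_by_fused_score(
    predictions: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Sort tracks by fused score, highest first."""
    scored = [(name, fuse(fb, weights)) for name, fb in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented predictions and author-specified weights.
predictions = {
    "music 1": {"reward_amount": 2.0, "share_count": 8.0},
    "music 2": {"reward_amount": 4.0, "share_count": 2.0},
    "music 3": {"reward_amount": 1.0, "share_count": 4.0},
}
author_weights = {"reward_amount": 0.5, "share_count": 0.25}

ranking = rank_by_fused_score(predictions, author_weights)
```

With these numbers, music 1 ranks first even though music 2 has the highest predicted reward amount, because the share count also contributes to the fused value.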

As can be seen from the above, in the embodiment of the present application, the network device may obtain a training sample, where the training sample includes a sample text, sample music matched with the sample text, and a sample reader group browsing the sample text. The network device obtains sample feature information, where the sample feature information includes music tag information, reader group tag information, text tag information, music attribute information, reader group attribute information, and context attribute information; inputs the sample feature information into the text score model; and predicts multi-dimensional user feedback information corresponding to the sample text. A plurality of objective functions are constructed according to the multi-dimensional user feedback information and the sample label corresponding to the training sample, the plurality of objective functions are solved, and the parameters of the text score model are adjusted according to the solution result to obtain a trained text score model. The network device then obtains text feature information corresponding to a text to be scored and music feature information corresponding to the pieces of music in the music library, inputs the text features and the music features into the trained text score model, predicts multi-dimensional target user feedback information, and determines the target score from the plurality of pieces of music in the music library according to the target user feedback information. In this scheme, the multi-dimensional sample feature information corresponding to the sample text is used as the model input and a plurality of optimization targets are set, which improves the modeling capacity of the text score model.
Scoring the text to be scored with the text score model can help the author find suitable background music, which saves the author's time, surfaces more candidate music that could serve as background music, makes the author's selection of the target music more flexible, and improves the accuracy of scoring the text to be scored.
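The idea of one objective per feedback dimension can be illustrated with a small sketch. The choice of squared error for each dimension and of a weighted sum to combine the losses is an assumption made here for illustration; this section does not fix either. The function names and numbers are likewise invented.

```python
from typing import Dict, Optional


def per_dimension_losses(
    predicted: Dict[str, float], observed: Dict[str, float]
) -> Dict[str, float]:
    """One loss per feedback dimension (assumed here to be squared error)."""
    return {dim: (predicted[dim] - observed[dim]) ** 2 for dim in observed}


def total_loss(
    losses: Dict[str, float], weights: Optional[Dict[str, float]] = None
) -> float:
    """Combine the per-dimension losses into one training signal (assumed weighted sum)."""
    if weights is None:
        weights = {dim: 1.0 for dim in losses}
    return sum(weights[dim] * value for dim, value in losses.items())


# Invented prediction and observed label for one training sample.
predicted = {"avg_browse": 3.0, "comments": 1.0, "reward": 2.0, "shares": 0.5}
observed = {"avg_browse": 2.0, "comments": 1.0, "reward": 4.0, "shares": 1.0}

losses = per_dimension_losses(predicted, observed)
combined = total_loss(losses)
```

Keeping the losses separate until the last step makes it possible to inspect or reweight each optimization target independently during training.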

In order to better implement the above method, an embodiment of the present application may further provide a text score apparatus. The text score apparatus may be integrated in a network device, and the network device may include a server, a terminal, and the like, where the terminal may include a mobile phone, a tablet computer, a notebook computer, or a personal computer (PC).

For example, as shown in fig. 6, the text score apparatus may include an obtaining module 61, a first prediction module 62, a loss obtaining module 63, a training module 64, and a second prediction module 65, as follows:

the obtaining module 61 is configured to obtain a sample text and multi-dimensional sample feature information corresponding to the sample text;

a first prediction module 62, configured to predict multidimensional user feedback information of the browsing user for the sample text based on a text score model and the sample feature information;

a loss obtaining module 63, configured to obtain, based on the sample feature information and the user feedback information, a loss corresponding to each dimension of user feedback information;

a training module 64, configured to train the text score model based on the loss corresponding to the user feedback information of each dimension, to obtain a trained text score model;

and the second prediction module 65 is used for predicting the target score of the text to be scored based on the trained text score model.

In one embodiment, the obtaining module 61 may include an obtaining sub-module 611 and an extracting sub-module 612, as follows:

the obtaining sub-module 611 is configured to obtain a sample text and a plurality of sample score information corresponding to the sample text;

and the extracting submodule 612 is configured to extract the sample text and the features of the sample score information to obtain multi-dimensional sample feature information.

In an embodiment, the extracting sub-module 612 may be specifically configured to:

In one embodiment, the first prediction module 62 may include a first prediction sub-module 621, a second prediction sub-module 622, and a fusion sub-module 623 as follows:

the first prediction sub-module 621 is configured to predict attribute prediction information of the browsing user for the sample text based on the linear sub-model and the sample attribute information;

a second prediction sub-module 622, configured to predict, based on the deep neural network sub-model and the sample label information, label prediction information of a browsing user for the sample text;

and a fusion sub-module 623, configured to fuse the attribute prediction information and the tag prediction information to obtain multi-dimensional user feedback information.
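The division of labour among the three sub-modules can be sketched as follows. This is a simplified illustration under several assumptions: the deep neural network sub-model is reduced to an embedding lookup, average pooling, and a single ReLU layer; fusion is assumed to be an element-wise sum; and all names, shapes, and weights are invented. Only two feedback dimensions are shown to keep the numbers readable.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_submodel(attributes: Vector, weights: List[Vector]) -> Vector:
    """Attribute prediction: one linear output per feedback dimension."""
    return [dot(row, attributes) for row in weights]


def deep_submodel(
    tags: List[str],
    embeddings: Dict[str, Vector],
    hidden: List[Vector],
    output: List[Vector],
) -> Vector:
    """Tag prediction: embed each tag, average, apply one ReLU layer, then an output layer."""
    vectors = [embeddings[tag] for tag in tags]
    pooled = [sum(column) / len(vectors) for column in zip(*vectors)]
    activated = [max(0.0, dot(row, pooled)) for row in hidden]
    return [dot(row, activated) for row in output]


def fuse(attribute_pred: Vector, tag_pred: Vector) -> Vector:
    """Assumed fusion rule: element-wise sum of the two sub-model outputs."""
    return [a + t for a, t in zip(attribute_pred, tag_pred)]


# Invented parameters and inputs.
attribute_pred = linear_submodel([2.0, 1.0], [[0.5, 0.0], [0.0, 1.0]])
tag_pred = deep_submodel(
    ["sad", "piano"],
    {"sad": [1.0, 0.0], "piano": [0.0, 1.0]},
    hidden=[[1.0, 1.0], [1.0, -1.0]],
    output=[[2.0, 0.0], [0.0, 3.0]],
)
feedback = fuse(attribute_pred, tag_pred)
```

The numeric attribute features go through the linear path, while the discrete tag features are first turned into vectors, which corresponds to converting the sample label information into a sample label feature vector.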

In an embodiment, the second prediction sub-module 622 may be specifically configured to:

converting the sample label information into a sample label feature vector;

In one embodiment, the second prediction module 65 may include a third prediction sub-module 651 and a determination sub-module 652, as follows:

a third prediction submodule 651, configured to predict, based on the trained text score model, the music library, and the text to be scored, multi-dimensional target user feedback information of each piece of music in the music library for the text to be scored;

the determining submodule 652 is configured to determine, according to the target user feedback information, a target score of the text to be scored from the music library.

In an embodiment, the third prediction sub-module 651 may be specifically configured to:

In an embodiment, the determining sub-module 652 may be specifically configured to:

In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.

As can be seen from the above, in the embodiment of the application, the obtaining module 61 may obtain the sample text and the multi-dimensional sample feature information corresponding to the sample text; the first prediction module 62 predicts the multi-dimensional user feedback information of the browsing user for the sample text based on the text score model and the sample feature information; the loss obtaining module 63 obtains the loss corresponding to the user feedback information of each dimension based on the sample feature information and the user feedback information; the training module 64 trains the text score model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model; and the second prediction module 65 predicts the target score of the text to be scored based on the trained text score model. In this scheme, the multi-dimensional sample feature information corresponding to the sample text is used as the model input and a plurality of optimization targets are set, which improves the modeling capacity of the text score model. Scoring the text to be scored with the text score model can help the author find suitable background music, which saves the author's time, surfaces more candidate music that could serve as background music, makes the author's selection of the target music more flexible, and improves the accuracy of scoring the text to be scored.

The embodiment of the application also provides a network device, which may integrate any of the text score apparatuses provided by the embodiments of the application.

For example, fig. 7 shows a schematic structural diagram of a network device according to an embodiment of the present application, specifically:

the network device may include components such as a processor 71 of one or more processing cores, memory 72 of one or more computer-readable storage media, a power supply 73, and an input unit 74. Those skilled in the art will appreciate that the network device architecture shown in fig. 7 does not constitute a limitation of network devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:

the processor 71 is the control center of the network device. It connects the various parts of the entire network device by using various interfaces and lines, and performs the various functions of the network device and processes data by running or executing software programs and/or modules stored in the memory 72 and calling data stored in the memory 72, thereby monitoring the network device as a whole. Optionally, the processor 71 may include one or more processing cores; preferably, the processor 71 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 71.

The memory 72 may be used to store software programs and modules, and the processor 71 executes various functional applications and data processing by running the software programs and modules stored in the memory 72. The memory 72 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the network device, and the like. Further, the memory 72 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 72 may also include a memory controller to provide the processor 71 with access to the memory 72.

The network device also includes a power supply 73 for supplying power to the various components. Preferably, the power supply 73 is logically connected to the processor 71 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system. The power supply 73 may also include any one or more components such as DC or AC power sources, recharging systems, power failure detection circuitry, power converters or inverters, and power status indicators.

The network device may also include an input unit 74, the input unit 74 being operable to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the network device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 71 in the network device loads the executable file corresponding to the process of one or more application programs into the memory 72 according to the following instructions, and the processor 71 runs the application programs stored in the memory 72, thereby implementing various functions as follows:

obtaining a sample text and multi-dimensional sample feature information corresponding to the sample text; predicting, based on a text score model and the sample feature information, multi-dimensional user feedback information of a browsing user for the sample text; obtaining, based on the sample feature information and the user feedback information, a loss corresponding to the user feedback information of each dimension; training the text score model based on the loss corresponding to the user feedback information of each dimension to obtain a trained text score model; and predicting the target score of a text to be scored based on the trained text score model.

The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.

To this end, embodiments of the present application provide a computer storage medium in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the text score methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:

The storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.

Since the instructions stored in the storage medium can execute the steps in any of the text score methods provided in the embodiments of the present application, they can achieve the beneficial effects of any of those text score methods, which are detailed in the foregoing embodiments and are not described again here.

The text score method, the text score device, and the computer storage medium provided by the embodiments of the present application are described in detail above, and specific examples are applied herein to illustrate the principles and implementations of the present application, and the description of the embodiments above is only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A method for matching a score with a text, comprising:

2. The method of claim 1, wherein obtaining a sample text and multi-dimensional sample feature information corresponding to the sample text comprises:

3. The text score method of claim 2, wherein the sample feature information comprises sample text feature information and sample score feature information;

extracting the sample text and the characteristics of the sample score information to obtain multi-dimensional sample characteristic information, wherein the multi-dimensional sample characteristic information comprises the following steps:

4. The text score method of claim 1, wherein the sample feature information includes sample label information and sample attribute information;

the text score model comprises a deep neural network submodel and a linear submodel;

predicting multi-dimensional user feedback information of a browsing user for the sample text based on a text score model and the sample feature information, wherein the method comprises the following steps:

5. The text score method of claim 4, wherein predicting label prediction information of a browsing user for the sample text based on the deep neural network submodel and the sample label information comprises:

converting the sample label information into a sample label feature vector;

6. The method of claim 1, wherein predicting the target score of the text to be scored based on the trained text score model comprises:

7. The text score method of claim 6, wherein predicting multi-dimensional target user feedback information of each piece of music in a music library for the text to be scored based on the trained text score model, the music library, and the text to be scored comprises:

8. The text score method of claim 6, wherein determining the target score of the text to be scored from the music library according to the target user feedback information comprises:

9. A text score apparatus, comprising:

10. A computer storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the text score method according to any one of claims 1 to 8.