CN114519114A

CN114519114A - Multimedia resource classification model construction method and device, server and storage medium

Info

Publication number: CN114519114A
Application number: CN202011311584.6A
Authority: CN
Inventors: 李旭; 陆子龙; 袁德东
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2022-05-20

Abstract

The disclosure relates to a multimedia resource classification model construction method, a multimedia resource classification model construction device, a server and a storage medium, and relates to the technical field of computers. The method comprises the following steps: acquiring a plurality of target multimedia resource samples of a target source; the target multimedia resource sample comprises a first category, and the first category is used for representing the type of the target multimedia resource sample; training by adopting a plurality of target multimedia resource samples to obtain an initial classification model; respectively inputting a plurality of candidate multimedia resource samples of other sources into the initial classification model to obtain a second category corresponding to each candidate multimedia resource sample; the second category is used for representing the type of the corresponding candidate multimedia resource sample, and the second category and the first category belong to a category system; constructing other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category; and training to obtain a multimedia resource classification model by adopting a plurality of target multimedia resource samples and other multimedia resource samples.

Description

Multimedia resource classification model construction method and device, server and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for building a multimedia resource classification model, a server, and a storage medium.

Background

Currently, a common advertisement transaction mode is as follows: different advertisement sources place advertisements on the same advertisement delivery platform, the platform recommends advertisements to terminals, and each user views the advertisements on his or her terminal. The advertisement delivery platform may itself also be an advertisement source. In that case, the category of an advertisement placed on the platform by each other advertisement source is confidential information, so the platform can obtain only the categories of the advertisements it places itself and cannot obtain the categories of the advertisements placed by other advertisement sources.

In the related art, the advertisements that each advertisement source places on the advertisement delivery platform are generally advertisements in which users are interested, and the advertisements that the platform recommends to a terminal are advertisements in which the user of that terminal is interested. However, the advertisement delivery platform can obtain only the advertisement categories in which the user is interested within its own advertisement source, and cannot obtain the advertisement categories in which the user is interested within other advertisement sources. Moreover, because different advertisement sources use different advertisement category systems, their classification standards may differ greatly, so the advertisement delivery platform cannot comprehensively analyze the advertisement categories in which the user is interested across all advertisement sources.

Disclosure of Invention

The present disclosure provides a multimedia resource classification model construction method, apparatus, server and storage medium, which can construct a classification model common to multimedia resources from various sources, so as to obtain categories of multimedia resources that users are interested in over the whole network based on the classification model.

The technical scheme of the disclosure is as follows:

according to a first aspect of the present disclosure, there is provided a method for constructing a multimedia resource classification model, the method including:

acquiring a plurality of target multimedia resource samples of a target source; wherein each target multimedia resource sample comprises a first category, and the first category is used for representing a type of the target multimedia resource sample;

training by adopting the plurality of target multimedia resource samples to obtain an initial classification model;

respectively inputting a plurality of candidate multimedia resource samples of other sources into the initial classification model to obtain a second category corresponding to each candidate multimedia resource sample; wherein the second category is used for representing the type of the corresponding candidate multimedia resource sample, and the second category and the first category belong to the same category system;

constructing other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category;

and training to obtain the multimedia resource classification model by adopting the plurality of target multimedia resource samples and the other multimedia resource samples.

Optionally, the acquiring a plurality of target multimedia resource samples of a target source includes:

acquiring a plurality of target description texts of the target source and a first category corresponding to each target description text;

performing word segmentation on each target description text to obtain a word segmentation result corresponding to each target description text;

and constructing the target multimedia resource sample according to the word segmentation result and the first category corresponding to each target description text, wherein the format of the target multimedia resource sample is a data format supported by the initial classification model.

Optionally, the obtaining of the plurality of target description texts of the target source and the first category corresponding to each target description text includes:

acquiring a plurality of pre-stored initial description texts of the target source and a first category corresponding to each initial description text;

deleting the initial description texts of which the text character string length is smaller than a preset length from the plurality of initial description texts to obtain a plurality of candidate description texts;

matching each candidate description text with keywords in a keyword library corresponding to a first category corresponding to the candidate description text;

deleting description texts with unmatched categories from the candidate description texts to obtain a plurality of target description texts; the description texts with unmatched categories do not include any keyword in the keyword library corresponding to the first category corresponding to the description texts.

Optionally, the method further comprises:

respectively inputting the candidate multimedia resource samples into the initial classification model, and obtaining category probability corresponding to each candidate multimedia resource sample in addition to obtaining a second category corresponding to each candidate multimedia resource sample;

constructing other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category, comprising:

determining candidate multimedia resource samples corresponding to the category probability greater than the preset probability as effective multimedia resource samples;

and constructing other multimedia resource samples according to each effective multimedia resource sample and the corresponding second category.

Optionally, after the initial classification model is obtained by training with the plurality of target multimedia resource samples, the method further includes:

obtaining a plurality of first multimedia resource samples of the target source, wherein the first multimedia resource samples comprise a first category;

and updating the initial classification model by adopting the plurality of first multimedia resource samples.

Optionally, the updating the initial classification model by using the plurality of first multimedia resource samples includes:

dividing first multimedia resource samples with the same first category into a category sample set;

for each first multimedia resource sample comprised in each category sample set, performing: inputting the first multimedia resource sample into the initial classification model to obtain a second category corresponding to the first multimedia resource sample, and comparing the first category and the second category corresponding to the first multimedia resource sample;

in each category sample set, counting the proportion of first multimedia resource samples whose first category differs from their second category relative to the total number of samples in that category sample set;

adding each category sample set whose proportion is smaller than a preset percentage to the plurality of target multimedia resource samples;

updating the initial classification model using the plurality of target multimedia resource samples to which the first multimedia resource sample is added.
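The update steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `select_consistent_category_sets`, the `predict` callable (standing in for the initial classification model and returning only a category) and the parameter `max_mismatch_ratio` (standing in for the preset percentage, expressed as a fraction) are all hypothetical.

```python
from collections import defaultdict


def select_consistent_category_sets(first_samples, predict, max_mismatch_ratio):
    """Return the samples of every category sample set whose mismatch
    proportion is below the threshold.

    first_samples: list of (tokens, first_category) pairs.
    predict: hypothetical callable mapping tokens to a second category.
    max_mismatch_ratio: hypothetical stand-in for the preset percentage.
    """
    # Divide samples that share the same first category into one set.
    by_category = defaultdict(list)
    for tokens, first_category in first_samples:
        by_category[first_category].append((tokens, first_category))

    selected = []
    for sample_set in by_category.values():
        # Count samples whose predicted (second) category differs from
        # the labelled (first) category.
        mismatches = sum(
            1 for tokens, first_category in sample_set
            if predict(tokens) != first_category
        )
        # Keep the whole set only if the mismatch proportion is small.
        if mismatches / len(sample_set) < max_mismatch_ratio:
            selected.extend(sample_set)
    return selected
```

The returned samples would then be appended to the existing target multimedia resource samples, and the initial classification model retrained on the enlarged set.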

According to a second aspect of the present disclosure, there is provided a multimedia resource classification model construction apparatus, including:

an acquisition module configured to perform acquiring a plurality of target multimedia resource samples of a target source; wherein each target multimedia resource sample comprises a first category, and the first category is used for representing a type of the target multimedia resource sample;

the training module is configured to perform training on the plurality of target multimedia resource samples acquired by the acquisition module to obtain an initial classification model;

the processing module is configured to input a plurality of candidate multimedia resource samples of other sources into the initial classification model obtained by the training module respectively to obtain a second category corresponding to each candidate multimedia resource sample; wherein the second category is used for representing the type of the corresponding candidate multimedia resource sample, and the second category and the first category belong to a category system;

a construction module configured to perform construction of other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category;

the training module is further configured to perform training to obtain the multimedia resource classification model by using the plurality of target multimedia resource samples and the other multimedia resource samples.

Optionally, the obtaining module is specifically configured to perform:

Optionally, the processing module is further configured to perform inputting the plurality of candidate multimedia resource samples into the initial classification model respectively, and obtain a category probability corresponding to each candidate multimedia resource sample in addition to obtaining a second category corresponding to each candidate multimedia resource sample;

the building module is specifically configured to perform:

Optionally, the multimedia resource classification model building apparatus further includes: an update module;

the obtaining module is further configured to perform obtaining a plurality of first multimedia resource samples of the target source, the first multimedia resource samples including a first category;

the updating module is configured to perform updating the initial classification model using the plurality of first multimedia resource samples.

Optionally, the update module is specifically configured to perform:

According to a third aspect of the present disclosure, there is provided a server comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the multimedia resource classification model construction method according to the first aspect or any optional implementation thereof.

According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having instructions stored thereon which, when executed by a processor of a server, enable the server to perform the multimedia resource classification model construction method according to the first aspect or any optional implementation thereof.

According to a fifth aspect of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the multimedia resource classification model construction method according to the first aspect or any optional implementation thereof.

The technical scheme provided by the disclosure at least brings the following beneficial effects: a plurality of target multimedia resource samples of a target source are obtained, wherein the target multimedia resource samples comprise a first category, and the first category is used for representing the type of the target multimedia resource samples. The method comprises the steps of training by adopting a plurality of target multimedia resource samples to obtain an initial classification model, respectively inputting a plurality of candidate multimedia resource samples of other sources into the initial classification model to obtain a second category corresponding to each candidate multimedia resource sample, constructing other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category, and finally training by adopting a plurality of target multimedia resource samples and other multimedia resource samples to obtain a multimedia resource classification model. The second category is used for representing the type of the corresponding candidate multimedia resource sample, and the second category and the first category belong to a category system.

Therefore, the initial classification model is obtained by training the target multimedia resource sample of the target source, and the second category of the candidate multimedia resource sample of other sources is predicted by adopting the initial classification model, so that the category systems of the multimedia resources of all sources are unified into the category system of the multimedia resources of the target source. And then, constructing other multimedia resource samples according to the candidate multimedia resource samples and the corresponding second category, and training by using the target multimedia resource sample and the other multimedia resource samples to obtain a multimedia resource classification model. The classification model is a universal model of multimedia resources of all sources, and the category of each multimedia resource of each source can be obtained based on the classification model, so that the category of the multimedia resource interested by each user in the whole network is comprehensively analyzed. The analysis result integrates multimedia resources from multiple sources, so that the analysis result is more comprehensive and accurate, and the delivery effect of the multimedia resources can be improved by using the analysis result.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a schematic diagram illustrating a multimedia resource classification model construction system according to an exemplary embodiment.

Fig. 2 is a first flowchart illustrating a multimedia resource classification model construction method according to an exemplary embodiment.

Fig. 3 is a second flowchart illustrating a multimedia resource classification model construction method according to an exemplary embodiment.

Fig. 4 is a third flowchart illustrating a multimedia resource classification model construction method according to an exemplary embodiment.

Fig. 5 is a fourth flowchart illustrating a multimedia resource classification model construction method according to an exemplary embodiment.

Fig. 6 is a block diagram illustrating a logical structure of a multimedia resource classification model construction apparatus according to an exemplary embodiment.

Fig. 7 is a block diagram illustrating a logical structure of another multimedia resource classification model construction apparatus according to an exemplary embodiment.

Fig. 8 is a block diagram illustrating the structure of a server according to an exemplary embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In the related art, when the advertisement delivery platform is itself an advertisement source, it can obtain only the categories of the advertisements it delivers and cannot obtain the categories of the advertisements delivered by other advertisement sources. In addition, different advertisement sources use different advertisement category systems, so the advertisement delivery platform cannot comprehensively analyze the advertisement categories in which the user is interested across all advertisement sources.

To solve the above problem, an embodiment of the present disclosure provides a multimedia resource classification model construction method that can construct a classification model common to multimedia resources from all sources, so that the categories of multimedia resources in which a user is interested across the whole network can be obtained based on the classification model.

Fig. 1 is a schematic diagram illustrating a multimedia resource classification model building system according to an exemplary embodiment, to which the multimedia resource classification model building method provided by the embodiment of the present disclosure may be applied. As shown in fig. 1, the multimedia resource classification model construction system may include: a plurality of first servers 11, a plurality of second servers 12, and a plurality of terminals 13. Each first server 11 communicates with the second server 12 in a wired communication manner or a wireless communication manner, and each terminal 13 communicates with the second server 12 in a wired communication manner or a wireless communication manner.

The first server 11 may be a data server of multimedia resources and is configured to deliver multimedia resources to the second server 12. The plurality of first servers 11 shown in fig. 1 may represent multimedia resources from different sources, and different sources use different category systems. The multimedia resource may be, for example, an advertisement or a video. When the multimedia resource is an advertisement, a given advertisement may be classified as "daily life" by some advertisement sources and as "household goods" by others.

The second server 12, which may be a data server of the multimedia resource delivering platform, is used for storing and processing multimedia resources from various sources. For example, the second server 12 may store multimedia resources from various sources, and the second server 12 may recommend the stored multimedia resources to the terminal 13 through a wired network or a wireless network, so that the terminal 13 presents the multimedia resources to the user for the user to view. It should be noted that the second server 12 itself may serve as a source of a multimedia asset.

In some embodiments, the first server 11 or the second server 12 may be a single server, or may be a server cluster composed of a plurality of servers. In some embodiments, the server cluster may also be a distributed cluster. The embodiment of the present disclosure does not limit the specific implementation manner of the first server 11 or the second server 12.

The terminal 13 may be a personal intelligent device such as a mobile phone and a tablet computer, or may also be a device such as a notebook computer, a desktop computer, a television, and a projector. The disclosed embodiment also does not limit the type of the terminal 13.

It should be noted that the embodiments of the present disclosure describe the multimedia resource classification model construction method in detail by taking an advertisement as an example of the multimedia resource. Where the multimedia resource is something other than an advertisement, reference may be made to the following description of the advertisement case, and details are not repeated here.

Fig. 2 is a flowchart illustrating a multimedia resource classification model construction method according to an exemplary embodiment. As shown in fig. 2, when applied to the second server of fig. 1, the method may include steps 201 to 205.

201. Acquire a plurality of target advertisement samples of a target advertisement source.

Each target advertisement sample comprises a first advertisement category, which represents the type of the target advertisement sample.

202. Train with the plurality of target advertisement samples to obtain an initial classification model.

203. Input a plurality of candidate advertisement samples of other advertisement sources into the initial classification model to obtain a second advertisement category corresponding to each candidate advertisement sample.

The second advertisement category represents the type of the corresponding candidate advertisement sample, and the second advertisement category and the first advertisement category belong to the same category system.

204. Construct other advertisement samples according to each candidate advertisement sample and the corresponding second advertisement category.

205. Train with the plurality of target advertisement samples and the other advertisement samples to obtain an advertisement classification model.

Optionally, when training the advertisement classification model, the greater the number of other advertisement samples of other advertisement sources, the greater the generalization ability of the resulting model. Based on the advertisement classification model, the advertisement category corresponding to each advertisement on the advertisement delivery platform can be determined.
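Steps 201 to 205 can be sketched end to end as follows. This is only an illustrative sketch under stated assumptions: `TinyTextClassifier` is a toy naive Bayes classifier used here as a stand-in for whatever classification model is actually trained, and the names `TinyTextClassifier` and `build_classification_model` are hypothetical and do not come from the disclosure.

```python
import math
from collections import Counter, defaultdict


class TinyTextClassifier:
    """Toy multinomial naive Bayes; a stand-in for the real model."""

    def fit(self, samples):
        # samples: list of (tokens, category) pairs.
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for tokens, category in samples:
            self.class_counts[category] += 1
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        # Returns (category, probability), using Laplace smoothing.
        total = sum(self.class_counts.values())
        log_scores = {}
        for category, count in self.class_counts.items():
            denom = sum(self.word_counts[category].values()) + len(self.vocab)
            score = math.log(count / total)
            for token in tokens:
                score += math.log((self.word_counts[category][token] + 1) / denom)
            log_scores[category] = score
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def build_classification_model(target_samples, candidate_token_lists):
    # Step 202: train the initial model on labelled target-source samples.
    initial_model = TinyTextClassifier().fit(target_samples)
    other_samples = []
    for tokens in candidate_token_lists:
        # Step 203: predict a second category in the target category system.
        second_category, _ = initial_model.predict(tokens)
        # Step 204: pair the candidate with its predicted category.
        other_samples.append((tokens, second_category))
    # Step 205: train the final model on target plus other samples.
    return TinyTextClassifier().fit(target_samples + other_samples)
```

In this sketch every candidate sample is kept; the final model simply sees more vocabulary from the other sources than the initial model did.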

The technical scheme provided by the embodiment at least has the following beneficial effects: the server obtains a plurality of target multimedia resource samples of a target source, wherein the target multimedia resource samples comprise a first category, and the first category is used for representing the type of the target multimedia resource samples. The method comprises the steps of training by adopting a plurality of target multimedia resource samples to obtain an initial classification model, respectively inputting a plurality of candidate multimedia resource samples of other sources into the initial classification model to obtain a second category corresponding to each candidate multimedia resource sample, constructing other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category, and finally training by adopting a plurality of target multimedia resource samples and other multimedia resource samples to obtain a multimedia resource classification model. The second category is used for representing the type of the corresponding candidate multimedia resource sample, and the second category and the first category belong to a category system.

Optionally, in this embodiment of the present disclosure, with reference to fig. 2, as shown in fig. 3, the step 201 may specifically include the following steps 201A to 201C.

201A. Acquire a plurality of target advertisement description texts of the target advertisement source and the first advertisement category corresponding to each target advertisement description text.

Optionally, in the embodiment of the present disclosure, an advertisement description text is text information that describes an advertisement; it contains information related to the advertisement and is intended to attract users to view it. For example, the advertisement description text for a nut product may be "Brand A nuts are cheap and tasty".

Optionally, in one implementation, the plurality of target advertisement description texts may be initial advertisement description texts of target advertisement sources pre-stored by the second server.

Optionally, in another implementation, the second server may process a plurality of initial advertisement description texts and select the more accurate ones as the target advertisement description texts. Specifically, the second server may first obtain a plurality of pre-stored initial advertisement description texts of the target advertisement source and the first advertisement category corresponding to each initial advertisement description text. The second server may then delete, from the plurality of initial advertisement description texts, those whose text string length is smaller than a preset length; the remaining advertisement description texts are the candidate advertisement description texts. Finally, the second server may match each candidate advertisement description text against the keywords in the keyword library corresponding to its first advertisement category, and delete the advertisement description texts with unmatched categories from the plurality of candidate advertisement description texts to obtain the plurality of target advertisement description texts. An advertisement description text with an unmatched category is a text that does not include any keyword in the keyword library corresponding to its first advertisement category.

It should be noted that, in this embodiment of the present disclosure, the second server processes the plurality of initial advertisement description texts by first deleting the advertisement description texts with a short text string length and then deleting the advertisement description texts with unmatched categories. Of course, the second server may instead first delete the texts with unmatched categories and then delete the texts with a short text string length. The embodiments of the present disclosure do not specifically limit the order in which the plurality of initial advertisement description texts are processed.

It can be understood that advertisement description texts with a short text string length are deleted because they contain little information; if they were retained and used to train the initial classification model, the accuracy with which the initial classification model predicts advertisement categories would be reduced. Advertisement description texts with unmatched categories are deleted in order to verify the accuracy of the first advertisement category corresponding to each advertisement description text of the target advertisement source. Through these deletion operations, more accurate advertisement description texts can be selected from the plurality of initial advertisement description texts.
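The two deletion operations described above can be sketched as follows. The sketch assumes that keyword matching is plain substring containment and that the keyword library is a mapping from category to keywords; the function name `filter_description_texts` and its parameters are hypothetical.

```python
def filter_description_texts(initial_texts, keyword_library, min_length):
    """Select target description texts from initial description texts.

    initial_texts: list of (text, first_category) pairs.
    keyword_library: dict mapping a category to its keywords (assumed shape).
    min_length: the preset length; shorter texts are deleted.
    """
    # Pass 1: delete texts whose string length is smaller than the preset length.
    candidates = [
        (text, category)
        for text, category in initial_texts
        if len(text) >= min_length
    ]
    # Pass 2: delete texts that contain no keyword from the keyword
    # library of their own first category (category unmatched).
    return [
        (text, category)
        for text, category in candidates
        if any(keyword in text for keyword in keyword_library.get(category, ()))
    ]
```

Because each pass only removes items, running the two passes in the opposite order yields the same result, which is consistent with the order not being limited.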

201B. Perform word segmentation on each target advertisement description text to obtain a word segmentation result corresponding to each target advertisement description text.

After the second server obtains the multiple target advertisement description texts, a pre-stored word segmentation tool can be adopted to segment words of each target advertisement description text, and a word segmentation result corresponding to each target advertisement description text is obtained.

201C. Construct a target advertisement sample according to the word segmentation result and the first advertisement category corresponding to each target advertisement description text.

It should be noted that the first advertisement category corresponding to the target advertisement sample is the first advertisement category corresponding to the corresponding target advertisement description text.

After obtaining the word segmentation result corresponding to each target advertisement description text, the second server may construct a target advertisement sample according to that word segmentation result and the corresponding first advertisement category. Processing each target advertisement description text in this way yields a plurality of target advertisement samples. The format of each target advertisement sample is a data format supported by the initial classification model.

For example, assuming that the initial classification model is a fastText model, the second server may construct a target advertisement sample by splicing the word segmentation result corresponding to the target advertisement description text with the first advertisement category according to a preset rule.
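One possible splicing rule is sketched below. It assumes the preset rule follows the usual fastText supervised training format, in which each line starts with a label prefixed by `__label__` followed by the space-separated tokens; the disclosure does not state which rule is used. The `segment` function is a whitespace-based placeholder for the pre-stored word segmentation tool, and both function names are hypothetical.

```python
def segment(text):
    # Placeholder for the word segmentation tool: lowercases and splits
    # on whitespace. A real tool would be needed for unspaced languages.
    return text.lower().split()


def to_fasttext_line(tokens, category=None):
    """Splice a word segmentation result with its category.

    With a category, returns a labelled training line; without one,
    returns the bare token line used for prediction.
    """
    text = " ".join(tokens)
    if category is None:
        return text
    # fastText labels must not contain whitespace.
    label = category.replace(" ", "_")
    return f"__label__{label} {text}"
```

For instance, `to_fasttext_line(segment("Brand A nuts are cheap"), "food")` produces one labelled line, while omitting the category gives the unlabelled form suited to candidate samples from other sources.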

The technical solution provided by this embodiment brings at least the following beneficial effects: the target description texts are processed to construct the target multimedia resource samples, in preparation for subsequently training the initial classification model. The target description texts are obtained by deleting, from the plurality of initial description texts, the description texts that have a short text string length or an unmatched category. Compared with using the initial description texts directly as the target description texts, this screening makes the target description texts more accurate, so the constructed target multimedia resource samples are more accurate, and the multimedia resource categories predicted by the initial classification model trained on them are more accurate.

Optionally, in this embodiment of the present disclosure, with reference to fig. 3, as shown in fig. 4, when the plurality of candidate multimedia resource samples are respectively input into the initial classification model in step 203, a category probability corresponding to each candidate multimedia resource sample is obtained in addition to the second category corresponding to each candidate multimedia resource sample. A greater category probability indicates that the predicted second category of the candidate multimedia resource sample is more accurate. In this case, step 204 may specifically include the following steps 204A to 204B.

204A, determining each candidate advertisement sample whose category probability is greater than the preset probability as a valid advertisement sample.

Optionally, in this embodiment of the present disclosure, the process of the second server obtaining a plurality of candidate advertisement samples of other advertisement sources is as follows: the second server cannot acquire the advertisement categories of other advertisement sources, but can acquire a plurality of other advertisement description texts. The second server performs word segmentation on each other advertisement description text by adopting a preset word segmentation tool to obtain a word segmentation result of each other advertisement description text. Finally, the second server constructs a corresponding candidate advertisement sample according to the word segmentation result of each other advertisement description text. The format of the candidate advertisement sample is a data format supported by the initial classification model.

It is understood that the plurality of other advertisement description texts obtained by the second server may be obtained by processing pre-stored other advertisement description texts. The processing may consist of deleting the advertisement description texts that have a short text string length.
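As a non-limiting illustration, the construction of candidate advertisement samples may be sketched as follows. The function names `build_candidate_samples` and `whitespace_segment`, the minimum length of 5 characters, and the space-joined sample format are assumptions made for this example. The disclosure does not name the preset word segmentation tool, so whitespace splitting merely stands in for it.

```python
def build_candidate_samples(texts, segment, min_length=5):
    """Turn unlabeled description texts into unlabeled, space-joined candidate samples."""
    samples = []
    for text in texts:
        text = text.strip()
        # Delete description texts whose character string length is too short.
        if len(text) < min_length:
            continue
        tokens = segment(text)
        if tokens:
            # No category is attached: the categories of other sources are unavailable.
            samples.append(" ".join(tokens))
    return samples


def whitespace_segment(text):
    # Stand-in for the preset word segmentation tool (assumption for this sketch).
    return text.split()


candidates = build_candidate_samples(
    ["buy", "fresh fruit delivered daily", "  low cost car insurance  "],
    whitespace_segment,
)
print(candidates)
```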

Optionally, in one implementation, the second server may construct other advertisement samples directly according to each candidate advertisement sample and the corresponding second advertisement category. In another implementation, because the accuracy of directly using the initial classification model to predict the advertisement categories of advertisement samples of other advertisement sources may be low, the second server may select, from the plurality of candidate advertisement samples, valid advertisement samples with higher accuracy. Specifically, the second server may determine each candidate advertisement sample whose category probability is greater than the preset probability as a valid advertisement sample.

204B, constructing other advertisement samples according to each valid advertisement sample and the corresponding second advertisement category.

It can be understood that, if no category probability is greater than the preset probability, that is, the category probability corresponding to every candidate advertisement sample is less than or equal to the preset probability, the second server needs to re-acquire advertisement samples of the target advertisement source and retrain the initial classification model, that is, perform steps 201 to 202 again.
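As a non-limiting illustration, steps 204A to 204B may be sketched as follows. The function name `select_valid_samples`, the tuple layout of the predictions, the preset probability of 0.8, and the example data are assumptions made for this example; the predictions are written out by hand rather than produced by a trained model.

```python
def select_valid_samples(predictions, preset_probability):
    """predictions: iterable of (sample, second_category, category_probability)."""
    other_samples = []
    for sample, second_category, probability in predictions:
        # Step 204A: keep only candidates whose category probability
        # is greater than the preset probability.
        if probability > preset_probability:
            # Step 204B: pair the retained sample with its predicted second category.
            other_samples.append((sample, second_category))
    return other_samples


# Hand-written stand-ins for the outputs of the initial classification model.
predictions = [
    ("fresh fruit delivered daily", "food", 0.93),
    ("low cost car insurance", "finance", 0.41),
    ("running shoes on sale", "clothing", 0.88),
]
other_samples = select_valid_samples(predictions, preset_probability=0.8)
if not other_samples:
    # No candidate passed the threshold: return to steps 201 to 202.
    print("re-acquire target samples and retrain the initial classification model")
else:
    print(other_samples)
```

A strict comparison is used, so a candidate whose category probability equals the preset probability is not retained.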

The technical scheme provided by the embodiment at least has the following beneficial effects: when the second server predicts the categories of the candidate multimedia resource samples of other sources by using the initial classification model, it compares the category probability corresponding to each candidate multimedia resource sample with the preset probability and takes only the candidate multimedia resource samples whose category probability is greater than the preset probability as valid multimedia resource samples. Other multimedia resource samples are then constructed according to each valid multimedia resource sample and the corresponding second category, so that the obtained other multimedia resource samples are the more accurate samples among the multimedia resource samples of other sources. This prepares for subsequently training a universal multimedia resource classification model, so that the obtained multimedia resource classification model can predict multimedia resource categories more accurately.

Optionally, in this embodiment of the present disclosure, after the second server obtains the initial classification model by training using a plurality of target multimedia resource samples, the second server may verify the initial classification model, and update the initial classification model according to a verification result, thereby improving accuracy of the initial classification model. Specifically, referring to fig. 4, as shown in fig. 5, after the step 202 is executed and before the step 203 is executed, the method for constructing a multimedia resource classification model provided in the embodiment of the present disclosure may further include the following steps 206 to 207.

206. A plurality of first advertisement samples of a target advertisement source is obtained.

Wherein the first advertisement sample includes a first advertisement category.

Optionally, in embodiments of the present disclosure, the plurality of first advertisement samples may be different from the plurality of target advertisement samples.

It should be noted that, in the embodiment of the present disclosure, the specific process of obtaining the first advertisement samples by the second server is the same as the process of obtaining the target advertisement samples. For details, reference may be made to the description of obtaining the target advertisement samples in step 201 above.

207. The initial classification model is updated with a plurality of first advertisement samples.

Optionally, in this embodiment of the present disclosure, a specific process of updating the initial classification model by the second server is as follows. The second server may first divide the first advertisement samples having the same first advertisement category among the plurality of first advertisement samples into one type sample set, so as to obtain at least one type sample set. For each first advertisement sample included in each type sample set, the second server may perform the following operations: inputting the first advertisement sample into the initial classification model obtained in step 202 (the first advertisement category included in the first advertisement sample may be hidden when the sample is input), obtaining a second advertisement category corresponding to the first advertisement sample, and comparing the first advertisement category corresponding to the first advertisement sample with the second advertisement category. Then, for each type sample set, the second server may count the ratio of the number of first advertisement samples whose first advertisement category and second advertisement category are the same to the total number of first advertisement samples in the type sample set, and add each type sample set whose ratio is smaller than a preset percentage to the plurality of target advertisement samples of step 201. Finally, the second server may update the initial classification model with the plurality of target advertisement samples to which the first advertisement samples have been added, that is, retrain the initial classification model with the new advertisement samples.

It can be understood that, when the ratio of the number of first advertisement samples whose first advertisement category and second advertisement category are the same to the total number of the type sample set is smaller than the preset percentage, the accuracy of predicting the advertisement categories of the type sample set by using the initial classification model is low, that is, the prediction capability of the initial classification model on the type sample set is weak. In this case, the first advertisement samples of the type sample set need to be added to the plurality of target advertisement samples, and the initial classification model needs to be retrained.

It should be noted that, if the ratio counted in each type sample set is greater than or equal to the preset percentage, it indicates that the accuracy of predicting the advertisement category is high by using the initial classification model. At this point, the second server may directly perform step 203 without retraining the initial classification model.
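As a non-limiting illustration, the verification of steps 206 to 207 may be sketched as follows. The ratio is read here as the share of first advertisement samples in a type sample set whose predicted second advertisement category is the same as the labeled first advertisement category, so that a ratio at or above the preset percentage corresponds to high accuracy. The function names `select_weak_type_sets` and `toy_predict`, the preset percentage of 0.9, and the example data are assumptions made for this example; `toy_predict` is a keyword rule standing in for the initial classification model.

```python
from collections import defaultdict


def select_weak_type_sets(first_samples, predict, preset_percentage):
    """first_samples: list of (text, first_category); predict: text -> second category."""
    # Group the first samples into type sample sets by their labeled first category.
    type_sets = defaultdict(list)
    for text, first_category in first_samples:
        type_sets[first_category].append((text, first_category))

    to_add = []
    for first_category, samples in type_sets.items():
        # The label is hidden from the model: only the text is passed to predict().
        matches = sum(1 for text, _ in samples if predict(text) == first_category)
        ratio = matches / len(samples)
        # A low match ratio means the model is weak on this type sample set.
        if ratio < preset_percentage:
            to_add.extend(samples)
    return to_add


def toy_predict(text):
    # Stand-in for the initial classification model (assumption for this sketch).
    return "food" if "fruit" in text or "pizza" in text else "finance"


first_samples = [
    ("fresh fruit box", "food"),
    ("pizza delivery", "food"),
    ("car insurance quote", "finance"),
    ("summer dress", "clothing"),
    ("winter coat", "clothing"),
]
extra = select_weak_type_sets(first_samples, toy_predict, preset_percentage=0.9)
if extra:
    print("add to the target samples and retrain:", extra)
else:
    print("proceed directly to step 203")
```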

The technical scheme provided by the embodiment at least has the following beneficial effects: after the initial classification model is obtained, the second server verifies the accuracy of the initial classification model by using the first multimedia resource samples of the target source. If the accuracy is insufficient, the initial classification model is retrained; if the accuracy is sufficient, the initial classification model is used to predict the categories corresponding to the multimedia resource samples of other sources. In this way, the accuracy of predicting multimedia resource categories by using the initial classification model can be improved.

Optionally, in the embodiment of the present disclosure, after the advertisement classification model is obtained by training the plurality of target advertisement samples and the plurality of other advertisement samples in step 205, the classification accuracy of the advertisement classification model may also be verified. When the accuracy of the advertisement classification model is determined to be high, the advertisement classification model can be put into use formally.

Fig. 6 is a block diagram illustrating a logical structure of a multimedia resource classification model building apparatus according to an exemplary embodiment. Referring to fig. 6, the multimedia resource classification model building apparatus applied to a server includes: an obtaining module 31, a training module 32, a processing module 33 and a construction module 34;

an obtaining module 31 configured to perform obtaining a plurality of target multimedia resource samples of a target source; wherein the target multimedia resource sample comprises a first category for representing a type of the target multimedia resource sample;

a training module 32 configured to perform training on the plurality of target multimedia resource samples obtained by the obtaining module 31 to obtain an initial classification model;

the processing module 33 is configured to perform the step of inputting a plurality of candidate multimedia resource samples from other sources into the initial classification model obtained by the training module 32, so as to obtain a second category corresponding to each candidate multimedia resource sample; wherein the second category is used for representing the type of the corresponding candidate multimedia resource sample, and the second category and the first category belong to a category system;

a construction module 34 configured to perform the construction of other multimedia resource samples according to each candidate multimedia resource sample and the corresponding second category;

the training module 32 is further configured to perform training to obtain the multimedia resource classification model by using the plurality of target multimedia resource samples and the other multimedia resource samples.
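As a non-limiting illustration, the cooperation of the above modules may be sketched end to end as follows. The class `WordVoteClassifier` is a toy word-counting classifier that stands in for the classification model and is not the model of the disclosure. The function name `build_classification_model`, the preset probability of 0.5, and the example data are likewise assumptions made for this example.

```python
from collections import Counter, defaultdict


class WordVoteClassifier:
    """Toy stand-in for the classification model (assumption; not the disclosed model)."""

    def __init__(self, samples):
        # Count how often each word appears under each category.
        self.word_counts = defaultdict(Counter)
        for text, category in samples:
            for word in text.split():
                self.word_counts[word][category] += 1

    def predict(self, text):
        """Return (category, probability), or (None, 0.0) if no word is known."""
        votes = Counter()
        for word in text.split():
            votes.update(self.word_counts.get(word, {}))
        total = sum(votes.values())
        if total == 0:
            return None, 0.0
        category, count = votes.most_common(1)[0]
        return category, count / total


def build_classification_model(target_samples, candidate_texts, preset_probability=0.5):
    # Training module: train the initial classification model on the target samples.
    initial_model = WordVoteClassifier(target_samples)
    # Processing module: predict a second category for each candidate of other sources.
    other_samples = []
    for text in candidate_texts:
        category, probability = initial_model.predict(text)
        # Construction module: keep confident predictions as other samples.
        if category is not None and probability > preset_probability:
            other_samples.append((text, category))
    # Training module again: train the final model on both sample sets.
    return WordVoteClassifier(target_samples + other_samples), other_samples


# Obtaining module: hypothetical labeled target samples and unlabeled candidates.
target_samples = [("fresh fruit box", "food"), ("car insurance quote", "finance")]
candidate_texts = ["fruit juice delivery", "insurance broker online", "unknown words only"]
model, other_samples = build_classification_model(target_samples, candidate_texts)
print(other_samples)
print(model.predict("juice delivery"))
```

In this toy example, the words "juice" and "delivery" occur only in a candidate sample of another source, so only the model trained on both sample sets can classify the text "juice delivery".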

Optionally, the obtaining module 31 is specifically configured to perform:

Optionally, the processing module 33 is further configured to perform inputting the plurality of candidate multimedia resource samples into the initial classification model respectively, and obtain a category probability corresponding to each candidate multimedia resource sample in addition to obtaining the second category corresponding to each candidate multimedia resource sample;

the construction module 34 is specifically configured to perform:

Optionally, as shown in fig. 7, the multimedia resource classification model building apparatus further includes: an update module 35;

the obtaining module 31 is further configured to perform obtaining a plurality of first multimedia resource samples of the target source, where the first multimedia resource samples include a first category;

the updating module 35 is configured to perform updating the initial classification model using the plurality of first multimedia resource samples.

Optionally, the updating module 35 is specifically configured to perform:

Fig. 8 is a block diagram illustrating a structure of a server, which may be a multimedia resource classification model building apparatus, according to an exemplary embodiment. The server, which may vary significantly due to configuration or performance, may include one or more processors 41 and one or more memories 42. The memory 42 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 41 to implement the multimedia resource classification model building method provided by each of the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input/output, and the server may also include other components for implementing the functions of the device, which are not described herein again.

The present disclosure also provides a computer-readable storage medium including instructions stored thereon, which, when executed by a processor of a computer device, enable the computer device to perform the multimedia resource classification model construction method provided by the above-described embodiments. For example, the computer-readable storage medium may be a memory 42 comprising instructions executable by a processor 41 of the server to perform the method described above. Alternatively, the computer-readable storage medium may be a non-transitory computer-readable storage medium, for example, a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

The present disclosure also provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the multimedia resource classification model construction method provided by the above-described embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A multimedia resource classification model construction method is characterized by comprising the following steps:

2. The method according to claim 1, wherein the obtaining a plurality of target multimedia resource samples of a target source comprises:

3. The method for constructing a multimedia resource classification model according to claim 2, wherein the obtaining a plurality of target description texts of the target source and a first category corresponding to each target description text comprises:

4. A method of constructing a multimedia resource classification model according to any of claims 1-3, characterized in that the method further comprises:

5. The method for constructing a multimedia resource classification model according to any one of claims 1 to 3, wherein after the training with the plurality of target multimedia resource samples to obtain the initial classification model, the method further comprises:

6. The method according to claim 5, wherein said updating the initial classification model using the plurality of first multimedia resource samples comprises:

7. A multimedia resource classification model construction device is characterized by comprising:

8. The apparatus according to claim 7, wherein the obtaining module is specifically configured to perform:

segmenting each target description text to obtain a segmentation result corresponding to each target description text;

9. A server, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the multimedia resource classification model construction method of any of claims 1-6.

10. A computer-readable storage medium having instructions stored thereon, wherein the instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform the multimedia resource classification model construction method of any of claims 1-6.