CN111737554A - Scoring model training method, electronic book scoring method and device - Google Patents

Info

Publication number
CN111737554A
Authority
CN
China
Prior art keywords
sample
electronic book
scoring
model
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010550244.2A
Other languages
Chinese (zh)
Inventor
刘广东
杨勇
张洪祯
刘先钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010550244.2A
Publication of CN111737554A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282: Rating or review of business operators or products

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a scoring model training method, an electronic book scoring method and an electronic book scoring device. The method comprises the following steps: acquiring a learning sample and a testing sample, wherein the learning sample and the testing sample are multimedia file samples, and the testing sample corresponds to an initial score value; acquiring sample characteristics associated with the learning sample; inputting the sample characteristics into a preset model algorithm, and acquiring a target model file output by the preset model algorithm; inputting the sample characteristics corresponding to the test sample into the target model file, and acquiring the prediction score value corresponding to the test sample output by the target model file; and taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold value range. The method and the device can realize the quality scoring of the newly on-line multimedia file.

Description

Scoring model training method, electronic book scoring method and device
Technical Field
The present disclosure relates to the field of electronic book scoring technologies, and in particular, to a scoring model training method, an electronic book scoring method, and an electronic book scoring device.
Background
With the continuous development of science and technology, electronic devices (such as mobile phones, computers, etc.) gradually become an indispensable electronic tool in people's life and work.
Electronic devices are often used to browse multimedia files (e.g., electronic books, videos, etc.) during life and work of people, and when a user selects a browsed multimedia file, the user usually refers to the scoring data of the multimedia file for selection. In the prior art, most of the scoring methods for multimedia files collect user behavior data in a specified multimedia file platform, such as browsing amount, downloading amount, purchasing amount, comments and other data of the multimedia files, and analyze the data to obtain scoring data of the multimedia files.
In the above scoring method, a large amount of user behavior data needs to be combined to score the multimedia files, and the new online multimedia files cannot be scored due to the lack of user behavior data.
Disclosure of Invention
The embodiment of the application aims to provide a scoring model training method, an electronic book scoring method and an electronic book scoring device, so as to score a newly-online multimedia file. The specific technical scheme is as follows:
in a first aspect of the present application, there is provided a scoring model training method, including:
acquiring a learning sample and a testing sample, wherein the learning sample and the testing sample are multimedia file samples, and the testing sample corresponds to an initial score value;
acquiring sample characteristics associated with the learning sample;
inputting the sample characteristics into a preset model algorithm, and acquiring a target model file output by the preset model algorithm;
inputting the sample characteristics corresponding to the test sample into the target model file, and acquiring the prediction score value corresponding to the test sample output by the target model file;
and taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold value range.
Optionally, the obtaining of the sample feature associated with the learning sample includes:
extracting sample characteristics of the learning sample according to text information in the learning sample; and/or
acquiring sample characteristics of the learning sample according to the file state information corresponding to the learning sample; and/or
acquiring the sample characteristics of the learning sample according to the historical behavior data corresponding to the learning sample.
Optionally, the taking the target model file as a final content scoring model in a case that a difference between the predicted score value and the initial score value is within an error threshold includes:
acquiring a difference absolute value between the prediction score value and the initial score value;
and taking the target model file as the content scoring model when the absolute value of the difference value is within the error threshold range.
Optionally, after the testing the target model file by using the test sample to obtain the prediction score value corresponding to the test sample, the method further includes:
under the condition that the difference value between the predicted score value and the initial score value is out of the error threshold range, adjusting the preset model algorithm according to a preset adjusting strategy to obtain an adjusted model algorithm;
training the adjustment model algorithm according to the obtained reference sample;
the preset adjustment strategy comprises the following steps: at least one of an algorithm parameter adjustment strategy, a sample characteristic adjustment strategy and a sample number adjustment strategy.
In a second aspect of the present application, there is provided an electronic book scoring method, including:
acquiring an electronic book to be evaluated;
acquiring the electronic book characteristics of the electronic book to be evaluated;
inputting the electronic book features into a content scoring model, and acquiring the content scores of the electronic books to be scored, which are output by the content scoring model;
wherein, the content scoring model is obtained by training through any one of the scoring model training methods.
Optionally, the obtaining the electronic book features of the electronic book to be evaluated includes:
extracting the electronic book features of the electronic book to be evaluated according to the text information in the electronic book to be evaluated; and/or
acquiring the electronic book characteristics of the electronic book to be evaluated according to the file state information corresponding to the electronic book to be evaluated; and/or
acquiring the electronic book characteristics of the electronic book to be evaluated according to the historical behavior data corresponding to the electronic book to be evaluated.
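The second aspect can be illustrated by a minimal sketch, assuming the content scoring model is available as a callable that maps a feature list to a score. The names `ebook_features`, `score_ebook`, and `stand_in_model`, the dictionary keys, and the formula inside `stand_in_model` are hypothetical and only stand in for a trained model; they are not taken from the application.

```python
from typing import Any, Callable, Dict, List

Model = Callable[[List[float]], float]


def ebook_features(book: Dict[str, Any]) -> List[float]:
    """Collect features from text information, file state, and behavior data."""
    return [
        float(book["word_count"]),                # text information
        1.0 if book["is_serializing"] else 0.0,   # file state information
        float(book.get("visits", 0)),             # behavior data; 0 if none yet (assumption)
    ]


def score_ebook(book: Dict[str, Any], content_scoring_model: Model) -> float:
    """Feed the electronic book features to the content scoring model."""
    return content_scoring_model(ebook_features(book))


def stand_in_model(features: List[float]) -> float:
    # Hypothetical placeholder for a trained content scoring model.
    word_count, is_serializing, _visits = features
    return min(100.0, 50.0 + word_count / 10000.0 + 5.0 * is_serializing)


# A newly online electronic book with no behavior data yet.
new_book = {"word_count": 200000, "is_serializing": True}
content_score = score_ebook(new_book, stand_in_model)
```

In this sketch the missing behavior data simply defaults to zero, so a newly online electronic book can still be scored from its text and state features.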
In a third aspect of the present application, there is provided a scoring model training device, including:
the test sample acquisition module is used for acquiring a learning sample and a test sample, wherein the learning sample and the test sample are multimedia file samples, and the test sample corresponds to an initial score value;
the sample characteristic acquisition module is used for acquiring sample characteristics related to the learning sample;
the target model file acquisition module is used for inputting the sample characteristics into a preset model algorithm and acquiring a target model file output by the preset model algorithm;
the prediction score value obtaining module is used for inputting the sample characteristics corresponding to the test sample into the target model file and obtaining the prediction score value corresponding to the test sample output by the target model file;
and the content scoring model acquisition module is used for taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold range.
Optionally, the sample feature obtaining module includes:
the first sample characteristic acquisition unit is used for extracting sample characteristics of the learning sample according to the text information in the learning sample;
the second sample characteristic acquisition unit is used for acquiring the sample characteristics of the learning sample according to the file state information corresponding to the learning sample;
and the third sample characteristic acquisition unit is used for acquiring the sample characteristics of the learning sample according to the historical behavior data corresponding to the learning sample.
Optionally, the content scoring model obtaining module includes:
a difference absolute value obtaining unit configured to obtain a difference absolute value between the prediction score value and the initial score value;
and the content scoring model obtaining unit is used for taking the target model file as the content scoring model under the condition that the absolute value of the difference value is within the error threshold range.
Optionally, the device further comprises:
the adjustment model algorithm obtaining module is used for adjusting the preset model algorithm according to a preset adjustment strategy under the condition that the difference value between the prediction score value and the initial score value is out of an error threshold range to obtain an adjustment model algorithm;
the adjusting model algorithm training module is used for training the adjusting model algorithm according to the obtained reference sample;
the preset adjustment strategy comprises the following steps: at least one of an algorithm parameter adjustment strategy, a sample characteristic adjustment strategy and a sample number adjustment strategy.
In a fourth aspect of this application, an electronic book scoring apparatus is provided, including:
the electronic book to be evaluated acquisition module is used for acquiring the electronic book to be evaluated;
the electronic book feature acquisition module is used for acquiring the electronic book features of the electronic book to be evaluated;
the content scoring acquisition module is used for inputting the electronic book characteristics to a content scoring model and acquiring the content scoring of the electronic book to be scored, which is output by the content scoring model;
wherein, the content scoring model is obtained by training through any one of the scoring model training devices.
Optionally, the electronic book feature obtaining module includes:
the first electronic book feature acquisition unit is used for extracting the electronic book features of the electronic book to be evaluated according to the text information in the electronic book to be evaluated;
the second electronic book feature acquisition unit is used for acquiring the electronic book features of the electronic book to be evaluated according to the file state information corresponding to the electronic book to be evaluated;
and the third electronic book characteristic acquisition unit is used for acquiring the electronic book characteristics of the electronic book to be evaluated according to the historical behavior data corresponding to the electronic book to be evaluated.
In another aspect of this application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the scoring model training method or the electronic book scoring method when executing the program stored in the memory.
In yet another aspect of this embodiment, there is also provided a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to execute the above scoring model training method, or the above electronic book scoring method.
According to the scoring model training method and the electronic book scoring method and device, a learning sample and a testing sample are obtained, the learning sample and the testing sample are multimedia file samples, the testing sample corresponds to an initial scoring value, sample characteristics related to the learning sample are obtained, the sample characteristics are input into a preset model algorithm, a target model file output by the preset model algorithm is obtained, the sample characteristics corresponding to the testing sample are input into the target model file, a predicted scoring value corresponding to the testing sample output by the target model is obtained, and the target model file is used as a final content scoring model under the condition that the difference value of the predicted scoring value and the initial scoring value is within an error threshold value range. According to the embodiment of the application, the scoring model is trained by combining the sample characteristics of the multimedia file sample, and for the newly online multimedia file, the quality scoring of the multimedia file can be realized under the condition of lacking of user behavior data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart illustrating steps of a scoring model training method according to an embodiment of the present disclosure;
Fig. 2 is a flowchart illustrating steps of a method for scoring an electronic book according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a scoring model training device according to an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic book scoring device according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic book device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Referring to fig. 1, a flowchart illustrating steps of a scoring model training method provided in an embodiment of the present application is shown, and as shown in fig. 1, the scoring model training method may specifically include the following steps:
Step 101: acquiring a learning sample and a test sample, wherein the learning sample and the test sample are multimedia file samples, and the test sample corresponds to an initial score value.
This embodiment can be applied to scenarios in which a scoring model for multimedia files is trained.
The learning sample refers to a sample used for training a preset model algorithm.
The test sample is a sample for testing a model file output by a preset model algorithm.
In this embodiment, the learning sample and the testing sample are multimedia file samples, such as a video file sample, an electronic book file sample, a picture file sample, and the like, and specifically, the learning sample and the testing sample may be determined according to business requirements, which is not limited in this embodiment.
In some examples, the learning samples and the testing samples may be obtained through the internet, for example, when the electronic book scoring model needs to be trained, electronic book files may be obtained from the internet as the learning samples and the testing samples.
In some examples, the learning samples and the testing samples may be obtained through a pre-set multimedia file database, for example, when the video scoring model needs to be trained, the learning samples and the testing samples may be obtained from a video file database.
It should be understood that the above examples are only examples for better understanding of the technical solutions of the embodiments of the present application, and are not to be taken as the only limitation to the embodiments.
The initial scoring value refers to a pre-set scoring value corresponding to the test sample, and after the test sample is obtained, a service worker can give a scoring value to the test sample according to the sample characteristics of the test sample to serve as the initial scoring value.
In this embodiment, the number of the learning samples and the number of the testing samples may be preset by a service person, for example, hundreds, thousands, tens of thousands, and the like, and of course, the number of the learning samples and the number of the testing samples may be the same or different, and specifically, may be determined according to a service requirement, which is not limited in this embodiment.
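Step 101 can be sketched as follows, assuming the samples are held in simple Python data classes. The names `Sample`, `initial_score`, and `split_samples`, and the choice of reserving the last samples for testing, are hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    """One multimedia file sample (hypothetical structure)."""
    file_id: str
    file_type: str                          # e.g. "ebook", "video", "picture"
    initial_score: Optional[float] = None   # preset by service personnel


def split_samples(samples: List[Sample], n_test: int) -> Tuple[List[Sample], List[Sample]]:
    """Reserve the last n_test samples for testing; the rest are learning samples."""
    if not 0 < n_test < len(samples):
        raise ValueError("n_test must leave at least one learning sample")
    learning, test = samples[:-n_test], samples[-n_test:]
    # Every test sample must carry an initial score value for the later comparison.
    if any(s.initial_score is None for s in test):
        raise ValueError("every test sample needs an initial score value")
    return learning, test


pool = [Sample(f"book-{i}", "ebook", initial_score=60.0 + i) for i in range(10)]
learning_samples, test_samples = split_samples(pool, n_test=3)
```

The numbers of learning and test samples need not be equal; here 7 learning samples and 3 test samples are drawn from the same pool purely for illustration.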
After the learning samples and the test samples are acquired, step 102 is performed.
Step 102: acquiring sample characteristics associated with the learning sample.
A sample feature is a feature associated with the learning sample. In this embodiment, the sample features may include at least one of: features of the sample's inherent dimension, features of the sample's state dimension, and features of the user behavior dimension. The process of acquiring the sample features associated with the learning sample is described in detail in the following specific implementation.
In a specific implementation manner of this embodiment, the step 102 may include:
Substep S1: extracting sample characteristics of the learning sample according to text information in the learning sample; and/or
Substep S2: acquiring sample characteristics of the learning sample according to the file state information corresponding to the learning sample; and/or
Substep S3: acquiring the sample characteristics of the learning sample according to the historical behavior data corresponding to the learning sample.
In this embodiment, the sample feature may be a feature extracted from the inherent text information of the learning sample. Taking an electronic book as the learning sample, the inherent text information is information about the electronic book itself, such as book classification, book tags, and book word count; that is, at least one of the classification, tags, and word count of the electronic book is used as a sample feature of the electronic book.
The sample feature may also be a feature obtained from the file state information corresponding to the learning sample. Taking an electronic book as the learning sample, the file state information may be state information such as whether the electronic book is still being serialized; that is, the state information of the electronic book is used as a sample feature of the electronic book.
The sample feature may also be a feature obtained from the historical behavior data of the learning sample. Taking an electronic book as the learning sample, the historical behavior data are data indexes computed from user behavior, such as access amount, reading amount, conversion rate, and comments.
Of course, the historical behavior features may also be extended along a time dimension, such as the number of visits within one day or within three days.
The sample features of the learning sample may be at least one of the above features; the feature dimensions actually used may be selected according to business requirements.
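The three feature dimensions described above can be sketched as a single feature vector. This is a minimal illustration under stated assumptions: the function name `extract_features`, the dictionary keys, the category encoding, and the choice to fill missing behavior data with zeros are all hypothetical and not taken from the application.

```python
from typing import Any, Dict, List

# Hypothetical integer encoding of book classifications.
CATEGORY_IDS = {"fantasy": 0, "romance": 1, "history": 2}


def extract_features(book: Dict[str, Any]) -> List[float]:
    """Concatenate inherent, state, and behavior features of one electronic book."""
    # Inherent (text) dimension: classification, number of tags, word count.
    inherent = [
        float(CATEGORY_IDS.get(book["category"], -1)),
        float(len(book["tags"])),
        float(book["word_count"]),
    ]
    # State dimension: whether the book is still being serialized.
    state = [1.0 if book["is_serializing"] else 0.0]
    # Behavior dimension, extended over time windows; zeros when no data exists (assumption).
    data = book.get("behavior", {})
    behavior = [
        float(data.get("visits_1d", 0)),
        float(data.get("visits_3d", 0)),
        float(data.get("conversion_rate", 0.0)),
    ]
    return inherent + state + behavior


new_book = {
    "category": "fantasy",
    "tags": ["magic", "adventure"],
    "word_count": 120000,
    "is_serializing": True,
}
features = extract_features(new_book)
```

Because the inherent and state features do not depend on user behavior, the vector is still informative for a book that has just gone online.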
After the sample features associated with the learning samples are obtained, step 103 is performed.
Step 103: inputting the sample characteristics into a preset model algorithm, and acquiring a target model file output by the preset model algorithm.
The preset model algorithm is an algorithm used to train a model for scoring multimedia files. In this embodiment, the preset model algorithm may be the GBDT (Gradient Boosting Decision Tree) algorithm, a typical ensemble learning algorithm. In the GBDT flow, two or more decision trees are trained in sequence on labeled samples, and the trained decision trees are then combined into one model, which serves as the training result.
The target model file is a model file output by a preset model algorithm after training by adopting a learning sample is completed.
After the sample features associated with the learning samples are obtained, they may be input to the preset model algorithm (the GBDT algorithm is taken as the example here). In the GBDT flow, once one decision tree has been trained, training continues with the next decision tree. Every decision tree after the first is fitted to the training residual left by all of the decision trees before it. Thus, as the GBDT flow advances, more and more decision trees are trained in sequence, and the training residual becomes smaller and smaller. When the training residual is small enough, the current model fits the sample features of the labeled samples well enough; training can then end, and the model file output by the GBDT algorithm is the target model file.
Of course, in this embodiment, a set number (e.g., hundreds, thousands, etc.) of learning samples may be obtained, and after the set number of learning samples are trained by using the GBDT algorithm, the target model file may be output.
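The residual-fitting flow described above can be illustrated with a deliberately small sketch. It assumes squared-error loss, depth-one trees (stumps), and a fixed learning rate, none of which are specified by the application; the names `fit_stump`, `train_gbdt`, and `predict`, the stopping tolerance, and the toy data are hypothetical. A production system would more likely use an existing gradient boosting library.

```python
from typing import List, Tuple

Stump = Tuple[int, float, float, float]   # (feature index, threshold, left value, right value)
Gbdt = Tuple[float, float, List[Stump]]   # (base score, learning rate, stumps)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single split that minimises squared error on the residuals."""
    mean_all = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean_all, mean_all)   # fallback: no split
    best_err = sum((r - mean_all) ** 2 for r in residuals)
    for j in range(len(X[0])):
        # Skipping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best, best_err = (j, t, lm, rm), err
    return best


def stump_predict(stump: Stump, row: List[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def train_gbdt(X: List[List[float]], y: List[float], n_trees: int = 50,
               learning_rate: float = 0.1, tol: float = 1e-3) -> Gbdt:
    """Train stumps in sequence, each one fitted to the residual of its predecessors."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        if max(abs(r) for r in residuals) < tol:   # residual small enough: stop
            break
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model: Gbdt, row: List[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Toy labeled samples: one feature, score grows with the feature value.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [10.0, 20.0, 30.0, 40.0]
model = train_gbdt(X, y, n_trees=100, learning_rate=0.5)
sse = sum((predict(model, row) - yi) ** 2 for row, yi in zip(X, y))
```

The returned tuple plays the role of the target model file in this sketch; `sse` is the remaining squared training error after boosting.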
After the sample features are input to the pre-set model algorithm and the target model file output by the pre-set model algorithm is obtained, step 104 is performed.
Step 104: inputting the sample characteristics corresponding to the test sample into the target model file, and acquiring the prediction score value corresponding to the test sample output by the target model file.
The prediction score value is the score value of the test sample obtained by processing the sample characteristics of the test sample by adopting the target model file.
After the target model file is obtained, the target model file can be tested by using the sample characteristics corresponding to the test sample. The specific test process is as follows: the sample characteristics corresponding to the test sample can be input into the target model file, the target model file correspondingly processes the sample characteristics of the test sample, and the prediction score value of the test sample can be output.
After obtaining the predicted score value corresponding to the test sample output by the target model file, step 105 is performed.
Step 105: taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold value range.
The error threshold range refers to a difference range preset by a service person for determining whether a difference between the initial score value and the predicted score value of the test sample satisfies a condition.
After the prediction score value of the test sample is obtained, a difference absolute value between the initial score value and the prediction score value corresponding to the test sample may be calculated, and then whether the difference absolute value is within the error threshold range may be determined.
In a specific implementation manner of this embodiment, the step 105 may include:
Sub-step M1: acquiring the absolute value of the difference between the prediction score value and the initial score value.
In this embodiment, the absolute value of the difference is an absolute value of a difference between the predicted score value and the initial score value, and the absolute value of the difference may be calculated after the predicted score value of the test sample is obtained.
After the absolute value of the difference is obtained, sub-step M2 is performed.
Sub-step M2: taking the target model file as the content scoring model when the absolute value of the difference is within the error threshold range.
After the absolute value of the difference is obtained, it may be determined whether it is within the error threshold range. When it is, the target model file is used as the content scoring model. For example, if the error threshold range is 5-10 and the absolute value of the difference is between 5 and 10, the target model file output by the preset model algorithm is used as the content scoring model.
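Steps 104 and 105 can be sketched as follows, assuming the target model file is available as a callable and the error threshold range is a closed interval with configurable bounds. Requiring every test sample to fall within the range is an assumption of this sketch, since the application does not say how several test samples are combined. The names `abs_differences`, `accept_model`, and `toy_model` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def abs_differences(model: Model, test_features: List[List[float]],
                    initial_scores: List[float]) -> List[float]:
    """Sub-step M1: absolute difference between predicted and initial score values."""
    return [abs(model(f) - s) for f, s in zip(test_features, initial_scores)]


def accept_model(diffs: List[float], error_range: Tuple[float, float]) -> bool:
    """Sub-step M2: accept only if every difference lies within the error threshold range."""
    low, high = error_range
    return all(low <= d <= high for d in diffs)


def toy_model(features: Sequence[float]) -> float:
    # Hypothetical stand-in for the target model file.
    return 50.0 + 10.0 * features[0]


test_features = [[1.0], [2.0], [3.0]]
initial_scores = [62.0, 68.0, 81.0]
diffs = abs_differences(toy_model, test_features, initial_scores)
is_final = accept_model(diffs, error_range=(0.0, 5.0))
```

With these toy values the differences are 2, 2, and 1, so the stand-in model is accepted for the range 0 to 5 and would be rejected for a tighter range such as 0 to 1.5.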
When the difference between the predicted score value and the initial score value is outside the error threshold range, the preset model algorithm may be adjusted by using a preset adjustment strategy and training may continue, as described below.
In a specific implementation manner of the present application, after the step 105, the method may further include:
Step N1: under the condition that the difference value between the predicted score value and the initial score value is out of the error threshold range, adjusting the preset model algorithm according to a preset adjustment strategy to obtain an adjusted model algorithm.
In this embodiment, the adjusted model algorithm is the model algorithm obtained after the preset model algorithm is adjusted according to the preset adjustment strategy.
The preset adjusting strategy can comprise one or more of an algorithm parameter adjusting strategy, a sample characteristic adjusting strategy, a sample quantity adjusting strategy and the like.
The algorithm parameter adjustment strategy is a strategy for adjusting the algorithm parameters of the preset model algorithm. For example, when the preset model algorithm includes multiple mathematical algorithms, the algorithm parameter adjustment strategy may adjust the algorithm parameters of one of those mathematical algorithms, or the algorithm parameters of several of them.
The sample feature adjustment strategy is a strategy for adjusting the sample features of the learning samples. For example, in the process of training the content scoring model by the preset model algorithm, the sample features of the learning samples include features of three dimensions, and after the sample features of the learning samples are adjusted according to the sample feature adjustment strategy, the obtained sample features may be sample features of two or one dimension, or sample features of four or five dimensions.
The sample number adjustment policy is a policy for adjusting the number of learning samples, for example, when the number of learning samples used in the process of training the content scoring model by using the preset model algorithm is 3000, and when the difference between the obtained prediction score value and the initial score value is outside the error threshold range by using the test sample for testing, the number of learning samples may be adjusted to train the preset model algorithm again, for example, the number of adjusted learning samples is 2000, 800, and the like.
It should be understood that the above examples are only examples for better understanding of the technical solutions of the embodiments of the present application, and are not to be taken as the only limitation to the embodiments.
When the difference value between the predicted score value and the initial score value is out of the error threshold range, the preset model algorithm can be adjusted according to a preset adjusting strategy, and therefore the adjusted model algorithm can be obtained.
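The three adjustment strategies can be sketched as functions that each return a modified training configuration. This is an illustration only: the class `TrainingConfig`, its fields, and the concrete adjustments (doubling the number of trees, dropping the last feature dimension) are hypothetical choices, while the sample counts 3000 and 2000 follow the example above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical bundle of everything a preset adjustment strategy may change."""
    n_trees: int = 100
    learning_rate: float = 0.1
    feature_dims: Tuple[str, ...] = ("inherent", "state", "behavior")
    n_learning_samples: int = 3000


def adjust_algorithm_params(cfg: TrainingConfig) -> TrainingConfig:
    """Algorithm parameter adjustment: here, train twice as many trees."""
    return replace(cfg, n_trees=cfg.n_trees * 2)


def adjust_sample_features(cfg: TrainingConfig) -> TrainingConfig:
    """Sample feature adjustment: here, drop the last feature dimension."""
    if len(cfg.feature_dims) <= 1:
        return cfg
    return replace(cfg, feature_dims=cfg.feature_dims[:-1])


def adjust_sample_number(cfg: TrainingConfig, new_count: int) -> TrainingConfig:
    """Sample number adjustment: change how many learning samples are used."""
    return replace(cfg, n_learning_samples=new_count)


base = TrainingConfig()
adjusted = adjust_sample_number(adjust_sample_features(base), 2000)
```

Keeping the configuration immutable means the preset model algorithm's original settings remain available for comparison after one or more strategies have been applied.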
After the adjusted model algorithm is obtained, step N2 is performed.
Step N2: training the adjusted model algorithm according to the obtained reference sample.
The reference sample is a sample used for retraining the adjusted model algorithm and is also a multimedia file sample. The reference sample has the same file type as the learning sample and the test sample: for example, when the learning sample is an electronic book sample, the reference sample is an electronic book sample, and when the learning sample is a music file sample, the reference sample is a music file sample.
After the preset model algorithm is adjusted to obtain the adjusted model algorithm, a reference sample may be obtained, and the adjusted model algorithm may then be trained according to the reference sample. The training process may be the same as the process of training the preset model algorithm by using the learning sample, and is not described herein again.
After the adjusted model algorithm is trained by using the reference sample, the target model file output by the adjusted model algorithm can be obtained, and the test sample can then be used again to test this newly output target model file. The test process is similar to the test process described above.
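Steps N1 and N2 can be illustrated with a minimal sketch. The embodiment does not name a concrete model algorithm, so a small k-nearest-neighbour regressor is used here purely as a stand-in. The names `Config`, `train_with_adjustment` and the three `adjust_*` functions are hypothetical, as are the concrete adjustments (raising `k`, dropping one feature dimension, halving the sample count) and the choice to try one strategy per retraining round.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], float]          # (sample features, score value)
Model = List[Tuple[Tuple[float, ...], float]]   # stored samples of the stand-in k-NN


@dataclass(frozen=True)
class Config:
    """Hypothetical bundle of what the three adjustment strategies may change."""
    k: int = 1                # algorithm parameter of the stand-in k-NN regressor
    feature_dims: int = 3     # number of leading feature dimensions used
    sample_count: int = 3000  # number of samples used for training


Strategy = Callable[[Config], Config]


def adjust_algorithm_parameter(cfg: Config) -> Config:
    return replace(cfg, k=cfg.k + 2)


def adjust_sample_features(cfg: Config) -> Config:
    return replace(cfg, feature_dims=max(1, cfg.feature_dims - 1))


def adjust_sample_number(cfg: Config) -> Config:
    return replace(cfg, sample_count=max(1, cfg.sample_count // 2))


def train(samples: Sequence[Sample], cfg: Config) -> Model:
    """Training a k-NN regressor only stores the selected, truncated samples."""
    return [(tuple(x[:cfg.feature_dims]), y) for x, y in samples[:cfg.sample_count]]


def predict(model: Model, features: Sequence[float], cfg: Config) -> float:
    """Mean score of the k stored samples closest to the given features."""
    x = features[:cfg.feature_dims]
    nearest = sorted(model, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    top = nearest[:cfg.k]
    return sum(y for _, y in top) / len(top)


def within_threshold(model: Model, tests: Sequence[Sample],
                     cfg: Config, threshold: float) -> bool:
    # Assumption: every test sample must fall inside the error threshold range.
    return all(abs(predict(model, x, cfg) - y) <= threshold for x, y in tests)


def train_with_adjustment(learning: Sequence[Sample], reference: Sequence[Sample],
                          tests: Sequence[Sample], cfg: Config,
                          strategies: Sequence[Strategy],
                          threshold: float) -> Optional[Tuple[Model, Config]]:
    """Train, test, and on failure adjust (N1) and retrain on the reference sample (N2)."""
    model = train(learning, cfg)
    if within_threshold(model, tests, cfg, threshold):
        return model, cfg
    for strategy in strategies:
        cfg = strategy(cfg)            # step N1: obtain the adjusted model algorithm
        model = train(reference, cfg)  # step N2: retrain with the reference sample
        if within_threshold(model, tests, cfg, threshold):
            return model, cfg
    return None  # no listed strategy brought the error inside the threshold range
```

In this sketch a return value of `None` simply signals that the listed strategies were exhausted; how an actual implementation proceeds in that case is not specified by the embodiment.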
The sample characteristics of the learning sample defined in this embodiment are not limited to the characteristics of the user behavior data, and the scoring model may be trained by combining the inherent characteristics and the state characteristics of the learning sample, so that the trained content scoring model may perform quality scoring on a newly online multimedia file lacking user behavior data.
The scoring model training method provided by the embodiment of the application comprises the steps of obtaining a learning sample and a testing sample, wherein the learning sample and the testing sample are multimedia file samples, the testing sample corresponds to an initial scoring value, obtaining sample characteristics related to the learning sample, inputting the sample characteristics to a preset model algorithm, obtaining a target model file output by the preset model algorithm, inputting the sample characteristics corresponding to the testing sample to the target model file, obtaining a predicted scoring value corresponding to the testing sample output by the target model, and taking the target model file as a final content scoring model under the condition that the difference value of the predicted scoring value and the initial scoring value is within an error threshold value range. According to the embodiment of the application, the scoring model is trained by combining the sample characteristics of the multimedia file sample, and for the newly online multimedia file, the quality scoring of the multimedia file can be realized under the condition of lacking of user behavior data.
Referring to fig. 2, a flowchart illustrating steps of an electronic book scoring method provided in an embodiment of the present application is shown, and as shown in fig. 2, the electronic book scoring method may specifically include the following steps:
Step 201: acquiring the electronic book to be evaluated.
The embodiments of the present application can be applied to scenarios in which an electronic book is scored by using a content scoring model.
The electronic book to be scored (also referred to herein as the electronic book to be evaluated) is the electronic book whose quality is to be scored.
In some examples, the electronic book to be scored may be an electronic book that is newly brought online on a reading platform, e.g., after a new book is brought online on the xx reading platform, the new book may be regarded as the electronic book to be scored.
In some examples, the electronic book to be scored may be an electronic book that has been online on a reading platform for a certain period of time and is found by searching the platform. For example, when the xx reading platform needs to evaluate a certain type of electronic book published on the platform, a search may be performed on the platform, and the electronic books of that type found by the search are taken as the electronic books to be scored.
It should be understood that the above examples are only examples for better understanding of the technical solutions of the embodiments of the present application, and are not to be taken as the only limitation of the embodiments of the present application.
Step 202: acquiring the electronic book features of the electronic book to be evaluated.
The electronic book features refer to book features associated with the electronic book to be evaluated. In this embodiment, the electronic book features may include features of at least one dimension, such as features of an inherent dimension of the electronic book, features of a state dimension of the electronic book, and features of a user behavior dimension. The process of obtaining the electronic book features of the electronic book to be evaluated is described in detail in the following specific implementation manner.
In a specific implementation manner of the present application, the step 202 may include:
sub-step P1: extracting the electronic book features of the electronic book to be evaluated according to the text information in the electronic book to be evaluated; and/or
Sub-step P2: acquiring the electronic book characteristics of the electronic book to be evaluated according to the file state information corresponding to the electronic book to be evaluated; and/or
Sub-step P3: and acquiring the electronic book characteristics of the electronic book to be evaluated according to the historical behavior data corresponding to the electronic book to be evaluated.
In this embodiment, the electronic book features may be electronic book features extracted according to inherent text information in the electronic book to be evaluated, where the inherent text information refers to information of the electronic book to be evaluated, such as book classification, book tag, book word count, book author, and the like, that is, at least one of the features of the book classification, the book tag, the book word count, the book author, and the like is used as the electronic book features.
The electronic book features may also be features of the electronic book to be evaluated that are obtained according to the file state information corresponding to the electronic book to be evaluated, for example, state features such as whether the electronic book to be evaluated is still being serialized.
The electronic book features may be obtained according to historical behavior data of the electronic book to be evaluated, where the historical behavior data refers to data indexes counted according to user behaviors, such as data indexes of access amount, reading amount, conversion rate, comments and the like.
Of course, the historical behavior data features may also be extended according to a time dimension, such as the number of visits in a day or three days.
The electronic book features may be one or more of the above features. Specifically, the dimensions of the electronic book features of the electronic book to be evaluated may be selected according to business requirements, which is not limited in this embodiment.
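The three feature sources of sub-steps P1 to P3 can be sketched as follows. The record fields (`category`, `tags`, `word_count`, `author`, `is_serialized`, `daily_visits`), the function names, and the definition of a time window as "the last N days including today" are all assumptions made for illustration; the embodiment only names the kinds of information involved.

```python
from datetime import date, timedelta
from typing import Any, Dict, Mapping, Sequence


def inherent_features(book: Mapping[str, Any]) -> Dict[str, Any]:
    """Sub-step P1: features taken from the book's inherent text information."""
    return {
        "category": book["category"],
        "tag_count": len(book.get("tags", [])),
        "word_count": book["word_count"],
        "author": book["author"],
    }


def state_features(book: Mapping[str, Any]) -> Dict[str, Any]:
    """Sub-step P2: features taken from the file state information."""
    return {"is_serialized": int(bool(book.get("is_serialized", False)))}


def behavior_features(
    book: Mapping[str, Any], today: date, windows: Sequence[int] = (1, 3)
) -> Dict[str, Any]:
    """Sub-step P3: visit counts summed over the last N days, today included."""
    daily_visits = book.get("daily_visits", {})
    return {
        f"visits_{days}d": sum(
            daily_visits.get(today - timedelta(days=offset), 0)
            for offset in range(days)
        )
        for days in windows
    }


def extract_features(
    book: Mapping[str, Any],
    today: date,
    dimensions: Sequence[str] = ("inherent", "state", "behavior"),
) -> Dict[str, Any]:
    """Merge only the feature dimensions selected by the business requirement."""
    features: Dict[str, Any] = {}
    if "inherent" in dimensions:
        features.update(inherent_features(book))
    if "state" in dimensions:
        features.update(state_features(book))
    if "behavior" in dimensions:
        features.update(behavior_features(book, today))
    return features
```

For a newly online book with no behavior data, a caller could pass `dimensions=("inherent", "state")` so that only the first two dimensions are used.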
After the e-book features of the e-book to be evaluated are obtained, step 203 is executed.
Step 203: inputting the electronic book features into a content scoring model, and acquiring the content score of the electronic book to be scored that is output by the content scoring model.
In this embodiment, the content scoring model is obtained by training using the scoring model training method described above.
The content score is the quality score obtained by scoring the electronic book to be scored with the content scoring model.
After the electronic book features are obtained, the electronic book features may be input to the content scoring model, and the content scoring model may process the electronic book features to output the content score of the electronic book to be scored.
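Steps 202 and 203 can be sketched as a small scoring function. The embodiment does not disclose the internal form of the content scoring model, so `LinearScoringModel` below is only a toy stand-in, and the names `ContentScoringModel`, `score_ebook` and the feature keys are hypothetical. Treating a missing feature as 0 is likewise an assumption about how a book without behavior data might be handled.

```python
from typing import Any, Callable, Mapping, Protocol


class ContentScoringModel(Protocol):
    """Anything that maps electronic book features to a content score."""

    def predict(self, features: Mapping[str, float]) -> float: ...


class LinearScoringModel:
    """Toy stand-in for a trained model: a weighted sum of numeric features."""

    def __init__(self, weights: Mapping[str, float], bias: float = 0.0) -> None:
        self.weights = dict(weights)
        self.bias = bias

    def predict(self, features: Mapping[str, float]) -> float:
        # Assumption: a feature that is absent (e.g. no behavior data yet)
        # contributes 0 to the score.
        return self.bias + sum(
            weight * features.get(name, 0.0) for name, weight in self.weights.items()
        )


def score_ebook(
    book: Any,
    extract_features: Callable[[Any], Mapping[str, float]],
    model: ContentScoringModel,
) -> float:
    """Step 202 (feature acquisition) followed by step 203 (model scoring)."""
    features = extract_features(book)
    return model.predict(features)
```

Because the model is passed in as a parameter, the same function serves both a long-online book with behavior features and a newly online book that has only inherent and state features.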
According to the embodiment of the application, quality scoring can be performed not only on electronic books that have been online for a long time, but also on newly online electronic books and on electronic books that have no user behavior data, which broadens the application scenarios of electronic book quality scoring.
According to the electronic book scoring method provided by the embodiment of the application, the electronic book to be scored is acquired, the electronic book features of the electronic book to be scored are acquired, the electronic book features are input into the content scoring model, and the content score of the electronic book to be scored output by the content scoring model is acquired. The content scoring model provided by the embodiment of the application is not limited to being trained on user behavior data of electronic books; it can also be trained on the inherent features of electronic books, so that the quality of newly online electronic books can be scored.
Referring to fig. 3, a schematic structural diagram of a scoring model training device provided in an embodiment of the present application is shown, and as shown in fig. 3, the scoring model training device may include the following modules:
a test sample obtaining module 310, configured to obtain a learning sample and a test sample, where the learning sample and the test sample are both multimedia file samples, and the test sample corresponds to an initial score value;
a sample feature obtaining module 320, configured to obtain a sample feature associated with the learning sample;
a target model file obtaining module 330, configured to input the sample characteristics to a preset model algorithm, and obtain a target model file output by the preset model algorithm;
a prediction score value obtaining module 340, configured to input sample characteristics corresponding to the test sample into the target model file, and obtain a prediction score value corresponding to the test sample output by the target model file;
a content scoring model obtaining module 350, configured to take the target model file as a final content scoring model when a difference between the predicted scoring value and the initial scoring value is within an error threshold range.
Optionally, the sample feature obtaining module 320 includes:
the first sample characteristic acquisition unit is used for extracting sample characteristics of the learning sample according to the text information in the learning sample;
the second sample characteristic acquisition unit is used for acquiring the sample characteristics of the learning sample according to the file state information corresponding to the learning sample;
and the third sample characteristic acquisition unit is used for acquiring the sample characteristics of the learning sample according to the historical behavior data corresponding to the learning sample.
Optionally, the content scoring model obtaining module 350 includes:
a difference absolute value obtaining unit configured to obtain a difference absolute value between the prediction score value and the initial score value;
and the content scoring model obtaining unit is used for taking the target model file as the content scoring model under the condition that the absolute value of the difference value is within the error threshold range.
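The two units above amount to a simple acceptance check, sketched below. The function names are hypothetical. The sketch assumes that the error threshold range is the closed interval from 0 to `error_threshold` and that every test sample must fall inside it; the embodiment does not state how multiple test samples are aggregated.

```python
from typing import Sequence


def absolute_difference(predicted: float, initial: float) -> float:
    """Difference absolute value between a prediction score and an initial score."""
    return abs(predicted - initial)


def accept_target_model(
    predicted_scores: Sequence[float],
    initial_scores: Sequence[float],
    error_threshold: float,
) -> bool:
    """Return True if the target model file may be taken as the content scoring model."""
    if len(predicted_scores) != len(initial_scores):
        raise ValueError("one predicted score is needed per test sample")
    return all(
        absolute_difference(p, i) <= error_threshold
        for p, i in zip(predicted_scores, initial_scores)
    )
```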
Optionally, the device further comprises:
the adjustment model algorithm obtaining module is used for adjusting the preset model algorithm according to a preset adjustment strategy under the condition that the difference value between the prediction score value and the initial score value is out of an error threshold range to obtain an adjustment model algorithm;
the adjusting model algorithm training module is used for training the adjusting model algorithm according to the obtained reference sample;
the preset adjustment strategy comprises the following steps: at least one of an algorithm parameter adjustment strategy, a sample characteristic adjustment strategy and a sample number adjustment strategy.
The scoring model training device provided by the embodiment of the application obtains a learning sample and a testing sample, wherein the learning sample and the testing sample are multimedia file samples, the testing sample corresponds to an initial scoring value, sample characteristics related to the learning sample are obtained, the sample characteristics are input into a preset model algorithm, a target model file output by the preset model algorithm is obtained, the sample characteristics corresponding to the testing sample are input into the target model file, a predicted scoring value corresponding to the testing sample output by the target model is obtained, and the target model file is used as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold range. According to the embodiment of the application, the scoring model is trained by combining the sample characteristics of the multimedia file sample, and for the newly online multimedia file, the quality scoring of the multimedia file can be realized under the condition of lacking of user behavior data.
Referring to fig. 4, a schematic structural diagram of an electronic book scoring apparatus provided in an embodiment of the present application is shown, and as shown in fig. 4, the electronic book scoring apparatus may include the following modules:
an electronic book to be evaluated obtaining module 410, configured to obtain an electronic book to be evaluated;
an electronic book feature obtaining module 420, configured to obtain an electronic book feature of the electronic book to be evaluated;
a content score obtaining module 430, configured to input the e-book features into a content score model, and obtain a content score of the e-book to be scored, which is output by the content score model;
wherein, the content scoring model is obtained by training through any one of the scoring model training devices.
Optionally, the electronic book feature obtaining module 420 includes:
the first electronic book feature acquisition unit is used for extracting the electronic book features of the electronic book to be evaluated according to the text information in the electronic book to be evaluated;
the second electronic book feature acquisition unit is used for acquiring the electronic book features of the electronic book to be evaluated according to the file state information corresponding to the electronic book to be evaluated;
and the third electronic book characteristic acquisition unit is used for acquiring the electronic book characteristics of the electronic book to be evaluated according to the historical behavior data corresponding to the electronic book to be evaluated.
The electronic book scoring device provided by the embodiment of the application acquires the electronic book to be scored, acquires the electronic book features of the electronic book to be scored, inputs the electronic book features into the content scoring model, and acquires the content score of the electronic book to be scored output by the content scoring model. The content scoring model provided by the embodiment of the application is not limited to being trained on user behavior data of electronic books; it can also be trained on the inherent features of electronic books, so that the quality of newly online electronic books can be scored.
The embodiment of the present application further provides an electronic device, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 complete mutual communication through the communication bus 504,
a memory 503 for storing a computer program;
the processor 501, when executing the program stored in the memory 503, implements the following steps:
acquiring a learning sample and a testing sample, wherein the learning sample and the testing sample are multimedia file samples, and the testing sample corresponds to an initial score value;
acquiring sample characteristics associated with the learning sample;
inputting the sample characteristics into a preset model algorithm, and acquiring a target model file output by the preset model algorithm;
inputting the sample characteristics corresponding to the test sample into the target model file, and acquiring the prediction score value corresponding to the test sample output by the target model file;
and taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold value range.
Optionally, the obtaining of the sample feature associated with the learning sample includes:
extracting sample characteristics of the learning sample according to text information in the learning sample; and/or
Acquiring sample characteristics of the learning sample according to the file state information corresponding to the learning sample; and/or
And acquiring the sample characteristics of the learning sample according to the historical behavior data corresponding to the learning sample.
Optionally, the taking the target model file as a final content scoring model in a case that a difference between the predicted score value and the initial score value is within an error threshold range includes:
acquiring a difference absolute value between the prediction score value and the initial score value;
and taking the target model file as the content scoring model when the absolute value of the difference value is within the error threshold range.
Optionally, after the testing the target model file by using the test sample to obtain the prediction score value corresponding to the test sample, the method further includes:
under the condition that the difference value between the predicted score value and the initial score value is out of the error threshold range, adjusting the preset model algorithm according to a preset adjusting strategy to obtain an adjusted model algorithm;
training the adjustment model algorithm according to the obtained reference sample;
the preset adjustment strategy comprises the following steps: at least one of an algorithm parameter adjustment strategy, a sample characteristic adjustment strategy and a sample number adjustment strategy.
Alternatively, the processor 501, when executing the program stored in the memory 503, implements the following steps:
acquiring an electronic book to be evaluated;
acquiring the electronic book characteristics of the electronic book to be evaluated;
inputting the electronic book features into a content scoring model, and acquiring the content scores of the electronic books to be scored, which are output by the content scoring model;
wherein, the content scoring model is obtained by training through any one of the scoring model training methods.
Optionally, the obtaining the electronic book features of the electronic book to be evaluated includes:
extracting the electronic book features of the electronic book to be evaluated according to the text information in the electronic book to be evaluated; and/or
Acquiring the electronic book characteristics of the electronic book to be evaluated according to the file state information corresponding to the electronic book to be evaluated; and/or
And acquiring the electronic book characteristics of the electronic book to be evaluated according to the historical behavior data corresponding to the electronic book to be evaluated.
The communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided by the present application, there is also provided a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the above scoring model training method, or the above electronic book scoring method.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the above scoring model training method, or the above electronic book scoring method.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in an interrelated manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described briefly, and for the relevant points reference may be made to the corresponding description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A scoring model training method is characterized by comprising the following steps:
acquiring a learning sample and a testing sample, wherein the learning sample and the testing sample are multimedia file samples, and the testing sample corresponds to an initial score value;
acquiring sample characteristics associated with the learning sample;
inputting the sample characteristics into a preset model algorithm, and acquiring a target model file output by the preset model algorithm;
inputting the sample characteristics corresponding to the test sample into the target model file, and acquiring the prediction score value corresponding to the test sample output by the target model file;
and taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold value range.
2. The method of claim 1, wherein obtaining the sample features associated with the learning samples comprises:
extracting sample characteristics of the learning sample according to text information in the learning sample; and/or
Acquiring sample characteristics of the learning sample according to the file state information corresponding to the learning sample; and/or
And acquiring the sample characteristics of the learning sample according to the historical behavior data corresponding to the learning sample.
3. The method of claim 1, wherein the step of using the target model file as a final content scoring model in the case that the difference between the predicted score value and the initial score value is within an error threshold range comprises:
acquiring a difference absolute value between the prediction score value and the initial score value;
and taking the target model file as the content scoring model when the absolute value of the difference value is within the error threshold range.
4. The method according to claim 1, wherein after the testing the target model file by using the test sample to obtain the prediction score value corresponding to the test sample, the method further comprises:
under the condition that the difference value between the predicted score value and the initial score value is out of the error threshold range, adjusting the preset model algorithm according to a preset adjusting strategy to obtain an adjusted model algorithm;
training the adjustment model algorithm according to the obtained reference sample;
the preset adjustment strategy comprises the following steps: at least one of an algorithm parameter adjustment strategy, a sample characteristic adjustment strategy and a sample number adjustment strategy.
5. An electronic book scoring method, comprising:
acquiring an electronic book to be evaluated;
acquiring the electronic book characteristics of the electronic book to be evaluated;
inputting the electronic book features into a content scoring model, and acquiring the content scores of the electronic books to be scored, which are output by the content scoring model;
wherein the content scoring model is trained by the scoring model training method according to any one of claims 1 to 4.
6. The method according to claim 5, wherein the obtaining the electronic book features of the electronic book to be evaluated comprises:
extracting the electronic book features of the electronic book to be evaluated according to the text information in the electronic book to be evaluated; and/or
Acquiring the electronic book characteristics of the electronic book to be evaluated according to the file state information corresponding to the electronic book to be evaluated; and/or
And acquiring the electronic book characteristics of the electronic book to be evaluated according to the historical behavior data corresponding to the electronic book to be evaluated.
7. A scoring model training device, comprising:
a test sample acquisition module, configured to acquire a learning sample and a test sample, wherein the learning sample and the test sample are multimedia file samples, and the test sample corresponds to an initial score value;
the sample characteristic acquisition module is used for acquiring sample characteristics related to the learning sample;
the target model file acquisition module is used for inputting the sample characteristics into a preset model algorithm and acquiring a target model file output by the preset model algorithm;
the prediction score value obtaining module is used for inputting the sample characteristics corresponding to the test sample into the target model file and obtaining the prediction score value corresponding to the test sample output by the target model file;
and the content scoring model acquisition module is used for taking the target model file as a final content scoring model under the condition that the difference value between the predicted scoring value and the initial scoring value is within an error threshold range.
8. An electronic book scoring device, comprising:
the electronic book to be evaluated acquisition module is used for acquiring the electronic book to be evaluated;
the electronic book feature acquisition module is used for acquiring the electronic book features of the electronic book to be evaluated;
the content scoring acquisition module is used for inputting the electronic book characteristics to a content scoring model and acquiring the content scoring of the electronic book to be scored, which is output by the content scoring model;
wherein the content scoring model is trained by the scoring model training device according to any one of claims 7 to 10.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the scoring model training method of any one of claims 1-4 or the electronic book scoring method of claims 5-6 when executing the program stored in the memory.
10. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the scoring model training method according to any one of claims 1 to 4, or the electronic book scoring method according to claims 5 to 6.
CN202010550244.2A 2020-06-16 2020-06-16 Scoring model training method, electronic book scoring method and device Pending CN111737554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010550244.2A CN111737554A (en) 2020-06-16 2020-06-16 Scoring model training method, electronic book scoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010550244.2A CN111737554A (en) 2020-06-16 2020-06-16 Scoring model training method, electronic book scoring method and device

Publications (1)

Publication Number Publication Date
CN111737554A true CN111737554A (en) 2020-10-02

Family

ID=72649921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010550244.2A Pending CN111737554A (en) 2020-06-16 2020-06-16 Scoring model training method, electronic book scoring method and device

Country Status (1)

Country Link
CN (1) CN111737554A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536769A (en) * 2021-07-21 2021-10-22 深圳证券信息有限公司 Text conciseness and clarity evaluation method and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522950A (en) * 2018-11-09 2019-03-26 网易传媒科技(北京)有限公司 Image Rating Model training method and device and image methods of marking and device
CN109871858A (en) * 2017-12-05 2019-06-11 北京京东尚科信息技术有限公司 Prediction model foundation, object recommendation method and system, equipment and storage medium
CN110866119A (en) * 2019-11-14 2020-03-06 腾讯科技(深圳)有限公司 Article quality determination method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109657137B (en) Public opinion news classification model construction method, device, computer equipment and storage medium
CN109189990B (en) Search word generation method and device and electronic equipment
CN110704626A (en) Short text classification method and device
WO2020020287A1 (en) Text similarity acquisition method, apparatus, device, and readable storage medium
CN112669078A (en) Behavior prediction model training method, device, equipment and storage medium
CN114565196B (en) Multi-event trend prejudging method, device, equipment and medium based on government affair hotline
CN110689211A (en) Method and device for evaluating website service capability
CN111639696A (en) User classification method and device
CN111640099A (en) Method and device for determining image quality, electronic equipment and storage medium
CN108021713B (en) Document clustering method and device
CN111737554A (en) Scoring model training method, electronic book scoring method and device
CN112199500A (en) Emotional tendency identification method and device for comments and electronic equipment
CN113516251A (en) Machine learning system and model training method
CN110837732B (en) Method and device for identifying intimacy between target persons, electronic equipment and storage medium
CN112163415A (en) User intention identification method and device for feedback content and electronic equipment
CN113076487B (en) User interest characterization and content recommendation method, device and equipment
CN110309421B (en) UGC content quality evaluation method and device and electronic equipment
CN110674330B (en) Expression management method and device, electronic equipment and storage medium
CN111353052B (en) Multimedia object recommendation method and device, electronic equipment and storage medium
CN113656575A (en) Training data generation method and device, electronic equipment and readable medium
CN113536138A (en) Network resource recommendation method and device, electronic equipment and readable storage medium
CN113961811A (en) Conversational recommendation method, device, equipment and medium based on event map
CN113704452A (en) Data recommendation method, device, equipment and medium based on Bert model
CN112784032A (en) Conversation corpus recommendation evaluation method and device, storage medium and electronic equipment
CN116823407B (en) Product information pushing method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination