CN114428837A - Content quality evaluation method, device, medium and electronic equipment - Google Patents
- Publication number
- CN114428837A (application number CN202111671403.5A)
- Authority
- CN
- China
- Prior art keywords
- evaluation
- text
- target
- content
- determining
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to a content quality evaluation method, apparatus, medium, and electronic device. The method includes: obtaining a first evaluation result and an evaluation text of a target user for target content; determining a second evaluation result of the target content according to the evaluation text; determining a matching degree between the target user and the target content; and determining a quality evaluation result of the target user for the target content according to the first evaluation result, the second evaluation result, and the matching degree. Determining the quality evaluation result from these three inputs takes into account both the evaluation score and the comment text entered by the user, so that the determined quality evaluation result stays consistent with the evaluation text entered by the user. In addition, using the matching degree between the user and the content can reduce the user's subjective influence to a certain extent, which helps ensure the accuracy and objectivity of the quality evaluation result and improves how well the quality evaluation result matches the target content.
Description
Technical Field
The present disclosure relates to the field of data processing, and in particular to a content quality evaluation method, apparatus, medium, and electronic device.
Background
Text sentiment analysis refers to the process of analyzing, processing, and extracting subjective text that carries emotional coloring, using methods such as natural language processing, text mining, and computational linguistics.
In content quality evaluation systems for movies, TV series, and similar content, viewers provide their own evaluation text and evaluation score for a piece of content. These are used to evaluate the quality of the content and to recommend content to other users. In this approach, scores are usually given on a five-point scale, and a user entering a score directly can easily give full marks even when the accompanying evaluation text is not entirely positive. In other words, the score and the evaluation text entered by a user often do not match, so the evaluation scores of many pieces of content tend to be inflated and cannot give users an accurate reference for choosing content.
Summary of the Invention
The purpose of the present disclosure is to provide an accurate and objective content quality evaluation method, apparatus, medium, and electronic device.
To achieve the above purpose, according to a first aspect of the present disclosure, a content quality evaluation method is provided, the method including:
obtaining a first evaluation result and an evaluation text of a target user for target content;
determining a second evaluation result of the target content according to the evaluation text;
determining a matching degree between the target user and the target content; and
determining a quality evaluation result of the target user for the target content according to the first evaluation result, the second evaluation result, and the matching degree.
Optionally, determining the matching degree between the target user and the target content includes:
determining a user vector corresponding to the target user based on historical evaluation texts of the target user;
determining a content vector corresponding to the target content based on a plurality of evaluation texts corresponding to the target content; and
determining the matching degree according to the user vector and the content vector.
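The passage above does not state how the matching degree is computed from the two vectors. As a minimal sketch, assuming cosine similarity is an acceptable measure (this choice and the function name `matching_degree` are illustrative assumptions, not taken from the disclosure), the step could look like this:

```python
import math


def matching_degree(user_vector, content_vector):
    """Cosine similarity between a user vector and a content vector.

    Assumption: cosine similarity stands in for the matching degree; the
    disclosure only says the degree is determined from the two vectors.
    """
    if len(user_vector) != len(content_vector):
        raise ValueError("vectors must have the same dimension")
    dot = sum(u * c for u, c in zip(user_vector, content_vector))
    norm_u = math.sqrt(sum(u * u for u in user_vector))
    norm_c = math.sqrt(sum(c * c for c in content_vector))
    if norm_u == 0.0 or norm_c == 0.0:
        # A zero vector carries no direction, so treat it as no match.
        return 0.0
    return dot / (norm_u * norm_c)
```

With non-negative vectors the result lies in [0, 1], which is convenient if the matching degree is later used as a scaling factor.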
Optionally, determining the user vector corresponding to the target user based on the historical evaluation texts of the target user includes:
clustering the historical evaluation texts to obtain a plurality of clusters corresponding to the historical evaluation texts;
for each cluster, splicing the historical evaluation texts in the cluster whose text length is less than a preset length threshold, to obtain at least one spliced text; and
determining subject words corresponding to the target user based on the spliced texts in each cluster, the historical evaluation texts whose text length is not less than the length threshold, and a topic generation model, and determining the user vector based on the vectors corresponding to the subject words.
Optionally, determining the content vector corresponding to the target content based on the plurality of evaluation texts corresponding to the target content includes:
determining a term frequency and an inverse document frequency for each word segment in the plurality of evaluation texts corresponding to the target content, and a text length ratio corresponding to the word segment, where the text length ratio corresponding to a word segment is the ratio of the length of the evaluation text to which the word segment belongs to the average text length of the plurality of evaluation texts;
for each word segment, determining the product of the term frequency, the inverse document frequency, and the text length ratio corresponding to the word segment as a target parameter corresponding to the word segment; and
determining target word segments corresponding to the target content according to the target parameter of each word segment, and determining the content vector based on the vectors corresponding to the target word segments.
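The target parameter described above (term frequency times inverse document frequency times text length ratio) can be sketched as follows. The concrete TF and IDF definitions used here (relative frequency within a text, and an unsmoothed logarithm of the document ratio), the use of token count as text length, and the rule of keeping the highest value when a word occurs in several texts are all assumptions made for illustration; the function names are hypothetical.

```python
import math
from collections import Counter


def target_parameters(documents):
    """Return {word: TF * IDF * DOC_LEN}, keeping the highest value per word.

    documents: list of token lists, one list per evaluation text.
    """
    n_docs = len(documents)
    total_len = sum(len(tokens) for tokens in documents)
    if n_docs == 0 or total_len == 0:
        return {}
    avg_len = total_len / n_docs

    # Number of evaluation texts that contain each word.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = {}
    for tokens in documents:
        if not tokens:
            continue
        length_ratio = len(tokens) / avg_len  # DOC_LEN as defined in the text
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)                   # assumed TF definition
            idf = math.log(n_docs / doc_freq[word])    # assumed IDF definition
            value = tf * idf * length_ratio
            scores[word] = max(scores.get(word, 0.0), value)
    return scores


def target_segments(documents, k):
    """Pick the k word segments with the largest target parameter."""
    scores = target_parameters(documents)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

For example, `target_segments([["plot", "slow", "slow", "boring"], ["plot", "great"]], 1)` selects `"slow"`, while `"plot"`, which appears in every text, receives a target parameter of zero.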
Optionally, determining the second evaluation result of the target content according to the evaluation text includes:
determining a classification corresponding to the evaluation text according to the evaluation text and a text classification model, and determining the score indicated by the classification as the second evaluation result;
where, during training of the text classification model, feature extraction is performed by a feature extraction sub-model and target features are obtained by a fully connected layer, and the prediction result of the text classification model is obtained by predicting from a subset of the target features.
Optionally, determining the quality evaluation result of the target user for the target content according to the first evaluation result, the second evaluation result, and the matching degree includes:
determining a weighted sum of the first evaluation result and the second evaluation result as an initial evaluation result; and
adjusting the initial evaluation result according to the matching degree to obtain the quality evaluation result.
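The two steps above can be sketched as a small function. The weighted sum follows the text directly; the adjustment rule does not, because this passage does not specify one. The sketch assumes that a higher matching degree pulls the initial result toward the midpoint of a five-point scale, and the parameter names `w_first`, `strength`, and `midpoint` and their defaults are hypothetical.

```python
def quality_score(first, second, match, w_first=0.5, strength=0.2, midpoint=3.0):
    """Combine the user's score, the text-derived score, and the matching degree.

    first:  score entered by the user (first evaluation result)
    second: score predicted from the evaluation text (second evaluation result)
    match:  matching degree, clamped to [0, 1]
    """
    if not 0.0 <= w_first <= 1.0:
        raise ValueError("w_first must be in [0, 1]")
    match = min(max(match, 0.0), 1.0)

    # Step 1 (from the text): weighted sum as the initial evaluation result.
    initial = w_first * first + (1.0 - w_first) * second

    # Step 2 (assumed rule): the closer the user matches the content, the more
    # the initial result is shrunk toward the neutral midpoint of the scale.
    return midpoint + (initial - midpoint) * (1.0 - strength * match)
```

Under these assumptions, a user score of 5 with a text-derived score of 3 gives 4.0 when the matching degree is 0 and about 3.8 when it is 1.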
Optionally, the method further includes:
determining recommended content corresponding to the target user according to the quality evaluation result corresponding to each piece of content in a content library; and
outputting the recommended content.
According to a second aspect of the present disclosure, a content quality evaluation apparatus is provided, the apparatus including:
an acquisition module, configured to obtain a first evaluation result and an evaluation text of a target user for target content;
a first determining module, configured to determine a second evaluation result of the target content according to the evaluation text;
a second determining module, configured to determine a matching degree between the target user and the target content; and
a third determining module, configured to determine a quality evaluation result of the target user for the target content according to the first evaluation result, the second evaluation result, and the matching degree.
According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the steps of any of the methods of the first aspect.
According to a fourth aspect of the present disclosure, an electronic device is provided, including:
a memory on which a computer program is stored; and
a processor, configured to execute the computer program in the memory to implement the steps of any of the methods of the first aspect.
Thus, with the above technical solution, the first evaluation result is the score entered by the target user, the second evaluation result is the score determined from the evaluation text entered by the target user, and the matching degree between the target user and the target content indicates, to a certain extent, how likely it is that the user's evaluation is subjective. Determining the quality evaluation result from these three inputs takes into account both the evaluation score and the comment text entered by the user, so that the determined quality evaluation result stays consistent with the evaluation text entered by the user. In addition, using the matching degree between the user and the content can reduce the user's subjective influence to a certain extent, which helps ensure the accuracy and objectivity of the quality evaluation result, improves how well the quality evaluation result matches the target content, gives users an accurate reference for choosing content, and improves the user experience.
Other features and advantages of the present disclosure are described in detail in the Detailed Description section that follows.
Brief Description of the Drawings
The accompanying drawings provide a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a flowchart of a content quality evaluation method provided according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a text classification model provided according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of an exemplary implementation of determining the matching degree between the target user and the target content;
FIG. 4 is a block diagram of a content quality evaluation apparatus provided according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment;
FIG. 6 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present disclosure and are not intended to limit it.
FIG. 1 is a flowchart of a content quality evaluation method provided according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include:
In step 11, a first evaluation result and an evaluation text of a target user for target content are obtained. The target content may be any item of multimedia content such as a movie, TV series, or animation. For example, the first evaluation result may be a score value entered by the target user for the target content, and the evaluation text may be the comment characters entered by the target user for the target content.
In step 12, a second evaluation result of the target content is determined according to the evaluation text.
The evaluation text is the target user's comment on the target content, so it also reflects, to a certain extent, the target user's evaluative leaning toward the target content. Therefore, in this embodiment, another evaluation result of the target user for the target content may be determined from the evaluation text. For example, the second evaluation result may be expressed as a score value whose range is the same as the range of the score value of the first evaluation result, so that the second evaluation result represents the score determined from the evaluation text entered by the target user.
In step 13, a matching degree between the target user and the target content is determined.
Each target user may have corresponding fields of interest, and for target content that falls within those fields the user may give a more favorable evaluation, that is, the user may give a biased evaluation. Therefore, in this embodiment, the matching degree between the target user and the target content may be determined to indicate, to a certain extent, how likely it is that the user's evaluation is subjective: the higher the matching degree, the more likely it is that the evaluation given by the user is subjective.
In step 14, a quality evaluation result of the target user for the target content is determined according to the first evaluation result, the second evaluation result, and the matching degree.
Thus, with the above technical solution, the first evaluation result is the score entered by the target user, the second evaluation result is the score determined from the evaluation text entered by the target user, and the matching degree between the target user and the target content indicates, to a certain extent, how likely it is that the user's evaluation is subjective. Determining the quality evaluation result from these three inputs takes into account both the evaluation score and the comment text entered by the user, so that the determined quality evaluation result stays consistent with the evaluation text entered by the user. In addition, using the matching degree between the user and the content can reduce the user's subjective influence to a certain extent, which helps ensure the accuracy and objectivity of the quality evaluation result, improves how well the quality evaluation result matches the target content, gives users an accurate reference for choosing content, and improves the user experience.
As an example, a scoring model may be used to predict a score for the evaluation text, thereby obtaining the second evaluation result. For example, the scoring model may be implemented based on a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or the like, and trained using evaluation texts labeled with scores as training samples. The training may be carried out in a manner common in the art and is not described in detail here.
As another example, an exemplary implementation of determining the second evaluation result of the target content according to the evaluation text in step 12 is as follows, including:
Determining a classification corresponding to the evaluation text according to the evaluation text and a text classification model, and determining the score indicated by the classification as the second evaluation result.
During training of the text classification model, feature extraction is performed by a feature extraction sub-model and target features are obtained by a fully connected layer, and the prediction result of the text classification model is obtained by predicting from a subset of the target features.
For example, the feature extraction sub-model may be implemented based on a Transformer model. Correspondingly, a schematic structural diagram of the text classification model is shown in FIG. 2, where the number of layers of each feature layer in the text classification model is merely illustrative and does not limit the solution of the present disclosure.
As shown in FIG. 2, the text classification model includes a feature extraction sub-model, which may be composed of Transformer models, followed by a fully connected layer (Fully-Connected Layer) and a dropout layer (Dropout Layer), and may combine an activation function and a normalization layer to produce the prediction result. For example, the activation function may be GELU. For example, if scores range from 1 to 5, five classes may be set in the normalization layer, corresponding to scores of 1, 2, 3, 4, and 5 respectively.
Thus, in this embodiment, the evaluation text may be segmented into words to obtain a word segment sequence corresponding to the evaluation text, and each word segment is then represented as a vector and input into the text classification model. During feature extraction, the text classification model can use an attention mechanism to focus more on features related to the score. After the features are extracted, the target features are obtained by the fully connected layer, which ensures the accuracy of feature extraction and provides reliable data support for the subsequent prediction. In addition, the prediction result of the text classification model is obtained by predicting from a subset of the target features. For example, the Dropout Layer may be used to set the probability with which each neural network layer performs dropout; neural network training units are removed from the network according to this probability, and prediction and classification are then performed to obtain the prediction result used for training. Thus, during training of the text classification model, predictions can be made based on a random subset of the target features, and for stochastic gradient descent, because the subset of features is selected randomly, each mini-batch of the text classification model trains a different network. This effectively prevents the training of the text classification model from overfitting, thereby improving the accuracy of the trained text classification model, ensuring the accuracy of the determined second evaluation result, and supporting an objective and accurate evaluation of the target content.
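The classification head described above (target features, dropout, GELU activation, and a normalization layer over five score classes) can be illustrated with a dependency-free sketch. The Transformer feature extractor is omitted, the layer ordering and the inverted-dropout scaling are assumptions because FIG. 2 is not reproduced here, and the weights in the usage example are hand-picked rather than trained.

```python
import math
import random


def gelu(x):
    """Exact GELU activation: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def dropout(features, p, rng, training):
    """Inverted dropout: during training, zero each feature with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(features)
    keep = 1.0 - p
    return [f / keep if rng.random() < keep else 0.0 for f in features]


def softmax(logits):
    """Numerically stable softmax (the normalization layer)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_score(features, weights, biases, p=0.1, rng=None, training=False):
    """Map target features to a score in 1..len(weights).

    weights holds one row per score class; with five rows the classes
    correspond to scores 1, 2, 3, 4, and 5.
    """
    rng = rng or random.Random(0)
    kept = dropout(features, p, rng, training)   # predict from a feature subset
    hidden = [gelu(f) for f in kept]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return probs.index(max(probs)) + 1
```

At inference time (`training=False`) dropout is a no-op, so `predict_score([1.0, -1.0], [[0, 0], [0, 0], [0, 0], [2, 0], [0, 0]], [0.0] * 5)` returns 4, the class whose row responds to the first feature.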
In a possible embodiment, an exemplary implementation of determining the matching degree between the target user and the target content in step 13 is as follows. As shown in FIG. 3, this step may include:
In step 31, a user vector corresponding to the target user is determined based on the historical evaluation texts of the target user.
In this step, with the user's authorization, the evaluation texts of the target user may be obtained, so that the characteristics of the user can be derived from the user's historical evaluations, that is, the user vector is obtained.
As an example, keywords may be extracted from multiple historical evaluation texts of the target user, the extracted keywords may be used as feature words of the target user, and the user vector may then be obtained by vectorizing the feature words.
As another example, determining the user vector corresponding to the target user based on the historical evaluation texts of the target user may include:
Clustering the historical evaluation texts to obtain a plurality of clusters corresponding to the historical evaluation texts.
The historical evaluation texts may be clustered using a common clustering algorithm such as K-means or KNN to obtain the plurality of clusters.
For each cluster, splicing the historical evaluation texts in the cluster whose text length is less than a preset length threshold, to obtain at least one spliced text.
When users evaluate content, they may often do so with short texts, and an evaluation text that is too short is not well suited to subject word generation. Therefore, in this embodiment, the historical evaluation texts may first be clustered so that historical evaluation texts with similar characteristics are grouped together.
For example, the preset length threshold may be set based on the actual application scenario, which is not limited by the present disclosure. For each cluster, if the text length of a historical evaluation text in it is less than the length threshold, that historical evaluation text is a short text. In this case, the short texts in the same cluster may be spliced together, so that the length of the historical evaluation texts is extended using texts with similar characteristics. This extends the texts while preserving the characteristics of each historical evaluation text.
As an example, all historical evaluation texts in a cluster whose text length is less than the preset length threshold may be spliced to obtain one spliced text. As another example, to avoid an overly long spliced text, a text length limit may be set for spliced texts. That is, when splicing the short texts in the current cluster, suppose the text length of the current spliced text S is less than the text length limit. If the spliced text S' obtained by appending a further historical evaluation text A would exceed the text length limit, the splice is not performed: the current spliced text S is taken as a completed spliced text, and the historical evaluation text A becomes the start of a new spliced text that is further spliced with other historical evaluation texts. In this way, multiple spliced texts are obtained.
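The second splicing variant above, with a text length limit, amounts to a greedy packing pass over the short texts of one cluster. A minimal sketch follows; the function name, the character-count notion of length, and the `separator` parameter are illustrative assumptions.

```python
def splice_short_texts(cluster_texts, length_threshold, length_limit, separator=" "):
    """Greedily splice the short texts of one cluster.

    Returns (spliced_texts, long_texts): texts shorter than length_threshold
    are concatenated until adding the next one would exceed length_limit;
    texts at or above the threshold are passed through unchanged.
    """
    spliced_texts, long_texts = [], []
    current = ""  # the spliced text S currently being built
    for text in cluster_texts:
        if len(text) >= length_threshold:
            long_texts.append(text)
            continue
        candidate = text if not current else current + separator + text
        if current and len(candidate) > length_limit:
            # S' would be too long: close S and start a new spliced text with A.
            spliced_texts.append(current)
            current = text
        else:
            current = candidate
    if current:
        spliced_texts.append(current)
    return spliced_texts, long_texts
```

For example, with a threshold of 10 and a limit of 15, the cluster `["good", "nice plot", "a very long review text here", "bad", "ok"]` yields the spliced texts `["good nice plot", "bad ok"]` and leaves the long review untouched. For Chinese text an empty separator is likely more appropriate.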
After that, subject words corresponding to the target user are determined based on the spliced texts in each cluster, the historical evaluation texts whose text length is not less than the length threshold, and a topic generation model, and the user vector is determined based on the vectors corresponding to the subject words.
A historical evaluation text whose text length is not less than the length threshold can be used directly for subject word generation. As described above, splicing the historical evaluation texts in each cluster whose text length is less than the length threshold yields spliced texts that are longer texts. Thus, in this embodiment, subject words may be obtained from the topic generation model based on the spliced texts and the unspliced historical evaluation texts in the clusters. For example, a preset number of words may be selected in descending order of probability from the probability distribution output by the topic generation model and used as the subject words. The subject words are then vectorized to obtain the user vector that characterizes the user.
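Selecting a preset number of words in descending order of probability is a top-k selection over the topic model's word distribution. The sketch below assumes that distribution is available as a plain dictionary and that the user vector is the element-wise mean of the subject words' vectors; the averaging step, the function names, and the `embeddings` lookup are hypothetical, since the text only says the subject words are vectorized.

```python
import heapq


def select_subject_words(word_probabilities, count):
    """Return the `count` most probable words, highest probability first."""
    top = heapq.nlargest(count, word_probabilities.items(), key=lambda item: item[1])
    return [word for word, _ in top]


def user_vector(subject_words, embeddings):
    """Average the vectors of the subject words (assumed aggregation rule)."""
    vectors = [embeddings[w] for w in subject_words if w in embeddings]
    if not vectors:
        return []
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

For example, from `{"thriller": 0.4, "plot": 0.3, "funny": 0.2, "music": 0.1}` with a count of 2, the subject words are `["thriller", "plot"]`.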
示例地,主题生成模型可以是基于LDA(Latent Dirichlet Allocation,隐含狄利克雷分布)实现的,其训练方式为现有技术,在此不再赘述。其中,LDA模型为无监督模型,在该实施例中可以基于已有的先验文本对LDA模型进行预训练,该先验文本可以例如为用户针对惊悚类电影内容的评价文本、针对喜剧类电影内容的评价文本等,从而可以在一定程度上提高主题生成模型针对用户关注的定制化场景下的主题词确定。For example, the topic generation model may be implemented based on LDA (Latent Dirichlet Allocation, latent Dirichlet distribution), and the training method thereof is the prior art, which will not be repeated here. The LDA model is an unsupervised model, and in this embodiment, the LDA model can be pre-trained based on existing prior texts. Content evaluation text, etc., so that to a certain extent, the topic generation model can improve the topic word determination in customized scenarios that users pay attention to.
由此,可以对具有相似特征的文本长度较短的文本进行合并拼接,从而可以基于长文本在进行主题词确定,在一定程度上提高主题词确定的准确性,即提高确定出的用户特征的准确性,为确定用户画像提供准确的数据支持。In this way, texts with similar characteristics and shorter text lengths can be combined and spliced, so that the subject words can be determined based on the long texts, and the accuracy of the subject words determination can be improved to a certain extent, that is, the accuracy of the determined user characteristics can be improved. Accuracy, providing accurate data support for determining user portraits.
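The splicing and topic word selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function names, the use of a space as the separator, the choice of one spliced text per cluster, and the representation of the topic model output as a plain word-to-probability dictionary are all hypothetical. Clustering and the LDA model itself are outside the sketch.

```python
def splice_short_texts(clusters, length_threshold, sep=" "):
    """Build the documents that are fed to the topic generation model.

    clusters: list of clusters, each a list of historical evaluation texts.
    Texts shorter than length_threshold are joined into one spliced text per
    cluster (an assumption; the source only requires at least one). Texts at
    or above the threshold are kept unchanged.
    """
    documents = []
    for texts in clusters:
        short = [t for t in texts if len(t) < length_threshold]
        long_texts = [t for t in texts if len(t) >= length_threshold]
        documents.extend(long_texts)
        if short:
            documents.append(sep.join(short))
    return documents


def pick_topic_words(word_probs, k):
    """Select k words in descending order of probability.

    word_probs: hypothetical {word: probability} output of a topic model.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

For Chinese evaluation texts an empty separator, or a sentence delimiter, may be a better choice than a space; that is a tuning decision the source does not address.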
In step 32, a content vector corresponding to the target content is determined based on multiple evaluation texts corresponding to the target content.
Multiple users can evaluate the same content. Therefore, the true overall features of the target content, that is, the content vector, can be obtained from the evaluation texts of multiple users on that target content.
As one example, keywords can be extracted from the multiple evaluation texts and used as feature words of the target content; the content vector is then obtained by vectorizing the feature words.
As another example, determining the content vector corresponding to the target content based on the multiple evaluation texts corresponding to the target content may include:
determining the term frequency and inverse document frequency corresponding to each word segment in the multiple evaluation texts corresponding to the target content, and the text length ratio corresponding to the word segment, where the text length ratio corresponding to a word segment is the ratio of the length of the evaluation text to which the word segment belongs to the average text length of the multiple evaluation texts.
Each evaluation text can be treated as one document and segmented into words, giving the word segments of the multiple evaluation texts. The term frequency TF, the inverse document frequency IDF, and the text length ratio DOC_LEN can then be determined for each word segment according to their respective formulas.
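As a hedged illustration, one common way to write these three quantities is given below for a word segment $t$ in evaluation text $d$, where $D$ is the set of evaluation texts of the target content. The TF and IDF forms are the standard textbook definitions and are an assumption here (the embodiment may use a smoothed or differently normalized variant); measuring text length $\mathrm{len}(d)$ in word segments is also an assumption. DOC_LEN follows the definition stated above.

$$
\mathrm{TF}(t,d)=\frac{n_{t,d}}{\mathrm{len}(d)},\qquad
\mathrm{IDF}(t)=\log\frac{|D|}{|\{d'\in D : t\in d'\}|},\qquad
\mathrm{DOC\_LEN}(d)=\frac{\mathrm{len}(d)}{\frac{1}{|D|}\sum_{d'\in D}\mathrm{len}(d')}
$$

Here $n_{t,d}$ is the number of occurrences of $t$ in $d$.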
For each word segment, the product of its term frequency, its inverse document frequency, and its text length ratio is determined as the target parameter corresponding to the word segment; the target word segments corresponding to the target content are determined according to the target parameter of each word segment, and the content vector is determined based on the vectors corresponding to the target word segments.
The top N word segments, ranked by target parameter in descending order, can be determined as the target word segments corresponding to the target content, and the target word segments can then be vectorized to obtain the content vector. The topic words of the target user and the target word segments of the target content are vectorized in the same way; any vectorization method commonly used in the art, such as word2vec, may be chosen, and the present disclosure does not limit this.
Thus, when target word segments are determined from the evaluation texts, the text length of the evaluation text to which a word segment belongs is taken into account in addition to the word segment's term frequency and inverse document frequency. As noted above, it is difficult to extract keywords accurately from short evaluation texts. The present disclosure therefore uses the text length ratio to raise the importance of word segments from longer evaluation texts, which helps ensure the accuracy of the determined target word segments and thereby improves the accuracy and comprehensiveness of the features in the content vector.
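The target parameter and the top-N selection can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the embodiment's implementation: it uses the unsmoothed textbook TF and IDF, measures text length in word segments, and keeps the maximum score when a word segment appears in several evaluation texts (the source does not say how per-text scores are combined). The function names are hypothetical, and the evaluation texts are assumed to be segmented already.

```python
import math
from collections import Counter


def target_parameters(docs):
    """Score word segments by TF * IDF * DOC_LEN.

    docs: list of evaluation texts, each a list of word segments (tokens).
    Returns {segment: highest score over the texts it appears in}.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of evaluation texts containing each segment.
    df = Counter(seg for d in docs for seg in set(d))
    scores = {}
    for d in docs:
        if not d:
            continue
        doc_len_ratio = len(d) / avg_len
        for seg, count in Counter(d).items():
            tf = count / len(d)
            idf = math.log(n_docs / df[seg])  # assumed unsmoothed IDF
            score = tf * idf * doc_len_ratio
            scores[seg] = max(score, scores.get(seg, 0.0))
    return scores


def top_n_segments(docs, n):
    """Return the n word segments with the largest target parameter."""
    scores = target_parameters(docs)
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]
```

Under these assumptions TF × DOC_LEN reduces to the raw count divided by the average text length, so the length ratio effectively undoes the per-text normalization in TF. That is one way to see why word segments from longer evaluation texts gain weight.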
In step 33, the matching degree is determined according to the user vector and the content vector.
For example, the cosine similarity between the user vector and the content vector can be calculated and used as the matching degree, for example by the following formula:
$$\cos(\beta_1,\beta_2)=\frac{\sum_{i=1}^{n}A_i B_i}{\sqrt{\sum_{i=1}^{n}A_i^{2}}\,\sqrt{\sum_{i=1}^{n}B_i^{2}}}$$
where $\beta_1$ denotes the user vector, $\beta_2$ denotes the content vector, $A_i$ denotes the i-th feature of the user vector, $B_i$ denotes the i-th feature of the content vector, and $n$ denotes the feature dimension of the user vector and the content vector, which have the same number of dimensions.
Thus, with the above technical solution, the interest features of the target user can be obtained from the user's historical evaluation texts, and the features of the target content itself can be obtained from the multiple evaluation texts of the target content. The matching degree between the two can then indicate whether the user is likely to give an objective evaluation of the target content. This provides a data parameter for subsequently determining the final quality evaluation result based on the matching degree and helps ensure the objectivity of that result.
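The cosine similarity used as the matching degree can be sketched in a few lines of Python. This is an illustration only: the function name and the convention of returning 0.0 when either vector is all zeros are assumptions, since the source does not discuss that edge case.

```python
import math


def cosine_similarity(user_vec, content_vec):
    """Cosine similarity between two equal-length feature vectors."""
    if len(user_vec) != len(content_vec):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(user_vec, content_vec))
    norm_user = math.sqrt(sum(a * a for a in user_vec))
    norm_content = math.sqrt(sum(b * b for b in content_vec))
    if norm_user == 0.0 or norm_content == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_user * norm_content)
```

For non-negative feature vectors the result lies between 0 and 1, which keeps the later subtraction from the initial evaluation result on a bounded scale.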
In a possible embodiment, an exemplary implementation of step 14, in which the quality evaluation result of the target user for the target content is determined according to the first evaluation result, the second evaluation result, and the matching degree, is as follows. This step may include:
determining the weighted sum of the first evaluation result and the second evaluation result as an initial evaluation result.
For example, the weights of the first evaluation result and the second evaluation result can be preset, where the two weights sum to 1 and can be set according to the actual application scenario; the present disclosure does not limit this. As an example, if both weights are 0.5, the average of the first evaluation result and the second evaluation result is determined as the initial evaluation result, so that the consistency between the score and the text entered by the user is taken into account.
adjusting the initial evaluation result according to the matching degree to obtain the quality evaluation result.
As described above, the matching degree indicates how subjective the target user's evaluation of the target content is: the higher the matching degree, the stronger the subjectivity, meaning the target user is more likely to give a biased evaluation. The influence of subjective preference therefore needs to be offset when determining the quality evaluation result. Adjusting the initial evaluation result according to the matching degree may consist of subtracting the matching degree from the initial evaluation result and taking the difference as the quality evaluation result. Alternatively, according to a preset adjustment weight for the matching degree, the product of the matching degree and the adjustment weight can be used as an adjustment value, and the result of subtracting the adjustment value from the initial evaluation result is taken as the quality evaluation result.
Thus, with the above technical solution, the determined quality evaluation result stays consistent with the evaluation score and evaluation text entered by the user, while the influence of the user's subjective preference is reduced to some extent, which helps ensure the accuracy and objectivity of the result. In addition, compared with the first evaluation results entered by users, the quality evaluation results become more dispersed to some extent. This avoids the situation where the evaluation results of different contents overlap so heavily that content quality is hard to distinguish, and gives users a more accurate data reference for judging the actual quality of each content.
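The two-stage calculation described above (weighted sum, then subtraction of the matching-degree adjustment) can be sketched as follows. The function and parameter names are hypothetical, and the default values are only the example values from the text: equal weights of 0.5 and, for the plain-subtraction variant, an adjustment weight of 1.0.

```python
def quality_evaluation(first, second, match, w_first=0.5, w_second=0.5,
                       adjust_weight=1.0):
    """Combine the two evaluation results and offset subjective preference.

    first:  first evaluation result (score entered by the user)
    second: second evaluation result (score derived from the evaluation text)
    match:  matching degree between the user and the content
    """
    if abs(w_first + w_second - 1.0) > 1e-9:
        raise ValueError("the two weights must sum to 1")
    initial = w_first * first + w_second * second
    # adjust_weight = 1.0 is the plain "subtract the matching degree" variant.
    return initial - adjust_weight * match
```

For instance, with a user score of 8, a text-derived score of 6, and a matching degree of 0.5, the initial evaluation result is 7 and the quality evaluation result is 6.5. The scales of the scores and the matching degree are not specified in the source, so the adjustment weight would need to be chosen to suit them.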
In a possible embodiment, the method may further include:
determining recommended content corresponding to the target user according to the quality evaluation result corresponding to each content in a content library; and outputting the recommended content. The quality evaluation results corresponding to all or part of the contents in the content library are determined according to the content quality evaluation method described above.
As one example, the top P contents, ranked by quality evaluation result in descending order, can be selected from the content library as the recommended content for the target user, where P can be set according to actual user needs. Higher-quality content can thus be recommended to the target user, ensuring the user's viewing experience.
As another example, the top Q contents, ranked by quality evaluation result in descending order, can be selected from the content library as candidate content for the target user, where Q can be set according to actual user needs. Then, based on the selection matching degree between each candidate content and the target user, the top P candidates in descending order of selection matching degree are selected as the recommended content and displayed in that order, where P is less than or equal to Q. The selection matching degree may be determined by a similarity calculation between the tag vector of the candidate content and the interest vector of the target user, and indicates whether the candidate content fits the user's interest preferences. For example, the tag vector may be obtained by vectorizing the type tags of the candidate content, where a type tag may be a format (such as feature film or documentary) or a genre (such as comedy, tragedy, urban, or rural); the interest vector may be obtained by vectorizing the user's interest tags, acquired with the user's authorization, where an interest tag may be idol drama, youth, urban, and so on.
In this way, high-quality content is recommended to the target user and displayed according to the target user's interest preferences, which ensures the match between the recommended content and the target user while improving the diversity and personalization of the recommendations and further improving the user experience.
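The second recommendation example (top Q by quality, then top P by selection matching degree) can be sketched as follows. This is a hedged illustration: the function name, the dictionary inputs keyed by a content identifier, and the alphabetical tie-breaking are assumptions, and the selection matching degrees are taken as already computed.

```python
def recommend(quality, selection_match, q, p):
    """Two-stage recommendation.

    quality:         {content_id: quality evaluation result}
    selection_match: {content_id: selection matching degree for the user}
    q: number of candidates kept by quality; p: number finally recommended.
    """
    if p > q:
        raise ValueError("P must be less than or equal to Q")
    # Stage 1: top Q contents by quality evaluation result, descending.
    candidates = sorted(quality, key=lambda c: (-quality[c], c))[:q]
    # Stage 2: top P candidates by selection matching degree, descending.
    ranked = sorted(candidates, key=lambda c: (-selection_match[c], c))
    return ranked[:p]
```

Setting the selection matching degree equal to the quality result for every content reproduces the simpler first example, where the top P contents by quality are recommended directly.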
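Based on the same inventive concept, the present disclosure also provides a content quality evaluation apparatus. As shown in FIG. 4, the apparatus 10 includes:
an acquisition module 100, configured to acquire a first evaluation result and an evaluation text of a target user for target content;
a first determination module 200, configured to determine a second evaluation result of the target content according to the evaluation text;
a second determination module 300, configured to determine a matching degree corresponding to the target user and the target content; and
a third determination module 400, configured to determine a quality evaluation result of the target user for the target content according to the first evaluation result, the second evaluation result, and the matching degree.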
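One way to picture how the four modules cooperate is as a small class whose fields are the modules and whose single method runs them in order. This is only a structural sketch: the class name, the field names, and the callable signatures are hypothetical, and each module's internals are left to the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ContentQualityEvaluator:
    # Module 100: returns (first evaluation result, evaluation text).
    acquire: Callable[[str, str], Tuple[float, str]]
    # Module 200: maps the evaluation text to the second evaluation result.
    score_text: Callable[[str], float]
    # Module 300: matching degree between the user and the content.
    match_degree: Callable[[str, str], float]
    # Module 400: combines both results and the matching degree.
    combine: Callable[[float, float, float], float]

    def evaluate(self, user_id: str, content_id: str) -> float:
        first, text = self.acquire(user_id, content_id)
        second = self.score_text(text)
        match = self.match_degree(user_id, content_id)
        return self.combine(first, second, match)
```

Passing the modules in as callables keeps each one replaceable, which mirrors the optional submodule variants listed for the apparatus.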
Optionally, the second determination module includes:
a first determination submodule, configured to determine a user vector corresponding to the target user based on historical evaluation texts of the target user;
a second determination submodule, configured to determine a content vector corresponding to the target content based on multiple evaluation texts corresponding to the target content; and
a third determination submodule, configured to determine the matching degree according to the user vector and the content vector.
Optionally, the first determination submodule includes:
a clustering submodule, configured to cluster the historical evaluation texts to obtain multiple clusters corresponding to the historical evaluation texts;
a splicing submodule, configured to, for each cluster, splice the historical evaluation texts in the cluster whose text length is less than a preset length threshold to obtain at least one spliced text; and
a fourth determination submodule, configured to determine topic words corresponding to the target user based on the spliced texts in each cluster, the historical evaluation texts whose text length is not less than the length threshold, and a topic generation model, and to determine the user vector based on the vectors corresponding to the topic words.
Optionally, the second determination submodule includes:
a fifth determination submodule, configured to determine the term frequency and inverse document frequency corresponding to each word segment in the multiple evaluation texts corresponding to the target content, and the text length ratio corresponding to the word segment, where the text length ratio corresponding to a word segment is the ratio of the length of the evaluation text to which the word segment belongs to the average text length of the multiple evaluation texts;
a sixth determination submodule, configured to determine, for each word segment, the product of its term frequency, inverse document frequency, and text length ratio as the target parameter corresponding to the word segment; and
a seventh determination submodule, configured to determine target word segments corresponding to the target content according to the target parameter of each word segment, and to determine the content vector based on the vectors corresponding to the target word segments.
Optionally, the first determination module includes:
an eighth determination submodule, configured to determine a classification corresponding to the evaluation text according to the evaluation text and a text classification model, and to determine the score indicated by the classification as the second evaluation result;
where, during training of the text classification model, features are extracted by a feature extraction submodel and target features are obtained through a fully connected layer, and the prediction result of the text classification model is predicted from a subset of the target features.
Optionally, the third determination module includes:
a ninth determination submodule, configured to determine the weighted sum of the first evaluation result and the second evaluation result as an initial evaluation result; and
an adjustment submodule, configured to adjust the initial evaluation result according to the matching degree to obtain the quality evaluation result.
Optionally, the apparatus further includes:
a fourth determination module, configured to determine recommended content corresponding to the target user according to the quality evaluation result corresponding to each content in a content library; and
an output module, configured to output the recommended content.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
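FIG. 5 is a block diagram of an electronic device 700 according to an exemplary embodiment. As shown in FIG. 5, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may further include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 controls the overall operation of the electronic device 700 to complete all or part of the steps of the content quality evaluation method described above. The memory 702 stores various types of data to support operation of the electronic device 700. Such data may include, for example, instructions for any application or method operating on the electronic device 700, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual or physical. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of these, which is not limited here. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the content quality evaluation method described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when executed by a processor, the program instructions implement the steps of the content quality evaluation method described above. For example, the computer-readable storage medium may be the above memory 702 including program instructions, which can be executed by the processor 701 of the electronic device 700 to complete the content quality evaluation method described above.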
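FIG. 6 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 6, the electronic device 1900 includes one or more processors 1922 and a memory 1932 for storing computer programs executable by the processor 1922. The computer program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processor 1922 may be configured to execute the computer program to perform the content quality evaluation method described above.
In addition, the electronic device 1900 may further include a power supply component 1926 and a communication component 1950. The power supply component 1926 may be configured to perform power management of the electronic device 1900, and the communication component 1950 may be configured to implement communication of the electronic device 1900, for example wired or wireless communication. The electronic device 1900 may also include an input/output (I/O) interface 1958. The electronic device 1900 may run an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, or Linux™.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when executed by a processor, the program instructions implement the steps of the content quality evaluation method described above. For example, the non-transitory computer-readable storage medium may be the above memory 1932 including program instructions, which can be executed by the processor 1922 of the electronic device 1900 to complete the content quality evaluation method described above.
In another exemplary embodiment, a computer program product is also provided. The computer program product contains a computer program executable by a programmable apparatus, and the computer program has code portions that, when executed by the programmable apparatus, perform the content quality evaluation method described above.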
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of those embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications can be made to its technical solutions, and these simple modifications all fall within the protection scope of the present disclosure.
It should also be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the present disclosure may also be combined arbitrarily; as long as a combination does not depart from the spirit of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111671403.5A CN114428837A (en) | 2021-12-31 | 2021-12-31 | Content quality evaluation method, device, medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111671403.5A CN114428837A (en) | 2021-12-31 | 2021-12-31 | Content quality evaluation method, device, medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114428837A true CN114428837A (en) | 2022-05-03 |
Family
ID=81312192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111671403.5A Pending CN114428837A (en) | 2021-12-31 | 2021-12-31 | Content quality evaluation method, device, medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114428837A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114936797A (en) * | 2022-06-16 | 2022-08-23 | 身边云(北京)信息服务有限公司 | A freelancer evaluation method, electronic device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112395855A (en) * | 2020-12-03 | 2021-02-23 | 中国联合网络通信集团有限公司 | Comment-based evaluation method and device |
WO2021082861A1 (en) * | 2019-10-31 | 2021-05-06 | 平安科技(深圳)有限公司 | Scoring method and apparatus, electronic device, and storage medium |
CN113420809A (en) * | 2021-06-22 | 2021-09-21 | 北京金山云网络技术有限公司 | Video quality evaluation method and device and electronic equipment |
CN113782125A (en) * | 2021-09-17 | 2021-12-10 | 平安国际智慧城市科技股份有限公司 | Clinic scoring method and device based on artificial intelligence, electronic equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112533051B (en) | Barrage information display method, barrage information display device, computer equipment and storage medium | |
CN108009228B (en) | Method, device and storage medium for setting content label | |
CN112559800B (en) | Method, apparatus, electronic device, medium and product for processing video | |
CN108874832B (en) | Target comment determination method and device | |
US11361759B2 (en) | Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media | |
CN107704525A (en) | Video searching method and device | |
CN113987161B (en) | A text sorting method and device | |
CN104111925B (en) | Item recommendation method and device | |
CN111314732A (en) | Method for determining video label, server and storage medium | |
CN114547303B (en) | Text multi-feature classification method and device based on Bert-LSTM | |
WO2021155691A1 (en) | User portrait generating method and apparatus, storage medium, and device | |
CN113111197B (en) | Multimedia content recommendation method, device, equipment and storage medium | |
CN112000803B (en) | Text classification method and device, electronic equipment and computer readable storage medium | |
CN114357204B (en) | Media information processing method and related equipment | |
CN110992127A (en) | Article recommendation method and device | |
US20250103646A1 (en) | Machine learning selection of images | |
CN113688281B (en) | Video recommendation method and system based on deep learning behavior sequence | |
CN114428837A (en) | Content quality evaluation method, device, medium and electronic equipment | |
CN112801053B (en) | Video data processing method and device | |
CN106570003B (en) | Data pushing method and device | |
CN118095355A (en) | Model training method, content screening method and related devices | |
CN116628202A (en) | Intention recognition method, electronic device, and storage medium | |
CN115130453A (en) | Interactive information generation method and device | |
CN113934872A (en) | Search result sorting method, device, equipment and storage medium | |
CN110147488B (en) | Page content processing method, processing device, computing equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||