WO2022161470A1 - Content evaluation method, apparatus, device and medium - Google Patents

Content evaluation method, apparatus, device and medium

Info

Publication number
WO2022161470A1
WO2022161470A1 (PCT/CN2022/074684, CN2022074684W)
Authority
WO
WIPO (PCT)
Prior art keywords
text
article
evaluation
multimedia
feature
Prior art date
Application number
PCT/CN2022/074684
Other languages
English (en)
French (fr)
Inventor
Zhu Lingzi (朱灵子)
Ma Lianyang (马连洋)
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2022161470A1 publication Critical patent/WO2022161470A1/zh

Classifications

    • G — PHYSICS
      • G06 — COMPUTING; CALCULATING OR COUNTING
        • G06F — ELECTRIC DIGITAL DATA PROCESSING
          • G06F 16/00 — Information retrieval; database structures therefor; file system structures therefor
            • G06F 16/30 — of unstructured textual data
              • G06F 16/35 — Clustering; classification
          • G06F 40/00 — Handling natural language data
            • G06F 40/10 — Text processing
              • G06F 40/166 — Editing, e.g. inserting or deleting
            • G06F 40/20 — Natural language analysis
              • G06F 40/279 — Recognition of textual entities
                • G06F 40/284 — Lexical analysis, e.g. tokenisation or collocates
                • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking
        • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 — Computing arrangements based on biological models
            • G06N 3/02 — Neural networks
              • G06N 3/04 — Architecture, e.g. interconnection topology
                • G06N 3/045 — Combinations of networks
              • G06N 3/08 — Learning methods

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a content evaluation method, device, equipment and medium.
  • Evaluation of article content refers to a computer device determining the quality of the image-and-text content in an article, determining the proportions of high-quality content and low-quality content in the article, and judging whether the quality of the article content is acceptable according to the determined proportions.
  • In the related art, a computer device evaluates article content only from the perspective of text information, through supervised or unsupervised learning.
  • The proportions of high-quality and low-quality text information are computed, an evaluation result for the text is derived from those proportions, and that result is taken as the evaluation result for the entire article content.
  • Because the related art can only evaluate article content through text information, it cannot produce a reasonable evaluation of the article content in some cases.
  • the present application provides a content evaluation method, device, equipment and medium, which can evaluate article content from multiple dimensions and obtain reasonable article evaluation results.
  • the technical solution is as follows:
  • A method for evaluating content, comprising:
  • extracting article feature information from the article content, where the article feature information includes text information and multimedia information, and the multimedia information includes at least one of image information, video information and audio information;
  • obtaining, according to the article feature information, evaluation results of at least two dimensions of the article content; and
  • fusing the evaluation results of the at least two dimensions to obtain a content evaluation result of the article content.
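A minimal sketch of these steps in Python; the function names and the dict-based article representation are hypothetical, since the application does not prescribe an implementation:

```python
# Hypothetical sketch of the claimed steps; the helper names and the
# dict-based article representation are illustrative, not from the patent.

def extract_article_features(article):
    """Step 1: extract text information and multimedia information."""
    features = {"text": article.get("text", "")}
    features["multimedia"] = {
        kind: article.get(kind, []) for kind in ("images", "videos", "audios")
    }
    return features

def evaluate_dimensions(features, evaluators):
    """Step 2: obtain one evaluation result per dimension.

    `evaluators` maps a dimension name to a scoring function;
    the claim requires at least two dimensions.
    """
    if len(evaluators) < 2:
        raise ValueError("at least two evaluation dimensions are required")
    return {name: score(features) for name, score in evaluators.items()}
```

Step 3 (fusing the per-dimension results) can then be any combination strategy, e.g. a weighted average.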
  • a content evaluation device comprising:
  • an extraction module configured to extract article feature information in the article content, where the article feature information includes at least two of image information, text information, video information and audio information;
  • an evaluation module configured to obtain evaluation results of at least two dimensions of the article content according to the article feature information; and
  • an evaluation fusion module configured to fuse the evaluation results of the at least two dimensions to obtain the content evaluation result of the article content.
  • A computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, code set or instruction set, which is loaded and executed by the processor to implement the content evaluation method described above.
  • a computer storage medium in which at least one piece of program code is stored, and the program code is loaded and executed by a processor to implement the content evaluation method described in the above aspect.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the content evaluation method described above.
  • The article content is evaluated from multiple dimensions, so the final evaluation result better matches the actual content of the article.
  • Because the final evaluation result integrates the evaluations of multiple dimensions, erroneous evaluation results are effectively reduced and the robustness of the whole scheme is improved.
  • FIG. 1 is a schematic structural diagram of a VistaNet model provided by an exemplary embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a computer system provided by an exemplary embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for evaluating content provided by an exemplary embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a content evaluation method provided by an exemplary embodiment of the present application.
  • FIG. 5 is an exemplary structural diagram of an article content evaluation model provided by an exemplary embodiment of the present application.
  • FIG. 6 is an exemplary structural diagram of a text-multimedia evaluation sub-network provided by an exemplary embodiment of the present application.
  • FIG. 7 is an exemplary structural diagram of an objective prior feature sub-network provided by an exemplary embodiment of the present application.
  • FIG. 8 is an exemplary structural diagram of a text sub-network provided by an exemplary embodiment of the present application.
  • FIG. 9 is an exemplary structural diagram of a typesetting sub-network provided by an exemplary embodiment of the present application.
  • FIG. 10 is an exemplary complete structural diagram of an article content evaluation model provided by an exemplary embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a text-multimedia sub-network training method provided by an exemplary embodiment of the present application.
  • FIG. 12 is a schematic flowchart of an objective prior feature sub-network training method provided by an exemplary embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a text sub-network training method provided by an exemplary embodiment of the present application.
  • FIG. 14 is a schematic flowchart of a typesetting sub-network training method provided by an exemplary embodiment of the present application.
  • FIG. 15 is an exemplary service architecture diagram provided by an exemplary embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an apparatus for evaluating content provided by an exemplary embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, and acquire and apply knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer vision (CV) is a science that studies how to make machines "see": using cameras and computers instead of human eyes to identify, track and measure objects, and further performing graphic processing so that the result becomes an image more suitable for human observation or for transmission to an instrument for detection.
  • computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition, Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual Reality, augmented reality, simultaneous positioning and map construction and other technologies, as well as common biometric identification technologies such as face recognition and fingerprint recognition.
  • Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers using natural language. NLP is a science integrating linguistics, computer science and mathematics; research in this field involves natural language, the language people use daily, so it is closely related to the study of linguistics. NLP technology usually includes text processing, semantic understanding, machine translation, question-answering robots, knowledge graphs and other technologies.
  • Machine Learning It is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and to reorganize existing knowledge structures to continuously improve their performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications are in all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving and drones. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • Prior high-quality identification of image-and-text content means constructing a reasonable evaluation system for article quality from the perspective of the article content itself, so as to help the recommender better understand and apply the image-and-text content published by the content center.
  • Dimensions such as multimodal image-and-text features, account features, article typesetting experience, and atomic linguistic features of the article (such as the lexical diversity used in the article, whether the article uses figurative sentences and other varied syntax, and whether the article quotes classical poetry) are modeled separately, and finally an integrated prior high-quality identification method for articles is constructed.
  • Multi-dimensional image-and-text learning refers to multimodal machine learning (MMML), which processes and understands information from multiple sources and modalities through machine learning methods.
  • the main research direction is multi-dimensional learning between semantics, images, and videos.
  • Multi-dimensional learning can be divided into the following five research directions: multi-dimensional representation learning, dimensional transformation, alignment, multi-dimensional fusion, and collaborative learning.
  • Single-dimensional representation learning represents information as numerical vectors that computers can process, or abstracts it further into higher-level feature vectors, while multi-dimensional representation learning exploits the complementarity between multiple dimensions to eliminate inter-dimensional redundancy and thereby learn better feature representations.
  • Linguistics The scientific study of human language, involving the analysis of language form, language meaning, and context. Linguistics has an important thematic division between the study of language structure (grammar) and the study of meaning (semantics and pragmatics). Grammar includes morphology (the formation and composition of words), syntax (the rules that determine how words form phrases or sentences), and phonology (the study of sound systems and abstract sound units). In order to comprehensively evaluate the quality of the article, this application analyzes the overall semantic information of the article, the semantic relationship between the sentences in the article, the lexical diversity of the article, the diversity of rhetorical devices used in the article (such as metaphorical sentences, parallel sentences, etc.), and the situation of citing ancient poems in the article. A comprehensive evaluation of high-quality articles is carried out through various feature dimensions.
  • Ensemble learning is a machine learning paradigm. In ensemble learning, we train multiple models to solve the same problem and combine them for better results. A single learner is either prone to underfitting or overfitting. In order to obtain a learner with excellent generalization performance, multiple individual learners can be trained, and a strong learner can be finally formed through a certain combination strategy. This method of integrating multiple individual learners is called ensemble learning. Ensemble learning is to integrate the modeling results of all models by building multiple models on the data. The ensemble algorithm considers the modeling results of multiple estimators and aggregates them to obtain a comprehensive result to obtain better regression or classification performance than a single model.
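As a minimal illustration of combining individual learners as described above, two generic combination strategies (simple averaging of class probabilities and majority voting — the strategies and names are not specified by this application) can be sketched as:

```python
from collections import Counter

def ensemble_average(probabilities):
    """Combine per-model class-probability vectors by simple averaging."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    return [
        sum(p[c] for p in probabilities) / n_models for c in range(n_classes)
    ]

def majority_vote(labels):
    """Combine per-model class labels by majority vote."""
    return Counter(labels).most_common(1)[0][0]
```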
  • BERT: Bidirectional Encoder Representations from Transformers.
  • BERT is a new language representation model. BERT aims to pretrain deep bidirectional representations based on the left and right contexts of all layers. As a result, pretrained BERT representations can be fine-tuned with only one additional output layer, thereby creating state-of-the-art models for many tasks (such as question answering and language inference) without extensive modifications to the task-specific architecture.
  • BERT is conceptually simple but empirically powerful: it set new state-of-the-art results on 11 natural language processing (NLP) tasks, including improving the GLUE (General Language Understanding Evaluation) benchmark.
  • The HAN (Hierarchical Attention Network) model achieves good classification accuracy on long-text classification tasks.
  • The overall structure of the model is as follows: the input word-vector sequence is passed through a word-level Bi-GRU (Bidirectional Gated Recurrent Unit), after which each word has a corresponding hidden vector h output by the Bi-GRU. Attention weights are then obtained from the dot product of the u_w context vector with each h vector, and the h sequence is weighted and summed according to these attention weights to obtain the sentence summary vector s_i. Each sentence is passed through the same Bi-GRU structure plus attention to obtain the final document feature vector v, and the final text classification result is then obtained from v through a fully connected layer and a classifier.
  • The HAN model structure closely matches the human understanding process from words to sentences to documents, and it addresses the difficulty TextCNN has with long texts.
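The word-level attention step described above (score each hidden vector against a context vector, normalize the scores, then take the weighted sum) can be sketched in plain Python; omitting HAN's tanh projection is my simplification:

```python
import math

def softmax(scores):
    """Normalize attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, context):
    """Weighted sum of hidden vectors, scored against a context vector.

    A simplified sketch of HAN-style attention: dot-product scoring
    plus weighted summing only (the tanh projection is omitted).
    """
    weights = softmax([sum(c * h for c, h in zip(context, hs))
                       for hs in hidden_states])
    dim = len(hidden_states[0])
    pooled = [sum(w * hs[i] for w, hs in zip(weights, hidden_states))
              for i in range(dim)]
    return pooled, weights
```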
  • The attention mechanism is a problem-solving method that imitates human attention; it can quickly filter out high-value information from a large amount of information and is usually used in encoder+decoder models.
  • The attention mechanism can help the model assign different weights to each part of the input, extract more critical and important information, and enable the model to make more accurate judgments without adding more overhead to the model's computation and storage.
  • When the encoder+decoder model is used for translation, one or several input words often correspond to one or several output words. Assigning the same weight to every word in the sentence would be unreasonable, so different weights are assigned to different words to distinguish the important parts of the sentence.
  • The VistaNet model uses the attention mechanism to integrate image and text information, elegantly resolving the mismatch between the vector spaces of data from different modalities and enhancing the model's ability to analyze sentiment in reviews.
  • the VistaNet model is divided into three layers from bottom to top: word encoder + attention layer 11 (Word Encoder + Attention layer), sentence encoder + attention layer 12 (Sentence Encoder + Attention layer) and text encoder + attention layer 13 (Document Encoder+Attention layer).
  • The input data of this layer is the word vector w for each word of each sentence in the article content (at most T words per sentence).
  • The article word vectors can be obtained through a pre-trained neural network model. After the word vector w is input, the hidden states of a bidirectional recurrent neural network (RNN) in both directions are obtained and spliced together as the output of the time step. The attention mechanism is then used to calculate the importance weight α of each time step; after normalizing α, the outputs of all time steps are weighted and summed to obtain the vector representation s_i of the sentence.
  • u_{i,t} represents the weight value of the t-th word in the i-th sentence.
  • U is a randomly initialized value (the word-level context vector).
  • tanh() represents the hyperbolic tangent function.
  • W_w is the weight matrix applied to the hidden states of the word vectors w.
  • h_{i,t} represents the hidden state of the t-th word in the i-th sentence.
  • b_w is a constant (bias term).
  • α_{i,t} represents the normalized weight value of the t-th word in the i-th sentence.
  • exp() represents the exponential function with base e, the base of the natural logarithm.
  • s_i represents the vector representation of the sentence, i.e., the article sentence vector.
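The formulas themselves are rendered as images in the source; consistent with the variable definitions above and the standard HAN word-attention formulation, they are presumably:

```latex
\begin{aligned}
u_{i,t} &= \tanh(W_w h_{i,t} + b_w) \\
\alpha_{i,t} &= \frac{\exp(U^\top u_{i,t})}{\sum_{t'} \exp(U^\top u_{i,t'})} \\
s_i &= \sum_{t} \alpha_{i,t}\, h_{i,t}
\end{aligned}
```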
  • This layer takes as input the article sentence vector s_i of each sentence in the article content (up to L sentences) and outputs the text representation d_j for the j-th image, i.e., the text feature vector attended by the j-th image.
  • The input sentence vectors are passed through a bidirectional RNN to obtain the hidden states in both directions, which are spliced to obtain the hidden state h_i of each sentence.
  • The image feature vector m_j of the j-th image in the article content is extracted.
  • One method for obtaining an image feature vector is as follows: input the image a_j in the article content (1 ≤ j ≤ M, where M is the number of images in the article content) into a CNN and extract the image's features.
  • Applying the attention mechanism of m_j over the h_i, the importance weight corresponding to each h_i is obtained; the h_i are then weighted and averaged according to these weights to obtain the text representation d_j for the j-th image.
  • the specific calculation formula is as follows:
  • p_j represents the contribution value of the j-th image to the weight value.
  • W_p represents the embedding matrix of the j-th image.
  • b_p is a constant.
  • q_i represents the contribution value of the i-th sentence to the weight value.
  • W_q represents the embedding matrix of the i-th sentence.
  • b_q is a constant.
  • V is a randomly initialized value.
  • v_{j,i} represents the weight value of the i-th sentence corresponding to the j-th image.
  • ⊙ represents element-wise multiplication.
  • β_{j,i} represents the normalized weight value of the i-th sentence corresponding to the j-th image.
  • h_i represents the i-th hidden state.
  • m_j represents the image feature vector of the j-th image.
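Here too the formulas are rendered as images in the source; consistent with the definitions above and the VistaNet formulation, they are presumably (with ⊙ denoting the element-wise product):

```latex
\begin{aligned}
p_j &= \tanh(W_p m_j + b_p) \\
q_i &= \tanh(W_q h_i + b_q) \\
v_{j,i} &= V^\top \left( p_j \odot q_i \right) \\
\beta_{j,i} &= \frac{\exp(v_{j,i})}{\sum_{i'} \exp(v_{j,i'})} \\
d_j &= \sum_{i} \beta_{j,i}\, h_i
\end{aligned}
```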
  • The input of the last layer is the multiple text feature vectors d_j generated for the different images; the attention mechanism computes the corresponding weights to obtain the text feature vector representing the final content of the entire article.
  • The article content is evaluated according to this final text feature vector.
  • the specific calculation formula is as follows:
  • k_j represents the weight value corresponding to the j-th image.
  • K is a randomly initialized value.
  • W_d is the embedding matrix corresponding to the text feature vector d.
  • b_d is a constant.
  • γ_j is the normalized weight value corresponding to the j-th image.
  • d represents the final feature vector of the entire article content, according to which the article content is evaluated.
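Again the formulas are images in the source; consistent with the definitions above, the document-level attention is presumably:

```latex
\begin{aligned}
k_j &= K^\top \tanh(W_d d_j + b_d) \\
\gamma_j &= \frac{\exp(k_j)}{\sum_{j'} \exp(k_{j'})} \\
d &= \sum_{j} \gamma_j\, d_j
\end{aligned}
```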
  • FIG. 2 shows a schematic structural diagram of a computer system provided by an exemplary embodiment of the present application.
  • the computer system 200 includes: a terminal 220 and a server 240 .
  • An application related to article content evaluation is installed on the terminal 220 .
  • The application may be an applet within an app (application), a dedicated application, or a web client.
  • The user can receive the evaluation result of the article content on the terminal 220; alternatively, the user can send the article content to the server 240, and the server 240 produces the corresponding evaluation result and returns it to the terminal 220 or to other terminals.
  • the terminal 220 is at least one of a smart phone, an in-vehicle computer, a tablet computer, an e-book reader, an MP3 player, an MP4 player, a laptop computer, and a desktop computer.
  • the terminal 220 is connected to the server 240 through a wireless network or a wired network.
  • The server 240 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the server 240 is used to provide background services for the application program supporting article content evaluation.
  • The server 240 undertakes the main computing work while the terminal 220 undertakes the secondary computing work; or the server 240 undertakes the secondary computing work while the terminal 220 undertakes the main computing work; or the server 240 and the terminal 220 both adopt a distributed computing architecture to perform collaborative computing.
  • FIG. 3 shows a schematic flowchart of a method for evaluating content provided by an exemplary embodiment of the present application.
  • the method can be executed by the terminal 220 or the server 240 shown in FIG. 2, and the method includes the following steps:
  • Step 302 Extract article feature information in the article content, where the article feature information includes text information and multimedia information, and the multimedia information is at least one of image information, video information, and audio information.
  • The article feature information mentioned in the embodiments of this application includes at least two kinds of information: text information, and, in addition to the text information, at least one of image information, video information and audio information.
  • Textual information refers to the text in the content of the article. For example, if there are text C and text D in the article content, then text C and text D are text information.
  • Image information refers to images in the content of the article. For example, if there are image A and image B in the article content, then image A and image B are image information.
  • The video information refers to video included in the article or video related to the article. For example, if there is a video link in the article, the video information refers to the video corresponding to that link.
  • The video information can be converted into image information; all of the video information or only part of it may be converted. For example, if a video consists of 30 video frames, those 30 frame images can be treated as images in the article content, thereby converting the video information into image information.
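The frame-based conversion just described can be sketched as follows; the function name and parameters are illustrative only, not from the application:

```python
def video_to_images(frames, stride=1, max_images=None):
    """Treat a decoded video (a sequence of frame images) as article images.

    stride=1 keeps every frame (converting all of the video information);
    a larger stride keeps only a subset (converting part of it), and
    max_images optionally caps the number of images produced.
    """
    images = frames[::stride]
    if max_images is not None:
        images = images[:max_images]
    return images
```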
  • Audio information refers to the audio included in the article or the audio associated with the article. For example, if there is an audio link in the article, the audio information refers to the audio corresponding to the audio link.
  • the audio information is converted into text information through speech recognition technology. For example, the text content of the audio is determined by the speech recognition technology, and the obtained text content is used as the text information.
  • Step 304 Obtain evaluation results of at least two dimensions of the article content according to the article feature information.
  • the evaluation results of the at least two dimensions include at least two of text-multimedia evaluation results, text evaluation results, objective prior evaluation results, and typesetting evaluation results.
  • the evaluation results in at least two dimensions include text-multimedia evaluation results, text evaluation results, objective prior evaluation results, and typesetting evaluation results.
  • the evaluation results in at least two dimensions include text-audio evaluation results, objective prior evaluation results, and typesetting evaluation results.
  • Evaluation results can take many forms.
  • the evaluation result is the identification and determination of high-quality content and low-quality content in the article content.
  • the evaluation result is a score for the content of the article.
  • the text-multimedia evaluation results are used to represent the relevance between images and texts in the article content.
  • For example, if the text part of the article content is "I went swimming yesterday" while the corresponding image is of a plum blossom, the correlation between the image and the text is low, and the text and image here can be considered not relevant.
  • the text evaluation results are used to evaluate the quality of the text in the article content.
  • The text evaluation result can identify high-quality text in the article content.
  • High-quality text here refers to text that carries rich information, i.e., the amount of information users can obtain by reading it. For example, given two paragraphs with the same number of words, if information A can be extracted from the first paragraph while information A and information B can be extracted from the second, the quality of the second paragraph is higher than that of the first.
  • the objective prior evaluation result refers to the evaluation result that can be obtained from the content of the article or the information related to the content of the article without considering the content of the article.
  • the content of the article is published by account A, and account A has published many high-quality articles.
  • the objective prior evaluation result here will be higher.
  • Typesetting evaluation results are used to evaluate the typesetting layout of the article content.
  • text A is on page 5 of the article
  • image A corresponding to text A is on page 12 of the article.
  • the layout of text A and image A here is unreasonable, and reading the article content will be very inconvenient for the user, so the typesetting evaluation result here will be worse.
  • the following takes obtaining the evaluation results of at least two dimensions through a neural network model as an example for description:
  • Feature information of at least two dimensions refers to multi-dimensional information that can be extracted from the image information and text information, including but not limited to at least one of text-level information, image-level information, information corresponding to the combination of text and images, information corresponding to the account that published the article content, and typesetting information.
  • the feature information of at least two dimensions includes at least one of an image feature vector, an article word vector, an article sentence vector, a text feature vector (referring to the feature vector corresponding to all or part of the text content in the article content), a statistical feature, a linguistic feature, an image quality feature, and an account feature.
  • the article content evaluation model is a machine learning model that predicts the evaluation results of the corresponding dimensions according to the feature information of at least two dimensions.
  • Step 306 Integrate the evaluation results of at least two dimensions to obtain the content evaluation result of the article content.
  • the content evaluation result is used to evaluate the article content from at least two dimensions.
  • the at least two dimensions here refer to at least two dimensions among the above-mentioned text-multimedia evaluation results, text evaluation results, objective prior evaluation results, and typesetting evaluation results.
  • the evaluation results of at least two dimensions are fused through a neural network.
  • corresponding weights are assigned to the evaluation results of at least two dimensions, and the corresponding content evaluation results are calculated by using the weights.
  • this embodiment evaluates the article content from multiple dimensions by integrating the evaluation results of multiple dimensions of the article content, so that the final evaluation result better matches the actual content of the article.
  • because the final evaluation result integrates evaluations from multiple dimensions, it can effectively reduce erroneous evaluation results and improve the robustness of the whole scheme.
  • FIG. 4 shows a schematic flowchart of a content evaluation method provided by an exemplary embodiment of the present application.
  • the method can be executed by the terminal 220 or the server 240 shown in FIG. 2, and the method includes the following steps:
  • Step 401 Extract article feature information in the article content.
  • the article feature information includes text information and multimedia information, and the multimedia information is at least one of image information, video information, and audio information.
  • Step 402 Extract feature information of article feature information in at least two dimensions.
  • FIG. 5 shows an exemplary structural diagram of an article content evaluation model provided by an exemplary embodiment of the present application.
  • the article content evaluation model includes but is not limited to four sub-networks and an attention fusion layer 55, which are respectively a text-multimedia evaluation sub-network 51, an objective prior feature sub-network 52, a text sub-network 53 and a typesetting sub-network 54.
  • the article content evaluation model is composed of at least two of the above four sub-networks, corresponding to at least two dimensions.
  • the inputs of the above four sub-networks are the feature information of the corresponding dimensions, and the outputs are their respective evaluation results.
  • the number of text-multimedia evaluation sub-networks 51 in the article content evaluation model is the same as the number of types of information included in the multimedia information.
  • the article content evaluation model includes two text-multimedia evaluation sub-networks, one of which is used to determine the text-image evaluation result, and the other is used to determine the text-video evaluation result.
  • the input of the attention fusion layer 55 is the evaluation result of at least two dimensions, and the output is the content evaluation result 56 .
  • the attention fusion layer 55 is used to assign corresponding weights to the input evaluation results of at least two dimensions, and perform corresponding weighting calculations to obtain the content evaluation results 56 .
  • Step 403 Extract text-multimedia feature vectors in the feature information of at least two dimensions.
  • the text-multimedia feature vector includes at least one of a text-image feature vector, a text-video feature vector, and a text-audio feature vector.
  • the text-image feature vector includes text feature vector and image feature vector
  • the text-video feature vector includes text feature vector and video feature vector
  • the text-audio feature vector includes text feature vector and audio feature vector.
  • the text feature vector is a feature vector corresponding to all or part of the text in the article content.
  • the image feature vector can be directly extracted, or it can be extracted after processing the original data.
  • the text feature vector can be directly extracted, or it can be extracted after processing the original data.
  • the text feature vector is obtained by directly extracting the feature vector of words.
  • the feature vector of the word is processed to obtain the feature vector of the sentence.
  • the image feature vector is used to characterize the feature vector of the image in the article content.
  • Image feature vectors can be obtained by corresponding convolutional neural networks.
  • the video feature vector is used to represent the feature vector of the video in the article content.
  • the video in the article content includes multiple video frame images, and the video feature vector is obtained by extracting image features of the aforementioned multiple video frame images.
  • the audio feature vector is used to represent the feature vector of the audio in the article content.
  • audio is converted into text through speech recognition technology, and an audio feature vector is obtained by extracting text features of the aforementioned text.
  • the text feature vector can refer to the article word vector or the article sentence vector.
  • Article word vectors can be obtained from pre-trained neural network models.
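As a purely illustrative sketch (the vocabulary, embedding table, and averaging step below are hypothetical placeholders, not the embodiment's pretrained model), obtaining article word vectors can be pictured as a lookup into a pretrained embedding table, from which a sentence vector may then be pooled:

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in for a pretrained embedding table (toy vocabulary, dimension 4)
vocab = {"I": 0, "went": 1, "swimming": 2, "yesterday": 3, "<unk>": 4}
embeddings = rng.normal(size=(len(vocab), 4))

def article_word_vectors(words):
    """Look up a pretrained vector for each word (W1..Wn)."""
    return np.stack([embeddings[vocab.get(w, vocab["<unk>"])] for w in words])

W_vecs = article_word_vectors(["I", "went", "swimming", "yesterday"])
# an article sentence vector can then be pooled from the word vectors
S = W_vecs.mean(axis=0)
```

In the embodiment itself, the transformer encoders (rather than simple averaging) combine the word vectors into sentence vectors.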
  • Step 404 Input the text-multimedia feature vector into the text-multimedia sub-network to obtain the text-multimedia evaluation result of the article content.
  • the text-multimedia evaluation results include at least one of text-image evaluation results, text-video evaluation results, and text-audio evaluation results.
  • the following takes the text-image evaluation result as an example of the text-multimedia evaluation result:
  • This step includes the following sub-steps:
  • Text feature representation refers to a feature vector that incorporates image information on the basis of text feature vector.
  • Image feature representation refers to feature vectors that incorporate text information on the basis of image feature vectors.
  • the text feature representation and the image feature representation are combined to generate the text-multimedia evaluation result of the article content.
  • the fusion method here can be fusion through the corresponding neural network model.
  • FIG. 6 shows an exemplary structural diagram of a text-multimedia evaluation sub-network provided by an exemplary embodiment of the present application.
  • the text-multimedia evaluation sub-network can be divided into two parts: left and right.
  • the left part is introduced first.
  • the left part from top to bottom is the transformer encoder 61 (transformer refers to a neural network model based on the encoder + decoder architecture, and the attention mechanism is used in the transformer model, which is used in the field of natural language processing), transformer encoder 62 and attention fusion layer 63.
  • the input of the transformer encoder 61 is article word vectors W1 to Wn (n represents the maximum number of words in a sentence), and the output is article sentence vectors S1 to SL (L represents the maximum number of sentences in the article content).
  • the transformer encoder will combine the input n article word vectors to output L article sentence vectors.
  • the inputs of the transformer encoder 62 are article sentence vectors S1 to SL, and the outputs are text feature vectors H1 to Hm (m represents the maximum number of images in the article content), wherein the text feature vectors correspond to the images in the article content one-to-one.
  • the transformer encoder will combine the input L article sentence vectors to output m text feature vectors.
  • the input of the attention fusion layer 63 is the text feature vectors H1 to Hm, and the output is the text feature representation 64 of the fused image features.
  • the attention fusion layer 63 can fuse the corresponding image information on the basis of the text feature vector through the attention mechanism, so as to obtain the text feature representation 64 of the fused image feature.
  • the specific fusion process can refer to the related calculation methods in the sentence encoder + attention layer 12 and the text encoder + attention layer 13 in the VistaNet model shown in FIG. 1 .
  • the right part of the text-multimedia evaluation sub-network includes the extraction layer 64 and the attention fusion layer 65 .
  • the input of the extraction layer 64 is m images in the article content, and the output is image feature vectors M1 to Mm.
  • the extraction layer 64 is a pre-trained feature extraction network, which can extract the corresponding image feature vector from the image.
  • the input of the attention fusion layer 65 is the image feature vectors M1 to Mm, and the output is the image feature representation 66 of the fused text information.
  • the attention fusion layer 65 is similar to the attention fusion layer 63; the corresponding text information can be fused on the basis of the image feature vector through the attention mechanism to obtain the image feature representation 66 fused with the text features.
  • the specific fusion process can refer to the related calculation methods in the sentence encoder + attention layer 12 and the text encoder + attention layer 13 in the VistaNet model shown in FIG. 1 .
  • the text-multimedia evaluation sub-network also includes a fusion layer 67 and a multi-layer perceptron 68 (Multi-Layer Perceptron, MLP).
  • the input of the fusion layer 67 is the text feature representation 64 fused with image information and the image feature representation 66 fused with text information, and the output is the fused feature.
  • the input of the multi-layer perceptron 68 is the fused feature, and the output is the text-multimedia evaluation result.
  • the multi-layer perceptron 68 is used to identify and classify the fused features, extract useful information therein, and obtain text-multimedia evaluation results.
  • the text-multimedia feature vector also includes the text-video feature vector
  • an additional text-multimedia evaluation sub-network needs to be set in the article content evaluation model, and the text-multimedia evaluation sub-network is used to determine Text-video evaluation results.
  • the text-multimedia feature vector also includes the text-audio feature vector
  • an additional text-multimedia evaluation sub-network needs to be set in the article content evaluation model, and the text-multimedia evaluation sub-network is used to determine the text-audio evaluation result.
  • the text-multimedia evaluation results can evaluate the degree of correlation between the images and the text in the article content, and can solve problems such as image content that is not itself high-quality, redundant image information configured for the same text description, and images of the same subject repeatedly shot from multiple angles.
  • this enables the sub-network to obtain correct evaluation results and improves the recognition of high-quality content.
  • the text-multimedia evaluation sub-network can also learn features of text-multimedia inconsistency, which are especially prominent in article content such as emotional essays and inspirational ("chicken soup for the soul") articles.
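A highly simplified sketch of the data flow through the text-multimedia evaluation sub-network, under stated assumptions: random vectors stand in for the outputs of the transformer encoders and the extraction layer, a mean-query attention pool stands in for the trained attention fusion layers 63 and 65, and all parameters are placeholders rather than trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(vectors, query):
    """Attention fusion: weight each vector by similarity to a query, then sum."""
    logits = vectors @ query
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ vectors

# m = 3 images: per-image text feature vectors H1..Hm and image feature vectors M1..Mm
H = rng.normal(size=(3, 8))
M = rng.normal(size=(3, 8))

text_rep = attention_pool(H, query=M.mean(axis=0))   # text representation fused with image info
image_rep = attention_pool(M, query=H.mean(axis=0))  # image representation fused with text info

fused = np.concatenate([text_rep, image_rep])        # fusion layer 67
w = rng.normal(size=fused.shape[0])                  # placeholder perceptron weights
score = 1.0 / (1.0 + np.exp(-(w @ fused)))           # text-image evaluation result
```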
  • Step 405 Extract objective prior features in the feature information of at least two dimensions.
  • the objective prior features include at least one of statistical features, linguistic features, image quality features, and account features.
  • the statistical features include at least one of page height, image area, and the number of words and paragraphs.
  • the linguistic features include at least one of lexical diversity, syntactic diversity, rhetorical devices, and poetic citations.
  • the image quality feature includes at least one of image sharpness, image channel number, image size, and image quantity.
  • the account characteristics include at least one of account level, account verticality, and consumption data such as account favorites and likes.
  • Step 406 Input the objective prior feature into the objective prior feature sub-network to obtain the objective prior evaluation result of the article content.
  • FIG. 7 shows an exemplary structural diagram of an objective prior feature sub-network provided by an exemplary embodiment of the present application.
  • the objective prior feature sub-network is divided into an embedding layer 71 , a feature intersection layer 72 and a multilayer perceptron 74 from top to bottom.
  • the input of the embedding layer 71 is at least one of statistical features, linguistic features, image quality features, and account features, and the output is continuous feature 1 to feature x (x is a positive integer).
  • the statistical features, linguistic features, image quality features, and account features are converted into features that can be represented by continuous vectors, which also reduces the dimension of the input features.
  • the input of the feature intersection layer 72 is the above-mentioned feature 1 to feature x, and the output is the total feature 73 after the intersection.
  • the feature intersection layer 72 multiplies the input feature 1 to feature x two by two, assigns weights, and performs a summation operation to obtain the corresponding total feature 73.
  • the total feature 73 represents, as a whole, the input statistical features, linguistic features, image quality features, and account features.
  • the input of the multilayer perceptron 74 is the total feature 73, and the output is the objective prior evaluation result.
  • the multi-layer perceptron 74 is used to identify and classify the total feature 73, extract useful information therein, and obtain an objective prior feature evaluation result.
  • implicit objective experience results can be obtained through objective prior evaluation results, for example, the influence of account authority on the content of the article, the influence of the use of rhetorical devices in the article on the content of the article, etc. These effects are hard to notice, but real.
  • the objective a priori evaluation results can visualize these implicit objective experience results, and it is convenient to obtain reasonable evaluation results.
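The pairwise crossing performed by the feature intersection layer 72 can be sketched as follows; the embedded feature values, the crossing weights, and the one-unit stand-in for multilayer perceptron 74 are illustrative placeholders only:

```python
import itertools
import math

def feature_cross(features, cross_weights):
    """Multiply embedded feature values two by two and take a weighted sum."""
    total = 0.0
    for (i, a), (j, b) in itertools.combinations(enumerate(features), 2):
        total += cross_weights[(i, j)] * a * b
    return total

def tiny_mlp(x, w1=1.5, b1=-0.2, w2=2.0, b2=0.0):
    """One-hidden-unit network standing in for multilayer perceptron 74."""
    hidden = max(0.0, w1 * x + b1)                       # ReLU
    return 1.0 / (1.0 + math.exp(-(w2 * hidden + b2)))   # sigmoid score

# feature 1..feature x after the embedding layer (placeholder continuous values
# for statistical, linguistic, image-quality, and account features)
features = [0.4, 0.9, 0.3, 0.7]
cross_weights = {pair: 0.5 for pair in itertools.combinations(range(4), 2)}
total_feature = feature_cross(features, cross_weights)   # total feature 73
score = tiny_mlp(total_feature)                          # objective prior result
```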
  • Step 407 Extract article word vectors in the feature information of at least two dimensions.
  • the article word vectors in the feature information of at least two dimensions are extracted from the pre-trained neural network model.
  • Step 408 Input the article word vector into the text sub-network to obtain the text evaluation result of the article content.
  • FIG. 8 shows an exemplary structural diagram of a text sub-network according to an exemplary embodiment of the present application.
  • the text sub-network includes a transformer encoder 81 , a transformer encoder 82 and a multilayer perceptron layer 84 .
  • the inputs of the transformer encoder 81 are article word vectors W1 to Wn (n represents the maximum number of words in a sentence), and the outputs are article sentence vectors S1 to SL (L represents the maximum number of sentences in the article content).
  • the transformer encoder will combine the input n article word vectors to output L article sentence vectors.
  • the inputs of the transformer encoder 82 are article sentence vectors S1 to SL, and the output is a text feature vector 83, wherein the text feature vector H represents a feature vector corresponding to all the characters in the article content.
  • the transformer encoder will combine the input L article sentence vectors to output a text feature vector 83 .
  • the input of the multilayer perceptron layer 84 is the text feature vector 83, and the output is the text evaluation result.
  • the multi-layer perceptron layer 84 is used to identify and classify the input text feature vector, extract useful information therein, and obtain text evaluation results.
  • the text evaluation result is a specific evaluation of the text. Since the input features are highly related to the text, an accurate evaluation of the text in the article content can be obtained, making the final evaluation result more accurate in terms of text.
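A toy sketch of the hierarchical structure of the text sub-network; mean pooling stands in for transformer encoders 81 and 82, and the perceptron parameters are random placeholders rather than trained weights:

```python
import numpy as np

rng = np.random.default_rng(2)

# word vectors for L = 2 sentences with n = 4 words each (placeholder inputs)
words = rng.normal(size=(2, 4, 8))

# mean pooling stands in for transformer encoders 81 and 82
sentences = words.mean(axis=1)       # article sentence vectors S1..SL
text_vec = sentences.mean(axis=0)    # text feature vector 83 for the whole article

w, b = rng.normal(size=8), 0.1       # placeholder perceptron parameters
text_score = 1.0 / (1.0 + np.exp(-(w @ text_vec + b)))  # text evaluation result
```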
  • Step 409 Extract text-multimedia feature vectors in the feature information of at least two dimensions.
  • the text-multimedia feature vector includes at least one of a text-image feature vector, a text-video feature vector, and a text-audio feature vector.
  • the text-image feature vector includes text feature vector and image feature vector
  • the text-video feature vector includes text feature vector and video feature vector
  • the text-audio feature vector includes text feature vector and audio feature vector.
  • Step 410 Input the text-multimedia feature vector into the typesetting sub-network to obtain the typesetting evaluation result of the article content.
  • FIG. 9 shows an exemplary structural diagram of a typesetting subnet shown in an exemplary embodiment of the present application.
  • the typesetting sub-network includes: long short-term memory neural network 91, attention fusion layer 92, CNN 93, and multi-layer perceptron layer 95.
  • the long short-term memory neural network 91 and the attention fusion layer 92 work together.
  • the input of the long short-term memory neural network 91 is the staggered image feature vectors and text feature vectors V1 to VL.
  • the input of the attention fusion layer 92 is the output of the long short-term memory neural network 91, and its output is the typesetting feature of the text vectors fused with image information.
  • the input of CNN 93 is the staggered image feature vectors and text feature vectors V1 to VL, and the output is the typesetting feature of the image vectors fused with text information.
  • the output of the attention fusion layer 92 and the output of the CNN 93 constitute typesetting features 94 .
  • the input of the multilayer perceptron layer 95 is the typesetting feature 94, and the output is the typesetting evaluation result.
  • the multi-layer perceptron layer 95 is used for identifying and classifying the input typesetting features, extracting useful information therein, and obtaining typesetting evaluation results.
  • the typesetting evaluation result can make a corresponding evaluation on the typesetting layout of the article content, because the typesetting layout of the article content is also an implicit objective experience result.
  • the impact of typography on the content of the article is hard to notice, but it is real.
  • Typesetting evaluation results can visualize these implicit objective experience results, so that reasonable evaluation results can be obtained.
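A minimal sketch of the typesetting sub-network's two branches under stated assumptions: a sequence average stands in for long short-term memory neural network 91 plus attention fusion layer 92, a width-2 sliding-window average with max pooling stands in for CNN 93, and all vectors and weights are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

# staggered image / text feature vectors V1..VL (L = 4, dimension 6)
V = rng.normal(size=(4, 6))

# sequence average stands in for LSTM 91 + attention fusion layer 92
seq_feature = V.mean(axis=0)

# width-2 sliding-window average plus max pooling stands in for CNN 93
conv_feature = ((V[:-1] + V[1:]) / 2).max(axis=0)

typeset_feature = np.concatenate([seq_feature, conv_feature])  # typesetting feature 94
w = rng.normal(size=typeset_feature.shape[0])                  # placeholder perceptron weights
layout_score = 1.0 / (1.0 + np.exp(-(w @ typeset_feature)))    # typesetting evaluation result
```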
  • Step 411 Assign corresponding weights to the evaluation results of at least two dimensions through the article content evaluation model and the attention mechanism.
  • the weight value can be adjusted according to actual needs.
  • the weight values are determined by a pretrained neural network.
  • Step 412 Perform a weighted average of the evaluation results of at least two dimensions through the article content evaluation model and the weight value to obtain the content evaluation result of the article content.
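The attention-weighted fusion in steps 411 and 412 can be sketched as a softmax over per-dimension relevance scores followed by a weighted average; the relevance values below are hand-picked placeholders, not learned attention parameters:

```python
import math

def fuse_evaluations(scores, relevance):
    """Weighted average of per-dimension evaluation results.

    scores:    evaluation results per dimension, e.g. {"text-image": 0.9, ...}
    relevance: per-dimension logits (placeholders here; learned in practice)
    """
    # softmax over the relevance logits -> attention weights summing to 1
    exps = {k: math.exp(v) for k, v in relevance.items()}
    total = sum(exps.values())
    weights = {k: v / total for k, v in exps.items()}
    # weighted average = final content evaluation result
    return sum(weights[k] * scores[k] for k in scores)

scores = {"text-image": 0.9, "text": 0.7, "objective-prior": 0.6, "typesetting": 0.8}
relevance = {"text-image": 1.0, "text": 0.5, "objective-prior": 0.2, "typesetting": 0.3}
result = fuse_evaluations(scores, relevance)
```

In the embodiment, the attention fusion layer learns these weights rather than taking them as fixed inputs.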
  • FIG. 10 shows a complete article content evaluation model of this embodiment.
  • this embodiment covers the factors affecting the evaluation of the article content by integrating the evaluation results of multiple dimensions of the article content. Even in complex scenarios, the article content can be reasonably evaluated, and its high-quality parts and low-quality parts can be identified, making the obtained evaluation results more in line with the actual situation. Moreover, four different sub-networks are used to obtain the corresponding evaluation results, and the evaluation results to be fused can be selected according to actual needs.
  • FIG. 11 shows a schematic flowchart of a text-multimedia multi-sub-network training method provided by an exemplary embodiment of the present application.
  • the method can be executed by the terminal 220 or the server 240 shown in FIG. 2 or other computer equipment, and the method includes the following steps:
  • Step 1101 Obtain a text-multimedia training set, where the text-multimedia training set includes sample articles and real text-multimedia evaluation results corresponding to the sample articles.
  • the sample articles may be obtained in various manners, for example, downloading sample articles from the network, receiving sample articles sent by other terminals, obtaining sample articles from local storage, or obtaining sample articles input in real time. This application does not limit this.
  • the real text-multimedia evaluation results are the evaluation results made by relevant technicians or reading users on the text-multimedia relations in the sample articles.
  • For the specific process of step 1102 to step 1104 in this embodiment, reference may be made to step 401 to step 404.
  • Step 1102 Extract feature information of sample articles in the sample articles.
  • the sample article feature information includes sample text information and sample multimedia information
  • the sample multimedia information includes at least one of sample image information, sample video information, and sample audio information.
  • sample article feature information includes sample image information and sample text information
  • sample image information and sample text information are directly extracted from the sample article.
  • when sample video information exists in the sample article feature information, the sample video information is converted into multiple video frame images, image information is obtained according to the multiple video frame images, and the obtained image information is used as the sample image information.
  • when sample audio information exists in the sample article feature information, the sample audio information is converted into text information, and the text information is used as the sample text information.
  • Step 1103 Extract the sample text-multimedia feature vector according to the feature information of the sample article.
  • the sample text-multimedia feature vector includes at least one of a sample text-image feature vector, a sample text-video feature vector, and a sample text-audio feature vector.
  • the sample text-image feature vector includes the sample text feature vector and the sample image feature vector
  • the sample text-video feature vector includes the sample text feature vector and the sample video feature vector
  • the sample text-audio feature vector includes the sample text feature vector and the sample audio feature vector.
  • a sample image feature vector is extracted from the sample image information in the sample article feature information through an image feature extraction network.
  • a sample text feature vector is extracted from the sample text information in the sample article feature information through a text feature extraction network.
  • For details, reference may be made to step 403 in the embodiment shown in FIG. 4, which will not be repeated here.
  • Step 1104 Input the sample text-multimedia feature vector into the text-multimedia evaluation sub-network to obtain a predicted text-multimedia evaluation result.
  • the predicted text-multimedia evaluation result is obtained by fusing the sample image feature vector and the sample text feature vector through a text-multimedia evaluation sub-network.
  • Step 1105 Train the text-multimedia evaluation sub-network according to the error loss between the predicted text-multimedia evaluation result and the real text-multimedia evaluation result.
  • the network parameters in the text-multimedia evaluation sub-network are modified through an error back-propagation algorithm.
  • Thresholds can be determined by the technician.
  • the training of the text-multimedia evaluation sub-network is completed.
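The training loop in steps 1101 to 1105 can be sketched with a toy one-parameter "sub-network" trained by gradient descent on the error loss between predicted and real evaluation results, stopping once the loss falls below a technician-chosen threshold; the data, learning rate, and threshold are illustrative placeholders:

```python
def train_subnetwork(samples, labels, lr=0.1, threshold=1e-3, max_steps=5000):
    """Toy one-parameter model trained by gradient descent.

    Stops when the mean squared error between predicted and real
    evaluation results falls below the chosen threshold.
    """
    w = 0.0
    loss = float("inf")
    for _ in range(max_steps):
        preds = [w * x for x in samples]
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(samples)
        if loss < threshold:
            break
        # error back-propagation step (gradient of the squared-error loss)
        grad = sum(2 * (p - y) * x for p, x, y in zip(preds, samples, labels)) / len(samples)
        w -= lr * grad
    return w, loss

# hypothetical feature values with real evaluation results y = 0.8 * x
w, loss = train_subnetwork([1.0, 2.0, 3.0], [0.8, 1.6, 2.4])
```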
  • this embodiment provides a training method for a text-multimedia evaluation sub-network, which can quickly construct a text-multimedia evaluation sub-network, and the obtained text-multimedia evaluation sub-network can accurately obtain the text-multimedia evaluation result of the article content.
  • FIG. 12 shows a schematic flowchart of the objective prior feature sub-network training method provided by an exemplary embodiment of the present application.
  • the method can be executed by the terminal 220 or the server 240 shown in FIG. 2 or other computer equipment, and the method includes the following steps:
  • Step 1201 Obtain an objective prior training set, where the objective prior training set includes sample articles and real objective prior evaluation results corresponding to the sample articles.
  • the real objective prior evaluation results are the evaluation results made by relevant technicians or reading users on the objective prior features in the sample articles.
  • For the specific process of step 1202 to step 1204 in this embodiment, reference may be made to step 401 to step 402 and step 405 to step 406.
  • Step 1202 Extract feature information of sample articles in the sample articles.
  • the sample article feature information includes sample text information and sample multimedia information
  • the sample multimedia information includes at least one of sample image information, sample video information, and sample audio information.
  • Step 1203 Obtain sample objective priori features of the sample article according to the sample article feature information.
  • sample objective prior features include at least one of sample statistical features, sample linguistic features, sample image quality features, and sample account features.
  • the statistical features of the samples are obtained according to the sample text information and the sample image features in the sample article feature information.
  • the sample linguistic feature is obtained according to the sample text information in the sample article feature information.
  • the sample image quality feature is obtained according to the sample image information in the sample article feature information.
  • Step 1204 Input the objective priori feature of the sample into the objective priori feature sub-network, and obtain the prediction objective priori evaluation result.
  • the predicted objective prior evaluation result is used to predict the implicit objective experience results in the sample articles.
  • Step 1205 According to the error loss between the predicted objective prior evaluation result and the real objective prior evaluation result, the objective prior feature sub-network is trained.
  • the network parameters in the objective prior feature sub-network are corrected through an error back-propagation algorithm.
  • Thresholds can be determined by the technician.
  • the training of the objective prior feature sub-network is completed.
  • this embodiment provides a training method for an objective prior feature sub-network, which can quickly construct an objective prior feature sub-network, and the obtained objective prior feature sub-network can accurately obtain the objective prior evaluation result of the article content.
  • FIG. 13 shows a schematic flowchart of a text sub-network training method provided by an exemplary embodiment of the present application.
  • the method can be executed by the terminal 220 or the server 240 shown in FIG. 2 or other computer equipment, and the method includes the following steps:
  • Step 1301 Obtain a text training set, where the text training set includes sample articles and real text evaluation results corresponding to the sample articles.
  • the real text evaluation results are the evaluation results made by relevant technicians or reading users on the text in the sample articles.
  • For the specific process of step 1302 to step 1304 in this embodiment, reference may be made to step 401 to step 402 and step 407 to step 408.
  • Step 1302 Extract sample text information in the sample article.
  • the sample text information refers to the text in the sample article, or the sample text information refers to the text in the image in the sample article.
  • Step 1303 Extract the word vector of the sample article according to the sample text information.
  • the word vector extraction network is called to process the sample text information, and output the word vector of the sample article.
  • Step 1304 Input the word vector of the sample article into the text sub-network to obtain the predicted text evaluation result.
  • the predicted text evaluation results are used to predict the degree of influence of the text of the sample article on the content of the article.
  • Step 1305 Train the text sub-network according to the error loss between the predicted text evaluation result and the real text evaluation result.
  • the network parameters in the text sub-network are corrected through an error back-propagation algorithm.
  • Thresholds can be determined by the technician.
  • the training of the text sub-network is completed.
  • this embodiment provides a training method for a text sub-network, which can quickly construct a text sub-network, and the obtained text sub-network can accurately obtain the text evaluation result of the article content.
  • FIG. 14 shows a schematic flowchart of a typesetting sub-network training method provided by an exemplary embodiment of the present application.
  • the method can be executed by the terminal 220 or the server 240 shown in FIG. 2, and the method includes the following steps:
  • Step 1401 Obtain a typesetting training set, where the typesetting training set includes sample articles and real typesetting evaluation results corresponding to the sample articles.
  • the real typesetting evaluation results are the evaluation results made by relevant technical personnel or reading users on the typesetting in the sample articles.
  • For the specific process of step 1402 to step 1404 in this embodiment, reference may be made to step 401 to step 402 and step 409 to step 410.
  • Step 1402 Extract feature information of sample articles in the sample articles.
  • the sample article feature information includes sample text information and sample multimedia information
  • the sample multimedia information includes at least one of sample image information, sample video information, and sample audio information.
  • Step 1403 Extract the sample text-multimedia feature vector according to the feature information of the sample article.
  • the sample text-multimedia feature vector includes at least one of a sample image feature vector and a sample text feature vector.
  • an image feature extraction network is invoked, data processing is performed on sample image information, and a sample image feature vector is output.
  • a text feature extraction network is invoked, data processing is performed on the sample text information, and a feature vector of the sample text is output.
  • Step 1404 Input the sample text-multimedia feature vector into the typesetting sub-network to obtain the prediction typesetting evaluation result.
  • the prediction typesetting evaluation results are used to predict the degree of influence of the typesetting of the sample articles on the content of the article.
  • Step 1405 Train the typesetting sub-network according to the error loss between the predicted typesetting evaluation result and the actual typesetting evaluation result.
  • the network parameters in the typesetting sub-network are corrected through an error back-propagation algorithm.
  • When the error loss falls below a threshold, which can be determined by the technician, the training of the typesetting sub-network is completed.
  • this embodiment provides a typesetting sub-network training method, which can quickly construct the typesetting sub-network, and the obtained typesetting sub-network can accurately obtain the typesetting evaluation result of the article content.
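Steps 1401 to 1405 follow the same pattern, with the text-multimedia feature vector as input and the typesetting evaluation result as target. A minimal sketch, assuming a linear layer as the stand-in typesetting sub-network, a squared-error loss, and an illustrative stopping threshold:

```python
import numpy as np

def train_typesetting_subnetwork(feat_vecs, real_scores, lr=0.1,
                                 loss_threshold=1e-3, max_steps=5000):
    """Sketch of steps 1401-1405: a linear layer trained on sample
    text-multimedia feature vectors until the squared-error loss between
    predicted and real typesetting evaluation results drops below a
    technician-chosen threshold."""
    w = np.zeros(feat_vecs.shape[1])
    for step in range(max_steps):
        preds = feat_vecs @ w                     # predicted typesetting evaluation
        loss = np.mean((preds - real_scores) ** 2)
        if loss < loss_threshold:                 # threshold met: training completed
            return w, loss, step
        grad = 2 * feat_vecs.T @ (preds - real_scores) / len(real_scores)
        w -= lr * grad                            # error back-propagation correction
    return w, loss, max_steps
```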
  • FIG. 15 shows an exemplary service architecture diagram shown by an exemplary embodiment of the present application.
  • the architecture diagram is divided into two parts: a low-quality filtering module 1501 and a high-quality identification module 1502 .
  • the low-quality filtering module 1501 is used to filter the low-quality content and the next-low-quality content in the article content.
  • the low-quality content includes, but is not limited to, at least one of vulgar content, rumors, clickbait titles, and advertising marketing.
  • Vulgar content means article content that is not conducive to social progress or to users' physical and mental development. Rumors refer to article content that does not conform to reality. A clickbait title means that the article content does not match the article title.
  • Advertising marketing refers to article content that promotes and markets a product.
  • Sub-low-quality content includes, but is not limited to, at least one of non-informative content, formulaic content, gossip, political chatter, spliced content, content with negative impact, drivel, and soft advertising articles.
  • the low-quality filtering module 1501 is implemented using a corresponding low-quality filtering neural network.
  • the high-quality identification module 1502 is used to extract high-quality parts in the article content.
  • the high-quality identification module includes: a feature extraction layer 1505 , a feature fusion layer 1503 , a multi-objective feedback layer 1504 , and a logical decision layer 1506 .
  • the feature extraction layer 1505 extracts corresponding feature vectors from three aspects: graphic and text multi-dimensionality, graphic and text atomic capability, and graphic and text nested layout.
  • the input of the feature extraction layer is the article content, and the output is the extracted feature vector.
  • the multi-dimensional image and text is used to extract the feature vector corresponding to the image in the article content, the feature vector corresponding to the text, and the feature vector corresponding to the correlation between the image and the text.
  • the image-text atomic capability extracts corresponding feature vectors from four aspects: linguistics, statistics, account, and article style. The linguistic aspect includes: lexical diversity, syntactic diversity, rhetorical devices, and poetry quotations.
  • The statistical aspect includes: page height, picture area, word count, paragraph count, and average picture clarity and aesthetic quality.
  • the account aspect includes: account level, account verticality (account verticality refers to the degree of professionalism of the content published by the account in a specific field), and consumption data such as the account's collections and likes.
  • the style of the article includes: practicality and professionalism.
  • the input of the feature fusion layer 1503 is the feature vectors extracted from the article content, and the output is the a priori high-quality content of the article content (a priori high-quality content refers to the high-quality part of the article content predicted without user feedback).
  • the feature fusion layer fuses the input feature vectors, and obtains the prior high-quality content according to the fused feature vectors.
  • the input of the multi-objective feedback layer 1504 is the fused feature vector and the feedback information of multiple users, and the output is the posterior high-quality content (posterior high-quality content refers to the content predicted after the input feature vectors are corrected using the obtained user feedback information).
  • the multi-objective feedback layer can modify the input feature vector according to the user's feedback information, and fuse the modified feature vector to obtain the posterior high-quality content.
  • the multi-objective feedback layer can first fuse the input feature vectors to obtain high-quality content, and modify the high-quality content according to the user's feedback information to obtain a posteriori high-quality content.
  • the input of the logical decision layer 1506 is the prior high-quality content and the posterior high-quality content, and the output is the high-quality content of the article content.
  • the logic decision layer 1506 will comprehensively evaluate the prior high-quality content and the posterior high-quality content to obtain high-quality content of the article content.
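The comprehensive evaluation performed by the logical decision layer 1506 can be sketched as a weighted combination of the two predictions followed by a decision threshold. The weights, threshold, and per-section score dictionaries are illustrative assumptions; the patent does not specify the decision rule:

```python
def logical_decision(prior_scores, posterior_scores,
                     w_prior=0.4, w_post=0.6, threshold=0.5):
    """Sketch of the logical decision layer 1506: combine the a priori and
    posterior high-quality predictions per article section and decide which
    sections count as high-quality content."""
    decisions = {}
    for key in prior_scores:
        combined = (w_prior * prior_scores[key]
                    + w_post * posterior_scores.get(key, 0.0))
        decisions[key] = combined >= threshold    # True = high-quality content
    return decisions
```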
  • the complex scenario of judging high-quality image-text content is innovatively decomposed along multiple dimensions: image-text multi-dimensional features, accounts, article typesetting experience, and linguistic atomic features of the article (such as the lexical diversity used in the article, whether the article uses diverse syntax such as figurative and parallel sentences, whether the article quotes ancient poetry, etc.). An integrated model combining the image-text dimension sub-network, the objective prior feature sub-network, the text sub-network, and the typesetting sub-network is built, so as to complete the identification and judgment of a priori high-quality image-text content.
  • the model is mainly used in the task of judging the quality of graphic and text content in the content algorithm research and development center.
  • the model accuracy rate reaches 94%, and the coverage rate of high-quality graphic and text content reaches 16%.
  • a weighted recommendation experiment is carried out on the identified high-quality graphic and text content on the browser and express terminal side, so that high-quality content is preferentially recommended to users, and good business results have been achieved on the business side.
  • FIG. 16 shows a schematic structural diagram of an apparatus for evaluating content provided by an exemplary embodiment of the present application.
  • the apparatus can be implemented by software, hardware or a combination of the two to become all or a part of computer equipment, and the apparatus 1600 includes:
  • Extraction module 1601 is used to extract the article feature information in the article content, the article feature information includes text information, and the article feature information also includes at least one of image information, video information, and audio information;
  • An evaluation module 1602 configured to obtain evaluation results of at least two dimensions of the article content according to the article feature information
  • the evaluation fusion module 1603 is configured to fuse the evaluation results of the at least two dimensions to obtain the content evaluation result of the article content.
  • the extraction module 1601 is further configured to extract feature information of the article feature information in at least two dimensions.
  • the evaluation module 1602 is further configured to input the feature information of the at least two dimensions into an article content evaluation model to obtain evaluation results of the at least two dimensions of the article content, and the article content evaluation model is based on the A machine learning model that predicts the evaluation result of the corresponding dimension based on the feature information of at least two dimensions.
  • the extraction module 1601 is further configured to extract a text-multimedia feature vector in the feature information of the at least two dimensions, where the text-multimedia feature vector includes a text-image feature vector , at least one of a text-video feature vector and a text-audio feature vector.
  • the evaluation module 1602 is further configured to input the text-multimedia feature vector into the text-multimedia evaluation sub-network to obtain the text-multimedia evaluation result of the article content.
  • the evaluation module 1602 is further configured to generate, through the text-multimedia evaluation sub-network, a text feature representation fused with image information and an image feature representation fused with text information; and to fuse, through the text-multimedia evaluation sub-network, the text feature representation fused with image information and the image feature representation fused with text information to generate the text-multimedia evaluation result of the article content.
  • the extraction module 1601 is further configured to extract objective prior features in the feature information of the at least two dimensions, where the objective prior features include statistical features, linguistic features , at least one of image quality features, and account features.
  • the evaluation module 1602 is further configured to input the objective prior feature into the objective prior feature sub-network to obtain an objective prior evaluation result of the article content.
  • the extraction module 1601 is further configured to extract article word vectors in the feature information of the at least two dimensions.
  • the evaluation module 1602 is further configured to input the article word vector into the text sub-network to obtain the text evaluation result of the article content.
  • the extraction module 1601 is further configured to extract the text-multimedia feature vector in the feature information of the at least two dimensions.
  • the evaluation module 1602 is further configured to input the text-multimedia feature vector into the typesetting sub-network to obtain the typesetting evaluation result of the article content.
  • the evaluation fusion module 1603 is further configured to assign corresponding weights to the evaluation results of the at least two dimensions through the article content evaluation model and the attention mechanism, and to perform a weighted average on the evaluation results of the at least two dimensions through the article content evaluation model and the weight values to obtain the content evaluation result of the article content.
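The weighting and averaging performed by the evaluation fusion module 1603 can be sketched as follows. The softmax over attention logits is an assumption for how the attention mechanism produces normalized weights; in the patent the logits would be learned by the article content evaluation model:

```python
import math

def fuse_evaluations(scores, attn_logits):
    """Sketch of the evaluation fusion module 1603: an attention mechanism
    assigns a weight to each dimension's evaluation result (softmax over
    given logits), then a weighted average yields the content evaluation
    result."""
    exps = [math.exp(l) for l in attn_logits]
    total = sum(exps)
    weights = [e / total for e in exps]            # attention weights, sum to 1
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights
```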
  • the apparatus further includes: a training module 1604 .
  • the training module 1604 is used to obtain a text-multimedia training set, where the text-multimedia training set includes sample articles and real text-multimedia evaluation results corresponding to the sample articles; extract the sample article feature information in the sample articles; extract a sample text-multimedia feature vector according to the sample article feature information, where the sample text-multimedia feature vector includes a sample image feature vector and a sample text feature vector; input the sample text-multimedia feature vector into the text-multimedia sub-network to obtain a predicted text-multimedia evaluation result; and train the text-multimedia evaluation sub-network according to the error loss between the predicted text-multimedia evaluation result and the real text-multimedia evaluation result.
  • the training module 1604 is further configured to obtain an objective prior training set, where the objective prior training set includes sample articles and real objective prior evaluation results corresponding to the sample articles; extracting The sample evaluation information in the sample article; according to the sample evaluation information, obtain the sample objective priori features of the sample article; input the sample objective priori features into the objective priori feature sub-network to obtain the predicted objective priori Evaluation result; according to the error loss between the predicted objective prior evaluation result and the real objective prior evaluation result, the objective prior feature sub-network is trained.
  • the training module 1604 is further configured to obtain a text training set, where the text training set includes sample articles and real text evaluation results corresponding to the sample articles; extract the sample text information in the sample articles; extract the word vector of the sample article according to the sample text information; input the word vector of the sample article into the text sub-network to obtain a predicted text evaluation result; and train the text sub-network according to the error loss between the predicted text evaluation result and the real text evaluation result.
  • the training module 1604 is further configured to obtain a typesetting training set, where the typesetting training set includes sample articles and real typesetting evaluation results corresponding to the sample articles; extract the sample article feature information in the sample articles; extract the sample text-multimedia feature vector according to the sample article feature information; input the sample text-multimedia feature vector into the typesetting sub-network to obtain a predicted typesetting evaluation result; and train the typesetting sub-network according to the error loss between the predicted typesetting evaluation result and the real typesetting evaluation result.
  • this embodiment makes corresponding evaluations on the article content from multiple dimensions by integrating the evaluation results of multiple dimensions of the article content, so that the final evaluation result is more in line with the actual content of the article.
  • the final evaluation result integrates the evaluation of multiple dimensions, it can effectively reduce the erroneous evaluation results and improve the robustness of the whole scheme.
  • Fig. 17 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the computer device 1700 includes a Central Processing Unit (CPU) 1701, a system memory 1704 including a Random Access Memory (RAM) 1702 and a Read-Only Memory (ROM) 1703, and A system bus 1705 that connects the system memory 1704 and the central processing unit 1701 .
  • the computer device 1700 also includes a basic input/output system (I/O system) 1706 that facilitates the transfer of information between devices within the computer device, and a mass storage device 1707 for storing an operating system 1713, application programs 1714, and other program modules 1715.
  • the basic input/output system 1706 includes a display 1708 for displaying information and input devices 1709 such as a mouse, keyboard, etc., for user input of information.
  • the display 1708 and the input device 1709 are both connected to the central processing unit 1701 through the input and output controller 1710 connected to the system bus 1705.
  • the basic input/output system 1706 may also include an input output controller 1710 for receiving and processing input from a number of other devices such as a keyboard, mouse, or electronic stylus.
  • input output controller 1710 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 1707 is connected to the central processing unit 1701 through a mass storage controller (not shown) connected to the system bus 1705 .
  • the mass storage device 1707 and its associated computer-readable media provide non-volatile storage for the computer device 1700. That is, the mass storage device 1707 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.
  • the computer device readable medium may include computer device storage media and communication media.
  • Computer device storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer device readable instructions, data structures, program modules or other data.
  • Computer device storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), CD-ROM, Digital Video Disc (DVD) or other optical storage, cassettes, magnetic tape, disk storage, or other magnetic storage devices.
  • the computer device 1700 may also operate via a network connection to a remote computer device on a network, such as the Internet. That is, the computer device 1700 can be connected to the network 1711 through the network interface unit 1712 connected to the system bus 1705, or the network interface unit 1712 can be used to connect to other types of networks or remote computer device systems (not shown).
  • the memory also includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 1701 implements all or part of the steps of the above-mentioned content evaluation method by executing the one or more programs.
  • a computer-readable storage medium stores at least one instruction, at least one piece of program, a code set, or an instruction set, which is loaded and executed by the processor to implement the content evaluation method provided by each of the above method embodiments.
  • the present application further provides a computer-readable storage medium, where at least one instruction, at least one piece of program, code set or instruction set is stored in the storage medium, the at least one instruction, the at least one piece of program, the code set or The instruction set is loaded and executed by the processor to implement the content evaluation method provided by the above method embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the content evaluation method described above.


Abstract

A content evaluation method, apparatus, device, and medium, relating to the field of artificial intelligence. The method includes: extracting article feature information from article content, the article feature information including text information and multimedia information, the multimedia information including at least one of image information, video information, and audio information (302); obtaining evaluation results of at least two dimensions of the article content according to the article feature information (304); and fusing the evaluation results of the at least two dimensions to obtain a content evaluation result of the article content (306). The present application can make a relatively comprehensive evaluation of article content.

Description

Content evaluation method, apparatus, device, and medium
This application claims priority to Chinese Patent Application No. 202110125965.3, entitled "Article Content Evaluation Method, Apparatus, Device, and Medium", filed on January 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a content evaluation method, apparatus, device, and medium.
Background
Article content evaluation means that a computer device performs quality judgment on the image and text content in an article, determines the proportions of high-quality content and low-quality content in the article, and judges whether the quality of the article content is acceptable according to the determined proportions.
In the related art, a computer device evaluates article content from the perspective of text information through supervised or unsupervised learning. For example, the article content is evaluated along statistical dimensions such as word count and lexical diversity, the proportions of high-quality and low-quality text information in the article content are obtained, an evaluation result concerning the text information is derived from these proportions, and that evaluation result is taken as the evaluation result of the entire article content.
However, the related art can only evaluate article content through text information, and in some situations cannot make a reasonable evaluation of the article content.
Summary
This application provides a content evaluation method, apparatus, device, and medium, which can evaluate article content from multiple dimensions and obtain a reasonable article evaluation result. The technical solutions are as follows:
According to one aspect of this application, a content evaluation method is provided, the method including:
extracting article feature information from the article content, the article feature information including text information and multimedia information, the multimedia information including at least one of image information, video information, and audio information;
obtaining evaluation results of at least two dimensions of the article content according to the article feature information; and
fusing the evaluation results of the at least two dimensions to obtain a content evaluation result of the article content.
According to another aspect of this application, a content evaluation apparatus is provided, the apparatus including:
an extraction module, configured to extract article feature information from the article content, the article feature information including at least two of image information, text information, video information, and audio information;
an evaluation module, configured to obtain evaluation results of at least two dimensions of the article content according to the article feature information; and
an evaluation fusion module, configured to fuse the evaluation results of the at least two dimensions to obtain a content evaluation result of the article content.
According to another aspect of this application, a computer device is provided, including a processor and a memory, where the memory stores at least one instruction, at least one piece of program, a code set, or an instruction set, which is loaded and executed by the processor to implement the content evaluation method described above.
According to another aspect of this application, a computer storage medium is provided, where the computer-readable storage medium stores at least one piece of program code, which is loaded and executed by a processor to implement the content evaluation method described above.
According to another aspect of this application, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the content evaluation method described above.
The beneficial effects brought by the technical solutions provided in the embodiments of this application include at least the following:
By fusing the evaluation results of multiple dimensions of the article content, corresponding evaluations are made on the article content from multiple dimensions, so that the final evaluation result better matches the actual content of the article. In addition, since the final evaluation result integrates evaluations from multiple dimensions, erroneous evaluation results can be effectively reduced and the robustness of the entire solution improved.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a VistaNet model provided by an exemplary embodiment of this application;
FIG. 2 is a schematic structural diagram of a computer system provided by an exemplary embodiment of this application;
FIG. 3 is a schematic flowchart of a content evaluation method provided by an exemplary embodiment of this application;
FIG. 4 is a schematic flowchart of a content evaluation method provided by an exemplary embodiment of this application;
FIG. 5 is an exemplary structural diagram of an article content evaluation model provided by an exemplary embodiment of this application;
FIG. 6 is an exemplary structural diagram of a text-multimedia evaluation sub-network provided by an exemplary embodiment of this application;
FIG. 7 is an exemplary structural diagram of an objective prior feature sub-network provided by an exemplary embodiment of this application;
FIG. 8 is an exemplary structural diagram of a text sub-network provided by an exemplary embodiment of this application;
FIG. 9 is an exemplary structural diagram of a typesetting sub-network provided by an exemplary embodiment of this application;
FIG. 10 is an exemplary complete structural diagram of an article content evaluation model provided by an exemplary embodiment of this application;
FIG. 11 is a schematic flowchart of a text-multimedia sub-network training method provided by an exemplary embodiment of this application;
FIG. 12 is a schematic flowchart of an objective prior feature sub-network training method provided by an exemplary embodiment of this application;
FIG. 13 is a schematic flowchart of a text sub-network training method provided by an exemplary embodiment of this application;
FIG. 14 is a schematic flowchart of a typesetting sub-network training method provided by an exemplary embodiment of this application;
FIG. 15 is an exemplary service architecture diagram provided by an exemplary embodiment of this application;
FIG. 16 is a schematic structural diagram of a content evaluation apparatus provided by an exemplary embodiment of this application;
FIG. 17 is a schematic structural diagram of a computer device provided by an exemplary embodiment of this application.
Detailed Description
First, the terms involved in the embodiments of this application are introduced:
Artificial Intelligence (AI): a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV): computer vision is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to identify, track, and measure targets, with further graphics processing so that the computer produces images more suitable for human observation or for transmission to instruments for inspection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Natural Language Processing (NLP): an important direction in computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. NLP technologies usually include text processing, semantic understanding, machine translation, question-answering robots, and knowledge graphs.
Machine Learning (ML): a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. It specializes in studying how computers simulate or implement human learning behaviors to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of AI. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of AI technology, AI has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, AI will be applied in more fields and play an increasingly important role.
A priori high-quality image-text content: starting from the article content itself, a reasonable evaluation system for article quality is constructed, thereby helping the recommendation side better understand and apply the image-text content delivered from the content center. To comprehensively evaluate article quality, modeling is performed separately along dimensions such as image-text multi-dimensional features, accounts, article typesetting experience, and linguistic atomic features of the article (such as the lexical diversity used in the article, whether the article uses diverse syntax such as figurative and parallel sentences, whether the article quotes ancient poetry, etc.), and an integrated a priori high-quality article identification method is finally constructed.
Image-text multi-dimensionality: refers to MultiModal Machine Learning (MMML), that is, the ability to process and understand information from multiple source dimensions through machine learning. The main research directions are multi-dimensional learning among semantics, images, and video. Multi-dimensional learning can be divided into five research directions: multi-dimensional representation learning, dimension conversion, alignment, multi-dimensional fusion, and co-learning. Single-dimension representation learning is responsible for representing information as numerical vectors that a computer can process, or further abstracting them into higher-level feature vectors, whereas multi-dimensional representation learning learns better feature representations by exploiting the complementarity among multiple dimensions and eliminating redundancy among them.
Linguistics: the scientific study of human language, involving the analysis of language form, language meaning, and context. In linguistics there is an important thematic division between the study of language structure (grammar) and the study of meaning (semantics and pragmatics). Grammar includes morphology (the formation and composition of words), syntax (the rules that determine how words combine into phrases or sentences), and phonology (the study of sound systems and abstract sound units). To comprehensively evaluate article quality, this application performs a comprehensive assessment of high-quality articles from feature dimensions such as the overall semantic information of the article, the semantic relations between sentences, the lexical diversity of the article, the diversity of rhetorical devices used (such as figurative and parallel sentences), and the article's quotation of ancient poetry and prose.
Integrated model: ensemble learning is a machine learning paradigm. In ensemble learning, multiple models are trained to solve the same problem and combined to obtain better results. A single learner is prone either to underfitting or to overfitting; to obtain a learner with good generalization performance, multiple individual learners can be trained and combined through a certain strategy, finally forming a strong learner. This method of integrating multiple individual learners is called ensemble learning. Ensemble learning builds multiple models on the data and integrates the modeling results of all models. The ensemble algorithm considers the modeling results of multiple estimators and aggregates them into a comprehensive result, thereby obtaining better regression or classification performance than a single model.
BERT (Bidirectional Encoder Representations from Transformers) model: BERT is a new language representation model. BERT is designed to pre-train deep bidirectional representations based on the left and right context across all layers. Therefore, the pre-trained BERT representation can be fine-tuned with just one additional output layer to create state-of-the-art models for many tasks (such as question answering and language inference) without extensive modifications to task-specific architectures. BERT is conceptually simple but empirically powerful; it refreshed the state of the art on 11 natural language processing (NLP) tasks, including pushing the GLUE (General Language Understanding Evaluation) benchmark to 80.4% (an absolute improvement of 7.6%), raising the accuracy on MultiNLI (a public natural language dataset) to 86.7% (an absolute improvement of 5.6%), and raising the score on the SQuAD v1.1 (a dataset) question-answering test to 93.2 (an absolute improvement of 1.5 points), which is 2.0 points higher than human performance.
HAN (Hierarchical Attention Network) model: the HAN model achieves good classification accuracy on long-text classification tasks. Its overall structure is as follows: the input word-vector sequence is passed through a word-level Bi-GRU (Bidirectional Gated Recurrent Unit), after which each word has a corresponding hidden vector h output by the Bi-GRU; attention weights are obtained by taking the dot product of a vector u_w with each h vector; the h sequence is then summed, weighted by the attention weights, to obtain the sentence summary vector s. Each sentence then passes through the same Bi-GRU structure plus attention to obtain the final document feature vector v, and the final text classification result is obtained from v through a fully connected layer followed by a classifier. In summary, the HAN model structure closely matches the human process of understanding from words to sentences to documents; it not only solves the problem of TextCNN (Text Convolutional Neural Networks) losing text structure information, but also has strong interpretability.
Attention mechanism: a problem-solving method proposed in imitation of human attention, which can quickly filter out high-value information from a large amount of information; it is usually used in encoder-decoder models. The attention mechanism helps the model assign different weights to each part of the input and extract more critical and important information, enabling the model to make more accurate judgments without incurring greater computation and storage overhead. For example, when an encoder-decoder model is used for translation, one or a few words of the input sentence usually correspond to one or a few words of the output sentence; giving every word in the sentence the same weight would be unreasonable, so different weight values are assigned to different words to distinguish the important parts of the sentence. Suppose the input sentence is "Today, Ming runs" and the output sentence is "今天，小明跑步"; the words "今天" (today), "小明" (Ming), and "跑步" (runs) can be extracted from the translated sentence. Clearly the three words have different importance: "今天" is less important than "小明" and "跑步", so the weight of "今天" can be set to 0.2 and the weights of "小明" and "跑步" both set to 0.4 to increase the importance of "小明" and "跑步".
VistaNet model: uses the attention mechanism to fuse image and text information, cleverly solving the problem of inconsistent vector spaces for data of different dimensions and enhancing the model's ability to perform sentiment analysis on reviews. The VistaNet model is divided into three layers from bottom to top: the word encoder + attention layer 11, the sentence encoder + attention layer 12, and the document encoder + attention layer 13. The structure of each layer is introduced below; please refer to FIG. 1:
1. Word encoder + attention layer 11
The input of this layer is the article word vector w of each word of each sentence in the article content (the maximum number of words is T). Optionally, the article word vectors can be obtained through a pre-trained neural network model. After the word vector w is input, hidden states in both directions of a bidirectional Recurrent Neural Network (RNN) are obtained and concatenated as the output of that time step (timestep). The attention mechanism is then used to compute the importance weight $\alpha$ of each time step; after $\alpha$ is normalized, the outputs of all time steps are weighted and summed to obtain the sentence vector representation $s_i$. The specific formulas are as follows:
$u_{i,t} = U^{\top}\tanh(W_w h_{i,t} + b_w)$;
$\alpha_{i,t} = \exp(u_{i,t}) / \sum_{t}\exp(u_{i,t})$;
$s_i = \sum_{t}\alpha_{i,t} h_{i,t}$;
where $u_{i,t}$ denotes the weight value of the t-th word in the i-th sentence; $U$ is a randomly initialized value; $\tanh()$ is the hyperbolic tangent function; $W_w$ is the word embedding matrix corresponding to the article word vector w; $h_{i,t}$ denotes the hidden state of the t-th word in the i-th sentence; $b_w$ is a constant; $\alpha_{i,t}$ denotes the normalized weight of the t-th word in the i-th sentence; $\exp()$ is the exponential function with base e; and $s_i$ denotes the vector representation of the sentence, that is, the article sentence vector.
2. Sentence encoder + attention layer 12
This layer takes as input the article sentence vectors $s_i$ of the sentences in the article content (at most L sentences) and outputs the text representation $d_j$ for the j-th image, that is, the text feature vector for the j-th image. The input article sentence vectors pass through a bidirectional RNN to obtain hidden states in both directions, which are concatenated into the hidden state $h_i$ of each sentence. Meanwhile, the image feature vector $m_j$ of the j-th image in the article content is extracted. Exemplarily, one method of obtaining the image feature vector is as follows: the image $a_j$ ($1 \le j \le M$, where M is the number of images in the article content) is input into a CNN to extract image features; the features are input into a fully connected layer, which assigns weights to the features, and the fully connected layer outputs the image feature vector $m_j$. The attention mechanism is applied on $h_i$ through $m_j$ to obtain the importance weight $\beta$ corresponding to each $h_i$; according to the obtained $\beta$, the $h_i$ are weighted-averaged for the j-th image to obtain the text representation $d_j$ for the j-th image. The specific formulas are as follows:
$p_j = \tanh(W_p m_j + b_p)$;
$q_i = \tanh(W_q h_i + b_q)$;
$v_{j,i} = V^{\top}(p_j \odot q_i + q_i)$;
$\beta_{j,i} = \exp(v_{j,i}) / \sum_{i}\exp(v_{j,i})$;
$d_j = \sum_{i}\beta_{j,i} h_i$;
where $p_j$ denotes the contribution of the j-th image to the weight value; $W_p$ is the embedding matrix of the j-th image; $b_p$ is a constant; $q_i$ denotes the contribution of the i-th sentence to the weight value; $W_q$ is the embedding matrix of the i-th sentence; $b_q$ is a constant; $V$ is a randomly initialized value; $v_{j,i}$ denotes the weight of the i-th sentence corresponding to the j-th image; $\odot$ denotes element-wise multiplication; $\beta_{j,i}$ denotes the normalized weight of the i-th sentence corresponding to the j-th image; $h_i$ denotes the i-th hidden state; and $m_j$ denotes the image feature vector of the j-th image.
3. Document encoder + attention layer 13
The input of the last layer is the multiple text feature vectors $d_j$ generated for the different images. The attention mechanism is used to compute the corresponding weights, and a weighted sum produces the text feature vector representing the entire article content. The article content is then evaluated according to the final text feature vector. The specific formulas are as follows:
$k_j = K^{\top}\tanh(W_d d_j + b_d)$;
$\gamma_j = \exp(k_j) / \sum_{j}\exp(k_j)$;
$d = \sum_{j}\gamma_j d_j$;
where $k_j$ denotes the weight corresponding to the j-th image; $K$ is a randomly initialized value; $W_d$ is the embedding matrix corresponding to the text feature vector d; $b_d$ is a constant; $\gamma_j$ is the normalized weight corresponding to the j-th image; and $d$ denotes the evaluation result of the entire article content.
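The word-level attention of layer 11 can be sketched directly from the formulas above. This is a minimal illustration with assumed shapes (one sentence with T hidden states of dimension d); subtracting the maximum logit before the exponential is a standard numerical-stability step, not part of the formulas:

```python
import numpy as np

def word_attention(H, U, W_w, b_w):
    """Word-level attention of the word encoder + attention layer 11:
    u_{i,t} = U^T tanh(W_w h_{i,t} + b_w), softmax over t, and the sentence
    vector s_i as the attention-weighted sum of the hidden states.
    H: (T, d) hidden states; U: (d,); W_w: (d, d); b_w: (d,)."""
    u = np.tanh(H @ W_w.T + b_w) @ U              # u_{i,t}, shape (T,)
    alpha = np.exp(u - u.max())                   # stabilized exp
    alpha /= alpha.sum()                          # normalized weights alpha_{i,t}
    return alpha @ H                              # sentence vector s_i
```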
FIG. 2 shows a schematic structural diagram of a computer system provided by an exemplary embodiment of this application. The computer system 200 includes a terminal 220 and a server 240.
An application related to article content evaluation is installed on the terminal 220. The application may be an applet within an app (application), a dedicated application, or a web client. A user may receive the evaluation result of the article content on the terminal 220; alternatively, the user may send the article content to the server 240, which makes the corresponding evaluation and returns the evaluation result to the terminal 220 or another terminal. The terminal 220 is at least one of a smartphone, an in-vehicle computer, a tablet computer, an e-book reader, an MP3 player, an MP4 player, a laptop computer, and a desktop computer.
The terminal 220 is connected to the server 240 through a wireless or wired network.
The server 240 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server 240 provides background services for applications supporting article content evaluation. Optionally, the server 240 undertakes the primary computing work and the terminal 220 the secondary computing work; or the server 240 undertakes the secondary computing work and the terminal 220 the primary computing work; or the server 240 and the terminal 220 perform collaborative computing using a distributed computing architecture.
FIG. 3 shows a schematic flowchart of a content evaluation method provided by an exemplary embodiment of this application. The method may be executed by the terminal 220 or the server 240 shown in FIG. 2, and includes the following steps:
Step 302: Extract article feature information from the article content, where the article feature information includes text information and multimedia information, and the multimedia information is at least one of image information, video information, and audio information.
The article content may be obtained in various ways, for example, downloading the article content from the network, receiving article content sent by another terminal, obtaining the article content from local storage, or obtaining article content input in real time. This application does not limit this.
Since this application evaluates article content from multiple dimensions and the main body of an article is text, the article feature information mentioned in the embodiments of this application includes at least two kinds of information, the aforementioned at least two kinds including text information; besides text information, the article feature information further includes at least one of image information, video information, and audio information.
Text information refers to the text in the article content. For example, if the article content contains text C and text D, then text C and text D are the text information.
Image information refers to the images in the article content. For example, if the article content contains image A and image B, then image A and image B are the image information.
Video information refers to a video included in the article or related to the article. For example, in some cases the article contains a video link, and the video information refers to the video corresponding to that link. In one possible implementation, since a video consists of multiple video frame images, the video information can be converted into image information; all or part of the video information may be converted. For example, if a video consists of 30 video frame images, those 30 frames can be regarded as images in the article content, thereby converting the video information into image information.
Audio information refers to audio included in the article or related to the article. For example, if the article contains an audio link, the audio information refers to the audio corresponding to that link. In one possible implementation, the audio information is converted into text information through speech recognition technology, for example, determining the text content of the audio through speech recognition and taking the resulting text content as text information.
Step 304: Obtain evaluation results of at least two dimensions of the article content according to the article feature information.
Optionally, the evaluation results of at least two dimensions include at least two of a text-multimedia evaluation result, a text evaluation result, an objective prior evaluation result, and a typesetting evaluation result.
When the article feature information includes text information and image information, the evaluation results of at least two dimensions include the text-multimedia evaluation result, the text evaluation result, the objective prior evaluation result, and the typesetting evaluation result.
When the article feature information includes text information and video information, the evaluation results of at least two dimensions include the text-multimedia evaluation result, the text evaluation result, the objective prior evaluation result, and the typesetting evaluation result.
When the article feature information includes text information and audio information, the evaluation results of at least two dimensions include the text-audio evaluation result, the objective prior evaluation result, and the typesetting evaluation result.
The evaluation results may take various forms. Optionally, an evaluation result is the identification and determination of high-quality and low-quality content in the article content. Optionally, an evaluation result is a score for the article content.
The text-multimedia evaluation result indicates the correlation between images and text in the article content. Exemplarily, if the text part of the article content is "I went swimming yesterday" while the corresponding image is a picture of plum blossoms, the correlation between image and text is low, and the text and image here can be considered uncorrelated.
The text evaluation result evaluates the quality of the text in the article content and can identify the high-quality text in the article content, where high-quality text refers to text rich in the information it carries, that is, the amount of information a user can obtain by reading it. For example, given two passages with the same word count, if information A can be extracted from the first passage while information A and information B can be extracted from the second, then by comparison the quality of the second passage is higher than that of the first.
The objective prior evaluation result refers to the evaluation result that can be obtained from the article content or from information associated with the article content without considering the article content itself. For example, if the article content is published by account A, which has previously published many high-quality articles, then based on account A's historical performance, the objective prior evaluation here is raised.
The typesetting evaluation result evaluates the layout of the article content. Exemplarily, if text A is on page 5 of the article while image A corresponding to text A is on page 12, the layout of text A and image A is unreasonable and very inconvenient for users reading the article content, so the typesetting evaluation result here deteriorates.
In this embodiment, a method of obtaining the evaluation results of at least two dimensions through a neural network model is taken as an example:
1. Extract feature information of the article feature information in at least two dimensions.
The feature information of at least two dimensions refers to multi-dimensional information that can be extracted from the image information and text information, including but not limited to at least one of text-level information, image-level information, information corresponding to the combination of text and images, information corresponding to the publishing account of the article content, and typesetting information. Exemplarily, the feature information of at least two dimensions includes at least one of an image feature vector, article word vectors, article sentence vectors, a text feature vector (the feature vector corresponding to all or part of the text in the article content), statistical features, linguistic features, image quality features, and account features.
2. Input the feature information of the at least two dimensions into the article content evaluation model to obtain the evaluation results of at least two dimensions of the article content; the article content evaluation model is a machine learning model that predicts the evaluation result of the corresponding dimension according to the feature information of the at least two dimensions.
Step 306: Fuse the evaluation results of the at least two dimensions to obtain the content evaluation result of the article content.
The content evaluation result evaluates the article content from at least two dimensions. In this embodiment, the at least two dimensions refer to at least two of the above text-multimedia evaluation result, text evaluation result, objective prior evaluation result, and typesetting evaluation result.
Optionally, the evaluation results of the at least two dimensions are fused through a neural network.
Optionally, corresponding weights are assigned to the evaluation results of the at least two dimensions, and the corresponding content evaluation result is computed from the weights.
In summary, this embodiment fuses the evaluation results of multiple dimensions of the article content and makes corresponding evaluations on the article content from multiple dimensions, so that the final evaluation result better matches the actual content of the article. In addition, since the final evaluation result integrates evaluations of multiple dimensions, erroneous evaluation results can be effectively reduced and the robustness of the entire solution improved.
图4示出了本申请一个示例性实施例提供的内容的评价方法的流程示意图。该方法可由图2所示的终端220或服务器240执行,该方法包括如下步骤:
步骤401:提取文章内容中的文章特征信息。
文章特征信息包括文本信息和多媒体信息,多媒体信息是图像信息、视频信息、音频信息中的至少一种。
步骤402:提取文章特征信息在至少两个维度的特征信息。
在本实施例中，通过文章内容评价模型对至少两个维度的特征信息进行处理。图5示出了本申请一个示例性实施例提供的文章内容评价模型的示例性的结构图。示例性的，文章内容评价模型包括但不限于四个子网络和注意力融合层55，四个子网络分别为文本-多媒体评价子网络51、客观先验特征子网络52、文本子网络53和排版子网络54。其中，文章内容评价模型由上述四个子网络中的至少两个构成。上述子网络的输入为相应维度的特征信息，输出为各自的评价结果。需要说明的是，文章内容评价模型中文本-多媒体评价子网络51的数量与多媒体信息中包括的信息种类数量相同。比如，当多媒体信息包括图像信息和视频信息时，文章内容评价模型包括两个文本-多媒体评价子网络，其中一个文本-多媒体评价子网络用来确定文本-图像评价结果，另一个用来确定文本-视频评价结果。
注意力融合层55的输入为至少两个维度的评价结果,输出为内容评价结果56。其中,注意力融合层55用于对输入的至少两个维度的评价结果赋予相应的权值,并进行相应的加权计算,用以获得内容评价结果56。
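注意力融合层“赋予权值并进行加权计算”的过程可以用 softmax 权重加权求和来示意（实际权重由网络参数学习得到，此处直接由得分计算，仅为假设的简化）：

```python
import math

def attention_fuse(scores):
    # 为各维度的评价结果计算 softmax 权重（归一化、和为 1），
    # 再做加权求和，得到融合后的内容评价结果
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = sum(w * s for w, s in zip(weights, scores))
    return fused, weights

# 示例：文本-多媒体、文本、客观先验、排版四个维度的评价得分（假设值）
fused, weights = attention_fuse([0.9, 0.6, 0.8, 0.7])
```

融合结果介于各维度得分的最小值与最大值之间，即加权平均不会超出输入得分的范围。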
步骤403:提取至少两个维度的特征信息中的文本-多媒体特征向量。
文本-多媒体特征向量包括文本-图像特征向量、文本-视频特征向量和文本-音频特征向量中的至少一种。其中,文本-图像特征向量包括文本特征向量和图像特征向量,文本-视频特征向量包括文本特征向量和视频特征向量,文本-音频特征向量包括文本特征向量和音频特征向量。
可选地,文本特征向量为文章内容中全部或部分的文字对应的特征向量。
图像特征向量可以是直接提取得到的,也可以是处理原数据以后提取得到的。
文本特征向量可以是直接提取得到的,也可以是处理原数据以后提取得到的。示例性的,文本特征向量是直接提取词语的特征向量得到的。示例性的,文本特征向量是在提取词语的特征向量后,对词语的特征向量进行处理,得到句子的特征向量。
图像特征向量用于表征文章内容中图像的特征向量。图像特征向量可以通过相应的卷积神经网络获得。
视频特征向量用于表示文章内容中视频的特征向量。可选地,文章内容中的视频包括多个视频帧图像,视频特征向量是通过提取前述多个视频帧图像的图像特征得到的。
音频特征向量用于表示文章内容中音频的特征向量。可选地,通过语音识别技术,将音频转化为文本,通过提取前述文本的文本特征得到音频特征向量。
由于文章内容中的文本可以被视作由多个句子构成，而句子又可以被视为由多个词语构成，故这里的文本特征向量可以指文章词向量，也可以指文章句向量。文章词向量可以由预先训练完成的神经网络模型获得。
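“由词语组合得到句子表示”这一层级关系的一种最简化示意，是对词向量逐维取平均得到句向量（本申请中实际由 transformer 编码器组合得到，取平均仅用于说明）：

```python
def sentence_vector(word_vectors):
    # 将句子中各词语的特征向量逐维取平均，得到句子的特征向量
    dim = len(word_vectors[0])
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n for i in range(dim)]

# 示例：两个二维词向量组合为一个句向量
sent = sentence_vector([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
```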
步骤404:将文本-多媒体特征向量输入文本-多媒体子网络,获得文章内容的文本-多媒体评价结果。
文本-多媒体评价结果包括文本-图像评价结果、文本-视频评价结果和文本-音频评价结果中的至少一种。
在本申请实施例中,以文本-多媒体评价结果是文本-图像评价结果进行举例说明,如下所示:
该步骤包括以下子步骤:
1、通过文本-多媒体评价子网络,生成融合图像信息的文本特征表示和融合文本信息的图像特征表示。
文本特征表示指在文本特征向量的基础上,融合了图像信息的特征向量。
图像特征表示指在图像特征向量的基础上,融合了文本信息的特征向量。
2、通过文本-多媒体评价子网络,融合文本特征表示和图像特征表示,生成文章内容的文本-多媒体评价结果。
这里的融合方法可以是通过相应的神经网络模型进行融合。
示例性的,图6示出了本申请一个示例性实施例提供的文本-多媒体评价子网络的示例性结构图。文本-多媒体评价子网络从整体上来看可以分为左右两部分。
如图6所示,先对左侧部分进行介绍。左侧部分从上到下为transformer编码器61(transformer指一种基于编码器+解码器架构的神经网络模型,在transformer模型中运用了attention机制,运用于自然语言处理领域)、transformer编码器62和注意力融合层63。
其中,transformer编码器61的输入为文章词向量W1至Wn(n表示句子中词语的最大个数),输出为文章句向量S1至SL(L表示文章内容中句子的最大个数)。transformer编码器会将输入的n个文章词向量进行组合输出L个文章句向量。
transformer编码器62的输入是文章句向量S1至SL,输出为文本特征向量H1至Hm(m表示文章内容中图像的最大张数),其中,文本特征向量和文章内容中的图像一一对应。transformer编码器会将输入的L个文章句向量进行组合输出m个文本特征向量。
注意力融合层63的输入为文本特征向量H1至Hm，输出为融合图像信息的文本特征表示。注意力融合层63可以通过注意力机制在文本特征向量的基础上融合相应的图像信息，以获得融合图像信息的文本特征表示。具体的融合过程可以参照图1所示的VistaNet模型中句子编码器+注意力层12和文本编码器+注意力层13中的相关计算方式。
接下来对文本-多媒体评价子网络的右侧部分进行介绍:右侧部分包括提取层64和注意力融合层65。
提取层64的输入为文章内容中的m个图像,输出为图像特征向量M1至Mm。提取层64中为预先训练完成的特征提取网络,可以从图像中提取出相应的图像特征向量。
注意力融合层65的输入为图像特征向量M1至Mm，输出为融合文本信息的图像特征表示66。注意力融合层65和注意力融合层63类似，可以通过注意力机制在图像特征向量的基础上融合相应的文本信息，以获得融合文本信息的图像特征表示66。类似的，具体的融合过程可以参照图1所示的VistaNet模型中句子编码器+注意力层12和文本编码器+注意力层13中的相关计算方式。
文本-多媒体评价子网络还包括融合层67和多层感知器68(Multi-Layer Perceptron,MLP)。融合层67的输入为融合图像信息的文本特征表示和融合文本信息的图像特征表示66，输出为融合后的特征。
多层感知器68的输入为融合后的特征,输出为文本-多媒体评价结果。多层感知器68用于对融合后的特征进行识别和分类,提取出其中有用的信息,并得出文本-多媒体评价结果。
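多层感知器对融合后的特征进行识别并输出评价结果的前向计算，可以用如下最小示意表示（隐层用 ReLU 激活、输出层用 sigmoid 压缩到 (0, 1) 区间；所有权重均为假设的示例值，非本申请的具体参数）：

```python
import math

def mlp_forward(x, w1, b1, w2, b2):
    # 隐层：线性变换 + ReLU
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
         for row, bi in zip(w1, b1)]
    # 输出层：线性变换 + sigmoid，得分可视为文本-多媒体评价结果
    o = sum(wi * hi for wi, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-o))

# 示例：二维融合特征经两个隐层单元得到 (0, 1) 区间内的评价得分
score = mlp_forward([0.5, 0.2], [[0.3, -0.1], [0.2, 0.4]], [0.1, 0.0], [0.6, -0.3], 0.05)
```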
需要说明的是，在文本-多媒体特征向量还包括文本-视频特征向量的情况下，需要在文章内容评价模型中额外设置一个文本-多媒体评价子网络，该文本-多媒体评价子网络用于确定文本-视频评价结果。
在文本-多媒体特征向量还包括文本-音频特征向量的情况下，需要在文章内容评价模型中额外设置一个文本-多媒体评价子网络，该文本-多媒体评价子网络用于确定文本-音频评价结果。
综上，文本-多媒体评价结果可以对文章内容中图像与文本之间的关联程度做出评价，可以解决图像内容本身不够优质、针对同一文字描述所配置的图像信息冗余、多角度重复拍摄图像等问题，使子网络获得正确的评价结果，提升优质内容的识别效果。同时，文本-多媒体评价子网络还可以学到文本-多媒体不符的特征，在感情、心灵鸡汤等文章内容上表现突出。
步骤405:提取至少两个维度的特征信息中的客观先验特征。
可选地,客观先验特征包括统计学特征、语言学特征、图像质量特征、账号特征中的至少一种。
可选地,统计学特征包括页面高度、图像面积和字数、段落数中的至少一种。
可选地,语言学特征包括词法多样性、句法多样性、修辞手法和诗词引用中至少一种。
可选地,图像质量特征包括图像清晰度、图像通道数、图像大小、图像数量中的至少一种。
可选地,账号特征包括账号等级、账号垂直度和账号收藏、点赞等消费数据中的至少一种。
步骤406:将客观先验特征输入客观先验特征子网络中,获得文章内容的客观先验评价结果。
示例性的,图7示出了本申请一个示例性实施例提供的客观先验特征子网络的示例性结构图。客观先验特征子网络从上至下分为嵌入层71、特征交叉层72和多层感知器74。嵌入层71的输入为统计学特征、语言学特征、图像质量特征、账号特征中的至少一种,输出为连续的特征1至特征x(x为正整数),嵌入层71的作用是将离散的统计学特征、语言学特征、图像质量特征、账号特征转换为可以用连续向量表示的特征,同时可以减少输入特征的维数。
特征交叉层72的输入为上述特征1至特征x，输出为交叉后的总特征73。特征交叉层72将输入的特征1至特征x两两相乘，并赋予权值后进行求和运算，获得相应的总特征73，总特征73可以从整体上表示输入的统计学特征、语言学特征、图像质量特征、账号特征。
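特征交叉层“两两相乘、赋予权值后求和”的计算可以示意如下（交叉权值 `weights` 为假设的示例值，实际由网络学习得到）：

```python
def feature_cross(features, weights):
    # 将特征两两相乘，乘以相应的交叉权值后累加，得到总特征
    total = 0.0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            total += weights[(i, j)] * features[i] * features[j]
    return total

# 示例：三个标量特征与假设的交叉权值
total = feature_cross([2.0, 3.0, 4.0], {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 2.0})
```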
多层感知器74的输入为总特征73,输出为客观先验评价结果。多层感知器74用于对总特征73进行识别和分类,提取出其中有用的信息,并得出客观先验特征评价结果。
综上，通过客观先验评价结果可以得到隐式的客观体验结果，例如，账号权威度对文章内容的影响、文章中修辞手法的运用对文章内容的影响等。这些影响是难以被注意到，但又实际存在的。客观先验评价结果可以将这些隐式的客观体验结果具象化，方便得出合理的评价结果。
步骤407:提取至少两个维度的特征信息中的文章词向量。
可选地,由预先训练完成的神经网络模型提取至少两个维度的特征信息中的文章词向量。
步骤408:将文章词向量输入到文本子网络中,获得文章内容的文本评价结果。
图8示出了本申请一个示例性实施例示出的文本子网络的示例性结构图。该文本子网络包括了transformer编码器81、transformer编码器82和多层感知器层84。
transformer编码器81的输入为文章词向量W1至Wn(n表示句子中词语的最大个数),输出为文章句向量S1至SL(L表示文章内容中句子的最大个数)。transformer编码器会将输入的n个文章词向量进行组合输出L个文章句向量。
transformer编码器82的输入是文章句向量S1至SL，输出为文本特征向量83，其中，文本特征向量83是文章内容中全部文字对应的特征向量。transformer编码器会将输入的L个文章句向量进行组合输出文本特征向量83。
多层感知器层84的输入为文本特征向量83,输出为文本评价结果。多层感知器层84用于对输入的文本特征向量进行识别和分类,提取出其中有用的信息,并得出文本评价结果。
综上,文本评价结果是对文本的具体评价,由于输入的特征均与文本高度关联,故可以得到对文章内容中的文本的最为准确的评价结果,使得最终的评价结果在文本方面更为准确。
步骤409:提取至少两个维度的特征信息中的文本-多媒体特征向量。
文本-多媒体特征向量包括文本-图像特征向量、文本-视频特征向量和文本-音频特征向量中的至少一种。其中,文本-图像特征向量包括文本特征向量和图像特征向量,文本-视频特征向量包括文本特征向量和视频特征向量,文本-音频特征向量包括文本特征向量和音频特征向量。
步骤410：将文本-多媒体特征向量输入到排版子网络中，获得文章内容的排版评价结果。
图9示出了本申请一个示例性实施例示出的排版子网络的示例性结构图。该排版子网络包括:长短期记忆神经网络91、注意力融合层92、CNN93和多层感知层95。
长短期记忆神经网络91和注意力融合层92共同作用。长短期记忆神经网络91的输入为交错排列的图像特征向量和文本特征向量V1至VL，其输出作为注意力融合层92的输入，注意力融合层92的输出为融合了图像信息的文本特征向量的排版特征。
CNN93的输入为交错排列的图像特征向量和文本特征向量V1至VL，输出为融合了文本信息的图像特征向量的排版特征。注意力融合层92的输出和CNN93的输出构成排版特征94。
多层感知器层95的输入为排版特征94,输出为排版评价结果。多层感知器层95用于对输入的排版特征进行识别和分类,提取出其中有用的信息,并得出排版评价结果。
综上，排版评价结果可以对文章内容的排版布局做出相应的评价。由于文章内容的排版布局也属于隐式的客观体验结果，排版对文章内容的影响是难以被注意到，但又实际存在的。排版评价结果可以将这些隐式的客观体验结果具象化，方便得出合理的评价结果。
步骤411:通过文章内容评价模型和注意力机制,为至少两个维度的评价结果赋予相应的权重值。
权重值可以根据实际需求来进行调节。
可选地,通过预训练的神经网络来确定权重值。
步骤412:通过文章内容评价模型和权重值,对至少两个维度的评价结果进行加权平均,获得文章内容的内容评价结果。
示例性的,如图10所示,图10示出了本实施例的完整的文章内容评价模型,具体细节可参照图5至图9对应的内容,此处不再赘述。
综上所述，本实施例通过融合文章内容的多个维度的评价结果，覆盖了影响文章内容评价的多种因素，即使在复杂场景下，也能对文章内容进行合理的评价，获取文章内容中的优质部分和劣质部分，使得获得的评价结果更加符合实际情况。并使用四个不同的子网络来获得相应的评价结果，可以根据实际的需求挑选相应的评价结果进行融合，契合实际需求。
图11示出了本申请一个示例性实施例提供的文本-多媒体多子网络训练方法的流程示意图。该方法可由图2所示的终端220或服务器240或其他计算机设备执行,该方法包括如下步骤:
步骤1101:获得文本-多媒体训练集,文本-多媒体训练集包括样本文章和与样本文章对应的真实文本-多媒体评价结果。
样本文章的获取方法可以有多种,例如,从网络上下载样本文章,或者,接收由其他终端发送的样本文章,或者,从本地存储中获得样本文章,或者,获取实时输入的样本文章。本申请对此不作限制。
真实文本-多媒体评价结果是由相关技术人员或阅读的用户对样本文章中的文本-多媒体关系做出的评价结果。
本实施例中的步骤1102至步骤1104的具体过程,可以参考步骤401至步骤404。
步骤1102:提取样本文章中的样本文章特征信息。
样本文章特征信息包括样本文本信息和样本多媒体信息,样本多媒体信息包括样本图像信息、样本视频信息、样本音频信息中的至少一种。
示例性的,当样本文章特征信息包括样本图像信息和样本文本信息时,从样本文章中直接提取出前述样本图像信息和样本文本信息。
可选地,当样本文章特征信息中存在样本视频信息时,将样本视频信息转化为多张视频帧图像,根据多张视频帧图像得到图像信息,将前述得到的图像信息作为样本图像信息。
可选地,当样本文章特征信息中存在样本音频信息时,将样本音频信息转化为文本信息,将文本信息作为样本文本信息。
步骤1103:根据样本文章特征信息,提取样本文本-多媒体特征向量。
样本文本-多媒体特征向量包括样本文本-图像特征向量、样本文本-视频特征向量和样本文本-音频特征向量中的至少一种。其中,样本文本-图像特征向量包括样本文本特征向量和样本图像特征向量,样本文本-视频特征向量包括样本文本特征向量和样本视频特征向量,样本文本-音频特征向量包括样本文本特征向量和样本音频特征向量。
可选地,通过图像特征提取网络从样本文章特征信息中的样本图像信息提取出样本图像特征向量。
可选地,通过文本特征提取网络从样本文章特征信息中的样本文本信息提取出样本文本特征向量。
具体内容可参考图4所示实施例的步骤403,此处不再赘述。
步骤1104:将样本文本-多媒体特征向量输入文本-多媒体评价子网络,获得预测文本-多媒体评价结果。
在一种可选地实施方式中,通过文本-多媒体评价子网络融合样本图像特征向量和样本文本特征向量,获得预测文本-多媒体评价结果。
步骤1105:根据预测文本-多媒体评价结果和真实文本-多媒体评价结果之间的误差损失,对文本-多媒体评价子网络进行训练。
可选地,通过误差反向传播算法修正文本-多媒体评价子网络中的网络参数。
可选地,当误差损失不大于阈值时,完成文本-多媒体评价子网络的训练。阈值可由技术人员自行确定。
可选地,当文本-多媒体评价子网络的迭代次数达到阈值时,完成文本-多媒体评价子网络的训练。
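上述两种训练停止条件（误差损失不大于阈值，或迭代次数达到阈值）可以用一个简化的梯度下降训练循环示意（`grad_fn`、`loss_fn` 为假设的接口，实际中对应误差反向传播算法）：

```python
def train(params, grad_fn, loss_fn, lr=0.1, loss_threshold=1e-4, max_iters=1000):
    # 每轮迭代：先计算损失，若损失不大于阈值则完成训练；
    # 否则按梯度更新参数；迭代次数达到上限时同样停止
    for it in range(1, max_iters + 1):
        loss = loss_fn(params)
        if loss <= loss_threshold:
            return params, it, loss
        grads = grad_fn(params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, max_iters, loss_fn(params)

# 示例：在玩具损失 (p - 3)^2 上训练，参数收敛到 3 附近
params, iters, loss = train([0.0], lambda p: [2 * (p[0] - 3.0)],
                            lambda p: (p[0] - 3.0) ** 2)
```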
综上所述，本实施例给出了一种文本-多媒体评价子网络的训练方法，可以快速构造文本-多媒体评价子网络，并且，获得的文本-多媒体评价子网络可以准确地得到对文章内容的文本-多媒体的多维度的评价结果。
图12示出了本申请一个示例性实施例提供的客观先验特征子网络训练方法的流程示意图。该方法可由图2所示的终端220或服务器240或其他计算机设备执行,该方法包括如下步骤:
步骤1201:获得客观先验训练集,客观先验训练集包括样本文章和与样本文章对应的真实客观先验评价结果。
真实客观先验评价结果是由相关技术人员或阅读的用户对样本文章中的客观先验特征做出的评价结果。
本实施例中的步骤1202至步骤1204的具体过程,可以参考步骤401至步骤402和步骤405至步骤406。
步骤1202:提取样本文章中的样本文章特征信息。
可选地,在本实施例中,样本文章特征信息包括样本文本信息和样本多媒体信息,样本多媒体信息包括样本图像信息、样本视频信息、样本音频信息中的至少一种。
步骤1203:根据样本文章特征信息,获得样本文章的样本客观先验特征。
可选地,样本客观先验特征包括样本统计学特征、样本语言学特征、样本图像质量特征、样本账号特征中的至少一种。
示例性的,根据样本文章特征信息中的样本文本信息和样本图像特征,得到样本统计学特征。
示例性的,根据样本文章特征信息中的样本文本信息,得到样本语言学特征。
示例性的,根据样本文章特征信息中的样本图像信息,获取样本图像质量特征。
步骤1204:将样本客观先验特征输入客观先验特征子网络,获得预测客观先验评价结果。
预测客观先验评价结果用于预测样本文章中隐式的客观体验结果。
步骤1205:根据预测客观先验评价结果和真实客观先验评价结果之间的误差损失,对客观先验特征子网络进行训练。
可选地,通过误差反向传播算法修正客观先验特征子网络中的网络参数。
可选地,当误差损失不大于阈值时,完成客观先验特征子网络的训练。阈值可由技术人员自行确定。
可选地,当客观先验特征子网络的迭代次数达到阈值时,完成客观先验特征子网络的训练。
综上所述,本实施例给出了一种客观先验特征子网络的训练方法,可以快速构造客观先验特征子网络,并且,获得的客观先验特征子网络可以准确地得到对文章内容的客观先验评价结果。
图13示出了本申请一个示例性实施例提供的文本子网络训练方法的流程示意图。该方法可由图2所示的终端220或服务器240或其他计算机设备执行,该方法包括如下步骤:
步骤1301:获得文本训练集,文本训练集包括样本文章和与样本文章对应的真实文本评价结果。
真实文本评价结果是由相关技术人员或阅读的用户对样本文章中的文本做出的评价结果。
本实施例中的步骤1302至步骤1304的具体过程,可以参考步骤401至步骤402和步骤407至步骤408。
步骤1302:提取样本文章中的样本文本信息。
在本申请实施例中,样本文本信息指样本文章中的文本,或者,样本文本信息指样本文章中图像里的文本。
步骤1303:根据样本文本信息,提取样本文章词向量。
可选地,调用词向量提取网络,对样本文本信息进行处理,输出样本文章词向量。
步骤1304:将样本文章词向量输入文本子网络,获得预测文本评价结果。
预测文本评价结果用于预测样本文章的文本对文章内容的影响程度。
步骤1305:根据预测文本评价结果和真实文本评价结果之间的误差损失,对文本子网络进行训练。
可选地,通过误差反向传播算法修正文本子网络中的网络参数。
可选地,当误差损失不大于阈值时,完成文本子网络的训练。阈值可由技术人员自行确定。
可选地,当文本子网络的迭代次数达到阈值时,完成文本子网络的训练。
综上所述,本实施例给出了一种文本子网络的训练方法,可以快速构造文本子网络,并且,获得的文本子网络可以准确地得到对文章内容的文本评价结果。
图14示出了本申请一个示例性实施例提供的排版子网络训练方法的流程示意图。该方法可由图2所示的终端220或服务器240执行,该方法包括如下步骤:
步骤1401:获得排版训练集,排版训练集包括样本文章和与样本文章对应的真实排版评价结果。
真实排版评价结果是由相关技术人员或阅读的用户对样本文章中的排版做出的评价结果。
本实施例中的步骤1402至步骤1404的具体过程,可以参考步骤401至步骤402和步骤409至步骤410。
步骤1402:提取样本文章中的样本文章特征信息。
可选地,在本申请实施例中,样本文章特征信息包括样本文本信息和样本多媒体信息,样本多媒体信息包括样本图像信息、样本视频信息、样本音频信息中的至少一种。
步骤1403:根据样本文章特征信息,提取样本文本-多媒体特征向量。
样本文本-多媒体特征向量包括样本图像特征向量和样本文本特征向量中的至少一种。
示例性的,调用图像特征提取网络,对样本图像信息进行数据处理,输出样本图像特征向量。
示例性的,调用文本特征提取网络,对样本文本信息进行数据处理,输出样本文本特征向量。
步骤1404:将样本文本-多媒体特征向量输入到排版子网络中,获得预测排版评价结果。
预测排版评价结果用于预测样本文章的排版对文章内容的影响程度。
步骤1405:根据预测排版评价结果与真实排版评价结果之间的误差损失,对排版子网络进行训练。
可选地,通过误差反向传播算法修正排版子网络中的网络参数。
可选地,当误差损失不大于阈值时,完成排版子网络的训练。阈值可由技术人员自行确定。
可选地，当排版子网络的迭代次数达到阈值时，完成排版子网络的训练。
综上所述，本实施例给出了一种排版子网络的训练方法，可以快速构造排版子网络，并且，获得的排版子网络可以准确地得到对文章内容的排版评价结果。
图15示出了本申请一个示例性实施例示出的示例性业务架构图。该架构图分为两部分:低质过滤模块1501和优质识别模块1502。
低质过滤模块1501用于对文章内容中的低质内容和次低质内容进行过滤。其中，低质内容包括但不限于内容低俗、谣言、标题党和广告营销中的至少一种。内容低俗指文章内容不利于社会进步和用户的身心发展。谣言指文章内容中有不符合实际现实的内容。标题党指文章内容与文章标题不匹配。广告营销指文章内容中有宣传推广产品的内容。次低质内容包括但不限于无营养、套路文、八卦文、宣发文、拼接文、负面影响文、口水文和广告软文中的至少一种。无营养指文章内容可有可无，用户无法从文章内容中获得有用的信息。套路文指文章内容是按照固定的模板书写而成。八卦文指对人或事进行无端猜测的文章。宣发文指用于宣传个人、集体或地点的文章。拼接文指由其它文章的全部或片段拼接而成的文章。口水文是指未经反复推敲修饰，包括较多口头用语的文章。广告软文本质上属于广告，但用户难以根据文章内容直接得知该文章属于广告的范畴。可选地，低质过滤模块1501使用相应的低质过滤神经网络来实现。
优质识别模块1502用于提取文章内容中的优质部分。优质识别模块包括:特征提取层1505、特征融合层1503、多目标反馈层1504、逻辑决策层1506。
特征提取层1505从图文多维度、图文原子能力和图文嵌套排版三个方面来提取相应的特征向量。特征提取层的输入为文章内容，输出为提取到的特征向量。其中，图文多维度用于提取文章内容中图像所对应的特征向量、文本所对应的特征向量以及图像和文本之间的关联度所对应的特征向量。图文原子能力从语言学、统计学、账号和文章风格四个方面来提取相应的特征向量。其中，语言学又包括：词法多样性、句法多样性、修辞手法和诗词引用。统计学又包括：页面高度、图片面积和字数、段落数、图片平均清晰度与美观度。账号又包括：账号等级、账号垂直度（账号垂直度指账号发表的内容在特定领域内的专业程度）和账号收藏、点赞等消费数据。文章风格又包括：实用性和专业性。
特征融合层1503的输入为从文章内容中提取到的特征向量，输出为文章内容的先验优质内容（先验优质内容指在未经用户反馈的情况下，对文章内容中的优质部分进行预测所获得的内容）。特征融合层将输入的特征向量进行融合，并根据融合后的特征向量来获得其中的先验优质内容。
多目标反馈层1504的输入为融合后的特征向量和多名用户的反馈信息，输出为后验优质内容（后验优质内容指在获得用户的反馈信息的情况下，对输入的特征向量进行修正后，所预测获得的内容）。多目标反馈层可以根据用户的反馈信息来修正输入的特征向量，并对修正后的特征向量进行融合，来获得其中的后验优质内容。在另一种实现方法中，多目标反馈层可以先对输入的特征向量进行融合，获得优质内容，并根据用户的反馈信息对优质内容进行修正，获得后验优质内容。
逻辑决策层1506的输入为先验优质内容和后验优质内容,输出为文章内容的优质内容。逻辑决策层1506会对先验优质内容和后验优质内容进行综合评价,来获得文章内容的优质内容。
该实施例将图文优质内容判定这一复杂场景从图文多维度、账号、文章排版体验、文章语言学类原子特征(如文章使用的词法多样性、文章是否使用比喻句排比句等多样化句法、文章是否引用古诗词等)等多个维度进行创新性拆解,并搭建融合图文维度子网络、客观先验特征子网络、文本子网络和排版子网络的集成模型,从而完成图文先验优质内容的识别与判定。
该模型主要运用在对内容算法研发中心图文内容进行质量判定的任务中,模型准确率达到94%,图文优质内容覆盖率达到16%。在浏览器和快报端侧对识别出来的图文优质内容进行推荐加权实验,实现了将优质内容优先推荐给用户,在业务侧取得了良好的业务效果。使用本申请所述图文先验优质识别算法进行优质内容加权推荐实验后,在浏览器侧整体大盘点击pv(page view,页面浏览量)提升0.38%,大盘曝光效率提升0.43%,大盘CTR(Click-Through-Rate,点击通过率)提升0.394%,用户时长提升0.17%;同时DAU(Daily Active User,日活跃用户数量)次日留存提升0.165%,互动指标数据中人均分享提升1.705%,人均点赞提升4.215%,人均评论提升0.188%。
下面为本申请的装置实施例,对于装置实施例中未详细描述的细节,可以结合参考上述方法实施例中相应的记载,本文不再赘述。
图16示出了本申请的一个示例性实施例提供的内容的评价装置的结构示意图。该装置可以通过软件、硬件或者两者的结合实现成为计算机设备的全部或一部分,该装置1600包括:
提取模块1601,用于提取所述文章内容中的文章特征信息,所述文章特征信息包括文本 信息,所述文章特征信息还包括图像信息、视频信息、音频信息中的至少一种;
评价模块1602,用于根据所述文章特征信息,获得所述文章内容的至少两个维度的评价结果;
评价融合模块1603,用于融合所述至少两个维度的评价结果,获得所述文章内容的内容评价结果。
在本申请的一个可选设计中,所述提取模块1601,还用于提取所述文章特征信息在至少两个维度的特征信息。
所述评价模块1602,还用于将所述至少两个维度的特征信息输入文章内容评价模型,获得所述文章内容的所述至少两个维度的评价结果,所述文章内容评价模型是根据所述至少两个维度的特征信息预测相应维度的评价结果的机器学习模型。
在本申请的一个可选设计中,所述提取模块1601,还用于提取所述至少两个维度的特征信息中的文本-多媒体特征向量,所述文本-多媒体特征向量包括文本-图像特征向量、文本-视频特征向量和文本-音频特征向量中的至少一种。
所述评价模块1602,还用于将所述文本-多媒体特征向量输入所述文本-多媒体评价子网络,获得所述文章内容的所述文本-多媒体评价结果。
在本申请的一个可选设计中,所述评价模块1602,还用于通过所述文本-多媒体评价子网络,生成融合图像信息的文本特征表示和融合文本信息的图像特征表示;通过所述文本-多媒体评价子网络,融合所述融合图像信息的文本特征表示和所述融合文本信息的图像特征表示,生成所述文章内容的所述文本-多媒体评价结果。
在本申请的一个可选设计中,所述提取模块1601,还用于提取所述至少两个维度的特征信息中的客观先验特征,所述客观先验特征包括统计学特征、语言学特征、图像质量特征、账号特征中的至少一种。
所述评价模块1602,还用于将所述客观先验特征输入所述客观先验特征子网络中,获得所述文章内容的客观先验评价结果。
在本申请的一个可选设计中,所述提取模块1601,还用于提取所述至少两个维度的特征信息中的文章词向量。
所述评价模块1602,还用于将所述文章词向量输入到所述文本子网络中,获得所述文章内容的所述文本评价结果。
在本申请的一个可选设计中,所述提取模块1601,还用于提取所述至少两个维度的特征信息中的文本-多媒体特征向量。
所述评价模块1602,还用于将所述文本-多媒体特征向量输入到所述排版子网络中,获得所述文章内容的所述排版评价结果。
在本申请的一个可选设计中,所述评价融合模块1603,还用于通过文章内容评价模型和注意力机制,为所述至少两个维度的评价结果赋予相应的权重值;通过文章内容评价模型和所述权重值,对所述至少两个维度的评价结果进行加权平均,获得所述文章内容的所述内容评价结果。
在本申请的一个可选设计中,所述装置还包括:训练模块1604。
所述训练模块1604，用于获得文本-多媒体训练集，所述文本-多媒体训练集包括样本文章和与样本文章对应的真实文本-多媒体评价结果；提取所述样本文章中的样本文章特征信息；根据所述样本文章特征信息，提取样本文本-多媒体特征向量，所述样本文本-多媒体特征向量包括样本图像特征向量和样本文本特征向量；将所述样本文本-多媒体特征向量输入所述文本-多媒体子网络，获得预测文本-多媒体评价结果；根据所述预测文本-多媒体评价结果和所述真实文本-多媒体评价结果之间的误差损失，对所述文本-多媒体评价子网络进行训练。
在本申请的一个可选设计中，所述训练模块1604，还用于获得客观先验训练集，所述客观先验训练集包括样本文章和与样本文章对应的真实客观先验评价结果；提取样本文章中的样本文章特征信息；根据所述样本文章特征信息，获得所述样本文章的样本客观先验特征；将所述样本客观先验特征输入所述客观先验特征子网络，获得预测客观先验评价结果；根据所述预测客观先验评价结果和所述真实客观先验评价结果之间的误差损失，对所述客观先验特征子网络进行训练。
在本申请的一个可选设计中，所述训练模块1604，还用于获得文本训练集，所述文本训练集包括样本文章和与样本文章对应的真实文本评价结果；提取所述样本文章中的样本文本信息；根据所述样本文本信息，提取样本文章词向量；将所述样本文章词向量输入所述文本子网络，获得预测文本评价结果；根据所述预测文本评价结果和所述真实文本评价结果之间的误差损失，对所述文本子网络进行训练。
在本申请的一个可选设计中，所述训练模块1604，还用于获得排版训练集，所述排版训练集包括样本文章和与样本文章对应的真实排版评价结果；提取所述样本文章中的样本文章特征信息；根据所述样本文章特征信息，提取样本文本-多媒体特征向量；将所述样本文本-多媒体特征向量输入到所述排版子网络中，获得预测排版评价结果；根据所述预测排版评价结果与所述真实排版评价结果之间的误差损失，对所述排版子网络进行训练。
综上所述，本实施例通过融合文章内容的多个维度的评价结果，从多个维度对文章内容做出相应的评价，使得最终得到的评价结果更加符合文章的实际内容。另一方面，由于最终的评价结果综合了多个维度的评价，可以有效减少错误的评价结果，提高整个方案的鲁棒性。
图17是根据一示例性实施例示出的一种计算机设备的结构示意图。所述计算机设备1700包括中央处理单元(Central Processing Unit,CPU)1701、包括随机存取存储器(Random Access Memory,RAM)1702和只读存储器(Read-Only Memory,ROM)1703的系统存储器1704,以及连接系统存储器1704和中央处理单元1701的系统总线1705。所述计算机设备1700还包括帮助计算机设备内的各个器件之间传输信息的基本输入/输出系统(Input/Output,I/O系统)1706,和用于存储操作系统1713、应用程序1714和其他程序模块1715的大容量存储设备1707。
所述基本输入/输出系统1706包括有用于显示信息的显示器1708和用于用户输入信息的诸如鼠标、键盘之类的输入设备1709。其中所述显示器1708和输入设备1709都通过连接到系统总线1705的输入输出控制器1710连接到中央处理单元1701。所述基本输入/输出系统1706还可以包括输入输出控制器1710以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器1710还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备1707通过连接到系统总线1705的大容量存储控制器(未示出)连接到中央处理单元1701。所述大容量存储设备1707及其相关联的计算机设备可读介质为计算机设备1700提供非易失性存储。也就是说,所述大容量存储设备1707可以包括诸如硬盘或者只读光盘(Compact Disc Read-Only Memory,CD-ROM)驱动器之类的计算机设备可读介质(未示出)。
不失一般性,所述计算机设备可读介质可以包括计算机设备存储介质和通信介质。计算机设备存储介质包括以用于存储诸如计算机设备可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机设备存储介质包括RAM、ROM、可擦除可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、带电可擦可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM),CD-ROM、数字视频光盘(Digital Video Disc,DVD)或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机设备存储介质不局限于上述几种。上述的系统存储器1704和大容量存储设备1707可以统称为存储器。
根据本公开的各种实施例,所述计算机设备1700还可以通过诸如因特网等网络连接到网络上的远程计算机设备运行。也即计算机设备1700可以通过连接在所述系统总线1705上的网络接口单元1712连接到网络1711,或者说,也可以使用网络接口单元1712来连接到其他类型的网络或远程计算机设备系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,中央处理器1701通过执行该一个或一个以上程序来实现上述内容的评价方法的全部或者部分步骤。
在示例性实施例中,还提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现上述各个方法实施例提供的内容的评价方法。
本申请还提供一种计算机可读存储介质,所述存储介质中存储有至少一条指令、至少一段程序、代码集或指令集,所述至少一条指令、所述至少一段程序、所述代码集或指令集由处理器加载并执行以实现上述方法实施例提供的内容的评价方法。
根据本申请的另一方面,提供了一种计算机程序产品或计算机程序,上述计算机程序产品或计算机程序包括计算机指令,上述计算机指令存储在计算机可读存储介质中。计算机设备的处理器从上述计算机可读存储介质读取上述计算机指令,上述处理器执行上述计算机指令,使得上述计算机设备执行如上方面所述的内容的评价方法。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种内容的评价方法,所述方法由计算机设备执行,所述方法包括:
    提取所述文章内容中的文章特征信息,所述文章特征信息包括文本信息和多媒体信息,所述多媒体信息包括图像信息、视频信息、音频信息中的至少一种;
    根据所述文章特征信息,获得所述文章内容的至少两个维度的评价结果;
    融合所述至少两个维度的评价结果,获得所述文章内容的内容评价结果。
  2. 根据权利要求1所述的方法,其中,所述根据所述文章特征信息,获得所述文章内容的至少两个维度的评价结果,包括:
    提取所述文章特征信息在至少两个维度的特征信息;
    将所述至少两个维度的特征信息输入文章内容评价模型,获得所述文章内容的所述至少两个维度的评价结果,所述文章内容评价模型是根据所述至少两个维度的特征信息预测相应维度的评价结果的机器学习模型。
  3. 根据权利要求2所述的方法,其中,所述文章内容评价模型包括:文本-多媒体子网络;所述至少两个维度的评价结果中的一个评价结果是文本-多媒体评价结果,所述文本-多媒体评价结果包括文本-图像评价结果、文本-视频评价结果和文本-音频评价结果中的至少一种;
    所述将所述至少两个维度的特征信息输入文章内容评价模型,获得所述文章内容的所述至少两个维度的评价结果,包括:
    提取所述至少两个维度的特征信息中的文本-多媒体特征向量,所述文本-多媒体特征向量包括文本-图像特征向量、文本-视频特征向量和文本-音频特征向量中的至少一种;
    将所述文本-多媒体特征向量输入所述文本-多媒体子网络,获得所述文章内容的所述文本-多媒体评价结果。
  4. 根据权利要求3所述的方法,其中,所述将所述文本-多媒体特征向量输入所述文本-多媒体子网络,获得所述文章内容的所述文本-多媒体评价结果,包括:
    通过所述文本-多媒体子网络,生成融合图像信息的文本特征表示和融合文本信息的图像特征表示;
    通过所述文本-多媒体子网络,将所述融合图像信息的文本特征表示和所述融合文本信息的图像特征表示进行融合,生成所述文章内容的所述文本-多媒体评价结果。
  5. 根据权利要求2所述的方法,其中,所述文章内容评价模型还包括:客观先验特征子网络;所述至少两个维度的评价结果中的一个评价结果是客观先验评价结果;
    所述将所述至少两个维度的特征信息输入文章内容评价模型,获得所述文章内容的所述至少两个维度的评价结果,包括:
    提取所述至少两个维度的特征信息中的客观先验特征,所述客观先验特征包括统计学特征、语言学特征、图像质量特征、账号特征中的至少一种;
    将所述客观先验特征输入所述客观先验特征子网络中,获得所述文章内容的客观先验评价结果。
  6. 根据权利要求2所述的方法,其中,所述文章内容评价模型还包括:文本子网络;所述至少两个维度的评价结果中的一个评价结果是文本评价结果;
    所述将所述至少两个维度的特征信息输入文章内容评价模型,获得所述文章内容的所述至少两个维度的评价结果,包括:
    提取所述至少两个维度的特征信息中的文章词向量;
    将所述文章词向量输入到所述文本子网络中,获得所述文章内容的所述文本评价结果。
  7. 根据权利要求2所述的方法,其中,所述文章内容评价模型还包括:排版子网络;所述至少两个维度的评价结果中的一个评价结果是排版评价结果;
    所述将所述至少两个维度的特征信息输入文章内容评价模型,获得所述待测文章的至少两个维度的评价结果,包括:
    提取所述至少两个维度的特征信息中的文本-多媒体特征向量;
    将所述文本-多媒体特征向量输入到所述排版子网络中,获得所述文章内容的所述排版评价结果。
  8. 根据权利要求1至7任一项所述的方法,其中,所述融合所述至少两个维度的评价结果,获得所述文章内容的内容评价结果,包括:
    通过文章内容评价模型和注意力机制,为所述至少两个维度的评价结果赋予相应的权重值;
    通过文章内容评价模型和所述权重值,对所述至少两个维度的评价结果进行加权计算,获得所述文章内容的所述内容评价结果。
  9. 根据权利要求2至4任一项所述的方法,其中,所述文本-多媒体评价子网络是通过以下方法训练而成的;
    获得文本-多媒体训练集,所述文本-多媒体训练集包括样本文章和与所述样本文章对应的真实文本-多媒体评价结果;
    提取所述样本文章中的样本文章特征信息;
    根据所述样本文章特征信息,提取样本文本-多媒体特征向量,所述样本文本-多媒体特征向量包括样本文本-图像特征向量、样本文本-视频特征向量和样本文本-音频特征向量中的至少一种;
    将所述样本文本-多媒体特征向量输入所述文本-多媒体评价子网络,获得预测文本-多媒体评价结果;
    根据所述预测文本-多媒体评价结果和所述真实文本-多媒体评价结果之间的误差损失,对所述文本-多媒体评价子网络进行训练。
  10. 根据权利要求5所述的方法,其中,所述客观先验特征子网络是通过以下方法训练而成的;
    获得客观先验训练集,所述客观先验训练集包括样本文章和与所述样本文章对应的真实客观先验评价结果;
    提取样本文章中的样本文章特征信息;
    根据所述样本文章特征信息，获得所述样本文章的样本客观先验特征；
    将所述样本客观先验特征输入所述客观先验特征子网络,获得预测客观先验评价结果;
    根据所述预测客观先验评价结果和所述真实客观先验评价结果之间的误差损失,对所述客观先验特征子网络进行训练。
  11. 根据权利要求6所述的方法,其中,所述文本子网络是通过以下方法训练而成的;
    获得文本训练集,所述文本训练集包括样本文章和与所述样本文章对应的真实文本评价结果;
    提取所述样本文章中的样本文本信息;
    根据所述样本文本信息,提取样本文章词向量;
    将所述样本文章词向量输入所述文本子网络,获得预测文本评价结果;
    根据所述预测文本评价结果和所述真实文本评价结果之间的误差损失,对所述文本子网络进行训练。
  12. 根据权利要求7所述的方法,其中,所述排版子网络是通过以下方法训练而成的;
    获得排版训练集,所述排版训练集包括样本文章和与所述样本文章对应的真实排版评价结果;
    提取所述样本文章中的样本文章特征信息;
    根据所述样本文章特征信息,提取样本文本-多媒体特征向量;
    将所述样本文本-多媒体向量输入到所述排版子网络中,获得预测排版评价结果;
    根据所述预测排版评价结果与所述真实排版评价结果之间的误差损失,对所述排版子网络进行训练。
  13. 一种内容的评价装置,所述装置包括:
    提取模块,用于提取所述文章内容中的文章特征信息,所述文章特征信息包括文本信息,所述文章特征信息还包括图像信息、视频信息、音频信息中的至少一种;
    评价模块,用于根据所述文章特征信息,获得所述文章内容的至少两个维度的评价结果;
    评价融合模块,用于融合所述至少两个维度的评价结果,获得所述文章内容的内容评价结果。
  14. 根据权利要求13所述的装置,其中,
    提取模块,还用于提取所述文章特征信息在至少两个维度的特征信息;
    所述评价模块,还用于将所述至少两个维度的特征信息输入文章内容评价模型,获得所述文章内容的所述至少两个维度的评价结果,所述文章内容评价模型是根据所述至少两个维度的特征信息预测相应维度的评价结果的机器学习模型。
  15. 根据权利要求14所述的装置,其中,所述文章内容评价模型包括:文本-多媒体评价子网络;所述至少两个维度的评价结果中的一个评价结果是文本-多媒体评价结果,所述文本-多媒体评价结果包括文本-图像评价结果、文本-视频评价结果和文本-音频评价结果中的至少一种;
    提取模块,还用于提取所述至少两个维度的特征信息中的文本-多媒体特征向量,所述文本-多媒体特征向量包括图像特征向量和文本特征向量中的至少一种;
    所述评价模块,还用于将所述文本-多媒体特征向量输入所述文本-多媒体评价子网络,获得所述文章内容的所述文本-多媒体评价结果。
  16. 根据权利要求14所述的装置,其中,所述文章内容评价模型还包括:客观先验特征子网络;所述至少两个维度的评价结果中的一个评价结果是客观先验评价结果;
    所述提取模块,还用于提取所述至少两个维度的特征信息中的客观先验特征,所述客观先验特征包括统计学特征、语言学特征、图像质量特征、账号特征中的至少一种;
    所述评价模块,还用于将所述客观先验特征输入所述客观先验特征子网络中,获得所述文章内容的客观先验评价结果。
  17. 根据权利要求14所述的装置,其中,所述文章内容评价模型还包括:排版子网络;所述至少两个维度的评价结果中的一个评价结果是排版评价结果;
    所述提取模块,还用于提取所述至少两个维度的特征信息中的文本-多媒体特征向量;
    所述评价模块，还用于将所述文本-多媒体特征向量输入到所述排版子网络中，获得所述文章内容的所述排版评价结果。
  18. 一种计算机设备,其中,所述计算机设备包括:处理器和存储器,所述存储器存储有计算机程序,所述计算机程序由所述处理器加载并执行以实现如权利要求1至12任一所述的内容的评价方法。
  19. 一种计算机可读存储介质,其中,所述计算机可读存储介质存储有计算机程序,所述计算机程序由处理器加载并执行以实现如权利要求1至12任一所述的内容的评价方法。
  20. 一种计算机程序产品,包括计算机程序或指令,其中,所述计算机程序或指令被处理器执行时实现权利要求1至12中任一项所述的内容的评价方法。
PCT/CN2022/074684 2021-01-29 2022-01-28 内容的评价方法、装置、设备及介质 WO2022161470A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110125965.3A CN114818691A (zh) 2021-01-29 2021-01-29 文章内容的评价方法、装置、设备及介质
CN202110125965.3 2021-01-29

Publications (1)

Publication Number Publication Date
WO2022161470A1 true WO2022161470A1 (zh) 2022-08-04

Family

ID=82526671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074684 WO2022161470A1 (zh) 2021-01-29 2022-01-28 内容的评价方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN114818691A (zh)
WO (1) WO2022161470A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170250A (zh) * 2022-09-02 2022-10-11 杭州洋驼网络科技有限公司 一种电商平台的物品信息管理方法以及装置
CN116011893A (zh) * 2023-03-27 2023-04-25 深圳新闻网传媒股份有限公司 基于垂直领域的市区融媒评价方法、装置及电子设备
CN116719930A (zh) * 2023-04-28 2023-09-08 西安工程大学 基于视觉方面注意的多模态情感分析方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116346697B (zh) * 2023-05-30 2023-09-19 亚信科技(中国)有限公司 通信业务质量评测方法、装置及电子设备
CN116578763B (zh) * 2023-07-11 2023-09-15 卓谨信息科技(常州)有限公司 基于生成式ai认知模型的多源信息展览系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090300547A1 (en) * 2008-05-30 2009-12-03 Kibboko, Inc. Recommender system for on-line articles and documents
US20120084155A1 (en) * 2010-10-01 2012-04-05 Yahoo! Inc. Presentation of content based on utility
CN111311554A (zh) * 2020-01-21 2020-06-19 腾讯科技(深圳)有限公司 图文内容的内容质量确定方法、装置、设备及存储介质
CN111368075A (zh) * 2020-02-27 2020-07-03 腾讯科技(深圳)有限公司 文章质量预测方法、装置、电子设备及存储介质
CN111488931A (zh) * 2020-04-10 2020-08-04 腾讯科技(深圳)有限公司 文章质量评估方法、文章推荐方法及其对应的装置
CN112069802A (zh) * 2020-08-26 2020-12-11 北京小米松果电子有限公司 文章质量评分方法、文章质量评分装置及存储介质
CN113407663A (zh) * 2020-11-05 2021-09-17 腾讯科技(深圳)有限公司 基于人工智能的图文内容质量识别方法和装置


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170250A (zh) * 2022-09-02 2022-10-11 杭州洋驼网络科技有限公司 一种电商平台的物品信息管理方法以及装置
CN116011893A (zh) * 2023-03-27 2023-04-25 深圳新闻网传媒股份有限公司 基于垂直领域的市区融媒评价方法、装置及电子设备
CN116011893B (zh) * 2023-03-27 2023-05-26 深圳新闻网传媒股份有限公司 基于垂直领域的市区融媒评价方法、装置及电子设备
CN116719930A (zh) * 2023-04-28 2023-09-08 西安工程大学 基于视觉方面注意的多模态情感分析方法

Also Published As

Publication number Publication date
CN114818691A (zh) 2022-07-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22745342

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.12.2023)