CN111144103A - Film review identification method and device - Google Patents

Publication number
CN111144103A
Authority
CN
China
Prior art keywords
recognition model
evaluation
target
sample
comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911311671.9A
Other languages
Chinese (zh)
Inventor
李嘉琛
付骁弈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN201911311671.9A priority Critical patent/CN111144103A/en
Publication of CN111144103A publication Critical patent/CN111144103A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a film review identification method and device. The method comprises: acquiring a target film review to be identified; inputting the target film review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model with sample film reviews, the target recognition model is used to recognize a sample film review to obtain the evaluation type of the sample film review, the sample film reviews comprise a first sample film review and a second sample film review, the second sample film review is obtained by replacing a first named entity in the first sample film review with a second named entity, and the first named entity and the second named entity are of the same type; and acquiring the evaluation type of the target film review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation. The invention solves the technical problem of low film review identification efficiency in the related art.

Description

Film review identification method and device
Technical Field
The invention relates to the field of computers, in particular to a film review identification method and device.
Background
In the related art, a neural network model may be used to identify the type of a film review. However, the neural network model must be trained in advance, and training requires a large amount of sample data to be prepared beforehand.
The related art therefore requires too much sample data, and each piece of sample data takes substantial manual effort to prepare, which makes film review identification inefficient.
No effective solution to the above problem has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a film review identification method and device, which at least solve the technical problem of low film review identification efficiency in the related art.
According to an aspect of an embodiment of the present invention, there is provided a film review identification method including: acquiring a target film review to be identified; inputting the target film review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model with sample film reviews, the target recognition model is used to recognize a sample film review to obtain the evaluation type of the sample film review, the sample film reviews comprise a first sample film review and a second sample film review, the second sample film review is obtained by replacing a first named entity in the first sample film review with a second named entity, and the first named entity and the second named entity are of the same type; and acquiring the evaluation type of the target film review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation.
As an optional example, before inputting the target film review into the target recognition model, the method further includes: acquiring the first sample film review; determining the first named entity in the first sample film review; acquiring, from a knowledge graph, the second named entity of the same type as the first named entity; and replacing the first named entity with the second named entity to obtain the second sample film review.
As an optional example, before acquiring, from the knowledge graph, the second named entity of the same type as the first named entity, the method further includes: acquiring a plurality of first sample film reviews; performing word segmentation on each of the plurality of first sample film reviews, and labeling each word in each segmented first sample film review to obtain a labeling result; acquiring the words labeled as named entities from the labeling result; and establishing associations between words of the same type among the words labeled as named entities, and storing the associations in the knowledge graph.
As an optional example, after replacing the first named entity with the second named entity to obtain the second sample film review, the method further includes: inputting the sample film reviews into the original recognition model, and acquiring the evaluation type of each sample film review output by the original recognition model; and, after M sample film reviews have been input into the original recognition model and the evaluation types of the M sample film reviews have been output, determining the original recognition model as the target recognition model when the output evaluation types include N evaluation types that satisfy a predetermined condition, wherein the predetermined condition indicates that the evaluation type output by the original recognition model for the current sample film review is the same as the evaluation type labeled in advance for that sample film review, M and N are positive integers, and N/M is greater than a first threshold.
As an optional example, before inputting the target film review into the target recognition model, the method further includes: adding an attention layer to the target recognition model, wherein the attention layer is the product of a feature matrix and a word vector, the word vector being obtained by converting an entity phrase using the feature matrix in the target recognition model, the entity phrase is a phrase consisting of all named entities in the target film review, and the feature matrix is the matrix used in the target recognition model to extract word vector features; inputting the target film review into the target recognition model includes: inputting the target film review into the target recognition model to which the attention layer has been added; and acquiring the evaluation type of the target film review output by the target recognition model includes: acquiring the evaluation type of the target film review output by the target recognition model to which the attention layer has been added.
According to another aspect of the embodiments of the present invention, there is also provided a film review identification apparatus including: a first acquiring unit, configured to acquire a target film review to be identified; a first input unit, configured to input the target film review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model with sample film reviews, the target recognition model is used to recognize a sample film review to obtain the evaluation type of the sample film review, the sample film reviews comprise a first sample film review and a second sample film review, the second sample film review is obtained by replacing a first named entity in the first sample film review with a second named entity, and the first named entity and the second named entity are of the same type; and a second acquiring unit, configured to acquire the evaluation type of the target film review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation.
As an optional example, the apparatus further includes: a third acquiring unit, configured to acquire the first sample film review before the target film review is input into the target recognition model; a first determining unit, configured to determine the first named entity in the first sample film review; a fourth acquiring unit, configured to acquire, from a knowledge graph, the second named entity of the same type as the first named entity; and a replacing unit, configured to replace the first named entity with the second named entity to obtain the second sample film review.
As an optional example, the apparatus further includes: a fifth acquiring unit, configured to acquire a plurality of first sample film reviews before the second named entity of the same type as the first named entity is acquired from the knowledge graph; a word segmentation unit, configured to perform word segmentation on each of the plurality of first sample film reviews and to label each word in each segmented first sample film review to obtain a labeling result; a sixth acquiring unit, configured to acquire the words labeled as named entities from the labeling result; and an association unit, configured to establish associations between words of the same type among the words labeled as named entities and to store the associations in the knowledge graph.
As an optional example, the apparatus further includes: a second input unit, configured to input the sample film reviews into the original recognition model after the second sample film review is obtained by replacing the first named entity with the second named entity, and to acquire the evaluation type of each sample film review output by the original recognition model; and a second determining unit, configured to, after M sample film reviews have been input into the original recognition model and the evaluation types of the M sample film reviews have been output, determine the original recognition model as the target recognition model when the output evaluation types include N evaluation types that satisfy a predetermined condition, wherein the predetermined condition indicates that the evaluation type output by the original recognition model for the current sample film review is the same as the evaluation type labeled in advance for that sample film review, M and N are positive integers, and N/M is greater than a first threshold.
As an optional example, the apparatus further includes: an adding unit, configured to add an attention layer to the target recognition model before the target film review is input into the target recognition model, wherein the attention layer is the product of a feature matrix and a word vector, the word vector being obtained by converting an entity phrase using the feature matrix in the target recognition model, the entity phrase is a phrase consisting of all named entities in the target film review, and the feature matrix is the matrix used in the target recognition model to extract word vector features; the first input unit includes: an input module, configured to input the target film review into the target recognition model to which the attention layer has been added; and the second acquiring unit includes: an acquiring module, configured to acquire the evaluation type of the target film review output by the target recognition model to which the attention layer has been added.
In the embodiments of the invention, a target film review to be identified is acquired; the target film review is input into a target recognition model, wherein the target recognition model is obtained by training an original recognition model with sample film reviews, the target recognition model is used to recognize a sample film review to obtain the evaluation type of the sample film review, the sample film reviews comprise a first sample film review and a second sample film review, the second sample film review is obtained by replacing a first named entity in the first sample film review with a second named entity, and the first named entity and the second named entity are of the same type; and the evaluation type of the target film review output by the target recognition model is acquired, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation. With this method, when the film review recognition model is trained, the first named entity of a first sample film review can be replaced after the first sample film review is acquired, so that multiple pieces of sample data are obtained from one. This improves the efficiency of training the model and, in turn, the efficiency of film review identification, which solves the technical problem of low film review identification efficiency in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
Fig. 1 is a schematic flow chart of an optional film review identification method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an optional film review identification apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, there is provided a film review identification method. As an optional implementation, as shown in Fig. 1, the film review identification method includes:
S102, acquiring a target film review to be identified;
S104, inputting the target film review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model with sample film reviews, the target recognition model is used to recognize a sample film review to obtain the evaluation type of the sample film review, the sample film reviews comprise a first sample film review and a second sample film review, the second sample film review is obtained by replacing a first named entity in the first sample film review with a second named entity, and the first named entity and the second named entity are of the same type;
S106, acquiring the evaluation type of the target film review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation.
Optionally, the film review identification method may be applied to, but is not limited to, a terminal capable of computing, such as a mobile phone, a tablet computer, a notebook computer, or a PC. The terminal may interact with a server through a network, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes Wi-Fi and other networks that enable wireless communication. The wired network may include, but is not limited to, wide area networks, metropolitan area networks, and local area networks. The server may include, but is not limited to, any hardware device capable of performing computations.
Optionally, the film review identification method may be applied to identifying the type of a film review. For example, all the film reviews of a movie are collected and analyzed to obtain a rating for the movie. Alternatively, a user's movie preferences can be learned by analyzing the user's film reviews of individual movies.
Taking film review identification as an example, after a target film review of a movie is acquired, it is input into the target recognition model, and the target recognition model outputs whether the film review is a positive, neutral, or negative evaluation of the movie.
The target recognition model is obtained through training in advance. To train it, sample film reviews are first acquired. The sample film reviews may include a first sample film review. After the first sample film review is acquired, the named entities in it can be identified; there may be zero, one, or more of them. A first named entity is determined among the named entities (for a film review that contains no named entity, no first named entity is determined), and a second named entity of the same type as the first named entity then replaces the first named entity, yielding a second sample film review. In this way, replacing entities in one piece of sample data yields multiple pieces of sample data. Training the model with these multiple pieces of sample data improves the training efficiency of the target recognition model, and identifying the target film review with the target recognition model improves the efficiency of film review identification.
Optionally, the target recognition model is obtained by training the original recognition model. The recognition accuracy of the original recognition model may be low. When the original recognition model is trained with samples, it outputs an evaluation type for each piece of sample data that is input. If the output evaluation type matches the evaluation type labeled in advance for that sample data, the original recognition model has recognized it correctly. If the proportion of correct recognitions is greater than the first threshold, for example 99%, the model is accurate enough, and the original recognition model is determined as the target recognition model and put into use. The target recognition model may also be corrected during use to further improve accuracy.
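The acceptance check described above (N correct outputs out of M labeled samples, with N/M greater than the first threshold) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `meets_first_threshold`, the toy classifier `toy_recognize`, and the sample data are all hypothetical.

```python
from typing import Callable, Sequence, Tuple


def meets_first_threshold(
    recognize: Callable[[str], str],
    labeled_reviews: Sequence[Tuple[str, str]],
    first_threshold: float = 0.99,
) -> bool:
    """Return True when N/M is greater than the first threshold.

    M is the number of labeled sample reviews; N is how many of them the
    model assigns the same evaluation type as the pre-assigned label.
    """
    m = len(labeled_reviews)
    if m == 0:
        return False
    n = sum(1 for text, label in labeled_reviews if recognize(text) == label)
    return n / m > first_threshold


def toy_recognize(text: str) -> str:
    """Hypothetical stand-in for the original recognition model."""
    if "good" in text:
        return "positive"
    if "bad" in text:
        return "negative"
    return "neutral"


samples = [
    ("a good film", "positive"),
    ("a bad film", "negative"),
    ("a film", "neutral"),
    ("not good at all", "negative"),  # the toy model gets this one wrong
]

# 3 of 4 samples are recognized correctly, so N/M = 0.75.
print(meets_first_threshold(toy_recognize, samples, first_threshold=0.7))
print(meets_first_threshold(toy_recognize, samples))  # default threshold 0.99
```

In practice the check would be run on held-out labeled reviews after each round of training, and training would continue until it passes.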
Optionally, replacing a named entity as described above requires obtaining a second named entity of the same type as the first named entity. A knowledge graph of film reviews can be built in advance, in which the named entities are associated with one another and their types are determined.
A knowledge graph (also called a scientific knowledge map) is known in library and information science as knowledge domain visualization or a knowledge domain map. It is a family of graphs that show how knowledge develops and how it is structured: visualization techniques are used to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the relationships among them. The main goal of a knowledge graph is to describe the entities and concepts that exist in the real world and the relationships between them. A relationship describes the association between two entities, such as the relationship between Yao Ming and the Houston Rockets. The intrinsic characteristics of an entity are described with attribute-value pairs; for example, a person has attributes such as age, height, and weight.
A named entity is a real-world person or thing that can be named, such as a person, place, organization, or product. It may be a concrete entity or an abstract concept.
An entity is a node in the graph represented by the knowledge base and stands for an object or concept in the physical world. For example, "Beijing" may be one entity in the graph.
An entity type describes the category of information that an entity holds, such as person, organization, object, or concept. For example, Beijing and Shanghai are both of the place type.
A relationship is an edge in the graph represented by the knowledge base. It connects two entities, represents the association between them, and has a direction. For example, in "Beijing is located north of Shanghai", "Beijing" and "Shanghai" are both entities and "is located north of" is the relationship between them.
A relationship type is the type of association between two entities, or between an entity and an attribute. A relationship with a direction is directed; a relationship without a direction is undirected.
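The definitions above (entities as typed nodes, relationships as directed edges) can be illustrated with a small in-memory structure. This is a sketch under stated assumptions: the class `KnowledgeGraph`, its method names, and the relation label `located_north_of` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KnowledgeGraph:
    # entity name -> entity type (each entity is a node in the graph)
    entity_types: Dict[str, str] = field(default_factory=dict)
    # directed edges stored as (head entity, relation, tail entity)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, entity_type: str) -> None:
        self.entity_types[name] = entity_type

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        # Both endpoints must already be known entities.
        for name in (head, tail):
            if name not in self.entity_types:
                raise KeyError(f"unknown entity: {name}")
        self.edges.append((head, relation, tail))

    def entities_of_type(self, entity_type: str) -> List[str]:
        return sorted(n for n, t in self.entity_types.items() if t == entity_type)


graph = KnowledgeGraph()
graph.add_entity("Beijing", "place")
graph.add_entity("Shanghai", "place")
# The edge is directed: Beijing -> Shanghai, not the reverse.
graph.add_relation("Beijing", "located_north_of", "Shanghai")
print(graph.entities_of_type("place"))
```

Looking up all entities of one type is the operation the entity replacement step relies on.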
A sequence labeling task assigns a category label to each unit of a sequence, and is commonly used for word segmentation, part-of-speech tagging, named entity recognition, and similar tasks. For example, a simple sequence labeling task defines the labels in advance as person name (PER), organization (ORG), and other (O); a sequence labeling model then labels the sentence (sequence) "小明在北京大学就读" ("Xiaoming studies at Peking University") character by character, as shown in Table (1).
小 明 在 北 京 大 学 就 读
PER PER O ORG ORG ORG ORG O O
Table (1)
A conditional random field (CRF) is a statistical model used for sequence labeling tasks.
A long short-term memory network (LSTM) is a recurrent neural network model suited to sequential data such as time series, and is commonly used for text classification tasks. A Bi-LSTM uses two LSTMs to extract information from the text in the forward and backward directions of the sequence, respectively.
The acquired film reviews are labeled film reviews, for example:
{"content": "The final chapter of Avengers 4 is good. Using time travel to talk with one's earlier self is also a nice touch. In the end, the fates of Black Widow and Iron Man earn countless tears.",
"evaluation": "positive evaluation"}
First, entities (movie titles, actors, directors, characters, and so on) are extracted from the film review data set. This step, named entity recognition (NER), is a sequence labeling task. It can be performed with a keyword matching model based on custom rules, with a statistical algorithm such as a CRF model, or with a deep learning algorithm such as a Bi-LSTM model. For example, the film entity sequence labels can be designed using the IOB labeling scheme.
When labeling, B_MOV marks the first character of a movie entity, I_MOV marks a subsequent character of a movie entity, B_CHAR marks the first character of a character (role) entity, I_CHAR marks a subsequent character of a character entity, and O marks everything else, giving Table (2).
复 联 4 的 终 章 不 错 利 用 时
B_MOV I_MOV I_MOV O O O O O O O O
空 重 返 和 之 前 的 自 己 对 话
O O O O O O O O O O O
也 是 很 有 梗 的 最 终 黑 寡 妇
O O O O O O O O B_CHAR I_CHAR I_CHAR
和 钢 铁 侠 的 结 局 赚 泪 无 数
O B_CHAR I_CHAR I_CHAR O O O O O O O
Table (2)
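Once each character carries a B_/I_/O label, the named entities can be read off by collecting every run that starts with a B_ label and continues with matching I_ labels. The sketch below shows one way to do this; the function name `decode_entities` is hypothetical, and the Chinese characters are an assumed rendering of fragments of the example review.

```python
from typing import List, Optional, Tuple


def decode_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from per-character B_/I_/O tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal current, current_type
        if current and current_type is not None:
            entities.append(("".join(current), current_type))
        current, current_type = [], None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B_"):
            flush()
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I_") and current_type == tag[2:]:
            current.append(ch)
        else:
            # "O", or an I_ tag that does not continue the open entity.
            flush()
    flush()
    return entities


movie_chars = list("复联4的终章不错")
movie_tags = ["B_MOV", "I_MOV", "I_MOV", "O", "O", "O", "O", "O"]
role_chars = list("黑寡妇和钢铁侠")
role_tags = ["B_CHAR", "I_CHAR", "I_CHAR", "O", "B_CHAR", "I_CHAR", "I_CHAR"]

print(decode_entities(movie_chars, movie_tags))
print(decode_entities(role_chars, role_tags))
```

An I_ label that does not continue an open entity of the same type is treated like O here; a stricter decoder could raise an error instead.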
The above is the labeling result for one film review. When there are multiple film reviews, each of them can be labeled in the same way, yielding multiple named entities of the same type. The named entities of the same type are saved to the knowledge graph. When training the model, after a film review is acquired and labeled, multiple film reviews are obtained by replacing the named entities in it, so that the model is trained on more film reviews and trains better.
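Associating same-type named entities across many labeled reviews can be as simple as grouping them by type. The sketch below assumes the entities have already been extracted as (name, type) pairs per review; the function name `build_type_index` and the English entity names are illustrative, not part of the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_type_index(
    entities_per_review: Iterable[Iterable[Tuple[str, str]]],
) -> Dict[str, List[str]]:
    """Group named entities from many reviews by entity type."""
    index = defaultdict(set)
    for entities in entities_per_review:
        for name, entity_type in entities:
            index[entity_type].add(name)
    # Sort for a stable, reproducible order.
    return {entity_type: sorted(names) for entity_type, names in index.items()}


labeled = [
    [("Avengers 4", "MOV"), ("Black Widow", "CHAR"), ("Iron Man", "CHAR")],
    [("Titanic", "MOV"), ("Jack", "CHAR"), ("Rose", "CHAR")],
]
type_index = build_type_index(labeled)
print(type_index["MOV"])
print(type_index["CHAR"])
```

The resulting index supplies the candidate second named entities for each first named entity.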
For example, the entities in the previous example are replaced with the movie Titanic and the characters Jack and Rose to construct artificial data:
{"content": "The final chapter of [Titanic] is good. Using time travel to talk with one's earlier self is also a nice touch. In the end, the fates of [Jack] and [Rose] earn countless tears.",
"evaluation": "positive evaluation"}
In this way, another sample film review is obtained by substituting named entities such as Titanic, Jack, and Rose.
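The replacement step can be sketched as a single-pass substitution that keeps the original label. This is a minimal illustration under assumptions: the function name `replace_named_entities`, the dictionary layout of a review, and the explicit `replacements` mapping are hypothetical; in the described scheme the second named entities would come from the knowledge graph.

```python
import re
from typing import Dict


def replace_named_entities(
    review: Dict[str, str],
    replacements: Dict[str, str],
    entity_types: Dict[str, str],
) -> Dict[str, str]:
    """Build a second sample review by swapping each first named entity
    for a second named entity of the same type; the label is unchanged."""
    for first, second in replacements.items():
        if entity_types[first] != entity_types[second]:
            raise ValueError(f"{first!r} and {second!r} differ in type")
    if not replacements:
        return dict(review)
    # One pass over the text, longest names first, so that replaced text
    # is never matched again by a later replacement.
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    content = pattern.sub(lambda m: replacements[m.group(0)], review["content"])
    return {"content": content, "evaluation": review["evaluation"]}


entity_types = {
    "Avengers 4": "MOV", "Titanic": "MOV",
    "Black Widow": "CHAR", "Iron Man": "CHAR", "Jack": "CHAR", "Rose": "CHAR",
}
first_sample = {
    "content": "The final chapter of Avengers 4 is good. In the end, the fates "
               "of Black Widow and Iron Man earn countless tears.",
    "evaluation": "positive evaluation",
}
second_sample = replace_named_entities(
    first_sample,
    {"Avengers 4": "Titanic", "Black Widow": "Jack", "Iron Man": "Rose"},
    entity_types,
)
print(second_sample["content"])
```

Keeping the evaluation label fixed rests on the assumption that swapping same-type entities does not change the sentiment of the review.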
In addition to the above process, an attention layer may be added to the model in this solution. In a text classification task, adding an attention mechanism lets the model learn different weights for different words during training; the words that influence the classification result can be interpreted as the words the trained model "attends to".
Whether a text classification model is a statistics-based word frequency model or is based on a pre-trained language model, it essentially encodes the text as a weighted vector over some features. In a word frequency model, the features are a dictionary of words in which each word has a number, and the input text is a weighted vector over those words. With a pre-trained language model, the dictionary is in effect compressed in advance using a large amount of text data: each word of the input text is converted into a feature vector by the pre-trained model, and the text is represented as the mean of the feature vectors of all its words; preprocessing may also encode the positions of the words in the text into the vector. During training of the text classification model, suitable weights are chosen for the features to improve the accuracy of the classification model.
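As a small illustration of the word frequency view, the sketch below encodes a tokenized text as a count vector over a numbered dictionary. The function name `term_frequency_vector` and the tiny vocabulary are hypothetical; raw counts are only one possible weighting.

```python
from collections import Counter
from typing import List, Sequence


def term_frequency_vector(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Encode a text as word counts; position i corresponds to vocabulary[i]."""
    counts = Counter(tokens)
    # Words outside the vocabulary are ignored; absent words count as 0.
    return [counts[word] for word in vocabulary]


vocabulary = ["good", "bad", "film"]  # each word's index is its "number"
tokens = ["good", "film", "really", "good"]
print(term_frequency_vector(tokens, vocabulary))
```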
The entity words in the film and television knowledge graph are grouped by type, and each entity phrase is converted into a word vector using the feature matrix in the classification model. The product of the word vector and the feature matrix serves as the attention layer of the deep learning model; that is, the similarity between the text vector and the entity phrase vector is computed and the feature weights of the entity words are raised, so that during learning the model attends to the entity groups that influence the result. The feature weights then indicate where an emotion classification result comes from. For example, if the positive emotion of a film review comes from the film's characters, the weights of character nouns increase during learning. When a new text is input, its emotional polarity can be predicted, and the source of that polarity can be found from the distribution of the attention layer weights over the words.
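One possible reading of this attention layer is sketched below: each token is scored by the dot product between its vector and the mean vector of an entity group, and the scores are normalized with a softmax. This is an interpretation under stated assumptions rather than the patent's exact formulation; the function names, the two-dimensional toy embeddings, and the choice of softmax and mean pooling are all illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_attention(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    entity_group: Sequence[str],
) -> Tuple[List[float], List[float]]:
    """Return per-token attention weights and the weighted text vector."""
    dim = len(embeddings[tokens[0]])
    group_vecs = [embeddings[name] for name in entity_group]
    # Entity phrase vector: mean of the vectors of the entities in the group.
    group_vec = [sum(v[i] for v in group_vecs) / len(group_vecs) for i in range(dim)]
    token_vecs = [embeddings[t] for t in tokens]
    # Similarity of each token to the entity group, normalized to weights.
    weights = softmax([dot(v, group_vec) for v in token_vecs])
    pooled = [sum(w * v[i] for w, v in zip(weights, token_vecs)) for i in range(dim)]
    return weights, pooled


# Hypothetical embeddings; a real model would learn these.
embeddings = {
    "Black Widow": [1.0, 0.0],
    "Iron Man": [0.9, 0.1],
    "ending": [0.0, 1.0],
    "the": [0.1, 0.1],
}
tokens = ["the", "ending", "Black Widow"]
weights, pooled = entity_attention(tokens, embeddings, ["Black Widow", "Iron Man"])
print(weights)
```

With these toy vectors the character entity receives the largest weight, which is the behavior the text describes: the weight distribution points to the entity group behind the predicted polarity.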
Through this embodiment, the efficiency of identifying the target film review is improved. Furthermore, the added attention layer allows film reviews to be identified more accurately.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the invention, a film review identification apparatus for implementing the film review identification method is also provided. As shown in Fig. 2, the apparatus includes:
(1) a first acquiring unit 202, configured to acquire a target film review to be identified;
(2) a first input unit 204, configured to input the target film review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model with sample film reviews, the target recognition model is used to recognize a sample film review to obtain the evaluation type of the sample film review, the sample film reviews comprise a first sample film review and a second sample film review, the second sample film review is obtained by replacing a first named entity in the first sample film review with a second named entity, and the first named entity and the second named entity are of the same type;
(3) a second acquiring unit 206, configured to acquire the evaluation type of the target film review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation.
Optionally, the above movie review identification device may be applied to identifying the type of a movie review. For example, all reviews of a movie can be collected and analyzed to obtain the movie's rating. Alternatively, a user's movie preferences can be learned by analyzing that user's reviews of each movie.
Taking movie review identification as an example: after a target movie review of a movie is obtained, it is input into the target recognition model, which outputs whether the review is a positive, neutral, or negative evaluation of the movie.
The target recognition model is obtained through pre-training. To train it, sample movie reviews are first acquired. The sample movie reviews may include a first sample movie review. After the first sample movie review is acquired, the named entities in it are identified; there may be zero, one, or more. A first named entity is selected from them (for a review that contains no named entity, no first named entity is determined), and a second named entity of the same type is then substituted for it, yielding a second sample movie review. In this way, replacing entities in one piece of sample data yields multiple pieces of sample data. Training the model on this larger sample set improves the training efficiency of the target recognition model, and using the target recognition model to identify the target movie review improves the efficiency of movie review identification.
Optionally, the target recognition model is obtained by training the original recognition model. The recognition accuracy of the original recognition model may initially be low. During training, sample data is input into the original recognition model, which outputs an evaluation type for it. If the output evaluation type matches the evaluation type with which the sample was labeled in advance, the recognition is correct. Once the proportion of correct recognitions exceeds a first threshold, for example 99%, the model is considered accurate enough; the original recognition model is then taken as the target recognition model and put into use. The target recognition model may also be corrected during use to further improve accuracy.
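The acceptance test described above (N correct outputs out of M samples, with N/M greater than a first threshold) can be sketched as follows. The function name `is_accurate_enough` and the example labels are assumptions for illustration; only the N/M comparison comes from the text.

```python
def is_accurate_enough(predicted, labeled, first_threshold=0.99):
    """Return True when the share of correct predictions N/M exceeds the threshold."""
    if len(predicted) != len(labeled) or not labeled:
        raise ValueError("need two non-empty lists of equal length")
    m = len(labeled)                                          # M samples evaluated
    n = sum(1 for p, y in zip(predicted, labeled) if p == y)  # N correct outputs
    return n / m > first_threshold

# Hypothetical pre-labeled evaluation types and model outputs.
labels = ["positive", "negative", "neutral", "positive"]
outputs = ["positive", "negative", "positive", "positive"]

ready = is_accurate_enough(outputs, labels, first_threshold=0.7)  # 3/4 > 0.7
```

With the default threshold of 0.99, the same outputs would not qualify, and training would continue.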
Optionally, replacing named entities as described above requires obtaining a second named entity of the same type as the first named entity. A knowledge graph of movie reviews can be built in advance to associate the named entities and determine the type of each.
A knowledge graph (also called a scientific knowledge map) is known in library and information science as knowledge domain visualization or a knowledge domain map. It is a family of graphs that show the development and structure of knowledge: it uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelations among them. The main goal of a knowledge graph is to describe the entities and concepts that exist in the real world and the relationships between them. A relationship describes the association between two entities, such as the relationship between Yao Ming and the Houston Rockets. The intrinsic characteristics of an entity are described with attribute-value pairs; for example, a person has attributes such as age, height, and weight.
A named entity is a real-world person or thing that can be named, such as a person, place, organization, or product. It may be a concrete entity or an abstract concept.
An entity is a node in the graph represented by the knowledge base and stands for an object or concept in the physical world. For example, "Beijing" may be one entity in the graph.
An entity type describes the kind of information held, such as a person, organization, object, or concept. For example, Beijing and Shanghai are both of the location type.
A relationship is an edge in the graph represented by the knowledge base. It connects two entities, represents the association between them, and has a direction. For example, in the statement that "Beijing" is located north of "Shanghai", "Beijing" and "Shanghai" are both entities, and "is located north of" is the relationship between them.
The relationship type is the type of the relationship between two entities, or between an entity and an attribute. A relationship that has a direction is directed; one that has no direction is undirected.
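As a minimal sketch of the definitions above, entities can be modeled as typed nodes and relationships as typed edges that are either directed or undirected. The class names `Entity` and `Relation`, the field names, and the relation labels are assumptions chosen for illustration, not structures defined by this application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str  # e.g. "location", "person"

@dataclass(frozen=True)
class Relation:
    head: Entity
    relation_type: str
    tail: Entity
    directed: bool = True

    def connects(self, source, target):
        """True if this edge can be read from source to target."""
        if self.head == source and self.tail == target:
            return True
        # An undirected edge can also be read in the reverse direction.
        return (not self.directed) and self.head == target and self.tail == source

beijing = Entity("Beijing", "location")
shanghai = Entity("Shanghai", "location")

# Directed relationship: Beijing is located north of Shanghai.
north_of = Relation(beijing, "located_north_of", shanghai, directed=True)
# Hypothetical undirected relationship between the same two entities.
same_country = Relation(beijing, "same_country_as", shanghai, directed=False)
```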
A sequence labeling task assigns a category label to each unit of a sequence and is commonly used for word segmentation, part-of-speech tagging, named entity recognition, and similar tasks. For example, a simple sequence labeling task defines its labels in advance: person name (PER), organization (ORG), and other (O). The sequence labeling model then labels the sentence (sequence) "Xiaoming studies at Peking University" (小明在北京大学就读) character by character, as shown in Table (1) below.
Character  小   明   在  北   京   大   学   就  读
Label      PER  PER  O   ORG  ORG  ORG  ORG  O   O
Table (1)
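To make the labeling in Table (1) concrete, the following sketch merges consecutive characters that share a label into spans. It assumes the example sentence is 小明在北京大学就读 with the nine labels shown in the table; the function name `labeled_spans` is hypothetical.

```python
from itertools import groupby

def labeled_spans(chars, labels, outside="O"):
    """Merge consecutive characters that share a non-outside label into spans."""
    spans = []
    for label, group in groupby(zip(chars, labels), key=lambda pair: pair[1]):
        if label != outside:
            spans.append(("".join(ch for ch, _ in group), label))
    return spans

chars = list("小明在北京大学就读")
labels = ["PER", "PER", "O", "ORG", "ORG", "ORG", "ORG", "O", "O"]

spans = labeled_spans(chars, labels)
```

With this simple label set, two adjacent entities of the same type would be merged into one span, which is one reason schemes with separate begin and inside labels are often preferred.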
A conditional random field (CRF) is a statistical model commonly used for sequence labeling tasks.
A long short-term memory network (LSTM) is a recurrent neural network model suited to sequential data such as time series and is commonly used for text classification tasks. A Bi-LSTM uses two LSTM layers to extract information from the text in the forward and backward directions of the sequence, respectively.
Each acquired movie review is an annotated movie review, for example:
{"content": "The final chapter of Avengers 4 is pretty good. Using time travel to go back and talk with one's past self is also very entertaining. In the end, the fates of Black Widow and Iron Man earn countless tears.",
"evaluation": "positive evaluation"}
First, entities (movie names, actors, directors, roles, …) are extracted from the movie review data set. This step, named entity recognition (NER), is a sequence labeling task. It can be performed by a keyword matching model based on custom rules, by a statistical algorithm such as a CRF model, or by a deep learning algorithm such as a Bi-LSTM model. For example, the movie entity sequence annotation can be designed using the IOB annotation scheme.
When labeling, B_MOV may mark the first character of a movie entity, I_MOV the subsequent characters of a movie entity, B_CHAR the first character of a role (character) entity, I_CHAR the subsequent characters of a role entity, and O everything else, giving Table (2) below.
Character  复     联     4      的  终  章  不  错  利  用  时
Label      B_MOV  I_MOV  I_MOV  O   O   O   O   O   O   O   O
Character  空  重  返  和  之  前  的  自  己  对  话
Label      O   O   O   O   O   O   O   O   O   O   O
Character  也  是  很  有  梗  的  最  终  黑      寡      妇
Label      O   O   O   O   O   O   O   O   B_CHAR  I_CHAR  I_CHAR
Character  和  钢      铁      侠      的  结  局  赚  泪  无  数
Label      O   B_CHAR  I_CHAR  I_CHAR  O   O   O   O   O   O   O
Table (2)
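Of the NER options mentioned above, the keyword matching variant is the simplest to illustrate. The sketch below tags each character by longest-match lookup in an entity dictionary; the function name `iob_tag`, the lexicon contents, and the longest-match rule are assumptions for illustration, and a CRF or Bi-LSTM tagger would replace this function in a statistical or deep learning setup.

```python
def iob_tag(text, lexicon):
    """Tag each character with B_/I_/O labels by longest-match dictionary lookup.

    lexicon maps an entity string to its type suffix, e.g. {"复联4": "MOV"}.
    """
    tags = ["O"] * len(text)
    # Try longer entity names first; ignore empty keys to guarantee progress.
    names = sorted((n for n in lexicon if n), key=len, reverse=True)
    i = 0
    while i < len(text):
        match = next((n for n in names if text.startswith(n, i)), None)
        if match is None:
            i += 1
            continue
        suffix = lexicon[match]
        tags[i] = "B_" + suffix
        for j in range(i + 1, i + len(match)):
            tags[j] = "I_" + suffix
        i += len(match)
    return tags

# Hypothetical entity dictionary for the example review.
lexicon = {"复联4": "MOV", "黑寡妇": "CHAR", "钢铁侠": "CHAR"}
text = "最终黑寡妇和钢铁侠的结局赚泪无数"
tags = iob_tag(text, lexicon)
```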
The above is the labeling result for one movie review. When there are multiple movie reviews, each can be labeled in the same way, which yields multiple named entities of the same type. These same-type named entities are saved to the knowledge graph. When the model is trained with movie reviews, each acquired review is labeled, and further reviews are generated by replacing its named entities, so that the model is trained on more reviews and trains better.
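Collecting same-type entities from many labeled reviews can be sketched as below. The sketch assumes character-level B_/I_/O tags as in Table (2) and uses a plain dictionary of sets as a stand-in for the knowledge graph; the function names `decode_entities` and `group_by_type` are hypothetical.

```python
from collections import defaultdict

def decode_entities(chars, tags):
    """Turn character-level B_/I_/O tags into (entity, type) pairs."""
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        tag = tag.upper()  # tolerate mixed-case tags such as B_char
        if tag.startswith("B_"):
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I_") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities

def group_by_type(labeled_reviews):
    """Collect entities of the same type across many labeled reviews."""
    groups = defaultdict(set)
    for chars, tags in labeled_reviews:
        for name, entity_type in decode_entities(chars, tags):
            groups[entity_type].add(name)
    return groups

# Two hypothetical labeled reviews.
reviews = [
    (list("复联4不错"), ["B_MOV", "I_MOV", "I_MOV", "O", "O"]),
    (list("黑寡妇和钢铁侠"),
     ["B_char", "I_char", "I_char", "O", "B_char", "I_char", "I_char"]),
]
groups = group_by_type(reviews)
```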
For example, the entities in the previous example can be replaced with the movie Titanic and its characters Jack and Rose to construct artificial data:
{"content": "The final chapter of [Titanic] is pretty good. Using time travel to go back and talk with one's past self is also very entertaining. In the end, the fates of [Jack] and [Rose] earn countless tears.",
"evaluation": "positive evaluation"}
Thus, another sample movie review is obtained by substituting named entities such as Titanic, Jack, and Rose.
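The substitution step can be sketched as follows. This is an illustration under stated assumptions, not the implementation of this application: the function name `augment_review`, the dictionary-based knowledge graph, the English sample text, and the choice to replace one entity at a time (rather than several at once, as in the example above) are all hypothetical.

```python
def augment_review(sample, entities, knowledge_graph):
    """Create new samples by swapping each named entity for same-type alternatives.

    sample: {"content": str, "evaluation": str}
    entities: list of (name, entity_type) pairs found in the content
    knowledge_graph: dict mapping entity_type -> list of entity names
    """
    present = {name for name, _ in entities}
    augmented = []
    for name, entity_type in entities:
        for candidate in knowledge_graph.get(entity_type, []):
            if candidate in present:
                continue  # skip the entity itself and others already in the review
            augmented.append({
                "content": sample["content"].replace(name, candidate),
                "evaluation": sample["evaluation"],  # label is carried over unchanged
            })
    return augmented

sample = {
    "content": "The fates of Black Widow and Iron Man earn countless tears.",
    "evaluation": "positive evaluation",
}
entities = [("Black Widow", "CHAR"), ("Iron Man", "CHAR")]
knowledge_graph = {"CHAR": ["Black Widow", "Iron Man", "Jack", "Rose"]}

new_samples = augment_review(sample, entities, knowledge_graph)
```

Note that `str.replace` substitutes every occurrence of the entity string, including occurrences inside longer words; a production version would more likely replace by labeled character span.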
In addition to the above process, an attention layer may be added to the model in this solution. In a text classification task, adding an attention mechanism lets the model learn different weights for different words during training; the words that influence the classification result can be interpreted as the words the trained model "attends to".
A text classification model, whether a statistical word frequency model or a pre-trained language model, essentially encodes the text as a weighted vector of features. In a word frequency model, the features are a dictionary of words in which each word has an index, and the input text is a weighted vector over those words. With a pre-trained language model, a large amount of text data has in effect been used in advance to compress the dictionary: each word of the input text is converted into a feature vector by the pre-trained model, and the text is represented as the mean of the feature vectors of all its words. Additional preprocessing is possible, such as encoding the position of each word in the text into the vector. During training of the text classification model, suitable weights are selected for the features to improve classification accuracy.
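The two text encodings contrasted above can be sketched side by side. The vocabulary, the toy embedding values, and the function names `bag_of_words` and `mean_embedding` are assumptions for illustration; a real pre-trained model would supply the feature vectors.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Word frequency model: one count per vocabulary entry."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def mean_embedding(tokens, embeddings):
    """Stand-in for a pre-trained model: average the feature vectors of known tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

vocabulary = ["ending", "good", "tears"]
tokens = ["good", "ending", "good"]

bow = bag_of_words(tokens, vocabulary)  # [1, 2, 0]

# Hypothetical 2-dimensional feature vectors.
embeddings = {"good": [1.0, 0.0], "ending": [0.0, 1.0]}
avg = mean_embedding(tokens, embeddings)
```

In both cases the classifier then learns a weight per feature; the attention layer adjusts how strongly individual words contribute before that step.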
The entity words in the film and television knowledge graph are grouped by type. Each entity word group is converted into a word vector using the feature matrix of the classification model, and the product of the word vector and the feature matrix serves as the attention layer of the deep learning model. In effect, this computes the similarity between the text vector and the entity word group vector and raises the feature weight of the entity words, so that during learning the model attends to the entity groups that influence the result. The origin of a sentiment classification result can then be judged from the feature weights: for example, if the positive sentiment of a movie review stems from the film's characters, the weight of character nouns increases during training. When a new text is input, its sentiment polarity can be predicted, and the source of that polarity can be traced through the distribution of the attention layer weights over the words.
Through this embodiment, the efficiency of identifying target movie reviews is improved. Furthermore, because the attention layer is added, movie reviews can be identified more accurately.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A movie review identification method, characterized by comprising:
acquiring a target movie review to be identified;
inputting the target movie review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model using sample movie reviews, the target recognition model is used for recognizing a sample movie review to obtain an evaluation type of the sample movie review, the sample movie reviews comprise a first sample movie review and a second sample movie review, the second sample movie review is obtained by replacing a first named entity in the first sample movie review with a second named entity, and the first named entity and the second named entity are of the same type;
and acquiring the evaluation type of the target movie review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation.
2. The method of claim 1, wherein before the inputting of the target movie review into the target recognition model, the method further comprises:
acquiring the first sample movie review;
determining the first named entity in the first sample movie review;
acquiring, from a knowledge graph, the second named entity of the same type as the first named entity;
and replacing the first named entity with the second named entity to obtain the second sample movie review.
3. The method of claim 2, wherein before the acquiring, from the knowledge graph, of the second named entity of the same type as the first named entity, the method further comprises:
acquiring a plurality of first sample movie reviews;
performing word segmentation on each of the plurality of first sample movie reviews, and labeling each word in each segmented first sample movie review to obtain a labeling result;
acquiring the words labeled as named entities from the labeling result;
and establishing associations among words of the same type among the words labeled as named entities, and storing the associations in the knowledge graph.
4. The method of claim 2, wherein after the replacing of the first named entity with the second named entity to obtain the second sample movie review, the method further comprises:
inputting the sample movie reviews into the original recognition model, and acquiring the evaluation types of the sample movie reviews output by the original recognition model;
after M sample movie reviews have been input into the original recognition model and the evaluation types of the M sample movie reviews have been output, determining the original recognition model as the target recognition model in a case where the output evaluation types of the M sample movie reviews include N evaluation types meeting a predetermined condition, wherein the predetermined condition indicates that the evaluation type of the current sample movie review output by the original recognition model is the same as the pre-labeled evaluation type of the current sample movie review, M and N are positive integers, and N/M is greater than a first threshold.
5. The method according to any one of claims 1 to 4, wherein
before the inputting of the target movie review into the target recognition model, the method further comprises: adding an attention layer to the target recognition model, wherein the attention layer is the product of a feature matrix and a word vector, the word vector being obtained by converting an entity phrase using the feature matrix in the target recognition model, the entity phrase is a phrase consisting of all named entities in the target movie review, and the feature matrix is a matrix used for extracting word vector features in the target recognition model;
the inputting of the target movie review into the target recognition model comprises: inputting the target movie review into the target recognition model to which the attention layer has been added;
the acquiring of the evaluation type of the target movie review output by the target recognition model comprises: acquiring the evaluation type of the target movie review output by the target recognition model to which the attention layer has been added.
6. A movie review identification device, comprising:
a first acquisition unit, configured to acquire a target movie review to be identified;
a first input unit, configured to input the target movie review into a target recognition model, wherein the target recognition model is obtained by training an original recognition model using sample movie reviews, the target recognition model is used for recognizing a sample movie review to obtain an evaluation type of the sample movie review, the sample movie reviews comprise a first sample movie review and a second sample movie review, the second sample movie review is obtained by replacing a first named entity in the first sample movie review with a second named entity, and the first named entity and the second named entity are of the same type;
and a second acquisition unit, configured to acquire the evaluation type of the target movie review output by the target recognition model, wherein the evaluation type is a positive evaluation, a neutral evaluation, or a negative evaluation.
7. The device of claim 6, further comprising:
a third acquisition unit, configured to acquire the first sample movie review before the target movie review is input into the target recognition model;
a first determination unit, configured to determine the first named entity in the first sample movie review;
a fourth acquisition unit, configured to acquire, from a knowledge graph, the second named entity of the same type as the first named entity;
and a replacing unit, configured to replace the first named entity with the second named entity to obtain the second sample movie review.
8. The device of claim 7, further comprising:
a fifth acquisition unit, configured to acquire a plurality of first sample movie reviews before the second named entity of the same type as the first named entity is acquired from the knowledge graph;
a word segmentation unit, configured to perform word segmentation on each of the plurality of first sample movie reviews and to label each word in each segmented first sample movie review to obtain a labeling result;
a sixth acquisition unit, configured to acquire the words labeled as named entities from the labeling result;
and an association unit, configured to establish associations among words of the same type among the words labeled as named entities and to store the associations in the knowledge graph.
9. The device of claim 7, further comprising:
a second input unit, configured to input the sample movie reviews into the original recognition model after the first named entity is replaced with the second named entity to obtain the second sample movie review, and to acquire the evaluation types of the sample movie reviews output by the original recognition model;
a second determination unit, configured to, after M sample movie reviews have been input into the original recognition model and the evaluation types of the M sample movie reviews have been output, determine the original recognition model as the target recognition model in a case where the output evaluation types of the M sample movie reviews include N evaluation types meeting a predetermined condition, wherein the predetermined condition indicates that the evaluation type of the current sample movie review output by the original recognition model is the same as the pre-labeled evaluation type of the current sample movie review, M and N are positive integers, and N/M is greater than a first threshold.
10. The device according to any one of claims 6 to 9, wherein
the device further comprises: an adding unit, configured to add an attention layer to the target recognition model before the target movie review is input into the target recognition model, wherein the attention layer is the product of a feature matrix and a word vector, the word vector being obtained by converting an entity phrase using the feature matrix in the target recognition model, the entity phrase is a phrase consisting of all named entities in the target movie review, and the feature matrix is a matrix used for extracting word vector features in the target recognition model;
the first input unit comprises: an input module, configured to input the target movie review into the target recognition model to which the attention layer has been added;
the second acquisition unit comprises: an acquisition module, configured to acquire the evaluation type of the target movie review output by the target recognition model to which the attention layer has been added.
CN201911311671.9A 2019-12-18 2019-12-18 Film review identification method and device Withdrawn CN111144103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911311671.9A CN111144103A (en) 2019-12-18 2019-12-18 Film review identification method and device

Publications (1)

Publication Number Publication Date
CN111144103A true CN111144103A (en) 2020-05-12

Family

ID=70518807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911311671.9A Withdrawn CN111144103A (en) 2019-12-18 2019-12-18 Film review identification method and device

Country Status (1)

Country Link
CN (1) CN111144103A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200512