CN107657056B - Method and device for displaying comment information based on artificial intelligence - Google Patents

Publication number
CN107657056B
Authority
CN
China
Prior art keywords
comment
current
information
quality
comment information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710972692.XA
Other languages
Chinese (zh)
Other versions
CN107657056A (en)
Inventor
刘建林
徐伟建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710972692.XA
Publication of CN107657056A
Application granted
Publication of CN107657056B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a method and a device for displaying comment information based on artificial intelligence. One embodiment of the method comprises: acquiring current comment information; performing word segmentation on the current comment information to obtain a current comment word segmentation sequence; calculating a feature value of a preset feature based on the current comment word segmentation sequence and a dictionary module; inputting the feature value into a scoring machine learning model to obtain a quality score of the current comment information; and determining the comment information for display based on the quality score of the current comment information. This embodiment improves the quality of the displayed comment information.

Description

Method and device for displaying comment information based on artificial intelligence
Technical Field
The embodiments of the application relate to the technical field of computers, specifically to the technical field of computer networks, and in particular to a method and a device for displaying comment information based on artificial intelligence.
Background
The rapid development of Artificial Intelligence (AI) technology provides convenience for people's daily work and life. Artificial intelligence is a new technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems, among others. Artificial intelligence is increasingly integrated into applications. Applications that incorporate artificial intelligence can accurately determine the type of comment information and push different types of comment information to the corresponding users according to the needs of those users.
With the rapid development of artificial intelligence, countless enterprises are seeking to give machines human-like intelligence. In this context, it is desirable that a machine, when presenting comment information to a user, presents comment information of better quality.
In the current method for calculating the quality score of the comment, a commonly used solution in the industry is to extract viewpoint words and feature words, judge emotional tendency, evaluate the comment quality of the comment information by combining the emotional tendency and the viewpoint features, and display the comment information according to the evaluation result.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for displaying comment information based on artificial intelligence.
In a first aspect, an embodiment of the present application provides a method for displaying comment information based on artificial intelligence, where the method includes: acquiring current comment information; performing word segmentation on the current comment information to obtain a current comment word segmentation sequence; calculating a feature value of a preset feature based on the current comment word segmentation sequence and a dictionary module; inputting the feature value into a scoring machine learning model to obtain a quality score of the current comment information; and determining the comment information for display based on the quality score of the current comment information.
In some embodiments, the dictionary module includes: a keyword sub-dictionary comprising filter words and keywords; and/or a duplicate review sub-dictionary including duplicate review data.
In some embodiments, calculating the feature value of the preset feature based on the current comment word segmentation sequence and the dictionary module includes: obtaining the comment quality related features in the current comment word segmentation sequence that hit the dictionary module; extracting the comment quality associated features from the current comment word segmentation sequence; and calculating the feature value of the preset feature based on the comment quality related features and the comment quality associated features.
In some embodiments, the comment quality associated features include one or more of: the proportion of English, numeric, or special character strings; the Chinese proportion; whether contact information is included; the word-granularity repetition degree; and the authority of the comment.
In some embodiments, the scoring machine learning model is trained via the following steps: obtaining a comment sample marked with a quality score; extracting characteristic values of the comment samples; training the machine learning model based on the feature values and the quality scores of the comment samples.
In some embodiments, the scoring machine learning model is trained using any one of the following algorithms: random forest algorithm, support vector machine algorithm, logistic regression algorithm, Bayesian classifier and neural network algorithm.
In a second aspect, an embodiment of the present application provides an apparatus for displaying comment information based on artificial intelligence, where the apparatus includes: the comment information acquisition unit is used for acquiring current comment information; the word segmentation sequence determining unit is used for segmenting words of the current comment information to obtain a current comment word segmentation sequence; the characteristic value calculating unit is used for calculating the characteristic value of the preset characteristic based on the current comment word segmentation sequence and the dictionary module; the quality score calculating unit is used for inputting the characteristic value into the scoring machine learning model to obtain the quality score of the current comment information; and the display information determining unit is used for determining the comment information for display based on the quality score of the current comment information.
In some embodiments, the dictionary module in the feature value calculation unit includes: a keyword sub-dictionary comprising filter words and keywords; and/or a duplicate review sub-dictionary including duplicate review data.
In some embodiments, the feature value calculation unit includes: a related feature obtaining subunit, used for obtaining the comment quality related features in the current comment word segmentation sequence that hit the dictionary module; an associated feature extraction subunit, used for extracting the comment quality associated features from the current comment word segmentation sequence; and a feature value calculation subunit, used for calculating the feature value of the preset feature based on the comment quality related features and the comment quality associated features.
In some embodiments, the comment quality associated features extracted by the associated feature extraction subunit include one or more of: the proportion of English, numeric, or special character strings; the Chinese proportion; whether contact information is included; the word-granularity repetition degree; and the authority of the comment.
In some embodiments, the scoring machine learning model in the quality score calculation unit is trained via the following steps: obtaining a comment sample marked with a quality score; extracting characteristic values of the comment samples; training the machine learning model based on the feature values and the quality scores of the comment samples.
In some embodiments, the scoring machine learning model in the quality score calculating unit is trained by using any one of the following algorithms: random forest algorithm, support vector machine algorithm, logistic regression algorithm, Bayesian classifier and neural network algorithm.
In a third aspect, an embodiment of the present application provides a device, including: one or more processors; and a storage means for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the above methods for displaying comment information based on artificial intelligence.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the above methods for displaying comment information based on artificial intelligence.
According to the method and the device for displaying the comment information based on artificial intelligence, the current comment information is firstly obtained; then, segmenting words of the current comment information to obtain a current comment segmentation sequence; then, calculating a characteristic value of a preset characteristic based on the current comment word segmentation sequence and a dictionary module; then, inputting the characteristic value into a scoring machine learning model to obtain the quality score of the current comment information; then, based on the quality score of the current comment information, comment information for presentation is determined. In the process, the characteristic value of the preset characteristic can be effectively calculated based on the dictionary module, so that low-quality comments can be filtered, high-quality comments can be screened for display, a large amount of auditing manpower is saved, and the quality of displayed comment information is improved.
Drawings
Other features, objects and advantages of embodiments of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a method for presenting review information based on artificial intelligence in accordance with the present application;
FIG. 2 is a schematic flow chart diagram illustrating yet another embodiment of a method for presenting review information based on artificial intelligence in accordance with the present application;
FIG. 3 is a schematic flow chart diagram of one application scenario of a method for displaying review information based on artificial intelligence in accordance with the present application;
FIG. 4 is an exemplary block diagram of one embodiment of an apparatus for presenting review information based on artificial intelligence in accordance with the present application;
FIG. 5 is a schematic block diagram of a computer system suitable for implementing the terminal device or server of the present application.
Detailed Description
The embodiments of the present application will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present application, the embodiments and the features of the embodiments may be combined with each other without conflict. The embodiments of the present application will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
FIG. 1 illustrates a flow 100 of one embodiment of a method for displaying review information based on artificial intelligence in accordance with the present application. The method for displaying the comment information based on artificial intelligence comprises the following steps:
In step 110, current comment information is acquired.
In this embodiment, the current comment information acquired by the electronic device running the method for displaying comment information based on artificial intelligence may be a comment information set for a certain product or service, where the comment information set includes a plurality of pieces of comment information related to the product or service.
In step 120, the current comment information is segmented to obtain a current comment segmentation sequence.
In this embodiment, when segmenting words of current comment information, a word segmentation method in the prior art or a technology developed in the future may be adopted, which is not limited in this application. For example, word segmentation may be done based on any of the following word segmentation methods: a word segmentation method based on character string matching, a word segmentation method based on understanding, a word segmentation method based on statistics, a word segmentation method based on rules and the like.
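As an illustration of the first option listed above (word segmentation based on character string matching), the following is a minimal sketch of forward maximum matching. The function name `forward_max_match`, the toy vocabulary, and the `max_word_len` parameter are assumptions made for this example; the application does not prescribe a particular segmentation algorithm.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Segment text by greedily taking the longest vocabulary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        # A single character is always accepted as a fallback.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical vocabulary used only for this example.
vocab = {"服务", "态度", "很好", "服务态度"}
print(forward_max_match("服务态度很好", vocab))
```

The resulting token list plays the role of the current comment word segmentation sequence used in the later steps.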
In step 130, a feature value of the preset feature is calculated based on the current comment segmentation sequence and the dictionary module.
In this embodiment, after the current comment word segmentation sequence is obtained, the feature value of the preset feature may be calculated based on the features in the dictionary module. The dictionary module contains features set according to the content to be filtered and the content to be focused on. For example, the dictionary module may include pornographic words, reactionary words, profanity, and the like that need to be filtered, and may further include attribute words, viewpoint words, and the like that need attention. In one optional example, the dictionary module may include a keyword sub-dictionary, which may include filter words that need to be filtered and/or keywords that need attention. Alternatively or additionally, the dictionary module may include a duplicate comment sub-dictionary, which may include duplicate comment data. Duplicate comment data is data that records information about repeated comments. For example, it may include the message digest identifier of a comment, the object of the comment, the time of the comment, the identity of the reviewer, the content of the comment, and the like. When the message digest identifier of the current comment information hits the duplicate comment data, the current comment information may be added to the duplicate comment data, thereby updating the duplicate comment sub-dictionary; low-quality spam ("flooding") comments among the current comments can then be identified according to the duplicate comment data. Further, the content to be filtered and the content to be focused on can be set separately for each category of field to which the current comment information belongs.
The preset features are features set for the content to be filtered and the content to be focused on. They may consist only of features in the dictionary module, or they may include features in the dictionary module, preset features outside the dictionary module, and preset features that can be calculated from both.
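The dictionary module described above can be sketched as a small class holding a keyword sub-dictionary and a duplicate comment sub-dictionary. Everything named here (`DictionaryModule`, `keyword_hits`, `record_comment`, the use of SHA-256 as the message digest, and whitespace stripping before hashing) is an assumption for illustration. As a simplification, the sketch records every comment under its digest rather than only those that hit existing duplicate data.

```python
import hashlib


class DictionaryModule:
    """Hypothetical container for the two sub-dictionaries."""

    def __init__(self, filter_words, keywords):
        self.filter_words = set(filter_words)  # words that need to be filtered
        self.keywords = set(keywords)          # words that need attention
        self.duplicates = {}                   # digest -> list of comment records

    @staticmethod
    def digest(content):
        # Assumption: whitespace is ignored and SHA-256 is the message digest.
        normalized = "".join(content.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def keyword_hits(self, tokens):
        """Return the tokens that hit the keyword sub-dictionary."""
        return {
            "filter": [t for t in tokens if t in self.filter_words],
            "keyword": [t for t in tokens if t in self.keywords],
        }

    def record_comment(self, content, target, reviewer, timestamp):
        """Store the comment and return how many earlier comments share its digest."""
        records = self.duplicates.setdefault(self.digest(content), [])
        earlier = len(records)
        records.append({"target": target, "reviewer": reviewer,
                        "time": timestamp, "content": content})
        return earlier


module = DictionaryModule(filter_words={"spamword"}, keywords={"price"})
print(module.keyword_hits(["good", "price", "spamword"]))
print(module.record_comment("nice place", "beach", "user1", 1))
print(module.record_comment("nice  place", "beach", "user2", 2))
```

A non-zero return value from `record_comment` indicates that the same content has been seen before, which is one possible signal for a repeated, low-quality comment.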
The method for calculating the feature value of the preset feature may be a method for calculating the feature value of the preset feature in the prior art or a technology developed in the future, and the method is not limited in this application. For example, the feature value of the preset feature may be calculated based on a word frequency-inverse document frequency (TF-IDF) algorithm, or may be calculated based on a preset rule for calculating the feature value.
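As a rough illustration of the TF-IDF option mentioned above, the sketch below scores each word of a segmented comment against a small corpus of segmented comments. The function name `tf_idf`, the toy corpus, and the smoothed IDF formula are assumptions; the application only names the algorithm and does not fix a variant.

```python
import math
from collections import Counter


def tf_idf(tokens, corpus):
    """Score each term in tokens; corpus is a list of token lists."""
    counts = Counter(tokens)
    total = len(tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in counts.items():
        tf = count / total
        df = sum(1 for doc in corpus if term in doc)
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = tf * idf
    return scores


# Hypothetical corpus of already segmented comments.
corpus = [["good", "price"], ["good", "service"], ["good", "beach"]]
scores = tf_idf(["good", "price"], corpus)
print(scores)
```

In this toy corpus the word "price" scores higher than "good", because "good" appears in every comment and therefore carries less information.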
For example, the feature values of the preset features may include, but are not limited to, the values of the following features: whether advertising words are included, the number of attribute words included, the number of emotion words included, the word-granularity repetition degree, the proportion of English, numeric, or special character strings, whether contact information is included, question sentences, the repetition rate of the comment content, the Chinese proportion, the authority of the comment, and the like.
In step 140, the feature values are input into the scoring machine learning model to obtain the quality score of the current comment information.
In this embodiment, the scoring machine learning model is a model trained in advance. It automatically learns, from comment sample data, the rule relating the feature values of the comment samples to their quality scores, and uses that rule to make predictions for previously unseen feature values.
In an optional implementation manner of the present embodiment, the scoring machine learning model is trained through the following steps: obtaining a comment sample marked with a quality score; extracting characteristic values of the comment samples; training the machine learning model based on the feature values and the quality scores of the comment samples.
In this implementation manner, the scoring machine learning model may learn a rule between the comment sample and the quality score from the comment sample, and further obtain a rule between the feature value of the comment sample and the quality score from the feature value extracted from the comment sample, so that the quality score of the comment can be inferred from the feature value of the comment by using the rule.
The above scoring machine learning model may be obtained by training a classification algorithm in the prior art or in a technology developed in the future by using a comment sample labeled with a quality score, which is not limited in this application. For example, a comment sample labeled with a quality score may be used to train any one of the following algorithms: random forest algorithm, support vector machine algorithm, logistic regression algorithm, Bayesian classifier and neural network algorithm.
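The training steps above can be sketched with logistic regression, one of the listed algorithms, written here in plain Python so that the example stays self-contained. The two features (an advertising-word flag and a content-word ratio), the toy samples, the binary labels, and the hyperparameters are all assumptions for illustration; the predicted probability is used as the quality score.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit weights and bias with stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            err = p - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def quality_score(features, weights, bias):
    """Return a score in (0, 1); higher means higher predicted quality."""
    return sigmoid(sum(w * v for w, v in zip(weights, features)) + bias)


# Hypothetical labeled samples: [has_advertising_word, content_word_ratio].
samples = [[1, 0.1], [1, 0.2], [0, 0.7], [0, 0.9]]
labels = [0, 0, 1, 1]  # 1 = high quality, 0 = low quality (assumed labeling)
weights, bias = train_logistic(samples, labels)
print(quality_score([0, 0.8], weights, bias))
print(quality_score([1, 0.15], weights, bias))
```

A production system would more likely rely on an established machine learning library and a far larger labeled sample; the sketch only shows the shape of the train-then-score flow.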
In step 150, the review information for presentation is determined based on the quality score of the current review information.
In this embodiment, after obtaining the quality score of the current comment information based on the scoring machine learning model, the quality score may be used as an important factor for ranking the current comment information, so that the current comment information ranked earlier is determined as the comment information for presentation.
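A minimal sketch of this selection step follows. It ranks by quality score alone, whereas the text above describes the score as one important ranking factor; the function name `select_for_display` and the `top_k` and `min_score` parameters are assumptions.

```python
def select_for_display(comments, scores, top_k=3, min_score=0.0):
    """Rank comments by quality score and keep the best top_k above min_score."""
    ranked = sorted(zip(comments, scores), key=lambda pair: pair[1], reverse=True)
    return [comment for comment, score in ranked if score >= min_score][:top_k]


# Hypothetical comments and scores.
print(select_for_display(["a", "b", "c"], [0.2, 0.9, 0.5], top_k=2))
```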
According to the method for displaying the comment information based on the artificial intelligence, the feature value of the preset feature which needs to be filtered and/or needs to be focused in the current comment information can be calculated, then the feature value is input into a pre-trained scoring machine learning model, the quality score of the current comment information is deduced, and the comment information used for displaying is determined based on the quality score of the current comment information, so that the accuracy rate of the quality score of the current comment information is improved.
Further, referring to fig. 2, fig. 2 shows a schematic flow diagram of yet another embodiment of a method for presenting review information based on artificial intelligence according to the present application.
As shown in FIG. 2, the method 200 for displaying comment information based on artificial intelligence includes:
In step 210, current comment information is acquired.
In this embodiment, the current comment information acquired by the electronic device running the method for displaying comment information based on artificial intelligence may be a comment information set for a certain product or service, where the comment information set includes a plurality of pieces of comment information related to the product or service.
In step 220, word segmentation is performed on the current comment information to obtain a current comment word segmentation sequence.
In this embodiment, when segmenting words of current comment information, a word segmentation method in the prior art or a technology developed in the future may be adopted, which is not limited in this application. For example, word segmentation may be done based on any of the following word segmentation methods: a word segmentation method based on character string matching, a word segmentation method based on understanding, a word segmentation method based on statistics, a word segmentation method based on rules and the like.
In step 230, the comment quality related features in the current comment word segmentation sequence that hit the dictionary module are obtained.
In this embodiment, the dictionary module contains features set according to the content to be filtered and the content to be focused on. For example, the dictionary module may include pornographic words, reactionary words, profanity, and the like that need to be filtered, and may further include attribute words, viewpoint words, and the like that need attention. In one optional example, the dictionary module may include a keyword sub-dictionary, which may include filter words that need to be filtered and/or keywords that need attention. Alternatively or additionally, the dictionary module may include a duplicate comment sub-dictionary, which may include duplicate comment data. Duplicate comment data is data that records information about repeated comments. For example, it may include the message digest identifier of a comment, the object of the comment, the time of the comment, the identity of the reviewer, the content of the comment, and the like. When the message digest identifier of the current comment information hits the duplicate comment data, the current comment information may be added to the duplicate comment data, thereby updating the duplicate comment sub-dictionary; low-quality spam ("flooding") comments among the current comments can then be identified according to the duplicate comment data. Further, the content to be filtered and the content to be focused on can be set separately for each category of field to which the current comment information belongs.
If a word in the current comment word segmentation sequence hits a feature included in the dictionary module, that word needs to be filtered or focused on, and it can be determined to be a comment quality related feature.
In step 240, the comment quality associated features in the current comment word segmentation sequence are extracted.
In this embodiment, besides the features marked in the dictionary module as needing filtering or attention, the current comment word segmentation sequence also contains other features associated with comment quality that need attention, such as whether contact information is included, the word-granularity repetition degree of the comment, the authority of the comment, and the like. These comment quality associated features are extracted for subsequent use in determining the quality score of the current comment information.
In some optional implementations of the present embodiment, the comment quality associated features include one or more of: the proportion of English, numeric, or special character strings; the Chinese proportion; whether contact information is included; the word-granularity repetition degree; and the authority of the comment.
In this implementation, the Chinese proportion refers to the proportion of Chinese text. In a specific implementation, it may be obtained by dividing the comment length by a set value (e.g., 120 or another set integer), and the resulting length ratio may be adjusted to account for the number of punctuation marks and the number of English characters.
Here, whether contact information is included may cover: whether a phone number is included, the number of uniform resource locators (URLs) included, whether an instant messaging number is included, whether an email address is included, and whether a phone number, an instant messaging number, or an email address is found after the comment has been processed, and the like.
Here, the authority of the comment may include the authority of the comment content and/or the authority of the comment length. The content authority is the proportion of content words, that is, words that carry concrete meaning, such as nouns and adjectives. The authority of the comment length may be positively correlated with the length of the comment: the longer the comment, the higher the corresponding length authority. When the authority of the comment includes both the authority of the comment content and the authority of the comment length, the two may be fused, for example by weighting, to obtain the authority of the comment.
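One possible reading of this length-based ratio is sketched below. The function name `chinese_length_ratio`, the decision to count only characters in the basic CJK range (so that punctuation and English do not contribute to the length), and the cap at 1.0 are assumptions; the text above only says the length is divided by a set value and adjusted for punctuation and English.

```python
def chinese_length_ratio(comment, set_value=120):
    """Ratio of Chinese character count to a set value, capped at 1.0."""
    # Assumption: only characters in U+4E00..U+9FFF count as Chinese, so
    # punctuation, digits, and English letters are excluded from the length.
    chinese_chars = sum(1 for ch in comment if "\u4e00" <= ch <= "\u9fff")
    return min(chinese_chars / set_value, 1.0)


print(chinese_length_ratio("服务很好, ok!"))
```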
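The contact checks listed above can be approximated with regular expressions, as in the sketch below. The patterns (an 11-digit mainland China mobile number, a simple URL pattern, a simple email pattern) and the interpretation of "after processing" as removing separator characters that are inserted to evade detection are assumptions for illustration. Instant messaging numbers are omitted.

```python
import re

# Assumed patterns; real systems would need broader and more careful rules.
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contact_features(comment):
    """Return contact-related features for one comment."""
    # Assumed "processing": drop whitespace and common separator characters.
    compact = re.sub(r"[\s\-_.·*]+", "", comment)
    return {
        "has_phone": bool(PHONE_RE.search(comment)),
        "url_count": len(URL_RE.findall(comment)),
        "has_email": bool(EMAIL_RE.search(comment)),
        "has_phone_after_processing": bool(PHONE_RE.search(compact)),
    }


print(contact_features("call 138-1234-5678 or visit http://example.com"))
```

In this example the hyphenated number is missed by the direct check but found after the separators are removed, which is the purpose of the processed variants of each feature.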
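The weighted fusion described above can be sketched as follows. The part-of-speech tag names, the weight of 0.6 for content authority, and the length at which the length authority saturates (`full_length=50` tokens) are assumptions; the text above states only that the two authorities may be fused by weighting.

```python
# Assumed tag names for content words (nouns and adjectives).
CONTENT_TAGS = {"noun", "adjective"}


def comment_authority(tagged_tokens, content_weight=0.6, full_length=50):
    """Fuse content authority and length authority with a weighted sum."""
    if not tagged_tokens:
        return 0.0
    content_words = sum(1 for _, tag in tagged_tokens if tag in CONTENT_TAGS)
    content_authority = content_words / len(tagged_tokens)
    # Length authority grows with comment length and saturates at 1.0.
    length_authority = min(len(tagged_tokens) / full_length, 1.0)
    return content_weight * content_authority + (1 - content_weight) * length_authority


# Hypothetical tagged comment: (word, part-of-speech tag).
tagged = [("price", "noun"), ("is", "verb"), ("fair", "adjective"), ("very", "adverb")]
print(comment_authority(tagged))
```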
In step 250, a feature value of the preset feature is calculated based on the comment quality related features and the comment quality associated features.
In this embodiment, the preset features are features set according to the content to be filtered and the content to be focused on. They may include features in the dictionary module, such as filter words and keywords. They may further include preset features outside the dictionary module, such as whether contact information is included. They may also include preset features that can be calculated from the features in the dictionary module and the preset features outside it, such as whether advertising words are included, the number of attribute words included, the number of emotion words included, the number of times the comment content appears repeatedly in the whole comment library, and the number of entity names covered by the comment content.
In step 260, the feature values are input into the scoring machine learning model to obtain the quality score of the current comment information.
In this embodiment, the scoring machine learning model is a model trained in advance. It automatically learns, from comment sample data, the rule relating the feature values of the comment samples to their quality scores, and uses that rule to make predictions for previously unseen feature values.
In step 270, the review information for presentation is determined based on the quality score of the current review information.
In this embodiment, after obtaining the quality score of the current comment information based on the scoring machine learning model, the quality score may be used as an important factor for ranking the current comment information, so that the current comment information ranked earlier is determined as the comment information for presentation.
It should be understood by those skilled in the art that steps 210, 220, 260, 270 of the embodiment shown in fig. 2 correspond to steps 110, 120, 140, 150 of the embodiment shown in fig. 1, respectively, and thus, the operations and features described above for steps 110, 120, 140, 150 of the method 100 for displaying comment information based on artificial intelligence in fig. 1 are also applicable to steps 210, 220, 260, 270 of the method 200 for displaying comment information based on artificial intelligence, and are not described in detail here.
According to the method for displaying comment information based on artificial intelligence provided by this embodiment, when the feature value of the preset feature is calculated based on the current comment participle sequence and the dictionary module, the comment quality related features that hit the dictionary module are obtained, the comment quality associated features in the current comment participle sequence are extracted, and the feature value of the preset feature is then calculated based on both the comment quality related features and the comment quality associated features. The quality score of the current comment information is thus calculated from features covering several aspects of what needs to be filtered and what needs attention, which improves the accuracy of the quality score and, in turn, the quality of the displayed comment information.
An exemplary application scenario of the artificial intelligence based comment information presentation method of the present application is described below in conjunction with fig. 3.
FIG. 3 is a schematic flowchart of one application scenario of the artificial intelligence based method for displaying comment information according to the present application.
As shown in fig. 3, a method 300 for displaying comment information based on artificial intelligence is executed in an electronic device 320, and includes:
First, current comment information 301 is acquired, for example, current comment information including the following comments:
① "The system is very hot, has a fairly public price, has good service attitude of merchants, has good soft and hard environment, is only a plurality of tourists, has few personnel for security and maintenance, and almost does not see the personnel, and the work in the aspect is enhanced to prevent emergencies."
② "When the person goes for a short time, the sea water and sand are good, but the construction of the supporting facilities is incomplete, and in addition, the service personnel are the biggest patrolling of beautiful scenic spots."
③ "Church building which is the best known in Tianjin, Byzantine style, which is built in the 10 th century. The world is still the largest place for astronomical education activities, the world can be disseminated on time in the daytime, religious weddings can be dealt with, and the world is a place where young people gather, namely a mountain and sea."
④ "Fjjhgghuuyyhjjhyyyyyyy5(;hhiiiffhjikjjjjhhhhjjjjhggggbjjygh"
⑤ "traveling farmhouse happy shop 1851222 XXXX"
⑥ "The fifteen families with the orderly fifteen characters like that of seeing the orderly fifteen characters like that of saying that the fifteen characters are the most standard"
Then, word segmentation is performed on the current comment information 301 to obtain a current comment participle sequence 302; for example, each of the six comments above (① to ⑥) is segmented to obtain its own current comment participle sequence.
Then, the current comment participle sequence 302 is compared with the dictionary module 303 to obtain the features in the current comment participle sequence 302 that hit the dictionary module 303, and the hit features are used as the comment quality related features 304. Here, the dictionary module includes a keyword sub-dictionary and a repeated comment sub-dictionary. The keyword sub-dictionary includes filter words and keywords. Taking the travel field as an example, the filter words may include pornographic words, reactionary words, and profanity, such as "adult drawings", "knocked down", and "Wangbai eggs"; the keywords may include "environment", "tourist", "sea water", "scenic spots", "price", "building", "style", "location", and the like. The repeated comment sub-dictionary includes repeated comment data, such as the message digest identifier of the comment "fifteen people who have seen such regular fifteen words like they all say that fifteen words are the most standard", the subject of the comment, the time of the comment, the identity of the reviewer, the comment content, and so on.
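The dictionary comparison described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `digest` and `dictionary_hits`, the use of SHA-256 as the message digest, and the in-memory sets are all assumptions; only the example words come from the text.

```python
import hashlib

# Hypothetical keyword sub-dictionary, filled with the travel-field examples from the text.
FILTER_WORDS = {"adult drawings", "knocked down"}
KEYWORDS = {"environment", "tourist", "sea water", "scenic spots",
            "price", "building", "style", "location"}


def digest(text: str) -> str:
    """Message digest identifying a comment (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_hits(tokens, raw_text, repeated_digests):
    """Return the features of a segmented comment that hit the dictionary module.

    tokens: the participle (word segmentation) sequence of the comment.
    raw_text: the original comment text.
    repeated_digests: hypothetical repeated comment sub-dictionary, as a set of digests.
    """
    token_set = set(tokens)
    return {
        "filter_word_hits": sorted(token_set & FILTER_WORDS),
        "keyword_hits": sorted(token_set & KEYWORDS),
        "is_repeated": digest(raw_text) in repeated_digests,
    }
```

A set intersection is enough here because the sub-dictionaries are small; a production dictionary would more likely live in a dedicated store keyed by field category.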
Then, the comment quality associated features 305 are extracted from the current comment participle sequence 302, for example: the proportion of English, numeric, or special character strings; the proportion of Chinese characters; whether contact information is included; the word-granularity repetition; and the authority of the comment.
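A few of these extracted features can be approximated with simple character statistics. The sketch below is illustrative only: the feature names, the CJK range used to detect Chinese characters, the rule that a run of seven or more digits counts as contact information, and the repetition formula are assumptions, and the authority of the comment is omitted because the text does not say how it is measured.

```python
import re

# Basic CJK Unified Ideographs block, used here as a stand-in for "Chinese characters".
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: a run of 7 or more digits is treated as a phone-like contact.
CONTACT = re.compile(r"\d{7,}")


def associated_features(text, tokens):
    """Compute simple quality-associated features of one comment."""
    chars = [c for c in text if not c.isspace()]
    total = len(chars) or 1  # avoid division by zero on empty comments
    chinese = sum(1 for c in chars if CJK.match(c))
    ascii_chars = sum(1 for c in chars if c.isascii())
    repetition = (1 - len(set(tokens)) / len(tokens)) if tokens else 0.0
    return {
        "chinese_ratio": chinese / total,
        # English letters, digits, and ASCII punctuation together.
        "ascii_ratio": ascii_chars / total,
        "has_contact": bool(CONTACT.search(text)),
        # 0.0 when every token is distinct, approaching 1.0 when tokens repeat.
        "token_repetition": repetition,
    }
```

Under these assumptions, a comment such as ⑤ in the example would be flagged by `has_contact`, and a keyboard-mash comment such as ④ would show a high `ascii_ratio`.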
Then, based on the comment quality related features 304 and the comment quality associated features 305, a feature value 306 of each preset feature is calculated. The preset features may include features in the dictionary module 303, such as filter words and keywords; preset features outside the dictionary module 303, such as whether contact information is included; and preset features that can be calculated from both, such as whether advertisement words are included, the number of attribute words included, the number of emotion words included, the number of times the comment content is repeated in the whole comment library, the number of entity names covered by the comment content, the maximum number of times the comment content appears under a single entity name, the number of reviewers covered by the comment content, the maximum number of times the comment content is posted by a single reviewer, the maximum span (in weeks) of the comment times covered by the comment content, the maximum number of times the comment content appears within a single week, and the like.
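The library-level repetition features in this list are simple aggregations over the comment library. The following sketch assumes a hypothetical library of `(content, entity_name, reviewer, date)` tuples, exact-match comparison of comment content, whole weeks computed as days divided by seven, and ISO calendar weeks for the per-week count; none of these details are specified in the text.

```python
from collections import Counter
from datetime import date


def repetition_features(content, library):
    """Aggregate how often `content` recurs across a comment library.

    library: iterable of (content, entity_name, reviewer, date) tuples (assumed layout).
    """
    rows = [r for r in library if r[0] == content]
    if not rows:
        return {"repeat_count": 0, "entity_count": 0, "max_per_entity": 0,
                "reviewer_count": 0, "max_per_reviewer": 0,
                "span_weeks": 0, "max_per_week": 0}
    by_entity = Counter(r[1] for r in rows)
    by_reviewer = Counter(r[2] for r in rows)
    # Group by (ISO year, ISO week) so counts do not mix weeks of different years.
    by_week = Counter(tuple(r[3].isocalendar()[:2]) for r in rows)
    days = [r[3] for r in rows]
    return {
        "repeat_count": len(rows),
        "entity_count": len(by_entity),
        "max_per_entity": max(by_entity.values()),
        "reviewer_count": len(by_reviewer),
        "max_per_reviewer": max(by_reviewer.values()),
        "span_weeks": (max(days) - min(days)).days // 7,
        "max_per_week": max(by_week.values()),
    }
```

Identical text posted under many entity names or by one reviewer many times, as with comment ⑥ in the example, would show up as large values in these fields.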
Then, the feature value 306 is input into the scoring machine learning model 307, and the quality score 308 of the current comment information is obtained. For example, the quality scores of the six comments above (① to ⑥) are obtained as follows:
① "The system is very hot, has a fairly public price, has good service attitude of merchants, has good soft and hard environment, is only a plurality of tourists, has few personnel for security and maintenance, and almost does not see the personnel, and the work in the aspect is enhanced to prevent emergencies." Quality score: 0.974417
② "When the person goes for a short time, the sea water and sand are good, but the construction of the supporting facilities is incomplete, and in addition, the service personnel are the biggest patrolling of beautiful scenic spots." Quality score: 0.922083
③ "Church building which is the best known in Tianjin, Byzantine style, which is built in the 10 th century. The world is still the largest place for astronomical education activities, the world can be disseminated on time in the daytime, religious weddings can be dealt with, and the world is a place where young people gather, namely a mountain and sea." Quality score: 0.701083
④ "Fjjhgghuuyyhjjhyyyyyyy5(;hhiiiffhjikjjjjhhhhjjjjhggggbjjygh" Quality score: 0.055500
⑤ "traveling farmhouse happy shop 1851222 XXXX" Quality score: 0.038083
⑥ "The fifteen families with the orderly fifteen characters like that of seeing the orderly fifteen characters like that of saying that the fifteen characters are the most standard" Quality score: 0.061250
Finally, based on the quality score 308 of the current comment information, the comment information 309 for presentation is determined. For example, of the six comments above (① to ⑥), comments ① to ③, which have the higher quality scores, may be determined as the comment information for presentation.
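As a rough illustration of this last step, the six quality scores listed above can be ranked and cut off as follows. The comments are keyed 1 to 6 in the order given; the function name `select_for_display`, the 0.5 threshold, and the optional `top_k` cut-off are assumptions, since the text only says that higher-scoring comments are chosen.

```python
# Quality scores of the six example comments, keyed 1..6 in the order given above.
EXAMPLE_SCORES = {1: 0.974417, 2: 0.922083, 3: 0.701083,
                  4: 0.055500, 5: 0.038083, 6: 0.061250}


def select_for_display(scores, threshold=0.5, top_k=None):
    """Keep comments scoring at least `threshold`, best first, optionally capped at top_k."""
    kept = [(cid, s) for cid, s in scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    # Slicing with None keeps every remaining comment.
    return [cid for cid, _ in kept[:top_k]]


print(select_for_display(EXAMPLE_SCORES))  # [1, 2, 3]
```

In practice the quality score might be only one of several ranking factors, as the description of step 270 suggests, so a pure score sort is the simplest possible policy.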
It should be understood that the method shown in fig. 3 is only an exemplary embodiment of the method for displaying comment information based on artificial intelligence and does not limit the embodiments of the present application. For example, in fig. 3, the comment quality associated features need not be extracted; instead, the feature value may be calculated directly from the comment quality related features that hit the dictionary module, the quality score of the current comment information may be determined from the feature value, and the comment information for presentation may finally be determined based on the quality score.
The method for displaying comment information based on artificial intelligence provided in this application scenario can improve the quality of the comment information that is displayed.
Further referring to fig. 4, as an implementation of the above method, the present application provides an embodiment of an apparatus for displaying comment information based on artificial intelligence, where the embodiment of the apparatus for displaying comment information based on artificial intelligence corresponds to the embodiment of the method for displaying comment information based on artificial intelligence shown in fig. 1 to 3, and thus, the operations and features described above for the method for displaying comment information based on artificial intelligence in fig. 1 to 3 are also applicable to the apparatus 400 for displaying comment information based on artificial intelligence and units included in the apparatus, and are not described again here.
As shown in fig. 4, the apparatus 400 for displaying comment information based on artificial intelligence includes: a comment information acquisition unit 410, a participle sequence determination unit 420, a feature value calculation unit 430, a quality score calculation unit 440, and a presentation information determination unit 450.
The comment information acquisition unit 410 is configured to acquire current comment information; the participle sequence determination unit 420 is configured to segment the current comment information to obtain a current comment participle sequence; the feature value calculation unit 430 is configured to calculate a feature value of a preset feature based on the current comment participle sequence and the dictionary module; the quality score calculation unit 440 is configured to input the feature value into the scoring machine learning model to obtain a quality score of the current comment information; and the presentation information determination unit 450 is configured to determine the comment information for presentation based on the quality score of the current comment information.
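One way to picture how these five units cooperate is as a small pipeline in which each unit is a pluggable callable. The sketch below is an assumption-laden illustration: the class `CommentDisplayPipeline`, its field names, and the score threshold are hypothetical, and the toy components in the usage lines stand in for a real word segmenter, feature calculator, and trained model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CommentDisplayPipeline:
    segment: Callable[[str], List[str]]            # participle sequence determination
    featurize: Callable[[List[str]], List[float]]  # feature value calculation
    score: Callable[[List[float]], float]          # quality score calculation
    threshold: float = 0.5                         # assumed cut-off for presentation

    def run(self, comments: List[str]) -> List[str]:
        """Take acquired comments and return those chosen for presentation, best first."""
        scored = []
        for text in comments:
            tokens = self.segment(text)
            features = self.featurize(tokens)
            scored.append((self.score(features), text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for s, text in scored if s >= self.threshold]


# Toy components: whitespace splitting, keyword counting, and a capped linear score.
pipeline = CommentDisplayPipeline(
    segment=str.split,
    featurize=lambda tokens: [float(len(set(tokens) & {"price", "environment"}))],
    score=lambda feats: min(1.0, feats[0] / 2),
)
print(pipeline.run(["good price and environment", "zzz zzz"]))
```

Keeping the units as separate callables mirrors the apparatus description, where each unit can be replaced without changing the others.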
In some optional implementations of this embodiment, the dictionary module in the feature value calculation unit includes: a keyword sub-dictionary comprising filter words and keywords; and/or a duplicate review sub-dictionary including duplicate review data.
In some optional implementations of the present embodiment, the feature value calculation unit 430 includes: a related feature obtaining subunit 431, configured to obtain the comment quality related features that hit the dictionary module in the current comment participle sequence; an associated feature extraction subunit 432, configured to extract the comment quality associated features in the current comment participle sequence; and a feature value calculation subunit 433, configured to calculate the feature value of the preset feature based on the comment quality related features and the comment quality associated features.
In some optional implementations of the present embodiment, the comment quality associated features extracted by the associated feature extraction subunit include one or more of the following: the proportion of English, numeric, or special character strings; the proportion of Chinese characters; whether contact information is included; the word-granularity repetition; and the authority of the comment.
In some optional implementations of the present embodiment, the scoring machine learning model in the quality score calculating unit is trained through the following steps: obtaining a comment sample marked with a quality score; extracting characteristic values of the comment samples; training the machine learning model based on the feature values and the quality scores of the comment samples.
In some optional implementations of the present embodiment, the scoring machine learning model in the quality score calculating unit is obtained by training with any one of the following algorithms: a random forest algorithm, a support vector machine algorithm, a logistic regression algorithm, a Bayesian classifier, or a neural network algorithm.
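The three training steps (labelled samples, feature values, model fitting) can be sketched with logistic regression, one of the algorithms named above. This is a minimal pure-Python illustration under stated assumptions: the three-element feature vectors (keyword count, contact flag, ASCII ratio), the tiny hand-made sample set, the 0/1 quality labels, and the learning rate and epoch count are all invented for the example and are not taken from the patent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent on the cross-entropy loss."""
    n, m = len(samples[0]), len(samples)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(model, x):
    """Return a quality score in (0, 1) for one feature vector."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical samples: [keyword count, has contact (0/1), ASCII ratio] with 0/1 labels.
X = [[3, 0, 0.0], [2, 0, 0.1], [4, 0, 0.05], [0, 1, 0.4], [0, 0, 1.0], [0, 0, 0.0]]
y = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
model = train(X, y)
```

After training on this toy data, `predict(model, [3, 0, 0.0])` comes out above 0.5 and `predict(model, [0, 1, 0.5])` below it. A real system would use far more samples and would more likely rely on an established machine learning library.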
The present application further provides an embodiment of an apparatus, comprising: one or more processors; and storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for displaying comment information based on artificial intelligence as described in any of the above.
The present application further provides an embodiment of a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for displaying comment information based on artificial intelligence as described in any of the above.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a terminal device or server of an embodiment of the present application is shown. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card, a modem, or the like. The communication portion 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage portion 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the method of the embodiment of the present application when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium described in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present application, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a comment information acquisition unit, a participle sequence determination unit, a feature value calculation unit, a quality score calculation unit, and a presentation information determination unit. In some cases the names of these units do not limit the units themselves; for example, the comment information acquisition unit may also be described as a "unit that acquires current comment information".
As another aspect, an embodiment of the present application further provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus of the foregoing embodiment, or a non-volatile computer storage medium that exists separately and is not incorporated into a terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: acquire current comment information; segment the current comment information to obtain a current comment participle sequence; calculate a feature value of a preset feature based on the current comment participle sequence and a dictionary module; input the feature value into a scoring machine learning model to obtain the quality score of the current comment information; and determine the comment information for presentation based on the quality score of the current comment information.
The above description covers only preferred embodiments of the present application and illustrates the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention according to the embodiments of the present application is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept set forth above. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present application.

Claims (10)

1. A method for displaying comment information based on artificial intelligence is characterized by comprising the following steps:
acquiring current comment information;
segmenting words of the current comment information to obtain a current comment segmentation sequence;
calculating a feature value of a preset feature based on the current comment participle sequence and a dictionary module, comprising: obtaining comment quality related features that hit the dictionary module in the current comment participle sequence; extracting comment quality associated features in the current comment participle sequence; and calculating the feature value of the preset feature based on the comment quality related features and the comment quality associated features; wherein the dictionary module includes: a keyword sub-dictionary comprising filter words and keywords, and/or a repeated comment sub-dictionary comprising repeated comment data; and the filter words and the keywords are classified and set according to the category of the field to which the current comment information belongs;
inputting the characteristic value into a scoring machine learning model to obtain a quality score of the current comment information;
and determining the comment information for display based on the quality score of the current comment information.
2. The method of claim 1, wherein the review quality associated features include one or more of:
the proportion of English, number or special character string;
the proportion of Chinese characters;
whether the contact information is contained;
word granularity repetition; and
authority of the review.
3. The method of claim 1, wherein the scoring machine learning model is trained by:
obtaining a comment sample marked with a quality score;
extracting characteristic values of the comment samples;
training a machine learning model based on the feature values and the quality scores of the comment samples.
4. The method of claim 1 or 3, wherein the scoring machine learning model is trained by using any one of the following algorithms:
random forest algorithm, support vector machine algorithm, logistic regression algorithm, Bayesian classifier and neural network algorithm.
5. An apparatus for displaying comment information based on artificial intelligence, the apparatus comprising:
the comment information acquisition unit is used for acquiring current comment information;
the word segmentation sequence determining unit is used for segmenting words of the current comment information to obtain a current comment word segmentation sequence;
the characteristic value calculating unit is used for calculating the characteristic value of the preset characteristic based on the current comment word segmentation sequence and the dictionary module; the dictionary module includes: a keyword sub-dictionary comprising filter words and keywords, and/or a repeated comment sub-dictionary comprising repeated comment data; the filtering words and the keywords are classified and set according to the category of the field to which the current comment information belongs;
the quality score calculating unit is used for inputting the characteristic value into a scoring machine learning model to obtain the quality score of the current comment information;
the display information determining unit is used for determining comment information for display based on the quality score of the current comment information;
the feature value calculation unit includes: a related feature obtaining subunit, used for obtaining the comment quality related features that hit the dictionary module in the current comment participle sequence; an associated feature extraction subunit, used for extracting the comment quality associated features in the current comment participle sequence; and a feature value calculation subunit, used for calculating the feature value of the preset feature based on the comment quality related features and the comment quality associated features.
6. The apparatus of claim 5, wherein the review quality associated features extracted by the associated feature extraction subunit include one or more of:
the proportion of English, number or special character string;
the proportion of Chinese characters;
whether the contact information is contained;
word granularity repetition; and
authority of the review.
7. The apparatus of claim 5, wherein the scoring machine learning model in the quality score calculation unit is trained by:
obtaining a comment sample marked with a quality score;
extracting characteristic values of the comment samples;
training a machine learning model based on the feature values and the quality scores of the comment samples.
8. The apparatus according to claim 5 or 7, wherein the scoring machine learning model in the quality score calculating unit is trained by using any one of the following algorithms:
random forest algorithm, support vector machine algorithm, logistic regression algorithm, Bayesian classifier and neural network algorithm.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for displaying comment information based on artificial intelligence of any one of claims 1-4.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of presenting review information based on artificial intelligence according to any one of claims 1 to 4.
CN201710972692.XA 2017-10-18 2017-10-18 Method and device for displaying comment information based on artificial intelligence Active CN107657056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710972692.XA CN107657056B (en) 2017-10-18 2017-10-18 Method and device for displaying comment information based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710972692.XA CN107657056B (en) 2017-10-18 2017-10-18 Method and device for displaying comment information based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN107657056A CN107657056A (en) 2018-02-02
CN107657056B true CN107657056B (en) 2022-02-18

Family

ID=61118900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710972692.XA Active CN107657056B (en) 2017-10-18 2017-10-18 Method and device for displaying comment information based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN107657056B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant