CN115438667A - Information processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115438667A
Authority
CN
China
Prior art keywords
information
comment
preset
positive
negative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211268580.3A
Other languages
Chinese (zh)
Inventor
何泊宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an information processing method, an information processing device, electronic equipment and a storage medium. The information processing method comprises the following steps: obtaining user comment information according to a community text access interface; determining the emotion score of the user comment information according to a preset comment analysis model; and calling a preset information processing strategy according to the emotion score. In the embodiment of the invention, the emotion score of the user comment information is determined by the preset comment analysis model and the preset information processing strategy is called according to that score, so that user comment information with different emotional tendencies is processed rapidly, the accuracy of auditing user comment information is improved, and the use experience of the user is improved.

Description

Information processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an information processing method and apparatus, an electronic device, and a storage medium.
Background
Forums have grown steadily since the early days of the internet. Communities now host a constant stream of discussions about practical problems, in which users express their opinions by posting and replying, and these opinions generally carry subjective emotion. As the number of posts grows, reviewing forum content becomes a problem.
Existing natural language emotion analysis libraries mostly work well on English and rarely target Chinese. Because Chinese differs greatly from English in sentence structure and word segmentation, applying these libraries to Chinese text produces large errors in emotion analysis. A method that processes such information simply, quickly and accurately is therefore needed.
Disclosure of Invention
The invention provides an information processing method, an information processing device, electronic equipment and a storage medium, which are used for realizing the rapid processing of comment information of a user, improving the accuracy of examining and verifying the comment information of the user and improving the use experience of the user.
According to an aspect of the present invention, there is provided an information processing method, wherein the method includes:
obtaining user comment information according to a community text access interface;
determining the emotion score of the comment information of the user according to a preset comment analysis model;
and calling a preset information processing strategy according to the emotion scores.
According to another aspect of the present invention, there is provided an information processing apparatus, wherein the apparatus includes:
the information acquisition module is used for acquiring user comment information according to the community text access interface;
the emotion recognition module is used for determining the emotion score of the comment information of the user according to a preset comment analysis model;
and the processing execution module is used for calling a preset information processing strategy according to the emotion scores.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the information processing method of any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement an information processing method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme, the user comment information is obtained through the community text access interface, the emotion score of the user comment information is determined according to the preset comment analysis model, the preset information processing strategy is called according to the emotion score, the user comment information is processed according to the emotion tendency of the user comment information, and the use experience of a user is improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an information processing method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a method for training a preset comment analysis model according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a method for training a preset comment analysis model according to a third embodiment of the present invention;
Fig. 4 is a flowchart of an information processing method according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an information processing apparatus according to a fifth embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device implementing an information processing method according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment One
Fig. 1 is a flowchart of an information processing method according to a first embodiment of the present invention. This embodiment is applicable to processing the comment information of community users. The method may be executed by an information processing apparatus, which may be implemented in hardware and/or software and may be configured in an electronic device; the electronic device may include, but is not limited to, a computer, a mobile phone, and the like. As shown in Fig. 1, the method includes:
S110: Obtaining user comment information according to the community text access interface.
The community text access interface may refer to an interface for accessing community web pages to collect user comment information, and may include, but is not limited to, an Application Programming Interface (API). The user comment information may be comment information that carries the user's subjective emotion, and may be information in which the user expresses a personal viewpoint on the thing being commented on.
In the embodiment of the invention, the electronic device can acquire the source code of the community website through the community text access interface and extract the user comment information by parsing the source code. In actual operation, a request for user comment information can be sent to the server through the API over the Hypertext Transfer Protocol (HTTP), the data of the community web pages is retrieved, and the user comment information of all posts in the community is obtained; alternatively, the user comment information can be extracted by a Python crawler. In this technical scheme, the acquisition, storage, use and processing of data comply with the relevant national laws and regulations.
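Illustratively, the API-based acquisition in S110 might be sketched as follows. This is a minimal sketch using only the Python standard library; the endpoint `API_BASE`, the query parameters `post_id` and `page`, and the JSON response shape are hypothetical, since the source does not name a concrete community API.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; the source does not specify a concrete community API.
API_BASE = "https://community.example.com/api/comments"


def build_comment_request(post_id: str, page: int = 1) -> urllib.request.Request:
    """Build an HTTP GET request for one page of comments under a post."""
    query = urllib.parse.urlencode({"post_id": post_id, "page": page})
    return urllib.request.Request(f"{API_BASE}?{query}", method="GET")


def parse_comments(payload: str) -> list[str]:
    """Extract non-empty comment texts from a JSON response body.

    Assumes a response shaped like {"comments": [{"text": "..."}]}.
    """
    data = json.loads(payload)
    texts = (item.get("text", "").strip() for item in data.get("comments", []))
    return [text for text in texts if text]


def fetch_comments(post_id: str, page: int = 1) -> list[str]:
    """Send the request and return the comment texts (performs network I/O)."""
    request = build_comment_request(post_id, page)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_comments(response.read().decode("utf-8"))
```

Separating request construction and response parsing from the network call keeps the parsing logic checkable without contacting a server.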
S120: Determining the emotion score of the user comment information according to a preset comment analysis model.
The preset comment analysis model may be a model that scores the emotion of user comment information and can be used to analyze the emotion expressed in it. The emotion score may represent the emotional tendency of the user when publishing the comment information, that is, the emotion the user carries while commenting.
In the embodiment of the invention, emotion analysis can be performed on the user comment information according to the preset comment analysis model; the analysis may include analyzing, processing, summarizing and reasoning about the user comment information. Because user comment information expresses the user's emotional coloring and tendency, such as criticism or praise, scoring it makes the user's emotional state more intuitive. In an embodiment, the preset comment analysis model may be constructed based on a SnowNLP model, and the model may divide the emotion scores of user comment information into different levels to show the user's emotional state. Illustratively, the emotion scores may be divided into 10 levels from 1 to 10, where 1 is very negative, 2-5 is relatively negative, 6-9 is relatively positive, and 10 is very positive; the emotion score of the user comment information is presented according to these levels.
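Illustratively, the division into 10 levels might be sketched as follows. SnowNLP's `sentiments` property returns a probability between 0 and 1, while the source describes levels from 1 to 10 without stating how one maps to the other; the equal-width mapping in `to_level` is therefore an assumption, and both function names are hypothetical.

```python
def to_level(probability: float) -> int:
    """Map a positive-sentiment probability in [0, 1] to a level from 1 to 10.

    Assumption: ten equal-width bins; the source does not define the mapping.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return min(10, int(probability * 10) + 1)


def describe_level(level: int) -> str:
    """Name the emotional tendency of a level, following the ranges in the text."""
    if not 1 <= level <= 10:
        raise ValueError(f"level out of range: {level}")
    if level == 1:
        return "very negative"
    if level <= 5:
        return "relatively negative"
    if level <= 9:
        return "relatively positive"
    return "very positive"
```

With SnowNLP installed, the probability could be obtained as `SnowNLP(text).sentiments` and passed to `to_level`.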
S130: Calling a preset information processing strategy according to the emotion score.
The preset information processing strategy may be the processing method applied to user comment information with different emotion scores. It may be preset by the managers of the community web pages and may include, but is not limited to, filtering negative comments and keeping positive comments.
In some embodiments, the preset information processing strategy includes at least one of: deleting the user comment information; sending the user comment information to an auditor for auditing; and pinning the user comment information to the top.
In the embodiment of the present invention, there may be multiple preset information processing strategies, and different strategies may be invoked for different emotion scores. In actual operation, very negative user comment information can be deleted, very positive user comment information can be pinned to the top, and relatively negative and relatively positive user comment information can be sent to an auditor for auditing; the user comment information is retained if it passes the audit and deleted if it does not. In an embodiment, user comment information with an emotion score of 1 is deleted, user comment information with an emotion score of 10 is pinned to the top, and user comment information with an emotion score of 2-9 is audited by auditors, so that user comment information is displayed reasonably.
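Illustratively, the strategy selection in S130 might be sketched as follows, using the score ranges given above (1 deleted, 10 pinned, 2-9 audited). The names `Action`, `choose_action` and `after_audit` are hypothetical and only illustrate one possible arrangement.

```python
from enum import Enum


class Action(Enum):
    DELETE = "delete"
    REVIEW = "send to auditor"
    PIN = "pin to top"
    KEEP = "keep"


def choose_action(level: int) -> Action:
    """Map an emotion score (1-10) to the strategy described for S130."""
    if not 1 <= level <= 10:
        raise ValueError(f"emotion score out of range: {level}")
    if level == 1:
        return Action.DELETE
    if level == 10:
        return Action.PIN
    return Action.REVIEW


def after_audit(approved: bool) -> Action:
    """Audited comments are retained when approved and deleted otherwise."""
    return Action.KEEP if approved else Action.DELETE
```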
According to the embodiment of the invention, the user comment information is acquired according to the community text access interface, the emotion score of the user comment information is determined through the preset comment analysis model, the preset information processing strategy is called according to the emotion score, the user comment information with different emotion tendencies is rapidly processed, the accuracy of auditing the user comment information is improved, and the use experience of the user is improved.
In some embodiments, the method further includes: training the preset comment analysis model according to a positive information sample set and a negative information sample set, wherein the positive information sample set and the negative information sample set each comprise sample comments and label information of the sample comments.
The positive information sample set may be a set of information samples that convey positive emotion, and may comprise positive user comment information; the negative information sample set may be a set of information samples that convey negative emotion, and may comprise negative user comment information. A sample comment may be a sample of user comment information, and the label information of a sample comment may be data feature information characterizing that comment, including but not limited to an emotion label. Illustratively, the label information of a sample comment may include, but is not limited to, positive, negative, and the like.
In an embodiment of the present invention, the preset comment analysis model may be trained with the positive information sample set and the negative information sample set; this may include, but is not limited to, constructing the preset comment analysis model based on a SnowNLP model. In an embodiment, when the preset comment analysis model is built on the SnowNLP model, the sample comments and their label information can be input into the SnowNLP model, and the preset comment analysis model is obtained through training.
Embodiment Two
Fig. 2 is a flowchart of a method for training a preset comment analysis model according to a second embodiment of the present invention. Based on the foregoing embodiment, this embodiment further refines the training of the preset comment analysis model according to the positive information sample set and the negative information sample set. As shown in Fig. 2, the method includes:
S210: Constructing a positive information sample set and a negative information sample set according to comment text information of a preset forum.
The preset forum may be a preset site where users communicate online and publicly post discussions; any user can post comment text information in the preset forum. The comment text information may be text in which a user expresses an opinion on an event in the preset forum; because it is the user's subjective comment on the event, it may include positive information and negative information. The positive information sample set may refer to a set whose samples are positive information; the negative information sample set may refer to a set whose samples are negative information.
In the embodiment of the invention, users can post comment text information in the preset forum, and the comment text information can be divided into positive information and negative information according to its polarity. Positive information can be used as positive information samples, and a large number of positive information samples form the positive information sample set; negative information can be used as negative information samples, and a large number of negative information samples form the negative information sample set. The two sample sets are thus constructed from the positive information and the negative information respectively.
S220: Constructing a preset comment analysis model based on the SnowNLP model.
The SnowNLP model may be a model constructed based on SnowNLP; the preset comment analysis model may refer to a preset model for analyzing comments.
In the embodiment of the invention, the preset comment analysis model can be constructed on the functions of the SnowNLP model, so that it is able to process comment text information. The preset comment analysis model can be constructed from the data of the SnowNLP model by collecting information about the SnowNLP model.
S230: Training the preset comment analysis model according to the positive information sample set and the negative information sample set.
In the embodiment of the invention, the preset comment analysis model can be trained according to the positive information sample set and the negative information sample set. In an embodiment, the contents of the positive information sample set and the negative information sample set may be input into the preset comment analysis model, which is trained based on the functionality of the SnowNLP model. When the trained preset comment analysis model is able to analyze comments, its training can be considered complete.
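Illustratively, feeding the two sample sets to SnowNLP in S230 might be sketched as follows. The sketch assumes the sample sets are first written to plain-text files with one comment per line, and that training uses the `sentiment.train(neg_file, pos_file)` and `sentiment.save(fname)` functions described in the SnowNLP README; these calls should be checked against the installed SnowNLP version. The helper names and file paths are hypothetical.

```python
from pathlib import Path


def write_sample_file(samples, path) -> Path:
    """Write one sample comment per line (UTF-8), skipping empty comments."""
    lines = (" ".join(sample.split()) for sample in samples)
    target = Path(path)
    target.write_text("".join(f"{line}\n" for line in lines if line), encoding="utf-8")
    return target


def train_comment_model(neg_path, pos_path, model_path) -> None:
    """Train SnowNLP's sentiment classifier on the two sample files.

    Requires the third-party snownlp package; it is imported lazily so that
    the file helper above works without it.
    """
    from snownlp import sentiment

    sentiment.train(str(neg_path), str(pos_path))
    sentiment.save(str(model_path))
```

A call such as `train_comment_model("neg.txt", "pos.txt", "sentiment.marshal")` would then retrain the classifier on forum-specific samples; how SnowNLP is pointed at the saved model afterwards depends on the library version.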
According to the embodiment of the invention, the positive information sample set and the negative information sample set are constructed according to the comment text information of the preset forum, the preset comment analysis model is constructed based on the SnowNLP model, and the preset comment analysis model is trained according to the positive information sample set and the negative information sample set, so that the training of the preset comment analysis model is realized, the effect of screening and auditing the comment information of the user is achieved by the preset comment analysis model, the accuracy of auditing the comment information of the user is improved, and the use experience of the user is improved.
Embodiment Three
Fig. 3 is a flowchart of a method for training a preset comment analysis model according to a third embodiment of the present invention. This embodiment further refines the method for training the preset comment analysis model on the basis of the foregoing embodiments. As shown in Fig. 3, the method includes:
S3010: Reading comment text information from the preset forum separately for each forum module.
The forum module may refer to a unit module divided by the contents of the forum, and may include, but is not limited to, a news module, a video module, an entertainment module, and the like. The comment text information may be assigned to different forum modules according to the category of the forum post.
In the embodiment of the invention, different forum modules can display the posts of the corresponding categories in the preset forum, and users can publish comment text information under each post. The electronic device can read the comment text information from the preset forum separately for each forum module. The manner of reading the comment text information may include, but is not limited to, a web crawler. In an embodiment, the source code of the preset forum website can be crawled with the urllib library in a Python crawler, and the source code is parsed to screen out the comment text information in the corresponding forum module.
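Illustratively, crawling and parsing the forum source code in S3010 might be sketched as follows with `urllib.request` and `html.parser` from the Python standard library. The CSS class name `comment` and the page structure are hypothetical; a real forum would need its own selectors.

```python
import urllib.request
from html.parser import HTMLParser

# Elements that never have a closing tag and must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class CommentExtractor(HTMLParser):
    """Collect the text of elements whose class attribute contains css_class."""

    def __init__(self, css_class: str = "comment"):
        super().__init__()
        self.css_class = css_class
        self.comments = []
        self._depth = 0    # nesting depth inside the current comment element
        self._buffer = []  # text fragments of the current comment

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if text:
                self.comments.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_comments(html: str, css_class: str = "comment") -> list[str]:
    """Parse page source and return the comment texts it contains."""
    parser = CommentExtractor(css_class)
    parser.feed(html)
    return parser.comments


def fetch_source(url: str) -> str:
    """Download the source code of one forum module page (network I/O)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_comments(fetch_source(url))` once per forum module URL would yield the comment text information grouped by module.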
S3020: Performing data cleaning on each piece of comment text information, wherein the data cleaning at least comprises stop word filtering and text word segmentation.
Data cleaning may refer to a process of reviewing and verifying data, may be used to find and correct errors in data files, and may include, but is not limited to, stop word filtering and text word segmentation. Stop words are characters or words that are automatically filtered out before or after text information is processed during information retrieval, in order to save storage space and improve search efficiency. Stop words are entered manually rather than generated automatically, and together they form a stop word list; stop word filtering may refer to the process of filtering out stop words using the stop word list. Text word segmentation may refer to the process of recombining a continuous character sequence into a word sequence according to a certain specification, and the segmentation method may include, but is not limited to, a bidirectional maximum matching algorithm, a forward maximum matching algorithm, a reverse maximum matching algorithm, and the like.
In the embodiment of the invention, the comment text information under each forum module can be cleaned separately, and the data cleaning of each piece of comment text information may include, but is not limited to, stop word filtering and text word segmentation. Stop word filtering of the comment text information may be implemented with, but is not limited to, the jieba library of Python; in one embodiment, the stop word list is imported into the jieba library, and the jieba library filters the comment text information against the list and removes the stop words. The stop word list can be made with reference to popular stop word lists available online and can be supplemented with words that carry no actual meaning, such as high-frequency words of the community forum. Text word segmentation of the comment text information may use, but is not limited to, a bidirectional maximum matching algorithm: the segmentation result obtained by the forward maximum matching method is compared with the segmentation result obtained by the reverse maximum matching method, and, following the maximum matching principle, the result with the fewest words is selected.
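Illustratively, the bidirectional maximum matching and stop word filtering of S3020 might be sketched in plain Python as follows. The vocabulary and stop word list are passed in as sets and are hypothetical inputs. The source only states that the result with the fewest words is selected; breaking ties by fewer single-character words, and then preferring the reverse result, is a common heuristic and an assumption here.

```python
def forward_max_match(text: str, vocab: set, max_len: int) -> list:
    """Scan left to right, always taking the longest vocabulary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, vocab: set, max_len: int) -> list:
    """Scan right to left, always taking the longest vocabulary word."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def bidirectional_max_match(text: str, vocab: set) -> list:
    """Return whichever of the two segmentations has the fewest words."""
    max_len = max((len(word) for word in vocab), default=1)
    forward = forward_max_match(text, vocab, max_len)
    backward = backward_max_match(text, vocab, max_len)

    def cost(words):
        # Assumption: ties on word count are broken by single-character words.
        return (len(words), sum(len(word) == 1 for word in words))

    return min([backward, forward], key=cost)  # backward wins a full tie


def remove_stop_words(words: list, stop_words: set) -> list:
    """Drop every word that appears in the stop word list."""
    return [word for word in words if word not in stop_words]
```

For example, with the vocabulary `{"研究", "研究生", "生命", "的", "起源"}`, the text `研究生命的起源` is segmented as `研究生 / 命 / 的 / 起源` by forward matching and `研究 / 生命 / 的 / 起源` by reverse matching; both have four words, and the reverse result is kept because it contains fewer single-character words.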
S3030: Determining the audience preference degree of each piece of comment text information by using an untrained SnowNLP model, wherein the audience preference degree comprises at least two levels.
The audience preference degree may be determined from the comment text information posted by users in each forum module. It may be divided into a plurality of levels, and different levels may represent different degrees of positivity or negativity of the user.
In the embodiment of the invention, the audience preference degree may comprise at least two levels, positive and negative, and the audience preference degree of the comment text information can be judged by an untrained SnowNLP model. The SnowNLP model can determine the audience preference degree of each piece of comment text information according to how positive or negative it is. Illustratively, the audience preference degree output by the SnowNLP model may range from 1 to 10, with a larger value indicating a higher audience preference degree: an audience preference degree of 1 may indicate very negative, and an audience preference degree of 10 may indicate very positive.
S3040: Setting a negative label for comment text information whose audience preference degree is less than or equal to the threshold, and storing the comment text information in the negative information sample set.
The threshold may be a value preset by the community forum manager according to experience to distinguish audience preference degrees. For example, when the audience preference degree ranges from 1 to 10, 5 may be set as the threshold. When the audience preference degree is less than or equal to 5, the comment text information can be regarded as negative information; when the audience preference degree is greater than 5, the comment text information can be regarded as positive information.
In the embodiment of the invention, when the audience preference degree of the comment text information is less than or equal to the threshold, the audience preference degree is low and the comment text information can be regarded as negative information. After the comment text information is confirmed to be negative information, a negative label can be set on it, so that positive information and negative information can be told apart, and the comment text information is stored in the negative information sample set.
S3050: Setting a positive label for comment text information whose audience preference degree is greater than the threshold, and storing the comment text information in the positive information sample set.
In the embodiment of the invention, when the audience preference degree of the comment text information is greater than the threshold, the audience preference degree is high and the comment text information can be regarded as positive information. After the comment text information is confirmed to be positive information, a positive label can be set on it, and the comment text information is stored in the positive information sample set.
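Illustratively, the labeling in S3040 and S3050 might be sketched as follows. The scoring function is passed in as a parameter and stands in for the untrained SnowNLP model; `build_sample_sets` and `score_fn` are hypothetical names, and the threshold of 5 follows the example in the text.

```python
from typing import Callable, Iterable

THRESHOLD = 5  # example value from the text for a 1-10 scale


def build_sample_sets(
    comments: Iterable[str],
    score_fn: Callable[[str], int],
    threshold: int = THRESHOLD,
):
    """Split comments into labeled negative and positive sample sets.

    score_fn returns the audience preference degree (1-10) of a comment.
    """
    negative, positive = [], []
    for text in comments:
        if score_fn(text) <= threshold:
            negative.append((text, "negative"))
        else:
            positive.append((text, "positive"))
    return negative, positive
```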
S3060, constructing a preset comment analysis model based on the SnowNLP model.
And S3070, inputting the positive information sample set and the negative information sample set into a preset comment analysis model.
In the embodiment of the invention, the data in the positive information sample set and the data in the negative information sample set can each be input into the preset comment analysis model, and the viewer preference degree of each piece of comment text information can be predicted. For example, the viewer preference degree predicted by the comment analysis model for each piece of comment text information may be divided into 10 gears from 1 to 10, where a larger value represents a higher viewer preference degree.
S3080, obtaining a prediction result of the preset comment analysis model, and determining a negative viewpoint proportion and a positive viewpoint proportion in the prediction result.
In the embodiment of the invention, the electronic device may obtain the prediction result of the preset comment analysis model for each piece of comment text information, where the prediction result may include, but is not limited to, the viewer preference degree. The degree of positivity of the comment text information can be judged according to the prediction result. In an embodiment, different users may comment differently on the same event in the same post. When the viewer preference degree is less than or equal to the threshold value, the comment text information may be determined to be negative information, and the viewpoint corresponding to it may be regarded as a negative viewpoint; when the viewer preference degree is greater than the threshold value, the comment text information may be determined to be positive information, and the viewpoint corresponding to it may be regarded as a positive viewpoint. After the prediction results for the comment text information are obtained, the negative viewpoint proportion and the positive viewpoint proportion in the prediction results can be determined.
S3090, judging whether the negative viewpoint proportion matches the first information proportion of the negative information sample set and whether the positive viewpoint proportion matches the second information proportion of the positive information sample set.
The first information proportion may refer to the ratio of the number of pieces of negative information in the negative information sample set to the total number of samples in the negative information sample set and the positive information sample set, and may characterize the share of negative comment text information in the comment text information. The second information proportion may refer to the ratio of the number of pieces of positive information in the positive information sample set to the total number of samples in the negative information sample set and the positive information sample set, and may characterize the share of positive comment text information in the comment text information.
In the embodiment of the invention, judging whether the negative viewpoint proportion matches the first information proportion of the negative information sample set means judging whether the proportion of negative information identified by the untrained SnowNLP model in the comment text information matches the negative viewpoint proportion determined by the preset comment analysis model. Whether they match may be determined by checking whether the two proportions are the same. Likewise, judging whether the positive viewpoint proportion matches the second information proportion of the positive information sample set means judging whether the proportion of positive information identified by the untrained SnowNLP model in the comment text information matches the positive viewpoint proportion determined by the preset comment analysis model, which may also be determined by checking whether the two proportions are the same.
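The comparison in S3080 and S3090 can be illustrated with a short sketch. The function names, the threshold of 5, and the optional `tolerance` parameter are assumptions added for the example; the text above only states that the two proportions are checked for being the same.

```python
import math
from typing import List, Sequence, Tuple


def information_proportions(
    negative_set: Sequence, positive_set: Sequence
) -> Tuple[float, float]:
    """First (negative) and second (positive) information proportions."""
    total = len(negative_set) + len(positive_set)
    if total == 0:
        raise ValueError("both sample sets are empty")
    return len(negative_set) / total, len(positive_set) / total


def viewpoint_proportions(
    predictions: List[int], threshold: int
) -> Tuple[float, float]:
    """Negative and positive viewpoint proportions in the model predictions."""
    if not predictions:
        raise ValueError("no predictions")
    negative = sum(1 for p in predictions if p <= threshold)
    total = len(predictions)
    return negative / total, (total - negative) / total


def proportions_match(
    predicted: Tuple[float, float],
    labelled: Tuple[float, float],
    tolerance: float = 0.0,
) -> bool:
    """True when both proportions agree (exactly, or within a tolerance)."""
    return all(
        math.isclose(p, q, abs_tol=tolerance) for p, q in zip(predicted, labelled)
    )


labelled = information_proportions(["n1", "n2"], ["p1", "p2", "p3"])
matched = proportions_match(viewpoint_proportions([1, 2, 7, 8, 9], 5), labelled)
mismatched = proportions_match(viewpoint_proportions([1, 7, 8, 9, 10], 5), labelled)
```

In practice an exact equality test on proportions may be too strict, which is why the sketch exposes a tolerance; whether the embodiment allows one is not stated.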
S3100, if they match, determining that training of the preset comment analysis model is complete.
S3110, if they do not match, adjusting the weight parameters of the high-frequency words in the preset comment analysis model, and retraining the preset comment analysis model.
In the embodiment of the invention, when the negative viewpoint proportion does not match the first information proportion of the negative information sample set, or the positive viewpoint proportion does not match the second information proportion of the positive information sample set, different weights can be assigned to high-frequency words in the preset comment analysis model according to the importance of the comments, and the preset comment analysis model is retrained. In an embodiment, the adjustment of the weight parameters of the high-frequency words may be based on a naive Bayes algorithm: the frequency of occurrence of the high-frequency words is predicted with the naive Bayes algorithm, the weight parameters of the high-frequency words in the preset comment analysis model are adjusted accordingly, and the preset comment analysis model is retrained.
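One possible reading of the naive-Bayes-based weighting is sketched below. It is an assumption, not the formula of the embodiment: each high-frequency word receives a weight equal to the log ratio of its Laplace-smoothed class-conditional frequencies, and the `min_count` cut-off that decides which words count as high-frequency is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def high_frequency_weights(
    positive_docs: List[List[str]],
    negative_docs: List[List[str]],
    min_count: int = 2,
) -> Dict[str, float]:
    """Weight each frequent word by log(P(word | positive) / P(word | negative))."""
    pos_counts = Counter(w for doc in positive_docs for w in doc)
    neg_counts = Counter(w for doc in negative_docs for w in doc)
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())

    weights: Dict[str, float] = {}
    for word in vocab:
        # Skip words that are not frequent enough to be reweighted.
        if pos_counts[word] + neg_counts[word] < min_count:
            continue
        # Laplace (add-one) smoothing avoids division by zero for unseen words.
        p_pos = (pos_counts[word] + 1) / (pos_total + len(vocab))
        p_neg = (neg_counts[word] + 1) / (neg_total + len(vocab))
        weights[word] = math.log(p_pos / p_neg)
    return weights


weights = high_frequency_weights(
    positive_docs=[["good", "fast"], ["good", "service"]],
    negative_docs=[["bad", "slow"], ["bad", "service"]],
)
```

A positive weight marks a word that leans positive, a negative weight one that leans negative, and a weight near zero a word such as "service" that appears equally in both sample sets.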
In the embodiment of the invention, comment text information is read from the preset forum by forum module, and data cleaning is performed on each piece of comment text information. The viewer preference degree of each piece of comment text information is determined using an untrained SnowNLP model; a negative label is set for comment text information whose viewer preference degree is less than or equal to the threshold value, a positive label is set for comment text information whose viewer preference degree is greater than the threshold value, and each piece is stored in the corresponding negative or positive information sample set. A preset comment analysis model is constructed based on the SnowNLP model, the positive information sample set and the negative information sample set are input into it, its prediction result is obtained, and the negative viewpoint proportion and the positive viewpoint proportion in the prediction result are determined. Whether the negative viewpoint proportion matches the first information proportion of the negative information sample set and whether the positive viewpoint proportion matches the second information proportion of the positive information sample set are then judged: if they match, training of the preset comment analysis model is determined to be complete; if they do not match, the weight parameters of the high-frequency words in the preset comment analysis model are adjusted and the model is retrained. This improves the efficiency of processing comment text information and the effectiveness of the preset comment analysis model.
Further, constructing the positive information sample set and the negative information sample set according to comment text information of a preset forum further comprises:
determining probability distribution maps of different viewer preference degrees and a positive and negative people ratio map;
and determining noise points in the positive and negative people ratio map according to the probability distribution map, and eliminating the comment text information corresponding to the noise points.
For example, when the probability distribution map shows that the viewer preference degree of a piece of comment text information is greater than the threshold value, that is, the comment text information is positive, but in the positive and negative people ratio map the user corresponding to that comment text information is counted among the negative-viewpoint users, that user can be regarded as a noise point in the positive and negative people ratio map, and the comment text information corresponding to the noise point can be removed.
In the embodiment of the invention, after the untrained SnowNLP model is used to determine the viewer preference degree of each piece of comment text information, the different viewer preference degrees are known. The viewer preference degree can be divided into a plurality of gears, and the probability distribution map of different viewer preference degrees can be determined by counting the probability of each gear. The types of the probability distribution map can include, but are not limited to, a bar chart and a pie chart, and a user or forum manager can view the different viewer preference degrees directly through the probability distribution map. The positive and negative people ratio map can be used to visually analyze the emotional tendency of the users who comment on the events in forum posts: the numbers of positive-viewpoint and negative-viewpoint users are counted and the positive and negative people ratio map is drawn. The types of the positive and negative people ratio map can include, but are not limited to, a bar chart and a pie chart. Because the users corresponding to comment text information with a viewer preference degree less than or equal to the threshold value are taken as negative-viewpoint users, and the users corresponding to comment text information with a viewer preference degree greater than the threshold value are taken as positive-viewpoint users, the probability distribution maps of different viewer preference degrees correspond to the positive and negative people ratio map. Noise points in the positive and negative people ratio map can therefore be determined according to the probability distribution map, and the comment text information corresponding to the noise points can be removed.
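The noise-point rule can be sketched as a consistency check between the score-based label and the stance under which the user was counted. The record layout (`preference`, `stance`), the threshold of 5, and the symmetric treatment of both directions of disagreement are assumptions; the example in the text describes only the case of a positive score combined with a negative-viewpoint user.

```python
from typing import Dict, List, Union

Comment = Dict[str, Union[str, int]]


def remove_noise(comments: List[Comment], threshold: int) -> List[Comment]:
    """Keep only comments whose score-based label agrees with the counted stance."""
    kept: List[Comment] = []
    for comment in comments:
        score_label = "positive" if comment["preference"] > threshold else "negative"
        if score_label == comment["stance"]:
            kept.append(comment)
        # Otherwise the comment is treated as a noise point and dropped.
    return kept


# Hypothetical records: comment "b" scores high but its user is counted as negative.
comments = [
    {"text": "a", "preference": 8, "stance": "positive"},
    {"text": "b", "preference": 9, "stance": "negative"},
    {"text": "c", "preference": 2, "stance": "negative"},
]
cleaned = remove_noise(comments, threshold=5)
```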
Example four
Fig. 4 is a flowchart of an information processing method according to a fourth embodiment of the present invention. Based on the above embodiments, this embodiment further describes the information processing method by taking, as an example, a forum section as the forum module, a positive text test set as the positive information sample set, a negative text test set as the negative information sample set, and a comment screener as the community text access interface. As shown in fig. 4, the method includes:
Step 1, crawling the Uniform Resource Locator (URL) of a community forum with a Python crawler to obtain comment text information of the community forum. In an embodiment, the source code of the forum website can be crawled through the urllib library of Python, the source code is parsed, and the comment text information contained in it is extracted. The post categories corresponding to the comment text information are classified by section to generate several forum sections, and the comment text information can be placed under the corresponding section according to post category. In one embodiment, the number of forum sections may be at least two; exemplary forum sections may include section A, section B, and the like.
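A minimal sketch of step 1 using only the Python standard library is given below. The page structure is an assumption: comments are taken to sit in `<div class="comment">` elements, which is hypothetical and would need to be adapted to the actual forum markup.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import urlopen

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class CommentExtractor(HTMLParser):
    """Collect the text of every <div class="comment"> element (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0          # nesting depth inside the current comment div
        self._buffer: List[str] = []
        self.comments: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif tag == "div" and ("class", "comment") in attrs:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if text:
                self.comments.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_comments(html: str) -> List[str]:
    parser = CommentExtractor()
    parser.feed(html)
    return parser.comments


def fetch_comments(url: str) -> List[str]:
    """Download a forum page and extract its comments."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_comments(html)
```

`fetch_comments` would be called once per forum section URL, so that the returned comments can be grouped under the section they came from.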
Step 2, performing data cleaning on the posts under each forum section. Data cleaning may include, but is not limited to, stop word filtering and text word segmentation. To filter stop words from the posts under each forum section, a stop word list can be compiled. The stop word list can be compiled by referring to commonly used stop word lists available online and supplementing them with words that carry no practical meaning, such as high-frequency expressions and modal particles specific to the community. In one embodiment, stop word removal and punctuation filtering are achieved with the jieba library of Python. For Chinese word segmentation, a bidirectional maximum matching method can be selected: the method compares the segmentation result of the forward maximum matching method with that of the reverse maximum matching method and, following the maximum matching principle, selects the result with the fewest segmented words, so that noise is eliminated as far as possible. Through data cleaning, a processed data set can be generated.
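The bidirectional maximum matching method described in step 2 can be sketched in plain Python as follows. The dictionary, the stop word list, and the tie-breaking rule (the forward result is kept when both directions yield the same number of words) are assumptions for illustration; the embodiment itself mentions the jieba library for stop word removal.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    tokens, j = [], len(text)
    while j > 0:
        # Same idea, scanning from the end of the text towards the start.
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_max_match(text: str, vocab: Set[str]) -> List[str]:
    max_len = max((len(word) for word in vocab), default=1)
    forward = forward_max_match(text, vocab, max_len)
    backward = backward_max_match(text, vocab, max_len)
    # Keep the segmentation with fewer words; ties go to the forward result.
    return backward if len(backward) < len(forward) else forward


def clean(text: str, vocab: Set[str], stop_words: Set[str]) -> List[str]:
    return [t for t in bidirectional_max_match(text, vocab) if t not in stop_words]


vocab = {"我们", "在野", "野生", "生动", "动物园", "野生动物园", "玩"}
tokens = clean("我们在野生动物园玩", vocab, stop_words={"在"})
```

In this example the forward pass splits the sentence into six words while the backward pass needs only four, so the backward result is chosen before stop words are removed.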
Step 3, sending the processed data set into the SnowNLP model for prediction. The viewer preference degree for each forum section can be divided into 10 gears from "1" to "10", where "1" is very negative and "10" is very positive. The probability of each gear can be counted and a user emotion distribution graph can be drawn, where the form of the user emotion distribution graph may include, but is not limited to, a histogram. Meanwhile, the number of people with negative viewpoints and the number of people with positive viewpoints can be counted, and a positive and negative people ratio graph can be drawn. The first analysis result is compared with the actual result, noise points that obviously cause inaccurate model judgment are identified, and the comment text information corresponding to them is eliminated. A positive text test set and a negative text test set are constructed from the remaining comment text information; together they form the test set used for training a new model.
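The gear statistics of step 3 can be sketched as below. SnowNLP reports sentiment as a value between 0 and 1; how that value is mapped onto the ten gears is not specified, so the equal-width mapping in `to_gear` and the cut-off gear of 5 are assumptions. The scores are passed in as plain floats so that the sketch does not depend on the SnowNLP package.

```python
from collections import Counter
from typing import Dict, List, Tuple


def to_gear(score: float) -> int:
    """Map a sentiment score in [0, 1] onto gears 1 (very negative) to 10."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(10, int(score * 10) + 1)


def summarize(
    scores: List[float], threshold_gear: int = 5
) -> Tuple[Dict[int, float], int, int]:
    """Return the gear probability distribution and negative/positive counts."""
    if not scores:
        raise ValueError("no scores to summarize")
    gears = [to_gear(s) for s in scores]
    counts = Counter(gears)
    distribution = {gear: counts[gear] / len(gears) for gear in range(1, 11)}
    negative = sum(1 for g in gears if g <= threshold_gear)
    return distribution, negative, len(gears) - negative


distribution, negative_count, positive_count = summarize([0.05, 0.95, 0.55, 1.0])
```

The `distribution` dictionary holds the data behind the user emotion distribution graph, and the two counts are what the positive and negative people ratio graph would display.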
Step 4, the positive text test set and the negative text test set can each comprise sample comments and label information of the sample comments, and can each be stored in the project directory. A new model is trained using the SnowNLP model, weights are assigned to high-frequency words, and different weights are given according to the importance of the comments. In one embodiment, the weight distribution of the high-frequency words can be optimized based on the naive Bayes algorithm. After a new SnowNLP model is generated through model training, training can be iterated 2 to 3 times until the prediction results are satisfactory, at which point model training is complete.
Step 5, the new SnowNLP model can be used for comment screening. The new SnowNLP model can be packaged behind an interface to serve as a comment screener. When a user submits a comment, the comment can be sent to the model for judgment: if the comment text information is judged to be a positive comment, it is sent normally; otherwise, it is intercepted or sent to a third party for auditing.
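Packaging the model as a comment screener might look like the following sketch. The class name `CommentScreener`, the decision strings, the threshold of 0.5, and the keyword-based stand-in scorer are all hypothetical; in a deployment the scoring callable would wrap the retrained SnowNLP model instead.

```python
from typing import Callable


class CommentScreener:
    """Wrap a scoring function behind a single screening interface."""

    def __init__(self, score_fn: Callable[[str], float], threshold: float = 0.5):
        self.score_fn = score_fn      # returns a sentiment score in [0, 1]
        self.threshold = threshold    # assumed cut-off for a positive comment

    def screen(self, comment: str) -> str:
        # Positive comments are published; all others are held for review.
        if self.score_fn(comment) > self.threshold:
            return "publish"
        return "hold_for_review"


def keyword_scorer(comment: str) -> float:
    """Stand-in scorer used only to make the example runnable."""
    return 0.1 if "terrible" in comment.lower() else 0.9


screener = CommentScreener(keyword_scorer)
decision_ok = screener.screen("Helpful answer, thanks")
decision_held = screener.screen("This is terrible")
```

Injecting the scorer as a callable keeps the interface independent of the model, so the model can be retrained and swapped without changing the calling code.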
According to the embodiment of the invention, comment text information of community forums is obtained, the post categories corresponding to the comment text information are classified by forum section, and data cleaning is performed on the comment text information under each forum section. The viewer preference degree for each forum section is predicted by the SnowNLP model and a user emotion distribution graph is drawn, so that the emotional tendency of commenting users toward each forum section can be analyzed comprehensively and effectively. A positive and negative people ratio graph is also drawn, which makes the numbers of people holding supportive or opposing attitudes easy to observe and helps other users browsing the posts understand how most people feel about an event so that they can form their own judgment. By optimizing the parameters of the SnowNLP model, the accuracy and execution efficiency of processing the comment text information are improved.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an information processing apparatus according to a fifth embodiment of the present invention. As shown in fig. 5, the apparatus includes: an information acquisition module 51, an emotion recognition module 52 and a processing execution module 53.
And the information acquisition module 51 is used for acquiring the user comment information according to the community text access interface.
And the emotion recognition module 52 is used for determining the emotion score of the comment information of the user according to the preset comment analysis model.
And the processing execution module 53 is configured to invoke a preset information processing policy according to the emotion score.
In the embodiment of the invention, the information acquisition module acquires the comment information of the user according to the community text access interface, the emotion recognition module determines the emotion score of the comment information of the user according to the preset comment analysis model, and the processing execution module calls the preset information processing strategy according to the emotion score to quickly process the comment information of the user with different emotion tendencies, so that the accuracy of auditing the comment information of the user is improved, and the use experience of the user is improved.
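The cooperation of the three modules can be pictured with the sketch below. The class name and the use of plain callables for the modules are assumptions made for brevity; the stand-in lambdas replace the community text access interface, the preset comment analysis model, and the preset information processing strategy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InformationProcessingApparatus:
    acquire: Callable[[], str]            # information acquisition module 51
    recognize: Callable[[str], float]     # emotion recognition module 52
    execute: Callable[[str, float], str]  # processing execution module 53

    def run(self) -> str:
        comment = self.acquire()          # obtain the user comment information
        score = self.recognize(comment)   # determine its emotion score
        return self.execute(comment, score)  # invoke the processing strategy


# Stand-in components used only to exercise the pipeline.
apparatus = InformationProcessingApparatus(
    acquire=lambda: "good post",
    recognize=lambda comment: 0.9 if "good" in comment else 0.1,
    execute=lambda comment, score: "publish" if score > 0.5 else "review",
)
result = apparatus.run()
```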
In some embodiments, an information processing apparatus further includes:
and the model training module is used for training a preset comment analysis model according to the positive information sample set and the negative information sample set, wherein the positive information sample set and the negative information sample set comprise sample comments and label information of the sample comments.
In some embodiments, the model training module comprises:
and the sample set construction unit is used for constructing a positive information sample set and a negative information sample set according to the comment text information of the preset forum.
And the model building unit is used for building a preset comment analysis model based on the SnowNLP model.
And the model training unit is used for training the preset comment analysis model according to the positive information sample set and the negative information sample set.
In some embodiments, the sample set construction unit comprises:
and the information reading unit is used for respectively reading the comment text information from the preset forum according to the forum module.
And the information data cleaning unit is used for cleaning data of each comment text message, wherein the data cleaning at least comprises stop word filtering and text word segmentation.
And the preference degree determining unit is used for determining the preference degree of the audience of each comment text message by using an untrained SnowNLP model, wherein the preference degree of the audience comprises at least two gears.
And the negative label setting unit is used for setting a negative label for the comment text information with the viewer preference degree smaller than or equal to the threshold value and storing the comment text information into the negative information sample set.
And the positive label setting unit is used for setting positive labels for the comment text information with the audience preference degree larger than a threshold value and storing the comment text information into a positive information sample set.
In some embodiments, the sample set constructing unit further comprises:
and the preference degree determining unit is used for determining probability distribution maps of different viewer preference degrees and positive and negative people ratio maps.
And the noise point removing unit is used for determining noise points in the positive and negative people ratio graph according to the probability distribution graph and removing comment text information corresponding to the noise points.
In some embodiments, a model training unit, comprising:
and the information input unit is used for inputting the positive information sample set and the negative information sample set into a preset comment analysis model.
And the viewpoint proportion determining unit is used for acquiring a prediction result of the preset comment analysis model and determining a negative viewpoint proportion and a positive viewpoint proportion in the prediction result.
And a ratio information determination unit for determining whether the negative opinion ratio matches the first information ratio of the negative information sample set and the positive opinion ratio matches the second information ratio of the positive information sample set.
And the training completion determining unit is used for determining that training of the preset comment analysis model is complete if the proportions match.
And the vocabulary weight adjusting unit is used for adjusting the weight parameters of the high-frequency words in the preset comment analysis model if the proportions do not match, and retraining the preset comment analysis model.
In some embodiments, the information processing policy preset in the processing execution module 53 includes at least one of:
deleting the user comment information;
sending the user comment information to an auditor for auditing;
and carrying out top display on the user comment information.
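How an emotion score might select among these strategies is sketched below. The mapping is an assumption: the text lists the strategies but not the score ranges that trigger them, so the cut-offs `low` and `high` and the enum names are hypothetical.

```python
from enum import Enum


class Policy(Enum):
    DELETE = "delete the user comment information"
    AUDIT = "send the user comment information to an auditor"
    TOP_DISPLAY = "display the user comment information at the top"


def select_policy(emotion_score: float, low: float = 0.2, high: float = 0.8) -> Policy:
    """Choose a processing strategy from an emotion score in [0, 1]."""
    if emotion_score < low:
        return Policy.DELETE        # strongly negative comment
    if emotion_score >= high:
        return Policy.TOP_DISPLAY   # strongly positive comment
    return Policy.AUDIT             # uncertain cases go to a human auditor


chosen = [select_policy(s) for s in (0.05, 0.5, 0.95)]
```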
The information processing device provided by the embodiment of the invention can execute the information processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
EXAMPLE six
Fig. 6 is a schematic structural diagram of an electronic device 10 implementing an information processing method according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as an information processing method.
In some embodiments, an information processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of an information processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform an information processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An information processing method characterized by comprising:
obtaining user comment information according to a community text access interface;
determining the emotion score of the user comment information according to a preset comment analysis model;
and calling a preset information processing strategy according to the emotion scores.
2. The method of claim 1, further comprising:
training the preset comment analysis model according to a positive information sample set and a negative information sample set, wherein the positive information sample set and the negative information sample set comprise sample comments and label information of the sample comments.
3. The method of claim 2, wherein training the preset comment analysis model according to a positive information sample set and a negative information sample set comprises:
constructing the positive information sample set and the negative information sample set according to comment text information of a preset forum;
constructing the preset comment analysis model based on a SnowNLP model;
and training the preset comment analysis model according to the positive information sample set and the negative information sample set.
4. The method of claim 3, wherein the constructing the positive information sample set and the negative information sample set according to comment text information of a preset forum comprises:
respectively reading the comment text information from the preset forum according to a forum module;
performing data cleaning on each piece of the comment text information, wherein the data cleaning at least comprises stop word filtering and text word segmentation;
determining a viewer preference degree of each piece of the comment text information using the untrained SnowNLP model, wherein the viewer preference degree comprises at least two gears;
setting a negative label for the comment text information with the viewer preference degree less than or equal to a threshold value, and storing the comment text information into the negative information sample set;
setting a positive label for the comment text information with the viewer preference degree greater than the threshold value, and storing the comment text information into the positive information sample set.
5. The method of claim 4, further comprising:
determining a probability distribution graph of the different viewer preference degrees and a positive-to-negative viewer ratio graph;
and determining noise points in the positive-to-negative viewer ratio graph according to the probability distribution graph, and eliminating the comment text information corresponding to the noise points.
6. The method of claim 3, wherein training the preset comment analysis model according to the positive information sample set and the negative information sample set comprises:
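Claim 5 does not say how a noise point is identified. One possible reading, sketched below purely as an assumption, treats a preference level as noise when its empirical probability falls below a cutoff, and drops the comments at that level. The function names and the `min_probability` parameter are hypothetical.

```python
from collections import Counter


def remove_noise(samples, min_probability=0.1):
    """Drop samples whose preference level is rare.

    samples: list of (text, level) pairs, where level is a discrete
    viewer preference level. Returns the kept samples and the empirical
    probability distribution over levels.
    """
    counts = Counter(level for _, level in samples)
    total = len(samples)
    distribution = {level: n / total for level, n in counts.items()}
    # Assumption: a level is "noise" if its probability is below the cutoff.
    kept = [(text, level) for text, level in samples
            if distribution[level] >= min_probability]
    return kept, distribution


def positive_negative_counts(samples, threshold):
    """Count samples above the threshold (positive) and the rest (negative)."""
    positive = sum(1 for _, level in samples if level > threshold)
    return positive, len(samples) - positive
```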
inputting the positive information sample set and the negative information sample set into the preset comment analysis model;
obtaining a prediction result of the preset comment analysis model, and determining a negative viewpoint proportion and a positive viewpoint proportion in the prediction result;
determining whether the negative viewpoint proportion matches a first information proportion of the negative information sample set and the positive viewpoint proportion matches a second information proportion of the positive information sample set;
if so, determining that the preset comment analysis model is trained;
and if not, adjusting the weight parameters of the high-frequency words in the preset comment analysis model, and retraining the preset comment analysis model.
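The check-and-retrain loop of claim 6 might look something like the sketch below. It is only an illustration under stated assumptions: the model is reduced to a dictionary of word weights, "matching" means the predicted proportions are within a hypothetical `tolerance` of the sample-set proportions, and the weight adjustment is a simple perceptron-style nudge on the most frequent words. None of these details come from the claims.

```python
from collections import Counter


def proportions(labels):
    """Return (negative proportion, positive proportion) of a label list."""
    total = len(labels)
    neg = sum(1 for label in labels if label == "negative")
    return neg / total, (total - neg) / total


def predict(weights, tokens):
    """Toy model: a comment is positive if its summed word weights exceed 0."""
    score = sum(weights.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative"


def train_until_match(samples, tolerance=0.05, step=0.5, top_k=10, max_rounds=20):
    """samples: list of (tokens, label). Returns (weights, trained_flag)."""
    weights = {}
    target_neg, target_pos = proportions([label for _, label in samples])
    freq = Counter(t for tokens, _ in samples for t in tokens)
    high_freq = {word for word, _ in freq.most_common(top_k)}
    for _ in range(max_rounds):
        preds = [predict(weights, tokens) for tokens, _ in samples]
        pred_neg, pred_pos = proportions(preds)
        if (abs(pred_neg - target_neg) <= tolerance
                and abs(pred_pos - target_pos) <= tolerance):
            return weights, True  # proportions match: training is complete
        # Otherwise adjust the weights of high-frequency words and retry.
        for (tokens, label), pred in zip(samples, preds):
            if pred == label:
                continue
            delta = step if label == "positive" else -step
            for t in tokens:
                if t in high_freq:
                    weights[t] = weights.get(t, 0.0) + delta
    return weights, False
```

Comparing proportions, as the claim describes, is a coarser criterion than per-sample accuracy: a model can match the overall negative and positive shares while still mislabelling individual comments.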
7. The method of claim 1, wherein the preset information processing strategy comprises at least one of:
deleting the user comment information;
sending the user comment information to an auditor for auditing;
and pinning the user comment information to the top for display.
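As a rough illustration of how an emotion score could select among the three strategies of claim 7, consider the sketch below. The claims do not state which score ranges trigger which strategy, so the cutoffs (`delete_below`, `audit_below`, `pin_above`) and the extra `NONE` outcome are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    DELETE = "delete"  # delete the user comment information
    AUDIT = "audit"    # send the comment to an auditor
    PIN = "pin"        # pin the comment to the top for display
    NONE = "none"      # no action (assumption: not listed in the claim)


def choose_strategy(score, delete_below=0.1, audit_below=0.4, pin_above=0.9):
    """Map an emotion score in [0, 1] to a strategy using assumed cutoffs."""
    if score < delete_below:
        return Strategy.DELETE
    if score < audit_below:
        return Strategy.AUDIT
    if score > pin_above:
        return Strategy.PIN
    return Strategy.NONE
```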
8. An information processing apparatus characterized by comprising:
the information acquisition module is used for acquiring user comment information according to the community text access interface;
the emotion recognition module is used for determining the emotion score of the user comment information according to a preset comment analysis model;
and the processing execution module is used for calling a preset information processing strategy according to the emotion score.
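The three-module apparatus of claim 8 could be wired together along the lines below. This is a structural sketch only; the class name, the callable signatures, and the returned tuple layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class InformationProcessingApparatus:
    # Information acquisition module: returns user comments.
    acquire: Callable[[], Iterable[str]]
    # Emotion recognition module: maps a comment to an emotion score.
    score: Callable[[str], float]
    # Processing execution module: applies a strategy, returns its name.
    execute: Callable[[str, float], str]

    def run(self) -> list[tuple[str, float, str]]:
        """Acquire comments, score each one, and invoke the strategy."""
        results = []
        for comment in self.acquire():
            emotion = self.score(comment)
            results.append((comment, emotion, self.execute(comment, emotion)))
        return results
```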
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the information processing method of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the information processing method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211268580.3A CN115438667A (en) 2022-10-17 2022-10-17 Information processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211268580.3A CN115438667A (en) 2022-10-17 2022-10-17 Information processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115438667A true CN115438667A (en) 2022-12-06

Family

ID=84250228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211268580.3A Pending CN115438667A (en) 2022-10-17 2022-10-17 Information processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115438667A (en)

Similar Documents

Publication Publication Date Title
CN108073568B (en) Keyword extraction method and device
US20190146984A1 (en) Prioritizing survey text responses
CN103399891A (en) Method, device and system for automatic recommendation of network content
CN110390408A (en) Trading object prediction technique and device
McKelvey et al. Visualizing communication on social media: Making big data accessible
CN108563625A (en) Text analyzing method, apparatus, electronic equipment and computer storage media
CN112732910B (en) Cross-task text emotion state evaluation method, system, device and medium
US10073891B2 (en) Forensic system, forensic method, and forensic program
CN113392218A (en) Training method of text quality evaluation model and method for determining text quality
CN111275338A (en) Method, device, equipment and storage medium for judging enterprise fraud behaviors
CN116109373A (en) Recommendation method and device for financial products, electronic equipment and medium
US11755677B2 (en) Data mining method, data mining apparatus, electronic device and storage medium
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN113870998A (en) Interrogation method, device, electronic equipment and storage medium
CN117216275A (en) Text processing method, device, equipment and storage medium
CN116108844A (en) Risk information identification method, apparatus, device and storage medium
WO2023040220A1 (en) Video pushing method and apparatus, and electronic device and storage medium
CN110837732A (en) Method and device for identifying intimacy between target people, electronic equipment and storage medium
CN115438667A (en) Information processing method and device, electronic equipment and storage medium
CN112818221B (en) Entity heat determining method and device, electronic equipment and storage medium
CN110990709B (en) Role automatic recommendation method and device and electronic equipment
CN113962216A (en) Text processing method and device, electronic equipment and readable storage medium
CN110929175B (en) Method, device, system and medium for evaluating user evaluation
CN112115300A (en) Text processing method and device, electronic equipment and readable storage medium
KR102309802B1 (en) Analysis method for trend of sns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination