CN113609390A - Information analysis method and device, electronic equipment and computer readable storage medium - Google Patents

Publication number
CN113609390A
Authority
CN
China
Prior art keywords
text
analyzed
emotion type
preset
probability value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110902332.9A
Other languages
Chinese (zh)
Inventor
刘文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Credit Service Co ltd
Original Assignee
Beijing Jindi Credit Service Co ltd
Application filed by Beijing Jindi Credit Service Co ltd
Priority to CN202110902332.9A
Publication of CN113609390A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the disclosure provide an information analysis method and device, a computer-readable storage medium, and an electronic device. The method comprises the following steps: acquiring a text to be analyzed; predicting, by using an analysis model obtained by pre-training, the probability value that the text to be analyzed is of a preset emotion type, to obtain a first prediction result, wherein the preset emotion type comprises any one or more of positive emotion and negative emotion; and determining a first emotion type of the text to be analyzed based on the first prediction result. The disclosed technical solution can improve the accuracy of public opinion news recognition results, prevent users from misunderstanding public opinion news, help identify negative public opinion news accurately, and reduce unnecessary losses to users.

Description

Information analysis method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to an information analysis method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Public opinion refers to the social attitudes that the public, as the subject, holds toward social managers, enterprises, individuals, and other organizations, as objects, and toward their political, social, and moral orientations, formed around the occurrence, development, and change of social events within a given social space.
With the rapid development of internet technology, the openness and flexibility of the network have made it one of the main carriers of public opinion. By extracting information from an enterprise's public opinion news and storing it in structured form, a user can conveniently obtain comprehensive public opinion information about an enterprise of interest, analyze that information, judge the enterprise's development trend accurately, and further generate public opinion reports and various statistical reports to support decision making.
The prior art cannot accurately identify the polarity of an enterprise's public opinion news, that is, it cannot judge whether the news is positive, negative, or neutral. As a result, the enterprise's public opinion information cannot be effectively monitored and analyzed, users cannot judge the enterprise's development trend from that information, and users may suffer unnecessary losses. For example, in the investment field, a user who fails to discover negative information about an invested enterprise in time, and therefore fails to adjust the investment strategy, may suffer huge economic losses.
Disclosure of Invention
The present disclosure is directed to an information analysis method and apparatus, an electronic device, and a computer-readable storage medium, so as to improve the accuracy of a public opinion news recognition result at least to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an information analysis method including:
acquiring a text to be analyzed;
predicting the probability value of the text to be analyzed as a preset emotion type by using an analysis model obtained by pre-training to obtain a first prediction result; wherein the preset emotion types comprise any one or more of the following: positive emotions, negative emotions;
and determining a first emotion type of the text to be analyzed based on the first prediction result.
Optionally, in the information analysis method according to any of the above embodiments of the present disclosure, after acquiring the text to be analyzed, the method further includes:
acquiring a target entity in the text to be analyzed;
the predicting, by using the analysis model obtained by pre-training, the probability value that the text to be analyzed is of the preset emotion type to obtain the first prediction result includes:
splicing the text to be analyzed and the target entity in a preset manner to obtain a spliced text;
inputting the spliced text into the analysis model, and outputting the first prediction result through the analysis model;
after determining the first emotion type of the text to be analyzed based on the first prediction result, the method further includes:
outputting the target entity and the first emotion type of the text to be analyzed.
Optionally, in the information analysis method according to any of the above embodiments of the present disclosure, the obtaining a target entity in the text to be analyzed includes:
identifying entities included in the text to be analyzed;
determining whether the number of entities included in the text to be analyzed is greater than 1;
and if the number of the entities in the text to be analyzed is more than 1, determining one entity from the entities in the text to be analyzed as the target entity based on a preset mode.
Optionally, in the information analysis method according to any one of the above embodiments of the present disclosure, determining, in a preset manner, one entity from the entities included in the text to be analyzed as the target entity includes any one or more of the following: determining the entity that occurs most often as the target entity; determining the entity that occurs most often in the first person as the target entity; and determining an enterprise entity as the target entity.
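As an illustration of the selection rules above, the following minimal Python sketch picks the most frequent entity and, as an assumed tie-break, prefers an enterprise entity. The `Mention` tuple, the `"ORG"` type tag, and the function name `select_target_entity` are hypothetical and do not come from the disclosure, which leaves the preset manner open.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation of one recognized mention: (entity name, entity type).
Mention = Tuple[str, str]


def select_target_entity(mentions: List[Mention]) -> Optional[str]:
    """Pick one target entity from the recognized mentions.

    Assumed rule: the most frequent entity wins; on a tie, an enterprise
    entity (tagged "ORG" here, a hypothetical tag) is preferred.
    """
    if not mentions:
        return None
    counts = Counter(name for name, _ in mentions)
    enterprises = {name for name, etype in mentions if etype == "ORG"}
    # Compare by (occurrence count, is-enterprise) so that ties favor enterprises.
    return max(counts, key=lambda name: (counts[name], name in enterprises))


mentions = [("AA Travel", "ORG"), ("Xi'an", "LOC"), ("AA Travel", "ORG")]
print(select_target_entity(mentions))
```

When the text contains exactly one entity, the same function simply returns that entity.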
Optionally, in the information analysis method according to any of the embodiments of the present disclosure, the splicing the text to be analyzed and the target entity according to a preset manner to obtain a spliced text includes:
and performing mask processing on other entities except the target entity in the entities included in the text to be analyzed, and splicing the text to be analyzed after mask processing and the target entity according to a preset mode to obtain a spliced text.
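The masking and splicing step can be sketched as follows. The disclosure does not specify the mask symbol or the splicing format at this point, so the `[MASK]` token, the `[SEP]` separator, the entity-first ordering, and the function name `mask_and_splice` are all assumptions made for illustration.

```python
import re
from typing import Iterable


def mask_and_splice(text: str, entities: Iterable[str], target: str,
                    mask_token: str = "[MASK]", sep: str = "[SEP]") -> str:
    """Mask every entity except the target, then splice target and text.

    The mask token, separator, and ordering are assumed, not taken from
    the disclosure.
    """
    # Match longer names first so a short name never masks part of a longer one.
    names = sorted({n for n in entities if n} | {target}, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    masked = pattern.sub(
        lambda m: m.group(0) if m.group(0) == target else mask_token, text
    )
    return f"{target}{sep}{masked}"


print(mask_and_splice(
    "AA Travel overtook BB Cars in Xi'an.",
    ["AA Travel", "BB Cars", "Xi'an"],
    "AA Travel",
))
```

Including the target in the pattern and leaving its matches unchanged keeps a shorter entity name from masking part of the target name.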
Optionally, in the information analysis method according to any one of the above embodiments of the present disclosure, the obtaining a target entity in the text to be analyzed further includes:
and if the number of the entities in the text to be analyzed is equal to 1, taking the entities in the text to be analyzed as the target entities.
Optionally, in the information analysis method according to any one of the above embodiments of the present disclosure, the first prediction result includes: the probability value that the text to be analyzed is of the positive emotion type and the probability value that it is of the negative emotion type; or the probability value that the text to be analyzed is of the positive emotion type and the probability value that it is not of the positive emotion type; or the probability value that the text to be analyzed is of the negative emotion type and the probability value that it is not of the negative emotion type.
Optionally, in the information analysis method according to any of the embodiments of the present disclosure, a probability value that the text to be analyzed is a preset emotion type is predicted by using an analysis model obtained through pre-training;
determining the emotion type of the text to be analyzed based on the predicted probability value of the text to be analyzed as a preset emotion type; the first prediction result comprises a probability value that the text to be analyzed is a preset emotion type and an emotion type of the text to be analyzed;
the determining the first emotion type of the text to be analyzed based on the first prediction result comprises:
and acquiring the emotion type in the first prediction result.
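One simple way to derive an emotion type from a prediction result of any of the forms listed above is to take the label with the largest probability value. The sketch below assumes the prediction result is a dictionary from label to probability; that representation, the label strings, and the function name are hypothetical, and the disclosure may use a different decision rule.

```python
from typing import Dict


def determine_emotion_type(prediction: Dict[str, float]) -> str:
    """Return the label whose probability value is largest (assumed rule)."""
    if not prediction:
        raise ValueError("prediction result is empty")
    return max(prediction, key=prediction.get)


# Positive vs. negative form of the prediction result.
print(determine_emotion_type({"positive": 0.83, "negative": 0.17}))
# Negative vs. non-negative form of the prediction result.
print(determine_emotion_type({"negative": 0.30, "non-negative": 0.70}))
```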
Optionally, in the information analysis method according to any of the above embodiments of the present disclosure, after the obtaining the text to be analyzed, the method further includes:
determining whether the length of the text to be analyzed is larger than a preset length;
and if the length of the text to be analyzed is not greater than the preset length, performing the operation of predicting, by using the analysis model obtained by pre-training, the probability value that the text to be analyzed is of the preset emotion type to obtain the first prediction result.
Optionally, in the information analysis method according to any of the above embodiments of the present disclosure, after the obtaining the text to be analyzed, the method further includes:
if the length of the text to be analyzed is greater than a preset length, dividing the text to be analyzed into N text segments by taking the preset length as a unit; wherein N is an integer greater than 1;
the predicting the probability value of the text to be analyzed as the preset emotion type by using the analysis model obtained by pre-training, and obtaining a first prediction result comprises the following steps:
predicting the probability value of the N text segments as the preset emotion type by using an analysis model obtained by pre-training to obtain N second prediction results;
and determining a first prediction result of the text to be analyzed based on the N second prediction results.
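The segmentation steps above can be sketched in Python as follows. The text at this point does not say how the N second prediction results are combined, so averaging the probabilities is an assumption; `toy_model` is a hypothetical stand-in for the pre-trained analysis model, and all names are illustrative.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, float]


def split_text(text: str, preset_length: int) -> List[str]:
    """Divide the text into segments of at most preset_length characters."""
    if preset_length <= 0:
        raise ValueError("preset_length must be positive")
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]


def predict_long_text(text: str, model: Callable[[str], Prediction],
                      preset_length: int) -> Prediction:
    """Predict directly for short texts; otherwise combine per-segment results."""
    if len(text) <= preset_length:
        return model(text)
    second_results = [model(segment) for segment in split_text(text, preset_length)]
    # Assumed aggregation: average each label's probability over the N segments.
    return {
        label: sum(r[label] for r in second_results) / len(second_results)
        for label in second_results[0]
    }


def toy_model(segment: str) -> Prediction:
    """Hypothetical stand-in for the analysis model (keyword heuristic)."""
    negative = 0.9 if "loss" in segment else 0.2
    return {"positive": 1.0 - negative, "negative": negative}


result = predict_long_text("a" * 10 + "loss" + "b" * 6, toy_model, preset_length=10)
print(result)
```

Fixed-length splitting can cut a word or sentence at a segment boundary; this sketch accepts that limitation for simplicity.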
Optionally, in the information analysis method according to any of the above embodiments of the present disclosure, the method further includes:
acquiring a preset part of text in the text to be analyzed as a sub-text to be analyzed;
predicting the probability value of the sub-text to be analyzed as the preset emotion type by using the analysis model to obtain a third prediction result;
determining a third emotion type of the sub-text to be analyzed based on the third prediction result;
and determining a fourth emotion type of the text to be analyzed based on the first emotion type and the third emotion type.
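The rule for combining the first emotion type (whole text) and the third emotion type (sub-text) is not given in this part of the disclosure. The sketch below uses one purely hypothetical rule, in which a negative result from either source takes precedence, to show how such a combination step could look.

```python
def combine_emotion_types(first: str, third: str) -> str:
    """Derive a fourth emotion type from the first and third emotion types.

    Hypothetical rule, not taken from the disclosure: a negative result from
    either the whole text or the sub-text wins; otherwise the whole-text
    result is kept.
    """
    if "negative" in (first, third):
        return "negative"
    return first


print(combine_emotion_types("positive", "negative"))
print(combine_emotion_types("positive", "positive"))
```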
Optionally, in the information analysis method according to any of the embodiments of the present disclosure, the training of the analysis model includes:
inputting each of a plurality of first training corpora, together with its emotion type labeling information, into the analysis model, and outputting, through the analysis model, a probability value of whether each first training corpus is of the preset emotion type;
and training the analysis model based on the probability value of whether each first training corpus is of the preset emotion type and the probability value corresponding to the respective emotion type labeling information.
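A common way to compare predicted probability values with the probability values implied by labels is a binary cross-entropy loss. The disclosure does not name a loss function here, so the choice of cross-entropy, the 0/1 label encoding, and the function name below are assumptions made for illustration.

```python
import math
from typing import Sequence


def binary_cross_entropy(predicted: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels.

    Assumed encoding: label 1 means the corpus is of the preset emotion type.
    """
    if not predicted or len(predicted) != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(predicted, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predicted)


# Predictions close to the labels give a smaller loss than predictions far from them.
print(binary_cross_entropy([0.9, 0.1], [1, 0]))
print(binary_cross_entropy([0.1, 0.9], [1, 0]))
```

During training, the model parameters would be adjusted to reduce this loss over the first training corpora.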
According to a second aspect of the present disclosure, there is provided an information analysis apparatus including:
the first acquisition module is used for acquiring a text to be analyzed;
the first prediction module is used for predicting the probability value of the text to be analyzed as the preset emotion type by using an analysis model obtained by pre-training to obtain a first prediction result; wherein the preset emotion types comprise any one or more of the following: positive emotions, negative emotions;
and the first determining module is used for determining a first emotion type of the text to be analyzed based on the first prediction result.
Optionally, in the information analysis apparatus according to any of the above embodiments of the present disclosure, the apparatus further includes:
the second acquisition module is used for acquiring a target entity in the text to be analyzed;
the splicing module is used for splicing the text to be analyzed and the target entity according to a preset mode to obtain a spliced text;
the first prediction module is used for inputting the spliced text into the analysis model and outputting the first prediction result through the analysis model;
and the output module is used for outputting the target entity and the first emotion type of the text to be analyzed.
Optionally, in the information analysis apparatus according to any of the above embodiments of the present disclosure, the second obtaining module is specifically configured to:
identifying entities included in the text to be analyzed;
determining whether the number of entities included in the text to be analyzed is greater than 1;
and if the number of the entities in the text to be analyzed is more than 1, determining one entity from the entities in the text to be analyzed as the target entity based on a preset mode.
Optionally, in the information analysis apparatus according to any one of the above embodiments of the present disclosure, the second obtaining module determining, in a preset manner, one entity from the entities included in the text to be analyzed as the target entity includes any one or more of the following: determining the entity that occurs most often as the target entity; determining the entity that occurs most often in the first person as the target entity; and determining an enterprise entity as the target entity.
Optionally, in the information analysis apparatus according to any of the embodiments of the present disclosure, the splicing module is configured to perform mask processing on other entities except the target entity in the entities included in the text to be analyzed, and splice the text to be analyzed after the mask processing with the target entity according to a preset manner, so as to obtain a spliced text.
Optionally, in the information analysis apparatus according to any one of the above embodiments of the present disclosure, the second obtaining module is further configured to:
and if the number of the entities in the text to be analyzed is equal to 1, taking the entities in the text to be analyzed as the target entities.
Optionally, in the information analysis apparatus according to any one of the above embodiments of the present disclosure, the first prediction result includes: the probability value that the text to be analyzed is of the positive emotion type and the probability value that it is of the negative emotion type; or the probability value that the text to be analyzed is of the positive emotion type and the probability value that it is not of the positive emotion type; or the probability value that the text to be analyzed is of the negative emotion type and the probability value that it is not of the negative emotion type.
Optionally, in the information analysis apparatus according to any of the embodiments of the present disclosure, the first prediction module is configured to predict a probability value that the text to be analyzed is of a preset emotion type by using an analysis model obtained through pre-training;
the first determining module is used for determining the emotion type of the text to be analyzed based on the predicted probability value that the text to be analyzed is a preset emotion type; the first prediction result comprises a probability value that the text to be analyzed is a preset emotion type and an emotion type of the text to be analyzed;
the first determining module is further configured to obtain an emotion type in the first prediction result.
Optionally, in the information analysis apparatus according to any of the above embodiments of the present disclosure, the apparatus further includes:
the second determining module is used for determining whether the length of the text to be analyzed is greater than a preset length;
the first prediction module is used for performing, according to the determination result of the second determining module and if the length of the text to be analyzed is not greater than the preset length, the operation of predicting, by using the analysis model obtained by pre-training, the probability value that the text to be analyzed is of the preset emotion type to obtain the first prediction result.
Optionally, in the information analysis apparatus according to any of the above embodiments of the present disclosure, the apparatus further includes:
the segmentation module is used for dividing the text to be analyzed into N text segments by taking the preset length as a unit if the length of the text to be analyzed is greater than the preset length according to the determination result of the second determination module; wherein N is an integer greater than 1;
the first prediction module is used for predicting the probability value of the N text segments as the preset emotion type by using an analysis model obtained by pre-training to obtain N second prediction results; and determining a first prediction result of the text to be analyzed based on the N second prediction results.
Optionally, in the information analysis apparatus according to any of the above embodiments of the present disclosure, the apparatus further includes:
the third acquisition module is used for acquiring a preset part of texts in the texts to be analyzed as sub-texts to be analyzed;
the first prediction module is used for predicting the probability value of the sub-text to be analyzed as the preset emotion type by using the analysis model to obtain a third prediction result;
the first determining module is configured to determine a third emotion type of the sub-text to be analyzed based on the third prediction result; and determining a fourth emotion type of the text to be analyzed based on the first emotion type and the third emotion type.
Optionally, in the information analysis apparatus according to any of the above embodiments of the present disclosure, the apparatus further includes:
the training module is used for inputting each of a plurality of first training corpora, together with its emotion type labeling information, into the analysis model, and outputting, through the analysis model, a probability value of whether each first training corpus is of the preset emotion type; and training the analysis model based on the probability value of whether each first training corpus is of the preset emotion type and the probability value corresponding to the respective emotion type labeling information.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the information analysis method described above via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the information analysis method described above.
As can be seen from the foregoing technical solutions, the information analysis method and apparatus, the electronic device, and the computer-readable storage medium in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:
In the information analysis method and device, the electronic device, and the computer-readable storage medium of the embodiments of the disclosure, a text to be analyzed is first acquired; then an analysis model obtained by pre-training is used to predict the probability value that the text to be analyzed is of a preset emotion type, yielding a first prediction result, where the preset emotion type comprises any one or more of positive emotion and negative emotion; and the first emotion type of the text to be analyzed is then determined based on the first prediction result. Because the preset emotion types do not include neutral emotion, the analysis model predicts only the probability that the text to be analyzed is of the positive emotion type and/or the negative emotion type, and does not predict the probability that it is of the neutral emotion type. This avoids the influence that the indistinct features of neutral public opinion news would otherwise have on the accuracy of the recognition result, improves the accuracy of public opinion news recognition, and prevents users from misunderstanding public opinion news. The improved accuracy in turn helps identify negative public opinion news accurately and reduces unnecessary losses to users.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 illustrates a system architecture diagram to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic flow chart of an information analysis method in a first exemplary embodiment of the present disclosure;
FIG. 3 shows a schematic flow chart of an information analysis method in a second exemplary embodiment of the present disclosure;
FIG. 4 shows a block diagram of an information analysis apparatus in a first exemplary embodiment of the present disclosure;
FIG. 5 shows a block diagram of an information analysis apparatus in a second exemplary embodiment of the present disclosure;
FIG. 6 shows a block diagram of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, apparatus, steps, etc. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. The symbol "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the present disclosure, unless otherwise expressly specified or limited, the terms "connected" and the like are to be construed broadly, e.g., as meaning electrically connected or in communication with each other; may be directly connected or indirectly connected through an intermediate. The specific meaning of the above terms in the present disclosure can be understood by those of ordinary skill in the art as appropriate.
FIG. 1 shows a system architecture diagram to which embodiments of the present disclosure may be applied. As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having display screens including, but not limited to, smart phones, tablets, portable and desktop computers, digital cinema projectors, and the like.
The server 105 may be a server that provides various information texts, such as servers of various websites, self-media platforms, databases, and the like. For example, the user uses the terminal device 103 (or the terminal device 101 or 102) to obtain information from the server 105 as a text to be analyzed in real time or periodically, and executes the information analysis method of the embodiment of the present disclosure to obtain the emotion type of the text to be analyzed, and stores the emotion type in the structured database in a structured information storage manner for subsequent analysis.
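As a rough illustration of storing the analysis result in structured form, the sketch below writes one record to an SQLite table using Python's standard `sqlite3` module. The table name `opinion_result`, its columns, and the function name `store_result` are hypothetical; the disclosure does not describe a schema or a particular database.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for the structured record.
SCHEMA = """
CREATE TABLE IF NOT EXISTS opinion_result (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    text          TEXT NOT NULL,
    target_entity TEXT,
    emotion_type  TEXT NOT NULL
)
"""


def store_result(conn: sqlite3.Connection, text: str, emotion_type: str,
                 target_entity: Optional[str] = None) -> int:
    """Insert one analysis result and return the id of the new row."""
    conn.execute(SCHEMA)
    cursor = conn.execute(
        "INSERT INTO opinion_result (text, target_entity, emotion_type) "
        "VALUES (?, ?, ?)",
        (text, target_entity, emotion_type),
    )
    conn.commit()
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
row_id = store_result(conn, "AA Travel launches a smart travel system",
                      "positive", "AA Travel")
print(conn.execute(
    "SELECT target_entity, emotion_type FROM opinion_result WHERE id = ?",
    (row_id,),
).fetchone())
```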
Fig. 2 shows a schematic flow chart of an information analysis method in a first exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device, and as shown in fig. 2, the information analysis method of the present embodiment includes the following steps:
S201, obtaining a text to be analyzed.
The characters in the text to be analyzed in the embodiments of the present disclosure may be of any type, such as Chinese characters, English characters, or digits. In addition, the text to be analyzed may belong to any field; the embodiments of the present disclosure do not limit its content or field.
Optionally, in some possible implementations, in step S201, the text to be analyzed may be a public opinion news text of an enterprise. The public opinion news text may be an original public opinion news text, or a text obtained by preprocessing the original, where the preprocessing may be, for example, removing emoticons, incorrect punctuation marks, and the like from the original public opinion news text. The embodiments of the present disclosure do not limit the specific content or representation form of the public opinion news text, whether it is preprocessed, or the specific manner of preprocessing.
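The preprocessing mentioned above could be sketched with regular expressions as follows. The disclosure does not define which characters count as emoticons or which punctuation is incorrect, so the Unicode ranges, the "repeated punctuation" rule, and the function name `preprocess` are assumptions for illustration only.

```python
import re

# Assumed emoticon ranges: common emoji blocks plus the variation selector.
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
# Assumed notion of incorrect punctuation: the same mark repeated in a row.
_REPEATED_PUNCT = re.compile(r"([!?,.;:!?,。;:])\1+")


def preprocess(text: str) -> str:
    """Remove emoticons, collapse repeated punctuation, and tidy whitespace."""
    text = _EMOJI.sub("", text)
    text = _REPEATED_PUNCT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Great news!!! \U0001F600 Shares rose,, sharply."))
```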
In an optional example, the public opinion news text of the enterprise may be: "AA Travel launches the country's first vehicle-end 'smart travel' system, improving travel efficiency. In October 2020, AA Travel launched China's first vehicle-end smart travel system in Xi'an, realizing intelligent interaction among people, vehicles, and machines for the first time. The AA Travel 'smart travel' system can link effectively with the traditional in-vehicle head unit; in addition to basic functions such as system voice and route navigation, it also provides functions such as intelligent order receiving, synchronized destination navigation, and intelligent voice broadcast. The launch of these new functions brings taxi drivers more orders, improves their efficiency, and makes the in-vehicle environment safer and more convenient. Recently, at the Fourth Session of the 13th National People's Congress held in Beijing, accelerating digital development and promoting the digital transformation and upgrading of the entire industrial chain were put forward."
The text to be analyzed in the embodiment of the present disclosure, for example, public opinion news text of an enterprise, is unstructured information.
In the embodiment of the present disclosure, the text to be analyzed may be acquired from each website, forum, self-media platform, or the like in real time or according to a certain period, or the text to be analyzed input by the user may also be received.
S202, predicting the probability value of the text to be analyzed as the preset emotion type by using the analysis model obtained by pre-training to obtain a first prediction result.
The emotion type in the embodiment of the present disclosure is used to represent the emotion polarity of public opinion news. For example, if the information described in the public opinion news is positive and favorable, the emotion polarity of the public opinion news can be determined to be positive, that is, the emotion type of the public opinion news is positive emotion; if the information described in the public opinion news is negative and unfavorable, the emotion polarity of the public opinion news can be determined to be negative, that is, the emotion type of the public opinion news is negative emotion; if the information described in the public opinion news is neutral, the emotion type of the public opinion news can be determined to be neutral emotion. The preset emotion types in the embodiment of the present disclosure may include any one or more of the following: positive emotion and negative emotion.
The analysis model in the embodiment of the present disclosure may be obtained by training based on a plurality of first training corpora, and each first training corpus may be labeled with emotion type labeling information. The emotion types of the plurality of first training corpora in the embodiment of the present disclosure may include any one or more of the following: positive emotion and negative emotion.
Optionally, in some possible implementations, in step S202, if the emotion types of the first training corpora include positive emotion and negative emotion, the analysis model trained based on the first training corpora may be used for binary classification of public opinion news, that is, predicting whether the public opinion news is of the positive emotion type or the negative emotion type, without predicting whether it is of the neutral emotion type. In this way, the influence of the inconspicuous features of neutral public opinion news on the accuracy of the recognition result can be avoided, which helps improve the accuracy of the public opinion news recognition result.
Or, in another possible implementation, in step S202, if the emotion types of the first training corpora only include negative emotion, the analysis model trained based on the first training corpora focuses on learning to recognize negative public opinion news, which helps recognize negative public opinion news accurately, so that the user can avoid misinterpreting the public opinion news and unnecessary loss to the user can be reduced.
Optionally, in some possible implementations, in step S202, a pre-trained language model may be selected as the analysis model, for example, a large pre-trained language model such as a BERT model, a RoBERTa model, or an ERNIE model.
The pre-trained language model can learn semantic representations of complete concepts by modeling prior semantic knowledge, such as entity concepts, in massive data, so that the representations of semantic knowledge units are closer to the real world. While modeling its input on the basis of character features, the model also directly models the prior semantic knowledge units, and therefore has a strong semantic representation capability.
Alternatively, in another possible implementation, in step S202, a large pre-trained language model such as a BERT model, a RoBERTa model, or an ERNIE model is used as the analysis model, and the pre-trained language model is fine-tuned by using a large number of training corpora to implement the training of the analysis model.
In the embodiment of the present disclosure, an activation function of an analysis model may be selected according to actual needs, for example, a sigmoid function may be selected as the activation function of the analysis model, and a softmax function may also be selected as the activation function of the analysis model.
For example, in training an analysis model for identifying public sentiment news of a certain sentiment type (e.g., negative sentiment), a sigmoid function may be selected as an activation function of the analysis model, and in training an analysis model for identifying public sentiment news of a plurality of sentiment types (e.g., positive sentiment and negative sentiment), a softmax function may be selected as an activation function of the analysis model.
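The difference between the two activation functions can be illustrated with a small, framework-free Python sketch. The logit values are made up for illustration, and the function names are not taken from the disclosure; in practice the activation would normally be supplied by the deep learning framework used to fine-tune the model.

```python
import math
from typing import List


def sigmoid(logit: float) -> float:
    """Map one logit to a probability in (0, 1), e.g. P(negative)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)  # avoids overflow for very negative logits
    return e / (1.0 + e)


def softmax(logits: List[float]) -> List[float]:
    """Map one logit per emotion type to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() stable
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# One emotion type (negative only): a single logit and a single probability.
p_negative = sigmoid(1.2)
# Two emotion types (positive, negative): one logit per type.
p_positive, p_negative_two_way = softmax([0.3, 2.1])
```

With sigmoid, the probability of "not negative" is simply one minus the output; with softmax, the probabilities of all preset emotion types are produced together.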
Optionally, in some possible implementations, in step S202, when the analysis model is trained in advance based on the first training corpora, the first training corpora and their emotion type labeling information may be input into the analysis model; the analysis model outputs, for each first training corpus, a probability value that the corpus is of each preset emotion type; and the analysis model is trained based on these output probability values and the probability values corresponding to the emotion type labeling information.
Based on this possible implementation, a large number of first training corpora can be used to train the analysis model, so that the analysis model can fully learn the correspondence between each first training corpus and its emotion type, and the trained analysis model can accurately identify the first emotion type of the text to be analyzed.
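The disclosure does not name the loss function used to compare the predicted probability values with the labeled ones. As an assumption for illustration, a cross-entropy loss, which is commonly used for this kind of classification training, could be sketched as follows; the label encoding `[positive, negative]` is likewise hypothetical.

```python
import math
from typing import List


def cross_entropy(predicted: List[float], target: List[float]) -> float:
    """Cross-entropy between predicted probabilities and label probabilities."""
    eps = 1e-12  # guards against log(0)
    return -sum(t * math.log(max(p, eps)) for p, t in zip(predicted, target))


# Hypothetical encoding: [P(positive), P(negative)]; the label is "negative".
label = [0.0, 1.0]
loss_close = cross_entropy([0.1, 0.9], label)  # prediction agrees with label
loss_far = cross_entropy([0.8, 0.2], label)    # prediction disagrees
```

The loss is smaller when the predicted probability values are closer to those implied by the labeling information, which is the quantity a training procedure would minimize.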
S203, determining a first emotion type of the text to be analyzed based on the first prediction result.
The analysis model can be obtained through pre-training based on the above possible implementation. After the analysis model is trained, it can be used to predict the probability value that the text to be analyzed is of the preset emotion type to obtain the first prediction result, and the emotion type of the text to be analyzed can then be determined based on the predicted probability value.
Optionally, in some possible implementations, in step S203, the first prediction result may include a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is of the negative emotion type; or a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is not of the positive emotion type; or a probability value that the text to be analyzed is of the negative emotion type and a probability value that it is not of the negative emotion type.
Based on this, the first emotion type of the text to be analyzed may be determined according to the probability value included in the first prediction result and the size of the preset threshold, where the size of the preset threshold may be set according to actual needs, which is not limited in the embodiment of the present disclosure.
Take as an example the case where the first prediction result includes a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is of the negative emotion type, and the first preset threshold is 0.7. The two probability values may each be compared with the first preset threshold: if the probability value that the text to be analyzed is of the positive emotion type is greater than 0.7, the first emotion type of the text to be analyzed can be determined to be the positive emotion type; if the probability value that the text to be analyzed is of the negative emotion type is greater than 0.7, the first emotion type can be determined to be the negative emotion type; and if neither probability value is greater than 0.7, the first emotion type can be determined to be the neutral emotion type.
Take as another example the case where the first prediction result includes a probability value that the text to be analyzed is of the negative emotion type and a probability value that it is not of the negative emotion type, the second preset threshold is 0.84, and the third preset threshold is 0.36. The two probability values may be compared with the second preset threshold and the third preset threshold, respectively: if the probability value that the text to be analyzed is of the negative emotion type is greater than 0.84, the first emotion type of the text to be analyzed can be determined to be the negative emotion type; if the probability value that the text to be analyzed is not of the negative emotion type is greater than 0.36, the first emotion type can be determined to be the positive emotion type; and if the probability value of the negative emotion type is not greater than 0.84 and the probability value of not being the negative emotion type is not greater than 0.36, the first emotion type can be determined to be the neutral emotion type.
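The two threshold examples can be sketched in Python as follows. The function names are hypothetical, the default thresholds are the example values 0.7, 0.84, and 0.36 given above, and the order in which the comparisons are made is an assumption that only matters if the two probability values do not sum to 1.

```python
def classify_two_way(p_positive: float, p_negative: float,
                     threshold: float = 0.7) -> str:
    """First example: both probabilities compared with one threshold."""
    if p_positive > threshold:
        return "positive"
    if p_negative > threshold:
        return "negative"
    return "neutral"  # neither probability exceeds the threshold


def classify_negative_only(p_negative: float, p_not_negative: float,
                           negative_threshold: float = 0.84,
                           not_negative_threshold: float = 0.36) -> str:
    """Second example: negative / not-negative with two thresholds."""
    if p_negative > negative_threshold:
        return "negative"
    if p_not_negative > not_negative_threshold:
        return "positive"
    return "neutral"
```

For instance, `classify_negative_only(0.7, 0.3)` falls between the two thresholds and is treated as neutral, even though the model itself never predicts a neutral class.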
According to the information analysis method provided by the embodiment of the present disclosure, a text to be analyzed is first obtained; then a probability value that the text to be analyzed is of a preset emotion type is predicted by using an analysis model obtained through pre-training to obtain a first prediction result, where the preset emotion type includes any one or more of positive emotion and negative emotion; and then the first emotion type of the text to be analyzed is determined based on the first prediction result. Because the preset emotion types do not include neutral emotion, when the text to be analyzed is predicted by using the pre-trained analysis model, only the probability that the text is of the positive emotion type and/or the negative emotion type is predicted, and the probability that the text is of the neutral emotion type is not predicted. The influence of the inconspicuous features of neutral public opinion news on the accuracy of the recognition result can thus be avoided, which improves the accuracy of the public opinion news recognition result and helps the user avoid misinterpreting the public opinion news. In addition, the improved accuracy helps identify negative public opinion news accurately, so that unnecessary loss to the user can be reduced.
Optionally, in some possible implementation manners, after the text to be analyzed is obtained, a target entity in the text to be analyzed may also be obtained, the text to be analyzed and the target entity are spliced according to a preset manner to obtain a spliced text, the spliced text is input into the analysis model, and a first prediction result is output through the analysis model; and determining a first emotion type of the text to be analyzed based on the first prediction result, and outputting the target entity and the first emotion type of the text to be analyzed.
The target entity in this possible implementation may be a social administrator, an enterprise, an individual, or any of various other organizations; the embodiment of the present disclosure does not limit the type of the target entity. In the embodiment of the present disclosure, an entity in the text to be analyzed may be obtained manually as the target entity; alternatively, an identification model for identifying the entities included in a text may be trained in advance, and the text to be analyzed is identified by using the identification model, so that an entity in the text to be analyzed is obtained as the target entity.
In this possible implementation, splicing the text to be analyzed and the target entity in a preset manner to obtain the spliced text may be: splicing the text to be analyzed and the target entity in a preset manner by using preset fields to obtain the spliced text.
In an optional example, the text to be analyzed is "In October 2020, AA trip launched the first vehicle-end 'smart trip' system in China in Xi'an, realizing intelligent interaction among people, vehicles, and machines for the first time", and the target entity is "AA trip". The text to be analyzed and the target entity "AA trip" are spliced in a preset manner, and the resulting spliced text is: {"text": "In October 2020, AA trip launched the first vehicle-end 'smart trip' system in China in Xi'an, realizing intelligent interaction among people, vehicles, and machines for the first time", "entity": "AA trip"}. The preset field "text" represents the text to be analyzed, and the preset field "entity" represents the target entity.
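A minimal sketch of this splicing step, assuming the spliced text is serialized as JSON with the preset fields "text" and "entity" shown in the example; the function name `splice` is hypothetical.

```python
import json


def splice(text: str, entity: str) -> str:
    """Combine the text to be analyzed and the target entity via preset fields."""
    # ensure_ascii=False keeps non-ASCII (e.g. Chinese) characters readable.
    return json.dumps({"text": text, "entity": entity}, ensure_ascii=False)


spliced = splice("In October 2020, AA trip launched the first vehicle-end "
                 "'smart trip' system in China in Xi'an", "AA trip")
```

The resulting string can be passed to the analysis model as a single input, so that the model knows which entity the prediction refers to.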
The spliced text is then input into the analysis model, and a first prediction result is output through the analysis model, where the first prediction result may include a probability value that the text to be analyzed is of a preset emotion type. For example, the first prediction result may be: {"animation_with_entity": {"name": "AA trip", "prob": [0.95868, 0.0432]}}, where the preset field "animation_with_entity" represents the prediction result, the preset field "name" represents the entity name, and the preset field "prob" represents the probability values included in the first prediction result and may include two components.
The meanings of the two components in the preset field "prob" may be set according to actual needs, for example, the first component may be set as a probability value that the text to be analyzed is of a negative emotion type, and the second component is a probability value that the text to be analyzed is not of a negative emotion type; the first component can also be set as the probability value that the text to be analyzed is of the positive emotion type, and the second component is the probability value that the text to be analyzed is of the negative emotion type; the first component may also be set as a probability value that the text to be analyzed is of a positive emotion type, and the second component is a probability value that the text to be analyzed is not of a positive emotion type, which is not limited in the embodiment of the present disclosure.
Optionally, in some possible implementations, the first prediction result may include a probability value that the text to be analyzed is a preset emotion type and an emotion type of the text to be analyzed, and the emotion type in the first prediction result may be obtained as the first emotion type of the text to be analyzed.
For example, the first prediction result may be {"animation_with_entity": {"name": "AA trip", "prob": [0.041316501796245575, 0.9586834907531738], "label": "negative"}}, where the meanings of the preset field "animation_with_entity", the preset field "name", and the preset field "prob" are as described in the above possible implementations of the present disclosure and are not repeated here. The preset field "label" represents the emotion type, and the emotion type "negative" in the first prediction result can be obtained as the first emotion type of the text to be analyzed.
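Reading such a prediction result can be sketched as follows, assuming the result is delivered as a JSON string. The key name follows the field name in the example and is passed as a parameter, since the actual name may differ; the function name `read_prediction` is hypothetical.

```python
import json
from typing import List, Optional, Tuple


def read_prediction(raw_result: str, key: str = "animation_with_entity"
                    ) -> Tuple[str, List[float], Optional[str]]:
    """Return (entity name, probability values, emotion label if present)."""
    body = json.loads(raw_result)[key]
    return body["name"], body["prob"], body.get("label")


raw = ('{"animation_with_entity": {"name": "AA trip", '
       '"prob": [0.041316501796245575, 0.9586834907531738], '
       '"label": "negative"}}')
name, prob, label = read_prediction(raw)
```

When the "label" field is absent, the function returns `None` for the label, and the first emotion type would instead be determined from the probability values and the preset thresholds.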
In this optional implementation manner, based on the probability value included in the first prediction result, the first emotion type of the text to be analyzed may be determined, and the target entity and the first emotion type of the text to be analyzed may be output to the display device of the user, so that the user may find negative information of the enterprise to be invested by the user in time, and further adjust the investment policy, thereby avoiding great economic loss to the user.
Optionally, in some possible implementation manners, when obtaining a target entity in the text to be analyzed, the entities included in the text to be analyzed may be identified, whether the number of the entities included in the text to be analyzed is greater than 1 is determined, and if the number of the entities included in the text to be analyzed is equal to 1, the entities included in the text to be analyzed are used as the target entity.
In this possible implementation manner, the text to be analyzed may be identified by using a pre-trained identification model, entities included in the text to be analyzed may be obtained, and then the target entity may be determined according to the number of the entities included in the text to be analyzed. And if the number of the entities in the text to be analyzed is equal to 1, taking the entities in the text to be analyzed as target entities.
In an optional example, the text to be analyzed is "BB lays off 4,000 people: when the times abandon you, they do not even say goodbye!". The text to be analyzed is identified by using the identification model, the unique entity "BB" in the text to be analyzed is obtained, and the entity "BB" is then taken as the target entity.
Or, in another possible implementation manner, if the number of entities included in the text to be analyzed is greater than 1, one entity may be determined as the target entity from the entities included in the text to be analyzed based on a preset manner.
In this possible implementation, the target entity may be determined according to the number of times each entity appears in the text to be analyzed, the number of times each entity appears in the first person in the text to be analyzed, and/or whether the entity is an enterprise entity.
Based on this, in some possible implementations, determining one entity as the target entity from the entities included in the text to be analyzed based on a preset manner may include any one or more of the following: determining the entity with the most occurrences as the target entity, determining the entity with the most first-person occurrences as the target entity, and determining an enterprise entity as the target entity. That is, the entity that appears most often, appears most often in the first person, and/or is an enterprise entity may be selected as the target entity.
In an optional example, the text to be analyzed is "Statistics show that a total of 10 thousand CC mobile phones and 8 thousand DD mobile phones were sold in 2020; by comparison, the sales volume of CC mobile phones is higher, and CC mobile phones sell better". The text to be analyzed is identified by using the identification model, and the entities in the text to be analyzed are found to include CC and DD, where CC appears 3 times and DD appears 1 time, and CC appears 3 times in the first person and DD appears 1 time in the first person. The comparison shows that both the number of occurrences and the number of first-person occurrences of CC are greater than those of DD, so CC can be selected as the target entity of the text to be analyzed.
In another optional example, the text to be analyzed is "Currently, the most popular domestic mobile phone brands on the market are mainly DD and CC. The weight of a DD brand mobile phone is similar to that of an apple". The text to be analyzed is identified by using the identification model, and the entities in the text to be analyzed are found to include DD, CC, and apple, where DD is an enterprise entity that appears 2 times, CC is an enterprise entity that appears 1 time, and apple is a non-enterprise entity that appears 1 time. The comparison shows that the enterprise entity DD appears most often, so DD can be selected as the target entity of the text to be analyzed.
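One way to combine these criteria is sketched below. The rule used here (prefer enterprise entities, then pick the most frequent one) is only one possible combination and is an assumption; first-person counting is omitted because the disclosure does not describe how it is computed. The function name and the `enterprises` argument are hypothetical.

```python
from typing import List, Set


def choose_target_entity(text: str, entities: List[str],
                         enterprises: Set[str]) -> str:
    """Pick one target entity from the entities identified in the text."""
    if len(entities) == 1:
        return entities[0]  # a single entity is the target entity
    # Prefer enterprise entities; fall back to all entities if there are none.
    candidates = [e for e in entities if e in enterprises] or entities
    # Among the candidates, take the one that occurs most often in the text.
    return max(candidates, key=text.count)


sample = ("Currently, the most popular domestic mobile phone brands are DD and CC. "
          "The weight of a DD phone is similar to that of an apple.")
target = choose_target_entity(sample, ["DD", "CC", "apple"], {"DD", "CC"})
```

In this sample, DD and CC are both enterprise entities and DD occurs twice, so DD is chosen, matching the second example above.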
Optionally, in some possible implementations, after one entity is determined as the target entity, the other entities, apart from the target entity, among the entities included in the text to be analyzed may be masked, and the masked text to be analyzed is spliced with the target entity in a preset manner to obtain the spliced text.
Here, masking the other entities in the text to be analyzed may be replacing the other entities in the text to be analyzed with a preset mask field. For example, "MASK" may be used as the preset mask field to replace the other entities in the text to be analyzed. The embodiment of the present disclosure does not limit the content of the preset mask field.
In an optional example, the text to be analyzed is "EE profit 10 million; FF loss 10 million". The other entity "FF" in the text to be analyzed is masked by using the preset mask field "MASK", the masked text to be analyzed is spliced with the target entity "EE" in a preset manner, and the resulting spliced text is: {"text": "EE profit 10 million; MASK loss 10 million", "entity": "EE"}.
Based on the optional implementation mode, the analysis model predicts the preset emotion type of the text to be analyzed after mask processing to obtain a first prediction result, so that other entities in the text to be analyzed can be shielded, the information content of the text to be analyzed is reduced, the influence of information of other entities on the prediction speed and the accuracy of the prediction result is reduced, the prediction speed and the accuracy of the prediction result are improved, and the recognition speed and the accuracy of public sentiment news can be improved.
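The masking step can be sketched with plain string replacement. The function name is hypothetical, and the sample string mirrors the EE/FF example; a real implementation would more likely replace entities at the character offsets returned by the identification model.

```python
from typing import List


def mask_other_entities(text: str, entities: List[str], target: str,
                        mask: str = "MASK") -> str:
    """Replace every entity except the target entity with the mask field."""
    for entity in entities:
        if entity != target:
            text = text.replace(entity, mask)
    return text


masked = mask_other_entities("EE profit 10 million; FF loss 10 million",
                             ["EE", "FF"], "EE")
```

Note that naive replacement would also alter the target entity if another entity's name is a substring of it, which is one reason offset-based replacement may be preferable.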
Optionally, in some possible implementations, after the text to be analyzed is obtained, it may further be determined whether the length of the text to be analyzed is greater than a preset length; if the length of the text to be analyzed is not greater than the preset length, the operation of predicting, by using the analysis model obtained through pre-training, the probability value that the text to be analyzed is of the preset emotion type to obtain the first prediction result is executed.
In practical applications, if the size of the text to be analyzed is too large, the amount of information contained in the text to be analyzed is too large, and may include repeated information or redundant information, which affects the prediction speed and the accuracy of the prediction result of the text to be analyzed.
Based on the method, whether the length of the text to be analyzed is appropriate or not can be judged according to whether the length of the text to be analyzed is larger than the preset length or not. If the length of the text to be analyzed is not greater than the preset length, determining that the length of the text to be analyzed is proper, and performing operation of predicting the probability value of the text to be analyzed to be the preset emotion type by using an analysis model to obtain a first prediction result; if the length of the text to be analyzed is larger than the preset length, the length of the text to be analyzed is determined to be not appropriate, the text to be analyzed is divided into N text sections with appropriate lengths by taking the preset length as a unit, wherein N is an integer larger than 1, and then the probability value that the text to be analyzed is the preset emotion type is predicted based on the N text sections.
In this possible implementation manner, the preset length may be set according to actual needs, for example, the preset length may be 512 characters, which is not limited in this disclosure. Based on the possible implementation mode, the text to be analyzed can be divided into at least one text segment by taking the preset length as a unit, the length of the text segment is proper, and the influence of repeated information or redundant information on the prediction speed and the accuracy of the prediction result can be avoided.
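A minimal sketch of the length check and segmentation, assuming the length is counted in characters and the text is cut at fixed offsets; the function name is hypothetical and the default of 512 is the example value mentioned above.

```python
from typing import List


def split_text(text: str, preset_length: int = 512) -> List[str]:
    """Divide the text into segments of at most preset_length characters."""
    if preset_length <= 0:
        raise ValueError("preset_length must be positive")
    return [text[i:i + preset_length]
            for i in range(0, len(text), preset_length)]


segments = split_text("abcdefg", preset_length=3)  # three segments
```

A text that is not longer than the preset length yields a single segment, so the same code path can serve both the short-text and the long-text case.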
Or, in another possible implementation manner, if the length of the text to be analyzed is greater than the preset length, the preset length is taken as a unit, the text to be analyzed is divided into N text segments, where N is an integer greater than 1, the probability value that the N text segments are the preset emotion type is predicted by using the analysis model obtained through pre-training, N second prediction results are obtained, and the first prediction result of the text to be analyzed is determined based on the N second prediction results.
In this possible implementation manner, the text to be analyzed may be divided into at least one text segment, then, the analysis model is used to predict the probability value of each text segment in the at least one text segment to the preset emotion type, so as to obtain at least one second prediction result, and then, the first prediction result of the text to be analyzed is determined based on the at least one second prediction result.
Optionally, in some possible implementations, the first prediction result of the text to be analyzed may be determined based on the N second prediction results by averaging or voting.
In this possible implementation, when the first prediction result of the text to be analyzed is determined by averaging, the N second prediction results may first be summed, and the sum is then divided by N to obtain the average of the N second prediction results; this average is then used as the first prediction result of the text to be analyzed.
In this possible implementation manner, the first prediction result of the text to be analyzed is determined by voting, the emotion types of N text segments may be counted, the number N1 of the text segments whose emotion types are positive emotions and the number N2 of the text segments whose emotion types are negative emotions are determined, where N1 and N2 are both integers not less than 0, then N1 and N2 are compared, if N1 is greater than N2, it may be determined that the first emotion type of the text to be analyzed is positive emotion, if N1 is less than N2, it may be determined that the first emotion type of the text to be analyzed is negative emotion, and if N1 is equal to N2, it may be determined that the first emotion type of the text to be analyzed is neutral emotion.
Based on the possible implementation mode, the text to be analyzed can be divided into N text segments by taking the preset length as a unit, then the first prediction result of the text to be analyzed is determined based on the second prediction results of the N text segments, the length of the text segments is proper, the prediction speed and the accuracy of the prediction result are not affected, in addition, the first prediction result of the text to be analyzed is determined based on the N second prediction results in an averaging or voting mode, and the objectivity and the accuracy of the first prediction result can be ensured.
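The averaging and voting rules described above can be sketched as follows. The function names are hypothetical; each second prediction result is assumed to be a list of probability values of equal length, and each segment emotion type is assumed to be one of the strings "positive", "negative", or "neutral".

```python
from typing import List


def average_predictions(segment_probs: List[List[float]]) -> List[float]:
    """Average the N second prediction results component by component."""
    n = len(segment_probs)
    return [sum(column) / n for column in zip(*segment_probs)]


def vote(segment_types: List[str]) -> str:
    """Compare the counts N1 (positive) and N2 (negative) of the segments."""
    n1 = segment_types.count("positive")
    n2 = segment_types.count("negative")
    if n1 > n2:
        return "positive"
    if n1 < n2:
        return "negative"
    return "neutral"  # N1 equals N2


first_prediction = average_predictions([[0.2, 0.8], [0.4, 0.6]])
first_type = vote(["positive", "negative", "negative"])
```

Averaging keeps the probability values, so the thresholds can still be applied afterwards, whereas voting works directly on the per-segment emotion types.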
Fig. 3 shows a schematic flow chart of an information analysis method in a second exemplary embodiment of the present disclosure. As shown in fig. 3, on the basis of the foregoing embodiments, the information analysis method of this embodiment may further include:
S301, acquiring a preset part of the text to be analyzed as a sub-text to be analyzed.
The preset part in this embodiment may be any paragraph in the text to be analyzed, or may be any chapter in the text to be analyzed. For example, the preset part may be the first paragraph of the text to be analyzed, or may be the summary part of the text to be analyzed.
S302, predicting the probability value of the sub-text to be analyzed as the preset emotion type by using the analysis model to obtain a third prediction result.
S303, determining a third emotion type of the sub text to be analyzed based on the third prediction result.
In step S303, the third prediction result may include a probability value that the sub-text to be analyzed is of the preset emotion type, or may include both that probability value and an emotion type of the sub-text to be analyzed. Accordingly, the third emotion type of the sub-text to be analyzed may be determined according to the probability value in the third prediction result, or the emotion type in the third prediction result may be obtained as the third emotion type of the sub-text to be analyzed.
S304, determining a fourth emotion type of the text to be analyzed based on the first emotion type and the third emotion type.
In step S304, a fourth emotion type of the text to be analyzed is determined based on the first emotion type and the third emotion type, and the fourth emotion type of the text to be analyzed may be determined according to whether the first emotion type and the third emotion type are consistent.
Optionally, in some possible implementation manners, if the first emotion type is consistent with the third emotion type, any one of the first emotion type and the third emotion type may be determined as a fourth emotion type of the text to be analyzed.
Or, in other possible implementations, if the first emotion type is not consistent with the third emotion type, a fourth prediction result may be determined based on the first prediction result and the third prediction result according to a preset rule, and the fourth emotion type of the text to be analyzed may be determined according to the fourth prediction result. For example, a weighted sum of the first prediction result and the third prediction result may be computed to determine the fourth prediction result.
Based on the embodiment, when the first emotion type is consistent with the third emotion type, any one of the first emotion type and the third emotion type is directly determined to be used as the fourth emotion type of the text to be analyzed, and when the first emotion type is inconsistent with the third emotion type, the fourth prediction result can be determined according to the preset rule, so that the fourth emotion type of the text to be analyzed is determined based on the fourth prediction result, which is equivalent to verifying and optimizing the first prediction result of the text to be analyzed through the third prediction result of the sub-text to be analyzed, the accuracy of the public opinion news recognition result can be further improved, and the user can be helped to avoid misinterpretation of the public opinion news.
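The consistency check and the weighted summation can be sketched as follows. The weight of 0.5, the reuse of the example threshold 0.7, the assumption that each prediction result is a pair `[P(positive), P(negative)]`, and the function name are all illustrative choices that the disclosure leaves open.

```python
from typing import List


def fourth_emotion_type(first_probs: List[float], third_probs: List[float],
                        first_type: str, third_type: str,
                        weight_first: float = 0.5,
                        threshold: float = 0.7) -> str:
    """Combine full-text and sub-text predictions into a fourth emotion type."""
    if first_type == third_type:
        return first_type  # consistent: either type can be used
    # Inconsistent: weighted sum of the first and third prediction results.
    fourth_probs = [weight_first * a + (1.0 - weight_first) * b
                    for a, b in zip(first_probs, third_probs)]
    p_positive, p_negative = fourth_probs
    if p_positive > threshold:
        return "positive"
    if p_negative > threshold:
        return "negative"
    return "neutral"


result = fourth_emotion_type([0.9, 0.1], [0.6, 0.4], "positive", "neutral")
```

Here the fourth prediction result is approximately [0.75, 0.25], so the combined decision is positive even though the sub-text alone was judged neutral.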
Fig. 4 schematically shows a block diagram of an information analysis apparatus in a first exemplary embodiment of the present disclosure. The information analysis apparatus provided in the embodiment of the present disclosure may be disposed on a terminal device, may also be disposed on a server, or may be partially disposed on a terminal device and partially disposed on a server, for example, may be disposed on the server 105 in fig. 1 (according to actual replacement), but the present disclosure is not limited thereto.
As shown in fig. 4, the information analysis apparatus of this embodiment includes: a first obtaining module 401, a first prediction module 402, and a first determining module 403. Wherein:
The first obtaining module 401 is configured to obtain a text to be analyzed.
The first prediction module 402 is configured to predict a probability value that a text to be analyzed is a preset emotion type by using an analysis model obtained through pre-training, so as to obtain a first prediction result; the preset emotion types comprise any one or more of the following: positive emotions and negative emotions.
A first determining module 403, configured to determine a first emotion type of the text to be analyzed based on the first prediction result.
The information analysis apparatus of this embodiment first acquires a text to be analyzed, then predicts, by using an analysis model obtained through pre-training, a probability value that the text to be analyzed is of a preset emotion type to obtain a first prediction result, where the preset emotion type includes any one or more of a positive emotion and a negative emotion, and then determines the first emotion type of the text to be analyzed based on the first prediction result. Because the preset emotion types do not include a neutral emotion type, the analysis model predicts only the probability that the text to be analyzed is of the positive emotion type and/or the negative emotion type, and does not predict the probability that it is of the neutral emotion type. This avoids the loss of accuracy caused by the indistinct features of neutral public opinion news, improves the accuracy of public opinion news recognition, and helps the user avoid misinterpreting public opinion news. In addition, because recognition is more accurate, negative public opinion news can be identified accurately, which reduces unnecessary losses for the user.
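As an illustration of predicting only positive and negative probabilities, the following minimal sketch assumes that the analysis model emits two raw scores (logits), one per preset emotion type, and that the first emotion type is the one with the larger probability value. The two-way softmax, the label order, and the function names are assumptions made for this example; the disclosure does not fix them.

```python
import math
from typing import Dict, Sequence

LABELS = ("positive", "negative")  # no neutral emotion type is modeled


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over the raw model scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_prediction(logits: Sequence[float]) -> Dict[str, float]:
    """Map two logits to probability values for the preset emotion types."""
    return dict(zip(LABELS, softmax(logits)))


def first_emotion_type(prediction: Dict[str, float]) -> str:
    """Pick the preset emotion type with the highest probability value."""
    return max(prediction, key=prediction.get)
```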
Optionally, in some possible implementations, the first prediction module 402 may include: a splicing unit, configured to splice the text to be analyzed and the target entity according to a preset mode to obtain a spliced text to be analyzed; and a first prediction unit, configured to input the spliced text to be analyzed into the analysis model and output a first prediction result through the analysis model.
Fig. 5 shows a block diagram of an information analysis apparatus in a second exemplary embodiment of the present disclosure. As shown in fig. 5, on the basis of the embodiment shown in fig. 4, the information analysis apparatus of this embodiment may further include: a second obtaining module 404, a splicing module 405, and an output module 406. Wherein:
The second obtaining module 404 is configured to obtain a target entity in the text to be analyzed.
The splicing module 405 is configured to splice the text to be analyzed and the target entity according to a preset mode to obtain a spliced text.
The first prediction module 402 is configured to input the spliced text into the analysis model, and output a first prediction result through the analysis model.
The output module 406 is configured to output the target entity and the first emotion type of the text to be analyzed.
Optionally, in some possible implementation manners, the second obtaining module 404 is specifically configured to: identifying entities included in the text to be analyzed; determining whether the number of entities included in the text to be analyzed is greater than 1; and if the number of the entities in the text to be analyzed is more than 1, determining one entity from the entities in the text to be analyzed as a target entity based on a preset mode.
Optionally, in some possible implementation manners, the second obtaining module 404 determining, based on a preset manner, one entity as the target entity from the entities included in the text to be analyzed includes any one or more of the following: determining the entity with the most occurrences as the target entity; determining the entity with the most occurrences of the first person as the target entity; and determining an enterprise entity as the target entity.
Optionally, in some possible implementation manners, the splicing module 405 is configured to perform mask processing on other entities except for the target entity in the entities included in the text to be analyzed, and splice the text to be analyzed after the mask processing with the target entity according to a preset manner to obtain a spliced text.
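A minimal sketch of the mask-then-splice step is given below. The disclosure does not specify the mask token or the preset splicing mode, so the `[MASK]` token, the `[SEP]` separator, the entity-first ordering, and both function names are assumptions chosen for illustration.

```python
import re
from typing import Iterable


def mask_other_entities(text: str, entities: Iterable[str], target: str,
                        mask_token: str = "[MASK]") -> str:
    """Replace every entity mention except the target entity with a mask token."""
    # Longest names first, so that a name contained in a longer name is matched
    # as part of the longer one rather than on its own.
    names = sorted({n for n in list(entities) + [target] if n}, key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return pattern.sub(lambda m: m.group(0) if m.group(0) == target else mask_token, text)


def splice(text: str, target: str, sep: str = " [SEP] ") -> str:
    """Join the target entity and the (masked) text; the layout is an assumed preset mode."""
    return f"{target}{sep}{text}"
```

For example, `splice(mask_other_entities("Acme sued Globex over a contract.", ["Acme", "Globex"], "Acme"), "Acme")` keeps the mention of the target entity and hides the other company name before the text is passed to the analysis model.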
Optionally, in some possible implementations, the second obtaining module 404 is further configured to:
if the number of entities in the text to be analyzed is equal to 1, take the entity in the text to be analyzed as the target entity.
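The target entity selection performed by the second obtaining module can be sketched as below. Only the "most occurrences" rule is shown; the input format (a list of entity mentions in order of appearance, as an entity recognizer might return them) and the function name are assumptions for this example, and the other preset manners would replace the final line.

```python
from collections import Counter
from typing import List, Optional


def select_target_entity(mentions: List[str]) -> Optional[str]:
    """Choose one target entity from the recognized entity mentions."""
    if not mentions:
        return None  # no entity recognized in the text
    counts = Counter(mentions)
    if len(counts) == 1:
        # Exactly one distinct entity: it is the target entity.
        return mentions[0]
    # More than one entity: assumed preset manner is "most occurrences".
    return counts.most_common(1)[0][0]
```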
Optionally, in some possible implementations, the first prediction result includes: a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is of the negative emotion type; or a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is not of the positive emotion type; or a probability value that the text to be analyzed is of the negative emotion type and a probability value that it is not of the negative emotion type.
Optionally, in some possible implementation manners, the first prediction module 402 is configured to predict a probability value that the text to be analyzed is the preset emotion type, by using an analysis model obtained through pre-training; a first determining module 403, configured to determine an emotion type of a text to be analyzed based on a probability value that the predicted text to be analyzed is a preset emotion type; the first prediction result comprises a probability value that the text to be analyzed is a preset emotion type and the emotion type of the text to be analyzed; the first determining module 403 is further configured to obtain an emotion type in the first prediction result.
Optionally, referring to fig. 5 again, on the basis of the foregoing embodiments, the information analysis apparatus may further include: a second determination module 407. Wherein:
The second determining module 407 is configured to determine whether the length of the text to be analyzed is greater than a preset length.
The first prediction module 402 is configured to, according to the determination result of the second determining module 407, if the length of the text to be analyzed is not greater than the preset length, perform the operation of predicting, by using the analysis model obtained through pre-training, the probability value that the text to be analyzed is of the preset emotion type to obtain the first prediction result.
Optionally, referring to fig. 5 again, on the basis of the above embodiments, the information analysis apparatus may further include a segmentation module 408. Wherein:
The segmentation module 408 is configured to, according to the determination result of the second determining module 407, divide the text to be analyzed into N text segments by taking the preset length as a unit if the length of the text to be analyzed is greater than the preset length; wherein N is an integer greater than 1.
The first prediction module 402 is configured to predict, by using the analysis model obtained through pre-training, probability values that the N text segments are of the preset emotion type, so as to obtain N second prediction results; and to determine the first prediction result of the text to be analyzed based on the N second prediction results.
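The length check, segmentation, and aggregation can be sketched as follows. The passage does not state how the N second prediction results are combined, so plain averaging is used here purely as an assumption; the `predict` callable stands in for the pre-trained analysis model, and all names are hypothetical.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, float]  # hypothetical format: emotion type -> probability value


def split_text(text: str, preset_length: int) -> List[str]:
    """Divide the text into segments of at most preset_length characters."""
    if preset_length <= 0:
        raise ValueError("preset_length must be positive")
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]


def predict_text(text: str, preset_length: int,
                 predict: Callable[[str], Prediction]) -> Prediction:
    """Predict directly for short texts; segment and aggregate for long ones."""
    if len(text) <= preset_length:
        return predict(text)
    second_results = [predict(segment) for segment in split_text(text, preset_length)]
    labels = second_results[0].keys()
    # Assumed aggregation rule: average the N second prediction results.
    return {label: sum(r[label] for r in second_results) / len(second_results)
            for label in labels}
```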
Optionally, in some possible implementation manners, on the basis of the foregoing embodiments, the information analysis apparatus may further include a third obtaining module 409. Wherein:
The third obtaining module 409 is configured to obtain a preset part of text in the text to be analyzed as a sub-text to be analyzed.
The first prediction module 402 is configured to predict, by using the analysis model, a probability value that the sub-text to be analyzed is the preset emotion type, and obtain a third prediction result.
A first determining module 403, configured to determine a third emotion type of the sub-text to be analyzed based on the third prediction result; and determining a fourth emotion type of the text to be analyzed based on the first emotion type and the third emotion type.
Optionally, referring to fig. 5 again, on the basis of the above embodiments, the information analysis apparatus may further include a training module 410.
The training module 410 is configured to input each first training corpus of a plurality of first training corpora, together with its emotion type labeling information, into the analysis model, and to output, through the analysis model, a probability value indicating whether each first training corpus is of the preset emotion type; and to train the analysis model based on the probability value output for each first training corpus and the probability value corresponding to its emotion type labeling information.
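One common way to compare a predicted probability value with the probability value implied by the labeling information is binary cross-entropy, sketched below. The disclosure does not name a loss function, so the choice of cross-entropy, the mapping of a label to 1.0 or 0.0, and the function names are all assumptions for illustration; the optimizer step that would update the model is omitted.

```python
import math
from typing import Sequence


def label_to_probability(label: str, preset_type: str = "negative") -> float:
    """Probability value corresponding to the emotion type labeling information."""
    return 1.0 if label == preset_type else 0.0


def binary_cross_entropy(predicted: float, target: float, eps: float = 1e-12) -> float:
    """Loss between a predicted probability value and the labeled probability value."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def batch_loss(predictions: Sequence[float], labels: Sequence[str],
               preset_type: str = "negative") -> float:
    """Mean loss over a batch of first training corpora."""
    losses = [binary_cross_entropy(p, label_to_probability(lab, preset_type))
              for p, lab in zip(predictions, labels)]
    return sum(losses) / len(losses)
```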
The specific implementation of each module, unit, and subunit in the information analysis apparatus provided in the embodiment of the present disclosure may refer to the content in the information analysis method, and is not described herein again.
It should be noted that although several modules, units and sub-units of the apparatus for action execution are mentioned in the above detailed description, such division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules, units and sub-units described above may be embodied in one module, unit or sub-unit. Conversely, the features and functions of one module, unit or sub-unit described above may be further divided so as to be embodied by a plurality of modules, units and sub-units.
As shown in fig. 6, the example electronic device 60 includes a processor 601 for executing software routines. Although a single processor is shown for clarity, the electronic device 60 may include a multi-processor system. The processor 601 is connected to a communication infrastructure 602 for communicating with other components of the electronic device 60. The communication infrastructure 602 may include, for example, a communication bus, a crossbar, or a network.
Electronic device 60 also includes memory, such as random access memory (RAM), which may include a main memory 603 and a secondary memory 610. The secondary memory 610 may include, for example, a hard disk drive 611 and/or a removable storage drive 612, which removable storage drive 612 may comprise a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 612 reads from and/or writes to a removable storage unit 613 in a conventional manner. Removable storage unit 613 may comprise a floppy disk, magnetic tape, optical disk, etc. which is read by and written to by removable storage drive 612. As will be appreciated by those skilled in the relevant art, the removable storage unit 613 comprises a computer-readable storage medium having stored thereon computer-executable program code instructions and/or data.
In an alternative embodiment, secondary memory 610 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into electronic device 60. Such means may include, for example, a removable storage unit 621 and an interface 620. Examples of the removable storage unit 621 and the interface 620 include: a program cartridge (cartridge) and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 621 and interfaces 620 which allow software and data to be transferred from the removable storage unit 621 to electronic device 60.
The electronic device 60 also includes at least one communication interface 640. Communications interface 640 allows software and data to be transferred between electronic device 60 and external devices via communications path 641. In various embodiments of the invention, communication interface 640 allows data to be transferred between electronic device 60 and a data communication network, such as a public or private data communication network. The communication interface 640 may be used to exchange data between different electronic devices 60, which electronic devices 60 form part of an interconnected computer network. Examples of communication interface 640 may include a modem, a network interface (such as an ethernet card), a communication port, an antenna with associated circuitry, and so forth. Communication interface 640 may be wired or may be wireless. Software and data transferred via communications interface 640 are in the form of signals which may be electronic, magnetic, optical or other signals capable of being received by communications interface 640. These signals are provided to the communication interface via communication path 641.
As shown in fig. 6, electronic device 60 also includes a display interface 631 and an audio interface 632, display interface 631 performing operations for rendering images to an associated display 630, and audio interface 632 performing operations for playing audio content through an associated speaker 633.
In this document, the term "computer program product" may refer, in part, to: removable storage unit 613, removable storage unit 621, a hard disk installed in hard disk drive 611, or a carrier wave carrying software through communication path 641 (wireless link or cable) to communication interface 640. Computer-readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to electronic device 60 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROMs, DVDs, Blu-ray optical disks, hard drives, ROMs, or integrated circuits, USB memory, magneto-optical disks, or a computer-readable card, such as a PCMCIA card, among others, whether internal or external to the electronic device 60. Transitory or non-tangible computer-readable transmission media may also participate in providing software, applications, instructions, and/or data to the electronic device 60, examples of such transmission media including radio or infrared transmission channels, network connections to another computer or another networked device, and the internet or intranet including e-mail transmissions and information recorded on websites and the like.
Computer programs (also called computer program code) are stored in the main memory 603 and/or the secondary memory 610. Computer programs may also be received via communications interface 640. Such computer programs, when executed, enable the electronic device 60 to perform one or more features of embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 601 to perform the features of the embodiments described above. Accordingly, such computer programs represent controllers of the electronic device 60.
The software may be stored in a computer program product and loaded into electronic device 60 using removable storage drive 612, hard disk drive 611, or interface 620. Alternatively, the computer program product may be downloaded to electronic device 60 via communications path 641. The software, when executed by the processor 601, causes the electronic device 60 to perform the functions of the embodiments described herein.
It should be understood that the embodiment of fig. 6 is given by way of example only. Accordingly, in some embodiments, one or more features of electronic device 60 may be omitted. Also, in some embodiments, one or more features of the electronic device 60 may be combined together. Additionally, in some embodiments, one or more features of electronic device 60 may be separated into one or more components.
It will be appreciated that the elements shown in fig. 6 serve to provide a means for performing the various functions and operations of the server described in the above embodiments.
In one embodiment, a server may be generally described as a physical device including at least one processor and at least one memory including computer program code. The at least one memory and the computer program code are configured to, with the at least one processor, cause the physical device to perform necessary operations.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the functions of the method shown in fig. 2-3.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by an electronic device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The basic principles of the present invention have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, etc. mentioned in the present invention are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present invention. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the invention is not limited to the specific details described above.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (15)

1. An information analysis method, comprising:
acquiring a text to be analyzed;
predicting the probability value of the text to be analyzed as a preset emotion type by using an analysis model obtained by pre-training to obtain a first prediction result; wherein the preset emotion types comprise any one or more of the following: positive emotions, negative emotions;
and determining a first emotion type of the text to be analyzed based on the first prediction result.
2. The method of claim 1, further comprising:
acquiring a target entity in the text to be analyzed;
the predicting the emotion type of the text to be analyzed by using the analysis model obtained by pre-training, and obtaining a first prediction result comprises:
splicing the text to be analyzed and the target entity according to a preset mode to obtain a spliced text;
inputting the spliced text into the analysis model, and outputting the first prediction result through the analysis model;
after determining the first emotion type of the text to be analyzed based on the first prediction result, the method further includes:
and outputting the target entity and the first emotion type of the text to be analyzed.
3. The method according to claim 2, wherein the obtaining of the target entity in the text to be analyzed comprises:
identifying entities included in the text to be analyzed;
determining whether the number of entities included in the text to be analyzed is greater than 1;
and if the number of the entities in the text to be analyzed is more than 1, determining one entity from the entities in the text to be analyzed as the target entity based on a preset mode.
4. The method according to claim 3, wherein the determining one entity as the target entity from the entities included in the text to be analyzed based on a preset manner includes any one or more of the following: determining the entity with the most occurrences as the target entity; determining the entity with the most occurrences of the first person as the target entity; and determining an enterprise entity as the target entity.
5. The method according to claim 3 or 4, wherein the splicing the text to be analyzed and the target entity according to a preset mode to obtain a spliced text comprises:
and performing mask processing on other entities except the target entity in the entities included in the text to be analyzed, and splicing the text to be analyzed after mask processing and the target entity according to a preset mode to obtain a spliced text.
6. The method of claim 3, wherein the obtaining of the target entity in the text to be analyzed further comprises:
and if the number of the entities in the text to be analyzed is equal to 1, taking the entity in the text to be analyzed as the target entity.
7. The method of claim 1, wherein the first prediction result comprises: a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is of the negative emotion type; or a probability value that the text to be analyzed is of the positive emotion type and a probability value that it is not of the positive emotion type; or a probability value that the text to be analyzed is of the negative emotion type and a probability value that it is not of the negative emotion type.
8. The method according to claim 1, characterized in that a probability value of the text to be analyzed as a preset emotion type is predicted by using an analysis model obtained by pre-training;
determining the emotion type of the text to be analyzed based on the predicted probability value of the text to be analyzed as a preset emotion type; the first prediction result comprises a probability value that the text to be analyzed is a preset emotion type and an emotion type of the text to be analyzed;
the determining the first emotion type of the text to be analyzed based on the first prediction result comprises:
and acquiring the emotion type in the first prediction result.
9. The method according to any one of claims 1-8, wherein after obtaining the text to be analyzed, the method further comprises:
determining whether the length of the text to be analyzed is larger than a preset length;
and if the length of the text to be analyzed is not greater than the preset length, executing an analysis model obtained by pre-training, predicting the probability value of the text to be analyzed to be the preset emotion type, and obtaining a first prediction result.
10. The method of claim 9, wherein after obtaining the text to be analyzed, further comprising:
if the length of the text to be analyzed is greater than a preset length, dividing the text to be analyzed into N text sections by taking the preset length as a unit; wherein N is an integer greater than 1;
the predicting the probability value of the text to be analyzed as the preset emotion type by using the analysis model obtained by pre-training, and obtaining a first prediction result comprises the following steps:
predicting the probability value of the N text segments as the preset emotion type by using an analysis model obtained by pre-training to obtain N second prediction results;
and determining a first prediction result of the text to be analyzed based on the N second prediction results.
11. The method of claim 1, further comprising:
acquiring a preset part of text in the text to be analyzed as a sub-text to be analyzed;
predicting the probability value of the sub-text to be analyzed as the preset emotion type by using the analysis model to obtain a third prediction result;
determining a third emotion type of the sub-text to be analyzed based on the third prediction result;
and determining a fourth emotion type of the text to be analyzed based on the first emotion type and the third emotion type.
12. The method of any of claims 1-11, wherein the training of the analytical model comprises:
inputting each first training corpus of a plurality of first training corpora, together with its emotion type labeling information, into the analysis model, and outputting, through the analysis model, a probability value indicating whether each first training corpus is of the preset emotion type;
and training the analysis model based on the probability value output for each first training corpus and the probability value corresponding to its emotion type labeling information.
13. An information analysis apparatus, characterized by comprising:
the first acquisition module is used for acquiring a text to be analyzed;
the first prediction module is used for predicting the probability value of the text to be analyzed as the preset emotion type by using an analysis model obtained by pre-training to obtain a first prediction result; wherein the preset emotion types comprise any one or more of the following: positive emotions, negative emotions;
and the first determining module is used for determining a first emotion type of the text to be analyzed based on the first prediction result.
14. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the information analysis method of any one of claims 1-12 via execution of the executable instructions.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the information analysis method of any one of claims 1 to 12.
CN202110902332.9A 2021-08-06 2021-08-06 Information analysis method and device, electronic equipment and computer readable storage medium Pending CN113609390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110902332.9A CN113609390A (en) 2021-08-06 2021-08-06 Information analysis method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110902332.9A CN113609390A (en) 2021-08-06 2021-08-06 Information analysis method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113609390A true CN113609390A (en) 2021-11-05

Family

ID=78307483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110902332.9A Pending CN113609390A (en) 2021-08-06 2021-08-06 Information analysis method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113609390A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117042A (en) * 2021-11-15 2022-03-01 盐城金堤科技有限公司 Method, device, equipment and medium for predicting emotion of enterprise entity in public opinion text
CN115248846A (en) * 2022-07-26 2022-10-28 贝壳找房(北京)科技有限公司 Text recognition method, apparatus, medium, and program product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255805A (en) * 2017-12-13 2018-07-06 讯飞智元信息科技有限公司 The analysis of public opinion method and device, storage medium, electronic equipment
US20190197105A1 (en) * 2017-12-21 2019-06-27 International Business Machines Corporation Unsupervised neural based hybrid model for sentiment analysis of web/mobile application using public data sources
CN110232123A (en) * 2019-05-28 2019-09-13 第四范式(北京)技术有限公司 The sentiment analysis method and device thereof of text calculate equipment and readable medium
CN111241842A (en) * 2018-11-27 2020-06-05 阿里巴巴集团控股有限公司 Text analysis method, device and system
CN111324739A (en) * 2020-05-15 2020-06-23 支付宝(杭州)信息技术有限公司 Text emotion analysis method and system
CN112100388A (en) * 2020-11-18 2020-12-18 南京华苏科技有限公司 Method for analyzing emotional polarity of long text news public sentiment
CN112699682A (en) * 2020-12-11 2021-04-23 山东大学 Named entity identification method and device based on combinable weak authenticator
CN112860841A (en) * 2021-01-21 2021-05-28 平安科技(深圳)有限公司 Text emotion analysis method, device and equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112860852B (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN113010638B (en) Entity recognition model generation method and device and entity extraction method and device
CN112597759B (en) Emotion detection method and device based on text, computer equipment and medium
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
JP2022088304A (en) Method for processing video, device, electronic device, medium, and computer program
CN112579733B (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN113609390A (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN113032520A (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN112995690B (en) Live content category identification method, device, electronic equipment and readable storage medium
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
CN111915086A (en) Abnormal user prediction method and equipment
CN111553138A (en) Auxiliary writing method and device for standardizing content structure document
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN113011169B (en) Method, device, equipment and medium for processing conference summary
CN110516236B (en) Social short text fine-grained emotion acquisition method
CN110377706B (en) Search sentence mining method and device based on deep learning
US20230274161A1 (en) Entity linking method, electronic device, and storage medium
CN112115720B (en) Method, device, terminal equipment and medium for determining association relation between entities
CN111523034B (en) Application processing method, device, equipment and medium
CN114595318A (en) Customer service reply quality evaluation method and system
CN114428867A (en) Data mining method and device, storage medium and electronic equipment
CN113065353A (en) Entity identification method and device
CN112784015A (en) Information recognition method and apparatus, device, medium, and program
CN110574102B (en) Information processing system, information processing apparatus, recording medium, and dictionary database updating method
CN113569091A (en) Video data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination