CN117473049A - Method, apparatus, device and storage medium for information processing - Google Patents

Method, apparatus, device and storage medium for information processing

Info

Publication number
CN117473049A
Authority
CN
China
Prior art keywords
target
factors
factor
structuring
metric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210861703.8A
Other languages
Chinese (zh)
Inventor
林阿弟
冯璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to CN202210861703.8A (patent CN117473049A)
Priority to US18/355,250 (patent US20240028836A1)
Priority to JP2023118434A (patent JP2024014830A)
Publication of CN117473049A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

According to embodiments of the present disclosure, methods, apparatuses, devices, and storage media for information processing are provided. The method includes obtaining a set of target factors for a target object. The set of target factors is determined based on an unstructured text set for the target object, and each target factor represents an aspect of the target object. The method also includes determining at least one key factor for the target object based on the set of target factors and a set of structuring factors for the target object, wherein at least one target factor of the set of target factors is different from the set of structuring factors. This helps to identify key aspects of the target object, thereby facilitating optimization of the target object.

Description

Method, apparatus, device and storage medium for information processing
Technical Field
Example embodiments of the present disclosure relate generally to the field of computers and, more particularly, relate to methods, apparatuses, devices, and computer-readable storage media for information processing.
Background
Through unstructured text, people may provide comments on an object (such as a product or service). For example, user reviews are often displayed on the purchase page of a product or the presentation page of a service. As another example, a questionnaire may include open questions that allow a respondent to provide text comments. Such unstructured text typically contains rich information about the described objects. It is desirable to be able to interpret and utilize such information.
Disclosure of Invention
In a first aspect of the present disclosure, a method of information processing is provided. The method comprises the following steps: obtaining a set of target factors for the target object, the set of target factors being determined based on the unstructured text set for the target object, and each target factor representing an aspect of the target object; and determining at least one key factor for the target object based on a set of target factors and a set of structuring factors for the target object, wherein at least one target factor of the set of target factors is different from the set of structuring factors.
In a second aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processing circuit. The at least one processing circuit is configured to: obtain a set of target factors for a target object, the set of target factors being determined based on an unstructured text set for the target object, and each target factor representing an aspect of the target object; and determine at least one key factor for the target object based on the set of target factors and a set of structuring factors for the target object, wherein at least one target factor of the set of target factors is different from the set of structuring factors.
In a third aspect of the present disclosure, an electronic device is provided. The device comprises at least one processing unit, and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. The computer readable storage medium has stored thereon a computer program executable by a processor to implement the method of the first aspect.
It should be understood that what is described in this summary is not intended to identify key or essential features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals denote like or similar elements, in which:
FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a schematic diagram of one example of an information collection sheet (form) according to some embodiments of the present disclosure;
FIG. 3 illustrates a flow chart of a process of determining a target factor according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of keyword groupings in accordance with some embodiments of the present disclosure;
FIG. 5 illustrates a schematic diagram of information related to a target factor, according to some embodiments of the present disclosure;
FIG. 6A illustrates one example of metrics for a target factor according to some embodiments of the present disclosure;
FIG. 6B illustrates another example of metrics for a target factor according to some embodiments of the present disclosure;
FIG. 7 illustrates a flow chart of a process of determining key factors according to some embodiments of the present disclosure;
FIG. 8A illustrates a schematic diagram of selecting key factors from target factors and structured factors, respectively, according to some embodiments of the present disclosure;
FIG. 8B illustrates a schematic diagram of selecting key factors from a union of target factors and structuring factors, according to some embodiments of the present disclosure;
FIG. 9 illustrates a flowchart of a process of presenting an information collection sheet, according to some embodiments of the present disclosure;
FIG. 10 illustrates a schematic diagram of an updated version of an information collection sheet, according to some embodiments of the present disclosure;
FIG. 11 illustrates a schematic diagram of hints about target factors in accordance with some embodiments of the present disclosure;
FIG. 12A illustrates a schematic diagram of a machine learning model for a propensity score, according to some embodiments of the present disclosure;
FIG. 12B illustrates a schematic diagram of a machine learning model for conditional outcome expectations in accordance with some embodiments of the present disclosure; and
FIG. 13 illustrates a block diagram of an apparatus capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been illustrated in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather, these embodiments are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions are also possible below.
The term "circuitry" as used herein may refer to hardware circuitry and/or a combination of hardware circuitry and software. For example, the circuitry may be a combination of analog and/or digital hardware circuitry and software/firmware. As another example, the circuitry may be any portion of a hardware processor with software, including digital signal processor(s), software, and memory(s) that work together to enable the device to operate to perform various functions. In yet another example, the circuitry may be hardware circuitry and/or a processor, such as a microprocessor or a portion of a microprocessor, that requires software/firmware for operation, but software may not be present when not required for operation. As used herein, the term "circuitry" also encompasses a hardware circuit or processor alone or as part of a hardware circuit or processor and its (or their) implementation in conjunction with software and/or firmware.
As used herein, the term "model" may learn the association between the respective inputs and outputs from training data so that, for a given input, a corresponding output may be generated after training is completed. The generation of the model may be based on machine learning techniques. Deep learning is a machine learning algorithm that processes inputs and provides corresponding outputs through the use of multiple layers of processing units. The "model" may also be referred to herein as a "machine learning model," "machine learning network," or "network," and these terms are used interchangeably herein. A model may in turn comprise different types of processing units or networks.
As mentioned briefly above, unstructured text about an object contains rich information about the object. It is desirable to be able to interpret and utilize such information. In conventional schemes, an object is described using a manually specified sentence or a sentence extracted from text. Such schemes neither extract factors for the object nor quantify the degree of concern about these factors. Thus, conventional schemes offer limited interpretation of unstructured text and do not provide information that is available for further use.
Embodiments of the present disclosure propose a scheme for information processing. In one aspect of the disclosure, keywords are extracted from a text set pertaining to a target object, and a set of target factors for the target object is determined based on groupings of the extracted keywords. Each target factor represents an aspect of the target object. By extracting the target factors from unstructured text, new factors that affect the target object can be discovered.
In another aspect of the disclosure, at least one key factor of the target object is determined from a set of target factors and a set of structuring factors for the target object. At least one target factor is different from the structuring factor. By taking into account both existing structuring factors and newly extracted target factors, the determined key factors are more accurate. This helps to recognize key aspects of the target object, thereby facilitating optimization of the target object.
In yet another aspect of the disclosure, an information collection sheet for collecting a description of a target object is presented based on the determined at least one key factor. In this way, the design of the information collection sheet is optimized, thereby more effectively collecting evaluation information about the target object.
Example Environment
FIG. 1 illustrates a schematic diagram of an example environment 100 in which embodiments of the present disclosure may be implemented. In the environment 100, the first computing device 110 receives the text set 105 for the target object, or the first computing device 110 extracts the text set 105 from raw data. The text set 105 includes a plurality of texts 101-1, 101-2, ..., which are also collectively or individually referred to as text 101. The target object may include any tangible object, intangible object, or combination thereof. For example, the target object may be a product such as an article of daily use, food, or the like. As another example, the target object may be a service, such as a cloud computing service, a cloud storage service, or the like. As yet another example, the target object may be an entity that provides services and items, such as a flight, restaurant, hotel, etc.
Text 101 may be a description of the target object by a user of the target object. Text 101 may include sentences with emotion, such as "the apple is good" or "the apple is bad". Text 101 may also include sentences without emotion, e.g., "I eat apples". The text 101 may be a rating, comment, review, assessment, suggestion, feeling, or the like for the target object. Text 101 contains information about factors affecting the target object. Each text 101 in the text set 105 may be provided by a different user or by the same user at different times.
In some embodiments, text 101 may be a user rating in a presentation page of a target object. The presentation page may originate, for example, from a shopping application (APP), a service-providing APP, a review APP, etc.
In some embodiments, the text 101 may originate from an information collection sheet 150 for the target object, as shown in FIG. 1. As used herein, an "information collection sheet" is used to collect descriptions (e.g., evaluations, experiences, etc.) of a target object, and may be, for example, an electronic questionnaire, comment, etc. The information collection sheet 150 includes open questions about the target object. Text 101 may be a user's answer to an open question.
Fig. 2 shows one example of an information collection sheet 150. In this example, the information collection sheet 150 for a flight includes an open question 230. The user may provide an assessment of the flight, etc., through a text box. The reply set 250 of the information collection sheet 150 is shown in tabular form. Each row in reply set 250 represents a reply record from the same user. In each reply record, column 258 is the answer to open question 230. Text 101 may be the text in column 258.
With continued reference to fig. 1. The first computing device 110 determines target factors 102-1, 102-2, ... of the target object based on the text set 105, which are also collectively referred to as a set of target factors 102 or individually referred to as target factors 102. Such target factors 102 are determined from unstructured text and are thus also referred to as "extracted factors" or "unstructured factors".
A set of target factors 102 is provided to the second computing device 120. The second computing device 120 also receives or determines the structuring factors 103-1, 103-2, ... of the target object, which are also collectively referred to as a set of structuring factors 103 or individually referred to as structuring factors 103. As used herein, the term "structuring factor" refers to a factor whose metric has predetermined options (e.g., predetermined values, categories, star ratings, etc.). For a structuring factor, the user may select one of the predetermined options to evaluate or describe the target object from the point of view of the structuring factor. The structuring factors are quantitative and highly organized. The description of a structuring factor (e.g., evaluation, assessment) is not open, but needs to conform to a schema with predetermined options.
The structuring factors may in turn include numeric or categorical factors. The predetermined options for numeric factors include predetermined values, star ratings, etc. The predetermined options for categorical factors include predetermined categories, such as the cabin class, and the like. The target factors and structuring factors are also collectively or individually referred to herein as "factors".
In some embodiments, a set of structuring factors 103 may be from an information collection sheet 150, as shown in FIG. 1. The information collection sheet 150 may include closed questions about the structuring factors 103. "Closed questions" refer to questions whose answers are selected from predetermined options. In the example of fig. 2, the information collection sheet 150 includes a closed question 210-1 about a structuring factor "seat comfort", a closed question 210-2 about a structuring factor "cabin service", a closed question 210-3 about a structuring factor "food and beverage", a closed question 210-4 about a structuring factor "entertainment", a closed question 210-5 about a structuring factor "ground service", and a closed question 210-6 about a structuring factor "value for money". The closed questions 210-1 through 210-6 are also collectively or individually referred to as closed questions 210. Each closed question 210 has 5 scores for selection by the user. In reply set 250, columns 252-257 are user answers to closed questions 210-1 through 210-6, respectively.
With continued reference to fig. 1. The second computing device 120 determines at least one key factor 104-1, 104-2, ... of the target object from a set of target factors 102 and a set of structuring factors 103, which are also collectively or individually referred to as key factors 104. As used herein, the term "key factor" refers to a factor that has an effect on a target object. The impact on the target object may include an impact on performance, service, functionality, overall evaluation, or satisfaction of the target object. In particular, the key factor may be a factor having a high degree of influence on the target object among the factors. The degree of influence may reflect the degree of importance of the factor to the target object.
In the example of fig. 2, the information collection sheet 150 includes a closed question 220 about the overall evaluation of the target object. In reply set 250, column 251 is the user's answer to closed question 220.
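As an illustration only, one reply record of the kind described for fig. 2 could be represented as follows. This is a minimal sketch: the class name `ReplyRecord`, its field names, and all values shown are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplyRecord:
    """One row of a reply set: a single user's answers to one sheet (hypothetical layout)."""
    overall: int  # answer to the closed question about the overall evaluation
    structured: Dict[str, int] = field(default_factory=dict)  # structuring factor -> selected score
    open_text: str = ""  # free-text answer to the open question


# Hypothetical example values, for illustration only.
record = ReplyRecord(
    overall=4,
    structured={"seat comfort": 3, "cabin service": 5, "value for money": 4},
    open_text="Boarding was quick but the leg room was tight.",
)
```

In such a layout, the `structured` scores would feed the structuring factors, while `open_text` would be one text of the unstructured text set.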
With continued reference to fig. 1. The determined key factors 104 are provided to the third computing device 130. The third computing device 130 presents an information collection sheet 160 for the target object based on the key factors 104. In some embodiments, the information collection sheet 160 may be generated based on the key factors 104. In some embodiments, the information collection sheet 160 may be an updated version of the information collection sheet 150.
In environment 100, first computing device 110, second computing device 120, and third computing device 130 may be any type of device with computing capability, including a terminal device or a server device. The terminal device may be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile handset, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, media computer, multimedia tablet, Personal Communication System (PCS) device, personal navigation device, Personal Digital Assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination of the preceding, including accessories and peripherals for these devices. The server devices may include, for example, computing systems/servers, such as mainframes, edge computing nodes, computing devices in a cloud environment, and so forth.
It should be understood that the structure and function of environment 100 are described for illustrative purposes only and are not meant to suggest any limitation as to the scope of the disclosure. Although the first computing device 110, the second computing device 120, and the third computing device 130 are shown separately in fig. 1, in some embodiments, two or all of the first computing device 110, the second computing device 120, and the third computing device 130 may be the same device, or belong to the same computing system.
Furthermore, the information collection sheet shown in FIG. 2 is merely exemplary and is not intended to limit the scope of the present disclosure. The open questions, closed questions, and the number thereof shown in fig. 2 are merely exemplary. In embodiments of the present disclosure, the information collection sheet may have any suitable number of open questions and closed questions. Additionally, although English is taken as an example, embodiments of the present disclosure may be used to process text and information collection sheets in any language.
Extraction of target factors
Fig. 3 illustrates a flow chart of a process 300 of determining target factors according to some embodiments of the present disclosure. The process 300 may be implemented at the first computing device 110. For ease of discussion, the process 300 is described with reference to FIG. 1.
At block 310, the first computing device 110 extracts a plurality of keywords from the unstructured text set 105 for the target object. The extracted keywords may have any suitable number of tokens. The keywords may include single-word keywords such as "flight", "seat", "service", and the like, and multi-word keywords such as "cabin crew", "flight attendant", and the like. Any suitable keyword extraction algorithm may be used, such as, but not limited to, TF-IDF, KP-Miner, SBKE, RAKE, TextRank, YAKE, KeyBERT, and the like.
In some embodiments, the text 101 in the text set 105 may be pre-processed, such as by removing named entities and stop words, prior to applying the keyword extraction algorithm. Named entities are, for example, person names, organization names, place names, etc., which do not describe any aspect of the target object. For English text, the stop words are, for example, "a", "an", "the", "and", etc. For Chinese text, the stop words are, for example, the Chinese words for "one", "and", "but", etc. Alternatively, in some embodiments, the text 101 may be preprocessed by the keyword extraction algorithm itself.
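The pre-processing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list and the named-entity list are hypothetical fixed sets, whereas a practical system would likely rely on a named-entity recognizer and a fuller stop-word list.

```python
import re

# Hypothetical lists, for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "was", "is", "to", "of"}
NAMED_ENTITIES = {"london", "alice"}


def preprocess(text, stop_words=STOP_WORDS, named_entities=NAMED_ENTITIES):
    """Lowercase the text, split it into word tokens, and drop stop words and named entities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stop_words and t not in named_entities]


tokens = preprocess("The flight to London was late and the seat was small")
```

The remaining tokens could then be passed to whichever keyword extraction algorithm is chosen.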
In some embodiments, nouns may be extracted from the text set 105 as keywords using a keyword extraction algorithm. In this way, extracting words of other parts of speech, which cannot describe aspects of the target object, can be avoided. This can effectively reduce the difficulty of subsequent processing.
In some embodiments, the first computing device 110 may extract keywords based on the number of occurrences (i.e., word frequency) of each word in the text set 105. In particular, the first computing device 110 may extract candidate words from the text 101 of the text set 105. If the number of occurrences of a candidate word in the text set 105 is greater than a threshold number, the candidate word is determined to be one of the keywords. If the number of occurrences of the candidate word in the text set 105 is less than the threshold number, the candidate word is removed.
For example, a keyword extraction algorithm may be utilized to extract candidate words from the column 258 of each reply record. For each candidate word extracted, the number of occurrences of the candidate word in the entire text set 105 is calculated. Candidate words whose number of occurrences is greater than the threshold number are determined as keywords, and candidate words whose number of occurrences is less than the threshold number are removed. In such an embodiment, by filtering the preliminarily extracted candidate words, unimportant words can be prevented from interfering with the determination of target factors.
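The frequency-based filtering described above can be sketched as follows. The function name `filter_by_frequency` is hypothetical, and the sketch assumes that candidate words have already been extracted per text. A candidate whose count equals the threshold is removed here, which is an assumption, since only the "greater than" and "less than" cases are described.

```python
from collections import Counter


def filter_by_frequency(candidates_per_text, threshold):
    """Keep candidate words whose total count across the text set exceeds the threshold."""
    counts = Counter(word for candidates in candidates_per_text for word in candidates)
    return sorted(word for word, n in counts.items() if n > threshold)


# Hypothetical candidate words extracted from three texts.
candidates_per_text = [
    ["seat", "price"],
    ["seat", "legroom"],
    ["seat", "price"],
]
keywords = filter_by_frequency(candidates_per_text, threshold=1)
```

Here "legroom" occurs only once and is dropped, while "seat" and "price" are kept as keywords.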
Alternatively, in some embodiments, the first computing device 110 may extract keywords based on the semantics of the text 101 in the text set 105. For example, sentences having emotion can be determined by semantic analysis, and nouns related to emotion in such sentences are used as keywords.
At block 320, the first computing device 110 groups at least a portion of the plurality of keywords based on the semantics of the plurality of keywords. In some embodiments, all keywords may be grouped. In some embodiments, keywords may be filtered based on the results of the preliminary grouping and the filtered keywords may be grouped.
The first computing device 110 may group the extracted plurality of keywords using clustering. For this purpose, a word vector representing the semantics is generated for each keyword. Any suitable method may be used to generate the word vector, such as word2vec, GloVe, etc. Embodiments of the disclosure are not limited in this respect.
The plurality of keywords may be clustered based on the word vectors to determine a plurality of clusters, wherein each cluster includes at least one keyword. The clustering algorithm may divide the keywords into independent, non-overlapping clusters based on their semantic similarity. Any suitable clustering algorithm may be employed, such as K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), a Gaussian mixture model, and the like.
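As one possible illustration of clustering keywords by their word vectors, the following is a minimal pure-Python K-Means sketch. The two-dimensional vectors are hypothetical stand-ins for real word embeddings, and seeding the centroids with the first k vectors is a simplification chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Plain K-Means; the first k vectors seed the centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical 2-D "word vectors", for illustration only.
word_vectors = {
    "tv": [0.9, 0.1],
    "price": [0.1, 0.9],
    "screen": [0.8, 0.2],
    "fare": [0.2, 0.8],
}
keywords = list(word_vectors)
labels, _ = kmeans([word_vectors[w] for w in keywords], k=2)

groups = {}
for word, label in zip(keywords, labels):
    groups.setdefault(label, []).append(word)
```

Each entry of `groups` then holds one set of semantically close keywords. In practice a library implementation of K-Means or DBSCAN would more likely be used.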
In some embodiments, keywords may be filtered based on the quality of the individual clusters. The quality of a cluster represents the degree of semantic aggregation of keywords in the cluster. For example, the sum of the squared distances of the keywords in a cluster can be used as the quality of the cluster. Alternatively or additionally, the silhouette coefficient can also be used as the quality of the clusters.
The quality of each cluster resulting from the clustering may be determined. In some embodiments, keywords in clusters having a quality below a threshold quality may be removed to determine remaining keywords. The remaining keywords may be grouped based on their semantics. For example, the remaining keywords may be clustered. The keywords in the same cluster are treated as a group of keywords. Alternatively, in some embodiments, clusters with a quality below a threshold quality may be removed, and other clusters with a quality above the threshold quality may be retained. For the retained clusters, the keywords in the same cluster are considered as a set of keywords. In such an embodiment, there is no need to regroup the remaining keywords.
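The cluster-retention variant described above can be sketched as follows. As an assumption of this sketch, quality is taken as the negative mean squared distance of the members to the cluster centroid, so that a higher value means a tighter cluster; the sign convention, the function names, and the threshold value are all hypothetical. Clusters whose quality equals the threshold are retained, which is likewise an assumption.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_quality(vectors):
    """Negative mean squared distance to the centroid: closer to 0 means a tighter cluster."""
    center = centroid(vectors)
    total = sum(sum((x - c) ** 2 for x, c in zip(v, center)) for v in vectors)
    return -total / len(vectors)


def retain_clusters(clusters, threshold_quality):
    """Drop clusters whose quality is below the threshold; keep the others as they are."""
    return {name: vecs for name, vecs in clusters.items()
            if cluster_quality(vecs) >= threshold_quality}


# Hypothetical clusters of 2-D word vectors.
clusters = {
    "tight": [[0.9, 0.1], [0.8, 0.2]],
    "loose": [[0.0, 0.0], [1.0, 1.0]],
}
kept = retain_clusters(clusters, threshold_quality=-0.1)
```

With these values only the "tight" cluster is retained, and its keywords can be treated directly as one set of keywords without regrouping.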
Fig. 4 shows an example of keyword grouping. The grouping result may be obtained by processing text in column 258 in reply set 250. In fig. 4, a keyword group 410, a keyword group 420, a keyword group 430, a keyword group 440, a keyword group 450, a keyword group 460, and a keyword group 470 are determined by clustering. Each keyword group includes one or more keywords.
With continued reference to fig. 3. At block 330, the first computing device 110 determines target factors 102 corresponding to a set of keywords based on the results of the grouping. The target factor 102 represents one aspect of the target object. The same set of keywords have similar semantics and thus represent the same aspect of the target object. In view of this, a set of keywords may correspond to one target factor 102.
The name or identification of the target factor 102 corresponding to a set of keywords may be determined based on the set of keywords. As an example, any one of the set of keywords may be used to represent a corresponding target factor. As another example, the center of a cluster of a set of keywords may be determined, and the keywords whose semantic features are closest to the center are used to represent the corresponding target factors. As yet another example, the target factor may be represented by an aspect (e.g., service or performance) of the target object described by the set of keywords.
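The second naming option above, choosing the keyword closest to the cluster center, can be sketched as follows. The function name and the toy vectors are hypothetical, and squared Euclidean distance is assumed as the closeness measure.

```python
def closest_to_center(word_vectors):
    """Return the keyword whose vector is nearest to the centroid of the group."""
    vectors = list(word_vectors.values())
    center = [sum(col) / len(vectors) for col in zip(*vectors)]

    def distance_to_center(word):
        return sum((x - c) ** 2 for x, c in zip(word_vectors[word], center))

    return min(word_vectors, key=distance_to_center)


# Hypothetical keyword group with 2-D word vectors.
group = {"tv": [1.0, 0.0], "screen": [0.8, 0.2], "channel": [0.0, 1.0]}
name = closest_to_center(group)
```

For this toy group the centroid is (0.6, 0.4), so "screen" is the nearest keyword and would be used to represent the target factor.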
In the example of fig. 4, the target factor corresponding to the keyword group 410 is "tv service". The target factor corresponding to the keyword group 420 is "boarding process". The target factor corresponding to the keyword group 430 is "baggage service". The target factor corresponding to the keyword group 440 is "movie service". The target factor corresponding to the keyword group 450 is "price". The target factor corresponding to the keyword group 460 is "time". The target factor corresponding to the keyword group 470 is "leg room".
In some embodiments, one or more sets of keywords that are the same as or similar to a structuring factor may be removed. In this case, the first computing device 110 determines target factors corresponding to the sets of keywords that have not been removed. For example, for each set of keywords, the first computing device 110 may determine whether the set of keywords is semantically similar to any structuring factor of the target object. If the set of keywords is not semantically similar to any of the structuring factors, a target factor is determined based on the set of keywords. If the set of keywords is semantically similar to a certain structuring factor, the set of keywords may be removed.
As an example, a set of keywords "food", "meal", "drink", "snack" may be obtained by processing the text in column 258. The set of keywords is semantically similar to the structuring factor "food and beverage" in fig. 2. Accordingly, the set of keywords is removed without determining a target factor corresponding thereto.
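A minimal sketch of this filtering step is given below. The disclosure does not fix how semantic similarity is measured, so a simple vocabulary-overlap ratio is used here as a stand-in; the per-factor vocabulary, the 0.5 threshold, and the function names are all assumptions made for illustration.

```python
def overlap_ratio(keywords, vocabulary):
    """Fraction of the group's keywords that also appear in a factor's vocabulary."""
    unique = set(keywords)
    return len(unique & set(vocabulary)) / len(unique)

def keep_novel_groups(groups, factor_vocabularies, threshold=0.5):
    """Drop keyword groups that resemble any structuring factor; keep the rest."""
    kept = []
    for group in groups:
        similar = any(
            overlap_ratio(group, vocab) >= threshold
            for vocab in factor_vocabularies.values()
        )
        if not similar:
            kept.append(group)
    return kept

# Hypothetical vocabulary for one structuring factor (not from the disclosure).
factor_vocabularies = {"food and beverage": {"food", "beverage", "meal", "drink"}}
groups = [
    ["food", "meal", "drink", "snack"],
    ["leg", "leg room", "leg space"],
]
novel = keep_novel_groups(groups, factor_vocabularies)
```

In practice an embedding-based similarity would likely replace `overlap_ratio`, with the rest of the loop unchanged.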
Through the above-described process 300, factors for the target object are extracted from open text reviews or comments. In this way, the information contained in such unstructured text is analyzed, which facilitates the discovery of new factors that affect the target object.
Process 300 may also include additional blocks. In some embodiments, the first computing device 110 may determine at least one target sentence corresponding to the target factor 102 based on the text set 105. The target statement reflects a perspective about the target factor. For example, the target sentence may be a simple and intelligible sentence related to the target factor. The target statement may be used to interpret the target factor.
Each target sentence should relate to (e.g., describe or discuss) the target factor and have an explicit emotion. The target statement may be a statement reflecting a positive perspective about the target factor. Alternatively or additionally, the target statement may be a statement reflecting a negative perspective on the target factor. Additionally, each target statement should be a valid and understandable statement. In some embodiments, the target statement may relate only to the target factor, and not to other factors of the target object. In such an embodiment, the target statement may explicitly interpret the individual factors to avoid confusion.
The first computing device 110 may determine the target statement in any suitable manner. For example, one or more statements may be generated that relate to the target factor. It is determined whether there are semantically matched sentences and the number of matched sentences in the text set 105 that match the generated sentences. If the number of matched sentences exceeds the threshold number, the generated sentences may be taken as target sentences.
In some embodiments, the first computing device 110 may utilize keywords corresponding to the target factor to determine the target sentence. Specifically, the first computing device 110 may extract at least one candidate sentence from the text set 105. Each extracted candidate sentence contains at least one keyword from the set of keywords corresponding to the target factor. For example, a candidate sentence extracted for the target factor "leg room" includes at least one keyword in the keyword group 470.
The first computing device 110 may in turn determine at least one target sentence related to the target factor based on the extracted at least one candidate sentence. For example, the extracted candidate sentence may be directly taken as the target sentence. For another example, candidate sentences having the same emotion may be fused into one target sentence, or one target sentence may be generated based on candidate sentences having the same emotion.
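The candidate-sentence extraction described above might look like the sketch below. The sentence splitter is a deliberately naive regular expression, the sample texts are invented, and the function names are hypothetical; emotion filtering and sentence fusion are left out.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def candidate_sentences(texts, keywords):
    """Return every sentence that contains at least one keyword as a whole word or phrase."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [s for t in texts for s in split_sentences(t) if pattern.search(s)]

# Made-up review texts for illustration only.
texts = [
    "The food was OK. Leg room was generous.",
    "Boarding was slow. I had no leg space at all!",
]
candidates = candidate_sentences(texts, ["leg", "leg room", "leg space"])
```

Word boundaries (`\b`) keep a keyword such as "leg" from matching inside unrelated words.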
Depending on the perspective of the target factor reflected by the text in the text set 105, the target sentence may include a sentence reflecting a positive perspective about the target factor, a sentence reflecting a negative perspective about the target factor, or both. That is, the target sentence may include a sentence having positive emotion, a sentence having negative emotion, or both.
Table 500 in fig. 5 shows information about the target factor "leg room". Target sentence 501 has positive emotion with respect to the target factor "leg room", while target sentence 502 has negative emotion with respect to the target factor "leg room". Presented alone, the information conveyed by a target factor may be limited or hard to understand. The target factor can be interpreted using the target sentence. The target sentence is presented together with the target factor, so that a party related to the target object can understand the target factor more intuitively.
In some embodiments, the first computing device 110 may further determine the number of sentences in the text set 105 that have similar semantics to the target sentence. This number may be used as the frequency of the target sentence. For example, the frequency of the target sentence 501 is shown as 500 in FIG. 5, while the frequency of the target sentence 502 is shown as 800. This means that, with respect to the target factor "leg room", negative views outnumber positive views. By determining and presenting the frequency of the target sentences, the strengths and weaknesses of the target object in terms of the target factor can be understood intuitively.
Quantification of target factors
In some embodiments, the target factor may be further quantified. As used herein, quantifying a factor refers to determining a measure of the factor that represents the degree of interest, importance, or effort of the factor. Metrics for a target factor 102 may be determined based on a set of keywords corresponding to the target factor 102 and the text 101 in the text set 105.
Such a metric may be expressed as the number of occurrences of a set of keywords in text 101 corresponding to target factor 102. A metric of the target factor 102 may be determined for each text 101. In this case, the number of occurrences of the keyword in each text 101 may be determined as a metric. In embodiments where text 101 originates from information collection sheet 150, the number of occurrences of keywords may be determined for each reply record of information collection sheet 150. In some embodiments, the emotion of the text may also be analyzed, and the number of occurrences of the keyword may be determined based on the text with emotion. Such an embodiment will be described below with reference to fig. 7.
For example, to determine a measure of the target factor "leg room" corresponding to the keyword group 470, the number of occurrences of keywords "leg", "leg room", and "leg space" in each text 101 may be determined. Fig. 6A shows an example of a measure of the target factor "leg room". Each value in column 610 represents the number of occurrences of the keywords "leg", "leg room", and "leg space" in the corresponding text of column 258.
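As a rough illustration of the occurrence-count metric, the sketch below counts keyword mentions per reply. One assumption made here, not stated in the disclosure, is that overlapping keywords are counted once per mention: longer phrases are tried first so that "leg room" is not also counted as "leg". The sample replies are invented.

```python
import re

def keyword_count(text, keywords):
    """Count keyword mentions in one text, matching longer phrases first."""
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))

# Made-up reply texts; one count is produced per reply record.
replies = [
    "Leg room was fine, but my leg hurt.",
    "Great crew.",
    "No leg space, no leg room.",
]
counts = [keyword_count(r, ["leg", "leg room", "leg space"]) for r in replies]
```

If each keyword were instead counted independently, the phrase "leg room" would contribute twice, which may or may not be the intended behavior.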
Alternatively, the metric for target factor 102 may be expressed as an emotion level of a sentence in text 101 that includes a keyword corresponding to target factor 102. The emotion rating may be divided into five classes, for example, which are represented by the values 1 to 5, respectively. A metric of the target factor 102 may be determined for each text 101. In this case, the emotion level of the sentence including the keyword in each text 101 may be determined. In embodiments where text 101 originates from information collection sheet 150, the emotion level of the sentence is determined for each reply record.
Still taking the target factor "leg room" as an example, in order to determine its metric, the emotion level of a sentence containing at least one of the keywords "leg", "leg room", and "leg space" in each text 101 may be determined. Fig. 6B shows an example of a metric of the target factor "leg room". Each value in column 620 represents the emotion level of a sentence that contains at least one of the keywords "leg", "leg room", and "leg space" in the corresponding text of column 258. The value "0" may indicate that the corresponding text does not contain any of the keywords "leg", "leg room", and "leg space", or that the sentence containing such a keyword is a neutral sentence without emotion. Further, the initially quantized values may be converted for consistency with the metrics of the structuring factors. For example, the value "0" may be converted to a score of 3 to represent neutral emotion. It should be appreciated that the above is merely one example of quantifying target factors, and that the target factors may be quantified in any suitable manner in embodiments of the present disclosure.
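The conversion of the placeholder value 0 into a neutral score can be sketched in a few lines. The input values are invented, and the constant and function names are hypothetical.

```python
NEUTRAL_SCORE = 3  # midpoint of the 1-to-5 scale, used for "no mention" or neutral

def to_rating_scale(levels):
    """Replace the placeholder 0 with the neutral score; keep levels 1-5 unchanged."""
    return [NEUTRAL_SCORE if level == 0 else level for level in levels]

# Made-up initially quantized emotion levels, one per reply record.
converted = to_rating_scale([0, 5, 1, 0, 4])
```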
It should be understood that the values of the metrics of the target factors shown in fig. 6A and 6B are merely exemplary and are not intended to limit the scope of the present disclosure. The quantification of the target factors described above may be implemented by either or both of the first computing device 110 or the second computing device 120. The target factors may also be quantified using the approach described below.
Determination of key factors
Fig. 7 illustrates a flow chart of a process 700 of determining key factors according to some embodiments of the present disclosure. Process 700 may be implemented at second computing device 120. For ease of discussion, process 700 is described with reference to fig. 1 and 7.
At block 710, the second computing device 120 obtains a set of target factors 102 for a target object. The target factors 102 are determined based on unstructured text sets 105 for the target object, and each target factor 102 represents an aspect of the target object.
In some embodiments, the second computing device 120 may receive an indication of the target factor 102 from the first computing device 110, as shown in fig. 1. Alternatively, in some embodiments, the second computing device 120 may determine the target factor 102 based on the text set 105, as described above with reference to fig. 3.
The second computing device 120 also receives or determines a set of structuring factors 103 for the target object. The structuring factor 103 originates, for example, from the information collection sheet 150, as described with reference to fig. 1. At least one target factor of the set of target factors 102 is different from the set of structuring factors 103.
In some embodiments, all of the target factors 102 are different from the structuring factors 103.
At block 720, the second computing device 120 determines at least one key factor 104 for the target object based on the set of target factors 102 and the set of structuring factors 103 for the target object. In some embodiments, the second computing device 120 may determine the number of occurrences, in the text set 105, of the keywords corresponding to each factor. The target factors 102 and the structuring factors 103 are then ranked by number of occurrences, and a number of top-ranked factors are determined to be the key factors.
In some embodiments, to determine key factors, the second computing device 120 may quantify the target factors and the structuring factors. In particular, the second computing device 120 may determine a respective first metric for the set of target factors 102 by analyzing the emotion of the text 101 in the text set 105. The first metric represents a degree of interest in the corresponding target factor. The second computing device 120 may also determine a respective second metric for the set of structuring factors 103, the second metric representing a degree of interest in the respective structuring factor. To determine the key factors, the metrics of the different types of factors should be consistent. Thus, the first metric and the second metric are matched in terms of measurement scale.
In some embodiments, the first metric may be expressed as a number of occurrences of the keyword. For each target factor in a set of target factors, a sentence containing a keyword corresponding to the target factor and having emotion may be determined from the text 101 of the text set 105. A first metric for the target factor may be determined based on the number of occurrences of the keyword corresponding to the target factor in the sentence.
As an example, for the target factor "leg room", a sentence including keywords "leg", "leg room", and "leg space" and having emotion may be determined in each text 101. The number of occurrences of the keywords "leg", "leg room", and "leg space" in these sentences is determined as a first metric. For example, column 610 in FIG. 6A shows a first metric of the target factor "leg room".
In such an embodiment, to match the second metric of the structuring factor to the first metric, the scores (rating) in columns 252 through 258 are not suitable for direct use as the second metric. For this reason, the structuring factor needs to be re-quantized. In particular, for each structuring factor in a set of structuring factors, the second computing device 120 may determine, from the text 101, a sentence that contains a keyword corresponding to the structuring factor and has emotion. A second metric for the structuring factor may be determined based on the number of occurrences of the keyword corresponding to the structuring factor in the sentence.
As an example, for the structuring factor "food and beverage", sentences including the keywords "food", "meal", "drink", and "snack" and having emotion may be determined in each text 101. The number of occurrences of the keywords "food", "meal", "drink" and "snack" in these sentences is determined as a second metric. For example, column 630 in FIG. 6A shows a second metric of the structuring factor "food and beverage".
Alternatively, in some embodiments, the first metric may be represented as an emotion level of the sentence containing the keyword. For each target factor in a set of target factors, a sentence containing a keyword corresponding to the target factor and having emotion may be determined from the text 101 in the text set 105. A first metric of the target factor is determined based on the emotion level of the sentence. The emotion level of a sentence may be determined in any suitable manner; embodiments of the present disclosure are not limited in this respect.
As an example, for the target factor "leg room", a sentence including keywords "leg", "leg room", and "leg space" and having emotion may be determined in each text 101. The emotion level of the sentence is determined as a first metric. For example, column 620 in FIG. 6B shows a first metric of the target factor "leg room".
In such an embodiment, for each structuring factor in the set of structuring factors 103, a second metric for that structuring factor may be determined based on a response to the closed-ended question regarding that structuring factor. For example, the user's score for the structuring factor may be used as a second metric. In fig. 6B, the scores in columns 252 through 258 may be used as a second metric for the corresponding structuring factor. It can be seen from fig. 6A and 6B that a first metric and a second metric are determined for each reply record.
The determination of the first metric and the second metric is described above. The second computing device 120 may, in turn, determine a degree of influence of each factor on the target object based on the respective first metrics of the set of target factors 102 and the respective second metrics of the set of structuring factors 103. The degree of influence may be determined according to any suitable algorithm. Such algorithms may include, but are not limited to, linear regression, logistic regression, Shapley values, and the like.
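One very simple way to obtain a degree of influence with linear regression is sketched below. It is a simplification chosen for brevity: each factor is regressed on the overall result separately (a univariate least-squares slope) rather than jointly, and the absolute slope is taken as the influence. All values are invented, and the function name `slope` is hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y on x for a single explanatory variable."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var if var else 0.0

# Made-up data: one overall score and one metric value per reply record.
overall = [5, 4, 2, 1]
metrics = {
    "price": [5, 4, 2, 1],       # moves together with the overall score
    "tv service": [3, 1, 3, 2],  # unrelated to the overall score
}
influence = {factor: abs(slope(values, overall)) for factor, values in metrics.items()}
ranking = sorted(influence, key=influence.get, reverse=True)
```

Slopes are only comparable across factors because the metrics are assumed to share one scale; a joint multivariate regression or Shapley values would account for correlated factors, which this sketch does not.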
The factor intensity may be determined for each target factor 102 and each structuring factor 103 as an indication of the extent of the influence. The factor intensity represents the importance of the corresponding factor to the result related to the target object. The results related to the target object may include, for example, performance of the target object, overall evaluation of the target object, satisfaction with the target object, and the like. For the example in fig. 2, the result related to the target object is a reply to the closed question 220, i.e., a score listed in column 251.
Key factors may then be selected from the target factors 102 and the structured factors 103 according to the degree of influence. For example, a number of factors may be selected that rank first by the degree of influence. The process of ordering factors by their degree of influence on a target object (e.g., their strength) is also referred to herein as "critical factor ordering (key factor ranking, KFR)".
In some embodiments, key factors may be selected from a set of target factors 102 and a set of structuring factors 103, respectively. Specifically, a first number of target factors may be selected as key factors from a set of target factors 102 according to the degree to which each of the set of target factors 102 affects the target object. A second number of structuring factors may be selected from the set of structuring factors 103 as key factors in accordance with the degree to which each of the set of structuring factors 103 affects the target object.
The first number and the second number of values may be predetermined. Alternatively, the selected factor may be a factor that affects a degree greater than a threshold degree (e.g., a factor intensity greater than a threshold intensity). In this case, the values of the first number and the second number are not predetermined. Implementations of the disclosure are not limited in this respect.
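Both selection rules (a predetermined number, or a threshold on factor intensity) can be expressed with one small helper, sketched here under the assumption that factor intensities are already available as a mapping. The intensity values and the name `select_key_factors` are illustrative only.

```python
def select_key_factors(intensity, top_n=None, threshold=None):
    """Rank factors by intensity, then keep those above a threshold and/or the top N."""
    ranked = sorted(intensity, key=intensity.get, reverse=True)
    if threshold is not None:
        ranked = [f for f in ranked if intensity[f] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked

# Made-up factor intensities.
target_intensity = {"price": 0.8, "movie service": 0.5, "leg room": 0.1}
structuring_intensity = {"value for money": 0.9, "seat comfort": 0.6}

# Selection from each set separately, with predetermined first and second numbers.
separate = (
    select_key_factors(target_intensity, top_n=2)
    + select_key_factors(structuring_intensity, top_n=1)
)
# Selection from the combined set of factors, using a threshold intensity instead.
combined = select_key_factors(
    {**target_intensity, **structuring_intensity}, threshold=0.55
)
```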
FIG. 8A shows the results of critical factor ordering for target factors and structuring factors, respectively. The factor intensities on the abscissa in fig. 8A represent the degree of influence of the corresponding factors on the target object. As shown, the target factors "price", "movie service", and "tv service" are selected from the target factors as key factors according to the factor intensities. The structuring factors "value for money", "ground service", "cabin service", "seat comfort" and "food and beverage" are selected from the structuring factors as key factors according to the factor intensities.
In some embodiments, the key factors may be selected from a union of a set of target factors 102 and a set of structuring factors 103. Specifically, a third number of factors may be selected as key factors from a union of the set of target factors 102 and the set of structuring factors 103 according to the degree of influence of each of the set of target factors 102 on the target object and the degree of influence of each of the set of structuring factors 103 on the target object.
The third number of values may be predetermined. Alternatively, the selected factor may be a factor that affects a degree greater than a threshold degree (e.g., a factor intensity greater than a threshold intensity). In this case, the value of the third number is not predetermined. Implementations of the disclosure are not limited in this respect.
FIG. 8B shows the result of critical factor ordering for target factors together with structuring factors. The factor intensities on the abscissa in fig. 8B represent the degree of influence of the corresponding factors on the target object. As shown, the factors "value for money", "ground service", "cabin service", "seat comfort", "food and beverage", "price", "movie service", and "tv service" are selected as key factors according to the factor intensities.
Presentation of information collection sheets
Fig. 9 illustrates a flow chart of a process 900 of presenting an information collection sheet according to some embodiments of the present disclosure. Process 900 may be implemented at third computing device 130. For ease of discussion, process 900 is described with reference to fig. 1 and 9.
At block 910, the third computing device 130 obtains a set of target factors 102 for the target object. The target factors 102 are determined based on unstructured text sets 105 for the target object, and each target factor 102 represents an aspect of the target object.
In some embodiments, the third computing device 130 may receive an indication of the target factor 102 from the first computing device 110, as shown in fig. 1. Alternatively, in some embodiments, the third computing device 130 may determine the target factor 102 based on the text set 105, as described above with reference to fig. 3.
At block 920, the third computing device 130 presents an information collection sheet for collecting a description of the target object based on the at least one key factor 104 of the target object. The at least one key factor 104 is determined from a set of structuring factors 103 and a set of target factors 102 of the target object. In some embodiments, the third computing device 130 may receive an indication of the at least one key factor 104 from the second computing device 120, as shown in fig. 1. Alternatively, in some embodiments, the third computing device 130 may determine key factors from the set of structuring factors 103 and the set of target factors 102, as described above with reference to fig. 7.
In some embodiments, the text 101 in the text set 105 originates from replies to an open question in the information collection sheet, and the information collection sheet includes corresponding closed questions with respect to a set of structuring factors 103. In such an embodiment, the third computing device 130 presents an updated version of the information collection sheet based on the at least one key factor. The updated version includes updated closed questions. In this way, the new information collection sheet can more directly collect user ratings for the aspects of interest.
As an example, based on the information collection sheet 150 and the corresponding reply set 250 shown in fig. 2, the key factors "value for money", "ground service", "cabin service", "seat comfort", "food and beverage", "price", "movie service", and "tv service" are determined, as shown in fig. 8B. The third computing device 130 presents the information collection sheet 160 shown in fig. 10, which is an updated version of the information collection sheet 150 shown in fig. 2. Comparing fig. 2 with fig. 10, closed questions 210-1 through 210-6 are updated to closed questions 210-1, 210-2, 210-3, 210-5, 210-6, and 1010.
In some embodiments, if at least one key factor 104 includes a certain target factor, the third computing device 130 may add closed questions about the target factor in the updated version of the information collection sheet. The third computing device 130 in turn presents an updated version of the information collection sheet that includes the closed question.
Continuing with the example above, the key factors include the target factor "price". Accordingly, the presented information collection sheet 160 includes closed questions 1010 about the target factor "price". In this way, aspects that the user is likely to be concerned with are added to the information collection sheet as structuring factors. This helps to more fully and conveniently collect user ratings of the target object.
In some embodiments, if the at least one key factor 104 does not include a certain structuring factor, the third computing device 130 may remove the closed question regarding that structuring factor from the information collection sheet. The third computing device 130 may in turn present an updated version from which the closed question regarding the structuring factor has been removed.
Continuing with the example above, the key factors do not include the structuring factor "entry information" in the information collection sheet 150. Accordingly, the closed question 210-4 regarding the structuring factor "entry information" is removed. In contrast to the information collection sheet 150, the presented information collection sheet 160 does not include the closed question 210-4. In this way, aspects that may be of less concern to the user are removed from the information collection sheet. This may spare the user from being distracted by unimportant questions.
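The add-and-remove logic for closed questions can be sketched as a single function over factor names. The factor lists below are shortened and illustrative, the function name is hypothetical, and the sketch adds a question for every key target factor, whereas an embodiment may add questions for only some of them.

```python
def update_closed_questions(current_factors, key_factors, target_factors):
    """Keep questions whose factor is still key; add questions for key target factors."""
    kept = [f for f in current_factors if f in key_factors]
    added = [
        f for f in key_factors
        if f in target_factors and f not in current_factors
    ]
    return kept + added

# Illustrative inputs (not the full lists from the figures).
current_factors = ["value for money", "entry information", "seat comfort"]
key_factors = ["value for money", "seat comfort", "price"]
target_factors = {"price", "leg room"}

updated = update_closed_questions(current_factors, key_factors, target_factors)
```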
Alternatively or additionally, in some embodiments, the third computing device 130 may present prompts about target factors included in the key factors while the information collection sheet is presented. The prompt helps the user give a description about the target factor, such as an experience, a rating, or a level of satisfaction. In particular, while the information collection sheet 160 is presented, the third computing device 130 may detect a reply to the open question 230 in the information collection sheet 160. If it is detected that a reply is being provided, the third computing device 130 may present such a prompt.
Fig. 11 shows an example of a prompt about a target factor. Continuing with the example above, the key factors include the target factor "movie service". As shown, the user is typing the text "The food is OK, and" in text box 1120 of open question 230. In response to detecting that the text is being typed, the third computing device 130 may present a prompt 1110, "How about the movie?". The prompt 1110 reminds the user to give an experience or assessment of the target factor "movie service".
In some embodiments, the third computing device 130 may interactively determine and present the information collection sheet 160. In particular, the third computing device 130 may present at least one key factor 104. While the at least one key factor is presented, the third computing device 130 may detect a selection of the at least one key factor. If a selection of at least one key factor is detected, the third computing device 130 may add a closed question to the information collection sheet 160 regarding the selected key factor, thereby presenting the information collection sheet 160 including the closed question.
In such an embodiment, the third computing device 130 may be a device associated with a domain expert for the target object. The determined key factors are presented to the domain expert. The domain expert can determine which key factors require the addition of a closed question. The third computing device 130 may set closed questions in the information collection sheet 160 according to the selection of the domain expert. In this way, objective data can be utilized to help domain experts design better information collection sheets, such as better questionnaires.
Topic model and emotion representation
Determination and quantification of target factors is described above with reference to fig. 3-6B. A keyword-assisted topic model may also be used to extract target factors from the text set 105. Keyword-assisted topic models (also referred to below simply as topic models) incorporate domain knowledge into the topic model via "anchor" words. An anchor word may be used as a marker for a particular topic, i.e., the anchor word encourages the topic model to search for topics related to the anchor word. Thus, anchor words help the topic model separate different topics from each other. The topic model can help find topics of interest. Topic models include, but are not limited to, Anchored CorEx.
In view of this, a topic model may be used to determine the target factors implicit in the text set 105. The plurality of keywords extracted from the text set 105 may be used as anchor words for the topic model. In this case, the topic derived from the topic model may be used as the target factor 102. Individual keywords are anchored to a topic through a topic model. Therefore, the corresponding relation between the target factors and the keywords can be determined by using the topic model.
The emotion representation may be used to quantify the target factors derived from the topic model. Any text analysis method capable of analyzing emotion may be used. As an example, a Linguistic Inquiry and Word Count (LIWC) dictionary may be used. The LIWC dictionary may map words to a plurality of categories. These categories may capture the lexical and semantic features of the text. The LIWC categories that measure positive emotion are grouped and may be used. The vector representation of positive emotion is the frequency of words belonging to the positive emotion categories. An estimate of a certain target factor may be made up of binarized anchor topic variables and a vector representation of emotion.
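A toy version of this estimate is sketched below. The LIWC dictionary itself is not reproduced; a three-word set stands in for its positive-emotion categories, and pairing the binary topic indicator with the word frequency in a tuple is just one possible way of combining the two, assumed here for illustration.

```python
import re

# Hypothetical stand-in for a positive-emotion word category.
POSITIVE_WORDS = {"good", "great", "comfortable"}

def positive_frequency(text):
    """Share of tokens in the text that belong to the positive-emotion word set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in POSITIVE_WORDS for token in tokens) / len(tokens)

def factor_estimate(text, topic_present):
    """Pair a binarized topic indicator with the positive-emotion frequency."""
    return (1 if topic_present else 0, positive_frequency(text))

# topic_present would come from the topic model; it is supplied by hand here.
estimate = factor_estimate("Great leg room and a good movie", topic_present=True)
```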
Multi-modal model
A machine learning model may also be used to analyze information (such as reply set 250) that includes both text and predetermined options. Fig. 12A illustrates a machine learning model 1200 for a propensity score. The language model 1210 in the model 1200 is configured to generate a feature representation of the text 101, which may be considered a target factor determined from the text set 105. Unlike the process described above with reference to fig. 3, the target factor determined in this way is an implicit representation.
A multilayer perceptron (MLP) layer 1220 is used to generate a feature representation of the structuring factor 103. If the structuring factor 103 includes both numeric factors (e.g., as shown in FIG. 2) and categorical factors (e.g., cabin class), the MLP layer 1220 may include two MLP layers for handling numeric and categorical factors, respectively.
The MLP layer 1230 generates a feature h_e based on the feature representation of the text 101 and the feature representation of the structuring factor 103. By applying a softmax activation function to the feature h_e, the propensity score may be determined.
The model 1200 may be trained using a cross entropy loss function. The language model 1210 may be any suitable type of language characterization model, such as a Bidirectional Encoder Representations from Transformers (BERT) model.
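The final step of the model (combining the two feature representations and applying softmax) can be illustrated without any deep learning framework. Everything numeric below is invented: the feature vectors stand in for the language model and MLP outputs, and a single linear layer with hand-picked weights stands in for the MLP layer that produces the combined feature.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of real values."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(inputs, weights, bias):
    """One dense layer: weights is a list of rows, one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]

# Made-up feature representations of the text and of the structuring factors.
text_features = [0.2, -0.1]
structured_features = [0.5]

# Concatenate both representations and project to two outputs.
h_e = linear(
    text_features + structured_features,
    weights=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]],
    bias=[0.0, 0.0],
)
probabilities = softmax(h_e)  # non-negative values that sum to 1
```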
Fig. 12B shows a machine learning model 1250 for conditional outcome expectations. Language model 1260 in model 1250 is configured to generate a feature representation of text 101 that may be considered a target factor determined from text set 105. The target factor determined in this way is an implicit representation.
The MLP layer 1270 is used to generate a feature representation of the structuring factor 103. If the structuring factor 103 includes both numeric factors (e.g., as shown in FIG. 2) and categorical factors (e.g., cabin class), the MLP layer 1270 may include two MLP layers for handling numeric factors and categorical factors, respectively.
The MLP layer 1280 generates a feature h_Q based on the feature representation of the text 101 and the feature representation of the structuring factor 103. In the case of a non-continuous (e.g., categorical or discrete-valued) result Y, the conditional outcome expectation may be determined by applying a softmax activation function to the feature h_Q. In the case of a continuous result Y, the conditional outcome expectation may be determined by applying a linear activation function to the feature h_Q.
In the case where the result Y is discontinuous, the model 1250 may be trained using a cross entropy loss function. In the case where the results Y are continuous, the model 1250 may be trained using Mean Square Error (MSE). Similar to language model 1210, language model 1260 may be any suitable type of language characterization model, such as a BERT model.
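The choice between the two training losses can be written out as plain functions. This is a per-example illustration with invented numbers, not a training loop; the function names are hypothetical.

```python
import math

def cross_entropy(probabilities, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probabilities[label])

def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def training_loss(output, target, continuous):
    """Use MSE for a continuous result Y and cross entropy otherwise."""
    if continuous:
        return mean_squared_error(output, target)
    return cross_entropy(output, target)

discrete_loss = training_loss([0.5, 0.5], 0, continuous=False)
continuous_loss = training_loss([1.0, 3.0], [2.0, 5.0], continuous=True)
```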
The key factor ranking may be based on one or both of the propensity score and the conditional outcome expectation. Thus, key factors of the target object can be determined. Process 700 may also be implemented using a machine learning model as described herein.
Example apparatus
Fig. 13 shows a block diagram of a computing device 1300 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device 1300 illustrated in Fig. 13 is merely exemplary and should not be taken as limiting the functionality and scope of the embodiments described herein. The computing device 1300 illustrated in Fig. 13 may be used to implement the first computing device 110, the second computing device 120, or the third computing device 130 of Fig. 1.
As shown in fig. 13, computing device 1300 is in the form of a general purpose computing device. Components of computing device 1300 may include, but are not limited to, one or more processors or processing units 1310, memory 1320, storage 1330, one or more communication units 1340, one or more input devices 1350, and one or more output devices 1360. The processing unit 1310 may be an actual or virtual processor and is capable of performing various processes in accordance with programs stored in the memory 1320. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capabilities of computing device 1300.
Computing device 1300 typically includes a number of computer storage media. Such media can be any available media that is accessible by computing device 1300 and includes, but is not limited to, volatile and non-volatile media, removable and non-removable media. The memory 1320 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 1330 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium that is capable of storing information and/or data (e.g., training data for training) and can be accessed within the computing device 1300.
Computing device 1300 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in fig. 13, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. Memory 1320 may include a computer program product 1325 having one or more program modules configured to perform the various methods or acts of the various embodiments of the present disclosure.
Communication unit 1340 enables communication with other computing devices via a communication medium. Additionally, the functionality of the components of computing device 1300 may be implemented as a single computing cluster or as multiple computing machines capable of communicating over a communications connection. Accordingly, computing device 1300 may operate in a networked environment using logical connections to one or more other servers, a network Personal Computer (PC), or another network node.
The input device 1350 may be one or more input devices such as a mouse, keyboard, trackball, etc. The output device 1360 may be one or more output devices such as a display, speakers, printer, etc. Computing device 1300 can also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., with one or more devices that enable a user to interact with computing device 1300, or with any device (e.g., network card, modem, etc.) that enables computing device 1300 to communicate with one or more other computing devices, as desired, via communications unit 1340. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, a computer-readable storage medium having stored thereon computer-executable instructions, wherein the computer-executable instructions are executed by a processor to implement the method described above is provided. According to an exemplary implementation of the present disclosure, there is also provided a computer program product tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions that are executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices, and computer program products implemented according to the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of implementations of the present disclosure has been provided for illustrative purposes, is not exhaustive, and is not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various implementations described. The terminology used herein was chosen in order to best explain the principles of each implementation, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand each implementation disclosed herein.

Claims (17)

1. A method of information processing, comprising:
obtaining a set of target factors for a target object, the set of target factors determined based on an unstructured text set for the target object, and each target factor representing an aspect of the target object; and
determining at least one key factor for the target object based on the set of target factors and a set of structuring factors for the target object, wherein at least one target factor of the set of target factors is different from the set of structuring factors.
2. The method of claim 1, wherein determining the at least one key factor comprises:
selecting a first number of target factors from the set of target factors as part of the at least one key factor according to the degree to which each of the set of target factors affects the target object; and
selecting a second number of structuring factors from the set of structuring factors as part of the at least one key factor according to the degree to which each of the set of structuring factors affects the target object.
3. The method of claim 1, wherein determining the at least one key factor comprises:
selecting a third number of factors from a combination of the set of target factors and the set of structuring factors as the at least one key factor according to the degree of influence of each of the set of target factors on the target object and the degree of influence of each of the set of structuring factors on the target object.
4. The method of claim 1, wherein determining the at least one key factor comprises:
determining a first metric for each of the set of target factors by analyzing the emotion of texts in the text set, the first metric representing a degree of interest in the corresponding target factor;
determining a respective second metric for the set of structuring factors, the second metric representing a degree of interest in the respective structuring factor, the second metric matching the first metric in terms of scale; and
determining the at least one key factor based on the first metric and the second metric.
5. The method of claim 4, wherein determining the first metric comprises:
for a target factor in the set of target factors,
determining, from the texts in the text set, a first sentence that contains a first keyword corresponding to the target factor and expresses emotion; and
determining the first metric of the target factor based on the number of occurrences of the first keyword in the first sentence, and
wherein determining the second metric comprises:
for a structuring factor in the set of structuring factors,
determining, from the text, a second sentence that contains a second keyword corresponding to the structuring factor and expresses emotion; and
determining the second metric of the structuring factor based on the number of occurrences of the second keyword in the second sentence.
6. The method of claim 4, wherein determining the first metric comprises:
for a target factor in the set of target factors,
determining, from the texts in the text set, a sentence that contains a keyword corresponding to the target factor and expresses emotion; and
determining the first metric of the target factor based on the emotion level of the sentence, and
wherein determining the second metric comprises:
for a structuring factor in the set of structuring factors, determining the second metric for the structuring factor based on a reply to a closed-ended question regarding the structuring factor.
7. The method of claim 4, wherein determining the at least one key factor based on the first metric and the second metric comprises:
determining a degree of influence of each of the set of target factors and the set of structuring factors on the target object based on the first metric and the second metric; and
selecting the at least one key factor from the set of target factors and the set of structuring factors based on the degree of influence.
8. The method of claim 7, wherein determining the degree of influence is based on at least one of:
linear regression,
logistic regression,
Shapley value.
9. An electronic device, comprising:
at least one processing circuit configured to:
obtain a set of target factors for a target object, the set of target factors determined based on an unstructured text set for the target object, and each target factor representing an aspect of the target object; and
determine at least one key factor for the target object based on the set of target factors and a set of structuring factors for the target object, wherein at least one target factor of the set of target factors is different from the set of structuring factors.
10. The device of claim 9, wherein determining the at least one key factor comprises:
selecting a first number of target factors from the set of target factors as part of the at least one key factor according to the degree to which each of the set of target factors affects the target object; and
selecting a second number of structuring factors from the set of structuring factors as part of the at least one key factor according to the degree to which each of the set of structuring factors affects the target object.
11. The device of claim 9, wherein determining the at least one key factor comprises:
selecting a third number of factors from a combination of the set of target factors and the set of structuring factors as the at least one key factor according to the degree of influence of each of the set of target factors on the target object and the degree of influence of each of the set of structuring factors on the target object.
12. The device of claim 9, wherein determining the at least one key factor comprises:
determining a first metric for each of the set of target factors by analyzing the emotion of texts in the text set, the first metric representing a degree of interest in the corresponding target factor;
determining a respective second metric for the set of structuring factors, the second metric representing a degree of interest in the respective structuring factor, the second metric matching the first metric in terms of scale; and
determining the at least one key factor based on the first metric and the second metric.
13. The device of claim 12, wherein determining the first metric comprises:
for a target factor in the set of target factors,
determining, from the texts in the text set, a first sentence that contains a first keyword corresponding to the target factor and expresses emotion; and
determining the first metric of the target factor based on the number of occurrences of the first keyword in the first sentence, and
wherein determining the second metric comprises:
for a structuring factor in the set of structuring factors,
determining, from the text, a second sentence that contains a second keyword corresponding to the structuring factor and expresses emotion; and
determining the second metric of the structuring factor based on the number of occurrences of the second keyword in the second sentence.
14. The device of claim 12, wherein determining the first metric comprises:
for a target factor in the set of target factors,
determining, from the texts in the text set, a sentence that contains a keyword corresponding to the target factor and expresses emotion; and
determining the first metric of the target factor based on the emotion level of the sentence, and
wherein determining the second metric comprises:
for a structuring factor in the set of structuring factors, determining the second metric for the structuring factor based on a reply to a closed-ended question regarding the structuring factor.
15. The device of claim 12, wherein determining the at least one key factor based on the first metric and the second metric comprises:
determining a degree of influence of each of the set of target factors and the set of structuring factors on the target object based on the first metric and the second metric; and
selecting the at least one key factor from the set of target factors and the set of structuring factors based on the degree of influence.
16. The device of claim 15, wherein determining the degree of influence is based on at least one of:
linear regression,
logistic regression,
Shapley value.
17. A computer readable storage medium having stored thereon a computer program executable by a processor to implement the method of any of claims 1 to 8.
CN202210861703.8A 2022-07-20 2022-07-20 Method, apparatus, device and storage medium for information processing Pending CN117473049A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202210861703.8A CN117473049A (en) 2022-07-20 2022-07-20 Method, apparatus, device and storage medium for information processing
US18/355,250 US20240028836A1 (en) 2022-07-20 2023-07-19 Method, apparatus, device and storage medium for information processing
JP2023118434A JP2024014830A (en) 2022-07-20 2023-07-20 Method for information processing, device, installation, and memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210861703.8A CN117473049A (en) 2022-07-20 2022-07-20 Method, apparatus, device and storage medium for information processing

Publications (1)

Publication Number Publication Date
CN117473049A true CN117473049A (en) 2024-01-30

Family

ID=89638410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210861703.8A Pending CN117473049A (en) 2022-07-20 2022-07-20 Method, apparatus, device and storage medium for information processing

Country Status (1)

Country Link
CN (1) CN117473049A (en)

Legal Events

Date Code Title Description
PB01 Publication