CN115309867A - Text processing method, device, equipment and medium - Google Patents

Text processing method, device, equipment and medium

Info

Publication number
CN115309867A
Authority
CN
China
Prior art keywords
frequency
target
sentence
target high
elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210978990.0A
Other languages
Chinese (zh)
Inventor
王兆麟
丁冠源
回姝
郭富琦
黄嘉桐
郑彤
张文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW Group Corp
Priority to CN202210978990.0A
Publication of CN115309867A
Priority to PCT/CN2023/112841 (WO2024037483A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a text processing method, a text processing device, text processing equipment and a text processing medium. The method comprises the following steps: splitting the comment text into at least two target sentences; determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements; calculating an importance quantitative value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is located; and visually displaying the text processing result according to the importance quantization value, the theme type of each target high-frequency sentence element and the word frequency of each target high-frequency sentence element. According to the technical scheme, the high-frequency sentence elements can be extracted from the comment text, the text processing result is visually displayed based on the related information of the high-frequency sentence elements, the comment text is more accurately and effectively processed, and more accurate effective information can be extracted and visualized.

Description

Text processing method, device, equipment and medium
Technical Field
The present invention relates to the field of computers, and in particular, to a text processing method, apparatus, device, and medium.
Background
Online reviews are text-based presentations of a user's experience, rating, or opinion. They are characterized by timeliness, quantity, lack of structure, and complex content. With the rapid growth in the amount of online review information, text mining of online reviews can help consumers make rational decisions and guide designers in design, production, and version updates.
Therefore, how to better perform text processing on the comment text, accurately extract effective information in the text and perform visualization so that related personnel can pertinently improve the product quality is a problem to be solved urgently at present.
Disclosure of Invention
The invention provides a text processing method, a text processing device, text processing equipment and a text processing medium, which can realize more accurate and effective processing of comment texts and can extract more accurate effective information for visualization.
According to an aspect of the present invention, there is provided a text processing method, including:
splitting the comment text into at least two target sentences; the comment text is a text for commenting the relevant performance of the automobiles of the preset ranking automobile types;
determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements; wherein, the sentence elements are nouns and/or phrases;
calculating an importance quantitative value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is positioned;
and visually displaying the text processing result according to the importance quantization value, the theme type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element.
According to another aspect of the present invention, there is provided a text processing apparatus including:
the splitting module is used for splitting the comment text into at least two target sentences; the comment text is a text for commenting the relevant performance of the automobiles of the preset ranking automobile types;
the determining module is used for determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements; wherein, the sentence elements are nouns and/or phrases;
the calculation module is used for calculating the importance quantitative value associated with each target high-frequency statement element according to the evaluation level of the comment text in which each target high-frequency statement element is located;
and the visualization module is used for visually displaying the text processing result according to the importance quantization value, the theme type of each target high-frequency sentence element and the word frequency of each target high-frequency sentence element.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a text processing method according to any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a text processing method according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme, the comment text is divided into at least two target sentences, the target high-frequency sentence elements are determined according to the word frequency of each sentence element in each target sentence appearing in all the comment text and the similarity between the sentence elements, the importance quantitative value related to each target high-frequency sentence element is calculated according to the evaluation grade of the comment text where each target high-frequency sentence element is located, and the text processing result is visually displayed according to the importance quantitative value, the theme type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element. By the mode, the high-frequency sentence elements can be extracted from the comment text, the text processing result is visually displayed based on the related information of the high-frequency sentence elements, the comment text can be more accurately and effectively processed, more accurate effective information can be extracted and visualized, and subsequent improvement on product quality is facilitated.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a text processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a text processing method according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a text processing method according to a third embodiment of the present invention;
Fig. 4 is a block diagram of a text processing apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," "alternative," "target," and the like in the description and claims of the invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a text processing method according to an embodiment of the present invention, where the embodiment is suitable for performing analysis processing on a comment text related to automobile-related performance, and the method may be executed by a text processing apparatus, and the apparatus may be implemented in a software and/or hardware manner, and may be integrated in an electronic device having a function of implementing text processing. As shown in fig. 1, the method includes:
S101, splitting the comment text into at least two target sentences.
The comment text is a text for commenting the relevant performance of the automobile of the preset ranking automobile type. The target sentence refers to a sentence which meets the screening condition in the comment text. Each comment text may include at least two target sentences.
Optionally, a crawler tool may be used to randomly crawl user comments from a relevant automobile review website. For example, a preset number (for example, 5000) of comment texts commenting on the top-10 vehicle types may be obtained as the comment texts.
Optionally, the comment text may be split into at least two target sentences directly according to preset punctuation marks in the comment text, such as question marks, periods, and exclamation marks. Alternatively, the comment text may be input into a pre-trained model, which outputs the at least two split target sentences. The comment text may also be split by using the word token function of the Natural Language Toolkit (NLTK).
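The punctuation-based option can be sketched as follows. This is only a minimal illustration: the helper name `split_into_target_sentences` is hypothetical, the preset punctuation marks are assumed to be the ASCII period, question mark, and exclamation mark, and no screening condition is applied to the resulting sentences.

```python
import re

# Assumed "preset punctuation marks": period, question mark, exclamation mark.
# The pattern splits on whitespace that directly follows one of these marks.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+")


def split_into_target_sentences(comment_text):
    """Return the non-empty sentences of a comment text, in order."""
    parts = _SENTENCE_BOUNDARY.split(comment_text.strip())
    return [part.strip() for part in parts if part.strip()]


sentences = split_into_target_sentences(
    "The wiper is noisy. Is the window sealed? Great mirror!"
)
```

A real pipeline would likely also need to handle abbreviations, ellipses, and full-width punctuation, which this sketch ignores.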
And S102, determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements.
Wherein, the sentence elements are nouns and/or phrases. Phrase refers to a common phrase made up of at least two words. A sentence element can be a noun or a phrase. For example, the sentence elements may be "blue", "wiper", "window", and "vehicle mirror", etc.
The word frequency refers to the frequency of occurrence of a sentence element. The similarity refers to the semantic similarity between the semantic features represented by the sentence elements. The target high-frequency sentence elements refer to those high-frequency sentence elements, among the sentence elements in the target sentences, that meet preset target screening conditions.
Optionally, the sentence elements whose word frequency meets a preset screening condition may first be determined according to the word frequency with which each sentence element in each target sentence appears in all comment texts; these sentence elements are then further screened according to the similarity between them, so as to determine the target high-frequency sentence elements. Alternatively, the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements may be input into a pre-trained model, which outputs the target high-frequency sentence elements.
And S103, calculating an importance quantitative value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is positioned.
Specifically, when a user comments on a target high-frequency sentence element and submits a comment text, the user can select an evaluation level, so the evaluation level of the comment text in which the target high-frequency sentence element is located can be obtained. The evaluation level may be very non-approved (Strong Disagree), non-approved (Disagree), general (Neither), very approved (Strong Agree), or approved (Agree). The importance quantization value refers to a quantized value that represents the importance of the information of the target high-frequency sentence element.
Optionally, for each target high-frequency sentence element, the evaluation level of each comment text in which the element is located may be input into a pre-trained model, which outputs the importance quantization value associated with the element. Alternatively, for each target high-frequency sentence element, the evaluation levels of the comment texts in which the element is located may be statistically analyzed according to a preset rule to determine the associated importance quantization value. Specifically, calculating the importance quantization value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which it is located includes: determining the evaluation score of the target high-frequency sentence element in each target sentence to which it belongs, according to the evaluation level of each comment text in which the element is located and the association between evaluation levels and evaluation scores; calculating, for each target high-frequency sentence element, the average of its evaluation scores over the target sentences to which it belongs; and using the average as the importance quantization value associated with that element.
Here, the evaluation score refers to the score corresponding to the evaluation level. Evaluation levels and evaluation scores are in one-to-one correspondence. Specifically, if the evaluation level is very non-approved (Strong Disagree), the evaluation score may be 1; if the evaluation level is non-approved (Disagree), the evaluation score may be 2; if the evaluation level is general (Neither), the evaluation score may be 3; if the evaluation level is very approved (Strong Agree), the evaluation score may be 4; if the evaluation level is approved (Agree), the evaluation score may be 5.
Illustratively, suppose the target high-frequency sentence element A appears twice in comment text B, in target sentence C and target sentence D respectively, and once in comment text E, in target sentence F. If the evaluation level of comment text B is non-approved (evaluation score 2) and the evaluation level of comment text E is very approved (evaluation score 4), then the average score of element A over the target sentences to which it belongs (target sentences C, D, and F) is (2 + 2 + 4)/3, so the importance quantization value of target high-frequency sentence element A is 8/3.
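The averaging rule and the worked example above can be reproduced with a short sketch. The level-to-score mapping is copied from the description as given; the function name `importance_value` and the input layout (one evaluation level per target sentence containing the element) are assumptions made for illustration.

```python
from statistics import mean

# Level-to-score association as listed in the description.
LEVEL_TO_SCORE = {
    "Strong Disagree": 1,
    "Disagree": 2,
    "Neither": 3,
    "Strong Agree": 4,
    "Agree": 5,
}


def importance_value(levels_per_sentence):
    """Average the evaluation scores over the target sentences of one element.

    levels_per_sentence holds one evaluation level per target sentence that
    contains the element; a comment text with two such sentences contributes
    its level twice.
    """
    return mean(LEVEL_TO_SCORE[level] for level in levels_per_sentence)


# Element A: sentences C and D lie in comment text B (Disagree),
# sentence F lies in comment text E (Strong Agree).
value_a = importance_value(["Disagree", "Disagree", "Strong Agree"])
```

Because the average is taken over target sentences rather than over comment texts, a comment text that mentions the element several times weighs more heavily in the result.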
And S104, visually displaying the text processing result according to the importance quantization value, the theme type of each target high-frequency sentence element and the word frequency of each target high-frequency sentence element.
The theme type refers to the theme direction represented by the semantics of the target high-frequency sentence element. The theme type may be an experience theme, an automobile hardware theme, a product display theme, or an automobile color theme. The sentence elements of the experience theme may be, for example, "stability", "comfort", "controllability", and the like. The sentence elements of the automobile hardware theme may be, for example, "door", "window", "microphone", "display", and the like. The sentence elements of the product display theme may be, for example, "pressure", "traction", "strut", and the like. The sentence elements of the automobile color theme may be, for example, "white", "black", and the like. The word frequency of a target high-frequency sentence element refers to the total number of times the target high-frequency sentence element appears in all the comment texts.
Optionally, the importance quantization value, the theme type and the word frequency of each target high-frequency sentence element can be used as a text processing result for visual display according to a preset display mode; the target high-frequency sentence elements can also be screened according to the importance quantization value, the topic type and the word frequency of the target high-frequency sentence elements, and the importance quantization value, the topic type and the word frequency of the target high-frequency sentence elements meeting preset screening conditions are used as text processing results to be visually displayed, namely the text processing results are visually displayed according to the importance quantization value, the topic type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element.
It should be noted that processing and visually displaying the comment text makes it convenient for relevant personnel to evaluate the user's experience and improve product quality in a targeted manner, thereby improving both the user experience and the perceived product quality.
According to the technical scheme, the comment text is divided into at least two target sentences, the target high-frequency sentence elements are determined according to the word frequency of each sentence element in each target sentence appearing in all the comment texts and the similarity between the sentence elements, the importance quantization value associated with each target high-frequency sentence element is calculated according to the evaluation level of the comment text in which each target high-frequency sentence element is located, and the text processing result is visually displayed according to the importance quantization value, the theme type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element. By the mode, the high-frequency sentence elements can be extracted from the comment text, and the text processing result is visually displayed based on the related information of the high-frequency sentence elements, so that the comment text is more accurately and effectively processed, more accurate effective information can be extracted and visualized, and the subsequent improvement of the product quality is facilitated.
Optionally, before splitting the comment text, the obtained comment text may be preprocessed. Specifically, the preprocessing operation includes: deleting the brackets in the comment text together with the words they contain, and deleting special symbols and preset sensitive words.
It should be noted that the sentence in the parentheses is deleted because the sentence in the parentheses is often an explanatory supplement for the comment information and has no practical significance, and the deletion of the special symbol and the preset sensitive word facilitates subsequent identification of the subject of each target high-frequency sentence element more accurately, and improves the efficiency of text processing.
Optionally, if the comment text is in English, the preprocessing of the comment text may further include: converting all English letters in the comment text to lowercase.
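The preprocessing steps can be sketched as below. The sensitive-word list, the choice of which characters count as special symbols, and the function name `preprocess` are all assumptions; the description does not enumerate them.

```python
import re

# Hypothetical sensitive-word list; the description does not provide one.
SENSITIVE_WORDS = {"darn"}


def preprocess(comment_text, lowercase=True):
    """Drop bracketed text, special symbols, and sensitive words."""
    # Remove round brackets together with the words they contain.
    text = re.sub(r"\([^()]*\)", " ", comment_text)
    # Treat everything except word characters, whitespace, and basic
    # sentence punctuation as a special symbol (an assumption).
    text = re.sub(r"[^\w\s.?!,']", " ", text)
    if lowercase:
        text = text.lower()
    kept = [
        word for word in text.split()
        if word.strip(".,?!'").lower() not in SENSITIVE_WORDS
    ]
    return " ".join(kept)


cleaned = preprocess("The Wiper (front one) is darn noisy #fail!")
```

Sentence-ending punctuation is deliberately kept so that the text can still be split into target sentences afterwards.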
Example two
Fig. 2 is a flowchart of a text processing method according to a second embodiment of the present invention, and this embodiment further explains in detail "determining target high-frequency sentence elements according to word frequencies of the sentence elements in each target sentence appearing in all comment texts and similarities between the sentence elements" based on the foregoing embodiments, and as shown in fig. 2, the method includes:
S201, splitting the comment text into at least two target sentences.
S202, determining the sentence elements in each target sentence, and counting the word frequency of each sentence element in all the comment texts.
Optionally, the nouns and/or phrases in the target sentences may be extracted according to a preset matching rule, that is, the sentence elements in the target sentences are determined.
Optionally, after determining the sentence elements of all target sentences of all the comment texts, counting, for each sentence element, the number of times that the sentence element appears in all the target sentences of all the comment texts, as the word frequency of the sentence element, that is, counting the word frequency that each sentence element appears in all the comment texts.
S203, according to the size relation between the word frequency of each sentence element and a preset word frequency threshold, determining alternative high-frequency sentence elements from the sentence elements in each target sentence.
The preset word frequency threshold refers to a preset threshold for measuring the word frequency of a sentence element; for example, the preset word frequency threshold may be 50. The alternative high-frequency sentence elements refer to the sentence elements whose word frequency satisfies the required size relation with the preset word frequency threshold.
Optionally, after the word frequency of each sentence element is determined, the word frequency may be compared with the preset word frequency threshold, and if the word frequency of a sentence element is greater than the preset word frequency threshold, the sentence element is determined to be an alternative high-frequency sentence element.
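Steps S202 and S203 can be sketched as a count followed by a threshold filter. The sketch assumes the sentence elements of each target sentence have already been extracted; the function names and the toy threshold of 1 (instead of the example value 50) are illustrative only.

```python
from collections import Counter


def count_word_frequency(elements_per_sentence):
    """Count how often each sentence element occurs across all target sentences."""
    counts = Counter()
    for elements in elements_per_sentence:
        counts.update(elements)
    return counts


def alternative_high_frequency(counts, threshold):
    """Keep the elements whose word frequency is greater than the threshold."""
    return {element for element, n in counts.items() if n > threshold}


# One list of extracted sentence elements per target sentence (toy data).
elements_per_sentence = [
    ["wiper", "window"],
    ["wiper"],
    ["mirror", "wiper"],
    ["window"],
]
counts = count_word_frequency(elements_per_sentence)
alternatives = alternative_high_frequency(counts, threshold=1)
```

The comparison is strict, matching the "greater than the preset word frequency threshold" wording above.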
And S204, determining the target high-frequency sentence elements from the alternative high-frequency sentence elements according to the similarity between the alternative high-frequency sentence elements.
The similarity refers to the similarity of semantic features represented by the alternative high-frequency sentence elements. The target high-frequency sentence elements refer to high-frequency sentence elements which meet preset similarity screening conditions in the alternative high-frequency sentence elements.
Optionally, the similarity between each alternative high-frequency sentence element and every other alternative high-frequency sentence element may be input directly into a pre-trained model, which outputs the target high-frequency sentence elements. Alternatively, the similarity between the alternative high-frequency sentence elements may be analyzed based on a preset similarity screening condition to determine the target high-frequency sentence elements from the alternative high-frequency sentence elements. Specifically, determining the target high-frequency sentence elements from the alternative high-frequency sentence elements according to the similarity between them includes: determining the similarity between at least two alternative high-frequency sentence elements, and if the similarity is higher than a preset similarity threshold, dividing the at least two alternative high-frequency sentence elements into one group; and determining the target high-frequency sentence elements from each group according to the word frequency with which each alternative high-frequency sentence element in the group appears in all comment texts.
The preset similarity threshold is a preset threshold for measuring semantic similarity between alternative high-frequency sentence elements. And determining at least one target high-frequency sentence element from each group of alternative high-frequency sentence elements.
Illustratively, the similarity between the alternative high-frequency sentence element "pattern" and the alternative high-frequency sentence element "style" is higher than the preset similarity threshold, so these two elements are divided into one group; likewise, the similarity between the alternative high-frequency sentence element "booth" and the alternative high-frequency sentence element "materials" is higher than the preset similarity threshold, so these two elements are divided into another group.
For example, if the similarity between alternative high-frequency sentence element A and alternative high-frequency sentence element B is 0.4, the similarity between alternative high-frequency sentence element C and alternative high-frequency sentence element D is 0.7, and the similarity threshold is 0.5, it may be determined that elements C and D satisfy the preset similarity screening condition, and elements C and D are divided into one group. Further, if the word frequency of element C is 50 and the word frequency of element D is 70, it may be determined that the target high-frequency sentence element of this group is alternative high-frequency sentence element D.
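The grouping and selection in S204 can be sketched as follows, reusing the numbers from the example above. Two points are assumptions rather than statements of the description: elements that are not similar to any other element are kept as single-member groups, and the word frequencies given for A and B are invented for illustration. Pairwise similarities are taken as precomputed inputs.

```python
def group_by_similarity(elements, similarity, threshold):
    """Merge elements whose pairwise similarity exceeds the threshold."""
    parent = {element: element for element in elements}

    def find(element):
        # Follow parent links to the group representative (with path halving).
        while parent[element] != element:
            parent[element] = parent[parent[element]]
            element = parent[element]
        return element

    for (a, b), score in similarity.items():
        if score > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for element in elements:
        groups.setdefault(find(element), []).append(element)
    return list(groups.values())


def pick_targets(groups, word_freq):
    """From each group, keep the element with the highest word frequency."""
    return [max(group, key=lambda e: word_freq[e]) for group in groups]


elements = ["A", "B", "C", "D"]
similarity = {("A", "B"): 0.4, ("C", "D"): 0.7}
word_freq = {"A": 60, "B": 55, "C": 50, "D": 70}  # A and B values are hypothetical

groups = group_by_similarity(elements, similarity, threshold=0.5)
targets = pick_targets(groups, word_freq)
```

With these inputs, C and D end up in one group and D is selected from it, as in the example; how ungrouped elements such as A and B should be treated is not specified in the description.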
And S205, calculating the importance quantitative value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is positioned.
And S206, visually displaying the text processing result according to the importance quantization value, the theme type of each target high-frequency sentence element and the word frequency of each target high-frequency sentence element.
According to the technical scheme of this embodiment of the invention, the sentence elements in each target sentence are determined and the word frequency of each sentence element in all comment texts is counted; alternative high-frequency sentence elements are determined from the sentence elements in each target sentence according to the magnitude relationship between the word frequency of each sentence element and a preset word frequency threshold; target high-frequency sentence elements are determined from the alternative high-frequency sentence elements according to the similarity between the alternative high-frequency sentence elements; and finally the importance quantization values associated with the target high-frequency sentence elements are determined and visually displayed. In this way, an implementable way of determining the target high-frequency sentence elements from the target sentences is provided, and more accurate and effective high-frequency sentence elements can be determined.
Optionally, determining the topic type to which each target high-frequency sentence element belongs includes: clustering the target sentences containing the target high-frequency sentence elements by using a clustering algorithm based on the semantic similarity between the target sentences in which the target high-frequency sentence elements are located; and determining the topic type of each target high-frequency sentence element according to the clustering result.
The clustering algorithm may be the commonly used K-means clustering algorithm, also called the K-means algorithm. The topic types include: experience themes, automotive hardware themes, product display themes, and automotive color themes.
Optionally, the semantic features of the target sentences to which the target high-frequency sentence elements belong may be determined first, and the semantic similarity between the target sentences may then be determined. The clustering algorithm groups semantically similar target sentences into one category, so that all target sentences are finally clustered into four categories: an experience theme, an automobile hardware theme, a product display theme, and an automobile color theme.
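As a rough illustration of the clustering step, the sketch below runs a plain K-means over bag-of-words vectors. Several points are assumptions made only for this sketch: word overlap stands in for the semantic features of the target sentences, the first k vectors seed the centroids so that the result is deterministic, and k is set to 2 for brevity where the text uses four topic categories. The sample sentences are hypothetical, and how a cluster is mapped to a named topic type is not covered here.

```python
from collections import Counter
import math


def vectorize(sentences):
    """Bag-of-words vectors over the shared vocabulary (a stand-in for semantic features)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def kmeans(vectors, k, iterations=20):
    """Plain K-means with Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical target sentences for illustration only.
sentences = [
    "the seat is comfortable",
    "the paint color is bright",
    "the seat feels comfortable",
    "the red color is bright",
]
labels = kmeans(vectorize(sentences), k=2)
print(labels)
```

The two sentences about the seat receive one label and the two about color receive the other. A practical system would likely replace the bag-of-words vectors with sentence embeddings and set k to the number of topic types.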
Example three
Fig. 3 is a flowchart of a text processing method according to a third embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment further explains in detail the step of "visually displaying a text processing result according to an importance quantization value, a topic type to which each target high-frequency sentence element belongs, and a word frequency of each target high-frequency sentence element". As shown in Fig. 3, the method includes:
S301, dividing the comment text into at least two target sentences.
S302, determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements.
S303, calculating an importance quantization value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is located.
S304, obtaining the target theme type to be visually displayed, and screening out the target high-frequency sentence elements corresponding to the target theme type from the target high-frequency sentence elements.
The target theme type refers to a theme type that is specified by related personnel and needs to be visually displayed. The target theme type may be an experience theme, an automobile hardware theme, a product display theme, or an automobile color theme.
Optionally, the topic type to which each target high-frequency sentence element belongs may be obtained, and the target high-frequency sentence element whose topic type is the target topic type is determined, that is, the target high-frequency sentence element corresponding to the target topic type is screened from the target high-frequency sentence elements.
S305, sorting the target high-frequency sentence elements corresponding to the target topic type according to their importance quantization values, and taking the word frequency of each target high-frequency sentence element as its associated label information.
Optionally, the importance quantization values of the target high-frequency sentence elements corresponding to the target topic type may be determined, and those elements may be sorted in descending order of importance quantization value, thereby determining the rank of each element. The word frequency of each target high-frequency sentence element corresponding to the target topic type in all comment texts is then counted and taken as the associated label information of that element.
S306, visually displaying the text processing result corresponding to the target theme type to be visually displayed according to the sorting result and the label information.
The sorting result refers to the ranks obtained after sorting all target high-frequency sentence elements corresponding to the target topic type.
Optionally, the target high-frequency sentence elements corresponding to the target topic type may be visually displayed in order of their ranks in the sorting result, with the label information of each element displayed alongside it in the form of a label. In this way, the text processing result corresponding to the target topic type to be visually displayed is presented.
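Steps S304 to S306 can be sketched as a filter, a descending sort, and the attachment of a label. The sketch is illustrative only: the `Element` record, the function name `rank_for_topic`, the label wording, and all sample values are hypothetical, and actual rendering (for example as a chart) is left out.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str
    topic: str          # topic type the element belongs to
    importance: float   # importance quantization value
    word_freq: int      # occurrences across all comment texts


def rank_for_topic(elements, target_topic):
    """Return (rank, text, label) rows for one topic, highest importance first."""
    selected = [e for e in elements if e.topic == target_topic]
    selected.sort(key=lambda e: e.importance, reverse=True)
    return [
        (rank, e.text, f"word frequency: {e.word_freq}")
        for rank, e in enumerate(selected, start=1)
    ]


# Hypothetical data for illustration only.
elements = [
    Element("seat", "automobile hardware", 3.2, 120),
    Element("engine", "automobile hardware", 4.5, 95),
    Element("red paint", "automobile color", 4.1, 60),
]
rows = rank_for_topic(elements, "automobile hardware")
for row in rows:
    print(row)
```

Here "engine" is ranked first because its importance quantization value is higher, even though "seat" has the higher word frequency; the word frequency only appears as label information.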
According to the technical scheme of this embodiment of the invention, the target theme type to be visually displayed is obtained, the target high-frequency sentence elements corresponding to the target theme type are screened out from the target high-frequency sentence elements, these elements are sorted according to their importance quantization values, the word frequency of each target high-frequency sentence element is taken as its associated label information, and finally the text processing result corresponding to the target theme type to be visually displayed is visually displayed according to the sorting result and the label information. In this way, an implementable way of visually displaying the text processing result is provided, so that the result can be presented more clearly and related personnel can obtain information conveniently.
Optionally, after the comment text on which the replacement operation has been performed is split into at least two target sentences, the method further includes: performing semantic analysis on each target sentence to determine whether its semantics are fluent; if not, analyzing the context of the comment text in which the target sentence is located and identifying the misspelled word in the target sentence; and if the misspelled word is determined to be a preset common error-form word, correcting it according to the correct form of that common error-form word.
Optionally, if the semantics of a target sentence are determined not to be fluent, the target sentence may contain a misspelled word. In this case, the context of the comment text in which the target sentence is located may be analyzed to infer which word is misspelled, and the misspelled word is compared against a pre-stored lexicon of common error-form words. If the misspelled word exists in the lexicon, it is corrected to the correct form of the corresponding common error-form word, that is, the misspelled word is corrected according to the correct form of the common error-form word. If not, the misspelled word is corrected by a preset spelling corrector (such as the FAROO spelling corrector).
Optionally, the target sentence may also be directly analyzed to determine whether the target sentence contains misspelled words; if yes, the misspelled word is corrected according to a preset spelling corrector (such as a FAROO spelling corrector).
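The two-stage correction described above (common error-form lexicon first, generic spelling corrector otherwise) can be sketched as follows. This is a simplified sketch under stated assumptions: the lexicon `COMMON_ERRORS` and the word list `VOCABULARY` are hypothetical, the semantic fluency analysis and context inference are replaced by a simple out-of-vocabulary check, and Python's standard `difflib.get_close_matches` stands in for a dedicated spelling corrector such as the FAROO spelling corrector.

```python
import difflib

# Hypothetical lexicon of common error forms and their correct forms.
COMMON_ERRORS = {"enigne": "engine", "seet": "seat"}
# Hypothetical vocabulary used by the stand-in fallback corrector.
VOCABULARY = ["engine", "seat", "color", "comfortable", "the", "is", "very"]


def correct_word(word):
    """Correct one word: common-error lexicon first, then a generic fallback."""
    if word in VOCABULARY:
        return word
    if word in COMMON_ERRORS:
        return COMMON_ERRORS[word]
    # Fallback: difflib stands in for a dedicated spelling corrector.
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.6)
    return matches[0] if matches else word


def correct_sentence(sentence):
    """Apply correct_word to every whitespace-separated word of the sentence."""
    return " ".join(correct_word(w) for w in sentence.split())


fixed = correct_sentence("the enigne is verry comfortabel")
print(fixed)
```

In this sample, "enigne" is resolved through the lexicon, while "verry" and "comfortabel" fall through to the generic corrector. A word with no sufficiently close match is left unchanged.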
Example four
FIG. 4 is a block diagram of a text processing apparatus according to a fourth embodiment of the present invention. The text processing apparatus provided by the embodiment of the invention can execute the text processing method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
As shown in fig. 4, the apparatus includes:
a splitting module 401, configured to split the comment text into at least two target sentences; the comment text is a text for commenting the relevant performance of the automobiles of the preset ranking automobile types;
a determining module 402, configured to determine a target high-frequency sentence element according to a word frequency of each sentence element in each target sentence appearing in all comment texts and a similarity between the sentence elements; wherein, the sentence elements are nouns and/or phrases;
a calculating module 403, configured to calculate an importance quantization value associated with each target high-frequency sentence element according to an evaluation level of the comment text in which each target high-frequency sentence element is located;
a visualization module 404, configured to visually display a text processing result according to the importance quantization value, the topic type to which each target high-frequency sentence element belongs, and the word frequency of each target high-frequency sentence element.
According to the technical scheme, the comment text is divided into at least two target sentences, the target high-frequency sentence elements are determined according to the word frequency of each sentence element in each target sentence appearing in all the comment texts and the similarity between the sentence elements, the importance quantization value associated with each target high-frequency sentence element is calculated according to the evaluation level of the comment text in which each target high-frequency sentence element is located, and the text processing result is visually displayed according to the importance quantization value, the theme type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element. By the mode, the high-frequency sentence elements can be extracted from the comment text, and the text processing result is visually displayed based on the related information of the high-frequency sentence elements, so that the comment text is more accurately and effectively processed, more accurate effective information can be extracted and visualized, and the subsequent improvement of the product quality is facilitated.
Further, the determining module 402 may include:
a first determining unit, configured to determine the sentence elements in each target sentence and count the word frequency of each sentence element in all comment texts;
a second determining unit, configured to determine alternative high-frequency sentence elements from the sentence elements in each target sentence according to the magnitude relationship between the word frequency of each sentence element and a preset word frequency threshold;
a third determining unit, configured to determine target high-frequency sentence elements from the alternative high-frequency sentence elements according to the similarity between the alternative high-frequency sentence elements.
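The work of the first and second determining units, counting word frequencies and screening against the preset word frequency threshold, can be sketched as follows. The sketch assumes that the sentence elements (nouns and/or phrases) have already been extracted from each target sentence, and it treats "at or above the threshold" as the screening condition; the text only refers to the magnitude relationship, so the exact comparison is an assumption. The function name and the sample data are hypothetical.

```python
from collections import Counter


def candidate_elements(sentence_elements, freq_threshold):
    """Count each element across all comment texts and keep those at or above the threshold."""
    counts = Counter(e for elems in sentence_elements for e in elems)
    return {e: n for e, n in counts.items() if n >= freq_threshold}


# Hypothetical sentence elements, one inner list per target sentence.
sentence_elements = [
    ["seat", "engine"],
    ["seat", "color"],
    ["seat", "engine", "booth"],
]
candidates = candidate_elements(sentence_elements, freq_threshold=2)
print(candidates)
```

With a threshold of 2, "seat" (3 occurrences) and "engine" (2 occurrences) become alternative high-frequency sentence elements, while "color" and "booth" are screened out.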
Further, the third determining unit is specifically configured to:
determining the similarity between at least two alternative high-frequency sentence elements, and if the similarity is higher than a preset similarity threshold, dividing the at least two alternative high-frequency sentence elements into a group;
and determining the target high-frequency sentence elements from each group of alternative high-frequency sentence elements according to the word frequency of each alternative high-frequency sentence element in each group of alternative high-frequency sentence elements in all comment texts.
Further, the calculating module 403 is specifically configured to:
determining the evaluation score of each target high-frequency sentence element in each target sentence to which it belongs, according to the evaluation level of each comment text in which the target high-frequency sentence element is located and the association between evaluation levels and evaluation scores;
for each target high-frequency sentence element, calculating the average value of its evaluation scores over the target sentences to which it belongs;
and taking the average value as an importance quantization value associated with each target high-frequency sentence element.
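The calculation performed by the calculating module can be sketched as a lookup followed by an average. The mapping `LEVEL_TO_SCORE`, the level names, and the sample occurrences are hypothetical; the sketch assumes one evaluation level per comment text and one score per occurrence of an element in a target sentence.

```python
from statistics import mean

# Hypothetical association between evaluation levels and evaluation scores.
LEVEL_TO_SCORE = {"poor": 1, "average": 3, "good": 5}


def importance_values(occurrences):
    """occurrences: (element, evaluation level of the comment text containing it) pairs.

    Returns {element: average evaluation score}, used as the importance quantization value.
    """
    scores = {}
    for element, level in occurrences:
        scores.setdefault(element, []).append(LEVEL_TO_SCORE[level])
    return {element: mean(vals) for element, vals in scores.items()}


# Hypothetical occurrences for illustration only.
occurrences = [
    ("seat", "good"),
    ("seat", "average"),
    ("engine", "poor"),
    ("seat", "good"),
]
values = importance_values(occurrences)
print(values)
```

For "seat" the scores are 5, 3 and 5, so the importance quantization value is their average, 13/3; "engine" occurs once and keeps its single score of 1.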
Further, the above apparatus further comprises:
the theme type determining module is used for clustering the target sentences containing the target high-frequency sentence elements by utilizing a clustering algorithm based on the semantic similarity between the target sentences in which the target high-frequency sentence elements are positioned;
determining the theme type of each target high-frequency sentence element according to the clustering result; the theme types include: experience themes, automotive hardware themes, product display themes, and automotive color themes.
Further, the above apparatus further comprises:
the preprocessing module is configured to, after the comment text on which the replacement operation has been performed is split into at least two target sentences, perform semantic analysis on each target sentence to determine whether its semantics are fluent;
if not, analyze the context of the comment text in which the target sentence is located, and identify the misspelled word in the target sentence;
and if the misspelled word is determined to be a preset common error-form word, correct the misspelled word according to the correct form of the common error-form word.
Further, the visualization module 404 is specifically configured to:
acquiring a target theme type to be visually displayed, and screening out a target high-frequency sentence element corresponding to the target theme type from the target high-frequency sentence elements;
sequencing the target high-frequency sentence elements according to the importance quantization values of the target high-frequency sentence elements corresponding to the target theme types, and taking the word frequency of each target high-frequency sentence element as the associated label information;
and visually displaying the text processing result corresponding to the target theme type to be visually displayed according to the sequencing result and the label information.
It should be noted that, in the technical solution of the present application, the obtaining, storing, using, processing, and the like of the comment text and the related data information thereof all conform to the relevant regulations of the national laws and regulations.
Example five
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as a text processing method.
In some embodiments, the text processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the text processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the text processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of text processing, comprising:
splitting the comment text into at least two target sentences; the comment text is a text for commenting the relevant performance of the automobiles of the preset ranking automobile types;
determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements; wherein, the sentence elements are nouns and/or phrases;
calculating an importance quantitative value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is located;
and visually displaying the text processing result according to the importance quantization value, the theme type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element.
2. The method according to claim 1, wherein determining the target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all the comment texts and the similarity between the sentence elements comprises:
determining sentence elements in each target sentence, and counting word frequencies of the sentence elements in all comment texts;
determining alternative high-frequency sentence elements from the sentence elements in the target sentences according to the size relation between the word frequency of each sentence element and a preset word frequency threshold;
and determining a target high-frequency sentence element from the alternative high-frequency sentence elements according to the similarity between the alternative high-frequency sentence elements.
3. The method according to claim 2, wherein determining a target high-frequency sentence element from the alternative high-frequency sentence elements based on the similarity between the alternative high-frequency sentence elements comprises:
determining the similarity between at least two alternative high-frequency sentence elements, and if the similarity is higher than a preset similarity threshold, dividing the at least two alternative high-frequency sentence elements into a group;
and determining the target high-frequency sentence elements from each group of alternative high-frequency sentence elements according to the word frequency of each alternative high-frequency sentence element in each group of alternative high-frequency sentence elements in all the comment texts.
4. The method according to claim 1, wherein calculating the importance quantification value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which the target high-frequency sentence element is located comprises:
determining the evaluation score of each target high-frequency sentence element in each target sentence to which it belongs, according to the evaluation level of each comment text in which the target high-frequency sentence element is located and the association between evaluation levels and evaluation scores;
for each target high-frequency sentence element, calculating the average value of its evaluation scores over the target sentences to which it belongs;
and taking the average value as an importance quantization value associated with each target high-frequency sentence element.
5. The method according to claim 1, wherein determining a topic type to which each target high-frequency sentence element belongs comprises:
clustering the target sentences containing the target high-frequency sentence elements by using a clustering algorithm based on the semantic similarity between the target sentences in which the target high-frequency sentence elements are located;
determining the theme type of each target high-frequency sentence element according to the clustering result; the theme types include: experience themes, automotive hardware themes, product display themes, and automotive color themes.
6. The method according to any one of claims 1 to 5, wherein after splitting the comment text after performing the replacement operation into at least two target sentences, further comprising:
performing semantic analysis on each target sentence to determine whether the semantics of each target sentence are smooth or not;
if not, analyzing the context of the comment text where the target sentence is located, and determining the words with misspelling from the target sentence;
and if the word with misspelling is determined to belong to a preset common error form word, correcting the word with misspelling according to the correct form of the common error form word.
7. The method according to claim 1, wherein the visually displaying the text processing result according to the importance quantization value, the topic type to which each target high-frequency sentence element belongs, and the word frequency of each target high-frequency sentence element comprises:
acquiring a target theme type to be visually displayed, and screening out target high-frequency sentence elements corresponding to the target theme type from the target high-frequency sentence elements;
sequencing the target high-frequency sentence elements according to the importance quantization values of the target high-frequency sentence elements corresponding to the target theme types, and taking the word frequency of each target high-frequency sentence element as the associated label information;
and visually displaying the text processing result corresponding to the target theme type to be visually displayed according to the sequencing result and the label information.
8. A text processing apparatus, comprising:
the splitting module is used for splitting the comment text into at least two target sentences; the comment text is a text for commenting the relevant performance of the automobiles of the preset ranking automobile types;
the determining module is used for determining target high-frequency sentence elements according to the word frequency of each sentence element in each target sentence appearing in all comment texts and the similarity between the sentence elements; wherein, the sentence elements are nouns and/or phrases;
the calculation module is used for calculating the importance quantization value associated with each target high-frequency sentence element according to the evaluation level of the comment text in which each target high-frequency sentence element is located;
and the visualization module is used for visually displaying the text processing result according to the importance quantization value, the theme type to which each target high-frequency sentence element belongs and the word frequency of each target high-frequency sentence element.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the text processing method of any of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to perform the text processing method of any one of claims 1-7.
CN202210978990.0A 2022-08-16 2022-08-16 Text processing method, device, equipment and medium Pending CN115309867A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210978990.0A CN115309867A (en) 2022-08-16 2022-08-16 Text processing method, device, equipment and medium
PCT/CN2023/112841 WO2024037483A1 (en) 2022-08-16 2023-08-14 Text processing method and apparatus, and electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210978990.0A CN115309867A (en) 2022-08-16 2022-08-16 Text processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115309867A true CN115309867A (en) 2022-11-08

Family

ID=83861957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210978990.0A Pending CN115309867A (en) 2022-08-16 2022-08-16 Text processing method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN115309867A (en)
WO (1) WO2024037483A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024037483A1 (en) * 2022-08-16 2024-02-22 中国第一汽车股份有限公司 Text processing method and apparatus, and electronic device and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200020000A1 (en) * 2018-07-16 2020-01-16 Ebay Inc. Generating product descriptions from user reviews
CN109408809A (en) * 2018-09-25 2019-03-01 天津大学 A kind of sentiment analysis method for automobile product comment based on term vector
CN110175325B (en) * 2019-04-26 2023-07-11 南京邮电大学 Comment analysis method based on word vector and syntactic characteristics and visual interaction interface
CN115309867A (en) * 2022-08-16 2022-11-08 中国第一汽车股份有限公司 Text processing method, device, equipment and medium

Also Published As

Publication number Publication date
WO2024037483A1 (en) 2024-02-22

Similar Documents

Publication Publication Date Title
CN112541070A (en) Method and device for excavating slot position updating corpus, electronic equipment and storage medium
CN112560461A (en) News clue generation method and device, electronic equipment and storage medium
CN112989235A (en) Knowledge base-based internal link construction method, device, equipment and storage medium
WO2024037483A1 (en) Text processing method and apparatus, and electronic device and medium
CN115409039A (en) Standard vehicle type data analysis method and device, electronic equipment and medium
CN114625834A (en) Enterprise industry information determination method and device and electronic equipment
CN117216275A (en) Text processing method, device, equipment and storage medium
CN112560425A (en) Template generation method and device, electronic equipment and storage medium
CN112632982A (en) Dialogue text emotion analysis method capable of being used for supplier evaluation
CN114662469B (en) Emotion analysis method and device, electronic equipment and storage medium
CN115563242A (en) Automobile information screening method and device, electronic equipment and storage medium
KR20220024251A (en) Method and apparatus for building event library, electronic device, and computer-readable medium
CN114676699A (en) Entity emotion analysis method and device, computer equipment and storage medium
CN114647727A (en) Model training method, device and equipment applied to entity information recognition
CN114201953A (en) Keyword extraction and model training method, device, equipment and storage medium
CN113641724A (en) Knowledge tag mining method and device, electronic equipment and storage medium
CN112989805A (en) Text detection method, device, equipment and storage medium
CN115309868A (en) Text processing method, device, equipment and medium
CN114186552B (en) Text analysis method, device and equipment and computer storage medium
CN115525733A (en) Text processing method, device, equipment and medium
US11907668B2 (en) Method for selecting annotated sample, apparatus, electronic device and storage medium
CN115545011A (en) Automobile comment type determination method, device, equipment and storage medium
CN115408496A (en) Method, device, equipment and storage medium for determining attention difference of each function of automobile
CN115828925A (en) Text selection method and device, electronic equipment and readable storage medium
CN115129874A (en) Text information processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination