CN114064895A - Method, device, equipment and medium for discovering new user suggestions in real time - Google Patents

Info

Publication number
CN114064895A
Authority
CN
China
Prior art keywords
cluster center
tested
clustering
stored
text
Legal status
Granted
Application number
CN202111356575.3A
Other languages
Chinese (zh)
Other versions
CN114064895B (en)
Inventor
李赟扬
叶永龙
刘宝强
Current Assignee
Shenzhen Skieer Information Technology Co ltd
Original Assignee
Shenzhen Skieer Information Technology Co ltd
Application filed by Shenzhen Skieer Information Technology Co ltd
Priority to CN202111356575.3A
Publication of CN114064895A
Application granted
Publication of CN114064895B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method, a device, equipment and a medium for discovering new user suggestions in real time, in the technical field of data processing. The method comprises: extracting suggestion text segments to be clustered from user comment text data according to preset sentence pattern rules; clustering the suggestion text segments to obtain a plurality of cluster centers to be tested; judging, for each cluster center to be tested, whether it is similar to the pre-stored cluster centers; and, if the current cluster center to be tested is not similar to the pre-stored cluster centers, creating a new cluster center class among the pre-stored cluster centers and determining that a new user suggestion has been detected. Because only the newly extracted suggestion text segments are clustered, and their cluster centers are compared with the pre-stored cluster centers to decide whether the corresponding segments are new user suggestions, new user suggestions are discovered more efficiently.

Description

Method, device, equipment and medium for discovering new user suggestions in real time
Technical Field
The invention relates to the technical field of data processing, in particular to a method, a device, equipment and a medium for discovering new user suggestions in real time.
Background
With the continuous development of internet technology, more and more people prefer to shop on e-commerce platforms, which makes daily life much more convenient. After a purchase, users often post their shopping experience, their impressions of the product and their suggestions about it on the e-commerce platform. These feedback texts carry users' problems, suggestions and attitudes, and are therefore highly valuable for evaluating, improving and optimizing products. A brand can collect users' suggestions about its products from this feedback, study their feasibility, applicability and profit potential, and use them to make better products.
In the related art, user suggestions about a product are extracted from user reviews, most commonly by clustering the text data directly. Because the full set of texts is clustered again every time new suggestions arrive, generating the clustering result takes a long time, and so does discovering new user suggestions.
Disclosure of Invention
The invention provides a method, a device, equipment and a medium for discovering new user suggestions in real time, to solve the prior-art problem that clustering the full set of texts makes both the generation of clustering results and the discovery of new user suggestions slow.
In order to solve the problems, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for discovering new user suggestions in real time, comprising:
acquiring user comment text data and a rule file, wherein the rule file comprises preset sentence pattern rules;
extracting suggestion text segments to be clustered from the user comment text data according to the preset sentence pattern rules;
clustering the plurality of suggestion text segments to be clustered to obtain a plurality of cluster centers to be tested;
judging, for each of the plurality of cluster centers to be tested, whether it is similar to the pre-stored cluster centers;
if the current cluster center to be tested is not similar to the pre-stored cluster centers, creating a new cluster center class among the pre-stored cluster centers, the new class being the current cluster center to be tested, and determining that a new user suggestion has been detected;
if the current cluster center to be tested is similar to a pre-stored cluster center, merging the cluster center to be tested into that pre-stored cluster center;
judging whether the current cluster center to be tested is the last of the plurality of cluster centers to be tested;
if not, returning to the step of judging whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers.
In a further technical solution, before the user comment text data and the rule file are acquired, the method further comprises:
performing data cleaning on the user comment text data to filter out noise data.
In a further technical solution, clustering the plurality of suggestion text segments to be clustered to obtain a plurality of cluster centers to be tested comprises:
preprocessing the suggestion text segments to be clustered to obtain a plurality of text feature words;
converting the text feature words one by one into corresponding word vectors to obtain a plurality of word vectors;
superposing the word vectors to obtain a text vector;
clustering the plurality of text vectors with a preset clustering algorithm to obtain the plurality of cluster centers to be tested and the cluster number of each text vector.
In a further technical solution, each cluster center to be tested comprises a cluster center vector and each pre-stored cluster center comprises a pre-stored cluster center vector, and judging whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers comprises:
acquiring the cluster center vector of the current cluster center to be tested;
judging whether the similarity between the current cluster center vector and the pre-stored cluster center vectors is smaller than a preset threshold;
if the similarity between the current cluster center vector and the pre-stored cluster center vectors is smaller than the preset threshold, determining that the current cluster center to be tested is not similar to the pre-stored cluster centers.
In a further technical solution, the suggestion text segments to be clustered contain stop words, and performing word segmentation preprocessing on the suggestion text segments to be clustered to obtain a plurality of text feature words comprises:
removing the stop words from the suggestion text segments to be clustered using a stop word dictionary.
In a further technical solution, the suggestion text segments to be clustered further contain synonyms, and performing word segmentation preprocessing on the suggestion text segments to be clustered to obtain a plurality of text feature words comprises:
replacing all synonyms in the suggestion text segments to be clustered using a synonym dictionary.
In a further technical solution, after judging whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers, the method further comprises:
if the current cluster center to be tested is the last of the plurality of cluster centers to be tested, returning to the step of acquiring the user comment text data and the rule file, so as to acquire new user comment text data.
In a second aspect, the present invention further provides a device for discovering new user suggestions in real time, comprising units for performing the method according to the first aspect.
In a third aspect, the present invention further provides an electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method of the first aspect when executing the program stored in the memory.
In a fourth aspect, the invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the invention has the following advantages:
suggestion text segments to be clustered are extracted from the user comment text data according to the preset sentence pattern rules and clustered to obtain cluster centers to be tested, and each cluster center to be tested is then compared with the pre-stored cluster centers to determine whether the corresponding suggestion text segments are new user suggestions. Since the full user comment text data does not have to be re-clustered, time is saved and new user suggestions are discovered more efficiently.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a method for discovering new user suggestions in real time according to embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the step of clustering a plurality of suggestion text segments to be clustered to obtain a plurality of cluster centers to be tested, in the method according to embodiment 1 of the present invention;
Fig. 3 is a schematic flowchart of a method for discovering new user suggestions in real time according to embodiment 2 of the present invention;
Fig. 4 is a structural block diagram of a device for discovering new user suggestions in real time according to embodiment 3 of the present invention;
Fig. 5 is a structural block diagram of a device for discovering new user suggestions in real time according to embodiment 4 of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention.
Detailed Description
To make the technical content of the present invention easier to understand, the technical solution is further described and illustrated below with reference to specific embodiments, to which it is not limited.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Example 1
Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic flowchart of a method for discovering new user suggestions in real time according to embodiment 1 of the present invention. The method can be applied to an electronic device comprising a processor, a communication interface, a memory and a communication bus, through which the processor, the communication interface and the memory communicate with one another; the device is not otherwise limited here. Specifically, as shown in Fig. 1, the method comprises the following steps S101 to S108.
S101, obtaining user comment text data and a rule file, wherein the rule file comprises a preset sentence pattern rule.
The user comment text data is the comment text posted by users, which carries their suggestions. The rule file stores the preset sentence pattern rules, which the user sets as needed by writing regular expressions or more complex regular-expression-based patterns.
S102, extracting the user comment text data according to the preset sentence pattern rule to obtain suggestion text segments to be clustered.
The suggestion text segments to be clustered are extracted from the user comment text data according to the preset sentence pattern rule. For example, if the user wants to extract texts of the form "it would be more perfect if xx" from the user comment text data, each such text is a suggestion text segment to be clustered: the processor parses the expression in the sentence pattern rule and matches the texts of the form "it would be more perfect if xx" in the user comment text data, thereby obtaining the suggestion text segments to be clustered. The preset sentence pattern rule is described by a regular expression, or a more complex regular-expression-based pattern, written by the user.
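As a rough illustration of S102, the sketch below applies regular-expression sentence pattern rules to comment texts using Python's standard re module. The rule strings and the function name extract_suggestion_segments are hypothetical: the English patterns merely stand in for the Chinese sentence patterns a real rule file would hold, and the description does not fix a rule format beyond regular expressions.

```python
import re

# Hypothetical rule-file content: one regular expression per sentence pattern rule.
# These English patterns are illustrative stand-ins, not the invention's rules.
RULES = [
    r"it would be (?:more perfect|better) if [^,.;!?]+",
    r"(?:please|hope you can) add [^,.;!?]+",
]

def extract_suggestion_segments(comments, rules):
    """Return every span of the comments that matches any sentence pattern rule."""
    compiled = [re.compile(rule, re.IGNORECASE) for rule in rules]
    segments = []
    for comment in comments:
        for pattern in compiled:
            # finditer yields all non-overlapping matches in this comment
            segments.extend(match.group(0) for match in pattern.finditer(comment))
    return segments

comments = [
    "Great phone. It would be more perfect if the battery lasted longer.",
    "Fast delivery, please add a dark mode!",
    "Works fine.",
]
segments = extract_suggestion_segments(comments, RULES)
print(segments)
```

Each match stops at the first comma or sentence-ending mark, so a comment without a matching pattern (the third one here) contributes no segment.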
S103, clustering the plurality of suggested text segments to be clustered to obtain a plurality of clustering centers to be tested.
The clustering groups the suggestion text segments to be clustered into classes and yields a plurality of cluster centers to be tested.
In an embodiment, the clustering the plurality of suggested text segments to be clustered to obtain a plurality of clustering centers to be tested includes:
S1031, preprocessing the suggestion text segments to be clustered to obtain a plurality of text feature words.
The preprocessing selects the keywords in the suggestion text segments to be clustered, yielding a plurality of text feature words.
In a specific implementation, the suggestion text segments to be clustered contain stop words, and performing word segmentation preprocessing on them to obtain a plurality of text feature words comprises:
removing the stop words from the suggestion text segments to be clustered using a stop word dictionary.
The stop word dictionary stores the entries used for removing stop words.
In a specific implementation, the suggestion text segments to be clustered further contain synonyms, and performing word segmentation preprocessing on them to obtain a plurality of text feature words comprises:
replacing all synonyms in the suggestion text segments to be clustered using a synonym dictionary.
The synonym dictionary stores the entries used for replacing the synonyms.
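A minimal sketch of the stop word removal and synonym replacement in S1031, assuming the segment has already been split into words (Chinese text would first pass through a word segmenter; whitespace splitting stands in for that here). The dictionaries and the preprocess function are hypothetical examples, not the dictionaries used by the invention.

```python
# Hypothetical dictionaries; a real system would load them from files.
STOP_WORDS = {"it", "would", "be", "more", "if", "the"}
SYNONYMS = {"cell": "battery", "accumulator": "battery"}  # variant -> canonical word

def preprocess(tokens, stop_words, synonyms):
    """Drop stop words, then map each remaining word to its canonical synonym."""
    features = []
    for token in tokens:
        word = token.lower()
        if word in stop_words:
            continue  # stop words carry no topic information
        features.append(synonyms.get(word, word))
    return features

tokens = "it would be more perfect if the cell lasted longer".split()
print(preprocess(tokens, STOP_WORDS, SYNONYMS))
```

Replacing synonyms with one canonical word means that segments phrased differently ("cell", "battery") end up with the same feature words, and therefore closer text vectors.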
S1032, converting the text feature words one by one into corresponding word vectors to obtain a plurality of word vectors.
Specifically, a Word2vec or GloVe model can be used to convert each text feature word into its corresponding word vector.
S1033, superposing and averaging the word vectors to obtain a text vector.
All the word vectors of a segment are combined into a single text vector by taking a weighted average of the word vectors.
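The sketch below shows S1032 and S1033 under stated assumptions: a tiny hand-written embedding table stands in for a trained Word2vec or GloVe model, words missing from the table are skipped, and the optional weights mapping (for example TF-IDF scores) is a hypothetical choice, since the description does not say how the averaging weights are obtained. Without weights the result is the plain mean of the word vectors.

```python
# Toy 3-dimensional embeddings standing in for a trained Word2vec or GloVe model.
EMBEDDINGS = {
    "battery": [1.0, 0.0, 0.0],
    "life": [0.8, 0.2, 0.0],
    "screen": [0.0, 1.0, 0.0],
}

def text_vector(words, embeddings, weights=None):
    """Weighted average of the word vectors of words; unknown words are skipped."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in words:
        vec = embeddings.get(word)
        if vec is None:
            continue  # out-of-vocabulary word: no vector available
        w = 1.0 if weights is None else weights.get(word, 1.0)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]

print(text_vector(["battery", "life", "longer"], EMBEDDINGS))
```

Averaging rather than plain summing keeps the text vector on the same scale regardless of how many feature words a segment has, which matters when the vectors are later compared by similarity.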
S1034, clustering the plurality of text vectors with a preset clustering algorithm to obtain a plurality of cluster centers to be tested and the cluster number of each text vector.
The preset clustering algorithm clusters the text vectors; the cluster number of a text vector indicates which cluster center to be tested the text vector belongs to.
In a specific implementation, hierarchical clustering is applied to the text vectors to obtain the cluster centers to be tested of the suggestion text segments to be clustered.
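The description names hierarchical clustering but not the linkage, the distance measure or the stopping rule. The following sketch assumes average linkage, cosine distance and a distance threshold at which merging stops, and takes each cluster center to be the mean of its member vectors; all of these choices, and the function names, are assumptions. This naive loop is only practical for small batches; a library implementation such as SciPy's hierarchical clustering would normally be used instead.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def agglomerative_clustering(vectors, distance_threshold):
    """Average-linkage agglomerative clustering with cosine distance.

    Returns (centers, labels): the mean vector of each cluster and the
    cluster number of every input vector.
    """
    clusters = [[i] for i in range(len(vectors))]

    def average_distance(a, b):
        # mean cosine distance over all cross-cluster pairs
        total = sum(1.0 - cosine_similarity(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_d, best_x, best_y = None, -1, -1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best_d is None or d < best_d:
                    best_d, best_x, best_y = d, x, y
        if best_d > distance_threshold:
            break  # the closest pair is too far apart: stop merging
        clusters[best_x].extend(clusters[best_y])
        del clusters[best_y]  # best_y > best_x, so best_x stays valid

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    centers = [
        [sum(vectors[i][k] for i in members) / len(members) for k in range(len(vectors[0]))]
        for members in clusters
    ]
    return centers, labels

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers, labels = agglomerative_clustering(vectors, distance_threshold=0.3)
print(labels)  # [0, 0, 1, 1]
```

Raising the distance threshold merges more aggressively and yields fewer, broader cluster centers to be tested.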
S104, judging, for each of the plurality of cluster centers to be tested, whether it is similar to the pre-stored cluster centers.
In a specific implementation, suppose clustering the suggestion text segments to be clustered yields several cluster centers to be tested, one of which is A', and the pre-stored cluster centers are A, B and C. The similarity between A' and each of A, B and C is calculated. If the similarity between A' and one of A, B and C is greater than the preset threshold, the suggestion text segments corresponding to A' are determined to be an existing user suggestion; if the similarities between A' and all of A, B and C are smaller than the preset threshold, the suggestion text segments corresponding to A' are determined to be a new user suggestion.
In an embodiment, each cluster center to be tested comprises a cluster center vector and each pre-stored cluster center comprises a pre-stored cluster center vector, and judging whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers comprises:
acquiring the cluster center vector of the current cluster center to be tested;
judging whether the similarity between the current cluster center vector and the pre-stored cluster center vectors is smaller than a preset threshold;
if the similarity between the current cluster center vector and the pre-stored cluster center vectors is smaller than the preset threshold, determining that the current cluster center to be tested is not similar to the pre-stored cluster centers.
In a specific implementation, each cluster center to be tested comprises a cluster center vector and each pre-stored cluster center comprises a pre-stored cluster center vector. After a batch of text vectors is clustered into a plurality of cluster centers to be tested, the processor judges whether each cluster center to be tested is similar to the pre-stored cluster centers according to the similarity between its cluster center vector and the pre-stored cluster center vectors: when the similarity is greater than the preset threshold, the cluster center to be tested is judged similar to that pre-stored cluster center. The user can adjust the preset threshold to the specific requirements of the business scenario. With A' and A, B, C as in the example above, A' corresponds to an existing user suggestion if its similarity to one of A, B and C exceeds the threshold, and to a new user suggestion if its similarities to all of them are below the threshold.
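The similarity judgment of S104 can be sketched as follows, assuming cosine similarity as the "similarity degree" (the description does not name a measure) and a dictionary of pre-stored cluster center vectors keyed by a hypothetical class name. Following the rule that a similarity smaller than the preset threshold means "not similar", the function returns the best-matching stored center when its similarity reaches the threshold and None otherwise.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def match_stored_center(candidate, stored_centers, threshold):
    """Return the name of the most similar pre-stored center, or None when the
    best similarity is below threshold (the candidate is a new user suggestion)."""
    best_name, best_sim = None, -1.0
    for name, vector in stored_centers.items():
        sim = cosine_similarity(candidate, vector)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is None or best_sim < threshold:
        return None
    return best_name

stored = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
print(match_stored_center([0.9, 0.1, 0.0], stored, threshold=0.8))  # A
print(match_stored_center([0.6, 0.6, 0.5], stored, threshold=0.8))  # None
```

Comparing against the best match only, rather than requiring similarity to a particular center, matches the A' example: one sufficiently similar stored center is enough to treat A' as an existing suggestion.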
S105, if the current cluster center to be tested is not similar to the pre-stored cluster centers, creating a new cluster center class among the pre-stored cluster centers, the new class being the current cluster center to be tested, and determining that a new user suggestion has been detected.
When the processor determines that the cluster center to be tested is not similar to the pre-stored cluster centers, the suggestion text segments corresponding to it are determined to be a new user suggestion, so the new user suggestion is discovered. At the same time, the pre-stored cluster centers are updated: the cluster center to be tested is added to them as a new cluster center at the same level as the existing ones.
S106, if the current cluster center to be tested is similar to a pre-stored cluster center, merging the cluster center to be tested into that pre-stored cluster center.
When the processor determines that the cluster center to be tested is similar to a pre-stored cluster center, the suggestion text segments corresponding to it are determined to be an existing user suggestion, and the cluster center to be tested is merged into that pre-stored cluster center.
S107, judging whether the current cluster center to be tested belongs to the last cluster center to be tested in the plurality of cluster centers to be tested.
S108, if not, returning to the step of judging whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers.
By checking whether the current cluster center to be tested is the last of the plurality of cluster centers to be tested and, if it is not, returning to the similarity judgment step, the next cluster center to be tested is compared with the pre-stored cluster centers.
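Putting S104 to S108 together, the sketch below walks through the cluster centers to be tested in order, creates a new class when no stored center is similar enough, and merges otherwise. The description does not say how a center is merged; here the stored vector is replaced by the size-weighted mean of the two centers, which is an assumption, as are cosine similarity, the data layout and the function name update_store.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def update_store(candidates, store, threshold):
    """Run S104-S108 over one batch of cluster centers to be tested.

    candidates: list of (center_vector, cluster_size) pairs.
    store: dict mapping an integer class id to [center_vector, cluster_size];
           updated in place.
    Returns the ids of classes created in this call (new user suggestions).
    """
    new_ids = []
    for vector, size in candidates:  # iterating to the end plays the role of S107/S108
        best_id, best_sim = None, -1.0
        for class_id, (stored_vector, _) in store.items():
            sim = cosine_similarity(vector, stored_vector)
            if sim > best_sim:
                best_id, best_sim = class_id, sim
        if best_id is None or best_sim < threshold:
            # S105: not similar to any stored center -> create a new class
            new_id = max(store, default=-1) + 1
            store[new_id] = [list(vector), size]
            new_ids.append(new_id)
        else:
            # S106: similar -> merge (size-weighted mean is an assumption)
            stored_vector, stored_size = store[best_id]
            total = stored_size + size
            merged = [(s * stored_size + c * size) / total
                      for s, c in zip(stored_vector, vector)]
            store[best_id] = [merged, total]
    return new_ids

store = {0: [[1.0, 0.0], 10]}
new = update_store([([0.9, 0.1], 2), ([0.0, 1.0], 3)], store, threshold=0.8)
print(new)  # [1]
```

Because the store is updated as the loop runs, a new class created for one candidate can absorb a later, similar candidate from the same batch.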
In this way, the method extracts suggestion text segments to be clustered from the user comment text data according to the preset sentence pattern rule, clusters them to obtain cluster centers to be tested, and judges whether each cluster center to be tested is similar to the pre-stored cluster centers, thereby determining whether the corresponding suggestion text segments are new user suggestions. Because the full user comment text data does not need to be re-clustered, time is saved and new user suggestions are discovered more efficiently.
Example 2
Referring to Fig. 3, Fig. 3 is a schematic flowchart of a method for discovering new user suggestions in real time according to embodiment 2 of the present invention. The method of embodiment 2 comprises steps S201 to S210, where steps S201 to S208 are similar to steps S101 to S108 of embodiment 1 and are not described again. The additional steps S209 and S210 are explained in detail below.
Before the user comment text data and the rule file are obtained, the method further comprises:
S209, performing data cleaning on the user comment text data to filter out noise data.
The data cleaning of the user comment text data mainly filters out noise data and mainly comprises: filtering out spam comments from paid posters (so-called "water army" comments) and meaningless symbols appearing in user comments; normalizing the punctuation marks and English letters in the user comment text data to a unified form; and, for long user comments not divided by any punctuation mark, splitting the text with a named entity recognition method and inserting punctuation marks to break it into sentences.
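A partial sketch of the S209 cleaning step, covering only symbol filtering and normalization. It assumes that NFKC normalization (which folds full-width letters, digits and punctuation into their ASCII forms) plus lowercasing is an acceptable "unified form", and it uses a hypothetical whitelist of characters to keep. Spam comment detection and the sentence breaking based on named entity recognition are not shown.

```python
import re
import unicodedata

# Hypothetical whitelist: word characters (including CJK), whitespace,
# basic ASCII punctuation and the ideographic full stop and comma.
_DISALLOWED = re.compile(r"[^\w\s,.!?;:'()\-。、]")
_REPEATED_PUNCT = re.compile(r"([,.!?;:。])\1+")
_SPACES = re.compile(r"\s+")

def clean_comment(text):
    """Normalize one comment and strip meaningless symbols."""
    text = unicodedata.normalize("NFKC", text).lower()  # full-width -> ASCII, unified case
    text = _DISALLOWED.sub("", text)                     # drop emoji and decorative symbols
    text = _REPEATED_PUNCT.sub(r"\1", text)              # "!!!" -> "!"
    return _SPACES.sub(" ", text).strip()

def clean_comments(comments):
    """Clean all comments and discard those left empty."""
    cleaned = (clean_comment(c) for c in comments)
    return [c for c in cleaned if c]

print(clean_comment("Ｇｒｅａｔ  ｐｈｏｎｅ！！！ ★★★"))  # great phone!
```

Keeping the ideographic full stop in the whitelist matters for Chinese comments, since the later sentence pattern matching relies on sentence boundaries.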
After judging whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers, the method further comprises:
S210, if the current cluster center to be tested is the last of the plurality of cluster centers to be tested, returning to the step of obtaining the user comment text data and the rule file, so as to obtain new user comment text data.
If the current cluster center to be tested is the last one, all cluster centers to be tested in the current batch have been compared with the pre-stored cluster centers. The method then returns to the step of obtaining the user comment text data and the rule file to obtain new user comment text data, and continues by comparing the cluster centers to be tested of the next batch with the pre-stored cluster centers.
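The batch-level control flow of embodiment 2, where S210 returns to the acquisition step, amounts to an outer loop around the per-batch processing. In the sketch below, fetch_batch and process_batch are hypothetical callables: the first returns the next batch of comment texts, or None when nothing is left (a real-time service would instead wait and poll), and the second runs the per-batch steps and returns the ids of newly created classes.

```python
from typing import Callable, List, Optional

def discovery_loop(
    fetch_batch: Callable[[], Optional[List[str]]],
    process_batch: Callable[[List[str]], List[int]],
) -> List[List[int]]:
    """Fetch and process comment batches until fetch_batch returns None.

    Returns, per batch, the ids of the new classes (new user suggestions).
    """
    per_batch = []
    while True:
        batch = fetch_batch()  # S201: obtain the next batch of user comment text data
        if batch is None:      # no more data in this sketch; a service would poll again
            break
        per_batch.append(process_batch(batch))  # S209 and S202-S208 on this batch
        # S210: the last cluster center of this batch has been compared; loop back
    return per_batch

batches = iter([["comment 1", "comment 2"], ["comment 3"]])
result = discovery_loop(lambda: next(batches, None), lambda b: [len(b)])
print(result)  # [[2], [1]]
```

Only the newly fetched batch is clustered on each pass, while the pre-stored cluster centers persist across passes; this is what avoids re-clustering the full comment history.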
Specifically, filtering noise data out of the user comment text data by data cleaning improves the quality of the user comment text data and reduces errors in its subsequent processing, and returning to the step of obtaining the user comment text data and the rule file to obtain new data allows new user suggestions to be discovered continuously.
Example 3
Referring to fig. 4, an embodiment of the present invention further provides a new user suggestion real-time discovery apparatus 400, where the new user suggestion real-time discovery apparatus 400 includes a first obtaining unit 401, a first extracting unit 402, a first clustering unit 403, a first determining unit 404, a first creating unit 405, a first merging unit 406, a second determining unit 407, and a first returning unit 408.
A first obtaining unit 401, configured to obtain user comment text data and a rule file, where the rule file includes a preset sentence pattern rule;
a first extracting unit 402, configured to extract suggested text segments to be clustered from the user comment text data according to the preset sentence pattern rule;
a first clustering unit 403, configured to cluster the plurality of suggested text segments to be clustered to obtain a plurality of cluster centers to be tested;
a first determining unit 404, configured to determine, for each of the plurality of cluster centers to be tested, whether it is similar to the pre-stored cluster centers;
a first creating unit 405, configured to, if the current cluster center to be tested is not similar to the pre-stored cluster centers, create a new cluster center among the pre-stored cluster centers, with the current cluster center to be tested as the newly created cluster center, and determine that a new user suggestion has been detected;
a first merging unit 406, configured to merge the current cluster center to be tested into the pre-stored cluster center if the two are similar;
a second determining unit 407, configured to determine whether the current cluster center to be tested is the last of the plurality of cluster centers to be tested;
a first returning unit 408, configured to return to the step of determining whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers if the current cluster center to be tested is not the last of the plurality of cluster centers to be tested.
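The judge, create and merge cycle carried out by units 404 to 408 might look like the following Python sketch. Several details are assumptions not fixed by the source: similarity is taken as cosine similarity against the most similar pre-stored center, the threshold of 0.8 is arbitrary, and merging is modelled as a size-weighted mean of the two center vectors. `ClusterCenter` and `update_centers` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterCenter:
    vector: list   # cluster center vector
    size: int = 1  # number of segments merged into this center (assumed bookkeeping)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def update_centers(candidates, stored, threshold=0.8):
    """Compare each candidate center with the stored centers, in order.

    Mutates `stored` and returns the candidates judged to be new suggestions.
    """
    new_suggestions = []
    for cand in candidates:  # the loop ends after the last candidate in the batch
        best, best_sim = None, -1.0
        for center in stored:
            sim = cosine(cand.vector, center.vector)
            if sim > best_sim:
                best, best_sim = center, sim
        if best is None or best_sim < threshold:
            stored.append(cand)           # create a new cluster center
            new_suggestions.append(cand)  # a new user suggestion is detected
        else:
            # Merge: size-weighted mean of the two vectors (an assumption)
            total = best.size + cand.size
            best.vector = [(b * best.size + c * cand.size) / total
                           for b, c in zip(best.vector, cand.vector)]
            best.size = total
    return new_suggestions
```

Because a newly created center is appended to `stored`, later candidates in the same batch are also compared against it.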
In an embodiment, clustering the plurality of suggested text segments to be clustered to obtain a plurality of cluster centers to be tested includes:
performing word segmentation preprocessing on the suggested text segments to be clustered to obtain a plurality of text feature words;
converting the text feature words into corresponding word vectors one by one to obtain a plurality of word vectors;
superposing (summing) the word vectors of each segment to obtain its text vector;
clustering the plurality of text vectors with a preset clustering algorithm to obtain a plurality of cluster centers to be tested and the cluster number of each text vector.
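A minimal Python sketch of these steps, assuming the text feature words are already available as token lists, that word vectors come from a lookup table (a trained word-embedding model would normally supply them), and that the unspecified preset clustering algorithm is plain k-means. `text_vector` and `kmeans` are illustrative helpers, not the patent's implementation.

```python
import math


def text_vector(words, embeddings, dim):
    """Superpose (element-wise sum) the word vectors of one segment.

    Words missing from the embedding table contribute nothing (an assumption).
    """
    vec = [0.0] * dim
    for w in words:
        for i, x in enumerate(embeddings.get(w, [0.0] * dim)):
            vec[i] += x
    return vec


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns (cluster centers, cluster number of each vector)."""
    centers = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j]))
                  for v in vectors]
        # Update step: each center becomes the mean of its assigned vectors
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

With a toy table such as `{"dark": [1.0, 0.0], "mode": [1.0, 0.2], "battery": [0.0, 1.0], "drain": [0.1, 1.0]}`, segments about dark mode and segments about battery drain end up with different cluster numbers.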
In an embodiment, each cluster center to be tested includes a cluster center vector, each pre-stored cluster center includes a pre-stored cluster center vector, and determining whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers includes:
acquiring the cluster center vector of the current cluster center to be tested;
determining whether the degree of similarity between the current cluster center vector and the pre-stored cluster center vector is smaller than a preset threshold;
if the degree of similarity between the current cluster center vector and the pre-stored cluster center vector is smaller than the preset threshold, determining that the current cluster center to be tested is not similar to the pre-stored cluster center.
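The source does not name the similarity measure. If cosine similarity is used and the current cluster center vector c is compared with each pre-stored cluster center vector p_j, the decision rule with preset threshold τ could be written as follows (both the choice of measure and the use of the maximum over j are assumptions):

```latex
\operatorname{sim}(\mathbf{c}, \mathbf{p}_j)
  = \frac{\mathbf{c} \cdot \mathbf{p}_j}{\lVert \mathbf{c} \rVert \, \lVert \mathbf{p}_j \rVert},
\qquad
\text{new user suggestion} \iff \max_{j} \operatorname{sim}(\mathbf{c}, \mathbf{p}_j) < \tau
```

For instance, c = (1, 0) and a single stored p_1 = (0, 1) give a similarity of 0, so for any positive τ the current center would be treated as a new cluster center.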
In an embodiment, the suggested text segments to be clustered include stop words, and performing word segmentation preprocessing on the suggested text segments to be clustered to obtain a plurality of text feature words includes:
removing the stop words from the suggested text segments to be clustered by using a stop word dictionary.
In an embodiment, the suggested text segments to be clustered further include synonyms, and performing word segmentation preprocessing on the suggested text segments to be clustered to obtain a plurality of text feature words includes:
replacing all synonyms in the suggested text segments to be clustered by using a synonym dictionary.
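A small Python sketch of the two dictionary-based preprocessing steps. It assumes the segment has already been split into words (Chinese text would first need a word segmenter), that stop-word removal runs before synonym replacement, and that both dictionaries are plain in-memory tables; their contents are hypothetical.

```python
# Hypothetical dictionaries; the patent only states that a stop word
# dictionary and a synonym dictionary are used.
STOP_WORDS = {"the", "a", "please", "to", "is"}
SYNONYMS = {"night": "dark", "theme": "mode"}


def preprocess(tokens, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Drop stop words, then map each remaining synonym to its canonical word."""
    return [synonyms.get(t, t) for t in tokens if t not in stop_words]
```

For example, `preprocess("please add a night theme".split())` yields `["add", "dark", "mode"]`, so "night theme" and "dark mode" produce the same feature words.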
In this embodiment of the invention, suggested text segments to be clustered are extracted from the user comment text data through the preset sentence pattern rule and then clustered to obtain cluster centers to be tested. Each cluster center to be tested is then compared with the pre-stored cluster centers to determine whether its corresponding suggested text segments constitute a new user suggestion. Because the entire body of user comment text data does not have to be clustered again, time is saved and new user suggestions are discovered more efficiently.
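The format of the rule file is not disclosed. One way to picture sentence-pattern extraction, assuming each preset sentence pattern rule is a regular expression whose first capture group is the suggestion text, is the Python sketch below; the rule file layout, the example pattern and the function names are all hypothetical.

```python
import re

# Hypothetical rule file: one regular expression per line, '#' starts a comment.
RULE_FILE = r"""
# illustrative suggestion patterns
(?i)\b(?:please|could you|i suggest)\s+([^.!?]+)
"""


def load_rules(rule_text):
    """Compile every non-empty, non-comment line of the rule text."""
    return [re.compile(line.strip()) for line in rule_text.splitlines()
            if line.strip() and not line.strip().startswith("#")]


def extract_segments(comments, rules):
    """Return the first capture group of every rule match, in order."""
    segments = []
    for comment in comments:
        for rule in rules:
            for match in rule.finditer(comment):
                segments.append(match.group(1).strip())
    return segments
```

With these assumptions, the comment "Love it. Please add dark mode! Could you fix battery drain?" yields the segments "add dark mode" and "fix battery drain", while the praise "Love it." is ignored.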
Example 4
Referring to fig. 5, an embodiment of the present invention further provides an apparatus 400 for discovering new user suggestions in real time, which differs from the apparatus 400 provided in embodiment 3 in that it further includes a first cleaning unit 409 and a second returning unit 410.
The first cleaning unit 409 is configured to perform data cleaning processing on the user comment text data before the user comment text data and the rule file are acquired, so as to filter out noise data.
The second returning unit 410 is configured to, if the current cluster center to be tested is the last of the plurality of cluster centers to be tested, return to the step of acquiring the user comment text data and the rule file, so as to acquire user comment text data again.
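The batch-by-batch behaviour of this example, in which control returns to acquisition once the last cluster center to be tested has been compared, can be sketched in Python as a simple driver loop. The three callables are hypothetical hooks standing in for the acquisition, cleaning and cluster-comparison units. Stopping when `fetch_batch` returns `None` (or after `max_batches`) is an assumption made so the example terminates; the source describes continuous operation.

```python
from typing import Callable, Iterable, List, Optional


def run_discovery(fetch_batch: Callable[[], Optional[List[str]]],
                  clean: Callable[[List[str]], List[str]],
                  discover: Callable[[List[str]], Iterable[str]],
                  max_batches: Optional[int] = None) -> List[str]:
    """Repeatedly acquire, clean and process comment batches.

    After each batch has been fully compared, control returns to acquisition.
    """
    found: List[str] = []
    batches = 0
    while max_batches is None or batches < max_batches:
        batch = fetch_batch()
        if batch is None:  # no more data in this sketch; a service would keep polling
            break
        found.extend(discover(clean(batch)))
        batches += 1
    return found
```

Keeping the stages as injected callables lets each unit be swapped or tested on its own.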
In this embodiment of the invention, performing data cleaning on the user comment text data filters out noise data, which improves the quality of the acquired user comment text data and reduces errors in its subsequent processing. Returning to the step of acquiring the user comment text data and the rule file, so that user comment text data is acquired again, allows new user suggestions to be discovered continuously.
Example 5
Referring to fig. 6, an embodiment of the present invention further provides an electronic device, which includes a processor 111, a communication interface 112, a memory 113, and a communication bus 114; the processor 111, the communication interface 112, and the memory 113 communicate with one another through the communication bus 114.
A memory 113 for storing a computer program;
and the processor 111 is configured to execute the program stored in the memory 113 to implement the method for discovering new user suggestions in real time provided in embodiment 1.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by the processor 111, implements the steps of the method for discovering new user suggestions in real time provided in embodiment 1.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for discovering new user suggestions in real time, comprising:
acquiring user comment text data and a rule file, wherein the rule file comprises a preset sentence pattern rule;
extracting suggested text segments to be clustered from the user comment text data according to the preset sentence pattern rule;
clustering a plurality of the suggested text segments to be clustered to obtain a plurality of cluster centers to be tested;
determining, for each of the plurality of cluster centers to be tested, whether it is similar to the pre-stored cluster centers;
if the current cluster center to be tested is not similar to the pre-stored cluster centers, creating a new cluster center class among the pre-stored cluster centers, with the current cluster center to be tested as the newly created cluster center class, and determining that a new user suggestion has been detected;
if the current cluster center to be tested is similar to the pre-stored cluster center, merging the current cluster center to be tested into the pre-stored cluster center;
determining whether the current cluster center to be tested is the last of the plurality of cluster centers to be tested;
if not, returning to the step of determining whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers.
2. The method of claim 1, wherein before obtaining the user comment text data and the rule file, the method further comprises:
performing data cleaning on the user comment text data to filter out noise data.
3. The method according to claim 1, wherein clustering the plurality of suggested text segments to be clustered to obtain a plurality of cluster centers to be tested comprises:
performing word segmentation preprocessing on the suggested text segments to be clustered to obtain a plurality of text feature words;
converting the text feature words into corresponding word vectors one by one to obtain a plurality of word vectors;
superposing (summing) the word vectors of each segment to obtain its text vector;
clustering the plurality of text vectors with a preset clustering algorithm to obtain a plurality of cluster centers to be tested and the cluster number of each text vector.
4. The method according to claim 1, wherein each cluster center to be tested includes a cluster center vector, each pre-stored cluster center includes a pre-stored cluster center vector, and determining whether each of the plurality of cluster centers to be tested is similar to the pre-stored cluster centers comprises:
acquiring the cluster center vector of the current cluster center to be tested;
determining whether the degree of similarity between the current cluster center vector and the pre-stored cluster center vector is smaller than a preset threshold;
if the degree of similarity between the current cluster center vector and the pre-stored cluster center vector is smaller than the preset threshold, determining that the current cluster center to be tested is not similar to the pre-stored cluster center.
5. The method according to claim 3, wherein the suggested text segments to be clustered include stop words, and performing word segmentation preprocessing on the suggested text segments to be clustered to obtain a plurality of text feature words comprises:
removing the stop words from the suggested text segments to be clustered by using a stop word dictionary.
6. The method according to claim 3, wherein the suggested text segments to be clustered further include synonyms, and performing word segmentation preprocessing on the suggested text segments to be clustered to obtain a plurality of text feature words comprises:
replacing all synonyms in the suggested text segments to be clustered by using a synonym dictionary.
7. The method according to claim 1, wherein after determining whether the plurality of cluster centers to be tested are similar to the pre-stored cluster centers, the method further comprises:
if the current cluster center to be tested is the last of the plurality of cluster centers to be tested, returning to the step of acquiring the user comment text data and the rule file so as to acquire user comment text data again.
8. An apparatus for discovering new user suggestions in real time, comprising means for performing the method of any one of claims 1 to 7.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method of any one of claims 1 to 7 when executing a program stored in the memory.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111356575.3A 2021-11-16 2021-11-16 Method, device, equipment and medium for discovering new suggestions of user in real time Active CN114064895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111356575.3A CN114064895B (en) 2021-11-16 2021-11-16 Method, device, equipment and medium for discovering new suggestions of user in real time

Publications (2)

Publication Number Publication Date
CN114064895A true CN114064895A (en) 2022-02-18
CN114064895B CN114064895B (en) 2023-12-19

Family

ID=80272982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111356575.3A Active CN114064895B (en) 2021-11-16 2021-11-16 Method, device, equipment and medium for discovering new suggestions of user in real time

Country Status (1)

Country Link
CN (1) CN114064895B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337266A1 (en) * 2016-05-19 2017-11-23 Conduent Business Services, Llc Method and system for data processing for text classification of a target domain
CN109766437A (en) * 2018-12-07 2019-05-17 中科恒运股份有限公司 A kind of Text Clustering Method, text cluster device and terminal device
CN110888978A (en) * 2018-09-06 2020-03-17 北京京东金融科技控股有限公司 Article clustering method and device, electronic equipment and storage medium
CN111091000A (en) * 2019-12-24 2020-05-01 深圳视界信息技术有限公司 Processing system and method for extracting user fine-grained typical opinion data
CN111753082A (en) * 2020-03-23 2020-10-09 北京沃东天骏信息技术有限公司 Text classification method and device based on comment data, equipment and medium

Also Published As

Publication number Publication date
CN114064895B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN108647205B (en) Fine-grained emotion analysis model construction method and device and readable storage medium
CN108804512B (en) Text classification model generation device and method and computer readable storage medium
CN106776544B (en) Character relation recognition method and device and word segmentation method
CN103336766B (en) Short text garbage identification and modeling method and device
JP2012118977A (en) Method and system for machine-learning based optimization and customization of document similarity calculation
CN106445915B (en) New word discovery method and device
CN111767725A (en) Data processing method and device based on emotion polarity analysis model
CN111753082A (en) Text classification method and device based on comment data, equipment and medium
CN112214576B (en) Public opinion analysis method, public opinion analysis device, terminal equipment and computer readable storage medium
CN110674301A (en) Emotional tendency prediction method, device and system and storage medium
CN112860896A (en) Corpus generalization method and man-machine conversation emotion analysis method for industrial field
CN113239268A (en) Commodity recommendation method, device and system
CN110457707B (en) Method and device for extracting real word keywords, electronic equipment and readable storage medium
CN109543002B (en) Method, device and equipment for restoring abbreviated characters and storage medium
CN111428487B (en) Model training method, lyric generation method, device, electronic equipment and medium
CN113609865A (en) Text emotion recognition method and device, electronic equipment and readable storage medium
CN112163415A (en) User intention identification method and device for feedback content and electronic equipment
CN113204643A (en) Entity alignment method, device, equipment and medium
CN110347934B (en) Text data filtering method, device and medium
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium
CN109511000B (en) Bullet screen category determination method, bullet screen category determination device, bullet screen category determination equipment and storage medium
CN114064895B (en) Method, device, equipment and medium for discovering new suggestions of user in real time
CN107590163B (en) The methods, devices and systems of text feature selection
CN114385791A (en) Text expansion method, device, equipment and storage medium based on artificial intelligence
CN112632229A (en) Text clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518057 401, block a, sharing building, No. 78, Keyuan North Road, songpingshan community, Xili street, Nanshan District, Shenzhen, Guangdong

Applicant after: Shenzhen Shukuo Information Technology Co.,Ltd.

Address before: 518057 401, block a, sharing building, No. 78, Keyuan North Road, songpingshan community, Xili street, Nanshan District, Shenzhen, Guangdong

Applicant before: SHENZHEN SKIEER INFORMATION TECHNOLOGY CO.,LTD.

GR01 Patent grant