CN111831819A - Text updating method and device - Google Patents

Publication number
CN111831819A
Authority
CN
China
Prior art keywords
text
target
vocabulary
clustering
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910492295.1A
Other languages
Chinese (zh)
Other versions
CN111831819B (en)
Inventor
陈道昌
郑海霞
刘明星
王奕
朱宏图
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201910492295.1A
Publication of CN111831819A
Application granted
Publication of CN111831819B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a text updating method and a text updating device. The method includes: acquiring a plurality of option texts corresponding to a target question text; respectively determining a text vector corresponding to each option text; clustering the text vectors to obtain a plurality of cluster groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group; and determining a new option text corresponding to the target question text based on the target vocabularies obtained by screening. By clustering the text vectors of the option texts corresponding to the target question text and then screening out target vocabularies, the option texts of the target question text can be updated without manual work, which saves resources, improves the efficiency of updating option texts, and improves the timeliness of the option texts.

Description

Text updating method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a text updating method and apparatus.
Background
In life and work, questionnaires are a common tool for collecting data in research activities. With the development of network information technology, information can now be collected with questionnaires over the network, which can save a large amount of time otherwise spent searching for questionnaires, and digitizing the questionnaires reduces the workload of processing them.
However, the option texts of the questions in a questionnaire are currently still updated and adjusted manually. This incurs high labor cost, demands a high professional level from the person doing the work, is inefficient, and leaves the option texts with poor timeliness.
Disclosure of Invention
In view of this, an object of the present application is to provide a text updating method and apparatus, in which the text vectors of the option texts corresponding to a target question text in a questionnaire are clustered, target vocabularies are screened out, and the option texts of the target question text are then updated using the target vocabularies. Manual updating of the option texts is thereby avoided, resources are saved, the efficiency of updating option texts is improved, and the timeliness of the option texts corresponding to the question text is enhanced.
In a first aspect, the present application provides a text updating method, including:
acquiring a plurality of option texts corresponding to the target question text;
respectively determining a text vector corresponding to each option text;
clustering the text vectors to obtain a plurality of cluster groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group;
and determining a new option text corresponding to the target question text based on the target vocabulary obtained by screening.
In a possible implementation, the separately determining a text vector corresponding to each option text includes:
performing word segmentation processing on each option text respectively, and determining a word vector corresponding to each vocabulary obtained through word segmentation processing;
respectively determining a first weight corresponding to each vocabulary based on the number of times each vocabulary appears in a target text to which the target question text belongs and the number of times each vocabulary appears in all target texts; wherein each target text comprises at least one target question text;
for each option text, determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.
In a possible implementation manner, the determining a word vector corresponding to each vocabulary resulting from the word segmentation process includes:
obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary;
and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.
In a possible implementation manner, the determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary includes:
for each vocabulary in the option text, calculating the product of the word vector corresponding to the vocabulary and the first weight of the vocabulary to obtain a weighted word vector of the vocabulary;
and calculating the sum of the weighted word vectors of all the vocabularies in the option text to obtain the text vector corresponding to the option text.
In a possible implementation manner, the respectively determining the first weight corresponding to each vocabulary based on the number of times each vocabulary appears in the target text to which the target question text belongs and the number of times each vocabulary appears in all target texts includes:
for each vocabulary, calculating the ratio of the number of times the vocabulary appears in the target text to which the target question text belongs to the number of times the vocabulary appears in all the target texts, to obtain the first weight corresponding to the vocabulary.
In a possible implementation manner, the clustering the text vectors and screening target vocabularies from the vocabularies corresponding to the text vectors, based on the text vectors included in each cluster group and the number of text vectors included in each cluster group, includes:
clustering all the text vectors, and selecting, from the plurality of cluster groups obtained by clustering, the N cluster groups containing the largest numbers of text vectors to obtain N target cluster groups; wherein N is a positive integer;
for each target cluster group, determining a cluster center vector corresponding to the target cluster group based on all text vectors in the target cluster group;
and screening target vocabularies from the vocabularies corresponding to the text vectors included in all the target cluster groups, based on all text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.
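The selection of the N largest cluster groups can be sketched as follows. This is a minimal illustration, assuming the clustering step has already produced one integer label per text vector; the names `labels` and `top_n_clusters` are hypothetical and do not come from the application.

```python
from collections import Counter

def top_n_clusters(labels, n):
    """Return the ids of the n cluster groups that hold the most text vectors."""
    counts = Counter(labels)  # cluster id -> number of text vectors in that group
    # most_common() orders the groups by size, largest first
    return [cluster_id for cluster_id, _ in counts.most_common(n)]

# Hypothetical cluster label for each of seven text vectors
labels = [0, 1, 1, 2, 1, 0, 3]
print(top_n_clusters(labels, 2))  # [1, 0]
```

Groups of equal size are returned in the order in which they first appear, which is one possible tie-breaking rule among many.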
In a possible implementation, the screening target words from words corresponding to text vectors included in all target cluster groups based on all text vectors included in each target cluster group and a cluster center vector corresponding to each target cluster group includes:
for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group;
determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector;
and screening target words from all words corresponding to each text vector in the N target clustering groups based on the determined third weight of each word.
In a possible embodiment, the determining a second weight corresponding to the text vector based on the text vector and the cluster center vector corresponding to the target cluster group includes:
calculating the cosine value between the text vector and the cluster center vector corresponding to the target cluster group;
and determining the second weight corresponding to the text vector based on the obtained cosine value.
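A minimal sketch of the cosine computation is given below. The text only says the second weight is determined "based on" the cosine value, so using the cosine value directly as the second weight is an assumption made here for illustration, as is the handling of zero vectors.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_u * norm_v)

def second_weight(text_vector, center_vector):
    # Assumption: the cosine value itself serves as the second weight.
    return cosine(text_vector, center_vector)
```

Under this assumption, a text vector pointing in the same direction as the cluster center receives a weight close to 1, and an orthogonal one receives a weight close to 0.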
In a possible implementation, the screening target vocabularies from all vocabularies corresponding to each text vector in the N target cluster groups based on the determined third weight of each vocabulary includes:
screening the vocabularies corresponding to the M largest third weights as the target vocabularies; wherein M is a positive integer.
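The top-M selection can be sketched as follows. How a third weight is derived from the second weights is not spelled out in this passage, so the rule used here (each vocabulary accumulates the second weights of the option texts that contain it) is purely an assumption, and all names are hypothetical.

```python
import heapq
from collections import defaultdict

def top_m_words(option_words, second_weights, m):
    """option_words[i] lists the vocabularies of option text i;
    second_weights[i] is the second weight of that option text's vector."""
    third = defaultdict(float)
    for words, weight in zip(option_words, second_weights):
        for word in set(words):
            third[word] += weight  # assumed aggregation rule for the third weight
    # Keep the m vocabularies with the largest third weights
    return heapq.nlargest(m, third, key=third.get)

option_words = [["service", "speed"], ["service", "quality"], ["price"]]
second_weights = [0.9, 0.8, 0.3]
print(top_m_words(option_words, second_weights, 2))  # ['service', 'speed']
```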
In a possible embodiment, the determining, based on all text vectors in the target cluster group, a cluster center vector corresponding to the target cluster group includes:
calculating the sum of all text vectors in the target clustering group to obtain a candidate center vector;
and dividing the candidate center vector by the number of the text vectors in the target clustering group to obtain a clustering center vector corresponding to the target clustering group.
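The two steps above amount to a component-wise mean. A minimal sketch, with the hypothetical function name `cluster_center` and text vectors represented as plain lists of floats:

```python
def cluster_center(vectors):
    """Component-wise mean of the text vectors in one target cluster group."""
    if not vectors:
        raise ValueError("a cluster group needs at least one text vector")
    count = len(vectors)
    # Sum of all text vectors: the "candidate center vector"
    candidate = [sum(column) for column in zip(*vectors)]
    # Divide by the number of text vectors to get the cluster center vector
    return [value / count for value in candidate]

print(cluster_center([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```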
In a second aspect, the present application provides a text updating apparatus, including:
the acquisition module is used for acquiring a plurality of option texts corresponding to the target question text;
the first determining module is used for respectively determining a text vector corresponding to each option text;
the screening module is used for screening target vocabularies from vocabularies corresponding to the text vectors based on the text vectors included in each clustering group in the plurality of clustering groups obtained by clustering the text vectors, and the number of the text vectors included in each clustering group;
and the second determining module is used for determining a new option text corresponding to the target question text based on the target vocabulary obtained by screening.
In one possible implementation, the first determining module includes:
the first determining unit is used for performing word segmentation processing on each option text and determining a word vector corresponding to each vocabulary obtained through word segmentation processing;
the second determining unit is used for respectively determining a first weight corresponding to each vocabulary based on the number of times each vocabulary appears in the target text to which the target question text belongs and the number of times each vocabulary appears in all the target texts; wherein each target text comprises at least one target question text;
and a third determining unit, configured to determine, for each option text, a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.
In a possible implementation manner, when determining the word vector corresponding to each vocabulary obtained by the word segmentation process, the first determining unit is specifically configured to:
obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary;
and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.
In a possible implementation manner, the third determining unit is specifically configured to:
for each vocabulary in the option text, calculating the product of the word vector corresponding to the vocabulary and the first weight of the vocabulary to obtain a weighted word vector of the vocabulary;
and calculating the sum of the weighted word vectors of all the vocabularies in the option text to obtain the text vector corresponding to the option text.
In a possible implementation manner, the second determining unit is specifically configured to:
for each vocabulary, calculating the ratio of the number of times the vocabulary appears in the target text to which the target question text belongs to the number of times the vocabulary appears in all the target texts, to obtain the first weight corresponding to the vocabulary.
In one possible embodiment, the screening module includes:
the clustering unit is used for clustering all the text vectors, and selecting the first N clustering groups with the largest number of the text vectors from a plurality of clustering groups obtained by clustering to obtain N target clustering groups; wherein N is a positive integer;
a fourth determining unit, configured to determine, for each target cluster group, a cluster center vector corresponding to the target cluster group based on all text vectors in the target cluster group;
and the screening unit is used for screening target vocabularies from vocabularies corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.
In a possible embodiment, the screening unit is specifically configured to:
for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group;
determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector;
and screening target words from all words corresponding to each text vector in the N target clustering groups based on the determined third weight of each word.
In a possible implementation manner, when determining the second weight corresponding to the text vector based on the text vector and the cluster center vector corresponding to the target cluster group, the screening unit is specifically configured to:
calculating the cosine value between the text vector and the cluster center vector corresponding to the target cluster group;
and determining the second weight corresponding to the text vector based on the obtained cosine value.
In a possible implementation manner, the screening unit, when screening the target vocabularies from all vocabularies corresponding to each text vector in the N target cluster groups based on the determined third weight of each vocabulary, is specifically configured to:
screening the vocabularies corresponding to the M largest third weights as the target vocabularies; wherein M is a positive integer.
In a possible implementation manner, the fourth determining unit is specifically configured to:
calculating the sum of all text vectors in the target clustering group to obtain a candidate center vector;
and dividing the candidate center vector by the number of the text vectors in the target clustering group to obtain a clustering center vector corresponding to the target clustering group.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the first aspect or of any possible implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the first aspect or of any possible implementation manner of the first aspect.
The text updating method and device provided by the present application acquire a plurality of option texts corresponding to a target question text; respectively determine a text vector corresponding to each option text; cluster the text vectors to obtain a plurality of cluster groups, and screen target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group; and determine a new option text corresponding to the target question text based on the target vocabularies obtained by screening. In this technical solution, the text vectors of the option texts corresponding to the target question text in a questionnaire are determined and clustered, target vocabularies are screened out, and the option texts corresponding to the target question text are updated using the target vocabularies. The option texts of the question texts in the questionnaire therefore do not need to be updated manually, which saves resources, improves the efficiency of updating the option texts corresponding to the question texts, and enhances the timeliness of the option texts.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flowchart of a text updating method provided by an embodiment of the present application;
FIG. 2 is a flowchart of another text updating method provided by an embodiment of the present application;
FIG. 3 is a flowchart of yet another text updating method provided by an embodiment of the present application;
FIG. 4 is a first block diagram of a text updating apparatus provided by an embodiment of the present application;
FIG. 5 is a second block diagram of a text updating apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. It should be understood that the drawings in the present application are for illustration and description only and are not used to limit the scope of protection of the present application. Additionally, the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flowcharts need not be performed in the order shown, and steps that have no logical dependency on one another may be performed in reverse order or simultaneously. Under the guidance of this application, one skilled in the art may add one or more other operations to a flowchart, or remove one or more operations from it.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
In order to avoid unnecessary resource consumption and improve the efficiency of updating the option texts corresponding to question texts, the present application provides a text updating method and a text updating device. By determining the text vectors of the option texts corresponding to a target question text in a questionnaire and clustering those text vectors, target vocabularies can be screened out and used to update the option texts corresponding to the target question text. The option texts therefore do not need to be updated manually, which avoids unnecessary resource consumption, improves the efficiency of updating the option texts, and enhances the timeliness of the option texts.
Referring to FIG. 1, FIG. 1 is a flowchart of a text updating method according to an embodiment of the present application, where the method is executed by a server.
As shown in FIG. 1, the text updating method includes the following steps:
S110, acquiring a plurality of option texts corresponding to the target question text.
In this step, the text updating apparatus or the server may establish a communication connection with a database in which a plurality of questionnaires are stored, and obtain a target question text and a plurality of option texts corresponding to the target question text therefrom. The multiple option texts can be multiple option texts corresponding to target question texts in multiple questionnaires; the option text may also be text entered by the survey user.
Specifically, the multiple questionnaire texts can be segmented according to the questions, the option texts corresponding to the same question texts in each questionnaire are taken as a whole, and then the option texts corresponding to the same question texts are processed.
The target question text may be a specific question text in a given type of questionnaire, and the option text may be an option corresponding to the target question text. The option text may be a Chinese text, an English text, or the like, or a mixed text including multiple languages.
For example, in a questionnaire for the service industry, "what aspects of the service need improvement" may be used as the target question text, and the options related to that question text, such as "quality of service" and "speed of service", as well as text entered by surveyed users in response to the target question text, may be used as option texts.
S120, respectively determining a text vector corresponding to each option text.
In this step, the text updating apparatus or the server may process each option text, extract text features in the option text, and determine a text vector corresponding to each option text based on the text features.
The text vectors can be vectors capable of representing option text semantics, the distance between the text vectors in a preset coordinate system can represent the similarity degree of the option text semantics corresponding to the text vectors, and the closer the distance between the two text vectors is, the closer the semantics of the option text corresponding to the two text vectors are.
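As a small illustration of this idea, the sketch below compares distances between text vectors. Euclidean distance is assumed as the distance measure, and the two-dimensional vectors are made-up values chosen only to show the comparison; neither comes from the application.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical text vectors for three option texts
quality = [0.9, 0.1]
speed = [0.8, 0.2]
price = [0.1, 0.9]

# A smaller distance is read as closer semantics
print(euclidean_distance(quality, speed) < euclidean_distance(quality, price))  # True
```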
S130, clustering the text vectors to obtain a plurality of cluster groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group.
In this step, the text vectors may be clustered according to specific numerical values of the text vectors to obtain a plurality of clustering groups, each clustering group including a plurality of text vectors corresponding to the clustering group; the text vectors corresponding to the clustering groups have spatial similarity, i.e. their semantics are similar.
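The passage does not name a clustering algorithm. Purely as an illustration, the sketch below uses a tiny k-means, which is an assumption and may differ from what the application intends; the function name, the fixed iteration count, and the choice of the first k vectors as initial centers are all hypothetical.

```python
def kmeans(vectors, k, iterations=10):
    """Tiny k-means sketch: returns one cluster label per text vector.

    Assumption: the first k vectors serve as the initial centers, which
    keeps the sketch deterministic.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest center
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

vectors = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.2], [5.1, 4.9], [0.2, 0.1]]
print(kmeans(vectors, 2))  # [0, 1, 0, 1, 0]
```

Vectors that receive the same label form one cluster group, and the size of each group is simply the number of times its label occurs.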
Further, according to the specific numerical value of the text vector in each cluster group and the number of the text vectors in each cluster group, the relatively important cluster group in the plurality of cluster groups and one or more target vocabularies which can best reflect the semantics of the text vectors in the cluster group can be determined.
S140, determining a new option text corresponding to the target question text based on the target vocabularies obtained by screening.
In this step, the one or more target vocabularies obtained by screening can be converted, according to their semantics, into a new option text corresponding to the target question text.
When there are multiple target vocabularies from multiple cluster groups, a new option text can be determined for each cluster group from the target vocabularies of that cluster group. Alternatively, target vocabularies from different cluster groups can be combined to form a new option text. In addition, necessary conjunctions may be added when generating a new option text from the target vocabularies.
The text updating method provided by the embodiment of the present application acquires a plurality of option texts corresponding to a target question text; respectively determines a text vector corresponding to each option text; clusters the text vectors to obtain a plurality of cluster groups, and screens target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group; and determines a new option text corresponding to the target question text based on the target vocabularies obtained by screening. In this technical solution, the text vectors of the option texts corresponding to the target question text in a questionnaire are determined and clustered, target vocabularies are screened out, and the option texts of the target question text are updated. The option texts of the question texts in the questionnaire therefore do not need to be updated manually, which avoids unnecessary resource consumption, improves the efficiency of updating the option texts, and enhances the timeliness of the option texts corresponding to the question texts.
Referring to FIG. 2, FIG. 2 is a flowchart of another text updating method according to an embodiment of the present application. As shown in FIG. 2, the specific implementation process is as follows:
S210, acquiring a plurality of option texts corresponding to the target question text.
S220, performing word segmentation processing on each option text respectively, and determining a word vector corresponding to each vocabulary obtained through word segmentation processing.
In this step, after obtaining the plurality of option texts corresponding to the target question text, the text updating apparatus or the server may perform word segmentation on the option texts using a word segmentation tool, such as a Chinese word segmentation tool, remove stop words from the option texts, and look up the word vector corresponding to each vocabulary in a preset word vector library.
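The segmentation and stop-word removal can be sketched as below. Whitespace splitting stands in for a real word segmentation tool, which is an assumption that only works for space-delimited languages, and the stop-word list is hypothetical.

```python
# Hypothetical stop-word list; a real system would load a curated one
STOP_WORDS = {"the", "of", "a", "is", "too"}

def segment(option_text):
    """Split an option text into vocabularies and drop stop words.

    Assumption: whitespace splitting replaces a real segmentation tool.
    """
    words = option_text.lower().split()
    return [word for word in words if word not in STOP_WORDS]

print(segment("The speed of service"))  # ['speed', 'service']
```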
The preset word vector library can be obtained by training through the text in the corpus and a preset word vector algorithm. Specifically, a context formed by each vocabulary and the preceding and following vocabularies thereof can be trained, and semantic representation of each vocabulary is determined through a neural network, so that a word vector corresponding to each vocabulary is obtained. The spatial position of each word vector corresponds to the semantics of the word vector.
S230, respectively determining a first weight corresponding to each vocabulary based on the number of times each vocabulary appears in the target text to which the target question text belongs and the number of times each vocabulary appears in all the target texts.
Each target text comprises at least one target question text; specifically, the target text here may be a questionnaire.
Specifically, the term frequency-inverse document frequency (TF-IDF) of each vocabulary in the target text to which the target question text belongs may be calculated and used as the first weight corresponding to that vocabulary. In a specific implementation, the first weight is the quotient obtained by dividing the number of times the vocabulary appears in the target text to which the target question text belongs by the number of times the vocabulary appears in all the target texts.
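A minimal sketch of the quotient described above follows. It implements the stated ratio literally rather than a textbook TF-IDF formula; the function and parameter names are hypothetical, and each target text is assumed to be available as a list of already segmented vocabularies.

```python
from collections import Counter

def first_weights(target_text_words, all_texts_words):
    """target_text_words: vocabularies of the target text (questionnaire)
    containing the target question text.
    all_texts_words: one vocabulary list per target text, including that one."""
    in_target = Counter(target_text_words)
    in_all = Counter(word for words in all_texts_words for word in words)
    # First weight = occurrences in the target text / occurrences in all target texts
    return {word: in_target[word] / in_all[word]
            for word in in_target if in_all[word] > 0}

target = ["service", "speed", "service"]
others = [["service", "price"], ["speed", "speed", "price"]]
weights = first_weights(target, [target] + others)
print(sorted(weights))  # ['service', 'speed']
```

In this example "service" appears twice in the target text and three times overall, so its first weight is 2/3, while "speed" gets 1/3.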
S240, aiming at each option text, determining a text vector corresponding to the option text based on a word vector corresponding to each vocabulary in the option text and a first weight corresponding to each vocabulary.
In this step, after the text updating apparatus or the server determines the word vector and the first weight corresponding to each vocabulary, the text vector corresponding to the option text may be obtained by a weighted summation of the word vectors of the vocabularies in the option text, where the weight applied to each word vector may be the first weight of the corresponding vocabulary.
The text vector corresponding to the option text is formed by integrating word vectors of words corresponding to the option text, and the form of the text vector is similar to the form of the word vector and can represent the semantic meaning of the corresponding option text.
S250, clustering the text vectors to obtain a plurality of clustering groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group.
S260, determining a new option text corresponding to the target question text based on the target vocabularies obtained by screening.
For the descriptions of step S210 and steps S250 to S260, reference may be made to the descriptions of step S110 and steps S130 to S140; the same technical effects can be achieved, and details are not repeated here.
In some embodiments of the present application, step S220 comprises:
obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary; and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.
In this step, after word segmentation processing is performed on each option text, a pre-trained word vector library may be obtained, either over a communication connection or directly. The word vector library stores a plurality of word vectors, which may be obtained by processing a preset text with a preset word vector model; the word vector model may be a model such as a neural network. Each word vector corresponds to a vocabulary and represents the semantics of that vocabulary.
Further, the word vector corresponding to each vocabulary obtained by the word segmentation processing can be looked up in the word vector library based on the correspondence between word vectors and vocabularies.
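The lookup described above can be sketched as a dictionary access. This is only an illustration: the name `look_up_word_vectors`, the dictionary layout of the library, and the choice to skip vocabularies that are missing from the library are assumptions, since the text does not say how out-of-library vocabularies are handled.

```python
from typing import Dict, List

# Hypothetical layout of a pre-trained word vector library: vocabulary -> word vector.
WordVectorLibrary = Dict[str, List[float]]


def look_up_word_vectors(words: List[str],
                         library: WordVectorLibrary) -> Dict[str, List[float]]:
    """Return the word vector of every segmented vocabulary found in the library.

    Vocabularies missing from the library are skipped (an assumption; the
    source does not specify this case).
    """
    return {word: library[word] for word in words if word in library}


# Toy library with made-up two-dimensional vectors.
library = {"driver": [0.9, 0.1], "late": [0.2, 0.8], "price": [0.5, 0.5]}
vectors = look_up_word_vectors(["driver", "late", "detour"], library)
```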
In some embodiments of the present application, step S240 includes:
aiming at each vocabulary in the option text, calculating the product of the word vector corresponding to the vocabulary and the first weight of the vocabulary to obtain the word vector weighted by the vocabulary; and calculating the sum of the word vectors weighted by all the vocabularies in the option text to obtain a text vector corresponding to the option text.
In this way, the word vector corresponding to the vocabulary is weighted, and the obtained text vector corresponding to the option text can reflect the semantics of the option text more completely.
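The weighted summation of step S240 can be illustrated with a short sketch. The function name `text_vector` and the toy vectors and weights are hypothetical; the sketch assumes every vocabulary in the option text already has a word vector and a first weight.

```python
from typing import Dict, List


def text_vector(words: List[str],
                word_vectors: Dict[str, List[float]],
                first_weights: Dict[str, float]) -> List[float]:
    """Sum the word vectors of one option text, each scaled by its first weight."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        weight = first_weights[word]
        # Product of the word vector and the first weight, accumulated per component.
        for i, component in enumerate(word_vectors[word]):
            total[i] += weight * component
    return total


word_vectors = {"driver": [1.0, 0.0], "late": [0.0, 2.0]}
first_weights = {"driver": 0.5, "late": 0.25}
option_vector = text_vector(["driver", "late"], word_vectors, first_weights)
```

With these toy values, `option_vector` is `[0.5, 0.5]`.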
In some embodiments of the present application, step S230 comprises:
for each vocabulary, calculating the ratio of the number of times the vocabulary appears in the target text to which the target question text belongs to the number of times the vocabulary appears in all the target texts, to obtain the first weight corresponding to the vocabulary.
In this step, the number of times a vocabulary appears in the target text to which the target question text belongs, that is, the word frequency of the vocabulary in that target text, indicates how important the vocabulary is within the target text: the more often it appears, the more important it is. The number of times the vocabulary appears in all the target texts corresponds to the inverse text rate of the vocabulary: the fewer times the vocabulary appears across all the target texts, the more important it is to the target text to which it belongs.
Specifically, the first weight may be calculated according to the following formula:
[Formula image BDA0002087435200000141; not reproduced in this text]
where A is the number of all target texts, B is the number of target texts containing the vocabulary, C is the number of times the vocabulary appears in the target text to which the target question text belongs, and D is the total number of vocabularies in that target text.
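Because the formula itself survives here only as an image reference, its exact form cannot be confirmed from this text. The sketch below assumes the standard TF-IDF form that matches the four symbols, namely (C / D) × log(A / B); the logarithm base, the absence of smoothing, and the function name `first_weight` are all assumptions. Note that the simpler ratio of occurrence counts described elsewhere in this section is a different, non-logarithmic weighting.

```python
import math
from typing import List


def first_weight(word: str,
                 target_text: List[str],
                 all_target_texts: List[List[str]]) -> float:
    """TF-IDF style first weight, assumed form (C / D) * log(A / B)."""
    a = len(all_target_texts)                                # A: number of all target texts
    b = sum(1 for text in all_target_texts if word in text)  # B: target texts containing the word
    c = target_text.count(word)                              # C: occurrences in this target text
    d = len(target_text)                                     # D: total words in this target text
    if b == 0 or d == 0:
        return 0.0  # undefined case; returning 0 is a choice made for this sketch
    return (c / d) * math.log(a / b)


# Three toy questionnaires, already segmented into words.
texts = [["driver", "late", "driver"], ["price", "high"], ["driver", "polite"]]
weight_late = first_weight("late", texts[0], texts)      # (1/3) * log(3/1)
weight_driver = first_weight("driver", texts[0], texts)  # (2/3) * log(3/2)
```

In this toy data "late" appears in only one questionnaire, so it receives a larger weight than "driver" even though "driver" occurs more often in the first questionnaire.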
In some embodiments of the present application, step S250 specifically includes the following steps:
Step (1), clustering all the text vectors, and selecting, from the plurality of clustering groups obtained by clustering, the N clustering groups containing the largest numbers of text vectors, to obtain N target clustering groups; wherein N is a positive integer.
In this step, a clustering algorithm such as k-means can be used to cluster the text vectors, and the parameters of the clustering algorithm can be adjusted according to the specific type of the target text, that is, of the questionnaire. After the text updating apparatus or the server clusters the text vectors, a plurality of clustering groups is obtained, each comprising one or more text vectors. The clustering groups are then sorted in descending order of the number of text vectors they contain; the more text vectors a clustering group contains, the more important the option texts corresponding to that clustering group are. The first N clustering groups, which contain the most text vectors, are selected as the target clustering groups.
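The selection of the N largest clustering groups can be sketched as follows. The sketch assumes that a clustering algorithm (for example k-means) has already assigned a group label to each text vector; the function name `top_n_cluster_groups` and the label list are hypothetical, and ties between equally large groups are broken by order of first appearance, which the text does not specify.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def top_n_cluster_groups(labels: Sequence[int], n: int) -> Dict[int, List[int]]:
    """Map each of the N largest group labels to the indices of its text vectors."""
    members: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    # most_common sorts the groups by size in descending order.
    largest = [label for label, _ in Counter(labels).most_common(n)]
    return {label: members[label] for label in largest}


# labels[i] is the clustering group assigned to text vector i.
labels = [0, 1, 0, 2, 0, 1]
target_groups = top_n_cluster_groups(labels, n=2)
```

Here group 0 holds three text vectors and group 1 holds two, so with N = 2 those two groups become the target clustering groups.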
Step (2), for each target clustering group, determining a cluster center vector corresponding to the target clustering group based on all the text vectors in the target clustering group.
In this step, the cluster center vector corresponding to the target clustering group can be calculated from the values of all the text vectors in the target clustering group, using a clustering algorithm such as k-means.
Specifically, the center of the target cluster group may be calculated, and then the cluster center vector corresponding to the target cluster group is determined according to the distance between each text vector and the center.
Step (3), screening target vocabularies from the vocabularies corresponding to the text vectors included in all the target cluster groups, based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.
In this step, after the text updating apparatus or the server determines the cluster center vector corresponding to each target cluster group, the text vectors most closely related to the cluster center vector may be screened out according to the semantic relationship between each text vector in the target cluster group and the cluster center vector, and the target vocabularies may then be screened from the vocabularies corresponding to the screened text vectors.
In some embodiments of the present application, step (3) comprises the steps of:
Step (31), for each text vector in each target cluster group, determining a second weight corresponding to the text vector based on the text vector and the cluster center vector corresponding to the target cluster group.
Step (32), determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector.
Specifically, the second weight corresponding to the text vector may be used as the third weight of each vocabulary corresponding to the text vector.
Step (33), screening target vocabularies from all the vocabularies corresponding to each text vector in the N target cluster groups based on the determined third weight of each vocabulary.
In this step, the third weight of each vocabulary may represent the degree of correlation between the semantics of the vocabulary and the text vector to which it corresponds, and the vocabularies whose semantics are most strongly correlated with the text vector may be selected as the target vocabularies.
In some embodiments of the present application, step (31) comprises:
calculating the cosine value between the text vector and the cluster center vector corresponding to the target cluster group; and determining the second weight corresponding to the text vector based on the obtained cosine value.
Specifically, the cosine value between the text vector and the cluster center vector corresponding to the target cluster group may be used as the second weight corresponding to the text vector.
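A minimal sketch of the cosine computation follows. The function name `cosine` and the sample vectors are hypothetical, and returning 0.0 for a zero-length vector is a choice made here, since the text does not cover that case.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between a text vector and a cluster center vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # cosine is undefined for a zero vector; 0.0 is an assumption
    return dot / (norm_u * norm_v)


# Second weight of a text vector relative to its cluster center vector.
second_weight = cosine([1.0, 0.0], [1.0, 1.0])
```

A text vector pointing in the same direction as the cluster center vector gets a second weight close to 1, while an orthogonal one gets 0.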
In some embodiments of the present application, step (33) comprises:
screening the vocabularies corresponding to the M largest third weights as the target vocabularies; wherein M is a positive integer.
Therefore, the words with the highest relevance with the corresponding text vectors can be screened out and used as the target words, and the target words are converted into new option texts according to the semantics of the target words.
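Steps (32) and (33) can be sketched together: each vocabulary inherits the second weight of its text vector as its third weight, and the M vocabularies with the largest third weights are kept. The function name `select_target_vocabulary` and the sample data are hypothetical. The text does not say what happens when the same vocabulary occurs in several option texts; keeping its largest weight is an assumption of this sketch.

```python
import heapq
from typing import Dict, List, Tuple


def select_target_vocabulary(weighted_texts: List[Tuple[List[str], float]],
                             m: int) -> List[str]:
    """Return the M vocabularies with the largest third weights.

    Each item pairs the vocabularies of one option text with the second
    weight of that option text's text vector.
    """
    third_weights: Dict[str, float] = {}
    for words, second_weight in weighted_texts:
        for word in words:
            # Third weight = second weight of the text; keep the maximum on repeats.
            previous = third_weights.get(word, second_weight)
            third_weights[word] = max(previous, second_weight)
    return heapq.nlargest(m, third_weights, key=third_weights.get)


weighted_texts = [
    (["driver", "late"], 0.9),
    (["price", "high"], 0.6),
    (["driver", "polite"], 0.4),
]
targets = select_target_vocabulary(weighted_texts, m=2)
```

With M = 2, the two vocabularies from the option text closest to its cluster center ("driver" and "late") are selected.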
In some embodiments of the present application, step (2) comprises:
calculating the sum of all text vectors in the target clustering group to obtain a candidate center vector;
and dividing the candidate center vector by the number of the text vectors in the target clustering group to obtain a clustering center vector corresponding to the target clustering group.
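The two operations above amount to taking the component-wise mean of the text vectors in the group. A minimal sketch, with the hypothetical name `cluster_center` and made-up vectors, assuming the group is non-empty and all vectors have the same dimension:

```python
from typing import List, Sequence


def cluster_center(text_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Sum the text vectors component-wise, then divide by their count."""
    count = len(text_vectors)
    # Candidate center vector: component-wise sum of all text vectors.
    candidate = [sum(components) for components in zip(*text_vectors)]
    # Cluster center vector: candidate divided by the number of text vectors.
    return [value / count for value in candidate]


center = cluster_center([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

For the three sample vectors the candidate center vector is `[9.0, 12.0]`, and the cluster center vector is `[3.0, 4.0]`.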
The text updating method provided by the embodiment of the present application includes: obtaining a plurality of option texts corresponding to a target question text; performing word segmentation processing on each option text, and determining a word vector corresponding to each vocabulary obtained through the word segmentation processing; determining a first weight corresponding to each vocabulary based on the number of times each vocabulary appears in the target text to which the target question text belongs and the number of times each vocabulary appears in all target texts, wherein each target text comprises at least one target question text; for each option text, determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary; clustering the text vectors to obtain a plurality of clustering groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group; and determining a new option text corresponding to the target question text based on the target vocabularies obtained by screening. In this technical solution, the text vectors of the option texts corresponding to a target question text in a questionnaire are determined and clustered, the target vocabularies are screened out, and the option texts of the target question text are updated accordingly. The option texts in the question texts of the questionnaire therefore do not need to be updated manually, which avoids unnecessary resource consumption, improves the efficiency of updating the question texts, and improves the timeliness of the option texts in the question texts.
Referring to fig. 3, fig. 3 is a flowchart of another text updating method according to an embodiment of the present application.
As shown in fig. 3, the text updating method provided in this embodiment includes:
obtaining a questionnaire file;
dividing a questionnaire file into a plurality of sub-question answer texts according to questions in the questionnaire file, wherein the sub-question answer texts comprise a plurality of answer texts (namely the option texts) corresponding to the questions (namely the question texts);
performing feature extraction on the answer text to obtain text features of the answer text;
converting the text features into text vectors, wherein the text vectors are determined based on the pre-trained word vectors;
inputting the text vector into a semantic clustering model, and screening out a target vocabulary based on a clustering result of the text vector;
converting the target vocabulary into a new option text of a question answer text through a text abstract model;
and updating the question answer text with the new option text.
Referring to fig. 4 and fig. 5, fig. 4 shows a first block diagram of a text updating apparatus according to an embodiment of the present application, and fig. 5 shows a second block diagram of the text updating apparatus according to the embodiment of the present application. As shown in fig. 4, the text updating apparatus 400 includes:
an obtaining module 410, configured to obtain multiple option texts corresponding to the target question text;
a first determining module 420, configured to determine a text vector corresponding to each option text;
a screening module 430, configured to cluster the text vectors to obtain a plurality of clustering groups, and to screen target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group;
and a second determining module 440, configured to determine, based on the target vocabulary obtained by screening, a new option text corresponding to the target question text.
As shown in fig. 5, in some embodiments of the present application, the text updating apparatus 500 includes: an obtaining module 510, a first determining module 520, a screening module 530, and a second determining module 540. The first determining module 520 includes:
the first determining unit 521 is configured to perform word segmentation on each option text, and determine a word vector corresponding to each vocabulary obtained through word segmentation;
a second determining unit 522, configured to determine a first weight corresponding to each vocabulary respectively based on the number of times that each vocabulary appears in the target text to which the target question text belongs, and the number of times that each vocabulary appears in all target texts; wherein each target text comprises at least one target question text;
a third determining unit 523, configured to determine, for each option text, a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.
In some embodiments of the present application, when determining the word vector corresponding to each vocabulary obtained by the word segmentation process, the first determining unit 521 is specifically configured to:
obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary; and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.
In some embodiments of the present application, the third determining unit 523 is specifically configured to:
aiming at each vocabulary in the option text, calculating the product of the word vector corresponding to the vocabulary and the first weight of the vocabulary to obtain the word vector weighted by the vocabulary; and calculating the sum of the word vectors weighted by all the vocabularies in the option text to obtain a text vector corresponding to the option text.
In some embodiments of the present application, the second determining unit 522 is specifically configured to:
for each vocabulary, calculating the ratio of the number of times the vocabulary appears in the target text to which the target question text belongs to the number of times the vocabulary appears in all the target texts, to obtain the first weight corresponding to the vocabulary.
In some embodiments of the present application, the screening module 530 includes:
the clustering unit 531 is configured to perform clustering processing on all the text vectors, and select the first N clustering groups including the largest number of text vectors from a plurality of clustering groups obtained through the clustering processing, so as to obtain N target clustering groups; wherein N is a positive integer;
a fourth determining unit 532, configured to determine, for each target cluster group, a cluster center vector corresponding to the target cluster group based on all text vectors in the target cluster group;
the screening unit 533 is configured to screen the target words from the words corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.
In some embodiments of the present application, the screening unit 533 is specifically configured to:
for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group; determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector; and screening target words from all words corresponding to each text vector in the N target clustering groups based on the determined third weight of each word.
In some embodiments of the present application, when determining the second weight corresponding to the text vector based on the cluster center vector corresponding to the text vector and the target cluster group, the screening unit 533 is specifically configured to:
calculating cosine values of the text vectors and cluster center vectors corresponding to the target cluster groups; and determining a second weight corresponding to the text vector based on the obtained cosine value.
In some embodiments of the present application, the screening unit 533, when screening the target vocabularies from all the vocabularies corresponding to each text vector in the N target cluster groups based on the determined third weight of each vocabulary, is specifically configured to:
screening the vocabularies corresponding to the M largest third weights as the target vocabularies; wherein M is a positive integer.
In some embodiments of the present application, the fourth determining unit 532 is specifically configured to:
calculating the sum of all text vectors in the target clustering group to obtain a candidate center vector;
and dividing the candidate center vector by the number of the text vectors in the target clustering group to obtain a clustering center vector corresponding to the target clustering group.
An embodiment of the present application discloses an electronic device, as shown in fig. 6, including: a processor 601, a memory 602, and a bus 603, wherein the memory 602 stores machine-readable instructions executable by the processor 601, and when the electronic device is operated, the processor 601 and the memory 602 communicate via the bus 603.
The machine-readable instructions, when executed by the processor 601, perform the following steps of the text updating method:
acquiring a plurality of option texts corresponding to the target question text;
respectively determining a text vector corresponding to each option text;
clustering the text vectors to obtain a plurality of clustering groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group;
and determining a new option text corresponding to the target question text based on the target vocabulary obtained by screening.
In some embodiments, the processor 601 is specifically configured to, when determining the text vector corresponding to each option text respectively:
performing word segmentation processing on each option text respectively, and determining a word vector corresponding to each vocabulary obtained through word segmentation processing;
determining a first weight corresponding to each vocabulary based on the number of times each vocabulary appears in the target text to which the target question text belongs and the number of times each vocabulary appears in all target texts; wherein each target text comprises at least one target question text;
for each option text, determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.
In some embodiments, the processor 601 is specifically configured to perform, when determining the word vector corresponding to each vocabulary obtained by the word segmentation process:
obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary;
and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.
In some embodiments, the processor 601 is specifically configured to, when determining the text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary, perform:
aiming at each vocabulary in the option text, calculating the product of the word vector corresponding to the vocabulary and the first weight of the vocabulary to obtain the word vector weighted by the vocabulary;
and calculating the sum of the word vectors weighted by all the vocabularies in the option text to obtain a text vector corresponding to the option text.
In some embodiments, the processor 601 is specifically configured to perform, when determining the first weight corresponding to each vocabulary respectively based on the number of times that each vocabulary appears in the target text to which the target question text belongs and the number of times that each vocabulary appears in all target texts:
for each vocabulary, calculating the ratio of the number of times the vocabulary appears in the target text to which the target question text belongs to the number of times the vocabulary appears in all the target texts, to obtain the first weight corresponding to the vocabulary.
In some embodiments, when clustering the text vectors to obtain a plurality of clustering groups and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group, the processor 601 is specifically configured to perform:
clustering all the text vectors, and selecting the first N clustering groups with the largest number of the text vectors from a plurality of clustering groups obtained by clustering to obtain N target clustering groups; wherein N is a positive integer;
aiming at each target clustering group, determining a clustering center vector corresponding to the target clustering group based on all text vectors in the target clustering group;
and screening target words from words corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.
In some embodiments, the processor 601 is specifically configured to perform, when filtering target vocabularies from vocabularies corresponding to text vectors included in all target cluster groups based on all text vectors included in each target cluster group and a cluster center vector corresponding to each target cluster group, the following steps:
for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group;
determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector;
and screening target words from all words corresponding to each text vector in the N target clustering groups based on the determined third weight of each word.
In some embodiments, the processor 601 is specifically configured to, when determining the second weight corresponding to the text vector based on the cluster center vector corresponding to the text vector and the target cluster group, perform:
calculating cosine values of the text vectors and cluster center vectors corresponding to the target cluster groups; and determining a second weight corresponding to the text vector based on the obtained cosine value.
In some embodiments, the processor 601 is specifically configured to perform, when filtering the target vocabulary from all vocabularies corresponding to each text vector in the N target cluster groups based on the determined third weight of each vocabulary, the following steps:
screening the vocabulary corresponding to the maximum M third weights as the target vocabulary; wherein M is a positive integer.
In some embodiments, when determining the cluster center vector corresponding to the target cluster group based on all the text vectors in the target cluster group, the processor 601 is specifically configured to perform:
calculating the sum of all text vectors in the target clustering group to obtain a candidate center vector;
and dividing the candidate center vector by the number of the text vectors in the target clustering group to obtain a clustering center vector corresponding to the target clustering group.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the text updating method in any of the above embodiments.
The embodiment of the present application further provides a computer program product, which includes a computer-readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the text updating method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and is not described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A text updating method, comprising:
acquiring a plurality of option texts corresponding to the target question text;
respectively determining a text vector corresponding to each option text;
clustering the text vectors to obtain a plurality of clustering groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group;
and determining a new option text corresponding to the target question text based on the target vocabulary obtained by screening.
2. The method of claim 1, wherein the determining the text vector corresponding to each option text separately comprises:
performing word segmentation processing on each option text respectively, and determining a word vector corresponding to each vocabulary obtained through word segmentation processing;
determining a first weight corresponding to each vocabulary based on the number of times each vocabulary appears in the target text to which the target question text belongs and the number of times each vocabulary appears in all target texts; wherein each target text comprises at least one target question text;
for each option text, determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.
3. The method according to claim 2, wherein the determining a word vector corresponding to each vocabulary obtained by the word segmentation process comprises:
obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary;
and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.
4. The method of claim 2, wherein determining the text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary comprises:
aiming at each vocabulary in the option text, calculating the product of the word vector corresponding to the vocabulary and the first weight of the vocabulary to obtain the word vector weighted by the vocabulary;
and calculating the sum of the word vectors weighted by all the vocabularies in the option text to obtain a text vector corresponding to the option text.
5. The text updating method according to claim 1, wherein the clustering the text vectors to obtain a plurality of clustering groups, and screening target vocabularies from the vocabularies corresponding to the text vectors based on the text vectors included in each clustering group and the number of text vectors included in each clustering group, comprises:
clustering all the text vectors, and selecting the first N clustering groups with the largest number of the text vectors from a plurality of clustering groups obtained by clustering to obtain N target clustering groups; wherein N is a positive integer;
for each target clustering group, determining a clustering center vector corresponding to the target clustering group based on all text vectors in the target clustering group;
and screening target vocabularies from the vocabularies corresponding to the text vectors included in all the target clustering groups, based on all the text vectors included in each target clustering group and the clustering center vector corresponding to each target clustering group.
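The group selection and center computation of claim 5 can be sketched as below. The claim does not fix a clustering algorithm, so the sketch assumes cluster labels have already been produced by some clustering step, and it assumes the clustering center vector is the arithmetic mean of the group's text vectors. `top_n_groups` and `center` are hypothetical names.

```python
from collections import defaultdict


def top_n_groups(text_vectors, labels, n):
    """Keep the n clustering groups that contain the most text vectors."""
    groups = defaultdict(list)
    for vec, label in zip(text_vectors, labels):
        groups[label].append(vec)
    # Sort groups by size, largest first; ties keep first-seen order.
    ranked = sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked[:n])


def center(vectors):
    """Clustering center as the component-wise mean (an assumption)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


vectors = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [5.0, 5.0], [2.0, 0.0]]
labels = [0, 0, 1, 2, 0]  # Hypothetical output of a prior clustering step.
target_groups = top_n_groups(vectors, labels, 1)
target_center = center(target_groups[0])
```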
6. The text updating method according to claim 5, wherein the screening of the target vocabularies from the vocabularies corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group comprises:
for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group;
determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector;
and screening target vocabularies from all vocabularies corresponding to each text vector in the N target clustering groups based on the determined third weight of each vocabulary.
7. The method of claim 6, wherein determining the second weight corresponding to the text vector based on the text vector and the clustering center vector corresponding to the target clustering group comprises:
calculating the cosine of the angle between the text vector and the clustering center vector corresponding to the target clustering group;
and determining the second weight corresponding to the text vector based on the obtained cosine value.
8. The method of claim 6, wherein the screening of target vocabularies from all vocabularies corresponding to each text vector in the N target clustering groups based on the determined third weight of each vocabulary comprises:
selecting the vocabularies corresponding to the M largest third weights as the target vocabularies; wherein M is a positive integer.
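Claims 6 to 8 can be read as a three-step scoring pipeline. The sketch below makes two assumptions the claims leave open: the second weight is taken to be the cosine value itself, and the third weight of a vocabulary is taken to be the sum of the second weights of every text vector whose option text contains it. `cosine` and `select_target_words` are hypothetical names.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_target_words(groups, m):
    """Return the m vocabularies with the largest third weights.

    groups: list of (center_vector, members); each member is a
    (text_vector, tokens) pair for one option text in that group.
    """
    third = defaultdict(float)
    for center_vector, members in groups:
        for vec, tokens in members:
            second = cosine(vec, center_vector)  # Second weight (assumed form).
            for word in set(tokens):
                third[word] += second  # Third weight: accumulated second weights.
    ranked = sorted(third.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:m]]


groups = [(
    [1.0, 0.0],
    [([1.0, 0.0], ["price", "high"]),
     ([0.0, 1.0], ["late"]),
     ([2.0, 0.0], ["price"])],
)]
targets = select_target_words(groups, 2)
```

With this reading, vocabularies that appear in option texts close to a large group's center rank highest.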
9. A text updating apparatus, comprising:
an acquisition module, configured to acquire a plurality of option texts corresponding to a target question text;
a first determining module, configured to respectively determine a text vector corresponding to each option text;
a screening module, configured to screen target vocabularies from the vocabularies corresponding to the text vectors, based on the text vectors included in each of a plurality of clustering groups obtained by clustering the text vectors and the number of the text vectors included in each clustering group;
and a second determining module, configured to determine a new option text corresponding to the target question text based on the target vocabularies obtained by the screening.
10. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the text updating method according to any one of claims 1 to 8.
11. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the text updating method according to any one of claims 1 to 8.
CN201910492295.1A 2019-06-06 2019-06-06 Text updating method and device Active CN111831819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910492295.1A CN111831819B (en) 2019-06-06 2019-06-06 Text updating method and device

Publications (2)

Publication Number Publication Date
CN111831819A (en) 2020-10-27
CN111831819B (en) 2024-07-16

Family

ID=72911563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910492295.1A Active CN111831819B (en) 2019-06-06 2019-06-06 Text updating method and device

Country Status (1)

Country Link
CN (1) CN111831819B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102141977A (en) * 2010-02-01 2011-08-03 阿里巴巴集团控股有限公司 Text classification method and device
US20160253312A1 (en) * 2015-02-27 2016-09-01 Microsoft Technology Licensing, Llc Topically aware word suggestions
US20160370954A1 (en) * 2015-06-18 2016-12-22 Qualtrics, Llc Recomposing survey questions for distribution via multiple distribution channels
CN106294314A (en) * 2016-07-19 2017-01-04 北京奇艺世纪科技有限公司 Topics Crawling method and device
CN106611052A (en) * 2016-12-26 2017-05-03 东软集团股份有限公司 Text label determination method and device
CN107590125A (en) * 2017-09-07 2018-01-16 国网山东省电力公司 A kind of big data text real-time interaction method and device based on random algorithm
CN107741933A (en) * 2016-08-08 2018-02-27 北京京东尚科信息技术有限公司 Method and apparatus for detecting text
CN108170773A (en) * 2017-12-26 2018-06-15 百度在线网络技术(北京)有限公司 Media event method for digging, device, computer equipment and storage medium
US20180365248A1 (en) * 2017-06-14 2018-12-20 Sap Se Document representation for machine-learning document classification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
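张涛; 刘康; 赵军: "A Graph-Model-Based Method for Computing Wikipedia Concept Similarity and Its Application in an Entity Linking System", Journal of Chinese Information Processing (中文信息学报), no. 02, 15 March 2015 (2015-03-15) *
薛苏琴; 牛永洁: "Research on Chinese Text Similarity Based on the Vector Space Model", Electronic Design Engineering (电子设计工程), no. 10 *
马甲林; 刘金岭; 于长辉: "An Efficient Chinese Text Clustering Algorithm", Computer Engineering & Science (计算机工程与科学), no. 02 *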

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328799A (en) * 2021-01-06 2021-02-05 腾讯科技(深圳)有限公司 Question classification method and device
CN115544969A (en) * 2022-11-29 2022-12-30 明度智云(浙江)科技有限公司 Page comparison method, equipment and medium based on hypertext markup language
CN115544969B (en) * 2022-11-29 2023-03-21 明度智云(浙江)科技有限公司 Page comparison method, equipment and medium based on hypertext markup language

Also Published As

Publication number Publication date
CN111831819B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
CN110297988B (en) Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm
CN110287328B (en) Text classification method, device and equipment and computer readable storage medium
CN106874292B (en) Topic processing method and device
CN106709754A (en) Power user grouping method based on text mining
CN106886569B (en) ML-KNN multi-tag Chinese text classification method based on MPI
US20150199567A1 (en) Document classification assisting apparatus, method and program
CN107683469A (en) A kind of product classification method and device based on deep learning
CN106951498A (en) Text clustering method
CN110134777B (en) Question duplication eliminating method and device, electronic equipment and computer readable storage medium
CN108734159B (en) Method and system for detecting sensitive information in image
CN112836509A (en) Expert system knowledge base construction method and system
KR20190128246A (en) Searching methods and apparatus and non-transitory computer-readable storage media
CN111061939B (en) Scientific research academic news keyword matching recommendation method based on deep learning
CN103218368B (en) A kind of method and apparatus excavating hot word
CN112347223A (en) Document retrieval method, document retrieval equipment and computer-readable storage medium
CN111831819B (en) Text updating method and device
CN113392329A (en) Content recommendation method and device, electronic equipment and storage medium
CN110110143B (en) Video classification method and device
CN110929169A (en) Position recommendation method based on improved Canopy clustering collaborative filtering algorithm
CN111930885B (en) Text topic extraction method and device and computer equipment
CN117420998A (en) Client UI interaction component generation method, device, terminal and medium
CN109325096B (en) Knowledge resource search system based on knowledge resource classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant