CN111831819A

CN111831819A - Text updating method and device

Info

Publication number: CN111831819A
Application number: CN201910492295.1A
Authority: CN
Inventors: 陈道昌; 郑海霞; 刘明星; 王奕; 朱宏图
Original assignee: Beijing Didi Infinity Technology and Development Co Ltd
Current assignee: Beijing Didi Infinity Technology and Development Co Ltd
Priority date: 2019-06-06
Filing date: 2019-06-06
Publication date: 2020-10-27
Anticipated expiration: 2039-06-06
Also published as: CN111831819B

Abstract

The application provides a text updating method and device. The method acquires a plurality of option texts corresponding to a target question text; determines a text vector for each option text; clusters the text vectors into a plurality of cluster groups and, based on the text vectors included in each cluster group and the number of text vectors in each cluster group, screens target words from the words corresponding to the text vectors; and determines a new option text for the target question text based on the screened target words. Because the text vectors of the option texts of a target question text are clustered, target words are screened out, and the option texts are updated from those words, the option texts no longer need to be updated manually. This saves resources, improves the efficiency of updating option texts, and keeps the option texts up to date.

Description

Text updating method and device

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a text updating method and apparatus.

Background

In life and work, questionnaires are a common tool for collecting data in research activities. With the development of network information technology, questionnaires can now be distributed and collected over the network, which saves much of the time spent looking for questionnaires and, because the questionnaires are digitized, reduces the workload of processing them.

However, the option texts of questions in questionnaires are still updated and adjusted manually. This is labor-intensive, demands a high level of expertise from the person doing it, is inefficient, and leaves the option texts of the questions out of date.

Disclosure of Invention

In view of this, an object of the present application is to provide a text updating method and apparatus in which the text vectors of the option texts corresponding to a target question text in a questionnaire are clustered, target words are screened out, and the option texts of the target question text are then updated using the target words. Manual updating of the option texts is thereby avoided, resources are saved, the efficiency of updating option texts is improved, and the option texts corresponding to question texts are kept up to date.

In a first aspect, the present application provides a text updating method, including:

acquiring a plurality of option texts corresponding to the target question text;

respectively determining a text vector corresponding to each option text;

clustering the text vectors to obtain a plurality of cluster groups, and screening target words from the words corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group;

and determining a new option text corresponding to the target question text based on the target vocabulary obtained by screening.

In a possible implementation, determining a text vector corresponding to each option text includes:

performing word segmentation processing on each option text respectively, and determining a word vector corresponding to each vocabulary obtained through word segmentation processing;

determining a first weight for each word based on the number of times the word appears in the target text to which the target question text belongs and the number of times the word appears in all target texts, where each target text includes at least one target question text;

for each option text, determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.

In a possible implementation manner, the determining a word vector corresponding to each vocabulary resulting from the word segmentation process includes:

obtaining a word vector library obtained through pre-training; the word vector library comprises a plurality of word vectors, and each word vector corresponds to a vocabulary;

and screening the word vector corresponding to each vocabulary obtained by word segmentation processing from the word vector library.

In a possible implementation manner, the determining a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary includes:

for each word in the option text, multiplying the word vector of the word by the first weight of the word to obtain a weighted word vector;

and summing the weighted word vectors of all words in the option text to obtain the text vector corresponding to the option text.
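As a minimal sketch of the weighted summation above, the following Python function computes a text vector as the sum of first-weight-scaled word vectors. The dictionary-based word vector library, the function name `text_vector`, the explicit `dim` parameter, and the choice to skip words absent from the library are illustrative assumptions, not details from the application.

```python
from typing import Dict, List, Sequence


def text_vector(words: Sequence[str],
                word_vectors: Dict[str, List[float]],
                first_weights: Dict[str, float],
                dim: int) -> List[float]:
    """Return the sum of first_weight * word_vector over the words of one option text."""
    result = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            # Assumption: words missing from the word vector library are skipped.
            continue
        weight = first_weights.get(word, 0.0)
        for i, x in enumerate(vec):
            result[i] += weight * x
    return result


library = {"service": [1.0, 0.0], "speed": [0.0, 1.0]}
weights = {"service": 0.5, "speed": 2.0}
print(text_vector(["service", "speed"], library, weights, dim=2))  # [0.5, 2.0]
```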

In a possible implementation manner, determining the first weight corresponding to each word based on the number of times the word appears in the target text to which the target question text belongs and the number of times it appears in all target texts includes:

for each word, calculating the ratio of the number of times the word appears in the target text to which the target question text belongs to the number of times the word appears in all target texts, to obtain the first weight corresponding to the word.
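The ratio-based first weight can be sketched as follows, assuming each target text (for example, a questionnaire) has already been segmented into a list of words and that `all_texts` includes the current target text. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def ratio_first_weights(current_text: List[str],
                        all_texts: List[List[str]]) -> Dict[str, float]:
    """First weight = count in the current target text / count in all target texts.

    Assumes all_texts contains current_text, so every denominator is at least 1.
    """
    local = Counter(current_text)
    total: Counter = Counter()
    for text in all_texts:
        total.update(text)
    return {word: local[word] / total[word] for word in local}


q1 = ["service", "speed", "service"]
q2 = ["service", "price"]
print(ratio_first_weights(q1, [q1, q2]))  # service -> 2/3, speed -> 1.0
```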

In a possible implementation manner, clustering the text vectors to obtain a plurality of cluster groups and screening target words from the words corresponding to the text vectors, based on the text vectors included in each cluster group and the number of text vectors included in each cluster group, includes:

clustering all the text vectors, and selecting, from the resulting cluster groups, the N cluster groups containing the most text vectors as N target cluster groups, where N is a positive integer;

for each target cluster group, determining the cluster center vector corresponding to the target cluster group based on all text vectors in the target cluster group;

and screening target words from words corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.

In a possible implementation, the screening target words from words corresponding to text vectors included in all target cluster groups based on all text vectors included in each target cluster group and a cluster center vector corresponding to each target cluster group includes:

for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group;

determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector;

and screening target words from all words corresponding to each text vector in the N target clustering groups based on the determined third weight of each word.

In a possible embodiment, determining the second weight corresponding to the text vector based on the text vector and the cluster center vector corresponding to the target cluster group includes:

calculating the cosine value between the text vector and the cluster center vector corresponding to the target cluster group;

and determining a second weight corresponding to the text vector based on the obtained cosine value.
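The application does not state how the cosine value is mapped to the second weight. The sketch below assumes, for illustration only, that the second weight is the cosine value itself; `cosine` and `second_weight` are hypothetical names.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def second_weight(text_vec: Sequence[float], center_vec: Sequence[float]) -> float:
    # Assumption: the second weight is taken to be the cosine value directly.
    return cosine(text_vec, center_vec)


print(second_weight([2.0, 0.0], [1.0, 0.0]))  # 1.0
```

A text vector pointing in the same direction as its cluster center gets the maximum weight, so option texts most typical of their group contribute most in the later word screening.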

In a possible implementation, screening the target words from all words corresponding to the text vectors in the N target cluster groups based on the determined third weight of each word includes:

selecting the words with the M largest third weights as the target words, where M is a positive integer.
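The application leaves open how a word's third weight is derived from the second weights. One plausible reading, assumed here purely for illustration, is to add up the second weights of all option texts in which the word occurs and then keep the M words with the largest totals. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def third_weights(items: Sequence[Tuple[Sequence[str], float]]) -> Dict[str, float]:
    """Assumed rule: a word's third weight is the sum of the second weights
    of the option texts (text vectors) in which the word occurs."""
    acc: Dict[str, float] = defaultdict(float)
    for words, w2 in items:
        for word in set(words):  # count each word once per option text
            acc[word] += w2
    return dict(acc)


def top_m_words(weights: Dict[str, float], m: int) -> List[str]:
    """Return the m words with the largest third weights (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:m]]


# (words of one option text, second weight of its text vector) for the N target groups
items = [(["service", "quality"], 0.9), (["service", "speed"], 0.8), (["price"], 0.3)]
print(top_m_words(third_weights(items), 2))  # ['service', 'quality']
```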

In a possible embodiment, determining, based on all text vectors in the target cluster group, the cluster center vector corresponding to the target cluster group includes:

calculating the sum of all text vectors in the target cluster group to obtain a candidate center vector;

and dividing the candidate center vector by the number of text vectors in the target cluster group to obtain the cluster center vector corresponding to the target cluster group.
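The center computation above is the element-wise mean of the text vectors in a group. A minimal pure-Python sketch, with a hypothetical function name:

```python
from typing import List, Sequence


def cluster_center(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Sum the text vectors (candidate center vector), then divide by their count."""
    if not vectors:
        raise ValueError("a target cluster group must contain at least one text vector")
    candidate = [0.0] * len(vectors[0])
    for vec in vectors:
        for i, x in enumerate(vec):
            candidate[i] += x
    return [c / len(vectors) for c in candidate]


print(cluster_center([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```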

In a second aspect, the present application provides a text updating apparatus, including:

the acquisition module is used for acquiring a plurality of option texts corresponding to the target question text;

the first determining module is used for respectively determining a text vector corresponding to each option text;

the screening module is configured to cluster the text vectors into a plurality of cluster groups and to screen target words from the words corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group;

and the second determining module is used for determining a new option text corresponding to the target question text based on the target vocabulary obtained by screening.
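The four modules could be wired together roughly as follows. This is an architectural sketch only: each module is modeled as an injected callable, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class TextUpdatingApparatus:
    """Hypothetical wiring of the four modules; each module is an injected callable."""
    acquire: Callable[[str], List[str]]                      # acquisition module
    determine_vectors: Callable[[List[str]], List[Vector]]   # first determining module
    screen: Callable[[List[str], List[Vector]], List[str]]   # screening module
    compose: Callable[[str, List[str]], List[str]]           # second determining module

    def update(self, question: str) -> List[str]:
        options = self.acquire(question)
        vectors = self.determine_vectors(options)
        target_words = self.screen(options, vectors)
        return self.compose(question, target_words)


# Usage with trivial stand-in modules:
app = TextUpdatingApparatus(
    acquire=lambda q: ["fast service", "slow service"],
    determine_vectors=lambda opts: [[1.0] for _ in opts],
    screen=lambda opts, vecs: ["service"],
    compose=lambda q, words: [" ".join(words)],
)
print(app.update("What aspects of the service need improvement?"))  # ['service']
```

The screening module receives the option texts as well as their vectors because it must map text vectors back to words.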

In one possible implementation, the first determining module includes:

the first determining unit is used for performing word segmentation processing on each option text and determining a word vector corresponding to each vocabulary obtained through word segmentation processing;

the second determining unit is configured to determine a first weight for each word based on the number of times the word appears in the target text to which the target question text belongs and the number of times the word appears in all target texts, where each target text includes at least one target question text;

and a third determining unit, configured to determine, for each option text, a text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary.

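In a possible implementation manner, when determining the word vector corresponding to each word obtained by the word segmentation process, the first determining unit is specifically configured to: obtain a pre-trained word vector library, the word vector library including a plurality of word vectors, each corresponding to a word; and screen, from the word vector library, the word vector corresponding to each word obtained by the word segmentation process.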

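In a possible implementation manner, the third determining unit is specifically configured to: for each word in the option text, multiply the word vector of the word by the first weight of the word to obtain a weighted word vector; and sum the weighted word vectors of all words in the option text to obtain the text vector corresponding to the option text.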

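In a possible implementation manner, the second determining unit is specifically configured to: for each word, calculate the ratio of the number of times the word appears in the target text to which the target question text belongs to the number of times the word appears in all target texts, to obtain the first weight corresponding to the word.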

In one possible embodiment, the screening module includes:

the clustering unit is used for clustering all the text vectors, and selecting the first N clustering groups with the largest number of the text vectors from a plurality of clustering groups obtained by clustering to obtain N target clustering groups; wherein N is a positive integer;

a fourth determining unit, configured to determine, for each target cluster group, a cluster center vector corresponding to the target cluster group based on all text vectors in the target cluster group;

and the screening unit is used for screening target vocabularies from vocabularies corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group.

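In a possible embodiment, the screening unit is specifically configured to: for each text vector in each target cluster group, determine a second weight for the text vector based on the text vector and the cluster center vector corresponding to the target cluster group; determine a third weight for each word corresponding to the text vector based on the second weight of the text vector; and screen target words from all words corresponding to the text vectors in the N target cluster groups based on the determined third weight of each word.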

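In a possible implementation manner, when determining the second weight corresponding to the text vector based on the text vector and the cluster center vector corresponding to the target cluster group, the screening unit is specifically configured to: calculate the cosine value between the text vector and the cluster center vector corresponding to the target cluster group; and determine the second weight corresponding to the text vector based on the obtained cosine value.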

In a possible implementation manner, when screening the target words from all words corresponding to the text vectors in the N target cluster groups based on the determined third weight of each word, the screening unit is specifically configured to: select the words with the M largest third weights as the target words, where M is a positive integer.

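In a possible implementation manner, the fourth determining unit is specifically configured to: calculate the sum of all text vectors in the target cluster group to obtain a candidate center vector; and divide the candidate center vector by the number of text vectors in the target cluster group to obtain the cluster center vector corresponding to the target cluster group.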

In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the text updating method in the first aspect or in any possible implementation manner of the first aspect.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the text updating method in the first aspect or in any possible implementation manner of the first aspect.

The text updating method and device provided by the present application acquire a plurality of option texts corresponding to a target question text; determine a text vector for each option text; cluster the text vectors into a plurality of cluster groups and screen target words from the words corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors in each cluster group; and determine a new option text for the target question text based on the screened target words. In this technical solution, the text vectors of the option texts of a target question text in a questionnaire are determined and clustered, target words are screened out, and the option texts of the target question text are updated with those words, so the option texts of questionnaire questions no longer need to be updated manually. This saves resources, improves the efficiency of updating option texts, and keeps the option texts up to date.

Drawings

To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art can derive other related drawings from them without inventive effort.

Fig. 1 is a flowchart of a text updating method provided by an embodiment of the present application;

Fig. 2 is a flowchart of another text updating method provided by an embodiment of the present application;

Fig. 3 is a flowchart of yet another text updating method provided by an embodiment of the present application;

Fig. 4 is a first block diagram of a text updating apparatus provided by an embodiment of the present application;

Fig. 5 is a second block diagram of a text updating apparatus provided by an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. It should be understood that the drawings in the present application serve only illustrative and descriptive purposes and do not limit its scope of protection, and that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. The operations in the flowcharts need not be performed in the order shown: steps that have no logical dependency on one another may be performed in reverse order or simultaneously. Under the guidance of this application, a person skilled in the art may add one or more other operations to a flowchart or remove one or more operations from it.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.

To avoid unnecessary resource consumption and improve the efficiency of updating the option texts corresponding to question texts, the present application provides a text updating method and device. By determining and clustering the text vectors of the option texts corresponding to a target question text in a questionnaire, target words are screened out and used to update the option texts of the target question text, so the option texts no longer need to be updated manually. This avoids unnecessary resource consumption, improves the efficiency of updating option texts, and keeps the option texts up to date.

Referring to Fig. 1, Fig. 1 is a flowchart of a text updating method according to an embodiment of the present application; the method is executed by a server.

As shown in fig. 1, the text updating method includes the following steps:

S110, acquiring a plurality of option texts corresponding to the target question text.

In this step, the text updating apparatus or the server may establish a communication connection with a database storing a plurality of questionnaires, and obtain from it the target question text and the plurality of option texts corresponding to it. The option texts may be those corresponding to the target question text across multiple questionnaires; an option text may also be text entered by a surveyed user.

Specifically, the questionnaire texts may be split by question, the option texts corresponding to the same question text across the questionnaires are taken together as a whole, and the option texts of each question text are then processed together.
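Grouping option texts by question across questionnaires might look like the sketch below. The record layout (each questionnaire as a list of `(question_text, option_text)` pairs) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: a questionnaire is a list of (question_text, option_text) pairs.
Questionnaire = Sequence[Tuple[str, str]]


def group_options_by_question(questionnaires: Sequence[Questionnaire]) -> Dict[str, List[str]]:
    """Collect the option texts of each question text across all questionnaires."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for questionnaire in questionnaires:
        for question, option in questionnaire:
            grouped[question].append(option)
    return dict(grouped)


surveys = [
    [("What needs improvement?", "speed of service"), ("Would you return?", "yes")],
    [("What needs improvement?", "quality of service")],
]
print(group_options_by_question(surveys)["What needs improvement?"])
# ['speed of service', 'quality of service']
```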

The target question text may be a specific question text in a given type of questionnaire, and an option text may be an option corresponding to the target question text. An option text may be in Chinese, English, or another language, or may mix several languages.

For example, in a service-industry questionnaire, "What aspects of the service need improvement?" may serve as the target question text, and the options related to it, such as "quality of service" and "speed of service", together with the text entered by surveyed users in response to it, may serve as the option texts.

S120, determining a text vector corresponding to each option text.

In this step, the text updating apparatus or the server may process each option text, extract text features in the option text, and determine a text vector corresponding to each option text based on the text features.

A text vector is a vector that represents the semantics of an option text. The distance between two text vectors in a preset coordinate system reflects how similar the semantics of their option texts are: the closer the two text vectors, the closer the semantics of the corresponding option texts.

S130, clustering the text vectors into a plurality of cluster groups, and screening target words from the words corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group.

In this step, the text vectors may be clustered according to their numerical values to obtain a plurality of cluster groups, each containing the text vectors assigned to it. Text vectors in the same cluster group are close to one another in space, i.e. their semantics are similar.

Further, from the numerical values of the text vectors in each cluster group and the number of text vectors in each cluster group, the relatively important cluster groups can be determined, together with one or more target words that best reflect the semantics of the text vectors in those groups.

S140, determining a new option text corresponding to the target question text based on the screened target words.

In this step, the one or more screened target words can be turned into new option texts for the target question text according to their semantics.

When the target words come from several cluster groups, a new option text can be determined for each cluster group from that group's target words. Alternatively, target words from different cluster groups can be combined into one new option text. In addition, conjunctions may be added as needed when generating a new option text from the target words.
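The two ways of forming new option texts described above (one per cluster group, or combined across groups) can be sketched as follows. Joining words with a conjunction string is an assumed simplification of adding the necessary conjunctions; for Chinese option texts a different joiner would be used. Names are hypothetical.

```python
from typing import Dict, List


def new_option_texts(words_by_group: Dict[int, List[str]],
                     combine: bool = False,
                     conjunction: str = " and ") -> List[str]:
    """Build new option texts from the target words of each cluster group.

    combine=False: one option text per cluster group.
    combine=True: a single option text mixing words from all groups.
    """
    groups = [words_by_group[g] for g in sorted(words_by_group) if words_by_group[g]]
    if combine:
        merged = [word for words in groups for word in words]
        return [conjunction.join(merged)] if merged else []
    return [conjunction.join(words) for words in groups]


target = {0: ["service", "quality"], 1: ["speed"]}
print(new_option_texts(target))                # ['service and quality', 'speed']
print(new_option_texts(target, combine=True))  # ['service and quality and speed']
```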

The text updating method provided by the embodiment of the present application acquires a plurality of option texts corresponding to a target question text; determines a text vector for each option text; clusters the text vectors into a plurality of cluster groups and screens target words from the words corresponding to the text vectors based on the text vectors in each cluster group and the number of text vectors in each cluster group; and determines a new option text for the target question text based on the screened target words. Because the text vectors of the option texts of a target question text in a questionnaire are determined and clustered, target words can be screened out and used to update the option texts of the target question text, so the option texts no longer need to be updated manually. This avoids unnecessary resource consumption, improves the efficiency of updating option texts, and keeps the option texts corresponding to question texts up to date.

Referring to Fig. 2, Fig. 2 is a flowchart of another text updating method according to an embodiment of the present application. As shown in Fig. 2, the method is implemented as follows:

S210, acquiring a plurality of option texts corresponding to the target question text.

S220, performing word segmentation on each option text, and determining a word vector for each word obtained by the segmentation.

In this step, after obtaining the option texts corresponding to a target question text, the text updating apparatus or the server may segment the option texts into words with a Chinese word segmentation tool, remove stop words from the option texts, and look up the word vector of each word in a preset word vector library.
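A minimal sketch of this step follows. Whitespace splitting stands in for a real Chinese word segmentation tool, and the stop word list, the dictionary used as the word vector library, and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stop word list; a real system would load a curated list.
STOP_WORDS: Set[str] = {"the", "of", "is", "a"}


def segment(option_text: str) -> List[str]:
    # Stand-in for a Chinese word segmentation tool: lowercase and split on whitespace.
    return option_text.lower().split()


def lookup_word_vectors(option_text: str,
                        library: Dict[str, List[float]],
                        stop_words: Set[str] = STOP_WORDS) -> List[Tuple[str, List[float]]]:
    """Segment, drop stop words, and fetch each remaining word's vector from the library."""
    words = [w for w in segment(option_text) if w not in stop_words]
    return [(w, library[w]) for w in words if w in library]  # unknown words are skipped


library = {"speed": [0.1, 0.2], "service": [0.3, 0.4]}
print(lookup_word_vectors("The speed of service", library))
# [('speed', [0.1, 0.2]), ('service', [0.3, 0.4])]
```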

The preset word vector library can be trained on the texts in a corpus with a preset word vector algorithm. Specifically, a neural network may be trained on the context formed by each word and its preceding and following words to learn a semantic representation of the word, which yields the word vector for that word. The spatial position of each word vector reflects the semantics of its word.

S230, determining a first weight for each word based on the number of times the word appears in the target text to which the target question text belongs and the number of times it appears in all target texts.

Each target text includes at least one target question text; specifically, a target text here may be a questionnaire.

Specifically, the term frequency-inverse document frequency (TF-IDF) of each word in the target text to which the target question text belongs may be calculated and used as the first weight of that word. In one specific implementation, the first weight is the quotient of the number of times the word appears in the target text to which the target question text belongs divided by the number of times the word appears in all target texts.

S240, for each option text, determining the text vector corresponding to the option text based on the word vector of each word in the option text and the first weight of each word.

In this step, after the text updating apparatus or the server determines the word vector and the first weight of each word, the text vector of an option text may be obtained as the weighted sum of the word vectors of its words, using the first weights as the weights.

The text vector of an option text thus integrates the word vectors of the words in that option text; it has the same form as a word vector and represents the semantics of the corresponding option text.

S250, clustering the text vectors into a plurality of cluster groups, and screening target words from the words corresponding to the text vectors based on the text vectors included in each cluster group and the number of text vectors included in each cluster group.

S260, determining a new option text corresponding to the target question text based on the screened target words.

For steps S210, S250, and S260, refer to the descriptions of steps S110, S130, and S140, respectively; the same technical effects can be achieved, and details are not repeated here.

In some embodiments of the present application, step S220 comprises:

obtaining a pre-trained word vector library, where the word vector library includes a plurality of word vectors, each corresponding to a word; and screening, from the word vector library, the word vector corresponding to each word obtained by the word segmentation.

In this step, after each option text is segmented, a pre-trained word vector library may be obtained, either over a communication connection or directly. The library stores a plurality of word vectors, which may be produced by processing preset texts with a preset word vector model such as a neural network. Each word vector corresponds to one word and represents the semantics of that word.

Further, the word vector of each word obtained by the segmentation can be looked up in the word vector library based on the correspondence between word vectors and words.

In some embodiments of the present application, step S240 includes:

for each word in the option text, multiplying the word vector of the word by the first weight of the word to obtain a weighted word vector; and summing the weighted word vectors of all words in the option text to obtain the text vector corresponding to the option text.

Weighting the word vectors in this way lets the resulting text vector reflect the semantics of the option text more completely.

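In some embodiments of the present application, step S230 includes: for each word, calculating the ratio of the number of times the word appears in the target text to which the target question text belongs to the number of times the word appears in all target texts, to obtain the first weight corresponding to the word.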

In this step, the number of times a word appears in the target text to which the target question text belongs, i.e. the term frequency of the word in that target text, indicates how important the word is within that target text: the more often it appears there, the more important it is. The ratio based on the word's occurrences across all target texts, i.e. its inverse document frequency, indicates that the fewer target texts the word appears in, the more important the word is to the target text to which it belongs.

Specifically, the first weight may be calculated according to the following formula:

where A is the total number of target texts, B is the number of target texts containing the word, C is the number of times the word appears in the target text to which the target question text belongs, and D is the total number of words in the target text to which the target question text belongs.
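The formula itself is not reproduced in this text. Given the definitions of A, B, C, and D, one reading consistent with standard TF-IDF (an assumption, not the application's confirmed formula) is w = (C / D) × log(A / B), i.e. term frequency times inverse document frequency; some variants use log(A / (B + 1)) instead. A sketch under that assumption, with a hypothetical function name and natural logarithm:

```python
import math


def tfidf_first_weight(A: int, B: int, C: int, D: int) -> float:
    """Assumed form: (C / D) * log(A / B).

    A: number of all target texts
    B: number of target texts containing the word
    C: occurrences of the word in the target text of the target question text
    D: total number of words in that target text
    """
    if B <= 0 or D <= 0:
        raise ValueError("B and D must be positive")
    term_frequency = C / D
    inverse_document_frequency = math.log(A / B)  # natural log; the base is also assumed
    return term_frequency * inverse_document_frequency


# A word appearing 2 times in a 20-word questionnaire and in 10 of 100 questionnaires:
print(tfidf_first_weight(A=100, B=10, C=2, D=20))  # 0.1 * ln(10), about 0.230
```

A word that occurs in every target text (B = A) gets weight 0, which matches the intuition in the paragraph above that widely shared words carry little information about a particular questionnaire.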

In some embodiments of the present application, step S250 specifically includes the following steps:

Step (1): cluster all the text vectors, and from the plurality of cluster groups obtained by the clustering, select the N cluster groups containing the largest numbers of text vectors as N target cluster groups, where N is a positive integer.

In this step, the text vectors may be clustered with a clustering algorithm such as k-means, and the parameters of the algorithm may be adjusted according to the target text, that is, the specific type of questionnaire. After the text updating device or the server clusters the text vectors, a plurality of cluster groups is obtained, each containing one or more text vectors. The cluster groups are sorted in descending order of the number of text vectors they contain: the more text vectors a cluster group contains, the more important the option texts corresponding to that group. The first N cluster groups in this order are then selected as the target cluster groups.
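Step (1) can be sketched with plain Lloyd's k-means followed by a count-based selection of the N largest groups. This is an illustrative sketch in pure Python, not the application's implementation: seeding the centroids with the first k vectors and running a fixed number of iterations are simplifying assumptions (practical k-means usually uses random or k-means++ seeding), and the names `kmeans_labels` and `top_n_groups` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_labels(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each text vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty group keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def top_n_groups(vectors: List[Vector], k: int, n: int) -> Dict[int, List[int]]:
    """Return the n cluster groups with the most text vectors as {label: [indices]}."""
    labels = kmeans_labels(vectors, k)
    largest = [lab for lab, _ in Counter(labels).most_common(n)]
    return {lab: [i for i, l in enumerate(labels) if l == lab] for lab in largest}
```

For example, with five two-dimensional vectors forming one group of three near the origin and one group of two near (10, 10), `top_n_groups(vectors, k=2, n=1)` keeps only the group of three.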

Step (2): for each target cluster group, determine the cluster center vector of the target cluster group based on all the text vectors in the group.

In this step, the cluster center vector of the target cluster group may be calculated from the values of all the text vectors in the group using a clustering algorithm such as k-means.

Specifically, the center of the target cluster group may first be calculated, and the cluster center vector of the group may then be determined according to the distance between each text vector and that center.
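The text does not say exactly how the distances are used. One plausible reading, adopted here purely as an assumption, is to compute the component-wise mean of the group and then take the member text vector nearest to that mean (in Euclidean distance) as the cluster center vector. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of the target cluster group's text vectors."""
    n = len(vectors)
    return [sum(xs) / n for xs in zip(*vectors)]


def cluster_center_vector(vectors: List[Sequence[float]]) -> List[float]:
    """Assumed reading: the member text vector closest to the group's centroid."""
    center = centroid(vectors)
    closest = min(vectors, key=lambda v: math.dist(v, center))  # Python 3.8+
    return list(closest)
```

Choosing an actual member rather than the raw mean keeps the center tied to a real option text; if the raw mean is wanted instead, `centroid` alone suffices.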

Step (3): based on all the text vectors included in each target cluster group and the cluster center vector of each target cluster group, select target words from the words corresponding to the text vectors included in all the target cluster groups.

In this step, after the text updating device or the server determines the cluster center vector of a target cluster group, it may identify the text vectors most closely related to that center according to the semantic relationships represented by the text vectors in the group and the cluster center vector, and then select the target words from the words of the identified text vectors.

In some embodiments of the present application, step (3) comprises the steps of:

Step (31): for each text vector in each target cluster group, determine a second weight for the text vector based on the text vector and the cluster center vector of the target cluster group.

Step (32): determine a third weight for each word corresponding to the text vector based on the second weight of the text vector.

Specifically, the second weight of the text vector may be used directly as the third weight of each word corresponding to that text vector.

Step (33): select target words from all the words corresponding to the text vectors in the N target cluster groups based on the determined third weight of each word.

In this step, the third weight of a word may represent how strongly the semantics of the word correlate with its text vector, and the words whose semantics correlate most strongly with their text vectors may be selected as the target words.
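Steps (32) and (33) can be sketched as below, taking each text vector's second weight as given. Following the "Specifically" paragraph, every word inherits its text vector's second weight as its third weight. Two details are assumptions not stated in the source: when a word occurs in several option texts its highest third weight is kept, and a hypothetical parameter `top_k` fixes how many target words are returned.

```python
from typing import Dict, List, Tuple


def select_target_words(texts: List[Tuple[List[str], float]], top_k: int) -> List[str]:
    """texts: (words of one option text, second weight of its text vector),
    drawn from the N target cluster groups."""
    third: Dict[str, float] = {}
    for words, second_weight in texts:
        for word in words:
            # Step (32): third weight = second weight of the word's text vector;
            # keep the highest value if the word recurs (assumption).
            third[word] = max(third.get(word, float("-inf")), second_weight)
    # Step (33): rank words by third weight and keep the top_k.
    ranked = sorted(third.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_k]]
```

Because Python's sort is stable, words with equal third weights keep the order in which they were first seen.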

In some embodiments of the present application, step (31) comprises:

calculating the cosine similarity between the text vector and the cluster center vector of the target cluster group, and determining the second weight of the text vector based on the resulting cosine value.

Specifically, the cosine similarity between the text vector and the cluster center vector of the target cluster group may be used directly as the second weight of the text vector.
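As a sketch of step (31), the second weight is the cosine of the angle between a text vector and its group's cluster center vector. Returning 0 when either vector has zero length is an assumption added to avoid division by zero; the function name is hypothetical.

```python
import math
from typing import Sequence


def second_weight(text_vec: Sequence[float], center_vec: Sequence[float]) -> float:
    """Cosine similarity between a text vector and its cluster center vector."""
    dot = sum(a * b for a, b in zip(text_vec, center_vec))
    norm = math.hypot(*text_vec) * math.hypot(*center_vec)  # product of Euclidean norms
    return dot / norm if norm else 0.0  # zero-length vector: weight 0 (assumption)
```

Text vectors pointing in the same direction as the center receive a second weight of 1, and orthogonal ones receive 0.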

In some embodiments of the present application, step (33) comprises:

In this way, the words most relevant to their corresponding text vectors can be selected as target words, and the target words are then converted into a new option text according to their semantics.

In some embodiments of the present application, step (2) comprises:

The text updating method provided by the embodiments of the present application obtains a plurality of option texts corresponding to a target question text; performs word segmentation on each option text and determines a word vector for each word obtained by the segmentation; determines a first weight for each word based on the number of times the word appears in the target text to which the target question text belongs and the number of times it appears in all target texts, where each target text includes at least one target question text; for each option text, determines a text vector based on the word vector and the first weight of each word in the option text; clusters the text vectors to obtain the text vectors included in each of a plurality of cluster groups and the number of text vectors in each cluster group, and on that basis selects target words from the words corresponding to the text vectors; and determines a new option text for the target question text based on the selected target words. With this technical solution, the text vectors of the option texts of the target question texts in a questionnaire are determined and clustered to select target words, and the target question texts are updated accordingly. The option texts of the questions in a questionnaire therefore no longer need to be updated manually, which avoids unnecessary resource consumption, improves the efficiency of updating question texts, and keeps the option texts of the question texts up to date.

Fig. 3 is a flowchart of another text updating method according to an embodiment of the present application.

As shown in fig. 3, the text updating method provided in this embodiment includes:

obtaining a questionnaire file;

dividing the questionnaire file by question into a plurality of sub-question answer texts, where each sub-question answer text includes the plurality of answer texts (that is, the option texts) corresponding to a question (that is, a question text);

performing feature extraction on each answer text to obtain its text features;

converting the text features into text vectors, where the text vectors are determined based on pre-trained word vectors;

inputting the text vectors into a semantic clustering model, and selecting target words based on the clustering result of the text vectors;

converting the target words into a new option text for the question answer text using a text summarization model;

and updating the question answer text with the new option text.
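The flow of Fig. 3 can be sketched as an orchestration over pluggable stages. The source does not specify the feature extractor, the semantic clustering model, or the text summarization model, so all three are passed in as hypothetical callables, and representing the questionnaire as a mapping from question text to answer texts is likewise an assumption. Appending the new option text, rather than replacing an existing one, is one possible reading of "updating".

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; the models themselves are not specified.
Vectorize = Callable[[str], List[float]]
SelectWords = Callable[[List[str], List[List[float]]], List[str]]
Summarize = Callable[[List[str]], str]


def update_questionnaire(questionnaire: Dict[str, List[str]],
                         vectorize: Vectorize,
                         select_words: SelectWords,
                         summarize: Summarize) -> Dict[str, List[str]]:
    """Run the Fig. 3 flow once per question; the input is left unmodified."""
    updated: Dict[str, List[str]] = {}
    for question, answers in questionnaire.items():   # one sub-question answer text
        vectors = [vectorize(a) for a in answers]      # features -> text vectors
        target_words = select_words(answers, vectors)  # clustering + word selection
        new_option = summarize(target_words)           # text summarization model
        updated[question] = answers + [new_option]     # append the new option text
    return updated
```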

Fig. 4 is a first block diagram of a text updating apparatus according to an embodiment of the present application, and Fig. 5 is a second block diagram of the text updating apparatus according to the embodiment of the present application. As shown in Fig. 4, the text updating apparatus 400 includes:

an obtaining module 410, configured to obtain multiple option texts corresponding to the target question text;

a first determining module 420, configured to determine a text vector corresponding to each option text;

a screening module 430, configured to cluster the text vectors to obtain the text vectors included in each of a plurality of cluster groups and the number of text vectors included in each cluster group, and on that basis to select target words from the words corresponding to the text vectors;

and a second determining module 440, configured to determine, based on the target vocabulary obtained by screening, a new option text corresponding to the target question text.

As shown in fig. 5, in some embodiments of the present application, the text updating apparatus 500 includes: an obtaining module 510, a first determining module 520, a screening module 530, and a second determining module 540. The first determining module 520 includes:

a first determining unit 521, configured to perform word segmentation on each option text and determine a word vector for each word obtained by the segmentation;

a second determining unit 522, configured to determine a first weight for each word based on the number of times the word appears in the target text to which the target question text belongs and the number of times it appears in all target texts, where each target text includes at least one target question text;

a third determining unit 523, configured to determine, for each option text, a text vector corresponding to the option text based on the word vector and the first weight of each word in the option text.

In some embodiments of the present application, when determining the word vector corresponding to each vocabulary obtained by the word segmentation process, the first determining unit 521 is specifically configured to:

In some embodiments of the present application, the third determining unit 523 is specifically configured to:

In some embodiments of the present application, the second determining unit 522 is specifically configured to:

In some embodiments of the present application, the screening module 530 includes:

a clustering unit 531, configured to cluster all the text vectors and select, from the plurality of cluster groups obtained by the clustering, the N cluster groups containing the largest numbers of text vectors as N target cluster groups, where N is a positive integer;

a fourth determining unit 532, configured to determine, for each target cluster group, the cluster center vector of the target cluster group based on all the text vectors in the group;

a screening unit 533, configured to select target words from the words corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector of each target cluster group.

In some embodiments of the present application, the screening unit 533 is specifically configured to:

for each text vector in each target clustering group, determining a second weight corresponding to the text vector based on the text vector and a clustering center vector corresponding to the target clustering group; determining a third weight of each vocabulary corresponding to the text vector based on the second weight corresponding to the text vector; and screening target words from all words corresponding to each text vector in the N target clustering groups based on the determined third weight of each word.

In some embodiments of the present application, when determining the second weight corresponding to the text vector based on the cluster center vector corresponding to the text vector and the target cluster group, the screening unit 533 is specifically configured to:

In some embodiments of the present application, when selecting the target words from all the words corresponding to the text vectors in the N target cluster groups based on the determined third weight of each word, the screening unit 533 is specifically configured to:

In some embodiments of the present application, the fourth determining unit 532 is specifically configured to:

An embodiment of the present application discloses an electronic device, as shown in fig. 6, including: a processor 601, a memory 602, and a bus 603, wherein the memory 602 stores machine-readable instructions executable by the processor 601, and when the electronic device is operated, the processor 601 and the memory 602 communicate via the bus 603.

The machine readable instructions, when executed by the processor 601, perform the steps of the text updating method of:

respectively determining a text vector corresponding to each option text;

In some embodiments, the processor 601 is specifically configured to, when determining the text vector corresponding to each option text respectively:

In some embodiments, the processor 601 is specifically configured to perform, when determining the word vector corresponding to each vocabulary obtained by the word segmentation process:

In some embodiments, the processor 601 is specifically configured to, when determining the text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary, perform:

In some embodiments, the processor 601 is specifically configured to perform, when determining the first weight corresponding to each vocabulary respectively based on the number of times that each vocabulary appears in the target text to which the target question text belongs and the number of times that each vocabulary appears in all target texts:

In some embodiments, when clustering the text vectors to obtain the text vectors included in each of a plurality of cluster groups and the number of text vectors included in each cluster group, and on that basis selecting target words from the words corresponding to the text vectors, the processor 601 is specifically configured to perform:

In some embodiments, the processor 601 is specifically configured to perform, when filtering target vocabularies from vocabularies corresponding to text vectors included in all target cluster groups based on all text vectors included in each target cluster group and a cluster center vector corresponding to each target cluster group, the following steps:

In some embodiments, the processor 601 is specifically configured to, when determining the second weight corresponding to the text vector based on the cluster center vector corresponding to the text vector and the target cluster group, perform:

In some embodiments, the processor 601 is specifically configured to perform, when filtering the target vocabulary from all vocabularies corresponding to each text vector in the N target cluster groups based on the determined third weight of each vocabulary, the following steps:

In some embodiments, when determining the cluster center vector corresponding to the target cluster group based on all the text vectors in the target cluster group, the processor 601 is specifically configured to perform:

The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the text updating method in any of the above embodiments.

The embodiment of the present application further provides a computer program product, which includes a computer-readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the text updating method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and is not described herein again.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description covers only specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A text updating method, comprising:

respectively determining a text vector corresponding to each option text;

2. The method of claim 1, wherein the determining the text vector corresponding to each option text separately comprises:

3. The method according to claim 2, wherein the determining a word vector corresponding to each vocabulary obtained by the word segmentation process comprises:

4. The method of claim 2, wherein determining the text vector corresponding to the option text based on the word vector corresponding to each vocabulary in the option text and the first weight corresponding to each vocabulary comprises:

5. The text updating method according to claim 1, wherein clustering the text vectors to obtain the text vectors included in each of a plurality of cluster groups and the number of text vectors included in each cluster group, and on that basis selecting target words from the words corresponding to the text vectors, comprises:

6. The text updating method according to claim 5, wherein the screening of the target vocabularies from the vocabularies corresponding to the text vectors included in all the target cluster groups based on all the text vectors included in each target cluster group and the cluster center vector corresponding to each target cluster group comprises:

7. The method of claim 6, wherein determining the second weight corresponding to the text vector based on the cluster center vector corresponding to the text vector and the target cluster group comprises:

8. The method of claim 6, wherein selecting the target words from all the words corresponding to the text vectors in the N target cluster groups based on the determined third weight of each word comprises:

9. A text updating apparatus, comprising:

10. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the text updating method according to any one of claims 1 to 8.

11. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the text updating method according to any one of claims 1 to 8.