WO2021159814A1

WO2021159814A1 - Text data error detection method and apparatus, terminal device, and storage medium

Info

Publication number: WO2021159814A1
Application number: PCT/CN2020/132478
Authority: WO
Inventors: 朱昭苇 (Zhu Zhaowei); 孙行智 (Sun Xingzhi); 胡岗 (Hu Gang)
Original assignee: 平安科技（深圳）有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date: 2020-09-28
Filing date: 2020-11-27
Publication date: 2021-08-19
Also published as: CN111883222B; CN111883222A

Abstract

Disclosed are a text data error detection method and apparatus, a terminal device, and a storage medium, applicable to digital healthcare. The method comprises: acquiring text data to be verified from any data source, wherein the text data to be verified comprises state description data of a target object and state determination data with regard to the target object (S101); acquiring a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network in order to output a second feature vector by means of the generator (S102), wherein the generator is obtained by means of adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network; and acquiring a third feature vector corresponding to the state determination data, and determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data (S103). The method can improve the accuracy of text data detection and has high applicability.

Description

Text data error detection method and apparatus, terminal device, and storage medium

This application claims priority to Chinese Patent Application No. 202011042326.2, filed with the Chinese Patent Office on September 28, 2020 and entitled "Text data error detection method, device, terminal equipment and storage medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of data processing, and in particular to a text data error detection method and apparatus, a terminal device, and a storage medium.

Background art

As an enterprise develops, it generates many types of text data. Quality control of important text data (referred to below simply as quality control) helps the enterprise improve its construction and management. For example, diagnostic quality control of medical record data is an important part of hospital management and construction, and it is valuable for evaluating doctors and tracing events. Diagnostic quality control generally covers misdiagnosis and missed diagnosis. From the perspective of hospitals and doctors, detecting misdiagnosis matters more for keeping the hospital operating normally. However, China has a very large population, and the number of medical visits far exceeds the world average. Diagnostic quality control over such a large volume of medical record data is usually performed by manual sampling, which is inefficient and time-consuming. The prior art therefore also proposes performing diagnostic quality control with a model. However, the inventors realized that such methods train the model only on the hospital's own data. The resulting model cannot be transferred effectively to other hospitals, so it generalizes poorly and its detection accuracy is low.

Technical problem

The embodiments of this application provide a text data error detection method and apparatus, a terminal device, and a storage medium, which can improve the accuracy of text data detection and have high applicability.

Technical solutions

In a first aspect, an embodiment of this application provides a text data error detection method. The method includes: acquiring text data to be verified from any data source, where the text data to be verified includes state description data of a target object and state determination data for the target object; acquiring a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, where the generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is trained on sample text data from one of the at least two data sources; and acquiring a third feature vector corresponding to the state determination data, and determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.

In a second aspect, an embodiment of this application provides a text data error detection apparatus. The apparatus includes: a data acquisition module, configured to acquire text data to be verified from any data source, where the text data to be verified includes state description data of a target object and state determination data for the target object; a data processing module, configured to acquire a first feature vector corresponding to the state description data and input the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, where the generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is trained on sample text data from one of the at least two data sources; and a data detection module, configured to acquire a third feature vector corresponding to the state determination data and determine, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.

In a third aspect, an embodiment of this application provides a terminal device. The terminal device includes a processor and a memory connected to each other. The memory is configured to store a computer program that includes program instructions, and the processor is configured to invoke the program instructions to perform the following method: acquiring text data to be verified from any data source, where the text data to be verified includes state description data of a target object and state determination data for the target object; acquiring a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, where the generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is trained on sample text data from one of the at least two data sources; and acquiring a third feature vector corresponding to the state determination data, and determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.

In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to perform the following method: acquiring text data to be verified from any data source, where the text data to be verified includes state description data of a target object and state determination data for the target object; acquiring a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, where the generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is trained on sample text data from one of the at least two data sources; and acquiring a third feature vector corresponding to the state determination data, and determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.

Beneficial effects

The embodiments of this application can improve the detection accuracy of text data and have strong applicability.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a method for detecting errors in text data provided by an embodiment of the present application.

Fig. 2 is a schematic diagram of a scenario of medical record data provided by an embodiment of the present application.

FIG. 3 is another schematic flowchart of the method for detecting errors in text data provided by an embodiment of the present application.

Fig. 4 is a schematic diagram of the framework of a generative adversarial network and a data pair matching model provided by an embodiment of the present application.

FIG. 5 is a schematic structural diagram of an error detection device for text data provided by an embodiment of the present application.

FIG. 6 is a schematic diagram of another structure of an error detection device for text data provided by an embodiment of the present application.

Fig. 7 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.

Embodiments of the present invention

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

The technical solution of this application can be applied to artificial intelligence, smart cities, digital healthcare, blockchain, and/or big data technology to perform text detection. Optionally, data involved in this application, such as text, vectors, and/or determination results, can be stored in a database or in a blockchain (for example, through distributed storage on a blockchain); this application imposes no limitation on this.

For example, the text data error detection method provided in the embodiments of this application (hereinafter, the method provided in the embodiments of this application) can be widely applied in any of many fields, such as medical treatment, investment, and insurance. The method acquires text data to be verified from any data source, where the text data to be verified can include state description data of a target object and state determination data for the target object. The first feature vector corresponding to the state description data is acquired and input into the generator of a generative adversarial network, which outputs a second feature vector. A third feature vector corresponding to the state determination data is then acquired. Whether the state determination data is erroneous is determined according to the second and third feature vectors. The generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is trained on sample text data from one of the at least two data sources. The embodiments of this application can therefore improve the detection accuracy of text data and have strong applicability.

The methods and related devices provided by the embodiments of the present application will be described in detail below with reference to FIGS. 1 to 7 respectively.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of a method for detecting errors in text data according to an embodiment of the present application. The method provided in the embodiment of the present application may include the following steps S101 to S103.

S101. Acquire text data to be verified from any data source, where the text data to be verified includes state description data of the target object and state determination data for the target object.

In some feasible implementations, text data to be verified is acquired from any data source, and it may include state description data of the target object and state determination data for the target object. The data source of the text data to be verified differs between application fields. For example, in the medical field, the text data to be verified may include medical record data, and the data source of the medical record data may be a hospital. In that case, the state description data for the target object may be the patient's condition description data in the medical record, and the state determination data for the target object may be the doctor's diagnosis data for the patient's condition. The condition description data can include the chief complaint, the history of present illness, and so on, which is not limited here. As another example, in the insurance field, the text data to be verified may include insurance data, and the data source of the insurance data may be an insurance company. In that case, the state description data for the target object may be the applicant's insurance requirement data, and the state determination data for the target object may be the insurance plan that the insurance agent customizes for the policyholder, and so on. For convenience of description, the following embodiments of this application take the medical field as an example.

Refer to FIG. 2, which is a schematic diagram of a scenario of medical record data provided by an embodiment of this application. As shown in FIG. 2, the medical record data can include the patient's name, gender, and age, the department visited, the date of visit, the attending doctor, the chief complaint, the history of present illness, and the diagnosis result. The chief complaint and history of present illness are extracted from the medical record data and used as the patient's condition description data, and the diagnosis result is extracted and used as the patient's condition diagnosis data.
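The field split described for FIG. 2 can be sketched as follows. This is a minimal illustration that assumes a record type with hypothetical field names mirroring the items listed in FIG. 2. The chief complaint and history of present illness become the state description data, and the diagnosis result becomes the state determination data.

```python
from dataclasses import dataclass


@dataclass
class MedicalRecord:
    # Hypothetical field names mirroring the items shown in FIG. 2.
    name: str
    gender: str
    age: int
    department: str
    visit_date: str
    doctor: str
    chief_complaint: str
    present_illness: str
    diagnosis: str


def split_record(record: MedicalRecord) -> tuple[str, str]:
    """Return (condition description data, condition diagnosis data)."""
    # State description data: chief complaint plus history of present illness.
    description = f"{record.chief_complaint} {record.present_illness}".strip()
    # State determination data: the doctor's diagnosis result.
    determination = record.diagnosis.strip()
    return description, determination
```

Identifying fields such as name, age, and department are deliberately left out of both outputs, because neither the description nor the determination data uses them in the steps that follow.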

S102. Obtain a first feature vector corresponding to the state description data, and input the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector.

In some feasible implementations, the first feature vector corresponding to the state description data is acquired and input into the generator of the generative adversarial network, and the generator outputs the second feature vector. The generator can be obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, where each discriminator is trained on sample text data from one of the at least two data sources. For example, assume that the at least two data sources include a first data source and a second data source, and that the at least two discriminators include a first discriminator and a second discriminator. The generator can then be obtained by adversarial training on sample text data from the first and second data sources against the first and second discriminators. The first discriminator is trained on sample text data from the first data source, and the second discriminator is trained on sample text data from the second data source. In the medical field, the at least two data sources may be at least two hospitals in the same region or in different regions, depending on the actual application scenario; this is not limited here.

S103. Acquire a third feature vector corresponding to the state determination data, and determine, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.

In some feasible implementations, the third feature vector corresponding to the state determination data is acquired, and whether the state determination data is erroneous is determined according to the second and third feature vectors. Specifically, the second feature vector and the third feature vector may be input into a data pair matching model, and whether the state determination data is erroneous is determined based on the model's output. The data pair matching model can be trained on at least one sample data pair and a matching label for each pair. Each sample data pair includes a fourth feature vector corresponding to the state description data in a piece of sample text data and a fifth feature vector corresponding to the state determination data in that sample text data. The matching label of a sample data pair indicates whether its fourth and fifth feature vectors match. In other words, the feature vector of the state description data and the feature vector of the state determination data are input into the matching model, and the model determines whether the two match. When the state description data does not match the state determination data, the state determination data can be regarded as erroneous data.
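The source does not specify the architecture of the data pair matching model. The sketch below substitutes plain logistic regression as a stand-in. It assumes that a pair is represented by the element-wise product and absolute difference of its two vectors, and it trains on labelled pairs (matching label 1 means match). `PairMatcher`, `pair_features`, and the 0.5 threshold are illustrative names and values, not taken from the source.

```python
import numpy as np


def pair_features(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Assumed pair representation: element-wise product and absolute difference.
    return np.concatenate([u * v, np.abs(u - v)], axis=-1)


class PairMatcher:
    """Logistic-regression stand-in for the data pair matching model."""

    def __init__(self, dim: int):
        self.w = np.zeros(2 * dim)
        self.b = 0.0

    def predict_proba(self, u, v):
        # Probability that the description vector u and determination vector v match.
        z = pair_features(u, v) @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, u, v, labels, lr=0.1, epochs=500):
        # u: fourth feature vectors, v: fifth feature vectors, labels: 1 = match.
        x = pair_features(u, v)
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))
            grad = p - labels  # gradient of binary cross-entropy w.r.t. the logits
            self.w -= lr * x.T @ grad / len(labels)
            self.b -= lr * grad.mean()
        return self

    def is_error(self, second_vec, third_vec, threshold=0.5) -> bool:
        # Flag the determination data as erroneous when the pair is unlikely to match.
        return bool(self.predict_proba(second_vec, third_vec) < threshold)
```

With synthetic data, a matched pair can be simulated as a slightly perturbed copy of the description vector, and a mismatched pair as an unrelated vector. A real system would train on labelled medical record pairs instead.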

In the embodiments of this application, text data to be verified is acquired from any data source, and it includes state description data of a target object and state determination data for the target object. The first feature vector corresponding to the state description data is acquired and input into the generator of the generative adversarial network, which outputs the second feature vector. The third feature vector corresponding to the state determination data is then acquired, and whether the state determination data is erroneous is determined according to the second and third feature vectors. The generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is trained on sample text data from one of the at least two data sources. The embodiments of this application can therefore improve the detection accuracy of text data and have strong applicability.

Refer to FIG. 3, which is another schematic flowchart of the text data error detection method provided by an embodiment of this application. The method can be implemented through the steps described below, starting from S201.

S201. Obtain a training sample set, construct a first discriminator based on sample text data from a first data source in the training sample set, and construct a second discriminator based on sample text data from a second data source in the training sample set.

In some feasible implementations, a training sample set is acquired, and it may include sample text data from at least two data sources. Sample text data from one data source can be used to construct one discriminator. For example, a first discriminator can be constructed from the sample text data of a first data source in the training sample set, a second discriminator from the sample text data of a second data source, a third discriminator from the sample text data of a third data source, and so on, depending on the actual application scenario; this is not limited here. The number of data sources in the training sample set may be greater than or equal to the number of discriminators constructed. For illustration, the following embodiments assume that the training sample set includes two data sources (the first data source and the second data source) and that the constructed discriminators are the first discriminator and the second discriminator.
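The one-discriminator-per-source rule can be sketched as a grouping step. This sketch assumes that each sample is a dict with a `"source"` key. It also assumes a caller-supplied `train_fn` that builds a discriminator from one source's samples; both are hypothetical. The optional `sources` argument reflects that there may be more data sources than discriminators.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional


def build_discriminators(samples: Iterable[dict],
                         train_fn: Callable[[list], object],
                         sources: Optional[list] = None) -> dict:
    """Train one discriminator per selected data source on that source's samples only."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["source"]].append(sample)
    # By default use every source; otherwise only the chosen subset.
    selected = sources if sources is not None else sorted(groups)
    missing = [src for src in selected if not groups.get(src)]
    if missing:
        raise ValueError(f"no samples for sources: {missing}")
    return {src: train_fn(groups[src]) for src in selected}
```

Passing `len` as `train_fn` is a quick way to check the grouping: it returns the number of samples that each discriminator would see.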

In the medical field, the first discriminator and the second discriminator may each be a disease classification model (referred to below as the first disease classification model and the second disease classification model). The sample text data from the first data source in the training sample set is used to train the model parameters of the first disease classification model. The sample text data from the second data source is used to train the model parameters of the second disease classification model. Each disease classification model may include a convolutional neural network (CNN), a fully connected layer, and a softmax layer. The CNN includes multiple convolutional layers and multiple pooling layers. The convolution kernel size of each convolutional layer can be set according to the actual application scenario, and each pooling layer may be a max pooling layer or an average pooling layer; this is not limited here. The feature vector corresponding to the condition description data in a piece of sample text data from the first or second data source is input into the disease classification model. After the vector passes through the CNN, the fully connected layer, and the softmax layer in sequence, the softmax layer outputs a probability for each disease, that is, a disease probability distribution. A loss function is computed between this distribution and the disease classification label corresponding to the condition diagnosis data in each piece of sample text data. The model parameters are adjusted iteratively on that loss until a disease classification model that satisfies the convergence condition is obtained.
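The CNN, fully connected layer, and softmax pipeline can be sketched as a NumPy forward pass with a cross-entropy loss. Several simplifications apply. The sketch uses a single convolutional layer with ReLU and max pooling over time, whereas the source describes multiple convolutional and pooling layers. The weights are random, and no parameter updates are shown; in practice those would come from a deep learning framework's automatic differentiation. `DiseaseClassifierSketch` is a hypothetical name.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class DiseaseClassifierSketch:
    """Forward pass only: 1-D convolution -> ReLU -> max pooling -> fully connected -> softmax."""

    def __init__(self, embed_dim, n_filters, kernel_size, n_diseases, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = rng.normal(scale=0.1, size=(n_filters, kernel_size, embed_dim))
        self.fc_w = rng.normal(scale=0.1, size=(n_filters, n_diseases))
        self.fc_b = np.zeros(n_diseases)
        self.k = kernel_size

    def forward(self, words):
        # words: (seq_len, embed_dim) word vectors of the condition description data.
        seq_len = words.shape[0]
        if seq_len < self.k:
            raise ValueError("sequence shorter than the convolution kernel")
        windows = np.stack([words[i:i + self.k] for i in range(seq_len - self.k + 1)])
        # Each filter is applied to every window: (n_windows, k, d) x (f, k, d) -> (n_windows, f).
        conv = np.einsum("wkd,fkd->wf", windows, self.kernel)
        pooled = np.maximum(conv, 0.0).max(axis=0)  # ReLU, then max pooling over time
        return softmax(pooled @ self.fc_w + self.fc_b)  # disease probability distribution


def cross_entropy(probs, label):
    # Loss between the predicted distribution and the disease classification label.
    return -np.log(probs[label] + 1e-12)
```

A usage run feeds a random (10, 16) word-vector matrix and gets back a probability vector over five hypothetical diseases, together with its loss against label 2.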

Optionally, in some feasible implementations, the first discriminator and the second discriminator may instead be a first classification parameter and a second classification parameter in the disease classification model. The first classification parameter is trained on the sample feature vectors corresponding to sample data from the first data source and the corresponding sample classification results. The second classification parameter is trained on the sample feature vectors corresponding to sample data from the second data source and the corresponding sample classification results.

S202. Construct a generator based on each piece of sample text data in the training sample set and on the first discriminator and the second discriminator in the generative adversarial network.

In some feasible implementations, the state description data in each piece of sample text data in the training sample set is acquired. The first state description feature vector corresponding to that data is input into the generator, which outputs a second state description feature vector. The second state description feature vector is then input separately into the first discriminator and the second discriminator. This yields a first determination result probability distribution output by the first discriminator and a second determination result probability distribution output by the second discriminator. The model parameters of the generator are adjusted according to the two probability distributions until a generator that satisfies the convergence condition is obtained.
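The source states that the generator's parameters are adjusted according to the two discriminators' output distributions, but it does not give the loss. The sketch below assumes the objective is the summed standard deviation of each frozen discriminator's output distribution, which mirrors the convergence criterion of this step. It also assumes a one-layer tanh generator and linear-softmax discriminators, and it uses finite-difference gradient descent to stay dependency-free. All names are hypothetical. On its own, this toy objective admits a trivial all-zero generator output, so a practical system would need further terms that the source does not specify. A full adversarial scheme would also alternate discriminator updates.

```python
import numpy as np


def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def generator(params, x):
    # Hypothetical one-layer generator: first state description vectors -> second vectors.
    return np.tanh(x @ params)


def generator_loss(params, x, disc_weights):
    # Assumed objective: each discriminator's per-sample output distribution should be
    # flat, measured by the mean standard deviation of its class probabilities.
    h = generator(params, x)
    return sum(softmax_rows(h @ w).std(axis=1).mean() for w in disc_weights)


def train_generator(params, x, disc_weights, lr=0.05, steps=50, eps=1e-5):
    """Adjust generator parameters only; the discriminators stay frozen in this sketch."""
    params = params.copy()
    for _ in range(steps):
        grad = np.zeros_like(params)
        # Central finite differences approximate the gradient of the loss.
        for idx in np.ndindex(params.shape):
            old = params[idx]
            params[idx] = old + eps
            up = generator_loss(params, x, disc_weights)
            params[idx] = old - eps
            down = generator_loss(params, x, disc_weights)
            params[idx] = old
            grad[idx] = (up - down) / (2 * eps)
        params -= lr * grad
    return params
```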

Specifically, a first standard deviation is obtained by computing the standard deviation of the determination result probabilities in the first determination result probability distribution. A second standard deviation is obtained in the same way from the second determination result probability distribution. When both the first and second standard deviations are less than or equal to a preset standard deviation threshold, the generator with the adjusted model parameters is determined to satisfy the convergence condition. In other words, the test is met when the first and second discriminators each output roughly the same probability for every disease. The feature vectors output by the generator can then be considered relatively pure: the generator has learned information from multiple data sources without being contaminated by information specific to any single data source.
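The convergence test reduces to two standard deviations compared against one threshold. This sketch assumes the population standard deviation (`np.std` with its default `ddof=0`); the source does not say which variant is used, and the threshold value is left to the caller.

```python
import numpy as np


def generator_converged(first_probs, second_probs, std_threshold):
    """True when both discriminators' result distributions are nearly flat."""
    first_std = float(np.std(first_probs))    # first standard deviation
    second_std = float(np.std(second_probs))  # second standard deviation
    return first_std <= std_threshold and second_std <= std_threshold
```

For example, `generator_converged([0.25] * 4, [0.26, 0.24, 0.25, 0.25], 0.01)` returns `True`, because the standard deviations are 0 and about 0.007. A peaked distribution such as `[0.7, 0.1, 0.1, 0.1]` has a standard deviation of about 0.26, so it fails the same threshold.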

S203. Acquire text data to be verified from any data source, where the text data to be verified includes state description data of the target object and state determination data for the target object.

In some feasible implementations, the generator and the discriminators in the generative adversarial network are first adversarially trained on sample text data from at least two data sources. Text data to be verified can then be acquired from any data source and checked for errors. That data source may be any one of the at least two data sources included in the training sample set, or it may be a data source different from all of them. When it is one of the data sources in the training sample set, the text data to be verified is new text data, that is, text data not used as a training sample. For example, in the medical field, the text data to be verified may include medical record data, and the data source of the medical record data may be a hospital. In that case, the state description data for the target object may be the patient's condition description data in the medical record, and the state determination data for the target object may be the doctor's diagnosis data for the patient's condition. The condition description data can include the chief complaint, the history of present illness, and so on, which is not limited here. As another example, in the insurance field, the text data to be verified may include insurance data, and the data source of the insurance data may be an insurance company. In that case, the state description data for the target object may be the applicant's insurance requirement data, and the state determination data for the target object may be the insurance plan that the insurance agent customizes for the policyholder, and so on. For convenience of description, the following embodiments of this application take the medical field as an example.

For example, suppose the training sample set includes sample medical record data x from hospital a (for example, hospital a's medical record data for 2019) and sample medical record data y from hospital b (for example, hospital b's medical record data for 2019). After the corresponding generator and discriminators have been trained on x and y, new medical record data from hospital a can be acquired as the text data to be verified. Examples are the medical records of one or more patients who visited hospital a in 2020, or of one or more patients who visited hospital a in 2018. Alternatively, medical record data from hospital c can be acquired as the text data to be verified. Examples are the medical records of one or more patients who visited hospital c in 2019 or in 2020. The specifics depend on the actual application scenario and are not limited here.

S204. Obtain a first feature vector corresponding to the state description data, and input the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector.

Specifically, word segmentation is performed on the state description data in the text data to be verified to obtain the words that make it up. The word vector of each of these words is acquired, and the first feature vector corresponding to the state description data is generated from these word vectors. For example, when the text data to be verified includes medical record data, the state description data for the target object may include the patient's condition description data, and the state determination data for the target object may include the patient's condition diagnosis data. Word segmentation is therefore performed on the condition description data to obtain the words that make it up. The word vector of each word is acquired, and the first feature vector corresponding to the condition description data is generated from these word vectors. A preset word vector lookup table can be used to determine each word's vector. The lookup table contains multiple word indexes and the word vector corresponding to each index, and each word corresponds to one word index. The vector of each word in the condition description data can therefore be looked up in the table by that word's index. Summing these word vectors, or computing their weighted sum, then yields the first feature vector corresponding to the state description data.
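The lookup-and-sum step can be sketched as follows. Chinese medical text would need a real word segmenter (jieba is a common choice). Here, whitespace splitting stands in for it as an assumption. The sketch also assumes that words missing from the lookup table are skipped, and that optional per-word weights default to 1.0; the source does not state how weights are chosen.

```python
import numpy as np


def segment(text):
    # Stand-in for a real word segmenter; whitespace splitting is an assumption.
    return text.split()


def feature_vector(words, word_index, table, weights=None):
    """Sum (or weighted-sum) the looked-up word vectors of the given words."""
    rows, coeffs = [], []
    for word in words:
        idx = word_index.get(word)  # one word maps to one word index
        if idx is None:
            continue  # assumed policy: ignore words missing from the lookup table
        rows.append(table[idx])
        coeffs.append(1.0 if weights is None else weights.get(word, 1.0))
    if not rows:
        return np.zeros(table.shape[1])
    # With all coefficients equal to 1.0 this is a plain sum; otherwise a weighted sum.
    return np.asarray(coeffs) @ np.stack(rows)
```

With `word_index = {"cough": 0, "fever": 1}` and table rows `[1, 0]` and `[0, 2]`, the text `"cough fever"` maps to `[1, 2]`. Weighting `"fever"` by 0.5 gives `[1, 1]`.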

Optionally, in some feasible implementations, after word segmentation is performed on the state description data to obtain the multiple words that make it up, the stop words among these words may first be eliminated. The word vectors corresponding to the remaining words are then obtained, and the feature vector determined from those word vectors is used as the first feature vector corresponding to the state description data. The eliminated stop words may include modal particles, adverbs, prepositions, conjunctions, and the like, as determined by the actual application scenario; this is not limited here.
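A minimal sketch of stop-word elimination before the feature vector is built, assuming English tokens for readability. The stop-word list, the direct word-to-vector table (a simplification of the index-based lookup table), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical stop-word list; the application mentions modal particles,
# adverbs, prepositions and conjunctions, chosen per application scenario.
STOP_WORDS: Set[str] = {"and", "or", "with", "very", "of"}


def remove_stop_words(words: Sequence[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Return the segmented words with stop words removed, in original order."""
    return [word for word in words if word not in stop_words]


def remaining_feature_vector(words: Sequence[str], table: Dict[str, List[float]]) -> List[float]:
    """Sum the word vectors of the words that remain after stop-word removal.

    `table` maps each word directly to its vector; words missing from it
    are skipped.
    """
    dim = len(next(iter(table.values())))
    result = [0.0] * dim
    for word in remove_stop_words(words):
        vector = table.get(word)
        if vector is None:
            continue
        for i, value in enumerate(vector):
            result[i] += value
    return result
```

For example, with `table = {"fever": [1.0, 0.0], "cough": [0.0, 1.0], "and": [5.0, 5.0]}`, the call `remaining_feature_vector(["fever", "and", "cough"], table)` returns `[1.0, 1.0]`: "and" is dropped as a stop word even though it has a vector.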

S205. Acquire a third feature vector corresponding to the state determination data, and determine, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.

It should be understood that performing word segmentation on the state determination data included in the text data to be verified yields the multiple words that make up the state determination data. The word vector corresponding to each of these words is then obtained, and the third feature vector corresponding to the state determination data is generated from these word vectors. For example, when the text data to be verified includes medical record data, the state description data for the target object may include the patient's disease description data, and the state determination data for the target object may include the patient's disease diagnosis data. In this case, word segmentation is performed on the disease diagnosis data to obtain the multiple words that make it up, the word vector corresponding to each of those words is obtained, and the third feature vector corresponding to the disease diagnosis data is generated from those word vectors. To determine the word vector of each word, a preset word vector lookup table can be obtained; it includes multiple word indexes and the word vector corresponding to each word index, where each word corresponds to one word index. According to the word index of each of the words constituting the disease diagnosis data, the word vector of that word is looked up in the table. The third feature vector corresponding to the state determination data is then obtained by summing, or computing a weighted sum of, these word vectors.

Optionally, after word segmentation is performed on the state determination data to obtain the multiple words that make it up, the stop words among these words can first be eliminated. The word vectors corresponding to the remaining words are then obtained, and the feature vector determined from those word vectors is used as the third feature vector corresponding to the state determination data. The eliminated stop words may include modal particles, adverbs, prepositions, conjunctions, and the like, as determined by the actual application scenario; this is not limited here.

Specifically, the second feature vector and the third feature vector may be input into a data pair matching model, and whether the state determination data is erroneous data is determined based on the output of the data pair matching model. The data pair matching model may be an end-to-end model. By inputting at least one sample data pair into the end-to-end model, its model parameters can be continuously optimized and adjusted based on its output and the matching label of each sample data pair, until an end-to-end model that satisfies the convergence condition is obtained. For example, a sample data pair may include the feature vector corresponding to the disease description data and the feature vector corresponding to the disease diagnosis data, and the matching label is either 1 or 0: 1 indicates that the disease description data in the pair matches the disease diagnosis data, and 0 indicates that they do not match. It follows that when the output of the matching model indicates a mismatch, the state determination data can be determined to be erroneous data. For example, when the matching model outputs that the disease description data and the disease diagnosis data do not match, the disease diagnosis data can be determined to be erroneous data, that is, misdiagnosis data.
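The application only states that the data pair matching model may be an end-to-end model trained on labeled pairs. The sketch below swaps in plain logistic regression over hand-built pair features (element-wise product, absolute difference, and a bias term), trained by stochastic gradient descent on log loss. The feature design, the 0.5 decision threshold, and all names (`PairMatcher`, `is_erroneous`, and so on) are hypothetical assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pair_features(second_vec: Vector, third_vec: Vector) -> List[float]:
    """Hypothetical pair encoding: element-wise product, absolute difference, bias."""
    product = [a * b for a, b in zip(second_vec, third_vec)]
    difference = [abs(a - b) for a, b in zip(second_vec, third_vec)]
    return product + difference + [1.0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class PairMatcher:
    """Logistic-regression stand-in for the data pair matching model."""

    def __init__(self, dim: int) -> None:
        self.weights = [0.0] * (2 * dim + 1)

    def match_probability(self, second_vec: Vector, third_vec: Vector) -> float:
        x = pair_features(second_vec, third_vec)
        return sigmoid(sum(w * xi for w, xi in zip(self.weights, x)))

    def fit(self, pairs: Sequence[Tuple[Vector, Vector]], labels: Sequence[int],
            lr: float = 0.5, epochs: int = 500) -> "PairMatcher":
        """SGD on log loss; label 1 = matching pair, 0 = non-matching pair."""
        for _ in range(epochs):
            for (desc_vec, diag_vec), label in zip(pairs, labels):
                x = pair_features(desc_vec, diag_vec)
                error = self.match_probability(desc_vec, diag_vec) - label
                # Gradient of the log loss w.r.t. each weight is error * feature.
                self.weights = [w - lr * error * xi for w, xi in zip(self.weights, x)]
        return self


def is_erroneous(matcher: PairMatcher, second_vec: Vector, third_vec: Vector,
                 threshold: float = 0.5) -> bool:
    """Flag the state determination data as erroneous when the pair does not match."""
    return matcher.match_probability(second_vec, third_vec) < threshold
```

Trained on toy pairs in which identical vectors are labeled 1 and swapped vectors 0, the matcher accepts `([1.0, 0.0], [1.0, 0.0])` and flags `([1.0, 0.0], [0.0, 1.0])` as erroneous. A real end-to-end model would learn its own pair representation rather than rely on fixed features.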

For example, refer to FIG. 4, which is a schematic diagram of the framework of a generative adversarial network and a data pair matching model provided by an embodiment of the present application. As shown in FIG. 4, the first discriminator can be constructed based on the sample text data from the first data source in the training sample set, and the second discriminator can be constructed based on the sample text data from the second data source in the training sample set. The generator is then obtained by adversarial training based on each sample text data in the training sample set (for example, the sample text data of the first data source and of the second data source) and the first and second discriminators in the generative adversarial network. Next, text data to be verified is obtained from any data source; it includes the state description data of the target object and the state determination data for the target object. The first feature vector corresponding to the state description data is obtained and input into the generator, which outputs the second feature vector. The third feature vector corresponding to the state determination data is obtained, the second and third feature vectors are input into the data pair matching model, and whether the state determination data is erroneous data is determined based on the model's output.
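The inference path of FIG. 4 (generator, then data pair matching model, then decision) can be sketched as a small composition of callables. This is a hypothetical wiring under stated assumptions: `ErrorDetector`, the 0.5 threshold, and the toy generator and cosine-similarity matcher are illustrative stand-ins, not the trained models of the application.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class ErrorDetector:
    """Hypothetical wiring of the inference path shown in FIG. 4."""

    generator: Callable[[Vector], Vector]       # trained generator: first -> second feature vector
    matcher: Callable[[Vector, Vector], float]  # data pair matching model: match probability
    threshold: float = 0.5                      # assumed decision threshold

    def check(self, first_vec: Vector, third_vec: Vector) -> bool:
        """Return True when the state determination data is flagged as erroneous."""
        second_vec = self.generator(first_vec)
        return self.matcher(second_vec, third_vec) < self.threshold


# Toy stand-ins used only to exercise the wiring (not trained models).
def toy_generator(vec: Vector) -> Vector:
    return [2.0 * v for v in vec]


def cosine_matcher(a: Vector, b: Vector) -> float:
    """Cosine similarity clipped to [0, 1]; returns 0.0 for zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return min(1.0, max(0.0, sum(x * y for x, y in zip(a, b)) / norm))
```

Keeping the generator and matcher as injected callables means either trained component can be replaced without touching the decision logic.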

In the embodiments of the present application, after the training sample set is obtained, the first discriminator can be constructed based on the sample text data from the first data source in the training sample set, and the second discriminator can be constructed based on the sample text data from the second data source in the training sample set. Adversarial training against the two discriminators can then be performed using the sample text data of the at least two data sources in the training sample set to obtain the generator in the generative adversarial network. Text data to be verified, including the state description data of the target object and the state determination data for the target object, is then obtained from any data source. The first feature vector corresponding to the state description data is obtained and input into the generator, which outputs the second feature vector. Further, the third feature vector corresponding to the state determination data is obtained, and whether the state determination data is erroneous data is determined according to the second and third feature vectors. The embodiments of the present application can improve the detection accuracy for text data and have strong applicability.

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an error detection apparatus for text data provided by an embodiment of the present application. The apparatus includes: a data acquisition module 31, configured to acquire text data to be verified from any data source, where the text data to be verified includes the state description data of the target object and the state determination data for the target object; a data processing module 32, configured to obtain the first feature vector corresponding to the state description data and input the first feature vector into the generator in the generative adversarial network to output the second feature vector, where the generator is obtained by adversarial training based on sample text data from at least two data sources and at least two discriminators in the generative adversarial network, each discriminator being obtained by training on the sample text data of one of the at least two data sources; and a data detection module 33, configured to obtain the third feature vector corresponding to the state determination data and determine, according to the second and third feature vectors, whether the state determination data is erroneous data.

Refer also to FIG. 6, which is another schematic structural diagram of the error detection apparatus for text data provided by an embodiment of the present application.

In some feasible implementations, the data detection module 33 is specifically configured to input the second feature vector and the third feature vector into a data pair matching model, and determine, based on the output of the data pair matching model, whether the state determination data is erroneous data.

The data pair matching model is obtained by training based on at least one sample data pair and the matching label of each sample data pair. Each sample data pair includes a fourth feature vector corresponding to the state description data in a sample text data and a fifth feature vector corresponding to the state determination data in that sample text data, and the matching label of a sample data pair identifies whether the fourth feature vector and the fifth feature vector in that pair match.

In some feasible implementations, the at least two data sources include a first data source and a second data source, the at least two discriminators include a first discriminator and a second discriminator, and the apparatus further includes a first training module 34. The first training module 34 is configured to: obtain a training sample set that includes sample text data from the first data source and sample text data from the second data source, where one sample data pair includes the state description data in one sample text data and the state determination label of that state description data; construct the first discriminator based on the sample text data from the first data source in the training sample set; and construct the second discriminator based on the sample text data from the second data source in the training sample set.
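A minimal sketch of constructing one discriminator per data source. The application does not specify the discriminator model, so this swaps in a nearest-centroid classifier whose output is a probability distribution over state determination labels (softmax of the negative squared distance to each label's centroid). The class and function names and the toy labels are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


class CentroidDiscriminator:
    """Hypothetical discriminator trained on the samples of one data source.

    Maps a state description feature vector to a probability distribution over
    state determination labels: softmax of the negative squared distance to
    each label's centroid.
    """

    def __init__(self, samples: Sequence[Tuple[Vector, str]]) -> None:
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = defaultdict(int)
        for vec, label in samples:
            total = sums.setdefault(label, [0.0] * len(vec))
            for i, value in enumerate(vec):
                total[i] += value
            counts[label] += 1
        self.labels = sorted(sums)
        self.centroids = {label: [s / counts[label] for s in sums[label]] for label in self.labels}

    def predict_proba(self, vec: Vector) -> Dict[str, float]:
        scores = [-sum((a - b) ** 2 for a, b in zip(vec, self.centroids[label]))
                  for label in self.labels]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
        total = sum(exps)
        return {label: e / total for label, e in zip(self.labels, exps)}


def build_discriminators(
    training_set: Sequence[Tuple[str, Vector, str]]
) -> Dict[str, CentroidDiscriminator]:
    """Group (source, description vector, determination label) samples by source."""
    by_source: Dict[str, List[Tuple[Vector, str]]] = defaultdict(list)
    for source, vec, label in training_set:
        by_source[source].append((vec, label))
    return {source: CentroidDiscriminator(samples) for source, samples in by_source.items()}
```

Grouping by source before training is the essential step here: each discriminator sees only its own source's samples, so the two can disagree on the same input.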

In some feasible implementations, the apparatus further includes a second training module 35, which includes: a training data obtaining unit 351, configured to obtain the state description data in each sample text data in the training sample set; a training data processing unit 352, configured to input the first state description feature vector corresponding to the state description data in each sample text data into the generator and obtain the second state description feature vector output by the generator; a determination result obtaining unit 353, configured to input the second state description feature vector into the first discriminator and the second discriminator respectively, and obtain the probability distribution of the first determination result output by the first discriminator and the probability distribution of the second determination result output by the second discriminator; and a generator adjustment unit 354, configured to adjust the model parameters of the generator according to the probability distributions of the first and second determination results to obtain a generator that satisfies the convergence condition.
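The adjustment loop above can be sketched under several loudly stated assumptions, since the application does not specify the generator architecture, loss, or optimizer. Here the generator is a hypothetical element-wise affine map, the two discriminators are fixed toy softmax functions, and the generator objective is assumed to be the mean population standard deviation of both discriminators' output distributions (chosen because the convergence condition is stated in terms of those standard deviations). Finite-difference gradient descent keeps the sketch dependency-free; a real system would use automatic differentiation.

```python
import math
import statistics
from typing import Callable, List, Sequence

Vector = List[float]
Discriminator = Callable[[Vector], List[float]]  # returns a probability distribution


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate(params: Vector, x: Vector) -> Vector:
    """Hypothetical generator: element-wise affine map; params = scales then shifts."""
    n = len(x)
    return [params[i] * x[i] + params[n + i] for i in range(n)]


def generator_loss(params: Vector, samples: Sequence[Vector],
                   d1: Discriminator, d2: Discriminator) -> float:
    """Assumed objective: mean std-dev of both discriminators' output distributions."""
    total = 0.0
    for x in samples:
        y = generate(params, x)
        total += statistics.pstdev(d1(y)) + statistics.pstdev(d2(y))
    return total / len(samples)


def train_generator(samples: Sequence[Vector], d1: Discriminator, d2: Discriminator,
                    lr: float = 0.05, steps: int = 300, eps: float = 1e-5) -> Vector:
    dim = len(samples[0])
    params = [1.0] * dim + [0.0] * dim  # start from the identity map
    for _ in range(steps):
        base = generator_loss(params, samples, d1, d2)
        grad = []
        for i in range(len(params)):  # forward-difference estimate of each partial derivative
            bumped = params.copy()
            bumped[i] += eps
            grad.append((generator_loss(bumped, samples, d1, d2) - base) / eps)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Toy discriminators (hypothetical), each mapping a 2-d vector to a 2-class distribution.
def toy_d1(v: Vector) -> List[float]:
    return softmax([v[0], v[1]])


def toy_d2(v: Vector) -> List[float]:
    return softmax([2.0 * v[0], -v[1]])
```

On the toy samples `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, the trained parameters give a lower loss than the identity map. Note that with such fixed toy discriminators the generator can trivially reduce the loss by shrinking its output, which is why the discriminators in a real adversarial setup are trained as well.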

In some feasible implementations, the generator adjustment unit 354 is further configured to: calculate a first standard deviation of the multiple determination result probabilities included in the probability distribution of the first determination result, and a second standard deviation of the multiple determination result probabilities included in the probability distribution of the second determination result; and when both the first standard deviation and the second standard deviation are less than or equal to a preset standard deviation threshold, determine that the generator with the adjusted model parameters satisfies the convergence condition.
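The convergence condition can be expressed as a small check. The application does not say whether the population or sample standard deviation is meant; this sketch assumes the population form (`statistics.pstdev`), and the function name and threshold values are hypothetical.

```python
import statistics
from typing import Sequence


def satisfies_convergence(first_dist: Sequence[float], second_dist: Sequence[float],
                          threshold: float) -> bool:
    """True when both output distributions have std-dev <= threshold.

    Assumption: population standard deviation over the probabilities in each
    determination result probability distribution.
    """
    first_std = statistics.pstdev(first_dist)
    second_std = statistics.pstdev(second_dist)
    return first_std <= threshold and second_std <= threshold
```

For a two-class distribution `[p, 1 - p]` the population standard deviation is `|2p - 1| / 2`, so `[0.48, 0.52]` gives 0.02 and passes a 0.05 threshold, while `[0.9, 0.1]` gives 0.4 and fails it.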

In some feasible implementations, the text data to be verified includes medical record data, the state description data for the target object in the text data to be verified includes the patient's disease description data, and the state determination data for the target object includes the patient's disease diagnosis data.

In some feasible implementations, the data processing module 32 includes a first feature vector acquiring unit 321 and a second feature vector acquiring unit 322. The first feature vector acquiring unit 321 is specifically configured to: perform word segmentation on the disease description data to obtain the multiple words that make up the disease description data; obtain the word vector corresponding to each of these words; and generate the first feature vector corresponding to the disease description data from the word vectors of the words.

In the embodiments of the present application, the error detection apparatus for text data can construct the first discriminator based on the sample text data from the first data source in the training sample set, and construct the second discriminator based on the sample text data from the second data source in the training sample set. Adversarial training against the two discriminators can then be performed using the sample text data of the at least two data sources in the training sample set to obtain the generator in the generative adversarial network. Text data to be verified, including the state description data of the target object and the state determination data for the target object, is then obtained from any data source. The first feature vector corresponding to the state description data is obtained and input into the generator, which outputs the second feature vector. Further, the third feature vector corresponding to the state determination data is obtained, and whether the state determination data is erroneous data is determined according to the second and third feature vectors. The embodiments of the present application can improve the detection accuracy for text data and have strong applicability.

Refer to FIG. 7, which is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 7, the terminal device in this embodiment may include one or more processors 401, a memory 402, and a transceiver 403, connected via a bus 404. The memory 402 is configured to store a computer program that includes program instructions, and the processor 401 is configured to execute the program instructions stored in the memory 402 to perform the following operations: obtaining text data to be verified from any data source, where the text data to be verified includes the state description data of the target object and the state determination data for the target object; obtaining the first feature vector corresponding to the state description data, and inputting the first feature vector into the generator in the generative adversarial network to output the second feature vector, where the generator is obtained by adversarial training based on the sample text data of at least two data sources and at least two discriminators in the generative adversarial network, each discriminator being obtained by training on the sample text data of one of the at least two data sources; and obtaining the third feature vector corresponding to the state determination data, and determining, according to the second and third feature vectors, whether the state determination data is erroneous data.

In some feasible implementations, when determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data, the processor 401 is configured to input the second feature vector and the third feature vector into a data pair matching model and determine, based on the output of the data pair matching model, whether the state determination data is erroneous data. The data pair matching model is obtained by training based on at least one sample data pair and the matching label of each sample data pair; each sample data pair includes a fourth feature vector corresponding to the state description data in a sample text data and a fifth feature vector corresponding to the state determination data, and the matching label of a sample data pair identifies whether the fourth and fifth feature vectors in that pair match.

In some feasible implementations, the at least two data sources include a first data source and a second data source, and the processor 401 is configured to: obtain a training sample set that includes sample text data from the first data source and sample text data from the second data source, where one sample data pair includes the state description data in one sample text data and the state determination label of that state description data; construct the first discriminator based on the sample text data from the first data source in the training sample set; and construct the second discriminator based on the sample text data from the second data source in the training sample set.

In some feasible implementations, the processor 401 is configured to: obtain the state description data in each sample text data in the training sample set; input the first state description feature vector corresponding to the state description data in each sample text data into the generator, and obtain the second state description feature vector output by the generator; input the second state description feature vector into the first discriminator and the second discriminator respectively, and obtain the probability distribution of the first determination result output by the first discriminator and the probability distribution of the second determination result output by the second discriminator; and adjust the model parameters of the generator according to the probability distributions of the first and second determination results to obtain a generator that satisfies the convergence condition.

In some feasible implementations, the processor 401 is configured to: calculate a first standard deviation of the multiple determination result probabilities included in the probability distribution of the first determination result, and a second standard deviation of the multiple determination result probabilities included in the probability distribution of the second determination result; and when both the first standard deviation and the second standard deviation are less than or equal to a preset standard deviation threshold, determine that the generator with the adjusted model parameters satisfies the convergence condition.

In some feasible implementations, the processor 401 is configured to: perform word segmentation on the disease description data to obtain the multiple words that make up the disease description data; obtain the word vector corresponding to each of these words; and generate the first feature vector corresponding to the disease description data from the word vectors of the words.

It should be understood that, in some feasible implementations, the processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The memory 402 may include a read-only memory and a random access memory, and provides instructions and data to the processor 401. Part of the memory 402 may also include a non-volatile random access memory. For example, the memory 402 may also store device type information.

In a specific implementation, the terminal device can perform, through its built-in functional modules, the implementations provided in the steps of FIG. 1 to FIG. 3. For details, refer to the implementations provided in those steps; they are not repeated here.

In this embodiment of the application, the terminal device may construct a first discriminator based on sample text data from a first data source in the training sample set, and construct a second discriminator based on sample text data from a second data source in the training sample set. Adversarial training against the two discriminators can then be performed using the sample text data of the at least two data sources in the training sample set to obtain the generator in the generative adversarial network. Text data to be verified, including the state description data of the target object and the state determination data for the target object, is then obtained from any data source. The first feature vector corresponding to the state description data is obtained and input into the generator, which outputs the second feature vector. Further, the third feature vector corresponding to the state determination data is obtained, and whether the state determination data is erroneous data is determined according to the second and third feature vectors. The embodiments of the present application can improve the detection accuracy for text data and have high applicability.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, implement the error detection method for text data provided in the steps of FIG. 1 to FIG. 3. For details, refer to the implementations provided in those steps; they are not repeated here. Optionally, the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.

The computer-readable storage medium may be an internal storage unit of the error detection apparatus for text data or of the terminal device provided in any of the foregoing embodiments, such as a hard disk or memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. It is used to store the computer program and other programs and data required by the electronic device, and can also be used to temporarily store data that has been output or is to be output.

The terms "first", "second", "third", "fourth", and the like in the claims, specification, and drawings of this application are used to distinguish different objects rather than to describe a specific order. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase at various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments. The term "and/or" used in the specification and the appended claims of this application refers to, and includes, any one of and all possible combinations of one or more of the associated listed items. A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed by hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

The methods and related apparatuses provided in the embodiments of the present application are described with reference to the method flowcharts and/or structural schematic diagrams provided in the embodiments of the present application. Each process and/or block in the method flowcharts and/or structural schematic diagrams, and each combination of processes and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the structural schematic diagrams. These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the structural schematic diagrams. These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the structural schematic diagrams.

Claims

1. An error detection method for text data, the method comprising:

Acquiring text data to be verified from any data source, where the text data to be verified includes state description data of the target object and state determination data for the target object;

Obtaining a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network to output a second feature vector through the generator, wherein the generator is obtained by adversarial training based on sample text data of at least two data sources and at least two discriminators in the generative adversarial network, and each discriminator is obtained by training on the sample text data of one of the at least two data sources; and

Acquire a third feature vector corresponding to the state determination data, and determine whether the state determination data is error data according to the second feature vector and the third feature vector.
2. The method according to claim 1, wherein the determining whether the state determination data is error data according to the second feature vector and the third feature vector comprises:

Inputting the second feature vector and the third feature vector to a data pair matching model, and determining whether the state determination data is error data based on the output result of the data pair matching model;

Wherein the data pair matching model is obtained by training based on at least one sample data pair and the matching label of each sample data pair, each sample data pair includes a fourth feature vector corresponding to the state description data in a sample text data and a fifth feature vector corresponding to the state determination data, and the matching label of any sample data pair is used to identify whether the fourth feature vector and the fifth feature vector in that sample data pair match.
3. The method according to claim 1 or 2, wherein the at least two data sources include a first data source and a second data source, the at least two discriminators include a first discriminator and a second discriminator, and before the obtaining of the text data to be verified, the method further comprises:

Obtaining a training sample set, the training sample set comprising sample text data from the first data source and sample text data from the second data source, wherein each sample data pair comprises the state description data in one piece of sample text data and a state determination label for that state description data;

Constructing the first discriminator based on the sample text data from the first data source in the training sample set, and constructing the second discriminator based on the sample text data from the second data source in the training sample set.
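Claim 3 only requires that each discriminator be built from a single source's samples. The sketch below shows that partitioning step. Because the claim does not name a model, it uses a hypothetical nearest-centroid classifier, `CentroidDiscriminator`. This classifier turns negative squared distances to per-label centroids into a softmax distribution over state determination labels. The tuple layout `(source_id, description_vector, determination_label)` and the toy labels are likewise assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


class CentroidDiscriminator:
    """Hypothetical discriminator: softmax over negative squared distances to label centroids."""

    def __init__(self, samples: List[Tuple[Vector, str]]):
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = defaultdict(int)
        for vec, label in samples:
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + x for s, x in zip(sums[label], vec)]
            counts[label] += 1
        self.centroids = {lbl: [s / counts[lbl] for s in total] for lbl, total in sums.items()}

    def predict_distribution(self, vec: Vector) -> Dict[str, float]:
        scores = {
            lbl: -sum((a - b) ** 2 for a, b in zip(vec, centroid))
            for lbl, centroid in self.centroids.items()
        }
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {lbl: math.exp(s - top) for lbl, s in scores.items()}
        total = sum(exps.values())
        return {lbl: e / total for lbl, e in exps.items()}


def build_discriminators(
    training_set: List[Tuple[str, Vector, str]]
) -> Dict[str, CentroidDiscriminator]:
    """Group samples by data source and fit one discriminator per source."""
    by_source: Dict[str, List[Tuple[Vector, str]]] = defaultdict(list)
    for source_id, vec, label in training_set:
        by_source[source_id].append((vec, label))
    return {source_id: CentroidDiscriminator(samples) for source_id, samples in by_source.items()}


training_set = [
    ("source_1", [1.0, 0.0], "diagnosis_a"), ("source_1", [0.0, 1.0], "diagnosis_b"),
    ("source_2", [0.9, 0.1], "diagnosis_a"), ("source_2", [0.1, 0.9], "diagnosis_b"),
]
discriminators = build_discriminators(training_set)
```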
The method according to claim 3, wherein the method further comprises:

Acquiring the state description data in each piece of sample text data in the training sample set;

Inputting a first state description feature vector corresponding to the state description data in each piece of sample text data into the generator, and obtaining a second state description feature vector output by the generator;

Inputting the second state description feature vector into the first discriminator and the second discriminator respectively, and obtaining the probability distribution of the first determination result output by the first discriminator and the probability distribution of the second determination result output by the second discriminator;

Adjusting the model parameters of the generator according to the probability distribution of the first determination result and the probability distribution of the second determination result, to obtain a generator that satisfies a convergence condition.
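Claim 4 leaves the update rule open. The dependency-free sketch below illustrates one possible reading. The generator is a hypothetical linear map, and the two toy discriminators are fixed softmax functions. The generator's weights are moved by finite-difference gradient descent to reduce the spread of each discriminator's output distribution. This objective, the mean standard deviation of the output probabilities, is an assumption borrowed from the convergence test in claim 5. A practical system would more likely train a neural generator with automatic differentiation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]
Discriminator = Callable[[Vector], Sequence[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy, fixed discriminators over two determination results (assumptions).
def first_discriminator(vec: Vector) -> List[float]:
    return softmax([vec[0], vec[1]])


def second_discriminator(vec: Vector) -> List[float]:
    return softmax([vec[0] + vec[1], 0.0])


def apply_generator(weights: Matrix, vec: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def population_std(probs: Sequence[float]) -> float:
    mean = sum(probs) / len(probs)
    return math.sqrt(sum((p - mean) ** 2 for p in probs) / len(probs))


def generator_loss(weights: Matrix, inputs: List[Vector], discs: List[Discriminator]) -> float:
    """Assumed objective: average spread of every discriminator's output distribution."""
    total = 0.0
    for vec in inputs:
        generated = apply_generator(weights, vec)  # second state description feature vector
        for disc in discs:
            total += population_std(disc(generated))
    return total / (len(inputs) * len(discs))


def train_step(weights: Matrix, inputs: List[Vector], discs: List[Discriminator],
               lr: float = 0.1, eps: float = 1e-4) -> Matrix:
    """One finite-difference gradient step on the generator weights."""
    base = generator_loss(weights, inputs, discs)
    updated = [row[:] for row in weights]
    for i, row in enumerate(weights):
        for j in range(len(row)):
            bumped = [r[:] for r in weights]
            bumped[i][j] += eps
            grad = (generator_loss(bumped, inputs, discs) - base) / eps
            updated[i][j] = weights[i][j] - lr * grad
    return updated


inputs = [[1.0, 0.0], [0.0, 1.0]]  # toy first state description feature vectors
discs = [first_discriminator, second_discriminator]
weights = [[1.0, 0.0], [0.0, 1.0]]
initial_loss = generator_loss(weights, inputs, discs)
for _ in range(50):
    weights = train_step(weights, inputs, discs)
final_loss = generator_loss(weights, inputs, discs)
```

In a full adversarial loop, the discriminators would also be updated between generator steps. They are frozen here to keep the example short.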
The method according to claim 4, wherein the method further comprises:

Calculating a first standard deviation of the probabilities of the multiple determination results in the probability distribution of the first determination result, and a second standard deviation of the probabilities of the multiple determination results in the probability distribution of the second determination result;

When the first standard deviation and the second standard deviation are both less than or equal to a preset standard deviation threshold, determining that the generator with the adjusted model parameters satisfies the convergence condition.
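The convergence test of claim 5 reduces to two standard-deviation checks. Below is a minimal sketch. It assumes population standard deviation, since the claim does not say whether population or sample standard deviation is meant. The threshold value is a hypothetical one supplied by the caller.

```python
import statistics
from typing import Sequence


def generator_converged(
    first_distribution: Sequence[float],
    second_distribution: Sequence[float],
    std_threshold: float,
) -> bool:
    """True when neither discriminator's output probabilities spread beyond std_threshold."""
    first_std = statistics.pstdev(first_distribution)
    second_std = statistics.pstdev(second_distribution)
    return first_std <= std_threshold and second_std <= std_threshold


# A near-uniform pair of distributions passes; a confident discriminator output fails.
print(generator_converged([0.34, 0.33, 0.33], [0.5, 0.5], std_threshold=0.05))  # True
print(generator_converged([0.9, 0.1], [0.5, 0.5], std_threshold=0.05))  # False
```

One way to read the criterion: a small standard deviation means a discriminator is assigning nearly equal probability to every determination result. Its output therefore carries little information about the generated vector.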
The method according to claim 1, wherein the text data to be verified comprises medical record data, the state description data for the target object in the text data to be verified comprises disease description data of a patient, and the state determination data for the target object in the text data to be verified comprises disease diagnosis data for the patient.
The method according to claim 6, wherein obtaining the first feature vector corresponding to the state description data comprises:

Performing word segmentation on the disease description data to obtain multiple words constituting the disease description data;

Obtaining a word vector for each of the multiple words constituting the disease description data, and generating the first feature vector corresponding to the disease description data from the word vectors of those words.
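Claim 7 names two operations, word segmentation and word-vector lookup, without fixing the algorithms. The sketch below assumes forward maximum matching against a small hypothetical vocabulary for segmentation. A real system would more likely use a trained Chinese segmenter. The first feature vector is then formed by mean pooling of word vectors drawn from a hypothetical embedding table, and words without a vector are skipped.

```python
from typing import Dict, List, Set

Vector = List[float]


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: take the longest vocabulary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no longer dictionary word matches.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def mean_pool(words: List[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Average the word vectors of known words; return a zero vector if none are known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical vocabulary and embeddings: 头痛 = headache, 发热 = fever, 三天 = three days.
vocab = {"头痛", "发热", "三天"}
embeddings = {"头痛": [1.0, 0.0], "发热": [0.0, 1.0]}
words = forward_max_match("头痛发热三天", vocab)
first_feature_vector = mean_pool(words, embeddings, dim=2)
print(words)  # ['头痛', '发热', '三天']
print(first_feature_vector)  # [0.5, 0.5]
```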
An error detection device for text data, the device comprising:

A data acquisition module for acquiring text data to be verified from any data source, where the text data to be verified includes state description data of the target object and state determination data for the target object;

A data processing module, configured to obtain a first feature vector corresponding to the state description data, and input the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, wherein the generator is obtained by adversarial training, based on sample text data from at least two data sources, against at least two discriminators in the generative adversarial network, each discriminator being trained on the sample text data of one of the at least two data sources;

A data detection module, configured to obtain a third feature vector corresponding to the state determination data, and determine, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.
A terminal device, comprising a processor and a memory connected to each other;

The memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the following method:

Acquiring text data to be verified from any data source, where the text data to be verified includes state description data of the target object and state determination data for the target object;

Obtaining a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, wherein the generator is obtained by adversarial training, based on sample text data from at least two data sources, against at least two discriminators in the generative adversarial network, each discriminator being trained on the sample text data of one of the at least two data sources;

Acquiring a third feature vector corresponding to the state determination data, and determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.
The terminal device according to claim 9, wherein, when determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data, the processor specifically executes:

Inputting the second feature vector and the third feature vector into a data pair matching model, and determining, based on the output of the data pair matching model, whether the state determination data is erroneous data;

Wherein the data pair matching model is trained on at least one sample data pair and a matching label for each sample data pair; each sample data pair comprises a fourth feature vector corresponding to the state description data in a piece of sample text data and a fifth feature vector corresponding to the state determination data in that sample text data, and the matching label of a sample data pair identifies whether the fourth feature vector and the fifth feature vector in that pair match.
The terminal device according to claim 9 or 10, wherein the at least two data sources comprise a first data source and a second data source, and the at least two discriminators comprise a first discriminator and a second discriminator; before the text data to be verified is acquired, the processor is further configured to execute:

Obtaining a training sample set, the training sample set comprising sample text data from the first data source and sample text data from the second data source, wherein each sample data pair comprises the state description data in one piece of sample text data and a state determination label for that state description data;

Constructing the first discriminator based on the sample text data from the first data source in the training sample set, and constructing the second discriminator based on the sample text data from the second data source in the training sample set.
The terminal device according to claim 11, wherein the processor is further configured to execute:

Acquiring the state description data in each piece of sample text data in the training sample set;

Inputting a first state description feature vector corresponding to the state description data in each piece of sample text data into the generator, and obtaining a second state description feature vector output by the generator;

Inputting the second state description feature vector into the first discriminator and the second discriminator respectively, and obtaining the probability distribution of the first determination result output by the first discriminator and the probability distribution of the second determination result output by the second discriminator;

Adjusting the model parameters of the generator according to the probability distribution of the first determination result and the probability distribution of the second determination result, to obtain a generator that satisfies a convergence condition.
The terminal device according to claim 12, wherein the processor is further configured to execute:

Calculating a first standard deviation of the probabilities of the multiple determination results in the probability distribution of the first determination result, and a second standard deviation of the probabilities of the multiple determination results in the probability distribution of the second determination result;

When the first standard deviation and the second standard deviation are both less than or equal to a preset standard deviation threshold, determining that the generator with the adjusted model parameters satisfies the convergence condition.
The terminal device according to claim 9, wherein the text data to be verified comprises medical record data, the state description data for the target object in the text data to be verified comprises disease description data of a patient, and the state determination data for the target object in the text data to be verified comprises disease diagnosis data for the patient;

When obtaining the first feature vector corresponding to the state description data, the processor specifically executes:

Performing word segmentation on the disease description data to obtain multiple words constituting the disease description data;

Obtaining a word vector for each of the multiple words constituting the disease description data, and generating the first feature vector corresponding to the disease description data from the word vectors of those words.
A computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the following method:

Acquiring text data to be verified from any data source, where the text data to be verified includes state description data of the target object and state determination data for the target object;

Obtaining a first feature vector corresponding to the state description data, and inputting the first feature vector into a generator in a generative adversarial network so that the generator outputs a second feature vector, wherein the generator is obtained by adversarial training, based on sample text data from at least two data sources, against at least two discriminators in the generative adversarial network, each discriminator being trained on the sample text data of one of the at least two data sources;

Acquiring a third feature vector corresponding to the state determination data, and determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data.
The computer-readable storage medium according to claim 15, wherein, when determining, according to the second feature vector and the third feature vector, whether the state determination data is erroneous data, the processor specifically executes:

Inputting the second feature vector and the third feature vector into a data pair matching model, and determining, based on the output of the data pair matching model, whether the state determination data is erroneous data;

Wherein the data pair matching model is trained on at least one sample data pair and a matching label for each sample data pair; each sample data pair comprises a fourth feature vector corresponding to the state description data in a piece of sample text data and a fifth feature vector corresponding to the state determination data in that sample text data, and the matching label of a sample data pair identifies whether the fourth feature vector and the fifth feature vector in that pair match.
The computer-readable storage medium according to claim 15 or 16, wherein the at least two data sources comprise a first data source and a second data source, and the at least two discriminators comprise a first discriminator and a second discriminator; before the text data to be verified is acquired, the program instructions, when executed by the processor, further cause the processor to execute:

Obtaining a training sample set, the training sample set comprising sample text data from the first data source and sample text data from the second data source, wherein each sample data pair comprises the state description data in one piece of sample text data and a state determination label for that state description data;

Constructing the first discriminator based on the sample text data from the first data source in the training sample set, and constructing the second discriminator based on the sample text data from the second data source in the training sample set.
The computer-readable storage medium according to claim 17, wherein the program instructions, when executed by the processor, further cause the processor to execute:

Acquiring the state description data in each piece of sample text data in the training sample set;

Inputting a first state description feature vector corresponding to the state description data in each piece of sample text data into the generator, and obtaining a second state description feature vector output by the generator;

Inputting the second state description feature vector into the first discriminator and the second discriminator respectively, and obtaining the probability distribution of the first determination result output by the first discriminator and the probability distribution of the second determination result output by the second discriminator;

Adjusting the model parameters of the generator according to the probability distribution of the first determination result and the probability distribution of the second determination result, to obtain a generator that satisfies a convergence condition.
The computer-readable storage medium according to claim 18, wherein the program instructions, when executed by the processor, further cause the processor to execute:

Calculating a first standard deviation of the probabilities of the multiple determination results in the probability distribution of the first determination result, and a second standard deviation of the probabilities of the multiple determination results in the probability distribution of the second determination result;

When the first standard deviation and the second standard deviation are both less than or equal to a preset standard deviation threshold, determining that the generator with the adjusted model parameters satisfies the convergence condition.
The computer-readable storage medium according to claim 15, wherein the text data to be verified comprises medical record data, the state description data for the target object in the text data to be verified comprises disease description data of a patient, and the state determination data for the target object in the text data to be verified comprises disease diagnosis data for the patient;

When obtaining the first feature vector corresponding to the state description data, the processor specifically executes:

Performing word segmentation on the disease description data to obtain multiple words constituting the disease description data;

Obtaining a word vector for each of the multiple words constituting the disease description data, and generating the first feature vector corresponding to the disease description data from the word vectors of those words.