CN114741494A - Question answering method, device, equipment and medium - Google Patents

Question answering method, device, equipment and medium

Info

Publication number
CN114741494A
Authority
CN
China
Prior art keywords
data
question
answer
text
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210579181.2A
Other languages
Chinese (zh)
Inventor
杨欣怡
赵亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Minglue Zhaohui Technology Co Ltd
Original Assignee
Beijing Minglue Zhaohui Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Minglue Zhaohui Technology Co Ltd
Priority to CN202210579181.2A
Publication of CN114741494A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a question answering method, device, equipment and medium. The method comprises the following steps: when question data input by a user is received, determining answer data corresponding to the question data in a preset database, wherein the question data is data containing numerical questions and answers; normalizing the question data and the answer data to obtain first question data and first answer data of the same dimension; inputting the first question data and the first answer data into a text matching model to generate a matching result; and determining a target answer result by using the matching degree in the matching result. Because the numerical values in the question and answer data are normalized, the question data and the answer data can be matched in the same dimension, which addresses the problem of inaccurate answers caused by differing numerical values.

Description

Question answering method, device, equipment and medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a question answering method, device, apparatus, and medium.
Background
In recent years, intelligent question answering has developed rapidly. Current methods for obtaining the target answer to a question mainly include: finding, in a database, questions similar to the question to be answered and extracting their answers as the target answer; and performing text matching between candidate answers and the question to be answered. These methods work well in general judgment-type question-answering scenarios, but in numerical judgment-type scenarios the output answer can deviate from the actual answer because numerical differences are not detected accurately.
No effective solution has yet been proposed for the above problem, namely that the output answer deviates from the actual answer because numerical differences are not detected accurately in numerical judgment-type question-answering scenarios.
Disclosure of Invention
The application provides a question answering method, device, equipment and medium, which solve or at least partially solve the technical problem that the output answer deviates from the actual answer because numerical differences are not detected accurately in numerical judgment-type question-answering scenarios.
According to an aspect of an embodiment of the present application, there is provided a question answering method, including: under the condition that question data input by a user is received, determining answer data corresponding to the question data in a preset database, wherein the question data are data containing numerical questions and answers; normalizing the question data and the answer data to obtain first question data and first answer data of the same dimension; inputting the first question data and the first answer data into a text matching model to generate a matching result; and determining a target answer result by using the matching degree in the matching result.
Optionally, normalizing the question data and the answer data to obtain the first question data and the first answer data of the same dimension includes: respectively carrying out data conversion on the question data and the answer data to obtain second question data and first answer data; and performing numerical value conversion on the second question data by using the first answer data to obtain first question data.
Optionally, the data conversion of the question data and the answer data respectively comprises at least one of the following ways: determining synonymous data in the question data and the answer data, determining standard synonymous data matching the synonymous data, and replacing the synonymous data in the question data and the answer data with the standard synonymous data; determining unit data in the question data and the answer data, determining standard unit data matching the unit data, and replacing the unit data in the question data and the answer data with the standard unit data; determining non-normalized interval marks in the question data and the answer data, determining standard interval text matching the non-normalized interval marks, and replacing the non-normalized interval marks in the question data and the answer data with the standard interval text.
Optionally, numerically converting the second question data using the first answer data to obtain the first question data includes: extracting a first numerical value in the first answer data, extracting a second numerical value in the second question data, and comparing the first numerical value with the second numerical value to obtain a comparison result; and in the case that the comparison result indicates that the first numerical value and the second numerical value are different, replacing the second numerical value with the first numerical value to generate the first question data.
Optionally, determining answer data corresponding to the question data in the preset database includes: determining first text data associated with the question data in a preset database, wherein the first text data is text data of a paragraph where the answer data is located; and inputting the first text data and the question data into a preset prediction model to obtain answer data with the highest relevance degree with the question data.
Optionally, inputting the first question data and the first answer data into a text matching model, and generating the matching result includes: splicing the first question data and the first answer data to generate second text data; and inputting the second text data into the text matching model to generate a matching result.
Optionally, the preset database is generated as follows: obtaining question-and-answer data, and dividing the relevant text in the question-and-answer data according to a preset granularity to obtain a plurality of text fields, wherein the question-and-answer data comprises a question text and relevant text related to the question text; integrating the text fields that match the question text to generate an answer text for the question text, and storing the answer text in the preset database.
According to another aspect of the embodiments of the present application, there is provided a question answering device, including: a first determining module, configured to determine answer data corresponding to question data in a preset database when the question data input by a user is received, wherein the question data is data containing numerical questions and answers; a processing module, configured to normalize the question data and the answer data to obtain first question data and first answer data of the same dimension; a generating module, configured to input the first question data and the first answer data into the text matching model and generate a matching result; and a second determining module, configured to determine the target answer result by using the matching degree in the matching result.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, a communication interface, and a communication bus, where the memory stores a computer program executable on the processor, and the memory and the processor communicate with each other through the communication bus and the communication interface, and the processor implements the steps of any one of the above methods when executing the computer program.
According to another aspect of embodiments of the present application, there is also provided a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform any of the methods described above.
The technical scheme of the application can be applied to natural language processing based on deep learning technology.
Compared with the related art, the technical scheme provided by the embodiment of the application has the following advantages:
the application provides a question answering method, which comprises the following steps: under the condition that question data input by a user is received, determining answer data corresponding to the question data in a preset database, wherein the question data are data containing numerical questions and answers; normalizing the question data and the answer data to obtain first question data and first answer data of the same dimension; inputting the first question data and the first answer data into a text matching model to generate a matching result; and determining a target answer result by using the matching degree in the matching result.
Because the numerical values in the question and answer data are normalized, the question data and the answer data can be matched in the same dimension, which addresses the problem of inaccurate answers caused by differing numerical values.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to illustrate the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings used in the description of the embodiments or the related art are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of an alternative question answering method provided in an embodiment of the present application;
Fig. 2 is a block diagram of an alternative question answering device provided in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an alternative electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module" and "component" may be used in a mixture.
In recent years, with the rapid development of deep learning technology, intelligent question answering has become a research hotspot in the field of natural language processing. According to the way questions are asked and the form answers take, question-answering tasks can be divided into fill-in-the-blank, multiple-choice, judgment, extractive, free-form and other types. Judgment-type question answering means judging a question input by the user against known information and outputting "yes" or "no" as the result.
Current question-answering techniques for judgment-type questions fall into three categories. Rule-based question answering: a large amount of data is observed, and complex rules are summarized and formulated manually for making judgments. Question answering based on question-answer pairs: a database of question-answer pairs is compiled; the question text input by the user is matched for similarity against the question texts in the database, the question with the highest similarity is selected, and its answer is output as the answer to the user's question. Question answering based on text matching: a pre-trained model encodes the question text and the answer basis text and is fine-tuned on the downstream task to obtain the probabilities that the answer to the question is "yes" or "no", and the option with the higher probability is output.
However, question answering based on question-answer pairs requires building and maintaining a database, which generally consumes substantial manpower and material resources for labeling and organization, and it can only handle questions covered in the database, so it generalizes poorly. In a numerical judgment-type scenario, if the question input by the user differs in value from a question in the database, the correct answer is likely to change, yet this method may still match the input to the similar question in the database and directly output that question's answer, producing an incorrect answer.
Although existing judgment-type question answering based on text matching works well in many scenarios, in scenarios that require comparing numerical values, if the value in the question text input by the user differs from the value in the answer basis text, existing text matching models have difficulty reasoning about the values and are therefore prone to misjudgment.
In order to solve the problems mentioned in the background art, according to an aspect of an embodiment of the present application, there is provided a question answering method, as shown in fig. 1, including:
step 101, when question data input by a user is received, determining answer data corresponding to the question data in a preset database, wherein the question data is data containing numerical questions and answers;
step 103, normalizing the question data and the answer data to obtain first question data and first answer data of the same dimension;
step 105, inputting the first question data and the first answer data into a text matching model to generate a matching result;
step 107, determining a target answer result by using the matching degree in the matching result.
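The four steps can be read as a simple pipeline. The following is a minimal sketch only: the function name `answer_question` and its three callable parameters are hypothetical placeholders for the retrieval, normalization and text matching components, and the stand-in lambdas exist purely to make the example runnable.

```python
def answer_question(question, retrieve, normalize, match):
    """Run steps 101-107 with caller-supplied (hypothetical) components."""
    answer = retrieve(question)                      # step 101: look up the answer basis
    first_q, first_a = normalize(question, answer)   # step 103: bring both to the same dimension
    result = match(first_q, first_a)                 # step 105: e.g. {"yes": 0.9, "no": 0.1}
    return max(result, key=result.get)               # step 107: pick the best-matching judgment


# Toy stand-ins for illustration only; they are not the components of the application.
demo = answer_question(
    "Is body temperature 37.7 a fever?",
    retrieve=lambda q: "Body temperature greater than 37.3 is a fever",
    normalize=lambda q, a: (q.replace("37.7", "greater than 37.3"), a),
    match=lambda q, a: {"yes": 0.9, "no": 0.1},
)
print(demo)
```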
The method and the device are applied to a judgment type question-answering scene containing numerical values, and answer data represent answer bases capable of solving question data.
Specifically, according to the method and the device, answer data related to the question data are obtained firstly, and then the question data and the answer data are subjected to standardization processing, so that the question data and the answer data can be compared in the same dimension, and error influence caused by difference of numerical values is eliminated.
As an alternative embodiment, normalizing the question data and the answer data to obtain the first question data and the first answer data of the same dimension includes: respectively carrying out data conversion on the question data and the answer data to obtain second question data and first answer data; and performing numerical value conversion on the second question data by using the first answer data to obtain first question data.
Optionally, the data conversion of the question data and the answer data includes synonym conversion, unit conversion and interval conversion to obtain the second question data and the first answer data.
Optionally, numerically converting the second question data with the first answer data includes numerical unification, so that the first question data and the first answer data can be compared based on the same numerical value.
As an alternative embodiment, the data conversion of the question data and the answer data respectively includes at least one of the following ways: determining synonymous data in the question data and the answer data, determining standard synonymous data matching the synonymous data, and replacing the synonymous data in the question data and the answer data with the standard synonymous data; determining unit data in the question data and the answer data, determining standard unit data matching the unit data, and replacing the unit data in the question data and the answer data with the standard unit data; determining non-normalized interval marks in the question data and the answer data, determining standard interval text matching the non-normalized interval marks, and replacing the non-normalized interval marks in the question data and the answer data with the standard interval text.
Replacing the synonymous data in the question data and the answer data with the standard synonymous data means, for example, replacing expressions such as "highest allowable", "lower than or equal to" and "not exceeding" in the question data and the answer data with "not more than".
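A minimal sketch of this synonym replacement, assuming a small hand-written lookup table. The table `SYNONYMS` and the function `normalize_synonyms` are hypothetical names, and a real system would need a far larger table in the language of the data.

```python
import re

# Hypothetical synonym table: each variant maps to one standard comparative phrase.
SYNONYMS = {
    "highest allowable": "not more than",
    "lower than or equal to": "not more than",
    "not exceeding": "not more than",
}


def normalize_synonyms(text):
    """Replace every known variant with its standard phrase (case-insensitive)."""
    # Longest variants first, so a short variant never splits a longer one.
    for variant in sorted(SYNONYMS, key=len, reverse=True):
        text = re.sub(re.escape(variant), SYNONYMS[variant], text, flags=re.IGNORECASE)
    return text


print(normalize_synonyms("Dosage not exceeding 3 tablets per day"))
```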
Replacing the unit data in the question data and the answer data with standard unit data means, for example, unifying units written as Chinese words with units written as symbols, and unifying units that can be converted into one another. For example, "10 m" is converted into "10 meters", and "120 cm" is converted into "1.2 meters".
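Unit normalization of this kind could be sketched as follows, assuming meters as the standard unit and English unit spellings. The conversion table `TO_METERS` and the function `normalize_units` are hypothetical, not taken from the application.

```python
import re

# Hypothetical conversion table: unit spelling -> factor to the standard unit (meters).
TO_METERS = {"cm": 0.01, "m": 1.0, "meters": 1.0, "km": 1000.0}

# A number followed by one of the known units; \b stops "m" matching inside other words.
UNIT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|km|meters|m)\b", re.IGNORECASE)


def normalize_units(text):
    """Rewrite every length in the text as a value in meters."""
    def convert(match):
        value = float(match.group(1)) * TO_METERS[match.group(2).lower()]
        # round() removes floating-point noise; the "g" format drops trailing zeros.
        return f"{round(value, 6):g} meters"
    return UNIT_PATTERN.sub(convert, text)


print(normalize_units("The height is 226cm"))
print(normalize_units("Is the height 2.28M?"))
```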
Replacing the non-normalized interval marks in the question data and the answer data with standard interval text works as follows: a non-normalized interval mark is a symbol that denotes a range of values in a numerical interval (e.g., a tilde "~" or square brackets "[ ]"). For example, the interval expression "36.3-37.2" is rewritten as "not less than 36.3 and not more than 37.2".
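A minimal sketch of the interval rewrite, assuming three notations ("a~b", "a-b" and "[a, b]") and non-negative numbers. The names are hypothetical, and the hyphen form would also catch things like dates, so a real implementation would need stricter context checks.

```python
import re

NUM = r"\d+(?:\.\d+)?"
# Assumed notations: "a~b" or "a-b", and "[a, b]".
RANGE = re.compile(rf"({NUM})\s*[~-]\s*({NUM})")
BRACKET = re.compile(rf"\[\s*({NUM})\s*,\s*({NUM})\s*\]")

STANDARD = r"not less than \1 and not more than \2"


def normalize_intervals(text):
    """Rewrite interval marks as standard interval text."""
    text = BRACKET.sub(STANDARD, text)
    return RANGE.sub(STANDARD, text)


print(normalize_intervals("Normal body temperature is 36.3-37.2"))
```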
As an alternative embodiment, numerically converting the second question data using the first answer data to obtain the first question data includes: extracting a first numerical value in the first answer data, extracting a second numerical value in the second question data, and comparing the first numerical value with the second numerical value to obtain a comparison result; and in the case that the comparison result indicates that the first numerical value and the second numerical value are different, replacing the second numerical value with the first numerical value to generate the first question data.
Illustratively, the numerical values in the second question data and the first answer data are compared, and if they differ, the expression of the numerical value in the second question data is rewritten. For example, the second question data is "Is body temperature 37.7 a fever?" and the first answer data is "Body temperature greater than 37.3 is a fever". By comparing the numerical values in the second question data and the first answer data (37.7 and 37.3), the numerical value in the second question data is changed to "greater than 37.3", that is, the first question data is generated as: "Is body temperature greater than 37.3 a fever?".
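The rewrite in this example could be sketched as below. This is an illustration under assumptions: only the first number in each text is considered, and the "less than" branch is an assumed mirror of the "greater than" case shown in the example; `unify_value` is a hypothetical name.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unify_value(question, answer):
    """Rewrite the question's value relative to the answer's value when they differ."""
    q_match = NUMBER.search(question)
    a_match = NUMBER.search(answer)
    if not q_match or not a_match:
        return question  # nothing to compare
    q_val, a_val = float(q_match.group()), float(a_match.group())
    if q_val == a_val:
        return question  # values already agree
    # Assumption: a smaller question value is expressed as "less than".
    relation = "greater than" if q_val > a_val else "less than"
    return (question[:q_match.start()]
            + f"{relation} {a_match.group()}"
            + question[q_match.end():])


print(unify_value("Is body temperature 37.7 a fever?",
                  "Body temperature greater than 37.3 is a fever"))
```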
For example, when the question data is "Does a monthly salary that does not exceed 5 thousand need no tax payment?" and the answer data is "Personal income tax must be paid when the monthly salary exceeds 5 thousand", the normalization proceeds as follows. First, synonym normalization is performed: "does not exceed 5 thousand" in the question is replaced with "not greater than 5 thousand", and "exceeds 5 thousand" in the answer basis is replaced with "greater than 5 thousand". Then unit normalization is performed, converting "5 thousand" in both the question and the answer basis into "5000". After these two steps, the question becomes "Does a monthly salary not greater than 5000 need no tax payment?" and the answer basis becomes "Personal income tax must be paid when the monthly salary is greater than 5000".
For another example, when the question data is "Is Yao Ming's height 2.28M?" and the answer data is "Yao Ming's height is 226cm", the normalization proceeds as follows. There are no synonyms to process in this example, so unit normalization is performed directly: "2.28M" in the question is replaced with "2.28 meters", and "226cm" in the answer basis is converted into "2.26 meters". Numerical unification is then performed, rewriting "2.28 meters" in the question as "more than 2.26 meters". After processing, the question data is "Is Yao Ming's height more than 2.26 meters?" and the answer basis is "Yao Ming's height is 2.26 meters".
By normalizing the numerical values in the question data and the answer data, together with the expressions and units associated with those values, the application addresses the problem that differences between the numerical values in the question data and the answer data reduce the accuracy of question-answer recognition.
As an alternative embodiment, the determining answer data corresponding to the question data in the preset database includes: determining first text data associated with the question data in a preset database, wherein the first text data is text data of a paragraph where the answer data is located; and inputting the first text data and the question data into a preset prediction model to obtain answer data with the highest relevance degree with the question data.
Alternatively, the paragraph in which the answer data is located may be determined by an Elasticsearch (ES) search.
Optionally, the first text data associated with the question data may be determined in the preset database by keyword matching, which includes:
step 1, extracting a first keyword from the question data, and using the first keyword to query the preset database for a second keyword related to the first keyword, wherein the second keyword is either identical to the first keyword or a similar word with the same meaning, and there is at least one first keyword and at least one second keyword;
step 2, determining the paragraph where the second keyword is located as a target paragraph, and determining the text data in the target paragraph as the first text data.
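The two keyword-matching steps could be sketched as follows, assuming the database is an in-memory list of paragraph strings and similar words come from a caller-supplied dictionary. `match_paragraphs` and `similar_words` are hypothetical names; keyword extraction itself is left to the caller.

```python
def match_paragraphs(first_keywords, paragraphs, similar_words=None):
    """Return the paragraphs that contain a keyword or one of its similar words."""
    similar_words = similar_words or {}
    # Step 1: expand the first keywords into second keywords (identical or similar words).
    second_keywords = set()
    for keyword in first_keywords:
        second_keywords.add(keyword.lower())
        second_keywords.update(w.lower() for w in similar_words.get(keyword, []))
    # Step 2: every paragraph containing a second keyword is a target paragraph.
    return [p for p in paragraphs if any(k in p.lower() for k in second_keywords)]


paragraphs = [
    "Fever means a body temperature greater than 37.3.",
    "Yao Ming's height is 226cm.",
]
print(match_paragraphs(["pyrexia"], paragraphs, {"pyrexia": ["fever"]}))
```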
Optionally, the first text data associated with the question data may also be determined in the preset database by text label matching, which includes:
step 1, labeling the paragraphs in the preset database in advance so that each paragraph has at least one text label, where the text label indicates the text type and/or text content of the paragraph, for example: article A, star B, scenic spot C, etc.;
step 2, extracting question labels from the question data, and querying the preset database for candidate text labels matching the question labels, wherein a candidate text label and a question label may be the same word or similar words with the same meaning, and there is at least one candidate text label and at least one question label.
Optionally, inputting the first text data and the question data into the preset prediction model to obtain the answer data with the highest degree of association with the question data specifically includes:
step 1, inputting the first text data and the question data into the preset prediction model, and determining, based on a classifier in the preset prediction model, the probability that each word in the first text data is the start position of the answer and the probability that it is the end position of the answer;
step 2, taking the position with the highest start probability and the position with the highest end probability, together with the text between them, as the answer data.
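Step 2 amounts to selecting a span from two probability lists. The sketch below assumes whitespace-separated tokens and, as an added assumption not stated in the text, restricts the end position to lie at or after the start position; `extract_answer` is a hypothetical name.

```python
def extract_answer(tokens, start_probs, end_probs):
    """Pick the most probable start and end positions and return the span between them."""
    start = max(range(len(tokens)), key=lambda i: start_probs[i])
    # Assumption: only consider end positions at or after the chosen start.
    end = max(range(start, len(tokens)), key=lambda i: end_probs[i])
    return " ".join(tokens[start:end + 1])


tokens = ["Body", "temperature", "greater", "than", "37.3", "is", "a", "fever"]
start_probs = [0.05, 0.05, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05]
end_probs = [0.02, 0.03, 0.05, 0.10, 0.70, 0.04, 0.03, 0.03]
print(extract_answer(tokens, start_probs, end_probs))
```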
Alternatively, the answer data may be obtained by many other methods, such as string matching, pointer networks, and other forms of reading comprehension models.
Optionally, the training method of the preset prediction model includes:
step 1, obtaining a question sample and a paragraph sample containing an answer sample, and splicing the question sample and the paragraph sample to obtain a target sample;
step 2, inputting the target sample into an initial model for training, so as to obtain a first classification result using a classifier in the initial model, wherein the first classification result comprises whether the target sample contains an answer and the probability that each word is the answer start position and the answer end position;
step 3, comparing the answer position indicated in the first classification result with the real position of the answer sample, and if the two differ, calculating the loss function of the initial model in its current state;
step 4, based on the loss function of the initial model, continuously adjusting the model parameters of the initial model by mini-batch stochastic gradient descent until the initial model converges, thereby obtaining the preset prediction model.
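The text does not name the loss function. One common choice for start/end position prediction, shown here purely as an assumption, is the average negative log-likelihood of the true start and end positions; `span_loss` is a hypothetical name, and the gradient step itself is omitted.

```python
import math


def span_loss(start_probs, end_probs, true_start, true_end):
    """Average negative log-likelihood of the true start and end positions (assumed loss)."""
    return -(math.log(start_probs[true_start]) + math.log(end_probs[true_end])) / 2


# The loss shrinks as the model puts more probability on the true positions.
print(span_loss([0.1, 0.8, 0.1], [0.1, 0.1, 0.8], true_start=1, true_end=2))
print(span_loss([0.4, 0.3, 0.3], [0.3, 0.3, 0.4], true_start=1, true_end=2))
```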
As an alternative embodiment, inputting the first question data and the first answer data to the text matching model, and generating the matching result includes: splicing the first question data and the first answer data to generate second text data; and inputting the second text data into the text matching model to generate a matching result.
Illustratively, the first question data and the first answer data are spliced as two sub-texts and input into the text matching model. The two sub-texts are joined by a separator (e.g., [SEP]), and a start character (e.g., [CLS]) and an end character (e.g., [SEP]) are added at the beginning and the end of the spliced string respectively to mark where it begins and ends.
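The splicing can be sketched as plain string formatting. The function name `splice` is hypothetical, and in practice the tokenizer of the chosen pre-trained model would normally insert these special tokens itself.

```python
def splice(first_question, first_answer, cls="[CLS]", sep="[SEP]"):
    """Join the two sub-texts with a separator and add start and end markers."""
    return f"{cls} {first_question} {sep} {first_answer} {sep}"


print(splice("Is body temperature greater than 37.3 a fever?",
             "Body temperature greater than 37.3 is a fever"))
```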
A binary classifier is attached at the output position of the model, and the matching degree between the first question data and the first answer data can be determined from the probability values output at that position.
Determining a target answer result by using the target matching threshold and the matching degree in the matching result, wherein the method comprises the following steps:
the probability value of each output judgment result lies in [0, 1], and the judgment results are "yes" and "no". The judgment result corresponding to the larger of the probability values output by the binary classifier is taken as the target answer result. For example, if the outputs of the binary classifier correspond to the judgment results in the order [yes, no] and the output is [0.3, 0.7], then since 0.7 > 0.3 the judgment result for the question is "no".
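The selection rule in this example is an argmax over the two probabilities. A minimal sketch, with `decide` as a hypothetical name and the label order [yes, no] taken from the example:

```python
def decide(probabilities, labels=("yes", "no")):
    """Return the judgment whose probability is largest."""
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best]


print(decide([0.3, 0.7]))
```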
Alternatively, the pre-training model used for text matching or answer extraction may be a BERT model, or ERNIE, RoBERTa, etc., and the present application is not limited thereto.
The invention provides a question answering method for numerical reasoning. A reading-comprehension-based answer basis extraction model narrows the scope of the answer basis, which reduces the interference of irrelevant information in the document segment with the subsequent text matching model and improves the accuracy of the model. In addition, normalizing the numerical values in the question text and the answer basis text helps the subsequent text matching model reason about numerical values, improving the accuracy of answering judgment-type questions that require numerical reasoning.
As an alternative embodiment, the preset database is generated as follows: question-and-answer data is obtained, and the related text in the question-and-answer data is divided according to a preset granularity to obtain a plurality of text fields, where the question-and-answer data includes a question text and related text associated with the question text; the text fields matching the question text are integrated to generate an answer text for the question text, and the answer text is stored in the preset database.
Optionally, the preset granularity may be set in advance, for example at the chapter or paragraph level.
Optionally, labels may also be annotated while the text is divided, so that the labeled data can be identified directly in subsequent operations.
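The database generation steps can be sketched as below, assuming paragraph granularity with blank lines as paragraph boundaries. The source does not say how a text field is matched to the question text, so the word-overlap rule in `matches` is a hypothetical stand-in, as are all function names and the sample data.

```python
import re
from typing import Dict, List


def split_by_paragraph(text: str) -> List[str]:
    """Divide related text into text fields at blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def matches(question: str, field: str) -> bool:
    """Hypothetical rule: the field shares a word of 4+ characters with the question."""
    q_words = {w for w in re.findall(r"\w+", question.lower()) if len(w) >= 4}
    f_words = set(re.findall(r"\w+", field.lower()))
    return bool(q_words & f_words)


def build_database(qa_data: Dict[str, str]) -> Dict[str, str]:
    """Map each question text to the integrated answer text."""
    database = {}
    for question, related_text in qa_data.items():
        fields = split_by_paragraph(related_text)
        database[question] = " ".join(f for f in fields if matches(question, f))
    return database


# Hypothetical sample data.
db = build_database({
    "What is the deposit amount?":
        "The deposit is 10 yuan.\n\nOpening hours are 9 to 5.",
})
print(db)
```

A plain dictionary stands in for the preset database here; any persistent store could take its place.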
By normalizing the numerical values in the question-and-answer data, the present application allows the question data and the answer data to be matched in the same dimension, which addresses the problem of inaccurate answers caused by differing numerical values.
According to another aspect of the embodiments of the present application, there is provided a question answering device, as shown in fig. 2, including:
a first determining module 202, configured to determine answer data corresponding to question data in a preset database when question data input by a user is received, where the question data is data including numerical questions and answers;
the processing module 204 is configured to perform normalization processing on the question data and the answer data to obtain first question data and first answer data of the same dimension;
a generating module 206, configured to input the first question data and the first answer data into the text matching model, and generate a matching result;
and a second determining module 208, configured to determine a target answer result by using the matching degree in the matching result.
It should be noted that the first determining module 202 in this embodiment may be configured to execute step 101 in this embodiment, the processing module 204 in this embodiment may be configured to execute step 103 in this embodiment, the generating module 206 in this embodiment may be configured to execute step 105 in this embodiment, and the second determining module 208 in this embodiment may be configured to execute step 107 in this embodiment.
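The way the four modules hand data to one another can be sketched as a simple pipeline. This is only an illustration of the data flow: each module is passed in as a callable, the stub implementations and the 0.5 cut-off in the usage lines are assumptions, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class QuestionAnsweringDevice:
    first_determining: Callable[[str], str]            # module 202: question -> answer data
    processing: Callable[[str, str], Tuple[str, str]]  # module 204: normalize both texts
    generating: Callable[[str, str], float]            # module 206: matching degree
    second_determining: Callable[[float], str]         # module 208: degree -> answer result

    def answer(self, question: str) -> str:
        """Run the modules in the order 202, 204, 206, 208."""
        answer_data = self.first_determining(question)
        first_question, first_answer = self.processing(question, answer_data)
        degree = self.generating(first_question, first_answer)
        return self.second_determining(degree)


# Stub modules standing in for the database lookup and the models.
device = QuestionAnsweringDevice(
    first_determining=lambda q: "The deposit is 10 yuan.",
    processing=lambda q, a: (q.lower(), a.lower()),
    generating=lambda q, a: 0.7,
    second_determining=lambda d: "yes" if d >= 0.5 else "no",
)
print(device.answer("Is the deposit 10 yuan?"))  # yes
```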
Optionally, the processing module 204 further includes:
a first conversion submodule, configured to perform data conversion on the question data and the answer data respectively to obtain second question data and first answer data;
and a second conversion submodule, configured to perform numerical value conversion on the second question data by using the first answer data to obtain the first question data.
Optionally, the first conversion submodule is further configured to perform at least one of the following: determining synonymous data in the question data and the answer data, determining standard synonymous data matching the synonymous data, and replacing the synonymous data in the question data and the answer data with the standard synonymous data; determining synonymous unit data in the question data and the answer data, determining standard unit data matching the unit data, and replacing the unit data in the question data and the answer data with the standard unit data; determining non-normalized interval marks in the question data and the answer data, determining standard interval texts matching the non-normalized interval marks, and replacing the non-normalized interval marks in the question data and the answer data with the standard interval texts.
Optionally, the second conversion submodule is further configured to extract a first numerical value from the first answer data, extract a second numerical value from the second question data, and compare the first numerical value with the second numerical value to obtain a comparison result; and, when the comparison result indicates that the first numerical value and the second numerical value are different, to replace the second numerical value with the first numerical value to generate the first question data.
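The two conversion stages can be sketched as follows. The lookup tables, the interval pattern, and all function names are hypothetical, since the source does not list the actual standard forms. The source also does not define how the two numerical values are compared; this sketch compares them as written, so "10" and "10.00" count as different and the question takes the answer's form.

```python
import re
from typing import Dict, Optional

# Hypothetical lookup tables; the source does not list the actual entries.
SYNONYMS = {"cost": "price", "fee": "price"}
UNITS = {"kilograms": "kg", "kilogram": "kg", "kgs": "kg"}
# Naive interval mark: two numbers joined by "~" or "-".
INTERVAL = re.compile(r"(\d+(?:\.\d+)?)\s*[~-]\s*(\d+(?:\.\d+)?)")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def replace_words(text: str, table: Dict[str, str]) -> str:
    """Replace whole words found in `table` with their standard form."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: table[m.group(1).lower()], text)


def data_conversion(text: str) -> str:
    """Apply synonym, unit, and interval-mark replacement in turn."""
    text = replace_words(text, SYNONYMS)
    text = replace_words(text, UNITS)
    return INTERVAL.sub(r"from \1 to \2", text)


def first_number(text: str) -> Optional[str]:
    """Return the first numeric token in the text, or None."""
    match = NUMBER.search(text)
    return match.group(0) if match else None


def numerical_conversion(second_question: str, first_answer: str) -> str:
    """Replace the question's number with the answer's when they differ."""
    first_value = first_number(first_answer)
    second_value = first_number(second_question)
    if first_value is None or second_value is None:
        return second_question
    if first_value != second_value:
        return NUMBER.sub(lambda _: first_value, second_question, count=1)
    return second_question


second_question = data_conversion("Is the fee 10 yuan per kilogram?")
first_answer = data_conversion("The cost is 10.00 yuan per kg.")
first_question = numerical_conversion(second_question, first_answer)
print(first_question)  # Is the price 10.00 yuan per kg?
```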
Optionally, the first determining module 202 is further configured to determine, in a preset database, first text data associated with the question data, where the first text data is text data of a paragraph where the answer data is located; and inputting the first text data and the question data into a preset prediction model to obtain answer data with the highest relevance degree with the question data.
Optionally, the generating module 206 is further configured to splice the first question data and the first answer data to generate second text data; and inputting the second text data into the text matching model to generate a matching result.
Optionally, the question answering device is further configured to generate the preset database in the following manner: question-and-answer data is obtained, and the related text in the question-and-answer data is divided according to a preset granularity to obtain a plurality of text fields, where the question-and-answer data includes a question text and related text associated with the question text; the text fields matching the question text are integrated to generate an answer text for the question text, and the answer text is stored in the preset database.
It should be noted here that the examples and application scenarios implemented by the modules described above are the same as those of the corresponding steps, but are not limited to what is disclosed in the above embodiments.
According to another aspect of the embodiments of the present application, as shown in fig. 3, the present application provides an electronic device, which includes a memory 31, a processor 32, a communication interface 33 and a communication bus 34, wherein a computer program operable on the processor 32 is stored in the memory 31, the memory 31 and the processor 32 communicate with each other through the communication bus 34 and the communication interface 33, and the steps of the method are implemented when the processor 32 executes the computer program.
The memory and the processor in the electronic device communicate with the communication interface through the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on.
The memory may include a Random Access Memory (RAM), and may also include a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
According to another aspect of embodiments of the present application, there is provided a computer readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the steps of any of the methods described above.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to perform the above method steps.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description is only an example of the present application, and is provided to enable any person skilled in the art to understand or implement the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A question-answering method, comprising:
under the condition that question data input by a user is received, determining answer data corresponding to the question data in a preset database, wherein the question data are data containing numerical questions and answers;
normalizing the question data and the answer data to obtain first question data and first answer data of the same dimension;
inputting the first question data and the first answer data into a text matching model to generate a matching result;
and determining a target answer result by using the matching degree in the matching result.
2. The method according to claim 1, wherein the normalizing the question data and the answer data to obtain first question data and first answer data of the same dimension comprises:
performing data conversion on the question data and the answer data respectively to obtain second question data and first answer data;
and performing numerical value conversion on the second question data by using the first answer data to obtain the first question data.
3. The method of claim 2, wherein the separately data converting the question data and the answer data comprises at least one of:
determining synonymous data in the question data and the answer data, and determining standard synonymous data matching the synonymous data, and replacing the synonymous data in the question data and the answer data with the standard synonymous data;
determining synonymous unit data in the question data and the answer data, and determining standard unit data matching the unit data, and replacing the unit data in the question data and the answer data with the standard unit data;
and determining non-normalized interval marks in the question data and the answer data, determining standard interval texts matched with the non-normalized interval marks, and replacing the non-normalized interval marks in the question data and the answer data with the standard interval texts.
4. The method of claim 2, wherein numerically converting the second question data with the first answer data to obtain the first question data comprises:
extracting a first numerical value in the first answer data, extracting a second numerical value in the second question data, and comparing the first numerical value with the second numerical value to obtain a comparison result;
and replacing the second numerical value with the first numerical value to generate the first question data when the comparison result indicates that the first numerical value and the second numerical value are different.
5. The method of claim 1, wherein the determining answer data corresponding to the question data in a preset database comprises:
determining first text data associated with the question data in the preset database, wherein the first text data is text data of a paragraph where the answer data is located;
inputting the first text data and the question data into a preset prediction model to obtain the answer data with the highest relevance degree with the question data.
6. The method of claim 1, wherein inputting the first question data and the first answer data into a text matching model, generating a matching result comprises:
splicing the first question data and the first answer data to generate second text data;
and inputting the second text data into the text matching model to generate the matching result.
7. The method of claim 1, wherein the preset database is generated as follows:
obtaining question-and-answer data, and dividing related text in the question-and-answer data according to a preset granularity to obtain a plurality of text fields, wherein the question-and-answer data comprises a question text and the related text associated with the question text;
integrating the text fields matched with the question text to generate an answer text of the question text, and storing the answer text to the preset database.
8. A question answering device, comprising:
a first determining module, configured to determine answer data corresponding to question data in a preset database when the question data input by a user is received, wherein the question data is data containing numerical questions and answers;
a processing module, configured to normalize the question data and the answer data to obtain first question data and first answer data of the same dimension;
a generating module, configured to input the first question data and the first answer data into a text matching model to generate a matching result;
and a second determining module, configured to determine a target answer result by using the matching degree in the matching result.
9. An electronic device comprising a memory, a processor, a communication interface and a communication bus, wherein the memory stores a computer program operable on the processor, and the memory and the processor communicate with the communication interface via the communication bus, wherein the processor implements the steps of the method according to any of the claims 1 to 7 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 7.
CN202210579181.2A 2022-05-25 2022-05-25 Question answering method, device, equipment and medium Pending CN114741494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210579181.2A CN114741494A (en) 2022-05-25 2022-05-25 Question answering method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210579181.2A CN114741494A (en) 2022-05-25 2022-05-25 Question answering method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114741494A true CN114741494A (en) 2022-07-12

Family

ID=82288113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210579181.2A Pending CN114741494A (en) 2022-05-25 2022-05-25 Question answering method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114741494A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236314A (en) * 2023-11-06 2023-12-15 杭州同花顺数据开发有限公司 Information extraction method, system, device and storage medium supporting super-long answers
CN117236314B (en) * 2023-11-06 2024-03-01 杭州同花顺数据开发有限公司 Information extraction method, system, device and storage medium supporting super-long answers

Similar Documents

Publication Publication Date Title
US11734328B2 (en) Artificial intelligence based corpus enrichment for knowledge population and query response
US11151130B2 (en) Systems and methods for assessing quality of input text using recurrent neural networks
CN110727779A (en) Question-answering method and system based on multi-model fusion
US11055327B2 (en) Unstructured data parsing for structured information
US20050198563A1 (en) Assisted form filling
CN109598517B (en) Commodity clearance processing, object processing and category prediction method and device thereof
CN110825875A (en) Text entity type identification method and device, electronic equipment and storage medium
CN111198948A (en) Text classification correction method, device and equipment and computer readable storage medium
CN111191275A (en) Sensitive data identification method, system and device
CN112036184A (en) Entity identification method, device, computer device and storage medium based on BilSTM network model and CRF model
CN110580308A (en) information auditing method and device, electronic equipment and storage medium
CN113377936A (en) Intelligent question and answer method, device and equipment
CN115935344A (en) Abnormal equipment identification method and device and electronic equipment
CN112966102A (en) Classification model construction and text sentence classification method, equipment and storage medium
CN111782793A (en) Intelligent customer service processing method, system and equipment
CN114691525A (en) Test case selection method and device
US20230084845A1 (en) Entry detection and recognition for custom forms
CN114741494A (en) Question answering method, device, equipment and medium
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN112883735B (en) Method, device, equipment and storage medium for structured processing of form image
CN115062126A (en) Statement analysis method and device, electronic equipment and readable storage medium
CN111881294B (en) Corpus labeling system, corpus labeling method and storage medium
CN114065762A (en) Text information processing method, device, medium and equipment
CN113128231A (en) Data quality inspection method and device, storage medium and electronic equipment
CN114338058A (en) Information processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination