CN117312339A - Question bank updating method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117312339A
Authority
CN
China
Prior art keywords
query
sentences
target
sentence
test question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311167617.8A
Other languages
Chinese (zh)
Inventor
张魏斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311167617.8A priority Critical patent/CN117312339A/en
Publication of CN117312339A publication Critical patent/CN117312339A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a question bank updating method, a question bank updating device, electronic equipment and a storage medium, relates to the technical field of computers, and particularly relates to the technical field of artificial intelligence such as natural language processing, deep learning and large models. The scheme is as follows: obtaining a plurality of query sentences with empty recall results from search logs associated with a question bank, slicing each query sentence, obtaining a signature value of each slice, marking the query sentences according to the signature values corresponding to the slices at the same position of the query sentences to obtain marked query sentences, performing de-duplication processing on the marked query sentences according to text similarity between every two marked query sentences to obtain target query sentences, generating test questions corresponding to each target query sentence, and updating the question bank based on the generated test questions and the target query sentences. Therefore, the query statement with the recall result being empty is de-duplicated, and the corresponding test questions are generated, so that the update of the question bank is realized, the resource waste is avoided, and the cost is reduced.

Description

Question bank updating method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of computers, in particular to the technical field of artificial intelligence such as natural language processing, deep learning, large models and the like, and specifically relates to a question bank updating method, a device, electronic equipment and a storage medium.
Background
Currently, in the related art, the questions a user needs can be retrieved from an online question bank based on the user's search. However, because the scale of the online question bank is limited, some searches recall no results.
Disclosure of Invention
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
An embodiment of a first aspect of the present disclosure provides a method for updating a question bank, including:
acquiring a plurality of query sentences with empty recall results from the search logs associated with the question bank;
slicing each query statement, and acquiring a signature value of each slice;
marking the plurality of query sentences according to signature values corresponding to the slices at the same position of the plurality of query sentences to obtain marked query sentences;
performing de-duplication processing on the marked query sentences according to the text similarity between every two marked query sentences to obtain target query sentences;
generating test questions corresponding to each target query statement, and updating the question bank based on the generated test questions and the target query statement.
An embodiment of a second aspect of the present disclosure provides a question bank updating apparatus, including:
the acquisition module is used for acquiring a plurality of query sentences with empty recall results from the search logs associated with the question bank;
the slicing module is used for slicing each query statement and acquiring a signature value of each slice;
the marking module is used for marking the plurality of query sentences according to the signature values corresponding to the same position slices of the plurality of query sentences so as to obtain marked query sentences;
the de-duplication module is used for performing de-duplication processing on the marked query sentences according to the text similarity between every two marked query sentences so as to obtain target query sentences;
and the processing module is used for generating test questions corresponding to each target query statement and updating the question bank based on the generated test questions and the target query statement.
Embodiments of a third aspect of the present disclosure provide a computer device comprising: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the question bank updating method according to the embodiment of the first aspect of the disclosure when executing the program.
An embodiment of a fourth aspect of the present disclosure proposes a computer readable storage medium storing a computer program, which when executed by a processor, implements a question bank updating method as proposed by an embodiment of the first aspect of the present disclosure.
An embodiment of a fifth aspect of the present disclosure proposes a computer program product comprising a computer program which, when executed by a processor, implements a question bank updating method as proposed by an embodiment of the first aspect of the present disclosure.
The question bank updating method, the device, the computer equipment and the storage medium provided by the disclosure have the following beneficial effects:
in the embodiment of the disclosure, a plurality of query sentences with empty recall results are first obtained from search logs associated with a question bank; each query sentence is sliced and a signature value of each slice is obtained; the query sentences are then marked according to the signature values corresponding to the slices at the same position of the query sentences to obtain marked query sentences; the marked query sentences are then de-duplicated according to the text similarity between every two marked query sentences to obtain target query sentences; finally, test questions corresponding to each target query sentence are generated, and the question bank is updated based on the generated test questions and the target query sentences. Therefore, the query sentences with empty recall results are de-duplicated in more than one pass before the corresponding test questions are generated, so that the question bank is updated, resource waste is avoided, and the cost is reduced.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a method for updating a question bank according to an embodiment of the disclosure;
FIG. 2 is a flowchart of a method for updating a question bank according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of a device for updating a question bank according to an embodiment of the disclosure;
FIG. 4 illustrates a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure relates to the field of artificial intelligence techniques for natural language processing, large models, and the like.
Artificial intelligence, abbreviated AI, is a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
Natural language processing (NLP) is an interdisciplinary field spanning computer science, artificial intelligence and linguistics. It mainly studies how to enable computers to understand, process, generate and simulate human language, thereby enabling natural conversations with humans.
Deep learning learns the inherent regularities and representation hierarchy of sample data, and the information obtained during such learning is helpful in interpreting data such as text, images and sounds. The final goal of deep learning is to give a machine human-like analysis and learning capabilities, so that it can recognize text, image and sound data.
A large model, which can also be called a foundation model, extracts knowledge from corpora or images at the hundred-million scale and, through learning, yields a model with parameters at the hundred-million scale.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.
The following describes a method, an apparatus, an electronic device, and a storage medium for updating a question bank according to an embodiment of the present disclosure with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for updating a question bank according to an embodiment of the disclosure.
As shown in fig. 1, the question bank updating method may include the steps of:
Step 101, obtaining a plurality of query sentences with empty recall results from the search logs associated with the question bank.
The query sentence is a sentence used by the user in searching, and may be a sentence or a phrase, for example, the query sentence may be "what is the pythagorean theorem", or "pythagorean theorem", and the disclosure is not limited thereto.
In some possible implementations, the search logs in the screening period associated with the question bank can be screened in each screening period to obtain a plurality of query sentences with empty recall results, thereby providing conditions for expanding the scale of the question bank.
The screening period is a period preset for screening query sentences in the search log, for example, the screening period may be one day, one week, or the like, which is not limited in the disclosure.
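The screening described above can be sketched as a simple filter over log entries. This is a minimal illustration, not the disclosed implementation: the log entry layout (`time`, `query`, `results` keys), the function name `empty_recall_queries`, and the one-day default period are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def empty_recall_queries(log_entries, period_start, period_length=timedelta(days=1)):
    """Return the query statements logged in the screening period whose recall result is empty."""
    period_end = period_start + period_length
    return [
        entry["query"]
        for entry in log_entries
        if period_start <= entry["time"] < period_end and not entry["results"]
    ]


# Hypothetical search log: each entry records when the query ran and what it recalled.
logs = [
    {"time": datetime(2023, 9, 1, 9, 0), "query": "what is the Pythagorean theorem", "results": []},
    {"time": datetime(2023, 9, 1, 10, 0), "query": "1+1", "results": ["q-17"]},
    {"time": datetime(2023, 9, 2, 8, 0), "query": "Pythagorean theorem", "results": []},
]

# Screen the one-day period starting on 2023-09-01.
queries = empty_recall_queries(logs, datetime(2023, 9, 1))
```

Only the first entry is returned here: the second recalled a result, and the third falls outside the screening period.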
Step 102, slicing each query statement, and obtaining a signature value of each slice.
In the present disclosure, each query statement may be sliced according to a preset unit number, and then each slice may be signed. The preset number of units may be preset, for example, each 3 characters may be sliced as a group, each 4 words may be sliced as a group, or the like, which is not limited in the present disclosure.
For example, if the query statement is "what is the Pythagorean theorem" (in Chinese, "勾股定理是什么") and slicing is specified in groups of 4 characters, the slicing result is the overlapping 4-character windows "勾股定理", "股定理是", "定理是什" and "理是什么", and then each slice can be signed.
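A minimal sketch of slicing and signing follows. It assumes overlapping character windows, which is how the example above appears to slice, and it uses a truncated SHA-256 digest as the signature; the disclosure does not name a signature function, so `signature` and `slice_query` are hypothetical helpers. The sample input is the assumed Chinese form of the query "what is the Pythagorean theorem".

```python
import hashlib


def slice_query(query, size=4):
    """Split a query statement into overlapping windows of `size` characters."""
    if len(query) <= size:
        return [query]
    return [query[i:i + size] for i in range(len(query) - size + 1)]


def signature(piece):
    """Hypothetical signature: the first 16 hex digits of the SHA-256 digest of the slice."""
    return hashlib.sha256(piece.encode("utf-8")).hexdigest()[:16]


slices = slice_query("勾股定理是什么")
signatures = [signature(s) for s in slices]
```

Because the signature depends only on the slice text, identical slices in different query statements always receive the same signature value, which is what the later position-wise comparison relies on.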
Step 103, marking the plurality of query sentences according to the signature values corresponding to the slices at the same position of the plurality of query sentences to obtain marked query sentences.
In the present disclosure, after the signature value of each slice of each query statement is determined, the plurality of query statements may first be de-duplicated based on the signature values at the same position.
Marking a query statement identifies it as a non-duplicate query statement that is retained for further screening, for example by associating the non-duplicate query statement with a preset tag, which is not limited in this disclosure.
In some possible implementations, any of the plurality of query statements is marked if the signature values corresponding to at least one co-located slice of the plurality of query statements are the same.
In the present disclosure, when signature values corresponding to at least one co-located slice of a plurality of query sentences are the same, it is considered that the similarity of the plurality of query sentences is high, and at this time, any one of the plurality of query sentences may be selected for tagging in order to avoid generating repeated test questions. That is, when the similarity degree of the plurality of query sentences is high, the plurality of query sentences can be de-duplicated, and test question generation is performed only based on one marked query sentence, so that repeated test questions are prevented from being generated, and resource waste is avoided.
In some possible implementations, the plurality of query statements are marked separately in the case where the signature values corresponding to the slices at every position of the plurality of query statements are different.
In the disclosure, under the condition that signature values corresponding to each position slice of the plurality of query sentences are different, the plurality of query sentences can be considered to be dissimilar, and at this time, corresponding test questions may need to be generated for each query sentence in the plurality of query sentences, so that the plurality of query sentences need to be marked respectively, thereby improving the efficiency of generating the test questions.
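The two marking cases above can be sketched as one greedy pass. This is an assumption-laden illustration: the disclosure does not specify how "any one" of a group of similar query statements is chosen, so the sketch keeps the first one encountered, and the helper names and the SHA-256 signature are hypothetical.

```python
import hashlib


def position_signatures(query, size=4):
    """Return the set of (position, signature) pairs for the overlapping slices of a query."""
    pieces = [query[i:i + size] for i in range(max(len(query) - size, 0) + 1)]
    return {
        (pos, hashlib.sha256(piece.encode("utf-8")).hexdigest())
        for pos, piece in enumerate(pieces)
    }


def mark_queries(queries, size=4):
    """Mark a query only if no earlier query shares a signature at the same slice position."""
    seen = set()
    marked = []
    for query in queries:
        keys = position_signatures(query, size)
        if seen.isdisjoint(keys):
            # No co-located slice matches an earlier query: mark it separately.
            marked.append(query)
        # Either way, remember its slices so later near-duplicates are skipped.
        seen |= keys
    return marked


marked = mark_queries([
    "what is the Pythagorean theorem",
    "what is the Pythagoras theorem",
    "area of a circle",
])
```

In this run the second query shares the slice "what" at position 0 with the first and is therefore not marked, while the third shares no co-located slice and is marked on its own.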
In some possible implementation forms, in order to reduce the data processing amount in the query statement deduplication process as much as possible, after obtaining a plurality of query statements, each query statement may be first matched with each type of test question template to determine a test question type corresponding to each query statement, then the plurality of query statements are classified according to the test question type corresponding to each query statement to determine a query statement set associated with each test question type, and finally the plurality of query statements in each set are marked according to signature values corresponding to the same position slices of the plurality of query statements in each set.
The test question templates are preset templates for identifying the test question type corresponding to a query statement, for example, a test question template that includes an operator (an operation-class test question template), or a test question template that does not include an operator, such as a "knowledge point" class test question template, which is not limited in this disclosure.
The test question type is the type of the test question corresponding to the query statement. For example, when the query sentence is "what does 1+1 equal", since it includes the operator "+", the corresponding test question type can be determined to be the "operation class"; when the query sentence is "who painted Along the River During the Qingming Festival", since it includes no operator, the corresponding test question type can be determined to be the "non-operation class" or the "knowledge point class", which is not limited in the present disclosure.
The query sentence set is a set composed of query sentences corresponding to the same test question type.
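As a rough sketch of template matching and grouping, the operation-class template can be approximated by a regular expression that looks for an operator between two digits. The pattern, the type labels and the function names are assumptions for illustration; the disclosure does not define the concrete templates.

```python
import re
from collections import defaultdict

# Hypothetical operation-class template: a digit, an arithmetic operator, a digit.
OPERATION_TEMPLATE = re.compile(r"\d\s*[+\-*/×÷]\s*\d")


def question_type(query):
    """Classify a query statement as 'operation' or 'knowledge point'."""
    return "operation" if OPERATION_TEMPLATE.search(query) else "knowledge point"


def group_by_type(queries):
    """Build the query statement set associated with each test question type."""
    groups = defaultdict(list)
    for query in queries:
        groups[question_type(query)].append(query)
    return dict(groups)


groups = group_by_type([
    "what does 1+1 equal",
    "who painted Along the River During the Qingming Festival",
    "12 ÷ 4",
])
```

Marking can then be run inside each set independently, so signatures are only compared among query statements of the same test question type.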
Step 104, performing de-duplication processing on the marked query sentences according to the text similarity between every two marked query sentences to obtain target query sentences.
In the present disclosure, after the plurality of query sentences are de-duplicated (marked) based on the signature values corresponding to the slices at the same position, the marked query sentences are de-duplicated again based on the text similarity between them, so that the generation of repeated test questions, and the resource waste it would cause, is further avoided.
It should be noted that the text similarity between every two marked query sentences may be obtained by calculating the edit distance between them, or may be determined in other ways, which is not limited in this disclosure.
In some possible implementations, in the case that the text similarity between the first tagged query statement and the at least one second tagged query statement is greater than the second threshold, determining any one of the first tagged query statement and the at least one second tagged query statement as the target query statement.
The second threshold is a preset critical value for representing the text similarity between query sentences, for example, the second threshold may be 0.9, which is not limited in the disclosure.
The target query sentence is a final query sentence used for generating the corresponding test question.
In the present disclosure, when the text similarity between the first tagged query term and at least one second tagged query term is greater than the second threshold, the first tagged query term and the at least one second tagged query term may be considered as query terms for the same problem or knowledge point, and at this time, any term may be selected from these terms to be determined as the target query term.
In some possible implementations, the third tagged query statement is determined to be the target query statement if the text similarity between the third tagged query statement and each of the other tagged query statements is less than or equal to a second threshold.
In the present disclosure, when the text similarity between the third labeled query term and each other labeled query term is less than or equal to the second threshold, the third labeled query term may be considered to be dissimilar to each other labeled query term. That is, the question or the knowledge point aimed by the third labeled query term is different from that of other query terms, and at this time, the third labeled query term may be directly determined as a target query term.
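The similarity-based pass can be sketched with a Levenshtein edit distance. Normalizing the distance by the longer string's length to obtain a similarity in [0, 1] is an assumption, as is keeping the first statement of each similar group; the disclosure only states that edit distance may be used and gives 0.9 as an example threshold.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 minus the edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def deduplicate(marked_queries, second_threshold=0.9):
    """Keep a marked query only if it is not too similar to any query already kept."""
    targets = []
    for query in marked_queries:
        if all(similarity(query, kept) <= second_threshold for kept in targets):
            targets.append(query)
    return targets


targets = deduplicate([
    "what is the Pythagorean theorem",
    "what is the Pythagorean theorem?",
    "area of a circle",
])
```

The second statement differs from the first by one character out of 32, giving a similarity of about 0.97, so only one of the two becomes a target query statement.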
Step 105, generating test questions corresponding to each target query statement, and updating a question bank based on the generated test questions and the target query statement.
In some possible implementation forms, the test question generation mode corresponding to each target query statement may be determined according to the test question type corresponding to each target query statement, and then the test question corresponding to each target query statement is generated according to the test question generation mode, so that the accuracy and reliability of the generated test question are ensured.
It should be noted that, the types of the questions corresponding to the target query statement are different, and thus the determined question generation modes corresponding to the target query statement may also be different.
In the present disclosure, it is considered that generating a corresponding test question for every query sentence with an empty recall result would waste resources and could leave a large number of repeated test questions in the question bank. Therefore, after the query sentences with empty recall results are obtained, they are first de-duplicated and the test questions are generated afterwards, which expands the scale of the question bank while avoiding resource waste.
In the embodiment of the disclosure, a plurality of query sentences with empty recall results are first obtained from search logs associated with a question bank; each query sentence is sliced and a signature value of each slice is obtained; the query sentences are then marked according to the signature values corresponding to the slices at the same position of the query sentences to obtain marked query sentences; the marked query sentences are then de-duplicated according to the text similarity between every two marked query sentences to obtain target query sentences; finally, test questions corresponding to each target query sentence are generated, and the question bank is updated based on the generated test questions and the target query sentences. Therefore, the query sentences with empty recall results are de-duplicated in more than one pass before the corresponding test questions are generated, so that the question bank is updated, resource waste is avoided, and the cost is reduced.
Fig. 2 is a flowchart of a method for updating a question bank according to an embodiment of the disclosure.
As shown in fig. 2, the question bank updating method may include the steps of:
Step 201, obtaining a plurality of query sentences with empty recall results from the search logs associated with the question bank.
Step 202, slicing each query statement, and obtaining a signature value of each slice.
Step 203, marking the plurality of query sentences according to the signature values corresponding to the same position slices of the plurality of query sentences to obtain marked query sentences.
Step 204, performing de-duplication processing on the marked query sentences according to the text similarity between every two marked query sentences to obtain target query sentences.
The specific implementation manner of step 201 to step 204 may refer to the detailed descriptions in other embodiments of the disclosure, and will not be described in detail herein.
Step 205, determining a test question generation mode corresponding to each target query statement according to the test question type corresponding to each target query statement.
In the present disclosure, after the target query statement is obtained, the test question generation mode corresponding to each target query statement may be determined based on the test question type corresponding to each target query statement.
In some possible implementations, in a case where the test question type corresponding to the first target query statement is the operation class, determining the test question generation mode corresponding to the first target query statement includes: determining the operation result of the expression in the first target query statement with the operation tool and with the first model respectively, and, in the case where the two operation results are consistent, generating a test question corresponding to the first target query statement based on the operation result and the expression.
The operation tool is a tool for calculating the expression in the first target query statement. For example, the operation tool may be a math tool based on a large language model (Large Language Model Math, LLM-Math), which is not limited by the present disclosure.
The first model is a model for processing the expression in the first target query statement. For example, the first model may be a Generative Pre-trained Transformer (GPT) model, which is not limited by the present disclosure.
It should be noted that, when the first model is a GPT model, the temperature parameter of the GPT model needs to be set to 0 when processing the expression in the first target query statement, so as to ensure the accuracy of the calculation result.
In the present disclosure, when the operation result corresponding to the operation tool is identical to the operation result corresponding to the first model, the operation result of the operation tool and the first model may be considered as an answer to the test question corresponding to the first target query sentence, and at this time, the test question corresponding to the first target query sentence may be generated based on the operation result and the expression. Therefore, the accuracy of the generated operation test questions and the reliability of answers are ensured.
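The cross-check for operation-class questions can be sketched as below. Both sides are stand-ins: `tool_evaluate` is a small arithmetic evaluator used in place of the operation tool, and `model_evaluate` is any callable playing the role of the first model (in practice, for example, a GPT call with temperature 0). The expression-extraction pattern and the output dictionary layout are likewise assumptions.

```python
import ast
import operator
import re

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def tool_evaluate(expression):
    """Stand-in operation tool: evaluate + - * / arithmetic without using eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def generate_operation_question(query, model_evaluate):
    """Generate a test question only when the tool and the model agree on the result."""
    match = re.search(r"\d+(?:\s*[+\-*/]\s*\d+)+", query)
    if match is None:
        return None
    expression = match.group(0)
    tool_result = tool_evaluate(expression)
    model_result = model_evaluate(expression)
    if tool_result != model_result:
        return None  # results are inconsistent, so no test question is generated
    return {"question": f"{expression} = ?", "answer": tool_result}


def fake_model(expression):
    """Hypothetical first model, replaced here by a lookup table."""
    return {"1+1": 2}.get(expression)


question = generate_operation_question("what does 1+1 equal", fake_model)
```

Requiring agreement between two independent evaluators means a wrong answer enters the question bank only if both make the same mistake.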
In some possible implementations, in a case where the test question type corresponding to the second target query statement is the non-operation class, determining the test question generation mode corresponding to the second target query statement includes: inputting the second target query statement into the second model to obtain a first answer, inputting the first answer and the second target query statement into the third model to determine the confidence of the first answer, and, in the case where the confidence of the first answer is higher than a first threshold, generating a test question corresponding to the second target query statement based on the second target query statement and the first answer.
The second model is a model for generating an answer of a test question corresponding to the second target query statement, and may be the same model as the first model, or may not be the same model as the first model, for example, the second model may be a GPT model, or other models, etc., which is not limited in this disclosure.
The first answer is an answer of a test question corresponding to the generated second target query statement based on the second model.
The third model is used for verifying the association between the first answer and the test question corresponding to the second target query statement, and is not the same model as the second model. For example, when the second model is a GPT model, the third model is a model other than the GPT model, which is not limited by the present disclosure.
It should be noted that, if the second model is the same model as the first model, the third model is not the same model as the first model.
The confidence is the degree of credibility of the first answer as the answer of the test question corresponding to the second target query statement.
The first threshold may be a preset threshold of the confidence of the first answer, for example, the first threshold may be 0.95, etc., which is not limited in this disclosure.
In the present disclosure, when the confidence of the first answer is higher than the first threshold, the credibility of the answer of the test question corresponding to the second target query statement may be considered to be high. That is, the first answer may be determined as the answer of the test question corresponding to the second target query statement, and at this time, the test question corresponding to the second target query statement may be generated based on the second target query statement and the first answer. Therefore, the accuracy of the generated non-operation test questions and the reliability of their answers are ensured.
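The non-operation flow can be sketched with two callables standing in for the second and third models. The function names, the stub models and the output layout are hypothetical; only the control flow (answer, verify, compare against the first threshold of, for example, 0.95) follows the description above.

```python
def generate_knowledge_question(query, answer_model, verifier_model, first_threshold=0.95):
    """Generate a test question only when the verifier's confidence exceeds the threshold."""
    first_answer = answer_model(query)                # second model produces the first answer
    confidence = verifier_model(query, first_answer)  # third model scores the first answer
    if confidence > first_threshold:
        return {"question": query, "answer": first_answer}
    return None


# Hypothetical stand-ins for the second and third models.
def stub_answer_model(query):
    return "Zhang Zeduan"


def stub_verifier_model(query, answer):
    return 0.98 if answer == "Zhang Zeduan" else 0.10


query = "who painted Along the River During the Qingming Festival"
accepted = generate_knowledge_question(query, stub_answer_model, stub_verifier_model)
rejected = generate_knowledge_question(query, lambda q: "unknown", stub_verifier_model)
```

Passing the two models in as separate arguments keeps the requirement that the verifier is a different model from the answer generator visible at the call site.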
Step 206, generating test questions corresponding to each target query statement according to the test question generation mode, and updating a question bank based on the generated test questions and the target query statement.
The specific implementation manner of step 206 may refer to the detailed description of other embodiments in this disclosure, and will not be described herein in detail.
In the embodiment of the disclosure, a plurality of query statements with empty recall results are first obtained from search logs associated with a question bank; each query statement is sliced and a signature value of each slice is obtained; the query statements are marked according to the signature values corresponding to the slices at the same position of the query statements to obtain marked query statements; the marked query statements are de-duplicated according to the text similarity between every two marked query statements to obtain target query statements; a test question generation mode corresponding to each target query statement is determined according to the test question type corresponding to that target query statement; and test questions corresponding to each target query statement are generated according to the test question generation mode, and the question bank is updated based on the generated test questions and the target query statements. Therefore, after the plurality of query statements with empty recall results are de-duplicated, the test question generation mode is determined based on the test question type of each remaining query statement and the corresponding test questions are generated, so that the question bank is updated, resource waste is avoided, and the reliability and accuracy of the generated test questions are improved.
In order to implement the above embodiment, the disclosure further provides a device for updating a question bank.
Fig. 3 is a schematic structural diagram of a question bank updating apparatus according to an embodiment of the disclosure.
As shown in fig. 3, the question bank updating apparatus 300 includes: an acquisition module 301, a slicing module 302, a marking module 303, a deduplication module 304, and a processing module 305.
the acquisition module 301 is configured to obtain a plurality of query sentences with empty recall results from the search logs associated with the question bank;
the slicing module 302 is configured to slice each query statement and obtain a signature value of each slice;
a marking module 303, configured to mark a plurality of query sentences according to signature values corresponding to the same position slices of the plurality of query sentences, so as to obtain marked query sentences;
the deduplication module 304 is configured to perform deduplication processing on the labeled query statement according to the text similarity between every two labeled query statements, so as to obtain a target query statement;
the processing module 305 is configured to generate a test question corresponding to each target query statement, and update a question bank based on the generated test question and the target query statement.
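The slicing step performed by the slicing module 302 can be illustrated with a minimal sketch. The section does not fix a slice length or a signature algorithm, so the fixed-length slicing, the default length of 4 characters, the MD5 hash, and the function names `slice_query` and `slice_signatures` are all assumptions made for illustration only.

```python
import hashlib


def slice_query(query: str, slice_len: int = 4) -> list[str]:
    """Cut a query statement into consecutive fixed-length slices.

    The fixed-length scheme and the default length are assumptions;
    the disclosure does not prescribe how slices are formed.
    """
    return [query[i:i + slice_len] for i in range(0, len(query), slice_len)]


def slice_signatures(query: str, slice_len: int = 4) -> list[str]:
    """Return one signature value per slice, in slice order.

    MD5 is used here only as a stand-in signature function.
    """
    return [
        hashlib.md5(piece.encode("utf-8")).hexdigest()
        for piece in slice_query(query, slice_len)
    ]
```

Two query statements that share the same text at the same slice position produce the same signature value at that position, which is what the marking step compares.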
In one possible implementation of the present disclosure, the marking module 303 is specifically configured to:
when the signature values corresponding to at least one same-position slice of the plurality of query sentences are identical, marking any one of the plurality of query sentences.
In one possible implementation of the present disclosure, the marking module 303 is specifically configured to:
and marking the plurality of query sentences respectively under the condition that signature values corresponding to the position slices of the plurality of query sentences are different.
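The two marking cases above can be sketched together. This is a hedged illustration rather than the disclosed implementation: it assumes fixed-length slices and MD5 signatures, and it uses a greedy rule in which a query statement is skipped when it shares a same-position signature value with a query statement that has already been marked. The names `signatures` and `mark_queries` are hypothetical.

```python
import hashlib


def signatures(query: str, slice_len: int = 4) -> list[str]:
    """Signature value of each fixed-length slice (assumed scheme)."""
    return [
        hashlib.md5(query[i:i + slice_len].encode("utf-8")).hexdigest()
        for i in range(0, len(query), slice_len)
    ]


def mark_queries(queries: list[str], slice_len: int = 4) -> list[str]:
    """Return the marked query statements.

    A query statement is marked unless at least one of its slices has the
    same signature value as the slice at the same position of an already
    marked query statement. If no signatures coincide, every query
    statement is marked.
    """
    marked: list[str] = []
    seen: set[tuple[int, str]] = set()  # (slice position, signature value)
    for query in queries:
        keys = set(enumerate(signatures(query, slice_len)))
        if keys & seen:
            continue  # treated as a likely duplicate of a marked statement
        marked.append(query)
        seen |= keys
    return marked
```

With a slice length of 4, `"what is 2+3"` and `"what is 5+7"` share the slice `"what"` at position 0, so only the first of the two is marked; the later text-similarity step then handles duplicates that this coarse check misses.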
In one possible implementation of the present disclosure, the marking module 303 is specifically configured to:
matching each query statement with each type of test question template to determine the test question type corresponding to each query statement;
classifying the plurality of query sentences according to the test question types corresponding to each query sentence to determine a query sentence set associated with each test question type;
and marking the plurality of query sentences in each set according to the signature values corresponding to the same position slices of the plurality of query sentences in each set.
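Grouping query statements by test question type before marking can be sketched as follows. The section does not describe what a test question template looks like, so the regular-expression templates, the type names `operation`, `fill_in`, and `other`, and the helper names are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

# Hypothetical templates: one regular expression per test question type.
TEMPLATES = {
    "operation": re.compile(r"\d+\s*[-+*/]\s*\d+"),  # e.g. "12 + 7"
    "fill_in": re.compile(r"_{2,}"),                 # e.g. "is ____"
}


def question_type(query: str) -> str:
    """Return the first template type the query statement matches."""
    for name, pattern in TEMPLATES.items():
        if pattern.search(query):
            return name
    return "other"


def group_by_type(queries: list[str]) -> dict[str, list[str]]:
    """Build the query statement set associated with each test question type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for query in queries:
        groups[question_type(query)].append(query)
    return dict(groups)
```

Marking would then be applied within each returned set, so that signature values are only compared between query statements of the same test question type.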
In one possible implementation manner of the present disclosure, the processing module 305 is specifically configured to:
determining a test question generation mode corresponding to each target query statement according to the test question type corresponding to each target query statement;
and generating test questions corresponding to each target query statement according to the test question generation mode.
In one possible implementation manner of the present disclosure, the processing module 305 is specifically configured to:
under the condition that the test question type corresponding to the first target query statement is an operation class, determining the test question generation mode corresponding to the first target query statement comprises: and respectively determining the operation result of the operation formula in any target query statement by using the operation tool and the first model, and generating a test question corresponding to the first target query statement based on the operation result and the operation formula under the condition that the operation results are consistent.
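The consistency check for operation-class test questions can be sketched as below. The operation tool is approximated by a small arithmetic evaluator built on Python's `ast` module, and the first model is represented by an arbitrary callable passed in by the caller; both are stand-ins, and the names `evaluate` and `make_operation_question` are hypothetical.

```python
import ast
import operator
from typing import Callable, Optional

# Binary operators supported by the stand-in operation tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula: str) -> float:
    """Evaluate a formula made of numbers and + - * / (assumed tool)."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(formula, mode="eval").body)


def make_operation_question(
    formula: str, model_answer: Callable[[str], float]
) -> Optional[dict]:
    """Generate a test question only if the tool and the model agree."""
    tool_result = evaluate(formula)
    model_result = model_answer(formula)
    if tool_result != model_result:
        return None  # results are inconsistent, so no test question is generated
    return {"question": f"{formula} = ?", "answer": tool_result}
```

Requiring both results to match means a faulty model output or an unsupported formula does not silently enter the question bank.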
In one possible implementation manner of the present disclosure, the processing module 305 is specifically configured to:
and under the condition that the test question type corresponding to the second target query statement is a non-operation type, determining the test question generation mode corresponding to the second target query statement comprises the following steps: inputting the second target query sentence into the second model to obtain a first answer, inputting the first answer and the second target query sentence into the third model to determine the confidence of the first answer, and generating a test question corresponding to the second target query sentence based on the second target query sentence and the first answer under the condition that the confidence of the first answer is higher than a first threshold value.
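The non-operation branch can be sketched in the same spirit. The second and third models are represented by plain callables supplied by the caller, and the default first threshold of 0.8 is an assumed value; the section does not specify either model or the threshold.

```python
from typing import Callable, Optional


def make_non_operation_question(
    query: str,
    answer_model: Callable[[str], str],             # stand-in for the second model
    confidence_model: Callable[[str, str], float],  # stand-in for the third model
    first_threshold: float = 0.8,                   # assumed value
) -> Optional[dict]:
    """Generate a test question only when the answer is trusted enough."""
    first_answer = answer_model(query)
    confidence = confidence_model(query, first_answer)
    if confidence > first_threshold:
        return {"question": query, "answer": first_answer}
    return None  # confidence too low, so no test question is generated
```

Passing the models in as parameters keeps the sketch independent of any particular model interface.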
In one possible implementation manner of the present disclosure, the acquiring module 301 is specifically configured to:
and screening search logs in the screening period associated with the question bank in each screening period to obtain a plurality of query sentences with empty recall results.
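Screening the search logs of one screening period can be sketched as a simple filter. The log record layout (a dictionary with `time`, `query`, and `recall_count` keys) and the function name `empty_recall_queries` are assumptions; the section does not describe the log format.

```python
from datetime import datetime, timedelta


def empty_recall_queries(
    log_entries: list[dict], period_start: datetime, period: timedelta
) -> list[str]:
    """Return query statements logged in the period whose recall result is empty.

    Each entry is assumed to look like
    {"time": datetime, "query": str, "recall_count": int}.
    """
    period_end = period_start + period
    return [
        entry["query"]
        for entry in log_entries
        if period_start <= entry["time"] < period_end
        and entry["recall_count"] == 0
    ]
```

Running the filter once per screening period, for example once a day over the previous day's logs, yields the plurality of query statements that the later steps slice, mark, and de-duplicate.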
In one possible implementation manner of the present disclosure, the foregoing deduplication module 304 is specifically configured to:
determining any one of the first marked query sentence and the at least one second marked query sentence as a target query sentence under the condition that the text similarity between the first marked query sentence and the at least one second marked query sentence is larger than a second threshold value; or,
and under the condition that the text similarity between the third marked query sentence and each other marked query sentence is smaller than or equal to a second threshold value, determining the third marked query sentence as a target query sentence.
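The similarity-based de-duplication can be sketched as a greedy pass over the marked query statements. The section does not name a similarity measure, so `difflib.SequenceMatcher` from the Python standard library is used here as a stand-in, the second threshold of 0.8 is an assumed value, and the first statement encountered in each group of similar statements is the one kept.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is only a stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def deduplicate(marked: list[str], second_threshold: float = 0.8) -> list[str]:
    """Return the target query statements.

    A marked query statement becomes a target when its similarity to every
    target kept so far is less than or equal to the threshold. When several
    statements exceed the threshold with one another, only the first one
    encountered is kept.
    """
    targets: list[str] = []
    for query in marked:
        if all(text_similarity(query, kept) <= second_threshold for kept in targets):
            targets.append(query)
    return targets
```

The pass compares each statement only with the targets already kept, which keeps the number of comparisons below that of a full pairwise comparison while still keeping one representative of each group of similar statements.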
The functions and specific implementation principles of the foregoing modules in the embodiments of the present disclosure may refer to the foregoing method embodiments, and are not repeated herein.
In the embodiment of the disclosure, a plurality of query sentences with empty recall results are first obtained from search logs associated with a question bank, each query sentence is sliced, and a signature value of each slice is obtained. The query sentences are then marked according to the signature values corresponding to the slices at the same position of the query sentences to obtain marked query sentences, and the marked query sentences are de-duplicated according to the text similarity between every two marked query sentences to obtain target query sentences. Finally, test questions corresponding to each target query sentence are generated, and the question bank is updated based on the generated test questions and the target query sentences. Therefore, duplicates among the plurality of query sentences with empty recall results are eliminated before the corresponding test questions are generated, so that the question bank is updated, resource waste is avoided, and the cost is reduced.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, such as the question bank updating method. For example, in some embodiments, the question bank updating method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the question bank updating method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the question bank updating method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
In the embodiment of the disclosure, a plurality of query sentences with empty recall results are first obtained from search logs associated with a question bank, each query sentence is sliced, and a signature value of each slice is obtained. The query sentences are then marked according to the signature values corresponding to the slices at the same position of the query sentences to obtain marked query sentences, and the marked query sentences are de-duplicated according to the text similarity between every two marked query sentences to obtain target query sentences. Finally, test questions corresponding to each target query sentence are generated, and the question bank is updated based on the generated test questions and the target query sentences. Therefore, duplicates among the plurality of query sentences with empty recall results are eliminated before the corresponding test questions are generated, so that the question bank is updated, resource waste is avoided, and the cost is reduced.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is at least two, such as two, three, etc., unless explicitly specified otherwise. In the description of the present disclosure, the word "if" may be interpreted as "when" or "upon" or "in response to a determination" or "in the case of".
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A method for updating a question bank, comprising:
acquiring a plurality of query sentences with empty recall results from the search logs associated with the question bank;
slicing each query statement, and acquiring a signature value of each slice;
marking the plurality of query sentences according to signature values corresponding to the slices at the same position of the plurality of query sentences to obtain marked query sentences;
performing de-duplication processing on the marked query sentences according to the text similarity between every two marked query sentences to obtain target query sentences;
generating test questions corresponding to each target query statement, and updating the question bank based on the generated test questions and the target query statement.
2. The method of claim 1, wherein the tagging the plurality of query statements according to signature values corresponding to co-located slices of the plurality of query statements comprises:
and marking any one of the plurality of query sentences when the signature values corresponding to the slices at the same position of at least one of the plurality of query sentences are the same.
3. The method of claim 1, wherein the tagging the plurality of query statements according to signature values corresponding to co-located slices of the plurality of query statements comprises:
and marking the plurality of query sentences respectively under the condition that signature values corresponding to the position slices of the plurality of query sentences are different.
4. The method of claim 1, wherein the tagging the plurality of query statements according to signature values corresponding to co-located slices of the plurality of query statements comprises:
matching each query statement with each type of test question template to determine the test question type corresponding to each query statement;
classifying the plurality of query sentences according to the test question types corresponding to each query sentence to determine a query sentence set associated with each test question type;
and marking the plurality of query sentences in each set according to the signature values corresponding to the same position slices of the plurality of query sentences in each set.
5. The method of claim 1, wherein the generating the test question corresponding to each of the target query sentences comprises:
determining a test question generation mode corresponding to each target query statement according to the test question type corresponding to each target query statement;
and generating test questions corresponding to each target query statement according to the test question generation mode.
6. The method of claim 5, wherein the determining the test question generation mode corresponding to each target query statement according to the test question type corresponding to each target query statement comprises:
under the condition that the test question type corresponding to the first target query statement is an operation class, determining the test question generation mode corresponding to the first target query statement comprises: and respectively determining an operation result of an operation formula in any target query statement by using an operation tool and a first model, and generating a test question corresponding to the first target query statement based on the operation result and the operation formula under the condition that the operation results are consistent.
7. The method of claim 5, wherein the determining the test question generation mode corresponding to each target query statement according to the test question type corresponding to each target query statement comprises:
and under the condition that the test question type corresponding to the second target query statement is a non-operation type, determining the test question generation mode corresponding to the second target query statement comprises the following steps: inputting the second target query sentence into a second model to obtain a first answer, inputting the first answer and the second target query sentence into a third model to determine the confidence degree of the first answer, and generating a test question corresponding to the second target query sentence based on the second target sentence and the first answer under the condition that the confidence degree of the first answer is higher than a first threshold value.
8. The method of any of claims 1-7, wherein the obtaining a plurality of query statements with empty recall results from the search log associated with the question bank comprises:
and screening search logs in the screening period associated with the question bank in each screening period to obtain a plurality of query sentences with empty recall results.
9. The method as claimed in any one of claims 1 to 7, wherein said performing a deduplication process on the tagged query term according to a text similarity between every two tagged query terms to obtain a target query term includes:
determining any one of the first marked query sentence and the at least one second marked query sentence as a target query sentence under the condition that the text similarity between the first marked query sentence and the at least one second marked query sentence is larger than a second threshold value; or,
and under the condition that the text similarity between the third marked query sentence and each other marked query sentence is smaller than or equal to the second threshold value, determining the third marked query sentence as a target query sentence.
10. A question bank updating apparatus comprising:
the acquisition module is used for acquiring a plurality of query sentences with empty recall results from the search logs associated with the question bank;
the slicing module is used for slicing each query statement and acquiring a signature value of each slice;
the marking module is used for marking the plurality of query sentences according to the signature values corresponding to the same position slices of the plurality of query sentences so as to obtain marked query sentences;
the de-duplication module is used for performing de-duplication processing on the marked query sentences according to the text similarity between every two marked query sentences so as to obtain target query sentences;
and the processing module is used for generating test questions corresponding to each target query statement and updating the question bank based on the generated test questions and the target query statement.
11. The apparatus of claim 10, wherein the marking module is specifically configured to:
and marking any one of the plurality of query sentences when the signature values corresponding to the slices at the same position of at least one of the plurality of query sentences are the same.
12. The apparatus of claim 10, wherein the marking module is specifically configured to:
and marking the plurality of query sentences respectively under the condition that signature values corresponding to the position slices of the plurality of query sentences are different.
13. The apparatus of claim 10, wherein the marking module is specifically configured to:
matching each query statement with each type of test question template to determine the test question type corresponding to each query statement;
classifying the plurality of query sentences according to the test question types corresponding to each query sentence to determine a query sentence set associated with each test question type;
and marking the plurality of query sentences in each set according to the signature values corresponding to the same position slices of the plurality of query sentences in each set.
14. The apparatus of claim 10, wherein the processing module is specifically configured to:
determining a test question generation mode corresponding to each target query statement according to the test question type corresponding to each target query statement;
and generating test questions corresponding to each target query statement according to the test question generation mode.
15. The apparatus of claim 14, wherein the processing module is further to:
under the condition that the test question type corresponding to the first target query statement is an operation class, determining the test question generation mode corresponding to the first target query statement comprises: and respectively determining an operation result of an operation formula in any target query statement by using an operation tool and a first model, and generating a test question corresponding to the first target query statement based on the operation result and the operation formula under the condition that the operation results are consistent.
16. The apparatus of claim 14, wherein the processing module is further to:
and under the condition that the test question type corresponding to the second target query statement is a non-operation type, determining the test question generation mode corresponding to the second target query statement comprises the following steps: inputting the second target query sentence into a second model to obtain a first answer, inputting the first answer and the second target query sentence into a third model to determine the confidence degree of the first answer, and generating a test question corresponding to the second target query sentence based on the second target sentence and the first answer under the condition that the confidence degree of the first answer is higher than a first threshold value.
17. The apparatus according to any one of claims 10-16, wherein the acquisition module is specifically configured to:
and screening search logs in the screening period associated with the question bank in each screening period to obtain a plurality of query sentences with empty recall results.
18. The apparatus according to any of claims 10-16, wherein the deduplication module is specifically configured to:
determining any one of the first marked query sentence and the at least one second marked query sentence as a target query sentence under the condition that the text similarity between the first marked query sentence and the at least one second marked query sentence is larger than a second threshold value; or,
and under the condition that the text similarity between the third marked query sentence and each other marked query sentence is smaller than or equal to the second threshold value, determining the third marked query sentence as a target query sentence.
19. An electronic device, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-9.
CN202311167617.8A 2023-09-11 2023-09-11 Question bank updating method and device, electronic equipment and storage medium Pending CN117312339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311167617.8A CN117312339A (en) 2023-09-11 2023-09-11 Question bank updating method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311167617.8A CN117312339A (en) 2023-09-11 2023-09-11 Question bank updating method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117312339A true CN117312339A (en) 2023-12-29

Family

ID=89259456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311167617.8A Pending CN117312339A (en) 2023-09-11 2023-09-11 Question bank updating method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117312339A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination