CN116383234A - Search statement generation method and device, computer equipment and storage medium - Google Patents
- Publication number
- CN116383234A (CN202310312847.2A)
- Authority
- CN
- China
- Prior art keywords
- search
- information
- target
- search statement
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure relates to the field of artificial intelligence technologies, and in particular to a method and apparatus for generating a search statement, a computer device, and a storage medium. The method comprises: processing, by using a target text extraction model, description information included in a search statement generation request sent by a first user terminal, and determining characterization information; determining corresponding field information from a data dictionary based on the characterization information and a target system identification determined from the characterization information; processing the field information by using a trained search statement generation model to obtain a pre-search statement; and determining the pre-search statement as a target search statement when the pre-search statement meets a first preset condition. With the embodiments of this specification, a search statement generation request is processed automatically upon receipt to obtain the target search statement, which shortens the time required, improves the accuracy of the search statement, and improves the user experience.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular to a method and apparatus for generating a search statement, a computer device, and a storage medium.
Background
As banking becomes more widespread, more and more data is stored, which makes data queries challenging. Currently, when a branch needs to query data, it sends a data query request to a server at headquarters, and the headquarters server forwards the request to a user terminal at headquarters. An operator at headquarters determines, according to the data query request, the characterization information and the target system identification corresponding to the target system in which the queried data is stored, and then writes a search statement based on the characterization information and the target system identification. As a result, data queries take a long time, and operators frequently make errors when writing search statements, resulting in a poor user experience.
How to shorten the time used in data queries and improve the accuracy of search statements, so as to improve the user experience, is a problem to be solved in the prior art.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiments of this specification provide a search statement generation method and device, computer equipment, and a storage medium, which automatically process a search statement generation request after receiving it to obtain a target search statement, shortening the time used and improving the accuracy of the search statement, thereby improving the user experience.
In order to solve the technical problems, the specific technical scheme in the specification is as follows:
In one aspect, embodiments of the present disclosure provide a search statement generation method, including:
processing description information included in a search statement generation request sent by a first user terminal by utilizing a target text extraction model, and determining characterization information;
determining corresponding field information from a data dictionary based on the characterization information and a target system identification determined by the characterization information;
processing the field information by using the trained search statement generation model to obtain a pre-search statement; and
when the pre-search statement meets a first preset condition, determining the pre-search statement as a target search statement.
Further, processing the description information included in the received search statement generation request by using the target text extraction model, and determining the characterization information, further includes:
performing word segmentation processing on description information included in a received search statement generation request to obtain a plurality of sub-description information;
processing each piece of sub-description information by using an importance formula to obtain importance corresponding to each piece of sub-description information; and
determining the sub-description information whose importance meets a second preset condition as the characterization information.
Further, the importance formula further includes:
$$tfidf_{i,j} = tf_{i,j} \times idf_i$$
wherein $tfidf_{i,j}$ characterizes the importance, $tf_{i,j}$ characterizes the number of preset files in a corpus that include the sub-description information, and $idf_i$ characterizes the reverse file frequency, which is determined by the corpus and the sub-description information.
Further, the determining of the reverse file frequency further includes:
wherein $idf_i$ characterizes the reverse file frequency, $D$ characterizes the total number of preset files included in the corpus, $\{j : t_i \in d_j\}$ characterizes the number of preset files in the corpus that include the sub-description information, $t_i$ characterizes the sub-description information, $C$ characterizes a constant, and $d_j$ characterizes the preset file.
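The importance calculation described above can be illustrated with a minimal sketch. It assumes the conventional TF-IDF definitions: term frequency as the share of tokens equal to the term, and a smoothed reverse file frequency of the form log(D / (C + n)), where n is the number of preset files containing the term and C = 1. The placement of C, the toy corpus, the function names, and the threshold used as the second preset condition are all assumptions for illustration and are not taken from the specification.

```python
import math


def idf(term, corpus, c=1.0):
    """Smoothed reverse (inverse) file frequency of `term`.

    Assumed form: log(D / (C + n)), where D is the number of preset files
    in the corpus and n is the number of files that contain the term.
    """
    n_containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (c + n_containing))


def tf(term, tokens):
    """Conventional term frequency: share of `tokens` equal to `term`."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def importance(tokens, corpus, c=1.0):
    """Map each distinct token of the description to tf * idf."""
    return {t: tf(t, tokens) * idf(t, corpus, c) for t in set(tokens)}


def characterization(tokens, corpus, threshold, c=1.0):
    """Keep tokens whose importance reaches the (hypothetical) threshold."""
    scores = importance(tokens, corpus, c)
    return sorted(t for t, s in scores.items() if s >= threshold)


# Toy corpus of already-segmented preset files (hypothetical data).
CORPUS = [
    ["please", "query", "loan", "balance"],
    ["please", "query", "card", "limit"],
    ["please", "query", "customer", "list"],
    ["query", "report"],
]
# Sub-description information obtained by word segmentation (hypothetical).
DESCRIPTION = ["please", "query", "loan", "balance"]

keywords = characterization(DESCRIPTION, CORPUS, threshold=0.1)
```

In this toy setting, words that occur in most preset files ("please", "query") receive an importance of zero or below, so only the rarer words survive the threshold.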
Further, after the field information is processed by the trained search statement generation model to obtain the pre-search statement, the method further comprises:
transmitting the pre-search statement and the field information to a second user terminal when the pre-search statement does not meet the first preset condition, so that the second user terminal determines a replacement search statement based on the pre-search statement and the field information; and
updating the pre-search statement to the target search statement by using the replacement search statement received from the second user terminal.
Further, the training process of the trained search statement generation model further comprises,
adding the field information into a sample field information set to obtain a target sample set;
adding the target search statement as a tag associated with the field information into a sample search statement set to obtain a target tag set;
processing the target sample set by using a preset search statement generation model to obtain a prediction search statement set; and
training the preset search statement generation model based on the difference between the predicted search statement included in the predicted search statement set and the target label included in the target label set to obtain the trained search statement generation model.
Further, the first preset condition further includes,
determining target search information corresponding to a preset grammar structure in the pre-search statement;
extracting features of the target search information, and determining a search feature vector;
determining a corresponding target risk feature vector from a risk feature library according to the field information; and
when the similarity between the search feature vector and the target risk feature vector is smaller than a preset threshold, determining that the pre-search statement meets the first preset condition.
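The similarity check above can be sketched as follows. The specification does not name a similarity measure, so cosine similarity is used here as an assumption; the risk feature library contents, the vector values, and the threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def meets_first_condition(search_vec, risk_vec, threshold=0.8):
    """The condition holds when the statement is NOT similar to the risk pattern."""
    return cosine_similarity(search_vec, risk_vec) < threshold


# Hypothetical risk feature library: field information -> target risk feature vector.
RISK_LIBRARY = {"account_balance": [1.0, 0.0, 1.0]}

safe = meets_first_condition([0.0, 1.0, 0.0], RISK_LIBRARY["account_balance"])
risky = meets_first_condition([1.0, 0.0, 1.0], RISK_LIBRARY["account_balance"])
```

A search feature vector orthogonal to the risk vector passes the check, while one pointing in the same direction fails it.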
Further, after the target search statement is determined, the method further comprises:
running the target search statement to obtain data information;
determining a readable rule based on a service identifier included in the search statement generation request; and
and splitting the data information based on the readable rule to obtain target data information.
On the other hand, the embodiment of the specification also provides a search statement generating device, which comprises,
a first processing unit for processing, by using a target text extraction model, description information included in a search statement generation request transmitted by a first user terminal, and determining characterization information;
a first determining unit, configured to determine corresponding field information from a data dictionary based on the characterization information and a target system identifier determined by the characterization information;
the second processing unit is used for processing the field information by utilizing the trained search statement generation model to obtain a pre-search statement; and
a second determining unit, configured to determine the pre-search statement as a target search statement when the pre-search statement meets the first preset condition.
Further, the device also comprises:
a transmitting unit configured to transmit the pre-search statement and the field information to a second user terminal when it is determined that the pre-search statement does not meet the first preset condition, so that the second user terminal determines a replacement search statement based on the pre-search statement and the field information; and
an updating unit for updating the pre-search statement to the target search statement by using the replacement search statement received from the second user terminal.
Further, the device also comprises:
an operation unit for running the target search statement to obtain data information;
a third determining unit, configured to determine a readable rule based on a service identifier included in the search statement generation request; and
and the splitting unit is used for splitting the data information based on the readable rule to obtain target data information.
In another aspect, embodiments of the present disclosure further provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method described above when executing the computer program.
In another aspect, embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, perform the above-described method.
In another aspect, embodiments of the present disclosure also provide a computer program product comprising a computer program/instructions which, when executed by a processor, implement the above-described method.
According to the embodiment of the specification, after a search statement generation request sent by a first user terminal is received, a target text extraction model is utilized to process the search statement generation request, so that characterization information is obtained; determining corresponding field information from the data dictionary based on the characterization information and the target system identification determined by the characterization information; processing the field information by using the trained search statement generation model to obtain a pre-search statement; and determining the pre-search sentence as a target search sentence under the condition that the pre-search sentence meets the first preset condition. Therefore, based on the target text extraction model, the data dictionary and the trained search statement generation model, the automatic processing of the search statement generation request is realized, the target search statement is obtained, the used time is shortened, the accuracy of the search statement is improved, and the user experience is improved.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation system of a search statement generation method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a search statement generation method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for determining characterization information according to an embodiment of the present disclosure;
FIG. 4A is a flowchart of a search statement generation method according to another embodiment of the present disclosure;
FIG. 4B is a flowchart of a method for training a search statement generation model according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for determining target data information according to an embodiment of the present disclosure;
FIG. 6A is a schematic structural diagram of a search statement generating device according to an embodiment of the present disclosure;
FIG. 6B is a schematic structural diagram of a search statement generating device according to another embodiment of the present disclosure;
FIG. 6C is a schematic structural diagram of a search statement generating device according to another embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
[Description of reference numerals]
101. a first user terminal;
102. a server;
103. a second user terminal;
610. a first processing unit;
620. a first determination unit;
630. a second processing unit;
640. a second determination unit;
650. a transmitting unit;
660. an updating unit;
670. an operation unit;
680. a third determination unit;
690. a splitting unit;
702. a computer device;
704. a processing device;
706. a storage resource;
708. a driving mechanism;
710. an input/output module;
712. an input device;
714. an output device;
716. a presentation device;
718. a graphical user interface;
720. a network interface;
722. a communication link;
724. a communication bus.
Detailed Description
The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and the claims of the specification and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the present description described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Fig. 1 is a schematic diagram of an implementation system of a search statement generation method according to an embodiment of the present disclosure. The system may include a first user terminal 101, a server 102, and a second user terminal 103. The first user terminal 101 and the second user terminal 103 each communicate with the server 102 via a network, which may include a local area network (LAN), a wide area network (WAN), the Internet, or a combination thereof, and which connects websites, user devices (e.g., computing devices), and back-end systems. An operator at a branch sends a search statement generation request to the server 102 via the first user terminal 101. After receiving the search statement generation request, the server 102 processes the description information included in the request by using a target text extraction model and determines characterization information; determines corresponding field information from the data dictionary based on the characterization information and the target system identification determined from the characterization information; processes the field information by using the trained search statement generation model to obtain a pre-search statement; and determines the pre-search statement as the target search statement when the pre-search statement meets the first preset condition. Further, when it is determined that the pre-search statement does not meet the first preset condition, the pre-search statement and the field information are transmitted to the second user terminal 103. An operator at the head office receives the pre-search statement and the field information through the second user terminal 103, composes a replacement search statement based on them, and transmits the replacement search statement to the server 102.
Upon receiving the replacement search statement transmitted by the second user terminal 103, the server 102 uses it to update the pre-search statement to the target search statement. The target search statement may also be sent to the first user terminal 101, for example.
Further, the server 102 may run the target search statement to obtain data information; determining a readable rule based on a service identifier included in the search statement generation request; and splitting the data information based on the readable rule to obtain target data information, and transmitting the target data information to the first user terminal 101.
Alternatively, the servers 102 may be nodes of a cloud computing system (not shown), or each server 102 may be a separate cloud computing system, including multiple computers interconnected by a network and operating as a distributed processing system.
In an alternative embodiment, the first user terminal 101 and the second user terminal 103 may include, but are not limited to, electronic devices such as smart phones, acquisition devices, desktop computers, tablet computers, notebook computers, smart speakers, digital assistants, augmented reality (AR)/virtual reality (VR) devices, and smart wearable devices. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that Fig. 1 shows only one application environment provided in this specification; in practical applications, a plurality of first user terminals 101 and second user terminals 103 may also be included, which is not limited in this specification.
Fig. 2 is a flowchart of a search statement generation method according to an embodiment of the present disclosure. The figure describes the process of search statement generation, but the process may include more or fewer operational steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In an actual system or apparatus product, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or the drawings. As shown in Fig. 2, the method may include:
S210, processing description information included in a search statement generation request sent by a first user terminal by utilizing a target text extraction model, and determining characterization information;
S220, determining corresponding field information from a data dictionary based on the characterization information and the target system identification determined by the characterization information;
S230, processing the field information by using the trained search statement generation model to obtain a pre-search statement;
S240, determining the pre-search statement as a target search statement under the condition that the pre-search statement meets the first preset condition.
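The four steps above can be sketched as a small pipeline. This is only an illustration of the control flow: the function name, the callable parameters, and the lambda stand-ins for the text extraction model, the data dictionary lookup, the generation model, and the first preset condition are all hypothetical.

```python
from typing import Callable, List, Optional


def generate_search_statement(
    description: str,
    extract: Callable[[str], str],
    lookup_fields: Callable[[str], List[str]],
    generate: Callable[[List[str]], str],
    is_acceptable: Callable[[str], bool],
) -> Optional[str]:
    """Return the target search statement, or None if the check fails."""
    characterization = extract(description)       # S210: characterization information
    fields = lookup_fields(characterization)      # S220: field information
    pre_statement = generate(fields)              # S230: pre-search statement
    # S240: accept the pre-search statement only if it meets the condition.
    return pre_statement if is_acceptable(pre_statement) else None


# Hypothetical stand-ins, for illustration only.
result = generate_search_statement(
    "query data about loan",
    extract=lambda text: text.split()[-1],
    lookup_fields=lambda key: {"loan": ["loan_id", "loan_amount"]}.get(key, []),
    generate=lambda fields: "SELECT " + ", ".join(fields) + " FROM loan_table",
    is_acceptable=lambda stmt: stmt.upper().startswith("SELECT"),
)
```

Returning None when the condition is not met leaves room for the fallback path, in which the pre-search statement is handed to a second user terminal or corrected.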
According to the embodiment of the specification, after a search statement generation request sent by a first user terminal is received, a target text extraction model is utilized to process the search statement generation request, so that characterization information is obtained; determining corresponding field information from the data dictionary based on the characterization information and the target system identification determined by the characterization information; processing the field information by using the trained search statement generation model to obtain a pre-search statement; and determining the pre-search sentence as a target search sentence under the condition that the pre-search sentence meets the first preset condition. Therefore, based on the target text extraction model, the data dictionary and the trained search statement generation model, the automatic processing of the search statement generation request is realized, the target search statement is obtained, the used time is shortened, the accuracy of the search statement is improved, and the user experience is improved.
According to one embodiment of the present specification, the target text extraction model may be, for example, a model that extracts or recognizes keywords in a sentence composed of a plurality of words. The first user terminal may include, for example, a user terminal controlled by a branch operator. The search statement generation request includes description information, and may further include a home subscriber identification of the data corresponding to the description information. The description information may include, for example, a description of the required scenario and the target data of the required fields. For example, a search statement generation request may carry user identification A and request data about service B. After the target text extraction model processes the search statement generation request, B is obtained and used as the characterization information.
A corresponding system identification is configured for each piece of preset characterization information. The system identification may be, for example, a database identification corresponding to the system. The target system identification associated with the characterization information is determined from the plurality of system identifications.
When data is stored, storage location information and preset characterization information corresponding to the data to be stored are determined. The preset characterization information may be a service identifier corresponding to the data to be stored. The storage location information may be a location identifier corresponding to a storage unit in a database; the location identifier may include a preset system identification and a storage unit identifier, where the preset system identification may be, for example, a database identification, and the storage unit identifier may be, for example, an identifier corresponding to a storage unit in the database. It should be noted that the storage unit identifier may be, for example, any identifier that can be combined with a search statement to locate the data to be stored. The storage location information and the preset characterization information are stored in association in the data dictionary, and the data to be stored is stored in the storage unit corresponding to the storage location information.
The target system identification is matched against each preset system identification included in the data dictionary. When a target preset system identification matching the target system identification is found, all preset characterization information associated with it is acquired. The characterization information is then matched against each piece of that preset characterization information, and when target preset characterization information matching the characterization information is found, the storage unit identifier associated with the target preset characterization information is taken as the field information used to generate the search statement.
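The two-stage matching described above can be sketched with nested dictionaries. The layout of the data dictionary, the system identifications, and the storage unit identifiers below are hypothetical and chosen only to make the lookup concrete.

```python
# Hypothetical data dictionary: preset system identification ->
# {preset characterization information -> storage unit identifiers}.
DATA_DICTIONARY = {
    "core_db": {"loan": ["loan_id", "loan_amount"], "deposit": ["deposit_id"]},
    "card_db": {"card": ["card_no", "card_limit"]},
}

# Hypothetical mapping configured for each piece of preset characterization information.
SYSTEM_IDS = {"loan": "core_db", "deposit": "core_db", "card": "card_db"}


def find_field_info(characterization):
    """Return the storage unit identifiers for the characterization information."""
    # Determine the target system identification from the characterization information.
    target_system_id = SYSTEM_IDS.get(characterization)
    # Stage 1: match the target system identification against the dictionary.
    entries = DATA_DICTIONARY.get(target_system_id, {})
    # Stage 2: match the characterization information within that system.
    return entries.get(characterization, [])


card_fields = find_field_info("card")
unknown_fields = find_field_info("unknown")
```

An empty list signals that no preset characterization information matched, which a caller would need to handle before generating a statement.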
A preset search statement generation model is trained in advance to obtain a model that can generate a predicted search statement based on field information. The preset search statement generation model may be, for example, a long short-term memory (LSTM) recurrent neural network with random or preset parameters and hyperparameters.
A sample field information set and a sample search statement set are obtained for training. The sample field information set includes sample field information, and the sample search statement set includes sample search statements. Each piece of sample field information corresponds to one sample search statement, which serves as the label of that sample field information.
Each piece of sample field information in the sample field information set is processed by the preset search statement generation model to obtain a predicted sample search statement set, which includes a predicted sample search statement corresponding to each piece of sample field information. The preset search statement generation model is trained based on the difference between the sample search statements in the sample search statement set and the predicted sample search statements in the predicted sample search statement set, until that difference meets the model training termination condition, yielding the trained search statement generation model.
Specifically, training the preset search statement generation model based on this difference means processing, with a preset loss function, the sample search statement and the predicted sample search statement corresponding to each piece of sample field information to obtain a loss function value, and then checking the loss function value against the model training termination condition. When the loss function value passes the check, the difference between the sample search statements and the predicted sample search statements is determined to meet the model training termination condition. The model training termination condition may be, for example, that the loss function value is less than or equal to a preset loss function threshold, or that a series of loss function values converges. The model training termination condition may also be, for example, that a preset threshold on the number of model iterations is reached.
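The three termination conditions mentioned above (loss threshold, convergence of the loss values, and an iteration limit) can be combined in a small helper. The function name, the default values, and the way convergence is measured (spread of the last few loss values) are assumptions made for illustration.

```python
def should_stop(losses, loss_threshold=0.01, tol=1e-4, window=3, max_iterations=1000):
    """Decide whether training should terminate, given the loss history so far."""
    if not losses:
        return False
    # Condition 1: the latest loss is at or below the preset threshold.
    if losses[-1] <= loss_threshold:
        return True
    # Condition 2: the preset number of iterations has been reached.
    if len(losses) >= max_iterations:
        return True
    # Condition 3: the last `window` losses have converged (spread below `tol`).
    if len(losses) >= window:
        recent = losses[-window:]
        return max(recent) - min(recent) < tol
    return False
```

A training loop would append the loss of each iteration to the history and call this helper after every update.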
The obtained field information is processed by using the trained search statement generation model to obtain a pre-search statement. Whether the pre-search statement has a running risk is then determined: if it has no running risk, the pre-search statement is determined to meet the first preset condition; if it has a running risk, the pre-search statement is determined not to meet the first preset condition. It should be noted that the running risk may be judged by whether the pre-search statement can run: if the pre-search statement can run successfully, it has no running risk, and if it cannot run successfully, it has a running risk.
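One way to judge whether a statement can run, sketched here under the assumption that the pre-search statement is SQL and that an empty copy of the schema is available, is to ask the database engine to compile it. The example uses Python's built-in `sqlite3` module with an in-memory database; the function name and the use of SQLite are illustrative choices, not part of the specification.

```python
import sqlite3


def has_running_risk(statement: str, schema_sql: str) -> bool:
    """Return True if the statement fails to compile against the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        # Build an empty copy of the schema so no real data is involved.
        conn.executescript(schema_sql)
        # EXPLAIN makes SQLite compile the statement and return its plan
        # instead of running it; syntax errors and unknown columns raise.
        conn.execute("EXPLAIN " + statement)
        return False
    except sqlite3.Error:
        return True
    finally:
        conn.close()
```

A statement for which this returns False would be treated as meeting the first preset condition in this embodiment.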
Under the condition that the pre-search statement meets the first preset condition, the pre-search statement is determined as the target search statement; under the condition that the pre-search statement does not meet the first preset condition, the pre-search statement is corrected to obtain a target search statement that meets the first preset condition. The target search statement may be, for example, an SQL statement.
Specifically, the correction processing may process the pre-search statement by using a fault scanning model to obtain a corresponding fault type. A corresponding correction rule is configured in advance for each preset fault type. The fault type is matched against each preset fault type to determine the target preset fault type that matches it, the correction rule associated with the target preset fault type is taken as the target correction rule, and the pre-search statement is corrected based on the target correction rule to obtain an updated search statement. Whether the updated search statement has a running risk is then determined. If it has no running risk, the updated search statement is determined to meet the first preset condition and is taken as the target search statement. Otherwise, the updated search statement continues to be corrected until an updated search statement meeting the first preset condition is obtained, and that statement is taken as the target search statement.
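The scan, match, correct, and re-check loop described above can be sketched as follows. All names are hypothetical: the fault scanner, the running-risk check, and the correction rules are passed in as stand-ins for the models and rules of the specification, and the `max_rounds` guard is an added assumption so that the loop cannot run forever.

```python
from typing import Callable, Dict

# A correction rule maps a faulty statement to an updated statement.
CorrectionRule = Callable[[str], str]


def correct_until_valid(
    pre_search: str,
    scan_fault: Callable[[str], str],         # stand-in for the fault scanning model
    rules: Dict[str, CorrectionRule],         # preset fault type -> correction rule
    has_running_risk: Callable[[str], bool],  # stand-in for the running-risk check
    max_rounds: int = 5,                      # assumed safety limit, not in the text
) -> str:
    """Apply the rule matching the scanned fault type until the statement runs."""
    statement = pre_search
    for _ in range(max_rounds):
        if not has_running_risk(statement):
            return statement  # first preset condition met: target search statement
        fault_type = scan_fault(statement)
        if fault_type not in rules:
            raise LookupError(f"no correction rule for fault type {fault_type!r}")
        statement = rules[fault_type](statement)  # updated search statement
    if has_running_risk(statement):
        raise RuntimeError("statement still has a running risk")
    return statement
```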
According to another embodiment of the present specification, the first preset condition includes: determining target search information corresponding to a preset grammar structure in the pre-search statement; extracting features of the target search information, and determining a search feature vector; determining a corresponding target risk feature vector from the risk feature library according to the field information; and determining that the pre-search statement meets the first preset condition under the condition that the similarity between the search feature vector and the target risk feature vector is smaller than a preset threshold value.
The preset grammar structure is a sentence structure of a search sentence, for example, the preset grammar structure includes clauses, expressions, predicates, queries, sentences, and the like. The determining of the target search information corresponding to the preset grammar structure may be based on splitting the grammar structure for the pre-search sentence to obtain the search information corresponding to each sub-term, where the sub-term is the clause, the expression, the predicate, the query, the sentence, and the like. And determining search information corresponding to at least one of the predicate and the query as target search information.
And extracting features of the target search information by using a feature extraction model to obtain a search feature vector. Specifically, the feature extraction model may be, for example, a trained convolutional neural network model.
A plurality of pieces of risk search information are configured in advance for each piece of preset field information and serve as preset risk search information. Feature extraction is performed on each piece of preset risk search information by using the feature extraction model to obtain a plurality of preset risk feature vectors. The preset risk feature vectors are stored in the risk feature library in association with the preset field information. In this way, risky search statements can be configured for each storage unit in the database, so that risk verification can be performed on search statements at a finer granularity.
And matching the field information with each piece of preset field information in the risk feature library, and determining a preset risk feature vector associated with the target preset field information as a target risk feature vector under the condition that the target preset field information is matched with the field information.
The similarity between the search feature vector and each target risk feature vector is calculated by using a similarity calculation formula to obtain a plurality of similarities, and the pre-search statement is determined to meet the first preset condition under the condition that one of the similarities is smaller than the preset threshold value.
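The lookup in the risk feature library and the similarity calculation can be sketched as below. The specification does not name the similarity calculation formula; cosine similarity is assumed here purely for illustration, and the library is modelled as a hypothetical dictionary from preset field information to its preset risk feature vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def risk_similarities(
    search_vector: Sequence[float],
    field_info: str,
    risk_library: Dict[str, List[Sequence[float]]],
) -> List[float]:
    """Similarity of the search vector to each target risk feature vector."""
    # Matching the field information against the preset field information is
    # modelled as a dictionary lookup; no match yields no target vectors.
    targets = risk_library.get(field_info, [])
    return [cosine_similarity(search_vector, v) for v in targets]
```

Each returned similarity would then be compared with the preset threshold value to decide whether the first preset condition is met.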
Fig. 3 is a flowchart of a method for determining characterization information according to an embodiment of the present disclosure. A process for characterizing information determination is described in this figure, but may include more or fewer operational steps based on conventional or non-inventive labor. As shown in fig. 3, the method may include:
S311, performing word segmentation processing on description information included in the received search statement generation request to obtain a plurality of sub-description information;
S312, processing each piece of sub-description information by using an importance formula to obtain importance corresponding to each piece of sub-description information;
S313, determining sub-description information corresponding to the importance degree meeting the second preset condition as the characterization information.
According to another embodiment of the present specification, word segmentation is performed on the description information to obtain a plurality of sub-description information. Specifically, the description information may be segmented based on a word segmentation model, where the word segmentation model may include, for example, an identifier segmentation rule, a word number segmentation rule, and the like.
The importance formula may be a formula that determines the importance of a piece of sub-description information based on how often that sub-description information occurs in the corpus. The corpus comprises a plurality of preset files, and each preset file comprises a plurality of preset description information.
The second preset condition may be, for example, sorting for the plurality of importance degrees, and determining sub-description information corresponding to the importance degree ranked first as the characterization information. In addition, the second preset condition may be, for example, that each importance degree of the plurality of importance degrees is compared with a preset importance degree threshold value, and sub description information corresponding to the target importance degree greater than or equal to the preset importance degree threshold value is determined as the characterization information.
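Both variants of the second preset condition can be sketched in one helper. The function name and the dictionary representation of the importance degrees are assumptions made for the example.

```python
from typing import Dict, List, Optional


def select_characterization(
    importance: Dict[str, float],
    threshold: Optional[float] = None,
) -> List[str]:
    """Pick characterization information from per-term importance degrees."""
    if not importance:
        return []
    if threshold is None:
        # Variant 1: sort by importance and keep only the top-ranked term.
        return [max(importance, key=importance.get)]
    # Variant 2: keep every term whose importance reaches the preset threshold.
    return [term for term, score in importance.items() if score >= threshold]
```

For example, with importance degrees `{"balance": 0.8, "of": 0.1, "account": 0.6}`, the first variant keeps only `balance`, while a threshold of 0.5 keeps `balance` and `account`.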
According to another embodiment of the present specification, the importance formula may be, for example, as shown in the following formula (1).
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i \qquad \text{Formula (1)}$$
Wherein $\mathrm{tfidf}_{i,j}$ characterizes the importance, $\mathrm{tf}_{i,j}$ characterizes the number of preset files in the corpus that include the sub-description information, and $\mathrm{idf}_i$ characterizes the inverse document frequency, which is determined by the corpus and the sub-description information.
The method for determining the number of the preset files including the sub description information may be, for example, for each sub description information, matching the sub description information with each preset description information included in each preset file, and determining that the preset file includes the sub description information when it is determined that the preset description information matches the sub description information in the preset file.
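The counting step can be sketched as follows, assuming that "matching" means exact string equality between a sub-description and a preset description (the specification does not define the matching rule) and that each preset file is represented as a list of its preset descriptions.

```python
from typing import Sequence


def count_files_containing(
    sub_description: str,
    corpus: Sequence[Sequence[str]],
) -> int:
    """Count preset files that hold at least one matching preset description."""
    return sum(
        1
        for preset_file in corpus
        # A file counts once, however many of its descriptions match.
        if any(desc == sub_description for desc in preset_file)
    )
```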
According to another embodiment of the present specification, the inverse document frequency may be determined, for example, as shown in the following formula (2).
Wherein $\mathrm{idf}_i$ characterizes the inverse document frequency, $D$ characterizes the total number of preset files included in the corpus, $|\{j : t_i \in d_j\}|$ characterizes the number of preset files in the corpus that include the sub-description information, $t_i$ characterizes the sub-description information, $C$ characterizes a constant, and $d_j$ characterizes a preset file. The constant may be, for example, 1, to ensure that the denominator is not 0.
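The symbols defined above match the usual smoothed inverse document frequency. As a sketch only, assuming the conventional logarithm (the logarithm and its base are assumptions; the text states only which quantities appear and that the constant keeps the denominator from being 0), formula (2) would take a form such as:

$$\mathrm{idf}_i = \log \frac{D}{C + \left|\{\, j : t_i \in d_j \,\}\right|}$$

Under this form, with $C = 1$, a piece of sub-description information that appears in no preset file gives $\mathrm{idf}_i = \log D$ rather than a division by zero.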
Fig. 4A is a flowchart illustrating a search term generation method according to another embodiment of the present disclosure. A process of search statement generation is described in this figure, but may include more or fewer operational steps based on conventional or non-inventive labor. As shown in fig. 4A, the method may include:
S430, processing field information by using the trained search statement generation model to obtain a pre-search statement;
S440, judging whether the pre-search statement meets a first preset condition;
S450, determining the pre-search statement as a target search statement under the condition that the pre-search statement meets the first preset condition;
S461, under the condition that the pre-search statement does not meet the first preset condition, sending the pre-search statement and field information to the second user terminal so that the second user terminal determines a replacement search statement based on the pre-search statement and the field information;
S462, updating the pre-search statement as a target search statement by using the received replacement search statement transmitted by the second user terminal.
According to another embodiment of the present disclosure, the field information is processed by using the trained search term generation model, and the execution process of obtaining the pre-search term is similar to the execution process of S230 in fig. 2, which is not described herein.
Judging whether the pre-search statement meets a first preset condition, determining that the pre-search statement is a target search statement under the condition that the first preset condition is met, and sending the pre-search statement and field information to a second user terminal under the condition that the pre-search statement does not meet the first preset condition. The second user terminal is a user terminal corresponding to a headquarter operator. And the second user terminal displays the pre-search statement and the field information to a headquarter operator after receiving the pre-search statement and the field information. And the headquarter operator carries out correction processing on the pre-search statement based on the pre-search statement and the field information to obtain a replacement search statement meeting the first preset condition, and inputs the replacement search statement into the second user terminal.
The second user terminal sends the replacement search statement to the server. After receiving the replacement search statement sent by the second user terminal, the server takes the replacement search statement as the target search statement.
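The control flow of steps S430 to S462 can be sketched as below. The three callables are hypothetical stand-ins: `generate` for the trained search statement generation model, `meets_first_condition` for the check of the first preset condition, and `request_replacement` for the round trip to the second user terminal, where the headquarter operator supplies the corrected statement.

```python
from typing import Callable


def resolve_target_statement(
    field_info: str,
    generate: Callable[[str], str],
    meets_first_condition: Callable[[str], bool],
    request_replacement: Callable[[str, str], str],
) -> str:
    """Return the target search statement for the given field information."""
    pre_search = generate(field_info)          # S430
    if meets_first_condition(pre_search):      # S440
        return pre_search                      # S450
    # S461: send the pre-search statement and field information to the
    # second user terminal and wait for the replacement search statement.
    replacement = request_replacement(pre_search, field_info)
    return replacement                         # S462
```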
FIG. 4B is a flowchart illustrating a method for training a search term generation model according to an embodiment of the present disclosure. The training process of the search term generation model is depicted in FIG. 4B, but may include more or fewer operational steps based on conventional or non-inventive labor. As shown in fig. 4B, the method may include:
S461, adding the field information into a sample field information set to obtain a target sample set;
S462, adding the target search statement as a tag associated with field information into a sample search statement set to obtain a target tag set;
S463, processing the target sample set by using a preset search statement generation model to obtain a prediction search statement set;
S464, training a preset search statement generation model based on the difference between the predicted search statement included in the predicted search statement set and the target label included in the target label set, and obtaining a trained search statement generation model.
By using the embodiment of the specification, under the condition that the pre-search statement generated based on the trained search statement generation model does not meet the first preset condition, the sample field information set and the sample search statement set are filled based on the corrected target search statement and the field information so as to be used for model training again, so that the trained search statement generation model is continuously optimized, the accuracy of the generated search statement is further improved, and the user experience is improved.
According to another embodiment of the present description, field information is added to a set of sample field information to obtain a set of target samples. And adding the target search statement as a tag associated with the field information into the sample search statement set to obtain a target tag set. It should be noted that the target tag set includes a plurality of target tags, and the target tags are in one-to-one correspondence with the target samples in the target sample set.
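Extending the two sets while keeping samples and tags in one-to-one correspondence can be sketched as follows. Representing the sets as parallel lists is an assumption made for the example, as is the function name.

```python
from typing import List, Sequence, Tuple


def extend_training_sets(
    sample_fields: Sequence[str],
    sample_statements: Sequence[str],
    field_info: str,
    target_statement: str,
) -> Tuple[List[str], List[str]]:
    """Return the target sample set and target tag set as parallel lists."""
    if len(sample_fields) != len(sample_statements):
        raise ValueError("samples and tags must be in one-to-one correspondence")
    # The new sample and its tag are appended at the same index, so the
    # correspondence between target samples and target tags is preserved.
    target_samples = list(sample_fields) + [field_info]
    target_tags = list(sample_statements) + [target_statement]
    return target_samples, target_tags
```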
The trained search statement generation model that was used to process the field information is taken as the preset search statement generation model.
And respectively inputting each target sample in the target sample set into the preset search statement generation model to obtain a prediction search statement set, wherein the prediction search statement set comprises a plurality of search statements, and each search statement corresponds to each target sample one by one.
Model training is carried out on the preset search statement generation model based on the difference between the prediction search statement contained in the prediction search statement set and the target label contained in the target label set until the difference between the prediction search statement contained in the prediction search statement set and the target label contained in the target label set meets the model training termination condition, and the trained search statement generation model is obtained and used for updating the preset search statement generation model.
Specifically, model training may be performed on the preset search statement generation model based on the difference between the prediction search statement included in the prediction search statement set and the target label included in the target label set as follows. The target label and the prediction search statement corresponding to each target sample are processed based on a preset loss function to obtain a loss function value, and the loss function value is checked against the model training termination condition. When the loss function value passes the check, the difference between the prediction search statement included in the prediction search statement set and the target label included in the target label set is determined to meet the model training termination condition. The model training termination condition may be, for example, that the loss function value is less than or equal to a preset loss function threshold, or that a series of loss function values has converged. Alternatively, the model training termination condition may be, for example, that a preset threshold on the number of model iterations is reached.
Fig. 5 is a flowchart illustrating a method for determining target data information according to an embodiment of the present disclosure. The determination of the target data information is described in this figure, but may include more or fewer operational steps based on conventional or non-inventive labor. As shown in fig. 5, the method may include:
S550, running a target search statement to obtain data information;
S560, determining readable rules based on the service identifier included in the search statement generation request;
S570, splitting the data information based on the readable rules to obtain target data information.
With the embodiments of the present specification, a readable rule is determined based on the service identifier included in the search statement generation request, and the data information is split based on the readable rule to obtain the target data information. The target data information that the branch corresponding to the search statement generation request is permitted to read is thus picked from the data information automatically, so that it does not need to be picked manually, which reduces the waste of resources and improves the user experience. It should be noted that obtaining the target data information by readability splitting of the data information is simpler than further refining the search statement in the process of determining the target search statement, and it also lowers the cost of the model used to determine the target search statement.
According to another embodiment of the present description, after determining a target search statement, the target search statement is run and data information is obtained from a corresponding database.
A corresponding preset readable rule is configured in advance for each preset service identifier, for example by a headquarter operator. The preset readable rule may specify, for each service of each branch, the data types that the service is permitted to read, for example by configuring data type identifiers.
The service identifier included in the search statement generation request is matched against each preset service identifier, and the target preset service identifier that matches the service identifier is determined from the preset service identifiers. The readable rule associated with the target preset service identifier is then obtained.
Splitting the data information based on the readable rule to obtain the target data information may specifically be, for example, splitting the data information into first sub-data information corresponding to the readable rule and second sub-data information, which is the data information other than the first sub-data information. The first sub-data information is used as the target data information.
To perform this split, each piece of sub-data information included in the data information may be processed based on a data category determining rule to obtain corresponding category information. Each piece of category information is matched against the readable rule, and the sub-data information corresponding to a piece of category information is determined as first sub-data information under the condition that the category information matches the readable rule.
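The readability split can be sketched as follows. The representation is assumed for illustration: readable rules are modelled as a dictionary from service identifier to the set of readable category identifiers, and `classify` is a hypothetical stand-in for the data category determining rule.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def split_by_readable_rule(
    data_items: Sequence[T],
    service_id: str,
    readable_rules: Dict[str, Set[str]],
    classify: Callable[[T], str],
) -> Tuple[List[T], List[T]]:
    """Split data into (first, second) sub-data by the service's readable rule."""
    # An unknown service identifier has no readable categories in this sketch.
    allowed = readable_rules.get(service_id, set())
    first: List[T] = []   # readable: becomes the target data information
    second: List[T] = []  # everything else
    for item in data_items:
        (first if classify(item) in allowed else second).append(item)
    return first, second
```

Only the first list would be returned to the requester as the target data information.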
Fig. 6A is a schematic structural diagram of a search term generating device according to an embodiment of the present disclosure. As shown in Fig. 6A, the device includes:
a first processing unit 610, configured to process, using the target text extraction model, description information included in the received search term generation request sent by the first user terminal, and determine characterization information;
a first determining unit 620, configured to determine corresponding field information from the data dictionary based on the characterization information and the target system identifier determined by the characterization information;
a second processing unit 630, configured to process the field information by using the trained search term generation model to obtain a pre-search term; and
a second determining unit 640 for determining the pre-search sentence as a target search sentence in the case that the pre-search sentence is determined to satisfy the first preset condition.
Since the principle of the device for solving the problem is similar to that of the method, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Fig. 6B is a schematic structural diagram of a search term generating device according to another embodiment of the present disclosure. As shown in Fig. 6B, the device includes:
a transmitting unit 650 for transmitting the pre-search sentence and the field information to the second user terminal to cause the second user terminal to determine the replacement search sentence based on the pre-search sentence and the field information, in case it is determined that the pre-search sentence does not satisfy the first preset condition; and
an updating unit 660 for updating the pre-search sentence as a target search sentence with the received replacement search sentence transmitted by the second user terminal.
Since the principle of the device for solving the problem is similar to that of the method, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Fig. 6C is a schematic structural diagram of a search term generating device according to another embodiment of the present disclosure. As shown in Fig. 6C, the device includes:
an operation unit 670, configured to operate the target search statement to obtain data information;
a third determining unit 680, configured to determine a readable rule based on the service identifier included in the search statement generation request; and
a splitting unit 690, configured to split the data information based on the readable rule to obtain the target data information.
Since the principle of the device for solving the problem is similar to that of the method, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure, where an apparatus in the present disclosure may be the computer device in the present embodiment, and perform the method of the present disclosure. The computer device 702 may include one or more processing devices 704, such as one or more Central Processing Units (CPUs), each of which may implement one or more hardware threads. The computer device 702 may also include any storage resources 706 for storing any kind of information, such as code, settings, data, etc. For example, and without limitation, storage resources 706 may include any one or more of the following combinations: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any storage resource may store information using any technology. Further, any storage resource may provide volatile or non-volatile retention of information. Further, any storage resources may represent fixed or removable components of computer device 702. In one case, the computer device 702 can perform any of the operations of the associated instructions when the processing device 704 executes the associated instructions stored in any storage resource or combination of storage resources. The computer device 702 also includes one or more drive mechanisms 708, such as a hard disk drive mechanism, an optical disk drive mechanism, and the like, for interacting with any storage resources.
The computer device 702 may also include an input/output module 710 (I/O) for receiving various inputs (via an input device 712) and for providing various outputs (via an output device 714). One particular output mechanism may include a presentation device 716 and an associated Graphical User Interface (GUI) 718. In other embodiments, the input/output module 710 (I/O), the input device 712, and the output device 714 may be omitted, and the computer device 702 serves merely as one computer device in a network. The computer device 702 can also include one or more network interfaces 720 for exchanging data with other devices via one or more communication links 722. One or more communication buses 724 couple the above-described components together.
The embodiments of the present specification also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the above method.
The present description also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing detailed description of the embodiments has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of example only, and is not intended to limit the scope of the invention.
Claims (14)
1. A search term generation method, comprising:
processing description information included in a search statement generation request sent by a first user terminal by utilizing a target text extraction model, and determining characterization information;
determining corresponding field information from a data dictionary based on the characterization information and a target system identification determined by the characterization information;
processing the field information by using the trained search statement generation model to obtain a pre-search statement; and
under the condition that the pre-search statement meets the first preset condition, determining the pre-search statement as a target search statement.
2. The method of claim 1, wherein the processing the description information included in the received search term generation request using the target text extraction model, determining the characterization information includes:
performing word segmentation processing on description information included in a received search statement generation request to obtain a plurality of sub-description information;
processing each piece of sub-description information by using an importance formula to obtain importance corresponding to each piece of sub-description information; and
determining sub-description information corresponding to the importance degree meeting a second preset condition as the characterization information.
3. The method of claim 2, wherein the importance formula comprises:
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$
wherein the $\mathrm{tfidf}_{i,j}$ characterizes the importance, the $\mathrm{tf}_{i,j}$ characterizes the number of preset files in a corpus that include the sub-description information, and the $\mathrm{idf}_i$ characterizes an inverse document frequency, the inverse document frequency being determined by the corpus and the sub-description information.
4. A method according to claim 3, wherein the determining of the reverse file frequency comprises:
wherein the $\mathrm{idf}_i$ characterizes the inverse document frequency, the $D$ characterizes the total number of preset files included in the corpus, the $|\{j : t_i \in d_j\}|$ characterizes the number of preset files in the corpus that include the sub-description information, the $t_i$ characterizes the sub-description information, the $C$ characterizes a constant, and the $d_j$ characterizes the preset file.
5. The method of claim 1, further comprising, after processing the field information using the trained search term generation model to obtain a pre-search term:
transmitting the pre-search sentence and the field information to a second user terminal under the condition that the pre-search sentence does not meet a first preset condition, so that the second user terminal determines a replacement search sentence based on the pre-search sentence and the field information; and
updating the pre-search statement as the target search statement using the received replacement search statement sent by the second user terminal.
6. The method of claim 5, wherein the training process of the trained search term generation model comprises:
adding the field information into a sample field information set to obtain a target sample set;
adding the target search statement as a tag associated with the field information into a sample search statement set to obtain a target tag set;
processing the target sample set by using a preset search statement generation model to obtain a prediction search statement set; and
training the preset search statement generation model based on the difference between the predicted search statement included in the predicted search statement set and the target label included in the target label set to obtain the trained search statement generation model.
7. The method of claim 1, wherein the first preset condition comprises:
determining target search information corresponding to a preset grammar structure in the pre-search statement;
extracting features of the target search information, and determining a search feature vector;
determining a corresponding target risk feature vector from a risk feature library according to the field information; and
under the condition that the similarity between the search feature vector and the target risk feature vector is smaller than a preset threshold value, determining that the pre-search statement meets the first preset condition.
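The threshold test in claim 7 can be sketched as below. The claim does not name a similarity measure, so cosine similarity is an assumption here, as are the threshold value, the function names, and the example vectors. Feature extraction and the risk feature library lookup are out of scope for the sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def meets_first_preset_condition(search_vec, risk_vec, threshold=0.8):
    """The pre-search statement passes when it is NOT similar to the risk pattern."""
    return cosine_similarity(search_vec, risk_vec) < threshold


# Hypothetical feature vectors.
risk_vec = [1.0, 0.0, 1.0]         # target risk feature vector from the library
safe_query_vec = [0.0, 1.0, 0.0]   # orthogonal to the risk pattern
risky_query_vec = [1.0, 0.0, 0.9]  # nearly parallel to the risk pattern

safe_ok = meets_first_preset_condition(safe_query_vec, risk_vec)
risky_ok = meets_first_preset_condition(risky_query_vec, risk_vec)
```

Note the direction of the comparison: a low similarity to the risk vector means the condition is met, so a statement that closely resembles a known risky pattern is the one that gets rejected.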
8. The method of any of claims 1-7, further comprising, after determining the target search statement:
executing the target search statement to obtain data information;
determining a readable rule based on a service identifier included in the search statement generation request; and
splitting the data information based on the readable rule to obtain target data information.
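The steps of claim 8 can be sketched as follows, assuming the target search statement is SQL and using Python's built-in `sqlite3` module with an in-memory database. The form of the readable rule is not specified in the claim, so a hypothetical mapping from service identifier to a maximum number of rows per chunk (`READABLE_RULES`) stands in for it. The table and identifiers are also illustrative.

```python
import sqlite3

# Hypothetical readable rules: service identifier -> maximum rows per chunk.
READABLE_RULES = {"report": 2, "dashboard": 3}


def run_and_split(conn, statement, service_id):
    """Execute the statement, then split the rows per the service's readable rule."""
    rows = conn.execute(statement).fetchall()
    # Unknown services get a single chunk; `or 1` avoids a zero step on empty results.
    size = READABLE_RULES.get(service_id, len(rows) or 1)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)],
)
chunks = run_and_split(conn, "SELECT id, balance FROM account ORDER BY id", "report")
```

With the "report" rule of two rows per chunk, the five result rows are split into three chunks, the last of which holds the single remaining row.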
9. A search statement generation apparatus, comprising:
a first processing unit for processing description information included in a search statement generation request transmitted by a first user terminal by using a target text extraction model, and determining characterization information;
a first determining unit for determining corresponding field information from a data dictionary based on the characterization information and a target system identifier determined by the characterization information;
a second processing unit for processing the field information by using the trained search statement generation model to obtain a pre-search statement; and
a second determining unit for determining the pre-search statement as a target search statement under the condition that the pre-search statement meets the first preset condition.
10. The apparatus as recited in claim 9, further comprising:
a transmitting unit for transmitting the pre-search statement and the field information to a second user terminal in a case where it is determined that the pre-search statement does not meet a first preset condition, so that the second user terminal determines a replacement search statement based on the pre-search statement and the field information; and
an updating unit for updating the pre-search statement to the target search statement using the replacement search statement received from the second user terminal.
11. The apparatus according to any one of claims 9-10, further comprising:
an operation unit for executing the target search statement to obtain data information;
a third determining unit for determining a readable rule based on a service identifier included in the search statement generation request; and
a splitting unit for splitting the data information based on the readable rule to obtain target data information.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-8 when executing the computer program.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the method of any of the preceding claims 1-8.
14. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method according to any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310312847.2A CN116383234A (en) | 2023-03-28 | 2023-03-28 | Search statement generation method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310312847.2A CN116383234A (en) | 2023-03-28 | 2023-03-28 | Search statement generation method and device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116383234A (en) | 2023-07-04 |
Family
ID=86960898
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310312847.2A Pending CN116383234A (en) | 2023-03-28 | 2023-03-28 | Search statement generation method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116383234A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117407513A (en) * | 2023-11-13 | 2024-01-16 | 北京百度网讯科技有限公司 | Question processing method, device, equipment and storage medium based on large language model |
CN117786242A (en) * | 2024-02-26 | 2024-03-29 | 腾讯科技(深圳)有限公司 | Searching method based on position and related device |
CN117786242B (en) * | 2024-02-26 | 2024-05-28 | 腾讯科技(深圳)有限公司 | Searching method based on position and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108804641B (en) | Text similarity calculation method, device, equipment and storage medium | |
US10936821B2 (en) | Testing and training a question-answering system | |
US10896212B2 (en) | System and methods for automating trademark and service mark searches | |
US11521713B2 (en) | System and method for generating clinical trial protocol design document with selection of patient and investigator | |
US9069859B2 (en) | Search query processing | |
WO2020108063A1 (en) | Feature word determining method, apparatus, and server | |
CN109408821B (en) | Corpus generation method and device, computing equipment and storage medium | |
CN116383234A (en) | Search statement generation method and device, computer equipment and storage medium | |
CN110516063A (en) | Update method for a service system, electronic device and readable storage medium | |
CN111310440B (en) | Text error correction method, device and system | |
CN110929125A (en) | Search recall method, apparatus, device and storage medium thereof | |
US11017002B2 (en) | Description matching for application program interface mashup generation | |
CN110321437B (en) | Corpus data processing method and device, electronic equipment and medium | |
CN110674365B (en) | Searching method, searching device, searching equipment and storage medium | |
WO2020149959A1 (en) | Conversion of natural language query | |
CN112115232A (en) | Data error correction method and device and server | |
CN111984792A (en) | Website classification method and device, computer equipment and storage medium | |
US20230161961A1 (en) | Techniques for enhancing the quality of human annotation | |
CN110765276A (en) | Entity alignment method and device in knowledge graph | |
CN110929526B (en) | Sample generation method and device and electronic equipment | |
CN115757995A (en) | Method and device for processing characteristic-free data label, computer equipment and storage medium | |
CN117933260A (en) | Text quality analysis method, device, equipment and storage medium | |
CN109033142B (en) | Data processing method and device and server | |
US9667706B2 (en) | Distributed processing systems | |
CN109284480B (en) | Service document processing method, device and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||