CN117312395B

CN117312395B - Query system optimization method, device and equipment based on big data big model

Info

Publication number: CN117312395B
Application number: CN202311594622.7A
Authority: CN
Inventors: 陈守红
Original assignee: Shenzhen Gelonghui Information Technology Co ltd
Current assignee: Shenzhen Gelonghui Information Technology Co ltd
Priority date: 2023-11-28
Filing date: 2023-11-28
Publication date: 2024-02-09
Anticipated expiration: 2043-11-28
Also published as: CN117312395A

Abstract

The invention belongs to the technical field of big data and particularly relates to a method, device and equipment for optimizing a query system based on a big data big model. First query content and the first query result output by the query system based on that content are acquired, all key fields having necessary relevance to a target query result are extracted, and a first tag data set is generated. Second query content is then acquired, distinguishing content fields are extracted from it, and a second tag data set is obtained by updating the first tag data set. The second tag data set is input into the large model for model training, and the trained large model is exported to the query system so that the query system is updated and optimized. This improves the efficiency of background updating and optimization, deepens the query depth of the query system, expands its effective query range, and meets the need for timely queries.

Description

Query system optimization method, device and equipment based on big data big model

Technical Field

The invention belongs to the technical field of big data, and particularly relates to a query system optimization method, device and equipment based on a big data big model.

Background

A query system is an information transmission system through which a user can query the data they need in a database; the query system displays the corresponding content as a table or view, according to the query request, for lookup and comparison. As big data continues to grow and diversify, a query system must be updated and optimized at regular or irregular intervals to keep meeting users' query requirements. In the prior art, however, many update and optimization strategies require engineers to supplement and tune the system manually in the background. Given the sheer volume of data, this manual work is inefficient, and because the workload is so large, the query depth and query range of the updated system remain limited to the areas that were manually optimized. Some data in the query system may also go without updates for a long time, so the system becomes cumbersome and struggles to serve users who query for recently updated information.

Disclosure of Invention

The invention aims to provide a query system optimization method, device and equipment based on a big data big model, in order to solve the problems in the prior art that poor optimization strategies for query systems lead to low optimization efficiency and make it difficult to satisfy users' queries for timely updated information.

In one aspect, the invention provides a query system optimization method based on a big data big model, the method comprising the following steps:

acquiring first query content previously input to a query system and a first query result output by the query system based on the first query content;

calculating the approximation degree and the matching degree of the first query result based on a target query result, and extracting all key fields that have necessary relevance to the target query result from the first query content whose approximation degree reaches a target approximation degree and whose matching degree reaches a target matching degree;

classifying the features of all the key fields to generate first feature tags, and collecting all the first feature tags to generate a first tag data set;

acquiring second query content currently input to the query system, comparing the second query content with the first query content, and extracting from the second query content all distinguishing content fields that differ from the first query content;

classifying the features of all the distinguishing content fields to generate second feature tags, judging whether all the second feature tags are consistent with the first feature tags in the first tag data set, and if not, adding the inconsistent second feature tags to the first tag data set so that the first tag data set is updated into a second tag data set;

and inputting the second tag data set into a large model for model training, and exporting the trained large model to the query system so that the query system is updated and optimized.

In some embodiments, the method further comprises the steps of:

obtaining a second query result output by the query system based on the second query content, comparing the second query result with the first query result, and extracting from the second query result all distinguishing result fields that differ from the first query result.

In some embodiments, the method further comprises the steps of:

screening out the first feature tags and second feature tags that produce the distinguishing result fields, analyzing the influence degree of the corresponding first feature tags and second feature tags, and if the influence degree exceeds a target influence degree, increasing the learning weights of the corresponding first feature tags and second feature tags.

In some embodiments, the method further comprises the steps of:

customizing query fields, assigning to the customized query fields custom feature tags that differ from the first feature tags and the second feature tags, collecting all the custom feature tags to generate a custom feature tag set, inputting the custom feature tag set into the large model for learning and training so as to update the large model, and exporting the updated large model to the query system, so that the query system is further updated and optimized.

In another aspect, the invention also provides a query system optimization device based on a big data big model, the device comprising:

a first query content and result acquisition unit, configured to acquire first query content previously input to a query system and a first query result output by the query system based on the first query content;

a key field extraction unit, configured to calculate the approximation degree and the matching degree of the first query result based on a target query result, and to extract all key fields that have necessary relevance to the target query result from the first query content whose approximation degree reaches a target approximation degree and whose matching degree reaches a target matching degree;

a first tag data set generating unit, configured to classify the features of all the key fields to generate first feature tags, and to collect all the first feature tags to generate a first tag data set;

a distinguishing content field extraction unit, configured to acquire second query content currently input to the query system, compare the second query content with the first query content, and extract from the second query content all distinguishing content fields that differ from the first query content;

a second tag data set generating unit, configured to classify the features of all the distinguishing content fields to generate second feature tags, judge whether all the second feature tags are consistent with the first feature tags in the first tag data set, and if not, add the inconsistent second feature tags to the first tag data set so that the first tag data set is updated into a second tag data set;

and an updating and optimizing unit, configured to input the second tag data set into a large model for model training, and to export the trained large model to the query system so that the query system is updated and optimized.

In some embodiments, the apparatus further comprises:

a distinguishing result field extracting unit, configured to obtain a second query result output by the query system based on the second query content, compare the second query result with the first query result, and extract from the second query result all distinguishing result fields that differ from the first query result.

In some embodiments, the apparatus further comprises:

an influence degree analysis unit, configured to screen out the first feature tags and second feature tags that produce the distinguishing result fields, analyze the influence degree of the corresponding first feature tags and second feature tags, and increase the learning weights of the corresponding first feature tags and second feature tags if the influence degree exceeds a target influence degree.

In some embodiments, the apparatus further comprises:

a custom field unit, configured to customize query fields, assign to the customized query fields custom feature tags that differ from the first feature tags and the second feature tags, collect all the custom feature tags to generate a custom feature tag set, input the custom feature tag set into the large model for learning and training so as to update the large model, and export the updated large model to the query system, so that the query system is further updated and optimized.

In another aspect, the present invention also provides a query system optimization device based on a big data big model, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the methods described above when executing the computer program.

In another aspect, the invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of any of the methods described above.

The beneficial effects of the invention are as follows. Compared with the prior art, the query system optimization method based on a big data big model acquires first query content and the first query result output by the query system based on that content, extracts all key fields having necessary relevance to the target query result, and generates a first tag data set; it then acquires second query content, extracts the distinguishing content fields, and obtains a second tag data set by updating the first tag data set; finally, it inputs the second tag data set into the large model for model training and exports the trained large model to the query system, so that the query system is updated and optimized. This improves the efficiency of background updating and optimization, deepens the query depth of the query system, expands its effective query range, and meets the need for timely queries.

Drawings

FIG. 1 is a flowchart of an implementation of a query system optimization method based on a big data big model in a first embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a query system optimization device based on a big data big model in a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a query system optimization device based on a big data big model in a third embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The following describes in detail the implementation of the present invention in connection with specific embodiments:

Embodiment one:

FIG. 1 shows the implementation flow of a query system optimization method based on a big data big model according to the first embodiment of the present invention. For convenience of explanation, only the parts related to this embodiment are shown. Referring to FIG. 1, the method includes the following steps:

S1: acquiring first query content previously input to a query system and a first query result output by the query system based on the first query content;

In step S1, the first query content and the first query result are acquired in order to verify whether the corresponding data can be used as learning and training data for big data big model training, thereby collecting the learning and training data. Here "previously" is used in contrast to "currently" and may refer to one or more earlier queries.

S2: calculating the approximation degree and the matching degree of the first query result based on the target query result, and extracting all key fields that have necessary relevance to the target query result from the first query content whose approximation degree reaches the target approximation degree and whose matching degree reaches the target matching degree;

In step S2, the target query result, the target approximation degree and the target matching degree need to be preset. The target query result is one or more query targets set in the query system and is used to verify the query accuracy of the first query result obtained from the first query content. The target approximation degree is used to verify the query approximation degree of that first query result, and the target matching degree is used to verify its query matching degree. This helps screen out, from a plurality of first query contents, those that are effective in producing an accurate first query result, and in turn helps extract all key fields having necessary relevance to the target query result from those first query contents as learning and training data, which improves the deep learning capability of big data big model training.
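The screening in step S2 can be illustrated with a minimal Python sketch. The description does not say how the approximation degree, the matching degree or "necessary relevance" are computed, so everything below is an assumption made for illustration: the approximation degree is taken as a string-similarity ratio, the matching degree as the share of target fields reproduced exactly, and a key field as a name/value pair shared by every effective query content. The names `QueryRecord`, `approximation_degree`, `matching_degree`, `select_effective_queries` and `extract_key_fields` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class QueryRecord:
    """One previous query: the content fields entered and the result fields returned."""
    content: dict[str, str]
    result: dict[str, str]


def _serialize(fields: dict[str, str]) -> str:
    # Sort by field name so the comparison does not depend on insertion order.
    return "|".join(f"{k}={v}" for k, v in sorted(fields.items()))


def approximation_degree(result: dict[str, str], target: dict[str, str]) -> float:
    """ASSUMED metric: character-level similarity of the serialized fields (0.0 to 1.0)."""
    return SequenceMatcher(None, _serialize(result), _serialize(target)).ratio()


def matching_degree(result: dict[str, str], target: dict[str, str]) -> float:
    """ASSUMED metric: share of target fields reproduced exactly in the result."""
    if not target:
        return 0.0
    hits = sum(1 for name, value in target.items() if result.get(name) == value)
    return hits / len(target)


def select_effective_queries(records, target, target_approx, target_match):
    """Keep only the records whose result reaches both preset thresholds."""
    return [
        r for r in records
        if approximation_degree(r.result, target) >= target_approx
        and matching_degree(r.result, target) >= target_match
    ]


def extract_key_fields(records) -> set[tuple[str, str]]:
    """ASSUMED rule for "necessary relevance": a (name, value) pair counts as a
    key field when it occurs in every effective query content."""
    if not records:
        return set()
    common = set(records[0].content.items())
    for record in records[1:]:
        common &= set(record.content.items())
    return common


# Illustrative data only.
target = {"ticker": "AAPL", "revenue": "89.5B"}
records = [
    QueryRecord({"ticker": "AAPL", "period": "2023Q3", "lang": "en"},
                {"ticker": "AAPL", "revenue": "89.5B"}),
    QueryRecord({"ticker": "AAPL", "period": "2023Q3", "lang": "zh"},
                {"ticker": "AAPL", "revenue": "89.5B"}),
    QueryRecord({"ticker": "MSFT", "period": "2023Q3"},
                {"ticker": "MSFT", "revenue": "56.5B"}),
]
effective = select_effective_queries(records, target, target_approx=0.9, target_match=1.0)
key_fields = extract_key_fields(effective)
```

Under these assumptions the third record is discarded because its result matches none of the target fields, and only the fields common to the two remaining contents are kept as key fields. Other metrics could be substituted without changing the overall flow.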

S3: classifying the features of all the key fields, generating first feature tags, and collecting all the first feature tags to generate a first tag data set;

In step S3, feature encoding is performed on all the key fields according to a specific feature encoding rule; the key fields are then classified according to their feature codes and assigned corresponding first feature tags, and all the first feature tags are collected to obtain the first tag data set.
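As a rough illustration of step S3, the following Python sketch encodes fields, groups them by code and derives a tag set. The description only refers to "a specific feature encoding rule" without defining it, so the lookup table `ENCODING_RULE`, the `CODE:name` tag format and the function names are all hypothetical choices made for this example.

```python
from __future__ import annotations

from collections import defaultdict

# HYPOTHETICAL encoding rule: maps a field name to a feature code.
ENCODING_RULE = {
    "ticker": "ENTITY",
    "company": "ENTITY",
    "period": "TIME",
    "date": "TIME",
    "metric": "MEASURE",
}


def encode_field(name: str) -> str:
    """Return the feature code of a field; unknown names fall back to OTHER."""
    return ENCODING_RULE.get(name, "OTHER")


def classify_fields(fields) -> dict[str, list[tuple[str, str]]]:
    """Group (name, value) fields by their feature code."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for name, value in fields:
        groups[encode_field(name)].append((name, value))
    return dict(groups)


def build_tag_data_set(fields) -> set[str]:
    """Assign one tag per (code, field name) and collect the tags into a set."""
    groups = classify_fields(fields)
    return {
        f"{code}:{name}"
        for code, members in groups.items()
        for name, _value in members
    }


key_fields = [("ticker", "AAPL"), ("period", "2023Q3"), ("lang", "en"), ("ticker", "MSFT")]
first_tag_data_set = build_tag_data_set(key_fields)
```

Because the tags are held in a set, two key fields that fall into the same class and share a name yield a single tag. The same functions can be reused for the distinguishing content fields, which keeps the encoding consistent between the two tag sets.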

S4: acquiring second query content currently input to the query system, comparing the second query content with the first query content, and extracting from the second query content all distinguishing content fields that differ from the first query content;

In step S4, the acquired second query content is compared with the first query content to extract the distinguishing content fields, and the variability of the query content is then verified according to those fields, so that the learning and training data can be updated.
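The comparison in step S4 can be sketched in Python as a field-level difference. The description does not state what makes a field "different", so this sketch assumes that query content is a mapping from field names to values and that a field is distinguishing when its name/value pair appears in none of the previous query contents. The function name `distinguishing_content_fields` is hypothetical.

```python
from __future__ import annotations


def distinguishing_content_fields(
    second_content: dict[str, str],
    first_contents: list[dict[str, str]],
) -> dict[str, str]:
    """Return the fields of the current query that no previous query contained.

    ASSUMPTION: a field differs when its (name, value) pair is absent from
    every previous query content.
    """
    seen = {item for content in first_contents for item in content.items()}
    return {
        name: value
        for name, value in second_content.items()
        if (name, value) not in seen
    }


first_contents = [{"ticker": "AAPL", "period": "2023Q3"}]
second_content = {"ticker": "AAPL", "period": "2023Q4", "metric": "revenue"}
diff = distinguishing_content_fields(second_content, first_contents)
```

Here the `period` field is kept because its value changed and the `metric` field is kept because it is new, while the unchanged `ticker` field is dropped.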

S5: classifying the features of all the distinguishing content fields to generate second feature tags, judging whether all the second feature tags are consistent with the first feature tags in the first tag data set, and if not, adding the inconsistent second feature tags to the first tag data set so that the first tag data set is updated into the second tag data set;

In step S5, all the distinguishing content fields are feature-encoded with the same feature encoding rule adopted in step S3, which ensures that the encoded content is consistent. The distinguishing content fields are classified according to their feature codes and assigned corresponding second feature tags. All the second feature tags are then compared with the first feature tags in the first tag data set, the second feature tags that are inconsistent with the first feature tags are screened out and added to the first tag data set, and the first tag data set is thereby updated into the second tag data set, which further updates the learning and training data.
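The update in step S5 amounts to adding to the first tag data set only those second feature tags it does not already contain. A minimal Python sketch, assuming tags are plain strings held in sets (the function name `update_tag_data_set` and the tag strings are hypothetical):

```python
from __future__ import annotations


def update_tag_data_set(
    first_tag_data_set: set[str],
    second_feature_tags: set[str],
) -> tuple[set[str], set[str]]:
    """Return (second_tag_data_set, added_tags).

    Tags already present in the first tag data set are treated as consistent
    and left alone; only the inconsistent (new) tags are added. The input
    sets are not modified.
    """
    added = second_feature_tags - first_tag_data_set
    return first_tag_data_set | added, added


first_tags = {"ENTITY:ticker", "TIME:period"}
second_tags = {"TIME:period", "MEASURE:metric"}
second_tag_data_set, added = update_tag_data_set(first_tags, second_tags)
```

If every second feature tag is already in the first tag data set, `added` is empty and the returned set equals the first tag data set, so no retraining data changes in that case.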

S6: and inputting the second label data set into the large model for model training, and exporting the trained large model to the query system so as to update and optimize the query system.

In this embodiment, the method further includes the following steps:

S7: obtaining a second query result output by the query system based on the second query content, comparing the second query result with the first query result, and extracting from the second query result all distinguishing result fields that differ from the first query result.

In this embodiment, the method further includes the following steps:

S8: screening out the first feature tags and second feature tags that produce the distinguishing result fields, analyzing the influence degree of the corresponding first feature tags and second feature tags, and if the influence degree exceeds the target influence degree, increasing the learning weights of the corresponding first feature tags and second feature tags.

In step S8, the target influence degree is preset. It is used to verify how much the key content in different query contents contributes to the generated query results, so that the algorithm and parameters of the large model can be adjusted. This helps further improve the learning depth of the large model and ultimately the accuracy of the query results of the query system.
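Step S8 can be sketched in Python as follows. The description does not define how the influence degree is measured or by how much a learning weight is increased, so this sketch makes two assumptions: the influence degree of a tag is the share of all distinguishing result fields produced by queries carrying that tag, and a tag whose influence exceeds the target has its weight multiplied by a fixed `boost` factor. All names and numbers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def influence_degrees(observations: list[tuple[set[str], int]]) -> dict[str, float]:
    """ASSUMED metric: for each tag, the share of distinguishing result fields
    that came from queries carrying that tag.

    Each observation is (tags of the query, number of distinguishing result fields).
    """
    total = sum(count for _tags, count in observations)
    if total == 0:
        return {}
    scores: dict[str, float] = defaultdict(float)
    for tags, count in observations:
        for tag in tags:
            scores[tag] += count / total
    return dict(scores)


def adjust_learning_weights(
    weights: dict[str, float],
    influence: dict[str, float],
    target_influence: float,
    boost: float = 1.5,
) -> dict[str, float]:
    """Increase the weight of every tag whose influence exceeds the target."""
    return {
        tag: weight * boost if influence.get(tag, 0.0) > target_influence else weight
        for tag, weight in weights.items()
    }


observations = [({"TIME:period"}, 3), ({"OTHER:lang"}, 1)]
influence = influence_degrees(observations)
weights = adjust_learning_weights(
    {"TIME:period": 1.0, "OTHER:lang": 1.0}, influence, target_influence=0.5
)
```

How the adjusted weights are consumed during training (for example as per-sample loss weights) depends on the large model and its training framework, which the description leaves open.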

In this embodiment, the method further includes the following steps:

S9: customizing query fields, assigning to the customized query fields custom feature tags that differ from the first feature tags and the second feature tags, collecting all the custom feature tags to generate a custom feature tag set, inputting the custom feature tag set into the large model for learning and training so as to update the large model, and exporting the updated large model to the query system, so that the query system is further updated and optimized.
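The tag assignment in step S9 requires only that custom feature tags differ from every first and second feature tag. One way to guarantee that is sketched below in Python; the `CUSTOM:` prefix, the numeric suffix used to resolve collisions and the function name `make_custom_tag_set` are assumptions for illustration, and the training and export steps are outside the scope of the sketch.

```python
from __future__ import annotations


def make_custom_tag_set(
    custom_fields: list[str],
    existing_tags: set[str],
    prefix: str = "CUSTOM",
) -> set[str]:
    """Build a custom feature tag set whose tags never collide with existing tags.

    ASSUMED naming scheme: "<prefix>:<field>", with "#2", "#3", ... appended
    when that name is already taken.
    """
    custom_tags: set[str] = set()
    for field in custom_fields:
        tag = f"{prefix}:{field}"
        suffix = 1
        while tag in existing_tags or tag in custom_tags:
            suffix += 1
            tag = f"{prefix}:{field}#{suffix}"
        custom_tags.add(tag)
    return custom_tags


existing = {"ENTITY:ticker", "CUSTOM:ticker"}
custom_tag_set = make_custom_tag_set(["esg_score", "ticker"], existing)
```

In this example `CUSTOM:ticker` is already taken, so the second field receives the next free name, and the resulting set shares no tag with the existing ones.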

Embodiment two:

FIG. 2 shows the structure of a query system optimization device based on a big data big model according to the second embodiment of the present invention. For convenience of explanation, only the parts related to this embodiment are shown. Referring to FIG. 2, the device includes:

A first query content and result obtaining unit 501, configured to obtain first query content previously input to the query system and a first query result output by the query system based on the first query content;

In the first query content and result obtaining unit 501, the first query content and the first query result are obtained in order to verify whether the corresponding data can be used as learning and training data for big data big model training, thereby collecting the learning and training data.

A key field extracting unit 502, configured to calculate the approximation degree and the matching degree of the first query result based on the target query result, and to extract all key fields that have necessary relevance to the target query result from the first query content whose approximation degree reaches the target approximation degree and whose matching degree reaches the target matching degree;

In the key field extracting unit 502, the target query result, the target approximation degree and the target matching degree need to be preset. The target query result is used to verify the query accuracy of the first query result obtained from the first query content. The target approximation degree is used to verify the query approximation degree of that first query result, and the target matching degree is used to verify its query matching degree. This helps screen out, from a plurality of first query contents, those that are effective in producing an accurate first query result, and in turn helps extract all key fields having necessary relevance to the target query result from those first query contents as learning and training data, which improves the deep learning capability of big data big model training.

A first tag data set generating unit 503, configured to classify the features of all the key fields to generate first feature tags, and to collect all the first feature tags to generate a first tag data set;

In the first tag data set generating unit 503, feature encoding is performed on all the key fields according to a specific feature encoding rule; the key fields are then classified according to their feature codes and assigned corresponding first feature tags, and all the first feature tags are collected to obtain the first tag data set.

A distinguishing content field extracting unit 504, configured to obtain second query content currently input to the query system, compare the second query content with the first query content, and extract from the second query content all distinguishing content fields that differ from the first query content;

In the distinguishing content field extracting unit 504, the obtained second query content is compared with the first query content to extract the distinguishing content fields, and the variability of the query content is then verified according to those fields, so that the learning and training data can be updated.

A second tag data set generating unit 505, configured to classify the features of all the distinguishing content fields to generate second feature tags, judge whether all the second feature tags are consistent with the first feature tags in the first tag data set, and if not, add the inconsistent second feature tags to the first tag data set so that the first tag data set is updated into the second tag data set;

In the second tag data set generating unit 505, all the distinguishing content fields are feature-encoded with the same feature encoding rule adopted in the first tag data set generating unit 503, which ensures that the encoded content is consistent. The distinguishing content fields are classified according to their feature codes and assigned corresponding second feature tags. All the second feature tags are then compared with the first feature tags in the first tag data set, the second feature tags that are inconsistent with the first feature tags are screened out and added to the first tag data set, and the first tag data set is thereby updated into the second tag data set, which further updates the learning and training data.

An updating and optimizing unit 506, configured to input the second tag data set into the large model for model training, and to export the trained large model to the query system so that the query system is updated and optimized.

In this embodiment, the apparatus further includes:

A distinguishing result field extracting unit 507, configured to obtain a second query result output by the query system based on the second query content, compare the second query result with the first query result, and extract from the second query result all distinguishing result fields that differ from the first query result.

In this embodiment, the apparatus further includes:

An influence degree analysis unit 508, configured to screen out the first feature tags and second feature tags that produce the distinguishing result fields, analyze the influence degree of the corresponding first feature tags and second feature tags, and increase the learning weights of the corresponding first feature tags and second feature tags if the influence degree exceeds the target influence degree.

In the influence degree analysis unit 508, the target influence degree needs to be preset. It is used to verify how much the key content in different query contents contributes to the generated query results, so that the algorithm and parameters of the large model can be adjusted. This helps further improve the learning depth of the large model and ultimately the accuracy of the query results of the query system.

In this embodiment, the apparatus further includes:

A custom field unit 509, configured to customize query fields, assign to the customized query fields custom feature tags that differ from the first feature tags and the second feature tags, collect all the custom feature tags to generate a custom feature tag set, input the custom feature tag set into the large model for learning and training so as to update the large model, and export the updated large model to the query system, so that the query system is further updated and optimized.

In the embodiment of the invention, each unit of the query system optimization device based on the big data big model can be realized by corresponding hardware or software units, and each unit can be an independent software and hardware unit or can be integrated into one software and hardware unit, and the invention is not limited herein.

Embodiment three:

FIG. 3 shows the structure of a query system optimization device based on a big data big model according to the third embodiment of the present invention. For convenience of explanation, only the parts related to this embodiment are shown; please refer to FIG. 3.

The query system optimization device 2 based on big data big model according to the third embodiment of the present invention includes a memory 201, a processor 202, and a computer program 203 stored in the memory 201 and capable of running on the processor 202, where steps S1 to S6 or steps S1 to S9 of the query system optimization method based on big data big model according to the first embodiment are implemented when the processor 202 executes the computer program 203. Alternatively, the processor 202 implements the respective unit functions of the big data big model based query system optimization device provided in the above-described embodiment two when executing the computer program 203, for example, the functions of the units 501 to 506 or the functions of the units 501 to 509 shown in fig. 2.

Embodiment four:

A fourth embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements steps S1 to S6 or steps S1 to S9 of the query system optimization method based on a big data big model provided in the first embodiment described above. Alternatively, the computer program, when executed by a processor, implements the functions of the units of the query system optimization device based on a big data big model provided in the second embodiment described above, for example the functions of units 501 to 506 or the functions of units 501 to 509 shown in FIG. 2.

The computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, such as ROM/RAM, a magnetic disk, an optical disk or flash memory.

The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A query system optimization method based on a big data big model, the method comprising the steps of:

calculating the approximation degree and the matching degree of the first query result based on the target query result, extracting all key fields which have necessary relevance with the target query result from the first query content of which the approximation degree reaches the target approximation degree and the matching degree reaches the target matching degree, wherein the target query result is one or more query targets set in a query system;

obtaining a second query result output by the query system based on second query content, comparing the second query result with the first query result, and extracting all the different result fields different from the first query result from the second query result;

screening out a first feature tag and a second feature tag which generate the distinguishing result field, analyzing the influence degree of the corresponding first feature tag and second feature tag, and if the influence degree exceeds a target influence degree, increasing the learning weight of the corresponding first feature tag and second feature tag, wherein the target influence degree is used for verifying the contribution degree of key contents in different query contents to the generated query result;

2. The method for optimizing a query system based on big data big model according to claim 1, wherein the method further comprises the steps of:

3. A query system optimization device based on big data big model, the device comprising:

the key field extraction unit is used for calculating the approximation degree and the matching degree of the first query result based on the target query result, extracting all key fields which have necessary relevance with the target query result from the first query content of which the approximation degree reaches the target approximation degree and the matching degree reaches the target matching degree, wherein the target query result is one or more query targets set in a query system;

the distinguishing result field extraction unit is used for obtaining a second query result output by the query system based on second query content, comparing the second query result with the first query result and extracting all distinguishing result fields different from the first query result in the second query result;

the influence degree analysis unit is used for screening out a first characteristic label and a second characteristic label which generate the distinguishing result field, carrying out influence degree analysis on the corresponding first characteristic label and second characteristic label, and if the influence degree exceeds a target influence degree, increasing the learning weight of the corresponding first characteristic label and second characteristic label, wherein the target influence degree is used for verifying the contribution degree of key contents in different query contents to the generated query result;

4. The big data big model based query system optimization apparatus of claim 3, wherein the apparatus further comprises:

5. A big data big model based query system optimization device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 2 when the computer program is executed.

6. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 2.