CN114547474A

CN114547474A - Data searching method, system, electronic equipment and storage medium

Info

Publication number: CN114547474A
Application number: CN202210417524.5A
Authority: CN
Inventors: 崔燕红
Original assignee: Beijing Teddy Bear Mobile Technology Co ltd
Current assignee: Beijing Teddy Bear Mobile Technology Co ltd
Priority date: 2022-04-21
Filing date: 2022-04-21
Publication date: 2022-05-27

Abstract

The invention provides a data search method, system, electronic device, and storage medium, relating to the technical field of the Internet. The method comprises: acquiring data to be searched, inputting the data to be searched into a pre-trained recognition model to extract words to be searched, and acquiring a set of words to be searched; visually displaying the words to be searched in the set to generate word graphs to be searched; when a word graph to be searched is triggered, acquiring a search result set, in at least one search engine, for the word to be searched corresponding to that word graph, wherein the search result set comprises at least one search result; and obtaining the association degree between each search result and the corresponding word to be searched, scoring the search result set according to a preset scoring rule and the association degree, and sorting and visually displaying the search result sets according to their scores to generate search result set graphs. The method better meets the need to search a large amount of data across multiple search engines and improves the user experience.

Description

Data searching method, system, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of internet technologies, and in particular, to a data search method, system, electronic device, and storage medium.

Background

At present, searching or querying a large amount of data, such as a plurality of terms to be searched, is done in one of two ways: manual query or crawler query. With manual query, the terms must be queried one by one on different search engine websites and the results recorded, which costs the requester a great deal of time. Crawling internet search engines with a crawler is likely to raise intellectual property issues with the search engine websites. Either way, searching or querying a large amount of data is very inconvenient for the user. In addition, existing website search usually handles a single text term and cannot readily perform a combined search over large amounts of text and voice, so it does not meet user needs well.

Disclosure of Invention

The present disclosure provides a data search method, system, electronic device, and storage medium to at least solve the above technical problems in the prior art.

According to a first aspect of the present disclosure, there is provided a data search method, the method comprising:

acquiring data to be searched;

inputting the data to be searched into a pre-trained recognition model for extracting words to be searched, and acquiring a set of words to be searched;

visually displaying the words to be searched in the word set to be searched to generate a word graph to be searched;

when the word graph to be searched is triggered, acquiring a search result set of the word to be searched in at least one search engine corresponding to the word graph to be searched, wherein the search result set comprises at least one search result;

obtaining the association degree between the search result and the corresponding word to be searched, and scoring the search result set according to a preset scoring rule and the association degree to obtain the score of the corresponding search result set;

and sorting and visually displaying the search result sets according to the scores to generate search result set graphs, wherein each search result set graph corresponds to one search engine and one word to be searched.

In an embodiment, the data to be searched includes at least one of the following: text data and voice data;

the recognition model includes at least one of: a text recognition submodel and a speech recognition submodel.

In an embodiment, the step of obtaining the text recognition submodel includes:

obtaining a first training set, the first training set comprising: training sentences, training word sets, first real semantic labels corresponding to the training sentences and second real semantic labels corresponding to the training word sets;

respectively inputting the training sentences and the training word sets into a preset first semantic recognition network for semantic recognition to obtain a first prediction semantic tag set and a second prediction semantic tag set, wherein the first prediction semantic tag set comprises at least one first prediction semantic tag, the second prediction semantic tag set comprises at least one second prediction semantic tag, the first prediction semantic tag set corresponds to the training sentences, and the second prediction semantic tag set corresponds to the training word sets;

and performing iterative training on the first semantic recognition network according to the difference between the first real semantic label and the first predicted semantic label and the difference between the second real semantic label and the second predicted semantic label to obtain the text recognition submodel.

In one embodiment, the step of obtaining the voice recognition submodel includes:

obtaining a second training set, the second training set comprising: voice sample data and a third real semantic tag corresponding to the voice sample data;

denoising the voice sample data to obtain denoising sample data;

inputting the noise reduction sample data into a preset voice recognition network for voice recognition to obtain a voice text;

inputting the voice text into a preset second semantic recognition network for semantic recognition to obtain a third prediction semantic tag set;

and performing joint training on the voice recognition network and the second semantic recognition network according to the difference between the third real semantic label and a third predicted semantic label in the third predicted semantic label set to obtain the voice recognition sub-model.

In an implementation mode, the step of inputting the data to be searched into a pre-trained recognition model for extracting the words to be searched and acquiring the word set to be searched comprises the following steps:

when the data to be searched is text data, inputting the text data into the text recognition submodel for recognition and prediction, obtaining a text semantic tag set output by the text recognition submodel, and taking the text semantic tag set as a word set to be searched;

and when the data to be searched is voice data, inputting the voice data into the voice recognition submodel for recognition and prediction, acquiring a voice semantic tag set output by the voice recognition submodel, and taking the voice semantic tag set as a word set to be searched.

In an implementation manner, the step of inputting the data to be searched into a pre-trained recognition model for extracting the words to be searched and obtaining the word set to be searched further includes:

when the data to be searched is text data and voice data, inputting the text data into the text recognition submodel for recognition and prediction to obtain a text semantic tag set, and inputting the voice data into the voice recognition submodel for recognition and prediction to obtain a voice semantic tag set;

acquiring a first confidence coefficient of text semantic labels in the text semantic label set and a second confidence coefficient of voice semantic labels in the voice semantic label set;

and screening the text semantic tags and the voice semantic tags according to the first confidence coefficient and the second confidence coefficient to obtain the word set to be searched.

In an implementation manner, when the word graph to be searched is triggered, the step of obtaining a search result set of the word to be searched in at least one search engine, which corresponds to the word graph to be searched, includes:

when the word graph to be searched is clicked or touched, sending a search instruction to at least one preset search engine, the search instruction comprising: a search command, verification information, and the word to be searched corresponding to the current word graph to be searched;

wherein the verification information is used by the search engine to verify whether the requester has search authority, yielding a verification result; and the search engine searches for the word to be searched corresponding to the current word graph to be searched according to the verification result and the search command, to obtain at least one search result set.
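The exchange between the client and the search engines can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `VerificationInfo`, `SearchInstruction`, `StubSearchEngine`, and `dispatch` are hypothetical names, the engine is an in-memory stub rather than a real search engine interface, and only the user name and password are checked (the device address is carried but not verified here).

```python
import hmac
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VerificationInfo:
    user_name: str
    password: str
    device_address: str  # carried with the instruction; not checked in this sketch


@dataclass(frozen=True)
class SearchInstruction:
    command: str                     # the search command, e.g. "search"
    verification: VerificationInfo   # the verification information
    word: str                        # word behind the triggered word graph


class StubSearchEngine:
    """Hypothetical in-memory engine; a real engine exposes its own interface."""

    def __init__(self, name: str, accounts: Dict[str, str],
                 index: Dict[str, List[str]]) -> None:
        self.name = name
        self._accounts = accounts  # user_name -> password
        self._index = index        # word -> search results

    def verify(self, info: VerificationInfo) -> bool:
        expected = self._accounts.get(info.user_name)
        if expected is None:
            return False
        # Constant-time comparison of the supplied and stored passwords.
        return hmac.compare_digest(expected.encode(), info.password.encode())

    def handle(self, instruction: SearchInstruction) -> List[str]:
        # Search only when the command is recognised and verification passes.
        if instruction.command != "search":
            return []
        if not self.verify(instruction.verification):
            return []
        return list(self._index.get(instruction.word, []))


def dispatch(instruction: SearchInstruction,
             engines: List[StubSearchEngine]) -> Dict[str, List[str]]:
    """Send one instruction to every engine; one result set per engine."""
    return {engine.name: engine.handle(instruction) for engine in engines}
```

Keying the returned dictionary by engine name mirrors the one-to-one correspondence between search engines and search result sets.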

In one possible embodiment, the step of generating the search result set graph is followed by:

creating a data export task by selecting the search result set graph;

according to the data export task, packaging at least one selected search result set to obtain a data packet to be exported, wherein the data packet to be exported comprises: a search result list and at least one search result set, the search result list comprising: the number of the search result set and the name of a search engine corresponding to the search result set;

and exporting the data packet to be exported by using a preset data export rule.
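A minimal sketch of the packaging step is given below. The data export rule is not specified here, so JSON text is assumed purely for illustration; `build_export_packet`, `export_packet`, and the dictionary keys are hypothetical names.

```python
import json
from typing import Dict, List


def build_export_packet(selected_sets: List[Dict]) -> Dict:
    """Package the selected result sets together with a search result list.

    Each entry of the list records the number of a search result set and the
    name of the search engine it came from.
    """
    result_list = [
        {"number": number, "engine": result_set["engine"]}
        for number, result_set in enumerate(selected_sets, start=1)
    ]
    return {
        "search_result_list": result_list,
        "search_result_sets": selected_sets,
    }


def export_packet(packet: Dict) -> str:
    # Assumed export rule: indented JSON text that keeps non-ASCII characters.
    return json.dumps(packet, ensure_ascii=False, indent=2)
```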

According to a second aspect of the present disclosure, there is provided a data search system, the system comprising:

the to-be-searched word set acquisition module is used for acquiring data to be searched, inputting the data to be searched into a pre-trained recognition model to extract words to be searched, and acquiring a set of words to be searched;

the first visual display module is used for visually displaying the words to be searched in the word set to be searched to generate a word graph to be searched;

the search module is used for acquiring a search result set of the word to be searched in at least one search engine corresponding to the word graph to be searched when the word graph to be searched is triggered, wherein the search result set comprises at least one search result;

the scoring module is used for obtaining the association degree between the search result and the corresponding word to be searched, scoring the search result set according to a preset scoring rule and the association degree and obtaining the score of the search result set;

and the second visual display module is used for sorting and visually displaying the search result sets according to the scores to generate search result set graphs, wherein each search result set graph corresponds to one search engine and one word to be searched.

According to a third aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of the present disclosure.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the present disclosure.

According to the data search method, system, electronic device, and storage medium of the present disclosure, the data to be searched is input into a pre-trained recognition model to extract the words to be searched and obtain the set of words to be searched, so that the acquired data can be better recognized and predicted and the words in the set can subsequently be searched conveniently across multiple engines. The words to be searched in the set are visually displayed to generate word graphs to be searched; when a word graph to be searched is triggered, a search result set, in at least one search engine, for the corresponding word to be searched is acquired, the search result set comprising at least one search result. The association degree between each search result and the corresponding word to be searched is then obtained, the search result set is scored according to a preset scoring rule and the association degree, and the search result sets are sorted and visually displayed according to the scores to generate search result set graphs. The disclosure thereby better meets the need to search a large amount of data through multiple search engines, effectively improves the user experience, offers a higher degree of intelligence and visualization at lower cost, and effectively shortens search time.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

FIG. 1 is a schematic diagram illustrating a flow chart of an implementation of a data search method according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart illustrating an implementation of obtaining a text recognition submodel in a data search method according to an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart illustrating an implementation of obtaining a voice recognition submodel in a data search method according to an embodiment of the present disclosure;

FIG. 4 is a first schematic flow chart illustrating an implementation process of obtaining a word set to be searched in the data search method according to the embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating a second implementation flow for obtaining a word set to be searched in the data searching method according to the embodiment of the present disclosure;

FIG. 6 is a schematic diagram illustrating a third implementation flow for obtaining a word set to be searched in the data search method according to the embodiment of the present disclosure;

FIG. 7 is a schematic diagram illustrating an implementation flow of obtaining a search result set in a data search method according to an embodiment of the present disclosure;

FIG. 8 is a schematic flow chart illustrating an implementation of batch export of search result sets in the data search method according to the embodiment of the present disclosure;

FIG. 9 is a schematic diagram illustrating a search result page according to a first embodiment of the data search method in the present disclosure;

FIG. 10 is a schematic diagram showing the structure of the data search system according to the embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, features and advantages of the present disclosure more apparent and understandable, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

Multi-engine search over a large amount of data is currently inconvenient and tends to waste the requester's time and effort, and crawling internet search engines with a crawler is likely to raise intellectual property issues with the search engine websites. Therefore, referring to FIG. 1, the present disclosure provides a data search method comprising the following steps:

S1: Acquire data to be searched. The data to be searched may be obtained through user input or file import. Acquiring the data to be searched makes it possible to subsequently obtain the corresponding set of words to be searched.

In some embodiments, the data to be searched includes at least one of: text data and voice data. Specifically, the text data is a word set text or a sentence text, and the word set text includes at least one word text. By acquiring at least one of text data and voice data, the diversity of the data to be searched can be better improved, the search requirements of users on the text and voice type data are met, the flexibility is higher, and the coverage is wider.

S2: Input the data to be searched into a pre-trained recognition model to extract the words to be searched, and acquire a set of words to be searched. In this step, the data to be searched is input into the recognition model for recognition and prediction, that is, for extraction of the words to be searched. This takes the semantic information of the data into account and yields a more accurate set of words to be searched, which facilitates the subsequent multi-engine search for the words in the set as well as the combined search of text data and voice data.

In some embodiments, the recognition model comprises a text recognition submodel and a voice recognition submodel. Specifically, when the data to be searched is a word set text or a sentence text, the text is input into the pre-trained text recognition submodel for recognition and prediction to obtain the set of words to be searched. When the data to be searched is voice data, the voice data is input into the pre-trained voice recognition submodel for recognition and prediction to obtain the set of words to be searched. When the data to be searched comprises both text data and voice data, the text data is input into the text recognition submodel for recognition and prediction to obtain a text semantic tag set comprising at least one text semantic tag; at the same time, the voice data is input into the voice recognition submodel for voice recognition to obtain a voice semantic tag set comprising at least one voice semantic tag; the confidence of each text semantic tag and of each voice semantic tag is then obtained, and the text semantic tags and voice semantic tags are screened according to these confidences to obtain the set of words to be searched. Inputting the data to be searched into the text recognition submodel and/or the voice recognition submodel in this way allows text and voice to be recognized with their semantics taken into account, yielding a corresponding set of words to be searched with high accuracy.
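The routing among the three cases can be sketched as follows. This is an illustrative sketch only: `extract_search_words` is a hypothetical name, the two submodels are represented by plain callables that return (tag, confidence) pairs, and the screening rule is passed in as a parameter because its exact form is left open here.

```python
from typing import Callable, List, Optional, Tuple

Tagged = List[Tuple[str, float]]                # (semantic tag, confidence) pairs
Submodel = Callable[[object], Tagged]           # stand-in for a trained submodel
Screen = Callable[[Tagged, Tagged], List[str]]  # stand-in for the screening rule


def extract_search_words(
    text: Optional[str],
    voice: Optional[bytes],
    text_model: Submodel,
    voice_model: Submodel,
    screen: Screen,
) -> List[str]:
    """Route the data to be searched to the matching submodel(s)."""
    if text is None and voice is None:
        raise ValueError("no data to be searched was provided")
    if voice is None:
        # Text only: the text semantic tags become the words to be searched.
        return [tag for tag, _ in text_model(text)]
    if text is None:
        # Voice only: the voice semantic tags become the words to be searched.
        return [tag for tag, _ in voice_model(voice)]
    # Text and voice together: screen both tag sets by confidence.
    return screen(text_model(text), voice_model(voice))
```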

Depending on actual search requirements, image data or video data may also be used as the data to be searched; correspondingly, an image recognition submodel is provided to recognize and predict on the image data and video data and obtain the corresponding set of words to be searched. For example: image data is acquired and input into a trained image recognition submodel for feature recognition and semantic prediction, yielding the corresponding set of words to be searched; or video data is acquired and images are captured from it to obtain a set of images to be searched, and the images in that set are input into the image recognition submodel for image recognition and semantic prediction, yielding the corresponding set of words to be searched. The image recognition submodel may employ an existing image recognition model.

S3: Visually display the words to be searched in the set of words to be searched to generate word graphs to be searched. The display position of each word to be searched is determined according to a preset display template, and the words in the set are visually displayed to generate the word graphs. Each word graph to be searched displays the content of its word to be searched, and a mapping relation exists between the word graph and the word.
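One way to picture the display template and the mapping relation is the sketch below. It assumes, purely for illustration, that the template is a fixed-width grid; `WordGraph` and `layout_word_graphs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WordGraph:
    graph_id: int  # identifier of the clickable graphic
    word: str      # word to be searched shown inside the graphic
    row: int       # display position assigned by the template
    col: int


def layout_word_graphs(words: List[str], columns: int = 3) -> Dict[int, WordGraph]:
    """Assign each word a grid position and keep the graph-to-word mapping."""
    graphs: Dict[int, WordGraph] = {}
    for index, word in enumerate(words):
        row, col = divmod(index, columns)
        graphs[index] = WordGraph(index, word, row, col)
    return graphs
```

When a graphic is triggered, looking up its `graph_id` in the returned dictionary recovers the word to be searched.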

S4: When a word graph to be searched is triggered, acquire a search result set, in at least one search engine, for the word to be searched corresponding to that word graph, the search result set comprising at least one search result. When the user clicks or touches a word graph to be searched displayed on the screen, the search result sets for the corresponding word are obtained from the at least one search engine, and the search engines are in one-to-one correspondence with the search result sets.

S5: Obtain the association degree between each search result and the corresponding word to be searched, and score the search result set according to a preset scoring rule and the association degree to obtain the score of the corresponding search result set. It can be understood that, in this embodiment, each search result corresponds to a word to be searched; that word is determined from the search result and the correspondence, and the association degree between the two is then obtained. The association degree is obtained as follows: the word to be searched is matched against each sentence in the corresponding search result to obtain matching results, and the association degree between the search result and the word is obtained from the matching results and preset weights. Sentences at different positions in the search result correspond to different weights, which can be set according to actual conditions and are not described further here.
Further, the scoring rule may be set according to actual conditions. For example, assume the current search result set includes search result 1, search result 2, and search result 3, whose association degrees with the corresponding word to be searched are 0.8, 0.7, and 0.6 respectively. The scores of search result 1, search result 2, and search result 3 are then determined to be 8, 7, and 6, and their sum, 21, is taken as the score of the current search result set. Scoring the search result set according to the preset scoring rule and the association degree makes it convenient to sort the search result sets by score, which improves the user experience.
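The association degree and the worked scoring example can be sketched as follows. The sketch rests on assumptions that are not fixed by the text: matching is taken to be a case-insensitive substring test, the association degree is the matched share of the total weight, and the scoring rule is inferred from the example as ten times the association degree, rounded. `association_degree` and `score_result_set` are hypothetical names.

```python
from typing import List, Sequence


def association_degree(word: str, sentences: Sequence[str],
                       weights: Sequence[float]) -> float:
    """Weighted share of the sentences that contain the word to be searched.

    weights[i] is the preset weight of the sentence at position i.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for sentence, weight in zip(sentences, weights)
        if word.lower() in sentence.lower()
    )
    return matched / total


def score_result_set(associations: List[float]) -> int:
    """Score each result as 10 x association degree, then sum over the set."""
    return sum(round(10 * degree) for degree in associations)


# Worked example from the text: association degrees 0.8, 0.7 and 0.6.
print(score_result_set([0.8, 0.7, 0.6]))  # 8 + 7 + 6 = 21
```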

S6: Sort and visually display the search result sets according to the scores to generate search result set graphs, each search result set graph corresponding to one search engine and one word to be searched. The search result sets are sorted according to their scores and a preset sorting rule, which may be descending order, and the sorted search result sets are visually displayed according to a preset search result set display template, so that the user can readily find the search result sets most closely associated with the word to be searched. In some embodiments, the search results within a search result set may also be sorted by their association degree with the corresponding word to be searched; triggering the search result set graph then displays the sorted search results of that set, and the displayed content may include the association degree.
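The two levels of ordering described in this step can be sketched as follows, assuming descending order for both. `ResultSet` and `rank_result_sets` are hypothetical names, and the scores are taken as already computed.

```python
from dataclasses import dataclass, field, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ResultSet:
    engine: str   # search engine the set came from
    word: str     # word to be searched
    score: float  # score of the whole set
    # Each search result is a (title, association degree) pair.
    results: List[Tuple[str, float]] = field(default_factory=list)


def rank_result_sets(result_sets: List[ResultSet]) -> List[ResultSet]:
    """Order sets by score, and results within each set by association degree."""
    ranked = sorted(result_sets, key=lambda rs: rs.score, reverse=True)
    return [
        replace(rs, results=sorted(rs.results, key=lambda r: r[1], reverse=True))
        for rs in ranked
    ]
```

The function returns new objects rather than reordering its inputs, so the unsorted data remains available to the caller.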

Referring to fig. 2, in order to improve the accuracy of the text recognition submodel, the obtaining step of the text recognition submodel in this embodiment includes:

S201a: Obtain a first training set, the first training set comprising: training sentences, training word sets, first real semantic tags corresponding to the training sentences, and second real semantic tags corresponding to the training word sets. There are multiple training sentences and multiple training word sets, each training word set comprises multiple training words, and there are one or more first real semantic tags and one or more second real semantic tags. Training sentences refer to sentences, paragraphs, and the like. Obtaining the first training set makes it possible to subsequently train the first semantic recognition network.

S202a: Input the training sentences and the training word sets separately into a preset first semantic recognition network for semantic recognition to obtain a first predicted semantic tag set and a second predicted semantic tag set, wherein the first predicted semantic tag set comprises at least one first predicted semantic tag and corresponds to the training sentences, and the second predicted semantic tag set comprises at least one second predicted semantic tag and corresponds to the training word sets. Specifically, the first semantic recognition network may be a bidirectional long short-term memory network (Bi-LSTM) or another neural network; a Bi-LSTM handles multiple inputs and multiple outputs well and improves the accuracy of text recognition and semantic prediction. Inputting the training sentences and the training word sets separately into the first semantic recognition network therefore helps improve the accuracy of semantic prediction.

S203a: Iteratively train the first semantic recognition network according to the difference between the first real semantic tags and the first predicted semantic tags and the difference between the second real semantic tags and the second predicted semantic tags, to obtain the text recognition submodel. During training, a cross-entropy loss function, an absolute-value loss function, or the like may be used, which improves the accuracy of the first semantic recognition network.
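As an illustration of how the two differences might drive training, the sketch below computes a cross-entropy term for the sentence branch and one for the word set branch and adds them. Summing the two terms with equal weight is an assumption, as are the names `cross_entropy` and `combined_loss`; the network itself and the optimizer are omitted.

```python
import math
from typing import Sequence


def cross_entropy(predicted: Sequence[float], true_index: int) -> float:
    """Negative log of the probability predicted for the real semantic tag."""
    epsilon = 1e-12  # guards against log(0)
    return -math.log(max(predicted[true_index], epsilon))


def combined_loss(
    sentence_probs: Sequence[float],
    sentence_tag: int,
    word_set_probs: Sequence[float],
    word_set_tag: int,
) -> float:
    """Sum of the sentence-branch and word-set-branch losses (assumed equal weight)."""
    return (cross_entropy(sentence_probs, sentence_tag)
            + cross_entropy(word_set_probs, word_set_tag))
```

Each training iteration would reduce this combined value, so that both the first and the second predicted semantic tags move toward their real tags.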

Referring to fig. 3, in order to better improve the prediction accuracy of the speech recognition submodel, the obtaining step of the speech recognition submodel in this embodiment includes:

S201b: Obtain a second training set, the second training set comprising: voice sample data and third real semantic tags corresponding to the voice sample data. There are multiple pieces of voice sample data. The second training set provides the training basis for the subsequent training of the voice recognition network and the second semantic recognition network.

S202b: Perform noise reduction on the voice sample data to obtain noise-reduced sample data.

S203b: Input the noise-reduced sample data into a preset voice recognition network for voice recognition to obtain a voice text. Specifically, the voice recognition network is a deep neural network.

S204b: Input the voice text into a preset second semantic recognition network for semantic recognition to obtain a third predicted semantic tag set. Combining semantic recognition improves the accuracy of the resulting set of words to be searched. The second semantic recognition network may be a neural network such as a bidirectional long short-term memory network.

S205b: Jointly train the voice recognition network and the second semantic recognition network according to the difference between the third real semantic tags and the third predicted semantic tags in the third predicted semantic tag set, to obtain the voice recognition submodel. Joint training on this difference yields a better voice recognition submodel. In some embodiments, the voice recognition network and the second semantic recognition network may be trained using a mean square error loss function or the like.

Referring to FIG. 4, in order to accommodate the diversity of the data to be searched, in this embodiment the step of inputting the data to be searched into the pre-trained recognition model to extract the words to be searched and acquiring the set of words to be searched includes:

S211: When the data to be searched is text data, input the text data into the text recognition submodel for recognition and prediction, and acquire the text semantic tag set output by the text recognition submodel.

S212: Take the text semantic tag set as the set of words to be searched.

Referring to FIG. 5, in some embodiments, the step of inputting the data to be searched into the pre-trained recognition model to extract the words to be searched and acquiring the set of words to be searched further includes:

S221: When the data to be searched is voice data, input the voice data into the voice recognition submodel for recognition and prediction, and acquire the voice semantic tag set output by the voice recognition submodel.

S222: Take the voice semantic tag set as the set of words to be searched.

Referring to fig. 6, in some embodiments, the step of inputting the data to be searched into the pre-trained recognition model to extract words to be searched and obtain the word set to be searched further includes:

S231: when the data to be searched includes both text data and voice data, inputting the text data into the text recognition sub-model for recognition and prediction to obtain a text semantic tag set, and inputting the voice data into the voice recognition sub-model for recognition and prediction to obtain a voice semantic tag set.

S232: acquiring a first confidence of the text semantic tags in the text semantic tag set and a second confidence of the voice semantic tags in the voice semantic tag set. The first confidence and the second confidence may be calculated with an existing confidence calculation formula, which is not detailed here.

S233: screening the text semantic tags and the voice semantic tags according to the first confidence, the second confidence and a preset screening rule to obtain the word set to be searched. For example: the text semantic tags and the voice semantic tags are sorted together in descending order of confidence, the first n tags are selected according to a preset tag number n, and the selected n tags are taken as the word set to be searched.
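The example screening rule in S233 can be sketched in Python as follows. This is a minimal illustration only: it assumes each tag carries a single confidence value, and that a tag appearing in both sets keeps its higher confidence. The function name `screen_tags`, the dictionary representation, and the deduplication rule are assumptions and are not part of the disclosure.

```python
from typing import Dict, List


def screen_tags(text_tags: Dict[str, float],
                voice_tags: Dict[str, float],
                n: int) -> List[str]:
    """Merge both tag sets, sort by confidence (descending), keep the first n."""
    merged: Dict[str, float] = dict(text_tags)
    for tag, conf in voice_tags.items():
        # Assumption: a tag found in both sets keeps its higher confidence.
        merged[tag] = max(conf, merged.get(tag, 0.0))
    # Sort by confidence, highest first; ties are broken alphabetically.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:n]]


# Hypothetical tags and confidences, for illustration only.
words_to_search = screen_tags(
    {"battery": 0.92, "charger": 0.61},
    {"battery": 0.80, "warranty": 0.75},
    n=2,
)
```

With these hypothetical inputs, the two highest-confidence tags are kept as the word set to be searched.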

Referring to fig. 7, in some embodiments, the step of obtaining, when the word graph to be searched is triggered, a search result set of the word to be searched corresponding to the word graph to be searched in at least one search engine includes:

S411: when the word graph to be searched is clicked or touched, sending a search instruction to at least one preset search engine, the search instruction including: a search command, verification information, and the word to be searched corresponding to the current word graph to be searched. The verification information includes: a user name, a password, and a device address. The user clicks or touches a word graph to be searched on the display screen according to actual search requirements, which triggers sending the search instruction to the at least one search engine. Specifically, the search engine uses the verification information to verify whether the current request has search authority, and obtains a verification result.

S412: obtaining at least one search result set using the search instruction, that is, obtaining the search result set fed back by the search engine in response to the search instruction. After receiving the search instruction, the search engine performs the following operations: verifying, according to the verification information in the search instruction, whether the current request has search authority, and obtaining a verification result; and searching, according to the verification result and the search command, for the word to be searched corresponding to the current word graph to be searched, to obtain at least one search result set. That is, if the verification succeeds, the search engine searches for the word to be searched corresponding to the current word graph to be searched to obtain at least one search result, integrates the obtained search results into a search result set, and feeds the search result set back to the user terminal, which may be a computer, a mobile phone, or other terminal equipment.
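The exchange in S411 and S412 can be sketched in Python as follows. This is a hedged illustration: `StubEngine` is a hypothetical in-memory stand-in and does not represent the API of any real search engine, and the stub checks only the user name and password, whereas the disclosure does not specify how the device address is verified.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Verification:
    user_name: str
    password: str
    device_address: str


@dataclass(frozen=True)
class SearchInstruction:
    command: str                # the search command, e.g. "search"
    verification: Verification  # the verification information
    word: str                   # the word to be searched


class StubEngine:
    """Hypothetical stand-in for a search engine; no real engine API is implied."""

    def __init__(self, name: str, accounts: Dict[str, str], corpus: List[str]):
        self.name = name
        self._accounts = accounts  # user_name -> password
        self._corpus = corpus

    def handle(self, instruction: SearchInstruction) -> Optional[List[str]]:
        v = instruction.verification
        # Verify search authority before searching (S412).
        if self._accounts.get(v.user_name) != v.password:
            return None
        return [doc for doc in self._corpus if instruction.word in doc]


def dispatch(instruction: SearchInstruction,
             engines: List[StubEngine]) -> Dict[str, List[str]]:
    """Send one instruction to every preset engine; keep verified result sets."""
    result_sets: Dict[str, List[str]] = {}
    for engine in engines:
        results = engine.handle(instruction)
        if results is not None:
            result_sets[engine.name] = results
    return result_sets


engines = [
    StubEngine("engine 1", {"alice": "secret"}, ["teddy bear", "polar bear"]),
    StubEngine("engine 2", {"bob": "hunter2"}, ["teddy bear toys"]),
]
instruction = SearchInstruction(
    command="search",
    verification=Verification("alice", "secret", "00:11:22:33:44:55"),
    word="teddy",
)
result_sets = dispatch(instruction, engines)  # only "engine 1" accepts the credentials
```

Engines that reject the verification information contribute no search result set, which mirrors the condition that a search is performed only when the verification succeeds.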

Referring to fig. 8, after completing the multi-engine search on the word set to be searched and obtaining at least one search result set, the user usually needs to export the search result sets in batches. This embodiment therefore proposes that, after the step of generating the search result set graph, the method further includes:

S7: creating a data export task by selecting a search result set graph. Specifically, the user determines the search result sets to be exported by selecting one or more search result set graphs, and a data export task is created according to the search result sets to be exported.

S8: packaging, according to the data export task, the selected at least one search result set to obtain a data packet to be exported, wherein the data packet to be exported includes: a search result list and at least one search result set, and the search result list includes: the number of the search result set and the name of the search engine corresponding to the search result set.

S9: exporting the data packet to be exported using a preset data export rule. The data export rule can be set according to the actual situation. Through the above steps, batch export of search result sets is realized, the user's need for multi-engine search of a large amount of data is met, and the user experience is improved.
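Steps S8 and S9 can be sketched in Python as follows. This is only an illustration under stated assumptions: the "number" of a search result set is taken to be a serial number starting at 1, and the preset data export rule is assumed to be serialization to JSON text. The function names and field names are hypothetical.

```python
import json
from typing import Dict, List


def build_export_packet(selected: Dict[str, List[str]]) -> dict:
    """Package selected result sets with a search result list (number + engine name)."""
    result_list = []
    result_sets = {}
    for number, (engine_name, results) in enumerate(selected.items(), start=1):
        # Assumption: "number" is a serial number assigned in selection order.
        result_list.append({"number": number, "engine": engine_name})
        result_sets[str(number)] = results
    return {"search_result_list": result_list, "search_result_sets": result_sets}


def export_packet(packet: dict) -> str:
    """Assumed export rule: serialize the data packet as JSON text."""
    return json.dumps(packet, ensure_ascii=False, indent=2)
```

Keeping the search result list separate from the result sets lets a reader of the exported packet see which engine produced each numbered set without opening the sets themselves.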

Example one:

When a user needs to perform a multi-engine search on a large amount of text data, the text data is first input into the pre-trained text recognition sub-model for recognition and prediction to obtain a word set to be searched. Next, the words to be searched in the word set are visually displayed according to a preset word display template, generating word graphs to be searched. The user clicks or touches a word graph to be searched according to actual search requirements; when the word graph is clicked or touched, a search result set of the corresponding word to be searched is obtained from at least one search engine, each search result set including at least one search result. Then, the association degree between each search result and the corresponding word to be searched is obtained, and the search result set is scored according to the association degree and a preset scoring rule to obtain the score of the search result set. Finally, the at least one search result set is ranked by score and visually displayed, generating search result set graphs on the display screen. Fig. 9 shows a schematic diagram of the display screen: each word graph to be searched corresponds to search result set graphs, and each search result set corresponds to a search engine. The word graphs to be searched (such as word 1 to be searched, word 2 to be searched, word 3 to be searched, word 4 to be searched, word 5 to be searched, word 6 to be searched, and so on) and the search result set graphs of the corresponding search engines (such as the result set of search engine 1, the result set of search engine 2, the result set of search engine 3, and so on) are displayed on the display screen according to a preset search result set display template.
The user browses the displayed page by sliding the slide button 01 on the edge of the screen.
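The scoring and ranking of search result sets described in this example can be sketched in Python as follows. The disclosure leaves the preset scoring rule open; this sketch assumes, purely for illustration, that each search result has an association degree between 0 and 1 and that the score of a set is the mean association degree scaled to 0-100. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def score_result_set(association_degrees: List[float]) -> float:
    """Assumed scoring rule: mean association degree scaled to 0-100."""
    if not association_degrees:
        return 0.0
    return 100.0 * sum(association_degrees) / len(association_degrees)


def rank_result_sets(sets: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (engine name, score) pairs sorted by score, highest first."""
    scored = [(engine, score_result_set(degrees)) for engine, degrees in sets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The ranked list gives the order in which the search result set graphs of the individual search engines would be displayed for one word to be searched.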

When a user needs to export search result sets in batches, a data export task is created by selecting a corresponding search result set graph, at least one selected search result set is packaged according to the data export task, a data packet to be exported is obtained, the data packet to be exported is exported by using a preset data export rule, and the batch export of the search result sets is completed. The operation is convenient to implement and high in flexibility, the requirement of a user for searching a large number of texts by multiple search engines is well met, and operations such as crawlers are avoided.

Example two:

When a user needs to perform a multi-engine search on a large amount of voice data, the voice data is input into the pre-trained voice recognition sub-model for recognition and prediction to obtain a word set to be searched, and the words to be searched in the word set are visually displayed according to a preset word display template, generating word graphs to be searched. The user clicks or touches a word graph to be searched according to actual search requirements; when the word graph is clicked or touched, a search result set of the corresponding word to be searched is obtained from at least one search engine, each search result set including at least one search result. Then, the association degree between each search result and the corresponding word to be searched is obtained, and the search result set is scored according to the association degree and a preset scoring rule to obtain the score of the search result set. Finally, the at least one search result set is ranked by score and visually displayed, generating search result set graphs on the display screen. This completes the multi-engine search of a large amount of voice data with a high degree of visualization, convenient operation, and high accuracy.

Example three:

When a user needs to perform a combined multi-engine search on a large amount of voice data and text data, the text data is input into the text recognition sub-model for recognition and prediction to obtain a text semantic tag set, and the voice data is input into the voice recognition sub-model for recognition and prediction to obtain a voice semantic tag set. A first confidence of the text semantic tags in the text semantic tag set and a second confidence of the voice semantic tags in the voice semantic tag set are then obtained, and the text semantic tags and the voice semantic tags are screened according to the first confidence and the second confidence to obtain a word set to be searched.

The words to be searched in the word set are visually displayed according to a preset word display template, generating word graphs to be searched. The user clicks or touches a word graph to be searched according to actual search requirements; when the word graph is clicked or touched, a search result set of the corresponding word to be searched is obtained from at least one search engine, each search result set including at least one search result. The association degree between each search result and the corresponding word to be searched is obtained, the search result set is scored according to the association degree and a preset scoring rule to obtain its score, and the search result sets of the word to be searched in the plurality of search engines are ranked by score and visually displayed, generating search result set graphs on the display screen.

Example four:

When a user needs to perform a multi-engine search on an existing word set to be searched, the existing word set is acquired through manual input or file upload, and the words to be searched in the word set are visually displayed according to a preset word display template, generating word graphs to be searched. The user clicks or touches a word graph to be searched, and a search result set of the corresponding word to be searched is obtained from at least one search engine, each search result set including at least one search result. The association degree between each search result and the corresponding word to be searched is obtained, the search result set is scored according to the association degree and a preset scoring rule to obtain its score, and the search result sets of the word to be searched in the plurality of search engines are ranked by score and visually displayed, generating search result set graphs on the display screen.

Example five:

When a user needs to perform a multi-engine search on image data, the image data is input into a trained image recognition sub-model for feature recognition and semantic prediction to obtain the corresponding word set to be searched. The words to be searched in the word set are visually displayed according to a preset word display template, generating word graphs to be searched. The user clicks or touches a word graph to be searched, a search result set of the corresponding word to be searched is obtained from at least one search engine, and the association degree between each search result in the search result set and the corresponding word to be searched is obtained. The search result set is scored according to the association degree and a preset scoring rule to obtain its score. The search result sets of the word to be searched in the plurality of search engines are then ranked by score and visually displayed, completing the multi-engine search of the image data at low cost.

When a user needs to search the video data by multiple engines, firstly, image interception processing is carried out on the video data to obtain an image set to be searched, images in the image set to be searched are input into an image recognition sub-model to carry out image recognition and semantic prediction, and a corresponding word set to be searched is obtained. And then, visually displaying the words to be searched in the word set to be searched according to a preset word display template to be searched, and generating a word graph to be searched. The user clicks or touches the graph of the word to be searched according to the actual search requirement, when the graph of the word to be searched is clicked or touched, a search result set of the word to be searched in at least one search engine corresponding to the current graph of the word to be searched is obtained, and the association degree between the search result in the search result set and the word to be searched corresponding to the search result set is obtained. And scoring the search result set according to the association degree and a preset scoring rule to obtain the score of the search result set. And finally, according to the scores of the search result sets, the search result sets of the words to be searched in the plurality of search engines are ranked and visually displayed, so that multi-engine search of the video data is completed, and the practicability is high.
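The image interception step for video data can be sketched in Python as follows. The disclosure does not state how frames are chosen; this sketch assumes, for illustration only, that one frame is captured per fixed time interval. The function name `frame_indices` and its parameters are hypothetical.

```python
from typing import List


def frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Indices of frames to capture, one per `every_seconds` of video."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Number of frames between two captures; at least 1 to guarantee progress.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))
```

The frames at the returned indices would form the image set to be searched, each image then being passed to the image recognition sub-model.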

Referring to fig. 10, the present disclosure also provides a data search system, including:

a to-be-searched word set obtaining module 1001 configured to obtain data to be searched; inputting data to be searched into a pre-trained recognition model for extracting words to be searched, and acquiring a set of words to be searched;

the first visual display module 1002 is configured to perform visual display on a word to be searched in a word set to be searched, and generate a word graph to be searched;

the search module 1003 is configured to obtain, when the word graph to be searched is triggered, a search result set of the word to be searched corresponding to the word graph to be searched in at least one search engine, where the search result set includes at least one search result;

the scoring module 1004 is configured to obtain a degree of association between the search result and the corresponding word to be searched, score the search result set according to a preset scoring rule and the degree of association, and obtain a score of the corresponding search result set;

a second visual display module 1005, configured to sort and visually display the search result sets according to the scores and generate a search result set graph, where the search result set graph corresponds to the search engine and to the word to be searched, respectively. The to-be-searched word set obtaining module 1001, the first visual display module 1002, the search module 1003, the scoring module 1004 and the second visual display module 1005 are connected in sequence.

The system acquires the data to be searched and inputs it into the pre-trained recognition model to extract words to be searched and obtain the word set to be searched, so that words to be searched are recognized and predicted from the acquired data, which facilitates the subsequent multi-engine search of the words in the word set. The words to be searched in the word set are visually displayed to generate word graphs to be searched; when a word graph is triggered, a search result set of the corresponding word to be searched is obtained from at least one search engine; the association degree between each search result and the corresponding word to be searched is obtained; the search result set is scored according to a preset scoring rule and the association degree to obtain its score; and the search result sets are sorted by score and visually displayed to generate the search result set graph. This meets the user's need to search a large amount of data through multiple search engines, improves the user experience, and shortens the search time, with a high degree of intelligence and visualization, low cost, high practicability, and high flexibility.
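The sequential connection of modules 1001 to 1005 can be sketched in Python as follows. This is a minimal illustration in which each module is represented by a caller-supplied function; the function name `run_pipeline`, its parameters, and the mean-association scoring rule are assumptions, and the visual display itself is outside the scope of the sketch.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(data: str,
                 recognize: Callable[[str], List[str]],
                 pick: Callable[[List[str]], str],
                 engines: Dict[str, Callable[[str], List[str]]],
                 association: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    words = recognize(data)  # module 1001: obtain the word set to be searched
    word = pick(words)       # module 1002: the user triggers one displayed word graph
    # module 1003: one search result set per search engine
    result_sets = {name: search(word) for name, search in engines.items()}
    scores: Dict[str, float] = {}
    for name, results in result_sets.items():  # module 1004: score each result set
        degrees = [association(word, r) for r in results]
        # Assumed scoring rule: mean association degree.
        scores[name] = sum(degrees) / len(degrees) if degrees else 0.0
    # module 1005: order the result sets by score, highest first
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Passing the recognizer, engines, and association measure as parameters keeps the sketch independent of any particular recognition model or search engine.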

In some embodiments, the data to be searched includes at least one of: text data and voice data;

the recognition model includes at least one of: a text recognition sub-model and a voice recognition sub-model.

In some embodiments, the obtaining of the text recognition submodel comprises:

respectively inputting a training sentence and a training word set into a preset first semantic recognition network for semantic recognition, and acquiring a first prediction semantic tag set and a second prediction semantic tag set, wherein the first prediction semantic tag set comprises at least one first prediction semantic tag, the second prediction semantic tag set comprises at least one second prediction semantic tag, the first prediction semantic tag set corresponds to the training sentence, and the second prediction semantic tag set corresponds to the training word set;

and performing iterative training on the first semantic recognition network according to the difference between the first real semantic tag and the first predicted semantic tag and the difference between the second real semantic tag and the second predicted semantic tag to obtain the text recognition sub-model.

In some embodiments, the step of obtaining the voice recognition sub-model comprises:

obtaining a second training set, the second training set comprising: the voice sample data and a third real semantic label corresponding to the voice sample data;

carrying out noise reduction processing on voice sample data to obtain noise reduction sample data;

and performing joint training on the voice recognition network and the second semantic recognition network according to the difference between the third real semantic tag and a third predicted semantic tag in a third predicted semantic tag set to obtain a voice recognition sub-model.

In some embodiments, the to-be-searched word set obtaining module 1001 inputs the to-be-searched data into a pre-trained recognition model for extracting the to-be-searched word, and the step of obtaining the to-be-searched word set includes:

when the data to be searched is text data, inputting the text data into a text recognition sub-model for recognition and prediction, acquiring a text semantic tag set output by the text recognition sub-model, and taking the text semantic tag set as a word set to be searched;

and when the data to be searched is voice data, inputting the voice data into the voice recognition sub-model for recognition and prediction, acquiring a voice semantic tag set output by the voice recognition sub-model, and taking the voice semantic tag set as a word set to be searched.

In some embodiments, the to-be-searched word set obtaining module 1001 inputs the to-be-searched data into a pre-trained recognition model for extracting the to-be-searched word, and the step of obtaining the to-be-searched word set further includes:

when the data to be searched is text data and voice data, inputting the text data into a text recognition submodel for recognition and prediction to obtain a text semantic tag set, and inputting the voice data into a voice recognition submodel for recognition and prediction to obtain a voice semantic tag set;

and screening the text semantic tags and the voice semantic tags according to the first confidence coefficient and the second confidence coefficient to obtain a word set to be searched.

In some embodiments, when the graph of the word to be searched is triggered in the search module 1003, the step of obtaining a search result set of the word to be searched in at least one search engine corresponding to the graph of the word to be searched includes:

the search engine verifies, according to the verification information, whether the current request has search authority, and obtains a verification result; and the search engine searches for the word to be searched corresponding to the current word graph to be searched according to the verification result and the search command to obtain at least one search result set.

In some embodiments, the system further includes: a batch export module, configured to export the at least one search result set in batches;

the batch export step comprises:

creating a data export task by selecting a search result set graph;

according to the data export task, packaging the selected at least one search result set to obtain a data packet to be exported, wherein the data packet to be exported comprises: a search result list and at least one search result set, the search result list comprising: the number of the search result set and the name of a search engine corresponding to the search result set;

The present disclosure also provides an electronic device and a readable storage medium according to an embodiment of the present disclosure.

FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.

A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the data search method. For example, in some embodiments, the data search method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into RAM 1103 and executed by computing unit 1101, one or more steps of the data search method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the data search method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.

The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method of searching data, the method comprising:

acquiring data to be searched;

2. The data search method of claim 1,

the data to be searched at least comprises one of the following data: text data and voice data;

3. The data searching method of claim 2, wherein the step of obtaining the text recognition submodel comprises:

4. The data searching method of claim 2, wherein the step of obtaining the voice recognition sub-model comprises:

denoising the voice sample data to obtain denoising sample data;

and performing joint training on the voice recognition network and the second semantic recognition network according to the difference between the third real semantic label and a third predicted semantic label in the third predicted semantic label set to obtain the voice recognition submodel.

5. The data search method of claim 2, wherein the data to be searched is input into a pre-trained recognition model for extracting words to be searched, and the step of obtaining the set of words to be searched comprises:

6. The data searching method of claim 5, wherein the step of inputting the data to be searched into a pre-trained recognition model for extracting the words to be searched and obtaining the set of words to be searched further comprises:

7. The data searching method of claim 1, wherein the step of obtaining a search result set of the word to be searched in at least one search engine corresponding to the word graph to be searched when the word graph to be searched is triggered comprises:

8. The data search method of claim 1 wherein the step of generating a search result set graph is followed by:

creating a data export task by selecting the search result set graph;

9. A data search system, the system comprising:

the first visual display module is used for visually displaying the words to be searched in the word set to be searched and generating a word graph to be searched;

the scoring module is used for obtaining the association degree between the search result and the corresponding word to be searched, scoring the search result set according to a preset scoring rule and the association degree, and obtaining the score of the corresponding search result set;

10. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.

11. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.