CN116431764A - Data matching method, device, equipment and storage medium

Info

Publication number: CN116431764A
Application number: CN202310370161.9A
Authority: CN
Inventors: 崔金涛; 叶玮彬; 刘涛
Original assignee: Baidu China Co Ltd
Current assignee: Baidu China Co Ltd
Priority date: 2023-04-07
Filing date: 2023-04-07
Publication date: 2023-07-14

Abstract

The disclosure provides a data matching method, apparatus, device, and storage medium, and relates to the technical field of computers, in particular to the technical field of data processing. The specific implementation is as follows: in response to a search instruction from a user terminal for a keyword, determining retrieval reference information matching the keyword in a first database; determining, in a second database and according to the retrieval reference information, the full data corresponding to the retrieval reference information, and taking the full data as the retrieval result for the keyword; and sending the retrieval result for the keyword to the user terminal for display by the user terminal. With this method, the storage cost of the retrieval database can be effectively reduced and retrieval efficiency can be improved.

Description

Data matching method, device, equipment and storage medium

Technical Field

The disclosure relates to the technical field of computers, in particular to the technical field of data processing, and specifically relates to a data matching method, a device, equipment and a storage medium.

Background

Keyword matching plays an increasingly important role in data processing technology. Specifically, after a keyword input by a user is received, a retrieval database storing massive data may be searched and filtered based on the keyword, so as to obtain all data matching the keyword.

Disclosure of Invention

The present disclosure provides a data matching method, apparatus, device, and storage medium.

According to a first aspect of the present disclosure, there is provided a data matching method, the method comprising:

in response to a search instruction from a user terminal for a keyword, determining retrieval reference information matching the keyword in a first database; the retrieval reference information is information that includes the keyword or satisfies a similarity condition with the keyword; the first database is used for storing a plurality of pieces of retrieval reference information;

determining, in a second database and according to the retrieval reference information, full data corresponding to the retrieval reference information, and taking the full data as a retrieval result for the keyword; the second database is used for storing a plurality of pieces of retrieval reference information and the full data corresponding to the retrieval reference information;

and sending the retrieval result for the keyword to the user terminal for display by the user terminal.

According to a second aspect of the present disclosure, there is provided a data matching apparatus, the apparatus comprising:

a first determining module, configured to, in response to a search instruction from a user terminal for a keyword, determine retrieval reference information matching the keyword in a first database; the retrieval reference information is information that includes the keyword or satisfies a similarity condition with the keyword; the first database is used for storing a plurality of pieces of retrieval reference information;

a second determining module, configured to determine, in a second database and according to the retrieval reference information, full data corresponding to the retrieval reference information, and to take the full data as a retrieval result for the keyword; the second database is used for storing a plurality of pieces of retrieval reference information and the full data corresponding to the retrieval reference information;

and a sending module, configured to send the retrieval result for the keyword to the user terminal for display by the user terminal.

According to a third aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods of the first aspect.

According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the methods of the first aspect.

According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any of the methods of the first aspect.

According to the technology of the present disclosure, the problems of high storage cost and low retrieval efficiency of the retrieval database in the related art are solved: the storage cost of the retrieval database is effectively reduced, the time consumed by retrieval is greatly reduced, and retrieval efficiency is improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is an application scenario diagram of a data matching method shown in an embodiment of the present disclosure;

FIG. 2 is a flow chart of a data matching method according to an embodiment of the disclosure;

FIG. 3 is a flow chart of another data matching method shown in an embodiment of the present disclosure;

FIG. 4 is a diagram illustrating an example data structure of data stored in a second database according to an embodiment of the present disclosure;

FIG. 5 is a method flow diagram of another data matching method shown in an embodiment of the present disclosure;

FIG. 6 is a block diagram of a data matching device according to an embodiment of the present disclosure;

FIG. 7 is a schematic block diagram of an electronic device for implementing a data matching method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the information involved all comply with relevant laws and regulations and do not violate public order and good morals.

Some of the terms in the embodiments of the present disclosure are explained below to facilitate understanding by those skilled in the art:

1. External memory

External memory is storage other than a computer's main memory and CPU (Central Processing Unit) cache, and is characterized by, among other things, high retrieval speed. Common external memories include hard disks, floppy disks, optical discs, and USB flash drives.

2. Distributed file system

The distributed file system can store a large amount of data to different nodes in a scattered mode, and the risk of data loss is greatly reduced.

3. MPP (Massively Parallel Processing)

MPP can distribute tasks in parallel across multiple servers or nodes; after the computation completes on each server or node, the partial results are aggregated to obtain the final result.

4. UDF (User-Defined Function)

A UDF is code logic written by the user. UDFs in the broad sense include the UDF proper, which takes a single row as input and produces a single row as output; the UDAF (User-Defined Aggregate Function), which takes multiple rows as input and produces a single row as output; and the UDTF (User-Defined Table-generating Function), which takes a single row as input and produces multiple rows as output. UDAFs suit scenarios that aggregate multiple rows of data, and UDTFs suit scenarios that split data.
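The three row-cardinality patterns described above can be illustrated with plain Python functions. This is only a sketch of the input/output shapes; the function names are hypothetical and it does not use the registration API of any particular MPP engine.

```python
from typing import Iterable, Iterator


def udf_upper(value: str) -> str:
    """UDF shape: one input row -> one output row."""
    return value.upper()


def udaf_count_containing(rows: Iterable[str], keyword: str) -> int:
    """UDAF shape: many input rows -> one aggregated output row."""
    return sum(1 for row in rows if keyword in row)


def udtf_split_words(row: str) -> Iterator[str]:
    """UDTF shape: one input row -> many output rows."""
    yield from row.split()
```

In a real engine each function would be registered with the engine and invoked per row or per group; here they are called directly to keep the sketch self-contained.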

The data matching method provided by the embodiment of the disclosure can be applied to keyword matching scenes.

Keyword matching is a common scenario in processing business data. Specifically, after one or more keywords input by a user are received, a database storing massive data can be searched and filtered to obtain the retrieval results matching the keywords, and data analysis can then be performed on those results. For example, articles or comments matching the keywords are obtained through keyword matching, and analysing them yields the business word-frequency heat, business value, and the like of the keywords.

Currently, in the related art, keywords are mainly matched in the following two ways:

Mode one: fuzzy matching with an MPP calculation engine. Specifically, keywords are matched with the LIKE operator of an MPP calculation engine (e.g., Hive or Spark). The engine scans every row of data in the database, uses regular-expression matching to judge whether the current row contains the keyword, screens out all rows containing the keyword as the retrieval result, and then performs subsequent analysis on that result. However, regular-expression matching is computationally very inefficient: when the data of a medium or large enterprise reaches the billion level, it consumes a great deal of both time and computing resources.

Mode two: synchronizing the full data to external memory for retrieval. In this scheme, before keywords are matched, the full data in the distributed file system must be imported into external memory in advance, and during the import an inverted index is built for the fields that need keyword matching; that is, the characters in the field values are associated with the actual storage positions of the data. This scheme has the following problem: building the inverted index requires a retrieval database to store the keyword correspondences generated during indexing, and the storage medium required by the retrieval database is more expensive than the distributed file system, so the storage cost is enormous.
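The contrast between the two related-art modes can be sketched as follows. The sample rows, the whitespace tokenization, and the use of list positions as "storage positions" are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical sample rows standing in for the full data.
ROWS = [
    "a communication application method",
    "a storage medium",
    "a communication identification method",
]


def scan_match(rows, keyword):
    """Mode one: test every row with a regular expression (cost grows with the table)."""
    pattern = re.compile(re.escape(keyword))
    return [i for i, row in enumerate(rows) if pattern.search(row)]


def build_inverted_index(rows):
    """Mode two: map each word to the positions of the rows that contain it."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        for word in set(row.split()):
            index[word].append(i)
    return index


# Built once at import time; lookups afterwards avoid scanning the rows.
INDEX = build_inverted_index(ROWS)
```

The scan touches every row on every query, whereas the index answers a lookup directly but must itself be stored, which is the storage cost the passage above refers to.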

In order to solve the problems of high storage cost and low retrieval efficiency of the retrieval database in the related art, the present disclosure provides a data matching method. After a search instruction from a user terminal for a keyword is received, retrieval reference information matching the keyword is determined in a first database. Because the first database stores a plurality of pieces of retrieval reference information, the process of determining the retrieval reference information according to the keyword, namely the retrieval process, is carried out in the first database, which thus serves as the retrieval database for keyword-based data retrieval. Further, the second database stores the plurality of pieces of retrieval reference information and the full data corresponding to them, so the process of determining the retrieval result according to the retrieval reference information, namely the query process, is carried out in the second database. On the one hand, dividing data searching into the two processes of retrieval and query narrows the range of data to be searched and improves search efficiency. On the other hand, because the first database stores only a small amount of retrieval reference information and nothing else, its storage cost can be effectively reduced. Furthermore, because the data volume of the first database is small, the time consumed in determining the retrieval reference information there is greatly reduced, which further improves retrieval efficiency.

Fig. 1 illustrates an application scenario diagram that may be used for one data matching method provided by embodiments of the present disclosure. As shown in fig. 1, the application scenario may include a user terminal 101, a server 102, a first database 103, and a second database 104.

In one embodiment, the user terminal 101 may be at least one of a smart phone, a smart watch, a desktop computer, a laptop computer, and a wireless terminal. In one embodiment, the user terminal 101 has a communication function and is capable of accessing a wired network or a wireless network. The user terminal 101 may refer broadly to one of a plurality of terminals; the disclosed embodiments are illustrated only by the user terminal 101. Those skilled in the art will recognize that the number of terminals may be greater or fewer.

In the embodiment of the present disclosure, the user terminal 101 may be a terminal including a display, and the user terminal 101 may receive, through the display, a search instruction for a keyword input by a user. In one embodiment, the user terminal 101 may be communicatively coupled to the server 102. For example, the user terminal 101 may establish a connection with the server 102 through a wired network or a wireless network, and transmit a search instruction to the server 102 through the wired network or the wireless network, or receive a search result transmitted by the server 102.

In one embodiment, the server 102 may be an independent physical server, a server cluster formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content distribution networks, and big data or artificial intelligence platforms, which is not limited in the embodiments of the present disclosure. In one embodiment, the number of servers 102 may be greater or fewer, and embodiments of the present disclosure are not limited in this regard. Of course, the server 102 may also include other functions to provide more comprehensive and diversified services.

In the embodiment of the disclosure, the server 102 is configured to receive a search instruction for a keyword sent by the user terminal 101, determine, in response to the search instruction, search reference information matched with the keyword in the first database 103, determine, according to the search reference information, full-size data corresponding to the search reference information in the second database 104, as a search result of the keyword, and send the search result to the user terminal 101.

In one embodiment, the server 102 may have an MPP calculation engine and UDF deployed therein. Wherein the UDF may be built into the MPP calculation engine. In one embodiment, the data matching method provided by the embodiment of the disclosure can be implemented through interaction between the MPP calculation engine and the UDF, and the specific process may be: the server 102 receives a search instruction for keywords sent by the user terminal 101 through the MPP calculation engine, and then sends the keywords to the UDF; after receiving the keyword, the UDF may determine, from the first database 103, retrieval reference information matching the keyword, and send the retrieval reference information to the MPP calculation engine; the MPP calculation engine determines, from the second database 104, full-volume data corresponding to the retrieval reference information, as a retrieval result of the keyword, according to the received retrieval reference information.
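The interaction between the MPP calculation engine and the UDF described above can be sketched as follows. The in-memory data, the partition layout, the field names, and the use of a thread pool to stand in for parallel nodes are all assumptions for illustration, not the deployment described in the disclosure.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical first database: retrieval reference information only.
FIRST_DB = ["a communication application method", "a storage medium"]

# Hypothetical second database, split into partitions as an MPP engine might see it.
SECOND_DB_PARTITIONS = [
    [{"reference": "a communication application method", "body": "full text A"}],
    [{"reference": "a storage medium", "body": "full text B"}],
]


def udf_match(keyword):
    """UDF side: return the reference information that contains the keyword."""
    return {ref for ref in FIRST_DB if keyword in ref}


def scan_partition(partition, references):
    """Engine side: keep the rows of one partition whose reference matched."""
    return [row for row in partition if row["reference"] in references]


def engine_search(keyword):
    """Engine receives the keyword, calls the UDF, then queries every partition."""
    references = udf_match(keyword)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda p: scan_partition(p, references), SECOND_DB_PARTITIONS)
        return [row for part in parts for row in part]
```

The engine never applies the keyword to the full data itself; it only filters partitions by the small set of references the UDF returned.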

In one embodiment, the first database 103 may include, but is not limited to, external memory. In the embodiment of the disclosure, the first database 103 is used for storing a plurality of pieces of retrieval reference information and the retrieval information identifiers corresponding to the retrieval reference information. The second database 104 may include, but is not limited to, a distributed file system. In the embodiment of the disclosure, the second database 104 is used for storing a plurality of pieces of retrieval reference information, the retrieval information identifiers corresponding to the retrieval reference information, and the full data corresponding to the retrieval reference information.
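A minimal sketch of what each database holds per item, based on the description above. The class and field names are hypothetical; the disclosure does not prescribe a record layout here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstDbEntry:
    """Entry in the first database (e.g. external memory): no full data."""
    info_id: str          # retrieval information identifier
    reference_info: str   # retrieval reference information


@dataclass(frozen=True)
class SecondDbRow:
    """Row in the second database (e.g. a distributed file system)."""
    info_id: str
    reference_info: str
    full_data: str        # full data corresponding to the reference information


def to_first_db_entry(row: SecondDbRow) -> FirstDbEntry:
    """Project a full row down to the small entry kept in the first database."""
    return FirstDbEntry(row.info_id, row.reference_info)
```

Keeping the full data out of `FirstDbEntry` is what keeps the first database small relative to the second.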

In the embodiment of the disclosure, once the UDF accelerated by external memory is deployed in the MPP calculation engine, the engine obtains the same keyword matching capability as the retrieval database in the related art. Therefore, no additional retrieval database needs to be provided, which reduces the cost of the retrieval database and can improve data matching efficiency.

The following describes a data matching method provided by the embodiment of the present disclosure based on an application scenario shown in fig. 1.

Fig. 2 is a flow chart of a data matching method according to an embodiment of the disclosure, as shown in fig. 2, the method includes:

S201, in response to a search instruction from the user terminal for a keyword, determining retrieval reference information matching the keyword in a first database.

The retrieval reference information is information that includes the keyword or satisfies a similarity condition with the keyword, and the first database is used for storing a plurality of pieces of retrieval reference information.

In the embodiment of the present disclosure, the similarity condition may include, but is not limited to, the text similarity with the keyword being greater than or equal to a preset similarity. The preset similarity may be, for example, 90% or 95%, which is not limited in the embodiment of the present disclosure.

After receiving a search instruction of a user for an input keyword, the user terminal may send the search instruction to the server, and the server may determine search reference information matching the keyword in the first database in response to the search instruction.

S202, determining, in a second database and according to the retrieval reference information, full data corresponding to the retrieval reference information, and taking the full data as the retrieval result for the keyword.

The second database is used for storing a plurality of pieces of retrieval reference information and full data corresponding to the retrieval reference information.

After the search reference information matched with the keyword is determined in the first database through S201, full data corresponding to the search reference information matched with the keyword can be found out from the second database according to the search reference information, and the full data is used as a search result of the keyword.

S203, sending the retrieval result for the keyword to the user terminal for display by the user terminal.

After the search result of the keyword is obtained through S202, the search result of the keyword may be sent to the user terminal, so that the user terminal displays the received search result of the keyword through the display after receiving the search result of the keyword.

In the data matching method provided by the embodiment of the disclosure, after a search instruction from a user terminal for a keyword is received, retrieval reference information matching the keyword is determined in the first database. Because the first database stores a plurality of pieces of retrieval reference information, the process of determining the retrieval reference information according to the keyword, namely the retrieval process, is carried out in the first database, which thus serves as the retrieval database for keyword-based data retrieval. Further, the second database stores the plurality of pieces of retrieval reference information and the full data corresponding to them, so the process of determining the retrieval result according to the retrieval reference information, namely the query process, is carried out in the second database. Therefore, on the one hand, dividing data searching into the two processes of retrieval and query narrows the range of data to be searched and improves search efficiency. On the other hand, because the first database stores only a small amount of retrieval reference information and nothing else, its storage cost can be effectively reduced. Furthermore, because the data volume of the first database is small, the time consumed in determining the retrieval reference information there is greatly reduced, which improves retrieval efficiency.
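Steps S201 to S203 can be sketched as three small functions. The list and dictionary standing in for the two databases and the `send` callback standing in for the transport to the user terminal are assumptions for illustration; only the containment form of matching is shown.

```python
def determine_reference_info(keyword, first_db):
    """S201: reference information in the first database that contains the keyword."""
    return [ref for ref in first_db if keyword in ref]


def determine_full_data(references, second_db):
    """S202: full data stored in the second database for each matched reference."""
    return [second_db[ref] for ref in references if ref in second_db]


def handle_search_instruction(keyword, first_db, second_db, send):
    """S201-S203: retrieve, query, then send the result to the user terminal."""
    references = determine_reference_info(keyword, first_db)
    result = determine_full_data(references, second_db)
    send(result)  # S203: `send` stands in for the transport to the terminal
    return result
```

The keyword is only ever compared against the small first database; the second database is accessed by key.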

Fig. 3 exemplarily shows a method flowchart of another data matching method provided by the present disclosure, and as shown in fig. 3, the method includes:

S301, the user terminal responds to the search operation of the user for the keywords and sends a search instruction of the keywords to the server.

The keyword is the search term used during data retrieval. In one embodiment, the keyword may take the form of text, letters, a sentence, speech, or the like. The embodiments of the present disclosure do not limit the representation of the keyword.

The retrieval operation is used for triggering the user terminal to send a search instruction to the server. In one embodiment, the retrieval operation may be an input operation, a selection operation, or a hover operation. Taking the input operation as an example, it may be an input operation in a text input box or a voice input box. Accordingly, S301 may be replaced by: the user terminal, in response to the user's input operation for the keyword, sends a search instruction for the keyword to the server.

The search instruction is used for triggering the server to search data based on the keyword. In one embodiment, the user terminal may generate a search instruction carrying a keyword in response to a search operation of the user for the keyword, so as to send the search instruction of the keyword to the server subsequently.
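A search instruction carrying a keyword could, for example, be serialized as a small JSON message. The disclosure does not specify a wire format; the `type` and `keyword` field names below are hypothetical.

```python
import json


def build_search_instruction(keyword: str) -> str:
    """Terminal side: wrap the keyword in a search instruction (hypothetical JSON layout)."""
    return json.dumps({"type": "search", "keyword": keyword}, ensure_ascii=False)


def parse_search_instruction(message: str) -> str:
    """Server side: extract the keyword carried by the instruction."""
    payload = json.loads(message)
    if payload.get("type") != "search":
        raise ValueError("not a search instruction")
    return payload["keyword"]
```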

S302, the server responds to a search instruction of the user terminal for the keywords, and search reference information matched with the keywords is determined in a first database.

Wherein the first database is used for storing a plurality of retrieval reference information. In one embodiment, the retrieval reference information is information including a keyword or satisfying a similarity condition with the keyword.

In one embodiment, the server determines, in response to a search instruction for a keyword by the user terminal, search reference information including the keyword in the first database as search reference information matching the keyword.

For example, assuming that the keyword is "communication", in response to a search instruction of the user terminal for the keyword "communication", search reference information including the keyword "communication" may be determined as search reference information matching the keyword.

In yet another embodiment, the server determines, in response to a search instruction for a keyword by the user terminal, search reference information satisfying a similarity condition with the keyword in the first database as search reference information matching the keyword.

For example, assuming that the keyword is "communication", in response to a search instruction of the user terminal for the keyword "communication", search reference information satisfying the similarity condition with the keyword "communication" may be determined as search reference information matching the keyword.

In another embodiment, the server determines, in response to a search instruction for a keyword from the user terminal, search reference information including the keyword and search reference information satisfying a similarity condition with the keyword as search reference information matching the keyword in the first database.

For example, assuming that the keyword is "communication", in response to a search instruction of the user terminal for the keyword "communication", search reference information including the keyword "communication" and search reference information satisfying a similarity condition with the keyword "communication" may be determined as search reference information matching the keyword "communication".

In the embodiment of the present disclosure, the similarity condition may include, but is not limited to, the text similarity with the keyword being greater than or equal to a preset similarity. The preset similarity may be, for example, 90% or 95%, which is not limited in the embodiment of the present disclosure.
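The three matching embodiments above (containment, similarity, or both) can be sketched as one function with a mode switch. The disclosure does not name a similarity measure; Python's `difflib.SequenceMatcher` is used here purely as a stand-in, and comparing the keyword against each whitespace-separated word of the reference information is likewise an assumption.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in, not the patent's measure."""
    return SequenceMatcher(None, a, b).ratio()


def contains_keyword(keyword, reference):
    """First embodiment: the reference information includes the keyword."""
    return keyword in reference


def satisfies_similarity(keyword, reference, preset=0.9):
    """Second embodiment: some word of the reference is at least `preset` similar."""
    return any(text_similarity(keyword, word) >= preset for word in reference.split())


def match(keyword, first_db, mode="both", preset=0.9):
    """mode: 'contains', 'similar', or 'both' (the three embodiments above)."""
    matched = []
    for reference in first_db:
        hit_contains = mode in ("contains", "both") and contains_keyword(keyword, reference)
        hit_similar = mode in ("similar", "both") and satisfies_similarity(keyword, reference, preset)
        if hit_contains or hit_similar:
            matched.append(reference)
    return matched
```

With a preset of 0.9, a misspelling such as "comunication" still matches the keyword "communication" in the similarity modes but not in the containment mode.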

The above embodiment takes the case of directly performing keyword matching, that is, directly determining the search reference information matched with the keyword according to the keyword carried in the search instruction as an example, to explain the scheme. In other embodiments, the first database is further configured to correspondingly store a plurality of target keywords and search reference information corresponding to the plurality of target keywords. Accordingly, the server can also determine the retrieval reference information matched with the keyword by determining the target keyword matched with the keyword and then determining the retrieval reference information corresponding to the target keyword.

The target keyword matched with the keyword may be the keyword, or may be a keyword that satisfies a similarity condition with the keyword.

In one embodiment, the keyword matching process may be: the server responds to a search instruction of the user terminal aiming at the keywords, determines a target keyword matched with the keywords in a first database, and acquires search reference information corresponding to the target keyword as the search reference information matched with the keywords. Therefore, the keyword matching efficiency can be further improved.

Further, taking the case that the similarity condition is that the text similarity is greater than or equal to the preset similarity as an example, the keyword matching process may be replaced by: the server responds to a search instruction of the user terminal aiming at the keywords, and determines search reference information corresponding to the target keywords which are the same as the keywords and search reference information corresponding to the target keywords which meet the similarity condition as search reference information matched with the keywords in a first database.
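The target-keyword variant can be sketched as a two-step lookup: find the target keywords that equal or resemble the search keyword, then collect their retrieval reference information. The dictionary contents are hypothetical, and `difflib.SequenceMatcher` again stands in for an unspecified similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical first database: target keyword -> retrieval reference information.
TARGET_KEYWORDS = {
    "communication": [
        "a communication application method",
        "a communication identification method",
    ],
    "identification": ["a communication identification method"],
}


def matching_target_keywords(keyword, preset=0.9):
    """Target keywords equal to the keyword or at least `preset` similar to it."""
    return [
        target for target in TARGET_KEYWORDS
        if target == keyword or SequenceMatcher(None, keyword, target).ratio() >= preset
    ]


def reference_info_for(keyword, preset=0.9):
    """Reference information of every matching target keyword, without duplicates."""
    seen = []
    for target in matching_target_keywords(keyword, preset):
        for reference in TARGET_KEYWORDS[target]:
            if reference not in seen:
                seen.append(reference)
    return seen
```

Only the (short) target keywords are compared against the search keyword; the reference information itself is fetched by key.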

In one embodiment, before implementing the scheme, the server may further determine a plurality of search reference information from the full-volume data, and import the determined plurality of search reference information into the first database, thereby constructing a first database containing the plurality of search reference information. The plurality of search reference information can be obtained by screening by a background person of the server, or the plurality of search reference information can be obtained by automatically screening by the server.

Further, in one embodiment, after the plurality of pieces of retrieval reference information are imported into the first database, the target keywords corresponding to them may be determined, and each piece of retrieval reference information may then be stored in correspondence with its target keywords through the inverted index capability of the first database. One target keyword may correspond to a plurality of pieces of retrieval reference information. In one embodiment, the target keyword may be a keyword included in the retrieval reference information, or a keyword related to the retrieval reference information. For example, if the retrieval reference information is "a communication identification method", its target keyword may be "communication" or "identification", which it includes, or a related keyword such as "discrimination".

For example, assume that the retrieval reference information received by the first database is "a communication application method" and "a communication identification method". Taking "communication" as the target keyword, the inverted index capability of the first database allows "communication" to be stored in correspondence with both "a communication application method" and "a communication identification method"; taking "application" as the target keyword, "application" is stored in correspondence with "a communication application method"; taking "method" as the target keyword, "method" is stored in correspondence with both "a communication application method" and "a communication identification method"; and taking "identification" as the target keyword, "identification" is stored in correspondence with "a communication identification method".
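The correspondence in this example can be reproduced with a small inverted-index builder. Whitespace tokenization and the stop word "a" are assumptions made for the English rendering of the example; a real first database would supply its own tokenization.

```python
from collections import defaultdict


def build_target_keyword_index(reference_infos, stop_words=frozenset({"a"})):
    """Map each target keyword to the reference information that contains it."""
    index = defaultdict(list)
    for reference in reference_infos:
        for word in dict.fromkeys(reference.split()):  # de-duplicate, keep order
            if word not in stop_words:
                index[word].append(reference)
    return dict(index)


INDEX = build_target_keyword_index([
    "a communication application method",
    "a communication identification method",
])
```

Looking up `INDEX["communication"]` or `INDEX["method"]` yields both pieces of reference information, while `INDEX["application"]` and `INDEX["identification"]` each yield one.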

In the above embodiment, the plurality of pieces of retrieval reference information and their corresponding target keywords are stored in advance through the inverted index capability of the first database. Thus, after determining in the first database the target keyword matching the keyword in response to the search instruction of the user terminal, the server can quickly determine the retrieval reference information corresponding to that target keyword, which increases the speed of matching retrieval reference information and improves retrieval efficiency.

S303, the server determines a retrieval information identifier corresponding to the retrieval reference information in the first database.

In the embodiment of the disclosure, the first database is further configured to store the retrieval information identifiers corresponding to the plurality of pieces of retrieval reference information. A retrieval information identifier is a unique identifier for each piece of retrieval reference information. In one embodiment, the retrieval information identifier may be an information name, an information number, an information ID (identifier), or the like, which is not limited by the embodiments of the present disclosure. For example, the information ID may be a UUID (Universally Unique Identifier).

In some embodiments, after the retrieval reference information matching the keyword is determined in the first database through S302, the retrieval information identifier corresponding to it may be determined according to the correspondence between retrieval reference information and retrieval information identifiers stored in the first database.

In one embodiment, before implementing the scheme, the server may further determine, from the full-volume data, a search information identifier corresponding to the plurality of search reference information, import the determined plurality of search reference information and the search information identifier corresponding to the plurality of search reference information into the first database, and thereby construct the first database including the plurality of search reference information and the search information identifier corresponding to the plurality of search reference information.

In the above embodiment, after receiving the search instruction of the user terminal for the keyword, the first database is used to determine the search information identifier corresponding to the search reference information matched with the keyword, and because the first database stores a plurality of search reference information and the search information identifiers corresponding to the search reference information, the search information identifier corresponding to the search reference information can be quickly determined through the corresponding relationship between the search reference information and the search information identifier, thereby improving the efficiency of determining the search information identifier.

The above embodiment describes a process of determining the identification of the retrieval information, taking the retrieval reference information as an example. It should be noted that, on the basis of setting the target keyword in S302, the process of determining the search information identifier in S303 may be: after determining a target keyword matched with the keyword in the first database, the server acquires a retrieval information identifier corresponding to the target keyword as the retrieval information identifier matched with the keyword. For example, the server may determine, according to the correspondence between the target keyword and the search information identifier stored in the first database, the search information identifier that matches the keyword. Therefore, the efficiency of the identification matching of the search information can be further improved.

In one embodiment, after a plurality of search reference information and search information identifiers corresponding to the plurality of search reference information are imported into the first database, target keywords corresponding to the plurality of search reference information may be determined, and then the search information identifiers corresponding to the plurality of search reference information and the target keywords corresponding to the plurality of search reference information are respectively stored in a corresponding manner through the inverted index capability of the first database.

Illustratively, assume that the search reference information received by the first database is "a communication application method" and "a communication identification method", where the search information identifier corresponding to "a communication application method" is "aa1a" and the search information identifier corresponding to "a communication identification method" is "aa2b". Taking "communication" as a target keyword, the inverted index capability of the first database allows "communication" to be stored in correspondence with "aa1a" and "aa2b"; taking "application" as a target keyword, "application" can be stored in correspondence with "aa1a"; taking "method" as a target keyword, "method" can be stored in correspondence with "aa1a" and "aa2b"; taking "identification" as a target keyword, "identification" can be stored in correspondence with "aa2b".
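The correspondence in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the disclosure does not say how target keywords are extracted, so the whitespace tokenizer, the STOP_WORDS set, and the function names below are hypothetical, and an in-memory dictionary stands in for the first database.

```python
from collections import defaultdict

# Hypothetical stop words; the source does not specify keyword extraction.
STOP_WORDS = {"a", "an", "the"}


def extract_target_keywords(reference_info: str) -> list[str]:
    """Split on whitespace and drop stop words (an assumed, simplistic tokenizer)."""
    return [w for w in reference_info.lower().split() if w not in STOP_WORDS]


def build_inverted_index(records: dict[str, str]) -> dict[str, list[str]]:
    """Map each target keyword to the identifiers of the reference info containing it."""
    index: dict[str, list[str]] = defaultdict(list)
    for info_id, reference_info in records.items():
        # dict.fromkeys removes duplicate keywords while keeping their order.
        for keyword in dict.fromkeys(extract_target_keywords(reference_info)):
            index[keyword].append(info_id)
    return dict(index)


# Identifier -> retrieval reference information, as in the example above.
records = {
    "aa1a": "a communication application method",
    "aa2b": "a communication identification method",
}
index = build_inverted_index(records)
print(index["communication"])  # ['aa1a', 'aa2b']
print(index["application"])    # ['aa1a']
```

Looking up a target keyword then returns the identifier list directly, without scanning the reference information itself.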

In the above embodiment, the inverted index capability of the first database may be used to store, in the first database, the correspondence among the target keywords, the search reference information, and the search information identifiers. That is, the first database can be used directly as the search database, so no additional search database, and no storage medium for holding one, is required, which saves the resource cost and monetary cost of such a storage medium. In addition, in the embodiment of the present disclosure, the first database stores only a small amount of data: either only the search reference information (one field), or only the search reference information and the search information identifiers (two fields), which can reduce the data size by more than 90% compared with the related art. Because the data size of the first database is small, the storage cost of the database can be reduced, and the time consumed in determining the search reference information or the search information identifier through the first database can be greatly reduced, thereby improving the efficiency of data search.

S304, the server determines the full data corresponding to the retrieval information identification in the second database according to the retrieval information identification, and the full data is used as a retrieval result of the key words.

The second database is used for storing a plurality of pieces of retrieval reference information and full data corresponding to the retrieval reference information. In one embodiment, the second database is further configured to store search information identifiers corresponding to the plurality of search reference information.

Illustratively, in one embodiment, assume that the search reference information is a title and that the search information identifier corresponding to the search reference information is a uuid. Fig. 4 exemplarily shows the data structure of the data stored in the second database. As shown in Fig. 4, in addition to each title and the uuid corresponding to each title, the second database stores the full data corresponding to each title, for example, the url (Uniform Resource Locator), type, time, category, and author corresponding to each title.

In one embodiment, after determining the search information identifier corresponding to the search reference information matched with the keyword, an information identifier list may be generated according to the search information identifier corresponding to the search reference information, where the information identifier list may include each search information identifier corresponding to the search reference information. After the information identification list is generated in the above manner, the full-volume data corresponding to each search information identification contained in the information identification list can be determined from the second database according to the search information identification contained in the information identification list, and the full-volume data is used as a search result of the keyword.
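The query step described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: an in-memory dictionary keyed by uuid stands in for the second database, the field names follow the fields mentioned for Fig. 4, and all sample values and the function name fetch_full_data are hypothetical.

```python
# Hypothetical stand-in for the second database: uuid -> full data record.
# The sample values are invented for illustration only.
second_db = {
    "aa1a": {
        "title": "a communication application method",
        "url": "https://example.com/doc/aa1a",
        "type": "document",
        "author": "author-1",
    },
    "aa2b": {
        "title": "a communication identification method",
        "url": "https://example.com/doc/aa2b",
        "type": "document",
        "author": "author-2",
    },
}


def fetch_full_data(id_list: list[str], db: dict[str, dict]) -> list[dict]:
    """Return the full data for every identifier in the list, in list order.

    Identifiers missing from the database are skipped (an assumed policy).
    """
    return [db[info_id] for info_id in id_list if info_id in db]


# Information identifier list produced by the retrieval step.
id_list = ["aa1a", "aa2b", "zz9z"]
results = fetch_full_data(id_list, second_db)
print([r["title"] for r in results])
```

Because the lookup is keyed by the unique identifier, each identifier maps to at most one full data record, which is what allows the query step to be exact.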

In the data matching method, the search reference information matched with the keyword is firstly determined in the first database, then the search information identification corresponding to the search reference information is determined from the first database, and finally the search result of the keyword is determined from the second database according to the search information identification, wherein the search information identification can uniquely identify the search reference information, and further the corresponding search result can be accurately and efficiently searched from the second database, so that the accuracy of the search result is ensured, and the search efficiency is improved.

S305, the server performs data analysis on the search results of the keywords to obtain data analysis results of the search results.

In one embodiment, after obtaining the search result of the keyword through S304, the reading amount of the search result may be determined, and the reading amount may be used as a data analysis result of the search result; or, determining a first similarity between the search result and the keyword, and taking the first similarity as a data analysis result of the search result; or, acquiring the history reading record of the user terminal, and determining the second similarity between the search result and the history reading record to be used as a data analysis result of the search result.

The first similarity can be used for representing text similarity between the search result and the keywords, and the second similarity can be used for representing text similarity between the search result and the historical reading record of the user terminal.
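The disclosure does not specify how text similarity is computed. As one possible sketch, the example below uses Jaccard similarity over whitespace-separated tokens for the first similarity, and takes the maximum similarity against any history record as the second similarity. Both choices, and all function names, are assumptions made for illustration.

```python
def tokenize(text: str) -> set[str]:
    """Lower-case and split on whitespace (an assumed, simplistic tokenizer)."""
    return set(text.lower().split())


def text_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets; 0.0 when both are empty."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def second_similarity(result_text: str, history: list[str]) -> float:
    """Highest similarity between the result and any history reading record."""
    return max((text_similarity(result_text, h) for h in history), default=0.0)


first = text_similarity("communication", "a communication application method")
print(first)  # 0.25

history = ["a communication identification method"]
second = second_similarity("a communication application method", history)
print(second)  # 0.6
```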

For example, assuming that the search result is advertisement data, after receiving the search result of the keyword, the server may perform aggregate calculation on the search result (advertisement data) to obtain a business word frequency popularity corresponding to the keyword and a business hotword analysis result (for example, UV (Unique Visitor), PV (Page View), click volume, etc.), and use the business word frequency popularity corresponding to the keyword and the business hotword analysis result as data analysis results of the search result.
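As a rough illustration of such an aggregate calculation, the sketch below counts page views, unique visitors, and clicks over a list of event records. The record layout (visitor_id, clicked) is hypothetical; the disclosure does not describe the schema of the advertisement data.

```python
def aggregate_traffic(events: list[dict]) -> dict[str, int]:
    """Aggregate hypothetical view events into PV, UV, and click volume."""
    return {
        "pv": len(events),                            # every view counts once
        "uv": len({e["visitor_id"] for e in events}),  # distinct visitors
        "clicks": sum(1 for e in events if e["clicked"]),
    }


# Invented sample events for one keyword's search results.
events = [
    {"visitor_id": "u1", "clicked": True},
    {"visitor_id": "u1", "clicked": False},
    {"visitor_id": "u2", "clicked": True},
]
stats = aggregate_traffic(events)
print(stats)  # {'pv': 3, 'uv': 2, 'clicks': 2}
```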

S306, the server sends the search result and the data analysis result of the search result to the user terminal.

S307, the user terminal displays the search result according to the data analysis result.

In one embodiment, assuming that the data analysis result of the search result is the reading amount of the search result, the user terminal may, after receiving the search result and the data analysis result of the search result, display the search results in descending order of reading amount.

In another embodiment, assuming that the data analysis result of the search result is the first similarity, the user terminal may, after receiving the search result and the data analysis result of the search result, display the search results in descending order of the first similarity.

In another embodiment, assuming that the data analysis result of the search result is the second similarity, the user terminal may, after receiving the search result and the data analysis result of the search result, display the search results in descending order of the second similarity.

In another embodiment, assuming that the data analysis result of the search result includes both the reading amount and the first similarity of the search result, the user terminal may, after receiving the search result and the data analysis result of the search result, display the search results in descending order of reading amount, and when two or more search results have the same reading amount, display those search results in descending order of the first similarity.
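The combined ordering (reading amount first, first similarity as the tie-breaker) can be sketched with a composite sort key. The record fields and sample values below are hypothetical.

```python
# Invented sample results with their data analysis results attached.
results = [
    {"title": "A", "reading_amount": 120, "first_similarity": 0.40},
    {"title": "B", "reading_amount": 300, "first_similarity": 0.25},
    {"title": "C", "reading_amount": 120, "first_similarity": 0.75},
]

# Negating both values sorts each in descending order; the second element
# of the key is consulted only when reading amounts are equal.
ordered = sorted(
    results,
    key=lambda r: (-r["reading_amount"], -r["first_similarity"]),
)
print([r["title"] for r in ordered])  # ['B', 'C', 'A']
```

Here "A" and "C" share a reading amount of 120, so "C" is displayed first because its first similarity is higher.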

By the method, the user terminal can display the search results based on the data analysis results of the search results, and can help the user to preferentially see the search results with large reading quantity, or preferentially see the search results with large first similarity, or preferentially see the search results with large second similarity, so that the user is helped to timely see the search results required by the user.

In embodiments of the present disclosure, the server may include an MPP calculation engine and a UDF, which in some embodiments may be a UDTF in particular. Wherein the UDTF may be built into the MPP calculation engine. Specifically, the following describes specific implementation steps of the MPP calculation engine and the UDTF in the data matching method provided in the present disclosure, taking the first database as an external memory and the second database as a distributed file system as an example.

Fig. 5 exemplarily shows a method flowchart of another data matching method provided by the present disclosure, and as shown in fig. 5, the method includes:

S501, the MPP calculation engine receives a search instruction of the user terminal for the keywords.

S502, the MPP calculation engine sends the keywords to the UDTF.

In the embodiment of the present disclosure, before S502 is executed, a background person may embed the UDTF by using the UDF embedding capability of the MPP calculation engine. Specifically, the code of the UDTF may first be uploaded to the distributed file system, and the UDTF is then embedded by a function command. Illustratively, the function command may be: create function storage_udf as 'com.baidu.udf.storageUdf' using jar 'hdfs://xxx/xxx/storage_udf.jar';

In the embodiment of the disclosure, when the external memory is a search database, the UDTF can act as a bridge between the MPP calculation engine and the external memory, so that the MPP calculation engine has the capability of searching the external data.

S503, the UDTF determines a search information identifier corresponding to the search reference information matched with the keyword from the external memory, and generates an information identifier list.

The specific implementation manner of determining the retrieval reference information matching the keyword from the external memory by the UDTF may refer to the implementation manner of S302, which is not described herein.

In the embodiment of the disclosure, since the UDTF supports single-line input and multi-line output, when one keyword is input to the UDTF, a plurality of pieces of information related to the keyword, that is, the retrieval reference information or the retrieval information identifier related to the embodiment of the disclosure, can be output, so that the purpose of inputting one keyword and then outputting richer information can be achieved. Therefore, the information quantity of the keywords can be increased, further, the subsequent data query process is realized according to the retrieval reference information or the retrieval information identification with rich information quantity, and the data retrieval efficiency is improved.
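The single-line input, multi-line output behavior can be illustrated with a Python generator. This is only a conceptual sketch: a real UDTF would be written against the interface of the particular MPP calculation engine, and the function name storage_udtf and the in-memory index standing in for the external memory are hypothetical.

```python
from typing import Iterator

# Hypothetical stand-in for the external memory: target keyword -> identifiers.
index = {
    "communication": ["aa1a", "aa2b"],
    "application": ["aa1a"],
    "method": ["aa1a", "aa2b"],
    "identification": ["aa2b"],
}


def storage_udtf(keyword: str, idx: dict[str, list[str]]) -> Iterator[tuple[str]]:
    """Take one input row (the keyword) and emit one output row per identifier."""
    for info_id in idx.get(keyword.strip().lower(), []):
        yield (info_id,)


# One keyword in, several rows out; the rows form the information identifier list.
id_list = [row[0] for row in storage_udtf("communication", index)]
print(id_list)  # ['aa1a', 'aa2b']
```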

S504, the UDTF sends the information identification list to the MPP calculation engine.

S505, the MPP calculation engine determines full data corresponding to the retrieval information identification from the distributed file system according to the retrieval information identification contained in the information identification list, and the full data is used as a retrieval result of the key words.

S506, the MPP calculation engine performs data analysis on the search results of the keywords to obtain data analysis results of the search results.

In the embodiment of the disclosure, the MPP calculation engine may obtain the search result of the keyword directly from the distributed file system according to the search information identifier and perform data analysis on it directly. There is no need to first obtain the search result from the external memory and then transfer it to the distributed file system before the MPP calculation engine can analyze it. By contrast, in the related art, when a search result requires further data analysis, the search result must first be synchronized to the distributed file system and only then analyzed by the MPP calculation engine, which incurs additional data processing cost. The data matching process in the embodiment of the disclosure takes place entirely within the MPP calculation engine, so data matching and data analysis can be connected seamlessly, and no additional data transmission cost is incurred.

In the data matching method, the external memory is a search database, the distributed file system is a query database, and the external memory (i.e. the search database) is only used for storing a plurality of search reference information and search information identifiers corresponding to each search reference information, so that the data quantity stored in the search database can be greatly reduced, and the storage cost of the search database is reduced; when the search reference information matched with the keyword is determined in response to the search instruction of the user terminal for the keyword, the search speed can be effectively improved, and the time consumed by search is reduced.

In an alternative embodiment, the second database may also be updated in response to an update instruction for any information in the second database.

In an alternative embodiment, the updating of the search reference information or the search information identification in the first database may also occur in response to the updating of the search reference information or the search information identification in the second database.

In one embodiment, the server updates the second database in response to an update instruction issued by a background person for any information in the second database. Further, after the update is completed, the original retrieval reference information and retrieval information identifier of that information may be compared (DIFF) with the retrieval reference information and retrieval information identifier after the update. If neither the retrieval reference information nor the retrieval information identifier corresponding to the information has changed, the update ends; if either has changed, the corresponding retrieval reference information or retrieval information identifier in the first database is updated.
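The compare-then-propagate logic can be sketched as follows. For brevity the sketch assumes that the retrieval information identifier stays fixed and only the retrieval reference information (here, a title field) may change. Dictionaries stand in for both databases, and the function name update_record is hypothetical.

```python
def update_record(
    first_db: dict[str, str],
    second_db: dict[str, dict],
    info_id: str,
    new_record: dict,
) -> bool:
    """Update the second database, then sync the first database only if needed.

    Returns True when the first database was updated, False otherwise.
    """
    old_title = second_db.get(info_id, {}).get("title")
    second_db[info_id] = new_record          # the second database is always updated

    if new_record.get("title") == old_title:  # DIFF: reference info unchanged
        return False
    first_db[info_id] = new_record["title"]  # propagate the changed reference info
    return True


first_db = {"aa1a": "a communication application method"}
second_db = {"aa1a": {"title": "a communication application method", "author": "x"}}

# Changing only the author leaves the first database untouched.
changed_author = update_record(
    first_db, second_db, "aa1a",
    {"title": "a communication application method", "author": "y"},
)
# Changing the title is propagated to the first database.
changed_title = update_record(
    first_db, second_db, "aa1a",
    {"title": "a communication control method", "author": "y"},
)
print(changed_author, changed_title, first_db["aa1a"])
```

Skipping the write to the first database when the comparison finds no difference keeps the smaller retrieval database from being rewritten on every unrelated field change.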

By the method, the information in the second database can be updated in time in response to the update instruction of any information in the second database by a background person, and the retrieval reference information or the retrieval information identification in the first database can be updated in response to the update of the retrieval reference information or the retrieval information identification in the second database when the information in the second database is updated, so that the information in the first database and the information in the second database are kept consistent, and the accuracy of a retrieval result is further ensured.

The above-described method embodiments may be performed by a data matching device that may be used to implement functional modules or units of the methods described in embodiments of the present disclosure. As shown in fig. 6, the present disclosure provides a data matching apparatus 600, comprising:

a first determining module 601, configured to determine, in response to a search instruction of a user terminal for a keyword, search reference information matching the keyword in a first database; the retrieval reference information is information comprising keywords or meeting similarity conditions with the keywords; the first database is used for storing a plurality of retrieval reference information;

a second determining module 602, configured to determine, according to the search reference information, full-scale data corresponding to the search reference information in a second database, as a search result of the keyword; the second database is used for storing a plurality of pieces of retrieval reference information and full data corresponding to the retrieval reference information;

a sending module 603, configured to send the search result of the keyword to the user terminal for display by the user terminal.

With this device, after a search instruction of the user terminal for a keyword is received, the first determining module determines the retrieval reference information matching the keyword through the first database. Because the first database stores a plurality of retrieval reference information, this step of determining the retrieval reference information from the keyword is the retrieval process, and keyword-based data retrieval can be carried out using the first database as the retrieval database. The second determining module then determines, from the second database, the full data corresponding to the retrieval reference information. Because the second database stores a plurality of retrieval reference information and the corresponding full data, this step of determining the search result from the retrieval reference information is the query process. On the one hand, dividing the data search into a retrieval process and a query process narrows the range of data to be searched and improves data search efficiency. On the other hand, the first database stores only a small amount of retrieval reference information and no other information, so the storage cost of the first database can be effectively reduced; and because the data amount of the first database is small, the time consumed in determining the retrieval reference information through the first database can be greatly reduced, which further improves data search efficiency.

Optionally, the first database is further configured to store target keywords corresponding to the plurality of search reference information;

the first determining module 601 is configured to:

determining target keywords matched with the keywords in the first database; the target keyword matched with the keyword is identical to the keyword, or the target keyword matched with the keyword and the keyword meet a similarity condition;

and acquiring retrieval reference information corresponding to the target keyword as retrieval reference information matched with the keyword.

Optionally, the first determining module 601 is configured to:

in response to a search instruction of the user terminal for the keyword, sending the keyword to a user-defined function (UDF) through a massively parallel processing (MPP) calculation engine;

and determining retrieval reference information matched with the keyword in the first database through the UDF.

Optionally, the first database is further configured to store search information identifiers corresponding to the plurality of search reference information; the second database is further used for storing retrieval information identifiers corresponding to the plurality of retrieval reference information;

The second determining module 602 includes:

the first processing module is used for determining a retrieval information identifier corresponding to the retrieval reference information in the first database;

and the second processing module is used for determining the full data corresponding to the retrieval information identification in the second database according to the retrieval information identification, and taking the full data as a retrieval result of the keyword.

Optionally, the first processing module is configured to:

determining a retrieval information identifier corresponding to the retrieval reference information in the first database through the UDF;

the second processing module is configured to:

sending the retrieval information identification to the MPP calculation engine through the UDF;

and determining full data corresponding to the retrieval information identification in the second database through the MPP calculation engine based on the retrieval information identification.

Optionally, the data matching apparatus 600 further includes an updating module, where the updating module is configured to:

and updating the second database in response to an update instruction for any information in the second database.

Optionally, the updating module is further configured to:

in response to an update occurring to the retrieved reference information or the retrieved information identity in the second database, the retrieved reference information or the retrieved information identity in the first database is updated.

Optionally, the data matching device 600 further includes an analysis module, where the analysis module is configured to:

carrying out data analysis on the search results of the keywords to obtain data analysis results of the search results;

the sending module 603 is configured to:

and sending the search result and the data analysis result of the search result to the user terminal.

Optionally, the analysis module is configured to:

determining the reading amount of the search result as a data analysis result of the search result; or,

determining a first similarity between the search result and the keyword, and taking the first similarity as a data analysis result of the search result; or,

and acquiring a history reading record of the user terminal, and determining a second similarity between the search result and the history reading record to be used as a data analysis result of the search result.

Optionally, the first database is an external memory; the second database is a distributed file system.

Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access Memory (Random Access Memory, RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a central processing unit, a graphics processing unit (Graphics Processing Unit, GPU), various dedicated artificial intelligence (Artificial Intelligence, AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (Digital Signal Processor, DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as a data matching method. For example, in one embodiment, the data matching method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In one embodiment, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When a computer program is loaded into RAM 703 and executed by computing unit 701, one or more steps of the data matching method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the data matching method by any other suitable means (e.g. by means of firmware).

Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (Field Programmable Gate Array, FPGAs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), application specific standard products (Application Specific Standard Product, ASSPs), systems on chip (System On Chip, SOC), complex programmable logic devices (Complex Programmable Logic Device, CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM, or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device for displaying information to a user, for example, a Cathode Ray Tube (CRT) or a liquid crystal display (Liquid Crystal Display, LCD) monitor; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
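As a non-limiting illustration only, the two-stage lookup described in this disclosure (matching a keyword against retrieval reference information in a first database, then fetching the corresponding full data from a second database by identifier) might be sketched as follows. The sketch assumes that substring containment stands in for the matching condition, that an in-memory list stands in for the first database, and that a dictionary stands in for the second database; the names `IndexEntry`, `find_references`, and `retrieve`, as well as the sample records, are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One row of the hypothetical first database (a lightweight index)."""
    record_id: str   # retrieval information identifier (assumed to be a string)
    reference: str   # retrieval reference information, e.g. a title


def find_references(index, keyword):
    """Return index entries whose reference text contains the keyword.

    Case-insensitive containment is an assumption standing in for the
    'includes the keyword or satisfies a similarity condition' test.
    """
    needle = keyword.lower()
    return [entry for entry in index if needle in entry.reference.lower()]


def retrieve(index, full_store, keyword):
    """Two-stage lookup: match in the index, then fetch full data by identifier."""
    matches = find_references(index, keyword)
    # Skip identifiers that have no full data in the second store.
    return [full_store[e.record_id] for e in matches if e.record_id in full_store]


# Hypothetical sample data: the index holds only short reference text,
# while the second store holds the complete records.
index = [
    IndexEntry("d1", "Data matching method"),
    IndexEntry("d2", "Image compression device"),
]
full_store = {
    "d1": {"title": "Data matching method", "body": "full text of d1"},
    "d2": {"title": "Image compression device", "body": "full text of d2"},
}

results = retrieve(index, full_store, "matching")
```

In this sketch only the short reference text is scanned during matching, and the larger records are touched only for entries that matched, which mirrors the intent of keeping the first database small.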

Claims

1. A data matching method, comprising:

in response to a search instruction from a user terminal for a keyword, determining retrieval reference information matching the keyword in a first database; wherein the retrieval reference information is information that includes the keyword or satisfies a similarity condition with the keyword, and the first database is configured to store a plurality of pieces of retrieval reference information;

determining full data corresponding to the retrieval reference information in a second database according to the retrieval reference information, and taking the full data as a retrieval result of the keyword; the second database is used for storing a plurality of pieces of retrieval reference information and full data corresponding to the retrieval reference information;

2. The method of claim 1, wherein the first database is further configured to store target keywords corresponding to the plurality of pieces of retrieval reference information;

the determining, in the first database, the retrieval reference information matched with the keyword includes:

3. The method of claim 1, wherein the determining, in response to the search instruction from the user terminal for the keyword, of the retrieval reference information matching the keyword in the first database includes:

4. The method of claim 1, wherein the first database is further configured to store retrieval information identifiers corresponding to the plurality of pieces of retrieval reference information; and the second database is further configured to store the retrieval information identifiers corresponding to the plurality of pieces of retrieval reference information;

the determining, according to the retrieval reference information, of the full data corresponding to the retrieval reference information in the second database as the retrieval result of the keyword includes:

determining a retrieval information identifier corresponding to the retrieval reference information in the first database;

and determining, according to the retrieval information identifier, full data corresponding to the retrieval information identifier in the second database, and taking the full data as the retrieval result of the keyword.

5. The method of claim 4, wherein the determining, in the first database, of the retrieval information identifier corresponding to the retrieval reference information includes:

the determining, according to the retrieval information identifier, of the full data corresponding to the retrieval information identifier in the second database as the retrieval result of the keyword includes:

6. The method of claim 1, further comprising:

7. The method of claim 6, further comprising:

and updating the retrieval reference information or the retrieval information identifier in the first database in response to the retrieval reference information or the retrieval information identifier in the second database being updated.

8. The method of claim 1, further comprising:

carrying out data analysis on the search result of the keyword to obtain a data analysis result of the search result;

the step of sending the search result of the keyword to the user terminal includes:

9. The method of claim 8, wherein the performing data analysis on the search result of the keyword to obtain a data analysis result of the search result includes:

determining a first similarity between the search result and the keyword as a data analysis result of the search result; or,

and acquiring a history reading record of the user terminal, and determining a second similarity between the search result and the history reading record as a data analysis result of the search result.

10. The method of any of claims 1-9, wherein the first database is an external memory; the second database is a distributed file system.

11. A data matching apparatus comprising:

a first determining module configured to, in response to a search instruction from a user terminal for a keyword, determine retrieval reference information matching the keyword in a first database; wherein the retrieval reference information is information that includes the keyword or satisfies a similarity condition with the keyword, and the first database is configured to store a plurality of pieces of retrieval reference information;

a second determining module configured to determine, according to the retrieval reference information, full data corresponding to the retrieval reference information in a second database, and to take the full data as a retrieval result of the keyword; wherein the second database is configured to store a plurality of pieces of retrieval reference information and full data corresponding to the retrieval reference information;

12. The apparatus of claim 11, wherein the first database is further configured to store target keywords corresponding to the plurality of pieces of retrieval reference information;

the first determining module is configured to:

13. The apparatus of claim 11, wherein the first determining module is configured to:

14. The apparatus of claim 11, wherein the first database is further configured to store retrieval information identifiers corresponding to the plurality of pieces of retrieval reference information; and the second database is further configured to store the retrieval information identifiers corresponding to the plurality of pieces of retrieval reference information;

the second determining module includes:

15. The apparatus of claim 14, wherein the first processing module is configured to:

the second processing module is configured to:

16. The apparatus of claim 11, further comprising:

an update module configured to update the second database in response to an update instruction for any information in the second database.

17. The apparatus of claim 16, wherein the update module is further configured to:

18. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.

19. A non-transitory computer readable storage medium storing computer instructions for causing an electronic device to perform the method of any one of claims 1-10.

20. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-10.