CN107145525B

CN107145525B - Data processing method for confirming search scene, search method and corresponding device

Info

Publication number: CN107145525B
Application number: CN201710243857.XA
Authority: CN
Inventors: 吴霄; 梁东; 苟秋媛; 张潇
Original assignee: Beijing Xingxuan Technology Co Ltd
Current assignee: Beijing Xingxuan Technology Co Ltd
Priority date: 2017-04-14
Filing date: 2017-04-14
Publication date: 2020-10-16
Anticipated expiration: 2037-04-14
Also published as: CN107145525A

Abstract

The embodiment of the invention provides a data processing method, a searching method and a corresponding device for confirming a searching scene, and relates to the field of data processing and searching. The data processing method comprises the following steps: establishing an initial data mapping between the first data set and the second data set; adjusting the initial data mapping according to a supervised data set to obtain an actual data mapping between the first data set and the second data set; determining a search scenario to which second data in the second data set is mapped based on first data in the first data set to which the second data in the second data set is actually mapped. By adopting the method and the device, the data mapping relation can be effectively optimized, the mapping accuracy is improved, and the accuracy of the subsequent determination of the search scene is further improved; the matching efficiency is improved, the breadth of the matching scene is effectively improved, and the accuracy of the search result is improved.

Description

Data processing method for confirming search scene, search method and corresponding device

Technical Field

The embodiment of the invention relates to the field of data processing and searching, in particular to a data processing method and a searching method for confirming a searching scene and a corresponding device.

Background

The O2O e-commerce platform has rapidly emerged in the internet field in recent years, with the takeaway field dominated by catering being the most rapidly developing. The user finishes consumption by searching and selecting food on application software, and one core function necessarily involved in the process is searching.

Unlike traditional general-purpose text search engines such as Baidu and Google, search engines for catering e-commerce need to carry out search tasks using specific search scenes and specialized data sources. For example, for a search for "deep-fried dough sticks", the corresponding scenes should be breakfast, northern, and so on. In brief, the search scene is the information behind the user's search behavior. For example, for a search for crayfish, the corresponding search scene is information such as summer, night, multi-person party and seafood; by associating this scene data, the results the user expects can be produced more accurately.

At present, search scene recognition technology based on catering domain knowledge is still at an exploratory stage in China. In industry, because search in the catering field started late and has developed rapidly, technical upgrades of search scene recognition cannot keep pace with growing demand; in academia, research in this field has largely stalled because large-scale, high-value search data is difficult to obtain. Meanwhile, the huge market demand puts great pressure on search in the catering field. Therefore, accurate and specialized search scene recognition has become the core optimization direction of search engine technology in this field.

In the prior art, scene recognition for vertical e-commerce search in the catering field is mainly done by manual labeling. This approach has drawbacks such as high labor cost and highly subjective labeling standards that cannot be objectively unified. Even where the prior art supports an automatic mode, accurate and specialized recognition of the search scene is difficult to guarantee.

Disclosure of Invention

In order to overcome the defects in the prior art, embodiments of the present invention provide a data processing method, a search method and a corresponding device for confirming a search scene, which can automatically and precisely implement mapping of the search scene, improve the recognition accuracy of the search scene, and improve the accuracy of the search result.

In a first aspect, an embodiment of the present invention provides a data processing method for confirming a search scenario, including:

establishing an initial data mapping between a first data set and a second data set, the first data set comprising a plurality of items of first data and the second data set comprising a plurality of items of second data;

adjusting the initial data mapping according to a supervised data set to obtain an actual data mapping between the first data set and the second data set;

and determining a search scene corresponding to second data in the second data set based on the first data in the first data set to which the second data in the second data set is actually mapped.

In an implementation manner of the embodiment of the present invention, the first data set is a scene feature library in the catering field, and the second data set includes dish data and merchant data.

In one implementation of the embodiment of the present invention, the method further includes: and processing the first data source according to the time dimension and the geographic dimension to obtain the first data set. Alternatively, the method further comprises: and performing word segmentation analysis, word frequency analysis, word stem extraction and semantic analysis on the supervision data source to obtain the supervision data set.

In one implementation of the embodiment of the invention, the supervision data in the supervision data set comprises, in addition to the phrase name, a weight and/or a penalty factor.

Further, the adjusting the initial data mapping according to a supervised data set includes:

determining the matched supervision data and first data by adopting text matching processing;

for each item of second data, modifying a mapping relationship between the second data and the first data to which it is initially mapped, based on a weight of the supervisory data that matches the first data to which the second data is initially mapped, and/or,

for each item of second data, the weight of the first data to which the second data is initially mapped is adjusted based on a penalty factor for the supervisory data that matches the first data to which the second data is initially mapped.

In an implementation manner of the embodiment of the present invention, the determining, based on first data in the first data set to which second data in the second data set is actually mapped, a search scenario corresponding to the second data in the second data set includes: for each item of second data, at least part of first data or a combination of the at least part of first data is selected from first data actually mapped to the second data as the search scene.

In a second aspect, an embodiment of the present invention provides a search scene identification method, where the method includes:

cutting words of the search terms to obtain search words;

determining matching data in the second data set, which are matched with the search terms, through matching processing;

determining a search scene corresponding to the search term according to the search scene mapped by the matching data;

wherein the search scene mapped by the second data set is determined by the data processing method described above.
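As a minimal sketch of the second aspect, the three steps can be expressed as follows. It assumes that words are cut on whitespace and that matching is an exact lookup; the function name `recognize_scene` and the sample data are hypothetical and are not part of the embodiment.

```python
def recognize_scene(search_term, scene_by_item):
    """Return the search scenes for a search term.

    scene_by_item is a hypothetical dict that maps each item of second data
    (a dish or merchant name) to the list of scenes it is mapped to.
    """
    # Step 1: cut the search term into search words (whitespace split is an assumption).
    search_words = search_term.lower().split()
    # Step 2: matching processing, here an exact lookup in the second data set.
    matches = [word for word in search_words if word in scene_by_item]
    # Step 3: collect the scenes mapped by the matching data, without duplicates.
    scenes = []
    for item in matches:
        for scene in scene_by_item[item]:
            if scene not in scenes:
                scenes.append(scene)
    return scenes


# Hypothetical sample mapping, used only for illustration.
scene_by_item = {"crayfish": ["summer", "night"], "beer": ["night", "party"]}
print(recognize_scene("Crayfish beer delivery", scene_by_item))
```

A real system would replace the whitespace split with a proper word-cutting tool and could use fuzzy rather than exact matching.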

In a third aspect, an embodiment of the present invention further provides a search method, including:

determining a search scene corresponding to a search term according to the search term, a second data set and a search scene mapped by the second data set, wherein the scene mapped by the second data set is determined by the data mapping method (an output result of the step is to identify the search scene, which can be specifically realized by the second aspect);

loading a data file corresponding to the search scene, wherein the data file is configured with an optimization strategy of recall data;

and optimizing and sequencing the recalled data according to the data file.
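The loading and re-ranking steps of the third aspect might look like the sketch below. The format of the data file is not specified in this text, so the JSON layout with a `boost` table, the in-memory `SCENE_FILES` store and the function name `optimize_recall` are all assumptions made for illustration.

```python
import json

# Hypothetical per-scene data files, kept in memory as JSON text for this sketch.
SCENE_FILES = {
    "breakfast": '{"boost": {"soy milk": 2.0, "porridge": 1.5}}',
}


def optimize_recall(recalled, scene):
    """Re-rank recalled items using the strategy configured for the scene."""
    # Load the data file for the scene; fall back to an empty strategy.
    strategy = json.loads(SCENE_FILES.get(scene, '{"boost": {}}'))
    boost = strategy["boost"]
    # Multiply each base score by its boost (1.0 when the item is not listed).
    return sorted(
        recalled,
        key=lambda item: item["score"] * boost.get(item["name"], 1.0),
        reverse=True,
    )


recalled = [{"name": "pizza", "score": 1.0}, {"name": "soy milk", "score": 0.6}]
print([item["name"] for item in optimize_recall(recalled, "breakfast")])
```

Keeping the strategy in a per-scene file means the ranking behavior can be changed by editing configuration rather than code.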

In a fourth aspect, an embodiment of the present invention provides a data processing apparatus for confirming a search scenario, including:

the data mapping establishing module is used for establishing data mapping between a first data set and a second data set, wherein the first data set comprises a plurality of items of first data, and the second data set comprises a plurality of items of second data;

the data mapping adjusting module is used for adjusting the initial data mapping according to a supervision data set to obtain an actual data mapping between the first data set and the second data set;

and the search scene mapping module is used for determining a search scene corresponding to second data in the second data set based on first data in the first data set to which the second data in the second data set is actually mapped.

In one implementation manner of the embodiment of the present invention, the apparatus further includes: and the first data processing module is used for processing the first data source according to the time dimension and the geographic dimension to obtain the first data set. Alternatively, the apparatus further comprises: and the supervision data processing module is used for carrying out word segmentation analysis, word frequency analysis, word stem extraction and semantic analysis on a supervision data source to obtain the supervision data set.

Further, the data mapping adjustment module comprises: the matching sub-module is used for determining the mutually matched supervision data and first data by adopting text matching processing; the first adjusting submodule is used for modifying the mapping relation between the second data and the first data to which the second data is initially mapped based on the weight of the supervision data matched with the first data to which the second data is initially mapped aiming at each item of second data, and/or the second adjusting submodule is used for adjusting the weight of the first data to which the second data is initially mapped based on the penalty factor of the supervision data matched with the first data to which the second data is initially mapped aiming at each item of second data.

In an implementation manner of the embodiment of the present invention, the search scene mapping module is specifically configured to: for each item of second data, at least part of first data or a combination of the at least part of first data is selected from first data actually mapped to the second data as the search scene.

In a fifth aspect, an embodiment of the present invention provides a search scene recognition apparatus, including:

the word cutting module is used for cutting words of the search terms to obtain search words;

the matching module is used for determining matching data matched with the search terms in the second data set through matching processing;

the determining module is used for determining a search scene corresponding to the search item according to the search scene mapped by the matching data;

wherein the scene mapped by the second data set is determined by the data mapping method.

In a sixth aspect, an embodiment of the present invention provides a search apparatus, including:

a scene determining module, configured to determine a search scene corresponding to a search term according to the search term, a second data set, and a search scene mapped by the second data set, where the scene mapped by the second data set is determined by using the data mapping method (an output result of the scene determining module is to identify the search scene, which may be specifically implemented by the search scene identifying device);

the loading module is used for loading a data file corresponding to the search scene, and the data file is configured with an optimization strategy of recall data;

and the optimization module is used for optimizing and sequencing the recall data according to the loaded data file.

The functions of the search scene recognition device and the search device can be realized by hardware, and can also be realized by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.

In a possible design, the above search scene recognition apparatus or the search apparatus includes a processor and a memory, the memory is used for storing a program for supporting the relevant apparatus to execute the corresponding processing, and the processor is configured to execute the program stored in the memory. The associated apparatus may also include a communication interface for the apparatus to communicate with other devices or communication networks.

In a seventh aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the search scene recognition apparatus and/or the search apparatus, which contains a program for executing the corresponding method described above to enable the search scene recognition apparatus and/or the search apparatus to implement corresponding data processing.

According to the embodiment of the invention, the data mapping relation can be effectively optimized, the mapping accuracy is improved, and the accuracy of the subsequent determination of the search scene is further improved; in addition, the matching efficiency can be improved, the breadth of the matching scene is effectively improved, and the accuracy of the search result is effectively improved.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart diagram of a data processing method for validating a search scenario, in accordance with an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for creating a scene feature library according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart diagram of a method of acquiring supervisory data in accordance with an embodiment of the present invention;

FIG. 4 is a schematic diagram of data mapping logic according to an embodiment of the present invention;

FIG. 5 is a flow chart illustrating a data mapping method according to an embodiment of the invention;

FIG. 6 is a flowchart illustrating a search scene recognition method according to an embodiment of the present invention;

FIG. 7 is a flow chart illustrating a searching method according to an embodiment of the present invention;

FIG. 8 is an example of a block diagram of a data processing apparatus for confirming a search scene according to an embodiment of the present invention;

FIG. 9 is an example of a block diagram of a search scene recognition apparatus according to an embodiment of the present invention;

FIG. 10 is an example of a block diagram of a search apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.

First, some of the terms that the present invention involves or may involve are explained. These explanations are merely for convenience of understanding and do not constitute limitations on the various embodiments of the present invention.

The searching technology is used for establishing an information database and index data information aiming at data resources of the Internet, realizing performance optimization through various software and hardware technologies and carrying out function optimization on searching accuracy and sequencing results by utilizing a relevant algorithm strategy.

Scene recognition, namely performing deep data mining based on big data and natural language processing aiming at search keywords, analyzing the search scenes of the keywords, and further optimizing the search results from a higher level.

Domain knowledge refers to the expertise and skills of an industry. A domain is a defined area of a specialty or industry, such as finance, manufacturing, or catering. The knowledge framework formed by expert experience, skills and management quality in a domain is called the knowledge domain.

Natural language processing is the process, and the related technology, of processing natural language information with a computer. Natural language refers to the languages that humans themselves write or speak, such as Chinese, English and Japanese, as opposed to artificial, formalized computer languages. The key to processing natural language is enabling the computer to understand it.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart illustrating a data processing method for confirming a search scenario according to an embodiment of the present invention, and referring to fig. 1, the method includes:

10: an initial data mapping between the first data set and the second data set is established. Wherein the first data set contains a plurality of items of first data and the second data set contains a plurality of items of second data.

In the present invention, the first data set and the second data set each contain data that can be directly subjected to data mapping processing. How to obtain the first data set and the second data set in a specific application environment will be explained in detail below.

Optionally, in this embodiment, the process 10 may also be understood as performing data tagging on the second data set by using the first data set, so as to establish an initial mapping relationship between the first data set and the second data set.

12: adjusting the initial data mapping according to a supervised data set to obtain an actual data mapping between the first data set and the second data set.

Optionally, in an implementation manner of this embodiment, the role of the supervisory data set is to optimize the initial mapping obtained by the process 10, for example, to prevent the overfitting situation of the data markers of the first data set, and to limit the mapping strength.

The supervision data set comprises supervision data. Supervision data can be understood as normalized data samples that assist in data filtering, adjustment, optimization and the like, and that serve as reference data.

14: determining a search scenario mapped by the second data set based on the actual data mapping. Specifically, based on the first data in the first data set to which the second data in the second data set is actually mapped, the search scene corresponding to the second data in the second data set is determined.

By adopting the method provided by the embodiment, compared with the existing mapping technology with insufficient mapping effect or over-fitting condition, the data mapping is adjusted based on the supervision data, the data mapping relation can be effectively optimized, the mapping accuracy is improved, and the accuracy of the determined search scene is further improved.

Optionally, in an implementation manner of this embodiment, the supervision data in the supervision data set includes a phrase name and an adjustment parameter, and the adjustment parameter includes a weight and/or a penalty factor. At this time, the process 12 may be implemented by:

first, the supervision data and the first data that match each other are determined using a text matching process. For example, the phrase name and the first data in the first data set are matched, and the supervision data and the first data which are matched with each other are determined. Then, for each item of second data, the mapping relationship between the second data and the first data to which the second data is initially mapped is modified based on the weight of the supervisory data matching the first data to which the second data is initially mapped, and/or, for each item of second data, the weight of the first data to which the second data is initially mapped is adjusted based on the penalty factor of the supervisory data matching the first data to which the second data is initially mapped.

Here, modifying the mapping relationship between the second data and the first data to which it is initially mapped includes: deleting the mapping between the second data and any first data whose matched supervision data has a weight that does not satisfy a preset condition; sorting the mappings between the first data and the second data according to the weights of the supervision data matched with the first data; and the like.
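The adjustment of process 12 can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: text matching is reduced to an exact name comparison, the preset condition is a hypothetical minimum weight `min_weight`, and the penalty factor is simply added to the initial weight.

```python
def adjust_mapping(initial_mapping, supervision, min_weight=0.2):
    """Turn an initial mapping into an actual mapping.

    initial_mapping: {second_data: {first_data: initial_weight}}
    supervision: {phrase_name: {"weight": float, "penalty": float}}
    """
    actual = {}
    for item, features in initial_mapping.items():
        kept = []
        for feature, weight in features.items():
            matched = supervision.get(feature)  # exact-name text matching (assumption)
            if matched is None:
                # No supervision data matches this feature: keep it unchanged.
                kept.append((0.0, feature, weight))
            elif matched["weight"] >= min_weight:
                # Keep the mapping and correct its weight with the penalty factor.
                kept.append((matched["weight"], feature, weight + matched["penalty"]))
            # Otherwise the supervision weight fails the condition: mapping is deleted.
        # Sort the remaining mappings by the weight of the matched supervision data.
        kept.sort(key=lambda entry: entry[0], reverse=True)
        actual[item] = {feature: weight for _, feature, weight in kept}
    return actual


initial = {"deep-fried dough sticks": {"dessert": 0.3, "north": 0.4, "breakfast": 0.5}}
supervision = {
    "breakfast": {"weight": 0.9, "penalty": -0.1},
    "north": {"weight": 0.6, "penalty": 0.0},
    "dessert": {"weight": 0.05, "penalty": 0.0},
}
print(adjust_mapping(initial, supervision))
```

In this sample the "dessert" mapping is removed because its supervision weight is below the threshold, and the remaining features are ordered by supervision weight.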

Optionally, in an implementation manner of this embodiment, the first data source is processed according to a time dimension and a geographic dimension to obtain the first data set; performing text processing (including word segmentation analysis, word frequency analysis, word stem extraction and semantic analysis) on a supervision data source to obtain a supervision data set; the second data set may be an existing data set.

Optionally, in an implementation manner of this embodiment, the first data set, the second data set, and the supervisory data set are data of the same domain. For example, taking the catering field as an example, the first data set is a scene feature library of the catering field, the second data set comprises dish data and merchant data, and the supervision data set is obtained based on effective catering field information mined from an external source.

Optionally, in an implementation manner of this embodiment, for each item of second data, at least part of the first data or a combination of the at least part of the first data is selected from the first data actually mapped to the second data as the search scenario.

For example, taking the dish word "deep-fried dough sticks" in the second data set, assume that the first data it is mapped to includes: breakfast, northern, staple food, fried food, Chinese tradition, and the like. Among these, "breakfast" is the most representative and occurs most frequently. Therefore, in the mapping data of the dish word "deep-fried dough sticks", "breakfast" can be ranked first among all features and given the largest weight. Further, in process 14, "breakfast" may be selected as the search scene for deep-fried dough sticks. Of course, at least some of the mapped words can also be combined to form a scene, for example, "northern breakfast". In other words, in this implementation, the first data, or the combination of first data, whose weights satisfy the preset condition may be selected as the corresponding search scene according to the weights (e.g., weight ranking) of the matched supervision data.
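The selection in the example above can be sketched as a ranking by weight. The weights below are invented for illustration, and "take the top k features" is only one possible preset condition.

```python
def select_scene(mapped_features, top_k=1):
    """Pick the top_k highest-weighted features as the search scene."""
    ranked = sorted(mapped_features, key=mapped_features.get, reverse=True)
    return tuple(ranked[:top_k])


# Hypothetical weights for the features mapped to one dish word.
mapped = {"breakfast": 0.9, "northern": 0.6, "staple food": 0.4, "fried food": 0.3}
print(select_scene(mapped))           # single-feature scene
print(select_scene(mapped, top_k=2))  # combined scene
```

With `top_k=1` the scene is the single most representative feature; a larger `top_k` yields a combined scene built from several features.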

In the implementation mode, the feature words corresponding to the deep-fried dough sticks can be screened by adopting the frequency of the feature words, and the weight of each feature word can be optimized according to the frequency of the feature words, so that the problem of inaccuracy possibly existing in weight description is weakened by adjusting/correcting the weight by taking the frequency of the feature words as an auxiliary parameter, and the accuracy of actual data mapping obtained by adjusting based on the weight is also ensured.

The frequency of a feature word refers to the number of times the feature word is recorded in the data collection and statistics stage of the first data set. For example, assume that in the data collection stage of the first data set, phrases that are "breakfast" or have "breakfast" as their main semantics are counted 723 times in total; the word frequency of the feature word "breakfast" in the first data set is then 723/(the total number of occurrences of all feature words).
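The word frequency defined above is a count divided by the total count. A short sketch, in which the count of 723 comes from the example and the "lunch" count is a hypothetical value added so that the total is easy to check:

```python
def word_frequencies(counts):
    """Convert raw occurrence counts into relative word frequencies."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# 723 is taken from the example above; 277 is an invented count for illustration.
counts = {"breakfast": 723, "lunch": 277}
print(word_frequencies(counts))
```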

In the following, various details relating to the present invention will be described, taking as an example the application of the invention in the catering field.

Fig. 2 is a flowchart illustrating a method for creating a scene feature library according to an embodiment of the present invention. The scene feature library is a specific implementation manner of the first data set. Referring to fig. 2, the method includes:

First, a first data source is obtained. The first data source includes user behavior data and external source mining data. The user behavior data mainly reflects the behavior of users in the time dimension: user clicks and browsing records collected by a client (for example, an APP client) are sorted and aggregated at the server in chronological order. For example, the behavior data of user A at 11 o'clock, 03/2016 is "open APP -> browse home page -> pull down the menu to the 3rd page -> stay for 2 seconds, then select the third merchant and enter -> select product X on the merchant detail page -> enter home page -> select payment method and delivery geographic information", and so on. The external source mining data includes information such as public menus, dish preparation methods and food classifications from mainstream professional food and beverage websites.

Then, the first data source is analyzed by a data analysis subsystem to obtain time scene basic data, festival scene basic data and geographic information basic data. Specifically, text pattern matching is used to divide the first data source into basic features: basic time scenes such as breakfast, lunch, dinner and night; basic holiday scenes such as traditional Chinese and Western holidays; and user distribution scenes based on geographic information.

And then, after the basic characteristic information is obtained, training and fitting optimization are carried out on the characteristic filtering model through a fitting algorithm, the filtering of characteristic data is completed, wrong data which do not belong to the relevant information of the catering field are removed, and the data of the characteristic library is rationalized.

The feature filtering model is trained because unfiltered raw feature data often contains various kinds of noise. For example, in the original scene feature extraction, the search word "cigarette" may generate the two scene features "breakfast" and "dessert". Obviously, these are false identifications caused by dirty data and need to be filtered out. Therefore, the expected target state of the model is set manually and a fitting process is adopted, so that the filtering condition becomes progressively more accurate and feature library data with weak logical correlation is filtered out.
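The earlier step of dividing the first data source into basic time scenes by text pattern matching can be illustrated as follows. The log line format, the regular expression and the hour boundaries are assumptions chosen for this sketch; the text does not specify them.

```python
import re

# Hypothetical hour ranges for the basic time scenes.
TIME_SCENES = [
    (range(5, 10), "breakfast"),
    (range(10, 14), "lunch"),
    (range(17, 21), "dinner"),
]


def time_scene(log_line):
    """Map a behavior log line to a basic time scene, or None if undetermined."""
    # Pattern match an HH:MM timestamp anywhere in the line.
    match = re.search(r"\b(\d{1,2}):\d{2}\b", log_line)
    if match is None:
        return None
    hour = int(match.group(1))
    for hours, scene in TIME_SCENES:
        if hour in hours:
            return scene
    # Late evening and early morning fall into the "night" scene.
    return "night" if hour >= 21 or hour < 5 else None


print(time_scene("2016-03-01 08:15 open APP"))
print(time_scene("23:40 select the third merchant"))
```

Hours that fall outside every configured range (for example mid-afternoon) return `None`, leaving the record without a time scene.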

Through the processing, the scene feature library can be obtained. Illustratively, the basic data structure in the scene feature library is shown in the following table:

(Table 1)

Refer to Table 1. The feature ID is the unique identifier of each feature and is used to retrieve the relevant feature during search scene recognition. The feature name makes it easier for the feature library administrator to view and display information. The feature classification indicates the class a feature belongs to; for example, features may be classified into primary, secondary and tertiary features. More specifically, "breakfast" is a primary feature that includes the secondary feature "breakfast for weight loss", which in turn includes tertiary features such as "tuna meat product".

The feature weight represents an influence factor of the feature in the feature library, and the calculation formula is as follows:

$$W_i = \theta \cdot \frac{C_i}{\sum_{j=0} C_j} + \mathrm{Punishment} \qquad (i > 0,\ j \text{ starts from } 0)$$

W_i represents the weight (also called the influence factor) of the i-th feature. θ is a manually set forward excitation parameter used to attenuate the noise-induced interference mentioned above. C_i is obtained for the feature name that results from subjecting the i-th feature to word segmentation, word frequency analysis and semantic analysis (see the description of the supervision data below) in the training data, i.e., the first data source described above. Punishment is a penalty factor used to correct an excessive weight caused by overfitting.
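A minimal numeric sketch of the weight formula follows. It assumes that C_i is an occurrence count, that the sum runs over all features in the library, and that the penalty may differ per feature; the formula itself only shows a single Punishment term, so the per-feature `punishment` table is an assumption.

```python
def feature_weights(counts, theta=1.0, punishment=None):
    """Compute W_i = theta * C_i / sum_j(C_j) + Punishment for every feature."""
    punishment = punishment or {}
    total = sum(counts.values())
    return {
        name: theta * count / total + punishment.get(name, 0.0)
        for name, count in counts.items()
    }


# Hypothetical counts and penalty; theta = 0.5 is an arbitrary illustrative value.
counts = {"breakfast": 60, "dinner": 40}
weights = feature_weights(counts, theta=0.5, punishment={"breakfast": -0.1})
print(weights)
```

Here a negative penalty pulls down the weight of a feature that would otherwise dominate because of overfitting.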

Feature relationships represent relationships between features, including approximate, mutually exclusive, and inclusive. For example: "breakfast" and "dinner" are mutually exclusive features. The characteristic relation information has an important role in optimizing the characteristic mapping part later, and wrong mapping results can be accurately filtered through comparison of characteristic weights and characteristic relations.
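The fields described above, and the use of feature weights together with feature relationships to filter wrong mappings, can be sketched as a small data structure. The class name `Feature`, its field names and the rule "keep the higher-weighted feature of a mutually exclusive pair" are assumptions for illustration; only the mutually exclusive relationship is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    feature_id: int   # unique identifier of the feature
    name: str         # feature name shown to the library administrator
    level: int        # 1 = primary, 2 = secondary, 3 = tertiary
    weight: float     # influence factor of the feature
    exclusive_with: set = field(default_factory=set)  # names of mutually exclusive features


def drop_conflicts(features):
    """Among mutually exclusive features, keep only the higher-weighted one."""
    kept = []
    for candidate in sorted(features, key=lambda f: f.weight, reverse=True):
        conflict = any(
            candidate.name in other.exclusive_with or other.name in candidate.exclusive_with
            for other in kept
        )
        if not conflict:
            kept.append(candidate)
    return kept


mapped = [
    Feature(1, "breakfast", 1, 0.8, {"dinner"}),
    Feature(2, "dinner", 1, 0.2),
    Feature(3, "north", 1, 0.5),
]
print([f.name for f in drop_conflicts(mapped)])
```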

By adopting the method provided by the embodiment, data processing is performed through a full-automatic flow, particularly, time dimension and geographic information dimension are introduced to divide mass data, so that invalid time cost consumption caused by data mining processing and manual review can be effectively reduced, and the overall strategy evaluation performance is improved. In addition, in order to improve the descriptiveness and representativeness of the feature library, the feature library can be secondarily optimized in a mode of reverse excitation of the feature model. Compared with the traditional feature extraction technology, the method has higher accuracy and more representative included features.

Fig. 3 is a flowchart illustrating a method of acquiring supervisory data according to an embodiment of the present invention. The method performs text processing on catering field information to obtain supervision data, where supervision data refers to data suitable for a supervised model (a basic machine learning method). Specifically, as shown in fig. 3, the method includes:

30: Obtain catering field information. The catering field information can be mined from external sources by a web crawler.

32: Word segmentation analysis. Specifically, word segmentation analysis may be performed using a word segmentation tool. The basic principle of such a tool is to match a word dictionary generated from mass data against a piece of catering information. Each successfully matched phrase is treated as a candidate segment, the segmentation with the highest matching degree is selected according to the word weights provided by the dictionary, and that segmentation is taken as the final result. Word segmentation yields a set of phrases. For example, if the text "the main ingredients of sweet and sour tenderloin include tenderloin, starch, tomatoes, etc." is taken as the catering information, the phrase set after segmentation is { "sweet and sour tenderloin", "main ingredients", "tenderloin", "starch", "tomatoes" }.

34: Word frequency analysis. Specifically, after word segmentation analysis has been performed on each piece of catering field information, the number of occurrences of each segmented phrase is counted; this count is the word frequency information. The main purpose of word frequency analysis is to filter out unwanted words and keep the most representative ones. For example, suppose segmentation of the catering field information yields two phrases, "chicken steak" and "large chicken steak". If word frequency statistics show that "chicken steak" occurs 12834 times and "large chicken steak" occurs 231 times, then of these two phrases with similar text structure only "chicken steak" is retained.

36: Stem extraction. Specifically, a stem dictionary is used to perform a partial matching check against the generated segmented phrases. For example, "fillet" is extracted from "savory fillet" and the modifier "savory" is removed. Stem extraction can identify the part of speech of each part of a phrase and cut the phrase a second time, so that finally only the core noun part is left.

38: Semantic analysis. Illustratively, semantic analysis based on an N-gram (a language model) can be performed. This analysis method is based on the assumption that the occurrence of the N-th word depends only on the preceding N-1 words and not on any other factor, so the probability of a whole phrase is the product of the probabilities of its individual words (stems).
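Steps 32 to 38 can be illustrated with the following minimal sketch. It is not the word segmentation tool or language model used by the embodiment: the function names, the toy dictionary weights, the substring test used for "similar text structure", the choice N = 2 and the toy corpus are all assumptions made for the example, and part-of-speech identification in step 36 is not modeled.

```python
from collections import Counter


def segment(tokens, dictionary):
    """Step 32: choose the segmentation with the highest total dictionary weight."""
    best = [(0.0, [])] + [None] * len(tokens)  # best[i] covers tokens[:i]
    for end in range(1, len(tokens) + 1):
        for start in range(end):
            phrase = " ".join(tokens[start:end])
            if phrase in dictionary:
                weight = dictionary[phrase]
            elif end - start == 1:
                weight = 0.0  # unknown single token is kept as its own segment
            else:
                continue
            score = best[start][0] + weight
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [phrase])
    return best[-1][1]


def filter_by_frequency(freq):
    """Step 34: of two phrases where one contains the other, keep the more frequent."""
    return [p for p, n in freq.items()
            if not any(q != p and (q in p or p in q) and freq[q] > n for q in freq)]


def extract_stem(phrase, stems):
    """Step 36: partial match against a stem dictionary; the longest stem wins."""
    hits = [s for s in stems if s in phrase]
    return max(hits, key=len) if hits else phrase


def bigram_probability(tokens, corpus):
    """Step 38 with N = 2: P(w_1..w_m) is the product of P(w_i | w_(i-1))."""
    context, pairs = Counter(), Counter()
    for sentence in corpus:
        padded = ["<s>"] + sentence
        context.update(padded[:-1])
        pairs.update(zip(padded, padded[1:]))
    prob, prev = 1.0, "<s>"
    for tok in tokens:
        if context[prev] == 0:
            return 0.0
        prob *= pairs[(prev, tok)] / context[prev]
        prev = tok
    return prob


# Hypothetical dictionary weights and corpus; the two counts are those quoted in step 34.
dictionary = {"chicken steak": 3.0, "chicken": 1.0, "steak": 1.0, "spicy": 1.0, "rice": 2.0}
phrases = segment("spicy chicken steak with rice".split(), dictionary)
kept = filter_by_frequency(Counter({"chicken steak": 12834, "large chicken steak": 231}))
stem = extract_stem("savory fillet", {"fillet", "steak"})
corpus = [["sweet", "sour", "pork"], ["sweet", "sour", "fish"], ["fried", "fish"]]
p = bigram_probability(["sweet", "sour", "pork"], corpus)
print(phrases, kept, stem, round(p, 3))
```

The sketch operates on space-separated tokens for readability; a tool working on unsegmented Chinese text would apply the same weighted matching at character level.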

By the processing of 30-38, the supervision data in the catering field can be obtained. Illustratively, the structure of the supervisory data is shown in the following table:

Phrase ID | Phrase name | Weight | Penalty factor

Table 2

Here, the phrase ID uniquely identifies the phrase and is used when invoking the supervision data. The phrase name is used for text matching with data in the first data set (e.g., feature words in a scene feature library). The weight indicates the importance of the supervision data. For example, the dish "fish-flavor shredded pork" is initially mapped to three feature words: "Sichuan dish", "popular" and "fashion originality". In the supervision data of the system, the weights of the two supervision phrases "Sichuan dish" and "popular" are clearly greater than that of the phrase "fashion originality", so the features left after filtering are "Sichuan dish" and "popular". At the same time, the phrase pattern "fish-flavor XX" is defined by the system as a supervision formula. The next time a phrase matching "fish-flavor XX" is processed, whenever "Sichuan dish", "popular" or similar features occur, the supervision model increases the impact factors of these features and limits the mapping strength of the other features. The penalty factor is a correction option for the supervision data. Its value is usually set manually, after the constraint imposed by the supervision data on the features has been evaluated through manual review of sampled data.

Fig. 4 is a schematic diagram of data mapping logic according to an embodiment of the present invention, which illustrates the actual data mapping logic between a scene feature library and catering domain data. Referring to fig. 4, the data mapping logic is as follows. First, a data mapping between the catering domain data (including dish data and merchant data) and the scene feature library is established. Then, the weights and penalty factors of the supervision data are read and used for promotion and limitation. Specifically, when the scene feature library is mapped to dish or merchant data, the weight of the supervision data is used to promote the feature words that match the supervision data, while the penalty factor of the supervision data limits the mapping strength (i.e., the weight of the feature words), and valid mapping data (i.e., actual mapping data) is generated.
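The promotion and limitation logic of fig. 4 might be sketched as follows. The embodiment does not give a formula, so the multiplicative boost `(1 + weight)`, the damping `1 / (1 + penalty)`, the threshold and all numeric values below are assumptions chosen only to reproduce the "fish-flavor shredded pork" example; the record fields follow the four columns of the supervision data table.

```python
from dataclasses import dataclass


@dataclass
class SupervisionRecord:
    phrase_id: int
    phrase_name: str
    weight: float
    penalty: float


def adjust_mapping(initial, supervision, threshold=0.5):
    """Turn an initial feature mapping into an actual mapping (assumed formula)."""
    by_name = {record.phrase_name: record for record in supervision}
    actual = {}
    for feature, strength in initial.items():
        record = by_name.get(feature)  # text match on the phrase name
        if record is not None:
            # Promote by the supervision weight, limit by the penalty factor.
            strength = strength * (1.0 + record.weight) / (1.0 + record.penalty)
        if strength >= threshold:
            actual[feature] = strength
    return actual


# Hypothetical values for the dish "fish-flavor shredded pork".
initial = {"Sichuan dish": 0.5, "popular": 0.5, "fashion originality": 0.5}
supervision = [
    SupervisionRecord(1, "Sichuan dish", 0.8, 0.1),
    SupervisionRecord(2, "popular", 0.6, 0.1),
    SupervisionRecord(3, "fashion originality", 0.1, 0.5),
]
actual = adjust_mapping(initial, supervision)
print(sorted(actual))
```

With these values the low-weight, high-penalty feature falls below the threshold and is removed from the actual mapping, while the two strongly supervised features are kept with increased strength.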

Traditional feature mapping techniques suffer from insufficient mapping quality or from overfitting. The data mapping logic of this embodiment introduces the concept of supervision data: a supervision model of catering domain knowledge can be established from third-party data, and the scene features of dishes and shop names are then filtered based on the supervision data during data mapping, which improves the mapping accuracy.

In this embodiment, after the valid mapping data has been generated, the feature words mapped to each catering domain information word (e.g., a dish or merchant name) may be sorted by feature word frequency. Taking the dish word "fried dough sticks" as an example, the mapped feature words include "breakfast", "northern", "staple food", "fried food", "Chinese tradition" and so on, of which "breakfast" is the scene that appears most frequently and is the most representative. Therefore, in the mapping data of the dish word "fried dough sticks", "breakfast" is ranked first among all the features and has the largest weight, and "breakfast" may be used as the search scene for "fried dough sticks".
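The sorting by feature word frequency can be illustrated with a short sketch. The frequency values are invented for the example, and taking only the top-ranked feature as the search scene is one possible choice, not the only one the embodiment allows.

```python
def rank_features(feature_freq):
    """Sort the feature words mapped to one item by descending frequency."""
    return sorted(feature_freq, key=lambda f: (-feature_freq[f], f))


# Hypothetical frequencies of the feature words mapped to "fried dough sticks".
freq = {
    "breakfast": 120,
    "northern": 40,
    "staple food": 75,
    "fried food": 60,
    "Chinese tradition": 30,
}
ranked = rank_features(freq)
search_scene = ranked[0]
print(ranked, search_scene)
```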

Fig. 5 is a flowchart of a data mapping method according to an embodiment of the present invention, which illustrates the actual data mapping process between a scene feature library and catering domain data (including dish data and merchant data). Referring to fig. 5, the method includes:

50: Establish a data mapping between the scene feature library and the catering domain data.

52: Optimize the data mapping based on the supervision data, for example by means of the aforementioned weight and penalty factor.

54: Determine the search scene corresponding to the catering domain data. For example, for a single item of second data in the second data set, sorting, filtering or combining is performed according to the frequency of occurrence, weight or other parameters of the first data mapped to it, so as to obtain the corresponding search scene.

Fig. 6 is a flowchart illustrating a search scene recognition method according to an embodiment of the present invention. Referring to fig. 6, the method includes:

60: Segment the search term to obtain search words. There may be one or more search words.

Optionally, in an implementation manner of this embodiment, recognition processing is first performed on the search term input by the user; the recognition processing includes simple filtering and a first recall trigger. Here, filtering means performing an anomaly check on the search term. If the search term is found to be anomalous, for example because it contains illegal characters or sensitive information, the search is not processed further.

Alternatively, in the present embodiment, word segmentation may be performed using the word segmentation tool mentioned above.

62: Determine, through matching processing, the matching data in the second data set that matches the search words. The second data set and the first data set have a data mapping (i.e., an actual data mapping) established by the data mapping method described above. For a description of the first data set and the second data set, please refer to the foregoing.

Optionally, in an implementation manner of this embodiment, the matching processing is text matching, preferably partial matching. Partial matching means that if an item of second data in the second data set matches any word obtained by segmenting the search term, that item of second data is considered to match the search term. For example, an approximate comparison is performed between the segmentation result of the search term and the words in the feature word library: if the search term "chuanxiang cooked meat" is successfully matched with "cooked meat" in the feature library, then the "cooked meat" part of "chuanxiang cooked meat" is successfully matched with that entry and its related features.

Matching the catering field data by partial matching is fast: on the one hand it improves matching efficiency, and on the other hand it effectively widens the range of matched scenes.

64: Determine the search scene corresponding to the search term according to the search scene mapped by the matching data.

Optionally, in an implementation manner of this embodiment, taking the catering field as an example, the first data set is the scene feature library, and the second data set is the catering domain data. After the search scenes corresponding to the search words are determined, the scenes can be sorted using the scene weights pre-calculated in the scene feature library.
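The flow of fig. 6 (filtering, partial matching, scene lookup and ranking by scene weight) might look as follows in outline. The illegal-character pattern, the scene mapping, the scene weights and the use of exact equality between a search word and a second-data item are all assumptions for the example; word segmentation itself is taken as already done.

```python
import re

ILLEGAL = re.compile(r"[<>{}\\;$]")  # hypothetical set of illegal characters


def is_anomalous(term):
    """Filtering before step 60: reject empty terms or terms with illegal characters."""
    return not term.strip() or bool(ILLEGAL.search(term))


def partial_match(search_words, second_data):
    """Step 62: an item matches if it equals any one of the segmented search words."""
    words = set(search_words)
    return [item for item in second_data if item in words]


def recognize_scenes(search_words, data_to_scenes, scene_weights):
    """Step 64: collect the scenes mapped to the matched data, ranked by scene weight."""
    scenes = set()
    for item in partial_match(search_words, data_to_scenes):
        scenes.update(data_to_scenes[item])
    return sorted(scenes, key=lambda s: (-scene_weights.get(s, 0.0), s))


# Hypothetical actual mapping (second data -> scenes) and pre-calculated scene weights.
data_to_scenes = {
    "fried dough sticks": ["breakfast", "fried food"],
    "cooked meat": ["Sichuan dish", "dinner"],
}
scene_weights = {"breakfast": 0.9, "fried food": 0.4, "Sichuan dish": 0.7, "dinner": 0.5}

term = "chuanxiang cooked meat"
search_words = ["chuanxiang", "cooked meat"]  # assumed output of step 60
scenes = [] if is_anomalous(term) else recognize_scenes(search_words, data_to_scenes, scene_weights)
print(scenes)
```

Only one of the two search words needs to match for the term to be assigned scenes, which is what gives partial matching its wider coverage.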

Fig. 7 is a flowchart illustrating a searching method according to an embodiment of the present invention, and referring to fig. 7, the method includes:

70: a search scenario corresponding to the search term is identified. For example, according to a search term, a second data set and a search scene mapped by the second data set, a search scene corresponding to the search term is determined. Wherein the search scene mapped by the second data set is determined by the data mapping method. More specifically, the method shown in fig. 6 may be employed for identification.

72: Load a data file corresponding to the search scene. The data file is configured with an optimization strategy for the recalled data.

Optionally, in an implementation manner of this embodiment, the data files corresponding to different scenes are dynamically loaded, and then the search result that meets the search intention of the user is obtained. The dynamic loading, i.e. hot loading technique, can replace data in real time without restarting the service. In the embodiment, the sorting strategy of the recall logic is constructed into individual data files, and the sorting algorithm is constructed by loading the data files. Illustratively, the data files for these sort policies are shown in the following table:

Policy ID | Policy name | Policy classification | Description parameter | Parameter scope | Extension information

Table 3

Here, the description parameter and the parameter scope represent the point of influence of a policy. For example, in a distance-based sorting strategy, the description parameter is the distance factor and the parameter scope is 0 km to 20 km.

74: Optimize the sorting of the recalled data according to the data file.
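Steps 72 and 74 can be sketched as below, assuming the policy data file is a JSON document keyed by search scene whose fields mirror the columns of the policy table. The file layout, the class `PolicyStore`, the reload-on-modification-time check used to imitate hot loading, and the rule that items outside the parameter scope are placed after the sorted items are all assumptions made for illustration.

```python
import json
import os
import tempfile


class PolicyStore:
    """Loads policy data files and reloads them when the file changes on disk."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._policies = {}

    def get(self, scene):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:  # file replaced: reload without restarting the service
            with open(self.path, encoding="utf-8") as f:
                self._policies = json.load(f)
            self._mtime = mtime
        return self._policies.get(scene)


def sort_recall(items, policy):
    """Step 74: sort items inside the parameter scope; leave the others at the end."""
    key = policy["description_parameter"]
    low, high = policy["parameter_scope"]
    inside = sorted((it for it in items if low <= it[key] <= high), key=lambda it: it[key])
    outside = [it for it in items if not (low <= it[key] <= high)]
    return inside + outside


# Hypothetical policy file for the "breakfast" scene.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "policies.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"breakfast": {
            "policy_id": 1,
            "policy_name": "distance sort",
            "policy_classification": "sorting",
            "description_parameter": "distance_km",
            "parameter_scope": [0, 20],
            "extension_info": "",
        }}, f)
    policy = PolicyStore(path).get("breakfast")

recalled = [
    {"name": "A", "distance_km": 5.0},
    {"name": "B", "distance_km": 1.2},
    {"name": "C", "distance_km": 35.0},
]
ranked = sort_recall(recalled, policy)
print([item["name"] for item in ranked])
```

Keeping each sorting strategy in its own data file means a strategy can be changed by replacing the file, which is the property the hot loading described above relies on.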

With the method provided by this embodiment, a modular calculation entry point is provided for search recall, a separate ranking optimization strategy can be designed for each search scene, and a personalized search experience in which each user sees results tailored to them is achieved.

Method embodiments according to embodiments of the present invention are described in detail above with reference to the accompanying drawings. Embodiments of the apparatus according to the present invention will be described below with reference to the accompanying drawings.

Fig. 8 is an example of a block diagram of a data processing apparatus for confirming a search scene according to an embodiment of the present invention. Referring to fig. 8, the data processing apparatus includes: a data map establishing module 80 for establishing an initial data map between the first data set and the second data set; a data mapping adjustment module 82, configured to adjust the data mapping according to a supervision data set, so as to obtain an actual data mapping between the first data set and the second data set; and a search scene mapping module 84, configured to determine, based on the first data in the first data set to which the second data in the second data set is actually mapped, a search scene corresponding to the second data in the second data set.

Optionally, in an implementation manner of this embodiment, the supervisory data in the supervisory data set includes, in addition to the phrase name, a weight and/or a penalty factor.

Optionally, in an implementation manner of this embodiment, the data mapping adjustment module 82 includes: the matching sub-module is used for determining the mutually matched supervision data and first data by adopting text matching processing; the first adjusting submodule is used for modifying the mapping relation between the second data and the first data to which the second data is initially mapped based on the weight of the supervision data matched with the first data to which the second data is initially mapped aiming at each item of second data, and/or the second adjusting submodule is used for adjusting the weight of the first data to which the second data is initially mapped based on the penalty factor of the supervision data matched with the first data to which the second data is initially mapped aiming at each item of second data.

Optionally, in an implementation manner of this embodiment, the search scene mapping module 84 is specifically configured to: for each item of second data, at least part of first data or a combination of the at least part of first data is selected from first data actually mapped to the second data as the search scene. For example, the at least part of the first data is chosen based on the weight of the supervisory data to which the first data matches.

Optionally, in an implementation manner of this embodiment, the first data set is a scene feature library in the catering field, and the second data set includes dish data and merchant data.

Fig. 9 is an example of a block diagram of a search scene recognition apparatus according to an embodiment of the present invention, and referring to fig. 9, the apparatus includes: the word segmentation module 90 is configured to segment words of the search term to obtain a search word; a matching module 92, configured to determine, through matching processing, matching data in the second data set that matches the search term; a determining module 94, configured to determine, according to the search scenario mapped by the matching data, a search scenario corresponding to the search term. Wherein the search scenario is mapped for the second data set using the method described above.

Fig. 10 is an example of a block diagram of a search apparatus according to an embodiment of the present invention, and referring to fig. 10, the apparatus includes: a scene determining module 102, configured to determine, according to a search term and a second data set and a search scene mapped by the second data set, a search scene corresponding to the search term (where the scene mapped by the second data set is determined by using the data mapping method described above or determined by using the search scene identifying device shown in fig. 9); a loading module 104, configured to load a data file corresponding to the search scenario, where the data file is configured with an optimization policy for recalling data; and the optimization module 106 is configured to perform optimization sorting on the recalled data according to the loaded data file.

The data processing and search methods and apparatuses according to the embodiments of the present invention have been described above with reference to the accompanying drawings. It should be understood by those skilled in the art that the method embodiments or implementations provided by the present invention may be implemented by the apparatus embodiments or implementations provided by the present invention, and that the processing procedure/logic of the apparatus embodiments is consistent with that of the method embodiments. Therefore, for the detailed description of the processes executed or executable by each module and sub-module in the apparatus embodiments, the explanation of specific names, terms and ranges, and the description of the beneficial effects of each embodiment and related features, please refer to the corresponding description in the method embodiments, which is not repeated herein.

In one possible design related to the present invention, the aforementioned data processing apparatus may include a processor and a memory, the memory being used for storing a program that supports the data processing apparatus to execute the processing performed by the aforementioned corresponding module/sub-module, and the processor being configured to execute the program stored in the memory.

The program includes one or more computer instructions, which are invoked and executed by the processor.

More specifically, the processor, by executing the computer instructions, is to:

determining a search scenario to which second data in the second data set is mapped based on first data in the first data set to which the second data in the second data set is actually mapped.

Optionally, the processor may further be configured to, by executing the computer instructions: processing a first data source according to a time dimension and a geographic dimension to obtain a first data set; and performing word segmentation analysis, word frequency analysis, word stem extraction and semantic analysis on the supervision data source to obtain the supervision data set.

Optionally, the supervision data in the supervision data set comprises, in addition to the phrase name, a weight and/or a penalty factor. In this case, the processor may further be configured to, by executing the computer instructions:

determining the matched supervision data and first data by adopting text matching processing; the mapping relationship between the second data and the first data to which the second data is initially mapped is modified for each item of second data based on the weight of the supervisory data matching the first data to which the second data is initially mapped, and/or the weight of the first data to which the second data is initially mapped is adjusted for each item of second data based on the penalty factor of the supervisory data matching the first data to which the second data is initially mapped.

Optionally, the processor may further be configured to, by executing the computer instructions: for each item of second data, select at least part of the first data actually mapped to the second data, or a combination of the at least part of the first data, as the search scene.

Accordingly, the embodiment of the present invention further provides a computer storage medium for storing computer software instructions executed by the aforementioned data mapping apparatus, which includes a program related to the data mapping apparatus for executing the aforementioned data mapping method.

In another possible design related to the present invention, the aforementioned search apparatus may include a processor and a memory, the memory being used for storing a program that supports the search apparatus in executing the processing performed by the corresponding module/sub-module, and the processor being configured to execute the program stored in the memory.

More specifically, the processor, by executing the computer instructions, is to: determining a search scene corresponding to a search term according to the search term, a second data set and a search scene mapped by the second data set, wherein the search scene mapped by the second data set is determined by adopting the data mapping method; loading a data file corresponding to the search scene, wherein the data file is configured with an optimization strategy of recall data; and optimizing and sequencing the recalled data according to the data file.

Accordingly, the present invention further provides a computer storage medium for storing computer software instructions executed by the aforementioned search apparatus, which includes a program related to the search apparatus for executing the aforementioned search method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

The invention discloses A1, a data processing method for confirming a search scene, comprising:

A2, the method of A1, wherein the first data set is a scene feature library of the catering field, and the second data set comprises dish data and merchant data.

A3, the method of A1, further comprising: processing the first data source according to the time dimension and the geographic dimension to obtain the first data set.

A4, the method of A1, further comprising:

and performing text processing (including word segmentation analysis, word frequency analysis, word stem extraction and semantic analysis) on the supervision data source to obtain the supervision data set.

A5, the method of any one of A1 to A4, wherein the supervisory data in the supervisory data set includes weights and/or penalty factors.

A6, the method of A5, wherein

adjusting the initial data mapping relationship according to a supervised data set, comprising:

A7, the method of any one of A1-A4 or A6, wherein the determining a search scene corresponding to second data in the second data set based on first data in the first data set to which the second data in the second data set is actually mapped includes: for each item of second data, selecting at least part of the first data actually mapped to the second data, or a combination of the at least part of the first data, as the search scene.

The invention also discloses B8, a searching method, comprising:

determining a search scene corresponding to a search term according to the search term, a second data set and the search scene mapped by the second data set, wherein the search scene mapped by the second data set is determined by adopting the method of any one of A1-A7;

and optimizing and sequencing the recalled data according to the data file.

The invention also discloses C9, a data processing device for confirming the search scene, comprising:

a data mapping establishing module for establishing an initial data mapping between a first data set and a second data set, the first data set comprising a plurality of items of first data, the second data set comprising a plurality of items of second data;

C10, the apparatus of C9, wherein the first data set is a scene feature library in the catering field, and the second data set comprises dish data and merchant data.

C11, the apparatus of C9, further comprising a first data processing module for processing a first data source according to a time dimension and a geographic dimension to obtain the first data set.

C12, the apparatus according to C9, further comprising a supervised data processing module for performing text processing (e.g., including word segmentation analysis, word frequency analysis, word stem extraction, and semantic analysis) on the supervised data source to obtain the supervised data set.

C13, the apparatus of any one of C9-C12, wherein the supervisory data in the supervisory data set comprises, in addition to the phrase name, a weight and/or a penalty factor.

C14, the apparatus as described in C13, the data mapping adjustment module comprising:

the matching sub-module is used for determining the mutually matched supervision data and first data by adopting text matching processing;

a first adjusting submodule, configured to, for each item of second data, modify a mapping relationship between the second data and the first data to which the second data is initially mapped, based on a weight of the supervisory data matching the first data to which the second data is initially mapped, and/or,

and the second adjusting submodule is used for adjusting the weight of the first data to which the second data is initially mapped based on the penalty factor of the supervision data matched with the first data to which the second data is initially mapped aiming at each item of second data.

C15, the apparatus of any one of C9-C12 or C14, wherein the search scene mapping module is specifically configured to: for each item of second data, select at least part of the first data actually mapped to the second data, or a combination of the at least part of the first data, as the search scene.

The invention also discloses D16, a searching device, comprising:

a scene determining module, configured to determine a search scene corresponding to a search term according to the search term, a second data set and the search scene mapped by the second data set, where the search scene mapped by the second data set is determined by using the method described in any one of A1-A7;

an optimization module for performing optimization sorting on the recalled data according to the loaded data file.

The invention also discloses E1, a data mapping device, comprising a memory and a processor, wherein:

the memory is to store one or more computer instructions, wherein the one or more computer instructions are for the processor to invoke for execution;

the processor performs the following by executing the computer instructions:

E2, the data mapping device of E1, wherein the first data set is a scene feature library of the catering field, and the second data set comprises dish data and merchant data.

E3, the data mapping device of E1, wherein the processor, by executing the computer instructions, further performs: processing the first data source according to the time dimension and the geographic dimension to obtain the first data set.

E4, the data mapping device of E1, wherein the processor, by executing the computer instructions, further performs: performing text processing (e.g., including word segmentation analysis, word frequency analysis, stem extraction, and semantic analysis) on the supervision data source to obtain the supervision data set.

E5, the data mapping device of any one of E1-E4, wherein the supervisory data in the supervisory data set comprises weights and/or penalty factors.

E6, the data mapping device of E5, wherein the processor, by executing the computer instructions, performs: determining the mutually matched supervision data and first data by text matching processing; and, for each item of second data, modifying the mapping relationship between the second data and the first data to which the second data is initially mapped based on the weight of the supervision data matching the first data to which the second data is initially mapped, and/or, for each item of second data, adjusting the weight of the first data to which the second data is initially mapped based on the penalty factor of the supervision data matching the first data to which the second data is initially mapped.

E7, the data mapping device of any one of E1-E4 or E6, wherein the processor, by executing the computer instructions, performs: for each item of second data, selecting at least part of the first data actually mapped to the second data, or a combination of the at least part of the first data, as the search scene.

The invention also discloses F1, a searching device, comprising a memory and a processor, wherein:

the processor performs the following by executing the computer instructions: determining a search scene corresponding to a search term according to the search term, a second data set and the search scene mapped by the second data set, wherein the search scene mapped by the second data set is determined by the method of any one of A1-A7; loading a data file corresponding to the search scene, wherein the data file is configured with an optimization strategy of recall data; and optimizing and sequencing the recalled data according to the data file.

Claims

1. A data processing method for confirming a search scenario, the method comprising:

adjusting the initial data mapping according to a supervision data set to obtain an actual data mapping between the first data set and the second data set, wherein the supervision data set, the first data set and the second data set are data in the same field, the supervision data set is standard sample data in the same field, the supervision data comprises a weight and/or penalty factor, and the weight or penalty factor is used for adjusting the mapping relation of the first data set and the second data set;

2. The method of claim 1, wherein the first data set is a scene feature library of the catering field and the second data set comprises dish data and merchant data.

3. The method of claim 1, wherein the method further comprises:

and processing the first data source according to the time dimension and the geographic dimension to obtain the first data set.

4. The method of claim 1, wherein the method further comprises:

and performing text processing on the supervision data source to obtain the supervision data set.

5. The method of claim 1, wherein said adjusting the initial data mapping according to a supervised data set comprises:

6. The method of any one of claims 1-5, wherein determining the search scenario corresponding to the second data in the second data set based on the first data in the first data set to which the second data in the second data set is actually mapped comprises:

for each item of second data, selecting, as the search scene, at least part of the first data actually mapped to the second data, or a combination of the at least part of the first data.

7. A method of searching, the method comprising:

determining a search scene corresponding to a search term according to the search term, a second data set and a search scene mapped by the second data set, wherein the search scene mapped by the second data set is determined by adopting the method of any one of claims 1-6;

and optimizing and ranking the recalled data according to the data file.

8. A data processing apparatus for confirming a search scenario, the apparatus comprising:

a data mapping adjustment module, configured to adjust the initial data mapping according to a supervised data set to obtain an actual data mapping between the first data set and the second data set, where the supervised data set, the first data set, and the second data set are data in a same field, the supervised data set is standard sample data in the same field, the supervised data includes a weight and/or penalty factor, and the weight or penalty factor is used to adjust a mapping relationship between the first data set and the second data set;

9. The apparatus of claim 8, wherein the first data set is a scene feature library of a dining area and the second data set comprises dish data and merchant data.

10. The apparatus of claim 8, wherein the apparatus further comprises:

and the first data processing module is used for processing the first data source according to the time dimension and the geographic dimension to obtain the first data set.

11. The apparatus of claim 8, wherein the apparatus further comprises:

and the supervision data processing module is used for performing text processing on a supervision data source to obtain the supervision data set.

12. The apparatus of claim 8, wherein the data mapping adjustment module comprises:

13. The apparatus of any one of claims 8-12, wherein the search context mapping module is specifically configured to:

14. A search apparatus, characterized in that the apparatus comprises:

a scene determining module, configured to determine a search scene corresponding to a search term according to the search term, a second data set and a search scene mapped by the second data set, where the search scene mapped by the second data set is determined by using the method according to any one of claims 1 to 6;

15. A data processing apparatus comprising a memory and a processor; wherein

the processor implements the method of any one of claims 1-6 by executing the computer instructions.