CN110851571B - Data processing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN110851571B
Authority
CN
China
Prior art keywords
item data
data
recalled
item
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911114938.5A
Other languages
Chinese (zh)
Other versions
CN110851571A (en)
Inventor
周瑜
臧云飞
Current Assignee
Rajax Network Technology Co Ltd
Original Assignee
Rajax Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Rajax Network Technology Co Ltd
Priority to CN201911114938.5A
Publication of CN110851571A
Application granted
Publication of CN110851571B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation

Abstract

Embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a computer-readable storage medium. The data processing method includes: acquiring first item data; obtaining, by a processor, a first item data vector from the first item data by using a trained first model; performing, by the processor, a vector recall calculation on the first item data vector and a second item data vector, and determining second item data corresponding to the first item data according to the calculation result, wherein the first item and the corresponding second item are associated with each other and are items of different types; and determining first recalled item data from the second item data corresponding to the first item data, the first recalled item being an item of the same type as the second item. By analyzing the first item data of an entity object and determining the first recalled item data, the technical solution effectively alleviates the cold-start problem of the entity object.

Description

Data processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer application technologies, and in particular, to a data processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of Internet technology, item recommendation algorithms on Internet platforms generally comprise two parts, a recall algorithm and a ranking algorithm, in order to better capture user needs. The purpose of the recall algorithm is to perform an initial filtering, selecting a small batch of related items from the full candidate set of items, for example the hundreds to thousands of items that may interest the user. The purpose of the ranking algorithm is to refine the recalled results: it computes their relevance using more precise features and, on that basis, determines the items ultimately recommended to the user. The recalled results therefore determine, to some extent, the efficiency of the ranking stage and the quality of the final recommendation.
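The two-stage structure described above can be sketched in a few lines. The item names and the two scoring functions below are invented stand-ins for illustration only, not the recommendation models of this disclosure:

```python
# Toy two-stage recommendation pipeline: recall narrows the full candidate
# set to a small batch, and ranking then orders that batch with a finer
# (more expensive) score. All names and scores here are illustrative.

def recall(all_items, coarse_score, k):
    """Initial filtering: keep the top-k items by a cheap, coarse score."""
    return sorted(all_items, key=coarse_score, reverse=True)[:k]

def rank(recalled_items, fine_score):
    """Refine the recalled batch with a more precise score."""
    return sorted(recalled_items, key=fine_score, reverse=True)

all_items = [f"item_{i}" for i in range(10_000)]
coarse = lambda item: hash(item) % 1000          # stand-in for a recall model
fine = lambda item: len(item) + hash(item) % 10  # stand-in for a ranking model

candidates = recall(all_items, coarse, k=500)    # hundreds out of the full set
recommended = rank(candidates, fine)[:10]        # items finally shown to the user
```

Because recall only sees cheap features, its output quality bounds what ranking can recover, which is the observation motivating this disclosure.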
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
acquiring first item data;
obtaining, by a processor, a first item data vector from the first item data by using a trained first model;
performing, by the processor, a vector recall calculation on the first item data vector and a second item data vector, and determining second item data corresponding to the first item data according to the calculation result, wherein the first item and the corresponding second item are associated with each other and are items of different types;
determining first recalled item data from the second item data corresponding to the first item data, the first recalled item being an item of the same type as the second item.
With reference to the first aspect, in a first implementation manner of the first aspect, the training process of the first model includes:
acquiring training data, wherein the training data comprises first item sample data and second item sample data;
and training the first model based on the training data to obtain the trained first model and the second article data vector, wherein the first model comprises a BERT model.
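The disclosure only specifies that the first model includes a BERT model trained on paired first/second item sample data. The snippet below is a minimal stand-in showing the shape of the interface (item text in, fixed-length vector out), not an actual BERT fine-tune; the hashing "encoder" and the 8-dimensional vectors are assumptions, and in practice a fine-tuned BERT encoder (e.g., via the `transformers` library) would produce 768-dimensional vectors:

```python
import hashlib

import numpy as np

DIM = 8  # a real BERT-base encoder would emit 768-dimensional vectors


def encode_item(text: str) -> np.ndarray:
    """Stand-in for the trained first model: deterministic text -> vector.

    A real implementation would run the item text through a fine-tuned
    BERT encoder; this toy version only guarantees that identical texts
    map to identical, unit-length vectors.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = np.frombuffer(digest[: DIM * 4], dtype=np.uint32).astype(np.float64)
    return vec / np.linalg.norm(vec)


first_item_vec = encode_item("tomato and egg stir-fry")    # a dish (first item)
second_item_vec = encode_item("fresh tomatoes, 5 kg box")  # a raw material (second item)
```

The second item vectors obtained as a by-product of training play the role of `second_item_vec` in the vector recall calculation below.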
With reference to the first aspect, in a second implementation manner of the first aspect, the performing a vector recall calculation on the first item data vector and the second item data vector, and determining, according to a calculation result, corresponding second item data of the first item data includes:
calculating a first feature value of the second item data vector, wherein the first feature value represents a matching degree of the first item data vector and the second item data vector;
and determining corresponding second item data of the first item data according to the first characteristic value.
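The disclosure does not fix the exact matching-degree measure at this point; the sketch below assumes cosine similarity, a common choice in vector recall, as the first feature value:

```python
import numpy as np


def first_feature_value(first_vec: np.ndarray, second_vec: np.ndarray) -> float:
    """Matching degree between a first item vector and a second item vector.

    Cosine similarity (an assumption here) ranges over [-1, 1]; higher means
    a closer match, and the top-scoring second items are taken as the second
    item data corresponding to the first item.
    """
    num = float(np.dot(first_vec, second_vec))
    den = float(np.linalg.norm(first_vec) * np.linalg.norm(second_vec))
    return num / den


v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, 0.0, 0.0])
score = first_feature_value(v1, v2)  # about 0.707
```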
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the determining first recalled item data according to the second item data includes:
calculating a second feature value of the corresponding second item data according to the first feature values of the second item data corresponding to a plurality of first item data;
and determining the first recalled item data according to the second characteristic value of the corresponding second item data.
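The aggregation that turns the first feature values obtained for many first items into a single second feature value is not spelled out at this point; the sketch below assumes a plain sum, so that a second item matched well by many first items scores high:

```python
from collections import defaultdict


def second_feature_values(matches):
    """Aggregate first feature values per second item across many first items.

    `matches` maps each first item to its corresponding second items and
    their first feature values. The aggregation (a plain sum here) is an
    assumption; the intent is that a second item matched by many first
    items earns a large second feature value.
    """
    totals = defaultdict(float)
    for second_items in matches.values():
        for second_item, first_fv in second_items.items():
            totals[second_item] += first_fv
    return dict(totals)


matches = {
    "dish_a": {"tomato": 0.9, "egg": 0.8},
    "dish_b": {"tomato": 0.7},
}
scores = second_feature_values(matches)  # tomato ~1.6, egg ~0.8
```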
With reference to the first aspect, in a fourth implementation manner of the first aspect, the present disclosure further includes:
determining, by a processor, second recalled item data from a third item data set according to the first recalled item data, the second recalled item being an item of the same type as the first recalled item.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the determining, according to the first recalled item data, second recalled item data from a third item data set includes:
determining a first candidate set of candidate second recall item data through word segmentation matching according to the first recall item data; and/or
Determining a second candidate set of candidate second recalled item data using a vector recall model in accordance with the first recalled item data; and/or
Determining a third candidate set of candidate second recalled item data by using a synonym recall model according to the first recalled item data;
determining the second recalled item data based on a first candidate set and/or a second candidate set and/or a third candidate set of the candidate second recalled item data.
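The "and/or" combination of the three candidate sets can be sketched as a deduplicating union over whichever sets are present; the item names below are invented for illustration:

```python
def merge_candidate_sets(*candidate_sets):
    """Combine the word-segmentation, vector-recall, and synonym candidate
    sets; per the claim's "and/or", any subset of the three may be present.

    Deduplicates while preserving first-seen order. The final selection
    among merged candidates (e.g., by the third feature value) happens in
    a later step.
    """
    seen, merged = set(), []
    for cand_set in candidate_sets:
        for item in cand_set:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


set1 = ["flour", "sugar"]   # from word-segmentation matching
set2 = ["sugar", "butter"]  # from the vector recall model
set3 = ["flour", "yeast"]   # from the synonym recall model
candidates = merge_candidate_sets(set1, set2, set3)
# -> ["flour", "sugar", "butter", "yeast"]
```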
With reference to the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the determining, according to the first recalled item data, a first candidate set of candidate second recalled item data by word segmentation matching includes:
performing word segmentation processing on the first recalled item data and the third item data through a word segmenter to obtain first recalled item word segmentation data and third item word segmentation data;
performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data, and determining a first candidate set of candidate second recalled item data.
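A minimal sketch of the word-segmentation matching above. A whitespace split stands in for a real Chinese word segmenter (in practice something like jieba's `lcut` would be used), and the overlap threshold is an invented parameter:

```python
def segment(text: str) -> set:
    """Stand-in segmenter: a real system would use a Chinese word
    segmenter such as jieba here."""
    return set(text.split())


def segmentation_match(first_recalled: str, third_items, min_overlap=1):
    """First candidate set: third items sharing at least `min_overlap`
    segmented words with the first recalled item data."""
    recall_words = segment(first_recalled)
    return [t for t in third_items
            if len(recall_words & segment(t)) >= min_overlap]


third_items = ["fresh tomato box", "free range egg", "whole wheat flour"]
first_candidate_set = segmentation_match("ripe tomato", third_items)
# -> ["fresh tomato box"]
```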
With reference to the fifth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the determining, by using a vector recall model, a second candidate set of candidate second recalled item data according to the first recalled item data includes:
obtaining the first recalled item data vector using the first model in accordance with the first recalled item data;
obtaining a third item data vector by using a second model according to the third item data;
and performing vector recall calculation on the first recalled item data vector and the third item data vector, and determining a second candidate set of candidate second recalled item data according to a calculation result.
With reference to the seventh implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the obtaining, according to the third item data, the third item data vector by using a second model includes:
when the third item data matches the first recalled item data, taking the first recalled item data vector as the third item initialization data vector;
obtaining the third item data vector using the second model based on the third item initialization data vector.
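The warm-start initialization of the eighth implementation manner can be sketched as follows; the vector dimension, the random fallback, and the exact matching rule are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)


def init_third_item_vector(third_item, first_recalled_vectors, dim=8):
    """Initialization for the second model's third item vector.

    If the third item data matches first recalled item data, reuse that
    item's vector as the starting point (warm start); otherwise fall back
    to a random initialization. The second model then refines this vector.
    """
    if third_item in first_recalled_vectors:
        return first_recalled_vectors[third_item].copy()
    return rng.standard_normal(dim)


first_recalled_vectors = {"tomato": np.ones(8)}
warm = init_third_item_vector("tomato", first_recalled_vectors)
cold = init_third_item_vector("yeast", first_recalled_vectors)
```

The warm start ties the third item vector space to the space the first model already learned, which is what makes the subsequent vector recall between the two meaningful.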
With reference to the fifth implementation manner of the first aspect, in a ninth implementation manner of the first aspect, the determining, according to the first recalled item data, a third candidate set of candidate second recalled item data by using a synonym recall model includes:
acquiring first recalled item synonym data according to the first recalled item data;
determining a third candidate set of the candidate second recalled item data based on the first recalled item synonym data.
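A minimal sketch of the synonym recall above; the synonym table and the substring-matching rule are illustrative assumptions (in practice the table could be mined or manually curated):

```python
def synonym_recall(first_recalled_item, synonym_table, third_items):
    """Third candidate set: third items containing the first recalled item
    or one of its synonyms. Table contents and matching rule are assumed
    for illustration.
    """
    synonyms = synonym_table.get(first_recalled_item, set()) | {first_recalled_item}
    return [t for t in third_items if any(s in t for s in synonyms)]


synonym_table = {"scallion": {"green onion", "spring onion"}}
third_items = ["green onion bundle", "red onion", "scallion pancake mix"]
third_candidate_set = synonym_recall("scallion", synonym_table, third_items)
# -> ["green onion bundle", "scallion pancake mix"]
```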
With reference to the fifth implementation manner of the first aspect, in a tenth implementation manner of the first aspect, the determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data includes:
determining a third feature value of the candidate second recalled item data of the first candidate set and/or second candidate set and/or third candidate set of the candidate second recalled item data;
for the candidate second recall items belonging to the same category, ranking the candidate second recall item data of the first candidate set and/or second candidate set and/or third candidate set of the candidate second recall item data according to the third feature value;
determining the second recalled item data from the candidate second recalled item data based on the ranking result.
With reference to the tenth implementation manner of the first aspect, the present disclosure provides in an eleventh implementation manner of the first aspect, the third feature value is a product of a ranking score of the candidate second recall item data and a weight score, where the weight score is a second feature value of the second item data corresponding to the candidate second recall item data, and the ranking score is a vector distance between the first recall item data corresponding to the candidate second recall item data and the candidate second recall item data.
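The third-feature-value computation just described can be written out directly. Note that the winning direction of the sort (whether a larger product wins) depends on how the vector-distance ranking score is normalized, so the descending order below is an assumption:

```python
def third_feature_value(ranking_score: float, weight_score: float) -> float:
    """Third feature value = ranking score * weight score, where the weight
    score is the second feature value of the corresponding second item data
    and the ranking score derives from the vector distance between the
    corresponding first recalled item data and the candidate."""
    return ranking_score * weight_score


def rank_within_category(candidates):
    """Sort candidates of the same category by third feature value.

    Descending order is assumed here; with a raw (unnormalized) distance
    as the ranking score, ascending order could be appropriate instead.
    """
    scored = [(name, third_feature_value(r, w)) for name, r, w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# (candidate, ranking_score, weight_score) triples, all in one category
candidates = [("flour_a", 0.8, 2.0), ("flour_b", 0.5, 3.0), ("flour_c", 0.9, 1.0)]
ranked = rank_within_category(candidates)
```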
In a second aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
an acquisition module configured to acquire first item data;
an obtaining module configured to obtain, by a processor, a first item data vector using the trained first model according to the first item data;
a first determination module configured to perform, by a processor, a vector recall calculation on the first item data vector and a second item data vector, and determine second item data corresponding to the first item data according to the calculation result, wherein the first item and the corresponding second item are associated with each other and are items of different types;
a second determination module configured to determine first recalled item data from the second item data corresponding to the first item data, the first recalled item being an item of the same type as the second item.
With reference to the second aspect, in a first implementation manner of the second aspect, the training process of the first model includes:
acquiring training data, wherein the training data comprises first item sample data and second item sample data;
training the first model based on the training data to obtain the trained first model and the second item data vector, wherein the first model comprises a BERT model.
With reference to the second aspect, in a second implementation manner of the second aspect, the performing a vector recall calculation on the first item data vector and the second item data vector, and determining corresponding second item data of the first item data according to a calculation result includes:
calculating a first feature value of the second item data vector, wherein the first feature value represents a matching degree of the first item data vector and the second item data vector;
and determining corresponding second item data of the first item data according to the first characteristic value.
With reference to the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the determining first recalled item data according to the second item data includes:
calculating a second feature value of the corresponding second item data according to the first feature values of the second item data corresponding to a plurality of first item data;
and determining the first recalled item data according to the second characteristic value of the corresponding second item data.
With reference to the second aspect, in a fourth implementation manner of the second aspect, the present disclosure further includes:
a third determination module configured to determine, by the processor, second recalled item data from a third item data set according to the first recalled item data, the second recalled item being an item of the same type as the first recalled item.
With reference to the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the determining, according to the first recalled item data, second recalled item data from a third item data set includes:
determining a first candidate set of candidate second recall item data through word segmentation matching according to the first recall item data; and/or
Determining a second candidate set of candidate second recalled item data using a vector recall model in accordance with the first recalled item data; and/or
Determining a third candidate set of candidate second recalled item data by using a synonym recall model according to the first recalled item data;
determining the second recalled item data based on a first candidate set and/or a second candidate set and/or a third candidate set of the candidate second recalled item data.
With reference to the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the determining, according to the first recalled item data, a first candidate set of candidate second recalled item data through word segmentation matching includes:
performing word segmentation processing on the first recalled item data and the third item data through a word segmenter to obtain first recalled item word segmentation data and third item word segmentation data;
performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data, and determining a first candidate set of the candidate second recalled item data.
With reference to the fifth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the determining, by using a vector recall model, a second candidate set of candidate second recalled item data according to the first recalled item data includes:
obtaining the first recalled item data vector using the first model in accordance with the first recalled item data;
obtaining a third item data vector by using a second model according to the third item data;
and performing vector recall calculation on the first recalled item data vector and the third item data vector, and determining a second candidate set of candidate second recalled item data according to a calculation result.
With reference to the seventh implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the obtaining, according to the third item data, the third item data vector by using the second model includes:
when the third item data matches the first recalled item data, taking the first recalled item data vector as the third item initialization data vector;
and obtaining the third item data vector by using the second model based on the third item initialization data vector.
With reference to the fifth implementation manner of the second aspect, in a ninth implementation manner of the second aspect, the determining, by using a synonym recall model, a third candidate set of candidate second recall item data according to the first recall item data includes:
acquiring first recalled item synonym data according to the first recalled item data;
determining a third candidate set of the candidate second recalled item data based on the first recalled item synonym data.
With reference to the fifth implementation manner of the second aspect, in a tenth implementation manner of the second aspect, the determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data includes:
determining a third feature value of the candidate second recalled item data of the first candidate set and/or second candidate set and/or third candidate set of the candidate second recalled item data;
for the candidate second recalled items belonging to the same category, sorting the candidate second recalled item data of the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data according to the third feature value;
determining the second recall item data from the candidate second recall item data based on the ranking results.
With reference to the tenth implementation manner of the second aspect, in an eleventh implementation manner of the second aspect, the third feature value is a product of a ranking score of the candidate second recall item data and a weight score, wherein the weight score is a second feature value of second item data corresponding to the candidate second recall item data, and the ranking score is a vector distance between first recall item data corresponding to the candidate second recall item data and the candidate second recall item data.
In a third aspect, the disclosed embodiments provide an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the following method steps:
acquiring first item data;
obtaining, by a processor, a first item data vector from the first item data by using a trained first model;
performing, by the processor, a vector recall calculation on the first item data vector and a second item data vector, and determining second item data corresponding to the first item data according to the calculation result, wherein the first item and the corresponding second item are associated with each other and are items of different types;
determining first recalled item data from the second item data corresponding to the first item data, the first recalled item being an item of the same type as the second item.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method according to the first aspect or any of the first to eleventh implementation manners of the first aspect.
According to the technical solution provided by the embodiments of the present disclosure, after first item data is acquired, a processor obtains a first item data vector from the first item data by using a trained first model, performs a vector recall calculation on the first item data vector and a second item data vector, and determines second item data corresponding to the first item data according to the calculation result, wherein the first item and the corresponding second item are associated with each other and are items of different types; first recalled item data is then determined from the second item data corresponding to the first item data, the first recalled item being an item of the same type as the second item. By analyzing the first item data of an entity object, obtaining the corresponding second item data, and determining the first recalled item data, the embodiments of the present disclosure efficiently alleviate the cold-start problem of the entity object; at the same time, recalled items more strongly associated with the first item are determined through this analysis, so that the real needs of the entity object can be better captured.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a system diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a training process of the first model according to an embodiment of the disclosure;
FIG. 4 illustrates a flowchart of a vector recall calculation for the first item data vector and the second item data vector to determine a corresponding second item data for the first item data based on the calculation results according to an embodiment of the disclosure;
FIG. 5 illustrates a flow chart for determining first recalled item data from the second item data in accordance with an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 7 illustrates a flow chart of determining a first candidate set of candidate second recall item data by word segmentation matching from the first recall item data in accordance with an embodiment of the present disclosure;
FIG. 8 illustrates a flow chart for determining a second candidate set of candidate second recalled item data from the first recalled item data utilizing a vector recall model in accordance with an embodiment of the present disclosure;
FIG. 9 illustrates a flow chart for obtaining a third item data vector using a second model based on the third item data according to an embodiment of the disclosure;
FIG. 10 shows a flow diagram for determining a third candidate set of candidate second recalled item data from the first recalled item data utilizing a synonym recall model, according to an embodiment of the present disclosure;
FIG. 11 illustrates a flow chart for determining the second recalled item data based on the first candidate set and/or second candidate set and/or third candidate set of the candidate second recalled item data in accordance with an embodiment of the present disclosure;
FIG. 12 is a schematic diagram illustrating an application scenario of a data processing method according to an embodiment of the present disclosure;
FIG. 13 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 14 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 15 shows a schematic block diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
The user data acquired or presented in this disclosure is either authorized, confirmed, or actively selected by the user. It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the prior art, Internet recommendation systems generally establish the relationship between users and items with the following recall algorithms: item-based collaborative filtering and its various optimizations, based on users' preferences for items; user-based collaborative filtering, based on users with similar preferences; and vector recall, matrix-factorization-based recall, or label-based recall, based on the feature attributes of users and/or items. These recall algorithms all rely on user behavior: a user must first produce some behavior before recall recommendations can be made from it. They therefore handle the user cold-start problem poorly, and recommendations tend to fail when the amount of historical user behavior data is small. Moreover, because these recall algorithms must build on behavior that has already occurred, it is difficult to capture a user's latest needs, so the timeliness and effectiveness of the recalled and recommended items are poor.
For example, in the catering field, when an Internet platform recommends items, one prior-art approach mines relevant information from a user's historical behavior: raw-material commodities that a merchant is potentially interested in are mined from the merchant's historical raw-material purchase records, using collaborative filtering or the Swing algorithm, and the results are then ranked and recalled. A second prior-art approach recalls according to manual labels: keywords that accurately express the attributes of merchants and/or raw materials are annotated manually, popular raw-material commodities under the labels a merchant is interested in are computed by a matching algorithm, and the results are then ranked and recalled for the merchant. These prior arts have the following problems: (1) being based on a merchant's historical behavior, they handle the merchant cold-start problem poorly and can hardly capture needs the merchant has not yet exhibited; (2) they ignore the information of the dishes sold in a merchant's store and of the raw materials the merchant needs to make those dishes; if the relationship between the dishes a merchant sells and the raw materials it purchases were established directly, the raw data would be hard to obtain, a large amount of labor would be required, and recall efficiency would be low; (3) because the labels of the dishes merchants sell and of the raw materials they purchase are highly varied, organizing and maintaining them also requires a large amount of labor.
The present disclosure is made to solve, at least in part, the problems in the prior art found by the inventors.
The data processing method of the embodiment of the disclosure may be executed by at least one computer device, where the computer device may be, for example, a server or a cloud computing platform, and the at least one computer device may be a single server or a server cluster composed of multiple servers. The at least one computer device may be interconnected to each other and communicate with other computer devices (e.g., servers, server clusters, or terminal devices) over a wired or wireless network.
Fig. 1 shows a system diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the system includes at least one first server 101, at least one second server 102, and at least one mobile phone and/or computer terminal 103, wherein the at least one first server 101 and the at least one second server 102, and the at least one first server 101 and the at least one mobile phone and/or computer terminal 103 are connected through a wired or wireless network. The at least one first server 101 may be configured to perform a data processing method according to an embodiment of the present disclosure. The at least one second server 102 may be configured to at least one of: training and/or storing models required by the data processing method; acquiring and/or storing sample data required by model training; other data used in the data processing method according to embodiments of the present disclosure are acquired and/or stored. At least one mobile phone and/or computer terminal 103 may be used to input sample data required for model training in the data processing method according to embodiments of the present disclosure and/or other data used in the data processing method according to embodiments of the present disclosure. The at least one mobile phone and/or computer terminal 103 may be further configured to output a recommendation to the entity object in response to the user entering the identification information of the entity object.
Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure, which is performed on the server 101 side. As shown in fig. 2, the data processing method includes the following steps S201 to S204:
in step S201, first item data is acquired;
in step S202, obtaining, by a processor, a first item data vector according to the first item data by using a trained first model;
in step S203, performing vector recall calculation on the first item data vector and the second item data vector through a processor, and determining corresponding second item data of the first item data according to a calculation result, where the first item and the corresponding second item have an association relationship and are different types of items;
in step S204, first recall item data is determined from the second item data, the first recall item being the same type of item as the second item.
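Before the steps are described in detail, the overall flow of steps S201 to S204 can be sketched as follows. This is a minimal illustration only: the embedding table is a hypothetical stand-in for the trained first model, and all item names and vectors are invented.

```python
import math

# Hypothetical stand-in for the trained first model (step S202): in a real
# system, a trained text-embedding model such as BERT would map item names
# to vectors.
EMBEDDINGS = {
    "tomato fried eggs": [0.9, 0.1, 0.2],   # first item data (a dish)
    "tomato":            [0.8, 0.2, 0.1],   # second item data (raw materials)
    "egg":               [0.7, 0.1, 0.4],
    "engine oil":        [0.1, 0.9, 0.8],
}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def recall_second_items(first_item, second_items, top_n=2):
    """Steps S203-S204: rank second items by vector distance to the first
    item's vector and keep the closest ones."""
    v1 = EMBEDDINGS[first_item]
    ranked = sorted(second_items, key=lambda s: euclidean(v1, EMBEDDINGS[s]))
    return ranked[:top_n]

print(recall_second_items("tomato fried eggs", ["tomato", "egg", "engine oil"]))
# ['tomato', 'egg']
```

Here the unrelated item ("engine oil") is farthest from the dish vector and is not recalled.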
According to an embodiment of the present disclosure, the first item is a processing object for performing a specific action on a physical object, wherein the physical object includes, but is not limited to, a unit, a business or a merchant in the fields of retail, manufacturing, e-commerce, take-out or medical health, and the specific action includes actions such as purchasing or selling. The first item data includes identification information of the first item, wherein the identification information of the first item may include one or more of a chinese text name, an english text name, and an item code (such as an item code of a chinese item code center) of the first item, for example. According to an embodiment of the present disclosure, the first item data may further include characteristic information of the first item. The characteristic information of the first item may include, for example, one or more of the number of times the physical object performs a particular action on the first item, the number of the first item when the particular action is performed, sorting information of the first item, value information of the first item (such as a price of the first item, etc.), and resource information of the first item (such as seller information of the first item, etc.).
According to an embodiment of the present disclosure, the step S201, namely, acquiring the first item data, may be implemented as: acquiring the first item data according to the identification information of the entity object. The identification information of the entity object includes, for example, a name or an ID of the entity object; the identification information may be input through the terminal 103 and transmitted to the server 101, and the server 101 receives the identification information and determines the first item data of the entity object. According to the embodiment of the present disclosure, the server 101 may acquire data of all the first items related to the entity object, or may acquire data of only a part of the first items according to a setting or a user selection. The embodiments of the disclosure will be described by taking the catering field as an example, but the disclosure is not limited thereto and is also applicable to other fields. For example, the entity object is a merchant that provides dishes through an internet platform corresponding to the server 101, and the first item is a dish; when the merchant inputs a merchant ID through the terminal 103, the server 101 may obtain, through the merchant ID, data of the dishes sold by the merchant on the internet platform, for example, the names of the dishes. According to the embodiment of the disclosure, the server 101 may further obtain, through the merchant ID, the prices of the dishes, the sales volumes of the dishes, the ranking of the dishes, and the like; it may obtain the first item data of all the dishes of the merchant, or only the first item data of the top-ranked dishes according to the ranking of the dishes.
According to the embodiment of the disclosure, the first article and the corresponding second article have an association relationship and are different types of articles, wherein the second article may be a raw material or a component of the first article, the first article may be formed after a specific action (processing or assembling or the like) is performed on the second article, for example, in the catering field, the first article may be a dish, the second article may be a raw material, for example, the first article is tomato fried eggs, and the second article may be raw materials such as tomatoes and eggs. For example, in the automotive field, the first article may be an automobile, and the second article may be a component of the automobile, such as an engine, a chassis, a body, electrical equipment, and the like. The second article data comprises identification information of the second article, wherein the identification information of the second article comprises one or more of a Chinese text name, an English text name and an article code of the second article. The second item data may also include characteristic information of the second item, such as one or more of the number of second items required to form the first item, value information of the second item, and resource information of the second item.
According to the embodiment of the present disclosure, the association relationship between the first item data and the second item data may be utilized to recall the second item data based on the first item data, wherein the second item data and/or the second item data vector may be pre-stored in a system for implementing the data processing method according to the embodiment of the present disclosure. For example, a first item data vector and a second item data vector may be obtained first, and then the distance between the first item data vector and the second item data vector is used to measure the association relationship between the first item data and the second item data, so that the second item data with a stronger association with the first item data may be recalled. According to the embodiment of the present disclosure, the first item data vector may be obtained by means of a trained first model; for example, a word embedding vector (embedding vector) corresponding to the first item is obtained according to the Chinese text name of the first item in the first item data identification information. The present disclosure does not specifically limit the first model: any model that can convert text information into an embedding vector falls within the protection scope of the embodiment of the present disclosure, for example, a Chinese language model (CLM), a continuous bag-of-words model (CBOW), a word2vec model, an item2vec model, or a BERT model.
According to the embodiment of the disclosure, for each first item data vector, a vector distance between the first item data vector and each second item data vector is calculated, for example, a Euclidean distance, a cosine distance, or a Manhattan distance. The second item data corresponding to the first item data is then determined according to the calculated vector distances. Since a closer vector distance between the first item data vector and a second item data vector indicates a higher degree of association between the first item data and the second item data, one or more pieces of second item data having a higher degree of association with the first item data (i.e., a closer vector distance) are determined as the second item data corresponding to the first item data.
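The three vector distances mentioned above can be computed as in the following plain-Python sketch (the vectors are illustrative):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_distance(u, v):
    # 1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

u, v = [1.0, 0.0], [0.0, 1.0]
print(euclidean(u, v))        # 1.4142... (sqrt(2))
print(manhattan(u, v))        # 2.0
print(cosine_distance(u, v))  # 1.0 (orthogonal vectors)
```

Any of the three can serve as the recall distance; cosine distance ignores vector magnitude, which is often preferred for text embeddings.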
According to embodiments of the present disclosure, the first recalled item is the same type of item as the second item, e.g., when the second item is a raw material, the first recalled item is also a raw material; when the second item is a part, the first recall item is also a part. Since one entity object corresponds to a plurality of first item data and each first item data has a plurality of corresponding second item data, the first recalled item data may be determined for a plurality of corresponding second item data of the plurality of first item data. The first recalled item data may comprise identification information of the first recalled item, for example, the identification information of the first recalled item comprising one or more of a chinese textual name, an english textual name, and an item code of the first recalled item. The first recall item data may further include characteristic information of the first recall item, including, for example, one or more of a number of the first recall item, value information of the first recall item, and source information of the first recall item.
According to the technical scheme provided by the embodiment of the disclosure, after the first item data is obtained, a processor obtains a first item data vector from the first item data through a trained first model. The processor then performs vector recall calculation on the first item data vector and the second item data vector, and determines the corresponding second item data of the first item data according to the calculation result, where the first item and the corresponding second item have an association relationship and are different types of items. First recalled item data is then determined according to the corresponding second item data of the first item data, where the first recalled item and the second item are the same type of item. By analyzing the first item data of the entity object, obtaining the second item data corresponding to the first item data, and determining the first recalled item data, the embodiment of the disclosure efficiently solves the cold-start problem of the entity object; at the same time, recalled items with a higher association with the first item are determined through analysis, so that the real requirement of the entity object can be better captured.
Fig. 3 shows a flow chart of a training process of the first model according to an embodiment of the disclosure. As shown in fig. 3, the training process of the first model includes the following steps S301 to S302:
in step S301, training data is obtained, where the training data includes first article sample data and second article sample data;
in step S302, the first model is trained based on the training data, and the trained first model and the second item data vector are obtained, where the first model includes a BERT model.
According to an embodiment of the present disclosure, a specific method for obtaining training data is not specifically limited in the present disclosure; for example, training data composed of first item sample data and second item sample data may be obtained from a public database and/or from information actively input by the entity object through the terminal 103. The first item sample data and the first item data are the same type of data, and the second item sample data and the second item data are the same type of data. For example, in the catering field, some public databases disclose various dishes and the raw material data corresponding to the dishes; for example, the raw material data of the dish "tomato fried eggs" includes tomato, egg, garlic, onion white, shallot, and the like, where the dish name "tomato fried eggs" is used as first item sample data, and the raw materials such as tomato, egg, garlic, onion white, and shallot are used as the second item sample data corresponding to "tomato fried eggs". For another example, in the catering field, the entity object actively inputs various dishes and the corresponding raw material data through the terminal 103; for example, the raw material data of the dish "tomato sirloin" includes tomato, sirloin, ginger, shallot, pepper, and the like, where "tomato sirloin" can be used as first item sample data, and tomato, sirloin, ginger, shallot, and pepper can be used as the second item sample data corresponding to "tomato sirloin".
It should be understood that the embodiments of the present disclosure will be described by taking the first model as a BERT model as an example, but this should not be construed as limiting the present disclosure. The BERT model is a pre-trained natural language representation model whose core architecture includes a preset number of operation layers (such as 12 layers), where each operation layer is a Transformer; each Transformer can perform feature extraction on text information based on an attention mechanism, and encode and decode the text information. Since the BERT model is pre-trained, when the BERT model is applied, the parameters of the pre-trained BERT model only need to be fine-tuned according to the specific natural language processing task.
According to the embodiment of the disclosure, the parameters in the BERT model can be adjusted through the first item sample data and the second item sample data. That is, the first item sample data and the second item sample data corresponding to the first item sample data are simultaneously input into the pre-trained BERT model; a first item sample data vector is determined according to the first item sample data, and a second item sample data vector is determined according to the corresponding second item sample data; then, feature extraction, encoding, and decoding are performed on the first item sample data vector and the second item sample data vector based on the preset number of operation layers, so that the parameters in the BERT model are re-determined, and a trained BERT model suitable for the embodiment of the disclosure is obtained. The first item data vector and/or the second item data vector can then be obtained by inputting the first item data and/or the second item data into the trained BERT model.
Fig. 4 shows a flowchart of performing a vector recall calculation on the first item data vector and the second item data vector to determine corresponding second item data of the first item data according to a calculation result according to an embodiment of the present disclosure. As shown in fig. 4, the step S203 of performing vector recall calculation on the first item data vector and the second item data vector and determining corresponding second item data of the first item data according to the calculation result includes the following steps S401 to S402:
in step S401, calculating a first feature value of the second item data vector, where the first feature value represents a matching degree of the first item data vector and the second item data vector;
in step S402, according to the first feature value, corresponding second item data of the first item data is determined.
According to an embodiment of the present disclosure, a first eigenvalue of a second item data vector may be determined from the vector distance between the first item data vector and the second item data vector, and the first eigenvalue may represent the degree of matching between the first item data vector and the second item data vector. For example, it may be defined that the vector distance is inversely proportional to the first eigenvalue, that is, the closer the vector distance, the larger the first eigenvalue; for example, the first eigenvalue may be obtained by dividing a first preset parameter by the vector distance. Alternatively, a proportional relationship between the vector distance and the first eigenvalue may be defined, that is, the closer the vector distance, the smaller the first eigenvalue; for example, the first eigenvalue may be obtained by multiplying the vector distance by a second preset parameter. Specific values of the first preset parameter and the second preset parameter can be selected according to actual needs, and the disclosure is not particularly limited.
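The two distance-to-eigenvalue mappings described above (an inversely proportional one, where a closer distance gives a larger value, and a proportional one) can be sketched as follows; the preset parameter values are arbitrary illustrative choices.

```python
def first_eigenvalue_inverse(distance, first_preset_parameter=1.0):
    # Inversely proportional: closer vectors -> larger first eigenvalue.
    return first_preset_parameter / distance

def first_eigenvalue_proportional(distance, second_preset_parameter=1.0):
    # Proportional: closer vectors -> smaller first eigenvalue.
    return second_preset_parameter * distance

# A closer second item (distance 0.2) vs. a farther one (distance 0.8):
print(first_eigenvalue_inverse(0.2), first_eigenvalue_inverse(0.8))            # 5.0 1.25
print(first_eigenvalue_proportional(0.2), first_eigenvalue_proportional(0.8))  # 0.2 0.8
```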
In the following, the embodiment of the present disclosure will be described by taking an example that the vector distance and the first feature value are in an inverse proportional relationship, that is, the smaller the vector distance, the larger the first feature value, the higher the matching degree between the first item data vector and the second item data vector, but the present disclosure is not limited thereto.
According to the embodiment of the disclosure, a preset first eigenvalue threshold can be set, and the second item data corresponding to the second item data vectors whose first eigenvalue is greater than the preset first eigenvalue threshold is determined as the corresponding second item data of the first item data. Alternatively, the first eigenvalues may be arranged in descending order, and the second item data corresponding to a first preset number of second item data vectors may be selected in that order and determined as the corresponding second item data of the first item data; for example, the second item data corresponding to the second item data vectors whose first eigenvalues rank in the top 10 is determined as the corresponding second item data of the first item data.
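Both selection strategies (threshold and top-N) described above might look like the following sketch, where the scores are hypothetical first eigenvalues:

```python
def select_by_threshold(scores, threshold):
    """Keep the second item data whose first eigenvalue exceeds the threshold."""
    return [name for name, s in scores.items() if s > threshold]

def select_top_n(scores, n):
    """Keep the n second item data with the largest first eigenvalues."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

scores = {"tomato": 5.2, "egg": 3.1, "garlic": 0.4}
print(select_by_threshold(scores, 1.0))  # ['tomato', 'egg']
print(select_top_n(scores, 2))           # ['tomato', 'egg']
```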
FIG. 5 illustrates a flow chart for determining first recalled item data from the second item data in accordance with an embodiment of the present disclosure. As shown in fig. 5, the step S204 of determining the first recalled item data according to the second item data includes the following steps S501 to S502:
in step S501, calculating a second feature value of the corresponding second item data according to the first feature value of the corresponding second item data of the plurality of first item data;
in step S502, the first recalled item data is determined according to the second feature value of the corresponding second item data.
According to an embodiment of the present disclosure, since each first item data may have one or more pieces of corresponding second item data, different first item data may share one or more pieces of the same corresponding second item data. Therefore, the first characteristic values of the same corresponding second item data of different first item data may be summed or weighted-summed to determine the second characteristic value of that second item data. For example, the first characteristic value of the second item data corresponding to a first item with a large demand may be given a higher weight. For example, in the catering field, the first item data "tomato fried eggs" and the first item data "tomato sirloin" include the same corresponding second item data "tomato"; when determining the second characteristic value of the second item data "tomato", the first characteristic value of "tomato" in "tomato fried eggs" and the first characteristic value of "tomato" in "tomato sirloin" may be summed or weighted-summed to determine the second characteristic value of the second item data "tomato".
According to the embodiment of the disclosure, a preset second characteristic value threshold can be set, and the second item data whose second characteristic value is greater than the preset second characteristic value threshold is determined as the first recalled item data. Alternatively, the second characteristic values may be ranked in descending order, and a second preset number of pieces of second item data may be selected in that order and determined as the first recalled item data; for example, the second item data whose second characteristic values rank in the top 20 may be determined as the first recalled item data.
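The aggregation described above (steps S501 and S502) can be sketched as follows; the per-dish scores and the weight are hypothetical first characteristic values.

```python
from collections import defaultdict

def second_feature_values(per_dish_scores, weights=None):
    """Step S501: sum (or weighted-sum) the first characteristic values of the
    same raw material across all first items (dishes) of the entity object."""
    totals = defaultdict(float)
    for dish, material_scores in per_dish_scores.items():
        w = (weights or {}).get(dish, 1.0)
        for material, score in material_scores.items():
            totals[material] += w * score
    return dict(totals)

per_dish = {
    "tomato fried eggs": {"tomato": 4.0, "egg": 3.5},
    "tomato sirloin":    {"tomato": 3.0, "sirloin": 2.8},
}
# Give the high-demand dish a higher weight, as described above:
totals = second_feature_values(per_dish, weights={"tomato fried eggs": 2.0})
print(totals["tomato"])  # 2.0 * 4.0 + 1.0 * 3.0 = 11.0

# Step S502: keep the top-ranked raw materials as first recalled item data.
top = sorted(totals, key=totals.get, reverse=True)[:2]
print(top)  # ['tomato', 'egg']
```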
Fig. 6 shows a flow chart of a data processing method according to an embodiment of the present disclosure. As shown in fig. 6, the data processing method includes, in addition to steps S201 to S204 shown in fig. 2, the following step S205:
in step S205, second recalled item data is determined from a third item data set according to the first recalled item data by a processor, the second recalled item being of the same type as the first recalled item.
According to an embodiment of the present disclosure, the third item data set includes a plurality of third item data, and the third item and the second recalled item are the same type of item as the first recalled item; for example, when the first recalled item is a raw material, the third item and the second recalled item are also raw materials; when the first recalled item is a component, the third item and the second recalled item are also components. Since the first recalled item data is determined from the first item data of the entity object, the second recalled item data may be determined from the first recalled item data, thereby establishing an association between the first item data and the second recalled item data through the first recalled item data. For example, in the catering field, the first item is a merchant dish, the first recalled item is a raw material corresponding to the merchant dish, and the second recalled item is a raw material commodity provided by a raw material provider; by blending the raw material corresponding to the merchant dish into the recall algorithm, an overall recall path from the merchant dish to the raw material corresponding to the merchant dish to the raw material commodity provided by the raw material provider is established. For example, in the field of automobile manufacturing, the first item is an automobile product, the first recalled item is a part corresponding to the automobile product, and the second recalled item is a part commodity provided by a part supplier; by merging the part corresponding to the automobile product into the recall algorithm, an overall recall path from the automobile product to the part corresponding to the automobile product to the part commodity provided by the part supplier is established.
According to the technical scheme provided by the embodiment of the disclosure, an overall recall path from first item data of an entity object to first recalled item data to second recalled item data of a raw material (for example, raw material or parts) is realized, the cost of manual labeling is reduced, a recall operation is performed based on the first item data of the entity object, a recall algorithm which does not depend on historical behavior data of the entity object is realized, the stability of a recall architecture and the accuracy of a recommendation system are improved, and the timeliness and the correlation of recalled items and recommended items are improved.
According to the embodiment of the present disclosure, the step S205 of determining second recalled item data from a third item data set according to the first recalled item data includes:
determining a first candidate set of candidate second recall item data through word segmentation matching according to the first recall item data; and/or
Determining a second candidate set of candidate second recalled item data using a vector recall model in accordance with the first recalled item data; and/or
Determining a third candidate set of candidate second recalled item data by using a synonym recall model according to the first recalled item data;
determining the second recalled item data based on a first candidate set and/or a second candidate set and/or a third candidate set of the candidate second recalled item data.
Because different recall manners have different emphasis points, in order to determine the second recalled item data more accurately and comprehensively, according to an embodiment of the present disclosure, the second recalled item data may be determined from the third item data set according to a plurality of different recall manners. For example, based on the already acquired first recalled item data, a first candidate set may be determined by word segmentation matching and/or a second candidate set may be determined by a vector recall model and/or a third candidate set may be determined by a synonym recall model, and then second recalled item data may be determined based on the first candidate set and/or the second candidate set and/or the third candidate set. The number of candidate second recalled item data included in the first candidate set and/or the second candidate set and/or the third candidate set is not particularly limited, and may be determined according to actual needs.
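The merging of the three candidate sets described above can be sketched as follows. All candidate names are hypothetical, and here the second recalled item data is simply the de-duplicated union of the candidate sets; a real system might rank or filter the merged candidates further.

```python
def merge_candidates(first_set, second_set, third_set, limit=None):
    """Union of the three candidate sets (word segmentation matching, vector
    recall, synonym recall), de-duplicated, preserving first-seen order."""
    seen, merged = set(), []
    for candidates in (first_set, second_set, third_set):
        for item in candidates:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:limit] if limit is not None else merged

word_match = ["brand A soybean paste", "brand B soybean paste"]
vector     = ["brand B soybean paste", "sweet bean sauce"]
synonym    = ["yellow bean paste"]
print(merge_candidates(word_match, vector, synonym))
# ['brand A soybean paste', 'brand B soybean paste', 'sweet bean sauce', 'yellow bean paste']
```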
Fig. 7 illustrates a flow chart for determining a first candidate set of candidate second recall item data from the first recall item data by word segmentation matching according to an embodiment of the present disclosure.
As shown in fig. 7, the determining the first candidate set of the candidate second recall item data by word segmentation matching according to the first recall item data includes the following steps S701 to S702:
in step S701, performing word segmentation processing on the first recalled item data and the third item data through a word segmentation device to obtain first recalled item word segmentation data and third item word segmentation data;
in step S702, performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data, and determining a first candidate set of the candidate second recalled item data.
According to an embodiment of the present disclosure, the first recalled item data and the third item data often cannot be matched directly. For example, the first recalled item data may be "soy sauce" and the third item data may be "seabrand soy sauce", which do not match directly. In order to realize accurate matching between the first recalled item data and the third item data, word segmentation processing can be performed on the first recalled item data and the third item data respectively by using a word segmenter, so that one or more pieces of first recalled item word segmentation data and one or more pieces of third item word segmentation data are obtained. For example, in the catering field, when the first recalled item data is "soybean paste" and the third item data is "large-bottle soybean paste", performing word segmentation on "soybean paste" yields the first recalled item word segmentation data, which includes the three word segmentation results "soybean", "paste", and "soybean paste"; performing word segmentation on "large-bottle soybean paste" yields the third item word segmentation data, which includes the three word segmentation results "soybean", "large bottle", and "paste". The first recalled item word segmentation data is then matched with the third item word segmentation data according to a preset matching rule, and the first candidate set is determined according to the matching result. By performing word segmentation processing on the first recalled item data and the third item data, the embodiment of the present disclosure achieves accurate matching between the first recalled item data and the third item data.
According to the embodiment of the disclosure, the preset word segmentation device can be obtained by combining with a specific application field. In particular, it can be implemented as: acquiring the labeling data of the first recalled article data and/or the third article data; determining a set of pre-selected word segmenters, wherein the set of pre-selected word segmenters includes one or more word segmenters; respectively testing the effect of each word segmenter in the pre-selected word segmenter set by utilizing the labeling data of the first recalled article data and/or the third article data; determining the word segmentation device with the best test effect as a candidate word segmentation device according to the test effect; determining a custom dictionary based on the annotation data of the first recalled item data and/or the third item data; loading the self-defined dictionary into the candidate word segmentation device; and acquiring the preset word segmentation device. After the preset word segmentation device is obtained, word segmentation processing can be carried out on the first recall item data and the third item data through the preset word segmentation device, and the preset word segmentation device is obtained according to the labeling data in the specific application field, so that the obtained first recall item word segmentation data and the third item word segmentation data are more targeted. For example, the catering field and the automobile field comprise different professional vocabularies, and the preset word segmentation device in the catering field can be obtained based on the labeling data of the professional vocabularies in the catering field; the preset word segmentation device in the automobile field can be obtained based on the labeling data of the professional vocabulary in the automobile field. 
The preset word segmenter supports loading a dynamic dictionary: when first recalled item data and/or third item data are newly added, that is, when a new professional vocabulary appears in a certain field, a dictionary of the new vocabulary can be extracted and loaded into the preset word segmenter in a timely manner.
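In practice an off-the-shelf segmenter with user-dictionary support would be used here; the following pure-Python greedy longest-match sketch only illustrates how dynamically loading a new domain-dictionary entry changes the segmentation. All dictionary entries are hypothetical, and unlike the example in the text, this sketch produces non-overlapping tokens.

```python
def segment(words, dictionary, max_ngram=3):
    """Greedy longest-match segmentation over a word sequence: merge the
    longest run of words found in the domain dictionary, else emit one word."""
    tokens, i = [], 0
    while i < len(words):
        for n in range(min(max_ngram, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += n
                break
    return tokens

domain_dict = {"soybean paste"}
text = "large bottle soybean paste".split()
print(segment(text, domain_dict))   # ['large', 'bottle', 'soybean paste']

# Dynamically load a new vocabulary entry, as described above:
domain_dict.add("large bottle")
print(segment(text, domain_dict))   # ['large bottle', 'soybean paste']
```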
According to an embodiment of the present disclosure, the step S702 of performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data to determine the first candidate set of candidate second recalled item data may be implemented as: matching the first recalled item word segmentation data and the third item word segmentation data according to a preset matching rule based on the text length P of the first recalled item data and/or the number Q of word segmentation results of the first recalled item word segmentation data, and determining the first candidate set of candidate second recalled item data. For example, in the catering field, when the first recalled item data includes "soybean paste", the obtained word segmentation results may be "soybean", "paste", and "soybean paste"; that is, the text length P of the first recalled item data is 3 and the number Q of word segmentation results of the first recalled item word segmentation data is 3. The preset matching rules may include a first matching rule or a second matching rule, where the first matching rule may be a fuzzy matching rule and the second matching rule may be an exact matching rule.
According to an embodiment of the present disclosure, the first matching rule determines the candidate second recalled item data by a matching rate. First, the number M of first recalled item word segmentation results that exactly match third item word segmentation results is determined; then, the ratio of M to the number Q of word segmentation results of the first recalled item word segmentation data is calculated as the matching rate (which may be expressed as a percentage), and the third item data corresponding to third item word segmentation data whose matching rate is higher than a preset matching rate is determined as candidate second recalled item data. For example, the first recalled item word segmentation data includes the word segmentation results "soybean", "paste", and "soybean paste", i.e., Q = 3; if the third item word segmentation data includes the word segmentation results "soybean", "large bottle", and "paste", then the first recalled item word segmentation results that exactly match the third item word segmentation results are "soybean" and "paste", i.e., M = 2, giving a matching rate of 66.67%. If the preset matching rate is 80%, the third item data "large-bottle soybean paste" cannot be determined as candidate second recalled item data; if the preset matching rate is 60%, the third item data "large-bottle soybean paste" may be determined as candidate second recalled item data and added to the first candidate set.
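The matching-rate computation above (M exact token matches out of Q recall tokens) can be sketched as:

```python
def matching_rate(recall_tokens, item_tokens):
    """First matching rule: the fraction M/Q of first recalled item word
    segmentation results that exactly match a third item segmentation result."""
    item_set = set(item_tokens)
    m = sum(1 for token in recall_tokens if token in item_set)
    return m / len(recall_tokens)

recall = ["soybean", "paste", "soybean paste"]   # Q = 3
item   = ["soybean", "large bottle", "paste"]    # third item segmentation results
rate = matching_rate(recall, item)               # M = 2 -> 2/3
print(f"{rate:.2%}")                             # 66.67%
assert rate < 0.80  # rejected under an 80% preset matching rate
assert rate > 0.60  # accepted under a 60% preset matching rate
```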
The following table schematically shows a second matching rule according to an embodiment of the disclosure.
[Table: the second (exact) matching rule, defined in terms of the text length P of the first recalled item data and the word segmentation result quantity Q of the first recalled item word segmentation data; the table body survives in this text only as image references.]
In the above table, "match" in the third column means matching against a word segmentation result in the third item word segmentation data. Third item data corresponding to third item word segmentation data that satisfies the second matching rule may be added to the first candidate set.
FIG. 8 illustrates a flow chart for determining a second candidate set of candidate second recalled item data from the first recalled item data utilizing a vector recall model in accordance with an embodiment of the present disclosure. As shown in fig. 8, determining a second candidate set of candidate second recalled item data using a vector recall model according to the first recalled item data includes the following steps S801-S803:
in step S801, obtaining the first recalled item data vector by using the first model according to the first recalled item data;
in step S802, according to the third article data, a second model is used to obtain a third article data vector;
in step S803, a vector recall calculation is performed on the first recalled item data vector and the third item data vector, and a second candidate set of candidate second recalled item data is determined according to the calculation result.
According to an embodiment of the present disclosure, since the first recalled item and the second item are the same type of item, and the second item data vector corresponding to the second item data is obtained based on the first model (a BERT model), the first recalled item data vector can be obtained by inputting the first recalled item data into the trained first model (BERT model).
According to an embodiment of the present disclosure, the third item data vector may be obtained by using the second model; for example, the embedding vector corresponding to the third item is obtained according to the third item's Chinese text name in the third item data identification information. The first model and the second model may be the same model or different models. The second model is not specifically limited in the present disclosure: any model that can convert text information into an embedding vector falls within the protection scope of the embodiments of the present disclosure, for example a Chinese language model (CLM model), a continuous bag-of-words model (CBOW model), a word2vector model, an item2vector model, or a BERT model.
According to the embodiment of the present disclosure, the specific implementation details of the vector recall calculation for the first recalled item data vector and the third item data vector are similar to the specific implementation details of step S203, that is, the vector recall calculation is performed for the first item data vector and the second item data vector through the processor, and therefore, the details are not repeated herein.
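The vector recall calculation referred to above can be sketched in Python as follows. The patent only requires a vector distance between the two vectors; the use of cosine similarity and top-k ranking here is an assumption for illustration, as are the function and parameter names.

```python
import numpy as np

def vector_recall(query_vec, item_vecs, top_k=3):
    """Rank candidate item data vectors by cosine similarity to the
    query (first recalled item data) vector and return the indices of
    the top-k most similar candidates."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(item_vecs, dtype=float)
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:top_k].tolist()

# Two candidates close to the query direction rank ahead of the orthogonal one.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(vector_recall([1.0, 0.0], vecs, top_k=2))  # [0, 1]
```

The calculation result (the ranked similarities) then determines which third item data enter the second candidate set.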
FIG. 9 shows a flow chart for obtaining a third item data vector using a second model based on the third item data according to an embodiment of the disclosure. As shown in fig. 9, the step S802 of obtaining the third article data vector by using the second model according to the third article data includes the following steps S901 to S902:
in step S901, when the third item data matches the first recalled item data, taking the first recalled item data vector as the third item initialization data vector;
in step S902, a third item data vector is obtained by using the second model based on the third item initialization data vector.
It should be understood that the embodiments of the present disclosure are described by taking the second model as an item2vector model as an example, which should not be taken as a limitation of the present disclosure. The item2vector model is derived from the word2vector model, and its architecture is the same as that of the skip-gram model in the word2vector model; the training processes of the item2vector model and the word2vector model are therefore similar, that is, each item (third item data) is treated as a word, so as to obtain the third item data vector.
According to an embodiment of the present disclosure, in order to perform vector recall calculation on the first recalled item data vector and the third item data vector, the two vectors should be in the same vector interval. Since the first recalled item data vector has already been acquired using the first model in step S801, and the third item and the first recalled item are the same type of item, when the third item data matches the first recalled item data (for example, when the first recalled item data includes the third item data, or is the same as the third item data), the already-acquired first recalled item data vector may be used as the third item initialization data vector. This initialization places the first recalled item data vector and the third item data vector in the same vector interval, laying the foundation for vector recall calculation between them. Then, based on the third item initialization data vector, the item2vector model is iterated continuously to obtain the third item data vector.
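The initialization step above can be sketched as follows. This is a simplified illustration of seeding an embedding table before item2vector training; the embedding dimension, random initialization scale, and all names are assumptions, and the subsequent skip-gram iteration itself is omitted.

```python
import numpy as np

def init_item_embeddings(item_names, dim, recalled_vectors, rng=None):
    """Seed the third item embedding table: when a third item matches a
    first recalled item, its row is initialized with the already-obtained
    first recalled item data vector (e.g. a BERT vector), so both vectors
    start in the same vector interval; unmatched items start from small
    random values and are refined by item2vector training afterwards."""
    rng = rng or np.random.default_rng(0)
    emb = rng.normal(scale=0.01, size=(len(item_names), dim))
    for i, name in enumerate(item_names):
        if name in recalled_vectors:          # third item data matches
            emb[i] = recalled_vectors[name]   # reuse the recalled vector
    return emb

recalled = {"soybean paste": np.ones(4)}
emb = init_item_embeddings(["soybean paste", "tomato"], 4, recalled)
print(emb[0])  # [1. 1. 1. 1.]
```

Only the seeding is shown; in the method described, item2vector iterations would then update these rows to produce the final third item data vectors.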
FIG. 10 illustrates a flow chart for determining a third candidate set of candidate second recalled item data from the first recalled item data utilizing a synonym recall model, according to an embodiment of the present disclosure. As shown in fig. 10, the determining a third candidate set of the candidate second recalled item data according to the first recalled item data by using a synonym recall model includes the following steps S1001-S1002:
in step S1001, according to the first recalled item data, acquiring the first recalled item synonym data;
in step S1002, a third candidate set of candidate second recalled item data is determined based on the first recalled item synonym data.
According to an embodiment of the present disclosure, since the first recalled item data may be a synonym of third item data (for example, "pachyrhizus" and "sweet potato" are synonyms), synonym expansion can be performed on the first recalled item data by using a synonym expansion model and/or an open synonym database, so as to obtain one or more pieces of first recalled item synonym data. A third candidate set is then determined based on the first recalled item synonym data. By performing synonym processing, the embodiment of the present disclosure brings synonyms of the first recalled item data into the recall range, thereby improving the accuracy of the recall result.
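The synonym recall above can be sketched minimally as follows. The in-memory synonym table stands in for the synonym expansion model or open synonym database mentioned in the text; its contents and all names are illustrative assumptions.

```python
# Stand-in for a synonym expansion model / open synonym database.
SYNONYMS = {
    "pachyrhizus": {"sweet potato"},
    "sweet potato": {"pachyrhizus"},
}

def expand_synonyms(recall_item, third_items):
    """Expand the first recalled item data with its synonyms, then keep
    any third item data matching the item itself or one of its synonyms;
    the matches form (part of) the third candidate set."""
    names = {recall_item} | SYNONYMS.get(recall_item, set())
    return [t for t in third_items if t in names]

print(expand_synonyms("pachyrhizus", ["sweet potato", "tomato"]))  # ['sweet potato']
```

Without the synonym expansion, "sweet potato" would never be recalled for the query "pachyrhizus", which is exactly the gap the third candidate set closes.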
Fig. 11 illustrates a flow chart for determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data according to an embodiment of the present disclosure. As shown in fig. 11, the determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data includes the following steps S1101-S1103:
in step S1101, determining a third feature value of the candidate second recall item data in the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recall item data;
in step S1102, for the candidate second recalled items belonging to the same category, sorting the candidate second recalled item data of the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data according to the third feature value;
in step S1103, the second recalled item data is determined from the candidate second recalled item data based on the ranking result.
According to an embodiment of the present disclosure, the second recalled item data may be determined by the third feature value of the candidate second recalled item data, such that a third preset number of candidate second recalled item data selected from the first candidate set and/or the second candidate set and/or the third candidate set is determined as the second recalled item data. The third feature value of candidate second recalled item data represents the association relationship between the candidate second recalled item data and the first recalled item data. Since the first candidate set and/or the second candidate set and/or the third candidate set may include the same candidate second recalled item data, the third feature values of the same candidate second recalled item data appearing in different sets may be summed in a weighted manner to determine its overall third feature value. For example, all weights may be set to the same value (e.g., 1), or at least some weights may be set to different values.
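The weighted summation across candidate sets can be sketched as follows. The dict-of-scores representation and all names are assumptions for illustration; per-set weights of 1 reduce this to a plain sum, as the text notes.

```python
from collections import defaultdict

def merge_feature_values(candidate_sets, weights):
    """Weighted summation of third feature values for candidate second
    recalled item data that may appear in several candidate sets.
    `candidate_sets` is a list of dicts (item -> per-set feature value);
    `weights` gives one weight per candidate set."""
    merged = defaultdict(float)
    for w, cset in zip(weights, candidate_sets):
        for item, value in cset.items():
            merged[item] += w * value
    return dict(merged)

sets = [{"egg": 0.5, "pork": 0.25},  # e.g. word-segmentation candidate set
        {"egg": 0.25}]               # e.g. vector-recall candidate set
print(merge_feature_values(sets, [1.0, 1.0]))  # {'egg': 0.75, 'pork': 0.25}
```

An item appearing in two sets ("egg") accumulates a larger overall third feature value than one appearing in a single set, which later favors it in the per-category ranking.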
According to an embodiment of the present disclosure, since the candidate second recalled items may span one or more categories, the candidate second recalled item data of each category may be sorted according to the third feature value, and a fourth preset number of second recalled item data may be determined within each category based on the sorting result. The fourth preset number may be the same for every category or differ between categories: when it is the same, the number of second recalled item data recalled for each category is equal; when it differs, the number recalled for each category is unequal.
According to an embodiment of the present disclosure, the second recalled item data may be selected from the respective categories in proportion, according to the total number of second recalled item data. For example, assume the total number of second recalled item data is 600, and the first, second, and third categories contain 1000, 2000, and 3000 candidate second recalled item data respectively. Then the 100 second recalled item data with the largest third feature values are selected from the first category, 200 from the second category, and 300 from the third category. Selecting second recalled item data across different categories effectively achieves diversity of the recalled categories, avoiding the situation where recalled items are concentrated in one or a few categories while other categories are omitted.
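The proportional split in the worked example can be sketched as:

```python
def category_quotas(category_sizes, total_recall):
    """Split the total number of second recalled item data across
    categories in proportion to each category's candidate count.
    Integer division is an assumption; the patent does not specify
    how fractional quotas are rounded."""
    total = sum(category_sizes.values())
    return {c: total_recall * n // total for c, n in category_sizes.items()}

sizes = {"cat1": 1000, "cat2": 2000, "cat3": 3000}
print(category_quotas(sizes, 600))  # {'cat1': 100, 'cat2': 200, 'cat3': 300}
```

This reproduces the 100/200/300 split of the 600 recalled items described above.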
According to an embodiment of the present disclosure, the number of second recalled item data recalled for each category may be determined according to the entity object's propensity values for different categories. For example, in the catering industry the candidate second recalled items may be raw material items, and the first-level categories of raw material items may include semi-finished products, kitchen supplies, drinks and wine, rice, flour, grain and oil, eggs and meat, convenience goods, vegetables, fruits, aquatic products, frozen seasonings, dry seasonings, and the like, so the candidate second recalled item data can be classified according to the first-level category, obtaining nine classes of candidate second recalled item data. Then, for each class of candidate second recalled item data, the third feature values are arranged in descending order, and a fourth preset number of candidates are selected in that order and determined as second recalled item data, for example, 50 candidates per category. If the merchant is a light restaurant that prefers vegetables and fruits, the fourth preset number for vegetables and fruits may be set to a higher value, such as 100, while the fourth preset number for the other categories is set to a lower value, such as 50.
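The per-category sorting and quota-based selection described above can be sketched as follows. The data layout and names are assumptions; per-category quotas stand in for the propensity-driven fourth preset numbers.

```python
def recall_by_category(candidates, quotas, default_quota=50):
    """Sort each category's candidate second recalled item data by third
    feature value (descending) and take that category's fourth preset
    number of items. `candidates` maps category -> list of
    (item, third_feature_value); `quotas` overrides the default per category."""
    result = {}
    for category, items in candidates.items():
        k = quotas.get(category, default_quota)
        ranked = sorted(items, key=lambda x: x[1], reverse=True)
        result[category] = [name for name, _ in ranked[:k]]
    return result

cands = {"vegetables": [("cabbage", 0.9), ("leek", 0.7), ("ginger", 0.4)],
         "eggs_meat": [("egg", 0.8), ("pork", 0.6)]}
print(recall_by_category(cands, {"vegetables": 2}, default_quota=1))
# {'vegetables': ['cabbage', 'leek'], 'eggs_meat': ['egg']}
```

Raising the quota for a preferred category (here "vegetables") recalls more items from it, as in the light-restaurant example.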
According to an embodiment of the present disclosure, the third feature value is a product of a ranking score of the candidate second recall item data and a weight score, wherein the weight score is a second feature value of second item data corresponding to the candidate second recall item data, and the ranking score is a vector distance of first recall item data corresponding to the candidate second recall item data and the candidate second recall item data.
Fig. 12 is a schematic diagram illustrating an application scenario of a data processing method according to an embodiment of the present disclosure. As shown in fig. 12, the application scenario includes a server 1201 and a mobile phone terminal 1202. For convenience of description, only one server 1201 and one mobile phone terminal 1202 are drawn in fig. 12; it should be understood that this is only an example and not a limitation of the present disclosure, and the number, kind, and connection manner of servers 1201 and mobile phone terminals 1202 may be set according to actual needs, which the present disclosure does not specifically limit. The application scenario is described by taking the catering field as an example, but the disclosure is not limited thereto and is also applicable to other fields.
The recall algorithm in the prior art first acquires, according to the merchant ID, raw material data purchased by the merchant on an Internet platform, and sorts the purchased raw material data in descending order of purchase quantity. Assume the top 10 raw material data are Chinese chives, fresh mushrooms, small herbs, green bean sprouts, small green vegetables, ginger, caraway, chives, cabbage, and white vinegar. Recall calculation is then performed based on these 10 raw material data; assume the 10 top-ranked recalled raw material commodities are potato, tomato, cucumber, Chinese red pepper, chive, green cabbage, mineral water, soybean oil, beverage, and shrimp cake. Because the merchant's historical raw material purchasing behavior tends toward the fruit and vegetable category, the raw material commodities recalled by the prior-art algorithm are biased toward that category.
When the data processing method of the embodiment of the present disclosure is adopted, after the merchant inputs the merchant ID through the mobile phone terminal 1202 and the server 1201 receives it, first item data of the merchant, that is, data of the dishes the merchant sells online, may be obtained from a database. The dish data includes dish Chinese text names and monthly dish order quantities, and may be sorted in descending order of monthly order quantity. Assume the top 10 dish data include sliced meat, poached eggs, green vegetables, vegetarian chicken, braised pork, braised fat sausage, eggs, pork, and braised beef. From this dish data it can be seen that the merchant's best-selling dishes are sliced meat and poached eggs, so the merchant's needs lean more toward pork and eggs than toward melons, fruits, and vegetables, which the prior-art recall algorithm described above cannot match accurately.
According to the embodiment of the present disclosure, after the merchant dish data is obtained, the data processing method adopts a two-layer recall. The first-layer recall recalls the first recalled item data, that is, the raw material data of the dishes, according to the merchant dish data; specifically, vector recall calculation is performed between the merchant dish data vectors and the dish raw material data vectors obtained from the trained first model, so as to obtain the first recalled item data, for example, the top-ranked raw material data of the dishes. A second-layer recall is then performed according to the first recalled item data: the second recalled item data, that is, raw material commodity data provided by the Internet platform where the server 1201 is located, is obtained from the dish raw material data through word segmentation matching, the vector recall model, and the synonym recall model. Through the two-layer recall based on the above 10 dish data, the 10 top-ranked raw material commodity data are obtained, including shredded pork, pork steak, eggs, chicken, green vegetables, large vegetarian chicken, cooked fat intestines, fresh mushrooms, eggs, and beverages. Comparing the merchant's dish data with the recalled raw material commodity data shows a high degree of matching between them; compared with the prior art, the data processing method of the embodiment of the present disclosure can therefore mine the merchant's real requirements and better satisfy them.
As can be seen from the recall result, by integrating the raw materials corresponding to merchant dishes into the recommendation system, the data processing method of the embodiment of the present disclosure establishes an overall recall path from the merchant's dishes, to the raw materials corresponding to those dishes, to the raw material commodities provided by the Internet platform. It reduces the cost of manually labeling the dishes sold by merchants and the raw materials they purchase, performs the recall operation based on the dishes the merchant sells, and implements a recall algorithm that does not depend on the merchant's historical raw material purchase data, thereby improving the stability of the recall architecture, the accuracy of the recommendation system, and the timeliness and relevance of recalling raw material commodities.
Fig. 13 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in fig. 13, the data processing apparatus 1300 includes an obtaining module 1310, an obtaining module 1320, a first determining module 1330, and a second determining module 1340.
The acquisition module 1310 is configured to acquire first item data;
the obtaining module 1320 is configured to obtain, by the processor, a first item data vector from the first item data using the trained first model;
the first determining module 1330 is configured to perform a vector recall calculation on the first item data vector and the second item data vector through the processor, and determine corresponding second item data of the first item data according to a calculation result, where the first item and the corresponding second item have an association relationship and are different types of items;
the second determining module 1340 is configured to determine first recalled item data from the corresponding second item data of the first item data, the first recalled item being the same type of item as the second item.
According to an embodiment of the present disclosure, the training process of the first model includes:
acquiring training data, wherein the training data comprises first article sample data and second article sample data;
and training the first model based on the training data to obtain the trained first model and the second article data vector, wherein the first model comprises a BERT model.
According to an embodiment of the present disclosure, the performing a vector recall calculation on the first item data vector and the second item data vector, and determining corresponding second item data of the first item data according to a calculation result includes:
calculating a first feature value of the second item data vector, wherein the first feature value represents a matching degree of the first item data vector and the second item data vector;
and determining corresponding second item data of the first item data according to the first characteristic value.
According to an embodiment of the present disclosure, said determining first recalled item data from said second item data comprises:
calculating a second feature value of the corresponding second item data according to the first feature value of the corresponding second item data of the plurality of first item data;
and determining the first recalled item data according to the second characteristic value of the corresponding second item data.
According to an embodiment of the present disclosure, further comprising:
a third determining module 1350 configured to determine, by the processor, second recalled item data from a third item data set based on the first recalled item data, the second recalled item being the same type of item as the first recalled item.
According to an embodiment of the present disclosure, said determining second recalled item data from a third item data set from said first recalled item data comprises:
determining a first candidate set of candidate second recall item data through word segmentation matching according to the first recall item data; and/or
Determining a second candidate set of candidate second recalled item data using a vector recall model in accordance with the first recalled item data; and/or
Determining a third candidate set of candidate second recalled item data using a synonym recall model, according to the first recalled item data;
determining the second recalled item data based on a first candidate set and/or a second candidate set and/or a third candidate set of the candidate second recalled item data.
According to an embodiment of the present disclosure, said determining a first candidate set of candidate second recall item data by word segmentation matching from the first recall item data comprises:
performing word segmentation processing on the first recalled article data and the third article data through a word segmentation device to obtain first recalled article word segmentation data and third article word segmentation data;
performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data, and determining a first candidate set of candidate second recalled item data.
According to an embodiment of the present disclosure, said determining a second candidate set of candidate second recalled item data from said first recalled item data utilizing a vector recall model, comprises:
obtaining the first recalled item data vector using the first model in accordance with the first recalled item data;
obtaining a third article data vector by using a second model according to the third article data;
and performing vector recall calculation on the first recalled item data vector and the third item data vector, and determining a second candidate set of candidate second recalled item data according to a calculation result.
According to an embodiment of the present disclosure, obtaining the third item data vector using a second model according to the third item data includes:
when the third item data matches the first recalled item data, taking the first recalled item data vector as the third item initialization data vector;
obtaining the third item data vector using the second model based on the third item initialization data vector.
According to an embodiment of the present disclosure, said determining a third candidate set of candidate second recalled item data from said first recalled item data utilizing a synonym recall model, comprises:
acquiring the synonym data of the first recalled article according to the data of the first recalled article;
determining a third candidate set of the candidate second recalled item data based on the first recalled item synonym data.
According to an embodiment of the present disclosure, said determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data comprises:
determining a third feature value of the candidate second recall item data in the first candidate set and/or the second candidate set and/or the third candidate set of candidate second recall item data;
for the candidate second recalled items belonging to the same category, sorting the candidate second recalled item data of the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data according to the third feature value;
determining the second recall item data from the candidate second recall item data based on the ranking results.
According to an embodiment of the present disclosure, the third feature value is a product of a ranking score of the candidate second recall item data and a weight score, wherein the weight score is a second feature value of second item data corresponding to the candidate second recall item data, and the ranking score is a vector distance of first recall item data corresponding to the candidate second recall item data and the candidate second recall item data.
The present disclosure also discloses an electronic device, and fig. 14 shows a block diagram of the electronic device according to an embodiment of the present disclosure.
As shown in fig. 14, the electronic device 1400 includes a memory 1401 and a processor 1402, wherein:
the memory 1401 is used to store one or more computer instructions which are executed by the processor 1402 to implement the method steps of:
acquiring first article data;
obtaining a first item data vector by using the trained first model according to the first item data through a processor;
vector recall calculation is carried out on the first item data vector and the second item data vector through a processor, corresponding second item data of the first item data are determined according to a calculation result, and the first item and the corresponding second item have an association relationship and are different types of items;
determining first recall item data from the corresponding second item data of the first item data, the first recall item being of the same type as the second item.
Fig. 15 shows a schematic structural diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 15, the computer system 1500 includes a Central Processing Unit (CPU) 1501 which can execute various processes of the above-described embodiments in accordance with a program stored in a Read Only Memory (ROM) 1502 or a program loaded from a storage section 1508 into a Random Access Memory (RAM) 1503. The RAM 1503 also stores various programs and data necessary for the operation of the system 1500. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to each other by a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input portion 1506 including a keyboard, a mouse, and the like; an output portion 1507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN card, a modem, or the like. The communication section 1509 performs communication processing via a network such as the internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1510 as necessary, so that a computer program read out therefrom is mounted into the storage section 1508 as necessary.
In particular, the methods described above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described data processing method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1509, and/or installed from the removable medium 1511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features of similar functions disclosed in (but not limited to) the present disclosure.

Claims (24)

1. A data processing method, comprising:
acquiring first item data;
obtaining, by a processor, a first item data vector using a trained first model according to the first item data;
performing, by a processor, a vector recall calculation on the first item data vector and second item data vectors, determining a first feature value of each second item data vector according to the vector distance between the first item data vector and a plurality of second item data vectors, and determining second item data corresponding to the first item data according to the first feature value, wherein the first item and the corresponding second item have an association relationship and are items of different types; and
calculating a second feature value of the corresponding second item data according to the first feature values of the second item data corresponding to a plurality of pieces of first item data, and determining first recalled item data according to the second feature value of the corresponding second item data, wherein the first recalled item and the second item are items of the same type.
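The recall-and-aggregate flow of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the Euclidean distance, the distance-to-score mapping, the top-k cutoff, and summation as the aggregation rule are all assumptions the claim does not fix.

```python
import numpy as np

def vector_recall(first_vec, second_vecs, top_k=2):
    # First feature value: a similarity derived from the vector distance
    # between the first item vector and each second item vector.
    dists = np.linalg.norm(second_vecs - first_vec, axis=1)
    first_feature = 1.0 / (1.0 + dists)          # closer vectors score higher (assumed mapping)
    order = np.argsort(-first_feature)[:top_k]   # keep the best-matching second items
    return {int(i): float(first_feature[i]) for i in order}

def aggregate_second_feature(recalls_per_first_item):
    # Second feature value: aggregate the first feature values each second
    # item receives across many first items (summation is an assumption).
    scores = {}
    for recall in recalls_per_first_item:
        for idx, fv in recall.items():
            scores[idx] = scores.get(idx, 0.0) + fv
    return scores

# Toy example: three first item vectors clustered near second item 0.
rng = np.random.default_rng(0)
second_vecs = rng.normal(size=(4, 8))
first_vecs = [second_vecs[0] + 0.01 * rng.normal(size=8) for _ in range(3)]
recalls = [vector_recall(v, second_vecs) for v in first_vecs]
second_feature = aggregate_second_feature(recalls)
first_recalled = max(second_feature, key=second_feature.get)  # second item 0 wins
```

Because every query vector sits near second item 0, that item accumulates the largest second feature value and is selected as the first recalled item.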
2. The method of claim 1, wherein the training process of the first model comprises:
acquiring training data, wherein the training data comprises first item sample data and second item sample data; and
training the first model based on the training data to obtain the trained first model and the second item data vectors, wherein the first model comprises a BERT model.
3. The method of claim 1, wherein the first feature value represents the degree of match between the first item data vector and the second item data vector.
4. The method of claim 1, further comprising:
determining, by a processor, second recalled item data from a third item data set according to the first recalled item data, wherein the second recalled item and the first recalled item are items of the same type.
5. The method of claim 4, wherein determining second recalled item data from a third item data set according to the first recalled item data comprises:
determining a first candidate set of candidate second recalled item data through word segmentation matching according to the first recalled item data; and/or
determining a second candidate set of candidate second recalled item data using a vector recall model according to the first recalled item data; and/or
determining a third candidate set of candidate second recalled item data using a synonym recall model according to the first recalled item data; and
determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data.
6. The method of claim 5, wherein determining the first candidate set of candidate second recalled item data through word segmentation matching according to the first recalled item data comprises:
performing word segmentation on the first recalled item data and third item data through a tokenizer to obtain first recalled item word segmentation data and third item word segmentation data; and
performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data to determine the first candidate set of the candidate second recalled item data.
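Word segmentation matching as in claim 6 might look like the sketch below. Whitespace splitting stands in for a real tokenizer (a Chinese word segmenter in the original context), and the token-overlap threshold is an assumption.

```python
def tokenize(text):
    # Stand-in for a real word segmenter; splits on whitespace.
    return set(text.lower().split())

def segment_match_candidates(recalled_name, third_items, min_overlap=1):
    # Candidate second recalled items are third items whose token sets
    # overlap the first recalled item's tokens (threshold is an assumption).
    recalled_tokens = tokenize(recalled_name)
    candidates = []
    for name in third_items:
        overlap = recalled_tokens & tokenize(name)
        if len(overlap) >= min_overlap:
            candidates.append(name)
    return candidates

third_items = ["spicy beef noodles", "beef fried rice", "vegetable salad"]
cands = segment_match_candidates("beef noodles", third_items)
# both beef dishes share at least one token with the query; the salad shares none
```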
7. The method of claim 5, wherein determining the second candidate set of candidate second recalled item data using a vector recall model according to the first recalled item data comprises:
obtaining a first recalled item data vector using the first model according to the first recalled item data;
obtaining a third item data vector using a second model according to the third item data; and
performing vector recall calculation on the first recalled item data vector and the third item data vector, and determining the second candidate set of candidate second recalled item data according to the calculation result.
8. The method of claim 7, wherein obtaining the third item data vector using the second model according to the third item data comprises:
taking the first recalled item data vector as a third item initialization data vector when the third item data matches the first recalled item data; and
obtaining the third item data vector using the second model based on the third item initialization data vector.
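The warm-start initialization of claim 8 can be illustrated as below: when a third item matches a first recalled item, the recalled item's vector is reused as the initialization before the second model refines it. The random fallback for unmatched items is an assumption.

```python
import numpy as np

def init_third_item_vectors(third_items, recalled_vectors, dim=8, seed=0):
    # recalled_vectors: mapping from first recalled item name to its vector.
    rng = np.random.default_rng(seed)
    init = {}
    for name in third_items:
        if name in recalled_vectors:
            init[name] = recalled_vectors[name].copy()  # matched: reuse recalled vector
        else:
            init[name] = rng.normal(size=dim)           # unmatched: fresh init (assumed)
    return init

recalled_vectors = {"beef noodles": np.ones(8)}
init = init_third_item_vectors(["beef noodles", "vegetable salad"], recalled_vectors)
```

The second model would then start from `init` rather than from scratch, so matched third items inherit the geometry already learned for the recalled items.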
9. The method of claim 5, wherein determining the third candidate set of candidate second recalled item data using a synonym recall model according to the first recalled item data comprises:
acquiring first recalled item synonym data according to the first recalled item data; and
determining the third candidate set of the candidate second recalled item data based on the first recalled item synonym data.
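A synonym recall as in claim 9 could be sketched like this; the `synonym_table` lookup and the substring matching are hypothetical stand-ins for the synonym recall model, which the claim does not specify.

```python
def synonym_candidates(recalled_name, synonym_table, third_items):
    # Expand the first recalled item name with its synonyms, then collect
    # third items that contain any of the expanded names.
    names = {recalled_name} | set(synonym_table.get(recalled_name, []))
    return [item for item in third_items if any(s in item for s in names)]

synonym_table = {"soda": ["pop", "soft drink"]}
syn_cands = synonym_candidates("soda", synonym_table,
                               ["orange pop", "milk tea", "soda water"])
# "orange pop" matches via the synonym "pop"; "soda water" matches directly
```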
10. The method of claim 5, wherein determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data comprises:
determining a third feature value for the candidate second recalled item data in the first candidate set and/or the second candidate set and/or the third candidate set;
for candidate second recalled items belonging to the same category, ranking the candidate second recalled item data of the first candidate set and/or the second candidate set and/or the third candidate set according to the third feature value; and
determining the second recalled item data from the candidate second recalled item data based on the ranking result.
11. The method of claim 10, wherein the third feature value is the product of a ranking score and a weight score of the candidate second recalled item data, wherein the weight score is the second feature value of the second item data to which the candidate second recalled item data corresponds, and the ranking score is the vector distance between the first recalled item data corresponding to the candidate second recalled item data and the candidate second recalled item data.
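The scoring of claim 11 (third feature value = ranking score × weight score, ranked per category) can be sketched as follows. Descending order of the product is an assumption, since the claim fixes the formula but not the sort direction.

```python
def rank_candidates(candidates):
    # candidates: dicts with 'name', 'category', 'ranking_score'
    # (vector-distance derived) and 'weight_score' (the second feature
    # value of the corresponding second item).
    by_category = {}
    for c in candidates:
        c = dict(c, third_feature=c["ranking_score"] * c["weight_score"])
        by_category.setdefault(c["category"], []).append(c)
    for cat in by_category:
        # rank within each category by the third feature value (descending assumed)
        by_category[cat].sort(key=lambda c: c["third_feature"], reverse=True)
    return by_category

ranked = rank_candidates([
    {"name": "a", "category": "noodles", "ranking_score": 0.9, "weight_score": 1.0},
    {"name": "b", "category": "noodles", "ranking_score": 0.5, "weight_score": 3.0},
])
# "b" outranks "a" because 0.5 * 3.0 > 0.9 * 1.0
```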
12. A data processing apparatus, characterized by comprising:
an acquisition module configured to acquire first item data;
an obtaining module configured to obtain, by a processor, a first item data vector using a trained first model according to the first item data;
a first determining module configured to perform, by a processor, a vector recall calculation on the first item data vector and second item data vectors, determine a first feature value of each second item data vector according to the vector distance between the first item data vector and a plurality of second item data vectors, and determine second item data corresponding to the first item data according to a first feature value threshold or a ranking order, wherein the first item and the corresponding second item have an association relationship and are items of different types; and
a second determining module configured to calculate a second feature value of the corresponding second item data according to the first feature values of the second item data corresponding to a plurality of pieces of first item data, and determine first recalled item data according to the second feature value of the corresponding second item data, wherein the first recalled item and the second item are items of the same type.
13. The apparatus of claim 12, wherein the training process of the first model comprises:
acquiring training data, wherein the training data comprises first item sample data and second item sample data; and
training the first model based on the training data to obtain the trained first model and the second item data vectors, wherein the first model comprises a BERT model.
14. The apparatus of claim 12, wherein the first feature value represents the degree of match between the first item data vector and the second item data vector.
15. The apparatus of claim 12, further comprising:
a third determining module configured to determine, by the processor, second recalled item data from a third item data set according to the first recalled item data, wherein the second recalled item and the first recalled item are items of the same type.
16. The apparatus of claim 15, wherein determining second recalled item data from a third item data set according to the first recalled item data comprises:
determining a first candidate set of candidate second recalled item data through word segmentation matching according to the first recalled item data; and/or
determining a second candidate set of candidate second recalled item data using a vector recall model according to the first recalled item data; and/or
determining a third candidate set of candidate second recalled item data using a synonym recall model according to the first recalled item data; and
determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data.
17. The apparatus of claim 16, wherein determining the first candidate set of candidate second recalled item data through word segmentation matching according to the first recalled item data comprises:
performing word segmentation on the first recalled item data and third item data through a tokenizer to obtain first recalled item word segmentation data and third item word segmentation data; and
performing word segmentation matching based on the first recalled item word segmentation data and the third item word segmentation data to determine the first candidate set of the candidate second recalled item data.
18. The apparatus of claim 16, wherein determining the second candidate set of candidate second recalled item data using a vector recall model according to the first recalled item data comprises:
obtaining a first recalled item data vector using the first model according to the first recalled item data;
obtaining a third item data vector using a second model according to the third item data; and
performing vector recall calculation on the first recalled item data vector and the third item data vector, and determining the second candidate set of candidate second recalled item data according to the calculation result.
19. The apparatus of claim 18, wherein obtaining the third item data vector using the second model according to the third item data comprises:
taking the first recalled item data vector as a third item initialization data vector when the third item data matches the first recalled item data; and
obtaining the third item data vector using the second model based on the third item initialization data vector.
20. The apparatus of claim 16, wherein determining the third candidate set of candidate second recalled item data using a synonym recall model according to the first recalled item data comprises:
acquiring first recalled item synonym data according to the first recalled item data; and
determining the third candidate set of the candidate second recalled item data based on the first recalled item synonym data.
21. The apparatus of claim 16, wherein determining the second recalled item data based on the first candidate set and/or the second candidate set and/or the third candidate set of the candidate second recalled item data comprises:
determining a third feature value for the candidate second recalled item data in the first candidate set and/or the second candidate set and/or the third candidate set;
for candidate second recalled items belonging to the same category, ranking the candidate second recalled item data of the first candidate set and/or the second candidate set and/or the third candidate set according to the third feature value; and
determining the second recalled item data from the candidate second recalled item data based on the ranking result.
22. The apparatus of claim 21, wherein the third feature value is the product of a ranking score and a weight score of the candidate second recalled item data, wherein the weight score is the second feature value of the second item data to which the candidate second recalled item data corresponds, and the ranking score is the vector distance between the first recalled item data corresponding to the candidate second recalled item data and the candidate second recalled item data.
23. An electronic device comprising a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of:
acquiring first item data;
obtaining, by a processor, a first item data vector using a trained first model according to the first item data;
performing, by a processor, a vector recall calculation on the first item data vector and second item data vectors, determining a first feature value of each second item data vector according to the vector distance between the first item data vector and a plurality of second item data vectors, and determining second item data corresponding to the first item data according to the first feature value, wherein the first item and the corresponding second item have an association relationship and are items of different types; and
calculating a second feature value of the corresponding second item data according to the first feature values of the second item data corresponding to a plurality of pieces of first item data, and determining first recalled item data according to the second feature value of the corresponding second item data, wherein the first recalled item and the second item are items of the same type.
24. A computer-readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, carry out the method steps of any of claims 1-11.
CN201911114938.5A 2019-11-14 2019-11-14 Data processing method and device, electronic equipment and computer readable storage medium Active CN110851571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911114938.5A CN110851571B (en) 2019-11-14 2019-11-14 Data processing method and device, electronic equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110851571A CN110851571A (en) 2020-02-28
CN110851571B (en) 2022-11-25

Family

ID=69600649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911114938.5A Active CN110851571B (en) 2019-11-14 2019-11-14 Data processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110851571B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563198B (en) * 2020-04-16 2023-07-25 百度在线网络技术(北京)有限公司 Material recall method, device, equipment and storage medium
CN112559868A (en) * 2020-12-17 2021-03-26 广州博冠信息科技有限公司 Information recall method and device, storage medium and electronic equipment
CN113704480B (en) * 2021-11-01 2022-01-25 成都我行我数科技有限公司 Intelligent minimum stock unit matching method
CN115983921B (en) * 2022-12-29 2023-11-14 广州市玄武无线科技股份有限公司 Off-line store commodity association combination method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944986A (en) * 2017-12-28 2018-04-20 广东工业大学 O2O commodity recommendation method, system and device
CN109446412A (en) * 2018-09-25 2019-03-08 中国平安人寿保险股份有限公司 Product data pushing method, apparatus, device and medium based on web page tags
CN109815392A (en) * 2018-12-17 2019-05-28 北京三快在线科技有限公司 Merchant display recall method, apparatus, electronic device and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103123649B (en) * 2013-01-29 2016-04-20 广州一找网络科技有限公司 Microblog-based message searching method and system
CN106202394B (en) * 2016-07-07 2021-03-19 腾讯科技(深圳)有限公司 Text information recommendation method and system
CN107491518B (en) * 2017-08-15 2020-08-04 北京百度网讯科技有限公司 Search recall method and device, server and storage medium
CN109902227A (en) * 2019-01-25 2019-06-18 广州富港万嘉智能科技有限公司 Food material recommendation method and device
CN110362740B (en) * 2019-06-10 2022-03-08 河海大学 Hybrid recommendation method for water conservancy portal information


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Emotion detection through fusion of complementary facial features; Neeru Rathee et al.; 2017 7th International Conference on Communication Systems and Network Technologies (CSNT); 2018-07-26; pp. 163-166 *
Early-stage information feature extraction of fault arcs based on higher-order spectral estimation; Yang Jianhong et al.; Journal of Huaiyin Institute of Technology; 2008-10-15; Vol. 17, No. 5; pp. 47-50 *

Also Published As

Publication number Publication date
CN110851571A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110851571B (en) Data processing method and device, electronic equipment and computer readable storage medium
Jia Behind the ratings: Text mining of restaurant customers’ online reviews
CN107992583B (en) Information pushing method, information pushing device, equipment and storage medium
CN108648059B (en) Order recommendation method and device
CN110298725A (en) Recommended method, device, electronic equipment and the readable storage medium storing program for executing of grouping of commodities
Kazama et al. A neural network system for transformation of regional cuisine style
KR102309855B1 (en) B grade agricultural product trading platform system
Park et al. Kitchenette: Predicting and recommending food ingredient pairings using siamese neural networks
Idris et al. Comparison of Apriori, Apriori-TID and FP-Growth Algorithms in Market Basket Analysis at Grocery Stores
CN111737473B (en) Text classification method, device and equipment
CN110929764A (en) Picture auditing method and device, electronic equipment and storage medium
CN106027585A (en) Food recommending method and device
Ruede et al. Multi-task learning for calorie prediction on a novel large-scale recipe dataset enriched with nutritional information
Li et al. Automating tourism online reviews: a neural network based aspect-oriented sentiment classification
US20210287275A1 (en) Value based commodity selection
Shin et al. Analysis on review data of restaurants in Google Maps through text mining: Focusing on sentiment analysis
CN109472025B (en) Dish name extraction method and device
Iswari Fruitylicious: Mobile application for fruit ripeness determination based on fruit image
CN108416628B (en) Restaurant dish intelligent recommendation system integrating food multi-attribute relationship
Ribeiro et al. A comparison of machine learning algorithms for predicting consumer responses based on physical, chemical, and physical–chemical data of fruits
CN112016582B (en) Dish recommending method and device
Huang et al. Rapid screening of sensory attributes of mackerel using big data mining techniques and rapid sensory evaluation methods
CN110019983A (en) Extended method, device and the electronic equipment of label construction
Tang et al. Healthy Recipe Recommendation using Nutrition and Ratings Models
Kazama et al. Sukiyaki in French style: a novel system for transformation of dietary patterns

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant