CN112016407B - Digital menu generation method and device suitable for intelligent kitchen system - Google Patents


Info

Publication number
CN112016407B
CN112016407B (Application CN202010790142.8A)
Authority
CN
China
Prior art keywords
target objects
target
objects
sequence
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010790142.8A
Other languages
Chinese (zh)
Other versions
CN112016407A (en)
Inventor
李�赫
孙雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ruying Intelligent Technology Co ltd
Original Assignee
Beijing Ruying Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ruying Intelligent Technology Co ltd filed Critical Beijing Ruying Intelligent Technology Co ltd
Priority to CN202010790142.8A priority Critical patent/CN112016407B/en
Publication of CN112016407A publication Critical patent/CN112016407A/en
Application granted granted Critical
Publication of CN112016407B publication Critical patent/CN112016407B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635: Overlay text, e.g. embedded captions in a TV program
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/68: Food, e.g. fruit or vegetables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Abstract

The invention discloses a digital menu generation method and device suitable for an intelligent kitchen system, used to realize rapid and accurate generation of digital menus. The method comprises the following steps: performing multi-modal feature recognition on an artificial cooking video to identify target objects, where a target object comprises articles and/or actions; initially ordering the target objects according to the playing order of the artificial cooking video; when two target objects match the target objects defined by a preset association relation but do not match the order defined by that relation, updating the order of the two target objects; and generating a digital menu suitable for a robotic arm from the reordered target objects.

Description

Digital menu generation method and device suitable for intelligent kitchen system
Technical Field
The invention relates to the technical field of computers and communication, in particular to a digital menu generation method and device suitable for an intelligent kitchen system.
Background
With the development of science and technology, artificial intelligence has entered people's lives, and the kitchen in particular. Artificial-intelligence devices such as intelligent cooking appliances and kitchen robotic arms can increasingly replace manual work. These devices can only work if a digital menu is provided: a kitchen robotic arm completes the cooking actions by executing the digital menu. How to generate digital menus quickly is a problem to be solved in the industry.
Disclosure of Invention
The invention provides a digital menu generation method and device suitable for an intelligent kitchen system, to realize rapid and accurate generation of digital menus.
The invention provides a digital menu generation method suitable for an intelligent kitchen system, comprising the following steps:
performing multi-modal feature recognition on the artificial cooking video to identify target objects, where a target object comprises articles and/or actions;
initially ordering the target objects according to the playing order of the artificial cooking video;
when two target objects match the target objects defined by a preset association relation but do not match the order defined by that relation, updating the order of the two target objects;
and generating a digital menu suitable for the robotic arm from the reordered target objects.
The technical scheme provided by the embodiments of the invention can include the following beneficial effects: the order of the target objects is adjusted according to the characteristics of the robotic arm, so the generated digital menu is easier for the robotic arm to execute smoothly.
Optionally, before updating the order of the two target objects, the method further includes:
determining the time of the target object in the video;
and determining each pair of target objects whose time distance is smaller than a preset time distance threshold.
The technical scheme provided by the embodiments of the invention can include the following beneficial effects: the correlation between target objects is determined by their time distance, so whether the order needs to be adjusted can be judged accurately, and the resulting order is accurate.
Optionally, the time distance threshold is 0;
before updating the order of the two target objects, the method further comprises:
determining the position distance of the two target objects in the artificial cooking video;
two target objects with a position distance less than a preset position distance threshold are determined.
The technical scheme provided by the embodiments of the invention can include the following beneficial effects: the correlation between target objects is determined by their position distance, so whether the order needs to be adjusted can be judged accurately, and the resulting order is accurate.
Optionally, at least one of the two target objects is an article;
the method further comprises the steps of:
determining the article category to which the target object that is an article belongs;
the updating of the order of the two target objects when they match the target objects defined by the preset association relation but do not match the order defined by that relation comprises:
updating the order of the two target objects when the target object that is an article matches an article category defined by the preset association relation and the two target objects do not match the order defined by that relation.
The technical scheme provided by the embodiments of the invention can include the following beneficial effects: matching articles by article category makes the updated order more reasonable.
Optionally, the multi-modal feature recognition is performed on the artificial cooking video, including at least one of:
image recognition is carried out on the artificial cooking video;
performing voice recognition on the artificial cooking video;
and performing subtitle recognition on the manual cooking video.
The technical scheme provided by the embodiment of the invention can comprise the following beneficial effects: the embodiment provides a plurality of identification modes to acquire more information.
Optionally, when at least two multi-modal feature identifications are performed on the artificial cooking video, the method further includes:
and fusing with each other the articles and actions respectively obtained from the at least two multi-modal feature recognitions.
The technical scheme provided by the embodiment of the invention can comprise the following beneficial effects: in the embodiment, various information can be mutually supplemented and fused, so that a more complete and accurate digital menu can be generated.
The invention provides a digital menu generating device suitable for an intelligent kitchen system, comprising:
the identification module is used for performing multi-modal feature recognition on the artificial cooking video and identifying target objects, where a target object comprises articles and/or actions;
the ordering module is used for initially ordering the target objects according to the playing order of the artificial cooking video;
the updating module is used for updating the order of two target objects when they match the target objects defined by a preset association relation but do not match the order defined by that relation;
and the generation module is used for generating a digital menu suitable for the robotic arm from the reordered target objects.
Optionally, the apparatus further includes:
a time module for determining a time of the target object in the video;
the first determining module is used for determining every two target objects with the time distance smaller than a preset time distance threshold.
Optionally, the time distance threshold is 0;
the apparatus further comprises:
the position module is used for determining the position distance of the two target objects in the artificial cooking video;
and the second determining module is used for determining two target objects with the position distances smaller than a preset position distance threshold value.
Optionally, at least one target object of the two target objects is an article;
the apparatus further comprises:
the category module is used for determining the category of the object to which the target object of the object belongs;
the updating module comprises:
and the updating sub-module is used for updating the order of the two target objects when the target object that is an article matches an article category defined by the preset association relation and the two target objects do not match the order defined by that relation.
Optionally, the identification module includes at least one of:
the image recognition sub-module is used for carrying out image recognition on the manual cooking video;
the voice recognition sub-module is used for carrying out voice recognition on the manual cooking video;
and the subtitle identification sub-module is used for identifying the subtitle of the manual cooking video.
Optionally, when at least two multi-modal feature identifications are performed on the artificial cooking video, the apparatus further includes:
and the fusion module is used for fusing with each other the articles and actions respectively obtained from the at least two multi-modal feature recognitions.
The invention provides a digital menu generating device suitable for an intelligent kitchen system, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
performing multi-modal feature recognition on the artificial cooking video to identify target objects, where a target object comprises articles and/or actions;
initially ordering the target objects according to the playing order of the artificial cooking video;
when two target objects match the target objects defined by a preset association relation but do not match the order defined by that relation, updating the order of the two target objects;
and generating a digital menu suitable for the robotic arm from the reordered target objects.
The present invention provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of the method.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a digital recipe generation method suitable for an intelligent kitchen system in an embodiment of the invention;
FIG. 2 is a flowchart of a digital recipe generation method suitable for an intelligent kitchen system in an embodiment of the present invention;
FIG. 3 is a flowchart of a digital recipe generation method suitable for an intelligent kitchen system in accordance with an embodiment of the present invention;
FIG. 4 is a block diagram of a digital recipe generating apparatus suitable for an intelligent kitchen system in accordance with an embodiment of the present invention;
FIG. 5 is a block diagram of a digital recipe generating apparatus suitable for an intelligent kitchen system in accordance with an embodiment of the present invention;
FIG. 6 is a block diagram of a digital recipe generating apparatus suitable for an intelligent kitchen system in accordance with an embodiment of the present invention;
FIG. 7 is a block diagram of a digital recipe generating apparatus suitable for an intelligent kitchen system in accordance with an embodiment of the present invention;
FIG. 8 is a block diagram of an update module in an embodiment of the invention;
FIG. 9 is a block diagram of an identification module in an embodiment of the invention;
FIG. 10 is a block diagram of a digital menu generating apparatus suitable for an intelligent kitchen system according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
In the related art, a digital menu suitable for a robotic arm has to be written as code by professional technicians. However, programmers may be unfamiliar with cooking, and writing the code takes a great deal of time and manpower. Schemes for automatically generating digital menus may exist, but the digital menus they generate may not be accurate enough.
To solve the above problems, the inventors of the present application observed that many food programs show the cooking processes of chefs; by recognizing and analyzing these cooking videos, digital menus suitable for a robotic arm can be generated automatically. The execution order in the digital menu is adjusted according to association relations, which makes the menu more accurate and lets the robotic arm complete cooking more smoothly.
Referring to fig. 1, the digital menu generating method suitable for the intelligent kitchen system in the present embodiment includes:
step 101: the multi-modal feature recognition is performed on the artificial cooking video to identify a target object, the target object comprising an item and/or an action.
Step 102: and performing primary sequencing on the target objects according to the playing sequence of the artificial cooking video.
Step 103: and updating the sequence of the two target objects when the two target objects conform to the target objects defined by the preset association relation and do not conform to the sequence defined by the association relation.
Step 104: and generating a digital menu suitable for the mechanical arm according to the target object after the updating sequence.
In view of the control characteristics of the robotic arm, this embodiment performs multi-modal feature recognition on the artificial cooking video. The multi-modal features may include color features, texture features, edge features, audio features, text features, and the like, and articles and actions can be identified by recognizing one or more of these features. Unlike a digital menu intended for an intelligent cooking pot, a robotic arm can imitate human action behavior and automatically prepare and plate dishes, which is its characteristic advantage. This embodiment therefore identifies not only the articles in the artificial cooking video but also the cooking actions of the person, for example: stirring (eggs), cutting vegetables, and so on.
During multi-modal feature recognition, the order in which the recognition results (articles and actions) are obtained is not necessarily the playing order of the video. Therefore, this embodiment also orders the articles and actions according to the playing order of the video; the resulting sequence better matches the cooking process, and the robotic arm grabs articles and executes actions in that order to complete the cooking process better.
In this embodiment, the target objects are initially ordered by the playing order of the video, but that ordering is not necessarily suitable. For example, an apple appears in the video first, then a bowl, and then the apple is placed in the bowl. Ordering by the video gives apple-bowl, which is inconvenient for the robotic arm: grabbing the bowl first and then the apple, with the bowl's position as the apple's end position, is easier for the arm to execute. In view of this, this embodiment preconfigures association relations, each comprising a correspondence and an order relation between target object 1 and target object 2. Specifically, an association can relate article 1 to article 2, action 1 to action 2, or article 1 to action 1. For example, when oil is poured into the pan in the video, the chef says "pour in the oil once the pan has been heated until it is about 80% hot". The order of the recognition results is: article (oil) - action (pour oil) - action (turn on the heat). The order updated by the association relation is: action (turn on the heat) - article (oil) - action (pour oil).
First it is judged whether two target objects have a preset correspondence. If they do, it is judged whether their order matches the order relation in the association; if not, the order of the two target objects is swapped according to the order relation defined by the association. If there is no correspondence, or the order relation is already satisfied, the current order of the two target objects is kept. The reordered sequence better suits the operation of the robotic arm, and the digital menu generated from it is more accurate.
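As a rough illustration only (not the patent's implementation), the initial ordering and association-based swapping can be sketched in Python; the target-object representation and the single example association rule are assumptions:

```python
from dataclasses import dataclass

@dataclass
class TargetObject:
    name: str   # e.g. "bowl", "apple", "pour oil"
    kind: str   # "article" or "action"
    frame: int  # video frame at which it was recognized

# Hypothetical association rule: (first, second) means the robotic
# arm should handle `first` before `second` in the digital menu.
ASSOCIATIONS = {("bowl", "apple")}  # grab the bowl before the apple

def order_for_arm(objects):
    """Order by playing time, then swap adjacent pairs that
    violate an association rule (a single-pass sketch)."""
    seq = sorted(objects, key=lambda o: o.frame)  # initial ordering
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if (b.name, a.name) in ASSOCIATIONS:      # wrong order: swap
            seq[i], seq[i + 1] = b, a
    return seq
```

With an apple recognized at frame 10 and a bowl at frame 20, the initial order apple-bowl is swapped to bowl-apple, matching the apple-and-bowl example in the description.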
In this embodiment, executable commands applicable to the robotic arm are generated for the obtained articles and actions, and the commands are packaged to produce a digital menu suitable for the robotic arm. Inputting the digital menu into the arm's control system controls the robotic arm to complete the cooking process corresponding to the menu, thereby finishing the dish.
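A minimal sketch of this packaging step; the command vocabulary ("grab"/"perform") and the menu format are illustrative assumptions, not anything specified by the patent:

```python
def to_commands(ordered_objects):
    """Map ordered articles/actions to hypothetical arm commands
    and package them as a digital menu."""
    commands = []
    for obj in ordered_objects:
        if obj["kind"] == "article":
            commands.append({"op": "grab", "target": obj["name"]})
        else:
            commands.append({"op": "perform", "action": obj["name"]})
    return {"recipe": commands}  # the packaged digital menu

menu = to_commands([
    {"kind": "article", "name": "bowl"},
    {"kind": "article", "name": "apple"},
    {"kind": "action", "name": "place apple in bowl"},
])
```

In this sketch the menu is a plain dictionary; a real control system would define its own serialized instruction format.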
This embodiment realizes intelligent analysis of real cooking videos and automatically generates digital menus suitable for robotic arms. Generating menus this way saves a great deal of manpower and is fast, which makes batch generation of digital menus convenient.
The articles in this embodiment include food (including seasonings and the like), vessels (such as trays and bowls), tools (such as stirrers), external devices (such as pans, electronic scales, range hoods, and cookers), and so on.
Optionally, before updating the order of the two target objects, the method further includes: step A1-step A2.
Step A1: a time of the target object in the video is determined.
Step A2: and determining every two target objects of which the time distances are smaller than a preset time distance threshold.
The time in this embodiment may be a relative time within the video, or may be represented by a video frame number or an audio frame number.
In this embodiment, the time distance threshold may be 0 second, 0.1 second, 1 second, 5 seconds, or 0 frame, 3 frames, 24 frames, 240 frames, or the like. Wherein 0 seconds and 0 frames represent that two target objects appear in the same frame.
For example, article 1 appears in the video before article 2. If the number of video frames between article 1 and article 2 is less than a preset frame-count threshold (e.g., 240 frames), the two articles are determined to be close in time. As another example, if the relative time between article 1 and article 2 is less than a preset duration threshold (e.g., 10 seconds), they are determined to be close in time. If article 1 appears in a succession of video frames, its last frame is taken as the time of article 1; if article 2 appears in a succession of video frames, its first frame is taken as the time of article 2. That is, the minimum time distance between article 1 and article 2 is taken as their time distance.
An action occurs over a succession of video frames; for example, action 1 corresponds to frames 240 to 480. When determining the temporal distance between an action and an article, the minimum temporal distance is used. For example, if article 1 appears in frame 220, the temporal distance between action 1 and article 1 is determined from frames 220 and 240. If article 1 appears in frame 500, it is determined from frames 480 and 500. If article 1 appears between frames 240 and 480, the temporal distance is 0. As another example, if action 2 corresponds to frames 500 to 720, the temporal distance between action 1 and action 2 is determined from frames 480 and 500; and if the video frames of action 1 and action 2 overlap, the temporal distance is considered to be 0.
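The frame-interval arithmetic above can be sketched by representing each target object as a [first, last] frame range (this representation is an assumption; the patent only describes the rule in prose):

```python
def temporal_distance(a, b):
    """Minimum frame gap between two (first, last) frame ranges.
    Overlapping ranges have distance 0, as in the description."""
    a_first, a_last = a
    b_first, b_last = b
    if a_last < b_first:        # a ends before b starts
        return b_first - a_last
    if b_last < a_first:        # b ends before a starts
        return a_first - b_last
    return 0                    # ranges overlap

# Worked examples from the description: action 1 spans frames 240-480.
assert temporal_distance((240, 480), (220, 220)) == 20   # article at frame 220
assert temporal_distance((240, 480), (500, 500)) == 20   # article at frame 500
assert temporal_distance((240, 480), (300, 300)) == 0    # article inside the span
assert temporal_distance((240, 480), (500, 720)) == 20   # action 2 at 500-720
```

An article that appears in a single frame is simply the degenerate range (f, f).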
In this embodiment, for any two target objects it is judged whether their time distance is smaller than the time distance threshold; when it is, it is judged whether the two target objects satisfy the correspondence defined by the association relation; and when they do, it is judged whether they satisfy the order relation defined by the association. All recognized target objects are traversed, judging for each pair whether the time distance is below the threshold. The target objects can be traversed in their current order, so that adjustments and updates are applied sequentially. Two target objects whose distance is not below the threshold are considered unrelated, and their current order can be kept. After the traversal, executable commands are generated from the target objects in the current order, and the digital menu is then generated.
Optionally, the time distance threshold is 0.
Before updating the order of the two target objects, the method further comprises: step B1-step B2.
Step B1: and determining the position distance of the two target objects in the artificial cooking video.
Step B2: two target objects with a position distance less than a preset position distance threshold are determined.
In this embodiment, when the time distance between two target objects is 0, the two target objects appear in the same video frame. In this case, their correlation can be judged by their position distance in the image: the closer the positions, the greater the correlation, and whether the preset association relation is met can then be judged. If the positions are far apart, the correlation is small; the association relation is not checked, and the current order is kept.
For example, article 1 and article 2 appear in the same frame with close position coordinates, such as stacked on each other (food placed on a chopping board or in a dish) or placed side by side, i.e., their vertical or horizontal coordinates are close or identical. Whether to swap articles 1 and 2 is then decided by their order in the association relation.
As another example, an action and its related articles are determined from the video; the related articles include the tool used and the object operated on. For cutting vegetables, the related articles include the kitchen knife and the food material. The gesture and position at which the action occurs are associated with the positions of the related articles, i.e., the coordinates are close and they appear in the same video frame. Whether to swap the related articles and the action is decided by the order configured for them in the association relation.
In addition, when two articles, or an article and an action, appear in the same video frame, there is no inherent order between them, and their order can then be determined by the association relation.
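A sketch of the same-frame position check; bounding-box centers, Euclidean distance, and the threshold value are assumptions, since the patent does not fix a particular metric:

```python
import math

def position_distance(box_a, box_b):
    """Distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx, by = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return math.hypot(ax - bx, ay - by)

def related_in_frame(box_a, box_b, threshold=100.0):
    """Two same-frame objects are considered related (and the
    association relation is consulted) if their centers are close."""
    return position_distance(box_a, box_b) < threshold
```

For stacked articles the centers nearly coincide, so the distance is small and the association relation is consulted; for articles at opposite ends of the frame the check fails and the current order is kept.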
Optionally, at least one of the two target objects is an article.
The method further comprises the steps of: step C1.
Step C1: an item category to which a target object of an item belongs is determined.
The step 103 includes: step C2.
Step C2: and updating the sequence of the two target objects when the target objects of the objects conform to the object categories defined by the preset association relation and the two target objects do not conform to the sequence defined by the association relation.
This embodiment can judge whether the order of two articles should be swapped according to article category. For example, a food material appears in the video first and a vessel appears afterwards, and the food material needs to be put into the vessel; that is, the vessel's position is the end position for grabbing the food material. The association relation therefore specifies vessel before food material, i.e., the robotic arm grabs the vessel first and then the food material, so the order of the food material and the vessel needs to be swapped.
In this embodiment, when the article category of a target object belongs to a category defined by the association relation, the target object is deemed to match the target object defined by that relation. Classifying articles by category simplifies the configuration and storage of association relations, and simplifies matching articles against them.
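Category-level rules can be stored once per category pair rather than once per concrete article, which is the simplification described above. The category names and rule format below are illustrative assumptions:

```python
# One rule per category pair instead of one per concrete article.
CATEGORY_OF = {"apple": "food", "bowl": "vessel", "tray": "vessel"}
# (first_category, second_category): the arm handles `first` before `second`.
CATEGORY_RULES = {("vessel", "food")}  # grab the vessel, then the food

def should_swap(first_name, second_name):
    """True if `first` currently precedes `second` but a category
    rule says the reverse order suits the robotic arm."""
    a = CATEGORY_OF.get(first_name)
    b = CATEGORY_OF.get(second_name)
    return (b, a) in CATEGORY_RULES

assert should_swap("apple", "bowl")      # food before vessel: swap
assert not should_swap("bowl", "apple")  # already vessel-first: keep
```

Adding a new vessel (e.g. "tray") requires only one category entry; no new ordering rule is needed.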
Optionally, the multi-modal feature recognition is performed on the artificial cooking video, including at least one of: step D1-step D3.
Step D1: and carrying out image recognition on the artificial cooking video.
Step D2: and carrying out voice recognition on the artificial cooking video.
Step D3: and performing subtitle recognition on the manual cooking video.
In this embodiment, individual video frames may be recognized frame by frame to identify articles, and a plurality of consecutive video frames may be recognized to identify actions. An image feature library of articles and actions may be preset.
This embodiment can convert the speech in the video into text, and then identify articles (mainly nouns) and actions (mainly verbs) from the text. A word library of articles and actions may be preset.
This embodiment can extract subtitle text from the video, and then identify articles and actions from the text.
The recognition of images, speech, and subtitles can be implemented with conventional machine learning, deep learning, and other algorithm models. For example, a deep learning model extracts image features, semantic features, and text features; the multi-modal features are aggregated and encoded into a compressed representation of the video; and a deep learning model decodes that compressed representation into control instructions for the individual robotic arm, thereby generating the digital menu.
Optionally, when at least two multi-modal feature identifications are performed on the artificial cooking video, the method further includes: step E1.
Step E1: and fusing the at least two multi-mode features with each other after identifying the objects and actions respectively obtained.
In this embodiment, the articles and actions identified from images, speech, and subtitles may not be identical; they can then be registered and fused with each other, for example with fusion algorithms such as ensemble learning. For example, image recognition finds the article salt but cannot determine the amount, while speech recognition finds "add 1 spoon of salt"; combined with the image result, the fused result is 1 spoon of salt. As another example, image recognition finds article 1 but not article 2, while speech recognition finds article 2; the fused result is article 1 and article 2. As another example, image recognition finds food material 1, speech recognition finds a dicing operation on food material 1, and the image shows the diced food material but not the dicing process; the fused result is food material 1 plus the dicing operation. As another example, image recognition finds article 1, speech recognition finds article 2 with the semantics "it is better to use article 2"; the fused result replaces article 1 with article 2. Thus the fusion process may include: adding articles or actions, replacing articles or actions, refining information about articles or actions (such as amounts or components), and adjusting the order of articles and actions.
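The complementary-merge behavior described above (speech filling in details that images lack, and adding missing articles) can be sketched as a per-name dictionary merge; the field names and the choice of speech-over-image precedence are assumptions:

```python
def fuse(image_items, speech_items):
    """Merge recognition results keyed by article/action name.
    Speech results add missing entries and refine existing ones."""
    fused = {name: dict(info) for name, info in image_items.items()}
    for name, info in speech_items.items():
        if name not in fused:
            fused[name] = dict(info)   # add a missing article/action
        else:
            fused[name].update(info)   # refine, e.g. add the amount
    return fused

result = fuse(
    {"salt": {}},                                  # image: amount unknown
    {"salt": {"amount": "1 spoon"}, "stir": {}},   # speech adds detail
)
```

Replacement semantics ("it is better to use article 2") would need an extra rule layer; this sketch covers only addition and refinement.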
The implementation is described in detail below by way of several embodiments.
Referring to fig. 2, the digital menu generation method for the intelligent kitchen system in this embodiment includes:
Step 201: perform multi-modal feature recognition on the artificial cooking video to identify target objects, where a target object includes an item and/or an action.
Step 202: perform an initial ordering of the target objects according to the play order of the artificial cooking video.
Step 203: determine the time of each target object in the video.
Step 204: determine every two target objects whose time distance is smaller than a preset time distance threshold.
For every two target objects whose time distance is not less than the preset time distance threshold, steps 205 and 206 are skipped and step 207 is performed directly.
Step 205: for the two target objects, determine the item category to which each target object that is an item belongs.
Step 206: when the target objects that are items conform to the item categories defined by a preset association relation, and the two target objects do not conform to the order defined by that relation, update the order of the two target objects.
When the two target objects do not conform to the target objects defined by the preset association relation, or when they conform to both the target objects and the order defined by the relation, step 206 is skipped and step 207 is performed.
Step 207: generate a digital menu applicable to the mechanical arms according to the reordered target objects.
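Steps 202 through 206 can be sketched as a single reordering pass over the recognized target objects. The category names, the example association relation (a seasoning should follow the food material it seasons), and the threshold value are illustrative assumptions, not part of the patent.

```python
# Hypothetical association relation: for an associated category pair, the
# food material should come before the seasoning applied to it.
DESIRED_ORDER = {frozenset({"food_material", "seasoning"}):
                 ("food_material", "seasoning")}

def reorder(targets, time_threshold=1.0):
    """targets: dicts with 'name', 'category' and 'time' (seconds) keys."""
    targets = sorted(targets, key=lambda t: t["time"])      # step 202: play order
    for i in range(len(targets) - 1):
        a, b = targets[i], targets[i + 1]
        if b["time"] - a["time"] >= time_threshold:         # steps 203-204 filter
            continue
        wanted = DESIRED_ORDER.get(frozenset({a["category"], b["category"]}))
        if wanted and (a["category"], b["category"]) != wanted:  # steps 205-206
            targets[i], targets[i + 1] = b, a               # update the order
    return [t["name"] for t in targets]
```

For example, salt at 0.2 s followed by pork at 0.5 s is within the threshold and violates the relation, so the pair is swapped; if the two were more than a second apart, the initial play order would be kept.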
Referring to fig. 3, the digital menu generation method for the intelligent kitchen system in this embodiment includes:
Step 301: perform multi-modal feature recognition on the artificial cooking video to identify target objects. This step may use several recognition modes, such as image, speech and subtitle recognition.
Step 302: fuse the target objects obtained after the at least two types of multi-modal feature recognition.
Step 303: perform an initial ordering of the target objects according to the play order of the artificial cooking video.
Step 304: determine the time of each target object in the video.
Step 305: determine every two target objects whose time distance is smaller than a preset time distance threshold.
For every two target objects whose time distance is not less than the preset time distance threshold, steps 306 to 309 are skipped and step 310 is performed directly.
Step 306: determine the position distance between the two target objects in the artificial cooking video.
Step 307: determine the two target objects whose position distance is smaller than a preset position distance threshold.
For two target objects whose position distance is not less than the preset position distance threshold, steps 308 and 309 are skipped and step 310 is performed directly.
Step 308: for the two target objects, determine the item category to which each target object that is an item belongs.
Step 309: when the target objects that are items conform to the item categories defined by a preset association relation, and the two target objects do not conform to the order defined by that relation, update the order of the two target objects.
When the two target objects do not conform to the target objects defined by the preset association relation, or when they conform to both the target objects and the order defined by the relation, step 309 is skipped and step 310 is performed.
Step 310: generate a digital menu applicable to the mechanical arms according to the reordered target objects.
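The additional position filter of steps 304 through 307 can be sketched as a pair-selection function. The 2-D on-screen position, the Euclidean distance metric, and both threshold values are illustrative assumptions.

```python
import math

def candidate_pairs(targets, time_threshold=1.0, pos_threshold=50.0):
    """Return name pairs of target objects close enough in both time
    (steps 304-305) and on-screen position (steps 306-307) to be
    considered for reordering in steps 308-309."""
    pairs = []
    for i in range(len(targets)):
        for j in range(i + 1, len(targets)):
            a, b = targets[i], targets[j]
            if abs(a["time"] - b["time"]) >= time_threshold:
                continue                     # fails the time-distance filter
            if math.dist(a["pos"], b["pos"]) < pos_threshold:
                pairs.append((a["name"], b["name"]))  # passes both filters
    return pairs
```

Pairs that fail either filter fall through to step 310 unchanged, matching the flow described above.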
The above embodiments can be freely combined according to actual needs.
The foregoing describes the implementation of digital menu generation for an intelligent kitchen system. This process can be implemented by a device, whose internal structure and functions are described below.
Referring to fig. 4, the digital menu generating apparatus suitable for the intelligent kitchen system in the present embodiment includes: an identification module 401, a ranking module 402, an update module 403, and a generation module 404.
The identification module 401 is configured to perform multi-modal feature recognition on the artificial cooking video and identify target objects, where a target object includes an item and/or an action.
The ordering module 402 is configured to perform an initial ordering of the target objects according to the play order of the artificial cooking video.
The updating module 403 is configured to update the order of two target objects when the two target objects conform to the target objects defined by a preset association relation but do not conform to the order defined by that relation.
The generation module 404 is configured to generate a digital menu applicable to the mechanical arms according to the reordered target objects.
Optionally, as shown in fig. 5, the apparatus further includes: a time module 501 and a first determination module 502.
The time module 501 is configured to determine the time of each target object in the video.
The first determining module 502 is configured to determine every two target objects whose time distances are smaller than a preset time distance threshold.
Optionally, the time distance threshold is 0.
As shown in fig. 6, the apparatus further includes: a location module 601 and a second determination module 602.
The position module 601 is configured to determine the position distance between the two target objects in the artificial cooking video.
A second determining module 602 is configured to determine two target objects with a location distance less than a preset location distance threshold.
Optionally, at least one of the two target objects is an item.
As shown in fig. 7, the apparatus further includes: a category module 701.
The category module 701 is configured to determine the item category to which a target object that is an item belongs.
as shown in fig. 8, the update module 403 includes: the update sub-module 801.
The updating sub-module 801 is configured to update the order of two target objects when the target objects that are items conform to the item categories defined by the preset association relation and the two target objects do not conform to the order defined by that relation.
Optionally, as shown in fig. 9, the identification module 401 includes at least one of the following: an image recognition sub-module 901, a voice recognition sub-module 902, and a caption recognition sub-module 903.
The image recognition sub-module 901 is used for performing image recognition on the artificial cooking video.
The voice recognition sub-module 902 is configured to perform voice recognition on the artificial cooking video.
The caption recognition sub-module 903 is configured to perform caption recognition on the artificial cooking video.
Optionally, as shown in fig. 10, when at least two types of multi-modal feature recognition are performed on the artificial cooking video, the apparatus further includes a fusion module 1001.
The fusion module 1001 is configured to fuse the items and actions respectively obtained after the at least two types of multi-modal feature recognition.
A digital recipe generation apparatus adapted for use in a smart kitchen system, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
perform multi-modal feature recognition on the artificial cooking video to identify target objects, where a target object includes an item and/or an action;
perform an initial ordering of the target objects according to the play order of the artificial cooking video;
when two target objects conform to the target objects defined by a preset association relation and do not conform to the order defined by that relation, update the order of the two target objects;
and generate a digital menu applicable to the mechanical arms according to the reordered target objects.
A computer-readable storage medium has computer instructions stored thereon which, when executed by a processor, implement the steps of the above method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. A digital recipe generation method suitable for a smart kitchen system, comprising:
performing multi-modal feature recognition on an artificial cooking video to identify target objects, wherein a target object comprises an item and/or an action;
performing an initial ordering of the target objects according to a play order of the artificial cooking video;
when two target objects conform to target objects defined by a preset association relation and do not conform to an order defined by the association relation, updating the order of the two target objects, wherein the preset association relation is an association relation that facilitates operation of a mechanical arm;
generating executable commands applicable to the mechanical arm according to the reordered target objects, and packaging the executable commands to generate a digital menu applicable to the mechanical arm, the digital menu being for input into a control system of the mechanical arm to control the mechanical arm to complete the cooking process corresponding to the digital menu, thereby completing preparation of a dish;
wherein before updating the order of the two target objects, the method further comprises:
determining a time of each target object in the video;
determining every two target objects whose time distance is smaller than a preset time distance threshold, the time distance threshold being 0;
determining a position distance between the two target objects in the artificial cooking video;
determining two target objects whose position distance is smaller than a preset position distance threshold, at least one of the two target objects being an item;
determining an item category to which the target object that is an item belongs; and
wherein updating the order of the two target objects when the two target objects conform to the target objects defined by the preset association relation and do not conform to the order defined by the association relation comprises:
updating the order of the two target objects when the target objects that are items conform to the item categories defined by the preset association relation and the two target objects do not conform to the order defined by the association relation.
2. The method of claim 1, wherein multi-modal feature recognition of the artificial cooking video includes at least one of:
image recognition is carried out on the artificial cooking video;
performing voice recognition on the artificial cooking video;
and performing subtitle recognition on the artificial cooking video.
3. The method of claim 2, wherein when at least two types of multi-modal feature recognition are performed on the artificial cooking video, the method further comprises:
fusing the items and actions respectively obtained after the at least two types of multi-modal feature recognition.
4. A digital recipe generation device adapted for use in a smart kitchen system, comprising:
an identification module, configured to perform multi-modal feature recognition on an artificial cooking video and identify target objects, wherein a target object comprises an item and/or an action;
an ordering module, configured to perform an initial ordering of the target objects according to a play order of the artificial cooking video;
an updating module, configured to update the order of two target objects when the two target objects conform to target objects defined by a preset association relation and do not conform to an order defined by the association relation, wherein the preset association relation is an association relation that facilitates operation of a mechanical arm;
a generation module, configured to generate executable commands applicable to the mechanical arm according to the reordered target objects, and to package the executable commands to generate a digital menu applicable to the mechanical arm, the digital menu being for input into a control system of the mechanical arm to control the mechanical arm to complete the cooking process corresponding to the digital menu, thereby completing preparation of a dish;
a time module, configured to determine a time of each target object in the video;
a first determining module, configured to determine every two target objects whose time distance is smaller than a preset time distance threshold, the time distance threshold being 0;
a position module, configured to determine a position distance between the two target objects in the artificial cooking video;
and a second determining module, configured to determine two target objects whose position distance is smaller than a preset position distance threshold, at least one of the two target objects being an item;
wherein the apparatus further comprises:
a category module, configured to determine an item category to which the target object that is an item belongs;
and wherein the updating module comprises:
an updating sub-module, configured to update the order of the two target objects when the target objects that are items conform to the item categories defined by the preset association relation and the two target objects do not conform to the order defined by the association relation.
5. The apparatus of claim 4, wherein the identification module comprises at least one of:
an image recognition sub-module, configured to perform image recognition on the artificial cooking video;
a voice recognition sub-module, configured to perform voice recognition on the artificial cooking video;
and a subtitle recognition sub-module, configured to perform subtitle recognition on the artificial cooking video.
6. The apparatus of claim 5, wherein when at least two types of multi-modal feature recognition are performed on the artificial cooking video, the apparatus further comprises:
a fusion module, configured to fuse the items and actions respectively obtained after the at least two types of multi-modal feature recognition.
7. A digital recipe generation device adapted for use in a smart kitchen system, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
perform multi-modal feature recognition on an artificial cooking video to identify target objects, wherein a target object comprises an item and/or an action;
perform an initial ordering of the target objects according to a play order of the artificial cooking video;
when two target objects conform to target objects defined by a preset association relation and do not conform to an order defined by the association relation, update the order of the two target objects, wherein the preset association relation is an association relation that facilitates operation of a mechanical arm;
generate executable commands applicable to the mechanical arm according to the reordered target objects, and package the executable commands to generate a digital menu applicable to the mechanical arm, the digital menu being for input into a control system of the mechanical arm to control the mechanical arm to complete the cooking process corresponding to the digital menu, thereby completing preparation of a dish;
wherein before updating the order of the two target objects, the processor is further configured to:
determine a time of each target object in the video;
determine every two target objects whose time distance is smaller than a preset time distance threshold, the time distance threshold being 0;
determine a position distance between the two target objects in the artificial cooking video;
determine two target objects whose position distance is smaller than a preset position distance threshold, at least one of the two target objects being an item;
determine an item category to which the target object that is an item belongs; and
wherein updating the order of the two target objects when the two target objects conform to the target objects defined by the preset association relation and do not conform to the order defined by the association relation comprises:
updating the order of the two target objects when the target objects that are items conform to the item categories defined by the preset association relation and the two target objects do not conform to the order defined by the association relation.
8. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the steps of the method of any of claims 1 to 3.
CN202010790142.8A 2020-08-07 2020-08-07 Digital menu generation method and device suitable for intelligent kitchen system Active CN112016407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010790142.8A CN112016407B (en) 2020-08-07 2020-08-07 Digital menu generation method and device suitable for intelligent kitchen system


Publications (2)

Publication Number Publication Date
CN112016407A CN112016407A (en) 2020-12-01
CN112016407B true CN112016407B (en) 2024-01-05

Family

ID=73500211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010790142.8A Active CN112016407B (en) 2020-08-07 2020-08-07 Digital menu generation method and device suitable for intelligent kitchen system

Country Status (1)

Country Link
CN (1) CN112016407B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015039600A1 (en) * 2013-09-18 2015-03-26 珠海优特电力科技股份有限公司 Digital menu, method for generating same, method for checking copyright thereof and digital menu system
CN104914898A (en) * 2015-04-17 2015-09-16 珠海优特电力科技股份有限公司 Digital menu generating method and system
CN107788818A (en) * 2017-10-24 2018-03-13 西安科锐盛创新科技有限公司 Intelligent cooking machine people and intelligent cooking method
CN111053429A (en) * 2019-12-19 2020-04-24 陈召彬 Cooking machine



Similar Documents

Publication Publication Date Title
KR102329592B1 (en) Food preparation methods and systems based on ingredient recognition
CN111723760B (en) Digital menu generation method and device suitable for intelligent kitchen system
JP6780769B2 (en) Learning equipment, learning methods and learning programs
US11048976B2 (en) Method and system for controlling machines based on object recognition
CN108932512A (en) A kind of refrigerator food materials input method
CN109918522A (en) A kind of vegetable cooking learning method and device based on cooking platform
CN112817237A (en) Cooking control method, device, equipment and storage medium
CN108742125A (en) It is a kind of can carry out culinary art share scorch device and method
CN110599823B (en) Service robot teaching method based on fusion of teaching video and spoken voice
CN112016407B (en) Digital menu generation method and device suitable for intelligent kitchen system
US10789543B1 (en) Functional object-oriented networks for manipulation learning
CN111493672B (en) Food cooking method and device and cooking appliance
CN110613314B (en) Cooking prompting method and device and storage medium
US20220288777A1 (en) Parameterized Waypoint Generation on Dynamically Parented Non-Static Objects for Robotic Autonomous Tasks
Daneshmand et al. Size-dictionary interpolation for robot’s adjustment
CN112199985A (en) Digital menu generation method and device suitable for intelligent kitchen system
CN112025692B (en) Control method and device for self-learning robot and electronic equipment
EP4064078A1 (en) Utilizing a neural network model to generate a reference image based on a combination of images
CN111419096B (en) Food processing method, controller and food processing equipment
CN112215132A (en) Method and device for generating article information for adaptive smart kitchen system
CN113662446A (en) Internet of things-based cooking assistance method and device, intelligent terminal and storage medium
CN109753554B (en) Searching method based on three-dimensional space positioning and family education equipment
CN107340962A (en) Input method, device and virtual reality device based on virtual reality device
Shrestha et al. NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot Learning in Natural Human-Robot Interaction
CN111459054A (en) Recipe pushing method, equipment, storage medium and kitchen appliance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant