
CN110532837B - Image data processing method in article picking and placing process and household appliance

Info

Publication number: CN110532837B
Application number: CN201810513196.2A
Authority: CN
Inventors: 朱泽春; 李宏峰
Original assignee: Hangzhou Joyoung Household Electrical Appliances Co Ltd
Current assignee: Hangzhou Joyoung Household Electrical Appliances Co Ltd
Priority date: 2018-05-25
Filing date: 2018-05-25
Publication date: 2023-07-21
Anticipated expiration: 2038-05-25
Also published as: CN110532837A

Abstract

The embodiment of the invention discloses an image data processing method for the process of picking and placing articles, and a household appliance. The method is applied to the household appliance and comprises the following steps: acquiring an image sequence composed of images to be processed of a human hand and/or a target object held by the human hand during the process of picking and placing the article; preprocessing the image sequence at a local end through a preprocessing step, so as to screen the image sequence according to the position of the human hand and/or the target object in each frame of the images to be processed and determine a target image; and uploading the image data of the target image obtained after preprocessing, the image data being used for image recognition at the cloud. With this embodiment, the uploaded data is compressed, concurrent scalability is improved, processing cost is optimized, and the article recognition rate and positioning accuracy are ensured.

Description

Image data processing method in article picking and placing process and household appliance

Technical Field

The embodiment of the invention relates to a household appliance control technology, in particular to an image data processing method in the process of taking and placing articles and a household appliance.

Background

With rapid development of intelligence and wide application of image recognition technology and video recognition technology in the field of home appliances, many home appliances involve recognition of articles put in and taken out, i.e. recognition and judgment of access direction. At present, two modes of whole processing by a cloud server or whole processing by a local end are mainly adopted.

When processing is performed entirely by the cloud server, a large number of real-time video streams must be uploaded, and the front end needs sufficient upload bandwidth; if the device occupies too much bandwidth, the user's home network may behave abnormally. The cloud server also needs sufficient processing capacity and must handle multi-channel concurrent processing, so the requirements on the cloud server are very high, scalability is poor, and complexity and cost are relatively high. There are ways to reduce the amount of uploaded video data, such as: (1) reducing the resolution of the uploaded video; (2) reducing the video frame rate; (3) adopting a video codec with a higher compression ratio, such as H264. However, these parameters can only be changed within limits if recognition is still to succeed, so beyond a certain point the amount of transmitted data cannot be reduced further.

When processing is performed entirely by the local end, the local end must process a large number of real-time video streams and complete recognition at the same time, which places high hardware requirements on the local end.

Disclosure of Invention

In order to solve the technical problems, the embodiment of the invention provides an image data processing method and household electrical appliance in the process of picking and placing articles, which can compress uploading data, improve concurrency expandability, optimize processing cost and simultaneously ensure article identification rate and positioning accuracy.

In order to achieve the object of the embodiments of the present invention, an embodiment of the present invention provides a method for processing image data in a process of picking and placing an article, which is applied to a home appliance, and the method may include:

acquiring an image sequence composed of images to be processed of a human hand and/or a target object held by the human hand during the process of picking and placing the article;

preprocessing the image sequence at a local end through a preprocessing step so as to screen the image sequence according to the position of the hand of a human body and/or a target object in each frame of images to be processed to determine a target image;

uploading the image data of the target image obtained after preprocessing; the image data is used for image recognition at the cloud.

Optionally, the preprocessing step includes at least one of: key frame extraction and target region interception (cropping).

Optionally, the method may further include: and determining whether one or more of the preprocessing steps are carried out at the local end or the cloud end according to the operation capability of the local end.

Optionally, preprocessing the target image at the local side through the preprocessing step includes:

detecting motion frames related to human hands and/or target objects by adopting a preset motion detection algorithm, and acquiring sequence images of the motion frames;

selecting key frame images of human hands and/or target objects in a preset position range from the sequence images of the motion frames;

capturing images of human hands and/or target objects from the key frame images.

Optionally, selecting key frame images of the human hand and/or the target object located in a preset position range from the sequence images of the motion frames includes:

positioning human hands and/or target objects in the sequence images of the motion frames, and then obtaining corresponding positioning coordinates;

keeping the value of the Y axis in the positioning coordinates unchanged, and modifying the value of the X axis in the positioning coordinates into a serial number;

fitting the coordinate sequence of the modified positioning coordinates by adopting a preset linear regression model, and obtaining a fitting curve; the fitting curve is a motion curve of a human hand and/or a target object;

acquiring an average coordinate value of a Y-axis coordinate in a fitting curve, and acquiring an X-axis coordinate value corresponding to the average coordinate value of the Y-axis coordinate;

and taking the motion frame corresponding to the X-axis coordinate value as the key frame.

Optionally, the method may further include:

substituting the serial numbers into the fitting curve in sequence, and calculating the numerical value corresponding to each serial number;

counting the positive and negative values among the calculated values, and taking the direction represented by whichever sign occurs more often as the motion direction of the human hand and/or the target object;

judging the current operation behavior of the human hand according to the motion direction; wherein the operation behavior includes a put-in operation and a take-out operation.
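
The sign-counting vote described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `vote_direction`, the labels `put_in` and `take_out`, and the assignment of the positive sign to the put-in operation are all assumptions, since the text does not state which sign corresponds to which operation. A tie is reported as undetermined, which is also an assumption.

```python
from typing import Optional, Sequence


def vote_direction(values: Sequence[float],
                   positive_label: str = "put_in",
                   negative_label: str = "take_out") -> Optional[str]:
    """Majority vote over the signs of the values computed from the fitted curve.

    The sign-to-operation mapping is hypothetical; swap the labels if the
    camera geometry is the other way round.
    """
    positives = sum(1 for v in values if v > 0)
    negatives = sum(1 for v in values if v < 0)
    if positives == negatives:
        return None  # undetermined (assumed tie handling)
    return positive_label if positives > negatives else negative_label
```

Zero values are ignored by the vote, so they favor neither direction.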

Optionally, capturing images of the human hand and/or the target object from the key frame image includes:

determining a target area in the key frame image by adopting an inter-frame difference method, wherein the target area comprises an area where a target object is positioned and/or an area where a human hand is positioned; and/or acquiring a target area of the human hand through a preset human hand training model, and taking the target area of the human hand as the target area of the target object when the human hand holds the target object;

and intercepting the image of the target area to serve as an image of the human hand and/or the target object.
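
A minimal sketch of the inter-frame difference variant of this step, assuming 8-bit grayscale frames held as NumPy arrays. The function name `crop_target_region`, the `diff_threshold` of 25 gray levels, and the `margin` parameter are illustrative assumptions; the hand-model variant mentioned above is not shown.

```python
import numpy as np


def crop_target_region(prev_frame, key_frame, diff_threshold=25, margin=0):
    """Crop the bounding box of pixels that changed between two grayscale frames.

    Returns None when no pixel differs by more than diff_threshold.
    """
    prev = np.asarray(prev_frame, dtype=np.int16)   # int16 avoids uint8 wrap-around
    cur = np.asarray(key_frame, dtype=np.int16)
    changed = np.abs(cur - prev) > diff_threshold   # inter-frame difference mask
    if not changed.any():
        return None
    rows = np.flatnonzero(changed.any(axis=1))      # rows containing changed pixels
    cols = np.flatnonzero(changed.any(axis=0))      # columns containing changed pixels
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin + 1, cur.shape[0])
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin + 1, cur.shape[1])
    return np.asarray(key_frame)[top:bottom, left:right]
```

Only the cropped array would then need to be uploaded instead of the full key frame.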

Optionally, detecting a motion frame related to a human hand and/or a target object by using a preset motion detection algorithm, and acquiring a sequence image of the motion frame includes:

acquiring an amplitude motion vector image of each frame image in an image sequence according to an optical flow method;

carrying out quantization processing on the amplitude motion vector images to obtain the motion amplitude in each amplitude motion vector image;

comparing the motion amplitude in each amplitude motion vector image with a preset amplitude threshold value;

when the motion amplitude is greater than or equal to the amplitude threshold, taking the original image corresponding to that amplitude motion vector image as a motion frame, and adding it to the sequence images.

Optionally, the method may further include: after the image sequence is acquired, performing image quality evaluation on images in the image sequence by adopting a preset algorithm, and preprocessing the image sequence at a local end when an evaluation result reaches a preset threshold; and when the evaluation result does not reach the preset threshold, modifying the camera parameters of the household appliance to acquire an image sequence meeting the requirement of the preset threshold.

A household appliance includes a processor and a computer-readable storage medium storing instructions that, when executed by the processor, implement the image data processing method in the article picking and placing process described above.

The beneficial effects of the embodiment of the invention can include:

1. The image data processing method in the article picking and placing process of the embodiment of the invention is applied to a household appliance, and the method can comprise the following steps: acquiring an image sequence composed of images to be processed of a human hand and/or a target object held by the human hand during the process of picking and placing the article; preprocessing the image sequence at a local end through a preprocessing step so as to screen the image sequence according to the position of the human hand and/or the target object in each frame of the images to be processed and determine a target image; uploading the image data of the target image obtained after preprocessing; the image data is used for image recognition at the cloud. With this embodiment, the uploaded data is compressed, concurrent scalability is improved, processing cost is optimized, and the article recognition rate and positioning accuracy are ensured.

2. The preprocessing step of the embodiment of the invention includes at least one of the following: key frame extraction and target region interception. With this embodiment, key frame extraction and/or target region interception is kept at the local end for processing, so that the amount of uploaded data is greatly reduced.

3. The method of the embodiment of the invention can also comprise the following steps: and determining whether one or more of the preprocessing steps are carried out at the local end or the cloud end according to the operation capability of the local end. According to the embodiment, the data processing capacity of the local end and the cloud end can be flexibly distributed, and the data processing capacity and the data processing efficiency are considered.

4. The preprocessing of the target image at the local end through the preprocessing step in the embodiment of the invention comprises the following steps: detecting motion frames related to human hands and/or target objects by adopting a preset motion detection algorithm, and acquiring sequence images of the motion frames; selecting key frame images of human hands and/or target objects in a preset position range from the sequence images of the motion frames; images of human hands and/or target objects are taken from the key frame images. According to the embodiment, the target image is positioned and intercepted at the local end, and the uploading data volume is greatly reduced.

5. Selecting key frame images of the human hand and/or the target object located in the preset position range from the sequence images of the motion frames comprises the following steps: positioning human hands and/or target objects in the sequence images of the motion frames, and then obtaining corresponding positioning coordinates; keeping the value of the Y axis in the positioning coordinates unchanged, and modifying the value of the X axis in the positioning coordinates into a serial number; fitting the coordinate sequence of the modified positioning coordinates by adopting a preset linear regression model, and obtaining a fitting curve; the fitting curve is a motion curve of a human hand and/or a target object; acquiring an average coordinate value of the Y-axis coordinate in the fitting curve, and acquiring the X-axis coordinate value corresponding to the average coordinate value of the Y-axis coordinate; and taking the motion frame corresponding to the X-axis coordinate value as the key frame. With this embodiment, the human hand and/or the target object is positioned by a curve fitting method so that the key frame is determined, which improves the positioning accuracy and thereby the article recognition rate.

6. The method of the embodiment of the invention can also comprise the following steps: substituting the serial numbers into the fitting curve in sequence, and calculating the numerical value corresponding to each serial number; counting the positive and negative values among the calculated values, and taking the direction represented by whichever sign occurs more often as the motion direction of the human hand and/or the target object; judging the current operation behavior of the human hand according to the motion direction, wherein the operation behavior includes a put-in operation and a take-out operation. With this embodiment, the algorithm for judging the motion direction of the object is optimized, and the correctness and validity of the screened motion frames are guaranteed.

7. The method of the embodiment of the invention can also comprise the following steps: after the image sequence is acquired, performing image quality evaluation on images in the image sequence by adopting a preset algorithm, and preprocessing the image sequence at a local end when an evaluation result reaches a preset threshold; and when the evaluation result does not reach the preset threshold, modifying the camera parameters of the household appliance to acquire an image sequence meeting the requirement of the preset threshold. By the embodiment, the image acquisition in a self-adaptive environment is realized, the image data more conforming to the environment is obtained, and the recognition rate is further improved.

Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of embodiments of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the technical solutions of the embodiments of the invention, are incorporated in and constitute a part of this specification, and serve to illustrate rather than limit those technical solutions.

FIG. 1 is a flowchart of an image data processing method in the process of picking and placing an article according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an image quality evaluation process according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for evaluating image quality of images in an image sequence using a preset algorithm according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of image data processing performed by a local terminal and a cloud terminal according to an embodiment of the present invention;

FIG. 5 is a flowchart of preprocessing a target image at a local side through a preprocessing step according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a method for detecting motion frames related to a human hand and/or a target object by using a preset motion detection algorithm to obtain a sequence image of the motion frames according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a method for selecting key frame images of a human hand and/or a target object within a predetermined position range from sequence images of motion frames according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a motion curve fitting method according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a method for capturing images of a human hand and/or a target object from a key frame image according to an embodiment of the present invention;

FIG. 10 is a block diagram of an image data processing system during pick-and-place of items in accordance with an embodiment of the present invention;

FIG. 11 is a block diagram of a home appliance according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.

The steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.

In order to achieve the purpose of the embodiment of the present invention, the embodiment of the present invention provides a method for processing image data in a process of picking and placing an article, which is applied to a home appliance, as shown in fig. 1, the method may include steps S101 to S103:

S101, acquiring an image sequence composed of images to be processed of a human hand and/or a target object held by the human hand during the process of picking and placing the article.

In the embodiment of the present invention, the home appliances may include, but are not limited to, refrigerators, ovens, microwave ovens, air fryers, etc., and any home appliance involving a process of picking and placing objects is within the scope of the embodiment of the present invention.

In the embodiment of the invention, the home appliance can be provided with a camera in advance so as to acquire images of the process in the process of picking and placing the articles to form the image sequence.

In the embodiment of the invention, specific setting positions, setting quantity and model numbers of the cameras can be defined according to different application scenes.

In the embodiment of the invention, household appliances are placed in different application environments and may be affected by environmental factors such as light and noise, resulting in poor image quality. In the scheme of this embodiment, an image quality evaluation algorithm can be added to the image acquisition module of the household appliance to evaluate image quality; a schematic diagram of the evaluation process is shown in fig. 2.

In the embodiment of the invention, the image quality evaluation can weight the results of an edge detection algorithm and of filtering processing, and take information such as image brightness into account for a comprehensive evaluation.

Optionally, as shown in fig. 3, performing image quality evaluation on the images in the image sequence using a preset algorithm may include S201-S204:

S201, calculating the edge information degree of the images in the image sequence by adopting a preset first algorithm, obtaining the noise information intensity of the images in the image sequence by filtering, and obtaining the brightness information of the images in the image sequence by gray statistics.

Alternatively, the first algorithm may include: edge detection algorithms.

In the embodiment of the invention, a conventional edge detection operator (namely, an edge detection algorithm) is used to calculate the edge information degree ($N_{edge}$), filtering processing is used to evaluate the noise information intensity ($N_{noise}$), and a gray-scale statistical method determines the brightness information ($\lambda_{light}$).

S202, acquiring an image quality coefficient of the original image according to the edge information degree, the noise information intensity, the brightness information and a preset image quality calculation formula.

In the embodiment of the present invention, the preset image quality calculation formula may include the following image quality coefficient relation formula:

$$weight = \omega \cdot \frac{N_{edge}}{N_{noise} \cdot \lambda_{light}}$$

In the embodiment of the invention, $weight$ is the image quality coefficient, and $\omega$ is a calculation coefficient.

S203, comparing the image quality coefficient with the preset threshold value, and judging whether the image quality coefficient reaches the preset threshold value or not as the evaluation result.

In the embodiment of the invention, when the image quality coefficient reaches the preset threshold value T, the quality evaluation of the current image can be determined to be qualified, and the image can enter a subsequent processing program; when the image quality coefficient does not reach the preset threshold value T, the quality evaluation of the current image can be determined to be unqualified, and a parameter modification program is started to modify the camera parameters of the household appliance so that the image quality reaches the optimal effect.
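
The quality coefficient and threshold check can be sketched as below. The source does not name the edge operator, the filter, or the value of the calculation coefficient, so the choices here are assumptions made only for illustration: mean gradient magnitude stands in for the edge information degree, the mean absolute residual of a 3x3 box filter stands in for the noise information intensity, the mean gray level stands in for the brightness information, and a small `eps` guards against division by zero.

```python
import numpy as np


def box_filter_3x3(gray):
    """3x3 mean filter with edge replication (assumed stand-in for 'filtering')."""
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    acc = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def quality_coefficient(gray, omega=1.0, eps=1e-6):
    """weight = omega * N_edge / (N_noise * lambda_light), with assumed estimators."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)
    n_edge = float(np.mean(np.hypot(gx, gy)))                       # edge information degree
    n_noise = float(np.mean(np.abs(gray - box_filter_3x3(gray))))   # noise information intensity
    lam_light = float(np.mean(gray))                                # brightness information
    return omega * n_edge / (n_noise * lam_light + eps)


def is_quality_acceptable(gray, threshold, omega=1.0):
    """True when the coefficient reaches the preset threshold T."""
    return quality_coefficient(gray, omega) >= threshold
```

Frames for which `is_quality_acceptable` returns False would trigger the camera parameter modification rather than enter preprocessing.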

In the embodiment of the present invention, the parameters of the camera of the home appliance may be adjusted in various situations, for example, one or more of the following may be included:

1. When $\lambda_{light}$ is too low, white balance processing of the image is started, the brightness parameter is adjusted upward, and the gain value is fine-tuned.

2. When $\lambda_{light}$ is too high, the brightness parameter is adjusted downward.

3. When the ratio $N_{edge}/N_{noise}$ is too low, filtering processing is started to filter out noise.
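
The three adjustment cases can be expressed as a small decision function. This is a sketch under stated assumptions: the limits `light_low`, `light_high`, and `ratio_min` and the action names are hypothetical placeholders, because the source gives no numeric thresholds and no camera control interface.

```python
def camera_adjustments(n_edge, n_noise, lam_light,
                       light_low=60.0, light_high=200.0, ratio_min=1.0):
    """Return the list of (hypothetically named) adjustments for one frame."""
    actions = []
    if lam_light < light_low:
        # Case 1: brightness too low
        actions += ["enable_white_balance", "raise_brightness", "fine_tune_gain"]
    elif lam_light > light_high:
        # Case 2: brightness too high
        actions.append("lower_brightness")
    if n_noise > 0 and n_edge / n_noise < ratio_min:
        # Case 3: edge-to-noise ratio too low
        actions.append("enable_noise_filter")
    return actions
```

An empty list means the frame needs no parameter change under these assumed limits.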

According to the embodiment of the invention, through the image evaluation scheme, the image acquisition in the self-adaptive environment is realized, the image data which is more in line with the environment is obtained, and the recognition rate is obviously improved. In the embodiment of the invention, as only a small amount of target images or even the screenshot of the partial region of the target images are uploaded for the cloud server to identify, the requirement on the image quality is higher, and the obtained image data can meet the identification requirement by the image evaluation method, so that the identification rate is improved.

S102, preprocessing the image sequence at a local end through a preprocessing step so as to screen the image sequence according to the positions of human hands and/or target objects in each frame of images to be processed and determine target images.

In the embodiment of the invention, after the image sequence with qualified image quality evaluation is obtained through the steps, the image sequence can be further processed. Specifically, the embodiment of the invention can process the image data in a mode of combining local end processing and cloud end (cloud server end).

In the embodiment of the invention, as shown in fig. 4, the local end can use its existing computing unit to complete part or all of the functions in the positioning of the human hand and/or the target object (flexibly configurable according to the computing capability of the local computing unit). This replaces the current image processing mode, which requires transmitting a real-time video stream, with uploading of a processed image or image sequence, thereby greatly reducing the data upload pressure. The cloud then only needs to perform fine positioning and complete the identification of the article (when the local computing capacity is sufficient, the cloud only needs to perform article identification), and returns the identification result to the local end.

In the embodiment of the invention, the local end can complete part or all of the functions in the positioning of the human hand and/or the target object; that is, it can complete key frame extraction and/or target area interception in the image sequence.

In the embodiments of the present invention, one or more specific embodiments of the preprocessing performed by the local side will be given below. It should be noted that, in implementation, the preprocessing step performed at the local side may include any one or more steps of the following steps S301 to S303, which may be specifically determined according to the data processing capability of the local side.

Optionally, as shown in fig. 5, preprocessing the target image at the local side through the preprocessing step may include S301-S303:

S301, detecting motion frames of human hands and/or target objects by adopting a preset motion detection algorithm, and acquiring sequence images of the motion frames.

In the embodiment of the invention, the motion frame detection can be performed by using the motion detection algorithm to form the sequence image of the motion frame, and the sequence image is mainly applied to the real-time video stream to remove the blank background frame without the moving object.

Optionally, detecting the motion frame related to the hand of the human body and/or the target object by using a preset motion detection algorithm, and acquiring the sequence image of the motion frame may include S3011-S3014:

S3011, acquiring an amplitude motion vector image of each frame of image in an image sequence according to an optical flow method;

S3012, carrying out quantization processing on the amplitude motion vector images to obtain the motion amplitude of each amplitude motion vector image;

S3013, comparing the motion amplitude in each amplitude motion vector image with a preset amplitude threshold;

S3014, when the motion amplitude is greater than or equal to the amplitude threshold, taking the original image corresponding to that amplitude motion vector image as a motion frame, and adding the motion frame into the sequence image.
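
Steps S3011 to S3014 can be sketched as follows, assuming the dense optical flow field of each frame has already been computed by some optical flow routine (for example OpenCV's `calcOpticalFlowFarneback`) and is available as an H x W x 2 array. Reducing the amplitude image to its mean magnitude is an assumed reading of the "quantization processing"; the function names are illustrative.

```python
import numpy as np


def motion_amplitude(flow):
    """Quantize a dense flow field (H x W x 2) to one scalar: its mean magnitude."""
    flow = np.asarray(flow, dtype=float)
    magnitude = np.hypot(flow[..., 0], flow[..., 1])   # amplitude motion vector image
    return float(magnitude.mean())


def select_motion_frames(frames, flows, amplitude_threshold):
    """Keep the original frames whose motion amplitude reaches the threshold."""
    return [frame for frame, flow in zip(frames, flows)
            if motion_amplitude(flow) >= amplitude_threshold]
```

Frames below the threshold are treated as blank background and are simply not added to the sequence.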

In the embodiment of the invention, an optical flow algorithm (i.e., the optical flow method) can be used to calculate the motion amplitude field of the image and thereby judge the motion amplitude, and a frame difference method can be used to calculate the similarity between preceding and following frames in the action frame (i.e., motion frame) sequence, so that candidate motion frames are further screened; motion frames are uploaded, and blank frames need not be processed. As shown in fig. 6, a specific process flow embodiment may include the following steps:

<1> acquiring an amplitude motion vector image of a single frame image in the image sequence using the optical flow method;

<2> performing quantization processing on the amplitude motion vector image, and calculating the motion amplitude of the single frame image;

<3> comparing the calculated motion amplitude with a preset amplitude threshold;

<4> if the motion amplitude is higher than the amplitude threshold, adding the current image frame to the action sequence library (the images in the action sequence library may constitute the image sequence described above);

<5> if the motion amplitude is below the amplitude threshold, performing no processing;

<6> judging whether the current action sequence library is empty; if so, directly continuing to acquire images; otherwise, carrying out subsequent processing;

<7> comparing the images in the action sequence library using a frame difference method to calculate the similarity of adjacent images; when the similarity of two adjacent images is very high (greater than or equal to a preset similarity threshold), one of the images can be selected directly and the other deleted, and the remaining images are compared in sequence until all images in the action sequence library have been screened;

<8> uploading the screened image sequence to the cloud for subsequent processing such as fine positioning and identification.
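
The frame-difference screening of the action sequence library can be sketched as below, assuming 8-bit grayscale frames. The similarity measure (one minus the normalized mean absolute difference), the default threshold of 0.98, and the choice to compare each frame with the last frame kept are assumptions; the source only states that one of two highly similar adjacent images is retained.

```python
import numpy as np


def frame_similarity(a, b):
    """Similarity in [0, 1] from the mean absolute difference of two 8-bit frames."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.mean(np.abs(a - b))) / 255.0


def drop_near_duplicates(frames, similarity_threshold=0.98):
    """Walk the sequence in order and drop frames too similar to the last kept one."""
    kept = []
    for frame in frames:
        if kept and frame_similarity(kept[-1], frame) >= similarity_threshold:
            continue  # near-duplicate of the previous kept frame: delete it
        kept.append(frame)
    return kept
```

The list returned by `drop_near_duplicates` is what would be uploaded when the local end stops after this stage.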

In the embodiment of the present invention, if the computing power of the local side is limited to processing this step S301, the above-mentioned step <8> may be performed, and if the computing power of the local side may also process the subsequent steps S302 and/or S303, the step <8> is not performed.

S302, selecting key frame images of human hands and/or target objects in a preset position range from the sequence images of the motion frames.

In the embodiment of the invention, the key point of this strategy is to screen again the image sequence screened in step S301. The screening strategy mainly judges the motion direction of the first-screened image sequence and separates the put-in and take-out actions to obtain two action sequences, then screens the key frames of the two actions respectively and selects the one or more key frame images with the best position for final uploading and recognition processing. As shown in fig. 7, the specific flow is as follows:

<1>-<6> the same as steps <1>-<6> in step S301;

<7> calculating the motion direction using information such as the optical flow method and regional variation;

<8> splitting the action sequence into two sub-sequences, put-in and take-out, according to the motion direction;

<9> selectively screening out 1-3 frames of images from the action sub-sequence for identification;

<10> uploading the screened key frame images to the cloud for subsequent processing such as fine positioning and identification.

In the embodiment of the present invention, if the computing power of the local side is limited to processing steps S301 and S302, the above-described step <10> may be performed, and if the computing power of the local side may also process the subsequent step S303, the step <10> is not performed.
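
The capability-dependent split between the local end and the cloud can be illustrated with a small helper. The function name and the idea of expressing local capability as a count of steps (1 to 3) are assumptions for illustration; the source does not describe how the capability is measured.

```python
PREPROCESSING_STEPS = ("S301", "S302", "S303")


def split_preprocessing(local_step_count):
    """Return (steps run at the local end, steps left to the cloud)."""
    if not 1 <= local_step_count <= len(PREPROCESSING_STEPS):
        raise ValueError("local_step_count must be between 1 and 3")
    local = list(PREPROCESSING_STEPS[:local_step_count])
    cloud = list(PREPROCESSING_STEPS[local_step_count:])
    return local, cloud
```

With a count of 1 the upload happens after the motion frame screening of S301; with a count of 2 it happens after the key frame screening of S302; with a count of 3 only the cropped target image is uploaded.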

In the embodiment of the present invention, in order to ensure the validity of the screening result of the strategy in step S302, the motion direction judgment algorithm of the article may be optimized, so as to improve the positioning accuracy and ensure the correctness of the screening result. The following motion curve fitting scheme may be specifically employed to achieve the above steps <1> - <9>.

Optionally, selecting key frame images of the human hand and/or the target object located in the preset position range from the sequence images of the motion frames may include S3021 to S3025:

S3021, positioning the human hand and/or the target object in the sequence images of the motion frames to obtain the corresponding positioning coordinates;

S3022, keeping the Y-axis value in the positioning coordinates unchanged, and modifying the X-axis value in the positioning coordinates into a serial number;

S3023, fitting the coordinate sequence of the modified positioning coordinates by adopting a preset linear regression model to obtain a fitting curve; the fitting curve is the motion curve of the human hand and/or the target object;

S3024, obtaining the average value of the Y-axis coordinates in the fitting curve, and obtaining the X-axis coordinate value corresponding to that average value;

S3025, taking the motion frame corresponding to the X-axis coordinate value as the key frame.
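As a non-limiting illustration, steps S3022 to S3025 may be sketched in Python as follows. The sketch assumes a quadratic curve fitted by ordinary least squares, with the frame serial number used as the X value. The function names fit_quadratic and select_key_frame, the sample Y values, and the rule of taking the earliest frame when several frames are equally close to the average are assumptions of this sketch and are not specified by the embodiment.

```python
from typing import Sequence, Tuple


def fit_quadratic(ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*X^2 + b*X + c, X being the frame serial number.

    Needs at least 3 frames, otherwise the normal equations are singular.
    """
    xs = range(len(ys))
    # s[k] = sum of X^k, t[k] = sum of X^k * y (terms of the normal equations).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(3):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return m[0][3], m[1][3], m[2][3]


def select_key_frame(ys: Sequence[float]) -> int:
    """Return the serial number whose fitted Y is closest to the mean fitted Y."""
    a, b, c = fit_quadratic(ys)
    fitted = [a * x * x + b * x + c for x in range(len(ys))]
    mean_y = sum(fitted) / len(fitted)
    # min() keeps the earliest serial number on ties (assumption of this sketch).
    return min(range(len(fitted)), key=lambda x: abs(fitted[x] - mean_y))


# Hypothetical Y coordinates of the hand in five consecutive motion frames.
hand_y = [10.0, 18.0, 30.0, 46.0, 66.0]
key_frame = select_key_frame(hand_y)
```

Because the serial number is discrete, the sketch takes the frame whose fitted Y value lies nearest to the average rather than solving the curve equation for an exact X value.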

Optionally, the method may further include:

In the embodiment of the invention, the judgment of the movement direction is a special problem: factors such as different movement speeds and occlusion by food materials make the judgment of the movement direction uncertain. The scheme of this embodiment performs curve fitting on the position information of the human hand and/or the target object, based on hand positioning, so as to determine the motion curve of the human hand and/or the target object. Specifically, taking hand positioning as an example, hand positioning may be performed on the foregoing motion frame sequence to obtain the hand position information in each frame of image. As shown in fig. 8, the specific steps may include:

<1> performing hand positioning on a single frame image; in particular, a neural network can be trained on human hands to obtain a parameter model for hand positioning;

<2> obtaining the coordinates of the hand region and adding them to a preset coordinate sequence;

<3> reprocessing the coordinate sequence, wherein the X coordinate value of each coordinate is modified into its sequence number and the Y coordinate value is kept unchanged, forming a data model that approximates a parabola in space;

<4> fitting the coordinate sequence by using the least squares method in a linear regression model to obtain the curve y = aX² + bX + c;

<5> substituting the points of the sequence into the curve formula in order, determining the sign of f(X, y) = y - aX² - bX - c for each point, counting the numbers of positive and negative values in the point sequence respectively, and simultaneously counting the numbers of consecutively occurring positive and negative values, so as to judge the movement direction;

<6> outputting the movement direction of the human hand.
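As a non-limiting illustration, the sign counting of step <5> may be sketched in Python as follows. The sketch takes the curve coefficients a, b and c as given and evaluates f(X, y) = y - aX² - bX - c for each point. The embodiment does not state which sign corresponds to which direction, so the labels "put_in" and "take_out", the tie-breaking by the longest consecutive run, the function names and the sample values are all assumptions of this sketch.

```python
from typing import Sequence, Tuple


def sign_stats(ys: Sequence[float], a: float, b: float, c: float) -> Tuple[int, int, int, int]:
    """Return (n_pos, n_neg, longest_pos_run, longest_neg_run) of f = y - aX^2 - bX - c.

    X is the serial number of each point; points lying exactly on the curve
    (f == 0) are not counted and interrupt a run.
    """
    n_pos = n_neg = 0
    best_pos = best_neg = 0
    run_sign, run_len = 0, 0
    for x, y in enumerate(ys):
        f = y - a * x * x - b * x - c
        sign = (f > 0) - (f < 0)
        if sign == 0:
            run_sign, run_len = 0, 0
            continue
        if sign == run_sign:
            run_len += 1
        else:
            run_sign, run_len = sign, 1
        if sign > 0:
            n_pos += 1
            best_pos = max(best_pos, run_len)
        else:
            n_neg += 1
            best_neg = max(best_neg, run_len)
    return n_pos, n_neg, best_pos, best_neg


def judge_direction(stats: Tuple[int, int, int, int],
                    positive_label: str = "put_in",
                    negative_label: str = "take_out") -> str:
    """Majority sign decides; ties fall back to the longer run (assumed rule)."""
    n_pos, n_neg, run_pos, run_neg = stats
    if n_pos != n_neg:
        return positive_label if n_pos > n_neg else negative_label
    if run_pos != run_neg:
        return positive_label if run_pos > run_neg else negative_label
    return "undetermined"


# Hypothetical coefficients and Y values, chosen only to exercise the counting.
stats = sign_stats([0.5, 1.5, 2.5, 2.0, 4.5], a=0.0, b=1.0, c=0.0)
direction = judge_direction(stats)
```

Keeping the statistics separate from the decision rule makes it possible to calibrate the mapping between sign and direction for a particular camera orientation.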

According to the embodiment of the invention, the optimized object movement direction detection scheme avoids direction judgment errors caused by overly fast movement and by positioning errors, the accuracy of movement direction judgment reaches 98.5%, and the screening stability is improved in terms of direction.

S303, cutting out images of human hands and/or target objects from the key frame images.

In the embodiment of the invention, after several frames (such as 1-3 frames) of key frame images of the human hand and/or the target object are obtained through the above scheme, the corresponding images of the human hand and/or the target object can be cropped from the key frame images and uploaded. This step realizes local positioning and cropping of the data: each action provides only the region image information of its key frames, which further reduces the amount of uploaded data.

Optionally, capturing images of the human hand and/or the target object from the key frame image may include:

In the embodiment of the present invention, as shown in fig. 9, in the implementation of step S303, steps <1> - <10> are the same as steps <1> - <10> of step S302, and steps <11> and <12> are further added:

<11> determining the approximate region (R1) of the target by using the frame difference method, detecting the position information (P1) of the hand through hand positioning, and calculating the final target region according to the positional relation between the position information (P1) of the hand and the approximate region (R1) of the target; the target may be food held by a human hand, or the human hand itself;

<12> performing region cropping on the target region in the key frame.
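As a non-limiting illustration, steps <11> and <12> may be sketched in Python as follows. The embodiment does not specify how the positional relation between P1 and R1 yields the final target region, so the rule used here (the union of the two boxes when they overlap, otherwise the hand box alone) is an assumption of this sketch, as are the box format (x0, y0, x1, y1), the function names and the sample image.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive


def intersects(a: Box, b: Box) -> bool:
    """True when the two boxes overlap in both axes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def target_region(r1: Box, p1: Box, width: int, height: int) -> Box:
    """Combine the frame-difference region R1 and the hand box P1 (assumed rule)."""
    if intersects(r1, p1):
        box = (min(r1[0], p1[0]), min(r1[1], p1[1]),
               max(r1[2], p1[2]), max(r1[3], p1[3]))
    else:
        box = p1
    x0, y0, x1, y1 = box
    # Clamp the region to the image bounds.
    return (max(0, x0), max(0, y0), min(width, x1), min(height, y1))


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the target region out of an image stored as rows of pixel values."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical 8x6 key frame whose pixel value encodes its own position.
width, height = 8, 6
image = [[y * 10 + x for x in range(width)] for y in range(height)]
region = target_region((2, 1, 6, 4), (4, 3, 9, 6), width, height)
patch = crop(image, region)
```

Only the cropped patch, rather than the full key frame, would then need to be uploaded.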

S103, uploading the image data of the target image obtained after preprocessing; the image data is used for image recognition at the cloud.

In the embodiment of the invention, after the local terminal performs one or more types of preprocessing on the image sequence shot by the camera, the image data obtained after the preprocessing can be uploaded to the cloud for accurate positioning and/or image recognition at the cloud.

According to the embodiment of the invention, through the above scheme, the data processing of the local end and that of the cloud are deployed separately, so that scalability is better, the user experience is smoother, and the pressure on the server is reduced; and since the amount of data uploaded by the terminal is reduced, the traffic cost of the server and the overall cost are reduced.

The embodiment of the invention also provides an image data processing system 1 for the process of picking and placing articles, which is applied to household appliances. As shown in fig. 10, the system may comprise: an acquisition module 11, a processing module 12 and an uploading module 13;

the acquisition module 11 is configured to acquire an image sequence composed of images to be processed of a human hand and/or a target object held by the human hand during the process of picking and placing articles;

the processing module 12 is configured to preprocess the image sequence at the local end through a preprocessing step, so as to screen the image sequence according to the position of the human hand and/or the target object in each frame of the images to be processed, and determine a target image;

the uploading module 13 is configured to upload the image data of the target image obtained after the preprocessing; the image data is used for image recognition at the cloud.
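As a non-limiting illustration, the cooperation of the three modules may be sketched in Python as follows. The class name ImageDataProcessingSystem, the callable-based interfaces and the stand-in frames are assumptions of this sketch; the embodiment does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = bytes  # placeholder for one encoded image


@dataclass
class ImageDataProcessingSystem:
    acquire: Callable[[], List[Frame]]                     # acquisition module 11
    preprocess: Callable[[Sequence[Frame]], List[Frame]]   # processing module 12
    upload: Callable[[Sequence[Frame]], None]              # uploading module 13

    def run_once(self) -> int:
        """Acquire a sequence, screen it locally, upload only the target images."""
        sequence = self.acquire()
        targets = self.preprocess(sequence)
        if targets:
            self.upload(targets)
        return len(targets)


# Stand-in modules: four captured frames, of which the middle two are kept.
uploaded: List[Frame] = []
system = ImageDataProcessingSystem(
    acquire=lambda: [b"f0", b"f1", b"f2", b"f3"],
    preprocess=lambda frames: list(frames[1:3]),
    upload=uploaded.extend,
)
count = system.run_once()
```

Passing the modules in as callables keeps the local screening logic replaceable, for example when the computing power of the local end allows more preprocessing steps to run before uploading.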

The embodiment of the present invention further provides a home appliance 2, as shown in fig. 11, including a processor 21 and a computer readable storage medium 22 in which instructions are stored; when the instructions are executed by the processor, the above image data processing method in the process of picking and placing articles is implemented.

In one embodiment of the present invention, the home appliance may be a refrigerator, an oven, or the like.

The beneficial effects of the embodiment of the invention can include:

Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims

1. An image data processing method in the process of picking and placing articles, which is applied to household appliances, is characterized in that the method comprises the following steps:

detecting motion frames related to the human hands and/or the target object by adopting a preset motion detection algorithm, and acquiring sequence images of the motion frames;

selecting key frame images of the human hand and/or the target object in a preset position range from the sequence images of the motion frames;

intercepting images of the human hands and/or the target objects from the key frame images so as to screen the image sequence according to the positions of the human hands and/or the target objects in each frame of images to be processed and determine target images;

uploading the image data of the target image obtained after preprocessing; the image data is used for carrying out image recognition on the cloud.

2. The method of processing image data during pick and place of an item of claim 1, wherein the preprocessing step comprises at least one of: key frame extraction and target region cropping.

3. The method of processing image data during pick and place of an item of claim 1, further comprising: determining whether the preprocessing step is carried out at the local end or the cloud end according to the operation capability of the local end.

4. The method for processing image data during the process of picking and placing objects according to claim 1, wherein the selecting key frame images of the human hand and/or the target object within a preset position range from the sequence images of the motion frames comprises:

positioning the human hand and/or the target object in the sequence image of the motion frame to obtain corresponding positioning coordinates;

keeping the numerical value of the Y axis in the positioning coordinate unchanged, and modifying the numerical value of the X axis in the positioning coordinate into a serial number;

fitting the coordinate sequence of the modified positioning coordinates by adopting a preset linear regression model, and obtaining a fitting curve; the fitting curve is a motion curve of the human hand and/or the target object;

acquiring an average coordinate value of the Y-axis coordinate in the fitting curve, and acquiring an X-axis coordinate value corresponding to the average coordinate value of the Y-axis coordinate;

5. The method of processing image data during pick and place of an item as set forth in claim 4, further comprising:

substituting the serial numbers into the fitting curve in sequence, and calculating a numerical value corresponding to each serial number;

counting the number of positive values and the number of negative values among the numerical values, and taking the direction represented by whichever of the positive values and the negative values is more numerous as the movement direction of the human hand and/or the target object;

6. The method of processing image data during pick and place of an item according to claim 4, wherein said capturing an image of the human hand and/or the target item from the key frame image comprises:

determining a target area in the key frame image by adopting an inter-frame difference method, wherein the target area comprises an area where the target object is located and/or an area where the human hand is located; and/or acquiring a target area of the human hand through a preset human hand training model, and taking the target area of the human hand as the target area of the target object when the human hand holds the target object;

and intercepting the image of the target area to serve as the image of the human hand and/or the target object.

7. The method for processing image data during picking and placing objects according to claim 4, wherein detecting motion frames of the human hand and/or the target object by using a preset motion detection algorithm, and acquiring a sequence of images of the motion frames comprises:

acquiring an amplitude motion vector image of each frame image in the image sequence according to an optical flow method;

carrying out quantization processing on the amplitude motion vector images to obtain the motion amplitude of each amplitude motion vector image;

comparing the motion amplitude in each of the amplitude motion vector images with a preset amplitude threshold;

and when the motion amplitude is greater than or equal to the amplitude threshold, taking the original image corresponding to the amplitude motion vector image having that motion amplitude as the motion frame, and adding it to the sequence images.

8. The method of processing image data during pick and place of an item of claim 1, further comprising: after the image sequence is acquired, performing image quality evaluation on images in the image sequence by adopting a preset algorithm, and preprocessing the image sequence at a local end when an evaluation result reaches a preset threshold; and when the evaluation result does not reach the preset threshold, modifying the camera parameters of the household appliance to acquire an image sequence meeting the requirement of the preset threshold.

9. An appliance comprising a processor and a computer readable storage medium having instructions stored therein, which when executed by the processor, implement the method of claim 1.