Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments to give the reader a better understanding of the present invention. However, the claimed invention may be practiced even without these specific details, and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention relates to a data generating method, and the specific flow is shown in fig. 1, including:
S101: Determine the shooting angle of the object according to a preset shooting scene.
Specifically, a base capable of rotating 360 degrees is placed in the shooting scene and the object is placed on the base, so that video data of the object can be collected from multiple angles and directions. The object is placed randomly, which ensures the diversity of the identification data obtained in the subsequent steps.
It should be noted that the object in this embodiment may be food (such as snack, beverage, etc.), or office supplies (such as notebook, pen, etc.), and the kind of the object is not particularly limited.
It can be appreciated that the data generation method of the embodiment can be applied to commodity identification of specific scenes, and shooting scenes are manually arranged, for example, scenes such as self-service commodity identification, an unmanned supermarket, a self-service shooting settlement table, a self-service weighing table and the like can be set, so that acquisition of identification data of objects in different scenes is realized, and the practicability of the data generation method is improved.
It should be noted that determining the shooting angle from the preset scene further improves the accuracy of the obtained identification data. For example, if in the actual application the camera of an unmanned supermarket container is mounted above the commodities, then when the shooting scene is arranged based on the unmanned supermarket, the camera that shoots the object is likewise mounted above the object, so as to remain consistent with the actual application scene.
S102: Shoot video data of the object placed on the base from the shooting angle.
Specifically, the object is placed on a motor-driven base that rotates at a low speed, the rotation speed is set, the object is rotated through 360 degrees to present its different angles, and video data of one full rotation of the object is shot.
In this embodiment, neither the rotation speed nor the number of rotations of the base is specifically limited. Preferably, the base rotates one full turn, so that the object is shot from all directions while avoiding the excessively large video data, and the increased computation in the subsequent steps, that additional turns would cause.
S103: Obtain a plurality of single-frame images of the object from the video data.
Specifically, the video data is post-processed (that is, processed in a step performed after the preceding shooting stage) to obtain the total number of frames of the video data; the extraction interval, in frames, is determined from the total number of frames and a preset number of images to extract; and that number of single-frame images is extracted from the video data at the extraction interval.
For ease of understanding, a specific example of how a plurality of single-frame images of the object are obtained from the video data is described below:
Assume that the video data of the object is shot while the base rotates through 360 degrees and contains 720 frames in total. One image is to be extracted for every 18 degrees of base rotation, that is, 360/18 = 20 images are to be extracted. The extraction interval is therefore 720/20 = 36 frames, that is, one single-frame image is extracted from every 36 frames of video data.
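The interval calculation in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and starting at frame 0 and rounding the interval down with integer division are assumptions that the embodiment does not specify.

```python
def extraction_interval(total_frames: int, num_images: int) -> int:
    """Return the number of frames between two consecutive extracted images."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    if total_frames < num_images:
        raise ValueError("video has fewer frames than images requested")
    # Assumption: a non-integer interval is rounded down.
    return total_frames // num_images


def frame_indices(total_frames: int, num_images: int) -> list[int]:
    """Return the zero-based indices of the frames to extract."""
    step = extraction_interval(total_frames, num_images)
    # Assumption: extraction starts at the first frame (index 0).
    return [i * step for i in range(num_images)]


# Figures from the example: 720 frames, one image per 18 degrees of rotation.
num_images = 360 // 18                        # 20 images
print(extraction_interval(720, num_images))   # 36
print(frame_indices(720, num_images)[:3])     # [0, 36, 72]
```

The returned indices could then be passed to whatever video reader is used to pull out the corresponding frames.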
S104: Obtain identification data of the object from the plurality of single-frame images.
Specifically, the identification data in this embodiment may be labeling information of the object, where the labeling information includes type information and position information of the object.
Compared with the prior art, in the embodiments of the present invention the shooting angle of the object is determined according to the preset shooting scene and the object is placed on a base capable of rotating 360 degrees, so that video data of the object is collected from multiple angles and directions. Because the shooting scene is arranged according to the actual application scene, the accuracy of the identification data obtained subsequently is ensured, and different shooting scenes can be designed for different application scenes. A plurality of single-frame images of the object are obtained from the video data, and the identification data of the object is obtained from these single-frame images, so that the identification data can be collected automatically. This reduces the labeling workload and the labor cost, avoids the manual labeling errors that easily occur when individual commodities look very similar, and improves the accuracy of the obtained identification data. In addition, a plurality of objects can be placed on the base, and placing them randomly ensures that diversified identification data is generated.
A second embodiment of the present invention relates to a data generation method. This embodiment is a specific example of the first embodiment and explains how the identification data of the object is obtained from the plurality of single-frame images.
Specifically, as shown in fig. 2, the present embodiment includes steps S201 to S207, wherein steps S201 to S203 are substantially the same as steps S101 to S103 in the first embodiment, and are not described herein. The differences are mainly described below:
steps S201 to S203 are performed.
S204: Perform saliency detection on each single-frame image to obtain a plurality of detection images.
Specifically, saliency detection refers to simulating visual characteristics of a human through an intelligent algorithm, and extracting a salient region (namely a region of interest of the human) in an image.
S205: Obtain contour information of the object in the plurality of single-frame images from the plurality of detection images.
Specifically, a saliency detection model is applied to a single-frame image to obtain a detection image. The detection image obtained in this way usually has blurred edges and needs further processing. Binarizing an image facilitates such further processing: it simplifies the image, reduces the amount of data, and highlights the contour of the target of interest. Accordingly, in this embodiment, obtaining the contour information of the object in the plurality of single-frame images from the plurality of detection images includes: binarizing the plurality of detection images to obtain a plurality of binary images; and obtaining the contour information from the plurality of binary images. That is, the detection result is converted into a grayscale image and then into a binary image, in which the pixel values of the salient region become 255 and the pixel values of the background region become 0.
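The binarization step can be illustrated with a minimal sketch in plain Python. The embodiment does not state how the threshold is chosen, so the fixed threshold of 128 used here is an assumption, as are the function name and the representation of the grayscale detection image as a nested list of values from 0 to 255. In practice an image-processing library would normally be used instead.

```python
def binarize(saliency_map, threshold=128):
    """Map every pixel to 255 (salient region) or 0 (background).

    saliency_map: grayscale detection image as rows of values in 0..255.
    threshold: assumed cut-off; the source does not specify one.
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in saliency_map]


# A tiny detection image with soft (blurred) values around the object.
soft = [
    [12,  40,  35],
    [30, 210, 190],
    [25, 180,  90],
]
binary = binarize(soft)
for row in binary:
    print(row)
```

After this step every pixel is either 255 or 0, which is the form from which the contour of the salient region is then taken.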
It can be understood that the contour information in this embodiment is the contour of the salient object region in the single frame image, and can be obtained according to the contour region in the binary image.
S206: Map the contour information onto the corresponding single-frame image, and segment the single-frame image into a segmented image containing only the object.
Specifically, based on the mapping between the contour information and the single-frame image, the image of the object is cut out of the single-frame image.
Because the contour information retains only the information of the object and excludes interference from the surrounding background, obtaining the segmented image from the contour information, and the identification data from the segmented image, improves the accuracy of object identification.
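One way to picture the cut-out in S206 is to keep only the pixels of the single-frame image that fall inside the salient region of the binary image. The sketch below is an illustration under assumptions, not the method of the embodiment: it uses the filled binary region directly as a mask rather than an explicit contour, and the function name and the `fill` value for discarded pixels are hypothetical.

```python
def cut_out(image, binary_mask, fill=0):
    """Keep the pixels where the mask is 255 and replace the rest with `fill`.

    image and binary_mask are nested lists with identical dimensions.
    """
    if len(image) != len(binary_mask):
        raise ValueError("image and mask must have the same height")
    result = []
    for image_row, mask_row in zip(image, binary_mask):
        if len(image_row) != len(mask_row):
            raise ValueError("image and mask must have the same width")
        result.append([pixel if keep == 255 else fill
                       for pixel, keep in zip(image_row, mask_row)])
    return result


frame = [
    [50, 51, 52],
    [53, 99, 98],
    [54, 97, 55],
]
mask = [
    [0,   0,   0],
    [0, 255, 255],
    [0, 255,   0],
]
print(cut_out(frame, mask))  # [[0, 0, 0], [0, 99, 98], [0, 97, 0]]
```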
S207: Obtain the identification data from the segmented image.
Specifically, in this embodiment the identification data may be obtained as follows: pasting the segmented image onto a preset background; determining a circumscribed frame (bounding box) of the object from the segmented image, and establishing a coordinate system based on the preset background; and generating the circumscribed frame coordinates of the object in that coordinate system from the position of the segmented image in the preset background, and using the circumscribed frame coordinates as the identification data.
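The pasting and coordinate generation in S207 can be sketched as follows. Several details are assumptions made for illustration only: the coordinate system has its origin at the top-left corner of the background with x increasing to the right and y increasing downward, coordinates are inclusive pixel indices, and the function names and the `(x_min, y_min, x_max, y_max)` layout are hypothetical.

```python
def mask_bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the 255-valued region of a mask."""
    xs = [x for row in mask for x, v in enumerate(row) if v == 255]
    ys = [y for y, row in enumerate(mask) if 255 in row]
    if not xs:
        raise ValueError("mask contains no object pixels")
    return min(xs), min(ys), max(xs), max(ys)


def paste_and_locate(background, segment, mask, left, top):
    """Paste the object pixels of `segment` at (left, top) on a copy of
    `background` and return (canvas, circumscribed frame in background
    coordinates)."""
    height, width = len(background), len(background[0])
    canvas = [row[:] for row in background]
    for y, (segment_row, mask_row) in enumerate(zip(segment, mask)):
        for x, (pixel, keep) in enumerate(zip(segment_row, mask_row)):
            if keep != 255:
                continue  # background pixels of the segment are not pasted
            by, bx = top + y, left + x
            if not (0 <= by < height and 0 <= bx < width):
                raise ValueError("object does not fit inside the background")
            canvas[by][bx] = pixel
    x0, y0, x1, y1 = mask_bounding_box(mask)
    return canvas, (left + x0, top + y0, left + x1, top + y1)


background = [[0] * 5 for _ in range(4)]   # 4 rows, 5 columns, solid color
segment = [[9, 9], [9, 9]]
mask = [[0, 255], [255, 255]]
canvas, box = paste_and_locate(background, segment, mask, left=2, top=1)
print(box)  # (2, 1, 3, 2)
```

The returned tuple plays the role of the circumscribed frame coordinates that are stored as identification data.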
For easy understanding, the data generation method in the present embodiment is specifically illustrated below:
assume that the actual application scenario is an automatic weighing platform:
Step 1: Set up a white background, so as not to affect the subsequent saliency detection, and prepare the commodities to be collected. The commodity collection equipment consists mainly of three parts: 1. a base capable of rotating at a low speed; 2. four cameras; 3. a camera bracket. The four cameras are positioned approximately at the top, at a 45-degree oblique upward angle, at a 30-degree oblique upward angle, and at a horizontal angle, respectively. The four cameras are fixed at the designated positions by the camera bracket, which ensures that the commodity is shot from multiple directions.
Step 1 includes the following sub-steps:
Step 1.1: The commodity is placed on the motor-driven base rotating at a low speed, the rotation speed is set, and the commodity is rotated through 360 degrees to present its different angles.
Step 1.2: The four cameras each record video from their own viewing angle while the commodity on the motor-driven base rotates through 360 degrees at a low speed.
Step 1.3: For bagged commodities, the front and back differ, so the operation of step 1.2 is repeated once for the front and once for the back.
Step 2: Post-process the videos collected by the four cameras. The number of frames of each video is calculated, and one image is extracted on average for every 18 degrees of rotation, so that 360/18 = 20 images are extracted from each video at an interval of (number of frames)/20. The four cameras together yield 20 × 4 = 80 images, that is, 80 pictures are collected per commodity.
Step 3: Segment the commodity in each picture using salient region detection. A saliency detection model is applied to the picture to obtain a detection result; the detection image obtained in this way usually has blurred edges and needs further processing. Binarizing the image facilitates such processing, because it simplifies the image, reduces the amount of data, and highlights the contour of the target of interest. The detection result is therefore converted into a grayscale image and then into a binary image, in which the pixel values of the salient region become 255 and the pixel values of the background region become 0.
Step 3 further includes the following sub-steps:
Step 3.3: Obtain the contour region in the binary image to obtain the contour of the salient object region.
Step 3.4: Map the contour of the salient object region obtained in step 3.3 onto the original image to generate segmented commodity data.
Step 4: Randomly paste the commodity data generated in step 3 onto a specified background (usually a solid color, to avoid background interference), allowing a small amount of overlap (no more than 20%), and, from the contour information of each commodity, generate and store circumscribed frame coordinates that can be used for target detection and contour information that can be used for instance segmentation. Specifically, a coordinate system is established based on the specified background, and the commodities are then placed on the specified background. When there are multiple commodities, no more than 20% overlap is allowed between different commodities; this keeps each commodity sufficiently visible and avoids the situation in which identification data is difficult to obtain for a commodity covered by other commodities. The circumscribed frame coordinates of each commodity are obtained from the position of that commodity in the specified background.
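The 20% overlap limit in step 4 could be checked as sketched below before a commodity is pasted. The source does not define how the overlap percentage is measured, so this sketch assumes it is the intersection area of two circumscribed frames divided by the area of the smaller frame; the function names and the `(x_min, y_min, x_max, y_max)` box layout are likewise hypothetical.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller box (assumed
    definition of 'overlap'; the source gives only the 20% limit)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    return (inter_w * inter_h) / min(box_area(a), box_area(b))


def placement_allowed(new_box, placed_boxes, max_overlap=0.2):
    """True if new_box overlaps every already placed box by at most max_overlap."""
    return all(overlap_ratio(new_box, other) <= max_overlap
               for other in placed_boxes)


placed = [(0, 0, 10, 10)]
print(placement_allowed((8, 0, 18, 10), placed))    # True  (overlap is 20%)
print(placement_allowed((5, 0, 15, 10), placed))    # False (overlap is 50%)
print(placement_allowed((20, 20, 30, 30), placed))  # True  (no overlap)
```

A random placement loop could draw candidate positions and keep only those for which `placement_allowed` returns `True`.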
Compared with the prior art, this embodiment provides the same advantages as the first embodiment: video data of the object is collected from multiple angles and directions in a shooting scene arranged according to the actual application scene, and the identification data is obtained automatically from the single-frame images extracted from that video data, which reduces the labeling workload and labor cost and improves the accuracy and diversity of the obtained identification data.
A third embodiment of the present invention relates to a data generation method. This embodiment is a specific example of the first embodiment and explains how the identification data of the object is obtained from the plurality of single-frame images.
Specifically, as shown in fig. 3, the present embodiment includes steps S301 to S306, wherein steps S301 to S303 are substantially the same as steps S101 to S103 in the first embodiment, and are not described herein. The differences are mainly described below:
steps S301 to S303 are performed.
S304: Randomly select one of the plurality of single-frame images as the image to be labeled.
Specifically, since every single-frame image shows the same objects of the same categories, differing only in the angle of the objects, only one image needs to be selected as the image to be labeled.
S305: Label the object in the image to be labeled to obtain labeling information of the object.
Specifically, the labeling information in the present embodiment includes type information and position information of the object.
S306: Obtain the identification data from the labeling information.
Specifically, in this embodiment, obtaining the identification data from the labeling information includes: labeling the single-frame images other than the image to be labeled among the plurality of single-frame images according to the labeling information; and using the labeling information as the identification data.
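Steps S304 to S306 amount to labeling one frame and copying that labeling information to every other frame of the same video. The sketch below illustrates this under assumptions: the dictionary layout with `type` and `box` keys, the file names, and the function name are all hypothetical, since the source only states that the labeling information contains type information and position information.

```python
import random


def propagate_annotation(frame_names, annotation):
    """Attach a copy of the same labeling information to every frame."""
    return {name: [dict(item) for item in annotation] for name in frame_names}


frames = ["frame_000.jpg", "frame_036.jpg", "frame_072.jpg"]

# S304: pick one frame at random as the image to be labeled.
to_label = random.choice(frames)

# S305: labeling information produced once for that frame
# (type information plus position information, here a hypothetical box).
annotation = [{"type": "cola", "box": (40, 30, 120, 200)}]

# S306: reuse the labeling information for all frames of the same video.
labels = propagate_annotation(frames, annotation)
print(len(labels))  # 3
```

Each frame receives its own copy of the entries, so adjusting the labels of one frame later does not silently change the others.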
For easy understanding, the data generation method in the present embodiment is specifically illustrated below:
Take an intelligent container as the preset scene:
Collecting commodity images for an intelligent container is more complex than for a general scene. Because replenishment staff place commodities arbitrarily, many random combinations have to be arranged manually, so the labeling workload is also very high. Moreover, whenever commodity packaging is replaced, or commodities are added or removed, a large number of pictures must be collected and labeled again. In addition, if the size of the container changes, all existing data becomes unusable, whereas the present method can easily simulate the changed container size and collect and label a new data set more efficiently.
Step 1: With reference to the hardware of the intelligent container, simulate the container image acquisition scene as closely as possible, based on the actual scene inside the equipment and the dimensions of each side of one layer. Simulating the acquisition scene includes the following sub-steps:
Step 1.1: The intelligent container currently used is square and has 4 layers, each layer having a camera at the center of its top. The commodity images collected on the 4 layers are similar, so one layer is taken as the prototype for designing the simulation scene.
Step 1.2: Measure the length, width, and height of the selected layer of the container, and measure the position of the camera.
Step 1.3: Prepare the materials. The four sides of the container consist of a white cabinet body on three sides and a door on one side. Therefore, according to the dimensions measured in the previous step, three white background plates and one transparent plate are used to enclose, on a white background table, a cube matching one layer of the container, and an identical camera is hung at the top of the cube by a bracket.
Step 1.4: Prepare several low-speed rotatable bases and place them at the bottom of the cube. Note that the height of the cube = the height of the selected layer of the container + the height of the base.
Step 2: preparing commodities to be collected, setting a placement rule, and placing the commodities to be collected on a base according to the placement rule.
Comprises the following substeps:
Step 2.1: Determine the list of commodities to be collected according to business requirements, and prepare enough commodities to fill at least one row, placed either horizontally or vertically.
Step 2.2: To prevent severe occlusion, it is recommended to place taller commodities on both sides and shorter commodities in the middle region. To simulate a real scene, random combinations of different commodities should be covered as far as possible. For snack commodities whose front and back differ, both sides need to be shot once.
Step 3: After the commodities are placed, start all the bases so that they rotate one full turn at a low speed, while the camera at the top automatically records the video.
Step 4: and carrying out post-processing on the video acquired by the top camera. The number of frames of video is calculated, and one frame of image is extracted every 9 degrees of rotation on average, 360/9=40 frames are extracted per video, and images are extracted according to the interval of the number of frames/40.
Step 5: In the images generated in step 4, the placement positions of the commodities are the same and only the angles of the individual commodities differ, so the labeling information is consistent across them. The images from step 4 are therefore placed in the same folder and labeled only once, without repeated labeling.
Step 6: If commodities are merely added to or removed from the previous acquisition arrangement, the acquisition process is the same as in steps 3 and 4, and when labeling the data only the corresponding detection frames need to be added to or removed from the labeling result of step 5. Compared with labeling every commodity in every picture again, this greatly reduces the workload.
Compared with the prior art, this embodiment provides the same advantages as the first embodiment: video data of the object is collected from multiple angles and directions in a shooting scene arranged according to the actual application scene, and the identification data is obtained automatically from the single-frame images extracted from that video data, which reduces the labeling workload and labor cost and improves the accuracy and diversity of the obtained identification data.
A fourth embodiment of the present invention relates to a data generation method and is a further improvement of the third embodiment. The main improvement is as follows: the single-frame image contains a plurality of objects, the plurality of objects include an object whose packaging is to be updated, and after the labeling information of the objects is obtained, the method further includes: collecting new video data of the plurality of objects placed on the base, where the placement positions of the plurality of objects on the base are consistent with their placement positions when the original video data was collected, and the object whose packaging is to be updated has been replaced by the object with the updated packaging; obtaining new single-frame images of the plurality of objects from the new video data; and replacing the single-frame images with the new single-frame images, and transferring the labeling information to the new single-frame images.
That is, if the packaging of a certain object in the data set obtained in the third embodiment needs to be replaced, only the images containing that object need to be selected; the object is replaced with its newly packaged version according to the placement rule, and video shooting and picture extraction are performed again. If the size of the object does not change noticeably after the packaging is replaced, the labeling information is left unchanged; if the size does change, only the labeling information of that object needs to be adjusted. Compared with collecting and labeling manually, this saves a great deal of labor and improves the efficiency of collecting object identification data.
Specifically, as shown in fig. 4, the present embodiment includes steps S401 to S409, wherein steps S401 to S406 are substantially the same as steps S301 to S306 in the third embodiment, and are not described herein. The differences are mainly described below:
steps S401 to S406 are performed.
S407: Collect new video data of the plurality of objects placed on the base.
Specifically, the placement positions of the plurality of objects on the base are consistent with their placement positions when the original video data was collected, and the object whose packaging is to be updated has been replaced by the object with the updated packaging.
S408: Obtain new single-frame images of the plurality of objects from the new video data.
S409: Replace the single-frame images with the new single-frame images, and transfer the labeling information to the new single-frame images.
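The replacement in S409 can be pictured as re-keying the stored labeling information from each old single-frame image to the new image that replaces it. The following is a minimal sketch under assumptions: labeling information is held as a dictionary from image name to a list of entries, and the function name, file names, and `type`/`box` keys are hypothetical.

```python
def transfer_annotations(annotations, replaced_by):
    """Return labeling information keyed by the new image names.

    annotations: {image name: list of labeling entries}
    replaced_by: {old image name: new image name}; images that are not
                 listed keep their name and their labeling information.
    """
    transferred = {}
    for name, entries in annotations.items():
        target = replaced_by.get(name, name)
        transferred[target] = [dict(entry) for entry in entries]
    return transferred


annotations = {
    "old_000.jpg": [{"type": "chips", "box": (10, 10, 60, 90)}],
    "old_036.jpg": [{"type": "chips", "box": (10, 10, 60, 90)}],
    "other.jpg":   [{"type": "cola",  "box": (5, 5, 40, 120)}],
}
replaced_by = {"old_000.jpg": "new_000.jpg", "old_036.jpg": "new_036.jpg"}

updated = transfer_annotations(annotations, replaced_by)
print(sorted(updated))  # ['new_000.jpg', 'new_036.jpg', 'other.jpg']
```

If the new packaging changes the size of the object, only the entries for that object in `updated` would then be adjusted.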
For easy understanding, the data generation method in the present embodiment is specifically illustrated below:
Step 1: For the data set collected in the foregoing embodiment, obtain the categories of the commodities whose packaging is to be replaced, denoted by L.
Step 2: For each commodity in L, traverse the labeling information of the data set of the foregoing embodiment and screen out the pictures containing that commodity.
Step 3: Reproduce the placement of the commodities in the screened pictures, and replace the commodities whose packaging is to be replaced with the newly packaged commodities.
Step 4: Collect pictures of the commodities with the replaced packaging, using the video and image acquisition method of the foregoing embodiment.
Step 5: Replace the pictures containing the original packaging from the foregoing embodiment with the pictures from step 4, and retrain; no re-labeling is needed.
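Steps 1 and 2 of this example, screening the data set for pictures that contain a commodity in L, can be sketched as a simple traversal of the labeling information. The data layout, the `type` key, the category names, and the function name are assumptions made for illustration.

```python
def pictures_to_recapture(annotations, categories):
    """For each category in `categories`, list the pictures whose labeling
    information contains at least one commodity of that category."""
    return {
        category: sorted(
            name for name, entries in annotations.items()
            if any(entry["type"] == category for entry in entries)
        )
        for category in categories
    }


annotations = {
    "shelf_01.jpg": [{"type": "chips"}, {"type": "cola"}],
    "shelf_02.jpg": [{"type": "cola"}],
    "shelf_03.jpg": [{"type": "chips"}, {"type": "tea"}],
}
L = ["chips", "tea"]  # categories whose packaging is to be replaced

print(pictures_to_recapture(annotations, L))
# {'chips': ['shelf_01.jpg', 'shelf_03.jpg'], 'tea': ['shelf_03.jpg']}
```

Only the arrangements shown in the listed pictures then need to be reproduced and shot again in steps 3 and 4.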
The steps of the above methods are divided as they are only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, all such variants fall within the protection scope of this patent. Adding insignificant modifications to an algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
A fifth embodiment of the present invention relates to a data generating apparatus, as shown in fig. 5, including:
at least one processor 501; and
a memory 502 communicatively coupled to the at least one processor 501; wherein,
the memory 502 stores instructions executable by the at least one processor 501, the instructions being executable by the at least one processor 501 to enable the at least one processor 501 to perform the data generation method described above.
The memory 502 and the processor 501 are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors 501 and the memory 502. The bus may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, such as a plurality of receivers and transmitters, and provides a means for communicating with various other apparatuses over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium via an antenna; the antenna also receives data and passes it to the processor 501.
The processor 501 is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 502 may be used to store data used by the processor 501 in performing operations.
A sixth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments described herein. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention and that various changes in form and details may be made therein without departing from the spirit and scope of the invention.