CN115035343A - Neural network training method, article detection method, apparatus, device and medium - Google Patents

Neural network training method, article detection method, apparatus, device and medium

Info

Publication number
CN115035343A
CN115035343A (application CN202210692178.1A)
Authority
CN
China
Prior art keywords
article
image
sample
neural network
placement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210692178.1A
Other languages
Chinese (zh)
Inventor
李思奇
田茂清
刘建博
伊帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202210692178.1A priority Critical patent/CN115035343A/en
Publication of CN115035343A publication Critical patent/CN115035343A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/10: Geometric effects
    • G06T15/20: Perspective computation
    • G06T15/205: Image-based rendering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a neural network training method, an article detection method, an apparatus, a device and a medium. Sample article images are acquired and used to generate sample article placement images that simulate how articles are placed, and the sample article placement images, combined with real article placement images, are used to train a neural network. This greatly reduces the workload of collecting and annotating training samples, cuts the consumption of labor, material resources and cost, saves sample construction time, and improves the efficiency of gathering training samples. It also effectively reduces the influence of factors such as environment, time and illumination encountered when collecting real training samples, enriches the samples, and effectively improves the robustness and generalization ability of the trained neural network.

Description

Neural network training method, article detection method, apparatus, device and medium
Technical Field
The present disclosure relates to the technical field of deep learning, and in particular to a neural network training method, an article detection method, an apparatus, a device and a medium.
Background
With the progress of science and technology and the development of society, human lifestyles have changed greatly. In terms of shopping, online shopping and self-service shopping have become mainstream, and fast, convenient payment methods such as online payment, code-scanning payment and face-recognition payment have emerged accordingly. Self-service vending devices such as unmanned vending machines and unmanned vending cabinets are widely used in self-service shopping because of their convenient payment methods, efficient shopping and flexible deployment environments.
When shopping is done through a self-service vending device, the device usually identifies the articles purchased by a user by collecting images or videos during the shopping process and then performs settlement and other processing. Articles can be identified with an artificial-intelligence-based recognition model, but such a model can only be used after being trained on a large amount of data such as training images. Training images are mostly collected in real scenes and then annotated, typically by hand, before they can be used for training. This consumes considerable manpower and material resources; collecting and annotating the images is slow and has a long cycle; and the quantity and variety of real article placement images available in real scenes are limited.
Disclosure of Invention
The embodiment of the disclosure at least provides a neural network training method, an article detection method, a device, equipment and a medium.
The embodiment of the disclosure provides a neural network training method, which comprises the following steps:
acquiring a sample image set and a collected article image set, wherein the sample image set comprises at least one sample article image collected for the same sample article, and the collected article image set comprises a collected real article placement image;
generating a sample article placement image simulating placement of articles and sample article labeling information of the sample article placement image based on the sample image set;
inputting the sample article placement image and the real article placement image into a neural network, and acquiring first characteristic data and second characteristic data obtained by respectively performing characteristic extraction on the sample article placement image and the real article placement image by the neural network, and an article detection result of the sample article placement image output by the neural network;
adjusting network parameters of the neural network based on the article detection result and corresponding sample article labeling information, the first characteristic data, the second characteristic data and a preset classification result aiming at the sample article placement image and the real article placement image until the neural network meets a training cutoff condition, and taking the trained neural network as an article detection model for detecting articles in the image.
Thus, by acquiring a number of sample article images and a small number of real article placement images, and then generating sample article placement images that simulate article placement from the sample article images, training data can be constructed while the corresponding labeling information is obtained synchronously. This greatly reduces the workload of collecting real sample images, saves the work and time of annotating collected images, shortens the time needed to gather training samples, reduces the consumption of labor, material resources and cost, and improves sample-collection efficiency. It also effectively reduces the influence of factors such as environment, time and illumination during real-sample collection and enriches the training samples. When the neural network is trained with the sample article placement images together with the real article placement images, it can learn image features from the virtual world and the real world at the same time, fusing the virtual domain with the real domain through training. This ensures the effectiveness of training and effectively improves the robustness, generalization ability and training efficiency of the neural network.
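The dual-domain training step described above can be sketched as a single function combining the two supervision signals. This is a hypothetical, framework-free illustration, not the patented implementation: the networks are passed in as plain callables, the detection loss is computed only on the synthetic image (whose labels were generated automatically), and a binary domain label (0 = synthetic, 1 = real) plays the role of the preset classification result.

```python
import math

def binary_cross_entropy(p, label):
    # Clamp to avoid log(0); p is the predicted probability of class 1 ("real").
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    return -math.log(p if label == 1 else 1.0 - p)

def training_step(sample_image, sample_labels, real_image,
                  extract, detect, classify_domain, detection_loss):
    f_syn = extract(sample_image)   # first feature data (sample placement image)
    f_real = extract(real_image)    # second feature data (real placement image)
    # Second loss: item detection, supervised by the auto-generated labels.
    second_loss = detection_loss(detect(f_syn), sample_labels)
    # First loss: domain classification against the preset classification
    # results (synthetic images are class 0, real images class 1).
    first_loss = (binary_cross_entropy(classify_domain(f_syn), 0) +
                  binary_cross_entropy(classify_domain(f_real), 1))
    return first_loss + second_loss
```

With real networks, `extract`, `detect` and `classify_domain` would be the feature-extraction layer, the item-detection layer and a domain classifier, and the returned combined loss would be backpropagated to adjust the network parameters.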
In an optional embodiment, the generating, based on the sample image set, a sample article placement image simulating article placement and sample article labeling information of the sample article placement image includes:
generating a three-dimensional article virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and performing simulated dense placement on the generated three-dimensional article virtual models, and generating a sample article placement image simulating article placement and sample article labeling information of the sample article placement image, wherein the sample article placement image comprises at least part of the generated three-dimensional article virtual models.
In this way, a three-dimensional virtual model of the sample article is constructed from the acquired sample article images, the placement of the sample article is simulated with the three-dimensional virtual model to obtain the sample article placement image, and the corresponding labeling information is obtained at the same time. A large number of virtual training samples for training the neural network can thus be built quickly; the virtual samples are highly realistic, effective and rich; the time for labeling samples is saved; considerable manpower and material resources are spared; the time for collecting sample images is greatly shortened; and the efficiency of neural network training is improved.
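As a toy illustration of how placement information can double as labeling information, the sketch below drops article footprints at random positions inside a target space and records, for each one, the bounding box and class that a renderer-side annotator would emit. All names, units and parameters here are hypothetical.

```python
import random

def simulate_dense_placement(footprints, space=(100.0, 100.0), n_items=8, seed=0):
    """Randomly place n_items article footprints (w, h) inside the target
    space; the returned placement info is also the label data."""
    rng = random.Random(seed)
    placements = []
    for _ in range(n_items):
        cls = rng.randrange(len(footprints))   # which article model is placed
        w, h = footprints[cls]
        x = rng.uniform(0.0, space[0] - w)     # keep the footprint inside the space
        y = rng.uniform(0.0, space[1] - h)
        placements.append({"class": cls, "box": (x, y, x + w, y + h)})
    return placements
```

In the actual method the placements would drive a 3D renderer; the point of the sketch is that every generated image arrives with its annotations already known, so no manual labeling is needed.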
In an alternative embodiment, the generating a virtual model of a three-dimensional article of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set includes:
generating an initial virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and performing texture rendering and material rendering on the initial virtual model according to the article information of the sample article to obtain a three-dimensional article virtual model of the sample article.
In an optional embodiment, the performing simulated dense placement on the generated three-dimensional virtual model of the article, generating a sample article placement image simulating the placement of the article, and generating sample article labeling information of the sample article placement image includes:
performing simulated dense placement on the generated three-dimensional article virtual models to obtain at least two article placement scenes, in which at least part of the three-dimensional article virtual models are placed in a target space, and the placement information of each three-dimensional article virtual model in each article placement scene;
rendering each three-dimensional object virtual model in each object placement scene according to preset scene information for the object placement scenes to obtain the rendered object placement scenes;
acquiring a sample article placement image obtained by carrying out image acquisition on the rendered article placement scene;
and determining the sample article labeling information corresponding to the sample article placement image based on the placement information of each corresponding three-dimensional article virtual model.
Here, by using the three-dimensional virtual models of sample articles, different sample article placement scenes can be constructed, and different sample article placement images can be collected after rendering the scenes under various rendering conditions. The quantity and variety of the resulting sample article placement images are greatly enriched, the article combinations in them are diverse and complex, and the coverage of different situations by the sample images is greatly improved, which helps promote the robustness and generalization ability of the neural network during training.
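Determining labeling information from the placement info of each three-dimensional model amounts to projecting the placed model into the captured image. A minimal pinhole-camera sketch, with the camera placed on the z axis and all parameters chosen purely for illustration:

```python
def project_point(point3d, focal=1.0, cam_z=10.0):
    """Project a scene point onto the image plane of a pinhole camera
    placed at (0, 0, cam_z) looking toward the origin."""
    x, y, z = point3d
    depth = cam_z - z                      # distance from camera to the point
    return (focal * x / depth, focal * y / depth)

def box2d_from_corners(corners3d, focal=1.0, cam_z=10.0):
    """2D bounding-box label of a placed model: project its corner points
    and take the extremes in each image axis."""
    pts = [project_point(c, focal, cam_z) for c in corners3d]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))
```

A real renderer would also account for camera pose, occlusion and visibility, but the principle is the same: because every model's placement is known exactly, its image-space annotation can be computed rather than drawn by hand.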
In an optional embodiment, the inputting the sample article placement image and the real article placement image into a neural network, and acquiring first feature data and second feature data obtained by feature extraction performed on the sample article placement image and the real article placement image by the neural network, and an article detection result of the sample article placement image output by the neural network, includes:
inputting the sample article placement image and the real article placement image into a neural network, and performing feature extraction on the sample article placement image and the real article placement image through a feature extraction layer of the neural network to obtain first feature data of the sample article placement image and second feature data of the real article placement image;
and inputting the first characteristic data into an article detection layer of the neural network to obtain an article detection result for detecting the article of the sample article placement image.
In an optional embodiment, taking the first feature data and the second feature data as target feature data, taking the sample article placement image and the real article placement image as target images, and adjusting network parameters of the neural network based on the article detection result and corresponding sample article tagging information, and the first feature data, the second feature data and a preset classification result for the sample article placement image and the real article placement image until the neural network meets a training cutoff condition includes:
determining a first loss value in feature extraction for the neural network based on the target feature data and a preset classification result for the target image;
determining a second loss value for the neural network in terms of item detection based on the item detection results and corresponding sample item tagging information;
adjusting network parameters of the neural network based on the first loss value and the second loss value until the neural network satisfies a training cutoff condition.
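The parameter adjustment in the last step is an ordinary gradient update on the combined loss. A bare-bones sketch, where the learning rate and loss weights are illustrative values, not ones specified by the disclosure:

```python
def adjust_parameters(params, grad_first, grad_second,
                      lr=0.01, w_first=1.0, w_second=1.0):
    """One update step: move each network parameter against the gradient of
    the weighted sum of the first (domain) and second (detection) losses."""
    return [p - lr * (w_first * g1 + w_second * g2)
            for p, g1, g2 in zip(params, grad_first, grad_second)]
```

In practice an optimizer such as SGD with momentum or Adam would play this role; the update is repeated until the training cutoff condition is met.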
In an optional embodiment, the determining a first loss value in feature extraction for the neural network based on the target feature data and a preset classification result for the target image includes:
determining an image classification result of the target image based on the target feature data;
determining a first loss value in terms of feature extraction for the neural network based on the image classification result and a preset classification result for the target image.
In an optional embodiment, the training cutoff condition includes a first cutoff condition and a second cutoff condition, and the adjusting the network parameter of the neural network based on the first loss value and the second loss value until the neural network meets the training cutoff condition includes:
adjusting network parameters of a feature extraction layer of the neural network based on the first loss value until the neural network meets the first cutoff condition;
and adjusting the network parameters of the article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer based on the second loss value until the neural network meets the second cutoff condition.
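The two cutoff conditions above suggest a stage-wise schedule: first tune the feature-extraction layer on the first loss, then tune the detection layer (and its connection to the feature layer) on the second. A toy sketch with the network reduced to a dict of per-stage losses; the threshold, step size and the loss-lowering "update" are all stand-ins for a real optimizer:

```python
def two_stage_training(net, step=0.2, threshold=0.1, max_iters=1000):
    """net maps each trainable stage to its current loss; each update
    simply lowers that stage's loss, standing in for an optimizer step."""
    for stage in ("features", "detector"):      # first, then second cutoff
        for _ in range(max_iters):
            if net[stage] <= threshold:         # cutoff condition met
                break
            net[stage] -= step                  # parameter adjustment
    return net
```

The alternative embodiment described below instead adjusts all of these parameters jointly on the sum of the two losses in a single stage.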
The neural network is trained with both the virtually constructed sample article placement images and the actually collected real article placement images, so that it learns the features of both kinds of images. The resulting fusion of the virtual domain and the real domain effectively reduces the difference in how the network perceives and extracts features from the two domains, improves its accuracy when processing real images, and effectively speeds up training.
In an optional embodiment, the adjusting the network parameters of the neural network based on the first loss value and the second loss value until the neural network satisfies a training cutoff condition includes:
and adjusting the network parameters of a feature extraction layer and an article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer until the neural network meets a training cut-off condition based on the first loss value and the second loss value.
The embodiment of the present disclosure further provides an article detection method, including:
acquiring an image to be detected;
and inputting the image to be detected into an article detection model obtained by training according to the neural network training method to obtain a detection result of the image to be detected.
The embodiment of the present disclosure further provides a neural network training device, which includes:
an article image acquisition module, configured to acquire a sample image set and a collected article image set, wherein the sample image set comprises at least one sample article image collected for the same sample article, and the collected article image set comprises a collected real article placement image;
the sample image generation module is used for generating a sample article placement image for simulating article placement and sample article labeling information of the sample article placement image based on the sample image set;
a sample image processing module, configured to input the sample article placement image and the real article placement image to a neural network, obtain first feature data and second feature data obtained by performing feature extraction on the sample article placement image and the real article placement image by the neural network, respectively, and obtain an article detection result of the sample article placement image output by the neural network;
and the neural network training module is used for adjusting network parameters of the neural network based on the article detection result and corresponding sample article labeling information, the first characteristic data, the second characteristic data and a preset classification result aiming at the sample article placement image and the real article placement image until the neural network meets a training cutoff condition, and taking the trained neural network as an article detection model for detecting articles in the image.
In an optional implementation, the sample image generation module is specifically configured to:
generating a three-dimensional article virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and performing simulated dense placement on the generated three-dimensional article virtual models, and generating a sample article placement image simulating article placement and sample article labeling information of the sample article placement image, wherein the sample article placement image comprises at least part of the generated three-dimensional article virtual models.
In an optional embodiment, the sample image generation module, when configured to generate a three-dimensional virtual object model of a sample object corresponding to the sample image set based on at least one sample object image in the sample image set, is specifically configured to:
generating an initial virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and performing texture rendering and material rendering on the initial virtual model according to the article information of the sample article to obtain a three-dimensional article virtual model of the sample article.
In an optional embodiment, the sample image generation module, when configured to perform simulated dense placement on the generated three-dimensional virtual model of the article, generate a sample article placement image for simulating article placement, and sample article labeling information of the sample article placement image, is specifically configured to:
performing simulated dense placement on the generated three-dimensional article virtual models to obtain at least two article placement scenes, in which at least part of the three-dimensional article virtual models are placed in a target space, and the placement information of each three-dimensional article virtual model in each article placement scene;
rendering each three-dimensional object virtual model in each object placement scene according to preset scene information for the object placement scenes to obtain the rendered object placement scenes;
acquiring a sample article placement image obtained by carrying out image acquisition on the rendered article placement scene;
and determining the sample article labeling information corresponding to the sample article placement image based on the placement information of each corresponding three-dimensional article virtual model.
In an optional implementation manner, the sample image processing module is specifically configured to:
inputting the sample article placement image and the real article placement image into a neural network, and performing feature extraction on the sample article placement image and the real article placement image through a feature extraction layer of the neural network to obtain first feature data of the sample article placement image and second feature data of the real article placement image;
and inputting the first characteristic data into an article detection layer of the neural network to obtain an article detection result for detecting the article of the sample article placement image.
In an optional implementation manner, the first feature data and the second feature data are used as target feature data, the sample article placement image and the real article placement image are used as target images, and the neural network training module is specifically configured to:
determining a first loss value in feature extraction for the neural network based on the target feature data and a preset classification result for the target image;
determining a second loss value for the neural network in terms of item detection based on the item detection results and corresponding sample item tagging information;
adjusting network parameters of the neural network based on the first loss value and the second loss value until the neural network satisfies a training cutoff condition.
In an optional embodiment, the neural network training module, when configured to determine a first loss value in feature extraction for the neural network based on the target feature data and a preset classification result for the target image, is specifically configured to:
determining an image classification result of the target image based on the target feature data;
determining a first loss value in terms of feature extraction for the neural network based on the image classification result and a preset classification result for the target image.
In an optional embodiment, the training cutoff condition includes a first cutoff condition and a second cutoff condition, and the neural network training module, when configured to adjust the network parameters of the neural network based on the first loss value and the second loss value until the neural network meets the training cutoff condition, is specifically configured to:
adjusting network parameters of a feature extraction layer of the neural network based on the first loss value until the neural network meets the first cutoff condition;
and adjusting the network parameters of the article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer based on the second loss value until the neural network meets the second cutoff condition.
In an optional embodiment, the neural network training module, when configured to adjust the network parameter of the neural network based on the first loss value and the second loss value until the neural network meets a training cutoff condition, is specifically configured to:
and adjusting the network parameters of a feature extraction layer and an article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer until the neural network meets a training cutoff condition based on the first loss value and the second loss value.
An embodiment of the present disclosure further provides an article detection apparatus, which includes:
the acquisition module is used for acquiring an image to be detected;
and the detection module is used for inputting the image to be detected into an article detection model obtained by training according to the neural network training device to obtain a detection result of the image to be detected.
An embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the neural network training method or the article detection method described above.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the neural network training method or the article detection method.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a neural network training method provided in an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a neural network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a process in which a neural network identifies an image during a training process according to an embodiment of the present disclosure;
fig. 4 is a flowchart of an article detection method provided by an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a neural network training device provided in an embodiment of the present disclosure;
FIG. 6 shows a schematic view of an article detection apparatus provided by an embodiment of the present disclosure;
fig. 7 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research shows that training a neural network typically requires a large number of real sample images, and collecting them consumes substantial manpower, material resources, and time, so acquisition efficiency is low. If collected real sample images are used to train the neural network, they must be labeled manually, which entails a heavy annotation workload and a slow labeling speed. If more complex sample images are needed, they must be collected multiple times under different conditions, making sample acquisition difficult, inefficient, and time-consuming.
Based on this research, the present disclosure provides a neural network training method that can greatly reduce the workload of collecting real sample images, thereby saving the effort and time of labeling collected images. It shortens the time needed to acquire training samples, reduces the consumption of manpower, material resources, and other costs, and improves the efficiency of obtaining training samples. It also effectively reduces the influence of factors such as environment, time, and illumination that affect the collection of real training samples, increases the richness of the training samples, ensures effective training of the neural network, and effectively improves the robustness, generalization capability, and training efficiency of the neural network.
The above drawbacks were identified by the inventors after practice and careful study; therefore, the discovery of the above problems and the solutions proposed below should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
Referring to fig. 1, fig. 1 is a flowchart of a neural network training method according to an embodiment of the present disclosure. As shown in fig. 1, a neural network training method provided in an embodiment of the present disclosure includes:
S101: acquiring a sample image set and a collected article image set, wherein the sample image set comprises at least one sample article image collected for the same sample article, and the collected article image set comprises collected real article placement images.
When training a neural network, a large amount of training image data must first be acquired. Generally, real images of real scenes can be collected, or real images can be crawled from the network; the real images are then labeled, and the collected real images and corresponding labeling information are used to train the neural network. Both the collection and the labeling of real images consume a large amount of manpower and material resources.
Specifically, in this step, at least one sample article image corresponding to a sample article can be obtained by performing image acquisition on the sample article, and the sample article image is used for constructing a three-dimensional article virtual model of the sample article in a subsequent process.
In actual operation, images may be collected for a plurality of sample articles, with at least one sample article image collected for each sample article, to form the sample image set. The "at least one sample article image" referred to below belongs to a single sample article; different groups of sample article images may correspond to different sample articles.
Correspondingly, a plurality of real article placement images can be collected, each of which may include a plurality of placed articles. Preferably, the article combinations in the real article placement images differ from one another; alternatively, some real article placement images may contain the same article combination but differ in placement conditions such as the placement positions and placement angles of the articles in that combination.
In this step, the number of the acquired real article placement images may be small, and the number thereof is far smaller than the number of the sample article images and also far smaller than the number of the subsequently generated sample article placement images.
S102: and generating a sample article placement image simulating article placement and sample article labeling information of the sample article placement image based on the sample image set.
In this step, after the sample image set is obtained, a virtual model may be constructed for each sample article from its at least one sample article image, and sample article placement images may then be generated from these virtual models. Alternatively, a matching target article image may be selected for each sample article from its group of sample article images, i.e., one target article image is selected from each group, and the selected target article images are combined to generate a sample article placement image. In either case, the sample article labeling information for the sample article placement image can be obtained from the combination and placement of the sample articles recorded while generating the image.
When generating a sample article placement image from the selected target article images, the article information of each sample article in the image to be generated may first be determined, for example position information, pose information, angle information, illumination information, and viewing-angle information. For each sample article, its group of sample article images is then traversed according to this article information, and a sample article image matching (or even identical to) the information is selected. The image portion corresponding to the sample article may be cut out of the selected image as the target article image; alternatively, the background content of the selected image, i.e., everything except the image content of the sample article, may be blurred or color-normalized (for example, replaced with solid color), and the processed image used as the target article image. After a target article image is obtained for each sample article in this way, the target article images are stitched according to the article information of each sample article, and the stitched image is rendered according to the image information of the sample article placement image to be generated, yielding the sample article placement image. Correspondingly, once the sample article placement image is obtained, its sample article labeling information can be derived from the position information and article information of each sample article in the image. Alternatively, a mask image corresponding to the sample article placement image can be generated, in which the mask portions correspond to the sample articles; the sample article labeling information is then obtained from the mask and non-mask portions of the mask image.
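The cut-and-paste compositing and mask-based labeling described above can be sketched in a few lines of NumPy. All function and variable names here are illustrative, not from the patent:

```python
import numpy as np

def composite_item(scene, mask_canvas, item_patch, item_mask, top, left, item_id):
    """Paste one article patch into the scene at (top, left) and record its
    footprint in a label mask. Illustrative sketch of the compositing idea."""
    h, w = item_patch.shape[:2]
    region = scene[top:top + h, left:left + w]
    # Only overwrite pixels covered by the article's own mask.
    region[item_mask] = item_patch[item_mask]
    mask_canvas[top:top + h, left:left + w][item_mask] = item_id
    return scene, mask_canvas

# Toy example: an 8x8 solid-color background, one 3x3 article.
scene = np.full((8, 8, 3), 200, dtype=np.uint8)   # plain background
labels = np.zeros((8, 8), dtype=np.int32)         # 0 = non-mask (background)
patch = np.full((3, 3, 3), 50, dtype=np.uint8)    # the article pixels
pmask = np.ones((3, 3), dtype=bool)               # article fills the whole patch
scene, labels = composite_item(scene, labels, patch, pmask, 2, 3, item_id=1)
```

The resulting `labels` array plays the role of the mask image: nonzero values mark the mask portions from which labeling information can be read off.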
Accordingly, when the sample article placement image is generated by constructing virtual models, the following steps can be performed:
generating a three-dimensional article virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
performing simulated dense placement of the generated three-dimensional article virtual models, and generating a sample article placement image simulating article placement together with its sample article labeling information, wherein the sample article placement image includes at least some of the generated three-dimensional article virtual models.
Here, to generate the sample article placement image by constructing virtual models, texture and appearance information of each sample article, for example its article features, size, and shape in each sample article image, may be obtained by identifying the corresponding sample article images, and a three-dimensional virtual model of the sample article may then be reconstructed from this information.
Specifically, to generate the three-dimensional virtual model of a sample article through modeling, an initial virtual model of the sample article may first be generated based on its at least one sample article image in the sample image set; that is, for each sample article, a basic model is generated from the information about the sample article in each of its sample article images. Texture rendering and material rendering are then applied to the initial virtual model according to the article information of the sample article, yielding the three-dimensional virtual model of the sample article.
In practical applications, the initial virtual model may be rendered either by rasterization rendering combined with style transfer, or by ray-tracing rendering. Preferably, ray-tracing rendering is used, for example rendering the initial virtual model of the sample article with the Blender rendering engine, which achieves a good rendering effect without requiring style transfer. Through rendering, a more realistic three-dimensional article virtual model of the sample article can be obtained.
Then, after the three-dimensional article virtual models corresponding to the sample articles are obtained, at least some of them can be combined at random. For example, a subset of models of a given size is combined first; the number of models per combination is then gradually increased, with a subset selected at each increased size, until all the three-dimensional article virtual models form a combination. After the combinations are obtained, simulated dense placement can be performed: for each combination, the models are densely placed in simulation with different positions, angles, overlaps, and so on, to obtain a sample article placement image simulating article placement, and the sample article labeling information of the image can be derived from the simulated placement information.
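The gradually growing combinations of virtual models described above can be enumerated, for instance, with `itertools.combinations`; the model identifiers here are made up for illustration:

```python
from itertools import combinations

def model_combinations(model_ids, start_size=2):
    """Enumerate combinations of the virtual models, gradually increasing
    the number of models per combination until all models form one
    combination, as the text describes. Sketch, not the patented procedure."""
    combos = []
    for k in range(start_size, len(model_ids) + 1):
        combos.extend(combinations(model_ids, k))
    return combos

combos = model_combinations(["bottle", "can", "box"])
# Sizes 2 and 3 over three models: 3 + 1 = 4 combinations in total.
```

Each resulting combination would then be handed to the simulated dense-placement stage with randomized positions, angles, and overlaps.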
Specifically, performing simulated dense placement of the generated three-dimensional article virtual models to generate a sample article placement image and its labeling information can proceed as follows. First, simulated dense placement of the generated three-dimensional article virtual models yields at least two article placement scenes, each placing at least some of the models in a target space, together with the placement information of each model in each scene. In other words, simulated dense placement realizes the simulated arrangement of at least some of the three-dimensional article virtual models, thereby simulating an article placement scene in which the models are displayed; correspondingly, during this simulated placement, the placement information of each model in each scene can be obtained from the simulation information of the dense placement.
Then, after the article placement scenes are simulated, each scene is rendered: every three-dimensional article virtual model in the scene is rendered according to preset scene information for that scene, yielding a rendered article placement scene. The simulated scene is initially blank, i.e., it contains nothing other than the placed three-dimensional article virtual models, so it must be rendered to obtain a more realistic sample article placement scene that fits the real world.
Rendering the article placement scene, like rendering the initial virtual model, may use rasterization rendering combined with style transfer, or ray-tracing rendering. Preferably, ray-tracing rendering is used.
Then, after the rendered article placement scene is obtained, a sample article placement image can be obtained by performing image acquisition on the rendered scene, for example by simulating the image acquisition of a real scene.
In addition, the sample article labeling information corresponding to the sample article placement image is determined based on the placement information of each corresponding three-dimensional article virtual model.
The sample article labeling information determined from the placement information may be a mask image corresponding to the sample article placement image, generated while rendering the article placement scene. As described above, the mask image may be an image in which every area other than the areas occupied by sample articles is a solid color; it includes mask portions corresponding to the sample articles and a non-mask portion corresponding to the background. The sample article labeling information of the sample article placement image can then be obtained by segmenting the portions of the mask image and combining them with the information about each sample article in the placement image.
The sample article labeling information may include position information of the sample article in the sample article placement image, and article attribute information such as size information, color information, image feature information, and article type information of the sample article.
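From a mask image of this kind, the position part of the labeling information (per-article bounding boxes) can be recovered directly. A hypothetical helper, assuming pixel value 0 marks the non-mask background and any other value identifies one placed article:

```python
import numpy as np

def mask_to_boxes(label_mask):
    """Derive per-article bounding boxes from an instance mask image.
    Hypothetical helper illustrating how position information in the
    labeling data could be obtained from the mask portions."""
    boxes = {}
    for item_id in np.unique(label_mask):
        if item_id == 0:
            continue  # skip the non-mask (background) portion
        ys, xs = np.nonzero(label_mask == item_id)
        boxes[int(item_id)] = (xs.min(), ys.min(), xs.max(), ys.max())
    return boxes  # (x_min, y_min, x_max, y_max) per article

mask = np.zeros((6, 6), dtype=np.int32)
mask[1:3, 2:5] = 1   # article 1 occupies rows 1-2, cols 2-4
mask[4:6, 0:2] = 2   # article 2 occupies rows 4-5, cols 0-1
boxes = mask_to_boxes(mask)
```

Other attribute information (type, color, size) would come from the placement information recorded during simulation rather than from the mask itself.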
In specific applications, the simulated article placement and the collected images need to match real scenes. For some self-service refrigerators, self-service freezers, and the like, the collectable images exhibit characteristic changes: images collected through fisheye cameras show the corresponding imaging effects. Therefore, while rendering the article placement scene, the scene can be rendered by, for example, writing a shader that applies fisheye distortion and other rendering changes, so that the collected sample article placement images carry fisheye distortion and thus simulate real placement images collected by a fisheye camera.
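A generic radial model can mimic the fisheye change mentioned above; this is not the shader from the patent, only a sketch of such a distortion applied to normalized image coordinates in [-1, 1]:

```python
import numpy as np

def fisheye_warp(points, strength=1.0):
    """Apply a simple equidistant-style fisheye distortion to normalized
    image coordinates: the arctan term compresses large radii, pulling
    the periphery of the image inward while leaving the center fixed."""
    pts = np.asarray(points, dtype=float)
    r = np.linalg.norm(pts, axis=-1, keepdims=True)
    scale = np.where(r > 0, np.arctan(strength * r) / np.maximum(r, 1e-12), 1.0)
    return pts * scale

center = fisheye_warp([[0.0, 0.0]])   # the image center is unmoved
corner = fisheye_warp([[1.0, 1.0]])   # a corner is pulled toward the center
```

In a real pipeline the inverse of such a mapping would be sampled per output pixel (in a shader or with a remap operation) rather than applied to points.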
Following S102, S103: inputting the sample article placement image and the real article placement image into a neural network, and obtaining first feature data and second feature data, produced by the neural network performing feature extraction on the sample article placement image and the real article placement image respectively, together with an article detection result of the sample article placement image output by the neural network.
In this step, after the virtual sample article placement images are constructed and the real article placement images are collected, a pre-constructed neural network can be trained. The sample article placement image and the real article placement image are input into the neural network, which processes them and outputs an article detection result for the sample article placement image. In addition, while processing the two images, the neural network performs feature extraction on them, extracting first feature data from the sample article placement image and second feature data from the real article placement image, and then detects and identifies the images from the extracted features. The first feature data and second feature data extracted by the neural network from the two images can therefore be obtained.
In addition, in the embodiment of the present disclosure, the collected real article placement images are not labeled, and the neural network is not trained with an article detection result for the real article placement image. It is therefore unnecessary to obtain an article detection result for the real article placement image, and any such result output by the neural network may be ignored or discarded.
Specifically, to input the images into the neural network and obtain the first feature data, the second feature data, and the article detection result of the sample article placement image, the sample article placement image and the real article placement image may be input into the neural network, and feature extraction may be performed on both through a feature extraction layer of the network to obtain the first feature data of the sample article placement image and the second feature data of the real article placement image. The first feature data is then input into an article detection layer of the neural network to obtain the article detection result for the sample article placement image. That is, of the feature data extracted by the feature extraction layer, only the first feature data need be input into the article detection layer for feature processing, so that article detection is performed on the sample article placement image and an article detection result is obtained.
Here, as described above, in the embodiment of the present disclosure the collected real article placement images need not be labeled and the neural network is not trained with their article detection results; therefore only the first feature data may be input into the article detection layer, although the present disclosure is not limited to this.
For example, referring to fig. 2, fig. 2 is a schematic structural diagram of a neural network according to an embodiment of the present disclosure. As shown in fig. 2, the neural network may include a feature extraction layer and an article detection layer. The feature extraction layer extracts image features of an input image, and the article detection layer detects, locates, and identifies articles in the input image from the extracted features. Accordingly, after the sample article placement image and the real article placement image are input into the neural network, the first feature data and the second feature data are obtained through the feature extraction layer, and the article detection result of the sample article placement image is obtained by passing the first feature data through the article detection layer.
Wherein the feature extraction layer may include at least one layer of convolutional neural network.
In addition, the neural network may further include some intermediate processing layers, such as a pooling layer, and the like, and the extracted feature data may be subjected to smoothing processing, pooling processing, normalization processing, and the like, so as to improve usability of the feature data, an overall detection effect, and the like.
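The data flow of fig. 2 (a shared feature extraction layer feeding an article detection layer, with only the first feature data passed on) can be mirrored with toy matrix operations. This sketch replaces real convolutions with a single projection and is not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extraction_layer(image, weights):
    """Toy stand-in for the convolutional feature extraction layer:
    flatten the image and project it to a ReLU feature vector."""
    return np.maximum(image.reshape(-1) @ weights, 0.0)

def article_detection_layer(features, weights):
    """Toy detection head producing softmax class scores from features."""
    scores = features @ weights
    return np.exp(scores) / np.exp(scores).sum()

# Shared weights process both the synthetic and the real image.
w_feat = rng.normal(size=(16, 8))
w_det = rng.normal(size=(8, 3))
sample_img = rng.random((4, 4))   # stands in for the sample placement image
real_img = rng.random((4, 4))     # stands in for the real placement image

first_feature_data = feature_extraction_layer(sample_img, w_feat)
second_feature_data = feature_extraction_layer(real_img, w_feat)
# Only the synthetic branch goes through the detection layer,
# matching the text: the real image is unlabeled.
detection = article_detection_layer(first_feature_data, w_det)
```

The key structural point is that both images share the same extraction weights, so the domain-adaptation loss introduced later acts on a single feature space.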
S104: adjusting network parameters of the neural network based on the article detection result and the corresponding sample article labeling information, the first feature data, the second feature data, and preset classification results for the sample article placement image and the real article placement image, until the neural network meets a training cutoff condition; the trained neural network is then used as an article detection model for detecting articles in images.
In this step, after the article detection result, the first feature data, and the second feature data are obtained, the parameters of the neural network may be adjusted according to the sample article labeling information of the sample article placement image and the preset classification results for the sample article placement image and the real article placement image, completing one round of training. The process then returns to the step of inputting the sample article placement image and the real article placement image into the network: images continue to be fed into the network after each parameter adjustment, and the network parameters continue to be adjusted through the subsequent processing steps, forming cyclic training. Once the neural network satisfies the training cutoff condition after multiple rounds, it can be regarded as trained and used as an article detection model for detecting articles in images.
The parameters of the neural network are adjusted according to the differences among these various quantities: by invoking preset loss functions, the loss of the neural network in each dimension can be calculated from them, and the calculated losses then determine the direction and magnitude of the adjustment of each network parameter.
Correspondingly, the training cutoff condition may be that the loss of the neural network in every dimension is smaller than the loss threshold corresponding to that dimension. The condition is not limited to this: in other embodiments it may be set according to the training requirements, for example the condition is satisfied when the number of parameter adjustments of the neural network reaches a preset count.
Specifically, in adjusting the network parameters through losses, the loss computation for feature extraction is the same for the sample article placement image and the real article placement image. For convenience and uniformity of expression, the first feature data and the second feature data are therefore both referred to as target feature data, and the sample article placement image and the real article placement image as target images.
Further, in some possible embodiments, for adjusting the network parameters of the neural network until the neural network meets the training cutoff condition, the following steps may be included:
first, a first loss value in feature extraction for the neural network is determined based on the target feature data and a preset classification result for the target image.
Here, after the target feature data is obtained, a preset classification result for the target image may be obtained in advance, and a first loss value of the neural network in feature extraction, i.e., the loss of the feature extraction performed by the feature extraction layer, is calculated by combining the target feature data with the preset classification result.
Conventionally, when feature data is used to classify images, the classification result indicates the category obtained by identifying the image, for example whether the image belongs to class A, class B, class C, and so on.
In the present application, the neural network is trained with constructed virtual image data, namely the sample article placement images, combined with a small number of real images, namely the real article placement images. Because the sample article placement image is a simulated virtual image belonging to the virtual data domain, while the real article placement image belongs to the real data domain, the feature data extracted from them differ; following the conventional approach, identifying a sample article placement image would classify it as a virtual image, and identifying a real article placement image would classify it as a real image.
However, the trained neural network is intended to detect actually collected images, so the images actually processed belong to the real data domain. To ensure the accuracy of data processing and detection results when training largely on virtual data, the neural network should ignore the fact that the sample article placement images are simulated virtual images while the real article placement images are real images; it should instead treat the sample article placement images as identical to the real article placement images, thereby ignoring the difference between the virtual data domain and the real data domain.
Accordingly, the preset classification result may state that the classification of the target image is the same regardless of whether it is a real or a virtual image, i.e., regardless of whether it is the sample article placement image or the real article placement image. For example, when the classification result is expressed as an output classification probability, the preset classification result is preferably the intermediate value between the class results. Taking binary classification as an example, where a probability closer to 1 indicates class A and a probability closer to 0 indicates class B, the preset classification result may be 0.5. That is, whether the input is the sample article placement image or the real article placement image, requiring the detection and identification classification results to be the same compensates for and cancels out the difference between the virtual data domain and the real data domain.
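The effect of a preset classification result of 0.5 can be made concrete with a binary cross-entropy loss against that target; this exact formulation is an assumption for illustration, not a loss the patent fixes:

```python
import numpy as np

def domain_confusion_loss(p_real, target=0.5):
    """Binary cross-entropy against the preset classification result.
    With target 0.5, the loss is smallest when the network cannot tell
    whether an image came from the virtual or the real data domain."""
    p = np.clip(p_real, 1e-7, 1 - 1e-7)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

confident = domain_confusion_loss(0.99)   # discriminator is sure: high loss
uncertain = domain_confusion_loss(0.5)    # cannot tell the domains apart
```

The minimum of this loss sits exactly at p = 0.5, so driving the network toward the preset result pushes the two domains' features together.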
Specifically, to determine the first loss value, an image classification result of the target image may first be determined based on the target feature data, for example by a neural network for image classification or by an image classification algorithm; that is, the image classification result of the target image is identified or calculated from the target feature data. The first loss value for feature extraction is then determined based on this image classification result and the preset classification result for the target image, for example through a corresponding loss function, or by directly comparing and/or computing over the preset classification result and the determined image classification result.
For example, referring to fig. 3, fig. 3 is a schematic diagram of the process by which the neural network identifies images during training, according to an embodiment of the present disclosure. As shown in fig. 3, after the sample article placement image and the real article placement image are input into the neural network, the feature extraction layer extracts the first feature data of the sample article placement image and the second feature data of the real article placement image. To train the feature extraction layer effectively, so that it performs domain-adaptive learning during feature extraction and ignores the difference between the virtual and real data domains, training can be assisted by attaching a trained domain-adaptive target detection network to the neural network. Accordingly, the first and second feature data, as the target feature data, can be input into the domain-adaptive target detection network, which identifies and detects the input feature data to produce an image discrimination result, i.e., the image classification results of the sample article placement image and the real article placement image; the corresponding first loss value is then calculated by combining the image classification result with the preset classification result.
In practical applications, the domain-adaptive target detection network may include a gradient reversal layer and a domain discriminator, the gradient reversal layer serving to fuse the features of the two domains during back-propagation. The domain discriminator judges which domain the input feature data (i.e., the target feature data) belongs to; that is, by identifying whether the target feature data belongs to the virtual or the real data domain, it judges whether the target image is a constructed virtual image or a collected real image, and it keeps learning as the neural network trains. When the discriminator can no longer distinguish which domain the target feature data belongs to (for example, when the output probability is 0.5), the model's features in the two domains are unified, and an ordinary candidate-box generation network can then be used for subsequent detection. Thanks to domain adaptation and the strong rendering capability of the rendering stage, the neural network needs no style transfer, which saves data processing time and improves data processing efficiency.
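A minimal sketch of the gradient reversal layer's behavior, following the domain-adversarial training idea the paragraph describes: identity in the forward pass, negated and scaled gradient in the backward pass, so the feature extractor learns to confuse the domain discriminator. Class and parameter names are illustrative:

```python
import numpy as np

class GradientReversal:
    """Forward: pass features through unchanged. Backward: multiply the
    incoming gradient by -lambda, reversing the training signal that the
    domain discriminator sends to the feature extractor."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                        # features are unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed gradient for the extractor

grl = GradientReversal(lam=0.5)
out = grl.forward(np.array([1.0, 2.0]))
grad = grl.backward(np.array([0.2, -0.4]))
```

In an autograd framework this would be implemented as a custom function with these forward/backward rules rather than explicit method calls.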
Then, a second loss value in item detection for the neural network is determined based on the item detection result and corresponding sample item tagging information.
Here, after the article detection result is obtained, the second loss value may be obtained by combining it with the sample article labeling information, for example by using a corresponding loss function, or by directly comparing the article detection result with the sample article labeling information and/or performing feature calculation on them.
The article detection result may be similar in form to the sample article labeling information, and likewise include article attribute information such as the position information of each sample article in the sample article placement image, and the size information, color information, image feature information, and article type information of the sample article.
In practical applications, the first loss value and the second loss value of the neural network may be determined by a cross entropy loss function or the like.
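As an illustration of the cross-entropy option mentioned above (the variable names and domain labels are ours, not the disclosure's), the first loss value can be computed from the discriminator's domain probability against the preset classification result, e.g. label 1 for the virtual data domain and 0 for the real data domain:

```python
import math

def binary_cross_entropy(p, label, eps=1e-12):
    # p: predicted probability that the input belongs to the
    # virtual data domain; label: preset classification result.
    p = min(max(p, eps), 1.0 - eps)  # clamp for numerical safety
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Discriminator is confident and correct: small first loss value.
loss_good = binary_cross_entropy(0.9, 1)
# Discriminator cannot tell the domains apart (p = 0.5):
loss_mixed = binary_cross_entropy(0.5, 1)
print(round(loss_good, 4))   # ≈ 0.1054
print(round(loss_mixed, 4))  # ≈ 0.6931 (= ln 2)
```

Note that the value ln 2 at p = 0.5 corresponds exactly to the "cannot distinguish" state described earlier, where the discriminator's output probability is 0.5 for either domain.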
For example, as shown in fig. 3, after the feature extraction layer extracts the first feature data of the sample article placement image, the first feature data may be input into the article detection layer of the neural network, so that the article detection result of the sample article placement image is obtained through the article detection layer's processing of the first feature data, and the second loss value may then be obtained by combining this result with the sample article labeling information.
Then, network parameters of the neural network are adjusted based on the first loss value and the second loss value until the neural network satisfies a training cutoff condition.
Here, the network parameters of the neural network may be adjusted under the guidance of the first loss value and the second loss value (for example, according to their gradients and deviations), thereby completing one training pass of the neural network. The process may then return to the step of inputting the sample article placement image and the real article placement image to the neural network, and the next training pass may continue, so as to implement multiple rounds of cyclic training of the neural network.
As can be seen from the above description, in the process of training the neural network, the feature extraction layer and the article detection layer are trained for different objectives. Accordingly, in the actual training process, the two layers may be trained separately: for example, the feature extraction layer is trained first, and after its training is completed, the article detection layer is trained. In this way, the extracted feature data is already relatively accurate, and the article detection layer can process the input feature data extracted by the feature extraction layer in a more targeted manner, which correspondingly improves the training efficiency of the article detection layer and reduces the training time.
Specifically, in a possible embodiment, the training cutoff condition may include a first cutoff condition and a second cutoff condition. Correspondingly, to adjust the network parameters of the neural network by using the first loss value and the second loss value until the neural network satisfies the training cutoff condition, the network parameters of the feature extraction layer of the neural network may first be adjusted based on the first loss value, thereby completing one training pass of the feature extraction layer. The process may then return to the step of inputting the sample article placement image and the real article placement image into the neural network for the next training pass of the feature extraction layer. This repeats until the neural network satisfies the first cutoff condition, at which point the training of the feature extraction layer may be considered complete and stopped, and the process may return once more to the step of inputting the sample article placement image and the real article placement image to the neural network, in preparation for starting the training of the article detection layer.
Then, after the training of the feature extraction layer is completed, the network parameters of the article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer are adjusted based on the second loss value until the neural network satisfies the second cutoff condition. That is, the second loss value is used to adjust not only the network parameters inside the article detection layer but also the network parameters between the article detection layer and the feature extraction layer, thereby completing one training pass of the article detection layer. The process then returns to the step of inputting the sample article placement image and the real article placement image to the neural network: the already-trained feature extraction layer extracts the feature data of the input image, the extracted feature data is input to the article detection layer, a second loss value is determined according to the recognition result, and the network parameters in the article detection layer and between the article detection layer and the feature extraction layer are adjusted again. This cyclic training continues until, after some training pass, the neural network satisfies the second cutoff condition, at which point the article detection layer may be considered fully trained.
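The staged schedule described above can be sketched as a simple training loop. This is a simplified sketch with dummy loss curves; the function name, the thresholds, and the `max_rounds` cap are illustrative assumptions rather than values fixed by the disclosure:

```python
def train_two_stage(first_losses, second_losses,
                    first_threshold=0.1, second_threshold=0.1,
                    max_rounds=100):
    """Sketch of the staged schedule: the feature extraction layer
    is trained until the first cutoff condition holds, then the
    article detection layer (plus the parameters between the two
    layers) until the second cutoff condition holds."""
    stage1_rounds = 0
    for loss in first_losses[:max_rounds]:
        stage1_rounds += 1          # one pass of the feature extraction layer
        if loss < first_threshold:  # first cutoff condition
            break
    stage2_rounds = 0
    for loss in second_losses[:max_rounds]:
        stage2_rounds += 1           # one pass of the article detection layer
        if loss < second_threshold:  # second cutoff condition
            break
    return stage1_rounds, stage2_rounds

# Dummy loss curves standing in for the real first/second loss values.
s1, s2 = train_two_stage([0.5, 0.3, 0.08], [0.4, 0.05])
print(s1, s2)  # 3 2
```

A real implementation would compute the loss values from the network outputs each round instead of reading them from a list; only the control flow of the two stages is illustrated here.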
The first cutoff condition may include that the first loss value is less than a corresponding first loss threshold, and/or that the number of training passes of the feature extraction layer is greater than or equal to a first training count; the second cutoff condition may include that the second loss value is less than a corresponding second loss threshold, and/or that the number of training passes of the article detection layer is greater than or equal to a second training count.
When the article detection layer is trained, the real article placement image is not needed, and therefore, after the step of inputting the sample article placement image and the real article placement image to the neural network is returned, only the sample article placement image may be input to the neural network.
In addition, in the process of training the neural network, the whole neural network may instead be trained as in conventional network training, which may appropriately reduce the overall data processing amount of the neural network.
Specifically, to adjust the network parameters of the neural network by using the first loss value and the second loss value until the neural network satisfies the training cutoff condition, the training may be implemented by adjusting the network parameters of the feature extraction layer and the article detection layer of the neural network, as well as the network parameters between the article detection layer and the feature extraction layer, based on the first loss value and the second loss value, until the neural network satisfies the training cutoff condition.
Correspondingly, when the overall training is performed on the neural network, the total loss value of the neural network may be calculated through the first loss value and the second loss value, and the total loss value is used as an overall basis for parameter adjustment of the overall neural network.
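For the whole-network variant, the total loss value mentioned above can be a weighted sum of the two loss values. The weights below are an assumption for illustration; the disclosure does not fix a particular combination rule:

```python
def total_loss(first_loss, second_loss, w1=1.0, w2=1.0):
    # Combine the domain-adaptation (first) loss and the article
    # detection (second) loss into one objective used as the
    # overall basis for parameter adjustment.
    return w1 * first_loss + w2 * second_loss

print(total_loss(0.69, 1.2))          # 1.89
print(total_loss(0.69, 1.2, w1=0.5))  # 1.545 (down-weighted domain loss)
```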
The neural network training method provided by the embodiment of the present disclosure obtains a plurality of sample article images and a small number of real article placement images, and generates, from the sample article images, a sample article placement image simulating article placement, thereby realizing the construction of training data. The labeling information of the sample article placement image is obtained synchronously, which greatly reduces the workload of collecting real sample images and further saves the workload and time of labeling collected images. This reduces the acquisition time of training samples, reduces the consumption of resources and costs such as manpower and materials, saves sample construction time, and improves the efficiency of collecting training samples. Moreover, the influence of factors such as environment, time, and illumination when acquiring real training samples can be effectively reduced, the richness of the training samples is increased, and the training efficiency of the neural network is effectively improved.
Further, the article detection model obtained by training with the neural network training method shown in fig. 1 can be used to recognize collected real images. Accordingly, an embodiment of the present disclosure further provides an article detection method; please refer to fig. 4, which is a flowchart of an article detection method provided by an embodiment of the present disclosure. As shown in fig. 4, the article detection method provided by an embodiment of the present disclosure includes:
S401: And acquiring an image to be detected.
S402: and inputting the image to be detected into an article detection model obtained by training according to the neural network training method to obtain a detection result of the image to be detected.
Here, after the article detection model is obtained by training with the neural network training method, the article detection model may be used: the acquired image to be detected is input to the article detection model, so that the detection result of the image to be detected output by the article detection model is obtained.
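The deployment flow of steps S401–S402 can be sketched as follows. The `ArticleDetectionModel` stub, the detection tuple layout, and the confidence threshold are hypothetical; a real deployment would load the trained network weights instead:

```python
class ArticleDetectionModel:
    """Stand-in for the trained article detection model; a real
    implementation would wrap the trained neural network."""
    def __call__(self, image):
        # Return (bounding_box, confidence, article_type) triples.
        return [((10, 10, 50, 60), 0.92, "bottle"),
                ((70, 15, 40, 55), 0.31, "can")]

def detect(model, image, conf_threshold=0.5):
    # S401/S402: feed the image to be detected into the model and
    # keep only sufficiently confident detections.
    return [d for d in model(image) if d[1] >= conf_threshold]

model = ArticleDetectionModel()
result = detect(model, image=None)
print(result)  # only the 0.92-confidence detection survives
```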
According to the article detection method provided by the embodiment of the disclosure, the article detection model which is obtained by training through the training method and has a good recognition effect is used for detecting the image to be detected, so that the detection result is high in accuracy, high in speed and good in robustness.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a neural network training device corresponding to the neural network training method, and as the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the neural network training method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repeated parts are not described again.
Referring to fig. 5, fig. 5 is a schematic diagram of a neural network training device according to an embodiment of the disclosure, and as shown in fig. 5, a neural network training device 500 according to an embodiment of the disclosure includes:
an item image obtaining module 510, configured to obtain a sample image set and a collected item image set, where the sample image set includes at least one sample item image collected for the same sample item, and the collected item image includes a collected real item placement image.
A sample image generating module 520, configured to generate a sample article placement image simulating the placement of the article and sample article labeling information of the sample article placement image based on the sample image set.
A sample image processing module 530, configured to input the sample article placement image and the real article placement image to a neural network, and obtain first feature data and second feature data obtained by performing feature extraction on the sample article placement image and the real article placement image by the neural network, respectively, and an article detection result of the sample article placement image output by the neural network.
And the neural network training module 540 is configured to adjust network parameters of the neural network based on the article detection result and corresponding sample article labeling information, the first feature data, the second feature data, and a preset classification result for the sample article placement image and the real article placement image until the neural network meets a training cutoff condition, and use the trained neural network as an article detection model for detecting an article in an image.
In an optional implementation manner, the sample image generation module 520 is specifically configured to:
generating a three-dimensional article virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and carrying out simulation intensive placement on the generated three-dimensional article virtual model, and generating a sample article placement image simulating article placement and sample article marking information of the sample article placement image, wherein the sample article placement image comprises at least part of the generated three-dimensional article virtual model.
In an optional embodiment, the sample image generation module 520, when configured to generate a three-dimensional virtual object model of a sample object corresponding to the sample image set based on at least one sample object image in the sample image set, is specifically configured to:
generating an initial virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and performing texture rendering and material rendering on the initial virtual model according to the article information of the sample article to obtain a three-dimensional article virtual model of the sample article.
In an optional embodiment, the sample image generating module 520, when configured to perform simulated dense placement on the generated three-dimensional virtual model of the article, generate a sample article placement image simulating placement of the article, and sample article labeling information of the sample article placement image, is specifically configured to:
carrying out simulation intensive placement on the generated three-dimensional article virtual models to obtain at least two article placement scenes for placing at least part of the three-dimensional article virtual models in a target space and placement information of each three-dimensional article virtual model in each article placement scene;
rendering each three-dimensional object virtual model in each object placement scene according to preset scene information for the object placement scenes to obtain the rendered object placement scenes;
acquiring a sample article placement image obtained by carrying out image acquisition on the rendered article placement scene;
and determining sample article marking information corresponding to the sample article placing image based on the placing information of each corresponding three-dimensional article virtual model.
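As a rough illustration of how sample article labeling information can be derived from the placement information of each three-dimensional article virtual model (the coordinate convention, the scale factor, and the helper below are assumptions for illustration), each placed model's position and size in the scene can be mapped to a 2-D bounding-box label:

```python
def placement_to_label(placement, scale=10):
    """Map one three-dimensional placement record to a 2-D
    bounding-box label in image pixels. The simple top-down
    orthographic projection (using x and y only) is an
    assumption; a real renderer would use the camera projection."""
    x, y, _z = placement["position"]
    w, h, _d = placement["size"]
    return {
        "bbox": (int(x * scale), int(y * scale),
                 int(w * scale), int(h * scale)),
        "article_type": placement["article_type"],
    }

placements = [
    {"position": (1.0, 2.0, 0.0), "size": (0.5, 0.8, 0.3),
     "article_type": "box"},
]
labels = [placement_to_label(p) for p in placements]
print(labels[0]["bbox"])  # (10, 20, 5, 8)
```

Because the labels are computed from the same placement information that drives the rendering, no manual annotation of the sample article placement image is required.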
In an optional implementation manner, the sample image processing module 530 is specifically configured to:
inputting the sample article placement image and the real article placement image into a neural network, and performing feature extraction on the sample article placement image and the real article placement image through a feature extraction layer of the neural network to obtain first feature data of the sample article placement image and second feature data of the real article placement image;
and inputting the first characteristic data into an article detection layer of the neural network to obtain an article detection result for detecting the article in the sample article placement image.
In an optional implementation, the neural network training module 540 is specifically configured to:
determining a first loss value in feature extraction for the neural network based on the target feature data and a preset classification result for the target image;
determining a second loss value for the neural network in terms of item detection based on the item detection results and corresponding sample item tagging information;
adjusting network parameters of the neural network based on the first loss value and the second loss value, and returning to the step of inputting the sample article placement image and the real article placement image to the neural network until the neural network meets a training cutoff condition.
In an optional embodiment, the neural network training module 540, when configured to determine the first loss value in feature extraction for the neural network based on the target feature data and the preset classification result for the target image, is specifically configured to:
determining an image classification result of the target image based on the target feature data;
determining a first loss value in feature extraction for the neural network based on the image classification result and a preset classification result for the target image.
In an optional embodiment, the neural network training module 540, when configured to adjust the network parameters of the neural network based on the first loss value and the second loss value, and return to the step of inputting the sample item placement image and the real item placement image to the neural network until the neural network meets the training cutoff condition, is specifically configured to:
adjusting network parameters of a feature extraction layer of the neural network based on the first loss value until the neural network satisfies the first cutoff condition;
and adjusting the network parameters of the article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer based on the second loss value until the neural network meets the second cutoff condition.
In an optional embodiment, the neural network training module 540, when configured to adjust the network parameters of the neural network based on the first loss value and the second loss value, and return to the step of inputting the sample item placement image and the real item placement image to the neural network until the neural network meets the training cutoff condition, is specifically configured to:
and adjusting network parameters of a feature extraction layer and an article detection layer of the neural network and network parameters between the article detection layer and the feature extraction layer based on the first loss value and the second loss value, and returning to the step of inputting the sample article placement image and the real article placement image to the neural network until the neural network meets a training cutoff condition.
The neural network training device provided by the embodiment of the present disclosure obtains a plurality of sample article images and a small number of real article placement images, and generates, from the sample article images, sample article placement images simulating article placement, which greatly reduces the workload of collecting real sample images and realizes the construction of training data. The labeling information of the sample article placement images is obtained synchronously, which further saves the workload and time of labeling collected images, reduces the acquisition time of training samples, reduces the consumption of resources and costs such as manpower and materials, saves sample construction time, and improves the efficiency of collecting training samples. The influence of factors such as environment, time, and illumination when acquiring real training samples can be effectively reduced, and the richness of the training samples is increased. Moreover, when the neural network is trained with the sample article placement images in combination with the real article placement images, it can simultaneously learn image features from the virtual world and the real world, and the fusion of the virtual domain and the real domain is realized through training. This ensures the effectiveness of the training, effectively improves the robustness and generalization capability of the neural network, and effectively improves its training efficiency.
Based on the same inventive concept, an article detection device corresponding to the article detection method is also provided in the embodiments of the present disclosure, and as the principle of solving the problem of the device in the embodiments of the present disclosure is similar to the article detection method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 6, fig. 6 is a schematic view of an article detection device according to an embodiment of the disclosure. As shown in fig. 6, an article detection apparatus 600 provided by an embodiment of the present disclosure includes:
an obtaining module 610, configured to obtain an image to be detected;
and the detection module 620 is configured to input the image to be detected into an article detection model obtained by training according to the neural network training device, so as to obtain a detection result of the image to be detected.
The article detection device provided by the embodiment of the disclosure detects the image to be detected by using the article detection model which is obtained by training through the training device and has a good recognition effect, and has the advantages of high accuracy and high speed of a detection result and good robustness.
The description of the processing flow of each module in the apparatus and the interaction flow between the modules may refer to the relevant description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device 700. As shown in fig. 7, which is a schematic structural diagram of the computer device 700 provided in the embodiment of the present disclosure, the device includes: a processor 710, a memory 720, and a bus 730. The memory 720 stores machine-readable instructions executable by the processor 710. When the computer device 700 is operating, the processor 710 and the memory 720 communicate via the bus 730, and the machine-readable instructions, when executed by the processor 710, perform the steps of the neural network training method or the article detection method described above.
The embodiments of the present disclosure also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the neural network training method or the article detection method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the neural network training method or the article detection method in the foregoing method embodiments, which may be specifically referred to in the foregoing method embodiments and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK) or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: those skilled in the art can still make modifications or changes to the embodiments described in the foregoing embodiments, or make equivalent substitutions for some of the technical features, within the technical scope of the disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

1. A neural network training method, the method comprising:
acquiring a sample image set and a collected article image set, wherein the sample image set comprises at least one sample article image collected aiming at the same sample article, and the collected article image comprises a collected real article placement image;
generating a sample article placement image simulating article placement and sample article labeling information of the sample article placement image based on the sample image set;
inputting the sample article placement image and the real article placement image into a neural network, and acquiring first characteristic data and second characteristic data obtained by respectively performing characteristic extraction on the sample article placement image and the real article placement image by the neural network, and an article detection result of the sample article placement image output by the neural network;
adjusting network parameters of the neural network based on the article detection result and corresponding sample article labeling information, the first characteristic data, the second characteristic data and a preset classification result aiming at the sample article placement image and the real article placement image until the neural network meets a training cutoff condition, and taking the trained neural network as an article detection model for detecting articles in the image.
2. The method of claim 1, wherein generating a sample item placement image that simulates the placement of an item and sample item labeling information for the sample item placement image based on the sample image set comprises:
generating a three-dimensional article virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and carrying out simulation intensive placement on the generated three-dimensional article virtual model, and generating a sample article placement image simulating article placement and sample article labeling information of the sample article placement image, wherein the sample article placement image comprises at least part of the generated three-dimensional article virtual model.
3. The method of claim 2, wherein generating a virtual model of a three-dimensional article of a sample article to which the sample image set corresponds based on at least one sample article image in the sample image set comprises:
generating an initial virtual model of a sample article corresponding to the sample image set based on at least one sample article image in the sample image set;
and performing texture rendering and material rendering on the initial virtual model according to the article information of the sample article to obtain a three-dimensional article virtual model of the sample article.
4. The method of claim 2, wherein the simulating dense placement of the generated three-dimensional virtual model of the object, the generating of the sample object placement image simulating the placement of the object, and the sample object labeling information of the sample object placement image comprise:
carrying out simulation intensive placement on the generated three-dimensional article virtual models to obtain at least two article placement scenes for placing at least part of the three-dimensional article virtual models in a target space and placement information of each three-dimensional article virtual model in each article placement scene;
rendering each three-dimensional object virtual model in each object placement scene according to preset scene information for the object placement scenes to obtain the rendered object placement scenes;
acquiring a sample article placement image obtained by carrying out image acquisition on the rendered article placement scene;
and determining sample article marking information corresponding to the sample article placing image based on the placing information of each corresponding three-dimensional article virtual model.
5. The method according to claim 1, wherein the inputting the sample article placement image and the real article placement image into a neural network, and acquiring first feature data and second feature data obtained by feature extraction of the sample article placement image and the real article placement image by the neural network, and an article detection result of the sample article placement image output by the neural network, comprises:
inputting the sample article placement image and the real article placement image into a neural network, and performing feature extraction on the sample article placement image and the real article placement image through a feature extraction layer of the neural network to obtain first feature data of the sample article placement image and second feature data of the real article placement image;
and inputting the first feature data into an article detection layer of the neural network to obtain an article detection result of performing article detection on the sample article placement image.
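A minimal toy sketch of the data flow in claim 5 (hypothetical names; summary statistics stand in for a convolutional backbone, and a linear score stands in for a detection layer): both images pass through the shared feature extraction layer, but only the sample (synthetic) branch feeds the detection layer, since only it has labels.

```python
def extract_features(image):
    """Toy shared feature extraction layer: mean and variance of the pixels
    stand in for a learned feature vector."""
    flat = [px for row in image for px in row]
    mean = sum(flat) / len(flat)
    var = sum((px - mean) ** 2 for px in flat) / len(flat)
    return [mean, var]

def detect_articles(features, weights=(1.0, -0.5), bias=0.0):
    """Toy article detection layer: a linear score over the features."""
    return sum(w * f for w, f in zip(weights, features)) + bias

sample_img = [[0.2, 0.8], [0.4, 0.6]]   # sample (synthetic) placement image
real_img   = [[0.1, 0.9], [0.3, 0.7]]   # captured real placement image

first_feats  = extract_features(sample_img)   # first feature data
second_feats = extract_features(real_img)     # second feature data
detection    = detect_articles(first_feats)   # detection only on the sample image
```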
6. The method according to claim 1, wherein taking the first feature data and the second feature data as target feature data, taking the sample article placement image and the real article placement image as target images, and adjusting the network parameters of the neural network, based on the article detection result and the corresponding sample article labeling information as well as the first feature data, the second feature data and a preset classification result for the sample article placement image and the real article placement image, until the neural network satisfies a training cutoff condition, comprises:
determining a first loss value of the neural network in terms of feature extraction based on the target feature data and the preset classification result for the target images;
determining a second loss value of the neural network in terms of article detection based on the article detection result and the corresponding sample article labeling information;
and adjusting the network parameters of the neural network based on the first loss value and the second loss value until the neural network satisfies the training cutoff condition.
7. The method according to claim 6, wherein determining the first loss value of the neural network in terms of feature extraction based on the target feature data and the preset classification result for the target images comprises:
determining an image classification result of each target image based on the target feature data;
and determining the first loss value of the neural network in terms of feature extraction based on the image classification result and the preset classification result for the target image.
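The two loss values of claims 6 and 7 might be sketched as follows (an illustrative assumption, not the patented formulation; all names hypothetical): the first loss is a binary cross-entropy of a classifier that labels features as coming from a synthetic or a real image, and the second loss compares the detection result against the sample labeling information.

```python
import math

def first_loss_term(feature_score, preset_class):
    """First loss: binary cross-entropy of an image classification result
    against the preset classification result (0 = synthetic, 1 = real).
    Training the extractor against it aligns the two feature distributions."""
    p = 1.0 / (1.0 + math.exp(-feature_score))
    return -(preset_class * math.log(p) + (1 - preset_class) * math.log(1 - p))

def second_loss_term(detection, label):
    """Second loss: squared error between the article detection result and
    the corresponding sample article labeling information."""
    return (detection - label) ** 2

# one synthetic image (class 0) and one real image (class 1)
first = first_loss_term(0.3, 0) + first_loss_term(-0.2, 1)
second = second_loss_term(0.7, 1.0)
total = first + second   # drives the parameter updates
```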
8. The method of claim 6, wherein the training cutoff condition comprises a first cutoff condition and a second cutoff condition, and wherein adjusting the network parameters of the neural network based on the first loss value and the second loss value until the neural network satisfies the training cutoff condition comprises:
adjusting network parameters of a feature extraction layer of the neural network based on the first loss value until the neural network meets the first cutoff condition;
and adjusting the network parameters of the article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer based on the second loss value until the neural network meets the second cutoff condition.
9. The method according to claim 6, wherein adjusting the network parameters of the neural network based on the first loss value and the second loss value until the neural network satisfies the training cutoff condition comprises:
adjusting, based on the first loss value and the second loss value, the network parameters of the feature extraction layer and the article detection layer of the neural network and the network parameters between the article detection layer and the feature extraction layer, until the neural network satisfies the training cutoff condition.
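The joint adjustment of claim 9 can be sketched generically as gradient descent that stops when a training cutoff condition is met (here, an assumed condition: the loss change falls below a tolerance; the claim itself does not fix the condition, and all names are hypothetical):

```python
def train_until_cutoff(params, grad_fn, lr=0.1, tol=1e-3, max_steps=1000):
    """Jointly adjust all parameters with gradient steps until the loss
    stops improving by more than `tol` (the training cutoff condition)."""
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_steps):
        loss, grads = grad_fn(params)
        if abs(prev_loss - loss) < tol:   # training cutoff condition
            break
        params = [p - lr * g for p, g in zip(params, grads)]
        prev_loss = loss
    return params, loss

# toy combined objective: quadratic distance of the parameters to a target
targets = [1.0, -2.0]
def combined_loss(params):
    loss = sum((p - t) ** 2 for p, t in zip(params, targets))
    grads = [2 * (p - t) for p, t in zip(params, targets)]
    return loss, grads

trained, final_loss = train_until_cutoff([0.0, 0.0], combined_loss)
```

In practice the combined objective would be a weighted sum of the first and second loss values, and the gradients would flow through the feature extraction and article detection layers together.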
10. An article detection method, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into an article detection model obtained by training according to the neural network training method of any one of claims 1 to 9, and obtaining a detection result for a target article in the image to be detected.
11. A neural network training apparatus, the apparatus comprising:
an article image acquisition module, configured to acquire a sample image set and a collected article image set, wherein the sample image set comprises at least one sample article image acquired for a same sample article, and the collected article image set comprises an acquired real article placement image;
a sample image generation module, configured to generate, based on the sample image set, a sample article placement image simulating article placement and sample article labeling information of the sample article placement image;
a sample image processing module, configured to input the sample article placement image and the real article placement image into a neural network, and to obtain first feature data and second feature data obtained by the neural network performing feature extraction on the sample article placement image and the real article placement image, as well as an article detection result of the sample article placement image output by the neural network;
and a neural network training module, configured to adjust network parameters of the neural network, based on the article detection result and corresponding sample article labeling information as well as the first feature data, the second feature data and a preset classification result for the sample article placement image and the real article placement image, until the neural network satisfies a training cutoff condition, and to take the trained neural network as an article detection model for detecting articles in images.
12. An article detection device, the device comprising:
the acquisition module is used for acquiring an image to be detected;
a detection module, configured to input the image to be detected into an article detection model obtained through training according to the neural network training method of any one of claims 1 to 9, to obtain a detection result for a target article in the image to be detected.
13. A computer device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate over the bus when the computer device runs, and the machine-readable instructions, when executed by the processor, perform the steps of the neural network training method of any one of claims 1 to 9 or the article detection method of claim 10.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps of the neural network training method of any one of claims 1 to 9 or the article detection method of claim 10.
CN202210692178.1A 2022-06-17 2022-06-17 Neural network training method, article detection method, apparatus, device and medium Withdrawn CN115035343A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210692178.1A CN115035343A (en) 2022-06-17 2022-06-17 Neural network training method, article detection method, apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210692178.1A CN115035343A (en) 2022-06-17 2022-06-17 Neural network training method, article detection method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
CN115035343A true CN115035343A (en) 2022-09-09

Family

ID=83125861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210692178.1A Withdrawn CN115035343A (en) 2022-06-17 2022-06-17 Neural network training method, article detection method, apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN115035343A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935179A (en) * 2023-09-14 2023-10-24 海信集团控股股份有限公司 Target detection method and device, electronic equipment and storage medium
CN116935179B (en) * 2023-09-14 2023-12-08 海信集团控股股份有限公司 Target detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111626218B (en) Image generation method, device, equipment and storage medium based on artificial intelligence
Shen et al. Exemplar-based human action pose correction and tagging
Yang et al. Robust face alignment under occlusion via regional predictive power estimation
CN103295016B (en) Behavior recognition method based on depth and RGB information and multi-scale and multidirectional rank and level characteristics
Dibra et al. Shape from selfies: Human body shape estimation using cca regression forests
Albarelli et al. Fast and accurate surface alignment through an isometry-enforcing game
CN107808358A (en) Image watermark automatic testing method
CN114758362B (en) Clothing changing pedestrian re-identification method based on semantic perception attention and visual shielding
CN111445426B (en) Target clothing image processing method based on generation of countermeasure network model
CN111259814B (en) Living body detection method and system
Buoncompagni et al. Saliency-based keypoint selection for fast object detection and matching
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN108960412A (en) Image-recognizing method, device and computer readable storage medium
CN111783593A (en) Human face recognition method and device based on artificial intelligence, electronic equipment and medium
Sagues-Tanco et al. Fast synthetic dataset for kitchen object segmentation in deep learning
Yu et al. Traffic sign detection based on visual co-saliency in complex scenes
Xu et al. Generative image completion with image-to-image translation
Cao et al. Accurate 3-D reconstruction under IoT environments and its applications to augmented reality
Wang et al. Dynamic human body reconstruction and motion tracking with low-cost depth cameras
CN115035343A (en) Neural network training method, article detection method, apparatus, device and medium
Shrestha et al. A real world dataset for multi-view 3d reconstruction
CN109934129A (en) A kind of man face characteristic point positioning method, device, computer equipment and storage medium
Hassner et al. SIFTing through scales
CN114004772A (en) Image processing method, image synthesis model determining method, system and equipment
Gass et al. Warp that smile on your face: Optimal and smooth deformations for face recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220909