CN111046974A - Article classification method and device, storage medium and electronic equipment


Info

Publication number
CN111046974A
CN111046974A
Authority
CN
China
Prior art keywords
frame
training sample
article
classification
sample image
Prior art date
Legal status
Granted
Application number
CN201911364231.XA
Other languages
Chinese (zh)
Other versions
CN111046974B (en)
Inventor
宋德超
陈翀
陈勇
郑威
李斌山
李雨铭
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai and Zhuhai Lianyun Technology Co Ltd
Priority to CN201911364231.XA
Publication of CN111046974A
Application granted
Publication of CN111046974B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements


Abstract

The application relates to the technical field of electronic information, and in particular to an article classification method and device, a storage medium, and electronic equipment, solving the inaccurate and inefficient classification caused by manual garbage-sorting identification. The method comprises the following steps: obtaining a video to be detected; using an article classification model to obtain a classification result for the articles in each frame of image included in the video to be detected; and identifying the articles in the corresponding images according to the classification result for each frame of image, to obtain an identified article classification video. Because the video to be detected is analysed in real time and the position and category of the articles in each frame of image are identified according to the per-frame classification results, the user can classify the articles according to the identified article classification video, which improves article classification efficiency and reduces the user's trouble in classifying articles.

Description

Article classification method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of electronic information technologies, and in particular, to a method and an apparatus for classifying an article, a storage medium, and an electronic device.
Background
In real life, people need to classify articles of different kinds, for example: garbage, food, materials, and the like. With the continuous improvement of people's living standards, these problems, and the garbage problem in particular, have become increasingly serious. Therefore, in order to alleviate the problems caused by garbage, more and more cities have begun to introduce garbage classification policies to urge people to sort their garbage.
Most people have only a vague notion of article classification and weak awareness of it, and so cannot classify articles correctly; moreover, classifying articles manually results in low classification efficiency.
Therefore, how to improve the efficiency and accuracy of article classification is a problem that urgently needs to be solved.
Disclosure of Invention
In view of the above problems, the present application provides an article classification method, apparatus, storage medium, and electronic device, which solve the problems of inaccurate classification and low efficiency caused by manually identifying article types in the prior art.
In a first aspect, the present application provides a method of sorting an article, the method comprising:
obtaining a video to be detected;
according to the video to be detected, an article classification model is utilized to obtain a classification result of articles in each frame of image included in the video to be detected;
and identifying the articles in the corresponding images according to the classification result of the articles in each frame of image to obtain an identified article classification video.
According to an embodiment of the application, optionally, in the above method, the article classification model is constructed by:
obtaining a plurality of frames of training sample images, wherein each frame of training sample image carries sample boundary box information of an article and sample category information of the article respectively;
inputting each frame of training sample image into a feature extraction network of a preset model to obtain a target feature vector of an article in each frame of training sample image;
inputting the target characteristic vector of the article in each frame of training sample image into a classification network of a preset model to obtain an article classification prediction result corresponding to each frame of training sample image, wherein the article classification prediction result comprises prediction boundary box information of the article, prediction category information of the article and a confidence coefficient;
calculating a loss function corresponding to each frame of training sample images according to the difference between the predicted bounding box information of the article corresponding to each frame of training sample images and the sample bounding box information of the article, the difference between the predicted category information of the article and the sample category information of the article, and the confidence;
and processing the classification network according to each loss function to obtain the article classification model.
According to an embodiment of the present application, optionally, in the above method, the step of obtaining multiple frames of training sample images includes:
and acquiring a training sample video, and acquiring the multi-frame training sample image by adopting the preset model.
According to an embodiment of the present application, in the method, optionally, the feature extraction network includes an LSTM network or a Bottleneck network.
According to an embodiment of the present application, optionally, in the method, the step of inputting each frame of the training sample image into the feature extraction network of the preset model to obtain the target feature vector of the article in each frame of the training sample image includes:
step a, obtaining a feature vector of an article in each frame of training sample image in the multiple frames of training sample images, taking the feature vector of the article in a first frame of training sample image in the multiple frames of training sample images as a target feature vector of the article in the frame of training sample image, taking the feature vector of the article in each frame of training sample image in other frames of training sample images except the first frame of training sample image as an initial feature vector of the article in the frame of training sample image, and taking a next frame of training sample image adjacent to the first frame of training sample image as a current frame of training sample image;
b, calculating the initial characteristic vector of the object in the current frame training sample image and the target characteristic vector of the object in the previous frame training sample image according to a preset function to obtain the target characteristic vector of the object in the current frame training sample image;
and c, taking the next frame of training sample image adjacent to the current frame of training sample image as a new current frame of training sample image, and returning to execute the step b until target feature vectors respectively corresponding to the articles in each frame of training sample image in the plurality of frames of training sample images are obtained.
According to an embodiment of the present application, optionally, in the above method, the loss function includes a localization loss function and a category confidence error.
According to an embodiment of the present application, optionally, in the method, the step of processing the classification network according to each loss function to obtain the article classification model includes:
performing iterative processing on the classification network according to each loss function;
and when the number of times of the iterative processing reaches a preset number threshold, stopping the iterative processing of the classification network and outputting an article classification model.
In a second aspect, the present application provides an article sorting apparatus, the apparatus comprising:
the first obtaining module is used for obtaining a video to be detected;
the prediction module is used for obtaining a classification result of the articles in each frame of image included in the video to be detected by using an article classification model according to the video to be detected;
and the second obtaining module is used for identifying the articles in the corresponding images according to the classification result of the articles in each frame of image so as to obtain the identified article classification video.
In a third aspect, the present application provides a storage medium storing a computer program operable, when executed by one or more processors, to perform a method as described above.
In a fourth aspect, the present application provides an electronic device comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, performs the method described above.
Compared with the prior art, one or more embodiments in the above scheme can have the following advantages or beneficial effects:
the application provides an article classification method, an article classification device, a storage medium and electronic equipment, wherein a video to be detected is obtained; according to the video to be detected, an article classification model is utilized to obtain a classification result of articles in each frame of image included in the video to be detected; and identifying the articles in the corresponding images according to the classification result of the articles in each frame of image to obtain an identified article classification video. The video to be detected is obtained through real-time analysis, the positions and the categories of the articles in each frame of image are identified according to the classification result of the articles in each frame of image, and then the identified article classification video is obtained, so that a user can classify the articles in the article classification video according to the identified article classification video, the article classification efficiency is improved, and the trouble of the user in article classification is reduced.
Drawings
The present application will be described in more detail below on the basis of embodiments and with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an article classification method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a training process of an article classification model according to an embodiment of the present application.
Fig. 3 is another schematic flow chart of an article classification method according to an embodiment of the present application.
Fig. 4 is a schematic diagram of extracting a target feature vector of an article in a training sample image according to an embodiment of the present application.
Fig. 5 is another schematic flow chart of an article classification method according to an embodiment of the present application.
Fig. 6 is a connection block diagram of an article sorting apparatus according to a second embodiment of the present application.
In the drawings, like parts are designated with like reference numerals, and the drawings are not drawn to scale.
Detailed Description
The following detailed description will be provided with reference to the accompanying drawings and embodiments, so that how to apply the technical means to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. The embodiments and various features in the embodiments of the present application can be combined with each other without conflict, and the formed technical solutions are all within the scope of protection of the present application.
Example one
Referring to fig. 1, the present application provides an article classification method applicable to an electronic device such as a mobile phone, a computer, or a tablet computer; when the article classification method is applied to the electronic device, steps S110 to S130 are executed.
Step S110: and obtaining the video to be detected.
In this embodiment, the video to be detected includes multiple frames of images, and at least one frame of image in the video to be detected includes an article. The articles can comprise garbage, and further the garbage can be food garbage or plastic garbage. In this embodiment, the scheme is explained with garbage as an article.
The video to be detected can be recorded by a device with a camera component, including but not limited to: cell phones, tablets, cameras, etc. Exemplarily, a large amount of garbage can be generated in the cooking process, in order to facilitate accurate classification of the garbage, the cooking process in a kitchen is recorded by using a mobile phone to obtain a video to be detected, and then the video to be detected is processed by using an article classification model.
Step S120: and obtaining a classification result of the articles in each frame of image included in the video to be detected by using an article classification model according to the video to be detected.
In this embodiment, the article classification model is a pre-trained neural network model that can classify the garbage appearing in each frame of image in the video to be detected, where the garbage may be an apple core, a plastic bag, or the like.
The classification result comprises category information and position information, wherein the category information comprises food waste, plastic waste and the like; the position information indicates a specific position of the garbage belonging to a certain class of information in the frame image.
For example, assuming that the garbage in the tenth frame of image is an apple core located in the upper-left area of the frame, the classification result is food waste and the upper-left area, which serve respectively as the category information and the location information.
It is understood that the category information may also include vegetables, fruits, fruit trash, recyclable trash, non-recyclable trash, etc. in order to obtain more accurate category information.
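For illustration only (the patent does not prescribe a data layout), the per-frame classification result described above can be represented as a simple structure; the field names below are assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    """Classification result for one article found in a frame."""
    category: str                    # category information, e.g. "food waste" or "plastic waste"
    bbox: Tuple[int, int, int, int]  # location information as (x, y, width, height) in the frame
    confidence: float                # confidence output by the classification network

FrameResult = List[Detection]        # one classification result per frame of the video
```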
Step S130: and identifying the articles in the corresponding images according to the classification result of the articles in each frame of image to obtain an identified article classification video.
In this embodiment, the garbage in each frame of image is identified, and the identification includes a location identification and a category identification, so as to visually display to the user the location and category of the garbage appearing in each frame of image.
The location identifier and the category identifier include, but are not limited to: numeric values, letters, characters, and geometric boxes. Generally, to facilitate distinguishing between the location and the category of the trash, the location identification and the category identification are not the same.
Continuing the example from step S120 above, the classification result is food waste in the upper-left region. To identify this result in the frame image, the upper-left region may be framed with a square box and the box labelled with the value 1 (indicating food waste); when the user sees the frame image, he can intuitively tell that the apple core in the upper-left area is food waste. Correspondingly, the value 2 can be used to indicate plastic waste.
It can be understood that, after the identified article classification video is obtained, it can be stored in the electronic device; by playing the stored video, a user can visually view the position identification and the category identification in each frame of image, and can then classify the garbage appearing in each frame according to these two identifications.
In this embodiment, the video to be detected is input into the article classification model to obtain a classification result for the garbage in each frame of image; the garbage in each frame is then identified according to that frame's classification result to obtain the identified article classification video, so that the user can classify the identified garbage according to the video, thereby reducing the trouble of garbage classification and improving its efficiency.
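The flow of steps S110-S130 can be summarized in the following sketch. It assumes a hypothetical `model` callable that returns, for each frame, a list of `Detection` results as in the structure shown earlier; OpenCV is used purely for illustration and is not mandated by the patent:

```python
import cv2

CLASS_IDS = {"food waste": 1, "plastic waste": 2}  # numeric category identifiers, as in the example above

def annotate_video(in_path: str, out_path: str, model) -> None:
    """Sketch of steps S110-S130: read the video, classify each frame, write the identified video."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for det in model(frame):  # hypothetical: per-frame classification results (step S120)
            x, y, w, h = det.bbox
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)        # position identification
            cv2.putText(frame, str(CLASS_IDS[det.category]), (x, max(y - 5, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)          # category identification
        writer.write(frame)                                                      # step S130 output
    cap.release()
    writer.release()
```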
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a training process of an article classification model according to the present application. As shown in FIG. 2, the training process includes steps S210-S250.
Step S210: obtaining a plurality of frames of training sample images, wherein each frame of training sample image carries sample boundary box information of an article and sample category information of the article respectively.
Step S220: and inputting each frame of training sample image into a feature extraction network of a preset model to obtain a target feature vector of an article in each frame of training sample image.
Step S230: and inputting the target characteristic vector of the article in each frame of training sample image into a classification network of a preset model to obtain an article classification prediction result corresponding to each frame of training sample image, wherein the article classification prediction result comprises the prediction boundary box information of the article, the prediction category information of the article and the confidence coefficient.
Step S240: and calculating a loss function corresponding to each frame of the training sample image according to the difference between the predicted boundary box information of the article corresponding to each frame of the training sample image and the sample boundary box information of the article, the difference between the predicted category information of the article and the sample category information of the article, and the confidence coefficient.
Step S250: and processing the classification network according to each loss function to obtain the article classification model.
In this embodiment, a preset model is trained through an acquired multi-frame training sample image to obtain an article classification model, and the article classification model is used for classifying garbage appearing in each frame of image in a video.
In step S210, the sample bounding box information and sample category information carried by each frame of training sample image can be produced manually: the category and position of the garbage in each frame of training sample image are determined by hand, and the garbage in that frame is labelled accordingly, so that each frame carries the corresponding sample bounding box information and sample category information of the garbage. Subsequent steps then train the preset model on these labelled images.
The sample bounding box information of the garbage represents the specific position of the garbage within each frame of training sample image; the sample category information of the garbage represents the category of the garbage in each frame of training sample image. Sample bounding box information of the garbage includes, but is not limited to: the upper-left area, upper-right area, lower-left area, lower-right area, center area, and the like. Sample category information of the garbage includes, but is not limited to: food waste, plastic waste, and the like.
Illustratively, take the tenth frame of training sample image: the garbage in this frame is food waste located in the upper-left area; the food waste is framed manually with a geometric box, and the box is given a category identification of food waste, thereby producing the labelled tenth frame of training sample image.
Step S210 includes: acquiring a training sample video, and obtaining the multiple frames of training sample images from it by means of the preset model. The training sample video may be collected manually; as in the example in step S110, a person may record the cooking process in a kitchen with a mobile phone to obtain a training sample video.
The category and position of the garbage in each frame of the training sample video can be marked manually, thereby producing the sample bounding box information and sample category information of the garbage for each frame of training sample image; the marking method may refer to that mentioned in step S210 and is not repeated here. The marked training sample video is then input into the preset model, which processes it to obtain the multiple frames of training sample images included in the video, as sketched below.
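This sketch assumes the manual annotations are provided as a mapping from frame index to labelled boxes; the patent does not fix a storage format:

```python
import cv2

def load_training_samples(video_path: str, annotations: dict):
    """Split the labelled training sample video into frames, each paired with its
    sample bounding box information and sample category information."""
    cap = cv2.VideoCapture(video_path)
    samples, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # annotations[idx] is assumed to be a list like [((120, 40, 80, 60), "food waste")]
        samples.append((frame, annotations.get(idx, [])))
        idx += 1
    cap.release()
    return samples
```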
In step S220, the target feature vector represents the feature information of the garbage in each frame of training sample image; this feature information is used to identify the position and category of the garbage in that frame. The feature extraction network includes an LSTM network or a Bottleneck network, the Bottleneck network being an existing neural network. Each frame of training sample image is first compressed by the Bottleneck layer, and the compressed result is then used as the model input from which the target feature vector of that frame is extracted. This reduces the amount of computation; at the same time, the Bottleneck-LSTM model is deeper than a standard LSTM network, and its effect is better than that of other, shallower LSTM models.
Referring to fig. 3, it can be understood that, to improve the accuracy of the trained article classification model, the feature information of the garbage in preceding and succeeding frames of training sample images may be correlated. As shown in fig. 3, step S220 includes steps S2201, S2202, and S2203.
Step S2201, obtaining a feature vector of garbage in each frame of training sample image in the multiple frames of training sample images, taking the feature vector of garbage in a first frame of training sample image in the multiple frames of training sample images as a target feature vector of garbage in the frame of training sample image, taking the feature vector of garbage in each frame of training sample image in other frames of training sample images except the first frame of training sample image as an initial feature vector of garbage in the frame of training sample image, and taking a next frame of training sample image adjacent to the first frame of training sample image as a current frame of training sample image.
Step S2202, calculates an initial feature vector of garbage in the current frame training sample image and a target feature vector of garbage in the previous frame training sample image according to a preset function, to obtain a target feature vector of garbage in the current frame training sample image.
And step S2203, taking the next frame of training sample image adjacent to the current frame of training sample image as a new current frame of training sample image, and returning to execute the step S2202 until target feature vectors respectively corresponding to garbage in each frame of training sample image in the multiple frames of training sample images are obtained.
Steps S2201-S2203 form a loop whose aim is to obtain the target feature vector corresponding to the garbage in each frame of training sample image. Specifically, the feature vectors of preceding and succeeding frames are correlated: the target feature vector of the garbage in the previous frame of training sample image and the initial feature vector of the garbage in the current frame are combined to obtain the target feature vector of the garbage in the current frame, which improves classification accuracy. The multiple frames of training sample images are input into the feature extraction layers of the LSTM network to obtain the target feature vectors corresponding to the garbage in each frame; there are several feature extraction layers, and the target feature vectors extracted by different layers may differ, which improves detection precision. A sketch of this recurrence follows.
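In the following sketch, `extract_features` and the preset function `fuse` are hypothetical stand-ins for the network's feature extraction layers:

```python
def propagate_features(frames, extract_features, fuse):
    """Steps S2201-S2203: carry the target feature vector forward frame by frame."""
    initial = [extract_features(f) for f in frames]   # per-frame feature vectors (step S2201)
    targets = [initial[0]]                            # first frame: its own vector is the target
    for t in range(1, len(frames)):                   # steps S2202-S2203: recurrence
        # preset function of the previous target vector and the current initial vector
        targets.append(fuse(targets[t - 1], initial[t]))
    return targets
```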
Illustratively, as shown in fig. 4, Frame t-2, Frame t-1, and Frame t are training sample images; ConvLSTM and Conv are feature extraction layers of the feature extraction network; Detection t-2, Detection t-1, and Detection t are the identification results produced from the article classification model's predictions for Frame t-2, Frame t-1, and Frame t; and LSTM State represents the target feature vector.
Suppose Frame t-2 is the first frame of training sample image and is input to each feature extraction layer (ConvLSTM and Conv) of the LSTM network to extract the target feature vector of the garbage; as the first frame, its target feature vector is computed from the feature vector of the garbage in that frame alone. When the target feature vector of the garbage in Frame t-1 (as the current frame of training sample image) is needed, the target feature vector of the garbage in Frame t-2 and the initial feature vector of the garbage in Frame t-1 are combined according to the preset function to obtain the target feature vector of the garbage in Frame t-1.
Similarly, to obtain the target feature vector of the garbage in Frame t, Frame t is taken as the current frame of training sample image, and the target feature vector of the garbage in Frame t-1 and the initial feature vector of the garbage in Frame t are combined according to the preset function to obtain the target feature vector of the garbage in Frame t. The preset function may include the following calculation formulas:
f_t = σ(W_f * [x_t, h_{t-1}])
i_t = σ(W_i * [x_t, h_{t-1}])
o_t = σ(W_o * [x_t, h_{t-1}])
c_t = f_t ∘ c_{t-1} + i_t ∘ φ(W_c * [x_t, h_{t-1}])
h_t = o_t ∘ φ(c_t)
When the feature extraction network is a Bottleneck network, a bottleneck feature map b_t = φ(W_b * [x_t, h_{t-1}]) is computed first, and b_t replaces the original input [x_t, h_{t-1}] in the gate equations above.
Where φ(x) is ReLU(x).
Wherein M is the number of input channels, N is the number of output channels, x_t is the current frame of training sample image (its feature map), c_t is the LSTM state output, h_t is the model output (the target feature vector), W are the convolution weights, * denotes convolution, ∘ denotes element-wise multiplication, and σ is the sigmoid activation function, which is well known to those skilled in the art.
In this embodiment, the feature information of the garbage in preceding and succeeding frames is correlated: the initial feature vector of the garbage in the current frame of training sample image and the target feature vector of the garbage in the previous frame are combined according to the preset function, so that the target feature vector of the garbage in each frame of training sample image is obtained and the feature vectors of the garbage are passed between adjacent frames, improving classification accuracy.
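For concreteness, the following is a sketch of one convolutional Bottleneck-LSTM cell along the lines of the formulas above. The 3x3 kernels and the single fused gate convolution are implementation assumptions, and PyTorch is used only for illustration:

```python
import torch
import torch.nn as nn

class BottleneckLSTMCell(nn.Module):
    """ConvLSTM cell with a bottleneck: b_t replaces [x_t, h_{t-1}] in the gate equations."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # bottleneck: compress the concatenated input and previous state to N = out_channels
        self.w_b = nn.Conv2d(in_channels + out_channels, out_channels, 3, padding=1)
        # one fused 3x3 convolution producing the i, f, o gates and the candidate state
        self.w_g = nn.Conv2d(out_channels, 4 * out_channels, 3, padding=1)
        self.phi = nn.ReLU()  # phi(x) = ReLU(x), as in the description

    def forward(self, x, h_prev, c_prev):
        b = self.phi(self.w_b(torch.cat([x, h_prev], dim=1)))  # b_t = phi(W_b * [x_t, h_{t-1}])
        i, f, o, g = torch.chunk(self.w_g(b), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c_prev + i * self.phi(g)                        # c_t = f_t ∘ c_{t-1} + i_t ∘ phi(W_c * b_t)
        h = o * self.phi(c)                                     # h_t = o_t ∘ phi(c_t)
        return h, c
```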
In step S230, the target feature vector of the garbage in each frame of the training sample image is input into a classification network of a preset model, so as to obtain a prediction result corresponding to the garbage in each frame of the training sample image. Still taking the above assumption that the garbage in the tenth frame image is the apple core, and the apple core is located in the upper left area of the frame image, the prediction result is the food garbage and the upper left area, i.e. the food garbage is the prediction category information of the corresponding garbage, and the upper left area is the prediction bounding box information of the corresponding garbage.
In step S240, a loss function corresponding to each frame of the training sample image is calculated according to a difference between the garbage prediction bounding box information and the garbage sample bounding box information in each frame of the training sample image, a difference between the garbage prediction category information and the garbage sample category information, and the confidence.
The loss function comprises a localization loss function and a category confidence error; the total loss function is the weighted sum of the two.
The loss function is defined as follows:
L(x, c, l, g) = (1/N) (L_conf(x, c) + α L_loc(x, l, g))
The localization loss function L_loc is the Smooth L1 loss between the predicted box l and the sample box g, wherein Smooth L1 loss is defined as follows:
smooth_L1(x) = 0.5 x^2, if |x| < 1
smooth_L1(x) = |x| - 0.5, otherwise
where "otherwise" denotes the case when the absolute value of x is 1 or more.
L_loc is defined as follows:
L_loc(x, l, g) = Σ_{i∈Pos}^{N} Σ_{m∈{cx,cy,w,h}} x_{ij}^{k} smooth_L1(l_i^{m} - ĝ_j^{m})
ĝ_j^{cx} = (g_j^{cx} - d_i^{cx}) / d_i^{w},  ĝ_j^{cy} = (g_j^{cy} - d_i^{cy}) / d_i^{h},  ĝ_j^{w} = log(g_j^{w} / d_i^{w}),  ĝ_j^{h} = log(g_j^{h} / d_i^{h})
wherein (cx, cy) is the center of the default box d after compensation, and (w, h) are the width and height of the default box.
L_conf is defined as follows:
L_conf(x, c) = -Σ_{i∈Pos}^{N} x_{ij}^{p} log(ĉ_i^{p}) - Σ_{i∈Neg} log(ĉ_i^{0}), where ĉ_i^{p} = exp(c_i^{p}) / Σ_p exp(c_i^{p})
wherein x is the target feature vector of the article in each frame of training sample image, c is the confidence, l is the predicted bounding box information of the garbage, g is the sample bounding box information of the garbage, and N is the number of matched sample bounding boxes of the garbage; L_conf is the category confidence error, L_loc is the localization loss function, and the value of α may be set to 1.
The localization loss function describes the functional relationship between the predicted bounding box information of the garbage and the sample bounding box information of the garbage; the category confidence error describes the functional relationship between the predicted category information of the garbage and the sample category information of the garbage.
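A sketch of the combined loss for one frame, assuming the predicted boxes have already been matched and encoded against the sample boxes, with α = 1 as stated above:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_loc, target_loc, pred_conf, target_cls, alpha: float = 1.0):
    """(1/N) * (L_conf + alpha * L_loc): category confidence error plus localization loss."""
    n = max(target_cls.numel(), 1)                                        # N matched sample boxes
    loc_loss = F.smooth_l1_loss(pred_loc, target_loc, reduction="sum")    # L_loc (Smooth L1)
    conf_loss = F.cross_entropy(pred_conf, target_cls, reduction="sum")   # L_conf (softmax loss)
    return (conf_loss + alpha * loc_loss) / n
```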
In step S250, the classification network is processed according to each loss function to obtain the article classification model. Referring to fig. 5, the process may be an iterative process, and as shown in fig. 5, the step S250 includes a step S2501 and a step S2502.
Step S2501: and carrying out iterative processing on the classification network according to each loss function.
Step S2502: and when the number of times of the iterative processing reaches a preset number threshold, stopping the iterative processing of the classification network and outputting an article classification model.
In this embodiment, when the number of iterations reaches the preset threshold, the iterative processing of the classification network is stopped and the article classification model is output. Here the classification network is an SSD. For each Frame t, the outputs of the three ConvLSTM layers are extracted and fed into the SSD. For each layer's output, the SSD computes the corresponding predicted bounding box information, confidence, and predicted category information; the results of the three layers are then sorted and merged according to these three quantities to obtain a combined result (i.e., predicted bounding box information, predicted category information, and confidence). Together with the sample bounding box information and sample category information of the original data, the loss function is constructed, its gradient is calculated, and the SSD parameters are updated by back propagation.
Alternatively, the difference between the loss function constructed in the current iteration and that constructed in the previous iteration may be compared with a preset difference value; when the difference falls below the preset value, the iterative processing of the classification network is stopped and the article classification model is output.
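Both stopping rules described above can be sketched in one training loop; the optimizer, thresholds, and batch layout below are assumptions, not part of the patent:

```python
def train(classification_network, batches, loss_fn, optimizer,
          max_iters: int = 10000, min_delta: float = 1e-4):
    """Iterative processing with two stopping rules: a preset iteration threshold,
    or the loss changing by less than a preset difference between iterations."""
    prev_loss = float("inf")
    for step, (inputs, targets) in enumerate(batches):
        if step >= max_iters:                         # preset number threshold reached
            break
        loss = loss_fn(classification_network(inputs), targets)
        optimizer.zero_grad()
        loss.backward()                               # gradient of the loss function
        optimizer.step()                              # update parameters by back propagation
        if abs(prev_loss - loss.item()) < min_delta:  # loss difference below the preset value
            break
        prev_loss = loss.item()
    return classification_network
```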
Example two
Referring to fig. 6, the present embodiment further provides an article classification apparatus, which includes a processor configured to execute the following program modules stored in a memory: the first obtaining module, used for obtaining a video to be detected; the prediction module, used for obtaining, with an article classification model, a classification result for the articles in each frame of image in the video to be detected; and the second obtaining module, used for identifying the articles in the corresponding images according to the per-frame classification results, so as to obtain the identified article classification video.
The implementation principle of the first obtaining module is similar to that of step S110 in the first embodiment, and for the implementation principle of the first obtaining module, reference may be made to the first embodiment, which is not described herein again. The implementation principle of the prediction module is similar to that of step S120 in the first embodiment, and for the implementation principle of the prediction module, reference may be made to the first embodiment, which is not described herein again. The implementation principle of the second obtaining module is similar to that of step S130 in the first embodiment, and as to the implementation principle of the second obtaining module, reference may be made to the first embodiment, which is not described herein again.
EXAMPLE III
The present embodiment further provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application mall, etc., where a computer program is stored, and the computer program may implement the method steps in the first embodiment when executed by a processor, and the specific embodiment process of the method steps may refer to the first embodiment, and the detailed description of the method is not repeated here.
Example four
The embodiment of the present application provides an electronic device, which may be a mobile phone, a computer, a tablet computer, or the like, and includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, implements the method as described in the first embodiment. It is understood that the electronic device may also include multimedia components, input/output (I/O) interfaces, and communication components.
Wherein the processor is configured to perform all or part of the steps of the method according to the first embodiment. The memory is used to store various types of data, which may include, for example, instructions for any application or method in the electronic device, as well as application-related data.
The Processor may be an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to perform the method of the first embodiment.
The Memory may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk.
The multimedia component may comprise a screen, which may be a touch screen.
The I/O interface provides an interface between the processor and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons.
The communication component is used for wired or wireless communication between the electronic device and other devices. The wireless communication may be Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component may include a Wi-Fi module, a Bluetooth module, or an NFC module.
In summary, the present application provides an article classification method and device, a storage medium, and electronic equipment, the method comprising: obtaining a video to be detected; using an article classification model to obtain a classification result for the articles in each frame of image included in the video to be detected; and identifying the articles in the corresponding images according to the per-frame classification results to obtain an identified article classification video. Because the video to be detected is analysed in real time and the position and category of the articles in each frame of image are identified from the per-frame classification results, a user can classify the articles according to the identified article classification video, which improves article classification efficiency and reduces the user's trouble in classifying articles. In addition, the target feature vector of the article in the previous frame of training sample image and the initial feature vector of the article in the current frame are combined to obtain the target feature vector of the article in the current frame, which improves the accuracy of the article classification model.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The above-described apparatus and method embodiments are merely illustrative.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims (10)

1. A method of sorting an item, the method comprising:
obtaining a video to be detected;
according to the video to be detected, an article classification model is utilized to obtain a classification result of articles in each frame of image included in the video to be detected;
and identifying the articles in the corresponding images according to the classification result of the articles in each frame of image to obtain an identified article classification video.
2. The method of claim 1, wherein the item classification model is constructed by:
obtaining a plurality of frames of training sample images, wherein each frame of training sample image carries sample boundary box information of an article and sample category information of the article respectively;
inputting each frame of training sample image into a feature extraction network of a preset model to obtain a target feature vector of an article in each frame of training sample image;
inputting the target characteristic vector of the article in each frame of training sample image into a classification network of a preset model to obtain an article classification prediction result corresponding to each frame of training sample image, wherein the article classification prediction result comprises prediction boundary box information of the article, prediction category information of the article and a confidence coefficient;
calculating a loss function corresponding to each frame of training sample images according to the difference between the predicted bounding box information of the article corresponding to each frame of training sample images and the sample bounding box information of the article, the difference between the predicted category information of the article and the sample category information of the article, and the confidence;
and processing the classification network according to each loss function to obtain the article classification model.
3. The method of claim 2, wherein the step of obtaining a plurality of frames of training sample images comprises:
and acquiring a training sample video, and acquiring the multi-frame training sample image by adopting the preset model.
4. The method of claim 2, wherein the feature extraction network comprises an LSTM network or a Bottleneck network.
5. The method according to claim 2, wherein the step of inputting each frame of the training sample image into the feature extraction network of the preset model to obtain the target feature vector of the object in each frame of the training sample image comprises:
step a, obtaining a feature vector of an article in each frame of training sample image in the multiple frames of training sample images, taking the feature vector of the article in a first frame of training sample image in the multiple frames of training sample images as a target feature vector of the article in the frame of training sample image, taking the feature vector of the article in each frame of training sample image in other frames of training sample images except the first frame of training sample image as an initial feature vector of the article in the frame of training sample image, and taking a next frame of training sample image adjacent to the first frame of training sample image as a current frame of training sample image;
b, calculating the initial characteristic vector of the object in the current frame training sample image and the target characteristic vector of the object in the previous frame training sample image according to a preset function to obtain the target characteristic vector of the object in the current frame training sample image;
and c, taking the next frame of training sample image adjacent to the current frame of training sample image as a new current frame of training sample image, and returning to execute the step b until target feature vectors respectively corresponding to the articles in each frame of training sample image in the plurality of frames of training sample images are obtained.
6. The method of claim 2, wherein the loss function comprises a localization loss function and a category confidence error.
7. The method according to any of claims 2-6, wherein said step of processing a classification network according to each of said loss functions to obtain said item classification model comprises:
performing iterative processing on the classification network according to each loss function;
and when the number of times of the iterative processing reaches a preset number threshold, stopping the iterative processing of the classification network and outputting an article classification model.
8. An article sorting apparatus, characterized in that the apparatus comprises:
the first obtaining module is used for obtaining a video to be detected;
the prediction module is used for obtaining a classification result of the articles in each frame of image included in the video to be detected by using an article classification model according to the video to be detected;
and the second obtaining module is used for identifying the articles in the corresponding images according to the classification result of the articles in each frame of image so as to obtain the identified article classification video.
9. A storage medium, characterized in that the storage medium stores a computer program which, when executed by one or more processors, implements the method according to any one of claims 1-7.
10. An electronic device, comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any one of claims 1-7.
CN201911364231.XA 2019-12-25 2019-12-25 Article classification method and device, storage medium and electronic equipment Active CN111046974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911364231.XA CN111046974B (en) 2019-12-25 2019-12-25 Article classification method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911364231.XA CN111046974B (en) 2019-12-25 2019-12-25 Article classification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111046974A (en) 2020-04-21
CN111046974B CN111046974B (en) 2022-04-08

Family

ID=70240089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911364231.XA Active CN111046974B (en) 2019-12-25 2019-12-25 Article classification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111046974B (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573706A (en) * 2013-10-25 2015-04-29 Tcl集团股份有限公司 Object identification method and system thereof
US10289910B1 (en) * 2014-07-10 2019-05-14 Hrl Laboratories, Llc System and method for performing real-time video object recognition utilizing convolutional neural networks
US20190058887A1 (en) * 2017-08-21 2019-02-21 Nokia Technologies Oy Method, an apparatus and a computer program product for object detection
CN108647652A (en) * 2018-05-14 2018-10-12 北京工业大学 A kind of cotton development stage automatic identifying method based on image classification and target detection
CN109740517A (en) * 2018-12-29 2019-05-10 上海依图网络科技有限公司 A kind of method and device of determining object to be identified
CN110084313A (en) * 2019-05-05 2019-08-02 厦门美图之家科技有限公司 A method of generating object detection model
CN110414344A (en) * 2019-06-25 2019-11-05 深圳大学 A kind of human classification method, intelligent terminal and storage medium based on video
CN110436082A (en) * 2019-08-08 2019-11-12 上海萃钛智能科技有限公司 A kind of Intelligent refuse classification identification suggestion device, system and method
CN110428019A (en) * 2019-08-09 2019-11-08 绵阳德川鸿丰环保科技有限公司 Intelligent garbage classification method and modularization intelligent garbage classification processing system
CN110598784A (en) * 2019-09-11 2019-12-20 北京建筑大学 Machine learning-based construction waste classification method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MASON LIU ET AL.: "Mobile Video Object Detection with Temporally-Aware Feature Maps", arXiv:1711.06368v2 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680645A (en) * 2020-06-11 2020-09-18 王艳琼 Garbage classification processing method and device
CN111680645B (en) * 2020-06-11 2024-02-09 王艳琼 Garbage classification treatment method and device
CN111723772A (en) * 2020-06-30 2020-09-29 平安国际智慧城市科技股份有限公司 Perishable garbage identification method and device based on image identification and computer equipment
CN111723772B (en) * 2020-06-30 2024-03-19 平安国际智慧城市科技股份有限公司 Perishable garbage identification method and device based on image identification and computer equipment

Also Published As

Publication number Publication date
CN111046974B (en) 2022-04-08


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant