CN117541782A - Object identification method and device, storage medium and electronic device - Google Patents

Object identification method and device, storage medium and electronic device

Info

Publication number
CN117541782A
Authority
CN
China
Prior art keywords
target
model
training
loss value
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410029989.2A
Other languages
Chinese (zh)
Inventor
林亦宁
杨德城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shanma Zhijian Technology Co ltd
Shanghai Supremind Intelligent Technology Co Ltd
Original Assignee
Beijing Shanma Zhijian Technology Co ltd
Shanghai Supremind Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shanma Zhijian Technology Co ltd and Shanghai Supremind Intelligent Technology Co Ltd
Priority to CN202410029989.2A
Publication of CN117541782A
Legal status: Pending

Classifications

    • G06V 10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06N 3/09 — Supervised learning (neural-network learning methods)
    • G06V 10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06V 10/764 — Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide an object identification method and device, a storage medium and an electronic device. The method includes: acquiring a target image captured by an image pickup device arranged in a target area; and inputting the target image into a target model to determine the target type and target position of an object included in the target image. The target model is obtained by training an initial model with multiple sets of training data and a pre-trained training model, where training includes: inputting training data into the initial model to obtain a first prediction result; inputting training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating the model parameters of the initial model based on the target loss value to obtain the target model. The invention solves the problem of inaccurate object identification in the related art and improves object identification accuracy.

Description

Object identification method and device, storage medium and electronic device
Technical Field
The embodiments of the invention relate to the field of computers, and in particular to an object identification method and device, a storage medium and an electronic device.
Background
In the related art, a trained network model is generally used to identify the category of an object in an image and the object's position in the image. During training of the network model, the accuracy of label assignment and of classification affects how accurately the trained network model recognizes.
In the related art, because label assignment and classification over the training data are inaccurate, the trained network model identifies object categories and positions in images inaccurately.
In view of the above problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide an object identification method and device, a storage medium and an electronic device, to at least solve the problem of inaccurate object identification in the related art.
According to an embodiment of the present invention, there is provided an object recognition method, including: acquiring a target image captured by an image pickup device arranged in a target area; and inputting the target image into a target model to determine a target type and a target position of an object included in the target image. The target model is obtained by training an initial model with a plurality of sets of training data and a training model; each set of training data in the plurality of sets includes an object and a label of the object; the training model is a model trained in advance, and the complexity of its model structure is greater than that of the target model. Training the initial model with the plurality of sets of training data and the training model includes: inputting the training data into the initial model to obtain a first prediction result; inputting the training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating model parameters of the initial model based on the target loss value to obtain the target model.
According to another embodiment of the present invention, there is provided an object recognition apparatus, including: an acquisition module configured to acquire a target image captured by an image pickup device arranged in a target area; and an identification module configured to input the target image into a target model to determine the target type and target position of an object included in the target image. The target model is obtained by training an initial model with a plurality of sets of training data and a training model; each set of training data in the plurality of sets includes an object and a label of the object; the training model is a model trained in advance, and the complexity of its model structure is greater than that of the target model. The apparatus trains the initial model with the plurality of sets of training data and the training model as follows: inputting the training data into the initial model to obtain a first prediction result; inputting the training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating model parameters of the initial model based on the target loss value to obtain the target model.
According to a further embodiment of the invention, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, a target image captured by an image pickup device arranged in a target area is acquired, and the target image is input into a target model to determine the target type and target position of an object included in the target image. The target model is obtained by training an initial model with multiple sets of training data and a training model; each set of training data includes an object and a label of the object; the training model is trained in advance and its model structure is more complex than the target model's. Training the initial model includes: inputting training data into the initial model to obtain a first prediction result; inputting training data into the training model to obtain a second prediction result; determining a target loss value based on the two prediction results; and updating the initial model's parameters based on the target loss value to obtain the target model. In other words, the target model used to identify the target image is obtained by training the initial model with a training model whose structural complexity exceeds the target model's: the target loss value is determined from the training model's second prediction result and the initial model's first prediction result, and the initial model's parameters are updated accordingly. Because the second prediction result of a trained model of high structural complexity is incorporated when iteratively updating the initial model's parameters, the accuracy of training the initial model improves, and with it the accuracy with which the target model identifies the target image. This solves the problem of inaccurate object identification in the related art and achieves the effect of improving object identification accuracy.
Drawings
Fig. 1 is a block diagram of a hardware structure of a mobile terminal of an object recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of identifying an object according to an embodiment of the invention;
FIG. 3 is a training flow diagram of an initial model according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the union of a first prediction result and a second prediction result according to an embodiment of the invention;
FIG. 5 is a flowchart of determining the target type and target position of an object with the target model according to an embodiment of the invention;
fig. 6 is a block diagram of a structure of an object recognition apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be performed on a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a block diagram of the hardware structure of a mobile terminal for the object recognition method according to an embodiment of the present invention. As shown in fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a microprocessor (MCU), a programmable logic device (FPGA), or another processing device) and a memory 104 for storing data; the mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. Those skilled in the art will appreciate that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the mobile terminal described above; for example, the mobile terminal may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to a method for identifying an object in an embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In this embodiment, a method for identifying an object is provided. Fig. 2 is a flowchart of the object identification method according to an embodiment of the present invention; as shown in fig. 2, the flow includes the following steps:
step S202, acquiring a target image acquired by an image pickup device arranged in a target area;
step S204, inputting the target image into a target model to determine the target type and the target position of an object included in the target image;
the target model is obtained by training an initial model through a plurality of groups of training data and training models, each group of training data included in the training data comprises an object and a label of the object, the training model is a model which is trained in advance, the complexity of a model structure of the training model is greater than that of the target model, and training the initial model through the plurality of groups of training data and the training models comprises the following steps: inputting the training data into the initial model to obtain a first prediction result; inputting the training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating model parameters of the initial model based on the target loss value to obtain the target model.
In the above embodiment, the object identification method may be applied to the intelligent traffic field, the security field, and any other field in which an image capturing device collects images that need to be identified. The invention is described below taking the application of the object identification method in the intelligent traffic field as an example.
In the above embodiment, when the object recognition method is applied to the intelligent traffic field, the target area may be an urban traffic area, a highway area, a viaduct area, or the like, and the image capturing device may be a monitoring device installed at a traffic post. The image capturing device captures the target area to obtain a target image, which may include objects of various kinds, such as motor vehicles, electric vehicles, bicycles, and pedestrians. The target image is identified by the target model, which determines the target category and target position of each object included in the image. The target position may be indicated with a bounding box, such as a rectangle, drawn around the object.
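To make this inference step concrete, the following is a minimal Python sketch assuming a PyTorch-style detector. The function name identify_objects, the CLASS_NAMES list, the score threshold, and the model's (boxes, scores, labels) output convention are illustrative assumptions, not details taken from the patent.

```python
import torch

# Illustrative label map; the patent only names example categories.
CLASS_NAMES = ["motor_vehicle", "electric_vehicle", "bicycle", "pedestrian"]

def identify_objects(target_model, image_tensor, score_threshold=0.5):
    """Run the trained target model on one camera frame and return
    (category, bounding box) pairs for each confidently detected object."""
    target_model.eval()
    with torch.no_grad():
        # Assumed output convention: boxes [N, 4] in xyxy order,
        # scores [N], integer class labels [N].
        boxes, scores, labels = target_model(image_tensor.unsqueeze(0))
    results = []
    for box, score, label in zip(boxes, scores, labels):
        if score >= score_threshold:
            # The rectangular box marks the target position of the object.
            results.append((CLASS_NAMES[int(label)], box.tolist()))
    return results
```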
In the above embodiment, the target model may be obtained by training the initial model with multiple sets of training data, where each set includes an image together with the category and position of the objects in that image. The images in the training data may be collected by camera devices installed at traffic posts such as viaducts, highways, and urban roads. To construct the training data, images collected by these camera devices are first gathered, the collected images are screened to filter out invalid ones, and the remaining images are labeled.
In the above embodiment, a training model of higher structural complexity may be used to assist in training the initial model. During detection training, a trained large model (the training model) assists the initial model: the label assignment result of the training model (the second prediction result) supplements the label assignment result of the initial model (the first prediction result), so that label assignment early in training is more accurate, which raises the optimization speed and recalls more targets, improving the recall rate. When computing the classification loss, the classification output of the training model is used to soften the labels, so the distribution of the classification outputs becomes more differentiated, improving classification accuracy; at the same time, the resulting change in classification confidence improves the bounding boxes that are output.
In the above embodiment, the training flow of the initial model is shown in fig. 3: the model in box 1 is the training model, and the model in box 2 is the initial model. The training stage can be divided into four steps: data input, feature extraction, label assignment, and loss calculation. The training model is trained in advance, and its parameters are frozen during the training stage, so they are not updated as training proceeds. Specifically:
1) Data input: the data set is loaded and fed to the training model and the initial model simultaneously, according to the specified input requirements.
2) Feature extraction: each model extracts and fuses abstract features from the data through its own backbone and FPN.
3) Label assignment: labels are assigned to the preselection boxes output by the models, so that classification and positioning losses can be computed afterwards. Having more parameters, the training model has stronger learning capability, and because it is already trained, its label assignment is more accurate than the initial model's. The label assignment results of the initial model and the training model are merged, that is, the union of the first prediction result and the second prediction result is taken, to select as many positive samples as possible, which raises the model's optimization speed and improves its final recall rate.
In the above embodiment, fig. 4 is a schematic diagram of the union of the first prediction result and the second prediction result, where the small model denotes the initial model and the large model denotes the training model. The diagonally hatched squares are the positions selected as positive samples by the label assigner, and the blank squares are negative-sample positions. During training, the union of the label assigner results of the large and small models is taken as the final positive/negative sample assignment, on which the final loss is computed. This process exploits the large model's advantage in label assignment to improve the label assignment accuracy of the target model.
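As a minimal sketch of this merge, suppose each label assigner produces a boolean mask over the same set of candidate boxes, with True marking a positive sample; merge_assignments is a hypothetical helper name, not a name from the patent.

```python
import torch

def merge_assignments(small_pos_mask: torch.Tensor,
                      large_pos_mask: torch.Tensor) -> torch.Tensor:
    """Union of the label assignment results of the initial (small) model
    and the trained (large) training model: a candidate box is a positive
    sample if EITHER assigner selected it, so the large model can recall
    positives that the small model missed."""
    return small_pos_mask | large_pos_mask

# Usage: four candidate boxes; the union keeps three positive samples.
small = torch.tensor([True, False, False, True])
large = torch.tensor([True, True, False, False])
final = merge_assignments(small, large)  # tensor([True, True, False, True])
```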
Alternatively, the above steps may be executed by a background processor, by another device with similar processing capability, or by a machine integrating at least an image acquisition device and a data processing device, where the image acquisition device may include an image capture module such as a camera, and the data processing device may include a terminal such as a computer or a mobile phone, but is not limited thereto.
According to the invention, a target image captured by an image pickup device arranged in a target area is acquired and input into a target model to determine the target type and target position of an object included in the image. The target model is obtained by training an initial model with multiple sets of training data and a training model, where each set of training data includes an object and a label of the object, the training model is trained in advance, and its model structure is more complex than the target model's: training data are input into the initial model to obtain a first prediction result and into the training model to obtain a second prediction result; a target loss value is determined from the two prediction results; and the initial model's parameters are updated based on the target loss value to obtain the target model. Because the second prediction result of a trained, structurally complex model is incorporated when iteratively updating the initial model's parameters, training accuracy improves, and with it the accuracy with which the target model identifies the target image. This solves the problem of inaccurate object identification in the related art and achieves the effect of improving object identification accuracy.
In one exemplary embodiment, determining a target loss value based on the first prediction result and the second prediction result includes: determining a first prediction category and a first prediction position included in the first prediction result; determining a second prediction category and a second prediction position included in the second prediction result; determining a classification loss value based on the first prediction category and the second prediction category; determining a positioning loss value based on the first predicted position and the second predicted position; and determining the classification loss value and the positioning loss value as the target loss value. In this embodiment, the classification loss value is determined from the first prediction category in the first prediction result and the second prediction category in the second prediction result, and the positioning loss value is determined from the first predicted position in the first prediction result and the second predicted position in the second prediction result. The classification loss value may be computed with a classification loss function, which may be a binary cross-entropy function; the positioning loss value may be computed with a positioning loss function.
In one exemplary embodiment, determining a classification loss value based on the first prediction category and the second prediction category includes: determining a first parameter based on the second prediction category and the label of the object included in the training data; and determining the classification loss value based on the first parameter and the first prediction category. In this embodiment, the label of the object included in the training data is softened with the second prediction category to obtain the first parameter, and the classification loss value is determined from the first parameter and the first prediction category. The classification loss function adopts binary cross entropy, denoted $\mathrm{BCE}(\cdot)$, so the classification loss value may be expressed as $L_{cls} = \mathrm{BCE}(p_s, y_{soft})$, where $p_s$ denotes the prediction result of the initial model's classification branch, i.e., the first prediction category; $y_{soft}$ denotes the first parameter obtained by the softening operation; and $y$ denotes the true class label, i.e., the label of the object.
In one exemplary embodiment, determining the first parameter based on the second prediction category and the label of the object included in the training data includes: converting the label of the object into a first matrix; copying the second prediction category to obtain a second matrix of the same size as the first matrix; and determining the product of the first matrix and the second matrix as the first parameter. In this embodiment, the computation of $y_{soft}$ can be expressed as $y_{soft} = \mathrm{onehot}(y) \odot \mathrm{repeat}(p_t)$, where $p_t$ denotes the prediction result of the training model's classification branch, i.e., the second prediction category; $\mathrm{onehot}(\cdot)$ converts the classification label, i.e., the label of the object, into a matrix distribution of 0s and 1s, giving the first matrix; $\mathrm{repeat}(\cdot)$ copies $p_t$ into a matrix of the same size as the first matrix, giving the second matrix; and multiplying the first matrix by the second matrix element-wise yields the first parameter.
In one exemplary embodiment, determining the classification loss value based on the first parameter and the first prediction category includes: acquiring a predetermined classification loss function; and substituting the first parameter and the first prediction category into the classification loss function to obtain the classification loss value. In this embodiment, the classification loss value may be expressed as $L_{cls} = \mathrm{BCE}(p_s, y_{soft})$, where the classification loss function $\mathrm{BCE}(\cdot)$ is a binary cross-entropy function; substituting the first parameter and the first prediction category into it yields the classification loss value.
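Putting the two preceding embodiments together, here is a minimal PyTorch-style sketch of the softened classification loss $L_{cls} = \mathrm{BCE}(p_s, \mathrm{onehot}(y) \odot \mathrm{repeat}(p_t))$. Tensor shapes, function names, and the use of the logits variant of binary cross entropy are assumptions for illustration, not the patent's own implementation.

```python
import torch
import torch.nn.functional as F

def soft_label_bce(student_logits, teacher_probs, labels, num_classes):
    """Classification loss with teacher-softened labels.

    student_logits: [N, C] raw outputs of the initial model's cls branch (p_s)
    teacher_probs:  [N, C] class probabilities from the training model (p_t)
    labels:         [N]    true class indices (y)
    """
    # First matrix: onehot(y), a 0/1 distribution of the true labels.
    onehot = F.one_hot(labels, num_classes).float()            # [N, C]
    # Second matrix: p_t brought to the same [N, C] shape; here it already
    # is, which plays the role of the repeat operation in the text.
    # First parameter: the element-wise product keeps only the true-class
    # entry, softening the hard 1 down to the teacher's confidence.
    soft_target = onehot * teacher_probs                        # [N, C]
    return F.binary_cross_entropy_with_logits(student_logits, soft_target)

# Usage on dummy data:
logits = torch.randn(2, 4)
probs = torch.softmax(torch.randn(2, 4), dim=1)
y = torch.tensor([0, 3])
loss = soft_label_bce(logits, probs, y, num_classes=4)
```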
In one exemplary embodiment, determining a positioning loss value based on the first predicted position and the second predicted position includes: acquiring a predetermined positioning loss function; determining the first predicted position and the second predicted position as target predicted positions; and substituting the target predicted positions and the label position included in the label of the object into the positioning loss function to obtain the positioning loss value. In this embodiment, the positioning loss adopts a common localization loss computation, and the positioning loss value can be expressed as $L_{reg} = \mathcal{L}_{reg}(B_{pred}[idx],\, B_{gt})$, where $\mathcal{L}_{reg}(\cdot)$ is the positioning loss function; $idx$ is the index set that screens out, from all of the model's prediction boxes, those that may be positive samples; $B_{pred}$ denotes the predicted boxes; and $B_{gt}$ is the ground-truth box. The index set $idx$ is determined by the label assignment.
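A sketch of this positioning loss under the merged label assignment follows. The patent leaves the "common positioning loss" unspecified, so torchvision's generalized IoU loss stands in for it here, and the tensor names are assumptions.

```python
import torch
from torchvision.ops import generalized_box_iou_loss

def positioning_loss(pred_boxes, gt_boxes, pos_idx):
    """L_reg computed only over boxes screened as possible positive samples.

    pred_boxes: [N, 4] predicted boxes in xyxy order (B_pred)
    gt_boxes:   [P, 4] ground-truth box matched to each positive sample (B_gt)
    pos_idx:    [P]    indices produced by the (merged) label assignment (idx)
    """
    return generalized_box_iou_loss(pred_boxes[pos_idx], gt_boxes,
                                    reduction="mean")
```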
In an exemplary embodiment, after inputting the target image into the target model to determine the target type and target position of an object included in the target image, the method further includes: determining behavior information of the object based on the target type and the target position information; and executing an alarm operation when the behavior information indicates that the behavior of the object satisfies a predetermined condition. In this embodiment, after training is completed, the training model can be removed, and only the trained initial model, i.e., the target model, is used for inference to determine the target type and target position of the object; the flow of determining the target type and target position of an object included in the target image is shown in fig. 5. Once the target type and target position are determined, behavior information of the object can be derived from them. For example, if the target type is a motor vehicle and the target position lies in a non-motor-vehicle lane, the object's behavior can be determined to be a violation, and an alarm operation can be executed to remind the object or record the violation.
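As an illustration of this post-identification step, below is a rule-based sketch. The lane-region representation, the containment test, and the alarm hook are all hypothetical; a real deployment would use the road geometry of the monitored scene.

```python
def box_in_region(box, region):
    """Naive containment check: is the box center inside an axis-aligned
    region given as (x1, y1, x2, y2)?"""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    x1, y1, x2, y2 = region
    return x1 <= cx <= x2 and y1 <= cy <= y2

def trigger_alarm(category, box):
    """Hypothetical alarm hook: remind the object or record the violation."""
    print(f"ALARM: {category} at {box} violates the lane rule")

def check_behavior(detections, non_motor_lane_region):
    """Derive behavior information from (category, box) pairs and alarm when
    the predetermined condition fires, e.g. a motor vehicle whose target
    position lies inside a non-motor-vehicle lane."""
    for category, box in detections:
        if category == "motor_vehicle" and box_in_region(box, non_motor_lane_region):
            trigger_alarm(category, box)
```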
In the foregoing embodiment, the target model does not learn the feature distribution or other output information of the large model; the large model's label assignment only supplements the under-assigned portion of the small model's labels, and the large model's classification output serves as a reference that corrects the small model's training direction. Referring to the large model's label assignment result during training makes early label assignment more accurate, raising the optimization speed and recalling more targets to improve the recall rate; using the large model's classification output to soften labels when computing the classification loss makes the distribution of classification outputs more differentiated, improving classification precision, while the change in classification confidence improves the bounding-box output. This training-model-assisted detection training method has the following advantages:
1) Referring to the label assignment result of the trained training model during label assignment improves label assignment accuracy.
2) Taking the union as the final label assignment result selects as many possible positive samples as possible, improving the model's final sample recall rate.
3) Because label assignment is more accurate, model convergence is faster and training time is reduced.
4) Softening the labels with the large model's output when computing the classification loss produces a more informative label distribution and finer-grained classification results.
5) Finer classification results improve the quality of the output targets when detections are finally filtered through classification-confidence processing.
From the description of the above embodiments, it will be clear to those skilled in the art that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
The present embodiment also provides an object recognition device, which implements the foregoing embodiments and preferred implementations; what has already been explained will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware or in a combination of software and hardware is also possible and contemplated.
Fig. 6 is a block diagram of a structure of an object recognition apparatus according to an embodiment of the present invention, as shown in fig. 6, the apparatus including:
an acquisition module 62 for acquiring a target image acquired by an image pickup apparatus provided in a target area;
an identification module 64, configured to input the target image into a target model to determine the target type and target position of an object included in the target image; the target model is obtained by training an initial model with a plurality of sets of training data and a training model; each set of training data in the plurality of sets includes an object and a label of the object; the training model is a model trained in advance, and the complexity of its model structure is greater than that of the target model; the apparatus trains the initial model with the plurality of sets of training data and the training model as follows: inputting the training data into the initial model to obtain a first prediction result; inputting the training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating model parameters of the initial model based on the target loss value to obtain the target model.
In an exemplary embodiment, the apparatus may implement determining the target loss value based on the first prediction result and the second prediction result by: determining a first prediction category and a first prediction position included in the first prediction result; determining a second prediction category and a second prediction position included in the second prediction result; determining a classification loss value based on the first prediction category and the second prediction category; determining a positioning loss value based on the first predicted location and the second predicted location; and determining the classification loss value and the positioning loss value as the target loss value.
In one exemplary embodiment, the apparatus may enable determining a classification loss value based on the first prediction category and the second prediction category by: determining a first parameter based on the second prediction category and a label of an object included in the training data; the classification loss value is determined based on the first parameter and the first prediction category.
In an exemplary embodiment, the apparatus may enable determining the first parameter based on the second prediction category and a tag of an object included in the training data by: converting the label of the object into a first matrix; performing copying operation on the second prediction category to obtain a second matrix with the same size as the first matrix; a product of the first matrix and the second matrix is determined as the first parameter.
In one exemplary embodiment, the apparatus may enable determining the classification loss value based on the first parameter and the first prediction category by: acquiring a predetermined classification loss function; substituting the first parameter and the first prediction category into the classification loss function to obtain the classification loss value.
In one exemplary embodiment, the apparatus may enable determining a positioning loss value based on the first predicted location and the second predicted location by: acquiring a predetermined positioning loss function; determining the first predicted position and the second predicted position as target predicted positions; substituting the target predicted position and the label position included in the label of the object into the positioning loss function to obtain the positioning loss value.
In one exemplary embodiment, after the target image is input into the target model to determine the target type and target position of an object included in it, the apparatus may further determine behavior information of the object based on the target type and the target position information, and execute an alarm operation when the behavior information indicates that the behavior of the object satisfies a predetermined condition.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
Embodiments of the present invention also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a usb disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic apparatus may further include a transmission device connected to the processor, and an input/output device connected to the processor.
Specific examples in this embodiment may refer to the examples described in the foregoing embodiments and the exemplary implementation, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices, and they may be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by computing devices. In some cases, the steps shown or described may be performed in a different order than here, or they may be fabricated separately into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of identifying an object, comprising:
acquiring a target image acquired by an image pickup device arranged in a target area;
inputting the target image into a target model to determine a target type and a target position of an object included in the target image;
wherein the target model is obtained by training an initial model with a plurality of sets of training data and a training model, each set of training data in the plurality of sets comprises an object and a label of the object, the training model is a model trained in advance, and the complexity of a model structure of the training model is greater than that of the target model; training the initial model with the plurality of sets of training data and the training model comprises: inputting the training data into the initial model to obtain a first prediction result; inputting the training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating model parameters of the initial model based on the target loss value to obtain the target model.
2. The method of claim 1, wherein determining a target loss value based on the first prediction result and the second prediction result comprises:
determining a first prediction category and a first prediction position included in the first prediction result;
determining a second prediction category and a second prediction position included in the second prediction result;
determining a classification loss value based on the first prediction category and the second prediction category;
determining a positioning loss value based on the first predicted location and the second predicted location;
and determining the classification loss value and the positioning loss value as the target loss value.
3. The method of claim 2, wherein determining a classification loss value based on the first prediction category and the second prediction category comprises:
determining a first parameter based on the second prediction category and a label of an object included in the training data;
the classification loss value is determined based on the first parameter and the first prediction category.
4. A method according to claim 3, wherein determining a first parameter based on the second prediction category and a label of an object included in the training data comprises:
converting the label of the object into a first matrix;
performing copying operation on the second prediction category to obtain a second matrix with the same size as the first matrix;
a product of the first matrix and the second matrix is determined as the first parameter.
5. The method of claim 3, wherein determining the classification loss value based on the first parameter and the first predictive category comprises:
acquiring a predetermined classification loss function;
substituting the first parameter and the first prediction category into the classification loss function to obtain the classification loss value.
6. The method of claim 2, wherein determining a positioning loss value based on the first predicted location and the second predicted location comprises:
acquiring a predetermined positioning loss function;
determining the first predicted position and the second predicted position as target predicted positions;
substituting the target predicted position and the label position included in the label of the object into the positioning loss function to obtain the positioning loss value.
7. The method according to claim 1, wherein after inputting the target image into a target model to determine a target type and a target position of an object included in the target image, the method further comprises:
determining behavior information of the object based on the target type and the target position information;
and executing an alarm operation in the case that the behavior information indicates that the behavior of the object satisfies a predetermined condition.
8. An apparatus for identifying an object, comprising:
the acquisition module is used for acquiring a target image acquired by the image pickup equipment arranged in the target area;
an identification module, configured to input the target image into a target model to determine a target type and a target position of an object included in the target image; wherein the target model is obtained by training an initial model with a plurality of sets of training data and a training model, each set of training data in the plurality of sets comprises an object and a label of the object, the training model is a model trained in advance, and the complexity of a model structure of the training model is greater than that of the target model; the apparatus trains the initial model with the plurality of sets of training data and the training model by: inputting the training data into the initial model to obtain a first prediction result; inputting the training data into the training model to obtain a second prediction result; determining a target loss value based on the first prediction result and the second prediction result; and updating model parameters of the initial model based on the target loss value to obtain the target model.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program, wherein the computer program is arranged to execute the method of any of the claims 1 to 7 when run.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 7.
CN202410029989.2A 2024-01-09 2024-01-09 Object identification method and device, storage medium and electronic device Pending CN117541782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410029989.2A CN117541782A (en) 2024-01-09 2024-01-09 Object identification method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410029989.2A CN117541782A (en) 2024-01-09 2024-01-09 Object identification method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN117541782A (en) 2024-02-09

Family

ID=89790380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410029989.2A Pending CN117541782A (en) 2024-01-09 2024-01-09 Object identification method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN117541782A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021254205A1 (en) * 2020-06-17 2021-12-23 苏宁易购集团股份有限公司 Target detection method and apparatus
CN114882340A (en) * 2022-04-15 2022-08-09 西安电子科技大学 Weak supervision target detection method based on bounding box regression
CN115797735A (en) * 2022-06-14 2023-03-14 浙江啄云智能科技有限公司 Target detection method, device, equipment and storage medium
CN116071601A (en) * 2023-02-20 2023-05-05 贝壳找房(北京)科技有限公司 Method, apparatus, device and medium for training model
WO2023159527A1 (en) * 2022-02-25 2023-08-31 京东方科技集团股份有限公司 Detector training method and apparatus, and storage medium
CN117315237A (en) * 2023-11-23 2023-12-29 上海闪马智能科技有限公司 Method and device for determining target detection model and storage medium

Similar Documents

Publication Publication Date Title
CN108229474A (en) Licence plate recognition method, device and electronic equipment
CN112396093B (en) Driving scene classification method, device and equipment and readable storage medium
CN110175519B (en) Method and device for identifying separation and combination identification instrument of transformer substation and storage medium
CN110096979B (en) Model construction method, crowd density estimation method, device, equipment and medium
CN113052295B (en) Training method of neural network, object detection method, device and equipment
CN112364916B (en) Image classification method based on transfer learning, related equipment and storage medium
CN114419570A (en) Point cloud data identification method and device, electronic equipment and storage medium
CN111814593A (en) Traffic scene analysis method and device, and storage medium
CN111738036A (en) Image processing method, device, equipment and storage medium
CN115830399A (en) Classification model training method, apparatus, device, storage medium, and program product
CN116012815A (en) Traffic element identification method, multi-task network model, training method and training device
CN117315237B (en) Method and device for determining target detection model and storage medium
US20230281424A1 (en) Method for Extracting Features from Data of Traffic Scenario Based on Graph Neural Network
CN117541782A (en) Object identification method and device, storage medium and electronic device
CN114998570A (en) Method and device for determining object detection frame, storage medium and electronic device
CN113469176B (en) Target detection model training method, target detection method and related equipment thereof
CN114627319A (en) Target data reporting method and device, storage medium and electronic device
CN115424250A (en) License plate recognition method and device
CN112560737A (en) Signal lamp identification method and device, storage medium and electronic equipment
CN116452957B (en) Quality detection method and device for image annotation data and electronic equipment
CN116503695B (en) Training method of target detection model, target detection method and device
EP3567519A1 (en) Method for road marking recognition
CN115661796A (en) Guideboard identification method and device and vehicle
CN114596484A (en) Target detection capability training method and device, storage medium and electronic equipment
CN117830744A (en) Training method and device for target recognition model, vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination